Statistics and Probability for Machine Learning

Jan 6, 2021

Machine learning uses statistics, probability, and algorithms to learn from data and provide insights that can be used to build smart applications.

Probability and statistics are related areas of mathematics concerned with analyzing the relative frequency of events.

Probability

When all outcomes are equally likely, the probability of an event is the number of ways that event can occur divided by the total number of possible outcomes.

Joint Probability

The joint probability of events A and B, denoted P(A∩B), is the probability that A and B both occur.

P(A∩B) = P(A) · P(B)

This formula only applies if A and B are independent, which means that A occurring does not change the probability of B, and vice versa.
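As a quick check of the product rule, here is a minimal sketch that enumerates the outcomes of rolling two fair dice. The events (first die is even, second die shows 6) and the helper name `prob` are assumptions chosen for illustration; they are not part of the original article.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability = favourable outcomes / total outcomes."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

# Hypothetical events, chosen only for illustration.
def first_is_even(o):
    return o[0] % 2 == 0

def second_is_six(o):
    return o[1] == 6

p_a = prob(first_is_even)                                       # 18/36
p_b = prob(second_is_six)                                       # 6/36
p_a_and_b = prob(lambda o: first_is_even(o) and second_is_six(o))  # 3/36

# The two dice do not influence each other, so the product rule holds.
print(p_a_and_b == p_a * p_b)  # True
```

Because the events involve different dice, they are independent, and the joint probability equals the product of the individual probabilities.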

Conditional Probability

Now suppose A and B are not independent: for example, if A occurs, B becomes more likely. When A and B are not independent, it is often useful to compute the conditional probability P(A|B), which is the probability of A given that B occurred:

P(A|B) = P(A∩B) / P(B), provided P(B) > 0
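To make the formula concrete, the sketch below again enumerates two fair dice. The events (A: the sum is at least 10, B: the first die shows 6) are hypothetical and were picked because they are clearly not independent.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability = favourable outcomes / total outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

# Hypothetical events, chosen only for illustration.
def sum_at_least_10(o):   # event A
    return o[0] + o[1] >= 10

def first_is_six(o):      # event B
    return o[0] == 6

p_a = prob(sum_at_least_10)
p_b = prob(first_is_six)
p_a_and_b = prob(lambda o: sum_at_least_10(o) and first_is_six(o))

# Conditional probability: P(A|B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b

print(p_a)          # 1/6
print(p_a_given_b)  # 1/2
```

Knowing that the first die shows 6 raises the probability of a high sum from 1/6 to 1/2, which is exactly what it means for the two events to be dependent.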

Bayes’ Theorem

Bayes’ theorem is a relationship between the conditional probabilities of two events, P(A|B) and P(B|A):

P(A|B) = P(B|A) · P(A) / P(B)

For example, if we want to find the probability of selling ice cream on a hot and sunny day, Bayes’ theorem lets us combine prior knowledge about the likelihood of selling ice cream on any type of day (rainy, windy, snowy, etc.) with what we know about the weather.
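The ice cream example can be sketched in a few lines using the standard form of the theorem, P(A|B) = P(B|A) · P(A) / P(B). Every number below is hypothetical and invented purely for illustration; the function name `bayes` is also an assumption.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b

# All numbers are hypothetical, chosen only for illustration.
p_sale = Fraction(1, 2)             # prior: P(sale) on any type of day
p_hot = Fraction(3, 10)             # P(day is hot and sunny)
p_hot_given_sale = Fraction(9, 20)  # P(hot and sunny | a sale happened)

# Posterior: probability of a sale given that the day is hot and sunny.
p_sale_given_hot = bayes(p_hot_given_sale, p_sale, p_hot)
print(p_sale_given_hot)  # 3/4
```

With these made-up inputs, learning that the day is hot and sunny moves the probability of a sale from the prior of 1/2 up to 3/4.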

Descriptive Statistics

Descriptive statistics refers to methods for summarizing and organizing the information in a data set. We will use the table below to illustrate some of the statistical concepts.

Measures of Center: Mean, Median, Mode

Mean

The mean is the arithmetic average of a data set. To calculate the mean, add up the values and divide by the number of values. The sample mean is the arithmetic average of a sample, and is denoted x̄ (“x-bar”). The population mean is the arithmetic average of a population, and is denoted 𝜇 (“mu”, the Greek letter m).

Median

The median is the middle data value once the data have been sorted into ascending order, when there is an odd number of data values. If there is an even number, the median is the mean of the two middle data values.

Mode

The mode is the data value that occurs with the greatest frequency. Both quantitative and categorical variables can have modes, but only quantitative variables can have means or medians.
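The three measures of center can be computed with Python's built-in `statistics` module. The values below are hypothetical and are not taken from the article's table.

```python
import statistics

# Hypothetical values, chosen only for illustration.
values = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(values)      # (2 + 3 + 3 + 5 + 7 + 10) / 6 = 5
median_value = statistics.median(values)  # even count: mean of 3 and 5 = 4
mode_value = statistics.mode(values)      # 3 occurs most often

# The mode also works for categorical data, where a mean would be meaningless.
colors = ["red", "blue", "red", "green"]
color_mode = statistics.mode(colors)      # "red"

print(mean_value, median_value, mode_value, color_mode)
```

Note that the data set has an even number of values, so the median is the average of the two middle values rather than a value that appears in the data.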

Measures of Variability: Range, Variance, Standard Deviation

Measures of variability quantify the amount of variation, spread, or dispersion present in the data.

Range

The range of a variable equals the difference between the maximum and minimum values.

Variance

Population variance is defined as the average of the squared differences from the mean, denoted 𝜎² (“sigma-squared”):

𝜎² = Σ (xᵢ − 𝜇)² / N

Here N is the number of values in the population and 𝜇 is the population mean.

A larger variance means the data are more spread out.

Standard Deviation

The standard deviation (SD) of a set of numbers tells you how much the individual numbers tend to differ from the mean.

The sample standard deviation is the square root of the sample variance, which divides the sum of squared differences by n − 1 instead of n.
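Here is a small sketch of range, variance, and standard deviation using the `statistics` module. The data set is hypothetical and was chosen so the population figures come out as round numbers.

```python
import statistics

# Hypothetical values with mean 5, chosen only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(values) - min(values)        # 9 - 2 = 7

# Population versions divide the sum of squared differences (32) by N = 8.
pop_variance = statistics.pvariance(values)    # 32 / 8 = 4
pop_sd = statistics.pstdev(values)             # square root of 4 = 2

# Sample versions divide by n - 1 = 7 instead.
sample_variance = statistics.variance(values)  # 32 / 7
sample_sd = statistics.stdev(values)           # square root of 32 / 7

print(value_range, pop_variance, pop_sd)
print(sample_variance, sample_sd)
```

The sample variance is slightly larger than the population variance because of the smaller divisor; the gap shrinks as the number of values grows.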

Measures of Position: Percentile, Z-score, Quartiles

Measures of position indicate the relative position of a particular data value in the data distribution.

Percentile

The pth percentile of a data set is the data value such that p percent of the values in the data set are at or below this value. The 50th percentile is the median. For example, the median income is $32,150, and 50% of the data values lie at or below this value.

Percentile rank

The percentile rank of a data value equals the percentage of values in the data set that are at or below that value. For example, the percentile rank of Applicant 1’s income of $38,000 is 90%, since that is the percentage of incomes equal to or less than $38,000.
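The percentile rank definition translates directly into code. The function name `percentile_rank` and the ten incomes below are assumptions made up for illustration; they are not the article's actual table, only arranged so that nine of the ten values fall at or below $38,000.

```python
def percentile_rank(values, x):
    """Percentage of values in the data set that are at or below x."""
    if not values:
        raise ValueError("values must not be empty")
    at_or_below = sum(1 for v in values if v <= x)
    return 100 * at_or_below / len(values)

# Hypothetical incomes for ten applicants (not the article's data).
incomes = [24_000, 26_500, 28_000, 30_000, 31_500,
           33_000, 34_500, 36_000, 38_000, 45_000]

print(percentile_rank(incomes, 38_000))  # 90.0
```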

Interquartile Range (IQR)

The first quartile (Q1) is the 25th percentile of a data set; the second quartile (Q2) is the 50th percentile (median); and the third quartile (Q3) is the 75th percentile.

The IQR measures the difference between the 75th and 25th percentiles, using the formula: IQR = Q3 − Q1.

A data value x is an outlier if either x ≤ Q1 − 1.5(IQR), or x ≥ Q3 + 1.5(IQR).
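The quartile and outlier rules can be sketched with `statistics.quantiles`. The data are hypothetical, and the choice of `method="inclusive"` is an assumption: several conventions for computing quartiles exist, and they can give slightly different values of Q1 and Q3 on small data sets.

```python
import statistics

# Hypothetical values with one obviously extreme point.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]

# n=4 splits the data into quartiles and returns the cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Fences from the 1.5 * IQR rule.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in values if x <= lower_fence or x >= upper_fence]
print(outliers)  # [50]
```

With this convention Q1 is 3.25 and Q3 is 7.75, so the IQR is 4.5 and anything at or above 14.5 is flagged.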

Z-score

The Z-score for a particular data value represents how many standard deviations the data value lies above or below the mean:

z = (x − mean) / (standard deviation)

If z is positive, the value is above the mean; if z is negative, the value is below the mean. For Applicant 6, the Z-score is (24,000 − 32,540) / 7,201 ≈ −1.2, which means the income of Applicant 6 lies 1.2 standard deviations below the mean.
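The Applicant 6 calculation can be reproduced with a short helper. The function name `z_score` is an assumption; the income, mean, and standard deviation are the figures quoted in the text.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

# Figures quoted in the article for Applicant 6.
z = z_score(24_000, 32_540, 7_201)
print(round(z, 1))  # -1.2
```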

Uni-variate Descriptive Statistics

Patterns in uni-variate data can be described with measures of central tendency (mean, mode, and median) and measures of dispersion (range, variance, maximum, minimum, quartiles, and standard deviation).

The plots typically used to visualize uni-variate data include bar charts, histograms, and pie charts.

Bi-variate Descriptive Statistics

Bi-variate analysis involves the analysis of two variables for the purpose of determining the empirical relationship between them. The plots typically used to visualize bi-variate data include scatter plots and box plots.

Correlation

A correlation is a statistic intended to quantify the strength of the relationship between two variables. The correlation coefficient r quantifies the strength and direction of the linear relationship between two quantitative variables.

If r is positive and significant, we say that x and y are positively correlated. An increase in x is associated with an increase in y.

If r is negative and significant, we say that x and y are negatively correlated. An increase in x is associated with a decrease in y.
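To see the sign of r in action, here is a minimal sketch that computes the Pearson correlation coefficient by hand. The function name `pearson_r` and the data are hypothetical, and the sketch does not test whether r is statistically significant.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]      # tends to rise with x
y_down = [10, 8, 6, 4, 2]   # falls exactly linearly with x

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
print(round(r_up, 2), round(r_down, 2))
```

Here `r_up` is positive (about 0.77) and `r_down` is exactly −1, the value r takes when the points lie on a straight line with negative slope.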

Thank you for reading! I would appreciate any comments, notes, corrections, questions or suggestions — if there’s anything you’d like me to write about, please don’t hesitate to let me know.