Statistics

The science of collecting, analyzing, and interpreting data. Statistics provides the mathematical foundation for understanding uncertainty, making inferences, and drawing conclusions from data.

Introduction to Statistics

Statistics is the science of learning from data. It provides tools for collecting, organizing, analyzing, and interpreting numerical information. While probability theory deals with the mathematics of uncertainty, statistics applies these tools to real-world data to make informed decisions.

The field has two main branches: descriptive statistics, which summarizes and describes data (means, medians, standard deviations), and inferential statistics, which uses sample data to make conclusions about populations. Modern statistics is essential to scientific research, business analytics, machine learning, and public policy.

Statistical theory provides the mathematical framework for probability calculations, hypothesis testing, and Bayesian inference. It enables rigorous analysis of uncertainty and guides decision-making under incomplete information.

Complete History

Statistics has roots in ancient civilizations that collected data for administrative purposes. The Babylonians, Egyptians, and Romans kept records of populations, taxes, and resources. However, statistics as a mathematical discipline began much later, emerging from probability theory and the need to analyze data systematically.

The foundations of probability theory were laid in the 17th century by mathematicians like Blaise Pascal (1623-1662) and Pierre de Fermat (1601-1665), who solved problems related to games of chance. Jacob Bernoulli (1654-1705) proved the Law of Large Numbers, showing that sample averages converge to population means. Abraham de Moivre (1667-1754) developed the normal distribution and the central limit theorem.

The 19th century saw statistics become a distinct field. Adolphe Quetelet (1796-1874) applied statistical methods to social phenomena, introducing concepts like the "average man." Francis Galton (1822-1911) developed correlation and regression analysis. Karl Pearson (1857-1936) founded the first statistics department and developed the chi-square test. Ronald Fisher (1890-1962) revolutionized statistics with methods like analysis of variance (ANOVA) and maximum likelihood estimation.

Modern statistics is essential to virtually every field: medicine (clinical trials), economics (econometrics), psychology (experimental design), biology (bioinformatics), and data science. The development of Bayesian statistics, machine learning, and big data analytics has expanded statistics' applications even further. Statistics provides the tools to make sense of uncertainty and variability in the world around us.

Key Concepts

Probability

Probability measures the likelihood of events. It ranges from 0 (impossible) to 1 (certain). The probability of an event A is written as P(A).

0 \leq P(A) \leq 1, \quad P(\text{all outcomes}) = 1

Probability theory provides the mathematical foundation for statistics, enabling quantification of uncertainty.

Conditional Probability

The probability of A given that B has occurred. This is fundamental to Bayesian reasoning and updating beliefs.

P(A|B) = \frac{P(A \cap B)}{P(B)}

Conditional probability allows us to update our beliefs based on new information, which is the essence of Bayesian statistics.
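As a minimal sketch, the definition can be checked by direct counting on a uniform sample space. The die and the two events below are hypothetical illustrations, not taken from the text:

```python
# Conditional probability on a fair six-sided die: P(A|B) = P(A ∩ B) / P(B).
# Hypothetical events: A = "roll is even", B = "roll is at least 4".
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}
B = {4, 5, 6}

def p(event, sample_space):
    """Probability of an event under a uniform distribution."""
    return len(event & sample_space) / len(sample_space)

# P(A|B) = P(A ∩ B) / P(B) = (2/6) / (3/6) = 2/3
p_a_given_b = p(A & B, omega) / p(B, omega)
print(p_a_given_b)
```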

Random Variables

A random variable is a function that assigns numerical values to outcomes of a random process. Random variables can be discrete (taking countably many values) or continuous (taking values in an interval).

E[X] = \sum x P(X=x) \text{ (discrete)}, \quad E[X] = \int x f(x)\, dx \text{ (continuous)}

The expected value (mean) is a measure of central tendency, representing the long-run average.
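The discrete formula can be evaluated exactly with Python's `fractions` module; a fair six-sided die is assumed here purely for illustration:

```python
from fractions import Fraction

# Expected value of a discrete random variable: E[X] = sum of x * P(X = x).
# Example: X = outcome of a fair six-sided die, each face with probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
expected = sum(x * p for x, p in pmf.items())
print(expected)  # 7/2
```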

Distributions

Probability distributions describe how probabilities are distributed over possible values. Common distributions include normal, binomial, and Poisson.

f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}

The normal (Gaussian) distribution is particularly important due to the Central Limit Theorem.
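The density formula above translates directly into code. This sketch assumes the standard parameterization with mean `mu` and standard deviation `sigma`:

```python
import math

# Normal (Gaussian) probability density, straight from the formula in the text.
def normal_pdf(x, mu=0.0, sigma=1.0):
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# The density peaks at x = mu; for the standard normal the peak is
# 1 / sqrt(2*pi), roughly 0.3989.
print(normal_pdf(0.0))
```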

Fundamental Theory

Bayes' Theorem

P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}

Describes how to update probabilities based on new evidence. It connects prior beliefs P(A) to posterior beliefs P(A|B) through the likelihood P(B|A) and evidence P(B).
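A small numeric sketch of the update: the probabilities below are made-up illustrative values, and the evidence P(B) is expanded via the law of total probability:

```python
# Bayes' theorem: posterior = likelihood * prior / evidence.
# Hypothetical inputs: P(A) = 0.3, P(B|A) = 0.8, P(B|not A) = 0.2.
p_a = 0.3
p_b_given_a = 0.8
p_b_given_not_a = 0.2

# Evidence P(B), computed with the law of total probability.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

posterior = p_b_given_a * p_a / p_b
print(posterior)  # prior belief 0.3 is revised upward by the evidence
```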

Law of Total Probability

If events B₁, B₂, ..., Bₙ partition the sample space, then:

P(A) = \sum_{i=1}^n P(A|B_i) P(B_i)

This law is essential for calculating marginal probabilities and is used in the denominator of Bayes' Theorem.

Central Limit Theorem

As sample size increases, the distribution of sample means approaches a normal distribution, regardless of the original distribution:

\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)

This theorem is fundamental to inferential statistics, enabling confidence intervals and hypothesis tests.
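A quick simulation illustrates the theorem with a decidedly non-normal starting distribution (a uniform die roll); the sample size, trial count, and seed below are arbitrary choices:

```python
import random
import statistics

# Central Limit Theorem sketch: means of samples from a fair die cluster
# around mu = 3.5 with standard error sigma / sqrt(n).
random.seed(0)
n = 100          # size of each sample
trials = 2000    # number of sample means to draw

means = [statistics.fmean(random.randint(1, 6) for _ in range(n))
         for _ in range(trials)]

sigma = statistics.pstdev(range(1, 7))   # population sd of one roll, ~1.708
print(statistics.fmean(means))           # close to mu = 3.5
print(statistics.pstdev(means))          # close to sigma / sqrt(100) ~ 0.171
```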

Quick Examples

Example 1: Basic Probability

If you roll a fair die, the probability of getting an even number is:

P(\text{even}) = P(\{2,4,6\}) = \frac{3}{6} = \frac{1}{2}

There are 3 favorable outcomes out of 6 possible outcomes, giving a probability of 1/2.

Example 2: Conditional Probability

In a deck of 52 cards, if you draw a card and it's red, what's the probability it's a heart?

P(\text{heart}|\text{red}) = \frac{P(\text{heart} \cap \text{red})}{P(\text{red})} = \frac{13/52}{26/52} = \frac{1}{2}

Given that the card is red (26 possibilities), half are hearts (13), so the conditional probability is 1/2.

Example 3: Bayes' Theorem Application

A medical test has 95% sensitivity and 90% specificity. If 1% of the population has the disease, what's the probability someone with a positive test actually has the disease?

P(\text{disease}|\text{positive}) = \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.10 \times 0.99} \approx 0.088

Only about 8.8%! This counterintuitive result shows why Bayes' Theorem is essential for interpreting test results correctly.
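The calculation can be verified in a few lines, using the numbers given in the example:

```python
# Bayes' theorem applied to the medical-test example from the text.
sensitivity = 0.95   # P(positive | disease)
specificity = 0.90   # P(negative | no disease)
prevalence = 0.01    # P(disease)

false_positive_rate = 1 - specificity

# Evidence P(positive), via the law of total probability.
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)

p_disease_given_positive = sensitivity * prevalence / p_positive
print(round(p_disease_given_positive, 3))  # 0.088
```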

Example 4: Normal Distribution Probability

Given a normal distribution with μ = 100 and σ = 15, find P(85 < X < 115):

Z_1 = \frac{85 - 100}{15} = -1, \quad Z_2 = \frac{115 - 100}{15} = 1
P(85 < X < 115) = P(-1 < Z < 1) = 2 \cdot P(0 < Z < 1) = 2 \cdot 0.3413 = 0.6826

About 68% of values fall within one standard deviation of the mean in a normal distribution.
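The table lookup can be cross-checked with Python's standard library (`statistics.NormalDist`), which returns the slightly more precise value 0.6827:

```python
from statistics import NormalDist

# P(85 < X < 115) for X ~ N(100, 15^2), as a difference of CDF values.
dist = NormalDist(mu=100, sigma=15)
prob = dist.cdf(115) - dist.cdf(85)
print(round(prob, 4))  # 0.6827
```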

Example 5: Hypothesis Testing

Test H₀: μ = 50 vs H₁: μ ≠ 50 with sample mean x̄ = 52, n = 36, σ = 6, α = 0.05:

z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{52 - 50}{6/\sqrt{36}} = \frac{2}{1} = 2

Critical value for the two-tailed test: z_{0.025} = 1.96. Since |z| = 2 > 1.96, we reject H₀.
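The full test, including the p-value (not computed in the example), can be sketched with the standard library:

```python
from math import sqrt
from statistics import NormalDist

# Two-sided z-test of H0: mu = 50, with the numbers from the example.
x_bar, mu0, sigma, n, alpha = 52, 50, 6, 36, 0.05

z = (x_bar - mu0) / (sigma / sqrt(n))          # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # critical value, ~1.96
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p-value, ~0.0455

reject = abs(z) > z_crit
print(z, round(z_crit, 2), round(p_value, 4), reject)
```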

Practice Problems

Practice statistical concepts and calculations.

Problem 1: Mean and Standard Deviation

Find the mean and standard deviation of the data set: 5, 7, 9, 11, 13

Solution:

\mu = \frac{5 + 7 + 9 + 11 + 13}{5} = \frac{45}{5} = 9
\sigma^2 = \frac{(5-9)^2 + (7-9)^2 + (9-9)^2 + (11-9)^2 + (13-9)^2}{5} = \frac{16 + 4 + 0 + 4 + 16}{5} = 8
\sigma = \sqrt{8} = 2\sqrt{2} \approx 2.83
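Python's `statistics` module confirms the result; note that `pvariance` and `pstdev` use the population divisor n, matching the solution above:

```python
import statistics

# Mean and population standard deviation of the data set from the problem.
data = [5, 7, 9, 11, 13]
mean = statistics.fmean(data)     # 9.0
var = statistics.pvariance(data)  # 8.0 (divides by n, not n - 1)
sd = statistics.pstdev(data)      # sqrt(8), about 2.83
print(mean, var, sd)
```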

Problem 2: Probability

A fair die is rolled twice. What is the probability that the sum is 7?

Solution:

Total outcomes: 6 × 6 = 36

Favorable outcomes for sum = 7: (1,6), (2,5), (3,4), (4,3), (5,2), (6,1) = 6 outcomes

P(\text{sum} = 7) = \frac{6}{36} = \frac{1}{6}
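Brute-force enumeration of all 36 ordered outcomes confirms the count:

```python
from itertools import product
from fractions import Fraction

# All ordered outcomes of two fair dice, and those summing to 7.
outcomes = list(product(range(1, 7), repeat=2))
favorable = [o for o in outcomes if sum(o) == 7]

prob = Fraction(len(favorable), len(outcomes))
print(len(favorable), prob)  # 6 favorable outcomes, probability 1/6
```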

Applications

Statistics is essential to virtually every field that deals with data:

Scientific Research

Experimental design, hypothesis testing, and data analysis

Machine Learning

Training algorithms, validation, and model selection

Medicine

Clinical trials, epidemiology, and diagnostic testing

Business

Market research, quality control, and risk analysis

Public Policy

Polling, census analysis, and program evaluation

Finance

Risk modeling, portfolio optimization, and algorithmic trading
