
Statistics revision notes

A concise JEE revision summary of Statistics.


Key Concepts & Definitions

Statistics is the science of averages and their estimates, dealing with data collected for specific purposes to make decisions through analysis and interpretation.

1. Measures of Central Tendency A representative value that gives a rough idea of where data points are centered.

Mean (Arithmetic Mean):
The sum of observations divided by the number of observations.
Median:
The middlemost value when data is arranged in ascending or descending order. If $n$ is odd, it is the $\left(\frac{n+1}{2}\right)^{th}$ observation. If $n$ is even, it is the mean of the $\left(\frac{n}{2}\right)^{th}$ and $\left(\frac{n}{2} + 1\right)^{th}$ observations.
Mode:
The most frequently occurring observation.
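
The three definitions above can be checked on a small sample. The following is a minimal sketch, assuming Python's standard `statistics` module; the helper `median_by_position` and both sample lists are hypothetical and only illustrate the odd/even positional rule.

```python
from statistics import mean, median, mode

def median_by_position(values):
    """Median via the positional rule: sort, then pick the middle term(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th observation (1-indexed).
        return ordered[(n + 1) // 2 - 1]
    # Even n: mean of the (n / 2)-th and (n / 2 + 1)-th observations.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

odd_data = [7, 3, 9, 3, 5]        # hypothetical sample, n = 5
even_data = [7, 3, 9, 3, 5, 11]   # hypothetical sample, n = 6

print(mean(odd_data))                 # 27 / 5
print(median_by_position(odd_data))   # middle of [3, 3, 5, 7, 9]
print(median_by_position(even_data))  # mean of 5 and 7
print(mode(odd_data))                 # 3 occurs twice
```

The hand-written helper should agree with `statistics.median` on both samples, which is a quick way to confirm the positional rule.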

2. Variability & Measures of Dispersion Central tendency alone is insufficient to give complete information about data. Variability describes how scattered or bunched the data is around the central measure. A single number describing this variability is called a measure of dispersion. Types of measures of dispersion:

  • Range: The difference between the maximum and minimum values in a series.
  • Quartile Deviation: Half the difference between the upper and lower quartiles (Semi-interquartile range).
  • Mean Deviation: The mean of the absolute values of the deviations of observations from a central value (mean or median).
  • Standard Deviation: The positive square root of the variance, acting as the most reliable indicator of dispersion.

3. Variance The mean of the squares of the deviations from the mean. It overcomes the mathematical difficulties of the Mean Deviation (which uses absolute values) by squaring the deviations to ensure non-negativity. → [JEE TIP] Variance is independent of the change of origin but depends on the change of scale.

4. Coefficient of Variation (CV) A dimensionless, relative measure of dispersion used to compare the variability of two or more distributions. $CV = \frac{\sigma}{\bar{x}} \times 100$. A distribution with a lower CV is termed more consistent, uniform, or stable. → [JEE TIP] Always use CV, not standard deviation, to compare the consistency of two distinct datasets (e.g., comparing runs of two batsmen).
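
To see why CV and standard deviation can rank datasets differently, here is a small sketch. The two score lists are invented for illustration, and `coefficient_of_variation` is a hypothetical helper built on `statistics.pstdev` (population standard deviation, matching the $\frac{1}{n}$ convention of these notes).

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """CV = (population SD / mean) * 100; assumes a non-zero mean."""
    return pstdev(values) / mean(values) * 100

batsman_a = [90, 100, 110]  # hypothetical scores: larger SD, much larger mean
batsman_b = [5, 10, 15]     # hypothetical scores: smaller SD, small mean

print(pstdev(batsman_a), coefficient_of_variation(batsman_a))
print(pstdev(batsman_b), coefficient_of_variation(batsman_b))
```

In this made-up data, batsman A has the larger standard deviation but the lower CV, so A is the more consistent of the two.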

5. Historical Context (Theoretical Statistics)

  • A.L. Bowley & A.L. Boddington: Defined statistics as the science of averages/estimates.
  • Kautilya (300 B.C.): Maintained vital statistics in Arthshastra.
  • Francis Galton: Pioneered biometry statistics.
  • Karl Pearson: Discovered the Chi-square test and founded the first statistical laboratory.
  • Sir Ronald A. Fisher: Father of modern statistics.

Formulae, Equations & Units

1. Mean ($\bar{x}$)

  • Ungrouped data: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.
  • Discrete frequency distribution: $\bar{x} = \frac{1}{N} \sum_{i=1}^{n} f_i x_i$, where $N = \sum f_i$.
  • Step-deviation method (Shortcut): $\bar{x} = A + \frac{\sum f_i y_i}{N} \times h$, where $A$ is the assumed mean, $h$ is the class width, and $y_i = \frac{x_i - A}{h}$.
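
As a quick check that the step-deviation shortcut agrees with the direct formula, consider the sketch below. The class mid-points, frequencies, assumed mean `A = 35` and width `h = 10` are all hypothetical values chosen for the example.

```python
def mean_direct(midpoints, freqs):
    """Mean of a frequency distribution: (sum f_i x_i) / N."""
    return sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)

def mean_step_deviation(midpoints, freqs, A, h):
    """Mean via y_i = (x_i - A) / h, then x_bar = A + h * (sum f_i y_i) / N."""
    N = sum(freqs)
    y = [(x - A) / h for x in midpoints]
    sum_fy = sum(f * yi for f, yi in zip(freqs, y))
    return A + h * sum_fy / N

midpoints = [15, 25, 35, 45, 55]  # hypothetical class mid-points
freqs = [2, 3, 5, 6, 4]           # hypothetical frequencies, N = 20

print(mean_direct(midpoints, freqs))                  # 770 / 20
print(mean_step_deviation(midpoints, freqs, 35, 10))  # 35 + 10 * 7 / 20
```

Both calls give 38.5 here. Any choice of `A` gives the same mean; picking a central mid-point just keeps the $y_i$ small.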

2. Median (M) for Continuous Frequency Distribution

  • $M = l + \left( \frac{\frac{N}{2} - C}{f} \right) \times h$
    • $l$ = lower limit of the median class (the first class whose cumulative frequency is $\geq N/2$)
    • $N$ = total frequency
    • $C$ = cumulative frequency of the class preceding the median class
    • $f$ = frequency of the median class
    • $h$ = width of the median class
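
The interpolation formula can be sketched in code as follows. This is a minimal illustration assuming equal-width classes given by their lower limits; the function name `grouped_median` and the sample distribution are hypothetical.

```python
def grouped_median(lower_limits, freqs, h):
    """Median of a continuous frequency distribution with equal class width h."""
    N = sum(freqs)
    if N == 0:
        raise ValueError("total frequency must be positive")
    cumulative = 0  # C: cumulative frequency before the current class
    for l, f in zip(lower_limits, freqs):
        if cumulative + f >= N / 2:
            # First class whose cumulative frequency reaches N/2 is the median class.
            return l + (N / 2 - cumulative) / f * h
        cumulative += f
    raise ValueError("median class not found")

lower_limits = [0, 10, 20, 30, 40]  # hypothetical classes 0-10, 10-20, ...
freqs = [5, 8, 12, 10, 5]           # hypothetical frequencies, N = 40

print(grouped_median(lower_limits, freqs, 10))
```

For this data $N/2 = 20$, the cumulative frequencies are 5, 13, 25, 35, 40, so the median class is 20-30 with $C = 13$ and $f = 12$, giving $M = 20 + \frac{7}{12} \times 10 \approx 25.83$.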

3. Mean Deviation (M.D.)

  • About a general value $a$: $M.D.(a) = \frac{1}{n} \sum |x_i - a|$.
  • About Mean (Ungrouped): $M.D.(\bar{x}) = \frac{1}{n} \sum |x_i - \bar{x}|$.
  • About Median (Ungrouped): $M.D.(M) = \frac{1}{n} \sum |x_i - M|$.
  • About Mean (Grouped): $M.D.(\bar{x}) = \frac{1}{N} \sum f_i |x_i - \bar{x}|$.
  • About Median (Grouped): $M.D.(M) = \frac{1}{N} \sum f_i |x_i - M|$.
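
A short sketch of these formulas, using invented data, may help fix the difference between deviation about the mean and about the median. The helper names are hypothetical.

```python
from statistics import mean, median

def mean_deviation(values, a):
    """M.D.(a) = (1/n) * sum |x_i - a| for ungrouped data."""
    return sum(abs(x - a) for x in values) / len(values)

def mean_deviation_grouped(xs, fs, a):
    """M.D.(a) = (1/N) * sum f_i |x_i - a| for a frequency distribution."""
    return sum(f * abs(x - a) for x, f in zip(xs, fs)) / sum(fs)

data = [1, 2, 3, 4, 10]  # hypothetical sample: mean 4, median 3

md_about_mean = mean_deviation(data, mean(data))      # (3+2+1+0+6) / 5
md_about_median = mean_deviation(data, median(data))  # (2+1+0+1+7) / 5
print(md_about_mean, md_about_median)

# Grouped example: values 2, 4, 6 with frequencies 1, 2, 1 (mean 4).
print(mean_deviation_grouped([2, 4, 6], [1, 2, 1], 4))
```

For this sample the mean deviation about the median (2.2) is smaller than about the mean (2.4).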

4. Variance ($\sigma^2$)

  • Ungrouped data (Basic definition): $\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$.
  • Ungrouped data (Calculation shortcut): $\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - (\bar{x})^2$. → [JEE TIP] Always use this form in JEE calculations to save time.
  • Discrete/Continuous Grouped data: $\sigma^2 = \frac{1}{N} \sum_{i=1}^{n} f_i (x_i - \bar{x})^2$.
  • Grouped data (Calculation shortcut): $\sigma^2 = \frac{1}{N^2} \left[ N \sum f_i x_i^2 - \left(\sum f_i x_i\right)^2 \right]$.
  • Step-deviation method: $\sigma_x^2 = h^2 \sigma_y^2 = \frac{h^2}{N^2} \left[ N \sum f_i y_i^2 - \left(\sum f_i y_i\right)^2 \right]$.
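
The equivalence of these forms can be verified numerically. The sketch below uses exact rational arithmetic (`fractions.Fraction`) so the comparison is not clouded by rounding; all function names and data are hypothetical, and integer inputs are assumed.

```python
from fractions import Fraction

def variance_definition(xs):
    """sigma^2 = (1/n) * sum (x_i - mean)^2."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / n

def variance_shortcut(xs):
    """sigma^2 = (1/n) * sum x_i^2 - mean^2."""
    n = len(xs)
    return Fraction(sum(x * x for x in xs), n) - Fraction(sum(xs), n) ** 2

def grouped_variance_shortcut(xs, fs):
    """sigma^2 = [N * sum f x^2 - (sum f x)^2] / N^2."""
    N = sum(fs)
    s1 = sum(f * x for x, f in zip(xs, fs))
    s2 = sum(f * x * x for x, f in zip(xs, fs))
    return Fraction(N * s2 - s1 ** 2, N ** 2)

def grouped_variance_step_deviation(xs, fs, A, h):
    """sigma_x^2 = h^2 * sigma_y^2 with y_i = (x_i - A) / h."""
    ys = [Fraction(x - A, h) for x in xs]
    return h ** 2 * grouped_variance_shortcut(ys, fs)

data = [1, 2, 3, 4, 10]          # hypothetical ungrouped sample
xs, fs = [2, 4, 6], [1, 2, 1]    # hypothetical frequency distribution

print(variance_definition(data), variance_shortcut(data))   # both 10
print(grouped_variance_shortcut(xs, fs))                    # 2
print(grouped_variance_step_deviation(xs, fs, A=4, h=2))    # 2
```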

5. Standard Deviation ($\sigma$)

  • It is simply the positive square root of variance: $\sigma = \sqrt{\sigma^2}$.

6. Combined Mean and Combined Variance (JEE Advanced Focus) If two groups of sizes $n_1$ and $n_2$ have means $\bar{x}_1, \bar{x}_2$ and variances $\sigma_1^2, \sigma_2^2$:

  • Combined Mean: $\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$
  • Combined Variance: $\sigma_c^2 = \frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}$, where $d_1 = \bar{x}_1 - \bar{x}_c$ and $d_2 = \bar{x}_2 - \bar{x}_c$.
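
One way to build confidence in the combined-variance formula is to compare it with the variance of the pooled raw data. The sketch below does that for two invented groups; `combined_mean_variance` is a hypothetical helper and `statistics.pvariance` is the population variance.

```python
from statistics import mean, pvariance

def combined_mean_variance(n1, m1, v1, n2, m2, v2):
    """Combine two groups given their sizes, means and population variances."""
    mc = (n1 * m1 + n2 * m2) / (n1 + n2)
    d1, d2 = m1 - mc, m2 - mc
    vc = (n1 * (v1 + d1 ** 2) + n2 * (v2 + d2 ** 2)) / (n1 + n2)
    return mc, vc

group_1 = [1, 2, 3]       # hypothetical data: mean 2, variance 2/3
group_2 = [4, 6, 8, 10]   # hypothetical data: mean 7, variance 5

mc, vc = combined_mean_variance(
    len(group_1), mean(group_1), pvariance(group_1),
    len(group_2), mean(group_2), pvariance(group_2),
)
pooled = group_1 + group_2
print(mc, mean(pooled))        # both 34/7
print(vc, pvariance(pooled))   # agree up to floating-point rounding

# Weighted average of the variances alone, which ignores d1 and d2:
naive = (len(group_1) * pvariance(group_1) + len(group_2) * pvariance(group_2)) / len(pooled)
print(naive)                   # noticeably smaller than vc
```

The gap between `naive` and `vc` is exactly the contribution of the $d_1^2$ and $d_2^2$ terms.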

Conditions & Limitations

  • Mean Deviation: Cannot be subjected to further algebraic/calculus treatment because it involves absolute values (modulus function is not differentiable everywhere).
  • Mean Deviation about Median: If the degree of variability in a series is very high, the median is not a reliable representative central tendency. Thus, M.D. about median cannot be fully relied upon in highly dispersed sets.
  • Sum of Deviations: The sum of the signed deviations from the mean (taken without absolute values) is always zero ($\sum (x_i - \bar{x}) = 0$). Hence, the simple average of deviations is useless as a measure of dispersion.
  • Continuous Distributions: The formulas assume that the frequency in each class is entirely centered at its mid-point ($x_i$). This is an approximation.

⚠️ COMMON MISCONCEPTIONS & SIGN CONVENTIONS

  • Trap of Scale vs. Origin:
    • Adding or subtracting a constant $a$ from every observation (change of origin) changes the Mean ($\bar{y} = \bar{x} \pm a$) but DOES NOT change Variance or Standard Deviation ($\sigma_y = \sigma_x$).
    • Multiplying or dividing every observation by a constant $k$ (change of scale) changes the Mean ($\bar{y} = k\bar{x}$), changes Variance by $k^2$ ($\sigma_y^2 = k^2 \sigma_x^2$), and changes Standard Deviation by $|k|$ ($\sigma_y = |k|\sigma_x$).
    • Linear transformation $y_i = ax_i + b$: $\bar{y} = a\bar{x} + b$, and $Var(y) = a^2 Var(x)$. → [JEE TIP] Note that $b$ vanishes completely for variance.
  • Sum of Squares Trap: The term $\sum x_i^2$ is the sum of the squares of the observations. This is strictly distinct from $(\sum x_i)^2$, which is the square of the sum. Do not confuse them in the variance formula $\frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2$.
  • Modulus in Mean Deviation: Many students forget to drop the negative signs when calculating Mean Deviation. M.D. demands absolute values $|x_i - \bar{x}|$, not brackets.
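
The origin and scale rules above are easy to confirm numerically. This is a small sketch with invented data and an arbitrary transformation $y_i = -3x_i + 7$; `pvariance` and `pstdev` from the standard library are the population versions.

```python
from statistics import mean, pstdev, pvariance

x = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data: mean 5, variance 4, SD 2
a, b = -3, 7                   # arbitrary scale and shift for illustration
y = [a * xi + b for xi in x]

print(mean(y), a * mean(x) + b)            # mean follows y_bar = a * x_bar + b
print(pvariance(y), a ** 2 * pvariance(x)) # variance scales by a^2, b drops out
print(pstdev(y), abs(a) * pstdev(x))       # SD scales by |a|, never negative

shifted = [xi + 100 for xi in x]           # pure change of origin
print(pvariance(shifted), pvariance(x))    # unchanged
```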

Previous Year JEE Topics

  1. Correcting Incorrect Observations: Given an incorrect mean and variance due to misread data points, calculating the exact correct mean and variance.
  2. Linear Transformations: If a dataset undergoes operations like $x_{new} = \alpha x_{old} + \beta$, computing the new variance/SD.
  3. Variance of Natural Numbers: Specifically the variance of the first $n$ natural numbers, arithmetic progressions, or geometric progressions.
  4. Combined Variance: Questions providing data for boys vs. girls in a class and asking for the aggregate variance.

Standard Derivations & Step-by-Step Problem Solving

Correcting Incorrect Observations (Frequent JEE Profile) Scenario: A student calculates the mean and variance of $n$ observations but later discovers that one observation $x_{wrong}$ was recorded incorrectly and should be $x_{correct}$.

  • Step 1: Calculate Incorrect Sum of observations: $\sum x_{inc} = n \times \bar{x}_{inc}$
  • Step 2: Calculate Correct Sum: $\sum x_{cor} = \sum x_{inc} - x_{wrong} + x_{correct}$
  • Step 3: Calculate Correct Mean: $\bar{x}_{cor} = \frac{\sum x_{cor}}{n}$
  • Step 4: Use the variance formula to find Incorrect Sum of Squares: $\sigma_{inc}^2 = \frac{\sum x_{inc}^2}{n} - (\bar{x}_{inc})^2 \implies \sum x_{inc}^2 = n(\sigma_{inc}^2 + \bar{x}_{inc}^2)$
  • Step 5: Calculate Correct Sum of Squares: $\sum x_{cor}^2 = \sum x_{inc}^2 - (x_{wrong})^2 + (x_{correct})^2$
  • Step 6: Calculate Correct Variance: $\sigma_{cor}^2 = \frac{\sum x_{cor}^2}{n} - (\bar{x}_{cor})^2$
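
The six steps translate almost line for line into code. The sketch below is illustrative only: the function name `correct_mean_variance` and the numbers are hypothetical, built so the answer can be checked against the true data.

```python
from statistics import mean, pvariance

def correct_mean_variance(n, mean_inc, var_inc, x_wrong, x_correct):
    """Recover the correct mean and variance after one misread observation."""
    sum_inc = n * mean_inc                                 # Step 1
    sum_cor = sum_inc - x_wrong + x_correct                # Step 2
    mean_cor = sum_cor / n                                 # Step 3
    sum_sq_inc = n * (var_inc + mean_inc ** 2)             # Step 4
    sum_sq_cor = sum_sq_inc - x_wrong ** 2 + x_correct ** 2  # Step 5
    var_cor = sum_sq_cor / n - mean_cor ** 2               # Step 6
    return mean_cor, var_cor

# Hypothetical case: the true data are [2, 4, 6, 8, 10], but 10 was misread as 5.
true_data = [2, 4, 6, 8, 10]
misread = [2, 4, 6, 8, 5]      # incorrect mean 5, incorrect variance 4

result = correct_mean_variance(
    n=5, mean_inc=mean(misread), var_inc=pvariance(misread),
    x_wrong=5, x_correct=10,
)
print(result)                                  # corrected (mean, variance)
print(mean(true_data), pvariance(true_data))   # 6 and 8 from the raw data
```

Note that the routine never needs the individual observations, only $n$, the incorrect mean and variance, and the two values of the misread entry, which is exactly the information a JEE question provides.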

Variance of First $n$ Natural Numbers Let $x_i = i$ for $i = 1, 2, \ldots, n$.

  • Mean: $\bar{x} = \frac{1}{n} \sum i = \frac{n(n+1)}{2n} = \frac{n+1}{2}$.
  • Sum of squares: $\sum x_i^2 = \frac{n(n+1)(2n+1)}{6}$.
  • Variance: $\sigma^2 = \frac{\sum x_i^2}{n} - (\bar{x})^2 = \frac{(n+1)(2n+1)}{6} - \left(\frac{n+1}{2}\right)^2 = \frac{(n+1)\left[2(2n+1) - 3(n+1)\right]}{12} = \frac{(n+1)(n-1)}{12} = \frac{n^2 - 1}{12}$.

→ [JEE TIP] Memorize $\sigma^2 = \frac{n^2 - 1}{12}$ for first $n$ natural numbers.
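
The closed form $\frac{n^2 - 1}{12}$ can be spot-checked with exact arithmetic. This is a brief sketch; `variance_first_n` is a hypothetical helper that applies the shortcut formula directly to $1, 2, \ldots, n$.

```python
from fractions import Fraction

def variance_first_n(n):
    """Population variance of 1, 2, ..., n via (sum x^2)/n - (mean)^2."""
    xs = range(1, n + 1)
    return Fraction(sum(x * x for x in xs), n) - Fraction(sum(xs), n) ** 2

for n in (1, 2, 5, 10):
    # Left: computed from the data. Right: the closed form (n^2 - 1) / 12.
    print(n, variance_first_n(n), Fraction(n * n - 1, 12))
```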

Memory Aids & JEE Traps

  • → [JEE TIP] Trap 1 - "Variance of a constant sequence": If all observations are equal to $k$, then variance is strictly $0$ (since $x_i - \bar{x} = 0$).
  • → [JEE TIP] Trap 2 - "Addition vs Multiplication effect on Variance": If $y_i = \frac{x_i - A}{h}$, do not forget that $Var(x) = h^2 Var(y)$. The constant $A$ vanishes, but the scaling factor $h$ is squared.
  • → [JEE TIP] Trap 3 - "Minimum sum of squared deviations": The sum of squared deviations $\sum (x_i - a)^2$ is mathematically minimized when $a$ is the Mean ($\bar{x}$). If asked to minimize this sum, set $a = \bar{x}$.
  • → [JEE TIP] Trap 4 - "Minimum sum of absolute deviations": The sum of absolute deviations $\sum |x_i - a|$ is mathematically minimized when $a$ is the Median.
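
Traps 3 and 4 can be illustrated with a brute-force scan over candidate values of $a$. This is only a numerical sketch on invented data (mean 4, median 3), not a proof; the grid of candidates from 0 to 10 in steps of 0.5 is an arbitrary choice.

```python
from statistics import mean, median

data = [1, 2, 3, 4, 10]                    # hypothetical sample
candidates = [k / 2 for k in range(21)]    # 0.0, 0.5, ..., 10.0

def sum_squared(a):
    return sum((x - a) ** 2 for x in data)

def sum_absolute(a):
    return sum(abs(x - a) for x in data)

best_for_squares = min(candidates, key=sum_squared)
best_for_absolute = min(candidates, key=sum_absolute)

print(best_for_squares, mean(data))      # minimizer of squared deviations vs mean
print(best_for_absolute, median(data))   # minimizer of absolute deviations vs median
```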

Top 10 MCQ Traps

  • [JEE TIP] Trap 1 - The Origin Shift Variance Inertia:

    • Misconception: Adding or subtracting a uniform constant from every data point in a distribution shifts and scales its variance.
    • Correct Understanding: Measures of dispersion (Variance, Standard Deviation, and Range) are completely independent of a change of origin. Adding a constant shifts the entire distribution uniformly without altering the spread between points, meaning the variance remains absolutely unchanged. Only a change of scale (multiplication/division) affects dispersion.
  • [JEE TIP] Trap 2 - The Zero Mean Deviation Nullification:

    • Misconception: Calculating the Mean Deviation (M.D.) involves evaluating the simple arithmetic average of standard data deviations: $\frac{\sum(x_i - \bar{x})}{n}$.
    • Correct Understanding: Mean Deviation strictly requires absolute values, denoted as $\frac{\sum|x_i - \bar{x}|}{n}$. If you omit the modulus brackets, the positive and negative deviations around the arithmetic mean will perfectly balance out, causing the final summation to default to exactly zero every single time.
  • [JEE TIP] Trap 3 - Negative Standard Deviation Scaling:

    • Misconception: Multiplying every data point in a set by a negative scaling factor like $-3$ scales the Standard Deviation ($\sigma$) by that same factor of $-3$.
    • Correct Understanding: Standard Deviation is the positive square root of the variance, so it is always a non-negative real value. Scaling data by a factor of $k$ multiplies the standard deviation by the absolute value of $k$: $\sigma_{\text{new}} = |k|\sigma_{\text{old}}$. Multiplying by $-3$ therefore multiplies the standard deviation by $|-3| = 3$.
  • [JEE TIP] Trap 4 - The Combined Variance Mean Shift Deficit:

    • Misconception: When merging two distinct datasets together, the combined variance of the total pool is simply the weighted average of their individual variances.
    • Correct Understanding: The weighted average calculation completely misses the dispersion caused by the distance between the two distinct group means. The correct combined variance formula must explicitly incorporate mean displacement factors ($d_1$ and $d_2$): $\sigma_c^2 = \frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}$, where $d_1 = \bar{x}_1 - \bar{x}_c$ and $d_2 = \bar{x}_2 - \bar{x}_c$.
  • [JEE TIP] Trap 5 - Absolute vs. Relative Dispersion Comparisons:

    • Misconception: A dataset possessing a larger absolute Standard Deviation is automatically more variable or less stable than another dataset.
    • Correct Understanding: You cannot directly compare the standard deviations of datasets that have radically different units or completely different means. To perform a valid relative variability comparison, you must evaluate the Coefficient of Variation (C.V.), which normalizes the metric: $\text{C.V.} = \frac{\sigma}{\bar{x}} \times 100$. The dataset with the higher C.V. percentage is the one that is truly more variable.
  • [JEE TIP] Trap 6 - Variance Dimensional Unit Inflation:

    • Misconception: Variance is measured and expressed in the exact same dimensional units (e.g., kg, meters, seconds) as the raw observations.
    • Correct Understanding: Because the variance formula squares the distance values, its output is expressed in squared units (e.g., $\text{kg}^2$, $\text{m}^2$). Only the Standard Deviation and the Mean Deviation, which apply a square root or handle raw linear distances, return the final value to the original dimensional units of the data.
  • [JEE TIP] Trap 7 - The Sum-Square Operator Algebraic Swap:

    • Misconception: The notation representing the sum of squares ($\sum x_i^2$) and the notation for the square of the sum ($(\sum x_i)^2$) are algebraically interchangeable.
    • Correct Understanding: These are fundamentally distinct operations. $\sum x_i^2$ squares each element individually before adding them, whereas $(\sum x_i)^2$ sums all elements first before squaring the total. Mixing them up will completely corrupt the standard variance computational formula: $\sigma^2 = \frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2$.
  • [JEE TIP] Trap 8 - The Median Class Midpoint Fallacy:

    • Misconception: In a continuous frequency distribution table, the median value can be estimated by choosing the exact midpoint of the designated median class.
    • Correct Understanding: The midpoint of a class is used for evaluating the mean of grouped data, not the median. Finding the median of a continuous frequency distribution requires performing linear interpolation across the cumulative frequency distribution curve using the precise formula: $\text{Median} = l + \left(\frac{\frac{N}{2} - C}{f}\right) \times h$.
  • [JEE TIP] Trap 9 - The Sum of Signed Deviations Illusion:

    • Misconception: Summing up the raw differences between each individual observation and the mean ($\sum(x_i - \bar{x})$) serves as a valid approach to measure the total dispersion of a dataset.
    • Correct Understanding: This sum provides no information about dispersion because the algebraic sum of deviations from the arithmetic mean is universally equal to zero. The negative values below the mean always cancel out the positive values above it. To measure dispersion, you must use absolute values or square the differences.
  • [JEE TIP] Trap 10 - Linear Variance Scaling Delusion:

    • Misconception: If every observation in a dataset is divided by a factor of $2$, the original variance of the system drops linearly by half (e.g., a variance of $5$ becomes $2.5$).
    • Correct Understanding: Variance scales quadratically with respect to multiplication or division. If every observation is scaled by a factor of $\frac{1}{k}$, the variance scales by a factor of $\frac{1}{k^2}$. Therefore, dividing every data point by $2$ alters the variance by a factor of $\frac{1}{2^2} = \frac{1}{4}$, transforming an initial variance of $5$ into exactly $\frac{5}{4} = 1.25$.