Sampling Techniques, Distribution, CLT, Hypothesis Testing Basics, Z-Test, T-Test, ANOVA, Chi-Square, Regression Analysis, etc. (Quantitative Reasoning Course for BS/B.Ed Hons Level)
Below is a rewritten, student-friendly overview that emphasises how each topic builds on the previous one, with clear illustrations, formulas, decision rules, and real-life examples suitable for undergraduate honours students.
1. Sampling Techniques & Sampling Distribution
🪚 Sampling Techniques: Sampling is the process of selecting a subset of individuals from a larger population to make statistical inferences. The goal is to obtain a representative sample.
Simple Random Sampling: Every member of the population has an equal chance of being selected.
Illustration: Drawing names out of a hat or using a random number generator to select 100 employees from a list of 5,000.
Stratified Sampling: The population is divided into homogeneous subgroups (strata), and samples are drawn randomly from each stratum. This ensures key subgroups are represented proportionally.
Illustration: A university divides its student body into four strata (Freshman, Sophomore, Junior, Senior) and then randomly selects 25 students from each stratum to survey.
| Technique | How it works | When to use | Pros & Cons |
|---|---|---|---|
| Simple Random | Every unit has an equal chance (lottery) | Homogeneous population | Pros: Unbiased Cons: May miss subgroups |
| Stratified | Divide into strata → random within strata | Heterogeneous population (gender, region) | Pros: Guarantees representation Cons: Needs prior info |
| Cluster | Randomly select clusters (schools) → all units inside | Large, spread-out population | Pros: Cheap Cons: Higher sampling error |
| Systematic | Every k-th unit (k = N/n) | Ordered list available | Pros: Easy Cons: Periodic patterns cause bias |
| Convenience | Whoever is easiest | Pilot studies only | Highly biased – avoid for inference |
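The first three techniques in the table can be sketched with Python's standard `random` module. The population, strata, and sample sizes below are hypothetical, chosen to mirror the illustrations above:

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# Hypothetical population: 5,000 employee IDs
population = list(range(1, 5001))

# Simple random sampling: every unit has an equal chance of selection
simple_sample = random.sample(population, 100)

# Stratified sampling: 25 units drawn randomly from each of four strata
strata = {
    "Freshman": list(range(1, 1251)),
    "Sophomore": list(range(1251, 2501)),
    "Junior": list(range(2501, 3751)),
    "Senior": list(range(3751, 5001)),
}
stratified_sample = {name: random.sample(ids, 25) for name, ids in strata.items()}

# Systematic sampling: every k-th unit, with k = N / n and a random start
k = len(population) // 100
start = random.randrange(k)
systematic_sample = population[start::k]
```

Note how stratified sampling guarantees exactly 25 units per stratum, which simple random sampling cannot promise.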
📊 Sampling Distribution
The sampling distribution of a statistic (like the mean) is the probability distribution of that statistic calculated from all possible samples of a given size taken from the same population.
Illustration: Imagine a population of 1 million adults with an average height (μ) of 170 cm.
- You take a random sample of n = 50 adults and calculate the mean height (X̄₁). Say it's 171 cm.
- You take a second sample of n = 50 and get X̄₂ = 169.5 cm.
- You repeat this process thousands of times.
- Plotting all these sample means (X̄₁, X̄₂, X̄₃, ...) creates the sampling distribution of the mean.
This new distribution of means will be less spread out than the original population distribution of individual heights.
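The repeated-sampling process above can be simulated directly. This is a minimal sketch with a hypothetical population of simulated heights; it shows that the sample means spread out far less than the individual heights do:

```python
import random
import statistics

random.seed(0)
# Hypothetical population: 10,000 adult heights, mean ≈ 170 cm, sd ≈ 10 cm
population = [random.gauss(170, 10) for _ in range(10_000)]

# Draw 2,000 samples of n = 50 and record each sample mean (X̄₁, X̄₂, ...)
sample_means = [
    statistics.mean(random.sample(population, 50)) for _ in range(2_000)
]

pop_sd = statistics.pstdev(population)     # spread of individual heights
means_sd = statistics.stdev(sample_means)  # spread of sample means ≈ σ/√50
```

Here `means_sd` comes out close to `pop_sd / √50`, roughly a seventh of the population spread.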
💡 Central Limit Theorem (CLT)
The CLT is arguably the most important theorem in statistics, as it justifies the use of parametric tests in inferential statistics.
Definition: The CLT states that if you take a sufficiently large sample size (typically n > 30) from any population (regardless of the original population's shape), the sampling distribution of the mean will be approximately normally distributed.
Illustration:
Scenario: A population's income is heavily right-skewed (a few people earn extremely high amounts).
Applying CLT: If you repeatedly take large samples (e.g., n=100) and plot the mean income of each sample, the resulting histogram of sample means will look like a bell curve (normal distribution).
Significance: It allows statisticians to use the properties of the normal distribution (like Z-scores) to calculate probabilities and perform hypothesis tests, even if the original population data isn't normally distributed.
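The income scenario can be checked by simulation. The sketch below uses a hypothetical right-skewed (exponential) "income" population; per the CLT, the means of repeated n = 100 samples cluster symmetrically around the population mean with spread close to σ/√n:

```python
import math
import random
import statistics

random.seed(1)
# Heavily right-skewed "income" population (exponential, mean ≈ 50)
population = [random.expovariate(1 / 50) for _ in range(20_000)]

# Repeatedly take large samples (n = 100) and record each mean income
sample_means = [
    statistics.mean(random.sample(population, 100)) for _ in range(1_000)
]

# CLT prediction: the means are approximately normal around the population
# mean, with standard error ≈ σ / √n, despite the skewed population
predicted_se = statistics.pstdev(population) / math.sqrt(100)
```

Plotting a histogram of `sample_means` (e.g. with matplotlib) would show the bell shape described above.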
🧪 Hypothesis Testing Basics
Hypothesis testing is a formal procedure for determining whether an observed effect or relationship in a sample is statistically significant or could have occurred by chance.
| Component | Description | Illustration |
|---|---|---|
| Null Hypothesis (H0) | A statement of no effect or no difference. The assumption we initially hold and try to reject. | H0: The new medicine has no effect on patient recovery time (mean recovery time μ = 10 days). |
| Alternative Hypothesis (Ha) | A statement that contradicts H0. It represents the research claim we seek evidence for. | Ha: The new medicine reduces recovery time (μ < 10 days). |
| P-value | The probability of observing a sample statistic as extreme as (or more extreme than) the one obtained, assuming the Null Hypothesis is true. | If p = 0.02 and the significance level (α) is 0.05, there is only a 2% chance of seeing a sample mean as low as the observed 7-day recovery if the medicine actually has no effect. Since 0.02 < 0.05, we reject H0. |
| Significance Level (α) | The threshold for rejecting H0, commonly set at 0.05 (5%). This is the maximum acceptable risk of a Type I Error (rejecting a true H0). | α = 0.05 means we accept a 5% risk of falsely concluding the medicine works when it doesn't. |
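The decision rule in the table can be worked through numerically. This is a left-tailed z-test sketch with hypothetical numbers (σ assumed known for simplicity); the normal left-tail area Φ(z) is computed via the error function:

```python
import math

# Hypothetical numbers: H0 says mean recovery time μ = 10 days
mu0, sigma, n = 10, 3.0, 36   # σ assumed known for this sketch
sample_mean = 8.8             # observed mean recovery under the new medicine

# Left-tailed z-test, since Ha says μ < 10
z = (sample_mean - mu0) / (sigma / math.sqrt(n))
p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # Φ(z): left-tail area

alpha = 0.05
reject_h0 = p_value < alpha   # reject H0 when p < α
```

With these numbers z = −2.4, giving p ≈ 0.008 < 0.05, so H0 is rejected.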
🎯 Z-Test and T-Test Applications
Both are used to test hypotheses about population means, but their use depends on the sample size (n) and knowledge of the population standard deviation (σ).
| Test | Used When... | Test Statistic Follows... | Application Example |
|---|---|---|---|
| Z-Test | Population σ is known, or the sample size n is large (n > 30). | Z-distribution (Standard Normal) | Comparing a sample of 1,000 university student test scores to the national average, where the national standard deviation is known. |
| T-Test | Population σ is unknown, especially when the sample size n is small (n < 30). | t-distribution (has "fatter" tails to account for the extra uncertainty) | Comparing the average productivity of 15 employees before and after a new training programme (small sample, σ unknown). |
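The contrast between the two statistics can be shown side by side. The before/after productivity differences below are hypothetical; the t statistic estimates the spread from the sample, while the z statistic would require a known population σ (a value of 2 is assumed here purely for illustration):

```python
import math
import statistics

# Hypothetical before/after productivity differences for 15 employees
diffs = [2, 3, -1, 4, 0, 5, 1, 2, 3, -2, 4, 1, 0, 2, 3]
n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)   # sample sd: σ is unknown, so t applies

# t statistic for H0: mean difference = 0, with df = n - 1 = 14
t_stat = mean_d / (sd_d / math.sqrt(n))

# A z statistic would instead plug in a *known* population σ (assumed = 2)
sigma_known = 2.0
z_stat = mean_d / (sigma_known / math.sqrt(n))
```

Because the t-distribution has fatter tails, the same statistic needs to be larger to reach significance than under the normal distribution.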
⚖️ Chi-Square Test and ANOVA
Χ² Chi-Square Test
The Chi-Square test is used for categorical data (counts/frequencies) to examine relationships or compare observed frequencies to expected frequencies.
Applications:
Test of Independence: Checks if there is a relationship between two categorical variables.
Illustration: Is there an association between Gender (Male/Female) and Voting Preference (Party A/Party B)?
Goodness-of-Fit: Checks if a single categorical variable's distribution matches a claimed or theoretical distribution.
Illustration: Does the observed number of cars sold by colour (Red, Blue, White) match the manufacturer's expected proportion (e.g., 50% White, 30% Blue, 20% Red)?
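The goodness-of-fit illustration can be computed by hand using the χ² formula Σ(O − E)²/E. The car-colour counts below are hypothetical; with 3 categories there are df = 2, and 5.991 is the χ² critical value at α = 0.05:

```python
# Hypothetical observed car-colour counts (200 cars in total)
observed = {"White": 95, "Blue": 65, "Red": 40}
expected_props = {"White": 0.50, "Blue": 0.30, "Red": 0.20}

total = sum(observed.values())

# χ² = Σ (Observed − Expected)² / Expected
chi_sq = sum(
    (observed[c] - expected_props[c] * total) ** 2 / (expected_props[c] * total)
    for c in observed
)

# df = categories − 1 = 2; χ² critical value at α = 0.05 is 5.991
reject_h0 = chi_sq > 5.991
```

Here χ² ≈ 0.67 < 5.991, so the observed colours are consistent with the manufacturer's claimed proportions.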
ANOVA (Analysis of Variance)
ANOVA is used to test for a statistically significant difference between the means of three or more independent groups using a continuous dependent variable.
Application: Comparing the mean crop yield (continuous variable) resulting from three different fertiliser types (categorical groups).
Mechanism: ANOVA compares the variance between the groups (due to the fertiliser) to the variance within the groups (due to random chance). The resulting F-ratio determines if the group means are statistically different from each other.
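The between-versus-within comparison can be computed from first principles. The crop yields below are hypothetical; the F-ratio divides the between-group mean square by the within-group mean square:

```python
import statistics

# Hypothetical crop yields under three fertiliser types
groups = {
    "A": [20, 22, 19, 24, 25],
    "B": [28, 30, 27, 26, 29],
    "C": [18, 20, 22, 19, 21],
}

k = len(groups)
n_total = sum(len(g) for g in groups.values())
grand_mean = statistics.mean(x for g in groups.values() for x in g)

# Variation between group means (due to the fertiliser)
ss_between = sum(
    len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups.values()
)
# Variation within each group (due to random chance)
ss_within = sum(
    (x - statistics.mean(g)) ** 2 for g in groups.values() for x in g
)

# F = between-group mean square / within-group mean square
f_ratio = (ss_between / (k - 1)) / (ss_within / (n_total - k))
```

A large F-ratio (here well above 1) signals that the group means differ by more than random chance would explain; the exact cutoff comes from the F-distribution with (k − 1, n − k) degrees of freedom.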
📈 Correlation and Regression
Correlation
Correlation measures the strength and direction of the linear relationship between two continuous variables. It is summarised by the Correlation Coefficient (r), which ranges from -1.0 to +1.0.
Illustration:
r = +0.9: A strong positive correlation. As hours studied increases, exam score consistently increases.
r = -0.6: A moderate negative correlation. As daily temperature rises, hot coffee sales tend to decrease.
r = 0.0: No linear correlation. The variables are unrelated.
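The coefficient r can be computed directly from its definition, r = Sxy / √(Sxx · Syy). The hours-studied and exam-score data below are hypothetical:

```python
import math
import statistics

# Hypothetical data: hours studied vs exam score
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 60, 70, 72, 78, 85]

mean_x, mean_y = statistics.mean(hours), statistics.mean(scores)

# r = Sxy / sqrt(Sxx * Syy)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
r = s_xy / math.sqrt(
    sum((x - mean_x) ** 2 for x in hours)
    * sum((y - mean_y) ** 2 for y in scores)
)
```

For this data r comes out close to +1, matching the "strong positive correlation" case above.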
Regression
Regression (most commonly Simple Linear Regression) extends correlation by allowing you to predict the value of a dependent variable (Y) based on an independent variable (X). It finds the "best-fit" straight line (the regression line) through the data points.
The Model: Y = b0 + b1X + ε
Y: Dependent/Outcome Variable (e.g., predicted Sales)
X: Independent/Predictor Variable (e.g., Advertising Spend)
b1: The slope (rate of change in Y for a unit change in X)
b0: The Y-intercept
Illustration: A company uses advertising spend (X) to predict sales (Y). Regression analysis can tell them: "For every PKR 1,000 increase in advertising, we predict an average increase of PKR 150 units in sales."
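The best-fit line can be estimated by least squares: b1 = Sxy / Sxx and b0 = ȳ − b1·x̄. The advertising and sales figures below are hypothetical:

```python
import statistics

# Hypothetical data: advertising spend (PKR '000) vs units sold
spend = [10, 20, 30, 40, 50, 60]
sales = [1_700, 3_100, 4_600, 6_200, 7_400, 9_100]

mean_x, mean_y = statistics.mean(spend), statistics.mean(sales)

# Least-squares estimates: b1 = Sxy / Sxx, b0 = ȳ − b1·x̄
b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(spend, sales)) / sum(
    (x - mean_x) ** 2 for x in spend
)
b0 = mean_y - b1 * mean_x


def predict(x):
    """Predicted sales for a given advertising spend: Ŷ = b0 + b1·X."""
    return b0 + b1 * x
```

The slope b1 is the "per unit of X" figure quoted in the illustration: each extra PKR 1,000 of advertising predicts roughly b1 more units sold.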
✒️ BY: Raja Bahar Khan Soomro
