Lectern

Sampling

Last updated Nov 20, 2022

Refers to the process of extracting a subset of a population. Done either due to constraints around:

Statistics are often used to infer population parameters.

# Terminologies

| Term | Definition |
| --- | --- |
| Population | An entire group of entities from which data can be collected |
| Sample | A subset of a population; a smaller group of entities selected from the population |
| Parameter | A numerical measurement collected from a population |
| Statistic | A numerical measurement computed from a sample |

# Notations

| Representation | Parameter symbol (population) | Statistic symbol (sample) |
| --- | --- | --- |
| Size | $N$ | $n$ |
| Mean | $\mu$ | $\bar X$ |
| Standard deviation | $\sigma$ | $s$ |
| Variance | $\sigma^2$ | $s^2$ |
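
To make the distinction between the two columns concrete, here is a minimal sketch using Python's standard library. The population (the integers 1 to 1000) and the sample size of 50 are made up purely for illustration; the variable names simply mirror the symbols in the table.

```python
import random
import statistics

# Hypothetical population: the integers 1..1000 (illustrative data only).
population = list(range(1, 1001))

# Parameters: computed from every entity in the population.
N = len(population)
mu = statistics.fmean(population)            # population mean
sigma = statistics.pstdev(population)        # population standard deviation (divides by N)
sigma_sq = statistics.pvariance(population)  # population variance

# Statistics: computed from a sample only.
rng = random.Random(0)                       # fixed seed so the run is repeatable
sample = rng.sample(population, 50)
n = len(sample)
x_bar = statistics.fmean(sample)             # sample mean
s = statistics.stdev(sample)                 # sample standard deviation (divides by n - 1)
s_sq = statistics.variance(sample)           # sample variance

print(f"N={N}, mu={mu}, sigma={sigma:.2f}")
print(f"n={n}, x_bar={x_bar:.2f}, s={s:.2f}")
```

Note that `pstdev`/`pvariance` treat the data as a whole population, while `stdev`/`variance` treat it as a sample.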

# Sampling methods

# Random sampling

A technique whereby a sample is selected from a population entirely by chance (randomly). Each entity in the population has a known probability of being selected. Reduces the possibility of bias in sampling.

There are three kinds of random sampling:

# Simple random sampling

A random sampling method that ensures that each entity in the population has an equal chance of being included in the sample.
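
As a rough sketch of the idea, assuming the population fits in a list, Python's `random.sample` draws entities without replacement so that each one is equally likely to be included. The helper name `simple_random_sample` and the population of 100 IDs are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct entities, each equally likely to be chosen."""
    rng = random.Random(seed)         # optional seed makes the draw repeatable
    return rng.sample(population, n)  # sampling without replacement

# Hypothetical population of entity IDs 0..99.
population = list(range(100))
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```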

# Stratified random sampling

A random sampling method that selects a sample from different groups in the population, ensuring that a particular group in the population won’t be missed out.
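
One possible way to sketch this is to split the population into its groups (strata) and draw a simple random sample inside each one. The allocation rule below (the same fraction from every stratum, with at least one entity per stratum) is an assumption made for illustration, as are the helper name `stratified_sample` and the example groups "A", "B" and "C".

```python
import random
from collections import defaultdict

def stratified_sample(entities, stratum_of, fraction, seed=None):
    """Draw roughly `fraction` of each stratum, and at least one entity from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for entity in entities:
        strata[stratum_of(entity)].append(entity)   # group entities by stratum

    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # no stratum is missed out
        sample.extend(rng.sample(members, k))       # simple random sample within the stratum
    return sample

# Hypothetical population of (id, group) pairs with very uneven group sizes.
people = (
    [(i, "A") for i in range(80)]
    + [(i, "B") for i in range(80, 95)]
    + [(i, "C") for i in range(95, 100)]
)
picked = stratified_sample(people, stratum_of=lambda p: p[1], fraction=0.1, seed=1)
print(sorted({group for _, group in picked}))
```

Even the small group "C" is represented, which a simple random sample of the same size would not guarantee.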

# Systematic random sampling

A random sampling method in which a starting point is chosen and every $k$-th entity in the population after it is selected. Is easy to implement and reasonably efficient, but bias may exist if there is a certain pattern in the population list.
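
A minimal sketch, assuming the population is an ordered list and that the starting point is chosen at random among the first $k$ entities (a common convention, not something stated above). The helper name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k entities, then every k-th entity after it."""
    rng = random.Random(seed)
    start = rng.randrange(k)     # random starting index in 0..k-1
    return population[start::k]  # slice with step k

# Hypothetical ordered population list of entity IDs 0..99.
population = list(range(100))
sample = systematic_sample(population, k=10, seed=7)
print(sample)
```

If the list itself repeats with a period related to $k$ (for example, every tenth entry is a manager), the sample inherits that pattern, which is the bias mentioned above.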

# Quota sampling

A technique commonly used in marketing research where interviewers are given a quota of interviewees of a certain type to interview. Not random in nature as not every entity in the population has a chance to be selected.

There may also be additional bias, as interviewers may favour approachable and helpful interviewees over interviewees from diverse backgrounds.

# Sampling distribution

Refers to the probability distribution of a statistic. More of a theoretical concept than one observed from experiment. As statistics are random variables, each statistic follows a particular distribution.

# Sampling distribution of the sample mean

Refers to the probability distribution of all possible values the sample mean can take when a sample (of size $n$) is taken from a particular population. Is a continuous probability distribution by nature.

An important statistic is the sample's mean ($\bar X$), meaning that we often concern ourselves with the sampling distribution of the sample mean.

# Mean and variance

When a very large number of samples (each of size $n$) of either:

is repeatedly and independently drawn from a population,

Expressed mathematically, when $n \to \infty$,

$$E(\bar X) \approx \mu$$

$$Var(\bar X) = \frac{1}{n} \times \sigma^2 = \frac{\sigma^2}{n}$$

On the other hand, if the sample is drawn without replacement from a finite population (of size $N$) and $n$ is not a very small fraction of $N$, the finite population correction factor needs to be applied to the variance:

$$Var(\bar X) = \frac{\sigma^2}{n} \times \frac{N - n}{N - 1}$$
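
The corrected formula can be checked exactly on a tiny example. The sketch below assumes simple random sampling without replacement from a made-up population of six values, enumerates every possible sample of size 2, and compares the variance of the resulting sample means with the formula. Fractions are used so the comparison is exact rather than approximate.

```python
from fractions import Fraction
from itertools import combinations
from statistics import mean, pvariance

# Hypothetical finite population (N = 6) and sample size (n = 2).
population = [Fraction(x) for x in (1, 2, 3, 4, 5, 6)]
N, n = len(population), 2

sigma_sq = pvariance(population)  # population variance, sigma^2

# Every possible sample of size n drawn without replacement, and its mean.
sample_means = [mean(c) for c in combinations(population, n)]

observed = pvariance(sample_means)                 # actual Var(X-bar)
predicted = sigma_sq / n * Fraction(N - n, N - 1)  # sigma^2 / n with the correction factor
uncorrected = sigma_sq / n                         # sigma^2 / n without the correction

print(observed, predicted, uncorrected)
```

Here $n$ is a third of $N$, so leaving out the correction factor would noticeably overstate the variance.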

# Sampling error

Sampling error is the difference between a statistic and the parameter it estimates. It is always present because a sample is never a perfect representation of its population. The standard error of the mean is the standard deviation of the sampling distribution of the mean.

When a very large number of samples (each of size $n$) of either:

is repeatedly and independently drawn from a population, the standard error of the mean is:

$$SE_{mean} = \sqrt{Var(\bar X)} = \frac{\sigma}{\sqrt n}$$

On the other hand, if the sample is drawn without replacement from a finite population (of size $N$) and $n$ is not a very small fraction of $N$, the finite population correction factor needs to be applied:

$$SE_{mean} = \sqrt{Var(\bar X)} = \frac{\sigma}{\sqrt n} \times \sqrt{\frac{N - n}{N - 1}}$$
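
Both formulas fit in one small helper. This is only a sketch: the function name `standard_error` and the convention of passing `N=None` when no correction is wanted are assumptions made for illustration, and the numbers in the usage lines are made up.

```python
import math

def standard_error(sigma, n, N=None):
    """Standard error of the mean; applies the finite population correction when N is given."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))  # finite population correction factor
    return se

# Hypothetical values: sigma = 10, samples of size 100.
print(standard_error(10, 100))          # no correction
print(standard_error(10, 100, N=1000))  # sample is 10% of a finite population
```

The corrected value is always the smaller of the two, and it drops to zero when $n = N$, since a sample that covers the whole population leaves no sampling error in the mean.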

# Central limit theorem

If $X_1, X_2, \dots, X_n$ is a random sample (of size $n \geq 30$) taken from a population whose random variable $X$ (of any kind of distribution) has a mean ($\mu$) and variance ($\sigma^2$), the sample mean ($\bar X$) is approximately normally distributed.

Expressed mathematically, when $n \geq 30$,

$$\bar X \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
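
The theorem can be illustrated with a small simulation. The sketch below assumes an exponential population with rate 1 (strongly skewed, with $\mu = 1$ and $\sigma^2 = 1$); the seed, the 5000 repetitions and the choice of distribution are arbitrary. It checks that the sample means centre on $\mu$, have variance close to $\sigma^2 / n$, and fall within 1.96 standard errors of $\mu$ about as often as a normal distribution would predict.

```python
import random
import statistics

rng = random.Random(2022)  # fixed seed so the run is repeatable
n = 30                     # size of each sample
repeats = 5000             # number of samples drawn
mu, sigma_sq = 1.0, 1.0    # mean and variance of an exponential with rate 1

# Draw many samples from the skewed population and record each sample mean.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(repeats)
]

mean_of_means = statistics.fmean(sample_means)     # should be close to mu
var_of_means = statistics.pvariance(sample_means)  # should be close to sigma^2 / n

# Under normality about 95% of sample means lie within 1.96 standard errors of mu.
se = (sigma_sq / n) ** 0.5
share_within = sum(abs(m - mu) <= 1.96 * se for m in sample_means) / repeats

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"variance of sample means: {var_of_means:.4f} (sigma^2 / n = {sigma_sq / n:.4f})")
print(f"share within 1.96 SE: {share_within:.3f}")
```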