Sampling
Refers to the process of extracting a subset of a population. Done when studying the entire population is:
- too time-consuming;
- too costly;
- infeasible; or
- impossible.
Statistics are often used to infer population parameters.
# Terminologies
Term | Definition |
---|---|
Population | An entire group of entities from which data can be collected |
Sample | A subset of a population; a smaller group of entities selected from the population |
Parameter | A numerical measurement collected from a population |
Statistic | A numerical measurement computed from a sample |
# Notations
Representation | Parameter symbol (population) | Statistic symbol (sample) |
---|---|---|
Size | $N$ | $n$ |
Mean | $\mu$ | $\bar{x}$ |
Standard deviation | $\sigma$ | $s$ |
Variance | $\sigma^2$ | $s^2$ |
# Sampling methods
# Random sampling
A technique whereby a sample is selected from a population entirely by chance (randomly). Each entity in the population has a known probability of being selected. Reduces the possibility of bias in sampling.
There are three kinds of random sampling:
- simple random sampling;
- stratified random sampling; and
- systematic random sampling.
# Simple random sampling
A random sampling method that ensures that each entity in the population has an equal chance of being included in the sample.
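A minimal Python sketch of the idea, assuming the population fits in a list. The function name `simple_random_sample` and the numbered population are hypothetical and used only for illustration; `random.sample` draws without replacement, so every subset of the requested size is equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct entities; every entity has the same chance of selection."""
    rng = random.Random(seed)  # a seed is optional and only makes the draw repeatable
    return rng.sample(population, n)

# Hypothetical population of 100 numbered entities.
population = list(range(1, 101))
sample = simple_random_sample(population, n=10, seed=42)
print(sample)
```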
# Stratified random sampling
A random sampling method that divides the population into groups (strata) and selects a sample from each group, ensuring that no particular group in the population is missed out.
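One possible way to sketch this in Python is shown below. It assumes proportional allocation (the same fraction is drawn from every group) with at least one entity per group; that allocation rule, the function name `stratified_sample`, and the student data are illustrative assumptions, not something the notes prescribe.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Group entities by stratum, then draw a simple random sample from each group."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for entity in population:
        strata[stratum_of(entity)].append(entity)

    sample = []
    for members in strata.values():
        # Assumed rule: proportional allocation, but never fewer than one per group.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 80 juniors and 20 seniors, stored as (id, level) pairs.
students = [(i, "junior" if i < 80 else "senior") for i in range(100)]
picked = stratified_sample(students, stratum_of=lambda s: s[1], fraction=0.1, seed=1)
print(picked)
```

With these inputs, 8 juniors and 2 seniors are drawn, so the smaller group is always represented.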
# Systematic random sampling
A random sampling method in which a starting point is chosen (typically at random) and every $k$-th entity in the population from that point onwards is selected. Easy to implement and reasonably efficient, but bias may exist if the population list follows a pattern that lines up with the interval $k$.
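As a rough Python illustration, the sketch below picks a random starting point among the first `k` entities and then takes every `k`-th entity after it. The name `systematic_sample`, the interval `k`, and the choice of a random start are assumptions made for the example.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a random start in the first k entities, then every k-th entity after it."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # index between 0 and k - 1
    return population[start::k]

# Hypothetical ordered population list of 100 entities, sampled at an interval of 10.
population = list(range(100))
sample = systematic_sample(population, k=10, seed=0)
print(sample)
```

If the list had a cycle of length 10 (for example, every tenth entry being a supervisor), this sample would contain only one kind of entity, which is the bias the paragraph above warns about.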
# Quota sampling
A technique commonly used in marketing research where interviewers are given a quota of interviewees of a certain type to interview. Not random in nature, as not every entity in the population has a chance of being selected.
There may also be additional bias, as interviewers may favour interviewees who seem approachable and helpful over a more diverse range of interviewees.
# Sampling distribution
Refers to the probability distribution of a statistic. More of a theoretical concept than one observed from experiment. As statistics are random variables, each statistic follows a particular distribution.
# Sampling distribution of the sample mean
Refers to the probability distribution of all possible values the sample mean can take when a sample (of size $n$) is taken from a particular population. Is a continuous probability distribution.
A continuous probability distribution is the probability distribution of a continuous random variable. A common distribution is the normal distribution.
An important statistic is the sample’s mean ($\bar{x}$), meaning that we often concern ourselves with the sampling distribution of the sample mean.
# Mean and variance
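When a very large number of samples (each of size $n$) is repeatedly and independently drawn from either:
- an infinite population,
- a very large finite population, or
- a finite population with replacement,

then:
- the average of the sample means ($\mu_{\bar{x}}$) will be approximately equal to the actual population mean ($\mu$); and
- the variance of the sample mean ($\sigma^2_{\bar{x}}$) will only be $\frac{1}{n}$ times the population’s variance ($\sigma^2$).

Expressed mathematically, when $n$ is a very small fraction of $N$ (or sampling is done with replacement),
$$\mu_{\bar{x}} = \mu, \qquad \sigma^2_{\bar{x}} = \frac{\sigma^2}{n}$$
On the other hand, if sampling is done without replacement on a finite population (of size $N$) and $n$ is not a very small fraction of $N$, the finite population correction factor needs to be applied to the variance:
$$\sigma^2_{\bar{x}} = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1}$$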
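These relationships can be checked with a small simulation. The sketch below is illustrative only: the population (200 uniform values), the sample size of 50, and the number of trials are arbitrary assumptions. It draws many samples with and without replacement and compares the observed variance of the sample means with the population variance divided by the sample size, with and without the finite population correction factor `(N - n) / (N - 1)`.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical finite population of N = 200 values.
population = [rng.uniform(0, 100) for _ in range(200)]
N, n, trials = len(population), 50, 10000

mu = statistics.fmean(population)          # population mean
sigma2 = statistics.pvariance(population)  # population variance

# Sampling with replacement: no correction factor is needed.
means_wr = [statistics.fmean(rng.choices(population, k=n)) for _ in range(trials)]
# Sampling without replacement, with n a sizeable fraction of N.
means_wor = [statistics.fmean(rng.sample(population, n)) for _ in range(trials)]

mean_of_means = statistics.fmean(means_wr)
var_wr = statistics.pvariance(means_wr)
var_wor = statistics.pvariance(means_wor)

print("mean of sample means:", mean_of_means, "population mean:", mu)
print("variance (with replacement):", var_wr, "expected:", sigma2 / n)
print("variance (without replacement):", var_wor,
      "expected:", sigma2 / n * (N - n) / (N - 1))
```

The two printed pairs should agree only approximately, since the simulation uses a finite number of trials.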
# Sampling error
Sampling errors are the differences between statistics and parameters. They will always be present, as a sample is never a perfect representation of its population. The standard error of the mean is the standard deviation of the sampling distribution of the mean.
When a very large number of samples (each of size $n$) is repeatedly and independently drawn from either:
- an infinite population,
- a very large finite population, or
- a finite population with replacement,

the standard error of the mean is:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
On the other hand, if sampling is done without replacement on a finite population (of size $N$) and $n$ is not a very small fraction of $N$, the finite population correction factor needs to be applied:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$$
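A small helper makes the two cases concrete. This is a sketch under the usual textbook definitions: the population standard deviation divided by the square root of the sample size, optionally multiplied by the square root of the finite population correction factor `(N - n) / (N - 1)`. The function name `standard_error` and the example numbers are hypothetical.

```python
import math

def standard_error(sigma, n, N=None):
    """Standard error of the mean; pass N to apply the finite population correction."""
    se = sigma / math.sqrt(n)
    if N is not None:
        # Correction for sampling without replacement from a finite population.
        se *= math.sqrt((N - n) / (N - 1))
    return se

# Hypothetical numbers: population standard deviation 10, sample size 25.
se_infinite = standard_error(sigma=10, n=25)       # 2.0
se_finite = standard_error(sigma=10, n=25, N=100)  # about 1.74
print(se_infinite, se_finite)
```

The correction always shrinks the standard error, and its effect fades as the sample becomes a smaller fraction of the population.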
# Central limit theorem
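If $X_1, X_2, \ldots, X_n$ is a random sample (of size $n$) taken from a population with a random variable $X$ (of any kind of distribution) with a mean ($\mu$) and variance ($\sigma^2$), the sample mean ($\bar{x}$) is approximately normally distributed, provided $n$ is large enough.
Expressed mathematically, when $n$ is sufficiently large (a common rule of thumb is $n \geq 30$),
$$\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \text{ approximately}$$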
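The theorem can be illustrated with a simulation on a clearly non-normal population. The sketch below assumes an exponential distribution with rate 1 (mean 1, variance 1) and a sample size of 50; both choices are arbitrary. If the sample means are approximately normal, roughly 68% of them should fall within one standard error of the population mean and roughly 95% within 1.96 standard errors.

```python
import math
import random
import statistics

rng = random.Random(0)

# Assumed population: exponential with rate 1, so mean = 1 and variance = 1.
mu, sigma = 1.0, 1.0
n, trials = 50, 10000

# Sample mean of each of many independent samples of size n.
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
         for _ in range(trials)]

se = sigma / math.sqrt(n)  # standard error of the mean
within_1 = sum(abs(m - mu) <= se for m in means) / trials
within_2 = sum(abs(m - mu) <= 1.96 * se for m in means) / trials

print("fraction within 1 standard error:", within_1)
print("fraction within 1.96 standard errors:", within_2)
```

The match improves as the sample size grows; with very small samples from a skewed population the fractions drift away from the normal values.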