When conducting a research study, it is rarely realistic to analyze the entire population of subjects (whatever the subjects are: individuals, organizations, communities, etc.). Even today, when we can collect big data from many sources (think, for instance, of Facebook subscribers), it is not yet possible to collect data from everyone and everything. Therefore, from the group of subjects of interest for your study, you must select a smaller part (a “sample”) that can be observed and analyzed. The process of selecting such a group from a population of interest is called “sampling”.
During your research you use the selected sample to collect data, which you will then analyze. In most cases in the economic sciences you will use the analysis to make predictions and generalizations about the entire population of subjects that are of interest for your study. For this reason, it is extremely important that you select a sample that is representative of the entire population; inadequate or biased sampling will lead to erroneous results.
The sampling process
The sampling process has three steps:
1. Defining the target population
2. Selecting a sample frame
3. Selecting a sample
The target population is the group of subjects (individuals or items) with the specific characteristics you want to study. The subjects within the target population are the units of analysis. A unit of analysis can be a person, an organization, a group, a region, a country or an object.
The second step is selecting a sample frame within the target population. The sample frame is the part of the target population that you can readily access; typically, this is a list of subjects with contact information. The sample frame should be representative of the entire target population and should not be biased.
The third step is selecting a sample from the sample frame using sampling techniques. There are two categories of sampling techniques:
- Probability (random) sampling
- Nonprobability sampling
In probability sampling, every unit in the population has a known, non-zero chance of being selected for the sample. Probability sampling techniques can be single-stage (simple random sampling, systematic sampling, stratified sampling, cluster sampling, matched-pairs sampling) or multi-stage, i.e. a combination of two or more single-stage sampling techniques.
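Three of the single-stage techniques named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample frame of 1,000 students and its two strata are hypothetical, and the function names are invented for this example rather than taken from any library.

```python
import random

def simple_random_sample(frame, n, rng):
    """Draw n units so that every unit has the same chance of selection."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Pick a random start, then take every k-th unit of the ordered frame."""
    k = len(frame) // n          # sampling interval
    start = rng.randrange(k)     # random start within the first interval
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(frame, stratum_of, fraction, rng):
    """Draw the same fraction at random from every stratum."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for units in strata.values():
        size = max(1, round(len(units) * fraction))
        sample.extend(rng.sample(units, size))
    return sample

rng = random.Random(42)  # fixed seed so the draw can be repeated

# Hypothetical sample frame: 700 undergraduate and 300 graduate students
frame = [(i, "undergraduate" if i < 700 else "graduate") for i in range(1000)]

srs = simple_random_sample(frame, 100, rng)
sys_s = systematic_sample(frame, 100, rng)
strat = stratified_sample(frame, lambda unit: unit[1], 0.10, rng)

print(len(srs), len(sys_s), len(strat))
```

With a 10% fraction, the stratified sample contains 70 undergraduate and 30 graduate students, mirroring the composition of the frame, whereas the simple random sample matches those proportions only approximately.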
In nonprobability sampling techniques, some elements of the target population have no chance of being selected for the sample, or their chance of selection cannot be determined. In these techniques, the units are chosen using non-random criteria, such as quota or convenience. Because the selection is non-random, nonprobability sampling techniques may lead to sampling bias, and the information from such a sample cannot reliably be generalized to the entire target population. There are four types of nonprobability sampling techniques: convenience, quota, expert and snowball sampling.
Sampling bias refers to a flawed sample selection process (selection bias), in which certain subjects of the sample frame have a lower chance of being selected than others. Ideally, random selection makes the sample representative of the entire target population and supports the validity of the research results. Although researchers are aware of sampling bias, it is still very common in research and it can happen to anyone. The bias can largely be avoided by using simple or stratified random sampling (see above). It is nevertheless imperative to check the sample selection process several times, to catch biases and to make sure that the sample is truly representative of your entire target population.
Sampling error refers to the difference between the characteristics of the sample selected for research and those of the target population. It is not the same as sampling bias. Even when the sample is selected in the best possible way, using simple random sampling, it will still differ somewhat from the target population. This happens because the researcher studies only a part of the population rather than every element. Since it is not realistic or feasible to study every individual unit of a target population, sampling error cannot be avoided entirely.
The sampling error is unavoidable, but it can be reduced. One way to reduce it is to increase the sample size by selecting more subjects for the sample (for example, 1,000 students instead of 500). The sampling error decreases as the sample size increases, although with diminishing returns: it shrinks roughly in proportion to the square root of the sample size, so doubling the sample reduces the error by about 30%, not by half. Another way of reducing the sampling error is to select the sample carefully, for example by using stratified sampling.
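The effect of sample size on sampling error can be checked with a small simulation. The sketch below assumes a very large population in which a hypothetical 75% of subjects hold some opinion, draws many random samples of 500 and of 1,000 subjects, and reports how far the sample share lies from the true share on average. The population share, the number of repetitions and the function name are all assumptions made for this illustration.

```python
import random

def mean_abs_error(population_share, n, repetitions, rng):
    """Average gap between the sample share and the true share over many samples."""
    total = 0.0
    for _ in range(repetitions):
        # Each draw is one randomly selected subject from a very large population
        hits = sum(rng.random() < population_share for _ in range(n))
        total += abs(hits / n - population_share)
    return total / repetitions

rng = random.Random(1)  # fixed seed so the simulation can be repeated
true_share = 0.75       # hypothetical share of the population holding the opinion

err_500 = mean_abs_error(true_share, 500, 400, rng)
err_1000 = mean_abs_error(true_share, 1000, 400, rng)

print(f"average error with n=500:  {err_500:.4f}")
print(f"average error with n=1000: {err_1000:.4f}")
```

The average error for the larger sample should come out clearly smaller, but not half as large, which is the pattern expected when the error shrinks with the square root of the sample size.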
Confidence interval. The margin of error is the half-width of the “confidence interval”, and in survey practice the two terms are often used interchangeably. The confidence interval is the range around a sample result within which the true value for the entire target population is expected to lie. Typically, a ±5% margin of error is considered standard in most economic research. This means that you add and subtract 5 percentage points from the sample result to obtain the interval. For example, if 75% of respondents to your research consider that email is the most useful feature in a new smartphone, then you can conclude, at the chosen confidence level, that between 70% (75% - 5) and 80% (75% + 5) of the target population considers email to be the most useful feature.
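As a rough illustration of where such a margin comes from, the sketch below uses the standard normal-approximation formula for a proportion, z * sqrt(p(1 - p) / n). The sample size of 289 respondents is a hypothetical value chosen for this example (the text does not state one), and the function name is likewise an assumption.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(share, n, confidence=0.95):
    """Half-width of the normal-approximation interval for a sample share."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return z * sqrt(share * (1 - share) / n)

share = 0.75   # 75% of respondents, as in the example above
n = 289        # hypothetical sample size, not taken from the text

moe = margin_of_error(share, n)
low, high = share - moe, share + moe

print(f"{share:.0%} +/- {moe:.1%}: from {low:.1%} to {high:.1%}")
```

Under these assumptions the margin comes out at roughly 5 percentage points, giving an interval of about 70% to 80%. A smaller sample would widen the interval and a larger one would narrow it.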
Confidence level. The confidence level expresses how reliable the sampling procedure is: it is the share of samples, drawn in the same way from the same target population, whose confidence intervals would contain the true population value. Commonly, confidence levels are set at 95%. In practice, this means that if you replicated the research 100 times with similar samples (similar subjects from the target population), the result would fall within the margin of error of the true population value in about 95 of those replications. In other words, the confidence level shows “how sure” (“how confident”) you can be that the results obtained are representative of the entire target population.
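The meaning of a 95% confidence level can be made concrete by simulating many replications of the same study. The sketch below assumes a hypothetical true population share of 75% and samples of 400 subjects, builds a normal-approximation interval for each replication, and counts how often the interval contains the true share. All numbers and names are assumptions made for this illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(true_share, n, confidence, repetitions, rng):
    """Fraction of repeated samples whose interval contains the true share."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    covered = 0
    for _ in range(repetitions):
        hits = sum(rng.random() < true_share for _ in range(n))
        share = hits / n
        moe = z * sqrt(share * (1 - share) / n)   # margin of error of this sample
        if share - moe <= true_share <= share + moe:
            covered += 1
    return covered / repetitions

rng = random.Random(7)  # fixed seed so the simulation can be repeated
rate = coverage(true_share=0.75, n=400, confidence=0.95,
                repetitions=1000, rng=rng)

print(f"intervals containing the true share: {rate:.1%}")
```

The printed share should be close to 95%, though not exactly 95%, because the simulation itself is based on a finite number of random replications and the interval formula is an approximation.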
The sample size is the number of subjects you have to select for your sample to get valid and representative research results.
In order to establish the size of the sample, you must first determine the target population, the margin of error (confidence interval) and the confidence level for your study.
There are mathematical formulas to calculate the sample size. Based on formulas, there are tables that give the sample size for different confidence levels and margins of error (confidence intervals).
Thus, if we want to reduce the margin of error (for example, to 1%), we need to increase the sample size. The same applies to the confidence level: if we want to be very sure of the results (for example, a confidence level of 99%), the sample size increases.
For very accurate, reliable and representative results you could set the confidence level at 99% and the margin of error at 1%, but in this case the sample size is very large. For example, if your target population is 100 subjects, you must select 99 of them as your research sample. If your target population is much larger, for example 100,000 subjects, then for the same restrictive confidence interval and level you need a sample of 14,227 subjects. If you have the time and money, this can of course be done, but such large samples are often not feasible.
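One widely used formula of this kind can be sketched as follows. It assumes that the quantity being estimated is a proportion, takes the most cautious value of 50% for that proportion, and applies a finite population correction. The function and parameter names are chosen for this illustration, and the formula is offered as one common approach rather than as the specific formula behind the tables and figures mentioned in the text.

```python
from statistics import NormalDist

def sample_size(population, margin, confidence, share=0.5):
    """Sample size for estimating a share, with finite population correction."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Size needed for a very large population
    n0 = z**2 * share * (1 - share) / margin**2
    # Correction that shrinks the size when the population is finite
    return n0 / (1 + (n0 - 1) / population)

for population in (100, 100_000):
    strict = round(sample_size(population, margin=0.01, confidence=0.99))
    usual = round(sample_size(population, margin=0.05, confidence=0.95))
    print(f"population {population}: 99%/1% -> {strict}, 95%/5% -> {usual}")
```

Rounded to the nearest whole subject, this formula reproduces the two figures quoted above for a 99% confidence level and a 1% margin of error. Rounding up instead of to the nearest integer is the more cautious choice and would add one subject in each case. The comparison with the usual 95% level and 5% margin shows how much smaller the required sample becomes under less restrictive settings.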
There are many easy-to-use online sample size calculators. Once you know the size of your target population, you can calculate how large your research sample should be for different margins of error and confidence levels.