# Statistical Inference, Sampling, and Probability

### Descriptive and Inferential Statistics

The study and use of statistics is roughly divided into two categories: descriptive statistics and inferential statistics. As previous sections have demonstrated, descriptive statistics summarize, organize, and illustrate distributions of numerical observations. Among typically employed descriptive statistics are measures of central tendency (e.g., mean, median, mode), measures of dispersion (e.g., range, interquartile range, variance, standard deviation), and bivariate measures (e.g., correlation coefficients). Within the category of descriptive statistics are the visual displays of distributions, including frequency tables, histograms, pie charts, bar charts, and scatterplots.

So what is different about inferential statistics? Instead of simply attempting to summarize and organize a distribution of data, inferential statistics are used to extend results generated from a subgroup of observations to a larger context - in other words, the purpose of inferential statistics is to generalize from samples to populations. This section will describe the fundamental concepts of the process of using inferential statistics.

A research hypothesis is the focal point of statistical inference. It makes a prediction, grounded in a theoretical foundation, that identifies an anticipated outcome - for example, higher reading scores due to a new instructional technique. The research hypothesis is generated from a usually broader research question. An example of a research question might ask whether there is a difference between a new way of teaching and a traditional method. Research questions, in turn, are narrower investigations of a more general research purpose, which seeks to address an important problem in the field. So, you can think of a research hypothesis as the end of a series of increasingly narrow, more focused statements about a research topic.

A research hypothesis aids the research study by providing a specific, precise prediction to test. Testing the hypothesis involves sampling and then collecting data. The nature of the sample affects the researcher's ability to generalize the results. A full introduction to sampling techniques is beyond the scope of this introduction, but the aspect of sampling that matters most for statistical purposes is representation. A quantitative researcher intends to apply her results to a larger group than the one she studied. The larger group is called the population. In order to generalize, the group involved in the study, called the sample, must resemble the population on the variables that are important to the research.

The simplest, but usually costliest, way to sample is randomly - the equivalent of pulling names out of a hat. Because random sampling is usually costly in terms of time, money, and logistics, a compromise is made for practical reasons. The compromise, and the act of sampling itself, introduce error into the process; in other words, the smaller sample may differ from the population in some important way. A researcher cannot know whether this has actually happened, though. Instead, the researcher uses statistics to describe the sample's characteristics and then infers from those sample characteristics the properties of the larger population. For example, testing a new reading method on a properly selected subset of all third-grade students allows a researcher to make statements about the value of the reading program for all third-grade students.

Numbers pertaining to the sample are called statistics, and numbers pertaining to the population are called parameters. Statisticians use different symbols to distinguish the two sets of numbers. Latin symbols (e.g., X̄, s, r) are used to represent sample statistics, and Greek symbols (e.g., μ, σ, ρ) [named mu, sigma, and rho] are used to represent population parameters.
Here are several relationships to remember:
• Symbols for the mean: X̄ (sample mean) is used to estimate μ (population mean)
• Symbols for the standard deviation: s (sample standard deviation) is used to estimate σ (population standard deviation)
• Symbols for the Pearson correlation coefficient: r (sample correlation) is used to estimate ρ (population correlation)
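As a quick illustration of statistics estimating parameters, here is a small simulation sketch (the data are simulated; the population size and score scale are my own invented assumptions, not from the text):

```python
import random
import statistics

random.seed(42)

# Hypothetical population: 10,000 test scores with mean ~70 and SD ~10.
population = [random.gauss(70, 10) for _ in range(10_000)]
mu = statistics.mean(population)       # population parameter (mu)
sigma = statistics.pstdev(population)  # population parameter (sigma)

# A random sample of 100 members of the population.
sample = random.sample(population, 100)
x_bar = statistics.mean(sample)        # sample statistic (X-bar)
s = statistics.stdev(sample)           # sample statistic (s)

print(f"mu = {mu:.2f},  X-bar = {x_bar:.2f}")
print(f"sigma = {sigma:.2f},  s = {s:.2f}")
```

The sample statistics come close to, but do not exactly equal, the population parameters - which is precisely why inference is needed.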
As a researcher interested in improving the state of the field, you might propose to start your investigation by assuming your research hypothesis is correct and looking for evidence to support it. This approach could result in a less than objective view of the research study. Instead, researchers start with what is called a null hypothesis, which represents the status quo - for example, that there is no difference between a new method and a traditional one. The null hypothesis makes a statement about the population that is not directly testable. The statement is untestable in practice because of time, money, and logistical constraints, not untestable in theory. If it were logistically possible, the new method could be tested on the entire population and there would be no need to make an inference.

Here are some examples of null hypotheses:

Comparison of treatment (t) and control (c) group means
H0: μt = μc

Comparison of pre-test (pre) and post-test (post) means
H0: μpre = μpost

Test of linear relationship between age (a) and experience (e)
H0: ρae = 0

Notice that null hypotheses assume equality among groups, between variables, or over time. Researchers then collect data to see if there is evidence to the contrary. In this way, the null hypothesis provides a benchmark for assessing whether observed differences are due to chance or some other factor/variable (i.e., systematic differences).

Here are a few examples of research hypotheses, also called alternative hypotheses. These are directly testable because they refer to the sample statistics using Latin symbols, and not the population parameters. Notice that these research hypotheses contradict the equality represented by the null hypotheses. Research hypotheses can be directional (i.e., predicting that one statistic is greater than or less than the other) or they can be non-directional (i.e., predicting that the two statistics are different but not specifying how they differ).

Directional hypothesis that treatment (t) group mean is greater than control (c) group mean
H1: X̄t > X̄c

Non-directional hypothesis that pre-test (pre) and post-test (post) means are different (i.e., not equal)
H1: X̄pre ≠ X̄post

Directional hypothesis that a direct (i.e., positive) linear relationship between age (a) and experience (e) exists
H1: rae > 0

Well-written hypotheses should have the following characteristics. They should:
• Be declarative statements making specific predictions.
• Identify a specific expected relationship.
• Have a firm theory or literature base.
• Be concise and to the point.
• Be testable - allowing for the collection of data measuring variables in a systematic, unambiguous way.

### Probability and the Normal Curve

Probability is the mathematical study of chance. Early applications of probability involved understanding games of chance, which can be viewed as repetitions of independent events. For example, if you flip a coin multiple times, each coin flip is independent of the previous one and next one. Similarly, if you roll a pair of dice repeatedly, each roll is independent of the previous one or next one. The chance that you will see heads on a single coin flip is always the same - .5 or 1 out of 2, because there are two possible outcomes and heads is one of those. If you flip a coin 10 times, however, you probably won't see exactly five heads - there may be six or four or even nine heads. It is possible, although unlikely with a fair coin, to see all 10 flips result in heads.
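The variability of the coin-flip example can be demonstrated with a short simulation (a sketch; the number of repetitions is my own choice):

```python
import random

random.seed(1)

# Flip a fair coin 10 times, and repeat that experiment many times.
# Each flip has P(heads) = .5, yet the number of heads per experiment varies.
counts = []
for _ in range(10_000):
    heads = sum(random.random() < 0.5 for _ in range(10))
    counts.append(heads)

# Exactly 5 heads occurs in only about a quarter of the experiments
# (binomial probability: C(10,5) / 2^10 ≈ .246).
exactly_five = counts.count(5) / len(counts)
print(f"Proportion of experiments with exactly 5 heads: {exactly_five:.3f}")
```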

Consider another example to build intuition about probability. Suppose you are rolling a pair of dice repeatedly and adding up the dots on the top of each die. A die has six sides, numbered 1 to 6. Because you are adding two dice, the possible sums range between 2 (for snake-eyes - two 1s) and 12 (for box-cars - two 6s). Here is a table of the possible combinations of dice. The numbers appearing on the dice are indicated by the row and column labels, and the sums are shown in each interior cell.

| Values for Dice | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| **1** | 2 | 3 | 4 | 5 | 6 | 7 |
| **2** | 3 | 4 | 5 | 6 | 7 | 8 |
| **3** | 4 | 5 | 6 | 7 | 8 | 9 |
| **4** | 5 | 6 | 7 | 8 | 9 | 10 |
| **5** | 6 | 7 | 8 | 9 | 10 | 11 |
| **6** | 7 | 8 | 9 | 10 | 11 | 12 |

There are 36 (6x6) possible outcomes, some of which result in the same sum. For example, a sum of 7 can occur six different ways. Notice that there is only one way to obtain a sum of 2 and one way to obtain a 12; these are the least likely sums. The probability associated with each sum is the number of ways it can occur divided by the total number of possibilities. For example, the probability of obtaining a sum of 3 is 2 out of 36, or .056, representing 5.6% of the potential outcomes. What is the probability of obtaining a sum of 11, or 8, or 5? What sum is the most likely?
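The table of sums and the resulting probabilities can be generated by enumerating all 36 outcomes (a small sketch):

```python
from collections import Counter
from itertools import product

# Enumerate all 36 equally likely outcomes of rolling two dice
# and count how many ways each sum can occur.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

for total in range(2, 13):
    p = ways[total] / 36
    print(f"Sum {total:2d}: {ways[total]} way(s), probability {p:.3f}")
```

Running this confirms that 7 (six ways) is the most likely sum, while 2 and 12 (one way each) are the least likely.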

If you rolled two dice many, many times and recorded the sums you obtained, their frequencies would eventually resemble the pattern shown in the table. By counting the number of times 7 occurred, for example, and dividing that count by the number of rolls, you can calculate the relative frequency of a sum of 7. Over many repetitions of the dice roll, the relative frequencies get closer and closer to the theoretical probabilities. This is called the Law of Large Numbers.
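The Law of Large Numbers can be watched in action with a simulation (a sketch; the roll counts are my own choices):

```python
import random

random.seed(0)

# The relative frequency of a sum of 7 approaches the theoretical
# probability 6/36 ≈ .167 as the number of rolls grows.
for n in (100, 10_000, 1_000_000):
    sevens = sum(
        random.randint(1, 6) + random.randint(1, 6) == 7 for _ in range(n)
    )
    print(f"{n:>9,} rolls: relative frequency of 7 = {sevens / n:.4f}")
```

With only 100 rolls the relative frequency can stray noticeably from .167; with a million rolls it is typically within a fraction of a percent.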

Graphing the probabilities results in the following pattern. If you rolled three dice instead of two, the pattern would begin to look more bell-shaped. The more dice involved, the closer the pattern gets to the normal (bell-shaped) curve. This is called the Central Limit Theorem. Relating this back to our context of sampling and making inferences: when you draw a random sample from a population and calculate the mean for a variable many times independently, the shape of the resulting distribution of means will resemble the normal curve. Reading about sampling and the Central Limit Theorem and viewing pictures doesn't convey the message as well as seeing the process unfold through an animation. Visit this site (http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.html), read the instructions, and experiment with drawing different sized samples from different types of population distributions.
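The Central Limit Theorem can also be seen numerically: even though single die rolls follow a flat (uniform) distribution, the means of repeated samples of rolls cluster normally around the population mean. A simulation sketch (sample size and repetition count are my own choices):

```python
import random
import statistics

random.seed(0)

# Draw many independent samples of 30 die rolls each and record
# the mean of every sample (population mean of a fair die = 3.5).
means = [
    statistics.mean(random.randint(1, 6) for _ in range(30))
    for _ in range(5_000)
]

# The distribution of these sample means is approximately normal,
# centered on 3.5, with a much smaller spread than single rolls.
print(f"Mean of sample means: {statistics.mean(means):.3f}")
print(f"SD of sample means (standard error): {statistics.stdev(means):.3f}")
```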

### Properties of Normal Distributions

There are actually infinitely many normal distributions - they differ in the values of their means and standard deviations. Here are some important properties of normal distributions:
• The distribution is symmetric, which results in the mean, median, and mode all being equal.
• The area under the plot of the distribution (i.e., area between the x-axis and the curve) equals 1, representing 100% when considered as relative frequencies.
• The height of the curve approximates a relative frequency, but the area under any single point is 0.
• The tails (left and right extremes) of the curve approach, but never touch, the x-axis (i.e., the tails are asymptotic to the x-axis).

### Use of the Normal Distribution

Any raw score from a normal distribution can be mapped onto a normal curve if the mean and standard deviation associated with the raw score are known. The graph below illustrates the mapping of several different types of scores. Notice that percentiles can be derived from the curve as well. Notice also that approximately 68% of the cases are within one standard deviation of the mean, 95% are within two standard deviations, and over 99% are within three standard deviations.
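The 68-95-99 percentages can be verified exactly from the normal curve's mathematics, using the error function from Python's standard library (a sketch):

```python
import math

def normal_area_within(k):
    """Area under the standard normal curve within k standard
    deviations of the mean, computed via the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"Within {k} SD of the mean: {normal_area_within(k):.4f}")
```

The printed values (about .6827, .9545, and .9973) are the precise versions of the 68%, 95%, and 99%+ figures quoted above.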

### The Standard Normal Distribution

Even though there are many normal distributions, one of those has been designated to be the "standard" normal distribution. The standard normal distribution is the normal distribution with a mean of 0 and a standard deviation of 1. Notice the line labeled Z scores in the graph above. Compare this line to the line labeled Standard Deviations just below the x-axis. Notice that a z score of +1.0 corresponds to a point under the curve labeled +1σ. Any raw score can be converted to a z score and back using these formulas:

z = (X - X̄) / s

and

X = s * z + X̄

where X̄ is the mean of the distribution of raw scores and s is the standard deviation of the distribution of raw scores.
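The two conversion formulas can be sketched as a pair of small functions (the function and variable names are my own):

```python
def z_score(x, mean, sd):
    # Convert a raw score to a z score: z = (X - X-bar) / s
    return (x - mean) / sd

def raw_score(z, mean, sd):
    # Convert a z score back to a raw score: X = s * z + X-bar
    return sd * z + mean

# Quick check with a test whose mean is 10 and standard deviation is 2.5:
print(z_score(15, 10, 2.5))    # a raw score of 15 is 2 SDs above the mean
print(raw_score(-1, 10, 2.5))  # a z score of -1 corresponds to a raw 7.5
```

Note that the two functions are inverses: converting a raw score to z and back recovers the original score.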

Here is an example of how to use the conversion formulas:

Suppose you have a score of 15 on a test where the mean was 10 and the standard deviation was 2.5. The z score equivalent to a raw score of 15 is

z = (15 - 10) / 2.5 = 5 / 2.5 = 2

Another way to describe this score is that it is 2 standard deviations above the mean. In a normal distribution, a raw score of 15 would be higher than 97.7% of the other scores. Look at the graph above to locate the source of 97.7%.

Here is a related example. Suppose you wanted to provide extra help to anyone who scored more than one standard deviation below the mean. What raw scores should you look for? Scoring one standard deviation below the mean results in a z score of -1. To determine the associated raw score, use the second conversion formula

X = 2.5 * (-1) + 10 = -2.5 + 10 = 7.5

Anyone scoring 7.5 or lower would be offered extra help. By inspecting the graph above, can you determine what percentage of scores this would represent?
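The percentage asked about here can be computed directly from the standard normal cumulative distribution, which the standard library's error function provides (a sketch):

```python
import math

def normal_cdf(z):
    # Proportion of a normal distribution falling below a given z score.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Scores more than one standard deviation below the mean (z < -1):
print(f"{normal_cdf(-1):.1%} of scores fall below z = -1")
```

This is the complement of the 84.1% of scores that lie above one standard deviation below the mean.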

Converting to z scores allows raw scores from different testing situations to be compared. For example, which is better a 20 in Ms. Chan's class where the mean was 12 and the standard deviation was 6, or a 15 in Mr. Williams' class where the mean was also 12 but the standard deviation was 2? Let's compare the z scores.

zc = (20 - 12) / 6 = 8 / 6 = 1.3

zw = (15 - 12) / 2 = 3 / 2 = 1.5

The 15 in Mr. Williams' class is better than the 20 in Ms. Chan's class, by .2 standard deviations.
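The comparison above can be reproduced in code (a sketch; the function and variable names are my own):

```python
def z_score(x, mean, sd):
    # z = (X - X-bar) / s
    return (x - mean) / sd

# Put both scores on the common z-score scale before comparing them.
z_chan = z_score(20, mean=12, sd=6)      # Ms. Chan's class
z_williams = z_score(15, mean=12, sd=2)  # Mr. Williams' class

print(f"Ms. Chan's class:     z = {z_chan:.2f}")
print(f"Mr. Williams' class:  z = {z_williams:.2f}")
```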

Visit this site (http://psych.colorado.edu/~mcclella/java/normal/normz.html) to use an applet that will convert a raw score, with its associated mean and standard deviation, to a z score. This site (http://davidmlane.com/hyperstat/z_table.html) calculates the area under the curve for different z values and intervals.