So what is different about inferential statistics? Instead of simply attempting to summarize and organize a distribution of data, inferential statistics are used to extend results generated from a subgroup of observations to a larger context - in other words, the purpose of inferential statistics is to generalize from samples to populations. This section will describe the fundamental concepts of the process of using inferential statistics.

A research hypothesis is the focal point of statistical inference. A research hypothesis makes a prediction, based on a theoretical foundation, that identifies an anticipated outcome - for example, higher reading scores due to a new instructional technique. The research hypothesis is generated from a (usually broader) research question. An example of a research question might ask whether there is a difference between a new way of teaching and a traditional method. Research questions, in turn, are narrower investigations of a more general research purpose, which seeks to address an important problem in the field. So, you can think of a research hypothesis as the end of a series of increasingly narrow, focused statements about a research topic.

A research hypothesis aids the research study by providing a very specific and precise prediction to test. The test of the hypothesis involves sampling and then collecting data. The nature of the sample affects the ability of the researcher to generalize the results. A full introduction to sampling techniques is beyond the scope of this introduction, but the important aspect of sampling for statistical purposes involves the concept of representation. A quantitative researcher intends to apply her results to a larger group than the one she studied. The larger group is called the population. In order to generalize, the group involved in the study, called the sample, must resemble the population on variables that are important to the research.

The simplest, but usually costliest, way to sample is randomly - the equivalent of pulling names out of a hat. Because this process is usually costly in terms of time, money, and logistics, a compromise is often made for practical reasons. The compromise, and the act of sampling itself, introduce error into the process; in other words, the smaller sample may differ from the population in some important way. A researcher cannot know whether this has actually happened, though. Instead, the researcher uses statistics to describe the sample's characteristics and then infers from those sample characteristics what the properties of the larger population are. For example, testing a new reading method on a properly selected subset of all third-grade students allows a researcher to make statements about the value of the reading program for all third-grade students.

Numbers pertaining to the sample are called statistics, and numbers pertaining to the population are called parameters. Statisticians use different symbols to distinguish the two sets of numbers. Latin symbols (e.g., X̄, s, r) are used to represent sample statistics, and Greek symbols (e.g., μ, σ, ρ) [named mu, sigma, and rho] are used to represent population parameters.
Here are several relationships to remember:

- Symbols for the mean: X̄ (sample mean) is used to estimate μ (population mean)
- Symbols for the standard deviation: s (sample standard deviation) is used to estimate σ (population standard deviation)
- Symbols for the Pearson correlation coefficient: r (sample correlation) is used to estimate ρ (population correlation)
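To make the estimation relationship concrete, here is a short Python sketch (the tool choice, population, and sample sizes are all illustrative assumptions, not from the text) that treats a constructed list of scores as the population and estimates its parameters from a random sample:

```python
# Illustrative only: we construct the population ourselves, so the
# parameters are knowable; in real research, only the sample is observed.
import random
import statistics

random.seed(1)  # reproducible example

# A made-up population of 10,000 test scores.
population = [random.gauss(mu=70, sigma=10) for _ in range(10_000)]
mu = statistics.mean(population)       # population mean (a parameter)
sigma = statistics.pstdev(population)  # population standard deviation

# A simple random sample -- "pulling names out of a hat".
sample = random.sample(population, k=100)
x_bar = statistics.mean(sample)  # sample mean (a statistic) estimates mu
s = statistics.stdev(sample)     # sample standard deviation estimates sigma

print(f"mu = {mu:.2f}, x_bar = {x_bar:.2f}")
print(f"sigma = {sigma:.2f}, s = {s:.2f}")
```

The sample values will not match the parameters exactly - that gap is the sampling error discussed above.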

Here are some examples of null hypotheses, each stating that no difference or relationship exists:

Comparison of treatment (t) and control (c) group means

H0: μt = μc

Comparison of pre-test (pre) and post-test (post) means

H0: μpre = μpost

Test of linear relationship between age (a) and experience (e)

H0: ρae = 0

Notice that null hypotheses assume equality among groups, between variables, or over time. Researchers then collect data to see if there is evidence to the contrary. In this way, the null hypothesis provides a benchmark for assessing whether observed differences are due to chance or some other factor/variable (i.e., systematic differences).
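A small simulation can show what "differences due to chance" look like. In this hypothetical Python sketch (the population and group sizes are made up), both groups are drawn from the same population, so the null hypothesis is true by construction - yet the group means still differ a little on every draw:

```python
# Draw two groups from the SAME population and record the difference
# between their means; any difference observed is due to chance alone.
import random
import statistics

random.seed(2)
population = [random.gauss(mu=50, sigma=10) for _ in range(5_000)]

chance_diffs = []
for _ in range(1_000):
    treatment = random.sample(population, k=30)
    control = random.sample(population, k=30)
    chance_diffs.append(statistics.mean(treatment) - statistics.mean(control))

# Chance differences cluster around zero; this spread is the benchmark
# against which a real treatment effect would be judged.
print(f"average chance difference: {statistics.mean(chance_diffs):.3f}")
print(f"typical size (std dev): {statistics.stdev(chance_diffs):.3f}")
```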

Here are a few examples of research hypotheses, also called alternative hypotheses. These are directly testable because they refer to the sample statistics using Latin symbols, and not the population parameters. Notice that these research hypotheses contradict the equality represented by the null hypotheses. Research hypotheses can be directional (i.e., predicting that one statistic is greater than or less than the other) or they can be non-directional (i.e., predicting that the two statistics are different but not specifying how they differ).

Directional hypothesis that treatment (t) group mean is greater than control (c) group mean

H1: X̄t > X̄c

Non-directional hypothesis that pre-test (pre) and post-test (post) means are different (i.e., not equal)

H1: X̄pre ≠ X̄post

Directional hypothesis that a direct (i.e., positive) linear relationship between age (a) and experience (e) exists

H1: rae > 0

Well-written hypotheses should have the following characteristics. They should:

- Be declarative statements making specific predictions.
- Identify a specific expected relationship.
- Have a firm theory or literature base.
- Be concise and to the point.
- Be testable - allowing for the collection of data measuring variables in a systematic, unambiguous way.

Consider another example to understand probability. Suppose you are rolling a pair of dice repeatedly and adding up the dots on the top of each die. A die has six sides, numbered 1 to 6. Because you are adding two dice, the possible sums range between 2 (for snake-eyes - two 1s) and 12 (for box-cars - two 6s). Here is a table of the possible combinations of dice. The numbers appearing on the dice are indicated by the row and column labels, and the sums are shown in the interior cells.

Values for Dice

|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| 6 | 7 | 8 | 9 | 10 | 11 | 12 |

There are 36 (6x6) possible outcomes, some of which result in the same sum. For example, a sum of 7 can occur six different ways. Notice that there is only one way to obtain a sum of 2 and only one way to obtain a 12; these are the least likely sums. The probability associated with each sum is the number of ways it can occur divided by the total number of possibilities. For example, the probability of obtaining a sum of 3 is 2 out of 36, or .056, representing 5.6% of the potential outcomes. What is the probability of obtaining a sum of 11, or 8, or 5? What sum is the most likely?
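Enumerating all 36 equally likely outcomes reproduces these probabilities (and answers the questions above) in a few lines of Python - an illustrative sketch, not part of the original text:

```python
# Count how many of the 36 ordered outcomes produce each sum.
from collections import Counter

counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
probs = {total: n / 36 for total, n in counts.items()}

print(probs[3])                   # 2/36, about .056
print(probs[11])                  # 2/36
print(probs[8])                   # 5/36
print(probs[5])                   # 4/36
print(max(probs, key=probs.get))  # 7 is the most likely sum
```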

If you repeatedly rolled two dice many, many times and recorded the sums that you obtained, their frequencies would eventually resemble the pattern shown in the table. By counting the number of times 7 occurred, for example, and dividing that number by the number of rolls, you can calculate the relative frequency of a sum of 7. Over many repetitions of the dice roll, the relative frequencies get closer and closer to the theoretical probabilities. This is called the Law of Large Numbers.
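You can watch the Law of Large Numbers at work in a short simulation (a Python sketch; the roll counts chosen are arbitrary):

```python
# Simulate rolling two dice and track how often the sum is 7.
import random

random.seed(3)  # reproducible example

def relative_freq_of_7(n_rolls):
    """Fraction of n_rolls in which two dice sum to 7."""
    hits = sum(1 for _ in range(n_rolls)
               if random.randint(1, 6) + random.randint(1, 6) == 7)
    return hits / n_rolls

# As the number of rolls grows, the relative frequency drifts toward
# the theoretical probability 6/36 (about 0.167).
for n in (100, 10_000, 100_000):
    print(n, relative_freq_of_7(n))
```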

Graphing the probabilities results in the following pattern. Instead of two dice, if you rolled three, the pattern would begin to look more bell-shaped. The more dice involved, the closer the pattern gets to the normal (bell-shaped) curve. This is called the Central Limit Theorem. Relating this back to our context of sampling and making inferences, when drawing a random sample from a population and calculating the mean for a variable many times independently, the shape of the resulting distribution of means will resemble the normal curve.
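A small simulation illustrates this sampling version of the Central Limit Theorem using a decidedly non-normal starting point - single die rolls, which follow a flat, uniform distribution (a Python sketch; the sample size of 30 and 5,000 repetitions are arbitrary choices):

```python
# Repeatedly take the mean of 30 die rolls; the means pile up in a
# bell shape around 3.5 even though single rolls are uniformly spread.
import random
import statistics

random.seed(4)
sample_means = [statistics.mean(random.randint(1, 6) for _ in range(30))
                for _ in range(5_000)]

print(f"mean of means: {statistics.mean(sample_means):.2f}")  # near 3.5

# Crude text histogram of the distribution of sample means.
for lo in (2.5, 2.9, 3.3, 3.7, 4.1):
    n = sum(1 for m in sample_means if lo <= m < lo + 0.4)
    print(f"{lo:.1f}-{lo + 0.4:.1f}: {'#' * (n // 100)}")
```

The tallest bars sit in the middle of the range, mimicking the bell curve.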

Reading about sampling and the Central Limit Theorem and viewing pictures doesn't convey the message as well as seeing the process unfold through an animation. Visit this site (http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.html), read the instructions, and experiment with drawing different sized samples from different types of population distributions.

There are actually infinitely many normal distributions - they differ by the value of their means and standard deviations. Here are some important properties of normal distributions:

- The distribution is symmetric, which results in the mean, median, and mode all being equal.
- The area under the plot of the distribution (i.e., area between the x-axis and the curve) equals 1, representing 100% when considered as relative frequencies.
- The height of the curve approximates a relative frequency, but the area above any single point is 0 (so the probability of any one exact value is 0).
- The tails (left and right extremes) of the curve approach, but never touch, the x-axis (i.e., the tails are asymptotic to the x-axis).

Notice that approximately 68% of the cases are within one standard deviation of the mean, 95% are within two standard deviations, and over 99% are within three standard deviations.
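These percentages can be verified with the standard normal distribution available in Python's standard library:

```python
# Area under the standard normal curve within k standard deviations
# of the mean, computed from the cumulative distribution function.
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
for k in (1, 2, 3):
    area = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} standard deviation(s): {area:.1%}")
```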

A raw score (X) can be converted to a standard score, or z score, and back again with the following formulas:

z = (X - X̄) / s

and

X = s * z + X̄

where X̄ is the mean of the distribution of raw scores and s is the standard deviation of the distribution of raw scores.
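The two conversion formulas can also be written as small helper functions (a Python sketch; to_z and to_raw are hypothetical names, not from the text):

```python
def to_z(x, mean, sd):
    """Convert a raw score to a z score: z = (X - mean) / s."""
    return (x - mean) / sd

def to_raw(z, mean, sd):
    """Convert a z score back to a raw score: X = s * z + mean."""
    return sd * z + mean

print(to_z(15, 10, 2.5))    # 2.0
print(to_raw(-1, 10, 2.5))  # 7.5
```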

Here is an example of how to use the conversion formulas:

Suppose you have a score of 15 on a test where the mean was 10 and the standard deviation was 2.5. The z score equivalent to a raw score of 15 is

z = (15 - 10) / 2.5 = 5 / 2.5 = 2

Another way to describe this score is that it is 2 standard deviations above the mean. In a normal distribution, a raw score of 15 would be higher than 97.7% of the other scores. Look at the graph above to locate the source of 97.7%.

Here is a related example. Suppose you wanted to provide extra help to anyone who scored more than one standard deviation below the mean. What raw scores should you look for? Scoring one standard deviation below the mean results in a z score of -1. To determine the associated raw score, use the second conversion formula

X = 2.5 * (-1) + 10 = -2.5 + 10 = 7.5

Anyone scoring 7.5 or lower would be offered extra help. By inspecting the graph above, can you determine what percentage of scores this would represent?
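Both area questions can be checked against the standard normal curve with Python's standard library:

```python
# The cumulative distribution function gives the proportion of scores
# falling below a given z value in a normal distribution.
from statistics import NormalDist

std_normal = NormalDist()
print(f"below z = 2:  {std_normal.cdf(2):.1%}")   # the 97.7% in the text
print(f"below z = -1: {std_normal.cdf(-1):.1%}")  # share offered extra help
```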

Converting to z scores allows raw scores from different testing situations to be compared. For example, which is better: a 20 in Ms. Chan's class, where the mean was 12 and the standard deviation was 6, or a 15 in Mr. Williams' class, where the mean was also 12 but the standard deviation was 2? Let's compare the z scores.

z (Ms. Chan's class) = (20 - 12) / 6 = 8 / 6 ≈ 1.3

z (Mr. Williams' class) = (15 - 12) / 2 = 3 / 2 = 1.5

The 15 in Mr. Williams' class is better than the 20 in Ms. Chan's class, by .2 standard deviations.

Visit this site (http://psych.colorado.edu/~mcclella/java/normal/normz.html) to use an applet that will convert a raw score, with its associated mean and standard deviation, to a z score. This site (http://davidmlane.com/hyperstat/z_table.html) calculates the area under the curve for different z values and intervals.