Importance of Educational Statistics

Why is this material important?

Statistics is a branch of mathematics that involves making sense of data. Data are quantities obtained from some type of systematic observation. Presented as a list of numbers, data are difficult to comprehend. By using statistical techniques, researchers work with data to organize, categorize, condense, summarize, describe, illustrate, analyze, compare, synthesize, evaluate, and infer.

Whether you intend to use qualitative, quantitative, or mixed methods to investigate the research questions for your dissertation, you will encounter statistics in several contexts. The most obvious application is data analysis in a quantitative or mixed methods study. Perhaps you are not planning to employ quantitative methods - what is the value of this course for qualitative researchers? If your study involves a group of participants, you may find it helpful and informative to describe your participants' characteristics. To help your reader understand your participants' context, you might also find it useful to describe the characteristics of the setting for your study. These examples represent the producer role of using statistics.

Even if you do not intend to include numerical data in your study, chances are very good that other researchers working on your topic have employed statistical techniques, which you need to understand and evaluate. So, whether or not you take on a producer role, everyone who conducts or reads research studies plays a critical consumer role. When you review previous research (Chapter 2 of your dissertation), if that research involves statistics, you must be able to understand how the researcher applied the statistical techniques and what the results mean. Furthermore, you need to be able to challenge the quantitative approach to research in meaningful ways. This course aims to build your skills in filling both roles.

Introduction to Statistics

Applications of Statistics

Think about when and where you've encountered statistics - in newspaper articles, in news reports, in journal articles, at the ball game, in advertising, at a swim meet, in product evaluations, in your child's homework, in your profession, in your classroom, or in testing results. Not only are statistical treatments of data widely found in our society, but these techniques are fundamental to almost all disciplines of study. Evidence of the increasing importance of statistics can be found in the early introduction of the topic in school curricula. For example, there is a content standard for statistics for kindergarten students. Thirty years ago, the first mention of statistics might have been in an introductory graduate research course. Let's explore possible reasons for this change.

Purpose

Statistical techniques are used to organize and analyze data, where data are unambiguous quantifications of observed phenomena (i.e., a regular way of assigning numbers to certain events or characteristics). In addition to organizing and analyzing data, statistical techniques include methods for illustrating the results of the statistical analysis. The prime example of this use of statistics involves the creation of graphical displays of information using charts and graphs. Not only is a picture worth a thousand words, it's worth even more numbers. Through these analyses and graphics, trends and other patterns may emerge that help researchers and others understand and explain phenomena in nature.

Brief History

One of the earliest applications of statistics derives from the need to make sense of census data. In fact, the word statistics derives from the notion of understanding the political state. A census is used to understand the characteristics of the residents of a country. In the United States, a census is conducted every ten years. Among other things, the apportionment of seats in the House of Representatives is based on census data. As you can see, the need to describe observed phenomena is central to the use of statistics.

The mathematics of statistics has its roots in probability theory - the branch of mathematics that deals with chance and uncertainty. Explaining the characteristics of games of chance through mathematics was one of the early goals of probability theory. Essentially, understanding games of chance involves predicting outcomes (e.g., heads or tails; blackjack; a royal flush; a roll of seven). How might understanding phenomena and predicting outcomes be linked? They are linked through a process of reasoning called statistical inference, which uses a known state of nature as the basis for predictions about the future or about a wider context. To read more about the general background of statistics, you might visit: http://en.wikipedia.org/wiki/Statistics.

Role of Computers

So, statistics combines tools to describe phenomena and make inferences based on those descriptions - why has the subject increased in popularity, as is evident from its inclusion in kindergarten curricula? As with many modern trends, the answer can be linked to the influence of cheap, yet powerful, computers with graphical displays. Not too long ago, people who needed to make sense of data had to either conduct tedious calculations by hand or gain access to a powerful mainframe computer. Imagine perusing lists of census data to determine the average household income in the Bay Area and then calculating the figure by hand. Now, not only are many results of statistical analyses readily available to you, but you can obtain the actual raw data and conduct your own analysis. Visit http://www.census.gov/ to view what is available. Putting these data and tools in the hands of more people requires a more general understanding of statistical concepts and techniques. As you will see first-hand, having the data and a statistical software program is necessary but not sufficient for conducting meaningful analyses. That's the reason for the increased emphasis on topics like statistical reasoning and statistical literacy.

Statistics as a Subset of the Tools for Research

If you've completed the Research Methods course, you've been introduced to many tools for conducting research. The primary division among these tools usually runs along the continuum from qualitative, interpretive inquiry to quantitative, scientific investigations. At the quantitative end of the continuum lie most of the statistical techniques that we will study. There are many, many more that we will not have the time to study. So, one of the goals for the course is to introduce you to fundamental statistical concepts that underlie most statistical procedures. As with qualitative and quantitative approaches in general, the specific statistical tests and routines that you should use are determined by the questions you are addressing. As you will see in the text and elsewhere, your selection of appropriate statistical tools can be guided by decision trees or flow charts that lead you through a series of questions intended to identify the appropriate statistical test to use for your given situation.

Examples of Data

Data are all around us - physical characteristics, such as height, weight, eye color, dominant hand, gender, age, ethnicity; social characteristics, such as socioeconomic level, citizenship, residency, marital status, family structure, years of education; or school-based data, such as grade in school, test scores, number of absences, number of referrals, grades, placements, abilities, aptitude, achievements. Each item named in the previous list represents a variable, which is a set of data points, all of which represent the same construct. In the context of a research study, a variable is a set of data that takes on different values - the values of variables vary! If you are studying students at a boys' high school, gender is not a variable in your study, because there is only one value for gender - male.

Scan through the list in the preceding paragraph once more. In addition to the categories of physical, social, and school data, can you discern other differences between these variables? For example, would you describe the process of assigning numbers to height as similar to the process of assigning numbers to ethnic classifications? You probably find that these two processes differ in a fundamental way - namely, heights are measured with a measuring device like a tape measure and assigned a length, whereas ethnicities are based on ancestry and any number assigned to a specific ethnicity is completely arbitrary. For example, you could assign a 1 to Asian or a 5 to Asian and, as long as no other group is assigned the same number, either would be an adequate quantification of ethnicity.

Let's describe these differences more formally. Variables are measured at different levels - called levels of measurement. There are four levels: nominal, ordinal, interval, and ratio. Before we explain each one, you might wonder why these measurement levels matter. The reason is simple - the level of measurement determines which statistical techniques are appropriate to use. The assignment of numbers to values of the variable is often called coding.

Nominal - the numbers assigned to the values of the variable are completely arbitrary - they are just labels that are consistently applied. For example, gender is a variable that typically has two values, male and female. You can assign 1 to represent male and 2 to represent female or vice versa or 5 to represent female and 3 to represent male. As long as all males are assigned the same number and all females are assigned the same number (and the two numbers are different), the coding of gender is appropriate. Variables measured at the nominal level are also called qualitative or categorical variables. When there are just two values, as is the case with gender, the variable is called dichotomous. Dichotomous variables are quite common in research because they represent the division of a sample into two groups.
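To make the idea of arbitrary coding concrete, here is a minimal Python sketch. The sample values and both codebooks are invented for illustration. It suggests that counting the members of each group gives the same answer under either coding, while averaging the codes gives a number that depends entirely on the labels chosen and so carries no meaning.

```python
from collections import Counter

# Hypothetical sample of a nominal variable (values invented for illustration).
genders = ["male", "female", "female", "male", "female"]

# Two equally valid codebooks: the numbers are only labels.
codebook_a = {"male": 1, "female": 2}
codebook_b = {"male": 3, "female": 5}

def code_values(values, codebook):
    """Replace each category label with its numeric code."""
    return [codebook[v] for v in values]

def counts_by_label(codes, codebook):
    """Count the codes, then translate each code back to its label."""
    label_of = {code: label for label, code in codebook.items()}
    return {label_of[c]: n for c, n in Counter(codes).items()}

codes_a = code_values(genders, codebook_a)
codes_b = code_values(genders, codebook_b)

# Counting works under either coding and gives the same group sizes.
counts_a = counts_by_label(codes_a, codebook_a)
counts_b = counts_by_label(codes_b, codebook_b)
print(counts_a)
print(counts_b)

# Averaging the codes does not: the result changes with the arbitrary labels.
mean_a = sum(codes_a) / len(codes_a)
mean_b = sum(codes_b) / len(codes_b)
print(mean_a, mean_b)
```

This is why frequencies and percentages are the usual summaries for nominal variables.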

Ordinal - the numbers assigned to the values of the variable indicate order but not the actual size of the differences between values. The prototype to remember is rankings. For example, runners who finish a race are labeled first, second, and third (these are called ordinal numbers, by the way). The place in which they finish does not indicate their actual time though. In fact, the difference between first place and second place may be 2 seconds, while the difference between second and third may be 10 seconds. In more formal terms, the intervals between consecutive values on an ordinal scale are not necessarily equal. Think of ranking your students by some ability - the top five students may be very close in ability levels and then there may be a substantial decrease between the fifth-place student and the sixth-place student.
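The race example can be sketched in a few lines of Python. The runner names and finishing times are hypothetical, chosen only to match the 2-second and 10-second gaps described above. The sketch ranks the runners and then compares the gaps between consecutive places with the gaps between the underlying times.

```python
# Hypothetical finishing times in seconds (invented to match the example above).
times = {"Runner A": 600.0, "Runner B": 602.0, "Runner C": 612.0}

# Rank runners by time: the fastest gets place 1.
ordered = sorted(times, key=times.get)
places = {runner: place for place, runner in enumerate(ordered, start=1)}

# Gaps between consecutive finishers, measured in places and in seconds.
pairs = list(zip(ordered, ordered[1:]))
place_gaps = [places[b] - places[a] for a, b in pairs]
time_gaps = [times[b] - times[a] for a, b in pairs]

print(places)
print(place_gaps)  # each step in rank is one place
print(time_gaps)   # the underlying time gaps are unequal
```

The ranks preserve who finished ahead of whom, but the information about how far ahead is lost once only the places are recorded.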

Interval - the numbers assigned to the values of the variable indicate measured amounts. The intervals between these numbers are equal. For example, temperature in degrees Fahrenheit or Celsius is measured on an interval scale - the interval between 40 degrees and 50 degrees is the same as the interval between 70 degrees and 80 degrees. Most educational variables are treated as if they were measured on interval scales.

Ratio - the numbers assigned to the values of the variable meet the properties of an interval scale and include a "true" zero, which makes ratios of values meaningful. A true zero is the complete lack of the measured quantity. If we counted the change in students' pockets, students without any change would be assigned a 0. Reporting that Juan had twice the change that Nina had would make sense. On the other hand, if we measured math ability using a math test, a student who scored 0 on the test can't be said to lack all math ability. Furthermore, reporting that Beatrix, who scored a 90, is twice as able, mathematically, as Que, who scored a 45, isn't meaningful.
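One way to see the difference between interval and ratio scales is to change the unit of measurement and check what survives. The sketch below assumes the temperatures in the interval example are in degrees Fahrenheit and uses made-up amounts of pocket change. Equal intervals remain equal after converting to Celsius, but the ratio of two temperatures changes; the ratio of two amounts of money does not change when cents are converted to dollars.

```python
import math

def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit reading to Celsius."""
    return (f - 32) * 5 / 9

# Interval scale: equal intervals stay equal after changing units.
gap_low_f, gap_high_f = 50 - 40, 80 - 70
gap_low_c = fahrenheit_to_celsius(50) - fahrenheit_to_celsius(40)
gap_high_c = fahrenheit_to_celsius(80) - fahrenheit_to_celsius(70)
intervals_equal = gap_low_f == gap_high_f and math.isclose(gap_low_c, gap_high_c)

# Ratios do not survive the conversion, because the zero point is arbitrary.
ratio_f = 80 / 40
ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)

# Ratio scale: pocket change has a true zero, so ratios survive a unit change.
juan_cents, nina_cents = 50, 25  # hypothetical amounts
ratio_cents = juan_cents / nina_cents
ratio_dollars = (juan_cents / 100) / (nina_cents / 100)

print(intervals_equal)
print(ratio_f, ratio_c)
print(ratio_cents, ratio_dollars)
```

Saying that 80 degrees is "twice as hot" as 40 degrees therefore depends on the thermometer's scale, whereas "twice the change" holds in any currency unit.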

Return to the previous list of variables and try your hand at classifying them according to these four levels of measurement.

Uses of Statistics - Types of Research Questions

Research questions that are answered through the use of statistics involve describing a current context or making an inference about a different, but related, setting. A typical descriptive question might be one that asks: What are the reading levels of first-grade students at ABC Elementary who use the Write-to-Read instructional program? A typical inferential question might be one that asks: What is the effect of the Write-to-Read instructional program for students in the XYZ district?

Descriptive Statistics

Descriptive statistics summarize data by reporting a number that represents the entire set of data. For example, the mean (average) score represents a summary of all of the scores on a test.
Descriptive statistics can also be used to organize a set of data. For example, a table of age ranges and tallies (frequencies) of participants within those age ranges helps to organize the observed values of the age variable.
Descriptive statistics allow researchers to illustrate entire sets of data so that overall patterns can be seen. For example, a pie chart showing ethnic classifications can describe these characteristics for a group of students. Likewise, a graph that compares reading levels and hearing abilities can illustrate how these two variables are related.
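The first two uses, summarizing and organizing, can be sketched with Python's standard library. The scores and ages below are invented, and the ten-year width of the age ranges is an arbitrary choice for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical test scores and participant ages (invented for illustration).
scores = [72, 85, 90, 66, 87]
ages = [23, 35, 41, 29, 52, 38, 27, 45]

# Summarize: one number that stands for the whole set of scores.
mean_score = mean(scores)

def age_range(age, width=10):
    """Label the range an age falls into, e.g. 23 -> '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

# Organize: tally how many participants fall into each age range.
frequencies = Counter(age_range(a) for a in ages)

print(mean_score)
for label in sorted(frequencies):
    print(label, frequencies[label])
```

A statistical software package produces the same kind of output with less effort, but the underlying operations are no more complicated than this.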

Inferential Statistics

Inferential statistics are used to compare groups on one or more variables. For example, reading ability of girls might be compared to that of boys for a subgroup of students in order to make general statements about the comparable abilities of girls and boys.
Inferential statistics can also be used to examine the relationship between two variables within a group of people. For example, the relationship between nutritional habits and school achievement levels might be studied for a subgroup of students so that general statements might be made about how these two variables could be linked.
Similarly, inferential statistics can be used to generate mathematical models that help to predict particular outcomes. For example, an admissions officer might use historical data about high school performance and subsequent success in college to help inform an admissions decision.
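As a rough illustration of a predictive model, the sketch below fits a straight line to a handful of invented records using the ordinary least-squares formulas and then predicts an outcome for a new case. The data, the variable names, and the choice of a straight-line model are all assumptions made for this example. It shows only the fitting step; judging how well such a prediction generalizes beyond the records at hand is the inferential part and is not shown.

```python
from statistics import mean

# Hypothetical historical records (invented): high school GPA and
# first-year college GPA for five past students.
hs_gpa = [2.0, 2.5, 3.0, 3.5, 4.0]
college_gpa = [1.8, 2.4, 2.7, 3.3, 3.8]

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

slope, intercept = fit_line(hs_gpa, college_gpa)

# Predicted college GPA for a hypothetical applicant with a 3.2 in high school.
predicted = slope * 3.2 + intercept
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```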

Generalizing Results - from Sample to Population; from Today to Tomorrow; from Here to There

Remember that a researcher using quantitative methods, and specifically inferential statistics, is intent on producing results that generalize to other people, in other settings, and at other times. In order to achieve this goal, the proper selection of a particular set of participants, called a sample, is vital. In generalizing, the researcher has a large group of people in mind - this is called the population. If the generalization to the population from the sample is going to be seen by others as valid, the sample needs to mirror the larger population in every way that is determined to be important. Keep this important point in mind as you continue to read research reports and learn about statistics.
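A small simulation can illustrate why the way a sample is selected matters. Everything below is hypothetical: the population is 1,000 simulated reading scores, and the two samples are drawn in deliberately different ways. A simple random sample tends to mirror the population, while a sample drawn only from the highest scorers does not.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: reading scores for 1,000 students (simulated).
population = [random.gauss(100, 15) for _ in range(1000)]

# A simple random sample gives every student the same chance of selection.
sample = random.sample(population, 50)

# A sample drawn only from the 50 highest scorers does not mirror the population.
top_scorers = sorted(population)[-50:]

population_mean = mean(population)
print(round(population_mean, 1))
print(round(mean(sample), 1))
print(round(mean(top_scorers), 1))
```

Generalizing from the second sample to all 1,000 students would overstate the population's reading level, no matter how carefully the statistics were computed afterward.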

Suggestions for Succeeding in Studying Statistics

Heed the suggestions of our text's author very carefully. Math, in general, and statistics, in particular, are both very hierarchical by their nature. This means that later concepts and techniques depend on earlier ones. This situation can be either good or bad. If you build a solid foundation to start with and keep up with the material, you can build your skills incrementally. The downside is that if you do not understand a topic or concept, you cannot skip over it, hoping to avoid it in the future. Reading carefully and thoroughly, taking notes, working exercises, and practicing new skills will help you achieve success in this course.