Quantitative Data Analysis
9 Presenting the Results of Quantitative Analysis
Mikaila Mariel Lemonik Arthur
This chapter provides an overview of how to present the results of quantitative analysis, in particular how to create effective tables for displaying quantitative results and how to write quantitative research papers that effectively communicate the methods used and findings of quantitative analysis.
Writing the Quantitative Paper
Standard quantitative social science papers follow a specific format. They begin with a title page that includes a descriptive title, the author(s)’ name(s), and a 100 to 200 word abstract that summarizes the paper. Next is an introduction that makes clear the paper’s research question, details why this question is important, and previews what the paper will do. After that comes a literature review, which ends with a summary of the research question(s) and/or hypotheses. A methods section, which explains the source of data, sample, and variables and quantitative techniques used, follows. Many analysts will include a short discussion of their descriptive statistics in the methods section. A findings section details the findings of the analysis, supported by a variety of tables, and in some cases graphs, all of which are explained in the text. Some quantitative papers, especially those using more complex techniques, will include equations. Many papers follow the findings section with a discussion section, which provides an interpretation of the results in light of both the prior literature and theory presented in the literature review and the research questions/hypotheses. A conclusion ends the body of the paper. This conclusion should summarize the findings, answering the research questions and stating whether any hypotheses were supported, partially supported, or not supported. Limitations of the research are detailed. Papers typically include suggestions for future research, and where relevant, some papers include policy implications. After the body of the paper comes the works cited; some papers also have an Appendix that includes additional tables and figures that did not fit into the body of the paper or additional methodological details. While this basic format is similar for papers regardless of the type of data they utilize, there are specific concerns relating to quantitative research in terms of the methods and findings that will be discussed here.
Methods
In the methods section, researchers clearly describe the methods they used to obtain and analyze the data for their research. When relying on data collected specifically for a given paper, researchers will need to discuss the sample and data collection; in most cases, though, quantitative research relies on pre-existing datasets. In these cases, researchers need to provide information about the dataset, including the source of the data, the time it was collected, the population, and the sample size. Regardless of the source of the data, researchers need to be clear about which variables they are using in their research and any transformations or manipulations of those variables. They also need to explain the specific quantitative techniques that they are using in their analysis; if different techniques are used to test different hypotheses, this should be made clear. In some cases, publications will require that papers be submitted along with any code that was used to produce the analysis (in SPSS terms, the syntax files), which more advanced researchers will usually have on hand. In many cases, basic descriptive statistics are presented in tabular form and explained within the methods section.
Findings
The findings sections of quantitative papers are organized around explaining the results as shown in tables and figures. Not all results are depicted in tables and figures—some minor or null findings will simply be referenced—but tables and figures should be produced for all findings to be discussed at any length. If there are too many tables and figures, some can be moved to an appendix after the body of the text and referred to in the text (e.g. “See Table 12 in Appendix A”).
Discussions of the findings should not simply restate the contents of the table. Rather, they should explain and interpret it for readers, and they should do so in light of the hypothesis or hypotheses that are being tested. Conclusions—discussions of whether the hypothesis or hypotheses are supported or not supported—should wait for the conclusion of the paper.
Creating Effective Tables
When creating tables to display the results of quantitative analysis, the most important goals are to create tables that are clear and concise but that also meet standard conventions in the field. This means, first of all, paring down the volume of information produced in the statistical output to just include the information most necessary for interpreting the results, but doing so in keeping with standard table conventions. It also means making tables that are well-formatted and designed, so that readers can understand what the tables are saying without struggling to find information. For example, tables (as well as figures such as graphs) need clear captions; they are typically numbered and referred to by number in the text. Columns and rows should have clear headings. Depending on the content of the table, formatting tools may need to be used to set off header rows/columns and/or total rows/columns; cell-merging tools may be necessary; and shading may be important in tables with many rows or columns.
Here, you will find some instructions for creating tables of results from descriptive, crosstabulation, correlation, and regression analysis that are clear, concise, and meet normal standards for data display in social science. In addition, after the instructions for creating tables, you will find an example of how a paper incorporating each table might describe that table in the text.
Descriptive Statistics
When presenting the results of descriptive statistics, we create one table with a column for each variable and a row for each type of descriptive statistic. Note, of course, that depending on level of measurement only certain descriptive statistics are appropriate for a given variable, so there may be many cells in the table marked with a dash (—) to show that this statistic is not calculated for this variable. So, consider the set of descriptive statistics below, for occupational prestige, age, highest degree earned, and whether the respondent was born in this country.
[Table 1: SPSS descriptive statistics output for occupational prestige, age, highest degree earned, and whether the respondent was born in this country]
To display these descriptive statistics in a paper, one might create a table like Table 2. Note that for discrete variables, we use the value label in the table, not the value.
| | Occupational Prestige Score | Age | Highest Degree Earned | Born in This Country? |
|---|---|---|---|---|
| Mean | 46.54 | 52.16 | — | 1.11 |
| Median | 47 | 53 | 1: Associates (9.2%) | 1: Yes (88.8%) |
| Mode | — | — | 2: High School (39.8%) | — |
| Standard Deviation | 13.811 | 17.233 | — | — |
| Variance | 190.745 | 296.988 | — | — |
| Skewness | 0.141 | 0.018 | — | — |
| Kurtosis | -0.809 | -1.018 | — | — |
| Range | 64 (16-80) | 71 (18-89) | Less than High School (0) – Graduate (4) | — |
| Interquartile Range | 35-59 | 37-66 | — | — |
| N | 3873 | 3699 | 4009 | 3960 |
If we were then to discuss our descriptive statistics in a quantitative paper, we might write something like this (note that we do not need to repeat every single detail from the table, as readers can peruse the table themselves):
This analysis relies on four variables from the 2021 General Social Survey: occupational prestige score, age, highest degree earned, and whether the respondent was born in the United States. Descriptive statistics for all four variables are shown in Table 2. The median occupational prestige score is 47, with a range from 16 to 80. Half of respondents had occupational prestige scores between 35 and 59. The median age of respondents is 53, with a range from 18 to 89. Half of respondents are between ages 37 and 66. Both variables have little skew. Highest degree earned ranges from less than high school to a graduate degree; the median respondent has earned an associate’s degree, while the modal response (given by 39.8% of the respondents) is a high school degree. Most respondents (88.8%) were born in the United States.
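The rule that only certain statistics suit a given level of measurement can be sketched in code. The Python below is a minimal illustration and not part of the SPSS workflow described in this chapter: the function name `describe`, the three level labels, and the tiny sample lists are all hypothetical. Statistics that do not apply are returned as a dash so that they can be placed directly into a table like Table 2.

```python
import statistics

DASH = "—"  # marker for "not calculated for this variable"

def describe(values, level):
    """Return the descriptive statistics suited to a level of measurement.

    level is "nominal", "ordinal", or "continuous" (hypothetical labels).
    """
    if level not in ("nominal", "ordinal", "continuous"):
        raise ValueError(f"unknown level of measurement: {level}")
    row = {"Mean": DASH, "Median": DASH, "Mode": DASH,
           "Standard Deviation": DASH, "Range": DASH, "N": len(values)}
    if level == "continuous":
        row["Mean"] = round(statistics.mean(values), 2)
        row["Standard Deviation"] = round(statistics.stdev(values), 3)
    if level in ("ordinal", "continuous"):
        row["Median"] = statistics.median(values)
        row["Range"] = f"{min(values)}-{max(values)}"
    if level in ("nominal", "ordinal"):
        row["Mode"] = statistics.mode(values)
    return row

# Hypothetical illustration data, NOT drawn from the General Social Survey.
ages = [23, 35, 41, 52, 67]
degrees = [0, 1, 1, 2, 4]    # ordinal codes: 0 = less than high school ... 4 = graduate
born_here = [1, 1, 1, 2, 1]  # nominal codes: 1 = yes, 2 = no

for name, values, level in [("Age", ages, "continuous"),
                            ("Highest Degree Earned", degrees, "ordinal"),
                            ("Born in This Country?", born_here, "nominal")]:
    print(name, describe(values, level))
```

Each returned dictionary corresponds to one column of the finished table. When writing the table, the numeric codes for discrete variables would still need to be swapped for their value labels.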
Crosstabulation
When presenting the results of a crosstabulation, we simplify the table so that it highlights the most important information—the column percentages—and include the significance and association below the table. Consider the SPSS output below.
| | | | R’s subjective class identification | | | | Total |
|---|---|---|---|---|---|---|---|
| | | | lower class | working class | middle class | upper class | |
| R’s highest degree | less than high school | Count | 65 | 106 | 68 | 7 | 246 |
| | | % within R’s subjective class identification | 18.8% | 7.1% | 3.4% | 4.2% | 6.2% |
| | high school | Count | 217 | 800 | 551 | 23 | 1591 |
| | | % within R’s subjective class identification | 62.9% | 53.7% | 27.6% | 13.9% | 39.8% |
| | associate/junior college | Count | 30 | 191 | 144 | 3 | 368 |
| | | % within R’s subjective class identification | 8.7% | 12.8% | 7.2% | 1.8% | 9.2% |
| | bachelor’s | Count | 27 | 269 | 686 | 49 | 1031 |
| | | % within R’s subjective class identification | 7.8% | 18.1% | 34.4% | 29.5% | 25.8% |
| | graduate | Count | 6 | 123 | 546 | 84 | 759 |
| | | % within R’s subjective class identification | 1.7% | 8.3% | 27.4% | 50.6% | 19.0% |
| Total | | Count | 345 | 1489 | 1995 | 166 | 3995 |
| | | % within R’s subjective class identification | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |

| | Value | df | Asymptotic Significance (2-sided) |
|---|---|---|---|
| Pearson Chi-Square | 819.579ᵃ | 12 | <.001 |
| Likelihood Ratio | 839.200 | 12 | <.001 |
| Linear-by-Linear Association | 700.351 | 1 | <.001 |
| N of Valid Cases | 3995 | | |

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 10.22.

| | | Value | Asymptotic Standard Errorᵃ | Approximate Tᵇ | Approximate Significance |
|---|---|---|---|---|---|
| Interval by Interval | Pearson’s R | .419 | .013 | 29.139 | <.001ᶜ |
| Ordinal by Ordinal | Spearman Correlation | .419 | .013 | 29.158 | <.001ᶜ |
| N of Valid Cases | | 3995 | | | |

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Based on normal approximation.
Table 4 shows how a table suitable for inclusion in a paper might look if created from the SPSS output in Table 3. Note that we use asterisks to indicate the significance level of the results: * means p < 0.05; ** means p < 0.01; *** means p < 0.001; and no stars means p ≥ 0.05 (and thus that the result is not significant). Also note that N is the abbreviation for the number of respondents.
| | | Respondent’s Subjective Class Identification | | | | |
|---|---|---|---|---|---|---|
| | | Lower Class | Working Class | Middle Class | Upper Class | Total |
| Highest Degree Earned | Less than High School | 18.8% | 7.1% | 3.4% | 4.2% | 6.2% |
| | High School | 62.9% | 53.7% | 27.6% | 13.9% | 39.8% |
| | Associate’s / Junior College | 8.7% | 12.8% | 7.2% | 1.8% | 9.2% |
| | Bachelor’s | 7.8% | 18.1% | 34.4% | 29.5% | 25.8% |
| | Graduate | 1.7% | 8.3% | 27.4% | 50.6% | 19.0% |

N: 3995; Spearman Correlation: 0.419***
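The asterisk convention used in tables like this one is mechanical enough to express as a small function. The following Python sketch is only an illustration: the names `significance_stars` and `starred` are hypothetical, and because SPSS reports the p-value here only as "<.001", the value 0.0004 passed in the example is a stand-in chosen to fall below that threshold.

```python
def significance_stars(p):
    """Map a p-value to the conventional asterisk notation."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""  # not significant at the 0.05 level


def starred(value, p, decimals=3):
    """Format a statistic with its significance asterisks appended."""
    return f"{value:.{decimals}f}{significance_stars(p)}"


# 0.0004 is a placeholder for a p-value that SPSS reports only as "<.001".
print(starred(0.419, 0.0004))
```

Checking the thresholds from smallest to largest means each result gets only the strongest set of asterisks it qualifies for.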
If we were going to discuss the results of this crosstabulation in a quantitative research paper, the discussion might look like this:
A crosstabulation of respondents’ class identification and their highest degree earned, with class identification as the independent variable, is significant, with a Spearman correlation of 0.419, as shown in Table 4. Among lower class and working class respondents, more than 50% had earned a high school degree as their highest degree. Less than 20% of lower class respondents and less than 40% of working class respondents had earned more than a high school degree. In contrast, the majority of middle class and upper class respondents had earned at least a bachelor’s degree. In fact, just over 50% of upper class respondents had earned a graduate degree.
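The column percentages reported in Table 4 can be recomputed from the counts in the SPSS output in Table 3. The Python sketch below is one way to do so, using only the counts shown there; the function name `column_percentages` and the data layout are assumptions made for illustration.

```python
# Counts copied from the SPSS crosstabulation output in Table 3.
classes = ["lower class", "working class", "middle class", "upper class"]
counts = {
    "less than high school":    [65, 106, 68, 7],
    "high school":              [217, 800, 551, 23],
    "associate/junior college": [30, 191, 144, 3],
    "bachelor's":               [27, 269, 686, 49],
    "graduate":                 [6, 123, 546, 84],
}


def column_percentages(table):
    """Convert counts to percentages within each column, plus a Total column."""
    rows = list(table.values())
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    result = {}
    for label, row in table.items():
        pcts = [round(100 * n / t, 1) for n, t in zip(row, col_totals)]
        pcts.append(round(100 * sum(row) / grand_total, 1))  # Total column
        result[label] = pcts
    return result, grand_total


percentages, n = column_percentages(counts)
print(" | ".join(["highest degree"] + classes + ["Total"]))
for label, pcts in percentages.items():
    print(" | ".join([label] + [f"{p:.1f}%" for p in pcts]))
print("N:", n)
```

Dividing by the column totals rather than the row totals is what makes these percentages comparable across class categories, which is why the reformatted table keeps only this set of percentages.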
Correlation
When presenting a correlation matrix, one of the most important things to note is that we only present half the table so as not to include duplicated results. Think of the diagonal line through the table formed by the cells that represent the correlation between a variable and itself, and include only the triangle of data either above or below that line of cells. Consider the output in Table 5.
| | | Age of respondent | R’s occupational prestige score (2010) | Highest year of school R completed | R’s family income in 1986 dollars |
|---|---|---|---|---|---|
| Age of respondent | Pearson Correlation | 1 | .087** | .014 | .017 |
| | Sig. (2-tailed) | | <.001 | .391 | .314 |
| | N | 3699 | 3571 | 3683 | 3336 |
| R’s occupational prestige score (2010) | Pearson Correlation | .087** | 1 | .504** | .316** |
| | Sig. (2-tailed) | <.001 | | <.001 | <.001 |
| | N | 3571 | 3873 | 3817 | 3399 |
| Highest year of school R completed | Pearson Correlation | .014 | .504** | 1 | .360** |
| | Sig. (2-tailed) | .391 | <.001 | | <.001 |
| | N | 3683 | 3817 | 3966 | 3497 |
| R’s family income in 1986 dollars | Pearson Correlation | .017 | .316** | .360** | 1 |
| | Sig. (2-tailed) | .314 | <.001 | <.001 | |
| | N | 3336 | 3399 | 3497 | 3509 |

**. Correlation is significant at the 0.01 level (2-tailed).
Table 6 shows what the contents of Table 5 might look like when a table is constructed in a fashion suitable for publication.
| | Age | Occupational Prestige Score | Highest Year of School Completed | Family Income in 1986 Dollars |
|---|---|---|---|---|
| Age | 1 | | | |
| Occupational Prestige Score | 0.087*** | 1 | | |
| Highest Year of School Completed | 0.014 | 0.504*** | 1 | |
| Family Income in 1986 Dollars | 0.017 | 0.316*** | 0.360*** | 1 |
If we were to discuss the results of this bivariate correlation analysis in a quantitative paper, the discussion might look like this:
Bivariate correlations were run among variables measuring age, occupational prestige, the highest year of school respondents completed, and family income in constant 1986 dollars, as shown in Table 6. Correlations between age and highest year of school completed and between age and family income are not significant. All other correlations are positive and significant at the p<0.001 level. The correlation between age and occupational prestige is weak; the correlations between income and occupational prestige and between income and educational attainment are moderate, and the correlation between education and occupational prestige is strong.
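Reducing a full correlation matrix to the half that is presented can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a prescribed method: the helper names `lower_triangle` and `is_symmetric` are hypothetical, and the coefficients are the Pearson correlations reported in the SPSS output in Table 5.

```python
variables = ["Age", "Occupational Prestige Score",
             "Highest Year of School Completed", "Family Income in 1986 Dollars"]

# Full symmetric matrix of Pearson correlations, as reported in Table 5.
full = [
    [1.0,   0.087, 0.014, 0.017],
    [0.087, 1.0,   0.504, 0.316],
    [0.014, 0.504, 1.0,   0.360],
    [0.017, 0.316, 0.360, 1.0],
]


def is_symmetric(matrix):
    """True if every cell below the diagonal matches its mirror image above it."""
    return all(matrix[i][j] == matrix[j][i]
               for i in range(len(matrix)) for j in range(i))


def lower_triangle(matrix):
    """Keep the diagonal and the cells below it; drop the duplicated upper half."""
    return [row[: i + 1] for i, row in enumerate(matrix)]


if not is_symmetric(full):
    raise ValueError("matrix is not symmetric; dropping half would lose information")

for name, row in zip(variables, lower_triangle(full)):
    cells = [f"{r:.3f}" for r in row[:-1]] + ["1"]  # last kept cell is the diagonal
    print(" | ".join([name] + cells))
```

The symmetry check makes explicit why nothing is lost: every value above the diagonal already appears below it.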
Regression
To present the results of a regression, we create one table that includes all of the key information from the multiple tables of SPSS output. This includes the R² and significance of the regression, either the B or the beta values (different analysts have different preferences here) for each variable, and the standard error and significance of each variable. Consider the SPSS output in Table 7.
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .395ᵃ | .156 | .155 | 36729.04841 |

a. Predictors: (Constant), Highest year of school R completed, Age of respondent, R’s occupational prestige score (2010)

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 805156927306.583 | 3 | 268385642435.528 | 198.948 | <.001ᵇ |
| | Residual | 4351948187487.015 | 3226 | 1349022996.741 | | |
| | Total | 5157105114793.598 | 3229 | | | |

a. Dependent Variable: R’s family income in 1986 dollars
b. Predictors: (Constant), Highest year of school R completed, Age of respondent, R’s occupational prestige score (2010)

| Model | | Unstandardized Coefficients | | Standardized Coefficients | t | Sig. | Collinearity Statistics | |
|---|---|---|---|---|---|---|---|---|
| | | B | Std. Error | Beta | | | Tolerance | VIF |
| 1 | (Constant) | -44403.902 | 4166.576 | | -10.657 | <.001 | | |
| | Age of respondent | 9.547 | 38.733 | .004 | .246 | .805 | .993 | 1.007 |
| | R’s occupational prestige score (2010) | 522.887 | 54.327 | .181 | 9.625 | <.001 | .744 | 1.345 |
| | Highest year of school R completed | 3988.545 | 274.039 | .272 | 14.555 | <.001 | .747 | 1.339 |

a. Dependent Variable: R’s family income in 1986 dollars
The regression output shown in Table 7 contains a lot of information. We do not include all of this information when making tables suitable for publication. As can be seen in Table 8, we include the Beta (or the B), the standard error, and the significance asterisk for each variable; the R² and significance for the overall regression; the degrees of freedom (from which readers can work out the sample size, since the total degrees of freedom equal N − 1); and the constant; along with the key to p/significance values.
| | Beta & SE |
|---|---|
| Age | 0.004 (38.733) |
| Occupational Prestige Score | 0.181*** (54.327) |
| Highest Year of School Completed | 0.272*** (274.039) |
| R² | 0.156*** |
| Degrees of Freedom | 3229 |
| Constant | -44,403.902 |

* p<0.05 ** p<0.01 *** p<0.001
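Several of the summary figures carried into the publication table can be cross-checked against the ANOVA portion of the SPSS output in Table 7. The Python sketch below recomputes R², adjusted R², F, and N from the sums of squares and degrees of freedom using the standard formulas; the function name `regression_summary` is hypothetical, and the check is offered only as an illustration.

```python
# Values copied from the ANOVA portion of the SPSS output in Table 7.
ss_regression = 805156927306.583
ss_total = 5157105114793.598
df_regression = 3    # one degree of freedom per predictor
df_residual = 3226


def regression_summary(ss_reg, ss_tot, df_reg, df_res):
    """Recompute R-squared, adjusted R-squared, F, and N from an ANOVA table."""
    ss_res = ss_tot - ss_reg
    r_squared = ss_reg / ss_tot
    n = df_reg + df_res + 1  # total degrees of freedom equal N - 1
    adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_res
    f_stat = (ss_reg / df_reg) / (ss_res / df_res)
    return {"R2": r_squared, "Adjusted R2": adj_r_squared, "F": f_stat, "N": n}


summary = regression_summary(ss_regression, ss_total, df_regression, df_residual)
for label, value in summary.items():
    print(f"{label}: {value:.3f}" if label != "N" else f"{label}: {value}")
```

Under these formulas the total degrees of freedom of 3229 correspond to 3230 respondents with complete data on all four variables.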
If we were to discuss the results of this regression in a quantitative paper, the results might look like this:
Table 8 shows the results of a regression in which age, occupational prestige, and highest year of school completed are the independent variables and family income is the dependent variable. The regression results are significant, and all of the independent variables taken together explain 15.6% of the variance in family income. Age is not a significant predictor of income, while occupational prestige and educational attainment are. Educational attainment has a larger effect on family income than does occupational prestige. For every year of additional education attained, family income goes up on average by $3,988.55 in 1986 dollars, holding the other variables constant; for every one-unit increase in occupational prestige score, family income goes up on average by $522.89.[1]
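The interpretation of the B values can be made concrete by writing out the fitted regression equation. The Python sketch below uses the unstandardized coefficients from the SPSS output in Table 7; the function name `predicted_income` and the two respondents compared are hypothetical and serve only to illustrate the calculation.

```python
# Unstandardized coefficients (B) from the SPSS output in Table 7.
CONSTANT = -44403.902
B = {"age": 9.547, "prestige": 522.887, "education": 3988.545}


def predicted_income(age, prestige, education):
    """Predicted family income (1986 dollars) from the fitted regression equation."""
    return (CONSTANT
            + B["age"] * age
            + B["prestige"] * prestige
            + B["education"] * education)


# Hypothetical respondents who differ only by one year of schooling.
base = predicted_income(age=40, prestige=45, education=12)
plus_one_year = predicted_income(age=40, prestige=45, education=13)
print(f"Predicted income with 12 years of school: {base:,.2f}")
print(f"Predicted income with 13 years of school: {plus_one_year:,.2f}")
print(f"Difference: {plus_one_year - base:,.3f}")
```

Because age and occupational prestige are held fixed, the difference between the two predictions is the B value for education.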
Exercises
1. Choose two discrete variables and three continuous variables from a dataset of your choice. Produce appropriate descriptive statistics on all five of the variables and create a table of the results suitable for inclusion in a paper.
2. Using the two discrete variables you have chosen, produce an appropriate crosstabulation, with significance and measure of association. Create a table of the results suitable for inclusion in a paper.
3. Using the three continuous variables you have chosen, produce a correlation matrix. Create a table of the results suitable for inclusion in a paper.
4. Using the three continuous variables you have chosen, produce a multivariate linear regression. Create a table of the results suitable for inclusion in a paper.
5. Write a methods section describing the dataset, analytical methods, and variables you utilized in questions 1, 2, 3, and 4 and explaining the results of your descriptive analysis.
6. Write a findings section explaining the results of the analyses you performed in questions 2, 3, and 4.
[1] Note that the actual numerical increase comes from the B values, which are shown in the SPSS output in Table 7 but not in the reformatted Table 8.