Quantitative Data Analysis

9 Presenting the Results of Quantitative Analysis

Mikaila Mariel Lemonik Arthur

This chapter provides an overview of how to present the results of quantitative analysis, in particular how to create effective tables for displaying quantitative results and how to write quantitative research papers that effectively communicate the methods used and findings of quantitative analysis.

Writing the Quantitative Paper

Standard quantitative social science papers follow a specific format. They begin with a title page that includes a descriptive title, the author(s)’ name(s), and a 100 to 200 word abstract that summarizes the paper. Next is an introduction that makes clear the paper’s research question, details why this question is important, and previews what the paper will do. After that comes a literature review, which ends with a summary of the research question(s) and/or hypotheses.

A methods section, which explains the source of data, sample, and variables and quantitative techniques used, follows. Many analysts will include a short discussion of their descriptive statistics in the methods section. A findings section details the findings of the analysis, supported by a variety of tables, and in some cases graphs, all of which are explained in the text. Some quantitative papers, especially those using more complex techniques, will include equations. Many papers follow the findings section with a discussion section, which provides an interpretation of the results in light of both the prior literature and theory presented in the literature review and the research questions/hypotheses.

A conclusion ends the body of the paper. This conclusion should summarize the findings, answering the research questions and stating whether any hypotheses were supported, partially supported, or not supported. Limitations of the research are detailed. Papers typically include suggestions for future research, and where relevant, some papers include policy implications. After the body of the paper comes the works cited; some papers also have an Appendix that includes additional tables and figures that did not fit into the body of the paper or additional methodological details. While this basic format is similar for papers regardless of the type of data they utilize, there are specific concerns relating to quantitative research in terms of the methods and findings that will be discussed here.

Methods

In the methods section, researchers clearly describe the methods they used to obtain and analyze the data for their research. When relying on data collected specifically for a given paper, researchers will need to discuss the sample and data collection; in most cases, though, quantitative research relies on pre-existing datasets. In these cases, researchers need to provide information about the dataset, including the source of the data, the time it was collected, the population, and the sample size. Regardless of the source of the data, researchers need to be clear about which variables they are using in their research and any transformations or manipulations of those variables. They also need to explain the specific quantitative techniques that they are using in their analysis; if different techniques are used to test different hypotheses, this should be made clear. In some cases, publications will require that papers be submitted along with any code that was used to produce the analysis (in SPSS terms, the syntax files), which more advanced researchers will usually have on hand. In many cases, basic descriptive statistics are presented in tabular form and explained within the methods section.

Findings

The findings sections of quantitative papers are organized around explaining the results as shown in tables and figures. Not all results are depicted in tables and figures—some minor or null findings will simply be referenced—but tables and figures should be produced for all findings to be discussed at any length. If there are too many tables and figures, some can be moved to an appendix after the body of the text and referred to in the text (e.g. “See Table 12 in Appendix A”).

Discussions of the findings should not simply restate the contents of the table. Rather, they should explain and interpret it for readers, and they should do so in light of the hypothesis or hypotheses that are being tested. Conclusions—discussions of whether the hypothesis or hypotheses are supported or not supported—should wait for the conclusion of the paper.

Creating Effective Tables

When creating tables to display the results of quantitative analysis, the most important goals are to create tables that are clear and concise but that also meet standard conventions in the field. This means, first of all, paring down the volume of information produced in the statistical output to just the information most necessary for interpreting the results. It also means making tables that are well-formatted and designed, so that readers can understand what the tables are saying without struggling to find information. For example, tables (as well as figures such as graphs) need clear captions; they are typically numbered and referred to by number in the text. Columns and rows should have clear headings. Depending on the content of the table, formatting tools may need to be used to set off header rows/columns and/or total rows/columns; cell-merging tools may be necessary; and shading may be important in tables with many rows or columns.

Here, you will find some instructions for creating tables of results from descriptive, crosstabulation, correlation, and regression analysis that are clear, concise, and meet normal standards for data display in social science. In addition, after the instructions for creating tables, you will find an example of how a paper incorporating each table might describe that table in the text.

Descriptive Statistics

When presenting the results of descriptive statistics, we create one table with columns for each type of descriptive statistic and rows for each variable. Note, of course, that depending on level of measurement only certain descriptive statistics are appropriate for a given variable, so there may be many cells in the table marked with an — to show that this statistic is not calculated for this variable. So, consider the set of descriptive statistics below, for occupational prestige, age, highest degree earned, and whether the respondent was born in this country.

Table 1. SPSS Output: Selected Descriptive Statistics
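Statistics

| | | R’s occupational prestige score (2010) | Age of respondent |
|---|---|---|---|
| N | Valid | 3873 | 3699 |
| | Missing | 159 | 333 |
| Mean | | 46.54 | 52.16 |
| Median | | 47.00 | 53.00 |
| Std. Deviation | | 13.811 | 17.233 |
| Variance | | 190.745 | 296.988 |
| Skewness | | .141 | .018 |
| Std. Error of Skewness | | .039 | .040 |
| Kurtosis | | -.809 | -1.018 |
| Std. Error of Kurtosis | | .079 | .080 |
| Range | | 64 | 71 |
| Minimum | | 16 | 18 |
| Maximum | | 80 | 89 |
| Percentiles | 25 | 35.00 | 37.00 |
| | 50 | 47.00 | 53.00 |
| | 75 | 59.00 | 66.00 |

Statistics

| | | R’s highest degree |
|---|---|---|
| N | Valid | 4009 |
| | Missing | 23 |
| Median | | 2.00 |
| Mode | | 1 |
| Range | | 4 |
| Minimum | | 0 |
| Maximum | | 4 |

R’s highest degree

| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | less than high school | 246 | 6.1 | 6.1 | 6.1 |
| | high school | 1597 | 39.6 | 39.8 | 46.0 |
| | associate/junior college | 370 | 9.2 | 9.2 | 55.2 |
| | bachelor’s | 1036 | 25.7 | 25.8 | 81.0 |
| | graduate | 760 | 18.8 | 19.0 | 100.0 |
| | Total | 4009 | 99.4 | 100.0 | |
| Missing | System | 23 | .6 | | |
| Total | | 4032 | 100.0 | | |

Statistics

| | | Was r born in this country |
|---|---|---|
| N | Valid | 3960 |
| | Missing | 72 |
| Mean | | 1.11 |
| Mode | | 1 |

Was r born in this country

| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | yes | 3516 | 87.2 | 88.8 | 88.8 |
| | no | 444 | 11.0 | 11.2 | 100.0 |
| | Total | 3960 | 98.2 | 100.0 | |
| Missing | System | 72 | 1.8 | | |
| Total | | 4032 | 100.0 | | |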

To display these descriptive statistics in a paper, one might create a table like Table 2. Note that for discrete variables, the table reports the value label rather than the numeric value alone.

Table 2. Descriptive Statistics
| | Occupational Prestige Score | Age | Highest Degree Earned | Born in This Country? |
|---|---|---|---|---|
| Mean | 46.54 | 52.16 | | 1.11 |
| Median | 47 | 53 | 2: Associate’s (9.2%) | |
| Mode | | | 1: High School (39.8%) | 1: Yes (88.8%) |
| Standard Deviation | 13.811 | 17.233 | | |
| Variance | 190.745 | 296.988 | | |
| Skewness | 0.141 | 0.018 | | |
| Kurtosis | -0.809 | -1.018 | | |
| Range | 64 (16-80) | 71 (18-89) | Less than High School (0) – Graduate (4) | |
| Interquartile Range | 35-59 | 37-66 | | |
| N | 3873 | 3699 | 4009 | 3960 |

If we were then to discuss our descriptive statistics in a quantitative paper, we might write something like this (note that we do not need to repeat every single detail from the table, as readers can peruse the table themselves):

This analysis relies on four variables from the 2021 General Social Survey: occupational prestige score, age, highest degree earned, and whether the respondent was born in the United States. Descriptive statistics for all four variables are shown in Table 2. The median occupational prestige score is 47, with a range from 16 to 80. The middle 50% of respondents had occupational prestige scores between 35 and 59. The median age of respondents is 53, with a range from 18 to 89. The middle 50% of respondents are between ages 37 and 66. Both variables have little skew. Highest degree earned ranges from less than high school to a graduate degree; the median respondent has earned an associate’s degree, while the modal response (given by 39.8% of the respondents) is a high school degree. Among respondents, 88.8% were born in the United States.
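The median, mode, and valid percentages reported for the discrete variables can be checked by hand from the frequency counts in the SPSS output in Table 1. The following is a minimal Python sketch of that check, not a replacement for the SPSS procedure; the function and constant names are hypothetical, and only the counts and value codes come from the output above.

```python
from itertools import accumulate

# Value labels and frequency counts for "R's highest degree",
# copied from the SPSS output in Table 1 (codes run from 0 to 4).
DEGREE_LABELS = {
    0: "less than high school",
    1: "high school",
    2: "associate/junior college",
    3: "bachelor's",
    4: "graduate",
}
DEGREE_COUNTS = {0: 246, 1: 1597, 2: 370, 3: 1036, 4: 760}

# Counts for "Was r born in this country" (1 = yes, 2 = no), also from Table 1.
BORN_COUNTS = {1: 3516, 2: 444}


def valid_percent(counts):
    """Percentage of valid (non-missing) cases falling in each category."""
    n = sum(counts.values())
    return {code: 100 * count / n for code, count in counts.items()}


def mode(counts):
    """Code of the most frequent category."""
    return max(counts, key=counts.get)


def median(counts):
    """Median code of an ordinal variable, located via cumulative counts."""
    n = sum(counts.values())
    codes = sorted(counts)
    cumulative = list(accumulate(counts[code] for code in codes))

    def value_at(position):  # position is 1-based among the sorted cases
        return next(c for c, cum in zip(codes, cumulative) if cum >= position)

    # With an even number of cases, average the two middle cases.
    return (value_at((n + 1) // 2) + value_at((n + 2) // 2)) / 2


def mean(counts):
    """Mean of the numeric codes, weighted by frequency."""
    n = sum(counts.values())
    return sum(code * count for code, count in counts.items()) / n


pct = valid_percent(DEGREE_COUNTS)
med = int(median(DEGREE_COUNTS))  # a whole-number code for these data
mod = mode(DEGREE_COUNTS)
print(f"Median: {DEGREE_LABELS[med]} ({pct[med]:.1f}%)")
print(f"Mode: {DEGREE_LABELS[mod]} ({pct[mod]:.1f}%)")
print(f"Mean of born-in-country codes: {mean(BORN_COUNTS):.2f}")
```

Working from the codes makes it easy to confirm that the label placed in the published table matches the value SPSS reports: here the median code is 2 and the modal code is 1.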

Crosstabulation

When presenting the results of a crosstabulation, we simplify the table so that it highlights the most important information—the column percentages—and include the significance and association below the table. Consider the SPSS output below.

Table 3. R’s highest degree * R’s subjective class identification Crosstabulation
| | | | R’s subjective class identification | | | | Total |
|---|---|---|---|---|---|---|---|
| | | | lower class | working class | middle class | upper class | |
| R’s highest degree | less than high school | Count | 65 | 106 | 68 | 7 | 246 |
| | | % within R’s subjective class identification | 18.8% | 7.1% | 3.4% | 4.2% | 6.2% |
| | high school | Count | 217 | 800 | 551 | 23 | 1591 |
| | | % within R’s subjective class identification | 62.9% | 53.7% | 27.6% | 13.9% | 39.8% |
| | associate/junior college | Count | 30 | 191 | 144 | 3 | 368 |
| | | % within R’s subjective class identification | 8.7% | 12.8% | 7.2% | 1.8% | 9.2% |
| | bachelor’s | Count | 27 | 269 | 686 | 49 | 1031 |
| | | % within R’s subjective class identification | 7.8% | 18.1% | 34.4% | 29.5% | 25.8% |
| | graduate | Count | 6 | 123 | 546 | 84 | 759 |
| | | % within R’s subjective class identification | 1.7% | 8.3% | 27.4% | 50.6% | 19.0% |
| Total | | Count | 345 | 1489 | 1995 | 166 | 3995 |
| | | % within R’s subjective class identification | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |

Chi-Square Tests

| | Value | df | Asymptotic Significance (2-sided) |
|---|---|---|---|
| Pearson Chi-Square | 819.579ᵃ | 12 | <.001 |
| Likelihood Ratio | 839.200 | 12 | <.001 |
| Linear-by-Linear Association | 700.351 | 1 | <.001 |
| N of Valid Cases | 3995 | | |

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 10.22.

Symmetric Measures

| | | Value | Asymptotic Standard Errorᵃ | Approximate Tᵇ | Approximate Significance |
|---|---|---|---|---|---|
| Interval by Interval | Pearson’s R | .419 | .013 | 29.139 | <.001ᶜ |
| Ordinal by Ordinal | Spearman Correlation | .419 | .013 | 29.158 | <.001ᶜ |
| N of Valid Cases | | 3995 | | | |

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Based on normal approximation.

Table 4 shows how a table suitable for inclusion in a paper might look if created from the SPSS output in Table 3. Note that we use asterisks to indicate the significance level of the results: * means p < 0.05; ** means p < 0.01; *** means p < 0.001; and no stars means p ≥ 0.05 (and thus that the result is not significant). Also note that N is the abbreviation for the number of respondents.

 
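Table 4. Highest Degree Earned by Subjective Class Identification

| | | Respondent’s Subjective Class Identification | | | | |
|---|---|---|---|---|---|---|
| | | Lower Class | Working Class | Middle Class | Upper Class | Total |
| Highest Degree Earned | Less than High School | 18.8% | 7.1% | 3.4% | 4.2% | 6.2% |
| | High School | 62.9% | 53.7% | 27.6% | 13.9% | 39.8% |
| | Associate’s / Junior College | 8.7% | 12.8% | 7.2% | 1.8% | 9.2% |
| | Bachelor’s | 7.8% | 18.1% | 34.4% | 29.5% | 25.8% |
| | Graduate | 1.7% | 8.3% | 27.4% | 50.6% | 19.0% |

N: 3995
Spearman Correlation: 0.419***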

If we were going to discuss the results of this crosstabulation in a quantitative research paper, the discussion might look like this:

A crosstabulation of respondent’s class identification and their highest degree earned, with class identification as the independent variable, is significant, with a Spearman correlation of 0.419, as shown in Table 4. Among lower class and working class respondents, more than 50% had earned a high school degree. Less than 20% of lower class respondents and less than 40% of working class respondents had earned more than a high school degree. In contrast, the majority of middle class and upper class respondents had earned at least a bachelor’s degree. In fact, just over 50% of upper class respondents had earned a graduate degree.
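The column percentages and the Pearson chi-square statistic in the SPSS output can be recomputed directly from the observed counts in Table 3. The sketch below shows one way to do so in plain Python, assuming the counts are typed in by hand; the function names are hypothetical, and the p-value is left to a statistics package because the Python standard library has no chi-square distribution.

```python
# Observed counts from the SPSS crosstabulation in Table 3.
# Rows: highest degree earned. Columns: lower, working, middle, upper class.
ROW_LABELS = [
    "Less than High School",
    "High School",
    "Associate's / Junior College",
    "Bachelor's",
    "Graduate",
]
OBSERVED = [
    [65, 106, 68, 7],
    [217, 800, 551, 23],
    [30, 191, 144, 3],
    [27, 269, 686, 49],
    [6, 123, 546, 84],
]


def column_percentages(table):
    """Each cell as a percentage of its column total."""
    col_totals = [sum(col) for col in zip(*table)]
    return [
        [100 * cell / total for cell, total in zip(row, col_totals)]
        for row in table
    ]


def chi_square(table):
    """Pearson chi-square statistic and its degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


for label, row in zip(ROW_LABELS, column_percentages(OBSERVED)):
    print(label, " ".join(f"{value:.1f}%" for value in row))

statistic, df = chi_square(OBSERVED)
print(f"Chi-square = {statistic:.3f}, df = {df}")
```

Only the column percentages make it into the published table; the chi-square calculation is included here to show where the significance test behind the asterisks comes from.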

Correlation

When presenting a correlation matrix, one of the most important things to note is that we only present half the table so as not to include duplicated results. Think of the diagonal line through the table where the cells represent the correlation between a variable and itself, and include only the triangle of data either above or below that line of cells. Consider the output in Table 5.

Table 5. SPSS Output: Correlations
| | | Age of respondent | R’s occupational prestige score (2010) | Highest year of school R completed | R’s family income in 1986 dollars |
|---|---|---|---|---|---|
| Age of respondent | Pearson Correlation | 1 | .087** | .014 | .017 |
| | Sig. (2-tailed) | | <.001 | .391 | .314 |
| | N | 3699 | 3571 | 3683 | 3336 |
| R’s occupational prestige score (2010) | Pearson Correlation | .087** | 1 | .504** | .316** |
| | Sig. (2-tailed) | <.001 | | <.001 | <.001 |
| | N | 3571 | 3873 | 3817 | 3399 |
| Highest year of school R completed | Pearson Correlation | .014 | .504** | 1 | .360** |
| | Sig. (2-tailed) | .391 | <.001 | | <.001 |
| | N | 3683 | 3817 | 3966 | 3497 |
| R’s family income in 1986 dollars | Pearson Correlation | .017 | .316** | .360** | 1 |
| | Sig. (2-tailed) | .314 | <.001 | <.001 | |
| | N | 3336 | 3399 | 3497 | 3509 |

**. Correlation is significant at the 0.01 level (2-tailed).

Table 6 shows what the contents of Table 5 might look like when a table is constructed in a fashion suitable for publication.

Table 6. Correlation Matrix
| | Age | Occupational Prestige Score | Highest Year of School Completed | Family Income in 1986 Dollars |
|---|---|---|---|---|
| Age | 1 | | | |
| Occupational Prestige Score | 0.087*** | 1 | | |
| Highest Year of School Completed | 0.014 | 0.504*** | 1 | |
| Family Income in 1986 Dollars | 0.017 | 0.316*** | 0.360*** | 1 |

If we were to discuss the results of this bivariate correlation analysis in a quantitative paper, the discussion might look like this:

Bivariate correlations were run among variables measuring age, occupational prestige, the highest year of school respondents completed, and family income in constant 1986 dollars, as shown in Table 6. Correlations between age and highest year of school completed and between age and family income are not significant. All other correlations are positive and significant at the p<0.001 level. The correlation between age and occupational prestige is weak; the correlations between income and occupational prestige and between income and educational attainment are moderate, and the correlation between education and occupational prestige is strong.
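Turning the full SPSS matrix into a publication-style lower triangle is a mechanical step: keep the cells below the diagonal, add a leading zero, and attach asterisks based on the reported significance. The following Python sketch illustrates that step under the asterisk convention described earlier in this chapter. The helper names `stars` and `lower_triangle` are hypothetical, and the correlations and significance strings are typed in from Table 5.

```python
LABELS = [
    "Age",
    "Occupational Prestige Score",
    "Highest Year of School Completed",
    "Family Income in 1986 Dollars",
]

# Pearson correlations as printed in the SPSS output (Table 5).
R = [
    [1.0, 0.087, 0.014, 0.017],
    [0.087, 1.0, 0.504, 0.316],
    [0.014, 0.504, 1.0, 0.360],
    [0.017, 0.316, 0.360, 1.0],
]

# Two-tailed significance exactly as SPSS prints it; None marks the diagonal.
SIG = [
    [None, "<.001", ".391", ".314"],
    ["<.001", None, "<.001", "<.001"],
    [".391", "<.001", None, "<.001"],
    [".314", "<.001", "<.001", None],
]


def stars(sig):
    """Convert an SPSS significance string such as '<.001' or '.391' to asterisks."""
    strict = sig.startswith("<")  # '<.001' means p is below the bound shown
    p = float(sig.lstrip("<"))
    for threshold, mark in ((0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p < threshold or (strict and p <= threshold):
            return mark
    return ""  # not significant: no asterisk


def lower_triangle(labels, r, sig):
    """Rows of the lower triangle, each ending with the diagonal value 1."""
    rows = []
    for i, label in enumerate(labels):
        cells = [f"{r[i][j]:.3f}{stars(sig[i][j])}" for j in range(i)]
        rows.append([label, *cells, "1"])
    return rows


for row in lower_triangle(LABELS, R, SIG):
    print(" | ".join(row))
```

Reading the asterisks from the significance values, rather than copying the two asterisks SPSS prints, is what lets the published table distinguish p < 0.01 from p < 0.001.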

Regression

To present the results of a regression, we create one table that includes all of the key information from the multiple tables of SPSS output. This includes the R² and significance of the regression, either the B or the beta values (different analysts have different preferences here) for each variable, and the standard error and significance of each variable. Consider the SPSS output in Table 7.

Table 7. SPSS Output: Regression
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .395ᵃ | .156 | .155 | 36729.04841 |

a. Predictors: (Constant), Highest year of school R completed, Age of respondent, R’s occupational prestige score (2010)

ANOVAᵃ

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 805156927306.583 | 3 | 268385642435.528 | 198.948 | <.001ᵇ |
| | Residual | 4351948187487.015 | 3226 | 1349022996.741 | | |
| | Total | 5157105114793.598 | 3229 | | | |

a. Dependent Variable: R’s family income in 1986 dollars
b. Predictors: (Constant), Highest year of school R completed, Age of respondent, R’s occupational prestige score (2010)

Coefficientsᵃ

| Model | | Unstandardized Coefficients | | Standardized Coefficients | t | Sig. | Collinearity Statistics | |
|---|---|---|---|---|---|---|---|---|
| | | B | Std. Error | Beta | | | Tolerance | VIF |
| 1 | (Constant) | -44403.902 | 4166.576 | | -10.657 | <.001 | | |
| | Age of respondent | 9.547 | 38.733 | .004 | .246 | .805 | .993 | 1.007 |
| | R’s occupational prestige score (2010) | 522.887 | 54.327 | .181 | 9.625 | <.001 | .744 | 1.345 |
| | Highest year of school R completed | 3988.545 | 274.039 | .272 | 14.555 | <.001 | .747 | 1.339 |

a. Dependent Variable: R’s family income in 1986 dollars

The regression output shown in Table 7 contains a lot of information. We do not include all of this information when making tables suitable for publication. As can be seen in Table 8, we include the Beta (or the B), the standard error, and the significance asterisk for each variable; the R² and significance for the overall regression; the degrees of freedom (which tell readers the sample size, since the total degrees of freedom equal N − 1); the constant; and the key to p/significance values.

Table 8. Regression Results for Dependent Variable Family Income in 1986 Dollars
| | Beta & SE |
|---|---|
| Age | 0.004 (38.733) |
| Occupational Prestige Score | 0.181*** (54.327) |
| Highest Year of School Completed | 0.272*** (274.039) |
| R² | 0.156*** |
| Degrees of Freedom | 3229 |
| Constant | -44,403.902 |

* p<0.05 **p<0.01 ***p<0.001
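The summary figures carried over from the SPSS output (R², the F statistic, and the sample size implied by the degrees of freedom) are all derived from the ANOVA table. As a rough check, the Python sketch below recomputes them from the sums of squares and degrees of freedom printed in Table 7; the variable names are hypothetical, and the formulas are the standard ones for ordinary least squares regression.

```python
import math

# Values copied from the ANOVA table in the SPSS output (Table 7).
SS_REGRESSION = 805156927306.583
SS_RESIDUAL = 4351948187487.015
DF_REGRESSION = 3    # number of independent variables
DF_RESIDUAL = 3226   # N minus the number of independent variables minus 1

ss_total = SS_REGRESSION + SS_RESIDUAL
df_total = DF_REGRESSION + DF_RESIDUAL  # equals N - 1
n = df_total + 1                        # cases used in the regression

# Share of the variance in the dependent variable explained by the model.
r_squared = SS_REGRESSION / ss_total

# Adjusted R-squared penalizes the model for the number of predictors.
adjusted_r_squared = 1 - (1 - r_squared) * df_total / DF_RESIDUAL

# F is the ratio of the regression mean square to the residual mean square.
mean_square_residual = SS_RESIDUAL / DF_RESIDUAL
f_statistic = (SS_REGRESSION / DF_REGRESSION) / mean_square_residual

# The standard error of the estimate is the square root of the residual mean square.
std_error_of_estimate = math.sqrt(mean_square_residual)

print(f"N = {n}")
print(f"R squared = {r_squared:.3f}")
print(f"Adjusted R squared = {adjusted_r_squared:.3f}")
print(f"F = {f_statistic:.3f}")
print(f"Std. error of the estimate = {std_error_of_estimate:.3f}")
```

A check like this is a quick way to confirm that the numbers transcribed into a publication table still agree with one another.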

If we were to discuss the results of this regression in a quantitative paper, the results might look like this:

Table 8 shows the results of a regression in which age, occupational prestige, and highest year of school completed are the independent variables and family income is the dependent variable. The regression results are significant, and all of the independent variables taken together explain 15.6% of the variance in family income. Age is not a significant predictor of income, while occupational prestige and educational attainment are. Educational attainment has a larger effect on family income than does occupational prestige. For every year of additional education attained, family income goes up on average by $3,988.55; for every one-unit increase in occupational prestige score, family income goes up on average by $522.89.[1]
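The dollar figures in this kind of interpretation come from the unstandardized B coefficients, which define the regression equation. The sketch below, offered only as an illustration, plugs the B values and constant from the SPSS output in Table 7 into that equation; the function name and the example respondent are hypothetical and are not drawn from the dataset.

```python
# Unstandardized coefficients (B) and constant from the SPSS output in Table 7.
CONSTANT = -44403.902
B = {
    "age": 9.547,
    "prestige": 522.887,
    "education": 3988.545,
}


def predicted_income(age, prestige, education):
    """Predicted family income (1986 dollars) from the regression equation."""
    return (
        CONSTANT
        + B["age"] * age
        + B["prestige"] * prestige
        + B["education"] * education
    )


# A hypothetical respondent, chosen only to illustrate the calculation.
base = predicted_income(age=40, prestige=47, education=12)
one_more_year = predicted_income(age=40, prestige=47, education=13)

print(f"Predicted income with 12 years of school: ${base:,.2f}")
print(f"Change for one more year of school: ${one_more_year - base:,.2f}")
```

Holding age and occupational prestige constant, the difference between the two predictions is the B coefficient for education, which is why the B values rather than the Betas are used when describing effects in dollars.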

Exercises

  1. Choose two discrete variables and three continuous variables from a dataset of your choice. Produce appropriate descriptive statistics on all five of the variables and create a table of the results suitable for inclusion in a paper.
  2. Using the two discrete variables you have chosen, produce an appropriate crosstabulation, with significance and measure of association. Create a table of the results suitable for inclusion in a paper.
  3. Using the three continuous variables you have chosen, produce a correlation matrix. Create a table of the results suitable for inclusion in a paper.
  4. Using the three continuous variables you have chosen, produce a multivariate linear regression. Create a table of the results suitable for inclusion in a paper.
  5. Write a methods section describing the dataset, analytical methods, and variables you utilized in questions 1, 2, 3, and 4 and explaining the results of your descriptive analysis.
  6. Write a findings section explaining the results of the analyses you performed in questions 2, 3, and 4.

 


  1. Note that the actual numerical increase comes from the B values, which are shown in the SPSS output in Table 7 but not in the reformatted Table 8.

License


Social Data Analysis Copyright © 2021 by Mikaila Mariel Lemonik Arthur and Roger Clark is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.