In this worksheet we work through several ways to create summary statistic tables in Stata, moving from the most basic approach to a polished, publication-ready table. We use the built-in nlsw88 dataset (National Longitudinal Survey of Women, 1988). Our binary grouping variable is union (1 = union member, 0 = not). Recall that for the homework you can also build this table in Excel, but if you would like to learn how to do it in Stata, here are several approaches:


Setup

Before we begin, let’s load the data and see what we’re working with.

sysuse nlsw88, clear

describe union wage age tenure ttl_exp hours

Output:

Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
union           byte    %8.0g      unionlbl   Union worker
wage            float   %9.0g                 Hourly wage
age             byte    %8.0g                 Age in current year
tenure          float   %9.0g                 Job tenure (years)
ttl_exp         float   %9.0g                 Total work experience (years)
hours           byte    %8.0g                 Usual hours worked

Let’s check how our grouping variable looks:

tab union
      Union |
     worker |      Freq.     Percent        Cum.
------------+-----------------------------------
   Nonunion |      1,417       75.45       75.45
      Union |        461       24.55      100.00
------------+-----------------------------------
      Total |      1,878      100.00

Of the 1,878 women with a recorded union status, about 75% are nonunion and about 25% are union members. Now let’s define a global with the variables we want to summarize:

global sumvars "wage age tenure ttl_exp hours"
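
To see what the global holds, it can help to print it before using it. The lines below are a minimal sketch using only built-in commands. The local macro at the end (the name sumvars_l is made up for illustration) is an alternative that this worksheet does not use; it is shown only for comparison.

* Print the contents of the global defined above
display "$sumvars"

* Stata substitutes the variable list wherever $sumvars appears,
* so this is the same as typing: summarize wage age tenure ttl_exp hours
summarize $sumvars

* Alternative (assumption: not used in this worksheet): a local macro.
* A local only exists while the do-file or selection that defines it is running,
* so run the next two lines together.
local sumvars_l "wage age tenure ttl_exp hours"
summarize `sumvars_l'

A global stays defined for the rest of the Stata session, which is convenient when running a worksheet piece by piece.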

Method 1: summarize with if (The Basics)

This is the simplest approach: run summarize twice, once for each group.

summarize $sumvars if union == 1
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        wage |        461    8.674294    4.174539    1.80602   39.23074
         age |        461    39.28416    3.022299         34         46
      tenure |        460    7.888225    6.105057          0   25.91667
     ttl_exp |        461    13.25391    4.553527   1.474359   25.98718
       hours |        461    38.65944    9.110139          2         70

summarize $sumvars if union == 0
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        wage |      1,417    7.204669    4.103694   1.151368   30.96618
         age |      1,417    39.20536    3.039435         34         46
      tenure |      1,408    6.140743    5.413678          0      24.75
     ttl_exp |      1,417    12.67667      4.6162   .1153846   28.88461
       hours |      1,416    37.26201    10.22723          1         80
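
The Obs column is not the same for every variable (for example, tenure has 460 observations in the union group while the other variables have 461) because summarize skips missing values one variable at a time. A minimal sketch for checking where values are missing, using only built-in commands and the global defined earlier:

* Count observations with no recorded union status; these fall in neither group
count if missing(union)

* Report missing-value counts for the summarized variables
misstable summarize $sumvars

* Caution: missing values satisfy union != 1, so this is NOT the nonunion group
summarize $sumvars if union != 1

Writing the condition as union == 0, as in the commands above, keeps observations with a missing union status out of the nonunion group.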

Pros: Very easy. No extra packages needed.