Summary results files for the USC Understanding Coronavirus in America
tracking survey.

The files posted here contain the data that are plotted in the graphs,
plus some additional information. The filenames indicate which information
is available in which file:
- usa indicates national estimates, lac indicates estimates for Los Angeles
County.
- 7d indicates that the estimates are computed for a 7-day period.
- age, edu, inc, rac, sex indicate breakdowns by demographics: age group,
educational attainment, household income, race-ethnicity, and sex, respectively.
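
As an illustration of the naming scheme above, the following minimal Python
sketch classifies the parts of a filename. It is not part of the posted files.
The function name "describe_filename" is hypothetical, and the sketch assumes
the parts are separated by underscores, as in "usa_sex_7d.csv".

```python
from pathlib import Path

# Codes taken from the list above.
REGIONS = {"usa": "national", "lac": "Los Angeles County"}
DEMOGRAPHICS = {
    "age": "age group",
    "edu": "educational attainment",
    "inc": "household income",
    "rac": "race-ethnicity",
    "sex": "sex",
}

def describe_filename(filename):
    """Classify the underscore-separated parts of a results filename."""
    info = {"region": None, "demographic": None, "seven_day": False}
    # Assumption: parts are separated by underscores; their order is not relied on.
    for token in Path(filename).stem.split("_"):
        if token in REGIONS:
            info["region"] = REGIONS[token]
        elif token in DEMOGRAPHICS:
            info["demographic"] = DEMOGRAPHICS[token]
        elif token == "7d":
            info["seven_day"] = True
    return info

print(describe_filename("usa_sex_7d.csv"))
```

A filename without a demographic code leaves "demographic" as None, which
corresponds to a file without demographic breakdowns.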

The files all have the same structure:
- They are tab-delimited text files.
- The first row contains the variable names.
- The remaining rows have results for one date each. (A "date" is the last
  day of a 7-day rolling window, so e.g., "04/07/2020" uses observations
  from 04/01/2020-04/07/2020.)
- The first column is the date (MM/DD/2020).
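
The structure listed above could be read along the following lines. This is a
minimal sketch using only the Python standard library. The two data rows are
made up for illustration and are not real survey results, and the header name
"date" and the function name "read_results" are assumptions.

```python
import csv
import io
from datetime import datetime, timedelta

# Made-up sample in the documented layout: tab-delimited, header row first,
# date in the first column. These values are NOT real survey results.
SAMPLE = (
    "date\tN_probcr003\tavg_probcr003\n"
    "04/07/2020\t100\t0.5\n"
    "04/08/2020\t110\t0.6\n"
)

def read_results(handle):
    """Return one dict per date, with the 7-day window bounds attached."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)  # first row: variable names
    rows = []
    for record in reader:
        row = dict(zip(header, record))
        # The first column holds the LAST day of the 7-day rolling window.
        end = datetime.strptime(record[0], "%m/%d/%Y").date()
        row["window_start"] = end - timedelta(days=6)
        row["window_end"] = end
        rows.append(row)
    return rows

rows = read_results(io.StringIO(SAMPLE))
print(rows[0]["window_start"], rows[0]["window_end"])  # 2020-04-01 2020-04-07
```

For a downloaded file, the same function can be given an open file handle
instead of the in-memory sample.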

The remaining columns are organized in blocks of variables. For the files
that do not include demographic breakdowns, each block consists of five
columns: "N" contains the sample size for the estimate, "avg" contains the
estimate itself (an average or percentage, both designated by "avg"),
"se" contains the standard error, and "lo" and "up" contain the lower and
upper limit of the confidence interval, respectively. 

Each column name consists of one of these designators, followed by an
underscore and the name of the outcome, which has the form "probxxx".
"prob" indicates it is a recoded or derived variable.
All variables here are considered derived variables. In most cases, this is
because a categorical variable is recoded to one or more binary variables.
In some cases, the variable is unprocessed, but named "prob" here for
consistency. The "prob" part originates in the first few variables we worked
on, which were probability variables or percentages, but it is used for all
variables here, so should just be interpreted as "derived variable". The
"xxx" in "probxxx" contains the name of the variable in the raw data (see
the codebooks of the surveys for descriptions of these variables). For
example, "probcr001a" is a recode of the variable "cr001a" from the raw data
file. Categorical variables are typically recoded such that "1" means "yes",
"agree", etc., "0" means "no", "disagree", etc., and "unsure" is set to missing.

Occasionally, a derived variable is based on more than one original variable.
In such a case, we have adapted the name slightly to indicate this; examples
are "probcr027anx", which is an anxiety score computed as the sum of recoded
versions of "cr027a" and "cr027b", and "problr010to012gross", which is a
gross earnings variable derived from "lr010", "lr011", and "lr012". Also,
occasionally, we split a categorical variable into dummies for the separate
categories, e.g., "problr001c1"-"problr001c6" are the dummies indicating
categories 1-6 of the original categorical variable "lr001".
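
The naming convention for files without demographic breakdowns can be
illustrated with a short Python sketch. The function name "split_column" is
hypothetical. Note that the part after "prob" is the raw variable name only in
the simple case; for names such as "probcr027anx" it is an adapted name.

```python
# The five designators described above.
STATS = ("N", "avg", "se", "lo", "up")

def split_column(name):
    """Split e.g. 'avg_probcr001a' into ('avg', 'probcr001a', 'cr001a')."""
    stat, _, outcome = name.partition("_")
    if stat not in STATS or not outcome.startswith("prob"):
        raise ValueError(f"unexpected column name: {name!r}")
    # Whatever follows "prob": the raw variable name in the simple case,
    # or an adapted name (e.g. "cr027anx") for compound variables.
    suffix = outcome[len("prob"):]
    return stat, outcome, suffix

print(split_column("avg_probcr001a"))  # ('avg', 'probcr001a', 'cr001a')
print(split_column("se_probcr027anx"))  # ('se', 'probcr027anx', 'cr027anx')
```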

In the files containing demographic breakdowns, this structure is further
elaborated: "Ntot_probxxx" has the total sample size for the estimates for
variable "probxxx" across all categories of the demographic. After this,
there is a block of the five aforementioned columns for each category of the
demographic, with the category number preceding the outcome in the column
name. For example, in the file "usa_sex_7d.csv", the column "avg_1_probcr003"
has the estimates for the outcome "probcr003" for category "1" of demographic
"sex", i.e., men, and "N_1_probcr003" is the sample size for this estimate.
So, for example, Ntot_probcr003 = N_1_probcr003 + N_2_probcr003.
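
The column names in the breakdown files, and the relation between "Ntot" and
the per-category sample sizes, can be sketched in Python as follows. The
function names are hypothetical and the row is made up for illustration (it
does not contain real survey results). The sketch also assumes every sample
size cell is filled in, so missing values would need extra handling.

```python
import re

# Matches e.g. 'avg_1_probcr003' (category block) or 'Ntot_probcr003' (total).
COLUMN = re.compile(r"^(?:(N|avg|se|lo|up)_(\d+)|(Ntot))_(prob\w+)$")

def parse_breakdown_column(name):
    """Return (statistic, category, outcome), or None if the name does not fit."""
    match = COLUMN.match(name)
    if match is None:
        return None  # e.g. the date column
    stat, category, total, outcome = match.groups()
    if total:
        return "Ntot", None, outcome
    return stat, int(category), outcome

def totals_agree(row, outcome):
    """Check that Ntot_<outcome> equals the sum of the per-category N columns."""
    per_category = 0
    for name, value in row.items():
        parsed = parse_breakdown_column(name)
        if parsed and parsed[0] == "N" and parsed[2] == outcome:
            per_category += int(value)
    return int(row[f"Ntot_{outcome}"]) == per_category

# Made-up row in the layout of a file broken down by sex.
ROW = {
    "date": "04/07/2020",
    "Ntot_probcr003": "210",
    "N_1_probcr003": "100",
    "avg_1_probcr003": "0.5",
    "N_2_probcr003": "110",
    "avg_2_probcr003": "0.6",
}

print(parse_breakdown_column("avg_1_probcr003"))  # ('avg', 1, 'probcr003')
print(totals_agree(ROW, "probcr003"))  # True
```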

The demographic categories are:

sex: 1 Men
sex: 2 Women

rac: 1 White (alone; non-Hispanic)
rac: 2 Black (alone; non-Hispanic)
rac: 3 Other (and mixed; non-Hispanic)
rac: 4 Hispanic (any race or mixed)

age: 1 18-39
age: 2 40-50
age: 3 51-64
age: 4 65+

edu: 1 High school or less
edu: 2 Some College
edu: 3 Bachelor's or more

inc: 1 0-29,999
inc: 2 30,000-59,999
inc: 3 60,000-99,999
inc: 4 100,000 or more
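
For convenience, the category list above can be written as a lookup table. The
following Python sketch is only an illustration, and the names
"CATEGORY_LABELS" and "category_label" are hypothetical.

```python
# Category numbers and labels as listed above, keyed by demographic code.
CATEGORY_LABELS = {
    "sex": {1: "Men", 2: "Women"},
    "rac": {
        1: "White (alone; non-Hispanic)",
        2: "Black (alone; non-Hispanic)",
        3: "Other (and mixed; non-Hispanic)",
        4: "Hispanic (any race or mixed)",
    },
    "age": {1: "18-39", 2: "40-50", 3: "51-64", 4: "65+"},
    "edu": {1: "High school or less", 2: "Some College", 3: "Bachelor's or more"},
    "inc": {1: "0-29,999", 2: "30,000-59,999", 3: "60,000-99,999", 4: "100,000 or more"},
}

def category_label(demographic, category):
    """Look up the label for a category number, e.g. ('sex', 1) -> 'Men'."""
    return CATEGORY_LABELS[demographic][category]

print(category_label("sex", 1))  # Men
```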

The csv files that can be downloaded from the context menu of each graph
contain the relevant subset of one of these files that is depicted in
the particular graph. For example, the graph for "currently has a job"
only has the date and the N, avg, se, lo, and up for the outcome problr004.