## Exercise 15 – Data Analysis

In this exercise we are going to return to the General Social Survey and conduct a simple set of analyses.

**Codebooks: variables and attributes**

First, let’s talk a bit about the variables and the codebook for the GSS. Click on the Codebooks tab at the top of the page, then on the Standard Codebook. Next, click on Sequential Variable List. Here you will see a list of all of the topics that have been covered on the GSS surveys since 1972.

Let’s look first at some of the demographics. Find the heading Age, Gender, Race, and Ethnicity. Look through the variables that measure demographics.

- Which variable measures the race of the respondent?
- Click on RACE… what are the various attributes of this variable?
- Which variable measures the ethnicity of the respondent?
- Click on ETHNIC… why do you think the various attributes of this variable are not listed?
- Go back one level to the Headings for Sequential Variable List. Scan through the headings and find the heading that you think would best fit a measure of whether the respondent is working, retired, or unemployed.
- Click on WRKSTAT and tell me what percent of the respondents have been ‘in school’?
- Remember that these are not measures of the current population but an aggregate of respondents since the 1970s. For example, look now at the percent unemployed. What percentage does it report?
- Briefly, let’s see how accurate that is. Go to http://www.bls.gov/. According to the Bureau of Labor Statistics, what is the current national unemployment rate?

**Univariate Analysis**

Okay… let’s return now to the main GSS page http://sda.berkeley.edu/cgi-bin/hsda?harcsda+gss10. In the Variable Selection field, I want you to type AGE, then click View.

- Describe what you see.
- How many total respondents were there on this survey?
- How many 18-year-olds answered the survey?
- How many were 89 or older?
- What age accounted for the largest percentage of the data set (hint – there is more than one)?
- How many people have not answered this question?
- How many people don’t know how old they are?
- What was the average age of the respondents?
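
For anyone curious about what the frequency table is computing, the following is a minimal Python sketch of a univariate summary. The ages are invented for illustration and are not GSS data, and the names `ages`, `freq`, `modes`, and `average_age` are hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical ages for ten respondents (NOT real GSS data).
ages = [18, 23, 23, 35, 35, 41, 52, 67, 89, 89]

# Frequency distribution: how many respondents report each age.
freq = Counter(ages)
total = len(ages)

# Print each age with its count and its share of all respondents.
for age in sorted(freq):
    print(f"{age:>3}  n={freq[age]}  {100 * freq[age] / total:.1f}%")

# The largest category can be tied, so collect every age with the top count.
top = max(freq.values())
modes = sorted(a for a, n in freq.items() if n == top)

average_age = mean(ages)
print("Most common age(s):", modes)
print("Mean age:", average_age)
```

In this toy data three ages tie for the largest share, which is the same situation the hint above describes.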

Let’s do another one. Back up to the main GSS page. We are now going to look at an opinion question. Type the variable name IMMJOBS in the Variable Selection field.

- What kind of measure is this? Nominal, Ordinal, Interval, or Ratio?
- How many people have answered this question?
- Why are there 52,504 IAP (non-responses)? (Hint: questions are often asked only once or only for a few years.)
- What percentage of respondents AGREE that immigrants take jobs away from people? (Note: you need to add the agreement categories together.)
- If you were to create a bar graph, which value would be tallest?
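
The "you need to add" step can be sketched in Python as follows. The counts are made up and are not GSS results, and the five category labels are an assumption about how an agreement item of this kind is coded; check the codebook for the actual attributes.

```python
# Hypothetical counts for a five-point agreement item (NOT real GSS results).
# The category labels are assumed for illustration.
counts = {
    "Strongly agree": 10,
    "Agree": 30,
    "Neither": 20,
    "Disagree": 25,
    "Strongly disagree": 15,
}

# Valid responses only: IAP, don't know, and no answer are assumed
# to have been excluded before this point.
valid_n = sum(counts.values())

# Percentage distribution across the valid responses.
percents = {label: 100 * n / valid_n for label, n in counts.items()}

# Overall agreement = strongly agree + agree.
agree_total = percents["Strongly agree"] + percents["Agree"]

# The tallest bar in a bar chart is the category with the largest count.
tallest = max(counts, key=counts.get)

print(f"Agree (combined): {agree_total:.1f}%")
print("Tallest bar:", tallest)
```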

Just to show you another trick, let’s back up to the main GSS page again. This time, take the variable IMMJOBS and type it into the ROW field. Under ‘Chart Options’, change from Stacked Bar Chart to Bar Chart. Now click Run the Table. You end up with similar results, but in an easy-to-read format.

**Bivariate Analysis**

As you have read in the text, one way of conducting bivariate analysis is by means of a contingency table. Crosstabs are one form of contingency table in which the independent variable is usually presented across the top and the dependent along the side.
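
To make the layout concrete, here is a small Python sketch of a crosstab with column percentages, where each cell is expressed as a share of its column (the independent variable's category). The eight respondents are invented and are not GSS data, and the names `respondents`, `cell_counts`, and `col_pct` are hypothetical.

```python
from collections import Counter

# Hypothetical (independent, dependent) pairs, one per respondent (NOT GSS data).
respondents = [
    ("Male", "Agree"), ("Male", "Agree"),
    ("Male", "Disagree"), ("Male", "Disagree"),
    ("Female", "Agree"), ("Female", "Disagree"),
    ("Female", "Disagree"), ("Female", "Disagree"),
]

cell_counts = Counter(respondents)                 # count in each (column, row) cell
col_totals = Counter(iv for iv, _ in respondents)  # count in each column

columns = ["Male", "Female"]   # independent variable across the top
rows = ["Agree", "Disagree"]   # dependent variable down the side

# Column percentages: each cell as a share of its column total.
col_pct = {
    (iv, dv): 100 * cell_counts[(iv, dv)] / col_totals[iv]
    for iv in columns
    for dv in rows
}

# Print the table with the independent variable's categories as column headers.
print(f"{'':10}" + "".join(f"{c:>10}" for c in columns))
for dv in rows:
    print(f"{dv:10}" + "".join(f"{col_pct[(c, dv)]:>9.1f}%" for c in columns))
```

Because each column sums to 100%, you read the table by comparing percentages across a row: which category of the independent variable has the higher share in that row.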

Now, let’s construct a crosstab with the dependent variable IMMJOBS in the rows and the independent variable SEX in the columns. Click Run the Table.

- What is your interpretation of the table?
- Think in terms of Agreement and Disagreement… who agrees more?

Let’s do another one. Construct a crosstab with the dependent variable IMMJOBS in the rows and the independent variable SEX in the columns. Click Run the Table.

- What is your interpretation of this table?

Just one more. Construct a crosstab with the dependent variable IMMJOBS in the rows and the independent variable DEGREE (measuring the respondent’s highest level of education) in the columns. Click Run the Table.

- What is your interpretation of this table?

**Assignment**

Find the variables that most closely represent the independent and dependent variables in your own research project. Do the following:

- List the variables and their attributes. Indicate which is the dependent and which the independent variable.
- Run a univariate analysis using the second method above.
- If the variable is ratio or interval, give the mean, the lowest and highest values, and the percent missing.
- If the variable is ordinal or nominal, give the distribution of responses (as percentages). Also, copy and paste a bar chart into your response.
- As long as both of your variables are ordinal or nominal (and have no more than five or so attributes), you can run a crosstab of your variables. Do so and interpret the results. If your independent variable has many more attributes than that (like Age), why wouldn’t you want to make a crosstab?
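
For an interval or ratio variable, the four numbers asked for above can be sketched in Python like this. The values are invented and are not GSS data, and using `None` to mark a missing answer is an assumption made for this example.

```python
from statistics import mean

# Hypothetical interval/ratio responses; None marks a missing answer (NOT GSS data).
values = [12, 16, None, 14, 20, None, 8, 16, 12, 18]

# Keep only the valid (non-missing) responses for the mean and range.
valid = [v for v in values if v is not None]

summary = {
    "mean": mean(valid),
    "min": min(valid),
    "max": max(valid),
    # Missing share is taken over ALL respondents, not just the valid ones.
    "percent_missing": 100 * (len(values) - len(valid)) / len(values),
}
print(summary)
```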