Exercise 2 – Correlations


Goals:

  • to understand variables and their attributes;
  • to understand the relationship between independent and dependent variables;
  • to introduce the GSS (later used for topic selection)… and to give practice using this online database.

NOTE: This exercise will be split into two parts due to its length.  Exercise 2a will be “Understanding variables in the GSS”.  Exercise 2b will be “Cross-tabs”.


Exercise 2a: Understanding variables in the GSS.

INTRO:
Think back to our first exercise and the 20 statements you wrote in response to the question “Who am I?”  We turned those responses into variables – something that varies across people.  For example, we looked at the number of times each person mentioned something related to social roles.  This varied across the people in this class – some of you had more of your statements relate to social roles than others.

Variables are one of the most important ideas in the social sciences.  We use variables to capture something about the world we’re interested in, and looking at the relationships between variables helps us find patterns in our lives.

Let’s look at some variables that social scientists commonly use: sex, race, and education level.  As Babbie shows in the textbook, each of these variables is defined by its attributes.  Think about a survey you’ve filled out – the attributes are the choices you’re given for your responses.  The attributes for “SEX” are most often “Male” and “Female”.  So, when you checked “Male” or “Female” on the survey, you were deciding which attribute of the variable “SEX” you fall under.

[NEXT]

STEP 1:
Take a minute and write down what you would use as attributes for “RACE” and “EDUCATION LEVEL” if you were the one deciding what choices to give on a survey.

Now, let’s look at how these variables are defined in a nationwide social survey.  The General Social Survey (GSS) asks questions about the social attitudes of thousands of people each year across the U.S.  Since the federal government helps fund this study, the data are available for anyone to use.  Let’s learn how to understand variables on the GSS website.

[NEXT]

STEP 2:
Open up the GSS website in a browser window: http://sda.berkeley.edu/cgi-bin/hsda?harcsda+gss10

Don’t worry about understanding everything on this page yet – we’ll work with the GSS throughout this course, and you’ll gradually become familiar with the different parts of this website.

For now, look at the top left of the page, under the “Variable Selection” section:

[NEXT]

STEP 3:
Let’s get used to how the GSS documents the variables you can use from the survey.  Each variable in the GSS has a specific name that is the quickest way to access information about that variable.  The variable name for the sex of the respondent is, unsurprisingly, “SEX”.

Type “sex” into the “Selected:” field and click on “View”:

In the window that pops up when you click “View”, pay attention to the “Description of the Variable” section.  This gives you important information about the variable, often including the exact wording of the question used from the survey to create this variable.  For “SEX”, the description just tells us that this variable codes the respondent’s sex.

Where are the attributes for SEX on this page?
[Click to reveal: That’s right, they’re in the section below the variable description.  The Label column gives us the categories that are used as attributes for this variable – in this case, “MALE” and “FEMALE”.]

The N column shows the number of respondents who have selected either MALE or FEMALE.  In fact, any time you see an uppercase “N” in a table, this just stands for “Number”.  As you might expect, the Percent column shows the percent of respondents who are MALE or FEMALE.  From 1972-2010, 44.0% of people who responded to the GSS were male and 56.0% were female.  Percents make it easier to compare categories across variables, as we’ll see later when we look at the relationship between two variables.

[NEXT]

STEP 4:
Okay, let’s look at the other variables you defined attributes for earlier.  The name of the main variable for race in the GSS is “RACE”.  (NOTE: Some variable names are straightforward, but other names can be abbreviated or are less intuitive.  Always check the “Description of the Variable” entry for each variable you want to use to make sure you know what question the variable is based on.)  Go ahead and type “race” in the “Selected:” field and click on “View”.

What survey question is the “RACE” variable based on?
[Reveal answer or quiz question: “What race do you consider yourself?”]

What are the attributes of the “RACE” variable?
[Reveal answer or quiz question: White, Black, and Other]

What is the number of respondents who have chosen “Other” as their attribute for race?

  1. 2,589
    1. FEEDBACK: Yes!  This is the number (N) that matches the “Other” label.
  2. 7,625
    1. FEEDBACK: No, this is the number (N) that matches the “Black” label.
  3. 4.7
    1. FEEDBACK: No, this is the percentage of respondents choosing “Other”.
  4. 44,873
    1. FEEDBACK: No, this is the number (N) that matches the “White” label.
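
The Percent column is simply each N divided by the total number of respondents.  As a minimal sketch in Python (nothing the SDA site asks you to do yourself), the three N values quoted in the quiz feedback reproduce the 4.7 percent figure for “Other”.  The function name `percent_table` is made up for this illustration.

```python
def percent_table(counts):
    """Return each category's share of the total, as a percentage."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}

# N values for RACE as quoted in the quiz feedback above.
race_n = {"WHITE": 44873, "BLACK": 7625, "OTHER": 2589}

race_pct = percent_table(race_n)
for label, pct in race_pct.items():
    print(f"{label:<6} N={race_n[label]:>6}  Percent={pct:.1f}")
```

The percentages always add up to 100 because every respondent falls under exactly one attribute.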

Do the attributes for the race variable in the GSS match what you wrote when you were thinking of attributes you would use for “RACE”?  What attributes match?  What attributes are different?

Do you think the attributes in the GSS race variable are adequate to capture the racial diversity across the entire U.S.?  The “RACE” variable is only one way to measure race – we’ll work through how to measure more difficult concepts in Module 4: Study Design.

[NEXT]

STEP 5:
Let’s look at a variable for education level to make sure you have the hang of using this part of the GSS site.  The name of the variable we’ll use is “DEGREE” – go ahead and type “degree” into the “Selected:” field and click on “View”.

[NOTE: Don’t show this figure in the learning object – make students produce this on the site to answer the questions]

[NOTE: Don’t need these to be true multiple choice questions that get recorded anywhere – just have the feedback as “accordion” when student clicks on a response]

Which of the following is not an attribute for DEGREE?

  1. GED certificate
    1. FEEDBACK: Yes, “GED certificate” is not one of the labels for DEGREE.
  2. high school
    1. FEEDBACK: No, “high school” is one of the labels for DEGREE.
  3. junior college
    1. FEEDBACK: No, “junior college” is one of the labels for DEGREE.
  4. graduate
    1. FEEDBACK: No, “graduate” is one of the labels for DEGREE.

What percentage of GSS respondents have earned a Bachelor’s degree?

  1. 5.3%
    1. FEEDBACK: No, match up the percentage on the left with its label on the right.
  2. 13.9%
    1. FEEDBACK: Correct!
  3. 22.5%
    1. FEEDBACK: No, match up the percentage on the left with its label on the right.
  4. 30%
    1. FEEDBACK: No, match up the percentage on the left with its label on the right.

As you’ve probably noticed, the labels used to describe attributes of a variable are often abbreviated and shortened.  The labels for DEGREE show some common abbreviations that you might find in the documentation of other variables: LT stands for “less than”, DK stands for “Don’t Know”, and NA stands for “No Answer”.  Surveys often let respondents choose something like “Don’t Know”, or skip a question, in case they don’t feel there is an attribute that adequately describes them or they don’t understand the question.

[NEXT]

Conclusion:
Good!  Now you should be starting to get familiar with what variables look like and how we define a variable using different attributes.  And you’ve seen part of the GSS website too – this is a very powerful resource for finding information about our social world.

In the following part of this exercise, you’ll see how to show whether two variables are related to each other.  At the end of this module, you’ll learn how to branch out and search for the variables that are the most interesting to you.


Exercise 2b: Cross-tabs

INTRO:
Tables are one of the simplest and most popular ways of seeing how two variables are related.  By the end of this exercise, you’ll be building and analyzing tables like this one:

Table 1: Cross-tab of Children of Undocumented Immigrants Should be Citizens by Sex

Wait, don’t be overwhelmed by all of the information in the table!  Reading a table is simple once you learn how.  For those of you who don’t love math (and also for those of you who do), the computer takes care of all the tabulations necessary to fill in the table.

Which number is bigger – 54.1% or 48.6%?  If you answered 54.1%, you have all the necessary math skills for this exercise.  Instead of spending our time calculating, we’ll focus on using a table to see patterns that emerge in people’s responses.

You’ll use these basic skills – creating and reading a table – in other exercises for this course. The ability to analyze a table will also make you a more critical consumer of statistical data throughout your life.

[NEXT]

STEP 1:
What is a hypothesis?
Many people think of a hypothesis generally as an “educated guess.”   That’s an okay place to start, but social scientists often use the word “hypothesis” for a much more specific purpose – stating that one variable causes a response in another variable.

In Table 1, the implied hypothesis is that your sex affects whether you believe children of undocumented immigrants born in the US should qualify for US citizenship.  The cause is sex – whether a person is male or female – and the effect is attitude towards immigration and citizenship.  In social science terms, the cause is the independent variable, and the effect is the dependent variable.

Take a moment to make sure you’ve memorized these terms: independent variable and dependent variable.  We’ll be thinking about independent and dependent variables again and again in this course.

Table 1: Cross-tab of Children of Undocumented Immigrants Should be Citizens by Sex

Look again at Table 1:
Is the independent variable in the columns (vertical) or rows (horizontal) part of the table?
[Click to reveal: SEX is our independent variable, so the attributes of SEX – “Male” and “Female” – are the column labels.]

Is the dependent variable in the columns (vertical) or rows (horizontal) section of the table?
[Click to reveal: UNDOCKID (children of undocumented immigrants should be US citizens) is our dependent variable, so the attributes of UNDOCKID – “Yes, Qualify” and “No, Not Qualify” – are the row labels.

Nearly all cross-tab tables you see in the social sciences are set up this way, with the independent variable in the columns and the dependent variable in the rows.]

[NEXT]

STEP 2:
How to Read a Table

Table 1: Cross-tab of Children of Undocumented Immigrants Should be Citizens by Sex

We’re not quite through with Table 1 yet. The red and blue cells are the ones we’ll concentrate on to see the relationship between our two variables. The bolded numbers at the top of each cell are what you’ll be using.  Each bolded number is a column percentage: the share of the people who answered the survey (respondents) in that column who gave the answer in that row (e.g., 48.6% of male respondents believe children of undocumented immigrants should be citizens if they are born in the U.S.).  To analyze a table, we’ll compare the bolded percentages across the cells in each row.

The number underneath each percentage in the table is the “N”, or the number of people who fit that category.
Do you notice anything strange about the “N” values for this table?
[Click to reveal: They aren’t whole numbers!  Each respondent in the GSS is given a different “weight” in the calculations so that the results correctly represent the U.S. population – you’ll learn more about this when we cover different ways of choosing who receives a survey.  In general, you won’t need to worry about the “N” values when you’re looking at a table.]
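
To see why weighted N values are not whole numbers, here is a small Python sketch.  The respondents and weights below are invented purely for illustration (they are not GSS data, and the real GSS weights are worked out by the survey team).  Each person adds their weight, rather than 1, to the count for their answer.

```python
from collections import defaultdict

# Hypothetical respondents as (answer, weight) pairs. Not real GSS records.
respondents = [
    ("YES, QUALIFY", 0.8),
    ("YES, QUALIFY", 1.3),
    ("NO, NOT QUALIFY", 1.1),
    ("NO, NOT QUALIFY", 0.6),
    ("YES, QUALIFY", 1.0),
]

weighted_n = defaultdict(float)
for answer, weight in respondents:
    weighted_n[answer] += weight  # each person counts as `weight` people

for answer, n in weighted_n.items():
    print(f"{answer}: weighted N = {n:.1f}")
```

Three people answered YES, QUALIFY in this toy example, but their weights add up to 3.1, so the weighted N is not a whole number.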

Remember, we will compare the percentages across each row to analyze a table that is set up with the independent variable in the columns and the dependent variable in the rows (like Table 1).  This lets us compare the different categories of our independent variable.

[NEXT]
Table 1: Cross-tab of Children of Undocumented Immigrants Should be Citizens by Sex
Let’s get some practice analyzing Table 1 together:

  • What can you learn from looking at the first row in the table?
    • [Click to reveal: This row shows the percentage of respondents who answered YES, QUALIFY when asked if children of undocumented immigrants should continue to qualify as American citizens if they are born in the U.S.  48.6 percent of men said yes, and 54.1 percent of women said yes to this question.]
  • What can you learn from looking at the second row in the table?
    • [Click to reveal: This row shows the percentage of respondents who answered NO, NOT QUALIFY when asked if undocumented immigrants’ children should qualify for citizenship. 51.4 percent of men said no, and 45.9 percent of women said no.]
  • In this table, do you see any difference between men’s and women’s attitudes on children of undocumented immigrants?
    • [Click to reveal: There is some difference, right?  In each row, the percentages for men and women differ by 5.5 percentage points.  We could say that women are a little more favorable than men toward citizenship for children of undocumented immigrants.]
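
The 5.5 percentage point gap comes from simple subtraction within each row.  A quick Python sketch using the percentages from Table 1 (the variable names are just for this illustration):

```python
# Column percentages from Table 1 (each column adds up to 100).
table1 = {
    "YES, QUALIFY":    {"MALE": 48.6, "FEMALE": 54.1},
    "NO, NOT QUALIFY": {"MALE": 51.4, "FEMALE": 45.9},
}

for answer, by_sex in table1.items():
    # Compare across the row: women's percentage minus men's percentage.
    gap = by_sex["FEMALE"] - by_sex["MALE"]
    print(f"{answer}: women minus men = {gap:+.1f} percentage points")
```

With only two rows, the gap in the second row is the mirror image of the gap in the first, because each column has to add up to 100 percent.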

Okay, take a break for a moment.  I hope you are starting to see why tables are useful.  With a well-constructed table, we can compare groups of people to look for differences in their characteristics or opinions.  In the GSS table we’ve been analyzing, it looks like the sex of the respondent might make some difference to how they respond to this question about immigration.

[NEXT]

STEP 3:
How to make a cross-tab
Next you’ll be learning to use the GSS site to build a table yourself.  In Table 1, we found some difference between men and women on the children of undocumented immigrants question. We often want to test more than one possible explanation or independent variable to see if there are other factors that help explain our dependent variable. Let’s find out if RACE has an impact on attitudes towards immigration.  Once again, we will use data from the General Social Survey (GSS).

Do you think that people who are white might agree more or less than people in other racial groups that the children of undocumented immigrants should be given U.S. citizenship?

Let’s take one possible answer to that question and rewrite it as a hypothesis:

People who identify as white will disagree more than people in other racial categories that the children of undocumented immigrants should be given U.S. citizenship.

Here’s another way to think about the variables involved in our new hypothesis:

  • Independent variable (cause): Respondent’s race
  • Dependent variable (effect): Respondent’s attitude toward children of undocumented immigrants

Remember, we want to set up our table with the independent variable in the columns and the dependent variable in the rows.

Why should you set up your table this way?  It’s what is typical in the social sciences, so you will be less likely to confuse your audience if you use this convention.

Sometimes you don’t have a clear idea which variable is the cause and which is the effect – in this case, it’s less important to follow this rule.  However, in our example it is very unlikely immigration attitudes affect the race category a respondent selects, so we have a strong reason for treating race as an independent variable.

[NEXT]

[NOTE: You can break this next section up if needed to get the screenshots on each page]

Let’s make the table!

  1. Open this link to reach the GSS survey:
    http://sda.berkeley.edu/cgi-bin/hsda?harcsda+gss10
  2. Type UNDOCKID into the “Row” box. This is our dependent variable.
  3. Type RACE into the “Column” box.  This is our independent variable.

Figure 1: Screenshot of UNDOCKID and RACE entered as row and column variables

  4. Click on the “Run the table” button. This should open a new window with the results of the analysis that looks like this:

Figure 2: Screenshot of SDA output cross-tab for RACE and UNDOCKID

We want to concentrate on the cross-tab portion of the output:

Table 2: Cross-tab for RACE and UNDOCKID

We’ll analyze this table just like we did the previous one – comparing the percentages across each row.  Remember, this works because we’ve set up the table with the independent variable in the columns and the dependent variable in the rows.
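
Behind the scenes, a cross-tab is a count of respondents in each combination of column and row categories, converted to percentages within each column.  The Python sketch below shows the idea on a handful of made-up records (not GSS data, and unweighted for simplicity).  It is not how the SDA site is implemented, and the function name `column_percent_crosstab` is hypothetical.

```python
from collections import Counter

def column_percent_crosstab(records):
    """records: iterable of (column_value, row_value) pairs.

    Returns {row_value: {column_value: percent of that column}}.
    """
    records = list(records)
    cell_n = Counter(records)                      # N in each cell
    column_n = Counter(col for col, _ in records)  # N in each column
    rows = sorted({row for _, row in records})
    return {
        row: {
            col: 100 * cell_n[(col, row)] / column_n[col]
            for col in sorted(column_n)
        }
        for row in rows
    }

# Made-up (independent variable, dependent variable) pairs, illustration only.
toy = [
    ("GROUP A", "YES"), ("GROUP A", "YES"), ("GROUP A", "NO"), ("GROUP A", "NO"),
    ("GROUP B", "YES"), ("GROUP B", "YES"), ("GROUP B", "YES"), ("GROUP B", "NO"),
]

table = column_percent_crosstab(toy)
for row, cells in table.items():
    print(row, cells)
```

Because each column adds up to 100 percent, you compare across a row: in this toy example, 50 percent of Group A answered YES versus 75 percent of Group B.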

  1. Compare the percentages in each cell across the first row.  What pattern do you see?
    1. [Click to reveal: White respondents are less likely than Black respondents or respondents from other groups to agree that children of immigrants should have citizenship.]
  2. Compare the percentages in each cell across the second row.  What pattern do you see?
    1. [Click to reveal: White respondents are the most likely to say the children should not qualify for citizenship, followed by Black respondents and then respondents from other groups.]
  3. Think back to our hypothesis: does this table support our prediction?
    1. [Click to reveal: Yes, our hypothesis is supported by the data.  White respondents are more likely than respondents in the other racial categories to disagree with the statement that children of undocumented immigrants should qualify for U.S. citizenship.  In this table, race is related to this attitude towards immigration.]

[NEXT]

Taking stock:
Great, you have acquired some valuable skills so far in this exercise:

  • You know how to set up a table to test a cause-and-effect hypothesis about human behavior.

  • You can read a cross-tab to test this hypothesis.

  • You can find and analyze data from thousands of Americans over the past four decades.

These skills are useful outside of this class, too.  Learning to understand tables will make you a more sophisticated consumer of the data used to make claims about our world every day.

There is one last step to this exercise – testing yourself to make sure you can create and interpret a table on your own.

[NEXT]

STEP 4: Creating your own cross-tab

Let’s test one more hypothesis.  When we were looking at different variables in the GSS, one that we described was DEGREE, a measure of the highest education level attained by each respondent.  What would you expect the relationship to be between education level and whether a person thinks the children of undocumented immigrants should qualify for U.S. citizenship?  Do you think people with more education would be more accepting of the children’s citizenship than those with less education?  Or maybe you think the opposite is true?

Let’s go with the first idea – here is how we could write it as a more formal hypothesis:

People with higher education levels are more likely to agree that children of undocumented immigrants should qualify for U.S. citizenship.

Which is the independent variable?
[Click to reveal: education level, the variable named “DEGREE” in the GSS]

Which is the dependent variable?
[Click to reveal: attitude toward citizenship of undocumented immigrants’ children, named “UNDOCKID” in the GSS]

Now go to the SDA site again (http://sda.berkeley.edu/cgi-bin/hsda?harcsda+gss10) and build the cross-tab to test this hypothesis.

Which variable goes in the “Row” field?
[Click to reveal: the dependent variable – UNDOCKID]

Which variable goes in the “Column” field?
[Click to reveal: the independent variable – DEGREE]

Click on “Run the Table” to produce the cross-tab.

Compare the percentages across the categories of the independent variable, DEGREE.

Write a paragraph that answers each of the following questions:

  1. In which education level are people most likely to agree that children of undocumented immigrants should qualify for citizenship?
  2. In which education levels are they least likely to agree?
  3. Does the relationship in the cross-tab support our hypothesis that those with more education are more likely to agree that children should qualify for citizenship?  What evidence does the table provide to support our hypothesis?  What evidence opposes our hypothesis?
  4. Can you think of any plausible reasons why education affects this attitude toward immigration in this way?  What would your initial theory be if you wanted to explore this relationship further?

[TEXTBOX for submitting paragraph]

[On clicking SUBMIT, reveal: Great, you’ve now constructed and interpreted a basic cross-tab table!  This is just one way we can look for patterns and relationships in our world, but it is a powerful one.  Start thinking about questions and variables you are most interested in exploring for this course.  You now have the skills to explore these questions using information collected from thousands of people all across the country!]


Resources for instructor:

Here’s the GSS page for DEGREE that students will need to produce to answer Step 5:

Here’s the table for DEGREE and UNDOCKID they’ll need to analyze for Step 4, though we shouldn’t show this table in the learning object itself (they should produce it on their own).

