# Linear Regression Exercises

**Linear Regression Exercises Due 10/13/17 by 10 pm**

**Simple Regression**

Research Question: Does the number of hours worked per week (*workweek*) predict family income (*income*)?

Using the Polit2SetA data set, run a simple regression with Family Income (*income*) as the outcome variable (Y) and Number of Hours Worked per Week (*workweek*) as the independent variable (X). In any regression analysis, the dependent (outcome) variable is always Y and is placed on the y-axis, and the independent (predictor) variable is always X and is placed on the x-axis.

Follow these steps when using SPSS:

1. Open the Polit2SetA data set.

2. Click on **Analyze**, then click on **Regression**, then **Linear**.

3. Move the dependent variable (*income*) into the box labeled “Dependent” by clicking the arrow button. The dependent variable is a continuous variable.

4. Move the independent variable (*workweek*) into the box labeled “Independent.”

5. Click on the **Statistics** button (right side of box) and click on **Descriptives**, **Estimates**, **Confidence Interval** (should be 95%), and **Model Fit**, then click on **Continue**.

6. Click on **OK**.
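For readers who want to see what the SPSS steps above compute, here is a minimal sketch in plain Python. It is not part of the assignment and does not use the Polit2SetA file: the `hours` and `income` lists are made-up data, and the function name `simple_regression` is a hypothetical helper introduced only for illustration. It returns the same kinds of quantities that appear in the SPSS Descriptives, Model Summary, ANOVA, and Coefficients panels.

```python
import math
import random

def simple_regression(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns a dict of summary values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)  # total sum of squares
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_regression = syy - ss_residual
    df_residual = n - 2
    ms_residual = ss_residual / df_residual
    return {
        "n": n, "mean_x": mean_x, "mean_y": mean_y,
        "intercept": intercept, "slope": slope,
        "r": r, "r_squared": r ** 2,
        "std_error_estimate": math.sqrt(ms_residual),
        "f": ss_regression / ms_residual,  # MS regression has 1 df here
        "df": (1, df_residual),
    }

# Hypothetical data only: 200 made-up (hours, income) pairs, NOT the Polit2SetA file.
random.seed(0)
hours = [random.uniform(10, 50) for _ in range(200)]
income = [500 + 20 * h + random.gauss(0, 300) for h in hours]

fit = simple_regression(hours, income)
print(f"Y' = {fit['intercept']:.3f} + {fit['slope']:.3f} * X")
print(f"r = {fit['r']:.3f}, R^2 = {fit['r_squared']:.3f}, F{fit['df']} = {fit['f']:.3f}")
```

The numbers printed here describe the made-up data only; the values for the assignment must be read from the SPSS output.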

**Assignment:** Through analysis of the SPSS output, answer the following questions. Answer questions 1 – 10 individually, not in paragraph form.

1. What is the total sample size?

2. What is the mean income and mean number of hours worked?

3. What is the correlation coefficient between the outcome and predictor variables? Is it significant? How would you describe the strength and direction of the relationship?

4. What is the value of R squared (coefficient of determination)? Interpret the value.

5. Interpret the standard error of the estimate. What information does this value provide to the researcher?

6. The model fit is determined by the ANOVA table results (*F* = 37.226 with 1 and 376 degrees of freedom; SPSS reports Sig. = .000, that is, *p* < .001). Based on these results, does the model fit the data? Briefly explain. (Hint: A significant finding indicates good model fit.)

7. Based on the coefficients, what is the value of the y-intercept (point at which the line of best fit crosses the y-axis)?

8. Based on the output, write out the regression equation for predicting family income.

9. Using the regression equation, what is the predicted monthly family income for women working 35 hours per week?

10. Using the regression equation, what is the predicted monthly family income for women working 20 hours per week?

**For this assignment, answer questions 1 through 10 individually. DO NOT ANSWER IN PARAGRAPH FORM.**
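As a reminder of how a simple regression equation is used for prediction (questions 8 through 10), the sketch below applies Y′ = a + bX in Python. The intercept and slope are hypothetical values chosen only for illustration; they are not the coefficients from the assignment output, and `predict` is a made-up helper name.

```python
def predict(intercept, slope, x):
    """Predicted outcome Y' = intercept + slope * x for a simple regression."""
    return intercept + slope * x

# Hypothetical coefficients chosen for illustration; they are NOT the values
# from the assignment's Coefficients table.
example_intercept = 100.0
example_slope = 10.0

for hours in (20, 35):
    print(hours, predict(example_intercept, example_slope, hours))
```

To answer the questions, substitute the intercept (Constant) and slope (B for *workweek*) from the Coefficients table.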

**Multiple Regression**

**Assignment:** In this assignment we are trying to predict CES-D score (depression) in women. The research question is: How well do age, educational attainment, employment, abuse, and poor health predict depression?

Using the Polit2SetC data set, run a multiple regression with CES-D Score (*cesd*) as the outcome variable (Y) and respondent’s age (*age*), educational attainment (*educatn*), currently employed (*worknow*), number of types of abuse (*nabuse*), and poor health (*poorhlth*) as the independent variables (X). In any regression analysis, the dependent (outcome) variable is always Y and is placed on the y-axis, and the independent (predictor) variables are always X and are placed on the x-axis.

Follow these steps when using SPSS:

1. Open the Polit2SetC data set.

2. Click on **Analyze**, then click on **Regression**, then **Linear**.

3. Move the dependent variable, CES-D Score (*cesd*) into the box labeled “Dependent” by clicking on the arrow button. The dependent variable is a continuous variable.

4. Move the independent variables (*age*, *educatn*, *worknow*, and *poorhlth*) into the box labeled “Independent.” This is the first block of variables to be entered into the analysis (block 1 of 1). Click on the button marked “Next” (top right of the Independent box); this will give you another box in which to enter the next block of independent variables (block 2 of 2). Here, enter *nabuse*. **Note:** Be sure the Method box states “Enter”.

5. Click on the **Statistics** button (right side of box) and click on **Descriptives**, **Estimates**, **Confidence Interval** (should be 95%), **R square change**, and **Model Fit**, and then click on **Continue**.

6. Click on **OK**.
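The two-block entry described in the steps above is a hierarchical regression: SPSS fits the block 1 predictors, then adds block 2 and reports the R square change and its F test. The sketch below shows that calculation in plain Python under stated assumptions. The data are made up (`x1`, `x2`, `x3`, `y` are not variables from Polit2SetC), and `solve`, `ols`, and `r_square_change` are hypothetical helper names. It is an illustration of the idea, not a reproduction of SPSS internals.

```python
import random

def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(columns, y):
    """Fit y on an intercept plus the given predictor columns; return (coefs, r_squared)."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coefs = solve(xtx, xty)  # normal equations: (X'X) b = X'y
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - sum(c * v for c, v in zip(coefs, r))) ** 2
                   for r, yi in zip(rows, y))
    return coefs, 1.0 - ss_resid / ss_total

def r_square_change(block1, block2, y):
    """Enter block1, then block1 + block2; return (R2 step 1, R2 step 2, F change, df1, df2)."""
    _, r2_1 = ols(block1, y)
    _, r2_2 = ols(block1 + block2, y)
    df1 = len(block2)
    df2 = len(y) - len(block1) - len(block2) - 1
    f_change = ((r2_2 - r2_1) / df1) / ((1.0 - r2_2) / df2)
    return r2_1, r2_2, f_change, df1, df2

# Hypothetical data only: made-up predictors x1, x2 (block 1) and x3 (block 2).
random.seed(1)
n = 300
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [random.gauss(0, 1) for _ in range(n)]
y = [2 + 1.0 * a - 0.5 * b + 1.5 * c + random.gauss(0, 1) for a, b, c in zip(x1, x2, x3)]

r2_1, r2_2, f_change, df1, df2 = r_square_change([x1, x2], [x3], y)
print(f"R^2 step 1 = {r2_1:.3f}, step 2 = {r2_2:.3f}, F change({df1}, {df2}) = {f_change:.3f}")
```

In the assignment, block 1 holds four predictors and block 2 holds one, so the corresponding degrees of freedom for the F change test are read from the df1 and df2 columns of the Model Summary table.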

**Assignment:** (When answering all questions, use the data on the coefficients panel from Model 2.) Answer questions 1 – 5 individually, not in paragraph form.

1. Analyze the data from the SPSS output and write a paragraph summarizing the findings. (Use the example in the SPSS output file as a guide for your write-up.)

2. Which of the predictors were significant predictors in the model?

3. Which of the predictors was the most relevant predictor in the model?

4. Interpret the unstandardized coefficients for educational attainment and poor health.

5. If you wanted to predict a woman’s current CES-D score based on the analysis, what would the unstandardized regression equation be? Include unstandardized coefficients in the equation.

**For this assignment, answer questions 1 through 5 individually. DO NOT ANSWER IN PARAGRAPH FORM.**

**Required Readings**

**Gray, J.R., Grove, S.K., & Sutherland, S. (2017). Burns and Grove’s the practice of nursing research: Appraisal, synthesis, and generation of evidence (8th ed.). St. Louis, MO: Saunders Elsevier.**

- Chapter 24, “Using Statistics to Predict”

This chapter asserts that predictive analyses are based on probability theory instead of decision theory. It also analyzes how variation plays a critical role in simple linear regression and multiple regression.

*Statistics and Data Analysis for Nursing Research*

- Chapter 9, “Correlation and Simple Regression” (pp. 208–222)

This section of Chapter 9 discusses the simple regression equation and outlines major components of regression, including errors of prediction, residuals, and ordinary least-squares (OLS) regression.

- Chapter 10, “Multiple Regression”

Chapter 10 focuses on multiple regression as a statistical procedure and explains multivariate statistics and their relationship to multiple regression concepts, equations, and tests.

- Chapter 12, “Logistic Regression”

This chapter provides an overview of logistic regression, which is a form of statistical analysis frequently used in nursing research.

**Optional Resources**

**Walden University. (n.d.). Linear regression. Retrieved August 1, 2011, from http://streaming.waldenu.edu/hdp/researchtutorials/educ8106_player/educ8106_linear_regression.html**


**Week 7 – Linear Regression Exercises SPSS Output**

**Simple Linear Regression SPSS Output**

Descriptive Statistics

| Variable | Mean | Std. Deviation | N |
|---|---|---|---|
| Family income prior month, all sources | \$1,485.49 | \$950.496 | 378 |
| Hours worked per week in current job | 33.52 | 12.359 | 378 |

Correlations

| Statistic | Variable | Family income prior month, all sources | Hours worked per week in current job |
|---|---|---|---|
| Pearson Correlation | Family income prior month, all sources | 1.000 | .300 |
| | Hours worked per week in current job | .300 | 1.000 |
| Sig. (1-tailed) | Family income prior month, all sources | . | .000 |
| | Hours worked per week in current job | .000 | . |
| N | Family income prior month, all sources | 378 | 378 |
| | Hours worked per week in current job | 378 | 378 |

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .300ᵃ | .090 | .088 | \$907.877 |

a. Predictors: (Constant), Hours worked per week in current job

ANOVAᵇ

| Model | Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 3.068E7 | 1 | 3.068E7 | 37.226 | .000ᵃ |
| | Residual | 3.099E8 | 376 | 824241.002 | | |
| | Total | 3.406E8 | 377 | | | |

a. Predictors: (Constant), Hours worked per week in current job

b. Dependent Variable: Family income prior month, all sources

Coefficientsᵃ

| Model | Term | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. | 95.0% CI for B, Lower Bound | 95.0% CI for B, Upper Bound |
|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | 711.651 | 135.155 | | 5.265 | .000 | 445.896 | 977.405 |
| | Hours worked per week in current job | 23.083 | 3.783 | .300 | 6.101 | .000 | 15.644 | 30.523 |

a. Dependent Variable: Family income prior month, all sources
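The panels of the simple regression output above are linked by a few identities, and checking them is a quick way to learn where each number comes from. The sketch below is only an illustration in Python: the variable names are made up, and because SPSS displays the sums of squares in rounded scientific notation (for example, 3.068E7), the recomputed values agree with the printed ones only approximately.

```python
import math

# Values copied from the ANOVA and Coefficients tables above.
ss_regression = 3.068e7
ss_total = 3.406e8
ms_residual = 824241.002
slope, slope_se = 23.083, 3.783

r_squared = ss_regression / ss_total         # proportion of variance explained
ms_regression = ss_regression / 1            # regression df = 1 (one predictor)
f_statistic = ms_regression / ms_residual    # F = MS regression / MS residual
std_error_estimate = math.sqrt(ms_residual)  # square root of MS residual
t_slope = slope / slope_se                   # t statistic for the slope

print(f"R^2 ~ {r_squared:.3f}")
print(f"F ~ {f_statistic:.2f}")
print(f"Std. error of the estimate ~ {std_error_estimate:.3f}")
print(f"t for slope ~ {t_slope:.2f} (t squared ~ {t_slope ** 2:.2f})")
```

With a single predictor, the square of the slope's t statistic equals the model F statistic, which is why the two tests give the same significance level.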

**Part II: Multiple Regression SPSS Output**

This part begins with an example that has been interpreted for you. Analyze the output provided and read the interpretation of the data so that you understand what you will do for the multiple regression assignment.

Descriptive Statistics

| Variable | Mean | Std. Deviation | N |
|---|---|---|---|
| CES-D Score | 18.5231 | 11.90747 | 156 |
| CESD Score, Wave 1 | 17.6987 | 11.40935 | 156 |
| Number types of abuse | .83 | 1.203 | 156 |

Correlations

| Statistic | Variable | CES-D Score | CESD Score, Wave 1 | Number types of abuse |
|---|---|---|---|---|
| Pearson Correlation | CES-D Score | 1.000 | .412 | .347 |
| | CESD Score, Wave 1 | .412 | 1.000 | .187 |
| | Number types of abuse | .347 | .187 | 1.000 |
| Sig. (1-tailed) | CES-D Score | . | .000 | .000 |
| | CESD Score, Wave 1 | .000 | . | .010 |
| | Number types of abuse | .000 | .010 | . |
| N | CES-D Score | 156 | 156 | 156 |
| | CESD Score, Wave 1 | 156 | 156 | 156 |
| | Number types of abuse | 156 | 156 | 156 |

Model Summary (the last five columns are the Change Statistics)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change |
|---|---|---|---|---|---|---|---|---|---|
| 1 | .412ᵃ | .170 | .164 | 10.88446 | .170 | 31.506 | 1 | 154 | .000 |
| 2 | .496ᵇ | .246 | .236 | 10.41016 | .076 | 15.352 | 1 | 153 | .000 |

a. Predictors: (Constant), CESD Score, Wave 1

b. Predictors: (Constant), CESD Score, Wave 1, Number types of abuse

ANOVAᶜ

| Model | Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 3732.507 | 1 | 3732.507 | 31.506 | .000ᵃ |
| | Residual | 18244.613 | 154 | 118.472 | | |
| | Total | 21977.120 | 155 | | | |
| 2 | Regression | 5396.278 | 2 | 2698.139 | 24.897 | .000ᵇ |
| | Residual | 16580.842 | 153 | 108.372 | | |
| | Total | 21977.120 | 155 | | | |

a. Predictors: (Constant), CESD Score, Wave 1

b. Predictors: (Constant), CESD Score, Wave 1, Number types of abuse

c. Dependent Variable: CES-D Score
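The R square change and F change reported in the Model Summary for this worked example can be recovered from the ANOVA sums of squares above. The following Python sketch shows the arithmetic. The variable names are made up for illustration, and small differences from the printed values are expected because SPSS rounds what it displays.

```python
# Values copied from the worked example's ANOVA table above.
ss_regression_model1 = 3732.507
ss_regression_model2 = 5396.278
ss_total = 21977.120
ms_residual_model2 = 108.372
df1 = 1  # one predictor (number of types of abuse) added in step 2

r2_model1 = ss_regression_model1 / ss_total
r2_model2 = ss_regression_model2 / ss_total
r2_change = r2_model2 - r2_model1

# F change: the added regression sum of squares per added predictor,
# divided by the residual mean square of the larger model.
f_change = ((ss_regression_model2 - ss_regression_model1) / df1) / ms_residual_model2

print(f"R^2 model 1 ~ {r2_model1:.3f}, model 2 ~ {r2_model2:.3f}")
print(f"R^2 change ~ {r2_change:.3f}, F change ~ {f_change:.3f}")
```

The same arithmetic applies to any two-step model in which the second step adds predictors to the first.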

Coefficientsᵃ

| Model | Term | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. | 95.0% CI for B, Lower Bound | 95.0% CI for B, Upper Bound |
|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | 10.911 | 1.612 | | 6.768 | .000 | 7.726 | 14.095 |
| | CESD Score, Wave 1 | .430 | .077 | .412 | 5.613 | .000 | .279 | .581 |
| 2 | (Constant) | 9.584 | 1.579 | | 6.071 | .000 | 6.465 | 12.702 |
| | CESD Score, Wave 1 | .376 | .075 | .360 | 5.035 | .000 | .228 | .523 |
| | Number types of abuse | 2.772 | .707 | .280 | 3.918 | .000 | 1.374 | 4.170 |

a. Dependent Variable: CES-D Score

In the regression example, we were statistically controlling for women’s level of depression 2 years earlier and attempting to determine whether recent abuse experiences affected current levels of depression, with earlier depression held constant. The correlation between CES-D scores in the two waves of data collection was moderate and positive, r = .412. You can see this value in the Model Summary panel: the value of R in the first step is the bivariate correlation (i.e., r) between the two CES-D scores. R² was statistically significant at p < .001 in both steps of the regression analysis, as shown in the ANOVA panel. R² increased from .170 in the first model to .246 when the abuse variable was added. The R² change (increase) of .076 (7.6%) was significant at p < .001, as shown in the Model Summary panel under Change Statistics. This indicates that even when prior levels of depression were held constant, recent abuse accounted for a significant amount of variation in current depression scores. The availability of longitudinal data does not “prove” that abuse experiences affected the women’s level of depression, but it does offer greater supportive evidence than cross-sectional data.

If we wanted to predict current CES-D scores, using prior CES-D scores and abuse experiences as predictors, the unstandardized regression equation would be as follows: Y′ = 9.584 + .376(cesdwav1) + 2.772(nabuse). This information comes from the panel labeled Coefficients.

Two kinds of coefficients are reported for the independent variables on the Coefficients panel. The first is the unstandardized coefficients (b-values), which represent the individual contribution of each predictor to the model. The b-value for number of types of abuse (2.772) tells us about the relationship between CES-D score (dependent variable) and number of types of abuse (independent variable). These values are used when making predictions, and they tell us to what degree the independent variable affects the outcome when the effects of all other variables in the equation are held constant. For example, the interpretation of number of types of abuse is as follows: For each one-unit increase in the number of types of abuse, the CES-D score (depression) increases by 2.772 units. The size of the increase depends on the units in which the variable is measured. So, for each additional type of abuse reported, the predicted CES-D depression score increases by 2.772 points. Always check the value in the significance column to determine whether the variables are making a significant contribution to the model.

The second coefficient reported is the standardized Beta coefficient. The standardized coefficient tells us the number of standard deviations that the dependent variable will change as a result of a one standard deviation change in the independent variable. The standardized coefficient is typically used to permit the researcher to understand which of the independent variables is most important in explaining the dependent variable. In the above example, CES-D score, Wave 1 has a Beta coefficient of .360 and number of types of abuse has a Beta coefficient of .280. This indicates that CES-D score, Wave 1 is the strongest predictor in the model and makes the strongest unique contribution to explaining the dependent variable. Note: When you are determining the strongest predictor, ignore the negative sign if one exists. So, a predictor with a Beta of -.96 is stronger than one with a Beta of .55.

**SPSS Output for Multiple Regression Assignment**
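Before turning to the assignment output, the worked example interpreted above can be made concrete with a short Python sketch. It applies the example's unstandardized equation, Y′ = 9.584 + .376(cesdwav1) + 2.772(nabuse), and shows how each standardized Beta relates to its b-value through the standard deviations in the example's Descriptive Statistics table. The function names and the sample respondent (Wave 1 score of 20, 2 types of abuse) are hypothetical and used only for illustration.

```python
# Unstandardized coefficients from Model 2 of the worked example's Coefficients table.
INTERCEPT = 9.584
B_CESD_WAVE1 = 0.376
B_NABUSE = 2.772

def predict_cesd(cesd_wave1, nabuse):
    """Predicted current CES-D score: Y' = 9.584 + .376(cesdwav1) + 2.772(nabuse)."""
    return INTERCEPT + B_CESD_WAVE1 * cesd_wave1 + B_NABUSE * nabuse

def standardized_beta(b, sd_predictor, sd_outcome):
    """Convert an unstandardized b to a standardized Beta using the two standard deviations."""
    return b * sd_predictor / sd_outcome

# Standard deviations from the worked example's Descriptive Statistics table.
SD_CESD, SD_CESD_WAVE1, SD_NABUSE = 11.90747, 11.40935, 1.203

beta_wave1 = standardized_beta(B_CESD_WAVE1, SD_CESD_WAVE1, SD_CESD)
beta_nabuse = standardized_beta(B_NABUSE, SD_NABUSE, SD_CESD)

# Hypothetical respondent (made-up inputs): Wave 1 CES-D of 20 and 2 types of abuse.
example = predict_cesd(20, 2)
# One more type of abuse, with the Wave 1 score held constant, raises the
# prediction by exactly the b-value for abuse.
difference = predict_cesd(20, 3) - example

print(f"Predicted CES-D ~ {example:.3f}; one more abuse type adds ~ {difference:.3f}")
print(f"Beta (Wave 1) ~ {beta_wave1:.3f}; Beta (abuse) ~ {beta_nabuse:.3f}")
```

Comparing the absolute values of the two Betas reproduces the ranking described in the interpretation: the Wave 1 score is the stronger of the two predictors.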

Descriptive Statistics

| Variable | Mean | Std. Deviation | N |
|---|---|---|---|
| CES-D Score | 18.5815 | 11.78965 | 939 |
| Respondent’s age at time of interview | 36.54749 | 6.234511 | 939 |
| Educational attainment | 1.57 | .584 | 939 |
| Currently employed? | .45 | .498 | 939 |
| Poor health self rating | .06 | .247 | 939 |
| Number types of abuse | .85 | 1.160 | 939 |

Correlations

| Statistic | Variable | CES-D Score | Respondent’s age at time of interview | Educational attainment | Currently employed? | Poor health self rating | Number types of abuse |
|---|---|---|---|---|---|---|---|
| Pearson Correlation | CES-D Score | 1.000 | .061 | -.155 | -.220 | .270 | .370 |
| | Respondent’s age at time of interview | .061 | 1.000 | .065 | -.077 | .140 | -.020 |
| | Educational attainment | -.155 | .065 | 1.000 | .060 | -.074 | -.026 |
| | Currently employed? | -.220 | -.077 | .060 | 1.000 | -.162 | -.073 |
| | Poor health self rating | .270 | .140 | -.074 | -.162 | 1.000 | .095 |
| | Number types of abuse | .370 | -.020 | -.026 | -.073 | .095 | 1.000 |
| Sig. (1-tailed) | CES-D Score | . | .031 | .000 | .000 | .000 | .000 |
| | Respondent’s age at time of interview | .031 | . | .023 | .009 | .000 | .272 |
| | Educational attainment | .000 | .023 | . | .032 | .012 | .215 |
| | Currently employed? | .000 | .009 | .032 | . | .000 | .012 |
| | Poor health self rating | .000 | .000 | .012 | .000 | . | .002 |
| | Number types of abuse | .000 | .272 | .215 | .012 | .002 | . |
| N | CES-D Score | 939 | 939 | 939 | 939 | 939 | 939 |
| | Respondent’s age at time of interview | 939 | 939 | 939 | 939 | 939 | 939 |
| | Educational attainment | 939 | 939 | 939 | 939 | 939 | 939 |
| | Currently employed? | 939 | 939 | 939 | 939 | 939 | 939 |
| | Poor health self rating | 939 | 939 | 939 | 939 | 939 | 939 |
| | Number types of abuse | 939 | 939 | 939 | 939 | 939 | 939 |

Model Summary (the last five columns are the Change Statistics)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change |
|---|---|---|---|---|---|---|---|---|---|
| 1 | .348ᵃ | .121 | .117 | 11.07693 | .121 | 32.148 | 4 | 934 | .000 |
| 2 | .483ᵇ | .233 | .229 | 10.34980 | .112 | 136.849 | 1 | 933 | .000 |

a. Predictors: (Constant), Poor health self rating, Educational attainment, Respondent’s age at time of interview, Currently employed?

b. Predictors: (Constant), Poor health self rating, Educational attainment, Respondent’s age at time of interview, Currently employed?, Number types of abuse

ANOVAᶜ

| Model | Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 15777.841 | 4 | 3944.460 | 32.148 | .000ᵃ |
| | Residual | 114600.356 | 934 | 122.698 | | |
| | Total | 130378.197 | 938 | | | |
| 2 | Regression | 30436.854 | 5 | 6087.371 | 56.829 | .000ᵇ |
| | Residual | 99941.343 | 933 | 107.118 | | |
| | Total | 130378.197 | 938 | | | |

a. Predictors: (Constant), Poor health self rating, Educational attainment, Respondent’s age at time of interview, Currently employed?

b. Predictors: (Constant), Poor health self rating, Educational attainment, Respondent’s age at time of interview, Currently employed?, Number types of abuse

c. Dependent Variable: CES-D Score

Coefficientsᵃ

| Model | Term | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. | 95.0% CI for B, Lower Bound | 95.0% CI for B, Upper Bound |
|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | 22.182 | 2.351 | | 9.434 | .000 | 17.567 | 26.796 |
| | Respondent’s age at time of interview | .045 | .059 | .024 | .767 | .443 | -.070 | .161 |
| | Educational attainment | -2.608 | .624 | -.129 | -4.179 | .000 | -3.832 | -1.383 |
| | Currently employed? | -4.092 | .738 | -.173 | -5.544 | .000 | -5.540 | -2.643 |
| | Poor health self rating | 10.928 | 1.503 | .229 | 7.270 | .000 | 7.978 | 13.878 |
| 2 | (Constant) | 18.165 | 2.224 | | 8.169 | .000 | 13.801 | 22.528 |
| | Respondent’s age at time of interview | .068 | .055 | .036 | 1.240 | .215 | -.040 | .176 |
| | Educational attainment | -2.518 | .583 | -.125 | -4.318 | .000 | -3.663 | -1.374 |
| | Currently employed? | -3.605 | .691 | -.152 | -5.219 | .000 | -4.961 | -2.250 |
| | Poor health self rating | 9.496 | 1.410 | .199 | 6.735 | .000 | 6.729 | 12.263 |
| | Number types of abuse | 3.432 | .293 | .338 | 11.698 | .000 | 2.856 | 4.008 |

a. Dependent Variable: CES-D Score