Linear Regression Exercises
Due 10/13/17 by 10 pm
Simple Regression
Research Question: Does the number of hours worked per week (workweek) predict family income (income)?
Using the Polit2SetA data set, run a simple regression using Family Income (income) as the outcome variable (Y) and Number of Hours Worked per Week (workweek) as the independent variable (X). In any regression analysis, the dependent (outcome) variable is always Y and is placed on the y-axis, and the independent (predictor) variable is always X and is placed on the x-axis.
Follow these steps when using SPSS:
1. Open Polit2SetA data set.
2. Click on Analyze, then click on Regression, then Linear.
3. Move the dependent variable (income) into the box labeled “Dependent” by clicking the arrow button. The dependent variable is a continuous variable.
4. Move the independent variable (workweek) into the box labeled “Independent.”
5. Click on the Statistics button (right side of box) and click on Descriptives, Estimates, Confidence Interval (should be 95%), and Model Fit, then click on Continue.
6. Click on OK.
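For readers who want to see what the Linear Regression dialog computes, the following is a minimal sketch in Python with NumPy. It is not part of the SPSS procedure. The function name `simple_regression` and the small data arrays are made up for illustration; they are not the Polit2SetA values.

```python
import numpy as np

def simple_regression(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns summary values."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # Slope = sum of cross-products / sum of squares of x.
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    # Intercept follows from the two means.
    a = y.mean() - b * x.mean()
    residuals = y - (a + b * x)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return {
        "n": n,
        "intercept": a,
        "slope": b,
        "r_squared": 1.0 - ss_res / ss_tot,
        "see": np.sqrt(ss_res / (n - 2)),  # standard error of the estimate
    }

# Made-up illustration data (NOT taken from Polit2SetA).
hours = [10, 20, 25, 30, 40, 45]
income = [600, 900, 1100, 1150, 1500, 1700]

fit = simple_regression(hours, income)
print(fit)
```

The returned values correspond to N in the Descriptive Statistics panel, the B column of the Coefficients panel, and R Square and Std. Error of the Estimate in the Model Summary panel.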
Assignment: Through analysis of the SPSS output, answer the following questions. Answer questions 1 – 10 individually, not in paragraph form.
1. What is the total sample size?
2. What is the mean income and mean number of hours worked?
3. What is the correlation coefficient between the outcome and predictor variables? Is it significant? How would you describe the strength and direction of the relationship?
4. What is the value of R squared (coefficient of determination)? Interpret the value.
5. Interpret the standard error of the estimate. What information does this value provide to the researcher?
6. The model fit is determined by the ANOVA table results (F statistic = 37.226, with 1 and 376 degrees of freedom, and p < .001). Based on these results, does the model fit the data? Briefly explain. (Hint: A significant finding indicates good model fit.)
7. Based on the coefficients, what is the value of the y-intercept (point at which the line of best fit crosses the y-axis)?
8. Based on the output, write out the regression equation for predicting family income.
9. Using the regression equation, what is the predicted monthly family income for women working 35 hours per week?
10. Using the regression equation, what is the predicted monthly family income for women working 20 hours per week?
For this assignment, answer questions 1 through 10 individually. DO NOT ANSWER IN PARAGRAPH FORM.
Multiple Regression
Assignment: In this assignment we are trying to predict CES-D score (depression) in women. The research question is: How well do age, educational attainment, employment, abuse, and poor health predict depression?
Using the Polit2SetC data set, run a multiple regression using CES-D Score (cesd) as the outcome variable (Y) and respondent’s age (age), educational attainment (educatn), currently employed (worknow), number of types of abuse (nabuse), and poor health (poorhlth) as the independent variables (X). In any regression analysis, the dependent (outcome) variable is always Y and is placed on the y-axis, and the independent (predictor) variables are always X and are placed on the x-axis.
Follow these steps when using SPSS:
1. Open Polit2SetC data set.
2. Click on Analyze, then click on Regression, then Linear.
3. Move the dependent variable, CES-D Score (cesd), into the box labeled “Dependent” by clicking on the arrow button. The dependent variable is a continuous variable.
4. Move the independent variables (age, educatn, worknow, and poorhlth) into the box labeled “Independent.” This is the first block of variables to be entered into the analysis (block 1 of 1). Click on the button marked “Next” (top right of the Independent box); this will give you another box in which to enter the next block of independent variables (block 2 of 2). Here, enter nabuse. Note: Be sure the Method box states “Enter.”
5. Click on the Statistics button (right side of box) and click on Descriptives, Estimates, Confidence Interval (should be 95%), R square change, and Model Fit, and then click on Continue.
6. Click on OK.
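The two-block entry above asks how much R squared improves when the second block is added to the first. As a rough illustration only (not part of the SPSS procedure), here is a minimal Python/NumPy sketch of that comparison. The function names `r_squared` and `hierarchical_step` are hypothetical, and the data are randomly generated stand-ins, not the Polit2SetC file.

```python
import numpy as np

def r_squared(X, y):
    """R squared from an OLS fit of y on the columns of X (intercept added here)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def hierarchical_step(X_block1, X_block2, y):
    """Compare block 1 alone with blocks 1 + 2.

    Both blocks must be 2-D arrays with one row per case.
    Returns (R2 block 1, R2 blocks 1+2, R2 change, F change).
    """
    n = len(y)
    X_full = np.column_stack([X_block1, X_block2])
    k_full = X_full.shape[1]      # predictors in the full model
    k_added = X_block2.shape[1]   # predictors added in block 2 (df1)
    df2 = n - k_full - 1          # residual df of the full model
    r2_1 = r_squared(X_block1, y)
    r2_2 = r_squared(X_full, y)
    f_change = ((r2_2 - r2_1) / k_added) / ((1.0 - r2_2) / df2)
    return r2_1, r2_2, r2_2 - r2_1, f_change

# Synthetic stand-in data (NOT the Polit2SetC values).
rng = np.random.default_rng(0)
n = 200
block1 = rng.normal(size=(n, 4))  # stand-ins for age, educatn, worknow, poorhlth
block2 = rng.normal(size=(n, 1))  # stand-in for nabuse
y = (18.0 + block1 @ np.array([0.5, -1.0, -1.5, 2.0])
     + 3.0 * block2[:, 0] + rng.normal(scale=5.0, size=n))

r2_1, r2_2, change, f_change = hierarchical_step(block1, block2, y)
print(r2_1, r2_2, change, f_change)
```

The four returned values correspond to the R Square column for Models 1 and 2 and to the R Square Change and F Change columns of the Model Summary panel.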
Assignment: (When answering all questions, use the data on the coefficients panel from Model 2.) Answer questions 1 – 5 individually, not in paragraph form.
1. Analyze the data from the SPSS output and write a paragraph summarizing the findings. (Use the example in the SPSS output file as a guide for your write-up.)
2. Which of the predictors were significant predictors in the model?
3. Which of the predictors was the most relevant predictor in the model?
4. Interpret the unstandardized coefficients for educational attainment and poor health.
5. If you wanted to predict a woman’s current CES-D score based on the analysis, what would the unstandardized regression equation be? Include unstandardized coefficients in the equation.
For this assignment, answer questions 1 through 5 individually. DO NOT ANSWER IN PARAGRAPH FORM.
Required Readings
Gray, J.R., Grove, S.K., & Sutherland, S. (2017). Burns and Grove’s the practice of nursing research: Appraisal, synthesis, and generation of evidence (8th ed.). St. Louis, MO: Saunders Elsevier.
- Chapter 24, “Using Statistics to Predict”
This chapter asserts that predictive analyses are based on probability theory instead of decision theory. It also analyzes how variation plays a critical role in simple linear regression and multiple regression.
Statistics and Data Analysis for Nursing Research
- Chapter 9, “Correlation and Simple Regression” (pp. 208–222)
This section of Chapter 9 discusses the simple regression equation and outlines major components of regression, including errors of prediction, residuals, and ordinary least-squares (OLS) regression.
- Chapter 10, “Multiple Regression”
Chapter 10 focuses on multiple regression as a statistical procedure and explains multivariate statistics and their relationship to multiple regression concepts, equations, and tests.
- Chapter 12, “Logistic Regression”
This chapter provides an overview of logistic regression, which is a form of statistical analysis frequently used in nursing research.
Optional Resources
Walden University. (n.d.). Linear regression. Retrieved August 1, 2011, from http://streaming.waldenu.edu/hdp/researchtutorials/educ8106_player/educ8106_linear_regression.html
Week 7 – Linear Regression Exercises SPSS Output
Part I: Simple Linear Regression SPSS Output
Descriptive Statistics
Variable                                 Mean        Std. Deviation   N
Family income prior month, all sources   $1,485.49   $950.496         378
Hours worked per week in current job     33.52       12.359           378
Correlations
Statistic             Variable                                          income   workweek
Pearson Correlation   Family income prior month, all sources (income)   1.000    .300
                      Hours worked per week in current job (workweek)   .300     1.000
Sig. (1-tailed)       Family income prior month, all sources (income)   .        .000
                      Hours worked per week in current job (workweek)   .000     .
N                     Family income prior month, all sources (income)   378      378
                      Hours worked per week in current job (workweek)   378      378
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .300(a)   .090       .088                $907.877
a. Predictors: (Constant), Hours worked per week in current job
ANOVA(b)
Model   Source       Sum of Squares   df    Mean Square   F        Sig.
1       Regression   3.068E7          1     3.068E7       37.226   .000(a)
        Residual     3.099E8          376   824241.002
        Total        3.406E8          377
a. Predictors: (Constant), Hours worked per week in current job
b. Dependent Variable: Family income prior month, all sources
Coefficients(a)
(B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient; Lower Bound and Upper Bound give the 95.0% confidence interval for B.)
Model   Predictor                              B         Std. Error   Beta   t       Sig.   Lower Bound   Upper Bound
1       (Constant)                             711.651   135.155             5.265   .000   445.896       977.405
        Hours worked per week in current job   23.083    3.783        .300   6.101   .000   15.644        30.523
a. Dependent Variable: Family income prior month, all sources
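The panels above are linked to one another: in a simple regression, the slope equals r multiplied by the ratio of the two standard deviations, the intercept follows from the two means, and the standardized Beta equals r. The short Python sketch below checks this using only values copied from the output. Because r is printed to three decimals, the results agree with the Coefficients panel only up to rounding. The variable names are chosen for illustration.

```python
# Values copied from the Descriptive Statistics and Correlations panels.
mean_income, sd_income = 1485.49, 950.496
mean_hours, sd_hours = 33.52, 12.359
r = 0.300  # rounded to three decimals in the output

# Simple regression: b = r * (SD of Y / SD of X); a = mean(Y) - b * mean(X).
slope = r * sd_income / sd_hours
intercept = mean_income - slope * mean_hours

# With a single predictor, the standardized coefficient equals r.
beta = slope * sd_hours / sd_income

print(round(slope, 2), round(intercept, 2), round(beta, 3))
```

Small differences from the B column of the Coefficients panel reflect the rounding of r, not a different method.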
Part II: Multiple Regression SPSS Output
This part begins with an example that has been interpreted for you. Review the output provided and read the interpretation of the data so that you understand what you will do for the multiple regression assignment.
Descriptive Statistics
Variable                Mean      Std. Deviation   N
CES-D Score             18.5231   11.90747         156
CESD Score, Wave 1      17.6987   11.40935         156
Number types of abuse   .83       1.203            156
Correlations
Statistic             Variable                         cesd    cesdwav1   nabuse
Pearson Correlation   CES-D Score (cesd)               1.000   .412       .347
                      CESD Score, Wave 1 (cesdwav1)    .412    1.000      .187
                      Number types of abuse (nabuse)   .347    .187       1.000
Sig. (1-tailed)       CES-D Score (cesd)               .       .000       .000
                      CESD Score, Wave 1 (cesdwav1)    .000    .          .010
                      Number types of abuse (nabuse)   .000    .010       .
N                     CES-D Score (cesd)               156     156        156
                      CESD Score, Wave 1 (cesdwav1)    156     156        156
                      Number types of abuse (nabuse)   156     156        156
Model Summary
(R Square Change through Sig. F Change are the Change Statistics.)
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change
1       .412(a)   .170       .164                10.88446                     .170              31.506     1     154   .000
2       .496(b)   .246       .236                10.41016                     .076              15.352     1     153   .000
a. Predictors: (Constant), CESD Score, Wave 1
b. Predictors: (Constant), CESD Score, Wave 1, Number types of abuse
ANOVA(c)
Model   Source       Sum of Squares   df    Mean Square   F        Sig.
1       Regression   3732.507         1     3732.507      31.506   .000(a)
        Residual     18244.613        154   118.472
        Total        21977.120        155
2       Regression   5396.278         2     2698.139      24.897   .000(b)
        Residual     16580.842        153   108.372
        Total        21977.120        155
a. Predictors: (Constant), CESD Score, Wave 1
b. Predictors: (Constant), CESD Score, Wave 1, Number types of abuse
c. Dependent Variable: CES-D Score
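The Model Summary values for this worked example can be recovered from the ANOVA sums of squares. The Python sketch below shows the standard relationships, using only numbers copied from the panels above; the function name `fit_statistics` is made up for illustration. Results match the printed output to within rounding.

```python
import math

def fit_statistics(ss_regression, ss_residual, n, k):
    """Model-fit values from ANOVA sums of squares, sample size n, k predictors."""
    ss_total = ss_regression + ss_residual
    df_regression = k
    df_residual = n - k - 1
    ms_residual = ss_residual / df_residual
    r2 = ss_regression / ss_total
    return {
        "r2": r2,
        "adj_r2": 1.0 - (1.0 - r2) * (n - 1) / df_residual,
        "see": math.sqrt(ms_residual),  # Std. Error of the Estimate
        "f": (ss_regression / df_regression) / ms_residual,
    }

# Sums of squares copied from the ANOVA panel of the worked example (N = 156).
model1 = fit_statistics(3732.507, 18244.613, n=156, k=1)
model2 = fit_statistics(5396.278, 16580.842, n=156, k=2)

# F change for the one predictor added in Model 2: the gain in regression
# sum of squares per added predictor, divided by Model 2's residual mean square.
f_change = ((5396.278 - 3732.507) / 1) / (16580.842 / 153)

print(model1)
print(model2)
print(round(f_change, 3))
```

Each dictionary lines up with one row of the Model Summary panel (R Square, Adjusted R Square, Std. Error of the Estimate) plus the F column of the ANOVA panel.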
Coefficients(a)
(B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient; Lower Bound and Upper Bound give the 95.0% confidence interval for B.)
Model   Predictor               B        Std. Error   Beta   t       Sig.   Lower Bound   Upper Bound
1       (Constant)              10.911   1.612               6.768   .000   7.726         14.095
        CESD Score, Wave 1      .430     .077         .412   5.613   .000   .279          .581
2       (Constant)              9.584    1.579               6.071   .000   6.465         12.702
        CESD Score, Wave 1      .376     .075         .360   5.035   .000   .228          .523
        Number types of abuse   2.772    .707         .280   3.918   .000   1.374         4.170
a. Dependent Variable: CES-D Score
In the regression example, we were statistically controlling for women’s level of depression 2 years earlier and attempting to determine whether recent abuse experiences affected current levels of depression, with earlier depression held constant. The correlation between CES-D scores in the two waves of data collection was moderate and positive, r = .412. You can see this value in the Model Summary panel: the value of R in the first step is the bivariate correlation (i.e., r) between the two CES-D scores. R² was statistically significant at p < .001 in both steps of the regression analysis, as shown in the ANOVA panel. R² increased from .170 in the first model to .246 when the abuse variable was added. The R² change (increase) of .076 (7.6%) was significant at p < .001, as shown in the Model Summary panel under Change Statistics. This indicates that even when prior levels of depression were held constant, recent abuse accounted for a significant amount of variation in current depression scores. The availability of longitudinal data does not “prove” that abuse experiences affected the women’s level of depression, but it does offer greater supportive evidence than cross-sectional data.
If we wanted to predict current CES-D scores, using prior CES-D scores and abuse experiences as predictors, the unstandardized regression equation would be as follows: Y’ = 9.584 + .376 (cesdwav1) + 2.772 (nabuse). This information comes from the panel labeled Coefficients.
In terms of the independent variables, two kinds of coefficients appear on the Coefficients panel. The first is the unstandardized coefficients (b-values), which represent the individual contribution of each predictor to the model. The b-value for number of types of abuse (2.772) describes the relationship between CES-D score (the dependent variable) and number of types of abuse (an independent variable).
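To make the worked example's unstandardized equation concrete, the short Python sketch below evaluates Y’ = 9.584 + .376 (cesdwav1) + 2.772 (nabuse). The function name `predict_cesd` and the respondent values are hypothetical, chosen only for illustration.

```python
def predict_cesd(cesd_wave1, n_abuse_types):
    """Predicted current CES-D score from the worked example's Model 2 coefficients."""
    return 9.584 + 0.376 * cesd_wave1 + 2.772 * n_abuse_types

# Hypothetical respondent: Wave 1 CES-D score of 20, two types of abuse reported.
predicted = predict_cesd(20, 2)
print(round(predicted, 3))

# Holding the Wave 1 score constant, one more type of abuse raises the
# prediction by the b-value for abuse (2.772).
difference = predict_cesd(20, 3) - predict_cesd(20, 2)
print(round(difference, 3))
```

The second calculation is the "held constant" interpretation of a b-value expressed as arithmetic: only the abuse count changes, so the prediction changes by exactly that coefficient.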
The unstandardized coefficients are used when making predictions, and they tell us to what degree each independent variable affects the outcome when the effects of all other variables in the equation are held constant. For example, the interpretation for number of types of abuse is as follows: for each one-unit increase in the number of types of abuse, the CES-D score (depression) increases by 2.772 units. The size of the increase depends on the units in which the variable is measured. So, for each additional type of abuse reported, the predicted CES-D depression score increases by 2.772 points. Always check the value in the significance column to determine whether each variable is making a significant contribution to the model.
The second coefficient reported is the standardized Beta coefficient. The standardized coefficient tells us the number of standard deviations that the dependent variable will change as a result of a one standard deviation change in the independent variable. The standardized coefficient is typically used to permit the researcher to understand which of the independent variables is most important in explaining the dependent variable. In the above example, CES-D score, Wave 1 has a Beta coefficient of .360 and number of types of abuse has a Beta coefficient of .280. This indicates that CES-D score, Wave 1 is the most important predictor in the model and makes the strongest unique contribution to explaining the dependent variable. Note: When you are determining the most important predictor, ignore the negative sign if one exists. So, a predictor with a Beta of -.96 is stronger than one with a Beta of .55.
SPSS Output for Multiple Regression Assignment
Descriptive Statistics
Variable                                Mean       Std. Deviation   N
CES-D Score                             18.5815    11.78965         939
Respondent’s age at time of interview   36.54749   6.234511         939
Educational attainment                  1.57       .584             939
Currently employed?                     .45        .498             939
Poor health self rating                 .06        .247             939
Number types of abuse                   .85        1.160            939
Correlations
Statistic             Variable                                      cesd    age     educatn   worknow   poorhlth   nabuse
Pearson Correlation   CES-D Score (cesd)                            1.000   .061    -.155     -.220     .270       .370
                      Respondent’s age at time of interview (age)   .061    1.000   .065      -.077     .140       -.020
                      Educational attainment (educatn)              -.155   .065    1.000     .060      -.074      -.026
                      Currently employed? (worknow)                 -.220   -.077   .060      1.000     -.162      -.073
                      Poor health self rating (poorhlth)            .270    .140    -.074     -.162     1.000      .095
                      Number types of abuse (nabuse)                .370    -.020   -.026     -.073     .095       1.000
Sig. (1-tailed)       CES-D Score (cesd)                            .       .031    .000      .000      .000       .000
                      Respondent’s age at time of interview (age)   .031    .       .023      .009      .000       .272
                      Educational attainment (educatn)              .000    .023    .         .032      .012       .215
                      Currently employed? (worknow)                 .000    .009    .032      .         .000       .012
                      Poor health self rating (poorhlth)            .000    .000    .012      .000      .          .002
                      Number types of abuse (nabuse)                .000    .272    .215      .012      .002       .
N                     CES-D Score (cesd)                            939     939     939       939       939        939
                      Respondent’s age at time of interview (age)   939     939     939       939       939        939
                      Educational attainment (educatn)              939     939     939       939       939        939
                      Currently employed? (worknow)                 939     939     939       939       939        939
                      Poor health self rating (poorhlth)            939     939     939       939       939        939
                      Number types of abuse (nabuse)                939     939     939       939       939        939
Model Summary
(R Square Change through Sig. F Change are the Change Statistics.)
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change
1       .348(a)   .121       .117                11.07693                     .121              32.148     4     934   .000
2       .483(b)   .233       .229                10.34980                     .112              136.849    1     933   .000
a. Predictors: (Constant), Poor health self rating, Educational attainment, Respondent’s age at time of interview, Currently employed?
b. Predictors: (Constant), Poor health self rating, Educational attainment, Respondent’s age at time of interview, Currently employed?, Number types of abuse
ANOVA(c)
Model   Source       Sum of Squares   df    Mean Square   F        Sig.
1       Regression   15777.841        4     3944.460      32.148   .000(a)
        Residual     114600.356       934   122.698
        Total        130378.197       938
2       Regression   30436.854        5     6087.371      56.829   .000(b)
        Residual     99941.343        933   107.118
        Total        130378.197       938
a. Predictors: (Constant), Poor health self rating, Educational attainment, Respondent’s age at time of interview, Currently employed?
b. Predictors: (Constant), Poor health self rating, Educational attainment, Respondent’s age at time of interview, Currently employed?, Number types of abuse
c. Dependent Variable: CES-D Score
Coefficients(a)
(B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient; Lower Bound and Upper Bound give the 95.0% confidence interval for B.)
Model   Predictor                               B        Std. Error   Beta    t        Sig.   Lower Bound   Upper Bound
1       (Constant)                              22.182   2.351                9.434    .000   17.567        26.796
        Respondent’s age at time of interview   .045     .059         .024    .767     .443   -.070         .161
        Educational attainment                  -2.608   .624         -.129   -4.179   .000   -3.832        -1.383
        Currently employed?                     -4.092   .738         -.173   -5.544   .000   -5.540        -2.643
        Poor health self rating                 10.928   1.503        .229    7.270    .000   7.978         13.878
2       (Constant)                              18.165   2.224                8.169    .000   13.801        22.528
        Respondent’s age at time of interview   .068     .055         .036    1.240    .215   -.040         .176
        Educational attainment                  -2.518   .583         -.125   -4.318   .000   -3.663        -1.374
        Currently employed?                     -3.605   .691         -.152   -5.219   .000   -4.961        -2.250
        Poor health self rating                 9.496    1.410        .199    6.735    .000   6.729         12.263
        Number types of abuse                   3.432    .293         .338    11.698   .000   2.856         4.008
a. Dependent Variable: CES-D Score