Use RStudio with RMarkdown and provide thorough explanations in complete sentences. Please submit your
PDF or HTML file on the course's Blackboard website by 11:59 pm Monday, October 3.
Information for 32 young red wines is included in the text file youngwines.txt, available on Blackboard. The
quality (y) of the wine is thought to be related to the seven properties of pH (x2), total SO2 (x3), color density
(x4), polymeric pigment color (x6), anthocyanin color (x7), total anthocyanins (x8), and density of ionization
(x9). The data to be used here includes the response variable y and seven of the ten predictors shown in
the text file. The columns of data for x1, x5, and x10 are to be omitted. (Notice that x1 is simply a binary
indicator, x10 is a multiple of x7, and x5 is the sum of x6 and x7.)
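
As a minimal sketch of the data preparation described above, the following R code drops x1, x5, and x10 and keeps y with the seven remaining predictors. The column names (y, x1, ..., x10), the presence of a header row in youngwines.txt, and the object names wine and wine7 are assumptions, and the random stand-in data frame is used only so that the sketch runs without the course file.

```r
# Stand-in data so the sketch runs without the course file. With the real
# file, something like the following would be used instead (the header
# layout is an assumption):
#   wine <- read.table("youngwines.txt", header = TRUE)
set.seed(1)
n <- 32
wine <- as.data.frame(matrix(rnorm(n * 11), nrow = n))
names(wine) <- c("y", paste0("x", 1:10))   # assumed column names

# Drop the three redundant predictors; keep y and the other seven columns.
drop_cols <- c("x1", "x5", "x10")
wine7 <- wine[, !(names(wine) %in% drop_cols)]

names(wine7)   # remaining column names
dim(wine7)     # number of rows and columns
```

If the real file uses different column names, only the names in drop_cols would need to change.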
1. Use the function lm to fit a linear regression model with response variable "quality" and the seven predictor
variables listed above. Give the equation for the model. Carefully interpret the p-values that were
obtained from the values of the t- and F-statistics displayed in the R summary for the model. Also
explain thoroughly what conclusions can be drawn from the value of R² that is displayed.
2. Calculate the vector of residuals from the model in part (1). With the aid of the R functions qnorm and
plot, produce a quantile-quantile plot for the residuals. Then check your answer with the function
qqnorm. Discuss what conclusions can be drawn from the quantile-quantile plot.
3. By using the vector of fitted values along with elementary R computations, verify that the values of
R², adjusted R², and the F-statistic provided in the R summary for the model are correct. Carefully
interpret the values of these statistics.
4. Carry out a "manual model reduction" to eventually arrive at a two-predictor model that seems to be
good. To do so, first eliminate the one of the seven predictors that seems to you to be least important.
Use the function lm to fit a linear model with the remaining six as predictors. Then proceed to eliminate
another predictor that seems less important than the others. Fit a linear model having the remaining five
as predictors. Continue the process, eliminating one variable at a time, until you reach a linear model
with response variable "quality" and two remaining predictors. At each stage, discuss your results
carefully, explaining why you decided to eliminate the particular predictor that you did. [There is not
just one correct answer here.]
(The reduction above must be done “manually”. Do not use sophisticated algorithmic reduction techniques. You should include your R code and output for each step, but the main grading emphasis will
be upon the written analysis.)
5. Compare the listed values of R², adjusted R², and F in the two-predictor model at the end of part (4)
with the listed values of R², adjusted R², and F for the seven-predictor model in part (1). Discuss your
results.
6. Again consider the wine quality data from part (1). Produce a scatterplot matrix which includes the
response variable and all seven quantitative predictors. Does the scatterplot matrix give any support for
the conclusion that you reached in part (4)? Explain.
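
The computations asked for in parts (2) and (3) can be sketched as follows. This is only an illustration under stated assumptions: the data frame dat is simulated stand-in data (not the wine data), and the names dat, fit, r2, adj_r2, f_stat, and theo are hypothetical. It rebuilds R², adjusted R², and the F-statistic from the fitted values using the usual sums of squares, and it constructs the quantile-quantile coordinates by hand with qnorm before comparing them with qqnorm.

```r
# Stand-in data (NOT the wine data): 32 rows, a response, seven predictors.
set.seed(2)
n <- 32
X <- as.data.frame(matrix(rnorm(n * 7), nrow = n))
names(X) <- c("x2", "x3", "x4", "x6", "x7", "x8", "x9")
dat <- data.frame(y = 1 + 0.8 * X$x4 - 0.5 * X$x9 + rnorm(n), X)

fit <- lm(y ~ ., data = dat)   # all seven predictors
s <- summary(fit)

# Part (3): rebuild R^2, adjusted R^2, and F from the fitted values.
y_hat <- fitted(fit)
p <- 7                                   # number of predictors
sse <- sum((dat$y - y_hat)^2)            # residual sum of squares
sst <- sum((dat$y - mean(dat$y))^2)      # total sum of squares
r2 <- 1 - sse / sst
adj_r2 <- 1 - (sse / (n - p - 1)) / (sst / (n - 1))
f_stat <- ((sst - sse) / p) / (sse / (n - p - 1))

# Compare with the values reported by summary(); these should agree up to
# floating-point tolerance.
all.equal(r2, s$r.squared)
all.equal(adj_r2, s$adj.r.squared)
all.equal(f_stat, unname(s$fstatistic["value"]))

# Part (2): quantile-quantile plot by hand. ppoints() gives the plotting
# positions that qqnorm() also uses.
res <- resid(fit)
theo <- qnorm(ppoints(n))                # theoretical normal quantiles
plot(theo, sort(res),
     xlab = "Theoretical quantiles", ylab = "Sample quantiles")

# Check against qqnorm() without drawing a second plot.
qq <- qqnorm(res, plot.it = FALSE)
all.equal(sort(qq$x), theo)
```

The same sums of squares apply to any reduced model from part (4), with p set to the number of predictors that remain.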
Due date: September 30, 2022