STA 622, Assignment 7, Fall 2020
Please start each of the following problems on a separate page.  Each page should start with your answer to the question, followed by the relevant R code, followed by the relevant R output.  For #1, this can be done one variable at a time, but please take the variables in the order listed in the following table.
You should interpret your output, meaning you should not supply output and expect me to interpret it for you.

Variable   Description
biomass    Aerial biomass production of the marsh grass Spartina alterniflora
H2S        Free sulfide
PH         Soil pH
P          Phosphorus
type       DVEG, SHRT or TALL (make SHRT the reference group)
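
R sorts factor levels alphabetically, so DVEG would be the reference group by default. The following is a minimal sketch of one way to make SHRT the reference group instead. The data frame name marsh and its values are hypothetical stand-ins, not the assignment data; only the column name type comes from the table.

```r
# Hypothetical stand-in for the assignment data; only the 'type' column matters here.
marsh <- data.frame(type = c("DVEG", "SHRT", "TALL", "SHRT", "DVEG", "TALL"))

# Convert to a factor; by default the levels are sorted alphabetically (DVEG first).
marsh$type <- factor(marsh$type)

# Move SHRT to the first level so that lm() builds dummy variables for DVEG and TALL.
marsh$type <- relevel(marsh$type, ref = "SHRT")

# The first level listed is the reference group.
levels(marsh$type)
```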

 
You do not need to perform any remedies for issues that you identify.  Just note the presence of the issue.  You can assume this data set is a random sample from the target population.
 

  1. Do an initial univariate data analysis (one variable at a time), to get familiar with all variables. This means summaries of all variables, especially the response variable.
  • For quantitative variables, create basic numerical summaries and a graphical summary. Comment on any interesting features such as non-normal shapes, outliers, or missing values (by noting different sample sizes across variables).
  • For categorical variables, determine how many individuals fall in each category (frequency table), and note any missing values. Note any categories with small counts.
  2. Perform basic bivariate (Y with one X at a time) explorations.
  • Make a scatter-matrix of your quantitative explanatory variables with the response variable, verifying that all relationships with the response variable are linear (or at least not obviously non-linear), and looking for outliers.
  • Make side-by-side boxplots of the response variable by the categorical explanatory variable, and note any interesting features, such as outliers, shapes of distributions, or differences in center or variability.
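
The univariate and bivariate explorations above might be sketched as follows, using base R only. The data here are simulated purely so the code runs on its own (they are NOT the assignment data, and the simulation settings are arbitrary assumptions); the data frame name marsh is hypothetical, while the column names follow the table.

```r
# Simulated stand-in data (NOT the assignment data); column names follow the table.
set.seed(1)
n <- 45
marsh <- data.frame(
  H2S  = rnorm(n, mean = -600, sd = 30),
  PH   = runif(n, min = 3.2, max = 7.5),
  P    = rexp(n, rate = 1 / 30),
  type = factor(rep(c("DVEG", "SHRT", "TALL"), each = n / 3))
)
marsh$biomass <- 200 + 300 * marsh$PH + rnorm(n, sd = 250)

# Univariate, quantitative: numerical summaries and the non-missing count per variable.
quant <- c("biomass", "H2S", "PH", "P")
summary(marsh[quant])
n_obs <- colSums(!is.na(marsh[quant]))   # differing counts would reveal missing values
n_obs
hist(marsh$biomass, main = "biomass", xlab = "biomass")

# Univariate, categorical: frequency table, with an NA column if any values are missing.
type_counts <- table(marsh$type, useNA = "ifany")
type_counts

# Bivariate: scatterplot matrix of the response with the quantitative predictors,
# and side-by-side boxplots of the response by type.
pairs(marsh[quant])
boxplot(biomass ~ type, data = marsh, ylab = "biomass")
```

Repeating the hist() call for each quantitative variable gives the remaining graphical summaries.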

 

  3. Check for collinearity for the quantitative explanatory variables.
  4. Fit a multiple linear regression (MLR) model using all of the explanatory variables listed in the table above. This should include dummy variables for the categorical explanatory variable; be sure to use SHRT as the reference category for type. Your answer should be the equation of the regression model.
  5. Check for all pairwise interactions simultaneously by performing a partial F test. Regardless of the outcome, do not include any interactions for the rest of this assignment.
  6. Check conditions (needed to perform statistical inference) for the model in #4.
  7. Interpret all slopes that are in the model in #4 in context. Feel free to use the generic term “unit” when referring to the units for each of the quantitative variables.
  8. Interpret for your final model.  What does this tell you about how much faith you should have in any predictions?
  9. Compute and interpret a prediction interval for an individual with H2S of -600, pH of 5.0, P of 16.7, and type TALL.
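
The modelling steps above (collinearity check, main-effects fit, partial F test, condition checks, and prediction interval) might be sketched as follows, using base R only. The data are again simulated so that the code runs on its own (they are NOT the assignment data, and the simulation settings are arbitrary assumptions); marsh, fit, fit_int, and new_obs are hypothetical names. The variance inflation factors are computed here among the quantitative predictors only, which is one of several reasonable ways to check collinearity.

```r
# Simulated stand-in data (NOT the assignment data); column names follow the table.
set.seed(2)
n <- 45
marsh <- data.frame(
  H2S  = rnorm(n, mean = -600, sd = 30),
  PH   = runif(n, min = 3.2, max = 7.5),
  P    = rexp(n, rate = 1 / 30),
  type = relevel(factor(rep(c("DVEG", "SHRT", "TALL"), each = n / 3)), ref = "SHRT")
)
marsh$biomass <- 200 + 300 * marsh$PH + 150 * (marsh$type == "TALL") +
  rnorm(n, sd = 250)

# Collinearity: pairwise correlations and variance inflation factors (VIFs).
# The VIFs are the diagonal of the inverse correlation matrix of the predictors.
X <- marsh[c("H2S", "PH", "P")]
cor(X)
vif <- diag(solve(cor(X)))
vif

# Main-effects model; lm() creates the DVEG and TALL dummy variables automatically.
fit <- lm(biomass ~ H2S + PH + P + type, data = marsh)
summary(fit)

# Partial F test: main effects only vs. main effects plus all pairwise interactions.
fit_int <- lm(biomass ~ (H2S + PH + P + type)^2, data = marsh)
anova(fit, fit_int)

# Conditions: residuals vs. fitted, normal Q-Q, scale-location, and leverage plots.
par(mfrow = c(2, 2))
plot(fit)

# 95% prediction interval for an individual with the stated predictor values.
new_obs <- data.frame(H2S = -600, PH = 5.0, P = 16.7,
                      type = factor("TALL", levels = levels(marsh$type)))
pred <- predict(fit, newdata = new_obs, interval = "prediction", level = 0.95)
pred
```

The result of predict() is a one-row matrix with columns fit, lwr, and upr; the interval runs from lwr to upr.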