Stat 5870 section 2 - Fall 2024

Homework 5. Due Tuesday, 1 Oct, 11:59 pm, to Canvas/Gradescope

Relevant conceptual problems (3rd ed. numbers). Chapter 3: 1-8, 11, 16, 17

Problems to turn in:
0) 0 pts. I will provide a packet of computer output (with R code, SAS code, or a description of how it was obtained in JMP) to accompany some of my questions on the midterm exam. Which software are you using, i.e., which packet do you want: JMP, R, or SAS?

1) 8 pts total. Here are 4 short descriptions of a study.
In each study, identify the experimental unit(s), the treatment that is "assigned" to the experimental unit, and the observational unit.
No explanations are needed.
For observational studies, identify what you believe is the most reasonable analog of the experimental unit. I.e., if the treatment had been randomly assigned, what would it have been assigned to?
Note: there can be more than one experimental unit in a study. If so, identify all experimental units and the treatment assigned to each.

a) An experimental evaluation of soybean response to phosphorus fertilization. An experimental field contains 10 plots. 5 plots (control plots) are randomly chosen to receive the usual fertilizer regime; the other 5 plots (fertilized plots) receive the usual fertilizer plus 48 lbs per acre of P fertilizer. The response is the total soybean yield on a plot. There are 10 rows in the data set.

b) An observational evaluation of the effect of neighborhood poverty on reading proficiency. Census data are used to identify 10 neighborhoods with average income near the poverty line (poor) and 10 neighborhoods with average income more than twice the national average (rich). The school closest to the geographic center of each neighborhood is identified. The reading proficiency of each 8th grade student at that school is measured using a standardized test. The data are the reading scores for each student. There are a total of approximately 3000 rows in the data set.

c) The same as the previous study, with one exception: you are not given data on individual students. You are only given the average reading score for 8th grade students at the school. There are a total of 20 observations.

d) An experimental vaccine trial in pigs. You have two controlled climate rooms. Each contains 9 piglets. You randomly assign one room to 'vaccine'; all 9 piglets in that room are vaccinated. The 9 piglets in the other room receive the placebo treatment. All piglets are then exposed to the infectious agent. After 2 weeks, 3 piglets are randomly selected from each room and sacrificed to measure disease development. After 4 weeks, 3 of the remaining piglets in each room are randomly selected and sacrificed. The last 3 piglets in each room are sacrificed at 6 weeks. There are a total of 18 observations.
Note: This study has two treatment factors (vaccination and date).

2) The data in burn.csv were collected as part of a study on the response of a prairie to experimental burning. Ten watersheds in the prairie were delineated. Five watersheds were randomly chosen to be burned in Spring 2010; the other five were left unburnt. In Fall 2011, the percent cover of shrubs was measured in five 10m x 10m plots in each watershed. There are 50 observations in the data set.
Use an analysis that satisfies the assumption of independence to answer the following questions.

a) 1 pt. Estimate the decrease in percent shrub cover caused by burning.
b) 1 pt. What is the standard error of that estimate? Assume equal variances.
c) 1 pt. How many degrees of freedom are associated with that standard error?
d) 1 pt. What is a 95% confidence interval for the effect of burning (i.e., the decrease in shrub cover caused by burning)?
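
A minimal sketch of one way to set up an analysis on experimental-unit means when each unit has several subsamples. It uses made-up numbers, not burn.csv, and the names (toy_plots, pooled_two_sample, t_crit) are hypothetical. The t critical value is supplied by the user from a table or software for the appropriate degrees of freedom; 2.776 is the 0.975 quantile for the 4 df in this toy example only.

```python
from math import sqrt
from statistics import mean, variance

def pooled_two_sample(group_a, group_b, t_crit):
    """Equal-variance two-sample comparison of mean(group_a) - mean(group_b)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: df-weighted average of the two sample variances
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    diff = mean(group_a) - mean(group_b)
    se = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return {"diff": diff, "se": se, "df": df,
            "ci": (diff - t_crit * se, diff + t_crit * se)}

# Made-up data: 3 units per treatment, 2 subsamples (plots) per unit.
toy_plots = {
    ("unburned", "U1"): [30.0, 34.0],
    ("unburned", "U2"): [28.0, 30.0],
    ("unburned", "U3"): [35.0, 37.0],
    ("burned", "B1"): [20.0, 22.0],
    ("burned", "B2"): [18.0, 16.0],
    ("burned", "B3"): [24.0, 26.0],
}

# Step 1: reduce the subsamples to one mean per unit.
unit_means = {"unburned": [], "burned": []}
for (treatment, unit), values in toy_plots.items():
    unit_means[treatment].append(mean(values))

# Step 2: compare the unit means (unburned minus burned = decrease from burning).
result = pooled_two_sample(unit_means["unburned"], unit_means["burned"], t_crit=2.776)
print(result)
```

In this sketch the degrees of freedom come from the number of unit means, not from the number of subsamples.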

3) The data in dioxin.csv are from an observational study of dioxin levels in blood. The background is explained in Chapter 3, case study 2. A short summary is that these are from 646 US Army soldiers who served in Vietnam during 1967 and 1968 and 97 US Army soldiers who entered the army at similar times but served only in the US or Germany. The Veteran variable in the data set identifies the group of soldiers. Many soldiers serving in Vietnam were exposed to Agent Orange, a defoliant that contains dioxins. Dioxins are implicated in increased risk of cancer and are especially persistent in both the environment and the human body. These data are 1987 measurements of dioxin levels in blood. Please use the data set I provide; I have adjusted reported 0 and 1 values to more reasonable numbers.

a) 2 pt. Consider a model with a different mean for each group of soldiers. Is it reasonable to assume that errors (variation of individuals around their group mean) are normally distributed? Include your evidence (plot or numbers) with your answer.

You decide to log transform the Dioxin values. All subsequent parts use log-transformed Dioxin as the response.

b) 2 pt. Analyze the data after log transforming all values. Is it appropriate to assume equal variances of these errors? Include your evidence (plot or numbers) with your answer.
c) 1 pt. Use a t-test to assess whether the two groups have the same mean log-transformed Dioxin concentration. What is the p-value from that t-test?
d) 2 pt. Estimate the multiplicative treatment effect using results from the analysis of log transformed values. Write a one-sentence interpretation of this estimate.
Note: My lecture statements like "the median CFU in the control treatment is an estimated 5.87 times that in the active treatment" (I had three equivalent versions) are one-sentence interpretations of the multiplicative treatment effect.
e) 1 pt. Estimate a 95% confidence interval for the multiplicative treatment effect. No interpretation or conclusion needed.
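
A minimal sketch, on made-up numbers rather than dioxin.csv, of how a difference and its confidence interval on the log scale can be back-transformed to a multiplicative effect. The names (multiplicative_effect, toy_a, toy_b, t_crit) are hypothetical, and an equal-variance pooled analysis is assumed. The critical value 2.776 is the 0.975 t quantile for the 4 df in this toy example; for real data it has to be obtained for the actual degrees of freedom.

```python
from math import exp, log, sqrt
from statistics import mean, variance

def multiplicative_effect(values_a, values_b, t_crit):
    """Estimate the ratio (group a relative to group b) from a log-scale analysis."""
    log_a = [log(v) for v in values_a]
    log_b = [log(v) for v in values_b]
    n_a, n_b = len(log_a), len(log_b)
    df = n_a + n_b - 2
    # Pooled variance of the log-transformed values (equal-variance assumption)
    pooled_var = ((n_a - 1) * variance(log_a) + (n_b - 1) * variance(log_b)) / df
    diff = mean(log_a) - mean(log_b)          # additive effect on the log scale
    se = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    lower, upper = diff - t_crit * se, diff + t_crit * se
    # Exponentiate the estimate and both interval endpoints
    return {"ratio": exp(diff), "ci": (exp(lower), exp(upper)), "df": df}

# Made-up positive responses for two groups
toy_a = [2.0, 4.0, 8.0]
toy_b = [1.0, 2.0, 4.0]

effect = multiplicative_effect(toy_a, toy_b, t_crit=2.776)
print(effect)
```

The back-transformed estimate is read as a ratio: the response in group a is an estimated ratio times that in group b, with the interval endpoints on the same ratio scale.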