Homework #8: Due Tuesday 29 Oct, by 11:59 pm.
Conceptual problems:
Chapter 6:7:
Computational problems: (turn in)
Note: For all problems this week, analyze the data as presented.
No need to worry about modifying the analysis to better satisfy assumptions.
1) Oxygen isotopic composition in Tyrannosaurus rex bones. The book describes this data set in problem 5:23.
A short summary is that we are interested in the relative amounts of two isotopes of oxygen in bones of a single
dinosaur skeleton. This ratio depends on the temperature of the bone when the calcium phosphate was laid down. So,
if T rex was homeothermic (warm-blooded), the temperature should be relatively similar throughout the body and the isotope
ratio should be relatively similar. If T rex was poikilothermic (cold-blooded), the isotopic ratio should vary between the
bones. The data in Trex.csv are measurements on small pieces from 12 bones from a single T rex skeleton.
a) Test the null hypothesis that all 12 bones have the same mean isotopic composition. Report your test statistic and
p-value, and write a one-sentence conclusion.
b) Imagine that the investigators were specifically interested in the difference between bone 4 (dorsal vertebrae 1)
and bone 11 (mid-caudal). Test whether this difference = 0. Consider this a pre-planned (a-priori) comparison.
Report the appropriate p-value.
c) The contrast between bones 4 and 11 is actually not a pre-planned question. The investigators considered all
possible pairwise differences. If you use Fisher's protected LSD approach to multiple comparisons, what p-value
do you report for the comparison of bones 4 and 11?
d) If you used Tukey's adjustment for multiple comparisons, what p-value do you report for the comparison
of bones 4 and 11?
e) If you used the Bonferroni adjustment for multiple comparisons,
what p-value do you report for the comparison of bones 4 and 11?
JMP users: you will need to get the unadjusted p-value from JMP, then hand-compute the Bonferroni adjusted p-value.
f) Plot residuals vs predicted values for the 12 bones. Your answer is the plot.
g) Use the residual vs predicted value plot to assess the assumptions of equal variance and normality. Are these
assumptions reasonable? Briefly explain your answer.
h) Should you repeat the analysis after log transforming the oxygen isotope ratio? Briefly explain why or why not.
i) Deliberately mis-analyze the data by treating bone number as a continuous variable (not a categorical variable).
Look at the ANOVA table for the "continuous bone" analysis. There is one number in this table that indicates you have
done the wrong analysis. What is that number and why is it wrong?
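JMP users may find it helpful to see the hand computation in part (e) written out. The following is only a minimal Python sketch, assuming the adjustment covers all possible pairs among the 12 bones. The function name bonferroni_adjust and the unadjusted p-value 0.0004 are made up for illustration; they are not results from Trex.csv.

```python
from math import comb

def bonferroni_adjust(p_unadjusted, n_groups):
    """Bonferroni-adjust one pairwise p-value, assuming all pairs of groups are compared."""
    n_comparisons = comb(n_groups, 2)  # number of distinct pairs of groups
    p_adjusted = min(1.0, p_unadjusted * n_comparisons)  # a p-value cannot exceed 1
    return p_adjusted, n_comparisons

# 0.0004 is a placeholder value, NOT the answer to part (b).
p_adj, m = bonferroni_adjust(0.0004, 12)
print(m, p_adj)
```

Multiplying by the number of comparisons and capping at 1 is the usual way to report a Bonferroni-adjusted p-value; the equivalent alternative is to compare the unadjusted p-value to 0.05 divided by the number of comparisons.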
2) This problem is based on a metabolomic study. The study examines differences in fatty acid composition between two genotypes of Arabidopsis. The data are collected from 12 replicates of the wild type and 6 replicates of a mutant in which a specific gene has been disabled (a "knockout" mutant). Genotype is indicated by the genotype variable (Wild_type or SALK_078745). A gas chromatograph coupled to a mass spectrometer (GC/MS) can measure the concentration of many different compounds in each sample. Here, GC/MS was used to measure the concentration of various fatty acids and related compounds in each sample. Some compounds could be named (e.g., 16_Hydroxyhexadecanoic_acid); others are known only by a unique identifier (e.g., BJN_GCMS_FAMES_1735_2). There are 115 compounds that were consistently found in these two genotypes. The raw data are in fa2.csv; unadjusted p-values from t-tests comparing the two genotypes are in faP.csv.
a) How many fatty acids have unadjusted p-values < 5%?
b) How many fatty acids are "discoveries" if you use the Benjamini-Hochberg method with a false discovery rate of 10%?
c) Consider Stigmasterol, one of the measured fatty acids (column 38 in fa2.csv and row 34 in faP.csv).
If Stigmasterol was the primary outcome in the study, what is an appropriate conclusion about Stigmasterol
concentration in these two genotypes? Use a p-value threshold of 5% or a false discovery rate of 10%, as
needed for the question.
d) If the study was intended to identify compounds that differ between the two genotypes, with no prior
focus on Stigmasterol, what is an appropriate conclusion about Stigmasterol? Use a p-value threshold of 5%
or a false discovery rate of 10%, as needed for the question.
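For reference, the Benjamini-Hochberg step-up rule used in parts (b) and (d) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name benjamini_hochberg and the five p-values are made up, and they are not taken from faP.csv. The rule sorts the m p-values, finds the largest rank k with p(k) <= (k/m) * FDR, and declares the k smallest p-values to be discoveries.

```python
def benjamini_hochberg(p_values, fdr=0.10):
    """Return a list of booleans (in the original order): True marks a discovery."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices from smallest to largest p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            k_max = rank  # keep the LARGEST rank that meets its threshold
    discoveries = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            discoveries[idx] = True
    return discoveries

# Made-up p-values for illustration only.
example_p = [0.001, 0.05, 0.055, 0.20, 0.07]
flags = benjamini_hochberg(example_p, fdr=0.10)
print(flags, sum(flags))
```

In this made-up example, 0.05 exceeds its own threshold (2/5 * 0.10 = 0.04) but is still a discovery because a larger p-value, 0.07, meets its threshold at rank 4. That is the step-up feature of the method.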
3) The data in music.txt come from an observational study of the relationship between playing a string instrument (violin, cello, ...) and brain activity. Years is the number of years spent playing a string instrument. Folks who never played have a value of 0. Brain activity is recorded as a neuronal activity index, the nai variable, which quantifies brain activity while performing a standardized task. A more complete description of the study is in Chapter 7, problem 30. Please answer the following questions:
a) Plot the data (X=years, Y=nai) and look at your plot. Will a linear
regression model be a reasonable summary of the relationship between nai
and years of playing? Briefly explain why or why not.
b) Estimate and report the intercept and slope for the regression: nai = b0 +
b1*years.
c) In the context of this problem, describe to someone who hasn't studied statistics
the interpretation of the slope.
d) Use the regression to estimate the mean activity for people who have never
played (years = 0).
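Any statistical package will fit the regression in part (b). As a reminder of what the software computes, here is a minimal Python sketch of the least squares formulas b1 = Sxy/Sxx and b0 = ybar - b1*xbar. The function name fit_line and the five data points are made up for illustration (they lie exactly on a line so the result is easy to check); they are not values from music.txt.

```python
def fit_line(x, y):
    """Least squares intercept b0 and slope b1 for the model y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope: change in mean y per one-unit increase in x
    b0 = y_bar - b1 * x_bar   # intercept: estimated mean y when x = 0
    return b0, b1

# Made-up (years, nai) pairs for illustration only.
years = [0, 0, 5, 10, 15]
nai = [2.0, 2.0, 4.5, 7.0, 9.5]
b0, b1 = fit_line(years, nai)
print(b0, b1)
```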