Relevant conceptual problems: Chapter 1: 14, 15 (omit short- and long-tailed),
Chapter 2: 6, 8
1, new) 6 pts. The data in radon.txt are from a simple
random sample of 42 owner-occupied houses in Ramsey County, MN. The
numbers are the radon concentration (pCi/l) in the house.
a) Draw a box plot of the radon concentrations and say whether the distribution is symmetrical or skewed. Your answer is the plot and your description.
b) Calculate and report the average and median radon concentration. Include units in your answers.
c) Would you expect the average and median for these data to be similar? Why (or why not)?
Note: the question is not asking whether the two values are identical - when estimated from data, they almost never are identical. It is asking whether you should expect them to be close.
d) Calculate the standard error of the mean radon concentration of
owner-occupied houses in Ramsey County.
Imagine you have taken a second simple random sample of radon in Ramsey Co, MN, homes.
There are 250 homes in this sample.
e) Would you expect the average
radon concentration from that second sample to be close to the mean you calculated in question 1b? Something larger? Something smaller?
Explain (briefly) your answer.
f) Would you expect the standard error of the mean for the 2nd sample to be close to the se you calculated in question 1d? Something smaller? Something larger? Explain (briefly) your answer.
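For parts b), d) and f), the following is a minimal Python sketch of the quantities involved. It assumes the usual definition of the standard error of the mean, s / sqrt(n), with s the sample standard deviation. The helper name `summarize` is hypothetical, and the values are made up for illustration; they are not the radon.txt data.

```python
import math
import statistics

def summarize(values):
    """Return (mean, median, standard error of the mean) for a sample."""
    n = len(values)
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)   # sample standard deviation (n - 1 divisor)
    se = sd / math.sqrt(n)          # standard error of the mean
    return mean, median, se

# Made-up concentrations (pCi/l), NOT the homework data.
example = [1.0, 2.0, 2.0, 3.0, 12.0]
mean, median, se = summarize(example)
print(f"mean = {mean:.2f} pCi/l, median = {median:.2f} pCi/l, se = {se:.2f} pCi/l")
```

The formula makes explicit that the standard error depends on both the sample standard deviation and the sample size.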
2, new) 7 pts (2 for part f). This problem is based on a study of the effect of a specific mutagen on polychromatophilic erythrocytes. 9 vials of cells were haphazardly taken from a "master" cell culture. 4 vials were randomly assigned to the control treatment (no mutagen). The other 5 vials were grown with 80 mg/ml of the mutagen. The cells in each vial were allowed to grow for an appropriate length of time, then the number of microsatellite nuclei was counted in 100 cells from each vial.
The data are:
a) What are the treatments? What, if anything, are they randomly assigned to? Can causal conclusions be statistically justified? Why or why not?
Consider a permutation test of the null hypothesis that
there is no effect of the mutagen, i.e. that the mean number of
microsatellite nuclei is the same in the two groups. We will use
the difference in means (dose of 80 mg/ml - control) as our test statistic.
So, a positive value for the test statistic indicates that the 80 mg/ml
dose had a larger number of microsatellite nuclei.
The observed difference (calculated from the data above) is 8.2.
There are a total of 126 permutations of 9 things into a group
of 4 and a group of 5. The 126 values of the test statistic (difference in means, dose - control), sorted from
smallest to largest, are:
-9.35 -8.9 -8 -8 -8 -8 -8 -7.1 -7.1 -7.1 -7.1 -6.65 -6.65 -6.65 -6.65 -5.75
-5.75 -5.75 -5.75 -5.75 -5.75 -5.3 -4.4 -4.4 -4.4 -4.4 -3.95 -3.95 -3.95 -3.95
-3.05 -3.05 -3.05 -3.05 -3.05 -3.05 -3.05 -3.05 -3.05 -3.05 -2.15 -2.15 -2.15 -2.15 -2.15
-2.15 -1.7 -1.7 -1.7 -1.7 -1.7 -1.7 -1.25 -0.8 -0.8 -0.8 -0.8 -0.35 -0.35 -0.35 -0.35
0.1 0.1 0.1 0.1 1 1 1 1 1 1 1 1 1 1 1 1.9 1.9 1.9 1.9 1.9 1.9 2.35 2.35 2.35 2.35 2.35 2.35 3.25
3.25 3.25 3.25 3.7 3.7 3.7 3.7 4.6 4.6 4.6 4.6 4.6 4.6 5.05 5.05 5.05 5.05 5.05 5.05 5.95 5.95
5.95 5.95 5.95 5.95 5.95 5.95 5.95 5.95 6.85 6.85 6.85 6.85 7.3 7.3 7.3 8.2
b) If the null hypothesis (no difference) is true, what is the
probability of 8.2 or a more extreme positive value?
c) What is the two-sided p-value for the test of the null hypothesis
of no difference?
Note: You may check your work using the computer, but please make sure you
know how to compute the appropriate probabilities from the distribution above.
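As a way to check hand calculations, here is a hedged Python sketch of an exact permutation test. It enumerates every way to split 9 pooled values into groups of 4 and 5, then counts how many differences in means are at least as extreme as the observed one. The function names are hypothetical and the counts are made up; they are not the homework data. The two-sided p-value here compares absolute values of the test statistic, which is one common convention and an assumption on my part.

```python
from itertools import combinations
from math import comb

def permutation_distribution(control, treated):
    """Difference in means (treated - control) for every regrouping of the pooled values."""
    pooled = list(control) + list(treated)
    n_control = len(control)
    n_treated = len(pooled) - n_control
    total = sum(pooled)
    diffs = []
    # Each choice of indices for the control group defines one permutation.
    for idx in combinations(range(len(pooled)), n_control):
        control_sum = sum(pooled[i] for i in idx)
        treated_sum = total - control_sum
        diffs.append(treated_sum / n_treated - control_sum / n_control)
    return sorted(diffs)

def p_values(diffs, observed, tol=1e-9):
    """One-sided (upper tail) and two-sided p-values from the full distribution."""
    # tol guards against floating-point ties with the observed value.
    upper = sum(d >= observed - tol for d in diffs) / len(diffs)
    two_sided = sum(abs(d) >= abs(observed) - tol for d in diffs) / len(diffs)
    return upper, two_sided

# Made-up counts, NOT the homework data.
control = [1, 2, 3, 4]
treated = [5, 6, 7, 8, 9]
diffs = permutation_distribution(control, treated)
observed = sum(treated) / len(treated) - sum(control) / len(control)
upper, two_sided = p_values(diffs, observed)
print(len(diffs), comb(9, 4))   # number of regroupings, computed two ways
print(upper, two_sided)
```

The same `p_values` helper can be applied directly to a list of permutation values that has already been tabulated.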
d) Use the computer to conduct a randomization test, using 1000 samples of the mean difference.
What is the two-sided p-value reported by the computer?
e) Why is your answer to part d) slightly different from that in part c)?
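For reference, the kind of randomization test part d) refers to can be sketched in Python as below. This is an illustration under stated assumptions, not the procedure used by any particular statistics package: `randomization_test` is a hypothetical name, the data are made up, and the two-sided p-value is taken as the fraction of random regroupings whose absolute difference in means is at least the observed one. Some software instead reports (count + 1) / (samples + 1).

```python
import random

def randomization_test(control, treated, n_resamples=1000, seed=None):
    """Two-sided randomization p-value for the difference in means (treated - control)."""
    rng = random.Random(seed)
    pooled = list(control) + list(treated)
    n_control = len(control)

    def mean_diff(values):
        c = values[:n_control]
        t = values[n_control:]
        return sum(t) / len(t) - sum(c) / len(c)

    observed = mean_diff(pooled)      # computed before any shuffling
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)           # one random reassignment of vials to groups
        if abs(mean_diff(pooled)) >= abs(observed) - 1e-9:
            extreme += 1
    return extreme / n_resamples

# Made-up counts, NOT the homework data.
control = [1, 2, 3, 4]
treated = [5, 6, 7, 8, 9]
p = randomization_test(control, treated, n_resamples=1000, seed=1)
print(p)
```

Running the function with a different `seed` draws a different set of 1000 regroupings.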
f) Write a short conclusion (1 or perhaps a few sentences) about
the effect of this mutagen on the number of microsatellite nuclei,
using the permutation test for your inference. Notes: It
is also useful to report the estimated effect, in addition to the
statistical inference. The "Summary of Statistical Analysis"
paragraphs at the end of each case study are good models.
3) Chapter 1, problem 26. 4 pts. The book describes the data set and asks you to "evaluate the evidence supporting party differences in the percentage of pro-environment votes. Write a brief report of your conclusions, including appropriate graphical display(s) and summary statistics."
The data are in votes.txt on the class web site. The data file includes more variables than you need. PctPro is the percent of pro-environment votes. Note: Your conclusions should be based only on graphical and descriptive analyses. Do not conduct any significance tests or calculate any confidence intervals.
This is an example of what the book calls a Data Problem. The problem provides a context and a motivation for a data analysis. You figure out what is appropriate, do it, then write a short summary.
Please organize your summary in four parts (a - d below). Each part is a separate question, just like the parts of other questions. Each can be a bulleted list of phrases or (if you must) a sentence or two.
a) 1 pt. Methods
b) 1 pt. Appropriate summary statistics
c) 1 pt. Appropriate graphics
d) 1 pt. Conclusions
The key word above is "appropriate". There may be more than one appropriate method. Choose what you (or you and your friends) feel is the most appropriate. We do not want a 'statistics dump' where you use and report on every possible method. We will deduct points for an excessive dump. We will not deduct points for doing something appropriate that isn't exactly what we would do.
4) 1 pt. I specifically told you not to do a statistical test in problem 3.
If you want to evaluate party differences in the percentage of pro-environment votes, why
is a statistical test not needed with these data?
One or two sentences is sufficient.
5) 2 pt. The computer reports that the sample average is 120.57312. This is calculated
from 16 observations.
a) 1 pt. If the standard deviation is 35.45, use Kelley's rule to appropriately report the average.
b) 1 pt. If the standard deviation is 0.984, use Kelley's rule to appropriately report the average.
Note: If you missed the lecture discussion of number of digits for reporting results, see the lecture summary or a classmate for notes. An internet search for Kelley's rule is most likely to find something on betting or psychometric scale construction. Neither is relevant to the HW question.