Stat 5870, section 2, Homework 6. Due Tuesday, 15 Oct, 11:59 pm.

Relevant conceptual problems.
Chapter 18: 1, 2, 3, 6 (referring to study design, but not any computations, in 5), 7, 8
Chapter 19: 1, 3, 6, 9

Computational/Data problems (turn in).
Questions are 1 point each, except where indicated as 2 points.
1) The data given below and repeated in perry.csv are from a randomized social experiment started in 1962. A total of 123 three- and four-year-old children were randomly assigned to either 2 years of preschool instruction or a control group that received no preschool education. The response, recorded for each individual, was whether or not that individual had been arrested for any crime before the age of 19 (Yes or No). The numbers of individuals in each combination of treatment and response groups are:

             Yes    No
Preschool     19    42
Control       32    30

Note: Know how to do parts b, c, d, and e by hand, even if you check your work using the computer.
a) 2 pts. What type of sampling design was used in this study? Briefly explain your choice.
b) Estimate the proportion of the preschool children who were arrested before age 19.
c) Estimate the standard error of this proportion.
d) Estimate the odds that a preschool child will be arrested before age 19.
e) Estimate the odds ratio that fills in the blank in this sentence: The odds that a control child is arrested are ____ times as large as the odds for a preschool child.
f) 2 pts. Estimate a 95% confidence interval for the odds ratio in part e.
g) Use a Chi-square test to test the hypothesis that control and preschool children are equally likely to be arrested by age 19. Do not use the continuity correction. Report the Chi-square statistic, p-value, and a one-sentence conclusion.
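
For checking hand calculations on a 2x2 table like the one in problem 1, here is a minimal Python sketch. It is not a required method. It assumes a Wald interval on the log odds ratio scale for the confidence interval and the Pearson Chi-square statistic without continuity correction; check these against the course notes. The function name two_by_two_summary and the counts in the usage line are hypothetical and are not the assignment data.

```python
import math

def two_by_two_summary(a, b, c, d, z=1.96):
    """Summaries for a 2x2 table of counts (hypothetical helper).

    a, b = Yes / No counts in group 1
    c, d = Yes / No counts in group 2
    z    = normal quantile for the interval (1.96 for roughly 95%)
    """
    n1 = a + b
    p1 = a / n1                              # proportion of Yes in group 1
    se_p1 = math.sqrt(p1 * (1 - p1) / n1)    # standard error of that proportion
    odds1 = a / b                            # odds of Yes in group 1
    odds2 = c / d                            # odds of Yes in group 2
    odds_ratio = odds2 / odds1               # group 2 odds relative to group 1

    # Assumed method: Wald interval computed on the log odds ratio scale.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    ci = (math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or))

    # Pearson Chi-square for a 2x2 table, no continuity correction.
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail probability of a Chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))

    return {
        "p1": p1, "se_p1": se_p1, "odds1": odds1,
        "odds_ratio": odds_ratio, "ci": ci,
        "chi2": chi2, "p_value": p_value,
    }

# Usage with made-up counts (NOT the assignment data).
result = two_by_two_summary(10, 30, 20, 20)
for name, value in result.items():
    print(name, value)
```

The p-value line uses the fact that a Chi-square variable with 1 degree of freedom is a squared standard normal, so no statistics library is needed.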

2) The data given below and repeated in epi.csv are made up; they represent a study of the association between a potential risk factor and breast cancer. The population of interest is women in a small city. The investigators identified 100 women with breast cancer, then identified 100 demographically similar women without breast cancer. There is no matching at the individual level, so we will ignore the 'demographically similar' part of the study design. They then asked each woman about her exposure to that risk factor. Group A was exposed to the risk factor; Group B was not. The numbers of women in each combination of risk factor and cancer groups are:

Group:     No cancer   Cancer
exposed           40       50
not               60       50

a) 2 pts. What type of sampling design was used in this study? Briefly explain your choice.
b) 2 pts. Consider the women in Group A. Estimate the proportion of cancer cases among the Group A (exposed) women in this study. Repeat for Group B (not), then calculate the difference in proportions, as A - B.
c) Calculate the odds ratio that fills in the blank in this sentence: The odds of breast cancer for a woman in Group A are ______ times as large as those for a woman in Group B.

The investigators then repeat the study with new individuals and a larger sample size. They could only find 100 cases of cancer, but they identified 2500 non-cancer individuals. Groups A (exposed) and B (not) were defined the same way as in the previous study. The numbers of women in each combination of risk factor and cancer groups in the new study are:

Group:     No cancer   Cancer
exposed         1000       50
not             1500       50

These data are in epi2.csv.

d) 2 pts. Repeat question 2b for the new study. That is, estimate the proportion of cancer cases in the Group A women, the Group B women, and the difference in those proportions.
e) Repeat question 2c for the new study. That is, calculate the odds ratio that fills in the blank in this sentence: The odds of breast cancer for a woman in Group A are ______ times as large as those for a woman in Group B.
f) 2 pts. The first study was intended to assess the consequences of exposure to the risk factor in populations with different background levels of breast cancer. Is it more appropriate to report the difference in proportions (i.e., your answer from question 2b) or the odds ratio (i.e., your answer from question 2c)? Briefly explain your choice.
g) Which study gives you the more precise estimate of the odds ratio? Briefly explain your choice.