BIOSTATISTICS HOMEWORK
1. The following are body mass index (BMI) scores measured in 12 patients who are free of diabetes and participating in a study of risk factors for obesity. Body mass index is measured as the ratio of weight in kilograms to height in meters squared. Generate a 95% confidence interval estimate of the true BMI.
25 27 31 33 26 28 38 41 24 32 35 40
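One possible setup for Problem 1, as a minimal sketch: with n = 12 the interval is built from the t distribution. The critical value 2.201 (df = 11) is taken from a t table, and the variable names are illustrative only.

```python
import math
import statistics

# BMI scores listed in Problem 1
bmi = [25, 27, 31, 33, 26, 28, 38, 41, 24, 32, 35, 40]

n = len(bmi)
mean = statistics.mean(bmi)      # sample mean
sd = statistics.stdev(bmi)       # sample standard deviation (n - 1 denominator)
se = sd / math.sqrt(n)           # standard error of the mean

T_CRIT = 2.201                   # t table value, df = 11, 95% two-sided (assumed lookup)
margin = T_CRIT * se             # margin of error
ci = (mean - margin, mean + margin)

print(f"mean = {mean:.2f}, sd = {sd:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```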
2. Consider the data in Problem 1. How many subjects would be needed to ensure that a 95% confidence interval estimate of BMI had a margin of error not exceeding 2 units?
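A hedged sketch of the sample-size calculation for Problem 2, using n = (zσ/E)^2. It assumes the sample standard deviation from Problem 1 is an acceptable stand-in for σ and uses z = 1.96; both choices are assumptions, not part of the problem statement.

```python
import math
import statistics

bmi = [25, 27, 31, 33, 26, 28, 38, 41, 24, 32, 35, 40]

sigma = statistics.stdev(bmi)    # s from Problem 1, used in place of the unknown sigma (assumption)
Z = 1.96                         # z value for 95% confidence
E = 2                            # desired margin of error in BMI units

# Round up, since a fractional subject cannot be enrolled
n_required = math.ceil((Z * sigma / E) ** 2)
print(n_required)
```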
3. The mean BMI in patients free of diabetes was reported as 28.2. The investigator conducting the study described in Problem 1 hypothesizes that the BMI in patients free of diabetes is higher. Based on the data in Problem 1, is there evidence that the BMI is significantly higher than 28.2? Use a 5% level of significance.
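Problem 3 can be approached as an upper-tailed one-sample t test of H_{0}: μ=28.2 against H_{1}: μ>28.2. The sketch below assumes that framing; the critical value 1.796 (df = 11, one-sided 5%) comes from a t table.

```python
import math
import statistics

bmi = [25, 27, 31, 33, 26, 28, 38, 41, 24, 32, 35, 40]
MU_0 = 28.2                      # reported mean BMI under the null hypothesis

n = len(bmi)
se = statistics.stdev(bmi) / math.sqrt(n)
t_stat = (statistics.mean(bmi) - MU_0) / se

T_CRIT = 1.796                   # t table value, df = 11, upper-tailed 5% (assumed lookup)
reject_h0 = t_stat > T_CRIT      # decision rule: reject H0 if t exceeds the critical value

print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```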
4. Peak expiratory flow (PEF) is a measure of a patient’s ability to expel air from the lungs. Patients with asthma or other respiratory conditions often have restricted PEF. The mean PEF for children free of asthma is 306. An investigator wants to test whether children with chronic bronchitis have restricted PEF. A sample of 40 children with chronic bronchitis is studied and their mean PEF is 279 with a standard deviation of 71. Is there statistical evidence of a lower mean PEF in children with chronic bronchitis? Run the appropriate test at α=0.05.
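For Problem 4, one reasonable reading is a lower-tailed one-sample test of H_{0}: μ=306 against H_{1}: μ<306. Because n = 40, this sketch uses the z statistic with the sample standard deviation, which is an assumption about the intended method.

```python
import math

MU_0 = 306                       # mean PEF for children free of asthma
n, x_bar, s = 40, 279, 71        # sample summary from Problem 4

z = (x_bar - MU_0) / (s / math.sqrt(n))

Z_CRIT = -1.645                  # lower-tailed critical value at the 5% level
reject_h0 = z < Z_CRIT

print(f"z = {z:.3f}, reject H0: {reject_h0}")
```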
5. Consider again the study in Problem 4. A different investigator conducts a second study to investigate whether there is a difference in mean PEF in children with chronic bronchitis as compared to those without. Data on PEF are collected and summarized below. Based on the data, is there statistical evidence of a lower mean PEF in children with chronic bronchitis as compared to those without? Run the appropriate test at α=0.05.
Group                   Number of Children   Mean PEF   Std Dev PEF
Chronic Bronchitis      25                   281        68
No Chronic Bronchitis   25                   319        74
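A sketch of the two-independent-samples t test for Problem 5, assuming equal population variances so that a pooled standard deviation applies. The critical value of about -1.677 (df = 48, lower-tailed 5%) is a table lookup.

```python
import math

# Summary statistics from the Problem 5 table
n1, mean1, s1 = 25, 281, 68      # chronic bronchitis
n2, mean2, s2 = 25, 319, 74      # no chronic bronchitis

# Pooled standard deviation (assumes equal population variances)
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
se = sp * math.sqrt(1 / n1 + 1 / n2)
t_stat = (mean1 - mean2) / se

T_CRIT = -1.677                  # t table value, df = 48, lower-tailed 5% (assumed lookup)
reject_h0 = t_stat < T_CRIT

print(f"Sp = {sp:.2f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```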
6. Using the data presented in Problem 5,
a) Construct a 95% confidence interval for the mean PEF in children without chronic bronchitis.
b) How many children would be required to ensure that the margin of error in (a) does not exceed 10 units?
7. A clinical trial is run to investigate the effectiveness of an experimental drug in reducing preterm delivery compared to a drug considered standard care and to placebo. Pregnant women are enrolled and randomly assigned to receive either the experimental drug, the standard drug or placebo. Women are followed through delivery and classified as delivering preterm (< 37 weeks) or not. The data are shown below.
Preterm Delivery   Experimental Drug   Standard Drug   Placebo
Yes                17                  23              35
No                 83                  77              65
Is there a statistically significant difference in the proportions of women delivering preterm among the three treatment groups? Run the test at a 5% level of significance.
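The comparison in Problem 7 can be set up as a chi-square test of independence on the 2 x 3 table. The sketch below computes expected counts from the margins; the critical value 5.99 (df = 2) is taken from a chi-square table.

```python
# Observed counts from the Problem 7 table: rows = preterm yes/no, columns = treatment
observed = [
    [17, 23, 35],   # preterm: experimental, standard, placebo
    [83, 77, 65],   # not preterm
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_sq += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
CHI_CRIT = 5.99                  # chi-square table value, df = 2, 5% level (assumed lookup)
reject_h0 = chi_sq > CHI_CRIT

print(f"chi-square = {chi_sq:.2f}, df = {df}, reject H0: {reject_h0}")
```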
8. Using the data in Problem 7, generate a 95% confidence interval for the difference in proportions of women delivering preterm in the experimental and standard drug treatment groups.
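For Problem 8, a minimal sketch of the large-sample confidence interval for a difference in proportions, taking the difference as experimental minus standard (the direction is an arbitrary choice made here).

```python
import math

# Preterm deliveries out of 100 women per group (Problem 7 table)
p_exp = 17 / 100                 # experimental drug
p_std = 23 / 100                 # standard drug
n_exp = n_std = 100

diff = p_exp - p_std
se = math.sqrt(p_exp * (1 - p_exp) / n_exp + p_std * (1 - p_std) / n_std)

Z = 1.96                         # z value for 95% confidence
ci = (diff - Z * se, diff + Z * se)

print(f"difference = {diff:.2f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

An interval that contains 0 is consistent with no difference between the two groups.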
9. Consider the data presented in Problem 7. Previous studies have shown that approximately 32% of women deliver prematurely without treatment. Is the proportion of women delivering prematurely significantly higher in the placebo group? Run the test at a 5% level of significance.
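Problem 9 reads as an upper-tailed one-sample test for a proportion, H_{0}: p=0.32 against H_{1}: p>0.32. A short sketch under that reading, using the placebo column of the Problem 7 table:

```python
import math

P_0 = 0.32                       # historical proportion delivering prematurely without treatment
preterm, n = 35, 100             # placebo group counts from Problem 7

p_hat = preterm / n
# The standard error uses the null value p0, as is usual for this z test
z = (p_hat - P_0) / math.sqrt(P_0 * (1 - P_0) / n)

Z_CRIT = 1.645                   # upper-tailed critical value at the 5% level
reject_h0 = z > Z_CRIT

print(f"z = {z:.3f}, reject H0: {reject_h0}")
```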
10. A study is run comparing HDL cholesterol levels between men who exercise regularly and those who do not. The data are shown below.
Regular Exercise   N     Mean   Std Dev
Yes                35    48.5   12.5
No                 120   56.9   11.9
Generate a 95% confidence interval for the difference in mean HDL levels between men who exercise regularly and those who do not.
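A sketch of the interval asked for in Problem 10. It assumes equal population variances (pooled standard deviation) and uses z = 1.96 because both samples are reasonably large; a t value with 153 degrees of freedom would give a slightly wider interval.

```python
import math

# Summary statistics from the Problem 10 table
n1, mean1, s1 = 35, 48.5, 12.5   # regular exercise: yes
n2, mean2, s2 = 120, 56.9, 11.9  # regular exercise: no

# Pooled standard deviation (assumes equal population variances)
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
se = sp * math.sqrt(1 / n1 + 1 / n2)

diff = mean1 - mean2
Z = 1.96                         # large-sample 95% value (assumption; see lead-in)
ci = (diff - Z * se, diff + Z * se)

print(f"difference = {diff:.1f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```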
11. A clinical trial is run to assess the effects of different forms of regular exercise on HDL levels in persons between the ages of 18 and 29. Participants in the study are randomly assigned to one of three exercise groups – Weight training, Aerobic exercise or Stretching/Yoga – and instructed to follow the program for 8 weeks. Their HDL levels are measured after 8 weeks and are summarized below.
Exercise Group     N    Mean   Std Dev
Weight Training    20   49.7   10.2
Aerobic Exercise   20   43.1   11.1
Stretching/Yoga    20   57.0   12.5
Is there a significant difference in mean HDL levels among the exercise groups? Run the test at a 5% level of significance. HINT: SSerror = 7286.5.
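The one-way ANOVA in Problem 11 can be sketched from the summary statistics and the SSerror given in the hint. The critical value of roughly 3.16 for F(2, 57) is read from an F table and is an approximation.

```python
# Group summaries from the Problem 11 table (equal group sizes)
n_per_group = 20
means = [49.7, 43.1, 57.0]       # weight training, aerobic exercise, stretching/yoga
SS_ERROR = 7286.5                # from the hint in the problem

k = len(means)
n_total = k * n_per_group
grand_mean = sum(means) / k      # valid as a simple average because group sizes are equal

ss_between = sum(n_per_group * (m - grand_mean) ** 2 for m in means)
ms_between = ss_between / (k - 1)
ms_error = SS_ERROR / (n_total - k)
f_stat = ms_between / ms_error

F_CRIT = 3.16                    # approximate F table value, df = (2, 57), 5% level (assumed lookup)
reject_h0 = f_stat > F_CRIT

print(f"SSB = {ss_between:.1f}, F = {f_stat:.2f}, reject H0: {reject_h0}")
```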
12. Consider again the data in Problem 11. Suppose that in the aerobic exercise group we also measured the number of hours of aerobic exercise per week and the mean is 5.2 hours with a standard deviation of 2.1 hours. The sample correlation is 0.42.
a) Estimate the equation of the regression line that best describes the relationship between number of hours of exercise per week and HDL cholesterol level (Assume that the dependent variable is HDL level).
b) Estimate the HDL level for a person who exercises 7 hours per week.
c) Estimate the HDL level for a person who does not exercise.
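For Problem 12, the slope and intercept follow from the summary statistics via b_{1} = r(s_y/s_x) and b_{0} = ȳ - b_{1}x̄. The sketch below takes the aerobic group's HDL mean and standard deviation from the Problem 11 table; the function name is illustrative.

```python
# Aerobic exercise group summaries (Problems 11 and 12)
r = 0.42                         # sample correlation
x_bar, s_x = 5.2, 2.1            # hours of aerobic exercise per week
y_bar, s_y = 43.1, 11.1          # HDL level

b1 = r * s_y / s_x               # slope
b0 = y_bar - b1 * x_bar          # intercept

def predict_hdl(hours):
    """Estimated HDL level for a given number of exercise hours per week."""
    return b0 + b1 * hours

print(f"HDL = {b0:.3f} + {b1:.2f} * hours")
print(f"7 hours: {predict_hdl(7):.1f}, 0 hours: {predict_hdl(0):.1f}")
```

The estimate at 0 hours is simply the intercept, and it may fall outside the range of exercise hours actually observed.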
13. The table below summarizes baseline characteristics on patients participating in a clinical trial.
Characteristic                        Placebo (n=125)   Experimental (n=125)   P
Mean (± SD) Age                       54 ± 4.5          53 ± 4.9               0.7856
% Female                              39%               52%                    0.0289
% Less than High School Education     24%               22%                    0.0986
% Completing High School              37%               36%
% Completing Some College             39%               42%
Mean (± SD) Systolic Blood Pressure   136 ± 13.8        134 ± 12.4             0.4736
Mean (± SD) Total Cholesterol         214 ± 24.9        210 ± 23.1             0.8954
% Current Smokers                     17%               15%                    0.5741
% with Diabetes                       8%                3%                     0.0438
a) Are there any statistically significant differences in baseline characteristics between treatment groups? Justify your answer.
b) Write the hypotheses and the test statistic used to compare ages between groups. (No calculations – just H_{0}, H_{1} and form of the test statistic)
c) Write the hypotheses and the test statistic used to compare % females between groups. (No calculations – just H_{0}, H_{1} and form of the test statistic)
d) Write the hypotheses and the test statistic used to compare educational levels between groups. (No calculations – just H_{0}, H_{1} and form of the test statistic)
14. A study is designed to investigate whether there is a difference in response to various treatments in patients with rheumatoid arthritis. The outcome is the patient’s self-reported effect of treatment. The data are shown below. Is there a significant difference in effect of treatment? Run the test at a 5% level of significance.
              Symptoms Worsened   No Effect   Symptoms Improved   Total
Treatment 1   22                  14          14                  50
Treatment 2   14                  15          21                  50
Treatment 3   9                   12          29                  50
15. Using the data shown in Problem 14, suppose we focus on the proportions of patients who show improvement. Is there a statistically significant difference in the proportions of patients who show improvement between treatments 1 and 2? Run the test at a 5% level of significance.
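Problem 15 can be treated as a two-sided z test for two independent proportions, using the pooled proportion in the standard error. A minimal sketch with the improvement counts from the Problem 14 table:

```python
import math

# Patients with improved symptoms out of 50 per treatment (Problem 14 table)
x1, n1 = 14, 50                  # treatment 1
x2, n2 = 21, 50                  # treatment 2

p1, p2 = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2) # overall proportion under H0: p1 = p2

se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

Z_CRIT = 1.96                    # two-sided critical value at the 5% level
reject_h0 = abs(z) > Z_CRIT

print(f"z = {z:.3f}, reject H0: {reject_h0}")
```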
16. An analysis is conducted to compare mean time to pain relief (measured in minutes) under four competing treatment regimens. Summary statistics on the four treatments are shown below.
Treatment   Sample Size   Mean Time to Relief   Sample Variance
A           5             33.8                  17.7
B           5             27.0                  15.5
C           5             50.8                  9.7
D           5             39.6                  16.8
a) Complete the following ANOVA Table
Source of Variation   SS        df   MS   F
Between Groups
Within Groups         3719.48
Total
b) Write the hypotheses to be tested.
c) Write the decision rule.
d) What is the conclusion?
17. The following data were collected in a clinical trial to compare a new drug to a placebo for its effectiveness in lowering total serum cholesterol. Generate a 95% confidence interval for the difference in mean total cholesterol levels between treatments.

                                         New Drug (n=75)   Placebo (n=75)   Total Sample (n=150)
Mean (SD) Total Serum Cholesterol        185.0 (24.5)      204.3 (21.8)     194.7 (23.2)
% Patients with Total Cholesterol < 200  78.0%             65.0%            71.5%
18. Using the data in Problem 17,
a) Generate a 95% confidence interval for the proportion of all patients with total cholesterol < 200.
b) How many patients would be required to ensure that a 95% confidence interval has a margin of error not exceeding 5%?
19. A small pilot study is conducted to investigate the effect of a nutritional supplement on total body weight. Six participants agree to take the nutritional supplement. To assess its effect on body weight, weights are measured before starting the supplementation and then after 6 weeks. The data are shown below. Is there a significant increase in body weight following supplementation? Run the test at a 5% level of significance.
Subject   Initial Weight   Weight after 6 Weeks
1         155              157
2         142              145
3         176              180
4         180              175
5         210              209
6         125              126
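Because each participant in Problem 19 is measured twice, a paired t test on the differences (after minus before) is one natural approach. The sketch assumes an upper-tailed test; the critical value 2.015 (df = 5) is from a t table.

```python
import math
import statistics

# Weights from the Problem 19 table, in subject order 1..6
before = [155, 142, 176, 180, 210, 125]
after = [157, 145, 180, 175, 209, 126]

diffs = [a - b for a, b in zip(after, before)]   # positive value = weight gain
n = len(diffs)

mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)
t_stat = mean_diff / (sd_diff / math.sqrt(n))

T_CRIT = 2.015                   # t table value, df = 5, upper-tailed 5% (assumed lookup)
reject_h0 = t_stat > T_CRIT

print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```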
20. The following table was presented in an article summarizing a study to compare a new drug to a standard drug and to a placebo.
Characteristic*         New Drug      Standard Drug   Placebo       p
Age, years              45.2 (4.8)    44.9 (5.1)      42.8 (4.3)    0.5746
% Female                51%           55%             57%           0.1635
Annual Income, $000s    59.5 (14.3)   63.8 (16.9)     58.2 (13.6)   0.4635
% with Insurance        87%           65%             82%           0.0352
Disease Stage                                                       0.0261
  Stage I               35%           18%             33%
  Stage II              42%           37%             47%
  Stage III             23%           51%             20%
*Table entries are Mean (SD) or %
a) Are there any statistically significant differences in the characteristics shown among the treatments? Justify your answer.
b) Consider the test for differences in age among treatments. Write the hypotheses and the formula of the test statistic used (No computations required – formula only).
c) Consider the test for differences in insurance coverage among treatments. Write the hypotheses and the formula of the test statistic used (No computations required – formula only).
d) Consider the test for differences in disease stage among treatments. Write the hypotheses and the formula of the test statistic used (No computations required – formula only).
21. A small pilot study is run to compare a new drug for chronic pain to one that is currently available. Participants are randomly assigned to receive either the new drug or the currently available drug and report improvement in pain on a 5-point ordinal scale: 1=Pain is much worse, 2=Pain is slightly worse, 3=No change, 4=Pain improved slightly, 5=Pain much improved. Is there a significant difference in self-reported improvement in pain? Use the Mann-Whitney U test with a 5% level of significance.
New Drug: 4 5 3 3 4 2
Standard Drug: 2 3 4 1 2 3
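A sketch of the Mann-Whitney U calculation for Problem 21, assigning average ranks to tied scores. The critical value of 5 for n1 = n2 = 6 (two-sided, 5% level) is taken from a Mann-Whitney table; the helper function name is illustrative.

```python
new_drug = [4, 5, 3, 3, 4, 2]
standard = [2, 3, 4, 1, 2, 3]

def midranks(values):
    """Map each distinct value to the average of the ranks it occupies."""
    ordered = sorted(values)
    ranks = {}
    for v in set(ordered):
        first = ordered.index(v) + 1         # 1-based rank of the first occurrence
        count = ordered.count(v)
        ranks[v] = first + (count - 1) / 2   # average rank across the tied block
    return ranks

ranks = midranks(new_drug + standard)
r_new = sum(ranks[v] for v in new_drug)      # rank sum, new drug
r_std = sum(ranks[v] for v in standard)      # rank sum, standard drug

n1, n2 = len(new_drug), len(standard)
u_new = n1 * n2 + n1 * (n1 + 1) / 2 - r_new
u_std = n1 * n2 + n2 * (n2 + 1) / 2 - r_std
u_stat = min(u_new, u_std)

U_CRIT = 5                       # table value for n1 = n2 = 6, two-sided 5% (assumed lookup)
reject_h0 = u_stat <= U_CRIT     # reject H0 when U is at or below the critical value

print(f"R1 = {r_new}, R2 = {r_std}, U = {u_stat}, reject H0: {reject_h0}")
```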
22. Answer True or False to each of the following:
a) The margin of error is always greater than or equal to the standard error.
b) If a test is run and p=0.0356, then we can reject H_{0} at α=0.01.
c) If a 95% CI for the difference in two independent means is (-4.5 to 2.1), then the point estimate is 2.1.
d) If a 95% CI for the difference in two independent means is (2.1 to 4.5), there is no significant difference in means.
e) If a 90% CI for the mean is (75.3 to 80.9), we would reject H_{0}: μ=70 in favor of H_{1}: μ≠70 at α=0.05.
23. A randomized controlled trial is run to evaluate the effectiveness of a new drug for asthma in children. A total of 250 children are randomized to either the new drug or placebo (125 per group). The mean age of children assigned to the new drug is 12.4 with a standard deviation of 3.6 years. The mean age of children assigned to the placebo is 13.0 with a standard deviation of 4.0 years. Is there a statistically significant difference in ages of children assigned to the treatments? Run the appropriate test at a 5% level of significance.
24. Consider again the randomized controlled trial described in Problem 23. Suppose that there are 63 boys assigned to the new drug group and 58 boys assigned to the placebo. Is there a statistically significant difference in the proportions of boys assigned to the treatments? Run the appropriate test at a 5% level of significance.
25. A clinical trial is run to evaluate the effectiveness of a new drug to prevent preterm delivery. A total of n=250 pregnant women agree to participate and are randomly assigned to receive either the new drug or a placebo and followed through the course of pregnancy. Among 125 women receiving the new drug, 24 deliver preterm and among 125 women receiving the placebo, 38 deliver preterm. Construct a 95% confidence interval for the difference in proportions of women who deliver preterm.
26. “Average adult Americans are about one inch taller, but nearly a whopping 25 pounds heavier than they were in 1960, according to a new report from the Centers for Disease Control and Prevention (CDC). The bad news, says CDC is that average BMI (body mass index, a weight-for-height formula used to measure obesity) has increased among adults from approximately 25 in 1960 to 28 in 2002.” Boston is considered one of America’s healthiest cities – is the weight gain since 1960 similar in Boston? A sample of n=25 adults suggested a mean increase of 17 pounds with a standard deviation of 8.6 pounds. Is Boston statistically significantly different in terms of weight gain since 1960? Run the appropriate test at a 5% level of significance.
27. In 2007, the CDC reported that approximately 6.6 per 1000 (0.66%) children were affected with autism spectrum disorder. A sample of 900 children from Boston is tested and 7 are diagnosed with autism spectrum disorder. Is the proportion of children affected with autism spectrum disorder higher in Boston as compared to the national estimate? Run the appropriate test at a 5% level of significance.
28. A clinical trial is being planned to investigate the effect of a new experimental drug designed to reduce total serum cholesterol. Investigators will enroll participants with total cholesterol levels between 200-240; they will be randomized to receive the new drug or a placebo and followed for 2 months, and the total cholesterol will be measured. Investigators plan to run a test of hypothesis and want 80% power to detect a difference of 10 points in mean total cholesterol levels between groups. They assume that 10% of the participants randomized will be lost over the 2-month follow-up. How many participants must be enrolled in the study? Assume that the standard deviation of total cholesterol is 18.5.
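The sample size in Problem 28 can be sketched with the usual two-means formula, n per group = 2((z_{1-α/2} + z_{1-β})/ES)^2 with ES = difference/σ, then inflated for attrition. The two-sided 5% significance level is an assumption, since the problem does not state α.

```python
import math

Z_ALPHA = 1.96                   # two-sided 5% level (assumed; alpha is not given in the problem)
Z_BETA = 0.84                    # z value for 80% power
difference = 10                  # difference in mean total cholesterol to detect
sigma = 18.5                     # assumed standard deviation
attrition = 0.10                 # expected loss to follow-up

effect_size = difference / sigma
n_per_group = math.ceil(2 * ((Z_ALPHA + Z_BETA) / effect_size) ** 2)
n_complete = 2 * n_per_group     # participants needed with complete data

# Inflate so that the required number remain after attrition;
# round() guards against floating-point noise before taking the ceiling
n_enroll = math.ceil(round(n_complete / (1 - attrition), 9))

print(f"per group = {n_per_group}, complete = {n_complete}, enroll = {n_enroll}")
```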
29. An observational study is conducted to investigate the association between age and total serum cholesterol. The correlation is estimated at r = 0.35. The study involves n=125 participants and the mean (std dev) age is 44.3 (10.0) years with an age range of 35 to 55 years, and mean (std dev) total cholesterol is 202.8 (38.4).
a) Estimate the equation of the line that best describes the association between age (as the independent variable) and total serum cholesterol.
b) Estimate the total serum cholesterol for a 50-year-old person.
c) Estimate the total serum cholesterol for a 70-year-old person.
30. For each statement below, indicate whether the statement is true or false.
a) In logistic regression, the predictors are dichotomous, and the outcome is a continuous variable.
b) When calculating a correlation coefficient between two continuous variables, the scales on which the variables are measured affect the value of the correlation coefficient.
c) It is more difficult to reject a null hypothesis if we use a 10% level of significance compared with a 5% level of significance.
d) The sample size required to detect an effect size of 0.25 is larger than the sample size required to detect an effect size of 0.50 with 80% power and a 5% level of significance.
31. For each question below, provide a brief (1-2 sentences) response.
a) How is the slope coefficient (b_{1}) in a simple linear regression different than the coefficient (b_{1}) in a multiple linear regression model?
b) When would a survival analysis model be used instead of a logistic regression model?
c) What is the appropriate statistical test to assess whether there is an association between obesity status (normal weight, overweight, obese) and 5-year incident cardiovascular disease (CVD)? Suppose each participant’s obesity status (category) is known as is whether they develop CVD over the next 5 years or not.
32. An observational study is conducted to compare experiences of men and women between the ages of 50-59 years following coronary artery bypass surgery. Participants undergo the surgery and are followed until the time of death, until they are lost to follow-up or up to 30 years, whichever comes first. The following table details the experiences of participating men and women. The data below are years of death or years of last contact for men and women.
Men                                      Women
Year of Death   Year of Last Contact     Year of Death   Year of Last Contact
5               8                        19              4
12              17                       20              9
14              24                       21              14
23              26                       24              15
29              26                                       17
                27                                       19
                29                                       21
                30                                       22
                30                                       24
                30                                       25
                                                         30
a) Estimate the survival functions for each group (men and women) using the Kaplan-Meier approach.
b) Test if there is a significant difference in survival between the groups using the log-rank test and a 5% level of significance.
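A minimal sketch of the Kaplan-Meier estimate for part (a) of Problem 32, written with the standard library only. It assumes that when a death and a last contact share the same year, the censored participant is still counted as at risk for that death; the function name is illustrative, and the log-rank test in part (b) is not covered here.

```python
def kaplan_meier(deaths, censored):
    """Return [(time, survival probability)] at each observed death time."""
    at_risk = len(deaths) + len(censored)
    survival = 1.0
    curve = []
    for t in sorted(set(deaths) | set(censored)):
        d = deaths.count(t)                  # deaths at time t
        c = censored.count(t)                # last contacts (censored) at time t
        if d:
            survival *= (at_risk - d) / at_risk
            curve.append((t, survival))
        at_risk -= d + c                     # both leave the risk set after time t
    return curve

# Years of death and years of last contact from the Problem 32 table
men_deaths = [5, 12, 14, 23, 29]
men_censored = [8, 17, 24, 26, 26, 27, 29, 30, 30, 30]
women_deaths = [19, 20, 21, 24]
women_censored = [4, 9, 14, 15, 17, 19, 21, 22, 24, 25, 30]

men_curve = kaplan_meier(men_deaths, men_censored)
women_curve = kaplan_meier(women_deaths, women_censored)

for label, curve in (("Men", men_curve), ("Women", women_curve)):
    for t, s in curve:
        print(f"{label}: S({t}) = {s:.3f}")
```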