
Homework Assignment
Michael J. Culbertson UW–Madison STAT 371 Fall 2021
HOMEWORK 7: ONE GROUP, PARAMETRIC
Instructions
This assignment is due in Canvas by Wednesday, October 27 at 11:59pm. Please read the
homework guide in its entirety before starting to work on this assignment. The homework guide
includes code templates that you can easily modify and combine to complete the assignment, but it
requires that you understand each of the commands demonstrated in the homework guide.
Your submission needs to include your R code, the corresponding R output, and your
narrative interpretation/responses to the questions (in complete sentences). The easiest way to do
this is to work in an R Notebook, as described in the first homework guide and demonstrated in the
corresponding video. If you use an R Notebook, you will submit the notebook’s HTML file to Canvas.
Exercise
Refer to the pineapple and hurricane datasets on Canvas for this assignment.
1. Meteorologists classify hurricanes based on their intensity. A climatologist wants to
know whether category 1 hurricanes (the least intense) are more or less likely than severe
hurricanes (with a category higher than 1). The climatologist decides to use a
significance level of α = .01. The hurricane dataset includes the category for a sample of hurricanes
over a 63-year period.
a. What are the population, sample, parameter, and statistic for this study?
b. What are the null and alternative models for this statistical test? State them both
in mathematical notation and in words.
c. What model assumptions do you need to check before running this test? Do they
seem plausible for this research scenario?
d. Find the standard error for the statistic’s sampling distribution.
e. Find the test statistic for this test, and explain what the test statistic means in the
context of the research study.
f. Find the p-value for this test, and explain what the p-value represents in the context of the research study.
g. Explain what conclusion the climatologist will draw, based on the results of the
statistical test.
h. Explain whether you would have drawn the same conclusion if you used a significance level of α = .05, instead. Why is it important to choose the significance level at the start of the study, before examining the data?
2. At the Hawaii Pineapple Company, managers are interested in the size of the pineapples
grown in the company’s fields. Last year, the weight of the pineapples harvested from one
large field was roughly Normally distributed with a mean of 31 ounces and a standard
deviation of 4 ounces. A different irrigation system was installed in this field after the
growing season. Managers wonder if the mean weight of pineapples grown in the field
this year has changed. They weighed a random sample of pineapples during the harvest,
which are included in the pineapple dataset on Canvas. Use the significance level α = .05
for this statistical test.
a. What are the population, sample, parameter, and statistic for this study?
b. What are the null and alternative models for this statistical test? State them both
in mathematical notation and in words.
c. What model assumptions do you need to check before running this test? Do they
seem plausible for this research scenario?
d. Check whether the pineapple weights can be reasonably approximated by a Normal distribution. Do you need to be concerned with any departures from Normality for this research study? Why or why not?
e. Find the standard error for the statistic’s sampling distribution.
f. Find the test statistic for this test, and explain what the test statistic means in the
context of the research study.
g. Which distribution should the managers use to compute the p-value for this test
statistic?
h. Find the p-value for this test, and explain what the p-value represents in the context of the research study.
i. Explain what conclusion the managers will draw, based on the results of the statistical test.
3. The mean cholesterol level in the U.S. population is approximately 180, with a standard
deviation of 41. A pharmaceutical company thinks that the new drug they have developed
will reduce cholesterol levels by 10 percent. They are planning to run a study of the drug
with 40 individuals. They want to make sure this sample is large enough before investing
in the research study, so they decide to find the statistical power of the research study.
They will conduct a two-sided test with a significance level of α = .01.
a. Describe Type I and Type II errors in the context of this study.
b. What are the null model and hypothesized alternative model in this study?
c. What will be the standard error in this study?
d. Find the boundary of the rejection region. What does this boundary represent in
the context of the research scenario?
e. Calculate the statistical power, and interpret in the context of the research study.
f. Would you advise the company to proceed with the study as designed? Why or
why not?
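For Exercise 3, the standard error, the rejection-region boundaries, and the power can all be computed with qnorm() and pnorm(). The sketch below assumes a z-based test that treats 41 as the known population standard deviation for treated individuals as well, and takes the hypothesized alternative mean to be 162 (10 percent below 180). The helper name power_two_sided_z() is hypothetical, written for illustration; it is not part of R.

```r
# Power of a two-sided z-test for one mean (hypothetical helper, not part of R).
# Assumes the sample mean is approximately Normal with SD sigma / sqrt(n)
# under both the null and the hypothesized alternative.
power_two_sided_z <- function(mu0, mu1, sigma, n, alpha) {
  se <- sigma / sqrt(n)                 # standard error of the sample mean
  z_crit <- qnorm(1 - alpha / 2)        # two-sided critical value
  lower <- mu0 - z_crit * se            # lower boundary of the rejection region
  upper <- mu0 + z_crit * se            # upper boundary of the rejection region
  # Power: chance the sample mean falls in either rejection region
  # when the true mean is mu1.
  power <- pnorm(lower, mean = mu1, sd = se) +
    pnorm(upper, mean = mu1, sd = se, lower.tail = FALSE)
  list(se = se, lower = lower, upper = upper, power = power)
}

# Exercise 3 values: null mean 180, alternative 10 percent lower (162),
# sigma = 41, n = 40, alpha = .01.
result <- power_two_sided_z(mu0 = 180, mu1 = 0.9 * 180, sigma = 41,
                            n = 40, alpha = 0.01)
result
```

Setting mu1 equal to mu0 makes the "power" equal to alpha, which is a quick sanity check on the helper.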

Homework Guide
Accessing Variables in a Data Frame
So far, we have been using the dplyr package to analyze our data. This package makes it easy to
refer to variables in a dataset by the variable name alone. But, R is a collection of many functions
written by many different people. A lot of the older functions do not use the same paradigm as dplyr
for specifying dataset variable names. To use these older functions, we’ll need another way to tell R
which variable in which dataset we want to use.
One common way to access variables in a data.frame is with the $ operator. To use this operator,
you specify the dataset name, followed by the $ operator, followed by the specific variable name in
that dataset that you want to access. For example, to access the spine variable in the crabs dataset,
you would use:
crabs$spine
This way of accessing variables is not necessary for dplyr functions, like count() and summarize(), but is necessary for older R functions, like those introduced below. The homework guides
will let you know when you need to use the $ operator style of accessing variables for new functions.
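If you want to try the $ operator before loading the course datasets, here is a small sketch using mtcars, a data frame that ships with base R (it stands in for crabs here). The [[ ]] form at the end is another base-R way to pull out a column by name.

```r
# mtcars is a built-in data frame; it stands in for the course datasets.
mpg_values <- mtcars$mpg        # dataset name, then $, then variable name
head(mpg_values)                # first few values of the mpg variable
mean(mtcars$mpg)                # base-R functions take the extracted vector
identical(mtcars$mpg, mtcars[["mpg"]])  # [[ ]] extracts the same column
```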
Checking Normality
The Q–Q plot allows us to assess to what extent a sample can be well approximated by a Normal
distribution. The qqnorm() function asks R to create a Q–Q plot for a given variable. Simply provide
the dataset and variable name to qqnorm(), using the $ operator style. For example, to create a Q–Q
plot for the heart rates (pulse) in the nhanes dataset, you can run:
qqnorm(nhanes$pulse)
Since you’ll be looking to see whether the points in this graph fall in a straight line, it can be helpful
to ask R to add a straight line to the graph for comparison. You can do this with the qqline() function, using the same variable you gave to qqnorm(), like so:
qqline(nhanes$pulse)
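To get a feel for what a Q–Q plot looks like when a sample is or is not close to Normal, you can compare a simulated Normal sample with a simulated right-skewed one. The data below are randomly generated, not the nhanes data, and the correlation at the end is only an informal numeric companion to the plot, not a formal test.

```r
# Simulated samples (not the nhanes data) to compare Q-Q plot shapes.
set.seed(371)
normal_sample <- rnorm(100, mean = 70, sd = 10)   # roughly Normal
skewed_sample <- rexp(100, rate = 1 / 10)         # right-skewed

par(mfrow = c(1, 2))                  # two plots side by side
qqnorm(normal_sample, main = "Normal sample")
qqline(normal_sample)                 # points should stay close to this line
qqnorm(skewed_sample, main = "Right-skewed sample")
qqline(skewed_sample)                 # points bend away, especially in the upper tail
par(mfrow = c(1, 1))                  # restore a single-plot layout

# plot.it = FALSE returns the plotted coordinates instead of drawing them
coords <- qqnorm(normal_sample, plot.it = FALSE)
cor(coords$x, coords$y)               # close to 1 when points fall on a line
```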
Test of a Mean
When conducting a statistical test for a mean, you could compute the test statistic and p-value
manually, but R also provides a convenient function that will conduct this test for you. The function
is called t.test(). This function will actually run several different types of tests that use the t distribution, but here, we’ll only look at the usage for the one-group test we have covered in class so far.
To run a test of a mean with t.test(), you supply the variable with your sample data (using the
$ operator style) and the parameter for the null hypothesis as the named argument mu. For example,
to test whether the mean weight of our horseshoe crab population is 2500 grams, we could execute:
t.test(crabs$weight, mu = 2500)
This returns the following output:
One Sample t-test
data: crabs$weight
t = -1.4317, df = 172, p-value = 0.154
alternative hypothesis: true mean is not equal to 2500
95 percent confidence interval:
2350.597 2523.784
sample estimates:
mean of x
2437.191
The function tells us that the test statistic is -1.43, that the sampling distribution for this test statistic
has 172 degrees of freedom, and that the p-value for this test is .154. By default, t.test() conducts a
two-sided test. To conduct a one-sided test, you can specify which side (“less” or “greater”) in the
alternative named argument, like this:
t.test(crabs$weight, mu = 2500, alternative = "less")
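t.test() does the same arithmetic you would do by hand: the standard error is s/√n, the test statistic is (x̄ − μ0)/SE, and the p-value comes from the t distribution with n − 1 degrees of freedom. The sketch below checks this on simulated data, since the crabs dataset is not loaded here; it uses 173 values so the degrees of freedom match the crabs output, and the mean and standard deviation are arbitrary.

```r
# Simulated weights (grams) standing in for crabs$weight; values are arbitrary.
set.seed(2021)
weight <- rnorm(173, mean = 2440, sd = 580)
mu0 <- 2500                                # null-hypothesis mean

n <- length(weight)
se <- sd(weight) / sqrt(n)                 # estimated standard error of the mean
t_stat <- (mean(weight) - mu0) / se        # test statistic
df <- n - 1                                # degrees of freedom
p_two <- 2 * pt(-abs(t_stat), df)          # two-sided p-value (t.test default)
p_less <- pt(t_stat, df)                   # one-sided p-value, alternative = "less"

fit <- t.test(weight, mu = mu0)
fit_less <- t.test(weight, mu = mu0, alternative = "less")
c(manual = t_stat, t.test = unname(fit$statistic))
c(manual = p_two, t.test = fit$p.value)
c(manual = p_less, t.test = fit_less$p.value)
```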
Finally, notice that t.test() also gave us a 95 percent confidence interval for the population
mean: (2350.6, 2523.8). If we want a different confidence interval, we can specify the confidence level
using the conf.level named argument. For example, for a 90 percent confidence interval:
t.test(crabs$weight, mu = 2500, conf.level = .90)
The output from this version tells us that the 90 percent confidence interval is (2364.6, 2509.7).
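Similarly, the confidence interval from t.test() is the sample mean plus or minus a t critical value times the standard error. The sketch below rebuilds a 90 percent interval by hand on simulated data (again standing in for crabs$weight) and compares it with the conf.int component of the t.test() result; the interval does not depend on mu, so that argument is left out.

```r
# Simulated data standing in for crabs$weight; values are arbitrary.
set.seed(2021)
weight <- rnorm(173, mean = 2440, sd = 580)

conf <- 0.90                                   # confidence level
n <- length(weight)
se <- sd(weight) / sqrt(n)                     # estimated standard error
t_crit <- qt(1 - (1 - conf) / 2, df = n - 1)   # critical value for the interval
manual_ci <- mean(weight) + c(-1, 1) * t_crit * se

fit <- t.test(weight, conf.level = conf)
manual_ci
as.vector(fit$conf.int)                        # same interval from t.test()
```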
Scientific Notation
As you have been working with pnorm(), you may have run across some funny-looking results.
For example, if you try to run the following:
pnorm(-4, 0, 1)
R will reply with 3.167124e-05. The “e” in a number is R’s way of writing scientific notation. This
number is actually $3.167124 \times 10^{-5} = 0.00003167124$.
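If you would rather see the decimal form, base R can print it for you. The sketch below uses format() with scientific = FALSE for a single value, and the scipen option, which makes R less inclined to use scientific notation for everything it prints until you restore the previous setting.

```r
p <- pnorm(-4, 0, 1)
p                                        # printed in scientific notation
format(p, scientific = FALSE)            # same value written out in decimal form

# "e-05" means "times 10 to the power -5"; all.equal() allows for rounding
all.equal(3.167124e-05, 3.167124 * 10^-5)

old <- options(scipen = 100)             # discourage scientific notation
print(p)
options(old)                             # restore the previous setting
```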
