22sDataAnalysisAssignment2.pdf

STAT 250 Spring 2022 Data Analysis Assignment 2

You may not upload this file to any online homework help sites. In addition, you may not

discuss this assignment on any group chats with any individuals (either in this course or not).

Please see our course syllabus for honor code rules. Thank you.

Your solutions document should include the following items. Points will be deducted if the following are not included.

1. Type your name and STAT 250 with your correct section number (e.g. STAT 250-xxx) right justified, and then Data Analysis Assignment #2 centered at the top of page 1 below your name, to begin your solutions document.

2. Number your pages across your entire solutions document.

3. Your solutions document should include the ANSWERS ONLY with each answer labeled by its corresponding number and subpart. Keep the answers in order.

4. Generate all requested graphs and tables using StatCrunch.

5. Upload your solutions document onto Blackboard as a pdf file using the link provided by your instructor. You are responsible for uploading a readable file.

6. You may not work with other individuals on this assignment. It is an honor code violation if you do.

Please note: all StatCrunch Instructions provided in the parts of the problems will be presented

in italics.

Elements of good technical writing:

Use complete and coherent sentences to answer the questions.

Graphs must be appropriately titled and should refer to the context of the question.

Graphical displays must include labels with units if appropriate for each axis.

Units should always be included when referring to numerical values.

When making a comparison you must use comparative language, such as “greater than”, “less

than”, or “about the same as.”

Ensure that all graphs and tables appear on one page and are not split across two pages.

Type all mathematical calculations when directed to compute an answer ‘by-hand.’

Pictures of actual handwritten work are not accepted on this assignment.

When writing mathematical expressions into your solutions document you may use either an

equation editor or common shortcuts. For example, √x can be written as sqrt(x), p̂ can be written as p-hat, and x̄ can be written as x-bar.

Investigation 1: In-State Tuition

A list of 241 public colleges and universities in the States of Georgia (GA), South Carolina (SC),

North Carolina (NC), Tennessee (TN), Kentucky (KY), Virginia (VA), West Virginia (WV),

Maryland (MD), Delaware (DE), Pennsylvania (PA), New Jersey (NJ), New York (NY),

Connecticut (CT), Rhode Island (RI), and Massachusetts (MA) and their In-State Tuition was

collected. The data set found in our StatCrunch group is called “In-State Tuition” and the

variables State, University, and In-State Tuition (in dollars) are presented. Consider the 241

observations as a sample of all public colleges and universities in the United States.

a) Use StatCrunch to construct an appropriately titled and labeled relative frequency histogram of the “In-State Tuition” variable. Copy your histogram into your solutions.

b) What is the shape of this distribution? Answer this question in one complete sentence.

c) Now overlay your highlighted histogram from part (a) with a Normal curve and add a vertical line at the mean. Go to Options → Edit in the top left corner of your graph.

Inside the histogram graph box, look for Display Options. Next to “Overlay distrib.:”

click the arrow next to the word –optional– and select Normal. Then, check the box next

to mean under the word “Markers.” Copy and paste this histogram into your solutions.

d) Do you think it is reasonable to use the Normal probability model in this case? Answer this question and provide a reason why in one sentence.

e) Using your histogram, determine the proportion of Universities with In-State Tuition less than $5,000. To do this, click on each bar that is representing less than $5,000 and look

in the bottom left of your screen to see how many Universities are highlighted. Calculate

the correct proportion and type your work for the calculation of the proportion. Please

round your proportion to four decimal places.

f) Calculate the mean and the standard deviation of the “In-State Tuition” variable using StatCrunch. (Select Stat → Summary Stats → Columns.) Copy and paste a table

presenting only the mean and standard deviation into your solutions. Round the mean and

standard deviation to two decimal places inside this table.
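
The arithmetic behind parts (e) and (f) can be sketched in Python as a check on the StatCrunch output. This is only a sketch under stated assumptions: the six tuition values are hypothetical placeholders (the real data set has 241 rows), the helper name `proportion_below` is made up for illustration, and the standard deviation is assumed to be the sample standard deviation (n - 1 divisor).

```python
import statistics

# Hypothetical tuition values in dollars; NOT the course data set.
tuition = [3200.0, 4800.0, 7350.0, 9100.0, 12400.0, 15800.0]


def proportion_below(values, cutoff):
    """Return the fraction of values strictly below the cutoff."""
    return sum(1 for v in values if v < cutoff) / len(values)


# Part (e)-style empirical proportion, rounded to four decimal places.
prop_under_5000 = round(proportion_below(tuition, 5000), 4)

# Part (f)-style summary statistics, rounded to two decimal places.
mean_tuition = round(statistics.mean(tuition), 2)
sd_tuition = round(statistics.stdev(tuition), 2)  # sample SD, n - 1 divisor

print(prop_under_5000, mean_tuition, sd_tuition)
```

The typed work for part (e) is still the count of highlighted Universities divided by 241; the code only mirrors that division.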

No matter your answer to part (d), for parts (g) – (k), assume that the distribution of In-State

Tuition in the population is Normal with the mean and standard deviation found in part (f). (Use

the rounded mean and standard deviation values.)

g) Use the Standard Normal Table (Table 2 in your text or formula packet) to calculate the probability that a randomly selected College will have an In-State Tuition of less than

$5,000. Type all calculations needed to find this probability and your answer in your

solutions.

h) Verify your answer in part (g) using the StatCrunch Normal calculator by using: Stat → Calculators → Normal. Use the mean and standard deviation found in part (f) and copy

the image with all values included into your solutions. Make sure your image presents all

values including the probability. In addition, write one sentence to explain what the

probability means in context of the question posed in part (g).

i) Compare the probability from part (h) to the proportion you calculated in part (e) using the context of the question. Label each value as a theoretical or an empirical probability

in your comparison.

j) Use the StatCrunch Normal calculator to calculate the probability that a randomly selected University will have an In-State Tuition between $10,000 and $15,000. Present

your StatCrunch image as your answer as you did in part (h).

k) Calculate the minimum tuition that would put a public university in the highest 3% (i.e. top 3%) of all in-state tuitions. Provide the minimum tuition in dollars and cents such

that this value or any dollar amount higher would be in the top 3% of all tuitions. First,

type all calculations using the Standard Normal Table necessary to obtain your answer.

Round your answer to two decimal places (i.e. dollars and cents). Then, verify your

result using the StatCrunch Normal calculator and copy the StatCrunch image in your

solutions as you did in part (h).
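
The Normal-model calculations in parts (g), (j), and (k) follow one pattern: convert to a z-score, then look up a cumulative probability or a percentile. A minimal Python sketch of that pattern follows. The values of `MEAN` and `SD` are hypothetical placeholders, not the answers to part (f); substitute your own rounded values.

```python
from statistics import NormalDist

# Hypothetical parameters; replace with the rounded values from part (f).
MEAN = 9000.00
SD = 4000.00
tuition_model = NormalDist(mu=MEAN, sigma=SD)

# Part (g)-style: z-score for $5,000, then P(X < 5000).
z = (5000 - MEAN) / SD
p_below = tuition_model.cdf(5000)

# Part (j)-style: P(10000 < X < 15000) as a difference of two areas.
p_between = tuition_model.cdf(15000) - tuition_model.cdf(10000)

# Part (k)-style: the tuition with 97% of the area below it
# (so 3% of the area lies above it).
cutoff_top3 = tuition_model.inv_cdf(0.97)

print(round(z, 2), round(p_below, 4), round(p_between, 4), round(cutoff_top3, 2))
```

A table-based answer rounds z to two decimal places before the lookup, so it can differ slightly from the calculator or from this sketch in the last digits.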

Investigation 2: Bachelor’s Degrees around the DMV (no data set)

From the U.S. Census Bureau’s American Community Survey in 2019 it was found that 33.13%

of United States residents over the age of 25 had an educational attainment of a bachelor’s

degree or higher. In the District of Columbia, the percentage of residents over the age of 25 who

had attained a bachelor’s degree or higher was 59.67%. An investigator for the U.S. Census

Bureau took a random sample of seven residents from the District of Columbia and asked them

the highest educational degree they had obtained. When there is no data set posted for an

investigation, open a blank StatCrunch page by clicking “Open StatCrunch” on the StatCrunch

home page.

a) Verify that the sample from the District of Columbia satisfies the conditions of the binomial experiment. Write one sentence to check each requirement in context of the

investigation.

b) Assuming the sample from the District of Columbia is a binomial experiment, build the probability distribution in a single table and include the table in your solutions. You may

present this table horizontally or vertically and leave the probabilities unrounded. There

are two possible ways to do this:

1. You may use Data → Compute → Expression and choose the function dbinom. This method relies on you entering all the outcome values of the random variable in the

first column of your data table. Copy the table from StatCrunch to your answer

solutions.

2. The other way to do this is to use the binomial calculator and calculate the probability of each of the values of the random variable from X = 0 to X = 7. Then create the

table in your solutions based on these results.

c) Calculate the probability that at least five individuals from the District of Columbia in this sample have attained a bachelor’s degree or higher using the probability distribution

table you created in part (b). Type all of your calculations and use proper probability

notation in your solutions. Round your final answer to four decimal places.

d) Verify your answer to part (c) using the StatCrunch binomial calculator. Copy the full image into your solutions. Write a one-sentence interpretation of the probability in

context of the question.

e) Assume we take a sample of seven United States residents and this sample has a binomial distribution. Calculate the probability that at least five residents from the United States in

this sample have attained a bachelor’s degree or higher using the StatCrunch binomial

calculator. Copy the image of the Binomial calculator in your solutions. In addition,

write a one-sentence interpretation of the shape of the probability distribution graph.

f) Compare the probability calculated in part (e) to the probability you calculated in part (d) in one sentence.

g) Calculate the mean and standard deviation of the number of individuals from the District of Columbia in this sample who have obtained a bachelor’s degree or higher. Show your

work using the binomial mean and binomial standard deviation formulas in your

solutions. Round your answers to two decimal places. (It is not necessary to use

StatCrunch for this part.)

h) Imagine you repeated taking a sample of seven individuals from the District of Columbia population 100 times. We can simulate this in StatCrunch. First, go to Data → Simulate

→ Binomial. Next, enter 100 for Rows, 1 for Columns, 7 for n, 0.5967 for p, and click

Compute!. Then, to visualize these data, go to Graph → Histogram. Produce a properly

titled and labeled frequency histogram and paste it into your solutions.

i) Calculate the proportion of the 100 samples that had at least five District of Columbia residents having obtained a bachelor’s degree or higher. Round your answer to four

decimal places and show your calculation in your solutions.

j) In one sentence, compare the proportion of the 100 samples having at least five residents with a bachelor’s degree or higher (calculated in part (i)) with the probability of having at

least five residents with a bachelor’s degree or higher (calculated in parts (c) and (d)). In

your comparison, define each probability as empirical or theoretical.
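
The binomial work in this investigation (the probability table, the "at least five" probability, the mean and standard deviation formulas, and the 100-sample simulation) can be sketched in Python as a cross-check. This is a sketch only: the helper names `binom_pmf` and `prob_at_least` are made up for illustration, and the seed 250 is arbitrary, so the simulated counts will not match your StatCrunch simulation.

```python
import math
import random

N = 7            # sample size
P_DC = 0.5967    # given proportion for the District of Columbia
P_US = 0.3313    # given proportion for the United States


def binom_pmf(k, n, p):
    """P(X = k) for a binomial random variable with parameters n and p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(k, n, p):
    """P(X >= k), summed from the binomial probability distribution."""
    return sum(binom_pmf(x, n, p) for x in range(k, n + 1))


# Part (b)-style table: X = 0, 1, ..., 7 with unrounded probabilities.
table = {k: binom_pmf(k, N, P_DC) for k in range(N + 1)}

# Parts (c) and (e)-style: P(X >= 5) for DC and for the US.
p_dc_at_least_5 = prob_at_least(5, N, P_DC)
p_us_at_least_5 = prob_at_least(5, N, P_US)

# Part (g)-style: binomial mean n*p and standard deviation sqrt(n*p*(1-p)).
mean_dc = N * P_DC
sd_dc = math.sqrt(N * P_DC * (1 - P_DC))

# Parts (h) and (i)-style: simulate 100 samples of size 7, then take the
# empirical proportion of samples with at least five successes.
rng = random.Random(250)  # arbitrary seed, chosen only for repeatability
counts = [sum(rng.random() < P_DC for _ in range(N)) for _ in range(100)]
empirical = sum(c >= 5 for c in counts) / len(counts)

for k, prob in table.items():
    print(k, prob)
print(round(p_dc_at_least_5, 4), round(p_us_at_least_5, 4))
print(round(mean_dc, 2), round(sd_dc, 2), round(empirical, 4))
```

The theoretical value `p_dc_at_least_5` is fixed by n and p, while `empirical` changes from one simulation to the next, which is the contrast part (j) asks you to describe.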

Investigation 3: Building a Sampling Distribution (no data set)

We will use the Sampling Distribution applet in StatCrunch to investigate properties of the

sampling distribution of the sample proportion of United States residents over the age of 25 who have obtained a

bachelor’s degree or higher, continuing from the previous investigation. Remember, the given probability of

a US resident having attained a bachelor’s degree or higher is 0.3313. We will begin by

taking a sample of seven.

Go to Applets → Sampling distributions (box shown below). First, select Binary for the

population. Next, to the right of “p:”, enter 0.3313. Then click on Compute!

a) Once the applet box is opened, enter 7 in the box to the right of the words “sample size” in the right middle of the applet box window (see image below). Then, at the top of the

applet, click “1 time.” Watch the resulting animation. After the sample is obtained, copy

and paste the entire applet box (using Options → Copy) into your solutions.

b) Click “Reset” at the top of the applet. Then, click “1000 times” to take 1000 samples of size 7. Copy and paste the applet image into your solutions.

c) Describe the shape of the Sample Proportions graph at the bottom of your image from part (b) in one sentence.

d) Use the Central Limit Theorem large sample size condition to determine if it is reasonable to approximate this sampling distribution as Normal. Explicitly show these

calculations for the condition in your answer. Write a one-sentence explanation on the

condition and the calculations.

e) Click “Reset” at the top of the applet. Type 77 in the sample size box. Then, click “1000 times” to take 1000 samples of size 77. Copy and paste the applet image into your

solutions.

f) Describe the shape of the Sample Proportions graph at the bottom of your image from part (e) in one sentence.

g) Why do you think that this graph from part (e) has the shape you described? Use the Central Limit Theorem large sample size condition to answer this question in one

sentence. Explicitly show these calculations for the condition in your answer.

h) Using the image in part (e), write the values you obtained for the mean (in green) and the standard deviation (in blue). These values are found in the bottom right box labeled

“Sample Prop. of 1s.”

i) Compare the mean value of the sampling distribution (in green, found in part (h)) to the known population proportion in one sentence in context. Make sure to reference the

values in your comparison.

j) Now calculate the standard error of the sample proportion using p = 0.3313 and n = 77 by hand. Type your calculations and round your answer to four decimal places.

k) Compare the value of the standard error of the sample proportions in part (j) to the standard deviation of the sampling distribution (in blue) you obtained in part (h) in one

sentence in context. Make sure to reference the values in your comparison.

l) Use the sampling distribution to calculate the probability that, in a sample of 77 US residents over the age of 25, a majority of the residents in the sample (more than

50%) have attained a bachelor’s degree or higher. Type all of your calculations and the

answer in your solutions.

m) Verify your answer in part (l) using the StatCrunch Normal calculator (Stat → Calculators → Normal) and copy this image into your solutions. Also, write one complete

sentence to interpret the resulting probability in context of the question. Hint: use the

values from part (l) in the calculator.
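
The large sample size check in parts (d) and (g), the standard error in part (j), and the Normal approximation in part (l) can be sketched in Python. This is a sketch under assumptions: the threshold of 10 for n·p and n·(1 − p) is a common textbook convention and should be replaced by the one stated in your course materials, and the helper names `large_sample_ok` and `standard_error` are made up for illustration.

```python
import math
from statistics import NormalDist

P = 0.3313       # given population proportion
THRESHOLD = 10   # assumed cutoff for the large-sample condition


def large_sample_ok(n, p, threshold=THRESHOLD):
    """Check n*p >= threshold and n*(1 - p) >= threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold


def standard_error(n, p):
    """Standard error of the sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


# Parts (d) and (g)-style: expected successes and failures for each n.
for n in (7, 77):
    print(n, round(n * P, 4), round(n * (1 - P), 4), large_sample_ok(n, P))

# Part (j)-style: standard error for n = 77.
se_77 = standard_error(77, P)

# Part (l)-style: P(p-hat > 0.50) under the Normal approximation
# with mean p and standard deviation equal to the standard error.
sampling_dist = NormalDist(mu=P, sigma=se_77)
z = (0.50 - P) / se_77
p_majority = 1 - sampling_dist.cdf(0.50)

print(round(se_77, 4), round(z, 2), round(p_majority, 4))
```

The simulated standard deviation from the applet in part (h) should land near `se_77` but will not match it exactly, since it comes from only 1000 random samples.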

Investigation 4: Bachelor’s Degrees around the DMV Continued (no data set)

From the U.S. Census Bureau’s American Community Survey in 2019 it was found that 33.13%

of United States residents over the age of 25 had attained a

bachelor’s degree or higher. In the District of Columbia, the percentage of residents over the age

of 25 who had attained a bachelor’s degree or higher was 59.67%, greater than in any state.

One sample of 77 U.S. residents over 25 was randomly and independently selected. Of those

sampled, 39 had a bachelor’s degree or higher. Assume the population of U.S. residents over 25

is 250 million.

a) Calculate the sample proportion of U.S. residents over 25 who had a bachelor’s degree or higher. Type your calculation and round your answer to four decimal places.

b) Write one sentence each to check the three conditions of the Central Limit Theorem. Type the work for all mathematical calculations needed to check these conditions.

c) Using the sample proportion obtained in (a), construct a 90% confidence interval to estimate the population proportion of US residents over 25 who had a bachelor’s degree

or higher. Please do this by typing all calculations using the formula and typing your

work. Round your confidence limits to two decimal places.

d) Verify your result from part (c) using Stat → Proportion Stats → One Sample → With Summary. Inside the box, enter the number of successes, the number of observations,

select confidence interval, set it to the correct level, and click Compute! Copy and paste

your entire StatCrunch result in your solutions.

e) Interpret the StatCrunch confidence interval in part (d) in context of the question using one complete sentence.

f) Assume you obtained the same sample result of 39 out of 77 having a bachelor’s degree or higher from the District of Columbia. Does the calculation of the 90% confidence

interval for the population proportion change? If so, show your work as you did in part

(c). If not, explain why it does not change in one sentence.

g) For each population (the U.S. population and the DC population), does the 90% confidence interval you constructed capture the value of the given parameter? Use one

sentence for each population to answer the question.

h) Use the Confidence Interval applet for the proportion (Applets → Confidence Intervals → For a Proportion) in StatCrunch to simulate constructing one thousand 90% confidence

intervals. You are given that the population proportion of US residents who have a

bachelor’s degree or higher is p = 0.3313 and remember our sample size was n = 77.

Once the window is open, click “Reset” and select (or click) 1000 intervals. Copy and

paste your full image into your solutions.

i) Compare the “Prop. contained” value from part (h) to the confidence level associated with the simulation in one sentence.

j) Write an interpretation for your confidence interval method in terms of what would happen if many more samples of the same size were taken and their respective confidence

intervals calculated. Use context and one complete sentence for your interpretation.

Applet settings for part (h):

Box 1: Enter the given population proportion, 0.3313

Box 2: Enter the given confidence level, 0.90

Box 3: Enter the given sample size, n = 77
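
The confidence interval in Investigation 4 parts (a) and (c) and the repeated-interval simulation in part (h) can be sketched in Python. This is a sketch under assumptions: it uses the standard large-sample interval p-hat ± z*·sqrt(p-hat(1 − p-hat)/n), which is assumed (not confirmed here) to be the formula used in the course, and the function names `wald_interval` and `coverage` and the seed 77 are made up for illustration.

```python
import math
import random
from statistics import NormalDist


def wald_interval(successes, n, level):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.645 for 90%
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Parts (a) and (c)-style: sample proportion and 90% interval for 39 of 77.
p_hat = 39 / 77
lower, upper = wald_interval(39, 77, 0.90)


def coverage(p, n, level, reps, seed):
    """Fraction of simulated intervals that contain the true proportion p."""
    rng = random.Random(seed)  # seed is arbitrary, for repeatability only
    hits = 0
    for _ in range(reps):
        successes = sum(rng.random() < p for _ in range(n))
        low, high = wald_interval(successes, n, level)
        hits += low <= p <= high
    return hits / reps


# Part (h)-style: 1000 simulated 90% intervals with p = 0.3313 and n = 77.
prop_contained = coverage(0.3313, 77, 0.90, reps=1000, seed=77)

print(round(p_hat, 4), round(lower, 2), round(upper, 2), prop_contained)
```

The value of `prop_contained` plays the role of the applet's “Prop. contained” readout: it varies from run to run but should sit near the stated confidence level.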