Textbook for this course: Stats: Data and Models (4th Edition) (Hardcover)

by Richard D. De Veaux, Paul F. Velleman

Question 1. Re-expressing Data to Fit a Linear Model

Suppose that you have the following data below for x (the independent variable) and y (the response variable):

independent.var = c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)

response.var = c(16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 2.4, 2.3, 1.9, 1.7, 1.4, 1.3)

A) Using a linear model, fit a line to the above data without using a re-expression. Show the fitted line relative to a scatterplot of the data. Comment on what you see in terms of fit, and also calculate and explain the meaning of R².
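A minimal sketch of the part A calculation, written in Python as a stand-in for the R work the question asks for (in R, `summary(lm(response.var ~ independent.var))` reports the same slope, intercept, and R²). It computes the least-squares line and R² from their textbook formulas and prints the residuals; drawing the scatterplot with the fitted line is left to your plotting tool. The helper name `least_squares` is illustrative only. R² is the fraction of the variability in y that the linear model accounts for.

```python
# Data from Question 1 (same values as the R vectors above).
x = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60]
y = [16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 2.4, 2.3, 1.9, 1.7, 1.4, 1.3]


def least_squares(x, y):
    """Return (intercept b0, slope b1, R^2) for the least-squares line y-hat = b0 + b1*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    r2 = sxy ** 2 / (sxx * syy)  # for simple regression, R^2 equals r squared
    return b0, b1, r2


b0, b1, r2 = least_squares(x, y)
print(f"y-hat = {b0:.3f} + ({b1:.4f})x, R^2 = {r2:.3f}")

# Residuals that run positive, then negative, then positive again indicate
# curvature that a straight line cannot capture.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print([round(e, 2) for e in residuals])
```

A fairly high R² can coexist with a clearly curved residual pattern, which is why the residuals are printed alongside it.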

B) Re-express the data so that you obtain a better linear fit, and explain how and why you chose your re-expression. (NOTE: Only consider re-expressions that change the y-variable, which is the response variable for this problem.) Also, show the re-expressed fitted line relative to the re-expressed data. Comment on what you see in terms of fit, and also calculate and explain the meaning of R².
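One way to choose a re-expression of y is to walk down the ladder of powers and compare R² at each rung, then confirm the choice with a residual plot. The Python sketch below (a stand-in for the R work the question asks for) tries only the five rungs listed; `r_squared` and the rung labels are illustrative names. Because y falls steeply and then levels off as x grows, rungs below power 1 are the natural ones to try. A higher R² alone does not prove a re-expression is right, so the residual plot still matters.

```python
import math

x = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60]
y = [16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 2.4, 2.3, 1.9, 1.7, 1.4, 1.3]


def r_squared(x, y):
    """R^2 of the least-squares line of y on x (the squared correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# Rungs of the ladder of powers applied to y only. The minus signs on the
# negative powers keep the re-expressed values in the same order as y.
ladder = {
    "y": y,
    "sqrt(y)": [math.sqrt(v) for v in y],
    "log(y)": [math.log(v) for v in y],
    "-1/sqrt(y)": [-1 / math.sqrt(v) for v in y],
    "-1/y": [-1 / v for v in y],
}

for name, y_new in ladder.items():
    print(f"{name:>11}: R^2 = {r_squared(x, y_new):.3f}")
```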

Question 2. Simulation – Washington Nationals Win the 2019 World Series

As we all know by now, the Washington Nationals beat the Houston Astros to win the 2019 World Series. However, before the start of that World Series, the Washington Nationals were often cited as having only a 40% chance of winning any particular game against the Astros.

So, how rare is what the Nats accomplished by winning the 2019 World Series?

Let’s pretend that the World Series has not been played yet. Keep in mind that the World Series is a series of up to 7 games, and the winner is whichever team wins 4 games first. So the World Series may last only 4 games, or 5, 6, or even 7 games; it is over as soon as one of the teams wins 4 games.

Run a simulation using the above information to assess the chances that the Washington Nationals win the World Series. Show all of your work, including anything you do in R (e.g., any random numbers you generate and how you use them). In doing so, specify how you are modeling the simulation using equally likely random digits, and explain what constitutes a trial and what its outcome is.

Run 25 trials, where a single trial is the simulated result of a single completed World Series. Show your results of all 25 trials and the random numbers you used for each trial.
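A minimal sketch of one way to set up the digit model, written in Python as a stand-in for R (in R, `sample(0:9, 7, replace = TRUE)` generates comparable digits). It assumes every game is independent with the same 0.40 chance: digits 0-3 stand for a Nationals win and 4-9 for an Astros win. A trial is one simulated Series, read digit by digit until one team reaches 4 wins, and its outcome is which team won. The seed `2019` is arbitrary and only makes the run reproducible, so these 25 trials will differ from anyone else's.

```python
import random

# Digit model (an assumption for this sketch): 0-3 = Nationals win, 4-9 = Astros win.
NATS_DIGITS = range(0, 4)


def play_series(rng):
    """Simulate one World Series; return (digits used, winning team)."""
    digits = []
    nats_wins = astros_wins = 0
    while nats_wins < 4 and astros_wins < 4:  # stop as soon as a team has 4 wins
        d = rng.randint(0, 9)  # one equally likely random digit, 0 through 9
        digits.append(d)
        if d in NATS_DIGITS:
            nats_wins += 1
        else:
            astros_wins += 1
    return digits, ("Nationals" if nats_wins == 4 else "Astros")


rng = random.Random(2019)  # arbitrary seed, used only for reproducibility
trials = [play_series(rng) for _ in range(25)]

for i, (digits, winner) in enumerate(trials, start=1):
    print(f"Trial {i:2d}: {''.join(map(str, digits)):<7}  {winner}")

estimate = sum(winner == "Nationals" for _, winner in trials) / len(trials)
print(f"Simulated P(Nationals win the Series) = {estimate:.2f}")
```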

Finally, provide the results of your entire simulation: state your simulated estimate of the likelihood that the Washington Nationals (the team with the 40% chance of winning any specific game in a series) win the World Series, and show the details of how you got that estimate.
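With only 25 trials the simulated estimate is quite variable, so it can help to compare it with the exact probability under the same assumptions (independent games, 0.40 per game for the Nationals). The Python sketch below adds up the chances of the Nationals clinching in game 4, 5, 6, or 7; it is a check on the simulation, not a replacement for it.

```python
from math import comb

p = 0.40  # Nationals' chance of winning any single game (from the problem)

# The Nationals win the Series in game 4 + k if they win game 4 + k and
# exactly 3 of the first 3 + k games (i.e. lose k of them), for k = 0..3.
by_length = {4 + k: comb(3 + k, k) * p**4 * (1 - p) ** k for k in range(4)}
p_series = sum(by_length.values())

for games, prob in by_length.items():
    print(f"Nationals win in {games} games: {prob:.4f}")
print(f"P(Nationals win the Series) = {p_series:.4f}")
```

For these inputs the total comes to about 0.29.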

Question 3. Nutrition Surveys: Eat This, Not That… Oh, Never Mind

As mentioned in a very early lecture, one purpose of this course is to allow people to become more informed readers of news articles, and to be able to critically assess and compare different viewpoints. I referred to this as becoming more “numerate.”

Nutrition advice seems to be quite inconsistent over time, and the article linked below explains some of the reasons why. Please read the article carefully, and then answer the questions posed.

https://slate.com/human-interest/2015/04/nutritional-clinical-trials-vs-observational-studies-for-dietary-recommendations.html

Provide detailed answers to the questions below:

A) List and explain at least 3 major weaknesses of nutritional studies as cited in the article.

B) Suppose that we want to study whether a particular food causes a particular health issue; here the health outcome is the response variable. In particular, suppose we want to study whether sugary soft drinks cause obesity in adults over the age of 18. Describe how you might design an experiment to better determine causality (or the lack of it). Be detailed in your design explanation, focusing specifically on the relevant features of the Four Principles of Experimental Design (see pages 321-322 of the text). Also, explain how you will measure both the independent and response variables in this experiment.
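Randomization and blocking are the two design principles most easily shown in code. The sketch below is a hypothetical illustration, not a prescribed design: it shuffles participants separately within each block (here an assumed blocking variable, sex) and deals them into a sugary-soda group and a sugar-free comparison group, so the groups stay balanced within every block. The participant IDs, the blocking variable, and the group labels are all made up for the example.

```python
import random


def randomize_within_blocks(participants, block_of, treatments, seed=None):
    """Randomly assign participants to treatments, separately within each block.

    participants: list of participant IDs
    block_of:     dict mapping each ID to its block label
    treatments:   list of treatment-group names
    Returns a dict mapping each ID to its assigned treatment.
    """
    rng = random.Random(seed)
    blocks = {}
    for p in participants:
        blocks.setdefault(block_of[p], []).append(p)

    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)  # randomization within the block
        # Deal the shuffled members round-robin so group sizes differ by at most one.
        for i, p in enumerate(shuffled):
            assignment[p] = treatments[i % len(treatments)]
    return assignment


# Hypothetical example: 8 adult volunteers, blocked by sex.
people = [f"P{i}" for i in range(1, 9)]
sex = {p: ("F" if i % 2 else "M") for i, p in enumerate(people)}
groups = randomize_within_blocks(people, sex, ["sugary soda", "sugar-free drink"], seed=1)
print(groups)
```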

Question 4. Probability Rules

Show all the steps needed to reach each final answer, and express each final answer as a decimal.

A) Assume that for events A and B we have the following:

P(A) = .40, P(B|A) = .20, and P(B) = .30.

Find P(A or B).
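Part A needs P(A and B), which the multiplication rule gives from the conditional probability; the general addition rule then removes the double-counted overlap. A short Python check of the arithmetic:

```python
p_a, p_b_given_a, p_b = 0.40, 0.20, 0.30

p_a_and_b = p_a * p_b_given_a      # multiplication rule: P(A and B) = P(A) * P(B|A)
p_a_or_b = p_a + p_b - p_a_and_b   # general addition rule
print(round(p_a_or_b, 2))
```

This works out to 0.40 + 0.30 - 0.08 = 0.62.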

B) Suppose that P(A) = .40, P(B) = .30, and P(A and B) = .15. Are A and B independent events? Why or why not?
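Independence can be checked by comparing P(A and B) with P(A) × P(B), or equivalently P(B|A) with P(B). A small Python sketch of both checks:

```python
import math

p_a, p_b, p_a_and_b = 0.40, 0.30, 0.15

# Under independence, P(A and B) would equal P(A) * P(B).
product_if_independent = p_a * p_b
# Equivalent check: P(B|A) = P(A and B) / P(A) should equal P(B).
p_b_given_a = p_a_and_b / p_a

independent = math.isclose(p_a_and_b, product_if_independent)
print(f"P(A)P(B) = {product_if_independent:.2f}, P(B|A) = {p_b_given_a:.3f}, independent: {independent}")
```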

C) Suppose that you are planning to fly a certain air route once each week for your new job, and that there is a 3% chance that your outbound flight will be cancelled in any given week due to various issues. How many consecutive weekly outbound flights can you take before the probability that all of them go uncancelled drops to 50% or less?
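Reading part C as asking when the probability that every flight so far has gone uncancelled, 0.97^n, first falls to 0.50 or below, and assuming cancellations in different weeks are independent (the problem does not state this explicitly), the count can be found by multiplying week by week and cross-checked with logarithms:

```python
import math

p_not_cancelled = 0.97  # 1 - 0.03 chance of cancellation each week

# Multiply week by week until P(all n flights go uncancelled) <= 0.50.
n, prob_all_ok = 0, 1.0
while prob_all_ok > 0.50:
    n += 1
    prob_all_ok *= p_not_cancelled

# Closed-form check: the smallest n with 0.97**n <= 0.5 is ceil(ln 0.5 / ln 0.97).
n_closed_form = math.ceil(math.log(0.5) / math.log(p_not_cancelled))
print(n, round(prob_all_ok, 4), n_closed_form)
```

With these numbers, 0.97^22 is about 0.512 and 0.97^23 is about 0.496, so the threshold is crossed at the 23rd flight.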

D) Suppose that 20% of cars that are inspected have faulty pollution control systems. The cost of repairing a pollution control system exceeds $100 about 40% of the time. When a driver takes her car in for inspection, what is the probability that she will end up paying less than $100 to repair the pollution control system?
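Part D is a two-stage tree: first whether the system is faulty, then whether the repair exceeds $100. The Python sketch below only computes the branch probabilities, assuming the 40% is conditional on a faulty system and that a car that passes needs no repair; deciding which branches count toward "paying less than $100" is part of the reasoning the question asks you to show.

```python
p_faulty = 0.20
p_over_100_given_faulty = 0.40  # repair cost exceeds $100

# Joint probability at the end of each branch (multiplication rule).
p_faulty_over_100 = p_faulty * p_over_100_given_faulty        # faulty, repair over $100
p_faulty_not_over = p_faulty * (1 - p_over_100_given_faulty)  # faulty, repair $100 or less
p_not_faulty = 1 - p_faulty                                    # passes, no repair needed

branches = {
    "faulty, repair over $100": p_faulty_over_100,
    "faulty, repair $100 or less": p_faulty_not_over,
    "not faulty, no repair": p_not_faulty,
}
for label, prob in branches.items():
    print(f"{label:<28} {prob:.2f}")
```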

E) Assume that you are playing a game in which you pull a lever and a light comes on. The light will be either red or green, and on any given pull of the lever, P(Red Light) = .40 and P(Green Light) = .60. Find the probability that, in pulling the lever 5 times, you get your 3rd green light on the 5th pull.
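Getting the 3rd green light on the 5th pull means exactly 2 greens in the first 4 pulls (in any order) followed by a green on pull 5. Assuming pulls are independent, a short Python check:

```python
from math import comb

p_green, p_red = 0.60, 0.40

# C(4, 2) orders for 2 greens among the first 4 pulls, then green on pull 5.
prob = comb(4, 2) * p_green**2 * p_red**2 * p_green
print(round(prob, 5))
```

This gives 6 × 0.36 × 0.16 × 0.60 = 0.20736.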

F) Suppose that in front of you are three boxes that look identical. You are told that one box contains two $1 bills, one contains two $100 bills, and one contains one $1 bill and one $100 bill. You are permitted to choose one box, and then you are asked to remove one bill from it without looking at the other bill in that box. Suppose that a $100 bill comes out. What is the probability that the other bill in that box is also a $100 bill?
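Part F can be checked by listing the equally likely (bill drawn, bill left behind) outcomes and keeping only those where a $100 bill comes out. The Python sketch assumes each box is equally likely to be chosen and each bill in the chosen box is equally likely to be drawn:

```python
from fractions import Fraction

boxes = [(1, 1), (100, 100), (1, 100)]  # the two bills in each box

# Six equally likely outcomes: (bill drawn, bill left behind).
outcomes = [(box[i], box[1 - i]) for box in boxes for i in (0, 1)]

# Condition on the drawn bill being $100.
left_behind = [other for drawn, other in outcomes if drawn == 100]
prob_other_100 = Fraction(sum(other == 100 for other in left_behind), len(left_behind))
print(prob_other_100)
```

Two of the three ways a $100 bill can come out leave another $100 bill behind, so the answer is 2/3 (about 0.667) rather than 1/2.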