Wk 5 - Apply: Assignment [due Day 7]
Download the TitanicTrain and TitanicScore data files. The training data file contains attributes of actual passengers aboard the Titanic when it sank in the northern Atlantic Ocean on April 15, 1912. The scoring data set contains 100 fictitious people. You will build a logistic regression model
using the actual Titanic data and then use the model to predict the survival status of the fictitious people in the scoring data set.
To understand the attributes:
• The TicketNumber and PassengerName attributes identify the passengers. TicketNumber does not exist in the scoring data because the fictitious people were not actually on the ship. You will predict what their fate likely would have been had they boarded the ship.
• The Sex attribute is binary: 0 = female; 1 = male.
• The Age attribute is the passenger's age in years. In the training data, some fractions of a year are included, and in some cases, the passenger's age was not recorded.
• The SiblingsAndSpouses attribute gives the number of brothers and sisters and/or spouses the passenger had on board the Titanic with them. For example, if a married couple boarded the Titanic with their two unmarried children, the husband and wife would each have 1 in this
attribute (each other), and the children would likewise each have 1 in this attribute (each other).
• The ParentsAndChildren attribute gives the number of parents and/or children the passenger had on board the Titanic with them. Continuing the example above, the mother and father would each have 2 in this attribute (their two children), and the children would likewise each have 2
in this column (their mother and father).
• The PassengerClass attribute indicates the level of luxury the passenger booked: 1 = first class, most expensive, on the upper decks of the ship; 2 = second class, less expensive, on the middle decks of the ship; 3 = third class, least expensive, on the lower decks of the ship.
• The Fare attribute is the amount of money the passenger paid for their ticket.
• The SurvivalStatus attribute exists only in the training data set and indicates whether each passenger lived or died when the ship sank.
Complete the following steps:
1. Import both the training and scoring data sets into RStudio. Name each of them descriptively. For simplicity in building the model, you may wish to attach() the training data set.
2. Using the glm() function in R, build a logistic regression model using the SurvivalStatus attribute as the dependent variable and all of the other attributes except TicketNumber and PassengerName as independent variables. Ensure the model is stored in a descriptively named object in R. Apply the as.factor() function in R to the dependent variable, and ensure that the glm() model's family parameter is: family = binomial()
3. Apply the summary() function in R to the model so that you can evaluate the statistical significance of the independent variables.
4. Using the predict() function in R and type="response", apply the logistic regression model to the scoring data. Create a new data frame that combines the model's predicted outcomes with the individual observations in the scoring data. Name this data frame descriptively and view it in
RStudio.
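Steps 1 to 4 above can be sketched in R as follows. This is a minimal sketch under stated assumptions, not the assignment's solution: the real files are not available here, so the block builds small synthetic stand-ins for the training and scoring data, and it assumes SurvivalStatus is coded 0/1. The object names (titanic_train, titanic_score, survival_model, score_predictions) and the PredictedSurvival column are illustrative choices, and the read.csv() file names in the comments are guesses.

```r
# Step 1: import the data. With the real files this would be something like
#   titanic_train <- read.csv("TitanicTrain.csv")   # file name assumed
#   titanic_score <- read.csv("TitanicScore.csv")   # file name assumed
# Here a synthetic stand-in is generated so the sketch runs on its own.
set.seed(42)
n <- 200
titanic_train <- data.frame(
  Sex                = rbinom(n, 1, 0.6),             # 0 = female, 1 = male
  Age                = round(runif(n, 1, 70)),
  SiblingsAndSpouses = rpois(n, 0.5),
  ParentsAndChildren = rpois(n, 0.4),
  PassengerClass     = sample(1:3, n, replace = TRUE),
  Fare               = round(runif(n, 5, 250), 2)
)
# Synthetic outcome (assumed 0/1 coding); these numbers are made up.
log_odds <- 2 - 2.5 * titanic_train$Sex - 0.02 * titanic_train$Age -
  0.8 * (titanic_train$PassengerClass - 1)
titanic_train$SurvivalStatus <- rbinom(n, 1, plogis(log_odds))

# Hypothetical scoring rows (the real scoring file has 100 people).
titanic_score <- data.frame(
  PassengerName      = c("Person A", "Person B", "Person C"),
  Sex                = c(0, 1, 1),
  Age                = c(28, 45, 8),
  SiblingsAndSpouses = c(1, 0, 2),
  ParentsAndChildren = c(0, 0, 2),
  PassengerClass     = c(1, 3, 2),
  Fare               = c(80, 8.05, 26)
)

# Step 2: logistic regression on every attribute except the identifiers.
survival_model <- glm(
  as.factor(SurvivalStatus) ~ Sex + Age + SiblingsAndSpouses +
    ParentsAndChildren + PassengerClass + Fare,
  data   = titanic_train,
  family = binomial()
)

# Step 3: coefficient table with z values and p-values.
print(summary(survival_model))

# Step 4: predicted probabilities for the scoring data, combined with
# the original scoring rows in a new data frame.
predicted_survival <- predict(survival_model, newdata = titanic_score,
                              type = "response")
score_predictions <- data.frame(titanic_score,
                                PredictedSurvival = predicted_survival)
print(score_predictions)  # in RStudio, View(score_predictions) opens a grid
```

Passing data = titanic_train is used here in place of attach(); either approach lets the formula refer to the columns by name. With the real training data, rows whose Age was not recorded are dropped by glm() under R's default na.action setting.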
Answer the following questions using the model.
1
Examining the summary() of the logistic regression model, how many of the independent variables are not statistically significant at the 95% confidence level (0.05 alpha)?
One independent variable is not statistically significant.
Two independent variables are not statistically significant.
All independent variables are statistically significant.
Four independent variables are not statistically significant.
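Reading significance and z values off a summary() table can also be done programmatically. The sketch below is illustrative only: it fits a model to synthetic data with hypothetical predictors x1, x2, and x3, then pulls the coefficient matrix out of the summary, counts the predictors whose p-value exceeds 0.05, and ranks the predictors by the absolute size of their z values. None of the names or numbers come from the Titanic data.

```r
# Synthetic data with hypothetical predictors; not the Titanic data.
set.seed(1)
n <- 300
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- rbinom(n, 1, plogis(1.5 * toy$x1 - 0.5 * toy$x2))

toy_model <- glm(as.factor(y) ~ x1 + x2 + x3, data = toy,
                 family = binomial())

# coef(summary(...)) returns a matrix with the columns
# "Estimate", "Std. Error", "z value" and "Pr(>|z|)".
coef_table <- coef(summary(toy_model))

# Drop the intercept so that only the independent variables remain.
predictors <- coef_table[rownames(coef_table) != "(Intercept)", ,
                         drop = FALSE]

# Predictors that are NOT significant at alpha = 0.05.
not_significant <- rownames(predictors)[predictors[, "Pr(>|z|)"] > 0.05]
print(length(not_significant))

# Rank predictors by |z|: a larger absolute z value means the estimate
# is further from zero relative to its standard error.
z_ranked <- sort(abs(predictors[, "z value"]), decreasing = TRUE)
print(z_ranked)
```

The absolute value matters when comparing z values, because a strongly negative z indicates as strong a relationship as a strongly positive one.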
2
Examining the z values in the summary() of the logistic regression model, which independent variable is the single best predictor of SurvivalStatus?
Fare
Age
PassengerClass
Sex
3
Examining the z values in the summary() of the logistic regression model, which independent variable describing families traveling together is a better predictor of SurvivalStatus: SiblingsAndSpouses or ParentsAndChildren?
Both of these independent variables are equally good predictors of SurvivalStatus.
ParentsAndChildren is a better predictor of SurvivalStatus than SiblingsAndSpouses.
There is no way to tell in this model.
SiblingsAndSpouses is a better predictor of SurvivalStatus than ParentsAndChildren.
4
The standard explanation regarding the evacuation of the Titanic is that women, children, and first-class passengers were allowed to board the lifeboats first, thus increasing their chances of survival, while passengers in lower classes (second and third) and older, male
passengers were required to wait to evacuate. Does this model support the standard explanation?
No. PassengerClass, Sex, and Age are not strong predictors of SurvivalStatus.
Partially. Age is a strong predictor of SurvivalStatus, but Sex and PassengerClass are not.
Partially. Sex and Age are strong predictors of SurvivalStatus, but PassengerClass is not.
Yes. PassengerClass, Sex, and Age are all strong predictors of SurvivalStatus.
5
Examining the data frame containing the predictions for the scoring data set, how many of the fictitious people are predicted to survive the Titanic disaster (Predicted Response scores greater than 50%)?
66
100
0
34
6
What is the age of the person in the scoring data set who is predicted to survive the Titanic disaster with a 52.7% probability?
49
17
2
34
7
What is the name of the person in the scoring data set who has the highest predicted survival probability?
Warren Guerrero
Tracey Mcgee
Renee Cortez
Kellie May
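The three lookups above (counting predictions over 50%, finding the person with a particular predicted probability, and finding the highest probability) are all simple operations on the predictions data frame. A minimal sketch, assuming a data frame named score_predictions with a PredictedSurvival column; both names and all four rows are hypothetical placeholders, not values from the scoring data.

```r
# Hypothetical predictions; the real data frame comes from predict().
score_predictions <- data.frame(
  PassengerName     = c("Person A", "Person B", "Person C", "Person D"),
  Age               = c(30, 52, 7, 41),
  PredictedSurvival = c(0.912, 0.127, 0.527, 0.5)
)

# How many people have a predicted probability strictly greater than 50%?
predicted_survivors <- sum(score_predictions$PredictedSurvival > 0.5)
print(predicted_survivors)  # 2 here: exactly 0.5 does not count

# Who has a predicted probability of about 52.7%? Comparing within a small
# tolerance avoids relying on exact floating-point equality.
near_target <- abs(score_predictions$PredictedSurvival - 0.527) < 0.0005
print(score_predictions[near_target, c("PassengerName", "Age")])

# Who has the highest predicted probability?
top_row <- score_predictions[which.max(score_predictions$PredictedSurvival), ]
print(top_row$PassengerName)  # "Person A" in this toy data
```

Sorting the whole data frame with score_predictions[order(-score_predictions$PredictedSurvival), ] is an alternative when the full ranking is of interest.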
8
Of the passengers in the scoring data set who are younger than 10 years old, how many are predicted to survive?
9
3
0
12
9
There are several male, first-class passengers in the scoring data set who are older than 65. How many are predicted to survive?
3
0
9
12
10
Of the women in the scoring data set who would have been traveling with at least one parent or child, how many are predicted to survive?
1
49
26
45
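The last three questions each filter the predictions by passenger attributes before counting predicted survivors. A minimal sketch of that pattern using subset(), again with a hypothetical score_predictions data frame and made-up rows; the PredictedToSurvive helper column is an illustrative addition.

```r
# Hypothetical predictions; values are placeholders, not real results.
score_predictions <- data.frame(
  Sex                = c(0, 1, 1, 0, 1),   # 0 = female, 1 = male
  Age                = c(6, 70, 8, 35, 68),
  ParentsAndChildren = c(2, 0, 1, 1, 0),
  PassengerClass     = c(3, 1, 2, 1, 1),
  PredictedSurvival  = c(0.71, 0.22, 0.48, 0.95, 0.55)
)

# Flag everyone whose predicted probability is greater than 50%.
score_predictions$PredictedToSurvive <-
  score_predictions$PredictedSurvival > 0.5

# Passengers younger than 10.
children <- subset(score_predictions, Age < 10)
children_surviving <- sum(children$PredictedToSurvive)

# Male, first-class passengers older than 65.
older_first_class_men <- subset(score_predictions,
                                Sex == 1 & PassengerClass == 1 & Age > 65)
older_men_surviving <- sum(older_first_class_men$PredictedToSurvive)

# Women traveling with at least one parent or child.
women_with_family <- subset(score_predictions,
                            Sex == 0 & ParentsAndChildren >= 1)
women_surviving <- sum(women_with_family$PredictedToSurvive)

print(c(children = children_surviving,
        older_men = older_men_surviving,
        women_with_family = women_surviving))
```

Summing a logical column counts its TRUE values, which is why sum() gives the number of predicted survivors in each subset.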