Problem Set - Statistics 2

Data Ref: https://www.kaggle.com/c/titanic/data

Tasks

Find the probability that...

  • a passenger survived (mandatory)

Choose two of the following and find the probability that...

  • a passenger was male
  • a passenger was female and had at least one sibling or spouse on board
  • a survivor was from Cherbourg
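Each of these probabilities is a fraction of rows: the number of passengers matching the event divided by the number of passengers in the group being considered. The sketch below is one possible way to organize that, not a required approach. The rows are invented toy data; only the column names (`Survived`, `Sex`, `SibSp`, `Embarked`) follow the Kaggle file. The helper `probability` is a hypothetical name, and reading "a survivor was from Cherbourg" as a conditional probability (restricting to survivors first) is an interpretation.

```python
# Toy rows standing in for the Kaggle CSV. The values are invented for
# illustration; only the column names match the real data set.
passengers = [
    {"Survived": 1, "Sex": "female", "SibSp": 1, "Embarked": "C"},
    {"Survived": 0, "Sex": "male", "SibSp": 0, "Embarked": "S"},
    {"Survived": 1, "Sex": "female", "SibSp": 0, "Embarked": "S"},
    {"Survived": 0, "Sex": "male", "SibSp": 1, "Embarked": "Q"},
    {"Survived": 1, "Sex": "male", "SibSp": 0, "Embarked": "C"},
    {"Survived": 0, "Sex": "female", "SibSp": 2, "Embarked": "S"},
]


def probability(rows, event, given=None):
    """Return the fraction of rows satisfying `event`.

    If `given` is supplied, only rows satisfying `given` are counted,
    which yields the conditional probability P(event | given).
    """
    pool = [r for r in rows if given is None or given(r)]
    if not pool:
        raise ValueError("no rows satisfy the condition")
    return sum(1 for r in pool if event(r)) / len(pool)


# P(survived)
p_survived = probability(passengers, lambda r: r["Survived"] == 1)

# P(female and at least one sibling or spouse on board)
p_female_with_sibsp = probability(
    passengers, lambda r: r["Sex"] == "female" and r["SibSp"] >= 1
)

# P(embarked at Cherbourg | survived), assuming the conditional reading
p_cherbourg_given_survived = probability(
    passengers,
    lambda r: r["Embarked"] == "C",
    given=lambda r: r["Survived"] == 1,
)
```

On the real file the same fractions can be computed with whatever data library the notebook already uses; the point is only that the denominator changes when the question is about a subgroup.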

Plot the distribution of passenger ages. Choose visually meaningful bin sizes and label your axes.

Find the probability that (choose one)...

  • a passenger was less than 10 years old
  • a passenger was between 25 and 40 years old
  • a passenger was either younger than 20 years old or older than 50
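Some passengers in the data set have no recorded age, so the denominator needs a decision. The sketch below assumes missing ages are simply dropped, which is one option among several, and treats the 25 to 40 range as inclusive at both ends, which is also a choice. The ages are invented, `None` stands in for a missing value, and `age_probability` is a hypothetical helper name.

```python
# Invented ages for illustration; None marks a passenger with no recorded age.
ages = [4, 22, None, 35, 58, 8, 29, None, 41, 19]


def age_probability(ages, predicate):
    """Return the fraction of passengers with a known age satisfying `predicate`.

    Assumption: passengers with a missing age are excluded from both the
    numerator and the denominator.
    """
    known = [a for a in ages if a is not None]
    if not known:
        raise ValueError("no known ages")
    return sum(1 for a in known if predicate(a)) / len(known)


p_under_10 = age_probability(ages, lambda a: a < 10)
p_25_to_40 = age_probability(ages, lambda a: 25 <= a <= 40)  # inclusive bounds
p_young_or_old = age_probability(ages, lambda a: a < 20 or a > 50)
```

Whichever convention you pick for missing ages and for the interval endpoints, state it next to the answer so the number can be reproduced.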

Knowing nothing else about the passengers aside from the survival rate of the population (the first question), if I choose 100 passengers at random from the passenger list, what’s the probability that exactly 42 passengers survive?

What’s the probability that at least 42 of those 100 passengers survive?
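If only the overall survival rate p is known, one natural model for these two questions is the binomial distribution: P(X = k) = C(n, k) p^k (1 - p)^(n - k), with "at least 42" being the sum of that expression from k = 42 to 100. The sketch below assumes that model, which treats the 100 draws as independent with the same p. Drawing 100 distinct passengers from a finite list is strictly a without-replacement problem, so the binomial is an approximation here. The value `p = 0.4` is a placeholder, not the measured rate, and both function names are hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Return P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_at_least(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p) by summing the upper tail."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))


# Placeholder survival rate; substitute the value measured in the first task.
p = 0.4

p_exactly_42 = binomial_pmf(42, 100, p)
p_at_least_42 = binomial_at_least(42, 100, p)
```

The "at least" probability is necessarily larger than the "exactly" probability, since it includes the k = 42 term plus every term above it.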

Is there a statistically significant difference between...

  • ...the ages of male and female survivors?
  • ...the fares paid by passengers from Queenstown and the passengers from Cherbourg?

If so, at what significance level? If not, how do you know?

Accompany your p-values with histograms showing the distributions of the two populations being compared.
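The assignment does not prescribe a particular test. A common choice is a two-sample t-test (for example `scipy.stats.ttest_ind`); the sketch below instead uses a permutation test on the difference in means, chosen here only because it needs nothing beyond the standard library and makes the meaning of a p-value explicit. The fares are invented, and `permutation_p_value` is a hypothetical name.

```python
import random


def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in means.

    The group labels are shuffled repeatedly; the p-value is the share of
    shuffles whose absolute mean difference is at least as large as the
    observed one (with the usual +1 correction so it is never exactly 0).
    """
    rng = random.Random(seed)
    n_a = len(a)
    observed = abs(sum(a) / n_a - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        left, right = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        # Small tolerance guards against floating-point rounding on ties.
        if diff >= observed - 1e-12:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)


# Invented fares for illustration only; not taken from the data set.
fares_queenstown = [7.75, 7.73, 8.05, 7.88, 15.50, 7.75]
fares_cherbourg = [71.28, 30.07, 146.52, 27.72, 82.17, 76.73]

p_value = permutation_p_value(fares_queenstown, fares_cherbourg)
```

A small p-value says that a difference this large would rarely arise if the group labels were interchangeable. Compare it against the significance level you chose before looking at the result.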

*STRETCH GOAL* Write a function that takes N random samples of 100 passengers, and returns the fraction of those samples where at least 42 passengers survive. Choose a random seed and find approximately how many random samples you need to take before your fraction matches the probability you calculated (within \(\Delta p \approx 0.05\)).

It may help to visualize the survival fraction vs the number of random samples. Answers will vary based on the seed.
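One possible shape for the stretch-goal function is sketched below. It assumes each sample of 100 is drawn without replacement from the passenger list and that survival is stored as 0/1 flags. The list `survived_flags` is an invented stand-in (1000 passengers, 400 survivors), not the real data, and the function name and parameters are hypothetical.

```python
import random


def fraction_with_at_least(survived, n_samples, sample_size=100, threshold=42, seed=0):
    """Return the fraction of random samples containing at least `threshold` survivors.

    `survived` is a sequence of 0/1 flags, one per passenger. Each of the
    `n_samples` samples draws `sample_size` distinct passengers at random.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(survived, sample_size)
        if sum(sample) >= threshold:
            hits += 1
    return hits / n_samples


# Invented stand-in for the Survived column: 400 survivors out of 1000.
survived_flags = [1] * 400 + [0] * 600

estimate = fraction_with_at_least(survived_flags, n_samples=2000, seed=1)
```

Calling the function for increasing values of `n_samples` and plotting the returned fraction against `n_samples` shows how quickly the estimate settles near the calculated probability.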

Submitting Your Work

Report your answers in an IPython/Jupyter notebook, using either print statements or Markdown cells. If you write any functions, give each one a docstring describing what it does. Note: you are not required to write tests for your functions. If you reach any conclusions via math (and you will), make sure your code agrees with what you say.

When your work is complete, push it to GitHub and open a pull request against your master branch. Submit the URL for your pull request. Once you have submitted the URL, you may merge your work into master.

As usual, use the comments function in Canvas to submit questions, comments, and reflections on this work.