Problem Set - Statistics 3

Data Ref: https://www.kaggle.com/c/titanic/data

Tasks

Choose one of these two questions:

  • Did survivors pay more for their tickets than those who didn’t? If so, at what significance level?
  • Did a given first-class passenger have fewer family members on board than a given third-class passenger? If so, at what significance level?
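
Either question compares two groups, so one possible approach (an assumption here, not a required method) is a permutation test on the difference in group means. The sketch below uses only the standard library; the function name `permutation_p_value` and the sample fares are hypothetical and made up for illustration, not taken from the Titanic data.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a one-sided p-value for mean(group_a) > mean(group_b).

    Group labels are shuffled repeatedly; the p-value is the share of
    shuffles whose difference in means is at least as large as the
    observed difference.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Made-up fares for illustration only; NOT values from the Titanic data.
survivor_fares = [80.0, 52.5, 71.3, 30.0, 120.0, 26.0]
non_survivor_fares = [7.9, 8.1, 13.0, 7.3, 21.0, 10.5]
p_value = permutation_p_value(survivor_fares, non_survivor_fares)
print(f"one-sided p-value: {p_value:.4f}")
```

A small p-value suggests the observed difference would be unlikely if group membership were unrelated to the quantity being compared. A t-test or another two-sample test may be just as appropriate; whichever you choose, state it explicitly in your notebook.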

Here is a basic linear model for the relationship between Age and Fare:

\[\text{Fare } = C_0\cdot\text{ Age}\]

Find the value of \(C_0\) that best fits the data and illustrates the relationship between Age and Fare. Note that while this may still not be a good fit to the data, it will be the best fit for *THIS* model.
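
If "best fit" is taken to mean least squares (an assumption; the problem does not name a criterion), then minimizing \(\sum_i (\text{Fare}_i - C_0\cdot\text{Age}_i)^2\) over \(C_0\) gives \(C_0 = \sum_i \text{Age}_i\,\text{Fare}_i \,/\, \sum_i \text{Age}_i^2\). A minimal sketch of that formula follows; the function name `fit_slope_through_origin` and the sample numbers are hypothetical, and any rows with a missing Age or Fare are assumed to have been dropped beforehand.

```python
def fit_slope_through_origin(ages, fares):
    """Return the least-squares slope C_0 for the model Fare = C_0 * Age.

    Assumes ages and fares are equal-length sequences of numbers with
    no missing values.
    """
    if len(ages) != len(fares):
        raise ValueError("ages and fares must have the same length")
    numerator = sum(age * fare for age, fare in zip(ages, fares))
    denominator = sum(age * age for age in ages)
    if denominator == 0:
        raise ValueError("at least one age must be nonzero")
    return numerator / denominator


# Made-up values for illustration only; NOT values from the Titanic data.
sample_ages = [10, 20, 30, 40]
sample_fares = [12.0, 18.0, 33.0, 41.0]
c0 = fit_slope_through_origin(sample_ages, sample_fares)
print(f"C_0 = {c0:.4f}")
```

Because the model has no intercept, the fitted line is forced through the origin, which is one reason the fit can be poor even at the optimal \(C_0\).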

Submitting Your Work

Report your answers in an IPython/Jupyter notebook with either print statements or Markdown. If you write any functions, include a docstring describing what each function does. Note: You are not writing tests for any functions. If you come to any conclusions via math (and you will), make sure your code matches what you say.

When your work is complete, push it to GitHub and open a pull request against your master branch. Submit the URL for your pull request. After this is complete, you may merge your work to master.

As usual, use the comments function in Canvas to submit questions, comments, and reflections on this work.