*My takeaways:*
1. L1 and L2 logistic regression 5:22
2. Receiver operating characteristic 16:00
3. Statistical Sins 26:55
3.1 Example 1: statistics about the data are not the same as the data; we should plot and visualise the data 28:40
3.2 Example 2: lying with charts, e.g. Y-axis start point 30:57
3.3 Example 3: lying with charts, e.g. Y-axis start point, no Y-axis label, confusing X-axis label 32:45
3.4 GIGO (garbage in, garbage out): analysis of bad data is worse than no analysis at all 35:40
3.5 Survivor bias: it's not easy to get random samples in real life 41:35; in such cases, we can't apply the empirical rule, central limit theorem, or standard error 46:38
I was playing around with the Titanic data and noticed another correlation between features: the average age of the passengers differed across the cabin classes, at 39.16 for first class, 29.51 for second class, and 24.82 for third class. Examining the weights that logistic regression produces when fit on just a single cabin class shows that age within a cabin class is strongly associated with the passenger surviving.
@25:00 As sensitivity increases, specificity decreases, so plotting sensitivity vs. specificity gives a decreasing curve, whereas sensitivity vs. 1 - specificity gives an increasing curve, which makes the AUC easier to visualize. Also, 1 - specificity equals FP/(FP+TN), which is the False Positive Rate, so the plot is TPR (sensitivity) vs. FPR; that is, the focus is on the positives...
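To check the relationship between specificity and the false positive rate numerically, here is a minimal sketch. The confusion-matrix counts are made up for illustration and do not come from the lecture.

```python
def rates(tp, fp, tn, fn):
    """Return (sensitivity, specificity, false positive rate) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (TPR)
    specificity = tn / (tn + fp)   # true negative rate
    fpr = fp / (fp + tn)           # false positive rate (FPR)
    return sensitivity, specificity, fpr


# ASSUMPTION: hypothetical counts, chosen only to illustrate the identity.
sens, spec, fpr = rates(tp=40, fp=10, tn=30, fn=20)

# 1 - specificity and FPR are the same quantity.
print(sens, spec, fpr, 1 - spec)
```

So the x-axis of the ROC curve can be read either as 1 - specificity or as FPR.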
Great lecture. Just a quick question: the age coefficient is very small (-0.03), but are all the features normalized before fitting the logistic regression? If they are not, then age has a much bigger impact than it appears, since the difference between a 20-year-old and a 50-year-old is 30 × (-0.03) = -0.9, which is almost twice the impact of being in the third cabin class.
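The arithmetic in this question can be spelled out as a small sketch. Only the -0.03 weight and the two ages come from the comment; the standard deviation of age used at the end is a hypothetical value, there just to show how rescaling a feature changes the size of its weight.

```python
import math

age_weight = -0.03        # age coefficient quoted in the comment
age_gap = 50 - 20         # years between the two passengers

# Shift in the log-odds of survival between the two ages.
log_odds_shift = age_weight * age_gap          # about -0.9

# The same shift expressed as a multiplier on the odds.
odds_multiplier = math.exp(log_odds_shift)     # about 0.41

# ASSUMPTION: hypothetical standard deviation of age, for illustration only.
assumed_age_std = 14.0

# If age were standardized, the weight per standard deviation would be:
weight_per_std = age_weight * assumed_age_std  # about -0.42

print(log_odds_shift, odds_multiplier, weight_per_std)
```

In other words, a raw weight is only comparable to the weights of 0/1 features after accounting for the range of the feature it multiplies.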
So the professor first builds a model with perfect collinearity and doesn't explain the issues with doing that. The first model fell into the dummy variable trap.
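For anyone unfamiliar with the dummy variable trap, here is a rough sketch of why it matters, assuming the model has an intercept plus one 0/1 column per cabin class. All the weights below are made up. Because the three class columns always sum to 1, a constant can be moved between the intercept and the class weights without changing any prediction, so the individual weights are not uniquely determined.

```python
import math


def log_odds(intercept, class_weights, one_hot):
    """Linear part of a logistic model with one dummy column per cabin class."""
    return intercept + sum(w * x for w, x in zip(class_weights, one_hot))


# One passenger per cabin class, one-hot encoded as (first, second, third).
passengers = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

# The dummy columns always sum to 1, i.e. they duplicate the intercept column.
assert all(sum(p) == 1 for p in passengers)

# ASSUMPTION: hypothetical weights, for illustration only.
model_a = [log_odds(0.5, (1.0, 0.2, -0.6), p) for p in passengers]

# Add 2.0 to every class weight and subtract 2.0 from the intercept.
model_b = [log_odds(0.5 - 2.0, (3.0, 2.2, 1.4), p) for p in passengers]

# Different weights, identical predictions for every passenger.
same = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(model_a, model_b))
print(same)
```

The usual fix is to drop one of the class columns, so the remaining weights are read relative to the dropped class.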
Thanks. Video time point .... ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-K2SC-WPdT6k.htmlm33s . I still want to know what quote was [not] shown, so that I can insert my own picture.
The original slide read: "A Thing of the Past?" and "Insert Photo Here," referring to garbage-in-garbage-out data (see slides 19-20 in the deck found on the OCW site: ocw.mit.edu/6-0002F16).
Thanks for the great lecture. One thing I found a bit confusing about the ROC curve (blue curve at 18:46) was that it didn't show the values of 'p' explicitly. Here is a version with 'p' color-coded: drive.google.com/file/d/15-AUzxuzvgFPfUoU3MxyCI1I1-Z19qWO/
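To make the role of 'p' concrete: each point on a ROC curve corresponds to one probability cutoff. Here is a minimal sketch that sweeps the cutoff and records the point it produces. The predicted probabilities and labels are made up, and `roc_points` is a hypothetical helper, not the function used in the lecture.

```python
def roc_points(probs, labels, thresholds):
    """Return a list of (p, FPR, TPR), one entry per probability cutoff p."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for p in thresholds:
        # Predict positive whenever the predicted probability is at least p.
        tp = sum(1 for pr, y in zip(probs, labels) if pr >= p and y == 1)
        fp = sum(1 for pr, y in zip(probs, labels) if pr >= p and y == 0)
        points.append((p, fp / neg, tp / pos))
    return points


# ASSUMPTION: hypothetical model outputs and true labels, for illustration only.
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

points = roc_points(probs, labels, [0.0, 0.25, 0.5, 0.75, 1.0])
for p, fpr, tpr in points:
    print(f"p={p:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Low values of p sit at the top right of the curve (everything predicted positive) and high values sit at the bottom left, which is what the color coding makes visible.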