Professor Andrew Gelman presented at the 7th ESRC Research Methods Festival, 5-7 July 2016, University of Bath. The Festival is organised every two years by the National Centre for Research Methods www.ncrm.ac.uk
30:46 Did he not notice that the 13th of each month was less popular? In the heatmap chart that he hated (25:19), the dip on the 13th of every month was pretty obvious. I wonder if they considered that the Valentine's Day effect might be related to the 13th-day effect: delay the birth to the day after the 13th, which in February happens to be the 14th.
I skipped this part because I had seen it in 2 other presentations of his, where he does point that out. The initial bit where he says he hated the chart seems to be a setup for the later joke, where he admits the chart was actually useful.
Regarding the pollution study that starts at 12:30 of the presentation: Prof. Gelman said it's a bad analysis because, among other things, (1) the authors fit a fifth-degree polynomial, and (2) life expectancy should not depend on degrees north of the river. I don't understand these comments. (1) To me it's not a fifth-degree polynomial. It's a second- (or third-?) degree polynomial on each side of the vertical line, and the key thing is the vertical distance between the two curves right at the vertical line. I really don't get his comment on this. (2) The link between life expectancy and degrees north of the river is that further north the weather is colder, so households are likely to burn more coal for heating and thus produce more pollution. The effect of pollution on life expectancy is the main thesis of the paper. I thought this was very obvious and am puzzled by Prof. Gelman's complaint.
A fifth-degree polynomial may be specified in which the coefficients on the 4th- and 5th-degree terms are small, so it's functionally similar to a 2nd- or 3rd-degree one. The broader point is that there's no intrinsic reason to assume the data follow a 'curvy' relationship of the sort this family of models imposes.
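How much the estimated jump depends on the polynomial degree can be checked on simulated data. The sketch below is a minimal illustration in Python with NumPy, not the paper's analysis: it assumes a single global polynomial in distance from the river plus a step at the river, and the data, the function name `rdd_jump`, and all numbers are made up. The true jump is zero by construction, so any nonzero estimate is noise, and the thing to look at is how much the estimate moves as the degree changes.

```python
import numpy as np


def rdd_jump(x, y, degree):
    """Estimate the jump at x = 0 from a global polynomial of the given degree plus a step."""
    # Rescale x to [-1, 1] for numerical stability; the jump estimate is unaffected.
    z = x / np.max(np.abs(x))
    north = (x >= 0).astype(float)
    # Design columns: 1, z, z^2, ..., z^degree, then the step indicator.
    design = np.column_stack([z**p for p in range(degree + 1)] + [north])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[-1]  # coefficient on the step indicator


rng = np.random.default_rng(0)
x = rng.uniform(-15, 15, size=120)              # hypothetical degrees north of the river
y = 75 + 0.05 * x + rng.normal(0, 3, size=120)  # linear trend, no true jump

jumps = {d: rdd_jump(x, y, d) for d in (1, 2, 3, 5)}
for d, j in jumps.items():
    print(f"degree {d}: estimated jump = {j:+.2f}")
```

Rerunning with different seeds gives a rough sense of how far the jump estimate can drift when only the degree is changed.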
@jonminton3574 yep! in my head, the best approach with this data, if they really want to use an RDD analysis, is to adjust for counties (e.g., with random effects, or better, with a spatial Gaussian process), and then add a binary predictor `is_north_of_river`. my bet is that its coefficient will come out statistically insignificant.
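A rough way to see what that kind of model looks like, again on simulated data only. This sketch assumes NumPy and uses a much cruder stand-in than the suggestion above: it collapses observations to county means and runs ordinary least squares on latitude plus `is_north_of_river`, so that each county counts once. A proper random-effects model or a spatial Gaussian process would be closer to what is proposed. All data and parameter values are hypothetical, and there is no true jump in the simulation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_counties, n_per_county = 60, 20

# Hypothetical county-level inputs: latitude relative to the river and a county effect.
lat = rng.uniform(-10, 10, size=n_counties)
is_north_of_river = (lat >= 0).astype(float)
county_effect = rng.normal(0, 2, size=n_counties)

# Simulated outcomes: smooth trend in latitude, county-level noise, NO jump at the river.
county = np.repeat(np.arange(n_counties), n_per_county)
y = 75 - 0.1 * lat[county] + county_effect[county] + rng.normal(0, 5, size=county.size)

# Collapse to county means so that within-county correlation is not double counted.
county_mean = np.bincount(county, weights=y) / np.bincount(county)

# OLS of county means on intercept, latitude, and the binary indicator.
X = np.column_stack([np.ones(n_counties), lat, is_north_of_river])
beta, *_ = np.linalg.lstsq(X, county_mean, rcond=None)
resid = county_mean - X @ beta
sigma2 = resid @ resid / (n_counties - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

print(f"is_north_of_river: estimate = {beta[2]:+.2f}, standard error = {se[2]:.2f}")
```

Comparing the estimate to its standard error is the informal version of the significance check mentioned above; whether the real data would behave this way is exactly the open bet.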