Nope. Although you had the beers sorted in almost the same rank I would have, taste is subjective to each person, so this still doesn't work in general. One person might poo-poo PU but love Bud. He might be dumb, but he has the right to like what he likes.
Well since the best beer always bubbles to the top on the first round, you don't have to taste it on the following rounds, do you? So it's kind of cheating. But if you go from the top of the list to the bottom, then the worst beer will bubble down. Thus, you will have an excuse to drink less of the worse beer.
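For the curious, that's the bubble sort property being joked about: after each pass the current "best" element settles into its final position, so the next pass can stop one element earlier. A minimal R sketch (illustrative, not from the original post):

```r
# Bubble sort: after pass k, the k largest elements sit at the end
# in their final positions, so each pass can stop one element earlier.
bubble_sort <- function(x) {
  n <- length(x)
  if (n < 2) return(x)
  for (last in seq(n, 2)) {      # the unsorted region shrinks by one each pass
    for (i in seq_len(last - 1)) {
      if (x[i] > x[i + 1]) {     # "taste" an adjacent pair, swap if out of order
        tmp <- x[i]; x[i] <- x[i + 1]; x[i + 1] <- tmp
      }
    }
  }
  x
}

bubble_sort(c(3, 1, 5, 2))  # 1 2 3 5
```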
Thank you so much!!!... I have been listening to other videos on this topic, but I only got more confused. This has clarified everything for me. Thank you!
This is an unreal explanation of Bayesian data analysis, thank you so much. This is the clearest and most intuitive introduction I've found, great job.
Fantastic post Rasmus - you didn't provide an example of using PySpark from RStudio, though. It would be nice to see how that is done with a local Spark instance. I managed to get the sparklyr version running nicely on my Windows machine after a bit of time spent setting up Spark and Scala. PySpark is presenting me with more challenges.
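For reference, a minimal sparklyr sketch of the local-instance route the commenter got working (the PySpark-from-RStudio equivalent isn't shown here; `spark_install()` fetches a local Spark distribution if you don't already have one):

```r
library(sparklyr)
library(dplyr)

# spark_install()  # one-time download of a local Spark distribution

sc <- spark_connect(master = "local")                # local Spark instance

mtcars_tbl <- copy_to(sc, mtcars, overwrite = TRUE)  # ship a data frame to Spark

mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE))       # runs inside Spark

spark_disconnect(sc)
```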
Did I get it right that all these tools don't enable you to run base R code or any random package's functions on the data, but only tidyr stuff, plus maybe some additional packages' stuff in the case of Spark (you mentioned that one can also do machine learning algos with Spark, but not how)?
Yep, that's the grim reality. The general strategy for dealing with big data is actually *not* to use R or Python, but a system that can actually handle big data. However, if you're happy with the dplyr API, then you can pretend that you're still using R, to a large degree.
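A hedged sketch of what that pretending looks like with sparklyr (assuming a local connection as above): dplyr verbs are translated to Spark SQL rather than run as base R, MLlib is reached through the ml_* wrappers, and spark_apply() is the escape hatch for arbitrary R code:

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")
iris_tbl <- copy_to(sc, iris, overwrite = TRUE)  # note: '.' in column names becomes '_'

# dplyr verbs are translated to Spark SQL, not executed as base R:
iris_tbl %>%
  filter(Petal_Length > 2) %>%
  show_query()                 # prints the generated SQL

# Machine learning via sparklyr's wrappers around Spark MLlib:
fit <- ml_linear_regression(iris_tbl, Petal_Length ~ Petal_Width)
summary(fit)

# Arbitrary R functions *can* be shipped to the workers with spark_apply(),
# but every worker then needs R installed and it is comparatively slow.

spark_disconnect(sc)
```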
Rasmus, I'm so glad you upload these presentations. Your trilogy on Bayesian analysis has got me started into the Bayesian world, which I use now a lot in my research.
Might it rather simply be because people who are into Bayesian stats are more pro (i.e. Bayesian stats are less democratised at the moment), and thus the cohort on which it is trained writes less "nonsense"?
Lost soul. I thought I understood Bayes, BUT this is trying to get the priors. Normally it is assumed we know them. So wouldn't a little pseudo-code help thick old gits like me? Any help welcome.
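In case it helps, here is a rough R sketch of the simulation approach from the talk (rejection sampling, a.k.a. approximate Bayesian computation); the data, 6 sign-ups out of 16 visitors, are illustrative stand-ins:

```r
n_visitors <- 16   # illustrative data: 16 visitors shown the ad...
n_signups  <- 6    # ...of whom 6 signed up

n_draws <- 100000
prior <- runif(n_draws, 0, 1)               # uniform prior over the sign-up rate
simulated <- rbinom(n_draws,                # simulate data once per prior draw
                    size = n_visitors, prob = prior)
posterior <- prior[simulated == n_signups]  # keep draws that reproduce the data

hist(posterior)   # the "blue distribution" referred to in the notes below
```

The prior isn't assumed known in any deep sense here; it's just whatever distribution you put over the parameter before seeing the data (uniform, in this sketch), and the posterior is what's left after the filtering step.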
Notes for my future revision. Why Bayesian Data Analysis?

0:29 How easy it is to change a Bayesian model while the computation stays the same.

0:32 You have great flexibility when building Bayesian models and can focus on that, rather than on computational (algorithmic) issues. There are often computational (processing) issues in fitting a Bayesian model, but since there is a clean separation between specifying and fitting a model in a Bayesian framework, you often don't have to focus too much on how your model is computed when you construct it. That means that when doing the actual modelling you can focus on what assumptions are reasonable and what information you should use, rather than on the algorithm. There are many tools to help fit Bayesian models (Stan, PyMC), so just specifying the model might be enough.
Notes for my future revision.

16:04 A parameter value that is more likely to generate the data we collected is going to be proportionally more common in this blue distribution. A parameter value that is twice as likely as some other value to generate the data we saw is going to be roughly twice as common in this blue distribution. Parameter values below 0.1 and above 0.8 almost never result in the data we observed.

17:05 Bayesian data analysis is all about representing uncertainty with probabilities. The sign-up rate is still uncertain, but we can use the distribution to answer many questions.

17:52 Translating the histogram to probability, we end up with a probability distribution over the likely sign-up rate.

18:33 The posterior distribution is really the end product of a Bayesian analysis. It contains information from both the model and the data, and it can be used to answer all sorts of questions: the maximum likelihood estimate of the sign-up rate, the posterior mean, the probability of a range of rates, the shortest interval (aka credible interval) that covers 90% of the probability, etc.

19:09 Since we used a uniformly distributed prior, this is also the parameter value that is the most likely to generate the data we observed. In classical statistics, this type of estimate is known as a 'maximum likelihood estimate'. This is why Bayesian data analysis is an extension of maximum likelihood estimation: if you use a flat prior, you get the maximum likelihood estimate for free.
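Continuing the hedged sketch above (the `posterior` vector of accepted draws), the summaries listed in these notes could be computed along these lines; `hdi()` is a hand-rolled helper, not a function from any package:

```r
mean(posterior)                     # posterior mean
quantile(posterior, c(0.05, 0.95))  # a 90% quantile-based credible interval

# Posterior mode, which under the flat prior matches the maximum likelihood
# estimate (roughly 6/16 = 0.375 for the illustrative data):
d <- density(posterior)
d$x[which.max(d$y)]

# The shortest interval covering 90% of the probability (a highest density
# interval), computed naively from the sorted samples:
hdi <- function(samples, prob = 0.9) {
  s <- sort(samples)
  n <- length(s)
  w <- floor(prob * n)                   # window holding ~prob of the samples
  widths <- s[(w + 1):n] - s[1:(n - w)]  # width of every such window
  i <- which.min(widths)                 # narrowest window wins
  c(s[i], s[i + w])
}
hdi(posterior, prob = 0.9)
```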