Learnt something today, thanks! I think for the last example from Unlearn.AI, they will still need to test a few real people with a placebo to validate their model's performance. With a proven working model, they can then test mainly with the real drug for side effects, etc.
It's not often you hear a researcher give a high-level talk that regular folks can understand. Great talk, enjoyed it thoroughly. About that $20 though, what's the algo haha
At the moment it often uses UCB (Upper Confidence Bound) to maximise the utility return. But the overall problem is that in a casino the reward is not simply one state; it's far more complex than the simple one-state bandit setting. The casino example is a mere oversimplification.
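For anyone curious what UCB looks like in practice, here's a minimal UCB1 sketch for the simple one-state Bernoulli bandit being discussed (the payout probabilities are made-up numbers, and this is the textbook UCB1 rule rather than anything specific from the talk):

```python
import math
import random

random.seed(0)

# Made-up hidden payout probabilities for three slot machines.
true_probs = [0.1, 0.2, 0.8]
n_arms = len(true_probs)

counts = [0] * n_arms    # times each arm was pulled
rewards = [0.0] * n_arms  # total reward collected per arm

for t in range(1, 2001):
    if 0 in counts:
        # Play every arm once before applying the UCB rule.
        arm = counts.index(0)
    else:
        # UCB1: pick the arm maximising mean reward plus an
        # exploration bonus sqrt(2 ln t / n_i).
        scores = [
            rewards[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])
            for i in range(n_arms)
        ]
        arm = scores.index(max(scores))
    reward = 1.0 if random.random() < true_probs[arm] else 0.0
    counts[arm] += 1
    rewards[arm] += reward

# Over time, pulls concentrate on the best arm (index 2 here),
# while the exploration bonus keeps the others occasionally sampled.
```

The bonus term shrinks as an arm gets pulled more, which is how UCB1 balances exploring uncertain arms against exploiting the current best one.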
Hello Sandeep, thank you for the quick run-through. Would you mind telling us how to connect or discuss with you after this session? As a follow-up: I feel that the Multi-Armed Bandit is a sort of optimisation problem for constraints under which it is quite hard and ineffective to perform A/B testing. Do you agree with that notion? Let me know your thoughts.
You can't use multi-armed bandits in online experimentation because they cause return-user bias. MABs can only be used once per user. The problem is that bandit machines have a fixed probability of payout, whilst a website user's probability of buying something increases over time. This means that if users are switched into a new variation, that new variation is more likely to incur a sale: a flawed experiment!