If you want to support me, the best thing to do is to share the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: www.subscribestar.com/yannickilcher
Patreon: www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
This is already sort of a 'fail', in that the important thing science does is not the relations: the symbolic equation is just icing. The important part is the terms themselves, the properties and operators that you think could describe a system. So even if this is successful, it's basically useless; 99% of the 'work' is already done in deciding that we care about this thing called 'mass', that things have 'mass', etc.
Regarding the explanation at 27:57: they have written that if the likelihood p(y|x) is low, then 1 - p(y|x) will accelerate the gradients. If I am not wrong, this can be read as: if, let's say, the likelihood of the losing side is higher, then the gradient will accelerate towards the other side.
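If what's meant here is a focal-loss-style modulating factor (my guess from the 1 - p(y|x) term; the paper may define it differently), a tiny sketch shows the effect: a low true-class probability blows up the loss and gradient, while confident predictions are damped.

```python
import math

# Hedged sketch (my reading, not the paper's code): (1 - p)^gamma up-weights
# examples where the true-class probability p(y|x) is low, and damps the rest.
def focal_loss(p_true, gamma=2.0):
    return -((1.0 - p_true) ** gamma) * math.log(p_true)

for p in (0.1, 0.5, 0.9):
    print(f"p(y|x) = {p}: loss = {focal_loss(p):.4f}")
# p(y|x) = 0.1: loss = 1.8651   <- "losing side": large gradient pressure
# p(y|x) = 0.5: loss = 0.1733
# p(y|x) = 0.9: loss = 0.0011   <- confident prediction: nearly ignored
```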
I skimmed through the paper and couldn't find the part where they state that the random attention pattern is the same from layer to layer... Are you sure the layers didn't have different patterns for the same batch? Mind pointing to the exact location in the paper where you got this idea? (Sorry for being nit-picky, but this part seems important.)
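To make concrete what I'm asking (my own illustration, not code from the paper), here are the two variants: one random attention pattern reused in every layer versus an independently sampled pattern per layer.

```python
import torch

# Variant A reuses one random sparse attention mask in all layers;
# variant B samples a fresh mask per layer. The question is which one the paper does.
seq_len, n_random, n_layers = 16, 3, 4

def random_mask():
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    for i in range(seq_len):
        mask[i, torch.randperm(seq_len)[:n_random]] = True  # n_random keys per query
    return mask

shared = random_mask()
masks_a = [shared] * n_layers                       # variant A: same pattern everywhere
masks_b = [random_mask() for _ in range(n_layers)]  # variant B: independent per layer

print(torch.equal(masks_a[0], masks_a[1]))  # True
print(torch.equal(masks_b[0], masks_b[1]))  # almost surely False
```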
Who would guess that honesty, and a willingness to say horrible things and call them out as such, would yield the most truth in a self-censoring A.I.? But I would appreciate it if you kept your little filthy fingers away from the things you label "terrible, horrible" or any adjective of similar magnitude.
I actually tried to implement this recently, not knowing it had already been invented. My idea, though, was to put constraints on Z and then do more training steps to get the latent representation.
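Roughly what I had in mind, as a sketch (all names are mine; `dec` stands for any frozen decoder): skip the encoder and run extra gradient steps on z itself, projecting z back onto a norm ball after each step as the constraint.

```python
import torch

def infer_latent(dec, x, z_dim=32, steps=200, lr=1e-2, max_norm=1.0):
    # Optimize the latent code z directly against a reconstruction loss.
    z = torch.zeros(1, z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(dec(z), x)  # reconstruction error only
        loss.backward()
        opt.step()
        with torch.no_grad():
            n = z.norm()
            if n > max_norm:          # the constraint on Z: project back onto the ball
                z.mul_(max_norm / n)
    return z.detach()
```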
I am new to the AI field (studying deep learning), but I am not new to the --isms debates; I studied intellectual history back in undergrad. [For those interested: from the history-of-philosophy perspective, there is the rise of postmodernism, and from the philosophy-of-science perspective, there is Thomas Kuhn to start with.] I look at this tweet exchange and I see two sides arguing from different mindsets, motives, and goals. One side is trying to discover new scientific truths (if you're a scientific realist); the other side is trying to shift power balances. I do think both goals are important. But, in my simplified assessment, this is at least part of the reason why the benefit of public discourse seems to be at an all-time low (some days).
Thanks for the explanation! So the Q-function basically percolates from the near-end-game moves, whose rewards are easier to learn, and gradually works its way from the back to the beginning?
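That intuition is easy to see on a toy problem (my own sketch, not from the video): a 5-state chain where only the last step pays reward 1. After one sweep only the move next to the goal has value; further sweeps percolate it back toward the start.

```python
import numpy as np

n_states, gamma, alpha = 5, 0.9, 0.5
Q = np.zeros(n_states - 1)  # one action ("step right") per non-terminal state

for sweep in range(1, 21):
    for s in range(n_states - 1):
        r = 1.0 if s == n_states - 2 else 0.0               # reward at the end only
        next_v = Q[s + 1] if s + 1 < n_states - 1 else 0.0  # terminal value is 0
        Q[s] += alpha * (r + gamma * next_v - Q[s])         # TD backup
    if sweep in (1, 5, 20):
        print(sweep, np.round(Q, 3))
# sweep 1:  [0.     0.     0.     0.5 ]     only the end-game move has value
# sweep 20: [~0.729 ~0.81  ~0.9   ~1.0]    value has propagated back to the start
```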
This paper should be called "How to implement A* in the most expensive way possible". I would be surprised if a transformer network (which, remember people, is just a bunch of stacked NNs with backprop goodness) couldn't learn it. Transformers should be able, if massaged and trained correctly, to do basically any type of ML curve fitting we've discovered, provided you can shove it into their context window (which is just a big vector that gets shoved through a bunch of NNs). Learning A*, a very short algorithm, plus some data processing, seems very reasonable.
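For reference, here's how short A* actually is: a minimal 4-connected grid version (my own sketch; the paper's exact setup may differ).

```python
import heapq

def astar(grid, start, goal):
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    frontier = [(h(start), 0, start)]  # (f = g + h, g, node)
    best_g = {start: 0}
    came_from = {}
    while frontier:
        f, g, node = heapq.heappop(frontier)
        if node == goal:               # reconstruct the path back to the start
            path = [node]
            while node in came_from:
                node = came_from[node]
                path.append(node)
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nb = (node[0] + dr, node[1] + dc)
            if (0 <= nb[0] < len(grid) and 0 <= nb[1] < len(grid[0])
                    and grid[nb[0]][nb[1]] == 0):
                ng = g + 1
                if ng < best_g.get(nb, float("inf")):
                    best_g[nb] = ng
                    came_from[nb] = node
                    heapq.heappush(frontier, (ng + h(nb), ng, nb))
    return None

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]  # 1 = wall
print(astar(grid, (0, 0), (2, 0)))
# [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0)]
```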
I am enrolled in a master's course in AI and I have to read a lot of research papers like these as new ones come out, and this channel has the best, simplest paper-explanation videos out there. Also, I completely disregarded all the hints about the authors of the paper; I don't know who wrote it. 🤫
It's almost like Hafner et al. watched your video and built v3 to rectify your criticisms. Transferability to other problems: check. Fewer hyperparameters: check. A more generalizable loss function: check. Would really love to see a video like this going over v3. I've been having a hell of a time wrapping my head around it, but this video is still helping a ton. Thanks Yannic!!