A next-generation kind of school where Machine Intelligence meets Human Creativity. For game changers, entrepreneurs, and risk-takers building an amazing future.
I’m based in Milan; any chance any events will take place in Milan as well? I’m working on a computational environment for agents. It’s already in production to some degree.
I notice that Yasmin Moslem is the only person without a photo. Is being photographed for public display against her religious beliefs? Does she also not appear in public in person, or is she simply a virtual mentor who doesn't exist in the real world?
She is a human, a researcher, but she does not want to appear. We understand it might seem strange for a person in 2023, but we respect and support any such choice.
What these systems don't do, I suppose, is recognize text from images/scans, which translators work with a lot. Accurate text extraction from the source is often required before the actual translation can be performed, unless this software is able not only to extract text but also to correct all the errors usually associated with OCR. And the more exotic the writing system, the more challenging the extraction task.
These aren't multimodal models, as they were trained only on text data (specifically for Machine Translation), so they can't be used to extract text from images. What you're probably looking for are models trained for Document Understanding tasks, such as the one we covered in the last session of School of AI (ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-gXSFE0TznGM.html&ab_channel=PiSchool)
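In practice that usually means a two-stage pipeline: OCR first, then text-only MT on the extracted text. A minimal sketch, assuming pytesseract for OCR and a Hugging Face translation pipeline (both are illustrative tool choices, not anything covered in the session):

```python
# Hypothetical two-stage pipeline: OCR, then text-only machine translation.
# Assumes the Tesseract binary is installed and the pytesseract, Pillow,
# and transformers packages are available.
from PIL import Image
import pytesseract
from transformers import pipeline

def translate_scan(image_path: str) -> str:
    # Stage 1: extract text from the scanned page (OCR errors included).
    source_text = pytesseract.image_to_string(Image.open(image_path))
    # Stage 2: feed the extracted text to a text-only MT model.
    # The MT model cannot see the image or reliably fix OCR mistakes.
    translator = pipeline("translation_en_to_fr", model="t5-small")
    return translator(source_text)[0]["translation_text"]

print(translate_scan("scanned_page.png"))  # hypothetical input file
```

The point of the split is exactly the limitation raised above: the translation stage only ever sees whatever text, errors and all, the extraction stage produces.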
Strange thing: he mentions the term "attention" before explaining what it is. What is the EXACT meaning of this Query/Key/Value magic??? I suspect some speakers just copy other people's thoughts mechanically, without understanding the real meaning of the operations!
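For the record, the exact operation from the "Attention Is All You Need" paper is scaled dot-product attention: Attention(Q, K, V) = softmax(Q · Kᵀ / sqrt(d_k)) · V. Each query is compared against all keys, the similarity scores are normalized into weights, and those weights take a weighted average of the values. (See the NumPy sketch further down in the thread for what Q, K, and V actually are.)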
...comes from Google - Check. ...TensorFlow T-shirt - Check. Most viewers therefore rate this lecture highly - Check. This is very hand-wavy throughout, with virtually no rigor shown. There are many lectures/presentations online that actually explain the nuts and bolts and wider use cases of attention mechanisms. Maybe the title of this video should be something else, like "Our group's success with one use case (language translation) of Attention." Frankly, the drive-by treatment of the technical details of the language translation case was almost terrible and should probably have been omitted.
Hey! I just found your channel and subscribed, love what you're doing! I appreciate how clear and detailed your explanations are as well as the depth of knowledge you have surrounding the topic! Since I run a tech education channel as well, I love to see fellow Content Creators sharing, educating, and inspiring a large global audience. I wish you the best of luck on your RU-vid Journey, can't wait to see you succeed! Your content really stands out and you've put so much thought into your videos! Cheers, happy holidays, and keep up the great work!
The most I gather from this talk is that "attention" is a pretty terrible term. Something like "fuzzy lookup" or "matching" or "mapping" would have been much more descriptive, but oh well, what researcher thinks about terminology before unleashing it on the world?
In the ViT paper, it is clearly stated that a "small" dataset like ImageNet doesn't show promising results, but a larger dataset like JFT gives amazing results. So this may be a start, but it is far from perfect. Btw, I am not contradicting your statement. 😁 Also, JFT is not an open-source dataset (yet).
K is a matrix representing the T previously seen words and V is the matrix representing the full dictionary of words of the target language, right? But what are K and V exactly? What values do these matrices hold? Are they learned?
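For what it's worth, in the Transformer as published, K and V are not a dictionary of the target language: Q, K, and V are all linear projections of the (same) input embeddings, and the projection matrices are learned by gradient descent. A minimal single-head sketch in NumPy (shapes and names follow the paper; the random weights stand in for learned parameters):

```python
# Minimal single-head scaled dot-product attention in NumPy, illustrating
# that K and V are learned projections of the input embeddings.
import numpy as np

n, d_model, d_k = 5, 16, 8           # 5 input tokens, model width 16
X = np.random.randn(n, d_model)      # input token embeddings

# W_Q, W_K, W_V are trainable parameters; random here for illustration.
W_Q = np.random.randn(d_model, d_k)
W_K = np.random.randn(d_model, d_k)
W_V = np.random.randn(d_model, d_k)

Q, K, V = X @ W_Q, X @ W_K, X @ W_V  # queries, keys, values

scores = Q @ K.T / np.sqrt(d_k)      # (n, n): each query vs. each key
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
output = weights @ V                 # weighted mix of values
print(output.shape)                  # (5, 8)
```

So the values these matrices hold are just learned, continuous feature vectors, one row per token, not word identities.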
Great session, Pi School 📚 and Łukasz 😀. Here are a few key concepts I "attended" to and found interesting:

"There is always another head that, whatever word it is on, looks at the (head) noun of the sentence or the (head) verb; it just wants to know what we are talking about here 🤔" - Łukasz Kaiser ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-rBCqOTEfxvg.html

"n^2 * d seems worse than n * d^2 (on attention vs. recurrent algorithmic/operation complexity). Luckily, at Google there is a guy named Noam (Shazeer); he never got his bachelor's but wrote most of these papers..." - Łukasz Kaiser ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-rBCqOTEfxvg.html

"We trained a model to translate from English to French and vice versa, and from English to German and vice versa. Then if you give it French and ask for a German translation, it will do a reasonable job. Multitask helps with deep learning tasks where you have little data." - Łukasz Kaiser ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-rBCqOTEfxvg.html
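To make the n^2 * d vs. n * d^2 comparison concrete (illustrative numbers: d = 512 is the base model width from the Transformer paper; the 70-token sentence length is made up):

```python
# Back-of-the-envelope per-layer operation counts for self-attention
# versus a recurrent layer, using the complexities quoted above.
n, d = 70, 512                   # sequence length, model width
self_attention_ops = n**2 * d    # 70^2 * 512
recurrent_ops = n * d**2         # 70 * 512^2
print(self_attention_ops)        # 2,508,800
print(recurrent_ops)             # 18,350,080
# Self-attention is cheaper whenever n < d, i.e. for typical sentences.
```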