
Lesson 2: Byte Pair Encoding in AI Explained with a Spreadsheet 

Spreadsheets are all you need
2.1K subscribers
8K views

In this tutorial, we delve into the concept of Byte Pair Encoding (BPE) used in AI language processing, employing a practical and accessible tool: the spreadsheet.
This video is part of our series that aims to simplify complex AI concepts using spreadsheets. If you can read a spreadsheet, you can understand the inner workings of modern artificial intelligence.
🧠 Who Should Watch:
- Individuals interested in AI and natural language processing.
- Students and educators in computer science.
- Anyone seeking to understand how AI processes language.
🤖 What You'll Learn:
Tokenization Basics: An introduction to how tokenization works in language models like ChatGPT.
Byte Pair Encoding (BPE): A detailed walkthrough of the BPE algorithm, including its learning phase and its application to tokenizing language data (see the code sketch below).
Spreadsheet Simulation: A hands-on demonstration of GPT-2's tokenization process via a spreadsheet model.
Limitations and Alternatives: A discussion of the challenges of BPE and a look at other tokenization methods.
🔗 Resources:
Learn more and download the Excel sheet at spreadsheets-a...
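For readers who prefer code to spreadsheet formulas, here is a minimal Python sketch of the tokenization step the spreadsheet simulates: applying an already-learned merge table to a word, always merging the earliest-learned (lowest-rank) adjacent pair first, which is how GPT-2's tokenizer applies its merges. The `toy_merges` table below is made up for illustration, not GPT-2's real merges.txt, and the sketch omits GPT-2's byte-level handling and regex pre-splitting.

```python
def bpe_tokenize(word, merge_ranks):
    """Apply learned BPE merges to one word, lowest-rank pair first."""
    symbols = list(word)
    while len(symbols) > 1:
        # Find the adjacent pair that was learned earliest (lowest rank).
        pairs = set(zip(symbols, symbols[1:]))
        best = min(pairs, key=lambda p: merge_ranks.get(p, float("inf")))
        if best not in merge_ranks:
            break  # no mergeable pair left
        # Merge every occurrence of that pair.
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols

# Toy merge table: pair -> rank (the order in which it was learned).
toy_merges = {("l", "o"): 0, ("lo", "w"): 1, ("e", "r"): 2}
print(bpe_tokenize("lower", toy_merges))  # ['low', 'er']
```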

Published: 21 Sep 2024

Comments: 18
@somdubey5436 · 2 months ago
Seeing this video made me feel I'm not even close to being able to say I know Excel. Your understanding of the concepts is really, really deep, since implementing something like GPT-2 in Excel requires a thorough understanding of all of them. Hats off to you.
@michaelmalone7614 · 6 months ago
The quality of this video has blown my mind! Really great stuff and looking forward to your next video. Thank you!
@lewisbentley3177 · 6 months ago
This is amazing! Thank you for making this. Really looking forward to your next video about text and position embeddings
@paulwillis88 · 6 months ago
This is an incredible video. The way you describe these advanced AI concepts is awesome. I'd love to see more, maybe on the GAN technology that Sora uses
@fastler2000 · 5 months ago
Incredibly brilliant. Words fail me. Thank you for sharing this; it helps me enormously in understanding AI. On a side note: the way you handle Excel is ingenious. Unbelievable!
@Cal1fax · 6 months ago
I learned a ton from your video
@rpraver1 · 8 months ago
Very good explanation. Are you going to go into positional embeddings?
@Spreadsheetsareallyouneed · 7 months ago
Thank you! Yes, I just haven't had time to get around to it yet. Embeddings will be the next video. Not sure if I'll do token and positional embeddings in the same video or break it up into two parts.
@jjokela · 6 months ago
Really cool stuff; it really helps me understand how BPE works. Looking forward to your follow-up videos!
@BryanSeigneur0 · 6 months ago
So clear! Great instruction!
@ricp · 6 months ago
great explanations
@MStrong95 · 6 months ago
Large language models, and AI in general, seem to do a good job of compressing the input and then turning it back into an approximation. Is this a byproduct of neural networks in general, or just of specific subsets? Could you make a large language model, or a set of purpose-built AIs, that suit various compression situations and more often than not perform better than current compression algorithms?
@defface777 · 6 months ago
Very cool! Thanks
@kennrich213 · 6 months ago
Great video. How exactly are the scores calculated from the possible pairs? You said a standard filter? Could you explain more?
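In standard BPE training (the exact spreadsheet filter used in the video may differ), the score of a candidate pair is simply how many times that adjacent pair occurs across the corpus, with each word's pairs weighted by the word's frequency; the highest-scoring pair becomes the next learned merge. A toy sketch of that counting step, with a made-up corpus:

```python
from collections import Counter

# Toy corpus, already split into symbols; values are word frequencies.
corpus = {("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2, ("n", "e", "w"): 3}

pair_scores = Counter()
for symbols, freq in corpus.items():
    for pair in zip(symbols, symbols[1:]):
        pair_scores[pair] += freq  # score = total occurrences of the pair

# ('l', 'o') and ('o', 'w') both score 7, so one of them is merged next.
print(pair_scores.most_common(3))
```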
@JohnSmith-he5xg · 5 months ago
Why does having a large embedding table matter? Can't it just be treated as a lookup into the table (which should be pretty manageable regardless of size)? Do we really have to perform the actual matrix multiply?
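Mathematically, multiplying a one-hot token vector by the embedding table just selects one row, so a direct row lookup gives an identical result, and that is what most implementations do; a spreadsheet model may spell it out as an explicit matrix multiply to keep the linear-algebra view visible. The table's size still matters for memory and parameter count, even if the forward pass is only a lookup. A small NumPy check of the equivalence, using toy sizes and a random table:

```python
import numpy as np

vocab_size, d_model = 8, 4
rng = np.random.default_rng(0)
embedding = rng.normal(size=(vocab_size, d_model))  # toy embedding table

token_id = 3
one_hot = np.zeros(vocab_size)
one_hot[token_id] = 1.0

via_matmul = one_hot @ embedding   # one-hot times the table...
via_lookup = embedding[token_id]   # ...picks out exactly this row

print(np.allclose(via_matmul, via_lookup))  # True
```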
@lordadamson · 6 months ago
amazing work. please keep going :D
@magickpalms4025 · 6 months ago
Why did they include Reddit usernames in the training data? Considering the early days, when people would have extreme/offensive names as a meme, that is just asking for trouble.
@ameliexang7543 · 9 months ago
promo sm