
Lesson 2: Byte Pair Encoding in AI Explained with a Spreadsheet 

Spreadsheets are all you need
2K subscribers
8K views

In this tutorial, we delve into the concept of Byte Pair Encoding (BPE) used in AI language processing, employing a practical and accessible tool: the spreadsheet.
This video is part of our series that aims to simplify complex AI concepts using spreadsheets. If you can read a spreadsheet, you can understand the inner workings of modern artificial intelligence.
🧠 Who Should Watch:
- Individuals interested in AI and natural language processing.
- Students and educators in computer science.
- Anyone seeking to understand how AI processes language.
🤖 What You'll Learn:
Tokenization Basics: An introduction to how tokenization works in language models like ChatGPT.
Byte Pair Encoding (BPE): Detailed walkthrough of the BPE algorithm, including its learning phase and application in language data tokenization.
Spreadsheet Simulation: A hands-on demonstration of GPT-2's tokenization process via a spreadsheet model.
Limitations and Alternatives: Discussion on the challenges of BPE and a look at other tokenization methods.
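The BPE learning phase covered in the video can be sketched in a few lines of Python. This is an illustrative toy, not the GPT-2 implementation (GPT-2 operates on bytes and uses pre-tokenization rules); the corpus and merge count below are made up:

```python
from collections import Counter

def learn_bpe(words, num_merges):
    """Learn BPE merges: repeatedly merge the most frequent adjacent symbol pair."""
    # Represent each word as a tuple of symbols (initially single characters).
    vocab = Counter()
    for w in words:
        vocab[tuple(w)] += 1

    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair across the (frequency-weighted) corpus.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair wins
        merges.append(best)
        # Rewrite every word, replacing the best pair with its merged symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

merges = learn_bpe(["low", "low", "lower", "newest", "newest", "newest"], 4)
# The first merge is ('w', 'e'): that pair occurs 4 times in the toy corpus.
print(merges)
```

The learned merge list is exactly what a trained tokenizer replays at inference time: to tokenize new text, you apply the same merges in the same order.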
🔗 Resources:
Learn more and download the Excel sheet at spreadsheets-are-all-you-need...

Science

Published: 26 Nov 2023

Comments: 17
@michaelmalone7614 3 months ago
The quality of this video has blown my mind! Really great stuff and looking forward to your next video. Thank you!
@lewisbentley3177 3 months ago
This is amazing! Thank you for making this. Really looking forward to your next video about text and position embeddings
@paulwillis88 3 months ago
This is an incredible video. The way you describe these advanced AI concepts is awesome. I'd love to see more, maybe on the GAN technology that Sora uses
@Cal1fax 4 months ago
I learned a ton from your video
@fastler2000 2 months ago
Incredibly brilliant. Words fail me. Thank you for sharing this, it helps me enormously in understanding AI. On a side note: how did you get this good at Excel? Unbelievable!
@BryanSeigneur0 3 months ago
So clear! Great instruction!
@jjokela 3 months ago
Really cool stuff, it really helps me to understand how the BPE works. Looking forward to your follow-up videos!
@ricp 3 months ago
great explanations
@defface777 3 months ago
Very cool! Thanks
@rpraver1 5 months ago
Very good explanation, are you going to go into positional embedding?
@Spreadsheetsareallyouneed 4 months ago
thank you! yes just haven't had time to get around to it yet. Embeddings will be the next video. Not sure if I'll do token and positional embeddings in the same video or will break it up into two parts.
@MStrong95 3 months ago
Large language models and AI in general seem to do a good job of compressing input and then turning it back into an approximation of that input. Is this a byproduct of neural networks in general or just specific subsets? Could you make a large language model, or a lot of purpose-built AIs, suited to various compression situations that more often than not perform better than current compression algorithms?
@kennrich213 3 months ago
Great video. How exactly are the scores calculated from the possible pairs? You said a standard filter? Could you explain more?
@lordadamson 3 months ago
amazing work. please keep going :D
@JohnSmith-he5xg 2 months ago
Why does having a large Embedding Table matter? Can't it just be treated as a lookup into the Table (which should be pretty manageable regardless of size)? Do we really have to perform the actual matrix multiply?
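The question above has a tidy numerical answer: multiplying a one-hot row vector by the embedding matrix is mathematically identical to indexing one row of it, which is why real implementations use a lookup rather than a matrix multiply. A quick check with NumPy (table dimensions are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 10, 4
E = rng.normal(size=(vocab_size, d_model))  # toy embedding table

token_id = 7
one_hot = np.zeros(vocab_size)
one_hot[token_id] = 1.0

# The matrix multiply selects exactly one row: same result as plain indexing.
assert np.allclose(one_hot @ E, E[token_id])
```

The spreadsheet performs the multiply to make the linear-algebra view visible; a library like PyTorch instead does the equivalent row lookup.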
@magickpalms4025 3 months ago
Why did they include reddit usernames in the training? Considering the early days where people would have extreme/offensive names as a meme, that is just asking for trouble.
@ameliexang7543 6 months ago
promo sm