Byte Pair Encoding Tokenization in NLP 

TechViz - The Data Science Guy

#tokenization #transformers #nlp
Tokenization is the process of splitting text into smaller, meaningful lexical units. Byte Pair Encoding (BPE) is a popular subword-based tokenization algorithm used by state-of-the-art NLP models such as RoBERTa, BART, and GPT. In this video, we look at the pros and cons of other tokenization methods and work through BPE with an example.
⏩ OUTLINE:
00:00 - Tokenization in NLP and its types
01:10 - Subword-level Tokenization
01:43 - Byte Pair Encoding (BPE) Algorithm
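For readers who prefer code, the BPE training loop can be sketched roughly as follows. This is a minimal illustration, not the code shown in the video: the function names (`get_pair_counts`, `merge_pair`, `learn_bpe`), the `</w>` end-of-word marker, and the toy word frequencies are all assumptions chosen for the example.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge rules from a word -> frequency mapping."""
    # Start from characters; "</w>" (an assumed convention) marks the word end.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


if __name__ == "__main__":
    # Hypothetical toy corpus, used only for illustration.
    toy_corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    merges, vocab = learn_bpe(toy_corpus, num_merges=3)
    print(merges)
    print(vocab)
```

Each iteration merges the single most frequent adjacent pair, so frequent character sequences gradually become subword units while rare words stay split into smaller pieces. Production tokenizers add details this sketch leaves out, such as byte-level handling and explicit tie-breaking rules.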
Enjoy reading articles? Then consider subscribing to a Medium membership; it is just $5 a month for unlimited access to all free and paid content.
Subscribe now - / membership
*********************************************
If you want to support me financially, which is totally optional and voluntary :) ❤️
You can consider buying me chai (because I don't drink coffee :)) at www.buymeacoffee.com/TechvizC...
*********************************************
⏩ IMPORTANT LINKS
Tokenization methods in NLP: • WordPiece Tokenization...
Research Paper Summaries: • Simple Unsupervised Ke...
*********************************************
⏩ YouTube - / @techvizthedatascienceguy
⏩ LinkedIn - / prakhar21
⏩ Medium - / prakhar.mishra
⏩ GitHub - github.com/prakhar21
*********************************************
⏩ Please feel free to share the content and subscribe to my channel - / @techvizthedatascienceguy
Tools I use for making videos :)
⏩ iPad - tinyurl.com/y39p6pwc
⏩ Apple Pencil - tinyurl.com/y5rk8txn
⏩ GoodNotes - tinyurl.com/y627cfsa
#techviz #datascienceguy #naturallanguageprocessing #machinelearning #ai
About Me:
I am Prakhar Mishra and this channel is my passion project. I am currently pursuing my MS (by research) in Data Science. I have 3+ years of industry work experience in Data Science and Machine Learning, with a particular focus on Natural Language Processing (NLP).

Published: 19 Jun 2024

Comments: 1
@TechVizTheDataScienceGuy, a year ago
Hola! Check out more such interesting videos at bit.ly/3PxB00w