This video explains the Byte Pair Encoding (BPE) tokenization algorithm: how it is trained on a text corpus and how it is applied to tokenize text.
This video is part of the Hugging Face course: huggingface.co/course
Related videos:
- Unigram Tokenization
- WordPiece Tokenization
Don't have a Hugging Face account? Join now: huggingface.co/join
Have a question? Check out the forums: discuss.huggingface.co/c/cour...
Subscribe to our newsletter: huggingface.curated.co/
Nov 14, 2021