Did you know that half of all English text is made up of only around 135 distinct words?
The website 🔗 to run this experiment on 👇
coding-blocks-archives.github...
When we rank all the words in English by frequency, the Nth-ranked word occurs roughly in proportion to 1/N. Fascinating, right? When creating NLP models, we can predict how big our model's vocabulary will grow based on how much text we feed it to learn from, and that prediction rests on this same phenomenon. The most mind-blowing part: this holds true not just for English but for virtually all languages!
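The rank-frequency pattern is easy to check on any text you like. Here's a minimal Python sketch (the short sample sentence is just a stand-in for a real corpus, where the 1/N pattern shows up much more cleanly):

```python
from collections import Counter

# Any corpus works; this toy sentence just stands in for real text.
text = (
    "the quick brown fox jumps over the lazy dog "
    "the dog barks and the fox runs as the dog sleeps"
)

counts = Counter(text.split())
ranked = counts.most_common()

# Under Zipf's law, rank * frequency stays roughly constant.
for rank, (word, freq) in enumerate(ranked[:5], start=1):
    print(f"{rank:>2}  {word:<8} freq={freq}  rank*freq={rank * freq}")
```

Run it on a book-length text instead of the toy sentence and the product rank × freq flattens out noticeably.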
Find out more about Zipf's Law, and Heaps' Law, which follows from it.
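Heaps' Law is just as easy to sketch: vocabulary size grows roughly like K·nᵝ (with β < 1) as the number of tokens n grows, so new words appear ever more slowly. A minimal Python illustration, again with a toy sentence standing in for a real corpus:

```python
# Track how many distinct words have been seen as tokens stream by.
tokens = (
    "the quick brown fox jumps over the lazy dog "
    "the dog barks and the fox runs as the dog sleeps"
).split()

seen = set()
growth = []  # (tokens read so far, distinct words seen so far)
for i, tok in enumerate(tokens, start=1):
    seen.add(tok)
    growth.append((i, len(seen)))

print(growth[-1])  # on a real corpus, vocabulary grows sublinearly
```

On a large corpus, plotting `growth` on log-log axes gives an almost straight line with slope β.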
17 Aug 2020