Did you know that half of all English text is made up of only around 135 distinct words?
The website 🔗 to run this experiment on 👇
coding-blocks-archives.github...
When we rank all the words in English by frequency, the Nth-ranked word occurs roughly in proportion to 1/N. Fascinating, right? When creating NLP models, we can predict how big our model's vocabulary will grow based on how much text we feed it to learn from, and that prediction rests on this same phenomenon. The most mind-blowing part: this holds true not just for English but for virtually all languages!
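The rank-frequency pattern is easy to check on any text you like. Here's a minimal Python sketch (the short sample sentence is just a stand-in for a real corpus, where the 1/N pattern shows up much more cleanly):

```python
from collections import Counter

# Any corpus works; this toy sentence just stands in for real text.
text = (
    "the quick brown fox jumps over the lazy dog "
    "the dog barks and the fox runs as the dog sleeps"
)

counts = Counter(text.split())
ranked = counts.most_common()

# Under Zipf's law, rank * frequency stays roughly constant.
for rank, (word, freq) in enumerate(ranked[:5], start=1):
    print(f"{rank:>2}  {word:<8} freq={freq}  rank*freq={rank * freq}")
```

Run it on a book-length text instead of the toy sentence and the product rank × freq flattens out noticeably.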
Find out more about Zipf's Law, and Heaps' Law, which follows from it.
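Heaps' Law is just as easy to sketch: vocabulary size grows roughly like K·nᵝ (with β < 1) as the number of tokens n grows, so new words appear ever more slowly. A minimal Python illustration, again with a toy sentence standing in for a real corpus:

```python
# Track how many distinct words have been seen as tokens stream by.
tokens = (
    "the quick brown fox jumps over the lazy dog "
    "the dog barks and the fox runs as the dog sleeps"
).split()

seen = set()
growth = []  # (tokens read so far, distinct words seen so far)
for i, tok in enumerate(tokens, start=1):
    seen.add(tok)
    growth.append((i, len(seen)))

print(growth[-1])  # on a real corpus, vocabulary grows sublinearly
```

On a large corpus, plotting `growth` on log-log axes gives an almost straight line with slope β.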
17 Aug 2020