I LOVE that you've linked to Michael Stevens' video. I'm playing around with predictive language models and I'm really happy you're talking about WORD TOKENS in this video!
Martin, Zipf's law makes me wonder about the value of MI scores. Not that they aren't meaningful, but when you review collocation results for a word, MI seems to have nothing to do with absolute frequency; it's just mutual attraction continuing to exert its pull regardless of frequency. Collocation is a function of context, and it's the frequency of contexts that varies, analogous to the way certain climatic circumstances can promote the health of, say, vegetation and insects. Plug "miserable" into COCA and you get "creature" at rank 15 with an MI of 7.38, after a long line of MIs in the 3.0 range, because "miserable creature" is a construction that occurs on certain rhetorical occasions. Am I overthinking this?
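For anyone curious how an MI score like that 7.38 comes about, here is a minimal sketch of the standard corpus-linguistics formula (MI = log2 of observed over expected co-occurrences within a span). The frequencies below are made-up illustration values, not actual COCA counts; the point is that MI depends on the ratio of observed to expected, not on absolute frequency, which is why a rare collocate can outscore frequent ones.

```python
import math

def mi_score(f_node, f_collocate, observed, corpus_size, span=8):
    """Mutual information for a collocation: log2(observed / expected),
    where expected = chance co-occurrence within the collocation span."""
    expected = f_node * f_collocate * span / corpus_size
    return math.log2(observed / expected)

# Hypothetical counts: even a low observed frequency yields a high MI
# when the expected-by-chance frequency is tiny.
print(mi_score(f_node=5000, f_collocate=3000, observed=120,
               corpus_size=1_000_000_000))  # ≈ 9.97
```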
Hello :) I watched your abralin talk live on Wednesday. I study generative syntax, and I was very inspired by your discussion of negative evidence in the Q&A session! Thank you for all the wonderful videos!
Mathematically, adding "aaa" to each word does not change the distribution. In the real world, languages like that don't exist, though. Speakers would be too lazy to pronounce extra vowels that don't mean anything, and so some of the "aaa"s would disappear very soon.
@@MartinHilpert Ehm, ok, but there's this Asian language where they say something like "Praise God" before every sentence. Also, real language or not, how does it apply?
@@MartinHilpert Btw, how does it not change the distribution? Take an existing text and add "aaa" to the beginning of each word; it wouldn't work, right?
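The invariance claim is easy to check with a toy example (my own sketch, not from the video): prefixing every word only relabels the types, so the sorted frequency counts, and hence the rank-frequency curve, come out identical.

```python
from collections import Counter

text = "the cat sat on the mat and the cat slept".split()
prefixed = ["aaa" + w for w in text]

# Sorted frequency lists are what the Zipf curve is drawn from.
freqs = sorted(Counter(text).values(), reverse=True)
freqs_prefixed = sorted(Counter(prefixed).values(), reverse=True)

# The rank-frequency profile is untouched: only the word labels change.
print(freqs == freqs_prefixed)  # True
```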
Hi, thank you for your wonderful videos. Does this law hold true for words uttered or written by non-native speakers of a language? Or uttered by children before they have mastered the language?
Hey Carolyn! Both L2 language and child language in first language acquisition show Zipfian distributions. Here is an interesting lecture by Nick C. Ellis on Zipf and L2 language use: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-7cKaX57tEXc.html Better video & audio, similar content: www.uttv.ee/naita?id=25911 Here is a study about Zipf and child language: journals.plos.org/plosone/article?id=10.1371/journal.pone.0053227
@@MartinHilpert Thank you so much, Dr. Hilpert! I'm very excited about learning more about this, and I always look forward to your videos 🙂
Hi Martin, thank you for the wonderful and very helpful video. I am applying Zipf's law to my task of creating a dictionary of words that are specific to a particular category. However, I wonder if I could use the curve to determine a threshold for the most significant words for the dictionary? For instance, use the intercept to determine this?
Thanks for that extensive video! It added great value to my master's thesis. Even though I'm dealing with distributions in geographical data, it was a great and easy way to understand Zipf's law.
Thank you for the video :D I'm trying to download the newest version of AntConc on my Mac, but it can't be opened because "Apple cannot check it for malicious software." Also, when I force it to open, there is no way to open files in it. I was wondering whether there are any ways to fix these problems?
It's hard to diagnose these issues from afar, but Laurence Anthony has a great series of tutorials on his webpage: www.laurenceanthony.net/software/antconc/ Good luck!
Hi Järvi! The common way of visualizing Zipf's Law is the scatterplot of rank and frequency with logged axes. I adopted that format in order to match up with other explanations that are out there.
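That log-log visualization boils down to plotting log(frequency) against log(rank). Here is a minimal sketch with made-up, perfectly Zipfian counts (no plotting library, just the transformation plus a slope check; for an ideal Zipf distribution f ∝ 1/r the fitted slope is exactly -1):

```python
import math

# Toy Zipfian frequencies: f(r) = 1000 / r for ranks 1..50
ranks = range(1, 51)
freqs = [1000 / r for r in ranks]

# Log both axes, as in the usual rank-frequency scatterplot.
log_r = [math.log10(r) for r in ranks]
log_f = [math.log10(f) for f in freqs]

# Least-squares slope of log-frequency on log-rank.
n = len(log_r)
mean_r = sum(log_r) / n
mean_f = sum(log_f) / n
slope = (sum((x - mean_r) * (y - mean_f) for x, y in zip(log_r, log_f))
         / sum((x - mean_r) ** 2 for x in log_r))
print(round(slope, 6))  # -1.0
```

Feeding `log_r` and `log_f` to any scatterplot function reproduces the familiar straight-line picture.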