How to efficiently perform part of speech tagging! Part of Speech Tagging Video : • Part of Speech Tagging... Hidden Markov Model Video : • Hidden Markov Model : ... My Patreon : www.patreon.co...
Mate, I just had a lecture on Viterbi in an NLP context at uni and I was nearly having a breakdown due to all the smart formulas our teacher gave us. It was impossible for me to understand it from the lecture. But you have shown it and explained it so clearly, I am amazed and shocked at the same time. You are a legend! Please carry on with the videos, you are saving and changing lives with this
This literally has to be the best resource for understanding the Viterbi algorithm across the whole of the internet. Even the inventor themselves wouldn't have been able to explain it better!!! THANKS A TON
I had the same doubt you had at around 13:30, but you cleared it up without causing any confusion!!! Awesome explanation!!! Hopefully your channel becomes more popular!! cheers and good day ahead!!
Exactly. I couldn't be convinced when I was told Viterbi isn't greedy, but it makes sense now. Essentially, there's a big difference between argmax of the next connection, and argmax of the cumulative previous connections.
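To see that difference concretely, here is a tiny Python sketch (the two states and all the numbers are made up for illustration, not taken from the video): greedy takes the argmax of the next connection one step at a time, while the full argmax over cumulative path probabilities can end up choosing a different first state.

from itertools import product

# Toy HMM with two states and two observations; numbers are illustrative only.
states = ["A", "B"]
start = {"A": 0.6, "B": 0.4}
trans = {("A", "A"): 0.2, ("A", "B"): 0.8,
         ("B", "A"): 0.9, ("B", "B"): 0.1}
emit = [{"A": 0.5, "B": 0.5},   # emission probabilities for observation 1
        {"A": 0.9, "B": 0.1}]   # emission probabilities for observation 2

def path_prob(path):
    # Cumulative probability of a complete state sequence.
    p = start[path[0]] * emit[0][path[0]]
    for t in range(1, len(path)):
        p *= trans[(path[t - 1], path[t])] * emit[t][path[t]]
    return p

# Greedy: argmax of the next connection, committing one step at a time.
greedy = [max(states, key=lambda s: start[s] * emit[0][s])]
for t in range(1, len(emit)):
    prev = greedy[-1]
    greedy.append(max(states, key=lambda s: trans[(prev, s)] * emit[t][s]))

# Argmax over the cumulative probability of every full path
# (what Viterbi computes, just done by brute force here).
best = max(product(states, repeat=len(emit)), key=path_prob)

print("greedy path:", greedy, path_prob(greedy))    # ['A', 'A'] 0.054
print("best path:  ", list(best), path_prob(best))  # ['B', 'A'] ~0.162

With these numbers, greedy commits to state A after the first observation and ends up with a path of probability 0.054, while the globally best path goes through B first and has probability 0.162.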
The proof of the Dijkstra invariants is very similar to how you would prove the corresponding statement for the Viterbi algorithm, in case you're interested in the exact proof!
Very well explained, I actually came from the Coursera NLP specialization since I had many doubts over there, but after watching this, everything is clear.
Same here. My expectations were high with that course given the quality of the deep learning specialization. But I'm kinda disappointed so far. I've been learning much more from materials like this over the internet.
Nicely explained, definitely. One thing I could imagine helping a bit would be to show that in the example we're technically always considering DT, NN and VB, but since the emission probabilities are only non-zero for certain words in the sentence, only the nodes with non-zero emission probabilities are written down. Technically, from the start, for example, we would need to consider whether the first word is a definite article, a noun or a verb. But the emission probabilities for the starting word are 0 for both verbs and nouns, so they could never be part of the path leading to the maximum probability. In the next step only NN and VB are written down, because DT has a 0 emission probability for "fans", and so on.
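To make that concrete, the first column of the trellis written out symbolically (using the video's tags, but no specific numbers) looks like this:

v1(DT) = P(DT | start) * P("The" | DT)
v1(NN) = P(NN | start) * P("The" | NN) = P(NN | start) * 0 = 0
v1(VB) = P(VB | start) * P("The" | VB) = P(VB | start) * 0 = 0

Any extension of a zero entry stays zero, since it only ever gets multiplied by further probabilities, so NN and VB can never lie on the maximum-probability path for the first word and are simply not drawn. The same thing then happens with DT at "fans".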
I'm not gonna say I can solve all the problems of the Viterbi algorithm from now on, but I can say I have a clear concept after watching this, thank you sir....
The teacher does an excellent job of explaining the Viterbi Algorithm and providing a clear example. It’s always great to see educators who are passionate about their subject and able to convey complex concepts in an understandable way.
Super helpful explanation on exactly when you can discontinue the candidate paths. I've seen a few explanations of that point and this one is definitely the clearest
Dude this is awesome. I came here because I did not understand the explanation in a Coursera course. No offense to them, but you did a great job. Thank you.
The Viterbi algorithm was well explained. The Hidden Markov Model video was a bit difficult to follow because the contrast on the whiteboard was low. I like the way you explain, keeping the formal clutter out of the way.
Hey, thanks for the video - it's super helpful - just one question. When you branched off the start node you only considered DT as the possible state, but isn't there also a 0.2 probability that the first state is an NN?
Great video, you are a great explainer. One note: are you sure that the reason Viterbi is fast (O(L*P^2) rather than O(P^L)) is that you can discard paths (13:49)? It seems to me that the generic Viterbi formulation does not discard any paths (judging from the pseudocode on Wikipedia); rather, its efficiency comes from the very nature of a dynamic program, where the algorithm builds on previous work in a smart way (overlapping subproblems etc.). As you yourself say at the very end (19:57), you look at all nodes in the previous layer at each step. At each layer there are P nodes and at each node there are P options, which, repeated L times, means there are L*P^2 operations to do. So I guess it's not even necessary for Viterbi to prune paths to reach that good of a runtime.
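For what it's worth, here is a sketch of that un-pruned, textbook-style formulation in Python (state names, words and probabilities are purely illustrative, not the video's numbers). The nesting over the L observations, the P current states and the P previous states is exactly where the L*P^2 count comes from, with no path discarding needed:

def viterbi(obs, states, start_p, trans_p, emit_p):
    # v[t][s] = probability of the best path ending in state s at time t
    v = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]

    for t in range(1, len(obs)):                     # L - 1 steps
        v.append({})
        back.append({})
        for s in states:                             # P current states
            best_prev = max(states,                  # P previous states
                            key=lambda p: v[t - 1][p] * trans_p[p][s])
            v[t][s] = v[t - 1][best_prev] * trans_p[best_prev][s] * emit_p[s][obs[t]]
            back[t][s] = best_prev

    # Follow the backpointers from the best final state.
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path, v[-1][last]

# Tiny usage example with made-up numbers:
states = ("NN", "VB")
start_p = {"NN": 0.6, "VB": 0.4}
trans_p = {"NN": {"NN": 0.3, "VB": 0.7}, "VB": {"NN": 0.8, "VB": 0.2}}
emit_p = {"NN": {"fans": 0.5, "watch": 0.1}, "VB": {"fans": 0.2, "watch": 0.9}}
print(viterbi(["fans", "watch"], states, start_p, trans_p, emit_p))  # (['NN', 'VB'], 0.189)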
Such a helpful video! It really helped me a lot. I just have one suggestion: instead of the green marker, try a darker marker like brown. The green shines a little too much.
Excellent video, I like the way you explain complex things in a very understandable way. Would you please continue with a video on Maximum Entropy Markov Models?
Hey, thank you so much for sharing all of these helpful videos with us. I really appreciate it! I can see you explained the decoding algorithm for HMMs. Could you also explain the evaluation and learning algorithms?
Great video! I think this actually helped me to better understand a different algorithm called PELT (for changepoint detection). Still, I am not 100% sure about PELT, so if you would cover it in a different video I would be very grateful☺❤
My question is: here the word "The" has "DT" as its only possible part of speech. What if your sentence had started with, say, "Fans", which has more than one possible part of speech? The Viterbi algorithm will always pick the part of speech with the maximum probability (and it will always be the same no matter what the sentence is). Wouldn't that be wrong?
Thank you so much for posting this awesome tutorial video. It really helped me understand the algorithm. Can I ask a question? We have two probability matrices in the example. In reality, when we have a sequence data set, do we use the transition and emission probabilities produced by a model trained with the EM algorithm, or the probabilities we can calculate from the empirical data?