This is the University of Utah's undergraduate course on Computer Organization. Instructor: Rajeev Balasubramonian. This video discusses branch delay slots in detail.
I have to say I love your British accent with a hint of Indian as well. A lot less thick than most Indian computer science teachers you can find, and a lot easier to understand.
Nowadays, compile-time branch prediction (with profiling) is usually better than 95% accurate, and run-time branch prediction (one cycle beforehand) is about 97% accurate, according to some manufacturers. So branching code actually suffers from delay slots, as they make the code larger, resulting in both a higher ICache miss rate and more pressure on instruction bandwidth. It's no accident that the only surviving major architectures are those without a delayed branch.
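To put rough numbers on that comparison, here is a back-of-the-envelope sketch. Only the 95% and 97% accuracy figures come from the comment above. The 1-cycle misprediction penalty, the single delay slot, and the 50% slot fill rate are assumptions picked purely for illustration, and the function names are hypothetical.

```python
def stall_cycles_per_branch(accuracy: float, penalty_cycles: float) -> float:
    """Average cycles lost per branch when relying on prediction."""
    return (1.0 - accuracy) * penalty_cycles


def wasted_slots_per_branch(fill_rate: float, slots: int = 1) -> float:
    """Average delay slots per branch that end up holding a nop."""
    return (1.0 - fill_rate) * slots


if __name__ == "__main__":
    # ASSUMPTION: a short pipeline where a mispredict costs 1 cycle.
    penalty = 1.0
    # Accuracy figures quoted in the comment above.
    for label, accuracy in [("compile-time", 0.95), ("run-time", 0.97)]:
        lost = stall_cycles_per_branch(accuracy, penalty)
        print(f"{label} prediction: {lost:.3f} cycles lost per branch")

    # ASSUMPTION: the compiler fills half of all delay slots with useful work.
    wasted = wasted_slots_per_branch(fill_rate=0.5)
    print(f"delay slot: {wasted:.3f} nops per branch")
```

Under these assumed numbers, each unfilled slot costs both a cycle and an extra instruction word in the ICache, which is the code-size point made above. A different fill rate or a deeper pipeline would shift the comparison.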
Perfect branch prediction recognizes that only about 2-3% of branches are actually misbehaving, though we can only tell well ahead of time which direction a branch will go about 90% of the time.