Not related to the video, but have you looked at "Rigging the Lottery: Making All Tickets Winners"? (proceedings.icml.cc/static/paper_files/icml/2020/287-Paper.pdf) It's a recent paper by Google Brain where they train sparse neural networks end to end. Typically this would be done by first training a dense network and then pruning it. Here they instead modify the network topology dynamically during training: at regular intervals they drop the lowest-magnitude active weights and grow the same number of new connections where the gradient magnitude is largest, keeping the overall sparsity level fixed. Since you're interested in sparsity I thought it might be worth mentioning.
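
For the curious, here's a rough numpy sketch of that drop-and-grow mask update as I understand it. The function name and details are mine, purely to illustrate the idea; it's not the paper's actual code:

    import numpy as np

    def drop_and_grow(weights, grads, mask, update_frac=0.3):
        """One drop-and-grow step in the spirit of RigL (illustrative sketch).

        Drops the `update_frac` smallest-magnitude active weights, then grows
        the same number of connections at the inactive positions with the
        largest gradient magnitude, so overall sparsity stays constant.
        """
        active = np.flatnonzero(mask)
        inactive = np.flatnonzero(~mask)
        n_update = int(update_frac * active.size)

        # Drop: deactivate the active weights with the smallest magnitude.
        w_active = np.abs(weights.ravel()[active])
        drop_idx = active[np.argsort(w_active)[:n_update]]

        # Grow: activate the inactive positions with the largest gradient.
        # (In the paper, newly grown weights are initialized to zero.)
        g_inactive = np.abs(grads.ravel()[inactive])
        grow_idx = inactive[np.argsort(g_inactive)[-n_update:]]

        new_mask = mask.ravel().copy()
        new_mask[drop_idx] = False
        new_mask[grow_idx] = True
        return new_mask.reshape(mask.shape)

    # Example: update the mask of a 4x4 weight matrix at ~50% sparsity.
    rng = np.random.default_rng(0)
    w = rng.normal(size=(4, 4))
    g = rng.normal(size=(4, 4))
    m = rng.random((4, 4)) < 0.5
    m2 = drop_and_grow(w, g, m)
    assert m2.sum() == m.sum()  # sparsity level is preserved

The nice consequence is that the network is sparse for the whole run, so you never have to pay the memory/compute cost of the dense model, unlike train-then-prune.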