Thanks a lot! Based on my current understanding, the quickest mental image for me to start understanding BFNs is a standard NN with Bayesian teacher forcing.
What I am not able to fully explain or justify is the accuracy schedule. I understand why an accuracy exists at all, since we chose to work with distributions across the entire workflow. I speculate that increasing the accuracy has to do with the "smoothness"/stability of training: if the sender started by transmitting a very informative message to the receiver, the update step would be very large, propelling the theta parameters to high values. This is just my intuition; it's not explained in the paper.
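
To make the intuition concrete, here is a minimal sketch. It assumes the continuous-data case, where the Bayesian update is the standard conjugate Gaussian one: a belief with mean `mu` and precision `rho`, after receiving a message `y` with accuracy `alpha`, becomes precision `rho + alpha` and mean `(rho * mu + alpha * y) / (rho + alpha)`. The function name, the noiseless message `y = 2.0`, and the accuracy values are all hypothetical choices of mine for illustration, not taken from the paper.

```python
def bayesian_update(mu, rho, y, alpha):
    """Conjugate Gaussian update of a belief N(mu, 1/rho) given a
    message y whose accuracy (precision) is alpha."""
    rho_new = rho + alpha
    mu_new = (rho * mu + alpha * y) / rho_new
    return mu_new, rho_new


# Hypothetical setup: prior belief N(0, 1) and a noiseless message y = 2.0.
mu0, rho0, y = 0.0, 1.0, 2.0

# Case A: a single, very informative message (total accuracy 9).
one_shot_mu, one_shot_rho = bayesian_update(mu0, rho0, y, alpha=9.0)
one_shot_jump = abs(one_shot_mu - mu0)

# Case B: the same total accuracy spread over an increasing schedule.
mu, rho = mu0, rho0
jumps = []
for alpha in [1.0, 2.0, 6.0]:  # assumed schedule, sums to 9
    new_mu, rho = bayesian_update(mu, rho, y, alpha)
    jumps.append(abs(new_mu - mu))  # size of this update step
    mu = new_mu

print("one-shot jump:", one_shot_jump)
print("scheduled jumps:", jumps)
print("final means:", one_shot_mu, mu)
```

Both cases end at the same belief, but with the schedule no single step moves the mean as far as the one-shot update does. That matches my intuition about stability, though it only shows that the update size scales with the accuracy; it does not show that this is the actual reason for the schedule.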