Your content is head and shoulders above the rest on this topic. Kudos! Reward +1000! The music unnerves me as I try to concentrate on the ideas, though... reward -0.1
Running the max() function over the complete priority buffer hinders performance by a substantial amount. I would store the max priority in a variable, compare it with each newly added error, and update the variable when a larger one arrives. That variable can then be used when assigning priorities to new experiences.
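For what it's worth, here is a minimal sketch of that idea: a cached running max that is updated in O(1) on insert instead of rescanning the buffer with max() in O(n). The class and method names are made up for illustration, not from the video's code.

```python
class PriorityBuffer:
    """Toy priority buffer that caches the running max priority (sketch)."""

    def __init__(self):
        self.priorities = []
        self.max_priority = 1.0  # assumed default priority for a fresh buffer

    def add(self, priority):
        self.priorities.append(priority)
        # O(1) update of the cached max, instead of max(self.priorities) in O(n)
        if priority > self.max_priority:
            self.max_priority = priority

    def add_new_experience(self):
        # new experiences get the current max priority so they are sampled soon
        self.add(self.max_priority)
```

One caveat with this shortcut: when the experience holding the max is overwritten or its priority is lowered, the cached value becomes stale (it only ever grows), which is usually an acceptable trade-off in practice.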
What about normalizing priorities to the range 0 to 1? That way we could just set max_priority to 1, and I think it would positively affect performance while keeping it stable. What do you think?
@@julioresende1521 But wouldn't the time complexity of using a segment tree to compute the max from index 0 to N - 1 (the length of the array) be the same as running the max() function over the array?
Hi :) You can save the weights of the neural networks at the end of the training process. In my case, I use the tensorflow.keras library, where the models have a method called save_weights (and a matching load_weights to restore them later).
Hmm, I'm getting an error: AttributeError: 'DoubleDQNAgent' object has no attribute 'sess'. Not sure why, as a lot of the code looks the same as the previous version???