Hello! I have some sort of bug when I train my model. I have a GPU with 4 GB of VRAM and I get an out-of-memory error, even when I reduce the batch size and use the nano YOLO version. So I moved to CPU (24 GB of RAM), but after 2-3 epochs the training stops randomly without any warning. Any idea why this happens?
@@TheCodingBug Regarding the CUDA error: is there a gradient accumulation argument I can use? I don't think training with a single sample per optimizer step is fine. The problem with the sudden stopping is that it doesn't yield any kind of error or warning. The last thing I see in the console is the epoch progress, and then the path of the program on the next line :). Some said it might be from lack of memory, but 24 GB should be more than enough for such a small YOLO. Maybe I'll go to Colab, even though I hate it.