You can turn artificial neural networks inside-out: use fixed dot products (weighted sums) and adjustable (parametric) activation functions. The fixed dot products can be computed very quickly using fast transforms like the FFT, and the overall number of parameters is vastly reduced, since only the activation functions are trained. Each dot product of the transform acts as a statistical summary measure of its input, which keeps the layer responses well behaved. See Fast Transform (fixed filter bank) neural networks.

Because the dot products are statistical summaries, only weak optimisers are needed. You can use sparse mutations and evolution. The workload can then be split between GPUs very easily, with little data movement needed during training. See Continuous Gray Code Optimization.
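A rough sketch of the idea, under my own assumptions: here the fixed transform is the fast Walsh-Hadamard transform (a common stand-in for the FFT in this construction), the parametric activation is a two-slope function (separate trainable slopes for positive and negative inputs), and the optimiser is plain sparse Gaussian mutation rather than Continuous Gray Code Optimization itself. The names `FixedTransformLayer` and `mutate` are hypothetical, not from any particular library.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform, length must be a power of 2.
    These are the fixed dot products: n*log(n) operations, zero weights."""
    x = x.copy()
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x / np.sqrt(n)  # orthonormal scaling

class FixedTransformLayer:
    """Sketch of an 'inside-out' layer: the transform is fixed, and the
    only trainable parameters are two slopes per unit (2n instead of n*n)."""
    def __init__(self, n, rng):
        self.pos = rng.normal(size=n)  # slope applied to positive responses
        self.neg = rng.normal(size=n)  # slope applied to negative responses

    def forward(self, x):
        z = fwht(x)  # fixed dot products acting as statistical summaries
        return np.where(z > 0, z * self.pos, z * self.neg)

def mutate(layer, rng, k=2, scale=0.1):
    """Sparse mutation: perturb only k of the slope parameters per array.
    A weak optimiser like this is cheap to replicate across GPUs, since
    only the few mutated indices need to be communicated."""
    for params in (layer.pos, layer.neg):
        idx = rng.choice(len(params), size=k, replace=False)
        params[idx] += rng.normal(scale=scale, size=k)
```

Since the transform is orthonormal, it preserves vector norms, which is part of why the layer responses stay well scaled without per-layer normalisation tricks.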