Python is great. I don't really see a problem, since you can always run heavy CPU-bound work in separate processes with multiprocessing, which avoids most of the problems that come from code with implicitly shared memory. Also, I don't think a better solution than atomic inc/dec is possible without changes to the GC. After all, what can be faster (and simpler) than a hardware solution? There are tons of programming languages, and expecting Python to be the best at everything is ... not needed.
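
To make the multiprocessing point concrete, here is a minimal sketch of what I mean, using the standard library's `multiprocessing.Pool`. The prime-counting workload and the names `count_primes`, `split_range` and `parallel_count_primes` are just made up for illustration; any CPU-bound function that takes picklable arguments would do.

```python
from multiprocessing import Pool


def count_primes(bounds):
    """Count primes in the half-open range [lo, hi) by trial division."""
    lo, hi = bounds
    total = 0
    for n in range(max(lo, 2), hi):
        d = 2
        while d * d <= n:
            if n % d == 0:
                break  # found a divisor, n is composite
            d += 1
        else:
            total += 1  # loop finished without a divisor, n is prime
    return total


def split_range(limit, parts):
    """Split [0, limit) into at most `parts` contiguous (lo, hi) chunks."""
    step = max(1, -(-limit // parts))  # ceiling division, at least 1
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]


def parallel_count_primes(limit, workers=4):
    # Each worker is a separate process with its own interpreter and memory,
    # so nothing is shared implicitly; arguments and results are pickled.
    with Pool(processes=workers) as pool:
        return sum(pool.map(count_primes, split_range(limit, workers)))


if __name__ == "__main__":
    # The guard is needed on platforms that start workers by re-importing
    # this module (the "spawn" start method).
    print(parallel_count_primes(100_000))
```

The only data that crosses process boundaries is the `(lo, hi)` tuples going in and the integer counts coming out, so there is no shared state to protect. The price is pickling and process start-up overhead, which is why this pays off for heavy CPU-bound chunks rather than many tiny tasks.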