I noticed the same when I attempted to cover anything remotely complicated. I think a lot of Blender Python programming is being either ignored or blocked to keep the AIs from training on it.
I don't have time right now, but I'd be interested to see how open-source LLMs like StarCoder2 or DeepSeek would fare, although these results don't inspire much confidence given their smaller size. I guess it depends on what they've been trained on; perhaps the free models ate the entire Blender developer part of the internet.
That should get you a lot further, but the more you go back and forth, the more you're testing different things, and the less it is an apples-to-apples comparison.