This reminds me of the first televised presidential debate in America. It was broadcast on radio, as was the standard, but it was also televised because many homes now had the new technology. When citizens were polled afterwards, those who had listened on the radio thought that Nixon had won, while those watching on the telly were more persuaded by Kennedy's presentation. Your video uses subtitles, which means that even with zero audio you can communicate the ideas easily. I'd say the reason it seems to take more effort to watch is that humans take in more information visually than through hearing. You get more signal from a silent video than from a radio broadcast.
Ask a blind person and a deaf person. Everyone has different experiences of the world, so it really varies. I'm on the blindness spectrum, so for me audio is definitely more important, but if I were Deaf, I'd need the visuals, with subtitles.
If the video relies more on visuals than on audio, then video quality matters more. But if it is commentary or another audio-driven kind of video, then audio quality matters more than video quality.