In a previous post I talked a bit about NVIDIA's horrible OpenGL VSYNC performance (CPU usage), and I have now found a solution to it. I noticed it while debugging RetroCopy v0.300B on my NVIDIA machine.
There were a few issues with the fragment shader on NVIDIA hardware, which I fixed, but whilst fixing them I noticed absolutely horrible performance from time to time on my Core 2 machine. It has two cores, and the video thread should only be using one of them, leaving the other fully free to do the emulation. After a little investigation I saw that RetroCopy was using both cores at 100%, which confused me: I wasn't running any emulator threads, so total CPU usage should have been 50% at worst.
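To make the 50% figure concrete, here is a rough sketch of the arithmetic in Python. It is not RetroCopy code, and the function name is made up for illustration. Task-manager style tools report total CPU as an average across all cores, so one thread pegging one core on a dual-core machine shows up as 50%.

```python
def total_cpu_percent(busy_fraction_per_thread, core_count):
    """Total CPU usage as a percentage of the whole machine.

    busy_fraction_per_thread: fraction of one core each thread keeps busy (0.0 to 1.0).
    Assumption: the scheduler spreads busy threads across cores, and the
    total can never exceed core_count fully busy cores.
    """
    busy_cores = min(sum(busy_fraction_per_thread), core_count)
    return 100.0 * busy_cores / core_count


# Video thread alone on a dual-core machine: what I expected to see.
expected = total_cpu_percent([1.0], core_count=2)

# Video thread plus a second thread spinning on its own core: what I actually saw.
observed = total_cpu_percent([1.0, 1.0], core_count=2)
```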
So I pulled out Process Explorer and looked at the threads, and lo and behold, there was an OpenGL thread using a whole core on its own. I looked at its stack trace and saw that this thread was doing the VSYNC. This confused me, because another thread was apparently already doing this. Why do you need two threads to do the same thing, especially something so costly? Well, as it turns out, NVIDIA calls this an optimization, "threaded optimization", and guess what: it's enabled by default.
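The per-thread view in a tool like Process Explorer boils down to taking two samples of each thread's accumulated CPU time and looking at the difference. A minimal sketch of that idea in Python, assuming the samples are already collected into dictionaries; the thread names and numbers here are invented for illustration, not real measurements:

```python
def busiest_threads(cpu_seconds_before, cpu_seconds_after, interval_seconds):
    """Return (thread_name, percent_of_one_core) pairs, busiest first."""
    usage = []
    for name, after in cpu_seconds_after.items():
        # A thread missing from the first sample started during the interval.
        before = cpu_seconds_before.get(name, 0.0)
        usage.append((name, 100.0 * (after - before) / interval_seconds))
    return sorted(usage, key=lambda item: item[1], reverse=True)


# Hypothetical samples taken one second apart (names and values are made up).
before = {"video": 10.0, "driver worker": 4.0, "audio": 1.0}
after = {"video": 11.0, "driver worker": 5.0, "audio": 1.25}

ranking = busiest_threads(before, after, interval_seconds=1.0)
for name, percent in ranking:
    print(f"{name}: {percent:.0f}% of one core")
```

Two threads each sitting at about 100% of a core, with one of them not created by the application, is the pattern to look for.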



So, in conclusion: if you want better performance on your multicore NVIDIA-based machine, disable this "optimization". You'll find it as "Threaded optimization" under Manage 3D settings in the NVIDIA Control Panel. ATI doesn't suffer from the same issue, so if you're on ATI you're safe.