Not yet tested on a Pi, but I think I have got another 6% or so out of it. For sure its 'low load' performance will now be much better, but that's of less interest than the key question - 'will it be able to play Popcorn with 5-10% CPU left over for a GUI, clocked at 950MHz?'
Testing in about 20 minutes I think.
UPDATE: I broke my build system and had to wait overnight to run tests. The results are positive - here's PIANA running Popcorn at 950MHz, all synths active and bashing away - and I have 12% of the CPU free, as evidenced by the two 'thrash' threads. This was captured during one of those 'double time drumming' chunks of the track, where CPU load is highest.
The big question now is, is this 'under 90%' of the un-overclocked state, or the overclocked state, given how the Pi CPU governor works? So now I need to run the same test at 800MHz and see if I run out of puff.
Tried it - 800MHz falls down in a heap when the drums come in, which is where not only do I get 4 more notes of polyphony, but they are all filtered and hence more expensive. So I can't do 800MHz. 900MHz sort of worked, almost. No audio dropouts that I could hear, but that small drop in CPU relative to 950MHz is enough for the MIDI timing to go a tiny bit astray, as the CPU load squeezes out MIDI response in favour of synthesis.
So there you have it. We look reliable at 950MHz - but there is no GUI - and very nearly reliable at 900. So as long as the GUI refresh is kept at a lower priority than MIDI input, it should pass the Popcorn test if nothing else.
Also it's worth pointing out, based on the screen grab above, not only is 'top -H' eating up about 2% of my precious CPU, but I have 2 SSH sessions active - 3 actually, as the file system is SSH-mounted - so the network will be getting in my way.
And one final thing - I just re-ran at 1GHz / full-on Turbo overclocking, and the performance difference is non-linear, which is to be expected - systems do collapse non-linearly as they hit saturation, so no surprise that a non-linear amount of CPU is freed up by a wee bit more clock rate - but even so, nice to see almost 30% of the CPU free all the time. Sort of all the time - occasional dips to 29% free.
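As a rough sanity check on that non-linearity, here is what a purely linear model would predict. It assumes the 12% free figure really was measured against a 950MHz clock (the governor question makes that an assumption) and that the synth costs a fixed number of cycles per second regardless of clock. The function names are made up for illustration.

```c
/* Clock "used up" by the workload, in MHz, given the fraction of CPU
 * observed free at a known clock rate. */
static double busy_mhz(double free_frac, double clock_mhz)
{
    return (1.0 - free_frac) * clock_mhz;
}

/* Fraction of CPU a linear model predicts is free at another clock rate.
 * A negative result means the workload does not fit at all. */
static double free_fraction(double busy, double clock_mhz)
{
    return 1.0 - busy / clock_mhz;
}
```

With 12% free at 950MHz, busy_mhz() gives about 836MHz of work. The linear model then predicts about 16.4% free at 1GHz, about 7.1% free at 900MHz, and about -4.5% (overloaded) at 800MHz. That fits 800MHz falling over, and the almost 30% actually seen at 1GHz is well above the linear prediction.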
And this stupid denormal issue has not yet gone away completely ... I see about a 4% difference in CPU load between 'played some notes and then stopped' and 'never played a sound at all', so there is still work to be done there. But when I run in fixed-point I don't have enough performance - maybe the ARM code coming out of the compiler isn't quite optimal for fixed-point, but whatever, I need to stay on floats now.
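For anyone unfamiliar with the problem, here is a minimal sketch of how a decaying value ends up denormal, using the standard fpclassify() from math.h. The decaying state is a hypothetical stand-in for a filter or envelope tail, not PIANA's actual code, and both helper names are invented for this example.

```c
#include <math.h>

/* Returns 1 if x is a denormal (subnormal) float, 0 otherwise. */
static int is_denormal(float x)
{
    return fpclassify(x) == FP_SUBNORMAL;
}

/* Repeatedly scale `state` by `decay` (0 < decay < 1) and return the
 * number of steps taken before it first becomes denormal. Returns -1
 * if it reaches zero first (for example on hardware that flushes
 * denormals to zero) or if max_steps is exhausted. */
static int steps_until_denormal(float state, float decay, int max_steps)
{
    for (int i = 1; i <= max_steps; i++) {
        state *= decay;
        if (is_denormal(state))
            return i;
        if (state == 0.0f)
            return -1;
    }
    return -1;
}
```

A call such as steps_until_denormal(1.0f, 0.5f, 1000) shows that a tail which is simply left to decay will get there eventually, which is why the cost can show up only after notes have been played and released.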
One more tweak has resulted in more reliable behaviour with slightly fewer explicit exponent checks - I now clamp floats at strategic points in the pipeline to a rather hideous 2^-24. So I now get float performance with fixed-point precision, since all my old audio work was done in Apple's 8.24 format, where 2^-24 is the smallest step.
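A minimal sketch of that kind of clamp, assuming 'clamp to 2^-24' means 'anything smaller in magnitude than 2^-24 is snapped to zero' (the other reading, holding values at 2^-24, would need different code). QUIET_LIMIT, flush_small and flush_block are hypothetical names, not taken from PIANA.

```c
#include <math.h>
#include <stddef.h>

/* 2^-24: one least-significant bit of an 8.24 fixed-point value. */
#define QUIET_LIMIT (1.0f / 16777216.0f)

/* Snap anything quieter than QUIET_LIMIT to exactly zero; pass
 * everything else through unchanged. */
static inline float flush_small(float x)
{
    return (fabsf(x) < QUIET_LIMIT) ? 0.0f : x;
}

/* Apply the clamp to a whole buffer, e.g. filter state or a voice's
 * output block, at one of the chosen points in the pipeline. */
static void flush_block(float *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = flush_small(buf[i]);
}
```

Because 2^-24 is far above the denormal range (which starts below 2^-126 for single precision), a value clamped this way goes to zero long before it could become denormal, and nothing is lost that an 8.24 pipeline would have kept.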