I believe the largest speedup here would come from combining the pipeline into a single program. I therefore propose building a pipeline mechanism on C++ templates, with each of the common pipeline elements implemented as a stage in its own class.
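To give a rough idea of the shape I have in mind, here is a minimal sketch. The stage names (`Grep`, `Sort`, `Uniq`), the `Lines` alias and the `make_pipeline` helper are placeholders invented for illustration, not a final design, and passing whole vectors between stages is an assumption made to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Assumption: stages exchange whole batches of lines. A real design
// might stream records instead.
using Lines = std::vector<std::string>;

// Hypothetical stage: keep only lines containing a fixed substring.
struct Grep {
    std::string needle;
    Lines operator()(Lines in) const {
        in.erase(std::remove_if(in.begin(), in.end(),
                                [this](const std::string& s) {
                                    return s.find(needle) == std::string::npos;
                                }),
                 in.end());
        return in;
    }
};

// Hypothetical stage: sort lines lexicographically.
struct Sort {
    Lines operator()(Lines in) const {
        std::sort(in.begin(), in.end());
        return in;
    }
};

// Hypothetical stage: drop adjacent duplicate lines.
struct Uniq {
    Lines operator()(Lines in) const {
        in.erase(std::unique(in.begin(), in.end()), in.end());
        return in;
    }
};

// The pipeline is a compile-time list of stages. The stage types are
// template parameters, so the chain is resolved at compile time and the
// compiler is free to inline across stage boundaries.
template <typename... Stages>
class Pipeline;

// Empty pipeline: passes its input through unchanged.
template <>
class Pipeline<> {
public:
    Lines run(Lines in) const { return in; }
};

// Non-empty pipeline: run the first stage, then hand the result to the rest.
template <typename First, typename... Rest>
class Pipeline<First, Rest...> {
public:
    explicit Pipeline(First first, Rest... rest)
        : first_(std::move(first)), rest_(std::move(rest)...) {}

    Lines run(Lines in) const { return rest_.run(first_(std::move(in))); }

private:
    First first_;
    Pipeline<Rest...> rest_;
};

// Helper so callers do not have to spell out the stage types.
template <typename... Stages>
Pipeline<Stages...> make_pipeline(Stages... stages) {
    return Pipeline<Stages...>(std::move(stages)...);
}
```

With this sketch, `make_pipeline(Grep{"apple"}, Sort{}, Uniq{}).run(lines)` would do in one process roughly what a filter, sort and de-duplicate chain does across three. I have written it recursively so it builds as C++11; with C++17 the same thing could be expressed more compactly using a tuple and a fold expression.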
I've just come off another job where I optimised a string search function using SSE4.2 assembly instructions, so I'd be interested to see whether that technique can be used here as well. That investigation would come after the main structure has been implemented.
Thank you,
Daniel Horne