I used the built-in performance profiler in VSTS to look at speeding the boids up. There seems to be a bug in the profiler: when memory profiling was turned on, it seemed unable to resolve symbols, so I switched to the CLRProfiler to look at the memory being used.
Most of the time in the application was spent calculating the boids' steering behaviours, so the optimisations here tried to reduce the time spent calculating the forces. The first was that, instead of each behaviour (cohesion, separation, alignment) scanning through the entire set of boids separately, the results of one behaviour's scan could be reused by the other two.
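As a rough illustration of sharing the scan, here is a minimal sketch in Python (the real code is .NET, and I haven't reproduced it here). The `Boid` class, the `steering_inputs` function, the neighbour radius and the exact force formulas are all made up for the example. The point is only that one pass over the flock gathers what all three behaviours need.

```python
import math
from dataclasses import dataclass


@dataclass
class Boid:  # hypothetical 2D boid, for illustration only
    x: float
    y: float
    vx: float
    vy: float


def steering_inputs(boid, flock, radius):
    """Scan the flock once and return (cohesion, separation, alignment)."""
    count = 0
    sum_x = sum_y = 0.0    # neighbour positions, for cohesion
    sum_vx = sum_vy = 0.0  # neighbour velocities, for alignment
    sep_x = sep_y = 0.0    # push away from close neighbours, for separation
    for other in flock:
        if other is boid:
            continue
        dx, dy = boid.x - other.x, boid.y - other.y
        dist = math.hypot(dx, dy)
        if dist == 0.0 or dist > radius:
            continue
        count += 1
        sum_x += other.x
        sum_y += other.y
        sum_vx += other.vx
        sum_vy += other.vy
        # Closer neighbours push harder (assumed inverse-square weighting).
        sep_x += dx / (dist * dist)
        sep_y += dy / (dist * dist)
    if count == 0:
        zero = (0.0, 0.0)
        return zero, zero, zero
    cohesion = (sum_x / count - boid.x, sum_y / count - boid.y)
    alignment = (sum_vx / count - boid.vx, sum_vy / count - boid.vy)
    return cohesion, (sep_x, sep_y), alignment
```

With three separate behaviours each doing their own loop, the distance test above would run three times per pair of boids instead of once.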
Secondly, the cohesion force is not updated on every frame - the value is cached for 10 iterations. This should not be a fixed value - there should be some heuristic that varies it, once there is a way to measure the "quality" of the flock.
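The caching idea can be sketched like this, again in Python and with invented names (`CachedForce`, `refresh_every`). It is an assumption about the shape of the thing, not the code from the project. The wrapped function is only called again once the cached value has been handed out the given number of times.

```python
class CachedForce:
    """Recompute an expensive value only every `refresh_every` calls."""

    def __init__(self, compute, refresh_every=10):
        self._compute = compute
        self._refresh_every = refresh_every
        self._age = refresh_every  # forces a compute on the first call
        self._value = None

    def get(self):
        if self._age >= self._refresh_every:
            self._value = self._compute()
            self._age = 0
        self._age += 1
        return self._value
```

A heuristic for varying the interval would go where `refresh_every` is read, for example shortening it when the flock is changing shape quickly.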
This boosted the frame rate from approx 3 fps to 10 fps (dropping to 5 fps after some time).
Next came some micro-optimisations.
Changing the Vector to be a struct instead of a class (to avoid over-stressing the garbage collector) resulted in 14 fps, dropping to 8 fps after running for a while.
Other micro-optimisations that I tried, but which didn't result in any real performance difference, included:
Making the trail generator use a fixed-size array (instead of adding/removing from a list).
Removing IEulerRigidBody, calling the objects directly instead of through an interface.
Changing all doubles to floats (I didn't actually keep this change in the code).
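For the trail generator, a fixed-size array used as a ring buffer looks roughly like the sketch below. The `Trail` class and its method names are invented for the example, and it is in Python rather than the project's own language. The newest point overwrites the oldest, so nothing is added to or removed from a list once the buffer exists.

```python
class Trail:
    """Keep the last `capacity` points in a preallocated buffer."""

    def __init__(self, capacity):
        self._points = [None] * capacity
        self._next = 0   # slot the next point will be written to
        self._count = 0  # number of valid points stored so far

    def add(self, point):
        self._points[self._next] = point
        self._next = (self._next + 1) % len(self._points)
        self._count = min(self._count + 1, len(self._points))

    def points(self):
        """Return the stored points, oldest first."""
        size = len(self._points)
        start = (self._next - self._count) % size
        return [self._points[(start + i) % size] for i in range(self._count)]
```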
For Scene8, the drawing code was the real bottleneck. Changing the gluSphere to be drawn using VertexBufferObjects doubled the frame rate (and it isn't particularly optimised yet - it could use glDrawElements).
For Scene7, the limit is the O(n²) collision detection (this is the next thing to tackle).
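To show where the quadratic cost comes from, here is a brute-force check in Python. It assumes the objects are spheres given as (x, y, z, radius) tuples, which is my simplification for the example and not necessarily what Scene7 contains. Every object is tested against every other, so n objects need n(n-1)/2 tests.

```python
import math
from itertools import combinations


def colliding_pairs(spheres):
    """Test every pair of spheres; return (hits, number_of_tests)."""
    hits = []
    tests = 0
    for (i, a), (j, b) in combinations(enumerate(spheres), 2):
        tests += 1
        # Two spheres overlap when their centres are closer than the sum of radii.
        if math.dist(a[:3], b[:3]) <= a[3] + b[3]:
            hits.append((i, j))
    return hits, tests
```

Doubling the number of objects roughly quadruples the number of tests, which is why this dominates as the scene grows.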