Visualising Sound

This is the first in a series of blog posts about visualising sound, in a similar way to an oscilloscope. The geek in me thought it’d be a fun thing to do, as well investigate different technological approaches to implementing the app (interesting as it’s a real-time performant app).

clip_image001

An oscilloscope has a single screen, which refreshes on a given time period, displaying a number of traces. The user can control that time period, as well as the gain (the y axis scale).

The top graph is essentially the same view as an oscilloscope containing a single trace, and a fixed time period (further blog posts may investigate varying time periods).

The bottom graph is a sliding window with a longer time period – this is the advantage of implementing an oscilloscope in code, we can create charts that aren’t really feasible in a classic CRT oscilloscope.

This series of blog posts will investigate implementing this screen using F# idioms, as well as using the Reactive Extension for .NET (RX Framework), and TPL Dataflow.

There are further things that could be implemented in future blog posts which may be interesting to see how the varying approaches look like:

· Trigger, or Trigger and hold: Only refresh the oscilloscope view once a certain trigger level has been reached. This is interesting as may want to include a certain number of immediately prior to the point where the trigger was set.

· Log many traces.

· Spectrum analyser/FFT.

· Digital filtering.

· Comparing traces, or more complicated math channels.

· Heatmap.

· Adjustable time offset (delay) – useful on the oscilloscope view to centre a waveform on the screen, or for when comparing two or more channels output.

F# Implementation

This blog post covers an F# implementation of the microphone, using asynchronous workflows. The graphing is being done by a nightly build download of Dynamic Data Display (D3). I’m really impressed with the performance, but it is quite difficult to customise.

The low-latency microphone code is from this article on gamedev (http://www.gamedev.net/topic/586706-slimdx-directsound-capture-microphone-stream-with-low-latency-example/).

The implementation of this is pretty simple; I’ll start from the detail out.

The inner loop of the program is an asynchronous workflow that reads from the buffer and returns a sequence:

clip_image003

Note the slightly-strange looking code:

let task = System.Threading.Tasks.Task<int>.Factory.StartNew(fun () -> WaitHandle.WaitAny(waitHandles))

F# has an Async.AwaitWaitHandle() method, which unfortunately only waits on a single handle. We want to wait on both handles so that we get notified when the buffer is full every 4096 instead of every 8192 bytes. With 4 bytes per sample, and a sample rate off 44KHz, this is equivalent to getting notified at an approximate rate of 40 times per second instead of 20 times per second.

I could have implemented Async.AwaitAnyWaitHandle() taking an array of WaitHandles, but looking at the code in the F# PowerPack, the code was quite complex. So, the code instead creates a new future to do the waiting and let us know which WaitHandle was set (this does mean that we’ve got the minor overhead of scheduling a new task to run on the task pool).

The Async.StartImmediate method ensures that the ProcessStream call is marshalled back onto the UI thread. It may be worth in the future looking at doing more of the data processing on a dedicated thread, leaving the UI thread free for drawing and user input.

The convertByteArrayToSequence is simple, it just iterates over the buffer in 4 byte chunks, and converts the values to floats, which it yields in the sequence:

clip_image004

The ProcessStream method looks like this:

clip_image006

For completeness, this is the Seq.sample module extension:

clip_image007

The nastiest bit of the code in ProcessStream is probably towards the end, where the windowedUnderlyingData is manipulated to ensure that the window only contains 1000 samples. It would be nice to do this in a non-imperative way, using Seq.windowed, but the problem is, is that the sequence we’re operating on is only the result of one buffer read operation, whereas windows etc. should operate over the whole data stream, and the sequences can’t be combined into a single sequence using yield! as they’re all generated asynchronously. Similarly, the buffer takes non-overlapping windows over the input sequence, and without taking a buffer over the whole stream, it may miss samples off the end of the sequence portions. Tomas Petricek has written a library, AsyncSeq, http://tomasp.net/blog/async-sequences.aspx which I may investigate in a later post.

The alternative to this would be to implement mailbox processors, having different agents for the buffering, windowing and sampling operations. I did start investigating this, but didn’t feel happy with them being pull-based (they don’t return data unless asked). I could have set them up to act more like a dataflow network, but it does seem to go against their intended use of managing state between different threads. I may revisit this in a future blog post.

I feel that even though F#’s does have nice features which helped to quickly implement the app, RX would probably be a better fit. I guess I won’t know until I implement it in RX and compare the differences.

Labels: ,