Taumuon Jabuka

.NET SIMD to solve Burgers' equation

2014-10-01T13:29:00.002-07:00

I'm enrolled on the mooc Practical Numerical Methods in Python, and module 2 has covered the Burgers' Equation. This is a great course by the way, and it's not too late to join up!

I am impressed with the productivity of Python. ipython notebooks, SymPy, plotting etc. are all great and I feel that .NET could benefit from a similar ecosystem.

That said, I've been itching to play with the new .NET SIMD support and this seemed to be a suitable example.

The Python notebook demonstrated how using Python's array notation to solve the Burgers' equation using finite difference resulted in a 9 fold speed increase, on my machine taking 25ms instead of 200ms.

I translated the code into C#, and it completed in 2ms, so I immediately was impressed (not surprised though - an interpreted versus (JIT) compiled language...). This wasn't going to tell me much about SSE improvements though, so I increased the number of spatial samples a hundred fold, and the number of timesteps by 10.

The time was then increased to 2580ms.

Installing .NET 4.5.2 and using RyuJIT CTP4 to compile and run, the running time reduced to 1880ms. RyuJIT can't arrive fast enough!

I then implemented array swappping, instead of continually creating and copying arrays, to see if it would reduce pressure on the GC (Calculate2 in the gist), but it had no significant difference.

I then pulled the calculation of (nu * dt / dx) into a variable (Calculation3())

This resulted in a calculation time of 230ms.

Then, I felt it was time to implement SIMD. The documentation is a bit scarce, but it wasn't too tricky to figure out. My machine has 128bit SSE registers, so it can add two doubles (64 bit) in a single operation.

Running this resulted in a calculation time of 90ms, so slightly greater than 2x speedup. I'm not sure whether it was able to make further optimisations based on the changed structure of the code, but this is a great win. This is implemented in Calculation4() in the gist.

The gist is available here: https://gist.github.com/taumuon/5d4db567c57f9846971d

Monte Carlo double barrier option pricing on GPU using C++ AMP

2014-08-25T01:57:00.002-07:00

I converted the double barrier option pricing code from my last post to run on the GPU, using C++ AMP, and got an 80% speedup.

This is running on a low powered Acer V5-122p laptop with an AMD A6-1450 processor with an integrated Radeon HD 8250 GPU.

The gist is here: https://gist.github.com/taumuon/bbeeb9e2c1f5082a2699

To be fair, I converted the CPU code to be otherwise identical to the gpu code. Instead of populating an array of the sample path, it instead for each point just determines whether the value breaches the upper or lower barriers and uses the last value for the payoff.

This reduced the runtime from the code in my last blog post, of 2540ms, to 1250ms, i.e. a 2x speedup.

The GPU code was ran twice, the first time it ran in 140ms, the second run (after all shaders already compiled etc) was 15.6ms, i.e. a very impressive 80x speedup from the CPU code.

If anything, it shows that AMDs strategy of cheap low powered laptop chips will payoff if people start taking advantage of the relatively strong GPU.

Monte Carlo pricing of Exotic Options in Functional Style C++

2014-08-04T11:47:00.002-07:00

My last blog post was about pricing of exotic options using F#. In that post I said "Apart from the Monte Carlo code, the fact that in F# all functions are in the curried form means that composing the payoff functions using partial function application is beautiful. This would be lambda hell to replicate in C#."

The OO version of this would probably be built via composition: having a class implementing a IPayoff interface (implemented for Call and Put), and that would be used in implementations of IPathPayof, which would be implemented by AsianArithmeticMean, DoubleBarrier etc. The use of OO is to pull out the common implementation, but there's no state associated with any of these classes - it feels much nicer using function composition.

I've implemented a version in C++ using lambdas, and the functional composition is actually really nice. C++ has the std::bind method which means that any number of parameters can be partially applied even for functions not in the curried form.

The code can be found here: https://gist.github.com/taumuon/727c31f4a56518f91718

The most surprising thing was the speed increase - the equivalent algorithm in C++ is 20x faster than the F# code! For instance, running on my low-powered AMD A6-1450 powered laptop, the double barrier option took 45 seconds to calculate in F#, whereas it took 2.2 seconds in C++.

I did tweak the F# to instead of using Seq.unfold to create the asset path, it instead creates and mutates an array, and that reduced the running time by more than 25%.

EDIT: Jack Pappas has forked and modified the F# code so that its runtime is now in the same ballpark as the C++ code. The fork is here: https://gist.github.com/jack-pappas/d629a767823ca2f17967

I think there's scope to speed up both the F# and C++, by introducing parallelism and investigating GPGPU, which I hope to do if I don't get sidetracked.

Monte Carlo pricing of Exotic Options in F#

2014-04-25T14:00:00.001-07:00

I've recently completed all exercises for the MOOC Monte Carlo methods in Finance, ran by the University of Madrid on Iversity. The course was excellent, both in its content and delivery.

The course started off by refreshing probability theory (cdf, pdf etc) and quickly moved onto generating random numbers, Brownian motion and stochastic differential equations followed by the pricing of options and calculating risk.
The final week's homework is on the Monte Carlo calculation of a portfolio's risk, using copulas with Student's t distribution, but as that week's homework is still running I'm blogging about the penultimate week's homework.

This homework is about pricing path dependent options, and demonstrating how using a control variate can provide a better result for the same number of iterations (better meaning less error, i.e. with a smaller standard deviation). The homework further shows that when pricing a barrier
option, using an up-and-out option as the control variate produces better results than using a European option as the control variate.

The course delivery was extremely slick. The videos were filmed specially, this wasn't a rehashing of a lecture in front of students (as seems to happen in some other online courses). The lectures were short, and to the point, managing to cram a lot of information in with zero fluff. I really appreciated this as I was watching on a daily commute.

The assessments were by multiple choice question, but with only one attempt, so it did end up actually feeling like a challenge. The questions were a mixture of maths and coding, with the balance moving more towards coding as the weeks went on.
The code was delivered in Matlab, which I had to get reacquainted to after a decade break. The main downside I found is that I ended up doing a fair bit of copy-paste coding, so completed some of the homeworks in .NET instead. I personally found the coding easy, and found the maths more challenging (in a good way).

Now, for the code - the F# code is available here: https://gist.github.com/taumuon/11302896

Apart from the Monte Carlo code, the fact that in F# all functions are in the curried form means that composing the payoff functions using partial function application is beautiful. This would be lambda hell to replicate in C#.

If this course runs again I'd wholeheartedly recommend it.

MOOCs

2014-01-24T10:03:00.003-08:00

I've just enrolled on Iversity's Monte Carlo Methods in Finance course, and have converted some of week 1's Matlab demo code over to F# and Deedle: https://gist.github.com/taumuon/8602365

I spent the latter half of last year diving into various online courses. I completed Coursera's Mathematical Methods for Quantitative Finance, and also followed along with a number of other courses to varying levels of completeness.

I completed three weeks assignments of Udacity's Cuda programming course. Week three was painful due to an error in the reference code, and week 4 was crashing due to a memory exception. I was using a GPU emulator in Ubuntu, and decided that it would be easier with real hardware. I watched the remaining videos and found the parallel algorithms explanations useful.

I completed the first two weeks of Coursera's Scientific Computing. These were maths exercises, which I enjoyed and that's what inspired me to do the Maths Methods for Quant Finance course. The Matlab exercises I was planning to do in F#, but left the course when other attendees were complaining that to complete the homework to the correct numerical accuracy you needed the exact version of Matlab the instructor was using, and they were unable to use Gnu Octave.

It is great that there's so much free high standard material available. The fixed timescale nature of the course is a bit annoying - if work or life gets in the way one week it may make it impossible to catch up with the remainder of the course. I may get around to trying a course again next time it comes around though.

RunKeeper Visualisations

2013-08-17T07:31:00.000-07:00

My last post described how I'm using the Twitter API to receive tweets off the live stream.

Since then, I've used the API to filter for tweets containing the #runkeeper hashtag, and used that to scrape the user's activity from the RunKeeper site (including the GPS points from the user's exercise). I've stored that information in a MongoDB, which has allowed me to do some simple visualisation:

The above video (best played at 720p) shows activities plotted against time of day (the sun overhead in the video indicates midday for that region).

I haven't charted this to confirm, but to my eyes it looks like amount of exercise activity peaks around sunrise and sunset, with almost none at night time (which isn't really a surprise).

For the curious, this is what the colour of the dots indicate:

let createSphere (latitude:float) (longitude:float) (activityType:string) =
    let color = match activityType with
                | "RUN" -> "Red"
                | "WALK" -> "Green"
                | "BIKE" -> "Blue"
                | "MOUNTAINBIKE" -> "Orange"
                | "SKATE" -> "Yellow"
                | "ROWING" -> "Cyan"
                | "HIKE" -> "Brown"
                | "OTHER" -> "Black"
                | _ -> "Black"
    String.Format(sphere, latitude, longitude, color)

On a local scale, the actual GPS traces are of interest:

Activities are present in the more-populated regions of the UK. The blue and orange traces indicating cycling and mountain biking activities are clearly visible.

NOTE: if you think the map looks weird, it doesn't have the usual aspect ratio - I'm not using a Mercator projection do plot the points, but am simply plotting the longitude and latitude linearly.

On an even smaller scale, landmarks around London are visible:

The GPS data contains altitude information, so there are more interesting visualisations that could be done, including generating contour plots. Also, the above only contains three days worth of data - with a larger data set it would be possible to plot to determine whether activities peak on a weekend etc.

Implementation

Acknowledgements:

The 3D plot of the globe was drawn using POV-Ray. The 3D globe model was from grabcad, with the converted to a POV-Ray model using PoseRay.

The UK outline was obtained from the NOAA.

The code was implemented in F# (which was a pleasure to get back to after the C++ I've been doing recently). I did try the MongoDB.FSharp library to store my data records, but they failed to deserialise from the database. In any case, I wanted more control over the data types saved (I wanted to store the GPS data as GeoJson3DGeographicCoordinates, with the first point stored separately as a GeoJson2DGeographicCoordinates with an index on this value). I could have created .NET classes and used the BSON serializer, but it seemed more effort than writing directly to the DB (and this is about the only time I've seen the benefit of C# implicit conversions, but I can live without them in F#).

Why use the Twitter API, and why not scrape the RunKeeper site directly? That's because of times - the RunKeeper website displays the activity time, but it is displayed in local time, and it's not clear whether that's in the user's timezone, or the timezone of the activity. It seems cleaner to instead assume that the tweet has been posted as soon as the activity is finished and store that time as UTC (this assumption may of course not be true, but the results seem realistic).

Show me the Code!

The code's not in a bad shape, but I would like to tidy it up a little before releasing it into the world. I'm busy with other things at the moment, but if I get much interest I can go ahead and do that...

Sipping from the twitter stream

2013-07-17T11:39:00.001-07:00

I’ve been playing around with connecting to twitter’s streaming API, and displaying a live stream of tweets returned.
To do this, I was originally using DotNetOpenAuth, along with the HttpClient, which worked fine for the sample stream, but would return authentication errors for the filtered stream. I looked at the HTML message in fiddler2, and the oauth parameters weren’t ordered, which twitter requires. Instead, I’m using TwitterDoodle, which uses HttpClient.
The C#5/.NET4.5 async/await code is quite elegant – it’s not CPU intensive so it’s fine running on the UI thread, without blocking. My first instinct prior to C#5 would have been to use RX for this, but now if I’m doing something simple I’d stick to async/await, only using RX if doing something more complex like buffering or batching.

    private async void ProcessTweets()
    {
      using (var t = new TwitterConnector())
      {
        var response = await t.GetSampleFirehoseConnection();
        var res = await response.Content.ReadAsStreamAsync();
        using (var streamReader = new StreamReader(res, Encoding.UTF8))
        {
          // streamReader.EndOfStream can block, so instead check for null
          while (!cts.IsCancellationRequested)
          {
            var r = await streamReader.ReadLineAsync();
            if (r == null) { return; }
            ProcessTweetText(r);
          }
        }
      }
    }
 
    private void ProcessTweetText(string r)
    {
      if (!string.IsNullOrEmpty(r))
      {
        var tweetJToken = JsonConvert.DeserializeObject<dynamic>(r);
        var tweetObj = tweetJToken["text"];
        if (tweetObj != null)
        {
          var tweetText = tweetObj.ToString();
          viewModel.Items.Add(tweetText);
        }
      }
    }

The equivalent F# async code obviously looks quite similar, with the added goodness of Type Providers. Time dependent, I am planning to do some more analysis of tweets which would be a good fit for F#.

type tweetProvider = JsonProvider<"SampleTweet.json", SampleList=true>
 
type MainWindowViewModel() =
  inherit ViewModelBase()
  let items = new ObservableCollection<string>()
  member x.Items
    with get () = items
  member x.ProcessTweet tweet =
    let tweetParsed = tweetProvider.Parse(tweet)
    match tweetParsed.Text with
    | Some(v) ->  x.Items.Add v
    | None -> ()
  member x.ProcessTweets =
    let a = async {
      use t = new TwitterConnector()
      let! response = t.GetSampleFirehoseConnection() |> Async.AwaitTask
      let! result = response.Content.ReadAsStreamAsync() |> Async.AwaitTask
      use streamReader = new StreamReader(result, Encoding.UTF8)
      let rec processTweetsAsync (s:StreamReader) =
        async {
          let! r = s.ReadLineAsync() |> Async.AwaitTask
          // streamReader.EndOfStream can block, so instead check for null
          if r <> null then
            x.ProcessTweet r
            return! processTweetsAsync s
        }
      do! processTweetsAsync streamReader
    }
    a |> Async.StartImmediate
  member x.GoCommand = 
    new RelayCommand ((fun canExecute -> true), 
      (fun action -> x.ProcessTweets))

Picking with DirectX 11 and Bullet Physics

2013-07-01T14:13:00.001-07:00

I updated the falling cubes demo to separate out the rendering and physics into separate WinRT components - see the WinRT subfolder under http://github.com/taumuon/Taumuon.Game . The performance of this wasn't noticeably different when consumed by a WP8 DirectX application, but I then switched over to a "DirectX with XAML application", so that the application was hosted via C#, and consumed the WinRT components via a DrawingSurfaceBackgroundGrid, and performance fell off a cliff.

I created the C# host as I intended to put as much game logic as I could into an F# portable library, and the only way to do this is to have a C# hosting application assembly as WP8 apps, unlike Windows Store apps, don't allow WinRT(P) components to be created in anything other than C++.

Instead, of fighting the performance issues, I decided that it would be better to implement the game logic purely in C++ - see the WinRT subfolder under http://github.com/taumuon/Taumuon.Game . This replaces most uses of the C++/CX extensions with standard C++ types, and implements picking, loosely based off http://halogenica.net/ray-casting-and-picking-using-bullet-physics

It gives a childish sense of satisfaction to do something as simple as destroy a wall of bricks so that the wall collapses.

Once I get time, the next step is to make the wall collapse more realistically by adding spring constraints between adjacent blocks, and to remove those springs once they are stretched beyond some elastic limit.

Bullet Physics demo DirectX C++ app on Windows Phone 8

2013-05-18T14:10:00.001-07:00

I’ve finally gotten around to updating my original Windows Store Bullet Physics demo (http://taumuon-jabuka.blogspot.co.uk/2012/04/bullet-physics-and-directx-11-on.html) to build for the Windows 8 final release.

While I was at it, I also got it working for Windows Phone 8 – code is on github https://github.com/taumuon/WP8-Bullet. There weren’t many changes. In building Bullet for ARM, I got around lots of parameter alignment issues by defining BT_USE_DOUBLE_PRECISION. This was the quickest thing to do, and may have performance implications but the performance running on my Nokia Lumia 920 seems encouraging. It wouldn’t have been a much bigger change to instead see if ARM is defined.

ViewGene now has mapping

2013-03-12T13:47:00.001-07:00

The newly-released version of my genealogy/family tree Windows 8 app now has Bing Maps integration, which allows the migration paths of all ancestors to be plotted.

I wanted to see visually how my ancestors moved, after researching how my ancestors moved from villages in the 17th century into towns, and then larger towns.

I’d love to see anyone else’s migration paths!

The below is just a made up sample – I’ll share my map after sanitising the data.

Try it in the Windows Store.

Option pricing in F# using the Explicit Finite Difference method

2013-02-20T13:27:00.001-08:00

It's been a little while since I've coded any F#, so I've done a little coding kata to polish off the rust.

I've converted the explicit finite difference option pricing example from (Paul Wilmott Introduces Quantitative Finance).

I followed similar steps as those I performed in my previous blog post on the binomial tree option pricing; I converted the VBA code into very similar looking imperative F# code, before refactoring out the loops into recursive calls.

It was a useful exercise as converting the algorithm made me actually focus on the mechanics of how it worked, much more than simply translating VBA code into imperative F#. It’d be interesting to hear in the comments whether people find it easy to read.

The code is available https://gist.github.com/taumuon/4999749.

ViewGene

2013-02-06T14:27:00.001-08:00

This blog has been quiet for a while, but I've been busy working on a Windows Store genealogy application, ViewGene.

This is quite a niche application, but it was written to cope with a couple of use cases which are missing from other family tree websites and applications – it's useful to have it open in conjunction with your favourite family tree software or website.

The first problem is that many of the family tree websites, when showing the whole tree (pedigree chart) of ancestors, compress the tree to make the best use of screen real estate. This is fine to save paper, but it does make it very difficult, when navigating the tree, to know what the generation is of any ancestor on the screen.

ViewGene ensures that ancestors of any generation all appear at the same vertical level – this makes it easy to see which branches of the tree need more investigation. This view wouldn't necessarily work well when printed out, but touch-based pan and zoom do mean that it's easy to zoom out to see the 'shape' of the tree, and to zoom in to see the individual details, meaning that it works for interactively investigating the data.

The second use-case is to validate and see the lifelines of the ancestors. Many of the family tree websites allow free-form entry of dates. This means that it is easy to mistype and enter a date of the wrong century, or to enter semantically incorrect dates such as a child born before the father. The Timeline Fan chart allows for some of these problems to be easily seen. The chart also allows the user to see, for example, which ancestors were alive on a given census year.

The current release includes very limited validation, to check that dates match allowable GEDCOM formats, but does not check for semantically incorrect dates. Also, there is very basic date inference for missing dates. If I devote any more time to the application, I'll want to improve both of these points – it should be possible to validate that children were conceived whilst both parents were alive, and that siblings can't have been born in the same gestation period etc.

Windows Store application observations

The chart functionality is quite generic, and would work well on any touch-screen platform (such as iOS), much of the development effort was spent in building an application following Windows Store idioms.

The first was to ensure that the application works when snapped (though this could do with further improvement). The second was to use semantic zoom, which definitely makes it easier to navigate through a large number of individuals in the GEDCOM file). Adding a very large number of individuals to the GridView results in Invalid Quota exceptions being thrown. Even if this weren't the case, there would be a usability issue in navigating through so many individuals. To fix this, if there are a large number of individuals in the GEDCOM file the application adds a further level of navigation through the first letter of the surname.

It would be quite easy to add images to the application (such as pictures of ancestors, or scans or pdfs of documentation), but it is quite tricky in a Windows Store application – the GEDCOM file contains paths to images, but it is not possible from a Windows Store application to open files other than via the open file dialog. It may be possible to work around that by instead presenting an open folder dialog (which would allow access to all files and sub-folders within that folder) and then allowing the user to choose the GEDCOM file from a list, but that definitely feels clunky.

The Pedigree Chart is implemented as an ItemsControl, populated with either Boxes or Lines, with an ItemsTemplateSelector to distinguish between them. It works well, and there are no obvious performance issues, but the ItemsControl does crash if too many items are added, so the number of ancestors are limited to 1000.

The timeline fan is implemented as C++/CX WinRT component, deriving from SurfaceImageSource, with all drawing being performed in Direct2D. If I get time to invest in this, I want to switch out to a VirtualSurfaceImageSource, so that the amount of information presented is customised to the zoom level on the screen.

Non Windows Store issues

There are a few issues not related to a Windows Store application.

The application currently does not have editing – the GEDCOM format is too flaky to be a reliable interchange format; it does not round-trip without the possibility of corrupting the user's information. Ideally, the application would allow connecting to a website to obtain the user's family tree information, but ancestry.com does not provide a user-accessible API. Geni.com does, and if I get time I intend to investigate that.

It is also trickier than it needs to be to implement mapping – it looks fairly easy to integrate with Bing maps, but the problem is that location data in the GEDCOM file doesn't necessarily match up with the required data for plotting points on the map. It is possible to ask the data to enter that information, but then the application should maintain it. This isn't too much of a problem, but it should be maintained per GEDCOM file, and to allow the user to re-import updated GEDCOM files would rely on GEDCOM merging.

The GEDCOM parser is written in C#, mostly as a GoF state machine, with some functional elements. When I originally started coding this up, I intended to start by creating a clunky procedural looking parser (which maintains the state with many flags), then show how an OO GoF pattern was nicer, then show how a F# implementation is nicer still. If I get time I still intend to create a F# version of the parser.

Speaking of F#, the layout logic for the timeline fan is implemented in a functional way, but would be nicer in F# - the problem is that it then needs to communicate with the WinRT component to do the drawing. F# can't directly interop with WinRT, so it would be necessary to create a C# wrapper or consumer of the F# library just for this – and it seems overkill.

Overall, developing a Windows Store app is fairly easy if you have Silverlight or WPF skills. There are quite a number of features I want to add to the application, but how much time I'm going to dedicate to it depends on feedback or download numbers of the current version of the app.

Binomial Tree Option Pricing in F#

2012-07-16T13:11:00.001-07:00

This post will look at implementing the binomial tree algorithm in an imperative style, and refactoring it to be more functional.

The wikipedia page on the binomial options pricing model provides pseudo-code for the pricing of an American Put option.

That code can be implemented almost identically in F#.

and is called as follows:

Functional programmers have probably died a little on the inside on seeing that code, but one of the good thing about F# being multi-paradigm is that it's easy to take code of this form and implement it, and test it for correctness before refactoring.

I did actually look at whether the results of the code were as expected at this point, but I'll cover this later in this blog post.

The first thing to do is to change the initial values V to be created more functionally:

Now, the main algorithm consists of two parts, an outer loop where each 'level' of the tree is calculated stepping up towards the root, using the results of the previous level’s nodes (the nodes nearest to the leaves). The inner loop iterates over all of the nodes at a given level calculating the option's value at that node.

First off, the outer loop can be replaced with a recursive method:

The next change is quite minor – it's for each pass of the inner loop to create a new array, instead of mutating the vi array:

This change now means that there's no need for any mutable state, so the initial values can be created as a sequence instead of an array, and the inner loop can be written in a functional style:

I find this code style is clearer to understand than following the array mutations to see what each part of the code is doing and how those parts interact.

If I was implementing this in C# I'd probably change the variable names to be more descriptive (e.g. strikePrice, riskFreeRate etc) but as one of the key advantages of F# is its succinctness it makes more sense to leave the well-known symbols in. Anyone implementing the algorithm with knowledge of the domain would know what the symbols mean without the more verbose names.

Validating the results

The calculated option value with the requested parameters is 4.486090. Running the EqutyOptions example in the .NET port of QuantLib, QLNet (http://sourceforge.net/projects/qlnet), with the same parameters gives the following results:

The figures are in the same ballpark, but the difference is curious.

Chapter 45 of GPU GEMS 2 creates the initial values in this way:

This gives a result of 4.486375 which still doesn't match QLNet, but is nearer.

The result of running the Cox version of Bubo's code from here http://cs.hubfs.net/topic/None/59053 also equals 4.486375, which matches the results of using this initialisation function. As an aside, I like how that examples shows how F# partial function application allows the variations on the algorithm to be composed, and much more succinctly than using OO.

Bubo’s Tian model does match QuantLib's result exactly. I'll have to dig a bit deeper to figure out which variations on the binomial tree model these different examples are using.

The code is available here: https://github.com/taumuon/FSharp-Binomial-Tree

Metro Bullet source code

2012-07-04T12:24:00.003-07:00

I've updated the demo that shows the Bullet Physics Engine in a DirectX 11 Metro application to build in the Release Preview drop of Visual Studio (original blog post: http://taumuon-jabuka.blogspot.co.uk/2012/04/bullet-physics-and-directx-11-on.html ).

The code needs some tweaks - I need to change the manifest, and change some stuff over to Platform::Agile, but it builds and runs so I thought I'd get it out there.

The code is available at: https://github.com/taumuon/Metro-Bullet

More playing with monads

2012-06-27T12:27:00.001-07:00

This blog post will look at using the continuation monad to sum the nodes in a tree, and then look at a case where a more complex monad is needed.

The book Real World Functional Programming gave an example of an unbalanced tree, which was deeply nested such that recursing over the tree to sum the values would blow the stack:

The solution provided was to sum the tree using continuation passing:

This is now able to sum the tree without crashing, but the code doesn’t flow as well as in the original sumTree method. However, monads can come to the rescue again, in this case the continuation monad:

The code flow now is easier to understand. The continuation passing style and continuation monad turns the code flow inside out, which isn’t a big deal, but means that the func to perform on the result needs to be passed in:

Parallel sum

This is all well and good, but to exploit multicore machines, I want to take advantage of the parallelism in the tree summation algorithm, by summing the different branches of the nodes using tasks. A naïve implementation to do this may be:

The problem with this is that it runs slower than the sequential case for a large balanced tree –if there are only 4 cores in the system it’s counter-productive to create thousands of tasks. The current depth of the tree should be passed around, and tasks spawned until traversal reaches that depth:

This is now faster for a large balanced tree, but for the deeply nested unbalanced tree it throws a StackOverflowException. Note that sumTreeParallel coped with arbitrarily nested trees, as the work to be done no longer kept being pushed onto the stack.

More state is being passed around (the current depth), which is external to the main business logic – it does feel like the state monad could be used here. First though, this is fixed up so that it has the efficiency of the parallel summation, but can cope with arbitrarily deeply nested trees:

Urghhhh. The code is definitely more convoluted now – I played with using the state monad to pass the depth around, but it’s not tail recursive. I thought that it might be possible to make use of the continuation monad here, but the problem is, is that it really needs aspects of both the state and continuation monads. I’m not sure how monad combinations work in F#, but I thought I’d throw this out here to see if anyone shows me the light. The post The Mother of All Monads does say that all other monads could be implemented in terms of the continuation monad, but I couldn’t see an example of anyone implementing the state monad in terms of the continuation monad.

State Monad in C# and F#

2012-06-06T12:04:00.001-07:00

My colleagues and I recently have been trying to gain a deeper understanding of monads, other than using LINQ and RX. Watching all the channel9 videos on monads, the concept on the surface seemed quite simple, but we were felt that we were somehow missing some key point. After reading various blog posts, the penny is finally dropping.

Some post I read somewhere (I can’t remember where) said that it’s easier to understand the individual monads before trying to understand the whole generalised concept. Mike Hadlow’s blog series (http://mikehadlow.blogspot.co.uk/2011/01/monads-in-c1-introduction.html) really explain well the identity and maybe monads (which we already understood, which is just as well as we’re using the maybe monad in our codebase). Unfortunately he didn’t explain the state monad.

The only information we could find on the state monad in C# was Brian Beckman’s Channel 9 video: http://channel9.msdn.com/Shows/Going+Deep/Brian-Beckman-The-Zen-of-Expressing-State-The-State-Monad

We found Brian Beckman’s example overly complex, there’s quite a lot going on in that video (and the code needed de-obfuscating to read more like idiomatic C#), but it’s probably a good second or third example once the basics are understood. I wanted to look at a simpler example, and found this article: http://ertes.de/articles/monads.html#section-6 the explanation of the get and put methods show pretty clearly how the state monad works, and you should probably read and understand that section before carrying on here, as I’m going to jump straight on to show how the random number generator example looks in F# and C#.

C# implementation

First off, here’s an OO random number generator:

It is easy to add three random numbers together using an instance of the generator.

We can make NextShort purely functional, so that calling it has no side-effects, by passing the current state in, and returning the new state back out, along with the result:

To add three numbers, the state has to be kept around, and passed to each of the calls in turn. This obfuscates the main business logic.

Jumping straight to the punch line, using a state monad means that the boilerplate state management can be hidden away so that the algorithm reads as clearly as it did before:

This is a trivial example, but hopefully it makes the point. The rest of this blog post will show the implementation, along with the same in F#.

First off, we have a state class, which holds a Func that given a state, returns a tuple of the state and value. The F# example doesn’t bother with the State class, and I could have made the C# version simply use the Func everywhere, but it seemed clearer to me to have the wrapper object just to get a bit of context.

I created a non-generic State class as somewhere convenient to put the Get and Set methods:

The meat of the monad lives in extension methods in a StateEx class:

and SelectMany is implemented in terms of Bind the same as in Mike Hadlow’s example for the Identity and Maybe monads:

The computation to compose can be expressed in the GetRandom method:

This returns an operation (a Func in the State container) that invokes NextShort with the old state, and returns a computation with the new state and value.

The resulting composed operation using LINQ is described above, but this is what it looks like desugared:

F# implementation

The code for the random number generator and composed operation look similar to the C#:

This has the same problem of explicitly passing the state around. I can ruin the surprise again and show how using monads hides the explicit state passing:

This code makes use of F#’s support for monads via Computation Expressions. The code for the state builder is from here: http://www.navision-blog.de/2009/10/23/using-monads-in-fsharp-part-i-the-state-monad/

The Haskell example also had the >> operator, but of course F# uses that as the forward composition operator J

Instead I’ve implemented:

getRandom looks pretty much identical to the Haskell example:

The composed version using the StateBuilder is shown above, but here’s the desugared version:

The F# version is much clearer to read than the C#, in part due to the type inference meaning that the code has less pointless fluff. Having functions live in the module and not as part of a class aids that too (can simply call setState instead of State.Set in the C#)

Following through the simpler example really helped my understanding, and hopefully will see examples in the future where it will be sensible to use the state monad in my coding.

Monte Carlo C++ AMP

2012-05-21T13:05:00.001-07:00

This blog is starting to look very inconsistent – my last blog post was talking about starting to write a game, and now I’ve gone onto a totally different topic. Due to time the game’s gone onto, if not the back burner, then a gentle simmer. I’ve got to balance doing cool stuff with keeping on top of various technologies that may be relevant as a contractor, and having a life!

This blog will describe using C++ AMP for calculating Pi; the calculation of Pi is a fairly good example of Monte Carlo calculations, as the algorithm’s simple, and we all know the result.

First off, this is what the algorithm looks like in C#:

It simply finds the number of circles representing x-y coordinate pairs, whose vector magnitude falls within the unit circle. This ratio is Pi / 4.

The random class is the Mersenne Twister code from here: http://takel.jp/mt/MersenneTwister.cs. The results are:

Replacing the Mersenne Twister class with System.Random resulted in code which was approximately 30% slower. I’m running this on a dual-core laptop, but have not parallelised it, as I didn’t fancy porting over a Parallel Random Number Generator (PRNG) myself.

Tomas Petricek has an example of using the GPU to calculate Pi using F# here, but his example generates the random numbers on the CPU and uploads them to the GPU to do the calculations and reduction.

C++ AMP Version

Microsoft have just released a C++ AMP Random Number generator library to codeplex http://amprng.codeplex.com/, but I’m using BharathM port of the parallel Mersenne Twister described here: http://blogs.msdn.com/b/nativeconcurrency/archive/2011/12/20/mersenne-twister-sample-using-c-amp.aspx

His example generates random numbers into a rank 2 array of size 4096 * 2, but I’ve modified g_n_per_RNG to be 256.

The first thing I do is to pair up those random numbers into x-y coordinates and to find the square of their magnitude:

Then I determine whether the magnitude of these coordinates fall outside of the unit circle:

The final step is to do a parallel reduce of the resulting array. This is using the just-released reduction method from the C++ AMP algorithms library: http://ampalgorithms.codeplex.com/

The reduce function behaves similar to STL’s std::accumulate in that you can pass in the binary operation to perform on each item. The function takes a rank 1 array, so I’m using view_as to change rank.

To perform the timings I run the algorithm twice, once to warm up (ensure all shaders etc are compiled – see http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/1c6c9f04-1f9f-4f44-99f3-154d991ae5ba )

The results I obtained are:

Calculating the result for half a million random numbers took approximately 16 milliseconds, whereas for the single threaded C# code a million iterations took 44 ms, so accounting for using both cores the performance seems pretty similar. This is quite disappointing, even though the GPU isn’t the most powerful (a mobile 5470), the CPU isn’t the most powerful either, so I was hoping for near an order of magnitude speed up. I wasn’t doing anything clever with tiling, and there may be other bits of the API I’m not using correctly. I’ll have to get hold of Kate Gregory’s book and try with different examples.

Bullet Physics and DirectX 11 on Windows 8 Metro

2012-04-14T13:36:00.001-07:00

I've got a game idea which I think may work on a tablet so I've created the following 'demo'. I've hooked together the Bullet Physics engine with some 3D graphics, and the falling wall of cubes could pretty much be the canonical "Hello, World" equivalent for a 3D game. I'm not sure how much time I'll have to pursue my game idea, but it should in the worst case be a route to learning some new technologies.

The demo won't win any awards for its graphics, and the video is pretty poor quality (I recorded the screen with my camera instead of using any screen capturing software).

Falling cubes

Microsoft seem to be definitely pushing for the use of C++ for game creation with Windows 8, with no announcement of a new version of XNA, but that seems to make sense; C++ is closer to the metal than C# and will get more performance out of lower power ARM tablets. Also, there's quite a buzz with the C++ renaissance with C++11, and modern-style C++. The corresponding new technologies in VS 2010 (PPL) and VS 11 (auto-vectorization, C++ AMP) are especially cool.

I'm also keen to get back into C++ - the past year or so I've been working purely in C#, but all my prior experience was using a mixture of C# and C++, and I do miss C++ (even though I am more productive in C#). Also, my previous roles were generally in a modern style (use of STL, smart pointers etc.) but my most recent experience was "C with Classes" style code, so I've been relearning modern style as well as picking up the new C++11 features.

Building Bullet as a static lib was pretty easy, I just had to switch it over to use the Multi-threaded DLL c-runtime. Defining WINAPI_FAMILY=WINAPI_PARTITION_APP to detect any non-metro API usages revealed the usage of GetTickCount, which is unavailable on Metro apps, so I just turned off profiling by defining BT_NO_PROFILE 1 in btQuickprof.h

The Windows 8 samples including a couple of games which are decent guides on using DirectX 11. These demos aren't supposed to be examples of a whole game engine, and don't necessarily follow best practices: Marble Maze has a class DirectXBase which encapsulates all rendering logic, which is great, but the game logic is in a class MarbleMaze, which derives from this class. It's nothing that would trip up an experienced developer, but beginners may follow the example. The Simple3DGame (which also demos DirectX and XAML interop) is better structured. The only strange thing in that example is that the render primitive classes are all C++/CX ref classes. This seems to contradict what I remember Herb Sutter saying in a Channel 9 video about sticking to ISO standard C++ where possible and only using C++/CX at the boundaries.

I'm not sure yet whether I'll try to continue to create a simple demo of my game idea purely in C++, or create a C++/CX interop layer to facilitate writing the game logic in C# - I'm tempted to do this as I'm more productive in C#, and it'll probably lead to a deeper understanding of WinRT which will be useful even when creating C# XAML applications.

I could have alternatively stayed purely in C# and used SharpDX, or a game framework such as ANX or Ogre (both of which say will soon be supporting XAML), but I was keen on having control of the whole game, C++ upwards - if after profiling any of the C# code any bottlenecks are found, it'll be easier to re-implement those parts of the code in C++.

As to my experiences of developing the demo: The new version of C++ definitely feels more productive, and feels closer to C#. I love auto (var), lambdas (pretty much the same), for_each (foreach) and range for, and initializer lists. Also, async programming using PPL tasks feels very similar to .NET's TPL. I've had to change mental gears a bit in thinking again about object ownership instead of relying on the garbage collector, and it took way longer than it should for me to figure out how to add a std::unique_ptr into a vector.

I could share the demo code if anyone would find it useful (I need to tidy it up a little bit, and can't be bothered otherwise) - just post in the comments!

Messing with Direct2D

2012-03-12T14:38:00.001-07:00

As I blogged about previously, the main bottleneck in the oscilloscope code was in the drawing of the charts, in my previous post I modified DynamicDataDisplay to optimise this, but I wanted to make a Metro version of my app, and didn’t want to convert that library over, so I thought I’d look at doing the drawing myself.

First off, on Windows 7/.NET4 I used the SurfaceInteropHelper which allows Direct2D to be hosted on a WPF control, to do the drawing, and found that the CPU usage dropped from 70% to just under 40% (the code’s not worth showing here, and my use of Direct2D was pretty simplistic).

For the Windows 8 consumer preview, I created an app that used the SwapChainBackgroundPanel . It wasn’t obvious how to do this in an app created from scratch, so I butchered the SDK sample Metro style DirectX shooting game sample (XAML)

This control allows Direct2D and XAML to be mixed in a metro app, but the control needs to be at the root element. I can see how this would be OK for an app such as a game where it could make use of overlayed controls, but for an app such as mine where I’d like to have multiple charts drawn using Direct2D it’s not ideal. Also, it seems to force the skeleton of the app to be written in C++ (where it would be nice to have a C# app host the controls, with the C++ –hidden- ).

This was awkward, but not a blocker to implementing the app on Metro… the blocker was to do with accessing the microphone. There are new Media Foundation libraries to do that, which do interoperate with the new C# async features than DirectShow (expose awaitable types):

The new version of RX (2.0 beta) has a new CreateAsync method instead of having an overload of Create that takes the async lambda).

The code above is untested, as the blocker is in the new MediaCapture library:

There’s no way to create a lossless MediaEncodingProfile from which to extract the raw waveform.

I think I’ll leave this app on the backburner until at least a later beta.

TPL Dataflow

2012-02-15T13:10:00.001-08:00

This blog post is coming a couple of weeks late due to my copy of Windows becoming corrupted – and strangely the only things I didn’t have backed up were this Oscilloscope app!

Anyway, last time I took a look at charting performance, but this post will investigate TPL Dataflow.

In my original F# post, I discussed that I felt that the code would have been better implemented in RX, which I then went and did in this post. However, RX isn’t the only recent new technology coming out of Microsoft that deals with streams of data. TPL Dataflow has a confusingly similar overlap with the usages of RX, and this is what its whitepaper has to say:

“Astute readers may notice some similarities between TPL Dataflow and Reactive Extensions (Rx), currently available as a download from the MSDN Data developer center. Rx is predominantly focused on coordination and composition of event streams with a LINQ-based API, providing a rich set of combinators for manipulating IObservable<T>s of data. In contrast, TPL Dataflow is focused on providing building blocks for message passing and parallelizing CPU- and I/O-intensive applications with high-throughput and low-latency, while also providing developers explicit control over how data is buffered and moves about the system. As such, Rx and TPL Dataflow, while potentially viewed as similar at a 30,000 foot level, address distinct needs. Even so, TPL Dataflow and Rx provide a better together story.”

That does sound very interesting – who doesn’t want better performance!

Implementation

I’ll dive in straight away and look at some code. The code is structured similarly to the RX code in my previous post.

The microphone still provides an IObservable<float[]> via the GetStream() method. The TaskSchedulers are specified explicitly throughout, so that the code can be unit tested (I’m using a MockSynchronizationContext).

The transformManyBlock behaves in the same way to the SelectMany operator, it separates the float array into its constituent parts.

The broadcast block is analogous to a connectable observable – many blocks can be linked to it to receive the same messages.

The transform many block is then explicitly linked to the broadcast block, meaning that its outputs make it into the broadcast block’s input.

Oscilloscope/Buffered View

The code to deal with the buffered display of data is:

The batch block behaves the same as LINQ’s Buffer operator (a bit confusingly when TPL Dataflow’s Buffer means something different). This simply takes the incoming float values and packages them into an array.

The ActionBlock is the final point on the chain – it takes the values it receives and outputs them to the screen (via oscilloscopeOutput) – this is the equivalent code to the code in the Observable Subscription in the RX version of my code.

Sliding Window View

Again, the sliding window view is structured similarly to the previous RX code:

A sample block is created so that every n samples are fed into the SlidingWindowBlock. This returns a new float[] array representing the new window of data on every input.

The second sample block ensures that not every window of data is drawn (so that the chart is refreshed at 40fps rather than 400fps!)

Finally, the observable is subscribed to, and the messages are posted into the transformManyBlock:

Whereas the RX guidelines recommend creating new operators based on existing operators, TPL Dataflow encourages building up custom blocks (though they can be built out of combining existing blocks).

The SlidingWindowBlock is a slightly modified version of the one provided in the white paper, and here (http://msdn.microsoft.com/en-us/library/hh228606(v=vs.110).aspx) The one in the example only posts a window once it has received a full window worth of data, whereas I want to start drawing the windows as soon as data arrives.

The SampleBlock is simple, it yields a value once it has received enough inputs, discarding the intermediate ones. This is structured more efficiently than my RX implementation – that one buffers the values, and then only take the last value out of the buffer.

Performance

I removed the updating of the chart, to look purely at what the performance of the app is in taking the input from the microphone, and shaping it to be in the correct format for output.

RX Performance:

TPL Dataflow:

The processor usage between the apps is in the same ballpark, but the TPL Dataflow app has many more garbage collections, and # bytes in all heaps also is running slightly higher. I had a quick look in the profiler, and it seems that there are many Task allocations from the Dataflow.Post() operation from within the SlidingWindow and Sample implementations.

The extra generation 1 collections don’t really matter for this application with the functionality that it currently has – the % time in GC barely registers, and there are no Gen 2 garbage collections over the minute or so that the app was running.

Once the app is doing more calculations and drawing more complex charts it would be interesting to see whether any latency spikes due gen 2 GCs cause similar slowdowns to the one discussed at the start of my previous post.

It would be an interesting exercise to limit the amount of GCs throughout the app, for instance there’s no need for the microphone access observable to return an IObservable<float[]> instead of IObservable<float>; currently every read from the microphone allocates a new float[]. Similarly, new List<Point> are created to more easily interface with DynamicDataDisplay – it would be better to change the types of data that D3 takes to be more observable-friendly, and to save having so many allocations. Again, there’s not much point doing this, other than an interesting theoretical exercise, until the garbage collection overhead proves to be a performance issue.

Conclusion

For an application as simple as this, there isn’t any benefit to using TPL Dataflow – it is a powerful library, with functionality such as blocks being able to decline offered blocks, and request them later, which would be difficult to implement in RX. As my app doesn’t (currently) need that level of functionality, there’s no benefit to using the library.

I may revisit this in the future – if I had some computationally expensive operation (FFT for instance) where I’d want greater control over the flow of data through the system.

Charting Performance

2012-01-25T13:12:00.001-08:00

I wasn’t going to blog about performance until a later post, but there was a performance issue I wanted to tackle before discussing some other things.

The frame rate sometimes seemed to be as expected, but sometimes slow, and the application was quite unresponsive when handling the Stop button command.

To help nail this down I created a Buffer Frame Rate performance counter, so that I monitor it along with another couple of metrics (I chose to see the count of gen1 and gen2 garbage collections, along with the % time in GC, and % processor time).

The framerate average isn’t bad, but the frame rate is not consistent; it regularly drops below 10 frames per second. This post will discuss optimising the performance of the application to keep the frame rate steady and ensure that the app is responsive.

Performance increases should generally focus on reducing unnecessary work in the whole application, and not micro-optimisations. In this case, it’s realising that the top chart is drawing at 40 frames per second (a buffer of 1000 data points into a 40KHz source), whereas the bottom chart is updating at 400 frames per second (sampling the data to 100 points, and updating the chart on each of those sample points).

However, as there are other types of charts I want to produce it’s worth investigating where the time is spent.

First off, I tried turning off anti-aliasing, and moving over to the d3future project (http://d3future.codeplex.com ), to see if that made any performance difference, but the performance was pretty similar, with the CPU pegged at 100%. A further run generated this interesting trace:

The frame count starts initially high, but after a while performance drops off dramatically, which is followed by some long spikes in the % Time in GC. The drop-off maybe seems to correspond with one of the gen 2 collections. This situation wasn’t so reproducible that I could catch it under a profiler.

My next step was to profile to see where the application was spending its time, just using the Visual Studio profiler. The application’s hot path was in the D3 LineChart’s UpdateCore method, where it was updating the chart’s Bounds using data from the EnumerableDataSource –it was iterating over the whole collection to find the x and y min and max values (to fit the chart to the data area).

It seems unnecessary to iterate over the whole data source to get the minimum and maximum x values – for a linear chart (i.e. not a scatter chart) it would be expected that the first point is at the minimum x value and the last point the maximum.

I created a chart derived from LineChart where I could instead pass in a List of points for the datasource – this means that I could get the last point without enumerating the whole collection, allowing the min and max x values to be quickly found. For the y values, I happen to want the chart to be a fixed axis and not scale to fit the data (instead scaled by a user-configurable gain factor), so there was no need at all to iterate over the collection.

The chart looked like this after those changes:

The frame rate is pretty consistent now. The gen 1 allocations are still occurring at a rate of 200 per minute, but the application is now responsive to the stop click, stopping immediately.

Profiling with memory allocations turned on showed that the types with most instances was System.Byte[] with 32% of the instances allocated in the profiling period.

ConvertByteArrayToFloatArray() previously looked like this:

And after changing this to not use LINQ:

Gives the following:

Performance is pretty similar to previous, but the gen 1 collections are occurring at a slightly lower rate. Now, profiling tells me that the types with most instances allocated is pretty much split between System.Action, System.Object, and System.Reactive.Disposables.SingleAssignmentDisposable.

Profiling at this point identifies the majority of the work now being in LineGraph.UpdateCore() where its transforming all points from their data coordinates into screen coordinates.

The transformation of coordinates for many points is embarrassingly parallel (think GPGPU). Potentially, the GPU could be put to use by simply drawing the points untransformed, adding a transformation onto the drawing context.

Further performance increases could be done for the sliding window chart: A sliding window chart redisplays the same points again and again, as they move left along the x axis. It essentially transforms the same point many times into the new viewport. Doing a transform to move an existing point to the left could be more efficient than transforming again from data to screen coordinates.

This blog so far has looked at some micro-optimisations focussing on identifying areas where work can be reduced. As I said in the beginning, if I wasn’t so interested in optimising for future requirements, I’d simply ensure that I was doing less drawing straight away.

Anyway, the sliding observable can be change to only take every 10th windowed set of values, by introducing another Sample() on the result of the WindowWithCount():

There isn’t actually that much difference in real-world performance.

The rate of gen 1 collections is lower still than previous, but the profiler shows that app is still spending the majority of its time in drawing the charts. Surprisingly the CPU usage doesn’t seem to have dropped much, although in the profiler the time can now be seen to be split between the drawing of the two charts, instead of dominated by the drawing of the sliding window chart.

The micro-optimisations were still worth investigating, as I have charts in mind that will draw multiple traces simultaneously.

Oscilloscope using RX and C# Async CTP

2012-01-18T12:10:00.001-08:00

In my last blog post I described the implementation of a simple ‘oscilloscope app’ in F#, as I wanted to see how the code would be structured using only F# idioms (http://taumuon-jabuka.blogspot.com/2012/01/visualising-sound.html )

My natural instinct would have been to implement it using the Reactive Extensions for .NET (RX Framework), but I first wanted to investigate a pure F# solution. This post describes an alternate RX implementation.

Similar to my last post, I’ll describe the code inside-out.

Creating the Observable

This code returns an IObservable with a float array payload, representing each read from the CaptureBuffer. The implementation could have internally started a dedicated Thread from which to push out values, but instead I’m using the new C# 5 async functionality (using the Async CTP Update 3), so that my code looks pretty similar to the previous F# example.

The observable takes a CancellationToken, whose IsCancellationRequested is set to true once all subscriptions to the observable have been disposed. The CompositeDisposable returned out from the lambda is also disposed of at that point.

The code loops while its CancellationToken has not been cancelled, asynchronously awaiting for one of the WaitHandles to be set, and then it reads from the buffer. The value is pushed out to subscribers in the OnNext.

The ConvertByteArrayToFloatArray() is trivial:

Subscriptions to the observable

First off, the float array returned from the observable is decomposed into the individual values, using a SelectMany, as sampling, buffering and windowing operations all operate on a stream of floats. Then, the observable is published to an IConnectableObservable. The microphone access returns a Cold observable, meaning that each subscriber to it would end up creating their own microphone access. This would work (the CaptureBuffer doesn’t prevent this), but the connectable observable means that instead all clients share the same observable, and ensures that they all see the same values (so that the traces on the two charts are in sync).

The RefCount() means that when all subscriptions to the observable IConnectableObservable variable have been disposed of then the subscription to the underlying observable will also be disposed.

The top ‘oscilloscope’ trace is a simple Observable.Buffer() over the data stream. There is no need to ObserveOn the dispatcher thread as the subscription occurs on the UI thread. Using RX, it would be easy to schedule various parts of the work onto different threads, but I’ll discuss this in a later article (I want to keep everything on the UI thread for now to compare performance with the single threaded F# implementation).

All subscriptions are added to a CompositeDisposable member in the ViewModel – the Stop button’s command implementation disposes of this, which causes all subscriptions to be disposed of, and so the microphone access loop to be terminated via its CancellationToken.

The windowing operation simply samples every 100 data points, and from those sampled data points takes a sliding window of 1000 values to display. The windowCount variable is closed over to allow the y axis to be continually updated.

The Sample operator is simple, but not particularly efficient – it takes a buffer (i.e a non-overlapping window) of count values, and then takes the last value in the buffer.

The WindowWithCount operator is the same one I discussed at http://taumuon-jabuka.blogspot.com/2011/07/rx-framework-performance-awareness.html (with implementation grabbed from http://social.msdn.microsoft.com/Forums/en-US/rx/thread/37428f58-f241-45b3-a878-c1627deb9ac4#bcdc7b79-bbde-4145-88e4-583685285682 )

And as I also talked about in that post, the RX guidelines recommend implementing an operator in terms of existing operator. There’s only one problem in this case, it’s quite slow, (I’ll get quantitative figures on this in a future blog post discussing performance of all approaches).

The following is faster (again, I know more specific figures are needed):

Comparing implementations

For me, the RX solution is cleaner than the F# solution, and easier to follow. I did implement my F# solution in quite an imperative way though, and should have perhaps used AsyncSeq or mailbox processors, but as reading from the microphone is a push-based activity, none of those solutions would be as clean as RX (of course I haven’t covered using RX in F#). The F# version is much faster, and I’ll take a big more of a dig into performance in an upcoming blog post.

Visualising Sound

2012-01-10T14:38:00.001-08:00

This is the first in a series of blog posts about visualising sound, in a similar way to an oscilloscope. The geek in me thought it’d be a fun thing to do, as well investigate different technological approaches to implementing the app (interesting as it’s a real-time performant app).

An oscilloscope has a single screen, which refreshes on a given time period, displaying a number of traces. The user can control that time period, as well as the gain (the y axis scale).

The top graph is essentially the same view as an oscilloscope containing a single trace, and a fixed time period (further blog posts may investigate varying time periods).

The bottom graph is a sliding window with a longer time period – this is the advantage of implementing an oscilloscope in code, we can create charts that aren’t really feasible in a classic CRT oscilloscope.

This series of blog posts will investigate implementing this screen using F# idioms, as well as using the Reactive Extension for .NET (RX Framework), and TPL Dataflow.

There are further things that could be implemented in future blog posts which may be interesting to see how the varying approaches look like:

· Trigger, or Trigger and hold: Only refresh the oscilloscope view once a certain trigger level has been reached. This is interesting as may want to include a certain number of immediately prior to the point where the trigger was set.

· Log many traces.

· Spectrum analyser/FFT.

· Digital filtering.

· Comparing traces, or more complicated math channels.

· Heatmap.

· Adjustable time offset (delay) – useful on the oscilloscope view to centre a waveform on the screen, or for when comparing two or more channels output.

F# Implementation

This blog post covers an F# implementation of the microphone, using asynchronous workflows. The graphing is being done by a nightly build download of Dynamic Data Display (D3). I’m really impressed with the performance, but it is quite difficult to customise.

The low-latency microphone code is from this article on gamedev (http://www.gamedev.net/topic/586706-slimdx-directsound-capture-microphone-stream-with-low-latency-example/).

The implementation of this is pretty simple; I’ll start from the detail out.

The inner loop of the program is an asynchronous workflow that reads from the buffer and returns a sequence:

Note the slightly-strange looking code:

let task = System.Threading.Tasks.Task<int>.Factory.StartNew(fun () -> WaitHandle.WaitAny(waitHandles))

F# has an Async.AwaitWaitHandle() method, which unfortunately only waits on a single handle. We want to wait on both handles so that we get notified when the buffer is full every 4096 instead of every 8192 bytes. With 4 bytes per sample, and a sample rate off 44KHz, this is equivalent to getting notified at an approximate rate of 40 times per second instead of 20 times per second.

I could have implemented Async.AwaitAnyWaitHandle() taking an array of WaitHandles, but looking at the code in the F# PowerPack, the code was quite complex. So, the code instead creates a new future to do the waiting and let us know which WaitHandle was set (this does mean that we’ve got the minor overhead of scheduling a new task to run on the task pool).

The Async.StartImmediate method ensures that the ProcessStream call is marshalled back onto the UI thread. It may be worth in the future looking at doing more of the data processing on a dedicated thread, leaving the UI thread free for drawing and user input.

The convertByteArrayToSequence is simple, it just iterates over the buffer in 4 byte chunks, and converts the values to floats, which it yields in the sequence:

The ProcessStream method looks like this:

For completeness, this is the Seq.sample module extension:

The nastiest bit of the code in ProcessStream is probably towards the end, where the windowedUnderlyingData is manipulated to ensure that the window only contains 1000 samples. It would be nice to do this in a non-imperative way, using Seq.windowed, but the problem is, is that the sequence we’re operating on is only the result of one buffer read operation, whereas windows etc. should operate over the whole data stream, and the sequences can’t be combined into a single sequence using yield! as they’re all generated asynchronously. Similarly, the buffer takes non-overlapping windows over the input sequence, and without taking a buffer over the whole stream, it may miss samples off the end of the sequence portions. Tomas Petricek has written a library, AsyncSeq, http://tomasp.net/blog/async-sequences.aspx which I may investigate in a later post.

The alternative to this would be to implement mailbox processors, having different agents for the buffering, windowing and sampling operations. I did start investigating this, but didn’t feel happy with them being pull-based (they don’t return data unless asked). I could have set them up to act more like a dataflow network, but it does seem to go against their intended use of managing state between different threads. I may revisit this in a future blog post.

I feel that even though F#’s does have nice features which helped to quickly implement the app, RX would probably be a better fit. I guess I won’t know until I implement it in RX and compare the differences.

Family tree timelines

2011-10-30T11:49:00.000-07:00

As I talked about in my previous post, I created a simple family tree visualisation program to let me know which of my ancestors I have the most interest for.

I also prototyped up a timeline view - in the above image. It's probably pretty obvious that it's a prototype, as there are no labels to identify any of the individuals.

The idea for this came about as genesreunited has a non-validated freeform text field for all entries of dates. There's no way to validate whether this information is valid for GEDCOM export - the GEDCOM spec lets various dates, such as approximate dates, bounded dates, dates within a specific quarter etc. to be specified but obviously the date has to be specified in the correct format.

I wanted to check that all dates were both in the correct format, and actually valid (i.e. check for nonsensical dates such as parents born after their children etc).

The idea behind the colouring is for green to show a GEDCOM format valid date, yellow to specify a missing date (with the date inferred), and red to indicate an invalid date.

The opacity is to indicate the 'confidence' in a date - with a specified date range not being at full opacity. Also, the dates can be inferred using a set of rules (e.g. parents should be at least 12 years older than their children, a child is probably born within a year of his or her baptism, the first child is born within a year of his or her parents marriage etc.). These rules could obviously get quite complicated.

The layout matches pretty much the layout of the ancestor chart, with ancestors being adjacent to each other. I was fussing over this a little bit - I thought it'd be nice for the descendents on the chart to be nearest to each other on the x-axis), but there's no way for this to happen through the tree.

I was feeling pretty happy with this, and thought that it's probably worth putting into the app (with some tweakes such as making the y-axis an adorner layer that adjusts with scale etc), but then I found that there's some software which has a pretty-much identical view to this (I did google around for this before implementing, I obviously didn't google hard enough).

Progeny Genealogy has an identical layout (OK, turned on its side). It even has the same idea of using opacity to indicate what data is estimated (not varying opacity, but even still). I guess that there's only so many ways to solve a problem, but it's still gutting when you think you've had an original idea!

Windows 8 Metro Development Experience

2011-10-29T12:35:00.000-07:00

I’ve been running the Windows 8 developer preview for a few weeks, and thought I’d blog about my experiences in converting a Silverlight app I’ve written into a Metro one. I’ll first describe the app, then my experiences developing it.

The App

The application is a family tree viewing application. It came about as I’ve found a few branches my family tree back to the 1600s (with help hooking up with other people on GenesReunited). It isn’t very friendly if you want to see the whole tree as it shuffles all of the individuals around to minimise the screen real estate, which is good for printing, but not so good to make sense of the tree. I was interested in seeing which of my ancestors I had the most history for, and which I wanted to research next, so I created an app to show ancestors without siblings, and to favour clarity over screen real estate.

(When I started writing this app, there didn’t seem to be any family tree apps which showed the whole family tree. I’ve since found that Gramps uses GraphViz to do exactly that).

The screenshot above shows that Metro apps really don’t much unnecessary chrome. Any infrequently used commands are hidden in the app bar at the bottom of the screen. The next screenshot shows the app at full zoom with the application bar hidden.

Development

Now I’ve laid out the background, I can talk about the development experience, and some of that will include talking about Windows 8. First off, Metro’s UI is a bit toy-ish, but I can see that it will feel slick on a touchscreen (I’ve got a Windows Phone 7 device, and am struck by the similarities). The green is pretty garish thoughout, but I’ve attempted to follow this look in my app.

UI and controls

It doesn’t feel as if Microsoft has paid enough attention to using Metro apps with the mouse and keyboard yet. A case in point is in the zooming in this application – I’ve provided a slider in the search bar, but it’s unclear whether there will be a system-wide gesture mapping to pinch-to-zoom in the release. This was necessary for testing, as even though Visual Studio has a simulator which can be used to simulate gestures, the zoom gesture uses the mousewheel, which my laptop does not have.

The actual conversion was surprisingly simple (they promised that this would be the case on the Build videos), but I’m glad that I didn’t hit any blockers. My ViewModels needed minor changes (which I’ll talk about below), and most of my Xaml just got moved across.

The ScrollViewer now allows zooming and panning, so I was able to use this instead of my own ZoomPanControl. Strangely, the zooming is performed by a method instead of a dependency property (making it easier to animate zooms etc). Also, even though it’s drawing visuals, the zoom seems to take a bitmap at full zoom and simply scale that down (this is speculation on my part, but to my eyes the scaled content looks a lot worse than a WPF version which uses scale transforms). Here’s a WPF version of my app at similar zoom:

The text is almost readable, and the connecting lines aren’t suffering from the ugly aliasing.

There are some further niggles, it seems that the ScrollViewer has a minimum zoom factor of 0.1. Additionally, the slider control, even though it was set with a minimum of 0.05 and maximum of 1, would only display values of 0 and 1 on the popup.

Coding

Now I’ve described some of the structural changes, I can describe some of the coding changes. Most of the changes were pretty trivial in the ViewModel – apart from the use of async. The Silverlight of the app was using the Reactive Framework. Other than reading a couple of articles, I hadn’t gotten into looking into C# 5 async yet, but it was pretty trivial to switch over. Testing it is another matter – I ended up downloading the Async CTP to see their unit testing sample, and found that it was the most complex sample throughout. The Reactive Framework allows you to simply control Virtual time using the scheduler, and I haven’t seen anything so simple or elegant for C# 5 async (though testing Observable.FromAsyncPattern methods are similarly tricky to test, as they don't use the scheduler requested; relying on the underlying IO Completion ports for scheduling the work).

[Edit] I originally blogged that I was concerned that there wouldn't be a version of the Reactive Framework for .NET 4.5/WinRT, following some forum rumours. However, the guys have a .NET 4.5 RX build ready.

Other changes in the code mapped naturally across – the FileDialog now returns an IInputStream, but this has a method AsString() to map across to a .NET stream. The List class has had a few of its methods (such as the Sort overload taking a delegate) removed, annoyingly.

Also, now that asynchronous calls are so pervasive, I’m surprised that the Silverlight Toolkit BusyIndicator didn’t make it in.

Visual Studio

First, the unit testing tool is definitely pre-beta. I didn’t actually finish investigating how to unit test my C# 5 async method conversions, as I couldn’t stomach using the tool any longer. I’m also not overly-enamoured with the new Find dialog. Otherwise, it seems to be pretty stable.

Metro-ising

So far I’ve spoken about how easy it was to convert over a Silverlight application to Metro, but Metro does allow many compelling features to be very easily added to the application. Charms could be provided to allow searching of family tree information, and to allow the images to be easily shared.

Once I get hold of a touchscreen device, I’d love to add some snap points to the chart.

Other thoughts

Overall, I’m pleased with the development experience of targeting Windows 8 Metro. I haven’t spoken about the actual platform in this blog, but after the months of silence and confusion about Windows 8, it’s all good news. .NET developers are still first class citizens.

I’m happy that WinRT is back to native code, and as I have experience in C++/CLI I’m very happy that C++/CX seems pretty much identical. From a .NET coder point of view, it’s a little concerning that .NET apps will have a slight performance disadvantage from the COM Interop of the WinRT projected types, but I suppose that’s simply the whole ‘use .NET for productivity, C++ for performance’ argument. And having worked on a few apps that had all of the performant (and usually legacy) parts of the system in C++, with a .NET UI, and the subsequent marshalling layer, it’s quite heartening to think that now we can stay in C++ and write a fast and fluid UI without changing languages.