Wednesday, 1 October 2014

.NET SIMD to solve Burgers' equation

I'm enrolled on the MOOC Practical Numerical Methods in Python, and module 2 covers Burgers' equation. This is a great course, by the way, and it's not too late to join up!

I am impressed with the productivity of Python. IPython notebooks, SymPy, plotting etc. are all great, and I feel that .NET could benefit from a similar ecosystem.

That said, I've been itching to play with the new .NET SIMD support and this seemed to be a suitable example.

The Python notebook demonstrated how solving Burgers' equation by finite differences using Python's array notation, rather than explicit loops, resulted in a nine-fold speed increase: on my machine it took 25ms instead of 200ms.

I translated the code into C#, and it completed in 2ms, so I was immediately impressed (though not surprised - an interpreted versus a (JIT) compiled language...). This wasn't going to tell me much about SSE improvements though, so I increased the number of spatial samples a hundredfold, and the number of timesteps tenfold.
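
For reference, here's a minimal sketch of the kind of time-stepping loop involved, using the standard discretisation from the course (forward difference in time, backward difference for the convection term, central difference for diffusion); the variable names are mine rather than the gist's:

    static double[] SolveBurgers(double[] u, int nt, double dt, double dx, double nu)
    {
        int nx = u.Length;
        for (int n = 0; n < nt; n++)
        {
            var un = (double[])u.Clone(); // copy of the previous timestep
            for (int i = 1; i < nx - 1; i++)
            {
                u[i] = un[i] - un[i] * dt / dx * (un[i] - un[i - 1])
                             + nu * dt / (dx * dx) * (un[i + 1] - 2.0 * un[i] + un[i - 1]);
            }
            // boundary handling (periodic in the course notebook) omitted for brevity
        }
        return u;
    }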

This increased the running time to 2580ms.

After installing .NET 4.5.2 and compiling and running with RyuJIT CTP4, the running time dropped to 1880ms. RyuJIT can't arrive fast enough!

I then implemented array swapping, instead of continually creating and copying arrays, to see if it would reduce pressure on the GC (Calculate2 in the gist), but it made no significant difference.

I then pulled the calculation of (nu * dt / dx) out of the inner loop into a variable (Calculation3() in the gist). This resulted in a calculation time of 230ms.
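
Sketching those two changes together (again, my names rather than the gist's): two buffers are swapped each timestep instead of allocating a copy, and the constant factors are computed once, outside the loops:

    static double[] SolveBurgersOptimised(double[] u, int nt, double dt, double dx, double nu)
    {
        int nx = u.Length;
        var next = new double[nx];
        double c1 = dt / dx;             // hoisted convection factor
        double c2 = nu * dt / (dx * dx); // hoisted diffusion factor
        for (int n = 0; n < nt; n++)
        {
            for (int i = 1; i < nx - 1; i++)
            {
                next[i] = u[i] - u[i] * c1 * (u[i] - u[i - 1])
                               + c2 * (u[i + 1] - 2.0 * u[i] + u[i - 1]);
            }
            next[0] = u[0]; next[nx - 1] = u[nx - 1]; // boundaries carried over
            var tmp = u; u = next; next = tmp;        // swap buffers instead of copying
        }
        return u;
    }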

Then I felt it was time to implement SIMD. The documentation is a bit sparse, but it wasn't too tricky to figure out. My machine has 128-bit SSE registers, so it can add two 64-bit doubles in a single operation.

Running this resulted in a calculation time of 90ms - slightly better than a 2x speedup. I'm not sure whether the JIT was able to make further optimisations based on the changed structure of the code, but this is a great win. This is implemented in Calculation4() in the gist.
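
The vectorised inner loop looks something like the following sketch (my reconstruction of the idea using System.Numerics.Vector<double>, not the gist code verbatim). On 128-bit SSE hardware Vector<double>.Count is 2, and any remainder is handled with a scalar loop:

    static void TimestepSimd(double[] u, double[] next, double c1, double c2)
    {
        int nx = u.Length;
        int width = Vector<double>.Count; // 2 doubles per 128-bit register
        var two = new Vector<double>(2.0);
        int i = 1;
        for (; i <= nx - 1 - width; i += width)
        {
            var mid   = new Vector<double>(u, i);
            var left  = new Vector<double>(u, i - 1);
            var right = new Vector<double>(u, i + 1);
            var result = mid - mid * c1 * (mid - left)
                             + c2 * (right - two * mid + left);
            result.CopyTo(next, i);
        }
        for (; i < nx - 1; i++) // scalar remainder
        {
            next[i] = u[i] - u[i] * c1 * (u[i] - u[i - 1])
                           + c2 * (u[i + 1] - 2.0 * u[i] + u[i - 1]);
        }
        next[0] = u[0]; next[nx - 1] = u[nx - 1];
    }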

The gist is available here: https://gist.github.com/taumuon/5d4db567c57f9846971d

Monday, 25 August 2014

Monte Carlo double barrier option pricing on GPU using C++ AMP

I converted the double barrier option pricing code from my last post to run on the GPU, using C++ AMP, and got an 80x speedup.

This is running on a low-powered Acer V5-122p laptop, with an AMD A6-1450 processor and an integrated Radeon HD 8250 GPU.

The gist is here: https://gist.github.com/taumuon/bbeeb9e2c1f5082a2699

To be fair, I converted the CPU code to be otherwise identical to the GPU code: instead of populating an array with the sample path, for each point it just determines whether the value breaches the upper or lower barrier, and uses the last value for the payoff.
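
Sketched in C# (the real code is C++ AMP; the names and the geometric Brownian motion step here are illustrative), the per-path logic becomes:

    static double PathPayoff(Random rng, double s0, double r, double vol, double dt,
                             int numSteps, double lower, double upper, double strike)
    {
        double s = s0;
        for (int t = 0; t < numSteps; t++)
        {
            // standard Gaussian via Box-Muller
            double z = Math.Sqrt(-2.0 * Math.Log(1.0 - rng.NextDouble()))
                     * Math.Cos(2.0 * Math.PI * rng.NextDouble());
            // geometric Brownian motion step
            s *= Math.Exp((r - 0.5 * vol * vol) * dt + vol * Math.Sqrt(dt) * z);
            if (s <= lower || s >= upper) { return 0.0; } // barrier breached: knocked out
        }
        return Math.Max(s - strike, 0.0); // call payoff on the last value
    }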

This reduced the runtime of the code from my last blog post from 2540ms to 1250ms, i.e. a 2x speedup.

The GPU code was run twice: the first time it ran in 140ms; the second run (after all the shaders had already been compiled etc.) took 15.6ms, a very impressive 80x speedup over the CPU code.

If anything, it shows that AMD's strategy of cheap, low-powered laptop chips will pay off if people start taking advantage of the relatively strong GPU.

Monday, 4 August 2014

Monte Carlo pricing of Exotic Options in Functional Style C++

My last blog post was about pricing exotic options using F#. In that post I said "Apart from the Monte Carlo code, the fact that in F# all functions are in the curried form means that composing the payoff functions using partial function application is beautiful. This would be lambda hell to replicate in C#."

The OO version of this would probably be built via composition: a class implementing an IPayoff interface (implemented for Call and Put) would be used in implementations of IPathPayoff, which would be implemented by AsianArithmeticMean, DoubleBarrier etc. The point of the OO design is to pull out the common implementation, but there's no state associated with any of these classes - it feels much nicer using function composition.

I've implemented a version in C++ using lambdas, and the functional composition is actually really nice. C++ has std::bind, which means that any number of parameters can be partially applied, even for functions not in curried form.
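
For comparison, a rough sketch of the same composition idea in C# using closures (hypothetical names, not the gist's API). Without auto-currying, every partial application has to be written out as an explicit lambda, which is what becomes unwieldy as payoffs are stacked up:

    // a per-value payoff...
    Func<double, double, double> call = (strike, s) => Math.Max(s - strike, 0.0);

    // ...partially applied via an explicit closure (the C++ uses std::bind)
    Func<double, double> call100 = s => call(100.0, s);

    // a path payoff composes a per-value payoff with path logic
    Func<Func<double, double>, double, double, double[], double> doubleBarrier =
        (payoff, lower, upper, path) =>
            path.Any(s => s <= lower || s >= upper) ? 0.0 : payoff(path.Last());

    // ...and is itself partially applied to fix the barriers and payoff
    Func<double[], double> knockOutCall =
        path => doubleBarrier(call100, 80.0, 120.0, path);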

The code can be found here: https://gist.github.com/taumuon/727c31f4a56518f91718

The most surprising thing was the speed difference - the equivalent algorithm in C++ is 20x faster than the F# code! For instance, running on my low-powered AMD A6-1450 laptop, the double barrier option took 45 seconds to calculate in F#, whereas it took 2.2 seconds in C++.

I did tweak the F# so that instead of using Seq.unfold to create the asset path it creates and mutates an array, which reduced the running time by more than 25%.

EDIT: Jack Pappas has forked and modified the F# code so that its runtime is now in the same ballpark as the C++ code. The fork is here: https://gist.github.com/jack-pappas/d629a767823ca2f17967

I think there's scope to speed up both the F# and C++, by introducing parallelism and investigating GPGPU, which I hope to do if I don't get sidetracked.

Friday, 25 April 2014

Monte Carlo pricing of Exotic Options in F#

I've recently completed all the exercises for the MOOC Monte Carlo Methods in Finance, run by the University of Madrid on Iversity. The course was excellent, both in its content and its delivery.

The course started off by refreshing probability theory (CDFs, PDFs etc.) and quickly moved on to generating random numbers, Brownian motion and stochastic differential equations, followed by the pricing of options and the calculation of risk.

The final week's homework is on the Monte Carlo calculation of a portfolio's risk, using copulas with Student's t distribution, but as that week's homework is still running I'm blogging about the penultimate week's homework instead.

This homework is about pricing path-dependent options, and demonstrating how using a control variate can provide a better result for the same number of iterations (better meaning less error, i.e. a smaller standard deviation). The homework further shows that when pricing a barrier option, using an up-and-out option as the control variate produces better results than using a European option.
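
As a reminder of the technique (my notation, not the homework's): if X is a control variate with known expectation E[X], each payoff sample Y is replaced by Y - c(X - E[X]), which leaves the estimator unbiased for any c, and minimises variance at c = Cov(X,Y)/Var(X). A sketch:

    static double ControlVariateEstimate(double[] y, double[] x, double expectedX)
    {
        // y: payoff samples; x: control variate samples with known mean expectedX
        double meanY = y.Average(), meanX = x.Average();
        double cov = 0.0, varX = 0.0;
        for (int i = 0; i < y.Length; i++)
        {
            cov  += (x[i] - meanX) * (y[i] - meanY);
            varX += (x[i] - meanX) * (x[i] - meanX);
        }
        double c = cov / varX; // variance-minimising coefficient
        return meanY - c * (meanX - expectedX);
    }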

The course delivery was extremely slick. The videos were filmed specially; this wasn't a rehashing of a lecture given in front of students (as seems to happen in some other online courses). The lectures were short and to the point, managing to cram a lot of information in with zero fluff. I really appreciated this, as I was watching on a daily commute.

The assessments were by multiple-choice questions, but with only one attempt allowed, so it did end up actually feeling like a challenge. The questions were a mixture of maths and coding, with the balance moving more towards coding as the weeks went on.

The code was delivered in Matlab, which I had to get reacquainted with after a decade's break. The main downside I found was that I ended up doing a fair bit of copy-paste coding, so I completed some of the homework in .NET instead. I personally found the coding easy, and the maths more challenging (in a good way).

Now, for the code - the F# code is available here: https://gist.github.com/taumuon/11302896

Apart from the Monte Carlo code, the fact that in F# all functions are in the curried form means that composing the payoff functions using partial function application is beautiful. This would be lambda hell to replicate in C#.

If this course runs again I'd wholeheartedly recommend it.

Friday, 24 January 2014

MOOCs

I've just enrolled on Iversity's Monte Carlo Methods in Finance course, and have converted some of week 1's Matlab demo code over to F# and Deedle: https://gist.github.com/taumuon/8602365

I spent the latter half of last year diving into various online courses. I completed Coursera's Mathematical Methods for Quantitative Finance, and also followed along with a number of other courses to varying levels of completeness.

I completed three weeks' assignments of Udacity's CUDA programming course. Week three was painful due to an error in the reference code, and week four was crashing with a memory exception. I was using a GPU emulator on Ubuntu, and decided that it would be easier with real hardware. I watched the remaining videos, and found the explanations of parallel algorithms useful.

I completed the first two weeks of Coursera's Scientific Computing. These were maths exercises, which I enjoyed, and they're what inspired me to do the Mathematical Methods for Quantitative Finance course. I had planned to do the Matlab exercises in F#, but I left the course when other attendees complained that completing the homework to the required numerical accuracy needed the exact version of Matlab the instructor was using, and that they were unable to use GNU Octave.

It is great that there's so much free, high-quality material available. The fixed-timescale nature of the courses is a bit annoying - if work or life gets in the way one week, it may be impossible to catch up with the remainder of the course. I may get around to trying a course again next time it comes around, though.

Saturday, 17 August 2013

RunKeeper Visualisations

My last post described how I'm using the Twitter API to receive tweets off the live stream.

Since then, I've used the API to filter for tweets containing the #runkeeper hashtag, and used those to scrape each user's activity from the RunKeeper site (including the GPS points from the user's exercise). I've stored that information in MongoDB, which has allowed me to do some simple visualisation:


[Video: exercise activities plotted on a rotating globe]

The video (best played at 720p) shows activities plotted against time of day (the sun overhead in the video indicates midday for that region).

I haven't charted this to confirm, but to my eyes it looks like the amount of exercise activity peaks around sunrise and sunset, with almost none at night (which isn't really a surprise).

For the curious, this is what the colour of the dots indicates:

let createSphere (latitude:float) (longitude:float) (activityType:string) =
    let color = match activityType with
                | "RUN" -> "Red"
                | "WALK" -> "Green"
                | "BIKE" -> "Blue"
                | "MOUNTAINBIKE" -> "Orange"
                | "SKATE" -> "Yellow"
                | "ROWING" -> "Cyan"
                | "HIKE" -> "Brown"
                | "OTHER" -> "Black"
                | _ -> "Black"
    // sphere is a POV-Ray scene template (defined elsewhere) with format
    // placeholders for the coordinates and colour
    String.Format(sphere, latitude, longitude, color)

On a local scale, the actual GPS traces are of interest:

[Image: GPS traces plotted over an outline of the UK]

Activities are present in the more-populated regions of the UK. The blue and orange traces indicating cycling and mountain biking activities are clearly visible.

NOTE: if you think the map looks weird, it doesn't have the usual aspect ratio - I'm not using a Mercator projection to plot the points, but am simply plotting the longitude and latitude linearly.
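
For reference, the two mappings differ only on the latitude axis; a quick sketch (with illustrative scaling):

    // linear ("plate carrée") mapping, as used for the plots above
    static double LinearY(double latDeg) => latDeg;

    // Mercator mapping, for comparison: latitude is stretched towards the poles
    static double MercatorY(double latDeg)
    {
        double lat = latDeg * Math.PI / 180.0;
        return Math.Log(Math.Tan(Math.PI / 4.0 + lat / 2.0)) * 180.0 / Math.PI;
    }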

On an even smaller scale, landmarks around London are visible:

[Images: GPS traces around London landmarks]

The GPS data contains altitude information, so there are more interesting visualisations that could be done, including generating contour plots. Also, the above only contains three days' worth of data - with a larger data set it would be possible to determine whether activities peak at the weekend, etc.

Implementation

Acknowledgements:

The 3D plot of the globe was drawn using POV-Ray. The 3D globe model was from GrabCAD, and was converted to a POV-Ray model using PoseRay.

The UK outline was obtained from the NOAA.

The code was implemented in F# (which was a pleasure to get back to after the C++ I've been doing recently). I did try the MongoDB.FSharp library to store my data records, but they failed to deserialise from the database. In any case, I wanted more control over the data types saved: I wanted to store the GPS data as GeoJson3DGeographicCoordinates, with the first point stored separately as a GeoJson2DGeographicCoordinates with an index on that value. I could have created .NET classes and used the BSON serializer, but that seemed more effort than writing directly to the DB (and this is about the only time I've seen the benefit of C# implicit conversions, but I can live without them in F#).
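
To illustrate the shape of the documents, here's a hypothetical sketch using the MongoDB .NET driver's BsonDocument API in C# (not the F# code I actually wrote; names and values are made up):

    var collection = new MongoClient("mongodb://localhost")
        .GetDatabase("runkeeper")
        .GetCollection<BsonDocument>("activities");

    var activity = new BsonDocument
    {
        { "activityType", "RUN" },
        // first GPS point stored separately, as [longitude, latitude], so it can be indexed
        { "start", new BsonDocument { { "type", "Point" },
                                      { "coordinates", new BsonArray { -0.1275, 51.507 } } } },
        // full trace stored as [longitude, latitude, altitude] triples
        { "points", new BsonArray { new BsonArray { -0.1275, 51.507, 24.0 } } }
    };
    collection.InsertOne(activity);
    collection.Indexes.CreateOne(new CreateIndexModel<BsonDocument>(
        Builders<BsonDocument>.IndexKeys.Geo2DSphere("start")));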

Why use the Twitter API, and why not scrape the RunKeeper site directly? Because of times: the RunKeeper website displays the activity time, but it's displayed in local time, and it's not clear whether that's the user's timezone or the timezone of the activity. It seems cleaner to assume instead that the tweet was posted as soon as the activity finished, and to store that time as UTC (this assumption may of course not be true, but the results seem realistic).

Show me the Code!

The code's not in bad shape, but I would like to tidy it up a little before releasing it into the world. I'm busy with other things at the moment, but if there's enough interest I can go ahead and do that...

Wednesday, 17 July 2013

Sipping from the twitter stream

I've been playing around with connecting to Twitter's streaming API, and displaying a live stream of the tweets returned.

To do this I was originally using DotNetOpenAuth along with HttpClient, which worked fine for the sample stream, but returned authentication errors for the filtered stream. I looked at the HTTP message in Fiddler2, and the OAuth parameters weren't ordered, which Twitter requires. Instead, I'm using TwitterDoodle, which uses HttpClient.

The C# 5/.NET 4.5 async/await code is quite elegant - it's not CPU intensive, so it's fine running on the UI thread without blocking. Prior to C# 5 my first instinct would have been to use Rx for this, but now, if I'm doing something simple, I'd stick to async/await, only using Rx for something more complex like buffering or batching.

    private async void ProcessTweets()
    {
      using (var t = new TwitterConnector())
      {
        var response = await t.GetSampleFirehoseConnection();
        var res = await response.Content.ReadAsStreamAsync();
        using (var streamReader = new StreamReader(res, Encoding.UTF8))
        {
          // streamReader.EndOfStream can block, so instead check for null
          while (!cts.IsCancellationRequested)
          {
            var r = await streamReader.ReadLineAsync();
            if (r == null) { return; }
            ProcessTweetText(r);
          }
        }
      }
    }
 
    private void ProcessTweetText(string r)
    {
      if (!string.IsNullOrEmpty(r))
      {
        var tweetJToken = JsonConvert.DeserializeObject<dynamic>(r);
        var tweetObj = tweetJToken["text"];
        if (tweetObj != null)
        {
          var tweetText = tweetObj.ToString();
          viewModel.Items.Add(tweetText);
        }
      }
    }

The equivalent F# async code obviously looks quite similar, with the added goodness of Type Providers. Time permitting, I'm planning to do some more analysis of tweets, which would be a good fit for F#.

type tweetProvider = JsonProvider<"SampleTweet.json", SampleList=true>
 
type MainWindowViewModel() =
  inherit ViewModelBase()
  let items = new ObservableCollection<string>()
  member x.Items
    with get () = items
  member x.ProcessTweet tweet =
    let tweetParsed = tweetProvider.Parse(tweet)
    match tweetParsed.Text with
    | Some(v) -> x.Items.Add v
    | None -> ()
  member x.ProcessTweets =
    let a = async {
      use t = new TwitterConnector()
      let! response = t.GetSampleFirehoseConnection() |> Async.AwaitTask
      let! result = response.Content.ReadAsStreamAsync() |> Async.AwaitTask
      use streamReader = new StreamReader(result, Encoding.UTF8)
      let rec processTweetsAsync (s:StreamReader) =
        async {
          let! r = s.ReadLineAsync() |> Async.AwaitTask
          // streamReader.EndOfStream can block, so instead check for null
          if r <> null then
            x.ProcessTweet r
            return! processTweetsAsync s
        }
      do! processTweetsAsync streamReader
    }
    a |> Async.StartImmediate
  member x.GoCommand = 
    new RelayCommand ((fun canExecute -> true), 
      (fun action -> x.ProcessTweets))