I’m taking a sabbatical week over the holidays. Instead of the usual schedule, this week’s posts will be a sort of daily report of what I got up to the previous day – wish me luck that I achieve even half of what I’d like to.

[Image: Data Points, by Voxphoto via Flickr]

As I sit here slowly sipping my tea, I realize it may have been an incredibly bad idea to stay up until 8am trying to convince Haskell that I really, honestly don’t care about types as much as it seems to.

It’s really quite funny how weird a statically typed language feels after many years of dynamic languages. Yes, I know, this one is a Num and that one is a Double, figure it the fuck out man! It’s not that difficult! You’d think the hardest part about Haskell would be how strict it is about the whole functional programming thing, but no, here I am, tripping over the most basic of concepts.
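To give a taste of the sort of thing that kept tripping me up (a generic example, not my actual code): length hands back an Int, and Haskell flat out refuses to divide a Double by it until you ask for the conversion explicitly.

    -- This won't compile: length xs is an Int, and (/) wants two Doubles.
    -- average xs = sum xs / length xs

    -- You have to spell out the conversion yourself:
    average :: [Double] -> Double
    average xs = sum xs / fromIntegral (length xs)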

But! I prevailed!

I had in my hands a lovely algorithm that can, in theory, perform rudimentary predictions of how my spending will behave over the next few days. During my morning exercise I realized the implementation doesn’t actually do what I thought it did, but hey, at least I have the algorithm figured out :)

The idea is really quite simple:

  1. Smooth the data with a rolling average (a 7-day window seems to produce the nicest curve)
  2. The first unknown data point is simply the expected value (a weighted average) of the last few points
  3. Expand the weighted average window to include the new data point
  4. Calculate the next one
  5. Repeat for as long as it makes sense – the further into the future you go, the more wrong you are (there’s a rough sketch of this below)
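Since last night’s implementation apparently doesn’t do this, here’s a rough sketch of what the algorithm should look like – a reconstruction from the list above, not the actual code. The linear “newer points count for more” weighting is just a placeholder choice, and dailySpending is a made-up name for the raw input:

    -- Step 1: smooth the raw data with a rolling average over an n-day window.
    rollingAvg :: Int -> [Double] -> [Double]
    rollingAvg n xs
      | length xs < n = []
      | otherwise     = avg (take n xs) : rollingAvg n (tail xs)
      where avg ys = sum ys / fromIntegral (length ys)

    -- Steps 2-3: an expected value where newer points count for more
    -- (placeholder linear weights; anything decaying into the past would do).
    weightedAvg :: [Double] -> Double
    weightedAvg xs = sum (zipWith (*) ws xs) / sum ws
      where ws = map fromIntegral [1 .. length xs]

    -- Steps 4-5: predict k days ahead, feeding each prediction back
    -- into the window before computing the next one.
    predict :: Int -> [Double] -> [Double]
    predict 0 _  = []
    predict k xs = next : predict (k - 1) (xs ++ [next])
      where next = weightedAvg xs

So predict 5 (rollingAvg 7 dailySpending) would give five days of guesses, each built on top of the previous guesses – which is exactly why the error compounds the further out you go.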

After reading a bunch of papers on time series data mining yesterday I realized that I’m overthinking this. Sure, SVMs are the best at predicting financial time series, and people get extremely good results with backpropagation neural networks – somehow – but I honestly don’t need that complexity. I’m just making a simple tool for myself, and it’s more important to have some result than the optimal result.

And either way, according to the papers a neural network is only marginally better than the sliding window approach, and even then only when you’re dealing with data where far-away points have a lot of impact on the future and/or there is a lot of repetition – neither of which happens here.
