A generic datastore for home monitoring/automation

For a year or more I've been storing temperature readings from 3 probes several times a minute.

The probes themselves are all hooked up to a Nanode that's still in "dev mode" (i.e. a mess on my desk), albeit one of the probes is actually dangling outside and so providing useful temperature readings (which are fed into Cortex - which runs our heating - but more on that another time). The data is sent by the Nanode to a server, where some PHP code deciphers it and puts it into a MySQL database. More PHP supports graphing it, and a bit more PHP formats the data into xAP packets and sends it to Cortex.

I found that I was changing the database schema whenever I wanted to add something, and because I was using the database as an integration point, with multiple pieces/apps reading from it, I would then have to make matching changes to each part at the same time (ironically, I realise this is a pain I've been on the receiving end of in my day job for the past couple of years too).

This got me wondering whether a full relational database was really necessary, whether the real problem was using it as an integration point, and whether some of the "new-fangled nosqls" would be better - given I'm mostly just storing "a few" key/values in a relatively flat structure.

Thinking about what was needed:

  • something that would allow lots of "things" (temperature and electricity sensors mainly) to "chuck" data at it.
  • store it "simply"
  • retrieve it in a useful way (but without being locked too tightly to the application)
  • avoid needing full database libs on the data providers (Arduino/Nanode-type embedded devices)
  • avoid the current mess of several disparate components bridging between the data providers and the data store (which has grown up precisely to avoid putting database libs on the providers)

After a few posts on Automated Home I concluded a traditional DB was not such a bad thing after all - if done better - and so set about building on that to get the 'generic datastore' that I wanted.

Key decisions:

  • design the database to be as 'flat' and as generic as possible -- essentially a table with a column for value, timestamp and data-series (i.e. sensor) ID. Nothing else. No application logic, no units.
  • move the integration point to an API in front of the database -- the API layer is the only thing that accesses the database, for either writes or reads. Everything else (data providers and consumers) goes via the API
  • the API to be as generic as possible -- no methods to store a temperature reading or format watts to kilowatts - all of that has to be done by whatever calls the API
  • by default, the API and/or the database provides a timestamp on every data point stored - any comparison across sensors over a time period would be impossible without synchronised time, and I didn't want to require the data providers to always have accurate clocks
  • use json for data into and back out of the API - the current bits use some XML, some json-like-custom-format and some data-submitted-in-an-http-get. XML is a pig to decode on a low-power device and wasteful to encode/send on something with negligible RAM to work with
  • use GETs to retrieve data, POSTs to store
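To make the decisions above concrete, here's a sketch of what the json in and out might look like. The field names and URL shape are my illustrations of the idea, not the real API:

```python
import json

# Hypothetical payloads -- field names are illustrative, not the real API.
# A provider POSTs the bare minimum: which data series and what value.
store_request = json.dumps({"sensor": 3, "value": 21.4})

# No timestamp from the device: the API/database stamps it server-side,
# so clock-less embedded providers stay simple.
decoded = json.loads(store_request)
assert set(decoded) == {"sensor", "value"}

# A consumer GETs a series back (e.g. GET /data/3) and might receive:
response = json.dumps({
    "sensor": 3,
    "points": [
        {"ts": "2013-01-05T10:00:00Z", "value": 21.4},
        {"ts": "2013-01-05T10:00:20Z", "value": 21.5},
    ],
})
```

Note there's nothing about temperatures or watts anywhere - units and formatting are the caller's problem, which is what keeps the API generic.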

Whereas the server side of what I had was built on PHP/Apache/Mysql, I decided to change that too...

  • Postgres
  • Python - quick to develop, clean language, concise libraries for most of what I wanted to do
  • Flask - simple python framework - seems to "do what it says on the tin" but without lots of additional faff
  • Nginx & uwsgi - lean-ish webserver & module for running the Flask based python app

Why postgres?
I'd tried Postgres many years ago, when my reference point was Access (don't laugh), so I found it a struggle and eventually used MySQL. Also, a few times I've hit limitations in MySQL (subselects and transactions, if I remember rightly) that meant additional coding.

Why Python?
I've spent the past couple of years running a team that depends on python for some core server build tools and a number of other tools. As a result I got to know it a bit, then found it was a very quick & easy way to put stuff together. I quickly found that the "quirky" idea of making whitespace significant to the structure of the code is actually a good thing, as it makes things easier to read later.

Why Flask?
Someone mentioned it at work, so I googled it. At the time I was wondering how to use python to serve webpages for something I'd been building, so it looked interesting. I realised it would enable the datastore to be built with python instead of PHP, without the bulk of Django etc.
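Part of the appeal is how little Flask there is between a URL and a function. A minimal sketch of the datastore idea - an in-memory list stands in for the database here, and the URL scheme is my invention, not the real one:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
readings = []  # (sensor_id, value) pairs -- stand-in for the real database

@app.route("/data/<int:sensor_id>", methods=["POST"])
def store(sensor_id):
    # providers POST a small json body to store a data point
    payload = request.get_json()
    readings.append((sensor_id, payload["value"]))
    return jsonify({"stored": True}), 201

@app.route("/data/<int:sensor_id>", methods=["GET"])
def retrieve(sensor_id):
    # consumers GET json back; no units or formatting - that's the caller's job
    values = [v for sid, v in readings if sid == sensor_id]
    return jsonify({"sensor": sensor_id, "values": values})
```

In production this app object gets handed to uwsgi rather than run with Flask's built-in dev server.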

Why nginx/uwsgi?
Driven by lots of googling of how best to run python/flask beyond Flask's builtin dev server. Still not entirely convinced this is the best choice but nginx/uwsgi seems to be working at the moment.
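For what it's worth, the uwsgi side boils down to a short ini file that nginx points at via uwsgi_pass. The paths and names below are illustrative assumptions, not my actual config:

```ini
[uwsgi]
; Illustrative only -- module/socket paths are assumptions, not the real setup.
module       = datastore:app      ; the Flask "app" object in datastore.py
master       = true
processes    = 2
socket       = /tmp/datastore.sock  ; nginx forwards here: uwsgi_pass unix:/tmp/datastore.sock
chmod-socket = 664
vacuum       = true               ; clean up the socket on exit
```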

Implementation:

3 tables:

sensor data - the actual table! as few fields as possible:
- data point id (unique / primary key)
- sensor id (from sensors table)
- value
- timestamp

sensors - a unique id for each sensor, a link to its type (below), physical location and any identifying serial number / id

sensor types - a look-up that gives a type/class of sensor - e.g. electricity or temperature.
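The three tables can be sketched like this - using sqlite as a stand-in for Postgres, and with column names that are my guesses at the shape described, not the real DDL:

```python
import sqlite3

# Illustrative schema only; column names are assumptions, not the real DDL.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE sensor_types (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL                 -- e.g. 'temperature', 'electricity'
);
CREATE TABLE sensors (
    id       INTEGER PRIMARY KEY,
    type_id  INTEGER REFERENCES sensor_types(id),
    location TEXT,                     -- physical location
    serial   TEXT                      -- identifying serial number / id
);
CREATE TABLE sensor_data (             -- the actual table: as few fields as possible
    id        INTEGER PRIMARY KEY,     -- data point id
    sensor_id INTEGER REFERENCES sensors(id),
    value     REAL,
    ts        TIMESTAMP DEFAULT CURRENT_TIMESTAMP  -- stamped by the database
);
""")

db.execute("INSERT INTO sensor_types (id, name) VALUES (1, 'temperature')")
db.execute("INSERT INTO sensors (id, type_id, location) VALUES (3, 1, 'outside')")
# Note: no timestamp supplied by the "provider" -- the DEFAULT fills it in.
db.execute("INSERT INTO sensor_data (sensor_id, value) VALUES (3, 21.4)")
row = db.execute("SELECT sensor_id, value FROM sensor_data").fetchone()
```

Nothing in sensor_data knows about units or applications - that knowledge lives with whatever calls the API.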

I might yet need to add more, but having had the first use of it running for a couple of months - during which time I've been adding functionality to the API - I've not found the need to add any additional columns or tables. A good sign, I think.