The CSM XII voting has just started, and I'm hoping that I can secure your vote. Or at least a position on your voting ballot. (I'd like position one, of course ;) ) I've made some recommendations. Anyway, take a look at my post and please consider me.

market.fuzzwork.co.uk

With the introduction (a wee while ago) of the new eve market data endpoint, and the problems which surfaced at Eve Central while that was happening, I decided to create my own market data website. And so, I present market.fuzzwork.co.uk

I posted it was out a few days ago, but thought I should write a little more about it, and the process of creating it.

Background

At fanfest, in one of the round tables, I brought up (again) the idea of an endpoint to get the order book for an entire region at once. While most people just want aggregates, that’s not something that would be easy for CCP to provide for multiple items, at the same time. In part because the aggregation takes time, and because the multiple items makes caching that data a lot harder (different people asking for different lists of things). Lo and behold, a short time later, such an endpoint came into being. It’s paginated into batches of 30,000 items, which is enough for most regions, though The Forge has 9 pages.

Details

So I rented a new server (with an offshoot of OVH called kimsufi) and set about building one. It’s a PHP site, running on an NGINX server, with a PostgreSQL database, and Redis for caching the aggregates.

The downloading script I wrote in python, and after an abortive attempts at aggregating the data, first in the database itself, then breaking it out into the script to process, I settled on using Pandas to do the aggregation. The script version would have taken over an hour to process. The Pandas version runs in a few minutes. This lets me do the grabbing of data once every 30 minutes, retaining it for a couple of days; that means you can look at snapshots of an order over that timeframe, to see how it changes.

That retention brought problems of its own. Not so much in keeping the data (each grab adds a couple of hundred meg of data and indexes.) but cleaning it up. Or being specific, the effects on the import, of having deleted it. Turns out the database doesn’t like it when you’re inserting and deleting 1.7 million rows every thirty minutes. It’s down to how it stores it. I won’t get into technical details, but it went from a couple of minutes, to over 15. which impacted kind of negatively on the performance of the site. The process wasn’t taking much CPU time, but it completely pegged the disk activity at 100%, and led to the site timing out. Not good.

How to solve this issue? One way would be to get a server with SSDs. Unfortunately, these are kind of expensive, relative to the one I’m using. I’m not made of money, after all. So I put together a Patreon campaign. If you want to contribute to the upkeep of the servers, I’d appreciate a pledge, however small it is. (Small is good in fact. My expenses aren’t that high. I’d feel bad about offloading everything to someone else)

However, a thought came to me recently. I’m using Postgres, and it can partition data, based on specific conditions. So I can have the data being stored in a partition based on the order set, and just truncate out partitions as they age out. This is far more efficient than deletion, and shouldn’t impact on the import speed. It’s not fully tested yet, but it’s looking somewhat better already. It’ll increase my data retention a bit (up to 200 samples (4 days ish), rather than the 96  (2 days) I was planning) but space isn’t too much of a concern that way. And partitions allow for more efficient queries.

Future

I still need to write the API interface for it, so you can request an arbitrary number of aggregate/region combinations, but that isn’t too far out. And I’ll provide a sample method for importing into google. In addition, I’m providing the downloaded orderbook as a gzipped csv, for an undetermined period of time (it’s all dependant on space, tbh. and zipped they’re not huge)

I also need to decide how I’m going to winnow the aggregates. The API will only allow for ‘live’ data, as that’s all I’m keeping in redis, but all the data is also being kept in a database table. I’ll likely add a page so you can see how they change over time. To keep the table from becoming an absolute monster, I’m thinking I’ll keep only a week at 30 minute intervals, then averages for the day, if you want to see over a longer time frame.

In case you’re interested, I’m generating a weighted average (price* volume summed up, then divided by the total volume), the max, the min, and a weighted average of the lowest/highest 5% of the orders (lowest for sell, highest for buy) ignoring any price more than 100 times the lowest/highest price. It’ll help when someone puts in a buy order for a million Gilas, at 0.01 isk. Which otherwise completely screws with the average prices.

Other

Oh, and I put together a very simple fleet tracker. Doesn’t do anything other than show you where fleet members are, and what they’re flying. Needs a CREST privilege, and a link you get from the fleet window. And for you to be fleet boss. I may expand it a little in the future. Maybe for invites too. Though that would require me to store the access and refresh token somewhere other than the user’s session.

As normal, all my code is on my github. Might be a little behind the release, but that’s purely down to me not committing it yet.

I’m open to suggestions, if you have them.

Anyway, Steve signing out.

market.fuzzwork.co.uk by Fuzzwork Enterprises is licensed under a Creative Commons Attribution 4.0 International License.

  • zyzyx

    Have you tried ElasticSearch DB in place of Postgres? You can create a new index for each snapshot (or day) and then just drop it when no longer needed. Plus, you can store the data as JSON and it’s all magically full-text searchable!

    • zyzyx

      It could even let you get rid of Redis and do the aggregations in the DB in real time (ElasticSearch supports aggregations of some sort, never used it myself though). You can scale the DB to multiple servers without much effort for major performance gains (map-reduce). Oh yeah, and the DB has a native REST API. 🙂