
Getting all market aggregates from Market.fuzzwork.co.uk – Excel

This is for Excel only I’m afraid. And only 2013 and up. Google Sheets just won’t load the size of file that this uses.

On https://market.fuzzwork.co.uk I generate aggregate data for all items sold on the market. This includes things like the average price, minimum price, maximum price, and the average price if you were to buy the cheapest 5% of the market (handy for evening out any outliers). This is available via an API, which is documented on the site. There's a function for Google Sheets, and instructions on how to use it with Power Query in Excel.
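If you only want a handful of items, the API route is probably easier than the full dump. Here's a rough sketch of calling it from Python with requests; the region/types parameters and the buy/sell/min/max field names are my reading of the documentation on the site, so check there for the exact form.

import requests

# Sketch only: parameters and response field names are taken from the docs
# and may differ slightly - the site documentation is authoritative.
resp = requests.get(
    "https://market.fuzzwork.co.uk/aggregates/",
    params={"region": 10000002, "types": "34,35,36"},  # The Forge; Tritanium, Pyerite, Mexallon
)
resp.raise_for_status()
data = resp.json()

# Each type id maps to buy and sell blocks of aggregate figures.
for type_id, sides in data.items():
    print(type_id, sides["sell"]["min"], sides["buy"]["max"])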

However, this isn't wonderful if you want to get everything. To that end, I've started dumping out a file with all the data in it. This is updated once every 30 minutes (or so), and is about 24MB uncompressed. Google Sheets chokes on it, but Excel handles it OK, so that's what I'm going to talk about here. I use Excel 2016, so this is focused towards that. I think it'll work with 2013, but I have no way to test that.

The URL for the data is https://market.fuzzwork.co.uk/aggregatecsv.csv.gz and it's a gzipped CSV, with headers. The first column is a combined key made up of the region id, the typeid, and whether it's a buy or sell order, separated by pipes.
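If Excel isn't your thing, the same dump loads easily into pandas. A minimal sketch, assuming a reasonably recent pandas that can fetch and decompress straight from a URL; the column names come from the Power Query step further down.

import pandas as pd

# Fetch and decompress the full aggregate dump (about 24MB uncompressed).
df = pd.read_csv(
    "https://market.fuzzwork.co.uk/aggregatecsv.csv.gz",
    compression="gzip",
)

# The "what" column is the combined key: regionid|typeid|true-for-buy.
# e.g. Tritanium sell orders in The Forge:
print(df[df["what"] == "10000002|34|false"])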

To load this into Excel, create a new sheet. On that sheet, create a new query (Data ribbon, Get Data, From Other Sources, Blank Query). In the Advanced Editor, you want to paste in:
let
Source = Binary.Decompress(Web.Contents("http://market.fuzzwork.co.uk/aggregatecsv.csv.gz"),Compression.GZip),
#"Imported CSV" = Csv.Document(Source,[Delimiter=",", Columns=10, Encoding=1252, QuoteStyle=QuoteStyle.None]),
#"Promoted Headers" = Table.PromoteHeaders(#"Imported CSV", [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"what", type text}, {"weightedaverage", type number}, {"maxval", type number}, {"minval", type number}, {"stddev", type number}, {"median", type number}, {"volume", type number}, {"numorders", Int64.Type}, {"fivepercent", type number}, {"orderSet", Int64.Type}})
in
#"Changed Type"

That should then load all the aggregate data into a sheet for you. Call that sheet something appropriate, like marketdata.

Then to query it:

=VLOOKUP("10000002|34|true",marketdata!$A:$J,9,FALSE)

Adjust the sheet name to whatever you called it. You can assemble the lookup value however you like: region ID, typeID, and whether you want buy or sell orders, joined with pipes. True for buy, false for sell.

Something like

=VLOOKUP("10000002|"&A1&"|false",marketdata!$A:$J,9,FALSE)

works, where A1 contains the typeid you're interested in.

market.fuzzwork.co.uk

With the introduction (a wee while ago) of the new EVE market data endpoint, and the problems which surfaced at EVE Central while that was happening, I decided to create my own market data website. And so, I present market.fuzzwork.co.uk.

I posted that it was out a few days ago, but thought I should write a little more about it, and the process of creating it.

Background

At Fanfest, in one of the round tables, I brought up (again) the idea of an endpoint to get the order book for an entire region at once. While most people just want aggregates, that's not something that would be easy for CCP to provide for multiple items at the same time, in part because the aggregation takes time, and in part because letting people ask for arbitrary lists of items makes caching the data a lot harder (different people asking for different lists of things). Lo and behold, a short time later, such an endpoint came into being. It's paginated into batches of 30,000 items, which is enough for most regions, though The Forge takes 9 pages.
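To give an idea of what consuming that looks like, here's a rough Python sketch of walking the pages. It uses the modern ESI URL rather than the endpoint this post is talking about, so treat the URL and page size as illustrative, not as what my downloader actually calls.

import requests

def fetch_region_orders(region_id):
    """Pull every order in a region, one page at a time."""
    orders, page = [], 1
    while True:
        resp = requests.get(
            "https://esi.evetech.net/latest/markets/%d/orders/" % region_id,
            params={"order_type": "all", "page": page},
        )
        resp.raise_for_status()
        orders.extend(resp.json())
        # The X-Pages header says how many pages the region has in total.
        if page >= int(resp.headers.get("X-Pages", "1")):
            return orders
        page += 1

print(len(fetch_region_orders(10000002)))  # The Forge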

Details

So I rented a new server (with an offshoot of OVH called kimsufi) and set about building one. It’s a PHP site, running on an NGINX server, with a PostgreSQL database, and Redis for caching the aggregates.

I wrote the downloading script in Python, and after a couple of abortive attempts at aggregating the data, first in the database itself, then in plain Python in the script, I settled on using Pandas to do the aggregation. The plain-script version would have taken over an hour to process; the Pandas version runs in a few minutes. That lets me grab the data once every 30 minutes and retain it for a couple of days, which means you can look at snapshots of an order over that timeframe, to see how it changes.
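The actual script is on my GitHub, but the shape of the Pandas aggregation is roughly the sketch below. The input column names here are invented for the example; the output columns match the headers in the dump.

import pandas as pd

def aggregate(orders: pd.DataFrame) -> pd.DataFrame:
    """Summarise an order snapshot per region/type/buy-or-sell."""
    def summarise(group):
        total_volume = group["volume"].sum()
        return pd.Series({
            "weightedaverage": (group["price"] * group["volume"]).sum() / total_volume,
            "maxval": group["price"].max(),
            "minval": group["price"].min(),
            "stddev": group["price"].std(),
            "median": group["price"].median(),
            "volume": total_volume,
            "numorders": len(group),
        })

    return orders.groupby(["region_id", "type_id", "is_buy_order"]).apply(summarise)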

That retention brought problems of its own. Not so much in keeping the data (each grab adds a couple of hundred meg of data and indexes) but in cleaning it up. Or, to be specific, the effect that deleting old data had on the import. It turns out the database doesn't like you inserting and deleting 1.7 million rows every thirty minutes; it's down to how it stores things. I won't get into the technical details, but the import went from a couple of minutes to over 15, which impacted rather negatively on the performance of the site. The process wasn't taking much CPU time, but it completely pegged the disk activity at 100% and led to the site timing out. Not good.

How to solve this issue? One way would be to get a server with SSDs. Unfortunately, these are kind of expensive, relative to the one I'm using, and I'm not made of money, after all. So I put together a Patreon campaign. If you want to contribute to the upkeep of the servers, I'd appreciate a pledge, however small it is. (Small is good, in fact. My expenses aren't that high, and I'd feel bad about offloading everything onto someone else.)

However, a thought came to me recently. I'm using Postgres, and it can partition data based on specific conditions. So I can have the data stored in a partition per order set, and just truncate partitions as they age out. This is far more efficient than deletion, and shouldn't impact the import speed. It's not fully tested yet, but it's looking somewhat better already. It'll increase my data retention a bit (up to 200 samples, about 4 days, rather than the 96, or 2 days, I was planning), but space isn't too much of a concern that way. And partitions allow for more efficient queries.
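For the curious, the idea is something like the sketch below. On the Postgres version I'm running, that means inheritance-style child tables rather than the declarative partitioning newer versions have, and the table and column names here are made up for illustration.

import psycopg2

conn = psycopg2.connect("dbname=market")

def add_partition(order_set):
    # One child table per order set; the CHECK constraint lets the
    # planner skip it when queries filter on a different order set.
    with conn, conn.cursor() as cur:
        cur.execute(
            "CREATE TABLE IF NOT EXISTS orders_%d ("
            "  CHECK (order_set = %d)"
            ") INHERITS (orders)" % (order_set, order_set)
        )

def age_out(order_set):
    # TRUNCATE throws the rows away in one cheap step, instead of
    # DELETEing 1.7 million rows out of one big table and vacuuming after.
    with conn, conn.cursor() as cur:
        cur.execute("TRUNCATE TABLE orders_%d" % order_set)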

Future

I still need to write the API interface for it, so you can request an arbitrary number of aggregate/region combinations, but that isn't too far out. And I'll provide a sample method for importing into Google Sheets. In addition, I'm providing the downloaded orderbook as a gzipped CSV, for an undetermined period of time (it's all dependent on space, to be honest, and zipped they're not huge).

I also need to decide how I'm going to winnow the aggregates. The API will only allow for 'live' data, as that's all I'm keeping in Redis, but all the data is also being kept in a database table. I'll likely add a page so you can see how they change over time. To keep the table from becoming an absolute monster, I'm thinking I'll keep only a week at 30-minute intervals, then daily averages if you want to see a longer time frame.
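To make that concrete, the downsampling could look something like this. A sketch only, with an assumed sampled_at timestamp column alongside the what key from the dump.

import pandas as pd

def winnow(history: pd.DataFrame, now: pd.Timestamp) -> pd.DataFrame:
    """Keep a week of 30-minute snapshots; collapse older rows to daily means."""
    cutoff = now - pd.Timedelta(days=7)
    recent = history[history["sampled_at"] >= cutoff]
    older = history[history["sampled_at"] < cutoff]
    daily = (
        older
        .groupby(["what", pd.Grouper(key="sampled_at", freq="D")])
        .mean(numeric_only=True)
        .reset_index()
    )
    return pd.concat([daily, recent], ignore_index=True)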

In case you're interested, I'm generating a weighted average (price × volume summed up, then divided by the total volume), the max, the min, and a weighted average of the lowest/highest 5% of the orders (lowest for sell, highest for buy), ignoring any price more than 100 times the lowest/highest price. That helps when someone puts in a buy order for a million Gilas at 0.01 ISK, which otherwise completely screws with the average prices.
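As a worked example for the sell side, that "five percent" figure comes out to something like the sketch below (column names assumed; the buy side is the mirror image at the top of the book).

import pandas as pd

def five_percent_sell(orders: pd.DataFrame) -> float:
    """Volume-weighted average of the cheapest 5% of sell volume."""
    # Ignore anything priced at more than 100x the minimum.
    sane = orders[orders["price"] <= orders["price"].min() * 100]
    sane = sane.sort_values("price")
    # Take orders until we've covered 5% of the remaining volume.
    target = sane["volume"].sum() * 0.05
    cheapest = sane[sane["volume"].cumsum() <= target]
    if cheapest.empty:  # fall back to the single cheapest order
        cheapest = sane.head(1)
    return (cheapest["price"] * cheapest["volume"]).sum() / cheapest["volume"].sum()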

Other

Oh, and I put together a very simple fleet tracker. Doesn’t do anything other than show you where fleet members are, and what they’re flying. Needs a CREST privilege, and a link you get from the fleet window. And for you to be fleet boss. I may expand it a little in the future. Maybe for invites too. Though that would require me to store the access and refresh token somewhere other than the user’s session.

As normal, all my code is on my GitHub. It might be a little behind the release, but that's purely down to me not committing it yet.

I’m open to suggestions, if you have them.

Anyway, Steve signing out.