Host a planet-scale geocoder for $10/month

About a month ago I began work on a new geocoder (demo), a search engine for places and addresses. I wanted to make something very inexpensive to run. A big barrier to entry for hosting a planet-scale Headway instance is the geocoder. Right now we’re using Pelias, which is great at what it does but runs on Elasticsearch, which doesn’t do well with less than 8GB of RAM. I’ve been poking at this problem off and on for years and didn’t expect to get anything working, but much to my surprise things shaped up pretty quickly.

I was able to cobble together a mediocre address parser based on nom, drawing an immense amount of inspiration from the Pelias parser. Armed with an okay address parser, I turned to tantivy as a search engine, thinking I’d take advantage of its ability to memory map the search index. After a bit of digging, I found tantivy-wasm, which runs in the browser and issues range queries to fetch bits of the index as needed. When I saw that, the gears really started turning in my head. I didn’t want to fork a search engine library, so I implemented my own backing store for mainline tantivy, using anonymous memory maps and userfaultfd to fetch chunks of the index from object storage on demand via range queries. It worked, and after a bit of tuning the latency is getting into pretty acceptable ranges: generally around 1-3 seconds, and I’ve seen it as fast as 2ms for a simple query if the cache is hot.
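
To make the "fetch chunks on demand" idea a bit more concrete, here is a rough sketch of the bookkeeping involved. It is not the actual backing store: it leaves out the memory map and userfaultfd entirely and only shows how a byte range of the index might be mapped to fixed-size chunks, how each chunk translates to an HTTP Range header, and how a cache avoids refetching. The 1 MiB chunk size and the names `chunks_for`, `range_header`, and `ChunkCache` are made up for illustration, and the fetch step is a plain closure standing in for a real object storage request.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Hypothetical chunk size; the real granularity is not stated in the post.
const CHUNK_SIZE: u64 = 1 << 20; // 1 MiB

/// Indices of the fixed-size chunks that overlap a byte range of the index.
fn chunks_for(range: &Range<u64>) -> Range<u64> {
    if range.start >= range.end {
        return 0..0; // empty read touches no chunks
    }
    let first = range.start / CHUNK_SIZE;
    let last = (range.end - 1) / CHUNK_SIZE;
    first..last + 1
}

/// HTTP Range header value for one chunk. HTTP byte ranges are inclusive,
/// and the final chunk is clamped to the file length.
/// Assumes the chunk starts before `file_len`.
fn range_header(chunk: u64, file_len: u64) -> String {
    let start = chunk * CHUNK_SIZE;
    let end = (start + CHUNK_SIZE).min(file_len) - 1;
    format!("bytes={}-{}", start, end)
}

/// In-memory chunk cache. `fetch` stands in for a ranged GET against
/// object storage and returns the bytes of the requested chunk.
struct ChunkCache<F: FnMut(u64) -> Vec<u8>> {
    chunks: HashMap<u64, Vec<u8>>,
    fetch: F,
    fetches: usize, // number of times we had to go to storage
}

impl<F: FnMut(u64) -> Vec<u8>> ChunkCache<F> {
    fn new(fetch: F) -> Self {
        ChunkCache { chunks: HashMap::new(), fetch, fetches: 0 }
    }

    /// Read a byte range, fetching only the chunks that are not cached yet.
    fn read(&mut self, range: Range<u64>) -> Vec<u8> {
        let mut out = Vec::with_capacity(range.end.saturating_sub(range.start) as usize);
        for chunk in chunks_for(&range) {
            if !self.chunks.contains_key(&chunk) {
                let data = (self.fetch)(chunk);
                self.fetches += 1;
                self.chunks.insert(chunk, data);
            }
            let data = &self.chunks[&chunk];
            let base = chunk * CHUNK_SIZE;
            // Intersect the requested range with the bytes this chunk holds.
            let lo = range.start.max(base);
            let hi = range.end.min(base + data.len() as u64);
            if lo < hi {
                out.extend_from_slice(&data[(lo - base) as usize..(hi - base) as usize]);
            }
        }
        out
    }
}
```

A read that straddles a chunk boundary costs two fetches the first time and none on repeat, which is roughly why a hot cache makes such a difference to latency.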

Fly.io charges me about $3/mo for the machine it runs on, and just under $7/mo for the 320ish gigabytes of data in object storage, so this has been a very affordable project. I’m pretty happy with the results for the price. I’m planning on extending it to handle more locales and use cases over the coming months. I made a demo site where you can play with it, but don’t expect miracles. You get what you pay for, and I haven’t indexed OpenAddresses for the demo site, so if something isn’t in OpenStreetMap I definitely do not have it in the index. The code is all open source.

Cheers ✨✨