geomermaids · GeoParquet
Free daily snapshots of OpenStreetMap data for North America, packaged so you can query them straight from a URL — no downloads, no account, no setup beyond DuckDB.
What is this?
OpenStreetMap (OSM) is the world's free, collaborative map — think of it as Wikipedia for geography. It has every road, building, park, restaurant, subway stop, and hiking trail that anyone has bothered to map. All of it free to use.
The hard part has always been getting OSM into a form you can analyze. The raw data is a single giant file (~200 GB for the whole planet), in a specialised format, not designed for the kind of queries you'd run in SQL or pandas.
This site publishes that same OSM data in GeoParquet 2.0 — a modern, open, columnar file format — split by country, by state/province/region, and by theme (buildings, roads, water, etc.). Every file is small enough to use on a laptop, and the format is cloud-native: your query engine can read just the columns and rows it needs over HTTP, without downloading the whole file.
Bottom line: if you want to answer a question like "how many coffee shops are within 500 metres of Times Square?" or "what's the total building footprint of New Jersey?", you can do it in one SQL query against a URL. You never store the data locally unless you want to.
Coverage
Currently spans three countries — 98 admin regions × 16 themes = 1,568 files per daily snapshot, roughly 15 GB altogether:
- United States — 50 states, the District of Columbia, Puerto Rico, and US Virgin Islands (53 regions)
- Canada — 10 provinces and 3 territories (13 regions)
- Mexico — 31 states and Ciudad de México (32 regions)
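The totals above follow directly from the per-country region counts. A quick check in Python; the average file size is only a rough derived figure, assuming the approximate 15 GB total is in decimal gigabytes:

```python
# Region counts per country, as listed above.
regions = {"US": 53, "CA": 13, "MX": 32}
themes = 16

total_regions = sum(regions.values())         # 98
files_per_snapshot = total_regions * themes   # 1,568

# Rough average size per file, assuming ~15 GB (15,000 MB) per snapshot.
avg_mb = 15_000 / files_per_snapshot

print(f"{total_regions} regions x {themes} themes = {files_per_snapshot:,} files")
print(f"about {avg_mb:.1f} MB per file on average")
```

Individual files will vary widely around that average: a buildings file for a large state is far bigger than a power file for a small territory.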
New snapshots publish every 24 hours; each one is dated and immutable, so analysis you run today is reproducible next year against the same bytes. Europe, South America, and the rest of the world are on the roadmap. If you need a specific region sooner, get in touch at contact@geomermaids.com.
What can I do with it?
Urban planning & housing
Every building footprint in a city, with typed columns for building type, number of levels, and height. Compute total floor area, detect neighborhoods with unusual density, or map new construction over time.
Retail & site selection
Every shop, restaurant, amenity, and point of interest, already split into pois (points) and amenities_polygons. Find the 5 closest competitors to a proposed site, or count cafes per ZIP code.
Transportation & logistics
Full road network (roads), rail (railways), public transit stops (public_transport), and airports (aeroways). Build route networks, estimate corridor coverage, or map intersections per square kilometre.
Environment & natural resources
Rivers and canals (waterways), lakes and oceans (water), forests and protected areas (natural_areas), and land use zoning (landuse). Compute watershed areas, estimate forest cover, or overlay with any of the above.
Public infrastructure
Power grid (power) including lines and substations, administrative boundaries (boundaries), and fencing / walls (barriers). Useful for utilities, compliance, and mapping government boundaries.
Research & journalism
Reproducible datasets for stories and studies — snapshots are dated and immutable, so your analysis can be rerun next year with today's data. The full pipeline is open-source if you want to verify what's in each file.
Try it now
Everything below runs against the live URL. Install DuckDB (single binary, no daemon, free) and paste a snippet. Only the bytes your query actually needs get fetched from Cloudflare's edge.
1 · Setup
INSTALL only needs to run once per machine; LOAD is needed once per session. The url() macro keeps later snippets readable.
INSTALL httpfs; LOAD httpfs;
INSTALL spatial; LOAD spatial;
CREATE OR REPLACE MACRO url(state, theme) AS
'https://parquetry.geomermaids.com/latest/country=' ||
split_part(state, '-', 1) || '/state=' || state || '/' || theme || '.parquet';
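For readers scripting outside DuckDB, the same string assembly can be sketched in Python. `build_url` is a hypothetical helper name, not part of any published client; it mirrors the macro above and adds an optional snapshot argument (a YYYY-MM-DD date or `latest`), following the URL pattern this page documents.

```python
BASE = "https://parquetry.geomermaids.com"

def build_url(state: str, theme: str, snapshot: str = "latest") -> str:
    """Assemble a file URL the same way the url() macro does.

    The country code is the part of the ISO region code before the
    hyphen, e.g. 'US-NY' -> 'US'.
    """
    country = state.split("-", 1)[0]
    return f"{BASE}/{snapshot}/country={country}/state={state}/{theme}.parquet"

print(build_url("US-NY", "buildings"))
print(build_url("CA-ON", "roads", snapshot="2026-04-19"))
```

The resulting strings can be passed to any Parquet reader that accepts HTTPS URLs.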
2 · How many buildings in a bounding box?
A rectangle around Midtown Manhattan. DuckDB fetches only the row groups that overlap the rectangle; the rest of the file is never transferred.
SELECT COUNT(*) AS buildings
FROM read_parquet(url('US-NY', 'buildings'))
WHERE ST_Intersects(geometry, ST_MakeEnvelope(-74.00, 40.74, -73.96, 40.77));
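To make the pruning idea concrete, here is a toy Python sketch of the overlap test that can be applied to per-row-group bounding boxes. The three row-group extents are invented for illustration and are not read from any real file; how a given engine actually uses the statistics is up to that engine.

```python
from typing import NamedTuple

class BBox(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float

def overlaps(a: BBox, b: BBox) -> bool:
    """True when two axis-aligned rectangles share at least one point."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)

# Query rectangle from the Midtown example (lon/lat).
query = BBox(-74.00, 40.74, -73.96, 40.77)

# Hypothetical row-group extents, made up for this example.
row_groups = {
    0: BBox(-79.80, 42.00, -78.50, 43.30),  # far upstate
    1: BBox(-74.05, 40.70, -73.97, 40.76),  # overlaps Midtown
    2: BBox(-73.95, 40.78, -73.70, 41.00),  # north-east of the query
}

# Only row groups whose extent overlaps the query need to be fetched.
to_read = [rg for rg, box in row_groups.items() if overlaps(query, box)]
print(to_read)
```

In this made-up layout only row group 1 would be read; the other two can be skipped from their statistics alone.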
3 · Tallest residential towers in a neighborhood
Reads only the columns the query touches (the four selected plus geometry for the filter), not the whole 20-column file, and applies a spatial filter at the same time:
SELECT name, building, levels, height
FROM read_parquet(url('US-NY', 'buildings'))
WHERE building = 'residential'
AND levels > 30
AND ST_Intersects(geometry, ST_MakeEnvelope(-73.99, 40.76, -73.95, 40.79))
ORDER BY levels DESC
LIMIT 10;
4 · Match restaurants to the buildings they sit inside
A spatial join between two theme files in Massachusetts, using ST_Contains:
SELECT b.name AS building, p.name AS restaurant, p.cuisine
FROM read_parquet(url('US-MA', 'buildings')) b
JOIN read_parquet(url('US-MA', 'pois')) p
ON ST_Contains(b.geometry, p.geometry)
WHERE p.amenity = 'restaurant'
AND b.name IS NOT NULL
LIMIT 10;
5 · Nearest features to a point
Amenities within roughly 500 m of Times Square, sorted by distance. Coordinates are longitude/latitude, so the 0.005 radius and the dist column are in degrees, not metres:
WITH origin AS (SELECT ST_Point(-73.9855, 40.7580) AS pt)  -- Times Square (lon, lat)
SELECT p.amenity, p.name, ST_Distance(p.geometry, o.pt) AS dist
FROM read_parquet(url('US-NY', 'pois')) p, origin o
WHERE ST_DWithin(p.geometry, o.pt, 0.005)
ORDER BY dist
LIMIT 15;
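Because the distance threshold in the query is expressed in degrees, it helps to see what a radius in metres corresponds to. The sketch below uses a spherical-Earth approximation (mean radius 6,371 km) and is only meant for rough conversions; `metres_to_degrees` is a hypothetical helper, and for precise distances a projected coordinate system or a geodesic function is a better fit.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation

def metres_to_degrees(metres: float, latitude: float) -> tuple[float, float]:
    """Return (degrees of longitude, degrees of latitude) spanning `metres`.

    A degree of longitude shrinks with the cosine of the latitude, so
    the same ground distance covers more degrees east-west.
    """
    deg_lat = math.degrees(metres / EARTH_RADIUS_M)
    deg_lon = deg_lat / math.cos(math.radians(latitude))
    return deg_lon, deg_lat

deg_lon, deg_lat = metres_to_degrees(500, latitude=40.758)
print(f"500 m is about {deg_lat:.4f} deg of latitude")
print(f"and about {deg_lon:.4f} deg of longitude at 40.758 N")
```

By this estimate, a 0.005-degree radius reaches a little over 550 m north-south but only about 420 m east-west at New York's latitude, so a degree-based radius is slightly elliptical on the ground.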
6 · Compare across states
Pass a list of URLs, group by state:
SELECT state_iso, COUNT(*) AS pois
FROM read_parquet([
url('US-NY', 'pois'),
url('US-MA', 'pois'),
url('US-CT', 'pois'),
url('US-RI', 'pois')
])
GROUP BY state_iso
ORDER BY pois DESC;
7 · Same data from Python
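If DuckDB isn't your thing, GeoPandas can read the same files. Called like this, it reads the entire file rather than a slice, so a small state such as Rhode Island is a good first test: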
import geopandas as gpd
gdf = gpd.read_parquet(
"https://parquetry.geomermaids.com/latest/country=US/state=US-RI/buildings.parquet"
)
print(f"{len(gdf):,} Rhode Island buildings")
print(gdf[["building", "name", "height"]].head())
Themes
Each admin region is split into 16 thematic files, each with typed columns promoted from OSM tags:
| theme | geometry | typed columns (excerpt) |
|---|---|---|
| buildings | polygon | building, name, levels, height, addr_* |
| roads | linestring | highway, ref, oneway, surface, maxspeed, lanes |
| railways | point, linestring | railway, name, operator, gauge, electrified |
| waterways | linestring | waterway, name, width, intermittent, tunnel |
| water | polygon | water, natural, name, intermittent, salt |
| landuse | polygon | landuse, name, operator |
| natural_areas | polygon | natural, name, wetland |
| natural_features | point | natural, name, ele, prominence |
| places | point | place, name, population, admin_level, capital |
| boundaries | polygon | boundary, admin_level, name, iso3166_* |
| pois | point | amenity, shop, tourism, leisure, office, healthcare |
| amenities_polygons | polygon | amenity, shop, tourism, leisure, brand |
| power | point, linestring, polygon | power, name, voltage, frequency, operator |
| aeroways | point, linestring, polygon | aeroway, name, iata, icao, ref, surface |
| barriers | point, linestring | barrier, name, access, height, material |
| public_transport | point | public_transport, highway, railway, name, operator |
Every file also ships the full OSM tags as a MAP<VARCHAR, VARCHAR> — anything not promoted to a typed column is still there, just a tags['key'] lookup away.
URL pattern
https://parquetry.geomermaids.com/<YYYY-MM-DD | latest>/country=<CC>/state=<ISO>/<theme>.parquet
A few concrete examples:
# Every building in New York State (latest snapshot)
https://parquetry.geomermaids.com/latest/country=US/state=US-NY/buildings.parquet
# Ontario's entire road network
https://parquetry.geomermaids.com/latest/country=CA/state=CA-ON/roads.parquet
# Points of interest in Jalisco, Mexico
https://parquetry.geomermaids.com/latest/country=MX/state=MX-JAL/pois.parquet
# Pinned to a specific date — immutable, reproducible
https://parquetry.geomermaids.com/2026-04-19/country=US/state=US-CA/waterways.parquet
# Machine-readable index of every snapshot ever published
https://parquetry.geomermaids.com/snapshots.json
Dated snapshots are immutable — safe to pin in reproducible pipelines. The latest/ alias always resolves to the most recent.
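The pattern is regular enough to take apart with a regular expression, which can be handy when logging which snapshot a pipeline actually read. A minimal sketch; `parse_url` and the exact regex are illustrative assumptions based only on the pattern and examples shown above.

```python
import re
from datetime import date

PATTERN = re.compile(
    r"^https://parquetry\.geomermaids\.com/"
    r"(?P<snapshot>latest|\d{4}-\d{2}-\d{2})/"
    r"country=(?P<country>[A-Z]{2})/"
    r"state=(?P<state>[A-Z]{2}-[A-Z0-9]+)/"
    r"(?P<theme>[a-z_]+)\.parquet$"
)

def parse_url(url: str) -> dict:
    """Split a file URL into snapshot, country, state, and theme."""
    m = PATTERN.match(url)
    if m is None:
        raise ValueError(f"not a recognised file URL: {url}")
    parts = m.groupdict()
    pinned = parts["snapshot"] != "latest"
    if pinned:
        # Raises ValueError for impossible dates such as 2026-02-31.
        date.fromisoformat(parts["snapshot"])
    parts["pinned"] = pinned
    return parts

print(parse_url(
    "https://parquetry.geomermaids.com/2026-04-19/country=US/state=US-CA/waterways.parquet"
))
```

A pipeline that must be reproducible could refuse to run when `pinned` is false, since `latest/` points at different bytes from one day to the next.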
Technical details (for the curious)
- GeoParquet 2.0 with the native Parquet GEOMETRY logical type (not WKB + sidecar metadata), so row-group statistics work properly.
- Hilbert-ordered rows so bounding-box queries read contiguous row groups; DuckDB / Sedona / Spark all prune efficiently.
- ZSTD compression, 50,000 rows per row group, tuned for HTTP range-request workloads.
- Hive partitioning by country= and state=, so query engines prune entire regions before reading any file.
- Daily snapshots, immutable once published. A latest/ alias is kept current.
- Built with osmium-tool + DuckDB. Full pipeline source (MIT).
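To illustrate why Hilbert ordering helps, the toy sketch below maps cells of a small grid to their position along a Hilbert curve using the classic bit-manipulation algorithm. It is not the pipeline's implementation, which is not shown here; it only demonstrates that consecutive positions along the curve are always neighbouring cells, which is what lets a bounding-box query land on a few contiguous row groups.

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a Hilbert curve over an n x n grid.

    n must be a power of two; x and y are integers in [0, n).
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Sort every cell of a 4 x 4 grid by its position along the curve.
n = 4
cells = sorted(
    ((x, y) for x in range(n) for y in range(n)),
    key=lambda c: hilbert_index(n, *c),
)
print(cells)
```

In a real file the grid cells would be replaced by feature coordinates scaled to integers, and rows sorted by the resulting index before being written out in row groups.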
Custom regions, schemas, or SLA-backed hosting
The hosted snapshots here are an opinionated default: North America, 16 fixed themes, one size fits all. Also available on request:
- Other regions (Europe, a specific country, a custom polygon)
- Different themes or extra columns promoted from OSM tags
- Per-customer cadence (hourly, live replication diffs)
- SLA-backed freshness & availability guarantees
- Multi-region S3 mirrors or cross-account delivery
Get in touch: contact@geomermaids.com