geomermaids · GeoParquet

Free daily snapshots of OpenStreetMap data for North America, packaged so you can query them straight from a URL — no downloads, no account, no setup beyond DuckDB.

What is this?

OpenStreetMap (OSM) is the world's free, collaborative map — think of it as Wikipedia for geography. It has every road, building, park, restaurant, subway stop, and hiking trail that anyone has bothered to map. All of it free to use.

The hard part has always been getting OSM into a form you can analyze. The raw data is a single giant file (~200 GB for the whole planet), in a specialised format, not designed for the kind of queries you'd run in SQL or pandas.

This site publishes that same OSM data in GeoParquet 2.0 — a modern, open, columnar file format — split by country, by state/province/region, and by theme (buildings, roads, water, etc.). Every file is small enough to use on a laptop, and the format is cloud-native: your query engine can read just the columns and rows it needs over HTTP, without downloading the whole file.

Bottom line: if you want to answer a question like "how many coffee shops are within 500 metres of Times Square?" or "what's the total building footprint of New Jersey?", you can do it in one SQL query against a URL. You never store the data locally unless you want to.

Coverage

Coverage currently spans three countries (the United States, Canada, and Mexico): 98 admin regions × 16 themes = 1,568 files per daily snapshot, roughly 15 GB altogether.
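As a rough sanity check on the "small enough for a laptop" claim, the figures above imply an average file size. This is only an average derived from the numbers quoted here; individual files will vary widely (a large state's buildings file will be far bigger than a small region's aeroways file).

```python
# Figures quoted in the Coverage paragraph
regions = 98
themes = 16
snapshot_gb = 15  # "roughly 15 GB", so the result is approximate

files = regions * themes
avg_mb = snapshot_gb * 1000 / files  # decimal units: 1 GB = 1000 MB

print(f"{files:,} files, about {avg_mb:.1f} MB each on average")
```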

New snapshots publish every 24 hours; each one is dated and immutable, so analysis you run today is reproducible next year against the same bytes. Europe, South America, and the rest of the world are on the roadmap. If you need a specific region sooner, get in touch: contact@geomermaids.com

What can I do with it?

Urban planning & housing

Every building footprint in a city, with typed columns for building type, number of levels, and height. Compute total floor area, detect neighborhoods with unusual density, or map new construction over time.

Retail & site selection

Every shop, restaurant, amenity, and point of interest, already split into pois (points) and amenities_polygons. Find the 5 closest competitors to a proposed site, or count cafes per ZIP code.

Transportation & logistics

Full road network (roads), rail (railways), public transit stops, and airports (aeroways). Build route networks, estimate corridor coverage, or map intersections per square kilometre.

Environment & natural resources

Rivers and canals (waterways), lakes and oceans (water), forests and protected areas (natural_areas), and land use zoning (landuse). Compute watershed areas, estimate forest cover, or overlay with any of the above.

Public infrastructure

Power grid (power) including lines and substations, administrative boundaries (boundaries), and fencing / walls (barriers). Useful for utilities, compliance, and mapping government boundaries.

Research & journalism

Reproducible datasets for stories and studies — snapshots are dated and immutable, so your analysis can be rerun next year with today's data. The full pipeline is open-source if you want to verify what's in each file.

Try it now

Everything below runs against the live URL. Install DuckDB (single binary, no daemon, free) and paste a snippet. Only the bytes your query actually needs get fetched from Cloudflare's edge.

1 · Setup

INSTALL downloads each extension once per machine; LOAD is needed once per session. The url() macro keeps later snippets readable.

INSTALL httpfs; LOAD httpfs;
INSTALL spatial; LOAD spatial;

CREATE OR REPLACE MACRO url(state, theme) AS
  'https://parquetry.geomermaids.com/latest/country=' ||
  split_part(state, '-', 1) || '/state=' || state || '/' || theme || '.parquet';
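If you are scripting outside DuckDB, the same string assembly can be sketched in Python. The function name `parquet_url` is made up for illustration; the base URL and path layout are copied from the macro above.

```python
BASE = "https://parquetry.geomermaids.com/latest"

def parquet_url(state: str, theme: str) -> str:
    """Mirror the url() macro: the country code is the part of the
    state code before the first hyphen (e.g. 'US' from 'US-NY')."""
    country = state.split("-", 1)[0]
    return f"{BASE}/country={country}/state={state}/{theme}.parquet"

print(parquet_url("US-NY", "buildings"))
```

The result can be passed to any Parquet reader that accepts HTTPS URLs.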

2 · How many buildings in a bounding box?

A rectangle around Midtown Manhattan. DuckDB fetches only the row groups that overlap the rectangle; the rest of the file is never transferred.

SELECT COUNT(*) AS buildings
FROM read_parquet(url('US-NY', 'buildings'))
WHERE ST_Intersects(geometry, ST_MakeEnvelope(-74.00, 40.74, -73.96, 40.77));
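To make the row-group idea concrete, here is a conceptual sketch in Python of pruning by bounding box. It is not DuckDB's actual implementation, and the row-group extents are invented for illustration; it only shows the general principle that a reader can skip any chunk whose extent does not overlap the query rectangle.

```python
from typing import NamedTuple

class BBox(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float

def overlaps(a: BBox, b: BBox) -> bool:
    """True when the two rectangles share at least one point."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)

# Same rectangle as the SQL query above
query = BBox(-74.00, 40.74, -73.96, 40.77)

# Hypothetical row-group extents (made up, not read from a real file)
row_groups = {
    0: BBox(-74.05, 40.68, -73.98, 40.75),  # touches the south-west corner
    1: BBox(-73.99, 40.75, -73.93, 40.80),  # covers the north-east part
    2: BBox(-79.00, 42.00, -76.00, 43.50),  # far upstate, no overlap
}

to_fetch = [rg for rg, extent in row_groups.items() if overlaps(extent, query)]
print(to_fetch)
```

Only the row groups in `to_fetch` would need to be requested; the exact geometry test then runs on the rows inside them.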

3 · Tallest residential towers in a neighborhood

Reads only the columns the query touches (the four selected plus geometry, not the whole 20-column file) and applies a spatial filter at the same time:

SELECT name, building, levels, height
FROM read_parquet(url('US-NY', 'buildings'))
WHERE building = 'residential'
  AND levels > 30
  AND ST_Intersects(geometry, ST_MakeEnvelope(-73.99, 40.76, -73.95, 40.79))
ORDER BY levels DESC
LIMIT 10;

4 · Match restaurants to the buildings they sit inside

A spatial join between two theme files in Massachusetts, using ST_Contains:

SELECT b.name AS building, p.name AS restaurant, p.cuisine
FROM read_parquet(url('US-MA', 'buildings')) b
JOIN read_parquet(url('US-MA', 'pois')) p
  ON ST_Contains(b.geometry, p.geometry)
WHERE p.amenity = 'restaurant'
  AND b.name IS NOT NULL
LIMIT 10;

5 · Nearest features to a point

Amenities within ~500 m of Times Square, sorted by distance (the data is longitude/latitude, so both the 0.005 threshold and dist are in degrees):

WITH origin AS (SELECT ST_Point(-73.9855, 40.7580) AS pt)
SELECT p.amenity, p.name, ST_Distance(p.geometry, o.pt) AS dist
FROM read_parquet(url('US-NY', 'pois')) p, origin o
WHERE ST_DWithin(p.geometry, o.pt, 0.005)
ORDER BY dist
LIMIT 15;
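The 0.005 threshold in the query above is in degrees, not metres. The following sketch shows roughly what that corresponds to on the ground near Midtown Manhattan, assuming a spherical Earth. The helper name `degrees_to_metres` and the latitude of 40.75 are illustrative choices, not part of the published data or of DuckDB.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation

def degrees_to_metres(deg: float, lat: float) -> tuple[float, float]:
    """Return the (north_south, east_west) ground length, in metres,
    of `deg` degrees at latitude `lat` (in degrees)."""
    per_degree = math.pi * EARTH_RADIUS_M / 180
    north_south = deg * per_degree
    # Meridians converge towards the poles, so a degree of longitude
    # shrinks by cos(latitude).
    east_west = north_south * math.cos(math.radians(lat))
    return north_south, east_west

ns, ew = degrees_to_metres(0.005, 40.75)
print(f"{ns:.0f} m north-south, {ew:.0f} m east-west")
```

Because a degree of longitude is shorter than a degree of latitude at this latitude, the search area is an ellipse on the ground rather than a true 500 m circle. For exact metric distances you would typically reproject to a local projected coordinate system first.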

6 · Compare across states

Pass a list of URLs, group by state:

SELECT state_iso, COUNT(*) AS pois
FROM read_parquet([
  url('US-NY', 'pois'),
  url('US-MA', 'pois'),
  url('US-CT', 'pois'),
  url('US-RI', 'pois')
])
GROUP BY state_iso
ORDER BY pois DESC;

7 · Same data from Python

If DuckDB isn't your thing, GeoPandas works just as well:

import geopandas as gpd

gdf = gpd.read_parquet(
    "https://parquetry.geomermaids.com/latest/country=US/state=US-RI/buildings.parquet"
)
print(f"{len(gdf):,} Rhode Island buildings")
print(gdf[["building", "name", "height"]].head())

Themes

Each admin region is split into 16 thematic files, each with typed columns promoted from OSM tags:

theme               geometry                    typed columns (excerpt)
buildings           polygon                     building, name, levels, height, addr_*
roads               linestring                  highway, ref, oneway, surface, maxspeed, lanes
railways            point, linestring           railway, name, operator, gauge, electrified
waterways           linestring                  waterway, name, width, intermittent, tunnel
water               polygon                     water, natural, name, intermittent, salt
landuse             polygon                     landuse, name, operator
natural_areas       polygon                     natural, name, wetland
natural_features    point                       natural, name, ele, prominence
places              point                       place, name, population, admin_level, capital
boundaries          polygon                     boundary, admin_level, name, iso3166_*
pois                point                       amenity, shop, tourism, leisure, office, healthcare
amenities_polygons  polygon                     amenity, shop, tourism, leisure, brand
power               point, linestring, polygon  power, name, voltage, frequency, operator
aeroways            point, linestring, polygon  aeroway, name, iata, icao, ref, surface
barriers            point, linestring           barrier, name, access, height, material
public_transport    point                       public_transport, highway, railway, name, operator

Every file also ships the full OSM tags as a MAP<VARCHAR, VARCHAR> — anything not promoted to a typed column is still there, just a tags['key'] lookup away.
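When the data is read from Python, the tags value has to be looked up in whatever shape the client library decodes a MAP into. The sketch below assumes it arrives either as a dict or as a list of (key, value) pairs, since libraries differ on this; the helper `tag` and the sample row are hypothetical.

```python
def tag(tags, key, default=None):
    """Look up one OSM tag from a decoded MAP value.

    Accepts a dict or an iterable of (key, value) pairs, because client
    libraries differ in how they decode Parquet MAP columns.
    """
    if tags is None:
        return default
    if not isinstance(tags, dict):
        tags = dict(tags)
    return tags.get(key, default)

# Hypothetical row, invented for illustration
row_tags = [("building", "yes"), ("roof:shape", "gabled"), ("start_date", "1931")]

print(tag(row_tags, "roof:shape"))
print(tag(row_tags, "wheelchair", "unknown"))
```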

URL pattern

https://parquetry.geomermaids.com/<YYYY-MM-DD | latest>/country=<CC>/state=<ISO>/<theme>.parquet

A few concrete examples:

# Every building in New York State (latest snapshot)
https://parquetry.geomermaids.com/latest/country=US/state=US-NY/buildings.parquet

# Ontario's entire road network
https://parquetry.geomermaids.com/latest/country=CA/state=CA-ON/roads.parquet

# Points of interest in Jalisco, Mexico
https://parquetry.geomermaids.com/latest/country=MX/state=MX-JAL/pois.parquet

# Pinned to a specific date — immutable, reproducible
https://parquetry.geomermaids.com/2026-04-19/country=US/state=US-CA/waterways.parquet

# Machine-readable index of every snapshot ever published
https://parquetry.geomermaids.com/snapshots.json

Dated snapshots are immutable — safe to pin in reproducible pipelines. The latest/ alias always resolves to the most recent.
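For a reproducible pipeline you may want to turn a latest/ URL into a pinned one. A minimal sketch, assuming the URL layout shown above: the helper `pin` and the regular expression are illustrative, and the character classes for state and theme are inferred from the examples on this page rather than from a published specification. It does not check that a snapshot for the given date actually exists.

```python
import re
from datetime import date

# Pattern inferred from the example URLs; an assumption, not a spec
URL_RE = re.compile(
    r"^https://parquetry\.geomermaids\.com/"
    r"(?P<snapshot>latest|\d{4}-\d{2}-\d{2})/"
    r"country=(?P<country>[A-Z]{2})/"
    r"state=(?P<state>[A-Z]{2}-[A-Z0-9]+)/"
    r"(?P<theme>[a-z_]+)\.parquet$"
)

def pin(url: str, snapshot: date) -> str:
    """Swap the snapshot segment of a file URL for a fixed date."""
    m = URL_RE.match(url)
    if m is None:
        raise ValueError(f"not a recognised file URL: {url}")
    return (
        "https://parquetry.geomermaids.com/"
        f"{snapshot.isoformat()}/country={m['country']}/"
        f"state={m['state']}/{m['theme']}.parquet"
    )

latest = "https://parquetry.geomermaids.com/latest/country=US/state=US-CA/waterways.parquet"
print(pin(latest, date(2026, 4, 19)))
```

Recording the pinned URL alongside your results means the same bytes can be fetched again later.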

Custom regions, schemas, or SLA-backed hosting

The hosted snapshots here are an opinionated default: North America, 16 fixed themes, one size fits all. For custom regions, custom schemas, or SLA-backed hosting, get in touch: contact@geomermaids.com