# EO-Atlas Data Dictionary

This dictionary documents the schema of `eo-atlas.db`, the SQLite artefact
shipped alongside EO-Atlas content. The artefact is regenerated on every
build from the markdown content collections in `src/content/`.

**Status:** Phase 1 stub. A schema-introspecting auto-generator script lands
in Phase 3 once third-party dataset consumers materialise. For now this file
is hand-maintained.

**Authoritative schema:** `vision-architecture/atlas-eo-data-model` in the
SpectraWorks Vision wiki. Code at `src/lib/schemas.ts`.

**Licence:** CC-BY 4.0. See `LICENSE-CC-BY.txt` alongside this file.

<!-- POPULATION_AUTOGEN_BEGIN -->
## Current population (2026-05-22)

Live as of the most recent build. Empty collections render as SOON in the directory chrome and have no rows in the .db yet.

- 62 companies
- 32 sensors
- 12 buses
- 41 missions
- 25 data products
- 8 standards
- 1 regulation
- 1 ground segment
- 2 launch services
- 0 reference stations
- 0 fleets
- 2 topics
- 0 news items

Built at 2026-05-22T21:17:39.381Z. Atlas v0.4.
<!-- POPULATION_AUTOGEN_END -->

## How to read the .db

```python
import sqlite3
con = sqlite3.connect("eo-atlas.db")
print(con.execute("SELECT * FROM current_refcal_tier").fetchall())
```

Or in JavaScript via `sql.js`. Or in any tool that speaks SQLite 3.

## Conventions

- Every entity table has a `slug` primary key (kebab-case, stable across
  rebuilds).
- Foreign keys reference the target table's `slug`.
- Array fields and nested objects are stored as JSON text. Use SQLite's
  `json_extract()` to query.
- Booleans are stored as INTEGER (0 / 1) per SQLite convention.
- Dates are TEXT in ISO 8601 (`YYYY-MM-DD`).
- All entity tables include provenance columns: `last_verified_date`,
  `verified_by`, `claim_status`, `claim_contact`.

## Entity tables (12 entity types; content_pages excluded, markdown only)

| Table | One row per | Notes |
| --- | --- | --- |
| `companies` | Legal/operational entity | `kind`: commercial / agency / research-institute / consortium. Required `operator_domains` JSON array, curator-vetted bare domains used for claim-binding (d_b3f843ed). |
| `products` | Discrete product | `kind`: sensor / bus / ground-segment / launcher / analysis-service / data-platform / consortium-platform. Optional `product_status`: operational / in-development / demonstration / retired. Conditional spec block (`sensor_spec`, `bus_spec`, etc.) per kind. Sensor spec carries `measurement_principle` (passive / active / multistatic), optional `channels[]` for multi-channel sensors (HyperScout-2 VNIR+TIR), and an `emitter` sub-block for active / multistatic. |
| `missions` | Specific deployment | `operator` (Company FK), `platform` (Product FK, kind=bus). |
| `data_products` | Specific data offering | `level` (L0/L1/L2/...), `distributor` (Company FK), spectral fields denormalised from upstream Sensor for query speed (consistency enforced at build). |
| `fleets` | Operator-claimed homogeneous grouping | `operator_claimed_homogeneity` is always 1 by construction. Heterogeneous brand groupings are NOT Fleets; they live in `cross_references` with `kind=brand-grouping`. |
| `ground_segments` | Downlink + processing infra | KSAT, ESA PDGS, individual mission ground stations. |
| `launch_services` | Launcher / rideshare programme | SpaceX rideshare, Vega-C, etc. |
| `reference_stations` | Calibration anchor site | RadCalNet sites, CEOS PICS, etc. |
| `standards` | Calibration / format / processing standard | RefCal-v0, CEOS Cal/Val, Sen2Cor, STAC. |
| `regulations` | Regulatory framework | `instrument_type`: regulation / directive / decision / treaty-mechanism / voluntary-standard / national-law. `legal_status`: in-force / phasing-in / adopted-not-yet-in-force / proposed / superseded / expired. `superseded_by` chains to the replacement regulation. EUDR, ESRS E1, etc. |
| `topics` | Buyer-side intent vocabulary | Methane monitoring, deforestation, etc. |
| `news` | Auto-ingested news item | Phase 2 deliverable. Source URL + published date + topic tags. |

## Relationship tables (8)

| Table | Edge between | Key attributes |
| --- | --- | --- |
| `mission_payloads` | Mission - Sensor | `flight_status`, `commissioning_date` |
| `mission_buses` | Mission - Bus | `integration_notes`, `customisations` |
| `data_products_from_missions` | DataProduct - Mission - ProcessingChain Standard | `processing_software`, `available_from_date` |
| `calibration_gradings` | DataProduct - Standard | `tier`, `assigned_date`, `methodology_version`, `superseded_by` (chain to later grading) |
| `compositions` | Product - Product | `kind`: tested-integration / declared-supported / inferred-compatible / incompatible. Optional `evidence_mission`. |
| `topic_capabilities` | Topic - DataProduct or Product | Polymorphic target (`target_type`, `target_slug`). `quality`: well-suited / adequate / marginal / unsuitable. |
| `regulatory_drivers` | Regulation - Topic or DataProduct | Polymorphic target (`target_type`, `target_slug`). `relation`: mandates / informs / enables. `evidence_strength`: explicit / inferred. |
| `cross_references` | Any entity - Any entity | Polymorphic both sides (`from_type`, `to_type`). `kind`: supersedes, predecessor, related, brand-grouping, etc. |

## Utility tables

| Table | Purpose |
| --- | --- |
| `sources` | Citation records for every entity carrying a `sources[]` array. Composite PK (`entity_collection`, `entity_slug`, `source_id`). Lets consumers query "all entities citing this URL" without re-parsing markdown. Per `citations-and-sources-spec`. |
| `meta` | Key-value build metadata: `version`, `built_at`, `entity_counts` (JSON), `total_rows`, and one `count.<collection>` row per collection. Plus `count.products.sensor` / `count.products.bus` for catalogue splits. |
| `_migrations` | Schema-migration ledger. One row per applied migration: `id`, `description`, `applied_at`. `build-db.ts` seeds this with every file in `scripts/migrations/` (the from-scratch build is by definition up to date). Downstream consumers of an older .db run `npm run migrate` to apply the delta. |

## Views

### `current_refcal_tier`

Latest non-superseded calibration grading per (data product, standard) pair.
This is what sensor pages render in the spec table.

```sql
CREATE VIEW current_refcal_tier AS
  SELECT data_product, standard, tier, assigned_date, methodology_version, edge_slug
  FROM calibration_gradings
  WHERE superseded_by IS NULL;
```

To roll forward when RefCal-v1 ships: append a new `calibration_gradings` row
referencing the v1 standard, set `superseded_by` on the v0 row pointing to the
v1 row's `edge_slug`. The view automatically returns the v1 grading.

## Indexes

Indexes are on join columns and frequently-filtered fields. Run
`SELECT name FROM sqlite_master WHERE type='index'` to enumerate.

## Citation

If you use EO-Atlas data in academic work, cite as:

> SpectraWorks B.V. (2026). EO-Atlas: an open earth observation supply-chain
> directory. https://sources.spectraworks.nl. Licensed CC-BY 4.0.

## Schema evolution

Schema changes ship via migration scripts at `scripts/migrations/<YYYY-MM-DD>-<slug>.ts`.
Each file exports `id`, `description`, and an `up(db)` function that mutates
an existing .db. Applied versions are tracked in the `_migrations` table.

Build pipeline: `build-db.ts` always emits the latest schema from scratch and
seeds `_migrations` with the full list, so a freshly-built .db is by
construction up to date. Downstream consumers caching an older copy run
`npm run migrate` to apply the un-applied deltas.

This dictionary is updated by hand when the schema changes. Auto-generation
from Zod schemas lands in Phase 3.
