How a write becomes durable
Follow a mutation from the query executor through in-memory staging, the single manifest-publish CAS fence, and crash recovery: the heart of OmniGraph's atomicity guarantee.
L2 — Multi-dataset coordination via __manifest
OmniGraph is not a single Lance dataset; it is a graph of datasets coordinated through one append-only manifest table.
- Manifest table: `__manifest/` Lance dataset.
- Layout:
  - `nodes/{fnv1a64-hex(type_name)}`: one Lance dataset per node type
  - `edges/{fnv1a64-hex(edge_type_name)}`: one Lance dataset per edge type
  - `__manifest/`: the catalog of all sub-tables and their published versions
  - `_graph_commits.lance` / `_graph_commit_actors.lance`: the commit graph and its actor map
  - (legacy `_graph_runs.lance` / `_graph_run_actors.lance` from pre-v0.4.0 graphs are inert; the run state machine was removed. The internal schema migration sweeps stale `__run__*` branches on first write-open; the inert dataset bytes themselves remain until a prefix-delete storage primitive lands)
- Manifest row schema (`object_id, object_type, location, metadata, base_objects, table_key, table_version, table_branch, row_count`):
  - `object_type` ∈ `table | table_version | table_tombstone`
  - `table_key` ∈ `node:<TypeName> | edge:<EdgeName>`
  - `table_branch` is `null` for the main lineage and the branch name otherwise
- Snapshot reconstruction: latest visible `table_version` per `(table_key, table_branch)`, minus tombstones: rows where `object_type = table_tombstone` whose own `table_version` (acting as the tombstone version) is `>=` the entry's `table_version`.
- Atomic publish: multi-dataset commits publish so that a single write to `__manifest` flips all the new sub-table versions visible at once.
- Row-level CAS on the merge-insert join key: `object_id` carries an unenforced-primary-key annotation so Lance's bloom-filter conflict resolver rejects two concurrent commits that land the same `object_id` row. Without this annotation, Lance's transparent rebase would admit silent duplicates from racing publishers.
- Optimistic concurrency control on publish: a publish asserts the manifest's current latest non-tombstoned version for each touched table is exactly what the caller observed; mismatches surface as an `ExpectedVersionMismatch` manifest conflict naming the table and the expected/actual versions. Concurrent advances surface as a conflict rather than being silently rebased through.
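
The snapshot and publish rules above can be sketched as a small in-memory model. This is an illustrative sketch, not OmniGraph's implementation: a plain Python list stands in for the `__manifest` Lance dataset, and `ManifestRow`, `snapshot`, and `publish` are hypothetical names. Only the field names and the `ExpectedVersionMismatch` name come from the schema described above; the exception's shape is assumed. It also assumes a tombstone applies only to entries with the same `(table_key, table_branch)`. The row-level CAS on `object_id` is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ManifestRow:
    # Subset of the manifest row schema; the other columns are omitted here.
    object_type: str                    # "table" | "table_version" | "table_tombstone"
    table_key: str                      # "node:<TypeName>" or "edge:<EdgeName>"
    table_version: int
    table_branch: Optional[str] = None  # None models null, i.e. the main lineage


class ExpectedVersionMismatch(Exception):
    """Hypothetical shape for the conflict: names the table and both versions."""

    def __init__(self, table_key, expected, actual):
        super().__init__(f"{table_key}: expected version {expected}, found {actual}")
        self.table_key, self.expected, self.actual = table_key, expected, actual


def snapshot(rows):
    """Latest visible table_version per (table_key, table_branch), minus tombstones."""
    tombstoned = {}  # (table_key, table_branch) -> highest tombstone version seen
    for r in rows:
        if r.object_type == "table_tombstone":
            k = (r.table_key, r.table_branch)
            tombstoned[k] = max(tombstoned.get(k, r.table_version), r.table_version)

    latest = {}
    for r in rows:
        if r.object_type != "table_version":
            continue
        k = (r.table_key, r.table_branch)
        # Hidden when a tombstone version is >= this entry's table_version.
        if k in tombstoned and tombstoned[k] >= r.table_version:
            continue
        latest[k] = max(latest.get(k, r.table_version), r.table_version)
    return latest


def publish(rows, observed, new_versions, branch=None):
    """Check every touched table against what the caller observed, then
    append all new table_version rows in one step."""
    current = snapshot(rows)
    for table_key, expected in observed.items():
        actual = current.get((table_key, branch))
        if actual != expected:
            # Surface the concurrent advance instead of rebasing through it.
            raise ExpectedVersionMismatch(table_key, expected, actual)
    new_rows = [
        ManifestRow("table_version", table_key, version, branch)
        for table_key, version in new_versions.items()
    ]
    rows.extend(new_rows)  # toy stand-in for the single write to __manifest
```

In this model, a caller that observed `node:Person` at version 2 and tries to publish after another writer has already advanced it to 3 gets an `ExpectedVersionMismatch`, and no rows are appended. Running all the checks before the single append is what keeps a multi-table publish all-or-nothing in the sketch.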