
Introducing Research Tool

Most teams working with large, messy research corpora have similar problems: There’s too much source material, it’s too mixed, and it’s too fast-moving. Important data is buried across PDFs, filings, datasets, legislation, notes, and different versions of the same source. Search returns too much. Analytics expects you to already know what to ask. AI can confidently deliver analysis without showing its work.

Research Tool (RT) is built around a different set of ideas.

First, durable truth should live in manifest-backed artifacts, not in query engines or projections. Documents, datasets, enrichments, and analytical outputs are persisted as explicit, replayable assets in easily portable formats. Graph, search, and vector systems are downstream surfaces optimized for exploration and retrieval, not storage.
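
To make the first idea concrete, here is a minimal Python sketch of a manifest, assuming a simple model in which every artifact is recorded with a kind, a SHA-256 content hash, and the IDs of the artifacts it was derived from. The names `Manifest`, `ManifestEntry`, `register`, and `verify` are hypothetical illustrations, not RT's actual interface.

```python
# Hypothetical sketch of a manifest of persisted artifacts; not Research Tool's actual API.
import hashlib
import json
from dataclasses import dataclass, field, asdict


@dataclass(frozen=True)
class ManifestEntry:
    artifact_id: str
    kind: str             # e.g. "document", "dataset", "enrichment", "analysis"
    sha256: str           # content hash, checked when the artifact is replayed
    parents: tuple = ()   # artifact_ids this artifact was derived from


@dataclass
class Manifest:
    entries: dict = field(default_factory=dict)

    def register(self, artifact_id, kind, payload: bytes, parents=()):
        # Lineage must be explicit: every parent has to be registered first.
        missing = [p for p in parents if p not in self.entries]
        if missing:
            raise KeyError(f"unknown parent artifacts: {missing}")
        entry = ManifestEntry(artifact_id, kind,
                              hashlib.sha256(payload).hexdigest(), tuple(parents))
        self.entries[artifact_id] = entry
        return entry

    def verify(self, artifact_id, payload: bytes) -> bool:
        # Replay check: the stored bytes must still hash to the recorded value.
        return self.entries[artifact_id].sha256 == hashlib.sha256(payload).hexdigest()

    def to_json(self) -> str:
        # Plain JSON keeps the manifest portable across tools and languages.
        return json.dumps([asdict(e) for e in self.entries.values()], indent=2)


# Usage: a document, then an enrichment derived from it.
manifest = Manifest()
manifest.register("doc-1", "document", b"Section 1. Filers must report annually.")
manifest.register("enr-1", "enrichment", b'{"entities": []}', parents=["doc-1"])
```

Because the manifest is plain JSON and each entry names its parents, a search index or graph built from it can be discarded and rebuilt at any time; the projection never becomes the system of record.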

Second, evidence should survive every transformation. Documents retain internal structure and stable addressing. Structured data retains semantic meaning rather than being reduced to disconnected rows. Versions are preserved rather than overwritten. Every derived result can be traced back to the text spans, rows, cells, or other source coordinates that support it.
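
One way to picture evidence that survives transformation is a derived result that carries explicit source coordinates. The sketch below assumes two hypothetical coordinate types, a character span in a specific document version and a cell in a specific dataset version, plus a hypothetical `resolve` helper that returns exactly the content a coordinate points to.

```python
# Hypothetical sketch of evidence coordinates; not Research Tool's actual API.
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class TextSpan:
    doc_id: str
    version: int
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive


@dataclass(frozen=True)
class TableCell:
    dataset_id: str
    version: int
    row: int
    column: str


Evidence = Union[TextSpan, TableCell]


@dataclass(frozen=True)
class DerivedResult:
    statement: str
    evidence: tuple  # one or more Evidence coordinates supporting the statement


def resolve(ref: Evidence, documents: dict, datasets: dict):
    """Return exactly the source content a coordinate points to.

    documents: {(doc_id, version): text}
    datasets:  {(dataset_id, version): [row_dict, ...]}
    """
    if isinstance(ref, TextSpan):
        return documents[(ref.doc_id, ref.version)][ref.start:ref.end]
    if isinstance(ref, TableCell):
        return datasets[(ref.dataset_id, ref.version)][ref.row][ref.column]
    raise TypeError(f"unsupported evidence type: {type(ref).__name__}")


text = "The reporting threshold is $10,000."
documents = {("act", 1): text}
datasets = {("filings", 1): [{"year": 2021, "threshold": 10000}]}
start = text.index("$10,000")
result = DerivedResult(
    "The reporting threshold is 10,000 dollars.",
    (TextSpan("act", 1, start, start + len("$10,000")), TableCell("filings", 1, 0, "threshold")),
)
for ref in result.evidence:
    print(resolve(ref, documents, datasets))  # prints "$10,000", then 10000
```

Keying sources by `(id, version)` is what lets an older version stay addressable after a newer one arrives, rather than being overwritten.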

Third, change should be treated as a first-class analytical primitive. In large corpora, most content is redundant. The highest-signal view is often the set of things that changed: a clause that appeared, a threshold that moved, a definition that shifted, a metric that stopped meaning the same thing. RT treats DIFF not as a utility feature but as a discovery surface.
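
A clause-level diff shows why change is a useful filter: unchanged clauses drop out and only the movement remains. This sketch uses Python's standard `difflib.SequenceMatcher` over sentences and a deliberately naive, hypothetical splitter; it illustrates the idea and is not how RT implements DIFF.

```python
# Sketch of a clause-level diff with the standard library; not how RT implements DIFF.
import difflib
import re


def split_clauses(text: str) -> list:
    # Naive splitter: breaks after "." or ";" followed by whitespace.
    # Real documents need structure-aware parsing (sections, numbered clauses, tables).
    return [s.strip() for s in re.split(r"(?<=[.;])\s+", text) if s.strip()]


def clause_diff(old: str, new: str) -> list:
    """Return only what changed, as (op, old_clauses, new_clauses) tuples."""
    a, b = split_clauses(old), split_clauses(new)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # unchanged clauses are the redundant bulk; drop them
            changes.append((op, a[i1:i2], b[j1:j2]))
    return changes


old = "Filers must report annually. The threshold is $10,000. Records are kept for 5 years."
new = ("Filers must report annually. The threshold is $25,000. "
       "Records are kept for 5 years. Penalties apply.")
for change in clause_diff(old, new):
    print(change)
```

Here the two unchanged clauses disappear from the output, leaving one `replace` (the moved threshold) and one `insert` (the new penalty clause).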

This gives RT a different operating model from both standard search and conventional analytics.

Search systems are good at retrieving likely matches, but they are weak at preserving structure, surfacing lineage, and guiding bounded downstream work. Analytics systems are good at answering well-formed questions, but they usually assume the joins, entities, and hypotheses are already known. RT sits in the middle: it preserves structure like a parsing and artifact system, supports bounded discovery like a planning system, and produces replayable results like a data platform.

Under the hood, RT models a corpus through four core constructs:

Spines: shared coordinate systems for time, geography, scales, and meaning
Subjects: the things whose stories matter, such as entities, cohorts, documents, and groups
States: what is true of a subject at a moment or in a context
Arcs: how those states change across time or conditions
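
The four constructs can be sketched as plain data types. This is a hypothetical model, assuming that a spine is an ordered tuple of coordinates, a state is a set of (key, value) facts, and an arc is one subject's states ordered along one spine; RT's real schema may differ.

```python
# Hypothetical data model for spines, subjects, states, and arcs; not RT's actual schema.
from dataclasses import dataclass


@dataclass(frozen=True)
class Spine:
    """Shared coordinate system; coordinates are listed in order (e.g. years)."""
    name: str
    coordinates: tuple


@dataclass(frozen=True)
class Subject:
    """Something whose story matters: an entity, cohort, document, or group."""
    subject_id: str
    kind: str


@dataclass(frozen=True)
class State:
    """What is true of a subject at one coordinate on one spine."""
    subject: Subject
    spine: Spine
    coordinate: object
    facts: frozenset  # (key, value) pairs; a frozenset makes comparison order-independent


@dataclass(frozen=True)
class Arc:
    """A subject's states ordered along a spine."""
    subject: Subject
    spine: Spine
    states: tuple

    def transitions(self):
        """Consecutive state pairs whose facts differ: where the story moves."""
        return [(a, b) for a, b in zip(self.states, self.states[1:]) if a.facts != b.facts]


def build_arc(subject: Subject, spine: Spine, states) -> Arc:
    # Keep only this subject's states on this spine, then order them by the spine.
    mine = [s for s in states if s.subject == subject and s.spine == spine]
    for s in mine:
        if s.coordinate not in spine.coordinates:
            raise ValueError(f"{s.coordinate!r} is not on spine {spine.name!r}")
    mine.sort(key=lambda s: spine.coordinates.index(s.coordinate))
    return Arc(subject, spine, tuple(mine))
```

Keeping the spine separate from the subject is what lets the same arc-building logic run over time, geography, or any other shared coordinate system.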

This allows both entity and system dynamics to be represented with the same machinery. RT can ask what changed for a single subject, what changed across a population, what changed within a single document version, or what changed across an entire corpus.
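
Because the same machinery serves one subject or many, a single function can answer both questions. The sketch below assumes a hypothetical storage layout, `{(subject_id, coordinate): {attribute: value}}`; the population view is then just a count of which attributes changed across subjects.

```python
# Hypothetical sketch: one change function for a single subject or a whole population.
from collections import Counter


def what_changed(states, subjects, t0, t1):
    """Per-subject attribute changes between coordinates t0 and t1.

    states: {(subject_id, coordinate): {attribute: value}} (assumed layout).
    Subjects with no change are omitted from the result.
    """
    out = {}
    for s in subjects:
        before = states.get((s, t0), {})
        after = states.get((s, t1), {})
        diff = {k: (before.get(k), after.get(k))
                for k in before.keys() | after.keys()
                if before.get(k) != after.get(k)}
        if diff:
            out[s] = diff
    return out


def population_summary(changes):
    """How many subjects changed on each attribute: the system-level view."""
    return Counter(attr for diff in changes.values() for attr in diff)
```

Passing a one-element list of subjects gives the entity view; passing the full population and then calling `population_summary` gives the system view, with no separate code path.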

RT is also designed around a bounded discovery loop. Retrieval and graph traversal are not the end of the workflow; they are planning surfaces. They help identify high-signal candidate sets for deeper compute, so expensive analysis runs only where there is signal, coverage, and a meaningful question. The outputs of that compute are then persisted as durable artifacts and projected back into exploration surfaces for the next round.
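
The bounded loop amounts to a gate in front of expensive compute. In this sketch the gate thresholds (`min_signal`, `min_coverage`), the compute `budget`, and the `Candidate` fields are all assumptions chosen for illustration; `analyze` stands in for whatever deeper analysis a round runs, and `store` stands in for durable artifact storage.

```python
# Hypothetical sketch of one bounded discovery round; thresholds and fields are assumptions.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    candidate_id: str
    signal: float            # e.g. retrieval score or change density (assumed 0..1)
    coverage: float          # fraction of required inputs actually present (0..1)
    question: Optional[str]  # the analytical question this candidate would answer


def select_for_compute(candidates, min_signal=0.5, min_coverage=0.8, budget=3):
    """Keep only candidates with enough signal, enough coverage, and a question."""
    eligible = [c for c in candidates
                if c.signal >= min_signal and c.coverage >= min_coverage and c.question]
    eligible.sort(key=lambda c: c.signal, reverse=True)
    return eligible[:budget]


def discovery_round(candidates, analyze: Callable, store: dict, **gates):
    """Plan -> bounded compute -> persist; returns the ids to project back for exploration."""
    selected = select_for_compute(candidates, **gates)
    for c in selected:
        store[c.candidate_id] = analyze(c)  # persisted output feeds the next round
    return [c.candidate_id for c in selected]
```

Ranking by signal before applying the budget means the scarcest resource, compute, goes to the strongest candidates first.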

Human assessment remains first-class throughout. Machine-generated results are provisional until validated, challenged, or enriched by an analyst. Notes, annotations, and assessments are anchored to evidence and become reusable analytical objects rather than disposable commentary.
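
Treating machine output as provisional can be expressed as a small status model. This sketch assumes hypothetical `Status`, `Annotation`, and `MachineResult` types: results start as provisional, an analyst can validate, challenge, or simply enrich them, and every annotation must carry at least one evidence coordinate.

```python
# Hypothetical sketch of an assessment lifecycle; not Research Tool's actual API.
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    PROVISIONAL = "provisional"
    VALIDATED = "validated"
    CHALLENGED = "challenged"


@dataclass(frozen=True)
class Annotation:
    """An analyst note anchored to evidence, kept as a reusable object."""
    analyst: str
    note: str
    evidence: tuple  # source coordinates, e.g. (("act", 2, 27, 34),) = doc, version, start, end


@dataclass
class MachineResult:
    result_id: str
    claim: str
    status: Status = Status.PROVISIONAL
    annotations: list = field(default_factory=list)

    def assess(self, annotation: Annotation, verdict: Optional[Status] = None) -> None:
        """Attach an anchored annotation; optionally validate or challenge the result.

        verdict=None means enrichment only: the note is recorded, the status is unchanged.
        """
        if not annotation.evidence:
            raise ValueError("annotations must be anchored to evidence")
        if verdict is Status.PROVISIONAL:
            raise ValueError("a verdict must validate or challenge the result")
        self.annotations.append(annotation)
        if verdict is not None:
            self.status = verdict

    @property
    def settled(self) -> bool:
        # Only validated results are treated as settled conclusions downstream.
        return self.status is Status.VALIDATED
```

Rejecting unanchored notes at write time is what keeps annotations usable as analytical objects later, since each one can always be traced back to the text or data it comments on.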

The result is a system built on a few unconventional but important ideas:

durable artifacts over implicit state,
evidence over plausibility,
change over static snapshots,
and bounded discovery over endless retrieval.

Research Tool is not simply a better search layer for documents and data; it is infrastructure for building analytical conclusions that are explainable, replayable, and defensible.