Outcome Engineering

Rooted in understanding, branching toward outcomes

Start with business goals and customer insight at the roots, express outcome hypotheses with testable assertions, and let the structure show what holds.

Roots
Business goals and deep customer understanding form the foundation.
Branches
Outcome hypotheses, each with assertions the code must satisfy.
Leaves
Where output meets evidence — assertions the code must satisfy today.

What goes wrong with AI coding agents

AI coding agents generate code so fast that anyone can ship features in minutes. The challenge is staying in control as the codebase grows. Plans help, but they are ephemeral — the next session starts from scratch and the plan is just as much a gamble as the code it was supposed to guide.

Spec-driven development makes instructions persistent, but progress is still defined as the sum of completed tasks. Over dozens of tasks, dead code accumulates because nobody can tie the resulting code fragments back to a purpose a human could understand.

Both planning and spec-driven development organise work around tasks — and tasks are the wrong unit.

Goal drifts out of sight

Specs capture what to build, but not why it should exist. Over time, the product comes to be defined by what was built rather than by what it should be, and code that no longer serves a purpose stays in place.

Agents imitate whatever they find

Context selection is heuristic: grepping for keywords, embedding similarity, tool defaults. As the repo grows, agents pick up patterns without knowing which ones are current or correct.

Instructions decay silently

Specs evolve, but tests still pass for old behavior. The binding between what you intended and what the code does erodes with every change — and nothing signals the gap.

Three principles for staying in control

spx/
├── spx-cli.product.md
├── 21-test-harness.enabler/  ● valid
├── 32-parse-directory-tree.enabler/  ● valid
└── 54-tree-interpretation.outcome/  ○ needs work
    ├── tree-interpretation.md
    ├── 43-status-rollup.outcome/  ◐ stale
    └── 54-tree-status.outcome/  ○ needs work

Always be converging

Iterate on a durable artifact in small, reviewable steps — each reversible and repeatable, each reducing uncertainty about what the system should do.

Agents will get things wrong. Because every agent starts from the spec — not the implementation — mistakes do not compound. Go back one step, try again, and the spec anchors the retry.

spx/
├── spx-cli.product.md
├── 15-cli-framework.adr.md
├── 21-test-harness.enabler/  ● valid
├── 32-parse-directory-tree.enabler/  ● valid
└── 54-tree-interpretation.outcome/
    └── 43-status-rollup.outcome/

Determine context

Never generate what can be derived deterministically.

The Spec Tree replaces heuristic context with structural context. The path from root to node defines what an agent receives: ancestor specs and lower-index siblings at each directory along the path.

Curate context rather than letting agents search the codebase. Link traceability explicitly rather than inferring it. Reserve generative capacity for the parts that require it.

spx/
├── spx-cli.product.md
│     ↑ human decides: why this product exists
├── 15-cli-framework.adr.md
│     ↑ human decides: architecture constraint
└── 54-tree-interpretation.outcome/
      ↓ agents propagate through dependent nodes

Ration rationalization

Decide what cascades; constrain everything else.

Given latitude, agents rationalize — they construct plausible justifications for whatever choice they face, and plausible is not correct. The human’s job is not to review every output but to narrow the decision space.

Make the choices that cascade furthest — outcome hypotheses, architectural constraints, product decisions — and agents propagate the consequences through every dependent node.

Why outcomes are the right interface

Specs are not for planning. The spec is the product’s source of truth — what users should be able to do, expressed as testable assertions.

Outcome specs begin with a hypothesis: by [output], [outcome], resulting in [impact]. Enabler specs begin with what they enable. The output is what the software does — testable locally by assertions. The outcome is the change in customer behavior we expect — measurable only with real users. The impact is the business value: increase revenue, sustain revenue, reduce costs, or avoid costs. There is no outcome without an output.
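The hypothesis template above can be written down as a small data structure. This is a minimal sketch, not part of the spx tooling: the type and function names (`OutcomeHypothesis`, `renderHypothesis`, `isWellFormed`) and the field `impactStatement` are illustrative assumptions. Only the three-part template and the four impact categories come from the text.

```typescript
// The four kinds of business impact named in the text.
type Impact = "increase revenue" | "sustain revenue" | "reduce costs" | "avoid costs";

// Hypothetical shape for an outcome hypothesis; field names are illustrative.
interface OutcomeHypothesis {
  output: string;          // what the software does; testable locally by assertions
  outcome: string;         // expected change in customer behavior; needs real users
  impact: Impact;          // business value category
  impactStatement: string; // the impact phrased for this hypothesis
}

// Render "by [output], [outcome], resulting in [impact]".
function renderHypothesis(h: OutcomeHypothesis): string {
  return `We believe that, by ${h.output}, ${h.outcome}, resulting in ${h.impactStatement}`;
}

// "There is no outcome without an output": every part must be stated.
function isWellFormed(h: OutcomeHypothesis): boolean {
  return [h.output, h.outcome, h.impactStatement].every((part) => part.trim().length > 0);
}

const example: OutcomeHypothesis = {
  output: "showing itemized charges per service",
  outcome: "we will reduce billing support tickets by 40%",
  impact: "reduce costs",
  impactStatement: "reduced support costs",
};

console.log(renderHypothesis(example));
```

Keeping the impact category as a closed union means a hypothesis with an unnamed kind of business value fails to type-check rather than slipping through review.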

Product managers and UX researchers already have the material — business goals, customer research, validated needs. But it lives in their tools, not where engineers and agents actually work. The outcome hypothesis bridges that gap. It translates what PM and UX know into a form that lives alongside the work — where every person and every agent encounters it at the moment of decision.

This isn’t optimized for any one agent or any one generation of AI tooling. The outcome hypothesis is a stable interface for human intent that will only become more valuable as agents get better at working with it.

The Spec Tree in practice

A Git-native directory structure where each node co-locates a spec, its tests, and a lock file.

spx/
├── spx-cli.product.md
├── 15-cli-framework.adr.md
├── 21-test-harness.enabler/
└── ...

The product node anchors everything

Every Spec Tree starts with a single product file at its root. This file captures what the product is and why it exists — the value function that every other node must trace back to.

Even as product learning reshapes individual nodes, the product file still records why the product exists. Every outcome traces to this root. Assertions validate outputs locally; outcome hypotheses are validated only through real customer behavior.

spx/
├── spx-cli.product.md
├── 15-cli-framework.adr.md
├── 15-tree-structure-contract.pdr.md
├── 21-test-harness.enabler/
└── ...

Decision records capture choices upfront

ADRs and PDRs sit alongside the specs they affect. ADRs capture architecture choices that constrain how outputs are produced; PDRs capture product constraints — pricing, compliance, retention — that must hold across a subtree.

When an agent needs context, it doesn’t guess — it reads the decision record on the path from root to its current node. Context is deterministic.

spx/
├── spx-cli.product.md
├── 15-cli-framework.adr.md
├── 21-test-harness.enabler/  ● valid
├── 32-parse-directory-tree.enabler/  ● valid
├── 43-node-status.enabler/  ● valid
├── 54-tree-interpretation.outcome/
└── ...

Enablers build infrastructure bottom-up

Enabler nodes (marked with .enabler) exist because at least one outcome node needs them — infrastructure that would be removed if all its dependents were retired. The numeric prefix encodes dependency order: lower numbers are dependencies for higher ones.

Index 21 (test-harness) must be valid before index 32 (parse-directory-tree) can be worked on. The tree encodes this constraint in the numeric prefix of each node’s directory name.
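The ordering rule can be sketched as a check over sibling nodes. This is an illustration under stated assumptions, not the spx implementation: the names `nodeIndex`, `blockers`, and `Sibling` are hypothetical, and the statuses in the sample data are invented to show a blocked node.

```typescript
type Status = "valid" | "stale" | "needs work";

interface Sibling {
  name: string;   // directory name, e.g. "32-parse-directory-tree.enabler"
  status: Status;
}

// Numeric prefix of a node name, or null when the name has none.
function nodeIndex(name: string): number | null {
  const match = /^(\d+)-/.exec(name);
  return match ? Number(match[1]) : null;
}

// Lower-index siblings that are not yet valid block work on the target.
function blockers(target: string, siblings: Sibling[]): string[] {
  const targetIndex = nodeIndex(target);
  if (targetIndex === null) return [];
  return siblings
    .filter((s) => {
      const index = nodeIndex(s.name);
      return index !== null && index < targetIndex && s.status !== "valid";
    })
    .map((s) => s.name);
}

// Hypothetical statuses, chosen so that one dependency is not valid.
const siblings: Sibling[] = [
  { name: "21-test-harness.enabler", status: "stale" },
  { name: "32-parse-directory-tree.enabler", status: "valid" },
  { name: "43-node-status.enabler", status: "needs work" },
];

// Logs the single blocker, 21-test-harness.enabler.
console.log(blockers("32-parse-directory-tree.enabler", siblings));
```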

spx/
├── ...
├── 54-tree-interpretation.outcome/  ◐ stale
│   ├── tree-interpretation.md
│   ├── 21-parent-child-links.enabler/  ● valid
│   ├── 43-status-rollup.outcome/  ◐ stale
│   └── 54-tree-status.outcome/  ○ needs work
└── ...

Outcomes state hypotheses about customer change

Outcome nodes (marked with .outcome) each state an outcome hypothesis — by [output], [outcome], resulting in [impact]. Assertions test the output. The outcome is a change in customer behavior that only real users can validate. The impact is the business value — increase revenue, sustain revenue, reduce costs, or avoid costs.

When a spec changes, its lock file hash breaks. Parent nodes inherit the worst child state. Staleness bubbles up to the root — nothing hides. Three states — valid, stale, needs work — tell you exactly where the product stands.
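The rollup rule can be sketched as a recursive fold over the tree. The text says parents inherit the worst child state but does not spell out how the three states are ranked, so the severity order below (valid, then stale, then needs work) is an assumption, as are the names `SpecNode`, `SEVERITY`, and `rollup`.

```typescript
type Status = "valid" | "stale" | "needs work";

interface SpecNode {
  name: string;
  own: Status;          // state of this node's own spec and tests
  children: SpecNode[];
}

// ASSUMPTION: later entries are worse. The source does not define this order.
const SEVERITY: Status[] = ["valid", "stale", "needs work"];

// A node's reported state is the worst of its own state and its children's.
function rollup(node: SpecNode): Status {
  let worst = node.own;
  for (const child of node.children) {
    const childState = rollup(child);
    if (SEVERITY.indexOf(childState) > SEVERITY.indexOf(worst)) worst = childState;
  }
  return worst;
}

// Hypothetical subtree: one valid child and one stale child.
const tree: SpecNode = {
  name: "54-tree-interpretation.outcome",
  own: "valid",
  children: [
    { name: "21-parent-child-links.enabler", own: "valid", children: [] },
    { name: "43-status-rollup.outcome", own: "stale", children: [] },
  ],
};

console.log(rollup(tree)); // the stale child makes the parent stale
```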

## spx-lock.yaml
schema: spx-lock/v1
blob: a3b7c12
tests:
  - path: tests/status.unit.test.ts
    blob: 9d4e5f2
# When either blob changes,
# the node becomes ◐ stale

Lock files bind specs to evidence

Each node can have an spx-lock.yaml that records Git blob hashes for the spec and its tests. When either side changes, the hash breaks and the node is visibly stale, before anyone runs a test.

This is drift detection: the binding between spec and tests never silently decays.
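Drift detection of this kind reduces to comparing recorded hashes with current ones. The sketch below assumes the lock file has already been parsed into an object and that current blob hashes are supplied by the caller. `LockFile`, `Drift`, and `findDrift` are hypothetical names, and the hash values are the sample values that appear elsewhere on this page.

```typescript
interface LockFile {
  schema: string;
  blob: string;                             // recorded blob hash of the spec
  tests: { path: string; blob: string }[];  // recorded blob hash per test file
}

interface Drift {
  path: string;
  was: string;               // hash recorded in the lock file
  now: string | undefined;   // current hash; undefined if the file is gone
}

// `current` maps file paths to their present blob hashes.
function findDrift(specPath: string, lock: LockFile, current: Record<string, string>): Drift[] {
  const recorded = [{ path: specPath, blob: lock.blob }, ...lock.tests];
  return recorded
    .filter((entry) => current[entry.path] !== entry.blob)
    .map((entry) => ({ path: entry.path, was: entry.blob, now: current[entry.path] }));
}

const lock: LockFile = {
  schema: "spx-lock/v1",
  blob: "a3b7c12",
  tests: [{ path: "tests/status.unit.test.ts", blob: "9d4e5f2" }],
};

// The spec was edited; the test file was not.
const drift = findDrift("status-rollup.md", lock, {
  "status-rollup.md": "5e9f1d8",
  "tests/status.unit.test.ts": "9d4e5f2",
});

console.log(drift.length === 0 ? "valid" : "stale", drift);
```

No test runs here: the comparison only reads hashes, which is why staleness can be reported cheaply.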

What a node looks like

Every assertion links to the test file meant to prove it. The human writes the hypothesis and its assertions; the agent writes the tests and implementation to prove them.

line-items.md
## We believe that, by showing itemized charges per service, we will reduce billing support tickets by 40%, resulting in reduced support costs

### Assertions

- A multi-service invoice shows service name, quantity,
  and unit price per line
  ([test](tests/line-items.unit.test.ts))
- Invoice total equals the sum of all line items
  ([test](tests/line-items.unit.test.ts))

spx-lock.yaml
schema: spx-lock/v1
blob: a3b7c12
tests:
  - path: tests/line-items.unit.test.ts
    blob: 9d4e5f2

Every blob is a Git blob hash. The lock file records that the spec and its tests were in agreement at the time of writing — not that they still are.

Edit the spec and its blob hash changes. The node is visibly stale — before anyone runs a test.
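For readers who want to see why any edit changes the hash, here is a sketch of how Git derives a blob hash in its default SHA-1 object format: it hashes a header of the form `blob <byte length>` plus a NUL byte, followed by the file content. The function names are hypothetical. The seven-character abbreviation mirrors the sample lock files on this page; whether spx stores abbreviated or full hashes is an assumption.

```typescript
import { createHash } from "node:crypto";

// SHA-1 over "blob <byte length>\0" + content, the value `git hash-object` prints.
function gitBlobHash(content: string): string {
  const body = Buffer.from(content, "utf8");
  const header = Buffer.from(`blob ${body.length}\u0000`, "utf8");
  return createHash("sha1").update(header).update(body).digest("hex");
}

// ASSUMPTION: abbreviated form used for display only.
function shortHash(hash: string, length = 7): string {
  return hash.slice(0, length);
}

const before = gitBlobHash("# Status rollup\n");
const after = gitBlobHash("# Status rollup\n\nEdited.\n");

// Any edit to the content yields a different hash.
console.log(shortHash(before), shortHash(after), before !== after);
```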

Deterministic context injection

The spx CLI walks the tree from the product level down to the target node and applies one structural rule: at each directory along the path, inject all lower-index siblings’ specs. Ancestor specs along the path are always included. Test files are excluded.

context for 43-status-rollup.outcome
spx/
  spx-cli.product.md                          <-- included
  15-tree-structure-contract.pdr.md            <-- included
  15-cli-framework.adr.md                      <-- included
  15-randomized-test-generation.adr.md         <-- included
  21-test-harness.enabler/
    test-harness.md                            <-- included
  32-parse-directory-tree.enabler/
    parse-directory-tree.md                    <-- included
  43-node-status.enabler/
    node-status.md                             <-- included
  54-tree-interpretation.outcome/
    tree-interpretation.md                 <-- included (ancestor)
    21-parent-child-links.enabler/
      parent-child-links.md                    <-- included
    43-status-rollup.outcome/                  [TARGET]
      status-rollup.md                         <-- included
      tests/                                   -- not included
    54-tree-status.outcome/                -- not included
  76-cli-integration.outcome/                  -- not included
  87-e2e-workflow.outcome/                     -- not included

The agent sees exactly the context the tree provides. It doesn’t search the codebase for “prior art”; the tree provides the authoritative context deterministically.
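The structural rule can be sketched as a single walk down the target path. This is a simplified model, not the spx CLI: `TreeEntry` and `contextFor` are hypothetical names, the tree is an in-memory object rather than a directory on disk, and a lower-index sibling is assumed to contribute only its own spec file, not its descendants.

```typescript
// One entry in a spec directory. Shape and field names are hypothetical.
interface TreeEntry {
  name: string;            // "15-cli-framework.adr.md" or "43-status-rollup.outcome"
  spec?: string;           // spec file inside a node directory; absent for plain files
  children?: TreeEntry[];  // entries inside a node directory (tests are left out)
}

function entryIndex(name: string): number | null {
  const match = /^(\d+)-/.exec(name);
  return match ? Number(match[1]) : null;
}

// Path of the spec text an entry contributes to the context.
function specPath(dir: string, entry: TreeEntry): string {
  return entry.spec ? `${dir}/${entry.name}/${entry.spec}` : `${dir}/${entry.name}`;
}

// Walk from the root to the target, collecting lower-index siblings and ancestors.
function contextFor(root: TreeEntry, target: string[]): string[] {
  const included = [`${root.name}/${root.spec}`];
  let dir = root;
  let dirPath = root.name;
  for (const segment of target) {
    const entries = dir.children || [];
    const limit = entryIndex(segment);
    const next = entries.filter((e) => e.name === segment)[0];
    if (!next || limit === null) throw new Error(`unknown node: ${segment}`);
    for (const entry of entries) {
      const index = entryIndex(entry.name);
      if (index !== null && index < limit) included.push(specPath(dirPath, entry));
    }
    included.push(specPath(dirPath, next)); // ancestor, or the target itself
    dirPath = `${dirPath}/${next.name}`;
    dir = next;
  }
  return included;
}

const spx: TreeEntry = {
  name: "spx",
  spec: "spx-cli.product.md",
  children: [
    { name: "15-cli-framework.adr.md" },
    { name: "21-test-harness.enabler", spec: "test-harness.md", children: [] },
    {
      name: "54-tree-interpretation.outcome",
      spec: "tree-interpretation.md",
      children: [
        { name: "21-parent-child-links.enabler", spec: "parent-child-links.md", children: [] },
        { name: "43-status-rollup.outcome", spec: "status-rollup.md", children: [] },
        { name: "54-tree-status.outcome", spec: "tree-status.md", children: [] },
      ],
    },
    { name: "76-cli-integration.outcome", spec: "cli-integration.md", children: [] },
  ],
};

console.log(contextFor(spx, ["54-tree-interpretation.outcome", "43-status-rollup.outcome"]).join("\n"));
```

Because the result depends only on names and positions in the tree, two runs against the same tree always yield the same list.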

If the deterministic context payload for a single node routinely exceeds an agent’s reliable working set, the tree is telling you the component is doing too much. The structure forces architectural boundaries.

The operational loop

Three commands form the core loop: inspect state, lock validated nodes, and verify cheaply.

terminal
$ spx status --tree 54-tree-interpretation.outcome
54-tree-interpretation.outcome/    needs work (no lock file)
  21-parent-child-links.enabler/       valid
  43-status-rollup.outcome/            stale
    status-rollup.md changed           (was a3b7c12, now 5e9f1d8)
  54-tree-status.outcome/          needs work (no lock file)

spx status reads the tree and shows node states without running tests. Stale nodes show which file changed and the old vs. new hash.

terminal
$ spx lock 54-tree-interpretation.outcome/43-status-rollup.outcome
Running tests...
  tests/status.unit.test.ts            3 passed
Lock regenerated: 43-status-rollup.outcome/spx-lock.yaml

spx lock runs tests and writes the lock file — only when all pass. If tests fail, the existing lock file stays unchanged.
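The write-only-on-success behavior can be sketched as a pure function from test results to the lock that should be on disk afterwards. This is an illustration, not the CLI's code: `lockNode`, `TestResult`, and the `blobOf` callback are hypothetical, and treating a run with zero tests as not lockable is an assumption the text does not state.

```typescript
interface TestResult {
  path: string;
  passed: boolean;
}

interface LockFile {
  schema: string;
  blob: string;
  tests: { path: string; blob: string }[];
}

// Returns the lock that should exist after the attempt.
function lockNode(
  existing: LockFile | null,
  specBlob: string,
  results: TestResult[],
  blobOf: (path: string) => string
): LockFile | null {
  // ASSUMPTION: a node with no test results has no evidence to lock.
  const allPassed = results.length > 0 && results.every((r) => r.passed);
  if (!allPassed) return existing; // a failed run leaves the existing lock unchanged
  return {
    schema: "spx-lock/v1",
    blob: specBlob,
    tests: results.map((r) => ({ path: r.path, blob: blobOf(r.path) })),
  };
}

const previous: LockFile = {
  schema: "spx-lock/v1",
  blob: "a3b7c12",
  tests: [{ path: "tests/status.unit.test.ts", blob: "9d4e5f2" }],
};

const relocked = lockNode(
  previous,
  "5e9f1d8",
  [{ path: "tests/status.unit.test.ts", passed: true }],
  () => "9d4e5f2"
);

console.log(relocked);
```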

terminal
$ spx verify --tree 54-tree-interpretation.outcome
54-tree-interpretation.outcome/    needs work
  21-parent-child-links.enabler/       valid
  43-status-rollup.outcome/            valid
  54-tree-status.outcome/          needs work

spx verify compares hashes without running tests — cheap verification that the lock still holds.

What changes in practice

Assertions have evidence

Every assertion either has a passing test bound by a current lock file, or it doesn’t. You always know which assertions have evidence behind them and which don’t.
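One way to make this check concrete is to read the assertion list out of a spec and compare each linked test against the lock. The sketch below assumes assertions are top-level list items ending in a `([test](path))` link, as in the line-items example on this page. The function names and the use of a caller-supplied map of current blob hashes are assumptions.

```typescript
interface Assertion {
  text: string;
  test: string | null; // linked test path, if any
}

interface LockFile {
  schema: string;
  blob: string;
  tests: { path: string; blob: string }[];
}

const TEST_LINK = /\s*\(\[test\]\(([^)]+)\)\)/;

// Collect "- " list items, joining indented continuation lines.
function parseAssertions(markdown: string): Assertion[] {
  const items: string[] = [];
  let current: string | null = null;
  for (const line of markdown.split("\n")) {
    if (line.indexOf("- ") === 0) {
      if (current !== null) items.push(current);
      current = line.slice(2).trim();
    } else if (current !== null && /^\s+\S/.test(line)) {
      current += " " + line.trim();
    } else if (current !== null) {
      items.push(current);
      current = null;
    }
  }
  if (current !== null) items.push(current);
  return items.map((item) => {
    const link = TEST_LINK.exec(item);
    return { text: item.replace(TEST_LINK, ""), test: (link && link[1]) || null };
  });
}

// Evidence exists when the lock is current for both the spec and the linked test.
function hasEvidence(
  assertion: Assertion,
  specPath: string,
  lock: LockFile | null,
  currentBlobs: Record<string, string>
): boolean {
  const testPath = assertion.test;
  if (lock === null || testPath === null) return false;
  if (currentBlobs[specPath] !== lock.blob) return false;
  return lock.tests.some((t) => t.path === testPath && currentBlobs[testPath] === t.blob);
}

const spec = [
  "### Assertions",
  "",
  "- A multi-service invoice shows service name, quantity,",
  "  and unit price per line",
  "  ([test](tests/line-items.unit.test.ts))",
  "- Invoice total equals the sum of all line items",
  "  ([test](tests/line-items.unit.test.ts))",
].join("\n");

const assertions = parseAssertions(spec);
console.log(assertions);
```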

Change propagates

Edit a spec and every lock file in its subtree becomes stale — the structure shows you the blast radius before you run a single test.

Consistent context

Agents work from the same context every time — no drift between runs. The path from root to node defines what the agent sees, not what a keyword search happens to return.

Staleness is visible

When the evidence is stale, you see it before anything breaks — not after. A mismatch between the lock file and the current state does not mean behavior changed — it means the evidence needs refreshing.

Build with outcomes, not just features

The Spec Tree is open source. Explore the methodology, try the CLI, or start building with it in your next project.