3. Validate

Test with real people. Capture behavioural signals. Separate anecdote from evidence.

Validate exposes the artefact to external reality. This mode replaces internal confidence with structured evidence. The objective is not to gather reactions, but to observe behaviour and measure outcomes against predefined success criteria.

Validation may include live user testing, pilot deployments, behavioural telemetry, synthetic agent simulations, or commercial commitment signals such as letters of intent or early revenue.

Validation without structure is just opinion collection with extra steps. The Build Loop gives you a repeatable protocol for running tests that produce trustworthy evidence.

Activities

Test plan setup — Define validation method (user test, agent test, data simulation). Set success metrics and thresholds.
Run testing sessions — 5–7 user sessions or 50+ synthetic runs. Capture pain, delight, and confusion signals.
Data synthesis — AI-assisted clustering of findings. Human sense-checking. Identify what works and what fails.
Validation review — Present results to stakeholders. Output: validated (or invalidated) hypothesis + learning report.
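
As a rough illustration of the test plan setup step, success metrics and thresholds can be written down as data before any session runs, so results are compared against criteria fixed in advance. The sketch below is one possible shape, not a prescribed Build Loop format: the class names, metric names, thresholds, and observed values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessMetric:
    """One predefined success criterion (hypothetical structure)."""
    name: str
    threshold: float
    higher_is_better: bool = True

    def is_met(self, observed: float) -> bool:
        # A metric passes when the observed value is on the right side of its threshold.
        if self.higher_is_better:
            return observed >= self.threshold
        return observed <= self.threshold


@dataclass(frozen=True)
class ValidationPlan:
    """Hypothesis, validation method, and the metrics fixed before testing."""
    hypothesis: str
    method: str  # e.g. "user test", "agent test", "data simulation"
    metrics: tuple

    def evaluate(self, observed: dict) -> dict:
        # Compare each metric to its threshold; a metric with no data counts as not met.
        results = {}
        for metric in self.metrics:
            value = observed.get(metric.name)
            results[metric.name] = value is not None and metric.is_met(value)
        return results


# Illustrative plan: every value below is made up for the example.
plan = ValidationPlan(
    hypothesis="New users can complete onboarding unaided",
    method="agent test",
    metrics=(
        SuccessMetric("task_completion_rate", threshold=0.80),
        SuccessMetric("median_minutes_to_complete", threshold=5.0,
                      higher_is_better=False),
    ),
)

observed = {"task_completion_rate": 0.86, "median_minutes_to_complete": 6.5}
results = plan.evaluate(observed)
print(results)
```

Treating a missing measurement as "not met" is a deliberate choice in this sketch: a metric that was never captured cannot count as evidence.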

Validation approaches

Human testing (5–7 users) — Best for motivation, emotional response, whether the solution feels right. Qualitative + quantitative evidence.

Agent testing (50+ synthetic runs) — Best for friction points, edge cases, task completion rates. Quantitative evidence at scale.

Both (recommended) — Agents first to catch structural problems, then humans for behavioural and motivational signals.
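
To make "task completion rates" from synthetic runs concrete, the sketch below reports a completion rate together with an uncertainty range. It uses the Wilson score interval, a standard statistical method chosen here for illustration and not something the Build Loop itself specifies. The function name and the run counts are hypothetical.

```python
import math


def wilson_interval(successes: int, runs: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a completion rate (z=1.96 is roughly 95% confidence)."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    if not 0 <= successes <= runs:
        raise ValueError("successes must be between 0 and runs")
    p = successes / runs
    denom = 1 + z ** 2 / runs
    centre = (p + z ** 2 / (2 * runs)) / denom
    half_width = z * math.sqrt(p * (1 - p) / runs + z ** 2 / (4 * runs ** 2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Illustrative counts only: the same 80% completion rate at two sample sizes.
small_low, small_high = wilson_interval(4, 5)     # a handful of sessions
large_low, large_high = wilson_interval(40, 50)   # a batch of synthetic runs

print(f"5 runs:  {small_low:.2f} to {small_high:.2f}")
print(f"50 runs: {large_low:.2f} to {large_high:.2f}")
```

With the same observed rate, the interval from 50 runs is much narrower than the one from 5. This is one way to see why larger batches of synthetic runs suit rate-style metrics, while a handful of human sessions is better read for motivation and response than for precise percentages.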

Outputs

Structured evidence report (qualitative + quantitative)
AI-assisted pattern analysis
Confidence assessment on problem–solution fit
Ranked list of insights and risks
Measured comparison against predefined success metrics

Signal: Evidence Threshold

You can move from Validate to Decide when:

There is measurable behavioural confirmation (usage, intent, engagement).
Evidence aligns with predefined success metrics.
Key risks are reduced below acceptable thresholds.
Commercial or operational feasibility is demonstrated.

If evidence is weak, the decision is pivot or stop — not proceed.
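
The evidence threshold can be sketched as a simple gate over the four criteria above. This is a minimal illustration under one stated assumption: all four criteria must hold, and any unmet criterion counts as weak evidence. The class, field, and function names are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EvidenceChecklist:
    """The four evidence-threshold criteria as yes/no judgements (hypothetical names)."""
    behavioural_confirmation: bool   # usage, intent, engagement
    metrics_met: bool                # evidence aligns with predefined success metrics
    risks_below_threshold: bool      # key risks reduced to acceptable levels
    feasibility_demonstrated: bool   # commercial or operational feasibility

    def unmet(self) -> list:
        # Names of criteria that are not yet satisfied, in declaration order.
        return [f.name for f in fields(self) if not getattr(self, f.name)]


def recommend(checklist: EvidenceChecklist) -> str:
    # Assumption: every criterion must hold; otherwise the evidence is treated as weak.
    return "proceed" if not checklist.unmet() else "pivot or stop"


# Illustrative case: strong behavioural signals, but one key risk is still open.
checklist = EvidenceChecklist(
    behavioural_confirmation=True,
    metrics_met=True,
    risks_below_threshold=False,
    feasibility_demonstrated=True,
)

print(recommend(checklist), checklist.unmet())
```

Returning the list of unmet criteria alongside the recommendation keeps the gate explainable: stakeholders can see which gap blocks progress, not only that one exists.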
