3. Validate
Test with real people. Capture behavioural signals. Separate anecdote from evidence.
Validate exposes the artefact to external reality. This mode replaces internal confidence with structured evidence. The objective is not to gather reactions, but to observe behaviour and measure outcomes against predefined success criteria.
Validation may include live user testing, pilot deployments, behavioural telemetry, synthetic agent simulations, or commercial commitment signals such as letters of intent or early revenue.
Validation without structure is just opinion collection with extra steps. The Build Loop gives you a repeatable protocol for running tests that produce trustworthy evidence.
Activities
- Test plan setup — Define validation method (user test, agent test, data simulation). Set success metrics and thresholds.
- Run testing sessions — 5–7 user sessions or 50+ synthetic runs. Capture pain, delight, and confusion signals.
- Data synthesis — AI-assisted clustering of findings. Human sense-checking. Identify what works and what fails.
- Validation review — Present results to stakeholders. Output: validated (or invalidated) hypothesis + learning report.
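The test plan setup step can be sketched as a small data structure that fixes metrics and thresholds before any session runs, so results are compared against criteria rather than interpreted after the fact. This is a minimal illustration, not a prescribed format: the names `Metric` and `ValidationPlan`, the metric names, and every threshold value are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One predefined success metric (name and threshold are illustrative)."""
    name: str
    threshold: float
    higher_is_better: bool = True

    def passes(self, observed: float) -> bool:
        if self.higher_is_better:
            return observed >= self.threshold
        return observed <= self.threshold


@dataclass(frozen=True)
class ValidationPlan:
    """Validation method plus the metrics agreed before testing starts."""
    method: str  # e.g. "user test", "agent test", "data simulation"
    metrics: tuple

    def compare(self, observed: dict) -> dict:
        # A metric with no observation is treated as not met.
        return {
            m.name: m.name in observed and m.passes(observed[m.name])
            for m in self.metrics
        }


# Hypothetical plan and hypothetical observations, for illustration only.
plan = ValidationPlan(
    method="user test",
    metrics=(
        Metric("task_completion_rate", 0.80),
        Metric("median_seconds_to_complete", 120.0, higher_is_better=False),
    ),
)
results = plan.compare(
    {"task_completion_rate": 0.86, "median_seconds_to_complete": 140.0}
)
print(results)
# {'task_completion_rate': True, 'median_seconds_to_complete': False}
```

Writing thresholds down in this form before testing is one way to keep the later validation review focused on measured outcomes rather than on impressions.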
Validation approaches
Human testing (5–7 users) — Best for motivation, emotional response, whether the solution feels right. Qualitative + quantitative evidence.
Agent testing (50+ synthetic runs) — Best for friction points, edge cases, task completion rates. Quantitative evidence at scale.
Both (recommended) — Agents first to catch structural problems, then humans for behavioural and motivational signals.
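As a rough sketch of how agent testing yields quantitative evidence, the snippet below aggregates a batch of synthetic runs into a task completion rate and a ranked list of friction points. The `RunResult` record, the step names, and the run counts are assumptions made up for illustration; a real agent harness would supply its own result format.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunResult:
    """Outcome of one synthetic run (hypothetical record shape)."""
    completed: bool
    failed_step: Optional[str] = None  # step where the run stalled, if any


def summarise_runs(runs):
    """Return (completion rate, friction points ranked by frequency)."""
    if not runs:
        raise ValueError("at least one run is required")
    completed = sum(1 for r in runs if r.completed)
    friction = Counter(
        r.failed_step for r in runs if not r.completed and r.failed_step
    )
    return completed / len(runs), friction.most_common()


# Made-up batch of 50 runs: 41 complete, 6 stall at payment, 3 at signup.
runs = (
    [RunResult(True)] * 41
    + [RunResult(False, "payment")] * 6
    + [RunResult(False, "signup")] * 3
)
rate, friction = summarise_runs(runs)
print(rate)      # 0.82
print(friction)  # [('payment', 6), ('signup', 3)]
```

A ranked friction list like this is the kind of structural finding worth fixing before human sessions, which are better spent on motivation and emotional response.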
Outputs
- Structured evidence report (qualitative + quantitative)
- AI-assisted pattern analysis
- Confidence assessment on problem–solution fit
- Ranked list of insights and risks
- Measured comparison against predefined success metrics
Signal: Evidence Threshold
You can move from Validate to Decide when:
- There is measurable behavioural confirmation (usage, intent, engagement).
- Evidence aligns with predefined success metrics.
- Key risks are reduced to within predefined acceptable thresholds.
- Commercial or operational feasibility is demonstrated.
If evidence is weak, the decision is to pivot or stop, not to proceed.
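The evidence threshold can be read as a simple gate. The sketch below assumes that all four criteria must hold before moving to Decide, which the text implies but does not state outright; the `Evidence` record and the `next_step` function are hypothetical names used only for illustration.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class Evidence:
    """One flag per exit criterion; each is judged by the team beforehand."""
    behavioural_confirmation: bool  # usage, intent, or engagement observed
    metrics_met: bool               # evidence aligns with predefined metrics
    risks_within_threshold: bool    # key risks reduced to acceptable levels
    feasibility_demonstrated: bool  # commercial or operational feasibility


def next_step(evidence: Evidence) -> str:
    # Assumption: every criterion must be satisfied to proceed.
    return "decide" if all(astuple(evidence)) else "pivot or stop"


print(next_step(Evidence(True, True, True, True)))   # decide
print(next_step(Evidence(True, True, False, True)))  # pivot or stop
```

Treating the gate as all-or-nothing is a deliberately strict reading; a team could instead weight the criteria, but that choice should be made before the evidence arrives.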