How I designed a human-in-the-loop AI system that increased study launch rates by 78%, making it easier for researchers to move fast without losing control or rigor.
Launch conversion
Time to launch
Answer richness
Faster insights
Executive summary – 90-second skim
Satellica is an AI-powered user research platform. I led the end-to-end design of an AI-moderated, voice-based survey workflow as founding product designer.
PMs, designers, marketers running studies
Respondents in voice-based sessions
Many users created a study but did not launch, due to three compounding issues.
Make setup a human-in-the-loop decision system. Increase transparency over configurability, improve intake signal quality, and use voice moderation to drive deeper participant responses.
Shipped the core end-to-end flow in 2 weeks with a small team (3 engineers). Prioritized setup → launch conversion and trust-critical review moments. Supported async stakeholder review via export/share artifacts (download + share link), while deferring in-product collaboration (approval states, commenting, co-editing) to the next iteration.
The intersection of AI capability, user trust, and business growth
Satellica is an AI-powered user research platform that helps teams run interviews, surveys, and usability tests at scale. Using AI to moderate conversations, ask adaptive follow-up questions, and synthesize responses in real time, Satellica enables teams to generate insights in hours instead of weeks.
I joined as the founding product designer for the AI Voice Survey lifecycle, owning the end-to-end experience across intake, study planning, recruitment, payment, and launch. The challenge was not simply automation, but designing decision-making with AI so researchers could move fast without losing control or rigor.
Conversational setup that structures intent naturally
Review & control surface for AI-generated study plans
Decision-first configuration for participants
Streamlined checkout with clear next steps
Confident launch with guardrails and recovery paths
Many users could create a study, but never launched
Many users could create a study, but hesitated to launch due to low trust in AI-generated plans and high setup friction. Below, I break the problem into business, user, and system layers to guide the strategy and design decisions that follow.
How might we help researchers generate deeper insights faster while keeping them in control of AI-driven decisions?
What success looks like — for the business and the user
I defined success through two lenses: business outcomes (launch rate, time-to-value, and voice adoption) and experience outcomes (trust, control, and response depth), and used them to guide every design trade-off that followed.
How I approached the problem
I translated the goals into a human-in-the-loop workflow that reduces setup friction without sacrificing rigor, then validated the approach through interviews, usability tests, and early pilot signals to focus the design on the moments that most directly determine launch.
How I made this AI system reliable
I designed the system as a human-in-the-loop workflow with explicit decision rights, layered guardrails, and clear recovery paths, so the AI could accelerate setup without creating fragile or “black box” moments. I then pressure-tested the architecture with prototypes and early pilot usage, tuning the workflow for predictable outcomes under real constraints like model variability, latency, and cost.
The AI drafts a study plan, question structure, and recommended defaults.
The researcher reviews and edits key decisions.
The system enforces guardrails to prevent invalid configurations.
Clear attribution of AI-generated content and what changed.
Avoid leading questions and overconfident conclusions.
Make voice recording expectations explicit and avoid unnecessary data capture.
Do not auto-generate conclusions beyond what the study design can support.
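As a sketch, the guardrail layer above can be expressed as a validator that surfaces recoverable issues instead of hard blocks. All names here (`StudyConfig`, the specific rules) are illustrative assumptions for this write-up, not Satellica's actual schema or implementation:

```python
from dataclasses import dataclass, field


# Hypothetical study configuration; field names are illustrative only.
@dataclass
class StudyConfig:
    questions: list[str] = field(default_factory=list)
    target_participants: int = 0
    voice_consent_shown: bool = False
    auto_conclusions: bool = False


def validate(config: StudyConfig) -> list[str]:
    """Return a list of recoverable issues; an empty list means launch-ready."""
    issues = []
    if not config.questions:
        issues.append("Add at least one question before launching.")
    if config.target_participants < 1:
        issues.append("Set a participant target (minimum 1).")
    if not config.voice_consent_shown:
        issues.append("Enable the voice-recording consent notice.")
    if config.auto_conclusions:
        issues.append("Disable auto-generated conclusions for this study design.")
    return issues


# Consent notice missing → the validator names the fix rather than blocking silently.
issues = validate(StudyConfig(questions=["How do you plan studies today?"],
                              target_participants=25))
```

The design point the sketch captures: each guardrail pairs a detected invalid state with a concrete recovery action, which is what makes the failure states navigable rather than dead ends.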
How I de-risked AI behaviour before launch
I validated both the user experience and the underlying AI behaviour by combining usability testing at trust checkpoints with targeted failure-mode scenarios, then iterating on prompts, guardrails, and recovery actions until users could confidently diagnose issues and move forward.
| AI output | Quality bar | How I evaluated |
|---|---|---|
| Study plan draft | Complete structure, decision-ready, low bias risk, matches stated intent | Human rubric scoring in usability sessions + pilot review of edits and regenerate behaviour |
| Follow-up probes (voice moderation) | Non-leading, specific, increases depth without derailing | Failure-mode scripts (shallow/off-topic/silence) + qualitative review of probe sequences |
| System feedback and guardrails | Explains issues clearly, prevents invalid states, offers recovery paths | Scenario testing on contradictions and edge cases + usability observation of recovery success |
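The failure-mode scripts for follow-up probes can be approximated as simple screening heuristics. The phrase lists and rules below are illustrative assumptions, not the real rubric; in practice, scripted scenarios were combined with qualitative review of probe sequences:

```python
# Illustrative phrase list; the real failure-mode scripts were broader.
LEADING_PHRASES = ("don't you think", "wouldn't you agree", "isn't it true")
CLOSED_OPENERS = ("did ", "do ", "does ", "was ", "were ", "is ", "are ")


def screen_probe(probe: str, min_words: int = 4) -> list[str]:
    """Flag follow-up probes that are leading, too shallow, or closed-ended."""
    flags = []
    lowered = probe.lower()
    if any(phrase in lowered for phrase in LEADING_PHRASES):
        flags.append("leading")
    if len(probe.split()) < min_words:
        flags.append("too shallow")
    if lowered.startswith(CLOSED_OPENERS):
        flags.append("closed (yes/no) question")
    return flags


screen_probe("Don't you think the setup was easy?")   # flagged as leading
screen_probe("What made that step difficult for you?")  # passes all checks
```

Checks like these only catch the mechanical failure modes; judging whether a probe "increases depth without derailing" still required human review.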
How the work moved the metrics that matter
I tied the design decisions back to measurable outcomes across activation, trust, and research quality, using early pilot funnel signals to quantify what improved and where the remaining drop-offs were.
What I’d carry forward
I distilled the work into a set of reusable principles about designing with AI, plus what I would change next time to strengthen trust, quality, and adoption.
Making something easy to use doesn't automatically make it trustworthy. Ease of use and trust require different design interventions: friction reduction for the former, control and transparency for the latter.
This is not just a philosophical principle; it needs to manifest as a clear UI contract: what the AI did, why, what you can change, and how to recover.
Optimized for configurability too early → caused decision fatigue → moved to decision-first hierarchy.
AI felt like a black box → added review checkpoints and explicit controls.
Didn’t design enough for failure → added recovery paths and clearer error states.