Update: TinyLaunch is now Launchling! Same idea, new name and better experience.
🎯 It started as an experiment — and turned into something more
TinyLaunch began as a personal challenge. I wanted to upskill in AI: both in the tooling and in building AI products.
The idea, an AI-powered tool to help non-technical founders get from fuzzy idea to actionable next steps, felt meta in the best way: I was the target user. I could dogfood every decision.
But once I started using it, I realised… this might actually be useful. Not just to me, but to a whole group of people who want to start small, and smart — without drowning in options, jargon, or tech debt.
And if it was useful, then I had a responsibility to check:
Is TinyLaunch actually helping?
Or is it just another tool that sounds good, but doesn’t deliver?
🧠 Enter: AI-based evaluation
I didn’t just want anecdotal feedback. I wanted structure. So I built a system to evaluate every plan TinyLaunch generates, using AI itself. Now, every plan goes through a second GPT-powered process. Not to generate content — but to score it.
For each plan, I ask GPT to analyse:
| Category | What I’m Measuring |
| --- | --- |
| Tool Fit | Are the suggested tools actually suited to the user’s skills and budget? |
| Help Alignment | Does the plan focus on the areas they asked for help with? |
| Clarity | Is it easy to understand and act on? |
| Personalisation | Does it feel like it was written for them — or anyone? |
The result is a structured scorecard in JSON, plus notes on strengths and weaknesses.
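For illustration, one scorecard might look roughly like this (the field names are placeholders I’ve picked for the example, not necessarily the exact schema):

```json
{
  "tool_fit": 4,
  "help_alignment": 3,
  "clarity": 5,
  "personalisation": 4,
  "strengths": ["Tools match the no-code skill level the user described"],
  "weaknesses": ["Doesn't address the user's request for help validating the problem"]
}
```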
📊 What I’ve Learned So Far
After evaluating 11 real user plans, the average scores are:
- 🧠 Personalisation: 4.36 / 5
- 🛠️ Tool Fit: 4.09 / 5
- ❗ Help Alignment: 3.73 / 5
The strongest area is Personalisation. GPT is doing a great job tailoring output to users’ skills and context. Tool suggestions are solid too, with very few mismatches. But there’s clearly work to do in making sure the advice directly reflects what users said they needed help with.
The most commonly missed needs?
- 💬 Building confidence
- 🛠️ Tech decision-making
- ❓ Problem validation
These evaluation insights now guide how I improve both the prompt and the onboarding experience; they’ve become my north star for the product.
🔧 What I’m Doing About It
| Finding | What I’m Changing |
| --- | --- |
| 📉 Help Alignment lag | Updating the prompt to better tailor advice to the user’s stated help area |
| 🧱 Confidence-building is weak | Adding starter case studies, motivational CTAs, and “Why this is doable” copy |
| 🔧 Tech decisions unclear | Exploring tool comparison links, decision-tree flows, and a “Which tool is right for me?” quiz |
| 🔄 Personalisation strong | Leaving it alone — but keeping an eye on prompt tweaks to make sure it holds up |
This gives me a focused roadmap: tighten the prompt, improve guidance, and protect what’s already working well.
⚙️ How it works
I used:
- Airtable to store user input and plans
- A Python script to export data and send prompts to GPT-4o
- Another script to analyse the results across all users
The evaluator returns structured scores, flags weak spots, and even highlights tools or help areas that feel mismatched.
It also lets me:
- Track improvements over time
- Catch regressions when I tweak the prompt
- Spot patterns in where plans succeed or fall short
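For a flavour of how that fits together, here’s a minimal sketch of the evaluation step, assuming the OpenAI Python SDK, a simplified prompt, and the placeholder score fields from the example above (not the exact production script):

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

EVAL_PROMPT = """You are reviewing a startup plan generated for a user.
Score it 1-5 on tool_fit, help_alignment, clarity and personalisation,
then list strengths and weaknesses. Respond with JSON only."""


def evaluate_plan(user_profile: str, plan: str) -> dict:
    """Send one plan (plus the user's onboarding answers) to GPT-4o for scoring."""
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},  # ask for a JSON scorecard back
        messages=[
            {"role": "system", "content": EVAL_PROMPT},
            {"role": "user", "content": f"User profile:\n{user_profile}\n\nPlan:\n{plan}"},
        ],
    )
    return json.loads(response.choices[0].message.content)


def average_scores(scorecards: list[dict]) -> dict:
    """The second script's job: aggregate scores across all evaluated plans."""
    keys = ["tool_fit", "help_alignment", "clarity", "personalisation"]
    return {k: round(sum(s[k] for s in scorecards) / len(scorecards), 2) for k in keys}
```

Asking the model for `json_object` output keeps every scorecard machine-readable, which is what makes the cross-user averages and regression checks possible.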
🚧 Why bother?
Because I didn’t build TinyLaunch to chase a trend; I built it to learn. And that learning doesn’t come just from building; it comes from stress-testing what you build.
This approach has helped me:
- Improve my prompting (and see when it fails)
- Identify low-signal onboarding questions
- Think harder about trust, quality, and UX for early-stage AI products
- Build a lightweight feedback loop that’s scalable
All while continuing to upskill in Python, data handling, and prompt engineering.
💬 What’s next?
This evaluator is now baked into my process. I’ll be:
- Using it to track improvements as I tweak prompts and onboarding
- Sharing aggregated insights soon (beyond just score averages)
- Possibly exposing a version of the evaluator directly to users (e.g. “Plan Confidence Score”)
And I’m thinking hard about what good looks like here — not just “it runs,” but “it helps.”
✨ Final thought
TinyLaunch started as an excuse to play with tools. Now it’s a playground for learning how to evaluate, iterate, and ship AI products with more thought.
If you’re building with AI — consider not just what it outputs, but how you’ll know if it’s working.
Output is easy. Usefulness is harder.
That’s what I’m hoping to optimise for.
🪂 Try it now
Turn your idea into a tiny, tangible prototype — with a little help from Launchling.
👉 Try Launchling – no signup, no jargon, just a gentle push forward.