Risk Level: 🟠 Danger (if you fail most questions)
MVP SANITY CHECK (n.): A diagnostic test for vibe-coded products that have reached real users. Reveals whether you’re building on solid ground or standing on a trapdoor. Best administered before investors, traffic spikes, or that TechCrunch mention.
After reviewing 12+ vibe-coded MVPs in a single week, we kept seeing the same issues every time we opened the code:
Day 1 database looks fine. Day 15 you’ve got:
- Nullable everywhere
- status, state, is_active, enabled all meaning similar things

The Test:
Can you draw your core tables + relations on paper in 5 minutes?
If not, you’re already in trouble. The AI created schema fragments per-feature without maintaining a coherent whole.
The Fix:
-- Document your actual data model
-- Not what you think it is, what it actually is
\d+ users
\d+ orders
\d+ payments
-- Find the drift
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_name = 'users';
-- Look for: duplicate concepts, missing FKs, no indexes
AI-generated flows assume perfect input order. Real users don’t behave like that.
Most founders don’t notice until support tickets show up.
The Test:
Take your most critical flow. Try to break it in six different ways:
[ ] Double-click every button
[ ] Refresh during async operations
[ ] Complete flow in 2 tabs simultaneously
[ ] Start flow, wait 24 hours, continue
[ ] Use browser back button at every step
[ ] Submit with slow network (Chrome DevTools → Slow 3G)
The Fix:
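One common fix is to make the risky step idempotent, so replays collapse into a single result. A minimal sketch, assuming an Express-style backend; the endpoint, processCheckout, and the in-memory store are illustrative stand-ins for your real code:

// Double-submit protection with idempotency keys (sketch, not production code)
const express = require('express');
const crypto = require('crypto');

const app = express();
app.use(express.json());

// Stand-in for your real checkout logic
async function processCheckout(cart) {
  return { orderId: crypto.randomUUID(), status: 'paid' };
}

// In-memory for illustration; persist keys in your database so retries
// survive restarts and work across multiple server instances
const seenKeys = new Map();

app.post('/api/checkout', async (req, res) => {
  const key = req.get('Idempotency-Key');
  if (key && seenKeys.has(key)) {
    // Double-click, refresh, second tab: replay the original result
    // instead of creating a second order
    return res.json(seenKeys.get(key));
  }
  const result = await processCheckout(req.body);
  if (key) seenKeys.set(key, result);
  res.json(result);
});

app.listen(3000);

The client generates one key per attempt (crypto.randomUUID() in modern browsers) and sends it as the Idempotency-Key header on every retry of that attempt, so the double-click, the refresh, and the second tab all resolve to the same order. The back button and the 24-hour-stale session still need explicit state checks on the server.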
This one kills teams. No logs, no tracing, no way to answer:
“What exactly failed for this user?”
Founders end up re-prompting the AI blindly and hoping it fixes the right thing. It rarely does—most of the time it just moves the bug somewhere else.
The Test:
A user reports “it didn’t work.” How long until you know what actually went wrong?
| Time to Answer | Status |
|---|---|
| < 2 minutes | 🟢 You have observability |
| 2-30 minutes | 🟡 You have logs somewhere |
| > 30 minutes | 🔴 You’re debugging blind |
The Fix:
// Minimum viable observability
const logger = {
  info: (event, data) => console.log(JSON.stringify({
    timestamp: new Date().toISOString(),
    level: 'info',
    event,
    ...data
  })),
  error: (event, error, data) => console.error(JSON.stringify({
    timestamp: new Date().toISOString(),
    level: 'error',
    event,
    error: error.message,
    stack: error.stack,
    ...data
  }))
};
// Use it everywhere
logger.info('checkout_started', { userId, cartId, items: cart.length });
logger.error('payment_failed', error, { userId, amount, provider: 'stripe' });
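The logger answers “what happened”; a request ID is what lets you pull every line for one specific user action. A minimal sketch, again Express-style and reusing the logger above; the middleware and header name are illustrative:

// Tie every log line to one request (assumes the logger defined above)
const express = require('express');
const crypto = require('crypto');

const app = express();
app.use(express.json());

app.use((req, res, next) => {
  // Reuse an upstream ID if a proxy already set one, otherwise mint our own
  req.requestId = req.get('X-Request-Id') || crypto.randomUUID();
  res.set('X-Request-Id', req.requestId);
  next();
});

app.post('/api/checkout', async (req, res) => {
  const { userId, cartId } = req.body;
  logger.info('checkout_started', { requestId: req.requestId, userId, cartId });
  try {
    // ... your existing checkout logic ...
    res.json({ ok: true });
  } catch (error) {
    logger.error('checkout_failed', error, { requestId: req.requestId, userId });
    // The response carries the ID, so a user's screenshot contains the exact
    // string to search the logs for
    res.status(500).json({ ok: false, requestId: req.requestId });
  }
});

With this in place, an “it didn’t work” report becomes a two-minute log filter instead of a guessing game.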
Apps look scalable until you map cost per user action:
| Service | Looks Like | Actually Costs |
|---|---|---|
| Avatar generation | “Free tier” | $0.02/avatar |
| AI completion | “Cheap” | $0.01-0.10/call |
| Media processing | “Just S3” | $0.05/minute video |
| Email + SMS | “Pennies” | $0.01-0.05/message |
Fine at 100 users. Lethal at 10,000.
The Test:
Do you know your cost per active user?
Monthly API costs: $___
Monthly active users: ___
Cost per MAU: $___
At 10x users, monthly cost: $___
Can you afford that? [ ] Yes [ ] No [ ] I don't know
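A worked example with made-up numbers: $400/month in API costs across 2,000 monthly active users is $0.20 per MAU, so at 10x users you’re looking at roughly $4,000/month, and that assumes usage per user doesn’t grow, which it usually does.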
The Fix:
# Log costs, not just calls
# (openai.complete and metrics.increment are placeholders for your actual
# SDK and metrics client)
def call_openai(prompt, user_id, feature):
    response = openai.complete(prompt)
    tokens = response.usage.total_tokens
    cost = tokens * 0.00002  # illustrative per-token price; use your model's real pricing
    metrics.increment('api_cost_cents',
        value=cost * 100,
        tags={
            'provider': 'openai',
            'feature': feature,
            'user_id': user_id
        }
    )
    return response
Then build a dashboard: cost by feature, cost by user, cost trend over time.
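If you don’t have a metrics backend yet, the structured logs from the observability fix can double as the data source: emit a cost field on each API-call event and aggregate it offline. A sketch assuming JSON-lines logs in app.log and a hypothetical api_call event carrying feature and cost_cents fields:

// Aggregate cost by feature from JSON-lines logs (sketch; field names are assumptions)
const fs = require('fs');
const readline = require('readline');

async function costByFeature(path) {
  const totals = {};
  const rl = readline.createInterface({ input: fs.createReadStream(path) });
  for await (const line of rl) {
    let entry;
    try { entry = JSON.parse(line); } catch { continue; } // skip non-JSON lines
    if (entry.event !== 'api_call' || typeof entry.cost_cents !== 'number') continue;
    totals[entry.feature] = (totals[entry.feature] || 0) + entry.cost_cents;
  }
  return totals;
}

costByFeature('app.log').then((totals) => console.table(totals));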
AI touching live logic is the fastest path to “full rewrite” discussions.
Every stable product we’ve seen puts a buffer between the AI and production: changes land on a branch, get tested on staging, and only then ship. Most vibe-coded MVPs do the opposite: the AI edits the live codebase, deploys directly to production, and everyone crosses their fingers.
The Test:
Can you safely change one feature without breaking another?
[ ] Changes go to a branch first
[ ] Branch is tested before merge
[ ] Production has rollback capability
[ ] Feature flags control exposure
[ ] Database migrations are reversible
If most are unchecked, you’re one bad AI suggestion from disaster.
The Fix: Minimum viable separation:
main branch → staging deploy → test → production deploy
     ↑
feature branches (AI works here)
Never let AI commit directly to main. Never deploy untested changes.
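Branches and staging protect the codebase; feature flags (from the checklist above) protect users once a risky change does reach production. A minimal hand-rolled sketch; the flag name, env-var convention, and rollout math are illustrative, and a real flag service works just as well:

// Minimal feature flags: ship the risky path dark, expose it gradually
// FLAG_NEW_CHECKOUT=0.1 → 10% of users, 1 → everyone, unset → nobody
const crypto = require('crypto');

function isEnabled(flag, userId) {
  const rollout = parseFloat(process.env[`FLAG_${flag.toUpperCase()}`] || '0');
  if (rollout <= 0) return false;
  if (rollout >= 1) return true;
  // Hash flag + user so the same user always gets the same answer
  const hash = crypto.createHash('sha256').update(`${flag}:${userId}`).digest();
  return hash[0] / 256 < rollout;
}

// Usage: the AI-generated rewrite stays off until you flip the env var,
// and turning it off again is a config change, not a redeploy
const userId = 'user_123'; // whatever your auth layer provides
if (isEnabled('new_checkout', userId)) {
  // new flow
} else {
  // existing, known-good flow
}

Even this crude version means one bad AI suggestion hits 10% of users instead of all of them, and rolling it back is instant.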
If you’re past validation and want to sanity-check your app:
| Question | Yes | No |
|---|---|---|
| Can you explain your data model clearly? | ✅ | 🚨 |
| Can you tell why the last bug happened? | ✅ | 🚨 |
| Can you estimate cost per active user? | ✅ | 🚨 |
| Can you safely change one feature without breaking another? | ✅ | 🚨 |
Scoring: every 🚨 marks one of the failure modes above; fail most of these questions and you’re squarely in 🟠 Danger territory.
“Should we stabilize early, keep patching, or wait until things break badly enough to justify a rewrite?”
Stabilize early if:
Keep patching if:
Wait for rebuild if:
Most teams wait too long. The best time to stabilize is right after you’ve validated the core value prop, before growth makes every fix urgent.
“AI built your MVP. You decide if it survives.”
The AI optimized for shipping features. It didn’t optimize for:
- A data model that stays coherent
- Flows that survive real users
- Knowing why something failed
- Unit economics
- Deploys you can roll back
These are your job now.
This week:
[ ] Draw your core tables and relations from memory, then compare against the real schema
[ ] Try to break your most critical flow
[ ] Trace one real “it didn’t work” report to its cause, and time yourself
[ ] Calculate your cost per monthly active user
[ ] Check whether the AI can commit straight to main
If the results scare you, that’s good. Scared founders fix things. Comfortable founders get surprised.