The Hitch-Hiker's Guide to Vibe Engineering

A humorous yet practical guide to AI-assisted development. DON'T PANIC.

View the Project on GitHub HermeticOrmus/hitchhikers-guide-to-vibe-engineering

Chapter 9: The Disaster Area

“Disaster Area was a plutonium rock band from the Gagrakacka Mind Zones. Their concerts were so loud they were performed in total vacuum. Their songs made grown men weep, planets crack, and insurance actuaries faint.”

Vibe-coded projects that reach paying customers produce similar effects on infrastructure teams.


The Success Paradox

Here’s a story that happens every week:

A founder hits 1,000 paying users. The app feels snappy. Customers love it. But the backend is one deploy away from a meltdown.

The vibes were good. The code worked. Then reality arrived.

This chapter is about surviving success—the moment when “it works for now” meets “now is no longer enough.”


The Twelve Warning Signs

1. The Database Grew Teeth

Tables that started clean now have six boolean flags called is_done, is_finished, is_complete. Same idea, different sprint, different AI session.

Queries run full table scans because no one added indexes since day 3.

The Check:

-- PostgreSQL: Find slow queries
SELECT query, calls, mean_time, total_time
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 10;

If the same SELECT runs 800ms, you’re already in the danger zone.

The Fix:


2. Env Files Hold Secrets That Should Never See Git

Stripe keys, OpenAI tokens, JWT secrets—all sitting in .env.example ready to be copied to the next intern’s laptop.

The Check:

# Find secrets that might be committed
git log -p | grep -i "sk_live\|api_key\|secret"

The Fix:


3. Background Jobs Share the Same Server as Web Traffic

One user uploads a 50MB CSV and the whole sign-up flow slows to a crawl. The AI didn’t think about this because it was solving your immediate problem.

The Check:

The Fix:


4. You Have No Idea Which API Call Costs the Most

OpenAI, Stripe, SendGrid, Twilio—external calls add up. But which user actions trigger them? Which ones cost dollars per request?

The Check: When an investor asks “what happens at 10x users?” can you answer with numbers or hope?

The Fix:

# Log every external call with cost
def call_openai(prompt, user_id):
    start = time.time()
    response = openai.complete(prompt)
    duration = time.time() - start
    tokens = response.usage.total_tokens
    cost = tokens * 0.00002  # adjust for model

    logger.info(f"openai_call",
        user_id=user_id,
        duration_ms=duration*1000,
        tokens=tokens,
        cost_cents=cost*100
    )
    return response

Map every external call to a user action. Log duration + cost. Now you can model scale.


5. Tests Exist But They Never Run on Prod Data

Seed scripts are cute until real users create edge cases the AI never imagined:

The Check: Run your test suite against a snapshot of production data (anonymized). Watch it burn.

The Fix:

# Schedule this nightly
jobs:
  prod-data-tests:
    runs-on: ubuntu-latest
    steps:
      - name: Clone anonymized prod subset
        run: ./scripts/clone-anon-prod.sh
      - name: Run tests against real data
        run: npm test

Catches the weird null birthday bug before it hits Twitter.


6. Deploys Are Still Manual and Scary

If you SSH and git pull main, you will eventually:

The Check: How many steps is your deploy process? If it’s more than “merge PR and wait,” it’s too many.

The Fix:

# .github/workflows/deploy.yml
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run build
      - run: npm run migrate
      - run: ./scripts/deploy-blue-green.sh

7. One Big Repo Holds Everything

User app, admin dashboard, landing page, blog—all in one repo. The AI kept everything together because that’s what your prompts implied.

The Check: Can marketing update the blog without risking user authentication?

The Fix:


8. No Circuit Breakers Around Third Parties

When SendGrid hiccups, your sign-up flow shouldn’t 500. When Stripe is slow, checkout shouldn’t timeout.

The Check:

# Simulate third-party failure
# Block sendgrid.com in /etc/hosts, try to sign up

The Fix:

const sendEmail = circuitBreaker(async (to, subject, body) => {
  return await sendgrid.send({ to, subject, body });
}, {
  timeout: 5000,
  errorThreshold: 50,
  resetTimeout: 30000,
  fallback: () => {
    // Queue for retry, show polite toast
    queue.add('email', { to, subject, body });
    return { queued: true };
  }
});

Users get a polite toast instead of a white screen.


9. Logs Are Noise, Not Signal

Printing console.log("here") in every catch block is not observability.

The Check: A payment fails. How long does it take you to find out why?

The Fix: Add one request-id header that follows the call through every service:

app.use((req, res, next) => {
  req.requestId = req.headers['x-request-id'] || uuid();
  res.setHeader('x-request-id', req.requestId);
  next();
});

// Every log includes it
logger.info('payment_started', {
  requestId: req.requestId,
  userId: req.user.id,
  amount: payment.amount
});

When a payment fails, you trace it in ten seconds.


10. You Measure Uptime But Not Latency

A 200 response that takes 4 seconds still kills conversion. The site is “up” but users bounce.

The Check: What’s your 95th percentile response time? If you don’t know, you don’t know.

The Fix: Set a simple SLA: 95th percentile under 600ms. Alert when it drifts.

// Track and alert on latency
app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    const duration = Date.now() - start;
    metrics.histogram('http_request_duration_ms', duration);

    if (duration > 600) {
      logger.warn('slow_request', {
        path: req.path,
        duration_ms: duration
      });
    }
  });
  next();
});

Small fixes—eager loading, missing index, N+1 query—usually win you 8%+ activation improvement.


11. Feature Flags Live Only in Your Head

Ship a dark mode? Hope nobody notices if it’s broken. Test new pricing? Hard-coded values, full deploy to change.

The Check: Can you turn off a feature without deploying?

The Fix:

// Simple feature flags
const flags = {
  darkMode: process.env.FLAG_DARK_MODE === 'true',
  newPricing: process.env.FLAG_NEW_PRICING === 'true',
};

// In code
if (flags.darkMode) {
  applyDarkMode();
}

Or use LaunchDarkly, Flipper, Unleash for more control.

Flags let you deploy on Thursday and turn stuff on Monday after weekend metrics look calm.


12. The README Still Says “run npm i”

The Check: Time a new developer from git clone to “signed up a test user.” Over 30 minutes? You’re the only person who can release.

The Fix:

# docker-compose.yml
services:
  app:
    build: .
    ports: ["3000:3000"]
    depends_on: [db, redis]
    environment:
      - DATABASE_URL=postgres://...
      - STRIPE_KEY=sk_test_fake

  db:
    image: postgres:15
    volumes: ["./seed.sql:/docker-entrypoint-initdb.d/seed.sql"]

  redis:
    image: redis:7
# QUICKSTART
git clone && docker-compose up
# Visit localhost:3000, sign up with test@test.com
# Stripe test card: 4242424242424242

Anyone can checkout and sign up a test user in two commands.


The Pre-Scale Checklist

Before your investor demo, traffic spike, or Product Hunt launch:

Database Health

Security

Infrastructure

Resilience

Observability

Developer Experience

Operational Maturity


The 29-Day Transformation

A real story: vibe-coded MVP to investor-grade SaaS in 29 days.

Week 1: Foundation

Week 2: Reliability

Week 3: Operations

Week 4: Polish

The founder closed a pre-seed two weeks after demo day because the product looked stable, not lucky.


The Street Rule

“Vibes get you to market. Systems get you to scale. Know when to switch.”


The Disaster Area’s Lesson

Disaster Area’s concerts destroyed planets. But they were also wildly successful—the band just needed better infrastructure to contain the power.

Your vibe-coded MVP has power. Customers love it. It works.

Now build the infrastructure to contain it.

Because success without stability isn’t success—it’s a meltdown waiting for an audience.


Return to Chapter 1: DON’T PANIC and remember: if your production system is on fire, panic won’t help. This checklist might.