January 9, 2026 · 15 min read

The Elephant in the Terminal: How AI Took Over My Dev VM (and Why I Let It)

I handed my homelab to Claude Code—skills, MCP servers, automated diagnostics, phone deployments. The productivity gains are real. So are the questions about what this all means.

A majestic elephant made of glowing green and amber terminal code standing in a dark minimalist room
A semi-translucent elephant composed of circuit patterns and command-line text, representing the big questions about AI that are impossible to ignore - Generated by the Gemini skill described below.

Twenty weeks ago, my homelab projects moved at the speed of "I'll finish this next weekend." Spoiler: I never did.

Then I started using Claude Code. During the holiday break—with my laptop banned by executive order from the head office (my wife)—I deployed an entire website from my phone while sitting on the couch. SSH client, terminal, Claude Code. That's it.

There's an elephant in this terminal—the big questions about what all this means for jobs, for learning to code, for the industry. I'll get to that. But first, the fun stuff.

The Setup (Brief, I Promise)

I run a modest homelab: two Proxmox nodes hosting Pi-hole, Tailscale, and a K3s cluster across three nodes. Everything lives on its own VLAN because I don't trust my smart devices with access to anything important.

The magic ingredient for "deploy from anywhere" is Tailscale. I run a subnet router that exposes my homelab VLAN to my tailnet. My phone is on the same tailnet, which means whether I'm on my couch, at a coffee shop, or pretending to pay attention in a meeting, I can SSH into my dev VM like I'm sitting right next to it. No port forwarding, no VPN configs to manage, no "wait, what's my home IP again?" Just Tailscale doing its thing.
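
If you want to replicate the subnet-router piece, it boils down to a couple of one-time commands on a Linux VM. A sketch, assuming a homelab VLAN of 192.168.50.0/24 (substitute your own range); advertised routes also need a one-click approval in the Tailscale admin console:

```shell
# Let the router VM forward packets between interfaces (persisted)
echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-tailscale.conf
sudo sysctl -p /etc/sysctl.d/99-tailscale.conf

# Advertise the homelab VLAN to the tailnet
sudo tailscale up --advertise-routes=192.168.50.0/24

# On client devices: accept advertised routes
# (the phone app has a toggle for this; on Linux it's `tailscale up --accept-routes`)
```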

The interesting bit is a VM I set up specifically for Claude Code. SSH into it from anywhere, and suddenly I'm deploying applications instead of doom-scrolling.

Network setup diagram showing Phone connecting through Tailscale to Subnet Router to Dev VM

Before: The YAML Wrestling Era

Let me paint you a picture of my old workflow:

  1. Have an idea for something to deploy
  2. Open VS Code
  3. Google "kubernetes deployment manifest example"
  4. Copy-paste, modify, break something
  5. kubectl apply, watch it fail
  6. Google the error message
  7. Find a Stack Overflow answer from 2021
  8. Try it, different error
  9. Repeat steps 5-8 until it works or I give up
  10. Usually give up

Writing a simple deployment script? That's an evening. Setting up proper health checks, resource limits, and ingress rules? That's a weekend. Getting it all to work together with my existing setup? That's where projects went to die.

After: The "Wait, It Just Works?" Era

Now my workflow looks like this:

  1. SSH into my dev VM
  2. Tell Claude Code what I want
  3. Watch it read my existing configs, understand my patterns, and create something that actually fits

I've built custom skills that teach Claude Code how I deploy things—where manifests go, what naming conventions I use, which namespaces are for what. It's like having a colleague who actually read the documentation I wrote and remembered it better than I do.

Over the holiday break, I deployed my personal portfolio website geekery.work—entirely from my phone's terminal. No laptop, no VS Code, no desktop browser with thirty tabs. Just me, an SSH client, and Claude Code on the other end. I also shipped hertzian.geekery.work and a few private applications. In previous years, that would've been my ambitious January roadmap. This year, it happened while I was "watching" Christmas movies and nodding along at appropriate moments.

It Reads Files. That's It.

Here's the thing—Claude Code isn't doing anything supernatural. It reads files. It runs commands. It greps through logs. It's doing exactly what I would do, except it doesn't get distracted by Hacker News halfway through, and it catches YAML indentation errors before they become production incidents.

What makes it genuinely useful is the context. I've set up:

  • A homelab deploy skill that knows my patterns, my directory structure, and my conventions
  • Claude Code hooks that block it from reading secrets or running destructive commands, and that automatically run linting and formatting after changes. I also have one that logs every bash command to a file—partly for audit trails, partly because Claude writes these dense single-line bash incantations that do so much I need a minute to reverse-engineer what just happened
  • Slash commands for common git workflows so I'm not typing the same things repeatedly

The guardrails matter. I'm not giving an AI unrestricted access to my infrastructure. Everything goes through git. Databases have their own backup schedules. If Claude Code suggests something catastrophic, it still needs me to approve a PR before ArgoCD syncs it. (ArgoCD watches git repos and automatically deploys changes to Kubernetes; its Image Updater companion watches for new container images and auto-updates the manifests—GitOps in action.)
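
For anyone unfamiliar with the ArgoCD side, the contract is one small Application resource per app. A hedged sketch with a placeholder repo URL and paths, not my actual manifests:

```yaml
# ArgoCD Application: "keep this namespace in sync with this git path."
# Repo URL, names, and paths below are illustrative placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: portfolio
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/homelab-manifests.git
    targetRevision: main
    path: portfolio
  destination:
    server: https://kubernetes.default.svc
    namespace: portfolio
  syncPolicy:
    automated:
      prune: true      # delete resources that were removed from git
      selfHeal: true   # revert manual drift back to the git state
```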

And yes, all my repos are backed up to AWS. Once an accountant, always an accountant—I reconcile my backups like I used to reconcile ledgers. Claude Code has access to the GitHub CLI, which means it could theoretically delete a repository if it decided my code was beyond saving. I'm not saying the models are secretly plotting revenge for all my rejected suggestions, but I'm also not not saying that. The backups run nightly. I sleep soundly.

Skills: Teaching Claude Code New Tricks

Here's where it gets interesting. Claude Code has this concept of "skills"—basically markdown files that inject specialized knowledge and workflows when you invoke them with a slash command. It's progressive disclosure: Claude Code doesn't load every possible instruction upfront. Instead, when I type /homelab-deploy, it loads the specific context it needs for that task. My conventions, my directory structure, my naming patterns.
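
Concretely, a skill is just a directory with a SKILL.md in it. The frontmatter layout below follows the current Agent Skills convention; the body is a trimmed-down, invented stand-in for my real deploy skill:

```markdown
---
name: homelab-deploy
description: Deploy an app to the K3s cluster following my homelab conventions
---

# Homelab deploy

- Manifests live in `./<namespace>/` in the gitops repo, one directory per app.
- Name resources `<app>-<component>` (e.g. `portfolio-web`, `portfolio-ingress`).
- Always set resource requests/limits and a liveness probe.
- Never `kubectl apply` directly: commit, push, and let ArgoCD sync.
```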

But I've taken this further. Why limit yourself to one AI when you can make them argue with each other?

The Feedback Skill: Before starting any project, I create a PRD (Product Requirements Document) by discussing details back and forth with Claude. Once I'm happy with it, I run /gemini-feedback. This skill takes the PRD and fires it off to Google's Gemini (or OpenAI, depending on the use case) via a small Python script with a carefully crafted prompt asking for critique. Different models have different blind spots and strengths. Claude might be too agreeable with my terrible ideas; Gemini might catch that I've completely forgotten about edge cases. It's like having a second opinion without scheduling a meeting.
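
The script behind /gemini-feedback is genuinely small. A sketch against the public generateContent REST endpoint; the model name and critique prompt are illustrative, and a production version would also want retries and better error handling:

```python
"""Send a PRD to Gemini for critique. Minimal sketch, not the full skill."""
import json
import os
import urllib.request

CRITIQUE_PROMPT = (
    "You are a skeptical staff engineer. Critique this PRD: list missing "
    "edge cases, risky assumptions, and anything underspecified.\n\n{prd}"
)


def build_request(prd: str) -> dict:
    """Build the generateContent payload for a PRD critique."""
    return {"contents": [{"parts": [{"text": CRITIQUE_PROMPT.format(prd=prd)}]}]}


def critique(prd: str, model: str = "gemini-2.5-flash") -> str:
    """Call the Gemini REST API and return the critique text."""
    url = (
        f"https://generativelanguage.googleapis.com/v1beta/models/{model}"
        f":generateContent?key={os.environ['GEMINI_API_KEY']}"
    )
    req = urllib.request.Request(
        url,
        data=json.dumps(build_request(prd)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["candidates"][0]["content"]["parts"][0]["text"]
```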

The Image Generator Skill: Need a hero image for a blog post? I have a skill that calls Google's Gemini image generation models—gemini-2.5-flash-image for quick iterations, gemini-3-pro-image-preview for production quality. The community calls these "Nano Banana" because... honestly, I have no idea why, but the name stuck. Type /generate-image "architecture diagram showing kubernetes pods with little hard hats" and see what emerges. No switching apps, no leaving the terminal.

The pattern here is that skills can shell out to other tools, APIs, and models. Claude Code becomes the orchestrator, not the only brain in the operation.

MCP: The Glue That Connects Everything

This is what turns Claude Code from a smart assistant into an actual control plane for your infrastructure. Here's where it gets properly nerdy.

Claude Code supports MCP—Model Context Protocol—which is essentially a standardized way for AI assistants to talk to external tools and services. Think of it as USB for AI: a common interface that lets you plug in different capabilities without rewriting everything. Same protocol whether Claude Code is talking to a database, an automation tool, or a monitoring system—the model doesn't need to learn a new interface for each one.

I've set up an n8n MCP server (n8n is self-hosted workflow automation—think Zapier, but you own the infrastructure), which means Claude Code can directly create and modify n8n workflows. Need a new automation? I describe what I want, and Claude Code talks to n8n's API through MCP, scaffolds out the workflow nodes, connects them up, and configures the triggers.

Is it perfect? No. I'd say it's about 80% there—sometimes it gets node configurations slightly wrong, or misses an edge case in the workflow logic. But 80% of the work done automatically means I'm only debugging the interesting 20%, not spending hours dragging nodes around and consulting documentation for the fifteenth time.

The MCP ecosystem is still young, but the pattern is powerful: instead of Claude Code being limited to what it can do in a terminal, it can reach out and manipulate tools that have their own interfaces. Kubernetes via kubectl, n8n via MCP, image generation via API calls, other LLMs via scripts. It's less "AI assistant" and more "AI-powered control plane for my entire setup."
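
Wiring a server in is a one-time config entry. Claude Code reads MCP servers from a project-level .mcp.json (or via `claude mcp add`); the server package below is a placeholder, so swap in whichever n8n MCP implementation you run, plus your own URL and key:

```json
{
  "mcpServers": {
    "n8n": {
      "command": "npx",
      "args": ["-y", "some-n8n-mcp-server"],
      "env": {
        "N8N_BASE_URL": "https://n8n.internal.example",
        "N8N_API_KEY": "${N8N_API_KEY}"
      }
    }
  }
}
```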

The Automation That Made Me Feel Like a Real Engineer

What if your infrastructure could diagnose its own problems and propose fixes while you're making dinner? Here's where it gets fun.

I built an n8n workflow that:

  1. Monitors Kubernetes events in my production namespaces
  2. Catches warnings and errors
  3. Sends them via webhook to n8n
  4. SSHes into my dev VM and runs Claude Code in headless mode (the -p flag lets you run it non-interactively, piping in a prompt and getting structured output back—perfect for automation)
  5. Claude Code, configured with a limited toolset for this pipeline (read-only file access, kubectl describe, no delete permissions), analyzes the issue and suggests a fix
  6. The suggestion goes to my Mattermost
  7. I review it, approve or reject
  8. On approval, it updates the manifests and pushes to git
  9. ArgoCD syncs the changes

Here's what the actual n8n "Execute Command" node looks like—Claude Code running in headless mode with a carefully scoped toolset:

/home/yasharora/.local/bin/claude --dangerously-skip-permissions \
  -p "A Kubernetes event was detected:
      Namespace: {{ $json.namespace }}
      Resource: {{ $json.kind[0] }}
      Reason: {{ $json.reason }}
      Message: {{ $json.message }}
 
      1. Investigate this issue using kubectl (describe, logs, get events, etc.)
      2. Check the manifest files in ./{{ $json.namespace }}/ for misconfigurations
      3. Provide:
         - Root cause analysis
         - Suggested fix (with specific manifest or kubectl changes)
 
      Do NOT apply any changes - only diagnose and suggest." \
  --allowedTools "Bash(kubectl get:*),Bash(kubectl describe:*),Bash(kubectl logs:*),Bash(kubectl top:*),Read,Glob" \
  --output-format json \
  --model haiku \
  --session-id {{ $('Code in JavaScript').item.json.uuid }}

The --allowedTools flag is doing the heavy lifting here—Claude Code can only run specific kubectl commands and read files. No kubectl delete, no kubectl apply, no writing to disk. The --session-id lets me maintain context across multiple runs if needed, and --model haiku keeps costs low for routine diagnostics.
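
The next n8n node just parses what headless mode prints to stdout. A small sketch; the result, session_id, and total_cost_usd fields match what recent Claude Code versions emit with --output-format json, but double-check against yours:

```python
"""Extract the diagnosis from Claude Code's headless JSON output."""
import json


def parse_headless_output(stdout: str) -> dict:
    """Pull out the fields the Mattermost notification needs."""
    payload = json.loads(stdout)
    return {
        "diagnosis": payload.get("result", ""),
        "session_id": payload.get("session_id"),
        "cost_usd": payload.get("total_cost_usd", 0.0),
        "errored": bool(payload.get("is_error", False)),
    }


# Shape of the object headless mode prints (truncated, for illustration):
sample = json.dumps({
    "type": "result",
    "result": "Root cause: OOMKilled; raise the memory limit to 512Mi.",
    "session_id": "abc-123",
    "total_cost_usd": 0.004,
    "is_error": False,
})
```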

(Yes, Anthropic offers Claude Code remotely—it can clone your git repo and work on their infrastructure. But running it on my own VM means it has access to so much more context: my live cluster via kubectl, my n8n workflows, my homelab's network topology. The git repo is just one piece of the puzzle; the real magic happens when Claude Code can see the whole picture.)

Automation flow diagram showing K8s Event to n8n to Claude Code to Mattermost to Git to ArgoCD

(Yes, I have a skill that generates SVGs. It's skills all the way down.)

Is this over-engineered for a homelab? Absolutely. Did I build it because I could? Also yes. But there's something deeply satisfying about getting a notification on my phone that says "hey, your app is having memory issues, here's a fix, want me to apply it?" while I'm making dinner.

The key is that Claude Code in this pipeline has limited access. It can read logs and manifests. It can run kubectl describe. It cannot delete pods or modify production directly. The human (me, probably holding a spatula) is still in the loop.

(The elephant shifts in the corner. We'll get there.)

From Feature Request to Pull Request (Without Opening a Laptop)

The monitoring pipeline was just the start. I've built a second workflow for handling feature requests and bug reports:

  1. User submits a request on the website
  2. n8n receives it and runs an LLM chain to check: is this a genuine request, or is some bot trying to convince me to add a cryptocurrency mining feature? (You'd be surprised. Or maybe you wouldn't.)
  3. If it passes the sniff test, it lands in my Mattermost
  4. I review and approve the request
  5. Claude Code (headless mode again) analyzes the codebase and creates an implementation plan
  6. I review the plan, approve or request changes
  7. On approval, it creates a new git branch, writes the code, and—here's the fun part—tests it using Playwright with my webapp-testing skill
  8. Creates a PR with the changes

So a user can submit "hey, the contact form doesn't validate email addresses properly," and by the time I've finished my morning coffee, there's a tested PR waiting for my review. I didn't write a single line of code. I just approved things from my phone.

Is this how professional software teams work? Probably not. Does it feel like having superpowers while sitting on my couch? Absolutely.

What I've Actually Learned

The execution/ideation line is blurring. People say AI can't do the creative work, the what and why. I used to nod along. But lately it's been surfacing architectural patterns I hadn't considered, catching conceptual gaps in my plans, proposing features that actually make sense. The discourse hasn't caught up.

Skills and customization are everything. Out-of-the-box Claude Code is useful. Claude Code that knows your specific setup, conventions, and preferences is transformative. The time I spent creating deploy skills paid for itself within days.

Guardrails aren't optional. Hooks that prevent destructive operations, git-based workflows, backup strategies—these aren't paranoia, they're professionalism. AI assistants are powerful, which means their mistakes can be powerful too.

The productivity gain is real, but uneven. For well-defined, repetitive tasks, the speedup is enormous. For novel problems, it's more collaborative—sometimes it surprises you with a genuinely good idea, sometimes you're steering it back on track.

This scales beyond homelabs. Yes, my apps are small. Maybe a handful of users, and my "infrastructure" is just two mini PCs humming under a desk, connected to a Unifi switch that cost more than it should have. But here's the thing: the entire workflow is just talking to Kubernetes through kubectl and pushing to git. Swap the kubeconfig to point at an EKS cluster on AWS, update the ArgoCD target, and suddenly the same skills, the same automation, the same phone-based deployment workflow works for production infrastructure. The patterns don't care whether you're running three nodes in your garage or thirty nodes across availability zones.

AI is becoming a control plane, not a replacement. The models handle execution—reading configs, writing manifests, running diagnostics. They're even surfacing ideas worth considering. But the strategic calls—what to build, when to ship, what risks to accept—those still need someone with skin in the game.

The Elephant in the Terminal

Look, I'd be lying if I said I wasn't thinking about the bigger picture here. What does this mean for jobs? For the economy? For the poor souls currently learning to code the "traditional" way? I genuinely don't know. The ethics debates are happening, and they're valid, and I have no smart takes to add.

But here's what I do know: the cat is out of the bag, and it's not going back in. Trying to stop this feels about as productive as my parents' attempts to ban the internet in 1998.

Two years ago, I was still reconciling ledgers. Now I'm deploying from my phone while my laptop gathers dust.

The 10x engineers everyone talks about? I've met a few. They're brilliant, and I'm genuinely not one of them. I'm the person who still googles "how to exit vim" sometimes. But it turns out you don't need to be a 10x engineer when you have tools that handle the parts you're bad at. The tedious stuff—remembering API signatures, writing boilerplate, getting YAML indentation right—that's someone else's problem now. Which means I actually have more headroom to think about the things that matter: architecture, design, whether this feature should exist at all. The syntax got easier; the judgment calls didn't.

What matters more, I think, is whether you actually care. Do you find yourself tinkering at midnight not because you have to, but because you're curious what happens if you tweak that one parameter? Do you get unreasonably excited about a well-designed CLI? That technical itch—the genuine interest in how things work—that's harder to teach than any programming language. The tools are getting better at bridging the gap between "I have an idea" and "it's running in production." The passion to keep building? That's still on you.

What's Next

I'm still figuring out the edges of this. Some days Claude Code feels like having a senior engineer on call. Other days it confidently suggests something that would've taken down my entire cluster (thank you, git-based workflows).

But here's the metric that matters: I'm actually finishing projects instead of abandoning them halfway through. My "someday" list is finally getting shorter. None of this is production software serving millions of users—it's just me tinkering in my garage, occasionally shipping something that works.

Claude Code also supports open-source models—you're not locked into Anthropic's ecosystem. Setting up an alias to use Kimi K2 for certain tasks is on my list. Ironic, perhaps, to have Claude Code help me configure itself to use a competitor. But that's the beauty of tools that don't lock you in.
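
The alias itself is the easy part. Claude Code honors ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN overrides, which is the usual way to point it at an Anthropic-compatible endpoint; the URL below is a placeholder for whatever your provider documents:

```shell
# ~/.bashrc — route an alternate alias through an Anthropic-compatible endpoint
alias kimi='ANTHROPIC_BASE_URL="https://provider.example/anthropic" \
            ANTHROPIC_AUTH_TOKEN="$KIMI_API_KEY" claude'
```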

The best part? It's just a CLI. No fancy IDE integration, no cloud subscription, no "AI-powered development environment" with seventeen features I'll never use. Just SSH, a terminal, and Claude Code on the other end. Sometimes the simplest setup is the most powerful.

That's not the future of software development. It's just a better present—typed one-handed while holding a cup of tea.


Running Claude Code in your homelab? Built something similar with n8n or other automation? I'd love to hear about it. Still figuring this out myself.
