apps.jcardena.com: proof, not screenshots — my personal Live Lab

Web

Explore why a personal live lab is crucial for modern architects. Discover the trade-offs between local simulation and a live environment for testing AI and data systems.

There is a gap between a clean architecture diagram and the 3am reality of a production alert. In the diagram, the boxes connect and the arrows flow. In reality, an expired certificate or a misconfigured IAM policy brings it all down. This is the gap where theory breaks and the most durable lessons are learned.

For me, crossing that gap requires a place where the entire system—the code, the infrastructure, the network—can live, breathe, and fail in the wild. That place is my personal live lab. It is not a portfolio of polished successes, but a workshop for finding the failure modes that matter.

Figure: From Idea to Live Proof. Stages shown: Initial Concept, Local Simulation, Deploy to Live Lab, Observe Behavior, Refine Architecture.

When a Live Lab Is Non-Negotiable

A Git repository is a blueprint. A high-fidelity local environment using tools like Docker Compose or a local Kubernetes cluster can build a fantastic model of the building. You can test application logic, component interactions, and even simulate dependencies. For roughly 80% of day-to-day development, this is faster and more efficient than deploying to a live environment.

But some problems only appear at the seams. A live lab isn’t for testing code logic; it’s for testing the system’s interaction with the real world. It becomes necessary when you need to validate:

  • Real Network Latency: How does your system behave when a key API response sometimes takes three seconds instead of 30 milliseconds?
  • Cloud-Native Services: You can't truly test an application's IAM role without talking to the actual cloud provider's identity service. The same goes for CDN caching behavior or firewall rules.
  • Infrastructure Automation: The real test of your infrastructure-as-code is whether it can successfully execute a plan against a live cloud environment. As Martin Fowler describes, this practice is foundational, and it only proves its worth upon execution.

A local setup tests the components. A live lab tests the connections between them, and those connections are where complex systems fail.
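To make the latency question from the first bullet concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the function names, the 30 ms and 3,000 ms figures borrowed from the bullet, the 5% slow fraction, and the 1,000 ms timeout. It models a dependency that is usually fast and occasionally slow, then reports how the median hides what the tail and the timeout count reveal. It does not replace measuring a real network; it only shows what to look for once you do.

```python
import random


def simulate_latencies(n, slow_fraction, fast_ms=30.0, slow_ms=3000.0, seed=0):
    """Return n synthetic latencies; a fraction of calls hit the slow path.

    All numbers are illustrative assumptions, not measurements.
    """
    rng = random.Random(seed)
    return [slow_ms if rng.random() < slow_fraction else fast_ms for _ in range(n)]


def percentile(ordered, pct):
    """Pick a percentile from an already sorted list using integer indexing."""
    index = min(len(ordered) - 1, (len(ordered) * pct) // 100)
    return ordered[index]


def summarize(latencies_ms, timeout_ms):
    """Report median, tail latency, and how many calls exceed the timeout."""
    ordered = sorted(latencies_ms)
    return {
        "p50_ms": percentile(ordered, 50),
        "p99_ms": percentile(ordered, 99),
        "timeouts": sum(1 for value in latencies_ms if value > timeout_ms),
    }


if __name__ == "__main__":
    # Hypothetical scenario: 5% of calls are slow, caller gives up after 1 s.
    sample = simulate_latencies(1000, slow_fraction=0.05)
    print(summarize(sample, timeout_ms=1000.0))
```

The median looks healthy in a scenario like this, which is exactly why a local test with a fast stub rarely exposes the problem.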

An Architecture for Learning

My lab isn't a single application. It's a collection of small, independent services on a shared, realistically configured platform. The goal is to experiment with the composition of different patterns—especially how to orchestrate agentic work with deterministic pipelines.

Everything is automated. The infrastructure is defined with Terraform, and every git push to the main branch triggers a GitHub Actions workflow that builds and deploys. The stack is simple: Cloudflare at the edge for DNS and Workers, a managed container service for most applications, and managed Postgres and object storage for state. It’s not large, but it is a complete, operational environment.

This setup is designed to make the trade-offs of a design decision painfully obvious. For example, I built a service where an LLM agent summarizes articles. In local tests, a synchronous API call seemed fine. In the lab, the variable latency and cost of the agent made that pattern unworkable: a caller blocked on a slow agent either waits or times out. The fix was to re-architect it into an asynchronous, queue-based system, where the request is accepted immediately and the summary is produced by a background worker. This matches established patterns for resilient serverless design, like those outlined in the AWS Architecture Blog. The lab forced this better pattern by revealing the flaws of the simpler one.
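The shape of that change can be sketched in a few lines of Python. This is not the lab's actual code: `summarize_stub`, `submit`, `worker`, and `run_demo` are hypothetical names, and the in-process `queue.Queue` stands in for whatever managed message queue a real deployment would use. The point is only the split between accepting a job quickly and processing it elsewhere.

```python
import queue
import threading


def summarize_stub(text: str) -> str:
    """Hypothetical stand-in for the slow, variable-latency LLM agent call."""
    return text[:40]


def submit(jobs: queue.Queue, job_id: str, text: str) -> str:
    """Enqueue the work and return at once; the caller does not wait."""
    jobs.put((job_id, text))
    return job_id


def worker(jobs: queue.Queue, results: dict) -> None:
    """Drain the queue until a None sentinel arrives."""
    while True:
        item = jobs.get()
        try:
            if item is None:
                return
            job_id, text = item
            results[job_id] = summarize_stub(text)
        finally:
            jobs.task_done()


def run_demo(articles: dict) -> dict:
    """Submit every article, wait for the worker, and return the summaries."""
    jobs: queue.Queue = queue.Queue()
    results: dict = {}
    thread = threading.Thread(target=worker, args=(jobs, results), daemon=True)
    thread.start()
    for job_id, text in articles.items():
        submit(jobs, job_id, text)
    jobs.join()     # block until every submitted job has been processed
    jobs.put(None)  # tell the worker to stop
    thread.join()
    return results


if __name__ == "__main__":
    print(run_demo({"post-1": "A long article body to be summarized."}))
```

In a real service the caller would get the job id back right away and fetch the result later, so a slow agent delays one summary instead of holding an HTTP request open.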

Failure Modes You Can't Simulate

The most valuable output of the lab is breakage. These are not code bugs, but systemic failures that only manifest in a fully integrated deployment. I’ve learned more from outages than from successful deploys.

I’ve watched a minor dependency update in a base Docker image break a CI/CD pipeline. I’ve seen a misconfigured container autoscaling rule trigger a cost spike from a benign web crawler. I had a cloud provider firewall change block database access, taking three services down at once. These are the boring, practical realities of maintaining production systems.
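The autoscaling incident comes down to simple arithmetic, sketched below in Python under loud assumptions: the scaling rule is a simplified target-tracking model, not any provider's actual algorithm, and the traffic figures and price are invented for illustration. It shows how a missing upper bound turns a burst of crawler traffic directly into instance count, and therefore into cost.

```python
import math


def desired_instances(requests_per_sec, target_rps_per_instance,
                      min_instances=1, max_instances=None):
    """Simplified target-tracking rule (an assumption, not a provider API).

    Scale so each instance handles at most target_rps_per_instance,
    then clamp to the configured bounds. max_instances=None means no cap.
    """
    needed = max(min_instances,
                 math.ceil(requests_per_sec / target_rps_per_instance))
    if max_instances is not None:
        needed = min(needed, max_instances)
    return needed


def hourly_cost(instances, price_per_instance_hour):
    """Cost per hour for a given instance count at a flat hourly price."""
    return instances * price_per_instance_hour


if __name__ == "__main__":
    price = 0.05  # hypothetical price per instance-hour
    for label, rps in [("normal", 5), ("crawler burst", 400)]:
        uncapped = desired_instances(rps, target_rps_per_instance=10)
        capped = desired_instances(rps, target_rps_per_instance=10,
                                   max_instances=4)
        print(label, uncapped, hourly_cost(uncapped, price),
              capped, hourly_cost(capped, price))
```

A cap trades availability under genuine load for a bounded bill, which is the kind of trade-off that only becomes visible once real traffic hits a real scaling rule.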

This philosophy of learning from failure is the core of Site Reliability Engineering. The Google SRE book's chapter on embracing risk treats failure not as an error, but as an expected condition that the system must be built to withstand. A live lab is the most effective way I know to find out which of your assumptions are brittle before your customers do.

Building Your Own Proving Ground

This practice isn’t about scale or expense; it's about fidelity. A running system is the only source of truth. It’s where you harden theory into the earned opinions that come from watching things break and then fixing them for good.

Figure: The Live Lab Reference Architecture

  • Edge & Ingress: Public DNS, CDN and WAF, API Gateway
  • Compute & Processing: Container Service, LLM Agent Jobs, Deterministic Pipelines, Secrets Management
  • State & Storage: Managed Database, Object Storage, Message Queue
  • Observability: Structured Logs, Metrics, Distributed Traces

What to Remember

If you build and operate systems, creating your own small patch of the real internet is one of the most powerful tools for professional growth. Here is how to approach it:

  • Start Small, but Complete: A cheap virtual server, a domain name, and a few serverless functions are enough. The goal is a full stack, from DNS to data, not massive scale.
  • Automate Its Existence: Use infrastructure-as-code from day one. The lab itself should be a product of the deterministic automation you want to master.
  • Know When to Use It: Use fast, local environments for component logic. Graduate to the lab to test the seams: networking, cloud service integrations, and deployment pipelines.
  • Treat Failure as the Goal: The purpose is not uptime; it is learning. When something breaks, you have discovered a weakness in your architecture or process. A system that has never failed is a system you do not yet understand.
Juan Cardena
Enterprise Architect, Data & AI

Enterprise architect with 25 years across web, software, data, and AI. MIT CDAO ’25. Writing on agentic AI in production.