jcardena.com Blog Learning SQL properly, after years of faking it
145 posts
EN ES

Learning SQL properly, after years of faking it

Software

A senior architect's reflection on learning SQL the right way after years of faking it. Discover why a declarative model is critical for modern data and AI work.

Learning SQL properly, after years of faking it

I wrote my first SQL query nearly two decades ago. For at least fifteen of those years, I was a fraud.

I could get the data I needed. I could stitch together complex JOINs, use a GROUP BY, and filter with a WHERE clause. But I was operating on pure intuition and pattern matching. My mental model was that I was telling the database how to fetch the data, step-by-step. That fundamental misunderstanding was the source of a thousand quiet performance bugs and a constant, low-grade sense of impostor syndrome.

ImperativeGuessworkTell DB how tofetchDeclarativeThinkingDescribe the finalset
The Two Views of SQL

From Imperative Guesswork to Declarative Thinking

My early code was full of SQL built like a weekend Ikea project. I'd write a subquery here, another there, and bolt them together, hoping the structure would hold. I was thinking procedurally, as if writing a for-loop. "First get this list of users, then for each user, go find their posts." This is the mental model of an application developer, and it’s poison when applied to a relational database.

The turning point wasn't a new tool. It was finally reading E.F. Codd's 1970 paper, "A Relational Model of Data for Large Shared Data Banks." That’s when it clicked: SQL is not a language for scripting actions; it's a language for describing a final result set. You don't tell the database the recipe, you show it a picture of the finished meal. The entire job of the database engine—specifically, the query planner—is to figure out the most efficient recipe on its own.

This is the classic trap for many of us, often aided and abetted by Object-Relational Mappers (ORMs). ORMs are fantastic for developer velocity, but their abstraction can hide monstrously inefficient queries. Without understanding the declarative model, you're just generating SQL by remote control, blind to the consequences.

The Query Planner Is Your Real Co-worker

For most of my career, the EXPLAIN command was a magic incantation I'd run when a query was slow. It would spit out a wall of cryptic text, I'd see a "Full Table Scan," and conclude that I probably needed an index. Which index? On which columns? It was pure guesswork.

Learning to actually read a query plan is a non-negotiable skill. For anyone serious about this, I can't recommend Markus Winand's site Use The Index, Luke enough; it's a masterclass in cooperating with the database. You begin to see the story behind the query, understanding why the planner chose a Nested Loop join over a Hash Join, and what that implies about the data's cardinality.

In my experience, this is one of the most effective levers for performance tuning. A key reporting query I once saw took over an hour to run because it used WHERE YEAR(order_date) = 2024. This forced the database to run the function on every single row, making the index on order_date useless. Changing it to WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01' allowed the planner to use an index range scan. The query time dropped to under five seconds. No new hardware, just understanding how to cooperate with the planner.

Beyond 'All or Nothing': Understanding Isolation

If the query planner was a mystery, transaction isolation levels were a dark art. I knew to wrap related writes in a BEGIN and COMMIT block, thinking of it as a simple "all or nothing" mechanism. This is a dangerously incomplete understanding.

The real beast is concurrency. What happens when two processes try to read and write from the same rows at the same time? This is where isolation levels define the trade-offs between performance and consistency. Most databases default to READ COMMITTED, which prevents dirty reads but allows subtle anomalies like non-repeatable reads or phantom reads.

I was once burned by this on a financial ledger system. Two concurrent processes were calculating account balances. One read a balance, the other posted a new transaction, and then the first process read the same balance again and got a different value. The fix was a stricter isolation level, but understanding why required a deep dive into the guarantees the database was making. The official documentation, like the PostgreSQL chapter on Concurrency Control, is far clearer on these trade-offs than a dozen blog posts.

Why This Foundation Matters Now

This isn't just an old-timer's rant about fundamentals. A deep understanding of SQL is more critical now because the walls between software, data, and AI are dissolving.

The deterministic automation of a modern data stack runs on SQL. A poorly constructed dbt model doesn't just run slow; it can blow up a warehouse bill by forcing massive, inefficient data shuffles. The architect of that pipeline must understand the execution cost of their SQL, not just its logical correctness.

In the world of agentic systems, this is even more urgent. An LLM might generate a SQL query that is syntactically correct and returns the right answer, but is an absolute performance disaster. You can't build a reliable system on top of that without guardrails, and you can't build the guardrails if you don't understand the failure modes. This knowledge lets you validate, optimize, or reject the agent's output before it brings production to its knees.

DATA SOURCESApplication DBsEvent StreamsExternal APIsINGESTION & STORAGEETL PipelinesData WarehouseVector DatabasePROCESSING & LOGICDeterministic SQLModelsLLM AgenticWorkflowsData ValidationSERVING LAYERBI DashboardsOperational APIsAgent Tooling
Architecture of a Modern Data System

Concrete Steps to Stop Faking It

Moving from "faking it" to true understanding is a process. The steps are straightforward and have nothing to do with chasing the latest trendy database technology.

  • Internalize the Declarative Model. The next time you write a query, consciously describe the final dataset. Articulate it in plain English first: "I need the set of customers who..."
  • Make EXPLAIN ANALYZE Your Friend. Don't just run it; study it. Pick one slow query and spend an afternoon with it. Tweak it, add an index, remove one, and run EXPLAIN each time to see how the plan changes.
  • Learn Window Functions. If there's one technical SQL feature that separates the novice from the practitioner, it's window functions. They are a superpower for solving a whole class of problems elegantly and efficiently.
  • Respect the Manual. Pick one concept, like join algorithms or isolation levels, and read the official documentation for your database of choice. It's often the most direct path to clarity.

The Takeaway: From Scripter to Architect

The core lesson isn't a list of commands to memorize. It's a fundamental shift in perspective. You move from being a script-writer, dictating steps to a machine, to an architect describing a desired state. That mindset—of declarative thinking, of respecting the boring patterns, of collaborating with the system instead of fighting it—is the durable skill. It's the craft that endures long after the current frameworks have been replaced.

JC
Juan Cardena
Enterprise Architect, Data & AI

Enterprise architect with 25 years across web, software, data, and AI. MIT CDAO ’25. Writing on agentic AI in production.