Dial-up taught me to respect every kilobyte

The sound is what I remember first: the screech and hiss of a 28.8k modem negotiating a connection. Then came the wait. You’d click a link to a promising image, and a gray box would appear. Slowly, painfully, the image would resolve itself, one horizontal line of pixels at a time. You could literally watch the kilobytes crawl across the wire. It was agonizing. It was also the best lesson in system architecture I ever received.

That wait wasn't an abstract metric on a dashboard. It was a physical experience. You felt the cost of every byte in your own wasted time. A 100 KB JPEG wasn't just a file size; it was thirty seconds of your life you couldn’t get back. This experience forged an intuition that has become more relevant today than ever, even in our world of gigabit fiber and endless cloud resources.
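The thirty-second figure is easy to check with back-of-the-envelope arithmetic. The sketch below assumes a best-case link with no protocol overhead or line noise; the function name is made up for illustration.

```python
def transfer_seconds(size_kb: float, link_kbps: float) -> float:
    """Best-case seconds to move size_kb kilobytes over a link_kbps link."""
    size_bits = size_kb * 1024 * 8            # 1 KB taken as 1024 bytes
    link_bits_per_second = link_kbps * 1000   # modem rates are decimal kilobits
    return size_bits / link_bits_per_second


jpeg_seconds = transfer_seconds(100, 28.8)
print(f"100 KB at 28.8k: {jpeg_seconds:.1f} s")  # prints "100 KB at 28.8k: 28.4 s"
```

Real connections rarely hit the nominal rate, so the actual wait was usually longer.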

The Weight of a Packet

In those days, you learned to think about data payload as a physical constraint, like the width of a pipe or the strength of a bridge. Every decision had consequences. Should this image be a GIF or a JPEG? What’s the lowest acceptable compression level? Could three small icons be combined into a single image sprite to save the overhead of multiple HTTP requests? These weren’t academic questions; they were the difference between a usable site and an abandoned one.

We measured progress in bytes saved. Shaving 10 KB off a page felt like a monumental victory. This mindset forces a kind of ruthless pragmatism. You learn to distinguish between what is essential and what is merely decorative. The core discipline is simple: send only what is necessary, in the most compact form possible. It’s a lesson that applies directly to the architecture of the massive, distributed systems we build today.

The bottleneck has moved from the user’s phone line to the invisible connections between our own microservices, data warehouses, and AI inference endpoints. The symptoms are harder to see, but the costs are just as real.

How Fat Pipes Made Us Lazy

For a while, bandwidth seemed to become a solved problem. As connections got faster and cloud infrastructure became the default, the pressure to be frugal with bytes evaporated. Why spend an hour optimizing an API payload when you could just spin up a bigger instance? Storage is cheap, and networks are fast, right? This line of thinking often prioritizes developer velocity and immediate delivery over long-term efficiency and cost, a trade-off that rarely holds up in production.

This led to a generation of lazy patterns. I've seen it countless times: an API endpoint meant to fetch a user's name and email that instead serializes the entire user object from the database, including fields for permissions, last login location, and authentication tokens. The frontend throws away the 95% of the data it doesn't need. The developer saved ten minutes by writing return user;, but the system pays a tax on that convenience forever, on every single call.
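To make the difference concrete, here is a minimal sketch in Python that compares the two payloads. The User model, its field values, and the UserContact class are all hypothetical, invented for illustration; a real model would likely carry far more fields.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class User:
    # Hypothetical database model; every field here is an assumption.
    id: int
    name: str
    email: str
    permissions: list
    last_login_location: str
    auth_token: str
    preferences: dict


@dataclass
class UserContact:
    # The only two fields this particular consumer asked for.
    name: str
    email: str

    @classmethod
    def from_user(cls, user: User) -> "UserContact":
        return cls(name=user.name, email=user.email)


user = User(
    id=42,
    name="Ada Lovelace",
    email="ada@example.com",
    permissions=["read", "write", "billing", "admin"],
    last_login_location="London, UK",
    auth_token="not-a-real-token-0123456789abcdef",
    preferences={"theme": "dark", "language": "en", "newsletter": False},
)

full_payload = json.dumps(asdict(user))                          # the "return user;" version
lean_payload = json.dumps(asdict(UserContact.from_user(user)))   # only what was requested

print(len(full_payload), "bytes vs", len(lean_payload), "bytes")
```

The lean version is smaller, and it also never puts the token on the wire in the first place.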

Multiply this by thousands of services making millions of calls per day, and you're not just wasting bandwidth. You're wasting CPU on serialization and deserialization. You're increasing latency. You're bloating log files. You're running up a cloud bill for data egress that is many times higher than it needs to be. We traded craftsmanship for short-term convenience and forgot to check the price tag.
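A rough way to size that tax for a single endpoint is sketched below. The payload sizes and call volume are assumed numbers chosen for illustration, not measurements.

```python
def wasted_gb_per_day(full_bytes: int, needed_bytes: int, calls_per_day: int) -> float:
    """Decimal gigabytes moved per day that no consumer ever uses."""
    wasted_bytes = (full_bytes - needed_bytes) * calls_per_day
    return wasted_bytes / 1e9


# Assumed: a 4,000-byte response where 200 bytes would do, 50 million calls a day.
waste = wasted_gb_per_day(full_bytes=4_000, needed_bytes=200, calls_per_day=50_000_000)
print(f"{waste:.0f} GB/day of unused payload")  # prints "190 GB/day of unused payload"
```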

The New Bottleneck in AI and Data

Now, the weight of the kilobyte is back with a vengeance, especially in the converged world of software, data, and AI. The bottlenecks are no longer just about user-facing load times; they are deep within the architecture, and they are punishing.

Consider an agentic AI system. An LLM agent often works in a loop: it receives a state, thinks, and then emits an action and an updated state. This state, or context, is sent back and forth on every turn. If that context is a bloated 200 KB JSON object when a concise 20 KB summary would do, the cost compounds with each turn. Most LLM providers bill per token, so you pay for every extra token. You pay for the network transfer. You add processing latency on both ends. Sloppy data handling can make an otherwise clever agent slow and expensive.
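A quick estimate shows how the loop multiplies the cost. Everything numeric in this sketch is an assumption: roughly 4 bytes per token is a common rule of thumb for English text (JSON often tokenizes worse), and the price per million tokens is a placeholder, not any provider's actual rate.

```python
def loop_input_tokens(context_bytes: int, turns: int, bytes_per_token: float = 4.0) -> int:
    """Total input tokens if a context of the same size is resent on every turn."""
    tokens_per_turn = int(context_bytes / bytes_per_token)
    return tokens_per_turn * turns


def loop_cost_usd(context_bytes: int, turns: int, usd_per_million_tokens: float) -> float:
    """Input-token cost of the whole loop at an assumed flat per-token price."""
    return loop_input_tokens(context_bytes, turns) * usd_per_million_tokens / 1_000_000


# Assumed: 20 turns per task, a placeholder price of $3 per million input tokens.
bloated = loop_cost_usd(200_000, turns=20, usd_per_million_tokens=3.0)
concise = loop_cost_usd(20_000, turns=20, usd_per_million_tokens=3.0)
print(f"bloated: ${bloated:.2f} per task, concise: ${concise:.2f} per task")
```

Under these assumptions the bloated context costs ten times as much per task, before counting the added latency.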

The same is true in data engineering. Shuffling terabytes of uncompressed JSON between stages of a distributed data processing job is ruinously expensive. In my experience, converting that data to a columnar format like Apache Parquet often cuts storage and scan costs by 80% or more for analytical workloads. Why? Two reasons: values of the same type stored together compress far better than repeated JSON keys and strings, and a query engine can read only the columns a query touches instead of every field of every row. It's the dial-up lesson again: don't move bytes you don't need.
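The column-pruning idea can be shown with nothing but the standard library. This is a toy illustration of the principle, not how Parquet is actually implemented (real Parquet adds typed encodings, row groups, and statistics); the dataset and function names are invented for the example.

```python
import json
import zlib

# Invented sample data: 1,000 order records with four fields each.
rows = [
    {"user_id": i, "country": "DE" if i % 2 else "US", "amount": i * 1.5, "note": "order " + str(i)}
    for i in range(1000)
]

# Row-oriented: one compressed blob holding every field of every row.
row_blob = zlib.compress(json.dumps(rows).encode())

# Column-oriented: one compressed blob per column.
column_blobs = {
    name: zlib.compress(json.dumps([r[name] for r in rows]).encode())
    for name in rows[0]
}


def sum_amount_from_rows(blob: bytes) -> float:
    # Must read and parse everything just to reach one field.
    return sum(r["amount"] for r in json.loads(zlib.decompress(blob)))


def sum_amount_from_columns(blobs: dict) -> float:
    # Touches only the one column the query needs.
    return sum(json.loads(zlib.decompress(blobs["amount"])))


bytes_read_rows = len(row_blob)
bytes_read_columns = len(column_blobs["amount"])
print(bytes_read_rows, "bytes read (rows) vs", bytes_read_columns, "bytes read (columns)")
```

Both functions return the same total; the columnar one gets there by reading a fraction of the bytes.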

Reclaiming the Kilobyte Mindset

Adopting this "kilobyte mindset" doesn't mean premature optimization. It means making conscious architectural choices that value efficiency by default. It’s about building systems that are durable and cost-effective, not just functional.

For APIs: Don't just return the database model. Use a dedicated Data Transfer Object (DTO) that contains only the fields the consumer needs. Martin Fowler describes the DTO as an object that carries data between processes to reduce the number of calls; shaping it per consumer also keeps the wire format decoupled from the domain model. Consider technologies like GraphQL when clients have wildly varying data requirements, but remember that a well-designed REST API with field selectors can be just as effective and much simpler.
For Data: Never accept "just dump the JSON" as a long-term strategy. Choose the right format for the job. Use Protobuf or Avro for high-performance RPC between services. Use Parquet or ORC for analytical data at rest. The difference in performance and cost is not incremental; it's transformative.
For AI: Treat the context window of an LLM as the precious, expensive resource it is. Before sending a payload to an agent, aggressively summarize and filter it. Extract only the essential entities and information required for the next step. Every token saved is money and time saved.
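Two of these habits fit in a few lines each. The sketch below shows a REST-style field selector and a context trimmer for an agent loop. The ?fields= query parameter, the helper names, and the shape of the state dictionary are all assumptions made for illustration, not part of any particular framework.

```python
def parse_fields_param(raw: str) -> set:
    """Parse a hypothetical `?fields=name,email` query value into a set of names."""
    return {field.strip() for field in raw.split(",") if field.strip()}


def select_fields(record: dict, fields: set) -> dict:
    """Return only the requested top-level fields that exist on the record."""
    return {key: value for key, value in record.items() if key in fields}


def trim_context(state: dict, keep_keys: list, max_history: int) -> dict:
    """Keep only the keys the next step needs, plus the most recent history entries."""
    trimmed = {key: state[key] for key in keep_keys if key in state}
    if "history" in trimmed:
        # Guard against max_history == 0, since list[-0:] would return everything.
        trimmed["history"] = trimmed["history"][-max_history:] if max_history > 0 else []
    return trimmed


user = {"id": 42, "name": "Ada", "email": "ada@example.com", "auth_token": "secret"}
lean_user = select_fields(user, parse_fields_param("name, email"))

state = {
    "goal": "book a flight",
    "history": ["turn 1", "turn 2", "turn 3", "turn 4"],
    "debug_trace": ["verbose internal log the model never needs"],
}
lean_state = trim_context(state, keep_keys=["goal", "history"], max_history=2)

print(lean_user)
print(lean_state)
```

In a real service the selector would also need an allowlist, so a client cannot request sensitive fields by name. Trimming history by recency is the crudest option; summarizing older turns is a natural next step.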

This discipline forces clarity. When you have to justify every field in a payload, you gain a much deeper understanding of what the system is actually trying to accomplish. It’s a forcing function for better design.

Lean Data Architecture for AI

That agonizing wait for a JPEG to load in 1996 taught me that data has mass. It takes energy to move it. It takes time. Forgetting that lesson is a luxury we can no longer afford. The most robust, scalable, and economical systems will be built by those who, even with a gigabit connection, still respect the kilobyte.