
Where Do All Your Try-Catch Blocks Go Wrong?
Developers write an estimated 100 billion lines of code annually—yet error handling remains the most misunderstood and inconsistently implemented aspect of software engineering. While everyone agrees exceptions need catching, few teams agree on where to catch them, what to do once caught, or how to prevent them from swallowing critical debugging information. This post examines the patterns that separate resilient applications from the ones that fail silently at 3 AM.
Why Does Swallowing Errors Feel Tempting (And Costly)?
The empty catch block is the original sin of defensive programming. You have seen it—maybe written it yourself. A try-catch wrapped around a flaky API call, the catch block containing nothing but a mental note to "handle this later." Later never comes. The application continues running, but data silently corrupts, users see stale information, and your logs contain exactly zero evidence of what went wrong.
This pattern persists because error handling feels like overhead. When sprint deadlines loom, handling edge cases competes with shipping features. The catch block becomes a graveyard for "this shouldn't happen" scenarios—except they do happen, usually in production, usually when the original author is on vacation.
Modern languages and runtimes give us better tools than ever. Structured logging, distributed tracing, and error tracking services like Sentry or Honeycomb transform error data from noise into actionable intelligence. The key is treating errors as first-class data—not interruptions to suppress but signals to instrument.
What's Wrong with Throwing Generic Exceptions?
JavaScript's Error. Java's Exception. Python's bare Exception. Generic exceptions are the junk food of error handling—quick, easy, and unsatisfying. When every failure throws the same base type, calling code cannot distinguish between "network timeout" and "database constraint violation" without parsing string messages. That's fragile.
Custom error types pay dividends. In TypeScript, extending the Error class lets you attach metadata—HTTP status codes, user-facing messages, retry eligibility. Rust's Result type forces explicit handling at compile time. Go's error wrapping (introduced in Go 1.13) preserves context chains without the verbosity of previous approaches.
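Here is a minimal sketch of that TypeScript approach. The class names (AppError, PaymentTimeoutError, ValidationError) and fields (statusCode, retryable) are illustrative, not from any particular framework:

```typescript
// Base error carrying metadata that calling code can branch on.
class AppError extends Error {
  constructor(
    message: string,
    public readonly statusCode: number,
    public readonly retryable: boolean,
  ) {
    super(message);
    this.name = new.target.name; // keep the subclass name in logs
    Object.setPrototypeOf(this, new.target.prototype); // instanceof works even on older compile targets
  }
}

class PaymentTimeoutError extends AppError {
  constructor() {
    super("Payment gateway timed out", 504, true);
  }
}

class ValidationError extends AppError {
  constructor(public readonly field: string) {
    super(`Invalid value for field "${field}"`, 400, false);
  }
}

// Callers branch on type instead of parsing message strings.
function classify(err: unknown): string {
  if (err instanceof PaymentTimeoutError) return "retrying";
  if (err instanceof ValidationError) return `fix field ${err.field}`;
  return "unknown failure";
}
```

The `instanceof` checks replace string matching, and the `retryable` flag lets infrastructure code decide whether to retry without knowing every error subtype.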
The rule is simple: if different errors warrant different responses, they warrant different types. A payment gateway timeout should trigger automatic retry logic. A validation failure should return immediately with field-level feedback. Conflating these into one generic "something broke" exception forces downstream code to guess—or worse, guess wrong.
How Many Try-Catch Blocks Are Too Many?
Codebases swing between two extremes: exception phobia (every function wrapped in defensive catches) and exception promiscuity (throws bubbling up ten call layers). Neither serves maintainability. The former obscures business logic under layers of error scaffolding. The latter creates debugging archaeology—tracing an exception back through six abstraction layers to find the actual failure site.
The boundary principle offers guidance. Catch at layer boundaries: API controllers, service entry points, job workers, event handlers. Within a layer—especially within a single transaction or logical unit—let exceptions propagate. Your data access code should not catch exceptions it cannot meaningfully handle. Your HTTP middleware absolutely should catch them to format proper 500 responses.
This pattern aligns with how modern frameworks operate. Express error-handling middleware. Spring's @ControllerAdvice. FastAPI's exception handlers. They provide centralized catch points precisely so your business logic stays clean. Use them. Writing try-catch in every repository method duplicates boilerplate and creates inconsistent error responses.
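The boundary principle can be sketched framework-agnostically. This is not Express's actual middleware API—just an illustration of one catch point at the entry layer, with the names (handleAtBoundary, HttpResponse) invented for the example:

```typescript
interface HttpResponse {
  status: number;
  body: string;
}

// One wrapper at the layer boundary converts any thrown exception
// into a consistent response. Inner layers just throw.
function handleAtBoundary(handler: () => HttpResponse): HttpResponse {
  try {
    return handler();
  } catch (err) {
    // Centralized formatting: every failure gets the same shape,
    // and repository/service code stays free of try-catch noise.
    return { status: 500, body: "Internal error" };
  }
}
```

A repository method that throws a connection error three layers down still produces a well-formed 500, without a single try-catch in between.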
Should You Retry Failed Operations Automatically?
Transient failures are the majority of production errors. Network blips. Database connection pool exhaustion. Third-party API rate limiting. The instinct to retry is correct; the implementation often isn't. Naive retries—immediate, unlimited, non-idempotent—cause more problems than they solve.
Exponential backoff with jitter prevents thundering herds. Idempotency keys ensure retried payments don't double-charge. Circuit breakers (introduced in Michael Nygard's Release It! and popularized by Martin Fowler) stop the bleeding when downstream services are clearly unhealthy. These patterns belong in infrastructure code, not scattered through business logic.
Libraries like Polly for .NET, Resilience4j for Java, and Tenacity for Python provide battle-tested implementations. Don't roll your own unless you have unusual requirements. The edge cases—handling partial failures, correlating retry attempts across distributed traces, respecting Retry-After headers—are deeper than they appear.
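For intuition about what those libraries do internally, here is a sketch of exponential backoff with "full jitter": each delay is drawn uniformly from zero up to an exponentially growing, capped ceiling. The function names and defaults (backoffDelay, withRetry, 100 ms base, 10 s cap, 5 retries) are illustrative:

```typescript
// Delay ceiling doubles per attempt, capped; jitter spreads out
// retries so synchronized clients don't stampede together.
function backoffDelay(attempt: number, baseMs = 100, capMs = 10_000): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * ceiling;
}

async function withRetry<T>(op: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // give up; let the boundary handle it
      await new Promise((r) => setTimeout(r, backoffDelay(attempt)));
    }
  }
}
```

Note what this sketch omits—idempotency keys, Retry-After headers, trace correlation—which is exactly why the libraries above earn their place.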
What Belongs in an Error Message?
"Something went wrong" helps nobody. Neither does dumping a full stack trace to end users. Good error messages balance three audiences: the user (what happened and what to do), the support engineer (correlation IDs and context), and the developer (stack traces and system state).
Structured logging solves the multi-audience problem. Log rich context internally—request IDs, user IDs, payload hashes, timing information. Return sanitized, actionable messages externally. "Payment processing failed. Please try again or contact support with reference #7A3F9D." The reference maps to your internal logs. The user knows next steps. Everyone wins.
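A sketch of that two-audience split might look like this. The reference format, field names, and the reportError helper are all invented for illustration:

```typescript
import { randomBytes } from "node:crypto";

// Full context goes to structured internal logs; only a short,
// actionable message with a reference code goes to the user.
function reportError(err: Error, context: Record<string, unknown>): string {
  const reference = randomBytes(3).toString("hex").toUpperCase(); // e.g. "7A3F9D"
  // Internal: machine-searchable, full detail.
  console.error(JSON.stringify({ reference, message: err.message, ...context }));
  // External: no internals leaked, clear next step.
  return `Payment processing failed. Please try again or contact support with reference #${reference}.`;
}
```

Support can grep the structured log for the reference the user reads off their screen, and the exception details never leave the server.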
Security considerations matter too. Never expose internal paths, database schemas, or authentication tokens in client-facing errors. These leak attack surface information. Your logs can contain full exception details; your API responses should not.
When Is It Okay to Crash?
Not all errors are recoverable. Memory exhaustion. Disk corruption. Unrecoverable database connection failures. In these cases, crashing cleanly beats limping along with undefined state. Process supervisors—systemd, Docker restart policies, Kubernetes health checks—expect this. They restart failed processes automatically.
The key is crashing cleanly. Flush logs. Close database connections. Release locks. Signal monitoring systems. A panic that terminates without cleanup leaves resources dangling. Go's defer, Java's shutdown hooks, and Node's signal and 'uncaughtException' handlers provide escape hatches for last-resort cleanup.
Microservices architectures embrace this philosophy. Individual service crashes are expected; the system remains available through redundancy. Monolithic applications can adopt similar patterns—health check endpoints that report degraded status, graceful shutdown handlers, and clear separation between "retry this" and "restart me" error categories.
How Do You Test Error Scenarios?
Error handling code that never runs is code that doesn't work. Yet most test suites focus exclusively on happy paths. Chaos engineering—popularized by Netflix's Chaos Monkey—takes this seriously, randomly terminating production instances to verify resilience.
You don't need chaos engineering to start. Use dependency injection to mock failure modes. Test your API's behavior when the database is unreachable. Verify your retry logic actually waits between attempts. Confirm that circuit breakers open after threshold failures and close after recovery. These tests catch the logic errors that only surface when things break.
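The dependency-injection approach can be as small as this sketch. The interface, function, and fake (UserRepo, getUserName, downRepo) are invented for the example:

```typescript
interface UserRepo {
  find(id: string): Promise<string>;
}

// The code under test: degrades gracefully when the repo fails.
async function getUserName(repo: UserRepo, id: string): Promise<string> {
  try {
    return await repo.find(id);
  } catch {
    return "unavailable"; // fallback when the database is unreachable
  }
}

// A fake that simulates an unreachable database—no real DB required.
const downRepo: UserRepo = {
  find: async () => { throw new Error("connection refused"); },
};
```

Because the failure mode is injected, the test is fast, deterministic, and runs in CI—unlike waiting for a real outage to exercise the catch block.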
Property-based testing excels here. Instead of manually crafting error scenarios, define properties that must always hold—"all errors include a correlation ID," "retry delays increase exponentially," "no request retries after circuit breaker opens." The test framework generates edge cases you'd never consider manually.
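Libraries like fast-check automate the generation step; as a dependency-free sketch, here is the "retry delays increase" property checked against randomly generated attempt numbers (backoffCeiling and checkBackoffProperty are illustrative names):

```typescript
// The ceiling that a full-jitter backoff draws its delay from.
function backoffCeiling(attempt: number, baseMs = 100, capMs = 10_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Property: the ceiling never exceeds the cap and never decreases
// between consecutive attempts—checked on random inputs, not
// hand-picked examples.
function checkBackoffProperty(trials = 1_000): boolean {
  for (let i = 0; i < trials; i++) {
    const attempt = Math.floor(Math.random() * 32); // generated input
    const here = backoffCeiling(attempt);
    const next = backoffCeiling(attempt + 1);
    if (here > 10_000 || next < here) return false; // property violated
  }
  return true;
}
```

A real property-based framework adds shrinking—when a property fails, it minimizes the generated input to the smallest counterexample—which a hand-rolled loop like this does not.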
