Why Do Your Unit Tests Pass Locally but Fail in CI?

By Yuki Martin
How-To & Fixes · testing · ci-cd · debugging · automation · best-practices

It happens to every developer eventually. You run your test suite locally—green across the board. You push your branch, open a pull request, and watch the CI pipeline light up like a Christmas tree. Suddenly tests that passed moments ago are failing with cryptic errors. The code hasn't changed. Your environment seems identical. Yet here you are, debugging through CI logs at 11 PM, questioning your sanity.

This disconnect between local and CI test runs isn't random bad luck. It's usually the result of subtle environmental differences, hidden test dependencies, or timing issues that only surface under specific conditions. Understanding why these mismatches happen—and how to prevent them—separates smooth deployments from endless debugging sessions.

Are Your Tests Secretly Depending on Each Other?

One of the most common culprits is test interdependence. When tests share state—whether through static variables, database records, or filesystem operations—they create invisible chains that break when execution order changes. Your local test runner might execute tests alphabetically while CI randomizes the order or shards tests across workers. The result? Tests pass in one sequence and explode in another.

Consider a scenario where Test A creates a user with email "test@example.com" and Test B expects that user to exist. Locally, they run in that order. In CI, Test B executes first—and fails because the user table is empty. This is particularly insidious with database-backed tests where records persist between test runs if cleanup is incomplete.

The fix starts with ensuring each test starts from a known, isolated state. Use database transactions that roll back after each test. Avoid static mutable state in your test classes. Tools like Testcontainers can spin up fresh database instances per test class, eliminating cross-contamination entirely. While slower, this approach mirrors production more closely and catches integration issues early.
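The isolation idea can be sketched in pytest style with a fresh in-memory SQLite database per test—a lightweight stand-in for the transaction-rollback or Testcontainers approaches described above (the schema and email are illustrative):

```python
import sqlite3

def fresh_db():
    """Give each test its own isolated in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY)")
    return conn

def test_creates_user():
    conn = fresh_db()
    conn.execute("INSERT INTO users VALUES ('test@example.com')")
    assert conn.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 1
    conn.close()

def test_user_table_starts_empty():
    # Passes in any order: this test can never see another test's data.
    conn = fresh_db()
    assert conn.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 0
    conn.close()
```

Because neither test inherits state, running them in reverse order (as a randomized CI runner might) changes nothing.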

Is Your CI Running Tests in Parallel?

Modern CI platforms—GitHub Actions, GitLab CI, CircleCI—often run tests with higher parallelism than your local machine. This is great for speed until race conditions emerge. A test that writes to a shared file, uses a hardcoded port, or manipulates global configuration will fail randomly when another test does the same thing simultaneously.

Look for these warning signs in failing CI logs: "Address already in use" errors, file lock conflicts, or inconsistent data in supposedly isolated tests. The solution involves making your tests truly parallel-safe. Use dynamic port allocation (port 0 lets the OS assign an available port). Create temporary directories with unique names rather than hardcoded paths. If you're using in-memory caches or singletons, ensure each test gets its own instance.
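Both fixes are a few lines in Python's standard library—binding to port 0 to let the OS pick a free port, and `tempfile` for collision-free scratch directories:

```python
import socket
import tempfile

def get_free_port():
    """Ask the OS for an ephemeral port instead of hardcoding one.

    Note: the port is released when the socket closes, so a rare race
    remains; for long-lived servers, bind to port 0 directly instead.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 = OS assigns an available port
        return s.getsockname()[1]

def make_scratch_dir():
    """Unique temporary directory per test, safe under parallel runs."""
    return tempfile.mkdtemp(prefix="testrun-")
```

Two workers calling `make_scratch_dir()` simultaneously get distinct paths, so file-lock conflicts disappear.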

Some test frameworks make this easier than others. Jest's --runInBand flag forces serial execution—useful for debugging, but it masks the underlying problem. Instead, configure your tests to be parallel-safe from the start. Martin Fowler's article on eradicating non-determinism in tests provides deeper insight into why these issues plague test suites and how to systematically eliminate them.

Are Environment Variables Sneaking Into Your Tests?

Your local machine is a snowflake. You've accumulated years of environment variables, dotfiles, and global configurations that don't exist in CI. A test that relies on HOME being set a certain way, or expects a specific timezone, or assumes certain locale settings will fail in the sterile CI environment.

This bites developers particularly hard with timezone-sensitive code. Your local machine runs EST; CI runs UTC. A test that parses "2024-01-15" expecting midnight local time gets a different result in a different zone. Similarly, tests that write to ~/.config/myapp might work locally (the directory exists) but fail in CI where the home directory structure differs.

Explicit is better than implicit. Mock environment variables in your tests rather than inheriting them. Use libraries like timecop (Ruby), freezegun (Python), or Jest's jest.useFakeTimers() to control time. For filesystem operations, mock the filesystem or use temporary directories that get cleaned up automatically. The goal is making your tests hermetic—self-contained and unaffected by external state.
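Here is a minimal sketch of both ideas using only the standard library—`mock.patch.dict` pins an environment variable for the duration of a test, and parsing dates as explicit UTC removes the timezone dependency (the `config_home` helper and path are illustrative):

```python
import datetime
import os
from unittest import mock

def config_home():
    # Code under test: resolves a config path from the environment.
    return os.environ["HOME"] + "/.config/myapp"

def parse_day_utc(s):
    # Parse "2024-01-15" as midnight UTC, not midnight local time.
    return datetime.datetime.strptime(s, "%Y-%m-%d").replace(
        tzinfo=datetime.timezone.utc)

def test_config_home_is_explicit():
    # Pin HOME instead of inheriting whatever the developer's shell exports.
    with mock.patch.dict(os.environ, {"HOME": "/tmp/fake-home"}):
        assert config_home() == "/tmp/fake-home/.config/myapp"

def test_parse_is_timezone_independent():
    # Same epoch seconds whether the machine runs EST, UTC, or anything else.
    assert parse_day_utc("2024-01-15").timestamp() == 1705276800.0
```

Either test now produces the same result on a developer laptop in EST and a CI runner pinned to UTC.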

Container Differences: It's Not Just "Linux"

Many developers work on macOS or Windows but deploy to Linux, and CI typically runs Linux containers. This creates subtle behavioral differences around case sensitivity (macOS filesystems are case-insensitive by default; Linux filesystems are case-sensitive), file permissions, and available system libraries.

A test that creates a file named "Config.json" and later tries to read "config.json" works fine on macOS but fails on Linux. Network timeouts behave differently. DNS resolution can vary. Even the version of glibc or SSL libraries can affect behavior in ways that break tests.
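You can detect which behavior you're running under with a small probe—write a file under one casing and check whether it's visible under another (a diagnostic sketch, not something to ship):

```python
import os

def is_case_insensitive_fs(path):
    """Probe whether the filesystem at `path` folds case.

    Returns True on case-insensitive filesystems (macOS/Windows defaults)
    and False on case-sensitive ones (typical Linux CI containers).
    """
    probe = os.path.join(path, "CaseProbe.tmp")
    with open(probe, "w") as f:
        f.write("x")
    try:
        return os.path.exists(os.path.join(path, "caseprobe.tmp"))
    finally:
        os.remove(probe)
```

Logging this once at test startup makes "works on my Mac" failures much faster to diagnose.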

Docker-based development environments help bridge this gap. If your CI runs in containers, develop in matching containers. Tools like VS Code Dev Containers or GitHub Codespaces make this accessible. When that's not feasible, at least ensure your CI uses the same base image as your production deployment.

Flaky Tests: The Worst Offenders

Some tests fail randomly—passing 90% of the time and failing 10%. These flaky tests are worse than consistently failing ones because they train teams to ignore CI failures. "Just re-run the pipeline" becomes standard practice, masking real issues.

Flakiness usually stems from timing issues. A test that asserts on an asynchronous operation without proper waiting will fail when the operation takes slightly longer. Network-dependent tests fail when DNS hiccups. Tests with fixed timeouts fail under load.

Replace arbitrary sleeps with proper synchronization primitives. Use polling with reasonable timeouts rather than fixed delays. Mock external services you don't control—don't make live HTTP calls in unit tests. If you must test against real services (integration tests), design them to be retry-friendly and isolated from other tests.
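The polling idea fits in a small helper—a sketch of the pattern, where the generous timeout absorbs CI load spikes without slowing the happy path (the `job.is_done()` call in the comment is a hypothetical example):

```python
import time

def wait_until(predicate, timeout=5.0, interval=0.05):
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Returns as soon as the condition holds, instead of sleeping a fixed
    amount and hoping the async operation finished in time.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()  # one final check right at the deadline

# Usage: instead of time.sleep(2) followed by an assertion, write
#   assert wait_until(lambda: job.is_done(), timeout=10)
```

A fast machine exits on the first poll; a loaded CI runner simply polls a few more times before succeeding.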

How Can You Debug CI-Only Failures?

When tests fail only in CI, you need visibility. Most CI platforms let you SSH into failed builds or upload artifacts. Use these features aggressively. Capture screenshots of failing browser tests. Dump database state when assertions fail. Log environment details at test startup.
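Logging environment details can be as simple as dumping the values that most often differ between local and CI runs—for example from a session-scoped pytest fixture in conftest.py (the exact set of keys is a suggestion):

```python
import os
import platform
import sys

def environment_report():
    """Collect the settings that most often differ between local and CI."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "tz": os.environ.get("TZ", "<unset>"),
        "lang": os.environ.get("LANG", "<unset>"),
        "ci": os.environ.get("CI", "<unset>"),  # most CI platforms set CI=true
        "cwd": os.getcwd(),
    }

# Print once at startup so every failed build's log begins with context:
# print(environment_report())
```

When a CI-only failure appears, the first line of the log already tells you whether timezone, locale, or platform is the likely suspect.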

Some teams configure their CI to run tests with verbose logging enabled, capturing the extra output as artifacts. Others add specific "debug mode" flags that can be triggered via commit messages for problematic builds. The key is treating CI failures as seriously as production outages—they're often the canary in the coal mine for deployment issues.

Consider using a service like Buildkite's test analytics or GitHub Actions' built-in test reporting to track flaky tests over time. If a test fails intermittently, flag it for investigation rather than letting it become background noise. Teams that ignore flaky tests eventually stop trusting their test suite entirely—and that's when bugs slip into production.

Preventing the Local/CI Divide

The best fix is prevention. Run a subset of your CI pipeline locally before pushing. Tools like act let you execute GitHub Actions workflows on your machine. Pre-commit hooks can catch basic issues. But the real solution is architectural: design your tests to be environment-agnostic from the start.

Write tests that declare their dependencies explicitly. Use dependency injection rather than global state. Mock external boundaries (databases, HTTP APIs, filesystem) rather than assuming they're available. When you do need real infrastructure, encapsulate it behind interfaces that can be swapped for test doubles.
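As a minimal illustration of that injection pattern, the time source below is passed in rather than reached for globally, so a test can substitute a deterministic fake (the `Clock` interface and token example are hypothetical):

```python
import time

class SystemClock:
    """Production boundary: wraps the real clock behind an interface."""
    def now(self):
        return time.time()

class FakeClock:
    """Test double: returns whatever instant the test chooses."""
    def __init__(self, t):
        self.t = t
    def now(self):
        return self.t

def is_token_expired(issued_at, ttl, clock):
    # The dependency is injected, so the result never depends on
    # the environment's wall clock.
    return clock.now() - issued_at > ttl
```

With `FakeClock`, the expiry logic tests identically on any machine, at any hour, in any timezone—exactly the hermetic property the section argues for.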

Most importantly, treat any test that behaves differently between environments as a bug—even if it passes. That inconsistency is telling you something important about coupling or assumptions in your code. Fix the test, and you've likely fixed a potential production issue before it manifests.

"A test that passes on your machine but fails in CI is lying to you. It's not actually testing what you think it's testing."