April 14, 2026  ·  10 min read

What Breaking Changes Taught Us About Backward Compatibility

Featured image — API changelog timeline showing compatibility matrix

Every team ships a breaking change they regret. Usually it's not the change itself that causes the regret — it's the way it was communicated, the migration runway that was too short, or the assumption that clients would adapt faster than they did. The downstream consequences of breaking changes scale with how many people depend on your API and how little notice they had.

We've shipped several. Here's what those experiences actually taught us.

What "Breaking" Actually Means

The textbook definition is narrow: a change is breaking if it causes previously correct client code to fail. Field renamed? Breaking. Response format changed? Breaking. Required parameter added? Breaking. New field added to response? Technically additive — but not always safe.

The more useful definition includes anything that causes client behavior to change in ways the client didn't anticipate. New field in response might be fine — or it might trigger a validation error in a strictly-typed deserialization layer that rejects unknown fields. Status code changed from 200 to 201 for successful creation — still 2xx, still a success, but some clients pattern-match on exact codes. Adding a new enum value to a field that clients switch on — additive to the schema, breaking to the client logic.

The practical takeaway: when you're reasoning about whether something is breaking, think about what a conservative client implementation might do with the change, not just what a well-written client would do. Defensive client code is written to reject the unexpected, not to handle it gracefully. Your change may be logically additive but practically breaking for the clients who exist, not the clients you wish existed.
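To make that concrete, here's a sketch of the kind of conservative client this is about. The endpoint, field names, and validation rules are hypothetical, but the two failure modes mirror the ones above: exact status-code matching and rejection of unknown fields.

```typescript
// Hypothetical conservative client: endpoint, field names, and rules are
// illustrative, not from any real API.
const EXPECTED_FIELDS = new Set(["id", "status", "created_at"]);

async function fetchAccount(id: string): Promise<Record<string, unknown>> {
  const res = await fetch(`https://api.example.com/accounts/${id}`);

  // Pattern-matching on the exact code: a 201 is still a success to the
  // server, but this client treats anything other than 200 as a failure.
  if (res.status !== 200) {
    throw new Error(`Unexpected status ${res.status}`);
  }

  const body = (await res.json()) as Record<string, unknown>;

  // Strict validation: any field the client doesn't recognize is an error,
  // so a logically additive change on the server breaks this client.
  for (const key of Object.keys(body)) {
    if (!EXPECTED_FIELDS.has(key)) {
      throw new Error(`Unknown field in response: ${key}`);
    }
  }
  return body;
}
```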

The Enumeration Trap

This one shows up constantly. You have a status field that returns active or suspended. Clients ship code that switches on those two values. You add a third value, pending_verification. Your schema is richer. Your clients' switch statements now have an unhandled case.

How badly this fails depends on the language and the client implementation. In some languages, an unhandled enum value throws. In others, the switch silently matches nothing and execution carries on with whatever default behavior the code happens to have. The silent failure is worse: the client might keep treating a pending_verification account as if it were active, because no branch ever ran and nothing changed the state it already held.

On the provider side, the fix is documentation: explicitly state in your API docs that enum values may be extended, and that clients should handle unknown values gracefully (treat them as a generic fallback, log them, alert on them — not throw or process them as something they're not). Some teams add a standing note to their changelog: "We reserve the right to add new enum values. Please handle unknown values explicitly." Writing this once in your API contract is better than discovering the breakage in production.

For your own code, never switch on an enum from an external API without a default case. This is defensive programming 101 that gets skipped more often than it should.
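A minimal sketch of that rule, using the status values from the example above; the mapping to client behavior is invented for illustration.

```typescript
// Hypothetical client-side handler for a status field from an external API.
// "active" and "suspended" are the values we shipped against; anything else
// is a value the API grew after we shipped.
function handleStatus(raw: string): "enable" | "lock" | "hold" {
  switch (raw) {
    case "active":
      return "enable";
    case "suspended":
      return "lock";
    case "pending_verification":
      return "hold";
    default:
      // Unknown value: log it and pick a conservative fallback instead of
      // processing it as whichever known state happens to be "closest".
      console.warn(`Unknown account status from API: ${raw}`);
      return "hold";
  }
}
```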

The Hidden Breaking Change: Response Time

A change that makes a response take four times longer to return is a breaking change in production even if the schema is identical. Clients have timeouts. SLAs have latency requirements. A downstream service that expects a 200ms response and starts getting 800ms responses will timeout, fail its own SLAs, and page someone.

Performance changes are rarely treated as breaking changes in changelogs. They should be, when the magnitude is significant. "Query performance optimization" that inadvertently degrades a specific endpoint under certain query patterns is a breaking change for the clients relying on that endpoint's performance characteristics.

The lesson: before shipping anything that might change latency characteristics — adding a new join, changing caching behavior, modifying an index — profile the endpoint under realistic load. If the change moves p95 latency by more than 20%, treat it as a breaking change operationally: notify clients, allow migration runway, have a rollback plan.
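For the p95 check specifically, a rough sketch, assuming you already have latency samples from load runs before and after the change:

```typescript
// Rough sketch: flag a change as operationally breaking if it moves p95
// latency by more than 20%. The sample arrays and threshold are assumptions.
function p95(samplesMs: number[]): number {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const idx = Math.max(0, Math.ceil(sorted.length * 0.95) - 1);
  return sorted[idx];
}

function isLatencyBreaking(beforeMs: number[], afterMs: number[]): boolean {
  const shift = (p95(afterMs) - p95(beforeMs)) / p95(beforeMs);
  return shift > 0.2;
}

// Example: roughly 200ms before and 800ms after would come back true.
```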

Migration Runway Math

How long you need to keep deprecated behavior alive depends on how many distinct clients are calling it and how much effort their migration requires. This sounds obvious; most teams still underestimate it.

The clients least likely to migrate quickly are the ones least likely to read your changelog or respond to deprecation emails. That includes abandoned integrations, internal tools built by teams who've since moved on, and enterprise clients whose engineering teams are perpetually busy with other things. Some percentage of your deprecated-endpoint traffic will still be coming in one year after the deprecation notice. Plan for it.

The calculation we use: look at the distribution of last-contact dates for API keys calling the deprecated behavior. If the median is two weeks ago but the 90th percentile is eight months ago, some of those clients are not actively maintained and will never migrate on any reasonable timeline. You need to decide whether to accept orphaning them or extend the deprecation period enough that someone at their company notices and cares.
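A sketch of that calculation, assuming you can pull a last-contact timestamp for each API key still calling the deprecated behavior:

```typescript
// Sketch: how stale are the API keys still hitting the deprecated endpoint?
// `lastContact` is assumed to come from your own usage logs, one entry per key.
function ageInDays(date: Date, now: Date = new Date()): number {
  return (now.getTime() - date.getTime()) / (1000 * 60 * 60 * 24);
}

function staleness(lastContact: Date[]): { medianDays: number; p90Days: number } {
  const ages = lastContact.map((d) => ageInDays(d)).sort((a, b) => a - b);
  const at = (q: number) =>
    ages[Math.min(ages.length - 1, Math.floor(ages.length * q))];
  // A recent median with a p90 of many months means a tail of unmaintained
  // clients that will never migrate on any reasonable timeline.
  return { medianDays: at(0.5), p90Days: at(0.9) };
}
```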

For enterprise clients specifically: a personal email to the technical contact is more reliable than a changelog announcement. A direct message that says "your integration is calling an endpoint we're deprecating on this date, here's what to change" gets results that passive documentation announcements don't.

Additive Changes and the Expand/Contract Pattern

The expand/contract pattern is the most reliable way to ship changes that affect both request and response shapes without a version bump. The idea: deploy a change in two phases. Phase one (expand): add the new field/behavior alongside the old one. Both coexist. Clients can start using the new shape. Phase two (contract): remove the old shape, after sufficient time has passed and you've confirmed clients have migrated.

Applied to a field rename: add the new field name while keeping the old one, both returning the same value. Document that the old field is deprecated and will be removed on a specific date. When the date arrives and usage of the old field has dropped to near zero, remove it.
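In code, the expand phase of a rename is just serializing both names from the same underlying value. The field names here are hypothetical:

```typescript
// Expand phase for a hypothetical rename of user_name -> display_name:
// both fields are returned, backed by the same value, until the contract date.
interface Account {
  id: string;
  displayName: string;
}

function serializeAccount(account: Account): Record<string, unknown> {
  return {
    id: account.id,
    display_name: account.displayName,
    // Deprecated alias; remove in the contract phase once usage is near zero.
    user_name: account.displayName,
  };
}
```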

This pattern requires discipline. The "contract" phase — actually removing the old behavior — often gets delayed indefinitely because it's not urgent and carries risk. Teams end up with ten deprecated fields that still work because nobody scheduled the cleanup. Set a date in your calendar when you announce the deprecation. Put it in your sprint backlog. The expand phase creates technical debt that compounds until you contract.

Designing for Evolution From the Start

Retrofitting backward compatibility onto a poorly designed API is expensive. Designing for evolution from the start is cheap. The principles:

Avoid absence-as-signal. If your API's meaning changes depending on whether a field is present or null, that's brittle. Adding a field to an existing response shouldn't change the meaning of other fields. Null and absent are different and should be different for a reason you can explain.

Use opaque identifiers for values you might need to extend. A string identifier gives you room to add structure later. An integer gives you a number and nothing else. If your status values are strings today, you can extend them with namespacing later. If they're integers mapped to constants, renumbering breaks everything.
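A small sketch of both of those ideas together, using hypothetical field names: a string status with room to grow, and null versus absent each meaning something you can name.

```typescript
// Hypothetical response type designed to evolve without breaking clients.
interface Subscription {
  id: string;

  // Opaque string status: new values ("trial", later even "grace:past_due")
  // can be added without renumbering anything, unlike integer constants.
  status: string;

  // Null and absent mean different, nameable things (assumed semantics):
  //   absent -> cancellation tracking isn't enabled for this plan
  //   null   -> tracked, but the subscription has never been cancelled
  //   string -> ISO timestamp of when it was cancelled
  cancelledAt?: string | null;
}
```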

Version your API before you need to. The worst time to design your versioning strategy is when you're already facing a breaking change under time pressure. Build the versioning infrastructure when you launch and you'll never face the "do we break clients or not" decision cold.

Know who's still on deprecated API versions

APIForge shows you per-client version distribution and usage trends on deprecated endpoints, and it helps you identify which clients need to migrate before you sunset old behavior.

Start Free