The Fig Tree (Strangler) Pattern: Replace Legacy Systems Without a Big-Bang Rewrite
Key Takeaways:
- The fig tree or strangler pattern works when you can control traffic at a choke point and measure outcomes per route.
- The fastest wins come from strangling read paths first, then writes, then shared data.
- Most failures come from hidden coupling: shared databases, shared auth, and “just one more endpoint” scope creep.
- Treat the migration like a product with SLOs, a backlog, and a kill switch, not a side project.
Most CTOs I talk to don’t fear legacy code. They fear the rewrite. A big-bang cutover turns one bad deploy into a company-wide incident. The fig tree, also called the strangler pattern, gives you a calmer path. You grow a new system around the old one, route traffic to the new parts, and shrink the old system until it has nothing left to do.
This pattern isn’t “timely” because it’s trendy. It’s timely because teams ship faster than architecture ages. A few years later, the same system blocks every roadmap item. And the business still expects weekly releases.
What is the fig tree or strangler pattern?
Here’s a definition I use with boards and exec teams.
Quotable definition: The fig tree or strangler pattern replaces a legacy system by intercepting traffic, moving one capability at a time to a new implementation, and keeping both systems running until the old one has no traffic.
Martin Fowler popularized the idea as the “Strangler Fig” pattern, named after a vine that grows around a tree and takes over its canopy over time. His write-up is still the cleanest mental model: Strangler Fig Application.
In software terms, you need three things:
- A choke point where you can intercept calls, often an API gateway, reverse proxy, or edge router.
- A routing rule that decides old vs new per request.
- A feedback loop that tells you if the new path is safe.
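The routing rule is the heart of the pattern. A minimal sketch of a per-request decision at the choke point, assuming a route allowlist and a pilot tenant list (the route and tenant names are illustrative, not from the original system):

```python
# Sketch of the old-vs-new routing rule at the choke point.
# Route templates and tenant IDs below are illustrative assumptions.

MIGRATED_ROUTES = {"GET /invoices", "GET /invoices/{id}"}
PILOT_TENANTS = {"tenant-internal", "tenant-42"}

def route_request(method: str, path_template: str, tenant_id: str) -> str:
    """Decide old vs new per request. Anything not explicitly
    migrated defaults to the legacy system, which is the safe side."""
    route = f"{method} {path_template}"
    if route in MIGRATED_ROUTES and tenant_id in PILOT_TENANTS:
        return "new"
    return "old"
```

In practice this logic lives in your gateway or feature flag layer, not in application code, so flipping a route back does not require a deploy.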
This isn’t only for monoliths. I’ve used it to replace a payments provider, a search cluster, and an internal identity service. The pattern held because we could control traffic and we had clear signals that told us if we were breaking users.
When to use the strangler pattern vs a rewrite
Use the strangler pattern when the business needs steady delivery during the migration. That’s most companies.
It works best when:
- You can route traffic by URL, tenant, account, region, or feature flag.
- You can run both systems in parallel for months.
- You can live with temporary duplication, like two implementations of “create invoice.”
- You’re willing to spend real time on observability and test automation.
Skip it when the old system has no stable boundary. You can still strangle a monolith with 1,200 internal calls per request and a shared database, but you’re signing up for a longer runway and a lot more discipline.
A practical decision matrix helps. I call it the Choke Point Test.
| Question | If “Yes” | If “No” |
|---|---|---|
| Can we intercept 90%+ of calls at one layer? | Strangler is viable | Start by creating an edge gateway |
| Can we route by tenant or endpoint? | Migrate in slices | You need a facade or BFF first |
| Can we compare outputs between old and new? | Use shadow reads | Plan for contract tests and replay |
| Can we run dual write safely? | Migrate writes earlier | Delay writes, start with reads |
| Do we have SLOs and tracing today? | Safer cutovers | Fund observability before migration |
If you fail the first row, you don’t have a strangler project yet. You have an architecture project.
Why the strangler pattern works in real systems
This works because it changes the risk shape. A rewrite concentrates risk into one date. A strangler spreads risk across lots of small cutovers.
Three mechanics matter in practice.
Traffic shaping reduces blast radius. Start with 1% of traffic, or one internal tenant. Ramp to 10%, then 50%, then 100%. If you already do canary releases, you already have the right instincts. Google’s SRE guidance on safe rollouts and error budgets maps cleanly to strangler migrations: Site Reliability Engineering book.
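One detail that makes ramps safe: bucket by a stable key, so a given tenant stays on one side of the split and raising the percentage only ever adds tenants to the new path. A small sketch, assuming tenant ID is the bucketing key:

```python
import hashlib

def in_canary(tenant_id: str, percent: int) -> bool:
    """Stable percentage rollout. A tenant's bucket never changes,
    so ramping 1% -> 10% -> 50% -> 100% is strictly additive:
    nobody flaps between old and new between requests."""
    digest = hashlib.sha256(tenant_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100  # deterministic 0..99 per tenant
    return bucket < percent
```

Random sampling per request, by contrast, gives every tenant an inconsistent mix of old and new behavior, which makes diffs and support tickets much harder to reason about.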
Parallel run gives you proof, not hope. Shadow reads let the new system compute results without serving them. Compare outputs and latency. Log diffs. Fix gaps before users see them. This is the same idea as “dark launching.” LaunchDarkly lays out the practice and the control surface you need: Dark launches.
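A shadow read has one invariant: the new path can never hurt the user. A minimal sketch of the compare-and-log wrapper, with an in-memory diff log standing in for your real logging pipeline:

```python
# Diff log; in production this would be structured logging or a metrics sink.
DIFFS = []

def shadow_compare(request_id, old_fn, new_fn):
    """Serve the legacy result. Compute the new result on the side,
    record any mismatch, and swallow new-path failures entirely."""
    old_result = old_fn()
    try:
        new_result = new_fn()
        if new_result != old_result:
            DIFFS.append((request_id, old_result, new_result))
    except Exception as exc:  # new path must never fail the request
        DIFFS.append((request_id, old_result, repr(exc)))
    return old_result
```

Tag every diff with the request ID so engineers can replay the exact input that disagreed, instead of guessing from aggregates.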
Contracts force clarity. Legacy systems often have fuzzy behavior. They “sort of” accept invalid input. They return odd error codes. Strangling forces you to write down the contract, and that contract becomes the migration spec. Stripe’s API is a good example of contract discipline and versioning. Their docs show how they treat compatibility as a product feature: Stripe API versioning.
A concrete scenario: a B2B SaaS company has a nine-year-old billing monolith. It serves 2,000 requests per second at peak. It has a 99.9% SLO, so it can burn about 43 minutes of downtime a month. The team wants to add usage-based billing, but every change risks an outage.
A strangler plan starts with read paths:
- Route `GET /invoices` to the new service for internal staff only.
- Shadow read for 10% of customers and compare totals.
- Cut over one region, then one plan tier.
You can measure success with three numbers:
- Diff rate between old and new responses, target under 0.1%.
- P95 latency delta, target within 20% of old.
- Error budget burn, target no worse than baseline.
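The first two numbers make a simple cutover gate. A sketch of that check, using the thresholds named above (error budget burn is usually checked in your SLO tooling, so it is left out here):

```python
def migration_health(diff_count: int, total_shadowed: int,
                     old_p95_ms: float, new_p95_ms: float) -> dict:
    """Gate a cutover on the shadow-read diff rate and the P95 delta.
    Thresholds match the targets in the text: <0.1% diffs, <=20% slower."""
    diff_rate = diff_count / total_shadowed
    latency_delta = (new_p95_ms - old_p95_ms) / old_p95_ms
    return {
        "diff_rate_ok": diff_rate < 0.001,   # under 0.1%
        "latency_ok": latency_delta <= 0.20, # within 20% of old P95
    }
```

If either check fails, the ramp pauses at its current percentage. The gate is deliberately dumb: you want a number a weekly review can argue about, not a model.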
Common pitfalls and what to do about them
Most strangler projects fail from people problems that show up as technical problems.
Pitfall: Shared database coupling. Teams route API calls but still share tables. Then a schema change breaks both systems. The fix is boring and non-negotiable. Stop sharing writes. Introduce a replication stream or change data capture. If you run Postgres, logical replication can help. If you run MySQL, binlog-based CDC can help. The rule that saves you is simple: one writer per table.
Pitfall: Dual writes without a plan. Dual write looks simple. It isn’t. You get partial failures and drift. If you must dual write, add an outbox table and a retry worker. Keep idempotency keys. Decide which system is the source of truth for each field, in writing, before you ship.
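The outbox idea in one sketch: the business row and the outbound event commit in the same transaction, so you can never have one without the other. SQLite and the schema below are stand-ins for whatever database and tables you actually run:

```python
import json
import sqlite3
import uuid

def write_with_outbox(conn: sqlite3.Connection, invoice: dict) -> str:
    """Write the invoice and its outbox event atomically. A separate
    worker drains unsent outbox rows with retries; the idempotency key
    lets the downstream system deduplicate redelivered events."""
    event_id = str(uuid.uuid4())
    with conn:  # one transaction: both rows commit, or neither does
        conn.execute(
            "INSERT INTO invoices(id, total) VALUES (?, ?)",
            (invoice["id"], invoice["total"]),
        )
        conn.execute(
            "INSERT INTO outbox(idempotency_key, payload, sent)"
            " VALUES (?, ?, 0)",
            (event_id, json.dumps(invoice)),
        )
    return event_id
```

This replaces naive dual write (two network calls, two failure modes) with one local transaction plus asynchronous, retryable delivery.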
Pitfall: No kill switch. Every route needs a fast rollback. Put it in the gateway or feature flag layer, not behind a code deploy. If rollback takes 45 minutes, you don’t have rollback.
Pitfall: Endpoint-by-endpoint migration with no domain plan. Teams pick random endpoints and call it progress. Then they hit a hard dependency and stall. Move domain slices. Ship an “invoice read model” as a unit, not five endpoints that happen to be nearby.
Pitfall: The old system keeps growing. Product teams keep adding features to the legacy code because it’s faster today. That’s how migrations die. You need a policy. After a date, all net new features go to the new system, even if it hurts for a quarter.
This is where leadership shows up. You need a single migration owner with budget and authority. You also need a weekly review with product and support. Migration work changes customer behavior, and support sees the weird edge cases first.
If you want a tool to keep this visible, treat it like a portfolio item. Track routes migrated, error budgets, and open risks in Command Center. The migration dies in the dark.
An actionable plan CTOs can run in 90 days
I use a simple operating model called the Strangle Loop. It’s a repeatable cycle:
- Map: List top 20 routes by traffic and revenue impact. Add owners and dependencies.
- Measure: Add tracing, golden signals, and per route dashboards. Set SLOs for the new path.
- Mirror: Shadow read and diff outputs. Log mismatches with request ids.
- Move: Cut over a slice. Start with internal users, then one tenant tier.
- Mop up: Delete old code and old tables for that slice. Close the loop.
A 90 day plan can look like this:
- Days 1 to 14: Create the choke point. Add routing rules and a kill switch.
- Days 15 to 30: Pick one thin slice with high learning value. Migrate reads.
- Days 31 to 60: Add shadow reads for 10% of traffic. Drive diff rate under 0.1%.
- Days 61 to 90: Cut over one tenant tier to full read traffic. Start write design.
What should you migrate first? Pick the slice that reduces fear. Auth, billing, and core data writes are scary for good reasons. Start with read-heavy paths that still matter, like search, reporting, or catalog reads.
This is also a good time to tighten incident muscle. Use our guide to incident postmortems with action items and keep migration regressions out of the main backlog. And if you hit weird multi service failures, Split Cause can speed up root cause work by linking signals across systems.
For org design, don't run this with a part-time tiger team. Create a small migration squad, 4 to 8 engineers, plus a tech lead and a product partner. Give them a clear charter and a weekly demo. If you need to model team boundaries, use the Engineering Org Designer.
Broader context: why this pattern is also a people strategy
A strangler project is a long bet. It touches incentives, career paths, and trust. If you treat it as “cleanup,” it’ll lose every planning cycle.
I frame it as a delivery strategy. You're buying the ability to ship again. That means you need metrics execs respect. Tie the work to lead time, incident rate, and cloud spend. Track it in an engineering metrics dashboard so the story stays consistent.
The catch is that the pattern exposes weak architecture habits. No contracts. No ownership. No SLOs. No clear domains. That’s painful, and it’s also the point. The fig tree pattern won’t save you from hard work. It turns hard work into steps you can survive.
If you want one internal link to keep handy, pair this post with our guide to tech debt prioritization that ties to revenue risk. Strangling is debt paydown with a shipping plan.