Description of the Minimums

Essential continuous delivery practices for software teams. Learn trunk-based development, continuous integration, deployment pipelines, and testing strategies to improve delivery speed and quality.

1 - Continuous Integration

Continuous integration requires daily code integration to trunk with automated testing. Learn CI best practices, testing strategies, and team workflows that improve software quality and delivery speed.

Definition

Continuous Integration (CI) is the activity of each developer integrating work to the trunk of version control at least daily and verifying that the work is, to the best of our knowledge, releasable.

CI is not just about tooling—it’s fundamentally about team workflow and working agreements.

The minimum activities required for CI

  1. Trunk-based development - all work integrates to trunk
  2. Work integrates to trunk at a minimum daily (each developer, every day)
  3. Work has automated testing before merge to trunk
  4. Work is tested with other work automatically on merge
  5. All feature work stops when the build is red
  6. New work does not break delivered work

Why This Matters

Without CI, Teams Experience

  • Integration hell: Weeks or months of painful merge conflicts
  • Late defect detection: Bugs found after they’re expensive to fix
  • Reduced collaboration: Developers work in isolation, losing context
  • Deployment fear: Large batches of untested changes create risk
  • Slower delivery: Time wasted on merge conflicts and rework
  • Quality erosion: Without rapid feedback, technical debt accumulates

With CI, Teams Achieve

  • Rapid feedback: Know within minutes if changes broke something
  • Smaller changes: Daily integration forces better work breakdown
  • Better collaboration: Team shares ownership of the codebase
  • Lower risk: Small, tested changes are easier to diagnose and fix
  • Faster delivery: No integration delays blocking deployment
  • Higher quality: Continuous testing catches issues early

Team Working Agreements

While CI depends on tooling, the team workflow and working agreement are more important:

  1. Define testable work: Work includes testable acceptance criteria that drive testing efforts
  2. Tests accompany commits: No work committed to version control without required tests
  3. Incremental progress: Committed work may not be “feature complete”, but must not break existing work
  4. Trunk-based workflow: All work begins from trunk and integrates to trunk at least daily
  5. Stop-the-line: If CI detects an error, the team stops feature work and collaborates to fix the build immediately

The stop-the-line practice is critical for maintaining an always-releasable trunk. For detailed guidance on implementing this discipline, see All Feature Work Stops When the Build Is Red.
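
Agreement 1 is easiest to keep when acceptance criteria map one-to-one to automated tests. A minimal Jest sketch of that mapping, assuming a hypothetical applyDiscount function and the criterion “orders over $100 receive a 10% discount”:

// Acceptance criterion: "Given an order over $100, when the discount
// is applied, then the total is reduced by 10%."
const { applyDiscount } = require('./pricing') // hypothetical module

describe('discount acceptance criteria', () => {
  it('reduces orders over $100 by 10%', () => {
    expect(applyDiscount({ total: 200 })).toEqual({ total: 180 })
  })

  it('leaves orders of $100 or less unchanged', () => {
    expect(applyDiscount({ total: 100 })).toEqual({ total: 100 })
  })
})

Each criterion becomes a test the pipeline runs on every commit, so “done” is verifiable rather than a matter of opinion.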

Example Implementations

Anti-Pattern: Feature Branch Workflow Without CI

Developer A: feature-branch-1 (3 weeks of work)
Developer B: feature-branch-2 (2 weeks of work)
Developer C: feature-branch-3 (4 weeks of work)

Week 4: Merge conflicts, integration issues, broken tests
Week 5: Still fixing integration problems
Week 6: Finally stabilized, but lost 2 weeks to integration

Problems

  • Long-lived branches accumulate merge conflicts
  • Integration issues discovered late
  • No early feedback on compatibility
  • Large batches of untested changes
  • Team blocked while resolving conflicts

Good Pattern: Continuous Integration to Trunk

# .github/workflows/ci.yml
name: Continuous Integration

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Install dependencies
        run: npm ci

      - name: Run unit tests
        run: npm test

      - name: Run integration tests
        run: npm run test:integration

      - name: Code quality checks
        run: npm run lint

      - name: Security scan
        run: npm audit

      - name: Build application
        run: npm run build

  notify-on-failure:
    needs: test
    if: failure()
    runs-on: ubuntu-latest
    steps:
      - name: Notify team
        run: |
          echo "Build failed - stop feature work and fix!"
          # Send Slack/email notification

Benefits

  • Changes tested within minutes
  • Team gets immediate feedback
  • Small changes are easy to debug
  • Integration is never a surprise
  • Quality maintained continuously

Evolutionary Coding Practices

To integrate code daily while building large features, use patterns like branch by abstraction, feature flags, and connect-last. These techniques allow you to break down large changes into small, safe commits that integrate to trunk daily without breaking existing functionality.

For detailed guidance and code examples, see Evolutionary Coding Practices.

Testing in CI

A comprehensive testing strategy balances fast feedback with thorough validation. Run different test types at different stages of the pipeline:

  • Pre-merge tests (< 10 minutes): Unit tests, linting, static security scans, dependency audits
  • Post-merge tests (< 30 minutes): All pre-merge tests plus integration tests, functional tests, performance tests (validate response time and throughput requirements), and dynamic security tests
  • Deployment tests: End-to-end and smoke tests belong in the deployment pipeline, not CI

For detailed guidance on test strategy, the test pyramid, deterministic testing, and test quality, see Testing Strategies.

What is Improved

Teamwork

CI requires strong teamwork to function correctly. Key improvements:

  • Pull workflow: Team picks next important work instead of working from assignments
  • Code review cadence: Quick reviews (< 4 hours) keep work flowing
  • Pair programming: Real-time collaboration eliminates review delays
  • Shared ownership: Everyone maintains the codebase together
  • Team goals over individual tasks: Focus shifts from “my work” to “our progress”

Anti-pattern: “Push” workflow where work is assigned creates silos and delays.

Work Breakdown

CI forces better work decomposition:

  • Definition of Ready: Every story has testable acceptance criteria before work starts
  • Small batches: If the team can complete work in < 2 days, it’s refined enough
  • Vertical slicing: Each change delivers a thin, tested slice of functionality
  • Incremental delivery: Features built incrementally, each step integrated daily

See Work Breakdown for detailed guidance.

Testing

CI requires a shift in testing approach:

From: Writing tests after code is “complete”
To: Writing tests before/during coding (TDD/BDD)

From: Testing implementation details
To: Testing behavior and outcomes

From: Manual testing before deployment
To: Automated testing on every commit

From: Separate QA phase
To: Quality built into development

CI teams build a comprehensive test suite with the goal of detecting issues as close to creation as possible. See Behavior-Driven Development.
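
The shift from testing implementation details to testing behavior is easier to see side by side. A minimal Jest sketch, assuming a hypothetical ShoppingCart class:

const { ShoppingCart } = require('./cart') // hypothetical module

// Implementation-detail test: breaks whenever internal storage changes
it('stores items in an internal array', () => {
  const cart = new ShoppingCart()
  cart.add({ sku: 'A1', price: 10 })
  expect(cart._items.length).toBe(1) // reaches into private state
})

// Behavior test: survives refactoring because it checks outcomes
it('totals the prices of added items', () => {
  const cart = new ShoppingCart()
  cart.add({ sku: 'A1', price: 10 })
  cart.add({ sku: 'B2', price: 15 })
  expect(cart.total()).toBe(25)
})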

Common Challenges

“What are the main problems to overcome?”

  1. Poor teamwork: Usually driven by assigning work instead of using a pull system
  2. Lack of testable acceptance criteria: Made worse by individual assignments instead of team goals. BDD provides declarative functional tests everyone understands
  3. Lack of evolutionary coding knowledge: “I can’t commit until the feature is complete!” Use branch by abstraction, feature flags, or plan changes so the last change integrates the feature

“How do I complete a large feature in less than a day?”

You probably don’t complete it in a day, but you integrate progress every day. See Evolutionary Coding Practices for detailed patterns and code examples.

“What code coverage level is needed before we can do CI?”

You don’t need tests in existing code to begin CI. You need to test new code without exception.

Starting point: “We will not go lower than the current level of code coverage.”
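
One way to encode that agreement, assuming the team uses Jest: set coverageThreshold to the current baseline so coverage can rise but the build fails if it falls. The numbers below are illustrative, not a recommended target.

// jest.config.js - a coverage ratchet, not a mandate
// Values are the team's current baseline; raise them as coverage improves.
module.exports = {
  collectCoverage: true,
  coverageThreshold: {
    global: {
      statements: 62,
      branches: 55,
      functions: 60,
      lines: 62,
    },
  },
}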

“What code coverage percentage should we have?”

The only meaningful answer is “enough that we’re confident.” Are you confident you’ve covered enough positive and negative cases?

Better question: “Do we trust our tests?” Test coverage percentage doesn’t indicate test quality.

“Should we set a code coverage standard for all teams?”

No. Code coverage mandates incentivize meaningless tests that hide the fact that code is not tested.

It is better to have no tests than to have tests you do not trust.

Instead: Focus on test quality, behavior coverage, and team discipline. See Code Coverage for detailed guidance.

Monitoring CI Health

Track these key metrics to understand CI effectiveness and drive improvement:

  • Commits per day per developer: ≥ 1 (team average)—indicates integration discipline
  • Development cycle time: < 2 days average—shows effective work breakdown
  • Build success rate: > 95%—reflects pre-merge testing quality
  • Time to fix broken build: < 1 hour—demonstrates stop-the-line commitment
  • Defect rate: Stable or decreasing—ensures speed doesn’t sacrifice quality

Make pipeline status visible to everyone through dashboards, notifications, and build radiators. Visibility drives faster response, shared accountability, and continuous improvement.

For detailed guidance on metrics, dashboards, and using data for improvement, see Pipeline Visibility & Health Metrics.

Additional Resources

1.1 - Evolutionary Coding Practices

Learn how to integrate code daily while building large features using branch by abstraction, feature flags, and connect-last patterns.

A core skill needed for CI is the ability to make code changes that are not complete features and integrate them to the trunk without breaking existing behaviors. We never make big-bang changes. We make small changes that limit our risk. These are some of the most common methods.

Branch by Abstraction

Gradually replace existing behavior while continuously integrating:

// Step 1: Create abstraction (integrate to trunk)
class PaymentProcessor {
  constructor(implementation) {
    this.implementation = implementation // current (legacy) implementation injected here
  }
  process(payment) {
    return this.implementation.process(payment)
  }
}

// Step 2: Add new implementation alongside old (integrate to trunk)
class StripePaymentProcessor {
  process(payment) {
    // New Stripe implementation
  }
}

// Step 3: Switch implementations (integrate to trunk)
const processor = new PaymentProcessor(useNewStripe ? new StripePaymentProcessor() : new LegacyProcessor())

// Step 4: Remove old implementation (integrate to trunk)

Feature Flags

Feature flags control feature visibility without blocking integration. However, they’re often overused—many scenarios have better alternatives.

When to use feature flags

  • Large or high-risk changes needing gradual rollout
  • Testing in production before full release (dark launch, beta testing)
  • A/B testing and experimentation
  • Customer-specific behavior or toggles
  • Cross-team coordination requiring independent deployment

When NOT to use feature flags

  • New features that can remain connected only to tests until a final commit wires them up (connect last)
  • Behavior changes (use branch by abstraction instead)
  • New API routes (build route, expose as last change)
  • Bug fixes or hotfixes (deploy immediately)
  • Simple changes (standard deployment sufficient)

Example usage

// Incomplete feature integrated to trunk, hidden behind flag
if (featureFlags.newCheckout) {
  return renderNewCheckout() // Work in progress
}
return renderOldCheckout() // Stable existing feature

// Team can continue integrating newCheckout code daily
// Feature revealed when complete by toggling flag

For detailed decision guidance and implementation approaches, see Feature Flags.

Connect Last

Build complete features, connect them in final commit:

// Commits 1-10: Build new checkout components (all tested, all integrated)
function CheckoutStep1() {
  /* tested, working */
}
function CheckoutStep2() {
  /* tested, working */
}
function CheckoutStep3() {
  /* tested, working */
}

// Commit 11: Wire up to UI (final integration)
<Route path="/checkout" component={CheckoutStep1} />

For detailed guidance on when to use each pattern, see Feature Flags.

Why These Patterns Matter

These evolutionary coding practices enable teams to:

  • Integrate daily: Break large features into small, safe changes
  • Reduce risk: Each commit is tested and releasable
  • Maintain flow: No waiting for features to complete before integrating
  • Improve collaboration: Team shares ownership of evolving code
  • Enable rollback: Easy to revert small changes if needed

Common Questions

“How do I complete a large feature in less than a day?”

You probably don’t complete it in a day, but you integrate progress every day using these patterns. Each daily commit is tested, working, and doesn’t break existing functionality.

“Which pattern should I use?”

  • Connect Last: Best for new features that don’t affect existing code
  • Branch by Abstraction: Best for replacing or modifying existing behavior
  • Feature Flags: Best for gradual rollout, testing in production, or customer-specific features

“Don’t these patterns add complexity?”

Temporarily, yes. But this complexity is:

  • Intentional: You control when and how it’s introduced
  • Temporary: Removed once the transition is complete
  • Safer: Than long-lived branches with merge conflicts
  • Testable: Each step can be verified independently

Additional Resources

1.2 - Testing Strategies

Learn what tests should run in CI, when they should run, and how to optimize for fast feedback while maintaining comprehensive validation.

A comprehensive testing strategy is essential for continuous integration. The key is balancing fast feedback with thorough validation by running different test types at different stages of the pipeline.

Pre-Merge Testing (Fast Feedback)

Tests that run before code merges to trunk should provide rapid feedback to developers. The goal is to catch obvious issues quickly without blocking the integration workflow.

What to Run

  • Static analysis: Type checkers, linters, security scans
  • Unit tests: Fast tests (preferably sociable unit tests with real in-process dependencies)
  • Dependency audits: Known vulnerabilities in dependencies

Performance Goal

Complete in < 10 minutes

Why Speed Matters

Pre-merge tests create a feedback loop for developers. If these tests take too long, developers context-switch while waiting, merges queue up behind one another, and integration frequency drops.

Keep pre-merge tests focused on fast, deterministic checks that catch the most common issues.

Post-Merge Testing (Comprehensive Validation)

After code merges to trunk, run the complete test suite to validate the integrated system.

What to Run

  • All pre-merge tests: Re-run for final validation
  • Integration tests: Test component interactions with real dependencies
  • Functional tests: Test user-facing behavior
  • Performance tests: Validate response time and throughput requirements
  • Dynamic security tests: Security analysis of running application

Performance Goal

Complete in < 30 minutes

Why Re-run Pre-merge Tests?

Pre-merge tests validate individual changes in isolation. Post-merge tests validate that the merge itself didn’t introduce issues:

  • Merge conflict resolutions may have introduced bugs
  • Timing-dependent interactions between simultaneous merges
  • Dependencies between changes merged around the same time
  • Environment differences between local and CI

Running the full suite after merge provides a final safety check.

What About Deployment Testing?

Tests that require deployment to an environment (end-to-end tests, smoke tests) belong in the deployment pipeline, not in CI.

Why Separate Deployment Testing

  • CI validates code integration
  • Deployment pipeline validates releasability
  • Different performance requirements
  • Different failure modes and remediation

Mixing these concerns leads to slow CI pipelines that discourage frequent integration.

The Testing Trophy

The testing trophy model emphasizes sociable unit tests (testing units with their real collaborators) as the foundation of your test suite.

       E2E              ← End-to-end tests (critical paths only)
   Integration          ← Most tests here (80%)
       Unit             ← Supporting layer
 Static Analysis        ← Foundation

Test Distribution

Static analysis (Foundation): Type checkers, linters, security scanners—catch errors before running code.

Solitary unit tests (Supporting—minimize these): Pure functions with no dependencies. Use sparingly.

Sociable unit tests / Integration tests (The bulk—80%): Test units with their real collaborators. This is where most of your tests should be.

E2E tests (Critical paths only): Complete user journeys. Use sparingly due to cost and brittleness.

Sociable vs Solitary Unit Tests

Terminology note: What the testing trophy calls “integration tests” are more precisely sociable unit tests in Martin Fowler’s Practical Test Pyramid.

  • Solitary unit tests: Test a unit in complete isolation with all dependencies mocked
  • Sociable unit tests (recommended): Test a unit together with its real collaborators and in-process dependencies, avoiding calls across network boundaries

Prioritize sociable unit tests over solitary unit tests because they:

  • Catch real bugs in how components interact
  • Are less brittle (don’t break during refactoring)
  • Test actual behavior rather than implementation details
  • Provide higher confidence without significant speed penalty

For detailed examples and guidance, see:

Test at the Right Level

Decision Tree

  1. Is it pure logic with no dependencies? → Solitary unit test
  2. Does it have collaborators/dependencies? → Sociable unit test / Integration test (most code!)
  3. Does it cross system boundaries or require full deployment? → E2E test (sparingly)

Key Principle

Default to sociable unit tests (with real dependencies) over solitary unit tests (with mocks).

When in Doubt

Choose sociable unit test. It will catch more real bugs than a solitary unit test with mocks.
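
A minimal sketch of the difference, assuming a hypothetical OrderService with an in-process InventoryRepository collaborator:

const { OrderService } = require('./order-service')    // hypothetical
const { InventoryRepository } = require('./inventory')  // hypothetical, in-process, no network

// Solitary: collaborator mocked - verifies wiring, not behavior
it('calls the repository when placing an order', () => {
  const repo = { reserve: jest.fn().mockReturnValue(true) }
  new OrderService(repo).placeOrder({ sku: 'A1', qty: 2 })
  expect(repo.reserve).toHaveBeenCalledWith('A1', 2)
})

// Sociable: real in-memory collaborator - verifies actual behavior
it('rejects orders that exceed available stock', () => {
  const repo = new InventoryRepository({ A1: 1 })
  const service = new OrderService(repo)
  expect(() => service.placeOrder({ sku: 'A1', qty: 2 })).toThrow('insufficient stock')
})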

Deterministic Testing

All tests must be deterministic—producing the same result every time they run. Flaky tests destroy trust in the pipeline.

Common Causes of Flaky Tests

  • Race conditions and timing issues
  • Shared state between tests
  • External dependencies (networks, databases)
  • Non-deterministic inputs (random data, current time)
  • Environmental differences

Solutions

  • Mock external dependencies you don’t control
  • Clean up test data after each test
  • Control time and randomness in tests
  • Isolate test execution
  • Fix or remove flaky tests immediately
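
A small Jest sketch of two of these fixes, controlling time and cleaning up shared state, assuming a hypothetical in-memory session module:

const { sessionStore, createSession, isExpired } = require('./sessions') // hypothetical module

beforeEach(() => {
  jest.useFakeTimers().setSystemTime(new Date('2024-01-15T09:00:00Z')) // control time
})

afterEach(() => {
  jest.useRealTimers()
  sessionStore.clear() // no shared state left behind; tests can run in any order
})

it('expires sessions after 30 minutes of inactivity', () => {
  const session = createSession('user-1')
  jest.advanceTimersByTime(31 * 60 * 1000)
  expect(isExpired(session)).toBe(true)
})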

For detailed guidance, see Deterministic Tests.

Test Quality Over Coverage

Test coverage percentage doesn’t indicate test quality.

Better questions than “What’s our coverage percentage?”:

  • Do we trust our tests?
  • Are we confident we’ve covered positive and negative cases?
  • Do tests document expected behavior?
  • Would tests catch regressions in critical paths?

Coverage Mandates Are Harmful

Setting organization-wide coverage standards incentivizes meaningless tests that hide the fact that code isn’t properly tested.

It is better to have no tests than to have tests you do not trust.

Instead of mandates:

  • Focus on test quality and behavior coverage
  • Build team discipline around testing
  • Review tests as carefully as production code
  • Make testing part of the definition of done

For detailed guidance, see Code Coverage.

Practical Recommendations for CI

Building Your Test Suite

  1. Start with static analysis: Type checkers, linters—catch errors before running code
  2. Write sociable unit tests as default: Test with real dependencies (databases, state, etc.)
  3. Add solitary unit tests sparingly: Only for pure functions with complex logic
  4. Add E2E tests strategically: Critical user journeys and revenue paths only
  5. Avoid excessive mocking: Mock only external services you don’t control

For CI Effectiveness

  1. Run static analysis first: Instant feedback, zero runtime cost
  2. Run fast tests pre-merge: Use in-memory databases, parallel execution
  3. Run comprehensive tests post-merge: More realistic setup, longer running tests
  4. Run E2E tests post-merge: Keep them out of the critical path
  5. Set time budgets: Pre-merge < 10 min, post-merge < 30 min
  6. Quarantine flaky tests: Fix or remove them immediately

For Test Quality

  1. Test behavior from user’s perspective: Not implementation details
  2. Use real dependencies: Catch real integration bugs
  3. One scenario per test: Makes failures obvious and debugging fast
  4. Descriptive test names: Should explain what behavior is being verified
  5. Independent tests: No shared state, can run in any order

Testing Anti-Patterns to Avoid

  • Don’t mock everything: Solitary unit tests with extensive mocking are brittle
  • Don’t test implementation details: Tests that break during refactoring provide no value
  • Don’t write E2E for everything: Too slow, too brittle—use sociable unit tests instead
  • Don’t skip sociable unit tests: This is where the bugs hide
  • Don’t ignore flaky tests: They destroy trust in your pipeline

Starting Without Full Coverage

You don’t need tests in existing code to begin CI. You need to test new code without exception.

Starting point: “We will not go lower than the current level of code coverage.”

This approach:

  • Allows teams to start CI immediately
  • Prevents technical debt from growing
  • Builds testing discipline incrementally
  • Improves coverage over time

As you work in existing code:

  • Add tests for code you modify
  • Test new features completely
  • Gradually improve coverage in active areas
  • Don’t mandate retrofitting tests to untouched code

Additional Resources

  • Testing Strategies
  • Testing Practices

1.3 - Pipeline Visibility & Health Metrics

Monitor CI health through key metrics including commit frequency, build success rate, and time to fix failures. Learn what to measure and why it matters.

CI pipeline visibility ensures the entire team can see the health of the integration process and respond quickly to issues. Combined with the right metrics, visibility drives continuous improvement.

Why Visibility Matters

When pipeline status is visible to everyone:

  • Faster response: Team sees failures immediately
  • Shared accountability: Everyone owns the build
  • Better collaboration: Team coordinates on fixes
  • Continuous improvement: Metrics highlight bottlenecks
  • Quality culture: Green builds become a team priority

Making the Pipeline Visible

Real-Time Status Display

Make build status impossible to ignore:

  • Build radiators: Large displays showing current status
  • Team dashboards: Shared screens with pipeline health
  • Status indicators: Visual signals (traffic lights, etc.)
  • Browser extensions: Build status in developer tools
  • Desktop notifications: Alerts when builds break

The key is making status ambient—visible without requiring effort to check.

Notification Systems

Automated notifications ensure the team knows when action is needed:

When to notify

  • Build failures on trunk
  • Flaky test detection
  • Long-running builds
  • Security vulnerabilities found
  • Quality gate failures

How to notify

  • Team chat channels (Slack, Teams)
  • Email for critical failures
  • SMS/phone for extended outages
  • Dashboard alerts
  • Version control integrations

Notification best practices

  • Notify the whole team, not individuals
  • Include failure details and logs
  • Link directly to failed builds
  • Suggest next actions
  • Avoid notification fatigue with smart filtering
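
A minimal sketch of a whole-team failure notification, assuming a chat webhook URL stored as a pipeline secret (CHAT_WEBHOOK_URL) and a generic incoming-webhook payload; adapt the shape to your chat tool:

// notify-build-failure.js - invoked by the pipeline when a trunk build fails
const webhookUrl = process.env.CHAT_WEBHOOK_URL // assumed secret name, not real infrastructure

async function notifyBuildFailure({ branch, commit, buildUrl }) {
  const payload = {
    text: [
      `Build failed on ${branch} (${commit.slice(0, 7)})`,
      `Stop feature work and swarm on the fix: ${buildUrl}`,
    ].join('\n'),
  }

  const response = await fetch(webhookUrl, {          // global fetch, Node 18+
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  })
  if (!response.ok) throw new Error(`Notification failed: ${response.status}`)
}

module.exports = { notifyBuildFailure }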

CI Health Metrics

Track these metrics to understand and improve CI effectiveness:

Commits per Day per Developer

What: How frequently the team integrates code to trunk

How to measure: Total commits to trunk ÷ number of developers ÷ days

Good: ≥ 1 commit per developer per day (team average)

Why it matters:

  • Indicates true CI practice adoption
  • Shows work breakdown effectiveness
  • Reveals integration discipline
  • Predicts integration conflict frequency

Important: Never compare individuals—this is a team metric. Use it to understand team behavior, not to rank developers.
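
The arithmetic is simple; a sketch of the team-level calculation with hypothetical numbers:

// Team-level integration frequency - never a per-person ranking
function commitsPerDayPerDeveloper(trunkCommits, developers, workingDays) {
  return trunkCommits / developers / workingDays
}

// Example: 42 trunk commits, 6 developers, 5 working days
commitsPerDayPerDeveloper(42, 6, 5) // 1.4 - meets the ≥ 1 guideline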

If the number is low

  • Work is too large to integrate daily
  • Team needs better work decomposition
  • Fear of breaking the build
  • Missing evolutionary coding skills

Development Cycle Time

What: Time from when work begins to completion (merged to trunk)

How to measure: Time from first commit on branch to merge to trunk

Good: < 2 days on average

Why it matters:

  • Indicates effective work breakdown
  • Shows CI practice maturity
  • Predicts batch size and risk
  • Correlates with deployment frequency

If cycle time is high

  • Stories are too large
  • Rework due to late feedback
  • Waiting for code reviews
  • Complex approval processes
  • Poor work decomposition

Build Success Rate

What: Percentage of trunk builds that pass all tests

How to measure: (Successful builds ÷ total builds) × 100

Good: > 95%

Why it matters:

  • Indicates pre-merge testing quality
  • Shows team discipline
  • Predicts trunk stability
  • Reflects testing effectiveness

If success rate is low

  • Pre-merge tests insufficient
  • Team not running tests locally
  • Flaky tests creating false failures
  • Missing stop-the-line discipline

Time to Fix Broken Build

What: How quickly the team resolves build failures on trunk

How to measure: Time from build failure to successful build

Good: < 1 hour

Why it matters:

  • Shows team commitment to CI
  • Indicates stop-the-line practice
  • Reflects debugging capability
  • Predicts integration delays

If fix time is high

  • Team continues feature work during failures
  • Difficult to diagnose failures
  • Complex, slow build process
  • Lack of build ownership
  • Poor error messages in tests

Defect Rate

What: Critical guardrail metric to ensure speed doesn’t sacrifice quality

How to measure: Defects found per unit of time or per deployment

Good: Stable or decreasing as CI improves

Why it matters:

  • Quality validation
  • Prevents speed over quality
  • Shows testing effectiveness
  • Builds stakeholder confidence

If defect rate increases

  • Tests don’t cover critical paths
  • Team skipping testing discipline
  • Poor test quality (coverage without value)
  • Speed prioritized over quality
  • Missing acceptance criteria

Dashboard Design

Effective CI dashboards show the right information at the right time:

Essential Information

Current status

  • Trunk build status (green/red)
  • Currently running builds
  • Recent commit activity
  • Failed test names
  • Commit frequency
  • Build success rate
  • Average fix time
  • Cycle time trends

Team health

  • Number of active branches
  • Age of oldest branch
  • Flaky test count
  • Test execution time

Dashboard Anti-Patterns

Avoid

  • Individual developer comparisons
  • Vanity metrics (total commits, lines of code)
  • Too much detail (cognitive overload)
  • Metrics without context
  • Stale data (not real-time)

Using Metrics for Improvement

Metrics are tools for learning, not weapons for management.

Good Uses

  • Team retrospectives on CI effectiveness
  • Identifying bottlenecks in the process
  • Validating improvements (A/B comparisons)
  • Celebrating progress and wins
  • Guiding focus for improvement efforts

Bad Uses

  • Individual performance reviews
  • Team comparisons or rankings
  • Setting arbitrary targets without context
  • Gaming metrics to look good
  • Punishing teams for honest reporting

Improvement Cycle

  1. Measure current state: Establish baseline metrics
  2. Identify bottleneck: What’s the biggest constraint?
  3. Hypothesize improvement: What change might help?
  4. Experiment: Try the change for a sprint
  5. Measure impact: Did metrics improve?
  6. Standardize or iterate: Keep or adjust the change

Common Visibility Challenges

“Metrics Can Be Gamed”

Yes, any metric can be gamed. The solution isn’t to avoid metrics—it’s to:

  • Use metrics for learning, not punishment
  • Track multiple metrics (gaming one reveals problems in others)
  • Focus on outcomes (quality, speed) not just outputs (commits)
  • Build a culture of honesty and improvement

“Too Many Notifications Create Noise”

True. Combat notification fatigue:

  • Only notify on trunk failures (not branch builds)
  • Aggregate related failures
  • Auto-resolve when fixed
  • Use severity levels
  • Allow custom notification preferences

“Dashboards Become Wallpaper”

Dashboards lose impact when ignored. Keep them relevant:

  • Update regularly with fresh data
  • Rotate what’s displayed
  • Discuss in stand-ups
  • Celebrate improvements
  • Remove stale metrics

Additional Resources

1.4 - All Feature Work Stops When the Build Is Red

Why continuous delivery requires stopping all feature work when the build breaks—not just blocking merges. Learn the team mindset, practices, and working agreements that make this discipline effective.

When the trunk build breaks, the entire team stops feature work and collaborates to fix it immediately. This practice, borrowed from lean manufacturing’s Andon Cord, prevents defects from propagating and maintains an always-releasable trunk.

Every team member shifts focus to:

  1. Understanding what broke
  2. Fixing the broken build
  3. Learning why it happened
  4. Preventing similar failures

No new feature work begins until the build is green again.

Why ALL Work Stops, Not Just Merges

A common objection is: “Why stop all feature work? Just block merging until the pipeline is green.”

This misses the point. Continuous Delivery is not just technology and workflow—it is a mindset. Part of that mindset is that individuals on the team do not have individual priorities. The team has priorities.

Work Closer to Production Is Always More Valuable

Work that is closer to production is always more valuable than work that is further away. A broken pipeline is halting the most important work: getting tested, integrated changes to users. It is also blocking any hotfix the team may need to deploy.

When the build is red, fixing it is the team’s highest priority. Not your feature. Not your story. The pipeline.

“Just Block Merges” Creates a False Sense of Progress

If developers continue writing feature code while the build is broken:

  • They are building on a foundation they cannot verify
  • Their work is accumulating integration risk with every passing minute
  • They are individually productive but the team is not delivering
  • The broken build becomes someone else’s problem instead of everyone’s priority
  • The incentive to fix the build urgently is removed—it can wait until someone wants to merge

This is the difference between individual activity and team effectiveness. A team where everyone is typing but nothing is shipping is not productive.

This Is a Team Organization Problem

If the team is not organized to enable everyone to swarm on a broken build, that is a fundamental dysfunction. CD requires teams that:

  • Share ownership of the pipeline and the codebase
  • Prioritize collectively rather than protecting individual work streams
  • Can all contribute to diagnosing and fixing build failures
  • Treat the pipeline as the team’s most critical asset

A team that says “I’ll keep working on my feature while someone else fixes the build” has not adopted the CD mindset. They are a group of individuals sharing a codebase, not a team practicing Continuous Delivery.

What This Looks Like in Practice

When the Team Stops

09:15 - Build fails on trunk
09:16 - Automated notification to team chat
09:17 - Team acknowledges
09:18 - Feature work pauses
09:20 - Quick huddle: what broke?
09:25 - Two devs pair on fix
09:40 - Fix committed
09:45 - Build green
09:46 - Team resumes feature work
09:50 - Quick retro: why did it break?

Total impact: 30 minutes of paused feature work
Team learned: Missing test case for edge condition
Outcome: Better tests, faster next time

When the Team Doesn’t Stop

09:15 - Build fails on trunk
09:30 - Someone notices
10:00 - "We'll look at it later"
11:00 - Another commit on a red build
12:00 - Third failure, harder to diagnose
14:00 - "This is too complex, we need help"
16:00 - Multiple devs debugging
17:30 - Finally fixed

Total impact: 8+ hours of broken trunk, multiple devs blocked
Team learned: Nothing systematic
Outcome: Same failures likely to recur

When developers continue working on a broken build, new work may depend on broken code, multiple changes pile up making diagnosis harder, and the broken state becomes the new baseline. Stopping immediately contains the problem.

When the Fix Takes Too Long

If the fix will take more than 15 minutes, prefer reverting:

Option 1: Revert immediately

  • Roll back the commit that broke the build
  • Get trunk green
  • Fix properly offline
  • Re-integrate with the fix

Option 2: Forward fix with a time limit

  • Set a timer (15 minutes)
  • Work on forward fix
  • If the timer expires: revert
  • Fix offline and re-integrate

When unsure, bias toward reverting. The goal is a green trunk, not a heroic fix.

Team Working Agreements

Effective stop-the-line requires clear agreements:

Fast Build Feedback

Agreement: “Our builds complete in < 10 minutes”

Developers can’t respond to failures they don’t know about. If builds are slow, parallelize test execution, move slow tests post-merge, or invest in faster infrastructure.

Visible Build Status

Agreement: “Build status is visible to the entire team at all times”

You can’t stop for failures you don’t see. Use build radiators, chat notifications, and desktop alerts. See Pipeline Visibility for detailed guidance.

Team Owns the Fix

Agreement: “When the build breaks, the team owns the fix”

Not: “Whoever broke it fixes it”
Instead: “The team fixes it together”

Individual blame prevents collaboration. The person who triggered the failure may not have the expertise or context to fix it quickly. Rally the team.

Fixed Means Green

Agreement: “Fixed means green build on trunk, not just a fix committed”

Fixed includes: root cause identified, fix implemented, tests passing on trunk, and a plan to prevent recurrence.

No Bypassing

Agreement: “We will not bypass CI to deploy during red builds”

  • Not for critical hotfixes (fix the build first, or revert)
  • Not for small changes (small doesn’t mean safe)
  • Not for “known failures” (they should be fixed or removed)
  • Not for executive pressure (protect the team)

Common Objections

“We can’t afford to stop feature work”

You can’t afford not to. Every hour the build stays broken compounds future integration issues, blocks other developers, erodes deployment confidence, and increases fix complexity. Stopping is cheaper.

“Stopping kills our velocity”

Short term, stopping might feel slow. Long term, stopping accelerates delivery. Broken builds that persist block developers, create integration debt, and compound failures. Stopping maintains velocity by preventing these compounding costs.

“We stop all the time”

If builds break frequently, the problem isn’t stopping—it’s insufficient testing before merge. Improve pre-merge testing, require local test runs, and fix flaky tests. Stopping reveals the problem. Better testing solves it.

“It’s a known flaky test”

Then remove it from the build. Either fix the flaky test immediately, remove it from trunk builds, or quarantine it for investigation. Non-deterministic tests are broken tests. See Deterministic Tests for guidance.

“Management doesn’t support stopping”

Educate stakeholders on the economics: show time saved by early fixes, demonstrate deployment confidence, track defect reduction, and measure cycle time improvement. If leadership demands features over quality, you’re not empowered to do CI.

The Cultural Shift

This practice represents a fundamental change:

From: “Individual productivity”
To: “Team effectiveness”

From: “Ship features at all costs”
To: “Maintain quality while shipping features”

From: “Move fast and break things”
To: “Move fast by not breaking things”

This shift is uncomfortable but essential for sustainable high performance.

Metrics

  • Time to fix: Time from build failure to green build. Target < 15 minutes median, < 1 hour average.
  • Stop rate: Percentage of build failures that trigger full stop. Target 100%.
  • Failure frequency: Build failures per week. Should decrease over time.

Track patterns in why builds break (flaky tests, missing pre-merge tests, environment differences, integration issues) to identify systemic improvement opportunities.

Additional Resources

2 - Only Path to Any Environment

All deployments must go through a single automated pipeline. Learn why using one deployment path for all environments improves reliability, security, and continuous delivery practices.

Definition

The deployment pipeline is the single, standardized path for all changes to reach any environment—development, testing, staging, or production. No manual deployments, no side channels, no “quick fixes” bypassing the pipeline. If it’s not deployed through the pipeline, it doesn’t get deployed.

Key principles:

  1. Single path: All deployments flow through the same pipeline
  2. No exceptions: Even hotfixes and rollbacks go through the pipeline
  3. Automated: Deployment is triggered automatically after pipeline validation
  4. Auditable: Every deployment is tracked and traceable
  5. Consistent: The same process deploys to all environments

Why This Matters

Multiple Deployment Paths Create Serious Risks

  • Quality issues: Bypassing the pipeline bypasses quality checks
  • Configuration drift: Manual deployments create inconsistencies between environments
  • Security vulnerabilities: Undocumented changes escape security review
  • Debugging nightmares: “What’s actually running in production?”
  • Compliance violations: Audit trails break when changes bypass the pipeline
  • Lost confidence: Teams lose trust in the pipeline and resort to manual interventions

A Single Deployment Path Provides

  • Reliability: Every deployment is validated the same way
  • Traceability: Clear audit trail from commit to production
  • Consistency: Environments stay in sync
  • Speed: Automated deployments are faster than manual
  • Safety: Quality gates are never bypassed
  • Confidence: Teams trust that production matches what was tested
  • Recovery: Rollbacks are as reliable as forward deployments

What “Single Path” Means

One Merge Pattern for All Changes

Direct Trunk Integration: all work integrates directly to trunk using the same process.

trunk ← features
trunk ← bugfixes
trunk ← hotfixes

Anti-pattern Examples

  1. Integration Branch

trunk → integration ← features

This creates TWO merge structures instead of one:

  1. When trunk changes → merge to integration branch immediately
  2. When features change → merge to integration branch at least daily

The integration branch lives a parallel life to the trunk, acting as a temporary container for partially finished features. This attempts to “mimic” feature toggles to keep inactive features out of production.

Why This Violates Single-Path
  • Creates multiple merge patterns (trunk→integration AND features→integration)
  • Integration branch becomes a second “trunk” with different rules
  • Adds complexity: “Is this change ready for integration or trunk?”
  • Defeats the purpose: Use actual feature flags instead of mimicking them with branches
  • Accumulates “given-up” features that stay unfinished forever
  • Delays true integration: Features are integrated to integration branch but not to trunk
  2. GitFlow (Multiple Long-Lived Branches)

master (production)
  ↓
develop (integration)
  ↓
feature branches → develop
  ↓
release branches → master
  ↓
hotfix branches → master → develop

GitFlow creates MULTIPLE merge patterns depending on change type:

  • Features: feature → develop → release → master
  • Hotfixes: hotfix → master AND hotfix → develop
  • Releases: develop → release → master

Why This Violates Single-Path
  • Different types of changes follow different paths to production
  • Multiple long-lived branches (master, develop, release) create merge complexity
  • Hotfixes have a different path than features (bypassing develop)
  • Release branches delay integration and create batch deployments
  • Merge conflicts multiply across multiple integration points
  • Violates continuous integration principle (changes don’t integrate daily to trunk)
  • Forces “release” to be a special event rather than continuous deployment

The Correct Approach: Trunk-Based Development with Integration Patterns

Option 1: Feature Flags

For incomplete features that need to be hidden:

// Feature code lives in trunk, controlled by flags
if (featureFlags.newCheckout) {
  return renderNewCheckout()
}
return renderOldCheckout()

Option 2: Branch by Abstraction

For behavior changes:

// Old behavior behind abstraction
class PaymentProcessor {
  process() {
    // Gradually replace implementation while maintaining interface
  }
}

Option 3: Connect Tests Last

For new features:

// Build new feature code, integrate to trunk
// Connect to UI/API only in final commit
function newCheckoutFlow() {
  // Complete implementation ready
}

// Final commit: wire it up
<button onClick={newCheckoutFlow}>Checkout</button>

Option 4: Dark Launch

For new API routes:

// New API route exists but isn't exposed
router.post('/api/v2/checkout', newCheckoutHandler)

// Final commit: update client to use new route

All code integrates to trunk using ONE merge pattern. Incomplete features are managed through these patterns, not through separate integration branches.

For guidance on when to use each pattern, see Feature Flags.

All Environments Use the Same Pipeline

The same pipeline deploys to every environment, including hotfixes and rollbacks:

Commit → Pipeline → Dev → Test → Staging → Production

Anti-Patterns to Avoid

  • SSH into server and copy files
  • Upload through FTP/SFTP
  • Run scripts directly on production servers
  • Use separate “emergency deployment” process
  • Manual database changes in production
  • Different deployment processes for different environments

Example Implementations

Anti-Pattern: Multiple Deployment Paths

Normal: Developer → Push to Git → Pipeline → Staging
Hotfix: Developer → SSH to prod → Apply patch directly
Database: DBA → SQL client → Run scripts manually
Config: Ops → Edit files on server → Restart service

Problem: No consistency, no audit trail, no validation. Production becomes a mystery box.

Good Pattern: Single Pipeline for Everything

# .github/workflows/deploy.yml
name: Deployment Pipeline

on:
  push:
    branches: [main]
  workflow_dispatch: # Manual trigger for rollbacks

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: npm ci
      - run: npm test
      - run: npm run lint
      - run: npm run security-scan

  build:
    needs: validate
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: npm ci
      - run: npm run build
      - run: docker build -t app:${{ github.sha }} .
      - run: docker push app:${{ github.sha }}

  deploy-staging:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - run: kubectl set image deployment/app app=app:${{ github.sha }}
      - run: kubectl rollout status deployment/app

  smoke-test:
    needs: deploy-staging
    runs-on: ubuntu-latest
    steps:
      - run: npm run smoke-test:staging

  deploy-production:
    needs: smoke-test
    runs-on: ubuntu-latest
    steps:
      - run: kubectl set image deployment/app app=app:${{ github.sha }}
      - run: kubectl rollout status deployment/app

Benefit: Every deployment—normal, hotfix, or rollback—uses this pipeline. Consistent, validated, traceable.

Common Patterns

Environment Promotion

Deploy the same artifact through progressive environments:

Build Artifact (v1.2.3)
  ↓
Deploy to Dev → Validate
  ↓
Deploy to Test → Validate
  ↓
Deploy to Staging → Validate
  ↓
Deploy to Production

Fast-Track Pipeline for Emergencies

Keep the same path, but optimize for speed when needed:

deploy-hotfix:
  if: github.event.inputs.hotfix == 'true'
  steps:
    - run: npm test -- --fast # Run critical tests only
    - run: npm run build
    - run: deploy --target=production --skip-staging
    - run: smoke-test --production

Rollback via Pipeline

Rollbacks should be faster than forward deployments:

# Trigger rollback via pipeline (skips build/test, already validated)
gh workflow run deploy.yml -f version=v1.2.2 -f rollback=true

Database Migrations

All database changes flow through the pipeline:

deploy:
  steps:
    - name: Run database migrations
      run: |
        npm run db:migrate
        npm run db:validate
    - name: Deploy application
      run: kubectl apply -f deployment.yaml
    - name: Verify deployment
      run: kubectl rollout status deployment/app

Database Change Requirements

  • Backward-compatible (new code works with old schema)
  • Forward-deployable (migrations are additive)
  • Automated (migrations run in pipeline)

This allows rolling back application code without rolling back schema.
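
A minimal sketch of an additive, backward-compatible migration, assuming a Knex-style migration API and a hypothetical orders table:

// migrations/20240115_add_promo_code_to_orders.js
// Additive only: the new column is nullable and nothing is renamed or dropped,
// so the previous application version keeps working against the new schema.
exports.up = async function (knex) {
  await knex.schema.alterTable('orders', (table) => {
    table.string('promo_code').nullable()
  })
}

exports.down = async function (knex) {
  await knex.schema.alterTable('orders', (table) => {
    table.dropColumn('promo_code')
  })
}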

FAQ

What if the pipeline is broken and we need to deploy a critical fix?

Fix the pipeline first. If your pipeline is so fragile that it can’t deploy critical fixes, that’s a pipeline problem, not a process problem. Invest in pipeline reliability.

What about emergency hotfixes that can’t wait for the full pipeline?

The pipeline should be fast enough to handle emergencies. If it’s not, optimize the pipeline. A “fast-track” mode that skips some tests is acceptable (see Common Patterns above), but it must still be the same pipeline, not a separate manual process.

Can we manually patch production “just this once”?

No. “Just this once” becomes “just this once again.” Manual production changes always create problems. Commit the fix, push through the pipeline, deploy.

What if deploying through the pipeline takes too long?

Optimize your pipeline:

  1. Parallelize tests
  2. Use faster test environments
  3. Implement progressive deployment (canary, blue-green)
  4. Cache dependencies
  5. Optimize build times

A well-optimized pipeline should deploy to production in under 30 minutes.

Can operators make manual changes for maintenance?

Infrastructure maintenance (patching servers, scaling resources) is separate from application deployment. However, application deployment must still only happen through the pipeline.

Health Metrics

  • Pipeline deployment rate: Should be 100% (all deployments go through pipeline)
  • Manual override rate: Should be 0%
  • Hotfix pipeline time: Should be < 30 minutes
  • Rollback success rate: Should be > 99%
  • Deployment frequency: Should increase over time as confidence grows

Additional Resources

3 - Deterministic Pipeline

Deterministic pipelines produce consistent results for the same inputs. Learn how to build reliable CI/CD pipelines with locked dependencies and reproducible builds for continuous delivery.

Definition

A deterministic pipeline produces consistent, repeatable results. Given the same inputs (code, configuration, dependencies), the pipeline will always produce the same outputs and reach the same pass/fail verdict. The pipeline’s decision on whether a change is releasable is definitive—if it passes, deploy it; if it fails, fix it.

Key principles:

  1. Repeatable: Running the pipeline twice with identical inputs produces identical results
  2. Authoritative: The pipeline is the final arbiter of quality, not humans
  3. Immutable: No manual changes to artifacts or environments between pipeline stages
  4. Trustworthy: Teams trust the pipeline’s verdict without second-guessing

Why This Matters

Non-deterministic pipelines create serious problems:

  • False confidence: Tests pass inconsistently, hiding real issues
  • Wasted time: Debugging “flaky” tests instead of delivering value
  • Trust erosion: Teams stop trusting the pipeline and add manual gates
  • Slow feedback: Re-running tests to “see if they pass this time”
  • Quality degradation: Real failures get dismissed as “just flaky tests”

Deterministic pipelines provide:

  • Confidence: Pipeline results are reliable and meaningful
  • Speed: No need to re-run tests or wait for manual verification
  • Clarity: Pass means deploy, fail means fix—no ambiguity
  • Quality: Every failure represents a real issue that must be addressed

What Makes a Pipeline Deterministic

Version Control Everything

All pipeline inputs must be version controlled:

  • Source code (obviously)
  • Infrastructure as code (Terraform, CloudFormation, etc.)
  • Pipeline definitions (GitHub Actions, Jenkins files, etc.)
  • Test data (fixtures, mocks, seeds)
  • Configuration (app config, test config)
  • Dependency lockfiles (package-lock.json, Gemfile.lock, go.sum, Cargo.lock, poetry.lock, etc.)
  • Build scripts (Make, npm scripts, etc.)

Critical: Always commit lockfiles to version control. This ensures every pipeline run uses identical dependency versions.

Eliminate Environmental Variance

The pipeline must control its environment:

  • Container-based builds: Use Docker with specific image tags (e.g., node:18.17.1, never node:latest)
  • Isolated test environments: Each pipeline run gets a clean, isolated environment
  • Exact dependency versions: Always use lockfiles (package-lock.json, go.sum, etc.) and install with --frozen-lockfile or equivalent
  • Controlled timing: Don’t rely on wall-clock time or race conditions
  • Deterministic randomness: Seed random number generators for reproducibility

Recommended Practice: Never use floating version tags like latest, stable, or version ranges like ^1.2.3. Always pin to exact versions.

Remove Human Intervention

Manual steps break determinism:

  • No manual approvals in the critical path (use post-deployment verification instead)
  • No manual environment setup (automate environment provisioning)
  • No manual artifact modifications (artifacts are immutable after build)
  • No manual test data manipulation (generate or restore from version control)

Fix Flaky Tests Immediately

Flaky tests destroy determinism:

  • All feature work stops when tests become flaky
  • Root cause and fix flaky tests immediately—don’t just retry
  • Quarantine pattern: Move flaky tests to quarantine, fix them, then restore
  • Monitor flakiness: Track test stability metrics

Example Implementations

Anti-Pattern: Non-Deterministic Pipeline

# Bad: Uses floating versions
dependencies:
  nodejs: "latest"
  postgres: "14"  # No minor/patch version

# Bad: Relies on external state
test:
  - curl https://api.example.com/test-data
  - run_tests --use-production-data

# Bad: Time-dependent tests
test('shows current date', () => {
  expect(getDate()).toBe(new Date())  # Fails at midnight!
})

# Bad: Manual steps
deploy:
  - echo "Manually verify staging before approving"
  - wait_for_approval

Problem: Results vary based on when the pipeline runs, what’s in production, which dependency versions are “latest,” and human availability.

Good Pattern: Deterministic Pipeline

# Good: Pinned versions
dependencies:
  nodejs: "18.17.1"
  postgres: "14.9"

# Good: Version-controlled test data
test:
  - docker-compose up -d
  - ./scripts/seed-test-data.sh  # From version control
  - npm run test

# Good: Deterministic time handling
test('shows date', () => {
  const mockDate = new Date('2024-01-15')
  jest.useFakeTimers().setSystemTime(mockDate)
  expect(getDate()).toEqual(mockDate)
})

# Good: Automated verification
deploy:
  - deploy_to_staging
  - run_smoke_tests
  - if: smoke_tests_pass
    deploy_to_production

Benefit: Same inputs always produce same outputs. Pipeline results are trustworthy and reproducible.

What is Improved

  • Quality increases: Real issues are never dismissed as “flaky tests”
  • Speed increases: No time wasted on test reruns or manual verification
  • Trust increases: Teams rely on the pipeline instead of adding manual gates
  • Debugging improves: Failures are reproducible, making root cause analysis easier
  • Collaboration improves: Shared confidence in the pipeline reduces friction
  • Delivery improves: Faster, more reliable path from commit to production

Common Patterns

Immutable Build Containers

Use specific container images for builds:

# Dockerfile.build - version controlled
FROM node:18.17.1-alpine3.18

RUN apk add --no-cache \
    python3=3.11.5-r0 \
    make=4.4.1-r1

WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci

Hermetic Test Environments

Isolate each test run:

# GitHub Actions
jobs:
  test:
    runs-on: ubuntu-22.04
    services:
      postgres:
        image: postgres:14.9
        env:
          POSTGRES_DB: testdb
          POSTGRES_PASSWORD: testpass
    steps:
      - uses: actions/checkout@v3
      - run: npm ci
      - run: npm test
      # Each workflow run gets a fresh database

Locked Dependencies

Always use dependency lockfiles; this is essential for deterministic builds:

// package-lock.json (ALWAYS commit to version control)
{
  "dependencies": {
    "express": {
      "version": "4.18.2",
      "resolved": "https://registry.npmjs.org/express/-/express-4.18.2.tgz",
      "integrity": "sha512-5/PsL6iGPdfQ/..."
    }
  }
}

Never:

  • Use npm install in CI (use npm ci instead)
  • Add lockfiles to .gitignore
  • Use version ranges in production dependencies (^, ~, >=)
  • Rely on “latest” tags for any dependency

Quarantine for Flaky Tests

Temporarily isolate flaky tests:

// tests/quarantine/flaky-test.spec.js
describe.skip('Quarantined: Flaky Test', () => {
  // This test is quarantined due to flakiness
  // GitHub Issue: #1234
  // Will be fixed and restored by: 2024-02-01
  it('should work consistently', () => {
    // Test code
  })
})

FAQ

What if a test is occasionally flaky but hard to reproduce?

This is still a problem. Flaky tests indicate either:

  1. A real bug in your code (race conditions, etc.)
  2. A problem with your test (dependencies on external state)

Both need to be fixed. Quarantine the test, investigate thoroughly, and fix the root cause.

Can we use retries to handle flaky tests?

Retries mask problems rather than fixing them. A test that passes on retry is hiding a failure, not succeeding. Fix the flakiness instead of retrying.

What about tests that depend on external services?

Use test doubles (mocks, stubs, fakes) for external dependencies. If you must test against real external services, use contract tests and ensure those services are version-controlled and deterministic too.
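
A minimal sketch of substituting a test double for an external service, assuming a hypothetical createCheckout factory that accepts an injected payment gateway client:

const { createCheckout } = require('./checkout') // hypothetical factory with an injected gateway

it('marks the order paid when the gateway approves the charge', async () => {
  // Fake for the external payment service we don't control:
  // deterministic, no network, same result on every run.
  const fakeGateway = {
    charge: async () => ({ status: 'approved', id: 'ch_test_1' }),
  }

  const checkout = createCheckout({ gateway: fakeGateway })
  const order = await checkout.pay({ orderId: 'o-42', amountCents: 1999 })

  expect(order.status).toBe('paid')
})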

How do we handle tests that involve randomness?

Seed your random number generators with a fixed seed in tests:

// Deterministic randomness
const rng = new Random(12345) // Fixed seed
const result = shuffle(array, rng)
expect(result).toEqual([3, 1, 4, 2]) // Predictable

What if our deployment requires manual verification?

Manual verification can happen after deployment, not before. Deploy automatically based on pipeline results, then verify. If verification fails, roll back automatically.

Should the pipeline ever be non-deterministic?

There are rare cases where controlled non-determinism is useful (chaos engineering, fuzz testing), but these should be:

  1. Explicitly designed and documented
  2. Separate from the core deployment pipeline
  3. Reproducible via saved seeds/inputs

Health Metrics

  • Test flakiness rate: Should be < 1% (ideally 0%)
  • Pipeline consistency: Same commit should pass/fail consistently across runs
  • Time to fix flaky tests: Should be < 1 day
  • Manual override rate: Should be near zero

Additional Resources

4 - Definition of Deployable

Define clear deployment standards with automated quality gates. Learn how to establish your definition of deployable with security, testing, and compliance checks in your CI/CD pipeline.

Definition

The “definition of deployable” is your organization’s agreed-upon set of non-negotiable quality criteria that every artifact must pass before it can be deployed to any environment. This definition should be automated, enforced by the pipeline, and treated as the authoritative verdict on whether a change is ready for deployment.

Key principles:

  1. Pipeline is definitive: If the pipeline passes, the artifact is deployable—no exceptions
  2. Automated validation: All criteria are checked automatically, not manually
  3. Consistent across environments: The same standards apply whether deploying to test or production
  4. Fails fast: The pipeline rejects artifacts that don’t meet the standard immediately

Why This Matters

Without a clear, automated definition of deployable, teams face:

  • Inconsistent quality standards: Different people have different opinions on “ready”
  • Manual gatekeeping: Deployment approvals become bottlenecks
  • Surprise failures: Issues that should have been caught earlier appear in production
  • Blame culture: Unclear accountability when problems arise
  • Deployment fear: Uncertainty about readiness causes risk aversion

A strong definition of deployable creates:

  • Confidence: Everyone trusts that pipeline-approved artifacts are safe
  • Speed: No waiting for manual approvals or meetings
  • Clarity: Unambiguous standards for the entire team
  • Accountability: The pipeline (and the team that maintains it) owns quality

What Should Be in Your Definition

Your definition of deployable should include automated checks for:

Security

  • Static security scans (SAST) pass
  • Dependency vulnerability scans show no critical issues
  • Secrets are not embedded in code
  • Authentication/authorization tests pass

Functionality

  • All unit tests pass
  • Integration tests pass
  • End-to-end tests pass
  • Regression tests pass
  • Business logic behaves as expected

Compliance

  • Code meets regulatory requirements
  • Audit trails are in place
  • Required documentation is generated
  • Compliance tests pass

Performance

  • Response time meets thresholds
  • Resource usage is within acceptable limits
  • Load tests pass
  • No memory leaks detected

Reliability

  • Error rates are within acceptable bounds
  • Circuit breakers and retries work correctly
  • Graceful degradation is in place
  • Health checks pass

Code Quality

  • Code style/linting checks pass
  • Code coverage meets minimum threshold
  • Static analysis shows no critical issues
  • Technical debt is within acceptable limits

Example Implementations

Anti-Pattern: Manual Approval Process

Developer: "I think this is ready to deploy"
QA: "Let me manually test it again"
Manager: "Looks good, but wait for the CAB meeting Thursday"
Ops: "We need to review the deployment plan first"

Problem: Manual steps delay feedback, introduce inconsistency, and reduce confidence.

Good Pattern: Automated Pipeline Gates

# .github/workflows/cd-pipeline.yml
name: CD Pipeline

on: [push]

jobs:
  validate-deployable:
    steps:
      - name: Run unit tests
        run: npm test

      - name: Run security scan
        run: npm audit --audit-level=high

      - name: Run integration tests
        run: npm run test:integration

      - name: Check code coverage
        run: npm run test:coverage -- --threshold=80
        # Note: This is a team-defined quality gate, not an org-wide mandate.
        # See https://dojoconsortium.org/docs/metrics/code-coverage/

      - name: Run E2E tests
        run: npm run test:e2e

      - name: Performance tests
        run: npm run test:perf

      - name: Build artifact
        if: success()
        run: npm run build

      - name: Mark as deployable
        if: success()
        run: echo "Artifact meets definition of deployable"

Benefit: Every commit is automatically validated against all criteria. If it passes, it’s deployable.

What is Improved

  • Removes bottlenecks: No waiting for manual approval meetings
  • Increases quality: Automated checks catch more issues than manual reviews
  • Reduces cycle time: Deployable artifacts are identified in minutes, not days
  • Improves collaboration: Shared understanding of quality standards
  • Enables continuous delivery: Trust in the pipeline makes frequent deployments safe
  • Reduces stress: Clear criteria eliminate guesswork and blame

Common Patterns

Progressive Quality Gates

Structure your pipeline to fail fast on quick checks, then run expensive tests:

Stage 1: Fast Feedback (< 5 min)
  ├─ Linting
  ├─ Unit tests
  └─ Security scan

Stage 2: Integration (< 15 min)
  ├─ Integration tests
  ├─ Database migrations
  └─ API contract tests

Stage 3: Comprehensive (< 30 min)
  ├─ E2E tests
  ├─ Performance tests
  └─ Compliance checks

Context-Specific Definitions

Some criteria may vary by context:

# Base definition (always required)
base_deployable:
  - unit_tests: pass
  - security_scan: pass
  - code_coverage: >= 80%

# Production-specific (additional requirements)
production_deployable:
  - load_tests: pass
  - disaster_recovery_tested: true
  - runbook_updated: true

# Feature branch (relaxed for experimentation)
feature_deployable:
  - unit_tests: pass
  - security_scan: no_critical

Error Budget Approach

Use error budgets to balance speed and reliability:

definition_of_deployable:
  error_budget_remaining: > 0
  slo_compliance: >= 99.9%
  recent_incidents: < 2 per week

If error budget is exhausted, focus shifts to reliability work instead of new features.
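
A minimal sketch of such a gate as a pipeline step; the thresholds mirror the example above, and for illustration the SLO numbers are read from environment variables rather than a real monitoring API:

// deploy-gate.js: fail the pipeline when the error budget is spent
function readSloReport() {
  return {
    budgetRemaining: Number(process.env.ERROR_BUDGET_REMAINING || 0),
    sloCompliance: Number(process.env.SLO_COMPLIANCE || 0),
    incidentsThisWeek: Number(process.env.INCIDENTS_THIS_WEEK || 0),
  }
}

function checkErrorBudget(report) {
  const failures = []
  if (report.budgetRemaining <= 0) failures.push('error budget exhausted')
  if (report.sloCompliance < 99.9) failures.push(`SLO compliance ${report.sloCompliance}% is below 99.9%`)
  if (report.incidentsThisWeek >= 2) failures.push(`${report.incidentsThisWeek} incidents this week`)
  return failures
}

const failures = checkErrorBudget(readSloReport())
if (failures.length > 0) {
  console.error(`Deployment blocked: ${failures.join('; ')}`)
  process.exit(1) // failing this step shifts the team to reliability work
}
console.log('Error budget healthy; deployment may proceed')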

FAQ

Who decides what goes in the definition of deployable?

The entire team—developers, QA, operations, security, and product—should collaboratively define these standards. It should reflect genuine risks and requirements, not arbitrary bureaucracy.

What if the pipeline passes but we find a bug in production?

This indicates a gap in your definition of deployable. Add a test to catch that class of bug in the future. The definition should evolve based on production learnings.

Can we skip pipeline checks for “urgent” hotfixes?

No. If the pipeline can’t validate a hotfix quickly enough, that’s a problem with your pipeline, not your process. Fix the pipeline, don’t bypass it. Bypassing quality checks for “urgent” changes is how critical bugs reach production.

How strict should our definition be?

Strict enough to prevent production incidents, but not so strict that it becomes a bottleneck. If your pipeline rejects 90% of commits, your standards may be too rigid. If production incidents are frequent, your standards may be too lax.

Should manual testing be part of the definition?

Manual exploratory testing is valuable for discovering edge cases, but it should inform the definition, not be the definition. Automate the validations that result from manual testing discoveries.

What about things we can’t test automatically?

Some requirements (like UX polish or accessibility) are harder to automate fully. For these:

  1. Automate what you can (e.g., accessibility checkers, visual regression tests; see the sketch below)
  2. Make manual checks lightweight and concurrent, not blockers
  3. Continuously work to automate more
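
As one sketch of the first point, an accessibility check can run as an ordinary pipeline test. This assumes the jest-axe package and a hypothetical renderSignupForm helper that returns the page's HTML:

const { axe, toHaveNoViolations } = require('jest-axe')

expect.extend(toHaveNoViolations)

it('signup form has no detectable accessibility violations', async () => {
  const html = renderSignupForm() // hypothetical helper returning an HTML string
  const results = await axe(html)

  expect(results).toHaveNoViolations()
})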

Health Metrics

  • Pipeline pass rate: Should be 70-90% (too high = tests too lax, too low = tests too strict)
  • Pipeline execution time: Should be < 30 minutes for full validation
  • Production incident rate: Should decrease over time as definition improves
  • Manual override rate: Should be near zero (manual overrides indicate broken process)

5 - Immutable Artifact

Immutable artifacts are built once and deployed unchanged to all environments. Learn why artifact immutability is essential for reliable continuous delivery and deployment consistency.

Central to CD is that the pipeline validates the artifact: it is built once and deployed unchanged to every environment. A common anti-pattern is building a separate artifact for each environment. The pipeline should generate immutable, versioned artifacts.

Definition

  • Immutable Pipeline: At first it may seem that the obvious way to address a pipeline failure is to go to the failure point, adjust the environment, the test data, or whatever else failed, and then restart the pipeline from that point. However, that transforms a repeatable quality process into an untrustworthy custom build. Failures should be addressed with changes in version control so that two executions with the same configuration always yield the same results.
  • Immutable Artifacts: Some package management systems allow mutable release-candidate versions; for example, it is common to find -SNAPSHOT versions used for this in Java. That means the artifact’s behavior can change without the version changing. Version numbers are cheap. If we are to have an immutable pipeline, it must produce an immutable artifact. We should never depend on -SNAPSHOT versions, and we should never produce them.

Immutability provides us with the confidence to know that the results from the pipeline are real and repeatable.
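
One way to avoid mutable release-candidate versions is to stamp every pipeline run with a unique version that includes the build number and commit. A minimal Node sketch; the naming scheme is an assumption to adapt to your own build tooling:

// build-version.js: emit a unique, immutable version for this pipeline run
const { execSync } = require('child_process')
const { version } = require('./package.json')

const sha = execSync('git rev-parse --short HEAD').toString().trim()
const buildNumber = process.env.BUILD_NUMBER || Date.now()

// e.g. 1.4.2-rc.57+3f2c1ab: a new, immutable version for every run
console.log(`${version}-rc.${buildNumber}+${sha}`)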

What is Improved

  • Everything must be version controlled: source code, environment configurations, application configurations, and even test data. This reduces variability and improves the quality process.

6 - Prod-Like Test Environment

Production-like test environments catch bugs early and reduce deployment risk. Learn how to create reliable test environments that mirror production for better software quality assurance.

Definition

It is crucial to leverage pre-production environments in your CI/CD to run all of your tests (Unit / Integration / UAT / Manual QA / E2E) early and often. Test environments increase interaction with new features and expose bugs before release – both of which are prerequisites for reliable software.

Example Implementations

There are different types of pre-production test environments. Most organizations will employ both static and short-lived environments and utilize them for case-specific stages of the SDLC.

  • Staging environment: Ideally, this is the last environment that teams will run automated tests against prior to deployment, particularly for testing interaction between all new features after a merge. Its infrastructure will reflect production as closely as possible.
  • Ephemeral environments (definition adapted from EphemeralEnvironments.io): These are full-stack, on-demand environments spun up on every code change. Each one should be wired into your pipeline, which runs E2E, unit, and integration tests against it. These environments are defined in version control and created and destroyed automatically on demand. They are short-lived by definition but should closely resemble production; they are intended to replace long-lived “static” environments (e.g., “development,” “QA1”, “QA2”, “testing”) and the maintenance required to keep those stable.

What is Improved

  • Infrastructure is kept consistent: Test environments deliver results that reflect real-world behavior. Fewer unexpected bugs reach production because prod-like data and dependencies let you run your entire test suite earlier against multiple prod-like environments.
  • Test against latest changes: These environments will rebuild upon code changes with no manual intervention.
  • Test before merge: Attaching an ephemeral environment to every PR enables E2E testing in your CI before code changes get deployed to staging. New features get tested in parallel, avoiding the dreaded “waiting to run my tests” bottleneck that blocks your entire SDLC.

7 - Rollback On-demand

Fast, safe rollback capability is essential for continuous delivery. Learn rollback strategies including blue-green deployments, canary releases, and feature flags for reliable production deployments.

Definition

Rollback on-demand means the ability to quickly and safely revert to a previous working version of your application at any time, without requiring special approval, manual intervention, or complex procedures. It should be as simple and reliable as deploying forward.

Key principles:

  1. Fast: Rollback completes in minutes, not hours
  2. Automated: No manual steps or special procedures
  3. Safe: Rollback is validated just like forward deployment
  4. Simple: Single command or button click initiates rollback
  5. Tested: Rollback mechanism is regularly tested, not just used in emergencies

Why This Matters

Without reliable rollback capability:

  • Fear of deployment: Teams avoid deploying because failures are hard to recover from
  • Long incident resolution: Hours wasted debugging instead of immediately reverting
  • Customer impact: Users suffer while teams scramble to fix issues
  • Pressure to “fix forward”: Teams rush incomplete fixes instead of safely rolling back
  • Deployment delays: Risk aversion slows down release cycles

With reliable rollback:

  • Deployment confidence: Knowing you can roll back reduces fear
  • Fast recovery: Minutes to restore service instead of hours
  • Reduced risk: Bad deployments have minimal customer impact
  • Better decisions: Teams can safely experiment and learn
  • Higher deployment frequency: Confidence enables more frequent releases

What “Rollback On-demand” Means

Rollback is a Deployment

Rolling back means deploying a previous artifact version through your standard pipeline:

Current: v1.2.3 (has bug)
  ↓
Trigger rollback to v1.2.2
  ↓
Pipeline deploys artifact v1.2.2
  ↓
Service restored

Not this:

Current: v1.2.3 (has bug)
  ↓
SSH into servers
  ↓
Manually revert code changes
  ↓
Restart services
  ↓
Hope it works

Rollback is Tested

Rollback mechanisms should be tested regularly, not just during incidents:

  • Practice rollbacks during non-critical times
  • Include rollback tests in your pipeline
  • Time your rollback to ensure it meets SLAs
  • Verify rollback doesn’t break anything

Rollback is Fast

Rollback should be faster than forward deployment:

  • Skip build stage (artifact already exists)
  • Skip test stage (artifact was already tested)
  • Go straight to deployment with previous artifact

Target: < 5 minutes from rollback decision to service restored.

Rollback is Safe

Rollback should:

  • Deploy through the same pipeline (not a manual process)
  • Run smoke tests to verify the rollback worked
  • Update monitoring and alerts
  • Maintain audit trail

Example Implementations

Anti-Pattern: Manual Rollback Process

1. Identify the problem (10 minutes)
2. Find someone with production access (15 minutes)
3. SSH into each server (5 minutes)
4. Find the previous version files (10 minutes)
5. Stop the service (2 minutes)
6. Copy old files (5 minutes)
7. Restart the service (3 minutes)
8. Hope nothing else broke (???)

Total: ~50 minutes + stress + risk

Problem: Slow, manual, error-prone, no validation.

Good Pattern: Automated Rollback

# .github/workflows/rollback.yml
name: Rollback

on:
  workflow_dispatch:
    inputs:
      version:
        description: 'Version to roll back to'
        required: true
        type: string

jobs:
  rollback:
    runs-on: ubuntu-latest
    steps:
      - name: Validate version exists
        run: |
          docker manifest inspect app:${{ inputs.version }}

      - name: Deploy previous version
        run: |
          kubectl set image deployment/app \
            app=app:${{ inputs.version }}
          kubectl rollout status deployment/app

      - name: Run smoke tests
        run: |
          npm run smoke-test:production

      - name: Notify team
        if: success()
        run: |
          slack-notify "Rolled back to ${{ inputs.version }}"

      - name: Rollback failed
        if: failure()
        run: |
          slack-notify "Rollback to ${{ inputs.version }} failed!"

Usage:

# Single command to roll back
gh workflow run rollback.yml -f version=v1.2.2

# Total time: ~3 minutes

Benefit: Fast, automated, validated, audited.

What is Improved

  • Mean Time To Recovery (MTTR): Drops from hours to minutes
  • Deployment frequency: Increases due to reduced risk
  • Team confidence: Higher willingness to deploy
  • Customer satisfaction: Faster incident resolution
  • Learning: Teams can safely experiment
  • On-call burden: Reduced stress for on-call engineers

Common Patterns

Blue-Green Deployment

Maintain two identical environments:

Blue (current): v1.2.3
Green (idle): v1.2.2

Issue detected
  ↓
Switch traffic to Green (v1.2.2)
  ↓
Instant rollback (< 30 seconds)

Canary Rollback

Roll back gradually:

Deploy v1.2.3 to 10% of servers
  ↓
Issue detected in monitoring
  ↓
Automatically roll back 10% to v1.2.2
  ↓
Issue contained, minimal impact

Feature Flag Rollback

Disable problematic features without redeploying:

// Feature flag controls new feature
if (featureFlags.isEnabled('new-checkout')) {
  return renderNewCheckout()
}
return renderOldCheckout()

// Rollback: Toggle flag off via config
// No deployment needed, instant effect

Database-Safe Rollback

Design schema changes to support rollback:

-- Good: Additive change
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
-- Old code ignores new column
-- New code uses new column
-- Rolling back code doesn't break

-- Bad: Breaking change
ALTER TABLE users DROP COLUMN email;
-- Old code breaks if email is removed
-- Rollback requires schema rollback (risky!)

Use expand-contract pattern:

  1. Expand: Add new column (both versions work)
  2. Migrate: Start using new column
  3. Contract: Remove old column (later, when safe)

Artifact Registry Retention

Keep previous artifacts available:

# Docker registry retention policy
artifact_retention:
  keep_last_n_versions: 10
  keep_production_versions: forever
  cleanup_after: 90_days

Ensures you can always roll back to recent versions.

FAQ

How far back should we be able to roll back?

Minimum: Last 3-5 production releases. Ideally: Any production release from the past 30-90 days. Balance storage costs with rollback flexibility.

What if the database schema changed?

Design schema changes to be backward-compatible:

  • Use expand-contract pattern
  • Make schema changes in separate deployment from code changes
  • Test that old code works with new schema

What if we need to roll back the database too?

Database rollbacks are risky. Instead:

  1. Design schema changes to support rollback (backward compatibility)
  2. Use feature flags to disable code using new schema
  3. If absolutely necessary, have tested database rollback scripts

Should rollback require approval?

For production: On-call engineer should be empowered to roll back immediately without approval. Speed of recovery is critical. Post-rollback review is appropriate, but don’t delay the rollback.

How do we test rollback?

  1. Practice regularly: Perform rollback drills during low-traffic periods
  2. Automate testing: Include rollback in your pipeline tests
  3. Use staging: Test rollback in staging before production deployments
  4. Chaos engineering: Randomly trigger rollbacks to ensure they work

What if rollback fails?

Have a rollback-of-rollback plan:

  1. Roll forward to the next known-good version
  2. Use feature flags to disable problematic features
  3. Have out-of-band deployment method (last resort)

But if rollback is regularly tested, failures should be rare.

How long should rollback take?

Target: < 5 minutes from decision to service restored.

Breakdown:

  • Trigger: < 30 seconds
  • Deploy: 2-3 minutes
  • Verify: 1-2 minutes

What about configuration changes?

Configuration should be versioned with the artifact. Rolling back the artifact rolls back the configuration. See Application Configuration.

Health Metrics

  • Rollback success rate: Should be > 99%
  • Mean Time To Rollback (MTTR): Should be < 5 minutes
  • Rollback test frequency: At least monthly
  • Rollback usage: Track how often rollback is used (helps justify investment)
  • Failed rollback incidents: Should be nearly zero

8 - Application Configuration

Application configuration should deploy with your artifact, not vary by environment. Learn how to separate app config from environment config for reliable continuous delivery.

Definition

Application configuration defines the internal behavior of your application and is bundled with the artifact. It does not vary between environments. This is distinct from environment configuration (secrets, URLs, credentials) which varies by deployment.

We embrace The Twelve-Factor App config definitions:

  • Application Configuration: Internal to the app, does NOT vary by environment (feature flags, business rules, UI themes, default settings)
  • Environment Configuration: Varies by deployment (database URLs, API keys, service endpoints, credentials)

Application configuration should be:

  1. Version controlled with the source code
  2. Deployed as part of the immutable artifact
  3. Testable in the CI pipeline
  4. Unchangeable after the artifact is built

Why This Matters

Separating application configuration from environment configuration provides several critical benefits:

  • Immutability: The artifact tested in staging is identical to what runs in production
  • Traceability: You can trace any behavior back to a specific commit
  • Testability: Application behavior can be validated in the pipeline before deployment
  • Reliability: No configuration drift between environments caused by manual changes

Example Implementations

Anti-Pattern: External Application Config

# Stored in external config service, modified after build
feature_flags:
  new_checkout_flow: true
  payment_processor: 'stripe'
business_rules:
  max_cart_items: 100
  discount_threshold: 50.00

Problem: Changes to this config after build mean the artifact behavior is untested and unpredictable.

Good Pattern: Bundled Application Config

# config/application.yml - bundled with artifact
feature_flags:
  new_checkout_flow: true
  payment_processor: 'stripe'
business_rules:
  max_cart_items: 100
  discount_threshold: 50.00
# Environment-specific - injected at runtime via env vars
environment:
  database_url: ${DATABASE_URL}
  stripe_api_key: ${STRIPE_API_KEY}
  log_level: ${LOG_LEVEL}

Benefit: Application behavior is locked at build time; only environment-specific values change.
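
A minimal sketch of a loader that keeps the split explicit, assuming the file layout above and the js-yaml package:

// config/index.js: bundled application config plus injected environment config
const fs = require('fs')
const path = require('path')
const yaml = require('js-yaml')

const appConfig = yaml.load(
  fs.readFileSync(path.join(__dirname, 'application.yml'), 'utf8')
)

module.exports = {
  ...appConfig, // locked at build time, identical in every environment
  environment: {
    databaseUrl: process.env.DATABASE_URL, // varies by deployment
    stripeApiKey: process.env.STRIPE_API_KEY,
    logLevel: process.env.LOG_LEVEL || 'info',
  },
}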

What is Improved

  • Confidence in testing: When the pipeline passes, you know the exact behavior that will run in production
  • Faster rollback: Rolling back an artifact rolls back all application configuration changes
  • Audit trail: Every configuration change is in version control with commit history
  • Reduced deployment risk: No surprises from configuration changes made outside the pipeline
  • Better collaboration: Developers, QA, and operations all see the same configuration

Common Patterns

Feature Flags (Release Control)

Feature flags come in two flavors, and understanding the distinction is critical:

Static Feature Flags (Application Configuration)

Bundled with the artifact - These are application configuration:

// config/features.json (bundled with artifact)
{
  "features": {
    "new_dashboard": {
      "enabled": true,
      "rollout_percentage": 25
    }
  }
}

  • Flag definitions are in version control
  • Deployed with the artifact
  • Changing flags requires a new deployment
  • Pipeline tests validate flag behavior
  • Use case: Long-lived flags, kill switches, A/B test definitions

Dynamic Feature Flags (Environment Configuration)

External service - These are NOT application configuration:

// Application code reads from external service at runtime
const flags = await featureFlagService.getFlags({
  user: currentUser,
  environment: 'production',
})

if (flags.newDashboard) {
  return renderNewDashboard()
}

  • Flag state stored in external service (LaunchDarkly, Split.io, etc.)
  • Changed without redeployment
  • Different per environment (dev/staging/production)
  • Use case: Real-time experimentation, emergency kill switches, gradual rollouts

Which should you use?

  • Static flags: When you want config changes tested in pipeline
  • Dynamic flags: When you need real-time control without deployment

Business Rules

validation_rules:
  password_min_length: 12
  session_timeout_minutes: 30
  max_login_attempts: 5

These rules should be tested in the pipeline and deployed with the code.
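
For example, a pipeline test can pin the bundled rules so an accidental change fails the build. A minimal sketch, assuming the rules above live in config/application.yml and the js-yaml package is available:

const fs = require('fs')
const yaml = require('js-yaml')

const config = yaml.load(fs.readFileSync('config/application.yml', 'utf8'))

describe('bundled business rules', () => {
  it('keeps the password policy at or above the agreed minimum', () => {
    expect(config.validation_rules.password_min_length).toBeGreaterThanOrEqual(12)
  })

  it('limits failed login attempts', () => {
    expect(config.validation_rules.max_login_attempts).toBeLessThanOrEqual(5)
  })
})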

Service Discovery

# Application config - service relationships
services:
  payment_service_name: 'payment-api'
  user_service_name: 'user-api'

# Environment config - actual endpoints (injected)
service_mesh_url: ${SERVICE_MESH_URL}

FAQ

How do I change application config for a specific environment?

You shouldn’t. If behavior needs to vary by environment, it’s environment configuration (injected via environment variables or secrets management). Application configuration is the same everywhere.

What if I need to hotfix a config value in production?

If it’s truly application configuration, make the change in code, commit it, let the pipeline validate it, and deploy the new artifact. Hotfixing config outside the pipeline defeats the purpose of immutable artifacts.

Can feature flags be application configuration?

It depends on the type:

Static feature flags (bundled with artifact): YES, these are application configuration

  • Flag definitions and states in version control
  • Deployed with the artifact
  • Changes require redeployment through pipeline

Dynamic feature flags (external service): NO, these are environment configuration

  • Flag states stored externally (LaunchDarkly, Split.io, etc.)
  • Changed without redeployment
  • Different per environment
  • Not tested by pipeline before changes take effect

Both are valid patterns serving different needs. Static flags ensure pipeline validation; dynamic flags enable real-time experimentation.

What about config that changes frequently?

If it changes frequently enough that redeploying is impractical, it might be data, not configuration. Consider whether it belongs in a database or content management system instead.

How do I test application configuration changes?

The same way you test code changes:

  1. Commit the config change to version control
  2. CI builds the artifact with the new config
  3. Automated tests validate the behavior
  4. Deploy the artifact through all environments

Health Metrics

  • Configuration drift incidents: Should be zero (config is immutable with artifact)
  • Config-related rollbacks: Track how often config changes cause rollbacks
  • Time to config change: From commit to production should match your deployment cycle time

9 - Trunk Based Development

Trunk-based development eliminates merge conflicts by integrating all code changes directly to trunk. Learn how short-lived branches and daily commits improve software delivery performance.

Excerpt from Accelerate by Nicole Forsgren Ph.D., Jez Humble & Gene Kim

Definition

TBD is a team workflow where changes are integrated into the trunk with no intermediate integration (Develop, Test, etc.) branch. The two common workflows are making changes directly to the trunk or using very short-lived branches that branch from the trunk and integrate back into the trunk.

It is important to note that release branches are an intermediate step that some choose on their path to continuous delivery while improving their quality processes in the pipeline. True CD releases from the trunk.

What is Improved

  • Smaller changes: TBD emphasizes small, frequent changes that are easier for the team to review and more resistant to impactful merge conflicts. Conflicts become rare and trivial.
  • We must test: TBD requires us to implement tests as part of the development process.
  • Better teamwork: We need to work more closely as a team. This has many positive impacts, not least we will be more focused on getting the team’s highest priority done. We will stop starting and start finishing work.
  • Better work definition: Small changes require us to decompose the work into a level of detail that helps uncover things that lack clarity or do not make sense. This provides much earlier feedback on potential quality issues.
  • Replaces process with engineering: Instead of creating a process where we control the release of features with branches, we can control the release of features with engineering techniques called evolutionary coding methods. These techniques have additional benefits related to stability that cannot be found when replaced by process.
  • Reduces risk: Long-lived branches carry two risks that occur frequently. First, the change does not integrate cleanly and the merge conflicts result in broken or lost features. Second, the branch is abandoned, usually because of the first problem, and sometimes because all of the knowledge about what is in the branch resides in the mind of someone who left before it was integrated.

Need Help?

See the TBD migration guide.

9.1 - Migrating to Trunk-Based Development

Continuous delivery requires continuous integration, and CI requires very frequent code integration to the trunk, at least daily. Doing that either requires trunk-based development or worthless process overhead from merging through multiple intermediate branches. So, if you want CI, you’re not getting there without trunk-based development. However, standing up TBD is not as simple as “collapse all the branches.” CD is a quality process, not just automated code delivery. Trunk-based development is the first step in establishing that quality process and in uncovering the problems in the current process.

GitFlow, and other branching models that use long-lived branches, optimize for isolation to protect working code from untested or poorly tested code. They create the illusion of safety while silently increasing risk through long feedback delays. The result is predictable: painful merges, stale assumptions, and feedback that arrives too late to matter.

TBD reverses that. It optimizes for rapid feedback, smaller changes, and collaborative discovery — the ingredients required for CI and continuous delivery.

This article explains how to move from GitFlow (or any long-lived branch pattern) toward TBD, and what “good” actually looks like along the way.


Why Move to Trunk-Based Development?

Long-lived branches hide problems. TBD exposes them early, when they are cheap to fix.

Think of long-lived branches like storing food in a bunker: it feels safe until you open the door and discover half of it rotting. With TBD, teams check freshness every day.

To do CI, teams need:

  • Small changes integrated at least daily
  • Automated tests giving fast, deterministic feedback
  • A single source of truth: the trunk

If your branches live for more than a day or two, you aren’t doing continuous integration — you’re doing periodic integration at best. True CI requires at least daily integration to the trunk.


The First Step: Stop Letting Work Age

The biggest barrier isn’t tooling. It’s habits.

The first meaningful change is simple:

Stop letting branches live long enough to become problems.

Your first goal isn’t true TBD. It’s shorter-lived branches — changes that live for hours or a couple of days, not weeks.

That alone exposes dependency issues, unclear requirements, and missing tests — which is exactly the point. The pain tells you where improvement is needed.


Before You Start: What to Measure

You cannot improve what you don’t measure. Before changing anything, establish baseline metrics, so you can track actual progress.

Essential Metrics to Track Weekly

Branch Lifetime

  • Average time from branch creation to merge
  • Maximum branch age currently open
  • Target: Reduce average from weeks to days, then to hours

Integration Health

  • Number of merge conflicts per week
  • Time spent resolving conflicts
  • Target: Conflicts should decrease as integration frequency increases

Delivery Speed

  • Time from commit to production deployment
  • Number of commits per day reaching production
  • Target: Decrease time to production, increase deployment frequency

Quality Indicators

  • Build/test execution time
  • Test failure rate
  • Production incidents per deployment
  • Target: Fast, reliable tests; stable deployments

Work Decomposition

  • Average pull request size (lines changed)
  • Number of files changed per commit
  • Target: Smaller, more focused changes

Start with just two or three of these. Don’t let measurement become its own project.

The goal isn’t perfect data — it’s visibility into whether you’re actually moving in the right direction.


Path #1: Moving from Long-Lived Branches to Short-Lived Branches

When GitFlow habits are deeply ingrained, this is usually the least-threatening first step.

1. Collapse the Branching Model

Stop using:

  • develop
  • release branches that sit around for weeks
  • feature branches lasting a sprint or more

Move toward:

  • A single main (or trunk)
  • Temporary branches measured in hours or days

2. Integrate Every Few Days — Then Every Day

Set an explicit working agreement:

“Nothing lives longer than 48 hours.”

Once this feels normal, shorten it:

“Integrate at least once per day.”

If a change is too large to merge within a day or two, the problem isn’t the branching model — the problem is the decomposition of work.

3. Test Before You Code

Branch lifetime shortens when you stop guessing about expected behavior. Bring product, QA, and developers together before coding:

  • Write acceptance criteria collaboratively
  • Turn them into executable tests
  • Then write code to make those tests pass

You’ll discover misunderstandings upfront instead of after a week of coding.

This approach is called Behavior-Driven Development (BDD) — a collaborative practice where teams define expected behavior in plain language before writing code. BDD bridges the gap between business requirements and technical implementation by using concrete examples that become executable tests.

How to Run a Three Amigos Session

Participants: Product Owner, Developer, Tester (15-30 minutes per story)

Process:

  1. Product describes the user need and expected outcome
  2. Developer asks questions about edge cases and dependencies
  3. Tester identifies scenarios that could fail
  4. Together, write acceptance criteria as examples

Example:

Feature: User password reset

Scenario: Valid reset request
  Given a user with email "user@example.com" exists
  When they request a password reset
  Then they receive an email with a reset link
  And the link expires after 1 hour

Scenario: Invalid email
  Given no user with email "nobody@example.com" exists
  When they request a password reset
  Then they see "If the email exists, a reset link was sent"
  And no email is sent

Scenario: Expired link
  Given a user has a reset link older than 1 hour
  When they click the link
  Then they see "This reset link has expired"
  And they are prompted to request a new one

These scenarios become your automated acceptance tests before you write any implementation code.

From Acceptance Criteria to Tests

Turn those scenarios into executable tests in your framework of choice:

// Example using Jest and Supertest
const request = require('supertest');
const app = require('../app'); // your Express app under test
const { createUser, emailService } = require('./helpers'); // illustrative test helpers
describe('Password Reset', () => {
  it('sends reset email for valid user', async () => {
    await createUser({ email: 'user@example.com' });

    const response = await request(app)
      .post('/password-reset')
      .send({ email: 'user@example.com' });

    expect(response.status).toBe(200);
    expect(emailService.sentEmails).toHaveLength(1);
    expect(emailService.sentEmails[0].to).toBe('user@example.com');
  });

  it('does not reveal whether email exists', async () => {
    const response = await request(app)
      .post('/password-reset')
      .send({ email: 'nobody@example.com' });

    expect(response.status).toBe(200);
    expect(response.body.message).toBe('If the email exists, a reset link was sent');
    expect(emailService.sentEmails).toHaveLength(0);
  });
});

Now you can write the minimum code to make these tests pass. This drives smaller, more focused changes.

4. Invest in Contract Tests

Most merge pain isn’t from your code — it’s from the interfaces between services.
Define interface changes early and codify them with provider/consumer contract tests.

This lets teams integrate frequently without surprises.


Path #2: Committing Directly to the Trunk

This is the cleanest and most powerful version of TBD. It requires discipline, but it produces the most stable delivery pipeline and the least drama.

If the idea of committing straight to main makes people panic, that’s a signal about your current testing process — not a problem with TBD.


How to Choose Your Path

Use this rule of thumb:

  • If your team fears “breaking everything,” start with short-lived branches.
  • If your team collaborates well and writes tests first, go straight to trunk commits.

Both paths require the same skills:

  • Smaller work
  • Better requirements
  • Shared understanding
  • Automated tests
  • A reliable pipeline

The difference is pace.


Essential TBD Practices

These practices apply to both paths—whether you’re using short-lived branches or committing directly to trunk.

Use Feature Flags the Right Way

Feature flags are one of several evolutionary coding practices that allow you to integrate incomplete work safely. Other methods include branch by abstraction and connect-last patterns. For a comprehensive guide on when to use each approach, see Evolutionary Coding Practices.

Feature flags are not a testing strategy. They are a release strategy.

Every commit to trunk must:

  • Build
  • Test
  • Deploy safely

Flags let you deploy incomplete work without exposing it prematurely. They don’t excuse poor test discipline.

Start Simple: Boolean Flags

You don’t need a sophisticated feature flag system to start. Begin with environment variables or simple config files.

Simple boolean flag example:

// config/features.js
module.exports = {
  newCheckoutFlow: process.env.FEATURE_NEW_CHECKOUT === 'true',
  enhancedSearch: process.env.FEATURE_ENHANCED_SEARCH === 'true',
};

// In your code
const features = require('./config/features');

app.get('/checkout', (req, res) => {
  if (features.newCheckoutFlow) {
    return renderNewCheckout(req, res);
  }
  return renderOldCheckout(req, res);
});

This is enough for most TBD use cases.

Testing Code Behind Flags

Critical: You must test both code paths — flag on and flag off.

describe('Checkout flow', () => {
  describe('with new checkout flow enabled', () => {
    beforeEach(() => {
      features.newCheckoutFlow = true;
    });

    it('shows new checkout UI', () => {
      // Test new flow
    });
  });

  describe('with new checkout flow disabled', () => {
    beforeEach(() => {
      features.newCheckoutFlow = false;
    });

    it('shows legacy checkout UI', () => {
      // Test old flow
    });
  });
});

If you only test with the flag on, you’ll break production when the flag is off.

Two Types of Feature Flags

Feature flags serve two fundamentally different purposes:

Temporary Release Flags (should be removed):

  • Control rollout of new features
  • Enable gradual deployment
  • Allow quick rollback of changes
  • Test in production before full release
  • Lifecycle: Created for a release, removed once stable (typically 1-4 weeks)

Permanent Configuration Flags (designed to stay):

  • User preferences and settings (dark mode, email notifications, etc.)
  • Customer-specific features (enterprise vs. free tier)
  • A/B testing and experimentation
  • Regional or regulatory variations
  • Operational controls (read-only mode, maintenance mode)
  • Lifecycle: Part of your product’s configuration system

The distinction matters: Temporary release flags create technical debt if not removed. Permanent configuration flags are part of your feature set and belong in your configuration management system.

Most of the feature flags you create for TBD migration will be temporary release flags that must be removed.

Release Flag Lifecycle Management

Temporary release flags are scaffolding, not permanent architecture.

Every temporary release flag should have:

  1. A creation date
  2. A purpose
  3. An expected removal date
  4. An owner responsible for removal

Track your flags:

// flags.config.js
module.exports = {
  flags: [
    {
      name: 'newCheckoutFlow',
      created: '2024-01-15',
      owner: 'checkout-team',
      jiraTicket: 'SHOP-1234',
      removalTarget: '2024-02-15',
      purpose: 'Progressive rollout of redesigned checkout'
    }
  ]
};

Set reminders to remove flags. Permanent flags multiply complexity and slow you down.
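
Reminders can even be automated: a small test can fail the build once a temporary release flag outlives its removal target. A sketch built on the flags.config.js structure above:

// flags.hygiene.test.js
const { flags } = require('./flags.config');

describe('release flag hygiene', () => {
  it('has no flags past their removal target', () => {
    const today = new Date();
    const overdue = flags.filter((flag) => new Date(flag.removalTarget) < today);

    // A failure here is a prompt to remove the flag, not to extend the date
    expect(overdue.map((flag) => flag.name)).toEqual([]);
  });
});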

When to Remove a Flag

Remove a flag when:

  • The feature is 100% rolled out and stable
  • You’re confident you won’t need to roll back
  • Usually 1-2 weeks after full deployment

Removal process:

  1. Set flag to always-on in code
  2. Deploy and monitor
  3. If stable for 48 hours, delete the conditional logic entirely
  4. Remove the flag from configuration

Common Anti-Patterns to Avoid

Don’t:

  • Let temporary release flags become permanent (if it’s truly permanent, it should be a configuration option)
  • Let release flags accumulate without removal
  • Skip testing both flag states
  • Use flags to hide broken code
  • Create flags for every tiny change

Do:

  • Use release flags for large or risky changes
  • Remove release flags as soon as the feature is stable
  • Clearly document whether each flag is temporary (release) or permanent (configuration)
  • Test both enabled and disabled states
  • Move permanent feature toggles to your configuration management system

Commit Small and Commit Often

If a change is too large to commit today, split it.

Large commits are failed design upstream, not failed integration downstream.

Use TDD and ATDD to Keep Refactors Safe

Refactoring must not break tests. If it does, you’re testing implementation, not behavior. Behavioral tests are what keep trunk commits safe.

Prioritize Interfaces First

Always start by defining and codifying the contract:

  • What is the shape of the request?
  • What is the response?
  • What error states must be handled?

Interfaces are the highest-risk area. Drive them with tests first. Then work inward.
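
A minimal sketch of driving an interface with tests before the implementation exists, using Jest and Supertest as in the earlier example; the /orders endpoint, field names, and error code are illustrative assumptions:

const request = require('supertest');
const app = require('../app'); // your (not yet fully implemented) service

describe('POST /orders contract', () => {
  it('accepts a well-formed order and returns its id and status', async () => {
    const res = await request(app)
      .post('/orders')
      .send({ sku: 'ABC-123', quantity: 2 });

    expect(res.status).toBe(201);
    expect(res.body).toEqual(
      expect.objectContaining({ id: expect.any(String), status: 'pending' })
    );
  });

  it('rejects a missing quantity with a machine-readable error', async () => {
    const res = await request(app).post('/orders').send({ sku: 'ABC-123' });

    expect(res.status).toBe(400);
    expect(res.body.error).toBe('quantity_required');
  });
});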


Getting Started: A Tactical Guide

The initial phase sets the tone. Focus on establishing new habits, not perfection.

Step 1: Team Agreement and Baseline

  • Hold a team meeting to discuss the migration
  • Agree on initial branch lifetime limit (start with 48 hours if unsure)
  • Document current baseline metrics (branch age, merge frequency, build time)
  • Identify your slowest-running tests
  • Create a list of known integration pain points
  • Set up a visible tracker (physical board or digital dashboard) for metrics

Step 2: Test Infrastructure Audit

Focus: Find and fix what will slow you down.

  • Run your test suite and time each major section
  • Identify slow tests
  • Look for:
    • Tests with sleeps or arbitrary waits
    • Tests hitting external services unnecessarily
    • Integration tests that could be contract tests
    • Flaky tests masking real issues

Fix or isolate the worst offenders. You don’t need a perfect test suite to start — just one fast enough to not punish frequent integration.

Step 3: First Integrated Change

Pick the smallest possible change:

  • A bug fix
  • A refactoring with existing test coverage
  • A configuration update
  • Documentation improvement

The goal is to validate your process, not to deliver a feature.

Execute:

  1. Create a branch (if using Path #1) or commit directly (if using Path #2)
  2. Make the change
  3. Run tests locally
  4. Integrate to trunk
  5. Deploy through your pipeline
  6. Observe what breaks or slows you down

Step 4: Retrospective

Gather the team:

What went well:

  • Did anyone integrate faster than before?
  • Did you discover useful information about your tests or pipeline?

What hurt:

  • What took longer than expected?
  • What manual steps could be automated?
  • What dependencies blocked integration?

Ongoing commitment:

  • Adjust branch lifetime limit if needed
  • Assign owners to top 3 blockers
  • Commit to integrating at least one change per person

The initial phase won’t feel smooth. That’s expected. You’re learning what needs fixing.


Getting Your Team On Board

Technical changes are easy compared to changing habits and mindsets. Here’s how to build buy-in.

Acknowledge the Fear

When you propose TBD, you’ll hear:

  • “We’ll break production constantly”
  • “Our code isn’t good enough for that”
  • “We need code review on branches”
  • “This won’t work with our compliance requirements”

These concerns are valid signals about your current system. Don’t dismiss them.

Instead: “You’re right that committing directly to trunk with our current test coverage would be risky. That’s why we need to improve our tests first.”

Start with an Experiment

Don’t mandate TBD for the whole team immediately. Propose a time-boxed experiment:

The Proposal:

“Let’s try this for two weeks with a single small feature. We’ll track what goes well and what hurts. After two weeks, we’ll decide whether to continue, adjust, or stop.”

What to measure during the experiment:

  • How many times did we integrate?
  • How long did merges take?
  • Did we catch issues earlier or later than usual?
  • How did it feel compared to our normal process?

After two weeks: Hold a retrospective. Let the data and experience guide the decision.

Pair on the First Changes

Don’t expect everyone to adopt TBD simultaneously. Instead:

  1. Identify one advocate who wants to try it
  2. Pair with them on the first trunk-based changes
  3. Let them experience the process firsthand
  4. Have them pair with the next person

Knowledge transfer through pairing works better than documentation.

Address Code Review Concerns

“But we need code review!” Yes. TBD doesn’t eliminate code review.

Options that work:

  • Pair or mob programming (review happens in real-time)
  • Commit to trunk, review immediately after, fix forward if issues found
  • Very short-lived branches (hours, not days) with rapid review SLA
  • Pairing on the review itself, with any requested changes made together

The goal is fast feedback, not zero review.

Handle Skeptics and Blockers

You’ll encounter people who don’t want to change. Don’t force it.

Instead:

  • Let them observe the experiment from the outside
  • Share metrics and outcomes transparently
  • Invite them to pair for one change
  • Let success speak louder than arguments

Some people need to see it working before they believe it.

Get Management Support

Managers often worry about:

  • Reduced control
  • Quality risks
  • Slower delivery (ironically)

Address these with data:

  • Show branch age metrics before/after
  • Track cycle time improvements
  • Demonstrate faster feedback on defects
  • Highlight reduced merge conflicts

Frame TBD as a risk reduction strategy, not a risky experiment.


Working in a Multi-Team Environment

Migrating to TBD gets complicated when you depend on teams still using long-lived branches. Here’s how to handle it.

The Core Problem

You want to integrate daily. Your dependency team integrates weekly or monthly. Their API changes surprise you during their big-bang merge.

You can’t force other teams to change. But you can protect yourself.

Strategy 1: Consumer-Driven Contract Tests

Define the contract you need from the upstream service and codify it in tests that run in your pipeline.

Example using Pact:

// Your consumer test (Pact JS v9-style API; consumer/provider names are illustrative)
const { Pact } = require('@pact-foundation/pact');

// Local mock of the upstream service, driven by the interactions below
const provider = new Pact({
  consumer: 'checkout-ui',
  provider: 'user-service',
  port: 1234,
});

describe('User Service Contract', () => {
  beforeAll(() => provider.setup());
  afterEach(() => provider.verify());
  afterAll(() => provider.finalize());

  it('returns user profile by ID', async () => {
    await provider.addInteraction({
      state: 'user 123 exists',
      uponReceiving: 'a request for user 123',
      withRequest: {
        method: 'GET',
        path: '/users/123',
      },
      willRespondWith: {
        status: 200,
        body: {
          id: 123,
          name: 'Jane Doe',
          email: 'jane@example.com',
        },
      },
    });

    // userService should point at the mock provider (http://localhost:1234) in this test
    const user = await userService.getUser(123);
    expect(user.name).toBe('Jane Doe');
  });
});

This test runs against your expectations of the API, not the actual service. When the upstream team changes their API, your contract test fails before you integrate their changes.

Share the contract:

  • Publish your contract to a shared repository
  • Upstream team runs provider verification against your contract
  • If they break your contract, they know before merging

Strategy 2: API Versioning with Backwards Compatibility

If you control the shared service:

// Support both old and new API versions
app.get('/api/v1/users/:id', handleV1Users);
app.get('/api/v2/users/:id', handleV2Users);

// Or use content negotiation
app.get('/api/users/:id', (req, res) => {
  const version = req.headers['api-version'] || 'v1';
  if (version === 'v2') {
    return handleV2Users(req, res);
  }
  return handleV1Users(req, res);
});

Migration path:

  1. Deploy new version alongside old version
  2. Update consumers one by one
  3. After all consumers migrated, deprecate old version
  4. Remove old version after deprecation period

Strategy 3: Strangler Fig Pattern

When you depend on a team that won’t change:

  1. Create an anti-corruption layer between your code and theirs
  2. Define your ideal interface in the adapter
  3. Let the adapter handle their messy API

// Your ideal interface
class UserRepository {
  async getUser(id) {
    // Your clean, typed interface
  }
}

// Adapter that deals with their mess
class LegacyUserServiceAdapter extends UserRepository {
  async getUser(id) {
    const response = await fetch(`https://legacy-service/users/${id}`);
    const messyData = await response.json();

    // Transform their format to yours
    return {
      id: messyData.user_id,
      name: `${messyData.first_name} ${messyData.last_name}`,
      email: messyData.email_address,
    };
  }
}

Now your code depends on your interface, not theirs. When they change, you only update the adapter.

Strategy 4: Feature Toggles for Cross-Team Coordination

When multiple teams need to coordinate a release:

  1. Each team develops behind feature flags
  2. Each team integrates to trunk continuously
  3. Features remain disabled until coordination point
  4. Enable flags in coordinated sequence

This decouples development velocity from release coordination.

When You Can’t Integrate with Dependencies

If upstream dependencies block you from integrating daily:

Short term:

  • Use contract tests to detect breaking changes early
  • Create adapters to isolate their changes
  • Document the integration pain as a business cost

Long term:

  • Advocate for those teams to adopt TBD
  • Share your success metrics
  • Offer to help them migrate

You can’t force other teams to change. But you can demonstrate a better way and make it easier for them to follow.


TBD in Regulated Environments

Regulated industries face legitimate compliance requirements: audit trails, change traceability, separation of duties, and documented approval processes. These requirements often lead teams to believe trunk-based development is incompatible with compliance. This is a misconception.

TBD is about integration frequency, not about eliminating controls. You can meet compliance requirements while still integrating at least daily.

The Compliance Concerns

Common regulatory requirements that seem to conflict with TBD:

Audit Trail and Traceability

  • Every change must be traceable to a requirement, ticket, or change request
  • Changes must be attributable to specific individuals
  • History of what changed, when, and why must be preserved

Separation of Duties

  • The person who writes code shouldn’t be the person who approves it
  • Changes must be reviewed before reaching production
  • No single person should have unchecked commit access

Change Control Process

  • Changes must follow a documented approval workflow
  • Risk assessment before deployment
  • Rollback capability for failed changes

Documentation Requirements

  • Changes must be documented before implementation
  • Testing evidence must be retained
  • Deployment procedures must be repeatable and auditable

Short-Lived Branches: The Compliant Path to TBD

Path #1 from this guide—short-lived branches—directly addresses compliance concerns while maintaining the benefits of TBD.

Short-lived branches mean:

  • Branches live for hours to 2 days maximum, not weeks or months
  • Integration happens at least daily
  • Pull requests are small, focused, and fast to review
  • Review and approval happen within the branch lifetime

This approach satisfies both regulatory requirements and continuous integration principles.

How Short-Lived Branches Meet Compliance Requirements

Audit Trail:

Every commit references the change ticket:

git commit -m "JIRA-1234: Add validation for SSN input

Implements requirement REQ-445 from Q4 compliance review.
Changes limited to user input validation layer."

Modern Git hosting platforms (GitHub, GitLab, Bitbucket) automatically track:

  • Who created the branch
  • Who committed each change
  • Who reviewed and approved
  • When it merged
  • Complete diff history

Separation of Duties:

Use pull request workflows:

  1. Developer creates branch from trunk
  2. Developer commits changes (same day)
  3. Second person reviews and approves (within 24 hours)
  4. Automated checks validate (tests, security scans, compliance checks)
  5. Merge to trunk after approval
  6. Automated deployment with gates

This provides stronger separation of duties than long-lived branches because:

  • Reviews happen while context is fresh
  • Reviewers can actually understand the small changeset
  • Automated checks enforce policies consistently

Change Control Process:

Branch protection rules enforce your process:

# Example GitHub branch protection for trunk
required_reviews: 1
required_checks:
  - unit-tests
  - security-scan
  - compliance-validation
dismiss_stale_reviews: true
require_code_owner_review: true

This ensures:

  • No direct commits to trunk (except in documented break-glass scenarios)
  • Required approvals before merge
  • Automated validation gates
  • Audit log of every merge decision

Documentation Requirements:

Pull request templates enforce documentation:

## Change Description
[Link to Jira ticket]

## Risk Assessment
- [ ] Low risk: Configuration only
- [ ] Medium risk: New functionality, backward compatible
- [ ] High risk: Database migration, breaking change

## Testing Evidence
- [ ] Unit tests added/updated
- [ ] Integration tests pass
- [ ] Manual testing completed (attach screenshots if UI change)
- [ ] Security scan passed

## Rollback Plan
[How to rollback if this causes issues in production]

What “Short-Lived” Means in Practice

Hours, not days:

  • Simple bug fixes: 2-4 hours
  • Small feature additions: 4-8 hours
  • Refactoring: 1-2 days

Maximum 2 days: If a branch can’t merge within 2 days, the work is too large. Decompose it further or use feature flags to integrate incomplete work safely.

Daily integration requirement: Even if the feature isn’t complete, integrate what you have:

  • Behind a feature flag if needed
  • As internal APIs not yet exposed
  • As tests and interfaces before implementation

Compliance-Friendly Tooling

Modern platforms provide compliance features built-in:

Git Hosting (GitHub, GitLab, Bitbucket):

  • Immutable audit logs
  • Branch protection rules
  • Required approvals
  • Status check enforcement
  • Signed commits for authenticity

CI/CD Platforms:

  • Deployment approval gates
  • Audit trails of every deployment
  • Environment-specific controls
  • Automated compliance checks

Feature Flag Systems:

  • Change deployment without code deployment
  • Gradual rollout controls
  • Instant rollback capability
  • Audit log of flag changes

Secrets Management:

  • Vault, AWS Secrets Manager, Azure Key Vault
  • Audit log of secret access
  • Rotation policies
  • Environment isolation

Example: Compliant Short-Lived Branch Workflow

Monday 9 AM: Developer creates branch feature/JIRA-1234-add-audit-logging from trunk.

Monday 9 AM - 2 PM: Developer implements audit logging for user authentication events. Commits reference JIRA-1234. Automated tests run on each commit.

Monday 2 PM: Developer opens pull request:

  • Title: “JIRA-1234: Add audit logging for authentication events”
  • Description includes risk assessment, testing evidence, rollback plan
  • Automated checks run: tests, security scan, compliance validation
  • Code owner automatically assigned for review

Monday 3 PM: Code owner reviews (5-10 minutes—change is small and focused). Suggests minor improvement.

Monday 3:30 PM: Developer addresses feedback, pushes update.

Monday 4 PM: Code owner approves. All automated checks pass. Developer merges to trunk.

Monday 4:05 PM: CI/CD pipeline deploys to staging automatically. Automated smoke tests pass.

Monday 4:30 PM: Deployment gate requires manual approval for production. Tech lead approves based on risk assessment.

Monday 4:35 PM: Automated deployment to production. Audit log captures: what deployed, who approved, when, what checks passed.

Total time: about 7.5 hours from branch creation to production.

Full compliance maintained. Full audit trail captured. Daily integration achieved.

When Long-Lived Branches Hide Compliance Problems

Ironically, long-lived branches often create compliance risks:

Stale Reviews: Reviewing a 3-week-old, 2000-line pull request is performative, not effective. Reviewers rubber-stamp because they can’t actually understand the changes.

Integration Risk: Big-bang merges after weeks introduce unexpected behavior. The change that was reviewed isn’t the change that actually deployed (due to merge conflicts and integration issues).

Delayed Feedback: Problems discovered weeks after code was written are expensive to fix and hard to trace to requirements.

Audit Trail Gaps: Long-lived branches often have messy commit history, force pushes, and unclear attribution. The audit trail is polluted.

Regulatory Examples Where Short-Lived Branches Work

Financial Services (SOX, PCI-DSS):

  • Short-lived branches with required approvals
  • Automated security scanning on every PR
  • Separation of duties via required reviewers
  • Immutable audit logs in Git hosting platform
  • Feature flags for gradual rollout and instant rollback

Healthcare (HIPAA):

  • Pull request templates documenting PHI handling
  • Automated compliance checks for data access patterns
  • Required security review for any PHI-touching code
  • Audit logs of deployments
  • Environment isolation enforced by CI/CD

Government (FedRAMP, FISMA):

  • Branch protection requiring government code owner approval
  • Automated STIG compliance validation
  • Signed commits for authenticity
  • Deployment gates requiring authority to operate
  • Complete audit trail from commit to production

The Real Choice

The question isn’t “TBD or compliance.”

The real choice is: compliance theater with long-lived branches and risky big-bang merges, or actual compliance with short-lived branches and safe daily integration.

Short-lived branches provide:

  • Better audit trails (small, traceable changes)
  • Better separation of duties (reviewable changes)
  • Better change control (automated enforcement)
  • Lower risk (small, reversible changes)
  • Faster feedback (problems caught early)

That’s not just compatible with compliance. That’s better compliance.


What Will Hurt (At First)

When you migrate to TBD, you’ll expose every weakness you’ve been avoiding:

  • Slow tests
  • Unclear requirements
  • Fragile integration points
  • Architecture that resists small changes
  • Gaps in automated validation
  • Long manual processes in the value stream

This is not a regression. This is the point.

Problems you discover early are problems you can fix cheaply.


Common Pitfalls to Avoid

Teams migrating to TBD often make predictable mistakes. Here’s how to avoid them.

Pitfall 1: Treating TBD as Just a Branch Renaming Exercise

The mistake: Renaming develop to main and calling it TBD.

Why it fails: You’re still doing long-lived feature branches, just with different names. The fundamental integration problems remain.

What to do instead: Focus on integration frequency, not branch names. Measure time-to-merge, not what you call your branches.

Pitfall 2: Merging Daily Without Actually Integrating

The mistake: Committing to trunk every day, but your code doesn’t interact with anyone else’s work. Your tests don’t cover integration points.

Why it fails: You’re batching integration for later. When you finally connect your component to the rest of the system, you discover incompatibilities.

What to do instead: Ensure your tests exercise the boundaries between components. Use contract tests for service interfaces. Integrate at the interface level, not just at the source control level.
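
As a minimal sketch of a consumer-side contract test, using the Jest-style syntax that appears elsewhere in this guide (the fixture, parser, and field names are hypothetical):

// The fixture mirrors the provider's documented response and can be shared with
// (or generated from) the provider's own contract tests.
interface OrderSummary {
  orderId: string;
  status: 'pending' | 'shipped';
}

const providerResponseFixture = {
  order_id: 'o-123',
  status: 'shipped',
};

function parseOrderSummary(raw: typeof providerResponseFixture): OrderSummary {
  return { orderId: raw.order_id, status: raw.status as OrderSummary['status'] };
}

describe('order service contract', () => {
  it('maps the provider response shape this consumer depends on', () => {
    const summary = parseOrderSummary(providerResponseFixture);
    expect(summary.orderId).toBe('o-123');
    expect(summary.status).toBe('shipped');
  });
});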

Pitfall 3: Skipping Test Investment

The mistake: “We’ll adopt TBD first, then improve our tests later.”

Why it fails: Without fast, reliable tests, frequent integration is terrifying. You’ll revert to long-lived branches because trunk feels unsafe.

What to do instead: Invest in test infrastructure first. Make your slowest tests faster. Fix flaky tests. Only then increase integration frequency.

Pitfall 4: Using Feature Flags as a Testing Escape Hatch

The mistake: “It’s fine to commit broken code as long as it’s behind a flag.”

Why it fails: Untested code is still untested, flag or no flag. When you enable the flag, you’ll discover the bugs you should have caught earlier.

What to do instead: Test both flag states. Flags hide features from users, not from your test suite.
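
One way to keep both states cheap to cover is to run the same test against each flag value; the feature name and function here are illustrative:

// Illustrative: a hypothetical pricing change exercised with the flag on and off.
function applyDiscount(total: number, flags: { newPricing: boolean }): number {
  return flags.newPricing ? total * 0.9 : total;
}

describe('checkout total', () => {
  for (const newPricing of [true, false]) {
    it(`stays a valid amount with newPricing=${newPricing}`, () => {
      const total = applyDiscount(100, { newPricing });
      expect(total).toBeGreaterThan(0);
      expect(total).toBeLessThanOrEqual(100);
    });
  }
});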

Pitfall 5: Keeping Flags Forever

The mistake: Creating feature flags and never removing them. Your codebase becomes a maze of conditionals.

Why it fails: Every permanent flag doubles your testing surface area and increases complexity. Eventually, no one knows which flags do what.

What to do instead: Set a removal date when creating each flag. Track flags like technical debt. Remove them aggressively once features are stable.
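
A lightweight way to track flags like technical debt is a registry with an owner and a removal date per flag, which a CI check can enforce. This sketch uses hypothetical flag names and dates:

// Hypothetical registry: every flag carries an owner and a planned removal date.
interface FlagRecord {
  name: string;
  owner: string;
  removeBy: string; // ISO date
}

const flagRegistry: FlagRecord[] = [
  { name: 'emailNotifications', owner: 'notifications-team', removeBy: '2025-03-01' },
  { name: 'modernAuth', owner: 'platform-team', removeBy: '2025-04-15' },
];

// Run this in CI (or as a test) to surface flags past their removal date.
function overdueFlags(today: Date = new Date()): FlagRecord[] {
  return flagRegistry.filter((flag) => new Date(flag.removeBy) < today);
}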

Pitfall 6: Forcing TBD on an Unprepared Team

The mistake: Mandating TBD before the team understands why or how it works.

Why it fails: People resist changes they don’t understand or didn’t choose. They’ll find ways to work around it or sabotage it.

What to do instead: Start with volunteers. Run experiments. Share results. Let success create pull, not push.

Pitfall 7: Ignoring the Need for Small Changes

The mistake: Trying to do TBD while still working on features that take weeks to complete.

Why it fails: If your work naturally takes weeks, you can’t integrate daily. You’ll create work-in-progress commits that don’t add value.

What to do instead: Learn to decompose work into smaller, independently valuable increments. This is a skill that must be developed.

Pitfall 8: No Clear Definition of “Done”

The mistake: Integrating code that “works on my machine” without validating it in a production-like environment.

Why it fails: Integration bugs don’t surface until deployment. By then, you’ve integrated many other changes, making root cause analysis harder.

What to do instead: Define “integrated” as “deployed to a staging environment and validated.” Your pipeline should do this automatically.
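
For example, a small post-deploy smoke check the pipeline could run against staging; the /health endpoint, environment variable, and URL are illustrative, and global fetch assumes Node 18+:

// Illustrative smoke check: fail the pipeline if staging isn't healthy after a deploy.
async function smokeCheck(baseUrl: string): Promise<void> {
  const response = await fetch(`${baseUrl}/health`);
  if (!response.ok) {
    throw new Error(`Smoke check failed with status ${response.status}`);
  }
}

smokeCheck(process.env.STAGING_URL ?? 'https://staging.example.com')
  .then(() => console.log('Staging smoke check passed'))
  .catch((error) => {
    console.error(error);
    process.exit(1);
  });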

Pitfall 9: Treating Trunk as Unstable

The mistake: “Trunk is where we experiment. Stable code goes in release branches.”

Why it fails: If trunk can’t be released at any time, you don’t have CI. You’ve just moved your integration problems to a different branch.

What to do instead: Trunk must always be production-ready. Use feature flags for incomplete work. Fix broken builds immediately.

Pitfall 10: Forgetting That TBD is a Means, Not an End

The mistake: Optimizing for trunk commits without improving cycle time, quality, or delivery speed.

Why it fails: TBD is valuable because it enables fast feedback and low-cost changes. If those aren’t improving, TBD isn’t working.

What to do instead: Measure outcomes, not activities. Track cycle time, defect rates, deployment frequency, and time to restore service.


When to Pause or Pivot

Sometimes TBD migration stalls or causes more problems than it solves. Here’s how to tell if you need to pause and what to do about it.

Signs You’re Not Ready Yet

Red flag 1: Your test suite takes hours to run

If developers can’t get feedback in minutes, they can’t integrate frequently. Forcing TBD now will just slow everyone down.

What to do: Pause the TBD migration. Invest 2-4 weeks in making tests faster. Parallelize test execution. Remove or optimize the slowest tests. Resume TBD when feedback takes less than 10 minutes.

Red flag 2: More than half your tests are flaky

If tests fail randomly, developers will ignore failures. You’ll integrate broken code without realizing it.

What to do: Stop adding new features. Spend one sprint fixing or deleting flaky tests. Track flakiness metrics. Only resume TBD when you trust your test results.

Red flag 3: Production incidents increased significantly

If TBD caused a spike in production issues, something is wrong with your safety net.

What to do: Revert to short-lived branches (48-72 hours) temporarily. Analyze what’s escaping to production. Add tests or checks to catch those issues. Resume direct-to-trunk when the safety net is stronger.

Red flag 4: The team is in constant conflict

If people are fighting about the process, frustrated daily, or actively working around it, you’ve lost the team.

What to do: Hold a retrospective. Listen to concerns without defending TBD. Identify the top 3 pain points. Address those first. Resume TBD migration when the team agrees to try again.

Signs You’re Doing It Wrong (But Can Fix It)

Yellow flag 1: Daily commits, but monthly integration

You’re committing to trunk, but your code doesn’t connect to the rest of the system until the end.

What to fix: Focus on interface-level integration. Ensure your tests exercise boundaries between components. Use contract tests.

Yellow flag 2: Trunk is broken often

If trunk is red more than 5% of the time, something’s wrong with your testing or commit discipline.

What to fix: Make “fix trunk immediately” the top priority. Consider requiring local tests to pass before pushing. Add pre-commit hooks if needed.

Yellow flag 3: Feature flags piling up

If you have more than 5 active flags, you’re not cleaning up after yourself.

What to fix: Set a team rule: “For every new flag created, remove an old one.” Dedicate time each sprint to flag cleanup.

How to Pause Gracefully

If you need to pause:

  1. Communicate clearly: “We’re pausing TBD migration for two weeks to fix our test infrastructure. This isn’t abandoning the goal.”

  2. Set a specific resumption date: Don’t let “pause” become “quit.” Schedule a date to revisit.

  3. Fix the blockers: Use the pause to address the specific problems preventing success.

  4. Retrospect and adjust: When you resume, what will you do differently?

Pausing isn’t failure. Pausing to fix the foundation is smart.


What “Good” Looks Like

You know TBD is working when:

  • Branches live for hours, not days
  • Developers collaborate early instead of merging late
  • Product participates in defining behaviors, not just writing stories
  • Tests run fast enough to integrate frequently
  • Deployments are boring
  • You can fix production issues with the same process you use for normal work

When your deployment process enables emergency fixes without special exceptions, you’ve reached the real payoff: lower cost of change, which makes everything else faster, safer, and more sustainable.


Concrete Examples and Scenarios

Theory is useful. Examples make it real. Here are practical scenarios showing how to apply TBD principles.

Scenario 1: Breaking Down a Large Feature

Problem: You need to build a user notification system with email, SMS, and in-app notifications. Estimated: 3 weeks of work.

Old approach (GitFlow): Create a feature/notifications branch. Work for three weeks. Submit a massive pull request. Spend days in code review and merge conflicts.

TBD approach:

Week 1:

  • Day 1: Define notification interface, commit to trunk

    // notifications/NotificationService.ts
    interface NotificationService {
      send(userId: string, message: NotificationMessage): Promise<void>;
    }
    
    interface NotificationMessage {
      title: string;
      body: string;
      priority: 'low' | 'normal' | 'high';
    }
    

    This compiles but doesn’t do anything yet. That’s fine.

  • Day 2: Add in-memory implementation for testing

    class InMemoryNotificationService implements NotificationService {
      // Keyed by userId so tests can assert what each user received
      private sent = new Map<string, NotificationMessage[]>();
    
      async send(userId: string, message: NotificationMessage): Promise<void> {
        this.sent.set(userId, [...(this.sent.get(userId) ?? []), message]);
      }
    }
    

    Now other teams can use the interface in their code and tests.

  • Day 3-5: Implement email notifications behind a feature flag

    class EmailNotificationService implements NotificationService {
      async send(userId: string, message: NotificationMessage) {
        if (!features.emailNotifications) {
          return; // No-op when disabled
        }
        // Real implementation
      }
    }
    

    Commit daily. Deploy. Flag is off in production.
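
The flag checks in these examples assume a features object. A minimal sketch of one, backed by environment variables (flag names and wiring are illustrative; many teams use a flag service instead):

// Minimal flag lookup; illustrative only.
const features = {
  get emailNotifications(): boolean {
    return process.env.FEATURE_EMAIL_NOTIFICATIONS === 'true';
  },
  get smsNotifications(): boolean {
    return process.env.FEATURE_SMS_NOTIFICATIONS === 'true';
  },
};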

Week 2:

  • Add SMS notifications (same pattern: interface, implementation, feature flag)
  • Enable email notifications for internal users only
  • Iterate based on feedback

Week 3:

  • Add in-app notifications
  • Roll out email and SMS to all users
  • Remove flags for email once stable

Result: Integrated 12-15 times instead of once. Each integration was small and low-risk.

Scenario 2: Database Schema Change

Problem: You need to split the users.name column into first_name and last_name.

Old approach: Update schema, update all code, deploy everything at once. Hope nothing breaks.

TBD approach (expand-contract pattern):

Step 1: Expand (Day 1)

Add new columns without removing the old one:

ALTER TABLE users ADD COLUMN first_name VARCHAR(255);
ALTER TABLE users ADD COLUMN last_name VARCHAR(255);

Commit and deploy. Application still uses name column. No breaking change.

Step 2: Dual writes (Day 2-3)

Update write path to populate both old and new columns:

async function createUser(name) {
  const [firstName, lastName] = name.split(' ');
  await db.query(
    'INSERT INTO users (name, first_name, last_name) VALUES (?, ?, ?)',
    [name, firstName, lastName]
  );
}

Commit and deploy. Now new data populates both formats.

Step 3: Backfill (Day 4)

Migrate existing data in the background:

async function backfillNames() {
  const users = await db.query('SELECT id, name FROM users WHERE first_name IS NULL');
  for (const user of users) {
    const [firstName, lastName] = user.name.split(' ');
    await db.query(
      'UPDATE users SET first_name = ?, last_name = ? WHERE id = ?',
      [firstName, lastName, user.id]
    );
  }
}

Run this as a background job. Commit and deploy.

Step 4: Read from new columns (Day 5)

Update read path behind a feature flag:

async function getUser(id) {
  const user = await db.query('SELECT * FROM users WHERE id = ?', [id]);
  if (features.useNewNameColumns) {
    return {
      firstName: user.first_name,
      lastName: user.last_name,
    };
  }
  return { name: user.name };
}

Deploy and gradually enable the flag.
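
Gradual enablement can be as simple as a percentage rollout keyed on a stable user attribute, so each user keeps the same flag state while the percentage rises. The hash and threshold here are illustrative:

// Illustrative percentage rollout: a stable hash of the user id picks a bucket from 0-99.
function bucketFor(userId: string): number {
  let hash = 0;
  for (const char of userId) {
    hash = (hash * 31 + char.charCodeAt(0)) % 100;
  }
  return hash;
}

function useNewNameColumns(userId: string, rolloutPercent: number): boolean {
  return bucketFor(userId) < rolloutPercent;
}

// Start small (for example 5%), watch error rates, then raise toward 100%.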

Step 5: Contract (Week 2)

Once all reads use the new columns and the flag is removed:

ALTER TABLE users DROP COLUMN name;

Result: Five deployments instead of one big-bang change. Each step was reversible. Zero downtime.

Scenario 3: Refactoring Without Breaking the World

Problem: Your authentication code is a mess. You want to refactor it without breaking production.

TBD approach:

Day 1: Characterization tests

Write tests that capture current behavior (warts and all):

describe('Current auth behavior', () => {
  it('accepts password with special characters', () => {
    // Document what currently happens
  });

  it('handles malformed tokens by returning 401', () => {
    // Capture edge case behavior
  });
});

These tests document how the system actually works. Commit.

Day 2-3: Strangler fig pattern

Create a new implementation alongside the old one:

class LegacyAuthService {
  // Existing messy code (don't touch it)
}

class ModernAuthService {
  // Clean implementation
}

class AuthServiceRouter {
  constructor(private legacy: LegacyAuthService, private modern: ModernAuthService) {}

  async authenticate(credentials) {
    if (features.modernAuth) {
      return this.modern.authenticate(credentials);
    }
    return this.legacy.authenticate(credentials);
  }
}

Commit with flag off. Old behavior unchanged.

Day 4-7: Migrate piece by piece

Enable modern auth for one endpoint at a time:

if (features.modernAuth && endpoint === '/api/users') {
  return modernAuth.authenticate(credentials);
}

Commit daily. Monitor each endpoint.

Week 2: Remove old code

Once all endpoints use modern auth and it’s been stable for a week:

class AuthService {
  async authenticate(credentials) {
    // Just the modern implementation
  }
}

Delete the legacy code entirely.

Result: Continuous refactoring without a “big rewrite” branch. Production was never at risk.

Scenario 4: Working with External API Changes

Problem: A third-party API you depend on is changing their response format next month.

TBD approach:

Week 1: Adapter pattern

Create an adapter that normalizes both old and new formats:

class PaymentAPIAdapter {
  async getPaymentStatus(orderId) {
    const response = await fetch(`https://api.payments.com/orders/${orderId}`);
    const data = await response.json();

    // Handle both old and new format
    if (data.payment_status) {
      // Old format
      return {
        status: data.payment_status,
        amount: data.total_amount,
      };
    } else {
      // New format
      return {
        status: data.status.payment,
        amount: data.amounts.total,
      };
    }
  }
}

Commit. Your code now works with both formats.

Week 2-3: Wait for the third-party API to migrate. Your code keeps working.

Week 4 (after API migration): Simplify adapter to only handle new format:

async getPaymentStatus(orderId) {
  const response = await fetch(`https://api.payments.com/orders/${orderId}`);
  const data = await response.json();
  return {
    status: data.status.payment,
    amount: data.amounts.total,
  };
}

Result: No coupling between your deployment schedule and the external API migration. Zero downtime.


References and Further Reading

Testing Practices

Test-Driven Development:

  • “Test-Driven Development: By Example” by Kent Beck - TDD fundamentals
  • “Growing Object-Oriented Software, Guided by Tests” by Steve Freeman and Nat Pryce - TDD at scale

Patterns for Incremental Change

Legacy Code:

  • “Working Effectively with Legacy Code” by Michael Feathers - Characterization tests and strangler patterns
  • Strangler Fig Application - Incremental rewrites

Team Dynamics and Change Management

  • “Accelerate” by Nicole Forsgren, Jez Humble, and Gene Kim - Data on what drives software delivery performance
  • “Team Topologies” by Matthew Skelton and Manuel Pais - Organizing teams for fast flow
  • State of DevOps Reports - Annual research on delivery practices


Final Thought

Migrating from GitFlow to TBD isn’t a matter of changing your branching strategy. It’s a matter of changing your thinking.

Stop optimizing for isolation.
Start optimizing for feedback.

Small, tested, integrated changes — delivered continuously — will always outperform big batches delivered occasionally.

That’s why teams migrate to TBD. Not because it’s trendy, but because it’s the only path to real continuous integration and continuous delivery.