Description of the Minimums
- 1: Continuous Integration
- 1.1: Evolutionary Coding Practices
- 1.2: Testing Strategies
- 1.3: Pipeline Visibility & Health Metrics
- 1.4: All Feature Work Stops When the Build Is Red
- 2: Only Path to Any Environment
- 3: Deterministic Pipeline
- 4: Definition of Deployable
- 5: Immutable Artifact
- 6: Prod-Like Test Environment
- 7: Rollback On-demand
- 8: Application Configuration
- 9: Trunk Based Development
1 - Continuous Integration
Definition
Continuous Integration (CI) is the activity of each developer integrating work to the trunk of version control at least daily and verifying that the work is, to the best of our knowledge, releasable.
CI is not just about tooling—it’s fundamentally about team workflow and working agreements.
The minimum activities required for CI
- Trunk-based development - all work integrates to trunk
- Work integrates to trunk at a minimum daily (each developer, every day)
- Work has automated testing before merge to trunk
- Work is tested with other work automatically on merge
- All feature work stops when the build is red
- New work does not break delivered work
Why This Matters
Without CI, Teams Experience
- Integration hell: Weeks or months of painful merge conflicts
- Late defect detection: Bugs found after they’re expensive to fix
- Reduced collaboration: Developers work in isolation, losing context
- Deployment fear: Large batches of untested changes create risk
- Slower delivery: Time wasted on merge conflicts and rework
- Quality erosion: Without rapid feedback, technical debt accumulates
With CI, Teams Achieve
- Rapid feedback: Know within minutes if changes broke something
- Smaller changes: Daily integration forces better work breakdown
- Better collaboration: Team shares ownership of the codebase
- Lower risk: Small, tested changes are easier to diagnose and fix
- Faster delivery: No integration delays blocking deployment
- Higher quality: Continuous testing catches issues early
Team Working Agreements
While CI depends on tooling, the team workflow and working agreement are more important:
- Define testable work: Work includes testable acceptance criteria that drive testing efforts
- Tests accompany commits: No work committed to version control without required tests
- Incremental progress: Committed work may not be “feature complete”, but must not break existing work
- Trunk-based workflow: All work begins from trunk and integrates to trunk at least daily
- Stop-the-line: If CI detects an error, the team stops feature work and collaborates to fix the build immediately
The stop-the-line practice is critical for maintaining an always-releasable trunk. For detailed guidance on implementing this discipline, see All Feature Work Stops When the Build Is Red.
Example Implementations
Anti-Pattern: Feature Branch Workflow Without CI
Problems
- Long-lived branches accumulate merge conflicts
- Integration issues discovered late
- No early feedback on compatibility
- Large batches of untested changes
- Team blocked while resolving conflicts
Good Pattern: Continuous Integration to Trunk
Benefits
- Changes tested within minutes
- Team gets immediate feedback
- Small changes are easy to debug
- Integration is never a surprise
- Quality maintained continuously
Evolutionary Coding Practices
To integrate code daily while building large features, use patterns like branch by abstraction, feature flags, and connect-last. These techniques allow you to break down large changes into small, safe commits that integrate to trunk daily without breaking existing functionality.
For detailed guidance and code examples, see Evolutionary Coding Practices.
Testing in CI
A comprehensive testing strategy balances fast feedback with thorough validation. Run different test types at different stages of the pipeline:
- Pre-merge tests (< 10 minutes): Unit tests, linting, static security scans, dependency audits
- Post-merge tests (< 30 minutes): All pre-merge tests plus integration tests, functional tests, performance tests (validate response time and throughput requirements), and dynamic security tests
- Deployment tests: End-to-end and smoke tests belong in the deployment pipeline, not CI
For detailed guidance on test strategy, the test pyramid, deterministic testing, and test quality, see Testing Strategies.
What is Improved
Teamwork
CI requires strong teamwork to function correctly. Key improvements:
- Pull workflow: Team picks next important work instead of working from assignments
- Code review cadence: Quick reviews (< 4 hours) keep work flowing
- Pair programming: Real-time collaboration eliminates review delays
- Shared ownership: Everyone maintains the codebase together
- Team goals over individual tasks: Focus shifts from “my work” to “our progress”
Anti-pattern: “Push” workflow where work is assigned creates silos and delays.
Work Breakdown
CI forces better work decomposition:
- Definition of Ready: Every story has testable acceptance criteria before work starts
- Small batches: If the team can complete work in < 2 days, it’s refined enough
- Vertical slicing: Each change delivers a thin, tested slice of functionality
- Incremental delivery: Features built incrementally, each step integrated daily
See Work Breakdown for detailed guidance.
Testing
CI requires a shift in testing approach:
From: Writing tests after code is “complete” → To: Writing tests before/during coding (TDD/BDD)
From: Testing implementation details → To: Testing behavior and outcomes
From: Manual testing before deployment → To: Automated testing on every commit
From: Separate QA phase → To: Quality built into development
CI teams build a comprehensive test suite with the goal of detecting issues as close to creation as possible. See Behavior-Driven Development.
Common Challenges
“What are the main problems to overcome?”
- Poor teamwork: Usually driven by assigning work instead of using a pull system
- Lack of testable acceptance criteria: Made worse by individual assignments instead of team goals. BDD provides declarative functional tests everyone understands
- Lack of evolutionary coding knowledge: “I can’t commit until the feature is complete!” Use branch by abstraction, feature flags, or plan changes so the last change integrates the feature
“How do I complete a large feature in less than a day?”
You probably don’t complete it in a day, but you integrate progress every day. See Evolutionary Coding Practices for detailed patterns and code examples.
“What code coverage level is needed before we can do CI?”
You don’t need tests in existing code to begin CI. You need to test new code without exception.
Starting point: “We will not go lower than the current level of code coverage.”
“What code coverage percentage should we have?”
The answer we want is “I’m confident.” Are you confident you’ve covered enough positive and negative cases?
Better question: “Do we trust our tests?” Test coverage percentage doesn’t indicate test quality.
“Should we set a code coverage standard for all teams?”
No. Code coverage mandates incentivize meaningless tests that hide the fact that code is not tested.
It is better to have no tests than to have tests you do not trust.
Instead: Focus on test quality, behavior coverage, and team discipline. See Code Coverage for detailed guidance.
Monitoring CI Health
Track these key metrics to understand CI effectiveness and drive improvement:
- Commits per day per developer: ≥ 1 (team average)—indicates integration discipline
- Development cycle time: < 2 days average—shows effective work breakdown
- Build success rate: > 95%—reflects pre-merge testing quality
- Time to fix broken build: < 1 hour—demonstrates stop-the-line commitment
- Defect rate: Stable or decreasing—ensures speed doesn’t sacrifice quality
Make pipeline status visible to everyone through dashboards, notifications, and build radiators. Visibility drives faster response, shared accountability, and continuous improvement.
For detailed guidance on metrics, dashboards, and using data for improvement, see Pipeline Visibility & Health Metrics.
Additional Resources
- Continuous Integration on Martin Fowler’s site
- Accelerate: Technical Practices - Nicole Forsgren, Jez Humble, Gene Kim
- The Practical Test Pyramid - Martin Fowler
- Branch By Abstraction
- Feature Toggles - Martin Fowler
- Behavior-Driven Development - DevOps Dojo Consortium
1.1 - Evolutionary Coding Practices
A core skill needed for CI is the ability to make code changes that are not complete features and integrate them to the trunk without breaking existing behaviors. We never make big-bang changes. We make small changes that limit our risk. These are some of the most common methods.
Branch by Abstraction
Gradually replace existing behavior while continuously integrating:
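A minimal TypeScript sketch of the pattern, assuming a hypothetical payment provider replacement; the interface, class names, and environment toggle are illustrative, not part of the original guidance.

```typescript
// Commit 1: introduce an abstraction over the behavior being replaced.
interface PaymentGateway {
  charge(accountId: string, amountCents: number): Promise<string>; // returns a transaction id
}

// The existing implementation moves behind the abstraction, unchanged.
class LegacyPaymentClient implements PaymentGateway {
  async charge(accountId: string, amountCents: number): Promise<string> {
    return `legacy-${accountId}-${amountCents}`;
  }
}

// Later commits: build the replacement behind the same abstraction, fully tested but unused.
class NewPaymentClient implements PaymentGateway {
  async charge(accountId: string, amountCents: number): Promise<string> {
    return `new-${accountId}-${amountCents}`;
  }
}

// Final commits: switch the wiring, then delete the legacy class (and the abstraction if no longer needed).
const useNewProvider = process.env.PAYMENT_PROVIDER === 'new';
export const paymentGateway: PaymentGateway = useNewProvider
  ? new NewPaymentClient()
  : new LegacyPaymentClient();
```

Each commit along the way integrates to trunk and keeps existing behavior intact.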
Feature Flags
Feature flags control feature visibility without blocking integration. However, they’re often overused—many scenarios have better alternatives.
When to use feature flags
- Large or high-risk changes needing gradual rollout
- Testing in production before full release (dark launch, beta testing)
- A/B testing and experimentation
- Customer-specific behavior or toggles
- Cross-team coordination requiring independent deployment
When NOT to use feature flags
- New features that can connect to tests only, integrate in final commit
- Behavior changes (use branch by abstraction instead)
- New API routes (build route, expose as last change)
- Bug fixes or hotfixes (deploy immediately)
- Simple changes (standard deployment sufficient)
Example usage
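A minimal sketch of flag-guarded code, assuming a hypothetical flag lookup backed by an environment variable; a real flag service would evaluate targeting rules and support runtime changes.

```typescript
// Hypothetical flag lookup; in practice this would call your feature flag provider.
function isEnabled(flag: string): boolean {
  return process.env[`FLAG_${flag.toUpperCase().replace(/-/g, '_')}`] === 'true';
}

function renderLegacyCheckout(): string {
  return 'legacy checkout'; // existing behavior remains the default
}

function renderNewCheckout(): string {
  return 'new checkout'; // new code path, integrated to trunk but hidden until the flag is on
}

export function renderCheckout(): string {
  return isEnabled('new-checkout-flow') ? renderNewCheckout() : renderLegacyCheckout();
}
```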
For detailed decision guidance and implementation approaches, see Feature Flags.
Connect Last
Build complete features, connect them in final commit:
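A minimal sketch, assuming a hypothetical usage-report feature: the module is built and tested over several daily commits, and only the final commit wires it into the existing menu.

```typescript
// Commits 1..n: build and test the new module. Nothing references it yet,
// so integrating it daily cannot break existing behavior.
export function buildUsageReport(events: { userId: string }[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const event of events) {
    counts.set(event.userId, (counts.get(event.userId) ?? 0) + 1);
  }
  return counts;
}

// Final commit: "connect" the feature by exposing it from existing navigation.
// Until this change, the feature is fully built, tested, and invisible to users.
export const menuItems = [
  { label: 'Dashboard', action: 'dashboard' },
  { label: 'Usage report', action: 'usage-report' }, // the connecting change
];
```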
For detailed guidance on when to use each pattern, see Feature Flags.
Why These Patterns Matter
These evolutionary coding practices enable teams to:
- Integrate daily: Break large features into small, safe changes
- Reduce risk: Each commit is tested and releasable
- Maintain flow: No waiting for features to complete before integrating
- Improve collaboration: Team shares ownership of evolving code
- Enable rollback: Easy to revert small changes if needed
Common Questions
“How do I complete a large feature in less than a day?”
You probably don’t complete it in a day, but you integrate progress every day using these patterns. Each daily commit is tested, working, and doesn’t break existing functionality.
“Which pattern should I use?”
- Connect Last: Best for new features that don’t affect existing code
- Branch by Abstraction: Best for replacing or modifying existing behavior
- Feature Flags: Best for gradual rollout, testing in production, or customer-specific features
“Don’t these patterns add complexity?”
Temporarily, yes. But this complexity is:
- Intentional: You control when and how it’s introduced
- Temporary: Removed once the transition is complete
- Safer: Than long-lived branches with merge conflicts
- Testable: Each step can be verified independently
Additional Resources
- Branch By Abstraction
- Feature Toggles - Martin Fowler
- Feature Flags - Detailed implementation guidance
1.2 - Testing Strategies
A comprehensive testing strategy is essential for continuous integration. The key is balancing fast feedback with thorough validation by running different test types at different stages of the pipeline.
Pre-Merge Testing (Fast Feedback)
Tests that run before code merges to trunk should provide rapid feedback to developers. The goal is to catch obvious issues quickly without blocking the integration workflow.
What to Run
- Static analysis: Type checkers, linters, security scans
- Unit tests: Fast tests (preferably sociable unit tests with real in-process dependencies)
- Dependency audits: Known vulnerabilities in dependencies
Performance Goal
Complete in < 10 minutes
Why Speed Matters
Pre-merge tests create a feedback loop for developers. If these tests take too long, developers context-switch while waiting, multiple developers queue up, and the team slows down integration frequency.
Keep pre-merge tests focused on fast, deterministic checks that catch the most common issues.
Post-Merge Testing (Comprehensive Validation)
After code merges to trunk, run the complete test suite to validate the integrated system.
What to Run
- All pre-merge tests: Re-run for final validation
- Integration tests: Test component interactions with real dependencies
- Functional tests: Test user-facing behavior
- Performance tests: Validate response time and throughput requirements
- Dynamic security tests: Security analysis of running application
Performance Goal
Complete in < 30 minutes
Why Re-run Pre-merge Tests?
Pre-merge tests validate individual changes in isolation. Post-merge tests validate that the merge itself didn’t introduce issues:
- Merge conflict resolutions may have introduced bugs
- Timing-dependent interactions between simultaneous merges
- Dependencies between changes merged around the same time
- Environment differences between local and CI
Running the full suite after merge provides a final safety check.
What About Deployment Testing?
Tests that require deployment to an environment (end-to-end tests, smoke tests) belong in the deployment pipeline, not in CI.
Why Separate Deployment Testing
- CI validates code integration
- Deployment pipeline validates releasability
- Different performance requirements
- Different failure modes and remediation
Mixing these concerns leads to slow CI pipelines that discourage frequent integration.
The Testing Trophy
The testing trophy model emphasizes sociable unit tests (testing units with their real collaborators) as the foundation of your test suite.
        /  E2E  \          End-to-end tests
    /  Integration  \      ← Most tests here (80%)
        \  Unit  /         Supporting layer
     [ Static Analysis ]   Foundation
Test Distribution
Static analysis (Foundation): Type checkers, linters, security scanners—catch errors before running code.
Solitary unit tests (Supporting—minimize these): Pure functions with no dependencies. Use sparingly.
Sociable unit tests / Integration tests (The bulk—80%): Test units with their real collaborators. This is where most of your tests should be.
E2E tests (Critical paths only): Complete user journeys. Use sparingly due to cost and brittleness.
Sociable vs Solitary Unit Tests
Terminology note: What the testing trophy calls “integration tests” are more precisely sociable unit tests in Martin Fowler’s Practical Test Pyramid.
- Solitary unit tests: Test a unit in complete isolation with all dependencies mocked
- Sociable unit tests (recommended): Test a unit with its real collaborators and dependencies within the component under test while avoiding network boundaries.
Prioritize sociable unit tests over solitary unit tests because they:
- Catch real bugs in how components interact
- Are less brittle (don’t break during refactoring)
- Test actual behavior rather than implementation details
- Provide higher confidence without significant speed penalty
For detailed examples and guidance, see:
- Write tests. Not too many. Mostly integration. - Kent C. Dodds
- The Testing Trophy and Testing Classifications - Kent C. Dodds
- The Practical Test Pyramid - Martin Fowler
Test at the Right Level
Decision Tree
- Is it pure logic with no dependencies? → Solitary unit test
- Does it have collaborators/dependencies? → Sociable unit test / Integration test (most code!)
- Does it cross system boundaries or require full deployment? → E2E test (sparingly)
Key Principle
Default to sociable unit tests (with real dependencies) over solitary unit tests (with mocks).
When in Doubt
Choose sociable unit test. It will catch more real bugs than a solitary unit test with mocks.
Deterministic Testing
All tests must be deterministic—producing the same result every time they run. Flaky tests destroy trust in the pipeline.
Common Causes of Flaky Tests
- Race conditions and timing issues
- Shared state between tests
- External dependencies (networks, databases)
- Non-deterministic inputs (random data, current time)
- Environmental differences
Solutions
- Mock external dependencies you don’t control
- Clean up test data after each test
- Control time and randomness in tests (see the sketch after this list)
- Isolate test execution
- Fix or remove flaky tests immediately
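One way to control time in a test, shown as a minimal sketch assuming Jest's modern fake timers; the isExpired function is a hypothetical example under test.

```typescript
// Hypothetical function under test: an invite expires 7 days after it was created.
function isExpired(createdAt: Date, now: Date = new Date()): boolean {
  const sevenDaysMs = 7 * 24 * 60 * 60 * 1000;
  return now.getTime() - createdAt.getTime() > sevenDaysMs;
}

describe('isExpired', () => {
  beforeEach(() => {
    // Pin the clock so the result does not depend on when the test runs.
    jest.useFakeTimers();
    jest.setSystemTime(new Date('2024-01-15T00:00:00Z'));
  });

  afterEach(() => {
    jest.useRealTimers();
  });

  it('treats invites older than 7 days as expired', () => {
    expect(isExpired(new Date('2024-01-01T00:00:00Z'))).toBe(true); // 14 days old
    expect(isExpired(new Date('2024-01-14T00:00:00Z'))).toBe(false); // 1 day old
  });
});
```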
For detailed guidance, see Deterministic Tests.
Test Quality Over Coverage
Test coverage percentage doesn’t indicate test quality.
Better questions than “What’s our coverage percentage?”:
- Do we trust our tests?
- Are we confident we’ve covered positive and negative cases?
- Do tests document expected behavior?
- Would tests catch regressions in critical paths?
Coverage Mandates Are Harmful
Setting organization-wide coverage standards incentivizes meaningless tests that hide the fact that code isn’t properly tested.
It is better to have no tests than to have tests you do not trust.
Instead of mandates:
- Focus on test quality and behavior coverage
- Build team discipline around testing
- Review tests as carefully as production code
- Make testing part of the definition of done
For detailed guidance, see Code Coverage.
Practical Recommendations for CI
Building Your Test Suite
- Start with static analysis: Type checkers, linters—catch errors before running code
- Write sociable unit tests as default: Test with real dependencies (databases, state, etc.)
- Add solitary unit tests sparingly: Only for pure functions with complex logic
- Add E2E tests strategically: Critical user journeys and revenue paths only
- Avoid excessive mocking: Mock only external services you don’t control
For CI Effectiveness
- Run static analysis first: Instant feedback, zero runtime cost
- Run fast tests pre-merge: Use in-memory databases, parallel execution
- Run comprehensive tests post-merge: More realistic setup, longer running tests
- Run E2E tests post-merge: Keep them out of the critical path
- Set time budgets: Pre-merge < 10 min, post-merge < 30 min
- Quarantine flaky tests: Fix or remove them immediately
For Test Quality
- Test behavior from user’s perspective: Not implementation details
- Use real dependencies: Catch real integration bugs
- One scenario per test: Makes failures obvious and debugging fast
- Descriptive test names: Should explain what behavior is being verified
- Independent tests: No shared state, can run in any order
Testing Anti-Patterns to Avoid
- Don’t mock everything: Solitary unit tests with extensive mocking are brittle
- Don’t test implementation details: Tests that break during refactoring provide no value
- Don’t write E2E for everything: Too slow, too brittle—use sociable unit tests instead
- Don’t skip sociable unit tests: This is where the bugs hide
- Don’t ignore flaky tests: They destroy trust in your pipeline
Starting Without Full Coverage
You don’t need tests in existing code to begin CI. You need to test new code without exception.
Starting point: “We will not go lower than the current level of code coverage.”
This approach:
- Allows teams to start CI immediately
- Prevents technical debt from growing
- Builds testing discipline incrementally
- Improves coverage over time
As you work in existing code:
- Add tests for code you modify
- Test new features completely
- Gradually improve coverage in active areas
- Don’t mandate retrofitting tests to untouched code
Additional Resources
Testing Strategies
- Write tests. Not too many. Mostly integration. - Kent C. Dodds (Testing Trophy)
- The Testing Trophy and Testing Classifications - Kent C. Dodds
- Static vs Unit vs Integration vs E2E Testing - Kent C. Dodds
- The Practical Test Pyramid - Martin Fowler
- Testing Strategies for Microservices - Martin Fowler (for distributed systems and service-oriented architectures)
Testing Practices
- Behavior-Driven Development - DevOps Dojo Consortium
- Deterministic Tests
- Code Coverage - DevOps Dojo Consortium
1.3 - Pipeline Visibility & Health Metrics
CI pipeline visibility ensures the entire team can see the health of the integration process and respond quickly to issues. Combined with the right metrics, visibility drives continuous improvement.
Why Visibility Matters
When pipeline status is visible to everyone:
- Faster response: Team sees failures immediately
- Shared accountability: Everyone owns the build
- Better collaboration: Team coordinates on fixes
- Continuous improvement: Metrics highlight bottlenecks
- Quality culture: Green builds become a team priority
Making the Pipeline Visible
Real-Time Status Display
Make build status impossible to ignore:
- Build radiators: Large displays showing current status
- Team dashboards: Shared screens with pipeline health
- Status indicators: Visual signals (traffic lights, etc.)
- Browser extensions: Build status in developer tools
- Desktop notifications: Alerts when builds break
The key is making status ambient—visible without requiring effort to check.
Notification Systems
Automated notifications ensure the team knows when action is needed:
When to notify
- Build failures on trunk
- Flaky test detection
- Long-running builds
- Security vulnerabilities found
- Quality gate failures
How to notify
- Team chat channels (Slack, Teams)
- Email for critical failures
- SMS/phone for extended outages
- Dashboard alerts
- Version control integrations
Notification best practices
- Notify the whole team, not individuals
- Include failure details and logs
- Link directly to failed builds
- Suggest next actions
- Avoid notification fatigue with smart filtering
CI Health Metrics
Track these metrics to understand and improve CI effectiveness:
Commits per Day per Developer
What: How frequently the team integrates code to trunk
How to measure: Total commits to trunk ÷ number of developers ÷ days
Good: ≥ 1 commit per developer per day (team average)
Why it matters:
- Indicates true CI practice adoption
- Shows work breakdown effectiveness
- Reveals integration discipline
- Predicts integration conflict frequency
Important: Never compare individuals—this is a team metric. Use it to understand team behavior, not to rank developers.
If the number is low
- Work is too large to integrate daily
- Team needs better work decomposition
- Fear of breaking the build
- Missing evolutionary coding skills
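A minimal sketch of the formula above (total trunk commits ÷ developers ÷ working days); the numbers are illustrative.

```typescript
// Team-level metric only: never compute or compare this per individual.
function commitsPerDevPerDay(trunkCommitCount: number, developerCount: number, workingDays: number): number {
  if (developerCount === 0 || workingDays === 0) return 0;
  return trunkCommitCount / developerCount / workingDays;
}

// Example: 45 trunk commits from a team of 5 over a 10-day iteration → 0.9, just below the ≥ 1 target.
console.log(commitsPerDevPerDay(45, 5, 10));
```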
Development Cycle Time
What: Time from when work begins to completion (merged to trunk)
How to measure: Time from first commit on branch to merge to trunk
Good: < 2 days on average
Why it matters:
- Indicates effective work breakdown
- Shows CI practice maturity
- Predicts batch size and risk
- Correlates with deployment frequency
If cycle time is high
- Stories are too large
- Rework due to late feedback
- Waiting for code reviews
- Complex approval processes
- Poor work decomposition
Build Success Rate
What: Percentage of trunk builds that pass all tests
How to measure: (Successful builds ÷ total builds) × 100
Good: > 95%
Why it matters:
- Indicates pre-merge testing quality
- Shows team discipline
- Predicts trunk stability
- Reflects testing effectiveness
If success rate is low
- Pre-merge tests insufficient
- Team not running tests locally
- Flaky tests creating false failures
- Missing stop-the-line discipline
Time to Fix Broken Build
What: How quickly the team resolves build failures on trunk
How to measure: Time from build failure to successful build
Good: < 1 hour
Why it matters:
- Shows team commitment to CI
- Indicates stop-the-line practice
- Reflects debugging capability
- Predicts integration delays
If fix time is high
- Team continues feature work during failures
- Difficult to diagnose failures
- Complex, slow build process
- Lack of build ownership
- Poor error messages in tests
Defect Rate
What: Critical guardrail metric to ensure speed doesn’t sacrifice quality
How to measure: Defects found per unit of time or per deployment
Good: Stable or decreasing as CI improves
Why it matters:
- Quality validation
- Prevents speed over quality
- Shows testing effectiveness
- Builds stakeholder confidence
If defect rate increases
- Tests don’t cover critical paths
- Team skipping testing discipline
- Poor test quality (coverage without value)
- Speed prioritized over quality
- Missing acceptance criteria
Dashboard Design
Effective CI dashboards show the right information at the right time:
Essential Information
Current status
- Trunk build status (green/red)
- Currently running builds
- Recent commit activity
- Failed test names
Trends over time
- Commit frequency
- Build success rate
- Average fix time
- Cycle time trends
Team health
- Number of active branches
- Age of oldest branch
- Flaky test count
- Test execution time
Dashboard Anti-Patterns
Avoid
- Individual developer comparisons
- Vanity metrics (total commits, lines of code)
- Too much detail (cognitive overload)
- Metrics without context
- Stale data (not real-time)
Using Metrics for Improvement
Metrics are tools for learning, not weapons for management.
Good Uses
- Team retrospectives on CI effectiveness
- Identifying bottlenecks in the process
- Validating improvements (A/B comparisons)
- Celebrating progress and wins
- Guiding focus for improvement efforts
Bad Uses
- Individual performance reviews
- Team comparisons or rankings
- Setting arbitrary targets without context
- Gaming metrics to look good
- Punishing teams for honest reporting
Improvement Cycle
- Measure current state: Establish baseline metrics
- Identify bottleneck: What’s the biggest constraint?
- Hypothesize improvement: What change might help?
- Experiment: Try the change for a sprint
- Measure impact: Did metrics improve?
- Standardize or iterate: Keep or adjust the change
Common Visibility Challenges
“Metrics Can Be Gamed”
Yes, any metric can be gamed. The solution isn’t to avoid metrics—it’s to:
- Use metrics for learning, not punishment
- Track multiple metrics (gaming one reveals problems in others)
- Focus on outcomes (quality, speed) not just outputs (commits)
- Build a culture of honesty and improvement
“Too Many Notifications Create Noise”
True. Combat notification fatigue:
- Only notify on trunk failures (not branch builds)
- Aggregate related failures
- Auto-resolve when fixed
- Use severity levels
- Allow custom notification preferences
“Dashboards Become Wallpaper”
Dashboards lose impact when ignored. Keep them relevant:
- Update regularly with fresh data
- Rotate what’s displayed
- Discuss in stand-ups
- Celebrate improvements
- Remove stale metrics
Additional Resources
- Accelerate: Measuring Performance - Nicole Forsgren, Jez Humble, Gene Kim
- Metrics That Matter - DevOps Dojo Consortium
- Code Coverage
1.4 - All Feature Work Stops When the Build Is Red
When the trunk build breaks, the entire team stops feature work and collaborates to fix it immediately. This practice, borrowed from lean manufacturing’s Andon Cord, prevents defects from propagating and maintains an always-releasable trunk.
Every team member shifts focus to:
- Understanding what broke
- Fixing the broken build
- Learning why it happened
- Preventing similar failures
No new feature work begins until the build is green again.
Why ALL Work Stops, Not Just Merges
A common objection is: “Why stop all feature work? Just block merging until the pipeline is green.”
This misses the point. Continuous Delivery is not just technology and workflow—it is a mindset. Part of that mindset is that individuals on the team do not have individual priorities. The team has priorities.
Work Closer to Production Is Always More Valuable
Work that is closer to production is always more valuable than work that is further away. A broken pipeline is halting the most important work: getting tested, integrated changes to users. It is also blocking any hotfix the team may need to deploy.
When the build is red, fixing it is the team’s highest priority. Not your feature. Not your story. The pipeline.
“Just Block Merges” Creates a False Sense of Progress
If developers continue writing feature code while the build is broken:
- They are building on a foundation they cannot verify
- Their work is accumulating integration risk with every passing minute
- They are individually productive but the team is not delivering
- The broken build becomes someone else’s problem instead of everyone’s priority
- The incentive to fix the build urgently is removed—it can wait until someone wants to merge
This is the difference between individual activity and team effectiveness. A team where everyone is typing but nothing is shipping is not productive.
This Is a Team Organization Problem
If the team is not organized to enable everyone to swarm on a broken build, that is a fundamental dysfunction. CD requires teams that:
- Share ownership of the pipeline and the codebase
- Prioritize collectively rather than protecting individual work streams
- Can all contribute to diagnosing and fixing build failures
- Treat the pipeline as the team’s most critical asset
A team that says “I’ll keep working on my feature while someone else fixes the build” has not adopted the CD mindset. They are a group of individuals sharing a codebase, not a team practicing Continuous Delivery.
What This Looks Like in Practice
When the Team Stops
09:15 - Build fails on trunk
09:16 - Automated notification to team chat
09:17 - Team acknowledges
09:18 - Feature work pauses
09:20 - Quick huddle: what broke?
09:25 - Two devs pair on fix
09:40 - Fix committed
09:45 - Build green
09:46 - Team resumes feature work
09:50 - Quick retro: why did it break?
Total impact: 30 minutes of paused feature work
Team learned: Missing test case for edge condition
Outcome: Better tests, faster next time
When the Team Doesn’t Stop
09:15 - Build fails on trunk
09:30 - Someone notices
10:00 - "We'll look at it later"
11:00 - Another commit on a red build
12:00 - Third failure, harder to diagnose
14:00 - "This is too complex, we need help"
16:00 - Multiple devs debugging
17:30 - Finally fixed
Total impact: 8+ hours of broken trunk, multiple devs blocked
Team learned: Nothing systematic
Outcome: Same failures likely to recur
When developers continue working on a broken build, new work may depend on broken code, multiple changes pile up making diagnosis harder, and the broken state becomes the new baseline. Stopping immediately contains the problem.
When the Fix Takes Too Long
If the fix will take more than 15 minutes, prefer reverting:
Option 1: Revert immediately
- Roll back the commit that broke the build
- Get trunk green
- Fix properly offline
- Re-integrate with the fix
Option 2: Forward fix with a time limit
- Set a timer (15 minutes)
- Work on forward fix
- If the timer expires: revert
- Fix offline and re-integrate
Choose revert bias when unsure. The goal is a green trunk, not a heroic fix.
Team Working Agreements
Effective stop-the-line requires clear agreements:
Fast Build Feedback
Agreement: “Our builds complete in < 10 minutes”
Developers can’t respond to failures they don’t know about. If builds are slow, parallelize test execution, move slow tests post-merge, or invest in faster infrastructure.
Visible Build Status
Agreement: “Build status is visible to the entire team at all times”
You can’t stop for failures you don’t see. Use build radiators, chat notifications, and desktop alerts. See Pipeline Visibility for detailed guidance.
Team Owns the Fix
Agreement: “When the build breaks, the team owns the fix”
Not: “Whoever broke it fixes it”
Instead: “The team fixes it together”
Individual blame prevents collaboration. The person who triggered the failure may not have the expertise or context to fix it quickly. Rally the team.
Fixed Means Green
Agreement: “Fixed means green build on trunk, not just a fix committed”
Fixed includes: root cause identified, fix implemented, tests passing on trunk, and a plan to prevent recurrence.
No Bypassing
Agreement: “We will not bypass CI to deploy during red builds”
Not for critical hotfixes (fix the build first, or revert). Not for small changes (small doesn’t mean safe). Not for “known failures” (then they should be fixed or removed). Not for executive pressure (protect the team).
Common Objections
“We can’t afford to stop feature work”
You can’t afford not to. Every hour the build stays broken compounds future integration issues, blocks other developers, erodes deployment confidence, and increases fix complexity. Stopping is cheaper.
“Stopping kills our velocity”
Short term, stopping might feel slow. Long term, stopping accelerates delivery. Broken builds that persist block developers, create integration debt, and compound failures. Stopping maintains velocity by preventing these compounding costs.
“We stop all the time”
If builds break frequently, the problem isn’t stopping—it’s insufficient testing before merge. Improve pre-merge testing, require local test runs, and fix flaky tests. Stopping reveals the problem. Better testing solves it.
“It’s a known flaky test”
Then remove it from the build. Either fix the flaky test immediately, remove it from trunk builds, or quarantine it for investigation. Non-deterministic tests are broken tests. See Deterministic Tests for guidance.
“Management doesn’t support stopping”
Educate stakeholders on the economics: show time saved by early fixes, demonstrate deployment confidence, track defect reduction, and measure cycle time improvement. If leadership demands features over quality, you’re not empowered to do CI.
The Cultural Shift
This practice represents a fundamental change:
From: “Individual productivity” → To: “Team effectiveness”
From: “Ship features at all costs” → To: “Maintain quality while shipping features”
From: “Move fast and break things” → To: “Move fast by not breaking things”
This shift is uncomfortable but essential for sustainable high performance.
Metrics
- Time to fix: Time from build failure to green build. Target < 15 minutes median, < 1 hour average.
- Stop rate: Percentage of build failures that trigger full stop. Target 100%.
- Failure frequency: Build failures per week. Should decrease over time.
Track patterns in why builds break (flaky tests, missing pre-merge tests, environment differences, integration issues) to identify systemic improvement opportunities.
Additional Resources
- Continuous Integration - Martin Fowler
- The Andon Cord - Lean Manufacturing principle
- Pipeline Visibility
- Deterministic Tests
2 - Only Path to Any Environment
Definition
The deployment pipeline is the single, standardized path for all changes to reach any environment—development, testing, staging, or production. No manual deployments, no side channels, no “quick fixes” bypassing the pipeline. If it’s not deployed through the pipeline, it doesn’t get deployed.
Key principles:
- Single path: All deployments flow through the same pipeline
- No exceptions: Even hotfixes and rollbacks go through the pipeline
- Automated: Deployment is triggered automatically after pipeline validation
- Auditable: Every deployment is tracked and traceable
- Consistent: The same process deploys to all environments
Why This Matters
Multiple Deployment Paths Create Serious Risks
- Quality issues: Bypassing the pipeline bypasses quality checks
- Configuration drift: Manual deployments create inconsistencies between environments
- Security vulnerabilities: Undocumented changes escape security review
- Debugging nightmares: “What’s actually running in production?”
- Compliance violations: Audit trails break when changes bypass the pipeline
- Lost confidence: Teams lose trust in the pipeline and resort to manual interventions
A Single Deployment Path Provides
- Reliability: Every deployment is validated the same way
- Traceability: Clear audit trail from commit to production
- Consistency: Environments stay in sync
- Speed: Automated deployments are faster than manual
- Safety: Quality gates are never bypassed
- Confidence: Teams trust that production matches what was tested
- Recovery: Rollbacks are as reliable as forward deployments
What “Single Path” Means
One Merge Pattern for All Changes
Direct Trunk Integration: all work integrates directly to trunk using the same process.
Anti-pattern Examples
- Integration Branch
This creates TWO merge structures instead of one:
- When trunk changes → merge to integration branch immediately
- When features change → merge to integration branch at least daily
The integration branch lives a parallel life to the trunk, acting as a temporary container for partially finished features. This attempts to “mimic” feature toggles to keep inactive features out of production.
Why This Violates Single-Path
- Creates multiple merge patterns (trunk→integration AND features→integration)
- Integration branch becomes a second “trunk” with different rules
- Adds complexity: “Is this change ready for integration or trunk?”
- Defeats the purpose: Use actual feature flags instead of mimicking them with branches
- Accumulates “given-up” features that stay unfinished forever
- Delays true integration: Features are integrated to integration branch but not to trunk
- GitFlow (Multiple Long-Lived Branches)
GitFlow creates MULTIPLE merge patterns depending on change type:
- Features: feature → develop → release → master
- Hotfixes: hotfix → master AND hotfix → develop
- Releases: develop → release → master
Why This Violates Single-Path
- Different types of changes follow different paths to production
- Multiple long-lived branches (master, develop, release) create merge complexity
- Hotfixes have a different path than features (bypassing develop)
- Release branches delay integration and create batch deployments
- Merge conflicts multiply across multiple integration points
- Violates continuous integration principle (changes don’t integrate daily to trunk)
- Forces “release” to be a special event rather than continuous deployment
The Correct Approach: Trunk-Based Development with Integration Patterns
Option 1: Feature Flags
For incomplete features that need to be hidden until they are ready.
Option 2: Branch by Abstraction
For changes to existing behavior.
Option 3: Connect Tests Last
For new features that are connected only to their tests until the final commit.
Option 4: Dark Launch
For new API routes: build and test the route first, then expose it as the last change (see the sketch below).
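A minimal sketch of a dark-launched route, assuming an Express application; the endpoint, handler, and port are illustrative.

```typescript
import express from 'express';

const app = express();

// Built and tested in earlier commits: the handler exists and is covered by tests,
// but no route points at it yet, so users cannot reach it.
export async function getRecommendations(userId: string): Promise<string[]> {
  return [`suggested-item-for-${userId}`];
}

// Existing routes keep working throughout.
app.get('/health', (_req, res) => {
  res.json({ status: 'ok' });
});

// Final commit: expose the new route as the last change.
app.get('/api/recommendations/:userId', async (req, res) => {
  res.json(await getRecommendations(req.params.userId));
});

app.listen(3000);
```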
All code integrates to trunk using ONE merge pattern. Incomplete features are managed through these patterns, not through separate integration branches.
For guidance on when to use each pattern, see Feature Flags.
All Environments Use the Same Pipeline
The same pipeline deploys to every environment, including hotfixes and rollbacks.
Anti-Patterns to Avoid
- SSH into server and copy files
- Upload through FTP/SFTP
- Run scripts directly on production servers
- Use separate “emergency deployment” process
- Manual database changes in production
- Different deployment processes for different environments
Example Implementations
Anti-Pattern: Multiple Deployment Paths
Problem: No consistency, no audit trail, no validation. Production becomes a mystery box.
Good Pattern: Single Pipeline for Everything
Benefit: Every deployment—normal, hotfix, or rollback—uses this pipeline. Consistent, validated, traceable.
Common Patterns
Environment Promotion
Deploy the same artifact through progressive environments (for example, development → staging → production).
Fast-Track Pipeline for Emergencies
Keep the same path, but optimize for speed when needed.
Rollback via Pipeline
Rollbacks should be faster than forward deployments because the artifact has already been built and tested.
Database Migrations
All database changes flow through the pipeline.
Database Change Requirements
- Backward-compatible (new code works with old schema)
- Forward-deployable (migrations are additive)
- Automated (migrations run in pipeline)
This allows rolling back application code without rolling back schema.
FAQ
What if the pipeline is broken and we need to deploy a critical fix?
Fix the pipeline first. If your pipeline is so fragile that it can’t deploy critical fixes, that’s a pipeline problem, not a process problem. Invest in pipeline reliability.
What about emergency hotfixes that can’t wait for the full pipeline?
The pipeline should be fast enough to handle emergencies. If it’s not, optimize the pipeline. A “fast-track” mode that skips some tests is acceptable (see Common Patterns above), but it must still be the same pipeline, not a separate manual process.
Can we manually patch production “just this once”?
No. “Just this once” becomes “just this once again.” Manual production changes always create problems. Commit the fix, push through the pipeline, deploy.
What if deploying through the pipeline takes too long?
Optimize your pipeline:
- Parallelize tests
- Use faster test environments
- Implement progressive deployment (canary, blue-green)
- Cache dependencies
- Optimize build times
A well-optimized pipeline should deploy to production in under 30 minutes.
Can operators make manual changes for maintenance?
Infrastructure maintenance (patching servers, scaling resources) is separate from application deployment. However, application deployment must still only happen through the pipeline.
Health Metrics
- Pipeline deployment rate: Should be 100% (all deployments go through pipeline)
- Manual override rate: Should be 0%
- Hotfix pipeline time: Should be < 30 minutes
- Rollback success rate: Should be > 99%
- Deployment frequency: Should increase over time as confidence grows
Additional Resources
3 - Deterministic Pipeline
Definition
A deterministic pipeline produces consistent, repeatable results. Given the same inputs (code, configuration, dependencies), the pipeline will always produce the same outputs and reach the same pass/fail verdict. The pipeline’s decision on whether a change is releasable is definitive—if it passes, deploy it; if it fails, fix it.
Key principles:
- Repeatable: Running the pipeline twice with identical inputs produces identical results
- Authoritative: The pipeline is the final arbiter of quality, not humans
- Immutable: No manual changes to artifacts or environments between pipeline stages
- Trustworthy: Teams trust the pipeline’s verdict without second-guessing
Why This Matters
Non-deterministic pipelines create serious problems:
- False confidence: Tests pass inconsistently, hiding real issues
- Wasted time: Debugging “flaky” tests instead of delivering value
- Trust erosion: Teams stop trusting the pipeline and add manual gates
- Slow feedback: Re-running tests to “see if they pass this time”
- Quality degradation: Real failures get dismissed as “just flaky tests”
Deterministic pipelines provide:
- Confidence: Pipeline results are reliable and meaningful
- Speed: No need to re-run tests or wait for manual verification
- Clarity: Pass means deploy, fail means fix—no ambiguity
- Quality: Every failure represents a real issue that must be addressed
What Makes a Pipeline Deterministic
Version Control Everything
All pipeline inputs must be version controlled:
- Source code (obviously)
- Infrastructure as code (Terraform, CloudFormation, etc.)
- Pipeline definitions (GitHub Actions, Jenkins files, etc.)
- Test data (fixtures, mocks, seeds)
- Configuration (app config, test config)
- Dependency lockfiles (package-lock.json, Gemfile.lock, go.sum, Cargo.lock, poetry.lock, etc.)
- Build scripts (Make, npm scripts, etc.)
Critical: Always commit lockfiles to version control. This ensures every pipeline run uses identical dependency versions.
Eliminate Environmental Variance
The pipeline must control its environment:
- Container-based builds: Use Docker with specific image tags (e.g., node:18.17.1, never node:latest)
- Isolated test environments: Each pipeline run gets a clean, isolated environment
- Exact dependency versions: Always use lockfiles (package-lock.json, go.sum, etc.) and install with --frozen-lockfile or equivalent
- Controlled timing: Don’t rely on wall-clock time or race conditions
- Deterministic randomness: Seed random number generators for reproducibility
Recommended Practice: Never use floating version tags like latest, stable, or version ranges like ^1.2.3. Always pin to exact versions.
Remove Human Intervention
Manual steps break determinism:
- No manual approvals in the critical path (use post-deployment verification instead)
- No manual environment setup (automate environment provisioning)
- No manual artifact modifications (artifacts are immutable after build)
- No manual test data manipulation (generate or restore from version control)
Fix Flaky Tests Immediately
Flaky tests destroy determinism:
- All feature work stops when tests become flaky
- Root cause and fix flaky tests immediately—don’t just retry
- Quarantine pattern: Move flaky tests to quarantine, fix them, then restore
- Monitor flakiness: Track test stability metrics
Example Implementations
Anti-Pattern: Non-Deterministic Pipeline
Problem: Results vary based on when the pipeline runs, what’s in production, which dependency versions are “latest,” and human availability.
Good Pattern: Deterministic Pipeline
Benefit: Same inputs always produce same outputs. Pipeline results are trustworthy and reproducible.
What is Improved
- Quality increases: Real issues are never dismissed as “flaky tests”
- Speed increases: No time wasted on test reruns or manual verification
- Trust increases: Teams rely on the pipeline instead of adding manual gates
- Debugging improves: Failures are reproducible, making root cause analysis easier
- Collaboration improves: Shared confidence in the pipeline reduces friction
- Delivery improves: Faster, more reliable path from commit to production
Common Patterns
Immutable Build Containers
Use specific container images for builds (for example, node:18.17.1 rather than node:latest).
Hermetic Test Environments
Isolate each test run in its own clean environment with no shared state.
Dependency Lock Files (Recommended Practice)
Always use dependency lockfiles; this is essential for deterministic builds.
Never:
- Use npm install in CI (use npm ci instead)
- Add lockfiles to .gitignore
- Use version ranges in production dependencies (^, ~, >=)
- Rely on “latest” tags for any dependency
Quarantine for Flaky Tests
Temporarily isolate flaky tests:
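One possible convention, assuming Jest; the test name and ticket reference are illustrative. The quarantined test is excluded from trunk builds while it is investigated, then fixed or deleted rather than retried.

```typescript
// Skipped on trunk and tracked with a ticket so it cannot be silently forgotten.
describe.skip('checkout retries on gateway timeout [QUARANTINED: FLAKY-123]', () => {
  it('eventually completes the order', () => {
    // flaky assertions under investigation
  });
});
```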
FAQ
What if a test is occasionally flaky but hard to reproduce?
This is still a problem. Flaky tests indicate either:
- A real bug in your code (race conditions, etc.)
- A problem with your test (dependencies on external state)
Both need to be fixed. Quarantine the test, investigate thoroughly, and fix the root cause.
Can we use retries to handle flaky tests?
Retries mask problems rather than fixing them. A test that passes on retry is hiding a failure, not succeeding. Fix the flakiness instead of retrying.
What about tests that depend on external services?
Use test doubles (mocks, stubs, fakes) for external dependencies. If you must test against real external services, use contract tests and ensure those services are version-controlled and deterministic too.
How do we handle tests that involve randomness?
Seed your random number generators with a fixed seed in tests:
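A minimal sketch using a small seedable PRNG (mulberry32) injected into the code under test; the pickWinner function is illustrative.

```typescript
// Deterministic pseudo-random generator: the same seed always yields the same sequence.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Code under test takes its random source as a parameter instead of calling Math.random() directly.
function pickWinner(entries: string[], random: () => number = Math.random): string {
  return entries[Math.floor(random() * entries.length)];
}

// In tests, pass a fixed seed so every run picks the same winner.
console.log(pickWinner(['a', 'b', 'c'], mulberry32(42)));
```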
What if our deployment requires manual verification?
Manual verification can happen after deployment, not before. Deploy automatically based on pipeline results, then verify. If verification fails, roll back automatically.
Should the pipeline ever be non-deterministic?
There are rare cases where controlled non-determinism is useful (chaos engineering, fuzz testing), but these should be:
- Explicitly designed and documented
- Separate from the core deployment pipeline
- Reproducible via saved seeds/inputs
Health Metrics
- Test flakiness rate: Should be < 1% (ideally 0%)
- Pipeline consistency: Same commit should pass/fail consistently across runs
- Time to fix flaky tests: Should be < 1 day
- Manual override rate: Should be near zero
Additional Resources
4 - Definition of Deployable
Definition
The “definition of deployable” is your organization’s agreed-upon set of non-negotiable quality criteria that every artifact must pass before it can be deployed to any environment. This definition should be automated, enforced by the pipeline, and treated as the authoritative verdict on whether a change is ready for deployment.
Key principles:
- Pipeline is definitive: If the pipeline passes, the artifact is deployable—no exceptions
- Automated validation: All criteria are checked automatically, not manually
- Consistent across environments: The same standards apply whether deploying to test or production
- Fails fast: The pipeline rejects artifacts that don’t meet the standard immediately
Why This Matters
Without a clear, automated definition of deployable, teams face:
- Inconsistent quality standards: Different people have different opinions on “ready”
- Manual gatekeeping: Deployment approvals become bottlenecks
- Surprise failures: Issues that should have been caught earlier appear in production
- Blame culture: Unclear accountability when problems arise
- Deployment fear: Uncertainty about readiness causes risk aversion
A strong definition of deployable creates:
- Confidence: Everyone trusts that pipeline-approved artifacts are safe
- Speed: No waiting for manual approvals or meetings
- Clarity: Unambiguous standards for the entire team
- Accountability: The pipeline (and the team that maintains it) owns quality
What Should Be in Your Definition
Your definition of deployable should include automated checks for:
Security
- Static security scans (SAST) pass
- Dependency vulnerability scans show no critical issues
- Secrets are not embedded in code
- Authentication/authorization tests pass
Functionality
- All unit tests pass
- Integration tests pass
- End-to-end tests pass
- Regression tests pass
- Business logic behaves as expected
Compliance
- Code meets regulatory requirements
- Audit trails are in place
- Required documentation is generated
- Compliance tests pass
Performance
- Response time meets thresholds
- Resource usage is within acceptable limits
- Load tests pass
- No memory leaks detected
Reliability
- Error rates are within acceptable bounds
- Circuit breakers and retries work correctly
- Graceful degradation is in place
- Health checks pass
Code Quality
- Code style/linting checks pass
- Code coverage meets minimum threshold
- Static analysis shows no critical issues
- Technical debt is within acceptable limits
Example Implementations
Anti-Pattern: Manual Approval Process
Problem: Manual steps delay feedback, introduce inconsistency, and reduce confidence.
Good Pattern: Automated Pipeline Gates
Benefit: Every commit is automatically validated against all criteria. If it passes, it’s deployable.
What is Improved
- Removes bottlenecks: No waiting for manual approval meetings
- Increases quality: Automated checks catch more issues than manual reviews
- Reduces cycle time: Deployable artifacts are identified in minutes, not days
- Improves collaboration: Shared understanding of quality standards
- Enables continuous delivery: Trust in the pipeline makes frequent deployments safe
- Reduces stress: Clear criteria eliminate guesswork and blame
Common Patterns
Progressive Quality Gates
Structure your pipeline to fail fast on quick checks, then run the more expensive tests.
Context-Specific Definitions
Some criteria may vary by context (for example, stricter performance thresholds for user-facing services than for internal tools).
Error Budget Approach
Use error budgets to balance speed and reliability. If the error budget is exhausted, focus shifts to reliability work instead of new features.
FAQ
Who decides what goes in the definition of deployable?
The entire team—developers, QA, operations, security, and product—should collaboratively define these standards. It should reflect genuine risks and requirements, not arbitrary bureaucracy.
What if the pipeline passes but we find a bug in production?
This indicates a gap in your definition of deployable. Add a test to catch that class of bug in the future. The definition should evolve based on production learnings.
Can we skip pipeline checks for “urgent” hotfixes?
No. If the pipeline can’t validate a hotfix quickly enough, that’s a problem with your pipeline, not your process. Fix the pipeline, don’t bypass it. Bypassing quality checks for “urgent” changes is how critical bugs reach production.
How strict should our definition be?
Strict enough to prevent production incidents, but not so strict that it becomes a bottleneck. If your pipeline rejects 90% of commits, your standards may be too rigid. If production incidents are frequent, your standards may be too lax.
Should manual testing be part of the definition?
Manual exploratory testing is valuable for discovering edge cases, but it should inform the definition, not be the definition. Automate the validations that result from manual testing discoveries.
What about things we can’t test automatically?
Some requirements (like UX polish or accessibility) are harder to automate fully. For these:
- Automate what you can (e.g., accessibility checkers, visual regression tests)
- Make manual checks lightweight and concurrent, not blockers
- Continuously work to automate more
Health Metrics
- Pipeline pass rate: Should be 70-90% (too high = tests too lax, too low = tests too strict)
- Pipeline execution time: Should be < 30 minutes for full validation
- Production incident rate: Should decrease over time as definition improves
- Manual override rate: Should be near zero (manual overrides indicate broken process)
Additional Resources
5 - Immutable Artifact
Central to CD is that we are validating the artifact with the pipeline. It is built once and deployed to all environments. A common anti-pattern is building an artifact for each environment. The pipeline should generate immutable, versioned artifacts.
Definition
- Immutable Pipeline: In the beginning, it may seem that the obvious way to address a pipeline failure is to go to the failure point, adjust the environment, test data, or whatever else failed, and then restart the pipeline from that point. However, that transforms a repeatable quality process into an untrustworthy custom build. Failures should be addressed by changes in version control so that two executions with the same configuration will always yield the same results.
- Immutable Artifacts: Some package management systems allow the creation of release candidate versions. For example, it is common to find -SNAPSHOT versions used for this in Java. However, this means we have an artifact whose behavior can be changed without modifying the version. Version numbers are cheap. If we are to have an immutable pipeline, it must produce an immutable artifact. We should never have dependencies that use -SNAPSHOT versions, and we should never produce -SNAPSHOT versions.
Immutability provides us with the confidence to know that the results from the pipeline are real and repeatable.
What is Improved
- Everything must be version controlled: source code, environment configurations, application configurations, and even test data. This reduces variability and improves the quality process.
6 - Prod-Like Test Environment
Definition
It is crucial to leverage pre-production environments in your CI/CD to run all of your tests (Unit / Integration / UAT / Manual QA / E2E) early and often. Test environments increase interaction with new features and exposure to bugs – both of which are important prerequisites for reliable software.
Example Implementations
There are different types of pre-production test environments. Most organizations will employ both static and short-lived environments and utilize them for case-specific stages of the SDLC.
- Staging environment: Ideally, this is the last environment that teams will run automated tests against prior to deployment, particularly for testing interaction between all new features after a merge. Its infrastructure will reflect production as closely as possible.
- Ephemeral environments (collected from EphemeralEnvironments.io): These are full-stack, on-demand environments that are spun up on every code change. Each ephemeral environment should be leveraged in your pipeline, which will run E2E, unit, and integration tests against them on every code change. These environments are defined in version control and created and destroyed automatically on demand. They are short-lived by definition but should closely resemble production; they are intended to replace long-lived “static” environments and the maintenance required to keep those stable, i.e., “development,” “QA1”, “QA2”, “testing,” etc.
What is Improved
- Infrastructure is kept consistent: Test environments deliver results that reflect real-world performance. Few unprecedented bugs sneak into production since using prod-like data and dependencies allows you to run your entire test suite earlier against multiple prod-like environments.
- Test against latest changes: These environments will rebuild upon code changes with no manual intervention.
- Test before merge: Attaching an ephemeral environment to every PR enables E2E testing in your CI before code changes get deployed to staging. New features get tested in parallel, avoiding the dreaded “waiting to run my tests” blocking your entire SDLC.
7 - Rollback On-demand
Definition
Rollback on-demand means the ability to quickly and safely revert to a previous working version of your application at any time, without requiring special approval, manual intervention, or complex procedures. It should be as simple and reliable as deploying forward.
Key principles:
- Fast: Rollback completes in minutes, not hours
- Automated: No manual steps or special procedures
- Safe: Rollback is validated just like forward deployment
- Simple: Single command or button click initiates rollback
- Tested: Rollback mechanism is regularly tested, not just used in emergencies
Why This Matters
Without reliable rollback capability:
- Fear of deployment: Teams avoid deploying because failures are hard to recover from
- Long incident resolution: Hours wasted debugging instead of immediately reverting
- Customer impact: Users suffer while teams scramble to fix issues
- Pressure to “fix forward”: Teams rush incomplete fixes instead of safely rolling back
- Deployment delays: Risk aversion slows down release cycles
With reliable rollback:
- Deployment confidence: Knowing you can roll back reduces fear
- Fast recovery: Minutes to restore service instead of hours
- Reduced risk: Bad deployments have minimal customer impact
- Better decisions: Teams can safely experiment and learn
- Higher deployment frequency: Confidence enables more frequent releases
What “Rollback On-demand” Means
Rollback is a Deployment
Rolling back means deploying a previous artifact version through your standard pipeline: select a known-good artifact that was already built and tested, and push it through the same deploy and verification steps as any forward deployment.
Not this: SSH-ing into servers, hand-copying files, or rebuilding from an old commit outside the pipeline.
Rollback is Tested
Rollback mechanisms should be tested regularly, not just during incidents:
- Practice rollbacks during non-critical times
- Include rollback tests in your pipeline
- Time your rollback to ensure it meets SLAs
- Verify rollback doesn’t break anything
Rollback is Fast
Rollback should be faster than forward deployment:
- Skip build stage (artifact already exists)
- Skip test stage (artifact was already tested)
- Go straight to deployment with previous artifact
Target: < 5 minutes from rollback decision to service restored.
Rollback is Safe
Rollback should:
- Deploy through the same pipeline (not a manual process)
- Run smoke tests to verify the rollback worked
- Update monitoring and alerts
- Maintain audit trail
Example Implementations
Anti-Pattern: Manual Rollback Process
Problem: Slow, manual, error-prone, no validation.
Good Pattern: Automated Rollback
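A minimal sketch of what the automated path might look like, assuming a small Python wrapper around your pipeline’s existing deploy and smoke-test steps. The registry URL, service names, and the `rollback.py` script itself are illustrative assumptions, not part of the original guidance:

```python
import sys


def fetch_artifact(service: str, version: str) -> str:
    """Locate a previously built, already-tested artifact in the registry (illustrative URL)."""
    return f"registry.example.com/{service}:{version}"


def deploy(image: str, environment: str) -> None:
    # Reuse the same deploy step as a forward deployment; shown here as a placeholder.
    print(f"deploying {image} to {environment}")


def run_smoke_tests(service: str, environment: str) -> bool:
    print(f"running smoke tests for {service} in {environment}")
    return True


def rollback(service: str, previous_version: str) -> None:
    """Roll back by redeploying an existing artifact: no rebuild, no full retest."""
    image = fetch_artifact(service, previous_version)
    deploy(image, environment="production")
    if not run_smoke_tests(service, environment="production"):
        sys.exit("rollback smoke tests failed; escalate")
    print(f"rollback of {service} to {previous_version} complete and audited")


if __name__ == "__main__":
    # Usage: python rollback.py <service> <previous_version>
    rollback(service=sys.argv[1], previous_version=sys.argv[2])
```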
Usage: a single command or pipeline trigger (for example, `python rollback.py orders 1.4.2` with the hypothetical script above) redeploys the previous artifact and verifies it.
Benefit: Fast, automated, validated, audited.
What is Improved
- Mean Time To Recovery (MTTR): Drops from hours to minutes
- Deployment frequency: Increases due to reduced risk
- Team confidence: Higher willingness to deploy
- Customer satisfaction: Faster incident resolution
- Learning: Teams can safely experiment
- On-call burden: Reduced stress for on-call engineers
Common Patterns
Blue-Green Deployment
Maintain two identical environments, “blue” and “green,” with only one serving traffic at a time. Deploy the new version to the idle environment and switch traffic to it; rolling back is simply switching traffic back to the environment still running the previous version.
Canary Rollback
Roll back gradually: shift traffic back to the previous version in increments (for example, 10%, 50%, then 100%), watching your health metrics at each step, just as you would during a canary rollout.
Feature Flag Rollback
Disable problematic features without redeploying: turning the flag off immediately restores the previous behavior while the offending code path stays dark.
Database-Safe Rollback
Design schema changes to support rollback:
Use expand-contract pattern:
- Expand: Add new column (both versions work)
- Migrate: Start using new column
- Contract: Remove old column (later, when safe)
Artifact Registry Retention
Keep previous artifacts available by configuring a registry retention policy that preserves at least your last several production releases.
Ensures you can always roll back to recent versions.
FAQ
How far back should we be able to roll back?
Minimum: Last 3-5 production releases. Ideally: Any production release from the past 30-90 days. Balance storage costs with rollback flexibility.
What if the database schema changed?
Design schema changes to be backward-compatible:
- Use expand-contract pattern
- Make schema changes in separate deployment from code changes
- Test that old code works with new schema
What if we need to roll back the database too?
Database rollbacks are risky. Instead:
- Design schema changes to support rollback (backward compatibility)
- Use feature flags to disable code using new schema
- If absolutely necessary, have tested database rollback scripts
Should rollback require approval?
For production: On-call engineer should be empowered to roll back immediately without approval. Speed of recovery is critical. Post-rollback review is appropriate, but don’t delay the rollback.
How do we test rollback?
- Practice regularly: Perform rollback drills during low-traffic periods
- Automate testing: Include rollback in your pipeline tests
- Use staging: Test rollback in staging before production deployments
- Chaos engineering: Randomly trigger rollbacks to ensure they work
What if rollback fails?
Have a rollback-of-rollback plan:
- Roll forward to the next known-good version
- Use feature flags to disable problematic features
- Have out-of-band deployment method (last resort)
But if rollback is regularly tested, failures should be rare.
How long should rollback take?
Target: < 5 minutes from decision to service restored.
Breakdown:
- Trigger: < 30 seconds
- Deploy: 2-3 minutes
- Verify: 1-2 minutes
What about configuration changes?
Configuration should be versioned with the artifact. Rolling back the artifact rolls back the configuration. See Application Configuration.
Health Metrics
- Rollback success rate: Should be > 99%
- Mean Time To Rollback (MTTR): Should be < 5 minutes
- Rollback test frequency: At least monthly
- Rollback usage: Track how often rollback is used (helps justify investment)
- Failed rollback incidents: Should be nearly zero
8 - Application Configuration
Definition
Application configuration defines the internal behavior of your application and is bundled with the artifact. It does not vary between environments. This is distinct from environment configuration (secrets, URLs, credentials) which varies by deployment.
We embrace The Twelve-Factor App config definitions:
- Application Configuration: Internal to the app, does NOT vary by environment (feature flags, business rules, UI themes, default settings)
- Environment Configuration: Varies by deployment (database URLs, API keys, service endpoints, credentials)
Application configuration should be:
- Version controlled with the source code
- Deployed as part of the immutable artifact
- Testable in the CI pipeline
- Unchangeable after the artifact is built
Why This Matters
Separating application configuration from environment configuration provides several critical benefits:
- Immutability: The artifact tested in staging is identical to what runs in production
- Traceability: You can trace any behavior back to a specific commit
- Testability: Application behavior can be validated in the pipeline before deployment
- Reliability: No configuration drift between environments caused by manual changes
Example Implementations
Anti-Pattern: External Application Config
Problem: Application behavior is read from a config file or config service that lives outside the artifact, so changes made after the build mean the artifact behavior is untested and unpredictable.
Good Pattern: Bundled Application Config
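A minimal sketch of the split, assuming a JSON file bundled with the artifact and environment variables injected at deploy time; the file path, keys, and variable names are illustrative assumptions:

```python
import json
import os
from pathlib import Path

# Application configuration: committed to version control and baked into the artifact.
# Illustrative contents of config/app_config.json:
#   {"max_search_results": 50, "default_theme": "light", "features": {"new_checkout": false}}
APP_CONFIG = json.loads(Path("config/app_config.json").read_text())

# Environment configuration: injected per deployment, never baked into the artifact.
DATABASE_URL = os.environ["DATABASE_URL"]
PAYMENTS_API_KEY = os.environ["PAYMENTS_API_KEY"]


def max_search_results() -> int:
    # Identical in every environment, because it shipped inside the artifact.
    return APP_CONFIG["max_search_results"]
```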
Benefit: Application behavior is locked at build time; only environment-specific values change.
What is Improved
- Confidence in testing: When the pipeline passes, you know the exact behavior that will run in production
- Faster rollback: Rolling back an artifact rolls back all application configuration changes
- Audit trail: Every configuration change is in version control with commit history
- Reduced deployment risk: No surprises from configuration changes made outside the pipeline
- Better collaboration: Developers, QA, and operations all see the same configuration
Common Patterns
Feature Flags (Release Control)
Feature flags come in two flavors, and understanding the distinction is critical:
Static Feature Flags (Application Configuration)
Bundled with the artifact - These are application configuration:
- Flag definitions are in version control
- Deployed with the artifact
- Changing flags requires a new deployment
- Pipeline tests validate flag behavior
- Use case: Long-lived flags, kill switches, A/B test definitions
Dynamic Feature Flags (Environment Configuration)
External service - These are NOT application configuration:
- Flag state stored in external service (LaunchDarkly, Split.io, etc.)
- Changed without redeployment
- Different per environment (dev/staging/production)
- Use case: Real-time experimentation, emergency kill switches, gradual rollouts
Which should you use?
- Static flags: When you want config changes tested in pipeline
- Dynamic flags: When you need real-time control without deployment
Business Rules
These rules should be tested in the pipeline and deployed with the code.
Service Discovery: The logical names of the services your application depends on can live in application configuration, while the concrete endpoints they resolve to are environment configuration supplied per deployment or by a discovery service.
FAQ
How do I change application config for a specific environment?
You shouldn’t. If behavior needs to vary by environment, it’s environment configuration (injected via environment variables or secrets management). Application configuration is the same everywhere.
What if I need to hotfix a config value in production?
If it’s truly application configuration, make the change in code, commit it, let the pipeline validate it, and deploy the new artifact. Hotfixing config outside the pipeline defeats the purpose of immutable artifacts.
Can feature flags be application configuration?
It depends on the type:
Static feature flags (bundled with artifact): YES, these are application configuration
- Flag definitions and states in version control
- Deployed with the artifact
- Changes require redeployment through pipeline
Dynamic feature flags (external service): NO, these are environment configuration
- Flag states stored externally (LaunchDarkly, Split.io, etc.)
- Changed without redeployment
- Different per environment
- Not tested by pipeline before changes take effect
Both are valid patterns serving different needs. Static flags ensure pipeline validation; dynamic flags enable real-time experimentation.
What about config that changes frequently?
If it changes frequently enough that redeploying is impractical, it might be data, not configuration. Consider whether it belongs in a database or content management system instead.
How do I test application configuration changes?
The same way you test code changes:
- Commit the config change to version control
- CI builds the artifact with the new config
- Automated tests validate the behavior
- Deploy the artifact through all environments
Health Metrics
- Configuration drift incidents: Should be zero (config is immutable with artifact)
- Config-related rollbacks: Track how often config changes cause rollbacks
- Time to config change: From commit to production should match your deployment cycle time
Additional Resources
- The Twelve-Factor App: Config
- Continuous Delivery: Configuration Management
- Feature Toggles (Feature Flags) - Martin Fowler
- Immutable Infrastructure - Understanding immutability principles
9 - Trunk Based Development
Excerpt from Accelerate by Nicole Forsgren Ph.D., Jez Humble & Gene Kim
Definition
TBD is a team workflow where changes are integrated into the trunk with no intermediate integration (Develop, Test, etc.) branch. The two common workflows are making changes directly to the trunk or using very short-lived branches that branch from the trunk and integrate back into the trunk.
It is important to note that release branches are an intermediate step that some choose on their path to continuous delivery while improving their quality processes in the pipeline. True CD releases from the trunk.
What is Improved
- Smaller changes: TBD emphasizes small, frequent changes that are easier for the team to review and more resistant to impactful merge conflicts. Conflicts become rare and trivial.
- We must test: TBD requires us to implement tests as part of the development process.
- Better teamwork: We need to work more closely as a team. This has many positive impacts, not least that we will be more focused on getting the team’s highest priority done. We will stop starting and start finishing work.
- Better work definition: Small changes require us to decompose the work into a level of detail that helps uncover things that lack clarity or do not make sense. This provides much earlier feedback on potential quality issues.
- Replaces process with engineering: Instead of creating a process where we control the release of features with branches, we can control the release of features with engineering techniques called evolutionary coding methods. These techniques have additional benefits related to stability that cannot be found when replaced by process.
- Reduces risk: Long-lived branches carry two frequent risks. First, the change does not integrate cleanly, and the merge conflicts result in broken or lost features. Second, the branch is abandoned, usually because of the first reason, and sometimes because all of the knowledge about what is in that branch resides in the mind of someone who left before it was integrated.
Need Help?
See the TBD migration guide.
9.1 - Migrating to Trunk-Based Development
Continuous delivery requires continuous integration, and CI requires very frequent code integration to the trunk, at least daily. Doing that either requires trunk-based development or worthless process overhead spent on repeated merges between long-lived branches. So, if you want CI, you’re not getting there without trunk-based development. However, standing up TBD is not as simple as “collapse all the branches.” CD is a quality process, not just automated code delivery. Trunk-based development is the first step in establishing that quality process and in uncovering the problems in the current process.
GitFlow, and other branching models that use long-lived branches, optimize for isolation to protect working code from untested or poorly tested code. They create the illusion of safety while silently increasing risk through long feedback delays. The result is predictable: painful merges, stale assumptions, and feedback that arrives too late to matter.
TBD reverses that. It optimizes for rapid feedback, smaller changes, and collaborative discovery — the ingredients required for CI and continuous delivery.
This article explains how to move from GitFlow (or any long-lived branch pattern) toward TBD, and what “good” actually looks like along the way.
Why Move to Trunk-Based Development?
Long-lived branches hide problems. TBD exposes them early, when they are cheap to fix.
Think of long-lived branches like storing food in a bunker: it feels safe until you open the door and discover half of it rotting. With TBD, teams check freshness every day.
To do CI, teams need:
- Small changes integrated at least daily
- Automated tests giving fast, deterministic feedback
- A single source of truth: the trunk
If your branches live for more than a day or two, you aren’t doing continuous integration — you’re doing periodic integration at best. True CI requires at least daily integration to the trunk.
The First Step: Stop Letting Work Age
The biggest barrier isn’t tooling. It’s habits.
The first meaningful change is simple:
Stop letting branches live long enough to become problems.
Your first goal isn’t true TBD. It’s shorter-lived branches — changes that live for hours or a couple of days, not weeks.
That alone exposes dependency issues, unclear requirements, and missing tests — which is exactly the point. The pain tells you where improvement is needed.
Before You Start: What to Measure
You cannot improve what you don’t measure. Before changing anything, establish baseline metrics, so you can track actual progress.
Essential Metrics to Track Weekly
Branch Lifetime
- Average time from branch creation to merge
- Maximum branch age currently open
- Target: Reduce average from weeks to days, then to hours
Integration Health
- Number of merge conflicts per week
- Time spent resolving conflicts
- Target: Conflicts should decrease as integration frequency increases
Delivery Speed
- Time from commit to production deployment
- Number of commits per day reaching production
- Target: Decrease time to production, increase deployment frequency
Quality Indicators
- Build/test execution time
- Test failure rate
- Production incidents per deployment
- Target: Fast, reliable tests; stable deployments
Work Decomposition
- Average pull request size (lines changed)
- Number of files changed per commit
- Target: Smaller, more focused changes
Start with just two or three of these. Don’t let measurement become its own project.
The goal isn’t perfect data — it’s visibility into whether you’re actually moving in the right direction.
Path #1: Moving from Long-Lived Branches to Short-Lived Branches
When GitFlow habits are deeply ingrained, this is usually the least-threatening first step.
1. Collapse the Branching Model
Stop using:
- `develop`
- release branches that sit around for weeks
- feature branches lasting a sprint or more
Move toward:
- A single `main` (or `trunk`)
- Temporary branches measured in hours or days
2. Integrate Every Few Days — Then Every Day
Set an explicit working agreement:
“Nothing lives longer than 48 hours.”
Once this feels normal, shorten it:
“Integrate at least once per day.”
If a change is too large to merge within a day or two, the problem isn’t the branching model — the problem is the decomposition of work.
3. Test Before You Code
Branch lifetime shortens when you stop guessing about expected behavior. Bring product, QA, and developers together before coding:
- Write acceptance criteria collaboratively
- Turn them into executable tests
- Then write code to make those tests pass
You’ll discover misunderstandings upfront instead of after a week of coding.
This approach is called Behavior-Driven Development (BDD) — a collaborative practice where teams define expected behavior in plain language before writing code. BDD bridges the gap between business requirements and technical implementation by using concrete examples that become executable tests.
Key BDD resources:
- Behavior-Driven Development - Dojo Consortium - Comprehensive guide to BDD practices
- “Specification by Example” by Gojko Adzic - Foundational text on collaborative specification
How to Run a Three Amigos Session
Participants: Product Owner, Developer, Tester (15-30 minutes per story)
Process:
- Product describes the user need and expected outcome
- Developer asks questions about edge cases and dependencies
- Tester identifies scenarios that could fail
- Together, write acceptance criteria as examples
Example:
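For instance, a hypothetical password-reset story (not from the original) might yield scenarios like:
- Given a registered user with a verified email address, when they request a password reset, then a reset link is emailed that expires in 30 minutes
- Given an email address with no account, when a reset is requested, then the response looks identical to the success case so accounts cannot be enumerated
- Given an expired reset link, when the user opens it, then they are prompted to request a new one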
These scenarios become your automated acceptance tests before you write any implementation code.
From Acceptance Criteria to Tests
Turn those scenarios into executable tests in your framework of choice:
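As a sketch, the first scenario above could become a pytest test. `PasswordResetService`, `FakeMailer`, and `FakeClock` are hypothetical names used only to illustrate the shape of such a test:

```python
from datetime import timedelta

# Hypothetical application code under test; these modules are assumptions
# used only to show how a scenario becomes an executable test.
from myapp.auth import PasswordResetService
from myapp.testing import FakeClock, FakeMailer


def test_reset_link_is_emailed_and_expires_in_30_minutes():
    # Given a registered user with a verified email address
    mailer, clock = FakeMailer(), FakeClock()
    service = PasswordResetService(mailer=mailer, clock=clock)
    service.register_user("user@example.com", verified=True)

    # When they request a password reset
    service.request_reset("user@example.com")

    # Then a reset link is emailed that expires in 30 minutes
    message = mailer.last_message_to("user@example.com")
    assert "reset" in message.link
    assert message.link_expires_at == clock.now() + timedelta(minutes=30)
```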
Now you can write the minimum code to make these tests pass. This drives smaller, more focused changes.
4. Invest in Contract Tests
Most merge pain isn’t from your code — it’s from the interfaces between services.
Define interface changes early and codify them with provider/consumer contract tests.
This lets teams integrate frequently without surprises.
Path #2: Committing Directly to the Trunk
This is the cleanest and most powerful version of TBD. It requires discipline, but it produces the most stable delivery pipeline and the least drama.
If the idea of committing straight to main makes people panic, that’s a signal about your current testing process — not a problem with TBD.
Note on regulated environments
If you work in a regulated industry with compliance requirements (SOX, HIPAA, FedRAMP, etc.), Path #1 with short-lived branches is usually the better choice. Short-lived branches provide the audit trails, separation of duties, and documented approval workflows that regulators expect, while still enabling daily integration. See TBD in Regulated Environments for detailed guidance on meeting compliance requirements, and Address Code Review Concerns for how to maintain fast review cycles with short-lived branches.
How to Choose Your Path
Use this rule of thumb:
- If your team fears “breaking everything,” start with short-lived branches.
- If your team collaborates well and writes tests first, go straight to trunk commits.
Both paths require the same skills:
- Smaller work
- Better requirements
- Shared understanding
- Automated tests
- A reliable pipeline
The difference is pace.
Essential TBD Practices
These practices apply to both paths—whether you’re using short-lived branches or committing directly to trunk.
Use Feature Flags the Right Way
Feature flags are one of several evolutionary coding practices that allow you to integrate incomplete work safely. Other methods include branch by abstraction and connect-last patterns. For a comprehensive guide on when to use each approach, see Evolutionary Coding Practices.
Feature flags are not a testing strategy. They are a release strategy.
Every commit to trunk must:
- Build
- Test
- Deploy safely
Flags let you deploy incomplete work without exposing it prematurely. They don’t excuse poor test discipline.
Start Simple: Boolean Flags
You don’t need a sophisticated feature flag system to start. Begin with environment variables or simple config files.
Simple boolean flag example:
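A minimal sketch, assuming the flag is read from an environment variable; the flag name and checkout functions are illustrative:

```python
import os


def new_checkout_enabled() -> bool:
    """Read the flag from an environment variable; the default is off."""
    return os.getenv("FEATURE_NEW_CHECKOUT", "false").lower() == "true"


def legacy_checkout_flow(cart: list) -> str:
    return f"legacy checkout for {len(cart)} items"


def new_checkout_flow(cart: list) -> str:
    return f"new checkout for {len(cart)} items"


def checkout(cart: list) -> str:
    # The incomplete feature is integrated to trunk but hidden until the flag is turned on.
    if new_checkout_enabled():
        return new_checkout_flow(cart)
    return legacy_checkout_flow(cart)
```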
This is enough for most TBD use cases.
Testing Code Behind Flags
Critical: You must test both code paths — flag on and flag off.
If you only test with the flag on, you’ll break production when the flag is off.
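A sketch of covering both states of the hypothetical flag above, assuming that code is saved as `checkout.py` and that pytest is available:

```python
import pytest

from checkout import checkout  # the flag sketch above, assumed to live in checkout.py


@pytest.mark.parametrize("flag_value, expected_prefix", [
    ("true", "new"),      # flag on: the new path must work
    ("false", "legacy"),  # flag off: existing behavior must be preserved
])
def test_checkout_with_flag_on_and_off(monkeypatch, flag_value, expected_prefix):
    monkeypatch.setenv("FEATURE_NEW_CHECKOUT", flag_value)
    assert checkout(["book"]).startswith(expected_prefix)
```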
Two Types of Feature Flags
Feature flags serve two fundamentally different purposes:
Temporary Release Flags (should be removed):
- Control rollout of new features
- Enable gradual deployment
- Allow quick rollback of changes
- Test in production before full release
- Lifecycle: Created for a release, removed once stable (typically 1-4 weeks)
Permanent Configuration Flags (designed to stay):
- User preferences and settings (dark mode, email notifications, etc.)
- Customer-specific features (enterprise vs. free tier)
- A/B testing and experimentation
- Regional or regulatory variations
- Operational controls (read-only mode, maintenance mode)
- Lifecycle: Part of your product’s configuration system
The distinction matters: Temporary release flags create technical debt if not removed. Permanent configuration flags are part of your feature set and belong in your configuration management system.
Most of the feature flags you create for TBD migration will be temporary release flags that must be removed.
Release Flag Lifecycle Management
Temporary release flags are scaffolding, not permanent architecture.
Every temporary release flag should have:
- A creation date
- A purpose
- An expected removal date
- An owner responsible for removal
Track your flags:
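One lightweight way to do this, as a sketch: a flag registry checked in next to the code so stale flags surface in review. The dates, owner, and flag name below are made up for illustration:

```python
from datetime import date

# Each temporary release flag gets an owner, a purpose, and a planned removal date.
RELEASE_FLAGS = {
    "FEATURE_NEW_CHECKOUT": {
        "created": date(2024, 3, 1),       # illustrative dates
        "remove_by": date(2024, 4, 1),
        "owner": "checkout-team",
        "purpose": "gradual rollout of the rewritten checkout flow",
    },
}


def overdue_flags(today: date) -> list[str]:
    """List flags that have outlived their planned removal date."""
    return [name for name, meta in RELEASE_FLAGS.items() if meta["remove_by"] < today]


if __name__ == "__main__":
    print("flags overdue for removal:", overdue_flags(date.today()))
```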
Set reminders to remove flags. Permanent flags multiply complexity and slow you down.
When to Remove a Flag
Remove a flag when:
- The feature is 100% rolled out and stable
- You’re confident you won’t need to roll back
- Usually 1-2 weeks after full deployment
Removal process:
- Set flag to always-on in code
- Deploy and monitor
- If stable for 48 hours, delete the conditional logic entirely
- Remove the flag from configuration
Common Anti-Patterns to Avoid
Don’t:
- Let temporary release flags become permanent (if it’s truly permanent, it should be a configuration option)
- Let release flags accumulate without removal
- Skip testing both flag states
- Use flags to hide broken code
- Create flags for every tiny change
Do:
- Use release flags for large or risky changes
- Remove release flags as soon as the feature is stable
- Clearly document whether each flag is temporary (release) or permanent (configuration)
- Test both enabled and disabled states
- Move permanent feature toggles to your configuration management system
Commit Small and Commit Often
If a change is too large to commit today, split it.
Large commits are failed design upstream, not failed integration downstream.
Use TDD and ATDD to Keep Refactors Safe
Refactoring must not break tests. If it does, you’re testing implementation, not behavior. Behavioral tests are what keep trunk commits safe.
Prioritize Interfaces First
Always start by defining and codifying the contract:
- What is the shape of the request?
- What is the response?
- What error states must be handled?
Interfaces are the highest-risk area. Drive them with tests first. Then work inward.
Getting Started: A Tactical Guide
The initial phase sets the tone. Focus on establishing new habits, not perfection.
Step 1: Team Agreement and Baseline
- Hold a team meeting to discuss the migration
- Agree on initial branch lifetime limit (start with 48 hours if unsure)
- Document current baseline metrics (branch age, merge frequency, build time)
- Identify your slowest-running tests
- Create a list of known integration pain points
- Set up a visible tracker (physical board or digital dashboard) for metrics
Step 2: Test Infrastructure Audit
Focus: Find and fix what will slow you down.
- Run your test suite and time each major section
- Identify slow tests
- Look for:
- Tests with sleeps or arbitrary waits
- Tests hitting external services unnecessarily
- Integration tests that could be contract tests
- Flaky tests masking real issues
Fix or isolate the worst offenders. You don’t need a perfect test suite to start — just one fast enough to not punish frequent integration.
Step 3: First Integrated Change
Pick the smallest possible change:
- A bug fix
- A refactoring with existing test coverage
- A configuration update
- Documentation improvement
The goal is to validate your process, not to deliver a feature.
Execute:
- Create a branch (if using Path #1) or commit directly (if using Path #2)
- Make the change
- Run tests locally
- Integrate to trunk
- Deploy through your pipeline
- Observe what breaks or slows you down
Step 4: Retrospective
Gather the team:
What went well:
- Did anyone integrate faster than before?
- Did you discover useful information about your tests or pipeline?
What hurt:
- What took longer than expected?
- What manual steps could be automated?
- What dependencies blocked integration?
Ongoing commitment:
- Adjust branch lifetime limit if needed
- Assign owners to top 3 blockers
- Commit to integrating at least one change per person
The initial phase won’t feel smooth. That’s expected. You’re learning what needs fixing.
Getting Your Team On Board
Technical changes are easy compared to changing habits and mindsets. Here’s how to build buy-in.
Acknowledge the Fear
When you propose TBD, you’ll hear:
- “We’ll break production constantly”
- “Our code isn’t good enough for that”
- “We need code review on branches”
- “This won’t work with our compliance requirements”
These concerns are valid signals about your current system. Don’t dismiss them.
Instead: “You’re right that committing directly to trunk with our current test coverage would be risky. That’s why we need to improve our tests first.”
Start with an Experiment
Don’t mandate TBD for the whole team immediately. Propose a time-boxed experiment:
The Proposal:
“Let’s try this for two weeks with a single small feature. We’ll track what goes well and what hurts. After two weeks, we’ll decide whether to continue, adjust, or stop.”
What to measure during the experiment:
- How many times did we integrate?
- How long did merges take?
- Did we catch issues earlier or later than usual?
- How did it feel compared to our normal process?
After two weeks: Hold a retrospective. Let the data and experience guide the decision.
Pair on the First Changes
Don’t expect everyone to adopt TBD simultaneously. Instead:
- Identify one advocate who wants to try it
- Pair with them on the first trunk-based changes
- Let them experience the process firsthand
- Have them pair with the next person
Knowledge transfer through pairing works better than documentation.
Address Code Review Concerns
“But we need code review!” Yes. TBD doesn’t eliminate code review.
Options that work:
- Pair or mob programming (review happens in real-time)
- Commit to trunk, review immediately after, fix forward if issues found
- Very short-lived branches (hours, not days) with rapid review SLA
- Pairing on the review itself, so requested changes are made together in real time
The goal is fast feedback, not zero review.
Important
If you’re using short-lived branches that must merge within a day or two, asynchronous code review becomes a bottleneck. Even “fast” async reviews with 2-4 hour turnaround create delays: the reviewer reads code, leaves comments, the author reads comments later, makes changes, and the cycle repeats. Each round trip adds hours or days.
Instead, use synchronous code reviews where the reviewer and author work together in real-time (screen share, pair at a workstation, or mob). This eliminates communication delays through review comments. Questions get answered immediately, changes happen on the spot, and the code merges the same day.
If your team can’t commit to synchronous reviews or pair/mob programming, you’ll struggle to maintain short branch lifetimes.
Handle Skeptics and Blockers
You’ll encounter people who don’t want to change. Don’t force it.
Instead:
- Let them observe the experiment from the outside
- Share metrics and outcomes transparently
- Invite them to pair for one change
- Let success speak louder than arguments
Some people need to see it working before they believe it.
Get Management Support
Managers often worry about:
- Reduced control
- Quality risks
- Slower delivery (ironically)
Address these with data:
- Show branch age metrics before/after
- Track cycle time improvements
- Demonstrate faster feedback on defects
- Highlight reduced merge conflicts
Frame TBD as a risk reduction strategy, not a risky experiment.
Working in a Multi-Team Environment
Migrating to TBD gets complicated when you depend on teams still using long-lived branches. Here’s how to handle it.
The Core Problem
You want to integrate daily. Your dependency team integrates weekly or monthly. Their API changes surprise you during their big-bang merge.
You can’t force other teams to change. But you can protect yourself.
Strategy 1: Consumer-Driven Contract Tests
Define the contract you need from the upstream service and codify it in tests that run in your pipeline.
Example using Pact:
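A sketch using the pact-python consumer DSL; the service names, endpoint, and expected body are assumptions chosen for illustration:

```python
import atexit

import requests
from pact import Consumer, Provider

# Our expectations of the upstream UserService, written from the consumer's side.
pact = Consumer("OrderService").has_pact_with(Provider("UserService"))
pact.start_service()
atexit.register(pact.stop_service)


def test_get_user_contract():
    expected = {"id": 42, "name": "Ada Lovelace", "active": True}

    (pact
     .given("user 42 exists")
     .upon_receiving("a request for user 42")
     .with_request("get", "/users/42")
     .will_respond_with(200, body=expected))

    # The request goes to Pact's mock service, which records the interaction
    # and later verifies it against the real provider.
    with pact:
        response = requests.get(f"{pact.uri}/users/42")

    assert response.json() == expected
```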
This test runs against your expectations of the API, not the actual service. When the upstream team changes their API, your contract test fails before you integrate their changes.
Share the contract:
- Publish your contract to a shared repository
- Upstream team runs provider verification against your contract
- If they break your contract, they know before merging
Strategy 2: API Versioning with Backwards Compatibility
If you control the shared service:
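A minimal sketch of serving both versions side by side, using FastAPI purely as an illustration; the endpoints and the split-name response in v2 (which mirrors the expand-contract scenario later in this guide) are assumptions:

```python
from fastapi import FastAPI

app = FastAPI()


# Old contract: consumers still on v1 keep working unchanged.
@app.get("/v1/users/{user_id}")
def get_user_v1(user_id: int) -> dict:
    return {"id": user_id, "name": "Ada Lovelace"}


# New contract: deployed alongside v1 so consumers can migrate one at a time.
@app.get("/v2/users/{user_id}")
def get_user_v2(user_id: int) -> dict:
    return {"id": user_id, "first_name": "Ada", "last_name": "Lovelace"}
```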
Migration path:
- Deploy new version alongside old version
- Update consumers one by one
- After all consumers migrated, deprecate old version
- Remove old version after deprecation period
Strategy 3: Strangler Fig Pattern
When you depend on a team that won’t change:
- Create an anti-corruption layer between your code and theirs
- Define your ideal interface in the adapter
- Let the adapter handle their messy API
Now your code depends on your interface, not theirs. When they change, you only update the adapter.
Strategy 4: Feature Toggles for Cross-Team Coordination
When multiple teams need to coordinate a release:
- Each team develops behind feature flags
- Each team integrates to trunk continuously
- Features remain disabled until coordination point
- Enable flags in coordinated sequence
This decouples development velocity from release coordination.
When You Can’t Integrate with Dependencies
If upstream dependencies block you from integrating daily:
Short term:
- Use contract tests to detect breaking changes early
- Create adapters to isolate their changes
- Document the integration pain as a business cost
Long term:
- Advocate for those teams to adopt TBD
- Share your success metrics
- Offer to help them migrate
You can’t force other teams to change. But you can demonstrate a better way and make it easier for them to follow.
TBD in Regulated Environments
Regulated industries face legitimate compliance requirements: audit trails, change traceability, separation of duties, and documented approval processes. These requirements often lead teams to believe trunk-based development is incompatible with compliance. This is a misconception.
TBD is about integration frequency, not about eliminating controls. You can meet compliance requirements while still integrating at least daily.
The Compliance Concerns
Common regulatory requirements that seem to conflict with TBD:
Audit Trail and Traceability
- Every change must be traceable to a requirement, ticket, or change request
- Changes must be attributable to specific individuals
- History of what changed, when, and why must be preserved
Separation of Duties
- The person who writes code shouldn’t be the person who approves it
- Changes must be reviewed before reaching production
- No single person should have unchecked commit access
Change Control Process
- Changes must follow a documented approval workflow
- Risk assessment before deployment
- Rollback capability for failed changes
Documentation Requirements
- Changes must be documented before implementation
- Testing evidence must be retained
- Deployment procedures must be repeatable and auditable
Short-Lived Branches: The Compliant Path to TBD
Path #1 from this guide—short-lived branches—directly addresses compliance concerns while maintaining the benefits of TBD.
Short-lived branches mean:
- Branches live for hours to 2 days maximum, not weeks or months
- Integration happens at least daily
- Pull requests are small, focused, and fast to review
- Review and approval happen within the branch lifetime
This approach satisfies both regulatory requirements and continuous integration principles.
How Short-Lived Branches Meet Compliance Requirements
Audit Trail:
Every commit references the change ticket, for example: “JIRA-1234: Add audit logging for authentication events”.
Modern Git hosting platforms (GitHub, GitLab, Bitbucket) automatically track:
- Who created the branch
- Who committed each change
- Who reviewed and approved
- When it merged
- Complete diff history
Separation of Duties:
Use pull request workflows:
- Developer creates branch from trunk
- Developer commits changes (same day)
- Second person reviews and approves (within 24 hours)
- Automated checks validate (tests, security scans, compliance checks)
- Merge to trunk after approval
- Automated deployment with gates
This provides stronger separation of duties than long-lived branches because:
- Reviews happen while context is fresh
- Reviewers can actually understand the small changeset
- Automated checks enforce policies consistently
Change Control Process:
Branch protection rules (required reviewers, required status checks, and no direct pushes to trunk) enforce your process. This ensures:
- No direct commits to trunk (except in documented break-glass scenarios)
- Required approvals before merge
- Automated validation gates
- Audit log of every merge decision
Documentation Requirements:
Pull request templates enforce documentation by prompting for the change ticket, risk assessment, testing evidence, and rollback plan on every change.
What “Short-Lived” Means in Practice
Hours, not days:
- Simple bug fixes: 2-4 hours
- Small feature additions: 4-8 hours
- Refactoring: 1-2 days
Maximum 2 days: If a branch can’t merge within 2 days, the work is too large. Decompose it further or use feature flags to integrate incomplete work safely.
Daily integration requirement: Even if the feature isn’t complete, integrate what you have:
- Behind a feature flag if needed
- As internal APIs not yet exposed
- As tests and interfaces before implementation
Compliance-Friendly Tooling
Modern platforms provide compliance features built-in:
Git Hosting (GitHub, GitLab, Bitbucket):
- Immutable audit logs
- Branch protection rules
- Required approvals
- Status check enforcement
- Signed commits for authenticity
CI/CD Platforms:
- Deployment approval gates
- Audit trails of every deployment
- Environment-specific controls
- Automated compliance checks
Feature Flag Systems:
- Change deployment without code deployment
- Gradual rollout controls
- Instant rollback capability
- Audit log of flag changes
Secrets Management:
- Vault, AWS Secrets Manager, Azure Key Vault
- Audit log of secret access
- Rotation policies
- Environment isolation
Example: Compliant Short-Lived Branch Workflow
Monday 9 AM: Developer creates branch feature/JIRA-1234-add-audit-logging from trunk.
Monday 9 AM - 2 PM: Developer implements audit logging for user authentication events. Commits reference JIRA-1234. Automated tests run on each commit.
Monday 2 PM: Developer opens pull request:
- Title: “JIRA-1234: Add audit logging for authentication events”
- Description includes risk assessment, testing evidence, rollback plan
- Automated checks run: tests, security scan, compliance validation
- Code owner automatically assigned for review
Monday 3 PM: Code owner reviews (5-10 minutes—change is small and focused). Suggests minor improvement.
Monday 3:30 PM: Developer addresses feedback, pushes update.
Monday 4 PM: Code owner approves. All automated checks pass. Developer merges to trunk.
Monday 4:05 PM: CI/CD pipeline deploys to staging automatically. Automated smoke tests pass.
Monday 4:30 PM: Deployment gate requires manual approval for production. Tech lead approves based on risk assessment.
Monday 4:35 PM: Automated deployment to production. Audit log captures: what deployed, who approved, when, what checks passed.
Total time: 7.5 hours from branch creation to production.
Full compliance maintained. Full audit trail captured. Daily integration achieved.
When Long-Lived Branches Hide Compliance Problems
Ironically, long-lived branches often create compliance risks:
Stale Reviews: Reviewing a 3-week-old, 2000-line pull request is performative, not effective. Reviewers rubber-stamp because they can’t actually understand the changes.
Integration Risk: Big-bang merges after weeks introduce unexpected behavior. The change that was reviewed isn’t the change that actually deployed (due to merge conflicts and integration issues).
Delayed Feedback: Problems discovered weeks after code was written are expensive to fix and hard to trace to requirements.
Audit Trail Gaps: Long-lived branches often have messy commit history, force pushes, and unclear attribution. The audit trail is polluted.
Regulatory Examples Where Short-Lived Branches Work
Financial Services (SOX, PCI-DSS):
- Short-lived branches with required approvals
- Automated security scanning on every PR
- Separation of duties via required reviewers
- Immutable audit logs in Git hosting platform
- Feature flags for gradual rollout and instant rollback
Healthcare (HIPAA):
- Pull request templates documenting PHI handling
- Automated compliance checks for data access patterns
- Required security review for any PHI-touching code
- Audit logs of deployments
- Environment isolation enforced by CI/CD
Government (FedRAMP, FISMA):
- Branch protection requiring government code owner approval
- Automated STIG compliance validation
- Signed commits for authenticity
- Deployment gates requiring authority to operate
- Complete audit trail from commit to production
The Real Choice
The question isn’t “TBD or compliance.”
The real choice is: compliance theater with long-lived branches and risky big-bang merges, or actual compliance with short-lived branches and safe daily integration.
Short-lived branches provide:
- Better audit trails (small, traceable changes)
- Better separation of duties (reviewable changes)
- Better change control (automated enforcement)
- Lower risk (small, reversible changes)
- Faster feedback (problems caught early)
That’s not just compatible with compliance. That’s better compliance.
What Will Hurt (At First)
When you migrate to TBD, you’ll expose every weakness you’ve been avoiding:
- Slow tests
- Unclear requirements
- Fragile integration points
- Architecture that resists small changes
- Gaps in automated validation
- Long manual processes in the value stream
This is not a regression. This is the point.
Problems you discover early are problems you can fix cheaply.
Common Pitfalls to Avoid
Teams migrating to TBD often make predictable mistakes. Here’s how to avoid them.
Pitfall 1: Treating TBD as Just a Branch Renaming Exercise
The mistake: Renaming `develop` to `main` and calling it TBD.
Why it fails: You’re still doing long-lived feature branches, just with different names. The fundamental integration problems remain.
What to do instead: Focus on integration frequency, not branch names. Measure time-to-merge, not what you call your branches.
Pitfall 2: Merging Daily Without Actually Integrating
The mistake: Committing to trunk every day, but your code doesn’t interact with anyone else’s work. Your tests don’t cover integration points.
Why it fails: You’re batching integration for later. When you finally connect your component to the rest of the system, you discover incompatibilities.
What to do instead: Ensure your tests exercise the boundaries between components. Use contract tests for service interfaces. Integrate at the interface level, not just at the source control level.
Pitfall 3: Skipping Test Investment
The mistake: “We’ll adopt TBD first, then improve our tests later.”
Why it fails: Without fast, reliable tests, frequent integration is terrifying. You’ll revert to long-lived branches because trunk feels unsafe.
What to do instead: Invest in test infrastructure first. Make your slowest tests faster. Fix flaky tests. Only then increase integration frequency.
Pitfall 4: Using Feature Flags as a Testing Escape Hatch
The mistake: “It’s fine to commit broken code as long as it’s behind a flag.”
Why it fails: Untested code is still untested, flag or no flag. When you enable the flag, you’ll discover the bugs you should have caught earlier.
What to do instead: Test both flag states. Flags hide features from users, not from your test suite.
Pitfall 5: Keeping Flags Forever
The mistake: Creating feature flags and never removing them. Your codebase becomes a maze of conditionals.
Why it fails: Every permanent flag doubles your testing surface area and increases complexity. Eventually, no one knows which flags do what.
What to do instead: Set a removal date when creating each flag. Track flags like technical debt. Remove them aggressively once features are stable.
Pitfall 6: Forcing TBD on an Unprepared Team
The mistake: Mandating TBD before the team understands why or how it works.
Why it fails: People resist changes they don’t understand or didn’t choose. They’ll find ways to work around it or sabotage it.
What to do instead: Start with volunteers. Run experiments. Share results. Let success create pull, not push.
Pitfall 7: Ignoring the Need for Small Changes
The mistake: Trying to do TBD while still working on features that take weeks to complete.
Why it fails: If your work naturally takes weeks, you can’t integrate daily. You’ll create work-in-progress commits that don’t add value.
What to do instead: Learn to decompose work into smaller, independently valuable increments. This is a skill that must be developed.
Pitfall 8: No Clear Definition of “Done”
The mistake: Integrating code that “works on my machine” without validating it in a production-like environment.
Why it fails: Integration bugs don’t surface until deployment. By then, you’ve integrated many other changes, making root cause analysis harder.
What to do instead: Define “integrated” as “deployed to a staging environment and validated.” Your pipeline should do this automatically.
Pitfall 9: Treating Trunk as Unstable
The mistake: “Trunk is where we experiment. Stable code goes in release branches.”
Why it fails: If trunk can’t be released at any time, you don’t have CI. You’ve just moved your integration problems to a different branch.
What to do instead: Trunk must always be production-ready. Use feature flags for incomplete work. Fix broken builds immediately.
Pitfall 10: Forgetting That TBD is a Means, Not an End
The mistake: Optimizing for trunk commits without improving cycle time, quality, or delivery speed.
Why it fails: TBD is valuable because it enables fast feedback and low-cost changes. If those aren’t improving, TBD isn’t working.
What to do instead: Measure outcomes, not activities. Track cycle time, defect rates, deployment frequency, and time to restore service.
When to Pause or Pivot
Sometimes TBD migration stalls or causes more problems than it solves. Here’s how to tell if you need to pause and what to do about it.
Signs You’re Not Ready Yet
Red flag 1: Your test suite takes hours to run. If developers can’t get feedback in minutes, they can’t integrate frequently. Forcing TBD now will just slow everyone down.
What to do: Pause the TBD migration. Invest 2-4 weeks in making tests faster. Parallelize test execution. Remove or optimize the slowest tests. Resume TBD when feedback takes less than 10 minutes.
Red flag 2: More than half your tests are flaky. If tests fail randomly, developers will ignore failures. You’ll integrate broken code without realizing it.
What to do: Stop adding new features. Spend one sprint fixing or deleting flaky tests. Track flakiness metrics. Only resume TBD when you trust your test results.
Red flag 3: Production incidents increased significantly. If TBD caused a spike in production issues, something is wrong with your safety net.
What to do: Revert to short-lived branches (48-72 hours) temporarily. Analyze what’s escaping to production. Add tests or checks to catch those issues. Resume direct-to-trunk when the safety net is stronger.
Red flag 4: The team is in constant conflict. If people are fighting about the process, frustrated daily, or actively working around it, you’ve lost the team.
What to do: Hold a retrospective. Listen to concerns without defending TBD. Identify the top 3 pain points. Address those first. Resume TBD migration when the team agrees to try again.
Signs You’re Doing It Wrong (But Can Fix It)
Yellow flag 1: Daily commits, but monthly integration. You’re committing to trunk, but your code doesn’t connect to the rest of the system until the end.
What to fix: Focus on interface-level integration. Ensure your tests exercise boundaries between components. Use contract tests.
Yellow flag 2: Trunk is broken often. If trunk is red more than 5% of the time, something’s wrong with your testing or commit discipline.
What to fix: Make “fix trunk immediately” the top priority. Consider requiring local tests to pass before pushing. Add pre-commit hooks if needed.
Yellow flag 3: Feature flags piling up. If you have more than 5 active flags, you’re not cleaning up after yourself.
What to fix: Set a team rule: “For every new flag created, remove an old one.” Dedicate time each sprint to flag cleanup.
How to Pause Gracefully
If you need to pause:
- Communicate clearly: “We’re pausing TBD migration for two weeks to fix our test infrastructure. This isn’t abandoning the goal.”
- Set a specific resumption date: Don’t let “pause” become “quit.” Schedule a date to revisit.
- Fix the blockers: Use the pause to address the specific problems preventing success.
- Retrospect and adjust: When you resume, what will you do differently?
Pausing isn’t failure. Pausing to fix the foundation is smart.
What “Good” Looks Like
You know TBD is working when:
- Branches live for hours, not days
- Developers collaborate early instead of merging late
- Product participates in defining behaviors, not just writing stories
- Tests run fast enough to integrate frequently
- Deployments are boring
- You can fix production issues with the same process you use for normal work
When your deployment process enables emergency fixes without special exceptions, you’ve reached the real payoff: lower cost of change, which makes everything else faster, safer, and more sustainable.
Concrete Examples and Scenarios
Theory is useful. Examples make it real. Here are practical scenarios showing how to apply TBD principles.
Scenario 1: Breaking Down a Large Feature
Problem: You need to build a user notification system with email, SMS, and in-app notifications. Estimated: 3 weeks of work.
Old approach (GitFlow):
Create a feature/notifications branch. Work for three weeks. Submit a massive pull request. Spend days in code review and merge conflicts.
TBD approach:
Week 1:
- Day 1: Define the notification interface, commit to trunk. This compiles but doesn’t do anything yet. That’s fine.
- Day 2: Add an in-memory implementation for testing. Now other teams can use the interface in their code and tests.
- Day 3-5: Implement email notifications behind a feature flag (see the sketch after this list). Commit daily. Deploy. Flag is off in production.
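A sketch of what those first few commits might contain; the class names and flag are illustrative assumptions, not prescribed by the original:

```python
import os
from abc import ABC, abstractmethod


# Day 1: the interface. It compiles, ships to trunk, and does nothing yet.
class Notifier(ABC):
    @abstractmethod
    def send(self, user_id: str, message: str) -> None: ...


# Day 2: an in-memory implementation so other code and tests can depend on the interface.
class InMemoryNotifier(Notifier):
    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    def send(self, user_id: str, message: str) -> None:
        self.sent.append((user_id, message))


# Day 3-5: a real implementation, selected behind a flag that stays off in production.
class EmailNotifier(Notifier):
    def send(self, user_id: str, message: str) -> None:
        print(f"emailing {user_id}: {message}")  # placeholder for a real email client


def make_notifier() -> Notifier:
    if os.getenv("FEATURE_EMAIL_NOTIFICATIONS", "false") == "true":
        return EmailNotifier()
    return InMemoryNotifier()
```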
Week 2:
- Add SMS notifications (same pattern: interface, implementation, feature flag)
- Enable email notifications for internal users only
- Iterate based on feedback
Week 3:
- Add in-app notifications
- Roll out email and SMS to all users
- Remove flags for email once stable
Result: Integrated 12-15 times instead of once. Each integration was small and low-risk.
Scenario 2: Database Schema Change
Problem: You need to split the `users.name` column into `first_name` and `last_name`.
Old approach: Update schema, update all code, deploy everything at once. Hope nothing breaks.
TBD approach (expand-contract pattern):
Step 1: Expand (Day 1). Add new columns without removing the old one:
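A sketch of the expand step as a migration run through a DB-API cursor; exact SQL syntax and migration tooling depend on your database:

```python
def upgrade(conn) -> None:
    """Expand: add the new columns while leaving users.name untouched."""
    cur = conn.cursor()
    cur.execute("ALTER TABLE users ADD COLUMN first_name TEXT")
    cur.execute("ALTER TABLE users ADD COLUMN last_name TEXT")
    conn.commit()
```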
Commit and deploy. Application still uses name column. No breaking change.
Step 2: Dual writes (Day 2-3). Update the write path to populate both old and new columns:
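A sketch of the dual-write step; the `save_user` function, the naive name split, and the `?` placeholder style (SQLite-style) are illustrative:

```python
def save_user(conn, user_id: int, full_name: str) -> None:
    """Write both the old and the new representation until all readers migrate."""
    first, _, last = full_name.partition(" ")
    cur = conn.cursor()
    cur.execute(
        "UPDATE users SET name = ?, first_name = ?, last_name = ? WHERE id = ?",
        (full_name, first, last, user_id),
    )
    conn.commit()
```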
Commit and deploy. Now new data populates both formats.
Step 3: Backfill (Day 4). Migrate existing data with a background job that fills `first_name` and `last_name` for rows where they are still empty. Commit and deploy.
Step 4: Read from new columns (Day 5). Update the read path behind a feature flag. Deploy and gradually enable the flag.
Step 5: Contract (Week 2). Once all reads use the new columns and the flag is removed, drop the old `name` column.
Result: Five deployments instead of one big-bang change. Each step was reversible. Zero downtime.
Scenario 3: Refactoring Without Breaking the World
Problem: Your authentication code is a mess. You want to refactor it without breaking production.
TBD approach:
Day 1: Characterization tests. Write tests that capture current behavior (warts and all):
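A sketch of characterization tests with pytest; `legacy_login` and the quirky behaviors asserted here are invented for illustration:

```python
# Characterization tests pin down what the legacy code actually does today,
# including behavior we may not like, so refactoring cannot silently change it.
from myapp.legacy_auth import legacy_login  # hypothetical module under test


def test_login_trims_whitespace_around_username():
    assert legacy_login("  alice  ", "correct-password").succeeded


def test_login_treats_unknown_user_and_bad_password_identically():
    # Today both cases return the same generic error; capture that, warts and all.
    unknown = legacy_login("nobody", "whatever")
    bad_pw = legacy_login("alice", "wrong-password")
    assert unknown.error == bad_pw.error == "invalid credentials"
```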
These tests document how the system actually works. Commit.
Day 2-3: Strangler fig pattern. Create a new implementation alongside the old one:
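A sketch of running the new implementation next to the old one behind a flag, assuming both are importable and that a flag lists the endpoints already migrated; all names are illustrative:

```python
import os

from myapp.legacy_auth import legacy_login   # existing, messy implementation (hypothetical)
from myapp.modern_auth import modern_login   # new implementation, built incrementally (hypothetical)


def _modern_enabled_for(endpoint: str) -> bool:
    # Comma-separated list of endpoints that have been migrated, e.g. "login,refresh".
    migrated = os.getenv("AUTH_MODERN_ENDPOINTS", "").split(",")
    return endpoint in migrated


def login(username: str, password: str):
    """Both implementations live side by side; traffic shifts endpoint by endpoint."""
    if _modern_enabled_for("login"):
        return modern_login(username, password)
    return legacy_login(username, password)
```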
Commit with flag off. Old behavior unchanged.
Day 4-7: Migrate piece by piece. Enable modern auth for one endpoint at a time. Commit daily. Monitor each endpoint.
Week 2: Remove old code. Once all endpoints use modern auth and it has been stable for a week, delete the legacy code entirely.
Result: Continuous refactoring without a “big rewrite” branch. Production was never at risk.
Scenario 4: Working with External API Changes
Problem: A third-party API you depend on is changing their response format next month.
TBD approach:
Week 1: Adapter pattern. Create an adapter that normalizes both old and new formats:
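A sketch of such an adapter, assuming the provider is moving a user payload from a single `name` field to `first_name`/`last_name`; the normalized `User` type and field names are assumptions:

```python
from dataclasses import dataclass


@dataclass
class User:
    first_name: str
    last_name: str


def parse_user(payload: dict) -> User:
    """Normalize both the current and the upcoming response formats."""
    if "first_name" in payload:                      # new format
        return User(payload["first_name"], payload["last_name"])
    first, _, last = payload["name"].partition(" ")  # old format: single "name" field
    return User(first, last)
```

Once the provider finishes migrating, only the new-format branch is needed, which is the Week 4 cleanup below.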
Commit. Your code now works with both formats.
Week 2-3: Wait for the third-party API to migrate. Your code keeps working.
Week 4 (after API migration): Simplify the adapter to handle only the new format and delete the old-format code path.
Result: No coupling between your deployment schedule and the external API migration. Zero downtime.
References and Further Reading
Trunk-Based Development
Core Resources:
- trunkbaseddevelopment.com - Comprehensive guide by Paul Hammant
- “Continuous Delivery” by Jez Humble and David Farley - Foundational text on CD practices
- Martin Fowler on Feature Toggles - Deep dive into feature flag patterns
Testing Practices
ATDD and BDD:
- “Specification by Example” by Gojko Adzic - Collaborative test writing
- “The Cucumber Book” by Matt Wynne and Aslak Hellesøy - Practical BDD guide
- Three Amigos sessions - Collaborative requirements discovery
Test-Driven Development:
- “Test-Driven Development: By Example” by Kent Beck - TDD fundamentals
- “Growing Object-Oriented Software, Guided by Tests” by Steve Freeman and Nat Pryce - TDD at scale
Contract Testing:
- Pact Documentation - Consumer-driven contract testing
- Spring Cloud Contract - For JVM ecosystems
Patterns for Incremental Change
Database Migrations:
- “Refactoring Databases” by Scott Ambler and Pramod Sadalage - Expand-contract pattern
- Evolutionary Database Design - Martin Fowler
Legacy Code:
- “Working Effectively with Legacy Code” by Michael Feathers - Characterization tests and strangler patterns
- Strangler Fig Application - Incremental rewrites
Team Dynamics and Change Management
- “Accelerate” by Nicole Forsgren, Jez Humble, and Gene Kim - Data on what drives software delivery performance
- “Team Topologies” by Matthew Skelton and Manuel Pais - Organizing teams for fast flow
- State of DevOps Reports - Annual research on delivery practices
Continuous Integration
- “Continuous Integration: Improving Software Quality and Reducing Risk” by Paul Duvall
- ThoughtWorks on CI - Foundational practices
- Continuous Delivery Foundation - Community and standards
Communities and Discussions
- DevOps subreddit - Practitioner discussions
- Continuous Delivery Slack - Active community
- Software Engineering Stack Exchange - Q&A on practices
Final Thought
Migrating from GitFlow to TBD isn’t a matter of changing your branching strategy. It’s a matter of changing your thinking.
Stop optimizing for isolation.
Start optimizing for feedback.
Small, tested, integrated changes — delivered continuously — will always outperform big batches delivered occasionally.
That’s why teams migrate to TBD. Not because it’s trendy, but because it’s the only path to real continuous integration and continuous delivery.