By Robert L. Nord, Ipek Ozkaya - Software Engineering Institute
There are four parts to our discussion of Agile at scale (AAS). First, we set the context by answering the question, “Why is AAS challenging?” The ten primary AAS technical best practices follow. We then briefly address how an organization can prepare for and achieve effective results from these best practices. We conclude with a list of selected resources to help you learn more. We have also added links to various sources to help amplify a point; be mindful that such sources may occasionally include material that differs from some of the recommendations below.
Every organization is different; judgment is required to implement these practices in a way that provides benefit to your organization. In particular, be mindful of your mission, goals, existing processes, and culture. All practices have limitations—there is no “one size fits all.” To gain the most benefit, you need to evaluate each practice for its appropriateness and decide how to adapt it, striving for an implementation in which the practices reinforce each other. Also, consider additional best practice collections (such as the one from the GAO that is referenced at the end of this webpage). Monitor your adoption and use of these practices and adjust as appropriate.
These practices are certainly not complete—they are a work in progress. For example, as future additions we plan to include webpages addressing management and acquisition best practices for AAS.
And, of course, we welcome your feedback (use the comments section at the end).
Why is AAS challenging?
Agile practices, derived from a set of foundational principles, have been used for well over a decade and have enjoyed much success and broad adoption in the commercial sector with the net result that development teams have gotten better at building software. Reasons include: increased visibility into a project and the emerging product, increased empowerment of development teams, the ability for customers and end users to interact early with executable code, and the direct engagement of the customer or product owner in the project to provide a greater sense of shared responsibility.
But business and mission goals are larger than a single development team and thus applying AAS is challenging along these dimensions:
Team size. What happens when Agile practices are used in a 100-person (or larger) development team? What happens when the development team needs to interact with the rest of the business such as quality assurance, system integration, project management, and marketing to get input into product development and collaborate on the end-to-end delivery of the product? Scrum and Agile methods such as extreme programming (XP) are typically used by small teams of at most 7-10 people. Larger teams require orchestration of both multiple (sub)teams and cross-functional roles beyond development.
Complexity. Large-scale systems are often large in scope in terms of the number of features, the amount of new technology being introduced, the number of independent systems being integrated, the number and types of users to be accommodated, and the number of external systems with which the system communicates. Does the system have stringent quality-of-service requirements (e.g., strict real-time, high-reliability, and security requirements)? Are there multiple external stakeholders and interfaces? Typically, such systems must go through rigorous verification and validation (V&V), which makes the frequent-deployment practices used in Agile development challenging.
Duration. How long will the system be in development? How long in operations and sustainment? Larger systems are typically in development and operation for longer than the products to which Agile development is usually applied, which requires attention to future changes and possible redesigns as well as to maintaining several delivered versions. This is a focus that some Agile teams would consider antithetical to the Agile principles. Answers to these questions affect the choice of quality attributes supporting system maintenance and evolution goals that are key to system success over the long term.
AAS Best Practices:
Use Scrum of Scrums carefully when coordinating multiple teams. Scrum is the most often used Agile method in today’s environment, and primarily involves team management practices. In its simplest instantiation, a Scrum development environment consists of a single Scrum team with the skills, authority, and knowledge required to specify requirements, architect, design, code, and test the system. As systems grow in size and complexity, the single-team mode may no longer meet development demands. If a project has already decided to use a Scrum-like project-management technique, the Scrum approach can be extended to managing multiple teams with a “Scrum of Scrums,” a special coordination team whose role is to (1) define what information will flow between and among development teams (addressing inter-team dependencies and communication) and (2) identify, analyze, and resolve coordination issues and risks that have potentially broader consequences (e.g., for the project as a whole). A Scrum of Scrums typically consists of members from each team chosen to address end-to-end functionality or cross-cutting concerns such as user interface design, architecture, integration testing, and deployment. Creating a special team responsible for inter-team coordination helps ensure that the right information, including measurements, issues, and risks, is communicated between and among teams. But when the Scrum of Scrums team itself grows large, care must be taken not to overwhelm it. This can be accomplished by organizing teams, and the Scrum of Scrums team itself, along feature and service affinities. We further discuss this approach to organizing teams in our Feature-Based Development and System Decomposition practice. Such orchestration is essential to managing larger teams to success, including Agile teams.
Use an architectural runway to manage technical complexity. Stringent safety or mission-critical requirements increase technical complexity and risk. Technical complexity arises when the work takes longer than a single iteration or release cycle and cannot be easily partitioned and allocated to different technical competencies (or teams) to independently and concurrently develop their part of a solution. Successful approaches to managing technical complexity include having the most-urgent system or software architecture features well defined early (or even pre-defined at the organizational level, e.g., as infrastructure platforms or software product lines).
The Agile term for such pre-staging of architectural features that can be leveraged by development teams is "architectural runway." The architectural runway has the goal of providing the degree of stability required to support future iterations of development. This stability is particularly important to the successful operation of multiple teams. A system or software architect decides which architectural features must be developed first by identifying the architecturally significant requirements for the system. By initially defining (and continuously extending) the architectural runway, development teams are able to iteratively develop customer-desired features that leverage that runway and benefit from the quality attributes they confer (e.g., security).
Having a defined architectural runway enables technical risks to be uncovered earlier, thereby helping to manage system complexity (no late surprises). The consequence of uncovering underlying architectural concerns such as security, performance, or availability late (that is, after several iterations have passed) is often a significant rework rate and schedule delay. Delivering functionality is more predictable when the infrastructure for the new features is in place, so it is important to maintain a continual focus on the architecturally significant requirements and to estimate when the development teams will depend on having code that implements an architectural solution.
Align Feature-Based Development and System Decomposition. A common approach in Agile teams is to implement a feature (or user story) in all the components of the system. This gives the team the ability to focus on something that has stakeholder value. The team controls every piece of implementation for that feature and therefore they do not have to wait until someone else outside the team has finished some required work. We call this vertical alignment because every component of the system required for realizing the feature is implemented only to the degree required by the team.
However, system decomposition could also be horizontal, based on the architectural needs of the system, focusing on common services and variability mechanisms promoting reuse.
The goal of creating a feature-based development and system decomposition approach is to provide flexibility in aligning teams horizontally, vertically, or in combination, while minimizing coupling to ensure progress. Although organizations create products in very different domains (from embedded systems to enterprise systems), similar architecture patterns and strategies emerge when teams need to balance rapid progress with stability. The teams create a platform containing commonly used services and development environments, either as frameworks or as platform plug-ins, to enable fast feature-based development.
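As an illustration of combining the two alignments, the minimal sketch below has a vertically aligned feature depend on a horizontal platform service only through a narrow interface. Every name in it (`AuditLog`, `InMemoryAuditLog`, `TransferFunds`) is hypothetical and chosen for the example; it is one possible way to limit coupling, not a prescribed design.

```python
from typing import Protocol


class AuditLog(Protocol):
    """Horizontal platform service (hypothetical), shared by all feature teams."""

    def record(self, event: str) -> None:
        ...


class InMemoryAuditLog:
    """One platform implementation; feature code never names this class."""

    def __init__(self):
        self.events = []

    def record(self, event):
        self.events.append(event)


class TransferFunds:
    """Vertical feature (hypothetical): the feature team owns all of this logic
    and is coupled to the platform only through the AuditLog interface."""

    def __init__(self, audit: AuditLog, balances):
        self.audit = audit
        self.balances = dict(balances)

    def execute(self, source, target, amount):
        # Reject non-positive amounts and transfers the source cannot cover.
        if amount <= 0 or self.balances.get(source, 0) < amount:
            self.audit.record(f"rejected {source}->{target} {amount}")
            return False
        self.balances[source] -= amount
        self.balances[target] = self.balances.get(target, 0) + amount
        self.audit.record(f"completed {source}->{target} {amount}")
        return True


audit = InMemoryAuditLog()
feature = TransferFunds(audit, {"alice": 100, "bob": 0})
first = feature.execute("alice", "bob", 60)
second = feature.execute("alice", "bob", 60)
print(first, second, feature.balances, audit.events)
```

Because the feature sees only the interface, the platform team could replace the in-memory log with another implementation without the feature team having to wait or change code.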
Use quality-attribute scenarios to clarify architecturally significant requirements. Scrum emphasizes customer-facing requirements—features that end users dwell on—and indeed these are important to success. But when the focus on end-user functionality becomes exclusive, the underlying architecturally significant requirements can go unnoticed.
Superior practice is to elicit, document, communicate, and validate underlying quality-attribute scenarios during development of the architectural runway. This becomes even more important at scale when projects often have significant longevity and sustainability needs. Early in the project, evaluate the quality-attribute scenarios to determine which architecturally significant requirements need to be addressed in early development increments (see architectural runway practice above) or whether strategic shortcuts can be taken to deliver end-user capability more quickly.
For example, will the system really have to scale up to a million users immediately, or is this actually a trial product? There are different considerations depending on the domain; for example, IT systems use existing frameworks, so understanding the quality-attribute scenarios can help developers understand which architecturally significant requirements might already be addressed adequately within existing frameworks (including open-source systems) or existing legacy systems that can be leveraged during software development. Similarly, such systems have to deal with changing requirements in security and deployment environments, which makes architecturally significant requirements a top priority when dealing with scale.
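One way to make this early evaluation concrete is to record each scenario as structured data and filter for the ones that belong in the architectural runway. The sketch below assumes the six-part scenario form (source, stimulus, artifact, environment, response, response measure) commonly used in the software architecture literature; the class, the two boolean flags, the `runway_candidates` helper, and all example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityAttributeScenario:
    """A quality-attribute scenario in six-part form (illustrative structure)."""
    attribute: str
    source: str
    stimulus: str
    artifact: str
    environment: str
    response: str
    response_measure: str
    # Assumed triage flags, set during early evaluation of the scenario.
    needed_in_early_increments: bool = True
    covered_by_existing_framework: bool = False


def runway_candidates(scenarios):
    """Scenarios needed early that no existing framework already addresses."""
    return [
        s for s in scenarios
        if s.needed_in_early_increments and not s.covered_by_existing_framework
    ]


SCENARIOS = [
    QualityAttributeScenario(
        "scalability", "end users", "one million concurrent sessions",
        "web tier", "peak load", "all requests are served",
        "95% of responses under 2 seconds",
        needed_in_early_increments=False,  # trial product: defer
    ),
    QualityAttributeScenario(
        "security", "unauthenticated client", "request for protected data",
        "service API", "normal operation", "request is denied and logged",
        "no protected data is disclosed",
        covered_by_existing_framework=True,  # handled by the chosen framework
    ),
    QualityAttributeScenario(
        "availability", "hardware fault", "primary node fails",
        "data store", "normal operation", "fail over to a standby node",
        "service restored within 30 seconds",
    ),
]

candidates = runway_candidates(SCENARIOS)
print([s.attribute for s in candidates])
```

Here only the availability scenario remains a runway candidate: scalability is a deliberate strategic shortcut for a trial product, and security is assumed to be handled by an existing framework.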
Use test-driven development for early and continuous focus on verification. This practice can be summarized as “write your test before you write the system.” When there is an exclusive focus on “sunny-day” scenarios (a typical developer’s mindset), the project becomes overly reliant on extensive testing at the end of the project to identify overlooked scenarios and interactions. Therefore, be sure to focus on rainy-day scenarios (e.g., consider different system failure modes) as well as sunny-day scenarios. The practice of writing tests first, especially at the business or system level (which is known as acceptance test-driven development) reinforces the other practices that identify the more challenging aspects and properties of the system, especially quality attributes and architectural concerns (see architectural runway and quality-attribute scenarios practices above).
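The following minimal sketch shows the test-first ordering with one sunny-day and two rainy-day cases. The `average_reading` function and the sensor setting are hypothetical, invented for illustration; the tests appear first in the file because they were conceptually written first, and the implementation is only as large as the tests demand.

```python
import unittest


class AverageReadingTest(unittest.TestCase):
    """Written first: these tests define the expected behavior."""

    def test_sunny_day_all_sensors_report(self):
        self.assertEqual(average_reading([10.0, 20.0, 30.0]), 20.0)

    def test_rainy_day_failed_sensor_is_skipped(self):
        # None stands for a sensor that failed to report.
        self.assertEqual(average_reading([10.0, None, 30.0]), 20.0)

    def test_rainy_day_no_usable_readings(self):
        with self.assertRaises(ValueError):
            average_reading([None, None])


def average_reading(readings):
    """Written second: the simplest implementation that passes the tests."""
    usable = [r for r in readings if r is not None]
    if not usable:
        raise ValueError("no usable sensor readings")
    return sum(usable) / len(usable)


def run_tests():
    """Run the test case above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AverageReadingTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

Two of the three tests cover failure modes, which reflects the advice above to give rainy-day scenarios at least as much attention as the sunny-day path.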
Use end-to-end testing for early insight into emerging system properties. To successfully derive the full benefit from test-driven development at scale, consider early and continuous end-to-end testing of system scenarios. When teams test only the features for which they are responsible, they lose insight into overall system behavior (and how their efforts contribute to achieving it). Each small team could be successful against its own backlog, but someone needs to be looking after broader or emergent system properties and implications. For example, who is responsible for the fault tolerance of the system as a whole? Answering such questions requires careful orchestration of development with verification activities early and throughout development. When testing end to end, take into account different operational contexts, environments, and system modes.
At scale, understanding end-to-end functionality requires its elicitation and documentation. This can be achieved through use of agile requirements management techniques such as stories as well as use of architecturally significant requirements. However, if there is a need to orchestrate multiple systems, a more deliberate elicitation of end-to-end functionality as mission/business threads should provide a better result.
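To illustrate, the sketch below exercises a single business thread across three components, each imagined as owned by a different team, in both a normal and a degraded system mode. All classes and the "place order" thread are hypothetical, and a real end-to-end test would run against deployed subsystems rather than in-process objects.

```python
class InventoryUnavailable(Exception):
    """Raised when the (hypothetical) inventory component cannot be reached."""


class Inventory:
    """Component imagined as owned by one team."""

    def __init__(self, stock, online=True):
        self.stock = dict(stock)
        self.online = online

    def reserve(self, item):
        if not self.online:
            raise InventoryUnavailable(item)
        if self.stock.get(item, 0) <= 0:
            return False
        self.stock[item] -= 1
        return True


class Notifier:
    """Component imagined as owned by a second team."""

    def __init__(self):
        self.sent = []

    def notify(self, message):
        self.sent.append(message)


class OrderDesk:
    """Component imagined as owned by a third team; it ties the thread together."""

    def __init__(self, inventory, notifier):
        self.inventory = inventory
        self.notifier = notifier
        self.deferred = []

    def place_order(self, item):
        try:
            reserved = self.inventory.reserve(item)
        except InventoryUnavailable:
            # Degraded mode: queue the order instead of failing the whole thread.
            self.deferred.append(item)
            self.notifier.notify(f"deferred:{item}")
            return "deferred"
        status = "confirmed" if reserved else "rejected"
        self.notifier.notify(f"{status}:{item}")
        return status


def run_thread(online):
    """Exercise the 'place order' thread end to end in one system mode."""
    notifier = Notifier()
    desk = OrderDesk(Inventory({"widget": 1}, online=online), notifier)
    outcomes = [desk.place_order("widget"), desk.place_order("widget")]
    return outcomes, notifier.sent


normal = run_thread(online=True)
degraded = run_thread(online=False)
print(normal)
print(degraded)
```

No single component's tests would reveal whether an inventory outage leaves the customer without any notification; only the thread-level check in degraded mode does.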
Use continuous integration for consistent attention to integration issues. This basic Agile practice becomes even more important at scale, given the increased number of subsystems that must work together and whose development must be orchestrated. One implication is that the underlying infrastructure that developers will use day to day must be able to support continuous integration. Another is that developers focus on integration earlier, identifying the subsystems and existing frameworks that will need to integrate. This identification has implications for the architectural runway, quality-attribute scenarios, and orchestration of development and verification activities. Useful measures for managing continuous integration include rework rate and scrap rate. It is also important to start early in the project to identify issues that can arise during integration. What this means more broadly is that both integration and the ability to integrate must be managed in the Agile environment.
Consider technical debt management as an approach to strategically manage system development. The concept of technical debt arose naturally from use of Agile methods, where the emphasis on getting features out quickly often creates a need for rework later. At scale, there may be multiple opportunities for shortcuts, and understanding technical debt and its implications becomes a means for strategically managing the development of the system. For example, there might be cases where, to accelerate delivery, certain architectural choices are made that have long-term consequences. Such tradeoffs must be understood and managed based on both qualitative and quantitative measurements of the system. Qualitatively, architecture evaluations can be used as part of the product demos or retrospectives that Agile advocates. Quantitative measures are harder to obtain but can arise from understanding productivity, system uncertainty, and measures of rework (e.g., when uncertainty is greater, you might be more willing to take on more rework later). Several larger organizations have started to look into technical-debt management practices organizationally.
Use prototyping to rapidly evaluate and resolve significant technical risks. To address significant technical issues, teams employing Agile methods will sometimes perform what in Scrum is referred to as a technical spike, in which a team branches out from the rest of the project to investigate the specific technical issue, develop one or more prototypes to evaluate possible solutions, and bring back what was learned to the project so that it can proceed with greater likelihood of success. A technical spike may extend over multiple sprints, depending on the seriousness of the issue and how much time it takes to investigate the issue and bring back information that the project can use.
At scale, technical risks having severe consequences are typically more numerous, and so prototyping (and other approaches to evaluating candidate solutions, such as simulation and demonstration) can be an essential activity, both in early planning and on a recurring basis. A goal of Agile methods is increased early visibility. From that perspective, prototyping is a valuable means of achieving visibility more quickly for technical risks and their mitigations. The Scrum of Scrums practice mentioned earlier has a role here, too, in helping to orchestrate bringing back what was learned from prototyping to the overall system.
Use architectural evaluations to ensure that architecturally significant requirements are being addressed. While not considered part of mainstream Agile practice, architecture evaluations have much in common with Agile methods in seeking to bring a project’s stakeholders together to increase their visibility into and commitment to the project, and to identify overlooked risks. At scale, architectural issues become even more important, and architecture evaluations thus have a critical role on the project. Architecture evaluation can be formal, as in the Software Engineering Institute’s Architecture Tradeoff Analysis Method, which can be performed, for example, early in the Agile project lifecycle before the project’s development teams are launched, or recurrently. There is also an important role for lighter weight evaluations in project retrospectives to evaluate progress against architecturally significant requirements.
Under what conditions will organizations derive the most benefit from the AAS best practices?
None of these practices in isolation will enable agility at scale. They are meant to be orchestrated together. What enabled Agile development practices to succeed in their initial context was improving visibility into, and understanding of, the high-priority concerns for the system under development, and understanding early and continuously the technical challenges hindering their development. Carrying that to scale means making sure the technical barriers and enablers are clearly communicated not only through team practices but through the working system as well. When an organization neglects the following factors, the effectiveness of AAS practices, and of Agile more generally, may be severely limited:
A technical infrastructure that empowers the teams to collaborate. An infrastructure that supports such things as configuration management, issue and defect tracking, and team measurement and analysis is extremely important for Agile and AAS practices. For example, a large Agile project with distributed teams may lack something as simple as a standard virtual-meeting capability to support daily standup meetings.
A management culture that empowers and trusts team decisions. Agile practices assume empowerment of development teams. Technical decisions made at the development level should be trusted and propagated to other teams and management that might be affected. More generally, communication barriers must be removed, and management must create a culture that removes silos, particularly around interdependent work.
One key is ensuring that team members have the training and mentoring they need to make sound technical judgments. Teams must be empowered and encouraged to define their own work processes, define the measurements they will collect and analyze, and regularly evaluate the quality of their work and gauge the progress made.
Strongly hierarchical decision-making organizations may experience significant challenges as they try to transition to such a culture: development teams may be used to being told what to do and may experience unease taking the initiative, and their management may remain uneasy in granting teams that initiative.
Visibility. Agile is all about achieving visibility early and continuously and recognizing and addressing risks in a timely way. The challenge with knowledge work is that though work processes may be “proven” across a range of circumstances, they nevertheless represent theories of how the work should proceed (theories that can improve with time); thus, team processes should be measured, monitored, and adjusted as needed.
One key to greater visibility and understanding is to make all team artifacts that contribute to the development of the system broadly accessible to everyone in the project. Many open-source efforts now employ social coding environments—such as GitHub—that provide full transparency into each developer’s work. More generally, it is not possible to fully anticipate who needs to know about team progress and issues, now or in the future, and thus the environment should make working code, team and project backlogs, and quality-attribute priorities visible to all.
Learn More
For more information about Agile at scale, please see:
Leffingwell, Dean. Scaling Software Agility: Best Practices for Large Enterprises. Addison-Wesley, 2007.
Government Accountability Office. Software Development: Effective Practices and Federal Challenges in Applying Agile Methods. Report GAO-12-681. July 2012. http://www.gao.gov/products/GAO-12-681
Larman, Craig and Vodde, Bas, Practices for Scaling Lean & Agile Development: Large, Multisite, and Offshore Product Development with Large-Scale Scrum
Stephany Bellomo, Robert L. Nord, Ipek Ozkaya: A Study of Enabling Factors for Rapid Fielding: Combined Practices to Balance Speed and Stability. ICSE 2013: 982-991
For more information about architectural tactics and Agile, please see:
Royce, W. Measuring Agility and Architectural Integrity, Int’l J. Software and Informatics, vol. 5, no. 3, 2011, pp 415-433.
For more information on Agile for the enterprise and teams, please see:
Leffingwell, Dean. Agile Software Requirements: Lean Requirements Practices for Teams, Programs, and the Enterprise. Addison-Wesley, 2011.
To learn more about the interplay of Agile at scale best practices, see:
Felix Bachmann et al. Integrate End to End Early and Often. IEEE Software, July/August 2013.
Government Accountability Office. Software Development: Effective Practices and Federal Challenges in Applying Agile Methods. Report GAO-12-681. July 2012.
For more information about quality attribute scenarios, please see:
Ipek Ozkaya, Len Bass, Raghvinder Sangwan and Robert Nord. Making Practical Use of Quality Attribute Information, in IEEE Software Volume 25 Issue 2 March-April 2008, Page(s): 25-33.
Leffingwell, Dean. Agile Software Requirements: Lean Requirements Practices for Teams, Programs, and the Enterprise. Addison-Wesley, 2011.
To learn more about test-driven development, see:
Whittaker, James A., Jason Arbon and Jeff Carollo: How Google Tests Software (Apr 2, 2012)
Beck, Kent: Test Driven Development by Example
To learn more about continuous integration, see:
Duvall, Paul M., Steve Matyas, and Andrew Glover. Continuous Integration: Improving Software Quality and Reducing Risk. Addison-Wesley Professional, 2007.
To learn more about technical debt, see:
Philippe Kruchten, Robert L. Nord, Ipek Ozkaya. Technical debt: from metaphor to theory and practice. IEEE Software Special Issue on Technical Debt (Nov/Dec 2012).
To learn more about prototyping, see:
Stephany Bellomo, Robert L. Nord, Ipek Ozkaya. Elaboration on an Integrated Architecture and Requirement Practice: Prototyping with Quality Attribute Focus. Second International Workshop on the Twin Peaks of Requirements and Architecture. International Conference on Software Engineering (ICSE) 2013, May 18-26, 2013 in San Francisco, CA, USA.
Copyright 2013 Carnegie Mellon University