Book: Team Topologies

“Team Topologies: Organizing Business And Technology Teams For Fast Flow” By Matthew Skelton and Manuel Pais

  • “For example, having teams adopting cloud and infrastructure-as-code can reduce the time to provision new infrastructure from weeks or months to minutes or hours. But if every change requires deployment (to production) approval from a board that meets once a week, then delivery speed will remain weekly at best.”
  • “Organizations which design systems . . . are constrained to produce designs which are copies of the communication structures of these organizations.”
  • “In other words, building software requires an understanding of communication across teams in order to realistically consider what kind of software architectures are feasible. If the desired theoretical system architecture does not fit the organizational model, then one of the two will need to change.”
  • “Dan Pink’s three elements of intrinsic motivation: autonomy (quashed by constant juggling of requests and priorities from multiple teams), mastery (‘jack of all trades, master of none’), and purpose (too many domains of responsibility).”
  • “If the architecture of the system and the architecture of the organization are at odds, the architecture of the organization wins.”
  • “In particular, an organization that is arranged in functional silos (where teams specialize in a particular function, such as QA, DBA, or security) is unlikely to ever produce software systems that are well-architected for end-to-end flow. Similarly, an organization that is arranged primarily around sales channels for different geographic regions is unlikely to produce effective software architecture that provides multiple different software services to all global regions.”
  • “Our research lends support to what is sometimes called the ‘inverse Conway maneuver,’ which states that organizations should evolve their team and organizational structure to achieve the desired architecture. The goal is for your architecture to support the ability of teams to get their work done—from design through to deployment—without requiring high-bandwidth communication between teams.”
  • “Conway’s law tells us that we need to understand what software architecture is needed before we organize our teams, otherwise the communication paths and incentives in the organization will end up dictating the software architecture.”
  • “If the organization has an expectation that ‘everyone should see every message in the chat’ or ‘everyone needs to attend the massive standup meetings’ or ‘everyone needs to be present in meetings’ to approve decisions, then we have an organization design problem. Conway’s law suggests that this kind of many-to-many communication will tend to produce monolithic, tangled, highly coupled, interdependent systems that do not support fast flow.”
  • “In fact, research by Google on their own teams found that who is on the team matters less than the team dynamics; and that when it comes to measuring performance, teams matter more than individuals.”
  • “By team, we mean a stable grouping of five to nine people who work toward a shared goal as a unit.”
  • “High trust is what enables a team to innovate and experiment. If trust is missing or reduced due to a larger group of people, speed and safety of delivery will suffer.”
  • “A single team: around five to eight people (based on industry experience)
    In high-trust organizations: no more than fifteen people
    Families (‘tribes’): groupings of teams of no more than fifty people
    In high-trust organizations: groupings of no more than 150 people
    Divisions/streams/profit & loss (P&L) lines: groupings of no more than 150 or 500 people”
  • “Typically, a team can take from two weeks to three months or more to become a cohesive unit.”
  • “The best approach to team lifespans is to keep the team stable and ‘flow the work to the team,’ as Allan Kelly says in his 2018 book Project Myopia. Teams should be stable but not static, changing only occasionally and when necessary.”
  • “For example, at cloud software specialist Pivotal, ‘an engineer would switch teams about every 9 to 12 months.’ In typical organizations with lower levels of trust, people should remain in the same team for longer (perhaps eighteen months or two years), and the team should be given coaching to improve and sustain team cohesion.”
  • “The danger of allowing multiple teams to change the same system or subsystem is that no one owns either the changes made or the resulting mess. However, when a single team owns the system or subsystem, and the team has the autonomy to plan their own work, then that team can make sensible decisions about short-term fixes with the knowledge that they will be removing any dirty fixes in the next few weeks.”
  • “Every part of the software system needs to be owned by exactly one team. This means there should be no shared ownership of components, libraries, or code. Teams may use shared services at runtime, but every running service, application, or subsystem is owned by only one team. Outside teams may submit pull requests or suggestions for change to the owning team, but they cannot make changes themselves.”
  • “The team takes responsibility for the code and cares for it, but individual team members should not feel like the code is theirs to the exclusion of others. Instead, teams should view themselves as stewards or caretakers as opposed to private owners. Think of code as gardening, not policing.”
  • “With a team-first approach, the whole team is rewarded for their combined effort.”
  • “If we stress the team by giving it responsibility for part of the system that is beyond its cognitive load capacity, it ceases to act like a high-performing unit and starts to behave like a loosely associated group of individuals, each trying to accomplish their individual tasks without the space to consider if those are in the team’s best interest.”
  • “At the same time, the team needs the space to continuously try to reduce the amount of intrinsic and extraneous load they currently have to deal with (via training, practice, automation, and any other useful techniques).”
  • “Limit the Number and Type of Domains per Team”
  • “If a domain is too large for a team, instead of splitting responsibilities of a single domain to multiple teams, first split the domain into subdomains and then assign each new subdomain to a single team.”
  • “The second heuristic is that a single team (considering the golden seven-to-nine team size) should be able to accommodate two to three ‘simple’ domains.”
  • “The third heuristic is that a team responsible for a complex domain should not have any more domains assigned to them—not even a simple one.”
  • “The last heuristic is to avoid a single team responsible for two complicated domains.”
  • “Instead of designing a system in the abstract, we need to design the system and its software boundaries to fit the available cognitive load within delivery teams.”
  • “Instead of choosing between a monolithic architecture or a microservices architecture, design the software to fit the maximum team cognitive load.”
  • “Instead of structuring teams according to technical know-how or activities, organize teams according to business domain areas.”
    —Jutta Eckstein, “Feature Teams—Distributed and Dispersed,” in Agility Across Time and Space
  • “The first anti-pattern is ad hoc team design.”
  • “The other common anti-pattern is shuffling team members.”
  • “we must . . . ensure delivery teams are cross-functional, with all the skills necessary to design, develop, test, deploy, and operate the system on the same team.”
  • “The feature team typically needs to touch multiple codebases, which might be owned by different component teams. If the team does not have a high degree of engineering maturity, they might take shortcuts, such as not automating tests for new user workflows or not following the ‘boy-scout rule’ (leaving the code better than they found it). Over time, this leads to a breakdown of trust between teams as technical debt increases and slows down delivery speed.”
  • “A stream-aligned team is a team aligned to a single, valuable stream of work; this might be a single product or service, a single set of features, a single user journey, or a single user persona. Further, the team is empowered to build and deliver customer or user value as quickly, safely, and independently as possible, without requiring hand-offs to other teams to perform parts of the work.”
  • “In line with the principle ‘you build it, you run it’ popularized by Werner Vogels, CTO of Amazon, ‘service teams’ (as they’re called internally) must be cross-functional and include all the required capabilities to manage, specify, design, develop, test, and operate their services (including infrastructure provisioning and client support). These capabilities are not necessarily mapped to individuals; the team as a whole must provide them.”
  • “An enabling team is composed of specialists in a given technical (or product) domain, and they help bridge this capability gap. Such teams cross-cut to the stream-aligned teams and have the required bandwidth to research, try out options, and make informed suggestions on adequate tooling, practices, frameworks, and any of the ecosystem choices around the application stack.”
  • “The end goal of an enabling team is to increase the autonomy of stream-aligned teams by growing their capabilities with a focus on their problems first, not the solutions per se. If an enabling team does its job well, the team that it is helping should no longer need the help from the enabling team after a few weeks or months; there should not be a permanent dependency on an enabling team.”
  • “Time taken to fix a failing deployment”
  • “Time from code commit to deployment (cycle time)”
  • “I felt strongly that an engineering enablement team should plan for its own extinction from the very first day to avoid other teams becoming dependent.”
  • “We estimated that a quarter of our team’s time was spent actually implementing solutions; the rest was sharing knowledge.”
  • “Stream-aligned teams should expect to work with enabling teams only for short periods of time (weeks or months) in order to increase their capabilities around a new technology, concept, or approach. After the new skills and understanding have been embedded in the stream-aligned team, the enabling team will stop daily interaction with the stream-aligned team, switching their focus to a different team.”
  • “Enabling teams and CoP can co-exist because they have slightly different purposes and dynamics: an enabling team is a small, long-lived group of specialists focused on building awareness and capability for a single team (or a small number of teams) at any one point in time, whereas a CoP usually seeks to have more widespread effects, diffusing knowledge across many teams.”
  • “A complicated-subsystem team is responsible for building and maintaining a part of the system that depends heavily on specialist knowledge, to the extent that most team members must be specialists in that area of knowledge in order to understand and make changes to the subsystem.”
  • “The goal of this team is to reduce the cognitive load of stream-aligned teams working on systems that include or use the complicated subsystem. The team handles the subsystem complexity via specific capabilities and expertise that are typically hard to find or grow. We can’t expect to embed the necessary specialists in all the stream-aligned teams that make use of the subsystem; it would not be feasible, cost-effective, or in line with the stream-aligned team’s goals.”
  • “the complicated-subsystem team is created only when a subsystem needs mostly specialized knowledge. The decision is driven by team cognitive load, not by a perceived opportunity to share the component.”
  • “The purpose of a platform team is to enable stream-aligned teams to deliver work with substantial autonomy. The stream-aligned team maintains full ownership of building, running, and fixing their application in production. The platform team provides internal services to reduce the cognitive load that would be required from stream-aligned teams to develop these underlying services.”
  • “A digital platform is a foundation of self-service APIs, tools, services, knowledge and support which are arranged as a compelling internal product. Autonomous delivery teams can make use of the platform to deliver product features at a higher pace, with reduced coordination.”
  • “For organizations that are successful at delivering software rapidly and safely, most teams are stream aligned, with only around one in seven to one in ten teams not stream aligned.”
  • “For example, database-administrator (DBA) teams can often be converted to enabling teams if they stop doing work at the software-application level and focus on spreading awareness of database performance, monitoring, etc. to stream-aligned teams.”
  • “Likewise, ‘middleware’ teams can also be converted to platform teams if they make those parts of the system easier to use for stream-aligned teams, reducing cognitive load for developers by customizing, simplifying, or wrapping the middleware into easy-to-consume self-serve services aligned to the key organization goals.”
  • “The model for IT support that consistently seems to work best has two aspects: (1) support teams aligned to the stream of changes, and (2) dynamic cross-team activity to resolve live service incidents.”
  • “In this model, if dedicated support teams are needed, they are aligned to the stream of change, alongside a team or squad building the software systems.”
  • “When an incident occurs with the live production systems, the support teams initially attempt to resolve the problem within stream areas alone. If the problem is entirely within stream, there is no need for any other team to get involved. If necessary, other stream-aligned support teams are brought in to help diagnose the problem; and if the incident affects many teams, a dynamic ‘swarm’ or ‘incident squad’ of support specialists is formed from the various support teams to triage the problem and restore service as rapidly as possible.”
  • “Crucially, for effective modern software development, the architecture team should support the other teams, helping them to be as effective as possible, rather than imposing designs or technology choices on other teams.”
  • “A crucial role of a part-time, architecture-focused enabling team is to discover effective APIs between teams and shape the team-to-team interactions with Conway’s law in mind.”
  • “An application monolith is a single, large application with many dependencies and responsibilities that possibly exposes many services and/or different user journeys. Such applications are typically deployed as a unit, often causing headaches for users (the application is not available during deployment) and operators (unexpected issues because the production environment is a moving target; even if we tested the monolith in an environment similar to production, it has surely drifted since then).”
  • “A joined-at-the-database monolith is composed of several applications or services, all coupled to the same database schema, making them difficult to change, test, and deploy separately. This monolith often results from the organization viewing the database, not the services, as the core business engine.”
  • “A monolithic build uses one gigantic continuous-integration (CI) build to get a new version of a component. Application monoliths lead to monolithic builds, but even with smaller services, it’s possible that the build scripts set out to build the entire codebase instead of using standard dependency-management mechanisms between components (such as packages or containers).”
  • “A monolithic release is a set of smaller components bundled together into a ‘release.’ When components or services can be built independently in CI but are only able to test in a shared static environment without service mocks, people end up bringing into that same environment all the latest versions of the components.”
  • “A monolithic model is software that attempts to force a single domain language and representation (format) across many different contexts.”
  • “Standardizing everything in order to minimize variation simplifies management oversight of engineering teams, but it comes at a high premium. Good engineers are able and keen to learn new techniques and technologies. Removing teams’ freedom to choose by enforcing a single technology stack and/or tooling strongly harms their ability to use the right tool for the job and reduces (or sometimes kills) their motivation. In Accelerate, the authors mention how their research indicates that enforcing standardization upon teams actually reduces learning and experimentation, leading to poorer solution choices.”
  • “Most of our fracture planes (software responsibility boundaries) should map to business-domain bounded contexts. A bounded context is a unit for partitioning a larger domain (or system) model into smaller parts, each of which represents an internally consistent business domain area (the term was introduced in the book Domain-Driven Design by Eric Evans).”
  • “Splitting off the parts of the system that typically change at different speeds allows them to change more quickly.”
  • “We’d argue that for a team to communicate efficiently, the options are between full colocation (all team members sharing the same physical space) or a true remote-first approach (explicitly restricting communication to agreed channels—such as messaging and collaboration apps—that everyone on the team has access to and consults regularly). When neither of these options is feasible (full colocation or remote first), then it’s better to split off the monolith into separate subsystems for teams in different locations.”
  • “Splitting off subsystems with clearly different risk profiles allows mapping the technology changes to business appetite or regulatory needs. It also allows each subsystem to evolve its own risk profile over time, adopting practices like continuous delivery that allow increasing speed of change without incurring more risk.”
  • “Splitting off such a subsystem based on particular performance demands helps to ensure it can scale autonomously, increasing performance and reducing cost.”
  • “There are situations where splitting off a subsystem based on technology can be effective, particularly for systems integrating older or less automatable technology.”
  • “Collaboration means explicitly working together on defined areas. X-as-a-Service means one team consumes something ‘as a service’ from another team.”
  • “Collaboration: working closely together with another team
    X-as-a-Service: consuming or providing something with minimal collaboration
    Facilitating: helping (or being helped by) another team to clear impediments”
  • “The collaboration interaction mode is good for rapid discovery of new things, because it avoids costly hand-offs between teams.”
  • “This collaboration occurs between groups with different skill sets in order to bring together the combined knowledge and experience of many people to solve challenging problems.”
  • “Short-term or light-touch occasional collaboration to establish or refine the interfaces is fine, but a need for ongoing collaboration suggests incorrect domain boundaries and/or team responsibilities, or the incorrect mix of skills within a team.”
  • “The X-as-a-Service team interaction mode is suited to situations where there is a need for one or more teams to use a code library, component, API, or platform that ‘just works’ without much effort, where a component or aspect of the system can be effectively provided ‘as a service’ by a distinct team or group of teams.”
  • “The X-as-a-Service model works well only if the service boundary is well chosen and well implemented, with a good service-management practice from the team providing the service.”
  • “Typical Uses: Stream-aligned teams and complicated-subsystem teams consuming Platform-as-a-Service from a platform team; stream-aligned teams and complicated-subsystem teams consuming a component or library as a service from a complicated-subsystem team.”
  • “A team with a facilitating remit does not take part in building the main software systems, supporting components, or platform but, instead, focuses on the quality of interactions between other teams building and running the software. For example, a team facilitating the effectiveness of three stream-aligned teams (see Chapter 5) might find that the logging service provided by the platform is quite difficult to configure: all three teams find it difficult to use. The team helping the three teams can then facilitate some improvements to the logging service from the platform.”
  • “Teams interacting using the facilitating mode should expect to help and be helped. Let’s say that a stream-aligned team is being helped by an enabling team to adopt new practices. People in the stream-aligned team need to be open to being helped by the enabling team; they need to have an open mind to new approaches and be aware that the enabling team has probably seen some better approaches.”
  • “For example, a stream-aligned team can typically expect to interact with other teams using either collaboration or X-as-a-Service, whereas a platform team mostly expects to interact using X-as-a-Service. This gives some further hints for the kinds of interpersonal skills likely to be needed for each type of team: platform teams will need strong product- and service-management expertise, whereas enabling teams will need people with strong mentoring and facilitating experience.”
  • “Of course, Conway’s law tells us that during the discovery and rapid learning taking place as part of collaboration mode, the responsibilities and architecture of the software is likely to be more ‘blended together’ compared to when the teams are interacting using X-as-a-Service. By anticipating this fuzziness, some awkward team interactions (‘the API is not well designed’ and so forth) can be avoided by tightening up the API as the team moves to X-as-a-Service.”
  • “John Kotter, expert in organizational change, says: ‘I think of [strategy] as an ongoing process of ‘searching, doing, learning, and modifying’. . . . The more the organization exercises its strategy skills, the more adept it becomes at dealing with a hypercompetitive environment.’”
  • “By separating the maintenance work from the initial design work, the feedback loop from Ops to Dev is broken, and any influence that operating that software may have on the design of the software is lost.”
  • “Having separate teams for ‘new stuff’ and BAU tends to prevent learning, improvements and ability to self-steer.”
  • “Having separate teams for new-stuff and BAU also tends to prevent learning between these two groups.”
  • “Instead, it is much more effective to have one team responsible for new services and BAU of an existing system side by side. This helps the team to increase the quality of signals from the older system by retro-fitting telemetry from the newer system and increasing the organization’s ability to sense its environment and self steer.”
  • “A healthy organizational culture: an environment that supports the professional development of individuals and teams—one in which people feel empowered and safe to speak out about problems, and the organization expects to learn continuously.”
  • “First, as an organization ask yourself: What does the team need in order to:
    Act and operate as an effective team?
    Own part of the software effectively?
    Focus on meeting the needs of users?
    Reduce unnecessary cognitive load?
    Consume and provide software and information to other teams?”
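
The four domain heuristics quoted above (one owning team per domain, two to three simple domains per team, no extra domains next to a complex one, never two complicated domains) read almost like validation rules, so here is a rough sketch of how they might be checked. Everything in it is my own illustration rather than anything from the book: the `Complexity` enum, the function names `check_team_domains` and `shared_domains`, and the exact thresholds are assumptions based on my reading of the quotes.

```python
from __future__ import annotations

from collections import Counter
from enum import Enum


class Complexity(Enum):
    """Hypothetical labels for how demanding a domain is for a team."""
    SIMPLE = "simple"
    COMPLICATED = "complicated"
    COMPLEX = "complex"


def check_team_domains(domains: dict[str, Complexity]) -> list[str]:
    """Return warnings for one team's domain assignment (assumed thresholds)."""
    counts = Counter(domains.values())
    warnings = []
    # Second heuristic: a team can accommodate two to three simple domains.
    if counts[Complexity.SIMPLE] > 3:
        warnings.append("more than three simple domains")
    # Third heuristic: a complex domain should be the team's only domain.
    if counts[Complexity.COMPLEX] >= 1 and len(domains) > 1:
        warnings.append("complex domain combined with other domains")
    # Last heuristic: avoid two complicated domains in one team.
    if counts[Complexity.COMPLICATED] >= 2:
        warnings.append("two or more complicated domains")
    return warnings


def shared_domains(assignments: dict[str, dict[str, Complexity]]) -> list[str]:
    """First heuristic: list domains that are assigned to more than one team."""
    owners = Counter(name for team in assignments.values() for name in team)
    return sorted(name for name, n in owners.items() if n > 1)
```

A team holding `{"payments": Complexity.COMPLEX, "emails": Complexity.SIMPLE}` would get one warning from `check_team_domains`, and a domain that shows up under two teams would be listed by `shared_domains`. Classifying a domain as simple, complicated, or complex is still a judgment call for the team itself; the sketch only makes the counting explicit.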
