Combined 16 infrastructure development reports into one. Mostly familiar, though there are some new names. Glad I can still roughly keep up with the market. We were already doing the same DevX work at different companies 9–10 years ago under the slogan “if you want to roll out a process, make that process simple,” and platform teams and EaC back in 2020.

TL;DR

  1. CI/CD automation and DevSecOps integration: Pipelines are moving toward “everything as code”, with declarative CI/CD processes (via YAML) and mandatory security checks (static analysis, dependency and container scanning).
  2. Mass adoption of GitOps: Managing deployments through Git ensures transparency of changes, automated rollbacks, and reversible configurations. Popular tools (Argo CD, FluxCD) show steady growth.
  3. Evolution of Infrastructure as Code (IaC): Declarative infra management remains key. The Terraform license change prompted alternatives (e.g., OpenTofu) and renewed interest in tools such as Pulumi, AWS CDK, and Bicep, as well as in experimental approaches (Winglang, CUE, Dhall).
  4. Growth of observability and monitoring: The traditional pillars (metrics, logs, traces) are being extended with continuous profiling (Parca, Grafana Pyroscope, formerly Grafana Phlare) and eBPF for deep analysis. OpenTelemetry is becoming the unified standard for telemetry collection.
  5. Increased resilience and fault tolerance: Architectural approaches (multi-AZ, active-active and active-passive setups, Circuit Breaker and Bulkhead patterns) combined with Chaos Engineering (LitmusChaos, Chaos Mesh) help teams anticipate failures and let systems recover quickly.
  6. Formalization of disaster recovery (DR): Implementation of formal DR plans with regular tests, backups (Velero, Percona XtraBackup), and multi-region/multi-cloud strategies to minimize downtime.
  7. Rise of platform engineering and improving Developer Experience (DevX): Internal platforms are built to provide developers with self-serve infrastructure and “golden paths” to speed development. Specialized teams form to own platform solutions.
  8. Adoption of SRE approaches: SRE practices (SLO/SLI, error budgets, incident remediation automation) are adopted in both large and small organizations. Separating product and platform SRE improves processes.
  9. Development of cloud and multi-cloud strategies: Hyperscalers are expanding their managed service offerings, while organizations actively adopt multi-cloud and hybrid strategies for resilience and cost control.
  10. New directions: generative AI, autonomous systems, and WebAssembly:
      • Generative AI – helps analyze logs, create infrastructure code, and optimize processes.
      • Autonomous systems – development of self-healing infrastructure with autoscaling and automatic diagnostics.
      • WebAssembly (WASM) – explored as a lightweight alternative to containers, especially for edge computing.
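
As a rough illustration of the error budgets mentioned in item 8, the arithmetic behind an availability SLO can be sketched in a few lines. Everything here is an assumption made for the example, not something taken from the reports: the `error_budget` function, the `ErrorBudget` type, the 99.9% target, the 30-day window, and the 10 minutes of downtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    allowed_minutes: float    # total downtime the SLO permits in the window
    remaining_minutes: float  # budget left after observed downtime (negative if overspent)
    consumed_fraction: float  # share of the budget already spent


def error_budget(slo_target: float, window_days: int, downtime_minutes: float) -> ErrorBudget:
    """Compute the downtime budget implied by an availability SLO.

    slo_target is a fraction such as 0.999 (a hypothetical example value).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    window_minutes = window_days * 24 * 60
    allowed = window_minutes * (1.0 - slo_target)
    return ErrorBudget(
        allowed_minutes=allowed,
        remaining_minutes=allowed - downtime_minutes,
        consumed_fraction=downtime_minutes / allowed,
    )


# Hypothetical numbers: 99.9% availability over 30 days, 10 minutes of downtime so far.
budget = error_budget(slo_target=0.999, window_days=30, downtime_minutes=10.0)
print(f"allowed: {budget.allowed_minutes:.1f} min, remaining: {budget.remaining_minutes:.1f} min")
# prints "allowed: 43.2 min, remaining: 33.2 min"
```

The point of the number is the decision it drives: while budget remains, the team can keep shipping; once it is spent, reliability work takes priority.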

It’s still not very clear where everything goes next, besides cleaning up legacy. The big tasks I see so far:

  • simplification of configuration
  • heterogeneous solutions, including bare metal (which has no Terraform equivalent)
  • increasing team capacity via LLM usage (not clear how)