Infrastructure Platform Engineering: Pedal to the Metal

Updated: Apr 10

In recent years, Platform Engineering has emerged as a crucial function within software-focused organizations.

Platform engineering is about making it easier for developers to do their job. Instead of focusing on specific tools, it’s more about creating the right environment for them to work in.

Most Platform Engineering practices treat infrastructure as container- or VM-based environments, readily available in the public cloud through an as-a-service model. For on-premises deployments, however, there is often an assumption that data center infrastructure will magically be provided to Platform Engineers, so that they can build their platform layer on top of it.

Ever wonder how long it takes to provision data center infrastructure? Building physical infrastructure is a complex process: the hardware is always changing, operating systems need regular updates, and as you scale, managing heterogeneous environments inside the data center becomes increasingly difficult.

So, how can we streamline the availability of the foundational infrastructure layer for Platform Engineers? Our approach is to expose this physical infrastructure layer as an API, empowering engineers to swiftly translate product visions into reality.

Current Platform Engineering Model

Platform Engineering is not just tooling but a combination of tools, workflows, and processes. Platform Engineers develop an integrated product that provides self-service capabilities to developers, whether that is Kubernetes cluster provisioning, code pipelines, monitoring, or something else. The self-service platform hides these complexities and gives developers what they need across the entire life cycle of the application.

Figure 1: Current Platform Engineering Scope

The stack shown above functions effectively for deployments in the public cloud. However, it encounters challenges when applied to on-premises deployments, as Bare Metal infrastructure typically falls outside the purview of Platform Engineering. With the increasing demand for Bare Metal in AI/ML workloads, there arises a need to redefine the scope of platform engineering to encompass physical infrastructure.

Proposed (Infrastructure) Platform Engineering Model

Several factors contributed to the decision to exclude bare metal from the scope of Platform Engineering:

Complexity of the Underlying Technology Stack: Data center infrastructure comprises servers, switches, and storage systems, each managed through different protocols and standards such as IPMI, Redfish, Swordfish, SSH, NETCONF, and REST. Each technology has its own learning curve and must be integrated with the others to work cohesively, which amplifies the overall system complexity. Managing a multi-vendor environment adds further layers of difficulty.
Network Complexity: Challenges arise in ensuring tenant isolation, security, and network segmentation through technologies like VLANs, VXLAN, and BGP EVPN.
Management and Operations Complexity: Tasks include ensuring the right firmware and correct versions of host operating systems, maintaining day-1 configuration sanity, and conducting regular patching and upgrades.
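
To make the multi-protocol problem concrete, here is a minimal Python sketch of the same operation, powering on a server, expressed for two of the protocols listed above. The Redfish reset action path and the ipmitool subcommand follow those tools' published conventions, but everything else (the Server record, the driver classes, plan_power_on) is a hypothetical illustration and not MetalSoft's implementation. The sketch only builds the requests; it sends nothing and leaves out credentials.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Server:
    """Hypothetical inventory record; field names are illustrative only."""
    name: str
    bmc_host: str         # address of the baseboard management controller
    protocol: str         # "redfish" or "ipmi"
    system_id: str = "1"  # Redfish system identifier; varies by vendor (assumed)


class PowerDriver(Protocol):
    def power_on(self, server: Server) -> dict: ...


class RedfishDriver:
    def power_on(self, server: Server) -> dict:
        # Redfish models power control as a ComputerSystem.Reset action over HTTPS.
        return {
            "method": "POST",
            "url": (f"https://{server.bmc_host}/redfish/v1/Systems/"
                    f"{server.system_id}/Actions/ComputerSystem.Reset"),
            "body": {"ResetType": "On"},
        }


class IpmiDriver:
    def power_on(self, server: Server) -> dict:
        # IPMI is commonly driven through the ipmitool CLI (credentials omitted here).
        return {
            "command": ["ipmitool", "-I", "lanplus", "-H", server.bmc_host,
                        "chassis", "power", "on"],
        }


DRIVERS: dict[str, PowerDriver] = {
    "redfish": RedfishDriver(),
    "ipmi": IpmiDriver(),
}


def plan_power_on(server: Server) -> dict:
    """Return the protocol-specific operation for a server without executing it."""
    driver = DRIVERS.get(server.protocol)
    if driver is None:
        raise ValueError(f"unsupported protocol: {server.protocol}")
    return driver.power_on(server)
```

The caller asks for one thing, "power on this server", and the driver table decides whether that becomes an HTTPS POST or a CLI invocation. Every additional vendor or protocol is another driver to write and maintain, which is the integration burden described above.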

So how do we empower platform engineers to take control of the underlying physical infrastructure and minimize the dependencies their upper-layer stacks have on other teams?

MetalSoft abstracts away all these complexities and offers the entire infrastructure through easily consumable APIs, seamlessly integrating into the upper layers of the platform. This means that when developers are ready to deploy their applications, they not only have access to the container platform but also to the physical infrastructure, all available as self-serve capabilities.
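
As a rough illustration of what self-service physical infrastructure could look like from the consumer's side, the Python sketch below describes the desired servers declaratively and computes the actions needed to get there. It is a generic desired-state example and does not use the MetalSoft API: the ServerGroup schema, the plan function, and the action strings are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerGroup:
    """Desired state for a group of identical bare-metal servers (hypothetical schema)."""
    name: str
    count: int
    os_image: str


def plan(desired: list[ServerGroup], current: dict[str, int]) -> list[str]:
    """List the actions needed to move from the current state to the desired state.

    `current` maps a group name to the number of servers already provisioned.
    """
    actions = []
    desired_names = {group.name for group in desired}

    for group in desired:
        have = current.get(group.name, 0)
        if group.count > have:
            actions.append(
                f"provision {group.count - have} x {group.name} with {group.os_image}"
            )
        elif group.count < have:
            actions.append(f"release {have - group.count} x {group.name}")

    # Groups that exist today but are no longer declared get released entirely.
    for name, have in current.items():
        if name not in desired_names and have > 0:
            actions.append(f"release {have} x {name}")

    return actions
```

For example, declaring four gpu-workers and three control-plane servers on ubuntu-22.04 against a current state of one gpu-worker, three control-plane servers, and two undeclared legacy servers yields two actions: provision three more gpu-workers and release the two legacy servers. The developer states the outcome, and the platform works out the steps.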

Figure 2: New Platform Engineering Model

The essential features of the MetalSoft platform for Infrastructure Platform Engineering include:

User Features: Authorization and Authentication (AuthZ/N), API schema, Terraform provider, notifications, workflows, and self-service declarative environments.
Lifecycle Management: Integration with CI/CD and policy-based firmware management.
Multi-platform Features: Provisioning of servers, switches, and storage; deployment of operating systems; security measures; scalability; and resiliency.
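
One way to picture policy-based firmware management is as a comparison between a declared baseline and what is actually installed. The Python sketch below shows that idea under simple assumptions: versions are purely numeric and dot-separated, and the policy is a minimum version per component. The function names and data shapes are hypothetical and not taken from MetalSoft; real firmware version strings are often messier and would need vendor-specific parsing.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version such as '2.10.3' into (2, 10, 3) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def firmware_violations(
    policy: dict[str, str],
    inventory: dict[str, dict[str, str]],
) -> dict[str, list[str]]:
    """Return, per server, the components that fall below the policy minimum.

    policy:    component name -> minimum required version
    inventory: server name -> {component name -> installed version}
    A component missing from a server's inventory counts as a violation.
    """
    report = {}
    for server, installed in inventory.items():
        failing = []
        for component, minimum in policy.items():
            have = installed.get(component)
            if have is None or parse_version(have) < parse_version(minimum):
                failing.append(component)
        if failing:
            report[server] = failing
    return report
```

Comparing parsed tuples rather than raw strings matters here: as plain text, "2.9.4" sorts after "2.10.0", which would wrongly mark an outdated BIOS as compliant. A report like this could feed a CI/CD gate that blocks a rollout until the flagged servers are upgraded.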

Conclusion

Integrating physical infrastructure provisioning into the platform streamlines the platform engineer's journey from infrastructure to application deployment, minimizing cognitive overload and context-switching.

Platform engineering teams should not need specialized knowledge to manage the underlying physical infrastructure. Instead, the interfaces to those subsystems should be made available as ready-to-consume APIs.

The inclusion of infrastructure will accelerate the application build process and will enable rapid iterations for businesses.