MetalSoft 7.2: Smarter Infrastructure, Faster Operations, and AI You Can Actually Trust
- Alex Bordei

- 1 day ago

The latest release transforms how teams deploy, manage, and optimize physical infrastructure, with AI agents, full network orchestration, and enterprise-grade storage support.
Managing bare-metal infrastructure has always demanded deep expertise, constant vigilance, and an ever-growing team. A misconfigured network change can cascade into a multi-hour outage. An alert without context means an engineer spending an hour chasing logs before they can even diagnose the problem. A new cluster deployment can take days of careful, manual coordination.
MetalSoft 7.2 changes the calculus. This release brings intelligent AI agents, a production-ready network fabric manager, expanded storage and virtualization support, and security improvements, all designed to let your team move faster with fewer risks.
AI Operations: Reducing Toil, Closing Skills Gaps, Eliminating Alert Fatigue
Infrastructure operations is one of the most error-prone disciplines in IT. Complex, multi-vendor environments demand deep institutional knowledge, and when that knowledge is unevenly distributed across a team, the risk of misconfiguration multiplies. A single step missed during a deployment, a BGP policy applied inconsistently across vendors, or an alert misread under pressure can cascade into a major incident.
MetalSoft 7.2 introduces two production-ready AI agents and a Model Context Protocol (MCP) service layer that fundamentally change this equation.
The Monitoring Agent: Context Before the Call
When something goes wrong, the first 15 minutes of an incident are typically spent gathering information: pulling logs, checking BGP sessions, and verifying switch state. The Monitoring Agent automates exactly that work.
When an alert fires, the agent automatically investigates before notifying anyone. By the time your team gets the page, they receive a diagnosis, not just a notification. That shift alone can cut mean-time-to-resolution by hours. Critically, the Monitoring Agent can be deployed fully on-premises against a local LLM, so sensitive infrastructure data never leaves your environment.
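The "investigate first, then page" flow described above can be sketched in a few lines of Python. This is an illustration of the pattern only, not MetalSoft's implementation: the `Alert`, `Page`, and `investigate_then_notify` names are hypothetical, the collectors are stand-in functions, and the LLM-based diagnosis step is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Alert:
    source: str   # hypothetical: the device that raised the alert
    message: str


@dataclass
class Page:
    alert: Alert
    findings: Dict[str, str] = field(default_factory=dict)

    def summary(self) -> str:
        lines = [f"ALERT {self.alert.source}: {self.alert.message}"]
        lines += [f"  {name}: {result}" for name, result in self.findings.items()]
        return "\n".join(lines)


# A collector inspects one data source and returns a short finding.
Collector = Callable[[Alert], str]


def investigate_then_notify(
    alert: Alert,
    collectors: Dict[str, Collector],
    notify: Callable[[Page], None],
) -> Page:
    """Run every collector first, then send a single enriched page."""
    page = Page(alert=alert)
    for name, collect in collectors.items():
        try:
            page.findings[name] = collect(alert)
        except Exception as exc:
            # A failed check is reported in the page; it must not block it.
            page.findings[name] = f"check failed: {exc}"
    notify(page)
    return page


if __name__ == "__main__":
    sent: List[Page] = []
    page = investigate_then_notify(
        Alert("leaf-01", "BGP neighbor down"),
        {
            "logs": lambda a: "3 link flaps in the last 5 minutes",
            "bgp_sessions": lambda a: "1 of 4 sessions idle",
        },
        sent.append,
    )
    print(page.summary())
```

The design point is ordering: the notification is sent only after the checks have run, so the person paged starts from the findings instead of collecting them by hand.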
The Infrastructure Optimization Assistant: Expertise on Demand
Building a Kubernetes cluster optimized for a specific workload profile requires deep knowledge of server types, networking, and resource allocation. When that expertise isn't evenly distributed across a team, deployments slow down, or get done incorrectly by engineers working outside their comfort zone.
The Infrastructure Optimization Assistant lets operators describe what they need in plain language (for example, "build me a Kubernetes cluster for a workload with these resource requirements") and then recommends the right server types, topology, and configuration automatically. That distributes infrastructure expertise across the team, reduces the chance of under-provisioned or misconfigured deployments, and shortens the time from request to running cluster.
Safe AI Operations by Design
Concerns about AI autonomy over mission-critical infrastructure are legitimate. MetalSoft addresses this with an architecture that interposes a robust abstraction layer between AI agents and physical equipment. Agents interact with MetalSoft's abstraction layer, not directly with hardware, and every action is subject to MetalSoft's access controls, permission model, KeyStore credential management, and audit logging. Third-party agents and custom automations can also plug into the same MCP tools, letting customers build their own workflows while inheriting the same safety guarantees.
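To make the idea of a mediation layer concrete, here is a minimal Python sketch in which an agent can reach a tool only through a gateway that checks permissions and records every attempt. It rests on stated assumptions: `ToolGateway`, `PermissionDenied`, and the tool names are hypothetical, and nothing here reflects MetalSoft's actual API or its MCP tool definitions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Set


class PermissionDenied(Exception):
    """Raised when an agent calls a tool it has not been granted."""


@dataclass
class AuditEntry:
    timestamp: str
    agent: str
    tool: str
    allowed: bool


class ToolGateway:
    """Hypothetical mediation layer: agents never touch tools directly."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}
        self._grants: Dict[str, Set[str]] = {}
        self.audit_log: List[AuditEntry] = []

    def register(self, name: str, func: Callable[..., Any]) -> None:
        self._tools[name] = func

    def grant(self, agent: str, tool: str) -> None:
        self._grants.setdefault(agent, set()).add(tool)

    def call(self, agent: str, tool: str, **kwargs: Any) -> Any:
        allowed = tool in self._tools and tool in self._grants.get(agent, set())
        # Log every attempt, including denied ones, before anything runs.
        self.audit_log.append(
            AuditEntry(datetime.now(timezone.utc).isoformat(), agent, tool, allowed)
        )
        if not allowed:
            raise PermissionDenied(f"{agent} may not call {tool}")
        return self._tools[tool](**kwargs)


if __name__ == "__main__":
    gateway = ToolGateway()
    gateway.register("get_switch_state", lambda switch: {"switch": switch, "state": "up"})
    gateway.register("reboot_switch", lambda switch: f"rebooting {switch}")
    gateway.grant("monitoring-agent", "get_switch_state")

    print(gateway.call("monitoring-agent", "get_switch_state", switch="leaf-01"))
    try:
        gateway.call("monitoring-agent", "reboot_switch", switch="leaf-01")
    except PermissionDenied as exc:
        print(f"denied: {exc}")
    print(len(gateway.audit_log), "audit entries")
```

Because every call goes through one `call` method, a third-party agent registered with the same gateway inherits the same checks and the same audit trail without any extra work.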
Network Orchestration: From Risk Management to Risk Elimination
Network changes are the highest-risk operations in infrastructure management. A misconfiguration can partition a datacenter, take down production workloads, or silently degrade performance for hours before anyone notices.
The MetalSoft Fabric Manager is now a fully multi-vendor network orchestrator, supporting Cisco Nexus, Juniper, Arista, Dell Enterprise SONiC, and NVIDIA Spectrum-X, designed not just to reduce deployment risk but to eliminate it. By providing a consistent, vendor-agnostic management layer, it removes the mental overhead of working across multiple vendor CLIs and reduces the chance of human error.
New capabilities in 7.2 include:
Datacenter Interconnect (DCI) support: connect and manage traffic across multiple datacenters from a single control plane
L3 VNI and Route Domains: advanced VXLAN routing with symmetric IRB, giving teams the building blocks for sophisticated multi-tenant network designs
BGP management: full underlay configuration, including MCLAG, PeerLinks, and BGP, which previously required manual CLI work across vendors
IPAM improvements: full visibility into allocated IPs reduces provisioning errors and speeds up troubleshooting
InfiniBand support via NVIDIA UFM: critical for AI and HPC workloads that depend on ultra-low-latency interconnects
Fibre Channel support: expands storage connectivity options for organizations with existing FC infrastructure
A new network-focused view: gives operators a dedicated interface for fabric management without wading through compute-centric screens
Together, these capabilities mean network teams spend less time on routine configuration, respond to incidents faster, and can safely make changes that would previously have required a maintenance window.
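As a small illustration of why visibility into allocated IPs helps with provisioning, the following Python sketch summarizes one subnet using only the standard-library `ipaddress` module. The `subnet_report` function and its output fields are hypothetical and are not part of MetalSoft's IPAM; the sketch enumerates every host address, so it is intended for small IPv4 subnets.

```python
import ipaddress
from typing import Dict, Iterable


def subnet_report(cidr: str, allocated: Iterable[str]) -> Dict[str, object]:
    """Summarize usage of one subnet given the addresses recorded as allocated."""
    network = ipaddress.ip_network(cidr)
    hosts = set(network.hosts())  # usable host addresses only
    used = {ipaddress.ip_address(a) for a in allocated}

    in_use = used & hosts
    free = sorted(hosts - in_use)
    # Addresses recorded against this subnet that do not belong to it
    # usually point at a provisioning or data-entry error.
    outside = sorted(used - hosts)

    return {
        "total": len(hosts),
        "allocated": len(in_use),
        "free": len(free),
        "next_free": str(free[0]) if free else None,
        "outside_subnet": [str(a) for a in outside],
    }


if __name__ == "__main__":
    report = subnet_report("10.0.0.0/29", ["10.0.0.1", "10.0.0.2", "10.0.1.9"])
    for key, value in report.items():
        print(f"{key}: {value}")
```

A report like this answers the two questions that typically slow down troubleshooting: which address can safely be handed out next, and which recorded allocations do not belong to the subnet at all.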
Cluster Deployments: From Days to Minutes
Standing up a production cluster, whether Kubernetes, VMware, or a bare-metal HPC environment, has historically been a multi-day project involving careful sequencing of hardware, network, and software configuration. Mistakes at any step require backtracking.
MetalSoft 7.2 automates end-to-end cluster deployment for three major platforms:
OpenShift: deploy and expand Red Hat OpenShift clusters automatically, including network and storage configuration
VMware VCF: full lifecycle management of VCF clusters with support for VM management across the physical network
Incus: deploy and scale Incus clusters automatically
For operations teams, this means faster time-to-value for new infrastructure, and fewer specialists required to execute a deployment. For the business, it means faster delivery on compute commitments to internal teams and customers.
AI Infrastructure: First-Class Support for GPU and HPC Workloads
As AI training and inference workloads move on-premises, infrastructure platforms need to keep pace. MetalSoft 7.2 adds native support for the hardware that powers these workloads:
NVIDIA Spectrum-X networking: high-performance Ethernet fabric purpose-built for AI clusters
InfiniBand switches and NICs: low-latency interconnects essential for large-scale model training
GPU passthrough in KVM/Incus VMs: expose physical GPUs directly to virtual machines for inference workloads
Hitachi Weka storage: high-throughput parallel file system optimized for AI data pipelines
For organizations building private AI infrastructure, this means MetalSoft can serve as the unified management layer from bare metal through cluster, covering both the networking and storage layers that AI workloads demand.
Expanded Storage and Server Support
MetalSoft 7.2 substantially widens the range of infrastructure it can manage out of the box:
Storage: New support for Hitachi, Pure Storage, and Huawei, plus Fibre Channel connectivity, snapshot support for NetApp and Pure, File Share (NFS), and Object Storage (S3)
Server registration: Support for Supermicro hardware and flexible registration profiles that let teams define per-model registration behavior
Cabling: Significant improvements to automated cabling detection, plus manual override support for non-standard environments
Image building: Images are now built on the Site Controller, with caching, for faster deploys and better scalability
Broader hardware support means fewer integration projects, less time spent on custom tooling, and a faster path to production for diverse infrastructure estates.
Security: Built for Enterprise and Compliance Requirements
7.2 brings meaningful improvements to MetalSoft's security posture:
Password expiration and rotation: enforces credential hygiene without requiring external IAM tooling
Streamlined permissions: removal of obsolete permission entries reduces attack surface and simplifies access audits
Encrypted communications: port 9091 is no longer required for communication between the Global Controller and the Site Controller, which reduces network exposure, and encryption protocols have been improved throughout
Secure boot for Network OS templates: hardens the boot chain for managed network devices
For organizations operating under formal compliance frameworks or internal security standards, these improvements reduce audit findings and simplify the conversation with security teams.
The Bottom Line
MetalSoft 7.2 is designed for infrastructure teams operating in complex, multi-vendor environments where the margin for human error is low and the consequences of getting it wrong are high.
The AI agents reduce human error and raise the floor of what every engineer on the team can safely do. The network fabric manager eliminates the highest-risk manual operations. The expanded hardware support means less time on integration and more time on delivery. And the security improvements mean fewer blockers when working with compliance and security teams.
Taken together, 7.2 represents a step change in what a small infrastructure team can operate effectively, and a meaningful competitive advantage for organizations willing to invest in modern tooling.
Ready to upgrade? Contact your MetalSoft account team or visit the documentation to get started with 7.2.


