"Automating" server upgrades
Updated: Apr 7, 2021
You're thinking robots in the datacenter, right? There are actually a number of ways to go about it that do not involve our soon-to-be sentient friends.
Let's start by defining the objective. What do we want to achieve here? Bare metal automation is used for a variety of reasons, but fundamentally it exists to offer compute, storage and network to some application or user for further consumption.
The challenge is how to provide a new server with a certain configuration (RAM, DISK, Cores, GPUs) and connect it in a certain way to other servers without touching any server or cable.
This is useful for speeding up service delivery for service providers and reducing truck rolls in telco edge cloud scenarios.
Let's break it down, shall we?
Step #1 - disaggregate storage
Storage is the biggest pain for a service provider. Too little and many people will run out. Too much and your solution is too expensive. Hence, service providers do what's called storage disaggregation.
There are two major ways of doing this:
Option 1: External storage
Some bare metal automation solutions (including MetalSoft) allow the operation of diskless servers (also called netboot). Servers have no local storage and boot via iSCSI from external storage. This way you can quickly move the volumes between hosts if you want to perform an upgrade.
From our experience this setup is actually FASTER than local hard drives and even some local SSDs due to the storage system's cache.
From a cost perspective it is about the same. External storage costs more per GB than local disks, but it makes up the difference through deduplication and compression.
This setup is limited by the network links of the server (e.g. if you have 1 Gbit connections you are limited to about 100 MB/s). The other limit is latency (read IOPS). From our experience this is negligible compared to local disks, but it starts to become apparent versus setups with more than one local NVMe SSD.
A new class of transport, NVMe-over-Fabrics, promises to eliminate this issue as well. There are multiple flavors, including my favorite, RDMA over Converged Ethernet (RoCE), which promises performance levels identical to direct attached storage (DAS).
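As a rough sanity check on the link limit above: a 1 Gbit/s link carries at most 125 MB/s of raw data, so roughly 100 MB/s is left for storage traffic once protocol overhead is taken out. The sketch below does that arithmetic; the function name and the 80% efficiency figure are assumptions for illustration, not measured values.

```python
def link_throughput_mb_per_s(link_gbit: float, efficiency: float = 0.8) -> float:
    """Approximate usable storage throughput of a network link in MB/s.

    link_gbit  -- nominal link speed in Gbit/s
    efficiency -- assumed fraction left after protocol overhead
                  (0.8 is a placeholder, not a measurement)
    """
    if link_gbit <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbit must be positive and efficiency in (0, 1]")
    bits_per_second = link_gbit * 1_000_000_000
    bytes_per_second = bits_per_second / 8  # 8 bits per byte
    return bytes_per_second * efficiency / 1_000_000
```

Under these assumptions a 1 Gbit/s link works out to about 100 MB/s and a 10 Gbit/s link to about 1,000 MB/s, which is why faster links matter so much for diskless servers.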
Option 2: Common pool of local disks
Other solutions use another approach whereby the disks are still in the node but they are aggregated into a common pool and exported over the network to the other nodes.
This solution doesn't quite work for pure bare metal in a multi-tenant environment due to the security implications. However, it does work very well with private Kubernetes clusters (by using something such as Rook+Ceph). This setup works best at the edge, as it avoids the need for a separate storage system at each site.
Step #2 - standardize on CPU and RAM configurations
From my experience as a product manager for a service provider, I can tell you that users do not actually need very specific configuration sizes, so it pays to standardize.
Users typically think in terms of 'I need a "Small" server' rather than 'I want a server with 10 cores and 17GB of RAM'. Nobody is that precise.
The ratio that works best for a general purpose server seems to be 4GB RAM for every hyper-threaded 'core' (1:4).
An "S" server would be an 8 HT core, 32GB RAM machine. An "M" server would be a 16 (or 20) HT core, 64GB RAM machine, and so forth. Your base "SKUs" would look like this:
"S" server configuration: 1x Intel® Xeon® E-2134 Processor (4 cores/8 threads), 32GB RAM.
"M" server configuration: 1x Intel® Xeon® E-2278G Processor (8 cores/16 threads), 64GB RAM.
"L" server configuration: 2x Intel® Xeon® Gold 6328H Processor (16 cores/32 threads each), 256GB RAM.
This should cover 80% of your requests. Beyond this, you will also have some "Specialized" instance types with a 1:8 ratio of HT cores:RAM for "memory intensive apps" or 1:2 for "CPU intensive apps", but remember that those are exceptions, not the rule, and you can always fall back to one of the "larger" standard configurations. A workload always fits inside a larger-memory server as long as that server has enough CPU.
I've attached a graph of one RAM capacity distribution that I've seen in practice so you can calculate your numbers. Lean towards having more of the extremes (S and L/XL) rather than the Ms.
This standardization will greatly help you avoid physically upgrading and downgrading servers all the time. That is not only costly and time consuming but can also damage the server itself while the RAM DIMMs are being inserted. It's also hit-and-miss: we had to "re-seat" so many DIMMs over the years that it's just not worth it.
If you use netboot and a customer needs an upgrade, you simply move the drive over to the new system and reboot. If you use a locally installed OS, you provision the new instance in addition to the first one, copy the apps and data over and shut down the first one.
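The "fall back to a larger standard configuration" rule can be sketched as a small lookup. This is a minimal illustration, not a real catalog API: the `Sku` class, `CATALOG` and `pick_sku` are hypothetical names, and the "L" entry assumes the 2x 16-core/32-thread processors add up to 64 HT cores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sku:
    name: str
    ht_cores: int  # hyper-threaded cores (threads) across all sockets
    ram_gb: int


# Standard catalog based on the S/M/L list above (1:4 HT cores to GB of RAM).
# The 64 HT cores for "L" assume 2 sockets x 32 threads.
CATALOG = [
    Sku("S", 8, 32),
    Sku("M", 16, 64),
    Sku("L", 64, 256),
]


def pick_sku(ht_cores: int, ram_gb: int) -> Optional[Sku]:
    """Return the smallest standard SKU that covers both requirements.

    Returns None when nothing in the catalog is large enough, which is
    the cue to look at a specialized configuration instead.
    """
    for sku in sorted(CATALOG, key=lambda s: (s.ram_gb, s.ht_cores)):
        if sku.ht_cores >= ht_cores and sku.ram_gb >= ram_gb:
            return sku
    return None
```

With this catalog, the oddly precise request for 10 cores and 17GB of RAM lands on "M", and a memory-heavy request such as 4 cores with 100GB of RAM falls back to "L" rather than requiring a custom build.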
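The netboot upgrade path can be modeled in a few lines. The sketch below only tracks state in memory; the `Server` class and `upgrade_netboot` function are hypothetical, and a real implementation would call the storage system and the servers' out-of-band management at the marked steps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    name: str
    sku: str
    boot_volume: Optional[str] = None  # identifier of the attached iSCSI volume
    powered_on: bool = False


def upgrade_netboot(old: Server, new: Server) -> None:
    """Move the boot volume from `old` to `new`, then start `new`.

    Only the resulting state is modeled here; no storage or power
    management system is actually contacted.
    """
    if old.boot_volume is None:
        raise ValueError(f"{old.name} has no boot volume to move")
    if new.boot_volume is not None:
        raise ValueError(f"{new.name} already has a volume attached")
    old.powered_on = False  # stop the old server so the volume is no longer in use
    volume, old.boot_volume = old.boot_volume, None  # detach from the old server
    new.boot_volume = volume  # attach the same volume to the larger server
    new.powered_on = True  # boot the new server from it
```

The point of the model is that the operating system, apps and data never get copied: the volume identifier simply changes hands, which is what makes the diskless approach quicker than the provision-and-copy path for locally installed systems.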
Step #3 - Automate switch provisioning
This is perhaps the key to the above process. Cabling is one of the biggest pains and the piece most likely to fail. To completely eliminate the need to touch the servers after you've racked them you need switch provisioning automation.
You typically pre-cable 2 or 4 connections per server to one or two leaf switches when you rack the server.
After that, you have the freedom to programmatically "link" the ports to whatever network you want from the infrastructure.
We recommend using MLAG-capable switches and using link aggregation across different switches by default for all links. This way, if a switch fails you're still up and running.
So there you have it: you can now provision, de-provision and repurpose bare metal to serve different needs without touching it (after the initial racking, that is). The servers don't actually change, of course. It's the application that moves between them.
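To make the "pre-cable once, link in software" idea concrete, here is a minimal sketch. It assumes networks are represented as VLAN IDs and uses an in-memory dictionary in place of real switch configuration; the cabling plan, switch names and function names are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Port = Tuple[str, str]  # (switch name, port name)

# Hypothetical cabling plan recorded at racking time: two links per server,
# one to each leaf switch, so a single switch failure leaves one link up.
CABLING: Dict[str, List[Port]] = {
    "server-01": [("leaf-a", "eth1"), ("leaf-b", "eth1")],
    "server-02": [("leaf-a", "eth2"), ("leaf-b", "eth2")],
}


def link_server(server: str, vlan_id: int,
                port_vlans: Dict[Port, int]) -> List[Port]:
    """Assign every pre-cabled port of `server` to `vlan_id`.

    `port_vlans` stands in for the switch configuration. Returns the
    ports that actually changed, so repeated calls are harmless.
    """
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be between 1 and 4094")
    changed = []
    for port in CABLING[server]:
        if port_vlans.get(port) != vlan_id:
            port_vlans[port] = vlan_id
            changed.append(port)
    return changed


def switches_for(server: str) -> Set[str]:
    """Return the distinct switches a server is cabled to."""
    return {switch for switch, _ in CABLING[server]}
```

Moving a server to another network is then a matter of calling `link_server` with a different VLAN ID, and `switches_for` gives a quick way to check that each server really spans two switches before relying on cross-switch link aggregation.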
In a different article I'll get into more details about the server configurations you can use.