You've decided to build a Bare Metal Cloud! The next question is: what servers should you buy in your first round? Since we're talking about a lot of money, we need to get this right. The hardware needs to match your users' consumption habits as closely as possible, so that you don't end up constantly buying RAM or SSDs for upgrades and performing countless remote-hands operations, which would defeat the purpose of automation.
Capacity planning is an art in itself. In this blog post, we'll cover a small part of what that means to help you get started with your bare metal cloud.
Chances are you already have a VM-based private cloud offering running. Don't be tempted to derive a lot of insight from that. While bare metal is very similar to VMs from a consumption perspective, the economics differ in many ways due to the nature of hardware. For example, a bare metal cloud will probably be used for greenfield deployments for Kubernetes or Big Data, while existing VMs might be for older applications that have been consolidated and might even be older than you!
It's a balancing act. Two conflicting forces are typically at play:
The diverse needs of the end user
The need for enough capacity for each configuration
In the VM world, you can mix and match any ratio of core-to-RAM-to-disk you want. In the BMaaS world, the very diverse client needs will pull the company into what is called "snowflaking" - creating too many different configurations that are hard to repurpose between users without manual intervention. Snowflaking also means you never have enough stock of any one config, which will end up negating the advantages of automation. Thus, standardization is needed.
Choosing the CPU & RAM Configurations
You need a healthy mix of server configurations (what we refer to at MetalSoft as Server types). Most of the servers you deploy will be general purpose ones (80%), and the remaining minority will be "specialized" configurations (20%). I recommend you use "S", "M" and "L" (and maybe an "XL") for the general purpose servers and then maybe an S and an L for the specialized ones.
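As a rough illustration of the 80/20 split, the sketch below divides a first-round order into general purpose and specialized counts. The function name and the 100-server order are hypothetical; only the 80% share comes from the text above.

```python
def split_fleet(total_servers, general_share=0.8):
    """Split an order into general purpose and specialized server counts.

    general_share defaults to the 80% quoted in the text; the rest is
    assigned to specialized configurations.
    """
    if not 0 <= general_share <= 1:
        raise ValueError("general_share must be between 0 and 1")
    general = round(total_servers * general_share)
    return {
        "general_purpose": general,
        "specialized": total_servers - general,
    }


# Hypothetical first-round order of 100 servers.
print(split_fleet(100))
```

Treat the share as a starting point and adjust it once you have real demand data.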
General Purpose Configurations
The ratio that works best as a "general purpose server" seems to be 4GB RAM for every hyper-threaded (HT) core: 4 cores with 16GB RAM, 8 cores with 32GB, 16 cores with 64GB, etc. We tend to see around 25GB of SSD per HT core. Core frequency doesn't seem to matter much to anyone unless it's ridiculously low. Anything above 2GHz is fine.
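The two ratios above are enough to derive a general purpose configuration from a core count. This is a minimal sketch; the function and constant names are made up for illustration, and the results are starting points that you would round to the DIMM and drive sizes you can actually buy.

```python
GB_RAM_PER_HT_CORE = 4   # ratio quoted in the text
GB_SSD_PER_HT_CORE = 25  # ratio quoted in the text


def general_purpose_spec(ht_cores):
    """Return RAM and SSD sizes (in GB) for a general purpose server."""
    return {
        "ht_cores": ht_cores,
        "ram_gb": ht_cores * GB_RAM_PER_HT_CORE,
        "ssd_gb": ht_cores * GB_SSD_PER_HT_CORE,
    }


for cores in (4, 8, 16):
    print(general_purpose_spec(cores))
```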
There are instances that fall outside of the normal ratio of 1 HT core to every 4GB RAM. Here's what we are seeing:
CPU-intensive instances with a 1:2 ratio (8 cores 16GB, 16 cores 32GB) - these tend to sell very rarely, as most workloads need a mix;
Memory-intensive instances with a 1:8 ratio or higher (16 cores 128GB RAM, 32 cores 384GB RAM) - these sell more frequently, as there are specific workloads requiring this, such as in-memory databases or AI model serving;
Storage-intensive instances will most likely be either IOPS-intensive, with 4x 1TB NVMe, or large-capacity, with 12x 2TB HDD (or 24x 2TB for Spark/Hadoop). The requirement is for many smaller drives rather than a few large ones, for better parallelization and data safety, which is why we suggest configurations with 12 drives.
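To check which family a candidate configuration falls into, you can compare its RAM-per-core ratio with the ratios listed above. The thresholds in this sketch (2GB per core or less counts as CPU-intensive, 8GB per core or more as memory-intensive) are assumptions read off those examples, not hard rules.

```python
def classify_by_ram_ratio(ht_cores, ram_gb):
    """Classify a configuration by its GB of RAM per hyper-threaded core."""
    ratio = ram_gb / ht_cores
    if ratio <= 2:   # assumed cut-off for the 1:2 family
        return "cpu-intensive"
    if ratio >= 8:   # assumed cut-off for the 1:8 (or higher) family
        return "memory-intensive"
    return "general purpose"


for cores, ram in [(8, 16), (8, 32), (16, 128), (32, 384)]:
    print(cores, ram, classify_by_ram_ratio(cores, ram))
```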
Choosing Storage Configurations
With MetalSoft, you have two options for storage:
Local storage: This solution is simple, very fast, and very stable. The disadvantage is less flexibility and "on-demand-ness." Also, deployments take slightly longer.
Netboot (PXE-booted): Diskless nodes offer the advantage of disks of any size, easy upgrades and downgrades, and the ability to replace a server with just a reboot. This solution also offers quick deployment, snapshots, quick migrations (inter-dc and inter-storage) and quick external backups. The disadvantage is that if the network or storage goes down, the server will also go down. Dual-head storage and multi-path connections are required to enable redundancy, which increases the cost of the solution. The link capacity itself will also be a limiting factor. This is why we favor dual 10GbE or 25GbE links even on “small” servers, even though user applications typically utilize a fraction of that capacity.
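To get a feel for how link capacity bounds a netbooted server, the sketch below converts link speed into an upper bound on storage throughput and a lower bound on the time needed to move a volume. It ignores protocol overhead and storage-side limits, so real numbers will be worse; the function names and the 200GB volume are illustrative assumptions.

```python
def line_rate_gbytes_per_s(link_gbit_per_s, links=1):
    """Upper bound on throughput in GB/s for a set of equal links."""
    return link_gbit_per_s * links / 8


def min_transfer_seconds(data_gb, link_gbit_per_s, links=1):
    """Best-case time to move data_gb over the links at line rate."""
    return data_gb / line_rate_gbytes_per_s(link_gbit_per_s, links)


# Hypothetical 200GB volume over dual 10GbE and dual 25GbE.
print(min_transfer_seconds(200, 10, links=2))
print(min_transfer_seconds(200, 25, links=2))
```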
Cost-wise, the two solutions are relatively similar, as the SAN can use copy on write, deduplication, and compression to use space much more efficiently.
From our experience, only a fraction (around 10%) of workloads will truly need on-demand storage; local storage should therefore be good enough for the majority of servers.
We recommend using relatively small, but fast, local drives on all instances and then supplementing with SAN storage where needed. If those prove insufficient, the same instances can be booted from the SAN, enabling all of the above while the local drives can be used as non-persistent data drives for increased performance.
Depending on your expected workload distribution, if iSCSI-based volumes or Netboot are required (frequent server power-on, power-off, etc.), then we recommend using all-flash storage populated with 12x 2TB "Mixed use" SSDs, with more capacity added later as demand requires.
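A quick way to sanity-check the initial SAN size is to estimate the effective capacity of the 12x 2TB starting point. In this sketch the protection overhead and the data reduction ratio (the combined effect of copy on write, deduplication, and compression) are placeholder values, not measurements; substitute the figures for your own array and data.

```python
def effective_capacity_tb(drive_count=12, drive_tb=2.0,
                          protection_overhead=0.25,
                          data_reduction_ratio=2.0):
    """Estimate effective SAN capacity in TB.

    protection_overhead: fraction of raw space lost to RAID or erasure
        coding (placeholder value).
    data_reduction_ratio: logical-to-physical ratio from deduplication
        and compression (placeholder value).
    """
    raw_tb = drive_count * drive_tb
    usable_tb = raw_tb * (1 - protection_overhead)
    return usable_tb * data_reduction_ratio


print(effective_capacity_tb())  # with the placeholder defaults
print(effective_capacity_tb(protection_overhead=0, data_reduction_ratio=1))  # raw capacity
```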
How many of each?
We've noticed that users tend to require servers at the extreme ends of the spectrum more than in the middle: "small" (<=32GB RAM) and "large" (>=128GB RAM) servers rather than "medium" ones.
Also, if your users are not very picky in terms of exact requirements, they will generally use what you have and are happy with larger servers, if offered as an alternative (at a discount).
The figure on the right is a real distribution that I've seen in practice.
Choosing networking gear
For all servers, we recommend using a dual-port 10G/25G NIC. We recommend using a 32x100G leaf switch (S5232F-ON) with breakout cables (QSFP28-to-4SFP28), because 100G will have a longer life than 25G, which extends the useful life of the switch. If this isn’t a consideration, a 48x25G leaf switch (S5248F-ON) can also be used for 30-40% less cost per server.
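The port arithmetic behind the two switch options can be sketched as follows. The sketch assumes each server's two NIC ports go to two different leaf switches, so a leaf pair serves as many servers as one leaf has server-facing ports. The number of ports reserved for uplinks is a hypothetical value, and for the 48x25G option it is assumed that uplinks use separate ports.

```python
def server_facing_ports(switch_ports, breakout_factor=1, uplink_ports=0):
    """Server-facing ports on one leaf after reserving uplink ports."""
    if uplink_ports > switch_ports:
        raise ValueError("cannot reserve more uplinks than ports")
    return (switch_ports - uplink_ports) * breakout_factor


# 32x100G leaf, each remaining port broken out into 4x25G,
# with 4 ports reserved for uplinks (assumed value).
print(server_facing_ports(32, breakout_factor=4, uplink_ports=4))

# 48x25G leaf, all 48 ports server-facing (assumed).
print(server_facing_ports(48))
```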
The following is an example of configuration distribution that you can use as a starting point. Of course this will need to be tweaked further based on your estimated needs.
We recommend using as few variations on the components as possible: use the same NVMe drive with the same size, the same DIMM type, the same CPU, etc.
This has several purposes:
In an emergency, such as when a customer needs a server configuration that is not in stock or when a server has failed, DC ops can simply add components to an existing compatible chassis. We use the general purpose configurations as base platforms for the specialist instances, which are needed much less frequently than GP instances.
We recommend having an extra stock of these common components in the data centers to quickly replace a faulty disk; less variation means less stock required.
If the same hardware is present in all systems, an installed operating system can be migrated from one server to another. This is especially important for iSCSI-booted systems.
There you have it! You are now ready to order your kit.