You've decided to build a bare metal cloud! The next question is: what servers should you buy in your first round? Since we're talking about a lot of money, we need to get this right. The hardware needs to match your users' consumption habits as closely as possible so that you don't end up constantly buying RAM or SSDs for upgrades and performing countless remote-hands operations, which would defeat the purpose of the automation.
Capacity planning is an art in itself. In this blog post, we'll cover a small part of what that means to help you get started with your bare metal cloud.
Chances are you already have a VM-based private cloud offering running. Don't be tempted to derive too much insight from that. While bare metal is very similar to VMs from a consumption perspective, the nature of the hardware means the underlying economics are quite different. For example, a bare metal cloud will probably be used for greenfield deployments such as Kubernetes or Big Data, while your existing VMs might be old applications that have been consolidated over the years and might be older than you are.
It's a balancing act. Two conflicting forces are typically at play:
1. The diverse client needs
2. The need for enough capacity for each configuration
In the VM world, you can mix and match any core-to-RAM-to-disk ratio you want. In the BMaaS world, the very diverse client needs will pull the company into what is called "snowflaking" - creating too many different configurations that are hard to repurpose between users without manual intervention. Snowflaking also means you never have enough stock of any of them, which will end up negating the advantages of automation. Thus standardization is needed.
Choosing the CPU & RAM configurations
You need a healthy mix of server configurations (what we refer to at MetalSoft as server types). Most of the servers you deploy will be general purpose ones (around 80%), and you'll have a minority of "specialized" configurations (around 20%). I recommend offering "S", "M" and "L" (and maybe an "XL") for the general purpose servers, and then perhaps an S and an L for the specialized ones.
General purpose configurations
The ratio that works best for a "general purpose" server seems to be 4GB of RAM for every hyper-threaded core: 4 cores with 16GB RAM, 8 cores with 32GB RAM, 16 cores with 64GB RAM and so on. We tend to see around 25GB of SSD per HT core. Core frequency doesn't seem to matter to anyone unless it's extreme; anything above 2GHz is fine.
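If it helps to make that rule of thumb concrete, here is a minimal sizing sketch in Python. The function name and the rounding are illustrative assumptions, not part of any MetalSoft tooling:

```python
# A minimal sizing sketch of the rule of thumb above:
# roughly 4 GB of RAM and ~25 GB of SSD per hyper-threaded core.

def general_purpose_config(ht_cores: int) -> dict:
    """Derive a balanced "general purpose" shape from an HT core count."""
    return {
        "ht_cores": ht_cores,
        "ram_gb": ht_cores * 4,    # 1 HT core : 4 GB RAM
        "ssd_gb": ht_cores * 25,   # ~25 GB of SSD per HT core
    }

for cores in (4, 8, 16, 32):
    print(general_purpose_config(cores))
# {'ht_cores': 4, 'ram_gb': 16, 'ssd_gb': 100}
# {'ht_cores': 8, 'ram_gb': 32, 'ssd_gb': 200}
# {'ht_cores': 16, 'ram_gb': 64, 'ssd_gb': 400}
# {'ht_cores': 32, 'ram_gb': 128, 'ssd_gb': 800}
```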
"Specialized" configurations
Some instances fall outside the normal 1 HT core : 4GB RAM ratio (a small sizing sketch follows this list):
CPU-intensive instances with a 1:2 ratio (8 cores with 16GB, 16 cores with 32GB) - these tend to sell very rarely, as most workloads need a mix;
Memory-intensive instances with a 1:8 ratio (16 cores with 128GB RAM, 32 cores with 256GB RAM) - these sell more, as there are specific workloads requiring them, such as in-memory databases or AI model serving;
Storage-intensive instances will most likely have either an IOPS-intensive 4x 1TB NVMe layout or a large-capacity 12x 2TB HDD (or 24x 2TB) layout for Spark/Hadoop. The requirement is for many smaller drives rather than a few large ones, for better parallelization and data safety, which is why we suggest configurations with 12 drives.
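To avoid snowflaking, it can help to snap incoming requests onto these standard ratios instead of creating one-off shapes. The sketch below is illustrative only; the nearest-fit heuristic and the names are assumptions, not a MetalSoft feature:

```python
# Illustrative only: snap a requested core/RAM combination onto the smallest
# standard ratio that satisfies it (1:2 CPU-intensive, 1:4 general purpose,
# 1:8 memory-intensive) instead of creating a one-off "snowflake" config.

RATIOS = {
    "cpu-intensive": 2,       # GB of RAM per HT core
    "general-purpose": 4,
    "memory-intensive": 8,
}

def standard_shape(ht_cores: int, ram_gb: int) -> tuple:
    """Return (category, rounded-up RAM) for a request, or raise if none fits."""
    needed = ram_gb / ht_cores
    for name, gb_per_core in sorted(RATIOS.items(), key=lambda kv: kv[1]):
        if gb_per_core >= needed:
            return name, ht_cores * gb_per_core
    raise ValueError("request exceeds the largest standard ratio (1:8)")

print(standard_shape(8, 16))    # ('cpu-intensive', 16)
print(standard_shape(16, 96))   # ('memory-intensive', 128)
```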
Choosing storage configurations
With MetalSoft, you have two options for storage:
Local storage: This solution is simple, very fast and very stable. The disadvantages are less flexibility and less on-demand elasticity, and deployments take slightly longer.
Netboot (PXE-booted): Diskless nodes offer volumes of any size, easy upgrades or downgrades, and the ability to replace the server with just a reboot. This solution also offers quick deployment, snapshots, quick inter-datacenter and inter-storage migrations, and external backups. The disadvantage is that if the network or the storage goes down, the server goes down with it. Dual-head storage and multi-path connections are required for redundancy, which increases the cost of the solution. The link capacity itself will also be a limiting factor, which is why we favor dual 10GbE or 25GbE links even on “small” servers, even though user applications usually utilize a fraction of that capacity.
Cost-wise, the two solutions are relatively similar, as the SAN can use copy-on-write, deduplication and compression to use space much more efficiently.
In our experience, only a fraction (around 10%) of workloads will be truly on-demand, so local storage should be good enough for the majority of servers.
We recommend using relatively small but fast local drives on all instances and supplementing with SAN storage where needed. If those prove insufficient, the same instances can be booted from the SAN, enabling all of the above, while the local drives can be used as non-persistent data drives for increased performance.
Depending on your expected workload distribution, if iSCSI-based volumes or netboot are required (frequent server power-on/power-off, etc.), we recommend starting with an all-flash storage array populated with 12x 2TB "mixed use" SSDs, then upgrading with more capacity as demand requires.
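As a rough way to sanity-check how far that starting array goes, here is a back-of-the-envelope capacity sketch; the parity overhead and the data-reduction ratio are assumptions you should replace with your array vendor's real numbers:

```python
# Back-of-the-envelope usable capacity for the 12 x 2 TB all-flash array above.
# Parity overhead and data-reduction ratio (dedup + compression) are assumptions.

def usable_san_tb(drives=12, drive_tb=2.0,
                  parity_drives=2,        # e.g. RAID-6-style double parity (assumed)
                  data_reduction=2.5):    # assumed dedup + compression ratio
    raw = drives * drive_tb
    after_parity = (drives - parity_drives) * drive_tb
    return raw, after_parity, after_parity * data_reduction

raw, after_parity, effective = usable_san_tb()
print(f"raw: {raw} TB, after parity: {after_parity} TB, effective: {effective} TB")
# raw: 24.0 TB, after parity: 20.0 TB, effective: 50.0 TB
```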
How many of each?
Users tend to require servers at the extreme ends of the spectrum more than the middle: "small" (<=32GB RAM) and "large" (>=128GB RAM) servers rather than "medium" ones.
Also, your users are not very picky in terms of exact requirements; they will generally use what you have and are happy with larger servers if offered as an alternative (at a discount).
Attached is a real distribution that I've seen in practice.
Choosing networking gear
For all servers we recommend a dual-port 10G/25G NIC. We recommend a 32x 100G leaf switch (S5232F-ON) and breakout cables (QSFP28-to-4x SFP28), because 100G will have a longer life than 25G and will extend the useful life of the switch. If this isn't a consideration, a 48x 25G leaf switch (S5248F-ON) can also be used, at 30-40% less cost per server.
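The port math behind that trade-off is simple. The sketch below ignores ports reserved for uplinks and leaves prices out, since those vary by vendor and discount:

```python
# How many dual-homed servers fit on one leaf switch (ignoring uplink ports).

def servers_per_leaf(ports: int, breakout: int = 1, ports_per_server: int = 2) -> int:
    return (ports * breakout) // ports_per_server

# 32 x 100G leaf, each port split into 4 x 25G via QSFP28-to-4x SFP28 breakouts
print(servers_per_leaf(32, breakout=4))   # 64 servers per leaf
# 48 x 25G leaf, no breakout
print(servers_per_leaf(48))               # 24 servers per leaf
```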
Example configurations
The following is an example configuration distribution that you can use as a starting point. Of course, this will need to be tweaked further based on your estimated needs.
Additional considerations
We recommend using as few component variations as possible: the same NVMe drive of the same size, the same DIMM type, the same CPU, etc.
This has several purposes:
In an emergency, such as when a customer needs a server configuration that is not in stock or when a server has failed, DC ops can simply add components to an existing compatible chassis. We use the general purpose configurations as base platforms for the specialized instances, which are needed much more rarely than GP instances.
We recommend keeping extra stock of these common components in the datacenters to quickly replace a faulty disk; less variation means less stock to hold.
If the same hardware is present in all systems, operating system migrations can be performed between them. This is especially important for iSCSI-booted systems.
There you have it! You are now ready to order your kit.