Project Details
Customer
Former Employer
What we did
Complete installation of OpenStack and Ceph
Year
2018 / 2019
Products
Red Hat OpenStack Platform, Red Hat Ceph Storage, Red Hat OpenShift, Ansible
Datacenters
3
Nodes
~ 40
Lines of code
~ 5000

A complete installation of Red Hat OpenStack and Red Hat Ceph Storage

The goal of this project was a scalable internal cloud platform. The customer's need for it boiled down to a few key points:

  • Customers (i.e. developers and product owners) required an API (i.e. cloud functionality) to write infrastructure as code
  • They needed to get away from isolated hypervisors with separate configurations (no scalability)
  • They needed scalable storage instead of individual disk arrays
  • Last but not least, they needed a modern platform for their traditional VMs (for things like live migration between datacenters)

This was a huge project, and it included:

  • Determining the customer's requirements specification
  • Ordering and mounting hardware
  • Designing and configuring the stack (storage pools etc.)
  • Configuring and installing the software
  • Designing the end-user experience (such as the process for ordering projects)
  • Hardware and usage monitoring, with regular follow-ups

The whole stack was installed via an automated deployment, consisting of:

  • Red Hat OpenStack Platform director configuring and installing 21 compute nodes and 3 controllers, divided over three datacenters
  • A ceph-ansible playbook configuring 15 storage nodes and 3 monitors
  • Pre- and post-scripts that configured storage pools, project creation, and user authentication and authorization (Active Directory, Kerberos and single sign-on)
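As an illustration, an automated deployment like this starts from an inventory describing which nodes play which role. A minimal sketch of what the ceph-ansible side might have looked like, with 3 monitors and 15 OSD nodes spread over three datacenters (hostnames and group layout here are hypothetical, not the actual production values):

```yaml
# Illustrative ceph-ansible inventory (YAML format); hostnames are made up.
all:
  children:
    mons:                                  # one Ceph monitor per datacenter
      hosts:
        mon-dc1.example.com:
        mon-dc2.example.com:
        mon-dc3.example.com:
    osds:                                  # five storage nodes per datacenter
      hosts:
        storage-dc1-[01:05].example.com:
        storage-dc2-[01:05].example.com:
        storage-dc3-[01:05].example.com:
```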

The end result is a fully working production stack, consisting of:

  • 3 Datacenters, i.e. 3 availability zones (AZs) – for both compute and storage
  • 3 Controllers, 1 in each AZ – handle all APIs and route traffic in and out of the cloud via Neutron
  • 21 Compute nodes (7 in each AZ)
    • 17,000 vCPUs presented to the users
    • 12 TB of vRAM presented to the users
  • 15 Storage nodes with roughly 135 disks (a third of them traditional HDDs used for RGW/S3 pools, the rest SSDs), roughly 135 TB of raw capacity
    • 3 Local storage pools with 3 replicas each, all replicas in the same datacenter
    • 1 Stretched storage pool with 3 replicas, 1 in each datacenter
    • Ceph presents raw block devices to the computes through librbd
    • Users can consume object storage via RGW/S3
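A stretched pool like the one above depends on a CRUSH rule whose failure domain is the datacenter, so that the three replicas always land in three different sites. A minimal sketch of how such a rule and pool could be created from an Ansible play (pool name, PG count and bucket names are illustrative assumptions, not the production values):

```yaml
# Illustrative tasks creating a stretched RBD pool on a Ceph monitor.
# The CRUSH rule "stretched" places one replica per datacenter bucket.
- hosts: mons[0]
  become: true
  tasks:
    - name: CRUSH rule with datacenter as the failure domain
      command: ceph osd crush rule create-replicated stretched default datacenter

    - name: Create the stretched pool using that rule
      command: ceph osd pool create vms-stretched 512 512 replicated stretched

    - name: Three replicas, one per datacenter
      command: ceph osd pool set vms-stretched size 3
```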

On top of all that, we also:

  • Designed and standardized the way internal cloud projects were ordered and created
  • Standardized the way traditional VMs were created and maintained (i.e. for projects not using Kubernetes / Docker)
    • Standardized platform based on RHEL 7
    • Configuration via Puppet
    • Authentication via Active Directory users
    • Authorization via Active Directory groups / sudo
    • Monitoring via check_mk
    • Daily security updates
    • Predefined package repositories as well as packages – packages that are not on the list are removed
    • Logs via Splunk
    • Metrics via Telegraf -> InfluxDB, presented via Grafana
    • All of the above were created (and removed) automatically with Ansible plays
  • Created a portal for administering the Active Directory users and groups that grant access to the cloud solution
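The standardized VM baseline above lends itself to a single Ansible play applied to every traditional VM. A rough sketch of what such a play could look like (the role names are hypothetical placeholders for the actual automation):

```yaml
# Illustrative play applying the standardized RHEL 7 VM baseline.
# Role names are made up; each maps to one bullet in the list above.
- hosts: all
  become: true
  roles:
    - baseline_repos         # predefined repositories; unlisted packages removed
    - auth_active_directory  # AD users for authentication, AD groups + sudo for authorization
    - monitoring_check_mk    # check_mk agent
    - logging_splunk         # ship logs to Splunk
    - metrics_telegraf       # Telegraf -> InfluxDB -> Grafana
    - security_updates       # daily security update job
```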