A Complete Installation of Red Hat OpenStack and Red Hat Ceph Storage
The goal of this project was a scalable internal cloud platform. The customer's needs boiled down to a few key points:
- Customers (i.e. developers and product owners) required an API, i.e. cloud functionality, to write infrastructure as code
- They needed to get away from isolated hypervisors with separate configurations, which did not scale
- They needed scalable storage rather than individual disk arrays
- Last but not least, they needed a modern platform for their traditional VMs (for features such as live migration between datacenters)
This was a huge project that included:
- Determining the customer's requirements specification
- Ordering and mounting the hardware
- Designing and configuring the stack (storage pools, etc.)
- Installing and configuring the platform
- Designing end-user workflows (such as the project-ordering process)
- Hardware and usage monitoring, with regular follow-ups
The installation of the whole stack was done via an automated deployment, consisting of:
- Red Hat OpenStack Director configuring and installing roughly 21 compute nodes and 3 controllers, divided across three datacenters
- A ceph-ansible play configuring 15 storage nodes and 3 monitors
- Pre- and post-scripts that configured storage pools, project creation, and user authentication and authorization (Active Directory, Kerberos and single sign-on)
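The Ceph part of such a deployment is driven by an Ansible inventory. As a minimal sketch of what the ceph-ansible inventory could have looked like (hostnames and group layout are illustrative assumptions, not the actual ones):

```ini
# Hypothetical ceph-ansible inventory: 3 monitors and 15 OSD nodes,
# five per datacenter. Hostnames are placeholders.
[mons]
mon-dc1.example.com
mon-dc2.example.com
mon-dc3.example.com

[osds]
osd-dc1-[01:05].example.com
osd-dc2-[01:05].example.com
osd-dc3-[01:05].example.com
```

ceph-ansible's playbooks are then run against this inventory, with the cluster-wide settings kept in the group variables.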
The end result is a fully working production stack consisting of:
- 3 datacenters, i.e. 3 availability zones (AZs), for both compute and storage
- 3 controllers, 1 in each AZ, handling all APIs and routing traffic in and out of the cloud via Neutron
- 21 compute nodes (7 in each AZ)
- 17,000 vCPUs presented to the users
- 12 TB of vRAM presented to the users
- 15 storage nodes with roughly 135 disks (a third of them traditional HDDs used for the rgw/S3 pools, the rest SSDs), roughly 135 TB of raw capacity
- 3 local storage pools with 3 replicas each, all replicas within a single datacenter
- 1 stretched storage pool with 3 replicas, 1 in each datacenter
- Ceph presenting raw block devices to the compute nodes through librbd
- Object storage available to users via rgw (S3 API)
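Replicated pools trade raw capacity for usable capacity. A quick sketch of that arithmetic (a simplified model: it ignores CRUSH weights, full ratios and filesystem overhead):

```python
def usable_tb(raw_tb: float, replicas: int) -> float:
    """Usable capacity of a replicated Ceph pool, given raw capacity.

    Simplified model: every object is stored `replicas` times, so
    usable space is raw space divided by the replica count.
    """
    return raw_tb / replicas

# The stack above: ~135 TB raw, replica size 3 -> ~45 TB usable.
print(usable_tb(135, 3))  # -> 45.0
```

With 3 replicas, the ~135 TB of raw disk therefore translates to roughly 45 TB of usable storage across the pools.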
On top of all that, we also:
- Designed and standardized the way internal cloud projects were ordered and created
- Standardized the way traditional VMs were created and maintained (i.e. for projects not using Kubernetes/Docker)
- Standardized the platform based on RHEL 7:
  - Configuration via Puppet
  - Authentication via Active Directory users
  - Authorization via Active Directory groups and sudo
  - Monitoring via check_mk
  - Daily security updates
  - Predefined package repositories and packages; packages that are not predefined are removed
  - Logs via Splunk
  - Metrics via Telegraf -> InfluxDB, presented via Grafana
  - All of the above were created (and removed) automatically with Ansible plays
- Created a portal for administering the Active Directory users and groups that grant access to the cloud solution
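As a sketch of what part of that automation could look like, a minimal hypothetical Ansible play (the modules are standard Ansible modules; the host group, package and service names are assumptions, not the real ones):

```yaml
# Hypothetical play applying part of the RHEL 7 standard to a VM.
- hosts: rhel7_vms
  become: true
  tasks:
    - name: Install the check_mk agent
      yum:
        name: check-mk-agent
        state: present

    - name: Ensure Telegraf is running so metrics reach InfluxDB
      service:
        name: telegraf
        state: started
        enabled: true
```

Running such plays from a central point keeps every VM on the same baseline, and a corresponding teardown play removes what the standard no longer defines.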