Project Details
Customer
Former Employer
What we did
Complete installation of OpenStack and Ceph
Year
2018 / 2019
Products
Red Hat OpenStack Platform, Red Hat Ceph Storage, Red Hat OpenShift, Ansible
Datacenters
3
Nodes
~ 40
Lines of code
~ 5000

A complete installation of Red Hat OpenStack and Red Hat Ceph Storage

The goal of this project was a scalable internal cloud platform. The customer's need for it boiled down to a few key points:

  • Customers (i.e. developers and product owners) required an API (i.e. cloud functionality) to write infrastructure as code
  • They needed to get away from isolated hypervisors with separate configurations (no scalability)
  • They needed scalable storage instead of individual disk arrays
  • Last but not least, they needed a modern platform for their traditional VMs (for things like live migration between datacenters)

This was a huge project, and it included:

  • Determining the customer's requirements specification
  • Ordering and mounting hardware
  • Designing and configuring the stack (storage pools etc.)
  • Configuring and installing the software
  • Designing the end-user experience (such as the process for ordering projects)
  • Hardware and usage monitoring, with regular follow-ups

The whole stack was installed via an automated deployment, consisting of:

  • Red Hat OpenStack Platform director configuring and installing 21 compute nodes and 3 controllers, divided over three datacenters
  • A ceph-ansible playbook configuring 15 storage nodes and 3 monitors
  • Pre- and post-scripts that configured storage pools, project creation, and user authentication and authorization (Active Directory, Kerberos and single sign-on)
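As an illustration, an automated deployment like this starts from an inventory describing which nodes play which role. A minimal sketch of what the ceph-ansible side might have looked like, with 3 monitors and 15 OSD nodes spread over three datacenters (hostnames and group layout here are hypothetical, not the actual production values):

```yaml
# Illustrative ceph-ansible inventory (YAML format); hostnames are made up.
all:
  children:
    mons:                                  # one Ceph monitor per datacenter
      hosts:
        mon-dc1.example.com:
        mon-dc2.example.com:
        mon-dc3.example.com:
    osds:                                  # five storage nodes per datacenter
      hosts:
        storage-dc1-[01:05].example.com:
        storage-dc2-[01:05].example.com:
        storage-dc3-[01:05].example.com:
```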

The end result is a fully working production stack, consisting of:

  • 3 Datacenters, i.e. 3 availability zones (AZs) – for both compute and storage
  • 3 Controllers, 1 in each AZ – handle all APIs and route traffic in and out of the cloud via Neutron
  • 21 Compute nodes (7 in each AZ)
    • 17,000 vCPUs presented to the users
    • 12 TB of vRAM presented to the users
  • 15 Storage nodes with roughly 135 disks (a third of them traditional HDDs used for RGW/S3 pools, the rest SSDs), roughly 135 TB of raw capacity
    • 3 Local storage pools with 3 replicas each, all replicas in the same datacenter
    • 1 Stretched storage pool with 3 replicas, 1 in each datacenter
    • Ceph presents raw block devices to the computes through librbd
    • Users can consume object storage via RGW/S3
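A stretched pool like the one above depends on a CRUSH rule whose failure domain is the datacenter, so that the three replicas always land in three different sites. A minimal sketch of how such a rule and pool could be created from an Ansible play (pool name, PG count and bucket names are illustrative assumptions, not the production values):

```yaml
# Illustrative tasks creating a stretched RBD pool on a Ceph monitor.
# The CRUSH rule "stretched" places one replica per datacenter bucket.
- hosts: mons[0]
  become: true
  tasks:
    - name: CRUSH rule with datacenter as the failure domain
      command: ceph osd crush rule create-replicated stretched default datacenter

    - name: Create the stretched pool using that rule
      command: ceph osd pool create vms-stretched 512 512 replicated stretched

    - name: Three replicas, one per datacenter
      command: ceph osd pool set vms-stretched size 3
```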

On top of all that, we also:

  • Designed and standardized the way internal cloud projects were ordered and created
  • Standardized the way traditional VMs were created and maintained (i.e. for projects not using Kubernetes / Docker)
    • Standardized platform based on RHEL 7
    • Configuration via Puppet
    • Authentication via Active Directory users
    • Authorization via Active Directory groups / sudo
    • Monitoring via check_mk
    • Daily security updates
    • Predefined package repositories as well as packages – packages that are not on the list are removed
    • Logs via Splunk
    • Metrics via Telegraf -> InfluxDB, presented via Grafana
    • All of the above were created (and removed) automatically with Ansible plays
  • Created a portal for administering the Active Directory users and groups that grant access to the cloud solution
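The standardized VM baseline above lends itself to a single Ansible play applied to every traditional VM. A rough sketch of what such a play could look like (the role names are hypothetical placeholders for the actual automation):

```yaml
# Illustrative play applying the standardized RHEL 7 VM baseline.
# Role names are made up; each maps to one bullet in the list above.
- hosts: all
  become: true
  roles:
    - baseline_repos         # predefined repositories; unlisted packages removed
    - auth_active_directory  # AD users for authentication, AD groups + sudo for authorization
    - monitoring_check_mk    # check_mk agent
    - logging_splunk         # ship logs to Splunk
    - metrics_telegraf       # Telegraf -> InfluxDB -> Grafana
    - security_updates       # daily security update job
```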