Ansible and Cassandra
An exercise in how you can use Ansible to build a Cassandra Cluster
This is the final post in my series on using Ansible. In this post I summarise how you can use Ansible to deploy a Cassandra cluster using my cass-deployer playbook.
The cass-builder playbook
Once you have cloned the repo you will see the standard directory structure that I use for all my playbooks.
- build-site.yml - contains all the roles that must be executed to build everything that makes up the “virtual” site.
- group_vars - contains all variables that have been defined along with their values
- inventory - this is where the inventory that is used to build the site is located.
- roles - this is a directory containing the roles that have been defined, along with the various sub-directories that make up the roles hierarchy.
Rather than go through the various roles, I think it is worth concentrating on the variables I have defined and the build-site.yml playbook that builds the entire cluster.
First, let's take a look at the master variables I have defined in group_vars/all/vars.yaml:
#
# Conditionals and configuration
#
# Playbook version used to create the cluster
playbook_version: 1.1
# If set to true, tarballs will be downloaded to the OpsCenter host
# and copied out to the rest of the nodes in the cluster. This may be slower
# and will require additional space on the OpsCenter host.
stage_binaries: false
# If set to true, this will build a separate metrics cluster from the [store]
# inventory group and configure the agents and OpsCenter to store metrics in
# that cluster. If set to false, ensure that the [store] inventory group still
# has hosts so that the agents have entries to store metrics against, i.e. add
# the host entries from the cluster.
metrics_cluster: false
# When set to true, the next two options will configure a striped logical volume
# for the Cassandra data and search indexes.
configure_db_disks: false
configure_search_disks: false
# When set to true, this will configure metrics forwarding and SLA checking to statuspage
configure_statuspage: false
# When set to true, this will configure an OSS Apache Cassandra cluster
oss_install: false
oss_url: http://archive.apache.org/dist/cassandra
#
# Usernames and Credentials
creds:
  remote_user: ubuntu
#
# OS Specific Variables
#limits:
#  memlock: unlimited
#  as: unlimited
#  nproc: 32768
#  nofile: 100000
#
# DSE Specific Variables
store:
  stage_versions: ["5.0.12"]
  active_version: "5.0.12"
dse:
  stage_versions: ["5.0.12","5.1.14"]
  active_version: "5.1.14"
agent:
  stage_versions: ["6.5.5"]
  active_version: "6.5.5"
opscenter:
  stage_versions: ["6.5.5"]
  active_version: "6.5.5"
  authentication: true
#
# OSS Specific Variables
oss:
  stage_versions: ["2.1.21","2.1.4"]
  active_version: "2.1.21"
java:
  version: 8
  max_heap_size: 2G
  heap_newsize: 300M
The playbook will pull down tarballs from either the DataStax or the open-source Apache Cassandra URLs. If you are using DataStax software, it will optionally build a metrics cluster (default: false) that OpsCenter will use to store metrics. In addition, you can optionally configure LVM volumes for the data directories and even the search directories; by default both of these options are set to false.
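To make the variables above concrete, here is a minimal sketch of what a download task driven by oss_url and oss.active_version might look like. This is an illustrative task, not the actual role from the playbook; the destination path is hypothetical.

# Hypothetical task: build the tarball URL from oss_url and the active version.
# The real role in the playbook may differ in naming and destination.
- name: Download Apache Cassandra tarball
  get_url:
    url: "{{ oss_url }}/{{ oss.active_version }}/apache-cassandra-{{ oss.active_version }}-bin.tar.gz"
    dest: "/opt/staging/apache-cassandra-{{ oss.active_version }}-bin.tar.gz"

With the defaults above, this would fetch apache-cassandra-2.1.21-bin.tar.gz from the Apache archive.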
The playbook can handle rolling upgrades of the cluster simply by changing the values of stage_versions and active_version. The stage_versions variable controls which tarballs are downloaded and extracted, whilst active_version determines which version is used when starting and stopping Cassandra.
A rolling upgrade is a two-step process. First, modify the stage_versions variable to include both the version you are currently running and the version you want to upgrade to, then run build-site.yml. After this completes, change active_version to the version you want to activate and execute build-site.yml once more. This triggers a rolling restart of the cluster.
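For example, upgrading a DSE cluster from 5.0.12 to 5.1.14 would mean editing group_vars/all/vars.yaml twice, running build-site.yml after each change (the versions here are just the ones from the sample variables above):

# Step 1: stage both versions, then run build-site.yml
dse:
  stage_versions: ["5.0.12", "5.1.14"]
  active_version: "5.0.12"

# Step 2: switch the active version, then run build-site.yml again
# to trigger the rolling restart
dse:
  stage_versions: ["5.0.12", "5.1.14"]
  active_version: "5.1.14"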
The other item that is worth explaining is the way I have defined my inventory.
[opscenter]
[storeseeds]
[storenonseeds]
[store:children]
storeseeds
storenonseeds
[clusterseeds]
10.101.33.124
[cluster:children]
DC1
DC2
[DC1]
10.101.33.124
10.101.32.53
[DC2]
This is a sample minimal inventory, but hopefully illustrates the intent.
- [opscenter] is optional and contains the IP address of a DataStax OpsCenter host.
- The [store…] groups are optional and contain the IP addresses of the seed and non-seed nodes that make up the storage cluster used by OpsCenter to store metrics.
- The [cluster…] groups are the section of the inventory defining the cluster group, i.e. the IP addresses that comprise the Cassandra cluster.
The seed nodes must be started first in order to build a cluster, so I created a separate inventory group, [clusterseeds], containing just these nodes. The cluster group itself uses a trick in the inventory definition: it is defined in terms of children. In this case the cluster group comprises the children DC1 and DC2.
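The seed-first ordering can be expressed with plays like the following. This is a sketch of the pattern, not the actual plays in build-site.yml; the role name is hypothetical, and serial: 1 is what makes the restart rolling rather than simultaneous.

# Hypothetical plays illustrating the seed-first, one-node-at-a-time pattern.
# The real build-site.yml roles and ordering may differ.
- hosts: clusterseeds
  serial: 1
  roles:
    - cassandra-start

- hosts: cluster
  serial: 1
  roles:
    - cassandra-start

Because the role would be idempotent, re-running it against the full cluster group after the seeds is harmless for nodes that are already up.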
I use these DC groups so that in my group_vars I can define variables that are DC-dependent. For example, the name of the DC is controlled by the variable dc, which differs between group_vars/DC1/vars.yaml and group_vars/DC2/vars.yaml.
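In the simplest case, the per-DC variable files need only set that one variable (other DC-specific settings would live alongside it):

# group_vars/DC1/vars.yaml
dc: DC1

# group_vars/DC2/vars.yaml
dc: DC2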
Throughout the playbook there are some interesting little tricks, and in general I think it is a good example of what is possible when leveraging Ansible. Take a look and by all means feel free to clone and hack away.
