Azure Managed Instance For Apache Cassandra – Part 2

June 6, 2022

Note to Readers:

This blog is part of a 3-part series on Apache Cassandra, the open-source NoSQL database. In Part 1, I gave a brief overview of open-source Apache Cassandra. This part of the series covers Azure Managed Instance for Apache Cassandra: its features, use cases, installation, and connectivity.

Azure Managed Instance for Apache Cassandra is one of the many managed services offered by Microsoft Azure. Since the service became generally available (GA) in November 2021, features have been added to it continuously. I was part of the team that worked closely with the Microsoft Cosmos DB team to evaluate Cassandra MI features for production use. The content of this blog draws mostly on the official DataStax, Apache, and Microsoft documentation.

Happy Reading!!

TABLE OF CONTENTS

1. Introduction
2. Deploying Cassandra MI
3. Key Considerations while Deploying Cassandra MI Resource
4. Steps to Deploy a Cassandra Datacenter on Azure Portal
5. Steps for connecting to the Cassandra Datacenter
6. Assessing Node and Datacenter Details with Azure CLI
7. CQLSH Commands
8. Conclusion
9. About CloudThat

Introduction:

Cassandra MI, or Azure Managed Instance for Apache Cassandra, is a highly scalable, highly available service fully managed by Azure. The cluster can be deployed automatically, and Microsoft Azure takes care of cluster updates, patching, replication, and cluster health. Customers can use all the features of open-source Cassandra in Cassandra MI. They can also create a hybrid deployment by extending their on-premises Cassandra cluster to the Azure cloud and running their datacenters with minimal maintenance overhead.

Deploying Cassandra MI:

Cassandra MI deployment prerequisites:

Azure Services Needed:

  1. Azure Subscription
  2. Azure Resource Group
  3. Azure Virtual Network with a /16 address space
  4. At least one subnet with a /24 address space
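The address-space prerequisites above can be sanity-checked before deployment. Below is a minimal Python sketch using only the standard ipaddress module; the helper name check_network_plan and the sample CIDR ranges are hypothetical, and it checks nothing beyond the /16 VNet and /24 subnet sizes listed above and that the subnet lies inside the VNet.

```python
import ipaddress
from typing import List


def check_network_plan(vnet_cidr: str, subnet_cidr: str) -> List[str]:
    """Return problems found in a planned VNet/subnet pair (empty list = none found).

    Hypothetical helper: checks only the sizes listed in this post
    (a /16 VNet and a /24 subnet) and that the subnet sits inside the VNet.
    """
    vnet = ipaddress.ip_network(vnet_cidr)
    subnet = ipaddress.ip_network(subnet_cidr)
    problems = []
    # A larger prefix length means a smaller network, so /17 and above are too small.
    if vnet.prefixlen > 16:
        problems.append(f"VNet {vnet} is smaller than a /16")
    if subnet.prefixlen > 24:
        problems.append(f"subnet {subnet} is smaller than a /24")
    # subnet_of() (Python 3.7+) checks that the subnet's range lies within the VNet's range.
    if not subnet.subnet_of(vnet):
        problems.append(f"subnet {subnet} is not inside VNet {vnet}")
    return problems


if __name__ == "__main__":
    print(check_network_plan("10.0.0.0/16", "10.0.1.0/24"))  # []
    print(check_network_plan("10.0.0.0/16", "10.1.0.0/24"))
```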

IAM Permissions and NSG Settings:

  1. The “Network Contributor” IAM role must be assigned to the Azure Cosmos DB service principal on the virtual network.
  2. Network Security Group settings:
    a. Outbound rules should allow HTTPS on port 443 to the following service tags and addresses:
      i. Storage
      ii. AzureKeyVault
      iii. EventHub
      iv. AzureMonitor
      v. AzureActiveDirectory
      vi. AzureResourceManager
      vii. AzureFrontDoor.Firstparty
      viii. GuestAndHybridManagement
      ix. ApiManagement
      x. IP address ranges 104.40.0.0/13, 13.104.0.0/14, and 40.64.0.0/10
    b. Cassandra MI listens on the ports below for its various services, so these ports should not be assigned to any other service or resource:
      i. 8443 – httpsCA (HTTPS with client authentication)
      ii. 9443 – HTTPS servlet transport
      iii. 7001 – TLS internode communication (used if TLS is enabled)
      iv. 9042 – CQL native transport port
      v. 7199 – JMX
      vi. 9160 – Thrift client API
      vii. 7000 – Internode communication
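To catch port clashes early, the port list above can be kept as data and compared against the ports that other workloads in the same subnet are planned to use. This is a hypothetical Python helper (find_port_conflicts and the sample service names are illustrative assumptions); it does not query Azure or the NSG.

```python
from typing import Dict, List

# Ports this post lists as used by Cassandra MI services.
CASSANDRA_MI_PORTS: Dict[int, str] = {
    8443: "httpsCA (HTTPS with client authentication)",
    9443: "HTTPS servlet transport",
    7001: "TLS internode communication",
    9042: "CQL native transport",
    7199: "JMX",
    9160: "Thrift client API",
    7000: "internode communication",
}


def find_port_conflicts(planned: Dict[str, int]) -> List[str]:
    """Report planned services whose port collides with a Cassandra MI port."""
    conflicts = []
    for service, port in sorted(planned.items()):
        if port in CASSANDRA_MI_PORTS:
            conflicts.append(
                f"{service} on port {port} clashes with Cassandra MI "
                f"({CASSANDRA_MI_PORTS[port]})"
            )
    return conflicts


if __name__ == "__main__":
    # Hypothetical services planned for the same subnet.
    plan = {"metrics-agent": 9443, "web-ui": 8080}
    for line in find_port_conflicts(plan):
        print(line)
```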

Key Considerations while Deploying Cassandra MI Resource:

  1. The VM SKU sizes available for Cassandra MI nodes:
    1. Standard_E8s_v4
    2. Standard_E16s_v4
    3. Standard_E20s_v4
    4. Standard_E32s_v4
    5. Standard_DS13_v2
    6. Standard_DS14_v2
    7. Standard_D8s_v4
    8. Standard_D16s_v4
    9. Standard_D32s_v4
  2. Only Premium SSD (P30) disks can be used with Cassandra MI.
  3. When deploying Cassandra MI through the Azure Portal, a datacenter is limited to 100 nodes. If more nodes are needed, use Azure CLI.
  4. On the Azure Portal, only the Standard_DS14_v2 SKU size can have up to 64 disks attached; all other SKU sizes are limited to 32 disks. Use Azure CLI to attach a custom number of disks.
  5. For cluster database management, the nodetool commands are in public preview. Microsoft Azure does not guarantee any SLA for running them, because they can destabilize the cluster, so they are best suited to non-production environments. Only a limited set of nodetool commands is allowed on Cassandra MI.
  6. Because Cassandra MI is a managed service, the gossip protocol settings for internode communication, the certificates for secure internode communication, token generation by the partitioner, repair of cluster nodes, backup and retention of database data, and other native Cassandra operations are fully managed by the Azure platform.
  7. For monitoring the cluster, Azure Monitor can be integrated with a Log Analytics workspace.
  8. For logging and visualizing cluster activity, a Prometheus server is created with the cluster by default. Its endpoint is exposed, so node- and datacenter-related metrics can be visualized in Grafana.
  9. Horizontal scaling is achieved by increasing the number of nodes, and vertical scaling by moving to a larger SKU size. For vertical scaling, a new cluster must be deployed with the new SKU size, and the old cluster is retired after the workloads have been moved to it. Both kinds of scaling involve some downtime.
  10. If a node goes down, Azure adds a new node using the incremental backups taken every 2 hours, without impacting the stability or availability of the cluster.
  11. A datacenter needs a minimum of 3 nodes and a minimum of 2 P30 Premium SSD disks, using one of the SKU sizes listed above.
  12. Availability Zones are not available in all regions, so for high availability requirements, deploy the cluster in a region that supports Availability Zones.
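Several of the considerations above are simple numeric limits, so a deployment plan can be checked against them in code. The sketch below encodes only the limits stated in this post (the SKU list, the 3-node minimum, and the 100-node and 32/64-disk portal limits); it assumes the 2-disk minimum applies per node, and DatacenterPlan and validate_plan are hypothetical names. Azure's limits may change, so treat the portal and official documentation as the source of truth.

```python
from dataclasses import dataclass
from typing import List

# VM SKU sizes listed in this post for Cassandra MI nodes (subject to change by Azure).
SUPPORTED_SKUS = {
    "Standard_E8s_v4", "Standard_E16s_v4", "Standard_E20s_v4", "Standard_E32s_v4",
    "Standard_DS13_v2", "Standard_DS14_v2",
    "Standard_D8s_v4", "Standard_D16s_v4", "Standard_D32s_v4",
}


@dataclass
class DatacenterPlan:
    sku: str
    node_count: int
    disks_per_node: int
    via_portal: bool = True  # False means the datacenter is deployed with Azure CLI


def validate_plan(plan: DatacenterPlan) -> List[str]:
    """Return the limits from this post that the plan violates (empty list = none)."""
    issues = []
    if plan.sku not in SUPPORTED_SKUS:
        issues.append(f"{plan.sku} is not a supported SKU size")
    if plan.node_count < 3:
        issues.append("a datacenter needs at least 3 nodes")
    # Assumption: the 2 x P30 minimum is interpreted per node.
    if plan.disks_per_node < 2:
        issues.append("at least 2 P30 disks are required")
    if plan.via_portal:
        if plan.node_count > 100:
            issues.append("the portal allows at most 100 nodes per datacenter; use Azure CLI")
        max_disks = 64 if plan.sku == "Standard_DS14_v2" else 32
        if plan.disks_per_node > max_disks:
            issues.append(f"the portal allows at most {max_disks} disks for {plan.sku}; use Azure CLI")
    return issues


if __name__ == "__main__":
    print(validate_plan(DatacenterPlan("Standard_DS14_v2", node_count=3, disks_per_node=4)))
```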

Steps to Deploy a Cassandra Datacenter on Azure Portal:

  1. Select Azure Managed Instance for Apache Cassandra from the “All Resources” section.
  2. Select the subscription, resource group, and the virtual network/subnet created earlier.
  3. Enter a cluster name that follows your standard naming convention, and an initial password for the admin login to the cluster.
  4. Select the region based on your business requirements. Also, tick the “Assign Role” box so that the Azure Cosmos DB service principal is granted the Network Contributor role it needs.
  5. Change the default datacenter name if needed.
  6. Tick the Availability Zone box if the cluster has high availability requirements.
  7. Select the SKU size, the number of disks, and the number of nodes by dragging the sliders or entering the desired numbers directly in the boxes.
  8. Add tags as required by the business, and finally review and create the cluster.

Steps for connecting to the Cassandra Datacenter:

  • Cassandra MI nodes are not assigned public IP addresses, so accessing the cluster requires a jump server in the same (or a peered) virtual network with the “cqlsh” command-line utility installed.
  • Deploy a small Ubuntu VM (for example, Standard_DS1_v2) and SSH into it.
  • Install “cqlsh” and its dependencies on the jump server.

  • Azure CLI version 2.30.0 or later is required for managing Cassandra MI resources.

  • Connect to the cluster with cqlsh to manage the database.
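A minimal Python sketch of how the connection step might be scripted from the jump server, assuming cqlsh is on the PATH. It follows the common Cassandra MI pattern of connecting on port 9042 as the built-in cassandra admin user with --ssl, and sets cqlsh's SSL_VERSION and SSL_VALIDATE environment variables; the helper names build_cqlsh_command and run_cqlsh are illustrative assumptions.

```python
import os
import subprocess
from typing import Dict, List, Tuple


def build_cqlsh_command(host: str, password: str, user: str = "cassandra",
                        port: int = 9042) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, extra_env) for a TLS cqlsh session against one Cassandra MI node."""
    # Note: a password passed with -p is visible in the process list on the jump server.
    argv = ["cqlsh", host, str(port), "-u", user, "-p", password, "--ssl"]
    # cqlsh reads these variables for its TLS settings; SSL_VALIDATE=false skips
    # server certificate validation, which is acceptable only for a quick test.
    extra_env = {"SSL_VERSION": "TLSv1_2", "SSL_VALIDATE": "false"}
    return argv, extra_env


def run_cqlsh(host: str, password: str) -> int:
    """Start an interactive cqlsh session and return its exit code (needs cqlsh on PATH)."""
    argv, extra_env = build_cqlsh_command(host, password)
    result = subprocess.run(argv, env={**os.environ, **extra_env})
    return result.returncode
```

Calling run_cqlsh with a node's private IP address and the initial admin password would open an interactive session. Disabling certificate validation is convenient for a first connection test, but production access should validate the server certificate.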

Assessing Node and Datacenter Details with Azure CLI:

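 Azure CLI commands for Cassandra MI follow the pattern az managed-cassandra <cluster | datacenter> <verb>, where <verb> is an operation such as create, list, show, update, or delete. Creating a cluster or a datacenter also requires the resource ID of the delegated subnet created earlier.

az managed-cassandra cluster create --cluster-name <name> --resource-group <name> --location <Azure region> --delegated-management-subnet-id <subnet resource ID> --initial-cassandra-admin-password <password>

Example: az managed-cassandra cluster create --cluster-name cassandracls01 --resource-group Cassandra-RG --location centralindia --delegated-management-subnet-id <subnet resource ID> --initial-cassandra-admin-password <password>

az managed-cassandra datacenter create --cluster-name <name> --resource-group <name> --data-center-name <name> --data-center-location <Azure region> --delegated-subnet-id <subnet resource ID> --node-count <no. of nodes>

Example: az managed-cassandra datacenter create --cluster-name cassandracls01 --resource-group Cassandra-RG --data-center-name datacenter01 --data-center-location centralindia --delegated-subnet-id <subnet resource ID> --node-count 3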
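When the cluster and datacenter commands are scripted, building the argument list in code keeps the required flags together. This Python sketch only assembles argv lists for az managed-cassandra cluster create and datacenter create; the helper names, the centralindia region, and the placeholder subnet ID are assumptions for illustration, while the delegated-subnet flags are the ones Azure CLI requires for these two commands.

```python
import shlex
from typing import List


def cluster_create_argv(cluster: str, resource_group: str, location: str,
                        management_subnet_id: str, admin_password: str) -> List[str]:
    """argv for 'az managed-cassandra cluster create'."""
    return [
        "az", "managed-cassandra", "cluster", "create",
        "--cluster-name", cluster,
        "--resource-group", resource_group,
        "--location", location,
        "--delegated-management-subnet-id", management_subnet_id,
        "--initial-cassandra-admin-password", admin_password,
    ]


def datacenter_create_argv(cluster: str, resource_group: str, dc_name: str,
                           dc_location: str, subnet_id: str, node_count: int) -> List[str]:
    """argv for 'az managed-cassandra datacenter create'."""
    # This post notes that a datacenter needs at least 3 nodes.
    if node_count < 3:
        raise ValueError("a Cassandra MI datacenter needs at least 3 nodes")
    return [
        "az", "managed-cassandra", "datacenter", "create",
        "--cluster-name", cluster,
        "--resource-group", resource_group,
        "--data-center-name", dc_name,
        "--data-center-location", dc_location,
        "--delegated-subnet-id", subnet_id,
        "--node-count", str(node_count),
    ]


if __name__ == "__main__":
    subnet = "<subnet resource ID>"  # placeholder for the delegated subnet's full resource ID
    argv = datacenter_create_argv("cassandracls01", "Cassandra-RG", "datacenter01",
                                  "centralindia", subnet, 3)
    # shlex.join (Python 3.8+) renders the argv as a copy-pasteable shell command.
    print(shlex.join(argv))
```

To run a command instead of printing it, the list can be passed to subprocess.run(argv, check=True) on a machine where Azure CLI is installed and signed in with az login.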

CQLSH Commands:

  1. General cqlsh commands:
    a. CAPTURE '<file name>' / CAPTURE OFF: to start or stop capturing query output to a file
    b. CONSISTENCY [<level>]: to show or set the consistency level for the current cqlsh session
    c. COPY <table name> [(<column>, ...)] TO | FROM '<file name>' [WITH <copy option> [AND <copy option> ...]]: to export data from a Cassandra table to a file (TO) or import data from a file into a table (FROM)
    d. CLEAR: to clear the screen
    e. EXIT: to exit the cqlsh command line
    f. DESCRIBE: to describe the cluster, schema, keyspaces, tables, and other objects:
      i. DESCRIBE CLUSTER
      ii. DESCRIBE SCHEMA
      iii. DESCRIBE KEYSPACES / DESCRIBE KEYSPACE <keyspace name>
      iv. DESCRIBE TABLES / DESCRIBE TABLE <table name>
      v. DESCRIBE INDEX <index name>
      vi. DESCRIBE MATERIALIZED VIEW <view name>
      vii. DESCRIBE TYPES / DESCRIBE TYPE <type name>
      viii. DESCRIBE FUNCTIONS / DESCRIBE FUNCTION <function name>
      ix. DESCRIBE AGGREGATES / DESCRIBE AGGREGATE <aggregate function name>
    g. HELP: to get help on commands
    h. SOURCE '<file name>': to execute CQL commands from a file
    i. SHOW VERSION / SHOW HOST / SHOW SESSION <session id>: to show the cqlsh and Cassandra versions, the host the session is connected to, or the details of a traced session
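The COPY syntax above is easy to get wrong by hand. A small, hypothetical Python helper (build_copy) that renders a COPY command with a HEADER option is sketched below; the keyspace, table, and file names are placeholders. COPY is a cqlsh shell command rather than a CQL statement, so the resulting string must be run inside cqlsh, not sent through a driver.

```python
from typing import Optional, Sequence


def build_copy(table: str, filename: str, direction: str = "TO",
               columns: Optional[Sequence[str]] = None, header: bool = True) -> str:
    """Render a cqlsh COPY command: TO exports a table to CSV, FROM imports CSV into a table."""
    direction = direction.upper()
    if direction not in ("TO", "FROM"):
        raise ValueError("direction must be 'TO' (export) or 'FROM' (import)")
    # Keep quoting simple: this sketch does not escape quotes inside file names.
    if "'" in filename:
        raise ValueError("file names containing single quotes are not handled by this sketch")
    column_list = f" ({', '.join(columns)})" if columns else ""
    header_value = "true" if header else "false"
    return f"COPY {table}{column_list} {direction} '{filename}' WITH HEADER = {header_value};"


if __name__ == "__main__":
    print(build_copy("cloudthat_ks.employees", "employees.csv", columns=["emp_id", "first_name"]))
```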
  2. Commands to create and manage keyspaces:
    a. Create a keyspace with a replication strategy, replication factor, and datacenter details (NetworkTopologyStrategy sets a replication factor per datacenter and suits production and multi-datacenter clusters; SimpleStrategy takes a single cluster-wide replication factor):

CREATE KEYSPACE <keyspace name> WITH replication = {'class': 'NetworkTopologyStrategy', '<datacenter_name>': <replication_factor>} AND durable_writes = true;

CREATE KEYSPACE <keyspace name> WITH replication = {'class': 'SimpleStrategy', 'replication_factor': <RF no.>};

durable_writes defaults to true; setting it to false bypasses the commit log for the keyspace and risks data loss.
b. Alter a keyspace:

ALTER KEYSPACE <keyspace name> WITH replication = {'class': 'NetworkTopologyStrategy', '<datacenter_name>': <replication_factor>};
c. Delete a keyspace:

DROP KEYSPACE <keyspace name>;
d. Other commands:
i. USE <keyspace name>; – to use a particular keyspace
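For multi-datacenter clusters, the NetworkTopologyStrategy form takes one replication factor per datacenter. Below is a minimal Python sketch that renders that statement from a mapping of datacenter names to replication factors; create_keyspace_cql is a hypothetical helper, cloudthat_ks and datacenter01 are example names, and IF NOT EXISTS is added so the script can be rerun safely.

```python
from typing import Dict


def create_keyspace_cql(keyspace: str, dc_replication: Dict[str, int],
                        durable_writes: bool = True) -> str:
    """Render CREATE KEYSPACE with NetworkTopologyStrategy and one replication factor per datacenter."""
    if not dc_replication:
        raise ValueError("at least one datacenter is required")
    for dc, rf in dc_replication.items():
        if rf < 1:
            raise ValueError(f"replication factor for {dc} must be at least 1")
    options = ["'class': 'NetworkTopologyStrategy'"]
    # Datacenter names must match the cluster's datacenter names exactly.
    options += [f"'{dc}': {rf}" for dc, rf in dc_replication.items()]
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {{{', '.join(options)}}} "
        f"AND durable_writes = {str(durable_writes).lower()};"
    )


if __name__ == "__main__":
    print(create_keyspace_cql("cloudthat_ks", {"datacenter01": 3}))
```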

3. Viewing the keyspaces:
a. List the keyspaces (system keyspaces are listed together with user keyspaces):

DESCRIBE KEYSPACES;

SELECT * FROM system_schema.keyspaces;
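Because the keyspace listing includes Cassandra's built-in keyspaces, it is often useful to separate them from user keyspaces. This Python sketch filters out the standard Apache Cassandra system keyspaces by name; the list is an assumption based on Cassandra 3.11/4.0, user_keyspaces is a hypothetical helper, and a managed cluster may also contain other service-owned keyspaces that it does not recognize.

```python
from typing import Iterable, List

# Keyspaces Apache Cassandra creates for its own use (system_views and
# system_virtual_schema are the virtual keyspaces added in 4.0).
SYSTEM_KEYSPACES = frozenset({
    "system", "system_auth", "system_distributed", "system_schema",
    "system_traces", "system_views", "system_virtual_schema",
})


def user_keyspaces(names: Iterable[str]) -> List[str]:
    """Return the keyspace names that are not built-in Cassandra system keyspaces, sorted."""
    return sorted(name for name in names if name not in SYSTEM_KEYSPACES)


if __name__ == "__main__":
    print(user_keyspaces(["system", "cloudthat_ks", "system_schema"]))
```

With the DataStax Python driver (cassandra-driver), the input could come from session.execute("SELECT keyspace_name FROM system_schema.keyspaces"), passing row.keyspace_name for each returned row.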

4. Commands to create, alter, and delete tables and columns:

CREATE TABLE <keyspace_name>.<table_name> (<column_name> <data_type> PRIMARY KEY, <column_name> <data_type>, ...); – to create a table

ALTER TABLE <table_name> ALTER <column_name> TYPE <data_type>; – to change the type of a column (not supported from Cassandra 3.10 onward, including 4.0; add a new column of the required type instead)

ALTER TABLE <table_name> ADD <column_name> <data_type>; – to add a column to a table

ALTER TABLE <table_name> DROP <column_name>; – to remove a column

DROP TABLE <table_name>; – to delete a table

TRUNCATE <keyspace_name>.<table_name>; – to delete all rows from a table
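The one-line CREATE TABLE template covers only a single-column primary key. The Python sketch below, a hypothetical helper, renders a CREATE TABLE statement with an explicit PRIMARY KEY clause, wrapping a composite partition key in its own parentheses and appending any clustering columns; the column names and types used in the example are illustrative.

```python
from typing import Dict, Sequence


def create_table_cql(keyspace: str, table: str, columns: Dict[str, str],
                     partition_key: Sequence[str],
                     clustering_columns: Sequence[str] = ()) -> str:
    """Render CREATE TABLE with an explicit PRIMARY KEY clause."""
    if not partition_key:
        raise ValueError("a table needs at least one partition key column")
    undefined = [c for c in (*partition_key, *clustering_columns) if c not in columns]
    if undefined:
        raise ValueError(f"key columns not defined: {undefined}")
    column_defs = ", ".join(f"{name} {cql_type}" for name, cql_type in columns.items())
    # A composite partition key must be wrapped in its own parentheses.
    if len(partition_key) == 1:
        key = partition_key[0]
    else:
        key = "(" + ", ".join(partition_key) + ")"
    primary_key = ", ".join([key, *clustering_columns])
    return f"CREATE TABLE IF NOT EXISTS {keyspace}.{table} ({column_defs}, PRIMARY KEY ({primary_key}));"


if __name__ == "__main__":
    cols = {"emp_id": "text", "first_name": "text", "age": "text"}
    print(create_table_cql("cloudthat_ks", "cloudthat", cols, ["emp_id"]))
```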

5. Insert, update, and delete data in tables:
a. INSERT INTO <keyspace_name>.<table_name> (<column_name1>, <column_name2>, ...) VALUES ('value1', 'value2', ...); – the number of values must match the number of columns

Example: INSERT INTO CloudThat (emp_id, first_name, age, email_id) VALUES ('00230', 'Naman', '24', 'naman.b@cloudthat.com');
b. UPDATE <keyspace_name>.<table_name> SET <column_name> = '<new_value>' WHERE <primary_key_column> = '<primary_key_value>';

Example: UPDATE CloudThat SET last_name = 'Sharma' WHERE emp_id = '00230';
c. DELETE <column_name> FROM <keyspace_name>.<table_name> WHERE <primary_key_column> = '<primary_key_value>';

Example: DELETE age FROM CloudThat WHERE emp_id = '00230';
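In application code, values are usually bound to a prepared statement rather than written into the CQL text. These hypothetical Python helpers build INSERT, UPDATE, and DELETE statements with ? placeholders, generating one placeholder per column so the column and value counts cannot drift apart; table and column names in the example are taken from the statements above.

```python
from typing import Sequence


def _where(key_columns: Sequence[str]) -> str:
    # Restrict each given key column with a bound placeholder.
    return " AND ".join(f"{c} = ?" for c in key_columns)


def insert_cql(table: str, columns: Sequence[str]) -> str:
    # One placeholder per column, so the column and value counts always match.
    placeholders = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders});"


def update_cql(table: str, set_columns: Sequence[str], key_columns: Sequence[str]) -> str:
    assignments = ", ".join(f"{c} = ?" for c in set_columns)
    return f"UPDATE {table} SET {assignments} WHERE {_where(key_columns)};"


def delete_cql(table: str, key_columns: Sequence[str], columns: Sequence[str] = ()) -> str:
    # With no columns listed, the whole row is deleted; otherwise only those columns are cleared.
    target = ", ".join(columns)
    prefix = f"DELETE {target} " if target else "DELETE "
    return f"{prefix}FROM {table} WHERE {_where(key_columns)};"


if __name__ == "__main__":
    print(insert_cql("CloudThat", ["emp_id", "first_name", "age", "email_id"]))
    print(delete_cql("CloudThat", ["emp_id"], ["age"]))
```

With the DataStax Python driver, for example, stmt = session.prepare(insert_cql("CloudThat", ["emp_id", "first_name"])) followed by session.execute(stmt, ("00230", "Naman")) would insert a row, assuming a connected session and a matching table.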

Conclusion:

To summarize, Azure Managed Instance for Apache Cassandra can be set up and managed with little effort on the Azure cloud, or an on-premises Cassandra cluster can be extended to it through hybrid connectivity. This blog covered setting up a Cassandra MI cluster using the Azure Portal and Azure CLI, connecting to the cluster, and performing CRUD operations on the database. The next and final blog will focus on options for migrating data into a Cassandra cluster using native and third-party tools.

About CloudThat:

We here at CloudThat are an official Microsoft Gold Partner, AWS (Amazon Web Services) Advanced Consulting Partner, and Training Partner, helping people develop knowledge of the cloud and helping their businesses aim for higher goals using best-in-industry cloud computing practices and expertise. We are on a mission to build a robust cloud computing ecosystem by disseminating knowledge on technological intricacies within the cloud space. Our blogs, webinars, case studies, and white papers enable all the stakeholders in the cloud computing sphere.

Drop a query if you have any questions about using Apache Cassandra, and I will get back to you quickly. To get started, go through our Expert Advisory page and Managed Services Package, which are among CloudThat's offerings.

