Note to Readers:
This blog aims to deep-dive into Azure Cassandra MI, which became GA in Nov 2020. I have mostly referred to Datastax, Apache, and Microsoft’s official web pages to produce this content. If you feel any information is missing, please connect with me.
Today’s business is driven mainly by new ideas and innovative solutions that rely on information and intelligence gathered by an unending stream of Data flowing from various sources. Organizations can customize their solutions according to customer needs and markets. However, not always is this data in a structured manner. This is where NoSQL databases come into focus. There are many options available in the market for NoSQL databases. Apache Cassandra is one of the most sought-after non-relational databases today. Cassandra is a highly available and scalable distributed No-SQL databases system with no single point of failure. Unlike SQL databases, Cassandra clusters are schema-free and designed to handle large amounts of incoming data and hundreds or thousands of writes per second.
Facebook developed Cassandra to handle their Inbox search feature. It became open-sourced in 2008 until Apache Incubator accepted it in 2009. Since early 2010, Apache has been a top project due to its growing popularity.
Apache Cassandra Architecture:
Image source: Tutorialspoint.com
A Cassandra ring consists of homogeneous nodes connected in a peer-to-peer model. The schema in Cassandra is not fixed, and the queries performed on the cluster are based on the Partition+ Clustering key combination. Therefore, it doesn’t support Cross Partition Transactions, Distributed Joints, and Foreign Keys.
A Cassandra cluster can be deployed on a single VM, multiple VMs, Kubernetes, or the cloud (AWS, Azure). The main reason behind Cassandra’s distributed architecture is that a hardware failure can and does occur frequently. All distributed storage system works on the principle of the CAP theorem. CAP theorem states that any distributed system can powerfully deliver any two out of the three properties: Consistency, Availability, and Partition-tolerance. However, in the case of Cassandra, it maintains a wonderful balance between consistency, availability, and to an extent, Partition Tolerance.
- Consistency in Cassandra Cluster:
Cassandra is referred to as an “eventually consistent” system, which means that all the nodes will have the same data set at no point in time. However, the consistency level can be tuned for reading and writing operations by a database client. Here, the “Consistency Level (CL)” can be defined as the response from the number of replicas before a read/write operation is considered successful.
Commonly used CLs: One, Two, Three, All, Quorum, Local, Each.
Multiple copies of data are written to nodes on the cluster. Partition Tolerance:
Regarding Partition Tolerance, the Cassandra cluster continues to function even if a node or a group of nodes goes down.
The main components of a Cassandra cluster include the following:
- Node: the smallest unit of a Cassandra cluster or a single instance of Cassandra.
- Rack: a Cassandra rack is a logical grouping of nodes within the ring.
- Datacenter: is a logical set of racks.
- Cluster: is a component that contains one or more data centers.
- Seed Node: is used to bootstrap a node when it first joins a cluster.
- Keyspace: is a namespace that defines data replication on nodes.
- Snitch: defines groups of machines into data centers and racks (the topology) that the replication strategy uses to place replicas.
Internode Peer-to-Peer communication:
Internode communication takes place with the help of the “Gossip” protocol. Nodes constantly share state and location information about themselves and up to three other nodes. Moreover, the state information of each node is persisted locally to enable access to this information quickly by a node in case it restarts.
Image source: Cassandra.apache.org
When one of the replica nodes becomes unavailable, the write operations on this node go unsuccessful. In this situation, a coordinator in the cluster creates a “Hinted Handoff” and persists it locally for nearly 3 hours. Once the node comes back up and starts gossiping with other nodes, hints are played to this node until it is in sync with the other nodes.
For Read operations, if there is a mismatch in the data residing on the read replicas, a Read Repair is performed to ensure an UpToDate data is presented to the customer.
The incoming data into a Cassandra Cluster needs to be directed to one of the nodes for processing. The Partition key is taken care of in this task, also known as the Primary Key. Each node has an equal part of the Partition key hash value. A partition key is converted into a hash value by a Partitioner, issuing tokens and determining which node will be responsible for which token range. Murmur3Partitioner is the default Partitioner for Cassandra. Each node is assigned tokens that are signed integer values between -2^63 to +2^63-1.
For example, if there are 4 nodes in the cluster, the token range is between 0-99, node1 gets 0-24 key hash value, node 2 gets 25-49, etc.
Data is replicated across nodes for high availability. “Replication Factor (RF)” determines the total number of replicas maintained across the cluster in each keyspace. E.g., if RF is set to two at a datacenter level, two copies of data are maintained on two different nodes. Caution should be taken while setting the RF. It should never be less than one and never be more than the number of nodes in the cluster. RF can be set at the Keyspace level as well.
One copy of data resides on the owner node with a token range for that data. The remaining replicas are placed on other nodes in the cluster clockwise depending on the set Replication strategy.
Two main replication strategies are available:
- Simple Strategy: use only for a single data center and one rack.
- Network Strategy: used when there are clusters deployed multiple datacenters.
Along with the Replication Strategy, a “Snitch” helps route the requests efficiently as they keep an account of the network topology of the Cassandra cluster.
Read / Write Paths in a Cassandra Datacenter (Data Storage):
Writes in Cassandra Reads in Cassandra
Image source: baeldung.com
A coordinator (one of the Cassandra nodes) acts as a broker to facilitate read/write processes in a cluster for both reads and writes.
Both actions depend mainly on Consistency Level, Replication Factor, and the network topology configured per cluster. Cassandra is designed to be write-intensive and compared to other no-sql databases; the write performance is outstanding in its architectural design.
Steps involved in data Writes:
- When a client performs a write action, data is first written to the “commit log”, an append log to temporarily store the data in case of an unexpected node failure. Compressors like LZ4, Snappy, Deflate, and Zstd are compressed data while writing to commit logs. Commit log stores logs for multiple tables.
- A copy of the same data is also written into a cache system called “memtable”, which consists of partition index with a token which decides the location of the disk where the data is to be written. There is one memtable per table, and the data written to this in-memory system gets flushed to the SSTable when either the memtable or commit log reaches its configured threshold. This configuration can be set on Cassandra.yaml file.
- “SSTables” are immutable files that flush and store data permanently on the disk. As Cassandra is a group of nodes, all the SSTables are compacted into one SSTable before persisting it on the disk, and eventually, newer SSTables replace older SSTs. Each SSTable is made of this separate DB’s data.db, Index.db, Filter.db, Summary.db, TOC.txt, Statistics.db, CompressionInfo.db and Digest.crc32.
Steps involved in Data Reads:
- Reads are more time-consuming and long in Cassandra as there are multiple locations Cassandra tries to read to data from in a sequential manner.
- The first location data is read from is the “memtable.” Cassandra combines this data with the multiple SSTables across various nodes before presenting the result to the client if the data is present here. Based on the set Read Consistency Level, the operation is considered a success or failure.
- If the data is not present in the memtable, the “row cache” is next checked. Row cache is a subset of partition data stored on a disk in the SSTables.
- If the data is not present even in the row cache, then the “bloom filter” is inspected to see which of the SSTables has the partition data that is being requested. It helps narrow down the process of partition key lookup, with the probability of either finding or not finding certain partition data.
- The three other locations Cassandra tries to locate the requested data are related to the “Partition Cache” (stores a cache of the partition index in off-heap memory), “Partition Summary” (again off-heap in-memory structure that stores a sampling of the Partition index). Both of these structures help in reducing the time to find the location of the Partition data location.
- Through the “Partition summary,” the possible range of the Partition key is found, and then the Partition index is searched for the desired Partition key. This Index resides on the disk with mapping all the Partition keys to their offsets. Cassandra goes through a single read through the narrowed range of Partition keys, finds the information, and moves to the Compression offset map to find the actual data block on disk, which is usually compressed.
- Finally, the exact location of the desired data is found on the disk using pointers stored on the “Compression offset map,” which fetches the data from the correct SSTable using either the Partition Key Cache or Partition Key Index. Finally, the read query is getting the requested result.
Apache Cassandra’s structure is “built to scale” and handles massive amounts of data. Every corporation stores huge amounts of data across various systems and requires providing control and access to users to this data; Apache Cassandra makes this a reality without a single point of failure, very less downtime, and significantly improved uptime. Due its varied benefits major companies are utilizing Apache Cassandra to leverage their business requirements.
This blog is a 3-part series for Apache Cassandra’s No-SQL database. The first part gives the reader a brief overview of Open-Source Cassandra. The subsequent parts will deal with Azure Managed Instance for Apache Cassandra features, Use cases, Installation, Connectivity, and Migration options.
We here at CloudThat are the official AWS (Amazon Web Services) Advanced Consulting Partner and Training partner and Microsoft gold partner, helping people develop knowledge on cloud and help their businesses aim for higher goals using best in industry cloud computing practices and expertise. We are on a mission to build a robust cloud computing ecosystem by disseminating knowledge on technological intricacies within the cloud space. Our blogs, webinars, case studies, and white papers enable all the stakeholders in the cloud computing sphere.
Drop a query if you have any questions regarding Apache Cassandra utilization, and I will get back to you quickly. To get started, go through our Expert Advisory page and Managed Services Package that is CloudThat’s offerings.