Zoomcar – A DevOps Case Study

November 11, 2021

TABLE OF CONTENTS

1. Overview
2. Risk Assessment
3. Proposed Solution
4. Third-Party Solutions Used
5. AWS Services Used
6. Architecture Diagram
7. Results and Outcome
8. Conclusion
9. About CloudThat

 

Overview

The client is an Indian self-drive car rental company headquartered in Bangalore, India. Zoomcar holds the distinction of being India’s first personal mobility platform, having introduced car-sharing services in 2013. With a strong focus on the mobile experience, Zoomcar allows users to rent cars by the hour, day, week, or month. In 2018, Zoomcar introduced India’s first peer-to-peer marketplace for cars with the launch of its shared subscription mobility model.

The client wanted to improve how its teams work by adopting industry-standard best practices. The team has built applications mainly in Java, Python, Go, and PHP, to be deployed on AWS. However, in implementing these applications, the client faced constraints in expertise, resources, and budget. To address them, the client reached out to CloudThat to provide an end-to-end team, streamline its processes with DevOps best practices, and achieve overall project excellence.

Risk Assessment

CloudThat prepared a carefully designed risk assessment checklist for the client to understand its existing IT environment. The client’s detailed responses shed light on the security posture, networking processes, connectivity models, security group management, network analysis, continuous monitoring systems, defined SLAs, and more.

Proposed Solution

CloudThat’s team worked closely with the client’s development and IT teams to adhere to DevOps best practices and deploy the new application on AWS infrastructure. The work was split into domains: infrastructure, monitoring, observability, security, and networking.

  • Applications must be deployed in multiple environments like development, QA, UAT, and production.
  • Use an egress VPC to expose internal services to the internet and have all the applications in different VPCs with proper network segregation
  • Implement highly available, scalable, fault-tolerant microservices deployed on Amazon EKS clusters across multiple environments with horizontal pod autoscaling and cluster autoscaling enabled
  • Implement a branching strategy that covers the cycle from application development to release
  • Implement continuous integration, continuous delivery, and continuous deployment to support hotfixes, rollback on failure, and multi-environment deployment, following DevOps best practices
  • Manage Infrastructure as Code with Terraform to deploy the infrastructure for all services
  • Segregate applications at the network level and route internal application traffic over the AWS backbone network
  • Put infrastructure monitoring in place with Amazon CloudWatch, Prometheus, and Grafana, with alerts sent to email and Slack channels through Prometheus Alertmanager.
  • Implement application performance monitoring in New Relic to understand and trace dependencies across the distributed system in order to detect anomalies, reduce latency, squash errors, and optimize the customer experience
  • Set up Amazon OpenSearch Service to collect application logs shipped by Fluent Bit, and use OpenSearch Dashboards to analyze and visualize them
  • Implement PagerDuty for on-call management, incident response, event management, and operational analytics.
  • Use Akamai as a content delivery network (CDN) and for web and internet security services. Using it for bot detection and alerting, to guard mobile and web applications against malicious attacks, gives the business an extra edge.
  • Enable logging at every point possible and plan for log storage along with the retention policy and access management.
  • Have a backup strategy and a disaster recovery infrastructure in place
  • Use Jira boards for planning, tracking, releasing, and reporting issues so that workflows stay synchronized across teams
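
To make the horizontal pod autoscaling item in the list above more concrete, here is a minimal Python sketch of the replica calculation described in the Kubernetes documentation: desired replicas = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance (0.1 by default). The function name, the replica bounds, and the sample numbers are illustrative assumptions, not values from the client's clusters.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the Kubernetes HPA scaling decision for a single metric.

    min_replicas and max_replicas are hypothetical bounds for illustration.
    """
    ratio = current_metric / target_metric
    # Inside the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical example: 4 pods averaging 90% CPU against a 60% target.
    print(desired_replicas(4, current_metric=90, target_metric=60))
```

Cluster autoscaling then adds or removes worker nodes when the resulting pods cannot be scheduled or when nodes sit underused, which is why the two are usually enabled together.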

Third-Party Solutions Used

  • Akamai
  • Prometheus
  • New Relic
  • Apache Kafka
  • Jenkins
  • Ansible
  • Terraform

AWS Services Used

  • Highly available, scalable, fault-tolerant microservices are deployed on Amazon EKS clusters across multiple environments with autoscaling and Application Load Balancers in place.
  • Tightly scoped security groups are configured as virtual firewalls for all AWS workloads to control incoming and outgoing traffic.
  • AWS Transit Gateway provides secure connectivity between the VPCs of different applications.
  • VPC endpoints connect the VPCs privately to supported AWS services, securing data transfer and reducing cost and latency
  • For greater control over the databases, MongoDB and MemSQL were deployed as clusters on EC2 instances to support failover
  • Amazon OpenSearch Service collects application logs shipped by Fluent Bit, and OpenSearch Dashboards is used to analyze and visualize them
  • To ensure high availability, Apache services such as Kafka and ZooKeeper are deployed as clusters on EC2
  • AWS Systems Manager is used for vulnerability scanning and patch management
  • AWS CloudTrail logging is enabled for all regions, with logs stored in a central S3 bucket and log file integrity validation turned on
  • Network security for EKS involves two elements: a set of rules that restricts traffic flow between services, and encryption of data in transit
  • Network traffic into and out of the AWS account is analyzed using ELB access logs and VPC Flow Logs. Each log record contains information about the traffic traversing the network
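
As an illustration of the flow log analysis mentioned in the list above, the following Python sketch parses records in the default VPC Flow Logs format (version 2, fourteen space-separated fields) and totals rejected packets per destination port. The helper names and the sample lines are hypothetical; the sample records use placeholder account and interface IDs, not data from the client's account.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# Field order of the default (version 2) VPC flow log record.
FIELDS = (
    "version", "account-id", "interface-id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log-status",
)


class FlowRecord(NamedTuple):
    srcaddr: str
    dstaddr: str
    srcport: int
    dstport: int
    protocol: int
    packets: int
    byte_count: int
    action: str


def parse_flow_record(line: str) -> Optional[FlowRecord]:
    """Return a FlowRecord, or None for headers, NODATA/SKIPDATA, or bad lines."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    raw = dict(zip(FIELDS, parts))
    if raw["log-status"] != "OK":
        return None
    return FlowRecord(
        raw["srcaddr"], raw["dstaddr"], int(raw["srcport"]), int(raw["dstport"]),
        int(raw["protocol"]), int(raw["packets"]), int(raw["bytes"]), raw["action"],
    )


def rejected_packets_by_port(lines: Iterable[str]) -> Counter:
    """Sum packets of REJECT records, keyed by destination port."""
    counts: Counter = Counter()
    for line in lines:
        record = parse_flow_record(line)
        if record is not None and record.action == "REJECT":
            counts[record.dstport] += record.packets
    return counts


# Hypothetical sample input: one accepted SSH flow, one rejected RDP flow,
# and one NODATA record that the parser skips.
SAMPLE_LINES = [
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK",
    "2 123456789010 eni-1235b8ca123456789 172.31.9.69 172.31.9.12 49761 3389 6 20 4249 1418530010 1418530070 REJECT OK",
    "2 123456789010 eni-1235b8ca123456789 - - - - - - - 1431280876 1431280934 - NODATA",
]

if __name__ == "__main__":
    print(rejected_packets_by_port(SAMPLE_LINES))
```

A summary like this is one way to spot ports that attract blocked traffic, which in turn can help confirm that security groups are scoped as tightly as intended.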

Architecture Diagram

  1. CI/CD Flow Diagram
    [Figure: CI/CD flow diagram]

  2. High-Level Architecture
    [Figure: High-level architecture]

  3. VPN Connection Diagram
    [Figure: VPN connection diagram]

Results and Outcome

We have successfully built a highly secure and robust infrastructure to handle massive traffic.

  • We have containerized 60% of the applications and deployed them on the EKS cluster.
  • We have provided an enhanced monitoring and alerting solution, which is helping improve performance while significantly reducing costs.
  • We have also performed multiple resource optimization activities to reduce the overall cost.
  • We are also implementing several automation solutions for the client using Lambda functions.
  • The CI/CD flow has been set up using Jenkins. As a result, new features are released faster with 99.9% uptime, something the client had struggled to achieve with limited technical expertise.
  • We are also documenting the implementations for future reference.
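
To put the 99.9% uptime figure from the list above in perspective, the short Python sketch below converts an availability percentage into an allowed-downtime budget. The case study does not state the measurement window, so the 30-day and 365-day periods used here are assumptions for illustration, and the function name is hypothetical.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float) -> float:
    """Minutes of downtime permitted by an availability target over a period."""
    total_minutes = period_days * 24 * 60
    return (1 - availability_pct / 100) * total_minutes


if __name__ == "__main__":
    # Assumed windows: a 30-day month and a 365-day year.
    for days in (30, 365):
        minutes = downtime_budget_minutes(99.9, days)
        print(f"{days} days at 99.9%: about {minutes:.1f} minutes of downtime")
```

Under these assumptions, 99.9% leaves roughly 43 minutes of downtime per 30-day month, or a little under 9 hours per year.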

[Figure: Outcome metrics]

Conclusion

The client now has a continuous delivery process with a restructured DevOps team, leading to fewer defects and a lower change failure rate. In addition, the client is equipped with cross-skilled engineers who can collaborate on various projects and provide better operational support, so that fixes are delivered much faster than before. Although team members are assigned to different tasks based on their expertise, we hold knowledge-sharing sessions every week to upskill the team and make everyone aware of every domain.
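
For readers unfamiliar with the term, change failure rate is commonly defined as the share of production deployments that lead to a failure needing remediation, such as a hotfix or rollback. The Python sketch below shows the arithmetic under that definition. The record type, field names, and sample data are hypothetical and are not measurements from this engagement.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Deployment:
    service: str            # hypothetical service name
    needed_remediation: bool  # True if a hotfix or rollback was required


def change_failure_rate(deployments: Sequence[Deployment]) -> float:
    """Fraction of deployments that required remediation (0.0 if none ran)."""
    if not deployments:
        return 0.0
    failed = sum(1 for d in deployments if d.needed_remediation)
    return failed / len(deployments)


if __name__ == "__main__":
    # Made-up history: one of four deployments needed a rollback.
    history = [
        Deployment("booking-api", False),
        Deployment("booking-api", True),
        Deployment("pricing", False),
        Deployment("search", False),
    ]
    print(f"Change failure rate: {change_failure_rate(history):.0%}")
```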

About CloudThat

CloudThat is an official AWS (Amazon Web Services) Advanced Consulting Partner, an AWS DevOps Services Competency Partner, and a Microsoft Gold Partner. We help people develop knowledge of the cloud and help their businesses aim for higher goals using best-in-industry cloud computing practices and expertise. We are on a mission to build a robust cloud computing ecosystem by disseminating knowledge on technological intricacies within the cloud space. Our blogs, webinars, case studies, and white papers enable all the stakeholders in the cloud computing sphere.

Feel free to drop a comment or any queries you have regarding AWS services, cloud adoption, or consulting, and we will get back to you quickly. To get started, go through our Expert Advisory page and Managed Services Package, both CloudThat offerings.

