Data Pre-Processing Using SageMaker Data Wrangler – Part 1

Posted on November 17, 2022 by Arslan Eqbal | Comments(0)

With the growing volume and variety of data arriving in pipelines from multiple sources, the preprocessing steps needed to manage that data have also become harder to handle. To address this, Amazon SageMaker provides a data preprocessing feature called SageMaker Data Wrangler. With Data Wrangler, we can process large amounts of data within the pipeline itself; we just need to set up the flow of preprocessing steps inside the Data Wrangler service.

Continue Reading…


Migrating Data from AWS Aurora to Delta Lake: Databricks Data Lakehouse – Part 2

Posted on November 17, 2022 by Sai Pratheek | Comments(0)

In our previous blog, we discussed how to migrate data from a database table on an Amazon Aurora RDS instance to an S3 bucket using the Full Load and Continuous Replication operations of AWS Database Migration Service. In this blog, we will see how to replicate the continuous data coming from Aurora RDS as Delta tables on the Databricks Lakehouse Platform so that it can be used for BI and ML workloads.

Continue Reading…



A Guide to Setup Kubernetes Dashboard on Amazon EKS Cluster

Posted on November 16, 2022 by Shivani Gandhi | Comments(0)

The Kubernetes Dashboard is a web-based user interface for managing Kubernetes clusters. It lets us create, view, and edit resources (pods, deployments, replica sets, etc.) and displays basic resource usage information for the workloads. It also lets us create, deploy, and scale containerized applications through a wizard. With the dashboard, we can monitor the cluster and manage all of its resources visually, without having to go to the command line.

Continue Reading…