Increase the Productivity of Data Science Projects by Using Amazon SageMaker Notebooks

November 22, 2022

TABLE OF CONTENTS

1. Overview
2. Introduction
3. Lifecycle Configuration: Customizing a Notebook Instance
4. Step-by-Step Process to Implement Lifecycle Configuration for a Notebook Instance
5. Conclusion
6. About CloudThat
7. FAQs

 

Overview

Amazon SageMaker provides cloud-based services that sit at the heart of many Data Science workflows. By customizing notebook instances to fit our tasks, we can increase Data Science productivity.

Introduction

Amazon SageMaker

Amazon SageMaker is a group of ML services fully managed by AWS. It supports frameworks and tools such as Jupyter, TensorFlow, and PyTorch, and it enables developers to create, train, and deploy ML models in the cloud.

Background on Amazon SageMaker Notebooks

A SageMaker notebook instance is an ML compute instance running the Jupyter Notebook App. It is like running a Jupyter notebook locally, except that it runs on AWS with a choice of compute and memory capacity. Notebook instances are an efficient, easy-to-deploy way to prepare and process data, write code to train ML models, and deploy or host them. Notebooks run on prebuilt kernels that are optimized for specific tasks, for example, conda_pytorch_p36, Sparkmagic, and conda_python3. SageMaker notebook instances run on Amazon Linux 2 (AL2) or Amazon Linux (AL1), and AWS maintains the operating system for you. You can deploy them on many instance types (with differing CPU and memory capacity) according to your requirements.

Within SageMaker, we can create multiple notebook instances, and each one runs separately as a standalone instance.

Some of the features of SageMaker notebook instances are:

  • Fully managed and Scalable cloud infrastructure

As it is a fully managed service, AWS takes care of all the infrastructure for you, including software and security updates/patches and maintenance.

  • Support for TensorFlow, MXNet, Keras, etc.

Every SageMaker notebook instance ships with default support for common ML libraries. Other libraries suited to an ML project or task can be installed from the start using a lifecycle configuration.

  • Automated labeling tool and workflow

SageMaker Ground Truth can be used for data labeling tasks, which can be pivotal for ML models.

There are many other features that are native to SageMaker instances, and others that can be integrated with them.

Lifecycle Configuration: Customizing a Notebook Instance

A lifecycle configuration is essentially a set of shell scripts that run when you create a notebook instance or start one. The on-create script runs once, when the instance is first created, and the on-start script runs every time the instance starts, including when it is created. For example, a sample lifecycle script looks something like this:

[Image: sage1]
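As a minimal sketch (not the original script), an on-start script could look like the following. The `log` helper is a hypothetical name introduced for illustration, and the sketch assumes only that the script is executed by root with bash.

```shell
#!/bin/bash
# Illustrative on-start lifecycle script; the log helper is a hypothetical name.
set -e

# Prefix each message with a UTC timestamp so the run is easy to follow in the logs.
log() {
    echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] $*"
}

# Lifecycle scripts are executed by root, so the uid printed here should be 0.
log "lifecycle script started with uid $(id -u)"

# Setup commands go here, for example package installs or environment variables.

log "lifecycle script finished"
```

Keep the work in such a script short: SageMaker stops a lifecycle script that runs for longer than five minutes.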

A lifecycle configuration script always runs as the root user. To make packages available to a particular Jupyter kernel, we need to activate (source) the corresponding conda environment and then pip install the packages required in that environment or its notebooks. Note that all of this is done inside conda environments, which is where most of the kernels operate.

[Image: sage2]
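The activate-then-install pattern described above can be sketched as follows. It is modeled on the pattern AWS uses in its published sample lifecycle scripts, but the package name, the environment name, and the `install_in_env` helper are placeholders chosen for illustration, and the Anaconda path is assumed to be the default location on the notebook instance.

```shell
#!/bin/bash
set -e

# Assumptions: placeholder package and environment; default Anaconda location.
PACKAGE="scipy"
ENVIRONMENT="python3"
CONDA_ROOT="/home/ec2-user/anaconda3"

# Install one package into one conda environment.
# The lifecycle script runs as root, so the commands are handed to a login
# shell for ec2-user, the default user whose home directory holds the
# conda environments.
install_in_env() {
    sudo -u ec2-user -i <<EOF
source "$CONDA_ROOT/bin/activate" "$1"
pip install --upgrade "$2"
source "$CONDA_ROOT/bin/deactivate"
EOF
}

# Only act when Anaconda is where we expect it to be.
if [ -d "$CONDA_ROOT" ]; then
    install_in_env "$ENVIRONMENT" "$PACKAGE"
fi
```

Because the package is installed inside the activated environment, only the kernel backed by that environment (here, conda_python3) sees it.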

We can also make a package available in all conda environments by looping over the directory that holds them, “/home/ec2-user/anaconda3/envs/*”. We can also install monitoring on the instance every time it starts, which can then send logs to Amazon CloudWatch for generating alerts. This makes things easier, since we set it up once from the AWS console instead of repeating it manually from the terminal every time we run a notebook instance. Another use case is running some tasks periodically, which often requires AWS Lambda to run in the background.

Step-by-Step Process to Implement Lifecycle Configuration for a Notebook Instance

To create a lifecycle configuration, we must do the following:

  1. In the SageMaker console, under SageMaker Dashboard, go to Lifecycle Configuration.
  2. Create the first lifecycle script, which will run every time we start a notebook instance.

[Image: sage3]

  3. Now we write a custom script that installs the packages/libraries we need every time we start a notebook instance.

[Image: sage4]

In this script, I enter the shell as ec2-user, the default user on the instance (the lifecycle script itself runs as root), and then select the environments that need the libraries installed. The script loops through the environments, activates each one, installs each of the libraries, and finally deactivates the environment before the script finishes. I copied the script into the script area, and then we click Create configuration.
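A minimal sketch of such a loop is shown below. It is not the exact script from the screenshots: the environment names, the package names, and the `existing_envs` helper are placeholders for illustration, and the Anaconda path is assumed to be the default one on the instance.

```shell
#!/bin/bash
set -e

# Assumptions: the environment and package names are placeholders.
CONDA_ROOT="/home/ec2-user/anaconda3"
ENVS_DIR="$CONDA_ROOT/envs"
WANTED_ENVS="python3 pytorch_p36"
PACKAGES="lightgbm shap"

# Print the wanted environments that actually exist on this instance.
existing_envs() {
    for name in $WANTED_ENVS; do
        if [ -d "$ENVS_DIR/$name" ]; then
            echo "$name"
        fi
    done
}

# As ec2-user: activate each environment, install the packages, deactivate.
for env_name in $(existing_envs); do
    sudo -u ec2-user -i <<EOF
source "$CONDA_ROOT/bin/activate" "$env_name"
pip install --upgrade --quiet $PACKAGES
source "$CONDA_ROOT/bin/deactivate"
EOF
done
```

Skipping environments that do not exist keeps the script from failing on instances where a kernel is missing, since the set of prebuilt environments differs between platform versions.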

[Image: sage5]

  4. Now, when we create a notebook instance, we select the script under Additional configuration.

[Image: sage6]
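The same steps can also be scripted with the AWS CLI instead of the console. The sketch below assumes the AWS CLI is installed and configured with credentials; the configuration name, the script file name, the instance type, and the helper function names are all placeholders chosen for illustration.

```shell
#!/bin/bash
set -e

# Assumptions: all names below are placeholders.
CONFIG_NAME="install-packages-on-start"
SCRIPT_FILE="on-start.sh"

# The API expects the script body as one base64 string without line breaks.
encode_script() {
    base64 < "$1" | tr -d '\n'
}

# Register the script as the on-start hook of a new lifecycle configuration.
create_lifecycle_config() {
    aws sagemaker create-notebook-instance-lifecycle-config \
        --notebook-instance-lifecycle-config-name "$CONFIG_NAME" \
        --on-start "Content=$(encode_script "$SCRIPT_FILE")"
}

# Create a notebook instance that uses the configuration.
# $1 = notebook instance name, $2 = ARN of the execution role.
create_notebook_instance() {
    aws sagemaker create-notebook-instance \
        --notebook-instance-name "$1" \
        --instance-type "ml.t3.medium" \
        --role-arn "$2" \
        --lifecycle-config-name "$CONFIG_NAME"
}
```

To use it, call `create_lifecycle_config` once and then `create_notebook_instance` with a notebook name and the ARN of your own SageMaker execution role.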

Conclusion

SageMaker provides us with fast, scalable notebook instances that can be launched in a matter of minutes. With the help of lifecycle configurations, we can tailor notebook instances to our needs, making them highly customizable and easy to manage from the moment they start to the moment they stop running. In this way, we can be more productive in our Data Science projects, without repeating the same setup by hand, which saves us a lot of time.

About CloudThat

CloudThat is an official AWS (Amazon Web Services) Advanced Consulting Partner and Training Partner and a Microsoft Gold Partner, helping people develop knowledge of the cloud and helping their businesses aim for higher goals using best-in-industry cloud computing practices and expertise. We are on a mission to build a robust cloud computing ecosystem by disseminating knowledge on technological intricacies within the cloud space. Our blogs, webinars, case studies, and white papers enable all the stakeholders in the cloud computing sphere.

Drop a query if you have any questions regarding SageMaker and I will get back to you quickly.

To get started, go through our Consultancy page and Managed Services Package, which are part of CloudThat’s offerings.

FAQs

  1. What are some of the benefits of using SageMaker?

A. SageMaker is a fully managed machine learning cloud service that can be leveraged to create ML models at scale. We can create, manage, and automate SageMaker notebooks with little effort, which is one of the major advantages of using AWS SageMaker.

  2. Are there pre-trained models available on SageMaker?

A. Yes. SageMaker Studio offers a wide range of pre-trained models, algorithms, and solutions for anyone who wants to deploy and test quickly or use them in projects.

  3. Can I customize my SageMaker notebooks?

A. Yes, SageMaker notebooks can be configured using lifecycle configurations to cater to one’s needs and requirements.

