13 Easy Steps for Syncing Data from On-Premises to AWS S3 Using DataSync

June 17, 2022

TABLE OF CONTENTS

1. Introduction
2. Agent Creation
3. Step by Step Guide for Configuring DataSync
4. Conclusion
5. About CloudThat
6. FAQs

Introduction

Data synchronization is the process of keeping data consistent across two or more systems by automatically propagating changes between them.

While the massive amount of data stored in the cloud poses management challenges, the cloud remains the ideal platform for big data. Today's data-transfer services provide quick and straightforward tools that avoid repetitive manual work and keep data in sync throughout the system.

Agent Creation

Use the CLI command below to acquire the most recent DataSync Amazon Machine Image (AMI) ID for the selected AWS Region.

  aws ssm get-parameter --name /aws/service/datasync/ami --region $region

Launch the agent from this AMI using the Amazon EC2 launch wizard, in the AWS account where the source file system is stored. To launch the AMI, go to the following URL: https://console.aws.amazon.com/ec2/v2/home?region=source-file-system-region#LaunchInstanceWizard:ami=ami-id

Note: Make sure you launch an instance with 16 GiB of RAM or more.

Here, I am launching a t2.xlarge instance for my agent with a public IP and the HTTP port open.

Once the agent instance is up and running, let us move on to DataSync.
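The agent launch above can also be scripted with the AWS CLI. The sketch below is a minimal, hedged example: the key pair, security group, and subnet IDs are hypothetical placeholders you would replace with your own resources.

```shell
# Look up the latest DataSync agent AMI for the Region (documented SSM parameter).
region=us-east-1
ami_id=$(aws ssm get-parameter \
  --name /aws/service/datasync/ami \
  --region "$region" \
  --query 'Parameter.Value' --output text)

# Launch the agent on a t2.xlarge (16 GiB RAM) with a public IP.
# my-key, sg-..., and subnet-... are placeholders for your own resources.
aws ec2 run-instances \
  --region "$region" \
  --image-id "$ami_id" \
  --instance-type t2.xlarge \
  --key-name my-key \
  --security-group-ids sg-0123456789abcdef0 \
  --subnet-id subnet-0123456789abcdef0 \
  --associate-public-ip-address
```

Whichever way you launch it, the instance's security group must allow inbound HTTP (port 80) from the machine that will activate the agent.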

Step by Step Guide for Configuring DataSync

  1. Open the DataSync service from the AWS Console, select the option to transfer data between your on-premises storage and AWS, and click on Get Started.
  2. Select the Hypervisor as Amazon EC2, select the Endpoint type as Public service endpoints, and provide the public IP of the agent in the Agent address field.
  3. Once the activation key is retrieved successfully, give the agent a name and click the Create agent button. The agent will be created and its status will show as Online.
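The console performs the activation for you, but the same flow can be sketched with the CLI: fetch the activation key from the agent over HTTP, then register the agent with DataSync. The agent IP, Region, and agent name below are placeholders.

```shell
# Fetch the activation key directly from the agent over port 80.
# 1.2.3.4 is a placeholder for your agent's public IP.
activation_key=$(curl -s "http://1.2.3.4/?gatewayType=SYNC&activationRegion=us-east-1&no_redirect")

# Register the agent with DataSync under a name of your choosing.
aws datasync create-agent \
  --region us-east-1 \
  --activation-key "$activation_key" \
  --agent-name my-datasync-agent
```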
  4. Before creating a task for DataSync, launch an EC2 instance (t2.micro) that will act as the on-premises system.
  5. Open the RDP (3389) and NFS (2049) ports in the on-premises instance's security group.
  6. Install an NFS server on the on-premises machine so that it can share the local files to the cloud over NFS.
  7. Create a folder and a text file inside it. In my case, I have created a folder named test and a text file named sampletext.txt, which we want to sync. Go to the properties of the folder, select the NFS Sharing tab, click Manage NFS sharing, and check Share this folder.
  8. Click Apply and then OK to save the configuration for the folder.
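This walkthrough uses a Windows host (hence RDP and the NFS Sharing dialog). If your on-premises system happened to be Linux instead, the equivalent export of the test folder would look roughly like this sketch (Debian/Ubuntu package names assumed; not part of the original walkthrough):

```shell
# Install and start an NFS server.
sudo apt-get install -y nfs-kernel-server

# Export /test read-write, then reload the export table.
echo '/test *(rw,sync,no_subtree_check)' | sudo tee -a /etc/exports
sudo exportfs -a
```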
  9. Now go to the DataSync console, create a task, and choose the configuration details as follows:
    a. Choose Create a new location
    b. The location type should be Network File System (NFS)
    c. Select the agent which we created in the earlier steps
    d. Provide the IP address of the on-premises instance in the NFS server field
    e. Provide the mount path, in my case /test
    f. For the destination, choose Amazon S3 as the location type and select the S3 bucket to which these files need to be synced
  10. Leave the remaining fields at their defaults and click on Create task
  11. Select Autogenerate to create the IAM role in the provided field
  12. Once the previous step is completed, wait until your task's status becomes Available.
  13. Now start the task and wait for a while; this starts syncing your on-premises data to AWS S3.
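Steps 9 through 13 can also be sketched with the aws datasync CLI. The ARNs, IP address, bucket name, and IAM role below are hypothetical placeholders; the role granting DataSync access to the bucket must already exist.

```shell
# Source location: the on-premises NFS share, reached through the agent.
src_arn=$(aws datasync create-location-nfs \
  --server-hostname 1.2.3.4 \
  --subdirectory /test \
  --on-prem-config AgentArns=arn:aws:datasync:us-east-1:111122223333:agent/agent-0123456789abcdef0 \
  --query 'LocationArn' --output text)

# Destination location: the S3 bucket (bucket and role names are placeholders).
dst_arn=$(aws datasync create-location-s3 \
  --s3-bucket-arn arn:aws:s3:::my-sync-bucket \
  --s3-config BucketAccessRoleArn=arn:aws:iam::111122223333:role/datasync-s3-role \
  --query 'LocationArn' --output text)

# Create the task, then kick off an execution to start the sync.
task_arn=$(aws datasync create-task \
  --source-location-arn "$src_arn" \
  --destination-location-arn "$dst_arn" \
  --name nfs-to-s3 \
  --query 'TaskArn' --output text)

aws datasync start-task-execution --task-arn "$task_arn"
```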

Conclusion

We investigated and highlighted the relevance of data synchronization in this blog. Data synchronization works smoothly when only the changed data is transferred; as a result, each synchronization procedure uses a marker to determine which data is the most up to date.

About CloudThat:

CloudThat is a Databricks Partner, an AWS (Amazon Web Services) Advanced Consulting Partner and Training Partner, and a Microsoft Gold Partner, helping people develop knowledge of the cloud and helping their businesses aim for higher goals using best-in-industry cloud computing practices and expertise. We are on a mission to build a robust cloud computing ecosystem by disseminating knowledge on technological intricacies within the cloud space. Our blogs, webinars, case studies, and white papers enable all the stakeholders in the cloud computing sphere.

Drop a query if you have any questions regarding data synchronization, syncing data from on-premises to AWS S3, or AWS services in general, and I will get back to you quickly. To get started, go through our Expert Advisory page and Managed Services Package, which are CloudThat's offerings.

FAQs:

1. Where can I move data to and from?

DataSync supports the following storage location types: Network File System (NFS) shares, Server Message Block (SMB) shares, Hadoop Distributed File Systems (HDFS), self-managed object storage, Google Cloud Storage, Azure Files, AWS Snowcone, Amazon Simple Storage Service (Amazon S3), Amazon Elastic File System (Amazon EFS) file systems, Amazon FSx for Windows File Server file systems, Amazon FSx for Lustre file systems, and Amazon FSx for OpenZFS file systems.

2. Can I use AWS DataSync to copy data from other public clouds to AWS?

Yes. Using AWS DataSync, you can copy data from Google Cloud Storage using the S3 API or Azure Files using the SMB protocol. Deploy the DataSync agent in your cloud environment or on Amazon EC2, create your source and destination locations, and then start your task to begin copying data. Learn more about using DataSync to copy data from Google Cloud Storage or Azure Files.

3. Does AWS DataSync preserve the directory structure when copying files?

Yes. When transferring files, AWS DataSync creates the same directory structure as the source location’s structure on the destination.

