Introduction to AWS DataSync
There has been increasing demand to back up files to a cloud environment for long-term storage and to cover disaster-related incidents. Customers want to migrate their data securely with the help of reliable utility tools. Moreover, they want to automate these transfers through a dependable mechanism that moves their data into a cloud like AWS, onto S3, EFS, a Windows file server, etc.
AWS DataSync helps connect your on-premises storage to S3 and much more with a reliable automation architecture. File share systems like NFS (Network File System) and SMB (Server Message Block) can now be integrated with AWS DataSync to transfer your required files.
DataSync allows you to transfer all the files, or only the data that changed since the last run; it does so by using metadata captured from previous transfers, which helps decrease the transfer size and the time needed to transfer. AWS DataSync uses a DataSync agent, deployed either on-premises as a virtual machine on a hypervisor such as VMware or Hyper-V, or on an EC2 instance using an AWS-provided AMI. This agent connects the source endpoint on the local server to the target endpoint on S3, EFS, etc.
I will be showing a small but powerful setup where you transfer files from an SMB (Samba) server into S3 using AWS DataSync.
Steps to Convert your Linux machine into a Samba Server
Here, I am using an Amazon Linux AMI in another VPC to act as a remote Samba server.
- Launch a Linux EC2 instance of your choice; here, I am using Amazon Linux to install the SMB server.
- In the security group, open port 22 and port 445 to anywhere, i.e., the 0.0.0.0/0 range (later, we can restrict this to a dedicated IP)
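If you prefer the CLI, the same ingress rules can be added with the AWS CLI; the security group ID below is a placeholder, so substitute your own.

```shell
# Open SSH (22) and SMB (445) to 0.0.0.0/0; tighten the CIDR
# to the DataSync agent's IP later. SG_ID is a placeholder.
SG_ID="sg-0123456789abcdef0"

for PORT in 22 445; do
  aws ec2 authorize-security-group-ingress \
    --group-id "$SG_ID" \
    --protocol tcp \
    --port "$PORT" \
    --cidr 0.0.0.0/0
done
```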
- Since it is an Amazon Linux 2 AMI, we need to follow the steps below to change the hostname to a user-friendly name
```shell
$ sudo vi /etc/cloud/cloud.cfg
```
- At the end of the file, add the line below and save; on Amazon Linux 2, this is `preserve_hostname: true`, which stops cloud-init from resetting the hostname on reboot
- Change Hostname
```shell
$ sudo hostnamectl set-hostname samba-server
$ sudo yum update -y
$ sudo reboot
```
- Install samba, samba-client, and cifs-utils
```shell
$ sudo su -
# yum install -y samba samba-client cifs-utils
```
- Configuration file changes
```shell
# vim /etc/samba/smb.conf
```
```
security = user
hosts allow = <IP address of the DataSync agent VM>
# Add the loopback IP and the VPC starting IP as well
interfaces = lo eth0
passdb backend = smbpasswd:/etc/samba/sambapasswd.txt
printing = cups
printcap name = cups
load printers = yes
cups options = raw

[homes]
comment = Home Directories
valid users = %S, %D%w%S
browseable = No
read only = No
inherit acls = Yes

[printers]
comment = All Printers
path = /var/tmp
printable = Yes
create mask = 0600
browseable = No

[print$]
comment = Printer Drivers
path = /var/lib/samba/drivers
write list = @printadmin root
force group = @printadmin
create mask = 0664
directory mask = 0775

[samba]
comment = Development documentation
read only = no
available = yes
path = /smbfolder
public = yes
valid users = sambauser
write list = sambauser
writable = yes
browseable = yes
```
Make sure to edit the config as above and set the hosts allow entry to the IP of your DataSync agent so that it can connect.
- We will now create a Samba user to access the Samba folder directory
```shell
# useradd sambauser
# passwd sambauser
# smbpasswd -a sambauser
# service smb restart
```
- Create a directory for file share and give permissions
```shell
# mkdir /smbfolder
# chmod 777 /smbfolder
```
- Restart and test with the DataSync agent
```shell
# service smb restart
# testparm
```
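Before wiring up DataSync, it can also help to verify the share from another machine with smbclient; the server IP below is a placeholder for the Samba server's address, and `samba` is the share name defined in smb.conf above.

```shell
# Placeholder IP of the Samba server; replace with yours.
SMB_SERVER="10.0.1.25"

# List the shares the server exposes (the [samba] share should appear).
smbclient -L "//$SMB_SERVER" -U sambauser

# Connect to the share and list its contents.
smbclient "//$SMB_SERVER/samba" -U sambauser -c 'ls'
```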
Create a DataSync agent in EC2 in a VPC
Create the agent by navigating to DataSync in the AWS Console.
Place the DataSync agent VM's address in the agent address field to activate it.
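The same activation can be scripted with the AWS CLI once you have an activation key from the agent VM; the key and agent name below are hypothetical placeholders.

```shell
# Hypothetical activation key retrieved from the agent VM's
# local console; replace with the real one.
ACTIVATION_KEY="AAAAA-BBBBB-CCCCC-DDDDD-EEEEE"

aws datasync create-agent \
  --agent-name samba-agent \
  --activation-key "$ACTIVATION_KEY"
```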
Create a Task
- Choose the location type as SMB server
- Choose the DataSync agent you configured
- Enter the IP of the SMB server launched earlier
- Provide the share name of the folder
- Enter the user and password for the SMB server user
- Choose the destination location type as S3 / EFS / NFS as per requirement
- Give the task a name and verify the options selected in the review screen
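For reference, the same source location, destination location, and task can also be created with the AWS CLI; every ARN, IP, bucket, role, and password below is a placeholder you would replace with your own values.

```shell
# Placeholder agent ARN from the agent-creation step.
AGENT_ARN="arn:aws:datasync:us-east-1:111122223333:agent/agent-0123456789abcdef0"

# Source: the SMB share exposed by the Samba server.
SRC_ARN=$(aws datasync create-location-smb \
  --server-hostname 10.0.1.25 \
  --subdirectory /samba \
  --user sambauser \
  --password 'ReplaceWithSambaPassword' \
  --agent-arns "$AGENT_ARN" \
  --query LocationArn --output text)

# Destination: an S3 bucket plus an IAM role DataSync can assume.
DST_ARN=$(aws datasync create-location-s3 \
  --s3-bucket-arn arn:aws:s3:::my-datasync-bucket \
  --s3-config BucketAccessRoleArn=arn:aws:iam::111122223333:role/datasync-s3-role \
  --query LocationArn --output text)

# Tie both ends together as a named task.
aws datasync create-task \
  --source-location-arn "$SRC_ARN" \
  --destination-location-arn "$DST_ARN" \
  --name smb-to-s3
```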
Start the transfer
Once the task is created, you can start transferring files using the Start button at the top.
Navigate to the S3 bucket, and you can now find the transferred files.
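The Start button has a CLI equivalent as well; the task ARN and bucket name below are placeholders.

```shell
# Placeholder ARN of the task created earlier.
TASK_ARN="arn:aws:datasync:us-east-1:111122223333:task/task-0123456789abcdef0"

# Kick off a transfer (equivalent to pressing Start in the console).
aws datasync start-task-execution --task-arn "$TASK_ARN"

# After the execution finishes, list the objects that arrived.
aws s3 ls s3://my-datasync-bucket/ --recursive
```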
DataSync helps create a reliable connection between your on-premises storage and AWS S3 / EFS / NFS to transfer data. You can also choose the storage class for the data when S3 is the destination. By clicking Start on a task, you can transfer all the files or only the files that are not yet transferred. You can also delete files in AWS storage by choosing the delete option, to synchronize fully with your on-premises drives.
Here at CloudThat, we are an official AWS (Amazon Web Services) Advanced Consulting Partner and Training Partner and a Microsoft Gold Partner, helping people develop knowledge of the cloud and helping their businesses aim for higher goals using best-in-industry cloud computing practices and expertise. We are on a mission to build a robust cloud computing ecosystem by disseminating knowledge on technological intricacies within the cloud space. Our blogs, webinars, case studies, and white papers enable all the stakeholders in the cloud computing sphere.
Feel free to drop a comment or any queries you have regarding AWS services, cloud adoption, or consulting, and we will get back to you quickly. To get started, go through our Expert Advisory page and Managed Services Package, CloudThat's offerings.
FAQs
- From which file systems can we replicate our data to AWS?
Ans: File systems such as NFS, SMB, and HDFS can be set up as on-premises storage locations. Moreover, Amazon EFS, Amazon FSx, and Amazon S3 can also serve as source locations for data capture.
- Can we change the storage class when choosing the destination location to AWS S3?
Ans: Yes, at any point in time once the AWS DataSync agent is set up, you can choose the storage class for a new task whose destination is S3, such as Standard, Glacier, or Glacier Deep Archive.
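When creating an S3 destination location from the CLI, the storage class is set with the --s3-storage-class flag; the bucket and role ARNs below are placeholders.

```shell
# DEEP_ARCHIVE sends objects straight to S3 Glacier Deep Archive;
# STANDARD and GLACIER are among the other accepted values.
STORAGE_CLASS="DEEP_ARCHIVE"

aws datasync create-location-s3 \
  --s3-bucket-arn arn:aws:s3:::my-datasync-bucket \
  --s3-storage-class "$STORAGE_CLASS" \
  --s3-config BucketAccessRoleArn=arn:aws:iam::111122223333:role/datasync-s3-role
```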
- Can we use CloudWatch to monitor DataSync tasks?
Ans: Yes, you can monitor the files copied by AWS DataSync through AWS CloudWatch metrics.
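As a sketch, per-task metrics such as FilesTransferred are published under the AWS/DataSync namespace with a TaskId dimension; the task ID and time window below are placeholders.

```shell
# Placeholder task ID; DataSync publishes metrics per TaskId.
TASK_ID="task-0123456789abcdef0"

# Sum of files transferred per hour over a placeholder one-day window.
aws cloudwatch get-metric-statistics \
  --namespace AWS/DataSync \
  --metric-name FilesTransferred \
  --dimensions Name=TaskId,Value="$TASK_ID" \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-02T00:00:00Z \
  --period 3600 \
  --statistics Sum
```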