If you have a database hosted on Amazon Web Services (AWS) that is not publicly accessible, you can allow data.world to connect to it by using an SSH tunnel.
In this scenario, you will launch a publicly accessible SSH server (sometimes called a bastion server) in the same AWS Virtual Private Cloud (VPC) as your AWS database.
You will then configure data.world to connect to your SSH server instead of directly to the database. The public SSH server will forward data.world's requests to the private database.
By using this type of connection, you can keep your database hidden from the public internet and instead rely on your SSH server to handle the security and access control for connections to that database.
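To make the forwarding idea concrete, here is a minimal sketch of the same kind of tunnel opened by hand with OpenSSH. It is not how data.world itself connects (the service handles that for you). The helper name build_tunnel_cmd, the local port, and every hostname below are placeholders assumed for illustration.

```shell
# Hypothetical helper: prints an OpenSSH local port-forward command.
# Arguments: key file, SSH user, SSH server host, database endpoint, database port, local port
build_tunnel_cmd() {
  key_file=$1; ssh_user=$2; bastion=$3; db_host=$4; db_port=$5; local_port=$6
  # -N: do not run a remote command
  # -L: forward local_port on this machine to db_host:db_port via the SSH server
  printf 'ssh -i %s -N -L %s:%s:%s %s@%s\n' \
    "$key_file" "$local_port" "$db_host" "$db_port" "$ssh_user" "$bastion"
}

# Example with placeholder values (all assumed, not taken from a real setup):
build_tunnel_cmd ssh_tunnel.pem ec2-user bastion.example.com mydb.example.com 5432 15432
```

While a command of this form is running, a database client pointed at localhost on the chosen local port would reach the private database through the SSH server.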
Task 1: Find your database's VPC and Port
Log in to your AWS Management Console.
For RDS databases (including SQL Server, MySQL, PostgreSQL, or Oracle), click on Database>RDS in the center of the page.
Click on Databases on the left side of the Amazon RDS page.
Click the DB Instances link in the middle of the page to see a list of active databases.
Click on the name of the database in the DB identifier column to open its details page.
Within the Connectivity and Security section at the bottom of the page, take note of the following information for use in Tasks 2, 3, and 5:
- Endpoint & port > Endpoint
- Endpoint & port > Port
- Networking > VPC
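If you prefer the command line, the same three values can likely be read with the AWS CLI. This is a sketch that assumes the AWS CLI is installed and configured with credentials allowed to describe RDS instances. The function name describe_db_network and the instance identifier are hypothetical.

```shell
# Hypothetical helper: prints the endpoint, port, and VPC ID of one RDS instance.
# Assumes a configured AWS CLI; $1 is the DB identifier shown in the console.
describe_db_network() {
  aws rds describe-db-instances \
    --db-instance-identifier "$1" \
    --query 'DBInstances[0].[Endpoint.Address,Endpoint.Port,DBSubnetGroup.VpcId]' \
    --output text
}

# Usage (the identifier is a placeholder):
# describe_db_network my-database
```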
Task 2: Create an SSH server on an Amazon EC2 instance
Go to the AWS Management Console main page and click Compute>EC2 from the center of the page.
In the middle of the page, select Launch Instance.
The Instance configuration requires the following steps:
- Choose AMI (Amazon Machine Image): Any Linux-based AMI is appropriate for this setup; SSH is the only program required.
- Choose Instance Type: This is a low-memory, low-storage application. Depending on your usage requirements, a free tier t2.micro instance may be sufficient.
- Configure Instance: ensure the following settings are configured:
- Network: select your database's VPC from Task 1
- Subnet: Choose a public subnet
- Auto-assign Public IP: Use subnet setting (Enable)
- Add Storage: Accept defaults
- Add Tags: Accept defaults
- Configure Security Group:
- Assign a security group: Select Create a new security group
- Security group name: data.world to SSH
- Description: Bastion server for forwarding requests from data.world to private database
Modify the first line of the security rules to show the following values:
- Type: SSH
- Protocol: TCP
- Port Range: 22
- Source: Enter your public IP address followed by /32. You can use a Google search to find your IP
- Description: My IP for configuration
Click Add Rule and enter:
- Type: SSH
- Protocol: TCP
- Port Range: 22
- Source: 52.3.83.134/32
- Description: data.world inbound connection
Click Add Rule and enter:
- Type: SSH
- Protocol: TCP
- Port Range: 22
- Source: 52.205.195.10/32
- Description: data.world inbound connection
Click Add Rule and enter:
- Type: SSH
- Protocol: TCP
- Port Range: 22
- Source: 52.205.207.86/32
- Description: data.world inbound connection
- Review: Verify the above settings were entered correctly and create the instance.
- When launching the instance, you will be prompted to select an existing key pair or create a new key pair. If you are using a previous key pair, you will need to have a copy of the key .pem file stored on your local computer from when you created the keys. If you're creating a new key pair, download that .pem file now. You'll use that key .pem file in Task 4.
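If the security group already exists, the three data.world inbound rules listed above could also be added from the command line. The following is a sketch, assuming a configured AWS CLI. The function name allow_dataworld_ssh and the security group ID are hypothetical, and the --cidr shorthand used here does not set the rule descriptions.

```shell
# Hypothetical helper: adds one inbound SSH rule (TCP port 22) per data.world address.
# Assumes a configured AWS CLI; $1 is the ID of the SSH server's security group.
allow_dataworld_ssh() {
  group_id=$1
  # The three data.world source addresses listed in the rules above
  for ip in 52.3.83.134 52.205.195.10 52.205.207.86; do
    aws ec2 authorize-security-group-ingress \
      --group-id "$group_id" \
      --protocol tcp \
      --port 22 \
      --cidr "$ip/32"
  done
}

# Usage (the group ID is a placeholder):
# allow_dataworld_ssh sg-0123456789abcdef0
```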
Task 3: Enable forwarding from the SSH server to the database
Go to the AWS Management Console and select Compute>EC2.
Select Instance>Instances on the left side.
Find the instance you just created in Task 2 from the list in the main section of the page. Click on it to load its details in the lower part of the page.
You'll need the Public DNS and Private IP values shown there for the following steps, so keep them handy by opening a duplicate browser tab to complete the next set of steps.
Create a new security group
- In your new tab, select Network & Security>Security Groups on the left side of the EC2 page.
- Click Create Security Group
- Enter the following values:
- Security group name: SSH to your database name
- Description: "allows traffic from the SSH server to database"
- VPC: this is the same VPC used for the SSH server and database
- With the Inbound tab selected at the bottom of the window, click Add Rule
- A new row will populate in the list of rules. Enter the following values:
- Type: Custom TCP Rule
- Protocol: TCP
- Port Range: the port number of the database found in Task 1
- Source: Custom; in the blank box to the right, enter the Private IP address you found at the beginning of Task 3 with /32 added to the end
Add the security group to your database
- From the AWS Management Console, click on Database>RDS
- Select Databases on the left side
- Click on the link to your database in the DB identifier column
- On the database details page, click on the Modify button on the top right
- Scroll down to the Network & Security section; from the Security group drop-down menu, add the security group that you created in the previous section (e.g. "SSH to your database name")
- Save the changes by scrolling to the bottom of the page and clicking the Continue button
- On the following page, choose when to apply the changes, then click Modify DB instance
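Once you are able to log in to the SSH server (see Task 4), you may want to confirm that the new security group really lets it reach the database port. The following is a rough sketch that relies on bash's /dev/tcp pseudo-device and the coreutils timeout command. The function name can_reach is hypothetical, and the endpoint and port in the usage line stand in for the values from Task 1.

```shell
# Hypothetical helper: succeeds if a TCP connection to host:port opens within 5 seconds.
# $1 is the host, $2 is the port. Requires bash (for /dev/tcp) and coreutils timeout.
can_reach() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Usage, run on the SSH server (placeholder endpoint and port):
# can_reach mydb.example.com 5432 && echo "database port reachable"
```

A failure here usually points at the security group rule or the VPC selection rather than at data.world.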
Task 4: Configure an SSH user for data.world to use to connect
- On macOS or Linux, open a new Terminal window. For Windows, use an SSH client such as PuTTY or OpenSSH
- Within the terminal, navigate to the location of the .pem key file you selected or downloaded when launching the instance in Task 2
- Set the permissions for the key file to be not publicly viewable, using the appropriate name if your key pair is different than our example of ssh_tunnel:
chmod 400 ssh_tunnel.pem
- Connect to your AWS SSH server from the terminal using the default user for your EC2 instance. AWS will generate the command tailored to your specific instance - to find that command, you can navigate to your EC2 Instance details page and click the Connect button at the top.
The general form will be:
ssh -i "your key file.pem" <Your default EC2 user>@<Your EC2 Public DNS>
For other options, please see additional guidance from Amazon at https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstances.html
- Once you're connected to your server, create a user group named datadotworld:
sudo groupadd datadotworld
- Create a user named ddw:
sudo useradd -m -g datadotworld ddw
- Switch to the user named ddw:
sudo su - ddw
- Create a hidden directory called .ssh to upload your public key, setting its permissions appropriately:
mkdir ~/.ssh
chmod 700 ~/.ssh
- Create an empty file called authorized_keys inside that directory and give read and write access only to its owner:
touch ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
- Get the public key value from data.world by doing the following:
- Open a new browser tab
- While logged into data.world, go to https://data.world/integrations/categories/database and click on the tile of the database type you'd like to connect
- If that integration has not yet been enabled, click the Enable Integration button (otherwise go to the Manage tab and choose Add new connection)
- In the new window that opens, click the Advanced Settings tab and check the Use SSH Tunnel box
- Copy the SSH public key
- Leave this window open as you'll return to it in the next Task to complete the configuration
- Back in the terminal, add the public key to your authorized_keys file with the following command:
echo "<Your Public Key>" >> ~/.ssh/authorized_keys
Include the quotation marks, but replace <Your Public Key> with the key you just copied from data.world in the previous step.
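Before moving on, it can help to confirm that the directory and key file ended up with the intended modes (700 and 600), since overly open permissions can cause SSH to ignore the key. This is a small sketch. The function name check_ssh_perms is hypothetical, and it uses GNU stat -c as found on typical Linux AMIs.

```shell
# Hypothetical helper: succeeds only if the given .ssh directory has mode 700
# and its authorized_keys file has mode 600. Uses GNU stat (Linux).
check_ssh_perms() {
  ssh_dir=$1
  [ "$(stat -c '%a' "$ssh_dir")" = "700" ] &&
    [ "$(stat -c '%a' "$ssh_dir/authorized_keys")" = "600" ]
}

# Usage, run as the ddw user:
# check_ssh_perms ~/.ssh && echo "permissions look right"
```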
Task 5: Configure the database connection in data.world
- Return to the database configuration tab on data.world that you opened in the previous step
- Within the Advanced Settings, enter the following values:
- SSH host: this is the Public DNS (IPv4) value of the EC2 instance you created in Task 2
- SSH user: the name of the SSH user you created in Task 4 - that's "ddw" if you followed our suggestion
- Go to the General Settings tab and enter the following values:
- Display name: your choice - this is how your database connection will appear in data.world
- Host/IP: the Endpoint value you found in Task 1
- Port: the Port value you found in Task 1
- Connection username: a valid user in your database instance
- Connection password: the database password for that user
- Test the configuration and save.
With your database configured, you can now use any of the Add data mechanisms to import data into a dataset or project on data.world.