Once you have created a live connection to your data source, you can create one or more datasets in data.world to work with the data in that source. A dataset is simply a repository of data: data files plus the associated metadata, documentation, scripts, and any other supporting assets that should be stored alongside the data.
You can create a new dataset by selecting the + New dropdown on the right side of the header bar and choosing New dataset:
You'll need to name your dataset, set the owner, and set the permissions. By default, your organization will show up as the owner of the dataset, but you can use the dropdown to make it a personal dataset or to set the owner to another organization of which you are a member. By default, the permissions on the dataset are set to No one, so even if the owner is your organization, the dataset won't be visible to anyone but you and the admin(s) of your org. If you want to share it with the rest of your organization, select the All of ... option:
After you select Create dataset, you'll have the option to add a description and to connect to your data source using your newly created connection:
When you select Add data, you'll be taken to a screen where you can choose from a variety of options, including your virtual connection listed under MY DATA SOURCES:
When you select your connection name, you'll have the choice of creating the dataset with a live connection or with a data extract:
Data extracts are not available for all data sources. If this option is not available for your data source, it will be greyed out, as shown above.
The main differences between a live table and a data extract are as follows.
With a live table:
Data continues to live at its source and will not be ingested into data.world.
Any queries executed against this dataset will be translated and executed in the data source.
Users may select tables to pull into the dataset, but cannot specify a SQL query.
With a data extract:
Data will be pulled into data.world and processed into our internal representation.
You can schedule it to refresh from the source at specific intervals.
Users can select tables to pull into the dataset or specify a SQL query whose results should be pulled into the dataset.
After choosing live or extract, you might be prompted to select a database, then a schema, and then tables; or you might simply be presented with a list of tables. The options you see are determined by the data source.
When you reach the table selection, you can add one table or many at the same time. If you want to add all of the tables to your dataset, select Name at the top of the list:
Select Import ... tables. When the tables have been linked, you will receive a confirmation along with a reminder of whether the data comes from a live connection or was brought in with a data extract:
Finally, you will get a confirmation that your dataset has been created. When you close that window, you'll be taken to your dataset overview page, where you can document the dataset and edit its metadata: