There are several ways to get your data or metadata into data.world, and there is no single right choice. Each method has its benefits, and which one you choose will depend on several factors:
- What format is your data in?
- Where is your data currently located?
- What is the size of your data?
- How often does it update?
- What are you going to do with it?
Whichever method you choose, your data will go into a dataset. A dataset is the basic repository for data files and associated metadata, documentation, scripts, and any other supporting resources that should be stored alongside the data. Datasets are where all data is stored and documented for later sharing and use in projects.
In this article we'll look at the different places your data might live and how best to get it into data.world. The data sources we'll examine are:
- Databases or data warehouses
- Local files
- Cloud-based storage (in sources like Google Drive, Box, Dropbox or S3)
- Excel spreadsheets
- Data from real-time sources
- Data available via a URL or RESTful API
- Corporate network
Databases or data warehouses
If you work with a JDBC-compatible database or warehouse, you can use our API to sync your data directly into a dataset. A current list of the databases supported by our API can be found on our database integrations page.
If you wish to leave your data at rest in your existing database or data warehouse, whether it’s on-premises or in the cloud, our Enterprise Tier supports virtualized access to that data source.
Local files
The best way to get files from your computer onto data.world is to upload them directly into a dataset. Files uploaded from a computer cannot be automatically synced or updated, but you can manually push new versions up to your dataset, replacing the previous version, as needed. When a new version is uploaded, the older version is still available for auditability and versioning. More about uploading data from local files and versioning can be found in our article Adding data files.
Cloud-based storage
Files stored in cloud-based storage services (e.g., Google Drive, Box, Dropbox or Amazon S3) can be added to data.world with one of our integrations and set to sync so that they update automatically.
As with manual updating, versions of files that are automatically updated are also kept for reference. More information about adding cloud-based files can also be found in the article Adding data files.
Excel spreadsheets
For Excel spreadsheets, data.world has created a dedicated add-in that's available on AppSource or from within Excel. The add-in allows you to work with your data in Excel while at the same time sharing it in a dataset with others who may not have or use Excel.
See our Excel integration page for more information. If you prefer, you can also upload your Excel spreadsheet into a dataset like any other file type, or put it in a cloud service like Google Drive, Box or Dropbox and add it to the dataset from there so that it syncs automatically. Versions of Excel files that are uploaded or synced are also kept for future reference.
Data from real-time sources via streaming
You might have data that updates in real time that you would like to put on data.world, such as log files, test metrics or tracking data. The best way to integrate this data into a dataset is to use data.world's streaming API. Unlike the methods previously mentioned, which pull data from the source on a regular schedule, the streaming API lets you push data into a dataset whenever the original data changes. Because updates are triggered by data events rather than by fixed time intervals, the streaming API is the best way to manage real-time data. You can read more about streaming in our API Quickstart guide.
For those less comfortable working directly with an API, data.world also integrates with several superconnectors like IFTTT, KNOTS, Singer or Stitch. While easier to use, they are less flexible and versatile than our own streaming API. You can see a full list on our superconnector integrations page.
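To illustrate the push model, here is a minimal Python sketch that builds (but does not send) a request appending records to a stream. The base URL, the `/streams/{owner}/{dataset}/{stream}` path, the `application/json-l` content type, and the bearer-token header are assumptions based on our understanding of the public API; the owner, dataset, and stream names are hypothetical. Check the API Quickstart guide for the authoritative details.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL for the data.world REST API.
API_BASE = "https://api.data.world/v0"


def build_stream_request(owner, dataset_id, stream_id, records, token):
    """Build a POST request that appends `records` to a stream.

    The request is only constructed here, not sent, so the sketch
    can be inspected without network access or a real token.
    """
    # Encode one JSON object per line (JSON Lines).
    body = "\n".join(json.dumps(r, sort_keys=True) for r in records)
    path = "/".join(
        urllib.parse.quote(part, safe="")
        for part in (owner, dataset_id, stream_id)
    )
    return urllib.request.Request(
        f"{API_BASE}/streams/{path}",  # assumed endpoint path
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json-l",  # assumed content type
        },
    )


# Hypothetical owner, dataset, and stream names.
request = build_stream_request(
    "my-org", "server-logs", "events",
    [{"level": "info", "msg": "started"}],
    token="YOUR_API_TOKEN",
)
# To actually send it: urllib.request.urlopen(request)
```

Calling a function like this from the code that produces each event is what makes the update event-driven rather than scheduled.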
Data via a URL or RESTful API
Another common source of data is a URL or RESTful API available on the internet. A Google Sheets doc, for example, can be added to a data.world dataset this way. As long as the site hosting the data can be reached over the internet, you can sync the data to data.world with the option to add from a URL, even if the site is password-protected. Detailed instructions for adding and syncing data from a URL can be found in the article Adding files from a URL. If you do not own the data from the web that you'd like to bring into data.world, you can find out more about licensing and data in the article Licensing and data you found.
If you have data behind an API that you'd like to put on data.world (e.g., data from Salesforce, Facebook Ads or Google Ads), the best way to get it into a dataset is to use one of the superconnectors mentioned above. More information about our sales and marketing app integrations can be found here.
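Adding a file from a URL can also be scripted. The sketch below builds (but does not send) such a request in Python. The base URL, the `/datasets/{owner}/{dataset}/files` path, and the shape of the JSON payload are assumptions based on our understanding of the public API, and the owner, dataset, file name, and source URL are hypothetical; the article Adding files from a URL remains the reference for the supported workflow.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL for the data.world REST API.
API_BASE = "https://api.data.world/v0"


def build_add_from_url_request(owner, dataset_id, file_name, source_url, token):
    """Build a POST request asking data.world to fetch a file from a URL.

    The request is only constructed here, not sent.
    """
    # Assumed payload shape: a list of files, each with a name and a source URL.
    payload = {"files": [{"name": file_name, "source": {"url": source_url}}]}
    path = "/".join(
        urllib.parse.quote(part, safe="") for part in (owner, dataset_id)
    )
    return urllib.request.Request(
        f"{API_BASE}/datasets/{path}/files",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical dataset and source URL.
request = build_add_from_url_request(
    "my-org", "public-stats", "stats.csv",
    "https://example.com/exports/stats.csv",
    token="YOUR_API_TOKEN",
)
# To actually send it: urllib.request.urlopen(request)
```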
Corporate network
In addition to data that is available to data.world via cloud sources or APIs, some data that you want to make accessible on data.world might only be available on your corporate network or behind a firewall. For customers who need to catalog data behind a firewall, we make our Virtual Data Connector available as an appliance that can be hosted at your site and communicates with data.world via a secure bridge protocol. This option is available to our Enterprise Tier customers. If you have this need, please contact our sales team at firstname.lastname@example.org and they will help you with your options.
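As a quick recap, the recommendations in this article can be expressed as a simple lookup. The Python sketch below is purely illustrative: the keys and the `recommend` function are hypothetical names and are not part of any data.world tool.

```python
# Hypothetical source kinds mapped to the method this article recommends.
RECOMMENDED_METHOD = {
    "database": "sync through the API, or use virtualized access on the Enterprise Tier",
    "local_file": "upload directly into a dataset",
    "cloud_storage": "add through a cloud storage integration and set it to sync",
    "excel": "use the Excel add-in",
    "real_time": "push through the streaming API",
    "url": "add from a URL and sync",
    "third_party_api": "use a superconnector",
    "corporate_network": "use the Virtual Data Connector (Enterprise Tier)",
}


def recommend(source_kind):
    """Return the recommended ingestion method for a kind of data source."""
    try:
        return RECOMMENDED_METHOD[source_kind]
    except KeyError:
        known = ", ".join(sorted(RECOMMENDED_METHOD))
        raise ValueError(
            f"Unknown source kind {source_kind!r}; expected one of: {known}"
        ) from None


print(recommend("real_time"))
```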