One of the first things you'll need to decide when adding your own data to data.world is whether to save it in a dataset or in a project. While data can be stored in either, there are some fundamental differences that will guide your decision.
What is a dataset?
Datasets are the building blocks of projects. They contain data and metadata related to a topic. The files and tabular data in a dataset can be used (queried and analyzed) in one or more projects. Datasets are meant to be reusable assets: they can be combined with other datasets in a project, or they can serve as the single source for querying and analysis in a project.
Datasets can be owned by an individual or an organization, and a dataset adds its own layer of access permissions to the data in a project. Because permissions are assigned at both the dataset level and the project level, an individual can create a project that is available to the public, but if any datasets owned by an organization are added to that project, only people in that organization can see those datasets in the project or any queries written against them.
Because datasets are linked to projects, any changes to the data or metadata in a dataset show up automatically in every linked project. Linking data instead of copying it keeps everything up to date throughout your organization.
What is a project?
Projects bring datasets together with documentation and analysis. This is where work and collaboration happen. A project, as the name implies, likely has a beginning and an end. Data in it is shared and analyzed, and insights are derived from the analysis and written up in the project.
The biggest difference between a dataset and a project is that datasets can be linked to and included in projects, but projects cannot be linked to or included in other projects or datasets, and neither can the files that are added directly to a project. In a project you can run queries against the data, analyze it, share it, and create charts and visualizations from it. However, if you start with a project and add your data files directly to it, neither you nor anyone else can link those files to another project.
The only way to reuse that data in another project is to download it file by file and re-upload it, either as a dataset or directly into the other project. There are times you'll want to download and re-upload files instead of linking to them, but if you add new data files directly to a project, you won't have a choice. One disadvantage of re-uploading is that you have to recreate all of the files' metadata (descriptions and the data dictionary), which is a very cumbersome process!
Generally, if you are adding data to share, or private data that you might conceivably want to reuse in other projects, it's better to add it to a dataset. Because the dataset is linked rather than copied, all of its metadata automatically shows up in your project. All changes to the original dataset, including automatic updates from the source and manual metadata updates by the dataset owner, are carried through as well.
The table below summarizes the differences between adding data files to a dataset vs. to a project:
| Dataset vs. Project | Dataset | Project |
|---|---|---|
| Can run and save queries against | | X |
| Can have charts/visualizations | | X |
| Can incorporate different file types | X | X |
| Can contain multiple files | X | X |
| Can be shared/have contributors | X | X |
| Can have a discussion thread | X | X |
| Can include insights | | X |
| Can use existing data.world datasets without having to download and reimport them and recreate the associated metadata | | X |
| Can be included in a project | X | |
| Can be shared for others to use in their own datasets and projects | X | |