data.world recently rolled out a new workflow to improve the user experience. The new workflow highlights the differences between datasets and projects and clearly delineates when to use each. Previously, the two could be used more or less interchangeably, even though they were intended to serve different functions. Being able to do all the same things in both places confused users and led to inconsistent data practices. As a user's or organization's number of datasets and projects grew, finding the right data in them quickly became unwieldy. The new workflow guides users toward more replicable and consistent data management and analysis.
So what's new?
The main change in the layout and workflow of datasets and projects is that there are no longer separate dataset and project workspaces. Instead, a dataset now has an option to explore it in a new, untitled project window. In addition to exploring the dataset, you can find out how many projects use it, link it directly to a project you have already created, or create a new project based on it.
If you select Explore this dataset, there is functionally no difference between the old dataset workspace and the new untitled project workspace. The button takes you to a workspace where you can browse or query the data, but to begin analysis (i.e., to save anything) you need to save the project by giving it a name.
Queries and the new workspace
The biggest change for users accustomed to the previous workflow is that queries are no longer saved to datasets by default. The logic behind this change is that datasets are for storing files and tables, and projects are for querying and analyzing those files and tables. A dataset is meant to be reused in multiple projects. If queries are saved to it instead of to the projects using it, the dataset can quickly fill with irrelevant queries, making it difficult to use. If the queries specific to a project are all stored in that project, however, the linked dataset remains clean and ready for reuse.
The reasoning above covers 80% of the use cases, but what about the times you really do want to save a query to a dataset? Maybe you want to clean up the data, join tables while preserving the lineage of the original tables for reference, or just use the query in multiple projects without having to rewrite it (you might even want to parameterize it). In those cases it is useful to be able to save your query to the dataset, and you can still do that. After running your query, select the Save link and click the drop-down link to the right of the + New Project option. Alongside that option you'll also see the name of the dataset. Select it and the query will be saved to the dataset, while you remain in an untitled, unsaved project.
Note that queries saved to a dataset won't show up in the queries list of any project that uses the dataset. Instead, they are displayed under the connected datasets info.
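To make the separation concrete, here is a minimal sketch of the relationship described above, written as a toy Python model. It is not the data.world API: the class names, method names, and the example dataset, project, and query names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Dataset:
    """Toy stand-in for a dataset: tables plus any queries saved to it."""
    name: str
    tables: List[str] = field(default_factory=list)
    queries: List[str] = field(default_factory=list)  # reusable queries


@dataclass
class Project:
    """Toy stand-in for a project: links datasets and owns its own queries."""
    name: str
    linked: List[Dataset] = field(default_factory=list)
    queries: List[str] = field(default_factory=list)  # project-specific queries

    def queries_list(self) -> List[str]:
        # Only queries saved to the project itself appear in its queries list.
        return list(self.queries)

    def connected_dataset_queries(self) -> Dict[str, List[str]]:
        # Queries saved to a linked dataset are reached through that dataset.
        return {d.name: list(d.queries) for d in self.linked}


# Hypothetical example data.
sales = Dataset("sales", tables=["orders", "customers"])
sales.queries.append("cleaned_orders")        # saved to the dataset for reuse

q1 = Project("q1-report", linked=[sales])
q1.queries.append("q1_revenue_by_region")     # saved to the project

churn = Project("churn-analysis", linked=[sales])
```

In this sketch the project-specific query stays out of the dataset, so the second project sees only the reusable query, and sees it through its connected dataset rather than in its own queries list.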
For details on all the features in the updated project workspace, see our quickstart guide to navigating the project workspace.