Once you have created a dataset or project and added your files to it, you can make it easier to find and more useful to others by describing, or documenting, it. Documenting consists of creating the metadata for your dataset or project. Searches on data.world match search strings against titles, descriptions, summaries, and tags, so the more completely you describe your data, the more likely it is to be found (for more information on searching, see Finding data). The metadata collected is:
The starting point for describing your data is the dataset overview page. From here you can create the summary, edit the description, create the tags, set the licensing, and edit the data dictionary:
For a comprehensive review of the various components of the dataset pages see Tour data.world.
The summary of the dataset is one of two documents created with the dataset. The summary is where all of the information about the origin of the data, why you created the dataset, etc., is found. Use the Summary section to tell your data’s story. For example:
- Where did the data come from? Cite and link to your sources or include your details for a 'citation request'. Not only does this give credit where credit is due, but it helps other people evaluate the data's suitability for their needs.
- If you think a particular piece of context will be useful to others, add it.
- The best summaries cover the “who, what, where, when, why, and how” of the data.
- What’s the data telling you? What would others be interested to know about it? What have others found using this data?
- If the data has associated data dictionaries or other documentation, upload them and then link to them from your Summary.
- Make it visually friendly with Markdown styling. It’s easy to learn and goes a long way.
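The checklist above can be turned into a reusable template. The following is a minimal sketch in Python that assembles a Markdown summary to paste into the summary editor. The function name, the section headings, and the example values are all hypothetical; data.world does not require any particular structure.

```python
def build_summary(title, source_name, source_url, context, findings):
    """Assemble a Markdown summary string.

    The section names used here (Source, Context, Findings) are
    illustrative only; use whatever headings suit your data.
    """
    lines = [
        f"# {title}",
        "",
        "## Source",
        f"Data from [{source_name}]({source_url}).",
        "",
        "## Context",
        context,
        "",
        "## Findings",
    ]
    # One Markdown bullet per finding.
    lines.extend(f"- {item}" for item in findings)
    return "\n".join(lines)


# Placeholder values, not real data.
summary = build_summary(
    title="City tree inventory",
    source_name="Example Parks Department",
    source_url="https://example.org/trees",
    context="Collected by field survey; one row per tree.",
    findings=["Most trees are on residential streets."],
)
print(summary)
```

Using headings in the generated text also gives a long summary clear sections to navigate.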
To write or edit the summary, select the Edit link in the top right corner of the summary window. You'll be taken to an edit window for the document Dataset summary in the workspace. The summary shown below was written with Markdown styling. Clicking the arrow in the upper right corner of the summary edit window opens a sidebar to the right with a Markdown cheat sheet:
To see what your formatted summary will look like, select the Preview tab:
Once you have saved your summary, if you used header formatting in it, the right sidebar displays an outline of it with anchors to the various sections. In the example below the summary is very short, but the outline feature can be very useful for long summaries:
The dataset, all the files in it, and all the columns in the tabular files have description fields associated with them. Descriptions are very short and serve as a quick reference for the item they describe. To edit the description for the dataset you can select Edit next to the description, Edit next to About this dataset, or the Settings tab:
Tags are a powerful feature that you can use in a variety of ways to facilitate access to your data. For example, tags can be used to organize and group your data by topic, category, source, department, or team. They can be searched for explicitly with the Tags search operator, and can also help to filter down more generic search results.
The tag section is accessed from the dataset Overview tab. There is no limit to the number of tags you can use for a dataset, and there is an autofill feature on the tag field. If the dataset is owned by an organization, the tags displayed for autofill are chosen from all the tags used by the organization. If the dataset is not owned by an organization, the autofill suggestions are from a generic list of tags as well as from tags you have recently created:
The final piece of metadata to document on the Settings tab is the licensing information. Licensing is determined by two factors:
- The licensing of the source documents in the dataset
- The wishes of the dataset owner
The general rule of thumb is that the most restrictive license among the source materials is the least restrictive license that can be used for the dataset. However, again taking the existing source licensing into account, the owner of the dataset can choose even more stringent licensing for others who wish to use the dataset. Licensing selection is made via a dropdown menu in the licensing field:
For a comprehensive explanation of licensing on data.world see our article Understanding licensing.
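The rule of thumb for licensing can be expressed as a small calculation. The sketch below is in Python and assumes that licenses can be ranked on a single scale of restrictiveness. That is a simplification: real licenses do not always form a strict order, and the category names and ranks here are hypothetical, not a list used by data.world.

```python
# Hypothetical ranking: a higher number means a more restrictive license.
# These categories and their order are assumptions for illustration only.
RESTRICTIVENESS = {
    "public-domain": 0,
    "attribution": 1,
    "attribution-sharealike": 2,
    "attribution-noncommercial": 3,
}


def minimum_dataset_license(source_licenses):
    """Return the most restrictive source license.

    Under the rule of thumb, this is the least restrictive license
    the dataset as a whole can carry.
    """
    if not source_licenses:
        raise ValueError("at least one source license is required")
    return max(source_licenses, key=RESTRICTIVENESS.__getitem__)


def allowed_dataset_licenses(source_licenses):
    """Return every license at least as restrictive as the floor."""
    floor = RESTRICTIVENESS[minimum_dataset_license(source_licenses)]
    candidates = [name for name, rank in RESTRICTIVENESS.items() if rank >= floor]
    return sorted(candidates, key=RESTRICTIVENESS.__getitem__)


sources = ["public-domain", "attribution"]
print(minimum_dataset_license(sources))
print(allowed_dataset_licenses(sources))
```

The first function gives the floor set by the source documents; the second lists the stricter options the owner may still choose.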
The data dictionary is the other document created along with the dataset. The data dictionary contains:
- The names of all the files in the dataset
- A place to add descriptions for each file
- Metadata labels for each file
and for tabular files:
- Column names
- The format of the data in each column
- A place to add a description for each column
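One way to picture the contents listed above is as a small data structure. The following Python sketch models a data dictionary with two record types and includes a helper that reports which items still lack a description. The class names, field names, and helper are hypothetical and do not reflect how data.world stores this information.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColumnEntry:
    """Metadata for one column of a tabular file (hypothetical model)."""
    name: str
    data_format: str
    description: str = ""


@dataclass
class FileEntry:
    """Metadata for one file; columns stays empty for non-tabular files."""
    filename: str
    description: str = ""
    labels: List[str] = field(default_factory=list)
    columns: List[ColumnEntry] = field(default_factory=list)


def undocumented(entries):
    """Return the names of files and columns that have no description."""
    missing = []
    for entry in entries:
        if not entry.description:
            missing.append(entry.filename)
        for column in entry.columns:
            if not column.description:
                missing.append(f"{entry.filename}.{column.name}")
    return missing


dictionary = [
    FileEntry(
        filename="trees.csv",
        description="One row per surveyed tree.",
        labels=["raw data"],
        columns=[
            ColumnEntry("tree_id", "integer", "Unique survey identifier."),
            ColumnEntry("dbh", "decimal"),  # no description yet
        ],
    ),
    FileEntry(filename="methods.pdf"),  # non-tabular, no description yet
]
print(undocumented(dictionary))
```

A check like this is a quick way to find the gaps before you share a dataset.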
You can get to the data dictionary either from the Overview tab (right below the Summary) or from the Documents section in the left pane of the workspace:
Data dictionary entries for each file are edited separately by selecting the Edit link next to the filename in the data dictionary document. Every file, no matter what type, has a data dictionary entry that contains its file metadata:
Tabular files also have a tab for columnar metadata where you can rename the columns, change their format, and add descriptions for them:
Changing column names is a great way to avoid the ambiguity that comes from having multiple columns with the same name returned by a search. It also makes obscure column names understandable. Changes to column names propagate throughout the data, from the overview page to the query window, and the changes remain even if the data is updated from an external source.
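To see why a rename can survive a refresh of the underlying data, it helps to think of the rename as a mapping stored separately from the data and reapplied on every load. The Python sketch below illustrates that idea only. It is an assumption about one possible mechanism, not a description of how data.world implements it, and the column names are made up.

```python
def apply_renames(rows, renames):
    """Return copies of the rows with column names replaced per the mapping.

    Columns not present in the mapping keep their original names.
    """
    return [{renames.get(key, key): value for key, value in row.items()} for row in rows]


# The mapping is kept apart from the data, so it outlives any single load.
renames = {"cust_no": "customer_id", "amt": "order_amount"}

first_load = [{"cust_no": 1, "amt": 9.5, "region": "west"}]
refreshed_load = [{"cust_no": 1, "amt": 12.0, "region": "west"}]

print(apply_renames(first_load, renames))
print(apply_renames(refreshed_load, renames))
```

Because the source rows are never modified, the original column names remain available if a rename needs to be revised later.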