Welcome to the data.world tutorial! There are many great reasons to use data.world, and in this tutorial we're going to walk through how to use it. The tutorial is broken up into sections representing logical, manageable chunks of information on using the data.world platform. The exercises at the end of each section are completed on data.world. You will need a data.world login (available for free here if you do not have one) to do the tutorial. Each section follows this format:
- Introduction - Contains a brief introduction to the topic of the article including terminology introduced in it.
- Requirements - A list of tangible work from previous exercises needed to complete this exercise and links to sample files you can use if you did not do the previous exercises.
- Objectives - The learning goals of the article linked to sections in the Background.
- Background - The main body of the article with all documentation and screenshots necessary to complete the exercises.
- Exercise(s) - One or more practice exercises that walk through all the material presented in the section. The output of the exercise is often the basis for a future section's exercise.
- Best Practices (optional) - If there are many ways to do something and we have suggestions for the best way to do it, they will be here.
- Conclusion - A quick recap of the article.
- Reference List - A list of all the hyperlinked data from the article and other important references.
How to use the tutorial
The sections in the tutorial are numbered and presented in a natural progression for learning data.world from the ground up. Sections that cover the main aspects of working with data.world are numbered (e.g., 1.); sections on secondary or expert skills have letters after the numbers (e.g., 4a.). If you have already been using the platform and know the basics, you can jump around the sections to fill in any gaps or expand your knowledge. The tutorial is based on a project using the Bee Colony Statistics dataset. Each person who does the exercises in the tutorial sections will need to create their own tutorial project.
Though the exercises in each section of the tutorial build on the work done in exercises from previous sections, all of the sections work as stand-alone exercises too: we provide the files, queries, datasets, etc. needed to do each exercise if you do not have your own. The only exercise that must be completed is the one to create the project. The project is the container for each subsequent exercise, and each person working through the tutorial must have their own project.
Here is a list of all the sections in the tutorial and links to them:
- Find data on data.world
- Create a project to work with data
- Add data to data.world
- Query your data
- Produce visualizations of your findings
- Document your analysis and share with others
After working through the sections in this tutorial you should be able to:
- Understand the terms used to describe data and data analysis
- Find data relevant to your needs on data.world
- Add, document, and share your own datasets
- Create a project to work with your data
- Produce visualizations of your data
- Document and share your analysis
The following terms are helpful to understand before beginning the tutorial:
Data, whether you think of it as singular or plural, is information. It can be stored in a variety of formats including text files, spreadsheets, and relational and graph databases.
A database is information stored in a structured way.
The terms you use to describe data are its metadata. Metadata is information about data, such as what format it's in, who owns it, and where it came from.
A dataset is a snapshot of a database at a specific moment in time. Datasets on data.world are worked with (combined, analyzed, and discussed) in projects.
A project is the place you do your work with the data, and it can contain both datasets and files specific to (and only available in) the project. Datasets, in contrast, can be used in many different projects. (More detailed information about when to use a dataset or a project is in the article on datasets and projects.)
The data.world data catalog is a repository of datasets containing both data and metadata resources.
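To make the relationship between these terms concrete, here is a minimal sketch in Python. It is only an illustration of the definitions above, not how data.world is implemented: the class names, field names, file names, and owner value are all hypothetical. It shows a dataset carrying its own metadata, and two projects linking to the same dataset while each keeps its own project-specific files.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Dataset:
    """Hypothetical model: a snapshot of data plus the metadata describing it."""
    name: str
    files: List[str] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Project:
    """Hypothetical model: a workspace that links datasets and holds its own files."""
    name: str
    linked_datasets: List[Dataset] = field(default_factory=list)
    project_files: List[str] = field(default_factory=list)


# Illustrative values only; the file names and owner are made up.
bees = Dataset(
    name="Bee Colony Statistics",
    files=["colonies.csv"],
    metadata={"format": "csv", "owner": "example-owner"},
)

tutorial = Project(
    name="my-tutorial-project",
    linked_datasets=[bees],
    project_files=["notes.md"],
)
other = Project(name="another-project", linked_datasets=[bees])

# The same dataset is used by both projects; project files stay with their project.
shared = tutorial.linked_datasets[0] is other.linked_datasets[0]
```

In this sketch, `shared` is true because both projects point at one dataset, while `notes.md` exists only in the first project.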
The tutorial is a living document, and new sections are continually added as new functionality becomes available in the platform. If there is information you would like to see added, or if you have questions about any part of the tutorial, please drop us a note!