Welcome to the basic data.world tutorials! There are many great reasons to use data.world, and these tutorials walk through how to use it. The basic tutorials are written in logical, manageable chunks that explain the basic concepts of the data.world platform. The exercises at the end of each tutorial are completed on data.world, so you will need a data.world login (available for free here if you do not have one) to do them. Each tutorial follows this format:
- Introduction - Contains a brief introduction to the topic of the tutorial including terminology introduced in it.
- Requirements - A list of tangible work from previous tutorials needed to complete this tutorial, and links to sample files you can use if you did not complete the previous tutorials.
- Objectives - The learning goals of the tutorial linked to sections in the Background.
- Background - The main body of the tutorial with all documentation and screenshots necessary to complete the exercise.
- Exercise(s) - One or more practice exercises that walk through all the material presented in the tutorial. The output of an exercise is often the basis for a future tutorial's exercise.
- Best practices (optional) - If there are many ways to do something and we have suggestions for the best way to do it, they will be here.
- Conclusion - A quick recap of the tutorial.
- Reference list - A list of all the hyperlinked data in the tutorial and other important references.
How to use the tutorials
The tutorials are ordered and presented in a natural progression for learning data.world from the ground up. Those that cover the main aspects of working with data.world are in the Basics section and are numbered (e.g., 1.). Secondary or expert skills are covered in the Advanced tutorials section. If you have already been using the platform and know the basics, you can jump around the tutorials to fill in any gaps or expand your knowledge. The entire tutorial series is based on a project using the Bee Colony Statistics dataset. Each person who uses the tutorials will need to create their own project using that dataset.
Though the exercises in each tutorial build on the work done in previous tutorials, each tutorial also stands alone: we provide the files, queries, datasets, etc. needed to do the tutorial if you did not create them in a previous one. The only action that everyone must complete on their own is creating the project. The project is the container for all the exercises, and each person working through them must have their own project.
Here is a list of all the tutorials in the Basics section, with links to them:
- Find data on data.world
- Create a project to work with data
- Add data to data.world
- Query your data
- Produce visualizations of your findings
After working through the tutorials you should be able to:
- Understand the terms used to describe data and data analysis
- Find data relevant to your needs on data.world
- Add, document, and share your own datasets
- Create a project to work with your data
- Produce visualizations of your data
The following terms are helpful to understand before beginning the tutorials:
Data, whether you think of it as singular or plural, is information. It can be stored in a variety of formats including text files, spreadsheets, and relational and graph databases.
A database is information stored in a structured way.
The terms you use to describe data are its metadata. Metadata is information about data, such as what format it's in, who owns it, and where it came from.
A dataset is a snapshot of a database at a specific moment in time. Datasets on data.world are worked with (combined, analyzed, and discussed) in projects.
A project is the place you do your work with the data, and it can contain both datasets and files specific to (and only available in) the project. Datasets, in contrast, can be used in many different projects. (More detailed information about when to use a dataset or a project is in the article on datasets and projects.)
The data.world data catalog is a repository of datasets containing both data and metadata resources.
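The relationships among these terms can be sketched in a few lines of Python. This is only an illustrative model and not the data.world API: the class names `Dataset` and `Project`, their fields, the names given to the dataset and projects, and all sample values are hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical model: data plus the metadata that describes it."""
    name: str
    rows: list      # the data itself
    metadata: dict  # information about the data (format, owner, source)


@dataclass
class Project:
    """Hypothetical model: a workspace that uses datasets and holds its own files."""
    name: str
    datasets: list = field(default_factory=list)       # can be reused by other projects
    project_files: dict = field(default_factory=dict)  # only available in this project


# Placeholder values for illustration only; these are not real figures.
bees = Dataset(
    name="bee-colony-statistics",
    rows=[{"state": "A", "colonies": 10}, {"state": "B", "colonies": 20}],
    metadata={"format": "csv", "owner": "example-user", "source": "example"},
)

# The same dataset can be used in many different projects.
tutorial = Project(name="tutorial-project", datasets=[bees])
other = Project(name="another-project", datasets=[bees])

# A file added to one project is not visible in any other project.
tutorial.project_files["notes.md"] = "Working notes"

shared = tutorial.datasets[0] is other.datasets[0]
```

In this sketch, `shared` is true because both projects point at the same dataset, while `notes.md` exists only in `tutorial`, mirroring the distinction between datasets and project-specific files described above.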
The tutorial is a living document, and new sections are continually added as new functionality becomes available in the platform. If there is information you would like to see added, or if you have questions about any part of the tutorial, please drop us a note!