This article is part of a larger data.world tutorial. Instructions and basic terminology for the tutorial are found in the intro to the data.world tutorial. Please refer back to that article if you need help with how to use the various sections of the tutorial.
In this exercise we will introduce writing SQL queries in the query panel of the project workspace. A query in the context of data is a request for information written in a query language like SQL or SPARQL. Queries use a specified set of terms in a specific order with proscribed punctuation. In addition to this tutorial we have many resources to help you learn about about querying on data.world including articles on query basics, working with existing queries, and using query templates. We also have complete documentation and a tutorial for SQL and SPARQL--the query languages used on data.world.
After working through the exercise in this article you should understand:
- The difference between datasets and projects
- How to create a query
- Auto-complete of terms and objects in your query
- Click-to-copy for columns and tables
- Pop-up help text in queries
- Auto-formatting queries
- Troubleshooting queries
- Saving queries
To complete this exercise you need to have:
- A data.world login (available for free here if you don't have one).
- Your own tutorial project (you must create this yourself--it cannot be downloaded).
- The Bee Colony Statistics dataset linked to your project.
If you need help creating the project or linking the dataset to it, detailed instructions are in the exercise Create a project to work with data.
Skip to exercise
If you prefer to go straight to the exercise and refer back to the information in the Background section when you need additional information, click here.
Datasets vs. projects
As was discussed in the Intro to the data.world tutorial, a dataset is where data is stored for use in projects. It contains only the basic resources that pertain to the majority of the projects that use it. Queries are generally specific to a project as they pull only the data from the dataset which is needed for analysis for that project. As such, queries are stored, with rare exceptions, in a project. See our video Introduction to navigating the project workspace (need to film the video and link it in here) if you would like an overview of the entire project workspace.
More detailed information about when to use a dataset or a project is in the article on datasets and projects.
How to create a query
To write a SQL query against data in a project go to the upper left corner of the project workspace and select SQL Query from the + Add menu:
When the query editor comes up in the center panel of the screen, the Project Schema also comes up in the right side panel. If the project schema isn't showing, select the arrow to the right of the Run query button to expand the sidebar:
Auto-complete of terms and objects
Click next in the query editor window to begin entering your query text. As you type your query, the editor suggests SQL terms--operators, aggregations, and functions--for you. Arrow down or up to choose another option:
Hitting 'enter' will autocomplete the highlighted suggestion in the list.
Click-to-copy for columns and tables
You have the option of typing in the table and column names for the data you are querying, but you can also use the click-to-copy feature on the project schema to copy the names for you so you don't have to worry about typos.
Pop-up help text
Help text is provided for all SQL functions and aggregations when you hover over the function name:
While you can type on one line type your SQL query all on one line--it is perfectly legal and will run--it's good practice to split it up based on clauses and punctuation. The line breaks make it easier to see where you may have forgotten a comma or some other element of the query, and it also allows us to give you better error messages if there are problems with your query.
Here's an example of a query before and after running it through our auto-formatter:
To auto-format your query, use the keyboard shortcut Cmd + Option + L . If you forget the command there is a dropdown menu to the right of the Run query button which has a link to a list of all the keyboard shortcuts.
From the dropdown menu you can also manage your query (rename, copy, etc.) as well as get to our SQL tutorial for additional help.
The article on query editor shortcuts also contains a full list of the shortcuts for the query editor.
If you try to auto-format your query and it doesn't work, that's a sign that you have an error in the query. If you run it you'll get an error message telling you what the problem is and where to find it:
In the above example the query was written to search for the number of colonies recorded by year just for the state of Montana. However the way the query is written it looks like it is searching for records where the value in the bee_colony_census_data_by_state.state column matches the value in another column named Montana. For Montana to be recognized as a value, it has to be surrounded by double-quotes. The error message returned from running the query tells you that the problem is in the 3rd line and the 47th character position (column). The 47th character should have been the opening ":
When your query is complete, save it for later use by selecting Save to the left of the Run query button. You'll be prompted to name the query, add a description, determine where to save it, and indicate who can see it:
It's good practice to give the query a short descriptive name for both your own later use and for use by others working on your project. Adding a description also makes your query more useable as the description shows up on the resource card returned from using search on data.world.
- Open your tutorial project.
- Select SQL query from the +Add button in the upper left.
- Write the following query in the query editor pane and run it:
WHERE bee_colony_census_data_by_county.state = "MONTANA"
- Save the query to your project
There is much more to learn about queries and query languages than will fit into one article, and this tutorial just scratches the surface of how to write and save queries. If you're conversant with SQL or SPARQL your best next step is use your project to practice writing queries on data.world. If you need more information, both our SQL and SPARQL documentation are great resources. If you don't know either SQL or SPARQL and want to query data.world datasets we recommend you work through our SQL tutorial. More information about queries and how to use them is covered in the section Advanced work with queries.
- Intro to data.world tutorial - An introduction to and instructions for using these exercises
- Query basics - A general intro to queries on data.world
- Working with existing queries - How to save existing queries for your own use
- Using Query templates - Creating and using query templates to make using variables in queries easier
- SQL documentation - How to use SQL on data.world
- SQL tutorial - Exercises built on our SQL documentation for learning SQL from the ground up
- SPARQL docs and tutorial - An introduction to the SPARQL query language with built-in exercises
- Bee Colony Statistics - The main dataset used in these exercises
- Bee Colony Census 2017 by State - A dataset created in Exercise 3. of this tutorial available to those who did not do that exercise
- Create a project to work with data - The previous exercise in this tutorial which details how to create the project you will need to have for this and the and the rest of the tutorial exercises
- Introduction to navigating the project workspace - Video introduction to the project workspace. The text article can be subbed here until the video is finished
- Datasets and projects - An explanation of when to create and use datasets and projects
- Query editor shortcuts - A list of the shortcuts that are available in the query editor
- Advanced work with queries - The section of the tutorial which describes how to use the results of queries in your project.