This tutorial is part of the basic tutorial series for the data.world platform. See the article overview of basic tutorials for more information.
In this tutorial we introduce writing SQL queries in the query panel of the workspace. A query in the context of data is a request for information written in a query language like SQL or SPARQL. Queries use a specified set of terms in a specific order with proscribed punctuation. In addition to this tutorial we have many resources to help you learn about about querying on data.world including articles on query basics, working with existing queries, and using query templates. We also have complete documentation and a tutorial for SQL and SPARQL--the query languages used on data.world.
After working through the tutorial you should be able to:
- Tell the difference between datasets and projects
- Create a SQL query
- Use auto-complete of terms and objects in your query
- Use click-to-copy for columns and tables
- Understand the pop-up help text in queries
- Auto-format queries
- Troubleshoot queries
- Save queries
To complete this tutorial you need to have:
- A data.world login (available for free here if you don't have one).
- Your own tutorial project (you must create this yourself--it cannot be downloaded)
- The Bee Colony Statistics dataset linked to your project
If you need help creating the project or linking the dataset to it, detailed instructions are in the tutorial Create a project to work with data.
If you prefer to go straight to the exercises, click here.
Datasets vs. projects
As was discussed in the Intro to the data.world tutorial, a dataset is where data is stored for use in projects. It contains only the basic resources that pertain to the majority of the projects that use it. Queries are generally specific to a project as they pull only the data from the dataset which is needed for analysis for that project. As such, queries are stored, with rare exceptions, in a project. See our overview of the project workspace for more information. More detailed information about when to use a dataset or a project is in the article on datasets and projects.
How to create a query
To write a SQL query against data in a project go to the upper left corner of the project workspace and select SQL Query from the + Add menu:
When the query editor comes up in the center panel of the screen, the Project Schema also comes up in the right side panel. If the project schema isn't showing, select the arrow to the right of the Run query button to expand the sidebar:
Auto-complete of terms and objects
Click next in the query editor window to begin entering your query text. As you type your query, the editor suggests SQL terms--operators, aggregations, and functions--for you. Arrow down or up to choose another option:
Hitting 'enter' will autocomplete the highlighted suggestion in the list.
Click-to-copy for columns and tables
You have the option of typing in the table and column names for the data you are querying, but you can also use the click-to-copy feature on the project schema to copy the names for you so you don't have to worry about typos.
Pop-up help text
Help text is provided for all SQL functions and aggregations when you hover over the function name:
While you can type on one line type your SQL query all on one line--it is perfectly legal and will run--it's good practice to split it up based on clauses and punctuation. The line breaks make it easier to see where you may have forgotten a comma or some other element of the query, and it also allows us to give you better error messages if there are problems with your query.
Here's an example of a query before and after running it through our auto-formatter:
To auto-format your query, use the keyboard shortcut Cmd + Option + L . If you forget the command there is a dropdown menu to the right of the Run query button which has a link to a list of all the keyboard shortcuts.
From the dropdown menu you can also manage your query (rename, copy, etc.) as well as get to our SQL tutorial for additional help.
The article on query editor shortcuts also contains a full list of the shortcuts for the query editor.
If you try to auto-format your query and it doesn't work, that's a sign that you have an error in the query. If you run it you'll get an error message telling you what the problem is and where to find it:
In the above example the query was written to search for the number of colonies recorded by year just for the state of Montana. However the way the query is written it looks like it is searching for records where the value in the bee_colony_census_data_by_state.state column matches the value in another column named Montana. For Montana to be recognized as a value, it has to be surrounded by double-quotes. The error message returned from running the query tells you that the problem is in the 3rd line and the 47th character position (column). The 47th character should have been the opening ":
When your query is complete, save it for later use by selecting Save to the left of the Run query button. You'll be prompted to name the query, add a description, determine where to save it, and indicate who can see it:
It's good practice to give the query a short descriptive name for both your own later use and for use by others working on your project. Adding a description also makes your query more useable as the description shows up on the resource card returned from using search on data.world.
- Open your tutorial project.
- Select SQL query from the +Add button in the upper left.
- Write the following query in the query editor pane and run it:
WHERE bee_colony_census_data_by_county.state = "MONTANA"
- Save the query to your project
There is much more to learn about queries and query languages than will fit into one tutorial, and this one just scratches the surface of how to write and save queries. If you're conversant with SQL or SPARQL, your best next step is use your project to practice writing queries on data.world. If you need more information, both our SQL and SPARQL documentation are great resources. If you don't know either SQL or SPARQL and want to query data.world datasets we recommend you work through our SQL tutorial. More information about queries and how to use them is covered in the tutorial Advanced work with queries.
- Overview of basic tutorials - An overview of the basic tutorials
- Query basics - A general intro to queries on data.world
- Working with existing queries - How to save existing queries for your own use
- Using Query templates - Creating and using query templates to make using variables in queries easier
- SQL documentation - How to use SQL on data.world
- SQL tutorial - Exercises built on our SQL documentation for learning SQL from the ground up
- SPARQL docs and tutorial - An introduction to the SPARQL query language with built-in exercises
- Bee Colony Statistics - The main dataset used in these tutorials
- Bee Colony Census 2017 by State - A dataset created in the tutorial Add data to data.world
- Create a project to work with data - The previous tutorial which details how to create the project you will need to have for this and the and the rest of the tutorialss
- Introduction to the project workspace - A complete overview of the project workspace
- Datasets and projects - An explanation of when to create and use datasets and projects
- Query editor shortcuts - A list of the shortcuts that are available in the query editor
- Advanced work with queries - A deeper-level tutorial on using queries