Introduction
A query, in the context of data, is a request for information written in a query language. A query language uses a specified set of terms in a specific order with proscribed punctuation. The query languages used on data.world are SQL and SPARQL. While data.world is built on semantic web technologies that are best queried using SPARQL, we recognize that most people are more familiar with SQL. For that reason, we've developed our own version of SQL to easily query data.world datasets. If you are new to SQL, we have documentation that will walk you through everything from 'what is a database' to 'how do I join all these different tables together so I can query them all at the same time'. There is also a tutorial based on the exercises in the documentation. For those who would rather work with SPARQL, data.world also has a tutorial for SPARQL that will get you up to speed regardless of your skill level.
In this article we'll go over how to use the query panel in the project workspace for SQL queries including:
- Creating a new query
- Auto-complete of terms and objects
- Click-to-copy for columns and tables
- Pop-up help text
- How to auto-format queries
- Troubleshooting
- Saving queries
- Saving queries to datasets
See the article Quickstart to navigating the project workspace if you would like an overview of the entire project workspace.
Creating a new query
To write a SQL query against data in a project go to the upper left corner of the workspace and select SQL Query from the + Add menu:
When the query editor comes up in the center panel of the screen, the Project Schema also comes up in the right side panel. If the project schema isn't showing, select the arrow to the right of the Run query button to expand the sidebar:
Auto-complete of terms and objects
As you type your query, the editor suggests SQL terms--operators, aggregations, and functions--for you. Hitting 'enter' will autocomplete the highlighted suggestion in the list. Arrow down or up to choose another option:
Click-to-copy for columns and tables
You have the option of typing in the table and column names for the data you are querying, but you can also use the click-to-copy feature on the project schema to copy the names for you so you don't have to worry about typos.
Pop-up help text
Help text is provided for all SQL functions and aggregations when you hover over the function name:
How to auto-format queries
While you can type on one line type your SQL query all on one line--it is perfectly legal and will run--it's good practice to split it up based on clauses and punctuation. The line breaks make it easier to see where you may have forgotten a comma or some other element of the query, and it also allows us to give you better error messages if there are problems with your query.
Here's an example of a query before and after running it through our auto-formatter:
To auto-format your query, use the keyboard shortcut Cmd + Option + L . If you forget the command there is a dropdown menu to the right of the Run query button which has a link to a list of all the keyboard shortcuts.
From the dropdown menu you can also manage your query (rename, copy, etc.) as well as get to our SQL tutorial for additional help.
Troubleshooting
If you try to auto-format your query and it doesn't work, that's a sign that you have an error in the query. If you run it you'll get an error message telling you what the problem is and where to find it:
In the above example the query is written to search for all the records that have the value "won" in the deal_stage column in the sales_pipeline table. However the way the query is written it looks like it is searching for records where the value in the deal_stage column matches the value in another column named Won. For "won" to be recognized as a value, it has to be surrounded by double-quotes. The error message returned from running the query tells you that the problem is in the 4th line and the 36th character (column). The 36th character should have been the opening " on the 4th line:
Saving queries
When your query is complete, save it for later use by selecting Save to the left of the Run query button. You'll be prompted to name the query, add a description, determine where to save it, and indicate who can see it:
:
It's good practice to give the query a short descriptive name for both your own later use and for use by others working on your project. Adding a description also makes your query more useable as the description shows up on the resource card returned from using search on data.world.
Saving queries to datasets
In some cases you might want to save your query in its dataset rather than in a project. Most of the time you'll want to work with your data--including querying it--from within a project. However if you do want a query to be available in every project that uses a certain dataset you can save the query to the dataset. When you select Save, instead of selecting the default + New project, if you click the menu icon to the right you'll see that you can save the query directly to the dataset:
For information on all the ways you can use the results of a query see the article Working with queries.