This article is an advanced look at the search operators used in the search bar on data.world. For an introduction to all search capabilities, including filtering search results and finding similar data, start with the article on finding data.
There is a lot of information on data.world, and finding exactly the data resource you're looking for can be a daunting task. Fortunately, the data.world search engine offers search options that let you craft just the right search string. From the search bar (located at the top of your homepage and on many other data.world pages), you can search for an entire phrase or for matches on individual words. You can also qualify your search with various operators or perform complex searches that combine operators. In this article we'll cover:
- Basic text search
- Logical operators AND, OR, and NOT
- Other operators
- Tokenization
- Complex searches
Basic text search
In a basic text search (no operators), the search engine looks for matches to your search terms in the following places:
- Title of project, dataset, or query
- Description of project or dataset
- Summary of project or dataset
- Tags
- Insights
- User and organization names
If your search string contains more than one word, you can search either for an exact match of the whole string or for items that contain all of the individual words in any order. To search for the entire string, put double quotes (" ") around it; the search engine then returns only items that contain that exact string. Spaces, hyphens, and underscores inside double quotes are not tokenized. Multi-word searches without double quotes return items that contain all of the words in the search string. For example:
- "Medicare hospital spending" - returns only data resources with the exact phrase Medicare hospital spending in them
- Medicare hospital spending - returns results that include Medicare, hospital, and spending, but they don't have to be in that order
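To make the difference concrete, here is a minimal Python sketch that builds both kinds of search string. The function names `exact_phrase` and `all_words` are hypothetical helpers for illustration, not part of data.world or any client library; they only produce text you would type into the search bar.

```python
def exact_phrase(text: str) -> str:
    """Wrap a multi-word string in double quotes so it is matched as one exact phrase."""
    # Remove stray quotes first so the result is a single well-formed phrase.
    cleaned = text.replace('"', "").strip()
    return f'"{cleaned}"'


def all_words(text: str) -> str:
    """Leave the words unquoted: every word must appear, in any order (implicit AND)."""
    # Collapse repeated whitespace into single spaces.
    return " ".join(text.split())


print(exact_phrase("Medicare hospital spending"))  # "Medicare hospital spending"
print(all_words("Medicare   hospital spending"))   # Medicare hospital spending
```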
Logical operators AND, OR, and NOT
The operators AND, OR, and NOT can all be used to control which results a search returns.
AND
Using the operator AND is not necessary because AND is the default operator for searches on data.world. The search string
- colony collapse - returns anything with the words colony and collapse somewhere in the search fields
OR
The OR operator returns results that contain either one string or the other, and it can be used multiple times in a search string. However, when you use operators other than AND, keep in mind that each word is treated as a separate search term, so the search string
university degree OR high school diploma
has five search terms, not two, and does not return results that contain either the phrase university degree or the phrase high school diploma. Nor does it reliably return results that contain university AND degree OR high AND school AND diploma, because the string can be parsed in different ways that yield different results. For that reason, complex searches require explicit grouping of search terms, as described under Complex searches below.
NOT
Sometimes a search returns many related items that aren't of interest to you, and you need to winnow out the extraneous results. For example:
- wildlife - Returns several thousand results, many of which are about wildlife refuges rather than wildlife itself.
Insert the operator NOT before the text you want to exclude to cut the number of results significantly:
- wildlife NOT refuge - Reduces the number of results and shows you what else you might want to remove.
The NOT operator, like the OR operator, can be used more than once in a search:
- wildlife NOT refuge NOT "us-doi-gov" - Returns results that match wildlife and do not contain refuge or us-doi-gov. At this point you might have a better idea what keywords are common to the results you want and you can search for them with AND or OR.
NOT, like OR, cannot be used in complex searches (combined with AND or OR) without specifying the grouping of the search terms.
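If you generate exclusion searches in a script, a small helper can assemble the NOT terms for you. This is a hypothetical Python sketch (`exclude` is not a data.world API), assuming only the behavior described above: each excluded term gets its own NOT, and terms containing spaces, hyphens, or underscores are quoted, as in the us-doi-gov example.

```python
def exclude(base: str, *unwanted: str) -> str:
    """Build a query that matches `base` but drops results containing any unwanted term."""
    parts = [base]
    for term in unwanted:
        # Quote terms with spaces, hyphens, or underscores so they match as one string.
        if any(ch in term for ch in " -_"):
            term = f'"{term}"'
        parts.append(f"NOT {term}")
    return " ".join(parts)


print(exclude("wildlife", "refuge", "us-doi-gov"))
# wildlife NOT refuge NOT "us-doi-gov"
```

The result contains only NOT terms chained onto a single base term; mixing in AND or OR would still require explicit grouping.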
Other operators
Keywords can also be used with a set of data.world-specific operators to further qualify your searches. The same rules govern all of these keyword operators:
- The syntax of an operator search string is operator:keyword where operator is the name of the operator and keyword is the string you want to match.
- There is no space after the colon (operator:keyword, not operator: keyword).
- If there are underscores, hyphens, or spaces in the keyword, put double quotes around it to match the entire string (for example, tag:"bees knees").
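These rules are mechanical enough to express in code. The sketch below is a hypothetical Python helper (`operator_term` is not a data.world API) that assembles an operator:keyword term with no space after the colon, and adds double quotes when you ask for an exact match or when the keyword contains a space, hyphen, or underscore.

```python
import re

# Characters that, per the rules above, require double quotes to match the whole keyword.
_NEEDS_QUOTES = re.compile(r"[\s_-]")


def operator_term(operator: str, keyword: str, exact: bool = False) -> str:
    """Return an operator:keyword term with no space after the colon.

    The keyword is quoted when exact=True or when it contains a space,
    hyphen, or underscore.
    """
    keyword = keyword.strip()
    if exact or _NEEDS_QUOTES.search(keyword):
        keyword = f'"{keyword}"'
    return f"{operator.strip()}:{keyword}"


print(operator_term("tag", "bee"))              # tag:bee
print(operator_term("tag", "bee", exact=True))  # tag:"bee"
print(operator_term("tag", "bees knees"))       # tag:"bees knees"
```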
The following keyword operators and qualifiers are currently supported as search parameters:
- tag
- user and org
- owner and creator
- file
- table
- column
- resourcetype - choices are:
  - dataset
  - project
  - insight
  - file
  - table
  - query
  - catalogTable
  - catalog
  - term
  - datatype
  - analysis
- extension
- visibility - qualifiers are:
  - open
  - private
- created and updated - qualifiers are:
  - >
  - >=
  - <
  - <=
  - { }
tag
The tag operator specifically searches against the tags associated with a dataset or project and returns a list of only the datasets and projects that have that tag. Partial and exact matches are allowed:
- tag:bee - Any dataset or project with the word 'bee' in one of its tags (e.g., 'bee', 'bees', and 'bee colony').
- tag:"bee" - Only datasets and projects with the exact tag 'bee'.
- tag:bees - Any dataset or project with a tag that includes 'bees' (e.g., 'bees' and 'native bees'). NOTE: This does not include datasets or projects with the tag 'bee'.
- tag:"bees knees" - Only datasets and projects with the exact tag 'bees knees'.
- tag:bees knees - Any dataset or project that has a tag which includes 'bees' and also contains 'knees' in any searchable field.
user and org
The user and org operators search for a string found in the display name or the id of a user or organization respectively. The character "@" restricts the search to an exact match of the id:
- user:dave - all users with the string dave in the login name or in the display name.
- user:"dave" - all users with the exact login name 'dave' or display name 'dave'.
- user:@dave and user:"@dave" - only the user whose id is @dave
- user:dave griffith - Users with 'dave' in either the login or display name, and 'griffith' in either the login or display name fields.
- user:da - Any user or organization with the leading string 'da' in either the login or display name
- user:"dave griffith" - Only the user whose display name is Dave Griffith.
- org:data - Any organization with the string 'data' in either the display name or the organization id.
- org:"denver" and org:"@denver" - Only the organization with the id 'denver'.
NOTE: The search @name will return either the organization or the user with the id 'name'.
owner and creator
The owner of a resource is the person or entity who was designated as such when the resource was created. If a person was selected as the owner, that person will also be the creator. If an organization was selected as the owner, the creator will still be the person who created the resource. The owner operator returns all the datasets, projects, and insights owned by either a person or an organization. The creator operator returns all the datasets, projects, and insights created by a user. They both follow the same patterns as user and org:
- owner:dave - All datasets, projects and insights owned by any user or organization with 'dave' in either the display name or id.
- owner:"dave", owner:@dave, owner:"@dave" - All datasets, projects and insights owned by any user or organization with the exact display name or id 'dave'.
- owner:"dave griffith" - All datasets, projects and insights owned by the user whose display name is Dave Griffith.
- creator:@stateofny - everything created by the user with the login of stateofny regardless of whether the user or an organization that the user belongs to is the owner.
- owner:data-ny-gov - Everything owned by the organization data-ny-gov (created by individuals in the organization).
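The user, org, owner, and creator operators share one pattern: a leading @ restricts the match to an account id, and double quotes request an exact match. A minimal Python sketch of that pattern follows; `account_term` is a hypothetical name used only for illustration.

```python
ACCOUNT_OPERATORS = {"user", "org", "owner", "creator"}


def account_term(operator: str, name: str, *, by_id: bool = False, exact: bool = False) -> str:
    """Build a user/org/owner/creator search term.

    by_id=True prefixes '@' to match only the account whose id is `name`.
    exact=True wraps the value in double quotes for an exact match.
    """
    if operator not in ACCOUNT_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    value = f"@{name}" if by_id else name
    if exact:
        value = f'"{value}"'
    return f"{operator}:{value}"


print(account_term("user", "dave", by_id=True))            # user:@dave
print(account_term("owner", "dave griffith", exact=True))  # owner:"dave griffith"
print(account_term("creator", "stateofny", by_id=True))    # creator:@stateofny
```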
file
The search engine now treats the file operator the same way as other primary resources such as datasets, projects, and insights. When you search for a file, you get a list of all the files that match your search, not a list of the datasets and projects that use those files:
- file:bee - Returns a list of files with 'bee' in the name. The icon at the bottom of each result card shows where the file is located: an orange icon indicates a project, and a blue icon indicates a dataset.
- file:bee colony - Returns any file with 'bee' in the name and 'colony' in any other searchable field
table
The table operator is used to find tabular data, either in a table or as a sheet in a spreadsheet:
- table:cutting - Returns all tables with 'cutting' in either the file name or the name of a sheet in a spreadsheet.
- table:"austin_animal_center_outcomes" - Using an exact search on the table operator is one way to find all the public datasets and projects that were created from the same source data. In this case, several different people imported the Austin Animal Center statistics from the City of Austin government website.
column
Searching with the column operator returns a list of all datasets and projects that have a tabular file containing a column with that name:
- column:outcome - As expected, all datasets and projects which have tables containing columns with 'outcome' in their names
- column:"test_outcome" - Only the project with the column named 'test_outcome' in one of its tabular files.
resourcetype
The resourcetype operator restricts results to one type of resource, such as datasets, projects, or insights (see the full list of choices above). It is best used together with another search string in a complex search (see the examples below under Complex searches).
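Because resourcetype is most useful alongside other terms, here is a sketch that checks the type against the choices listed earlier and joins terms with an explicit AND. The set simply mirrors that list, with spelling and case copied as given; the helper names are hypothetical, and this article does not say whether the values are case-sensitive.

```python
# Mirrors the resourcetype choices listed in this article (spelling and case as given).
RESOURCE_TYPES = {
    "dataset", "project", "insight", "file", "table", "query",
    "catalogTable", "catalog", "term", "datatype", "analysis",
}


def resource_type(kind: str) -> str:
    """Return a resourcetype:kind term, rejecting values not in the documented list."""
    if kind not in RESOURCE_TYPES:
        raise ValueError(f"unknown resourcetype: {kind!r}")
    return f"resourcetype:{kind}"


def all_of(*terms: str) -> str:
    """Join terms with an explicit AND (AND is also the default operator)."""
    return " AND ".join(terms)


print(all_of("owner:siyeh", resource_type("insight")))
# owner:siyeh AND resourcetype:insight
```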
extension
Searching with the extension operator returns all datasets and projects which include files with the specified extension. The searches are exact-match only and the '.' is optional:
- extension:jpg, extension:"jpg" and extension:.jpg - all return the same results
- extension:jpeg, extension:"jpeg", and extension:.jpeg - all return the same results, which differ from the results for jpg above.
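Since the leading '.' and the quotes are optional, a script can normalize extensions before building the search. In this hypothetical sketch, `extension_term` strips both but deliberately does not alias jpg to jpeg, because those return different results. It also leaves letter case untouched, since case sensitivity isn't documented here.

```python
def extension_term(ext: str) -> str:
    """Normalize an extension for the exact-match extension operator.

    Surrounding quotes and a leading '.' are optional, so they are removed.
    No aliasing is done: 'jpg' and 'jpeg' remain different searches.
    """
    cleaned = ext.strip().strip('"').lstrip(".")
    if not cleaned:
        raise ValueError("extension must not be empty")
    return f"extension:{cleaned}"


print(extension_term(".jpg"))   # extension:jpg
print(extension_term('"jpg"'))  # extension:jpg
print(extension_term("jpeg"))   # extension:jpeg
```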
visibility
The visibility operator is mainly useful to verify permissions on your data:
- visibility:private - Returns all private datasets and projects owned by you or an organization you are in.
- visibility:open - Returns all public datasets and projects on data.world.
created and updated
The created and updated operators find datasets, projects, insights, users, and organizations based on the date they were added or last updated. Timestamps are in UTC, not your local time, so depending on your time zone, results might be a day off from what you expect:
- created:>2018-07-01 - Created after date.
- updated:>=2018-07-01 - Updated on or after date.
- created:<2018-07-01 - Created before date.
- updated:<=2018-07-01 - Updated on or before date.
- created:{2018-07-19 TO 2018-07-21} - Created between dates (exclusive: the dates at each end are not included).
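The date qualifiers take dates in YYYY-MM-DD form, and timestamps are in UTC. The Python sketch below builds an exclusive range and an on-or-after search, and shows why a local evening can fall on the next UTC day. The helper names are hypothetical, and the UTC-6 offset is just an example time zone.

```python
from datetime import date, datetime, timedelta, timezone


def created_between(start: date, end: date) -> str:
    """Exclusive range: created strictly after `start` and strictly before `end`."""
    if start >= end:
        raise ValueError("start must be earlier than end")
    # Doubled braces emit literal { and } around the range.
    return f"created:{{{start.isoformat()} TO {end.isoformat()}}}"


def updated_on_or_after(day: date) -> str:
    return f"updated:>={day.isoformat()}"


def utc_date(moment: datetime) -> date:
    """Convert a timezone-aware local time to the UTC calendar date used by timestamps."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    return moment.astimezone(timezone.utc).date()


print(created_between(date(2018, 7, 19), date(2018, 7, 21)))
# created:{2018-07-19 TO 2018-07-21}

# 8 p.m. at a fixed UTC-6 offset is already 2 a.m. the next day in UTC.
evening = datetime(2018, 7, 1, 20, 0, tzinfo=timezone(timedelta(hours=-6)))
print(utc_date(evening))  # 2018-07-02
```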
Tokenization
In some searches, hyphen and underscore (- _) characters are tokenized, so the search engine does not read them as literal hyphens and underscores. The exception is exact-match searches, where the string is in double quotes:
- animal_center and animal-center - Return the same results.
- animal center - Returns a different set of results.
- "animal-center", "animal_center" and "animal center" - All search for the exact strings in the quotes and return different results.
- table:"bee_colony_census_data_by_county" - this is the format required to get an exact search match
Complex searches
Combining search operators is a powerful way to restrict search results and really drill down through the data to find what you want. Here are some examples of complex searches created by combining operators:
- owner:siyeh AND resourcetype:insight - Finds all insights written by a specific person.
- tag:health AND resourcetype:dataset AND shelter - Finds all datasets that have the tag 'health' and contain 'shelter' in any searchable field.
- resourcetype:dataset AND owner:dave - Finds all the datasets owned by anyone with 'dave' in their id or display name.
You can also combine the logical operators AND, OR, and NOT in a single search, but you need to clearly group the parts of the search string that belong together, or the search engine will not process your request correctly. For example, the search string
bee and pesticide or colony and collapse
could be parsed in several different ways, including:
- (bee AND pesticide) OR (colony AND collapse) - all results that either have bee and pesticide or have colony and collapse
- bee AND (pesticide OR colony) AND collapse - all results that have bee and either pesticide or colony and also have collapse
- bee AND (pesticide OR (colony AND collapse)) - all results that have bee and either pesticide or both colony and collapse
Because of this ambiguity, the search string
bee and pesticide or colony and collapse
will not return predictable results.
Searching for exact matches in complex searches also requires careful construction of the search string. For example, if you want to find everything related to either a university degree or a high school diploma, the following search strings give completely different results:
- university degree OR high school diploma - unpredictable results because the terms are not grouped
- (university degree) OR (high school diploma) - all results have either the terms university and degree (together or separate in any order or location), or the terms high, school, and diploma (also together or separate in any order or location)
- "university degree" OR "high school diploma" - all results have either the string "university degree" or the string "high school diploma" somewhere in them.
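One way to guarantee explicit grouping is to build the search string from a nested structure and let code add the parentheses. In this minimal Python sketch (the `render` and `phrase` helpers are hypothetical), each tuple is an operator followed by its operands, and every nested group is wrapped in parentheses. It reproduces the unambiguous forms shown above.

```python
from typing import Tuple, Union

# An expression is either a bare term (str) or a tuple: (operator, operand, operand, ...).
Expr = Union[str, Tuple]


def render(expr: Expr) -> str:
    """Render an expression, parenthesizing every nested group."""
    if isinstance(expr, str):
        return expr
    op, *children = expr
    parts = []
    for child in children:
        text = render(child)
        # Nested groups get parentheses so the search engine never has to guess.
        parts.append(f"({text})" if isinstance(child, tuple) else text)
    return f" {op} ".join(parts)


def phrase(text: str) -> str:
    """Quote a multi-word string so it is matched exactly."""
    return f'"{text}"'


print(render(("OR", ("AND", "bee", "pesticide"), ("AND", "colony", "collapse"))))
# (bee AND pesticide) OR (colony AND collapse)
print(render(("AND", "bee", ("OR", "pesticide", ("AND", "colony", "collapse")))))
# bee AND (pesticide OR (colony AND collapse))
print(render(("OR", phrase("university degree"), phrase("high school diploma"))))
# "university degree" OR "high school diploma"
```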