With discoverable datasets we introduced the ability to make the existence of a dataset known to the general data.world community without exposing any of the data in it. In this article we'll discuss how to extend this feature further by adding sample data files that can also be viewed. Users can view the samples to decide whether they want to request access to the full dataset. A preview of the sample file is visible on the dataset overview page. If the user evaluating the data would like to see more than the preview, the file can be downloaded and viewed.
There are different ways to create sample preview files. A sample may have all of the columns of the original file, but not all of the rows. Or it may have only some of the rows and some of the columns, with the columns containing sensitive data removed.
Creating a sample file with all the columns
Starting from the overview page of your discoverable dataset, the easiest way to create a sample with all of the columns and only some of the rows is to select the Explore dataset button at the top right of the page. Then select the file you would like to preview from the list in the left sidebar and click Query:
All you need to add to the sample query you are presented with is a line at the end containing a LIMIT clause, which sets the number of rows returned by the query to five (or however many you would like to be available; only five will be previewed, but the rest are available for download):
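To see what the LIMIT clause does outside the data.world query editor, the query shape can be tried locally. The following is a minimal sketch using Python's built-in sqlite3 module, not the data.world workspace itself; the table name `original_file` and its columns are hypothetical stand-ins for your own file.

```python
import sqlite3

# Hypothetical stand-in for a file in the dataset: the table name,
# column names, and rows are assumptions made for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE original_file (id INTEGER, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO original_file VALUES (?, ?, ?)",
    [(i, f"user{i}", f"user{i}@example.com") for i in range(1, 21)],
)

# The * wildcard keeps every column; LIMIT caps the number of rows returned.
sample_query = "SELECT * FROM original_file LIMIT 5"
cursor = conn.execute(sample_query)
sample_columns = [description[0] for description in cursor.description]
sample_rows = cursor.fetchall()

print(sample_columns)    # every column of the original table
print(len(sample_rows))  # no more rows than the LIMIT allows
conn.close()
```

Changing the number after LIMIT changes how many rows end up in the downloadable sample.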
Hit the Run query button, then Download and Save to dataset or project:
It's a good idea to name the resulting table so that it's easily identified as a sample of a data file, not the file itself. The current dataset name is automatically populated in the Dataset/Project field:
After you have added the file to your dataset you can go back to the dataset overview page by selecting View dataset from the left sidebar menu:
Then scroll down to the sample file and select the three dots on the right to edit its metadata:
Check the Preview option and save:
Creating a sample file with some columns removed
The process for creating this sample file is the same as for the previous one, with one exception: instead of using the * wildcard in your SQL query for "all columns", list the columns you want to include, then finish with the LIMIT clause as before. Here is an example:
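The column-list variant can be sketched the same way. This is a local illustration using Python's built-in sqlite3 module rather than the data.world query editor; the table `original_file`, its columns, and the choice of `email` as the sensitive column are all hypothetical.

```python
import sqlite3

# Hypothetical stand-in for the original file; names and rows are
# assumptions for illustration, with `email` playing the sensitive column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE original_file (id INTEGER, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO original_file VALUES (?, ?, ?)",
    [(i, f"user{i}", f"user{i}@example.com") for i in range(1, 21)],
)

# Name only the columns that are safe to share instead of using *,
# then finish with the LIMIT clause as before.
restricted_query = "SELECT id, name FROM original_file LIMIT 5"
cursor = conn.execute(restricted_query)
restricted_columns = [description[0] for description in cursor.description]
restricted_rows = cursor.fetchall()

print(restricted_columns)    # the sensitive column is absent
print(len(restricted_rows))  # no more rows than the LIMIT allows
conn.close()
```

Because the sensitive column is never selected, it does not appear in the saved sample file at all.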
Each file can also be flagged as discoverable from the file metadata, accessed from the three dots to the right of the file name on the Overview tab:
Check the Preview box to make a random 5-line sample of the file visible from the overview page of the dataset:
Files that have been made previewable are flagged with the Preview label on the overview tab in the creator's view:
Note: Even though the file cannot be accessed in data.world until permission has been granted by the creator, it can still be downloaded by anyone. For this reason it's best to make a sample file derived from the original data and flag it as discoverable rather than make the original file previewable: