The DWCC is a self-contained program for extracting metadata from a variety of data sources, including databases such as MS SQL, Redshift, Amazon Athena, and Snowflake, and non-database sources such as Tableau Server. New sources, both database and non-database, are continually being added.
The DWCC is deployed as a command-line application shipped as a Docker image. When it is run, it creates a Docker container that is isolated from everything else on your system except the data source it catalogs and the directory outside the container that receives the catalog output.
The DWCC pulls only metadata from the source; it does not collect any data. For databases, the information gathered includes the number of tables and columns, the names of the tables and columns, key information, and the data types used, all of which is useful to data analysts.
You can use one DWCC to catalog as many data sources as you have. All you need to do is change the name of the catalog source and the parameters on the command line.
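As a rough illustration of reusing one collector for several sources, the sketch below assembles one `docker run` command per source, varying only the source name and parameters. The image name, the `/catalog-output` mount target, and the `--source-name` and `--host` arguments are hypothetical placeholders, not the DWCC's actual options; take the real values from the instructions for your data source.

```python
import shlex


def build_collector_command(image, output_dir, collector_args):
    """Assemble a `docker run` argument list for one catalog run."""
    return [
        "docker", "run", "--rm",
        # Bind-mount the host output directory into the container.
        # The target path is an assumption made for this sketch.
        "--mount", f"type=bind,source={output_dir},target=/catalog-output",
        image,
        *collector_args,
    ]


# Hypothetical image name and collector arguments, for illustration only.
IMAGE = "example/dwcc:latest"
SOURCES = {
    "sales-warehouse": ["--source-name", "sales-warehouse",
                        "--host", "warehouse.example.com"],
    "bi-server": ["--source-name", "bi-server",
                  "--host", "tableau.example.com"],
}

# Same image and output directory for every run; only the arguments change.
commands = {
    name: build_collector_command(IMAGE, "/tmp/catalog", args)
    for name, args in SOURCES.items()
}

for name, cmd in commands.items():
    # shlex.join renders the list as a shell-safe command string.
    print(f"{name}: {shlex.join(cmd)}")
```

Building each command as a list rather than a single string keeps the source-specific parameters separate from the fixed Docker options, so adding another source only means adding another entry to `SOURCES`.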
Because the DWCC is shipped as a Docker image, you need to have Docker installed on the local machine. If you can't use Docker, a Java version of the DWCC is also available. For more information about Docker, see https://docs.docker.com/get-docker/.
The computer running the catalog collector should have network access to the data source.
The user running the catalog collector must have read access to the data source.
For many data sources, you will also need the JDBC drivers for the data source installed on the local machine. The DWCC assumes the driver .jar file is in the ../jdbcdrivers directory.
Finally, a minimum of 2 GB of memory and a 2 GHz processor are required for all sources. Certain data sources (like BigQuery) may have additional requirements.
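Several of the prerequisites above can be checked before the first run. The following is a minimal sketch, not part of the DWCC itself: it tests whether a `docker` executable is on the PATH, lists the .jar files in the ../jdbcdrivers directory, and attempts a TCP connection to the data source. The function names are made up for this example, and the host and port are supplied by the caller. It does not verify read access, memory, or processor speed.

```python
import shutil
import socket
from pathlib import Path


def docker_available() -> bool:
    """Return True if a `docker` executable is on the PATH."""
    return shutil.which("docker") is not None


def find_jdbc_drivers(driver_dir: str = "../jdbcdrivers") -> list[str]:
    """List the .jar files in the driver directory (empty if it is missing)."""
    path = Path(driver_dir)
    if not path.is_dir():
        return []
    return sorted(p.name for p in path.glob("*.jar"))


def source_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def preflight(host: str, port: int, driver_dir: str = "../jdbcdrivers") -> dict:
    """Collect the three checks into one report."""
    return {
        "docker_installed": docker_available(),
        "jdbc_drivers": find_jdbc_drivers(driver_dir),
        "source_reachable": source_reachable(host, port),
    }
```

To use it, call `preflight` with the host name and port of your own data source, for example `preflight("db.example.com", 5439)`, where both values are placeholders. A successful TCP connection shows only that the network path is open, not that the user's credentials grant read access.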