Data Catalog (which uses Cloud Spanner under the hood) provides a centralized place where organizations can find, curate, and describe their data assets.

Data Catalog is a fully managed, serverless, and scalable metadata management service in Google Cloud's Data Analytics family of products. It provides data discovery and a solid foundation for data governance.

Integrating Data Catalog with Cloud Storage or BigQuery is relatively straightforward.

Using Data Catalog

There are two main ways you interact with Data Catalog:

  • Searching for data assets that you have access to
  • Tagging assets with metadata
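
Both interactions can be sketched with the google-cloud-datacatalog Python client. This is a minimal illustration, not a definitive implementation: the helper names (build_search_query, search_assets, attach_tag) are hypothetical, and the sketch assumes the client library is installed, credentials are configured, and a tag template with a string field already exists.

```python
from typing import Iterator, List


def build_search_query(system: str = "", asset_type: str = "",
                       name_contains: str = "") -> str:
    """Join search qualifiers with spaces, which the search syntax treats as AND."""
    parts: List[str] = []
    if system:
        parts.append(f"system={system}")        # e.g. system=bigquery
    if asset_type:
        parts.append(f"type={asset_type}")      # e.g. type=table
    if name_contains:
        parts.append(f"name:{name_contains}")   # substring match on the asset name
    return " ".join(parts)


def search_assets(project_id: str, query: str) -> Iterator[str]:
    """Yield linked resource names of matching assets the caller can access."""
    # Imported lazily so build_search_query works without the client library.
    from google.cloud import datacatalog_v1

    client = datacatalog_v1.DataCatalogClient()
    scope = datacatalog_v1.SearchCatalogRequest.Scope()
    scope.include_project_ids.append(project_id)
    for result in client.search_catalog(request={"scope": scope, "query": query}):
        yield result.linked_resource


def attach_tag(linked_resource: str, template_name: str,
               field_id: str, value: str) -> str:
    """Attach a one-field string tag to the entry behind linked_resource.

    template_name and field_id are assumptions: they must refer to an
    existing tag template and one of its string fields.
    """
    from google.cloud import datacatalog_v1

    client = datacatalog_v1.DataCatalogClient()
    entry = client.lookup_entry(request={"linked_resource": linked_resource})

    tag = datacatalog_v1.types.Tag()
    tag.template = template_name
    tag.fields[field_id] = datacatalog_v1.types.TagField()
    tag.fields[field_id].string_value = value

    return client.create_tag(parent=entry.name, tag=tag).name
```

For example, build_search_query(system="bigquery", asset_type="table", name_contains="orders") produces a query that can be passed to search_assets, and each resource name it yields can then be passed to attach_tag.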

In addition, Data Catalog integrates with Cloud Data Loss Prevention (DLP), whose auto-tagging mechanism can automatically identify sensitive data.

Data Catalog can catalog the native metadata of data assets from the following Google Cloud sources:

  • BigQuery datasets, tables, and views
  • Pub/Sub topics