Knowledge Catalog manages metadata about data assets (such as BigQuery tables). This metadata powers search, and agents to discover and understand data assets.
This sample demonstrates how external sources of information can be used to augment the metadata in Knowledge Catalog, using an agentic approach. The enrichment agent finds information relevant to a data asset, generates documentation, and uses Knowledge Catalog APIs to publish the documentation as enriched metadata.
Users can customize this enrichment agent with custom instructions, tools to acccess sourcses of information within their organization, and skills to describe the use of these tools.
Setup your environment
This step clones the repository locally, configures various environment variables, creates a python virtual environment and installs dependencies.
git clone https://github.com/googlecloudplatform/dataplex-labs
cd dataplex-labs/knowledge_catalog_enrichment_agentEnsure you have the GCloud CLI installed and configured.
export CLOUD_PROJECT=<cloud-project-id>
gcloud auth application-default login
gcloud config set core/project $CLOUD_PROJECT
gcloud auth application-default set-quota-project $CLOUD_PROJECTSetup a python environment
cd src
source env.sh --installSetup sample data
The sample creates a sample BigQuery dataset in your project whose metadata is curated.
python3 ../sample/data/create_data.py
You are now ready to run the enrichment workflow.
Download a metadata snapshot
python3 -m enrichment.download \
--dir ../sample/metadata.initial \
--dataset ${KC_ENRICH_SAMPLE_PROJECT}.kc_enrich_sample_dataEnrich the metadata
python3 -m enrichment.enrich \
--dir ../sample/metadata.initial \
--output-dir ../sample/metadata.new \
--config-dir ../sample/configReview the metadata updates
git diff --no-index \
../sample/metadata.initial ../sample/metadata.newPublish the updated metadata
python3 -m enrichment.publish \
--dir ../sample/metadata.new