Configuration File (config.yaml)

Each of dwh-auditor’s diagnostic rules can be customized in config.yaml to meet project-specific criteria. Since “what is considered abnormal” depends on business requirements, all values are designed to be overridden by the user.

Generating the Configuration File

dwh-auditor init

A config.yaml file will be generated in the current directory. If the file already exists, it will not be overwritten.

Note

config.yaml is safe to commit to a Git repository. However, never put sensitive information (such as service account keys) in it.

Full Configuration Schema

# dwh-auditor config.yaml

pricing:
  # On-demand price per 1TB scanned (USD)
  # BigQuery on-demand default price: $6.25/TB
  tb_scan_usd: 6.25

thresholds:
  # Exclusion line for full scan detection (GB)
  # Scans below this value are excluded from warnings
  # Setting to prevent alert fatigue from small master table scans
  ignore_full_scan_under_gb: 1.0

  # Maximum number of high-cost queries to report
  top_expensive_queries_limit: 10

  # Number of unreferenced days before considering it a zombie table
  zombie_table_days: 90

# (For future extension) dbt integration settings
dbt:
  enabled: false
  job_label_key: "dbt_model"

Configuration Details

pricing Section

pricing.tb_scan_usd

The on-demand price per TB scanned, in USD. The estimated cost is calculated as follows:

Estimated Cost (USD) = TB Scanned × tb_scan_usd

| Key | Default Value | Description |
| --- | --- | --- |
| tb_scan_usd | 6.25 | Corresponds to the standard BigQuery on-demand pricing. |
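The cost formula can be sketched in Python as follows. This is a minimal illustration, not dwh-auditor's actual code: the function name estimated_cost_usd and the constant BYTES_PER_TB are hypothetical, and the sketch assumes that 1 TB means 2^40 bytes (the binary terabyte that BigQuery on-demand pricing is quoted in). Whether dwh-auditor converts bytes the same way is an assumption.

```python
# Assumption: 1 TB = 2**40 bytes (binary terabyte).
BYTES_PER_TB = 1024 ** 4


def estimated_cost_usd(bytes_scanned: int, tb_scan_usd: float = 6.25) -> float:
    """Estimated Cost (USD) = TB Scanned x tb_scan_usd."""
    tb_scanned = bytes_scanned / BYTES_PER_TB
    return tb_scanned * tb_scan_usd


# A query that scans 2 TB at the default price.
print(estimated_cost_usd(2 * BYTES_PER_TB))  # 12.5
```

Overriding tb_scan_usd in config.yaml corresponds to passing a different second argument here.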

Tip

If you use BigQuery Editions (Standard / Enterprise / Enterprise Plus) or legacy flat-rate commitments such as Flex Slots, billing is based on slot time rather than bytes scanned. In that case, set tb_scan_usd: 0.0 and treat the cost analysis as a reference value only.

thresholds Section

thresholds.ignore_full_scan_under_gb

Scans smaller than this size (in GB) are excluded from full scan warnings.

| Key | Default Value | Recommended Usage |
| --- | --- | --- |
| ignore_full_scan_under_gb | 1.0 | Ignore scans of small master tables (e.g., prefecture tables) |
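As a rough sketch of how such an exclusion line behaves, the snippet below keeps only scans at or above the threshold. The Scan record and the full_scans_to_warn function are hypothetical names used for illustration; they are not part of dwh-auditor's API, and the real detection logic may differ.

```python
from typing import List, NamedTuple


class Scan(NamedTuple):
    """Hypothetical record for one detected full scan."""
    table: str
    scanned_gb: float


def full_scans_to_warn(scans: List[Scan], ignore_full_scan_under_gb: float = 1.0) -> List[Scan]:
    """Drop scans below the threshold; keep the rest for the warning list."""
    return [s for s in scans if s.scanned_gb >= ignore_full_scan_under_gb]


scans = [Scan("master.prefectures", 0.002), Scan("logs.events", 350.0)]
print([s.table for s in full_scans_to_warn(scans)])  # ['logs.events']
```

Raising the threshold trades fewer alerts for the risk of missing mid-sized tables that are scanned very frequently.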

thresholds.top_expensive_queries_limit

The maximum number of high-cost queries to report.

| Key | Default Value | Description |
| --- | --- | --- |
| top_expensive_queries_limit | 10 | Maximum number of queries to display in the console or report |
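The effect of this limit can be illustrated with a small top-N selection. The dictionary keys job_id and cost_usd and the function top_expensive are assumptions made for this sketch, not names taken from dwh-auditor.

```python
import heapq
from typing import Dict, List


def top_expensive(queries: List[Dict], top_expensive_queries_limit: int = 10) -> List[Dict]:
    """Return at most `top_expensive_queries_limit` queries, most expensive first."""
    return heapq.nlargest(
        top_expensive_queries_limit, queries, key=lambda q: q["cost_usd"]
    )


queries = [
    {"job_id": "job_a", "cost_usd": 3.2},
    {"job_id": "job_b", "cost_usd": 41.0},
    {"job_id": "job_c", "cost_usd": 12.5},
]
print([q["job_id"] for q in top_expensive(queries, top_expensive_queries_limit=2)])
# ['job_b', 'job_c']
```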

thresholds.zombie_table_days

Tables that have not been referenced by SELECT for this number of days or more are reported as zombie tables.

| Key | Default Value | Description |
| --- | --- | --- |
| zombie_table_days | 90 | Classified as zombie if it hasn’t appeared in JOBS views in the past 90 days |
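The rule described above amounts to a date comparison, sketched below. The function is_zombie is hypothetical, and treating a table with no recorded reference at all as a zombie is an assumption of this sketch rather than documented behavior.

```python
from datetime import date
from typing import Optional


def is_zombie(last_referenced: Optional[date], today: date, zombie_table_days: int = 90) -> bool:
    """True if the table has gone unreferenced for zombie_table_days or more."""
    if last_referenced is None:
        # Assumption: a table never seen in the job history counts as a zombie.
        return True
    return (today - last_referenced).days >= zombie_table_days


today = date(2024, 6, 1)
print(is_zombie(date(2024, 3, 3), today))  # True  (exactly 90 days ago)
print(is_zombie(date(2024, 3, 4), today))  # False (89 days ago)
```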

dbt Section

This section is reserved for a future extension. It identifies queries issued by dbt using BigQuery job labels and aggregates costs for each dbt model. In the current version, leave enabled set to false.

| Key | Default Value | Description |
| --- | --- | --- |
| dbt.enabled | false | Whether to enable dbt integration |
| dbt.job_label_key | "dbt_model" | Label key name to identify the dbt model |

Pydantic Schema

The configuration file is parsed into the Pydantic model dwh_auditor.config.AppConfig, so invalid values (a negative price, a threshold of 0, and so on) are reported as validation errors at startup.

from dwh_auditor.config import load_config

config = load_config("config.yaml")
print(config.pricing.tb_scan_usd)   # 6.25
print(config.thresholds.zombie_table_days)  # 90
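To illustrate the kind of fail-fast check described here without depending on the real model, the sketch below uses standard-library dataclasses as a stand-in for Pydantic. The class names PricingSketch and ThresholdsSketch are hypothetical, and the exact constraints (price not negative, every threshold greater than 0) are assumptions inferred from the examples above; the actual AppConfig fields and error messages may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PricingSketch:
    """Stand-in for the pricing section (not the real AppConfig)."""
    tb_scan_usd: float = 6.25

    def __post_init__(self) -> None:
        # Assumption: 0.0 is allowed, negative prices are rejected.
        if self.tb_scan_usd < 0:
            raise ValueError("pricing.tb_scan_usd must not be negative")


@dataclass(frozen=True)
class ThresholdsSketch:
    """Stand-in for the thresholds section (not the real AppConfig)."""
    ignore_full_scan_under_gb: float = 1.0
    top_expensive_queries_limit: int = 10
    zombie_table_days: int = 90

    def __post_init__(self) -> None:
        # Assumption: every threshold must be strictly positive.
        for name in ("ignore_full_scan_under_gb", "top_expensive_queries_limit", "zombie_table_days"):
            if getattr(self, name) <= 0:
                raise ValueError(f"thresholds.{name} must be greater than 0")


try:
    ThresholdsSketch(zombie_table_days=0)
except ValueError as exc:
    print(exc)  # thresholds.zombie_table_days must be greater than 0
```

Validating at construction time means a bad config.yaml stops the run before any queries are issued, instead of producing misleading reports.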

For details, please refer to dwh_auditor.config — Load configuration files and manage DWH settings.