Configuration File (config.yaml)
Each of dwh-auditor’s diagnostic rules can be customized in config.yaml to meet project-specific criteria. Since “what is considered abnormal” depends on business requirements, all values are designed to be overridden by the user.
Generating the Configuration File

```shell
dwh-auditor init
```
A config.yaml file will be generated in the current directory. If the file already exists, it will not be overwritten.
Note
config.yaml is safe to commit to a Git repository. However, never put sensitive information (such as service account keys) in it.
Full Configuration Schema

```yaml
# dwh-auditor config.yaml
pricing:
  # On-demand price per 1 TB scanned (USD)
  # BigQuery on-demand list price: $6.25 per TiB
  tb_scan_usd: 6.25

thresholds:
  # Exclusion threshold for full-scan detection (GB)
  # Scans smaller than this value are excluded from warnings
  # Prevents alert fatigue caused by scans of small master tables
  ignore_full_scan_under_gb: 1.0

  # Maximum number of high-cost queries to report
  top_expensive_queries_limit: 10

  # Days without any reference after which a table is considered a zombie table
  zombie_table_days: 90

# (For future extension) dbt integration settings
dbt:
  enabled: false
  job_label_key: "dbt_model"
```
Configuration Details
pricing Section
pricing.tb_scan_usd
The on-demand price, in USD, per 1 TB scanned. The estimated cost is calculated as follows:
Estimated Cost (USD) = TB Scanned × tb_scan_usd
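The formula above can be sketched in a few lines of Python. This is an illustration, not dwh-auditor's own code: the function name estimate_cost_usd is hypothetical, and because this page does not say whether the tool converts bytes to TB using 10^12 bytes or 2^40 bytes (TiB, the unit BigQuery bills in), that divisor is left as an assumed parameter.

```python
# Hypothetical helper illustrating: Estimated Cost (USD) = TB Scanned × tb_scan_usd
# Assumption: "TB" means 10**12 bytes; pass bytes_per_tb=2**40 to price per TiB instead.
DEFAULT_BYTES_PER_TB = 10**12


def estimate_cost_usd(bytes_scanned: int, tb_scan_usd: float = 6.25,
                      bytes_per_tb: int = DEFAULT_BYTES_PER_TB) -> float:
    """Convert bytes scanned to TB and multiply by the configured price."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    tb_scanned = bytes_scanned / bytes_per_tb
    return tb_scanned * tb_scan_usd


print(estimate_cost_usd(2 * 10**12))                   # 12.5 (2 TB at $6.25/TB)
print(estimate_cost_usd(2 * 10**12, tb_scan_usd=0.0))  # 0.0 (price set to zero)
```

With tb_scan_usd set to 0.0 every estimate collapses to zero, which is why the price should only be trusted as a cost figure under on-demand billing.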
| Key | Default Value | Description |
|---|---|---|
| `tb_scan_usd` | `6.25` | Corresponds to the standard BigQuery on-demand pricing. |
Tip
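If you use capacity-based pricing (BigQuery Editions: Standard / Enterprise / Enterprise Plus, or legacy flat-rate commitments such as Flex Slots), you are billed for slot time rather than bytes scanned. In this case, set tb_scan_usd: 0.0 and treat the cost analysis as reference information only.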
thresholds Section
thresholds.ignore_full_scan_under_gb
Full table scans smaller than this size (in GB) are excluded from full-scan warnings.
| Key | Default Value | Recommended Usage |
|---|---|---|
| `ignore_full_scan_under_gb` | `1.0` | Ignore scans of small master tables (e.g., prefecture tables) |
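A minimal sketch of how such an exclusion threshold behaves, assuming full scans have already been detected and that 1 GB means 10^9 bytes. The FullScan record and scans_to_warn function are hypothetical names for illustration, not part of dwh-auditor. Only scans strictly under the threshold are dropped, so a scan of exactly 1 GB is still reported.

```python
from dataclasses import dataclass

# Assumption: 1 GB = 10**9 bytes (decimal). dwh-auditor may use 2**30 instead.
BYTES_PER_GB = 10**9


@dataclass(frozen=True)
class FullScan:
    """Hypothetical record of one detected full table scan."""
    table: str
    bytes_scanned: int


def scans_to_warn(scans: list[FullScan],
                  ignore_full_scan_under_gb: float = 1.0) -> list[FullScan]:
    """Keep full scans at or above the threshold; smaller ones are ignored."""
    threshold_bytes = ignore_full_scan_under_gb * BYTES_PER_GB
    return [scan for scan in scans if scan.bytes_scanned >= threshold_bytes]


scans = [
    FullScan("master.prefectures", 48_000),     # tiny master table: ignored
    FullScan("analytics.events", 250 * 10**9),  # large table: reported
]
print([scan.table for scan in scans_to_warn(scans)])  # ['analytics.events']
```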
thresholds.top_expensive_queries_limit
The maximum number of high-cost queries to report.
| Key | Default Value | Description |
|---|---|---|
| `top_expensive_queries_limit` | `10` | Maximum number of queries to display in the console or report |
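Conceptually, the limit is a "top N by cost" selection. The sketch below assumes each query already carries an estimated cost; QueryCost and top_expensive_queries are hypothetical names, and heapq.nlargest is simply one idiomatic way to pick the N largest items, not necessarily what dwh-auditor uses.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryCost:
    """Hypothetical per-query record: BigQuery job ID and estimated cost."""
    job_id: str
    cost_usd: float


def top_expensive_queries(queries: list[QueryCost],
                          top_expensive_queries_limit: int = 10) -> list[QueryCost]:
    """Return at most top_expensive_queries_limit queries, most expensive first."""
    return heapq.nlargest(top_expensive_queries_limit, queries,
                          key=lambda q: q.cost_usd)


queries = [QueryCost(f"job_{i}", float(i)) for i in range(15)]
top = top_expensive_queries(queries)
print(len(top), top[0].job_id)  # 10 job_14
```

If fewer queries exist than the limit, all of them are returned, still ordered by cost.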
thresholds.zombie_table_days
Tables that have not been referenced by SELECT for this number of days or more are reported as zombie tables.
| Key | Default Value | Description |
|---|---|---|
| `zombie_table_days` | `90` | With the default, a table is classified as a zombie if it has not appeared in the INFORMATION_SCHEMA.JOBS views in the past 90 days |
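The zombie rule ("not referenced for this number of days or more") can be expressed as a cutoff comparison. This sketch assumes you already have, for each table, the timestamp of its most recent SELECT reference taken from job history; the function name and the shape of last_referenced are hypothetical. A table with no recorded reference in the examined history is also treated as a zombie.

```python
from datetime import datetime, timedelta, timezone


def find_zombie_tables(all_tables: list[str],
                       last_referenced: dict[str, datetime],
                       now: datetime,
                       zombie_table_days: int = 90) -> list[str]:
    """Return tables whose last SELECT reference is zombie_table_days or more ago.

    Tables missing from last_referenced (no reference found in the job
    history that was examined) are treated as zombies as well.
    """
    cutoff = now - timedelta(days=zombie_table_days)
    zombies = [
        table for table in all_tables
        if table not in last_referenced or last_referenced[table] <= cutoff
    ]
    return sorted(zombies)


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
last_referenced = {
    "sales.orders": now - timedelta(days=3),
    "tmp.old_backup": now - timedelta(days=120),
    "tmp.exactly_90_days": now - timedelta(days=90),
}
tables = ["sales.orders", "tmp.old_backup", "tmp.exactly_90_days", "tmp.never_used"]
print(find_zombie_tables(tables, last_referenced, now))
# ['tmp.exactly_90_days', 'tmp.never_used', 'tmp.old_backup']
```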
dbt Section
This setting identifies queries issued by dbt through BigQuery job labels and aggregates costs per dbt model. It is reserved for future extension; leave enabled: false in the current version.
| Key | Default Value | Description |
|---|---|---|
| `enabled` | `false` | Whether to enable dbt integration |
| `job_label_key` | `"dbt_model"` | Label key name to identify the dbt model |
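Because the dbt integration is not yet active, the following is only a hypothetical sketch of the aggregation it describes. It assumes each BigQuery job is available as a pair of its labels (as a dict) and its estimated cost, and that dbt-issued jobs carry a label whose key equals job_label_key. The Job alias and cost_by_dbt_model function are illustrative names, not dwh-auditor API.

```python
from collections import defaultdict

# Hypothetical input shape: one (labels, estimated_cost_usd) pair per BigQuery job.
Job = tuple[dict[str, str], float]


def cost_by_dbt_model(jobs: list[Job], job_label_key: str = "dbt_model") -> dict[str, float]:
    """Sum estimated cost per dbt model; jobs without the label are skipped."""
    totals: defaultdict[str, float] = defaultdict(float)
    for labels, cost_usd in jobs:
        model = labels.get(job_label_key)
        if model is not None:
            totals[model] += cost_usd
    return dict(totals)


jobs = [
    ({"dbt_model": "stg_orders"}, 1.5),
    ({"dbt_model": "stg_orders"}, 0.25),
    ({"dbt_model": "fct_sales"}, 3.0),
    ({}, 10.0),  # ad hoc query without a dbt label: not attributed to any model
]
print(cost_by_dbt_model(jobs))  # {'stg_orders': 1.75, 'fct_sales': 3.0}
```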
Pydantic Schema
On the Python side, the configuration file is parsed into the Pydantic model dwh_auditor.config.AppConfig, so invalid values (such as a negative price or a threshold of 0) are reported as validation errors at startup.
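```python
from dwh_auditor.config import load_config

# Load and validate config.yaml; invalid values raise a validation error here
config = load_config("config.yaml")
print(config.pricing.tb_scan_usd)  # 6.25 (with the generated defaults)
print(config.thresholds.zombie_table_days)  # 90 (with the generated defaults)
```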
For details, please refer to dwh_auditor.config — Load configuration files and manage DWH settings.
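To show how Pydantic turns these rules into startup errors, here is a hypothetical model that mirrors the YAML schema. It is not the actual AppConfig definition: the class names with a Sketch suffix are invented, and the constraints (price ge=0 so that 0.0 remains allowed, every threshold gt=0) are assumptions based on the description above. It requires Pydantic v2 (model_validate).

```python
from pydantic import BaseModel, Field, ValidationError


class PricingSketch(BaseModel):
    # Assumption: 0.0 is allowed, negative prices are rejected
    tb_scan_usd: float = Field(default=6.25, ge=0)


class ThresholdsSketch(BaseModel):
    # Assumption: every threshold must be strictly positive
    ignore_full_scan_under_gb: float = Field(default=1.0, gt=0)
    top_expensive_queries_limit: int = Field(default=10, gt=0)
    zombie_table_days: int = Field(default=90, gt=0)


class DbtSketch(BaseModel):
    enabled: bool = False
    job_label_key: str = "dbt_model"


class AppConfigSketch(BaseModel):
    pricing: PricingSketch = Field(default_factory=PricingSketch)
    thresholds: ThresholdsSketch = Field(default_factory=ThresholdsSketch)
    dbt: DbtSketch = Field(default_factory=DbtSketch)


# Partial input: omitted keys fall back to their defaults
cfg = AppConfigSketch.model_validate({"thresholds": {"zombie_table_days": 30}})
print(cfg.thresholds.zombie_table_days)  # 30
print(cfg.pricing.tb_scan_usd)  # 6.25

try:
    AppConfigSketch.model_validate({"pricing": {"tb_scan_usd": -1}})
except ValidationError as exc:
    print("rejected:", len(exc.errors()), "error(s)")
```

In this design, nested models keep each YAML section's rules next to its keys, and a single model_validate call checks the whole file at once.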