dwh_auditor.analyzer — DWH cost analysis/security diagnosis logic

The dwh_auditor.analyzer package is pure Python logic that performs diagnostics by comparing the Pydantic models received from the Extractor against the thresholds in config.yaml.

Note

This package does not import google.cloud.bigquery at all. Since there is no external API communication, unit tests complete in milliseconds. You can test it by simply passing dummy QueryJob / TableStorage objects.


Cost analysis (analyzer.cost)

Analysis logic for high-cost queries.

Note: This module must not import google.cloud.bigquery at all. Keep it to pure Python logic so that unit tests complete in milliseconds.

dwh_auditor.analyzer.cost.analyze_cost(jobs, config)[source]

Analyzes query jobs and returns a ranking of high-cost queries.

Jobs that hit the cache are excluded because they incur no charge.

Parameters:
  • jobs (list[QueryJob]) – List of QueryJobs (received from Extractor)

  • config (AppConfig) – AppConfig object

Returns:

List of CostInsights (descending cost, top N)

Return type:

list[CostInsight]
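
The ranking described above (drop cache hits, price the rest, keep the top N by descending cost) can be sketched roughly as follows. This is not the package's implementation: FakeJob, its field names, rank_by_cost, and the price per TiB are all hypothetical stand-ins chosen for illustration, since the real QueryJob and AppConfig fields are not shown here.

```python
from dataclasses import dataclass

TIB = 1024 ** 4  # bytes per tebibyte


@dataclass
class FakeJob:
    """Hypothetical stand-in for QueryJob; the field names are assumptions."""
    job_id: str
    total_bytes_billed: int
    cache_hit: bool


def rank_by_cost(jobs, usd_per_tib, top_n):
    """Drop cache hits, price the remaining jobs, return the top_n most expensive."""
    billable = [job for job in jobs if not job.cache_hit]
    priced = [
        (job.job_id, job.total_bytes_billed / TIB * usd_per_tib)
        for job in billable
    ]
    priced.sort(key=lambda pair: pair[1], reverse=True)
    return priced[:top_n]


jobs = [
    FakeJob("a", 2 * TIB, cache_hit=False),
    FakeJob("b", 5 * TIB, cache_hit=True),   # cache hit, so it is excluded
    FakeJob("c", 1 * TIB, cache_hit=False),
    FakeJob("d", 3 * TIB, cache_hit=False),
]
ranking = rank_by_cost(jobs, usd_per_tib=5.0, top_n=2)
print(ranking)
```

Because nothing here touches BigQuery, a test like this only needs plain in-memory objects.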


Recurring cost analysis (analyzer.recurring)

Analysis logic for recurring queries.

Note: This module must not import google.cloud.bigquery at all.

dwh_auditor.analyzer.recurring.analyze_recurring_cost(raw_stats, config)[source]

Maps the metadata of recurring queries to a list of RecurringCostInsight, computing the derived values along the way.

Parameters:
  • raw_stats (list[dict[str, Any]]) – Aggregated dictionaries for recurring queries, extracted by the Extractor

  • config (AppConfig) – Thresholds and unit cost settings

Returns:

List of RecurringCostInsight

Return type:

list[RecurringCostInsight]
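
As a rough illustration of mapping aggregated dictionaries to per-query cost figures, the sketch below assumes each row carries a query hash, an execution count, and total billed bytes. Those dictionary keys, the min_runs threshold, the price per TiB, and the summarize_recurring name are all assumptions made for this example; the actual keys produced by the Extractor and the fields of RecurringCostInsight may differ.

```python
TIB = 1024 ** 4  # bytes per tebibyte


def summarize_recurring(raw_stats, usd_per_tib, min_runs):
    """Turn aggregated rows into cost summaries, skipping rarely run queries.

    The keys "query_hash", "execution_count" and "total_bytes_billed"
    are hypothetical, as is the min_runs threshold.
    """
    insights = []
    for row in raw_stats:
        runs = row["execution_count"]
        if runs < min_runs:
            continue  # not frequent enough to count as recurring
        total_cost = row["total_bytes_billed"] / TIB * usd_per_tib
        insights.append(
            {
                "query_hash": row["query_hash"],
                "execution_count": runs,
                "total_cost_usd": total_cost,
                "avg_cost_usd": total_cost / runs,
            }
        )
    return insights


raw_stats = [
    {"query_hash": "h1", "execution_count": 10, "total_bytes_billed": 4 * TIB},
    {"query_hash": "h2", "execution_count": 1, "total_bytes_billed": 8 * TIB},
]
recurring = summarize_recurring(raw_stats, usd_per_tib=5.0, min_runs=5)
print(recurring)
```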


Full scan detection (analyzer.scan)

Full scan (inefficient query) detection logic.

Note: This module must not import google.cloud.bigquery at all. Keep it to pure Python logic so that unit tests complete in milliseconds.

dwh_auditor.analyzer.scan.detect_full_scans(jobs, tables, config)[source]

Detect queries that may result in a full scan.

Byte-ratio validation method: a query is treated as a full scan if its billed bytes are at least 90% of the physical size of the referenced table.

Parameters:
  • jobs (list[QueryJob]) – Jobs to inspect, extracted by the Extractor

  • tables (list[TableStorage]) – List used to look up table size information

  • config (AppConfig) – Thresholds

Returns:

List of FullScanInsight

Return type:

list[FullScanInsight]
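
The byte-ratio rule above can be sketched as a size lookup followed by a ratio check. FakeJob, FakeTable, their fields, and find_full_scans are hypothetical stand-ins, and the sketch assumes each job references exactly one table, which may not hold for the real QueryJob model. The 0.9 default mirrors the 90% figure in the description; in the package the value presumably comes from config.

```python
from dataclasses import dataclass


@dataclass
class FakeJob:
    """Hypothetical stand-in for QueryJob (single referenced table assumed)."""
    job_id: str
    total_bytes_billed: int
    referenced_table: str  # "project.dataset.table"


@dataclass
class FakeTable:
    """Hypothetical stand-in for TableStorage."""
    table_id: str
    physical_bytes: int


def find_full_scans(jobs, tables, ratio_threshold=0.9):
    """Flag jobs whose billed bytes reach ratio_threshold of the table size."""
    size_by_table = {t.table_id: t.physical_bytes for t in tables}
    hits = []
    for job in jobs:
        size = size_by_table.get(job.referenced_table)
        if not size:
            continue  # unknown or empty table: no ratio can be computed
        ratio = job.total_bytes_billed / size
        if ratio >= ratio_threshold:
            hits.append((job.job_id, ratio))
    return hits


tables = [FakeTable("p.d.events", 1000)]
jobs = [
    FakeJob("full", 1000, "p.d.events"),     # ratio 1.0, flagged
    FakeJob("pruned", 100, "p.d.events"),    # ratio 0.1, not flagged
    FakeJob("orphan", 1000, "p.d.missing"),  # table not in the lookup, skipped
]
full_scans = find_full_scans(jobs, tables)
print(full_scans)
```

Building the lookup dictionary once keeps the check linear in the number of jobs rather than scanning the table list per job.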


Zombie table detection (analyzer.zombie)

Table profiling and zombie (unused) table detection logic.

Note: This module must not import google.cloud.bigquery at all. Keep it to pure Python logic so that unit tests complete in milliseconds.

dwh_auditor.analyzer.zombie.analyze_table_usage(tables, usage_stats, config, now=None)[source]

Returns a profile (usage status and zombie determination result) for each table.

Joins the lightweight usage_stats received from the Extractor with tables.

Parameters:
  • tables (list[TableStorage]) – List of TableStorage (received from Extractor)

  • usage_stats (dict[str, dict[str, Any]]) – Statistics dictionary keyed by "project.dataset.table" (received from Extractor)

  • config (AppConfig) – AppConfig object

  • now (datetime | None) – Reference date and time to treat as "now" (for testing)

Returns:

List of TableUsageProfile (descending storage size)

Return type:

list[TableUsageProfile]
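
The sketch below illustrates why an injectable now is useful: joining table sizes with usage statistics and deciding "unused" against a fixed reference time makes the result deterministic in tests. The "last_accessed" key, the max_idle_days threshold, the plain-dict inputs, and the profile_tables name are assumptions for this example only; the real function takes TableStorage objects and reads its thresholds from AppConfig.

```python
from datetime import datetime, timedelta, timezone


def profile_tables(table_sizes, usage_stats, max_idle_days, now=None):
    """Join sizes with usage stats and mark tables idle beyond max_idle_days.

    table_sizes maps "project.dataset.table" to bytes (hypothetical shape).
    usage_stats maps the same key to a dict with a "last_accessed" datetime
    (hypothetical key). Tables with no stats count as never accessed.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    idle_limit = timedelta(days=max_idle_days)
    profiles = []
    for table_id, size in table_sizes.items():
        last = usage_stats.get(table_id, {}).get("last_accessed")
        is_zombie = last is None or (now - last) > idle_limit
        profiles.append(
            {"table_id": table_id, "size_bytes": size, "is_zombie": is_zombie}
        )
    # Largest tables first, matching the documented sort order.
    profiles.sort(key=lambda p: p["size_bytes"], reverse=True)
    return profiles


fixed_now = datetime(2024, 6, 1, tzinfo=timezone.utc)
table_sizes = {"p.d.active": 100, "p.d.stale": 500, "p.d.never": 300}
usage_stats = {
    "p.d.active": {"last_accessed": datetime(2024, 5, 25, tzinfo=timezone.utc)},
    "p.d.stale": {"last_accessed": datetime(2023, 1, 1, tzinfo=timezone.utc)},
}
profiles = profile_tables(table_sizes, usage_stats, max_idle_days=90, now=fixed_now)
print(profiles)
```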


Analysis Runner (analyzer.runner)

Analysis runner: calls each Analyzer and aggregates the results into an AuditResult.

Note: This module must not import google.cloud.bigquery at all.

dwh_auditor.analyzer.runner.run_analysis(top_cost_jobs, heavy_scan_jobs, recurring_stats, table_usages, tables, config, analyzed_days, project_id)[source]

Runs all analyzers and returns comprehensive audit results.

Parameters:
  • top_cost_jobs (list[QueryJob]) – Jobs extracted for the high-cost ranking

  • heavy_scan_jobs (list[QueryJob]) – Jobs extracted for full scan detection

  • recurring_stats (list[dict[str, Any]]) – Extracted recurring query results (dict)

  • table_usages (dict[str, dict[str, Any]]) – Per-table usage statistics (dict)

  • tables (list[TableStorage]) – Table storage information

  • config (AppConfig) – AppConfig object

  • analyzed_days (int) – Analysis period (days)

  • project_id (str) – ID of your own GCP project that is the target of the analysis

Returns:

AuditResult object

Return type:

AuditResult