dwh_auditor.models — Internal data model for DWH auditing (Pydantic)

The dwh_auditor.models package defines the “type contract” for data passed between layers (Extractor → Analyzer → Reporter). Passing Pydantic models instead of plain dicts provides both static type checking and runtime validation.
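A minimal sketch of the runtime-validation claim, assuming Pydantic v2. The class below is an illustrative re-declaration of QueryJob built from the signature documented on this page, not an import of dwh_auditor.models.job; the real model may add validators or configuration that are not shown here.

```python
from datetime import datetime, timezone

from pydantic import BaseModel, Field, ValidationError


class QueryJob(BaseModel):
    # Illustrative re-declaration of dwh_auditor.models.job.QueryJob
    # (assumption: no extra validators beyond the field types).
    job_id: str
    user_email: str
    query: str
    creation_time: datetime
    total_bytes_billed: int = 0
    cache_hit: bool = False
    referenced_tables: list[str] = Field(default_factory=list)
    statement_type: str = "SELECT"


# Valid input: the ISO-8601 string is parsed into a timezone-aware datetime,
# and omitted fields fall back to their defaults.
job = QueryJob(
    job_id="job_123",
    user_email="analyst@example.com",
    query="SELECT 1",
    creation_time="2024-05-01T09:30:00Z",
)

# Invalid input: a non-numeric byte count is rejected when the model is built,
# so a bad Extractor row fails here rather than deep inside the Analyzer.
error_count = 0
try:
    QueryJob(
        job_id="job_456",
        user_email="analyst@example.com",
        query="SELECT 1",
        creation_time=datetime.now(timezone.utc),
        total_bytes_billed="a lot",
    )
except ValidationError as exc:
    error_count = exc.error_count()
```

With a plain dict, the same bad value would travel silently until some arithmetic on total_bytes_billed failed.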


Query job model (models.job)

BigQuery query job data model.

class dwh_auditor.models.job.QueryJob(*, job_id, user_email, query, creation_time, total_bytes_billed=0, cache_hit=False, referenced_tables=<factory>, statement_type='SELECT')

Bases: BaseModel

A model representing a single query job from BigQuery’s job history.

It serves as the data contract for the job records that the Extractor layer retrieves from BigQuery’s INFORMATION_SCHEMA.JOBS view and passes to the Analyzer layer.

Parameters:
job_id: str
user_email: str
query: str
creation_time: datetime
total_bytes_billed: int
cache_hit: bool
referenced_tables: list[str]
statement_type: str
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
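The Extractor-to-model hand-off might look roughly like the following. This is a hypothetical helper (row_to_job_kwargs is not part of the documented API) that assumes each INFORMATION_SCHEMA.JOBS row arrives as a plain dict, and that referenced_tables, which BigQuery exposes as a repeated record of project_id, dataset_id, and table_id, is flattened into "project.dataset.table" strings.

```python
from typing import Any


def row_to_job_kwargs(row: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical Extractor helper: map one INFORMATION_SCHEMA.JOBS row to QueryJob kwargs."""
    # Assumption: tables are identified as "project.dataset.table" strings.
    tables = [
        f"{t['project_id']}.{t['dataset_id']}.{t['table_id']}"
        for t in row.get("referenced_tables") or []
    ]
    kwargs = {
        "job_id": row.get("job_id"),
        "user_email": row.get("user_email"),
        "query": row.get("query"),
        "creation_time": row.get("creation_time"),
        "total_bytes_billed": row.get("total_bytes_billed"),
        "cache_hit": row.get("cache_hit"),
        "referenced_tables": tables,
        "statement_type": row.get("statement_type"),
    }
    # Drop NULL columns so the model's own defaults (0, False, "SELECT") apply,
    # and so missing required fields are reported by QueryJob's validation.
    return {key: value for key, value in kwargs.items() if value is not None}
```

A caller would then build the model with QueryJob(**row_to_job_kwargs(row)). Omitting NULL columns, rather than passing None, lets Pydantic either apply a default or report the field as missing.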


Table storage model (models.table)

BigQuery table storage data model.

class dwh_auditor.models.table.TableStorage(*, project_id, dataset_id, table_id, total_logical_bytes=0, total_physical_bytes=0, active_logical_bytes=0)

Bases: BaseModel

A model representing storage information for a single BigQuery table.

It serves as the data contract for the storage records that the Extractor layer retrieves from BigQuery’s INFORMATION_SCHEMA.TABLE_STORAGE view and passes to the Analyzer layer for zombie-table detection.

Parameters:
  • project_id (str)

  • dataset_id (str)

  • table_id (str)

  • total_logical_bytes (int)

  • total_physical_bytes (int)

  • active_logical_bytes (int)

project_id: str
dataset_id: str
table_id: str
total_logical_bytes: int
total_physical_bytes: int
active_logical_bytes: int
property full_table_id: str

Fully qualified table name, built by concatenating the project, dataset, and table IDs.

model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
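A sketch of how full_table_id could be derived, again as an illustrative re-declaration assuming Pydantic v2. The dotted project.dataset.table separator is an assumption; the description above only says the three names are concatenated.

```python
from pydantic import BaseModel


class TableStorage(BaseModel):
    # Illustrative re-declaration of dwh_auditor.models.table.TableStorage.
    project_id: str
    dataset_id: str
    table_id: str
    total_logical_bytes: int = 0
    total_physical_bytes: int = 0
    active_logical_bytes: int = 0

    @property
    def full_table_id(self) -> str:
        # Assumption: BigQuery's dotted "project.dataset.table" form.
        return f"{self.project_id}.{self.dataset_id}.{self.table_id}"


t = TableStorage(
    project_id="my-proj",
    dataset_id="sales",
    table_id="orders",
    total_logical_bytes=5 * 1024**3,
)
print(t.full_table_id)  # my-proj.sales.orders
```

In this sketch full_table_id is a plain @property, so it is not included in model_dump(); if it needed to appear in serialized output, Pydantic's computed_field decorator is the usual tool.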


Analysis result model (models.result)

Data models for analysis results.

Type definitions for the diagnostic results that the Analyzer layer passes to the Reporter layer.

class dwh_auditor.models.result.CostInsight(*, job, estimated_cost_usd, scanned_tb)

Bases: BaseModel

Analysis result for a high-cost query.

Parameters:
job: QueryJob
estimated_cost_usd: float
scanned_tb: float
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
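The relationship between a job's total_bytes_billed and the scanned_tb and estimated_cost_usd fields can be sketched as below. Both the TiB (2**40 bytes) divisor and the default rate of 6.25 USD per TiB are assumptions: on-demand pricing varies by region and edition and changes over time. estimate_cost is a hypothetical helper, not part of dwh_auditor.

```python
BYTES_PER_TIB = 1024**4  # assumption: "TB" here means binary TiB, the unit BigQuery bills in


def estimate_cost(total_bytes_billed: int, usd_per_tib: float = 6.25) -> tuple[float, float]:
    """Hypothetical helper returning (scanned_tb, estimated_cost_usd) for one job.

    usd_per_tib is an assumed on-demand rate; substitute the price for your region/edition.
    """
    scanned_tb = total_bytes_billed / BYTES_PER_TIB
    return scanned_tb, scanned_tb * usd_per_tib


scanned_tb, estimated_cost_usd = estimate_cost(2 * BYTES_PER_TIB)
print(scanned_tb, estimated_cost_usd)  # 2.0 12.5
```

A job served from cache (cache_hit=True) normally bills zero bytes, so it contributes nothing to a ranking built on this value.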

class dwh_auditor.models.result.FullScanInsight(*, job, scanned_gb)

Bases: BaseModel

Analysis result for a query identified as a full scan.

Parameters:
job: QueryJob
scanned_gb: float
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
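This page does not state how a full scan is detected, so the following is only a hypothetical heuristic to show where scanned_gb could come from: a SELECT whose text has no WHERE clause and that billed at least a threshold number of GiB. full_scan_gb, the 10 GiB threshold, and the rule itself are all assumptions.

```python
import re
from typing import Optional

BYTES_PER_GIB = 1024**3
_WHERE = re.compile(r"\bWHERE\b", re.IGNORECASE)


def full_scan_gb(query: str, statement_type: str, total_bytes_billed: int,
                 min_gb: float = 10.0) -> Optional[float]:
    """Hypothetical heuristic: return scanned_gb if the job looks like a full scan, else None.

    Assumed rule (not the project's documented one): a SELECT with no WHERE
    clause that billed at least min_gb GiB counts as a full scan.
    """
    if statement_type != "SELECT" or _WHERE.search(query):
        return None
    scanned_gb = total_bytes_billed / BYTES_PER_GIB
    return scanned_gb if scanned_gb >= min_gb else None
```

A text-only check like this is deliberately crude: a WHERE clause that does not prune any partitions still leads to a full scan, and a filter inside a subquery hides an unfiltered outer scan.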

class dwh_auditor.models.result.TableUsageProfile(*, table, is_zombie, last_accessed_at=None, top_users=<factory>, access_count_30d=0, size_gb)

Bases: BaseModel

Table usage profile and zombie-table determination result.

Parameters:
table: TableStorage
is_zombie: bool
last_accessed_at: datetime | None
top_users: list[str]
access_count_30d: int
size_gb: float
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
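One way the TableUsageProfile fields could be derived is sketched below. The zombie rule (no access within the window), the 30-day window, and counting top users over recent accesses only are all assumptions; profile_table is a hypothetical helper that returns plain values rather than the model itself.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Optional


def profile_table(accesses: list[tuple[str, datetime]], now: datetime,
                  window_days: int = 30, top_n: int = 3) -> dict:
    """Hypothetical sketch: derive TableUsageProfile-style fields for one table.

    accesses: (user_email, access_time) pairs for jobs that referenced the table.
    Assumption: a table with no access inside the window is a zombie.
    """
    cutoff = now - timedelta(days=window_days)
    recent = [(user, ts) for user, ts in accesses if ts >= cutoff]
    # Latest access over the full history, or None if the table was never read.
    last_accessed_at: Optional[datetime] = max((ts for _, ts in accesses), default=None)
    # Most frequent readers within the window, most active first.
    top_users = [user for user, _ in Counter(u for u, _ in recent).most_common(top_n)]
    return {
        "is_zombie": not recent,
        "last_accessed_at": last_accessed_at,
        "top_users": top_users,
        "access_count_30d": len(recent),
    }
```

Keeping last_accessed_at separate from the windowed count lets a report say both "unused for 30 days" and when the table was last actually read.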

class dwh_auditor.models.result.RecurringCostInsight(*, query_hash, query_sample, execution_count, total_estimated_usd, total_scanned_tb, last_executed_at)

Bases: BaseModel

Analysis result for a high-cost query that runs routinely, for example from batch jobs or BI tools.

Parameters:
  • query_hash (str)

  • query_sample (str)

  • execution_count (int)

  • total_estimated_usd (float)

  • total_scanned_tb (float)

  • last_executed_at (datetime)

query_hash: str
query_sample: str
execution_count: int
total_estimated_usd: float
total_scanned_tb: float
last_executed_at: datetime
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
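The query_hash field implies that repeated runs of the same query are grouped under a shared key. A minimal sketch of that grouping follows. The normalization (masking quoted strings and standalone numbers, collapsing whitespace, lowercasing), the SHA-256 digest, the minimum of two executions, and the 6.25 USD per TiB rate are all assumptions; query_hash and group_recurring here are hypothetical helpers.

```python
import hashlib
import re
from collections import defaultdict
from datetime import datetime
from typing import Iterable

# Assumed normalization: mask quoted strings and standalone numbers, collapse whitespace.
_LITERAL = re.compile(r"'(?:[^'\\]|\\.)*'|\b\d+(?:\.\d+)?\b")
_SPACE = re.compile(r"\s+")
BYTES_PER_TIB = 1024**4


def query_hash(query: str) -> str:
    """Hash a query after masking literals so runs differing only in parameters share a key."""
    normalized = _SPACE.sub(" ", _LITERAL.sub("?", query)).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_recurring(
    jobs: Iterable[tuple[str, datetime, int]],
    min_executions: int = 2,
    usd_per_tib: float = 6.25,  # assumed on-demand rate
) -> list[dict]:
    """Group (query, creation_time, total_bytes_billed) tuples into RecurringCostInsight-style dicts."""
    groups = defaultdict(list)
    for job in jobs:
        groups[query_hash(job[0])].append(job)

    insights = []
    for digest, runs in groups.items():
        if len(runs) < min_executions:
            continue  # one-off queries are not "recurring"
        total_tb = sum(billed for _, _, billed in runs) / BYTES_PER_TIB
        insights.append({
            "query_hash": digest,
            "query_sample": runs[0][0],
            "execution_count": len(runs),
            "total_estimated_usd": total_tb * usd_per_tib,
            "total_scanned_tb": total_tb,
            "last_executed_at": max(created for _, created, _ in runs),
        })
    # Most expensive recurring patterns first.
    return sorted(insights, key=lambda item: item["total_estimated_usd"], reverse=True)
```

Masking literals is what lets a dashboard query that differs only by its date filter collapse into one row, which is the case a per-job ranking such as top_expensive_queries tends to miss.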

class dwh_auditor.models.result.AuditResult(*, analyzed_days, project_id, total_jobs_analyzed, total_tables_analyzed, top_expensive_queries=<factory>, recurring_expensive_queries=<factory>, full_scans=<factory>, table_profiles=<factory>)

Bases: BaseModel

Comprehensive audit report produced by the Analyzer layer as its final output.

Parameters:
analyzed_days: int
project_id: str
total_jobs_analyzed: int
total_tables_analyzed: int
top_expensive_queries: list[CostInsight]
recurring_expensive_queries: list[RecurringCostInsight]
full_scans: list[FullScanInsight]
table_profiles: list[TableUsageProfile]
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
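Because AuditResult is the Analyzer's final output, a Reporter can serialize it with Pydantic's own JSON support instead of a hand-written conversion. The sketch below assumes Pydantic v2 (model_dump_json, model_validate_json) and uses trimmed, illustrative re-declarations: among the list fields, only table_profiles is kept, to stay short.

```python
from datetime import datetime
from typing import Optional

from pydantic import BaseModel, Field


class TableStorage(BaseModel):
    project_id: str
    dataset_id: str
    table_id: str
    total_logical_bytes: int = 0
    total_physical_bytes: int = 0
    active_logical_bytes: int = 0


class TableUsageProfile(BaseModel):
    table: TableStorage
    is_zombie: bool
    last_accessed_at: Optional[datetime] = None
    top_users: list[str] = Field(default_factory=list)
    access_count_30d: int = 0
    size_gb: float


class AuditResult(BaseModel):
    # Trimmed mirror: the three other insight lists are omitted for brevity.
    analyzed_days: int
    project_id: str
    total_jobs_analyzed: int
    total_tables_analyzed: int
    table_profiles: list[TableUsageProfile] = Field(default_factory=list)


result = AuditResult(
    analyzed_days=30,
    project_id="my-proj",
    total_jobs_analyzed=1200,
    total_tables_analyzed=1,
    table_profiles=[
        TableUsageProfile(
            table=TableStorage(project_id="my-proj", dataset_id="sales", table_id="orders_2019"),
            is_zombie=True,
            size_gb=42.0,
        )
    ],
)

payload = result.model_dump_json(indent=2)  # JSON text a Reporter could write to disk
restored = AuditResult.model_validate_json(payload)  # validated again on the way back in
zombies = [p.table.table_id for p in restored.table_profiles if p.is_zombie]
```

Nested models serialize recursively, so the Reporter never needs to know the shape of TableStorage to emit it.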