dwh_auditor.models — Internal data model for DWH auditing (Pydantic)¶

The dwh_auditor.models package defines the “type contract” for data passed between layers (Extractor → Analyzer → Reporter). Passing Pydantic models instead of raw dicts gives both static type checking and runtime validation.
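As a minimal sketch of what runtime validation buys, the snippet below defines a hypothetical stand-in model (`JobRecord`, which is not part of dwh_auditor) with a subset of the fields documented for QueryJob. It assumes Pydantic v2 and shows a malformed record being rejected at construction time, where a plain dict would accept it silently.

```python
from datetime import datetime, timezone

from pydantic import BaseModel, ValidationError


class JobRecord(BaseModel):
    """Hypothetical stand-in with a subset of QueryJob's documented fields."""

    job_id: str
    user_email: str
    creation_time: datetime
    total_bytes_billed: int = 0


# A plain dict accepts anything; the bad value surfaces only when it is used.
raw = {
    "job_id": "job_1",
    "user_email": "a@example.com",
    "creation_time": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "total_bytes_billed": "not-a-number",
}

try:
    JobRecord(**raw)
    rejected_fields = []
except ValidationError as exc:
    # Each error entry records the location of the offending field.
    rejected_fields = [err["loc"][0] for err in exc.errors()]

print(rejected_fields)
```

Static checkers such as mypy or pyright can additionally flag misspelled attribute names on `JobRecord`, which is not possible with dict keys.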

Query job model (`models.job`)¶

BigQuery query job data model.

class dwh_auditor.models.job.QueryJob(*, job_id, user_email, query, creation_time, total_bytes_billed=0, cache_hit=False, referenced_tables=<factory>, statement_type='SELECT')[source]¶

Bases: BaseModel

A model that represents query job history in BigQuery.

It is the data contract for rows that the Extractor layer retrieves from BigQuery’s INFORMATION_SCHEMA.JOBS and passes to the Analyzer layer.

Parameters:

job_id (str)
user_email (str)
query (str)
creation_time (datetime)
total_bytes_billed (int)
cache_hit (bool)
referenced_tables (list[str])
statement_type (str)

job_id: str¶

user_email: str¶

query: str¶

creation_time: datetime¶

total_bytes_billed: int¶

cache_hit: bool¶

referenced_tables: list[str]¶

statement_type: str¶

model_config = {}¶: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
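The signature above can be illustrated with a stand-in class that mirrors the documented fields and defaults. `QueryJobSketch` and the sample row are hypothetical and assume Pydantic v2; the real class lives in dwh_auditor.models.job and may carry additional validators or field metadata not shown here.

```python
from datetime import datetime, timezone

from pydantic import BaseModel, Field


class QueryJobSketch(BaseModel):
    """Stand-in that mirrors the documented QueryJob signature."""

    job_id: str
    user_email: str
    query: str
    creation_time: datetime
    total_bytes_billed: int = 0
    cache_hit: bool = False
    # default_factory gives every instance its own empty list.
    referenced_tables: list[str] = Field(default_factory=list)
    statement_type: str = "SELECT"


# Hypothetical row, shaped like something an Extractor might build
# from a query result; only the required fields are supplied.
row = {
    "job_id": "bquxjob_123",
    "user_email": "analyst@example.com",
    "query": "SELECT 1",
    "creation_time": datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc),
}

job = QueryJobSketch.model_validate(row)
other = QueryJobSketch.model_validate(row)
job.referenced_tables.append("my-project.sales.orders")

print(job.total_bytes_billed, job.cache_hit, job.statement_type)
print(other.referenced_tables)
```

Because `referenced_tables` uses a factory, appending to one instance's list leaves the other instance untouched.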

Table storage model (`models.table`)¶

BigQuery table storage data model.

class dwh_auditor.models.table.TableStorage(*, project_id, dataset_id, table_id, total_logical_bytes=0, total_physical_bytes=0, active_logical_bytes=0)[source]¶

Bases: BaseModel

A model that represents storage information for each BigQuery table.

It is the data contract for rows that the Extractor layer retrieves from BigQuery’s INFORMATION_SCHEMA.TABLE_STORAGE and passes to the Analyzer layer (zombie table detection).

Parameters:

project_id (str)
dataset_id (str)
table_id (str)
total_logical_bytes (int)
total_physical_bytes (int)
active_logical_bytes (int)

project_id: str¶

dataset_id: str¶

table_id: str¶

total_logical_bytes: int¶

total_physical_bytes: int¶

active_logical_bytes: int¶

property full_table_id: str¶: Fully qualified table name, built by concatenating the project, dataset, and table names.

model_config = {}¶: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
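A rough sketch of how a derived property such as full_table_id can sit on a Pydantic model follows. `TableStorageSketch` is a hypothetical stand-in, and the dot separator is an assumption based on BigQuery's usual `project.dataset.table` notation; the documentation above does not state which separator the real property uses.

```python
from pydantic import BaseModel


class TableStorageSketch(BaseModel):
    """Stand-in that mirrors the documented TableStorage signature."""

    project_id: str
    dataset_id: str
    table_id: str
    total_logical_bytes: int = 0
    total_physical_bytes: int = 0
    active_logical_bytes: int = 0

    @property
    def full_table_id(self) -> str:
        # Assumption: parts are joined with dots, BigQuery-style.
        return f"{self.project_id}.{self.dataset_id}.{self.table_id}"


table = TableStorageSketch(
    project_id="my-project",
    dataset_id="sales",
    table_id="orders",
)
print(table.full_table_id)
```

A plain `@property` is computed on access and is not a model field, so it does not appear in the constructor signature.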

Analysis result model (`models.result`)¶

Data model of analysis results.

Type definition for diagnostic results that the Analyzer layer passes to the Reporter layer.

class dwh_auditor.models.result.CostInsight(*, job, estimated_cost_usd, scanned_tb)[source]¶

Bases: BaseModel

Analysis results of high-cost queries.

Parameters:

job (QueryJob)
estimated_cost_usd (float)
scanned_tb (float)

job: QueryJob¶

estimated_cost_usd: float¶

scanned_tb: float¶

model_config = {}¶: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
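The two numeric fields of CostInsight can plausibly be derived from a job's total_bytes_billed. The helper below is a hypothetical sketch, not the Analyzer's actual code: it assumes a binary terabyte (1024^4 bytes) and takes the price per terabyte as a parameter, since the documentation above does not say which unit or rate the package uses.

```python
# Assumption: "TB" means a binary terabyte (TiB).
BYTES_PER_TB = 1024**4


def estimate_scan_cost(total_bytes_billed: int, usd_per_tb: float) -> tuple[float, float]:
    """Return (scanned_tb, estimated_cost_usd) for one query job."""
    scanned_tb = total_bytes_billed / BYTES_PER_TB
    return scanned_tb, scanned_tb * usd_per_tb


# Example: a job billed for exactly 2 TB at a hypothetical rate of 5 USD per TB.
scanned_tb, estimated_cost_usd = estimate_scan_cost(2 * BYTES_PER_TB, usd_per_tb=5.0)
print(scanned_tb, estimated_cost_usd)
```

Keeping the rate as a parameter avoids hard-coding a price that differs by region and pricing model.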

class dwh_auditor.models.result.FullScanInsight(*, job, scanned_gb)[source]¶

Bases: BaseModel

Analysis results of queries determined to be full scans.

Parameters:

job (QueryJob)
scanned_gb (float)

job: QueryJob¶

scanned_gb: float¶

model_config = {}¶: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class dwh_auditor.models.result.TableUsageProfile(*, table, is_zombie, last_accessed_at=None, top_users=<factory>, access_count_30d=0, size_gb)[source]¶

Bases: BaseModel

Table usage profile and zombie determination result.

Parameters:

table (TableStorage)
is_zombie (bool)
last_accessed_at (datetime | None)
top_users (list[str])
access_count_30d (int)
size_gb (float)

table: TableStorage¶

is_zombie: bool¶

last_accessed_at: datetime | None¶

top_users: list[str]¶

access_count_30d: int¶

size_gb: float¶

model_config = {}¶: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
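One way the usage fields of TableUsageProfile could be computed from per-table access records is sketched below. The function name, the input shape, and the zombie rule (no access inside the window) are all assumptions made for illustration; the actual detection criteria are not described on this page.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def summarize_access(accesses, now, window_days=30, top_n=3):
    """Summarize (user_email, accessed_at) tuples for a single table."""
    cutoff = now - timedelta(days=window_days)
    recent = [(user, ts) for user, ts in accesses if ts >= cutoff]
    user_counts = Counter(user for user, _ in recent)
    return {
        # Assumption: a table with no access in the window is a zombie.
        "is_zombie": not recent,
        "last_accessed_at": max((ts for _, ts in accesses), default=None),
        "top_users": [user for user, _ in user_counts.most_common(top_n)],
        "access_count_30d": len(recent),
    }


now = datetime(2024, 3, 1, tzinfo=timezone.utc)
accesses = [
    ("a@example.com", datetime(2024, 2, 20, tzinfo=timezone.utc)),
    ("a@example.com", datetime(2024, 2, 25, tzinfo=timezone.utc)),
    ("b@example.com", datetime(2024, 2, 10, tzinfo=timezone.utc)),
    ("c@example.com", datetime(2023, 12, 1, tzinfo=timezone.utc)),  # outside window
]

active = summarize_access(accesses, now)
unused = summarize_access([], now)
print(active["top_users"], active["access_count_30d"], unused["is_zombie"])
```

The returned keys match the field names of TableUsageProfile, so the dict could be unpacked into the model together with `table` and `size_gb`.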

class dwh_auditor.models.result.RecurringCostInsight(*, query_hash, query_sample, execution_count, total_estimated_usd, total_scanned_tb, last_executed_at)[source]¶

Bases: BaseModel

Analysis results of high-cost queries that are executed on a recurring basis, for example by batch jobs or BI tools.

Parameters:

query_hash (str)
query_sample (str)
execution_count (int)
total_estimated_usd (float)
total_scanned_tb (float)
last_executed_at (datetime)

query_hash: str¶

query_sample: str¶

execution_count: int¶

total_estimated_usd: float¶

total_scanned_tb: float¶

last_executed_at: datetime¶

model_config = {}¶: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
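The fields above suggest that jobs are grouped by a hash of their query text and then aggregated. The sketch below shows one possible approach using only the standard library. The normalization rule, the SHA-256 choice, the `min_executions` threshold, and all function names are assumptions, not the package's documented behavior.

```python
import hashlib
import re
from collections import defaultdict
from datetime import datetime, timezone

BYTES_PER_TB = 1024**4  # assumption: binary terabyte


def query_hash(sql: str) -> str:
    # Simplification: collapse whitespace and lowercase the whole text,
    # which also lowercases string literals inside the query.
    normalized = re.sub(r"\s+", " ", sql).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_recurring(jobs, usd_per_tb, min_executions=2):
    """Aggregate (query, total_bytes_billed, creation_time) tuples by hash."""
    groups = defaultdict(list)
    for job in jobs:
        groups[query_hash(job[0])].append(job)

    insights = []
    for digest, members in groups.items():
        if len(members) < min_executions:
            continue  # executed too rarely to count as recurring
        scanned_tb = sum(m[1] for m in members) / BYTES_PER_TB
        insights.append({
            "query_hash": digest,
            "query_sample": members[0][0],
            "execution_count": len(members),
            "total_estimated_usd": scanned_tb * usd_per_tb,
            "total_scanned_tb": scanned_tb,
            "last_executed_at": max(m[2] for m in members),
        })
    return insights


jobs = [
    ("SELECT *  FROM sales.orders", BYTES_PER_TB, datetime(2024, 1, 1, tzinfo=timezone.utc)),
    ("select * from sales.orders", BYTES_PER_TB, datetime(2024, 1, 2, tzinfo=timezone.utc)),
    ("SELECT 1", 0, datetime(2024, 1, 3, tzinfo=timezone.utc)),
]
insights = group_recurring(jobs, usd_per_tb=5.0)
print(len(insights), insights[0]["execution_count"], insights[0]["total_scanned_tb"])
```

Each resulting dict uses the same keys as RecurringCostInsight, so it could be passed to the model constructor directly.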

class dwh_auditor.models.result.AuditResult(*, analyzed_days, project_id, total_jobs_analyzed, total_tables_analyzed, top_expensive_queries=<factory>, recurring_expensive_queries=<factory>, full_scans=<factory>, table_profiles=<factory>)[source]¶

Bases: BaseModel

Comprehensive audit report that the Analyzer layer produces as its final output.

Parameters:

analyzed_days (int)
project_id (str)
total_jobs_analyzed (int)
total_tables_analyzed (int)
top_expensive_queries (list[CostInsight])
recurring_expensive_queries (list[RecurringCostInsight])
full_scans (list[FullScanInsight])
table_profiles (list[TableUsageProfile])

analyzed_days: int¶

project_id: str¶

total_jobs_analyzed: int¶

total_tables_analyzed: int¶

top_expensive_queries: list[CostInsight]¶

recurring_expensive_queries: list[RecurringCostInsight]¶

full_scans: list[FullScanInsight]¶

table_profiles: list[TableUsageProfile]¶

model_config = {}¶: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
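Because AuditResult nests other models, it can be serialized and restored as a single unit. The sketch below uses simplified, hypothetical stand-ins (`FullScanSketch`, `AuditResultSketch`) and assumes Pydantic v2; whether the Reporter layer actually consumes JSON is not stated on this page.

```python
from pydantic import BaseModel, Field


class FullScanSketch(BaseModel):
    # Simplification: the real FullScanInsight nests a full QueryJob.
    job_id: str
    scanned_gb: float


class AuditResultSketch(BaseModel):
    """Stand-in with a subset of AuditResult's documented fields."""

    analyzed_days: int
    project_id: str
    total_jobs_analyzed: int
    total_tables_analyzed: int
    full_scans: list[FullScanSketch] = Field(default_factory=list)


result = AuditResultSketch(
    analyzed_days=30,
    project_id="my-project",
    total_jobs_analyzed=2,
    total_tables_analyzed=1,
    full_scans=[FullScanSketch(job_id="job_1", scanned_gb=512.0)],
)

# Nested models are serialized recursively and validated again on load.
payload = result.model_dump_json()
restored = AuditResultSketch.model_validate_json(payload)
print(restored.full_scans[0].scanned_gb)
```

The round trip re-validates every nested insight, so a malformed payload fails at the boundary instead of inside the report rendering code.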
