dwh_auditor.extractor — BigQuery metadata extraction layer

The dwh_auditor.extractor package is the only layer that reads metadata from BigQuery’s INFORMATION_SCHEMA and transforms it into Pydantic models.
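To illustrate the row-to-model transformation only: the real package defines its own Pydantic models, whose field names are not shown here. The sketch below uses hypothetical fields and a plain dataclass standing in for Pydantic so it stays dependency-free.

```python
from dataclasses import dataclass

# Hypothetical record type; the real package uses Pydantic models with
# their own field set. Field names here are illustrative only.
@dataclass(frozen=True)
class JobRecord:
    job_id: str
    user_email: str
    total_bytes_billed: int

def to_record(row: dict) -> JobRecord:
    # INFORMATION_SCHEMA rows arrive as mappings; convert/validate here.
    return JobRecord(
        job_id=str(row["job_id"]),
        user_email=str(row["user_email"]),
        total_bytes_billed=int(row["total_bytes_billed"]),
    )
```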

Warning

Direct imports of the google.cloud.bigquery library are limited to bigquery.py in this package; never import it from the Analyzer, Reporter, or CLI layers. Thanks to this restriction, tests only need to mock BigQueryExtractor, and every other layer can be tested without mocking.

Example test:

def test_get_job_history(mocker):
    # Mock only google.cloud.bigquery.Client
    mock_client = mocker.patch("dwh_auditor.extractor.bigquery.bq.Client")
    mock_client.return_value.query.return_value.result.return_value = [
        {"job_id": "j1", "user_email": "u@e.com"}  # remaining row fields elided
    ]
    extractor = BigQueryExtractor(
        target_project_id="my-project",
        job_project_ids=["my-project"],
        region="region-us",
    )
    jobs = extractor.get_job_history(days=30)
    assert len(jobs) == 1
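The import restriction above can also be checked mechanically. A minimal sketch using the standard library's ast module; the helper names and the prefix-match heuristic are assumptions, not part of the package:

```python
import ast
from pathlib import Path

FORBIDDEN = "google.cloud.bigquery"

def _imported_names(tree: ast.AST):
    """Yield fully qualified names imported anywhere in a parsed module."""
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                yield alias.name
        elif isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                yield f"{node.module}.{alias.name}"

def bigquery_import_offenders(package_root: Path) -> list[str]:
    """Return .py files (other than bigquery.py) that import google.cloud.bigquery."""
    offenders = []
    for path in sorted(package_root.rglob("*.py")):
        if path.name == "bigquery.py":
            continue  # the one module allowed to import the client library
        tree = ast.parse(path.read_text())
        if any(name.startswith(FORBIDDEN) for name in _imported_names(tree)):
            offenders.append(path.name)
    return offenders
```

Running this over the project root in a test keeps the layering rule from regressing silently.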

BigQuery metadata extraction layer.

Warning: Only this module may import google.cloud.bigquery; that library must not be imported from other modules (analyzer/, reporter/, main.py).

class dwh_auditor.extractor.bigquery.BigQueryExtractor(target_project_id, job_project_ids, region)[source]

Bases: object

BigQuery metadata extraction class (conforms to the v0.2.6 API).

Parameters:
  • target_project_id (str)

  • job_project_ids (list[str])

  • region (str)

__init__(target_project_id, job_project_ids, region)[source]
Parameters:
  • target_project_id (str)

  • job_project_ids (list[str])

  • region (str)

Return type:

None

get_top_cost_jobs(**kwargs)[source]
Return type:

Any

get_heavy_scan_jobs(**kwargs)[source]
Return type:

Any

get_recurring_cost_jobs(**kwargs)[source]
Return type:

Any

get_table_usage_stats(**kwargs)[source]
Return type:

Any

get_table_storage(**kwargs)[source]
Return type:

Any
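The methods above share one pattern: build an INFORMATION_SCHEMA query for the configured region and projects, run it through the client, and map rows to models. A hedged sketch of the kind of query such a method might issue; the helper name, columns, and table choice are assumptions, not taken from the package:

```python
def build_job_history_query(job_project_id: str, region: str, days: int) -> str:
    """Assumed example: recent job history from INFORMATION_SCHEMA.JOBS_BY_PROJECT."""
    # BigQuery region-qualified INFORMATION_SCHEMA view, e.g. `region-us`.
    return f"""
    SELECT job_id, user_email, total_bytes_billed
    FROM `{job_project_id}`.`{region}`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
    WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {days} DAY)
    ORDER BY total_bytes_billed DESC
    """
```

The query string would then be passed to google.cloud.bigquery's Client.query(), and the resulting rows converted into the package's Pydantic models.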