BigQueryDataset
A dataset in Google BigQuery.
Like all Resources, this class configures itself across multiple Environments.
For more information see the official documentation.
Example Usage
from google.cloud import bigquery
import launchflow as lf

# Automatically creates / connects to a BigQuery Dataset in your GCP project
dataset = lf.gcp.BigQueryDataset("my_dataset")

schema = [
    bigquery.SchemaField("name", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("age", "INTEGER", mode="REQUIRED"),
]
table = dataset.create_table("table_name", schema=schema)

dataset.insert_table_data("table_name", [{"name": "Alice", "age": 30}])

# You can also use the underlying resource directly.
# For example, for a table with columns name, age:
query = f"""
SELECT name, age
FROM `{dataset.dataset_id}.table_name`
WHERE age > 10
ORDER BY age DESC
"""

for row in dataset.client().query(query):
    print(row)
initialization
Create a new BigQuery Dataset resource.
Args:
name (str)
: The name of the dataset. This must be globally unique.
location (str)
: The location of the dataset. Defaults to "US".
allow_nonempty_delete (bool)
: If True, the dataset can be deleted even if it is not empty. Defaults to False.
dataset_id
@property
BigQueryDataset.dataset_id() -> str
Get the dataset id.
Returns:
- The dataset id.
get_table_uuid
BigQueryDataset.get_table_uuid(table_name: str) -> str
Get the table UUID in the form {project_id}.{dataset_id}.{table_id}.
Args:
table_name (str)
: The name of the table.
Returns:
- The table UUID.
client
BigQueryDataset.client() -> "bigquery.Client"
Get the BigQuery Client object.
Returns:
- The BigQuery Client object.
dataset
BigQueryDataset.dataset() -> "bigquery.Dataset"
Get the BigQuery Dataset object.
Returns:
- The BigQuery Dataset object.
create_table
BigQueryDataset.create_table(table_name: str, *, schema: "Optional[List[bigquery.SchemaField]]" = None) -> "bigquery.Table"
Create a table in the dataset.
Args:
table_name (str)
: The name of the table.
schema (Optional[List[bigquery.SchemaField]])
: The schema of the table. Optional; defaults to None.
Returns:
- The BigQuery Table object.
Example usage:
from google.cloud import bigquery
import launchflow as lf

dataset = lf.gcp.BigQueryDataset("my_dataset")

schema = [
    bigquery.SchemaField("name", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("age", "INTEGER", mode="REQUIRED"),
]
table = dataset.create_table("table_name", schema=schema)
delete_table
BigQueryDataset.delete_table(table_name: str) -> None
Delete a table from the dataset.
Args:
table_name (str)
: The name of the table to delete.
load_table_data_from_csv
BigQueryDataset.load_table_data_from_csv(table_name: str, file_path: Path) -> None
Load data from a CSV file into a table.
Args:
table_name (str)
: The name of the table to load the data into.
file_path (Path)
: The path to the CSV file to load.
insert_table_data
BigQueryDataset.insert_table_data(table_name: str, rows_to_insert: List[Dict[Any, Any]]) -> None
Insert in-memory data into a table. Note: there seems to be a bug in BigQuery where, if a table name is reused (created and then deleted recently), streaming inserts to it will not work. If you encounter an unexpected 404 error, try changing the table name.
Args:
table_name (str)
: The name of the table to insert the data into.
rows_to_insert (List[Dict[Any, Any]])
: The data to insert into the table.
Raises: ValueError if any errors occurred while inserting the data.