BigQueryDataset
A dataset in Google BigQuery.
Like all Resources, this class configures itself across multiple Environments.
For more information see the official documentation.
Example Usage
from google.cloud import bigquery
import launchflow as lf

# Automatically creates / connects to a BigQuery Dataset in your GCP project
dataset = lf.gcp.BigQueryDataset("my_dataset")

schema = [
    bigquery.SchemaField("name", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("age", "INTEGER", mode="REQUIRED"),
]
table = dataset.create_table("table_name", schema=schema)

dataset.insert_table_data("table_name", [{"name": "Alice", "age": 30}])

# You can also use the underlying resource directly.
# For example, for a table with columns name, age:
query = f"""
SELECT name, age
FROM `{dataset.dataset_id}.table_name`
WHERE age > 10
ORDER BY age DESC
"""

for row in dataset.client().query(query):
    print(row)
initialization
Create a new BigQuery Dataset resource.
Args:
name (str)
: The name of the dataset. This must be globally unique.
location (str)
: The location of the dataset. Defaults to "US".
allow_nonempty_delete (bool)
: If True, the dataset can be deleted even if it is not empty. Defaults to False.
dataset_id
@property
BigQueryDataset.dataset_id() -> str
Get the dataset id.
Returns:
- The dataset id.
get_table_uuid
BigQueryDataset.get_table_uuid(table_name: str) -> str
Get the table UUID in the form {project_id}.{dataset_id}.{table_id}.
Args:
table_name (str)
: The name of the table.
Returns:
- The table UUID.
client
BigQueryDataset.client() -> "bigquery.Client"
Get the BigQuery Client object.
Returns:
- The BigQuery Client object.
dataset
BigQueryDataset.dataset() -> "bigquery.Dataset"
Get the BigQuery Dataset object.
Returns:
- The BigQuery Dataset object.
create_table
BigQueryDataset.create_table(table_name: str, *, schema: "Optional[List[bigquery.SchemaField]]" = None) -> "bigquery.Table"
Create a table in the dataset.
Args:
table_name (str)
: The name of the table.
schema (Optional[List[bigquery.SchemaField]])
: The schema of the table. Optional; defaults to None.
Returns:
- The BigQuery Table object.
Example usage:
from google.cloud import bigquery
import launchflow as lf

dataset = lf.gcp.BigQueryDataset("my_dataset")

schema = [
    bigquery.SchemaField("name", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("age", "INTEGER", mode="REQUIRED"),
]
table = dataset.create_table("table_name", schema=schema)
delete_table
BigQueryDataset.delete_table(table_name: str) -> None
Delete a table from the dataset.
Args:
table_name (str)
: The name of the table to delete.
load_table_data_from_csv
BigQueryDataset.load_table_data_from_csv(table_name: str, file_path: Path) -> None
Load data from a CSV file into a table.
Args:
table_name (str)
: The name of the table to load the data into.
file_path (Path)
: The path to the CSV file to load.
insert_table_data
BigQueryDataset.insert_table_data(table_name: str, rows_to_insert: List[Dict[Any, Any]]) -> None
Insert in-memory data into a table. Note: there seems to be a bug in BigQuery where, if a table name is reused (created and then deleted recently), streaming inserts to it will not work. If you encounter an unexpected 404 error, try changing the table name.
Args:
table_name (str)
: The name of the table to insert the data into.
rows_to_insert (List[Dict[Any, Any]])
: The data to insert into the table.
Raises: ValueError if any errors occurred while inserting the data.