BigQuery – Ingestion Time Partitioning
Also read Partitioning Overview in BigQuery , Bigquery Partitioning Overview - Clustering versus Partitioning and File Formats for Loading Data into BigQuery
Ingestion time partitioning
Step 1 - Define the partition boundary - hourly, daily, weekly, monthly, yearly. These are all in UTC time.
Step 2 - When you create a table partitioned by ingestion time, BigQuery automatically assigns rows to partitions based on the time when BigQuery ingests the data.
- An ingestion-time partitioned table has a pseudocolumn named
_PARTITIONTIME
. - The value of this column is the ingestion time for each row, truncated to the partition boundary (such as hourly or daily).
For example, suppose that you create an ingestion-time partitioned table with hourly partitioning and send data at the following times:
Ingestion time | _ |
Partition (hourly) |
---|---|---|
2021-05-07 17:22:00 | 2021-05-07 17:00:00 | 2021050717 |
2021-05-07 17:40:00 | 2021-05-07 17:00:00 | 2021050717 |
2021-05-07 18:31:00 | 2021-05-07 18:00:00 | 2021050718 |
Because the table in this example uses hourly partitioning, the value of _PARTITIONTIME
is truncated to an hour boundary. BigQuery uses this value to determine the correct partition for the data.
You can also write data to a specific partition. For example, you might want to load historical data or adjust for time zones. You can use any valid date between 0001-01-01 and 9999-12-31.
Instead of using _PARTITIONTIME
, you can also use _PARTITIONDATE
. The _PARTITIONDATE
pseudocolumn contains the UTC date corresponding to the value in the _PARTITIONTIME
pseudocolumn.
Summary
Partitioning bigquery tables can improve query performance.
Leave a Reply