Both Clustering and Partitioning are used to improve storage layouts and hence, overall query performance.

BigQuery - Partitioning versus Clustering

High Cardinality (> 1000) --> Use Clustering

Low Cardinality (<1000) --> Use Partitioning

Cardinality refers to the number of unique values for a column. (e.g. Months of the year - cardinality 12)

Prefer clustering over partitioning under the following circumstances:

  • Partitioning results in a small amount of data per partition (approximately less than 1 GB).
  • Partitioning results in a large number of partitions beyond the limits on partitioned tables.
  • Partitioning results in your mutation operations modifying most partitions in the table frequently (for example, every few minutes).

Combine the two

You can also combine partitioning with clustering. Data is first partitioned and then data in each partition is clustered by the clustering columns.

When the table is queried, partitioning sets an upper bound of the query cost based on partition pruning.

There might be other query cost savings when the query actually runs, because of cluster pruning.