Overview - Why Link Google Analytics with BigQuery

  • Sampling and Reporting API limitations - if you use the Analytics Reporting API, there is a query quota (10,000 requests per view per day).
  • No more sampling - the BigQuery export contains raw, unsampled data
  • Unlimited dimensions - you are not restricted to the dimension limits of the reporting API
  • Create your own Channel Grouping (see the sketch just after this list)
  • Combine Google Analytics data with third-party data sources and visualization tools
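To illustrate the last two points, here is a minimal sketch of a custom channel grouping computed directly over the exported data with the bigQueryR package (the same package used in the sample later in this post). The project ID ("my-project"), dataset ID ("123456789") and table date are placeholders, and the snippet assumes the linking and authentication steps described below have already been completed.

library(bigQueryR)

## assumes bqr_auth() has already been run (see the service account section below)
## define a custom channel grouping over one day's ga_sessions_ export table;
## "my-project" and "123456789" are placeholders for your project and view dataset
sql <- "
SELECT
  CASE
    WHEN trafficSource.source = '(direct)' THEN 'Direct'
    WHEN trafficSource.medium = 'organic' THEN 'Organic Search'
    WHEN trafficSource.medium IN ('cpc', 'ppc') THEN 'Paid Search'
    WHEN trafficSource.medium = 'referral' THEN 'Referral'
    ELSE 'Other'
  END AS custom_channel,
  SUM(totals.visits) AS sessions
FROM `my-project.123456789.ga_sessions_20240101`
GROUP BY custom_channel
ORDER BY sessions DESC"

bqr_query(projectId = "my-project",
          datasetId = "123456789",
          query = sql,
          useLegacySql = FALSE)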

Linking Google Analytics to BigQuery and Exporting the Data

  1. Step 1 - You will need a project (preferably inside a GCP folder called Development, Analytics or similar). The project needs billing enabled as well as the BigQuery API enabled.
  2. Step 2 - With the above in place, go into Analytics Admin and link the property to the project containing BigQuery.
  3. Step 3 - Once linked, enable BigQuery Export from Analytics Admin.

Sampling and Reporting - aka Querying the Data via the API

If queries need to be run against the Reporting API (e.g. via PHP or Java code), quota limitations apply once the free sandbox limits are exceeded.

What is the cost associated with linking BigQuery to GA 360?

Free Tier - BigQuery sandbox provides you with up to 1 terabyte per month of query capacity and 10GB of free storage. All tables and partitions have a 60-day retention policy. Some features of BigQuery are not included in the sandbox (DML, Streaming, Data Transfer Service).

Paid Tier - Google Analytics 360 licensing starts at a whopping $150,000 per year (though I could not get Google to confirm the exact figure).

Step 1 - Using a Service Account to authenticate to Google APIs

Service accounts are special Google accounts that can be used by applications to access Google APIs programmatically via OAuth 2.0. A service account uses an OAuth 2.0 flow that does not require human authorization. Instead, it uses a key file that only your application can access.

Note: Default application credentials are obtained by installing the gcloud SDK on your workstation and running:

gcloud auth application-default login
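For unattended, server-side use, the R equivalent is to point bigQueryR at a service account key file instead of going through the interactive OAuth flow. This is a small sketch: the key file name is a placeholder, and the json_file argument assumes a reasonably recent version of bigQueryR.

library(bigQueryR)

## authenticate with a service account key file instead of the interactive flow;
## "my-service-key.json" is a placeholder for the key downloaded for a service
## account that has been granted access to the BigQuery dataset
bqr_auth(json_file = "my-service-key.json")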

Step 3 - Linking Analytics to BigQuery and Exporting the data

Follow this guide

Once the linkage is complete, data should start flowing into your BigQuery project within 24 hours. One file is exported each day containing the previous day's data (generally during the morning of the time zone you set for reporting), and three files are exported each day containing the current day's data. Google will also provide a historical export of the smaller of 10 billion hits or 13 months of data within 4 weeks of the integration being completed.

Each 'View' (which can be the analytics for a single website) maps to a single dataset in BigQuery.

Here is a sample using R (Python would be similar):

library(bigQueryR)

## go through Google oAuth2 flow
## needs email that has access to the BigQuery dataset
bqr_auth()

## list your projects and the datasets within a project
bqr_list_projects()
bqr_list_datasets("project-id")
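Building on that, each view's dataset (named after the numeric view ID) holds one ga_sessions_YYYYMMDD table per exported day, plus intraday tables for the current day. The sketch below lists those tables and runs a simple standard SQL query against one of them; the project ID, dataset ID and date are placeholders.

## list the exported tables - expect ga_sessions_YYYYMMDD and ga_sessions_intraday_*
bqr_list_tables("my-project", "123456789")

## query one day's export table with standard SQL
sql <- "SELECT date, SUM(totals.visits) AS sessions, SUM(totals.pageviews) AS pageviews
        FROM `my-project.123456789.ga_sessions_20240101`
        GROUP BY date"

bqr_query(projectId = "my-project",
          datasetId = "123456789",
          query = sql,
          useLegacySql = FALSE)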

Appendix - BigQuery IAM Roles 

Basic roles for projects are granted or revoked through the Google Cloud Console. When a project is created, the Owner role is granted to the user who created the project.

When BigQuery receives a call from an identity (either a user, a group, or a service account) that is assigned a basic role, BigQuery interprets that basic role as a member of a special group. Special group membership grants the identity permissions that are assigned to a BigQuery IAM role. The following chart identifies the special group and role for each basic role:

Basic role      Assigned group                         BigQuery IAM role
roles/viewer    access.specialGroup: projectReaders    roles/bigquery.dataViewer
roles/editor    access.specialGroup: projectWriters    roles/bigquery.dataEditor
roles/owner     access.specialGroup: projectOwners     roles/bigquery.dataOwner

The following basic roles apply at the dataset level, with these capabilities:

READER
  • Can read, query, copy or export tables in the dataset, and read routines in the dataset
    • Can call get on the dataset
    • Can call get and list on tables in the dataset
    • Can call get and list on routines in the dataset
    • Can call list on table data for tables in the dataset
  • Maps to the bigquery.dataViewer predefined role
WRITER
  • Everything READER can do, plus can create new tables and edit or append data in the dataset's tables
  • Maps to the bigquery.dataEditor predefined role
OWNER
  • Everything WRITER can do, plus can update and delete the dataset itself
  • Maps to the bigquery.dataOwner predefined role

Note: A dataset must have at least one entity with the OWNER role. A user with the OWNER role can't remove their own OWNER role.