Google BigQuery

Introduction

Google BigQuery is one of the most popular, scalable, easy to manage data warehouse options for those wanting to setup a Data Warehouse on the cloud. Read more about the features and how to get started with the popular platform here.

In case you wish to use this platform to host your data that you are aggregating using DataChannel, you can either setup a cluster of your own or use a DataChannel Managed BigQuery Warehouse. This document will show you how you can connect / provision a BigQuery based warehouse in minutes using the DataChannel Platform.

Self Managed Google BigQuery Warehouse

Prerequisites for connecting your BigQuery Cluster
  • Ensure that you’ve created a Google Cloud project in your BigQuery Warehouse. The project name, project ID and project number will be visible to you on the Dashboard page of your Google Cloud console. Keep this project information handy when configuring your Warehouse to allow DataChannel to interact with your Google Cloud resources. For more information on creating and managing projects click here.

  • Create dataset(s) in BigQuery using the Google Cloud console. The Dataset ID, Dataset name and location must be known to you when configuring DataChannel. You may use the Google Cloud Console Explorer panel to expand a project name to see the datasets in that project, or use the search box to search by dataset name. For more information on creating datasets click here.

  • Create a Google cloud storage bucket in the same region as your cluster. DataChannel will need you to share information about the Bucket name where you hold your data in Google Cloud Storage. You can create a bucket using the Google Cloud console from the Cloud Storage Buckets page. For more information on creating buckets click here.

  • Ensure the required permissions as discussed in the next section have been granted.

Permissions

DataChannel will need certain permissions to access your projects, folders and objects in order to write to/ edit them when connecting using pipelines / syncs. Since you are the owner/administrator of your BigQuery Warehouse, you will need to manage access to your resources. You can do this by either of the two ways:

big query permissions 6
  • Option 1: Grant the following roles to the DataChannel Principal:

    • BigQuery Data Editor

    • Storage Admin

    • BigQuery Job User

  • Option 2: You can create a custom role granting the required privileges/ permissions and associate it to the DataChannel Principal. Creating a custom role will help you to ensure that we have only the permissions we need, thus enabling you to control access at a more granular level (rather than the much broader predefined roles).

You can read and understand more about creating custom roles by clicking here. This Video Tutorial by Google Cloud Tech will also prove useful to educate you about Managing access with cloud IAM.

The following permissions are needed by DataChannel. What each permission implies is also given alongside:

  1. bigquery.config.get - Get details about a configuration.

  2. bigquery.datasets.get - Get metadata about a dataset.

  3. bigquery.jobs.create - Run jobs (including queries) within the project.

  4. bigquery.tables.create - Create new tables.

  5. bigquery.tables.delete - Delete tables.

  6. bigquery.tables.get - Get table metadata.

  7. bigquery.tables.getData - Get table data. This permission is required for querying table data.

  8. bigquery.tables.list - List tables and metadata on tables.

  9. bigquery.tables.updateData - Update table data.

  10. bigquery.tables.update - Update table metadata.

  11. resourcemanager.projects.get - View the name of a project’s Cloud Storage service agent

  12. storage.buckets.get - Returns metadata for the specified bucket.

  13. storage.multipartUploads.abort - Abort multipart upload sessions.

  14. storage.multipartUploads.create - Upload objects in multiple parts.

  15. storage.multipartUploads.list - List the multipart upload sessions in a bucket.

  16. storage.multipartUploads.listParts - List the uploaded object parts in a multipart upload session.

  17. storage.objects.create - Add new objects to a bucket.

  18. storage.objects.delete - Deletes an object and its metadata.

  19. storage.objects.get - Read object data and metadata, excluding ACLs.

  20. storage.objects.list - Retrieves a list of objects.

  21. storage.objects.update - Updates the complete metadata of an object.

  22. bigquery.routines.get - Get routine definitions and metadata.

  23. bigquery.routines.list - List routines and metadata on routines.

You can read more information about what each of these permissions imply by clicking BigQuery permissions and IAM permissions .

Steps to Create a Custom Role and Assign Permissions to it

  1. Go to Google Cloud Console and sign in using your work account credentials.

  2. From the list, select the name of the Project where your data resides . This is where you want to create a Custom role.

  3. If your warehouse contains data of more than one organization and the project is not visible in the list, you may be looking in the wrong organization. You can change the name of the organization from the dropdown.

    big query permissions 1
  4. In the Google Cloud console, navigate to IAM & Admin menu and go to the Roles page.

    big query permissions 2
  5. Click on +Create Role.

    big query permissions 3
  6. Enter a Name, Title, Description, and Role launch stage for the role. The role name cannot be changed after the role is created. It is advisable to choose a role title that you will recognise easily later For example: “Datachannel_custom_access”. Role Titles can contain uppercase and lowercase alphanumeric characters and symbols. Also, give a description of the role that helps you understand the privileges provided to the custom role. We recommend that you use the Default Role launch stage.

    big query permissions 4
  7. Click on +Add Permissions.

  8. Select the permissions you want to include in the role(as listed above) and click Add. To search a particular permission, use the ‘Filter by property name or value’ box as shown. Select using the checkbox on the left.

    big query permissions 5
  9. Once completed, click on Add.

  10. Make sure that you have added all 23 permissions required by DataChannel Finish creating the role, by clicking on Create.

Grant the Custom Role to the DataChannel Principal

You need to associate the newly created custom role (with the required privileges) with this email:

Use the following steps to do so:

  1. In the Google Cloud console, navigate to IAM & Admin menu and go to the IAM page.

    big query add principal 1
  2. Select the name of the Project where your data resides.

  3. In the ‘Permissions’ page, go to ‘View by Principals’ Page.

    big query add principal 2
  4. To grant a role to the DataChannel Principal, click person_add Grant Access.

  5. Enter the DataChannel principal’s email address:

big query add principal 3
  1. Assign the custom role (created in the previous section) to this Principal.

    big query add principal 4
  2. Thereafter, click on Save to finish. The principal is granted the selected role on the selected Project.

Step By Step Guide for adding a new BigQuery Data warehouse to your DataChannel Account

Step 1

Click on the Data Warehouses tab in the left side bar navigation to reach the Data Warehouses Module as shown below.

destinations 1
Step 2

Click on Add New to add a new Data Warehouse to your account.

Step 3

Select BigQuery from the listed Warehouse options.

destinations bq step3
Step 4

You may choose to opt for a DataChannel Managed warehouse.

dc managed bq 1
Step 5

Else, you may choose to connect your own warehouse.You would need to have credentials ready to connect your data warehouse.

destinations bq own

Option 1 : Adding your own BigQuery Data warehouse to your DataChannel Account

Step 6

Add the given email to your google cloud account and ensure that you have granted the permissions listed. Click on Next

destinations bq permissions
Step 7

Enter the details for your BigQuery Dataset in the form and click on Validate Configuration to add the warehouse. An explanation of each of the fields in the form is given in the table below. Refer to the presetup documentation above to know how to get this information for your project.

destinations bq step7
Field Description

Name

Required

Provide a name for your warehouse. It needs to be unique across your account.

Dataset Name

Required

Provide the name of your BigQuery Dataset (you can get this from the BigQuery console).

Project ID

Required

Provide the project ID for the cloud project which has the BigQuery Dataset.

BigQuery Region

Required

Provide the region / location where your Dataset is located.

BigQuery Authentication Code

Required

Click on the link Generate Authentication Code and follow the process given here to generate a code using OAuth2

Use DataChannel GCS

Required

Leave this toggle off so that you can specify your own GCS bucket.

GCS Project ID

Required

Provide the project ID where you have created the Google Cloud Storage Bucket to store the raw data files. Note that DataChannel does not remove the files after they have been copied into GCS so it is advisable to use life cycle properties to manage the removal / archival of the raw files to manage GCS costs.

GCS Bucket Name

Required

Provide the name of the cloud storage bucket you have created for DataChannel.

GCS Region Name

Required

Provide the name of the region where your GCS Bucket is stored. Note:- This should be same as the region for your BigQuery Dataset.

GCS Authentication Code

Required

Click on the link Generate Authentication Code and follow the process given here to generate a code using OAuth2

Option 2: Adding a DataChannel Managed BigQuery Data warehouse to your DataChannel Account

Step 6

In case you choose that DataChannel manage a BigQuery warehouse for you, the process to add a DataChannel managed warehouse is very simple. Just fill in your Google Account email and Region and click on save.

destinations bq dc

Still have Questions?

We’ll be happy to help you with any questions you might have! Send us an email at info@datachannel.co.

Subscribe to our Newsletter for latest updates at DataChannel.