AWS Redshift Data Warehouse

Introduction

Amazon Redshift is one of the most popular, scalable, easy to manage data warehouse options for those wanting to setup a Data Warehouse on the cloud. Read more about the features and how to get started with the popular platform from AWS here.

In case you wish to use this platform to host your data that you are aggregating using DataChannel, you can either setup a cluster of your own or use a DataChannel Managed Redshift Warehouse. This document will show you how you can connect / provision a Redshift based warehouse in minutes using the DataChannel Platform.

Self Managed Redshift Cluster

Prerequisites for connecting your Redshift Cluster

If you don’t already have an AWS account, create and activate an AWS account. Set up your data warehouse with AWS Redshift. The AWS Redshift console will allow you to create and manage your Redshift resources.
Connect to your AWS Redshift as an admin user ( Or any user which has permissions to create databases, users and schemas). Using the AWS Redshift console you can launch query editor v2 by clicking on the Query data. The query editor v2 connects from your client machine to the AWS Redshift environment. You need to create a database to be used for DataChannel by running this query:

---
create database <DATABASE_NAME>;
---

Substitute the placeholder in the above query with the desired Database name. You may choose any name for your Database but keep in mind that the Database name should be 1-64 characters. Valid characters are lowercase alphanumeric characters. This database is where DataChannel will actually create tables, load data, and run queries.

If you don’t have access to query editor v2, you can use any database client ( E.g Dbeaver) to connect to your redshift and run the same queries.

By default, only the admin user that you created when you created your AWS Account has access to the resources that you have created. Thus, to grant DataChannel access to your resources you will need to create a new user. At the time of creating a new DataChannel user, you will need to specify a name and a password for this new user. The password for the user must have 8–64 characters, and it must include at least one uppercase letter, one lowercase letter, and one numeral. A database comprises one or more schemas which contain tables and other database objects. Schemas help to organize database objects into logical groups so that their management becomes easier. So, on running this query using the Query Editor v2/ your Database Client, you can connect to this database and create a new user and schema.Substitute the placeholder in the above query with the desired Database name.

---
 create user <USERNAME> password '<PASSWORD>';
 create schema <SCHEMA_NAME> authorization <USERNAME>; # This will create and give ownership of the schema to this user
---

Substitute the placeholders in the above query with the desired Username, password and Schema name. Please make a note of these as you will require them when configuring your warehouse with DataChannel.

Ensure that you’ve created an S3 bucket in your AWS Redshift Warehouse. You can use the following steps to create an Amazon S3 Bucket.
1. Use your AWS account’s email address and password to Sign in to your AWS Management Console.
2. Open the Console Home page.
3. In the Services search Box, search for S3.
4. From the search results, select S3.
5. Choose Buckets from the Amazon S3 menu in the left navigation pane and then choose the Create bucket button.
6. Enter a name for your bucket.
7. Select the AWS Region where you would like your bucket to be created.
8. You may choose to enable or disable Bucket Versioning.
9. Navigate to the bottom of the page and choose Create bucket.

Please make a note of your Amazon S3 Bucket Name and AWS Region as these will be needed when configuring your warehouse with DataChannel.

Your AWS Management Console can be used to view your S3 Bucket Name, AWS Location and to create the Access Key and Secret Key. Navigate to the IAM console, and move to the Users page.Choose the username, and then click on Security credentials to view/ create access key and secret key for the profile name matching the user name.

Save this information and keep it handy when configuring your warehouse, to allow DataChannel to interact with your AWS Redshift resources. For more information on creating and managing credentials click here.

Tips to ensure smooth functioning

Create a separate schema for the data coming in from DataChannel.
Create two users, one for loading the data and another user for reading the data from BI tools etc. This will assist you in configuring Workload Management and prevent queuing up of your queries.

To find your AWS Redshift Details

You’ll be needing certain AWS redshift details when configuring your Data Warehouse at DataChannel:

Open your AWS admin console.
Check the Top Banner to make sure that you are in the correct region.
Search for Redshift in the search box and from the search results, click on "Amazon Redshift".
This will list all the clusters that you have in your current region. Click on the cluster that you want to use.
Copy the "Endpoint URL" from the General Information. This will have a format: {cluster_url}:{port}/dev. cluster_url is your Host Name. The “Endpoint URL” also gives you the details of Amazon Redshift Database Port Number (see format).
You now have the required AWS Redshift details to enter into the fields on the DataChannel Console namely, Username, Password, Host Name, Port number, Database Name and Schema Name.

To find your S3 Bucket Details

You’ll also be needing certain S3 Bucket details when configuring your Data Warehouse at DataChannel. You can get the bucket details by following these steps:

Open your AWS admin console.
Check the Top Banner to make sure that you are in the correct region.
Search for ‘Bucket’ in the search box on the Top Banner and click on ‘S3’.
This will list all the buckets you have in your account. Click on the bucket that you want to use.
Click on the properties tab. This will show the bucket details such as Bucket region. Copy the region code. This is your AWS S3 location.
For finding your AWS secret key and access key, refer AWS Documentation
You now have the required S3 details to enter into the fields on the DataChannel Console namely, Bucket name, AWS S3 Location, Access Key and Secret Key.

Additional Steps to be followed if you are using a Firewall

In case you are using a firewall, then you might need to edit inbound rules to allow Datachannel IP’s to connect to your AWS network. You can follow these steps:

Open your AWS admin console.
Check the Top Banner to make sure that you are in the correct region.
Search for Redshift in the search box and from the search results, click on "Amazon Redshift".
This will list all the clusters that you have in your current region. Click on the cluster that you want to use.
Open your cluster details and click on properties.
In the "Network and security settings" section, click on the VPC Security group. This will open the VPC security group details page.
Click on "Edit inbound rules".
Click on Add rule. Make sure the Type is "Custom TCP". Enter your port number in the "Port range" field.
Enter the DataChannel IP in Source.
Please note that you will have to do the above steps for all the datachannel IPs.
When finished, click on Save rules.

Step By Step Guide

Step 1: Click on Data Warehouses tab in the left side bar navigation to reach the Data Warehouses Module as shown below.
Step 2: Click on Add New to add a new Data Warehouse to your account.
Step 3: Select Redshift from the listed Warehouse options.
Step 4: You may choose to opt for a DataChannel Managed warehouse.
Step 5: Else, you may choose to connect your own warehouse.You would need to have credentials ready to connect your data warehouse.

Option 1 : Adding your own Redshift Data warehouse to your DataChannel Account

Step 6: Enter the details for your AWS Redshift and S3 Bucket in the form and click on Validate Configuration to add the warehouse. An explanation of each of the fields in the form is given in the table below.

Field Description

Field	Description
Name	*Required* Provide a name for your warehouse. It needs to be unique across your account.
Host	*Required* Provide the hostname or end-point for the cluster.
Username	*Required* Provide a username which will be used to create the tables and load data. This user needs to have all rights on the schema that you intend to use. In case you are creating a dedicated schema for the data from DataChannel (which is recommended), then this user can be the schema owner.
Password	*Required* Provide the password for the load user.
Select Users	*Optional* Comma separated list of users who should get select rights on tables created by DataChannel using the schema and username specified by you.
Port	*Required* Provide the port number for your cluster. The default value for this is 5439 unless you have changed it while creating your redshift cluster.
DB Name	*Required* Provide the name of the database you have created in your cluster.
Schema Name	*Required* Provide the database schema where DataChannel should push the data. As mentioned above, it is recommended to create a new schema for DataChannel in your database.
Use DataChannel S3	*Required* Leave this toggle off so that you can specify your own S3 bucket.
AWS Location	*Required* Provide the AWS region where your S3 bucket has been created. This should typically be same as the region in which your Redshift cluster is hosted. Example `us-east-1`
Bucket Name	*Required* Provide name of the S3 bucket where DataChannel should copy files before loading them into your Redshift instance. Note that DataChannel does not remove the files after they have been copied into Redshift so it is advisable to use life cycle properties to manage the removal / archival of the raw files to manage S3 costs.
Access Key	*Required* Provide the access key required to access the S3 bucket using the API. Refer AWS documentation here to learn how to manage your access keys.
Secret Key	*Required* Provide the secret key required to access the S3 bucket using the API. Refer AWS documentation here to learn how to manage your secret keys.
IAM Role	*Required* Provide the IAM role required to access the S3 bucket using the API. Refer AWS documentation here to learn how to create and manage your IAM Role.

Name

Required

Provide a name for your warehouse. It needs to be unique across your account.

Host

Required

Provide the hostname or end-point for the cluster.

Username

Required

Provide a username which will be used to create the tables and load data. This user needs to have all rights on the schema that you intend to use. In case you are creating a dedicated schema for the data from DataChannel (which is recommended), then this user can be the schema owner.

Password

Required

Provide the password for the load user.

Select Users

Optional

Comma separated list of users who should get select rights on tables created by DataChannel using the schema and username specified by you.

Port

Required

Provide the port number for your cluster. The default value for this is 5439 unless you have changed it while creating your redshift cluster.

DB Name

Required

Provide the name of the database you have created in your cluster.

Schema Name

Required

Provide the database schema where DataChannel should push the data. As mentioned above, it is recommended to create a new schema for DataChannel in your database.

Use DataChannel S3

Required

Leave this toggle off so that you can specify your own S3 bucket.

AWS Location

Required

Provide the AWS region where your S3 bucket has been created. This should typically be same as the region in which your Redshift cluster is hosted. Example us-east-1

Bucket Name

Required

Provide name of the S3 bucket where DataChannel should copy files before loading them into your Redshift instance. Note that DataChannel does not remove the files after they have been copied into Redshift so it is advisable to use life cycle properties to manage the removal / archival of the raw files to manage S3 costs.

Access Key

Required

Provide the access key required to access the S3 bucket using the API. Refer AWS documentation here to learn how to manage your access keys.

Secret Key

Required

Provide the secret key required to access the S3 bucket using the API. Refer AWS documentation here to learn how to manage your secret keys.

IAM Role

Required

Provide the IAM role required to access the S3 bucket using the API. Refer AWS documentation here to learn how to create and manage your IAM Role.

Option 2: Adding a DataChannel Managed Redshift Data warehouse to your DataChannel Account

Step 6: The process to add a DataChannel managed warehouse is very simple. Just fill in your Region and click on Save.

Still have Questions?

We’ll be happy to help you with any questions you might have! Send us an email at info@datachannel.co.

Subscribe to our Newsletter for latest updates at DataChannel.