JSON

This pipeline requests and retrieves JSON files, or folders of similarly structured JSON files, from Azure Blob Storage. Files can be transferred whether or not they are compressed.

Configuring the Credentials

Select the account credentials that have access to the relevant Azure Blob Storage data from the dropdown menu and click Next.

Credentials not listed in the dropdown? Click + Add New to add new credentials. Give your credentials a name, enter the Storage Account Name, Access Key, and Endpoint Suffix, and click Save.
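For orientation, the three values requested above correspond to fields of a standard Azure Storage connection string. The sketch below assembles one in Python. The helper name build_connection_string and the placeholder values are hypothetical, and nothing here is meant to describe how DataChannel stores or uses the credentials internally; the default suffix shown is the one used by the public Azure cloud.

```python
def build_connection_string(account_name, access_key, endpoint_suffix="core.windows.net"):
    """Assemble an Azure Storage connection string from the three credential fields.

    Hypothetical helper for illustration; "core.windows.net" is the endpoint
    suffix of the public Azure cloud and differs for sovereign clouds.
    """
    return (
        "DefaultEndpointsProtocol=https;"
        f"AccountName={account_name};"
        f"AccountKey={access_key};"
        f"EndpointSuffix={endpoint_suffix}"
    )


# Placeholder values only; substitute your own account name and access key.
print(build_connection_string("mystorageaccount", "<access-key>"))
```

If you are unsure of the Endpoint Suffix, the connection string shown for your storage account in the Azure portal contains it as the EndpointSuffix field.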

Data Pipeline Details

Data Pipeline: Select JSON from the dropdown

Setting Parameters

Set the parameters required for your file or folder.

Parameter	Description	Values
Folder Path	*Required* Path to the folder that contains the files.	String value (e.g. folder/subfolder)
File Name	*Required* Specify the file name. If you do not remember the complete file name, choose a match type with the operator, which accepts Exact, Startswith, Endswith, or Contains.	String value (e.g. abc.json) *Default Value:* Exact (for the operator)
Process All Matching Files	*Required* Select Yes to process every matching file, or No otherwise.	{Yes,No} *Default Value:* No
File Selection Criteria *Dependent*	*Required* *(If Process All Matching Files = No)* Choose whether the file's creation date or modification date is used.	{Date Created,Date Modified} *Default Value:* Date Created
Attempt Schema Inference	*Required* If Yes, values are fetched with their original types (e.g. a float is fetched as a float). If No, every value is fetched as a string, irrespective of its type.	{Yes,No} *Default Value:* No
Insert Mode	*Required* Specifies how data is updated in the data warehouse. Upsert inserts only new records or records with changes. Append inserts all fetched data at the end. Replace drops the existing table and recreates it on each run.	{Upsert,Append,Replace} *Default Value:* Replace
Key *Dependent*	*Required* *(If Upsert is chosen as the Insert Mode)* Enter the name of the column on which data is to be upserted.	String value
Container Name	*Required* Enter the container name in lowercase.	String value
Compressed	*Required* Choose Yes if the files are compressed, or No if they are not.	{Yes,No} *Default Value:* No
Compression Type *Dependent*	*Required (If Compressed = Yes)* Specify the file compression type.	{Zip,Gzip}
Post Processing Actions	*Required* Action to perform once file processing has completed.	{No Action,Move Files} *Default Value:* No Action
Move File Destination *Dependent*	*Required* *(If Post Processing Actions = Move Files)* Specify the folder to which the files are to be moved.	String value (e.g. test_folder/)
Include Source File Name	*Required* Set this parameter to Yes to include the source file name in the data warehouse.	{Yes,No} *Default Value:* No
Content Structure	*Required* Describes how the data in a file is structured. In Newline Delimited JSON, each line of the file is one JSON object. In Array of Objects, the objects are wrapped in a single JSON array.	{Newline Delimited JSON, Array of Objects} *Default Value:* Newline Delimited JSON
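To make the effect of Content Structure, Attempt Schema Inference, and Insert Mode more concrete, here is a minimal Python sketch that mimics them on in-memory data. It is an illustration under stated assumptions, not DataChannel's implementation: the function names are hypothetical, plain lists of dictionaries stand in for the warehouse table, and rendering non-string values as JSON text is only one plausible reading of "fetched as string".

```python
import json


def parse_records(text, content_structure="Newline Delimited JSON"):
    """Parse file contents according to the Content Structure parameter."""
    if content_structure == "Newline Delimited JSON":
        # One JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if content_structure == "Array of Objects":
        # The whole file is a single JSON array of objects.
        return json.loads(text)
    raise ValueError(f"Unknown content structure: {content_structure}")


def apply_schema_inference(records, infer):
    """Keep native types when infer is True, otherwise turn every value into a string.

    Assumption: non-string values are rendered as their JSON text.
    """
    if infer:
        return records
    return [
        {k: v if isinstance(v, str) else json.dumps(v) for k, v in rec.items()}
        for rec in records
    ]


def apply_insert_mode(existing, fetched, mode="Replace", key=None):
    """Combine existing rows with fetched rows according to Insert Mode."""
    if mode == "Replace":
        return list(fetched)                  # existing table is dropped
    if mode == "Append":
        return list(existing) + list(fetched) # everything is added at the end
    if mode == "Upsert":
        if key is None:
            raise ValueError("Upsert requires a key column")
        merged = {row[key]: row for row in existing}
        for row in fetched:
            merged[row[key]] = row            # update a match, else insert
        return list(merged.values())
    raise ValueError(f"Unknown insert mode: {mode}")


ndjson_text = '{"id": 1, "price": 9.5}\n{"id": 2, "price": 3.0}\n'
rows = parse_records(ndjson_text)
print(apply_schema_inference(rows, infer=False))
print(apply_insert_mode([{"id": 2, "price": 1.0}], rows, mode="Upsert", key="id"))
```

The same rows written as [{"id": 1, "price": 9.5}, {"id": 2, "price": 3.0}] would need Content Structure set to Array of Objects. In the Upsert call, id plays the role of the Key parameter.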

Data Pipeline Scheduling

Scheduling specifies how often data is updated in the data warehouse. You can choose between Manual Run, Normal Scheduling, or Advance Scheduling.

Manual Run: If scheduling is not required, you can use the toggle to run the pipeline manually.
Normal Scheduling: Use the dropdown to select an interval-based frequency: hourly, daily, weekly, or monthly.
Advance Scheduling: Set fine-grained schedules at the level of Months, Days, Hours, and Minutes.

A detailed explanation of pipeline scheduling can be found here.

Dataset & Name

Dataset Name: Key in the Dataset Name (this also serves as the table name in your data warehouse). Keep in mind that the name should be unique across the account and the data source. Special characters (except underscore _) and blank spaces are not allowed. It is best to follow a consistent naming scheme so that the tables are easy to locate later.
Dataset Description: Enter a short description (optional) describing the dataset being fetched by this particular pipeline.
Notifications: Choose the events for which you’d like to be notified: either "ERROR ONLY" or "ERROR AND SUCCESS".
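The character rule for Dataset Name can be checked before you type the name into the form. The following Python sketch is one way to do that. It assumes "special characters" means anything other than ASCII letters, digits, and the underscore; the function name is hypothetical, and the check does not cover the uniqueness requirement, which only the platform can verify.

```python
import re

# Assumption: only ASCII letters, digits, and underscores are accepted.
_VALID_DATASET_NAME = re.compile(r"[A-Za-z0-9_]+")


def is_valid_dataset_name(name):
    """Return True if the name has no spaces or special characters other than underscore."""
    return _VALID_DATASET_NAME.fullmatch(name) is not None


for candidate in ["azure_json_orders", "azure json orders", "orders-2024"]:
    print(candidate, is_valid_dataset_name(candidate))
```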

Once you have finished, click Finish to save the pipeline. Read more about naming and saving your pipelines, including the option to save them as templates, here.

Still have Questions?

We’ll be happy to help you with any questions you might have! Send us an email at info@datachannel.co.
