Load, Query And Process JSON Data In Snowflake

Load, Query And Process JSON Data In Snowflake

How to Load, Query And Proces JSON Data Into Snowflake

Introduction - JSON & Snowflake

Snowflake can load semi-structured data into database tables, such as JSON. Snowflake has excellent functionality for directly querying semi-structured data as well as flattening it into a columnar structure once the data is loaded from stage into a database table.

One of Snowflake’s key strengths is its ability to rapidly ingest both structured and unstructured data. This blog will show you how to work with JSON files in Snowflake, including how to directly query Snowflake JSON data and copy it into a Snowflake table.

Once you finish this video, you will be able to answer following questions:

  1. How to insert JSON data into snowflake using queries?
  2. How to insert multiple JSON data in the snowflake using queries?
  3. What data type should be used to store JSON data into snowflake table?
  4. How to parse JSON data in the snowflake and validate them?
  5. How to parse JSON data and get individual elements?
  6. How to parse nested JSON in snowflake and access all elements?
  7. How Snowflake handle Date & Timestamp data type?
  8. How to flatten JSON data & load elements in individual tables in snowflake?
  9. What is the role of colon & dot-notation and explicit casting?
  10. How to join JSON data with other standard table in snowflake?

What is JSON (JavaScript Object Notation)?

JSON, which is pronounced “Jay-Sawn” or “Jason,” is a well-liked data serialization format that is simple for both humans and machines to read, write, and produce. The third edition of the JavaScript Programming Language, Standard ECMA-262, is where the JSON file format had its start. It was initially created as an alternative to XML and is mostly used to convey data between a server and a web application.

Basic JSON Syntax

Rules:

- <KEY>:<VALUE> Data is in key/value pairs
- Data is separated by commas
- Objects are enclosed in curly braces ({})
- Arrays are enclosed in square brackets ([])
{
    "employee": {
      "name": "John",
      "age": 30,
      "height_in_ft": 5.11,
      "married": true,
      "has_kids": false,
      "stock_options": null,
      "phone": "+1 123-456-7890",
      "date_of_birth":"2022-12-08",
      "dob_timestamp":"2022-12-08T19:27:40.606-08:00",
      "Address": {
        "street": "4132 Grant Street",
        "city": "Rockville",
        "State": "Minnesota"
      }
    }
  }

JSON values can be:

JSON Value Can Be From Following Data Type.

  • A number
  • A string 
  • A Boolean
  • An array (The Phone Number in Above example is of an arrya type)
  • A JSON object (The address in above example is an object type)
  • Null (The stock option is an example of null value)

Load JSON Data Using Insert Queries

You can directly load JSON data into relational tables using Snowflake JSON. Then, without having to perform any modifications, you may query this data with SQL and join it to other structured data. This enables businesses to streamline their data pipelines and quicken the rate at which this data is made available for analysis.

SQL Script Used In Video

Following are the SQL script used in this video.


create or replace transient table json_tbl(
    json_col variant
);

desc table json_tbl;

-- lets insert a record
insert into json_tbl(json_col) values ('{"firstName":"John", "empid":1001}');

-- it is not possible to insert a row into variant using direct insert statement

insert into json_tbl(json_col) 
select '{"firstName":"John", "empid":1001}';

-- the above also does not work.. so we have to use 
-- a function called parse_json

insert into json_tbl(json_col) 
select parse_json('{"firstName":"John", "empid":1001}');

select * from json_tbl;

delete from json_tbl;
-- since it is a single value.. I don't need to specify 
insert into json_tbl (json_col)
select parse_json(Column1) from values 
('{"firstName":"Name-1", "empid":1001}'),
('{"firstName":"Name-2", "empid":1002}'),
('{"firstName":"Name-3", "empid":1003}'),
('{"firstName":"Name-4", "empid":1004}'),
('{"firstName":"Name-5", "empid":1005}')
;

-- if you observer, the order of field changed..

-- so it is not easy to insert the json data via insert statement.
delete from json_tbl;
insert into json_tbl(json_col)
select parse_json(
  '{
    "employee": {
      "name": "John",
      "age": 30,
      "height_in_ft": 5.11,
      "married": true,
      "has_kids": false,
      "stock_options": null,
      "phone": [
        "+1 123-456-7890",
        "+1 098-765-4321"
      ],
      "Address" {
        "street": "4132 Grant Street",
        "city": "Rockville",
        "State": "Minnesota"
      }
    }
  }');
  
  
  -- how to insert data using internal & external stage are part of future chapters
  -- in this playlist. 

Query JSON Data In Snowflake

Lets Query JSON Data


-- create an employee table with one variant column

create or replace transient table employee_tbl(
    emp_json variant
);

-- let me insert one single record
insert into employee_tbl(emp_json)
select parse_json(
  '{
      "name": "John",
      "age": 30,
      "height_in_ft": 5.11,
      "married": true,
      "has_kids": false,
      "stock_options": null
    }');

-- lets run the select sql
select * from employee_tbl;

Colon Notation To Access JSON Elements


select 
    emp_json:name,
    emp_json:"age",
    emp_json:height_in_ft,
    emp_json:married,
    emp_json:has_kids,
    emp_json:stock_options
from employee_tbl;

Alias of JSON Elements

select 

    emp_json:age::integer as age,
    emp_json:name::string as "Name",
    emp_json:height_in_ft::decimal as height,
    emp_json:married as is_married,
    emp_json:has_kids as has_kids,
    emp_json:stock_options as stock_options
from employee_tbl;

-- add typeof() funciton to check how snowflake treat these name/value pair
select 
    typeof(emp_json:name) as name,
    typeof(emp_json:age) as age,
    typeof(emp_json:height_in_ft) as height,
    typeof(emp_json:married) as is_married,
    typeof(emp_json:has_kids) as has_kids,
    typeof(emp_json:stock_options) as stock_options
from employee_tbl;


insert into employee_tbl(emp_json)
select parse_json(
  '{
    "employee": {
      "name": "John",
      "age": 30,
      "height_in_ft": 5.11,
      "married": true,
      "has_kids": false,
      "stock_options": null,
      "phone": [
        "+1 123-456-7890",
        "+1 098-765-4321"
      ],
      "Address": {
        "street": "3621 McDonald Avenue",
        "city": "Orlando",
        "State": "Florida"
      }
    }
  }');
  
select * from employee_tbl;
  
select 
    emp_json:employee.name::string as name,
    emp_json:employee.age as age,
    emp_json:employee.height_in_ft as height,
    emp_json:employee.married as is_married,
    emp_json:employee.has_kids as has_kids,
    emp_json:employee.stock_options as stock_options,
    typeof(emp_json:employee.phone) as all_phone_type,
    ARRAY_SIZE(emp_json:employee.phone) as how_many_phone,
    emp_json:employee.phone[0] as work_phone,
    emp_json:employee.phone[1] as office_phone,
    typeof(emp_json:employee:Address) as address_type,
    emp_json:employee:Address:street as street,
    emp_json:employee.Address.city as city,
    emp_json:employee.Address.State as state
from employee_tbl; 


-- apply other function and mathematical operation without casting it.
select 
    emp_json:employee.age as age,
    (emp_json:employee.height_in_ft) * (12*2.54) as height_in_cm,
    typeof(emp_json:employee.Phone) as all_phone_type,
    ARRAY_SIZE(emp_json:employee.Phone) as how_many_phone
from employee_tbl; 

Date & Timestamp Data Types in Snowflake JSON

insert into employee_tbl(emp_json)
select parse_json(
  '{
      "name": "John",
      "age": 30,
      "height_in_ft": 5.11,
      "dob":"2022-12-11",
      "dob_timestemp":"2022-12-11T00:19:06.043-08:00",
      "married": true,
      "has_kids": false,
      "stock_options": null
    }');
    
   
 select 
        emp_json:dob::date,
        emp_json:dob_timestemp::timestamp
        from employee_tbl order by 1 desc;   

Flatting JSON Data & Loading Into Snowflake Tables


-- create the table

create transient table employee_tbl(
    emp_json variant
);

--1st record
insert into employee_tbl(emp_json)
select parse_json(
  '{
    "employee": {"name": "John-1","age": 30,"height_in_ft": 5.11,"married": true,"has_kids": false,
      "stock_options": null,"email":"john1@ttips.com","phone": ["+1 123-456-7890","+1 098-765-4321"],
      "Address": {"street": "3621 McDonald Avenue","city": "Orlando","State": "Florida"}
               }
    }');
--2nd record
insert into employee_tbl(emp_json)
select parse_json(
  '{
    "employee": {"name": "John-2","age": 33,"height_in_ft": 5.09,"married": false,"has_kids": false,
      "stock_options": 10,"email":"john2@ttips.com","phone": ["+1 222-456-0987"],
      "Address": {"street": "532 Locust View Drive","city": "San Jose","State": "California"}
               }
    }');

Creating Sequence Objects

-- create sequencer
create or replace sequence emp_seq
  start 1 
  increment 1
  comment = 'employee sequence';
  
create or replace sequence phone_seq
  start 1 
  increment 1
  comment = 'phone sequence';
  
create or replace sequence address_seq
  start 1 
  increment 1
  comment = 'address sequence';
  

Creating Master Tables

-- employee master table
create or replace table employee_master(
    emp_pk integer default emp_seq.nextval,
    name string,
    age number(3),
    height_in_cm decimal(6,3),
    is_married boolean,
    has_kids boolean,
    stock_options integer,
    email varchar(100)
);

-- child table holding all the phones
create or replace table emp_phones(
    phone_pk integer default phone_seq.nextval,
    emp_fk number,
    phone_type varchar(20),
    phone_number varchar(30)
);

-- child table holding all the phones
create or replace table emp_address(
    address_pk integer default address_seq.nextval,
    emp_fk number,
    street_address varchar(200),
    city varchar(50),
    state varchar(50)
);

Insert into employee master table

insert into employee_master (name, age, height_in_cm,is_married,has_kids,stock_options,email)  
select 
    emp_json:employee.name::string as name,
    emp_json:employee.age as age,
    (emp_json:employee.height_in_ft)*(12*2.54) as height_in_cm,
    emp_json:employee.married as is_married,
    emp_json:employee.has_kids as has_kids,
    emp_json:employee.stock_options as stock_options,
    emp_json:employee.email::string as email
from employee_tbl; 

select * from employee_master;

Insert into employee phone table


insert into emp_phones (emp_fk,phone_type,phone_number)
select 
    b.emp_pk,
    'home_phone' as home_phone,
    a.emp_json:employee.phone[0]::string as home_phone
from 
    employee_tbl a 
    join 
    employee_master b 
    on  
        a.emp_json:employee.email = b.email
union all
select 
    b.emp_pk,
    'work_phone' as work_phone,
    a.emp_json:employee.phone[1]::string as work_phone
from 
    employee_tbl a 
    join 
    employee_master b 
    on  
        a.emp_json:employee.email = b.email;
        
        select * from emp_phones;

Insert into employee phone table

insert into emp_address (emp_fk,street_address,city,state)
select 
    b.emp_pk,
    a.emp_json:employee.Address.street::string as street,
    a.emp_json:employee.Address.city::string as city,
    a.emp_json:employee.Address.State::string as state
from 
    employee_tbl a 
    join 
    employee_master b 
    on  
        a.emp_json:employee.email = b.email;
        
select * from emp_address;

select e.*, a.* 
    from employee_master e join emp_address a on e.emp_pk = a.emp_fk;