Loading Special Character From CSV Into Snowflake
Summary
How to load data special characters available in your data file whicch either conflicts with your file delimiter or row delimiter or stops your copy command and eventually stops data loading process into snowflake tables.
Loading Special Character From CSV Into Snowflake
There are scenarios where you need to manage special characters part of your data file that may conflict with your field delimiter or row delimiter and eventually stops your copy command in snowflake. If that is the case, how can you load such data file having description or string with special character into snowflake and what properties of file format object or copy command object should be used.
This episode is a comprehensive & practical guide with hands-on excercise to learn how to load data into snowflake which has special charactes.
Once you complete this video, you will be able to answer following questions
- File Format & Use of Field Optinally Enclosed By Parameter
- Double Quote & Single Quote Together in a text field
- Escape Character & how to use it
- Transformation Using Copy Command
Sample Data Set
Here is the data set which is used in this guide.
SQL Used In this Article
Here is the User Email table.
create or replace transient table user_email(
id number,
first_name varchar(100),
last_name varchar(100),
email varchar(100),
gender varchar(1),
about_me varchar(500)
);
Create File Format with single & doublt quotes.
-- create a file format
-- double quote as field_optionally_enclosed_by parameter
create or replace file format csv_double_q_ff
type = 'csv'
compression = 'none'
field_delimiter = ','
record_delimiter = '\n'
skip_header = 1
field_optionally_enclosed_by = '\042'
trim_space = false
error_on_column_count_mismatch = true;
-- another file format where field_optionally_enclosed_by parameter
-- will take single quote (\047)
create or replace file format csv_single_q_ff
type = 'csv'
compression = 'none'
field_delimiter = ','
record_delimiter = '\n'
skip_header = 1
field_optionally_enclosed_by = '\047'
trim_space = false
error_on_column_count_mismatch = true;
Load Data Using Copy Command
-- lets load our 1st sample file. this has 1k records and all of them should get processed.
copy into user_email
from @~/ch06/csv/01_sample_user_email.csv
file_format = csv_double_q_ff
on_error = 'continue';
-- review the loaded data
select * from user_email;
-- Truncating before loading new file
truncate table user_email;
-- loading next file where about_me field is enclosed with double quotes.
copy into user_email
from @~/ch06/csv/02_sample_email_double_quotes.csv
file_format = csv_double_q_ff
on_error = 'continue';
-- review the loaded data
select * from user_email;
truncate table user_email;
-- loading next file where about_me field is enclosed with single quotes.
copy into user_email
from @~/ch06/csv/03_sample_email_single_quote.csv
file_format = csv_single_q_ff
on_error = 'continue';
-- review the loaded data
select * from user_email;
truncate table user_email;
File Format with Escape Character
-- lets truncate the data set
truncate table user_email;
-- run a select command
select * from user_email;
-- lets see if we have the sample file #06 available in our user stage or not.
list @~/ch06/csv/06_sample_email_regex.csv;
-- file format
create or replace file format csv_double_q_ff
type = 'csv'
compression = 'none'
field_delimiter = ','
record_delimiter = '\n'
skip_header = 1
field_optionally_enclosed_by = '\042'
escape = '\134'
skip_blank_lines = true
trim_space = true;
Data Loading Using Copy Command & Select Statement
-- here we have a copy command with transformation
-- if you have not see how to work with $ notation,
-- refer my ch-11 from snowflake tutorial to understand it.
-- https://youtu.be/w9BQsOlJc5s
-- here is finaly copy command where regexp_replace is used to replace all double and single quote
-- however it will apply the transformaiton after the field
copy into user_email
from
(
select $1, $2, $3, $4, $5, regexp_replace($6,'\\"|\'','')
from @~/ch06/csv/06_sample_email_regex.csv t
)
file_format = csv_double_q_ff
force = true
on_error = 'continue'
;
-- run a select command
select * from user_email;