SnowPipe Introduction (Snowflake Certification)
Data loading and ingestion is one of the key activities in any data warehouse system, which is why the Snowflake SnowPro Certification exam asks many questions around this concept. This subject area does not require hands-on knowledge; however, you can expect at least 2 questions around SnowPipe and the fundamentals of continuous data loading. With a trial edition, we generally don't experiment much with SnowPipe because it needs a cloud storage account to trigger event notifications, and that weakens our knowledge of SnowPipe and continuous data loading.
Overview - Snowflake SnowPipe
Workflow - Snowflake SnowPipe
Snowflake SnowPipe Cheat Sheet
Snowflake SnowPipe Overview
- SnowPipe enables loading data from files as soon as they are available in a stage (typically an external stage)
- A pipe is a named, first-class Snowflake object that contains a COPY statement used by Snowpipe
- A pipe wraps a COPY command, so all supported file formats (JSON, Avro, CSV, Parquet, etc.) can be loaded
- File arrival detection mechanisms
  - Cloud messaging (event notifications from cloud storage)
  - Calling the SnowPipe REST API endpoint
- SnowPipe copies the files into a queue, from which they are loaded into the target table in a continuous, serverless fashion based on parameters defined in a specified pipe object.
- File Size Recommendations
  - The number of load operations that run in parallel cannot exceed the number of data files to be loaded
  - Data files roughly 10 MB to 100 MB in size (compressed) are recommended
  - Split large files into a greater number of smaller files to distribute the load among the compute resources
- Semi-Structured File Sizing
  - The VARIANT data type imposes a 16 MB (compressed) size limit on individual rows
  - Enabling the STRIP_OUTER_ARRAY file format option for the COPY INTO <table> command is recommended, to remove the outer array structure and load the records into separate table rows
- Other Important Notes
  - SnowPipe charges 0.06 credits per 1,000 files queued, in addition to the serverless compute used for loading
  - There is no guarantee that files are loaded in the same order they are staged
  - SnowPipe latency is hard to estimate because it depends on file size, file format, and the complexity of the COPY statement
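The concepts above can be sketched as a pipe definition. This is a minimal illustration, not a definitive setup: the database, schema, table, stage, and pipe names (mydb, raw_events, mystage, mypipe) are hypothetical placeholders.

```sql
-- Minimal sketch of a pipe: AUTO_INGEST = TRUE tells Snowflake to load files
-- when cloud storage event notifications announce their arrival.
-- All object names below are hypothetical.
CREATE OR REPLACE PIPE mydb.public.mypipe
  AUTO_INGEST = TRUE
AS
  COPY INTO mydb.public.raw_events
  FROM @mydb.public.mystage
  FILE_FORMAT = (TYPE = 'JSON' STRIP_OUTER_ARRAY = TRUE);
```

Note that the COPY statement is wrapped verbatim inside the pipe, which is why any file format a regular COPY supports can also be loaded continuously; STRIP_OUTER_ARRAY splits a top-level JSON array into one row per element, keeping individual rows under the VARIANT size limit.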
SnowPipe/Continuous Loading vs. COPY Command/Bulk Loading
- Authentication
  - Bulk Loading - relies on the security options supported by the client for the user session
  - Continuous Loading - JSON Web Token (JWT) using a public/private key pair
- Load History
  - Bulk Loading - 64 days of load metadata history
  - Continuous Loading - 14 days of load metadata history
- Transactions
  - Bulk Loading - always a single transaction
  - Continuous Loading - loads are combined or split into one or more transactions based on the number and size of files
- Compute Resources
  - Bulk Loading - a user-managed virtual warehouse is required
  - Continuous Loading - uses Snowflake-supplied (serverless) compute resources
- Cost
  - Bulk Loading - billed for the time each virtual warehouse is active
  - Continuous Loading - billed according to the Snowflake-supplied compute resources used
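Because continuous loading is serverless, its cost and progress are observed through Snowflake's built-in functions rather than warehouse monitoring. A brief sketch of the two common checks, with a hypothetical pipe name:

```sql
-- Current state of a pipe (returns a JSON string with fields such as
-- executionState and pendingFileCount); the pipe name is hypothetical.
SELECT SYSTEM$PIPE_STATUS('mydb.public.mypipe');

-- Credits consumed by the pipe over the last day, via the Information
-- Schema table function PIPE_USAGE_HISTORY.
SELECT *
FROM TABLE(INFORMATION_SCHEMA.PIPE_USAGE_HISTORY(
  DATE_RANGE_START => DATEADD('day', -1, CURRENT_TIMESTAMP()),
  PIPE_NAME => 'mydb.public.mypipe'));
```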
Snowflake SnowPipe Summary Video
Snowflake SnowPro Practice & Reading Guide
SnowPro Certification Cheat Sheet
Refer to the topic-wise important notes and cheat sheet.