Snowflake Architecture Cheat Sheet
- No hardware, no software, no maintenance: Snowflake is delivered as Software-as-a-Service (SaaS) and runs entirely on cloud infrastructure.
- Snowflake uses a central data repository for persisted data that is accessible from all compute nodes in the data warehouse. (Shared Disk)
- Similar to shared-nothing architectures, Snowflake processes queries using MPP (massively parallel processing) compute clusters where each node in the cluster stores a portion of the entire data set locally (Shared Nothing)
- Snowflake is therefore a hybrid of shared-disk and shared-nothing architectures (SDA/SNA), known as the multi-cluster, shared data architecture.
- This approach offers the data management simplicity of a shared-disk architecture, but with the performance and scale-out benefits of a shared-nothing architecture.
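The hybrid idea above can be sketched with a toy model. This is only an illustration, not how Snowflake is implemented: `SharedStorage`, `ComputeCluster`, and the partition names are hypothetical. It shows one central copy of the data being read by independent clusters, each of which splits the work across its own nodes.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStorage:
    """Central repository: one copy of the data, visible to every cluster."""
    partitions: dict[str, list[int]] = field(default_factory=dict)


@dataclass
class ComputeCluster:
    """Independent cluster whose nodes split the partitions between them."""
    name: str
    node_count: int

    def total(self, storage: SharedStorage) -> int:
        # Assign partitions to nodes round-robin (shared-nothing style work split).
        names = sorted(storage.partitions)
        per_node = [names[i::self.node_count] for i in range(self.node_count)]
        # Each node aggregates only its own slice; partial results are then combined.
        partials = [
            sum(sum(storage.partitions[p]) for p in assigned)
            for assigned in per_node
        ]
        return sum(partials)


storage = SharedStorage({"p0": [1, 2], "p1": [3, 4], "p2": [5], "p3": [6, 7]})
small = ComputeCluster("reporting", node_count=1)
large = ComputeCluster("etl", node_count=4)

# Both clusters read the same single copy of the data and agree on the result.
print(small.total(storage), large.total(storage))
```

Adding nodes changes how the work is divided, not where the data lives, which is the point of combining the two architectures.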
Snowflake’s unique architecture consists of three key layers:
- Cloud Services (The brain of the system)
- Query Processing (The muscles of the system)
- Database Storage
Snowflake manages everything for the customer:
- Resource management
- Data protection
- Availability (it has built-in redundancy)
- When data is loaded into Snowflake, Snowflake reorganizes that data into its internal optimized, compressed, columnar format.
- All data is encrypted with strong AES-256 encryption.
- Snowflake stores this optimized data in cloud storage.
- Snowflake manages all aspects of how this data is stored: organization, file size, structure, compression, metadata, statistics, and so on.
- The data objects stored by Snowflake are neither directly visible nor directly accessible to customers; they can only be reached through SQL query operations run using Snowflake.
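To make "columnar" concrete, here is a small sketch of turning row-oriented records into a column-oriented layout. It is a generic illustration using JSON and zlib from the Python standard library; Snowflake's internal file format is not public, and the sample table is made up.

```python
import json
import zlib

# Hypothetical sample table with three columns.
rows = [
    {"id": i, "country": "US" if i % 2 else "DE", "amount": 100}
    for i in range(1000)
]

# Row-oriented layout: each record is stored whole, one after another.
row_bytes = json.dumps(rows).encode()

# Column-oriented layout: all values of one column are stored together.
columns = {key: [r[key] for r in rows] for key in rows[0]}
col_bytes = json.dumps(columns).encode()

# A query that needs one column only has to read that column's values.
amount_total = sum(columns["amount"])

# Rebuilding the rows shows that the reorganization loses nothing.
restored = [dict(zip(columns, values)) for values in zip(*columns.values())]

# Compressed sizes of both layouts, printed for comparison.
print(len(zlib.compress(row_bytes)), len(zlib.compress(col_bytes)), amount_total)
```

Grouping similar values together is what tends to make column stores compress well and lets a query skip columns it does not need.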
- Snowflake lets you create multiple independent compute clusters for query processing; these are called virtual warehouses.
- They all access the same data without contending with one another, so compute can scale out as needed (effectively unlimited scale).
- When a virtual warehouse is resized, all subsequent queries take advantage of the new resources.
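As a rough sketch, creating and resizing virtual warehouses comes down to two SQL statements, CREATE WAREHOUSE and ALTER WAREHOUSE. The helpers below only build the statement text; the function names, the warehouse names, the subset of sizes, and the `AUTO_SUSPEND = 60` setting are illustrative assumptions, so check the SQL reference before relying on them.

```python
import re

# A subset of warehouse sizes, enough for this sketch.
VALID_SIZES = {"XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE"}


def _check(name: str, size: str) -> None:
    # Only allow simple unquoted identifiers and known sizes in this sketch.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_$]*", name):
        raise ValueError(f"unsupported warehouse name: {name!r}")
    if size.upper() not in VALID_SIZES:
        raise ValueError(f"unsupported warehouse size: {size!r}")


def create_warehouse_sql(name: str, size: str = "XSMALL") -> str:
    """Build a CREATE WAREHOUSE statement for an independent compute cluster."""
    _check(name, size)
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {name} "
        f"WITH WAREHOUSE_SIZE = {size.upper()} AUTO_SUSPEND = 60 AUTO_RESUME = TRUE"
    )


def resize_warehouse_sql(name: str, size: str) -> str:
    """Build an ALTER WAREHOUSE statement; later queries use the new size."""
    _check(name, size)
    return f"ALTER WAREHOUSE {name} SET WAREHOUSE_SIZE = {size.upper()}"


# Two hypothetical warehouses that would query the same data independently.
print(create_warehouse_sql("reporting_wh"))
print(create_warehouse_sql("etl_wh", "medium"))
print(resize_warehouse_sql("etl_wh", "large"))
```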
- The services layer is fully maintained by Snowflake and distributed across multiple availability zones to ensure high availability.
- The cloud services layer is a collection of services that coordinate activities across Snowflake.
- These services tie together all of the different components of Snowflake in order to process user requests, from login to query dispatch
- The cloud services layer also runs on compute instances provisioned by Snowflake from the cloud provider.
- The key component of the services layer is the metadata store, which powers a number of Snowflake's unique features:
- Zero-copy cloning
- Time Travel
- Data sharing
- Among the services in this layer:
- Authentication & session management
- Infrastructure management
- Metadata management
- Query parsing and optimization
- Access control
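One way to see why a metadata store can power zero-copy cloning and Time Travel is a toy model in which tables are just lists of references to immutable files. This is a conceptual sketch under that assumption, not Snowflake's actual design; `MetadataStore` and the table names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataStore:
    # Immutable data files, keyed by file id (stand-in for files in cloud storage).
    files: dict[str, list[int]] = field(default_factory=dict)
    # Per table: a history of versions, each version a tuple of file ids.
    tables: dict[str, list[tuple[str, ...]]] = field(default_factory=dict)

    def write(self, table: str, rows: list[int]) -> None:
        # New data goes into a new file; existing files are never modified.
        file_id = f"f{len(self.files)}"
        self.files[file_id] = list(rows)
        history = self.tables.setdefault(table, [()])
        history.append(history[-1] + (file_id,))

    def clone(self, source: str, target: str) -> None:
        # Zero-copy: only the current list of file ids is copied, not the rows.
        self.tables[target] = [self.tables[source][-1]]

    def read(self, table: str, version: int = -1) -> list[int]:
        # Reading an older version index is the time-travel analogue.
        return [row for fid in self.tables[table][version] for row in self.files[fid]]


store = MetadataStore()
store.write("orders", [1, 2])
store.clone("orders", "orders_dev")
store.write("orders_dev", [99])  # the clone diverges without touching "orders"
store.write("orders", [3])

print(store.read("orders"))             # current version of the source table
print(store.read("orders", version=1))  # an earlier version of the source table
print(store.read("orders_dev"))         # the clone, sharing the first file
```

Only three files exist at the end even though two tables hold data, because the clone reuses the original file instead of copying it.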
Virtually all operations can be performed through these clients:
- A web-based user interface
- Command-line interface (SnowSQL)
- ODBC and JDBC drivers
- Native connectors (for example, Python and Spark)
- Third-party connectors (for example, ETL and BI tools)
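As an example of a native connector, the sketch below uses the Snowflake Connector for Python (`snowflake.connector`), which has to be installed separately. The helper names are made up for illustration, and the account, user, password, and warehouse values you pass in are placeholders of your own.

```python
from typing import Optional


def connection_params(account: str, user: str, password: str,
                      warehouse: Optional[str] = None) -> dict:
    """Collect keyword arguments for snowflake.connector.connect()."""
    params = {"account": account, "user": user, "password": password}
    if warehouse is not None:
        # Optional: the virtual warehouse this session should use.
        params["warehouse"] = warehouse
    return params


def current_version(params: dict) -> str:
    """Open a session, run one query, and return the Snowflake version string."""
    # Imported lazily so this module loads even where the connector is not installed.
    import snowflake.connector

    conn = snowflake.connector.connect(**params)
    try:
        cur = conn.cursor()
        cur.execute("SELECT CURRENT_VERSION()")
        return cur.fetchone()[0]
    finally:
        conn.close()
```

Calling `current_version(connection_params("my_account", "my_user", "my_password", "my_wh"))` would need a real account behind those placeholder values.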
How a query is processed (simplified):
- Initiate a session, authenticate, and submit a query via a client (SnowSQL or the web UI).
- Specify the virtual warehouse for the session (otherwise the default warehouse configured for the user, if any, is used).
- The services layer checks that you are authorized both to access the database and data involved and to perform the requested operation (for example SELECT, ALTER, DELETE, or DROP).
- The services layer creates an optimized query plan.
- The services layer sends instructions to the virtual warehouse, which allocates resources, fetches the data needed for processing, and executes the query.
- (Caching may also play a role, but it is left out of this simplified flow.)
- The results are then returned to you.
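The flow above can be walked through with a small simulation. Everything here is hypothetical (the `GRANTS` table, the `STORAGE` contents, the `DEFAULT_WH` name, and the function names); it only mirrors the order of the steps and is not a model of Snowflake internals.

```python
from dataclasses import dataclass


@dataclass
class Session:
    user: str
    warehouse: str


# Hypothetical grants: (user, object) -> set of allowed operations.
GRANTS = {("alice", "sales.orders"): {"SELECT"}}
# Hypothetical table contents held in "storage".
STORAGE = {"sales.orders": [10, 20, 30]}


def login(user: str, password: str, warehouse: str = "DEFAULT_WH") -> Session:
    # Authenticate, open a session, and attach a warehouse (or a default).
    if not password:
        raise PermissionError("authentication failed")
    return Session(user, warehouse)


def run_query(session: Session, operation: str, table: str) -> list[int]:
    # The services layer checks access to the object and to the operation.
    if operation not in GRANTS.get((session.user, table), set()):
        raise PermissionError(f"{session.user} may not {operation} {table}")
    # The services layer produces a plan (here just a small description).
    plan = {"scan": table, "warehouse": session.warehouse}
    # The warehouse fetches the data it needs and executes the plan.
    rows = list(STORAGE[plan["scan"]])
    # The results go back to the client.
    return rows


session = login("alice", "secret")
print(run_query(session, "SELECT", "sales.orders"))
```

A request such as `run_query(session, "DROP", "sales.orders")` raises `PermissionError` in this sketch, because the authorization check happens before any plan is built or data is read.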