Unlock the power of nested materialized views and discover cascading refresh in Amazon Redshift | Amazon Web Services

Materialized views in Amazon Redshift can significantly improve the performance of complex queries. A materialized view stores the precomputed results of a query so that future similar queries can use them, which makes it a powerful tool in data warehouse environments, where applications often need to run complex queries against large tables. This optimization increases query speed and efficiency by letting queries skip many computation steps and return the precomputed results directly. Materialized views are particularly useful for accelerating predictable and repeated queries, such as those that populate dashboards or generate reports. Instead of repeatedly running resource-intensive operations, applications can query a materialized view and retrieve the precomputed results, which leads to significant performance gains and a better user experience. In addition, materialized views can be refreshed incrementally: when data manipulation language (DML) changes are made to the underlying base tables, the refresh logic is applied only to the changed data, which further optimizes performance and maintains data consistency.

This post shows how to maximize Amazon Redshift query performance by implementing materialized views effectively. We explore how to create materialized views and how to implement nested refresh strategies, where materialized views are defined in terms of other materialized views to extend their capabilities. This approach is particularly powerful for reusing precomputed joins with different aggregation options, which reduces processing time for complex ETL and BI workloads. Let's explore how to implement this feature in your data warehouse.

Introduction to nested materialized views

Nested materialized views in Amazon Redshift let you create materialized views based on other materialized views. This capability enables a hierarchical structure of precomputed results, which significantly improves query performance and data processing efficiency. With nested materialized views, you can build multilayer data abstractions and create increasingly complex and specialized views tailored to specific needs. Nested materialized views offer the following benefits:

  • Improved query performance: Each level of the nested materialized view hierarchy serves as a cache, so queries can quickly access precomputed data without going back to the base tables.
  • Reduced computational load: By moving computational work into the refresh process, you can significantly reduce the runtime and resource usage of your day-to-day queries.
  • Simplified data modeling: Nested materialized views let you create a more modular and extensible data model, where each layer represents a specific business concept or use case.
  • Incremental refresh: Amazon Redshift supports incremental refresh of materialized views, so you can update only the changed data within the nested hierarchy, which further optimizes the refresh process.
  • Cascading materialized views: Amazon Redshift materialized views support automatic handling of extract, load, and transform (ELT) style workloads, which minimizes the need to create and manage these processes manually.

You can implement nested materialized views using the CREATE MATERIALIZED VIEW statement, which lets you reference other materialized views in the definition. Common use cases include:

  • Modular data transformation pipelines
  • Hierarchical aggregations for progressive analysis
  • Multi-level data validation pipelines
  • Historical data snapshot management
  • Optimized BI reporting with precomputed results

Architecture

The architecture is a nested materialized view structure in Amazon Redshift. In the architecture diagram, multiple base tables (orange) feed materialized views (red), which in turn feed the nested materialized view layer, alongside data sharing tables (green). The diagram also includes integration points for users and QuickSight visualization.

  1. Base tables: These are the underlying tables that contain the raw data for your data warehouse. They can be local tables or data sharing tables.
  2. Base materialized views: This is the first layer of materialized views, which are created directly on top of the base tables. These views encapsulate common data transformations and aggregations. They can serve as the foundation for nested materialized views and can also be queried directly.
  3. Nested materialized views: This is the second (or higher) layer of materialized views, which are created on top of the base materialized views. A nested materialized view can further aggregate, filter, or transform data from the base materialized views.
  4. Applications, users, and BI reporting: Applications and business intelligence (BI) tools interact with the nested materialized views to generate reports and dashboards. The nested views provide a more optimized and precomputed data structure for efficient querying and reporting.

Creating and using nested materialized views

To show how nested materialized views work in Amazon Redshift, we use the TPC-DS dataset. We create three queries using the store, store_sales, customer, and customer_address tables to simulate data warehouse reports. This example illustrates how multiple reports can share result sets and how materialized views can improve both resource efficiency and query performance. Consider the following queries as dashboard queries:

SELECT cust.c_customer_id,
cust.c_first_name, 
cust.c_last_name, 
sales.ss_item_sk, 
sales.ss_quantity, 
cust.c_current_addr_sk 
FROM store_sales sales INNER JOIN customer cust
ON sales.ss_customer_sk = cust.c_customer_sk;

SELECT cust.c_customer_id,
cust.c_first_name, 
cust.c_last_name, 
sales.ss_item_sk, 
sales.ss_quantity, 
cust.c_current_addr_sk, 
store.s_store_name
FROM store_sales sales INNER JOIN customer cust
ON sales.ss_customer_sk = cust.c_customer_sk
INNER JOIN store store
ON sales.ss_store_sk = store.s_store_sk;

SELECT cust.c_customer_id, 
cust.c_first_name, cust.c_last_name, 
sales.ss_item_sk, 
sales.ss_quantity, 
addr.ca_state
FROM store_sales sales INNER JOIN customer cust
ON sales.ss_customer_sk = cust.c_customer_sk
INNER JOIN store store
ON sales.ss_store_sk = store.s_store_sk
INNER JOIN customer_address addr
ON cust.c_current_addr_sk = addr.ca_address_sk;

Note that the join between the store_sales and customer tables is present in all three queries (dashboards).

The second query adds a join with the store table, and the third query is the second query with an additional join with the customer_address table. This pattern is common in sales reporting scenarios. As mentioned earlier, a materialized view can speed up queries because the result set is stored and ready to be delivered to the user, which avoids reprocessing the same data. In cases like this, we can use nested materialized views to reuse data that has already been processed. When we convert our queries into a set of nested materialized views, the result is as follows:

CREATE MATERIALIZED VIEW StoreSalesCust as
SELECT cust.c_customer_id, 
cust.c_first_name, 
cust.c_last_name, 
sales.ss_item_sk, 
sales.ss_store_sk, 
sales.ss_quantity, 
cust.c_current_addr_sk
FROM store_sales sales INNER JOIN customer cust
ON sales.ss_customer_sk = cust.c_customer_sk;

CREATE MATERIALIZED VIEW StoreSalesCustStore as
SELECT storesalescust.c_customer_id, 
storesalescust.c_first_name, 
storesalescust.c_last_name, 
storesalescust.ss_item_sk, 
storesalescust.ss_quantity, 
storesalescust.c_current_addr_sk, 
store.s_store_name
FROM StoreSalesCust storesalescust INNER JOIN store store
ON storesalescust.ss_store_sk = store.s_store_sk;

CREATE MATERIALIZED VIEW StoreSalesCustAddress as
SELECT storesalescuststore.c_customer_id, 
storesalescuststore.c_first_name, 
storesalescuststore.c_last_name, 
storesalescuststore.ss_item_sk, 
storesalescuststore.ss_quantity, 
addr.ca_state
FROM StoreSalesCustStore storesalescuststore INNER JOIN customer_address addr
ON storesalescuststore.c_current_addr_sk = addr.ca_address_sk;
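
As a rough sketch of how a dashboard might consume the nested views defined above, the following query aggregates directly from StoreSalesCustAddress instead of joining the base tables again. The aggregation and the total_quantity alias are assumptions chosen for illustration and are not part of the original example.

```sql
-- Illustrative dashboard query (an assumption, not from the original post):
-- total quantity sold per customer state, read from the third nested view
-- rather than from store_sales, customer, store, and customer_address.
SELECT ca_state,
       SUM(ss_quantity) AS total_quantity
FROM StoreSalesCustAddress
GROUP BY ca_state
ORDER BY total_quantity DESC;
```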

Nested materialized views can improve performance and resource efficiency by reusing the results of the earlier views, minimizing redundant joins, and working with smaller result sets. This creates a hierarchical structure where each view depends on another. Because of these dependencies, you must refresh the views in a specific order.
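
Without a cascading option, one way to honor these dependencies is to refresh each view in turn, starting with the one closest to the base tables. The following is a minimal sketch using the view names defined above; it assumes the statements run sequentially in the same session.

```sql
-- Refresh in dependency order: the base view first, then each view built on it.
REFRESH MATERIALIZED VIEW StoreSalesCust;
REFRESH MATERIALIZED VIEW StoreSalesCustStore;
REFRESH MATERIALIZED VIEW StoreSalesCustAddress;
```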

Figure: Result of SQL queries showing the dependency problem when refreshing the materialized views.

With the new REFRESH MATERIALIZED VIEW mv_name CASCADE option, you can refresh the entire dependency chain of your materialized views. Note that in this example we refresh the third materialized view, StoreSalesCustAddress, and this refreshes all three materialized views.

Figure: SQL query showing a successful cascading refresh of the StoreSalesCustAddress materialized view in Amazon Redshift.

If we refresh the second materialized view with the CASCADE option, only the first and second materialized views are refreshed, and the third remains unchanged. This can be useful for keeping some materialized views with less current data than others.
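
The two cascading refreshes described above can be sketched as follows, using the view names from this example. You would normally run one or the other, depending on how far up the chain you want fresh data.

```sql
-- Refreshes StoreSalesCust, StoreSalesCustStore, and StoreSalesCustAddress.
REFRESH MATERIALIZED VIEW StoreSalesCustAddress CASCADE;

-- Refreshes only StoreSalesCust and StoreSalesCustStore;
-- StoreSalesCustAddress is left as it was.
REFRESH MATERIALIZED VIEW StoreSalesCustStore CASCADE;
```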

Querying the SVL_MV_REFRESH_STATUS system view reveals the refresh sequence of the materialized views. When you start a cascading refresh on StoreSalesCustAddress, the system follows the dependency chain we established: StoreSalesCust is refreshed first, followed by StoreSalesCustStore, and finally StoreSalesCustAddress. This shows how the refresh operation respects the hierarchical structure of our materialized views.

Figure: Result of the SQL query on SVL_MV_REFRESH_STATUS showing a successful refresh of the three materialized views.
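
A query along the following lines could be used to inspect the refresh sequence. The column selection and the name filter are assumptions for this sketch; adjust them to the schema and naming in your own environment.

```sql
-- List recent refreshes of the three example views, oldest first,
-- so the order in which the cascade processed them is visible.
SELECT mv_name,
       starttime,
       endtime,
       status,
       refresh_type
FROM svl_mv_refresh_status
WHERE mv_name ILIKE 'storesalescust%'  -- matches all three example views
ORDER BY starttime;
```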

Considerations

Consider a dependency chain where StoreSalesCust (A) → StoreSalesCustStore (B) → StoreSalesCustAddress (C).

  • Cascading refresh behavior works as follows:
    • When you refresh C with CASCADE: A, B, and C are all refreshed.
    • When you refresh B with CASCADE: only A and B are refreshed.
    • When you refresh A with CASCADE: only A is refreshed.
    • If you need to refresh A and C but not B, you must run separate refresh operations without CASCADE: first refresh A, then refresh C directly.
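
The last case can be sketched as two separate statements. RESTRICT is the default behavior when CASCADE is omitted and is spelled out here only for clarity. Because C is defined on B, it would presumably reflect whatever B held at its last refresh rather than the newly refreshed contents of A.

```sql
-- Refresh A and C individually, leaving B untouched.
REFRESH MATERIALIZED VIEW StoreSalesCust RESTRICT;         -- A
REFRESH MATERIALIZED VIEW StoreSalesCustAddress RESTRICT;  -- C
```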

Best practices for materialized views

  • Optimize the source query: Start with a well-optimized SELECT statement for your materialized view. This is particularly important for views that require a full recompute on each refresh.
  • Plan your refresh strategy: You can't use AUTO REFRESH YES when creating materialized views that depend on other materialized views. Instead, implement orchestrated refresh patterns using the Redshift Data API with Amazon EventBridge and AWS Step Functions for workflow management.
  • Distribution and sort keys: Configure distribution and sort keys on materialized views based on their query patterns to optimize performance. Well-chosen keys improve query speed and reduce I/O operations.
  • Consider incremental refresh: Where possible, design materialized views to support incremental refresh, which updates only the changed data rather than rebuilding the whole view and significantly improves refresh performance.
  • Learn about automated materialized views (AutoMV): To further improve workload performance, this intelligent system monitors your workload and automatically creates materialized views. For more information about this feature, refer to Automated materialized views.
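
As a hedged illustration of the distribution key, sort key, and refresh strategy points above, the following sketch defines a variant of the first view with explicit table attributes. The view name StoreSalesCustKeyed and the particular key choices are hypothetical; the right keys depend on how your own queries join and filter the view.

```sql
-- Hypothetical variant of StoreSalesCust with explicit keys.
-- DISTKEY on the store key assumes later joins are on ss_store_sk;
-- SORTKEY on the customer ID assumes frequent filtering by customer.
CREATE MATERIALIZED VIEW StoreSalesCustKeyed
DISTKEY (ss_store_sk)
SORTKEY (c_customer_id)
AUTO REFRESH NO  -- refresh is orchestrated externally, as discussed above
AS
SELECT cust.c_customer_id,
       cust.c_first_name,
       cust.c_last_name,
       sales.ss_item_sk,
       sales.ss_store_sk,
       sales.ss_quantity,
       cust.c_current_addr_sk
FROM store_sales sales INNER JOIN customer cust
ON sales.ss_customer_sk = cust.c_customer_sk;
```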

Clean up

Complete the following steps to clean up resources:

  • Delete the Amazon Redshift provisioned cluster or Redshift Serverless endpoints created for this example

or

  • Just drop the materialized views you created for testing
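
If you only drop the views, one cautious approach is to remove them in reverse dependency order, so that no view is dropped while another still depends on it. A minimal sketch with the view names from this post:

```sql
-- Drop the most dependent view first, then work back toward the base view.
DROP MATERIALIZED VIEW IF EXISTS StoreSalesCustAddress;
DROP MATERIALIZED VIEW IF EXISTS StoreSalesCustStore;
DROP MATERIALIZED VIEW IF EXISTS StoreSalesCust;
```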

Conclusion

This post showed how to create nested Amazon Redshift materialized views and refresh child materialized views using the new REFRESH CASCADE option. You can quickly build and maintain efficient data processing pipelines and extend the low-latency query benefits of materialized views to your data analysis.


About the authors

Ritesh Kumar Sinha is an Analytics Specialist Solutions Architect based in San Francisco. He has helped customers build scalable data warehousing and big data solutions for over 16 years. He loves to design and build efficient end-to-end solutions on AWS. In his spare time, he loves reading, walking, and doing yoga.

Raza Hafez is a Senior Product Manager at Amazon Redshift. He has over 13 years of professional experience building and optimizing enterprise data warehouses and is passionate about enabling customers to realize the power of their data. He specializes in migrating enterprise data warehouses to modern AWS data architectures.

Ricardo Serafim is a Senior Analytics Specialist Solutions Architect at AWS. Since 2007, he has been helping companies with data warehouse solutions.
