Using Set Processing Effectively 有效地使用集合处理
Using Set Processing Effectively 有效地使用集合处理
The information in the topics that follow applies if you are developing new or upgrading older Application Engine programs to adhere to a set-based model.
如果要开发新的应用程序引擎程序或升级旧的应用程序引擎程序以遵循基于集的模型,则以下主题中的信息适用。
SQL Expertise SQL专业知识
You should be a SQL expert if you are developing row-by-row programs with Application Engine and especially if you are developing set-based programs. The following concepts are particularly important:
如果使用应用程序引擎开发逐行程序,尤其是开发基于集合的程序,那么您应该是SQL专家。以下几个概念尤为重要:
- Group by and Having clauses.
- group by 和 having 子句。
- Complex joins.
- 复杂的联接。
- Subqueries (correlated and non-correlated).
- 子查询(相关的和不相关的)。
- Tools for your database to analyze complex SQL statements for performance analysis.
- 为您的数据库分析复杂的SQL语句进行性能分析的工具。
Typically, you use these SQL constructs to refine or filter a set to contain only the rows that meet specific criteria. In Application Engine, you code using SQL, and Application Engine passes that SQL directly to the database, where it is processed. If you have a complex SQL statement that works functionally, it may not perform well if it is not tuned properly.
通常,您使用这些SQL构造来优化或筛选一个集合,使其仅包含满足特定条件的行。在应用程序引擎中,使用SQL进行编码,应用程序引擎将该SQL直接传递到数据库,并在数据库中对其进行处理。如果您有一个在功能上工作的复杂SQL语句,如果没有适当地调优,它的性能可能不会很好。
Planning 规划
Well-constructed, robust, and efficient Application Engine programs are usually the product of a detailed planning stage in which loops, program flow, the use of temporary tables, sections, steps, and so on are discussed.
构造良好、健壮且高效的应用程序引擎程序通常是详细规划阶段的产物,在该阶段中讨论了循环、程序流、临时表、节、步骤等的使用。
Ideally, you should address batch processing as a whole while you are designing the system. Sometimes, system analysts and developers focus primarily on the online system during the database design, and then they consider the batch component within the existing database design. Set processing works best in an environment in which data models are optimized for set processing.
理想情况下,您应该在设计系统时将批处理作为一个整体来处理。有时,系统分析师和开发人员在数据库设计过程中主要关注在线系统,然后在现有的数据库设计中考虑批处理组件。集合处理在数据模型为集合处理进行优化的环境中工作得最好。
For example, you could have a separate staging table for new data that has not been processed rather than having numerous cases of existing rows in a table being updated. In set processing, processing the data after moving it to a temporary table using an Insert or Select statement is easier than using an update.
Avoid performing updates on real application tables, and try to perform your updates on temporary tables. You can structure your data model to minimize updating real application tables.
例如,对于尚未处理的新数据,可以使用一个单独的staging表,而不是对表中的现有行进行多次更新。在集合处理中,使用Insert或Select语句将数据移动到临时表后再对其进行处理比使用update更容易。避免在实际应用程序表上执行更新,并尝试在临时表上执行更新。您可以构造数据模型以尽量减少实际应用程序表的更新。
Another important consideration is keeping historical data separate from active transactions. After the life cycle of given piece of transaction data is over, so that no more updates are possible, consider moving that data to an archive or history table and deleting it from the real transaction table. This action minimizes the number of rows in the table, which improves performance for queries and updates to your active data.
另一个重要的考虑是将历史数据与活动事务分开。在给定的事务数据块的生存周期结束之后,因此不再可能进行更新,可以考虑将该数据移动到归档或历史表中,并从实际的事务表中删除它。此操作可最大限度地减少表中的行数,从而提高查询和更新活动数据的性能。
Temporary Tables 临时表
Although temporary tables are not required for set processing, well-designed temporary tables complement your set-based program in a variety of ways.
虽然集合处理不需要临时表,但是设计良好的临时表可以以各种方式补充基于集合的程序。
Creating temporary tables enables you to achieve one of the main objectives of set-based processing: the processing remains on the database server. By storing transient data in temporary tables, you avoid the batch program fetching the data, row by row, and running the business rule, processing the data, and then passing the updated data back to the database. If the program ran on the client, you would encounter performance issues because of the network round-trip and the diminished processing speed of a client compared to the database platform.
创建临时表使您能够实现基于集的处理的主要目标之一:处理保留在数据库服务器上。通过将临时数据存储在临时表中,可以避免批处理程序逐行获取数据、运行业务规则、处理数据,然后将更新的数据传递回数据库。如果程序在客户机上运行,您将遇到性能问题,因为网络往返和与数据库平台相比客户机的处理速度降低。
Design your temporary tables to:
将临时表设计为:
- Hold transaction data for the current run or iteration of your program.
- 保存程序当前运行或迭代的事务数据。
- Contain only those rows of data affected by the business rule.
- 仅包含受业务规则影响的数据行。
- Present key information in a denormalized, or flattened, form, which provides the most efficient processing.
- 以非规范化或扁平化的形式呈现关键信息,这提供了最有效的处理方式。
- Switch the keys for rows coming from the master tables, if needed.
- 如果需要,为来自主表的行切换键。
A transaction may use a different key than what appears in the master tables.
事务可以使用与主表中显示的不同的键。
Denormalized Tables 非正规化表
The most efficient temporary tables store data in denormalized form. Because most programs need to access data that resides in multiple tables, you should consolidate all of the affected and related data into one table, a temporary table. The program runs more efficiently against a flattened, temporary table rather than relying on the system to materialize complex joins and views to retrieve or update necessary data for each transaction.
最有效的临时表以非规范化的形式存储数据。因为大多数程序需要访问驻留在多个表中的数据,所以应该将所有受影响的相关数据合并到一个表(临时表)中。程序在一个扁平的临时表上运行的效率更高,而不是依赖于系统来具体化复杂的连接和视图,以便为每个事务检索或更新必要的数据。
If your program requires the use of a complex view to process transactions, then resolve the view into a temporary table for your program to run against. Each join or view that needs to materialize for each transaction consumes system resources and affects performance. In this approach, the system applies the join or view once (during the filtering process), populates the temporary table with the necessary information that the program needs to complete the transaction, and then runs the program against the temporary table as needed.
如果您的程序需要使用复杂的视图来处理事务,那么请将该视图解析为一个临时表,以便程序在其上运行。需要为每个事务实现的每个联接或视图都会消耗系统资源并影响性能。在这种方法中,系统应用一次联接或视图(在过滤过程中),用程序完成事务所需的必要信息填充临时表,然后根据需要对临时表运行程序。
For example, consider the following situation:
例如,考虑下面的情况:
A program needs to update 10,000 rows in the Customer table, which contains 100,000 rows of data. The Customer table is keyed by setID. To complete the transaction, the program references data that resides in a related table called PS_SET_CNTRL_REC. PS_SET_CNTRL_REC is used to associate setID and BUSINESS_UNIT values. The transaction is keyed by BUSINESS_UNIT.
一个程序需要更新Customer表中的10,000行,该表包含100,000行数据。Customer表由setID键控。为了完成事务,程序引用驻留在名为PS_SET_CNTRL_REC的相关表中的数据。PS_SET_CNTRL_REC用于关联setID和BUSINESS_UNIT值。事务由BUSINESS_UNIT键控
Given this set of circumstances, the most efficient processing method would be similar to the following:
考虑到这组情况,最有效的处理方法将类似于以下方法:
- Isolate affected or necessary data from both tables and insert it into a temporary table.
- ·从两个表中隔离受影响的或必要的数据,并将其插入到临时表中。
Now, instead of dealing with a 10,000-row Customer table and a join to a related table, the program faces a 10,000-row temporary table that contains all of the required data to join directly to the transaction data, which can also be in a temporary table. If all necessary columns reside in the temporary tables, then the program can modify all the rows at once in a simple Update statement.
现在,程序面对的不是一个10,000行的Customer表和一个到相关表的联接,而是一个10,000行的临时表,其中包含直接联接到事务数据所需的所有数据,这些数据也可以在一个临时表中。如果所有必要的列都驻留在临时表中,那么程序可以在一个简单的Update语句中一次性修改所有行。
This example presents two different uses of temporary tables. In one situation, the temporary table is designed to hold setup and control data in a modified form. In the other situation, the temporary table is designed to hold transaction data in a denormalized form, perhaps with additional work columns to hold intermediate calculations.
本例展示了临时表的两种不同用法。在一种情况下,临时表被设计成以修改后的形式保存设置和控制数据。在另一种情况下,临时表被设计为以非规范化的形式保存事务数据,可能还有额外的工作列来保存中间计算。
- Make sure the data appears in a denormalized form for optimum processing.
- 确保数据以非规范化的形式出现,以便进行最佳处理。
- Because the transaction is keyed by BUSINESS_UNIT, you should also key the temporary table that holds the control data by BUSINESS_UNIT.
- 因为事务是由BUSINESS_UNIT键控的,所以还应该由BUSINESS_UNIT键控保存控制数据的临时表。
In this case, the table that holds the control data is the Customer table.
在这种情况下,保存控制数据的表是Customer表。
本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.kler.cn/a/136061.html 如若内容造成侵权/违法违规/事实不符,请联系邮箱:809451989@qq.com进行投诉反馈,一经查实,立即删除!