Application Integration: Select Application pattern

The various designs in the Application patterns that follow allow for solution flexibility in Application Integration, and are categorized as either Process Integration or Data Integration. These two categories enable different types of integration functionality.

Process Integration application patterns

Overview
Process Integration application patterns are observed where multiple automated business processes are combined to yield a new business offering or to provide a consolidated view of some business entity with many representations in the corporate business systems. An often-quoted example is the consolidated view of the state of all relationships of the business with a particular customer.

This mode of integration is highly flexible. In its more sophisticated form it enables "late binding" of the targets of integration and is particularly useful in tying together different platforms and technologies. However, it represents a more difficult design and development task than data integration and often requires complex middleware.

Explanation for re-engineering of Process Integration application patterns.

Figures: Process Integration application patterns; Application Integration application patterns

Relationship of Process Integration patterns to Extended Enterprise and SOA profile patterns
The Process Integration (PI) patterns introduce the following patterns: Direct Connection, Broker, Router, Serial Process, Serial Workflow, Parallel Process, Parallel Workflow, Hub, Zone, and others.
The Extended Enterprise (EE) profile adds the Exposed qualifier and Partner applications and infrastructure.
The SOA profile builds on both of these and adds the ESB, ESB Gateway, and BSC patterns.


Common Services for the Process Integration application patterns
Process Integration application patterns contain a well-defined set of services, combinations of which are used in the patterns observed in practice. These services include:

  1. Protocol adapters
  2. Message handlers
  3. Data transformation
  4. Decomposition/Recomposition
  5. Routing/Navigation
  6. State management
  7. Security
  8. Local business logic
  9. (Business) unit-of-work management


More descriptive information on these services can be found in the "Application Integration Services" section of the general guidelines page. Then, select from the following Process Integration application patterns the design that best addresses the specific requirements of your solution.

Documentation of the most frequently observed QoS concerns that you must consider when implementing integration solutions.

Figure: Process Integration application patterns

Figure: Legend for Process Integration application patterns


Direct Connection
Figure: Direct Connection application pattern. For a legend, please see above.

The Direct Connection application pattern represents the simplest interaction type and is based on a 1-to-1 topology. It allows a pair of applications within the organization to directly communicate with each other. Interactions between a source and a target application can be arbitrarily complex. Generally, complexity can be addressed by breaking down interactions into more elementary interactions.

More complex point-to-point connections have modeled connection rules, such as business rules, associated with them, as shown above. Connection rules are generally used to control the mode of operation of a connector depending on external factors. Examples of connection rules are:

  • Business data mapping rules (for adapter connectors)
  • Autonomic rules (such as priority in a shared environment)
  • Security rules
  • Capacity and availability rules
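
As a rough illustration of how connection rules can govern a connector, the following minimal Python sketch applies a list of rules to each message before handing it to the target. The `Connector` class, the field names (`sku`, `qty`, `token`), and the two rule functions are hypothetical and chosen only for this example; they are not part of the pattern definition.

```python
from typing import Callable, Dict, List

Message = Dict[str, object]
Rule = Callable[[Message], Message]


def map_fields(msg: Message) -> Message:
    """Business data mapping rule: rename source fields to target fields."""
    mapping = {"sku": "part_number", "qty": "quantity"}  # hypothetical names
    return {mapping.get(key, key): value for key, value in msg.items()}


def require_token(msg: Message) -> Message:
    """Security rule: reject messages that carry no credential."""
    if not msg.get("token"):
        raise PermissionError("missing token")
    return msg


class Connector:
    """Point-to-point connection whose behavior is controlled by rules."""

    def __init__(self, target: Callable[[Message], object], rules: List[Rule]):
        self.target = target
        self.rules = rules

    def send(self, msg: Message) -> object:
        # Apply every connection rule in order, then deliver to the target.
        for rule in self.rules:
            msg = rule(msg)
        return self.target(msg)
```

Because the rules are held in a list outside the source and target code, a rule can be added or reordered without touching either application.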


The Direct Connection application pattern has two variations:

  • Message Connection variation
  • Call Connection variation


All applications of the Direct Connection application pattern will be one variation or the other. The variation required depends on whether the initiating source application needs an immediate response from the target application in order to continue with execution.

Both variations may be used with either synchronous or asynchronous communication protocols. However, each variation has a preferred protocol type: the Call Connection variation fits more naturally with synchronous protocols, while the Message Connection variation favors asynchronous protocols.

Business and IT Drivers
The business and IT drivers for choosing the Direct Connection application pattern are to:

  • Improve the organizational efficiency
  • Reduce the latency of business events
  • Support a structured exchange within the organization
  • Support real-time one-way message flows
  • Support real-time request/reply message flows
  • Leverage existing skills
  • Leverage the legacy investment
  • Enable back-end application integration
  • Minimize application complexity


The primary goal is to allow one application to gain direct and real-time access to another in order to reduce the latency of business events.

Solution
This Application pattern, as shown in the figure above, is divided into a number of logical components:

  • The Source Application represents one or more applications that are interested in initiating an interaction with the target application.
  • The Connection is the line between the source application and the target application representing a point-to-point connection between the two applications.
  • The Connection Rules represent any business rules associated with the connection, such as data mapping rules and security rules.
  • The Target Application represents a new application, a modified existing application, or an unmodified existing application. This application is responsible for implementing the necessary business services.


Guidelines for use
Direct integration between applications can be inflexible, in that any changes to one application may have knock-on effects on other applications. Changes to the target application may also require changes to the source application. Such changes can become both expensive and time-consuming, especially when the target application is accessed by a number of different source applications.

Different IT departments may also be responsible for developing and maintaining the source and target applications. Under such a scenario, development might be difficult to coordinate, especially if the interfaces between the applications being integrated are not properly defined and documented. Because of this, it is important to clearly define such interfaces in advance.

Benefits
The Direct Connection application pattern offers the following benefits:

  • It works with applications that have simple integration requirements with only a few back-end applications.
  • It increases the organizational efficiency and reduces the latency of business events by providing real-time access to business data and business logic, and avoiding manual synchronization of data between applications.
  • Direct access to back-end applications reduces the duplication of business logic across multiple tiers. As a result, changes to business logic can be made in one tier rather than in multiple applications.
  • It can enable reuse of investments already made within the organization.


Limitations
Although this is a reasonable starting Application pattern for integrating applications in a one-to-one relationship with one another, applying it repeatedly as more applications are integrated results in a many-to-many "spaghetti" configuration, with point-to-point integration mappings for each application pair. Also, expanding this implementation into a multi-point configuration requires additional application logic to handle the coordination.
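
To make the growth of point-to-point mappings concrete, the following short Python sketch counts connections under a worst-case assumption that every pair of applications must be integrated, and compares that with a hub-style topology in which each application has a single link to a shared intermediary. Both function names are hypothetical.

```python
def point_to_point_links(n_apps: int) -> int:
    """Connections needed if every pair of applications is linked directly."""
    return n_apps * (n_apps - 1) // 2


def hub_links(n_apps: int) -> int:
    """Connections needed if each application links only to a shared hub."""
    return n_apps


for n in (3, 10, 20):
    print(n, point_to_point_links(n), hub_links(n))
```

Under this assumption, 10 applications need 45 direct connections but only 10 hub links, and the gap widens quadratically as applications are added.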

This pattern cannot be used for intelligent routing of requests, for decomposition and recomposition of requests, or for invoking a complex business process workflow as a result of a request from another application. Under such circumstances, you should consider a more advanced Application pattern, such as Broker or Serial/Parallel Process.

Putting the Application pattern to use
ITSO Electronics, an electronics retailer/wholesaler, wants to integrate its retail and wholesale departments. Currently, both organizations have proven IT infrastructures but have no interconnectivity. The first process ITSO Electronics wants to focus on is the inventory and order replenishment process. Currently, the items sold are tallied at the end of the month by the retail ordering process and delivered to the wholesale organization by internal mail. This creates a lag in the inventory replenishment process and causes many out-of-stock situations. A primary business goal is to minimize the loss of sales due to out-of-stock situations. To meet these requirements, ITSO Electronics chooses the Direct Connection application pattern.

Message Connection variation
Figure: Message Connection variation. For a legend, please see above.

The Message Connection variation, shown in the figure above, applies to solutions where the business process does not require a response from the target application within the scope of the interaction.

Business and IT Drivers
The business and IT driver for choosing the Message Connection variation of the Direct Connection application pattern is to:

  • Support real-time one-way message flows


The main driver for selecting this variation is that the business process has no interest in the result of the operation. This variation is also the most natural fit when message-oriented middleware, such as IBM WebSphere MQ, is used.

Putting the Application pattern to use
In our scenario the retail department of the ITSO Electronics organization needs to notify the wholesale department to update their inventory records when a part needs to be ordered. The retail department does not require any acknowledgement of this request. To meet these requirements ITSO Electronics chooses the Message Connection variation of the Direct Connection application pattern.
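
A minimal Python sketch of this one-way flow follows. An in-process `queue.Queue` stands in for the message-oriented middleware, and the function names and message fields are hypothetical. The point it illustrates is that the source returns as soon as the message is enqueued and never sees a reply.

```python
import queue

# Stands in for a middleware queue between retail and wholesale (assumption).
inventory_channel = queue.Queue()


def notify_reorder(part_number: str, quantity: int) -> None:
    """Source (retail) side: enqueue the notification and return at once."""
    inventory_channel.put({"part_number": part_number, "quantity": quantity})


def drain_into(on_order: dict) -> int:
    """Target (wholesale) side: apply all waiting messages to its records.

    Returns the number of messages handled.
    """
    handled = 0
    while True:
        try:
            msg = inventory_channel.get_nowait()
        except queue.Empty:
            return handled
        part = msg["part_number"]
        on_order[part] = on_order.get(part, 0) + msg["quantity"]
        handled += 1
```

The target can process the queue on its own schedule, which is why this variation pairs well with asynchronous protocols.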

Call Connection variation
Figure: Call Connection variation. For a legend, please see above.

The Call Connection variation, shown in the figure above, applies to solutions where the business process depends on the target application to process a request and return a response within the scope of the interaction.

Business and IT drivers
The business and IT driver for choosing the Call Connection variation of the Direct Connection application pattern is to:

  • Support real-time request/reply message flows


The main driver for selecting this variation is that the business process requires a result message from the interaction.

Putting the Application pattern to use
In our scenario the retail department of the ITSO Electronics organization needs to be advised by the wholesale department of the expected delivery date of a part on order that is out of stock with the retail department. To meet these requirements ITSO Electronics chooses the Call Connection variation of the Direct Connection application pattern.
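
The request/reply shape of this scenario can be sketched in Python as follows. The `WholesaleService` class, its lead-time table, and the function names are hypothetical stand-ins for the target application; the sketch only shows that the source cannot continue until the reply is returned.

```python
from datetime import date, timedelta


class WholesaleService:
    """Hypothetical target application that answers delivery-date requests."""

    def __init__(self, lead_times_days: dict):
        self.lead_times_days = lead_times_days

    def expected_delivery(self, part_number: str, today: date) -> date:
        return today + timedelta(days=self.lead_times_days[part_number])


def advise_customer(wholesale: WholesaleService, part_number: str, today: date) -> str:
    # The source blocks here: it needs the reply to build its own answer.
    delivery = wholesale.expected_delivery(part_number, today)
    return f"{part_number} expected on {delivery.isoformat()}"
```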


Broker
Figure: Broker application pattern. For a legend, please see above.

The Broker application pattern, shown in the figure above, is based on a 1-to-N topology that separates distribution rules from the applications. It allows a single interaction from the source application to be distributed to multiple target applications concurrently. This application pattern reduces the proliferation of point-to-point connections.

The Broker application pattern applies to solutions where the source application starts an interaction that is distributed to multiple target applications within the organization. It separates the application logic from the distribution logic based on broker rules. The decomposition/recomposition of the interaction is managed by the Broker Rules tier.

The Broker pattern reuses the Direct Connection pattern to provide connectivity between the tiers. The Broker Rules may support Message variation or Call variation (or both variations) of the Direct Connection pattern.

The Broker application pattern was previously known as the Aggregator application pattern for read intent calls and the Broker application pattern for Messages and update intent calls. However, this distinction was found to be of insufficient value to warrant a separate pattern, so it has been dropped from the revised PI patterns.

The Broker application pattern is also used as the Application pattern for the Pub/Sub Runtime variation.

Business and IT Drivers
The primary business driver for selecting this Application pattern is to allow one application to interact with one or more of multiple target applications. Using a hub-and-spoke architecture instead of a point-to-point architecture allows for the seamless integration of applications while minimizing the complexity. A request for information can be routed to one of many targets or simultaneously to multiple targets. The resulting request message can be decomposed into multiple request messages, and the reply messages then recomposed into a single reply message using appropriate recomposition rules.

This externalization of routing, decomposition, and recomposition rules from individual source and target applications increases the maintainability and flexibility and reduces the enterprise wide integration complexity.

This Application pattern is particularly important when a processing request requires execution of multiple interactions concurrently, or where the source application should be relieved of the need to know anything about its targets.

The primary IT driver for selecting this Application pattern is to allow loose coupling of clients and services with minimum modification to each. The solution should allow for multiple transmission protocols to be used and for transformation of protocols between client and service.

Solution
This Application pattern, as shown above, is divided into a number of logical components:

  • The Source Application tier represents one or more applications that are interested in interacting with the target applications.
  • The Broker Rules tier reduces the proliferation of direct connections. In addition, it supports message routing, decomposition and recomposition, message enhancement and transformation. These rules are often captured as business rules that govern the behavior of the broker tier. This tier also uses a work-in-progress data store to retain the intermediate results from the responses coming back from target applications until all the necessary responses are received.
  • The Target Application tier represents new, modified existing, or unmodified existing applications. These applications are responsible for implementing the necessary business services.
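
The role of the work-in-progress data store in the Broker Rules tier can be sketched as below. This is a minimal in-memory Python illustration with hypothetical names (`WorkInProgressStore`, `open`, `add_reply`); a real broker would typically persist this state, and the sketch makes no claim about how any particular product implements it.

```python
class WorkInProgressStore:
    """Holds partial replies until every expected target has answered."""

    def __init__(self):
        # request_id -> {"expected": set of targets, "replies": dict}
        self._pending = {}

    def open(self, request_id: str, expected_targets) -> None:
        """Record which targets a decomposed request was sent to."""
        self._pending[request_id] = {
            "expected": set(expected_targets),
            "replies": {},
        }

    def add_reply(self, request_id: str, target: str, reply: dict):
        """Store one reply; return all replies once complete, else None."""
        entry = self._pending[request_id]
        entry["replies"][target] = reply
        if set(entry["replies"]) != entry["expected"]:
            return None  # still waiting for other targets
        del self._pending[request_id]
        return entry["replies"]  # ready for recomposition
```

The source application sees only the final recomposed reply; the intermediate state lives entirely in the broker tier.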


Guidelines for use
To increase the flexibility of the solution and its responsiveness to changing business requirements, pay particular attention to the definition of reusable messages/services that pass through the Broker tier.

Robust transaction processing systems should be used to implement the back-end applications to ensure availability, scalability, and performance.

A decomposition implementation (one source call to multiple target calls) requires state persistence and re-composition of the response messages. Standards should be used where possible to minimize future changes required to the source and target applications.

Benefits
The benefits of this Application pattern are:

  • It allows the integration of multiple, diverse applications.
  • It minimizes the impact to existing applications.
  • The Broker tier provides routing services, relieving the source application from being aware of the target application.
  • The Broker tier can provide transformation services that allow the source and target to use different communication protocols.
  • The Broker tier can provide decomposition/recomposition of messages, allowing one request from the source to be satisfied using multiple target applications. The fact that the response is a composite of multiple requests and responses is hidden from the source application.
  • The Broker tier minimizes the impact of changes in location of the target application.


Limitations
Logic must be implemented at the broker for routing and decomposition/recomposition tasks.

Putting the Application pattern to use
ITSO Electronics consists of multiple Retail stores and Wholesale departments. The Retail stores get their supplies from the Wholesale departments and need to request the delivery dates of those supplies before ordering. Currently there is no integration of the Retail and Wholesale applications. All interaction between the two is done over the phone or by mail. A solution must be found to allow Retail stores to request delivery dates from the Wholesale departments. To eliminate the need for the Retail departments to know which Wholesale department carries which supplies, a Broker is needed to take incoming requests and direct them based on part numbers to the Wholesale department that carries them. In the event that a part is carried by multiple Wholesale departments, the broker must get delivery dates from each and return to the Retail department the best date and the Wholesale department that can supply it.
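
The broker logic in this scenario might look roughly like the following Python sketch. The `carries` catalog and the `quote` callback are hypothetical stand-ins for the broker rules and the Wholesale applications. For simplicity the sketch queries the departments one after another, whereas the pattern allows the requests to be distributed concurrently.

```python
from datetime import date
from typing import Callable, Dict, Optional, Set, Tuple


def best_delivery(
    part_number: str,
    carries: Dict[str, Set[str]],
    quote: Callable[[str, str], date],
) -> Optional[Tuple[str, date]]:
    """Return (department, date) for the earliest delivery, or None."""
    # Decomposition: one request per department that carries the part.
    quotes = [
        (quote(dept, part_number), dept)
        for dept, parts in sorted(carries.items())
        if part_number in parts
    ]
    if not quotes:
        return None
    # Recomposition: collapse the replies into the single best answer.
    best_date, dept = min(quotes)
    return dept, best_date
```

The Retail application supplies only a part number; which departments were consulted is hidden behind the broker.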

Router variation
Figure: Router variation. For a legend, please see above.

The Router variation of the Broker application pattern, shown in the figure above, applies to solutions where the source application initiates an interaction that is forwarded to at most one of multiple target applications.

Where the Broker application pattern enables 1:N connectivity, the Router variation enables 1:1 connectivity, with the Router Rules tier selecting the target.

The Router variation of the Broker application pattern was previously known as the Router variation of the Aggregator application pattern. [The Aggregator application pattern facilitates multi-point request for information integration between applications.]

Business and IT Drivers
The primary business driver for selecting this Application pattern is similar to that of the Broker application pattern. The difference lies in the fact that the Router tier routes the request to only one of multiple target applications. The requirement for transformation of message and interface format still applies. Externalizing the routing from individual source and target applications increases the maintainability and flexibility and reduces the enterprise wide integration complexity.

This Application pattern is particularly important when a processing request requires the source application to be relieved of the need to know anything about its targets.

The primary IT driver for selecting this Application pattern is to allow loose coupling of clients and services with minimum modification to each. The solution should allow for multiple transmission protocols to be used and for transformation of protocols between client and service.

Solution
This Application pattern provides a routing function to allow any attached (initiating) application using a single router link to connect to one of multiple target applications. While access to multiple applications is supported, at any given time an application is connected to only one other application. This Application pattern, as shown in the figure above, is divided into a number of logical components:

  • The Source Application tier represents one or more applications that are interested in interacting with the target applications, one target at a time.
  • The Router Rules tier represents any business rules associated with the message handling, such as routing and transformation. It receives requests from multiple source applications and routes them intelligently to the appropriate target applications. The resulting integration is essentially a point-to-point connection between source and target. This tier implements minimal business logic.
  • The Target Application tier represents new, modified existing, or unmodified existing applications. These applications are responsible for implementing the necessary business services.
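
The behavior of the Router Rules tier can be sketched as an ordered table of rules, each pairing a predicate with a target. The following Python illustration uses hypothetical names (`Router`, `add_route`, `route`) and is only one of many ways to express such rules; the essential property is that at most one target receives each request.

```python
from typing import Callable, List, Optional, Tuple

Request = dict
Predicate = Callable[[Request], bool]
Target = Callable[[Request], object]


class Router:
    """Forwards each request to at most one target, chosen by rules."""

    def __init__(self) -> None:
        self._routes: List[Tuple[Predicate, Target]] = []

    def add_route(self, predicate: Predicate, target: Target) -> None:
        """Register a routing rule; earlier rules take priority."""
        self._routes.append((predicate, target))

    def route(self, request: Request) -> Optional[object]:
        for predicate, target in self._routes:
            if predicate(request):
                return target(request)  # first match wins; no fan-out
        return None  # no rule matched this request
```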


Guidelines for use
The guidelines for this application pattern are the same as those for the Broker application pattern.

Benefits
The benefits of this Application pattern are:

  • It allows the integration of multiple, diverse applications.
  • It minimizes the impact to existing applications.
  • It provides routing services, relieving the source application from being aware of the target application.
  • It provides transformation services that allow the source and target to use different communication protocols.
  • The use of a router minimizes the impact of changes in location of the target application.


Limitations
With the Router variation, the router has limited ability to manipulate requests. It performs intelligent routing and protocol transformation, but it cannot send simultaneous requests to multiple target applications based on one incoming request, nor can it decompose and recompose requests.

Putting the Application pattern to use
ITSO Electronics consists of multiple Retail stores and Wholesale departments. The Retail stores get their supplies from the Wholesale departments and need to request the delivery dates of those supplies before ordering. Currently there is no integration of the Retail and Wholesale applications. All interaction between the two is done over the phone or by mail. A solution must be found to allow the Retail stores to request delivery dates from the Wholesale departments. To eliminate the need for the Retail departments to know which Wholesale department carries which supplies, a Router is needed to take incoming requests and direct them based on part numbers to the Wholesale department that carries them. This differs from the example outlined in the Broker pattern in that only one Wholesale department carries a given part. There is no need to distribute one request to multiple Wholesale departments simultaneously to see which can supply the part at the earliest date.


Serial Process
Figure: Serial Process application pattern. For a legend, please see above.

The Serial Process Application pattern, shown in the figure above, extends the 1:N topology provided by the Broker Application pattern. It facilitates the sequential execution of business services hosted by several target applications. Therefore, it enables the orchestration of a serial business process in response to an interaction initiated by the source application.

Business and IT Drivers
The primary business driver for selecting this Application pattern is to support the composition of end-to-end business process flows by leveraging business services implemented by several target applications. From an IT perspective, the key driver for selecting this Application pattern is improving the flexibility and responsiveness of IT by externalizing the process flow logic from individual applications.

Solution
The Serial Process Application pattern is broken down into three logical tiers:

  • The Source Application tier is the same as for the Broker Application pattern.
  • The Serial Process Rules tier supports most of the services provided by the Broker tier in the Broker Application pattern, including routing of requests, protocol conversion, message broadcasting, and message decomposition and recomposition. In addition, it supports the separation of business process flow logic from individual application logic. The process logic is governed by serial process rules that define execution rules for each target application, together with control flow and data flow rules. It may also include any necessary adapter rules.
    The combination of these process execution rules is stored in read-only databases. This externalization of process flow logic is essential for the implementation of a flexible and responsive IT environment that can respond quickly to changing business needs. It also makes it possible to compose new end-to-end processes by combining different business services provided by different applications. Finally, this tier uses a work-in-progress (WIP) database to store the intermediate results from the execution of different process steps.
  • The Target Application tier is the same as for the Broker Application pattern.
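
One way to picture externalized serial process rules is as plain data that a small engine interprets, with intermediate results kept in a work-in-progress store. The Python sketch below is a simplified illustration under that assumption; the rule format, the service names (`check_stock`, `quote`), and the pricing logic are all hypothetical.

```python
def check_stock(order):
    """Hypothetical target service: is the quantity available?"""
    return order["qty"] <= 10


def quote(order, in_stock):
    """Hypothetical target service: price depends on availability."""
    return order["qty"] * (5 if in_stock else 7)


SERVICES = {"check_stock": check_stock, "quote": quote}

# Process rules held as data, separate from any application's code.
# Each step names a service, where its inputs come from in the WIP
# store (data flow), and where its result is stored.
PROCESS_RULES = [
    {"service": "check_stock", "inputs": {"order": "request"}, "output": "in_stock"},
    {"service": "quote", "inputs": {"order": "request", "in_stock": "in_stock"}, "output": "price"},
]


def run_serial_process(rules, services, request):
    """Execute the steps strictly in order, recording results in WIP."""
    wip = {"request": request}  # in-memory stand-in for the WIP database
    for step in rules:
        service = services[step["service"]]
        inputs = {name: wip[source] for name, source in step["inputs"].items()}
        wip[step["output"]] = service(**inputs)
    return wip
```

Changing the order of steps, or adding a step, means editing `PROCESS_RULES` rather than any of the services.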


Guidelines for use
The flexibility and responsiveness provided by this Application pattern heavily depend on the externalization of process execution logic from individual applications. Applications with designs based on a service-oriented architecture (SOA) approach, which have well-defined and coarse-grained business services that represent a unit of work, are better suited for participation in this Application pattern. You must be able to compose these business services into an end-to-end process flow. A given service may need to participate in more than one end-to-end process.

Typically, legacy applications are not designed with this thinking in mind. Similarly, many legacy applications have significant amounts of process logic embedded within them. These constraints in existing environments may pose challenges to fully implementing the vision promised by this Application pattern. Careful refactoring of legacy and packaged applications by wrapping them into business services is a good starting point for the eventual widespread implementation of this Application pattern within an enterprise.

Composition of process flows by tying together different applications may introduce the need for compensating transaction support. This is especially the case when certain participating target applications do not leverage XA-compliant transaction processing engines. In such cases, it may be necessary to design compensating transaction pairs for every affected transaction and execute them if there is a need to reverse a particular portion of the process flow. You may need to modify participating legacy and packaged target applications to introduce compensating transactions if they do not already implement such mechanisms.
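
The idea of compensating transaction pairs can be sketched as follows. This minimal Python example, with the hypothetical function name `run_with_compensation`, pairs each step with an action that undoes it and, on failure, runs the compensations for the steps that already completed in reverse order. It is a simplification: it assumes the compensations themselves succeed and keeps no persistent record of progress.

```python
def run_with_compensation(steps):
    """Run (action, compensation) pairs in order.

    If an action raises, undo the completed steps in reverse order and
    return False; return True if every action succeeds.
    """
    completed = []  # compensations for the steps that have succeeded
    try:
        for action, compensate in steps:
            action()
            completed.append(compensate)
    except Exception:
        for compensate in reversed(completed):
            compensate()
        return False
    return True
```

For example, if a "reserve stock" step succeeds and a later "charge account" step fails, only the stock reservation is reversed, because the charge never completed.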

Finally, pay particular attention to the Business Process Management capabilities supported by the business process design tools and the process execution engines when you select middleware products that facilitate automation of business processes. The eventual goal is to enable business users to compose business processes and make necessary changes with minimal involvement from IT professionals. The business processes that are defined must be easily exported into a process execution engine. More sophisticated business process management tools allow for the definition of metrics during the process design to measure the effectiveness of process implementation and support monitoring of the metrics in the process execution engine.

Benefits
The Serial Process Application pattern improves the flexibility and responsiveness of an organization by implementing end-to-end process flows and by externalizing process logic from individual applications. In addition, it provides a foundation for automated support for Business Process Management that enables the monitoring and measurement of the effectiveness of business processes.

Limitations
This Application pattern is ideally suited for straight-through processing where human interactions are not necessary to complete an end-to-end process. If support for human interactions is needed to complete certain process steps, consider the Workflow variation of this Application pattern.

Putting the Application pattern to use
ITSO Electronics wants to integrate its retail department with its two inventory wholesale departments, namely Wholesale A and Wholesale B. Currently, these three departments have proven IT infrastructures but no interconnectivity. ITSO Electronics wants to focus on automating the inventory replenishment process.

Typically, the retail department places orders with Wholesale A. However, when Wholesale A is unable to guarantee delivery within seven days, Wholesale B is contacted to check the anticipated delivery date. The order is then placed with the department that guarantees the earliest delivery date.

To meet these business process automation requirements, ITSO Electronics chooses the Serial Process Application pattern. The primary driver for this selection is the need to externalize process logic from individual applications. This promotes flexibility and responsiveness to changing business needs.
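
The serial flow in this scenario can be sketched in Python as below. The two callables stand in for requests to the Wholesale A and Wholesale B applications and return an anticipated delivery time in days; the function name and the tie-breaking choice (prefer Wholesale A when both quote the same number of days) are assumptions made for the example.

```python
def choose_wholesaler(quote_a, quote_b, limit_days=7):
    """Ask Wholesale A first; consult Wholesale B only if A is too slow."""
    days_a = quote_a()
    if days_a <= limit_days:
        return ("Wholesale A", days_a)
    # Second step runs only after the first has completed (serial flow).
    days_b = quote_b()
    if days_b < days_a:
        return ("Wholesale B", days_b)
    return ("Wholesale A", days_a)
```

Because the decision logic sits outside both wholesale applications, the seven-day threshold can change without modifying either of them.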

Serial Workflow variation
Figure: Workflow variation. For a legend, please see above.

The Serial Workflow variation of the Serial Process Application pattern, shown in the figure above, extends the basic serial process orchestration capability by supporting human interaction for completing certain process steps.

Business and IT Drivers
All the business and IT drivers listed under the Serial Process Application pattern apply to this variation as well. The additional business driver for selecting this variation is the need to support human interaction and intervention within the process flow. Support for long-running transactions is another IT driver, which is often a prerequisite for the automation of complex process flows involving human interaction.

Solution
The Serial Workflow variation is broken down into three logical tiers:

  • The Source Application tier is the same as for the Serial Process Application pattern.
  • The Serial Workflow Rules tier supports all the services provided by the serial process rules tier within the Serial Process Application pattern. In addition, it supports certain tasks within the process to be routed to people for completion. To accomplish this, the process execution rules are augmented with task-resource relationships that define which resources are capable of performing which tasks.
    In this context, note the following points:
    • A task is a portion of the end-to-end process.
    • Resources are capable of executing these tasks.
    • People, departments, and target applications can all be resources capable of executing a particular task.
    This tier resolves the task-resource relationship during the execution of a process. If the need for human interaction is identified, the task is added to a work list associated with an individual or a department as a work item to be completed by a person. The process is typically suspended until the completion of the task.
    Finally, this tier provides support for long-running transactions. It uses a WIP database to store the intermediate results from the execution of different process steps until the complete execution of the end-to-end process.
  • The Target Application tier is the same as for the Serial Process Application pattern.
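
Task-resource resolution and work lists can be illustrated with the following minimal Python sketch. The `WorkflowEngine` class and its methods are hypothetical. A resource is modeled either as a callable (an automated target application) or as a string naming a person or department, which is an assumption made to keep the example short.

```python
class WorkflowEngine:
    """Resolves each task to a resource; human tasks go to a work list."""

    def __init__(self, task_resources: dict):
        # task name -> callable (application) or str (person/department)
        self.task_resources = task_resources
        self.work_lists = {}  # person/department -> pending work items

    def dispatch(self, task: str, payload: dict) -> dict:
        resource = self.task_resources[task]
        if callable(resource):
            # Automated resource: execute immediately.
            return {"status": "done", "result": resource(payload)}
        # Human resource: queue a work item and suspend the process.
        item = {"task": task, "payload": payload}
        self.work_lists.setdefault(resource, []).append(item)
        return {"status": "suspended", "waiting_on": resource}

    def complete(self, resource: str, decision) -> dict:
        """Called when a person finishes the oldest item on their list."""
        item = self.work_lists[resource].pop(0)
        return {"status": "done", "task": item["task"], "result": decision}
```

Reassigning a task from an application to a person, or the reverse, only changes the `task_resources` mapping, which mirrors the externalization of task-resource rules described above.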


Guidelines for use
The following guidelines apply to this variation in addition to the guidelines that are documented in "Serial Process Application pattern" above. We recommend that you implement people-based exception handling for the majority of the automated tasks within the process. If an automated task reaches certain error conditions, a person must be able to intervene and handle exceptions.

Benefits
The Serial Workflow Application pattern improves the flexibility and responsiveness of an organization. It does this by implementing end-to-end process flows that externalize process logic from the individual application. Further flexibility is introduced by the externalization of task-resource resolution rules. In addition, it provides a foundation for automated support for Business Process Management that enables monitoring and measurements of the effectiveness of business processes.

Limitations
This variation does not support the parallel execution of multiple tasks. Where parallel execution is required, consider the more advanced Parallel Process Application pattern or its Parallel Workflow variation.

Putting the Application pattern to use
ITSO Electronics wants to integrate its retail department with its two wholesale departments, namely Wholesale A and Wholesale B. Currently, these three departments have proven IT infrastructures but have no interconnectivity. ITSO Electronics wants to focus on automating the inventory replenishment process. Typically, the retail department places orders with Wholesale A. However, when Wholesale A is unable to guarantee delivery within seven days, Wholesale B is contacted to check the anticipated delivery date.

The main change from the scenario used in "Serial Process Application pattern" is as follows. If neither Wholesale A nor Wholesale B can offer delivery within seven days, a retail department manager must review the shortest anticipated delivery date proposed by the wholesale department systems and approve the order before it is placed. The intent of this review is to determine whether other sourcing options must be considered.

To meet these business process automation requirements, ITSO Electronics chooses the Serial Workflow variation of the Serial Process Application pattern. The primary drivers for this selection are the need to externalize process logic from the individual applications, which promotes flexibility and responsiveness to changing business needs, and the need to support human interaction.


Parallel Process
Parallel Process application pattern
For a legend, please see above.

The Parallel Process application pattern, shown above, extends the basic serial process orchestration capability provided by the Serial Process application pattern by supporting parallel (concurrent) execution of the sub-processes.

Business and IT Drivers
All the business and IT drivers listed under the Serial Process application pattern apply to this Application pattern as well. The additional business driver for selecting this pattern is the need to reduce cycle time through the parallel execution of certain portions of the process flow.

Solution
The Parallel Process application pattern is broken down into three logical tiers:

  • The Source Application tier is the same as for the Serial Process application pattern.
  • The Parallel Process Rules tier supports all the services provided by the serial process rules tier within the Serial Process application pattern. In addition, the interaction initiated by the source application may control parallel (concurrent) sub-processes on multiple target applications. Each sub-process may consist of a sequence of operations executed in succession on a target application. This parallelism requires that additional start and join conditions be defined for sub-processes executing in parallel. This requires sophisticated runtime engines that can initiate parallel threads of control, ensure these threads join upon completion, and manage them as a unit (for example to allow cancellation of the process or to report its status).
  • The Target Application tier is the same as for the Serial Process application pattern.
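
The start and join conditions described for the Parallel Process Rules tier can be illustrated with a small sketch. It assumes that Python's standard `concurrent.futures` thread pool stands in for the runtime engine; the function name `run_parallel` and the shape of the returned status report are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_EXCEPTION


def run_parallel(subprocesses, timeout=None):
    """Start all sub-processes, join them, and report on them as one unit.

    `subprocesses` maps a sub-process name to a zero-argument callable
    and must not be empty.
    """
    with ThreadPoolExecutor(max_workers=len(subprocesses)) as pool:
        # Start condition: every sub-process is submitted at once.
        futures = {pool.submit(fn): name for name, fn in subprocesses.items()}
        # Join condition: all finished, or the first failure, or the timeout.
        done, not_done = wait(futures, timeout=timeout,
                              return_when=FIRST_EXCEPTION)
        for future in not_done:
            # cancel() only stops sub-processes that have not started running.
            future.cancel()
        results, errors = {}, {}
        for future in done:
            name = futures[future]
            if future.exception() is not None:
                errors[name] = future.exception()
            else:
                results[name] = future.result()
        unfinished = sorted(futures[f] for f in not_done)
    status = "completed" if not errors and not unfinished else "failed"
    return {"status": status, "results": results,
            "errors": errors, "unfinished": unfinished}
```

Note that a thread that is already running cannot be interrupted in this sketch; leaving the `with` block waits for it to finish. Real cancellation and compensation of in-flight work is part of what makes the runtime engines mentioned above sophisticated.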


Guidelines for use
The following guidelines apply to this variation in addition to the guidelines that are documented under the Serial Process application pattern.

The implementation of parallel processes without sufficient support from the selected runtime engine would require the development of excessive custom code. The need for parallel process execution must be analyzed before middleware selection decisions are finalized.

Judicious use of parallelism is a powerful tool for reducing the cycle time of a process in the right circumstances. However, in practice, it is critical to ensure that all of the error scenarios are carefully analyzed and that the impact of these scenarios upon the end-user experience is thoroughly understood. The number of error scenarios and the processing complexity increase exponentially with the degree of parallelism. Hence, the best practice is to start with a serial process and introduce limited parallelism only where there is a clear and worthwhile benefit.
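
To give a feel for this growth, the following sketch counts outcome combinations under a deliberately simple assumption that is not part of the pattern itself: each parallel branch independently either succeeds or fails in exactly one way. The function name `count_error_scenarios` is hypothetical.

```python
from itertools import product


def count_error_scenarios(branches):
    """Count outcome combinations in which at least one branch fails."""
    outcomes = product(("ok", "error"), repeat=branches)
    return sum(1 for combination in outcomes if "error" in combination)


for branches in range(1, 5):
    print(branches, count_error_scenarios(branches))
```

Under this assumption, each additional branch roughly doubles the number of combinations to analyze. Real processes usually have several failure modes per branch, as well as timing-dependent cases, so the actual growth can be steeper.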

Benefits
In addition to providing all the benefits provided by the Serial Process application pattern, this pattern provides a foundation for the reduction of cycle times by implementing parallel processes.

Limitations
Parallel processes are more complex to design, test, and operate than serial processes.

In addition, this Application pattern is ideally suited for straight-through processing where human interactions are not necessary to complete an end-to-end process. If support for human interaction is needed to complete certain process steps, consider the Workflow variation of this Application pattern.

Putting the Application pattern to use
ITSO Electronics, an electronics retailer/wholesaler, wants to integrate its retail department with its two wholesale departments, namely Wholesale A and Wholesale B. Currently, these three departments have proven IT infrastructures but have no interconnectivity. ITSO Electronics wants to focus on automating the inventory replenishment process.

The main difference from the scenario used in the Serial Process and Serial Workflow application pattern sections is that here both wholesalers are queried in parallel to find which offers the shortest delivery time. In other words, Wholesale Dept. A is not considered the de facto supplier of parts in this scenario. The order is then automatically placed with the wholesale department that offers the shortest delivery time.

To meet these business process automation requirements, ITSO Electronics chooses the Parallel Process application pattern. The primary drivers for this selection are the need to externalize process logic from the individual applications, which promotes flexibility and responsiveness to changing business needs, and the need to reduce the cycle time of queries by sending enquiries to the two departments simultaneously to find the best delivery date.

Parallel Workflow variation
Workflow variation
For a legend, please see above.

The Parallel Workflow variation of the Parallel Process application pattern, shown above, extends the basic parallel process orchestration capability by supporting human interaction for completing certain process steps. This is the most sophisticated Process Integration Application pattern in the domain of Application Integration patterns.

Business and IT Drivers
All of the business and IT drivers listed under the Parallel Process application pattern apply to this variation as well. The additional business driver for selecting this variation is the need to support human interaction and intervention within the process flow. Support for long running transactions is another IT driver, which is often a prerequisite for the automation of complex process flows that involve human interaction.

Solution
The Parallel Workflow variation is broken down into three logical tiers:

  • The Source Application tier is the same as for the Parallel Process application pattern.
  • The Parallel Workflow Rules tier supports all the services provided by the parallel process rules tier within the Parallel Process application pattern. In addition, it allows certain tasks within the process to be routed to human actors for completion. To accomplish this, the process execution rules are augmented with task-resource relationships that define which resources are capable of performing which tasks. In this context:
    • A task is a portion of the end-to-end process.
    • Resources are capable of executing these tasks.
    • People, departments, and target applications can all be resources capable of executing a particular task.
    This tier resolves the task-resource relationship during the execution of a process. If the need for human interaction is identified, the task is added to a worklist associated with an individual or a department as a work item to be completed by a human. The process is typically suspended until the completion of the task.
    Finally, this tier provides support for long-running transactions and utilizes a work-in-progress (WIP) database to store the intermediate results from the execution of different process steps until the complete execution of the end-to-end process.
  • The Target Application tier is the same as for the Parallel Process application pattern.


Guidelines for use
The following guidelines apply to this variation in addition to the guidelines that are documented under the Parallel Process application pattern.

It is recommended that people-based exception handling be implemented for all automated tasks within the process. In other words, if an automated task reaches certain error conditions, human actors must be able to intervene and handle the exceptions.

Benefits
The Parallel Workflow application pattern improves the flexibility and responsiveness of an organization by implementing end-to-end process flows that externalize process logic from individual applications. Further flexibility is introduced by the externalization of task-resource resolution rules.

It supports the reduction of cycle time by supporting parallel execution of portions of a process flow.

In addition, it provides a foundation for automated support for Business Process Management that enables monitoring and measurement of the effectiveness of business processes.

Limitations
Only a few middleware products are capable of supporting all the capabilities needed to realize this Application pattern. If this Application pattern is implemented using middleware products that do not support the necessary capabilities, the implementation could be very complex.

Putting the Application pattern to use
ITSO Electronics, an electronics retailer/wholesaler, wants to integrate its retail department with its two wholesale departments, namely Wholesale A and Wholesale B. Currently, these three departments have proven IT infrastructures but have no interconnectivity. ITSO Electronics wants to focus on automating the inventory replenishment process.

The main difference from the scenario used in the Parallel Process application pattern section is as follows. Both wholesalers are still queried in parallel to find which offers the shortest delivery time, and the order is automatically placed with the wholesale department that offers the shortest delivery time, unless that time exceeds 10 business days. In that case, the Retail Department Manager must review the anticipated delivery date and determine whether other sourcing options must be considered.

To meet these business process automation requirements, ITSO Electronics chooses the Parallel Workflow variation of the Parallel Process application pattern. The primary drivers for this selection include the need for the externalization of process logic from the individual application, thus promoting flexibility and responsiveness to changing business requirements, the need for reducing cycle time of queries by simultaneously sending enquiries to the two departments for the best delivery date, and the need for supporting human interaction during the execution of the process flow.
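
The ITSO Electronics scenario above can be sketched in a few lines. This is an illustration only: the function `replenish`, the shape of the work item, and the use of plain callables to stand in for the wholesale department systems are all assumptions. Only the 10-day threshold comes from the scenario.

```python
from concurrent.futures import ThreadPoolExecutor

APPROVAL_THRESHOLD_DAYS = 10  # threshold taken from the scenario text


def replenish(quote_fns, worklist, threshold=APPROVAL_THRESHOLD_DAYS):
    """Query all wholesalers in parallel, then order or ask for a review.

    `quote_fns` maps a wholesaler name to a callable that returns the
    anticipated delivery time in business days.
    """
    with ThreadPoolExecutor(max_workers=len(quote_fns)) as pool:
        # Fork: send all enquiries at the same time.
        futures = {name: pool.submit(fn) for name, fn in quote_fns.items()}
        # Join: wait until every wholesaler has answered.
        quotes = {name: future.result() for name, future in futures.items()}
    supplier = min(quotes, key=quotes.get)
    days = quotes[supplier]
    if days > threshold:
        # Human interaction: add a work item for the Retail Department Manager.
        worklist.append({"task": "review_delivery_date",
                         "supplier": supplier, "days": days})
        return {"status": "suspended", "supplier": supplier, "days": days}
    return {"status": "ordered", "supplier": supplier, "days": days}
```

A call such as `replenish({"Wholesale A": quote_a, "Wholesale B": quote_b}, manager_worklist)`, where `quote_a` and `quote_b` are placeholder callables, places the order when the best quote is within the threshold, and otherwise leaves a work item for the manager and reports the process as suspended.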

Data Integration application patterns

Overview
When applications need to share information rather than coordinate processing, data integration is more appropriate than a process integration approach. Note, however, that when the frequency of data update is extremely high (for example, when integrating an order entry system with a back-end ERP system), process integration is the best solution. Otherwise, integration of (application) data repositories is handled outside of any specific application request.


Explanation for re-engineering of Data Integration application patterns.


Data Integration application patterns and variations

Business and IT drivers
Business drivers

IT drivers

Data Integration application patterns




Federation
The Federation application pattern is a basic Data Integration application pattern that provides access to many diverse data sources and provides the appearance that these sources are a single logical data store. This appearance is delivered as follows:
1. Exposing a single consistent interface to the user (or application) that invokes the function
2. Translating that interface to whatever interface is needed for the underlying data
3. Compensating for any differences in function between the different sources
4. Allowing data from different sources to be combined into a single result set that is returned to the user

Business and IT Drivers
Federation may be required in any business process where the data needed exists in a number of different locations. Such diversity may be the result of historical, technical or organizational factors. Federation is preferred over other data integration methods, such as Population, when the access required meets one or more of the following criteria:

  • (Near) real-time access is needed to rapidly changing data.
  • Making a consolidated copy of the data is not possible for technical, legal or other reasons.
  • Read/write access to the data is required, rather than read-only.
  • Reducing or limiting the number of copies of the data is a goal.
The Federation application pattern's connector/adapter design allows for improved maintainability, minimized TCO, leveraging of existing technology investments, and reduced deployment and implementation costs.

Federation application pattern
Data Integration application patterns legend
When called by an application, Federation uses its metadata store to determine where and in what format the required data is stored. Metadata mapping also enables the decomposition of the unified query into requests to each individual repository. The information model thus appears as one unified virtual repository to users. Using adapters for each target repository, data is accessed and retrieved. Based on its knowledge of functionality, performance, and other factors, Federation determines the optimal plan for performing the incoming query, pushing down function to the remote data stores or compensating for missing function locally, and storing intermediate results in the local temporary store. Federation then returns a single result to the calling application, thus integrating the multiple disjoint formats into a common federated schema.
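
The following sketch illustrates the idea of pushing function down to a capable source and compensating locally for a source that lacks it. It is an assumption-laden toy, not a description of any federation product: the adapter classes, the logical fields `sku` and `qty`, and the single `min_qty` filter are hypothetical, and the `column_map` dictionaries play the role of the metadata store.

```python
import sqlite3


class SqlAdapter:
    """Adapter for a source that can evaluate a filter itself."""
    can_filter = True

    def __init__(self, conn, table, column_map):
        self.conn, self.table, self.column_map = conn, table, column_map

    def fetch(self, min_qty=None):
        # Translate logical field names into the source's own column names.
        cols = ", ".join(f"{src} AS {logical}"
                         for logical, src in self.column_map.items())
        sql, params = f"SELECT {cols} FROM {self.table}", ()
        if min_qty is not None:
            sql += f" WHERE {self.column_map['qty']} >= ?"
            params = (min_qty,)
        cursor = self.conn.execute(sql, params)
        names = [d[0] for d in cursor.description]
        return [dict(zip(names, row)) for row in cursor]


class ListAdapter:
    """Adapter for a source with no query capability of its own."""
    can_filter = False

    def __init__(self, records, column_map):
        self.records, self.column_map = records, column_map

    def fetch(self, min_qty=None):
        # This source can only return everything, renamed to logical fields.
        return [{logical: rec[src] for logical, src in self.column_map.items()}
                for rec in self.records]


def federated_query(adapters, min_qty):
    """Return one combined result set across all sources."""
    combined = []
    for adapter in adapters:
        if adapter.can_filter:
            rows = adapter.fetch(min_qty=min_qty)   # push the filter down
        else:
            # Compensate locally for the missing filter function.
            rows = [r for r in adapter.fetch() if r["qty"] >= min_qty]
        combined.extend(rows)
    return sorted(combined, key=lambda r: r["sku"])
```

The caller sees only the logical fields and a single result set; where the filter is actually evaluated is decided per source, which is the essence of the planning step described above.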

Federation supports both structured and unstructured data, as well as read-only and read/write access to the underlying data stores. Read/write access is best limited to single remote sources, in part because of fundamental theoretical limitations in support for two-phase commit in a fully distributed environment.

Federation=Cache variation pattern
For a legend please see the Federation application pattern image.

Local temporary storage can be used to cache data returned from read-only queries to remote data sources. Under defined circumstances, this cache can be used to speed up query response time or to compensate for a data source that is temporarily offline. Such function must be used carefully, however, as the cached data and its underlying source may no longer be in sync (there may be a latency involved).
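
A minimal sketch of such a cache follows. It assumes a time-based freshness rule and signals an unavailable source with Python's built-in `ConnectionError`; the class name `CachedSource`, the `max_age_seconds` parameter, and the `(value, is_stale)` return convention are hypothetical choices made for illustration.

```python
import time


class CachedSource:
    """Wrap a read-only fetch function with a local, time-limited cache."""

    def __init__(self, fetch, max_age_seconds, clock=time.monotonic):
        self.fetch = fetch
        self.max_age = max_age_seconds
        self.clock = clock
        self.cache = {}  # key -> (value, time the value was stored)

    def get(self, key):
        """Return (value, is_stale)."""
        now = self.clock()
        entry = self.cache.get(key)
        if entry is not None and now - entry[1] <= self.max_age:
            return entry[0], False          # fresh enough: answer locally
        try:
            value = self.fetch(key)
        except ConnectionError:
            if entry is None:
                raise                       # nothing cached to fall back on
            return entry[0], True           # source offline: serve stale data
        self.cache[key] = (value, now)
        return value, False
```

Returning the staleness flag alongside the value makes the synchronization caveat above explicit to the caller, which can then decide whether out-of-date data is acceptable for its purpose.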

It is also possible, and often necessary, to maintain the contents of the cache. This involves the use of the Population application pattern, described below under "Population".


Population
The Population application pattern has a very simple model. It gathers data from one or more sources, processes that data in an appropriate way, and applies it to some data target. The primary business driver for population is to gather and reconcile data from multiple data sources in advance of a user's need to use this information.

In some cases, the reconciliation is sufficiently simple that it can be conceived as a single (integrated) function. In many cases, however, the transformation and restructuring is rather complex or the gathering phase has unique characteristics.
This leads to four variations on the basic Application pattern as follows:

  • Population=Multi Step variation pattern
  • Population=Multi Step Gather variation pattern
  • Population=Multi Step Process variation pattern
  • Population=Multi Step Federated Gather variation pattern


These population patterns are often applied towards business intelligence-related business problems. They can also be utilized to provide content feeds into an e-business portal of more unstructured data. This "content" can then be accessed via the portal, or even searched via basic portal search capabilities.

Business and IT Drivers
Any business need that requires a specialized copy of data (derived data) from a pre-existing source may indicate the use of the Population application pattern or one of its variation patterns. These needs are most often seen in business intelligence and content search and related applications. However, some cases are also seen in a pure operational environment, where a dedicated copy of data is needed. A key indicator is that the use of the derived data is read-only or a close approximation to it. If there are significant amounts of read/write usage of the derived data, the Two-way Synchronization pattern is indicated.

Such specialized, derived data copies may be:

  • Subsets of existing data sources: Limiting access for ease of use or understanding, for security or privacy, or for the needs of a particular business process
  • Modified versions of existing data sources: Creating point-in-time (stable) or historical versions of the source data, cleansing data of errors or changing structures
  • Combinations of existing data sources: Joining or reconciling data from multiple sources
  • Creation of a more usable and relevant organization of documents or unstructured data, built from a vast set of original documents, and based on specified selection criteria


The business objective can often be summarized as providing the user with quick access to useful information instead of bombarding the user with too much, irrelevant, incorrect, or otherwise useless misinformation.

In many cases, it is the IT drivers rather than the business drivers that dictate the use of the Population set of patterns, because the business need could often be satisfied equally well by direct access to the original sources or by access to a copy of those sources. These IT drivers include, among others:

  • Improved performance of user access
  • Protection of performance of data source systems
  • Reliability of access to and extended availability of the required data
  • Load distribution across systems


Population application pattern
For a legend please see the Federation application pattern image.

The figure above represents the basic population functionality as a "read dataset - process - write dataset" model.

  • There can be one or more source data stores that are read by the population application. These source data stores are created and maintained by other processes. The target data stores are the output of the population application. These can be the final output from the process, or can be an intermediate data store used as a source for another step in the process.
  • The extraction rules may range from a simple rule such as including all data, to a more complex rule, prescribing the extraction of only specific fields from specific records under varying conditions. Similarly, the load rules for the target data can range from a simple process of overwriting the target data store to a complex process of inserting new records and updating existing records.
  • The metadata contains the rules describing which records from the source are read, how they are modified (if needed) on their way to the target, and how they are applied to the target. The rules are depicted in this way to emphasize the best practice of having a rules-driven application, rather than hard-coding the rules in the application; this facilitates easier maintenance.
    The metadata also describes the output that the population application produces, such as statistics, timing information, and so on. In general, both source and target can contain any type of data, including structured and unstructured data. However, in the majority of the cases, this application pattern is used for moving/copying structured data from one data store to another with relatively simple manipulation of the data.
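
The "read dataset - process - write dataset" model with rules held in metadata can be sketched as follows. The function `populate` and the metadata keys (`extract_filter`, `extract_fields`, `transforms`, `key`, `load_mode`) are hypothetical, and a dictionary keyed by record identifier stands in for the target data store.

```python
def populate(source_rows, target, metadata):
    """Read, process, and write records, taking every rule from metadata."""
    stats = {"read": 0, "written": 0}
    if metadata["load_mode"] == "overwrite":
        target.clear()                       # simple load rule: replace all
    for row in source_rows:
        stats["read"] += 1
        if not metadata["extract_filter"](row):
            continue                         # extraction rule: skip this record
        record = {name: row[name] for name in metadata["extract_fields"]}
        for name, transform in metadata["transforms"].items():
            record[name] = transform(record[name])
        # Load rule: insert a new record or update the existing one.
        target[record[metadata["key"]]] = record
        stats["written"] += 1
    return stats                             # output statistics for the run
```

Because the extraction, transformation, and load rules are data rather than code, changing which fields are copied or how they are cleansed means editing the metadata, not the population application.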


Population=Multi Step variation application pattern
For a legend please see the Federation application pattern image.

Note: We have deliberately avoided using the traditional extract, transform, and load terminology in order to accommodate the emerging functionality requirements and variations of population patterns.

In the Multi Step variation of the Population application pattern, the basic population function of the Population application pattern is decomposed into its three primary constituents or steps:

  • Gather
  • Process
  • Apply


The intermediate target data created by one step acts as the source data for the subsequent step. In some cases, the temporary stores may be physically instantiated files; in more modern implementations, the data may be "piped" from one step of the population process to the next.

The figure above shows the three logical steps: Gather, Process and Apply. In most best practice implementations, these functional steps contain additional sub tasks.

  • The Gather step extracts data according to some defined rules from the source data store. This data store is typically owned by another application and used in a read/write fashion by that application. This data source may also be a special kind of data source created by system or user processes. The extraction rules may range from a simple rule such as including all data, to a more complex rule, prescribing the extraction of only specific fields from specific records under varying conditions.
    Breaking out this step recognizes that Gather may have very specific function or placement depending on the particular implementation. Gather may have to read unusual data structures or may need to take into account very specific conditions in the data source, such as "read this field as character if a particular flag is set in some other field; otherwise read as decimal". Furthermore, particular instances of Gather may have to be co-located with their corresponding data sources.
  • The Process step transforms data from an input to an output structure according to supplied rules. Processing covers a wide variety of activities, including reconciling data from many inputs, transforming data in individual fields based on predefined rules or based on the content of other fields, and so on. When two or more inputs are involved, there is generally no guarantee that all inputs will be present when required. The Process step must be able to handle this situation.
  • The Apply step loads the processed data into the target data store. Applying the target data can range from a simple process of overwriting the target data store to a complex process of inserting new records and updating existing records.
    In any population process, different Apply functions may need to be invoked under different circumstances. For example, the first time a target is loaded is a straightforward write, but later updates may require logic to determine whether data should be overwritten, appended to, or some other custom operation.
  • A common metadata store links the three steps. This store contains the metadata that describes the data to be gathered, the rules for processing it and the way to apply the resulting data to the target. It also serves as a store for information about the success or failure of each step and as a means for inter-step communication.
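
The three steps and their shared metadata store can be sketched with Python generators, which "pipe" records from one step to the next without materializing intermediate files. All names here (`gather`, `process`, `apply`, `run_population`, and the metadata keys) are hypothetical, and dictionaries stand in for the data stores and the metadata store.

```python
def gather(rows, rules, status):
    """Gather: extract selected fields from the records that qualify."""
    for row in rows:
        if rules["where"](row):
            yield {name: row[name] for name in rules["fields"]}
    status["gather"] = "ok"


def process(records, rules, status):
    """Process: add derived fields according to the supplied rules."""
    for record in records:
        derived = {name: fn(record) for name, fn in rules["derive"].items()}
        yield {**record, **derived}
    status["process"] = "ok"


def apply(records, target, rules, status):
    """Apply: overwrite the target, or insert and update by key."""
    if rules["mode"] == "overwrite":
        target.clear()
    for record in records:
        target[record[rules["key"]]] = record
    status["apply"] = "ok"


def run_population(source, target, metadata):
    """Chain the steps; the metadata store also records each step's outcome."""
    status = metadata.setdefault("status", {})
    piped = process(gather(source, metadata["gather"], status),
                    metadata["process"], status)
    apply(piped, target, metadata["apply"], status)
    return status
```

In this sketch `run_population` is the controlling function, and the `status` entry in the metadata is the simplest possible form of inter-step communication. A real implementation would also record failures, not only successful completion.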


When Population consists of multiple steps, there clearly must exist an entity that controls and orchestrates the entire set of functions. This is not shown explicitly in the diagram simply because this controlling function seldom exists as a separate entity. It may be considered to be a function of the Process step in this case.

The actual implementation of Population=Multi Step can involve fewer or more steps than the three shown here. In such cases, the steps in the figure above must be adjusted accordingly, and consideration must be given to the placement of any additional tiers. A number of special cases are treated in the variations below. It is also important to note that this application pattern has been generalized to cover any source data store and target data store.

Population=Multi Step Gather variation application pattern
For a legend please see the Federation application pattern image.

The Multi Step Gather shown here is an extension of the Population=Multi Step variation shown above that recognizes that the Gather function itself may need to occur in multiple steps.

In the figure above an independent Gather step (Gather 1) extracts a specialized subset of the data and stores it in a temporary or persistent store. This data store is read, perhaps in conjunction with the original data store, by the Gather step (Gather 2) of the Population=Multi Step variation that completes the overall population process.

There are a number of circumstances where Multi Step Gather is found for structured and unstructured data, as follows:

  • Structured data - Gathering changed data
    When used with structured data, the primary driver for the Multi Step Gather variation is to reduce the latency of updating an existing target with changes that are occurring in the source system. Without such an ability to collect only the changes occurring in the source system(s), one would need to constantly rebuild the target data by scanning the entire contents of the data source(s) resulting in a high latency of content between the source and the target. The Multi Step Gather variation is therefore applied towards all data warehouses, as well as real-time or near real-time business intelligence related business problems and operational data stores (ODS) where very low latencies between the source and target data are critical.
    In the figure above:
    • Gather 1 identifies the changes that have occurred in the data source and writes them out to a target data store containing only changed data, either every occurrence or a consolidation of multiple occurrences. The Gather method involved varies by data source. For example, with relational data sources, the data replication features of most products provide a change capture facility that collects changes as they occur and writes them out to another table. For non-relational data sources, a more complex mechanism may be required, such as having an application create an "audit" journal of all changes, or having a general purpose program compare different versions of a data source and perform a DIFF operation to identify the changes and then write them out to another relational or non-relational target. In some cases, the Gather function is specific to a single type of data source; in other cases, it can handle a number of (usually related) types.
    • The Temporary/Persistent Store can be the final output from the process (for example, in the generation of an audit journal), or can be an intermediate store used as input to the remainder of the steps as shown for updating data warehouses, data marts, or an ODS.
    • The separate metadata contains the rules describing the specific objects of interest, the frequency of collecting changes, the collection of every change occurrence or a consolidation of the changes over a given interval, and the pruning of the target data store. Here too, the rules are depicted in this way to emphasize the best practice of having a rules-driven application, rather than hard-coding the rules in the application; this facilitates easier maintenance.
  • Unstructured data - Creating indices and taxonomies
    For unstructured data, this approach is often required to create indices or taxonomies of the source documents.
    The Population=Multi Step Gather variation also replaces the "Population Crawl and Discovery" application pattern previously used for unstructured data. It provides a structure for applications that retrieve and parse documents and data, and create resulting indices, taxonomies, and other summarizations of the original data. The Multi Step Process variation described below, "Population=Multi Step Process variation pattern", may also be required.
    These result sets may include:
    • A basic index of relevant documents that match specified selection criteria
    • A categorization or clustering of common documents from the original data
    • An automatically built taxonomy of the original data, to allow for easy browsing
    • Locating expertise by automatically mapping the authors of the original data to topics of "experts", based on the contents of the documents and the categories discovered.
    The primary driver here is to provide a more usable and relevant organization of documents or unstructured data, built from a vast set of original documents, and based on specified selection criteria. The objective is to provide quick access to useful information instead of bombarding the user with too much information.
    Search engines that crawl the World Wide Web/file systems implement this variation, as well as the more advanced "discovery" search engines that perform document clustering/categorization, expertise location (that is, identify experts), and intelligent analysis of the document contents.
    This approach is best suited for selecting useful information from a huge collection of unstructured textual data. A variation of this can be used for working with other forms of unstructured data such as images, audio, and video files. In such cases additional transformation and translation services are required to parse and analyze the data.
    In the figure above:
    • Gather 1 crawls through multiple data stores, retrieving documents, parsing them, and building a result set of all documents that match the selection criteria. Alternatively, this initial step may parse the original data from multiple sources and build a single interim "index" that contains key pieces of document data and metadata.
    • This initial step then allows additional steps to summarize, categorize, create taxonomies, or locate experts from this single normalized index. In some cases, such as World Wide Web search engines, the contents of documents in one data source (that is, URL links) may actually be used to determine additional data sources to crawl. This is shown by Gather 2 using the results of Gather 1 in combination with the same or other data sources to build a more complete set of input data.
    • When the unstructured data recovered by these activities must be transformed, cleansed, or manipulated before it can be purposefully used, this is the responsibility of the Process and Apply steps. In more advanced search applications that perform document clustering and expertise identification, Multi Step Process described below, "Population=Multi Step Process variation pattern", may also be invoked.
    Further variations can be envisaged, for example, a two-step process where one step performs "search" types of activities, and the other actually populates the index from these searches. Such approaches are discussed in the redbook Patterns: Portal Search Custom Design, SG24-6881.
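
For the structured-data case described earlier in this section, the DIFF-style Gather 1 step can be sketched as a comparison of two versions of a keyed data source, followed by an optional consolidation of the captured changes. The function names and the `(operation, key, value)` tuple layout are hypothetical. The consolidation shown simply keeps the latest change per key, which assumes that the target applies inserts and updates alike as "insert or update".

```python
def diff_snapshots(old, new):
    """Compare two versions of a keyed data source and list the changes."""
    changes = []
    for key in sorted(old.keys() | new.keys()):
        if key not in old:
            changes.append(("insert", key, new[key]))
        elif key not in new:
            changes.append(("delete", key, None))
        elif old[key] != new[key]:
            changes.append(("update", key, new[key]))
    return changes


def consolidate(change_log):
    """Reduce every occurrence of a change to the latest change per key."""
    latest = {}
    for operation, key, value in change_log:
        latest[key] = (operation, key, value)
    return list(latest.values())
```

Gather 1 would write the output of `diff_snapshots` (or of `consolidate`, when only the net change over an interval is wanted) to the temporary or persistent store, from which Gather 2 and the remaining steps update the target without rescanning the full source.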


Population=Multi Step Process variation application pattern
For a legend please see the Federation application pattern image.

Like the Multi Step Gather variation described above, "Population=Multi Step Gather variation pattern", the Multi Step Process variation is also an extension of the Multi Step Population variation described above, "Population pattern". In this case, as shown in the figure above, the focus is on supporting population instances where the processing of the received data is too complex to be performed in a single pass.

In this Multi Step Process variation, the Process step is replaced by a more powerful Multi Step Process approach. Within this, the individual Process stages are more likely to be linked directly, as shown by the line connecting them, rather than through intermediate temporary stores, although this possibility is also depicted. Clearly, there may also be more than two stages.

As mentioned earlier, this Multi Step Process approach may be required when building summaries or categorizations of unstructured data, often in conjunction with the Multi Step Gather variation. The Multi Step Process variation may also be required with structured data, for example, when populating a multidimensional cube or snowflake schema from an enterprise data warehouse.

Another use of this variation is in data cleansing implementations. Data cleansing often requires multiple passes of the data to gather statistics, perform analyses, propose changes, obtain human approval, and so on. In many cases, the cleansed data may be partially written back directly to the source, as shown in the figure above.
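A multi-pass cleansing flow of this kind might look like the sketch below. It is an assumption-laden illustration, not a description of any cleansing product: the record layout, the plausibility rule for `age` (0 to 120), and the function names are all hypothetical, and human approval is reduced to a callback.

```python
from statistics import median


def pass_gather_stats(rows):
    """Pass 1: gather statistics (here, the median of the plausible ages)."""
    valid = [r["age"] for r in rows if r["age"] is not None and 0 <= r["age"] <= 120]
    return {"median_age": median(valid)}


def pass_propose_changes(rows, stats):
    """Pass 2: propose a replacement for each missing or implausible value."""
    proposals = []
    for i, r in enumerate(rows):
        if r["age"] is None or not 0 <= r["age"] <= 120:
            proposals.append((i, "age", stats["median_age"]))
    return proposals


def pass_apply(rows, proposals, approve):
    """Pass 3: apply only the proposals accepted by the approval callback."""
    cleaned = [dict(r) for r in rows]  # leave the source rows untouched
    for i, field, value in proposals:
        if approve(i, field, value):
            cleaned[i][field] = value
    return cleaned


rows = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 430},
    {"id": 4, "age": 40},
]
stats = pass_gather_stats(rows)
proposals = pass_propose_changes(rows, stats)
# Stand-in for human approval: accept every proposal.
cleaned = pass_apply(rows, proposals, approve=lambda i, field, value: True)
print(cleaned)
```

Each pass consumes the output of the previous one directly, which corresponds to the linked Process stages in the figure; writing `cleaned` back to the source would correspond to the partial write-back mentioned above.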

Population=Multi Step Federated Gather variation application pattern
For a legend please see the Federation application pattern image.

The figure above shows how the Population application pattern can be composed with the Federation application pattern as a means to gather data from one or more sources, by providing a unified query that accesses data in separated or remote structured and unstructured repositories in real-time.

Use of this variation pattern is indicated by a number of key requirements, such as reduced latency of population, reuse or extension of existing population investments, and reduced implementation or maintenance costs.

The figure above shows how the Gather step in the Population=Multi Step variation described above, "Population=Multi Step variation pattern", is replaced by a potentially synchronous "Federated Gather" step that directly accesses remote data stores, whether structured or unstructured. This access is mediated through wrappers (also known as adapters) that contain the logic to access the data, either directly or through an application API, and send the results back to the requestor tier. These requests may access multiple data stores simultaneously.

Metadata mapping enables the decomposition of a unified query into requests to each individual data store. The multiple data sources thus appear as one unified virtual data store to the requestor. In some cases, there may be separate metadata stores for the Population and Federation components, although this may lead to data consistency issues.
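The decomposition described above can be illustrated with a small sketch. It assumes two hypothetical stores (`crm` and `billing`), a hand-written metadata mapping, and a trivial dictionary-backed wrapper; none of these names come from a real federation product.

```python
# Hypothetical metadata mapping: unified field name -> (store name, local field name)
METADATA = {
    "customer_name": ("crm", "cust_nm"),
    "balance": ("billing", "bal_amt"),
}


class DictWrapper:
    """Wrapper (adapter) that hides how a single data store is accessed."""

    def __init__(self, records):
        self._records = records  # key -> record in the store's local schema

    def fetch(self, key, local_fields):
        record = self._records.get(key, {})
        return {field: record.get(field) for field in local_fields}


WRAPPERS = {
    "crm": DictWrapper({"c42": {"cust_nm": "Acme Ltd"}}),
    "billing": DictWrapper({"c42": {"bal_amt": 125.5}}),
}


def federated_query(key, unified_fields):
    """Decompose a unified query into per-store requests and merge the results."""
    # Step 1: use the metadata to group the requested fields by store.
    per_store = {}
    for field in unified_fields:
        store, local = METADATA[field]
        per_store.setdefault(store, {})[local] = field
    # Step 2: send one request per store and map the results back to unified names.
    result = {}
    for store, local_to_unified in per_store.items():
        row = WRAPPERS[store].fetch(key, list(local_to_unified))
        for local, value in row.items():
            result[local_to_unified[local]] = value
    return result


answer = federated_query("c42", ["customer_name", "balance"])
print(answer)
```

The requestor sees only the unified field names; the store names and local schemas stay inside the metadata and the wrappers.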

The Multi Step Federated Gather variation application pattern contains its own temporary/persistent store. This store can be used to cache results data obtained from remote sources, allowing continued access to remote data when the actual source is unavailable. Clearly, use of such a cache may have implications for data currency in the target.
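The caching behavior, and its effect on data currency, can be sketched as below. This is a simplified assumption of how such a store might be used: `CachedSource` and `fetch_remote` are hypothetical names, and a `ConnectionError` stands in for any kind of source outage.

```python
class CachedSource:
    """Keeps the last result per key so reads survive a source outage."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable that reads from the remote source
        self._cache = {}

    def get(self, key):
        """Return (value, is_stale); is_stale is True when served from the cache."""
        try:
            value = self._fetch(key)
        except ConnectionError:
            if key in self._cache:
                return self._cache[key], True
            raise  # nothing cached, so the outage is visible to the caller
        self._cache[key] = value
        return value, False


# Hypothetical remote source whose availability can be toggled.
remote = {"doc1": "v1"}
state = {"up": True}


def fetch_remote(key):
    if not state["up"]:
        raise ConnectionError("remote source unavailable")
    return remote[key]


source = CachedSource(fetch_remote)
fresh = source.get("doc1")   # read while the source is available
state["up"] = False
remote["doc1"] = "v2"        # the source changes during the outage
stale = source.get("doc1")   # served from the cache, so the change is not seen
print(fresh, stale)
```

The second read succeeds but returns the older value, which is the data currency trade-off noted above.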


Two-way Synchronization
This Two-way Synchronization application pattern was previously known as the Replication pattern. It enables a coordinated bidirectional update flow of data in a multi-copy database environment. It is important to highlight the "two-way" synchronization aspect of this Application pattern, as it is what distinguishes it from the "one-way" capabilities provided by the Population application patterns discussed above, "Data Integration::Population".

We focus on the two-way case here because it is of more interest in business intelligence and similar applications where the relationship between replicas is usually limited to pairs of replicas operating as true master/slaves or where the distributed read/write function is limited to a small percentage of the shared data.

Business and IT Drivers
As in the case of Population, any business need that requires a specialized copy of data (derived data) from a pre-existing source may indicate the need for the Two-way Synchronization application pattern. These needs are most often seen in business intelligence, content search, and related applications. However, some cases are also seen in a purely operational environment, where a dedicated copy of data is needed. The key indicator for Synchronization is that the use of the derived data has strong read-write characteristics.

The business and IT drivers for Two-way Synchronization are partially the same as those listed for Population, "Business and IT drivers" above. However, modern and more sophisticated business intelligence and combined operational/informational needs such as customer relationship management (CRM), call centers, customer portals, etc. place added requirements for updating the derived data. These modern business processes often require that the source and derived data are more closely synchronized than "pure" business intelligence applications, and thus need Two-way Synchronization.

As the need for synchronization increases, the differences between the source and derived data that can be handled decrease, because some transformations are fundamentally unidirectional or time-dependent. In the limit, the IT drivers for creating and managing a copy of the source have to be traded off against those for having a single copy of data and accessing that distributed data through the Federation application pattern.

Two-way Synchronization application pattern
For a legend please see the Federation application pattern image.

The figure above shows a basic two-way synchronization of data between two separate data stores. At a simplistic level, it can be compared to the basic Population application pattern described above, "Population pattern", with the only difference being that data now flows in both directions. Depending on the relationship between the data flowing in either direction, this similarity with Population may be more apparent than real. If the data elements flowing in both directions are fully independent, then Two-way Synchronization is no more than two separate instances of Population. However, it is more common to find some overlap between the data sets flowing in either direction. In this case, the need to reconcile data updates on both source/target systems means that the Two-way Synchronization pattern is rather more than two separate Population instances. A significant issue in this case is conflict detection and resolution when updates occur independently in the different data stores.
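One common way to detect such conflicts is to compare both stores against the state at the last synchronization. The sketch below assumes that approach; it is not prescribed by the pattern itself. The function names are hypothetical, stores are plain dictionaries, and deletions are deliberately ignored to keep the example short.

```python
def detect_changes(base, current):
    """Return the entries that were added or modified since the last synchronization."""
    return {key: value for key, value in current.items() if base.get(key) != value}


def synchronize(base, store_a, store_b):
    """Merge independent updates and report keys that both stores changed differently."""
    changed_a = detect_changes(base, store_a)
    changed_b = detect_changes(base, store_b)
    merged = dict(base)
    conflicts = {}
    for key in changed_a.keys() | changed_b.keys():
        in_a, in_b = key in changed_a, key in changed_b
        if in_a and in_b and changed_a[key] != changed_b[key]:
            # Overlapping update: keep the base value and leave resolution to a policy.
            conflicts[key] = (changed_a[key], changed_b[key])
        elif in_a:
            merged[key] = changed_a[key]
        else:
            merged[key] = changed_b[key]
    return merged, conflicts


base = {"x": 1, "y": 2, "z": 3}
store_a = {"x": 10, "y": 2, "z": 30}   # A changed x and z
store_b = {"x": 1, "y": 20, "z": 31}   # B changed y and z
merged, conflicts = synchronize(base, store_a, store_b)
print(merged, conflicts)
```

The updates to `x` and `y` are independent and behave like two separate Population flows; only `z`, changed on both sides, needs conflict resolution.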

As indicated by the dotted boxes enclosing the source/target data stores and their controlling applications in the figure above, the Two-way Synchronization pattern may act directly at the data level or at the application level. However, from the viewpoint of Data Integration, the interactions are more likely to be at the data level, while in Process Integration the interactions are more often at the application level.

Applications in this solution design do not necessarily have to be identical.

Two-way Synchronization=Multi Step variation application pattern
For a legend please see the Federation application pattern image.

The figure above shows how the Population application pattern can be composed to implement both directions of the synchronization data flow. An additional function "Reconcile" appears between the two data flows, and it is here that the complex process of ensuring that data updates do not conflict, cancel out, or get otherwise corrupted is handled. If the opportunities for conflict are minimal (when there are few overlaps between data flowing in either direction), this pattern can be effectively constructed from existing Population components. However, for more complex situations, a specialized product solution will be more appropriate.
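The composition can be sketched as two one-way flows with a reconcile step between them. The resolution policy used here, where the later update wins and ties favor the first store, is an assumption chosen for brevity; the pattern does not mandate it, and the function names and record layout are hypothetical.

```python
def population(changes, transform=lambda value: value):
    """One-way Population flow: gather change records and process them for the target."""
    return {key: (transform(value), ts) for key, (value, ts) in changes.items()}


def reconcile(to_b, to_a):
    """Where both flows update the same key, keep only the later update."""
    for key in to_b.keys() & to_a.keys():  # the intersection is a new set, safe to mutate
        if to_b[key][1] >= to_a[key][1]:
            del to_a[key]
        else:
            del to_b[key]
    return to_b, to_a


def apply_flow(store, flow):
    """Apply step: write the surviving updates to the target store."""
    for key, (value, _ts) in flow.items():
        store[key] = value


# Pending changes as key -> (value, timestamp); both stores updated "cust2".
changes_a = {"cust1": ("alice", 5), "cust2": ("bob", 7)}
changes_b = {"cust2": ("robert", 9), "cust3": ("carol", 4)}
store_a = {"cust1": "alice", "cust2": "bob"}
store_b = {"cust2": "robert", "cust3": "carol"}

to_b, to_a = reconcile(population(changes_a), population(changes_b))
apply_flow(store_b, to_b)
apply_flow(store_a, to_a)
print(store_a, store_b)
```

With few overlapping keys, `reconcile` does little work and the two `population` calls carry the design. As overlaps grow, the policy inside `reconcile` becomes the hard part, which is where a specialized product is more appropriate.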
