Level: Intermediate Tim Dunn (dunnt@uk.ibm.com), WebSphere MQ Integrator Performance Team, IBM
19 Nov 2003 Understanding the processing profile of each message flow is necessary to determine how many copies of a message flow you need to run to achieve a particular message rate. This article explains how a message flow runs in an execution group and helps you understand the processing characteristics of your own message flows, so that you run sufficient copies.
Introduction
Once message flow coding and test are complete,
you are in a position to configure the production environment.
A key question at this point is how many copies of each of the message
flows should be run. Run too few and the required message throughput
rate will not be met. Run too many and the system could become
flooded causing an unnecessary overhead as the operating system
attempts to manage many concurrent units of work. If the execution
groups have very large memory requirements, this can lead to excessive
paging that in the most extreme cases can render the machine useless
as the operating system attempts to manage the competing needs of
the different message flows running in their respective execution
groups. This is clearly not a good position to be in and can easily
be avoided by understanding the processing characteristics of the
message flow.
This article helps you avoid this situation by
providing an understanding of the principle of how a message flow
runs in an execution group and helps you understand the processing
characteristics of your own message flows.
In a complex environment, there will be many message flows
running within a single WebSphere® Business Integration Message Broker. In such a case, it is ideal to understand the characteristics of
all message flows. Where this is not feasible or practical it should
be possible to divide the message flows into a limited number of
categories where each message flow in the category has broadly similar
processing characteristics. The number of copies of the message
flow should be allocated on the characteristics of the category. To
do this, you will need to study in detail at least one flow per
category.
This article assumes that you are familiar with the major
components and configuration of WebSphere Business Integration Message Broker
V5. It also assumes that the ESQL code and node usage within the message flows
are optimized for performance.
Use of operating system threads by a message flow
A WebSphere Business Integration Message Broker
message flow can only process messages when running within the WebSphere
Business Integration Message Broker run time environment. Each
message flow must be assigned and successfully deployed to one or
more execution groups. The message flow is then able to process
any messages appearing on the input queue(s) associated with the
MQInput nodes. Each copy of the message flow runs on an operating
system thread in the Windows or UNIX environment. On z/OS, a UNIX
Systems Services (USS) thread is a Task Control Block (TCB). The
execution group is implemented as an operating system process in
the Windows and UNIX environment. On z/OS, a USS process is an
Address Space.
Operating system limits mean that it is not possible
to run more than around 500 threads per process (execution group)
in the Windows or UNIX environments. With z/OS the limit is 255
TCBs per address space (execution group). This places an upper
limit on the number of message flows that can be assigned to one
execution group. Once we allow for a number of WebSphere Business
Integration Message Broker management threads, the practical working
limits of the number of threads per process or TCBs per Address
Space is further reduced. Recommended working limits are 256 threads
per process for Windows and UNIX and 230 TCBs per Address Space
on z/OS. It is unusual for these limits to be a concern in practice
since it would be unwise to assign all copies of message flow or
all message flows to a single execution group. You are advised
to use more than one execution group to provide a level of resilience
in the event of execution group failure.
A single copy of a message flow may use more than
one thread depending on how it is written. A message flow requires
one operating system thread per MQInput node (or any other input
node). A message flow with three MQInput nodes will require three
operating system threads to execute even if only a single instance
of the message is specified. In this case, it would mean that it
is not possible to run more than 85 copies (256 thread working limit
divided by 3 per message flow) of such a flow in in one execution
group in a Windows environment.
Any Additional Instances that are specified for
a given message flow within an execution group are shared among
all of the MQInput nodes in the message flow. It is not possible
to guarantee that a thread is allocated to a particular MQInput
node. The only way to do this is to ensure that the message flow
contains a single MQInput node. Where multiple instances of a message
flow are in use the distribution of threads across the multiple
MQInput nodes will be dependent on the number of messages to be
processed on the WebSphere MQ queue from which the MQInput nodes
read. MQInput nodes reading from those input queues with the most
messages on them will obtain most threads. This distribution is
not fixed for the life of the WebSphere Business Integration Message
Broker. After a short period with no messages on an input queue
the additional threads will be returned to the pool. Each MQInput
node will still have one thread allocated.
How to run multiple copies of a message flow
There are two recommended mechanisms that let
you run multiple copies of a message flow within a WebSphere Business
Integration Message Broker. These are the use of the Additional
Instances attribute and the assignment of a message flow to more
than one execution group. Both of these actions must be performed
using the Message brokers Toolkit for WebSphere Studio in WebSphere
Business Integration Message Broker V5 or the Control Center in WebSphere
MQ Integration V2.
Additional Instances
Once a message flow has been successfully assigned to an execution,
there is by default a single copy of the message flow running. It is possible
to increase the number of copies of that flow by increasing the value of the
Additional Instances parameter. This specifies how many more copies of the
message flow should be run in that particular execution group. The technique
for doing this varies between WebSphere MQSeries Integration V2.1 and WebSphere
Business Integration Message Broker V5.
WebSphere MQSeries Integration V2.1
Once you've successfully assigned a message flow
to an execution group:
- Expand the Domain Hierarchy in the Operations pane of the Control
Center.
- Right-click on the required message flow within the execution
group and select the properties option. A window displays the
properties of the message flow in that execution group. See Figure 1 below.
- Specify the number of additional copies of the message that
you would like to run in the Additional
Instances box. The value you enter applies to that message
flow in that execution group only. It is possible to run the
same or a different number of instances for that message flow
in another execution group.
Figure 1. Changing the number of instances of a
message flow in the control center.
WebSphere Business Integration Message Broker V5
The number of instances for a message flow has to be specified in at the message
flow level in the relevant Broker Archive (BAR). To do this:
- Select the BAR in the Broker Administration Navigator Perspective
in the Toolkit to change the number of instances for a particular
message flow.
- Select the Configure tab, required message flow, and specify
the number of instances required in the Additional Instances box
- Make changes to any other message flows in the same BAR file
as required (See Figure 2 below).
- Deploy the BAR to the broker to make the changes effective.
Figure 2. Changing the number of instances of a
message flow in the Message Broker Toolkit.
With the use of Additional Instances, separation
between different copies of the message flow is provided by the
operating system thread level protection on which the message flow
is running. Should an execution group be stopped or fail, all message
flow running in the execution group will stop processing messages.
For this reason, it is wise to consider assigning copies of the
message flow to more than one execution group.
Multiple execution groups
In this approach, one copy of a message flow is
assigned to one execution group within a given WebSphere Business
Integration Message Broker. When many copies of a message flow
are required, many execution groups are also required. The memory
and processing cost of an additional execution group is larger than
that for an within an existing execution, and for this reason you
may decide that this approach is not suitable because of the additional
resources required.
When a message flow is allocated to more than
one execution group, it is still possible to use the Additional
Instances attribute to increase the number of copies of the
message flow running in a particular execution group. Where multiple
copies of a message flow are required, it is most sensible to use
a combination of the Additional Instances
and Multiple Execution Group approaches.
Message sequencing
If message sequence must be maintained within
a message flow, then all instances of the message flow must be assigned
to the same execution group. Message sequence can only be coordinated
across instances of a message flow if all the instances are in the
same execution group. If instances of a message flow are spread
across more than one execution group, message sequence cannot be
guaranteed. Message sequencing is controlled by the Order Mode
property on the Advanced Properties tab of the MQInput node. There
are three possible values:
-
Default
- Messages are retrieved in the order defined by the queue attributes, but
this order is not guaranteed as the messages are processed by the message
flow.
-
User Id
- Messages that have the same UserIdentifier in the MQMD are
retrieved and processed in the order defined by the queue attributes,
and this order is guaranteed to be retained when the messages
are processed. Therefore, a message associated with a particular
UserIdentifier that is being processed by one thread will be
completely processed before the same thread, or another thread,
can start to process another message with the same UserIdentifier.
No other ordering is guaranteed to be preserved.
-
Queue order
- Messages are retrieved and processed by this node in the order defined
by the queue attributes, and this order is guaranteed to be retained when
the messages are processed. This behavior is identical to the behavior exhibited
if the Additional Instances property of the message flow is set
to 0.
Dependening on the strictness of the sequencing,
it may be possible to run multiple instances of a message. For
example, if it is only necessary to maintain message order with
a particular userid then messages for more than one one userid may
be processed at a time.
Where processing does have to serialized through a single
instance of a message flow it is recommended to minimize the amount of processing
in such a flow and to pass messages onto a subsequent message flow which can
be run with multiple copies at the earliest point possible.
Determining the message throughput rate for a message flow
Before you can decide how many copies of a message
flow are needed in total, you have to know the message throughput
rate that is achievable with one copy of the message flow, otherwise
setting the number of copies is pure guess work. It is not possible
to determine the throughput by inspecting the message flow or counting
the number of nodes; for example, you have to run some tests.
This testing does not have to be complex or take
a long period of time. The key requirement is to conduct the measurements
in a realistic environment, that is one as close to the production
environment as possible, using the same type and mix of data, the
same hardware and the same operating system. Measuring in an environment
that is noticeably different from that which is to be used for production
may generate misleading results and lead to a poorly configured
production environment.
The environment used for testing should ideally
be isolated so that the results that you obtain reflect only the
processing of the message flow and not other work being processed.
Obtaining reliable and repeatable measurements on a system which
is busy with other work is extremely difficult and should be avoided.
It is suggested that you measure message throughput over a
period of 5 minutes or more. Measuring for periods of 30 seconds
or less does not yield realistic results as data for the first few
messages may be buffered. This is probably unrealistic in the normal
course of processing where there will typically be a continual throughput
of messages.
The sequence of events in conducting such testing is
typically as follows:
- Initialize the environment, including starting WebSphere Business Integration
Message Broker with a single copy of the message flow being profiled.
- Use a program to load messages on to the WebSphere MQ input queue. The best
approach is to continually add messages onto the input queue as messages are
consumed by the message flow. Sometimes messages are preloaded onto the input
queue. This is not as effective and may skew the results.
- Determine the rate at which messages are written to the output
queue(s). This can be done by writing a simple program to repeatedly
issue an MQGET for messages on the output queue and record the
rate at which messages are read. The program should report the
rate every 30 seconds, for example. An alternative approach is
to keep all of the messages on the output queue and determine
the time interval between the first and
last message. The message rate is then given by the result
of (Put Time of Last Message Put Time of First Message) / Number
of Messages on the queue. You need to process a reasonable number
of messages. This approach is not effective for a small number
of messages and remember that the smaller the number of messages
the less accurate the calculation. To get valid Put timestamps
on the output messages, you need to change the Message Context
attribute in the advanced properties folder of the MQOutput node
to a value of Default.
When running such tests, it is important to ensure
that there are always messages on the input queue, otherwise the
rate reported will not be realistic.
The tools provided in SupportPac IH03, WebSphere MQSeries
Integration Message display, Test and Performance Utilities available at http://www.ibm.com/software/integration/support/supportpacs/individual/ih03.html
are particularly useful for such testing.
The output of this testing should be a message
rate that is achievable with each of the message flows. The next
stage is to compare the rate at which messages can be processed
with the message flow against the required message rate.
The rate at which message flows need to process messages
often varies considerably. Some message flow may only be used at
month end for example and then only need to process hundreds or
low thousands of messages over a day or two. Other message flows
may have to process messages every day and must be capable of processing
tens, hundreds, or even thousands of messages per second.
Those message flows with low throughput requirements probably
need no further study or planning if a small number of instances of the message
flow will be able to process the low volume of messages in sufficient time.
For those message flows with a message throughput
requirement that is significantly greater than the rate which was
achieved with one copy of the message flow, more involved analysis
is required to make optimal use of the resources that you have.
Without this you could easily over allocate resources, such as additional
brokers and machines.
The analysis required is not complex and can be
extremely useful in understanding the characteristics of your message
flow and system as a whole. What you need to determine is whether
the message flow is CPU or I/O bound. You also need to determine
whether by tuning the I/O configuration greater message throughput
is possible with a single copy of the message flow. Once you know
this, it will then be possible to finally determine how many copies
of the message flow and how much processing power is required to
achieve the required message rate. In this next section, we examine
how to determine whether your message flow is CPU or I/O bound.
Determining whether a message flow is CPU bound
or I/O bound
Processors now typically operate at Giga Hertz
clock speeds. Disks operate at millisecond speed. Given this difference
in magnitude, it is clearly much better to be limited by the speed
of a processor than by a disk. Processing which is CPU bound is
generally able to run faster through the use of processors that
run at a greater speed.
A message flow is CPU bound if one copy of it
it is able to fully utilize one processor of a machine. When this
happens, message throughput can be increased by the use of a faster
processor. Message flows that exhibit this characteristic tend
to process non persistent WebSphere MQ messages and have little
or no database I/O. Alternatively, they have may process persistent
messages but have significant parsing and manipulation of the message.
A message flow is I/O bound if it is not able
to fully utilize a processor despite an adequate availability of
CPU and messages to be processed. This manifests itself as a low
CPU utilization. Processing is restricted because the message flow
is waiting for one or more I/O operations to complete. This could
involve WebSphere MQ logging or database processing. For example,
a database log write (which is synchronous) to commit a database
Insert/Delete/Update operation by the message flow will result in
a delay while the log I/O completes. Dependening on the amount
of other activity to the log and the speed of the disk on which
the log is located, the log request may take up to 10 milliseconds
to complete. During this time the thread on which the message flow
is running will be blocked and unable to process any more messages.
To reduce such delays to a minimum, it is necessary to reduce the
latency of such requests by locating logs on disks with low latency,
such as those with a nonvolatile fast write cache. Such disks are
capable of processing a write request in 1 millisecond or less.
Solid state disks are an alternative low latency device. Where it is
not possible to reduce the latency of the disks you need to increase
the level of concurrency within the system,
that is to have more copies of the message flow running together
so that the CPU processing for one instance of the message flow
can be overlapped with the I/O processing of another. This
will significantly reduce the amount of time the CPU is idle waiting
for an I/O to complete.
A danger of running with I/O bound message flows
is to assume that no increase in message throughput is easily possible
and that another WebSphere Business Integration Message Broker must
be configured (possibly on the same or an additional machine) and
used to increase message throughput. This addition of resources
may not be required with suitable tuning. At this point, it is
important to understand in more detail what is happening in the
machine on which the WebSphere Business Integration Message Broker
is running. To do this you need to use a tool that is able to report
CPU and I/O utilization. The tools vary by platform but all achieve
a similar goal.
Suitable tools are:
- The Windows Task Manager or Windows Performance Monitor on
Windows 2000 and Windows NT (see the article,
Custom Performance Analysis using the Microsoft Performance Data
Helper for details on how the Windows Performance Monitor
may be used to measure system performance).
- iostat and vmstat on UNIX systems.
- Resource Measurement Facility (RMF) or Spool Display and Search Facility
(SDSF) on z/OS.
Use the tool appropriate to the platform on which
the WebSphere® Business Integration Message Broker is running. The
aim of the testing is to determine how much of one CPU a single
copy of the message flow is able to use while processing messages.
When the message flow is the only active work in the system, the
system CPU utilization will tell you how much CPU is being consumed
by the message flow. When there is other work active in the system,
you will need to subtract the sum of that work to determine the
amount of CPU consumed by the message flow. Alternatively, there
may be CPU consumption reporting at the process or address space
level.
To complete the analysis, you need to understand
how many CPU processors there are in the system on which the measurements
are taking place. This is so that you know what CPU utilization
represents one processors worth (and so one message flows worth)
of processing. On a four processor machine, each processor provides
25 percent of the available CPU. On an eight processor machine,
each processor provides 12.5 percent of the available CPU.
If the execution group is responsible for a system
CPU utilization of 25 percent on a four-processor system, the message
flow is fully utilizing one processor and so is CPU bound. This
is the maximum rate that the single message flow will be able to process
messages on that machine. A single copy of a message flow will not be able to use
more than one processor (assuming messages are only placed on one input queue).
Additional copies of the message flow would be able to use additional processors.
If the message flow was only responsible for a
system, CPU utilization of 15 percent on a four-processor system,
for example, then it is very likely that message processing is
I/O bound (assuming a plentiful supply of messages on the input
queue). The 15 percent utilization is significantly below the 25
percent that could be expected. At this point, you need to examine
I/O activity since this is the most likely reason for low CPU utilization.
Look for disks with poor response times or high utilizations.
The most likely sources of I/O delays are:
-
The queue manager log when processing persistent messages.
The
solution to this is to increase the buffer sizes and log extent
size of the queue manager log. Also, consider using low latency
disks on which the log is located such as one with a nonvolatile
write cache. This can significantly improve the rate at which
persistent messages can be processed.
-
The database log when there is database
Insert/Delete/Update activity within the message flow.
The same
considerations apply as for the queue manager log.
-
Poor response time from an application that has been called
synchronously.
This may be on the same or a different machine.
The solution to this problem is to investigate the behavior of
the other application and make whatever performance improvements
are possible. This may include running multiple copies of the
application or increasing the hardware resources of the machine
on which the application. runs.
-
Poor database buffer tuning.
If the message flow
reads data from a very large table in a database a large amount of I/O may
have to be performed by the database to retrieve the required row. The
solution to this is to tune the database.
At this point, you should know whether a particular
message flow is CPU bound or I/O bound. If it is I/O bound, you
should have taken the necessary steps to reduce the extent of the
I/O contention. It will not always be possible to change an I/O
bound message flow into a CPU bound one but in many cases it is
possible to reduce the extent to which it is I/O bound. Reducing
the I/O delays will lead to an increase in message throughput so
there is every incentive to go through this exercise.
Once you have determined the rate that is achievable
with a single copy of the message flow, start testing with an increasing
number of copies. An initial estimate of the number of copies required
for a CPU bound message flow is given by the result of (required
message rate / the rate possible with one copy of the message flow).
This is also likely to be the number of processors required as each
copy of the message flow is capable of fully utilizing one processor.
Where the number of processors required exceeds the capacity of
one machine, you will need to plan for a larger machine or use multiple
machines to achieve the target message rate.
If the message flow is I/O bound, it is likely
that you will need to run more copies than in the CPU bound case.
Start with the same number of copies of the message flow as for
the CPU bound case but expect to have to increase the number of
copies required. How many more copies are required will depend
on the extent to which the message flow is I/O bound. Continue
to increase the number of copies and monitor both CPU utilization
and message rate until the target message rate is achieved or the
processors are fully utilized. If the target message rate has not
been achieved, you will need to plan for a larger machine or use
multiple machines to achieve the target message rate. If during
this testing CPU consumption rises without an increase in message
rate, further investigation for bottlenecks and tuning may be necessary
to maximize the message rate. Bear in mind that it may not be possible
to totally eliminate I/O delays.
At this point, you should have determined how
many copies of the message flow are required to achieve the target
message rate. This may exceed the capacity of one machine dependent
on the complexity of the message flow and the speed of the processors.
The number of copies required for CPU bound and I/O bound message
flows is likely to be noticeably different.
Configuration recommendations
Here are some configuration recommendations:
-
More than one instance of a message flow is required.
Consider allocating it over more than one execution group to increase
availability in the event of execution group failure. WebSphere
Business Integration Message Broker provides automatic restart
of failed execution groups anyway but spreading instances of a
message flow across more than one execution group further extends
the availability and will minimize any outage.
-
Message flow has large virtual memory requirements.
Consider restricting all instances of such message flows to a
small number of execution groups to minimize the demand for virtual
storage. Virtual memory sizes for an execution group of 1GB are
not unknown with very large messages and complex message flows.
You do not want 10 execution groups each running a copy of such
a message flow or there will be a requirement for at least 10GB
of virtual memory. By restricting the allocation of the instances
to two execution groups, virtual memory requirements can be reduced
to approximately 2GB.
Use the Additional Instances mechanism to increase the number
of copies of a message flow that are running in preference to
allocating one copy of the message flow to each of many execution
groups. It is easier to manage a small number of execution groups
in the WebSphere MQSeries Integration V2.1 Control Center or WebSphere
Business Integration Message Broker V5 Toolkit.
-
Message flow is run as part of a sequence.
Ensure that the sequence as a whole is balanced and that all message
flows in the sequence are capable of processing messages at the
same rate. As an illustration, suppose we have two message flows,
one which formats a request message called Request_Flow and a
second message flow called Reply_Flow which processes the reply
message. If the processing in Request_Flow is twice as complex
as that in Reply_Flow it is likely that twice as many instances
of Request_Flow compared with Reply_flow will be needed to achieve
a system which is balanced. You should evaluate separately the
number of instances of each message flow that are required to
achieve the target message rate.
 |
Conclusion
This article has hopefully shown you the importance
of understanding the processing profile of each message flow. You
need to understand this to determine how many copies of a message
flow to run in order to achieve a particular message rate. The
behavior of a particular type of message flow should be consistent
given the same type of data is processed within it but different
message flows will have different processing profiles. Some will
be CPU bound while others are I/O bound. It is important to optimize
message flows before starting any tuning or final configuration
work. Fewer copies of a message flow that is CPU bound will be
required to achieve a particular message rate than one which is
I/O bound.
About the author  | 
|  | Tim Dunn is a Senior Performance Specialist with the WebSphere MQ Integrator performance team in IBM Hursley. Tim works with development in evaluating new releases of WebSphere MQ Integrator and with leading customers to provide consultancy on design, configuration, and tuning issues relating to WebSphere MQ Integrator. Tim has presented on WebSphere MQ Integrator performance in the United States and Europe. He has also authored a number of articles on improving the efficiency of a WebSphere MQ Integrator implementation. |
Rate this page
|