IBM WebSphere Developer Technical Journal: Performance Testing Protocol for WebSphere Application Server-based Applications

Level: Intermediate

Alexandre Polozoff (polozoff@us.ibm.com), Software Services for WebSphere consultant, IBM

20 Nov 2002

This article provides a protocol for conducting performance testing to determine the optimal environmental settings for an application in a variety of load scenarios. Topics include planning the performance environment, performing the actual testing, and measuring the application's performance characteristics.

Introduction

The term protocol is defined as "a detailed plan of a scientific or medical experiment, treatment, or procedure". This article provides a protocol for conducting performance testing on WebSphere® Application Server-based applications, including information on planning, setting up the performance environment, performing the actual testing, and measuring the application's characteristics.

Performance testing is the only way to determine the optimal settings (for JVM, connection pooling, etc.) for an application in a variety of load scenarios. Every application is different and behaves differently under different conditions, so every application should undergo performance testing before it is deployed to a production environment.





The Performance Testing Environment

Ideally, the performance test environment will identically mimic the production environment in every detail, from the number of server firewalls and backend resources to the gauge of the network cabling. However, due to the size and scale of high volume production environments, this is rarely practical. A smaller environment with a minimum of two or three physically separate WebSphere Application Server machines is a more typical performance testing base configuration.


Figure 1. The base performance environment

As shown in Figure 1, the base performance environment in the distributed space has two physically separate WebSphere Application Servers connecting to a remote database and driven by a single, remote HTTP Server.

If the HTTP Server is remote in the application's production environment, then it is preferable to have the HTTP Server remote in the performance environment as well. Each WebSphere Application Server runs independently on the nodes themselves. Having other applications also running that are unrelated to the testing introduces competition for local CPU, memory and disk resources. This not only affects the results of the testing, but the interaction of these applications with the environment's resources is difficult to measure.

At the very least, the two WebSphere Application Servers should have the same machine and OS level configurations. Common mistakes include putting one application server on a different OS patch or fixpack level, or giving it a different memory configuration, which leads to inconsistent results and/or behavior. As an aside, make it a point to double-check that the TCP/IP stack settings are identical on both machines as well, particularly the duplex settings on the network interface cards.

Ideally, all application data and the database for the admin repository will reside on a remote machine, although the application data does not necessarily need to reside on the same database server as the admin repository. If HTTP Session persistence is enabled, make sure that the Sessions table is isolated from other databases and marked as VOLATILE.

This is not to say that a completely different configuration for the performance environment is entirely unacceptable. In today's business climate, it is common for the HTTP Server to reside locally on each of the WebSphere Application Server nodes. It is less desirable, however, for one of the WebSphere Application Servers to also play double duty as the database for the admin repository. These and other configuration characteristics violate the constraint that applications should not compete for local resources. Such imbalances in the performance environment can, and often do, skew the final results. However, although it is certainly not preferable to have even a compromised performance testing environment, it is better to have almost any type of performance test environment than none at all.

Dedicated server environment

Obviously, the testing environment should mirror the production environment as closely as possible, since any differences (any at all) introduce uncertainty. If you scale down your test environment, you will have to scale up your results to approximate the numbers for the production environment. Similarly, if your HTTP Server is independent in production, but is included on an app server node in test, then your performance results will not be quite indicative of production either. Configuring your test environment is a process of making choices and concessions to get the most accurate data possible.

The ideal performance test environment is made up of dedicated server machines with a dedicated network connecting them. It is difficult to conduct performance testing if the servers for the WebSphere Application Server-based applications are also operating as servers for other unrelated applications or configurations. It is crucial that the competition for local CPU, memory and disk resources be kept to an absolute minimum.

Backend resources can be the hardest to dedicate to performance testing. Because of operating policies and replication costs at some installations, a dedicated server environment is not always possible. This explains why performance tests are often run "overnight" when backend resources are subject to light (or lighter) load conditions. The world of the Internet, however, is continuing to strain backend resources, as multinational corporations are trending toward 24/7 operations.

The lack of a dedicated server environment can throw performance testing for a loop. Some problem scenarios in an undedicated server environment include:

  • Shared network resources where the performance environment is also shared by the organization's intranet. This can cause sporadic performance results depending directly upon the network utilization at the time of the tests. For example, if someone kicks off a network backup that heavily utilizes the network, this could increase response times, or even deny network connectivity to the components in the performance environment altogether. A network sniffer can be used in cases like this to help identify network bandwidth utilization and determine when poor response times are not due to the application. Still, this is an extremely frustrating and inefficient environment in which to conduct performance testing.
  • Shared backend resources where multiple applications are attempting to access the same or related data. This can also result in slower than normal response times, and is a very difficult situation to identify without the help of tools directly monitoring the backend resources during testing to understand the utilization picture. Additionally, the other applications may be changing data on the backend resources, making repetitive tests difficult or impossible.

A WebSphere Application Server machine with several applications being tested implies that both the applications and the HTTP Servers are under the additional strain of multiple tests. This is a completely reasonable scenario to run, since it conducts performance system integration testing of multiple applications.

Base configuration of the WebSphere Application Server environment

There are base configurations of the WebSphere Application Server environment that are commonly misunderstood or incorrectly configured.

The layered gate
One configuration is the gating down of requests from the Internet into WebSphere Application Server and consequently to the backend resources. The reason for gating the requests is to configure the application for its best performance characteristics. There is a point in the load curve where response time degradation is directly due to the number of requests the application is processing. Once that point has been determined, by using the protocol described, you "gate" the number of requests processed by the application. As a result, incoming requests should be queued up at the HTTP Server until the application is able to process another request. Gating is accomplished by configuring three separate points in the infrastructure that control the flow and volume of requests into the next layer.


Figure 2. Gating requests

Figure 2 shows the primary components in the infrastructure starting from the left side, where the requests come into the HTTP Server from the Internet. The requests are then sent to the WebSphere Application Server, where the application connects to backend resources, such as a database. At each one of the three arrows in the figure is a configuration point gating the incoming requests into the application and subsequently to the backend resource. Gating requests into the environment tunes the maximum workload, and provides a positive user experience.

Gating protocol
The gating protocol described here is based upon recommended WebSphere Application Server best practices, information compiled from Redbooks, and experiences at high volume customer locations.

There are three steps in the protocol, based upon three configuration points in the infrastructure:

1. HTTP Server Maximum Concurrent Requests

Web servers supported by WebSphere Application Server all provide the capability to define the maximum number of concurrent requests (as opposed to concurrent users) to accept. The differentiation between user and request is pointed out here because a single HTML page containing several images results in multiple requests by a single user.

In the Apache world, the maximum concurrent request gate is controlled solely by the MaxClients setting. (The corresponding setting on iPlanet is ThrottleRequests.) Requests for local static content are processed by the Web server, and requests for the application are forwarded to the WebSphere plug-in that sends the request out to the application.

The Web server processes requests for both local static content and for the application. Determining the value for MaxClients requires examining the content sent to the browser and the maximum number of servlet engine threads in the application server. A general rule of thumb to use for setting a starting point value for any application with one Web server and one application server, is:

MaxClients = imageContentPerPage * maxServletEngineThreads

where imageContentPerPage is the average (or maximum) number of images in the HTML responses, and maxServletEngineThreads is the maximum number of servlet engine threads defined for the application server.

The formula extends to environments with more than one Web server or application server. Take, for instance, the scenario where there are three Web servers feeding two application servers (these can be clones within the same ServerGroup). The formula, then, becomes:

MaxClients = imageContentPerPage * maxServletEngineThreads * numberApplicationClones / numWebServers

If the value for MaxClients is too low, the load test client experiences connection errors because too few listeners are available on the Web server. In this case, increase the value of MaxClients.
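As a rough illustration of the rule of thumb above, the following Python sketch computes a starting value for the Web server's maximum concurrent requests. The function and parameter names are hypothetical and exist only for this example, and rounding up is an assumption, since the rule of thumb does not say how to treat fractional results.

```python
import math

def starting_max_clients(images_per_page, max_servlet_threads,
                         app_server_clones=1, web_servers=1):
    """Starting-point MaxClients per Web server, from the rule of thumb.

    Rounding up is an assumption of this sketch, not part of the rule.
    """
    if web_servers < 1 or app_server_clones < 1:
        raise ValueError("need at least one Web server and one application server")
    total = images_per_page * max_servlet_threads * app_server_clones
    return math.ceil(total / web_servers)

# One Web server, one application server, 4 images per page, 25 threads.
print(starting_max_clients(4, 25))        # 100
# Three Web servers feeding two application server clones.
print(starting_max_clients(4, 25, 2, 3))  # 67
```

The result is only a starting point; the value still has to be confirmed or adjusted from the load test results.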

2. Maximum Servlet Engine Threads

The maximum number of servlet engine threads is determined by analyzing the performance test results. There is no valid "general" value to set this to, and the default of 25 is normally too low for high volume applications.

When setting the maximum servlet engine thread count, keep the maximum JVM heap size setting in mind. Each servlet engine thread is allocated its own stack, which takes up memory within the JVM. As each servlet thread processes requests under load, you also have to account for the objects that are created during this activity. Make sure that the JVM maximum heap size setting is high enough to support the increased number of servlet engine threads and to avoid Out of Memory conditions. Also, keep in mind that you will probably want to have generational garbage collection enabled, as defined in the WebSphere Application Server InfoCenter Performance Tuning Guide.

The actual maximum value that you use will be derived solely from application monitoring during the load and stress testing. Using the application monitor, determine whether or not all the servlet engine threads are established, then adjust the maximum accordingly and rerun the tests. Bear in mind that changes in the maximum number of servlet engine threads will require that the corresponding change be made to the MaxClients parameter on the Web servers.

3. Maximum Connection Pool Size

The final gate in the chain is the size of the connection pool to the data sources accessed by the application. Documentation on connection pooling (see the Resources section later in this article) states that applications should briefly hold database connections while executing their transactions. This allows for efficient management and sharing of a few database connections. The generally accepted maximum value for data source connections, even in high volume installations, is 40, with the typical application somewhere between 10 and 20.

The actual maximum value that you will use here will also be derived solely from application monitoring during the load and stress testing. Application monitoring should identify how many data source connections are utilized in the pool. If the setting is at 20 and all 20 data source connections are fully utilized under load, then increase the setting to 30. Rerun the tests and reexamine the utilization of the increased connection pool and determine again if the pool is fully utilized or not. Readjust the maximum as needed.

Experimentation is necessary to find the true maximum number of connections needed. When open connections exist, there is an associated expense in the form of memory usage and network utilization that factors into the application's performance. While determining the number of connections needed is not an exact science, through trial and error and application monitoring the best possible value can be found.
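The adjust-and-rerun cycle described above can be sketched as a small decision helper. This is only an illustration: the function name, the step of 10 (taken from the 20 to 30 example) and the ceiling of 40 (the generally accepted maximum mentioned above) are assumptions that a real test plan may well change.

```python
def next_pool_size(current_max, peak_in_use, step=10, ceiling=40):
    """Suggest the connection pool maximum to try on the next test run.

    step and ceiling are illustrative defaults, not fixed rules.
    """
    if peak_in_use >= current_max:
        # The pool was fully utilized under load: grow it and rerun.
        return min(current_max + step, ceiling)
    # The pool had spare connections: keep the current maximum.
    return current_max

print(next_pool_size(20, 20))  # 30
print(next_pool_size(30, 22))  # 30
```

Here peak_in_use stands for the highest number of connections in use that the application monitor reported during the run.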

Correlate expected test results
The test results should show some correlation between the numbers seen on the load test client side and those on the application server. For example, if three clones of an application server are configured with 25 servlet engine threads per server, and the observation is that the servlet response time is sub-second, then you would expect to see at least 75 requests per second, or more, through the application server. You would also not expect to see 8-second response times on the client side. The results must take into account the number of HTTP connections for static data at the same time. Make sure that the test results match what you are seeing in the application server's configuration. If the results do not match, make sure that the load test client is not running hot (100% CPU), low on memory or that some network bottleneck has been encountered.
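The arithmetic behind this expectation can be written down as a short Python sketch. It assumes every servlet engine thread is kept busy for the whole measurement, which is an idealization, and the function name is hypothetical.

```python
def expected_requests_per_sec(clones, threads_per_clone, servlet_response_s):
    """Requests per second if every servlet engine thread stays busy."""
    if servlet_response_s <= 0:
        raise ValueError("response time must be positive")
    return clones * threads_per_clone / servlet_response_s

# Three clones, 25 threads each, one-second servlet response time.
print(expected_requests_per_sec(3, 25, 1.0))  # 75.0
# The same configuration with a half-second response time.
print(expected_requests_per_sec(3, 25, 0.5))  # 150.0
```

If the load test clients report throughput far below this figure while the servlet response times stay low, the clients or the network are the more likely bottleneck.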

Load test clients
The client side of the performance testing environment has a major impact on performance test results. Conventional load testing tools run as agents on several client machines. More than one client machine is needed to generate a representative volume load, since the client is limited (by CPU and memory) to the number of users it can realistically represent while collecting accurate results.


Figure 3. Load testing clients direct volume into the server environment

In Figure 3 we have three load test clients that are applying load against the servers in the performance test environment. Notice that the load test clients are not residing on the application servers themselves. The clients should always be located on machines separate from the application servers.

Dedicated clients
The load testing client machines should be dedicated to the sole task of load testing. These machines are not to be shared with other applications competing for local CPU, memory and disk resources. Such competition for local resources does affect the reliability of the measured responses. The client machines should remain on the same routers and network configuration, and should be as close as possible (in networking terms) to the dedicated server environment.

Also, the load test client must be capable of generating the appropriate type of request needed for loading the application in question. For example, testing an application based on RMI access to EJBs is different from testing HTTP-based requests to a JSP or servlets framework.

Base line and consistency of tests
Dedicated load testing machines provide consistency between the separate performance test runs. One of the first steps in the performance testing protocol for an application running on WebSphere Application Server is recording a base line set of results. The base line is only relevant if the entire test scenario can be consistently reproduced (i.e. you can always rerun the base line and get the same results back). Haphazard swapping of the load test machines or their configurations makes this "base lining" task difficult, if not impossible, to achieve and, further, makes analysis of the performance data ambiguous at best.
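One way to make "the same results" concrete is to compare a rerun of the base line against the recorded values within a tolerance. The sketch below is only an illustration: the 5% tolerance and the use of mean response time are assumptions, and the function name is hypothetical.

```python
from statistics import mean

def baseline_is_reproducible(baseline_ms, rerun_ms, tolerance=0.05):
    """True if the rerun's mean response time is within tolerance of the base line."""
    base = mean(baseline_ms)
    rerun = mean(rerun_ms)
    return abs(rerun - base) / base <= tolerance

print(baseline_is_reproducible([200, 210, 190], [205, 195, 200]))  # True
print(baseline_is_reproducible([200, 200], [260, 240]))            # False
```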

Time outs
Set the client page timeouts to 2 minutes or less. This will force errors on the client side if the application is not responding within a reasonable amount of time. Few users tolerate lengthy response times.

Collecting and measuring results

Part of the overall exercise in collecting performance data involves capturing metrics that represent the characteristics of the Java Virtual Machine (JVM) or the application server. The ability to view the separate resources of the application server itself, and how they are performing in relation to the rest of the application, is invaluable. WebSphere Resource Analyzer is one such tool that provides timings and resource allocation values on a per-server basis. Other sophisticated tools, such as Wily's Introscope™ and the IBM Tivoli® Monitoring v5.1 WebSphere PAC, offer more in-depth data collection with supplementary capabilities, such as compiling data from multiple servers in the cluster into one view, and saving collected metrics into a database for a historical archive. These tools also typically play double duty in the production monitoring environment by watching many of the same critical application performance points. In fact, the performance test activity identifies the application performance points that should be monitored in production.


Figure 4. Application monitoring

In Figure 4, application monitoring captures metrics identifying application bottlenecks and/or issues with backend resources.

Application monitoring, regardless of the tool you decide upon, is the single most important element in performance testing and problem determination. Monitoring provides the ability to measure the response time of the application servlets, JSPs and EJBs against the backend resources, and against those measured by the load test clients. These metrics assist in identifying problem areas requiring attention either within or outside the scope of the application.

Historical archive
As the testing proceeds, the measurements obtained should be compared against past runs in order to determine if performance is improving or degrading. Application monitoring tools can typically save the collected data in one format or another. Higher end tools can even integrate results directly into databases or other data store repositories. If the tool you use does not provide a method for saving data in a usable format, you can keep relevant data in a simple spreadsheet.
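Where no tool support is available, a plain CSV file can serve as the spreadsheet-style archive. The following Python sketch appends one row per run and compares the last two runs; the column names, run identifiers and numbers are made up for illustration.

```python
import csv
import io

FIELDS = ["run_id", "avg_servlet_ms", "requests_per_sec"]

def append_run(archive, run):
    """Append one run's metrics to a seekable, writable text file object."""
    archive.seek(0, io.SEEK_END)
    writer = csv.DictWriter(archive, fieldnames=FIELDS)
    if archive.tell() == 0:
        writer.writeheader()  # first run: write the column headings
    writer.writerow(run)

def response_time_trend(archive):
    """Compare the last two archived runs; None if fewer than two exist."""
    archive.seek(0)
    rows = list(csv.DictReader(archive))
    if len(rows) < 2:
        return None
    previous = float(rows[-2]["avg_servlet_ms"])
    latest = float(rows[-1]["avg_servlet_ms"])
    if latest < previous:
        return "improving"
    if latest > previous:
        return "degrading"
    return "unchanged"

archive = io.StringIO()  # stands in for a real file on disk
append_run(archive, {"run_id": "baseline", "avg_servlet_ms": 420, "requests_per_sec": 70})
append_run(archive, {"run_id": "threads-50", "avg_servlet_ms": 380, "requests_per_sec": 82})
print(response_time_trend(archive))  # improving
```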

Programmatic timings
Application development teams tend to build performance timing measurements into their applications via some sort of logging mechanism. The logging of performance measurements from within the application should be strongly discouraged, since it introduces additional code into the application that, in addition to consuming processing cycles, must also be maintained and tested. External application monitoring tools that bolt onto the application and are controlled from a single point of command are ultimately more efficient.

Monitoring the JVM
The JVM has several key points to monitor during the performance testing activity to better understand how it is performing. Some base parameters to observe include:

  • Number of active servlet engine threads. This allows for understanding the maximum amount of work that the applications in the JVM can provide at any one point in time. It also assists in determining the required maximum settings for the production environment.
  • Number of active ORB threads for applications with EJBs.
  • Free and used memory to help in understanding how the applications are utilizing memory and how often garbage collection cycles are executed.
  • Servlet response time to help compare and contrast the response times observed on the application server against those measured on the load test clients. Bottlenecks through firewalls and other network components could introduce delays that are confirmed by recording servlet response times.

Monitoring the application
Application monitoring is specific to each application. Key methods that access backend resources should be considered for monitoring. Methods that serialize/deserialize objects must be watched in order to understand the performance impact they have.

Frequency of calls and duration of method execution are two performance numbers that should be captured. Frequency is monitored to determine the number of times the methods are called during various performance runs. Method execution timings provide insight on what percentage of the overall response time is spent on specific tasks, and helps identify application bottlenecks. Developers can use this information for further code refactoring to improve the application's performance.
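As a minimal sketch of how frequency and duration combine, the following Python function turns per-method call counts and average timings into a percentage of the overall response time. The method names and figures are invented for the example, and the numbers would in practice come from the application monitoring tool.

```python
def time_share(method_stats, total_response_ms):
    """Percentage of the overall response time spent in each monitored method.

    method_stats maps a method name to (calls per request, average ms per call).
    """
    return {
        name: 100.0 * calls * avg_ms / total_response_ms
        for name, (calls, avg_ms) in method_stats.items()
    }

stats = {
    "loadAccount": (2, 100.0),  # hypothetical backend access, called twice per request
    "renderPage": (1, 50.0),    # hypothetical presentation step
}
print(time_share(stats, total_response_ms=500.0))
# {'loadAccount': 40.0, 'renderPage': 10.0}
```

A method that is individually fast but called very often can dominate the response time, which is why both numbers need to be captured.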

Monitoring backend resources
Monitoring of backend resources during performance testing is a crucial part of the performance test activity. The performance of any application is only as fast as the slowest link in the environment. If the backend resource is not providing adequate levels of performance, then the application accessing it cannot perform at optimum levels either. There can be many reasons why backend resources suffer degraded performance. The administrator for each specific resource, with the appropriate toolset and knowledge, is needed to help resolve any backend issues of this sort.

Monitoring network resources
The common link of networked application environments is the network itself. Network resources participating in the test activity must also be monitored and their data analyzed to ensure peak throughput and efficiency. This covers the entire network path, including any firewalls, routers, CSS switches, load balancers, reverse proxies, etc. that participate in the test. Any performance issues due to improperly configured firewalls, reverse proxies or other network devices must be resolved first, or the data collected will always be skewed and/or inconsistent.

Common network configurations to check for include:

  • Throughput set to half duplex instead of full duplex.
  • Routing taking different hops to/from the same set of devices.
  • Firewall set up for proxy instead of passthrough.

Network resources should be monitored for:

  • CPU (where applicable).
  • Ports in the "established" state.
  • Throughput timings of connections.
  • Bandwidth.




Setting Performance Expectations

Setting expectations from a few different perspectives prior to the start of performance testing is very important in making sure that the test results are valuable, and that the testing activity itself is successful. These expectations include what various inputs will be required by the test team, and what outputs will define reasonable application performance. If not set ahead of time, it is difficult to define whether the performance of the application being tested is adequate or not.

Likewise, expectations must be set for the duration of the performance testing activity. Due to the number of tests and parameters that are modified, performance testing can take a considerable amount of time to complete. Agreeing on an appropriate amount of test time up front is to everyone's benefit.

Application expectations

In order to determine if the application is performing adequately, performance expectations must be defined. It should be the first part of any test plan to outline the following expectations:

  1. Acceptable servlet response time.
  2. Acceptable load client response time. This will be different from servlet response time due to network overhead, additional hops through Web servers and firewalls, etc.
  3. Acceptable requests per second throughput.
  4. Acceptable backend resource response time.
  5. Acceptable backend requests per second throughput.
  6. Acceptable network overhead, including Web servers, firewalls, reverse proxies, load balancers, CSS, etc.
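Writing the expectations down as data makes it straightforward to check each test run against them. The Python sketch below is one possible shape; the metric names and every threshold are placeholders rather than recommended values.

```python
# Hypothetical acceptance thresholds; every number here is a placeholder.
EXPECTATIONS = {
    "servlet_response_ms": ("max", 500),
    "client_response_ms": ("max", 2000),
    "requests_per_sec": ("min", 75),
    "backend_response_ms": ("max", 200),
    "backend_requests_per_sec": ("min", 150),
    "network_overhead_ms": ("max", 1500),
}

def unmet_expectations(measured, expectations=EXPECTATIONS):
    """Return the names of metrics that miss their threshold or were not measured."""
    failures = []
    for name, (kind, limit) in expectations.items():
        value = measured.get(name)
        if value is None:
            failures.append(name)
        elif kind == "max" and value > limit:
            failures.append(name)
        elif kind == "min" and value < limit:
            failures.append(name)
    return failures

run = {
    "servlet_response_ms": 420,
    "client_response_ms": 2600,
    "requests_per_sec": 80,
    "backend_response_ms": 150,
    "backend_requests_per_sec": 160,
    "network_overhead_ms": 1200,
}
print(unmet_expectations(run))  # ['client_response_ms']
```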

Likewise, a plan to address any shortcomings, should the test results not meet the acceptable results criteria, must be developed. There are two basic strategies for alleviating performance bottlenecks within an application:

  1. Throwing more hardware at the problem, which can be expensive.
  2. Fixing the application bottlenecks, which can take a long time.

Depending on the problem, both strategies may be sound and feasible, though budgetary constraints normally play a factor in the decision. Neither solution is inexpensive, but fixing application bottlenecks is generally a better strategy to follow, whenever possible. Fixing performance problems within the application should also result in some type of post-mortem process to document and distribute the knowledge gained from these tasks.

Total duration of the performance testing activity

It is, unfortunately, common for a couple of weeks of performance testing to be scheduled at the end of the development lifecycle, just prior to moving the application into the production environment. The problem with this philosophy is that many application issues do not surface until they are placed under load, and it is only during the performance testing phase that applications are subjected to the expected load volumes of the production environment.

Depending on the application, the environment, and many variable factors, the performance test phase of the development lifecycle can easily take several months to complete, even if the application has few problem issues. Applications with more problems and bugs can take considerably longer. This is one reason why testing early and often within the performance test environment is strongly recommended. The earlier testing is started in the development lifecycle, the sooner application issues can be detected and properly dealt with. Waiting until the very end of the development lifecycle to begin load testing is probably the worst performance testing scenario.

Application acceptance criteria

Before performance testing begins, the application must meet the following performance test acceptance criteria. If an application does not meet these criteria, it should not be accepted into the performance testing environment:

  1. Unit test capability. All application development efforts must provide a comprehensive unit test strategy and accompanying unit test code that can be executed to determine that the build is complete and functional. If the unit test cases cannot be successfully completed, either due to bugs in the application or the lack of unit test code, then the application should not be accepted for performance testing.
  2. Low load level capability. The application should perform reasonably and within predefined expectations at the single-user and 10-user load levels with normal think time in place. If the application cannot function at these low load levels, then it will certainly be unable to function at higher load levels. Beginning performance testing would be a waste of time.
  3. Test data available. Data for the application to execute during the performance testing must be provided, or detailed in such a manner that the performance test team can set up as close a replica to the production environment as possible. The test data must be realistic, consistent and complete.

An application must never be placed into the production environment until it has passed performance tests with the expected behavior.





The Testing Protocol

When the performance test environment is set up and running optimally, the next step is to produce a detailed test plan that will measure the performance and characteristics of the application being tested. The following information is provided as a checklist of tasks to perform and measurements to take. Many of these protocol recommendations also figure in the capacity planning steps for identifying how many JVMs must be defined to adequately handle anticipated user load in production.

Individual test durations

Typically, each of the individual tests and configuration points described below should run at its peak load level for no more than 10 to 30 minutes. The first set of test runs is strictly for gathering data at a variety of configuration points and load levels. Once the results are examined and optimal configuration points for the application-defined expectations have been determined, then longer tests of 12 hours or more in duration can be executed to measure the application's characteristics over time. The longer tests also provide functionality and stability testing under the prescribed load.

Single and multiple JVM measurements

Performance testing has two basic setups: single and multiple JVMs. The single JVM test illustrates base application performance when running alone. Measurements such as responses per second, servlet response time and number of backend accesses provide the maximum throughput to be expected. These are maximums because the clustered environment may have difficulty outperforming the single application, due to backend resource limitations. The clustered environments provide scalability and failover. On rare occasions, application issues can arise in the multiple JVM configuration that are not possible in the single JVM configuration.

Recommendation: All testing must be done for both the single JVM and for the clustered, multiple JVM configurations.

JVM heap size settings

The JVM heap size settings are adjusted from a minimum to a maximum set of values until the optimal settings are found. Optimal settings are in some part due to the servlet engine thread count settings, as each stack takes up memory within the JVM.

Recommendation: Adjust the JVM heap size settings in reasonable increments to determine the optimal memory settings for the application. Make sure that the JVM heap size settings are within the physical memory limits of the machine, along with any other applications on the same server. Remember that the base operating system also has memory requirements that must be accounted for, and that swapping negatively impacts application performance in all operating system environments.
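A quick sanity check on a candidate set of heap sizes can be sketched as follows. The function name and the memory figures are illustrative assumptions; the amount to reserve for the operating system has to come from the platform in question.

```python
def heaps_fit_in_memory(physical_mb, jvm_max_heaps_mb, os_reserve_mb, other_apps_mb=0):
    """True if the configured maximum heaps leave room for the OS and other applications.

    A JVM process also uses memory outside its heap, so treat a True
    result as a necessary condition only, not as a guarantee.
    """
    required = sum(jvm_max_heaps_mb) + os_reserve_mb + other_apps_mb
    return required <= physical_mb

# Two JVMs on a machine with 2 GB of physical memory.
print(heaps_fit_in_memory(2048, [512, 512], os_reserve_mb=512))    # True
print(heaps_fit_in_memory(2048, [1024, 1024], os_reserve_mb=512))  # False
```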

Generational garbage collection

The concept of generational garbage collection was introduced with JVM v1.3 for all operating system platforms. High volume applications that actively create many temporary objects and run with a high number of users generally benefit from having generational garbage collection turned on. However, this may or may not be the case depending on how the application uses objects.

Observe the number of major garbage collection cycles that occur during performance testing and understand the performance implications for the application during these cycles. Smaller JVM heaps, often forced by physical memory restrictions, typically see higher rates of collection under load. Understanding the performance impact here is the key to avoiding surprises in the production environment. Contrary to popular belief, garbage collection does not cause the application server to lose requests.

Recommendation: For each set of JVM heap size settings, run one test with generational garbage collection turned on, and another with it turned off. These tests should run long enough that at least a couple of garbage collection cycles are executed, so that the application's behavior can be measured. Monitor the garbage collection cycles by watching how the free and used memory is utilized by the JVM.
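
The pairing of heap settings with generational garbage collection on and off can be enumerated ahead of time so that no combination is missed. This is a minimal sketch; the function names and dictionary keys are hypothetical, and the threshold of two cycles is only one reading of "at least a couple".

```python
from itertools import product


def build_test_matrix(heap_settings, generational_options=(True, False)):
    """Pair every (min, max) heap setting with generational GC on and off."""
    return [
        {"heap_min_mb": low, "heap_max_mb": high, "generational_gc": enabled}
        for (low, high), enabled in product(heap_settings, generational_options)
    ]


def enough_gc_cycles(observed_cycles, minimum=2):
    """Return True if the run saw enough collection cycles to be measured."""
    return observed_cycles >= minimum
```

Two heap settings, such as `[(256, 512), (512, 1024)]`, produce four runs. A run for which `enough_gc_cycles` returns False should be repeated with a longer duration.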

Servlet engine thread pool

The size of the servlet engine thread pool is what determines the amount of work that the JVM containing the web application can execute at any one time. With WebSphere Application Server v4.0 and later, the thread pool defines minimum and maximum sizes which reflect the limits of the pool size. High volume applications generally have higher thread pool sizes, but factors such as application bottlenecks, usage of the synchronized keyword and/or other limitations within the code prevent efficient utilization of large thread pools.

Applications that utilize the servlet engine thread pool are typically applications with servlets and JSPs. This also includes SOAP-based applications. Application clients that talk directly to EJBs do not use servlet threads, but do use ORB threads (see next section).

Recommendation: Test the application with a variety of minimum and maximum thread pool sizes to determine which settings move as much work as possible through the application. JVM heap size settings generally are adjusted upwards for larger thread pool sizes, but start the testing with minimal memory settings. Monitor the CPU utilization of the application. Once the CPU utilization of the application approaches 80% you will be hitting against the limits of the CPU, since cycles must be reserved for the base operating system itself.

Remember that the servlet engine's thread pool size directly affects the MaxClients setting on the Web servers. Ensure that the Web server has enough listeners defined in order to avoid "failed to connect" errors from the load test clients. If "failed to connect" errors do occur, then the Web server's setting for the maximum number of listeners is too low and must be increased.

ORB thread pool size

EJBs that are run within the container execute inside the threads that are allocated to the ORB thread pool. Remote Method Invocation (RMI)-based and servlet-based applications that communicate with EJBs must have the ORB thread pool size configured.

Recommendation: Adjust the size of the ORB thread pool. Monitor the thread pool and EJB activity/response time in order to determine the optimal thread pool size.

Connection pool

Applications that make use of connection pooled JDBC resources need to set the size of the connection pool. Generally accepted practice limits the maximum number of connections in the pool to about 30-40. Little benefit is achieved from having more than 40 connections, and one should bear in mind that each established connection consumes memory and network resources.

Session persistence data sources generally need only 10 connections defined for a maximum. Never have a maximum connection pool size that is greater than the servlet engine thread count.

Recommendation: Adjust the maximum and minimum sizes of the connection pool. Monitor the servlet response time and throughput (request/second), JDBC activity and memory utilization for best performance.
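
The two rules of thumb in this section (a pool maximum of about 30-40, and never larger than the servlet engine thread count) can be checked mechanically before each run. The sketch below is illustrative; the function name and the warning wording are assumptions.

```python
def check_connection_pool(pool_max, servlet_threads_max, guideline_max=40):
    """Return a list of warnings for a proposed maximum connection pool size."""
    warnings = []
    if pool_max > servlet_threads_max:
        warnings.append(
            "pool maximum %d exceeds servlet engine thread count %d"
            % (pool_max, servlet_threads_max)
        )
    if pool_max > guideline_max:
        warnings.append(
            "pool maximum %d exceeds the guideline of about %d connections"
            % (pool_max, guideline_max)
        )
    return warnings
```

An empty list means the proposed size passes both checks. A non-empty list is a prompt to reconsider the setting, not proof that it will perform badly.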

The user

It is important to identify the type and number of users for the performance testing activity that will accurately represent the application's behavior in production. Correlation of users to expected loads is measured differently by organizations. Some measure utilization by number of CICS transactions per second, as opposed to the number of users. Correlating the user load to the appropriate metric requires understanding how the application is utilizing backend resources, and becomes an exercise in mathematics. However the load is measured, there should be a common understanding of the final results and what they represent.
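
As a rough illustration of that arithmetic, the sketch below converts a user count into a request rate and then into a backend transaction rate. It assumes a closed population of users who each wait for a response and then pause for a think time before the next request, which is the standard Little's Law relationship rather than anything specific to this article. The function names and the think-time parameter are assumptions.

```python
def requests_per_second(users, response_time_s, think_time_s):
    """Steady-state request rate for a fixed population of users.

    Each user issues one request per (response time + think time) seconds.
    """
    return users / (response_time_s + think_time_s)


def backend_transactions_per_second(request_rate, backend_calls_per_request):
    """Backend load produced by a given request rate."""
    return request_rate * backend_calls_per_request
```

With 100 users, a 0.5 second response time, and a 1.5 second think time, the first function gives 50 requests per second. If each request makes three backend calls, the backend sees 150 transactions per second.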

Number of users
The number of users defines a specific load onto an application. Since every application probably has different types of users (e.g. a user that browses a catalog vs. a shopper that purchases items), applications must be tested with a variety of scenarios that, together, resemble the average, expected work load.

Some applications perform better at high user loads than others. An application suffering from bottlenecks or excessive synchronization typically exhibits poor response times and low CPU utilization.

The single user load test is typically for establishing application base line performance. As mentioned previously, if the application performs poorly or breaks at the single user load level, it is not useful to continue performance testing the application at higher load levels. Likewise, it is recommended that testing be halted if the application performs poorly at the 10-user level.

Once the CPU utilization on the application server approaches and hits the 100% mark, any increase in the number of users only results in poorer response times. This should be measured and recorded, as it clearly defines a limitation for the application. This also assists in capacity planning, and in determining the number of application clones that will be required within the clustered environment to support load expectations.
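
The capacity planning step can be sketched as a ceiling division of the expected load by the measured saturation point of one clone. The function name and the optional headroom parameter are assumptions, and the estimate presumes that clones scale linearly, which backend resource limitations may prevent.

```python
import math


def clones_required(expected_users, users_per_clone_at_saturation, headroom=0.0):
    """Estimate the number of application clones needed for an expected load.

    headroom -- fraction of each clone's capacity held in reserve
                (for example, 0.5 keeps half of the capacity free)
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be at least 0 and less than 1")
    usable_users = users_per_clone_at_saturation * (1 - headroom)
    return math.ceil(expected_users / usable_users)
```

If one clone saturates at 400 users and 1,500 users are expected, the estimate is four clones. Treat the result as a starting point to be confirmed in the clustered tests.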

Type of user
Typically an application can have several types of users. For instance, a Web site selling items has at least two types of users: a browser and a shopper. One type of user interacts with the site but does not purchase anything. The other user is executing additional functionality, such as shopping and credit card verification. Not all users are shoppers, but all users are at least browsers. Defining the type of user participating in the test is key to understanding the performance of a site. The better the a priori knowledge of the cross-section of users using the site, the more realistic the performance test being conducted. Likewise, understanding the flow of pages typically used helps as well. This requires that the person developing the load test scripts understands the application, the types of users, and what those users normally do.

Recommendation: Test each application through all of the previous protocol recommendations at the following user load levels: 1, 10, 100, 500, 800, 1500, etc. (as appropriate based on realistic load expectations). Monitor the application's response time, throughput (requests/second), CPU utilization, JVM memory utilization, thread utilization, and JDBC backend utilization, throughput and response times.
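
The progression through load levels, including the advice to halt if the application performs poorly at the 1-user or 10-user level, can be sketched as a simple driver loop. Here `run_test` is a hypothetical callable that applies the given number of users with whatever load tool is in use and returns the measured response time in seconds. The other names are assumptions as well.

```python
def run_load_levels(levels, run_test, max_response_time_s,
                    early_stop_levels=(1, 10)):
    """Run run_test(users) for each load level in order.

    Stops early if a baseline level (1 or 10 users by default) misses the
    response time expectation, since higher loads are then not useful.
    Returns a dict mapping each executed load level to its response time.
    """
    results = {}
    for users in levels:
        response_time = run_test(users)
        results[users] = response_time
        if users in early_stop_levels and response_time > max_response_time_s:
            break
    return results
```

Levels above the baseline are always run to completion, because the point of those runs is to record where response time degrades.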

CPU utilization

Recording the CPU utilization of the application server machines is necessary in order to understand the impact of the applied load on the applications under test. The goal is for a server to run at the highest CPU utilization possible. It is not cost effective for expensive servers to run at only 20% CPU utilization. In the distributed environment, CPU utilization must be balanced against expected peak load, hardware failure and defined Quality of Service requirements.


Figure 5. CPU and Response Time

The chart in Figure 5 shows CPU utilization in blue and response time in red. Once CPU utilization has reached saturation, increased load only increases response time. Measurements must be compared to the predefined application expectations to determine at which load levels acceptable response times are possible. This may be at significantly lower CPU utilization if the application has bottlenecks or other issues.

Measure the user load level at which an application saturates the server and document it. These numbers are useful in the capacity planning exercise.
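
Both numbers discussed here, the load at which the server saturates and the highest load that still meets the response time expectation, can be read off the recorded measurements. The sketch below assumes each measurement is a tuple of user count, CPU utilization in percent, and response time in seconds. The function names and the 95% saturation threshold are assumptions.

```python
def saturation_load(measurements, cpu_threshold=95.0):
    """Return the lowest user load whose CPU utilization reaches the threshold.

    measurements -- iterable of (users, cpu_percent, response_time_s)
    Returns None if no measured load level saturated the server.
    """
    for users, cpu_percent, _ in sorted(measurements):
        if cpu_percent >= cpu_threshold:
            return users
    return None


def max_acceptable_load(measurements, max_response_time_s):
    """Return the highest measured load that meets the response time expectation."""
    acceptable = [users for users, _, response_time_s in measurements
                  if response_time_s <= max_response_time_s]
    return max(acceptable) if acceptable else None
```

When the second number is well below the first, the application is missing its response time expectation before the CPU is saturated, which points to bottlenecks or other issues.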





Recording and Evaluating Results

The execution of the various tests based on the above protocol recommendations involves a corresponding record keeping activity. Once the data is recorded, the analysis can then be conducted. Record keeping is the final important step in the performance test activity.


Table 1. Sample spreadsheet collects results from a specific run based on the protocol recommendations
Date / Time                                      August 6, 2002 - 12:03:54
Number of Users                                  100
Test Duration (minutes)                          10

Web Server Settings
  MaxClients                                     600
  TTL (requests total per process)               10,000

WebSphere Application Server Settings
  JVM Heap Size Min                              256
  JVM Heap Size Max                              512
  Generational Garbage Collection                On -XX settings
  Servlet Engine Thread Pool Size Minimum        10
  Servlet Engine Thread Pool Size Maximum        200
  ORB Thread Pool Size Minimum                   10
  ORB Thread Pool Size Maximum                   200

Web Server Measurements
  CPU Utilization (measured peak)                40-45 %
  Average Throughput                             0.45 sec
  Requests per second                            44.85

Load Client Measurements
  Requests per second                            44.85
  Requests Total                                 7,348
  Requests Completed Successfully                7,342
  Requests Timed out                             4
  Requests Failed to Connect                     2
  Page #1 response time                          0.35 sec
  Page #2 response time                          0.54 sec
  Page #3 response time                          1.33 sec

WebSphere Application Server Measurements
  Servlet Response Time                          0.28 sec
  Number of Garbage Collection Cycles            2
  CPU Utilization (measured peak)                33%
  JDBC Number Requests per second - Queries      125
  JDBC Response Time - Queries                   37 ms
  JDBC Number Requests per second - Insert       35
  JDBC Response Time - Insert                    125 ms
  JDBC Number Requests per second - Update       98
  JDBC Response Time - Update                    222 ms

A set of tests based on the protocol recommendations presented in this article results in data such as that depicted in Table 1, where the various elements of the configuration and the measured results are recorded. This is done for each individual series of tests. The set of collected data is dependent upon the backend resources utilized by the application. Applications that do not directly access JDBC resources would obviously not collect JDBC timings. Basic timings and frequency data should be collected for backend resources.
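
Recording each run in a structured form makes it easier to check the figures before they are charted. The sketch below models only the load client request counts from Table 1. The class and method names are hypothetical, and a real record would carry the remaining settings and measurements as well.

```python
from dataclasses import dataclass


@dataclass
class LoadClientResult:
    """Request counts reported by the load client for one test run."""
    requests_total: int
    completed: int
    timed_out: int
    failed_to_connect: int

    def is_consistent(self):
        """True if the outcome counts add up to the total number of requests."""
        outcomes = self.completed + self.timed_out + self.failed_to_connect
        return outcomes == self.requests_total

    def success_rate(self):
        """Fraction of requests that completed successfully."""
        return self.completed / self.requests_total


# Request counts taken from Table 1.
table_1_run = LoadClientResult(requests_total=7348, completed=7342,
                               timed_out=4, failed_to_connect=2)
```

For the Table 1 run the counts add up, and roughly 99.92% of requests completed successfully. A run whose counts do not add up is worth re-examining before its results are charted.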

Once the results are collected and compiled, they can be turned into a variety of charts to illustrate the performance of the application. Collecting and compiling the results may involve some manual manipulation of the data, because no single tool should be expected to collect all the data required.


Figure 6. Requests completed by number of users

Figure 7. Response Time by page and number of users

Figures 6 and 7 show two possible charts that could be created after combining the test results into a single spreadsheet. The results can be juxtaposed against each other in a graphical format allowing others to analyze the results. Text explaining any measured or observed anomalies should be provided, especially when performance results either dramatically improve or degrade.

(Keep in mind that the manual maintenance of spreadsheets or databases is prone to human error. A review of the data should be made after it is recorded in order to identify possible mistakes or misrepresentations.)

Analysis of the data involves identifying the following application characteristics to determine various optimal values:

  • Servlet response time and client response time
  • Network bandwidth
  • CPU utilization of the application and Web servers
  • Backend resource utilization
  • Networked components utilization

Once the optimal settings are determined, a consideration must be made as to whether the application should run on its own application server, or on an application server along with other applications. The number of application servers to be run on the same physical machine must also be considered, since there will likely be physical resource limitations. Finally, the clustered environment must be evaluated for not only physical limitations on the application server machines, but also on the backend resources utilized by the applications. All of these factors play into the performance and capacity planning stages of the production environment in order to accomplish optimal performance.

Frequency of performance testing

One common misconception is that performance testing is a one-time effort. It can be a one-time effort if you never change the application code or the machine configurations. More often than not, though, application code changes, more powerful machine configurations, or both are introduced into the production environment. Therefore, any time a change occurs in the application code or in the machine configuration, performance testing must be re-executed to determine the new parameters for optimal performance.





Conclusion

Performance testing protocol is a comprehensive mix of definitions, understanding, and execution. Testing early in the development cycle and as often as possible is the best way to identify performance issues prior to the release of an application into the production environment. Teams dedicated to performance testing provide consistent testing and alleviate the burden of performance testing on individual development teams. Tools that provide insight into the JVM and application characteristics are usually well worth their cost in providing quick problem determination, and in keeping the performance test activities as quick and efficient as possible.






About the author

Alexandre Polozoff is a Software Services for WebSphere consultant engaged in the development of performance practices and techniques for high-volume and large-scale installations. His expertise includes third-party tool evaluations and best practices for performing post-mortem analysis. Alexandre also continues to be involved in open technology standards, such as SNMP, TMN, and CMIP. He can be reached at polozoff@us.ibm.com.



