J2EE and IBM object-relational databases

Level: Introductory

Jacques Roy, IBM Worldwide Sales Support, IBM

13 May 2004

WebSphere's growing success demonstrates the increased use of the J2EE environment for business application deployment. Technical people everywhere must familiarize themselves with this environment. This article provides an overview of the J2EE environment. It then discusses the object-oriented approach (analysis, design, implementation) used in J2EE development and some issues of object persistence as they relate to the use of object-relational database management systems (ORDBMS).

Introduction

A database decision should be a strategic decision that will bring you a business advantage. Once the decision is made, you must use the database server to its fullest to realize this advantage. This article provides an overview of the Java™ 2 Platform, Enterprise Edition (J2EE) environment and discusses the object-oriented approach (analysis, design, implementation) that is used in J2EE development and some issues of object persistence as they relate to the use of object-relational database management systems (ORDBMS).





Web architecture background

J2EE contains dozens of acronyms that each represent different concepts. To understand this complexity, it is useful to look back at the history of Web architecture, as illustrated in Figure 1.


Figure 1. Web architecture

The architecture shown in Figure 1 represents the Web environment in 1996. At the time, the dominant provider was Netscape, with both its browser and its Web server. Netscape’s products were a major improvement over what had been available only a few months before.

Starting from the left side of Figure 1, we see the browser. It could use plug-ins to provide features such as displaying PDF files. It also added programming capabilities to improve its interactions with the user: JavaScript, a scripting language embedded in HTML pages (and created independently of Java), and Java applets that could be downloaded as part of an HTML page.

The browser communicates with the Web server using the Hypertext Transfer Protocol (HTTP). The important characteristic of HTTP is that it is based on a request-response model. Each request-response exchange is independent of any other, and the Web server does not expect another request from the client. This makes the protocol stateless.

The World Wide Web (WWW) was created to provide easy access to documents. There was no need to keep track of complex interactions since each request is complete by itself. The Web server could receive the request and use the information provided to retrieve the requested document in the directory structure controlled by the Web server.

Early in its development, the Web server definition added an interface called CGI (Common Gateway Interface). This was an easy way to call a program that would receive the request information in a specific format, use it to fulfill the request, and return the result to the Web server following the defined protocol. This made it possible to generate more dynamic content. Since the CGI program runs as part of handling a single request, it is transient: it is created by the request and it terminates once it has returned its result.

The CGI protocol was pushed to its limit and used to implement applications that accessed relational databases. This meant that each time a request asked for information residing in a relational database, a database connection was opened, the data was inserted, updated, or retrieved, and the connection was closed. In most cases, most of the time was spent connecting to the database.

Two methods were invented to solve this problem:

  • Having the CGI program talk to a permanent program, and
  • Including an API to extend the capabilities of the Web server.

The first option, having the CGI program talk to a permanent program, can have multiple variations. The CGI program can start an application program that will be accessed based on some ID that we return with the response (shown as app-srv in Figure 1). If the ID is not re-used within some timeout interval, the program terminates. The other approach is to have a permanent program that can handle all the requests for this application from any client. This scenario is likely to involve a multi-threaded program that validates the clients and assigns IDs to each new client. It must then keep track of some timeout period for each client ID.

The second option, including an API to extend the capabilities of the Web server, is to write the application using the Web server API (NSAPI in Figure 1) and have the Web server host the application. This way, the Web server can preserve database connections and include application-specific processing. It also needs to keep track of users and connection timeouts.
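
Both options come down to the same bookkeeping: hand each new client an ID and forget the client once that ID has been idle for too long. The following is a minimal sketch of that idea only. The class name `SessionRegistry`, its methods, and the ID format are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a long-lived program that hands out client IDs and
// forgets a client once its ID has been idle longer than the timeout.
class SessionRegistry {
    private final long timeoutMillis;
    private final Map<String, Long> lastSeen = new HashMap<>();
    private int nextId = 1;

    SessionRegistry(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called on a client's first request; the ID is returned with the response.
    synchronized String register(long nowMillis) {
        String id = "client-" + nextId++;
        lastSeen.put(id, nowMillis);
        return id;
    }

    // Called on every later request. Returns false if the ID is unknown or
    // has been idle longer than the timeout; otherwise refreshes the ID.
    synchronized boolean touch(String id, long nowMillis) {
        Long seen = lastSeen.get(id);
        if (seen == null || nowMillis - seen > timeoutMillis) {
            lastSeen.remove(id);
            return false;
        }
        lastSeen.put(id, nowMillis);
        return true;
    }
}
```

The methods are synchronized because, as noted above, a permanent program serving all clients is likely to be multi-threaded. A real server would also attach the preserved database connection or application state to each ID.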





Enter the application server

It made sense to consolidate the way Web applications were implemented into a more complete framework. At that point, the Java concept of "write once, run anywhere" was already very popular. It did not take long before J2EE took shape.

J2EE is a Java specification that is greatly influenced by the Object-Oriented (OO) approach to applications (Java is an OO programming language). Its goal is to provide an application framework that contains all the features required for the implementation of enterprise applications. This includes portability, scalability, transaction control, and so on. The J2EE specifications include:

  • J2SE: Java 2 Standard Edition contains the well-known Java environment including the platform-independent Java development kit/Java run-time environment (JDK/JRE), multi-threaded environment, Java Foundation classes, and so on.
  • EJB: Enterprise JavaBeans provide a standard way to represent objects in a distributed environment. EJBs come in three flavors: session beans, entity beans, and message-driven beans.
  • Servlets: Java Servlets provide a mechanism to operate in the request-response mode of communication.
  • JSP: JavaServer Pages are a specialized type of servlet used to dynamically create the HTML pages displayed to the user.
  • JDBC: The Java DataBase Connectivity interface provides a standardized way to communicate with data sources such as relational databases.
  • JTA/JTS: The Java Transaction API and Java Transaction Service
  • JMS: Java Message Service
  • JNDI: Java Naming and Directory Interface. This is essential to the J2EE environment since it provides a way to keep track of resources without having to know their location. You can equate that to an LDAP directory service.
  • JavaMail
  • JAXP: Java API for XML Processing. J2EE also includes related APIs for XML registries (JAXR) and XML-based RPC communication (JAX-RPC).
  • Connector Architecture: This architecture provides ways to communicate with legacy systems that are not integrated with the J2EE environment.
  • JAAS: Java Authentication and Authorization Services

These specifications are still evolving and more components are being added. Significantly, these specifications are standards-based and aim at providing portability between application providers, application server providers, and hardware platforms. As you can see, the J2EE environment tries to provide all the services an enterprise application might require. This includes many services that have been available in different forms for many years.

A high-level representation of the J2EE environment is provided in Figure 2. An actual implementation may include many other components and can distribute its objects over multiple machines over a large network.


Figure 2. J2EE high level architecture

Note that this architecture allows you to implement client applications that talk directly to the enterprise Java beans through IIOP (Internet Inter-ORB Protocol). This opens the door to types of interactions other than browser-based user interfaces.

It is important to understand some of the major concepts and models underlying the J2EE environment. The next few sections cover these subjects.





Request-Response

The interaction between Web-based users and the application server follows the original Web model using the HTTP protocol. This means that we are tied to the request-response model described briefly above.

To facilitate this communication, new Java classes were introduced. The main class used to handle this communication is the HttpServlet class. This class contains a set of methods that match the HTTP request methods. The two used most often to handle a request are doGet() and doPost().

HTTP passes information to the Web server either by putting arguments in the URL or by sending them separately from the URL, in the body of the request. The first is referred to as a GET request and the second as a POST request. The advantage of GET is that the URL contains everything required to retrieve the requested information, so it can be bookmarked for future recall. POST keeps the additional information out of the URL, and therefore out of bookmarks and browser history, although the data is not encrypted unless the connection itself is. It is also more appropriate when a relatively large amount of information needs to be sent. The drawback is that the request cannot be bookmarked.
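
To make the difference concrete, here is a minimal sketch using only the standard Java library. It assumes the usual form encoding (name=value pairs separated by &), which is the same whether the pairs travel after the ? of a GET URL or in the body of a form POST. The class `FormParameters` and its methods are hypothetical names chosen for illustration. In a servlet you would not write this yourself, since the request object's getParameter() method returns decoded values.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
class FormParameters {
    // Decodes "name=value&name2=value2" pairs. The same encoding is used in a
    // GET query string and in a typical form-encoded POST body.
    static Map<String, String> parse(String encoded) {
        Map<String, String> params = new LinkedHashMap<>();
        if (encoded == null || encoded.isEmpty()) {
            return params;
        }
        for (String pair : encoded.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    // For GET, the parameters are whatever follows '?' in the URL.
    static Map<String, String> fromGetUrl(String url) {
        int q = url.indexOf('?');
        return parse(q < 0 ? "" : url.substring(q + 1));
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

A GET for /loans?branch=42 carries its argument in the bookmarkable URL itself, while a POST to /loans would carry branch=42 in the request body and be parsed with parse() directly.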

The doGet() and doPost() methods receive two arguments: HttpServletRequest request and HttpServletResponse response. The request argument gives you all the information you need from the request. You use the response argument to write the answer back; it provides the methods you need to build that answer.

Any interaction between a browser and an application server is done through the HttpServlet class. You should spend some time studying the fields and methods included in the classes mentioned above.





Model-View-Controller

J2EE suggests the use of the MVC development model. The idea behind this model is to separate the interaction with the user, the processing, and the data access as much as possible. This model has been around for quite some time. It first appeared, I believe, in the Smalltalk language in the 1980s.

Figure 3 illustrates the use of this model with J2EE components. The application starts by requesting a JSP (View) that returns a page to display on the browser. The user selects an action that sends information to a servlet (Controller) that decides what must be done with it. It can involve retrieving a Java bean (Model), or enterprise Java bean, to provide data access. The servlet can then pass a reference to the Java bean to the JSP to provide access to the data for formatting and display.


Figure 3. MVC model
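
The division of labor can be sketched in plain Java, without the servlet and JSP machinery. Everything in this sketch is hypothetical (the account example, the class names, the canned data); it is only meant to show which part knows about what.

```java
import java.util.HashMap;
import java.util.Map;

// Model: a plain bean holding data (hypothetical fields).
class AccountBean {
    private final String owner;
    private final double balance;

    AccountBean(String owner, double balance) {
        this.owner = owner;
        this.balance = balance;
    }

    String getOwner() { return owner; }
    double getBalance() { return balance; }
}

// View: only formats what it is given, as a JSP would.
class AccountView {
    String render(AccountBean account) {
        return "<p>" + account.getOwner() + ": " + account.getBalance() + "</p>";
    }
}

// Controller: decides what to do with the request, as a servlet would.
class AccountController {
    private final Map<String, AccountBean> store = new HashMap<>();
    private final AccountView view = new AccountView();

    AccountController() {
        // Canned data standing in for real data access.
        store.put("1001", new AccountBean("Ada", 250.0));
    }

    String handle(String accountId) {
        AccountBean account = store.get(accountId);
        if (account == null) {
            return "<p>Account not found</p>";
        }
        return view.render(account); // pass the model to the view
    }
}
```

The view never looks up data and the model never produces HTML, so either can change without touching the other.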





Enterprise Java Beans (EJB)

The EJB architecture provides a standard model for the development of distributed applications. An EJB contains two interfaces: Home and Remote. The home interface is used to create or find an object of a specific type. The remote interface, retrieved through the home interface, gives you access to the public methods of the remote object.

There are three types of EJBs: session, message-driven, and entity. Session beans provide access to business processes. They come in two flavors: stateful and stateless. A stateful bean preserves information between invocations from a specific client. A stateless session bean can be shared by several clients, since it does not preserve any specific information between invocations.
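
The stateful/stateless distinction does not depend on the EJB API, so it can be sketched with two plain Java classes. These are not session beans (they implement none of the EJB interfaces), and both class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Stateless: no per-client fields, so one instance can serve any caller.
class InterestCalculator {
    double yearlyInterest(double principal, double ratePercent) {
        return principal * ratePercent / 100.0;
    }
}

// Stateful: keeps conversational state for one client between calls,
// so each client needs its own instance.
class LoanApplicationSession {
    private final List<String> answers = new ArrayList<>();

    void answer(String value) {
        answers.add(value);
    }

    int answeredSoFar() {
        return answers.size();
    }
}
```

A container can pool and share instances of the first kind freely. Instances of the second kind must stay tied to the client that created them.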

A message-driven bean is a stateless bean that provides business processes for the manipulation of Java Message Service (JMS) asynchronous messages.

An entity bean represents business data and its related manipulation logic. This business data must be preserved in persistent storage (for example, a database).

EJBs are distributed with deployment descriptors that include information such as transaction attributes, security authorizations, and persistence. We discuss J2EE persistence in a later section.





Object-Oriented Approach (OOA)

Object orientation has improved software development and it is a key part of the J2EE environment. Two of the keys to OO are data encapsulation and the concept of inheritance. There are many books on the subject so I won’t go into the details of this approach. There are, however, a few things that are important in the context of this article.

Object-orientation favors a hierarchical approach. We find object inheritance hierarchies and object composition hierarchies. Using part of the DICOM (Digital Imaging and Communications in Medicine) standard, Figure 4 illustrates these two types of hierarchies.


Figure 4. DICOM hierarchies

A Data (DICOMData) object can be specialized to become one of four types of object: patient, study, series, or image. Figure 4 shows us another hierarchy where a TAG object can be specialized into a DataElement. These types of inheritance are very common in OO analysis, design, and programming. Just look at the Java class hierarchy to see an elaborate set of object inheritance hierarchies.

Figure 4 also shows us examples of aggregation/compositions, including multiple levels in the composition hierarchy: We see that a Patient may include a number of studies, a study may include a number of series, and a series may include a number of images. We also see that all these objects can contain a number of TAG/DataElements since this composition is represented at the parent of the object inheritance hierarchy (DICOMData).
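
Since Figure 4 may not be in front of you, the two kinds of hierarchy described above can be sketched in Java. The class names follow the text. The fields are assumptions made for illustration (a tag as a group/element number pair, a value as a string) and are not taken from the figure.

```java
import java.util.ArrayList;
import java.util.List;

// Assumed representation of a tag: a (group, element) number pair.
class Tag {
    final int group;
    final int element;

    Tag(int group, int element) {
        this.group = group;
        this.element = element;
    }
}

// Inheritance: a DataElement is a specialized Tag that also carries a value.
class DataElement extends Tag {
    final String value;

    DataElement(int group, int element, String value) {
        super(group, element);
        this.value = value;
    }
}

// Parent of the inheritance hierarchy. The DataElement composition is
// declared here, so every subclass inherits it.
abstract class DICOMData {
    final List<DataElement> elements = new ArrayList<>();
}

// Composition: patient -> studies -> series -> images.
class Image extends DICOMData { }

class Series extends DICOMData {
    final List<Image> images = new ArrayList<>();
}

class Study extends DICOMData {
    final List<Series> series = new ArrayList<>();
}

class Patient extends DICOMData {
    final List<Study> studies = new ArrayList<>();
}
```

Note that the only way to reach an Image in this sketch is to start from a Patient and walk down through its studies and series.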

The aggregation/composition model leads to an interesting distinction between objects that is not obvious when reading the OO literature. In most business applications there are "first-class" objects that are actively searched and manipulated directly, and "second-class" objects that are accessed only through the first-class objects: if you don’t retrieve the first-class object, you will never get to the second-class one. This is typical of the hierarchical model that was the dominant database model 30 years ago. It brings advantages and disadvantages that we discuss in a later section.

Another interesting subject in OO is object persistence. The OO literature spends virtually all its time discussing objects and their interactions with other objects as in-memory objects. Persistence, on the other hand, seems to be considered a bothersome issue. The general line is: "persist the object." The database server does not add value outside of restoring an object if necessary. This is partly why the issue of “impedance mismatch” appears so important to a lot of OO people.





The J2EE persistence model

When we talk about persistence in the J2EE environment, we likely refer to the entity beans. The J2EE environment provides two persistence schemes for entity beans: bean-managed and container-managed. Bean-managed persistence leaves all the work to the EJB developer to figure out how to store and retrieve a specific object type.

Container-managed persistence removes the specifics of persistence from the implementation of the EJB. The abstract schema, part of the deployment descriptor, defines the persistent fields of the entity bean and its relationships with other beans. Queries over this schema are written in EJB QL, a language whose syntax is based on a subset of the SQL-92 standard.

With container-managed persistence, when an entity bean is created, its information is saved in the database. Each time a method call modifies the content of the bean, these changes must be reflected in the database. Of course, this also takes into consideration the transaction attributes of the targeted entity bean.

J2EE tries to make object persistence as simple, transparent, and automatic as possible. In the process it wants to make persistence as independent as possible from any database product, no matter the type of database or persistent storage used. Therefore, a database is relegated to being simply a persistent storage service.





IBM’s object-relational databases

IBM provides two state-of-the-art object-relational databases: DB2® Universal Database™ (UDB) and Informix Dynamic Server™ (IDS). They are relational databases that provide the strengths of the relational model and its set processing. They also include object concepts and the capabilities to extend the database server capabilities to better fit your business model. The database server is then an extensible framework to process business data. These extensibility features are in line with the concepts behind the J2EE environment since J2EE is really an extensible framework for applications.

These database products have leadership positions in the database market. To understand how these products reached such prominence, we must first look at why the relational model became dominant in the industry. Before relational databases, the dominant types of databases were organized as hierarchies. This applied the concept of "divide and conquer." Starting at the top of the hierarchy, you would choose a specific node that represents a container object such as a region or a branch that contains a subset of all your database data. This object would potentially contain members. You could then select another subset through a member pointer of that node. This process can go on until you find the exact element you want to operate on. In addition to being able to add, remove, or modify the element, you could also move it to another location in the hierarchy by manipulating the pointers to the element or node.

Hierarchical-type databases have two major advantages:

  • They are lightweight databases with minimum overhead since they simply return the node and member pointers requested by the application.
  • They quickly divide the data into smaller parts to get to the desired record.

They also had other characteristics that could use some improvements:

  • Optimized for one problem: By their nature, hierarchical databases optimized access to the data through one specific path. For example, a large bank may divide its units by regions, branches, and accounts. It is then very efficient to locate a specific account belonging to a known branch and region or to do reporting or analysis per region and per branch. It is more difficult to locate all the accounts for a specific customer since this customer may have multiple accounts that were created over time. Furthermore, this customer may have relocated multiple times. She can then have accounts under multiple branches and multiple regions. Most businesses have multiple problems to solve with their data. If they follow the hierarchical model, they solve one problem very well. Other problems could then be poorly solved. To achieve decent performance, this could require duplication of the data in multiple hierarchies, which could greatly increase the complexity of data management.
  • Deals with physical records: The records stored in hierarchical databases are manipulated directly by an application. The application must know the order and type of each field since it must calculate its offset in the record. Any modification to the record structure requires modifications to the applications that access these records.
  • The relocation of objects is done programmatically: This pointer manipulation opens the door to problems of duplication and dangling pointers.

Overall, the hierarchical model is a very viable model in some niche applications, as the longevity of the IBM IMS product demonstrates. Relational-type databases go beyond the persistent storage service. They also address the shortcomings of hierarchical databases listed above. The major characteristics of relational-type databases are:

  • Flattened hierarchy: Relational databases represent data in the form of tables. All tables are at the same level. This means that all the data can be accessed directly. Looking back at our bank example above, we could access all the accounts directly and find which ones belong to a specific customer regardless of which branch or region the account belongs to.
  • Logical records: The columns in a table are accessed through column names, not record offsets. Only the specified columns will be used in the operation. This way, an application can be independent from the number of columns and the order of the columns in the database. The tables can be changed to include additional columns without requiring changes to any application. This concept is taken further with the use of VIEWS that provide a virtual table that is built from a subset of one or more tables. You could equate the concept of VIEWS to an object interface. As long as the interface stays the same, applications that use that interface don’t need to be modified.
  • Set Manipulation: Instead of simply retrieving specific records and returning them to the application, a relational-type database has the capability to manipulate the data. These capabilities include sorting, grouping, aggregating, and a large set of functions to manipulate the different data types.
  • Non-Procedural Query Language: Relational-type databases include a data manipulation language called SQL (Structured Query Language). This allows a user or application developer to describe which data to operate on instead of how to obtain the data. The database system must then figure out how to fulfill the request. The optimizer uses information such as table sizes, availability of indexes, and data distribution to figure out the optimal way to answer the query.

Relational databases also provide transactions with the ACID properties (Atomicity, Consistency, Isolation, Durability). Over time, relational databases added features such as constraints, stored procedures, backup/recovery, replication, and so on.

IBM continues to improve its ORDBMSs. The general direction covers performance, availability, scalability, manageability, development productivity, integrated information, and business intelligence. There is a big push toward information integration and on-demand computing through data federation and autonomic computing. Some of these features are already in the IBM products and more will appear over time to simplify data management.

What about the "OR" part of the ORDBMS? It is the extensibility part of relational databases. Once we make the leap in concept, it becomes obvious that database extensibility is part of the natural evolution of relational databases.

Relational databases have included limited extensibility features for quite some time. Features such as constraints and stored procedures are forms of extensibility. DB2 UDB also includes "user exits" that allow system and database administrators to tailor parts of the processing. Aren’t stored procedures enough? No. They are designed to apply procedural processing after the database has returned the data. Their advantage is in limiting the data transfer between the database and the application. Stored procedures are not integrated in the database engine since they are designed to process one record at a time in a procedural manner. We also have to distinguish between stored procedures and stored procedure languages.

DB2 allows users to write stored procedures in languages such as SQL, COBOL, C, and Java. IDS includes a stored procedure language and also allows procedures to be written in C and Java. The use of these languages has been extended to the relational model. In addition to writing procedures that manipulate the data after the database server has done its processing, we can include the processing within the set processing of the database system. The result is simplified functions that leave the set processing to the database engine. Their execution can also be better integrated: they can be applied earlier in the processing to reduce the amount of data that must be manipulated in follow-on steps, and they can take part in the parallel execution of queries. This provides multi-threaded execution of these functions without having to include complex code to explicitly take advantage of it.

DB2 UDB and IDS allow you to write multiple functions that have the same name but apply to a different number of arguments or to different argument types. This is similar to having plus operators ("+") that apply to integers, decimals, floats, doubles, or a mix of these types. This is also similar to the polymorphism found in object orientation. In addition to the functions, these databases allow you to create new types and create object hierarchies through the use of table hierarchies. All these features allow you to adapt the database to your business environment instead of compromising your design to fit the database. Going into all the business advantages you get by using the full capabilities of the IBM databases is beyond the scope of this article. Please consult some of the references given at the end of this article for examples of how extensibility can help you be more competitive.

The IBM ORDBMS databases provide a huge amount of functionality that can simplify software development, reduce hardware requirements, and speed time to market. You must make them part of your design to take advantage of their benefits. IBM ORDBMSs are more than persistent storage. They are a way to make you more productive and efficient.





The looming objects crisis

In theory, and often in practice, any manipulation of a persistent object requires the instantiation of the object, its manipulation, and its synchronization with persistent storage. This leads to container-managed persistence and the idea of being independent from the database:

"The advantage of using container-managed persistence is that the entity bean can be logically independent of the data source in which the entity is stored. …"
Enterprise JavaBeans™ Specification, Version 2.1, page 143.

It goes on to mention that a data source could be relational or non-relational such as IMS. Of course, it could also be an object-oriented database.

Any interaction that modifies a container-managed bean requires synchronization with the data source to ensure a consistent view of the entity bean. Using a chatty interface to an Enterprise JavaBean (EJB) could dramatically increase the number of database interactions, potentially leading to performance problems.

The focus on "one object at a time," instantiated, has the potential to create performance and capacity problems. This problem is not limited to the use of container-managed entity beans but is general to the object-oriented approach. Let’s illustrate with a simple example.

Imagine a large bank that specializes in giving out loans. It has multiple regions that include dozens if not hundreds of branches.


Figure 5. Bank loan organization

The diagram in Figure 5 illustrates the multiple regions for the corporation, the multiple branches per region, and the multiple loans per branch. It also indicates that we can have different types of loans.

If the corporation wants to get a list of branches that are taking too much risk in their loans, it collects the list for each region. Each region goes through its branches to find out which is too risky. The branch itself has to go through each loan to get its risk and the amount of its loan so it can calculate the average risk for the branch.

This is the proper object approach: each object encapsulates its information and its processing. The only way a branch can find the risk taken by a loan is to ask the loan for that information. The approach is sound since it removes the tight coupling between types of objects. The communication between objects depends on well-defined interfaces. As long as the interface stays constant, the implementation can change without affecting the overall system.

In our example, if we consider an environment with 10 regions, 100 branches per region, and 10,000 loans per branch, we end up creating over 10 million objects to answer the query. Each loan is asked to return two values: its risk level and the amount of the loan. With the communication between regions and branches, this gives us a little over 20 million messages between objects. The objects are eventually removed from memory, requiring additional processing.
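
The arithmetic behind these figures is worth spelling out. The sketch below counts one object per region, branch, and loan. For messages it assumes two per loan (risk and amount) plus a single reply per branch and per region; that last part is my assumption, since the exact branch and region traffic depends on the design. The class and method names are hypothetical.

```java
// Hypothetical helper that reproduces the counts quoted in the text.
class ObjectCount {
    // One object per region, per branch, and per loan.
    static long objects(long regions, long branchesPerRegion, long loansPerBranch) {
        long branches = regions * branchesPerRegion;
        long loans = branches * loansPerBranch;
        return regions + branches + loans;
    }

    // Assumption: two messages per loan (risk, amount), plus one reply
    // from each branch to its region and from each region to the corporation.
    static long messages(long regions, long branchesPerRegion, long loansPerBranch) {
        long branches = regions * branchesPerRegion;
        long loans = branches * loansPerBranch;
        return 2 * loans + branches + regions;
    }

    public static void main(String[] args) {
        System.out.println(objects(10, 100, 10_000));  // 10001010
        System.out.println(messages(10, 100, 10_000)); // 20001010
    }
}
```

The loans dominate both totals: 1,000 branches times 10,000 loans is 10 million loan objects, and two values per loan is 20 million messages before any branch or region traffic is counted.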

The numbers used (10/100/10,000) are far from excessive. It is easy to imagine systems generating orders of magnitude more objects. This can easily become a performance problem and then a capacity problem (memory, for example). Adding nodes to the application server may not solve the problem, and it adds complexity.

The problem can be reduced, if not eliminated, by taking advantage of the strengths of object-relational databases such as DB2 UDB and IDS. Keep in mind that most object-oriented people see a database simply as persistent storage. They are used to putting all the processing in their object code. The database is an afterthought.
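
As a rough illustration of moving the processing to the database, the sketch below asks the database for the risky branches with a single set-based query through JDBC. The schema is hypothetical: a table loan(branch_id, risk, amount). Treating a branch's average risk as an amount-weighted average is also my assumption, since the article does not define the calculation, and the query assumes every branch has a non-zero total loan amount.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; table and column names are assumptions.
class RiskyBranchQuery {
    // The database groups the loans and computes the average per branch.
    // Only branches above the threshold are returned to the application.
    static final String SQL =
        "SELECT branch_id, SUM(risk * amount) / SUM(amount) AS avg_risk "
        + "FROM loan "
        + "GROUP BY branch_id "
        + "HAVING SUM(risk * amount) / SUM(amount) > ? "
        + "ORDER BY branch_id";

    static List<String> riskyBranches(Connection conn, double threshold)
            throws SQLException {
        List<String> result = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setDouble(1, threshold);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    result.add(rs.getString(1)); // branch_id
                }
            }
        }
        return result;
    }
}
```

With this approach no loan object is ever instantiated in the application server. At most one row per branch crosses the wire, and the optimizer is free to use indexes and parallelism to scan the loans.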

If we consider when object-orientation became mainstream, we can assume that most, if not all, programmers in their early thirties or younger have been trained to think this way. Since the database is used only to "persist" an object and retrieve it, a database with a minimum of overhead has an advantage in this model. This favors hierarchical, network, and object databases. Object-relational databases, which can do so much more than object persistence, become big engines that never even get started. This is like racing a car against a bicycle when you are not allowed to start the engine.

Let’s be clear: The object-oriented approach is excellent. The problem is with the narrow focus of many architects and developers who fail to see that there is a choice on where the processing occurs. Making this choice impacts the design and can have significant performance implications. Taking advantage of the strengths of ORDBMSs can simplify the design and greatly improve the performance of the resulting system. This translates into faster time to market and lower development and maintenance costs. Faster results also can provide a significant business advantage.





Database independence

I mentioned earlier that J2EE promotes database independence to improve the portability of applications. It goes further by mentioning that it could be any type of database: relational (object-relational) or non-relational. This is a fine goal for the overall J2EE architecture but a very dangerous approach in building business applications.

The goal of a business application should be to provide a business advantage against the competition, not to be portable. Portability is a secondary goal. Non-portable sections can be isolated to limit the work required if porting is ever needed. We want our answers as fast as possible at the lowest cost possible. If you design for portability, you design for the lowest level of functionality, giving away any advantages. It is like an employer who refuses to hire a highly competent individual because he could someday leave and the next employee could be a lot less capable. Following this logic, we should hire the least competent individual possible. If this is not possible, we should make sure to limit the employee’s contribution to the company so we won’t be disappointed if one day we have to replace him with a less competent person. Expect more and you’ll get more. Expect less and you’ll get less and less over time.

The same should go for enterprise applications. You should weigh the benefits of all the features of your database system and take advantage of them if they give you a business advantage. You can isolate the database interactions in your design to limit the porting effort required in the unlikely event that you have to move to another database. Since database competition is intense, it is likely that the unique feature you are planning to use today will be in a competitor’s database in the future. If not, it can become part of the negotiation on how this new vendor will compensate for its deficiencies to get your business. So, design carefully but design to win.
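
One common way to isolate the database interactions is to put them behind an interface, so that only the implementation behind it needs rewriting in a port. The sketch below is a minimal illustration of that idea. All names are hypothetical, and the implementation shown returns canned data so that the example runs without a database; a real one would use the vendor-specific features discussed in this article.

```java
import java.util.Arrays;
import java.util.List;

// The rest of the application depends only on this interface.
interface BranchRiskDao {
    List<String> riskyBranches(double threshold);
}

// One implementation per database product. This stand-in returns canned
// data; a DB2 UDB or IDS version would be free to use product-specific SQL.
class InMemoryBranchRiskDao implements BranchRiskDao {
    public List<String> riskyBranches(double threshold) {
        return threshold < 0.5
            ? Arrays.asList("B-17", "B-42")
            : Arrays.asList("B-42");
    }
}

// Application code: unaware of which database sits behind the interface.
class RiskReport {
    private final BranchRiskDao dao;

    RiskReport(BranchRiskDao dao) {
        this.dao = dao;
    }

    String summary(double threshold) {
        return dao.riskyBranches(threshold).size() + " risky branch(es)";
    }
}
```

Moving to another database then means writing one new class per interface, not hunting for SQL scattered through the application.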





J2EE complexity

J2EE provides a large number of benefits that facilitate the development of business applications. They include platform independence, a component architecture, a multi-tiered application model, a unified security model, and an extensive set of standards that cover areas such as transaction control, database access, and messaging. In fact, J2EE integrates under one roof all the advances in software made over the last 50 years or so. This comes at a price: complexity.

Despite the apparent ease of development that tools such as WebSphere Studio Application Developer provide and the control afforded by the WebSphere Application Server administrative console, we must ensure that the proper expertise is developed before starting any major project. The most efficient approach is probably a mix of training and the use of expert consultants.

It is also important to have an integrated team that includes all the areas of expertise. The team should be put together before the project starts. For example, a database administrator (DBA) and an SQL expert should be involved from the start to participate in the discussions on how the different objects will be used. Let me illustrate this with an example.

Figure 4 shows the relationship between the DICOM objects. We see that each type of DICOMData can include a number of other DICOMData objects. We can represent this hierarchical relationship in the database through standard relational methods or using new data types. Please see the reference section for an example on handling hierarchies. The more pressing problem is the handling of data elements.

A data element is represented by a tag and a value. Different types of values exist, such as character string, date, decimal, and so on. A value can also have repeating fields. A DICOMData object can potentially contain hundreds of different data elements. In practice, the number is much lower.

Representing a DICOMData object as one large row of mostly empty data elements is not practical or even possible. Instead, we can represent a DICOMData object as a relationship between the DICOM object itself and its data elements. The first approach would be to have one data element table for each type of data element. This gives us 23 tables and a 24-table join to retrieve a DICOMData object. Other approaches could allow us to reduce the 23 tables to one. Even then, it is expensive to store and retrieve a DICOMData object using a relational-type database.

If database experts are involved right from the start, these issues can be discussed and new information can be uncovered that will make the implementation easier. It turns out that only a handful of elements from each type of DICOMData object are used for searches. This allows us to represent this handful of values as part of the DICOMData object and group all the other data elements together in a large object column. The result is a dramatic performance improvement in storing and retrieving DICOMData objects.
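The split between searchable values and a large object column can be sketched as follows. This is only an illustration under several assumptions: the class names DicomRow and DicomRowMapper are hypothetical, the choice of Patient ID (0010,0020) and Study Date (0008,0020) as the searchable elements is an example rather than the actual selection, every value is simplified to a string, and the packing format is an ad hoc one invented for the sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical row layout: a few searchable columns plus one large object column.
class DicomRow {
    final String patientId;      // searchable column (assumed choice)
    final String studyDate;      // searchable column (assumed choice)
    final byte[] otherElements;  // all remaining data elements, packed together

    DicomRow(String patientId, String studyDate, byte[] otherElements) {
        this.patientId = patientId;
        this.studyDate = studyDate;
        this.otherElements = otherElements;
    }
}

class DicomRowMapper {
    static final int PATIENT_ID = 0x00100020; // DICOM tag (0010,0020)
    static final int STUDY_DATE = 0x00080020; // DICOM tag (0008,0020)

    // Promotes the searchable elements to columns and packs the rest.
    static DicomRow toRow(Map<Integer, String> elements) {
        Map<Integer, String> rest = new LinkedHashMap<>(elements);
        String patientId = rest.remove(PATIENT_ID);
        String studyDate = rest.remove(STUDY_DATE);
        return new DicomRow(patientId, studyDate, pack(rest));
    }

    // Ad hoc format for illustration: element count, then (tag, value) pairs.
    static byte[] pack(Map<Integer, String> elements) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(elements.size());
            for (Map.Entry<Integer, String> entry : elements.entrySet()) {
                out.writeInt(entry.getKey());
                out.writeUTF(entry.getValue());
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reverses pack(): rebuilds the tag-to-value map from the large object.
    static Map<Integer, String> unpack(byte[] blob) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob));
            int count = in.readInt();
            Map<Integer, String> elements = new LinkedHashMap<>();
            for (int i = 0; i < count; i++) {
                int tag = in.readInt();
                elements.put(tag, in.readUTF());
            }
            return elements;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Searches touch only the promoted columns, and the large object is unpacked only when a complete DICOMData object is actually needed.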

In the previous section, we discussed the explosion of objects that is caused by asking each branch for the average risk it is taking on its loans (see Figure 5). The object-relational capabilities of DB2 UDB and IDS allow us to extend the database to include the risk calculation functions. We can then use either a stored procedure or a user-defined aggregate to calculate the average risk for each branch. We can even add the condition that defines the unacceptable risk level and retrieve only the list of branches that are at risk. The result is a dramatic increase in performance since we don’t have to instantiate one million objects and we avoid most of the messages between objects. Where an object person sees a lot of object creation and communication, a database person sees the opportunity to provide the answer directly. Processing in the right location provides a simpler and better-performing solution.
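To make the idea concrete, here is a hedged JDBC sketch that asks the database for the list of branches at risk instead of instantiating loan objects. It rests on assumptions: the table loan and its columns branch_id and risk are hypothetical, and the built-in AVG aggregate stands in for the stored procedure or user-defined aggregate described above, which would be specific to the risk model.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class BranchRiskQuery {
    // Hypothetical schema: loan(branch_id, risk).
    // AVG is a placeholder for a user-defined risk aggregate.
    static final String AT_RISK_BRANCHES_SQL =
            "SELECT branch_id, AVG(risk) AS avg_risk "
          + "FROM loan "
          + "GROUP BY branch_id "
          + "HAVING AVG(risk) > ? "
          + "ORDER BY branch_id";

    // Returns only the ids of branches whose average risk exceeds the threshold.
    // Grouping, aggregation, and filtering all happen inside the database.
    static List<String> branchesAtRisk(Connection connection, double threshold)
            throws SQLException {
        List<String> branchIds = new ArrayList<>();
        try (PreparedStatement statement = connection.prepareStatement(AT_RISK_BRANCHES_SQL)) {
            statement.setDouble(1, threshold);
            try (ResultSet results = statement.executeQuery()) {
                while (results.next()) {
                    branchIds.add(results.getString("branch_id"));
                }
            }
        }
        return branchIds;
    }
}
```

Only one short row per at-risk branch crosses the connection, regardless of how many loans the database had to examine.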

Other areas of J2EE add to the complexity. They include the multi-threaded environment, which implies the sharing of EJBs. How do you share the EJBs? What about the coordination of multiple EJBs in an update? Is your code re-entrant, or do you have critical sections that must be locked? These questions lead to issues of data integrity and locking strategy, issues that are well known to database people.

Then there is the performance of accessing EJBs and how coarse-grained they need to be to get decent performance. You also have to figure out how you will monitor your application to find performance bottlenecks.

The J2EE environment provides great benefits in the development and running of enterprise applications. Because of its complexity, it should not be taken lightly. Get the right team of experts involved early in your projects if you want to make them successful.





Looking for solutions

There is no ideal solution to the issues raised earlier in this article. It starts with proper education so you can use all the components of your solution to their fullest. This may include the use of applets in the Web browser, the proper design in the middleware, and taking advantage of the strengths of the database system.


Figure 6. Collection of objects

The examples above show that the use of collections of objects within an object should be examined carefully. We must determine if a collection is really necessary or if we could replace it with one or more methods that operate on the collection of objects through the database. In some cases, keeping the collection will be the right approach; in other cases, using methods will provide great benefits such as improved performance and reduced complexity.

Figure 6 illustrates the creation of the objects in the middleware and the communication between objects. If we can obtain the desired information directly from the database, the performance improvement comes from multiple sources. First, we get better performance by avoiding large data transfers between the application in the application server and the database server. The transfer could occur between two machines over a network connection, on the same machine through a network connection, or on the same machine using a shared-memory connection. When the communication is done between machines, the data transfer is affected by the hardware components it has to go through (memory, system bus, network controller, etc.), by network latency, and by the different levels of the network protocol stack, each of which adds information to every packet transferred between machines.

If the data transfer is on the same machine, it could still go through the network protocol and include all the processing required to manipulate the packets. The lowest overhead would be through a shared-memory connection. Even then, it still includes the database protocol overhead and a memory copy from the server space to the application space. This memory-to-memory copy is relatively slow compared to the speed of the processor.

When an application requests data from a database, the result is serialized through a database connection. The database server can internally process the information in parallel, providing much better throughput. Keep in mind that databases such as DB2 UDB and IDS are specialized engines that have been optimized for data set manipulation. Their algorithms have been optimized by experts with years of experience in large business application support. It would make sense to take advantage of this expertise instead of trying to re-invent it.

Once the application server receives all the data, it needs to instantiate a large number of objects and then communicate with each one to get the tidbit of information required for the calculation. The number of calls to the objects and the number of lines of code that must be executed give us an idea of the serious processing required to achieve the desired result. By using the database as the data processing engine, we avoid most of this overhead. It also reduces the complexity of the solution.

Once all the objects are instantiated, there is still a need to create the appropriate processing that will gather the information from each object, classify the information for proper grouping, and perform the aggregation for the final result. All this can be eliminated by using the database. Object people have often claimed the superiority of the object approach over procedural programming. When an object has to manipulate a collection of objects, it effectively uses procedural code to complete the operation. This is one more reason to leave the set processing to the database.
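The gather, classify, and aggregate steps look something like the sketch below when they are written in the middleware. The classes Loan and MiddlewareRiskAggregator are hypothetical names chosen for illustration. Note that the method is an ordinary procedural loop over a collection, and that every Loan must already have been instantiated from data transferred out of the database.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical middleware object: one instance per loan row.
class Loan {
    private final String branchId;
    private final double risk;

    Loan(String branchId, double risk) {
        this.branchId = branchId;
        this.risk = risk;
    }

    String getBranchId() {
        return branchId;
    }

    double getRisk() {
        return risk;
    }
}

class MiddlewareRiskAggregator {
    // Computes the average risk per branch from already-instantiated objects.
    static Map<String, Double> averageRiskByBranch(List<Loan> loans) {
        // Each entry holds {sum of risks, number of loans} for one branch.
        Map<String, double[]> totals = new TreeMap<>();
        for (Loan loan : loans) {
            // Gather: one call per object to obtain each piece of information.
            String branchId = loan.getBranchId();
            double risk = loan.getRisk();
            // Classify: find or create the group for this branch.
            double[] total = totals.get(branchId);
            if (total == null) {
                total = new double[2];
                totals.put(branchId, total);
            }
            total[0] += risk;
            total[1] += 1;
        }
        // Aggregate: turn the running totals into averages.
        Map<String, Double> averages = new TreeMap<>();
        for (Map.Entry<String, double[]> entry : totals.entrySet()) {
            averages.put(entry.getKey(), entry.getValue()[0] / entry.getValue()[1]);
        }
        return averages;
    }
}
```

A single GROUP BY query expresses the same computation declaratively and lets the database engine choose how to execute it.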

The decision of using or not using a collection of objects can be impacted by the types of objects manipulated and the type of manipulation. We saw earlier that some objects are really second-class objects that are retrieved only if the higher-level object is retrieved. This can represent an opportunity to leave them in the database as much as possible.

Some of the processing requires getting information from objects. If this information can be acquired directly from the database server, we can avoid instantiating them. This can be the case for aggregation calculations, as in the bank example above, but also in the case of reporting. In this latter case, the object would be instantiated to report some information and then would not be required anymore.

Even when objects update their state, object instantiation should be carefully considered. It still involves the object creation, the messages between objects, and the persistence of the new state. It may make more sense to work with the database directly. Instead of considering an object in the database as passive, we should consider it at least semi-active. This change in point of view could introduce a different application design approach. The result would be to take full advantage of the database to implement a solution.





Conclusion

A strong J2EE application takes advantage of the strengths of all of its components. This includes the browser or user interface, the application server, and the database engine. By putting the processing where it makes the most sense, you can reduce the application complexity and greatly improve the performance.

A database decision should be a strategic decision that will bring you a business advantage. Once the decision is made, you must use the database server to its fullest to realize this advantage. The database matters. By using all the capabilities of the database server, you can eliminate some data movement, reduce object creation and communication between objects, simplify your implementation and get to market faster. A balanced implementation that uses all the components to the fullest of their capabilities requires less hardware and less code. This reduces the costs of building and maintaining the application, savings that can be applied elsewhere in the enterprise to optimize the competitiveness of your business.

We need better tools that integrate the database capabilities into all the phases of OO development. This includes analysis, design, and implementation. In the meantime, we must ensure that these tasks are done right by bringing in database expertise early in the process.



Resources



About the author

Jacques Roy is a member of IBM’s worldwide sales support organization. He has over 20 years of industry experience and over 7 years of experience with database extensibility. He is the author of "Informix Dynamic Server.2000: Server-Side Programming in C" and co-author of "Open-Source Components for the Informix Dynamic Server 9.x" in addition to multiple technical articles.



