Use Hungarian notation to improve database design and application implementation

Level: Intermediate

Kenneth Stephen (kstephe@us.ibm.com), Software Engineer, IBM

15 Jul 2008

Hungarian notation is a naming convention that can be used in code and design artifacts. In this article, learn a simple technique of Hungarian notation that you can apply during data modeling and implementation to improve the quality of your applications.

Introduction

Hungarian notation is a naming convention in programming where the naming of a variable indicates its usage. Hungarian notation was designed to be language-independent. There are two types: Systems Hungarian notation and Apps Hungarian notation.

Hungarian notation, which was invented by Charles Simonyi, got its name because the prefixes make the variable names look like they're written in a non-English language, and because Simonyi is originally from Hungary. In Simonyi’s version of Hungarian notation, every variable was prefixed with a lower case tag that indicated the kind of thing that the variable contained.

In practice, Hungarian notation has been used mainly to indicate the type of the variable, and this has led to misconceptions about the usefulness of the technique. As it turns out, the Hungarian naming convention is quite useful. It is one technique among many that helps programmers produce better code faster.

In this article, learn a simple technique, using the notation, that you can apply during your data modeling and implementation phases.

Example

For example:

lSerialNumber <— indicates that the variable is of type long
sSerialNumber <— indicates that the variable is of type string

In most type-safe programming languages, the compiler already knows the type of each variable and will not allow incorrect type usage. In this case, encoding the type in the name adds little value.

The notation provides benefits in situations where the usage is indicated in the naming convention. For example:

custSerialNumber <— indicates that the serial number is that of a customer 
suppSerialNumber <— indicates that the serial number is that of a supplier

In this case, the compiler will not be able to do any semantic checking; the notation helps the programmer do that. In data modeling and database programming, such capabilities are quite useful. Keep reading to learn how to apply Hungarian notation to improve the quality of your model and programs.





Improving the data modeling experience

Consider a simple data model connected to a development process at a software company. The "crow's feet" notation is used here (see Resources for more about crow's feet). The company has several software projects. As shown in Figure 1, if a project is actively being worked on, then it has one or more features. If a feature is being worked on, then it has one or more developers associated with it.


Figure 1. Relationships among project, features, and developers

There are three different id fields and three different name fields. All three id fields are integers, and all three name fields are strings. A modeling tool or compiler can check whether you have incorrectly associated an id with a name, but it cannot tell you that something is wrong when the name from the project entity is related to the name from the feature entity. That is something that you, as the modeler, have to watch for. Hungarian notation can help by making the distinction clear, as shown in Figure 2.


Figure 2. Removing ambiguity by identifying source of data element

The company also has projects planned for the future. These are still in the requirements stage, and no features are defined for them yet. Analysts are assigned to flesh out the requirements for these projects, as shown in Figure 3.


Figure 3. Semantic conflict for project id

If you compare Figure 1 and Figure 3, you can see that the project_id field is being used in a different sense. This is a true semantic conflict. In the previous case, where the name fields were conflicting, the different name fields could have all been of different data types and a modeling tool could then have caught the problem if someone equated a feature name to a developer name. In this case, however, the field name and the data type are the same. The difference between the two fields is a pure semantic difference—something no tool could detect. Here again, Hungarian notation can make things clear.


Figure 4. Resolving the semantic conflict in Figure 3
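
As a rough sketch of what a prefixed physical model along these lines could look like, the following uses Python's sqlite3 module as a stand-in database. The column names d_id, d_name, d_f_id, f_id, f_prj_id, prj_id, pprj_id, a_id, and a_pprj_id are the ones used in the queries in this article. The remaining names (prj_name, pprj_name, f_name, a_name), the data types, and the constraints are assumptions made for illustration.

```python
import sqlite3

# Hypothetical DDL: the id and foreign-key column names come from the
# article's queries; the other names, the types, and the constraints
# are assumptions made for this sketch.
SCHEMA = """
create table project         (prj_id  integer primary key, prj_name  text);
create table planned_project (pprj_id integer primary key, pprj_name text);
create table feature (
    f_id     integer primary key,
    f_name   text,
    f_prj_id integer references project (prj_id)
);
create table developer (
    d_id   integer primary key,
    d_name text,
    d_f_id integer references feature (f_id)
);
create table analyst (
    a_id      integer,
    a_name    text,
    a_pprj_id integer references planned_project (pprj_id)
);
"""

TABLES = ("project", "planned_project", "feature", "developer", "analyst")


def column_names(conn, table):
    """Return the column names of a table, in declaration order."""
    return [row[1] for row in conn.execute(f"pragma table_info({table})")]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Collect every column name in the schema.
all_columns = [c for t in TABLES for c in column_names(conn, t)]

# With the prefixes in place, no column name repeats anywhere in the schema.
print(len(all_columns) == len(set(all_columns)))
```

Because every column name is unique across the schema, a name such as f_prj_id identifies both the table it lives in (feature) and the table it refers to (project), without any help from the surrounding query.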




Improving the implementation

A physical implementation of a model that uses Hungarian notation also has benefits at development time. When tables are joined, similarly named fields have to be unambiguously identified. This is usually accomplished using correlation names. For example, suppose you want to find out whether any developers working on an active project are also working as analysts on future projects. The following sample code shows how you would write that SQL statement if you were not using Hungarian notation.

select d.name 
from developer d, feature f, project prj, planned_project pprj, analyst a
where d.feature_id = f.id
and f.project_id = prj.id 
and a.id = d.id
and a.project_id = pprj.id

With Hungarian notation, the same SQL statement would look as follows:

select d_name 
from developer, feature, project, planned_project, analyst
where d_f_id = f_id
and f_prj_id = prj_id 
and a_id = d_id
and a_pprj_id = pprj_id

Eliminating a few correlation names and replacing the ‘.’ with a ‘_’ might seem like a cosmetic change, but it's more than that. In the first form, it's easier to make a mistake, such as saying “f.project_id = pprj.id” because there isn’t any identification in “f.project_id” that the “project_id” comes from the project table and not from the “planned_project” table. Hungarian notation makes the source of the data element obvious, which reduces programmer errors.
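
The kind of mistake described above can be reproduced in a few lines. The sketch below uses Python's sqlite3 module and an unprefixed schema; the table contents are made-up sample data, and only three of the five tables are created to keep the example short. It shows that the database accepts the mistaken join just as readily as the correct one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table project         (id integer primary key, name text);
create table planned_project (id integer primary key, name text);
create table feature (id integer primary key, name text, project_id integer);

-- Hypothetical sample rows. Both project tables happen to contain id 1.
insert into project         values (1, 'active project');
insert into planned_project values (1, 'future project');
insert into feature         values (10, 'login page', 1);
""")

# Correct join: feature.project_id refers to project.id.
correct = conn.execute(
    "select prj.name from feature f, project prj "
    "where f.project_id = prj.id"
).fetchall()

# Mistaken join: same column, wrong table. No error is raised.
mistaken = conn.execute(
    "select pprj.name from feature f, planned_project pprj "
    "where f.project_id = pprj.id"
).fetchall()

print(correct)   # [('active project',)]
print(mistaken)  # [('future project',)]
```

Both statements run, and the second silently returns a row from the wrong table. In the prefixed model the same slip would have to be written as f_prj_id = pprj_id, where the mismatched prefixes are visible in the statement itself.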





Applying the technique correctly

Like most best practices, Hungarian notation can cause problems if it is applied incorrectly in database design. For example, a recent application was a Web service that exposed a data model to command line queries. The data model contained the data elements that business users were interested in. Hungarian notation was applied in the data model design, resulting in the entities shown in Figure 5.


Figure 5. Logical and physical data entities

The end users of the command line queries did not know anything about the internal naming conventions in the database. When they used the command line queries, they specified the name of the data elements known to them. For example, if they wanted the end of service date, they would specify “end_of_service_date” rather than “prv_end_of_service_date.” This meant, of course, that the application would have to specify and maintain a mapping between the external names and the internal names.

The situation was complicated because one of the main objectives of the command line client was to implement a simple SQL-like query interface. The command line query program had to support simple inner joins and the use of SQL functions on the data elements. To enable this, the application needed to differentiate between a name used as a string, a name used as a column name, and a name used as a correlation name or table name. This required a full-fledged parser that could make those distinctions.

For example, consider the following user command:

Query -select "max(gen_avail_date), max(gold_master_date)" -from "product_release" -where
"end_of_service_date < '2010-01-01'"

The application would now need to map this to:

select max(prv_gen_avail_date), max(prv_gold_master_date)
from product_release_view
where prv_end_of_service_date < '2010-01-01'

This mapping implies that the application has a parser that understands SQL function syntax as well as complexities such as correlation names and joins.
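
To see why a plain textual substitution is not enough, consider the following Python sketch. The mapping table and the naive_rename function are hypothetical and are not taken from the application described here. They replace every whole-word occurrence of an external name with its internal name, without parsing the fragment.

```python
import re

# Hypothetical external-to-internal name mapping for the view above.
NAME_MAP = {
    "gen_avail_date": "prv_gen_avail_date",
    "gold_master_date": "prv_gold_master_date",
    "end_of_service_date": "prv_end_of_service_date",
}

# Matches any external name as a whole word.
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, NAME_MAP)) + r")\b")


def naive_rename(fragment):
    """Rename external names in a query fragment without parsing it."""
    return PATTERN.sub(lambda m: NAME_MAP[m.group(1)], fragment)


# Works when the name is used as a column name.
ok = naive_rename("end_of_service_date < '2010-01-01'")

# Fails when the same text appears inside a string literal
# ("note" is a made-up column used only for this illustration).
broken = naive_rename("note = 'end_of_service_date'")

print(ok)      # prv_end_of_service_date < '2010-01-01'
print(broken)  # note = 'prv_end_of_service_date'
```

The second call changes the contents of a string literal, which alters the meaning of the query. Avoiding that requires knowing whether each occurrence is a string, a column name, or a table or correlation name, which is the job of a parser.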

All of this complexity was unnecessary. If the database view had used the exact same names as the external business elements, no mapping would be required. The application would simply be able to take the different portions of the command line query, as specified by the user, and use that to build an SQL statement dynamically.

Figure 6 shows what is required.


Figure 6. Hiding the notation

The external view is a view defined on top of the internal view. This way, the data modeling can have all the advantages associated with Hungarian notation. The SQL statements are programmatically built, based on end-user input, and so don’t suffer from the problem of programmers getting the data semantics wrong.
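
A minimal sketch of this arrangement follows, again using Python's sqlite3 module. The names product_release_view, product_release, and the three date columns come from the example above. The sample rows, the use of a table as a stand-in for the internal view, and the build_query helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stand-in for the internal view: prefixed names, hypothetical rows.
create table product_release_view (
    prv_gen_avail_date      text,
    prv_gold_master_date    text,
    prv_end_of_service_date text
);
insert into product_release_view values ('2007-03-01', '2007-01-15', '2009-06-30');
insert into product_release_view values ('2008-05-01', '2008-04-01', '2011-12-31');

-- External view: the same data under the business names, without prefixes.
create view product_release as
select prv_gen_avail_date      as gen_avail_date,
       prv_gold_master_date    as gold_master_date,
       prv_end_of_service_date as end_of_service_date
from product_release_view;
""")


def build_query(select, from_, where):
    """Assemble the user's fragments into one statement, with no name mapping."""
    return f"select {select} from {from_} where {where}"


# The three fragments are passed through exactly as the user typed them.
sql = build_query(
    "max(gen_avail_date), max(gold_master_date)",
    "product_release",
    "end_of_service_date < '2010-01-01'",
)
rows = conn.execute(sql).fetchall()
print(rows)  # [('2007-03-01', '2007-01-15')]
```

The application no longer needs a mapping table or a parser for this step, because the database resolves the external names through the view. A real application would still need to validate the user's fragments before executing them.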



Resources

Learn

Get products and technologies
  • Download IBM product evaluation versions and get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.

Discuss


About the author

Kenneth Stephen is an application architect with more than 17 years of experience in IT, primarily involving developing and maintaining applications that use relational databases. He has worked as a database designer and modeler for over six years. He currently works for IBM in AIX development. He holds a bachelor of technology degree in electronics and communication engineering from Kerala University, India.






IBM, the IBM logo, ibm.com, DB2, developerWorks, Lotus, Rational, Tivoli, and WebSphere are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. These and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml Other company, product, or service names may be trademarks or service marks of others.