 | Level: Intermediate Robert Brunner (rb@ncsa.uiuc.edu), NCSA Research Scientist, Assistant Professor of Astronomy, University of Illinois, Urbana-Champaign
11 Jul 2006 The SQL SELECT statement lets you perform queries against an Apache Derby database to select data from multiple tables where certain conditions are valid. This article builds on previous articles in this series to develop even more powerful SELECT statements. Learn how to use the ORDER BY clause and DISTINCT keywords to modify the rows of data selected by a query; how to include basic mathematical operators in a query; and how to work with the primary built-in SQL functions to convert data from one data type to another, to compute aggregate quantities, to perform mathematical operations, to work with date and time data, and to work with character strings.
Database queries
The previous article, "Developing with Apache Derby -- Hitting the Trifecta: Database development with Apache Derby, Part 4," introduced the fundamental concepts related to selecting data from an Apache Derby database. These included the SELECT, FROM, and WHERE clauses, and the concept of joining two tables together to facilitate more advanced data selection. That article also
discussed how to initialize your Derby workspace by using the included SQL script file to build
and populate the Apache Derby database demonstrated in the example code listings. If you
haven't already done so, you should read the previous article and initialize your Derby
workspace by running the SQL script available for download with that article.
This article builds on these basic SQL concepts to facilitate more advanced data-selection operations, including
the use of basic mathematical operations and SQL functions within a query. You can use these operations and
functions in the WHERE clause to strengthen the data restrictions in your query or
apply them in the SELECT clause to extract computed values. First, however, the article discusses
several additional SQL query keywords that you can use to modify the results of
your query. Start Derby's ij tool, connect to your test
database, and follow along to learn more about writing SQL queries with the Apache Derby database.
Modify the selection of data
All the queries presented in the previous article were straightforward, selecting a set of
columns from all rows in a table, from a simple join of two tables, or from a query that included
a simple WHERE clause. Although useful, this functionality corresponds to
performing a data dump. Fortunately, SQL provides a number of techniques to filter
both the rows selected from a table and the data extracted. The rest of this section
discusses two basic techniques for controlling the data extracted by a query: the DISTINCT keyword and the ORDER BY clause.
Extract distinct rows
By default, when data is selected by using an SQL query, all rows that satisfy the WHERE clause are extracted from the database. In some cases, this may result in
rows that have identical column values being returned. If you need to restrict your query so that only unique row values are
returned, you can use the DISTINCT qualifier, as shown in Listing 1.
Listing 1. Using the DISTINCT qualifier in a query
rb$ java org.apache.derby.tools.ij
ij version 10.1
ij> connect 'jdbc:derby:test' ;
ij> SELECT DISTINCT vendorNumber AS "Vendor #" FROM bigdog.vendors ;
Vendor #
-----------
3
2
1
3 rows selected
ij> SELECT DISTINCT vendorNumber AS "Vendor #", itemNumber as "Item #"
FROM bigdog.vendors WHERE itemNumber > 5 ;
Vendor # |Item #
-----------------------
1 |6
1 |7
1 |8
3 |9
3 |10
5 rows selected
ij> |
 |
The 411 on DISTINCT
If you want to use the DISTINCT qualifier, it must be the first item
listed in the SELECT clause, as shown in Listing 1, and you can have only one DISTINCT qualifier per SELECT clause. If the selected rows contain a column with NULL values, multiple NULL values are considered duplicates when identifying unique rows.
|
|
The first query in this listing uses the DISTINCT qualifier to restrict the output of
the query to only distinct, or unique, values of the vendorNumber
column, which is the only column listed in the SELECT clause. In the
example schema that these articles use, there are only three vendors (with vendorNumber being restricted to 1, 2, or 3). Thus, when the DISTINCT qualifier is used in the query, only three rows are selected.
The DISTINCT qualifier, however, applies to the entire list of selected
columns, so if multiple columns are listed following a DISTINCT
keyword, only unique combinations of all the columns are selected. This is demonstrated in
the second example, where both vendorNumber and itemNumber are listed in the SELECT
clause. Because every item has a unique itemNumber, every combination of
these two columns is unique, and all rows that satisfy the WHERE clause
are selected -- in other words, the DISTINCT qualifier has no effect on
the results.
One remaining point that you may have noticed from the examples in Listing 1 is that the selected
rows were not in the same order. If the order of selected rows is important, you can easily
control it by using an ORDER BY clause in your query, as shown in next section.
Ordering rows
In general, you can't assume that Apache Derby, or any database, will return rows from a query in a specific
order. If the order is important, you can use the ORDER BY clause to
have Apache Derby order the data that are returned by your query in a particular manner. Generally,
you do so by specifying a column that should be used to provide the ordinal values for
comparison, as shown in Listing 2.
Listing 2. Using the ORDER BY clause in a query
ij> SELECT v.vendorNumber AS "Vendor #", vendorName as "Vendor",
p.price as "Price", p.itemNumber AS "Item #"
FROM bigdog.products AS p, bigdog.vendors AS v
WHERE p.itemNumber = v.itemNumber AND p.price > 20.0
ORDER by v.vendorNumber ;
Vendor # |Vendor |Price |Item #
---------------------------------------------------------------
1 |Luna Vista Limited |32.95 |8
1 |Luna Vista Limited |24.95 |7
1 |Luna Vista Limited |99.99 |2
2 |Mikal Arroyo Incorporated |49.95 |5
2 |Mikal Arroyo Incorporated |29.95 |4
3 |Quiet Beach Industries |34.95 |10
6 rows selected
ij> SELECT v.vendorNumber AS "Vendor #", vendorName as "Vendor",
p.price as "Price", p.itemNumber AS "Item #"
FROM bigdog.products AS p, bigdog.vendors AS v
WHERE p.itemNumber = v.itemNumber AND p.price > 20.0
ORDER BY v.vendorNumber ASC, p.price DESC ;
Vendor # |Vendor |Price |Item #
---------------------------------------------------------------
1 |Luna Vista Limited |99.99 |2
1 |Luna Vista Limited |32.95 |8
1 |Luna Vista Limited |24.95 |7
2 |Mikal Arroyo Incorporated |49.95 |5
2 |Mikal Arroyo Incorporated |29.95 |4
3 |Quiet Beach Industries |34.95 |10
6 rows selected
ij> |
 |
Using column names: A best practice
Although using column numbers may seem like a handy shortcut, it generally isn't a good idea. To
see why, consider what happens if you modify the columns listed in a SELECT clause or just modify their order. If you forget to modify the
numbers used in the ORDER BY clause, the query will break -- or
worse, return bad data. In general, it's a best practice to always be explicit and specify
the column names directly, even if doing so means more typing.
|
|
In Listing 2, the first query uses the ORDER BY clause
to list a subset of all the rows in the table that results from joining the bigdog.vendors table to the bigdog.products table.
The rows are ordered by vendorNumber (the subset is constructed by applying the WHERE clause). An ORDER BY clause can take either
a column name, as in this example, or a column number, which is taken from the order in
which the columns are listed after the SELECT keyword.
You can also specify multiple columns to use during the sorting process and
even specify ASC for ascending order, which is the default, or DESC for descending order. For example, if you used the ORDER BY 1 DESC, 4 DESC clause in the first query, the query would return
the same rows, but they would be ordered by using the vendorNumber column as the
primary sort column in descending order followed by the itemNumber column as the secondary sort
column in descending order.
Do math in a query
Selecting columns from a database provides a number of useful benefits, but being able to compute
and select quantities based on data in a table opens up even more possibilities. Apache Derby
provides several mathematical operators, detailed in Table 1, that you can use in either a
SELECT clause or a WHERE clause.
Table 1. Apache Derby mathematical operators
| Operator | Example | Description |
|---|
unary + | +1.0 | A noop, or no operation, as +4 = 4 | unary - | -p.price | Changes the sign of the value to which it's applied | + | p.itemNumber + 10 | Adds the second value to the first value | - | p.itemNumber - 10 | Subtracts the second value from the first value | * | p.price * 1.0825 | Multiplies the first value by the second value | / | p.price / 100.0 | Divides the first value by the second value |
Using these operators is straightforward because they generally behave exactly as you expect. For example,
if the sales tax is 8.25%, you can return the price for an item both before and after sales tax has been
applied by using SELECT price, price * 1.0825 FROM bigdog.products ;.
As another example, if you have a column called numberItems that tracks the number of items purchased and another column called price that contains the price at which they're
purchased, you can return the total amount paid for those items at a given price by using
numberItems * price. Several of the queries shown in the code listings
in the rest of this article provide additional examples of how to use these operators.
The only concern when using these operators arises from complications that result from using different data types,
such as integer or floating-point, in a mathematical operation. If both operands are the same data type, the result type will be the same.
If you're performing division, this can result in truncation (for example, if you're using two
integer values), which might cause unexpected problems. You can handle this issue properly by using the CAST function described later in this article. On the other hand, if the two operands are different data types,
the result type is promoted to the more complex type.
Use Derby's SQL functions
SQL is a powerful and expressive language that can be used to perform a wide range of actions.
Part of the SQL language's power comes from its ability to directly interact with a
variety of data types. This section discusses the SQL-92 standard built-in functions provided by Apache Derby, which you can use in your
SQL statements to simplify many common data-manipulation tasks. For the purposes of this article,
these functions are broken into five categories:
- Type conversion
- Aggregate
- Math
- Date and time
- Character string
Type-conversion functions
The simplest type of function to use in Apache Derby is the type-conversion function. These functions can be used to explicitly convert one type of data, such as a character string, to another type of data, such as a date or integer, or vice versa. There are two main categories of type-conversion functions: the explicit type functions, detailed in Table 2, and
the CAST function, which provides a more powerful approach to type
conversion.
Table 2. Apache Derby type-conversion functions
| Function | Example | Description |
|---|
BIGINT | BIGINT(123.45) | Returns a 64-bit integer constant for a given character string or number. | CHAR | CHAR(123.45) | Returns a fixed-length character representation, up to a maximum length of 254 bytes, of a given value, which must be one of the
built-in Apache Derby types. An optional second argument can be supplied to specify the
length of the character string. | DATE | Date('2006-05-16') | Returns a date representation of the input value. | DOUBLE | DOUBLE(123.45) | Returns a double-precision, floating-point representation of the input number or character
string. | INTEGER | INTEGER(123.45) | Returns an integer constant for a given character string, date, time, or number. | SMALLINT | SMALLINT(123.45) | Returns a small integer constant for a given character string or number. | TIME | TIME('12:24:30') | Returns a time representation of the input value. | TIMESTAMP | TIMESTAMP('2006-05-16','12:24:30') | Returns a timestamp representation of the input value. | VARCHAR | VARCHAR(123.45) | Returns a variable-length character representation, up to a maximum length of 32,672 bytes, of a given date, time, timestamp, or
character-string value. |
You can use the CAST function to change the length of a character
string or convert one type of data to another, including data types that are not among those
supported by the type-conversion functions listed in Table 2 (such as DECIMAL). In this manner, you can use CAST to make the columns selected by a query more readable or more compact or to preserve numerical precision in a mathematical
expression.
 |
Explicit type conversions: A best practice
So why bother, given that Derby often performs implicit type conversions automatically? The simple answer is that being explicit makes it easier to understand the logic of your code. When code is easier to
understand, it's easier to maintain and is generally more
robust against programming errors. Being explicit when writing SQL queries is always a best practice. |
|
When you're extracting results from a database within a Java™ program, however, you also have the option of converting data types within the application. Both methods provide the same result, but you gain several performance benefits by
having the database cast the values, because you can minimize the network traffic by having the
database send only the actual data required and push any processing to the database engine and
away from a potential thin client. This approach may overburden a database
server if taken to an extreme, so you should use it only when warranted.
Using these type-conversion functions is straightforward, as demonstrated in Listing 3.
Listing 3. Using the CAST operator
ij> SELECT p.price AS "Price",
DOUBLE(p.price * 1.0825) AS "Float Total",
CAST(p.price * 1.0825 AS DECIMAL(6,2)) AS "Total",
CAST(p.description AS CHAR(20)) AS "Description"
FROM bigdog.products AS p
WHERE p.price > 21.0 AND p.stockDate < Date('2006-01-21') ;
Price |Float Total |Total |Description
--------------------------------------------------------------
24.95 |27.008375 |27.00 |White beach towel
32.95 |35.668375 |35.66 |Blue-striped beach to
2 rows selected
|
The single query shown in Listing 3 demonstrates how to properly use several type-conversion
functions. The second line of this query converts the product of the selected price column and the numerical factor of 1.0875, which applies a tax of 8.75% to the price of an item, to a DOUBLE precision floating-point
number. The third and fourth lines use the CAST function to first
convert this same product to a DECIMAL type and then truncate the
selected description column to display only 20 characters. Notice
that the CAST function truncated the calculated price when converting
the value to a DECIMAL data type. Finally,
the DATE function converts a date character string to a
DATE data type. Although this step isn't required, being explicit is
a best practice (see the sidebar on explicit types).
This query also demonstrates the use of the AS clause to rename columns in a
query. This is important because the query uses functions in place of column names. When Apache
Derby doesn't have an explicit name for a column, it automatically uses numbers, which
provide little insight into the nature of the data. By explicitly providing names for all
columns, you avoid this problem. In addition, because these names include white space, you must enclose
them in double quotation marks to indicate to the ij tool that they are character-string constants.
Compute aggregate values
Apache Derby also provides functions that operate on multiple rows, known as aggregate
functions, which you can use to enhance a query. Aggregate functions -- also known as set functions in SQL-92 or, more informally, as column functions -- return a computed
quantity from a column over a number of rows. Apache Derby supports five aggregate functions, which are detailed in Table 3.
Table 3. Apache Derby aggregate functions
| Function | Example | Description |
|---|
AVG | AVG(p.price) | Returns the average value of a column from all rows that satisfy an expression. Can only be used with
built-in numeric data types. The precision of the returned value is defined by the precision
of the column being evaluated. | COUNT | COUNT(p.price) | Returns the number of rows that satisfy an expression, such as a query. Can be used with
any data type. | MAX | MAX(p.price) | Returns the maximum value of a column from all rows that satisfy an expression. Can only be used with
built-in data types. | MIN | MIN(p.price) | Returns the minimum value of a column from all rows that satisfy an expression. Can only be used with
built-in data types. | SUM | SUM(p.price) | Returns the sum of a column over all rows that satisfy an expression. Can only be used with
built-in numeric data types. |
These aggregate functions can often be used to quickly find useful information
that might otherwise be difficult to identify, as shown in Listing 4.
Listing 4. Using aggregate functions in a query
ij> SELECT COUNT(p.itemNumber) AS Number,
CAST((AVG(p.price) + 0.005) AS DECIMAL(5,2)) AS Average,
MIN(p.stockDate) AS "First Date", MAX(p.stockDate) AS "Last Date"
FROM bigdog.products AS p ;
NUMBER |AVERAGE |First Date|Last Date
---------------------------------------------
10 |31.66 |2005-12-20|2006-03-31
1 row selected
|
Listing 4 uses four of the five aggregate functions to get summary information about
the data in the bigdog.products table. The COUNT
function indicates that the table includes ten rows (because the query didn't use a WHERE clause to restrict the rows selected from the table). The
AVG function calculates the average price of all items in the bigdog.products table. Because this calculation results in a
floating-point number, the query explicitly casts the result back to the suitably sized DECIMAL data type. Unlike in Listing 3, this example adds 0.005 to force the CAST function to properly
round the calculated value. Finally, the MIN and MAX functions extract the minimum and maximum dates from the bigdog.products table.
Math functions
Apache Derby provides three built-in functions, detailed in Table 4, that let you perform
more complex mathematical operations in a SQL query, especially when used in concert with the mathematical
operators detailed earlier in this article. These functions can be used in either the SELECT or the WHERE clause of a query.
Table 4. Apache Derby math functions
| Function | Example | Description |
|---|
ABS or ABSVAL | ABS(-1.0) | Returns the absolute value of an expression, which must be
one of the built-in numeric types. The return type is the same as the argument. | MOD | MOD(1, 2) | Returns the remainder when the first argument is divided by
the second argument. The return type is the same as the argument with the biggest integer
type (SMALLINT, INTEGER, or BIGINT). The sign of the result is determined solely from the sign of the first
argument. | SQRT | SQRT(0.5) | Returns the square root of an expression, which must be a
floating-point value. The return type is the same as the argument. |
Date and time functions
Apache Derby provides considerable support for date and time data types. To simplify
using these types in a database query, there are six date and time functions, which are detailed in Table 5.
These functions often provide an alternative to using the type-conversion functions described
earlier. For example, you can replace the p.stockDate <
Date('2006-01-21') part of the WHERE clause in the query
shown in Listing 3 with the YEAR(p.stockDate) < 2006.
Table 5. Apache Derby date and time functions
| Function | Example | Description |
|---|
DAY | DAY(p.stockDate) | Returns an integer that contains the day component of a date, timestamp, or character
string that contains a valid date | HOUR | HOUR('12:21:30') | Returns an integer that contains the hour component of a time, timestamp, or character
string that contains a valid time | MINUTE | MINUTE('12:21:30') | Returns an integer that contains the minute component of a time, timestamp, or character
string that contains a valid time | MONTH | MONTH('2005-12-20') | Returns an integer that contains the month component of a date, timestamp, or character
string that contains a valid date | SECOND | SECOND('12:21:30') | Returns an integer that contains the second component of a time, timestamp, or character
string that contains a valid time | YEAR | YEAR(p.stockDate) < 2006 | Returns an integer that contains the year component of a date, timestamp, or character
string that contains a valid date |
Character-string functions
The last category of functions in Apache Derby simplifies the task of working with character data.
These eight functions, detailed in Table 6, can be used to count character values, convert character
values to all uppercase or all lowercase, remove white space at the start or end of the string,
find a specific substring, or select a substring.
The character values used by these functions can be database columns that are one of the character data types supported by
the Apache Derby database, database columns that can be directly converted to a character data
type, or constant character strings. For example, the query SELECT
UCASE(p.description), UCASE('hello world'), UCASE(p.stockDate) FROM bigdog.products AS
p ; returns three columns, which are all displayed as uppercase character strings.
Table 6. Apache Derby character-string functions
| Function | Example | Description |
|---|
|| | (p.description||v.vendorName) | The concatenation operator combines two values into a new character string. If both values
are of type CHAR, the result is also of type CHAR. If the values are of type VARCHAR
or a numerical data type, which can be converted to type VARCHAR,
the resulting type is VARCHAR. | LCASE, or LOWER | LCASE(p.description) | Returns a character string in which all alphabetical characters in the input value have been converted to lowercase. | LENGTH | LENGTH(p.description) | Returns the number of characters in the input value. Noncharacter data is
implicitly converted to a character string. | LOCATE | LOCATE('beach',p.description) | Returns the starting location for the first occurrence of a substring in a search string,
or zero if the substring isn't found. The first argument is the substring, the second
argument is the search string, and an optional starting position can be
supplied as a third argument. | RTRIM | RTRIM(p.description) | Returns a character string in which all spaces at the end of the input value have been
removed. | LTRIM | LTRIM(p.description) | Returns a character string in which all spaces at the start of the input value have been
removed. | SUBSTR | SUBSTR(p.description, 1, 4) | Returns part of an input character string as a VARCHAR,
starting at the location specified and continuing until the end of the character string, or
to the location specified by the optional third argument. If the starting location is
positive, it's relative to the beginning of the string; if it's negative, it's
relative to the end of the string. | UCASE, or UPPER | UCASE(p.description) | Returns a character string in which all alphabetical characters in the input value have been converted to uppercase. |
 |
Summary
This article introduced several techniques for writing more powerful SQL queries. First, the
DISTINCT keyword was used to select only unique rows in a query. Second, the ORDER BY clause was used to sort the selected rows in a query by one or more criteria. Third, you read about the basic
math operators that can be used in a query. And finally, you learned about the basic built-in SQL functions in detail. These functions can often be used to simplify complex SQL queries.
These techniques will be further demonstrated in future articles where you'll use them to selectively delete or modify data in an existing database.
Resources Learn
- Check out the other articles in this series:
- "Developing with Apache Derby -- Hitting the Trifecta: Introduction to Apache Derby" (developerWorks, February 2006)
introduces the Apache Derby database and provides the foundation for many topics in this series.
- "Developing with Apache Derby -- Hitting the Trifecta: Database development with Apache Derby, Part 1: Start working with Derby today" (developerWorks, March 2006) introduces the
ij tool and demonstrates how to use it to connect to an Apache Derby database.
- "Developing with Apache Derby -- Hitting the Trifecta: Database development with Apache Derby, Part 2: Schemas" (developerWorks, April 2006) covers several database concepts, including schemas, tables, and column data tapes, and gives you a simple introduction to SQL.
- "Developing with Apache Derby -- Hitting the Trifecta: Database development with Apache Derby, Part
3: Running scripts and inserting data" (developerWorks, May 2006) introduces the concept of executing SQL scripts with Apache Derby and demonstrates how to
insert data into tables in an Apache Derby database.
- "Developing with Apache Derby -- Hitting the Trifecta: Database development with Apache Derby, Part
4: Select data with a query" (developerWorks, June 2006) introduces the concept of SQL queries, including the basic SELECT, FROM, and WHERE
clauses.
- Peruse the Apache Derby Project online manuals for more detailed information about how to use the Apache Derby database.
- Take the Apache Derby Project tutorial that details how to
download and install Apache Derby.
-
Learn how to properly verify your Derby download.
- Check out the developerWorks Apache Derby project area for articles, tutorials, and other resources to help you get started with Derby today.
- Learn more about the IBM® Cloudscape™ database, which is built using the Apache Derby code base.
- Browse all the Apache articles and free Apache tutorials available in the developerWorks Open source zone.
- Visit the developerWorks Open source zone for extensive how-to information, tools, and project updates to help you develop with open source technologies and use them with IBM's products.
- Browse for books on these and other technical topics at the Safari bookstore.
Get products and technologies
Discuss
About the author  | 
|  | Robert J. Brunner is a Research Scientist at the National Center for Supercomputing Applications and an Assistant Professor of Astronomy at the University of Illinois, Urbana-Champaign. He has published several books and a number of articles and tutorials on a range of topics. You can reach him at rb@ncsa.uiuc.edu. |
Rate this page
|  |