Basic UNIX filesystem operations

Reading through directories for fun and profit

Level: Introductory

Chris Herborth (chrish@pobox.com), Freelance Writer, Author

23 May 2006

Take advantage of the readdir() and stat() functions to run through the entries of a directory. Because of the plethora of files and directories on a UNIX® system, you're going to need to know how to process directory entries using the readdir() function and extract information about those entries using the stat() function. These basic file system operations can serve you well in your UNIX programming career, allowing you to easily discover and read files, directories, and symbolic links on your UNIX system.

Introduction

The "everything is a file" philosophy of UNIX® means you'll be working with files and directories all the time, no matter what sort of application you're writing. Data, configuration settings, and even devices are all accessed as files, so the functions declared in the stdio.h system header will become good friends after only a few hours of UNIX programming.

A common problem that always seems to trip up new UNIX programmers is how to run through a directory and process the files, directories, and symbolic links found inside. How can you get a list of them, and how can you tell what they are?

Read on to find out how to use the dirent.h family of functions (opendir()/readdir()/closedir()) to read through the entries in a directory, and the stat() function to figure out what those entries are.

Before you start

The sample code included with this article (see Download) was written in Eclipse 3.1 using the C/C++ Development Tools (CDT); the readdir_demo project is a Managed Make project, which is built using the CDT's program generation rules. You won't find a Makefile in the project, but the program is simple enough that you'll have no trouble writing one if you need to compile the code outside of Eclipse.

If you haven't tried using Eclipse yet, you should really give it a go. It's an excellent integrated development environment (IDE) that just gets better with each release. That's coming from a die-hard EMACS and Makefile-based developer, too. See the Resources section at the end of the article for links to some excellent Eclipse articles.

Reading directory entries

Given a path to a directory, how do you go about reading the entries inside? You can't portably read a directory as if it were a file (with open() and read(), or fopen() and fread()), and even on systems that let you, the raw data is specific to the kind of file system you're using and not something a casual programmer should be messing with.

The dirent.h functions, opendir(), readdir(), and closedir(), are just what you need. Using them is very similar to the open/read/close idiom you're probably used to using with files, with one exception: the readdir() function returns a pointer to a special structure (of type struct dirent) for each directory entry, one at a time. In general, a run through a directory looks something like the pseudocode in Listing 1.


Listing 1. Reading the contents of a directory

dir = opendir( "some/path/name" )
entry = readdir( dir )
while entry is not NULL:
    do_something_with( entry )
    entry = readdir( dir )
closedir( dir )

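The opendir() and readdir() functions both return NULL if a problem occurs, and set the errno global variable to indicate what went wrong. readdir() also returns NULL when it reaches the end of the directory, but in that case it leaves errno unchanged, so set errno to 0 before each call. If readdir() then returns NULL and errno is still 0 (some systems call this value EOK or ENOERROR), there are no more directory entries.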

One thing to watch out for is that each directory contains "." (a reference to the directory) and ".." (a reference to the directory's parent directory) entries. Depending on what you're doing, you might need to skip processing for these entries.
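
A small helper makes that skip explicit. The name is_dot_or_dotdot() is hypothetical; because d_name is an ordinary NUL-terminated string, plain strcmp() is enough to test it.

```c
#include <string.h>

/* Return 1 if name is "." or "..", and 0 for every other name. */
int is_dot_or_dotdot( const char *name )
{
    return strcmp( name, "." ) == 0 || strcmp( name, ".." ) == 0;
}
```

Inside a readdir() loop you'd write `if( is_dot_or_dotdot( entry->d_name ) ) continue;` before doing any real work. Note that names such as ".profile" or "..." are ordinary entries and must not be skipped.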

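Note that readdir() isn't guaranteed to be thread-safe: the structure it returns may be overwritten by the next call on the same directory stream, and some implementations return a pointer to shared static storage. Most modern UNIXes provide a reentrant readdir_r() that you can use instead if you're writing threaded code, although the GNU C Library has since deprecated readdir_r() in favor of calling readdir() on a separate directory stream in each thread.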

What's in a struct dirent?

The POSIX 1003.1 standard defines only one required entry for struct dirent, an array of char named d_name. This is the entry's name as a standard NUL-terminated string. Anything else found in this structure is specific to your UNIX system.

That's right, everything else found in struct dirent is not portable. Strictly conforming systems might not have anything else in there at all. If you write code that uses any extra structure members, you'll have to flag it as not portable and, if you're feeling particularly friendly, include an alternate code path that does the same thing.

For example, many UNIXes include a d_type member and some additional constants (such as DT_DIR and DT_REG) that let you check a directory entry's type without making an additional stat() call. Besides saving you a system call, this non-portable extension saves you an expensive trip back to the file system for more metadata. Be aware, though, that some file systems always report DT_UNKNOWN in d_type, so code that relies on it still needs a stat() fallback.
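
If you decide to use d_type anyway, guard it and keep a portable fallback. In this sketch, entry_is_directory() is a hypothetical helper: it trusts d_type only when the DT_* constants are defined and d_type isn't DT_UNKNOWN, and otherwise builds the full path and calls lstat(). It uses lstat() rather than stat() so that a symbolic link to a directory is reported as a link, which matches what d_type reports. The _DEFAULT_SOURCE definition is an assumption aimed at glibc, which hides the DT_* constants without it.

```c
/* glibc hides the DT_* constants unless a feature-test macro such as
 * _DEFAULT_SOURCE is defined; other systems may not need this. */
#define _DEFAULT_SOURCE 1
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Decide whether a directory entry is itself a directory.
 * Returns 1 for a directory, 0 for anything else (including a symbolic
 * link to a directory), and -1 if the type can't be determined. */
int entry_is_directory( const char *dirPath, const struct dirent *entry )
{
    char pathName[PATH_MAX + 1];
    struct stat entryInfo;
    int len;

#ifdef DT_UNKNOWN
    /* Non-portable fast path: trust d_type unless it says "unknown". */
    if( entry->d_type != DT_UNKNOWN ) {
        return entry->d_type == DT_DIR;
    }
#endif

    /* Portable fallback: build "dirPath/name" and ask lstat(). */
    len = snprintf( pathName, sizeof pathName, "%s/%s",
                    dirPath, entry->d_name );
    if( len < 0 || len >= (int)sizeof pathName ) {
        return -1;                  /* formatting failed or path too long */
    }
    if( lstat( pathName, &entryInfo ) != 0 ) {
        return -1;
    }
    return S_ISDIR( entryInfo.st_mode ) ? 1 : 0;
}
```

Call it from inside a readdir() loop as `entry_is_directory( theDir, entry )`; the caller passes the directory name as well as the entry because the lstat() fallback needs the full path.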

Getting file information

Besides getting the names of entries in a directory, you'll probably need some additional information to figure out what to do next. At the very least, you can't tell a file entry from a directory entry from the name alone.

The stat() function fills a struct stat structure with information about a specific file; if you've got a file descriptor instead of a file name, you can use the fstat() function instead. If you want to be able to detect symbolic links as well, use lstat() on a file name.
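
To see the name-based and descriptor-based calls side by side, here's a sketch built around a hypothetical helper, stream_is_file(). It calls fstat() on an open stream's descriptor (obtained with fileno()) and stat() on a path name, then compares the device and serial numbers (st_dev and st_ino), which together identify a file.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Return 1 if the open stream refers to the same file as path, 0 if it
 * doesn't, and -1 if fstat() or stat() fails. */
int stream_is_file( FILE *stream, const char *path )
{
    struct stat byDescriptor;       /* filled in from the open descriptor */
    struct stat byName;             /* filled in from the path name */

    if( fstat( fileno( stream ), &byDescriptor ) != 0 ||
        stat( path, &byName ) != 0 ) {
        return -1;
    }

    /* Same device and same serial number means same file. */
    return byDescriptor.st_dev == byName.st_dev &&
           byDescriptor.st_ino == byName.st_ino;
}
```

For example, `stream_is_file( fp, "notes.txt" )` (with a hypothetical notes.txt) returns 1 when fp was opened from notes.txt or from a hard link to it.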

Unlike the struct dirent that readdir() returns, struct stat has quite a few standard, required members:

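  • st_mode -- file type, permissions (user, group, other), and flags such as set-user-ID
  • st_ino -- file serial number (the inode number)
  • st_dev -- ID of the device containing the file
  • st_nlink -- number of hard links to the file
  • st_uid -- the owner's user ID
  • st_gid -- the owner's group ID
  • st_size -- file size in bytes (for regular files; for a symbolic link, the length of the path name it contains)
  • st_atime -- the last access time
  • st_mtime -- the last modification time
  • st_ctime -- the last status change time (when the file's metadata was last changed; this is not a creation time)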
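
This sketch prints several of those members for one path. print_stat_summary() is a hypothetical helper; it masks st_mode with 07777 to show only the permission and flag bits, casts each member to a wide standard type for printf() because the exact types vary between systems, and formats st_mtime as local time with strftime().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Print a one-line summary of some standard struct stat members.
 * Returns 0 on success, or -1 (with errno set by stat()) on failure. */
int print_stat_summary( const char *path )
{
    struct stat info;
    struct tm *modified;
    char when[64] = "";

    if( stat( path, &info ) != 0 ) {
        return -1;
    }

    /* st_mtime is a time_t; show it as local time if we can. */
    modified = localtime( &info.st_mtime );
    if( modified == NULL ||
        strftime( when, sizeof when, "%Y-%m-%d %H:%M:%S", modified ) == 0 ) {
        when[0] = '\0';
    }

    /* The member types vary by system, so cast them for printf(). */
    printf( "%s: mode %04o, %lu link(s), uid %lu, gid %lu, %lld bytes, modified %s\n",
            path,
            (unsigned)( info.st_mode & 07777 ),
            (unsigned long)info.st_nlink,
            (unsigned long)info.st_uid,
            (unsigned long)info.st_gid,
            (long long)info.st_size,
            when );
    return 0;
}
```

Run against a small text file, it prints one line with the permissions, link count, owner IDs, size, and modification time; the exact values depend on your system, your user, and your umask.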

Using the S_IS*() macros on the st_mode member lets you find out what kind of directory entry you're dealing with:

  • S_ISBLK(mode) -- Is this a block special file? (usually a block-based device of some sort)
  • S_ISCHR(mode) -- Is this a character special file? (again, usually a character-based device of some sort)
  • S_ISDIR(mode) -- Is this a directory?
  • S_ISFIFO(mode) -- Is this a pipe or FIFO special file?
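  • S_ISLNK(mode) -- Is this a symbolic link? (only meaningful with lstat(), because stat() follows the link)
  • S_ISREG(mode) -- Is this a regular file?
  • S_ISSOCK(mode) -- Is this a socket?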
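
Because these tests are ordinary macros, mapping a mode to a description is just a chain of if statements. describe_mode() is a hypothetical name, and the #ifdef around S_ISSOCK(), the test for sockets, is a precaution for systems that don't define it.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#include <sys/types.h>

/* Describe the file type encoded in an st_mode value.
 * S_ISLNK() can only be true for a mode filled in by lstat(). */
const char *describe_mode( mode_t mode )
{
    if( S_ISREG( mode ) )  return "regular file";
    if( S_ISDIR( mode ) )  return "directory";
    if( S_ISLNK( mode ) )  return "symbolic link";
    if( S_ISCHR( mode ) )  return "character special file";
    if( S_ISBLK( mode ) )  return "block special file";
    if( S_ISFIFO( mode ) ) return "FIFO";
#ifdef S_ISSOCK
    if( S_ISSOCK( mode ) ) return "socket";
#endif
    return "unknown";
}
```

Pass it the st_mode you got from lstat() if you want symbolic links to be recognized; with stat(), the link is followed and you get a description of its target instead.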

Each stat() call is a system call that may have to fetch metadata from disk or, on a network file system, from a server, so if you're going to need this information again later, consider caching it in memory instead of calling stat() again.

A word about symbolic links

Normally, you don't care about symbolic links. If you happen to call stat() on a symbolic link, you'll get back information about the file to which the link points. This is consistent with the user's experience, since the permissions of the target file, not the symbolic link itself, are what govern their interactions with that file.

Some applications, such as ls and backup programs, need information about the link itself, such as which file it refers to. That's when you use lstat() instead of stat(): when you specifically need information about the symbolic link rather than the file it points to.
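
Here's a sketch of the ls-style case. print_link() is a hypothetical helper: it uses lstat() to decide whether a path is a symbolic link, readlink() to fetch the target name, and stat() to check whether that target exists. The 1 KB target buffer is an assumption made for brevity. Note that readlink() doesn't NUL-terminate its result, so the code does that itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Report on path the way ls -l does for symbolic links.
 * Returns 1 if path is a symbolic link, 0 if it isn't, -1 on error. */
int print_link( const char *path )
{
    struct stat linkInfo;       /* the link itself, from lstat() */
    struct stat targetInfo;     /* the file it points to, from stat() */
    char target[1024];          /* assumption: targets are shorter than this */
    ssize_t len;

    if( lstat( path, &linkInfo ) != 0 ) {
        return -1;
    }
    if( !S_ISLNK( linkInfo.st_mode ) ) {
        return 0;               /* not a link: stat() would report the same file */
    }

    /* readlink() doesn't NUL-terminate, so leave room for one.
     * A result of sizeof target - 1 bytes may mean the name was cut short. */
    len = readlink( path, target, sizeof target - 1 );
    if( len < 0 ) {
        return -1;
    }
    target[len] = '\0';

    if( stat( path, &targetInfo ) == 0 ) {
        printf( "%s -> %s (%lld bytes)\n", path, target,
                (long long)targetInfo.st_size );
    } else {
        printf( "%s -> %s (dangling link)\n", path, target );
    }
    return 1;
}
```

For a dangling link (one whose target doesn't exist), lstat() and readlink() still succeed but stat() fails, which is how the sketch tells the two cases apart.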

Putting it together

Now that you've read about using readdir() and stat() to find out about the entries of the directory, let's look at some code that demonstrates these functions in action.

The code presented here reads through one or more directories specified on the command line and prints some information about each entry it finds. When it finds another directory, it processes that directory the same way. For symbolic links it prints the link's target, and for regular files it prints the size. Special files are ignored.

As you can see from Listing 2, this simple demo application pulls in quite a few headers. The top chunk contains the standard bits that most programs use, and the last four are required to use readdir(), lstat(), and readlink() in this program.


Listing 2. Headers and constants

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <limits.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <dirent.h>
#include <unistd.h>

The process_directory() function (which starts in Listing 3 and ends in Listing 6) reads the specified directory and prints some information about each entry. The DIR pointer returned by opendir() is similar to the FILE pointer returned by fopen(); it's an OS-specific object used to track the directory stream, and you should ignore its contents.


Listing 3. Process a directory

unsigned process_directory( char *theDir )
{
    DIR *dir = NULL;
    struct dirent entry;
    struct dirent *entryPtr = NULL;
    int retval = 0;
    unsigned count = 0;
    char pathName[PATH_MAX + 1];

    /* Open the given directory, if you can. */  
    dir = opendir( theDir );
    if( dir == NULL ) {
        fprintf( stderr, "Error opening %s: %s\n", theDir, strerror( errno ) );
        return 0;
    }

After opening the specified directory, you call readdir_r() (see Listing 4) to get information about the first entry; each subsequent call to readdir_r() returns the next entry, until you reach the end of the directory and entryPtr is set to NULL. You also use strncmp() here to check for the "." and ".." entries in order to skip them. If you don't skip them, you'll be processing directories forever (as "theDir/./././././././././." and so on).


Listing 4. Read a directory entry

    retval = readdir_r( dir, &entry, &entryPtr );
    /* readdir_r() returns 0 on success; stop on an error or at the end. */
    while( retval == 0 && entryPtr != NULL ) {
        struct stat entryInfo;
        
        if( ( strncmp( entry.d_name, ".", PATH_MAX ) == 0 ) ||
            ( strncmp( entry.d_name, "..", PATH_MAX ) == 0 ) ) {
            /* Short-circuit the . and .. entries. */
            retval = readdir_r( dir, &entry, &entryPtr );
            continue;
        }

Now that you have the name of the directory entry, you need to construct a more complete path (see Listing 5), and then call lstat() to get the entry's information. Because symbolic links get special treatment, use the lstat() function here rather than stat(). You can find a link's target with the readlink() function.

If the entry is a directory, you recursively call process_directory() on it and add the number of entries it found to your running total. If the entry is a regular file, you print the name and its size in bytes (from the st_size member of the struct stat).


Listing 5. Process the entry

        /* Build "theDir/name"; skip the entry if the path won't fit. */
        if( snprintf( pathName, sizeof pathName, "%s/%s",
                      theDir, entry.d_name ) >= (int)sizeof pathName ) {
            fprintf( stderr, "Path too long: %s/%s\n", theDir, entry.d_name );
            retval = readdir_r( dir, &entry, &entryPtr );
            continue;
        }
        
        if( lstat( pathName, &entryInfo ) == 0 ) {
            /* lstat() succeeded, let's party */
            count++;
            
            if( S_ISDIR( entryInfo.st_mode ) ) {         /* directory */
                printf( "processing %s/\n", pathName );
                count += process_directory( pathName );
            } else if( S_ISREG( entryInfo.st_mode ) ) {  /* regular file */
                printf( "\t%s has %lld bytes\n",
                    pathName, (long long)entryInfo.st_size );
            } else if( S_ISLNK( entryInfo.st_mode ) ) {  /* symbolic link */
                char targetName[PATH_MAX + 1];
                ssize_t len = readlink( pathName, targetName, PATH_MAX );
                if( len != -1 ) {
                    targetName[len] = '\0';  /* readlink() doesn't add a NUL */
                    printf( "\t%s -> %s\n", pathName, targetName );
                } else {
                    printf( "\t%s -> (invalid symbolic link!)\n", pathName );
                }
            }
        } else {
            fprintf( stderr, "Error statting %s: %s\n",
                pathName, strerror( errno ) );
        }

At the bottom of the while loop, you read another directory entry and process it. If you've finished processing directory entries, close the directory that is currently open and return the number of entries that were processed.


Listing 6. Read another one

        retval = readdir_r( dir, &entry, &entryPtr );
    }
    
    /* Close the directory and return the number of entries. */
    (void)closedir( dir );
    return count;
}

Finally, Listing 7 shows the main() function of the program, which just calls the process_directory() function for every argument passed on the command line. A real program would have a usage message and provide some sort of feedback if the user didn't specify at least one argument, but I've left that as an exercise for the reader.


Listing 7. Mainline

/* readdir_demo main()
 * 
 * Run through the specified directories, and pass them
 * to process_directory().
 */
int main( int argc, char **argv )
{
    int idx = 0;
    unsigned count = 0;
    
    for( idx = 1; idx < argc; idx++ ) {
        count += process_directory( argv[idx] );
    }
    
    return EXIT_SUCCESS;
}

That's all there is to it. Despite the relatively large number of include files, processing directory entries isn't horribly complex.

Summary

Using the readdir() and stat() functions to run through the entries of a directory and decide what additional processing each one needs is straightforward, and it's a pattern you'll use again and again. It's also an idiom that some new UNIX developers have trouble grasping; with the examples in this article, you should be ready to put these functions to work in your own programs.

Download

Description                       | Name                | Size | Download method
Eclipse 3.1 managed make project  | au-readdir_demo.zip | 24KB | HTTP


Resources

Learn

Get products and technologies
  • Build your next development project with IBM trial software, available for download directly from developerWorks.


Discuss


About the author

Chris Herborth is an award-winning senior technical writer with more than 10 years of experience writing about operating systems and programming. When he's not playing with his son, Alex, or hanging out with his wife, Lynette, he spends his time designing, writing, and researching (that is, playing) video games.


Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries. Other company, product, or service names may be trademarks or service marks of others.