Level: Introductory
Jeanna Matthews (jnm@clarkson.edu), Assistant Professor, Clarkson University
01 Jun 2002
Jeanna Matthews, assistant professor of Computer Science at Clarkson University in Potsdam, New York, had only six students in her Advanced Operating Systems class last year. All of them entered the IBM Linux Scholar Challenge. Four of them won. And won big: IBM ThinkPads for themselves and an IBM Linux zSeries for the University! How did this small technology school in the frozen north country of New York State produce three of the 25 winning Linux Scholar Challenge projects worldwide? To tell you how, Dr. Matthews shares her classroom experiences and illustrates the teaching environment at Clarkson. In the rest of our series, the students give you the details on their winning entries.
Introduction
In Fall 2001, I taught a small advanced operating systems course of just six students. They had all been in the required undergraduate operating systems course with me in Spring 2001. At the end of the spring semester, I sent an e-mail inviting participation in a graduate-style operating systems course where we would read research papers and tackle a substantial Linux kernel project. When class started in the fall, we heard about the IBM Linux Scholar Challenge. I had already been planning to do substantial Linux projects, so it seemed natural to make contest submission a goal for the students. In the end, everyone submitted: four individual submissions and one two-person submission.

I knew that some wonderful work had been done on the projects and was hoping that we would have some winners, but I really didn't know what to expect. It was over the holiday break in December that the good news started rolling onto our class mailing list. Bryan Clark was the first one to hear; "Yahooooo!!!" his mail read. Then Phil Allen and Matt Finlayson sent the mail "Linux Challenge Winners - Bryan isn't the only one!". Their biggest problem was going to be figuring out how to split the laptop they had won. Finally, Dwight Tuinstra sent mail that he had won as well. We wondered if we were headed for a clean sweep. Steve Evanchik and Matt Sabins crossed their fingers and waited, but finally heard officially that they hadn't won. Still, there was great joy: our little class of six had produced 3 of the 25 winners worldwide!

In the days to come, the news just seemed to get better and better. We found out that there had been over 1600 submissions. Then we found out that our three winners were half of the winners in North America! Then came the biggest news: Clarkson had the largest number of winners of any school, and we had therefore won our choice of a zSeries server or a 16-node Linux cluster. Now that is a big win!
Since then, many people have asked how our small group was able to achieve such success in the contest. One of them was a proud Clarkson alumna, Barbara Wetmore, who also happened to be an editor for IBM developerWorks. In this article, I attempt to answer that question and to provide information that might help you have similar success with your own Linux development.
Hands-on operating system development
VMWare to the rescue
I trace the seeds of our success back to an undergraduate class in operating systems that I taught in Spring 2001. I wanted to provide all the students with hands-on experience with a real operating system kernel, so I approached Anthony Collins, Provost of Clarkson University, with an idea. I had noticed a lab filled with machines that had recently been moved over from the School of Business when it relocated into a new building. The machines were not even assembled yet, but the plan was to use them for a class on Windows 2000 administration. Students in that class needed exclusive access to their machines because throughout the semester they would configure the machines in a variety of ways that would make it difficult for anyone else to use them productively. My operating systems class would have a similar problem; students modifying the kernel of an operating system is a surefire way to make the machines unusable for anyone else! I proposed that we try running both of these classes on VMWare, a product that allows operating systems to be run inside their own virtual machines, thus isolating other users from any mistakes or the effects of special configurations. Provost Collins was excited by the opportunity to give students more hands-on experience and to make fuller use of the lab. He graciously agreed to allocate several thousand dollars to purchase VMWare licenses for each machine. Bill MacKinnon, the instructor for the Windows 2000 administration class, also agreed to share the lab and to give VMWare a try. I spent much of the holiday break before the Spring 2001 semester working furiously with Bill and the computer support staff at Clarkson to assemble the machines, to install and test VMWare, and to build images for both classes. As the semester began, the lab was in the last stages of construction, and with that our experiment began.
Getting started
We had a wonderful teaching assistant for operating systems, Mike Akers, who was also the president of the local Linux Users Group. He configured a Linux kernel for use with VMWare and worked patiently with three groups of 25 students to show them how to compile their own kernels. We used Gary Nutt's book, Kernel Projects for Linux, as our lab manual. We also used labs from his Windows NT lab manual, Operating Systems Projects Using Windows NT, to give students exposure to more than one operating system. The Nutt books gave us many good ideas, but the students expressed frustration with the organization of the books and with frequently incomplete or incorrect details. (Gary Nutt maintains errata pages that deal with many of these problems. I expect that if he writes a second edition, many of the problems will be fixed.) I started looking around for other ideas for Linux kernel projects and found some helpful suggestions. Steve Gribble, a professor at the University of Washington, was using a wonderful series of four exercises for his Winter 2001 undergraduate operating systems class. Jason Nieh and Ozgur Leonard wrote an article, "Examining VMWare", in the August 2000 issue of Dr. Dobb's Journal that describes their experience using VMWare for an undergraduate operating systems course at Columbia University (see Resources for a link). In the end, I had the students do a variety of labs, including adding a system call to the Linux kernel, instrumenting the virtual memory subsystem to report the number of page faults for a specified process, and implementing a loadable kernel module. The next section details a typical lab.
Adding a system call to the Linux kernel
Adding a system call is a quick and fairly easy way to show yourself that you do indeed have control over the kernel source! To try this on your own, download a copy of the kernel source, configure it, compile your own kernel image, and then place the image in the location expected by your boot loader. If you don't want to actually build a new kernel with a new system call, you can also follow along by simply downloading the code and browsing through the files (or browsing HTML versions of the kernel source online). In whatever form you are viewing the source, there should be an arch directory under the main kernel source directory. This directory contains the platform-specific code (code that is different when you run Linux on an i386 machine than when you run it on an Alpha machine). Some of the code in this directory is written in assembly language specific to the indicated architecture. The first step in adding a system call is to edit one of the files under the arch directory, specifically arch/PLATFORM/kernel/entry.S (arch/i386/kernel/entry.S if you are working on a PC directly or in VMWare). Among other things, this file contains a list of the system call entry points in the kernel. The list begins with ENTRY(sys_call_table) and then contains an entry for each system call. A system call's number is implied by its position in this list, counting from zero. For example, the fork system call, used to create a new copy of the calling process, is the third entry in this list, so it is system call number 2. If you look down this list, you will probably see familiar names of calls like read, write, open, close, and exit. These names are familiar because you use them in your code to request services from the operating system. Normally, you reach them through the C library: functions such as read and write are declared in headers like unistd.h, and higher-level routines such as those declared in stdio.h are built on top of them. These library functions, however, eventually make calls directly to operating system entry points to request actions on objects like files and processes that only the operating system is allowed to manipulate directly.
arch/i386/kernel/entry.S
.data
ENTRY(sys_call_table)
    .long SYMBOL_NAME(sys_setup)         /* 0 */
    .long SYMBOL_NAME(sys_exit)
    .long SYMBOL_NAME(sys_fork)
    .long SYMBOL_NAME(sys_read)
    .long SYMBOL_NAME(sys_write)
    .long SYMBOL_NAME(sys_open)          /* 5 */
    ....
    .long SYMBOL_NAME(sys_getdents64)    /* 220 */
    .long SYMBOL_NAME(sys_fcntl64)
    .long SYMBOL_NAME(sys_printmyname)   /* 222: our new call */
    .long SYMBOL_NAME(sys_ni_syscall)    /* reserved for TUX */
    .rept NR_syscalls-224
        .long SYMBOL_NAME(sys_ni_syscall)
    .endr

When you add your system call, it is important not to disturb the existing ordering, so you should add your system call at the end of the list of existing calls and note its number. For example, you might add the entry SYMBOL_NAME(sys_printmyname) at location 222. You should also check the number in the .rept NR_syscalls-NUMBER line at the end of this list. That directive pads the rest of the table with sys_ni_syscall entries, which simply return an error, so that the table ends up with at least NR_syscalls slots. NUMBER therefore must not be larger than the number of entries listed explicitly above it (224 in this listing, numbered 0 through 223). A NUMBER that is too small does no harm, because it only leaves a few extra unused entries at the end of the table; a NUMBER that is too big leaves the table short, so a call whose number falls in the missing slots would read past the end of the table. If you look at the line we added, the name is actually the name of the function that will implement the system call. You should choose a name for your system call that does not match any other function in the Linux kernel. By convention, the names of system call procedures begin with sys_. Something like sys_YOURNAME would probably be a pretty safe bet. In this example, I suggested sys_printmyname because that is what this system call will eventually do. A similar addition must be made to a file under the include directory at the root of the Linux source. The file include/asm-PLATFORM/unistd.h (again, PLATFORM would be i386 for a PC) contains a list of #defines that specify the number of each system call. You simply add an entry for your new system call. Once again, the name you choose should be unique.
include/asm-i386/unistd.h
#define __NR_exit                 1
#define __NR_fork                 2
#define __NR_read                 3
...
#define __NR_fcntl64            221
#define __NR_printmyname        222
Then, of course, we need to actually implement the new system call. When we modified entry.S, we specified that the name of the procedure implementing this system call would be sys_printmyname. Where should we place this procedure, and what should its arguments and return value be? The simplest approach is to put this function in an existing source file. You could also add a new file, but that is a bit harder because the kernel's build files must then be updated to compile it. A logical file in which to place the function is kernel/sys.c. Looking at the other system call procedures in that file will also give you a good idea of what your new system call should look like. Here's the implementation of sys_printmyname:
Add to kernel/sys.c
asmlinkage long sys_printmyname(void)
{
    printk("Jeanna was here\n");
    return 0;   /* system calls return a value to the caller; 0 means success */
}
This simple system call prints the string "Jeanna was here" to the kernel log, which you can read with dmesg; depending on the console log level, it may also appear on the system console. By examining some of the other system calls in kernel/sys.c, you should be able to see how to make your system call a bit more complicated: have it return a meaningful value, allow parameters to be passed to it, and so on. Of course, having a new system call in your kernel isn't much fun unless you can use it. To do that, write a simple C program that calls your system call. Here is a C program, my_testapp.c, that calls the new printmyname system call:
my_testapp.c
//this declares the syscall() function:
#include <unistd.h>
//this gives us access to our new system call number (__NR_printmyname):
#include "PATH_TO_YOUR_LINUX_SOURCE/include/asm-i386/unistd.h"

int
main (int argc, char **argv) {
    //invoke our new system call by number
    syscall(__NR_printmyname);
    //we could avoid the include of our modified unistd.h altogether
    //by calling syscall(222);
    return 0;
}
What next?
More ambitious projects
That undergraduate operating systems class gave me confidence that Linux running under VMWare made kernel-level projects in a real operating system feasible. Besides a grounding in basic operating system concepts, students left that class knowing that they could read the Linux kernel source code, understand it, and change it on their own. We helped them navigate through some activities (like configuring the kernel, compiling it successfully, and loading it) that can deter would-be kernel hackers. VMWare overcame the other major hurdle by providing a system in which multiple developers could work without worrying about interfering with one another's work or disrupting the stable base operating system. That semester left me, and several of my students, with a desire to tackle more ambitious projects. I decided to see how much interest there would be in a graduate-level advanced operating systems class the next fall. I also wanted to give Dwight Tuinstra, a Ph.D. student with whom I had been working for about a year, exposure to graduate-level operating system material. I thought extending some of the ideas for log-structured file systems that I had explored in my own dissertation would help immensely with the project we had been working on together. So I e-mailed the students I thought might be interested in more advanced projects and got positive responses from a small but committed group. Over the summer I worked on plans for this new advanced operating systems class.
Advanced operating systems
Our small advanced operating systems class began in August 2001. We met twice a week for an hour and a half. Over the summer, we had prepared for the class by discussing (by e-mail) several potential Linux implementation projects. Bryan Clark had suggested a user resource tracking system that would eventually form the basis of his winning contest entry. Steve Gribble from the University of Washington also offered to let us use some of his Linux projects. When we heard about the Linux Challenge, it seemed natural to target it as a motivating force for our own projects.
Students can make a difference
In addition to Linux projects, we read four to five research papers per week and used our class periods to discuss them. We read everything from some of the earliest operating systems papers (like Dijkstra's 1968 paper on the THE multiprogramming system and papers describing various aspects of Multics, the predecessor to UNIX) to some of the most recent operating systems publications. With the small class size, everyone was challenged to express opinions and really get involved. Students appreciated getting a historical perspective on their field. For example, when we studied Multics, I brought in pictures of the IBM 360, a mainframe computer from that era (see Resources for a link). We talked frequently about how young our field is and how rapidly technology has changed over that time. I believe they understood what I was telling them: this field is young and rapidly changing, and there is plenty of room for you as innovative computer scientists to make a difference. Through direct exposure to the operating system literature, students also learned that research is not absolute. We discussed two papers each class, and I deliberately chose papers that addressed similar problems in very different ways. I highlighted how major themes in computer science appeared and reappeared in diverse ways. One example is the tension between building a general-purpose operating system that does well for the majority of applications and providing ways for applications to inform the operating system of specific optimizations to make on their behalf (whether through simple hints, through user-level servers, or through code downloaded into the kernel).
Class activities
For each paper, I had students submit, before class, a short response including a brief summary and three criticisms of the paper. Over the course of the semester, I saw students become much more willing to point out the limitations of a paper, to suggest additional experiments, and to propose extensions. Just as the undergraduate operating systems course in the spring had shown them that operating system code wasn't "untouchable" and out of their reach, this class was showing them that operating system research wasn't untouchable or out of their reach either. They could read it, dissect it, even implement and test it. In addition, I had each student lead a class discussion during the semester. This gave them the opportunity to distill the many details into an effective presentation that provided an overview and highlighted points of comparison. Our class discussions were animated and enjoyable. We frequently went a half hour or more over our scheduled time slot; having a late-afternoon time period was crucial for this. We even scheduled one day for pizza and a movie: Plan 9 from Outer Space (not Plan 9 from Bell Labs; see Resources for a link). I also made a point of suggesting and giving credit for "optional activities". Students did everything from writing small programs to test a hypothesis, to measuring the maximum possible bandwidth of a network card, to challenging some published performance numbers. In addition to my suggestions, students were free to propose their own activities. Either way, the students were able to get credit for exploring topics of interest to them and reporting back to the class. "There's another optional activity" became a favorite phrase in class discussions.
Our contest submissions
These optional activities were good practice for our winning Linux contest entries. In fact, many of our contest submissions evolved directly from papers we read or activities we did as a class. Matthew Sabins implemented his own version of Eraser, a dynamic data race detector for multithreaded programs, after reading the 1997 paper by Savage et al. Dwight Tuinstra submitted winning plans for restructuring the LFS cleaner from NetBSD and porting it to LinLogFS after reading the original LFS paper by Rosenblum and Ousterhout and my own 1999 paper on improving LFS performance. Phil Allen and Matt Finlayson's winning threadpool submission was the product of one of Steve Gribble's assignments and a threadpool bake-off we did in class. (Everyone implemented a threadpool and reported performance numbers; then we spent class time analyzing each one. Ultimately, we found that each implementation had some useful feature not captured in the others. Phil and Matt took on the project of merging these best features into one.) Bryan Clark and Stephen Evanchik worked together to build a user resource tracking system based on a comment Bryan noticed in the kernel source about the need for such a system. They submitted two entries describing different aspects of the system, including Bryan's winning entry.
Conclusion
It was almost exactly one year from the time I started working to assemble a lab for undergraduate operating systems to when we received the news of our three winning entries and the overall prize for Clarkson. Just as direct contact with a real operating system helped students understand operating systems in a way no textbook could, direct contact with primary sources of operating system research helped students understand research. Together, this was a powerful combination, empowering students both to propose new ideas and to implement them. These are some of the lessons from our experience:
- Hands-on, team-based learning as practiced consistently at Clarkson University helps students succeed in the real world. Small classes are a huge win in this regard.
- Giving students the opportunity to work with real production operating system code motivates them to solve real problems.
- Direct contact with primary sources of research can give students a sense of history and context that empowers them to suggest changes.
- VMWare is an excellent platform for kernel development experience and for efficiently and safely supporting multiple student developers on a single machine.
- Open-source software is an important educational opportunity in the university setting.
For me, this has been an extremely rewarding year of seeing the students I've worked with develop professionally and achieve great success. The students and I are grateful to the Linux Scholar Challenge for encouraging and rewarding student participation and achievement.
Resources
About the author
Jeanna Matthews received her Ph.D. in Computer Science from the University of California at Berkeley in December 1999. As an undergraduate, she double-majored in Mathematics and Computer Science at Ohio State University. She has been teaching at Clarkson University in the Department of Mathematics and Computer Science since January 2000. Her research interests include file systems, operating systems, networks, and distributed systems. She and her husband Lenny (who helped write this article) have a family farm in Massena, New York. You can contact Dr. Matthews at jnm@clarkson.edu.