Level: Introductory
developerWorks Power Architecture editors, Editors, IBM
25 Jun 2005
This question-and-answer article features Arnd Bergmann of IBM: a kernel hacker with the IBM Linux Technology Center, the Linux® on Cell Broadband Engine™ (Cell BE) processor kernel maintainer, and author of the spufs file system.
The Cell BE processor has been hailed as revolutionary. It is often described as "many
processors in one without being multicore." This is more or less accurate.
It really is just one PowerPC® core, surrounded (for now) by eight parallel
"Synergistic" Processing Elements (SPEs, also commonly referred to as
Synergistic Processing Units, or SPUs, though the SPU is actually a part
of an SPE, and the terms aren't really interchangeable). The fundamental
difference between the Cell BE processor and a dual-core or multicore design is that
on the Cell BE processor, only the PowerPC core runs the operating system: the SPEs run
(usually highly vectorized) application code.
developerWorks caught up with Arnd Bergmann, the IBM® Linux™ on the Cell BE processor kernel
maintainer, to talk about the port, about the Cell BE processor and Cell BE-based
"workstations" (which aren't workstations at all) and about programming to
Cell BE -- among other things.
developerWorks:
Can we start with a little bit about you? Can you describe the team that
you're working with, and how you came to be working on this Cell port?
Arnd Bergmann:
I'm in a team that is part of the LTC [Linux Technology Center]
organization in IBM. The LTC does most of the Linux-related work within
IBM.
Most of the team that's working on Cell now started out working on the
zSeries® Linux port. We have some other people joining us that had been on
some other software projects in IBM and we founded this little group
that's doing Linux on Cell work. And then we also work together with some
other groups here in the lab in Böblingen and Mainz -- they are doing the
firmware and
hardware development.
dW:
So you more or less self-organized? And volunteered yourselves to work on
Cell mainly because you wanted to?
Bergmann:
Well, I was offered the chance to start working on Cell, and I thought it
sounded like a great idea.
dW:
And before that, you had been working on zSeries -- is there any
similarity between Cell and the mainframes?
Bergmann:
No, it's quite different.
I mean, Linux kernel work is basically the same on all architectures, be
it zSeries, or Cell, or others. But each of these two has things that
make them very different from all of the other architectures.
They don't really have much in common.
The Cell processor
dW:
Could you talk about some of the things that do make Cell so different?
Many of our readers will have been keeping up with Cell news, but I think
there are a lot of people who haven't been, necessarily. How would you
describe Cell to somebody who hasn't been following along?
Bergmann:
Well, the Cell processor on the surface is just a regular PowerPC
processor like others that we have seen.
But there is one thing that makes it very special, which is the fact that
it actually contains multiple small processors on one
processor die -- we now have eight of those so-called
synergistic processing elements [SPEs], which are very high-performance, optimized
all-arithmetic elements -- that act as stand-alone processors and can talk
to each other very fast.
They are all orchestrated by one core processor, which is actually
comparable to a PowerPC running a normal operating system like Linux. The
eight SPEs are not running Linux kernels, just application code.
dW:
So the SPEs are the main thing that make it so revolutionary.
Bergmann:
Yes.
dW:
And they also are what make it so good for game programming, and for
scientific applications?
Bergmann:
Yes. In games, it's mostly about computational work doing vector
operations: floating point operations. And the instruction set that is used
on the SPUs is optimized for vector operations. So every single
instruction that is executed there actually behaves like a vector
operation and can always work on a 128-bit word in one cycle, or in one
instruction. In the 128 bits, for example, you could have four
single-precision floating point operations at the same time that are done
by one processor instruction.
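[Editors' sketch: the idea of one instruction operating on four single-precision values can be illustrated in portable C. This is only an approximation under stated assumptions: the vec4f type and the vec4f_add function are names invented for this example, not part of any SPU toolchain, and the loop below runs as ordinary scalar code on any host. On an SPU, the four lane additions would correspond to a single 128-bit vector instruction.]

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>

/* Hypothetical stand-in for a 128-bit register: four 32-bit float lanes. */
typedef struct {
    float lane[4];
} vec4f;

/* Layout assumption behind the sketch: 4 lanes x 32 bits = 128 bits. */
static_assert(sizeof(vec4f) == 16, "vec4f is expected to be 128 bits wide");

/* Add two vectors lane by lane. A vector unit would do all four additions
 * in one instruction; this portable version simply loops over the lanes. */
static vec4f vec4f_add(vec4f a, vec4f b)
{
    vec4f r;
    for (size_t i = 0; i < 4; i++) {
        r.lane[i] = a.lane[i] + b.lane[i];
    }
    return r;
}
```

Adding {1, 2, 3, 4} and {10, 20, 30, 40} with vec4f_add yields {11, 22, 33, 44}: four results from what the hardware would treat as one operation.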
dW:
How close to VMX instructions are the SPU instructions?
Bergmann:
Some of the instructions are very similar; and the SPUs have some
additional instructions that go far beyond what you can do with VMX. And
there are a small number of instructions in VMX that don't have a
counterpart on Cell.
The biggest difference is that the SPUs don't actually use PowerPC
instructions. The instruction format itself is different, even though the
assembler or the C intrinsics might be very similar.
dW:
So, are VMX opcodes and SPU opcodes the same? Will VMX optimized code
have to be rewritten to take advantage of SPU?
Bergmann:
Unlike VMX, the SPU has a complete instruction set, so it can execute
self-contained binaries. The opcodes that are used are very different at
the assembly level but can be used in similar ways from high-level
languages.
dW:
So then the compilers have to be rewritten to include the opcodes for
the SPUs, is that correct?
Bergmann:
Yes. We have a special GCC port, and you can have other compilers as
well that are targeted at the SPU, instead of targeted at the host
instruction set. So if you want to compile the program for the SPU, you
need to use this cross-compiler.
The Cell "workstation"
dW:
It has been said that Cell is supposed to scale to -- everything from
very tiny devices, up to huge supercomputers. What about right in the
middle? Is there any point at all in having a desktop Cell machine?
Bergmann:
For the regular desktop user, I think there will not be much difference,
because you get all the benefits only if you write your applications
specifically for the Cell processor. That's usually done either if you
have just a very limited number of applications on the machine, like you
have on embedded systems, [where you want] for instance to have MPEG
encoding/decoding right there -- or in the supercomputer world, where you
write your applications to run on the computer once.
dW:
So all of the people who are hoping for a Cell computer at home are
probably going to be disappointed.
Bergmann:
Well, if they are programmers, they could of course write their own
applications. And we might see some applications that are optimized for
Cell. For example, you could have video editing software that's optimized
for Cell, or some image manipulation. If you just want to run standard
applications, it probably will not help you a lot.
dW:
Well, but it wouldn't harm you either, would it? To run workaday
applications like e-mail or Web browsing -- there wouldn't be any
performance penalty, would there be?
Bergmann:
No -- you'd just spend much more money for the processor itself, I guess.
[laughter]
dW:
So for ideal use, you would really want it for stuff like fluid dynamics
or plasma dynamics, where you're doing millions of -- no pun intended --
particle in cell calculations, where you're wanting to highly vectorize
everything. Is that correct?
Bergmann:
Yes.
dW:
And one news report I saw said that you would be premiering a Cell-based
workstation at LinuxTag. Can you describe what it is people will be
seeing at LinuxTag?
Bergmann:
I guess there was some misunderstanding.
It isn't so much a "workstation" as a Cell-based Blade server -- a
prototype board. The same, in fact, as the one that was shown at the E3
conference.
dW:
Yes. I saw pictures of that. So it's not really -- other than being
Cell-based and running Linux, it's not really related to the workstation
that's coming out later this year.
Bergmann:
I don't know if there are any plans about the workstation itself -- that
might be the same misunderstanding.
dW:
Oh. Well, I know the early articles about it said that there was going to
be a workstation for game developers -- and I know that in the case of
games, sometimes a board is called a workstation... So it sort of sounds
like that might be it. Do you know anything about plans of IBM offering
Cell-based machines other than the Blade?
Bergmann:
No I don't know anything about other product plans, sorry.
dW:
But we will be having a Cell-based Blade, if nothing else?
Bergmann:
Well, this is the one thing that I'm working on, and I don't know about
the availability for customers at all.
What we are working on now is really a technology study. And people
who are interested in that can ask the IBM Engineering and Technology
Services [ET&S] about
how to evaluate this technology for their purposes.
But as far as I know, there will not be a Cell-based IBM server product.
dW:
Really? Do you have any idea why that is?
Bergmann:
No.
[laughter]
dW:
Would you say -- in your opinion, do you think we should?
Bergmann:
I think there is a huge market and huge demand for those, yes.
The Linux port so far
dW:
Okay... So, well -- on to the kernel. Can you tell us a bit about your
work porting the Linux kernel to Cell?
Bergmann:
Yes. The work that we have done consists mainly of three parts.
The first part is getting Linux to run at all on the Cell processor, and
on the board that we have -- which means we had to add some parts to the
PowerPC architecture code, for example, for the CPU ID and interrupt controller to work with
the basic architecture support.
Most of the groundwork for that was done
by the Sony/Toshiba/IBM Design Center (STIDC) in Austin, Texas.
I've started to make this work more compatible with the standard kernel,
so that it fits in a way that gets integrated into current 2.6.13 or
what have you.
The second part is the device drivers for our actual prototype port, so
we have an Ethernet controller and some other hardware that needs device
drivers. That work was mostly done by people here on my team, but not by
myself.
The third part is the exploitation of the SPEs themselves.
That's where the SPU file system that has been described already (see Resources) comes in, and that work is still ongoing. There were some
earlier attempts at a model, but it looks like the SPU file system is the
final model now.
dW:
Can you describe some of the alternatives that you went through and
rejected?
Bergmann:
Okay. The first approach that I know of was to have a device driver
interface, where you have character devices and access each of the
physical SPEs through one device node. But that's a rather clumsy
interface: it's impractical to virtualize that (so that you can have more
applications using the SPEs, than you have physical SPEs available in the
system).
And performance is likely to suffer.
Another attempt was to test a low-level system call interface. For
that, it turned out that you need a large number of different system calls
to get all the functionality that's required to have full application
support.
The problem with that interface was that system calls are always hard to
get into the kernel -- because it's very hard to get them right the first
time. You can never change them if you get them wrong, and there were
just so many system calls that we needed.
dW:
So how many system calls are you ultimately adding?
Bergmann:
With the SPU file system [spufs], I'm adding one new call -- and we're
still discussing whether that would actually be a system call, or an
ioctl, or some other method using simply read and write interfaces.
The original approach with the system call interface would have added
dozens of system calls, though we don't have an exact number because it
was just experimental.
dW:
I do have one follow-up question on the discarding of the device driver
model. Was that a hardware limitation, or -- might something along the
lines of a descriptor-based DMA (DBDMA) have helped to alleviate some of
the issues involved?
Bergmann:
I think the biggest point against the device driver was that this is not
exactly a device that you can drive, but something more substantial, more
integrated into the CPU itself. So a device driver would not be the
correct abstraction.
It's common in Linux to add new functionality via a virtual file system.
dW:
So that's more of a Linux philosophy is what you're saying, but other
operating systems which have a different approach might actually implement
a different model.
Bergmann:
Yes.
If you were writing a kernel specifically just for the Cell processor,
it might be a good choice to use a system call interface.
dW:
Have you yet had a chance to do any benchmarks on it?
Bergmann:
I haven't done any benchmarks myself, no.
And I don't know of anyone else who has, so I can't tell you
anything about that.
dW:
But when you have -- you do have it running so that it's taking full
advantage of the SPEs right now?
Bergmann:
We have the file system running and we can run applications on it, yes.
And we have done some demos, which were mostly based on one of the earlier
models.
dW:
So does it "feel" really fast? What does it feel like to work on it, and
how do you like it?
Bergmann:
I don't have -- we always use it over the serial
console, which is -- very slow if you type, but [if you] actually try to
run an application using the SPEs then suddenly, it feels really fast.
[laughter]
dW:
But you are self-hosting it? Can you self-build?
Bergmann:
We can, but we don't usually do that, because we have the development
environment on our regular workstations, and then we access the Cell
machine when we need to run or debug something.
dW:
Is that a planned milestone? Self-hosting?
Bergmann:
It would work right now -- we've tried it, and it works fine.
It's just that we don't have so many machines, and we have to share them.
dW:
Are the host machines PPC?
Bergmann:
[laughs]
Yes, we use JS20 Blades and other PPC970-based workstations.
The Linux on Cell team
dW:
You mentioned earlier, working with Sony and Toshiba. Your group there
in Böblingen -- you are working on the Linux port, and also -- GDB? And
OProfile? Is that right? How is the work split up, can you talk a bit
about who is working on the different pieces right now, and how you work
together?
Bergmann:
Sony is working on the tool chain itself, like GCC and binutils, and we
have someone here who's also doing some work on those.
Most work on the kernel and GDB has been done by IBM, in my team and
at the STIDC.
Then we have some people doing library support here.
dW:
And do all these groups keep in touch by e-mail and other regular means
-- or do you have any special way that you work together?
Bergmann:
It's usually by e-mail and, inside of the Linux team, we use IRC on the
public channels.
dW:
Are there any members of the open source community outside of the three
companies that are involved at this point?
Bergmann:
At this point, it's very hard, but we want to go there very soon, once
all of the stuff that is needed is available. For instance, the GCC patch
and documentation need to be published first, because those will really be
the pieces that people outside of the company need to even get started.
Open source Cell, GPL, and other licensing
dW:
Speaking of the open source community, IBM very recently made an
announcement that they're going to open-source the Cell specifications.
There haven't been a lot of details about that. Have you been involved
with that, or do you know anything more about it?
Bergmann:
No, not really.
dW:
Okay. But for the software that you're working on -- obviously, most of
it is GPL'd, so it will be released as open source.
Bergmann:
Yes.
dW:
Do you know if there are any plans to dual-license that code, so that BSD
(and other) developers can also take advantage of it? Or choosing a
license like the one that was attached to the recent SLOF download on
developerWorks?
Bergmann:
As far as the kernel work goes, that will obviously all be GPL -- we
don't plan to make that available under any other license.
But the library parts that are used by userspace applications will need
to have a more permissive license like a "BSD-like" license or the LGPL,
so people can use our libraries without open-sourcing their own code.
dW:
I know that the kernel does obviously have to be licensed as GPL, but is
there a reason we can't dual-license it? I would imagine that some of the
things that are in there would be of use to BSD developers as well?
Bergmann:
Most of the code that I've done is very Linux-specific and wouldn't be
useful in other architectures. And for some parts, of course, I hooked
into some other code that's already licensed under the GPL, so it's hard
to split those parts up from the parts that I purely wrote myself.
dW:
So the spufs itself is going to be GPL'd?
Bergmann:
... GPL, yes.
dW:
And that's because it is linked with already GPL'd code.
Bergmann:
Yes.
Programming the Cell: ISA, SPUs, and APIs
dW:
I know that the Cell is based on one of IBM's PowerPC cores, but I don't
know which one. Is it the 970, or is it a secret to say which one it is?
Bergmann:
As far as I know, the PowerPC core in the Cell was designed specifically
for this processor. It has not been used before that in anything else.
dW:
Cool. But the LinuxTag paper does say that it's compatible with the 970
instruction set.
Bergmann:
Yes. It uses the POWER4™ instruction set, with
VMX extensions, like in
the 970. There are small differences on the kernel side, but those aren't
really interesting to most users.
dW:
You'd be surprised how much detail readers are interested in. Do you
mind going into them?
Bergmann:
Not at all. Mainly, the differences are the numbers of special-purpose
registers and the layout of some of the memory mapped registers. The
interrupt controller is also different.
And another important difference is that Cell uses SMT to
run two threads on one core.
dW:
How similar to the 970 is the Cell, as far as like the General Purpose
Registers and the Special Purpose Registers?
Bergmann:
I don't really know much about the 970, but as far as I know, it's mostly
compatible.
dW:
So is it still 32 General Purpose Registers?
Bergmann:
Oh, yes -- sure. Those are obviously all the same.
dW:
Because of the basic PowerPC spec, is that right?
Bergmann:
Yes.
dW:
So then, if you were to program just for the PPE part of it, you're
pretty compatible across your entire PowerPC line. Is that correct?
Bergmann:
Yes. Of course, you could have GCC extensions to have code optimized for
one CPU or the other.
dW:
Right. Or for the SPEs, or Cells?
Bergmann:
For the SPE, you need a separate back end for the compiler for it. With
PPE, you just need the optimizations if you want them.
dW:
Do you know of any plans for a "virtual machine"-like environment or
an emulator for Cell, for those who are waiting with fingers crossed to
get a Cell down the line, so that they could be getting some development
done now?
Bergmann:
I don't think I can make any official statement about that.
[laughter]
dW:
All right. So as far as programming for the SPEs -- IBM and I think Sony
have both said publicly that the Cell is going to be very easy to program
to. Does that mean that those kinds of things can remain hidden from the
programmer and be optimized by the compiler without the programmer ever
even having to think about it?
Bergmann:
No, not really. There's some possibility that you can have libraries
that hide the interfaces. For example, you could have a library doing
Fourier transformations or doing some MPEG encoding/decoding that's using
either the PowerPC code or the SPU code, and the user would just call the
library interfaces. But someone has to write those libraries as well, and
those people have to think about how to map the code onto the SPE.
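[Editors' sketch: the library approach described here can be pictured as one public function with interchangeable backends behind it. Everything below is hypothetical and invented for illustration: the examplelib_ names, the choice of a simple sum as the workload, and the "accelerated" path, which runs on the host and only mimics a four-lane unit. A real Cell library would hand that path to an SPE, which this sketch does not attempt.]

```c
#include <assert.h>
#include <stddef.h>

/* Signature shared by every backend: sum n floats. */
typedef float (*sum_backend)(const float *data, size_t n);

/* Plain scalar path, as it might run on the PowerPC core. */
static float sum_scalar(const float *data, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}

/* Stand-in for an accelerated path. It keeps four partial sums, the way a
 * four-lane vector unit would, but runs entirely on the host. */
static float sum_four_lanes(const float *data, size_t n)
{
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        for (size_t k = 0; k < 4; k++) {
            acc[k] += data[i + k];
        }
    }
    float total = acc[0] + acc[1] + acc[2] + acc[3];
    for (; i < n; i++) {   /* leftover elements that do not fill a vector */
        total += data[i];
    }
    return total;
}

static sum_backend active_backend = sum_scalar;

/* Library-internal choice; callers never see which path runs. */
void examplelib_select_accelerated(int enable)
{
    active_backend = enable ? sum_four_lanes : sum_scalar;
}

/* The only function an application calls. */
float examplelib_sum(const float *data, size_t n)
{
    assert(data != NULL || n == 0);
    return active_backend(data, n);
}
```

An application calls examplelib_sum(values, count) and gets the same interface whichever backend is active. Because the two paths add the numbers in a different order, floating point results can differ slightly in general, which is one of the details the library author, not the caller, has to think about.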
dW:
Is there a chance that the API for those library codes might be
something like the Message Passing Interface (MPI), or something that is
already an industry standard?
Bergmann:
Yes. We are interested in making it a standard interface -- we always
try to stick to already well-established interfaces when we can. For
instance, the API they are using to create threads on one SPE is very
similar to the pthreads API. And yes, there has been some discussion on
having MPI ported to the SPEs themselves.
More on Linux: 32-bit kernels, 32-bit apps, and kernel binaries
dW:
Now, going back again to the LinuxTag paper, it said that there won't be
support for 32-bit Linux -- but what about 32-bit applications? Will it
support those in the same way as the 970 does?
Bergmann:
You can always have applications in both 32- and 64-bit. The only
restriction is that you cannot have a 32-bit Linux kernel running on the
Cell processor. The kernel is always 64-bit, but you can use all the
32-bit applications.
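[Editors' sketch: from the application's side, the 32-bit versus 64-bit distinction shows up mainly as the width of pointers. The small C fragment below reports that width for whichever ABI it was compiled for. The function names are invented for this example, and it assumes a GCC toolchain with both 32-bit and 64-bit support installed, where the same source could be built once with -m32 and once with -m64.]

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */
#include <stdio.h>

/* Width of a pointer, in bits, for the ABI this file was compiled for. */
static int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* Print which flavor of binary this is and return the pointer width. */
static int report_abi(FILE *out)
{
    int bits = pointer_bits();
    assert(out != NULL);
    fprintf(out, "built as a %d-bit application\n", bits);
    return bits;
}
```

Calling report_abi(stdout) from both builds would show the two application flavors side by side; under the arrangement described above, both would run on the same 64-bit kernel.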
dW:
And eventually -- this work that you're doing will eventually merge back
into the main kernel, right?
Bergmann:
Yes. I'm hoping for merging this into the next kernel release -- the
2.6.13 kernel release, because 2.6.12 is already out now.
dW:
So that's really soon, actually.
Bergmann:
Yes, but -- this includes only the architectural support for running
Linux on our boards. It doesn't include the SPU file system, because
there needs to be some more discussion about that.
dW:
All right. And ultimately, once everything is merged in, in a few months
or a year -- do these kinds of architecture-specific code patches and
contributions end up having any impact on the kernel as a whole, or on the
Linux ... project as a whole?
Bergmann:
I think the only impact will be the size. We add some code, which, of
course, contributes to the size of the software tree. And if you enable
the code into ... your kernel binary, for example -- the code is designed
to let you have a single kernel binary that can run on a pSeries® and on
Power Macintosh and Cell processor. So enabling one more architecture will
increase the size of the kernel binary and eat up some small fraction of
the system memory.
dW:
So in an embedded application, you are unlikely to run a universal kernel
binary like that, correct?
Bergmann:
Exactly. You would only enable the Cell part in that.
dW:
And disable the other parts, and build specifically for the Cell in a
memory-constrained environment.
Bergmann:
Yes. But all the distributions can just enable the Cell part for the
kernel, and then it
will run on all PowerPC 64-bit machines.
dW:
All right. Do you have any favorite articles or resources about Cell
that you'd recommend to readers?
Bergmann:
Not a specific one. There was some pretty good media coverage about the
ISSCC conference and E3 conference, and I think those are usually pretty
well-informed compared to what else is there on the Internet.
dW:
Thank you so much for taking the time to join us today -- we really
appreciate it.
Bergmann:
Thank you.
Next time, Meet the Experts will talk with Segher Boessenkool, the author
of SLOF (Slimline Open Firmware). Please send questions
you have for Segher to the developerWorks Power Architecture
editors. We'll include those in the next interview, or -- if you know
of someone you'd like to see profiled, or if you have questions on another
Power Architecture-related topic, please feel free to send those as well, and we will
try to line up the right person or people to answer them in a future
Meet the Experts.
Resources
- The LTC doesn't currently have a home page on the IBM Web site, but you can
learn more about the group that does most of the Linux work within IBM in
IBM: LTC core to Linux development (LinuxWorld, 2004).
- Arnd has written a paper for presentation at this week's LinuxTag 2005
that describes the spufs SPU file system and the Linux for Cell
programming model in detail. A slightly expanded version of that paper has also been posted here on developerWorks.
- Arnd says that some of the best coverage on Cell was published just after
the ISSCC and E3 conferences: articles like ISSCC 2005: The CELL Microprocessor
and Jon "Hannibal" Stokes' two-part Introducing the Cell Processor series
at Ars Technica. The ISSCC papers
themselves, as well as two Microprocessor Report studies, have been
publicly posted to the Cell section of the IBM Microelectronics Technical Library.
- The 2005 E3 conference saw the launch or media debut of three new game
consoles from Sony, Nintendo, and Microsoft® -- all of which are powered by
Power Architecture™ processors (the Sony PlayStation3 by Cell itself). Find
a roundup of E3 Power-related coverage and related stories in the June 8 issue
of the developerWorks Power Architecture Community (PAC) Newsletter
calendar section.
- See a picture of the prototype Cell board -- affectionately known in some circles as
"frisbee" -- as demonstrated at the 2005 E3 conference.
- Find more good links on Cell -- and a detailed block diagram of the Cell
processor -- at the IBM Research CELL Architecture pages.
- The 2005 IEEE paper, Power Efficient Processor Architecture and The Cell Processor,
authored by H. Peter Hofstee of the IBM Server & Technology Group (and one of Cell's
architects), is available in PDF.
- The Wikipedia entry on the Cell processor says that the Cell resembles a modern
desktop computer on a single chip. Read the history, an explanation of
SPEs and SPUs, and find more good links.
- Sony this year published a paper on Programming CELL. An outstanding resource,
it also presents some alternative
models for programming Cell. In particular, slide 24 offers a good
overview of Cell programming models that are explored in subsequent
slides.
- The next Meet the Experts will speak with Segher Boessenkool, the author
of SLOF (Slimline Open Firmware).
- Have experience you'd be willing to share with Power Architecture zone
readers? Article submissions on all aspects of Power Architecture technology from authors inside and outside
IBM are welcomed. Check out the Power Architecture author
FAQ to learn more.
- Have a question or comment on this story, or
on Power Architecture technology in general?
Post it in the Power Architecture technical forum
or send in a letter to the editors.
- The Power Architecture Community Newsletter includes full-length articles as well as recent news about members of the Power Architecture community and upcoming events of interest.
Learn more
about the Power Architecture Community Newsletter and how to contribute to it. Subscription is free.
- All things Power-related are chronicled in the developerWorks Power
Architecture editors' blog, which is just one of many developerWorks
blogs.
- Find more articles and resources on Power Architecture
technology and all things
related in the developerWorks Power
Architecture technology zone.
- Download an IBM PowerPC 405 Evaluation Kit to demo a SoC in a simulated
environment, or just to explore the fully licensed version of
Power Architecture technology. This and other fine Power Architecture-related downloads are listed in
the developerWorks Power Architecture technology zone's downloads section.
About the author
The developerWorks Power Architecture editors welcome your comments on this article. E-mail them at dwpower@us.ibm.com.