Level: Introductory J. Desquilbet, Staff, IBM
05 Feb 2004 from The Rational Edge: Desquilbet applies UML diagrams to analyze the structure of a popular constructed language.
coi. rodo .i mi'e jexOm.
Translation: "Hi everybody, my name is Jérôme."
Lojban (pronounced "LOZH-bahn") [1]
is a constructed language that Dr. James Cooke Brown began developing
in 1955. Development work has continued since then, carried on by hundreds
of workers and supporters. Lojban is intended for human communication
and perhaps human-machine communication in the future. Its goals are to
be culturally neutral, base its grammar on logical principles and consistent
rules, use phonetic spellings to create unambiguous sounds and words,
and be easy to learn. Lojban has 1,350 root words that can be combined
easily to form millions of different words.
Lojban differs in structure from other languages in major ways and was
developed as a test vehicle for scientists studying relationships among
language, thought, and culture.
This article explains how I used
UML diagrams to help myself understand Lojban.
Constructing languages
Designing a new language for human communication or studying one is
a fascinating activity. In addition to the well-known Esperanto, [2]
there are thousands of artificial languages (also known as constructed
languages, or "conlangs"). Some are "art languages,"
used only by their inventors or a special community. [3]
Tolkien's Elvish languages [4] and Klingon, [5]
spoken by the aliens of Star Trek, fall into the art language category.
Lojban (the name means "Logical Language" in Lojban)
began as an experiment to test the Sapir-Whorf Hypothesis, or SWH. [6]
(The language was called Loglan at that time, in 1955.) The SWH has "strong"
and "weak" variants. The strong variant holds that language
actually shapes the way we think and determines what we can think about.
Taken to its most negative extreme, this implies that the limits of the
language we speak are the limits of the world we inhabit. The weaker formulation
posits that the language spoken by a linguistic community has an influence
on the community's culture (i.e., on what the community does and thinks).
I discuss the SWH more thoroughly in the Appendix.
Today, the Logical Language Group working on Lojban has departed from
this original objective and has grown Lojban as an "engineering language,"
a category of constructed languages. Lojban is a beautifully designed
language and does not look like any spoken or written natural language.
My explanations in this article may make it seem more complex than it
really is, because they are very brief and do not build up gradually. Of course,
an actual Lojban text or dialogue would use many grammatical features
that I won't have space to describe here. If you are interested in
Lojban, read chapter 2, "A Quick Tour of Lojban Grammar, with Diagrams,"
in The Complete Lojban Language (see References) to
get a better feel for the language. You can also check out the documents
and lessons at the www.lojban.org Web site. The Lojban community is really
friendly to beginners; feel free to ask questions on the mailing lists.
Learning Lojban grammatical concepts
Lojban grammar is rather unusual and can best be explained using Lojban
terms. (Note: These terms are always written as invariable nouns; in Lojban
plurals are not denoted with a suffix such as "s.")
A standard Lojban sentence, which is called a bridi, expresses
an idea or assertion. In English, all of the following sentences, although
built from different grammatical entities, also express assertions, which
can be paraphrased into relationships:
1. I am your father [to be + noun] = to-be-father-of with {father
=> I, child => you}
2. You are big [to be + adjective] = to-be-big with {who-is-big
=> you}
3. I go to Paris [active verb] = to-go with {goer => I, destination
=> Paris}
4. I give you this [active verb] = to-give with {donor => I,
gift => this, beneficiary => you}
5. That is green [to-be + adjective] = to-be-green with {what is
green => that}
6. You are a cat [to-be + article + noun] = to-be-a-cat with {what/who
is a cat => you}
Lojban pronunciation
Letter | Sounds like
u | /oo/ as in "look"
o | /o/ as in "show"
c | /sh/ as in "show"
g | /g/ as in "god"
s | /ss/ as in "sun" (it never sounds like /z/)
j | /j/ as in the French "bonjour," or /s/ as in "pleasure"
' | /h/ as in "hello"
x | /kh/ as in the Arabic "Khaled," or /ch/ as in the Scottish "loch," or the German "Bach"
We can translate these English sentences into Lojban bridi:
1. mi patfu do
2. do barda
3. mi klama la paris.
4. mi dunda ti do
5. ta crino
6. do mlatu
Note: Lojban words such as patfu, barda, klama, and so forth
were built algorithmically, using today's six most widely spoken languages:
Chinese, Hindi, English, Russian, Spanish, and Arabic.
In Lojban, a place structure (programmers would say signature) has been
defined for each relationship. The places (programmers would say arguments)
in the bridi have a default order, and each place indicates the
role of a word or group of words. For example, the real definition for
klama (see sentence #3 above) shows five places (or arguments):
"{a goer} comes/goes to destination {a destination} from origin
{an origin} via route {a route} using means/vehicle {a vehicle}"
Hence mi klama la paris. la london. ("I go to Paris from
London") means something different than mi klama la london. la
paris. because the order of the arguments indicates the meaning they
have in the relationship.
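To make the programmer's analogy concrete, here is a minimal Python sketch that treats klama as a function with five positional parameters. The function, its parameter names, and the dictionary it returns are illustrative assumptions and not part of Lojban or of any Lojban tool; the role names are taken from the definition quoted above.

```python
def klama(goer=None, destination=None, origin=None, route=None, vehicle=None):
    """Hypothetical model of the five places (x1..x5) of klama."""
    places = {
        "goer": goer,
        "destination": destination,
        "origin": origin,
        "route": route,
        "vehicle": vehicle,
    }
    # Places that are not filled are simply left out of the result.
    return {role: value for role, value in places.items() if value is not None}


# mi klama la paris. la london.
to_paris = klama("mi", "la paris.", "la london.")
# mi klama la london. la paris.
to_london = klama("mi", "la london.", "la paris.")

print(to_paris)
print(to_london)
```

Swapping the second and third arguments swaps destination and origin, which is the point the two example bridi make.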
A place in the bridi is called a sumti. The centerpiece
of the bridi, called the selbri, expresses the relationship
itself. So, typically, a bridi will have the form shown in Figure
1.
sumti selbri sumti sumti ...
Figure 1: Lojban bridi structure
Lojban grammar glossary
Word | Definition
bridi | predicate
sumti | place or argument
selbri | predicate relation
cmavo | structure word
gadri | article
cmene | proper name
brivla | predicate word
gismu | root word
valsi | word
lujvo | compound predicate word
tanru | phrase compound
Lojban word categories
If we look back at the six Lojban bridi in the list above, we
see different kinds of words:
- mi, do, la, ti, and ta belong to the category of small
grammatical words called cmavo.
- Among these, la is an article (gadri) announcing the
name paris. (A name is called a cmene in Lojban.)
- mi, do, ti, and ta are sumti cmavo, something
like pronouns.
- patfu, barda, klama, dunda, crino, and mlatu are all
brivla, that is, words that express a relationship and carry
meaning; these particular brivla are gismu, or root words.
Confused? The UML diagram in Figure 2 helped me make sense of all these
elements.
Figure 2: Categories of Lojban words
More Lojban word categories
As you can see, Lojban does not have categories such as noun, verb,
adjective, and adverb. Instead, it has relationships, expressed in bridi,
with one or more words that constitute the selbri at the center.
In the following bridi
- do mamta mi ("you are a mother of me"--i.e., "you are my mother")
and
- do patfu mi ("you are a father of me"--i.e., "you are my father")
mamta and patfu play the role of the selbri. They
are different brivla. A brivla is a content word, which
can be:
- a gismu, built into the language.
- a lujvo, derived from a combination of gismu.
- a fu'ivla, borrowed from other languages and adapted to Lojban.
Again, we can use the UML diagram in Figure 3 to understand the category
brivla.
Figure 3: Kinds of Lojban brivla
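The word categories described so far can also be sketched as a small Python class hierarchy. This is only an illustrative reading of the text and glossary, not a reproduction of the figures; the class names simply reuse the Lojban terms (Fuhivla stands in for fu'ivla, because an apostrophe is not allowed in a Python identifier).

```python
class Valsi:
    """A Lojban word (valsi)."""

    def __init__(self, text):
        self.text = text


class Cmavo(Valsi):
    """Structure word, such as mi, do, ti, or ta."""


class Gadri(Cmavo):
    """Article, such as la."""


class Cmene(Valsi):
    """Proper name, such as paris."""


class Brivla(Valsi):
    """Predicate word: a content word that expresses a relationship."""


class Gismu(Brivla):
    """Root word built into the language."""


class Lujvo(Brivla):
    """Compound predicate word derived from gismu."""


class Fuhivla(Brivla):
    """Word borrowed from another language and adapted to Lojban."""


# The words of "mi klama la paris.", classified by hand for illustration.
words = [Cmavo("mi"), Gismu("klama"), Gadri("la"), Cmene("paris")]
brivla_in_sentence = [w.text for w in words if isinstance(w, Brivla)]
print(brivla_in_sentence)  # ['klama']
```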
We have already used some gismu, which are formally defined like
this:
- patfu: x1 is a father of x2.
- barda: x1 is big/large in property/dimension(s) x2 as compared
with standard/norm x3.
- klama: x1 comes/goes to destination x2 from origin x3 via route
x4 using means/vehicle x5.
- dunda: x1 (donor) gives/donates gift/present x2 to recipient/beneficiary
x3 (without payment/exchange).
- crino: x1 is green/verdant (color adjective).
- mlatu: x1 is a cat (puss/pussy/kitten; feline animal) of species/breed x2.
Here, x1, x2, and so on represent the arguments (sumti) that are
accepted in the predicate (bridi) when these gismu play
the role of a selbri. The arguments are optional, but if present,
their order in the bridi helps us interpret the sentence.
Lojban tanru
A selbri can also be a tanru, which is a metaphor built
from a sequence of brivla. Examples are:
- mi sutra bajra (I am a quick runner / I run quickly / I quickly run).
- do barda nanla (you are a big boy).
- mi dunda patfu (I am the father-who-gives).
wherein:
- sutra: x1 is fast/swift/quick/hasty/rapid at doing/being/bringing
about x2 (event/state).
- bajra: x1 runs on surface x2 using limbs x3 with gait x4.
- nanla: x1 is a boy/lad (young male person) of age x2 (immature)
by standard x3.
Note that the meaning of a tanru may be fuzzy.
In a tanru, the left part is called the seltau; it is
a modifier for the rightmost brivla in the tanru, which
is called the tertau. A tanru has the place structure of
its tertau.
A tanru may be more complex, with more than two brivla.
Complex tanru follow a semantic "left-grouping rule"
that can be overridden using the cmavo bo, which acts as a top-priority
operator. For example, with the following additional vocabulary...
- cmalu: x1 is small in property/dimension(s) x2 (ka) as compared
with standard/norm x3.
- nixli: x1 is a girl (young female person) of a general age
x2 (immature) by standard x3.
- ckule: x1 is a school/institute/academy at x2 teaching subject(s)
x3 to audience x4 operated by x5.
...you can build the following complex tanru, each used as the selbri
of an example bridi. All three could be rendered as "that is a small girl school," but
their meanings are clearer than the English equivalent:
- ta cmalu nixli ckule ("left-grouping rule" semantics):
"That is a small girl school."
- ta cmalu bo nixli ckule (carries the same meaning as above):
"That is a small-girl school -- in other words, a school for small girls."
- ta cmalu nixli bo ckule (carries a different meaning):
"That is a small girl-school -- in other words, a small school for girls."
You can model a tanru with a variant of the UML Composite Pattern,
as shown in Figure 4.
Figure 4: Lojban tanru basic structure
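The grouping rules can be sketched in Python as a small composite structure: a selbri is either a single brivla or a tanru made of a seltau and a tertau, each of which may itself be a tanru. The class and function names below are hypothetical, and the parser handles only brivla and bo, with chains of bo grouped from the right; it is a sketch of the idea, not a Lojban parser.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Brivla:
    word: str


@dataclass(frozen=True)
class Tanru:
    seltau: "Selbri"  # the modifier (left part)
    tertau: "Selbri"  # the modified part (right part)


Selbri = Union[Brivla, Tanru]


def parse_tanru(text: str) -> Selbri:
    """Group brivla left to right, letting bo bind its neighbours first."""
    units = []  # each unit is a list of brivla joined by bo
    join_next = False
    for token in text.split():
        if token == "bo":
            join_next = True
        elif join_next:
            units[-1].append(Brivla(token))
            join_next = False
        else:
            units.append([Brivla(token)])

    grouped = []
    for unit in units:
        # Inside a bo chain, group from the right.
        node = unit[-1]
        for item in reversed(unit[:-1]):
            node = Tanru(item, node)
        grouped.append(node)

    # Between units, apply the left-grouping rule.
    result = grouped[0]
    for node in grouped[1:]:
        result = Tanru(result, node)
    return result


def show(selbri: Selbri) -> str:
    """Render the grouping with explicit parentheses."""
    if isinstance(selbri, Brivla):
        return selbri.word
    return f"({show(selbri.seltau)} {show(selbri.tertau)})"


def head(selbri: Selbri) -> str:
    """Return the rightmost brivla, which supplies the place structure."""
    while isinstance(selbri, Tanru):
        selbri = selbri.tertau
    return selbri.word


for phrase in ("cmalu nixli ckule", "cmalu bo nixli ckule", "cmalu nixli bo ckule"):
    print(phrase, "=>", show(parse_tanru(phrase)))
```

The first two phrases produce the same grouping, ((cmalu nixli) ckule), while the third produces (cmalu (nixli ckule)). In all three cases head returns ckule.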
Do you remember the lujvo, which is a kind of brivla? I
said a lujvo is derived from a combination of gismu. The
Lojban vocabulary is founded on a list of 1,350 gismu, and building
lujvo is the way to extend this vocabulary from within the language. A lujvo
is built by contracting a tanru and fixing its meaning (via the
usage context).
Let's consider:
- gerku: x1 is a dog/canine of species/breed x2.
- zdani: x1 is a nest/house/lair/den for x2.
The tanru gerku zdani
means "a house that has something to do with some dog or dogs."
It could mean any of the following:
- houses occupied by dogs
- houses shaped to look like dogs
- dogs which are also houses (e.g., houses for fleas)
- houses named after dogs
If you want it to mean "doghouse," you must make the tanru
into a lujvo. That is, you have to combine (affix) rafsi, the short
combining forms associated with the gismu in the basic dictionary (I will not describe
the exact rules here).
- the rafsi for gerku is ger
- the rafsi for zdani is zda
To specify "doghouse," we can now build a new word, gerzda, from gerku
zdani, and set its meaning and structure.
gerku zdani is now the veljvo (the source tanru) of gerzda.
We can depict the relationship between a lujvo and a tanru
(which has something to do with the rafsi of the participant gismu)
as shown in the UML diagram in Figure 5.
Figure 5: A more complete tanru model
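The link between a lujvo, its rafsi, and its veljvo can be sketched as follows. The RAFSI table holds only the two rafsi quoted above, and make_lujvo is a hypothetical helper that simply concatenates them; the actual lujvo-making rules, which this article does not describe, are not modeled.

```python
from dataclasses import dataclass

# Only the two rafsi given in the text; a stand-in for a real dictionary.
RAFSI = {"gerku": "ger", "zdani": "zda"}


@dataclass(frozen=True)
class Lujvo:
    word: str      # the contracted compound word
    veljvo: tuple  # the gismu of the tanru it was built from


def make_lujvo(*gismu: str) -> Lujvo:
    """Join one rafsi per gismu (naive concatenation, for illustration only)."""
    word = "".join(RAFSI[g] for g in gismu)
    return Lujvo(word, gismu)


gerzda = make_lujvo("gerku", "zdani")
print(gerzda.word)              # gerzda
print(" ".join(gerzda.veljvo))  # gerku zdani
```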
Description sumti
Now let's see how to turn a selbri position into a "description
sumti." All the positions x1, x2, and so on in the previous
examples were filled by pronouns (sumti cmavo), except in one example:
"la paris," which has an article (or gadri): la.
This article turns the cmene "paris" into a description
sumti. There are other gadri to use with a gismu.
Suppose I would like to say "My mother gives the green cat to the
big girl." I would need something to fill the places of "who gives"
(x1 -- the donor), "what" (x2 -- the gift), and "to
whom" (x3 -- the beneficiary). The cmavo "le," followed
by a single brivla or tanru, describes whatever fills the first place (x1)
of that brivla or tanru. Combined with "se"
it describes the second place, with "te" the third place,
and so on. For example:
- le dunda (the donor)
- le se dunda (the gift)
- le te dunda (the beneficiary)
- le mlatu (the cat)
- le se mlatu (the type of cat)
- le crino mlatu (the cat that has something to do with green-ness)
So:
- le mi mamta cu dunda le crino mlatu le barda nixli
My mother gives the green cat to the big girl.
- le crino mlatu cu se dunda
The green cat is given (to someone by somebody).
The green cat is a gift.
- le barda nixli cu te dunda le crino mlatu
The big girl is given the green cat.
Somebody gives the green cat to the big girl.
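The way le, se, and te pick out a place can be sketched in a few lines of Python. The PLACES table and the describe function are hypothetical; the place labels are copied from the glosses given earlier, and for a tanru the sketch looks only at the rightmost brivla and ignores the modifier.

```python
# Place labels taken from the glosses in the text; a stand-in for a dictionary.
PLACES = {
    "dunda": ["donor", "gift", "beneficiary"],
    "mlatu": ["cat", "type of cat"],
}

# The place that each conversion cmavo brings to the front.
SELECTED_PLACE = {"se": 2, "te": 3}


def describe(phrase: str) -> str:
    """Return an English gloss for 'le [se|te] <brivla or tanru>'."""
    tokens = phrase.split()
    if len(tokens) < 2 or tokens[0] != "le":
        raise ValueError("expected 'le' followed by a selbri")
    selbri = tokens[1:]
    place = 1  # plain le describes the x1 place
    if selbri[0] in SELECTED_PLACE:
        place = SELECTED_PLACE[selbri[0]]
        selbri = selbri[1:]
    if not selbri:
        raise ValueError("missing brivla after se/te")
    tertau = selbri[-1]  # the rightmost brivla supplies the place structure
    return "the " + PLACES[tertau][place - 1]


for phrase in ("le dunda", "le se dunda", "le te dunda", "le mlatu", "le se mlatu"):
    print(phrase, "->", describe(phrase))
```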
Note: "cu" is a cmavo used to introduce the selbri.
If cu were not included in the first example above, "mamta
dunda" would have to be interpreted as a tanru meaning
something like "a giver which has something to do with a mother,"
or a "motherly giver." So, you need something to separate the
end of the first sumti from the beginning of the selbri:
"cu" plays this role. It is optional when the first sumti
is simple, like a sumti cmavo, but is mandatory when the first
sumti is more complex.
Basically, descriptors are used to turn a selbri into a sumti.
If you study Lojban, you'll see how "events" are used to turn
a whole bridi into a selbri.
In fact, the sentences above are object representations (instances)
of the class diagram shown in Figure 6 (an enhancement of Figure 1). Note
that the selbri and sumti classes have been turned into
interfaces.
Figure 6: Lojban grammatical concepts
Why create UML diagrams?
Why did I create all the UML diagrams I show in this article? First,
because I'm a visual learner. When I am learning, it enhances my understanding
if I represent concepts and their relationships through a visual medium.
And second, I created the diagrams because modeling in UML has become
a reflexive activity for me as a software engineer (read more about this
in the Appendix). It is my default method for analyzing and understanding
the structure of a complex system -- such as a language.
Of course, my diagrams represent only a map of the concepts; they still
have a lot of white spaces. But they are the beginning of a domain model,
and I can continue detailing this model as I learn more concepts. The
model I show in this article consists mostly of class diagrams, but of
course I can also do more sophisticated modeling. This will enhance my
understanding, but I also recognize that modeling has its limitations:
Learning Lojban will still be challenging for me, and these diagrams may
not cover all the territory. The same would be true if I were to build
a related application -- perhaps a dedicated, structured editor for
writing and automatically correcting Lojban texts, or a translator, or
a computer-aided tutorial, for example. I might have to build entirely
different models, using only small parts of my original domain model,
depending on the nature of the application and the way I analyzed its
use cases.
For learning Lojban or building an application based on Lojban, as for
any other kind of project, a domain model is a valuable and essential
artifact. However, by definition, such a model doesn't define the project.
ki'e .i co'o
As a parting gift, I'd like to leave you with a survival kit,
packed with a few Lojban words and bridi, just in case you get lost in
Lojbanistan during your next holidays:
- coi (hello)
- mi na jimpe (I don't understand)
- mi xagji (I am hungry)
- ma do cmene (what's your name?)
- mi prami do (I love you [use this carefully])
- ki'e (thank you)
- co'o (bye)
- ko ko kurji (take care of yourself) [7]
Appendix: Software engineering languages and the Sapir-Whorf effect
Today, the "weak" version of the Sapir-Whorf Hypothesis--
a given language influences the culture that uses it--is widely accepted.
But I think we are still debating the question raised by the "strong"
version of the Hypothesis: Does our language shape (or limit, or extend)
the way we think? If language is a tool to cut our perceived reality into
slices, do different languages end up with different slices—more
precise in some domains and less precise in others?
People who are fluent in several languages would almost certainly answer
"yes." In certain situations, they all find that the ideas they
want to express are easier to formulate in one language than in the others
they know. For example, to refer inclusively to one such person in English, I would have to write "he/she."
However, some languages have a third person pronoun that
doesn't distinguish between genders (e.g., in Lojban, sumti cmavo
do not indicate gender or number). Personally, I believe that a language
reflects both the historical background of a culture and an elaboration
process that never ends. So knowing only a single language can
limit our possibilities for communicating an idea, but as we bring new
ideas into the culture, the language expands to accept and reflect those
ideas.
Of course, as Lojban's developers discovered, it is very difficult to
invent a new, culturally neutral language, teach it to people from different
cultures, and then wait to see whether it produces a Whorfian effect.
In a sense, that is what the promoters of UML are trying to do within
the software world. However, UML's inventors and developers understand
that software engineers may share a common "culture" based on
common knowledge, problems, and solutions, but we do not all approach
modeling the same way.
For example, to engage with some colleagues I met at a conference, I
suggested modeling a simple fire alarm system for a house. One person,
who was used to building software for controlling an aircraft engine unit,
treated everything as a control loop, with outputs giving feedback to
modify what to do with the inputs. Another person treated everything as
function chains with filters. Each used a different representation and
a different visual "language," which influenced the way they
acted and thought.
So by extension, does the engineering language we apply limit the solution
space we can explore and determine the solutions we can imagine? In the
words of Eric Steven Raymond, a free software movement theorist, "Software
designs are sometimes restricted in avoidable ways by mental habits a
developer has picked up from a particular language or environment (perhaps
a now-obsolete one) and never discarded." [8]
Hence the well-known joke:
"Good FORTRAN programmers can program in FORTRAN with any programming
language."
The UML is an attempt to break through the mental habits and restrictions
of language and environment with visual representations that transcend
them. In software engineering, the UML is today's language of choice for
analysis and design; students now learn it at school. But those of us
who have been in the field for some time began internalizing it as we
first discovered OO programming, then OO design, and then OO analysis.
And as we began creating visual representations of our design and analysis
results, we also experienced a paradigm shift in how we thought about
and implemented engineering practices. "Practicing" the UML
gave us a positive, reflexive approach to solving problems.
Does it also prevent us from thinking about solution paths we might
take to solve a problem? If so, then these exclusions should drive UML
enhancements. In our "engineering culture," as opposed to a
"real culture," we have more freedom to change our languages—and
we can use this freedom to evolve the UML.
Acknowledgments
Catherine Southwood really helped improve the English in this article,
and made many useful suggestions. Many thanks to her! .i ki'e doi.
katrin. And Marlene Ellin did a great job in editing the last version
of this article for The Rational Edge. Many thanks, too! .ije
ki'e doi. marlen.
References
Books
John Cowan, The Complete Lojban Language. A Logical Language
Group Publication, 1997. See http://www.lojban.org/publications/cll.html
Nick Nicholas and John Cowan, What is Lojban? .i la lojban. mo. A
Logical Language Group Publication, 2003. See http://www.lojban.org/publications/level0.html
Robin Turner and Nick Nicholas, Lojban for beginners. http://www.opoudjis.net/lojbanbrochure/lessons/book1.html
Online articles
Eric Steven Raymond, "Tolkien's Tengwar: A romantic orthography for
Lojban," at http://catb.org/~esr/tengwar/lojban-tengwar.html
"What is Lojban? (and the SWH)," at http://www.lojban.org
Articles on Lojban, UML, and the SWH can be found on Wikipedia, at http://www.wikipedia.org
Other Web sites and articles
http://www.catb.org/~esr/writings/cathedral-bazaar/
Eric Steven Raymond's seminal essay about the open-source hacker culture.
http://www.uea.org
The World Esperanto Association.
"Wanted: A World Language," by Edward Sapir, 1931: http://www.langmaker.com/sapir.htm
Notes
[1] See the official Lojban Web site: http://www.lojban.org
[2] See the World Esperanto Association: http://www.uea.org
[3] See http://www.langmaker.com about Model Languages & The Art of Language Making (Conlang)
[4] See http://www.elvish.org, The Elvish Linguistic Fellowship.
[5] See http://www.kli.org, The Klingon Language Institute.
[6] Lojban and the SWH, discussions: http://www.lojban.org/files/why-lojban/swh.txt
and a presentation of the SWH and compilation of links: http://www.usingenglish.com/speaking-out/linguistic-whorfare.html
[7] ko ko kurji is the same as ko kurji ko
(only the sumti order counts in a bridi, not their absolute
place). ko is the imperative form of the Lojban pronoun do ("you").
From the Lojban FAQ: "'ko kurji do' commands 'Take care of you(rself)'
but 'ko kurji ko' commands both that 'You take care of yourself,'
and 'Allow yourself to be taken care of by you,' with a resulting double
emphasis that indicates an especial priority or responsibility for self-focus."
Jeff Prothero's original thought: http://www.lojban.org/files/papers/4thtense
[8] See Eric Steven Raymond's Jargon file extract: http://catb.org/~esr/jargon/html/W/Whorfian-mind-lock.html
and Jeff Prothero's original thought: http://www.lojban.org/files/papers/4thtense
About the author
J. Desquilbet joined Rational France eight years ago after working as a developer for Apex and Ada, using the Booch method. Now, as a software engineering consultant, he enjoys helping clients implement RUP and other IBM Rational tools in a variety of software development environments. Linguistics is one of his special interests.