Simple English Wikipedia

An xkcd comic led me to the Simple English Wikipedia. This wikipedia aims to provide simplified versions of articles from the Ordinary English Wikipedia by limiting the vocabulary used, grammar complexity, and sentence length. I admire the motivation behind this resource: to make general knowledge accessible to non-native speakers, youthful readers, or those with disabilities. Yet to an adult native English speaker, the language of these articles can be gratingly unaesthetic (and imprecise). For example, consider this excerpt from the page on Mars:

The planet Mars is made of rock. The ground there is red because of iron oxide (rust) in the rocks and dust. The planet has a small carbon dioxide atmosphere. The temperatures on Mars are colder than on Earth, because it is farther away from the Sun. There is some ice at the north and south poles of Mars, and also frozen carbon dioxide.

This is all factually accurate, but achingly simplistic (especially the “because it is farther away from the Sun” statement; the atmospheric composition is also a critical player). On the other hand, if I had to read Wikipedia in, say, French, I would no doubt appreciate the simplicity!

But I experienced even more wincing when reading pages about topics from Computer Science, such as the neural network page, half of which consists of:

What is important in the idea of neural networks is that they are able to learn by themselves, an ability which makes them remarkably distinctive in comparison to normal computers, which cannot do anything for which they are not programmed.

(Technically, they don’t learn by themselves — they require supervision in the form of labeled examples — and any machine learning method exhibits the learning property, not just neural networks, and what is a “normal computer” anyway? A neural network is an algorithm for learning a model, not a special-purpose computer. Finally, even neural networks cannot do anything for which they are not programmed! More accurate: “Neural networks can learn from examples, allowing them to make predictions about objects they may never have seen before (generalize).”)
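To make the “learn from examples, then generalize” idea concrete, here is a minimal sketch of a single artificial neuron (a perceptron), about the simplest relative of a neural network. The data, the function names, and the learning rate are all invented for illustration; real networks have many such units and more sophisticated training.

```python
# A single perceptron trained on labeled examples (supervised learning).
# Everything here (data, names, learning rate) is illustrative only.

def predict(weights, bias, point):
    """Return 1 if the point falls on the positive side of the line, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, point))
    return 1 if total > 0 else 0

def train(examples, epochs=20, rate=0.1):
    """Perceptron rule: nudge the weights whenever a labeled example is misclassified."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for point, label in examples:
            error = label - predict(weights, bias, point)
            weights = [w + rate * error * x for w, x in zip(weights, point)]
            bias += rate * error
    return weights, bias

# Labeled examples: the label is 1 when x + y is large, 0 when it is small.
examples = [((0, 0), 0), ((1, 0), 0), ((0, 1), 0),
            ((2, 2), 1), ((3, 1), 1), ((1, 3), 1)]

weights, bias = train(examples)

# Points the perceptron never saw during training.
print(predict(weights, bias, (4, 4)))      # prints 1
print(predict(weights, bias, (0.2, 0.1)))  # prints 0
```

Note that the program never states the rule “x + y is large”; it only sees labeled points. It is still an ordinary program running on an ordinary computer, doing exactly what it was programmed to do.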

Or consider this part of the page on Computer Science itself:

A computer is a device which takes orders as fast as you can give them to it and works as fast as it can to solve the orders.

(This makes a computer sound like an active agent (e.g., a waiter), which it isn’t.) Or this, from the Computer programming page:

The instructions in “machine form” are usually in a .EXE file (which is called an executable, because it can be executed). These machine-instructions will by default open a black “command-prompt” window, but can open games as well as other things.

(Well, am I really surprised that “simple” computer programming has such a strong Windows bias? ;) )

There’s an interesting issue at the heart of this project: how do you talk simply without talking down? (Or worse, misleading the reader!) Clear, simple language has real value even outside of this venue. However, translating all value judgments into the simple words “good” or “bad” not only gives the text a childlike sound but also gives its meaning a childlike interpretation, and important distinctions may be lost.

I actually find this wikipedia harder to read, not easier; the stilted sequences of simple sentences dominate my attention with their awkward rhythms and unanticipated gaps (likewise, you may have found my alliteration distracting :) ). Good writing blends its details into the background and leaves you room to think about the ideas being presented. But yes, I know: I’m not the target audience for this product. I expect that many people are benefiting from much of the information in the Simple English Wikipedia. Hopefully they also get a chance to dig deeper for the real details on their subjects of most interest or need.

Sparklines

I am a latecomer, it seems, to some of Edward Tufte’s brilliant ideas. Today I stumbled across the sparkline, a “small, intense, simple dataword” (Tufte) that is best illustrated by example. Sparklines permit you to display a large volume of numeric information in a very tiny space, while conveying the information perhaps even more effectively than if you’d used a large floating figure or a table. Tufte posted a sparkline introduction, which is an excerpt from his book Beautiful Evidence. Many others have followed up with their own sparkline creations, sparkline generators (in Perl, HTML, Excel, and even special font encodings), and sparkline critiques.
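For a feel of how little machinery a sparkline generator needs, here is a rough text-only sketch that maps numbers onto Unicode block characters. It is only a crude approximation of Tufte’s word-sized graphics, and the function name and the temperature data are made up for illustration.

```python
# Render a list of numbers as a one-line text sparkline.
BARS = "▁▂▃▄▅▆▇█"

def sparkline(values):
    """Map each value onto one of eight bar heights, scaled to the data's min and max."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return BARS[0] * len(values)  # flat data: draw a flat line
    top = len(BARS) - 1
    return "".join(BARS[round((v - lo) / span * top)] for v in values)

# Hypothetical data: daily high temperatures over two weeks.
temps = [61, 63, 64, 70, 75, 74, 68, 66, 65, 71, 78, 80, 77, 72]
print(sparkline(temps))
```

With only eight bar heights the resolution is coarse, but the result can sit inline in a sentence, which is the whole point.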

One tip I particularly liked had to do with aspect ratio; Tufte suggests (after William S. Cleveland, 1993) adjusting the vertical scale so that slopes are about 45 degrees (rather than very flat or very spiky). While this is simply a rule of thumb, to be violated if the situation calls for it, I did find his examples to be compelling; “lumpy” data does seem to be easier to visually process than “spiky” data.
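One simple way to turn the 45-degree rule of thumb into a number is to pick the height-to-width ratio that makes the median absolute slope of the line segments equal to 1. The sketch below does that under two assumptions of mine: the points are evenly spaced along the x axis, and the data is not mostly flat (otherwise the median slope is zero). It is not necessarily the exact procedure Tufte or Cleveland would use.

```python
from statistics import median

def banked_aspect_ratio(ys):
    """Height/width ratio at which the median absolute segment slope is 45 degrees.

    Assumes evenly spaced x values and a nonzero median slope.
    """
    y_range = max(ys) - min(ys)
    steps = len(ys) - 1
    # Slope of each segment after scaling both axes to the unit square:
    # delta-y is (b - a) / y_range and delta-x is 1 / steps.
    slopes = [abs(b - a) / y_range * steps for a, b in zip(ys, ys[1:])]
    return 1 / median(slopes)

# A hypothetical, very spiky series: it wants a short, wide plot.
spiky = [0, 10, 0, 10, 0, 10, 0]
print(round(banked_aspect_ratio(spiky), 3))  # prints 0.167
```

A ratio of about 1/6 here says the plot should be roughly six times as wide as it is tall, which stretches the spikes out into gentler lumps.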

But what really took my breath away was this particular example:

This is cited by Tufte as appearing in Robert Sedgewick’s 1998 “Algorithms in C”. It illustrates several passes of mergesort being applied to a 200-item list. It is absolutely brilliant! Whoever thought of visualizing the values (sort keys) as the angle of the lines was absolutely inspired. This graphic stole my attention as only a true work of art can. I’m still staring at it in fascination.
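If you want to play with the underlying data, here is a small sketch that records the state of a list after each pass of a bottom-up mergesort. Whether Sedgewick’s figure uses the bottom-up or the top-down variant is something I am not sure of, so treat this as one plausible way to generate “several passes” on 200 keys, not a reconstruction of his figure.

```python
import random

def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])
    result.extend(right[j:])
    return result

def mergesort_passes(items):
    """Bottom-up mergesort; returns a snapshot of the list after each pass."""
    items = list(items)
    snapshots = [list(items)]
    width = 1
    while width < len(items):
        merged = []
        # Merge neighboring sorted runs of length `width` into runs of 2 * width.
        for start in range(0, len(items), 2 * width):
            left = items[start:start + width]
            right = items[start + width:start + 2 * width]
            merged.extend(merge(left, right))
        items = merged
        snapshots.append(list(items))
        width *= 2
    return snapshots

# 200 shuffled sort keys, seeded so the run is repeatable.
keys = list(range(200))
random.Random(0).shuffle(keys)
snapshots = mergesort_passes(keys)
print(len(snapshots) - 1, "passes")  # prints: 8 passes
```

Each snapshot is one row of data; mapping every key to the angle of a short line, as in the graphic, is left to your plotting tool of choice.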

I’ll have to be on the lookout for places where sparklines could be the right solution in my next technical paper.

Pleasurable sentences

“Why should a sequence of words be anything but a pleasure?”
— Gertrude Stein

This quote was cited by Dr. Brooks Landon in the very first lecture of my new class, Building Great Sentences: Exploring the Writer’s Craft. He conducted an interesting exploration of the quote and its possible meanings, and then assigned me homework: to list a few sentences that I find pleasurable. What a delightful idea! However, I quickly found that I had trouble separating my emotional response, or affinity for the sentiment expressed by a sentence, from the sentence itself giving me “pleasure”. Then I realized that this was one of his points in the lecture: he feels you actually can’t separate a sentence’s form from its meaning, and the choice of phrasing does, in fact, alter the message being communicated.

After some musing, here’s what I came up with:

“I cannot read the fiery letters,” said Frodo in a quavering voice. — J.R.R. Tolkien
This sentence is so visually evocative for me that it can’t help being a pleasure. Frodo’s phrasing is also perfect (“cannot” instead of “can’t”, and I just love the term “fiery letters”).
Education is not the filling of a pail, but the lighting of a fire. — William Butler Yeats
This sentiment really resonates with me, and I think Yeats hit on an eloquent pair of analogies to express it. I also like the parallel construction in the two clauses.
The moment one gives close attention to anything, even a blade of grass, it becomes a mysterious, awesome, indescribably magnificent world in itself. — Henry Miller
How true it is. And each time I read this sentence, I find myself briefly lost in imagining the world inside a blade of grass — a neat magic trick executed by Mr. Miller!
Letter writing is the only device for combining solitude with good company. — George Gordon Byron
I have many quotes on the merits of solitude. This is one of my favorites, and it evokes many happy memories of lounging on my porch, writing letters to distant friends, and feeling as if they were sitting with me, sharing deep conversation in person. I like this sentence for its juxtaposition of apparent antonyms, which yields insight into how they can, in fact, be combined.
Friendship is born at that moment when one person says to another: What! You, too? Thought I was the only one. — C.S. Lewis
Again, this one really resonates with me. But I also really like his phrasing, which ably captures the combined feelings of surprise and delight when you strike up a new friendship.
May serendipity enter your door without knocking (but also without using a crowbar). — Elizabeth Vaughan
This sentence was occasioned by a real experience with a real crowbar and my own real front door, but I think it stands quite well on its own even without that memory. The roughness of “crowbar” coming as it does at the end of an otherwise smooth and graceful sentence is like a sudden exhalation that always makes me chuckle.
And a sweet and powerful positive obsession blunts pain, diverts rage, and engages each of us in the greatest, the most intense of our chosen struggles. — Lauren Oya Olamina (Octavia Butler)
It’s always a pleasure when someone provides at least the appearance of a justification for our obsessions.

What sentences bring you pleasure?

Meanwhile, I’m eagerly anticipating the remaining 23 lectures in this series. Thank you, Teaching Company!

Emacs Powertools and Opinions

I recently came across Steve Yegge’s emacs tips. He’s a man after my own heart:

“Using the mouse is almost always the worst possible violation of economy of motion, because you have to pick your hand up and fumble around with it. The mouse is a clumsy instrument, and Emacs gurus consider it a cache miss when they have to resort to using it.”

Cache miss! I love it.

His article is long, detailed, opinionated, and a goldmine of useful emacs nuggets. It will only be interesting to you if you are at least an intermediate emacs user and, say, you agree with the philosophy noted above. Here I include some of my own tips and commentary on his post.

Steve suggests mapping the CapsLock key to be Control, which is sensible in an emacs world, since you use Control orders of magnitude more often than Caps Lock, and ergonomically CapsLock has the more desirable position. But I’ve always been reluctant to do this, because it’s non-standard, and once you’ve adapted to the unusual layout, you render yourself frustrated and mistake-ridden when using anyone else’s keyboard.

Navigation: I already use Ctrl-s (search forward) and Ctrl-r (search backward) liberally to move around in documents (I agree that this is very handy!) and I do use temp buffers for notes, or rather, only one temp buffer. I only use *scratch* for this, because I like being forcibly reminded that there is no auto-save going on and anything I put there won’t be saved unless I explicitly do so (otherwise my compulsively periodic saving goes on as a background process, in addition to emacs’s autosave, and I generally assume that anything in an emacs buffer has a good dose of permanence). However, I’ve rarely used the regexp search commands, because it was so tedious to type Meta-x-isearch-forward-regexp. Happily, in this article I learned that there are bindings to be had: Ctrl-Alt-s and Ctrl-Alt-r, respectively. (Actually, on my Mac they are Ctrl-Apple-s and Ctrl-Apple-r.) I agree that this is a spectacularly awkward binding to type, but I don’t use it that often (yet) so I’m willing to try it out before remapping it.

The other super-useful navigation bindings that I regularly use (not mentioned in his article) are Ctrl-[ and Ctrl-] to jump to the top and bottom of the file, respectively.

Buffers and windows: I already use most of the buffer and window management commands he listed (splitting, swapping, closing, and listing buffers) but did learn a new one: Ctrl-x + to balance window sizes. Cool!

GUI stuff: I don’t use the menu bar in emacs (sometimes it’s there, sometimes not; I generally ignore it either way). I do, however, like having a scroll bar. I don’t use it much (again, I dislike using the mouse), but it gives a nice visual indication of where I am in a file and how big it is. Yes, the status bar will tell me that I’m 61% through the file, but the visual is much more intuitive, especially because it shows me how much of the file, proportionally, is currently shown in the buffer window. Steve backs off from his initial recommendation to get rid of the scroll bar with an interesting analog/digital argument.

Steve cites “region selection” as a case where the mouse is actually helpful in emacs. I’ll agree, in the case of rectangular region selection (I don’t even know how to do that with the keyboard, much less efficiently), but I do normal region selection all the time with the keyboard. I set the mark (Ctrl-space), move to the end, and then do whatever I needed to do with the region: copy (Meta-w) or cut (Ctrl-w), usually. He seems to view the “move to the end” as the slow bit, but with Ctrl-s (search forward), as noted above, generally this is speedy as well.

Help: I agree that Meta-x-describe-bindings can be a fascinating (and revealing) exploration of the current mode and the powers available to you. I’ve used this in bibtex mode to good effect.

Query-replace: This is one of my favorite bits of emacs power. I love that I can so easily specify text to find and replace, and do it interactively, skipping from instance to instance and individually saying yes or no, or opting to just say yes to all such occurrences (with a !).
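For anyone who hasn’t seen query-replace in action, the logic can be sketched outside of emacs in a few lines of Python. This is only an illustration of the yes / no / yes-to-all flow, not how emacs implements it; the function name and the `decide` callback (standing in for the keypress at each match) are my own inventions.

```python
import re

def query_replace(text, old, new, decide):
    """Replace occurrences of `old` one at a time, asking `decide` about each.

    `decide(match)` returns "y" (replace), "n" (skip),
    or "!" (replace this one and all the rest without asking).
    """
    pieces, pos, replace_all = [], 0, False
    for match in re.finditer(re.escape(old), text):
        answer = "y" if replace_all else decide(match)
        if answer == "!":
            replace_all = True
            answer = "y"
        pieces.append(text[pos:match.start()])
        pieces.append(new if answer == "y" else match.group())
        pos = match.end()
    pieces.append(text[pos:])
    return "".join(pieces)

# Scripted answers stand in for interactive keypresses: yes, no, then yes-to-all.
answers = iter(["y", "n", "!"])
result = query_replace("cat cat cat cat cat", "cat", "dog", lambda m: next(answers))
print(result)  # prints: dog cat dog dog dog
```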

Other tidbits I gleaned from this article are

  • Meta-b to skip back a whole word. I can’t believe I never knew this one. I imagine then that Meta-f goes forward a whole word. Hey, it does! I also use Ctrl-t regularly to swap two characters (usually when I typed them in the wrong order), and sure enough… Meta-t swaps two words. Whee!
  • M-x list-matching-lines: this one blew my mind. It takes in a regexp and shows you every line in the buffer that matches it (like grep -n -e on the command line, but hey, you’re still in emacs, and if you go to one of those entries and hit return, your cursor jumps to that position in the buffer).
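To show what list-matching-lines computes, here is a rough Python equivalent that returns every matching line along with its line number, much like grep -n. One caveat: Python’s regexp syntax differs from emacs’s in places, so patterns won’t always transfer unchanged. The sample “buffer” contents are made up.

```python
import re

def list_matching_lines(text, pattern):
    """Return (line_number, line) pairs for every line that matches the regexp."""
    regex = re.compile(pattern)
    return [(number, line)
            for number, line in enumerate(text.splitlines(), start=1)
            if regex.search(line)]

# A hypothetical buffer.
buffer_text = """\
TODO: fix the parser
def parse(tokens):
    pass  # TODO handle errors
return result
"""

for number, line in list_matching_lines(buffer_text, r"TODO"):
    print(f"{number}:{line}")
```

The line numbers are what make the emacs version so handy: they are exactly what it needs in order to jump you to the right spot when you hit return on an entry.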

Any additional tips are welcome!

List Formatting (It’s for Political Candidates, too!)

I’m having a ball listening to Grammar Girl’s podcast. Much of what she covers is already a part of my internal writer/editor/proofreader, but she has a nice, conversational approach that’s enjoyable regardless. Sometimes the episode I happen to listen to is particularly timely, as happened today with Formatting Vertical Lists. Useful tips I gleaned include

  • Don’t use a colon to introduce a list, unless the introduction is a complete sentence. (I think I’ve been violating this one for a while. But see how I reformed my ways in this post!)
  • Capitalize and punctuate list items if they are complete sentences.
  • Use bullets for unordered items, numbers to indicate sequence (steps), and letters to indicate choices or labels you can refer back to later.
  • Use parallel construction.

So, what was timely about this episode? Yesterday I had the pleasure of reading through candidate statements on my sample ballot. One in particular stood out for its egregious abuse of punctuation, overuse of capitalization, and general incoherence. Here is a verbatim excerpt that highlights multiple violations of Grammar Girl’s list formatting recommendations:

Donald Williamson Has The Experience and ECONOMIC RECOVERY PLAN to:
Stop Foreclosures! Help Families Save Their Home! (Recast Loan).
Cut State of California’s Dependency on Foreign Oil.
Wording on Reducing Gas Prices.
Balances California’s Budget: Cuts Fat-Waste, Stops Excessive Spending, No New Taxes!
Establish Health Care – Pharmacy Plan For All Californians.
Economic Recovery Plan: Creates Business, Jobs, Reduces Unemployment.

Oh, where to start? In terms of formatting, Donald has forgotten to use any sort of bullet at all for his list. He included the leading colon (I’m willing to forgive this one). He bolded and underlined each initial word, which only makes the lack of parallelism more grating. He would also have us believe that all of California’s families live in a single home. The vacuous content is even more alarming. I’m still utterly puzzled by #3 (“Wording!”). I’m not sure what “Fat-Waste” actually is, but it sounds pretty gross. Thank goodness Donald’s there for us with his blizzard of buzzword “solutions”, up to and including “No New Taxes!”

I read on, eager to learn how he was going to “recast” loans, do something to gas prices, provide a “health care – pharmacy” plan for all Californians, create “business” and jobs—all without introducing new taxes! Sadly, no details were forthcoming. Nor is any additional content available on his website, which features two bonus non-parallel lists, more families living in that single home, use of “that” instead of “who”, “receive” misspelled, etc.

The saddest part? Donald Williamson cites himself as an “Educator.” America’s future is in his hands, even if he loses his bid to represent the 59th District in the State Assembly. Please, Donald, next time devote just a few dollars to a good proofreading of your campaign materials. If nothing else, it sets a good example.
