Simple English Wikipedia

An xkcd comic led me to the Simple English Wikipedia. This Wikipedia aims to provide simplified versions of articles from the ordinary English Wikipedia by limiting vocabulary, grammatical complexity, and sentence length. I admire the motivation behind this resource: to make general knowledge accessible to non-native speakers, young readers, and people with disabilities. Yet to an adult native English speaker, the language of these articles can be gratingly unaesthetic (and imprecise). For example, consider this excerpt from the page on Mars:

The planet Mars is made of rock. The ground there is red because of iron oxide (rust) in the rocks and dust. The planet has a small carbon dioxide atmosphere. The temperatures on Mars are colder than on Earth, because it is farther away from the Sun. There is some ice at the north and south poles of Mars, and also frozen carbon dioxide.

This is all factually accurate, but achingly simplistic (especially the “because it is farther away from the Sun” statement; the atmospheric composition is also a critical player). On the other hand, if I had to read Wikipedia in, say, French, I would no doubt appreciate the simplicity!

But I experienced even more wincing when reading pages about topics from Computer Science, such as the neural network page, half of which consists of:

What is important in the idea of neural networks is that they are able to learn by themselves, an ability which makes them remarkably distinctive in comparison to normal computers, which cannot do anything for which they are not programmed.

(Technically, they don’t learn by themselves — they require supervision in the form of labeled examples — and any machine learning method exhibits the learning property, not just neural networks, and what is a “normal computer” anyway? A neural network is an algorithm for learning a model, not a special-purpose computer. Finally, even neural networks cannot do anything for which they are not programmed! More accurate: “Neural networks can learn from examples, allowing them to make predictions about objects they may never have seen before (generalize).”)
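
To make "learning from examples" concrete, here is a minimal sketch using a single artificial neuron (a perceptron). It is only an illustration under stated assumptions: the toy data, the function names, and the learning rate are all made up for this example and do not come from any Wikipedia page.

```python
# Toy perceptron: a single artificial neuron trained on labeled examples.
# The data, names, and learning rate are hypothetical, chosen for illustration.

def predict(weights, bias, point):
    """Return 1 if the weighted sum of the inputs is positive, else 0."""
    activation = sum(w * x for w, x in zip(weights, point)) + bias
    return 1 if activation > 0 else 0

def train(examples, epochs=10, learning_rate=1.0):
    """Perceptron rule: adjust the weights only after a wrong prediction."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for point, label in examples:
            error = label - predict(weights, bias, point)  # -1, 0, or +1
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, point)]
            bias += learning_rate * error
    return weights, bias

# Labeled examples: points near the origin are class 0, points far away are class 1.
labeled_examples = [
    ((0, 0), 0), ((1, 0), 0), ((0, 1), 0),
    ((2, 2), 1), ((3, 3), 1), ((2, 3), 1),
]
weights, bias = train(labeled_examples)

# Neither of these points appeared in the training data.
print(predict(weights, bias, (3, 2)))      # 1
print(predict(weights, bias, (0.5, 0.5)))  # 0
```

Note that the labels were supplied by hand and the update rule was written by a programmer, so even here the neuron does nothing it was not programmed to do.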

Or consider this part of the page on Computer Science itself:

A computer is a device which takes orders as fast as you can give them to it and works as fast as it can to solve the orders.

(This makes a computer sound like an active agent, such as a waiter, which it isn’t.) Or this, from Computer programming:

The instructions in “machine form” are usually in a .EXE file (which is called an executable, because it can be executed). These machine-instructions will by default open a black “command-prompt” window, but can open games as well as other things.

(Well, am I really surprised that “simple” computer programming has such a strong Windows bias? ;) )

There’s an interesting issue at the heart of this project: how do you talk simply without talking down? (Or worse, misleading the reader!) Clear, simple language has real value even outside of this venue. However, translating all value judgments into the simple words “good” or “bad” not only gives the text a childlike sound but also gives its meaning a childlike interpretation, and important distinctions may be lost.

I actually find this Wikipedia harder to read, not easier; the stilted sequences of simple sentences dominate my attention with their awkward rhythms and unanticipated gaps (likewise, you may have found my alliteration distracting :) ). Good writing blends its details into the background and leaves you room to think about the ideas being presented. But yes, I know: I’m not the target audience for this product. I expect that many people are benefiting from much of the information in the Simple English Wikipedia. Hopefully they also get a chance to dig deeper for the real details on the subjects they most care about or need.

6 Comments

  1. Evan Dorn said,

    April 20, 2009 at 8:41 pm

    (Knew it already.)

    The simple english wikipedia is indeed a huge challenge. It’s pretty clear to me that in most cases you can’t dramatically simplify the language of an article without throwing away some of the information as well.

    I find that the problem of http://simple.wikipedia.org is ill-defined. The different constituencies have different needs: children learning english, technically-proficient adults learning english, native english speakers whose reading is below standard, and people wanting to learn information about a subject for which they have no relevant background. All of these groups have a different ideal mix of simplified information vs. simplified language.

    But in any case, it’s a wiki like any other, so when you find sections objectionable – you should feel free to fix them!

    Just be careful about your generalizations: neural networks don’t, by definition, require supervision in the form of labeled examples. There’s an entire field of neural nets for unsupervised learning, defined precisely by the fact that they don’t need labeled examples. As I’m sure you knew. :-)

  2. wkiri said,

    April 20, 2009 at 10:32 pm

    I agree about the diffuse target audience of the project. It’s not just about simple words and simple grammar, but also the question of how much content and how sophisticated the ideas themselves should be (which varies, as you noted, across different potential target groups).

    Thanks for the comment on neural networks; you’re quite right. I assume you’re referring to self-organizing maps. I would still hesitate to characterize them as “learning by themselves” since the user/programmer defines the objective function and neighborhood constraints that lead to a particular solution. (Naturally, since learning without any sort of guidance or feedback is impossible, or more aptly, not well defined.)

  3. Susan said,

    April 21, 2009 at 2:31 am

    (Learned something new!)

    I had never heard of something like this. On one hand, it would be a good reading exercise when we get back to studying Spanish, which we have hopes of doing in the fall. (Though I’m not sure it would be much better than the middle grade fantasy novels we’ve been reading to each other.) OTOH, language essentially develops complexity and nuance to express complex, nuanced concepts. I’m OK with a simple article on Mars because there is a lot you can say simply. But I’m not sure that there’s much you can say about neural networks, even if your target audience is educated adults with rudimentary English. To read something useful on this topic, someone probably is going to have to ease into technical writing someplace else.

  4. Terran said,

    April 22, 2009 at 5:53 am

    (Knew it already.)

    Yeah, I followed the exact same xkcd pointer to discover simple Wikipedia. I hadn’t looked through it as extensively as you have, though.

    I agree that the writing feels stilted. What bothers me much more, though, is that the text is flat out wrong or, at the least, highly misleading. The excerpts you give are great examples. Especially the one about computers. That has nothing to do with what I think a computer is. I have trouble imagining someone with a degree in CS writing that definition.

    It seems to me that the core issue is that you don’t have real experts writing the articles. I know that this has been the subject of some strife on the main wikipedia site, but it feels like it’s worse on the simple site, if only because there’s even less incentive for experts to spend their effort writing there. (Less “show off cred”, as it were.) Which is a real shame, because it could clearly use attention from people who both understand a field deeply and are capable of writing clearly and concisely (characteristics which, sadly, come together all too rarely).

    I hadn’t considered Evan’s point about different constituencies, but it’s an excellent one. Personally, I have always wanted a “dial” of some sort on Wikipedia, by which I could set it to show me something at grade school level, high school level, college level, PhD level… Even that wouldn’t address the constituency issue, but it might come closer. If I’m learning a language, I can set things to grade school (accepting that I’ll be seeing a drastically oversimplified view of the subject). While if I’m trying to learn about a new subject as an educated outsider, I can pick, say, college level. Children could then pick whatever level they are most comfortable with, and so on.

    Of course, now you’re talking about trying to generate, oh, 5 times the content that Wikipedia currently has. And there are incentive problems as it is — it would just get dramatically worse. Still, I’m surprised at how successful Wikipedia has been already. It might be possible.

  5. wkiri said,

    April 22, 2009 at 11:04 am

    “Of course, now you’re talking about trying to generate, oh, 5 times the content that Wikipedia currently has.”

    There has been work in the NLP field that could be relevant for automating this process. Fundamentally, it’s a translation problem. If you start with the text at maximum detail, there’s been a lot of work on automatically generating summaries or other rewriting efforts, some of which leverage machine translation and some of which are independent efforts (e.g., automatically generating an abstract). I agree with you about the desire for a complexity dial. Wikipedia might be an excellent place in which this technology could make major contributions.

  6. Elizabeth said,

    April 23, 2009 at 9:07 am

    (Learned something new!)

    I find this question especially compelling because it’s the #1 concern that skeptical lawyers have about the legal plain-language movement. In a profession where precision is essential, it’s sometimes hard to convince lawyers that simplifying their language will not bleach out the shades of meaning (and in fact, in most cases, will make the meaning clearer). As these wiki articles show, it’s a balancing act that highlights how important it is that language follow the maxim “as simple as possible, but no simpler.”

    The examples you gave from the Simple English Wikipedia break the second part of that rule, losing accuracy in favor of simplicity. I think a more skilled writer could have avoided this, and indeed to be worthy of the audience’s trust, a writer must put in the extra effort (and it is hard work!) not to throw out the baby with the bathwater.

    I think these examples have violated the “and no simpler” clause even with respect to their stated purpose of helping language learners. First, they assume that language learners desire simplicity over accuracy in factual articles, which I don’t think is true. Second, they ignore the reality that more nuanced writing, even at the risk of the occasional unfamiliar word, better facilitates language learning because it’s closer to the way people speak and write in everyday life.

    Thank you for sharing this!
