The difficulty of text processing

There's a billboard I pass on the way to work these days. It's from a series of Tide ads (semi-ineffective, in that I know they're all for the same product but I often can't quite remember which) that describe messy situations with no further comment except for the product's yellow-and-orange-bullseye logo.

The one that particularly caught my attention reads:

Nice Day.

Top Down.

Bad Pigeon.

And it occurred to me that there's a large array of cultural information that you have to have to make any sense out of that. You have to know that there's a kind of car where you can open the roof, and that opening the roof is known as taking the top down. You have to know that people like driving with the top down when the weather is favorable, and that favorable weather is known as a nice day. You have to know that pigeons leave droppings, and that people don't like pigeon droppings to get on their things, and that with the top down, pigeon droppings can get onto things inside the car (such as the clothing you're wearing). You have to know that the colored pattern used as a background for the billboard is the logo of a detergent, and that detergents are used to clean clothing, and that if someone has a pigeon dropping on their clothing they probably want to clean it. And, of course, you have to know about advertising and how it works and what it's for, which also presupposes a certain understanding of at least the surface-level workings of capitalism.

All of which suggests that it would be awfully difficult for a computer to make any sense out of this billboard. I imagine that many of the abovementioned background facts are included in that big database of common-sense information that some AI researchers developed, but I'd be very surprised if all of that info is in that database. So partly I'm just saying AI is hard.

But I think this is also a useful object lesson for writers of speculative fiction, because it's often tempting to write stories in which an alien who doesn't speak English very well makes funny mistakes due to total lack of cultural knowledge. But my feeling is that it's impossible to achieve real fluency in a language without some cultural knowledge. There's enough that's common among human cultures that it's possible for a human to get by in another language even without much cultural knowledge. But for an alien with absolutely no understanding of human cultures at all, I'm pretty skeptical that they'd be able to put together coherent sentences in a human language.

(Yes, most of the time this is done in stories intended to be funny, and I know that this level of logical rigor shouldn't be applied to comedic stories. It could even be argued that this is yet another of those genre conventions I'm always talking about, probably borrowed from an older genre convention in movies. So perhaps I ought to just label it a pet peeve, and note that I have a hard time seeing past what feels to me like a basic logic hole in this case, unless the story strikes me as so funny that I forget to think about it.)

(And just to be clear, I'm not talking about the kind of thing where an alien doesn't understand a few obscure words. ("What is this 'zeugma' you speak of, human?") I'm talking about the kind of thing where, for example, an alien speaks in normal colloquial idiom-filled English but fails to recognize that English words can have more than one meaning.) (I might make a grudging exception in the case where that only-one-meaning idea is explicitly a major plot element, like in "Spice Pogrom".)

I suppose another way of putting all this is that before an alien can even say "Take me to your leader," they have to know (and/or assume, and/or share) a fair bit of background about human cultures and political systems.

9 Responses to “The difficulty of text processing”

  1. Jay Lake

    Jed —

    What you’re complaining about is, to oversimplify, humans-in-funny-alien-suits, which is nigh inescapable. The truly alien is almost by definition incommunicable. Ted Chiang did it in “Story of Your Life,” but that’s a piece of work with a level of intellectual rigor and reader demand that is almost literally unsurpassed in our canon.

    In the end, even the poor old Horta just wanted to protect her children.

    Jay

  2. Jenn Reese

    I learned a lot about cultural assumptions working for Slangman (writing books on slang for non-native speakers of English). We tried to write example sentences 1) containing no other slang than the main phrase, and 2) that would make sense to a person from Japan, Russia, Brazil, France, Qatar, etc. It’s a lot harder than it sounds.

    Even a sentence like “Darn! The cat just got out.” implies an understanding that Americans keep their cats indoors on purpose. The Dirty book was interesting because of all the “normal” words in our language with dangerous other meanings, like top & bottom, “mother”, snatch, etc.

    Anyway, I find it a fascinating subject and have certainly been guilty of over-humanizing my aliens in the past. (And will probably be guilty of it in the future too.)

    Jenn

  3. Wendy Shaffer

    And, of course, as Paul Park points out in his short story, “If Lions Could Speak,” when you write a story about a realistically alien alien, lack of communication inevitably becomes the central theme of the story, which is as artistically limiting in its own way as humans-in-funny-suits stories.

    Humans-in-funny-suits *are* just about inescapable, but most writers’ humans-in-funny-suits aren’t nearly as weird as some real humans can be to other real humans. My personal acid test is that if the aliens don’t seem at least as weird to me as the Japanese seem to me, then they’re not weird enough.

  4. Jed

    (Re Jay’s comment)

    Hmm. I’ll have to think about that, but offhand I think I’m complaining about something much more specific.

    I’m willing to accept that in most cases aliens are going to have something in common with humanity; that’s both a genre convention and a matter of practicality (writers are human, readers are human, it’s thus awfully difficult to portray aliens who are both interesting and entirely incomprehensible). I’m even willing to take it as a genre convention that most aliens more or less share human linguistic deep structure, that they have the same notions of nouns and verbs that we do, etc. (Though I’m intrigued by the idea of trying to portray aliens who don’t share those concepts.)

    But I’m complaining about a very specific kind of story, in which (a) an alien speaks English fluently and idiomatically, and yet (b) the writer makes a big deal over the alien’s inability to understand certain common terms or structures in English. (I’m okay with either of those alone; it’s the combination that bugs me.) Like an alien who can comfortably discuss the concept of paying rent for an apartment but is certain that the word “landlord” must refer to a god of the earth. (“Even though I know everything there is to know about your society, and 99% of the words in your language, somehow the common word ‘landlord’ doesn’t appear in my dictionary, so I’d better make a dumb guess as to what it might mean, and then say that aloud so the reader will be amused.”) Intelligent computers in sf are often prone to the same sort of difficulty.

    Come to think of it, Data on ST:TNG sometimes annoyed me for the same sorts of reasons. He’s sentient, he’s very smart, he has access to a huge amount of information, he speaks English fairly fluently, yet he sometimes gets tripped up by taking idioms literally. Why doesn’t someone install an idiom dictionary in his database? Anyone familiar with any human language should be aware that (a) humans often use idioms and metaphors and other non-literal phrases, and that (b) most human words are derived from older words with slightly different meanings, and thus that (c) it’s ridiculous to assume that a word or phrase can only mean the specific thing that you might think it means if you combine one specific randomly chosen meaning of each of its component parts.

    Anyway, sorry, I’ll stop now. I suppose a concise summary of my point might be: “There are certain types of humor that Jed doesn’t find funny because he analyzes them too much.”

  5. Jed

    Re Jenn’s and Wendy’s comments: Good comments. I should note that underlying a lot of what I’m talking about is real-world analogies with humans speaking an unfamiliar language, and I’m treading on shaky ground because I don’t have a lot of experience with either speaking an unfamiliar language or interacting with people who are new to English.

    But my impression has generally been that real fluency requires the kind of cultural background info that Jenn describes.

    And yes, I totally agree with Wendy that aliens should (in general) be more different from Americans than humans from non-American cultures are, especially if the story’s set in a multicultural human future. A couple of times I’ve seen the equivalent of, say, a Japanese person being shocked that an alien would do something so bizarre as eating raw fish.

    Tangentially related, if any of you are interested in portraying alien biology and haven’t seen the traveling Alien Sex slideshow at various big cons, you should take the next opportunity to go see it. It’s all about the fascinating variety of sexual interactions among Earth creatures (especially insects); it’s both a good sparker of ideas for alien sexual patterns, and a good reminder that there’s room for much more variety in same than we usually see in sf.

  6. Wendy Shaffer

    Actually, Jed, now that I think about it, my experience with reasonably fluent non-native speakers of English confirms your ideas pretty well. (I’ve worked with a lot of foreign-born students.) Non-native speakers often have a hard time decoding jokes, puns, and other kinds of coded references. (Funny example: The Molecular and Cell Biology Department does a weekly grad-student beer hour. Because of rules about advertising functions where alcohol is served on campus, signs advertising the ‘Beer Hour’ always read ‘BXXR hour’. I never met a native speaker of English who couldn’t decode the sign, but many foreign students had no idea what it was referring to.)

    I’ve never come across an instance of linguistic confusion analogous to your ‘landlord’ example above. The closest I recall was a case of phonetic, not semantic, confusion: A Catalonian friend, when he first came to America, had difficulty distinguishing between the vowel sounds in ‘beat’ and ‘bit’. (That short ‘ih’ sound in ‘bit’ does not appear in Catalan or Spanish.) Someone asked him if he had ‘sheets’ for his new apartment, and he thought he was being asked if he had ‘sh*ts’.

    Actually, when you think about it, it’s weird that so many aliens in SF speak not only fluent, but relatively *unaccented* English. Though I suspect that a lot of that has to do with the relative difficulty of portraying dialect.

  7. Rachel Heslin

    When I was in Prague in ’95, a friend from there pointed out a billboard showing a naked man descending from an airline staircase, covering his private parts with a newspaper.

    My friend said that this was a perfect example of all these Hot Shot American Marketing Companies who had come to Prague to make their fortune in this new and exciting market. Unfortunately, they didn’t know squat about the culture.

    See, the billboard was for insurance, and the slogan translated into, “Are you covered?” Of course, in the Czech language, being “covered” has nothing to do with insurance, so the billboard made zero sense to the intended audience.

    Idiots.

  8. Vardibidian

    As it happens, I have in my hand a copy of The Big Picture: Idioms as Metaphors, by Kevin King (Houghton Mifflin © 1999). It begins with a definition of idiom and metaphor, and explains that knowing the meaning of the individual words might not prevent misinterpreting the expression. It then presents some Basic Metaphors which underlie many idioms; for instance, if you know that Ideas are Balls, you will understand that I might bounce one off you so you can put your spin on it, we can kick it around, I can field your questions, and see if it really comes from left field.

    Although the alien-with-a-phrase-book might not be on the ball enough to catch all the idioms, it’s hard to believe that if it can get around at all, it would strike out every time someone threw it a curve.

    Still and all, as with a lot of stuff, either it’s funny, or it isn’t, and if it is, I’m willing to forgive, and if it ain’t, then watch out.

    Redintegro Iraq,
    -Vardibidian.

  9. metasilk

    I attempted to use your post (the bit about the billboards) to illustrate a point with a colleague today. The conversation went like this: I mentioned shared assumptions as a communications issue, and described the billboard phrases. She looked stumped. I asked her what she thought of when she read “Nice Day”: sunny, warm. Then “Top Down”: management structure, she said.

    The point was illustrated! 🙂

    Although she still thought the point didn’t apply, which led into a conversation about how almost completely nonrepresentative my thinking patterns are of how my colleagues think, which in turn led to my private but total gratefulness for having friends like you, Vardibidian, etc., who often enough *do* understand me.

    I’d better get back to work, but thank you, thank you, thank you–just for Being.
