language

What Are You Doing Outside the Kitchen?

November 17, 2021

ai hebrew language

In November 2011, I ran four sentences through Google Translate. English to Hebrew.

The sentences:

I wash the car
I wash the floor
I wash the kitchen
I go shopping

Hebrew is a gendered language. Every verb has a masculine and a feminine form. The translator had to pick one.

It picked masculine for the car. Feminine for the floor. Feminine for the kitchen. Masculine for shopping.

The subject is "I" in all four sentences. The subject has no gender. The only thing that changed was the object.

I posted it to Facebook under the title מה את עושה מחוץ למטבח? It's a Hebrew idiom: "What are you doing outside the kitchen?" The kind of thing a certain kind of man says to a woman who has opinions.

Friends were quick to name it. Statistical sexism. And not just the kitchen. The floor gets the feminine treatment too. Both are inside the house.

Ten years later, I ran the same sentences again.

Language Is Made of Rubber

April 1, 2019

books language quote

GEB turns 40 this year. Seven hundred and seventy-seven pages about how meaning works. And one of my favorite passages is about how it shouldn't. Language is magic.

The amazing thing about language is how imprecisely we use it and still manage to get away with it. People use words in a "spongy" or "rubbery" or even "Nutty-Puttyish" way. If words were nuts and bolts, people could make any bolt fit into any nut; they'd just squish the one into the other, as in some surrealistic painting where everything goes soft. Language, in human hands, becomes almost like a fluid, despite the coarse grain of its components.

Douglas Hofstadter, "Gödel, Escher, Bach"

Three Shadows, One Cube

March 11, 2013

books hebrew language

I found it on Thingiverse. A 3D-printable cube whose shadow, depending on the direction of the light, casts three different QR codes. Each one links to a Wikipedia article. Gödel. Escher. Bach.

The designer's note: "Note that QR codes cannot be read in mirror image, so only 3 of the 6 possible cube orientations cast a readable shadow".

I stared at this for a while.

Hofstadter wrote, in the Introduction to GEB, that he eventually realized Gödel, Escher, and Bach "were only shadows cast in different directions by some central solid essence". He tried to reconstruct that solid. The book was the result.

I read GEB in 2011. It took me ten months. The book is 777 pages and doesn't let you skim. Except for the chapter that's just diagram after diagram of visual pattern puzzles. I skimmed that one.

Footer had opinions about the diagrams too.

Formal systems. Strange loops. What it means for a system to talk about itself. The idea that meaning isn't carried in symbols. It emerges when one structure gets mapped onto another, when a decoder shows up and suddenly the marks mean something.

The concepts came fast and kept compounding. I'd finish a chapter and feel like I'd been handed new eyes. Then the next chapter would use those eyes to see something else.

Critique of Language

February 1, 2013

language quote

Reading Wittgenstein.

All philosophy is a critique of language.

Every argument. Every misunderstanding. Every time I knew exactly what I meant and still couldn't make someone else see it. Every time someone else knew exactly what they meant and I couldn't see it either.

Beyond Good-Turing

April 30, 2012

language

Attended the Alan Turing Centennial Conference at Bar-Ilan University.
100 years since Turing was born.

Corinna Cortes co-invented SVMs. Heads Google Research in New York.

"Beyond Good-Turing". Handling words you've never seen before. Classic NLP problem.

Text is messy. Uncertain. Not clean data.
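For reference, the classic Good-Turing idea in the talk's title fits in a few lines: the probability mass reserved for words you've never seen is estimated as the share of tokens whose word occurred exactly once. This is a minimal sketch of that textbook estimate, not anything shown in the talk; the function name `unseen_mass` and the toy sentence are my own.

```python
from collections import Counter


def unseen_mass(tokens):
    """Good-Turing estimate of the total probability of unseen words.

    Uses the textbook formula N1 / N, where N1 is the number of word
    types seen exactly once and N is the total number of tokens.
    """
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(tokens)


# Toy corpus (made up for illustration).
tokens = "the cat sat on the mat and the dog sat too".split()
print(unseen_mass(tokens))
```

Six of the eleven tokens are one-off words, so the estimate says the next word has roughly a 6-in-11 chance of being new. On a corpus this small that is mostly noise, which is part of why people look beyond it.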

Her solution: weighted automata. State machines where every transition has a probability attached. Words get weights. The highest-probability path through the graph gives you the most likely interpretation.

Apparently this is how speech recognition works. Multiple possible interpretations. Find the best path.
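Here is roughly what "find the best path" can look like, as a sketch under my own assumptions: a tiny acyclic word lattice with made-up probabilities, and a best-path search that sums log-probabilities. The lattice, the state numbering, and the `best_path` helper are hypothetical. Real speech systems use far larger weighted automata and dedicated libraries.

```python
import math

# Toy word lattice: state -> list of (next_state, words, probability).
# Assumption: states are numbered so every arc goes from a lower number
# to a higher one, which lets us process them in sorted order.
LATTICE = {
    0: [(1, "recognize", 0.6), (2, "wreck a nice", 0.4)],
    1: [(3, "speech", 0.7), (3, "beach", 0.3)],
    2: [(3, "beach", 0.9), (3, "speech", 0.1)],
}


def best_path(lattice, start, final):
    """Return (words, probability) of the most likely start-to-final path."""
    # best[state] = (log-probability of best path so far, words on that path)
    best = {start: (0.0, [])}
    for state in sorted(lattice):
        if state not in best:
            continue  # state not reachable from start
        score, words = best[state]
        for nxt, word, prob in lattice[state]:
            candidate = score + math.log(prob)
            if nxt not in best or candidate > best[nxt][0]:
                best[nxt] = (candidate, words + [word])
    score, words = best[final]
    return words, math.exp(score)


words, prob = best_path(LATTICE, start=0, final=3)
print(" ".join(words), round(prob, 2))
```

With these invented weights, "recognize speech" (0.6 × 0.7) narrowly beats "wreck a nice beach" (0.4 × 0.9). Nudge one number and the answer flips.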

Her closing question: "How do we learn from uncertain data?"

Good question. Don't have the answer. Neither does she yet, I think.

Negative Double Positive

December 27, 2011

language

An MIT linguistics professor was lecturing his class the other day. "In English," he said, "a double negative forms a positive. However, in some languages, such as Russian, a double negative remains a negative. But there isn't a single language, not one, in which a double positive can express a negative."

A voice from the back of the room piped up, "Yeah, right."

This has been discussed on the Linguistics Stack Exchange: Is "double positive meaning negative" a common phenomenon?

Yes, for example, it's the same in Italian "sì, sì" (= yes, yes), but it's ambiguous, it depends on intonation and not on the words themselves; this means that "double positive = negative" is wrong speaking about the words, but it works through other means. Changing intonation, that "sì, sì" can be absolutely positive as well. We also use a small variation in written language to substitute the intonation. We write "seh seh" or "se se"... More or less like the English slang variation "ye ye".

In Hebrew, my native language, we have "כן, בטח" (ken, betach — "yes, sure"). With the right intonation, it flips to pure sarcasm. Same with "כן, כן" (ken, ken — "yes, yes"). Can be genuine agreement or complete dismissal.

Related, from The Lousy Linguist:

There are 3 interpretations of "yeah, right" in American English:
Normal (factual agreement): yeah right = 'yes, that is correct'
Sarcastic (opposite meaning): yeah right = 'no way in hell'
Back-channel (sentiment agreement): yeah right = 'mm-hmm'
Thanks to the influence of Seinfeld and Friends throughout the 90s, Sarcastic is probably the default use these days...

The Lousy Linguist links to a 2006 paper from USC that tried to teach a computer to detect sarcastic "yeah right" in phone conversations. Their finding: when human annotators listened without context, they only agreed 52% of the time. Barely above chance. Add context, and agreement jumps to 77%. The machine did best when it ignored tone of voice entirely and focused on contextual cues like laughter and position in the conversation. How something is said matters less than what surrounds it.

At work we're trying to detect intent from text. No audio. No laughter. Just words. Sarcasm is especially confusing: "yeah, sure, I'll buy this camera tomorrow!" means one thing as a reply to Canon posting about the new EOS 600D, and something else entirely when it's a comment on an article about the $120,000 EF 1200mm f/5.6.

I wonder how long until computers actually get this right.
