Scouring the Web to Make New Words ‘Lookupable’

A couple of weeks ago, two of my New York Times colleagues chronicled digital culture trends that are so newish and niche-y that conventional English dictionaries don’t yet include words for either of them.

In an article on Sept. 20, Stephanie Rosenbloom, a travel columnist, reviewed flight apps that try to perfect “farecasting” — that is, she explained, the art of “predicting the best date to buy a ticket” to obtain the lowest fares.

That same day Jenna Wortham, a columnist for The Times Magazine, described a phenomenon she called “technomysticism,” in which Internet users embrace medieval beliefs, spells and charms.

These word coinages may be too fresh — and too little used for now — to be of immediate interest to major English dictionaries. But Erin McKean, a lexicographer with an egalitarian approach to language, thinks “madeupical” words such as these deserve to be documented.

Ms. McKean started a campaign last month on Kickstarter, the crowdfunding site, to unearth one million “missing” English words — words that are not currently found in traditional dictionaries. To locate the underdocumented expressions, she has engaged a pair of data scientists to scrape and analyze language used in online publications. Ms. McKean said she planned to incorporate the found words in Wordnik.com, an online dictionary of which she is a co-founder.

“We really believe that every word should be lookupable,” Ms. McKean told me recently. “That doesn’t mean that every word should be used in every situation. But we think that people by and large are entirely capable of making that decision for themselves.”

Before her analytics project gets underway next month, Ms. McKean is crowdsourcing a list of missing words for possible inclusion in Wordnik. Candidates so far include: procrastatweeting, dronevertising and roomnesia, a condition in which people forget why they walked into a room.

Ms. McKean, who is a former editor of the New Oxford American Dictionary, and two colleagues introduced the Wordnik site in 2009 with the aim of addressing some limitations they had encountered while working for dictionary publishers.

In a recent quarterly online update, the Oxford English Dictionary added the word “hoverboard” — 26 years after the floating skateboards were first mentioned in the movie “Back to the Future Part II.” An editor’s note explained that the O.E.D. had decided to add “hoverboard” now because the dictionary’s word-monitoring system had recently detected an increased use of the term, most likely, the note says, related to a 2015 date that is an important plot element in the film.

(It doesn’t always take decades to document a new word. The O.E.D. added “podcast” in 2008 just four years after it says the word emerged.)

With no space limitations or publication deadlines, Wordnik is able to incorporate a vast number of new words on a continuing basis. In addition to human contributors, the site uses automated online searches to locate sentences that contain certain words on blogs, social media, news and other sites.

When a person looks up a term on Wordnik, the site displays full-sentence examples of its usage, taken from sources like The Huffington Post and Boing Boing. If the word already has an entry in certain more traditional dictionaries, the site also provides that definition.

Ms. McKean said Wordnik had accumulated some information on eight million words, both old and new. Its inclusive approach makes the site more of a word welcomer than a winnower.

“The question is no longer, ‘Is this a good word?’ ” Ms. McKean said. “The question is: ‘What is this word good for? Is this word good for what I need?’ ”

She now plans to expand Wordnik’s word-acquisition system by turning to data analytics to pinpoint emerging terms, like farecasting, that writers explained in passing when they mentioned them. Ms. McKean refers to these readily available explanations as “free-range definitions.” They are easy to locate, she said, because writers often use stock phrases, like “also known as” or “scientists term this” to signal to their readers that they’re about to introduce a new or unfamiliar term.

To cast a wider net for her project, Ms. McKean has enlisted Summer.ai, a data analytics firm. The company plans to use computational techniques to analyze online publications for language structure and patterns — like quotation marks and dashes — that are likely to indicate new words accompanied by self-contained definitions.
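The cue-spotting idea described above can be illustrated with a rough sketch. This is not Summer.ai's or Wordnik's actual method, which the article does not detail; the two regular expressions, the function name `find_candidates` and the choice of cues are all assumptions made for illustration. One pattern looks for a quoted term followed by a dash and an explanation, and the other looks for a stock phrase followed by a term.

```python
import re

# Hypothetical pattern: a quoted term, then a dash, then an optional
# "that is," and a short explanation running to the next period or semicolon.
QUOTED_THEN_GLOSS = re.compile(
    r'[\u201c"](?P<term>[^\u201d"]+)[\u201d"]\s*(?:\u2014|--)\s*'
    r'(?:that is,?\s*)?(?P<gloss>[^.;]+)'
)

# Hypothetical pattern: a stock phrase that announces an unfamiliar term.
STOCK_PHRASE = re.compile(
    r'(?:also known as|scientists term this)\s+[\u201c"]?(?P<term>[\w-]+)',
    re.IGNORECASE,
)


def find_candidates(text):
    """Return (term, explanation) pairs; explanation is None when only
    a stock phrase, and no self-contained definition, was found."""
    results = []
    for match in QUOTED_THEN_GLOSS.finditer(text):
        term = match.group("term").strip(" ,")
        results.append((term, match.group("gloss").strip()))
    for match in STOCK_PHRASE.finditer(text):
        results.append((match.group("term"), None))
    return results
```

A real system would need many more cues and some filtering, since quotation marks and dashes also surround ordinary quotations that define nothing.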

Some lexicographers already track whether words are nearing the end of their useful life spans. But Manuel Ebert, a former neuroscientist who is the co-founder of Summer.ai, said the Wordnik research might help track the speed of new-word adoption.

“We can actually measure when words get adopted in mainstream lingo,” he said, by looking at when writers stop explaining neologisms like “infotainment” and start using them as if their meanings were commonly understood. “It will be interesting to see which words will very quickly get adopted and which words remain outsiders.”
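One way to picture the measurement Mr. Ebert describes is to track, year by year, how often a word still appears flagged as unfamiliar. The sketch below is an assumption-laden illustration rather than his method: it treats a mention inside quotation marks as "explained," and the function name `explained_share_by_year` and the input format of (year, sentence) pairs are hypothetical.

```python
import re
from collections import defaultdict


def explained_share_by_year(term, dated_sentences):
    """For each year, return the fraction of mentions of `term` that
    appear inside quotation marks (used here as a stand-in signal that
    the writer still treats the word as unfamiliar)."""
    quoted = re.compile(
        r'[\u201c"]' + re.escape(term) + r'[,.]?[\u201d"]', re.IGNORECASE
    )
    plain = re.compile(r'\b' + re.escape(term) + r'\b', re.IGNORECASE)

    counts = defaultdict(lambda: [0, 0])  # year -> [flagged, total]
    for year, sentence in dated_sentences:
        if not plain.search(sentence):
            continue  # the sentence does not mention the term at all
        counts[year][1] += 1
        if quoted.search(sentence):
            counts[year][0] += 1

    return {
        year: flagged / total
        for year, (flagged, total) in sorted(counts.items())
    }
```

A share that falls toward zero over the years would suggest that writers have stopped flagging the word and begun using it as if readers already knew it.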

Researchers like Paul Cook, an assistant professor of computer science at the University of New Brunswick in Canada, are using similar techniques to find other kinds of novel words.

Mr. Cook developed a program several years ago to analyze posts on Twitter that included new lexical blends — like “jeggings,” a combination of jeans and leggings — and their definitions. Among other portmanteau words, his Twitter research turned up “awksome” (awkward plus awesome) and “hilazing” (hilarious plus amazing). He hopes eventually to use his program to generate a blended-word lexicon.
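To make the notion of a lexical blend concrete, here is a toy sketch that guesses which two known words a blend was built from. It is not Mr. Cook's program, whose workings the article does not describe; the function name `find_blend_sources`, the `min_part` parameter and the tiny word list are assumptions for illustration. The approach simply tries every split point and looks for one word that starts with the front piece and another that ends with the back piece.

```python
def find_blend_sources(blend, vocabulary, min_part=3):
    """Return sorted (first_word, second_word) pairs such that `blend`
    is a prefix of first_word joined to a suffix of second_word.
    `min_part` is the shortest piece allowed on either side."""
    pairs = set()
    for i in range(min_part, len(blend) - min_part + 1):
        prefix, suffix = blend[:i], blend[i:]
        starts = [w for w in vocabulary if w != blend and w.startswith(prefix)]
        ends = [w for w in vocabulary if w != blend and w.endswith(suffix)]
        for first in starts:
            for second in ends:
                if first != second:
                    pairs.add((first, second))
    return sorted(pairs)


# Small hypothetical vocabulary drawn from the examples in the article.
words = ["awkward", "awesome", "hilarious", "amazing", "jeans", "leggings"]
print(find_blend_sources("awksome", words))
print(find_blend_sources("hilazing", words))
print(find_blend_sources("jeggings", words, min_part=1))
```

With a full dictionary as the vocabulary, short pieces would match far too many words, so a practical version would need to rank the candidate pairs, for example by using the definitions that accompany the blends in the source posts.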

“We could have some sort of automatically generated blend dictionary,” Mr. Cook said. “If you had information like this, some dictionaries might be interested in providing this kind of information, as opposed to none.”

This more-words-the-merrier approach is one that lexicographers like Ms. McKean favor.

“Every new word added to the expressiveness of English adds to the things that it’s possible to say,” she says. “English already has one of the world’s largest installed user bases. So why wouldn’t we want to add to it?”

http://mobile.nytimes.com/2015/10/04/technology/scouring-the-web-to-make-new-words-lookupable.html?smid=tw-share&referer=http://t.co/fuoIGH6aTg&_r=0
