In 500 Billion Words, New Window on Culture
By Patricia Cohen
From The New York Times
With little fanfare, Google has made a mammoth database culled from nearly 5.2 million digitized books available to the public for free downloads and online searches, opening a new landscape of possibilities for research and education in the humanities.
The digital storehouse, which comprises words and short phrases along with a year-by-year count of how often each appears, marks the first time that a data set of this magnitude, together with tools to search it, has been at the disposal of Ph.D.'s, middle school students and anyone else who likes to spend time in front of a small screen. It consists of the 500 billion words contained in books published between 1500 and 2008 in English, French, Spanish, German, Chinese and Russian.
The intended audience is scholarly, but a simple online tool allows anyone with a computer to plug in a string of up to five words and see a graph that charts the phrase’s use over time — a diversion that can quickly become as addictive as the habit-forming game Angry Birds.
With a click you can see that "women," in comparison with "men," is rarely mentioned until the early 1970s, when feminism gained a foothold. The two lines eventually cross around 1986.
You can also learn that Mickey Mouse and Marilyn Monroe don't get nearly as much attention in print as Jimmy Carter; compare the far greater number of references to "Tiananmen Square" in English than in Chinese after 1989; or follow the ascent of "grilling" from the late 1990s until it outpaced "roasting" and "frying" in 2004.