Friday, May 06, 2005

Amazon's "Statistically Improbable Phrases"

Wired explores Amazon's new "Statistically Improbable Phrases" -- unusual word strings identified through statistical analysis of word frequency and data mining. The technique is also producing concordances of Amazon's offerings.
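For the curious, here is one way such a list could be computed. This is only a rough sketch and not Amazon's actual method, which I haven't seen described in detail. It assumes a phrase is a two-word string, and it ranks each phrase by how much more often it turns up in one book than in a background collection of text. The function names (`bigrams`, `improbable_phrases`), the add-one smoothing, and the scoring formula are all my own choices for illustration.

```python
from collections import Counter
import math
import re


def bigrams(text):
    """Split text into lowercase words and return adjacent word pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return list(zip(words, words[1:]))


def improbable_phrases(book, background, top_n=3):
    """Rank the book's two-word phrases by how unexpected they are
    relative to the background text (hypothetical scoring, see above)."""
    book_counts = Counter(bigrams(book))
    bg_counts = Counter(bigrams(background))
    book_total = sum(book_counts.values())
    bg_total = sum(bg_counts.values())
    # Number of distinct phrases seen anywhere, used for add-one smoothing.
    vocab = len(set(book_counts) | set(bg_counts))

    scores = {}
    for phrase, count in book_counts.items():
        p_book = count / book_total
        # Add-one smoothing so phrases absent from the background
        # still get a small nonzero probability.
        p_bg = (bg_counts[phrase] + 1) / (bg_total + vocab)
        # Weight the log-ratio by the count so repeated phrases rank higher.
        scores[phrase] = count * math.log(p_book / p_bg)

    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return [" ".join(p) for p in ranked[:top_n]]
```

Fed a whole novel and a large enough background collection, a scorer along these lines should push the everyday pairs down the list and leave the oddities at the top. A real system would presumably also handle longer phrases and filter out common function words.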

I tried Umberto Eco's The Name of the Rose, and the list included "dead monks" and "treasure crypt" (and, well, "heptagonal room" -- probably one of the more statistically improbable word strings one might hope to unearth).

I'm intrigued by the idea of SIPs as authorial fingerprints. Hunter S. Thompson is one of the few authors I know of who regularly uses "atavistic," but curiously, "invective screed" was nowhere to be found.
