Followup to "reclaiming the Oxford English Dictionary for the
public"
Kragen Sitaker
kragen at pobox.com
Thu Mar 16 03:37:01 EST 2006
In October, I wrote about how it would be nice for the first-edition
OED to be publicly available:
http://lists.canonical.org/pipermail/kragen-tol/2005-October/000794.html
At this point I have scanned volumes 1 (A-B), 2 (C), 3 (D-E), 4 (F-G),
5 (H-K) (Paul Nguyen did the work), and parts of volume 6 (L and M,
but not yet N). I hope to finish the 10 volumes by the end of the
week. The volumes I have scanned so far are available in raw form
online, which is unfortunately not very practical to download. Soon,
more practical versions of these books should be available; their home
pages are
http://www.archive.org/details/oed01arch
http://www.archive.org/details/oed02arch
http://www.archive.org/details/oed03arch
http://www.archive.org/details/oed04arch
http://www.archive.org/details/newenglishdict05murrmiss
http://www.archive.org/details/oed6aarch
http://www.archive.org/details/oed6barch (still incomplete)
In November, I wrote some software to make it possible to do word
lookups without relying on OCR. I posted the software at
http://lists.canonical.org/pipermail/kragen-hacks/2005-November/000421.html
It's currently running at
http://considerate.murch-sitaker.org:8000/page
It needs some user interface improvements.
Here are some thoughts on monetary estimates of the value of this
project.
Online access to the current edition costs US$295 per year and is
available to about 30 million people, for a total value of roughly
$8.85 billion per year. (Most of those people pay much less because
they're part of an institutional licensing scheme.)
(https://ams.oup.com/user/newacct.cgi?title=oed lists the pricing;
http://www.askoxford.com/asktheexperts/faq/aboutdictionaries/oedonline?view=uk
says, "Thousands of libraries throughout the English-speaking world
and beyond have access to the online edition - giving more than 30
million people around the world the chance to explore 'the world's
greatest dictionary'.")
The first edition that I am putting online is inferior to the current
edition in several ways: only half of it will be available worldwide
at first due to copyright law, which will create uncertainty about
whether you can find the definition of a particular word and will have
a disproportionate effect on its value; at first, only page images
rather than ASCII text will be available, as we haven't managed to OCR
it yet, and even when we do, a great deal of proofreading work will be
needed; and it's quite outdated.
If we discount the $295 per year by a yearly factor of 1.1, which is
extremely generous, we get a total of $3059 for the next 30 years.
Adding it up to infinity, we get $3245. If we use a more reasonable
(i.e. closer to unity) discount factor, we get a larger value.
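
As a rough check of that arithmetic, here is a minimal Python sketch.
It assumes the $295 is paid at the start of each year, so the first
payment is undiscounted; that assumption is not stated in the post,
but it reproduces the figures quoted. The function name and the 1.05
comparison factor are only illustrative.

```python
def discounted_total(annual_price, factor, years=None):
    """Sum annual_price over the years, dividing by `factor` once more
    for each year that passes (first payment undiscounted: assumption)."""
    if years is None:
        # Infinite geometric series: price / (1 - 1/factor)
        return annual_price / (1 - 1 / factor)
    return sum(annual_price / factor ** k for k in range(years))


thirty_years = discounted_total(295, 1.1, 30)
forever = discounted_total(295, 1.1)
# A hypothetical factor closer to unity, to show the total growing.
forever_gentler = discounted_total(295, 1.05)

print(f"30 years at 1.1:  ${thirty_years:.0f}")
print(f"forever at 1.1:   ${forever:.0f}")
print(f"forever at 1.05:  ${forever_gentler:.0f}")
```

With a factor of 1.1 the infinite sum is just 11 times the annual
price, which is where $3245 comes from.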
Suppose we estimate the value of having access to the public-domain
part of the OED by reference to the version that Oxford has for sale,
discounted by:
- a factor of 6 to account for the fact that the people who have
bothered to buy access at $295 per year are those who are unusually
devoted to words;
- a factor of 3 to account for its incompleteness;
- a factor of 2 to account for its being out of date;
- a factor of 2 to account for getting page images instead of ASCII
text.
Together these factors come to 72, which brings the total value of the
public-domain portion down from $3245 to $45 per person, or $4.09 per
year per person. Approximately 99.55% of the world's population, or
about 6.5 billion people, currently doesn't have access to the OED.
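
The per-person figures can be reproduced with a short Python sketch.
The factor values and the $295 and $3245 inputs come from the post;
the dictionary labels are just shorthand chosen for this example.

```python
import math

# Annual price and infinite-horizon total quoted in the post.
ANNUAL_PRICE = 295
LIFETIME_TOTAL = 3245

# The four discount factors from the list (labels are shorthand).
factors = {
    "buyers are unusually devoted to words": 6,
    "incompleteness": 3,
    "out of date": 2,
    "page images instead of text": 2,
}

combined = math.prod(factors.values())
per_year = ANNUAL_PRICE / combined
per_person = LIFETIME_TOTAL / combined

print(f"combined factor:      {combined}")
print(f"per person per year:  ${per_year:.2f}")
print(f"per person overall:   ${per_person:.2f}")
```

The per-year result comes out a fraction of a cent above the $4.09
quoted, which looks like truncation rather than rounding.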
This values the public-domain version at $26.6 billion per year, or
$293 billion overall. (If you pick a lower discount rate, the $293
billion number becomes much larger.) That means that every page I
scan, out of the fifteen thousand or so, produces about $19.5 million
of value for the world; that's about $9.8 billion an hour. My hourly
wages have usually been less.
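
For anyone who wants to check the totals, here is a small Python
sketch using the rounded figures from the post. The scanning rate is
not stated anywhere in the post; the last line only derives the rate
that the per-page and per-hour figures imply, and the variable names
are made up for this example.

```python
# Figures taken from the post; all are rounded estimates.
PEOPLE_WITHOUT_ACCESS = 6.5e9
VALUE_PER_PERSON_PER_YEAR = 4.09
VALUE_PER_PERSON_OVERALL = 45
PAGES = 15_000
VALUE_PER_HOUR = 9.8e9

yearly_total = PEOPLE_WITHOUT_ACCESS * VALUE_PER_PERSON_PER_YEAR
overall_total = PEOPLE_WITHOUT_ACCESS * VALUE_PER_PERSON_OVERALL
value_per_page = overall_total / PAGES

# Not stated in the post: the scanning rate implied by the hourly figure.
implied_pages_per_hour = VALUE_PER_HOUR / value_per_page

print(f"per year:        ${yearly_total / 1e9:.1f} billion")
print(f"overall:         ${overall_total / 1e9:.1f} billion")
print(f"per page:        ${value_per_page / 1e6:.1f} million")
print(f"implied pages/h: {implied_pages_per_hour:.0f}")
```

The hourly figure works out to a scanning rate of roughly 500 pages
an hour.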