Followup to "reclaiming the Oxford English Dictionary for the public"

Kragen Sitaker kragen at pobox.com
Thu Mar 16 03:37:01 EST 2006


In October, I wrote about how it would be nice for the first-edition
OED to be publicly available:

http://lists.canonical.org/pipermail/kragen-tol/2005-October/000794.html

At this point I have scanned volumes 1 (A-B), 2 (C), 3 (D-E), 4 (F-G),
5 (H-K) (Paul Nguyen did the work), and parts of volume 6 (L and M,
but not yet N).  I hope to finish the 10 volumes by the end of the
week.  The volumes I have scanned so far are available online in raw
form, which is unfortunately not very practical to download.  More
practical versions of these books should be available soon; their home
pages are

http://www.archive.org/details/oed01arch
http://www.archive.org/details/oed02arch
http://www.archive.org/details/oed03arch
http://www.archive.org/details/oed04arch
http://www.archive.org/details/newenglishdict05murrmiss
http://www.archive.org/details/oed6aarch
http://www.archive.org/details/oed6barch  (still incomplete)

In November, I wrote some software to make it possible to do word
lookups without relying on OCR.  I posted the software at

http://lists.canonical.org/pipermail/kragen-hacks/2005-November/000421.html

It's currently running at 

http://considerate.murch-sitaker.org:8000/page

It needs some user interface improvements.

Here are some thoughts on monetary estimates of the value of this
project.

Online access to the current edition costs US$295 per year and is
currently available to about 30 million people, for a nominal total
value of about $8.85 billion per year.  (Most of those people pay much
less because they're part of an institutional licensing scheme.)
(https://ams.oup.com/user/newacct.cgi?title=oed lists the pricing;
http://www.askoxford.com/asktheexperts/faq/aboutdictionaries/oedonline?view=uk
says, "Thousands of libraries throughout the English-speaking world
and beyond have access to the online edition - giving more than 30
million people around the world the chance to explore 'the world's
greatest dictionary'.")

The first edition that I am putting online is inferior to the current
edition in several ways: only half of it will be available worldwide
at first due to copyright law, which will create uncertainty about
whether you can find the definition of a particular word and will have
a disproportionate effect on its value; at first, only page images
rather than ASCII text will be available, as we haven't managed to OCR
it yet, and even when we do, a great deal of proofreading work will be
needed; and it's quite outdated.

If we discount the $295 per year by a yearly factor of 1.1, which is
extremely generous, and count the first year's payment at full value,
we get a total of $3059 for the next 30 years.  Adding it up to
infinity, we get $3245.  If we use a more reasonable (i.e. closer to
unity) discount factor, we get a larger value.
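
For anyone who wants to check the discounting arithmetic, here is a
minimal sketch in Python.  It assumes the first year's $295 is counted
undiscounted and each later year is divided by one more factor of 1.1;
that assumption is what reproduces the figures above.  The function
name present_value is just for illustration.

```python
def present_value(payment, factor, years=None):
    """Sum payment / factor**t for t = 0, 1, ..., years - 1.

    With years=None the series is summed to infinity.  The t = 0 term
    is the assumption that the first payment is not discounted.
    """
    ratio = 1.0 / factor
    if years is None:
        # Infinite geometric series: payment * (1 + r + r^2 + ...)
        return payment / (1.0 - ratio)
    # Finite geometric series with `years` terms
    return payment * (1.0 - ratio ** years) / (1.0 - ratio)


print(round(present_value(295, 1.1, 30)))  # 30-year total, about 3059
print(round(present_value(295, 1.1)))      # total to infinity, 3245
print(round(present_value(295, 1.05)))     # factor closer to unity: larger
```

With a factor of 1.05 instead of 1.1 the infinite total nearly doubles,
which is the sense in which a factor closer to unity gives a larger
value.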

Suppose we estimate the value of having access to the public-domain
part of the OED by reference to the version that Oxford has for sale,
discounted by:
- a factor of 6 to account for the fact that the people who have
  bothered to buy access at $295 per year are those who are unusually
  devoted to words;
- a factor of 3 to account for its incompleteness;
- a factor of 2 to account for its being out of date;
- a factor of 2 to account for getting page images instead of ASCII
  text.

This brings the total value of the public-domain portion down to $45
per person, or $4.10 per year per person.  Approximately 99.55% of the
world's population, or about 6.5 billion people, currently doesn't
have access to the OED.

This values the public-domain version at $26.6 billion per year, or
$293 billion overall.  (If you pick a lower discount rate, the $293
billion number becomes much larger.)  That means that every page I
scan, out of the fifteen thousand or so, produces about $19.5 million
of value for the world; that's about $9.8 billion an hour.  My hourly
wages have usually been less.
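
The whole chain of estimates can be rechecked with a few lines of
Python.  This is only a sketch of the arithmetic in this post; the
variable names are mine, and the scanning rate at the end is not a
number I stated anywhere, just what the per-hour figure implies.

```python
# Inputs taken from the estimates in this post
perpetuity = 3245         # dollars per subscriber, discounted to infinity
yearly_price = 295        # dollars per subscriber per year
discount = 6 * 3 * 2 * 2  # devotion, incompleteness, age, page images
people = 6.5e9            # people without access to the OED
pages = 15000             # approximate page count of the first edition

per_person = perpetuity / discount         # about 45 dollars
per_person_year = yearly_price / discount  # about 4.097 dollars
total = per_person * people                # about 293 billion dollars
total_year = per_person_year * people      # about 26.6 billion dollars
per_page = total / pages                   # about 19.5 million dollars

# Implied (not stated) scanning rate behind "$9.8 billion an hour"
implied_pages_per_hour = 9.8e9 / per_page  # roughly 500 pages an hour

print(f"per person:      ${per_person:.2f}")
print(f"per person/year: ${per_person_year:.2f}")
print(f"total:           ${total / 1e9:.0f} billion")
print(f"total per year:  ${total_year / 1e9:.1f} billion")
print(f"per page:        ${per_page / 1e6:.1f} million")
print(f"implied rate:    {implied_pages_per_hour:.0f} pages/hour")
```

Changing any one of the four discount factors, or the discount factor
used for the $3245 figure, rescales every number after it in
proportion.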

