thoughts on capability security on the Web

Kragen Sitaker kragen@pobox.com
Thu, 3 Aug 2000 22:25:45 -0400 (EDT)


There have been three major confused-deputy security incidents on the
Web that I know of.


The first is the Scooter problem: w3c.org granted access to its
internal documents to *.dec.com, since DEC was a member of the W3C.
scooter.pa.av.dec.com, AltaVista's Web spider, regularly crawled the
Web, fetching documents and indexing them in AltaVista.  Scooter
unknowingly acted with the authority of DEC, fetched the W3C's
documents, and AltaVista returned scraps of them in query results.

If Scooter had been able to express that it was just a nobody on the
Web, not a DEC employee, it would not have accidentally downloaded
these documents.


The second one is the ClientSideTrojan issue, aka the
delete-your-web-site problem.

Suppose Jack manages a Web site; being a pathetic Microsoft victim,
Jack doesn't know how to log into the Unix Web server and use vi and
Emacs to update his HTML pages, so he uses the easy-to-use HTML
interface instead.  It has forms to do things like update files, delete
files, even delete whole directory trees.

Jack has an enemy, Mary.  Mary is upset with Jack.  She knows Jack
loves his Web site, and would like to hurt him by deleting it.

But if Mary tries to delete Jack's Web site herself, the Web server
will ask her for Jack's name and password.  She doesn't know them, so
it will refuse.

When Jack administers his Web site, the Web server asks his Web browser
for his name and password.  He types them in; the browser remembers
them and sends them with every request to that Web server, so that it
doesn't have to ask him over and over again.

Mary's own Web site is hosted on the same software as Jack's, so she
knows the URLs Jack uses to update his Web site.

Mary uses Netscape to look at the delete-directory-tree form on her
server.  She saves it to disk on her machine.  Then she replaces the
name of her Web site, "mary.example.com", with the name of Jack's Web
site, "jack.example.edu".  She fills in the field for which directory
to delete with "/", meaning the whole server, and she makes all the
form fields hidden.
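
Concretely, the doctored file might boil down to something like the
following sketch; the admin URL and field names are hypothetical,
invented for illustration, and a little Python stands in for Mary's
hand-edits:

    # A sketch of Mary's doctored page.  The form's action now points at
    # Jack's server instead of hers, the directory field is pre-filled
    # with "/", and the only thing left visible is the button.
    trojan = """
    <form method="POST" action="http://jack.example.edu/admin/rmtree">
      <input type="hidden" name="dir" value="/">
      <input type="submit" value="You have to see this!">
    </form>
    """

    with open("enticing-post.html", "w") as f:
        f.write(trojan)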

Now, Mary looks at the form in her Web browser.  It just looks like a
button.  If she clicks on the button, it will try to delete Jack's Web
site, but the Web server will still ask her for Jack's name and
password.

So Mary anonymously posts the modified form on a discussion board she
knows Jack reads, with an enticing caption.

The next morning, Jack adds a couple of pages to his Web site; this
makes his browser memorize his username and password.  Then he goes to
read the discussion board.  He sees Mary's post and clicks on the button.

Now Jack's browser sends a request to delete Jack's Web site to Jack's
Web hosting service.  Since it's sending the request to Jack's Web
hosting service, it adds Jack's memorized username and password.
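
Assuming the server uses HTTP Basic authentication (and assuming the
hypothetical admin URL and field names from the sketch above), the
request Jack's browser ends up sending looks roughly like this; the
password is, of course, invented:

    # A sketch of the request the browser builds.  Basic auth simply
    # base64-encodes "name:password" into an Authorization header, and
    # the browser attaches it to every request for that server,
    # including this one, which Jack never meant to make.
    import base64

    creds = base64.b64encode(b"jack:opensesame").decode("ascii")
    request = (
        "POST /admin/rmtree HTTP/1.0\r\n"
        "Host: jack.example.edu\r\n"
        "Authorization: Basic " + creds + "\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        "Content-Length: 7\r\n"
        "\r\n"
        "dir=%2F"  # seven bytes: delete "/", the whole site
    )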

Jack's Web host deletes Jack's Web site.  Jack is confused.  He doesn't
understand how this happened.

The problem is that Mary was able to fool Jack's Web browser into
making an HTTP request on her behalf, using authority she didn't have.
Jack does not have a way to tell his browser to follow a link using
some other authority; it uses the authority inherent in Jack's
identity.

In an ideal world, Jack's browser would send the canned request with
Mary's authority, not Jack's.

Workaround?  "Don't browse untrusted sites."  Time to discover this
vulnerability?  Roughly seven years.

More details at
http://www.zope.org/Members/jim/ZopeSecurity/ClientSideTrojan


The third one is the 'cross-site scripting' problem, better known as
the 'malicious HTML tags embedded in client Web requests' problem, or
CERT Advisory CA-2000-02.

On many dynamically-generated Web pages, bits of text from the URL end
up in the resulting Web pages --- for example, in error messages, when
you make an invalid request.

If an evil person supplies a link to a script that will do this, they
can include arbitrary HTML text in the link.  This HTML text will be
sent to the server as part of the URL, and then sent back inside the
document.

This is annoying, though probably not a big deal, for ordinary text.
But when JavaScript comes into play, we have real problems; the
JavaScript executes in the browser with the authority of the server
that bounced it back to the browser, not the authority of the server
that provided the link.  Furthermore, actions taken by the browser as a
result of that JavaScript --- such as posting forms --- execute on the
server with the authority of the browser's user, not the server that
provided the link.
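
The vulnerable pattern is easy to write by accident.  Here is a
minimal sketch as a Python WSGI application; the app and its parameter
names are hypothetical:

    # A sketch of the vulnerable pattern: the "item" parameter is copied
    # into the error page with no HTML escaping, so markup in the URL
    # comes back from the server as live markup.
    from urllib.parse import parse_qs
    from wsgiref.simple_server import make_server

    def app(environ, start_response):
        query = parse_qs(environ.get("QUERY_STRING", ""))
        item = query.get("item", ["(none)"])[0]
        body = "<html><body>No such item: %s</body></html>" % item
        start_response("404 Not Found", [("Content-Type", "text/html")])
        return [body.encode("utf-8")]

    # An attacker's link might then read, with any payload at all:
    #   http://victim.example.com/?item=<script>...</script>
    # The fix is to escape on output, e.g. html.escape(item).
    make_server("localhost", 8000, app).serve_forever()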

Workaround?  "Don't browse untrusted web sites."  Time to find this
problem?  Roughly five years.  Scope of the problem?  Nearly every Web
server in history.

For more info, see http://www.cert.org/advisories/CA-2000-02.html.

[Ben Sittler tells me that Firefly actually found and fixed this
 problem in their HTML chat system three years ago or so.]


All three of these problems are *confused deputy problems*.  As is
typical of such problems, they are extremely subtle security holes;
they often hide for years in plain sight.  They naturally fall out of
principal-based access control models.

[I unfortunately have a fuzzy concept of the real distinction between
 principal-based access control and capability-based access control.
 Part of it is that principal-based requests derive their authority
 from the identity of their sender, who may be forwarding a request
 from someone else, while capability-based requests derive their
 authority from the contents of the request; another part is that
 capabilities are very fine-grained --- a Web tree with 1000 files
 might have 20000 or 50000 capabilities, while only being used by two
 or three people --- but I am not sure that is relevant here; another
 part is that capability systems unify naming and permission, while
 principal-based systems separate them.]


Yet another problem: no way to delegate authority in a fine-grained
way.  This means that people share usernames and passwords in order to
collaborate, further reducing security.

Here I propose a set of modifications to the Web infrastructure that
would solve these problems transparently, while avoiding similar
problems in the future.

Access to protected Web pages is authorized, as now, by secrets shared
by the Web server and Web client.  HTTP need not change at all; the
only difference is that there are more distinct secrets than before.
Every Web page has at least one secret that enables access to it and
to no other page; most pages will have dozens or hundreds.

Each secret is a short random string; 25 case-sensitive alphanumeric
characters should be plenty.  It gets included in links.
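
Minting such a secret is cheap; here is a sketch in Python (the use of
the secrets module is my implementation choice, not part of the
proposal):

    # 25 characters from a 62-symbol alphabet give about 149 bits of
    # entropy, far too many for an attacker to guess or enumerate.
    import secrets
    import string

    ALPHABET = string.ascii_letters + string.digits

    def new_secret(length=25):
        return "".join(secrets.choice(ALPHABET) for _ in range(length))

    # The secret then travels in links, e.g. (URL shape hypothetical):
    #   http://example.com/plans.html?auth=WqT4...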

Each user you care to distinguish from others gets allocated a separate
secret.  You can revoke page access on a page-by-page and user-by-user
basis by severing the secrets from the page.
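
Server-side, this amounts to a table mapping each secret to the one
page it unlocks, so revocation is just deleting a row.  A dict-based
sketch, with hypothetical names and paths:

    import secrets
    import string

    def new_secret(length=25):  # as in the previous sketch
        alphabet = string.ascii_letters + string.digits
        return "".join(secrets.choice(alphabet) for _ in range(length))

    caps = {}  # secret -> the single page that secret unlocks

    def grant(page):
        s = new_secret()
        caps[s] = page
        return s          # hand this secret to exactly one user

    def check(secret, page):
        return caps.get(secret) == page

    def revoke(secret):
        caps.pop(secret, None)  # severs one user's access to one page

    jack = grant("/site/index.html")
    mary = grant("/site/index.html")  # same page, distinct secret
    revoke(mary)                      # Mary alone loses access

This is also what fine-grained delegation looks like: Jack can hand
Mary a capability to one page without sharing his password.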

(This won't work; if page A links to page B, its link will include an
auth string 'foo'.  If user X and user Y have capabilities to read page
A, they will both use 'foo' to get to page B.  If user X then has their
access to both pages revoked, they can still read page B, unless 'foo'
is severed --- which breaks the link between page A and page B.  Fixes
are welcomed.)

-- 
<kragen@pobox.com>       Kragen Sitaker     <http://www.pobox.com/~kragen/>
Perilous to all of us are the devices of an art deeper than we ourselves
possess.
                -- Gandalf the Grey [J.R.R. Tolkien, "Lord of the Rings"]