XML editors
Kragen Sitaker
kragen@pobox.com
Fri, 20 Aug 1999 21:37:56 -0400 (EDT)
XML offers some useful things that are lacking in simple text -- ways
of communicating the structure of your document to the computer -- that
could make editing documents considerably easier.
Mathcad interprets ( as 'insert a matched set of parentheses and
position the cursor inside of it'. Similarly, an XML editor can
interpret insertion of tags as insertion of tag trees. If this is done
right, it should be possible to prevent someone from creating
ill-formed XML, without getting in their way.
Example session.
I type:      I see on the screen:   (| is cursor; clipped at 72 cols)
<            <|></>
?            <?xml version="1.0"|?>
>            <?xml version="1.0"?>|
<Enter>      <?xml version="1.0"?>
             |
<            <|></>
ht           <ht|></ht>
ml           <html|></html>
>            <html>|</html>
<head>       <html><head>|</head></html>
<title>      <html><head><title>|</title></head></html>
Bob's bog    <html><head><title>Bob's bog|</title></head></html>
>            <html><head><title>Bob's bog</title>|</head></html>
<link        <html><head><title>Bob's bog</title><link|></link></head></h
rev="        <html><head><title>Bob's bog</title><link rev="|"></link></h
made"        <html><head><title>Bob's bog</title><link rev="made"|></link
(deleting up to </title> for visibility)
<space>      </title><link rev="made" |></link></head></html>
/            </title><link rev="made" /|></head></html>
>>           </title><link rev="made" /></head>|</html>
<body>       </title><link rev="made" /></head><body>|</body></html>
etc.
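
The element part of the session above can be sketched in a few lines
of Python. This is only a rough model under my own assumptions: the
names Node and Editor are made up for illustration, the cursor only
ever sits at the end of what has been typed, and attributes, the
<?xml ...?> declaration, and checking of tag names are all left out.
The point it shows is that if the closing tag is never stored, only
rendered from the same name as the opening tag, the two cannot
disagree.

```python
class Node:
    """One element; the closing tag is never stored, only rendered."""
    def __init__(self, name="", parent=None):
        self.name = name
        self.parent = parent
        self.children = []   # text characters and child Nodes, in order


class Editor:
    def __init__(self):
        self.root = Node()     # invisible holder for the whole document
        self.cur = self.root   # element the cursor is currently in
        self.in_tag = False    # True while the cursor is in an opening tag

    def type(self, keys):
        for ch in keys:
            if self.in_tag:
                if ch == ">":
                    self.in_tag = False          # leave the opening tag
                else:
                    self.cur.name += ch          # closing tag follows for free
            elif ch == "<":
                node = Node(parent=self.cur)     # insert <></> as one unit
                self.cur.children.append(node)
                self.cur, self.in_tag = node, True
            elif ch == ">":
                if self.cur.parent is not None:  # step past the closing tag
                    self.cur = self.cur.parent
            else:
                self.cur.children.append(ch)     # ordinary text

    def render(self, node=None):
        """Serialize the tree, marking the cursor position with |."""
        node = self.root if node is None else node
        here = node is self.cur
        inner = "".join(c if isinstance(c, str) else self.render(c)
                        for c in node.children)
        if here and not self.in_tag:
            inner += "|"
        if node is self.root:
            return inner
        mark = "|" if here and self.in_tag else ""
        return f"<{node.name}{mark}>{inner}</{node.name}>"


ed = Editor()
ed.type("<ht")
print(ed.render())   # <ht|></ht>
```

Keeping a tree and rendering it is one design choice; an Emacs version
would more likely patch the buffer text in place.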
This much could easily be done in Emacs; it would save a bit of time,
but it wouldn't be exactly a revolution. You would, of course, need to
- modify insertion commands to maintain well-formedness:
    - modifying opening tag names modifies closing-tag names, as shown
      above
    - inserting empty-tag markers (/>) makes closing tags disappear
    - typing = after an attribute name inserts ="", positioning the
      cursor before the first ";
    - inserting a ' or " after the = after an attribute name changes
      the quotes around the attribute value to the type of quote typed,
      and moves the cursor inside the quotes; typing anything else moves
      the cursor inside the quotes and inserts it.
    - trying to insert a > moves you past the end of the current or
      next tag; i.e. if you're inside a tag, it moves you outside the
      tag, and otherwise it moves you right past a tag. (Can you put
      tags inside attribute values?)
    - I'm not sure what to do if somebody types < followed by something
      other than a valid tag name (e.g. @, or /, or another <). I'm sure
      this is soluble.
- modify movement commands to keep you from moving into places where you
  couldn't type anything sensible (notably inside closing </tag>s),
- modify deleting commands
    - modifying opening tag names modifies closing tag names too
    - if I delete the terminal / inside the tag for an
      <emptyelement />, then I should get a closing tag
    - if I type a terminal / inside an <openingtag>, making it an
      empty-element tag, the closing tag should disappear. For
      forgiveness, better make it possible to backspace over it and
      make the closing tag come back where it was a moment ago.
    - what happens if someone deletes a quote quoting an attribute
      value? Only solution I can see is to replace the quote with
      the other kind of quote (' becomes ", " becomes '). (Maybe deleting
      a character of the attribute value would be better.)
    - What happens if somebody tries to delete the = between an
      attribute name and value? The simple answer, I think, would be
      to delete characters of the attribute value, unless it's empty,
      in which case both the = and the quotes could be deleted at once.
    - trying to delete the > of a tag results in deleting the last thing
      inside, or the whole tag and its corresponding closing tag if nothing
      is inside.
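
Two of the rules above, the / toggle and quote deletion, might look
something like this. It is a sketch, not a design: Attr, Tag,
type_slash, delete_slash and delete_quote are names I invented for the
example. One thing that falls out of writing it down is that swapping
quotes is only safe when the value doesn't contain the other kind of
quote, so the sketch falls back to the alternative mentioned above,
deleting a character of the value.

```python
from dataclasses import dataclass, field


@dataclass
class Attr:
    name: str
    value: str = ""
    quote: str = '"'

    def render(self):
        return f"{self.name}={self.quote}{self.value}{self.quote}"


@dataclass
class Tag:
    name: str
    attrs: list = field(default_factory=list)
    empty: bool = False   # True renders <name />, False renders <name></name>

    def render(self):
        head = " ".join([self.name] + [a.render() for a in self.attrs])
        return f"<{head} />" if self.empty else f"<{head}></{self.name}>"


def type_slash(tag):
    """Typing a terminal / makes the closing tag disappear."""
    tag.empty = True


def delete_slash(tag):
    """Backspacing over the / brings the closing tag back."""
    tag.empty = False


def delete_quote(attr):
    """Deleting a quote swaps quote style, if that keeps the value legal."""
    other = "'" if attr.quote == '"' else '"'
    if other in attr.value:
        attr.value = attr.value[:-1]   # fallback: eat a character instead
    else:
        attr.quote = other


link = Tag("link", [Attr("rev", "made")])
type_slash(link)
print(link.render())   # <link rev="made" />
```

Because the empty flag only changes how the tag is drawn, nothing is
thrown away when the / is typed, which is what makes the backspace
case cheap.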
Now, with these rules, I think every document we can type will be
well-formed XML. (I don't think we can type every well-formed XML
document yet; in particular, we need to allow for comments, <![CDATA[
sections, DTD stuff, PIs, and character entities, and maybe other stuff
I've forgotten.) And if we start with a well-formed XML document
(except for <![CDATA[ and comments -- gotta take them into account) we
can't turn it into an ill-formed XML document.
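
A claim like this is easy to test mechanically: hand the buffer to a
real XML parser after every editing command and see whether it
complains. Here is a minimal sketch using the expat binding in
Python's standard library; the function name is_well_formed is my own,
and the check covers well-formedness only, not validity against a DTD.

```python
import xml.parsers.expat


def is_well_formed(text):
    """Return True if text parses as one complete XML document."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)   # True: this is the final chunk
    except xml.parsers.expat.ExpatError:
        return False
    return True


page = ('<html><head><title>Bob\'s bog</title><link rev="made" />'
        '</head><body></body></html>')
print(is_well_formed(page))                   # True
print(is_well_formed("<html><head></html>"))  # False
```

An editor test suite could replay random keystrokes and assert this
after each one, at least for states where a document element exists.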
I also think that these rules will make people feel good. They'll be
able to figure out how to do whatever they want to do, and whatever
they do will have a visible effect. And they'll be able to type XML,
fast.
So we can do syntax highlighting -- highlighting a whole element when
we're over a tag at the end of it, and drawing tags in different
colors. We can fold -- double-click on a tag, and the element inside
it shrinks visually to a point. (This is useful for outlining.) We
can enable cut-and-paste of entire chunks of well-formed XML; ideally
we wouldn't even have to select the chunk we want if it's contained in
a single element, because we can't reasonably cut and paste parts of an
element.
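
Cutting "the element I'm in" without selecting it comes down to
finding the innermost element around the cursor. A rough sketch,
assuming the buffer is already well-formed and contains no comments,
CDATA sections, or > characters inside attribute values; the names
TAG, enclosing_element and cut_element are hypothetical. The same span
would serve for highlighting or folding.

```python
import re

# Matches an opening, closing, or empty-element tag; groups are
# (leading slash, tag name, trailing slash).
TAG = re.compile(r"<(/?)([A-Za-z_:][\w:.-]*)[^<>]*?(/?)>")


def enclosing_element(text, cursor):
    """Return (start, end) of the innermost element around cursor, or None."""
    stack = []   # start offsets of elements that are still open
    for m in TAG.finditer(text):
        closing, _name, empty = m.groups()
        if empty:
            start = m.start()
        elif closing:
            start = stack.pop()
        else:
            stack.append(m.start())
            continue
        # Elements close innermost-first, so the first hit is the answer.
        if start < cursor < m.end():
            return start, m.end()
    return None


def cut_element(text, cursor):
    """Return (remaining text, clipboard) after cutting that element."""
    span = enclosing_element(text, cursor)
    if span is None:
        return text, ""
    start, end = span
    return text[:start] + text[end:], text[start:end]


doc = "<html><head><title>Bob's bog</title></head><body></body></html>"
rest, clip = cut_element(doc, doc.index("bog"))
print(clip)   # <title>Bob's bog</title>
print(rest)   # <html><head></head><body></body></html>
```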
Still unresolved issues:
- How to cut and paste a sequence of elements, or of text?
- How to move text in and out of CDATA sections? (Presumably
converting & and < to &amp; and &lt; and back. But what do we do with
&somethingelse;?)
- Moving text into a comment is easy. But how do we move text out of a
comment? If we treat it like CDATA, people will get pissed off when
they have to go turn all their &lt;'s back into <'s. Perhaps we can paste
unchanged whatever is well-formed, and textify the rest?
- pasting from other places. Similar to copying text out of a comment.
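
For what it's worth, the CDATA and pasting questions above can at
least be prototyped with Python's standard library. This is a sketch
under assumptions: into_cdata, out_of_cdata and paste are hypothetical
names, "wrapper" is just a throwaway element name, and paste is
all-or-nothing, where the idea above may have meant something finer
grained. It leaves &somethingelse; untouched, which dodges that
question rather than answering it.

```python
import xml.parsers.expat
from xml.sax.saxutils import escape, unescape


def out_of_cdata(raw):
    """CDATA content -> ordinary text: & < > become &amp; &lt; &gt;."""
    return escape(raw)


def into_cdata(text):
    """Ordinary text -> CDATA content; other entities are left alone."""
    raw = unescape(text)          # handles only &amp; &lt; &gt;
    if "]]>" in raw:
        raise ValueError("a CDATA section cannot contain ]]>")
    return raw


def paste(fragment):
    """Paste well-formed markup unchanged; textify anything else."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # Wrap in a dummy element so that bare text and several
        # sibling elements both count as well-formed.
        parser.Parse(f"<wrapper>{fragment}</wrapper>", True)
    except xml.parsers.expat.ExpatError:
        return escape(fragment)
    return fragment


print(paste("<b>bold</b> text"))   # <b>bold</b> text
print(paste("x < y"))              # x &lt; y
```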
This guy, plus XQL (http://metalab.unc.edu/xql/ -- see the white paper
on the design of XQL, with the ?? and stuff, not the stupid Microsoftie
proposal with "methods" all over the place) augmented by regexes, could
combine the power of outline processors with the power of AskSam, and
the power and portability of XML to boot. *Somebody* would like it.
:)
--
<kragen@pobox.com> Kragen Sitaker <http://www.pobox.com/~kragen/>
Thu Aug 19 1999
81 days until the Internet stock bubble bursts on Monday, 1999-11-08.
<URL:http://www.pobox.com/~kragen/bubble.html>