XML editors

Kragen Sitaker kragen@pobox.com
Fri, 20 Aug 1999 21:37:56 -0400 (EDT)


XML offers some useful things that are lacking in simple text -- ways
of communicating the structure of your document to the computer -- that
could make editing documents considerably easier.

Mathcad interprets ( as 'insert a matched set of parentheses and
position the cursor inside of it'.  Similarly, an XML editor can
interpret insertion of tags as insertion of tag trees.  If this is done
right, it should be possible to prevent someone from creating
ill-formed XML, without getting in their way.

Example session.

I type:     I see on the screen: (| is cursor; clipped at 72 cols)
<           <|></>
?           <?xml version="1.0"|?>
>           <?xml version="1.0"?>|
<Enter>     <?xml version="1.0"?>
            |

<           <|></>
ht          <ht|></ht>
ml          <html|></html>
>           <html>|</html>
<head>      <html><head>|</head></html>
<title>     <html><head><title>|</title></head></html>
Bob's bog   <html><head><title>Bob's bog|</title></head></html>
>           <html><head><title>Bob's bog</title>|</head></html>
<link       <html><head><title>Bob's bog</title><link|></link></head></h
 rev="      <html><head><title>Bob's bog</title><link rev="|"></link></h
made"       <html><head><title>Bob's bog</title><link rev="made"|></link
(deleting up to </title> for visibility)
<space>     </title><link rev="made" |></link></head></html>
/           </title><link rev="made" /|></head></html>
>>          </title><link rev="made" /></head>|</html>
<body>      </title><link rev="made" /></head><body>|</body></html>

etc.
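
The first few lines of that session can be sketched in Python.  This is
only a toy under loud assumptions: TagBuffer and its methods are names
I made up for illustration, it handles just <, tag-name characters,
plain text and >, and it only mirrors names into the closing tag while
the element is still freshly created and empty.

```python
class TagBuffer:
    """Toy one-line buffer: handles '<', tag-name characters, text, '>'."""

    NAME_CHARS = set("abcdefghijklmnopqrstuvwxyz"
                     "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-.:")

    def __init__(self):
        self.text = ""
        self.cursor = 0
        self.naming = None  # (start, end) of the tag name being typed

    def _insert(self, pos, s):
        self.text = self.text[:pos] + s + self.text[pos:]

    def key(self, ch):
        if ch == ">":
            # Jump past the end of the current or next tag.
            end = self.text.find(">", self.cursor)
            if end != -1:
                self.cursor = end + 1
            self.naming = None
        elif self.naming is not None:
            if ch not in self.NAME_CHARS:
                raise ValueError("sketch only accepts name characters in a tag")
            start, end = self.naming
            # The element is still empty, so the closing name ends
            # len("></") + len(name) characters after the opening name.
            self._insert(end + 3 + (end - start), ch)
            self._insert(end, ch)
            self.naming = (start, end + 1)
            self.cursor = end + 1
        elif ch == "<":
            # Insert a matched pair and leave the cursor after '<'.
            self._insert(self.cursor, "<></>")
            self.cursor += 1
            self.naming = (self.cursor, self.cursor)
        else:
            # Ordinary text goes in at the cursor.
            self._insert(self.cursor, ch)
            self.cursor += 1

    def type(self, keys):
        """Feed keystrokes; return the buffer with | marking the cursor."""
        for ch in keys:
            self.key(ch)
        return self.text[:self.cursor] + "|" + self.text[self.cursor:]
```

Calling TagBuffer().type("<ht") returns <ht|></ht>, as in the session.
Attributes, empty-element tags and editing an existing tag name are
left out; a real editor would want a tree behind the text rather than
offset arithmetic like this.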

This much could easily be done in Emacs; it would save a bit of time,
but it wouldn't be exactly a revolution.  You would, of course, need to
- modify insertion commands to maintain well-formedness: 
	- modifying opening tag names modifies closing-tag names, as shown 
	  above
	- inserting empty-tag markers (/>) makes closing tags disappear
	- typing = after an attribute name inserts ="", positioning the
	  cursor before the first ";
	- inserting a ' or " after the = after an attribute name changes 
	  the quotes around the attribute value to the type of quote typed,
	  and moves the cursor inside the quotes; typing anything else moves
	  the cursor inside the quotes and inserts it.
	- trying to insert a > moves you past the end of the current or
	  next tag; i.e. if you're inside a tag, it moves you outside the
	  tag, and otherwise it moves you right past a tag.  (Can you put
	  tags inside attribute values?  No: XML forbids a literal < there,
	  though a literal > is legal.)
	- I'm not sure what to do if somebody types < followed by something
	  other than a valid tag name (e.g. @, or /, or another <).  I'm sure
	  this is soluble.
- modify movement commands to keep you from moving into places where you
  couldn't type anything sensible (notably inside closing </tag>s),
- modify deleting commands:
	- deleting characters from opening tag names deletes them from the
	  closing tag names too
	- if I delete the terminal / inside the tag for an
	  <emptyelement />, then I should get a closing tag
	- if I type a terminal / inside an <openingtag>, making it an
	  empty-element tag, the closing tag should disappear.  To be
	  forgiving, it should be possible to backspace over the / and
	  make the closing tag come back where it was a moment ago.
	- what happens if someone deletes a quote quoting an attribute
	  value?  Only solution I can see is to replace the quote with
	  the other kind of quote (' becomes ", " becomes ').  (Maybe deleting
	  a character of the attribute value would be better.)
	- What happens if somebody tries to delete the = between an
	  attribute name and value?  The simple answer, I think, would be
	  to delete characters of the attribute value, unless it's empty,
	  in which case both the = and the quotes could be deleted at once.
	- trying to delete the > of a tag results in deleting the last thing
	  inside, or the whole tag and its corresponding closing tag if nothing
	  is inside.
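
Two of the rules above (the empty-element toggle and the quote swap)
are easier to state over a small model than over raw text.  A minimal
sketch, with assumptions labeled: Attr and Element are hypothetical
names, the toggle is only defined here for elements with no content,
and escaping the delimiter when the quote style changes is my addition,
not something the rules above spell out.

```python
from dataclasses import dataclass, field
from xml.sax.saxutils import escape


@dataclass
class Attr:
    name: str
    value: str
    quote: str = '"'

    def delete_quote(self):
        # Rule from the list: deleting a quote swaps it for the other kind.
        self.quote = "'" if self.quote == '"' else '"'

    def render(self):
        # Assumption: escape whichever quote delimits the value, so the
        # swap cannot leave a stray delimiter inside the value.
        entity = "&quot;" if self.quote == '"' else "&apos;"
        value = escape(self.value, {self.quote: entity})
        return f"{self.name}={self.quote}{value}{self.quote}"


@dataclass
class Element:
    name: str
    attrs: list = field(default_factory=list)
    content: str = ""   # assumed to be well-formed markup already
    empty: bool = False

    def type_slash(self):
        # Typing / in the opening tag makes the closing tag disappear.
        if self.content:
            raise ValueError("sketch only handles elements with no content")
        self.empty = True

    def backspace_slash(self):
        # Backspacing over the / brings the closing tag back.
        self.empty = False

    def render(self):
        attrs = "".join(" " + a.render() for a in self.attrs)
        if self.empty:
            return f"<{self.name}{attrs} />"
        return f"<{self.name}{attrs}>{self.content}</{self.name}>"
```

With link = Element("link", [Attr("rev", "made")]), calling
link.type_slash() and then link.backspace_slash() takes the rendering
from <link rev="made"></link> to <link rev="made" /> and back again.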

Now, with these rules, I think every document we can type will be
well-formed XML.  (I don't think we can type every well-formed XML
document yet; in particular, we need to allow for comments, <![CDATA[
sections, DTD stuff, PIs, and character entities, and maybe other stuff
I've forgotten.)  And if we start with a well-formed XML document
(except for <![CDATA[ and comments -- gotta take them into account) we
can't turn it into an ill-formed XML document.
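
The claim that no command can produce ill-formed XML is the kind of
thing a test harness could check after every simulated keystroke.  A
minimal sketch using Python's standard expat binding; is_well_formed is
a name chosen here for illustration, and it checks well-formedness
only, not validity against a DTD.

```python
import xml.parsers.expat


def is_well_formed(document: str) -> bool:
    """Return True if expat parses the whole document without error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(document, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError:
        return False
    return True
```

For example, is_well_formed("<html><head></html>") is False, while the
finished line from the example session passes.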

I also think that these rules will make people feel good.  They'll be
able to figure out how to do whatever they want to do, and whatever
they do will have a visible effect.  And they'll be able to type XML,
fast.

So we can do syntax highlighting -- highlighting a whole element when
we're over a tag at the end of it, and drawing tags in different
colors.  We can fold -- double-click on a tag, and the element inside
it shrinks visually to a point.  (This is useful for outlining.)  We
can enable cut-and-paste of entire chunks of well-formed XML; ideally
we wouldn't even have to select the chunk we want if it's contained in
a single element, because we can't reasonably cut and paste parts of an
element.
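
Cutting and pasting whole elements is simple once the editor keeps a
tree behind the text.  A sketch using Python's ElementTree as a
stand-in for whatever tree the editor would really keep; cut_element
and paste_element are hypothetical helpers.  ElementTree stores the
text following an element on that element's tail, so the sketch moves
the tail aside before detaching.

```python
import xml.etree.ElementTree as ET


def cut_element(root, path):
    """Detach and return the first element matching path.

    Text that followed the element stays behind in the document.
    """
    target = root.find(path)
    if target is None:
        raise ValueError("no element matches " + path)
    parent = next((p for p in root.iter() if target in list(p)), None)
    if parent is None:
        raise ValueError("cannot cut the root element")
    index = list(parent).index(target)
    if target.tail:
        # Reattach the trailing text to whatever now precedes it.
        if index == 0:
            parent.text = (parent.text or "") + target.tail
        else:
            previous = parent[index - 1]
            previous.tail = (previous.tail or "") + target.tail
        target.tail = None
    parent.remove(target)
    return target


def paste_element(parent, index, element):
    """Insert a previously cut element as child number index of parent."""
    parent.insert(index, element)


doc = ET.fromstring("<body><p>one</p> and <p>two</p></body>")
first = cut_element(doc, "p")
paste_element(doc, 1, first)
result = ET.tostring(doc, encoding="unicode")
# result == "<body> and <p>two</p><p>one</p></body>"
```

Because only whole subtrees move, the result is well-formed by
construction; the harder cases of cutting a run of siblings or a
stretch of text are not addressed here.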

Still unresolved issues:
- How to cut and paste a sequence of elements, or of text?
- How to move text in and out of CDATA sections?  (Presumably
  converting & and < to &amp; and &lt; and back.  But what do we do with
  &somethingelse;?)
- Moving text into a comment is easy.  But how do we move text out of a
  comment?  If we treat it like CDATA, people will get pissed off when
  they have to go turn all their &lt;'s back into <'s.  Perhaps we can paste
  unchanged whatever is well-formed, and textify the rest?
- pasting from other places.  Similar to copying text out of a comment.
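
Two of these conversions can be sketched with Python's standard
library.  Assumptions: the function names are made up for illustration,
and paste_fragment is all-or-nothing (it textifies the whole clipping
if any part fails to parse), which is coarser than pasting unchanged
whatever is well-formed.  Entity references other than &lt;, &gt; and
&amp; are passed through untouched, so the &somethingelse; question
stays open.

```python
import xml.parsers.expat
from xml.sax.saxutils import escape, unescape


def cdata_to_text(raw: str) -> str:
    """Text leaving a CDATA section: turn &, < and > into references."""
    return escape(raw)


def text_to_cdata(markup: str) -> str:
    """Text entering a CDATA section: turn &lt;, &gt;, &amp; back."""
    raw = unescape(markup)
    if "]]>" in raw:
        raise ValueError("a CDATA section cannot contain ]]>")
    return raw


def paste_fragment(clip: str) -> str:
    """Paste clip as markup if it parses inside a dummy element,
    otherwise as escaped text."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse("<wrapper>" + clip + "</wrapper>", True)
    except xml.parsers.expat.ExpatError:
        return escape(clip)
    return clip
```

So paste_fragment("<b>bold</b> text") comes back unchanged, while
paste_fragment("a < b") comes back as a &lt; b.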

This editor, plus XQL (http://metalab.unc.edu/xql/ -- see the white paper
on the design of XQL, with the ?? and stuff, not the stupid Microsoftie
proposal with "methods" all over the place) augmented by regexes, could
combine the power of outline processors with the power of AskSam, and
the power and portability of XML to boot.  *Somebody* would like it.
:)
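
To give a flavor of structural queries combined with regexes, here is a
sketch that does NOT use XQL: it substitutes the limited path syntax in
Python's ElementTree, and filters the matched elements with the re
module.  The search function and the sample document are made up for
illustration.

```python
import re
import xml.etree.ElementTree as ET


def search(root, path, pattern):
    """Return elements matching path whose text content matches pattern."""
    rx = re.compile(pattern)
    return [el for el in root.findall(path)
            if rx.search("".join(el.itertext()))]


notes = ET.fromstring(
    "<notes>"
    "<item>call Bob 555-1234</item>"
    "<item>buy milk</item>"
    "<item><b>fax</b> 555-9876</item>"
    "</notes>"
)
hits = search(notes, ".//item", r"\d{3}-\d{4}")
texts = ["".join(el.itertext()) for el in hits]
# texts == ["call Bob 555-1234", "fax 555-9876"]
```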

-- 
<kragen@pobox.com>       Kragen Sitaker     <http://www.pobox.com/~kragen/>
Thu Aug 19 1999
81 days until the Internet stock bubble bursts on Monday, 1999-11-08.
<URL:http://www.pobox.com/~kragen/bubble.html>