proposed new ANSI escape sequences for hyperlinks to URLs

Kragen Javier Sitaker kragen at canonical.org
Mon Jun 3 03:37:01 EDT 2013


We should totally have a way to render HTML-style links in terminal
emulators.

Why this is a good idea
-----------------------

Lots of terminal emulators already have a way to recognize URLs in
free text so that you can visit them (ctrl-click in gnome-terminal, an
option on the dropdown menu in both gnome-terminal and Konsole), which
is useful.  But if you're chatting over XMPP with someone who's using
HTML (like HipChat), they're probably going to embed links from time
to time.  And they expect that when they say:

> We'll go from my house to Facebook at around 10:00.

where "my house" is linked to
<http://maps.google.com/maps?client=ubuntu&channel=fs&oe=utf-8&q=1456+Edgewood+Dr.,+Palo+Alto,+CA&um=1&ie=UTF-8&hq=&hnear=0x808fbb0d745a65e9:0xfde4b05151805922,1456+Edgewood+Dr,+Palo+Alto,+CA+94301&gl=us&sa=X&ei=bZNCUbbuI62g4AOPyYDQDw&ved=0CDAQ8gEwAA>
and "Facebook" is linked to
<http://maps.google.com/maps?q=facebook&hl=en&sll=37.45392,-122.13933&sspn=0.007495,0.013797&gl=us&hq=facebook&t=m&z=16>,
then you will see something like

> We'll go from *my house* to *Facebook* at around 10:00.

and not

> We'll go from <http://maps.google.com/maps?client=ubuntu&channel=fs&oe  
> =utf-8&q=1456+Edgewood+Dr.,+Palo+Alto,+CA&um=1&ie=UTF-8&hq=&hnear=0x80  
> 8fbb0d745a65e9:0xfde4b05151805922,1456+Edgewood+Dr,+Palo+Alto,+CA+9430  
> 1&gl=us&sa=X&ei=bZNCUbbuI62g4AOPyYDQDw&ved=0CDAQ8gEwAA>my house to  
> <http://maps.google.com/maps?q=facebook&hl=en&sll=37.45392,-122.13933&s  
> spn=0.007495,0.013797&gl=us&hq=facebook&t=m&z=16> Facebook at around  
> 10:00.

which is pretty annoying and hard to read.

The proposed solution: `ESC [ _` and `ESC [ U < url >`
------------------------------------------------------

We define two new ANSI-compatible escape sequences, which ought to be
proposed for the next version of ECMA-48 and the corresponding ISO
standard:

* `CSI _`, or `ESC [ _`, "LNK": to indicate the beginning of a
  hyperlink to a URL.  All characters produced before the next URL
  sequence form part of the link.
* `CSI U`, or `ESC [ U`, "URL": to indicate the end of the link text
  begun by LNK.  This sequence is followed by the unencoded text of
  the URL to which clicking the link should take the user, preceded by
  "<" and terminating before the next ">" or " ", which is part of the
  escape sequence and not displayed.

As an example, a link labeled "Canonical Hackers" linking to
<http://canonical.org/> could be represented as follows, with `ESC`
representing the ASCII escape character:

    ESC[_Canonical HackersESC[U<http://canonical.org/>

A literal rather than quoted version: [_Canonical
Hackers<http://canonical.org/>.
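A program emitting such a link might build the sequence roughly as in
the following Python sketch.  The helper name `make_link` is made up
for illustration, and rejecting URLs that contain a space or `>` is my
assumption, derived from the termination rule above rather than
spelled out in the proposal.

```python
ESC = "\x1b"        # ASCII escape, character 27
LNK = ESC + "[_"    # proposed "begin link" sequence
URL = ESC + "[U"    # proposed "end of link text, URL follows" sequence


def make_link(text, url):
    """Wrap text in the proposed LNK/URL sequences (hypothetical helper)."""
    # The URL ends at the next space or ">", so neither may occur inside it.
    if " " in url or ">" in url:
        raise ValueError("URL must not contain a space or '>'")
    return LNK + text + URL + "<" + url + ">"


if __name__ == "__main__":
    # repr() keeps the escape characters visible instead of sending them
    # to the terminal.
    print(repr(make_link("Canonical Hackers", "http://canonical.org/")))
```

A URL containing a space would have to be percent-encoded by the
caller before being passed in.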

Rationale for the design of the escape sequence
-----------------------------------------------

Above and beyond the typographical possibilities of whitespace, we
already have boldface, underlining, and different colors in our
terminal emulators.  These are produced by "escape sequences",
invisible sequences of characters typically beginning with the ESC
character (character 27) followed by a "[", which change the state of
the terminal so that subsequently displayed characters will have a
different effect.  The `ESC[` sequence is called "CSI", "Control
Sequence Introducer".
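For readers who haven't produced these by hand, here is a small Python
sketch of the existing attribute sequences (SGR, "select graphic
rendition").  The codes 0 (reset), 1 (bold), 4 (underline), and 31 (red
foreground) are the standard ones; the helper names `sgr` and `styled`
are invented for this example.

```python
ESC = "\x1b"       # ASCII escape, character 27
CSI = ESC + "["    # Control Sequence Introducer


def sgr(*codes):
    """Build an SGR sequence such as ESC[1;4m from numeric codes."""
    return CSI + ";".join(str(code) for code in codes) + "m"


def styled(text, *codes):
    """Turn the given attributes on, emit text, then reset all attributes."""
    return sgr(*codes) + text + sgr(0)


if __name__ == "__main__":
    print(styled("bold and underlined", 1, 4))
    print(styled("red", 31))
```

Note that the attributes stay in effect until something resets them,
which is the state-changing behavior described above.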

So I think we should define an escape sequence to make the following
(or preceding?)  sequence of characters into a clickable link to a
given URL.

It would be desirable if the escape sequence degraded into something
that was human-readable when displayed on terminals that didn't
support it.  This is only possible to a limited extent, since programs
that try to be aware of screen layout will necessarily be confused
about where the text is on the screen, but it is still somewhat
possible.

ANSI control sequences (those introduced by CSI) can't contain
sequences of arbitrary characters, while URLs can contain most
characters.  This is why the escape sequences for setting the terminal
title aren't CSI sequences: they look like `ESC]0;new title^G`, where
`^G` is the BEL character control-G (character 7), which can also be
replaced by ` ESC\ `, and `0` can be replaced with `1` or `2`.  The
`ESC]` sequence is known as "OSC" or "Operating System Command".
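As a concrete illustration, the title-setting sequence could be built
like this in Python.  The function name `set_title` and its parameters
are hypothetical; only the byte layout comes from the description
above.

```python
ESC = "\x1b"
OSC = ESC + "]"     # Operating System Command introducer
BEL = "\x07"        # control-G, character 7
ST = ESC + "\\"     # the alternative terminator, ESC followed by backslash


def set_title(title, selector=0, terminator=BEL):
    """Build the title-setting sequence; selector is 0, 1, or 2."""
    if selector not in (0, 1, 2):
        raise ValueError("selector must be 0, 1, or 2")
    if terminator not in (BEL, ST):
        raise ValueError("terminator must be BEL or ESC backslash")
    return f"{OSC}{selector};{title}{terminator}"


if __name__ == "__main__":
    print(repr(set_title("new title")))
    print(repr(set_title("new title", selector=2, terminator=ST)))
```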

xterm's parsing of the above sequence seems to consume anything
following `ESC]` until the next `^G` or ` ESC\ `, even if it doesn't
start with a digit, which is pretty unfriendly.  This suggests that if
we wanted to use the same `ESC]` introduction sequence, incompatible
xterm implementations would eat the URL if we terminated it with the
same `^G`, and other things as well if we used a different terminator.
Even if most people are now using other terminal emulators, this seems
like a compatibility trap.
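To make the trap concrete, here is a toy Python model of a terminal
that swallows everything from `ESC]` up to the next `^G` or ` ESC\ `.
It is only a model of the behavior described above, not xterm's actual
parser, and the `ESC]<url>` link syntax fed to it is a hypothetical
design that this proposal rejects.

```python
import re

# Everything from ESC ] to the next BEL or ESC \ is consumed; if no
# terminator ever arrives, the rest of the stream is consumed too.
OSC_STRING = re.compile(r"\x1b\].*?(?:\x07|\x1b\\|\Z)", re.DOTALL)


def osc_swallowing_terminal(stream):
    """Return what a terminal modeled this way would leave visible."""
    return OSC_STRING.sub("", stream)


if __name__ == "__main__":
    # Hypothetical link syntax terminated by ">" instead of BEL.
    stream = ("Canonical\x1b]<http://canonical.org/> and more output"
              "\x1b]0;some title\x07 done")
    print(repr(osc_swallowing_terminal(stream)))
```

In this model the URL, the text after it, and the later title sequence
all vanish together, because the first terminator the terminal
recognizes belongs to a different sequence.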

Putting the link escape sequence *after* rather than *before* the text
to be marked up offers a certain kind of safety; it's quite easy for
an unterminated color escape sequence to set the next few pages of
output to white on white, or flashing, or underlined, or whatnot.  It
also probably improves matters on terminals that don't yet understand
the escape sequence, as they could see
"Canonical<http://canonical.org/>" rather than
"<http://canonical.org/>Canonical".  Since you presumably need two
escape sequences anyway to indicate the boundary of the link, you
might as well put the URL in the final one.  So you have one escape
sequence to indicate "beginning of link" and another one to indicate
"end of link; URL is xyz".

For reasons of tradition and URL-safety, I would like the delimiters
(particularly in the gracefully degraded form) to be `<>`.  Ideally
the beginning and ending sequences would also have a pleasing visual
and memorable symmetry, and would disappear from view entirely in old
terminals.
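The degraded form can be previewed with a short Python sketch that
models an old terminal as one that silently drops any CSI sequence and
displays everything else.  That model is an assumption (real terminals
vary, as discussed below), and `strip_csi` is a name invented here.
The sequences fed to it are the `ESC[_` and `ESC[U` pair from the
proposal at the top.

```python
import re

# CSI, then parameter bytes (0x30-0x3F), intermediate bytes (0x20-0x2F),
# and one final byte (0x40-0x7E).
CSI_SEQUENCE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_csi(stream):
    """Model a terminal that ignores every CSI sequence it receives."""
    return CSI_SEQUENCE.sub("", stream)


if __name__ == "__main__":
    link = "\x1b[_Canonical\x1b[U<http://canonical.org/>"
    print(strip_csi(link))
```

Under that model the reader is left with the link text immediately
followed by the URL in angle brackets.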

(In the following, `ESC` means the ASCII character 27, escape.)

This suggests using one of the ASCII matching delimiter pairs
``[](){}<>`'``.  `[]` are right out, since they're already in use.
`ESC(` and `ESC)` cause rxvt to swallow the following character; I'm
not sure what their meaning is, but they seem to be used.  Treating
`` `' `` as a matched pair is sadly out of style, due to the
unfortunate but now nearly universal adoption from Microsoft Windows
fonts of a vertical `'`.

`ESC{` and `ESC}` disappear in rxvt and xterm, render as literal ESC
character glyphs in gnome-terminal, overlaid on top of the {} in my
font, and disappear in konsole while producing warning messages on its
stderr.  `ESC<` and `ESC>` disappear in rxvt; the second disappears in
gnome-terminal, while the first renders as a literal ESC character
glyph; they disappear in konsole.  This suggests that perhaps `ESC<`
and `ESC>` are already taken.

A little looking around suggests that `ESC>` is "set numeric keypad
mode" aka DECKPNM on VT100s, `ESC<` is "enter ANSI mode" (that is,
exit VT52 mode) on a VT100 running in VT52 mode, while `ESC(` and
`ESC)` are used by VT100s to change character sets (setaltg0,
setaltg1, etc.)

`ESC}` on VT100s is "invoke the G2 character set", according to
<http://rtfm.etla.org/xterm/ctlseq.html>, although that's ignored in
xterm and probably all other modern terminal emulators.

So this suggests the following syntax:

    ESC{Canonical.ESC}<http://canonical.org/>

Actually, though, you could use valid ANSI escape sequences instead of
`ESC{` and `ESC}`.  I wanted to use `ESC[a` for the "begin link"
sequence (like `<a>`), but rxvt already uses it for a nonstandard
"move right" sequence, an alternative spelling of `ESC[C`.  (See
rxvt-2.6.4/src/command.c:2652, inside `process_csi_seq`.)  The full
set of CSI-ending codes handled by rxvt seems to be
``iAeBCaDEFG`dHfIZJK@LMXPT^ScmnrshltgW``, which is to say,
``@ABCDEFGHIJKLMPSTWXZ^`acdefghilmnrst``.
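The sorting, and the search for a free final character, can be checked
mechanically.  In this Python sketch the unsorted list is typed in with
`@`, matching the sorted form; the range 64 to 126 for CSI final
characters and the function names are things I'm supplying, not
anything taken from rxvt.

```python
# CSI final characters handled by rxvt, in the order listed above.
RXVT_FINALS = "iAeBCaDEFG`dHfIZJK@LMXPT^ScmnrshltgW"


def sorted_finals(finals):
    """Return the distinct final characters in ASCII order."""
    return "".join(sorted(set(finals)))


def unused_finals(finals):
    """Return the CSI final characters (ASCII 64 to 126) not in finals."""
    return "".join(chr(c) for c in range(64, 127) if chr(c) not in finals)


if __name__ == "__main__":
    print(sorted_finals(RXVT_FINALS))
    print(unused_finals(RXVT_FINALS))
```

Anything in the second line of output is at least not claimed by this
version of rxvt, though other terminals may well use it.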

Argh, so what to use for the other escape sequence?  Wikipedia says:

> For two character sequences, the second character is in the range
> ASCII 64 to 95 (@ to _). However, most of the sequences are more
> than two characters, and start with the characters ESC and [ (left
> bracket). This sequence is called CSI for Control Sequence
> Introducer (or Control Sequence Initiator). The final character of
> these sequences is in the range ASCII 64 to 126 (@ to ~).

This suggests that we could use `ESC[_`, which konsole reports as an
"Undecodable sequence" and drops, rxvt apparently drops (it isn't in
rxvt's list of CSI codes above), xterm and screen drop, and
gnome-terminal displays literally.  `ESC[_` is nice and mnemonic:
links are normally underlined.

`ESC[U` would work for the "end link, begin URL" sequence, which could
be followed by the URL wrapped in `<>`.  If the URL is specified to
end at the next space or `>`, then this sequence would be unlikely to
inadvertently gobble up a large quantity of text when random data is
sent to the terminal that randomly happens to include `ESC[U`.  So
that would give us:

    ESC[_Canonical.ESC[U<http://canonical.org/>

Here's a literal example: I am from the [_Canonical
Hackers.<http://canonical.org/>
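On the receiving side, a terminal emulator would need to pick the link
text and URL out of the character stream.  Here is a minimal Python
sketch of that using a regular expression; a real emulator would do it
incrementally in its escape-sequence state machine.  The name
`extract_links` is hypothetical, and leaving a stray `ESC[U` with no
preceding `ESC[_` untouched is my assumption, since the proposal
doesn't say what should happen in that case.

```python
import re

# ESC[_ link text ESC[U < url, ended by the next space or ">".
# The terminating character is part of the sequence and is not displayed.
LINK = re.compile(r"\x1b\[_(.*?)\x1b\[U<([^ >]*)[ >]", re.DOTALL)


def extract_links(stream):
    """Return (visible_text, [(link_text, url), ...]) for a stream."""
    links = []

    def keep_text(match):
        links.append((match.group(1), match.group(2)))
        return match.group(1)   # only the link text remains visible

    visible = LINK.sub(keep_text, stream)
    return visible, links


if __name__ == "__main__":
    stream = ("I am from the \x1b[_Canonical Hackers."
              "\x1b[U<http://canonical.org/>\n")
    visible, links = extract_links(stream)
    print(repr(visible))
    print(links)
```

Because the URL pattern stops at the first space or `>`, a stray
`ESC[U` in random data can consume at most one word.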

Implementing the escape sequence
--------------------------------

You could write your own terminal emulator to support links, but it
probably makes more sense to implement the feature in existing
terminal-emulation software.  The popular free-software terminal
emulators are tmux, screen, gnome-terminal, konsole, rxvt, xterm,
Emacs, and whatever Apple ships; support in ncurses is perhaps also
necessary before much application software can use it.

I looked through the available file on my Ubuntu box to see what other
terminal emulators there are.  The relevant popularity metrics from
<http://popcon.debian.org/main/by_vote> are:

    #rank name                            inst  vote   old recent no-files (maintainer)
    34    libncurses5                    137051 119786  7528  9710    27 (Craig Small)                   
    448   libvte9                        65767 30960 26691  8033    83 (Debian Gnome Maintainers)      
    499   gnome-terminal                 54405 27886 21381  5118    20 (Debian Gnome Maintainers)      
    771   xterm                          77540 17077 50118 10317    28 (Debian X Strike Force)         
    918   screen                         45241 12867 30602  1758    14 (Axel Beckert)                  
    1078  libvte-2.90-9                  27674  9519 11363  5591  1201 (Debian Gnome Maintainers)      
    1182  konsole                        16489  8229  6662  1590     8 (Debian Qt/kde Maintainers)     
    1434  emacsen-common                 28896  5699 19961  2805   431 (Rob Browning)                  
    1609  xfce4-terminal                  9368  4110  4360   896     2 (Debian Xfce Maintainers)       
    2109  tmux                            6908  2154  4244   508     2 (Karl Ferdinand Ebert)          
    2285  lxterminal                      5290  1803  2946   541     0 (Debian Lxde Maintainers)       
    2739  terminator                      2158  1274   764   119     1 (Nicolas Valcárcel Scerpella)   
    2766  yakuake                         2324  1242   972   110     0 (Ana Beatriz Guerrero Lopez)    
    3073  guake                           1499   972   449    78     0 (Sylvestre Ledru)               
    4913  tilda                            724   324   367    33     0 (Davide Truffa)                 
    4920  eterm                           1189   322   787    80     0 (Debian Qa Group)               
    5130  terminal.app                     830   294   509    26     1 (Debian Gnustep Maintainers)    
    5289  rxvt                            2032   274  1634   124     0 (Jan Christoph Nordholz)        
    6432  aterm                            972   180   761    31     0 (Debian Qa Group)               
    6948  cutecom                          796   148   596    50     2 (Roman I Khimov)                
    7207  gtkterm                          781   135   619    27     0 (Sebastien Bacher)              
    7299  sakura                           299   132   155    12     0 (Andrew Starr-bochicchio)       
    7376  ajaxterm                         251   128   116     7     0 (Julien Valroff)                
    7855  mrxvt                            452   110   320    22     0 (Jan Christoph Nordholz)        
    7768  picocom                          535   113   378    43     1 (Matt Palmer)                   
    7975  mlterm                           628   106   448    73     1 (Kenshi Muto)                   
    8386  fbterm                          1392    95  1097   199     1 (Nobuhiro Iwamatsu)             
    8947  roxterm                          470    82    79     5   304 (Tony Houghton)                 
    10260 pterm                            345    59   231    55     0 (Colin Watson)                  
    10458 jfbterm                         1142    56  1007    78     1 (Debian Qa Group)               
    10771 kterm                            560    52   497    11     0 (Ishikawa Mutsumi)              
    13843 evilvte                          117    27    87     3     0 (Wen-yen Chuang)                
    14563 microcom                         187    24   150    13     0 (Alexander Reichle-schmehl)     
    14898 xvt                              212    23   177    11     1 (Sam Hocevar)                   
    15979 termit                            71    19    49     3     0 (Thomas Koch)                   
    19919 vala-terminal                     44    10    32     2     0 (Debian Freesmartphone.org Team)
    24553 xiterm+thai                       36     5    28     3     0 (Neutron Soutmun)               
    24634 bogl-bterm                        41     4    32     5     0 (Samuel Thibault)               
    25809 pyqonsole                         31     4    24     3     0 (Alexandre Fayolle)             
    45654 libterm-vt102-perl                 5     0     5     0     0 (Debian Perl Group)             

(I thought Text::CharWidth might be relevant, but it doesn't seem to
handle escape sequences anyway.)

Now, libvte9 or libvte-2.90-9 is the actual terminal emulator library
that powers a number of the above terminal emulators, at least
gnome-terminal, sakura, xfce4-terminal, vala-terminal, tilda,
lxterminal, gtkterm, and evilvte.  But from looking at the code of
gnome-terminal and xfce4-terminal, each separate application built on
top of libvte would probably have to write some code to handle clicks
on link regions.
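For a rough sense of how much a libvte implementation would cover, the
"vote" figures from the table can be added up for the terminals named
above.  This is a back-of-the-envelope Python sketch: the numbers are
copied from the table, and a machine that uses two of these terminals
is counted twice.

```python
# "vote" column from the popcon table, for the libvte-based terminals
# named in the text.
VTE_TERMINAL_VOTES = {
    "gnome-terminal": 27886,
    "xfce4-terminal": 4110,
    "lxterminal": 1803,
    "tilda": 324,
    "gtkterm": 135,
    "sakura": 132,
    "evilvte": 27,
    "vala-terminal": 10,
}


def total_votes(votes):
    """Sum the vote counts, ignoring overlap between packages."""
    return sum(votes.values())


if __name__ == "__main__":
    print(total_votes(VTE_TERMINAL_VOTES))
```

gnome-terminal accounts for most of the total, so the smaller
front-ends matter mainly because each needs its own click handling.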

It seems that once you add the feature to libncurses5 (in the termcap,
at least), libvte9, and gnome-terminal, it's available to 20% of users
of Debian and similar systems.  Adding it to xterm gets you another
12%, although it's perhaps dubious whether upstream xterm will accept
such a patch.  Adding the feature to screen makes it available to
another 9%, although some of them will still be using other terminals
(such as the Mac OS X Terminal) to connect to their screen.  Konsole
gets another 6%; Emacs's ansi-color.el, 4%; xfce4-terminal, 3%; and
tmux, 1.6%.

Security
--------

Some escape sequences of the past have been disabled for security
reasons in modern software.  However, in general, it's safe to launch
arbitrary URLs in a normal browser at the moment, or so we believe; so
this should be safe.  It may, however, result in terminal users
getting rickrolled.  It would be useful to have a way to see what the
linked URL is before you click on it.
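One cautious way an emulator might handle a click is sketched below in
Python: show the target next to the label, and refuse anything that
isn't a plain web URL.  The allowlist of `http` and `https` is my own
conservative assumption and goes beyond what this proposal requires;
the function names are likewise hypothetical.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}   # assumption: web links only


def is_openable(url):
    """Return True if url is an http(s) URL with a host part."""
    parts = urlsplit(url)
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)


def hover_text(label, url):
    """Text to show before the click, so the real target is visible."""
    return f"{label} -> {url}"


if __name__ == "__main__":
    for url in ("http://canonical.org/", "file:///etc/passwd"):
        print(hover_text("Canonical Hackers", url), is_openable(url))
```

Displaying the target doesn't prevent rickrolling, but it does let the
user see where the link goes first.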


