generating Gray code with an iterated function system

Kragen Javier Sitaker kragen at canonical.org
Mon Nov 7 10:39:18 EST 2011


Michael: I thought you might be interested in this; if so, context is at
<http://lists.canonical.org/pipermail/kragen-hacks/2011-November/000532.html>
and
<http://lists.canonical.org/pipermail/kragen-discuss/2011-November/001188.html>.

On Mon, Nov 07, 2011 at 09:05:04AM +0100, Eugen Leitl wrote:
> Can't post to kragen-hacks

Sorry, forwarded to kragen-discuss.

> Kragen, do you see a computationally (=fewest amounts of gates) cheapest
> way of counting in Gray code that only does local (or as close to local)
> bit flips?

I'm guessing that what you mean by "local" is "can run fast in the
computational limiting scenario where latency induced by physical
distance between gates, rather than propagation delays, is the
dominating factor in computational speed"?

# A plain counter: O(n log n) gates with O(n**(1/3)) latency #

As a baseline, a simple way to generate the reflected binary gray code
I'm using is to run a binary counter and then stick XORs between
adjacent output bits: `(x ^ x >> 1 for x in range(2**n))`, in the Python
notation for such processes.  The XORs are nice and local, but a carry
in the counting could originate from as much as n**(1/3) bits away.
Getting the carry to propagate in less than O(n) time requires O(n log
n) lookahead logic with a path length of O(log n), which is less than
O(n**(1/3)).  So that's O(n log n) gates with O(n**(1/3)) latency, which
is not too terrible.

# Ring counters: O(2**n) gates with O(1) latency #

Can we do better?  You can clearly do O(1) if you accept O(2**n) gates:
just use a ring counter for each bit.  The ring counters don't need to
interact at all, and updating them is O(1).  But that's an outrageous
space cost.

# A carry-save counter doesn't seem to work #

If you simply double the number of flip-flops to 2n, you can make a
carry-save counter, which updates in O(1) time; it's like a
Chinese-style abacus, where each bit can contain a 0, 1, or a 2 that
represents a carry that hasn't yet been propagated to the next higher
bit.  In general, the carry-save representation can have many
representations for a given number --- 1000, 0200, 0120, and 0112 are
all equivalent, for example --- but only one of those representations
will arise in the normal counting sequence of a carry-save counter.  I
suspect that you may be able to purely locally (i.e. with communication
over O(1) distance) decode the carry-save counter state to the same
reflected binary Gray code, but I haven't proved that.  The simple way,
however, doesn't seem to work:

Here's a four-bit carry-save counter's sequence of states, along with
the binary and RBGC, if I didn't screw anything up:

     x   cs  bin rbgc
     4 0012 0100 0110
     5 0021 0101 0111
     6 0102 0110 0101
     7 0111 0111 0100
     8 0112 1000 1100
     9 0121 1001 1101
    10 0202 1010 1111
    11 1011 1011 1110
    12 1012 1100 1010
    13 1021 1101 1011
    14 1102 1110 1001
    15 1111 1111 1000
     0 1112 0000 0000
     1 1121 0001 0001
     2 1202 0010 0011
     3 2011 0011 0010

The question is, I guess, how much of the carry-save counter state do
you need to look at to produce e.g. the leading bit of the Gray code?
All of it, it turns out, because cs=1112 gives you rbgc[0]=0, while
cs=1111 gives you rbgc[0]=1.  Maybe there's a way to adjust the
correspondence so that the scheme works out, but it isn't obvious.

# Many carry-save counters should work: O((log n)**2) gates, O(1) latency #

However, a carry-save counter can generate a single toggling bit just
fine: just stick an XOR on the two bits of its most significant digit.
So, as an improvement to the ring-counter-per-bit idea, you could have a
separate independent carry-save counter per bit --- a two-digit counter
for the LSB, a three-digit counter for the next one, a four-digit
counter for the next one, and so on --- and initialize them so that
their digit sequences toggle in the skewed fashion needed to generate
RBGC.  You need O(log n) counters each containing O(log n) bits.  

That's worse than the theoretical optimum only by a factor of O(log n);
can you do better?

Kragen


More information about the Kragen-discuss mailing list