saving power in portable computers: speculations

Kragen Sitaker kragen@pobox.com
Sat, 4 May 2002 04:46:59 -0400 (EDT)


The time to charge up a FET gate electrode is inversely proportional
to the current into it, which is proportional to the applied voltage.
So one thing overclockers like to do with their CPUs is raise the core
voltage so the chip works reliably at higher clock rates.

Transmeta's mobile CPUs take the opposite approach: when maximum
speed is not needed, drop the core voltage to conserve power.  Power
consumption at a fixed voltage (on CMOS CPUs) is close to directly
proportional to the clock speed; power consumption at a fixed clock
speed is proportional to the square of the voltage.  And the minimum
reliable voltage is roughly proportional to the clock speed, so if the
voltage is lowered along with the clock, the power consumption is
proportional to the cube of the clock speed (plus, unfortunately, a
constant).  So even a modest slowdown can pay off big in power usage.
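
To put rough numbers on that scaling argument, here is a minimal
Python sketch.  It assumes the idealized model described above (power
proportional to clock speed times voltage squared, with voltage scaled
in proportion to clock speed).  The name relative_power and the
static_fraction parameter, which stands in for the constant term, are
made up for illustration.

```python
def relative_power(speed_fraction, static_fraction=0.0):
    """Power relative to full speed, for a clock at speed_fraction of max.

    Assumed model: dynamic power ~ f * V**2 with V ~ f, plus a constant
    share (static_fraction) of full-speed power that does not scale.
    """
    if not 0.0 < speed_fraction <= 1.0:
        raise ValueError("speed_fraction must be in (0, 1]")
    voltage_fraction = speed_fraction  # minimum reliable voltage ~ clock speed
    dynamic = speed_fraction * voltage_fraction ** 2
    return static_fraction + (1.0 - static_fraction) * dynamic


for speed in (1.0, 0.9, 0.5):
    print(f"{speed:.0%} clock -> {relative_power(speed):.3f} of full power")
```

With no constant term, a 10% slowdown already cuts power by about 27%
in this model; a nonzero static_fraction shrinks that saving.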

I seem to recall that Transmeta's CPUs aren't making as much of a
difference in power consumption as some had hoped, so they aren't
being widely deployed, which is a shame.  I don't know why this is,
but here are speculations on some other things that can be done to
conserve power for portable computers.

Probably most importantly, use a simpler CPU, for crying out loud.
Debian runs on everything.

Use a battery-backed NVRAM write-back disk cache so the disk can stay
spun down most of the time.  This seems like common sense, but the
only company I'm familiar with that does this is NetApp, and they do
it for their expensive fileservers, not for laptops.  Once the disk
spins up, reading from and writing to it cost very little energy.
Keeping enough data in RAM to keep the disk spun down most of the time
is a tricky engineering problem of its own; it requires designing
software and data to fit users' work patterns, and also guiding users'
usage patterns so they can go many seconds at a time without needing
data not in memory.

Compile programs.  Don't run programs in which the speed-critical
parts are in interpreted languages.  The argument that CPUs are fast
enough that it doesn't matter doesn't apply here.  If the CPU can
throttle its clock speed down by a factor of 2, it can decrease its
power usage by a factor of nearly 8.

Having to wait for the disk to spin up is very annoying, but this
annoyance can be mitigated by software that anticipates, a few seconds
ahead, when the user is likely to need data that isn't in memory, by
detecting that they are about to context-switch.  For
example, if the user begins selecting among open applications or
navigating parts of the filesystem that aren't fully cached in RAM,
they are likely to need data from disk soon, and spinning up the disk
in anticipation may save them frustration.

DRAM refresh cycles consume power even when memory isn't being
accessed.  So if your DRAM is in physical chips that are mapped into
different parts of your physical address space --- rather than, say,
being interleaved --- you can save power by not refreshing chips you
aren't using.  If you have a virtual memory OS that works
proactively to compact pages in use into lower memory (transparently
to user processes, just by remapping), swaps out pages that aren't
being actively used, and uses less physical memory for disk caching
(at least less than Linux does), you could often keep most of your
RAM off.  This is in conflict with the earlier suggestion to keep
the disk spun down as much as possible.  I suspect they can both be
used in the same system, but carrying either to the extreme would
result in a painfully slow system.
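
As a toy illustration of the compaction idea, the following Python
sketch counts how many chips still need refreshing before and after
the pages in use are remapped into low memory.  It assumes each chip
covers one contiguous range of page frames, as described above; the
function names, the chip size, and the page numbers are all
hypothetical.

```python
def chips_needing_refresh(pages_in_use, pages_per_chip):
    """Count the DRAM chips that hold at least one page in use."""
    return len({page // pages_per_chip for page in pages_in_use})


def compact(pages_in_use):
    """Remap the pages in use onto the lowest-numbered page frames."""
    return list(range(len(pages_in_use)))


PAGES_PER_CHIP = 64                 # hypothetical chip size, in pages
scattered = [3, 70, 130, 200, 255]  # frames in use, spread over 4 chips

print(chips_needing_refresh(scattered, PAGES_PER_CHIP))           # 4
print(chips_needing_refresh(compact(scattered), PAGES_PER_CHIP))  # 1
```

A real OS would also have to copy the page contents and update the
page tables; the sketch only shows the bookkeeping.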

Computation on my computer happens for two kinds of reasons.

Some computation happens in order to respond to my input and provide
me feedback on the screen.  I probably won't notice whether that
computation takes 100 microseconds, 1 millisecond, or 100
milliseconds.  The computation should be run at the minimal clock
speed that makes it likely to complete shortly before 100 milliseconds
have passed.  On a CPU where core voltage is adjusted in concert with
clock speed, running part of the time at one speed and the rest of the
time at another will cost more energy than running the whole time at
the average of those speeds, while getting the same work done in the
same time.  This requires the system to be able to estimate how many
clock cycles it will take to fulfill some request, a feature not
present in most existing software.
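
The steady-versus-split comparison can be checked with a few lines of
Python.  This is only a sketch under the idealized cube-law model
(power proportional to the cube of clock speed, constant term
ignored); the speeds, the 100-millisecond budget, and the function
names are illustrative assumptions.

```python
def relative_energy(speed, seconds):
    """Energy in arbitrary units: power ~ speed**3, multiplied by time."""
    return speed ** 3 * seconds


def cycles_done(speed, seconds):
    """Work completed, in units of (max clock rate * seconds)."""
    return speed * seconds


BUDGET = 0.1  # seconds available to respond

# Half the time at 20% speed and half at 80%, versus all of it at 50%.
split_energy = (relative_energy(0.2, BUDGET / 2)
                + relative_energy(0.8, BUDGET / 2))
steady_energy = relative_energy(0.5, BUDGET)

split_cycles = cycles_done(0.2, BUDGET / 2) + cycles_done(0.8, BUDGET / 2)
steady_cycles = cycles_done(0.5, BUDGET)

print(f"same work: {abs(split_cycles - steady_cycles) < 1e-12}")
print(f"split costs {split_energy / steady_energy:.2f}x the energy")
```

Both schedules finish the same number of cycles in the same time, but
the split one costs about 2.08 times the energy here, because the
cube makes the fast half disproportionately expensive.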

This will sometimes mean that the CPU will need to run at its maximum
possible speed for short periods of time in order to provide better
responsiveness.

All other computation happens to prepare the system to respond to user
requests in the future.  For example, once a day, my system rebuilds
an index of the names of all the files on its disk so I can search
them quickly.  This kind of computation should ideally proceed
continuously at a very low speed.  For example, if my system can
execute at most a billion clock cycles in a second, or 86 trillion
in a day, and the computational part of reindexing the disk needs only
86 million clock cycles, it should run continuously at a speed of
roughly a thousand clock cycles per second, costing essentially no CPU
power --- nominally 10^-18 of the CPU's maximum power consumption, so
lost in the noise.  This permits it to take 24 hours to complete.
(Probably, of course, the right balance is somewhere between
continuous execution, which means continuous memory consumption and
thus maximum RAM power waste, and maximum speed, which means maximum
CPU power waste.)
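
The arithmetic behind such an estimate is just cycles divided by the
time allowed.  Here is a small Python sketch, again assuming the
idealized cube law with no constant term; background_speed and the
two job sizes in the loop are illustrative, not measurements.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def background_speed(job_cycles, deadline_seconds, max_cycles_per_second):
    """Return (cycles/s needed, fraction of max speed, relative power)."""
    needed = job_cycles / deadline_seconds
    fraction = needed / max_cycles_per_second
    return needed, fraction, fraction ** 3  # cube law, constant ignored


for job_cycles in (86.4e6, 86.4e9):
    needed, fraction, power = background_speed(
        job_cycles, SECONDS_PER_DAY, 1e9)
    print(f"{job_cycles:.3g} cycles/day -> {needed:.3g} cycles/s, "
          f"{fraction:.0e} of max speed, {power:.0e} of max power")
```

A job a thousand times larger needs a thousand times the speed and,
under the cube law, a billion times the power, which is still a tiny
fraction of the maximum.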

This kind of computation doesn't need to finish under a tight deadline
and can therefore afford to wait for disk access until the next time
the disk spins up.  Such jobs have some ratio between CPU cycles and
disk accesses; if they need to hit the disk once every 60 000 CPU
cycles, and they are running at 1000 cycles per second, then the disk
spinning up every 30 seconds will probably keep them running most of
the time, but the disk spinning up every 120 seconds will leave them
starved.  This ratio can be modified by caching policies and by
multithreading; if you can split this indexing process into 10 threads
where each thread needs to hit the disk every 60 000 CPU cycles, then
you can run each thread at 100 cycles per second with the same
throughput and tolerate spinning up the disk 10 times less often, and
you'll also smooth out variations in individual-thread behavior.
Caching works similarly: if, every time you spin the disk up, you can
work out the locations of the next 10 pieces of data the process will
need and read them all into memory, you can get away with spinning up
the disk 10 times less often.
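
Here is a rough Python model of the starvation arithmetic above.  It
makes several simplifying assumptions that are not in the text: the
disk spins up at strictly regular intervals, the job starts right
after a spin-up, and a pending request is served instantly at the
next spin-up.  The name busy_fraction is hypothetical.

```python
import math


def busy_fraction(cycles_per_disk_access, cycles_per_second,
                  spinup_interval_s):
    """Fraction of wall-clock time the job spends computing, not waiting."""
    compute_s = cycles_per_disk_access / cycles_per_second
    # The job blocks until the first spin-up at or after it finishes
    # its current stretch of computation.
    intervals = math.ceil(compute_s / spinup_interval_s)
    return compute_s / (intervals * spinup_interval_s)


print(busy_fraction(60_000, 1000, 30))   # one job, spin-up every 30 s
print(busy_fraction(60_000, 1000, 120))  # one job, spin-up every 120 s
print(busy_fraction(60_000, 100, 300))   # one of 10 slow threads, every 300 s
```

In this model the single job is never starved at 30-second spin-ups
and is idle half the time at 120 seconds, while each of the ten slower
threads stays fully busy with spin-ups ten times rarer.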

This kind of background computation has the potential to drastically
reduce CPU power usage in portable computers; if computing an answer
takes 100 million cycles, and the user makes a request every ten
seconds or so, then if you can compute their answer before they ask
for it, you can compute it at 10 million cycles per second.  If you
wait until they ask for it, to deliver it in a second, you have to
compute it at 100 million cycles per second, which will cost 100 times
as much energy as if you had computed it in advance.
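
The factor of 100 follows from the cube law: ten times the speed
means a thousand times the power, but for a tenth of the time.  A
minimal Python check, assuming power proportional to the cube of
clock speed with no constant term (relative_energy is an illustrative
name):

```python
def relative_energy(cycles, seconds):
    """Energy in arbitrary units to run `cycles` evenly over `seconds`."""
    speed = cycles / seconds  # cycles per second
    power = speed ** 3        # cube law, arbitrary units
    return power * seconds


JOB_CYCLES = 100e6

ahead_of_time = relative_energy(JOB_CYCLES, 10)  # 10 million cycles/s
on_demand = relative_energy(JOB_CYCLES, 1)       # 100 million cycles/s

print(f"on-demand costs {on_demand / ahead_of_time:.0f}x the energy")
```

Any constant term in the power consumption would reduce this ratio,
since it penalizes the longer-running computation.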

To be concrete, for example, you can save power by doing the following:
- compile programs, glacially slowly, as they are typed instead of when 
  the programmer asks to see the errors
- index things promiscuously; text searches in an editor should consult an
  index instead of examining all the text in the buffer.
- zero out pages of memory in the idle task instead of when they are 
  requested
- use backing store in windowing systems instead of redrawing window contents
  when the windows are exposed
- continue to decode digital audio files, in slower than real time, when
  they are paused
- when someone is reading a document onscreen, render the parts just
  offscreen ahead of time, just fast enough that you'll be done shortly
  before they go to look at those parts.  This will prevent having to
  render those parts at full CPU speed when the person goes to look at them.

Optimizations like these will generally improve responsiveness for
systems where power consumption doesn't matter, too, but cutting CPU
power consumption by an order of magnitude or more might be a more
compelling reason for the extra complexity they impose.  (All this
makes me wonder why my Samsung cell phone apparently re-sorts my phone
book every time I go to look at it.)

Of course, all of these have their CPU and memory costs, so they're
not no-brainers and must be applied judiciously, with the right
heuristics to know when to stop.

Backlit LCD displays will probably be replaced with one or another
variety of electrostatic "electronic ink" (like E Ink's or
gyriconmedia.com's) soon; this will be a major improvement both for
outdoor use of laptops and for power consumption.

-- 
<kragen@pobox.com>       Kragen Sitaker     <http://www.pobox.com/~kragen/>
"Why are you withholding me?" -- name withheld  "Oh... And dig this:  I
am a fish.  'Nuff said." -- Joe Blaylock (no further explanation)
These are the denizens of the CLUG mailing list.  Their five-year mission: