a better reload() for Python

Kragen Sitaker kragen@pobox.com
Thu, 24 Jan 2002 03:38:02 -0500 (EST)


This should be available at http://pobox.com/~kragen/sw/newreload-2.tar.gz

"""newreload: a better reload() for Python

reload() has some bugs --- well, let's say documented behavior I don't
like:
1. modules the reloaded module imports don't get reloaded.  This means you
   have to know something about the implementation of a module to reload it
   successfully.
2. reloading the module makes the new version available to modules that have
   imported it, but not to modules that have imported things from it.
3. reloading a module with syntax errors works OK --- it raises an exception
   from reload() and doesn't break the already-loaded module.  But
   reloading a module with semantic errors (e.g. NameErrors during
   initialization) leaves the module in a broken state --- the
   (presumably incorrect) bindings it established before the exception
   remain, and the bindings it would have established after the
   exception (without which the first set of bindings may fail, even
   if they weren't otherwise broken) will not exist.
4. reloading a module from which you have deleted a binding does not delete
   the binding from the module in memory.  This means that your module may
   continue to work when you reload it, even if it's buggy, because of old
   objects sticking around.  The Python manual treats this as a feature that
   lets you reload more quickly, but I've seen it hide bugs more often than
   I've seen it speed reloading.
5. Modules don't get reloaded automatically when they've changed, so
   I'm often left wondering why my bugfix doesn't have any effect.
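Problems (3) and (4) are easy to reproduce with Python 3's importlib.reload. Like the builtin reload() described here, it re-executes the new source inside the existing module dictionary. The sketch below assumes a modern Python 3 and a throwaway module with the hypothetical name reload_demo, written to a temporary directory. It is an illustration only, not part of newreload.

```python
import importlib
import pathlib
import sys
import tempfile


def show_reload_leftovers():
    # Returns (deleted binding survives, module left half-updated).
    saved_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # always compile from source in this demo
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = pathlib.Path(tmp, "reload_demo.py")
            path.write_text("a = 1\nb = 2\n")
            sys.path.insert(0, tmp)
            try:
                importlib.invalidate_caches()
                mod = importlib.import_module("reload_demo")

                # Problem (4): delete the binding for b, then reload.
                path.write_text("a = 10\n")
                importlib.reload(mod)
                stale_b_survives = hasattr(mod, "b")

                # Problem (3): a NameError partway through initialization.
                path.write_text("a = 100\nundefined_name\nc = 3\n")
                try:
                    importlib.reload(mod)
                except NameError:
                    pass
                # a was rebound before the error; c was never defined.
                half_updated = mod.a == 100 and not hasattr(mod, "c")
                return stale_b_survives, half_updated
            finally:
                sys.path.remove(tmp)
                sys.modules.pop("reload_demo", None)
    finally:
        sys.dont_write_bytecode = saved_flag


print(show_reload_leftovers())  # (True, True)
```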

I don't know how to solve problems (1), (2), and (5).  (Well, I have
some ideas, but this module doesn't implement them.)  This module
solves problems (3) and (4), at the cost of generating some cyclic
garbage.

Problem (3) also affects importing modules: importing a broken module
still leaves the partially initialized module in sys.modules.  This
module doesn't fix that problem.

(3) is a particularly important point for me, because I like to
include a small test suite in my modules which raises an exception if
semantic errors are detected.  Fixing (3) means I can reload my
modules with impunity.  (I'd like to work on an automatic reload
system that reloads the modules whenever I save them, solving problems
(1) and (5); being able to reload modules with impunity is the first
step.)

The module can still access the old versions of its attributes, and
in a backward-compatible fashion.  If the module imports itself, the
module it imports will have its previous contents, if any.  (The first
time it's loaded, it gets access to itself, which is initially empty.)
It can use 'hasattr' or try:...except
AttributeError: to do things differently depending on whether there's
a previous version of it already loaded or not.  For example, a module
named foo might count the number of times it's been loaded as follows:

import foo
if hasattr(foo, 'loadcount'):
    loadcount = foo.loadcount + 1
else:
    loadcount = 1

To work reliably, the test and fetch must happen before the attribute
is defined, because during initial import and old-style reload,
foo.loadcount refers to the loadcount variable in the module that is
loading --- the one you're just about to redefine.  While reloading
with this function, foo.loadcount refers to the loadcount variable in
another module of the same name, which is the older version that will
be replaced upon successful reload.  So from the time you define
'loadcount' until the time you finish reloading, 'foo.loadcount' will
have a different meaning in the two situations.
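The self-import idiom can be checked under Python 3, with importlib.reload standing in for the old-style reload. This sketch does not use the reload() defined in this module. The module name loadcount_demo and the helper count_loads are hypothetical.

```python
import importlib
import pathlib
import sys
import tempfile

# Hypothetical module that counts its own loads with the self-import idiom.
DEMO_NAME = "loadcount_demo"
DEMO_SOURCE = "\n".join([
    "import loadcount_demo",
    "if hasattr(loadcount_demo, 'loadcount'):",
    "    loadcount = loadcount_demo.loadcount + 1",
    "else:",
    "    loadcount = 1",
    "",
])


def count_loads(reloads):
    # Import the demo module once, reload it `reloads` times, return loadcount.
    with tempfile.TemporaryDirectory() as tmp:
        pathlib.Path(tmp, DEMO_NAME + ".py").write_text(DEMO_SOURCE)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(DEMO_NAME)
            for _ in range(reloads):
                module = importlib.reload(module)
            return module.loadcount
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(DEMO_NAME, None)


print(count_loads(2))  # 3: one import plus two reloads
```

On the first import, `import loadcount_demo` returns the partially initialized module already sitting in sys.modules. That module has no loadcount yet, so the count starts at 1.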

(I tried having a name '__old__' that was the old version of the
module, but I decided that the above method was probably better, since
it was backward-compatible and less complex.)

This module uses a hideous hack that should work perfectly: it finds
an unused module name in sys.modules, creates a temporary module
there, imports the module into *there*, clears out the old module,
moves the contents of the temporary module into the old module, and
then deletes the temporary module.  I wish I had a version of
imp.load_module that took a reference to a module as its first
argument instead of the name of a module.
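Here is a minimal Python 3 sketch of the same temporary-module idea. It assumes a plain .py module that has a __file__, and it uses importlib.util (spec_from_file_location, module_from_spec) in place of imp.load_module. atomic_reload is a hypothetical name, not this module's reload(). Because the scratch module is never registered in sys.modules, it needs no unused name, and a self-import inside the reloaded code still finds the old module.

```python
import importlib.util


def atomic_reload(module):
    # Run the module's current source in a scratch module, and touch
    # `module` only after that has succeeded.
    spec = importlib.util.spec_from_file_location(module.__name__, module.__file__)
    scratch = importlib.util.module_from_spec(spec)
    # Any exception (SyntaxError, NameError, ...) propagates from here,
    # leaving the old module exactly as it was.
    spec.loader.exec_module(scratch)
    # Success: drop every old binding, then install the new ones.
    module.__dict__.clear()
    module.__dict__.update(scratch.__dict__)
    # Keep the scratch module alive, as the pure-Python fallback in
    # reload() does with __importedversion__.
    module.__importedversion__ = scratch
    return module
```

As in the fallback path described below, functions created by the new code keep scratch.__dict__ as their globals, so the caveats about running without setmoddict still apply.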

The 'setmoddict' extension module is optional, but without it, this
version of 'reload' has two serious bugs:

* if you set attributes of the module by accessing it by name, they
  won't be visible as global variable changes to functions in the
  module; and if functions in the module change global variables in
  the module, the changes won't be visible as changes in attributes of
  the module.

* objects that refer to the old contents of the module will probably
  break.  This happens sometimes with the old 'reload' semantics;
  things break in more or less the same way they break with the
  current 'reload', except that deleted globals get deleted.
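The first bug can be shown without any files. In this Python 3 sketch, the helper show_split_globals and the module name demo are hypothetical. The new code runs in a scratch module, and its contents are then copied into the module that other code holds, as the fallback path does. The copied function keeps reading and writing the scratch module's globals.

```python
import types


def show_split_globals():
    # Run the "new version" in a scratch module.
    scratch = types.ModuleType("demo")
    exec(
        "counter = 0\n"
        "def bump():\n"
        "    global counter\n"
        "    counter += 1\n"
        "    return counter\n",
        scratch.__dict__,
    )
    # Copy its contents into the module that other code already holds.
    module = types.ModuleType("demo")
    module.__dict__.update(scratch.__dict__)

    first = module.bump()   # increments scratch's counter to 1
    seen = module.counter   # the module's own copy is still 0
    module.counter = 100    # rebinding through the module...
    second = module.bump()  # ...is invisible to bump(), which returns 2
    return first, seen, second


print(show_split_globals())  # (1, 0, 2)
```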

With setmoddict, using this module to reload a module will probably
turn the old contents of the module into cyclic garbage.  Any
functions that were defined in the old module contain a reference to
that module's dictionary, and that dictionary may contain references
to those functions, or to a class that contains references to them.
So Python's reference counting won't free them even after nothing
outside the cycle refers to them; you'll have to rely on the
cycle-finding gc in the gc module for that.
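The function/globals cycle can be observed with a weak reference. This assumes a modern CPython 3, where deleting a module object no longer clears its dictionary. The helper old_contents_need_cycle_gc is hypothetical. It briefly turns off automatic collection so that the result does not depend on when the collector happens to run.

```python
import gc
import types
import weakref


def old_contents_need_cycle_gc():
    # Returns (alive after last reference dropped, alive after gc.collect()).
    was_enabled = gc.isenabled()
    gc.disable()  # stop the automatic collector from running mid-demo
    try:
        old = types.ModuleType("old_version")
        exec("def f():\n    return 42\n", old.__dict__)
        probe = weakref.ref(old.f)
        del old  # the module object goes, but f <-> its globals dict remains
        alive_after_del = probe() is not None
        gc.collect()  # the cycle detector finds and frees the pair
        alive_after_collect = probe() is not None
        return alive_after_del, alive_after_collect
    finally:
        if was_enabled:
            gc.enable()


print(old_contents_need_cycle_gc())  # (True, False)
```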

Fixing the cyclic-garbage-on-reload problem is hard.  Probably the
right solution is to use a GC that doesn't suck; reference-counting is
broken by design and inherently pathetically slow.

"""
#'#"

import sys, imp, string

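def _find_module_with_dots(name):
    """Like imp.find_module, but supports module names with dots."""
    dotindex = string.rfind(name, '.')
    if dotindex == -1:
        return imp.find_module(name)
    else:
        # __import__('a.b') returns the top-level package 'a', not 'a.b',
        # so fetch the parent package itself from sys.modules.
        parentname = name[:dotindex]
        __import__(parentname)
        return imp.find_module(name[dotindex+1:],
                               sys.modules[parentname].__path__)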

def _get_empty_module():
    """Returns the name of a module that doesn't exist."""
    i = 0
    while 1:
        modname = "newreload.tmp_mod_%d" % i
        if not sys.modules.has_key(modname):
            return modname
        i = i + 1

def reload(module):
    """A more atomic version of the builtin Python function reload.

    Doesn't leave old bindings around, doesn't install new bindings
    unless the reload finishes successfully (without raising an
    exception).

    """

    someplace_funky = _get_empty_module()
    name = module.__name__
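    # Keep a local reference to sys: if we are reloading this very
    # module, its global scope may be temporarily empty (see below).
    sys = __import__('sys')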
    file, pathname, desc = _find_module_with_dots(name)
    # from this point forward, we have to be careful to clean up:
    # close the file, delete the entry from sys.modules
    try:
        # Pre-create the temporary module under the original name:
        # imp.load_module reuses an existing sys.modules entry, so
        # __name__ is the right thing during loading.
        sys.modules[someplace_funky] = imp.new_module(name)
        newmodule = imp.load_module(someplace_funky, file, pathname, desc)
        # Rather than moving the new module over, we move its contents into
        # the old module.  This way, other modules that imported the old module
        # now see its new contents.
        # What we'd *really* like is to say
        # 'module.__dict__ = newmodule.__dict__', but we can't do that in pure
        # Python.  I wrote a C module 'setmoddict' to let me do this.
        try:
            from setmoddict import setmoddict
            # note that this makes the old module dictionary unreachable
            setmoddict(module, newmodule.__dict__)
            # This keeps the new module's dictionary from being cleared when
            # the new module is deleted.
            setmoddict(newmodule, {})
        except ImportError:
            # This could be written without accessing __dict__ at all, but this
            # is more concise and probably more efficient.
            module.__dict__.clear()
            # This next statement can't safely refer to anything in
            # global scope. We might be reloading ourselves, in which
            # case global scope is temporarily empty.
            module.__dict__.update(newmodule.__dict__)
            # This is a kludge and a half.  Apparently _PyModule_Clear
            # sets the contents of the module's dictionary to None when
            # the module goes out of existence; this means that functions
            # from the module get None whenever they refer to a global
            # variable in that module.  So we save a reference to the
            # module we imported stuff into so that doesn't happen.
            module.__importedversion__ = newmodule
        return module
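    finally:
        if sys.modules.has_key(someplace_funky):
            del sys.modules[someplace_funky]
        # imp.find_module returns None instead of an open file for
        # packages and built-in modules.
        if file:
            file.close()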
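_find_module_with_dots depends on the imp module, which was removed in Python 3.12. Under modern Python 3, importlib.util.find_spec accepts dotted names directly and imports the parent packages itself. The sketch below returns the source path that find_spec reports; find_module_file is a hypothetical name. It also shows the __import__ behavior that makes dotted lookups tricky: given a dotted name, __import__ returns the top-level package.

```python
import importlib.util
import sys


def find_module_file(name):
    # Python 3 analogue of _find_module_with_dots: find_spec handles dotted
    # names and imports the parent packages as needed.
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# __import__ on a dotted name returns the top-level package, not the
# innermost one, so a nested package must be looked up in sys.modules.
top = __import__("xml.dom")
print(top is sys.modules["xml"])            # True
print(find_module_file("xml.dom.minidom"))  # a path ending in minidom.py
```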



Here's the optional "setmoddict" extension module the above code uses:
/* Module object modification behind the back of moduleobject.c */

#include "Python.h"

/* This structure is cut and pasted from Objects/moduleobject.c from
 * 2.1.1.  This is generally considered a Bad Thing To Do; it violates
 * encapsulation.  AS A RESULT, UNDER NO CIRCUMSTANCES SHOULD ANYONE
 * USE THIS CODE IN PRODUCTION.  IT COULD MAKE PYTHON CRASH STARTING
 * WITH THE NEXT VERSION.  If setting a module's __dict__ becomes a
 * generally accepted thing to do, then we can add this code to
 * moduleobject.c, where it will be maintained.
 * This module appears to work with Python 2.1.1 and 1.5.2.
 */

typedef struct {
	PyObject_HEAD
	PyObject *md_dict;
} PyModuleObject;


static PyObject *
setmoddict(PyObject *self, PyObject *args)
{
  PyObject *dict, *olddict;
  PyModuleObject *m;
  if (!PyArg_ParseTuple(args, "O!O!:setmoddict", 
			&PyModule_Type, &m, 
			&PyDict_Type, &dict))
    return NULL;
  olddict = m->md_dict;
  m->md_dict = dict;
  Py_INCREF(dict);    /* inc first in case of self-assignment */
  Py_DECREF(olddict);
  Py_INCREF(Py_None); /* the caller gets a new reference to None */
  return Py_None;
}

static PyMethodDef setmoddictmethods[] = {
  {"setmoddict", setmoddict, METH_VARARGS},
  {NULL, NULL}
};

void
initsetmoddict(void)
{
  (void)Py_InitModule("setmoddict", setmoddictmethods);
}
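setmoddict exists because Python code cannot rebind a module's __dict__, as the comment in reload() notes. Assuming CPython 3, the check below shows that the assignment is still refused. It does not exercise the C module, which is written against the Python 2 C API (Py_InitModule) and would not compile under Python 3 as is. try_rebinding_module_dict is a hypothetical name.

```python
import types


def try_rebinding_module_dict():
    # Returns the name of the exception raised when Python code assigns
    # a module's __dict__, or None if the assignment were allowed.
    module = types.ModuleType("scratch")
    try:
        module.__dict__ = {}
    except (AttributeError, TypeError) as exc:
        return type(exc).__name__
    return None


print(try_rebinding_module_dict())
```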


To compile this for Python 2.1 or earlier, save the C source above as
setmoddictmodule.c and put the following in a file called Setup.in:
*shared*
setmoddict setmoddictmodule.c -ldl

Then copy Makefile.pre.in from the Misc directory of the Python
source, run "make -f Makefile.pre.in boot", and then run "make".