Monday, December 11, 2006

Python memory leaks

Tracking Down Memory Leaks in Python


This Page Mostly Obsolete

Most of this page is obsolete nowadays, because Python (as of version 2.0) includes a garbage collector that detects reference cycles. Leaks like these are still possible, but usually only when an older Python extension module implementing a container type is in use, one that doesn't adhere to the new GC API.


Long-running processes have a nasty habit of exposing Python's Achilles' heel: memory leaks created by cycles (objects that point at each other, either directly or circuitously). Reference counting alone cannot collect cycles. Here's one way to create a cycle:

class thing:
    pass

                    # refcount(a)  refcount(b)
a = thing()         #      1
b = thing()         #      1            1
a.other = b         #      1            2
b.other = a         #      2            2

del a               #      1            2
del b               #      1            1

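Objects a and b are now unreachable, but each still holds a reference to the other, so neither refcount can ever drop to zero: the two objects have become immortal.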
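With the cycle collector that arrived in Python 2.0, the example above no longer leaks for good. The following is a minimal sketch, assuming Python 3. The class Thing and the helper make_cycle are hypothetical names. gc.disable() stands in for "reference counting only", and weak references from the standard weakref module let us check whether the objects were freed without keeping them alive.

```python
import gc
import weakref


class Thing:
    pass


def make_cycle():
    """Build the a <-> b cycle from the example; return weak references to it."""
    a = Thing()
    b = Thing()
    a.other = b
    b.other = a
    # Weak references do not raise the refcount, so they can observe
    # whether the objects are freed without keeping them alive.
    return weakref.ref(a), weakref.ref(b)


gc.disable()  # pretend we only have reference counting
ref_a, ref_b = make_cycle()
alive_before = ref_a() is not None and ref_b() is not None
gc.enable()

found = gc.collect()  # run the cycle detector explicitly
alive_after = ref_a() is not None or ref_b() is not None

print("alive with refcounting only:", alive_before)  # True
print("alive after gc.collect():", alive_after)      # False
print("unreachable objects found:", found)
```

The exact number returned by gc.collect() varies between Python versions, because it also counts internal objects such as instance dictionaries.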

Large and complex systems may create non-obvious cycles. Here are a few quick hints for avoiding some of the ones I've run into:

  • Be careful with bound method objects. A bound method is created whenever you refer to a method through an instance, and it holds a reference to that instance. This is harmless when you simply call a method, but a common ('functional') programming style passes such objects around, stores them in variables, and so on. In one case, storing a bound method in the object itself made the object immortal, because the object and the method referred to each other. Either del the stored method manually, or change your code.
  • Tracebacks are very dangerous: a traceback keeps the frames it passes through alive, and each frame keeps its local variables alive. To be really safe, del a traceback if you have a handle to it. Another good idea is to assign None to both sys.traceback and sys.exc_traceback.
    Tracebacks can capture a large number of objects, especially within Medusa. For example, any exception handler that is called from within the polling loop (no matter how deeply) should do something like this...
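    import sys

    def my_method (self):
        try:
            do_something()
        except:
            try:
                ei = sys.exc_info()
                # [... report error ...]
            finally:
                # ei[2] is a traceback that references this frame, and this
                # frame references ei: delete it to break the cycle
                del ei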

    ...otherwise it will capture references to every object in the socket map. I have plugged some really bad leaks this way.
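  • Keep track of your objects. Either keep a count of how many are around (incremented in __init__ and decremented in a __del__ method), or keep a dictionary of the addresses of all outstanding objects. This will help locate leaks. Be aware that in Python versions before 3.4, the cycle collector does not free a cycle that contains an object with a __del__ method; such cycles end up in gc.garbage instead.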
    class thing:

        all_things = {}

        def __init__ (self):
            thing.all_things[id(self)] = 1

        def __del__ (self):
            del thing.all_things[id(self)]
    The resurrect module will let you resurrect leaked objects from their addresses. Using this module should fill you with shame. Make sure no one is looking.
    import resurrect

    for addr in thing.all_things.keys():
        r = resurrect.conjure (addr)
        # examine r...
  • Here's the easiest way to find leaking objects: examine the reference count of each of your class objects. Because every instance holds a reference to its class, this will be roughly equal to the number of extant instances.
    # -*- Mode: Python; tab-width: 4 -*-

    import sys
    import types

    def get_refcounts():
        d = {}
        # collect all classic (old-style) classes; new-style classes are
        # instances of type and would need isinstance(o, type) as well
        for m in sys.modules.values():
            for sym in dir(m):
                o = getattr (m, sym)
                if type(o) is types.ClassType:
                    d[o] = sys.getrefcount (o)
        # sort by refcount, highest first
        pairs = map (lambda x: (x[1],x[0]), d.items())
        pairs.sort()
        pairs.reverse()
        return pairs

    def print_top_100():
        for n, c in get_refcounts()[:100]:
            print '%10d %s' % (n, c.__name__)

    if __name__ == '__main__':
        print_top_100()
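The bound-method hint can be reproduced in Python 3. This is a sketch, assuming Python 3.4 or later. Connection and survives_refcounting are hypothetical names, and gc.disable() simulates a Python without the cycle collector. weakref.WeakMethod is one standard-library way to store a method without creating the cycle; it is not necessarily what the original author did.

```python
import gc
import weakref


class Connection:
    """Hypothetical object that stores one of its own methods as a callback."""

    def __init__(self, use_weak_method):
        if use_weak_method:
            # WeakMethod holds only weak references to self and the function,
            # so no cycle is formed. Call it as self.callback()().
            self.callback = weakref.WeakMethod(self.handle)
        else:
            # A bound method holds a strong reference to self, so
            # self -> __dict__ -> bound method -> self is a cycle.
            self.callback = self.handle

    def handle(self):
        return "handled"


def survives_refcounting(use_weak_method):
    """Return True if a Connection outlives its last external reference
    when only reference counting is active."""
    gc.disable()
    try:
        probe = weakref.ref(Connection(use_weak_method))
        alive = probe() is not None
    finally:
        gc.enable()
        gc.collect()  # clean up whatever the cycle kept alive
    return alive


print(survives_refcounting(use_weak_method=False))  # True: immortal without the GC
print(survives_refcounting(use_weak_method=True))   # False: freed immediately

conn = Connection(use_weak_method=True)
print(conn.callback()())  # handled
```

A WeakMethod returns None once its object is gone, so code that calls it should be prepared for that case.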
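The traceback hint still applies in Python 3. There, sys.exc_info() returns (type, value, traceback), and the traceback references the handler's own frame. The following is a minimal sketch, assuming Python 3. Payload and handler are hypothetical names, and gc.disable() again simulates a Python without the cycle collector. With the collector enabled, the cycle is eventually reclaimed, but deleting ei lets reference counting free everything as soon as the function returns.

```python
import gc
import sys
import weakref


class Payload:
    """Hypothetical stand-in for a large object referenced by a frame's locals."""


def handler(delete_exc_info):
    payload = Payload()
    probe = weakref.ref(payload)
    try:
        raise ValueError("boom")
    except ValueError:
        ei = sys.exc_info()
        # (report the error using ei here)
        if delete_exc_info:
            # ei[2] is a traceback whose frame holds ei itself: break the cycle
            del ei
    return probe


gc.disable()  # simulate reference counting only
leaky_probe = handler(delete_exc_info=False)
clean_probe = handler(delete_exc_info=True)
leaked = leaky_probe() is not None
cleaned = clean_probe() is None
gc.enable()
gc.collect()
reclaimed = leaky_probe() is None

print("payload kept alive when ei is not deleted:", leaked)  # True
print("payload freed when ei is deleted:", cleaned)          # True
print("cycle reclaimed by gc.collect():", reclaimed)         # True
```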
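In modern Python, a weakref.WeakSet can replace both the __del__ counter and the id() dictionary from the object-tracking hint. This is a sketch, assuming Python 3; Thing and live_count are hypothetical names. The registry holds only weak references and needs no __del__ method. It therefore avoids the pre-3.4 problem where a __del__ method made cycles uncollectable. A leaked cycle shows up as a count that stays high until gc.collect() runs.

```python
import gc
import weakref


class Thing:
    # A WeakSet registry: membership does not keep an instance alive, and the
    # entry disappears automatically when the instance is freed, so no
    # __del__ method is needed.
    all_things = weakref.WeakSet()

    def __init__(self):
        Thing.all_things.add(self)


def live_count():
    return len(Thing.all_things)


kept = [Thing() for _ in range(3)]
temp = Thing()
after_create = live_count()   # 4
del temp
after_del = live_count()      # 3

gc.disable()                  # make sure the collector does not run early
a, b = Thing(), Thing()
a.other, b.other = b, a       # a leaked cycle, as in the first example
del a, b
with_cycle = live_count()     # 5: the cycle is still alive
gc.enable()
gc.collect()
after_collect = live_count()  # 3

print(after_create, after_del, with_cycle, after_collect)  # 4 3 5 3
```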
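Python 3 no longer has old-style classes or types.ClassType. The same refcount survey can instead be sketched by treating every instance of type as a class. This is a sketch, assuming CPython 3, since what sys.getrefcount reports is CPython-specific. print_top and the Widget class are hypothetical names. The survey relies on each instance of a Python-defined class holding a reference to its class, and the short demonstration at the end checks that.

```python
import sys


def get_refcounts():
    """Return (refcount, class) pairs for every class found in a loaded
    module, highest refcount first."""
    counts = {}
    for module in list(sys.modules.values()):  # copy: getattr may import more
        try:
            names = dir(module)
        except Exception:
            continue
        for name in names:
            try:
                obj = getattr(module, name)
                is_class = isinstance(obj, type)  # every class is a type instance
            except Exception:
                continue  # some lazy or proxy attributes raise on access
            if is_class:
                # key by id() because a few metaclasses make classes unhashable
                counts[id(obj)] = (sys.getrefcount(obj), obj)
    return sorted(counts.values(), key=lambda pair: pair[0], reverse=True)


def print_top(n=100):
    for count, cls in get_refcounts()[:n]:
        print("%10d %s" % (count, cls.__name__))


class Widget:
    """Hypothetical class used to show instances raising the class refcount."""


before = sys.getrefcount(Widget)
widgets = [Widget() for _ in range(50)]
after = sys.getrefcount(Widget)
print(after - before)  # 50 on a standard (non-free-threaded) CPython build
```

Free-threaded CPython builds count references to classes differently, so there the class refcount may not track the number of instances.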

Notes

An interface to malloc_stats() [Linux]

Original: http://www.nightmare.com/medusa/memory-leaks.html
