Contents
I. Translation
1. References
2. Reference Callbacks
3. Proxies
4. Cyclic References
5. Caching Objects
II. Original Text
1. References
2. Reference Callbacks
3. Proxies
4. Cyclic References
5. Caching Objects
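I originally made this translation for my own use and am publishing it so that anyone who needs it can refer to it directly. The translation has been revised and polished in places, and the original text is included as well.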
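weakref: Garbage-collectable references to objects
Author: Doug Hellmann
The following is an overview of the weakref module:
Purpose: Refer to an “expensive” object, but allow it to be garbage collected if there are no other non-weak references.
Availability: Python 2.1 and later.
The weakref module supports weak references to objects. A normal reference increments the object's reference count and prevents it from being garbage collected. This is not always desirable, for example when circular references are present or when building a cache of objects that should be deleted when memory runs short.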
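To see the reference-count difference concretely, here is a small Python 3 sketch. It relies on CPython's sys.getrefcount(), whose absolute values are an implementation detail, so it only compares counts before and after; the ExpensiveObject class here is a bare stand-in.

```python
import sys
import weakref


class ExpensiveObject:
    """Bare stand-in for an object that is costly to create."""


obj = ExpensiveObject()
base = sys.getrefcount(obj)          # CPython only; includes getrefcount's own argument

r = weakref.ref(obj)                 # a weak reference leaves the count alone
after_weak = sys.getrefcount(obj)

alias = obj                          # an ordinary reference adds one
after_strong = sys.getrefcount(obj)

print(after_weak - base)             # 0
print(after_strong - base)           # 1
print(weakref.getweakrefcount(obj))  # 1: the single weak reference r
```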
Weak references to objects are managed through the ref class. To retrieve the original object, call the reference object.
import weakref
class ExpensiveObject(object):
    def __del__(self):
        print('(Deleting %s)' % self)
obj = ExpensiveObject()
r = weakref.ref(obj)
print('obj:', obj)
print('ref:', r)
print('r():', r())
print('deleting obj')
del obj
print('r():', r())
In this example, obj is deleted before the second call to the reference, so the ref returns None.
$ python weakref_ref.py
obj: <__main__.ExpensiveObject object at 0x10046d410>
ref:
r(): <__main__.ExpensiveObject object at 0x10046d410>
deleting obj
(Deleting <__main__.ExpensiveObject object at 0x10046d410>)
r(): None
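Because the referent can disappear at any time, a common habit is to call the reference once, keep the result in a local variable, and test that for None before using it. The following is a Python 3 sketch of that pattern; describe() is a hypothetical helper, not code from the article.

```python
import weakref


class ExpensiveObject:
    pass


def describe(ref):
    """Hypothetical helper: dereference once, then use the local result."""
    obj = ref()            # the local variable keeps the object alive while in use
    if obj is None:
        return 'referent is gone'
    return 'alive: %s' % type(obj).__name__


obj = ExpensiveObject()
r = weakref.ref(obj)
print(describe(r))  # alive: ExpensiveObject
del obj
print(describe(r))  # referent is gone
```

Calling r() twice, once to check and once to use, could in principle see the object vanish in between; the single call avoids that.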
The ref constructor accepts an optional second argument: a callback function to invoke when the referenced object is deleted.
import weakref
class ExpensiveObject(object):
    def __del__(self):
        print('(Deleting %s)' % self)
def callback(reference):
    """Invoked when referenced object is deleted"""
    print('callback(', reference, ')')
obj = ExpensiveObject()
r = weakref.ref(obj, callback)
print('obj:', obj)
print('ref:', r)
print('r():', r())
print('deleting obj')
del obj
print('r():', r())
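The callback receives the reference object as its argument, after the reference has become “dead” and no longer refers to the original object. This can be used, for example, to remove the weak reference object from a cache.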
obj: <__main__.ExpensiveObject object at 0x10046c610>
ref:
r(): <__main__.ExpensiveObject object at 0x10046c610>
deleting obj
callback( )
(Deleting <__main__.ExpensiveObject object at 0x10046c610>)
r(): None
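As a sketch of the cache-cleanup use just mentioned, the following Python 3 code keeps a hypothetical registry dict of weak references and lets each reference's callback remove its own entry. It assumes CPython, where the object is freed, and the callback runs, as soon as the last strong reference goes away.

```python
import weakref


class ExpensiveObject:
    def __init__(self, name):
        self.name = name


registry = {}  # hypothetical: name -> weakref.ref


def register(name, obj):
    def remove(ref):
        # Runs after obj has died; ref is the now-dead weak reference.
        # Only drop the entry if it still holds this same reference.
        if registry.get(name) is ref:
            del registry[name]
    registry[name] = weakref.ref(obj, remove)


obj = ExpensiveObject('one')
register('one', obj)
print(sorted(registry))  # ['one']
del obj                  # last strong reference gone; the callback fires
print(sorted(registry))  # []
```

The identity check keeps a late callback from removing a newer entry that was registered under the same name after the old object died.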
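Instead of using ref directly, it can be more convenient to use a proxy. A proxy can be used as though it were the original object, so there is no need to call the ref first to access the object.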
import weakref
class ExpensiveObject(object):
    def __init__(self, name):
        self.name = name
    def __del__(self):
        print('(Deleting %s)' % self)
obj = ExpensiveObject('My Object')
r = weakref.ref(obj)
p = weakref.proxy(obj)
print('via obj:', obj.name)
print('via ref:', r().name)
print('via proxy:', p.name)
del obj
print('via proxy:', p.name)
If the proxy is accessed after the referenced object has been removed, a ReferenceError exception is raised.
via obj: My Object
via ref: My Object
via proxy: My Object
(Deleting <__main__.ExpensiveObject object at 0x10046b490>)
via proxy:
Traceback (most recent call last):
  File "weakref_proxy.py", line 26, in <module>
    print 'via proxy:', p.name
ReferenceError: weakly-referenced object no longer exists
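Code that holds a proxy therefore has to be ready for ReferenceError. A minimal Python 3 sketch, using a hypothetical proxy_name() helper:

```python
import weakref


class ExpensiveObject:
    def __init__(self, name):
        self.name = name


def proxy_name(p):
    """Hypothetical helper: read p.name, or return None if the referent is gone."""
    try:
        return p.name
    except ReferenceError:   # raised by any use of a dead proxy
        return None


obj = ExpensiveObject('My Object')
p = weakref.proxy(obj)
print(proxy_name(p))  # My Object
del obj
print(proxy_name(p))  # None
```

Unlike ref objects, proxies are not hashable, so they cannot serve as dictionary keys; a ref is the better choice when that is needed.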
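One use for weak references is to allow cyclic references without preventing garbage collection. The following example shows the difference between using regular objects and proxies when a graph contains a cycle.
First, we need a Graph class that accepts any object given to it as the “next” node in the sequence.
For the sake of brevity, this Graph supports a single outgoing reference from each node, which results in boring graphs but makes it easy to create cycles.
The function demo() is a utility function that exercises the graph class by creating a cycle and then removing various references.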
import gc
from pprint import pprint
import weakref
class Graph(object):
    def __init__(self, name):
        self.name = name
        self.other = None
    def set_next(self, other):
        print '%s.set_next(%s (%s))' % (self.name, other, type(other))
        self.other = other
    def all_nodes(self):
        "Generate the nodes in the graph sequence."
        yield self
        n = self.other
        while n and n.name != self.name:
            yield n
            n = n.other
        if n is self:
            yield n
        return
    def __str__(self):
        return '->'.join([n.name for n in self.all_nodes()])
    def __repr__(self):
        return '%s(%s)' % (self.__class__.__name__, self.name)
    def __del__(self):
        print '(Deleting %s)' % self.name
        self.set_next(None)
class WeakGraph(Graph):
    def set_next(self, other):
        if other is not None:
            # See if we should replace the reference
            # to other with a weakref.
            if self in other.all_nodes():
                other = weakref.proxy(other)
        super(WeakGraph, self).set_next(other)
        return
def collect_and_show_garbage():
    "Show what garbage is present."
    print 'Collecting...'
    n = gc.collect()
    print 'Unreachable objects:', n
    print 'Garbage:',
    pprint(gc.garbage)
def demo(graph_factory):
    print 'Set up graph:'
    one = graph_factory('one')
    two = graph_factory('two')
    three = graph_factory('three')
    one.set_next(two)
    two.set_next(three)
    three.set_next(one)
    print
    print 'Graphs:'
    print str(one)
    print str(two)
    print str(three)
    collect_and_show_garbage()
    print
    three = None
    two = None
    print 'After 2 references removed:'
    print str(one)
    collect_and_show_garbage()
    print
    print 'Removing last reference:'
    one = None
    collect_and_show_garbage()
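Now we can set up a test program using the gc module to help us debug the leak. The DEBUG_LEAK flag causes gc to print information about objects that cannot be seen other than through the garbage collector's own reference to them.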
import gc
from pprint import pprint
import weakref
from weakref_graph import Graph, demo, collect_and_show_garbage
gc.set_debug(gc.DEBUG_LEAK)
print 'Setting up the cycle'
print
demo(Graph)
print
print 'Breaking the cycle and cleaning up garbage'
print
gc.garbage[0].set_next(None)
while gc.garbage:
    del gc.garbage[0]
print
collect_and_show_garbage()
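Even after the local references to the Graph instances in demo() have been deleted, the graphs all show up in the garbage list and cannot be collected. The dictionaries in the garbage list hold the attributes of the Graph instances. Since we know what they are, we can forcibly delete the graphs: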
$ python -u weakref_cycle.py
Setting up the cycle
Set up graph:
one.set_next(two ())
two.set_next(three ())
three.set_next(one->two->three ())
Graphs:
one->two->three->one
two->three->one->two
three->one->two->three
Collecting...
Unreachable objects: 0
Garbage:[]
After 2 references removed:
one->two->three->one
Collecting...
Unreachable objects: 0
Garbage:[]
Removing last reference:
Collecting...
gc: uncollectable
gc: uncollectable
gc: uncollectable
gc: uncollectable
gc: uncollectable
gc: uncollectable
Unreachable objects: 6
Garbage:[Graph(one),
Graph(two),
Graph(three),
{'name': 'one', 'other': Graph(two)},
{'name': 'two', 'other': Graph(three)},
{'name': 'three', 'other': Graph(one)}]
Breaking the cycle and cleaning up garbage
one.set_next(None ())
(Deleting two)
two.set_next(None ())
(Deleting three)
three.set_next(None ())
(Deleting one)
one.set_next(None ())
Collecting...
Unreachable objects: 0
Garbage:[]
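The output above comes from Python 2. Since Python 3.4 (PEP 442), objects that define __del__ no longer make a reference cycle uncollectable, so gc.collect() frees such a cycle and gc.garbage stays empty when no debug flags are set. The sketch below checks this with a hypothetical Node class modeled on Graph, using a weak reference only to observe when the object is freed.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for Graph: one outgoing link plus a finalizer."""

    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        print('(Deleting %s)' % self.name)


gc.disable()  # keep automatic collection from running before we are ready
try:
    a, b = Node('a'), Node('b')
    a.other, b.other = b, a              # strong cycle: a -> b -> a
    probe = weakref.ref(a)               # observe a without keeping it alive
    del a, b
    alive_before = probe() is not None   # the cycle keeps both objects alive
    unreachable = gc.collect()           # the cycle detector finds and frees them
    alive_after = probe() is not None
finally:
    gc.enable()

print(alive_before, alive_after, gc.garbage)  # True False []
```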
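Now let's define a smarter WeakGraph class that knows not to create cycles using regular references, and instead uses a weak reference (a proxy) when it detects a cycle.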
import gc
from pprint import pprint
import weakref
from weakref_graph import Graph, demo
class WeakGraph(Graph):
    def set_next(self, other):
        if other is not None:
            # See if we should replace the reference
            # to other with a weakref.
            if self in other.all_nodes():
                other = weakref.proxy(other)
        super(WeakGraph, self).set_next(other)
        return
demo(WeakGraph)
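Because WeakGraph instances use proxies to refer to objects they have already seen, the cycle is broken once demo() removes all local references to the objects, and the garbage collector can delete the objects for us.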
$ python weakref_weakgraph.py
Set up graph:
one.set_next(two ())
two.set_next(three ())
three.set_next(one->two->three ())
Graphs:
one->two->three
two->three->one->two
three->one->two->three
Collecting...
Unreachable objects: 0
Garbage:[]
After 2 references removed:
one->two->three
Collecting...
Unreachable objects: 0
Garbage:[]
Removing last reference:
(Deleting one)
one.set_next(None ())
(Deleting two)
two.set_next(None ())
(Deleting three)
three.set_next(None ())
Collecting...
Unreachable objects: 0
Garbage:[]
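The same idea appears in a more common shape in the following Python 3 sketch, built around a hypothetical TreeNode class: children are held strongly, while each node's link back to its parent is a weakref.ref. With no strong cycle, CPython's reference counting frees the parent as soon as its last outside reference is gone, even with the cyclic collector disabled.

```python
import gc
import weakref


class TreeNode:
    """Hypothetical tree node: strong links down, weak link up."""

    def __init__(self, name, parent=None):
        self.name = name
        self.children = []                      # strong references to children
        self._parent = None
        if parent is not None:
            self._parent = weakref.ref(parent)  # weak back-pointer, so no cycle
            parent.children.append(self)

    @property
    def parent(self):
        """The parent node, or None once it has been freed."""
        return self._parent() if self._parent is not None else None


gc.disable()  # show that reference counting alone is enough here
try:
    root = TreeNode('root')
    leaf = TreeNode('leaf', root)
    print(leaf.parent.name)  # root
    del root                 # drops the only strong reference to root
    print(leaf.parent)       # None
finally:
    gc.enable()
```

Making the upward link the weak one follows ownership: a parent owns its children, so it is the back-pointer that should not keep anything alive.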
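The ref and proxy classes are considered “low level”. While they are useful for maintaining weak references to individual objects and allowing cycles to be garbage collected, WeakKeyDictionary and WeakValueDictionary provide a more appropriate API if you need to create a cache of several objects.
As you might expect, WeakValueDictionary uses weak references to the values it holds, allowing them to be garbage collected when other code is not actually using them.
To illustrate the difference between memory handling with a regular dictionary and with WeakValueDictionary, let's experiment with explicitly calling the garbage collector again: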
import gc
from pprint import pprint
import weakref
gc.set_debug(gc.DEBUG_LEAK)
class ExpensiveObject(object):
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return 'ExpensiveObject(%s)' % self.name
    def __del__(self):
        print '(Deleting %s)' % self
def demo(cache_factory):
    # hold objects so any weak references
    # are not removed immediately
    all_refs = {}
    # the cache using the factory we're given
    print 'CACHE TYPE:', cache_factory
    cache = cache_factory()
    for name in ['one', 'two', 'three']:
        o = ExpensiveObject(name)
        cache[name] = o
        all_refs[name] = o
        del o # decref
    print 'all_refs =',
    pprint(all_refs)
    print 'Before, cache contains:', cache.keys()
    for name, value in cache.items():
        print ' %s = %s' % (name, value)
        del value # decref
    # Remove all references to our objects except the cache
    print 'Cleanup:'
    del all_refs
    gc.collect()
    print 'After, cache contains:', cache.keys()
    for name, value in cache.items():
        print ' %s = %s' % (name, value)
    print 'demo returning'
    return
demo(dict)
print
demo(weakref.WeakValueDictionary)
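Note that any loop variables that refer to the values being cached must be cleared explicitly to decrement the objects' reference counts. Otherwise the garbage collector would not remove the objects, and they would remain in the cache. Similarly, the all_refs variable holds references to prevent the objects from being garbage collected prematurely.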
$ python weakref_valuedict.py
CACHE TYPE:
all_refs ={'one': ExpensiveObject(one),
'three': ExpensiveObject(three),
'two': ExpensiveObject(two)}
Before, cache contains: ['three', 'two', 'one']
three = ExpensiveObject(three)
two = ExpensiveObject(two)
one = ExpensiveObject(one)
Cleanup:
After, cache contains: ['three', 'two', 'one']
three = ExpensiveObject(three)
two = ExpensiveObject(two)
one = ExpensiveObject(one)
demo returning
(Deleting ExpensiveObject(three))
(Deleting ExpensiveObject(two))
(Deleting ExpensiveObject(one))
CACHE TYPE: weakref.WeakValueDictionary
all_refs ={'one': ExpensiveObject(one),
'three': ExpensiveObject(three),
'two': ExpensiveObject(two)}
Before, cache contains: ['three', 'two', 'one']
three = ExpensiveObject(three)
two = ExpensiveObject(two)
one = ExpensiveObject(one)
Cleanup:
(Deleting ExpensiveObject(three))
(Deleting ExpensiveObject(two))
(Deleting ExpensiveObject(one))
After, cache contains: []
demo returning
WeakKeyDictionary works similarly, but uses weak references for the keys instead of the values in the dictionary.
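A typical use is a side table that attaches extra data to objects you do not own, without keeping those objects alive. The following is a Python 3 sketch with a hypothetical Widget class, assuming CPython's immediate cleanup.

```python
import weakref


class Widget:
    """Hypothetical object that we want to annotate without owning it."""


annotations = weakref.WeakKeyDictionary()  # side table: object -> extra data

w = Widget()
annotations[w] = {'clicks': 3}
print(len(annotations))  # 1
del w                    # the key object dies...
print(len(annotations))  # 0  ...and its entry disappears with it
```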
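The library documentation for weakref contains this warning:
Warning: Because a WeakValueDictionary is built on top of a Python dictionary, it must not change size while being iterated over. This can be difficult to ensure for a WeakValueDictionary, because actions performed by the program during iteration may cause items in the dictionary to vanish “by magic” (as a side effect of garbage collection).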
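One straightforward way to respect this warning is to iterate over a snapshot made with list(), so that nothing the loop does, and no entry disappearing, can change the size of the sequence being iterated. The Python 3 sketch below illustrates this; it is not taken from the article.

```python
import weakref


class ExpensiveObject:
    def __init__(self, name):
        self.name = name


keep = {n: ExpensiveObject(n) for n in ('one', 'two', 'three')}  # strong refs
cache = weakref.WeakValueDictionary(keep)

# list() copies the items before the loop starts, so deleting entries
# (or entries vanishing through garbage collection) cannot change the
# size of the sequence being iterated.
for name, obj in list(cache.items()):
    if name != 'two':
        del cache[name]
del obj  # clear the loop variable, as the article advises for cached values

print(sorted(cache.keys()))  # ['two']
```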
weakref – Garbage-collectable references to objects
by Doug Hellmann
Purpose: Refer to an “expensive” object, but allow it to be garbage collected if there are no other non-weak references.
Available In: Since 2.1
The weakref module supports weak references to objects. A normal reference increments the reference count on the object and prevents it from being garbage collected. This is not always desirable, either when a circular reference might be present or when building a cache of objects that should be deleted when memory is needed.
Weak references to your objects are managed through the ref class. To retrieve the original object, call the reference object.
import weakref
class ExpensiveObject(object):
    def __del__(self):
        print '(Deleting %s)' % self
obj = ExpensiveObject()
r = weakref.ref(obj)
print 'obj:', obj
print 'ref:', r
print 'r():', r()
print 'deleting obj'
del obj
print 'r():', r()
In this case, since obj is deleted before the second call to the reference, the ref returns None.
$ python weakref_ref.py
obj: <__main__.ExpensiveObject object at 0x10046d410>
ref:
r(): <__main__.ExpensiveObject object at 0x10046d410>
deleting obj
(Deleting <__main__.ExpensiveObject object at 0x10046d410>)
r(): None
The ref constructor takes an optional second argument that should be a callback function to invoke when the referenced object is deleted.
import weakref
class ExpensiveObject(object):
    def __del__(self):
        print '(Deleting %s)' % self
def callback(reference):
    """Invoked when referenced object is deleted"""
    print 'callback(', reference, ')'
obj = ExpensiveObject()
r = weakref.ref(obj, callback)
print 'obj:', obj
print 'ref:', r
print 'r():', r()
print 'deleting obj'
del obj
print 'r():', r()
The callback receives the reference object as an argument, after the reference is “dead” and no longer refers to the original object. This lets you remove the weak reference object from a cache, for example.
$ python weakref_ref_callback.py
obj: <__main__.ExpensiveObject object at 0x10046c610>
ref:
r(): <__main__.ExpensiveObject object at 0x10046c610>
deleting obj
callback( )
(Deleting <__main__.ExpensiveObject object at 0x10046c610>)
r(): None
Instead of using ref directly, it can be more convenient to use a proxy. Proxies can be used as though they were the original object, so you do not need to call the ref first to access the object.
import weakref
class ExpensiveObject(object):
    def __init__(self, name):
        self.name = name
    def __del__(self):
        print '(Deleting %s)' % self
obj = ExpensiveObject('My Object')
r = weakref.ref(obj)
p = weakref.proxy(obj)
print 'via obj:', obj.name
print 'via ref:', r().name
print 'via proxy:', p.name
del obj
print 'via proxy:', p.name
If the proxy is accessed after the referent object is removed, a ReferenceError exception is raised.
$ python weakref_proxy.py
via obj: My Object
via ref: My Object
via proxy: My Object
(Deleting <__main__.ExpensiveObject object at 0x10046b490>)
via proxy:
Traceback (most recent call last):
  File "weakref_proxy.py", line 26, in <module>
    print 'via proxy:', p.name
ReferenceError: weakly-referenced object no longer exists
One use for weak references is to allow cyclic references without preventing garbage collection. This example illustrates the difference between using regular objects and proxies when a graph includes a cycle.
First, we need a Graph class that accepts any object given to it as the “next” node in the sequence. For the sake of brevity, this Graph supports a single outgoing reference from each node, which results in boring graphs but makes it easy to create cycles. The function demo() is a utility function to exercise the graph class by creating a cycle and then removing various references.
import gc
from pprint import pprint
import weakref
class Graph(object):
    def __init__(self, name):
        self.name = name
        self.other = None
    def set_next(self, other):
        print '%s.set_next(%s (%s))' % (self.name, other, type(other))
        self.other = other
    def all_nodes(self):
        "Generate the nodes in the graph sequence."
        yield self
        n = self.other
        while n and n.name != self.name:
            yield n
            n = n.other
        if n is self:
            yield n
        return
    def __str__(self):
        return '->'.join([n.name for n in self.all_nodes()])
    def __repr__(self):
        return '%s(%s)' % (self.__class__.__name__, self.name)
    def __del__(self):
        print '(Deleting %s)' % self.name
        self.set_next(None)
class WeakGraph(Graph):
    def set_next(self, other):
        if other is not None:
            # See if we should replace the reference
            # to other with a weakref.
            if self in other.all_nodes():
                other = weakref.proxy(other)
        super(WeakGraph, self).set_next(other)
        return
def collect_and_show_garbage():
    "Show what garbage is present."
    print 'Collecting...'
    n = gc.collect()
    print 'Unreachable objects:', n
    print 'Garbage:',
    pprint(gc.garbage)
def demo(graph_factory):
    print 'Set up graph:'
    one = graph_factory('one')
    two = graph_factory('two')
    three = graph_factory('three')
    one.set_next(two)
    two.set_next(three)
    three.set_next(one)
    print
    print 'Graphs:'
    print str(one)
    print str(two)
    print str(three)
    collect_and_show_garbage()
    print
    three = None
    two = None
    print 'After 2 references removed:'
    print str(one)
    collect_and_show_garbage()
    print
    print 'Removing last reference:'
    one = None
    collect_and_show_garbage()
Now we can set up a test program using the gc module to help us debug the leak. The DEBUG_LEAK flag causes gc to print information about objects that cannot be seen other than through the reference the garbage collector has to them.
import gc
from pprint import pprint
import weakref
from weakref_graph import Graph, demo, collect_and_show_garbage
gc.set_debug(gc.DEBUG_LEAK)
print 'Setting up the cycle'
print
demo(Graph)
print
print 'Breaking the cycle and cleaning up garbage'
print
gc.garbage[0].set_next(None)
while gc.garbage:
    del gc.garbage[0]
print
collect_and_show_garbage()
Even after deleting the local references to the Graph instances in demo(), the graphs all show up in the garbage list and cannot be collected. The dictionaries in the garbage list hold the attributes of the Graph instances. We can forcibly delete the graphs, since we know what they are:
$ python -u weakref_cycle.py
Setting up the cycle
Set up graph:
one.set_next(two ())
two.set_next(three ())
three.set_next(one->two->three ())
Graphs:
one->two->three->one
two->three->one->two
three->one->two->three
Collecting...
Unreachable objects: 0
Garbage:[]
After 2 references removed:
one->two->three->one
Collecting...
Unreachable objects: 0
Garbage:[]
Removing last reference:
Collecting...
gc: uncollectable
gc: uncollectable
gc: uncollectable
gc: uncollectable
gc: uncollectable
gc: uncollectable
Unreachable objects: 6
Garbage:[Graph(one),
Graph(two),
Graph(three),
{'name': 'one', 'other': Graph(two)},
{'name': 'two', 'other': Graph(three)},
{'name': 'three', 'other': Graph(one)}]
Breaking the cycle and cleaning up garbage
one.set_next(None ())
(Deleting two)
two.set_next(None ())
(Deleting three)
three.set_next(None ())
(Deleting one)
one.set_next(None ())
Collecting...
Unreachable objects: 0
Garbage:[]
And now let’s define a more intelligent WeakGraph class that knows not to create cycles using regular references, but to use a weak reference (a proxy) when a cycle is detected.
import gc
from pprint import pprint
import weakref
from weakref_graph import Graph, demo
class WeakGraph(Graph):
    def set_next(self, other):
        if other is not None:
            # See if we should replace the reference
            # to other with a weakref.
            if self in other.all_nodes():
                other = weakref.proxy(other)
        super(WeakGraph, self).set_next(other)
        return
demo(WeakGraph)
Since the WeakGraph instances use proxies to refer to objects that have already been seen, as demo() removes all local references to the objects, the cycle is broken and the garbage collector can delete the objects for us.
$ python weakref_weakgraph.py
Set up graph:
one.set_next(two ())
two.set_next(three ())
three.set_next(one->two->three ())
Graphs:
one->two->three
two->three->one->two
three->one->two->three
Collecting...
Unreachable objects: 0
Garbage:[]
After 2 references removed:
one->two->three
Collecting...
Unreachable objects: 0
Garbage:[]
Removing last reference:
(Deleting one)
one.set_next(None ())
(Deleting two)
two.set_next(None ())
(Deleting three)
three.set_next(None ())
Collecting...
Unreachable objects: 0
Garbage:[]
The ref and proxy classes are considered “low level”. While they are useful for maintaining weak references to individual objects and allowing cycles to be garbage collected, if you need to create a cache of several objects the WeakKeyDictionary and WeakValueDictionary provide a more appropriate API.
As you might expect, the WeakValueDictionary uses weak references to the values it holds, allowing them to be garbage collected when other code is not actually using them.
To illustrate the difference between memory handling with a regular dictionary and with WeakValueDictionary, let’s experiment with explicitly calling the garbage collector again:
import gc
from pprint import pprint
import weakref
gc.set_debug(gc.DEBUG_LEAK)
class ExpensiveObject(object):
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return 'ExpensiveObject(%s)' % self.name
    def __del__(self):
        print '(Deleting %s)' % self
def demo(cache_factory):
    # hold objects so any weak references
    # are not removed immediately
    all_refs = {}
    # the cache using the factory we're given
    print 'CACHE TYPE:', cache_factory
    cache = cache_factory()
    for name in ['one', 'two', 'three']:
        o = ExpensiveObject(name)
        cache[name] = o
        all_refs[name] = o
        del o # decref
    print 'all_refs =',
    pprint(all_refs)
    print 'Before, cache contains:', cache.keys()
    for name, value in cache.items():
        print ' %s = %s' % (name, value)
        del value # decref
    # Remove all references to our objects except the cache
    print 'Cleanup:'
    del all_refs
    gc.collect()
    print 'After, cache contains:', cache.keys()
    for name, value in cache.items():
        print ' %s = %s' % (name, value)
    print 'demo returning'
    return
demo(dict)
print
demo(weakref.WeakValueDictionary)
Notice that any loop variables that refer to the values we are caching must be cleared explicitly to decrement the reference count on the object. Otherwise the garbage collector would not remove the objects and they would remain in the cache. Similarly, the all_refs variable is used to hold references to prevent them from being garbage collected prematurely.
$ python weakref_valuedict.py
CACHE TYPE:
all_refs ={'one': ExpensiveObject(one),
'three': ExpensiveObject(three),
'two': ExpensiveObject(two)}
Before, cache contains: ['three', 'two', 'one']
three = ExpensiveObject(three)
two = ExpensiveObject(two)
one = ExpensiveObject(one)
Cleanup:
After, cache contains: ['three', 'two', 'one']
three = ExpensiveObject(three)
two = ExpensiveObject(two)
one = ExpensiveObject(one)
demo returning
(Deleting ExpensiveObject(three))
(Deleting ExpensiveObject(two))
(Deleting ExpensiveObject(one))
CACHE TYPE: weakref.WeakValueDictionary
all_refs ={'one': ExpensiveObject(one),
'three': ExpensiveObject(three),
'two': ExpensiveObject(two)}
Before, cache contains: ['three', 'two', 'one']
three = ExpensiveObject(three)
two = ExpensiveObject(two)
one = ExpensiveObject(one)
Cleanup:
(Deleting ExpensiveObject(three))
(Deleting ExpensiveObject(two))
(Deleting ExpensiveObject(one))
After, cache contains: []
demo returning
The WeakKeyDictionary works similarly but uses weak references for the keys instead of the values in the dictionary.
The library documentation for weakref contains this warning:
Warning
Caution: Because a WeakValueDictionary is built on top of a Python dictionary, it must not change size when iterating over it. This can be difficult to ensure for a WeakValueDictionary because actions performed by the program during iteration may cause items in the dictionary to vanish “by magic” (as a side effect of garbage collection).