
[Python] Two problems when saving objects with cPickle: logging and shared instances

Posted on 2014-5-24 23:04

Last edited by Rainyboy on 2014-5-24 21:26

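1. Introduction
The cPickle library gives Python the ability to save user-defined objects to disk at run time. Its usefulness in practice is obvious. However, because the data being saved has its own peculiarities, cPickle does not always behave as expected. I recently ran into a few such cases and would like to discuss them here.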

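2. The cPickle library
See the official documentation for details. Saving and restoring objects with cPickle follows a fairly fixed pattern, so it can be wrapped in two functions up front:
File: SRtool.py (final version)
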
import cPickle as CP

def save_object(obj, filename):
    with open(filename, 'wb') as op:
        CP.dump(obj, op, CP.HIGHEST_PROTOCOL)
        
def load_object(filename):
    with open(filename, 'rb') as ip:
        return CP.load(ip)
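
For readers on Python 3, where `cPickle` no longer exists as a separate module and the standard `pickle` module uses the C implementation automatically, the same two helpers might look like the sketch below. It assumes Python 3; the temporary file and the sample dictionary are made up for illustration.

```python
import os
import pickle
import tempfile


def save_object(obj, filename):
    """Serialize obj to filename using the highest available protocol."""
    with open(filename, 'wb') as fh:
        pickle.dump(obj, fh, pickle.HIGHEST_PROTOCOL)


def load_object(filename):
    """Read back a single pickled object from filename."""
    with open(filename, 'rb') as fh:
        return pickle.load(fh)


# Round trip with a throwaway file (hypothetical sample data).
sample = {'name': 'FANYU', 'scores': [1, 2, 3]}
path = os.path.join(tempfile.mkdtemp(), 'sample.pkl')
save_object(sample, path)
restored = load_object(path)
print(restored == sample)
```

The restored object is an equal but distinct copy of the original, which matters for the shared-instance discussion later in this post.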




3. Handling objects that hold a reference to a logging.Logger

Now consider a class like this:

File: UDO.py (first version)

import matplotlib.pyplot as plt
import logging
class People:
    __jbindex = 0
    def __init__(self,name,age):
        self.name = '%s_PP%d'%(name,self.__jbindex,) 
        self.age = age
        People.__jbindex +=1
        
        self.logger = logging.getLogger('People')
        self.logger.addHandler(logging.NullHandler())
        self.logger.setLevel(logging.DEBUG)
        self.logger.info('New People %s is created'%(name,))
        
    def setdegree(self,deg):
        self.degree = deg
        self.logger.info('degree set to %s, as %s',self.name,self.degree)
    def setdata(self,x,y):
        self.x = x
        self.y = y
    def setFriend(self,frd):
        self.frd = frd
        self.logger.info('%s connected to %s, as a friend'%(self.frd.name,self.name))
    def plot(self,fid):
        plt.figure(fid)
        plt.plot(self.x,self.y)
        plt.grid(True)
        plt.title(self.name)
        plt.show()
    def __str__(self):
        re = 'name:' + self.name + '\n'
        re += 'age:' + str(self.age) + '\n'
        re += 'degree:' + str(self.degree) + '\n'
        return re


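This is a made-up user-defined class with logging attached in the usual way. Its objects may also hold references to np.array data and to other People objects. Suppose we try to save objects of this class as in the following file:

File: UDO_save.py (first version)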
import numpy as np
from UDO import People
from SRtool import save_object
import sys,logging

def main():
    x = np.linspace(-3,3,20)
    
    FY = People('FANYU',25)
    FY.setdegree('ME')
    FY.setdata(x, np.tanh(x))
    FY.plot(1)
    
    BB = People('BenBen',26)
    BB.setdegree('BL')
    BB.setdata(x, np.sin(x))
    FY.setFriend(BB)
    
    CC = People('ShuShu',40)
    CC.setdegree('HS')
    CC.setFriend(BB)    
    
    print id(BB)
    print id(CC.frd)
    print id(FY.frd)
    
    save_object(BB, r'DATA_BB.pkl')
    save_object(FY, r'DATA_FY.pkl')
    save_object(CC, r'DATA_CC.pkl')


if __name__ == '__main__':
    console = logging.StreamHandler(sys.stdout)
    console.setLevel(logging.INFO)
    formatter = logging.Formatter('%(levelname)8s %(name)50s - %(asctime)s -   %(message)s',datefmt='%M:%S')
    console.setFormatter(formatter)
    logging.getLogger('People').addHandler(console)
    main()




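Running this raises an error. The output is shown below; the first part is the logging output. (The traceback was evidently captured from a revision of the script that saved all three objects as one list, but the error is the same.)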

    INFO                                             People - 20:50 -   New People FANYU is created
    INFO                                             People - 20:50 -   degree set to FANYU_PP0, as ME
    INFO                                             People - 20:51 -   New People BenBen is created
    INFO                                             People - 20:51 -   degree set to BenBen_PP1, as BL
    INFO                                             People - 20:51 -   BenBen_PP1 connected to FANYU_PP0, as a friend
    INFO                                             People - 20:51 -   New People ShuShu is created
    INFO                                             People - 20:51 -   degree set to ShuShu_PP2, as HS
    INFO                                             People - 20:51 -   BenBen_PP1 connected to ShuShu_PP2, as a friend
52777848
52777848
52777848
Traceback (most recent call last):
  File "Z:\code\Python\Learning\src\SaveRecover\UDO_save.py", line 38, in <module>
    main()
  File "Z:\code\Python\Learning\src\SaveRecover\UDO_save.py", line 29, in main
    save_object([BB,FY,CC], r'DATA.pkl')
  File "Z:\code\Python\Learning\src\SaveRecover\SRtool.py", line 5, in save_object
    CP.dump(obj, op, CP.HIGHEST_PROTOCOL)
cPickle.PicklingError: Can't pickle <type 'thread.lock'>: attribute lookup thread.lock failed

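This error arises mainly because the People class keeps the logger as an instance attribute. That is very convenient inside the class, but it causes trouble when the object is saved: pickling the object also tries to pickle the logger and the thread locks reachable from it (for example those owned by its handlers), and a thread.lock cannot be pickled. This is the first problem I want to point out in this post.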
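
The failure can be reproduced without `logging` at all, since the object that pickle refuses is the lock itself. The sketch below assumes Python 3, where the error surfaces as a `TypeError` rather than the `cPickle.PicklingError` shown above; the class name `Holder` is hypothetical. (From Python 3.7 on, loggers themselves can be pickled by name, so the exact traceback above is specific to older versions.)

```python
import pickle
import threading


class Holder:
    """Hypothetical stand-in for an object that owns a lock, as logging handlers do."""

    def __init__(self):
        self.lock = threading.Lock()


try:
    pickle.dumps(Holder())
    failed = False
except TypeError as exc:
    # Python 3 reports an unpicklable lock as a TypeError.
    failed = True
    print('pickling failed:', exc)
```

Any attribute that transitively reaches such a lock makes the whole object unpicklable, which is why the logger has to be kept out of the saved state.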

The simplest fix is to delete the logger before saving and obtain it again after the object has been restored. Here are the revised UDO.py and UDO_save.py:
File: UDO.py (second version, final)
import matplotlib.pyplot as plt
import logging
class People:
    __jbindex = 0
    def __init__(self,name,age):
        self.name = '%s_PP%d'%(name,self.__jbindex,) 
        self.age = age
        People.__jbindex +=1
        self.set_logger()
    def set_logger(self):    
        self.logger = logging.getLogger('People')
        self.logger.addHandler(logging.NullHandler())
        self.logger.setLevel(logging.DEBUG)
        self.logger.info('New People %s is created'%(self.name,))
    def del_logger(self):
        del self.logger
    def setdegree(self,deg):
        self.degree = deg
        self.logger.info('degree set to %s, as %s',self.name,self.degree)
    def setdata(self,x,y):
        self.x = x
        self.y = y
    def setFriend(self,frd):
        self.frd = frd
        self.logger.info('%s connected to %s, as a friend'%(self.frd.name,self.name))
    def plot(self,fid):
        plt.figure(fid)
        plt.plot(self.x,self.y)
        plt.grid(True)
        plt.title(self.name)
        plt.show()
    def __str__(self):
        re = 'name:' + self.name + '\n'
        re += 'age:' + str(self.age) + '\n'
        re += 'degree:' + str(self.degree) + '\n'
        return re



File: UDO_save.py (second version)
import numpy as np
from UDO import People
from SRtool import save_object
import sys,logging

def main():
    x = np.linspace(-3,3,20)
    
    FY = People('FANYU',25)
    FY.setdegree('ME')
    FY.setdata(x, np.tanh(x))
    FY.plot(1)
    
    BB = People('BenBen',26)
    BB.setdegree('BL')
    BB.setdata(x, np.sin(x))
    FY.setFriend(BB)
    
    CC = People('ShuShu',40)
    CC.setdegree('HS')
    CC.setFriend(BB)    
    
    print id(BB)
    print id(CC.frd)
    print id(FY.frd)
    
    BB.del_logger()
    FY.del_logger()
    CC.del_logger()
    
    save_object(BB, r'DATA_BB.pkl')
    save_object(FY, r'DATA_FY.pkl')
    save_object(CC, r'DATA_CC.pkl')


if __name__ == '__main__':
    console = logging.StreamHandler(sys.stdout)
    console.setLevel(logging.INFO)
    formatter = logging.Formatter('%(levelname)8s %(name)50s - %(asctime)s -   %(message)s',datefmt='%M:%S')
    console.setFormatter(formatter)
    logging.getLogger('People').addHandler(console)
    main()
    


In other words, the logger is removed and restored through the added methods People.set_logger() and People.del_logger().
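
An alternative that avoids calling del_logger() and set_logger() by hand is to use pickle's `__getstate__` and `__setstate__` hooks, so the logger is left out and re-acquired automatically. This is not the approach used in this post, only a minimal sketch; it assumes Python 3 (`pickle` in place of `cPickle`), and `Person` is a hypothetical, trimmed-down variant of People.

```python
import logging
import pickle


class Person:
    """Hypothetical, trimmed-down variant of People."""

    def __init__(self, name):
        self.name = name
        self.logger = logging.getLogger('People')

    def __getstate__(self):
        # Copy the instance dict and leave the logger out of the pickle.
        state = self.__dict__.copy()
        state.pop('logger', None)
        return state

    def __setstate__(self, state):
        # Restore the saved attributes, then look the logger up again by name.
        self.__dict__.update(state)
        self.logger = logging.getLogger('People')


original = Person('FANYU')
clone = pickle.loads(pickle.dumps(original))
print(clone.name, clone.logger is original.logger)
```

Unlike del_logger(), this leaves the original object usable after saving, because only the copied state dict loses the logger.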

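4. Handling shared instances

After the change above, the program runs. Even so, the code is still not entirely correct. Suppose we restore the objects from the files with the following code:


File: UDO_load.py (first version)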
from UDO import People
from SRtool import load_object
import sys,logging

def re():
    BB = load_object(r'DATA_BB.pkl')
    FY = load_object(r'DATA_FY.pkl')
    CC = load_object(r'DATA_CC.pkl')
    BB.set_logger()
    FY.set_logger()
    CC.set_logger()
    FY.plot(1)
    BB.plot(2)
    print id(BB)
    print id(FY.frd)
    print id(CC.frd)

if __name__ == '__main__':
    console = logging.StreamHandler(sys.stdout)
    console.setLevel(logging.INFO)
    formatter = logging.Formatter('%(levelname)8s %(name)50s - %(asctime)s -   %(message)s',datefmt='%M:%S')
    console.setFormatter(formatter)
    logging.getLogger('People').addHandler(console)
    re()



The output is:
    INFO                                             People - 38:02 -   New People BenBen_PP1 is created
    INFO                                             People - 38:02 -   New People FANYU_PP0 is created
    INFO                                             People - 38:02 -   New People ShuShu_PP2 is created
35452168
52719328
52773512



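Note that the references FY.frd and CC.frd, which should point to the same object (BB), are now all different. The shared instance was not saved with its relationships intact; instead, each file received its own copy of the referenced object. This not only makes the files larger, it destroys the original data structure.
In fact, the documentation of the pickle library contains this passage (https://docs.python.org/2/library/pickle.html#module-cPickle):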
The pickle module differs from marshal in several significant ways:
The pickle module keeps track of the objects it has already serialized, so that later references to the same object won't be serialized again. marshal doesn't do this.
This has implications both for recursive objects and object sharing. Recursive objects are objects that contain references to themselves. These are not handled by marshal, and in fact, attempting to marshal recursive objects will crash your Python interpreter. Object sharing happens when there are multiple references to the same object in different places in the object hierarchy being serialized. pickle stores such objects only once, and ensures that all other references point to the master copy. Shared objects remain shared, which can be very important for mutable objects.

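That is, for objects that are referenced from several places (which the text calls object sharing), the library is smart enough to store only one copy, so the relationships in the data are preserved.
In the code shown so far, however, this clearly did not happen.
Unfortunately the documentation does not spell out what is needed for this to work, so I had to experiment.
The tracking turns out to apply within a single dump call, so after some trials the simplest approach is to put the objects in one list and save that list, see: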

File: UDO_save.py (third version, final)

import numpy as np
from UDO import People
from SRtool import save_object
import sys,logging

def main():
    x = np.linspace(-3,3,20)
    
    FY = People('FANYU',25)
    FY.setdegree('ME')
    FY.setdata(x, np.tanh(x))
    FY.plot(1)
    
    BB = People('BenBen',26)
    BB.setdegree('BL')
    BB.setdata(x, np.sin(x))
    FY.setFriend(BB)
    
    CC = People('ShuShu',40)
    CC.setdegree('HS')
    CC.setFriend(BB)    
    
    print id(BB)
    print id(CC.frd)
    print id(FY.frd)
    
    BB.del_logger()
    FY.del_logger()
    CC.del_logger()

    save_object([BB,FY,CC], r'DATA.pkl')


if __name__ == '__main__':
    console = logging.StreamHandler(sys.stdout)
    console.setLevel(logging.INFO)
    formatter = logging.Formatter('%(levelname)8s %(name)50s - %(asctime)s -   %(message)s',datefmt='%M:%S')
    console.setFormatter(formatter)
    logging.getLogger('People').addHandler(console)
    main()
    



File: UDO_load.py (second version, final)
#import cPickle as CP
#import numpy as np
from UDO import People
from SRtool import load_object
import sys,logging

def re():
    BB,FY,CC = load_object(r'DATA.pkl')
    BB.set_logger()
    FY.set_logger()
    CC.set_logger()
    FY.plot(1)
    BB.plot(2)
    print id(BB)
    print id(FY.frd)
    print id(CC.frd)

if __name__ == '__main__':
    console = logging.StreamHandler(sys.stdout)
    console.setLevel(logging.INFO)
    formatter = logging.Formatter('%(levelname)8s %(name)50s - %(asctime)s -   %(message)s',datefmt='%M:%S')
    console.setFormatter(formatter)
    logging.getLogger('People').addHandler(console)
    re()



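In short: for the logger, delete it before saving and obtain it again after restoring; for shared instances, put the objects in one list first, then save and restore that list.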
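
The shared-instance behaviour can be checked in isolation with a few lines. The sketch below assumes Python 3 (`pickle` in place of `cPickle`) and uses plain dictionaries as hypothetical stand-ins for BB, FY and CC, so it does not depend on the People class.

```python
import pickle

# Hypothetical stand-ins for BB, FY and CC: plain dicts sharing one friend.
bb = {'name': 'BenBen'}
fy = {'name': 'FANYU', 'frd': bb}
cc = {'name': 'ShuShu', 'frd': bb}

# Separate dumps: each call starts with an empty memo, so bb is copied twice.
fy_alone = pickle.loads(pickle.dumps(fy))
cc_alone = pickle.loads(pickle.dumps(cc))
print(fy_alone['frd'] is cc_alone['frd'])

# One dump of a list: bb is written once and both references point to it.
bb2, fy2, cc2 = pickle.loads(pickle.dumps([bb, fy, cc]))
print(fy2['frd'] is bb2 and cc2['frd'] is bb2)
```

The first check prints False and the second prints True, which mirrors the differing and matching id() values seen with the separate-file and single-list versions of the scripts.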








