Python Standard Library: An Introduction to Process Pools


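In some cases the work to be done can be broken up and distributed independently across multiple worker processes. For this simple case, the Pool class can be used to manage a fixed number of worker processes. The return values of the jobs are collected and returned as a list. (The programs below were run on a machine with 2 CPUs; for explanations of the related functions, see "Python Process Pools 2: Pool-Related Functions".)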

 

    import multiprocessing


    def do_calculation(data):
        # The unit of work handed to each worker: double the input.
        return data * 2


    def start_process():
        # Runs once in every worker process as it starts.
        print 'Starting', multiprocessing.current_process().name


    if __name__ == '__main__':
        inputs = list(range(10))
        print 'Inputs :', inputs

        # Reference result from the built-in, single-process map().
        builtin_output = map(do_calculation, inputs)
        print 'Build-In :', builtin_output

        # Two workers per CPU.
        pool_size = multiprocessing.cpu_count() * 2
        pool = multiprocessing.Pool(processes=pool_size,
                                    initializer=start_process)

        pool_outputs = pool.map(do_calculation, inputs)
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the worker processes to exit

        print 'Pool :', pool_outputs


 

Running it produces the following output:

    Inputs  : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    Build-In : [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
    Starting PoolWorker-2
    Starting PoolWorker-1
    Starting PoolWorker-3
    Starting PoolWorker-4
    Pool : [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
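
The listing above uses Python 2 syntax. As a rough sketch of how the same pattern might look on Python 3, the version below keeps do_calculation and start_process from the original and wraps the pool in a helper named run_pool, which is a hypothetical name introduced here for illustration. It assumes Python 3.3 or later, where Pool can be used as a context manager. Worker names printed by the initializer may differ from the PoolWorker-N names shown above (for example ForkPoolWorker-1).

```python
import multiprocessing


def do_calculation(data):
    """Double the input; this is the work done for each item."""
    return data * 2


def start_process():
    """Pool initializer: runs once in each worker as it starts."""
    print('Starting', multiprocessing.current_process().name)


def run_pool(inputs, pool_size):
    """Map do_calculation over inputs using pool_size worker processes.

    run_pool is a hypothetical helper, not part of the original post.
    """
    # Leaving the with-block calls terminate(); that is safe here because
    # map() has already returned every result by then.
    with multiprocessing.Pool(processes=pool_size,
                              initializer=start_process) as pool:
        return pool.map(do_calculation, inputs)


if __name__ == '__main__':
    inputs = list(range(10))
    print('Inputs   :', inputs)
    print('Built-in :', list(map(do_calculation, inputs)))
    print('Pool     :', run_pool(inputs, multiprocessing.cpu_count() * 2))
```

Keeping the pool code inside a function behind the `__main__` guard matters on platforms that start workers by re-importing the main module, since module-level pool creation would otherwise run again in every child.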

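By default, Pool creates a fixed number of worker processes and passes jobs to them until there are no more jobs. The maxtasksperchild parameter is the maximum number of tasks each worker process will execute. Setting it tells the pool to restart a worker process after it has completed that many tasks, which keeps long-running workers from consuming too many system resources.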

maxtasksperchild is the number of tasks a worker process can complete before it will exit and be replaced with a fresh worker process, to enable unused resources to be freed. The default maxtasksperchild is None, which means worker processes will live as long as the pool.

Worker processes within a Pool typically live for the complete duration of the Pool's work queue. A frequent pattern found in other systems (such as Apache, mod_wsgi, etc.) to free resources held by workers is to allow a worker within a pool to complete only a set amount of work before exiting, being cleaned up, and having a new process spawned to replace the old one. The maxtasksperchild argument to the Pool exposes this ability to the end user.

Note

In Python 2.6.6, multiprocessing.Pool has no maxtasksperchild parameter: Pool(processes=None, initializer=None, initargs=())

In Python 2.7.3, the signature is: Pool(processes=None, initializer=None, initargs=(), maxtasksperchild=None)
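
For code that has to run on both versions, one option is to pass maxtasksperchild only when the interpreter supports it. The following is a minimal sketch, not part of the original post: pool_kwargs is a hypothetical helper, and the (2, 7) cutoff is an assumption that the parameter arrived with Python 2.7, which is consistent with the two signatures above.

```python
import sys


def pool_kwargs(processes, maxtasksperchild=None,
                version_info=sys.version_info):
    """Build keyword arguments for multiprocessing.Pool.

    Hypothetical helper: maxtasksperchild is dropped on interpreters
    older than 2.7 (assumed cutoff), where Pool does not accept it.
    """
    kwargs = {"processes": processes}
    supported = tuple(version_info[:2]) >= (2, 7)
    if maxtasksperchild is not None and supported:
        kwargs["maxtasksperchild"] = maxtasksperchild
    return kwargs


# The version tuple is passed explicitly here only to show both cases.
print(pool_kwargs(4, maxtasksperchild=2, version_info=(2, 6, 6)))
print(pool_kwargs(4, maxtasksperchild=2, version_info=(2, 7, 3)))
```

The result would be unpacked into the constructor, as in multiprocessing.Pool(**pool_kwargs(4, maxtasksperchild=2)). On an older interpreter the pool still works, but its workers are never recycled.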

    import multiprocessing


    def do_calculation(data):
        # The unit of work handed to each worker: double the input.
        return data * 2


    def start_process():
        # Runs once in every worker process as it starts,
        # including each replacement worker.
        print 'Starting', multiprocessing.current_process().name


    if __name__ == '__main__':
        inputs = list(range(10))
        print 'Inputs :', inputs

        builtin_output = map(do_calculation, inputs)
        print 'Build-In :', builtin_output

        pool_size = multiprocessing.cpu_count() * 2
        # Each worker exits and is replaced after completing two tasks.
        pool = multiprocessing.Pool(processes=pool_size,
                                    initializer=start_process,
                                    maxtasksperchild=2)

        pool_outputs = pool.map(do_calculation, inputs)
        pool.close()
        pool.join()

        print 'Pool :', pool_outputs

Running it produces the following output:

    Inputs  : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    Build-In : [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
    Starting PoolWorker-1
    Starting PoolWorker-2
    Starting PoolWorker-3
    Starting PoolWorker-4
    Starting PoolWorker-5
    Starting PoolWorker-6
    Starting PoolWorker-7
    Starting PoolWorker-8
    Pool : [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

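The pool restarts its workers when they have completed their allotted tasks, even if there is no more work to do. The output shows that eight worker processes were created, even though there are only 10 tasks and each worker can complete two of them before being replaced.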
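
The arithmetic behind that count can be sketched as follows. This is an illustration under stated assumptions, not documented API: default_chunksize mirrors the heuristic CPython's Pool.map uses when no chunksize is given (an implementation detail that could change), maxtasksperchild is taken to count chunks rather than individual items, and both function names are hypothetical. The sketch assumes at least one input item.

```python
import math


def default_chunksize(n_items, n_workers):
    """Chunk size map() picks when none is given (CPython heuristic)."""
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize


def min_workers_started(n_items, n_workers, maxtasksperchild):
    """Lower bound on worker processes started for one map() call."""
    chunks = math.ceil(n_items / default_chunksize(n_items, n_workers))
    # Each worker retires after maxtasksperchild chunks, and the pool
    # starts with n_workers processes regardless of how much work exists.
    return max(n_workers, math.ceil(chunks / maxtasksperchild))


print(default_chunksize(10, 4))       # 1
print(min_workers_started(10, 4, 2))  # 5
```

With 10 inputs and 4 workers, each chunk holds one item, so there are 10 tasks and at least 5 workers are needed. The run above started 8, which is above this bound, because the pool replaces each of its 4 original workers once it retires, whether or not any work remains.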

More often, we need not only to run work in multiple processes but also to look at the result each process returns.

    import multiprocessing
    import time


    def func(msg):
        for i in xrange(3):
            print msg
            time.sleep(1)
        return "done " + msg


    if __name__ == "__main__":
        pool = multiprocessing.Pool(processes=4)
        result = []
        for i in xrange(10):
            msg = "hello %d" % i
            # apply_async() returns a result object immediately.
            result.append(pool.apply_async(func, (msg,)))
        pool.close()
        pool.join()
        # Every task has finished by now, so get() returns at once.
        for res in result:
            print res.get()
        print "Sub-process(es) done."
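
A Python 3 rendering of the same idea might look like the sketch below. It is not the original author's code: run_async is a hypothetical helper, the repeats and delay parameters are added here so the example finishes quickly, and the 30-second timeout is an arbitrary safety net. Instead of close() and join(), it collects each result with get() while the pool is still open.

```python
import multiprocessing
import time


def func(msg, repeats=3, delay=0.01):
    """Print msg a few times with a pause, then return a result string."""
    for _ in range(repeats):
        print(msg)
        time.sleep(delay)
    return "done " + msg


def run_async(count, processes=4):
    """Submit count tasks with apply_async() and collect their results.

    run_async is a hypothetical helper introduced for this sketch.
    """
    with multiprocessing.Pool(processes=processes) as pool:
        pending = [pool.apply_async(func, ("hello %d" % i,))
                   for i in range(count)]
        # get() blocks until that task finishes; the list keeps
        # submission order even if tasks complete out of order.
        return [res.get(timeout=30) for res in pending]


if __name__ == "__main__":
    for line in run_async(10):
        print(line)
    print("Sub-process(es) done.")
```

If a task raises an exception in the worker, get() re-raises it in the parent, so this is also the point where failures surface.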

 

References:

《Python 標准庫》 (Python Standard Library), Section 10.4.17, "Process Pools" (p. 445)

http://www.coder4.com/archives/3352

 

Original post: http://www.cnblogs.com/congbo/archive/2012/08/23/2652433.html
