Example 1
# coding: utf-8
import threading
import urllib2
import Queue

def task(q, url):
    # fetch the URL and push the response object onto the shared queue
    result = urllib2.urlopen(url)
    q.put(result)

q = Queue.Queue()
urls = ['http://www.baidu.com', 'http://www.qq.com', 'http://www.sina.com']
threads = []
for url in urls:
    # one thread per URL; each thread reports back through the queue
    t = threading.Thread(target=task, args=(q, url))
    threads.append(t)
    t.start()
for t in threads:
    t.join()  # wait for every download to finish
while not q.empty():
    s = q.get()
    print s.code, s.url
Example 2
from multiprocessing.dummy import Pool as ThreadPool
import urllib2

def task(url):
    result = urllib2.urlopen(url)
    return result.code, result.url

urls = ['http://www.baidu.com', 'http://www.qq.com', 'http://www.sina.com']
pool = ThreadPool(4)  # a pool of 4 threads (dummy gives the Pool API backed by threads)
results = pool.map(task, urls)  # map task over the URLs and collect the return values
print results
Example 2 is essentially the same idea as Example 1; it simply exploits pool.map from multiprocessing.dummy so that the multithreaded code reads more elegantly.
Map is a cool little function, and the key to easily injecting parallelism into your Python code. For those unfamiliar, map is something lifted from functional languages like Lisp. It is a function which maps another function over a sequence.
Map handles the iteration over the sequence for us, applies the function, and stores all of the results in a handy list at the end.
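As a trivial illustration (the square function below is hypothetical, not part of the examples above), the built-in map and pool.map share the same calling convention, which is why swapping one for the other is so painless:

# coding: utf-8
from multiprocessing.dummy import Pool as ThreadPool

def square(x):
    return x * x

print map(square, [1, 2, 3, 4])       # built-in map: [1, 4, 9, 16]
pool = ThreadPool(2)
print pool.map(square, [1, 2, 3, 4])  # same result, computed by 2 worker threads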
Proper use of threads in Python is invariably connected to I/O operations (since CPython doesn’t use multiple cores to run CPU-bound tasks anyway, the only reason for threading is not blocking the process while there’s a wait for some I/O).
Queues are almost invariably the best way to farm out work to threads and/or collect the work’s results, by the way, and they’re intrinsically threadsafe so they save you from worrying about locks, conditions, events, semaphores, and other inter-thread coordination/communication concepts.
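Below is a minimal sketch of that pattern, assuming the same three URLs as above: a task Queue feeds a fixed pool of worker threads, a second Queue collects results, and no explicit locks are needed because Queue is thread-safe.

# coding: utf-8
import threading
import urllib2
import Queue

task_q = Queue.Queue()
result_q = Queue.Queue()

def worker():
    while True:
        url = task_q.get()       # blocks until a task is available
        try:
            result_q.put(urllib2.urlopen(url).code)
        finally:
            task_q.task_done()   # mark the task finished so join() can return

for _ in range(4):               # fixed pool of 4 daemon workers
    t = threading.Thread(target=worker)
    t.daemon = True
    t.start()

for url in ['http://www.baidu.com', 'http://www.qq.com', 'http://www.sina.com']:
    task_q.put(url)
task_q.join()                    # wait until every queued task is done

while not result_q.empty():
    print result_q.get()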
Because of the GIL, multithreading in CPython is a good fit when the work is dominated by I/O waits. To make full use of multiple CPU cores on CPU-bound tasks, multiprocessing is the better choice.
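As a hedged sketch (cpu_task is a made-up CPU-bound function, not from the examples above), moving Example 2 from threads to processes is mostly a one-line change: import Pool from multiprocessing instead of multiprocessing.dummy.

# coding: utf-8
from multiprocessing import Pool

def cpu_task(n):
    # deliberately CPU-bound: sum of squares below n
    return sum(i * i for i in xrange(n))

if __name__ == '__main__':   # guard required on platforms that spawn processes
    pool = Pool(4)           # 4 worker processes, each on its own core
    print pool.map(cpu_task, [1000000, 2000000, 3000000])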
In an operating system, a process is the basic unit of resource allocation, while a thread is the basic unit of scheduling. Processes therefore consume more resources and threads are more lightweight; creating a process is generally slower than creating a thread. Each process has its own address space, whereas threads share memory. One process crashing does not affect other processes, but a crashing thread can take down the other worker threads in the same process. This separation also makes sharing data and communicating between processes harder. Because threads share memory, something must prevent multiple threads from writing the same memory at the same time; in CPython the GIL was introduced to protect the interpreter's internal state, though user-level data still needs its own locks, as the sketch below shows.
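A minimal sketch of that last point (the counter here is hypothetical): even with the GIL, a compound operation like counter += 1 is not atomic, so concurrent increments can lose updates unless the threads take a Lock.

# coding: utf-8
import threading

counter = 0
lock = threading.Lock()

def worker():
    global counter
    for _ in xrange(100000):
        with lock:        # without this lock, some increments can be lost
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print counter  # 400000 with the lock; often less without it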