我写了一个多线程的爬虫:从 URL 抓取页面中的其他 URL,再继续抓取这些页面。里面用到了 Queue,但是在 XP 命令行下运行的时候经常光标不动。应该是信号同步的问题,调试很久不得其解,贴代码,望大家指正:
- Python code
#coding=utf-8 from __future__ import with_statement from BeautifulSoup import BeautifulSoup import urllib2 from threading import Thread from Queue import Queue import time import socket socket.setdefaulttimeout(5) class Fetcher:#把操作封到一个类里面,从网上搜得例子 def __init__(self,th_num): self.opener = urllib2.build_opener(urllib2.HTTPHandler) self.lock = Lock() #线程锁 self.q_req = Queue() #任务队列 self.q_ans = Queue() #完成队列 self.urls = []#返回抓取页面中的url self.th_num = th_num for i in range(th_num):#抓取线程 t = Thread(target=self.thread_get) t.setDaemon(True) t.start() for i in range(th_num):#处理线程 t = Thread(target=self.thread_put) t.setDaemon(True) t.start() def join(self): #解构时需等待两个队列完成 time.sleep(0.5) print '=====================im done' self.q_req.join() self.q_ans.join() def push(self,req): self.q_req.put(req) def thread_put(self): while True: try: if not self.q_ans.empty(): url = self.q_ans.get() self.urls.extend(url) self.q_ans.task_done() except Queue.empty,qe: print qe,'Queue==========' continue except Exception ,e: print e,'other,excp========' continue def thread_get(self): print 'i am starting------' while True: try: if self.q_req.empty(): continue req = self.q_req.get() except Queue.empty,qe: print 'enmpty-----------' continue urls = [] ans = '' try: ans = self.opener.open(req).read() soup = BeautifulSoup(ans) for a in soup.findAll('a'): try: if a['href'].startswith('http'): urls.append(a['href']) except KeyError, e: print e ,'=======================KeyError=in=soup=findAll' continue except Exception,ex: print ex,'========================Exception=in=soup=findAll' continue self.q_ans.put(urls) self.q_req.task_done() except UnicodeEncodeError, ue: print 'unicode----------------------wrong' print ue print req continue except urllib2.URLError, ue: print 'conn-----------rufuse' print ue print req continue except Exception, what: print 'other--exception----------in- threadget----' print what print req continue time.sleep(0.1) # don't spam print 'get==========' def 
run(links,th_num=10): f = Fetcher(th_num) for url in links: f.push(url) f.join() return f.urls if __name__ == "__main__": links = ['http://kingdowin.com/',] deep = 2#抓取页面的深度 while deep > 0: urls = run(links) deep -= 1 links = urls print links print "Exiting Main Thread"
------解决方案--------------------
E:\project\PyCharmProject\proberServer>python test.py
i am starting------
i am starting------
i am starting------
i am starting------
i am starting------
i am starting------
i am starting------
i am starting------