Python 3.5: Scraping Titles and Links from a Bing Search Results Page

A simple crawler script that fetches the titles and links on the first results page Bing returns for a given keyword.

import urllib.error
import urllib.parse
import urllib.request

from bs4 import BeautifulSoup as BS

baseUrl = 'http://cn.bing.com/search?'
word = '鹿晗 吴亦凡 张艺兴'
print(word)

# urlencode() percent-encodes the keyword as UTF-8, so no manual encode() is needed.
data = urllib.parse.urlencode({'q': word})
url = baseUrl + data
print(url)

# Fetch the results page; exit on failure so that html is never used undefined.
try:
    html = urllib.request.urlopen(url)
except urllib.error.HTTPError as e:
    raise SystemExit('HTTP error: {}'.format(e.code))
except urllib.error.URLError as e:
    raise SystemExit('URL error: {}'.format(e.reason))

soup = BS(html, "html.parser")

# Print the result-count line, if the page contains one.
for c in soup.find_all(class_="sb_count"):
    print(c.get_text())

# Each result title is an <h2>; print its text and, if present, its link.
for t in soup.find_all("h2"):
    print(t.get_text())
    a = t.find("a", href=True)
    if a:
        print(a["href"])
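
The request step in the script can also be wrapped in a small function that sets a timeout and a browser-like User-Agent header. This is only a sketch under stated assumptions: the names `build_request`, `fetch`, and `USER_AGENT` are made up for this example, the header value is arbitrary, and whether Bing responds differently to urllib's default User-Agent is not something the script above establishes.

```python
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "http://cn.bing.com/search?"
# Assumed header value for illustration; any browser-like string could be used.
USER_AGENT = "Mozilla/5.0 (compatible; example-scraper)"


def build_request(keyword):
    """Build a GET request for the search page without sending it."""
    query = urllib.parse.urlencode({"q": keyword})
    return urllib.request.Request(
        BASE_URL + query, headers={"User-Agent": USER_AGENT}
    )


def fetch(keyword, timeout=10):
    """Return the page body as bytes, or None on an HTTPError or URLError."""
    try:
        with urllib.request.urlopen(build_request(keyword), timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as e:
        print("HTTP error:", e.code)
    except urllib.error.URLError as e:
        print("URL error:", e.reason)
    return None
```

Calling `fetch('鹿晗 吴亦凡 张艺兴')` would return the raw page bytes, which can be passed directly to `BeautifulSoup(..., "html.parser")`. Returning `None` on failure lets the caller decide whether to retry or stop.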

Screenshot of the output:

[Image 1: screenshot of the script's output]
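
The title and link extraction can be tried without network access. The sketch below is an illustration only: `SAMPLE_HTML` and `extract_results` are hypothetical names, and the sample markup is not Bing's real page. It merely assumes, as the script does, that each result title is an `h2` element containing a link.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; real Bing pages are more complex and may change.
SAMPLE_HTML = """
<html><body>
<span class="sb_count">2 results</span>
<h2><a href="http://example.com/a?x=1&amp;y=2">First title</a></h2>
<h2><a href="http://example.com/b">Second title</a></h2>
<h2>Heading without a link</h2>
</body></html>
"""


def extract_results(html):
    """Return a list of (title, link) pairs, one per h2 that contains a link."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for heading in soup.find_all("h2"):
        link = heading.find("a", href=True)
        if link is None:
            continue  # skip headings that carry no link
        results.append((heading.get_text(strip=True), link["href"]))
    return results


if __name__ == "__main__":
    for title, link in extract_results(SAMPLE_HTML):
        print(title)
        print(link)
```

Reading the link from the tag's `href` attribute returns the decoded value (`&` rather than `&amp;`), whereas matching `str(tag)` with a regular expression would see the escaped form. Unlike the script, this sketch skips headings that have no link instead of printing their text.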
