Python Crawler (2): Getting Python Book Information from JD.com

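This code is mainly practice for getting fluent with the methods of the requests and re libraries. It extracts each book's title, author, publisher, price, and similar information.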

# -*- coding: utf-8 -*-
import requests
import re
from requests.exceptions import RequestException

def get_one_page(url):
	try:
		r = requests.get(url)
		if r.status_code == 200:
			r.encoding = "utf-8"  # prevent garbled (mis-decoded) text
			return r.text
		return None
	except RequestException:
		return None

def parse_html(html):
	pattern = re.compile(r'.*?.*?'
						 r'.*?
.*?/em>(.*?).*?
.*?([\u4E00-\u9FA5|a-zA-Z\s]*?)(.*?)(.*?)' r'.*?
(.*?)(.*?).*?' r'.*?(.*?).*?' r'.*?
.*?(.*?)(.*?)',re.S)
	img = pattern.findall(html)
	print(len(img))
	for i in img:
		print(i[1])  # price
		print(i[3]+i[4]+i[5])  # name
		print(i[6],i[7])  # author
		print(i[8])  # store
		print(i[9],i[10])
		print()

def main(keyword):
	url = "https://search.jd.com/Search?keyword="+keyword
	html = get_one_page(url)
	# get_one_page returns None on failure, which findall cannot handle
	if html is None:
		print("Request failed")
		return
	parse_html(html)

if __name__ == '__main__':
	keyword = input("Enter a keyword: ")
	main(keyword)
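
To make the parsing step easier to follow, here is a minimal sketch of the same idea: one compiled pattern with re.S, lazy `.*?` gaps between tags, and one capture group per field, applied with findall. The markup in `SAMPLE_HTML`, the class names, and the names `ITEM_PATTERN` and `parse_items` are all made up for illustration. They are not JD's real page structure and not the pattern used above.

```python
import re

# Hypothetical markup for illustration only; NOT JD's real page structure.
SAMPLE_HTML = """
<li class="item">
  <div class="price"><em>¥</em><i>59.00</i></div>
  <div class="name"><a href="#">Python Basics</a></div>
  <div class="author"><a href="#">A. Writer</a></div>
  <div class="shop"><a href="#">Example Press</a></div>
</li>
<li class="item">
  <div class="price"><em>¥</em><i>88.50</i></div>
  <div class="name"><a href="#">Learning Crawlers</a></div>
  <div class="author"><a href="#">B. Coder</a></div>
  <div class="shop"><a href="#">Sample Books</a></div>
</li>
"""

# re.S lets "." match newlines, so ".*?" can skip across line breaks
# between the tags we care about. Each "(.*?)" captures one field.
ITEM_PATTERN = re.compile(
    r'<li class="item">.*?'
    r'<div class="price">.*?<i>(.*?)</i>.*?'
    r'<div class="name">.*?<a[^>]*>(.*?)</a>.*?'
    r'<div class="author">.*?<a[^>]*>(.*?)</a>.*?'
    r'<div class="shop">.*?<a[^>]*>(.*?)</a>.*?'
    r'</li>',
    re.S,
)


def parse_items(html):
    """Return one dict per matched item in the (hypothetical) markup."""
    books = []
    # With several groups, findall returns a list of tuples, one per match.
    for price, name, author, shop in ITEM_PATTERN.findall(html):
        books.append({
            "price": price.strip(),
            "name": name.strip(),
            "author": author.strip(),
            "shop": shop.strip(),
        })
    return books


if __name__ == "__main__":
    for book in parse_items(SAMPLE_HTML):
        print(book["name"], book["author"], book["shop"], book["price"])
```

Unpacking the tuple into named variables avoids bare indexes like i[1] or i[8], which are easy to get wrong when the pattern changes. For a real page the pattern has to be written against the actual HTML returned by the request.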

Result:

[Image 1: Python Crawler (2): Getting Python Book Information from JD.com]
