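After learning a lot of Python and typing out what feels like millions of lines of code, the language can start to seem dry and boring. But, going by the first principle that pretty girls are the best study motivation (haha), let's write a program that downloads all the pictures.
Today we'll use Python to scrape the pretty-girl pictures on 唯一图库 (http://www.mmonly.cc/mmtp/), as a little treat for everyone. O(∩_∩)O
The picture quality is pretty good overall. Here are three in different styles to give you a feel. O(∩_∩)O haha~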
import urllib.request
from bs4 import BeautifulSoup
import os
def Download(url,picAlt,name):
    ...
def run(targetUrl, beginNUM ,endNUM):
    ...
    if beginNUM ==endNUM:
        ...
if __name__ == '__main__':
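The program is built on Beautiful Soup, a Python library whose main job is extracting data from web pages (it parses HTML that has already been fetched). For a detailed write-up, see this article (https://cuiqingcai.com/1319.html/comment-page-1#comments).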
Install Beautiful Soup
pip install beautifulsoup4
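One thing that trips people up: the package is published on PyPI as beautifulsoup4, but the module you import is bs4. Here is a small sketch for checking whether the install worked, using only the standard library. The helper name `is_installed` is made up for this illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # 'bs4' is the import name of the beautifulsoup4 distribution.
    if is_installed("bs4"):
        print("bs4 is available")
    else:
        print("bs4 is missing; run: pip install beautifulsoup4")
```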
Import the package
from bs4 import BeautifulSoup
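To get a feel for what Beautiful Soup does before pointing it at the real site, here is a minimal sketch that parses a hand-written HTML string. The `big-pic`, `nowpage` and `totalpage` names are the ones this program looks for; the sample markup itself, the `img` tag and the `parse_page` helper are assumptions made for illustration, and the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Made-up HTML that imitates the structure the scraper expects.
SAMPLE_HTML = """
<div id="big-pic">
  <a href="/mmtp/example_2.html"><img src="http://example.com/1.jpg" alt="sample set"></a>
</div>
<span class="nowpage">1</span>/<span class="totalpage">8</span>
"""


def parse_page(html):
    """Pull out the next-page link, the image info and the page counters."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", attrs={"id": "big-pic"})
    link = div.find("a")
    img = div.find("img")
    return {
        "next_href": link["href"],
        "img_src": img["src"],
        "img_alt": img["alt"],
        "nowpage": int(soup.find("span", attrs={"class": "nowpage"}).get_text()),
        "totalpage": int(soup.find("span", attrs={"class": "totalpage"}).get_text()),
    }


if __name__ == "__main__":
    print(parse_page(SAMPLE_HTML))
```

`find` returns the first matching tag (or None), `find_all` returns every match, and `tag["attr"]` reads an attribute, which is all this scraper really needs.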
Create the save path
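import os
import urllib.request
def Download(url,picAlt,name):
    # One folder per image set, named by picAlt
    path = 'D:\\pythonD爬虫妹子图\\'+picAlt+'\\'
    if not os.path.exists(path):
        os.makedirs(path)
    # Fetch the image and save it as <name>.jpg inside that folder
    urllib.request.urlretrieve( url, '{0}{1}.jpg'.format(path, name))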
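The Download function builds the folder path by string concatenation, which breaks if picAlt contains a character that Windows does not allow in file names (for example `?` or `:`). Below is one possible way to harden that step, as a sketch rather than the original author's code: `build_save_path` and its parameters are hypothetical names, and replacing forbidden characters with `_` is just one reasonable choice.

```python
import os
import re
import tempfile


def build_save_path(base_dir, pic_alt, name):
    """Create <base_dir>/<sanitized pic_alt>/ and return the .jpg path inside it."""
    # Replace characters that Windows forbids in file and folder names.
    safe_alt = re.sub(r'[\\/:*?"<>|]', "_", pic_alt).strip()
    folder = os.path.join(base_dir, safe_alt)
    # exist_ok=True makes the call a no-op when the folder already exists.
    os.makedirs(folder, exist_ok=True)
    return os.path.join(folder, "{0}.jpg".format(name))


if __name__ == "__main__":
    # A temporary directory stands in for the real download folder.
    demo_base = tempfile.mkdtemp()
    print(build_save_path(demo_base, "set: one?", 1))
```

The returned path could then be passed as the second argument of `urllib.request.urlretrieve`.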
import urllib.request
from bs4 import BeautifulSoup
import os
def Download(url,picAlt,name):
    # One folder per image set, named by picAlt
    path = 'D:\\pythonD爬虫妹子图\\'+picAlt+'\\'
    if not os.path.exists(path):
        os.makedirs(path)
    # Fetch the image and save it as <name>.jpg inside that folder
    urllib.request.urlretrieve( url, '{0}{1}.jpg'.format(path, name))
# Request headers that make the script look like an ordinary browser
header = {
    "User-Agent":'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36',
    'Accept': '*/*',
    'Accept-Language': 'en-US,en;q=0.8',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive'
    }
def run(targetUrl, beginNUM ,endNUM):
    req = urllib.request.Request(url=targetUrl,headers=header)
    response = urllib.request.urlopen(req)
    # The page is decoded as gb2312; undecodable bytes are dropped
    html = response.read().decode('gb2312','ignore')
    soup = BeautifulSoup(html, 'html.parser')
    Divs = soup.find_all('div',attrs={'id':'big-pic' })
    nowpage = soup.find('span',attrs={'class':'nowpage'}).get_text()
    totalpage= soup.find('span',attrs={'class':'totalpage'}).get_text()
    # Stop once the requested number of images has been reached
    if beginNUM ==endNUM :
        return
    for div in Divs:
        beginNUM = beginNUM+1
        if div.find("a") is None :
            print("No next image")
            return
        # .get() avoids a KeyError when the <a> tag has no href attribute
        elif not div.find("a").get('href'):
            print("No next image (None)")
            return
        print("Download info: overall progress:",beginNUM,"/",endNUM,", downloading image set: (",nowpage,"/",totalpage,")")
        if int(nowpage)