Logging in to a GitHub account with a Python crawler and listing all its projects

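Here is something I learned today. A Python crawler can log in to a GitHub account and collect information about all of that account's projects.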

First, the environment: I use Python 3.5 or later.

Next, install requests:

pip install requests

Then install Beautiful Soup:

pip install beautifulsoup4

Now import the two libraries, requests and BeautifulSoup:

from bs4 import BeautifulSoup
import requests

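Request the GitHub login page:

url = requests.get('https://github.com/login')

The response is stored in the variable url. Despite its name, this variable holds a Response object, not a URL string.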

Parse the page source with BeautifulSoup:

soup1 = BeautifulSoup(url.text,features='html.parser')

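Find the input tag named authenticity_token and read its value. This is the hidden CSRF token that the login form must send back:

tag = soup1.find(name='input', attrs={'name': 'authenticity_token'})
authenticity_token = tag.get('value')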
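The sketch below runs the same `find`/`get` calls on a hand-written HTML snippet, so you can see what the token lookup does without any network access. The snippet and the helper name `extract_authenticity_token` are illustrative assumptions and do not come from GitHub's real page.

```python
from bs4 import BeautifulSoup


def extract_authenticity_token(html):
    """Return the value of the hidden authenticity_token input, or None."""
    soup = BeautifulSoup(html, features='html.parser')
    # Match on the input's name attribute, as the tutorial does.
    tag = soup.find(name='input', attrs={'name': 'authenticity_token'})
    return tag.get('value') if tag is not None else None


# Hypothetical login-page fragment, used only for this demonstration.
sample_html = '''
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="abc123">
  <input type="text" name="login">
</form>
'''
print(extract_authenticity_token(sample_html))  # abc123
```

Returning None instead of raising an error lets the caller detect a page whose layout has changed.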

Save the cookies from the login-page response:

c1 = url.cookies.get_dict()

Fill in the form fields to submit:

form_data = {
    'authenticity_token': authenticity_token,
    'utf8': '',
    'commit': 'Sign in',
    'login': "",     # your GitHub username or email
    'password': ""   # your GitHub password
}
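A hard-coded field list only works while the login form keeps exactly those fields. A more defensive approach copies every hidden input in the form and then adds the credentials. The sketch below assumes the login form is the `<form>` whose action is `/session`. `build_login_form` and the sample HTML are hypothetical, and GitHub's real form may contain other fields or require extra verification that this does not handle.

```python
from bs4 import BeautifulSoup


def build_login_form(html, login, password):
    """Collect all hidden inputs of the login form, then add credentials."""
    soup = BeautifulSoup(html, features='html.parser')
    # Assumption: the login form is the one that posts to /session.
    form = soup.find(name='form', attrs={'action': '/session'})
    if form is None:
        raise ValueError('login form not found')
    data = {}
    for tag in form.find_all(name='input', attrs={'type': 'hidden'}):
        name = tag.get('name')
        if name:  # unnamed inputs are never submitted by a browser
            data[name] = tag.get('value', '')
    data['login'] = login
    data['password'] = password
    data.setdefault('commit', 'Sign in')
    return data


# Hypothetical form fragment for demonstration only.
sample_html = '''
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="abc123">
  <input type="hidden" name="return_to">
  <input type="text" name="login">
  <input type="password" name="password">
  <input type="submit" name="commit" value="Sign in">
</form>
'''
print(build_login_form(sample_html, 'me@example.com', 'secret'))
```

Hidden fields the tutorial does not list (such as `return_to` in the sample) are picked up automatically. The visible `login` and `password` inputs are filled in explicitly.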

Submit the form to https://github.com/session, sending the cookies saved earlier:

i2 = requests.post('https://github.com/session',data=form_data,cookies=c1)

Get the cookies set by the login response:

c2 = i2.cookies.get_dict()

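Merge them into the earlier cookies. For a cookie name present in both, the value from the login response wins: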

c1.update(c2)
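Passing cookie dicts around by hand works, but `requests.Session` keeps a cookie jar that is updated from every response and sent with every later request, which makes the `get_dict`/`update` steps unnecessary. The sketch below redoes the same three requests that way. The function name `login_and_fetch_repositories` is hypothetical, and the form fields are copied from the tutorial. GitHub may now require extra fields or verification that this does not handle.

```python
import requests
from bs4 import BeautifulSoup


def login_and_fetch_repositories(login, password, timeout=10):
    """Sketch of the tutorial's flow using one Session for all cookies."""
    session = requests.Session()

    # Step 1: the login page; any cookies it sets go into session.cookies.
    page = session.get('https://github.com/login', timeout=timeout)
    soup = BeautifulSoup(page.text, features='html.parser')
    tag = soup.find(name='input', attrs={'name': 'authenticity_token'})
    if tag is None:
        raise RuntimeError('authenticity_token not found on the login page')

    form_data = {
        'authenticity_token': tag.get('value'),
        'utf8': '',
        'commit': 'Sign in',
        'login': login,
        'password': password,
    }
    # Step 2: cookies from step 1 are sent automatically, and cookies from
    # this response are merged into the jar (same-name values replaced).
    session.post('https://github.com/session', data=form_data, timeout=timeout)

    # Step 3: the settings page, requested with the merged cookies.
    return session.get('https://github.com/settings/repositories',
                       timeout=timeout)
```

Calling `login_and_fetch_repositories('you@example.com', 'your-password')` returns the response for the settings page, which can then be parsed the same way as `i3`. The `timeout` argument is an addition not present in the tutorial. It keeps a stalled connection from hanging the script.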

Request the repositories settings page with the merged cookies:

i3 = requests.get('https://github.com/settings/repositories',cookies=c1)
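When the login fails, a request for a settings page is likely to be redirected to the sign-in page. Because `requests` follows redirects by default, the final URL of `i3` then serves as a cheap success check. This redirect behaviour is an assumption about GitHub that the tutorial does not verify, and `looks_logged_in` is a hypothetical helper.

```python
from urllib.parse import urlparse


def looks_logged_in(final_url):
    """Heuristic: a final URL on a sign-in page means the login failed."""
    path = urlparse(final_url).path
    # Assumed sign-in or verification paths: /login and anything under /session.
    return not (path.startswith('/login') or path.startswith('/session'))


print(looks_logged_in('https://github.com/settings/repositories'))  # True
print(looks_logged_in('https://github.com/login?return_to=%2Fsettings'))  # False
```

With the tutorial's variables, this would be `print(looks_logged_in(i3.url))`.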

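Parse the returned HTML with BeautifulSoup and locate the element that holds the repository list:

soup2 = BeautifulSoup(i3.text, features='html.parser')
list_group = soup2.find(name='div', class_='listgroup')  # container class when this was written; GitHub's markup may have changed

Then print each repository to check whether the login worked: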

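from bs4.element import Tag

for child in list_group.children:
    # .children also yields the whitespace strings between elements; skip them
    if isinstance(child, Tag):
        project_tag = child.find(name='a', class_='mr-1')  # repository link
        size_tag = child.find(name='small')                # repository size
        temp = 'project: %s (%s); path: %s' % (project_tag.string, size_tag.string, project_tag.get('href'))
        print(temp)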
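You can test the loop's parsing logic without logging in by running the same selectors on a hand-written fragment. The `listgroup`/`mr-1`/`small` structure in the sample is an assumption modelled on the selectors above, and GitHub's real markup may differ. `parse_repositories` is a hypothetical helper that returns tuples instead of printing.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def parse_repositories(html):
    """Return (name, size, href) tuples from the assumed listgroup markup."""
    soup = BeautifulSoup(html, features='html.parser')
    list_group = soup.find(name='div', class_='listgroup')
    repos = []
    if list_group is None:
        return repos
    for child in list_group.children:
        # .children also yields whitespace strings; keep only element nodes.
        if not isinstance(child, Tag):
            continue
        project_tag = child.find(name='a', class_='mr-1')
        size_tag = child.find(name='small')
        if project_tag is None or size_tag is None:
            continue
        repos.append((project_tag.get_text(strip=True),
                      size_tag.get_text(strip=True),
                      project_tag.get('href')))
    return repos


# Hypothetical fragment mimicking the structure the tutorial's loop expects.
sample_html = '''
<div class="listgroup">
  <div class="listgroup-item">
    <a class="mr-1" href="/alice/demo">demo</a>
    <small>12 KB</small>
  </div>
  <div class="listgroup-item">
    <a class="mr-1" href="/alice/notes">notes</a>
    <small>3 KB</small>
  </div>
</div>
'''
for name, size, href in parse_repositories(sample_html):
    print('project: %s (%s); path: %s' % (name, size, href))
```

Returning data instead of printing it makes the parser easy to reuse on `i3.text` and easy to check. Skipping items without a link or size keeps one odd entry from crashing the loop.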

Source code:

https://github.com/leebychina/sign.git
