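In this tutorial you will learn how to connect pandas to a MySQL database, read a table into a DataFrame, export the result to files, and write a DataFrame back to the database.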
# Import libraries
import pandas as pd
import sys
from sqlalchemy import create_engine
# Parameters
TableName = "xxx"
DB = {
'drivername': 'mysql+pymysql',
# 'servername': 'DAVID-THINK',
'ip':'xxx',
'port': xxx,
'username': 'xxx',
'password': 'xxx',
'database': 'xxx',
'encode':'charset=utf8',
# 'driver': 'SQL Server Native Client 11.0',
# 'trusted_connection': 'yes',
# 'legacy_schema_aliasing': False
}
# Create the connection
# 'mysql+pymysql://wl:qwe123@http://106.14.148.168:3306/hello_D?charset=utf8'
# connect_info = 'mysql+pymysql://username:passwd@host:3306/dbname?charset=utf8'
engine = create_engine(
    f"{DB['drivername']}://{DB['username']}:{DB['password']}"
    f"@{DB['ip']}:{DB['port']}/{DB['database']}?{DB['encode']}"
)
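The key to connecting to the database is creating an engine, which acts as the bridge between your code and the database. This is done with the create_engine() function from the sqlalchemy library. Its first argument has the form 'mysql+pymysql://username:passwd@host:3306/dbname?charset=utf8', where username is the MySQL user name, passwd is the password, and host is the IP address (127.0.0.1 for a local server, or the server's own IP address for a remote connection). The port is usually 3306, dbname is the database name, and charset=utf8 helps prevent Chinese text from coming out garbled.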
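One detail the URL format leaves implicit: if the user name or password contains characters such as @ or :, they need to be percent-encoded before they go into the URL, otherwise they are read as delimiters. A minimal sketch using only the standard library follows. The helper name build_mysql_url and the sample credentials are made up for illustration and are not part of the original code.

```python
from urllib.parse import quote_plus


def build_mysql_url(username, password, host, port, database, charset="utf8"):
    """Assemble a 'mysql+pymysql://...' URL string (hypothetical helper)."""
    # Percent-encode the credentials so characters such as '@' or ':'
    # are not mistaken for URL delimiters.
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mysql+pymysql://{user}:{pwd}@{host}:{port}/{database}?charset={charset}"


# Made-up credentials, for illustration only.
url = build_mysql_url("demo_user", "p@ss:word", "127.0.0.1", 3306, "demo_db")
print(url)
```

The resulting string can be passed to create_engine() in the same way as a hand-written URL.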
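Use pandas' read_sql() function to load the data. Its first parameter, sql, can be a table name or an SQL statement; its second parameter, con, is the engine object. The optional index_col parameter selects a column to use as the index.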
sql = "SELECT * FROM {}".format(TableName)
df = pd.read_sql(sql=sql, con=engine)
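To see what index_col does without a MySQL server at hand, here is a small sketch that swaps in an in-memory SQLite database (an assumption made purely so the example is runnable; the table name student and its rows are invented). With a plain sqlite3 connection, read_sql() has to be given a query rather than a bare table name.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for MySQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Ann", 23), (2, "Bob", 31)],
)
conn.commit()

# index_col turns the 'id' column into the DataFrame index
# instead of keeping it as an ordinary column.
df_demo = pd.read_sql("SELECT * FROM student", con=conn, index_col="id")
print(df_demo)
conn.close()
```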
Besides read_sql(), pandas also provides read_sql_table() and read_sql_query() for reading data from MySQL; see related articles and blog posts for details.
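As an illustration of read_sql_query(), the sketch below runs a parameterized query. It again assumes an in-memory SQLite database in place of MySQL, with an invented student table. The placeholder syntax ('?' here) depends on the database driver, so it would likely differ for pymysql.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for MySQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Ann", 23), (2, "Bob", 31), (3, "Cid", 19)],
)
conn.commit()

# Passing values through params keeps them out of the SQL string itself,
# which avoids quoting mistakes and SQL injection.
adults = pd.read_sql_query(
    "SELECT name, age FROM student WHERE age >= ? ORDER BY age",
    con=conn,
    params=(21,),
)
print(adults)
conn.close()
```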
df.to_csv('sql.csv',index = False)
df.to_excel('sql.xlsx',index = False)
df.to_csv('sql.txt',index = False)
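Writing data into the database uses the pandas.DataFrame.to_sql() method. Its main parameters are name (the target table name), con (the engine object), if_exists (what to do when the table already exists: 'fail', 'replace', or 'append'), and index (whether to write the DataFrame index as a column).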
from datetime import datetime
data_dict = {
'id':56,
'name':'牛逼',
'age':23,
'sex':0,
'qq':'154745845',
'tel':'15748586589',
'period':'第四期',
'course':'机器学习',
'c_time':datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
'is_delete':0
}
data = pd.DataFrame(data=data_dict,index = [0])
print(data)
Check the type of each column; note that c_time is not a datetime type.
data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1 entries, 0 to 0
Data columns (total 10 columns):
age          1 non-null int64
c_time       1 non-null object
course       1 non-null object
id           1 non-null int64
is_delete    1 non-null int64
name         1 non-null object
period       1 non-null object
qq           1 non-null object
sex          1 non-null int64
tel          1 non-null object
dtypes: int64(4), object(6)
memory usage: 88.0+ bytes
Here pd.to_datetime() can be used to convert the column to a datetime type.
data['c_time'] = pd.to_datetime(data['c_time'])
data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1 entries, 0 to 0
Data columns (total 10 columns):
age          1 non-null int64
c_time       1 non-null datetime64[ns]
course       1 non-null object
id           1 non-null int64
is_delete    1 non-null int64
name         1 non-null object
period       1 non-null object
qq           1 non-null object
sex          1 non-null int64
tel          1 non-null object
dtypes: datetime64[ns](1), int64(4), object(5)
memory usage: 88.0+ bytes
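When the strings come from a less controlled source, the conversion can be made more explicit. The sketch below is an illustration with invented sample values: it passes the same format string used with strftime() earlier, and errors="coerce" so that unparseable values become NaT instead of raising an exception.

```python
import pandas as pd

# Invented sample: one well-formed timestamp string and one bad value.
frame = pd.DataFrame({"c_time": ["2024-01-05 08:30:00", "not a date"]})

# format= states the expected layout; errors="coerce" maps anything that
# does not match to NaT (the datetime equivalent of a missing value).
frame["c_time"] = pd.to_datetime(
    frame["c_time"], format="%Y-%m-%d %H:%M:%S", errors="coerce"
)
print(frame.dtypes)
print(frame)
```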
Write the data to the database:
data.to_sql(TableName,engine,if_exists='append',index = False)
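The effect of if_exists is easy to check on a throwaway database. The following sketch assumes an in-memory SQLite connection instead of the MySQL engine, and the table name student and its rows are invented for the demonstration.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for MySQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
rows = pd.DataFrame({"id": [1, 2], "name": ["Ann", "Bob"]})

# The first call creates the table; index=False skips the DataFrame index.
rows.to_sql("student", conn, if_exists="append", index=False)
# 'append' adds the same rows again instead of raising or overwriting.
rows.to_sql("student", conn, if_exists="append", index=False)
count_after_append = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

# 'replace' drops the existing table and recreates it from the DataFrame.
rows.to_sql("student", conn, if_exists="replace", index=False)
count_after_replace = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

print(count_after_append, count_after_replace)
conn.close()
```

Since 'append' never checks for duplicates, rerunning the same insert doubles the rows; uniqueness has to be enforced by the table's own keys.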