Python check-in day 8 @浙大疏锦行

Today's task is to continue preprocessing the data:

1. Perform label encoding

2. Normalize and standardize the continuous features

一、Introduction to dictionaries

dict={"name":"豆包","sex":"male","age":"19"} 
dict["name"] 

二、Label encoding

import pandas as pd

data = pd.read_csv(r"heart.csv")

# Separate the column names into discrete and continuous features
discrete_features = ["sex", "cp", "fbs", "restecg", "exang", "slope", "ca", "thal"]
continuous_features = []
for i in data.columns:
    if i not in discrete_features:
        continuous_features.append(i)
continuous_features

# Keep an untouched copy so the newly created dummy columns can be identified later
data2 = pd.read_csv(r"heart.csv")

# Multi-class nominal features get one-hot encoded; drop_first avoids a redundant dummy column
dd = ["cp", "restecg", "ca", "thal"]
db = ["slope"]  # slope is treated as ordinal and mapped separately below
data = pd.get_dummies(data, columns=dd, drop_first=True)
data

# Columns that exist now but not in the original data are the new dummy columns
list_finall = []
for i in data.columns:
    if i not in data2.columns:
        list_finall.append(i)
list_finall

# Convert the boolean dummy columns to 0/1 integers
for i in list_finall:
    data[i] = data[i].astype(int)
data

# Encode the ordinal feature "slope" with a mapping dictionary
# (named slope_map instead of dict, to avoid shadowing the built-in)
data["slope"].value_counts()
slope_map = {0: "0", 1: "1", 2: "2"}
slope_map
data["slope"] = data["slope"].map(slope_map)
data["slope"]

三、Handling continuous variables

(1)Normalization (min-max scaling)

from sklearn.preprocessing import MinMaxScaler

# Min-max scaling with sklearn: rescales "age" into the [0, 1] range
min_max_scaler = MinMaxScaler()
data['age'] = min_max_scaler.fit_transform(data[['age']])
data['age'].head()

# The same transformation written by hand
# (the parameter is named series to avoid shadowing the DataFrame called data)
def manual_normalize(series):
    min_val = series.min()
    max_val = series.max()
    normalized_data = (series - min_val) / (max_val - min_val)
    return normalized_data

# Applying it to the already-scaled column leaves the values unchanged, since they are already in [0, 1]
data['age'] = manual_normalize(data['age'])
data['age'].head()
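In practice all continuous columns are usually scaled together. A minimal sketch, assuming a freshly loaded heart.csv and that the column names listed below really are its continuous features:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv(r"heart.csv")
# Assumed continuous columns of heart.csv; adjust to the actual file
cont_cols = ["age", "trestbps", "chol", "thalach", "oldpeak"]
scaler = MinMaxScaler()
# fit_transform accepts a 2-D selection, so several columns can be scaled in one call
df[cont_cols] = scaler.fit_transform(df[cont_cols])
df[cont_cols].head()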

(2)Standardization (z-score)

from sklearn.preprocessing import StandardScaler

# Z-score standardization: subtract the mean and divide by the standard deviation
scaler = StandardScaler()
data['age'] = scaler.fit_transform(data[['age']])
data['age'].head()
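For symmetry with manual_normalize above, here is a hand-written z-score sketch; the helper name manual_standardize is my own, and pandas' .std() uses the sample standard deviation (ddof=1), so the result can differ very slightly from StandardScaler:

# A hand-written z-score sketch; manual_standardize is a hypothetical helper name
def manual_standardize(series):
    mean_val = series.mean()
    std_val = series.std()  # pandas default is ddof=1; StandardScaler uses ddof=0
    return (series - mean_val) / std_val

data['age'] = manual_standardize(data['age'])
data['age'].head()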
