kNN Regression Example: Predicting House Prices

Contents

  • Hands-on Walkthrough
  • References

Hands-on Walkthrough

Predicting house prices

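The code below loads the listings data, converts the price column to numbers, scales the features, fits a kNN regressor and reports the RMSE on the test set: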

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import preprocessing
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

#load csv
cols = ['accommodates','bedrooms','bathrooms','beds','price','minimum_nights','maximum_nights','number_of_reviews']
features = ['accommodates','bedrooms','bathrooms','beds','minimum_nights','maximum_nights','number_of_reviews']
dataset = pd.read_csv('listings.csv')
# keep only the columns of interest, drop rows with missing values, and take an explicit copy
dataset = dataset[cols].dropna().copy()

#analyze dataset roughly
print(dataset.head())
print(dataset.shape)
print(dataset.dtypes)
print(dataset.columns)

# change types of labels: strip '$' and ',' from the price strings, then convert to float
dataset['price'] = dataset['price'].str.replace(r'[$,]', '', regex=True).astype(float)

# analyze dataset through plot 
# only needs to be run once; can be skipped afterwards
'''
sns.pairplot(dataset)
plt.show()
'''
#cut dataset: shuffle whole rows first so that features and labels stay aligned
dataset = dataset.sample(frac=1, random_state=0)
train = dataset.iloc[:2792]
test = dataset.iloc[2792:]
label_train = train.price
label_test = test.price

# preprocessing minmaxscaler: fit on the training features only, then apply to both sets
min_max_scaler = preprocessing.MinMaxScaler()
train_minmax = min_max_scaler.fit_transform(train[features])
test_minmax = min_max_scaler.transform(test[features])
print(pd.DataFrame(train_minmax, columns=features).head())

#training
knn = KNeighborsRegressor()
knn.fit(train_minmax,label_train)

#evaluating
label_predictions = knn.predict(test_minmax)
print(len(label_predictions))
print(label_predictions.shape)
label_mse = mean_squared_error(label_test, label_predictions)
rmse = label_mse ** (1/2)
print("RMSE:%f"%rmse)
print(pd.Series(label_predictions))
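
With its default arguments, KNeighborsRegressor() predicts a listing's price as the plain average of the prices of its 5 nearest training listings, using Euclidean distance on the scaled features. The NumPy sketch below re-implements that idea to show roughly what happens inside knn.predict. The function name knn_regress and the toy data are hypothetical, and the sketch skips sklearn details such as tree-based neighbor search and tie handling.

```python
import numpy as np

def knn_regress(X_train, y_train, X_query, k=5):
    """Hypothetical helper: predict each query point as the mean label
    of its k nearest training points (uniform weights, Euclidean distance)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    # pairwise Euclidean distances, shape (n_query, n_train)
    dists = np.sqrt(((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2))
    # indices of the k smallest distances in each row
    nearest = np.argsort(dists, axis=1)[:, :k]
    # uniform weighting: plain average of the neighbors' labels
    return y_train[nearest].mean(axis=1)

# toy data (made up): one feature, five training points
X_train = [[0], [1], [2], [3], [10]]
y_train = [100, 110, 120, 130, 500]
print(knn_regress(X_train, y_train, [[1.2], [9]], k=3))  # → [110. 250.]
```

For the query 1.2 the three nearest points are 1, 2 and 0 (prices 110, 120, 100); for 9 they are 10, 3 and 2 (prices 500, 130, 120). The second case shows how a single distant, expensive neighbor can pull a prediction upward.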
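
Because kNN relies on distances, unscaled columns with large ranges (maximum_nights, for example) would tend to dominate the Euclidean distance, which is why the features are min-max scaled first. MinMaxScaler maps each column to x' = (x - min) / (max - min), and the inverse x = x' * (max - min) + min recovers the original values. The NumPy sketch below spells this out with hypothetical function names; in sklearn the corresponding calls are fit_transform, transform and inverse_transform on a MinMaxScaler.

```python
import numpy as np

def fit_min_max(X_train):
    """Hypothetical helper: learn per-column min and range from training rows only."""
    X_train = np.asarray(X_train, dtype=float)
    col_min = X_train.min(axis=0)
    col_range = X_train.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid division by zero for constant columns
    return col_min, col_range

def min_max_transform(X, col_min, col_range):
    # x' = (x - min) / (max - min), always using the training min and range
    return (np.asarray(X, dtype=float) - col_min) / col_range

def min_max_inverse(X_scaled, col_min, col_range):
    # x = x' * (max - min) + min recovers the original values
    return np.asarray(X_scaled, dtype=float) * col_range + col_min

col_min, col_range = fit_min_max([[1, 10], [3, 30], [5, 50]])
scaled_test = min_max_transform([[7, 20]], col_min, col_range)
print(scaled_test)  # a test value can fall outside [0, 1]
print(min_max_inverse(scaled_test, col_min, col_range))
```

Learning min and max from the training rows only keeps information about the test set out of the preprocessing step; the price of this choice is that test features may scale to values slightly below 0 or above 1.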
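
The script reports RMSE by taking the square root of mean_squared_error, so the score is in the same units as the price column and can be read as a typical prediction error. A small hand-checkable version, using a hypothetical rmse helper and made-up numbers:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Hypothetical helper: root mean squared error, sqrt(mean((y_true - y_pred) ** 2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# errors are -10, 10 and -30, so RMSE = sqrt((100 + 100 + 900) / 3)
print(round(rmse([100, 200, 300], [110, 190, 330]), 2))  # → 19.15
```

Because errors are squared before averaging, the single error of 30 contributes most of the total, so RMSE penalizes a few large misses more heavily than many small ones.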

References

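Classic Algorithms: K-Nearest Neighbors (Regression)
A Detailed Guide to sns.lmplot()
Python: How to Drop One or More Columns from a pandas DataFrame
Adding an Empty Dimension in numpy
What Do reshape(-1,1) and reshape(1,-1) Mean?
How to Do Multivariate Analysis with seaborn?
How to Recover Normalized Values?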
