Pandas

For each of the four datasets...

  • Compute the mean and variance of both x and y
  • Compute the correlation coefficient between x and y
  • Compute the linear regression line: y = β0 + β1x + ε (hint: use statsmodels and look at the Statsmodels notebook)
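import pandas as pd
from statsmodels.formula.api import ols

# anscombe.csv is assumed to have columns: dataset (I, II, III, IV), x, y
anscombe = pd.read_csv('anscombe.csv')

# Compute the statistics separately for each of the four datasets
for name, group in anscombe.groupby('dataset'):
    print('Dataset', name)
    print('Mean of x:', group['x'].mean())
    print('Variance of x:', group['x'].var())  # .var(), not .std()
    print('Mean of y:', group['y'].mean())
    print('Variance of y:', group['y'].var())
    print('Correlation coefficient between x and y:', group['x'].corr(group['y']))

    # Regress y on x (response ~ predictor), not x on y
    model = ols('y ~ x', data=group).fit()
    print(model.summary())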
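As a cross-check on the statsmodels fit, here is a minimal standard-library sketch that computes the same quantities for each dataset using the closed-form least-squares formulas: slope β1 = cov(x, y) / var(x) and intercept β0 = ȳ − β1·x̄. It assumes the classic values from Anscombe (1973), hard-coded instead of read from anscombe.csv, so it does not depend on that file's column layout. The names `quartet` and `summarize` are illustrative only.

```python
from statistics import mean, variance

# Anscombe's quartet (Anscombe, 1973); datasets I-III share the same x values
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    'I':   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    'II':  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    'III': (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    'IV':  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Sample mean/variance, Pearson r and the least-squares line y = b0 + b1*x."""
    mx, my = mean(x), mean(y)
    vx, vy = variance(x), variance(y)  # sample variance (n - 1 denominator)
    # Sample covariance, using the same n - 1 denominator as variance()
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    r = cov / (vx * vy) ** 0.5  # Pearson correlation coefficient
    b1 = cov / vx               # slope = cov(x, y) / var(x)
    b0 = my - b1 * mx           # the fitted line passes through (mean x, mean y)
    return {'mean_x': mx, 'var_x': vx, 'mean_y': my, 'var_y': vy,
            'r': r, 'b0': b0, 'b1': b1}


results = {name: summarize(x, y) for name, (x, y) in quartet.items()}
for name, s in results.items():
    print(name, {k: round(v, 3) for k, v in s.items()})
```

All four datasets come out almost identical: mean of x 9, variance of x 11, mean of y about 7.50, variance of y about 4.12 to 4.13, r about 0.816, and a fitted line of roughly y = 3.00 + 0.500x. That is exactly why the next step plots them.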

Using Seaborn, visualize all four datasets.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

anscombe = pd.read_csv('anscombe.csv')

# One panel per dataset in a 2x2 grid: scatter plot plus fitted regression line
sns.lmplot(x='x', y='y', col='dataset', hue='dataset', data=anscombe,
           col_wrap=2, ci=None, height=3)
plt.show()
