5-AM Project: day8 Practical data science with Python 4

Chapter 5 Exploratory Data Analysis and Visualization

  • EDA and visualization libraries in Python
  • Performing EDA with Seaborn and pandas
  • Using EDA Python packages
  • Using visualization best practices
  • Making plots with Plotly

Performing EDA with Seaborn and pandas

Making boxplots and letter-value plots

import matplotlib.pyplot as plt
df['Minutes'].plot.box()
plt.show()

f = plt.figure(figsize=(5.5, 5.5))  # this changes the size of the image -- more on this in chapter 5
f.patch.set_facecolor('w')  # sets background color behind axis labels
df['Minutes'].plot.box()
plt.tight_layout()  # auto-adjust margins

Making histograms and violin plots

import seaborn as sns
sns.histplot(x=df['Minutes'], kde=True)  # kde=True overlays a kernel density estimate

Let's look at a few groups of data at once with a violin plot. Let's first select the top five genres by number of songs and create a separate DataFrame with only this data:

top_5_genres = df['Genre'].value_counts().index[:5]
top_5_data = df[df['Genre'].isin(top_5_genres)]
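
The selection above is meant to feed a violin plot, but the plotting call itself is not in these notes. A hedged sketch of the whole step might look like this; the genre names, counts, and `demo_df` are made-up placeholders, not the real data.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: seven genres with different numbers of songs.
rng = np.random.default_rng(0)
genres = ['Rock', 'Latin', 'Metal', 'Jazz', 'Blues', 'Pop', 'Classical']
counts = [300, 200, 150, 100, 80, 20, 10]
demo_df = pd.DataFrame({
    'Genre': np.repeat(genres, counts),
    'Minutes': rng.lognormal(mean=1.4, sigma=0.4, size=sum(counts)),
})

# value_counts() sorts by frequency, so the first five index entries
# are the five most common genres.
top_5_genres = demo_df['Genre'].value_counts().index[:5]
top_5_data = demo_df[demo_df['Genre'].isin(top_5_genres)]

# One violin per genre, with song length on the x-axis.
ax = sns.violinplot(data=top_5_data, x='Minutes', y='Genre')
```

Putting the category on the y-axis keeps the genre labels horizontal and readable.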

Making scatter plots with Matplotlib and Seaborn

plt.scatter(df['Minutes'], df['MB'])
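
The same plot can be made with Seaborn, which the heading mentions but the notes do not show. This is a sketch on synthetic data; `demo_df` and the assumed relationship between `Minutes` and `MB` are illustrative only.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: file size grows roughly linearly with song length.
rng = np.random.default_rng(1)
minutes = rng.lognormal(mean=1.4, sigma=0.4, size=300)
demo_df = pd.DataFrame({
    'Minutes': minutes,
    'MB': minutes * 1.0 + rng.normal(0, 0.3, size=300),
})

# Seaborn takes column names and labels the axes from them;
# alpha adds transparency so overlapping points stay visible.
ax = sns.scatterplot(data=demo_df, x='Minutes', y='MB', alpha=0.5)
```

Unlike the bare `plt.scatter` call, this version labels both axes automatically.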

Examining correlations and making correlograms

sns.pairplot(data=df)
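
A pairplot shows the relationships visually; the correlations themselves can also be computed and drawn as a heatmap. The sketch below assumes a synthetic `demo_df` with made-up columns, and the heatmap is one common way to draw a correlogram, not necessarily the one the book uses.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical numeric columns; MB is constructed to track Minutes closely.
rng = np.random.default_rng(2)
minutes = rng.lognormal(mean=1.4, sigma=0.4, size=500)
demo_df = pd.DataFrame({
    'Minutes': minutes,
    'MB': minutes * 1.0 + rng.normal(0, 0.3, size=500),
    'UnitPrice': rng.choice([0.99, 1.99], size=500),
})

# Pearson correlation between every pair of numeric columns.
corr = demo_df.select_dtypes('number').corr()

# Fix the color scale to [-1, 1] so colors are comparable across plots.
ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap='coolwarm')
```

Selecting numeric columns first avoids errors when the DataFrame also holds text columns such as genre or composer.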

Making missing value plots

import missingno as msno
msno.matrix(df)

This shows a matrix of non-missing values in gray and missing values in white. Each row of the DataFrame is drawn as a thin horizontal line across the columns. From this, we see that the Composer column has several missing values, but none of the other columns are missing any values. The sparkline on the right side shows how complete each row is and labels the minimum and maximum number of non-missing values per row. In our case, 7 means the minimum number of non-missing values in a row is 7, and the maximum number of non-missing values in a row is 8.
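
The numbers read off the missing value plot can be checked directly with pandas. This is a small sketch on a made-up three-column DataFrame (`demo_df` is an assumption, not the real data), so the counts differ from the 7 and 8 quoted above.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps only in the Composer column.
demo_df = pd.DataFrame({
    'Track': ['a', 'b', 'c', 'd'],
    'Composer': ['X', np.nan, 'Y', None],
    'Minutes': [3.2, 4.1, 5.0, 2.7],
})

# Missing values per column: what the white gaps in the matrix show.
missing_per_column = demo_df.isna().sum()

# Non-missing values per row: what the sparkline summarizes.
complete_per_row = demo_df.notna().sum(axis=1)

print(missing_per_column)
print(complete_per_row.min(), complete_per_row.max())
```

Here the least complete row has 2 non-missing values and the most complete has 3, which correspond to the two labels on the sparkline.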

Using EDA Python packages

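# Newer releases of this package are published as ydata-profiling (import from ydata_profiling).
from pandas_profiling import ProfileReport

report = ProfileReport(df)

report  # in a Jupyter notebook, this renders the report inline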

Using visualization best practices

Saving plots for sharing and reports
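
The notes stop at this heading, so the following is only a hedged sketch of one common way to save a Matplotlib figure. The sample values, file name, and temporary output directory are all placeholders.

```python
import os
import tempfile

import matplotlib.pyplot as plt

# Hypothetical song lengths in minutes.
fig, ax = plt.subplots(figsize=(5.5, 5.5))
ax.boxplot([3.2, 4.1, 5.0, 2.7, 9.5])
ax.set_ylabel('Minutes')

# A white figure background keeps axis labels readable on dark pages.
fig.patch.set_facecolor('w')
fig.tight_layout()

# Placeholder output location; in practice this would be a project folder.
out_path = os.path.join(tempfile.mkdtemp(), 'minutes_boxplot.png')

# A higher dpi gives a sharper image for reports and slides.
fig.savefig(out_path, dpi=300, facecolor=fig.get_facecolor())
```

Setting the figure size and background before saving, as in the boxplot example earlier, means the saved file matches what was tuned on screen.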
