Data Analysis

1. MATLAB
2. Python
3. NumPy

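I. What is NumPy?

  1. Numerical Python: a Python toolkit for numerical analysis;
  2. NumPy is an open-source scientific computing library;
  3. NumPy makes up for the weak numerical capabilities and slow speed of Python as a general-purpose programming language;
  4. NumPy offers a rich set of mathematical functions, powerful multidimensional arrays, and excellent computational performance;
  5. NumPy works well with other scientific computing libraries such as SciPy, scikit, and matplotlib;
  6. NumPy can replace tools such as MATLAB, allowing rapid development together with interactive prototyping.

Code: vector.py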
import time
import numpy as np

def o_vector_add(n: int) -> list:
    a, b = [], []
    for i in range(n):
        a.append(i ** 2)
        b.append(i ** 3)
    c = []
    for x, y in zip(a, b):
        c.append(x + y)
    return c

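def n_vector_add(n: int) -> np.ndarray:
    # Vectorized version: no explicit Python loop
    return np.arange(n) ** 2 + np.arange(n) ** 3

def runtime(func, *args) -> float:
    # Return the wall-clock time taken by one call to func(*args)
    start = time.perf_counter()
    func(*args)
    end = time.perf_counter()
    return end - start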

if __name__ == '__main__':
    t1 = runtime(o_vector_add, 1000000)
    t2 = runtime(n_vector_add, 1000000)
    print(t1, t2)
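
The timing script above discards the results, so it does not confirm that the two implementations agree. A minimal sketch of such a check follows. The function names, the small n, and the explicit dtype=np.int64 are assumptions added here: on platforms where NumPy's default integer is 32 bits wide, i ** 3 can overflow for large n, whereas Python integers never overflow.

```python
import numpy as np

def list_vector_add(n: int) -> list:
    # Pure-Python version: add each square and cube element by element
    return [i ** 2 + i ** 3 for i in range(n)]

def numpy_vector_add(n: int) -> np.ndarray:
    # Vectorized version with an explicit 64-bit integer type
    i = np.arange(n, dtype=np.int64)
    return i ** 2 + i ** 3

n = 1000
expected = list_vector_add(n)
actual = numpy_vector_add(n)
print(np.array_equal(actual, expected))
```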

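II. Multidimensional arrays

1. A multidimensional array in NumPy is an object of class numpy.ndarray and can represent an array of any number of dimensions;

2. Creating multidimensional array objects

2.1 numpy.arange(start, stop, step) -> one-dimensional array. The first element is start, the last element is the last value before stop, and step is the common difference between consecutive elements. start defaults to 0 and step defaults to 1.
2.2 numpy.array(any container that can be interpreted as an array)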

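3. Elements are stored contiguously in memory and are homogeneous (all of one type).

4. The ndarray.dtype attribute gives the data type of the elements. The element type can be specified with the dtype parameter and converted with the astype() method, which returns a new array.

5. The ndarray.shape attribute gives the dimensions of the array: (size of the highest dimension, ..., size of the lowest dimension)

Code: array.py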
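
The listing for array.py is not included in these notes. The following is a minimal sketch of the points above (arange, array, dtype, astype, and shape); the variable names and values are illustrative assumptions, not the original file.

```python
import numpy as np

a = np.arange(1, 10, 2)   # start=1, stop=10 (excluded), step=2
b = np.arange(5)          # start defaults to 0, step defaults to 1
c = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)  # dtype set at creation
d = c.astype(np.int16)    # astype() returns a converted copy

print(a)                  # [1 3 5 7 9]
print(b)                  # [0 1 2 3 4]
print(c.dtype, d.dtype)   # float32 int16
print(c.shape)            # (2, 3): 2 rows, 3 columns
```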

6. Element indexing

array[index]
array[row index][column index]
array[page index][row index][column index]
array[page index, row index, column index]
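
The two three-dimensional forms above select the same element. A small sketch, assuming a 2 x 3 x 4 array filled with 1 to 24 (the array and index values are chosen here for illustration):

```python
import numpy as np

a = np.arange(1, 25).reshape(2, 3, 4)   # 2 pages, 3 rows, 4 columns

page, row, col = 1, 2, 3
chained = a[page][row][col]    # one index per pair of brackets
combined = a[page, row, col]   # all indices in one pair of brackets

print(chained, combined)       # 24 24
print(a[0].shape)              # (3, 4): a single index selects a whole page
```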

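7. NumPy built-in types and custom types

7.1 NumPy built-in types

bool_       1-byte Boolean, True/False
int8        1-byte signed integer, -128 to 127
int16       2-byte signed integer
int32       4-byte signed integer
int64       8-byte signed integer
uint8       1-byte unsigned integer, 0 to 255
uint16      2-byte unsigned integer
uint32      4-byte unsigned integer
uint64      8-byte unsigned integer
float16     2-byte floating point
float32     4-byte floating point
float64     8-byte floating point
complex64   8-byte complex
complex128  16-byte complex
str_        string

7.2 Custom types

Use dtype to combine several NumPy built-in types, identical or different, into a compound type that serves as the element type of an array.

7.2.1 Besides the full names of the built-in types, a type can be written more compactly as a type code string

numpy.int8 -> i1
numpy.int16 -> i2
numpy.uint32 -> u4
numpy.float64 -> f8
numpy.complex128 -> c16
numpy.str_ -> U followed by the number of characters
numpy.bool_ -> ? (or b1; a bare b means int8)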
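
The type codes can be checked against the full type names with np.dtype. A short sketch (the pairs list is an illustrative helper, not part of the notes); it also shows that a bare 'b' denotes a signed byte rather than a Boolean:

```python
import numpy as np

# Each pair is (type code string, corresponding built-in type)
pairs = [
    ('i1', np.int8),
    ('i2', np.int16),
    ('u4', np.uint32),
    ('f8', np.float64),
    ('c16', np.complex128),
    ('?', np.bool_),
]
for code, full in pairs:
    print(code, np.dtype(code) == np.dtype(full))

print(np.dtype('U3').itemsize)  # 12: three characters of 4 bytes each
print(np.dtype('b'))            # int8, not a Boolean
```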

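7.2.2 For multi-byte integers, a byte-order prefix can be added

< -> little-endian: the low-order byte is stored at the low address
0x1234
L    H
0x34 0x12

= -> the processor's native byte order

> -> big-endian: the low-order byte is stored at the high address
0x1234
L    H
0x12 0x34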
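
The two byte layouts can be observed directly by dumping an array's raw memory. A minimal sketch, assuming a 2-byte unsigned integer (u2) as the element type:

```python
import numpy as np

little = np.array([0x1234], dtype='<u2')  # little-endian 2-byte unsigned integer
big = np.array([0x1234], dtype='>u2')     # big-endian 2-byte unsigned integer

# tobytes() returns the raw memory, lowest address first
print(little.tobytes().hex())  # 3412
print(big.tobytes().hex())     # 1234

# Both arrays still hold the same numeric value
print(little[0] == big[0])     # True
```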

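Code: dtype.py

import numpy as np

# 1) Comma-separated type codes; the fields get the default names f0, f1
a = np.array([('ABC', [1, 2, 3])], dtype='U3, 3i4')
print(a.dtype)
print(a[0]['f0'])
print(a[0]['f1'])
print(a[0]['f1'][0])

# 2) List of (field name, type, length or shape) tuples
b = np.array([('ABC', [1, 2, 3])], dtype=[
    ('name', np.str_, 3),
    ('scores', np.int32, 3)
])
print(b.dtype)
print(b[0]['name'])
print(b[0]['scores'])
print(b[0]['scores'][0])

# 3) Dict with parallel 'names' and 'formats' lists
c = np.array([('ABC', [1, 2, 3])], dtype={
    'names': ['name', 'scores'],
    'formats': ['U3', '3i4']
})
print(c.dtype)
print(c[0]['name'])
print(c[0]['scores'])
print(c[0]['scores'][0])

# 4) Dict mapping field name -> (type, byte offset); 'U3' occupies 12 bytes
d = np.array([('ABC', [1, 2, 3])], dtype={
    'name': ('U3', 0),
    'scores': ('3i4', 12)
})
print(d.dtype)
print(d[0]['name'])
print(d[0]['scores'])
print(d[0]['scores'][0])

# 5) Base type plus named views of its bytes. The base type code was lost
#    from the source; '<u2' (little-endian uint16) is assumed here so that
#    'lo' (offset 0) is the low byte and 'hi' (offset 1) is the high byte.
e = np.array([0x1234], dtype=(
    '<u2', {'lo': ('u1', 0), 'hi': ('u1', 1)}
))
print('{:x}'.format(e[0]))
print('{:x} {:x}'.format(e['lo'][0], e['hi'][0]))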

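8. Slicing

array[start:stop:step, start:stop:step, ...]
Default start: the first element (positive step) or the last element (negative step)
Default stop: one past the last element (positive step) or one before the first element (negative step)
Default step: 1
One or more consecutive dimensions at either end that all use the default slice can be abbreviated as "...".

Code: slice.py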

import numpy as np
a = np.arange(1, 10)
print(a)
print(a[:3]) # 1 2 3
print(a[3:6]) # 4 5 6
print(a[6:]) # 7 8 9
print(a[::-1]) # 9 8 7 6 5 4 3 2 1
print(a[:-4:-1]) # 9 8 7
print(a[-4:-7:-1]) # 6 5 4
print(a[-7::-1]) # 3 2 1

b = np.arange(1, 25).reshape(2, 3, 4)
print(b)
print(b[:, 0, 0]) # 1 13
print(b[0, :, :]) # 1 to 12, equivalent to b[0, ...]
print(b[0, 1, ::2]) # 5 7
print(b[:, :, 1])
print(b[:, 1])
print(b[-1, 1:, 2:])
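
The "..." abbreviation can be checked against the fully written slices. A brief sketch using the same 2 x 3 x 4 array as slice.py:

```python
import numpy as np

b = np.arange(1, 25).reshape(2, 3, 4)

# "..." expands to as many full slices as are needed
print(np.array_equal(b[0, ...], b[0, :, :]))   # True
print(np.array_equal(b[..., 1], b[:, :, 1]))   # True
print(b[..., 1])                               # second column of every page
```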

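9. Changing dimensions

9.1 View reshaping: obtain a view of an array object with different dimensions

array.reshape(new shape) -> view of the array with the new shape
array.ravel() -> one-dimensional view of the array
(Both return a view whenever the memory layout allows it and fall back to a copy otherwise.)

9.2 Copy reshaping: obtain a copy of an array object with different dimensions

array.flatten() -> one-dimensional copy of the array

9.3 In-place reshaping

array.shape = (new shape)
array.resize(new shape)

9.4 View transposition

array.transpose() -> transposed view of the array
array.T: transposed-view attribute
Transposition only has an effect on arrays with at least two dimensions; a one-dimensional array is returned unchanged.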
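
Whether a result is a view or a copy can be tested with np.shares_memory. A minimal sketch of the distinction drawn above (the variable names are illustrative):

```python
import numpy as np

a = np.arange(1, 9)
view = a.reshape(2, 4)     # view: shares memory with a
flat_view = view.ravel()   # view, because the data is contiguous
copy = view.flatten()      # copy: owns its own memory

print(np.shares_memory(a, view))       # True
print(np.shares_memory(a, flat_view))  # True
print(np.shares_memory(a, copy))       # False

a[0] = 100                 # visible through the views, not through the copy
print(view[0, 0], flat_view[0], copy[0])   # 100 100 1
print(np.shares_memory(a, view.T))     # True: transposition also gives a view
```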

Code: reshape.py

import numpy as np
a = np.arange(1, 9)
print(a)
b = a.reshape(2, 4)
print(b)
c = b.reshape(2, 2, 2)
print(c)
d = c.ravel()
print(d)
e = c.flatten()
print(e)
f = b.reshape(2, 2, 2).copy()
print(f)
a += 10
print(a, b, c, d, e, f, sep='\n')
a.shape = (2, 2, 2)
print(a)
a.resize(2, 4)
print(a)
g = a.transpose() # equivalent to a.T
print(g)
# print(np.array([e]).T)
print(e.reshape(-1, 1))

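10. Stacking and splitting

10.1 Vertical stacking/splitting

numpy.vstack((top, bottom))
numpy.vsplit(array, number of parts) -> list of sub-arrays

10.2 Horizontal stacking/splitting

numpy.hstack((left, right))
numpy.hsplit(array, number of parts) -> list of sub-arrays

10.3 Depth stacking/splitting

numpy.dstack((front, back))
numpy.dsplit(array, number of parts) -> list of sub-arrays

10.4 Row/column stacking

numpy.row_stack((top, bottom)) (an alias of numpy.vstack; deprecated as of NumPy 2.0)
numpy.column_stack((left, right))

Code: stack.py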

import numpy as np
a = np.arange(11, 20).reshape(3, 3)
b = np.arange(21, 30).reshape(3, 3)
c = np.vstack((a, b))
print(a, b, c, sep='\n')
a, b = np.vsplit(c, 2)
print(a, b, sep='\n')
c = np.dstack((a, b))
print(c)
a, b = np.dsplit(c, 2)
print(a.T[0].T, b.T[0].T, sep='\n')
a = a.ravel()
b = b.ravel()
print(a, b, sep='\n')
c = np.row_stack((a, b))
print(c)
# c = np.column_stack((a, b))
c = np.c_[a, b]
print(c)
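
stack.py does not exercise the horizontal functions from 10.2. A short sketch of hstack, hsplit, and column_stack, reusing the same 3 x 3 arrays (the 1-D arrays x and y are illustrative additions):

```python
import numpy as np

a = np.arange(11, 20).reshape(3, 3)
b = np.arange(21, 30).reshape(3, 3)

c = np.hstack((a, b))            # side by side
left, right = np.hsplit(c, 2)    # two equal parts along the columns
print(c.shape)                                             # (3, 6)
print(np.array_equal(left, a), np.array_equal(right, b))   # True True

# column_stack turns 1-D arrays into the columns of a 2-D array
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
print(np.column_stack((x, y)))   # rows [1 4], [2 5], [3 6]
print(np.hstack((x, y)))         # 1-D inputs are concatenated: [1 2 3 4 5 6]
```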

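11. Attributes of the ndarray class

dtype - element type
shape - dimensions of the array
T - transposed view
ndim - number of dimensions
size - number of elements
itemsize - number of bytes per element
nbytes - total number of bytes = size x itemsize
flat - flat iterator
real - array of real parts
imag - array of imaginary parts
array.tolist() -> list object

Code: attr.py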

import numpy as np
a = np.array([
[1+1j, 2+4j, 3+7j],
[4+2j, 5+5j, 6+8j],
[7+3j, 8+6j, 9+9j]
])
print(a.dtype, a.dtype.str, a.dtype.char)
print(a.shape)
print(a.ndim)
print(a.size, len(a))
print(a.itemsize)
print(a.nbytes)
print(a.T)
print(a.real, a.imag, sep='\n')
for elem in a.flat:
    print(elem)
print(a.flat[[1, 3, 5]])
a.flat[[2, 4, 6]] = 0
print(a)
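
attr.py does not call tolist() or verify the nbytes relation listed above. A minimal sketch of both, assuming a small 2 x 2 complex array chosen for illustration:

```python
import numpy as np

a = np.array([[1 + 1j, 2 + 4j], [3 + 7j, 4 + 2j]])

# complex128 uses 16 bytes per element
print(a.dtype, a.itemsize)                  # complex128 16
print(a.nbytes == a.size * a.itemsize)      # True

# tolist() converts to nested Python lists of Python scalars
as_list = a.tolist()
print(type(as_list), type(as_list[0][0]))   # <class 'list'> <class 'complex'>
print(a.real.tolist())                      # [[1.0, 2.0], [3.0, 4.0]]
```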

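III. Data visualization: matplotlib.pyplot (mp)

1. Basic functions

1.1 mp.plot(array of horizontal coordinates, array of vertical coordinates)

x: [1 2 3 4]
y: [5 6 7 8]

1.2 mp.plot(..., label=legend text, linestyle=line style, linewidth=line width, color=color)
1.3 mp.xlim(left bound, right bound)
1.4 mp.ylim(bottom bound, top bound)
1.5 mp.xticks(array of tick positions, array of tick labels)
1.6 mp.yticks(array of tick positions, array of tick labels)
1.7 ax = mp.gca() # get the current axes

ax.spines["left"].set_position(("data", 0))
ax.spines["top"].set_color(color)

1.8 mp.legend(loc="upper left") # show the legend; loc can also be, for example, "lower left"
1.9 mp.scatter(array of horizontal coordinates, array of vertical coordinates, marker=marker style, s=size, edgecolor=edge color, facecolor=fill color, zorder=Z order)
1.10 mp.annotate

mp.annotate(
    annotation text,
    xy=target position,
    xycoords=coordinate system of the target,
    xytext=text position,
    textcoords=coordinate system of the text,
    fontsize=font size,
    arrowprops=arrow properties
)

Code: plot1.py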

    import numpy as np
    import matplotlib.pyplot as mp

    space = 1.1

    x = np.linspace(-2*np.pi, 2*np.pi, 1000)
    y_sin = np.sin(x)
    y_cos = np.cos(x) / 2
    xo = np.pi * 3 / 4
    yo_cos = np.cos(xo) / 2
    yo_sin = np.sin(xo)

    mp.figure("Sine and Cosine")
    mp.xlim(x.min() * space, x.max() * space)
    mp.ylim(y_sin.min() * space, y_sin.max() * space)
    mp.xticks([-2 * np.pi, -np.pi, 0, np.pi, np.pi * 3 / 2, 2 * np.pi],
              [r'$-2\pi$', r'$-\pi$', r'$0$', r'$\pi$', r'$\frac{3\pi}{2}$', r'$2\pi$'])
    mp.yticks([-1, -0.5, 0.5, 1])
    ax = mp.gca()
    ax.spines["left"].set_position(("data", 0))
    ax.spines["bottom"].set_position(("data", 0))
    ax.spines["right"].set_color('none')
    ax.spines["top"].set_color('none')
    mp.plot(x, y_cos, label=r'$y=\frac{1}{2}cos(x)$', linestyle=":", linewidth=1, color="dodgerblue")
    mp.plot(x, y_sin, label=r'$y=sin(x)$', linestyle="-", linewidth=0.5, color="orangered")
    mp.scatter([xo, xo], [yo_cos, yo_sin], s=60, edgecolors="limegreen", facecolor="white", zorder=3)
    mp.plot([xo, xo], [yo_cos, yo_sin], linestyle="--", linewidth=1, color="limegreen")
    mp.annotate(
        r"$\frac{1}{2}cos(\frac{3\pi}{4})=-\frac{\sqrt{2}}{4}$",
        xy=(xo, yo_cos),
        xycoords="data",
        xytext=(-90, -40),
        textcoords="offset points",
        fontsize=14,
        arrowprops=dict(
            arrowstyle="->",
            connectionstyle="arc3, rad=.2"
        )
    )
    mp.annotate(
        r"$sin(\frac{3\pi}{4})=\frac{\sqrt{2}}{2}$",
        xy=(xo, yo_sin),
        xycoords="data",
        xytext=(30, 40),
        textcoords="offset points",
        fontsize=14,
        arrowprops=dict(
            arrowstyle="->",
            connectionstyle="arc3, rad=.2"
        )
    )

    mp.legend(loc="upper left")
    mp.show()

2. Figure objects

mp.figure(figure name, figsize=window size, dpi=resolution, facecolor=color)

Code: fig.py

import numpy as np
import matplotlib.pyplot as mp

x = np.linspace(-np.pi, np.pi, 1000)
cos_y = np.cos(x) / 2
sin_y = np.sin(x)
mp.figure("Figure Object 1", figsize=(4, 3), dpi=120, facecolor="lightgray")
mp.title("Figure Object 1", fontsize=12)
mp.xlabel("x", fontsize=10)
mp.ylabel("y", fontsize=10)
mp.tick_params(labelsize=8)
mp.grid(linestyle=":")

mp.show()

3. Subplots

3.1 Default layout

mp.subplot(number of rows, number of columns, subplot index)
mp.subplot(2, 3, 1)
mp.subplot(231)

Code: sub1.py

import matplotlib.pyplot as mp

m, n = 3, 4

mp.figure(facecolor="lightgray")
for i in range(m * n):
    mp.subplot(m, n, i+1)
    mp.xticks(())
    mp.yticks(())
    mp.text(0.5, 0.5, str(i+1), ha="center", va="center", size=36, alpha=0.5)

mp.tight_layout()
mp.show()

3.2 Grid layout

import matplotlib.gridspec as mg
gs = mg.GridSpec(number of rows, number of columns) # grid layout manager
mp.subplot(gs[row, column])

Code: sub2.py

import matplotlib.gridspec as mg
import matplotlib.pyplot as mp

mp.figure(facecolor="lightgray")
gs = mg.GridSpec(3, 3)
mp.subplot(gs[0, :2])
mp.xticks(())
mp.yticks(())
mp.text(0.5, 0.5, "1", ha="center", va="center", size=36, alpha=0.5)

mp.subplot(gs[:2, 2])
mp.xticks(())
mp.yticks(())
mp.text(0.5, 0.5, "2", ha="center", va="center", size=36, alpha=0.5)

mp.subplot(gs[2, 1:])
mp.xticks(())
mp.yticks(())
mp.text(0.5, 0.5, "3", ha="center", va="center", size=36, alpha=0.5)

mp.subplot(gs[1:, 0])
mp.xticks(())
mp.yticks(())
mp.text(0.5, 0.5, "4", ha="center", va="center", size=36, alpha=0.5)

mp.subplot(gs[1, 1])
mp.xticks(())
mp.yticks(())
mp.text(0.5, 0.5, "5", ha="center", va="center", size=36, alpha=0.5)

mp.tight_layout()
mp.show()
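
3.3 Free layout

mp.axes([left of the bottom-left corner, bottom of the bottom-left corner, width, height]); all four values are fractions of the figure size

Code: sub3.py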

import matplotlib.pyplot as mp

mp.figure(facecolor="lightgray")
mp.axes([0.03, 0.038, 0.94, 0.924])
mp.xticks(())
mp.yticks(())
mp.text(0.5, 0.5, "1", ha="center", va="center", size=36, alpha=0.5)

mp.axes([0.63, 0.076, 0.31, 0.308])
mp.xticks(())
mp.yticks(())
mp.text(0.5, 0.5, "2", ha="center", va="center", size=36, alpha=0.5)

mp.show()

4. Axis tick locators

locator = mp.xxxLocator(...)
ax = mp.gca()
ax.xaxis.set_major_locator(locator) # major ticks
ax.xaxis.set_minor_locator(locator) # minor ticks

Code: tick.py

import numpy as np
import matplotlib.pyplot as mp

mp.figure(facecolor="lightgray")
locators = [
    "mp.NullLocator()",
    "mp.MaxNLocator(nbins=3, steps=[1, 3, 5, 7, 9])",
    "mp.FixedLocator(locs=[0, 2.5, 5, 7.5, 10])",
    "mp.AutoLocator()",
    "mp.IndexLocator(offset=0.5, base=1.5)",
    "mp.MultipleLocator()",
    "mp.LinearLocator(numticks=21)",
    "mp.LogLocator(base=2, subs=[1.0])"
]
n_locators = len(locators)
for i, locator in enumerate(locators):
    mp.subplot(n_locators, 1, i+1)
    mp.xlim(0, 10)
    mp.ylim(-1, 1)
    mp.yticks(())
    ax = mp.gca()
    ax.spines["left"].set_color("none")
    ax.spines["top"].set_color("none")
    ax.spines["right"].set_color("none")
    ax.spines["bottom"].set_position(("data", 0))
    ax.xaxis.set_major_locator(eval(locator))
    ax.xaxis.set_minor_locator(mp.MultipleLocator(0.1))
    mp.plot(np.arange(11), np.zeros(11), color="none")
    mp.text(5, 0.3, locator[3:], ha="center", size=12)

mp.tight_layout()
mp.show()
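
tick.py builds each locator from a string with eval() so the same string can double as the label. An alternative sketch that stores locator objects directly, importing them from matplotlib.ticker, is shown here. The three locators, their arguments, and the titles are illustrative choices, and the Agg backend is assumed only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as mp
from matplotlib.ticker import FixedLocator, MultipleLocator, NullLocator

# Map each label to a locator object, so no eval() is needed
locators = {
    "NullLocator()": NullLocator(),
    "FixedLocator([0, 2.5, 5, 7.5, 10])": FixedLocator([0, 2.5, 5, 7.5, 10]),
    "MultipleLocator(2)": MultipleLocator(2),
}

fig = mp.figure(facecolor="lightgray")
for i, (title, locator) in enumerate(locators.items()):
    ax = fig.add_subplot(len(locators), 1, i + 1)
    ax.set_xlim(0, 10)
    ax.set_yticks([])
    ax.xaxis.set_major_locator(locator)               # major ticks
    ax.xaxis.set_minor_locator(MultipleLocator(0.5))  # minor ticks
    ax.set_title(title, fontsize=10)

fig.tight_layout()
```

To view the figure interactively, drop the matplotlib.use line and call mp.show() at the end.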

5. Scatter plots

Code: scatter.py

import numpy as np
import matplotlib.pyplot as mp

n = 1000
x = np.random.normal(0, 1, n)
y = np.random.normal(0, 1, n)
d = np.sqrt(x ** 2 + y ** 2)  # distance of each point from the origin

mp.figure("Scatter", facecolor="lightgray")
mp.title("Scatter", fontsize=20)
mp.xlabel("x", fontsize=14)
mp.ylabel("y", fontsize=14)
mp.tick_params(labelsize=10)
mp.grid(linestyle=":")
mp.scatter(
    x, y,
    marker="*",  # other options: "D", "s"
    s=60,
    c=d,
    cmap="jet_r",
    alpha=0.5
)

mp.show()
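
The color array d in the scatter example is meant to hold each point's distance from the origin. A minimal sketch of that computation follows; the seeded generator is an assumption added for repeatability. Note that the second positional argument of np.sqrt is its output array, so np.sqrt(x ** 2, y ** 2) yields |x| rather than the distance.

```python
import numpy as np

rng = np.random.default_rng(0)   # the seed is an arbitrary choice for repeatability
x = rng.normal(0, 1, 1000)
y = rng.normal(0, 1, 1000)

d = np.sqrt(x ** 2 + y ** 2)     # Euclidean distance from the origin
d_hypot = np.hypot(x, y)         # the same quantity via the dedicated function
print(np.allclose(d, d_hypot))   # True

# Passing y ** 2 as a second positional argument makes it the output
# buffer, so the result is just |x|
wrong = np.sqrt(x ** 2, y ** 2)
print(np.allclose(wrong, np.abs(x)))   # True
```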

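6. Area filling

mp.fill_between(array of horizontal coordinates, array of starting vertical coordinates, array of ending vertical coordinates, condition, color=color, alpha=opacity)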
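
The condition argument is a Boolean array that restricts the fill to the positions where it is True. A small self-contained sketch, assuming a sine curve filled down to zero only where it is positive (the data, colors, and figure name are illustrative, and the Agg backend is assumed only so the sketch runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import matplotlib.pyplot as mp

x = np.linspace(0, 2 * np.pi, 200)
low = np.zeros_like(x)
high = np.sin(x)

mp.figure("Fill sketch", facecolor="lightgray")
mp.plot(x, high, color="dodgerblue", label=r"$y=sin(x)$")
# Fill between the two curves only where the condition is True
positive = high > low
patch = mp.fill_between(x, low, high, positive, color="orangered", alpha=0.5)
mp.legend()
```

To view the figure interactively, drop the matplotlib.use line and call mp.show() at the end.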

Code: fill.py

import numpy as np
import matplotlib.pyplot as mp

n = 1000
x = np.linspace(0, 8 * np.pi, n)
sin_y = np.sin(x)
cos_y = np.cos(x / 2) / 2

mp.figure("Fill", facecolor="lightgray")
mp.title("Fill", fontsize=20)
mp.xlabel("x", fontsize=14)
mp.ylabel("y", fontsize=14)
mp.tick_params(labelsize=10)
mp.grid(linestyle=":")
mp.plot(
    x, sin_y,
    color="dodgerblue",
    label=r"$y=sin(x)$"
)
mp.plot(
    x, cos_y,
    color="limegreen",
    label=r"$y=\frac{1}{2}cos(\frac{x}{2})$"
)
mp.fill_between(x, cos_y, sin_y, cos_y <
