Python Study Notes: Data Processing (2019-05-02)

The SciPy ecosystem

Scientific computing in Python builds upon a small core of packages:

  • Python, a general-purpose programming language. It is interpreted and dynamically typed, which makes it well suited to interactive work and quick prototyping, while being powerful enough for writing large applications.
  • NumPy, the fundamental package for numerical computation. It defines the numerical array and matrix types and basic operations on them.
  • The SciPy library, a collection of numerical algorithms and domain-specific toolboxes, including signal processing, optimization, statistics and much more.
  • Matplotlib, a mature and popular plotting package that provides publication-quality 2D plotting as well as rudimentary 3D plotting.
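
As a rough illustration of how the core packages fit together, the sketch below builds a NumPy array, applies elementwise arithmetic to it, and then calls one SciPy routine. The sample function and values are made up for this example and do not come from the original text.

```python
import numpy as np
from scipy import optimize

# NumPy: an array type with elementwise (vectorized) operations.
x = np.linspace(0.0, 1.0, 5)   # 5 evenly spaced points in [0, 1]
y = 2.0 * x + 1.0              # arithmetic applies to every element, no loop

# SciPy: numerical algorithms that work with NumPy arrays and plain floats.
# Here we minimize the illustrative function f(t) = (t - 3)^2.
result = optimize.minimize_scalar(lambda t: (t - 3.0) ** 2)

print(y)         # the transformed array
print(result.x)  # location of the minimum found by the solver
```

The minimizer should land very close to 3, the analytic minimum of the sample function.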

On this base, the SciPy ecosystem includes general and specialised tools for data management and computation, productive experimentation, and high-performance computing. Below is an overview of some key packages; many more relevant packages exist.

Data and computation:

  • pandas, providing high-performance, easy-to-use data structures.
  • SymPy, for symbolic mathematics and computer algebra.
  • scikit-image is a collection of algorithms for image processing.
  • scikit-learn is a collection of algorithms and tools for machine learning.
  • h5py and PyTables can both access data stored in the HDF5 format.
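
To give a feel for the pandas data structures mentioned above, here is a minimal sketch that builds a small table and aggregates it by group. The column names and values are hypothetical, chosen only for illustration.

```python
import pandas as pd

# A DataFrame is a labeled, column-oriented table.
# The "group" and "value" columns are illustrative assumptions.
df = pd.DataFrame({
    "group": ["a", "a", "b"],
    "value": [1.0, 3.0, 5.0],
})

# Split rows by the "group" column and average "value" within each group.
means = df.groupby("group")["value"].mean()

print(means)
```

The result is a pandas Series indexed by group label, so `means["a"]` gives the mean of the rows in group "a".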

Productivity and high-performance computing:

  • IPython, a rich interactive interface, letting you quickly process data and test ideas.
  • The Jupyter notebook provides IPython functionality and more in your web browser, allowing you to document your computation in an easily reproducible form.
  • Cython extends Python syntax so that you can conveniently build C extensions, either to speed up critical code, or to integrate with C/C++ libraries.
  • Dask, Joblib or IPyParallel for distributed processing with a focus on numeric data.

Quality assurance:

  • nose, a framework for testing Python code, now being phased out in favor of pytest.
  • numpydoc, a standard and library for documenting Scientific Python libraries.
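
The two quality-assurance tools can be sketched together: a function documented with numpydoc-style section headers, plus a test written the way pytest expects. The function `rescale` is a hypothetical example invented for this note, not part of any library.

```python
def rescale(values, factor=1.0):
    """Multiply each value by a constant factor.

    Parameters
    ----------
    values : list of float
        Input numbers.
    factor : float, optional
        Multiplier applied to every element. Default is 1.0.

    Returns
    -------
    list of float
        The scaled numbers, in the same order as `values`.
    """
    return [factor * v for v in values]


def test_rescale():
    # pytest collects functions named test_* and reports any failed assert.
    assert rescale([1.0, 2.0], factor=3.0) == [3.0, 6.0]
    assert rescale([]) == []
```

If this were saved in a file such as `test_rescale.py` (a hypothetical name), running `pytest` in that directory would discover and run `test_rescale`.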
