What exactly is hash partitioning?

I recently came across the concept of hash partitioning in Flink's DataSet API.


The following explanation [1] is fairly clear:

Techopedia explains Hash Partitioning

Hash partitioning is a method to separate out information in a randomized way rather than putting the data in the form of groups. This partitioning system can be used efficiently to manage data on a particular platform. However, there are no performance benefits associated with hash partitioning, as it shuffles the data across the table space randomly.

The partitioning system can be used to efficiently match queries. It makes use of hashing algorithms to distribute the data across the device to space out the load. By this method, the partitions are approximately the same size. The data that can be partitioned is not historical in nature, and thus this method is very easy to use.
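To make the idea in the quote concrete, here is a minimal sketch of hash partitioning in plain Python. It is only an illustration, not how Flink implements it: the function names `partition_for` and `hash_partition`, the `user-N` keys, and the choice of `zlib.crc32` as the hash function are all assumptions made for this example.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index in [0, num_partitions)."""
    # crc32 is used here because it gives the same result on every run;
    # Python's built-in hash() for str is randomized per process.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def hash_partition(records, num_partitions):
    """Group (key, value) records by the partition their key hashes to."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


# Hypothetical sample data: 1000 records keyed by a made-up user id.
records = [(f"user-{i}", i) for i in range(1000)]
parts = hash_partition(records, 4)

# Number of records that landed in each of the 4 partitions.
sizes = [len(parts[p]) for p in range(4)]
print(sizes)
```

Two things are worth noticing: records with the same key always end up in the same partition, and the partition a record lands in depends only on the hash of its key, not on any ordering or grouping of the data.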


Reference:

[1] Hash Partitioning (Techopedia)
