Patterns of Distributed Systems Reading Notes - Patterns of Data Partitioning


1 Fixed Partitions

1.1 Problem
1.1.1 Requirements for mapping data to the cluster nodes:
The distribution should be uniform.
It should be possible to know which cluster node stores a particular data item without making a request to all the nodes.
1.1.2 If data is mapped directly to cluster nodes (for example, by hashing the key modulo the number of nodes), then even after adding only a few new cluster nodes, all the data needs to be moved. When the data size is large, this is undesirable.
1.2 Solution
1.2.1 One of the most commonly used solutions is to map data to logical partitions, and then map the logical partitions to the cluster nodes.
1.2.2 When partitions are moved to new nodes, the move should be relatively quick, because only a small portion of the data is moved.
1.2.3 Data storage or retrieval is then a two-step process.
First, you find the partition for the given data item.
Then you find the cluster node where the partition is stored.
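
A minimal sketch of the two-step lookup, assuming a fixed partition count and an in-memory partition table; the class and field names (PartitionRouter, partitionToNode) are illustrative, not from the book:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: route a key to a node in two steps.
public class PartitionRouter {
    private final int numPartitions;                       // fixed, e.g. 1024
    private final Map<Integer, String> partitionToNode;    // partition id -> node address

    public PartitionRouter(int numPartitions, Map<Integer, String> partitionToNode) {
        this.numPartitions = numPartitions;
        this.partitionToNode = partitionToNode;
    }

    // Step 1: find the partition for the key (stays stable even when nodes change).
    public int partitionFor(String key) {
        return Math.floorMod(stableHash(key), numPartitions);
    }

    // Step 2: find the node that currently holds the partition.
    public String nodeFor(String key) {
        return partitionToNode.get(partitionFor(key));
    }

    // Placeholder for a platform-independent hash (see the next section).
    private int stableHash(String key) {
        int h = 0;
        for (byte b : key.getBytes(java.nio.charset.StandardCharsets.UTF_8)) {
            h = 31 * h + (b & 0xff);
        }
        return h;
    }

    public static void main(String[] args) {
        Map<Integer, String> table = new HashMap<>();
        for (int p = 0; p < 8; p++) table.put(p, "node-" + (p % 3)); // 8 partitions over 3 nodes
        PartitionRouter router = new PartitionRouter(8, table);
        System.out.println(router.partitionFor("alice") + " -> " + router.nodeFor("alice"));
    }
}
```
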
1.2.4 Choosing the Hash Function
It is critical to choose a hashing method that gives the same hash value for a key independent of the platform and runtime, so that every node and client computes the same partition for the same key.
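
As one hedged example, a platform-independent option is to hash the key bytes with MD5 via the JDK's MessageDigest and reduce the digest modulo the partition count; the helper below is only a sketch, not the book's implementation:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative: derive a partition id from a key using MD5, which produces the
// same bytes on every platform and runtime, unlike runtime-specific hash codes.
public class StableKeyHasher {
    public static int partitionFor(String key, int numPartitions) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            // Interpret the digest as a non-negative integer, then take it modulo the partition count.
            return new BigInteger(1, digest).mod(BigInteger.valueOf(numPartitions)).intValue();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("alice", 1024)); // same result on every JVM
    }
}
```
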
1.2.5 Mapping Partitions to Cluster Nodes
The partition-to-node mapping also needs to be stored and made accessible to the clients.
Either a particular cluster node acts as a coordinator, or a dedicated Consistent Core acts as the coordinator; it keeps track of all nodes in the cluster and maps partitions to nodes.
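
A sketch of how such a coordinator might build the partition table at cluster startup, assuming simple round-robin placement; PartitionAssignor and its method are hypothetical names:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative coordinator-side assignment: spread a fixed set of partitions
// over the currently known cluster nodes and keep the table for clients to fetch.
public class PartitionAssignor {
    public static Map<Integer, String> assign(int numPartitions, List<String> liveNodes) {
        Map<Integer, String> partitionToNode = new HashMap<>();
        for (int partition = 0; partition < numPartitions; partition++) {
            // Simple round-robin placement; a real coordinator would also
            // consider existing placement to minimise data movement.
            partitionToNode.put(partition, liveNodes.get(partition % liveNodes.size()));
        }
        return partitionToNode;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node-0", "node-1", "node-2");
        Map<Integer, String> table = assign(8, nodes);
        table.forEach((p, n) -> System.out.println("partition " + p + " -> " + n));
    }
}
```
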
1.2.6 Tracking Cluster Membership

2 Key-Range Partitions

2.1 Problem
2.1.1 With hash-based partitioning, consecutive keys end up in different partitions. If users want to query a range of keys, specifying only the start and end key, all partitions need to be queried to get the values. Querying every partition for a single request is far from optimal.
2.2 Solution
2.2.1 Create logical partitions for key ranges in sorted order. The partitions can then be mapped to cluster nodes, and a range query only needs to touch the partitions whose ranges overlap the requested range.
2.2.2 Predefining Key Ranges
If we already know the whole key space and the distribution of keys, the ranges for the partitions can be specified upfront.
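
A sketch of range-based routing under the assumption that each partition is identified by its start key; RangeRouter and its methods are illustrative names:

```java
import java.util.List;
import java.util.TreeMap;

// Illustrative key-range routing: each partition is identified by its start key,
// and a key belongs to the partition with the greatest start key <= key.
public class RangeRouter {
    private final TreeMap<String, Integer> startKeyToPartition = new TreeMap<>();

    public void addPartition(String startKey, int partitionId) {
        startKeyToPartition.put(startKey, partitionId);
    }

    public int partitionFor(String key) {
        return startKeyToPartition.floorEntry(key).getValue();
    }

    // A range scan only needs the partitions whose ranges overlap [from, to].
    public List<Integer> partitionsForRange(String from, String to) {
        String fromStart = startKeyToPartition.floorKey(from);
        if (fromStart == null) fromStart = startKeyToPartition.firstKey();
        return List.copyOf(startKeyToPartition.subMap(fromStart, true, to, true).values());
    }

    public static void main(String[] args) {
        RangeRouter router = new RangeRouter();
        router.addPartition("a", 0);   // keys from "a" up to the next start key
        router.addPartition("j", 1);
        router.addPartition("s", 2);
        System.out.println(router.partitionFor("kafka"));          // 1
        System.out.println(router.partitionsForRange("bob", "k")); // [0, 1]
    }
}
```
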
2.2.3 Auto-Splitting Ranges
Calculating Partition Size and Finding the Middle Key: the cluster node hosting a partition tracks its size; once the partition grows beyond a configured maximum, it finds the middle key so the data can be split into two roughly equal halves, and sends a split trigger message to the coordinator.
The coordinator, handling the split trigger message, updates the key-range metadata for the original partition and creates new partition metadata for the split range.
The cluster node then splits the original partition and creates the new partition.
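
A simplified sketch of the node-side split, assuming the partition keeps its data in a sorted map and splits on entry count rather than byte size; all names are illustrative:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative auto-split: when a partition's size crosses a threshold,
// find the middle key and split the sorted data into two partitions.
public class RangePartition {
    private final TreeMap<String, String> data = new TreeMap<>();

    public void put(String key, String value) {
        data.put(key, value);
    }

    public boolean shouldSplit(int maxEntries) {
        // A real store would track byte size; entry count keeps the sketch simple.
        return data.size() > maxEntries;
    }

    public String findMiddleKey() {
        int middleIndex = data.size() / 2;
        return data.keySet().stream().skip(middleIndex).findFirst().orElseThrow();
    }

    // Split off everything from the middle key onwards into a new partition.
    public RangePartition splitAt(String middleKey) {
        RangePartition upper = new RangePartition();
        SortedMap<String, String> tail = data.tailMap(middleKey);
        upper.data.putAll(tail);
        tail.clear(); // removes the moved entries from this partition
        return upper;
    }

    public static void main(String[] args) {
        RangePartition p = new RangePartition();
        for (char c = 'a'; c <= 'j'; c++) p.put(String.valueOf(c), "v");
        if (p.shouldSplit(8)) {
            String middle = p.findMiddleKey();
            RangePartition upper = p.splitAt(middle);
            System.out.println("split at " + middle + ", sizes: " + p.data.size() + "/" + upper.data.size());
        }
    }
}
```
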
2.2.4 Load-Based Splitting
Splitting partitions based on parameters such as the total number of requests, CPU usage, or memory usage is also used, rather than relying on partition size alone.
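
A minimal sketch of a load-based trigger, assuming the split decision comes from a per-partition request counter checked once per time window; the class and threshold are hypothetical:

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative load-based trigger: split a partition when its request rate
// (rather than its size) crosses a configured threshold.
public class PartitionLoadTracker {
    private final AtomicLong requestsInWindow = new AtomicLong();
    private final long maxRequestsPerWindow;

    public PartitionLoadTracker(long maxRequestsPerWindow) {
        this.maxRequestsPerWindow = maxRequestsPerWindow;
    }

    public void recordRequest() {
        requestsInWindow.incrementAndGet();
    }

    // Checked periodically (e.g. once per window) by the node hosting the partition.
    public boolean shouldSplit() {
        return requestsInWindow.getAndSet(0) > maxRequestsPerWindow;
    }

    public static void main(String[] args) {
        PartitionLoadTracker tracker = new PartitionLoadTracker(1000);
        for (int i = 0; i < 1500; i++) tracker.recordRequest();
        System.out.println("split? " + tracker.shouldSplit()); // true
    }
}
```
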

3 Two-Phase Commit

3.1 Problem
3.1.1 When an atomic update spans multiple cluster nodes, a node cannot make the data accessible to clients until the decision of the other cluster nodes is known. Each node needs to know if the other nodes successfully stored the data or if they failed.
3.2 Solution
3.2.1 The essence of two-phase commit:

  1. The prepare phase asks each node if it can promise to carry out the update.
  2. The commit phase actually carries it out.

If any node is unable to make that promise, the coordinator tells all nodes to roll back, releasing any locks they have, and the transaction is aborted.
It is crucial for each participant to ensure the durability of its decisions using a pattern like Write-Ahead Log. This means that even if a node crashes and subsequently restarts, it should be capable of completing the protocol without any issues.
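
A minimal coordinator sketch under these assumptions; the Participant interface and class names are hypothetical, and a real coordinator would also persist its own decision in a write-ahead log before phase two:

```java
import java.util.List;

// Illustrative two-phase commit coordinator.
public class TwoPhaseCommitCoordinator {

    public interface Participant {
        boolean prepare(String txnId);   // promise to commit if asked
        void commit(String txnId);
        void rollback(String txnId);
    }

    public boolean execute(String txnId, List<Participant> participants) {
        // Phase 1: ask every participant for a promise.
        for (Participant participant : participants) {
            if (!participant.prepare(txnId)) {
                // Any refusal aborts the whole transaction.
                participants.forEach(p -> p.rollback(txnId));
                return false;
            }
        }
        // Decision point: once all promises are in, the outcome is "commit"
        // and must be durably recorded before phase 2 in a real system.
        // Phase 2: tell everyone to commit.
        participants.forEach(p -> p.commit(txnId));
        return true;
    }

    public static void main(String[] args) {
        Participant ok = new Participant() {
            public boolean prepare(String txnId) { return true; }
            public void commit(String txnId) { System.out.println("committed " + txnId); }
            public void rollback(String txnId) { System.out.println("rolled back " + txnId); }
        };
        System.out.println(new TwoPhaseCommitCoordinator().execute("txn-1", List.of(ok, ok)));
    }
}
```
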
3.2.2 Locks and Transaction Isolation
The write locks can be taken only when the transaction is about to commit and the values are to be made visible in the key-value store.
Different transaction isolation levels give different levels of visibility.
Deadlock Prevention
Error On Conflict
If the wait policy is to error out on conflict, the lock request throws an error and the calling transaction rolls back.
Wound-Wait
If the lock owners are all younger than the transaction asking for the lock, all of those transactions are aborted (wounded). But if the transaction asking for the lock is younger than the ones owning the lock, it waits for the lock.
Wait-Die
The wait-die method works in the opposite way to wound-wait: if the lock owners are all younger than the transaction asking for the lock, then the transaction waits for the lock. But if the transaction asking for the lock is younger than the ones owning the lock, the transaction is aborted (it dies).
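
A small sketch contrasting the two policies, assuming the transaction start timestamp is used as its age (a smaller timestamp means an older transaction); the class and enum names are illustrative:

```java
// Illustrative comparison of the two deadlock-prevention policies.
public class DeadlockPrevention {

    enum Decision { WAIT, ABORT_REQUESTER, ABORT_OWNER }

    // Wound-Wait: an older requester wounds (aborts) a younger lock owner;
    // a younger requester waits.
    static Decision woundWait(long requesterTs, long ownerTs) {
        return requesterTs < ownerTs ? Decision.ABORT_OWNER : Decision.WAIT;
    }

    // Wait-Die: an older requester waits; a younger requester dies (aborts itself).
    static Decision waitDie(long requesterTs, long ownerTs) {
        return requesterTs < ownerTs ? Decision.WAIT : Decision.ABORT_REQUESTER;
    }

    public static void main(String[] args) {
        long older = 100, younger = 200;
        System.out.println("wound-wait, old asks young owner: " + woundWait(older, younger));   // ABORT_OWNER
        System.out.println("wound-wait, young asks old owner: " + woundWait(younger, older));   // WAIT
        System.out.println("wait-die,   old asks young owner: " + waitDie(older, younger));     // WAIT
        System.out.println("wait-die,   young asks old owner: " + waitDie(younger, older));     // ABORT_REQUESTER
    }
}
```
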
Commit and Rollback
The coordinator implements the commit handling in two phases:
1. It first sends the prepare request to each of the participants.
2. Once it receives a successful response from all the participants, the coordinator marks the transaction as prepared to complete. It then sends the commit request to all the participants.
A cluster node does three things while committing the changes:
1. It marks the transaction as committed in the write-ahead log. Should the cluster node fail at this point, it knows the outcome of the transaction and can repeat the following steps.
2. It applies all the changes to the key-value storage.
3. It releases all the acquired locks.
A cluster node receiving the rollback request does three things:
1. It records the state of the transaction as rolled back in the write-ahead log.
2. It discards the transaction state.
3. It releases all of the locks.
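
A sketch of the participant-side handling under these assumptions; WriteAheadLog, LockManager, and the pending-changes map are hypothetical stand-ins for the real components:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative participant-side handling of commit and rollback requests.
public class TransactionParticipant {
    private final Map<String, String> keyValueStore = new HashMap<>();
    private final Map<String, Map<String, String>> pendingChanges = new HashMap<>(); // txnId -> changes
    private final WriteAheadLog wal = new WriteAheadLog();
    private final LockManager locks = new LockManager();

    public void handleCommit(String txnId) {
        wal.append(txnId, "COMMITTED");                     // 1. durably record the outcome first
        Map<String, String> changes = pendingChanges.remove(txnId);
        if (changes != null) keyValueStore.putAll(changes); // 2. apply the changes
        locks.releaseAll(txnId);                            // 3. release the locks
    }

    public void handleRollback(String txnId) {
        wal.append(txnId, "ROLLED_BACK");                   // 1. durably record the outcome
        pendingChanges.remove(txnId);                       // 2. discard the transaction state
        locks.releaseAll(txnId);                            // 3. release the locks
    }

    // Minimal stand-ins so the sketch compiles.
    static class WriteAheadLog {
        void append(String txnId, String state) { System.out.println("wal: " + txnId + " " + state); }
    }
    static class LockManager {
        void releaseAll(String txnId) { System.out.println("locks released for " + txnId); }
    }

    public static void main(String[] args) {
        TransactionParticipant node = new TransactionParticipant();
        node.pendingChanges.put("txn-1", Map.of("k", "v"));
        node.handleCommit("txn-1");
        System.out.println(node.keyValueStore); // {k=v}
    }
}
```
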
3.2.3 Using Versioned Value
With Versioned Value, read-only transactions can work without holding any locks and still be guaranteed that the values they read do not change because of a concurrent read-write transaction.
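
A sketch of the idea, assuming each key maps to a sorted set of (version, value) pairs and a read-only transaction reads at a fixed timestamp; VersionedStore is an illustrative name, not the book's class:

```java
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative versioned store: writers append (version, value) pairs, readers
// pick the latest version at or below their transaction's read timestamp, so a
// read-only transaction sees a stable snapshot without taking locks.
public class VersionedStore {
    private final ConcurrentHashMap<String, TreeMap<Long, String>> versions = new ConcurrentHashMap<>();

    public void put(String key, long version, String value) {
        versions.computeIfAbsent(key, k -> new TreeMap<>()).put(version, value);
    }

    public String readAt(String key, long readTimestamp) {
        TreeMap<Long, String> keyVersions = versions.get(key);
        if (keyVersions == null) return null;
        var entry = keyVersions.floorEntry(readTimestamp); // latest version <= read timestamp
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        store.put("title", 1, "Microservices");
        store.put("title", 3, "Distributed Systems");
        System.out.println(store.readAt("title", 2)); // Microservices: the snapshot at ts=2
        System.out.println(store.readAt("title", 5)); // Distributed Systems
    }
}
```
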
3.2.4 Snapshot Isolation
3.2.5 Using Hybrid Clock
Databases such as MongoDB and CockroachDB use Hybrid Clock to guarantee the monotonicity of transaction timestamps, because every request adjusts the hybrid clock on each server to be the most up to date.
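
A simplified sketch of a hybrid clock tick, assuming the timestamp is a (wall time, logical counter) pair and any observed remote wall time pulls the clock forward; this omits details of a full hybrid logical clock implementation:

```java
// Illustrative hybrid clock: the timestamp is the max of the local wall clock
// and the highest timestamp seen so far, with a logical counter to break ties,
// so timestamps handed out by a server never go backwards.
public class HybridClock {
    private long latestWallTime = 0;
    private int logicalCounter = 0;

    // Called for local events; timestamps received from other servers are passed
    // in as observedWallTime to pull this clock forward.
    public synchronized long[] tick(long observedWallTime) {
        long wallNow = Math.max(System.currentTimeMillis(), observedWallTime);
        if (wallNow > latestWallTime) {
            latestWallTime = wallNow;
            logicalCounter = 0;
        } else {
            logicalCounter++;   // same millisecond (or a lagging wall clock): bump the counter
        }
        return new long[] { latestWallTime, logicalCounter };
    }

    public static void main(String[] args) {
        HybridClock clock = new HybridClock();
        long[] t1 = clock.tick(0);
        long[] t2 = clock.tick(t1[0] + 5); // a request carrying a later remote timestamp advances the clock
        System.out.println(t1[0] + "." + t1[1] + " -> " + t2[0] + "." + t2[1]);
    }
}
```
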
3.2.6 Using Replicated Log
3.2.7 Failure Handling
