Distributed Locking

Distributed Locking Explained

Distributed locking is a mechanism for coordinating access to, and modification of, shared resources in a distributed system, such as a microservices architecture or a cluster of databases. It prevents multiple processes or nodes from accessing the same resource simultaneously; uncoordinated concurrent access can lead to race conditions and data inconsistencies.

Here’s how it works:

  • Acquiring the lock: When a process needs to access a shared resource, it first attempts to acquire a lock. This involves sending a request to a central coordinating service or utilizing a distributed data store like Redis.
  • Holding the lock: If the lock is available, the process acquires it and can safely access and modify the resource. Other processes attempting to acquire the lock are blocked until the current holder releases it.
  • Releasing the lock: Once the process finishes using the resource, it releases the lock, allowing other processes to compete for it and potentially acquire it next.
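The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production lock: `InMemoryLockStore` is a hypothetical in-process stand-in for the coordinating service, loosely imitating the semantics of Redis `SET key value NX PX ttl` for acquisition and a compare-and-delete for release. The resource name and the TTL are likewise assumptions made for the example.

```python
import time
import uuid


class InMemoryLockStore:
    """Hypothetical stand-in for a lock service; not a real Redis client."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._locks = {}  # resource name -> (owner token, expiry time)

    def acquire(self, resource, owner, ttl_seconds):
        """Set the lock only if it is absent or has expired."""
        now = self._clock()
        current = self._locks.get(resource)
        if current is not None and current[1] > now:
            return False  # another owner still holds a live lock
        self._locks[resource] = (owner, now + ttl_seconds)
        return True

    def release(self, resource, owner):
        """Delete the lock only if this owner is the one recorded."""
        current = self._locks.get(resource)
        if current is not None and current[0] == owner:
            del self._locks[resource]
            return True
        return False


store = InMemoryLockStore()
worker_a = str(uuid.uuid4())  # random token identifying each would-be holder
worker_b = str(uuid.uuid4())

got_a = store.acquire("orders:42", worker_a, ttl_seconds=10)        # acquired
got_b = store.acquire("orders:42", worker_b, ttl_seconds=10)        # denied
released_by_b = store.release("orders:42", worker_b)                # not the owner
released_by_a = store.release("orders:42", worker_a)                # released
got_b_retry = store.acquire("orders:42", worker_b, ttl_seconds=10)  # acquired
```

The owner token matters on release: without it, a process whose lock had already expired could delete a lock that now belongs to someone else.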

Types of distributed locks:

  • Exclusive lock: Only one process can hold the lock at a time, ensuring mutually exclusive access to the resource.
  • Shared lock: Multiple processes can hold the lock concurrently, allowing read-only access to the resource.
  • Readers-writer lock: Combines exclusive and shared locks, allowing multiple readers but only one writer at a time.
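How the lock types interact can be shown with a small compatibility check. The sketch below is a single-process illustration under simplifying assumptions (no timeouts, no fairness, no networking); `ReadersWriterLockTable` and its method names are hypothetical and do not belong to any particular library.

```python
class ReadersWriterLockTable:
    """Tracks who holds one resource in shared or exclusive mode."""

    def __init__(self):
        self._readers = set()  # owners holding the lock in shared mode
        self._writer = None    # owner holding the lock in exclusive mode

    def acquire_shared(self, owner):
        # Readers are compatible with each other, but not with a writer.
        if self._writer is not None:
            return False
        self._readers.add(owner)
        return True

    def acquire_exclusive(self, owner):
        # A writer is incompatible with every other holder.
        if self._writer is not None or self._readers:
            return False
        self._writer = owner
        return True

    def release(self, owner):
        self._readers.discard(owner)
        if self._writer == owner:
            self._writer = None


table = ReadersWriterLockTable()
reader_1 = table.acquire_shared("reader-1")         # granted
reader_2 = table.acquire_shared("reader-2")         # granted alongside reader-1
writer_early = table.acquire_exclusive("writer")    # refused: readers present
table.release("reader-1")
table.release("reader-2")
writer_late = table.acquire_exclusive("writer")     # granted: nobody else holds it
reader_blocked = table.acquire_shared("reader-1")   # refused: writer present
```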

Common implementations:

  • Centralized coordinator: A dedicated service manages lock requests and ownership.
  • Distributed data store: Locks are stored and managed in a distributed key-value store like Redis.
  • Leader election: One process acts as the leader and grants/denies lock requests.

Challenges and considerations:

  • Availability: The coordinating service or data store must be highly available to avoid single points of failure.
  • Performance: Locks can add overhead and impact system performance, so their use should be balanced with potential concurrency issues.
  • Deadlocks: Careful design and lock timeouts are needed to prevent deadlocks where processes wait indefinitely for each other to release locks.
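The role of timeouts can be illustrated with a lease that expires on its own plus a bounded wait on the acquiring side. This is a sketch under stated assumptions: `FakeClock`, `LeaseTable`, and `acquire_with_deadline` are hypothetical names, and the clock is advanced manually so that the example runs instantly instead of sleeping.

```python
class FakeClock:
    """Manually advanced clock, used here instead of real sleeping."""

    def __init__(self):
        self.now = 0.0

    def sleep(self, seconds):
        self.now += seconds


class LeaseTable:
    """Locks that expire by themselves after ttl seconds."""

    def __init__(self, clock):
        self._clock = clock
        self._leases = {}  # resource -> (owner, expires_at)

    def try_acquire(self, resource, owner, ttl):
        holder = self._leases.get(resource)
        if holder is not None and holder[1] > self._clock.now:
            return False
        self._leases[resource] = (owner, self._clock.now + ttl)
        return True


def acquire_with_deadline(table, clock, resource, owner, ttl, wait_limit, poll=1.0):
    """Retry until the lock is free or wait_limit seconds have passed."""
    deadline = clock.now + wait_limit
    while True:
        if table.try_acquire(resource, owner, ttl):
            return True
        if clock.now + poll > deadline:
            return False  # give up rather than wait indefinitely
        clock.sleep(poll)


clock = FakeClock()
table = LeaseTable(clock)

# A holder takes the lock and then crashes without ever releasing it.
table.try_acquire("report", "crashed-holder", ttl=5)

# A waiter with a short deadline gives up before the lease expires.
impatient = acquire_with_deadline(table, clock, "report", "b", ttl=5, wait_limit=3)

# A waiter with a longer deadline gets the lock once the lease has expired.
patient = acquire_with_deadline(table, clock, "report", "c", ttl=5, wait_limit=10)
```

The lease bounds how long a crashed holder can block others, and the deadline bounds how long any single waiter is stuck. Neither helps if the TTL is shorter than the work being protected, which is the problem fencing tokens address.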

RedLock

Distributed locking is a mechanism used in distributed systems to control multiple nodes’ access to shared resources, ensuring that at any given moment only one node can perform specific operations. These locks are typically employed to safeguard data consistency and integrity, preventing concurrent operations from leading to data conflicts and errors.

There are various algorithms for implementing locks in distributed systems; Redlock is one such algorithm, built on Redis. It runs against several independent Redis master nodes (five in the usual example), not replicas of one another, and a client holds the lock only if it acquires it on a majority of them, with the aim of improving fault tolerance and availability. However, Martin Kleppmann, in his blog post “How to do distributed locking”, examines the Redlock algorithm closely and points out its potential issues and shortcomings.
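The majority-voting idea can be sketched as follows. This is a simplified illustration, not the reference algorithm: `FakeRedisNode` is a hypothetical in-memory stand-in for one independent Redis node, the nodes are contacted sequentially without network timeouts, and the `drift_factor` value is an assumption chosen for the example. The lock counts as acquired only if a majority of nodes accept it and some validity time remains after subtracting the time spent acquiring and a clock-drift allowance.

```python
import time
import uuid


class FakeRedisNode:
    """Hypothetical in-memory stand-in for one independent Redis node."""

    def __init__(self, available=True):
        self.available = available
        self._data = {}  # key -> (value, expires_at)

    def set_if_absent(self, key, value, ttl):
        if not self.available:
            return False
        now = time.monotonic()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            return False
        self._data[key] = (value, now + ttl)
        return True

    def delete_if_equal(self, key, value):
        entry = self._data.get(key)
        if self.available and entry is not None and entry[0] == value:
            del self._data[key]


def redlock_acquire(nodes, key, ttl, drift_factor=0.01):
    """Return (token, validity_seconds) on success, (None, 0.0) otherwise."""
    value = str(uuid.uuid4())  # random value identifying this attempt
    start = time.monotonic()
    votes = sum(node.set_if_absent(key, value, ttl) for node in nodes)
    elapsed = time.monotonic() - start
    validity = ttl - elapsed - ttl * drift_factor
    quorum = len(nodes) // 2 + 1
    if votes >= quorum and validity > 0:
        return value, validity
    # No majority (or no time left): undo any partial acquisition.
    for node in nodes:
        node.delete_if_equal(key, value)
    return None, 0.0


# Five nodes, two of them down: three votes still form a majority.
nodes = [FakeRedisNode(), FakeRedisNode(), FakeRedisNode(),
         FakeRedisNode(available=False), FakeRedisNode(available=False)]
first_value, first_validity = redlock_acquire(nodes, "resource", ttl=10.0)

# A second client cannot get a majority while the first lock is live.
second_value, _ = redlock_acquire(nodes, "resource", ttl=10.0)

# With a third node down, no client can reach a majority for any key.
nodes[2].available = False
third_value, _ = redlock_acquire(nodes, "other-resource", ttl=10.0)
```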

Firstly, Kleppmann argues that if locks are solely used for efficiency, to prevent unnecessary duplicate work, then there is no need for complex algorithms like Redlock. In such cases, a straightforward single-node Redis instance might suffice, as occasional lock losses would not lead to significant consequences.

However, when the correctness of the lock is critical, for instance in operations such as updating shared storage systems, performing calculations, or calling external APIs, the lock must actually guarantee mutual exclusion. In these scenarios, Kleppmann argues that Redlock is not suitable because it lacks a mechanism for generating fencing tokens, which are necessary to make the lock safe. A fencing token is a number that the lock service increments each time a client acquires the lock. The client sends its token with every write, and the storage service rejects any write whose token is lower than one it has already processed, so a client whose lock expired during a network delay or process pause cannot overwrite newer data.
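A minimal sketch of fencing, assuming a lock service that hands out increasing numbers and a storage service that remembers the highest token it has seen. `LockService` and `FencedStorage` are hypothetical classes written for this illustration, lease expiry is not modelled, and the starting token value is arbitrary.

```python
import itertools


class LockService:
    """Issues a strictly increasing fencing token on every acquisition."""

    def __init__(self):
        self._counter = itertools.count(start=33)  # arbitrary starting value

    def acquire(self):
        return next(self._counter)


class FencedStorage:
    """Rejects writes that carry a token older than one already seen."""

    def __init__(self):
        self.highest_token = 0
        self.value = None

    def write(self, token, value):
        if token < self.highest_token:
            return False  # stale holder: its lock has been superseded
        self.highest_token = token
        self.value = value
        return True


locks = LockService()
storage = FencedStorage()

token_1 = locks.acquire()  # client 1 gets the lock, then pauses for a long time
token_2 = locks.acquire()  # the lease expired meanwhile, so client 2 gets the lock

client_2_write = storage.write(token_2, "from client 2")  # accepted
client_1_write = storage.write(token_1, "from client 1")  # rejected as stale
```

The check lives in the storage service rather than in the client, because a paused client cannot be trusted to notice that its lock has expired.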

Furthermore, Kleppmann discusses Redlock’s reliance on timing assumptions. The algorithm presumes that network delays, process pauses, and clock errors are bounded and small relative to the lifetime of the lock. In practice these assumptions may not hold: packets can be delayed for a long time, a process can pause (for example during garbage collection), and a system clock can be stepped forward or backward so that a lock expires earlier or later than intended. Once these assumptions are violated, Redlock can violate its safety property, meaning that two clients may believe they hold the same lock at the same time.
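The clock-jump failure can be replayed in a toy model. The sketch below is a simplification of the kind of scenario described in Kleppmann’s post, under stated assumptions: each hypothetical `Node` keeps its own clock, each client can only reach three of the five nodes, and no real time passes between the two acquisitions.

```python
class Node:
    """Toy lock node that judges expiry by its own local clock."""

    def __init__(self):
        self.clock = 0.0  # this node's own wall clock, in seconds
        self.locks = {}   # key -> (owner, expires_at on this node's clock)

    def try_lock(self, key, owner, ttl):
        entry = self.locks.get(key)
        if entry is not None and entry[1] > self.clock:
            return False
        self.locks[key] = (owner, self.clock + ttl)
        return True


def acquire_majority(nodes, reachable, key, owner, ttl):
    """Try the reachable nodes and report whether a majority of all nodes agreed."""
    votes = sum(nodes[name].try_lock(key, owner, ttl) for name in reachable)
    return votes >= len(nodes) // 2 + 1


nodes = {name: Node() for name in "ABCDE"}

# Client 1 reaches A, B and C and obtains a majority.
client_1_holds = acquire_majority(nodes, "ABC", "resource", "client-1", ttl=10)

# The clock on C jumps forward, so C considers client 1's lock expired.
nodes["C"].clock += 60

# Client 2 reaches C, D and E and also obtains a majority.
client_2_holds = acquire_majority(nodes, "CDE", "resource", "client-2", ttl=10)

# Nodes A and B still record client 1 as the holder at this point.
a_still_client_1 = nodes["A"].locks["resource"][0] == "client-1"
```

Both clients now believe they hold the lock, although the 10-second lifetime has not elapsed on any correctly running clock.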

Ultimately, Kleppmann suggests that if locks are needed for correctness, a consensus system like ZooKeeper should be chosen and the use of fencing tokens enforced for all resource accesses under the lock. For best-effort locks (i.e., locks used as an efficiency optimization rather than for correctness), he recommends using a single Redis instance and clearly documenting in the code that the locks are approximate and may occasionally fail.

In conclusion, Kleppmann’s article highlights key considerations in designing distributed locking solutions, including the purpose of the locks, the trade-off between correctness and efficiency, and the dependency of algorithms on timing and system model assumptions. His insights remind us to be cautious in selecting and employing distributed locking solutions to ensure the reliability and stability of our systems.






See

https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html
https://dl.acm.org/doi/pdf/10.1145/2639988.2655736
