Leslie Lamport
BIRTH:
7 February 1941 in New York, New York
EDUCATION:
Bronx High School of Science (1957); B.S. (Massachusetts Institute of Technology, Mathematics, 1960); M.S. (Brandeis University, Mathematics, 1963); Ph.D. (Brandeis University, Mathematics, 1972).
EXPERIENCE:
Massachusetts Computer Associates, 1970-1977; SRI International, 1977-1985; Digital Equipment Corporation and Compaq, 1985-2001; Microsoft Research, from 2001.
HONORS AND AWARDS:
Dijkstra Prize, for the paper "Time, clocks, and the ordering of events in a distributed system" (2000); Honorary Doctorate, University of Rennes (2003); Honorary Doctorate, Christian Albrechts University of Kiel (2003); Honorary Doctorate, École Polytechnique Fédérale de Lausanne (2004); IEEE Emanuel R. Piore Award (2004); Dijkstra Prize, for the paper "Reaching Agreement in the Presence of Faults" (2005); Honorary Doctorate, University of Lugano (2006); Honorary Doctorate, Nancy-Université (2007); IEEE John von Neumann Medal (2008); Member, US National Academy of Sciences (2011); ACM Fellow (2014).
United States – 2013
CITATION
For fundamental contributions to the theory and practice of distributed and concurrent systems, notably the invention of concepts such as causality and logical clocks, safety and liveness, replicated state machines, and sequential consistency.
If we could travel back in time to 1974, perhaps we would have found Leslie Lamport at his busy local neighborhood bakery, grappling with the following issue. The bakery had several cashiers, but if more than one person approached a single cashier at the same time, that cashier would try to talk to all of them at once and become confused. Lamport realized that there needed to be some way to guarantee that people approached cashiers one at a time. This problem reminded Lamport of one that had been posed in an earlier article by computer scientist Edsger Dijkstra on another mundane matter: how to share dinner utensils around a dining table. One of the coordination challenges was to guarantee that each utensil was used by at most one diner at a time. This came to be generalized as the mutual exclusion problem, exactly the challenge Lamport faced at the bakery.
One morning in 1974, an idea came to Lamport for how the bakery customers could solve mutual exclusion among themselves, without relying on the bakery for help. It worked roughly like this: people choose numbers when they enter the bakery, and are then served at the cashier in the order of their numbers. To choose a number, a customer asks for the numbers of everyone around her and picks a number higher than all of them.
This simple idea became an elegant algorithm for solving the mutual exclusion problem without requiring any lower-level indivisible operations. It also was a rich source of future ideas, since many issues had to be worked out. For example, some bakery customers took a long time to check other numbers, and meanwhile more customers arrived and selected additional numbers. Another time, the manager of the bakery wanted to get a snapshot of all the customer numbers in order to prepare enough pastries. Lamport later said "For a couple of years after my discovery of the bakery algorithm, everything I learned about concurrency came from studying it." [1]
The Bakery Algorithm and Lamport's other pioneering works -- many with amusing names and associated parables -- have become pillars of computer science. This body of work forms the foundation of broad areas in concurrency, and has influenced the specification, development, and verification of concurrent systems.
After graduating with a PhD in math from Brandeis University in 1972, Lamport worked as a computer scientist at Massachusetts Computer Associates from 1970 to 1977, at SRI International from 1977 to 1985, and at Digital Equipment Corporation Systems Research Center (later owned by Compaq) from 1985 to 2001. In 2001 he joined Microsoft Research in Mountain View, California.
Spending his research career in industrial research environments was not an accident. "I like working in an industrial research lab, because of the input", Lamport said. "If I just work by myself and come up with problems, I’d come up with some small number of things, but if I go out into the world, where people are working on real computer systems, there are a million problems out there. When I look back on most of the things I worked on—Byzantine Generals, Paxos—they came from real-world problems.” [2]
His work shed light on fundamental issues of concurrent programs, for which there was no formal theory at the time. He grappled with fundamental concepts such as causality and logical time, atomic and regular shared registers, sequential consistency, state machine replication, Byzantine agreement, and wait-freedom. He worked on algorithms which have become standard engineering practice for fault-tolerant distributed systems. He has also developed a substantial body of work on the formal specification and verification of concurrent systems, and has contributed to the development of automated tools applying these methods. We will touch on only some of his contributions.
1. Mutual Exclusion solutions and the Bakery algorithm
Lamport's influential works from the 1970's and 1980's came at a time when there was little understanding of the fundamental issues of programming for multiple concurrent processors.
For example, it was known that correct execution may require parallel activities to exclude one another during "critical sections," the periods in which they manipulate the same data, in order to prevent undesired interleaving of operations. This mutual exclusion problem originated in Edsger Dijkstra's pioneering work, which also included his solution. [3] Dijkstra's algorithm, while correct, depends on shared memory accesses being atomic – a processor reading while another is writing is made to wait, rather than returning a possibly garbled value. In a sense, it constructs a high-level solution out of low-level mutual exclusion already implemented by the hardware.
Lamport's remarkably elegant and intuitive "Bakery Algorithm" [4] doesn't do that. His solution arranges contending processes in an implicit queue according to their arrival order, much like a wait-queue in a bakery. Yet it doesn't matter if a processor reading data that is being updated by another processor gets garbage. The algorithm still works.
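To make the idea concrete, here is a minimal Python sketch of a bakery-style lock. It is a toy illustration rather than Lamport's original pseudocode: the thread count, iteration count, and shared counter are invented for the demo, and it assumes that reads and writes of individual list elements behave like the shared registers in the analysis.

```python
import threading

N = 4                       # number of competing threads (demo parameter)
choosing = [False] * N      # "I am currently picking a ticket number"
number = [0] * N            # ticket numbers; 0 means "not interested"
counter = 0                 # shared data protected by the bakery lock

def lock(i):
    # Doorway: announce that we are choosing, then pick a number
    # higher than every number we can currently see.
    choosing[i] = True
    number[i] = 1 + max(number)
    choosing[i] = False
    # Wait for everyone with a smaller (number, id) pair to finish.
    for j in range(N):
        while choosing[j]:
            pass
        while number[j] != 0 and (number[j], j) < (number[i], i):
            pass

def unlock(i):
    number[i] = 0

def worker(i):
    global counter
    for _ in range(500):
        lock(i)
        counter += 1        # critical section
        unlock(i)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)              # always N * 500 with the lock in place
```

Even if a read of another thread's ticket number races with that thread's write, the lock still behaves correctly, which is exactly the property noted above.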
Lamport recalls his discovery of the Bakery Algorithm.
The Bakery algorithm has become textbook material, and most undergraduates in computer science encounter it in the course of their studies.
2. Foundations of Concurrent Programming
Several important new concepts emanated from the Bakery Algorithm work, a trend which recurred several times in Lamport's career. The experience of devising concurrent algorithms and verifying correctness caused him to focus on the basic foundations that would make multiprocessors behave in a manner that programmers can reason mathematically about. While working on a solution for a specific concrete problem, Lamport invented abstractions and general rules needed to reason about its correctness, and these conceptual contributions then became theoretical pillars of concurrent programming.
Loop-freedom: The Bakery Algorithm work introduced an important concept called "loop freedom". Some obvious solutions that come to mind for the mutual exclusion problem pre-assign "turns" in rotation among the processes. But this forces processes to wait for others that are slow and have not yet even reached the point of contention. Using the bakery analogy, it would be akin to arriving at an empty bakery and being asked to wait for a customer who hasn't even arrived yet. In contrast, loop-freedom expresses the ability of processes to make progress independently of the speed of other processes. Because the Bakery Algorithm assigns turns to processes in the order of their arrival, it has loop-freedom. This crucial concept has been used in the design of many subsequent algorithms and in the design of memory architectures. Wait-freedom, a condition requiring independent progress despite failures, has its clear roots in the notion of loop-freedom and the Bakery doorway concept. It was later extensively explored by others, including Maurice Herlihy [5].
Sequential consistency: Working with a multi-core architecture that had distributed cache memory led Lamport to create formal specifications for coherent cache behavior in multiprocessor systems. That work brought some order to the chaos of this field by inventing sequential consistency [6], which has become the gold standard for memory consistency models. This simple and intuitive notion provides just the right level of “atomicity” to allow software to work. Today we design hardware systems with timestamp ordering or partial-store ordering, with added memory fence instructions, which allow programmers to make the hardware appear sequentially consistent. Programmers can then implement algorithms that provide strong consistency properties. This is key to the memory consistency models of Java and C++. Our multicore processors run today based on principles described by Leslie Lamport in 1979.
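To see what sequential consistency rules out, the following sketch enumerates every sequentially consistent interleaving of the classic store-buffering litmus test (a standard textbook example, not drawn from this article; the variable and register names are invented). Under sequential consistency the outcome r0 = r1 = 0 never occurs, whereas hardware with weaker ordering and no fences may produce it.

```python
# Store-buffering litmus test:
#   P0: x = 1; r0 = y        P1: y = 1; r1 = x
P0 = [("write", "x", 1), ("read", "y", "r0")]
P1 = [("write", "y", 1), ("read", "x", "r1")]

def interleavings(a, b):
    # all merges of the two programs that preserve each program's own order
    if not a:
        yield list(b); return
    if not b:
        yield list(a); return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

outcomes = set()
for run in interleavings(P0, P1):
    mem, regs = {"x": 0, "y": 0}, {}
    for op, loc, arg in run:
        if op == "write":
            mem[loc] = arg
        else:
            regs[arg] = mem[loc]
    outcomes.add((regs["r0"], regs["r1"]))

print(sorted(outcomes))   # [(0, 1), (1, 0), (1, 1)] -- (0, 0) is impossible under SC
```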
Atomic and regular registers: The Bakery Algorithm also led Lamport to wonder about the precise semantics of memory when multiple processes interact to share data. It took almost a decade to formalize, and the result is the abstraction of regular and atomic registers [7].
His theory gives each operation on a shared register an explicit duration, starting with an invocation and ending with a result. The registers can be implemented by a variety of techniques, such as replication. Nevertheless, the interactions of processes with an atomic register are supposed to “look like” serial accesses to actual shared memory. The theory also includes weaker semantics of interaction, like those of a regular register. A regular register captures situations in which processes read different replicas of the register while it is being updated. At any moment in time, some replicas may be updated while others are not, and eventually, all replicas will hold the updated value. Importantly, these weaker semantics suffice to support mutual exclusion: the Bakery algorithm works correctly even if a reader overlapping a writer gets back an arbitrary value.
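As an illustration of the weaker semantics, here is a toy model of a regular register (invented names, not Lamport's formal definition): a read that overlaps a write may return either the old or the new value.

```python
import random

class RegularRegister:
    """Toy model: a read overlapping a write may see the old or the new value."""
    def __init__(self, value=0):
        self.committed = value    # value visible before the write completes
        self.pending = None       # value being written, if a write is in progress

    def start_write(self, value):
        self.pending = value      # write invoked but not yet finished

    def finish_write(self):
        self.committed, self.pending = self.pending, None

    def read(self):
        if self.pending is not None:      # read overlaps the write
            return random.choice([self.committed, self.pending])
        return self.committed             # otherwise, the last completed write

r = RegularRegister(0)
r.start_write(42)
print({r.read() for _ in range(20)})  # typically both 0 and 42 during the overlap
r.finish_write()
print(r.read())                       # 42 once the write has completed
```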
Lamport describes his work to precisely define abstractions for atomic, regular, and safe registers.
This work initiated a distinct subfield of research in distributed computing that is still thriving. Lamport’s atomic objects supported only read and write operations, that is, they were atomic registers. The notion was generalized to other data types by Maurice Herlihy and Jeannette Wing [8], and their term "linearizability" became synonymous with atomicity. Today, essentially all non-relational storage systems developed by companies like Amazon, Google, and Facebook adopt linearizability and sequential consistency for their data coherence guarantees.
3. Foundations of Distributed Systems
A special type of concurrent system is a distributed system, characterized by having processes that use messages to interact with each other. Leslie Lamport has had a huge impact on the way we think about distributed systems, as well as on the engineering practices of the field.
Logical clocks: Many people realized that a global notion of time is not natural for a distributed system. Lamport was the first to make precise an alternative notion of "logical clocks", which impose a partial order on events based on the causal relation induced by sending messages from one part of the system to another [9]. His paper on "Time, Clocks, and the Ordering of Events in a Distributed System" has become the most cited of Lamport’s works, and in computer science parlance logical clocks are often nicknamed Lamport timestamps. His paper won the 2000 Principles of Distributed Computing Conference Influential Paper Award (later renamed the Edsger W. Dijkstra Prize in Distributed Computing), and it won an ACM SIGOPS Hall of Fame Award in 2007.
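The rules behind Lamport timestamps can be sketched in a few lines of Python (a minimal illustration; the process names and message exchange are hypothetical): each process increments its counter on every local event, stamps the counter onto outgoing messages, and on receipt advances its counter past the timestamp carried by the message.

```python
class Process:
    """Toy Lamport clock: one counter per process."""
    def __init__(self, name):
        self.name, self.clock = name, 0

    def local_event(self):
        self.clock += 1

    def send(self):
        self.clock += 1
        return self.clock                    # timestamp carried by the message

    def receive(self, msg_timestamp):
        self.clock = max(self.clock, msg_timestamp) + 1

a, b = Process("A"), Process("B")
a.local_event()           # A's clock: 1
t = a.send()              # A's clock: 2; the message carries timestamp 2
b.local_event()           # B's clock: 1
b.receive(t)              # B's clock: max(1, 2) + 1 = 3
print(a.clock, b.clock)   # 2 3 -- the send is ordered before the receive
```

In this way every message send gets a smaller timestamp than the corresponding receive, so the timestamps respect the causal order induced by communication.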
Lamport describes his paper “Time, Clocks…” and its relationship to special relativity.
To understand why that work has become so influential, recognize that at the time of the invention there was no good way to capture the communication delay in distributed systems except by using real time. Lamport realized that the communication delay made those systems very different from a shared-memory multiprocessor system. The insight came when reading a paper on replicated databases [10] and realizing that its logical ordering of commands might violate causality.
Using the ordering of events to prove system correctness is largely how people construct intuitive proofs of concurrent synchronization algorithms today. Another powerful contribution of this work was to demonstrate how to replicate a state machine using logical clocks, which is explained below.
Distributed Snapshots: Once you define causal order, the notion of consistent global states naturally follows. That led to another insightful work. Lamport and Mani Chandy invented the first algorithm for reading the state (taking a "snapshot") of an arbitrary distributed system [11]. This is such a powerful notion that others later used it in different domains, like networking, self-stabilization, debugging, and distributed systems. This paper received the 2013 ACM SIGOPS Hall of Fame Award.
4. Fault tolerance and State Machine Replication
"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable" is a famous Lamport quip. Much of his work is concerned with fault tolerance.
State Machine Replication (SMR): Perhaps the most significant of Lamport's many contributions is the State Machine Replication paradigm, which was introduced in the famous paper "Time, Clocks, and the Ordering of Events in a Distributed System," and further developed soon after [12]. The abstraction captures any service as a centralized state machine -- a kind of universal computing engine similar to a Turing machine. It has an internal state, and it processes commands in sequence, each resulting in a new internal state and producing a response. Lamport realized that the daunting task of replicating a service over multiple computers can be made remarkably simple if you present the same sequence of input commands to all replicas and they proceed through an identical succession of states.
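A toy sketch of the paradigm (the bank-account state machine and the already-agreed command log are invented for illustration): because every replica is deterministic and applies the same commands in the same order, all replicas end up in the same state.

```python
class BankStateMachine:
    """Deterministic toy state machine: same command log => same final state."""
    def __init__(self):
        self.balances = {}

    def apply(self, cmd):
        op, account, amount = cmd
        self.balances.setdefault(account, 0)
        self.balances[account] += amount if op == "deposit" else -amount

# Hypothetical command log, assumed to be delivered to every replica in the same order.
log = [("deposit", "alice", 100), ("withdraw", "alice", 30), ("deposit", "bob", 50)]

replicas = [BankStateMachine() for _ in range(3)]
for cmd in log:
    for r in replicas:
        r.apply(cmd)

assert all(r.balances == replicas[0].balances for r in replicas)
print(replicas[0].balances)   # {'alice': 70, 'bob': 50} on every replica
```

The hard part, of course, is getting every replica to agree on that single command sequence, which is where the agreement work below comes in.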
This insightful SMR paradigm underlies many reliable systems, and is considered a standard approach for building replicated distributed systems due to its elegance. But before Lamport could develop a full solution for SMR, he needed to address a core ingredient, agreement, which he tackled in his next work.
Byzantine Agreement: While state machine approaches that are resilient to crash faults are sufficient for many applications, more mission-critical systems, such as for avionics, need an even more extreme model of fault-tolerance that is impervious to nodes that might disrupt the system from within.
At Stanford Research Institute (later called SRI International) in the 1970's, Lamport was part of a team that helped NASA design a robust avionics control system. Formal guarantees were an absolute necessity because of the mission-critical nature of the task. Safety had to be guaranteed against the most extreme system malfunction one could imagine. One of the first challenges the team at SRI was asked to take on was to prove the correctness of a cockpit control scheme, which NASA had designed, with three computer systems that use majority-voting to mask any faulty component.
The team's work produced several foundational concepts and insights regarding these stringent types of robust systems. It included a fundamental definition of robustness in this setting, an abstraction of the coordination problem that underlies any replicated system to this day, and the surprising revelation that a system of three computers can never safely run a mission-critical cockpit!
Indeed, in two seminal works ("LPS") published with Pease and Shostak [13] [14], the team first identified a somewhat peculiar vulnerability. LPS posited that "a failed component may exhibit a type of behavior that is often overlooked -- namely, sending conflicting information to different parts of the system". More generally, a malfunctioning component could function in a manner completely inconsistent with its prescribed behavior, and might appear almost malicious.
The new fault model needed a name. At the time there was a related classical challenge of coordinating two communicating computers, introduced in a 1975 paper [15] and referred to by Jim Gray in [16] as the "Two Generals Paradox". This led Lamport to think of the control computers in a cockpit as an army of Byzantine generals, with the army trying to form a coordinated attack while inside traitors sent conflicting signals. The name "Byzantine Failures" was adopted for this fault model, and a flurry of academic work followed. The Byzantine fault model is still in use for capturing the worst kind of mishaps and security flaws in systems.
Lamport explains his use of a story about “Byzantine Generals” to frame a problem in fault tolerant distributed systems.
Byzantine failures describe the bad things that may happen. But what about the good things that need to happen? LPS also gave an abstract formulation of the problem of reaching coordination despite Byzantine failures; this is known as the "Byzantine Agreement" problem. This succinct formulation expresses the control coordination task as the problem of forming an agreement decision on an individual bit, starting with potentially different bits input to each component. Once you have agreement on a single bit, it is possible to use it repeatedly in order to keep an entire complex system coordinated. The paper shows that four computers are needed to reach agreement on a single bit in the face of a single malfunction. Three are not enough, because with three units, a faulty unit may send conflicting values to the other two units and form a different majority with each one. More generally, they showed that 3F+1 units are needed in order to overcome F simultaneously faulty components. To prove this, they used a beautiful symmetry argument which has become known as the "hexagon argument". This archetypal argument has found other uses whenever one argues that a malfunctioning unit sending conflicting information to different parts of the system looks indistinguishable from a symmetrical situation in which the correct and faulty roles are reversed.
LPS also demonstrated that 3F+1 units are enough, and they presented a solution for reaching Byzantine Agreement among the 3F+1 units in F+1 synchronous communication rounds. They also showed that if you use digital signatures, just 2F+1 units are sufficient and necessary.
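The following sketch, written in the spirit of the oral-messages idea (a simplified illustration with a faulty commander and honest lieutenants, not the paper's full recursive algorithm), shows why three units cannot mask one traitor while four can:

```python
from collections import Counter

def decide(commander_values):
    """
    One round of relaying and majority voting, in the spirit of 'oral messages'.
    commander_values[i] is what the (possibly faulty) commander told lieutenant i;
    the lieutenants themselves are assumed loyal and relay what they heard.
    """
    n = len(commander_values)
    decisions = []
    for i in range(n):
        heard = [commander_values[i]]                                  # direct order
        heard += [commander_values[j] for j in range(n) if j != i]     # relayed orders
        decisions.append(Counter(heard).most_common(1)[0][0])          # majority vote
    return decisions

# 3 units total (commander + 2 lieutenants), faulty commander gives conflicting orders:
print(decide(["attack", "retreat"]))             # e.g. ['attack', 'retreat'] -- disagreement

# 4 units total (commander + 3 lieutenants), same conflicting behavior:
print(decide(["attack", "retreat", "attack"]))   # ['attack', 'attack', 'attack'] -- agreement
```

With only two loyal lieutenants, each sees a one-to-one tie between conflicting orders and cannot tell who is lying; with three, the traitor's lone conflicting order is outvoted.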
The Byzantine Agreement problem and its solutions have become the hallmark of fault tolerant systems. Most systems constructed with redundancy make use of it internally for replication and for coordination. Lamport himself later used it in forming the State Machine Replication paradigm discussed above, which gives the algorithmic foundation of replication.
The 1980 paper was awarded the 2005 Edsger W. Dijkstra Prize in Distributed Computing, and the 1982 paper received the Jean-Claude Laprie Award in Dependable Computing.
Paxos: With a growing understanding of the agreement problem for distributed computing, it was time for Lamport to go back to State Machine Replication and address failures there. The first SMR solution he presented, in his 1978 paper, assumed there were no failures and used logical time to step replicas through the same command sequence. In 1989, Lamport designed a fault-tolerant algorithm called Paxos [17] [18]. Continuing his trend of humorous parable-telling, the paper presents the imaginary story of a government parliament on an ancient Greek island named Paxos, where the absence of any number of its members, or possibly all of them, can be tolerated without losing consistency.
Lamport describes the origins of his Paxos algorithm for fault tolerant distributed computing.
Unfortunately the setting as a Greek parable made the paper difficult for most readers to comprehend, and it took nine years from submission to publication in 1998. But the 1989 DEC technical report did get noticed. Lamport's colleague Butler Lampson evangelized the idea to the distributed computing community [19]. Shortly after the publication of Paxos, Google's Chubby system and Apache's open-source ZooKeeper offered State Machine Replication as an external, widely-deployed service.
Paxos stitches together a succession of agreement decisions into a sequence of state-machine commands in an optimized manner. Importantly, the first phase of the agreement component given in the Paxos paper (called Synod) can be avoided when the same leader presides over multiple decisions; that phase needs to be performed only when a leader needs to be replaced. This insightful breakthrough accounts for much of the popularity of Paxos, and was later called Multi-Paxos by the Google team [20]. Lamport's Paxos paper won the ACM SIGOPS (Special Interest Group on Operating Systems) Hall of Fame Award in 2012.
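A highly simplified, single-decree sketch of the Synod idea (class and function names are invented; real messaging, failures, retries, and many protocol details are omitted) shows the two phases and why a value, once chosen, stays chosen even under a higher ballot:

```python
class Acceptor:
    def __init__(self):
        self.promised = -1           # highest ballot number promised so far
        self.accepted = (-1, None)   # (ballot, value) of the last accepted proposal

    def prepare(self, ballot):
        if ballot > self.promised:   # phase 1: promise not to accept lower ballots
            self.promised = ballot
            return self.accepted     # report any previously accepted value
        return None                  # reject stale ballots

    def accept(self, ballot, value):
        if ballot >= self.promised:  # phase 2: accept if no higher promise was made
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False

def propose(acceptors, ballot, value):
    majority = len(acceptors) // 2 + 1
    promises = [p for p in (a.prepare(ballot) for a in acceptors) if p is not None]
    if len(promises) < majority:
        return None                              # could not gather a majority of promises
    # Must adopt the value of the highest-ballot proposal already accepted, if any.
    prev_ballot, prev_value = max(promises)
    chosen = prev_value if prev_value is not None else value
    acks = sum(a.accept(ballot, chosen) for a in acceptors)
    return chosen if acks >= majority else None

acceptors = [Acceptor() for _ in range(3)]
print(propose(acceptors, ballot=1, value="A"))   # 'A' is chosen
print(propose(acceptors, ballot=2, value="B"))   # still 'A': the earlier decision is stable
```

In Multi-Paxos a stable leader reuses the promises from one phase-1 exchange across many such decisions, which is the optimization described above.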
SMR and Paxos have become the de facto standard framework for designing and reasoning about consensus and replication methods. Many companies building critical information systems, including Google, Yahoo, Microsoft, and Amazon, have adopted the Paxos foundations.
5. Formal specification and verification of programs
In the early days of concurrency theory, the need surfaced for good tools to describe solutions and prove their correctness. Lamport has made central contributions to the theory of specification and verification of concurrent programs. For example, he was the first to articulate the notions of safety properties and liveness properties for asynchronous distributed algorithms. These generalize the “partial correctness” and “total correctness” properties previously defined for sequential programs. Today, safety and liveness properties form the standard classification for correctness properties of asynchronous distributed algorithms.
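For illustration, mutual exclusion can be written as a safety property and eventual entry as a liveness property in temporal-logic notation (a generic textbook-style example, not taken from Lamport's papers):

```latex
% Safety: nothing bad ever happens -- the two processes are never
% in their critical sections at the same time.
\Box\,\neg(\mathit{cs}_1 \land \mathit{cs}_2)

% Liveness: something good eventually happens -- a process that is
% trying to enter its critical section eventually does so.
\Box\,(\mathit{try}_1 \Rightarrow \Diamond\,\mathit{cs}_1)
```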
Another work, with Martin Abadi [21], introduced a special abstraction called prophecy variables into an algorithm model, to handle situations where an algorithm resolves a nondeterministic choice before the specification does. Abadi and Lamport pointed out situations where such problems arise, and developed the theory needed to support this extension. Moreover, they proved that whenever a distributed algorithm satisfies a specification, where both are expressed as state machines, the correspondence between them can be proved using a combination of prophecy variables and previous notions such as history variables. This work won the 2008 LICS Test-of-Time Award.
Formal Modeling Languages and Verification Tools: In addition to developing the basic notions above, Lamport created the language TLA (Temporal Logic of Actions) and the TLA+ toolset for modeling and verifying distributed algorithms and systems.
Lamport describes TLA and its connection to refinement mapping.
TLA and TLA+ support specification and proof of both safety and liveness properties, using notation based on temporal logic. Lamport has supervised the development of verification tools based on TLA+, notably the TLC model checker built by Yuan Yu. TLA+ and TLC have been used to describe and analyze real systems. For example, these tools were used to find a major error in the coherence protocol used in the hardware for Microsoft's Xbox 360 prior to its release in 2005. At Intel, they were used for the analysis of a cache-coherence protocol of the Intel QuickPath Interconnect as implemented in the Nehalem core processor. To teach engineers how to use his formal specification tools, Lamport has written a book [22]. More recently, Lamport has developed the PlusCal formal language and tools for use in verifying distributed algorithms; this work builds upon TLA+.
6. LaTeX
When creating such a vast collection of impactful papers, it is natural to wish for a convenient typesetting tool. Lamport did not just wish for one, he created one for the entire community. Outside the field of concurrency is Lamport’s LaTeX system [23], a set of macros for use with Donald Knuth’s TeX typesetting system [24]. LaTeX added three important things to TeX:
The concept of ‘typesetting environment’, which had originated in Brian Reid’s Scribe system.
A strong emphasis on structural rather than typographic markup.
A generic document design, flexible enough to be adequate for a wide variety of documents.
Lamport did not originate these ideas, but by pushing them as far as possible he created a system that provides the quality of TeX and much of its flexibility, yet is much easier to use. LaTeX became the de facto standard for technical publishing in computer science and many other fields.
There are many other important papers by Leslie Lamport -- too many to describe here. They are listed in chronological order on Lamport's home page [1], accompanied by historical notes that describe the motivation and context of each result.
Any time you access a modern computer, you are likely to be impacted by Leslie Lamport's algorithms. And all of this work started with the quest to understand how to organize a queue at a local bakery.
Author: Dahlia Malkhi
Additional contributors: Martin Abadi, Hagit Attiya, Idit Keidar,
Nancy Lynch, Nir Shavit,
George Varghese, and Len Shustek