Message ID: 20190719083949.5351-1-nborisov@suse.com (mailing list archive)
Series: Refactor snapshot vs nocow writers locking
Hi Nikolay,

On 19/07/2019 09:39, Nikolay Borisov wrote:
> Hello,
>
> Here is the second version of the DRW lock for btrfs. Main changes from v1:
>
> * Fixed all checkpatch warnings (Andrea Parri)
> * Properly call write_unlock in btrfs_drw_try_write_lock (Filipe Manana)
> * Comment fix.
> * Stress tested it via locktorture. Survived for 8 straight days on a 4 socket
>   48 thread machine.
>
> I have also produced a PlusCal specification which I'd be happy to discuss with
> people since I'm new to formal specification and it seems it doesn't have the
> right fidelity:
>

I haven't played with PlusCal in a while but I figured I'd have a go at it
anyway. I've also Cc'd Catalin who's my local TLA+/PCal guru.

FYI PlusCal also supports a C-like syntax, which means you can use glorious
brackets instead of begin/end (unless you prefer those... I won't judge).

I appended my "clean-up" of your spec - mainly changed to C-style syntax, added
a few more constant definitions, and split the locker process into separate
reader and writer processes. IMO it makes it more readable.

> ---- MODULE specs ----
> EXTENDS Integers, Sequences, TLC
>
> CONSTANT NumLockers
>
> ASSUME NumLockers > 0
>
> (*--algorithm DRW
>
> variables
>     lock_state = "idle",
>     states = {"idle", "write_locked", "read_locked", "read_waiting", "write_waiting"},
>     threads = [thread \in 1..NumLockers |-> "idle" ] \* everyone is in idle state at first, this generates a tuple
>
> define
>     INV_SingleLockerType == \/ lock_state = "write_locked" /\ ~\E thread \in 1..Len(threads): threads[thread] = "read_locked"
>                             \/ lock_state = "read_locked" /\ ~\E thread \in 1..Len(threads): threads[thread] = "write_locked"
>                             \/ lock_state = "idle" /\ \A thread \in 1..Len(threads): threads[thread] = "idle"

I've tried to think about liveness properties we would want to check against
this spec, but the usual ones don't really work there: AFAICT there is no
fairness there, readers can completely block writers (and the opposite is true
as well).

TLC checks for deadlocks by default (IIRC that should translate to always
having at least one non-stuttering step enabled in the next-state formula), so
maybe that is all we need?

> end define;
>
> macro ReadLock(tid) begin
>     if lock_state = "idle" \/ lock_state = "read_locked" then
>         lock_state := "read_locked";
>         threads[tid] := "read_locked";
>     else
>         assert lock_state = "write_locked";
>         threads[tid] := "write_waiting"; \* waiting for writers to finish
>         await lock_state = "" \/ lock_state = "read_locked";
                            ^^
That's not a valid lock state, was that meant to be "idle"?

BTW your spec doesn't have a type check invariant (something to make sure
whatever is in your variables is sane). It seems to be common practice, and
it's quite handy to spot stupid mistakes. For this spec it would look something
like this:

    LOCK_STATES == {"idle", "write_locked", "read_locked"}
    THREAD_STATES == LOCK_STATES \union {"read_waiting", "write_waiting"}

    TypeCheck ==
        /\ lock_state \in LOCK_STATES
        /\ \A thread \in THREADS: threads[thread] \in THREAD_STATES

>     end if;
>
> end macro;
>
> macro WriteLock(tid) begin
>     if lock_state = "idle" \/ lock_state = "write_locked" then
>         lock_state := "write_locked";
>         threads[tid] := "write_locked";
>     else
>         assert lock_state = "read_locked";
>         threads[tid] := "reader_waiting"; \* waiting for readers to finish
>         await lock_state = "idle" \/ lock_state = "write_locked";

Aren't we missing an extra action here (same goes for the read lock)?
threads[tid] should be set to "write_locked", and lock_state should be updated
if it was previously "idle". Now the nasty thing is that we'd be setting
threads[tid] to two different values in the same atomic block, so PlusCal will
complain and we'll have to add some labels (which means changing the macro into
a procedure). Maybe something like this?

procedure WriteLock()
{
prepare:
    \* waiting for readers to finish
    threads[self] := "read_waiting";
lock:
    await lock_state = "idle" \/ lock_state = "write_locked";
    lock_state := "write_locked";
    threads[self] := "write_locked";
    return;
};

+ something similar for ReadLock().

Alternatively the "prepare" step could be some counter increment, to more
closely mimic the actual implementation, but I don't think it adds much value
to the model.

---------------------------------------------------------------------------
specs.tla:

---- MODULE specs ----
EXTENDS Integers, Sequences, TLC

CONSTANTS
    NR_WRITERS,
    NR_READERS,
    WRITER_TASK,
    READER_TASK

WRITERS == {WRITER_TASK} \X (1..NR_WRITERS)
READERS == {READER_TASK} \X (1..NR_READERS)
THREADS == WRITERS \union READERS

(*--algorithm DRW {

variables
    lock_state = "idle",
    \* everyone is in idle state at first, this generates a tuple
    threads = [thread \in THREADS |-> "idle" ]

define {
    LOCK_STATES == {"idle", "write_locked", "read_locked"}
    THREAD_STATES == LOCK_STATES \union {"read_waiting", "write_waiting"}

    (* Safety invariants *)
    TypeCheck ==
        /\ lock_state \in LOCK_STATES
        /\ \A thread \in THREADS: threads[thread] \in THREAD_STATES

    NoReadWhenWrite ==
        lock_state = "write_locked" =>
        \A thread \in THREADS: threads[thread] # "read_locked"

    NoWriteWhenRead ==
        lock_state = "read_locked" =>
        \A thread \in THREADS: threads[thread] # "write_locked"

    AllIdleWhenIdle ==
        lock_state = "idle" =>
        \A thread \in THREADS: threads[thread] = "idle"

    (* Ensure critical section exclusiveness *)
    Exclusion ==
        /\ \E writer \in WRITERS: pc[writer] = "write_cs" =>
           \A reader \in READERS: pc[reader] # "read_cs"
        /\ \E reader \in READERS: pc[reader] = "read_cs" =>
           \A writer \in WRITERS: pc[writer] # "write_cs"
}

macro ReadLock(tid)
{
    if (lock_state = "idle" \/ lock_state = "read_locked") {
        lock_state := "read_locked";
        threads[tid] := "read_locked";
    } else {
        assert lock_state = "write_locked";
        \* waiting for writers to finish
        threads[tid] := "write_waiting";
        await lock_state = "" \/ lock_state = "read_locked";
    };
}

macro WriteLock(tid)
{
    if (lock_state = "idle" \/ lock_state = "write_locked") {
        lock_state := "write_locked";
        threads[tid] := "write_locked";
    } else {
        assert lock_state = "read_locked";
        \* waiting for readers to finish
        threads[tid] := "read_waiting";
        await lock_state = "idle" \/ lock_state = "write_locked";
    };
}

macro ReadUnlock(tid) {
    if (threads[tid] = "read_locked") {
        threads[tid] := "idle";
        if (\A thread \in THREADS: threads[thread] # "read_locked") {
            \* we were the last read holder, everyone else should be waiting, unlock the lock
            lock_state := "idle";
        };
    };
}

macro WriteUnlock(tid) {
    if (threads[tid] = "write_locked") {
        threads[tid] := "idle";
        if (\A thread \in THREADS: threads[thread] # "write_locked") {
            \* we were the last write holder, everyone else should be waiting, unlock the lock
            lock_state := "idle";
        };
    };
}

fair process(writer \in WRITERS)
{
loop:
    while (TRUE) {
        WriteLock(self);
write_cs:
        skip;
unlock:
        WriteUnlock(self);
    };
}

fair process(reader \in READERS)
{
loop:
    while (TRUE) {
        ReadLock(self);
read_cs:
        skip;
unlock:
        ReadUnlock(self);
    };
}

}*)
====

specs.cfg:

SPECIFICATION Spec
\* Add statements after this line.
CONSTANTS
    NR_READERS = 3
    NR_WRITERS = 3
    READER_TASK = reader
    WRITER_TASK = writer

INVARIANTS
    TypeCheck
    NoReadWhenWrite
    NoWriteWhenRead
    AllIdleWhenIdle
    Exclusion
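Following up on the "+ something similar for ReadLock()" note above, a minimal
sketch of what such a companion procedure could look like (not part of the
original mail, it simply mirrors the labelled WriteLock() suggestion with the
read/write roles swapped):

procedure ReadLock()
{
prepare:
    \* waiting for writers to finish
    threads[self] := "write_waiting";
lock:
    await lock_state = "idle" \/ lock_state = "read_locked";
    lock_state := "read_locked";
    threads[self] := "read_locked";
    return;
};

The two labels are what make the "announce intent" and "acquire" steps
non-atomic, which is the whole point of turning the macro into a procedure.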
Some nitpicking below:

On Mon, Jul 29, 2019 at 03:13:42PM +0100, Valentin Schneider wrote:
> specs.tla:
>
> ---- MODULE specs ----
> EXTENDS Integers, Sequences, TLC
>
> CONSTANTS
>     NR_WRITERS,
>     NR_READERS,
>     WRITER_TASK,
>     READER_TASK
>
> WRITERS == {WRITER_TASK} \X (1..NR_WRITERS)
> READERS == {READER_TASK} \X (1..NR_READERS)
> THREADS == WRITERS \union READERS

Recommendation: use symbolic values for WRITERS and READERS (defined in .cfg:
e.g. r1, r2, r3, w1, w2, w3). It allows you to do symmetry optimisations. We've
also hit a TLC bug in the past with process values made up of a Cartesian
product (though it may have been fixed since).

> macro ReadLock(tid)
> {
>     if (lock_state = "idle" \/ lock_state = "read_locked") {
>         lock_state := "read_locked";
>         threads[tid] := "read_locked";
>     } else {
>         assert lock_state = "write_locked";
>         \* waiting for writers to finish
>         threads[tid] := "write_waiting";
>         await lock_state = "" \/ lock_state = "read_locked";

lock_state = "idle"?

> macro WriteLock(tid)
> {
>     if (lock_state = "idle" \/ lock_state = "write_locked") {
>         lock_state := "write_locked";
>         threads[tid] := "write_locked";
>     } else {
>         assert lock_state = "read_locked";
>         \* waiting for readers to finish
>         threads[tid] := "read_waiting";
>         await lock_state = "idle" \/ lock_state = "write_locked";
>     };
> }

I'd say that's one of the pitfalls of PlusCal. The above is executed
atomically, so you'd have the lock_state read and updated in the same action.
Looking at the C patches, there is an atomic_read(&lock->readers) followed by a
percpu_counter_inc(&lock->writers). Between these two, you can have "readers"
becoming non-zero via a different CPU.

My suggestion would be to use procedures with labels to express the
non-atomicity of such sequences.

> macro ReadUnlock(tid) {
>     if (threads[tid] = "read_locked") {
>         threads[tid] := "idle";
>         if (\A thread \in THREADS: threads[thread] # "read_locked") {
>             \* we were the last read holder, everyone else should be waiting, unlock the lock
>             lock_state := "idle";
>         };
>     };
> }

I'd make this close to the proposed C code with atomic counters. You'd not be
able to check each thread atomically in practice anyway.
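For illustration, one possible shape of the symbolic-value setup described
here (a sketch, not taken from the thread; the Perms name, the concrete model
values and the set sizes are assumptions - Permutations() comes from the
standard TLC module):

specs.tla (sketch):

    CONSTANTS READERS, WRITERS

    THREADS == READERS \union WRITERS

    \* Symmetry set: states that differ only by a permutation of reader
    \* (or writer) identities are treated as equivalent by TLC.
    Perms == Permutations(READERS) \union Permutations(WRITERS)

specs.cfg (sketch):

    CONSTANTS
        READERS = {r1, r2, r3}
        WRITERS = {w1, w2, w3}

    SYMMETRY Perms

With symmetry enabled TLC only explores one representative per permutation
class, which usually shrinks the state space considerably; note though that
TLC's liveness checking does not mix well with symmetry sets, so this mainly
helps invariant-only runs.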
On 29/07/2019 16:33, Catalin Marinas wrote:
[...]
>> ---- MODULE specs ----
>> EXTENDS Integers, Sequences, TLC
>>
>> CONSTANTS
>>     NR_WRITERS,
>>     NR_READERS,
>>     WRITER_TASK,
>>     READER_TASK
>>
>> WRITERS == {WRITER_TASK} \X (1..NR_WRITERS)
>> READERS == {READER_TASK} \X (1..NR_READERS)
>> THREADS == WRITERS \union READERS
>
> Recommendation: use symbolic values for WRITERS and READERS (defined in
> .cfg: e.g. r1, r2, r3, w1, w2, w3). It allows you to do symmetry
> optimisations. We've also hit a TLC bug in the past with process values
> made up of a Cartesian product (though it may have been fixed since).
>

Right, I had forgotten that one:
https://github.com/tlaplus/tlaplus/issues/164

Being very lazy I dislike having to manually input those, but as you say it
can't be avoided if we want to use symmetry.

>> macro ReadLock(tid)
>> {
>>     if (lock_state = "idle" \/ lock_state = "read_locked") {
>>         lock_state := "read_locked";
>>         threads[tid] := "read_locked";
>>     } else {
>>         assert lock_state = "write_locked";
>>         \* waiting for writers to finish
>>         threads[tid] := "write_waiting";
>>         await lock_state = "" \/ lock_state = "read_locked";
>
> lock_state = "idle"?
>

Aye, I didn't modify those macros from the original spec.

>> macro WriteLock(tid)
>> {
>>     if (lock_state = "idle" \/ lock_state = "write_locked") {
>>         lock_state := "write_locked";
>>         threads[tid] := "write_locked";
>>     } else {
>>         assert lock_state = "read_locked";
>>         \* waiting for readers to finish
>>         threads[tid] := "read_waiting";
>>         await lock_state = "idle" \/ lock_state = "write_locked";
>>     };
>> }
>
> I'd say that's one of the pitfalls of PlusCal. The above is executed
> atomically, so you'd have the lock_state read and updated in the same
> action. Looking at the C patches, there is an
> atomic_read(&lock->readers) followed by a
> percpu_counter_inc(&lock->writers). Between these two, you can have
> "readers" becoming non-zero via a different CPU.
>
> My suggestion would be to use procedures with labels to express the
> non-atomicity of such sequences.
>

Agreed, I've suggested something like this in my reply.

[...]
On 29/07/2019 17:32, Valentin Schneider wrote:
> On 29/07/2019 16:33, Catalin Marinas wrote:
[...]
>> I'd say that's one of the pitfalls of PlusCal. The above is executed
>> atomically, so you'd have the lock_state read and updated in the same
>> action. Looking at the C patches, there is an
>> atomic_read(&lock->readers) followed by a
>> percpu_counter_inc(&lock->writers). Between these two, you can have
>> "readers" becoming non-zero via a different CPU.
>>
>> My suggestion would be to use procedures with labels to express the
>> non-atomicity of such sequences.
>>
>

FYI, with a very simple and stupid modification of the spec:

----->8-----
macro ReadUnlock()
{
    reader_count := reader_count - 1;
    \* Condition variable signal is "implicit" here
}

macro WriteUnlock()
{
    writer_count := writer_count - 1;
    \* Ditto on the cond var
}

procedure ReadLock()
{
add:
    reader_count := reader_count + 1;
lock:
    await writer_count = 0;
    return;
}

procedure WriteLock()
{
add:
    writer_count := writer_count + 1;
lock:
    await reader_count = 0;
    return;
};
-----8<-----

it's quite easy to trigger the case Paul pointed out in [1]:

----->8-----
Error: Deadlock reached.
Error: The behavior up to this point is:
State 1: <Initial predicate>
/\ stack = (<<reader, 1>> :> <<>> @@ <<writer, 1>> :> <<>>)
/\ pc = (<<reader, 1>> :> "loop" @@ <<writer, 1>> :> "loop_")
/\ writer_count = 0
/\ reader_count = 0
/\ lock_state = "idle"

State 2: <loop_ line 159, col 16 to line 164, col 72 of module specs>
/\ stack = ( <<reader, 1>> :> <<>> @@
             <<writer, 1>> :> <<[pc |-> "write_cs", procedure |-> "WriteLock"]>> )
/\ pc = (<<reader, 1>> :> "loop" @@ <<writer, 1>> :> "add")
/\ writer_count = 0
/\ reader_count = 0
/\ lock_state = "idle"

State 3: <add line 146, col 14 to line 149, col 63 of module specs>
/\ stack = ( <<reader, 1>> :> <<>> @@
             <<writer, 1>> :> <<[pc |-> "write_cs", procedure |-> "WriteLock"]>> )
/\ pc = (<<reader, 1>> :> "loop" @@ <<writer, 1>> :> "lock")
/\ writer_count = 1
/\ reader_count = 0
/\ lock_state = "idle"

State 4: <loop line 179, col 15 to line 184, col 71 of module specs>
/\ stack = ( <<reader, 1>> :> <<[pc |-> "read_cs", procedure |-> "ReadLock"]>> @@
             <<writer, 1>> :> <<[pc |-> "write_cs", procedure |-> "WriteLock"]>> )
/\ pc = (<<reader, 1>> :> "add_" @@ <<writer, 1>> :> "lock")
/\ writer_count = 1
/\ reader_count = 0
/\ lock_state = "idle"

State 5: <add_ line 133, col 15 to line 136, col 64 of module specs>
/\ stack = ( <<reader, 1>> :> <<[pc |-> "read_cs", procedure |-> "ReadLock"]>> @@
             <<writer, 1>> :> <<[pc |-> "write_cs", procedure |-> "WriteLock"]>> )
/\ pc = (<<reader, 1>> :> "lock_" @@ <<writer, 1>> :> "lock")
/\ writer_count = 1
/\ reader_count = 1
/\ lock_state = "idle"
-----8<-----

Which I think is pretty cool considering the effort that was required
(read: not much).

[1]: https://lore.kernel.org/lkml/20190607105251.GB28207@linux.ibm.com/
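For context on the trace: the labels loop/loop_, add/add_, write_cs/read_cs
come from the driving processes, which are presumably the same reader/writer
loops used in the full spec later in the thread, now calling the procedures.
The PlusCal translator renames clashing labels across processes and procedures
by appending underscores, which is why both loop and loop_ show up. A sketch of
the writer side (the reader is symmetric):

fair process(writer \in WRITERS)
{
loop:
    while (TRUE) {
        call WriteLock();   \* labelled procedure: the add/lock steps above
write_cs:
        skip;               \* critical section placeholder
unlock:
        WriteUnlock();
    };
}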
On 30.07.19 г. 14:03 ч., Valentin Schneider wrote:
> On 29/07/2019 17:32, Valentin Schneider wrote:
>> On 29/07/2019 16:33, Catalin Marinas wrote:
> [...]
>>> I'd say that's one of the pitfalls of PlusCal. The above is executed
>>> atomically, so you'd have the lock_state read and updated in the same
>>> action. Looking at the C patches, there is an
>>> atomic_read(&lock->readers) followed by a
>>> percpu_counter_inc(&lock->writers). Between these two, you can have
>>> "readers" becoming non-zero via a different CPU.
>>>
>>> My suggestion would be to use procedures with labels to express the
>>> non-atomicity of such sequences.
>>>
>>
>
> FYI, with a very simple and stupid modification of the spec:
>
> ----->8-----
> macro ReadUnlock()
> {
>     reader_count := reader_count - 1;
>     \* Condition variable signal is "implicit" here
> }
>
> macro WriteUnlock()
> {
>     writer_count := writer_count - 1;
>     \* Ditto on the cond var
> }
>
> procedure ReadLock()
> {
> add:
>     reader_count := reader_count + 1;
> lock:
>     await writer_count = 0;
>     return;
> }
>
> procedure WriteLock()
> {
> add:
>     writer_count := writer_count + 1;
> lock:
>     await reader_count = 0;
>     return;
> };
> -----8<-----
>
> it's quite easy to trigger the case Paul pointed out in [1]:

Yes, however, there was a bug in the original posting, in that
btrfs_drw_try_write_lock should have called btrfs_drw_write_unlock instead of
btrfs_drw_read_unlock if it sees that readers have incremented while it has
already incremented its percpu counter.

Additionally, the implementation doesn't wait with the respective counter
incremented. Is there a way to express something along the lines of:

> procedure WriteLock()
> {
> add:
>     writer_count := writer_count + 1;
> lock:
>     await reader_count = 0;

If we are about to wait then also decrement writer_count? I guess the correct
way to specify it would be:

procedure WriteLock()
{
    writer_count := writer_count + 1;
    await reader_count = 0;
    return;
};

Because the implementation (by using barriers and percpu counters) ensures all
of this happens as one atomic step? E.g. before going to sleep we decrement the
writer counter.

>     return;
> };
>
> ----->8-----
> Error: Deadlock reached.
> Error: The behavior up to this point is:
> State 1: <Initial predicate>
> /\ stack = (<<reader, 1>> :> <<>> @@ <<writer, 1>> :> <<>>)
> /\ pc = (<<reader, 1>> :> "loop" @@ <<writer, 1>> :> "loop_")
> /\ writer_count = 0
> /\ reader_count = 0
> /\ lock_state = "idle"
>
> State 2: <loop_ line 159, col 16 to line 164, col 72 of module specs>
> /\ stack = ( <<reader, 1>> :> <<>> @@
>              <<writer, 1>> :> <<[pc |-> "write_cs", procedure |-> "WriteLock"]>> )
> /\ pc = (<<reader, 1>> :> "loop" @@ <<writer, 1>> :> "add")
> /\ writer_count = 0
> /\ reader_count = 0
> /\ lock_state = "idle"
>
> State 3: <add line 146, col 14 to line 149, col 63 of module specs>
> /\ stack = ( <<reader, 1>> :> <<>> @@
>              <<writer, 1>> :> <<[pc |-> "write_cs", procedure |-> "WriteLock"]>> )
> /\ pc = (<<reader, 1>> :> "loop" @@ <<writer, 1>> :> "lock")
> /\ writer_count = 1
> /\ reader_count = 0
> /\ lock_state = "idle"
>
> State 4: <loop line 179, col 15 to line 184, col 71 of module specs>
> /\ stack = ( <<reader, 1>> :> <<[pc |-> "read_cs", procedure |-> "ReadLock"]>> @@
>              <<writer, 1>> :> <<[pc |-> "write_cs", procedure |-> "WriteLock"]>> )
> /\ pc = (<<reader, 1>> :> "add_" @@ <<writer, 1>> :> "lock")
> /\ writer_count = 1
> /\ reader_count = 0
> /\ lock_state = "idle"
>
> State 5: <add_ line 133, col 15 to line 136, col 64 of module specs>
> /\ stack = ( <<reader, 1>> :> <<[pc |-> "read_cs", procedure |-> "ReadLock"]>> @@
>              <<writer, 1>> :> <<[pc |-> "write_cs", procedure |-> "WriteLock"]>> )
> /\ pc = (<<reader, 1>> :> "lock_" @@ <<writer, 1>> :> "lock")
> /\ writer_count = 1
> /\ reader_count = 1
> /\ lock_state = "idle"
> -----8<-----
>
> Which I think is pretty cool considering the effort that was required
> (read: not much).
>
> [1]: https://lore.kernel.org/lkml/20190607105251.GB28207@linux.ibm.com/
>
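One way to express the "drop the count before waiting" behaviour asked about
here is an extra labelled step that decrements writer_count once readers are
observed - which is essentially what the refined WriteLock() in the next
message does. Reproduced below as a sketch with explanatory comments added
(the code itself is from the later spec in this thread):

procedure WriteLock()
variables rc;
{
loop:
    while (TRUE) {
        writer_count := writer_count + 1;
get_readers:
        \* separate label: the reader-count read is a distinct step from the
        \* increment above, mimicking atomic_read() after percpu_counter_inc()
        rc := reader_count;
check_readers:
        if (rc > 0) {
            \* readers are in: drop our count before sleeping, mirroring the
            \* implementation calling btrfs_drw_write_unlock before waiting
            writer_count := writer_count - 1;
wait:
            await reader_count = 0;
        } else {
            goto locked;
        };
    };
locked:
    lock_state := "write_locked";
    return;
};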
And with a more accurate spec (appended at the end of this email):

  Model checking completed. No error has been found.
    Estimates of the probability that TLC did not check all reachable states
    because two distinct states had the same fingerprint:
    calculated (optimistic):  val = 8.6E-8
    based on the actual fingerprints:  val = 9.8E-9
  3100306 states generated, 651251 distinct states found, 0 states left on queue.

IOW, it seems fine with regard to the defined properties + the deadlock check.

The EventuallyRead liveness property shows that a waiting reader will always
eventually get the lock (no reader starvation), of course assuming no lock-user
blocks in its critical section (i.e. no stuttering steps).

It doesn't hold for writers - they can be starved by a never-ending stream of
readers. IOW

  \* Eventually one writer gets the lock
  <> \E writer \in WRITERS: pc[writer] = "write_cs"

can be disproven by some behaviours. However,

  \* No writer ever gets the lock
  [] \A writer \in WRITERS: pc[writer] # "write_cs"

can be disproven as well - there are *some* paths that allow writers to get the
lock. I haven't found a neater way to check that other than by having a "debug"
property that I don't include in the full-fledged check.

Note that the entire content of a label is considered atomic by TLC. From the
point of view of that spec:

Read lock:
- reader count increment is atomic
- writer count read is atomic

Read unlock:
- reader count decrement is atomic
  (The model's writer counter read is in the same atomic block as the reader
  count decrement, but it's only used for updating a model-internal variable,
  so I'm not including it here)

Write lock:
- writer count increment is atomic
- reader count read is atomic
- writer count decrement is atomic

Write unlock:
- writer count decrement is atomic
  (ditto on the reader counter read)

This doesn't help for the placement of memory barriers, but from an algorithm
point of view that seems to be holding up.

Here's the actual spec (if I keep this up I'll get a git tree going...)
---------------------------------------------------------------------------
specs.tla:

---- MODULE specs ----
EXTENDS Integers, Sequences, FiniteSets, TLC

CONSTANTS
    NR_WRITERS,
    NR_READERS,
    WRITER_TASK,
    READER_TASK

WRITERS == {WRITER_TASK} \X (1..NR_WRITERS)
READERS == {READER_TASK} \X (1..NR_READERS)
THREADS == WRITERS \union READERS

(*--algorithm DRW {

variables
    lock_state = "idle",
    reader_count = 0,
    writer_count = 0

define {
    LOCK_STATES == {"idle", "write_locked", "read_locked"}

    (* Safety invariants *)
    TypeCheck ==
        /\ lock_state \in LOCK_STATES
        /\ reader_count \in (0..NR_READERS)
        /\ writer_count \in (0..NR_WRITERS)

    (* Ensure critical section exclusiveness *)
    Exclusion ==
        /\ \E writer \in WRITERS: pc[writer] = "write_cs" =>
           \A reader \in READERS: pc[reader] # "read_cs"
        /\ \E reader \in READERS: pc[reader] = "read_cs" =>
           \A writer \in WRITERS: pc[writer] # "write_cs"

    (* Ensure the lock state is sane *)
    LockState ==
        /\ lock_state = "idle" =>
           (* Not much can be said about the counts - the whole range is valid *)
           /\ \A writer \in WRITERS: pc[writer] # "write_cs"
           /\ \A reader \in READERS: pc[reader] # "read_cs"
        /\ lock_state = "read_locked" =>
           (* Some readers can still be in the locking procedure, just make sure *)
           (* all readers in their critical section are accounted for *)
           reader_count >= Cardinality({r \in READERS: pc[r] = "read_cs"})
        /\ lock_state = "write_locked" =>
           (* Ditto for writers *)
           writer_count >= Cardinality({w \in WRITERS: pc[w] = "write_cs"})

    (* Liveness properties *)
    (* A waiting reader *always* eventually gets the lock *)
    EventuallyRead ==
        reader_count > 0 /\ lock_state # "read_locked" ~> lock_state = "read_locked"

    (* A waiting writer *can* eventually get the lock *)
    EventuallyWrite ==
        (* TODO: find a way to express that this can be false in some behaviours *)
        (* but has to be true in > 0 behaviours *)
        TRUE
}

macro ReadUnlock()
{
    reader_count := reader_count - 1;
    \* Condition variable signal is "implicit" here
    if (reader_count = 0) {
        lock_state := "idle";
    };
}

macro WriteUnlock()
{
    writer_count := writer_count - 1;
    \* Ditto on the cond var
    if (writer_count = 0) {
        lock_state := "idle";
    };
}

procedure ReadLock()
{
add:
    reader_count := reader_count + 1;
lock:
    await writer_count = 0;
    lock_state := "read_locked";
    return;
}

procedure WriteLock()
variables rc;
{
loop:
    while (TRUE) {
        writer_count := writer_count + 1;
get_readers:
        rc := reader_count;
check_readers:
        if (rc > 0) {
            writer_count := writer_count - 1;
wait:
            await reader_count = 0;
        } else {
            goto locked;
        };
    };
locked:
    lock_state := "write_locked";
    return;
};

fair process(writer \in WRITERS)
{
loop:
    while (TRUE) {
        call WriteLock();
write_cs:
        skip;
unlock:
        WriteUnlock();
    };
}

fair process(reader \in READERS)
{
loop:
    while (TRUE) {
        call ReadLock();
read_cs:
        skip;
unlock:
        ReadUnlock();
    };
}

}*)
====

specs.cfg:

SPECIFICATION Spec
\* Add statements after this line.
CONSTANTS
    NR_READERS = 3
    NR_WRITERS = 3
    READER_TASK = reader
    WRITER_TASK = writer

INVARIANTS
    TypeCheck
    LockState
    Exclusion

PROPERTIES
    EventuallyRead
    EventuallyWrite
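As for the "debug" property mentioned in the previous message, one possible
arrangement (an assumption, not something from the thread) is to define the
expected-to-fail property in the module, after the PlusCal translation so it
can reference pc, and only list it in a separate config; the regular run stays
clean, while the debug run's counter-example confirms that some behaviours do
let a writer into its critical section:

specs.tla (sketch, placed below the translation):

    (* Expected to be violated: a counter-example trace demonstrates at *)
    (* least one behaviour in which a writer reaches its critical section. *)
    DebugNoWriterEverLocks == [](\A writer \in WRITERS: pc[writer] # "write_cs")

specs_debug.cfg (hypothetical file name):

    SPECIFICATION Spec
    CONSTANTS
        NR_READERS = 3
        NR_WRITERS = 3
        READER_TASK = reader
        WRITER_TASK = writer
    PROPERTIES
        DebugNoWriterEverLocks

TLC reporting a violation of DebugNoWriterEverLocks is then precisely the
"*some* paths allow writers to get the lock" evidence discussed above.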