Message ID: 20190719083949.5351-1-nborisov@suse.com (mailing list archive)
Series: Refactor snapshot vs nocow writers locking
Hi Nikolay,

On 19/07/2019 09:39, Nikolay Borisov wrote:
> Hello,
>
> Here is the second version of the DRW lock for btrfs. Main changes from v1:
>
> * Fixed all checkpatch warnings (Andrea Parri)
> * Properly call write_unlock in btrfs_drw_try_write_lock (Filipe Manana)
> * Comment fix.
> * Stress tested it via locktorture. Survived for 8 straight days on a 4 socket
>   48 thread machine.
>
> I have also produced a PlusCal specification which I'd be happy to discuss with
> people since I'm new to formal specification and it seems it doesn't have the
> right fidelity:
>

I haven't played with PlusCal in a while but I figured I'd have a go at it
anyway. I've also Cc'd Catalin who's my local TLA+/PCal guru.

FYI PlusCal also supports a C-like syntax, which means you can use glorious
brackets instead of begin/end (unless you prefer those... I won't judge).

I appended my "clean-up" of your spec - mainly changed to C-style syntax, added
a few more constant definitions, and split the locker process into separate
reader and writer processes. IMO it makes it more readable.

> ---- MODULE specs ----
> EXTENDS Integers, Sequences, TLC
>
> CONSTANT NumLockers
>
> ASSUME NumLockers > 0
>
> (*--algorithm DRW
>
> variables
>     lock_state = "idle",
>     states = {"idle", "write_locked", "read_locked", "read_waiting", "write_waiting"},
>     threads = [thread \in 1..NumLockers |-> "idle" ] \* everyone is in idle state at first, this generates a tuple
>
> define
>     INV_SingleLockerType == \/ lock_state = "write_locked" /\ ~\E thread \in 1..Len(threads): threads[thread] = "read_locked"
>                             \/ lock_state = "read_locked" /\ ~\E thread \in 1..Len(threads): threads[thread] = "write_locked"
>                             \/ lock_state = "idle" /\ \A thread \in 1..Len(threads): threads[thread] = "idle"

I've tried to think about liveness properties we would want to check against
this spec, but the usual ones don't really work there: AFAICT there is no
fairness there, readers can completely block writers (and the opposite is true
as well).

TLC checks for deadlocks by default (IIRC that should translate to always
having at least one non-stuttering step enabled in the next-state formula), so
maybe that is all we need?

> end define;
>
> macro ReadLock(tid) begin
>     if lock_state = "idle" \/ lock_state = "read_locked" then
>         lock_state := "read_locked";
>         threads[tid] := "read_locked";
>     else
>         assert lock_state = "write_locked";
>         threads[tid] := "write_waiting"; \* waiting for writers to finish
>         await lock_state = "" \/ lock_state = "read_locked";
                            ^^
That's not a valid lock state, was that meant to be "idle"?

BTW your spec doesn't have a type check invariant (something to make sure
whatever is in your variables is sane). It seems to be common practice, and
it's quite handy to spot stupid mistakes. For this spec it would look something
like this:

    LOCK_STATES == {"idle", "write_locked", "read_locked"}
    THREAD_STATES == LOCK_STATES \union {"read_waiting", "write_waiting"}

    TypeCheck ==
        /\ lock_state \in LOCK_STATES
        /\ \A thread \in THREADS: threads[thread] \in THREAD_STATES

>     end if;
>
> end macro;
>
> macro WriteLock(tid) begin
>     if lock_state = "idle" \/ lock_state = "write_locked" then
>         lock_state := "write_locked";
>         threads[tid] := "write_locked";
>     else
>         assert lock_state = "read_locked";
>         threads[tid] := "reader_waiting"; \* waiting for readers to finish
>         await lock_state = "idle" \/ lock_state = "write_locked";

Aren't we missing an extra action here (same goes for the read lock)?
threads[tid] should be set to "write_locked", and lock_state should be updated
if it was previously "idle". Now the nasty thing is that we'd be setting
threads[tid] to two different values in the same atomic block, so PlusCal will
complain and we'll have to add some labels (which means changing the macro into
a procedure). Maybe something like this?

procedure WriteLock()
{
prepare:
    \* waiting for readers to finish
    threads[self] := "read_waiting";
lock:
    await lock_state = "idle" \/ lock_state = "write_locked";
    lock_state := "write_locked";
    threads[self] := "write_locked";
    return;
};

+ something similar for ReadLock().

Alternatively the "prepare" step could be some counter increment, to more
closely mimic the actual implementation, but I don't think it adds much value
to the model.

---------------------------------------------------------------------------
specs.tla:

---- MODULE specs ----
EXTENDS Integers, Sequences, TLC

CONSTANTS
    NR_WRITERS,
    NR_READERS,
    WRITER_TASK,
    READER_TASK

WRITERS == {WRITER_TASK} \X (1..NR_WRITERS)
READERS == {READER_TASK} \X (1..NR_READERS)
THREADS == WRITERS \union READERS

(*--algorithm DRW {

variables
    lock_state = "idle",
    \* everyone is in idle state at first, this generates a tuple
    threads = [thread \in THREADS |-> "idle" ]

define {
    LOCK_STATES == {"idle", "write_locked", "read_locked"}
    THREAD_STATES == LOCK_STATES \union {"read_waiting", "write_waiting"}

    (* Safety invariants *)
    TypeCheck ==
        /\ lock_state \in LOCK_STATES
        /\ \A thread \in THREADS: threads[thread] \in THREAD_STATES

    NoReadWhenWrite ==
        lock_state = "write_locked" =>
        \A thread \in THREADS: threads[thread] # "read_locked"

    NoWriteWhenRead ==
        lock_state = "read_locked" =>
        \A thread \in THREADS: threads[thread] # "write_locked"

    AllIdleWhenIdle ==
        lock_state = "idle" =>
        \A thread \in THREADS: threads[thread] = "idle"

    (* Ensure critical section exclusiveness *)
    Exclusion ==
        /\ \E writer \in WRITERS: pc[writer] = "write_cs" =>
           \A reader \in READERS: pc[reader] # "read_cs"
        /\ \E reader \in READERS: pc[reader] = "read_cs" =>
           \A writer \in WRITERS: pc[writer] # "write_cs"
}

macro ReadLock(tid)
{
    if (lock_state = "idle" \/ lock_state = "read_locked") {
        lock_state := "read_locked";
        threads[tid] := "read_locked";
    } else {
        assert lock_state = "write_locked";
        \* waiting for writers to finish
        threads[tid] := "write_waiting";
        await lock_state = "" \/ lock_state = "read_locked";
    };
}

macro WriteLock(tid)
{
    if (lock_state = "idle" \/ lock_state = "write_locked") {
        lock_state := "write_locked";
        threads[tid] := "write_locked";
    } else {
        assert lock_state = "read_locked";
        \* waiting for readers to finish
        threads[tid] := "read_waiting";
        await lock_state = "idle" \/ lock_state = "write_locked";
    };
}

macro ReadUnlock(tid) {
    if (threads[tid] = "read_locked") {
        threads[tid] := "idle";
        if (\A thread \in THREADS: threads[thread] # "read_locked") {
            \* we were the last read holder, everyone else should be waiting, unlock the lock
            lock_state := "idle";
        };
    };
}

macro WriteUnlock(tid) {
    if (threads[tid] = "write_locked") {
        threads[tid] := "idle";
        if (\A thread \in THREADS: threads[thread] # "write_locked") {
            \* we were the last write holder, everyone else should be waiting, unlock the lock
            lock_state := "idle";
        };
    };
}

fair process(writer \in WRITERS)
{
loop:
    while (TRUE) {
        WriteLock(self);
write_cs:
        skip;
unlock:
        WriteUnlock(self);
    };
}

fair process(reader \in READERS)
{
loop:
    while (TRUE) {
        ReadLock(self);
read_cs:
        skip;
unlock:
        ReadUnlock(self);
    };
}

}*)
====

specs.cfg:

SPECIFICATION Spec
\* Add statements after this line.
CONSTANTS
    NR_READERS = 3
    NR_WRITERS = 3
    READER_TASK = reader
    WRITER_TASK = writer

INVARIANTS
    TypeCheck
    NoReadWhenWrite
    NoWriteWhenRead
    AllIdleWhenIdle
    Exclusion
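Following up on the "+ something similar for ReadLock()" note above, a minimal
sketch of what such a companion procedure could look like (not part of the
original mail, it simply mirrors the labelled WriteLock() suggestion with the
read/write roles swapped):

procedure ReadLock()
{
prepare:
    \* waiting for writers to finish
    threads[self] := "write_waiting";
lock:
    await lock_state = "idle" \/ lock_state = "read_locked";
    lock_state := "read_locked";
    threads[self] := "read_locked";
    return;
};

The two labels are what make the "announce intent" and "acquire" steps
non-atomic, which is the whole point of turning the macro into a procedure.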
Some nitpicking below:

On Mon, Jul 29, 2019 at 03:13:42PM +0100, Valentin Schneider wrote:
> specs.tla:
>
> ---- MODULE specs ----
> EXTENDS Integers, Sequences, TLC
>
> CONSTANTS
>     NR_WRITERS,
>     NR_READERS,
>     WRITER_TASK,
>     READER_TASK
>
> WRITERS == {WRITER_TASK} \X (1..NR_WRITERS)
> READERS == {READER_TASK} \X (1..NR_READERS)
> THREADS == WRITERS \union READERS

Recommendation: use symbolic values for WRITERS and READERS (defined in .cfg:
e.g. r1, r2, r3, w1, w2, w3). It allows you to do symmetry optimisations. We've
also hit a TLC bug in the past with process values made up of a Cartesian
product (though it may have been fixed since).

> macro ReadLock(tid)
> {
>     if (lock_state = "idle" \/ lock_state = "read_locked") {
>         lock_state := "read_locked";
>         threads[tid] := "read_locked";
>     } else {
>         assert lock_state = "write_locked";
>         \* waiting for writers to finish
>         threads[tid] := "write_waiting";
>         await lock_state = "" \/ lock_state = "read_locked";

lock_state = "idle"?

> macro WriteLock(tid)
> {
>     if (lock_state = "idle" \/ lock_state = "write_locked") {
>         lock_state := "write_locked";
>         threads[tid] := "write_locked";
>     } else {
>         assert lock_state = "read_locked";
>         \* waiting for readers to finish
>         threads[tid] := "read_waiting";
>         await lock_state = "idle" \/ lock_state = "write_locked";
>     };
> }

I'd say that's one of the pitfalls of PlusCal. The above is executed
atomically, so you'd have the lock_state read and updated in the same action.
Looking at the C patches, there is an atomic_read(&lock->readers) followed by a
percpu_counter_inc(&lock->writers). Between these two, you can have "readers"
becoming non-zero via a different CPU.

My suggestion would be to use procedures with labels to express the
non-atomicity of such sequences.

> macro ReadUnlock(tid) {
>     if (threads[tid] = "read_locked") {
>         threads[tid] := "idle";
>         if (\A thread \in THREADS: threads[thread] # "read_locked") {
>             \* we were the last read holder, everyone else should be waiting, unlock the lock
>             lock_state := "idle";
>         };
>     };
> }

I'd make this close to the proposed C code with atomic counters. You'd not be
able to check each thread atomically in practice anyway.
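For illustration, one possible shape of the symbolic-value setup described
here (a sketch, not taken from the thread; the Perms name, the concrete model
values and the set sizes are assumptions - Permutations() comes from the
standard TLC module):

specs.tla (sketch):

    CONSTANTS READERS, WRITERS

    THREADS == READERS \union WRITERS

    \* Symmetry set: states that differ only by a permutation of reader
    \* (or writer) identities are treated as equivalent by TLC.
    Perms == Permutations(READERS) \union Permutations(WRITERS)

specs.cfg (sketch):

    CONSTANTS
        READERS = {r1, r2, r3}
        WRITERS = {w1, w2, w3}

    SYMMETRY Perms

With symmetry enabled TLC only explores one representative per permutation
class, which usually shrinks the state space considerably; note though that
TLC's liveness checking does not mix well with symmetry sets, so this mainly
helps invariant-only runs.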
On 29/07/2019 16:33, Catalin Marinas wrote:
[...]
>> ---- MODULE specs ----
>> EXTENDS Integers, Sequences, TLC
>>
>> CONSTANTS
>>     NR_WRITERS,
>>     NR_READERS,
>>     WRITER_TASK,
>>     READER_TASK
>>
>> WRITERS == {WRITER_TASK} \X (1..NR_WRITERS)
>> READERS == {READER_TASK} \X (1..NR_READERS)
>> THREADS == WRITERS \union READERS
>
> Recommendation: use symbolic values for WRITERS and READERS (defined in
> .cfg: e.g. r1, r2, r3, w1, w2, w3). It allows you to do symmetry
> optimisations. We've also hit a TLC bug in the past with process values
> made up of a Cartesian product (though it may have been fixed since).
>

Right, I had forgotten that one:
https://github.com/tlaplus/tlaplus/issues/164

Being very lazy I dislike having to manually input those, but as you say it
can't be avoided if we want to use symmetry.

>> macro ReadLock(tid)
>> {
>>     if (lock_state = "idle" \/ lock_state = "read_locked") {
>>         lock_state := "read_locked";
>>         threads[tid] := "read_locked";
>>     } else {
>>         assert lock_state = "write_locked";
>>         \* waiting for writers to finish
>>         threads[tid] := "write_waiting";
>>         await lock_state = "" \/ lock_state = "read_locked";
>
> lock_state = "idle"?
>

Aye, I didn't modify those macros from the original spec.

>> macro WriteLock(tid)
>> {
>>     if (lock_state = "idle" \/ lock_state = "write_locked") {
>>         lock_state := "write_locked";
>>         threads[tid] := "write_locked";
>>     } else {
>>         assert lock_state = "read_locked";
>>         \* waiting for readers to finish
>>         threads[tid] := "read_waiting";
>>         await lock_state = "idle" \/ lock_state = "write_locked";
>>     };
>> }
>
> I'd say that's one of the pitfalls of PlusCal. The above is executed
> atomically, so you'd have the lock_state read and updated in the same
> action. Looking at the C patches, there is an
> atomic_read(&lock->readers) followed by a
> percpu_counter_inc(&lock->writers). Between these two, you can have
> "readers" becoming non-zero via a different CPU.
>
> My suggestion would be to use procedures with labels to express the
> non-atomicity of such sequences.
>

Agreed, I've suggested something like this in my reply.

[...]
On 29/07/2019 17:32, Valentin Schneider wrote:
> On 29/07/2019 16:33, Catalin Marinas wrote:
[...]
>> I'd say that's one of the pitfalls of PlusCal. The above is executed
>> atomically, so you'd have the lock_state read and updated in the same
>> action. Looking at the C patches, there is an
>> atomic_read(&lock->readers) followed by a
>> percpu_counter_inc(&lock->writers). Between these two, you can have
>> "readers" becoming non-zero via a different CPU.
>>
>> My suggestion would be to use procedures with labels to express the
>> non-atomicity of such sequences.
>>
>

FYI, with a very simple and stupid modification of the spec:

----->8-----
macro ReadUnlock()
{
    reader_count := reader_count - 1;
    \* Condition variable signal is "implicit" here
}

macro WriteUnlock()
{
    writer_count := writer_count - 1;
    \* Ditto on the cond var
}

procedure ReadLock()
{
add:
    reader_count := reader_count + 1;
lock:
    await writer_count = 0;
    return;
}

procedure WriteLock()
{
add:
    writer_count := writer_count + 1;
lock:
    await reader_count = 0;
    return;
};
-----8<-----

it's quite easy to trigger the case Paul pointed out in [1]:

----->8-----
Error: Deadlock reached.
Error: The behavior up to this point is:
State 1: <Initial predicate>
/\ stack = (<<reader, 1>> :> <<>> @@ <<writer, 1>> :> <<>>)
/\ pc = (<<reader, 1>> :> "loop" @@ <<writer, 1>> :> "loop_")
/\ writer_count = 0
/\ reader_count = 0
/\ lock_state = "idle"

State 2: <loop_ line 159, col 16 to line 164, col 72 of module specs>
/\ stack = ( <<reader, 1>> :> <<>> @@
             <<writer, 1>> :> <<[pc |-> "write_cs", procedure |-> "WriteLock"]>> )
/\ pc = (<<reader, 1>> :> "loop" @@ <<writer, 1>> :> "add")
/\ writer_count = 0
/\ reader_count = 0
/\ lock_state = "idle"

State 3: <add line 146, col 14 to line 149, col 63 of module specs>
/\ stack = ( <<reader, 1>> :> <<>> @@
             <<writer, 1>> :> <<[pc |-> "write_cs", procedure |-> "WriteLock"]>> )
/\ pc = (<<reader, 1>> :> "loop" @@ <<writer, 1>> :> "lock")
/\ writer_count = 1
/\ reader_count = 0
/\ lock_state = "idle"

State 4: <loop line 179, col 15 to line 184, col 71 of module specs>
/\ stack = ( <<reader, 1>> :> <<[pc |-> "read_cs", procedure |-> "ReadLock"]>> @@
             <<writer, 1>> :> <<[pc |-> "write_cs", procedure |-> "WriteLock"]>> )
/\ pc = (<<reader, 1>> :> "add_" @@ <<writer, 1>> :> "lock")
/\ writer_count = 1
/\ reader_count = 0
/\ lock_state = "idle"

State 5: <add_ line 133, col 15 to line 136, col 64 of module specs>
/\ stack = ( <<reader, 1>> :> <<[pc |-> "read_cs", procedure |-> "ReadLock"]>> @@
             <<writer, 1>> :> <<[pc |-> "write_cs", procedure |-> "WriteLock"]>> )
/\ pc = (<<reader, 1>> :> "lock_" @@ <<writer, 1>> :> "lock")
/\ writer_count = 1
/\ reader_count = 1
/\ lock_state = "idle"
-----8<-----

Which I think is pretty cool considering the effort that was required
(read: not much).

[1]: https://lore.kernel.org/lkml/20190607105251.GB28207@linux.ibm.com/
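For context on the trace: the labels loop/loop_, add/add_, write_cs/read_cs
come from the driving processes, which are presumably the same reader/writer
loops used in the full spec later in the thread, now calling the procedures.
The PlusCal translator renames clashing labels across processes and procedures
by appending underscores, which is why both loop and loop_ show up. A sketch of
the writer side (the reader is symmetric):

fair process(writer \in WRITERS)
{
loop:
    while (TRUE) {
        call WriteLock();   \* labelled procedure: the add/lock steps above
write_cs:
        skip;               \* critical section placeholder
unlock:
        WriteUnlock();
    };
}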
On 30.07.19 г. 14:03 ч., Valentin Schneider wrote:
> On 29/07/2019 17:32, Valentin Schneider wrote:
>> On 29/07/2019 16:33, Catalin Marinas wrote:
> [...]
>>> I'd say that's one of the pitfalls of PlusCal. The above is executed
>>> atomically, so you'd have the lock_state read and updated in the same
>>> action. Looking at the C patches, there is an
>>> atomic_read(&lock->readers) followed by a
>>> percpu_counter_inc(&lock->writers). Between these two, you can have
>>> "readers" becoming non-zero via a different CPU.
>>>
>>> My suggestion would be to use procedures with labels to express the
>>> non-atomicity of such sequences.
>>>
>>
>
> FYI, with a very simple and stupid modification of the spec:
>
> ----->8-----
> macro ReadUnlock()
> {
>     reader_count := reader_count - 1;
>     \* Condition variable signal is "implicit" here
> }
>
> macro WriteUnlock()
> {
>     writer_count := writer_count - 1;
>     \* Ditto on the cond var
> }
>
> procedure ReadLock()
> {
> add:
>     reader_count := reader_count + 1;
> lock:
>     await writer_count = 0;
>     return;
> }
>
> procedure WriteLock()
> {
> add:
>     writer_count := writer_count + 1;
> lock:
>     await reader_count = 0;
>     return;
> };
> -----8<-----
>
> it's quite easy to trigger the case Paul pointed out in [1]:

Yes, however, there was a bug in the original posting, in that
btrfs_drw_try_write_lock should have called btrfs_drw_write_unlock instead of
btrfs_drw_read_unlock if it sees that readers have incremented while it has
already incremented its percpu counter.

Additionally, the implementation doesn't wait with the respective counter
incremented. Is there a way to express something along the lines of:

> procedure WriteLock()
> {
> add:
>     writer_count := writer_count + 1;
> lock:
>     await reader_count = 0;

If we are about to wait then also decrement writer_count? I guess the correct
way to specify it would be:

procedure WriteLock()
{
    writer_count := writer_count + 1;
    await reader_count = 0;
    return;
};

Because the implementation (by using barriers and percpu counters) ensures all
of this happens as one atomic step? E.g. before going to sleep we decrement the
writer counter.

>     return;
> };
>
> ----->8-----
> Error: Deadlock reached.
> Error: The behavior up to this point is:
> State 1: <Initial predicate>
> /\ stack = (<<reader, 1>> :> <<>> @@ <<writer, 1>> :> <<>>)
> /\ pc = (<<reader, 1>> :> "loop" @@ <<writer, 1>> :> "loop_")
> /\ writer_count = 0
> /\ reader_count = 0
> /\ lock_state = "idle"
>
> State 2: <loop_ line 159, col 16 to line 164, col 72 of module specs>
> /\ stack = ( <<reader, 1>> :> <<>> @@
>              <<writer, 1>> :> <<[pc |-> "write_cs", procedure |-> "WriteLock"]>> )
> /\ pc = (<<reader, 1>> :> "loop" @@ <<writer, 1>> :> "add")
> /\ writer_count = 0
> /\ reader_count = 0
> /\ lock_state = "idle"
>
> State 3: <add line 146, col 14 to line 149, col 63 of module specs>
> /\ stack = ( <<reader, 1>> :> <<>> @@
>              <<writer, 1>> :> <<[pc |-> "write_cs", procedure |-> "WriteLock"]>> )
> /\ pc = (<<reader, 1>> :> "loop" @@ <<writer, 1>> :> "lock")
> /\ writer_count = 1
> /\ reader_count = 0
> /\ lock_state = "idle"
>
> State 4: <loop line 179, col 15 to line 184, col 71 of module specs>
> /\ stack = ( <<reader, 1>> :> <<[pc |-> "read_cs", procedure |-> "ReadLock"]>> @@
>              <<writer, 1>> :> <<[pc |-> "write_cs", procedure |-> "WriteLock"]>> )
> /\ pc = (<<reader, 1>> :> "add_" @@ <<writer, 1>> :> "lock")
> /\ writer_count = 1
> /\ reader_count = 0
> /\ lock_state = "idle"
>
> State 5: <add_ line 133, col 15 to line 136, col 64 of module specs>
> /\ stack = ( <<reader, 1>> :> <<[pc |-> "read_cs", procedure |-> "ReadLock"]>> @@
>              <<writer, 1>> :> <<[pc |-> "write_cs", procedure |-> "WriteLock"]>> )
> /\ pc = (<<reader, 1>> :> "lock_" @@ <<writer, 1>> :> "lock")
> /\ writer_count = 1
> /\ reader_count = 1
> /\ lock_state = "idle"
> -----8<-----
>
> Which I think is pretty cool considering the effort that was required
> (read: not much).
>
> [1]: https://lore.kernel.org/lkml/20190607105251.GB28207@linux.ibm.com/
>
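One way to express the "drop the count before waiting" behaviour asked about
here is an extra labelled step that decrements writer_count once readers are
observed - which is essentially what the refined WriteLock() in the next
message does. Reproduced below as a sketch with explanatory comments added
(the code itself is from the later spec in this thread):

procedure WriteLock()
variables rc;
{
loop:
    while (TRUE) {
        writer_count := writer_count + 1;
get_readers:
        \* separate label: the reader-count read is a distinct step from the
        \* increment above, mimicking atomic_read() after percpu_counter_inc()
        rc := reader_count;
check_readers:
        if (rc > 0) {
            \* readers are in: drop our count before sleeping, mirroring the
            \* implementation calling btrfs_drw_write_unlock before waiting
            writer_count := writer_count - 1;
wait:
            await reader_count = 0;
        } else {
            goto locked;
        };
    };
locked:
    lock_state := "write_locked";
    return;
};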
And with a more accurate spec (appended at the end of this email):

  Model checking completed. No error has been found.
    Estimates of the probability that TLC did not check all reachable states
    because two distinct states had the same fingerprint:
    calculated (optimistic):  val = 8.6E-8
    based on the actual fingerprints:  val = 9.8E-9
  3100306 states generated, 651251 distinct states found, 0 states left on queue.

IOW, it seems fine with regard to the defined properties + the deadlock check.

The EventuallyRead liveness property shows that a waiting reader will always
eventually get the lock (no reader starvation), of course assuming no lock-user
blocks in its critical section (i.e. no stuttering steps).

It doesn't hold for writers - they can be starved by a never-ending stream of
readers. IOW

  \* Eventually one writer gets the lock
  <> \E writer \in WRITERS: pc[writer] = "write_cs"

can be disproven by some behaviours. However,

  \* No writer ever gets the lock
  [] \A writer \in WRITERS: pc[writer] # "write_cs"

can be disproven as well - there are *some* paths that allow writers to get the
lock. I haven't found a neater way to check that other than by having a "debug"
property that I don't include in the full-fledged check.

Note that the entire content of a label is considered atomic by TLC. From the
point of view of that spec:

Read lock:
- reader count increment is atomic
- writer count read is atomic

Read unlock:
- reader count decrement is atomic
  (The model's writer counter read is in the same atomic block as the reader
  count decrement, but it's only used for updating a model-internal variable,
  so I'm not including it here)

Write lock:
- writer count increment is atomic
- reader count read is atomic
- writer count decrement is atomic

Write unlock:
- writer count decrement is atomic
  (ditto on the reader counter read)

This doesn't help for the placement of memory barriers, but from an algorithm
point of view that seems to be holding up.

Here's the actual spec (if I keep this up I'll get a git tree going...)
---------------------------------------------------------------------------
specs.tla:

---- MODULE specs ----
EXTENDS Integers, Sequences, FiniteSets, TLC

CONSTANTS
    NR_WRITERS,
    NR_READERS,
    WRITER_TASK,
    READER_TASK

WRITERS == {WRITER_TASK} \X (1..NR_WRITERS)
READERS == {READER_TASK} \X (1..NR_READERS)
THREADS == WRITERS \union READERS

(*--algorithm DRW {

variables
    lock_state = "idle",
    reader_count = 0,
    writer_count = 0

define {
    LOCK_STATES == {"idle", "write_locked", "read_locked"}

    (* Safety invariants *)
    TypeCheck ==
        /\ lock_state \in LOCK_STATES
        /\ reader_count \in (0..NR_READERS)
        /\ writer_count \in (0..NR_WRITERS)

    (* Ensure critical section exclusiveness *)
    Exclusion ==
        /\ \E writer \in WRITERS: pc[writer] = "write_cs" =>
           \A reader \in READERS: pc[reader] # "read_cs"
        /\ \E reader \in READERS: pc[reader] = "read_cs" =>
           \A writer \in WRITERS: pc[writer] # "write_cs"

    (* Ensure the lock state is sane *)
    LockState ==
        /\ lock_state = "idle" =>
           (* Not much can be said about the counts - the whole range is valid *)
           /\ \A writer \in WRITERS: pc[writer] # "write_cs"
           /\ \A reader \in READERS: pc[reader] # "read_cs"
        /\ lock_state = "read_locked" =>
           (* Some readers can still be in the locking procedure, just make sure *)
           (* all readers in their critical section are accounted for *)
           reader_count >= Cardinality({r \in READERS: pc[r] = "read_cs"})
        /\ lock_state = "write_locked" =>
           (* Ditto for writers *)
           writer_count >= Cardinality({w \in WRITERS: pc[w] = "write_cs"})

    (* Liveness properties *)
    (* A waiting reader *always* eventually gets the lock *)
    EventuallyRead ==
        reader_count > 0 /\ lock_state # "read_locked" ~> lock_state = "read_locked"

    (* A waiting writer *can* eventually get the lock *)
    EventuallyWrite ==
        (* TODO: find a way to express that this can be false in some behaviours *)
        (* but has to be true in > 0 behaviours *)
        TRUE
}

macro ReadUnlock()
{
    reader_count := reader_count - 1;
    \* Condition variable signal is "implicit" here
    if (reader_count = 0) {
        lock_state := "idle";
    };
}

macro WriteUnlock()
{
    writer_count := writer_count - 1;
    \* Ditto on the cond var
    if (writer_count = 0) {
        lock_state := "idle";
    };
}

procedure ReadLock()
{
add:
    reader_count := reader_count + 1;
lock:
    await writer_count = 0;
    lock_state := "read_locked";
    return;
}

procedure WriteLock()
variables rc;
{
loop:
    while (TRUE) {
        writer_count := writer_count + 1;
get_readers:
        rc := reader_count;
check_readers:
        if (rc > 0) {
            writer_count := writer_count - 1;
wait:
            await reader_count = 0;
        } else {
            goto locked;
        };
    };
locked:
    lock_state := "write_locked";
    return;
};

fair process(writer \in WRITERS)
{
loop:
    while (TRUE) {
        call WriteLock();
write_cs:
        skip;
unlock:
        WriteUnlock();
    };
}

fair process(reader \in READERS)
{
loop:
    while (TRUE) {
        call ReadLock();
read_cs:
        skip;
unlock:
        ReadUnlock();
    };
}

}*)
====

specs.cfg:

SPECIFICATION Spec
\* Add statements after this line.
CONSTANTS
    NR_READERS = 3
    NR_WRITERS = 3
    READER_TASK = reader
    WRITER_TASK = writer

INVARIANTS
    TypeCheck
    LockState
    Exclusion

PROPERTIES
    EventuallyRead
    EventuallyWrite
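As for the "debug" property mentioned in the previous message, one possible
arrangement (an assumption, not something from the thread) is to define the
expected-to-fail property in the module, after the PlusCal translation so it
can reference pc, and only list it in a separate config; the regular run stays
clean, while the debug run's counter-example confirms that some behaviours do
let a writer into its critical section:

specs.tla (sketch, placed below the translation):

    (* Expected to be violated: a counter-example trace demonstrates at *)
    (* least one behaviour in which a writer reaches its critical section. *)
    DebugNoWriterEverLocks == [](\A writer \in WRITERS: pc[writer] # "write_cs")

specs_debug.cfg (hypothetical file name):

    SPECIFICATION Spec
    CONSTANTS
        NR_READERS = 3
        NR_WRITERS = 3
        READER_TASK = reader
        WRITER_TASK = writer
    PROPERTIES
        DebugNoWriterEverLocks

TLC reporting a violation of DebugNoWriterEverLocks is then precisely the
"*some* paths allow writers to get the lock" evidence discussed above.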