diff mbox

Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount

Message ID CAN-5tyF_cHmiPvEqv8N8S_J-TBymOZV+-A0Vh+GCnP3czFB-wg@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Olga Kornievskaia Sept. 14, 2015, 11:54 p.m. UTC
A test case is as the description says:
open(foobar, O_WRONLY);
sleep()  --> reboot the server
close(foobar)

The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few
line before going to restart, there is
clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).

NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open
owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the
value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
out state and when we go to close it, “call_close” doesn’t get set as
state flag is not set and CLOSE doesn’t go on the wire.

That line was introduced to fix an infinite loop for OPEN recovery
upon receiving a BAD_STATEID error: commit e8d975e73. I have tested
injecting BAD_STATEID error using the patch below and the code
recovers without problems. However, I'm not sure the clearing of the
bit is needed any more. I have tested for infinite loop by reverting
the patch and didn't hit the infinite loop.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Trond Myklebust Sept. 15, 2015, 1:39 p.m. UTC | #1
On Mon, Sep 14, 2015 at 7:54 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
> A test case is as the description says:
> open(foobar, O_WRONLY);
> sleep()  --> reboot the server
> close(foobar)
>
> The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few
> line before going to restart, there is
> clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).
>
> NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open
> owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the
> value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
> out state and when we go to close it, “call_close” doesn’t get set as
> state flag is not set and CLOSE doesn’t go on the wire.
>
> That line was introduced to fix an infinite loop for OPEN recovery
> upon receiving a BAD_STATEID error: commit e8d975e73. I have tested
> injecting BAD_STATEID error using the patch below and the code
> recovers without problems. However, I'm not sure the clearing of the
> bit is needed any more. I have tested for infinite loop by reverting
> the patch and didn't hit the infinite loop.
>
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index da73bc4..5db3246 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -1481,7 +1481,7 @@ restart:
>                                         spin_unlock(&state->state_lock);
>                                 }
>                                 nfs4_put_open_state(state);
> -                               clear_bit(NFS4CLNT_RECLAIM_NOGRACE,
> +                               clear_bit(NFS_STATE_RECLAIM_NOGRACE,
>                                         &state->flags);
>                                 spin_lock(&sp->so_lock);
>                                 goto restart;

That's an obvious typo. Thanks for spotting it!

As for whether or not the bit clear is needed at all, I think it is
for NFSv4 on older kernels. On newer kernels, we do have the NFSv4
state recovery drain the slot table (just like we've always done for
NFSv4.1) and so I agree that those kernels probably won't be
afflicted.

Cheers
  Trond
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Olga Kornievskaia Sept. 15, 2015, 2:27 p.m. UTC | #2
On Tue, Sep 15, 2015 at 9:39 AM, Trond Myklebust
<trond.myklebust@primarydata.com> wrote:
> On Mon, Sep 14, 2015 at 7:54 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
>> A test case is as the description says:
>> open(foobar, O_WRONLY);
>> sleep()  --> reboot the server
>> close(foobar)
>>
>> The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few
>> line before going to restart, there is
>> clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).
>>
>> NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open
>> owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the
>> value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
>> out state and when we go to close it, “call_close” doesn’t get set as
>> state flag is not set and CLOSE doesn’t go on the wire.
>>
>> That line was introduced to fix an infinite loop for OPEN recovery
>> upon receiving a BAD_STATEID error: commit e8d975e73. I have tested
>> injecting BAD_STATEID error using the patch below and the code
>> recovers without problems. However, I'm not sure the clearing of the
>> bit is needed any more. I have tested for infinite loop by reverting
>> the patch and didn't hit the infinite loop.
>>
>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>> index da73bc4..5db3246 100644
>> --- a/fs/nfs/nfs4state.c
>> +++ b/fs/nfs/nfs4state.c
>> @@ -1481,7 +1481,7 @@ restart:
>>                                         spin_unlock(&state->state_lock);
>>                                 }
>>                                 nfs4_put_open_state(state);
>> -                               clear_bit(NFS4CLNT_RECLAIM_NOGRACE,
>> +                               clear_bit(NFS_STATE_RECLAIM_NOGRACE,
>>                                         &state->flags);
>>                                 spin_lock(&sp->so_lock);
>>                                 goto restart;
>
> That's an obvious typo. Thanks for spotting it!
>
> As for whether or not the bit clear is needed at all, I think it is
> for NFSv4 on older kernels. On newer kernels, we do have the NFSv4
> state recovery drain the slot table (just like we've always done for
> NFSv4.1) and so I agree that those kernels probably won't be
> afflicted.
>

Thanks Trond. Do you need me to resubmit it without the last paragraph
or is the patch ok as is?

> Cheers
>   Trond
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Trond Myklebust Sept. 15, 2015, 3:49 p.m. UTC | #3
On Tue, Sep 15, 2015 at 10:27 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
> On Tue, Sep 15, 2015 at 9:39 AM, Trond Myklebust
> <trond.myklebust@primarydata.com> wrote:
>> On Mon, Sep 14, 2015 at 7:54 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>> A test case is as the description says:
>>> open(foobar, O_WRONLY);
>>> sleep()  --> reboot the server
>>> close(foobar)
>>>
>>> The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few
>>> line before going to restart, there is
>>> clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).
>>>
>>> NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open
>>> owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the
>>> value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
>>> out state and when we go to close it, “call_close” doesn’t get set as
>>> state flag is not set and CLOSE doesn’t go on the wire.
>>>
>>> That line was introduced to fix an infinite loop for OPEN recovery
>>> upon receiving a BAD_STATEID error: commit e8d975e73. I have tested
>>> injecting BAD_STATEID error using the patch below and the code
>>> recovers without problems. However, I'm not sure the clearing of the
>>> bit is needed any more. I have tested for infinite loop by reverting
>>> the patch and didn't hit the infinite loop.
>>>
>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>>> index da73bc4..5db3246 100644
>>> --- a/fs/nfs/nfs4state.c
>>> +++ b/fs/nfs/nfs4state.c
>>> @@ -1481,7 +1481,7 @@ restart:
>>>                                         spin_unlock(&state->state_lock);
>>>                                 }
>>>                                 nfs4_put_open_state(state);
>>> -                               clear_bit(NFS4CLNT_RECLAIM_NOGRACE,
>>> +                               clear_bit(NFS_STATE_RECLAIM_NOGRACE,
>>>                                         &state->flags);
>>>                                 spin_lock(&sp->so_lock);
>>>                                 goto restart;
>>
>> That's an obvious typo. Thanks for spotting it!
>>
>> As for whether or not the bit clear is needed at all, I think it is
>> for NFSv4 on older kernels. On newer kernels, we do have the NFSv4
>> state recovery drain the slot table (just like we've always done for
>> NFSv4.1) and so I agree that those kernels probably won't be
>> afflicted.
>>
>
> Thanks Trond. Do you need me to resubmit it without the last paragraph
> or is the patch ok as is?
>

I can easily remove that paragraph when applying the patch, if you
agree that it is superfluous.

Cheers
  Trond
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Olga Kornievskaia Sept. 15, 2015, 4:52 p.m. UTC | #4
On Tue, Sep 15, 2015 at 11:49 AM, Trond Myklebust
<trond.myklebust@primarydata.com> wrote:
> On Tue, Sep 15, 2015 at 10:27 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
>> On Tue, Sep 15, 2015 at 9:39 AM, Trond Myklebust
>> <trond.myklebust@primarydata.com> wrote:
>>> On Mon, Sep 14, 2015 at 7:54 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>>> A test case is as the description says:
>>>> open(foobar, O_WRONLY);
>>>> sleep()  --> reboot the server
>>>> close(foobar)
>>>>
>>>> The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few
>>>> line before going to restart, there is
>>>> clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).
>>>>
>>>> NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open
>>>> owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the
>>>> value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
>>>> out state and when we go to close it, “call_close” doesn’t get set as
>>>> state flag is not set and CLOSE doesn’t go on the wire.
>>>>
>>>> That line was introduced to fix an infinite loop for OPEN recovery
>>>> upon receiving a BAD_STATEID error: commit e8d975e73. I have tested
>>>> injecting BAD_STATEID error using the patch below and the code
>>>> recovers without problems. However, I'm not sure the clearing of the
>>>> bit is needed any more. I have tested for infinite loop by reverting
>>>> the patch and didn't hit the infinite loop.
>>>>
>>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>>>> index da73bc4..5db3246 100644
>>>> --- a/fs/nfs/nfs4state.c
>>>> +++ b/fs/nfs/nfs4state.c
>>>> @@ -1481,7 +1481,7 @@ restart:
>>>>                                         spin_unlock(&state->state_lock);
>>>>                                 }
>>>>                                 nfs4_put_open_state(state);
>>>> -                               clear_bit(NFS4CLNT_RECLAIM_NOGRACE,
>>>> +                               clear_bit(NFS_STATE_RECLAIM_NOGRACE,
>>>>                                         &state->flags);
>>>>                                 spin_lock(&sp->so_lock);
>>>>                                 goto restart;
>>>
>>> That's an obvious typo. Thanks for spotting it!
>>>
>>> As for whether or not the bit clear is needed at all, I think it is
>>> for NFSv4 on older kernels. On newer kernels, we do have the NFSv4
>>> state recovery drain the slot table (just like we've always done for
>>> NFSv4.1) and so I agree that those kernels probably won't be
>>> afflicted.
>>>
>>
>> Thanks Trond. Do you need me to resubmit it without the last paragraph
>> or is the patch ok as is?
>>
>
> I can easily remove that paragraph when applying the patch, if you
> agree that it is superfluous.

Thanks. Works for me.

>
> Cheers
>   Trond
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Trond Myklebust Sept. 17, 2015, 1:34 p.m. UTC | #5
On Tue, Sep 15, 2015 at 12:52 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
> On Tue, Sep 15, 2015 at 11:49 AM, Trond Myklebust
> <trond.myklebust@primarydata.com> wrote:
>> On Tue, Sep 15, 2015 at 10:27 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>> On Tue, Sep 15, 2015 at 9:39 AM, Trond Myklebust
>>> <trond.myklebust@primarydata.com> wrote:
>>>> On Mon, Sep 14, 2015 at 7:54 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>>>> A test case is as the description says:
>>>>> open(foobar, O_WRONLY);
>>>>> sleep()  --> reboot the server
>>>>> close(foobar)
>>>>>
>>>>> The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few
>>>>> line before going to restart, there is
>>>>> clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).
>>>>>
>>>>> NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open
>>>>> owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the
>>>>> value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
>>>>> out state and when we go to close it, “call_close” doesn’t get set as
>>>>> state flag is not set and CLOSE doesn’t go on the wire.
>>>>>
>>>>> That line was introduced to fix an infinite loop for OPEN recovery
>>>>> upon receiving a BAD_STATEID error: commit e8d975e73. I have tested
>>>>> injecting BAD_STATEID error using the patch below and the code
>>>>> recovers without problems. However, I'm not sure the clearing of the
>>>>> bit is needed any more. I have tested for infinite loop by reverting
>>>>> the patch and didn't hit the infinite loop.
>>>>>
>>>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>>>>> index da73bc4..5db3246 100644
>>>>> --- a/fs/nfs/nfs4state.c
>>>>> +++ b/fs/nfs/nfs4state.c
>>>>> @@ -1481,7 +1481,7 @@ restart:
>>>>>                                         spin_unlock(&state->state_lock);
>>>>>                                 }
>>>>>                                 nfs4_put_open_state(state);
>>>>> -                               clear_bit(NFS4CLNT_RECLAIM_NOGRACE,
>>>>> +                               clear_bit(NFS_STATE_RECLAIM_NOGRACE,
>>>>>                                         &state->flags);
>>>>>                                 spin_lock(&sp->so_lock);
>>>>>                                 goto restart;
>>>>
>>>> That's an obvious typo. Thanks for spotting it!
>>>>
>>>> As for whether or not the bit clear is needed at all, I think it is
>>>> for NFSv4 on older kernels. On newer kernels, we do have the NFSv4
>>>> state recovery drain the slot table (just like we've always done for
>>>> NFSv4.1) and so I agree that those kernels probably won't be
>>>> afflicted.
>>>>
>>>
>>> Thanks Trond. Do you need me to resubmit it without the last paragraph
>>> or is the patch ok as is?
>>>
>>
>> I can easily remove that paragraph when applying the patch, if you
>> agree that it is superfluous.
>
> Thanks. Works for me.
>

May I also add a signed-off-by line from you? I can't really apply
this (or any other patches) without it.

Cheers
  Trond
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Olga Kornievskaia Sept. 17, 2015, 1:36 p.m. UTC | #6
On Thu, Sep 17, 2015 at 9:34 AM, Trond Myklebust
<trond.myklebust@primarydata.com> wrote:
> On Tue, Sep 15, 2015 at 12:52 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
>> On Tue, Sep 15, 2015 at 11:49 AM, Trond Myklebust
>> <trond.myklebust@primarydata.com> wrote:
>>> On Tue, Sep 15, 2015 at 10:27 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>>> On Tue, Sep 15, 2015 at 9:39 AM, Trond Myklebust
>>>> <trond.myklebust@primarydata.com> wrote:
>>>>> On Mon, Sep 14, 2015 at 7:54 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>>>>> A test case is as the description says:
>>>>>> open(foobar, O_WRONLY);
>>>>>> sleep()  --> reboot the server
>>>>>> close(foobar)
>>>>>>
>>>>>> The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few
>>>>>> line before going to restart, there is
>>>>>> clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).
>>>>>>
>>>>>> NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open
>>>>>> owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the
>>>>>> value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
>>>>>> out state and when we go to close it, “call_close” doesn’t get set as
>>>>>> state flag is not set and CLOSE doesn’t go on the wire.
>>>>>>
>>>>>> That line was introduced to fix an infinite loop for OPEN recovery
>>>>>> upon receiving a BAD_STATEID error: commit e8d975e73. I have tested
>>>>>> injecting BAD_STATEID error using the patch below and the code
>>>>>> recovers without problems. However, I'm not sure the clearing of the
>>>>>> bit is needed any more. I have tested for infinite loop by reverting
>>>>>> the patch and didn't hit the infinite loop.
>>>>>>
>>>>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>>>>>> index da73bc4..5db3246 100644
>>>>>> --- a/fs/nfs/nfs4state.c
>>>>>> +++ b/fs/nfs/nfs4state.c
>>>>>> @@ -1481,7 +1481,7 @@ restart:
>>>>>>                                         spin_unlock(&state->state_lock);
>>>>>>                                 }
>>>>>>                                 nfs4_put_open_state(state);
>>>>>> -                               clear_bit(NFS4CLNT_RECLAIM_NOGRACE,
>>>>>> +                               clear_bit(NFS_STATE_RECLAIM_NOGRACE,
>>>>>>                                         &state->flags);
>>>>>>                                 spin_lock(&sp->so_lock);
>>>>>>                                 goto restart;
>>>>>
>>>>> That's an obvious typo. Thanks for spotting it!
>>>>>
>>>>> As for whether or not the bit clear is needed at all, I think it is
>>>>> for NFSv4 on older kernels. On newer kernels, we do have the NFSv4
>>>>> state recovery drain the slot table (just like we've always done for
>>>>> NFSv4.1) and so I agree that those kernels probably won't be
>>>>> afflicted.
>>>>>
>>>>
>>>> Thanks Trond. Do you need me to resubmit it without the last paragraph
>>>> or is the patch ok as is?
>>>>
>>>
>>> I can easily remove that paragraph when applying the patch, if you
>>> agree that it is superfluous.
>>
>> Thanks. Works for me.
>>
>
> May I also add a signed-off-by line from you? I can't really apply
> this (or any other patches) without it.

Of course.

>
> Cheers
>   Trond
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index da73bc4..5db3246 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1481,7 +1481,7 @@  restart:
                                        spin_unlock(&state->state_lock);
                                }
                                nfs4_put_open_state(state);
-                               clear_bit(NFS4CLNT_RECLAIM_NOGRACE,
+                               clear_bit(NFS_STATE_RECLAIM_NOGRACE,
                                        &state->flags);
                                spin_lock(&sp->so_lock);
                                goto restart;