ceph: wait on async create before checking caps for syncfs

Message ID 20220606233142.150457-1-jlayton@kernel.org (mailing list archive)
State New, archived
Series: ceph: wait on async create before checking caps for syncfs

Commit Message

Jeffrey Layton June 6, 2022, 11:31 p.m. UTC
Currently, we'll call ceph_check_caps, but if we're still waiting on
the async create reply, we'll end up spinning around on the same inode
in flush_dirty_session_caps. Wait for the async create reply before
flushing caps.

Fixes: fbed7045f552 (ceph: wait for async create reply before sending any cap messages)
URL: https://tracker.ceph.com/issues/55823
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/caps.c | 1 +
 1 file changed, 1 insertion(+)

I don't know if this will fix the tx queue stalls completely, but I
haven't seen one with this patch in place. I think it makes sense on its
own, either way.
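
For reference, the helper this hunk calls is a small inline in
fs/ceph/super.h. As of this era of the code it looks roughly like the
sketch below; treat the exact sleep state (TASK_UNINTERRUPTIBLE vs.
TASK_KILLABLE) as an assumption, since that detail has varied:

/* Block until the async create reply handler clears
 * CEPH_ASYNC_CREATE_BIT in i_ceph_flags and wakes waiters. */
static inline void ceph_wait_on_async_create(struct inode *inode)
{
	wait_on_bit(&ceph_inode(inode)->i_ceph_flags, CEPH_ASYNC_CREATE_BIT,
		    TASK_UNINTERRUPTIBLE);
}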

Comments

Xiubo Li June 7, 2022, 1:11 a.m. UTC | #1
On 6/7/22 7:31 AM, Jeff Layton wrote:
> Currently, we'll call ceph_check_caps, but if we're still waiting on the
> reply, we'll end up spinning around on the same inode in
> flush_dirty_session_caps. Wait for the async create reply before
> flushing caps.
>
> Fixes: fbed7045f552 (ceph: wait for async create reply before sending any cap messages)
> URL: https://tracker.ceph.com/issues/55823
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>   fs/ceph/caps.c | 1 +
>   1 file changed, 1 insertion(+)
>
> I don't know if this will fix the tx queue stalls completely, but I
> haven't seen one with this patch in place. I think it makes sense on its
> own, either way.
>
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index 0a48bf829671..5ecfff4b37c9 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -4389,6 +4389,7 @@ static void flush_dirty_session_caps(struct ceph_mds_session *s)
>   		ihold(inode);
>   		dout("flush_dirty_caps %llx.%llx\n", ceph_vinop(inode));
>   		spin_unlock(&mdsc->cap_dirty_lock);
> +		ceph_wait_on_async_create(inode);
>   		ceph_check_caps(ci, CHECK_CAPS_FLUSH, NULL);
>   		iput(inode);
>   		spin_lock(&mdsc->cap_dirty_lock);

This looks good.

Possibly we could add a dedicated list to hold the inodes whose async
creates are still in flight, instead of stalling all the others?
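
A minimal sketch of that idea, assuming a hypothetical
mdsc->async_dirty_caps list head (no such field exists today) and the
existing CEPH_I_ASYNC_CREATE flag:

	/* In flush_dirty_session_caps()'s loop, with cap_dirty_lock held:
	 * park inodes whose async create is still in flight on a separate
	 * (hypothetical) list instead of blocking on them; the async create
	 * reply handler would splice them back onto the dirty list. The
	 * unlocked flag check is a shortcut; a real version would take
	 * i_ceph_lock. */
	if (READ_ONCE(ci->i_ceph_flags) & CEPH_I_ASYNC_CREATE) {
		list_move_tail(&ci->i_dirty_item, &mdsc->async_dirty_caps);
		continue;
	}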

-- Xiubo
Jeffrey Layton June 7, 2022, 1:21 a.m. UTC | #2
On Tue, 2022-06-07 at 09:11 +0800, Xiubo Li wrote:
> On 6/7/22 7:31 AM, Jeff Layton wrote:
> > Currently, we'll call ceph_check_caps, but if we're still waiting on the
> > reply, we'll end up spinning around on the same inode in
> > flush_dirty_session_caps. Wait for the async create reply before
> > flushing caps.
> > 
> > Fixes: fbed7045f552 (ceph: wait for async create reply before sending any cap messages)
> > URL: https://tracker.ceph.com/issues/55823
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >   fs/ceph/caps.c | 1 +
> >   1 file changed, 1 insertion(+)
> > 
> > I don't know if this will fix the tx queue stalls completely, but I
> > haven't seen one with this patch in place. I think it makes sense on its
> > own, either way.
> > 
> > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > index 0a48bf829671..5ecfff4b37c9 100644
> > --- a/fs/ceph/caps.c
> > +++ b/fs/ceph/caps.c
> > @@ -4389,6 +4389,7 @@ static void flush_dirty_session_caps(struct ceph_mds_session *s)
> >   		ihold(inode);
> >   		dout("flush_dirty_caps %llx.%llx\n", ceph_vinop(inode));
> >   		spin_unlock(&mdsc->cap_dirty_lock);
> > +		ceph_wait_on_async_create(inode);
> >   		ceph_check_caps(ci, CHECK_CAPS_FLUSH, NULL);
> >   		iput(inode);
> >   		spin_lock(&mdsc->cap_dirty_lock);
> 
> This looks good.
> 
> Possibly we can add one dedicated list to store the async creating 
> inodes instead of getting stuck all the others ?
> 

I'd be open to that. I think we ought to take this patch first to fix
the immediate bug though, before we add extra complexity.
Xiubo Li June 7, 2022, 1:50 a.m. UTC | #3
On 6/7/22 9:21 AM, Jeff Layton wrote:
> On Tue, 2022-06-07 at 09:11 +0800, Xiubo Li wrote:
>> On 6/7/22 7:31 AM, Jeff Layton wrote:
>>> Currently, we'll call ceph_check_caps, but if we're still waiting on the
>>> reply, we'll end up spinning around on the same inode in
>>> flush_dirty_session_caps. Wait for the async create reply before
>>> flushing caps.
>>>
>>> Fixes: fbed7045f552 (ceph: wait for async create reply before sending any cap messages)
>>> URL: https://tracker.ceph.com/issues/55823
>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>> ---
>>>    fs/ceph/caps.c | 1 +
>>>    1 file changed, 1 insertion(+)
>>>
>>> I don't know if this will fix the tx queue stalls completely, but I
>>> haven't seen one with this patch in place. I think it makes sense on its
>>> own, either way.
>>>
>>> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
>>> index 0a48bf829671..5ecfff4b37c9 100644
>>> --- a/fs/ceph/caps.c
>>> +++ b/fs/ceph/caps.c
>>> @@ -4389,6 +4389,7 @@ static void flush_dirty_session_caps(struct ceph_mds_session *s)
>>>    		ihold(inode);
>>>    		dout("flush_dirty_caps %llx.%llx\n", ceph_vinop(inode));
>>>    		spin_unlock(&mdsc->cap_dirty_lock);
>>> +		ceph_wait_on_async_create(inode);
>>>    		ceph_check_caps(ci, CHECK_CAPS_FLUSH, NULL);
>>>    		iput(inode);
>>>    		spin_lock(&mdsc->cap_dirty_lock);
>> This looks good.
>>
>> Possibly we can add one dedicated list to store the async creating
>> inodes instead of getting stuck all the others ?
>>
> I'd be open to that. I think we ought to take this patch first to fix
> the immediate bug though, before we add extra complexity.

Sounds good to me.

I will merge it into the testing branch for now, and we can improve it later.

Thanks

-- Xiubo
Jeffrey Layton June 8, 2022, 1:58 p.m. UTC | #4
On Tue, 2022-06-07 at 09:50 +0800, Xiubo Li wrote:
> On 6/7/22 9:21 AM, Jeff Layton wrote:
> > On Tue, 2022-06-07 at 09:11 +0800, Xiubo Li wrote:
> > > On 6/7/22 7:31 AM, Jeff Layton wrote:
> > > > Currently, we'll call ceph_check_caps, but if we're still waiting on the
> > > > reply, we'll end up spinning around on the same inode in
> > > > flush_dirty_session_caps. Wait for the async create reply before
> > > > flushing caps.
> > > > 
> > > > Fixes: fbed7045f552 (ceph: wait for async create reply before sending any cap messages)
> > > > URL: https://tracker.ceph.com/issues/55823
> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > ---
> > > >    fs/ceph/caps.c | 1 +
> > > >    1 file changed, 1 insertion(+)
> > > > 
> > > > I don't know if this will fix the tx queue stalls completely, but I
> > > > haven't seen one with this patch in place. I think it makes sense on its
> > > > own, either way.
> > > > 
> > > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > > > index 0a48bf829671..5ecfff4b37c9 100644
> > > > --- a/fs/ceph/caps.c
> > > > +++ b/fs/ceph/caps.c
> > > > @@ -4389,6 +4389,7 @@ static void flush_dirty_session_caps(struct ceph_mds_session *s)
> > > >    		ihold(inode);
> > > >    		dout("flush_dirty_caps %llx.%llx\n", ceph_vinop(inode));
> > > >    		spin_unlock(&mdsc->cap_dirty_lock);
> > > > +		ceph_wait_on_async_create(inode);
> > > >    		ceph_check_caps(ci, CHECK_CAPS_FLUSH, NULL);
> > > >    		iput(inode);
> > > >    		spin_lock(&mdsc->cap_dirty_lock);
> > > This looks good.
> > > 
> > > Possibly we can add one dedicated list to store the async creating
> > > inodes instead of getting stuck all the others ?
> > > 
> > I'd be open to that. I think we ought to take this patch first to fix
> > the immediate bug though, before we add extra complexity.
> 
> Sounds good to me.
> 
> I will merge it to the testing branch for now and let's improve it later.
> 

Can we also mark this for stable? It's a pretty nasty bug, potentially.
We should get this into mainline fairly soon.

Thanks,
Xiubo Li June 9, 2022, 12:31 a.m. UTC | #5
On 6/8/22 9:58 PM, Jeff Layton wrote:
> On Tue, 2022-06-07 at 09:50 +0800, Xiubo Li wrote:
>> On 6/7/22 9:21 AM, Jeff Layton wrote:
>>> On Tue, 2022-06-07 at 09:11 +0800, Xiubo Li wrote:
>>>> On 6/7/22 7:31 AM, Jeff Layton wrote:
>>>>> Currently, we'll call ceph_check_caps, but if we're still waiting on the
>>>>> reply, we'll end up spinning around on the same inode in
>>>>> flush_dirty_session_caps. Wait for the async create reply before
>>>>> flushing caps.
>>>>>
>>>>> Fixes: fbed7045f552 (ceph: wait for async create reply before sending any cap messages)
>>>>> URL: https://tracker.ceph.com/issues/55823
>>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>>>> ---
>>>>>     fs/ceph/caps.c | 1 +
>>>>>     1 file changed, 1 insertion(+)
>>>>>
>>>>> I don't know if this will fix the tx queue stalls completely, but I
>>>>> haven't seen one with this patch in place. I think it makes sense on its
>>>>> own, either way.
>>>>>
>>>>> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
>>>>> index 0a48bf829671..5ecfff4b37c9 100644
>>>>> --- a/fs/ceph/caps.c
>>>>> +++ b/fs/ceph/caps.c
>>>>> @@ -4389,6 +4389,7 @@ static void flush_dirty_session_caps(struct ceph_mds_session *s)
>>>>>     		ihold(inode);
>>>>>     		dout("flush_dirty_caps %llx.%llx\n", ceph_vinop(inode));
>>>>>     		spin_unlock(&mdsc->cap_dirty_lock);
>>>>> +		ceph_wait_on_async_create(inode);
>>>>>     		ceph_check_caps(ci, CHECK_CAPS_FLUSH, NULL);
>>>>>     		iput(inode);
>>>>>     		spin_lock(&mdsc->cap_dirty_lock);
>>>> This looks good.
>>>>
>>>> Possibly we can add one dedicated list to store the async creating
>>>> inodes instead of getting stuck all the others ?
>>>>
>>> I'd be open to that. I think we ought to take this patch first to fix
>>> the immediate bug though, before we add extra complexity.
>> Sounds good to me.
>>
>> I will merge it to the testing branch for now and let's improve it later.
>>
> Can we also mark this for stable? It's a pretty nasty bug, potentially.
> We should get this into mainline fairly soon.
Yeah, sure.
>
> Thanks,
Yan, Zheng June 9, 2022, 2:15 a.m. UTC | #6
The recent series of patches that add "wait on async xxxx" at various
places does not seem correct. The correct fix should be to make the MDS
avoid any wait when handling async requests.


On Wed, Jun 8, 2022 at 12:56 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> Currently, we'll call ceph_check_caps, but if we're still waiting on the
> reply, we'll end up spinning around on the same inode in
> flush_dirty_session_caps. Wait for the async create reply before
> flushing caps.
>
> Fixes: fbed7045f552 (ceph: wait for async create reply before sending any cap messages)
> URL: https://tracker.ceph.com/issues/55823
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/ceph/caps.c | 1 +
>  1 file changed, 1 insertion(+)
>
> I don't know if this will fix the tx queue stalls completely, but I
> haven't seen one with this patch in place. I think it makes sense on its
> own, either way.
>
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index 0a48bf829671..5ecfff4b37c9 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -4389,6 +4389,7 @@ static void flush_dirty_session_caps(struct ceph_mds_session *s)
>                 ihold(inode);
>                 dout("flush_dirty_caps %llx.%llx\n", ceph_vinop(inode));
>                 spin_unlock(&mdsc->cap_dirty_lock);
> +               ceph_wait_on_async_create(inode);
>                 ceph_check_caps(ci, CHECK_CAPS_FLUSH, NULL);
>                 iput(inode);
>                 spin_lock(&mdsc->cap_dirty_lock);
> --
> 2.36.1
>
Xiubo Li June 9, 2022, 3:18 a.m. UTC | #7
On 6/9/22 10:15 AM, Yan, Zheng wrote:
> The recent series of patches that add "wait on async xxxx" at various
> places do not seem correct. The correct fix should make mds avoid any
> wait when handling async requests.
>
In this case I am wondering what will happen if the async create
request is deferred; the cap-flush-related request would then fail to
find the ino.

Should we wait? And how would we distinguish the subtree migration case
from the deferred async create case?


> On Wed, Jun 8, 2022 at 12:56 PM Jeff Layton <jlayton@kernel.org> wrote:
>> Currently, we'll call ceph_check_caps, but if we're still waiting on the
>> reply, we'll end up spinning around on the same inode in
>> flush_dirty_session_caps. Wait for the async create reply before
>> flushing caps.
>>
>> Fixes: fbed7045f552 (ceph: wait for async create reply before sending any cap messages)
>> URL: https://tracker.ceph.com/issues/55823
>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>> ---
>>   fs/ceph/caps.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> I don't know if this will fix the tx queue stalls completely, but I
>> haven't seen one with this patch in place. I think it makes sense on its
>> own, either way.
>>
>> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
>> index 0a48bf829671..5ecfff4b37c9 100644
>> --- a/fs/ceph/caps.c
>> +++ b/fs/ceph/caps.c
>> @@ -4389,6 +4389,7 @@ static void flush_dirty_session_caps(struct ceph_mds_session *s)
>>                  ihold(inode);
>>                  dout("flush_dirty_caps %llx.%llx\n", ceph_vinop(inode));
>>                  spin_unlock(&mdsc->cap_dirty_lock);
>> +               ceph_wait_on_async_create(inode);
>>                  ceph_check_caps(ci, CHECK_CAPS_FLUSH, NULL);
>>                  iput(inode);
>>                  spin_lock(&mdsc->cap_dirty_lock);
>> --
>> 2.36.1
>>
Yan, Zheng June 9, 2022, 3:29 a.m. UTC | #8
On Thu, Jun 9, 2022 at 11:19 AM Xiubo Li <xiubli@redhat.com> wrote:
>
>
> On 6/9/22 10:15 AM, Yan, Zheng wrote:
> > The recent series of patches that add "wait on async xxxx" at various
> > places do not seem correct. The correct fix should make mds avoid any
> > wait when handling async requests.
> >
> In this case I am thinking what will happen if the async create request
> is deferred, then the cap flush related request should fail to find the
> ino.
>
> Should we wait ? Then how to distinguish from migrating a subtree and a
> deferred async create cases ?
>

async op caps are revoked at freezingtree stage of subtree migration.
see Locker::invalidate_lock_caches

>
> > On Wed, Jun 8, 2022 at 12:56 PM Jeff Layton <jlayton@kernel.org> wrote:
> >> Currently, we'll call ceph_check_caps, but if we're still waiting on the
> >> reply, we'll end up spinning around on the same inode in
> >> flush_dirty_session_caps. Wait for the async create reply before
> >> flushing caps.
> >>
> >> Fixes: fbed7045f552 (ceph: wait for async create reply before sending any cap messages)
> >> URL: https://tracker.ceph.com/issues/55823
> >> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> >> ---
> >>   fs/ceph/caps.c | 1 +
> >>   1 file changed, 1 insertion(+)
> >>
> >> I don't know if this will fix the tx queue stalls completely, but I
> >> haven't seen one with this patch in place. I think it makes sense on its
> >> own, either way.
> >>
> >> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> >> index 0a48bf829671..5ecfff4b37c9 100644
> >> --- a/fs/ceph/caps.c
> >> +++ b/fs/ceph/caps.c
> >> @@ -4389,6 +4389,7 @@ static void flush_dirty_session_caps(struct ceph_mds_session *s)
> >>                  ihold(inode);
> >>                  dout("flush_dirty_caps %llx.%llx\n", ceph_vinop(inode));
> >>                  spin_unlock(&mdsc->cap_dirty_lock);
> >> +               ceph_wait_on_async_create(inode);
> >>                  ceph_check_caps(ci, CHECK_CAPS_FLUSH, NULL);
> >>                  iput(inode);
> >>                  spin_lock(&mdsc->cap_dirty_lock);
> >> --
> >> 2.36.1
> >>
>
Xiubo Li June 9, 2022, 3:55 a.m. UTC | #9
On 6/9/22 11:29 AM, Yan, Zheng wrote:
> On Thu, Jun 9, 2022 at 11:19 AM Xiubo Li <xiubli@redhat.com> wrote:
>>
>> On 6/9/22 10:15 AM, Yan, Zheng wrote:
>>> The recent series of patches that add "wait on async xxxx" at various
>>> places do not seem correct. The correct fix should make mds avoid any
>>> wait when handling async requests.
>>>
>> In this case I am thinking what will happen if the async create request
>> is deferred, then the cap flush related request should fail to find the
>> ino.
>>
>> Should we wait ? Then how to distinguish from migrating a subtree and a
>> deferred async create cases ?
>>
> async op caps are revoked at freezingtree stage of subtree migration.
> see Locker::invalidate_lock_caches
>
Sorry, I may not totally understand this issue.

I think you mean that in the case of migration the MDS will revoke caps
for the async-created files, and then the kclient will send an
MClientCaps request to the MDS, right?

If my understanding is correct, there is another case (a user-space
sketch of the sequence follows below):

1. async create a fileA

2. then write a lot of data to it and release the Fw cap ref; if we
need to report the new size to the MDS, the kclient will send an
MClientCaps request too

3. what if the async create of fileA was deferred for some reason, so
that the MClientCaps request fails to find the ino?
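
A user-space sketch of that sequence, assuming a kcephfs mount with
async dirops enabled (-o nowsync); the mount point path is an
assumption:

#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	static char buf[1 << 20];
	int fd, i;

	memset(buf, 'x', sizeof(buf));

	/* 1: O_CREAT on a kcephfs mount using async dirops, so the create
	 *    itself is asynchronous */
	fd = open("/mnt/cephfs/fileA", O_CREAT | O_WRONLY, 0644);
	if (fd < 0)
		return 1;

	/* 2: grow the file; the new size must reach the MDS via a cap
	 *    message once the Fw cap ref is dropped */
	for (i = 0; i < 64; i++)
		if (write(fd, buf, sizeof(buf)) < 0)
			break;

	/* 3: force a cap flush while the create reply may still be
	 *    outstanding; this is the window discussed above */
	syncfs(fd);
	close(fd);
	return 0;
}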


>>> On Wed, Jun 8, 2022 at 12:56 PM Jeff Layton <jlayton@kernel.org> wrote:
>>>> Currently, we'll call ceph_check_caps, but if we're still waiting on the
>>>> reply, we'll end up spinning around on the same inode in
>>>> flush_dirty_session_caps. Wait for the async create reply before
>>>> flushing caps.
>>>>
>>>> Fixes: fbed7045f552 (ceph: wait for async create reply before sending any cap messages)
>>>> URL: https://tracker.ceph.com/issues/55823
>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>>> ---
>>>>    fs/ceph/caps.c | 1 +
>>>>    1 file changed, 1 insertion(+)
>>>>
>>>> I don't know if this will fix the tx queue stalls completely, but I
>>>> haven't seen one with this patch in place. I think it makes sense on its
>>>> own, either way.
>>>>
>>>> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
>>>> index 0a48bf829671..5ecfff4b37c9 100644
>>>> --- a/fs/ceph/caps.c
>>>> +++ b/fs/ceph/caps.c
>>>> @@ -4389,6 +4389,7 @@ static void flush_dirty_session_caps(struct ceph_mds_session *s)
>>>>                   ihold(inode);
>>>>                   dout("flush_dirty_caps %llx.%llx\n", ceph_vinop(inode));
>>>>                   spin_unlock(&mdsc->cap_dirty_lock);
>>>> +               ceph_wait_on_async_create(inode);
>>>>                   ceph_check_caps(ci, CHECK_CAPS_FLUSH, NULL);
>>>>                   iput(inode);
>>>>                   spin_lock(&mdsc->cap_dirty_lock);
>>>> --
>>>> 2.36.1
>>>>
Yan, Zheng June 9, 2022, 4:02 a.m. UTC | #10
On Thu, Jun 9, 2022 at 11:56 AM Xiubo Li <xiubli@redhat.com> wrote:
>
>
> On 6/9/22 11:29 AM, Yan, Zheng wrote:
> > On Thu, Jun 9, 2022 at 11:19 AM Xiubo Li <xiubli@redhat.com> wrote:
> >>
> >> On 6/9/22 10:15 AM, Yan, Zheng wrote:
> >>> The recent series of patches that add "wait on async xxxx" at various
> >>> places do not seem correct. The correct fix should make mds avoid any
> >>> wait when handling async requests.
> >>>
> >> In this case I am thinking what will happen if the async create request
> >> is deferred, then the cap flush related request should fail to find the
> >> ino.
> >>
> >> Should we wait ? Then how to distinguish from migrating a subtree and a
> >> deferred async create cases ?
> >>
> > async op caps are revoked at freezingtree stage of subtree migration.
> > see Locker::invalidate_lock_caches
> >
> Sorry I may not totally understand this issue.
>
> I think you mean in case of migration and then the MDS will revoke caps
> for the async create files and then the kclient will send a MclientCap
> request to mds, right ?
>
> If my understanding is correct, there is another case that:
>
> 1, async create a fileA
>
> 2, then write a lot of data to it and then release the Fw cap ref, and
> if we should report the size to MDS, it will send a MclientCap request
> to MDS too.
>
> 3, what if the async create of fileA was deferred due to some reason,
> then the MclientCap request will fail to find the ino ?
>

Async op should not be deferred in any case.

>
> >>> On Wed, Jun 8, 2022 at 12:56 PM Jeff Layton <jlayton@kernel.org> wrote:
> >>>> Currently, we'll call ceph_check_caps, but if we're still waiting on the
> >>>> reply, we'll end up spinning around on the same inode in
> >>>> flush_dirty_session_caps. Wait for the async create reply before
> >>>> flushing caps.
> >>>>
> >>>> Fixes: fbed7045f552 (ceph: wait for async create reply before sending any cap messages)
> >>>> URL: https://tracker.ceph.com/issues/55823
> >>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> >>>> ---
> >>>>    fs/ceph/caps.c | 1 +
> >>>>    1 file changed, 1 insertion(+)
> >>>>
> >>>> I don't know if this will fix the tx queue stalls completely, but I
> >>>> haven't seen one with this patch in place. I think it makes sense on its
> >>>> own, either way.
> >>>>
> >>>> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> >>>> index 0a48bf829671..5ecfff4b37c9 100644
> >>>> --- a/fs/ceph/caps.c
> >>>> +++ b/fs/ceph/caps.c
> >>>> @@ -4389,6 +4389,7 @@ static void flush_dirty_session_caps(struct ceph_mds_session *s)
> >>>>                   ihold(inode);
> >>>>                   dout("flush_dirty_caps %llx.%llx\n", ceph_vinop(inode));
> >>>>                   spin_unlock(&mdsc->cap_dirty_lock);
> >>>> +               ceph_wait_on_async_create(inode);
> >>>>                   ceph_check_caps(ci, CHECK_CAPS_FLUSH, NULL);
> >>>>                   iput(inode);
> >>>>                   spin_lock(&mdsc->cap_dirty_lock);
> >>>> --
> >>>> 2.36.1
> >>>>
>
Xiubo Li June 9, 2022, 4:14 a.m. UTC | #11
On 6/9/22 12:02 PM, Yan, Zheng wrote:
> On Thu, Jun 9, 2022 at 11:56 AM Xiubo Li <xiubli@redhat.com> wrote:
>>
>> On 6/9/22 11:29 AM, Yan, Zheng wrote:
>>> On Thu, Jun 9, 2022 at 11:19 AM Xiubo Li <xiubli@redhat.com> wrote:
>>>> On 6/9/22 10:15 AM, Yan, Zheng wrote:
>>>>> The recent series of patches that add "wait on async xxxx" at various
>>>>> places do not seem correct. The correct fix should make mds avoid any
>>>>> wait when handling async requests.
>>>>>
>>>> In this case I am thinking what will happen if the async create request
>>>> is deferred, then the cap flush related request should fail to find the
>>>> ino.
>>>>
>>>> Should we wait ? Then how to distinguish from migrating a subtree and a
>>>> deferred async create cases ?
>>>>
>>> async op caps are revoked at freezingtree stage of subtree migration.
>>> see Locker::invalidate_lock_caches
>>>
>> Sorry I may not totally understand this issue.
>>
>> I think you mean in case of migration and then the MDS will revoke caps
>> for the async create files and then the kclient will send a MclientCap
>> request to mds, right ?
>>
>> If my understanding is correct, there is another case that:
>>
>> 1, async create a fileA
>>
>> 2, then write a lot of data to it and then release the Fw cap ref, and
>> if we should report the size to MDS, it will send a MclientCap request
>> to MDS too.
>>
>> 3, what if the async create of fileA was deferred due to some reason,
>> then the MclientCap request will fail to find the ino ?
>>
> Async op should not be deferred in any case.

I am still checking 'mdcache->path_traverse()', which it seems could
forward or requeue the request when it fails to acquire locks. The same
could happen in case [1].

[1] https://github.com/ceph/ceph/blob/main/src/mds/Server.cc#L4501


>>>>> On Wed, Jun 8, 2022 at 12:56 PM Jeff Layton <jlayton@kernel.org> wrote:
>>>>>> Currently, we'll call ceph_check_caps, but if we're still waiting on the
>>>>>> reply, we'll end up spinning around on the same inode in
>>>>>> flush_dirty_session_caps. Wait for the async create reply before
>>>>>> flushing caps.
>>>>>>
>>>>>> Fixes: fbed7045f552 (ceph: wait for async create reply before sending any cap messages)
>>>>>> URL: https://tracker.ceph.com/issues/55823
>>>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>>>>> ---
>>>>>>     fs/ceph/caps.c | 1 +
>>>>>>     1 file changed, 1 insertion(+)
>>>>>>
>>>>>> I don't know if this will fix the tx queue stalls completely, but I
>>>>>> haven't seen one with this patch in place. I think it makes sense on its
>>>>>> own, either way.
>>>>>>
>>>>>> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
>>>>>> index 0a48bf829671..5ecfff4b37c9 100644
>>>>>> --- a/fs/ceph/caps.c
>>>>>> +++ b/fs/ceph/caps.c
>>>>>> @@ -4389,6 +4389,7 @@ static void flush_dirty_session_caps(struct ceph_mds_session *s)
>>>>>>                    ihold(inode);
>>>>>>                    dout("flush_dirty_caps %llx.%llx\n", ceph_vinop(inode));
>>>>>>                    spin_unlock(&mdsc->cap_dirty_lock);
>>>>>> +               ceph_wait_on_async_create(inode);
>>>>>>                    ceph_check_caps(ci, CHECK_CAPS_FLUSH, NULL);
>>>>>>                    iput(inode);
>>>>>>                    spin_lock(&mdsc->cap_dirty_lock);
>>>>>> --
>>>>>> 2.36.1
>>>>>>
Xiubo Li June 9, 2022, 4:17 a.m. UTC | #12
On 6/9/22 12:02 PM, Yan, Zheng wrote:
> On Thu, Jun 9, 2022 at 11:56 AM Xiubo Li <xiubli@redhat.com> wrote:
>>
>> On 6/9/22 11:29 AM, Yan, Zheng wrote:
>>> On Thu, Jun 9, 2022 at 11:19 AM Xiubo Li <xiubli@redhat.com> wrote:
>>>> On 6/9/22 10:15 AM, Yan, Zheng wrote:
>>>>> The recent series of patches that add "wait on async xxxx" at various
>>>>> places do not seem correct. The correct fix should make mds avoid any
>>>>> wait when handling async requests.
>>>>>
>>>> In this case I am thinking what will happen if the async create request
>>>> is deferred, then the cap flush related request should fail to find the
>>>> ino.
>>>>
>>>> Should we wait ? Then how to distinguish from migrating a subtree and a
>>>> deferred async create cases ?
>>>>
>>> async op caps are revoked at freezingtree stage of subtree migration.
>>> see Locker::invalidate_lock_caches
>>>
>> Sorry I may not totally understand this issue.
>>
>> I think you mean in case of migration and then the MDS will revoke caps
>> for the async create files and then the kclient will send a MclientCap
>> request to mds, right ?
>>
>> If my understanding is correct, there is another case that:
>>
>> 1, async create a fileA
>>
>> 2, then write a lot of data to it and then release the Fw cap ref, and
>> if we should report the size to MDS, it will send a MclientCap request
>> to MDS too.
>>
>> 3, what if the async create of fileA was deferred due to some reason,
>> then the MclientCap request will fail to find the ino ?
>>
> Async op should not be deferred in any case.

Recently we hit a similar bug: a request was deferred and requeued, and
a following request ended up being executed before it.


>>>>> On Wed, Jun 8, 2022 at 12:56 PM Jeff Layton <jlayton@kernel.org> wrote:
>>>>>> Currently, we'll call ceph_check_caps, but if we're still waiting on the
>>>>>> reply, we'll end up spinning around on the same inode in
>>>>>> flush_dirty_session_caps. Wait for the async create reply before
>>>>>> flushing caps.
>>>>>>
>>>>>> Fixes: fbed7045f552 (ceph: wait for async create reply before sending any cap messages)
>>>>>> URL: https://tracker.ceph.com/issues/55823
>>>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>>>>> ---
>>>>>>     fs/ceph/caps.c | 1 +
>>>>>>     1 file changed, 1 insertion(+)
>>>>>>
>>>>>> I don't know if this will fix the tx queue stalls completely, but I
>>>>>> haven't seen one with this patch in place. I think it makes sense on its
>>>>>> own, either way.
>>>>>>
>>>>>> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
>>>>>> index 0a48bf829671..5ecfff4b37c9 100644
>>>>>> --- a/fs/ceph/caps.c
>>>>>> +++ b/fs/ceph/caps.c
>>>>>> @@ -4389,6 +4389,7 @@ static void flush_dirty_session_caps(struct ceph_mds_session *s)
>>>>>>                    ihold(inode);
>>>>>>                    dout("flush_dirty_caps %llx.%llx\n", ceph_vinop(inode));
>>>>>>                    spin_unlock(&mdsc->cap_dirty_lock);
>>>>>> +               ceph_wait_on_async_create(inode);
>>>>>>                    ceph_check_caps(ci, CHECK_CAPS_FLUSH, NULL);
>>>>>>                    iput(inode);
>>>>>>                    spin_lock(&mdsc->cap_dirty_lock);
>>>>>> --
>>>>>> 2.36.1
>>>>>>
Yan, Zheng June 9, 2022, 3:20 p.m. UTC | #13
On Thu, Jun 9, 2022 at 12:18 PM Xiubo Li <xiubli@redhat.com> wrote:
>
>
> On 6/9/22 12:02 PM, Yan, Zheng wrote:
> > On Thu, Jun 9, 2022 at 11:56 AM Xiubo Li <xiubli@redhat.com> wrote:
> >>
> >> On 6/9/22 11:29 AM, Yan, Zheng wrote:
> >>> On Thu, Jun 9, 2022 at 11:19 AM Xiubo Li <xiubli@redhat.com> wrote:
> >>>> On 6/9/22 10:15 AM, Yan, Zheng wrote:
> >>>>> The recent series of patches that add "wait on async xxxx" at various
> >>>>> places do not seem correct. The correct fix should make mds avoid any
> >>>>> wait when handling async requests.
> >>>>>
> >>>> In this case I am thinking what will happen if the async create request
> >>>> is deferred, then the cap flush related request should fail to find the
> >>>> ino.
> >>>>
> >>>> Should we wait ? Then how to distinguish from migrating a subtree and a
> >>>> deferred async create cases ?
> >>>>
> >>> async op caps are revoked at freezingtree stage of subtree migration.
> >>> see Locker::invalidate_lock_caches
> >>>
> >> Sorry I may not totally understand this issue.
> >>
> >> I think you mean in case of migration and then the MDS will revoke caps
> >> for the async create files and then the kclient will send a MclientCap
> >> request to mds, right ?
> >>
> >> If my understanding is correct, there is another case that:
> >>
> >> 1, async create a fileA
> >>
> >> 2, then write a lot of data to it and then release the Fw cap ref, and
> >> if we should report the size to MDS, it will send a MclientCap request
> >> to MDS too.
> >>
> >> 3, what if the async create of fileA was deferred due to some reason,
> >> then the MclientCap request will fail to find the ino ?
> >>
> > Async op should not be deferred in any case.
>
> Recently we have hit a similar bug, caused by deferring a request and
> requeuing it and the following request was executed before it.
>
That bug is an MDS bug.

> >>>>> On Wed, Jun 8, 2022 at 12:56 PM Jeff Layton <jlayton@kernel.org> wrote:
> >>>>>> Currently, we'll call ceph_check_caps, but if we're still waiting on the
> >>>>>> reply, we'll end up spinning around on the same inode in
> >>>>>> flush_dirty_session_caps. Wait for the async create reply before
> >>>>>> flushing caps.
> >>>>>>
> >>>>>> Fixes: fbed7045f552 (ceph: wait for async create reply before sending any cap messages)
> >>>>>> URL: https://tracker.ceph.com/issues/55823
> >>>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> >>>>>> ---
> >>>>>>     fs/ceph/caps.c | 1 +
> >>>>>>     1 file changed, 1 insertion(+)
> >>>>>>
> >>>>>> I don't know if this will fix the tx queue stalls completely, but I
> >>>>>> haven't seen one with this patch in place. I think it makes sense on its
> >>>>>> own, either way.
> >>>>>>
> >>>>>> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> >>>>>> index 0a48bf829671..5ecfff4b37c9 100644
> >>>>>> --- a/fs/ceph/caps.c
> >>>>>> +++ b/fs/ceph/caps.c
> >>>>>> @@ -4389,6 +4389,7 @@ static void flush_dirty_session_caps(struct ceph_mds_session *s)
> >>>>>>                    ihold(inode);
> >>>>>>                    dout("flush_dirty_caps %llx.%llx\n", ceph_vinop(inode));
> >>>>>>                    spin_unlock(&mdsc->cap_dirty_lock);
> >>>>>> +               ceph_wait_on_async_create(inode);
> >>>>>>                    ceph_check_caps(ci, CHECK_CAPS_FLUSH, NULL);
> >>>>>>                    iput(inode);
> >>>>>>                    spin_lock(&mdsc->cap_dirty_lock);
> >>>>>> --
> >>>>>> 2.36.1
> >>>>>>
>
Jeffrey Layton June 29, 2022, 12:08 p.m. UTC | #14
Are you suggesting that the MDS ought to hold a cap message for an inode
before its create request is processed? Note that the MDS won't even be
aware that the inode _exists_ at that point. As far as the MDS knows,
it's just a delegated inode number to the client. At what point does the
MDS give up on holding such a cap message if the create request never
comes in for some reason?

I don't see the harm in making the client wait until it gets a create
reply before sending a cap message. If we want to revert fbed7045f552
instead, we can do that, but it'll cause a regression until the MDS is
fixed [1]. Regardless, we need to either take this patch or revert that
one. 

I move that we take this patch for now to address the softlockups. Once
the MDS is fixed we could revert this and fbed7045f552 without causing a
regression.

[1]: https://tracker.ceph.com/issues/54107


On Thu, 2022-06-09 at 10:15 +0800, Yan, Zheng wrote:
> The recent series of patches that add "wait on async xxxx" at various
> places do not seem correct. The correct fix should make mds avoid any
> wait when handling async requests.
> 
> 
> On Wed, Jun 8, 2022 at 12:56 PM Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > Currently, we'll call ceph_check_caps, but if we're still waiting on the
> > reply, we'll end up spinning around on the same inode in
> > flush_dirty_session_caps. Wait for the async create reply before
> > flushing caps.
> > 
> > Fixes: fbed7045f552 (ceph: wait for async create reply before sending any cap messages)
> > URL: https://tracker.ceph.com/issues/55823
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  fs/ceph/caps.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > I don't know if this will fix the tx queue stalls completely, but I
> > haven't seen one with this patch in place. I think it makes sense on its
> > own, either way.
> > 
> > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > index 0a48bf829671..5ecfff4b37c9 100644
> > --- a/fs/ceph/caps.c
> > +++ b/fs/ceph/caps.c
> > @@ -4389,6 +4389,7 @@ static void flush_dirty_session_caps(struct ceph_mds_session *s)
> >                 ihold(inode);
> >                 dout("flush_dirty_caps %llx.%llx\n", ceph_vinop(inode));
> >                 spin_unlock(&mdsc->cap_dirty_lock);
> > +               ceph_wait_on_async_create(inode);
> >                 ceph_check_caps(ci, CHECK_CAPS_FLUSH, NULL);
> >                 iput(inode);
> >                 spin_lock(&mdsc->cap_dirty_lock);
> > --
> > 2.36.1
> >
Yan, Zheng June 30, 2022, 2:33 a.m. UTC | #15
On Wed, Jun 29, 2022 at 8:08 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> Are you suggesting that the MDS ought to hold a cap message for an inode
> before its create request is processed? Note that the MDS won't even be
> aware that the inode even _exists_ at that point. As far as the MDS
> knows, it's just be a delegated inode number to the client. At what
> point does the MDS give up on holding such a cap request if the create
> request never comes in for some reason?
>
For an async request, the MDS should not process it immediately. If
there is any wait when handling an async request, it's an MDS bug. I
suggest tracking down any such wait and fixing it.


> I don't see the harm in making the client wait until it gets a create
> reply before sending a cap message. If we want to revert fbed7045f552
> instead, we can do that, but it'll cause a regression until the MDS is
> fixed [1]. Regardless, we need to either take this patch or revert that
> one.
>
> I move that we take this patch for now to address the softlockups. Once
> the MDS is fixed we could revert this and fbed7045f552 without causing a
> regression.
>
> [1]: https://tracker.ceph.com/issues/54107
>
>
> On Thu, 2022-06-09 at 10:15 +0800, Yan, Zheng wrote:
> > The recent series of patches that add "wait on async xxxx" at various
> > places do not seem correct. The correct fix should make mds avoid any
> > wait when handling async requests.
> >
> >
> > On Wed, Jun 8, 2022 at 12:56 PM Jeff Layton <jlayton@kernel.org> wrote:
> > >
> > > Currently, we'll call ceph_check_caps, but if we're still waiting on the
> > > reply, we'll end up spinning around on the same inode in
> > > flush_dirty_session_caps. Wait for the async create reply before
> > > flushing caps.
> > >
> > > Fixes: fbed7045f552 (ceph: wait for async create reply before sending any cap messages)
> > > URL: https://tracker.ceph.com/issues/55823
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > >  fs/ceph/caps.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > I don't know if this will fix the tx queue stalls completely, but I
> > > haven't seen one with this patch in place. I think it makes sense on its
> > > own, either way.
> > >
> > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > > index 0a48bf829671..5ecfff4b37c9 100644
> > > --- a/fs/ceph/caps.c
> > > +++ b/fs/ceph/caps.c
> > > @@ -4389,6 +4389,7 @@ static void flush_dirty_session_caps(struct ceph_mds_session *s)
> > >                 ihold(inode);
> > >                 dout("flush_dirty_caps %llx.%llx\n", ceph_vinop(inode));
> > >                 spin_unlock(&mdsc->cap_dirty_lock);
> > > +               ceph_wait_on_async_create(inode);
> > >                 ceph_check_caps(ci, CHECK_CAPS_FLUSH, NULL);
> > >                 iput(inode);
> > >                 spin_lock(&mdsc->cap_dirty_lock);
> > > --
> > > 2.36.1
> > >
>
> --
> Jeff Layton <jlayton@kernel.org>
Yan, Zheng June 30, 2022, 3:32 a.m. UTC | #16
On Thu, Jun 30, 2022 at 10:33 AM Yan, Zheng <ukernel@gmail.com> wrote:
>
> On Wed, Jun 29, 2022 at 8:08 PM Jeff Layton <jlayton@kernel.org> wrote:
> >
> > Are you suggesting that the MDS ought to hold a cap message for an inode
> > before its create request is processed? Note that the MDS won't even be
> > aware that the inode even _exists_ at that point. As far as the MDS
> > knows, it's just be a delegated inode number to the client. At what
> > point does the MDS give up on holding such a cap request if the create
> > request never comes in for some reason?
> >
> For an async request, MDS should not process it immediately.  If there
> is any wait when handling async request, it's mds bug. I suggest
> tracking down any wait, and fix it.
>

I mean: for an async request, the MDS should process it immediately.
This is important for preserving request ordering.

>
> > I don't see the harm in making the client wait until it gets a create
> > reply before sending a cap message. If we want to revert fbed7045f552
> > instead, we can do that, but it'll cause a regression until the MDS is
> > fixed [1]. Regardless, we need to either take this patch or revert that
> > one.
> >
> > I move that we take this patch for now to address the softlockups. Once
> > the MDS is fixed we could revert this and fbed7045f552 without causing a
> > regression.
> >
> > [1]: https://tracker.ceph.com/issues/54107
> >
> >
> > On Thu, 2022-06-09 at 10:15 +0800, Yan, Zheng wrote:
> > > The recent series of patches that add "wait on async xxxx" at various
> > > places do not seem correct. The correct fix should make mds avoid any
> > > wait when handling async requests.
> > >
> > >
> > > On Wed, Jun 8, 2022 at 12:56 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > >
> > > > Currently, we'll call ceph_check_caps, but if we're still waiting on the
> > > > reply, we'll end up spinning around on the same inode in
> > > > flush_dirty_session_caps. Wait for the async create reply before
> > > > flushing caps.
> > > >
> > > > Fixes: fbed7045f552 (ceph: wait for async create reply before sending any cap messages)
> > > > URL: https://tracker.ceph.com/issues/55823
> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > ---
> > > >  fs/ceph/caps.c | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > >
> > > > I don't know if this will fix the tx queue stalls completely, but I
> > > > haven't seen one with this patch in place. I think it makes sense on its
> > > > own, either way.
> > > >
> > > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > > > index 0a48bf829671..5ecfff4b37c9 100644
> > > > --- a/fs/ceph/caps.c
> > > > +++ b/fs/ceph/caps.c
> > > > @@ -4389,6 +4389,7 @@ static void flush_dirty_session_caps(struct ceph_mds_session *s)
> > > >                 ihold(inode);
> > > >                 dout("flush_dirty_caps %llx.%llx\n", ceph_vinop(inode));
> > > >                 spin_unlock(&mdsc->cap_dirty_lock);
> > > > +               ceph_wait_on_async_create(inode);
> > > >                 ceph_check_caps(ci, CHECK_CAPS_FLUSH, NULL);
> > > >                 iput(inode);
> > > >                 spin_lock(&mdsc->cap_dirty_lock);
> > > > --
> > > > 2.36.1
> > > >
> >
> > --
> > Jeff Layton <jlayton@kernel.org>

Patch

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 0a48bf829671..5ecfff4b37c9 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -4389,6 +4389,7 @@ static void flush_dirty_session_caps(struct ceph_mds_session *s)
 		ihold(inode);
 		dout("flush_dirty_caps %llx.%llx\n", ceph_vinop(inode));
 		spin_unlock(&mdsc->cap_dirty_lock);
+		ceph_wait_on_async_create(inode);
 		ceph_check_caps(ci, CHECK_CAPS_FLUSH, NULL);
 		iput(inode);
 		spin_lock(&mdsc->cap_dirty_lock);