| Message ID | 20220606233142.150457-1-jlayton@kernel.org (mailing list archive) |
|---|---|
| State | New, archived |
| Series | ceph: wait on async create before checking caps for syncfs |
On 6/7/22 7:31 AM, Jeff Layton wrote:
> Currently, we'll call ceph_check_caps, but if we're still waiting on the
> reply, we'll end up spinning around on the same inode in
> flush_dirty_session_caps. Wait for the async create reply before
> flushing caps.
>
> Fixes: fbed7045f552 ("ceph: wait for async create reply before sending any cap messages")
> URL: https://tracker.ceph.com/issues/55823
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/ceph/caps.c | 1 +
>  1 file changed, 1 insertion(+)
>
> I don't know if this will fix the tx queue stalls completely, but I
> haven't seen one with this patch in place. I think it makes sense on its
> own, either way.
>
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index 0a48bf829671..5ecfff4b37c9 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -4389,6 +4389,7 @@ static void flush_dirty_session_caps(struct ceph_mds_session *s)
>  		ihold(inode);
>  		dout("flush_dirty_caps %llx.%llx\n", ceph_vinop(inode));
>  		spin_unlock(&mdsc->cap_dirty_lock);
> +		ceph_wait_on_async_create(inode);
>  		ceph_check_caps(ci, CHECK_CAPS_FLUSH, NULL);
>  		iput(inode);
>  		spin_lock(&mdsc->cap_dirty_lock);

This looks good.

Possibly we could add a dedicated list for the async-creating inodes,
instead of getting all the others stuck behind them?

-- Xiubo
On Tue, 2022-06-07 at 09:11 +0800, Xiubo Li wrote:
> On 6/7/22 7:31 AM, Jeff Layton wrote:
> > [...]
>
> This looks good.
>
> Possibly we could add a dedicated list for the async-creating inodes,
> instead of getting all the others stuck behind them?

I'd be open to that. I think we ought to take this patch first to fix
the immediate bug though, before we add extra complexity.
On 6/7/22 9:21 AM, Jeff Layton wrote:
> On Tue, 2022-06-07 at 09:11 +0800, Xiubo Li wrote:
> > [...]
> >
> > Possibly we could add a dedicated list for the async-creating inodes,
> > instead of getting all the others stuck behind them?
>
> I'd be open to that. I think we ought to take this patch first to fix
> the immediate bug though, before we add extra complexity.

Sounds good to me.

I will merge it to the testing branch for now, and we can improve it later.

Thanks

-- Xiubo
On Tue, 2022-06-07 at 09:50 +0800, Xiubo Li wrote:
> [...]
>
> Sounds good to me.
>
> I will merge it to the testing branch for now, and we can improve it later.

Can we also mark this for stable? It's a pretty nasty bug, potentially.
We should get this into mainline fairly soon.

Thanks,
On 6/8/22 9:58 PM, Jeff Layton wrote:
> [...]
>
> Can we also mark this for stable? It's a pretty nasty bug, potentially.
> We should get this into mainline fairly soon.

Yeah, sure.
The recent series of patches that add "wait on async xxxx" at various
places do not seem correct. The correct fix should make mds avoid any
wait when handling async requests.

On Wed, Jun 8, 2022 at 12:56 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> Currently, we'll call ceph_check_caps, but if we're still waiting on the
> reply, we'll end up spinning around on the same inode in
> flush_dirty_session_caps. Wait for the async create reply before
> flushing caps.
> [...]
On 6/9/22 10:15 AM, Yan, Zheng wrote:
> The recent series of patches that add "wait on async xxxx" at various
> places do not seem correct. The correct fix should make mds avoid any
> wait when handling async requests.

In this case I am wondering what will happen if the async create request
is deferred: the cap-flush-related request would then fail to find the ino.

Should we wait? And then how do we distinguish the subtree-migration case
from the deferred-async-create case?
On Thu, Jun 9, 2022 at 11:19 AM Xiubo Li <xiubli@redhat.com> wrote:
> [...]
>
> Should we wait? And then how do we distinguish the subtree-migration case
> from the deferred-async-create case?

Async-op caps are revoked at the freezing-tree stage of subtree migration;
see Locker::invalidate_lock_caches.
On 6/9/22 11:29 AM, Yan, Zheng wrote:
> [...]
>
> Async-op caps are revoked at the freezing-tree stage of subtree migration;
> see Locker::invalidate_lock_caches.

Sorry, I may not totally understand this issue.

I think you mean that in the migration case the MDS will revoke caps for
the async-created files, and then the kclient will send an MClientCaps
message to the MDS, right?

If my understanding is correct, there is another case:

1. Async-create a fileA.

2. Write a lot of data to it and then release the Fw cap ref; if we need
   to report the new size to the MDS, the kclient will send an MClientCaps
   message to the MDS too.

3. What if the async create of fileA was deferred for some reason? Then
   that MClientCaps message will fail to find the ino.
On Thu, Jun 9, 2022 at 11:56 AM Xiubo Li <xiubli@redhat.com> wrote:
> [...]
>
> 3. What if the async create of fileA was deferred for some reason? Then
>    that MClientCaps message will fail to find the ino.

An async op should not be deferred in any case.
On 6/9/22 12:02 PM, Yan, Zheng wrote:
> [...]
>
> An async op should not be deferred in any case.

I am still checking mdcache->path_traverse(), which it seems can forward
the request or requeue it when it fails to acquire locks. And there is
also the case in [1].

[1] https://github.com/ceph/ceph/blob/main/src/mds/Server.cc#L4501
On 6/9/22 12:02 PM, Yan, Zheng wrote:
> [...]
>
> An async op should not be deferred in any case.

Recently we hit a similar bug: a request was deferred and requeued, and a
following request was executed before it.
On Thu, Jun 9, 2022 at 12:18 PM Xiubo Li <xiubli@redhat.com> wrote:
> [...]
>
> Recently we hit a similar bug: a request was deferred and requeued, and a
> following request was executed before it.

That is an MDS bug.
Are you suggesting that the MDS ought to hold on to a cap message for an
inode before its create request is processed? Note that the MDS won't even
be aware that the inode _exists_ at that point. As far as the MDS knows,
it's just a delegated inode number at the client. At what point does the
MDS give up on holding such a cap message if the create request never
comes in for some reason?

I don't see the harm in making the client wait until it gets a create
reply before sending a cap message. If we want to revert fbed7045f552
instead, we can do that, but it'll cause a regression until the MDS is
fixed [1]. Regardless, we need to either take this patch or revert that
one.

I move that we take this patch for now to address the soft lockups. Once
the MDS is fixed, we could revert this and fbed7045f552 without causing a
regression.

[1]: https://tracker.ceph.com/issues/54107

On Thu, 2022-06-09 at 10:15 +0800, Yan, Zheng wrote:
> The recent series of patches that add "wait on async xxxx" at various
> places do not seem correct. The correct fix should make mds avoid any
> wait when handling async requests.
> [...]
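For context on the "making the client wait" side of this argument, the helper the patch calls before ceph_check_caps() is a small inline in the kernel client. Roughly (paraphrased from fs/ceph/super.h of that era; treat this as a sketch rather than verbatim source):

```c
/* Sketch: sleep until the async-create reply clears the pending bit.
 * The create-reply handler clears CEPH_ASYNC_CREATE_BIT in i_ceph_flags
 * and wakes any waiters; TASK_KILLABLE lets a fatal signal end the wait. */
static inline void ceph_wait_on_async_create(struct inode *inode)
{
	wait_on_bit(&ceph_inode(inode)->i_ceph_flags, CEPH_ASYNC_CREATE_BIT,
		    TASK_KILLABLE);
}
```

This is why the client-side fix is cheap in the common case: for inodes with no pending async create the bit is already clear and the call returns immediately; only the racing syncfs path actually sleeps.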
On Wed, Jun 29, 2022 at 8:08 PM Jeff Layton <jlayton@kernel.org> wrote:
> Are you suggesting that the MDS ought to hold on to a cap message for an
> inode before its create request is processed? [...] At what point does
> the MDS give up on holding such a cap message if the create request never
> comes in for some reason?

For an async request, MDS should not process it immediately. If there is
any wait when handling an async request, it's an MDS bug. I suggest
tracking down any wait and fixing it.

> I don't see the harm in making the client wait until it gets a create
> reply before sending a cap message. If we want to revert fbed7045f552
> instead, we can do that, but it'll cause a regression until the MDS is
> fixed [1]. Regardless, we need to either take this patch or revert that
> one.
> [...]
On Thu, Jun 30, 2022 at 10:33 AM Yan, Zheng <ukernel@gmail.com> wrote:
>
> On Wed, Jun 29, 2022 at 8:08 PM Jeff Layton <jlayton@kernel.org> wrote:
> >
> > Are you suggesting that the MDS ought to hold a cap message for an inode
> > before its create request is processed? [...]
> >
> For an async request, the MDS should not process it immediately. If
> there is any wait when handling an async request, it's an MDS bug. I
> suggest tracking down any such wait and fixing it.
>

I mean: for an async request, the MDS *should* process it immediately.
This is important for preserving request ordering.

> > [...]
>
> --
> Jeff Layton <jlayton@kernel.org>
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 0a48bf829671..5ecfff4b37c9 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -4389,6 +4389,7 @@ static void flush_dirty_session_caps(struct ceph_mds_session *s)
 		ihold(inode);
 		dout("flush_dirty_caps %llx.%llx\n", ceph_vinop(inode));
 		spin_unlock(&mdsc->cap_dirty_lock);
+		ceph_wait_on_async_create(inode);
 		ceph_check_caps(ci, CHECK_CAPS_FLUSH, NULL);
 		iput(inode);
 		spin_lock(&mdsc->cap_dirty_lock);
Currently, we'll call ceph_check_caps, but if we're still waiting on the
reply, we'll end up spinning around on the same inode in
flush_dirty_session_caps. Wait for the async create reply before
flushing caps.

Fixes: fbed7045f552 (ceph: wait for async create reply before sending any cap messages)
URL: https://tracker.ceph.com/issues/55823
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/caps.c | 1 +
 1 file changed, 1 insertion(+)

I don't know if this will fix the tx queue stalls completely, but I
haven't seen one with this patch in place. I think it makes sense on its
own, either way.