Message ID | 20180517032944.13230-1-zyan@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, May 17, 2018 at 5:29 AM, Yan, Zheng <zyan@redhat.com> wrote:
> In the case of -ENOSPC, writeback thread may wait on itself. The call
> stack looks like:
>
> inode_wait_for_writeback+0x26/0x40
> evict+0xb5/0x1a0
> iput+0x1d2/0x220
> ceph_put_wrbuffer_cap_refs+0xe0/0x2c0 [ceph]
> writepages_finish+0x2d3/0x410 [ceph]
> __complete_request+0x26/0x60 [libceph]
> complete_request+0x2e/0x70 [libceph]
> __submit_request+0x256/0x330 [libceph]
> submit_request+0x2b/0x30 [libceph]
> ceph_osdc_start_request+0x25/0x40 [libceph]
> ceph_writepages_start+0xdfe/0x1320 [ceph]
> do_writepages+0x1f/0x70
> __writeback_single_inode+0x45/0x330
> writeback_sb_inodes+0x26a/0x600
> __writeback_inodes_wb+0x92/0xc0
> wb_writeback+0x274/0x330
> wb_workfn+0x2d5/0x3b0

This is exactly what I was worried about when Jeff introduced the
possibility of complete_request() on the submit thread. Do you think
this is the only such case, or might there be others?

Another related issue is that normally ->r_callback is invoked without
any libceph locks held -- handle_reply() drops both osd->lock and
osdc->lock before calling __complete_request(). In this case it is
called with both of these locks held.

Given that umount -f will use the same mechanism, could you please
double check all fs/ceph callbacks? I wonder if we should maybe do
something different in libceph...

Thanks,

                Ilya
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2018-05-17 at 11:32 +0200, Ilya Dryomov wrote:
> On Thu, May 17, 2018 at 5:29 AM, Yan, Zheng <zyan@redhat.com> wrote:
> > In the case of -ENOSPC, writeback thread may wait on itself. The call
> > stack looks like:
> >
> > inode_wait_for_writeback+0x26/0x40
> > [...]
> > wb_workfn+0x2d5/0x3b0
>
> This is exactly what I was worried about when Jeff introduced the
> possibility of complete_request() on the submit thread. Do you think
> this is the only such case or there may be others?
>
> Another related issue is that normally ->r_callback is invoked
> without any libceph locks held -- handle_reply() drops both osd->lock
> and osdc->lock before calling __complete_request(). In this case it
> is called with both of these locks held.
>

Not in the "fail_request" case. The lack of clear locking rules with
these callbacks makes it really difficult to suss out these problems.

> Given that umount -f will use the same mechanism, could you please
> double check all fs/ceph callbacks? I wonder if we should maybe do
> something different in libceph...

Might a simpler fix be to just have __submit_request() queue the
complete_request() callback to a workqueue in the ENOSPC case? That
should be a rare thing in most cases.
On Thu, May 17, 2018 at 1:40 PM, Jeff Layton <jlayton@redhat.com> wrote:
> On Thu, 2018-05-17 at 11:32 +0200, Ilya Dryomov wrote:
>> On Thu, May 17, 2018 at 5:29 AM, Yan, Zheng <zyan@redhat.com> wrote:
>> > In the case of -ENOSPC, writeback thread may wait on itself. The call
>> > stack looks like:
>> >
>> > inode_wait_for_writeback+0x26/0x40
>> > [...]
>> > wb_workfn+0x2d5/0x3b0
>>
>> This is exactly what I was worried about when Jeff introduced the
>> possibility of complete_request() on the submit thread. Do you think
>> this is the only such case or there may be others?
>>
>> Another related issue is that normally ->r_callback is invoked
>> without any libceph locks held -- handle_reply() drops both osd->lock
>> and osdc->lock before calling __complete_request(). In this case it
>> is called with both of these locks held.
>>
>
> Not in the "fail_request" case. The lack of clear locking rules with
> these callbacks makes it really difficult to suss out these problems.

Yeah, it was (is?) pretty much the same with Objecter in userspace.
The locking issue is old and I guess we have learned to be careful
there. Calling the callback from the submit thread is new.

>
>> Given that umount -f will use the same mechanism, could you please
>> double check all fs/ceph callbacks? I wonder if we should maybe do
>> something different in libceph...
>
> Might a simpler fix be to just have __submit_request queue the
> complete_request callback to a workqueue in the ENOSPC case? That should
> be a rare thing in most cases.

That was my thought as well, but it needs to be justified, and this
stack trace is actually a bad example. In the common case the callback
is invoked by the messenger, so blocking is undesirable. Blocking on
writeback is particularly so -- unless I'm misunderstanding something,
that can deadlock even under normal conditions.

Thanks,

                Ilya
> On May 17, 2018, at 20:27, Ilya Dryomov <idryomov@gmail.com> wrote:
>
> On Thu, May 17, 2018 at 1:40 PM, Jeff Layton <jlayton@redhat.com> wrote:
>> [...]
>>
>> Might a simpler fix be to just have __submit_request queue the
>> complete_request callback to a workqueue in the ENOSPC case? That should
>> be a rare thing in most cases.
>
> That was my thought as well, but it needs to be justified and this
> stack trace is actually a bad example. In the common case the callback
> is invoked by the messenger, so blocking is undesirable. Blocking on
> writeback is particularly so -- unless I'm misunderstanding something,
> that can deadlock even under normal conditions.

It can't happen under normal conditions. writepages_finish() drops the
inode's last reference only when there are no more dirty/writeback
pages. Writeback should already be done, or about to be done.

Regards
Yan, Zheng

>
> Thanks,
>
> Ilya
On Thu, May 17, 2018 at 4:07 PM, Yan, Zheng <zyan@redhat.com> wrote:
>
>> On May 17, 2018, at 20:27, Ilya Dryomov <idryomov@gmail.com> wrote:
>>
>> [...]
>>
>> That was my thought as well, but it needs to be justified and this
>> stack trace is actually a bad example. In the common case the callback
>> is invoked by the messenger, so blocking is undesirable. Blocking on
>> writeback is particularly so -- unless I'm misunderstanding something,
>> that can deadlock even under normal conditions.
>
> It can’t happen on normal condition. writepages_finish() drops inode’s
> last reference only when there is no more dirty/writeback page.
> Writeback should be already done or be about to done.

I see, so at most it will wait for the writeback thread to get to
inode_sync_complete()?

While this patch isn't too ugly, I'm leaning towards adding a finisher
for all complete_request() special cases. I'm not convinced this is
the only problematic site in fs/ceph, and there is a patch pending
that makes blocking optional in rbd, so the space is about to grow.
I have put this on my list for next week.

Thanks,

                Ilya
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 5f7ad3d0df2e..9db2f4108951 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -772,6 +772,17 @@ static void writepages_finish(struct ceph_osd_request *req)
 		ceph_release_pages(osd_data->pages, num_pages);
 	}
 
+	if (rc < 0 && total_pages) {
+		/*
+		 * In the case of error, this function may directly get
+		 * called by the thread that does writeback. The writeback
+		 * thread should not drop inode's last reference. Otherwise
+		 * iput_final() may call inode_wait_for_writeback(), which
+		 * waits on writeback.
+		 */
+		ihold(inode);
+	}
+
 	ceph_put_wrbuffer_cap_refs(ci, total_pages, snapc);
 
 	osd_data = osd_req_op_extent_osd_data(req, 0);
@@ -781,6 +792,16 @@ static void writepages_finish(struct ceph_osd_request *req)
 	else
 		kfree(osd_data->pages);
 	ceph_osdc_put_request(req);
+
+	if (rc < 0 && total_pages) {
+		for (;;) {
+			if (atomic_add_unless(&inode->i_count, -1, 1))
+				break;
+			/* let writeback work drop the last reference */
+			if (queue_work(fsc->wb_wq, &ci->i_wb_work))
+				break;
+		}
+	}
 }
 
 /*
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index df3875fdfa41..aa7c5a4ff137 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1752,9 +1752,17 @@ static void ceph_writeback_work(struct work_struct *work)
 	struct ceph_inode_info *ci =
 		container_of(work, struct ceph_inode_info, i_wb_work);
 	struct inode *inode = &ci->vfs_inode;
+	int wrbuffer_refs;
+
+	spin_lock(&ci->i_ceph_lock);
+	wrbuffer_refs = ci->i_wrbuffer_ref;
+	spin_unlock(&ci->i_ceph_lock);
+
+	if (wrbuffer_refs) {
+		dout("writeback %p\n", inode);
+		filemap_fdatawrite(&inode->i_data);
+	}
 
-	dout("writeback %p\n", inode);
-	filemap_fdatawrite(&inode->i_data);
 	iput(inode);
 }
 
In the case of -ENOSPC, the writeback thread may wait on itself. The
call stack looks like:

inode_wait_for_writeback+0x26/0x40
evict+0xb5/0x1a0
iput+0x1d2/0x220
ceph_put_wrbuffer_cap_refs+0xe0/0x2c0 [ceph]
writepages_finish+0x2d3/0x410 [ceph]
__complete_request+0x26/0x60 [libceph]
complete_request+0x2e/0x70 [libceph]
__submit_request+0x256/0x330 [libceph]
submit_request+0x2b/0x30 [libceph]
ceph_osdc_start_request+0x25/0x40 [libceph]
ceph_writepages_start+0xdfe/0x1320 [ceph]
do_writepages+0x1f/0x70
__writeback_single_inode+0x45/0x330
writeback_sb_inodes+0x26a/0x600
__writeback_inodes_wb+0x92/0xc0
wb_writeback+0x274/0x330
wb_workfn+0x2d5/0x3b0

The fix is to make writepages_finish() not drop the inode's last
reference.

Link: http://tracker.ceph.com/issues/23978
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
---
 fs/ceph/addr.c  | 21 +++++++++++++++++++++
 fs/ceph/inode.c | 12 ++++++++++--
 2 files changed, 31 insertions(+), 2 deletions(-)