Message ID | 20200604074606.266213-30-david@fromorbit.com (mailing list archive) |
---|---|
State | Superseded, archived |
Series | xfs: rework inode flushing to make inode reclaim fully asynchronous |
On Thu, Jun 04, 2020 at 05:46:05PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> xfs_iflush_done() does 3 distinct operations to the inodes attached
> to the buffer. Separate these operations out into functions so that
> it is easier to modify these operations independently in future.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/xfs_inode_item.c | 154 +++++++++++++++++++++-------------------
>  1 file changed, 81 insertions(+), 73 deletions(-)
>
> diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
> index dee7385466f83..3894d190ea5b9 100644
> --- a/fs/xfs/xfs_inode_item.c
> +++ b/fs/xfs/xfs_inode_item.c
> @@ -640,101 +640,64 @@ xfs_inode_item_destroy(
>
>
>  /*
> - * This is the inode flushing I/O completion routine. It is called
> - * from interrupt level when the buffer containing the inode is
> - * flushed to disk. It is responsible for removing the inode item
> - * from the AIL if it has not been re-logged, and unlocking the inode's
> - * flush lock.
> - *
> - * To reduce AIL lock traffic as much as possible, we scan the buffer log item
> - * list for other inodes that will run this function. We remove them from the
> - * buffer list so we can process all the inode IO completions in one AIL lock
> - * traversal.
> - *
> - * Note: Now that we attach the log item to the buffer when we first log the
> - * inode in memory, we can have unflushed inodes on the buffer list here. These
> - * inodes will have a zero ili_last_fields, so skip over them here.
> + * We only want to pull the item from the AIL if it is actually there
> + * and its location in the log has not changed since we started the
> + * flush. Thus, we only bother if the inode's lsn has not changed.
>   */
>  void
> -xfs_iflush_done(
> -	struct xfs_buf		*bp)
> +xfs_iflush_ail_updates(
> +	struct xfs_ail		*ailp,
> +	struct list_head	*list)
>  {
> -	struct xfs_inode_log_item *iip;
> -	struct xfs_log_item	*lip, *n;
> -	struct xfs_ail		*ailp = bp->b_mount->m_ail;
> -	int			need_ail = 0;
> -	LIST_HEAD(tmp);
> +	struct xfs_log_item	*lip;
> +	xfs_lsn_t		tail_lsn = 0;
>
> -	/*
> -	 * Pull the attached inodes from the buffer one at a time and take the
> -	 * appropriate action on them.
> -	 */
> -	list_for_each_entry_safe(lip, n, &bp->b_li_list, li_bio_list) {
> -		iip = INODE_ITEM(lip);
> +	/* this is an opencoded batch version of xfs_trans_ail_delete */
> +	spin_lock(&ailp->ail_lock);
> +	list_for_each_entry(lip, list, li_bio_list) {
> +		xfs_lsn_t	lsn;
>
> -		if (xfs_iflags_test(iip->ili_inode, XFS_ISTALE)) {
> -			xfs_iflush_abort(iip->ili_inode);
> +		if (INODE_ITEM(lip)->ili_flush_lsn != lip->li_lsn) {
> +			clear_bit(XFS_LI_FAILED, &lip->li_flags);
>  			continue;
>  		}

That seems like strange logic. Shouldn't we clear LI_FAILED regardless?

>
> -		if (!iip->ili_last_fields)
> -			continue;
> -
> -		list_move_tail(&lip->li_bio_list, &tmp);
> -
> -		/* Do an unlocked check for needing the AIL lock. */
> -		if (iip->ili_flush_lsn == lip->li_lsn ||
> -		    test_bit(XFS_LI_FAILED, &lip->li_flags))
> -			need_ail++;
> +		lsn = xfs_ail_delete_one(ailp, lip);
> +		if (!tail_lsn && lsn)
> +			tail_lsn = lsn;
>  	}
> +	xfs_ail_update_finish(ailp, tail_lsn);
> +}
>
...
> @@ -745,6 +708,51 @@ xfs_iflush_done(
>  	}
>  }
>
> +/*
> + * Inode buffer IO completion routine. It is responsible for removing inodes
> + * attached to the buffer from the AIL if they have not been re-logged, as well
> + * as completing the flush and unlocking the inode.
> + */
> +void
> +xfs_iflush_done(
> +	struct xfs_buf		*bp)
> +{
> +	struct xfs_log_item	*lip, *n;
> +	LIST_HEAD(flushed_inodes);
> +	LIST_HEAD(ail_updates);
> +
> +	/*
> +	 * Pull the attached inodes from the buffer one at a time and take the
> +	 * appropriate action on them.
> +	 */
> +	list_for_each_entry_safe(lip, n, &bp->b_li_list, li_bio_list) {
> +		struct xfs_inode_log_item *iip = INODE_ITEM(lip);
> +
> +		if (xfs_iflags_test(iip->ili_inode, XFS_ISTALE)) {
> +			xfs_iflush_abort(iip->ili_inode);
> +			continue;
> +		}
> +		if (!iip->ili_last_fields)
> +			continue;
> +
> +		/* Do an unlocked check for needing the AIL lock. */
> +		if (iip->ili_flush_lsn == lip->li_lsn ||
> +		    test_bit(XFS_LI_FAILED, &lip->li_flags))
> +			list_move_tail(&lip->li_bio_list, &ail_updates);
> +		else
> +			list_move_tail(&lip->li_bio_list, &flushed_inodes);

Not sure I see the point of having two lists here, particularly since
this is all based on lockless logic. At the very least it's a subtle
change in AIL processing logic and I don't think that should be buried
in a refactoring patch.

Brian

> +	}
> +
> +	if (!list_empty(&ail_updates)) {
> +		xfs_iflush_ail_updates(bp->b_mount->m_ail, &ail_updates);
> +		list_splice_tail(&ail_updates, &flushed_inodes);
> +	}
> +
> +	xfs_iflush_finish(bp, &flushed_inodes);
> +	if (!list_empty(&flushed_inodes))
> +		list_splice_tail(&flushed_inodes, &bp->b_li_list);
> +}
> +
>  /*
>   * This is the inode flushing abort routine. It is called from xfs_iflush when
>   * the filesystem is shutting down to clean up the inode state. It is
> --
> 2.26.2.761.g0e0b3e54be
>

On Tue, Jun 09, 2020 at 09:12:49AM -0400, Brian Foster wrote:
> On Thu, Jun 04, 2020 at 05:46:05PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > xfs_iflush_done() does 3 distinct operations to the inodes attached
> > to the buffer. Separate these operations out into functions so that
> > it is easier to modify these operations independently in future.
> >
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/xfs_inode_item.c | 154 +++++++++++++++++++++-------------------
> >  1 file changed, 81 insertions(+), 73 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
> > index dee7385466f83..3894d190ea5b9 100644
> > --- a/fs/xfs/xfs_inode_item.c
> > +++ b/fs/xfs/xfs_inode_item.c
> > @@ -640,101 +640,64 @@ xfs_inode_item_destroy(
> >
> >
> >  /*
> > - * This is the inode flushing I/O completion routine. It is called
> > - * from interrupt level when the buffer containing the inode is
> > - * flushed to disk. It is responsible for removing the inode item
> > - * from the AIL if it has not been re-logged, and unlocking the inode's
> > - * flush lock.
> > - *
> > - * To reduce AIL lock traffic as much as possible, we scan the buffer log item
> > - * list for other inodes that will run this function. We remove them from the
> > - * buffer list so we can process all the inode IO completions in one AIL lock
> > - * traversal.
> > - *
> > - * Note: Now that we attach the log item to the buffer when we first log the
> > - * inode in memory, we can have unflushed inodes on the buffer list here. These
> > - * inodes will have a zero ili_last_fields, so skip over them here.
> > + * We only want to pull the item from the AIL if it is actually there
> > + * and its location in the log has not changed since we started the
> > + * flush. Thus, we only bother if the inode's lsn has not changed.
> >   */
> >  void
> > -xfs_iflush_done(
> > -	struct xfs_buf		*bp)
> > +xfs_iflush_ail_updates(
> > +	struct xfs_ail		*ailp,
> > +	struct list_head	*list)
> >  {
> > -	struct xfs_inode_log_item *iip;
> > -	struct xfs_log_item	*lip, *n;
> > -	struct xfs_ail		*ailp = bp->b_mount->m_ail;
> > -	int			need_ail = 0;
> > -	LIST_HEAD(tmp);
> > +	struct xfs_log_item	*lip;
> > +	xfs_lsn_t		tail_lsn = 0;
> >
> > -	/*
> > -	 * Pull the attached inodes from the buffer one at a time and take the
> > -	 * appropriate action on them.
> > -	 */
> > -	list_for_each_entry_safe(lip, n, &bp->b_li_list, li_bio_list) {
> > -		iip = INODE_ITEM(lip);
> > +	/* this is an opencoded batch version of xfs_trans_ail_delete */
> > +	spin_lock(&ailp->ail_lock);
> > +	list_for_each_entry(lip, list, li_bio_list) {
> > +		xfs_lsn_t	lsn;
> >
> > -		if (xfs_iflags_test(iip->ili_inode, XFS_ISTALE)) {
> > -			xfs_iflush_abort(iip->ili_inode);
> > +		if (INODE_ITEM(lip)->ili_flush_lsn != lip->li_lsn) {
> > +			clear_bit(XFS_LI_FAILED, &lip->li_flags);
> >  			continue;
> >  		}
>
> That seems like strange logic. Shouldn't we clear LI_FAILED regardless?

It's the same logic as before this patch series:

	if (INODE_ITEM(blip)->ili_logged &&
	    blip->li_lsn == INODE_ITEM(blip)->ili_flush_lsn) {
		/*
		 * xfs_ail_update_finish() only cares about the
		 * lsn of the first tail item removed, any
		 * others will be at the same or higher lsn so
		 * we just ignore them.
		 */
		xfs_lsn_t lsn = xfs_ail_delete_one(ailp, blip);
		if (!tail_lsn && lsn)
			tail_lsn = lsn;
	} else {
		xfs_clear_li_failed(blip);
	}

I've just re-ordered it to check for relogged inodes first instead
of handling that in the else branch.

i.e. we do clear XFS_LI_FAILED always: xfs_ail_delete_one() does it
for the log items that are being removed from the AIL....

> > +/*
> > + * Inode buffer IO completion routine. It is responsible for removing inodes
> > + * attached to the buffer from the AIL if they have not been re-logged, as well
> > + * as completing the flush and unlocking the inode.
> > + */
> > +void
> > +xfs_iflush_done(
> > +	struct xfs_buf		*bp)
> > +{
> > +	struct xfs_log_item	*lip, *n;
> > +	LIST_HEAD(flushed_inodes);
> > +	LIST_HEAD(ail_updates);
> > +
> > +	/*
> > +	 * Pull the attached inodes from the buffer one at a time and take the
> > +	 * appropriate action on them.
> > +	 */
> > +	list_for_each_entry_safe(lip, n, &bp->b_li_list, li_bio_list) {
> > +		struct xfs_inode_log_item *iip = INODE_ITEM(lip);
> > +
> > +		if (xfs_iflags_test(iip->ili_inode, XFS_ISTALE)) {
> > +			xfs_iflush_abort(iip->ili_inode);
> > +			continue;
> > +		}
> > +		if (!iip->ili_last_fields)
> > +			continue;
> > +
> > +		/* Do an unlocked check for needing the AIL lock. */
> > +		if (iip->ili_flush_lsn == lip->li_lsn ||
> > +		    test_bit(XFS_LI_FAILED, &lip->li_flags))
> > +			list_move_tail(&lip->li_bio_list, &ail_updates);
> > +		else
> > +			list_move_tail(&lip->li_bio_list, &flushed_inodes);
>
> Not sure I see the point of having two lists here, particularly since
> this is all based on lockless logic.

It's not lockless - it's all done under the buffer lock which
protects the buffer log item list...

> At the very least it's a subtle
> change in AIL processing logic and I don't think that should be buried
> in a refactoring patch.

I don't think it changes logic at all - what am I missing?

FWIW, I untangled the function this way because the "track dirty
inodes by ordered buffers" patchset completely removes the AIL stuff
- the ail_updates list and the xfs_iflush_ail_updates() function go
away completely and the rest of the refactoring remains unchanged.
i.e. as the commit message says, this change makes follow-on
patches much easier to understand...

Cheers,

Dave.

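The branch re-ordering described in the message above can be checked with a small userspace model. This is only a sketch under stated assumptions: `toy_item`, `toy_old_order()` and `toy_new_order()` are hypothetical names, the `ili_logged` test from the older code is left out, and nothing here models what `xfs_ail_delete_one()` itself does to `XFS_LI_FAILED`. It shows only that swapping the `if`/`else` arms picks the same action for every item.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two LSN fields the snippets compare. */
struct toy_item {
	int64_t	flush_lsn;	/* LSN recorded when the flush started */
	int64_t	lsn;		/* current LSN of the item in the toy AIL */
};

enum toy_action { TOY_AIL_DELETE, TOY_CLEAR_FAILED };

/* Older ordering: unchanged LSN handled first, relogged item in the else. */
static enum toy_action toy_old_order(const struct toy_item *it)
{
	if (it->lsn == it->flush_lsn)
		return TOY_AIL_DELETE;
	else
		return TOY_CLEAR_FAILED;
}

/* Newer ordering: relogged item detected first, then an early continue. */
static enum toy_action toy_new_order(const struct toy_item *it)
{
	if (it->flush_lsn != it->lsn)
		return TOY_CLEAR_FAILED;
	return TOY_AIL_DELETE;
}

/* Returns true if both orderings choose the same action for every item. */
static bool toy_orders_agree(const struct toy_item *items, int n)
{
	for (int i = 0; i < n; i++)
		if (toy_old_order(&items[i]) != toy_new_order(&items[i]))
			return false;
	return true;
}
```

Whether the failed flag ends up cleared on the delete path is a separate question that depends on the real `xfs_ail_delete_one()`, which this model deliberately does not cover.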
On Wed, Jun 10, 2020 at 08:14:31AM +1000, Dave Chinner wrote:
> On Tue, Jun 09, 2020 at 09:12:49AM -0400, Brian Foster wrote:
> > On Thu, Jun 04, 2020 at 05:46:05PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > >
> > > xfs_iflush_done() does 3 distinct operations to the inodes attached
> > > to the buffer. Separate these operations out into functions so that
> > > it is easier to modify these operations independently in future.
> > >
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  fs/xfs/xfs_inode_item.c | 154 +++++++++++++++++++++-------------------
> > >  1 file changed, 81 insertions(+), 73 deletions(-)
> > >
> > > diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
> > > index dee7385466f83..3894d190ea5b9 100644
> > > --- a/fs/xfs/xfs_inode_item.c
> > > +++ b/fs/xfs/xfs_inode_item.c
> > > @@ -640,101 +640,64 @@ xfs_inode_item_destroy(
> > >
> > >
> > >  /*
> > > - * This is the inode flushing I/O completion routine. It is called
> > > - * from interrupt level when the buffer containing the inode is
> > > - * flushed to disk. It is responsible for removing the inode item
> > > - * from the AIL if it has not been re-logged, and unlocking the inode's
> > > - * flush lock.
> > > - *
> > > - * To reduce AIL lock traffic as much as possible, we scan the buffer log item
> > > - * list for other inodes that will run this function. We remove them from the
> > > - * buffer list so we can process all the inode IO completions in one AIL lock
> > > - * traversal.
> > > - *
> > > - * Note: Now that we attach the log item to the buffer when we first log the
> > > - * inode in memory, we can have unflushed inodes on the buffer list here. These
> > > - * inodes will have a zero ili_last_fields, so skip over them here.
> > > + * We only want to pull the item from the AIL if it is actually there
> > > + * and its location in the log has not changed since we started the
> > > + * flush. Thus, we only bother if the inode's lsn has not changed.
> > >   */
> > >  void
> > > -xfs_iflush_done(
> > > -	struct xfs_buf		*bp)
> > > +xfs_iflush_ail_updates(
> > > +	struct xfs_ail		*ailp,
> > > +	struct list_head	*list)
> > >  {
> > > -	struct xfs_inode_log_item *iip;
> > > -	struct xfs_log_item	*lip, *n;
> > > -	struct xfs_ail		*ailp = bp->b_mount->m_ail;
> > > -	int			need_ail = 0;
> > > -	LIST_HEAD(tmp);
> > > +	struct xfs_log_item	*lip;
> > > +	xfs_lsn_t		tail_lsn = 0;
> > >
> > > -	/*
> > > -	 * Pull the attached inodes from the buffer one at a time and take the
> > > -	 * appropriate action on them.
> > > -	 */
> > > -	list_for_each_entry_safe(lip, n, &bp->b_li_list, li_bio_list) {
> > > -		iip = INODE_ITEM(lip);
> > > +	/* this is an opencoded batch version of xfs_trans_ail_delete */
> > > +	spin_lock(&ailp->ail_lock);
> > > +	list_for_each_entry(lip, list, li_bio_list) {
> > > +		xfs_lsn_t	lsn;
> > >
> > > -		if (xfs_iflags_test(iip->ili_inode, XFS_ISTALE)) {
> > > -			xfs_iflush_abort(iip->ili_inode);
> > > +		if (INODE_ITEM(lip)->ili_flush_lsn != lip->li_lsn) {
> > > +			clear_bit(XFS_LI_FAILED, &lip->li_flags);
> > >  			continue;
> > >  		}
> >
> > That seems like strange logic. Shouldn't we clear LI_FAILED regardless?
>
> It's the same logic as before this patch series:
>
> 	if (INODE_ITEM(blip)->ili_logged &&
> 	    blip->li_lsn == INODE_ITEM(blip)->ili_flush_lsn) {
> 		/*
> 		 * xfs_ail_update_finish() only cares about the
> 		 * lsn of the first tail item removed, any
> 		 * others will be at the same or higher lsn so
> 		 * we just ignore them.
> 		 */
> 		xfs_lsn_t lsn = xfs_ail_delete_one(ailp, blip);
> 		if (!tail_lsn && lsn)
> 			tail_lsn = lsn;
> 	} else {
> 		xfs_clear_li_failed(blip);
> 	}
>
> I've just re-ordered it to check for relogged inodes first instead
> of handling that in the else branch.
>

Hmm.. I guess I'm confused why the logic seems to be swizzled around. An
earlier patch lifted the bit clear outside of this check, then we seem
to put it back in place in a different order for some reason..?

> i.e. we do clear XFS_LI_FAILED always: xfs_ail_delete_one() does it
> for the log items that are being removed from the AIL....
>
> > > +/*
> > > + * Inode buffer IO completion routine. It is responsible for removing inodes
> > > + * attached to the buffer from the AIL if they have not been re-logged, as well
> > > + * as completing the flush and unlocking the inode.
> > > + */
> > > +void
> > > +xfs_iflush_done(
> > > +	struct xfs_buf		*bp)
> > > +{
> > > +	struct xfs_log_item	*lip, *n;
> > > +	LIST_HEAD(flushed_inodes);
> > > +	LIST_HEAD(ail_updates);
> > > +
> > > +	/*
> > > +	 * Pull the attached inodes from the buffer one at a time and take the
> > > +	 * appropriate action on them.
> > > +	 */
> > > +	list_for_each_entry_safe(lip, n, &bp->b_li_list, li_bio_list) {
> > > +		struct xfs_inode_log_item *iip = INODE_ITEM(lip);
> > > +
> > > +		if (xfs_iflags_test(iip->ili_inode, XFS_ISTALE)) {
> > > +			xfs_iflush_abort(iip->ili_inode);
> > > +			continue;
> > > +		}
> > > +		if (!iip->ili_last_fields)
> > > +			continue;
> > > +
> > > +		/* Do an unlocked check for needing the AIL lock. */
> > > +		if (iip->ili_flush_lsn == lip->li_lsn ||
> > > +		    test_bit(XFS_LI_FAILED, &lip->li_flags))
> > > +			list_move_tail(&lip->li_bio_list, &ail_updates);
> > > +		else
> > > +			list_move_tail(&lip->li_bio_list, &flushed_inodes);
> >
> > Not sure I see the point of having two lists here, particularly since
> > this is all based on lockless logic.
>
> It's not lockless - it's all done under the buffer lock which
> protects the buffer log item list...
>
> > At the very least it's a subtle
> > change in AIL processing logic and I don't think that should be buried
> > in a refactoring patch.
>
> I don't think it changes logic at all - what am I missing?
>

I'm referring to the fact that we no longer check the lsn of each
(flushed) log item attached to the buffer under the ail lock. Note that
I am not saying it's necessarily wrong, but rather that IMO it's too
subtle a change to silently squash into a refactoring patch.

> FWIW, I untangled the function this way because the "track dirty
> inodes by ordered buffers" patchset completely removes the AIL stuff
> - the ail_updates list and the xfs_iflush_ail_updates() function go
> away completely and the rest of the refactoring remains unchanged.
> i.e. as the commit message says, this change makes follow-on
> patches much easier to understand...
>

The general function breakdown seems fine to me. I find the multiple
list processing to be a bit overdone, particularly if it doesn't serve a
current functional purpose. If the purpose is to support a future patch
series, I'd suggest to continue using the existing logic of moving all
flushed inodes to a single list and leave the separate list bits to the
start of the series where it's useful so it's possible to review with
the associated context (or alternatively just defer the entire patch).

Brian

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>

On Wed, Jun 10, 2020 at 09:08:33AM -0400, Brian Foster wrote:
> On Wed, Jun 10, 2020 at 08:14:31AM +1000, Dave Chinner wrote:
> > On Tue, Jun 09, 2020 at 09:12:49AM -0400, Brian Foster wrote:
> > > On Thu, Jun 04, 2020 at 05:46:05PM +1000, Dave Chinner wrote:
> > > > -		if (xfs_iflags_test(iip->ili_inode, XFS_ISTALE)) {
> > > > -			xfs_iflush_abort(iip->ili_inode);
> > > > +		if (INODE_ITEM(lip)->ili_flush_lsn != lip->li_lsn) {
> > > > +			clear_bit(XFS_LI_FAILED, &lip->li_flags);
> > > >  			continue;
> > > >  		}
> > >
> > > That seems like strange logic. Shouldn't we clear LI_FAILED regardless?
> >
> > It's the same logic as before this patch series:
> >
> > 	if (INODE_ITEM(blip)->ili_logged &&
> > 	    blip->li_lsn == INODE_ITEM(blip)->ili_flush_lsn) {
> > 		/*
> > 		 * xfs_ail_update_finish() only cares about the
> > 		 * lsn of the first tail item removed, any
> > 		 * others will be at the same or higher lsn so
> > 		 * we just ignore them.
> > 		 */
> > 		xfs_lsn_t lsn = xfs_ail_delete_one(ailp, blip);
> > 		if (!tail_lsn && lsn)
> > 			tail_lsn = lsn;
> > 	} else {
> > 		xfs_clear_li_failed(blip);
> > 	}
> >
> > I've just re-ordered it to check for relogged inodes first instead
> > of handling that in the else branch.
>
> Hmm.. I guess I'm confused why the logic seems to be swizzled around. An
> earlier patch lifted the bit clear outside of this check, then we seem
> to put it back in place in a different order for some reason..?

Oh, you're right - xfs_ail_delete_one() doesn't do that anymore - it
got pulled up into xfs_trans_ail_delete() instead. So much stuff has
changed in this patchset and I've largely moved on to all the
followup stuff now, I'm starting to lose track of what this patchset
does and the reasons why I did stuff a couple of months ago...

I'll fix that up.

> > > > + * Inode buffer IO completion routine. It is responsible for removing inodes
> > > > + * attached to the buffer from the AIL if they have not been re-logged, as well
> > > > + * as completing the flush and unlocking the inode.
> > > > + */
> > > > +void
> > > > +xfs_iflush_done(
> > > > +	struct xfs_buf		*bp)
> > > > +{
> > > > +	struct xfs_log_item	*lip, *n;
> > > > +	LIST_HEAD(flushed_inodes);
> > > > +	LIST_HEAD(ail_updates);
> > > > +
> > > > +	/*
> > > > +	 * Pull the attached inodes from the buffer one at a time and take the
> > > > +	 * appropriate action on them.
> > > > +	 */
> > > > +	list_for_each_entry_safe(lip, n, &bp->b_li_list, li_bio_list) {
> > > > +		struct xfs_inode_log_item *iip = INODE_ITEM(lip);
> > > > +
> > > > +		if (xfs_iflags_test(iip->ili_inode, XFS_ISTALE)) {
> > > > +			xfs_iflush_abort(iip->ili_inode);
> > > > +			continue;
> > > > +		}
> > > > +		if (!iip->ili_last_fields)
> > > > +			continue;
> > > > +
> > > > +		/* Do an unlocked check for needing the AIL lock. */
> > > > +		if (iip->ili_flush_lsn == lip->li_lsn ||
> > > > +		    test_bit(XFS_LI_FAILED, &lip->li_flags))
> > > > +			list_move_tail(&lip->li_bio_list, &ail_updates);
> > > > +		else
> > > > +			list_move_tail(&lip->li_bio_list, &flushed_inodes);
> > >
> > > Not sure I see the point of having two lists here, particularly since
> > > this is all based on lockless logic.
> >
> > It's not lockless - it's all done under the buffer lock which
> > protects the buffer log item list...
> >
> > > At the very least it's a subtle
> > > change in AIL processing logic and I don't think that should be buried
> > > in a refactoring patch.
> >
> > I don't think it changes logic at all - what am I missing?
> >
>
> I'm referring to the fact that we no longer check the lsn of each
> (flushed) log item attached to the buffer under the ail lock.

That whole loop in xfs_iflush_ail_updates() runs under the AIL
lock, so it does the right thing for anything that is moved to the
"ail_updates" list.

If we win the unlocked race (li_lsn does not change) then we move
the inode to the ail update list and it gets rechecked under the AIL
lock and does the right thing. If we lose the race (li_lsn changes)
then the inode has been redirtied and we *don't need to check it
under the AIL* - all we need to do is leave it attached to the
buffer.

This is the same as the old code: win the race, need_ail is
incremented and we recheck under the AIL lock. Lose the race and
we don't recheck under the AIL because we don't need to. This
happened less under the old code, because it typically only happened
with single dirty inodes on a cluster buffer (think directory inode
under long running large directory modification operations), but
that race most definitely existed and the code most definitely
handled it correctly.

Keep in mind that this inode redirtying/AIL repositioning race can
even occur /after/ we've locked and removed items from the AIL but
before we've run xfs_iflush_finish(). i.e. we can remove it from the
AIL but by the time xfs_iflush_finish() runs it's back in the AIL.

> Note that
> I am not saying it's necessarily wrong, but rather that IMO it's too
> subtle a change to silently squash into a refactoring patch.

Except it isn't a change at all. The same subtle issue exists in the
code before this patch. It's just that this refactoring makes subtle
race conditions that were previously unknown to reviewers so much
more obvious they can now see them clearly. That tells me the code
is much improved by this refactoring, not that there's a problem
that needs reworking....

> > FWIW, I untangled the function this way because the "track dirty
> > inodes by ordered buffers" patchset completely removes the AIL stuff
> > - the ail_updates list and the xfs_iflush_ail_updates() function go
> > away completely and the rest of the refactoring remains unchanged.
> > i.e. as the commit message says, this change makes follow-on
> > patches much easier to understand...
> >
>
> The general function breakdown seems fine to me. I find the multiple
> list processing to be a bit overdone, particularly if it doesn't serve a
> current functional purpose. If the purpose is to support a future patch
> series, I'd suggest to continue using the existing logic of moving all
> flushed inodes to a single list and leave the separate list bits to the
> start of the series where it's useful so it's possible to review with
> the associated context (or alternatively just defer the entire patch).

That's how I originally did it, and it was a mess. It didn't
separate cleanly at all, and didn't make future patches much easier
at all. Hence I don't think reworking the patch just to look
different gains us anything at this point...

Cheers,

Dave.

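The "unlocked check, then recheck under the lock" pattern described in the message above can be sketched in userspace C. This is a minimal model under stated assumptions, not kernel code: `toy_ail`, `toy_item` and both helpers are hypothetical names, a pthread mutex stands in for the AIL spinlock, and the sketch is single-threaded, so the "relog" that loses the race is simulated by changing `lsn` between the two steps. In genuinely concurrent code the unlocked read would also need an atomic or `READ_ONCE`-style access.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the AIL and a log item. */
struct toy_ail {
	pthread_mutex_t	lock;
	int		removed;	/* how many items were taken out */
};

struct toy_item {
	int64_t	flush_lsn;	/* LSN recorded when the flush started */
	int64_t	lsn;		/* current LSN; changes if the item is relogged */
	bool	in_ail;
};

/* Step 1: cheap hint taken without the lock; it may be stale. */
static bool toy_maybe_needs_ail(const struct toy_item *it)
{
	return it->flush_lsn == it->lsn;
}

/* Step 2: authoritative recheck and removal with the lock held. */
static bool toy_ail_remove_if_unchanged(struct toy_ail *ail,
					struct toy_item *it)
{
	bool removed = false;

	pthread_mutex_lock(&ail->lock);
	if (it->in_ail && it->flush_lsn == it->lsn) {
		it->in_ail = false;
		ail->removed++;
		removed = true;
	}
	pthread_mutex_unlock(&ail->lock);
	return removed;
}
```

An item whose hint is false is never passed to the locked step at all, which is the case the message describes as "we don't need to check it under the AIL". An item whose hint was true but which is relogged before the lock is taken is caught by the recheck and left in place.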
On Thu, Jun 11, 2020 at 10:16:22AM +1000, Dave Chinner wrote:
> On Wed, Jun 10, 2020 at 09:08:33AM -0400, Brian Foster wrote:
> > On Wed, Jun 10, 2020 at 08:14:31AM +1000, Dave Chinner wrote:
> > > On Tue, Jun 09, 2020 at 09:12:49AM -0400, Brian Foster wrote:
> > > > On Thu, Jun 04, 2020 at 05:46:05PM +1000, Dave Chinner wrote:
...
> >
> > I'm referring to the fact that we no longer check the lsn of each
> > (flushed) log item attached to the buffer under the ail lock.
>
> That whole loop in xfs_iflush_ail_updates() runs under the AIL
> lock, so it does the right thing for anything that is moved to the
> "ail_updates" list.
>
> If we win the unlocked race (li_lsn does not change) then we move
> the inode to the ail update list and it gets rechecked under the AIL
> lock and does the right thing. If we lose the race (li_lsn changes)
> then the inode has been redirtied and we *don't need to check it
> under the AIL* - all we need to do is leave it attached to the
> buffer.
>
> This is the same as the old code: win the race, need_ail is
> incremented and we recheck under the AIL lock. Lose the race and
> we don't recheck under the AIL because we don't need to. This
> happened less under the old code, because it typically only happened
> with single dirty inodes on a cluster buffer (think directory inode
> under long running large directory modification operations), but
> that race most definitely existed and the code most definitely
> handled it correctly.
>
> Keep in mind that this inode redirtying/AIL repositioning race can
> even occur /after/ we've locked and removed items from the AIL but
> before we've run xfs_iflush_finish(). i.e. we can remove it from the
> AIL but by the time xfs_iflush_finish() runs it's back in the AIL.
>

All of the above would make a nice commit log for an independent patch.
;) Note again that I wasn't suggesting the logic was incorrect...

> > Note that
> > I am not saying it's necessarily wrong, but rather that IMO it's too
> > subtle a change to silently squash into a refactoring patch.
>
> Except it isn't a change at all. The same subtle issue exists in the
> code before this patch. It's just that this refactoring makes subtle
> race conditions that were previously unknown to reviewers so much
> more obvious they can now see them clearly. That tells me the code
> is much improved by this refactoring, not that there's a problem
> that needs reworking....
>

This patch elevates a bit of code from effectively being an (ail) lock
avoidance optimization to essentially per-item filtering logic without
any explanation beyond facilitating future modifications. Independent of
whether it's correct, this is not purely a refactoring change IMO.

> > > FWIW, I untangled the function this way because the "track dirty
> > > inodes by ordered buffers" patchset completely removes the AIL stuff
> > > - the ail_updates list and the xfs_iflush_ail_updates() function go
> > > away completely and the rest of the refactoring remains unchanged.
> > > i.e. as the commit message says, this change makes follow-on
> > > patches much easier to understand...
> > >
> >
> > The general function breakdown seems fine to me. I find the multiple
> > list processing to be a bit overdone, particularly if it doesn't serve a
> > current functional purpose. If the purpose is to support a future patch
> > series, I'd suggest to continue using the existing logic of moving all
> > flushed inodes to a single list and leave the separate list bits to the
> > start of the series where it's useful so it's possible to review with
> > the associated context (or alternatively just defer the entire patch).
>
> That's how I originally did it, and it was a mess. It didn't
> separate cleanly at all, and didn't make future patches much easier
> at all. Hence I don't think reworking the patch just to look
> different gains us anything at this point...
>

I find that hard to believe. This patch splits the buffer list into two
lists, processes the first one, immediately combines it with the second,
then processes the second which is no different from the single list
that was constructed by the original code. The only reasons I can see
for this kind of churn is either to address some kind of performance or
efficiency issue or if the lists are used for further changes. The
former is not a documented reason and there's no context for the latter
because it's apparently part of some future series.

TBH, I think this patch should probably be broken down into two or three
independent patches anyways. What's the issue with something like the
appended diff (on top of this patch) in the meantime? If the multiple
list logic is truly necessary, reintroduce it when it's used so it's
actually reviewable..

Brian

--- 8< ---

diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 3894d190ea5b..83580e204560 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -718,8 +718,8 @@ xfs_iflush_done(
 	struct xfs_buf		*bp)
 {
 	struct xfs_log_item	*lip, *n;
-	LIST_HEAD(flushed_inodes);
-	LIST_HEAD(ail_updates);
+	int			need_ail = 0;
+	LIST_HEAD(tmp);
 
 	/*
 	 * Pull the attached inodes from the buffer one at a time and take the
@@ -732,25 +732,24 @@ xfs_iflush_done(
 			xfs_iflush_abort(iip->ili_inode);
 			continue;
 		}
+
 		if (!iip->ili_last_fields)
 			continue;
 
-		/* Do an unlocked check for needing the AIL lock. */
+		list_move_tail(&lip->li_bio_list, &tmp);
+
+		/* Do an unlocked check for needing AIL processing */
 		if (iip->ili_flush_lsn == lip->li_lsn ||
 		    test_bit(XFS_LI_FAILED, &lip->li_flags))
-			list_move_tail(&lip->li_bio_list, &ail_updates);
-		else
-			list_move_tail(&lip->li_bio_list, &flushed_inodes);
+			need_ail++;
 	}
 
-	if (!list_empty(&ail_updates)) {
-		xfs_iflush_ail_updates(bp->b_mount->m_ail, &ail_updates);
-		list_splice_tail(&ail_updates, &flushed_inodes);
-	}
+	if (need_ail)
+		xfs_iflush_ail_updates(bp->b_mount->m_ail, &tmp);
 
-	xfs_iflush_finish(bp, &flushed_inodes);
-	if (!list_empty(&flushed_inodes))
-		list_splice_tail(&flushed_inodes, &bp->b_li_list);
+	xfs_iflush_finish(bp, &tmp);
+	if (!list_empty(&tmp))
+		list_splice_tail(&tmp, &bp->b_li_list);
 }
 
 /*

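The two shapes being compared in the message above (two lists spliced together versus one list plus a `need_ail` counter) can be put side by side in a toy model. This is a sketch under stated assumptions: arrays of integer ids stand in for the kernel's linked lists, `needs_ail` stands in for the result of the unlocked check, ids are assumed unique, and all `toy_*` names are hypothetical. It records two things for each variant: which items reach the finish step and in what order, and how many items the AIL step would walk with the lock held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX 16	/* arbitrary capacity for this sketch */

struct toy_item {
	int	id;
	bool	needs_ail;	/* outcome of the unlocked LSN/FAILED check */
};

struct toy_result {
	int	order[TOY_MAX];		/* ids as the finish step sees them */
	size_t	count;
	size_t	walked_under_lock;	/* items the AIL step iterates over */
};

/* Two-list variant: filter into ail/flushed, then append ail to flushed. */
static struct toy_result toy_two_lists(const struct toy_item *in, size_t n)
{
	struct toy_result res = { .count = 0, .walked_under_lock = 0 };
	int ail[TOY_MAX];
	size_t na = 0;

	assert(n <= TOY_MAX);
	for (size_t i = 0; i < n; i++) {
		if (in[i].needs_ail)
			ail[na++] = in[i].id;
		else
			res.order[res.count++] = in[i].id;
	}
	res.walked_under_lock = na;	/* only the filtered items */
	for (size_t i = 0; i < na; i++)
		res.order[res.count++] = ail[i];
	return res;
}

/* Single-list variant: one list, plus a counter deciding whether to lock. */
static struct toy_result toy_single_list(const struct toy_item *in, size_t n)
{
	struct toy_result res = { .count = 0, .walked_under_lock = 0 };
	size_t need_ail = 0;

	assert(n <= TOY_MAX);
	for (size_t i = 0; i < n; i++) {
		res.order[res.count++] = in[i].id;
		if (in[i].needs_ail)
			need_ail++;
	}
	res.walked_under_lock = need_ail ? n : 0;	/* whole list or nothing */
	return res;
}

/* True if both results hold the same ids, ignoring order. */
static bool toy_same_members(const struct toy_result *a,
			     const struct toy_result *b)
{
	if (a->count != b->count)
		return false;
	for (size_t i = 0; i < a->count; i++) {
		bool found = false;

		for (size_t j = 0; j < b->count; j++)
			if (a->order[i] == b->order[j])
				found = true;
		if (!found)
			return false;
	}
	return true;
}
```

In this model both variants hand the same set of items to the finish step. What differs is the order and the number of items visited with the lock held, which appears to be the per-item filtering the thread is arguing about.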
On Thu, Jun 11, 2020 at 10:07:09AM -0400, Brian Foster wrote:
>
> TBH, I think this patch should probably be broken down into two or three
> independent patches anyways.

To what end? The patch is already small, it's simple to understand
and it's been tested. What does breaking it up into a bunch more
smaller patches actually gain us?

It means hours more work on my side without any change in the end
result. It's -completely wasted effort- if all I'm doing this for is
to get you to issue a RVB on it. Fine grained patches do not come
for free, and in a patch series that is already 30 patches long
making it even longer just increases the time and resources it costs
*me* to maintain it until it is merged.

> What's the issue with something like the
> appended diff (on top of this patch) in the meantime? If the multiple
> list logic is truly necessary, reintroduce it when it's used so it's
> actually reviewable..

Nothing. Except it causes conflicts further through my patch set
which do the work of removing this AIL specific code. IOWs, it just
*increases the amount of work I have to do* without actually
providing any benefit to anyone...

-Dave.

On Mon, Jun 15, 2020 at 4:50 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Thu, Jun 11, 2020 at 10:07:09AM -0400, Brian Foster wrote:
> >
> > TBH, I think this patch should probably be broken down into two or three
> > independent patches anyways.
>
> To what end? The patch is already small, it's simple to understand
> and it's been tested. What does breaking it up into a bunch more
> smaller patches actually gain us?
>
> It means hours more work on my side without any change in the end
> result. It's -completely wasted effort- if all I'm doing this for is
> to get you to issue a RVB on it. Fine grained patches do not come
> for free, and in a patch series that is already 30 patches long
> making it even longer just increases the time and resources it costs
> *me* to maintain it until it is merged.
>

This links back to a conversation we started about improving the
review process. One of the questions was regarding RVB being too "binary".
If I am new to the subsystem, obviously my RVB weighs less, but
both Darrick and Dave expressed their desire to get review by more
eyes to double check the "small details".

To that end, Darrick has suggested the RVB be accompanied with
"verified correctness" "verified design" or whatnot.

So instead of a binary RVB, Brian could express his review
outcome in a machine friendly way, because this:
"TBH, I think this patch should probably be broken down..."
would be interpreted quite differently depending on culture.

My interpretation is: "ACK on correctness", "ACK on design",
"not happy about patch breakdown", "not a NACK", but I could be wrong.

Then, instead of a rigid rule for maintainer to require two RVB per patch,
we can employ more fine grained rules, for example:
- No NACKs
- Two RVB for correctness
- One RVB for design
- etc..

Also, it could help to write down review rules for the subsystem
in a bureaucratic document that can be agreed on, so that not every
patch needs to have the discussion about whether breaking this patch
up is mandatory or not.

There is a big difference between saying:
"this patch is too big for me to review, I will not be doing as good a
job of review if it isn't split into patches"
and
"this patch has already been reviewed, I already know everything there
is to know about it, there are no bisect-ability issues, but for
aesthetics, I think it should be broken down".

I am not saying that the latter opinion should not be voiced, but when
said it should be said explicitly.

High coding standards have gotten xfs very far, but failing to maintain
a healthy balance with pragmatism will inevitably come with a price.
Need I remind you that the issue that Dave is fixing has been affecting
real users and it was reported over three years ago?

Thanks,
Amir.
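The example rules listed above ("No NACKs", "Two RVB for correctness", "One RVB for design") could be written down in a machine-checkable form roughly as follows. This is only a minimal sketch under stated assumptions: the names `review_aspect`, `struct review` and `patch_may_merge()` are hypothetical, invented for illustration, and do not correspond to any existing kernel or patchwork tooling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical review aspects; the thread only names "correctness" and "design". */
enum review_aspect {
	REVIEW_CORRECTNESS	= 1 << 0,
	REVIEW_DESIGN		= 1 << 1,
};

/* One reviewer's response to a patch (illustrative, not an existing format). */
struct review {
	unsigned int	acked;	/* bitmask of aspects the reviewer verified */
	bool		nack;	/* reviewer explicitly objects to merging */
};

/*
 * Example policy taken from the message: no NACKs, at least two RVBs for
 * correctness and at least one RVB for design.
 */
bool
patch_may_merge(
	const struct review	*reviews,
	size_t			n)
{
	size_t			correctness = 0;
	size_t			design = 0;
	size_t			i;

	assert(reviews != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (reviews[i].nack)
			return false;
		if (reviews[i].acked & REVIEW_CORRECTNESS)
			correctness++;
		if (reviews[i].acked & REVIEW_DESIGN)
			design++;
	}
	return correctness >= 2 && design >= 1;
}
```

Under this sketch, a comment such as "not happy about patch breakdown" sets neither an ACK bit nor the NACK flag, so it stays visible as prose without blocking the merge by itself.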
On Mon, Jun 15, 2020 at 11:49:57AM +1000, Dave Chinner wrote:
> On Thu, Jun 11, 2020 at 10:07:09AM -0400, Brian Foster wrote:
> >
> > TBH, I think this patch should probably be broken down into two or three
> > independent patches anyways.
>
> To what end? The patch is already small, it's simple to understand
> and it's been tested. What does breaking it up into a bunch more
> smaller patches actually gain us?
>

I think you overestimate the simplicity to somebody who doesn't have
context on whatever upcoming changes you have. I spent more time staring
at this wondering what the list filtering logic was for than I would
have needed to review the entire patch were those changes not included.

> It means hours more work on my side without any change in the end
> result. It's -completely wasted effort- if all I'm doing this for is
> to get you to issue a RVB on it. Fine grained patches do not come
> for free, and in a patch series that is already 30 patches long
> making it even longer just increases the time and resources it costs
> *me* to maintain it until it is merged.
>

Note that I said "two or three" and then sent you a diff that breaks it
down into two. That addresses my concern.

> > What's the issue with something like the
> > appended diff (on top of this patch) in the meantime? If the multiple
> > list logic is truly necessary, reintroduce it when it's used so it's
> > actually reviewable..
>
> Nothing. Except it causes conflicts further through my patch set
> which do the work of removing this AIL specific code. IOWs, it just
> *increases the amount of work I have to do* without actually
> providing any benefit to anyone...
>

Reapply the list filtering logic (reverting the same diff I already
sent) at the beginning of your upcoming series that uses it. I sent the
diff as a courtesy because you seem to be rather frustrated wrt any
suggestion to change this series, but this seems like a standard case of
misplaced code to me with a simple fix.

The fact that this is used somehow or another in a series that is so far
unposted and unreviewed is not a valid justification IMO. I really don't
understand what the issue is here wrt moving the changes to where
they're used.

Brian

> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index dee7385466f83..3894d190ea5b9 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -640,101 +640,64 @@ xfs_inode_item_destroy(
 /*
- * This is the inode flushing I/O completion routine. It is called
- * from interrupt level when the buffer containing the inode is
- * flushed to disk. It is responsible for removing the inode item
- * from the AIL if it has not been re-logged, and unlocking the inode's
- * flush lock.
- *
- * To reduce AIL lock traffic as much as possible, we scan the buffer log item
- * list for other inodes that will run this function. We remove them from the
- * buffer list so we can process all the inode IO completions in one AIL lock
- * traversal.
- *
- * Note: Now that we attach the log item to the buffer when we first log the
- * inode in memory, we can have unflushed inodes on the buffer list here. These
- * inodes will have a zero ili_last_fields, so skip over them here.
+ * We only want to pull the item from the AIL if it is actually there
+ * and its location in the log has not changed since we started the
+ * flush. Thus, we only bother if the inode's lsn has not changed.
  */
 void
-xfs_iflush_done(
-	struct xfs_buf		*bp)
+xfs_iflush_ail_updates(
+	struct xfs_ail		*ailp,
+	struct list_head	*list)
 {
-	struct xfs_inode_log_item *iip;
-	struct xfs_log_item	*lip, *n;
-	struct xfs_ail		*ailp = bp->b_mount->m_ail;
-	int			need_ail = 0;
-	LIST_HEAD(tmp);
+	struct xfs_log_item	*lip;
+	xfs_lsn_t		tail_lsn = 0;
-	/*
-	 * Pull the attached inodes from the buffer one at a time and take the
-	 * appropriate action on them.
-	 */
-	list_for_each_entry_safe(lip, n, &bp->b_li_list, li_bio_list) {
-		iip = INODE_ITEM(lip);
+	/* this is an opencoded batch version of xfs_trans_ail_delete */
+	spin_lock(&ailp->ail_lock);
+	list_for_each_entry(lip, list, li_bio_list) {
+		xfs_lsn_t	lsn;
-		if (xfs_iflags_test(iip->ili_inode, XFS_ISTALE)) {
-			xfs_iflush_abort(iip->ili_inode);
+		if (INODE_ITEM(lip)->ili_flush_lsn != lip->li_lsn) {
+			clear_bit(XFS_LI_FAILED, &lip->li_flags);
 			continue;
 		}
-		if (!iip->ili_last_fields)
-			continue;
-
-		list_move_tail(&lip->li_bio_list, &tmp);
-
-		/* Do an unlocked check for needing the AIL lock. */
-		if (iip->ili_flush_lsn == lip->li_lsn ||
-		    test_bit(XFS_LI_FAILED, &lip->li_flags))
-			need_ail++;
+		lsn = xfs_ail_delete_one(ailp, lip);
+		if (!tail_lsn && lsn)
+			tail_lsn = lsn;
 	}
+	xfs_ail_update_finish(ailp, tail_lsn);
+}
-	/*
-	 * We only want to pull the item from the AIL if it is actually there
-	 * and its location in the log has not changed since we started the
-	 * flush. Thus, we only bother if the inode's lsn has not changed.
-	 */
-	if (need_ail) {
-		xfs_lsn_t	tail_lsn = 0;
-
-		/* this is an opencoded batch version of xfs_trans_ail_delete */
-		spin_lock(&ailp->ail_lock);
-		list_for_each_entry(lip, &tmp, li_bio_list) {
-			clear_bit(XFS_LI_FAILED, &lip->li_flags);
-			if (lip->li_lsn == INODE_ITEM(lip)->ili_flush_lsn) {
-				xfs_lsn_t lsn = xfs_ail_delete_one(ailp, lip);
-				if (!tail_lsn && lsn)
-					tail_lsn = lsn;
-			}
-		}
-		xfs_ail_update_finish(ailp, tail_lsn);
-	}
+/*
+ * Walk the list of inodes that have completed their IOs. If they are clean
+ * remove them from the list and dissociate them from the buffer. Buffers that
+ * are still dirty remain linked to the buffer and on the list. Caller must
+ * handle them appropriately.
+ */
+void
+xfs_iflush_finish(
+	struct xfs_buf		*bp,
+	struct list_head	*list)
+{
+	struct xfs_log_item	*lip, *n;
-	/*
-	 * Clean up and unlock the flush lock now we are done. We can clear the
-	 * ili_last_fields bits now that we know that the data corresponding to
-	 * them is safely on disk.
-	 */
-	list_for_each_entry_safe(lip, n, &tmp, li_bio_list) {
+	list_for_each_entry_safe(lip, n, list, li_bio_list) {
+		struct xfs_inode_log_item *iip = INODE_ITEM(lip);
 		bool	drop_buffer = false;
-		list_del_init(&lip->li_bio_list);
-		iip = INODE_ITEM(lip);
-
 		spin_lock(&iip->ili_lock);
 		/*
 		 * Remove the reference to the cluster buffer if the inode is
-		 * clean in memory. Drop the buffer reference once we've dropped
-		 * the locks we hold. If the inode is dirty in memory, we need
-		 * to put the inode item back on the buffer list for another
-		 * pass through the flush machinery.
+		 * clean in memory and drop the buffer reference once we've
+		 * dropped the locks we hold.
 		 */
 		ASSERT(iip->ili_item.li_buf == bp);
 		if (!iip->ili_fields) {
 			iip->ili_item.li_buf = NULL;
+			list_del_init(&lip->li_bio_list);
 			drop_buffer = true;
-		} else {
-			list_add(&lip->li_bio_list, &bp->b_li_list);
 		}
 		iip->ili_last_fields = 0;
 		iip->ili_flush_lsn = 0;
@@ -745,6 +708,51 @@ xfs_iflush_done(
 	}
 }
+/*
+ * Inode buffer IO completion routine. It is responsible for removing inodes
+ * attached to the buffer from the AIL if they have not been re-logged, as well
+ * as completing the flush and unlocking the inode.
+ */
+void
+xfs_iflush_done(
+	struct xfs_buf		*bp)
+{
+	struct xfs_log_item	*lip, *n;
+	LIST_HEAD(flushed_inodes);
+	LIST_HEAD(ail_updates);
+
+	/*
+	 * Pull the attached inodes from the buffer one at a time and take the
+	 * appropriate action on them.
+	 */
+	list_for_each_entry_safe(lip, n, &bp->b_li_list, li_bio_list) {
+		struct xfs_inode_log_item *iip = INODE_ITEM(lip);
+
+		if (xfs_iflags_test(iip->ili_inode, XFS_ISTALE)) {
+			xfs_iflush_abort(iip->ili_inode);
+			continue;
+		}
+		if (!iip->ili_last_fields)
+			continue;
+
+		/* Do an unlocked check for needing the AIL lock. */
+		if (iip->ili_flush_lsn == lip->li_lsn ||
+		    test_bit(XFS_LI_FAILED, &lip->li_flags))
+			list_move_tail(&lip->li_bio_list, &ail_updates);
+		else
+			list_move_tail(&lip->li_bio_list, &flushed_inodes);
+	}
+
+	if (!list_empty(&ail_updates)) {
+		xfs_iflush_ail_updates(bp->b_mount->m_ail, &ail_updates);
+		list_splice_tail(&ail_updates, &flushed_inodes);
+	}
+
+	xfs_iflush_finish(bp, &flushed_inodes);
+	if (!list_empty(&flushed_inodes))
+		list_splice_tail(&flushed_inodes, &bp->b_li_list);
+}
+
 /*
  * This is the inode flushing abort routine. It is called from xfs_iflush when
  * the filesystem is shutting down to clean up the inode state. It is