Message ID | 20170608140014.25132-3-cmaiolino@redhat.com (mailing list archive) |
---|---|
State | Superseded, archived |
On Thu, Jun 08, 2017 at 04:00:14PM +0200, Carlos Maiolino wrote: > When a buffer has been failed during writeback, the inode items into it > are kept flush locked, and are never resubmitted due the flush lock, so, > if any buffer fails to be written, the items in AIL are never written to > disk and never unlocked. > > This causes unmount operation to hang due these items flush locked in AIL, > but this also causes the items in AIL to never be written back, even when > the IO device comes back to normal. > > I've been testing this patch with a DM-thin device, creating a > filesystem larger than the real device. > > When writing enough data to fill the DM-thin device, XFS receives ENOSPC > errors from the device, and keep spinning on xfsaild (when 'retry > forever' configuration is set). > > At this point, the filesystem can not be unmounted because of the flush locked > items in AIL, but worse, the items in AIL are never retried at all > (once xfs_inode_item_push() will skip the items that are flush locked), > even if the underlying DM-thin device is expanded to the proper size. > > This patch fixes both cases, retrying any item that has been failed > previously, using the infra-structure provided by the previous patch. > > Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> > --- > V2: > - Fix XFS_LI_FAILED flag removal > - Use atomic operations to set and clear XFS_LI_FAILED flag > - Remove check for XBF_WRITE_FAIL in xfs_inode_item_push > - Add more comments to the code > - Add a helper function to resubmit the failed buffers, so this > can be also used in dquot system without duplicating code > > V3: > - drop xfs_imap_to_bp call using a pointer in the log item to > hold the buffer address, fixing a possible infinite loop if > xfs_map_to_bp returns -EIO > - use xa_lock instead of atomic operations to handle log item > flags > - Add a hold to the buffer for each log item failed > - move buffer resubmission up in xfs_inode_item_push() > > fs/xfs/xfs_buf_item.c | 39 +++++++++++++++++++++++++++++++++++++++ > fs/xfs/xfs_buf_item.h | 2 ++ > fs/xfs/xfs_inode_item.c | 44 +++++++++++++++++++++++++++++++++++++++++--- > fs/xfs/xfs_trans.h | 1 + > fs/xfs/xfs_trans_ail.c | 4 ++-- > 5 files changed, 85 insertions(+), 5 deletions(-) > > diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c > index 523b7a4..9b7bd23 100644 > --- a/fs/xfs/xfs_buf_item.c > +++ b/fs/xfs/xfs_buf_item.c > @@ -1222,3 +1222,42 @@ xfs_buf_iodone( > xfs_trans_ail_delete(ailp, lip, SHUTDOWN_CORRUPT_INCORE); > xfs_buf_item_free(BUF_ITEM(lip)); > } > + > +/* > + * Requeue a failed buffer for writeback > + * > + * Return true if the buffer has been re-queued properly, false otherwise > + */ > +bool > +xfs_buf_resubmit_failed_buffers( > + struct xfs_inode *ip, > + struct xfs_log_item *lip, > + struct list_head *buffer_list) > +{ > + struct xfs_log_item *next; > + struct xfs_buf *bp; > + bool ret; > + > + bp = lip->li_buf; > + > + /* Clear XFS_LI_FAILED flag from all items before resubmit > + * > + * XFS_LI_FAILED set/clear is protected by xa_lock, caller this > + * function already have it acquired > + */ > + for (; lip; lip = next) { > + next = lip->li_bio_list; > + lip->li_flags &= ~XFS_LI_FAILED; > + xfs_buf_rele(bp); We need to check that the log item has LI_FAILED set before we release the buffer because it's possible for the buffer to have a mix of failed and recently flushed (not failed) items. We should set ->li_buf = NULL wherever we clear LI_FAILED as well. 
A couple small inline helpers to fail/unfail log items correctly (i.e., xfs_log_item_[fail|clear]() or some such) might help to make sure we get the reference counting right. We could also assert that the correct lock is held from those helpers. > + } > + > + /* Add this buffer back to the delayed write list */ > + xfs_buf_lock(bp); > + if (!xfs_buf_delwri_queue(bp, buffer_list)) > + ret = false; > + else > + ret = true; > + > + xfs_buf_unlock(bp); > + return ret; > +} > diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h > index f7eba99..32aceae 100644 > --- a/fs/xfs/xfs_buf_item.h > +++ b/fs/xfs/xfs_buf_item.h > @@ -70,6 +70,8 @@ void xfs_buf_attach_iodone(struct xfs_buf *, > xfs_log_item_t *); > void xfs_buf_iodone_callbacks(struct xfs_buf *); > void xfs_buf_iodone(struct xfs_buf *, struct xfs_log_item *); > +bool xfs_buf_resubmit_failed_buffers(struct xfs_inode *, struct xfs_log_item *, > + struct list_head *); > > extern kmem_zone_t *xfs_buf_item_zone; > > diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c > index 08cb7d1..2f5a49e 100644 > --- a/fs/xfs/xfs_inode_item.c > +++ b/fs/xfs/xfs_inode_item.c > @@ -27,6 +27,7 @@ > #include "xfs_error.h" > #include "xfs_trace.h" > #include "xfs_trans_priv.h" > +#include "xfs_buf_item.h" > #include "xfs_log.h" > > > @@ -475,6 +476,26 @@ xfs_inode_item_unpin( > wake_up_bit(&ip->i_flags, __XFS_IPINNED_BIT); > } > > +STATIC void > +xfs_inode_item_error( > + struct xfs_log_item *lip, > + struct xfs_buf *bp, > + unsigned int bflags) > +{ > + > + /* > + * The buffer writeback containing this inode has been failed > + * mark it as failed and unlock the flush lock, so it can be retried > + * again. > + * Use xa_lock to avoid races with other log item state changes. > + */ > + spin_lock(&lip->li_ailp->xa_lock); I'd push the ->xa_lock up into caller and cycle it once over the list rather than once per item. This is more consistent with the locking pattern for successful I/Os. > + xfs_buf_hold(bp); > + lip->li_flags |= XFS_LI_FAILED; > + lip->li_buf = bp; Also, while I think this is technically Ok based on the current implementation, I think we should also test that LI_FAILED is not already set and only hold the buf in that case (i.e., the previously mentioned helper). This helps prevent a landmine if the implementation/locking/etc. changes down the road. I think it would be fine to assert that LI_FAILED is not set already if we wanted to, though, as long as there's a brief comment as well. > + spin_unlock(&lip->li_ailp->xa_lock); > +} > + > STATIC uint > xfs_inode_item_push( > struct xfs_log_item *lip, > @@ -504,6 +525,18 @@ xfs_inode_item_push( > } > > /* > + * The buffer containing this item failed to be written back > + * previously. Resubmit the buffer for IO. > + */ > + if (lip->li_flags & XFS_LI_FAILED) { > + if (!xfs_buf_resubmit_failed_buffers(ip, lip, > + buffer_list)) > + rval = XFS_ITEM_FLUSHING; > + > + goto out_unlock; > + } I think this needs to go at least before the ilock. We don't need the latter lock because the inode is already flushed. Somebody else could be holding ilock and blocked on the flush lock. If that occurs, we'll never make progress. I also think we should lock/unlock the buffer here rather than down in the helper. Hmm, maybe even queue it as well so it's clear what's going on. For example, something like: if (lip->li_flags & XFS_LI_FAILED) { xfs_buf_lock(bp); xfs_buf_clear_failed_items(bp); if (!xfs_buf_delwri_queue(bp, buffer_list)) rval = XFS_ITEM_FLUSHING; xfs_buf_unlock(bp); goto out_unlock; } ... 
but that is mostly just aesthetic. > + > + /* > * Stale inode items should force out the iclog. > */ > if (ip->i_flags & XFS_ISTALE) { > @@ -622,7 +655,8 @@ static const struct xfs_item_ops xfs_inode_item_ops = { > .iop_unlock = xfs_inode_item_unlock, > .iop_committed = xfs_inode_item_committed, > .iop_push = xfs_inode_item_push, > - .iop_committing = xfs_inode_item_committing > + .iop_committing = xfs_inode_item_committing, > + .iop_error = xfs_inode_item_error > }; > > > @@ -710,7 +744,8 @@ xfs_iflush_done( > * the AIL lock. > */ > iip = INODE_ITEM(blip); > - if (iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn) > + if ((iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn) || > + lip->li_flags & XFS_LI_FAILED) > need_ail++; > > blip = next; > @@ -718,7 +753,8 @@ xfs_iflush_done( > > /* make sure we capture the state of the initial inode. */ > iip = INODE_ITEM(lip); > - if (iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) > + if ((iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) || > + lip->li_flags & XFS_LI_FAILED) > need_ail++; > > /* > @@ -739,6 +775,8 @@ xfs_iflush_done( > if (INODE_ITEM(blip)->ili_logged && > blip->li_lsn == INODE_ITEM(blip)->ili_flush_lsn) > mlip_changed |= xfs_ail_delete_one(ailp, blip); > + else if (blip->li_flags & XFS_LI_FAILED) > + blip->li_flags &= ~XFS_LI_FAILED; This needs to do the full "unfail" sequence as well (another place to call the helper). > } > > if (mlip_changed) { > diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h > index 3486517..86c3464 100644 > --- a/fs/xfs/xfs_trans.h > +++ b/fs/xfs/xfs_trans.h > @@ -49,6 +49,7 @@ typedef struct xfs_log_item { > struct xfs_ail *li_ailp; /* ptr to AIL */ > uint li_type; /* item type */ > uint li_flags; /* misc flags */ > + struct xfs_buf *li_buf; /* real buffer pointer */ > struct xfs_log_item *li_bio_list; /* buffer item list */ > void (*li_cb)(struct xfs_buf *, > struct xfs_log_item *); > diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c > index 9056c0f..36e08dd 100644 > --- a/fs/xfs/xfs_trans_ail.c > +++ b/fs/xfs/xfs_trans_ail.c > @@ -687,13 +687,13 @@ xfs_trans_ail_update_bulk( > bool > xfs_ail_delete_one( > struct xfs_ail *ailp, > - struct xfs_log_item *lip) > + struct xfs_log_item *lip) > { > struct xfs_log_item *mlip = xfs_ail_min(ailp); > > trace_xfs_ail_delete(lip, mlip->li_lsn, lip->li_lsn); > xfs_ail_delete(ailp, lip); > - lip->li_flags &= ~XFS_LI_IN_AIL; > + lip->li_flags &= ~(XFS_LI_IN_AIL | XFS_LI_FAILED); ... and the same thing here. Brian > lip->li_lsn = 0; > > return mlip == lip; > -- > 2.9.3 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
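A minimal sketch of the fail/unfail helpers suggested above might look like the following. The names (xfs_set_li_failed()/xfs_clear_li_failed()), the lockdep assertions, and the guard on the current flag state are assumptions drawn from the review comments, not code from the posted patch:

```c
/*
 * Hypothetical helpers (names are placeholders), e.g. in xfs_trans_priv.h.
 * Both assume the caller holds ->xa_lock, which serializes XFS_LI_FAILED
 * updates, so the buffer reference is taken/dropped exactly once per item.
 */
static inline void
xfs_set_li_failed(
	struct xfs_log_item	*lip,
	struct xfs_buf		*bp)
{
	lockdep_assert_held(&lip->li_ailp->xa_lock);

	/* Only take a buffer reference on the first failure of this item */
	if (!(lip->li_flags & XFS_LI_FAILED)) {
		xfs_buf_hold(bp);
		lip->li_flags |= XFS_LI_FAILED;
		lip->li_buf = bp;
	}
}

static inline void
xfs_clear_li_failed(
	struct xfs_log_item	*lip)
{
	struct xfs_buf		*bp = lip->li_buf;

	lockdep_assert_held(&lip->li_ailp->xa_lock);

	/* Drop the reference only if this item was actually failed */
	if (lip->li_flags & XFS_LI_FAILED) {
		lip->li_flags &= ~XFS_LI_FAILED;
		lip->li_buf = NULL;
		xfs_buf_rele(bp);
	}
}
```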
Hey. > > +/* > > + * Requeue a failed buffer for writeback > > + * > > + * Return true if the buffer has been re-queued properly, false otherwise > > + */ > > +bool > > +xfs_buf_resubmit_failed_buffers( > > + struct xfs_inode *ip, > > + struct xfs_log_item *lip, > > + struct list_head *buffer_list) > > +{ > > + struct xfs_log_item *next; > > + struct xfs_buf *bp; > > + bool ret; > > + > > + bp = lip->li_buf; > > + > > + /* Clear XFS_LI_FAILED flag from all items before resubmit > > + * > > + * XFS_LI_FAILED set/clear is protected by xa_lock, caller this > > + * function already have it acquired > > + */ > > + for (; lip; lip = next) { > > + next = lip->li_bio_list; > > + lip->li_flags &= ~XFS_LI_FAILED; > > + xfs_buf_rele(bp); > > We need to check that the log item has LI_FAILED set before we release > the buffer because it's possible for the buffer to have a mix of failed > and recently flushed (not failed) items. We should set ->li_buf = NULL > wherever we clear LI_FAILED as well. > > A couple small inline helpers to fail/unfail log items correctly (i.e., > xfs_log_item_[fail|clear]() or some such) might help to make sure we get > the reference counting right. We could also assert that the correct lock > is held from those helpers. > Agreed, will queue it for V4 > > + } > > + > > + /* Add this buffer back to the delayed write list */ > > + xfs_buf_lock(bp); > > + if (!xfs_buf_delwri_queue(bp, buffer_list)) > > + ret = false; > > + else > > + ret = true; > > + > > + xfs_buf_unlock(bp); > > + return ret; > > +} > > diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h > > index f7eba99..32aceae 100644 > > --- a/fs/xfs/xfs_buf_item.h > > +++ b/fs/xfs/xfs_buf_item.h > > @@ -70,6 +70,8 @@ void xfs_buf_attach_iodone(struct xfs_buf *, > > xfs_log_item_t *); > > void xfs_buf_iodone_callbacks(struct xfs_buf *); > > void xfs_buf_iodone(struct xfs_buf *, struct xfs_log_item *); > > +bool xfs_buf_resubmit_failed_buffers(struct xfs_inode *, struct xfs_log_item *, > > + struct list_head *); > > > > extern kmem_zone_t *xfs_buf_item_zone; > > > > diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c > > index 08cb7d1..2f5a49e 100644 > > --- a/fs/xfs/xfs_inode_item.c > > +++ b/fs/xfs/xfs_inode_item.c > > @@ -27,6 +27,7 @@ > > #include "xfs_error.h" > > #include "xfs_trace.h" > > #include "xfs_trans_priv.h" > > +#include "xfs_buf_item.h" > > #include "xfs_log.h" > > > > > > @@ -475,6 +476,26 @@ xfs_inode_item_unpin( > > wake_up_bit(&ip->i_flags, __XFS_IPINNED_BIT); > > } > > > > +STATIC void > > +xfs_inode_item_error( > > + struct xfs_log_item *lip, > > + struct xfs_buf *bp, > > + unsigned int bflags) > > +{ > > + > > + /* > > + * The buffer writeback containing this inode has been failed > > + * mark it as failed and unlock the flush lock, so it can be retried > > + * again. > > + * Use xa_lock to avoid races with other log item state changes. > > + */ > > + spin_lock(&lip->li_ailp->xa_lock); > > I'd push the ->xa_lock up into caller and cycle it once over the list > rather than once per item. This is more consistent with the locking > pattern for successful I/Os. > I considered it too, and decided to use spin_lock/unlock on each interaction to avoid having it locked for a long time, but I don't think there is any harm in using a single lock/unlock. 
> > + xfs_buf_hold(bp); > > + lip->li_flags |= XFS_LI_FAILED; > > + lip->li_buf = bp; > > Also, while I think this is technically Ok based on the current > implementation, I think we should also test that LI_FAILED is not > already set and only hold the buf in that case (i.e., the previously > mentioned helper). This helps prevent a landmine if the > implementation/locking/etc. changes down the road. > > I think it would be fine to assert that LI_FAILED is not set already if > we wanted to, though, as long as there's a brief comment as well. > > > + spin_unlock(&lip->li_ailp->xa_lock); > > +} > > + > > STATIC uint > > xfs_inode_item_push( > > struct xfs_log_item *lip, > > @@ -504,6 +525,18 @@ xfs_inode_item_push( > > } > > > > /* > > + * The buffer containing this item failed to be written back > > + * previously. Resubmit the buffer for IO. > > + */ > > + if (lip->li_flags & XFS_LI_FAILED) { > > + if (!xfs_buf_resubmit_failed_buffers(ip, lip, > > + buffer_list)) > > + rval = XFS_ITEM_FLUSHING; > > + > > + goto out_unlock; > > + } > > I think this needs to go at least before the ilock. We don't need the > latter lock because the inode is already flushed. Somebody else could be > holding ilock and blocked on the flush lock. If that occurs, we'll never > make progress. It should be after the ilock IIRC, having it before the ilock will deadlock it any unmount when there are still inodes to be destroyed. > > I also think we should lock/unlock the buffer here rather than down in > the helper. Hmm, maybe even queue it as well so it's clear what's going > on. For example, something like: > > if (lip->li_flags & XFS_LI_FAILED) { > xfs_buf_lock(bp); > xfs_buf_clear_failed_items(bp); > if (!xfs_buf_delwri_queue(bp, buffer_list)) > rval = XFS_ITEM_FLUSHING; > xfs_buf_unlock(bp); > goto out_unlock; > } > Well, we could do that, but we would need to duplicate this code between inode and dquot item, while keeping the helper as-is, we can re-use it in both inode and dquot. > ... but that is mostly just aesthetic. > > > + > > + /* > > * Stale inode items should force out the iclog. > > */ > > if (ip->i_flags & XFS_ISTALE) { > > @@ -622,7 +655,8 @@ static const struct xfs_item_ops xfs_inode_item_ops = { > > .iop_unlock = xfs_inode_item_unlock, > > .iop_committed = xfs_inode_item_committed, > > .iop_push = xfs_inode_item_push, > > - .iop_committing = xfs_inode_item_committing > > + .iop_committing = xfs_inode_item_committing, > > + .iop_error = xfs_inode_item_error > > }; > > > > > > @@ -710,7 +744,8 @@ xfs_iflush_done( > > * the AIL lock. > > */ > > iip = INODE_ITEM(blip); > > - if (iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn) > > + if ((iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn) || > > + lip->li_flags & XFS_LI_FAILED) > > need_ail++; > > > > blip = next; > > @@ -718,7 +753,8 @@ xfs_iflush_done( > > > > /* make sure we capture the state of the initial inode. */ > > iip = INODE_ITEM(lip); > > - if (iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) > > + if ((iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) || > > + lip->li_flags & XFS_LI_FAILED) > > need_ail++; > > > > /* > > @@ -739,6 +775,8 @@ xfs_iflush_done( > > if (INODE_ITEM(blip)->ili_logged && > > blip->li_lsn == INODE_ITEM(blip)->ili_flush_lsn) > > mlip_changed |= xfs_ail_delete_one(ailp, blip); > > + else if (blip->li_flags & XFS_LI_FAILED) > > + blip->li_flags &= ~XFS_LI_FAILED; > > This needs to do the full "unfail" sequence as well (another place to > call the helper). 
> > > } > > > > if (mlip_changed) { > > diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h > > index 3486517..86c3464 100644 > > --- a/fs/xfs/xfs_trans.h > > +++ b/fs/xfs/xfs_trans.h > > @@ -49,6 +49,7 @@ typedef struct xfs_log_item { > > struct xfs_ail *li_ailp; /* ptr to AIL */ > > uint li_type; /* item type */ > > uint li_flags; /* misc flags */ > > + struct xfs_buf *li_buf; /* real buffer pointer */ > > struct xfs_log_item *li_bio_list; /* buffer item list */ > > void (*li_cb)(struct xfs_buf *, > > struct xfs_log_item *); > > diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c > > index 9056c0f..36e08dd 100644 > > --- a/fs/xfs/xfs_trans_ail.c > > +++ b/fs/xfs/xfs_trans_ail.c > > @@ -687,13 +687,13 @@ xfs_trans_ail_update_bulk( > > bool > > xfs_ail_delete_one( > > struct xfs_ail *ailp, > > - struct xfs_log_item *lip) > > + struct xfs_log_item *lip) > > { > > struct xfs_log_item *mlip = xfs_ail_min(ailp); > > > > trace_xfs_ail_delete(lip, mlip->li_lsn, lip->li_lsn); > > xfs_ail_delete(ailp, lip); > > - lip->li_flags &= ~XFS_LI_IN_AIL; > > + lip->li_flags &= ~(XFS_LI_IN_AIL | XFS_LI_FAILED); > > ... and the same thing here. > I agree with the helpers, and I'll queue them up for V4, I'll just wait for some more reviews on this version before sending V4. Thanks for the review Brian. Cheers
On Mon, Jun 12, 2017 at 02:44:23PM +0200, Carlos Maiolino wrote: > Hey. > > > > +/* > > > + * Requeue a failed buffer for writeback > > > + * > > > + * Return true if the buffer has been re-queued properly, false otherwise > > > + */ > > > +bool > > > +xfs_buf_resubmit_failed_buffers( > > > + struct xfs_inode *ip, > > > + struct xfs_log_item *lip, > > > + struct list_head *buffer_list) > > > +{ > > > + struct xfs_log_item *next; > > > + struct xfs_buf *bp; > > > + bool ret; > > > + > > > + bp = lip->li_buf; > > > + > > > + /* Clear XFS_LI_FAILED flag from all items before resubmit > > > + * > > > + * XFS_LI_FAILED set/clear is protected by xa_lock, caller this > > > + * function already have it acquired > > > + */ > > > + for (; lip; lip = next) { > > > + next = lip->li_bio_list; > > > + lip->li_flags &= ~XFS_LI_FAILED; > > > + xfs_buf_rele(bp); > > > > We need to check that the log item has LI_FAILED set before we release > > the buffer because it's possible for the buffer to have a mix of failed > > and recently flushed (not failed) items. We should set ->li_buf = NULL > > wherever we clear LI_FAILED as well. > > > > A couple small inline helpers to fail/unfail log items correctly (i.e., > > xfs_log_item_[fail|clear]() or some such) might help to make sure we get > > the reference counting right. We could also assert that the correct lock > > is held from those helpers. > > > > Agreed, will queue it for V4 > > > > + } > > > + > > > + /* Add this buffer back to the delayed write list */ > > > + xfs_buf_lock(bp); > > > + if (!xfs_buf_delwri_queue(bp, buffer_list)) > > > + ret = false; > > > + else > > > + ret = true; > > > + > > > + xfs_buf_unlock(bp); > > > + return ret; > > > +} > > > diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h > > > index f7eba99..32aceae 100644 > > > --- a/fs/xfs/xfs_buf_item.h > > > +++ b/fs/xfs/xfs_buf_item.h > > > @@ -70,6 +70,8 @@ void xfs_buf_attach_iodone(struct xfs_buf *, > > > xfs_log_item_t *); > > > void xfs_buf_iodone_callbacks(struct xfs_buf *); > > > void xfs_buf_iodone(struct xfs_buf *, struct xfs_log_item *); > > > +bool xfs_buf_resubmit_failed_buffers(struct xfs_inode *, struct xfs_log_item *, > > > + struct list_head *); > > > > > > extern kmem_zone_t *xfs_buf_item_zone; > > > > > > diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c > > > index 08cb7d1..2f5a49e 100644 > > > --- a/fs/xfs/xfs_inode_item.c > > > +++ b/fs/xfs/xfs_inode_item.c > > > @@ -27,6 +27,7 @@ > > > #include "xfs_error.h" > > > #include "xfs_trace.h" > > > #include "xfs_trans_priv.h" > > > +#include "xfs_buf_item.h" > > > #include "xfs_log.h" > > > > > > > > > @@ -475,6 +476,26 @@ xfs_inode_item_unpin( > > > wake_up_bit(&ip->i_flags, __XFS_IPINNED_BIT); > > > } > > > > > > +STATIC void > > > +xfs_inode_item_error( > > > + struct xfs_log_item *lip, > > > + struct xfs_buf *bp, > > > + unsigned int bflags) > > > +{ > > > + > > > + /* > > > + * The buffer writeback containing this inode has been failed > > > + * mark it as failed and unlock the flush lock, so it can be retried > > > + * again. > > > + * Use xa_lock to avoid races with other log item state changes. > > > + */ > > > + spin_lock(&lip->li_ailp->xa_lock); > > > > I'd push the ->xa_lock up into caller and cycle it once over the list > > rather than once per item. This is more consistent with the locking > > pattern for successful I/Os. 
> > > > I considered it too, and decided to use spin_lock/unlock on each interaction to > avoid having it locked for a long time, but I don't think there is any harm in > using a single lock/unlock. > I don't think we're too concerned about performance at this low a level when when I/Os are failing (or yet, at least), but I'm guessing that pulling up the lock is still lighter weight than what a successful I/O completion does. That aside, I think we also want the entire sequence of marking LI_FAILED to be atomic with respect to the AIL (e.g., so the next AIL push sees the state of the items before the errors are processed or after, not somewhere in between). > > > + xfs_buf_hold(bp); > > > + lip->li_flags |= XFS_LI_FAILED; > > > + lip->li_buf = bp; > > > > Also, while I think this is technically Ok based on the current > > implementation, I think we should also test that LI_FAILED is not > > already set and only hold the buf in that case (i.e., the previously > > mentioned helper). This helps prevent a landmine if the > > implementation/locking/etc. changes down the road. > > > > I think it would be fine to assert that LI_FAILED is not set already if > > we wanted to, though, as long as there's a brief comment as well. > > > > > + spin_unlock(&lip->li_ailp->xa_lock); > > > +} > > > + > > > STATIC uint > > > xfs_inode_item_push( > > > struct xfs_log_item *lip, > > > @@ -504,6 +525,18 @@ xfs_inode_item_push( > > > } > > > > > > /* > > > + * The buffer containing this item failed to be written back > > > + * previously. Resubmit the buffer for IO. > > > + */ > > > + if (lip->li_flags & XFS_LI_FAILED) { > > > + if (!xfs_buf_resubmit_failed_buffers(ip, lip, > > > + buffer_list)) > > > + rval = XFS_ITEM_FLUSHING; > > > + > > > + goto out_unlock; > > > + } > > > > I think this needs to go at least before the ilock. We don't need the > > latter lock because the inode is already flushed. Somebody else could be > > holding ilock and blocked on the flush lock. If that occurs, we'll never > > make progress. > > It should be after the ilock IIRC, having it before the ilock will deadlock it > any unmount when there are still inodes to be destroyed. Hmm.. so if we get into the failed state on an inode and a reclaim attempt occurs on the inode, xfs_reclaim_inode() can grab the ilock and then block on the flush lock. With this code as it is, we'll now never resubmit the failed buffer (unless via some other inode, by chance) because we can't get the inode lock. IOW, we're stuck in a livelock that resembles the original problem. AFAICT, we don't need the inode lock in this case at all. The only reason for it here that I'm aware of is to follow the ilock -> flush lock ordering rules and we know the inode is already flush locked in the failed case. Can you elaborate on the unmount deadlock issue if we move this before the ilock? Shouldn't the I/O error handling detect the unmounting state the next go around and flag it as a permanent error? I'm wondering if there is some other bug lurking there that we need to work around... (Note that I think it's important we resolve this above all else, because getting this correct is probably what dictates whether this is a viable approach or that we need to start looking at something like Dave's flush lock rework idea.) > > > > I also think we should lock/unlock the buffer here rather than down in > > the helper. Hmm, maybe even queue it as well so it's clear what's going > > on. 
For example, something like: > > > > if (lip->li_flags & XFS_LI_FAILED) { > > xfs_buf_lock(bp); > > xfs_buf_clear_failed_items(bp); > > if (!xfs_buf_delwri_queue(bp, buffer_list)) > > rval = XFS_ITEM_FLUSHING; > > xfs_buf_unlock(bp); > > goto out_unlock; > > } > > > > Well, we could do that, but we would need to duplicate this code between inode > and dquot item, while keeping the helper as-is, we can re-use it in both inode > and dquot. > That doesn't seem so bad to me. It's basically just a lock and delwri queue of a buffer. That said, we could also create a couple separate helpers here: one that requeues a buffer and another that clears failed items on a buffer. Hmm, I'm actually wondering if we can refactor xfs_inode_item_push() a bit to just reuse some of the existing code here. Perhaps it's best to get the locking down correctly before looking into this... BTW, I just realized that the xfs_buf_lock() above probably needs to be a trylock (where we return XFS_ITEM_LOCKED on failure). The buffer lock via xfs_iflush() is a trylock as well, and we actually drop and reacquire ->xa_lock across that and the delwri queue of the buffer (but I'm not totally sure that part is necessary here). > > ... but that is mostly just aesthetic. > > > > > + > > > + /* > > > * Stale inode items should force out the iclog. > > > */ > > > if (ip->i_flags & XFS_ISTALE) { > > > @@ -622,7 +655,8 @@ static const struct xfs_item_ops xfs_inode_item_ops = { > > > .iop_unlock = xfs_inode_item_unlock, > > > .iop_committed = xfs_inode_item_committed, > > > .iop_push = xfs_inode_item_push, > > > - .iop_committing = xfs_inode_item_committing > > > + .iop_committing = xfs_inode_item_committing, > > > + .iop_error = xfs_inode_item_error > > > }; > > > > > > > > > @@ -710,7 +744,8 @@ xfs_iflush_done( > > > * the AIL lock. > > > */ > > > iip = INODE_ITEM(blip); > > > - if (iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn) > > > + if ((iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn) || > > > + lip->li_flags & XFS_LI_FAILED) > > > need_ail++; > > > > > > blip = next; > > > @@ -718,7 +753,8 @@ xfs_iflush_done( > > > > > > /* make sure we capture the state of the initial inode. */ > > > iip = INODE_ITEM(lip); > > > - if (iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) > > > + if ((iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) || > > > + lip->li_flags & XFS_LI_FAILED) > > > need_ail++; > > > > > > /* > > > @@ -739,6 +775,8 @@ xfs_iflush_done( > > > if (INODE_ITEM(blip)->ili_logged && > > > blip->li_lsn == INODE_ITEM(blip)->ili_flush_lsn) > > > mlip_changed |= xfs_ail_delete_one(ailp, blip); > > > + else if (blip->li_flags & XFS_LI_FAILED) > > > + blip->li_flags &= ~XFS_LI_FAILED; > > > > This needs to do the full "unfail" sequence as well (another place to > > call the helper). 
> > > > > } > > > > > > if (mlip_changed) { > > > diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h > > > index 3486517..86c3464 100644 > > > --- a/fs/xfs/xfs_trans.h > > > +++ b/fs/xfs/xfs_trans.h > > > @@ -49,6 +49,7 @@ typedef struct xfs_log_item { > > > struct xfs_ail *li_ailp; /* ptr to AIL */ > > > uint li_type; /* item type */ > > > uint li_flags; /* misc flags */ > > > + struct xfs_buf *li_buf; /* real buffer pointer */ > > > struct xfs_log_item *li_bio_list; /* buffer item list */ > > > void (*li_cb)(struct xfs_buf *, > > > struct xfs_log_item *); > > > diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c > > > index 9056c0f..36e08dd 100644 > > > --- a/fs/xfs/xfs_trans_ail.c > > > +++ b/fs/xfs/xfs_trans_ail.c > > > @@ -687,13 +687,13 @@ xfs_trans_ail_update_bulk( > > > bool > > > xfs_ail_delete_one( > > > struct xfs_ail *ailp, > > > - struct xfs_log_item *lip) > > > + struct xfs_log_item *lip) > > > { > > > struct xfs_log_item *mlip = xfs_ail_min(ailp); > > > > > > trace_xfs_ail_delete(lip, mlip->li_lsn, lip->li_lsn); > > > xfs_ail_delete(ailp, lip); > > > - lip->li_flags &= ~XFS_LI_IN_AIL; > > > + lip->li_flags &= ~(XFS_LI_IN_AIL | XFS_LI_FAILED); > > > > ... and the same thing here. > > > > I agree with the helpers, and I'll queue them up for V4, I'll just wait for some > more reviews on this version before sending V4. > Sounds good. Brian > Thanks for the review Brian. > > Cheers > > -- > Carlos > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
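As a concrete reading of the trylock point, the LI_FAILED branch of xfs_inode_item_push() might end up looking roughly like the fragment below if it is moved ahead of the inode lock and handles the buffer lock itself. This is only a sketch of the feedback above: xfs_buf_clear_failed_items() is a hypothetical helper that walks the buffer's log item list and clears the failed state (dropping the references taken at failure time), and rval is assumed to start as XFS_ITEM_SUCCESS as it does in the existing function:

```c
	/*
	 * Resubmit a previously failed buffer. This runs before the inode
	 * lock is taken: the inode is already flush locked, and waiting on
	 * the ilock here could livelock against reclaim (see discussion).
	 */
	if (lip->li_flags & XFS_LI_FAILED) {
		struct xfs_buf	*bp = lip->li_buf;

		/* Mirror xfs_iflush(): do not sleep on the buffer lock */
		if (!xfs_buf_trylock(bp))
			return XFS_ITEM_LOCKED;

		/* Hypothetical helper: clears XFS_LI_FAILED on all items */
		xfs_buf_clear_failed_items(bp);

		if (!xfs_buf_delwri_queue(bp, buffer_list))
			rval = XFS_ITEM_FLUSHING;

		xfs_buf_unlock(bp);
		return rval;
	}
```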
On Mon, Jun 12, 2017 at 09:45:44AM -0400, Brian Foster wrote: > On Mon, Jun 12, 2017 at 02:44:23PM +0200, Carlos Maiolino wrote: > > Hey. > > > > > > +/* > > > > + * Requeue a failed buffer for writeback > > > > + * > > > > + * Return true if the buffer has been re-queued properly, false otherwise > > > > + */ > > > > +bool > > > > +xfs_buf_resubmit_failed_buffers( > > > > + struct xfs_inode *ip, > > > > + struct xfs_log_item *lip, > > > > + struct list_head *buffer_list) > > > > +{ > > > > + struct xfs_log_item *next; > > > > + struct xfs_buf *bp; > > > > + bool ret; > > > > + > > > > + bp = lip->li_buf; > > > > + > > > > + /* Clear XFS_LI_FAILED flag from all items before resubmit > > > > + * > > > > + * XFS_LI_FAILED set/clear is protected by xa_lock, caller this > > > > + * function already have it acquired > > > > + */ > > > > + for (; lip; lip = next) { > > > > + next = lip->li_bio_list; > > > > + lip->li_flags &= ~XFS_LI_FAILED; > > > > + xfs_buf_rele(bp); > > > > > > We need to check that the log item has LI_FAILED set before we release > > > the buffer because it's possible for the buffer to have a mix of failed > > > and recently flushed (not failed) items. We should set ->li_buf = NULL > > > wherever we clear LI_FAILED as well. > > > > > > A couple small inline helpers to fail/unfail log items correctly (i.e., > > > xfs_log_item_[fail|clear]() or some such) might help to make sure we get > > > the reference counting right. We could also assert that the correct lock > > > is held from those helpers. > > > > > > > Agreed, will queue it for V4 > > > > > > + } > > > > + > > > > + /* Add this buffer back to the delayed write list */ > > > > + xfs_buf_lock(bp); > > > > + if (!xfs_buf_delwri_queue(bp, buffer_list)) > > > > + ret = false; > > > > + else > > > > + ret = true; > > > > + > > > > + xfs_buf_unlock(bp); > > > > + return ret; > > > > +} > > > > diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h > > > > index f7eba99..32aceae 100644 > > > > --- a/fs/xfs/xfs_buf_item.h > > > > +++ b/fs/xfs/xfs_buf_item.h > > > > @@ -70,6 +70,8 @@ void xfs_buf_attach_iodone(struct xfs_buf *, > > > > xfs_log_item_t *); > > > > void xfs_buf_iodone_callbacks(struct xfs_buf *); > > > > void xfs_buf_iodone(struct xfs_buf *, struct xfs_log_item *); > > > > +bool xfs_buf_resubmit_failed_buffers(struct xfs_inode *, struct xfs_log_item *, > > > > + struct list_head *); > > > > > > > > extern kmem_zone_t *xfs_buf_item_zone; > > > > > > > > diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c > > > > index 08cb7d1..2f5a49e 100644 > > > > --- a/fs/xfs/xfs_inode_item.c > > > > +++ b/fs/xfs/xfs_inode_item.c > > > > @@ -27,6 +27,7 @@ > > > > #include "xfs_error.h" > > > > #include "xfs_trace.h" > > > > #include "xfs_trans_priv.h" > > > > +#include "xfs_buf_item.h" > > > > #include "xfs_log.h" > > > > > > > > > > > > @@ -475,6 +476,26 @@ xfs_inode_item_unpin( > > > > wake_up_bit(&ip->i_flags, __XFS_IPINNED_BIT); > > > > } > > > > > > > > +STATIC void > > > > +xfs_inode_item_error( > > > > + struct xfs_log_item *lip, > > > > + struct xfs_buf *bp, > > > > + unsigned int bflags) > > > > +{ > > > > + > > > > + /* > > > > + * The buffer writeback containing this inode has been failed > > > > + * mark it as failed and unlock the flush lock, so it can be retried > > > > + * again. > > > > + * Use xa_lock to avoid races with other log item state changes. 
> > > > + */ > > > > + spin_lock(&lip->li_ailp->xa_lock); > > > > > > I'd push the ->xa_lock up into caller and cycle it once over the list > > > rather than once per item. This is more consistent with the locking > > > pattern for successful I/Os. > > > > > > > I considered it too, and decided to use spin_lock/unlock on each interaction to > > avoid having it locked for a long time, but I don't think there is any harm in > > using a single lock/unlock. > > > > I don't think we're too concerned about performance at this low a level > when when I/Os are failing (or yet, at least), but I'm guessing that > pulling up the lock is still lighter weight than what a successful I/O > completion does. > > That aside, I think we also want the entire sequence of marking > LI_FAILED to be atomic with respect to the AIL (e.g., so the next AIL > push sees the state of the items before the errors are processed or > after, not somewhere in between). > Agreed. > > > > + xfs_buf_hold(bp); > > > > + lip->li_flags |= XFS_LI_FAILED; > > > > + lip->li_buf = bp; > > > > > > Also, while I think this is technically Ok based on the current > > > implementation, I think we should also test that LI_FAILED is not > > > already set and only hold the buf in that case (i.e., the previously > > > mentioned helper). This helps prevent a landmine if the > > > implementation/locking/etc. changes down the road. > > > > > > I think it would be fine to assert that LI_FAILED is not set already if > > > we wanted to, though, as long as there's a brief comment as well. > > > > > > > + spin_unlock(&lip->li_ailp->xa_lock); > > > > +} > > > > + > > > > STATIC uint > > > > xfs_inode_item_push( > > > > struct xfs_log_item *lip, > > > > @@ -504,6 +525,18 @@ xfs_inode_item_push( > > > > } > > > > > > > > /* > > > > + * The buffer containing this item failed to be written back > > > > + * previously. Resubmit the buffer for IO. > > > > + */ > > > > + if (lip->li_flags & XFS_LI_FAILED) { > > > > + if (!xfs_buf_resubmit_failed_buffers(ip, lip, > > > > + buffer_list)) > > > > + rval = XFS_ITEM_FLUSHING; > > > > + > > > > + goto out_unlock; > > > > + } > > > > > > I think this needs to go at least before the ilock. We don't need the > > > latter lock because the inode is already flushed. Somebody else could be > > > holding ilock and blocked on the flush lock. If that occurs, we'll never > > > make progress. > > > > It should be after the ilock IIRC, having it before the ilock will deadlock it > > any unmount when there are still inodes to be destroyed. > > Hmm.. so if we get into the failed state on an inode and a reclaim > attempt occurs on the inode, xfs_reclaim_inode() can grab the ilock and > then block on the flush lock. With this code as it is, we'll now never > resubmit the failed buffer (unless via some other inode, by chance) > because we can't get the inode lock. IOW, we're stuck in a livelock that > resembles the original problem. > > AFAICT, we don't need the inode lock in this case at all. The only > reason for it here that I'm aware of is to follow the ilock -> flush > lock ordering rules and we know the inode is already flush locked in the > failed case. Can you elaborate on the unmount deadlock issue if we move > this before the ilock? Shouldn't the I/O error handling detect the > unmounting state the next go around and flag it as a permanent error? > I'm wondering if there is some other bug lurking there that we need to > work around... 
> Nevermind, I didn't investigate when I've hit the deadlock, I found what happened, fixed, and I'll queue this to V4. > (Note that I think it's important we resolve this above all else, > because getting this correct is probably what dictates whether this is a > viable approach or that we need to start looking at something like > Dave's flush lock rework idea.) > > > > > > > I also think we should lock/unlock the buffer here rather than down in > > > the helper. Hmm, maybe even queue it as well so it's clear what's going > > > on. For example, something like: > > > > > > if (lip->li_flags & XFS_LI_FAILED) { > > > xfs_buf_lock(bp); > > > xfs_buf_clear_failed_items(bp); > > > if (!xfs_buf_delwri_queue(bp, buffer_list)) > > > rval = XFS_ITEM_FLUSHING; > > > xfs_buf_unlock(bp); > > > goto out_unlock; > > > } > > > > > > > Well, we could do that, but we would need to duplicate this code between inode > > and dquot item, while keeping the helper as-is, we can re-use it in both inode > > and dquot. > > > > That doesn't seem so bad to me. It's basically just a lock and delwri > queue of a buffer. That said, we could also create a couple separate > helpers here: one that requeues a buffer and another that clears failed > items on a buffer. Hmm, I'm actually wondering if we can refactor > xfs_inode_item_push() a bit to just reuse some of the existing code > here. Perhaps it's best to get the locking down correctly before looking > into this... > > BTW, I just realized that the xfs_buf_lock() above probably needs to be > a trylock (where we return XFS_ITEM_LOCKED on failure). The buffer lock > via xfs_iflush() is a trylock as well, and we actually drop and > reacquire ->xa_lock across that and the delwri queue of the buffer (but > I'm not totally sure that part is necessary here). > > > > ... but that is mostly just aesthetic. > > > > > > > + > > > > + /* > > > > * Stale inode items should force out the iclog. > > > > */ > > > > if (ip->i_flags & XFS_ISTALE) { > > > > @@ -622,7 +655,8 @@ static const struct xfs_item_ops xfs_inode_item_ops = { > > > > .iop_unlock = xfs_inode_item_unlock, > > > > .iop_committed = xfs_inode_item_committed, > > > > .iop_push = xfs_inode_item_push, > > > > - .iop_committing = xfs_inode_item_committing > > > > + .iop_committing = xfs_inode_item_committing, > > > > + .iop_error = xfs_inode_item_error > > > > }; > > > > > > > > > > > > @@ -710,7 +744,8 @@ xfs_iflush_done( > > > > * the AIL lock. > > > > */ > > > > iip = INODE_ITEM(blip); > > > > - if (iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn) > > > > + if ((iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn) || > > > > + lip->li_flags & XFS_LI_FAILED) > > > > need_ail++; > > > > > > > > blip = next; > > > > @@ -718,7 +753,8 @@ xfs_iflush_done( > > > > > > > > /* make sure we capture the state of the initial inode. */ > > > > iip = INODE_ITEM(lip); > > > > - if (iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) > > > > + if ((iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) || > > > > + lip->li_flags & XFS_LI_FAILED) > > > > need_ail++; > > > > > > > > /* > > > > @@ -739,6 +775,8 @@ xfs_iflush_done( > > > > if (INODE_ITEM(blip)->ili_logged && > > > > blip->li_lsn == INODE_ITEM(blip)->ili_flush_lsn) > > > > mlip_changed |= xfs_ail_delete_one(ailp, blip); > > > > + else if (blip->li_flags & XFS_LI_FAILED) > > > > + blip->li_flags &= ~XFS_LI_FAILED; > > > > > > This needs to do the full "unfail" sequence as well (another place to > > > call the helper). 
> > > > > > > } > > > > > > > > if (mlip_changed) { > > > > diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h > > > > index 3486517..86c3464 100644 > > > > --- a/fs/xfs/xfs_trans.h > > > > +++ b/fs/xfs/xfs_trans.h > > > > @@ -49,6 +49,7 @@ typedef struct xfs_log_item { > > > > struct xfs_ail *li_ailp; /* ptr to AIL */ > > > > uint li_type; /* item type */ > > > > uint li_flags; /* misc flags */ > > > > + struct xfs_buf *li_buf; /* real buffer pointer */ > > > > struct xfs_log_item *li_bio_list; /* buffer item list */ > > > > void (*li_cb)(struct xfs_buf *, > > > > struct xfs_log_item *); > > > > diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c > > > > index 9056c0f..36e08dd 100644 > > > > --- a/fs/xfs/xfs_trans_ail.c > > > > +++ b/fs/xfs/xfs_trans_ail.c > > > > @@ -687,13 +687,13 @@ xfs_trans_ail_update_bulk( > > > > bool > > > > xfs_ail_delete_one( > > > > struct xfs_ail *ailp, > > > > - struct xfs_log_item *lip) > > > > + struct xfs_log_item *lip) > > > > { > > > > struct xfs_log_item *mlip = xfs_ail_min(ailp); > > > > > > > > trace_xfs_ail_delete(lip, mlip->li_lsn, lip->li_lsn); > > > > xfs_ail_delete(ailp, lip); > > > > - lip->li_flags &= ~XFS_LI_IN_AIL; > > > > + lip->li_flags &= ~(XFS_LI_IN_AIL | XFS_LI_FAILED); > > > > > > ... and the same thing here. > > > > > > > I agree with the helpers, and I'll queue them up for V4, I'll just wait for some > > more reviews on this version before sending V4. > > > > Sounds good. > > Brian > > > Thanks for the review Brian. > > > > Cheers > > > > -- > > Carlos > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
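For completeness, here is roughly how xfs_ail_delete_one() could pick up the full "unfail" sequence discussed above by reusing the sketched xfs_clear_li_failed() helper rather than only masking off the flag bit; again an illustration of where the review appears to be heading, not the code as posted:

```c
bool
xfs_ail_delete_one(
	struct xfs_ail		*ailp,
	struct xfs_log_item	*lip)
{
	struct xfs_log_item	*mlip = xfs_ail_min(ailp);

	trace_xfs_ail_delete(lip, mlip->li_lsn, lip->li_lsn);
	xfs_ail_delete(ailp, lip);

	/*
	 * Use the (assumed) helper so the buffer reference taken when the
	 * item was marked failed is dropped and li_buf is cleared, instead
	 * of just stripping XFS_LI_FAILED from li_flags.
	 */
	xfs_clear_li_failed(lip);
	lip->li_flags &= ~XFS_LI_IN_AIL;
	lip->li_lsn = 0;

	return mlip == lip;
}
```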
diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c index 523b7a4..9b7bd23 100644 --- a/fs/xfs/xfs_buf_item.c +++ b/fs/xfs/xfs_buf_item.c @@ -1222,3 +1222,42 @@ xfs_buf_iodone( xfs_trans_ail_delete(ailp, lip, SHUTDOWN_CORRUPT_INCORE); xfs_buf_item_free(BUF_ITEM(lip)); } + +/* + * Requeue a failed buffer for writeback + * + * Return true if the buffer has been re-queued properly, false otherwise + */ +bool +xfs_buf_resubmit_failed_buffers( + struct xfs_inode *ip, + struct xfs_log_item *lip, + struct list_head *buffer_list) +{ + struct xfs_log_item *next; + struct xfs_buf *bp; + bool ret; + + bp = lip->li_buf; + + /* Clear XFS_LI_FAILED flag from all items before resubmit + * + * XFS_LI_FAILED set/clear is protected by xa_lock, caller this + * function already have it acquired + */ + for (; lip; lip = next) { + next = lip->li_bio_list; + lip->li_flags &= ~XFS_LI_FAILED; + xfs_buf_rele(bp); + } + + /* Add this buffer back to the delayed write list */ + xfs_buf_lock(bp); + if (!xfs_buf_delwri_queue(bp, buffer_list)) + ret = false; + else + ret = true; + + xfs_buf_unlock(bp); + return ret; +} diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h index f7eba99..32aceae 100644 --- a/fs/xfs/xfs_buf_item.h +++ b/fs/xfs/xfs_buf_item.h @@ -70,6 +70,8 @@ void xfs_buf_attach_iodone(struct xfs_buf *, xfs_log_item_t *); void xfs_buf_iodone_callbacks(struct xfs_buf *); void xfs_buf_iodone(struct xfs_buf *, struct xfs_log_item *); +bool xfs_buf_resubmit_failed_buffers(struct xfs_inode *, struct xfs_log_item *, + struct list_head *); extern kmem_zone_t *xfs_buf_item_zone; diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c index 08cb7d1..2f5a49e 100644 --- a/fs/xfs/xfs_inode_item.c +++ b/fs/xfs/xfs_inode_item.c @@ -27,6 +27,7 @@ #include "xfs_error.h" #include "xfs_trace.h" #include "xfs_trans_priv.h" +#include "xfs_buf_item.h" #include "xfs_log.h" @@ -475,6 +476,26 @@ xfs_inode_item_unpin( wake_up_bit(&ip->i_flags, __XFS_IPINNED_BIT); } +STATIC void +xfs_inode_item_error( + struct xfs_log_item *lip, + struct xfs_buf *bp, + unsigned int bflags) +{ + + /* + * The buffer writeback containing this inode has been failed + * mark it as failed and unlock the flush lock, so it can be retried + * again. + * Use xa_lock to avoid races with other log item state changes. + */ + spin_lock(&lip->li_ailp->xa_lock); + xfs_buf_hold(bp); + lip->li_flags |= XFS_LI_FAILED; + lip->li_buf = bp; + spin_unlock(&lip->li_ailp->xa_lock); +} + STATIC uint xfs_inode_item_push( struct xfs_log_item *lip, @@ -504,6 +525,18 @@ xfs_inode_item_push( } /* + * The buffer containing this item failed to be written back + * previously. Resubmit the buffer for IO. + */ + if (lip->li_flags & XFS_LI_FAILED) { + if (!xfs_buf_resubmit_failed_buffers(ip, lip, + buffer_list)) + rval = XFS_ITEM_FLUSHING; + + goto out_unlock; + } + + /* * Stale inode items should force out the iclog. */ if (ip->i_flags & XFS_ISTALE) { @@ -622,7 +655,8 @@ static const struct xfs_item_ops xfs_inode_item_ops = { .iop_unlock = xfs_inode_item_unlock, .iop_committed = xfs_inode_item_committed, .iop_push = xfs_inode_item_push, - .iop_committing = xfs_inode_item_committing + .iop_committing = xfs_inode_item_committing, + .iop_error = xfs_inode_item_error }; @@ -710,7 +744,8 @@ xfs_iflush_done( * the AIL lock. 
*/ iip = INODE_ITEM(blip); - if (iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn) + if ((iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn) || + lip->li_flags & XFS_LI_FAILED) need_ail++; blip = next; @@ -718,7 +753,8 @@ xfs_iflush_done( /* make sure we capture the state of the initial inode. */ iip = INODE_ITEM(lip); - if (iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) + if ((iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) || + lip->li_flags & XFS_LI_FAILED) need_ail++; /* @@ -739,6 +775,8 @@ xfs_iflush_done( if (INODE_ITEM(blip)->ili_logged && blip->li_lsn == INODE_ITEM(blip)->ili_flush_lsn) mlip_changed |= xfs_ail_delete_one(ailp, blip); + else if (blip->li_flags & XFS_LI_FAILED) + blip->li_flags &= ~XFS_LI_FAILED; } if (mlip_changed) { diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index 3486517..86c3464 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -49,6 +49,7 @@ typedef struct xfs_log_item { struct xfs_ail *li_ailp; /* ptr to AIL */ uint li_type; /* item type */ uint li_flags; /* misc flags */ + struct xfs_buf *li_buf; /* real buffer pointer */ struct xfs_log_item *li_bio_list; /* buffer item list */ void (*li_cb)(struct xfs_buf *, struct xfs_log_item *); diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c index 9056c0f..36e08dd 100644 --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -687,13 +687,13 @@ xfs_trans_ail_update_bulk( bool xfs_ail_delete_one( struct xfs_ail *ailp, - struct xfs_log_item *lip) + struct xfs_log_item *lip) { struct xfs_log_item *mlip = xfs_ail_min(ailp); trace_xfs_ail_delete(lip, mlip->li_lsn, lip->li_lsn); xfs_ail_delete(ailp, lip); - lip->li_flags &= ~XFS_LI_IN_AIL; + lip->li_flags &= ~(XFS_LI_IN_AIL | XFS_LI_FAILED); lip->li_lsn = 0; return mlip == lip;
When a buffer fails during writeback, the inode items attached to it are kept flush locked and are never resubmitted because of the flush lock, so if any buffer fails to be written, the items in the AIL are never written to disk and never unlocked.

This causes unmount operations to hang because of these flush-locked items in the AIL, but it also means the items in the AIL are never written back, even when the IO device comes back to normal.

I've been testing this patch with a DM-thin device, creating a filesystem larger than the real device.

When writing enough data to fill the DM-thin device, XFS receives ENOSPC errors from the device and keeps spinning in xfsaild (when the 'retry forever' configuration is set).

At this point, the filesystem cannot be unmounted because of the flush-locked items in the AIL, but worse, the items in the AIL are never retried at all (since xfs_inode_item_push() skips items that are flush locked), even if the underlying DM-thin device is expanded to the proper size.

This patch fixes both cases, retrying any item that has previously failed, using the infrastructure provided by the previous patch.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
---
V2:
- Fix XFS_LI_FAILED flag removal
- Use atomic operations to set and clear the XFS_LI_FAILED flag
- Remove the check for XBF_WRITE_FAIL in xfs_inode_item_push
- Add more comments to the code
- Add a helper function to resubmit the failed buffers, so it can also be used by the dquot code without duplication

V3:
- Drop the xfs_imap_to_bp call, using a pointer in the log item to hold the buffer address, fixing a possible infinite loop if xfs_imap_to_bp returns -EIO
- Use xa_lock instead of atomic operations to handle log item flags
- Take a hold on the buffer for each failed log item
- Move buffer resubmission up in xfs_inode_item_push()

 fs/xfs/xfs_buf_item.c   | 39 +++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_buf_item.h   |  2 ++
 fs/xfs/xfs_inode_item.c | 44 +++++++++++++++++++++++++++++++++++++++++---
 fs/xfs/xfs_trans.h      |  1 +
 fs/xfs/xfs_trans_ail.c  |  4 ++--
 5 files changed, 85 insertions(+), 5 deletions(-)