Message ID | 20200514165658.GC6714@magnolia (mailing list archive) |
---|---|
State | Accepted |
Series | [v2] xfs: use ordered buffers to initialize dquot buffers during quotacheck |
On Thu, May 14, 2020 at 09:56:58AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> ...
>
> Fix this by changing the ondisk dquot initialization function to use
> ordered buffers to write out fresh dquot blocks if it detects that we're
> running quotacheck. If the system goes down before quotacheck can
> complete, the CHKD flags will not be set in the superblock and the next
> mount will run quotacheck again, which can fix uninitialized dquot
> buffers. This requires amending the defer code to maintaine ordered
> buffer state across defer rolls for the sake of the dquot allocation
> code.
>
> For regular operations we preserve the current behavior since the dquot
> items require properly initialized ondisk dquot records.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> v2: rework the code comment explaining all this
> ---
>  fs/xfs/libxfs/xfs_defer.c | 10 +++++++
>  fs/xfs/xfs_dquot.c        | 62 ++++++++++++++++++++++++++++++++++++---------
>  2 files changed, 58 insertions(+), 14 deletions(-)
> ...
> diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> index 52e0f7245afc..f60a8967f9d5 100644
> --- a/fs/xfs/xfs_dquot.c
> +++ b/fs/xfs/xfs_dquot.c
...
> @@ -238,11 +240,45 @@ xfs_qm_init_dquot_blk(
...
> +
> +	/*
> +	 * When quotacheck runs, we use delayed writes to update all the dquots
> +	 * on disk in an efficient manner instead of logging the individual
> +	 * dquot changes as they are made.
> +	 *
> +	 * Hence if we log the buffer that we allocate here, then crash
> +	 * post-quotacheck while the logged initialisation is still in the
> +	 * active region of the log, we can lose the information quotacheck
> +	 * wrote directly to the buffer. That is, log recovery will replay the
> +	 * dquot buffer initialisation over the top of whatever information
> +	 * quotacheck had written to the buffer.
> +	 *
> +	 * To avoid this problem, dquot allocation during quotacheck needs to
> +	 * avoid logging the initialised buffer, but we still need to have
> +	 * writeback of the buffer pin the tail of the log so that it is
> +	 * initialised on disk before we remove the allocation transaction from
> +	 * the active region of the log. Marking the buffer as ordered instead
> +	 * of logging it provides this behaviour.
> +	 *
> +	 * If we crash before quotacheck completes, a subsequent quotacheck run
> +	 * will re-allocate and re-initialize the dquot records as needed.
> +	 */

I took a stab at condensing the comment a bit, FWIW (diff below). LGTM
either way. Thanks for the update.

Reviewed-by: Brian Foster <bfoster@redhat.com>

> +	if (!(mp->m_qflags & qflag))
> +		xfs_trans_ordered_buf(tp, bp);
> +	else
> +		xfs_trans_log_buf(tp, bp, 0, BBTOB(q->qi_dqchunklen) - 1);
>  }
>
>  /*
>

diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index f60a8967f9d5..55b95d45303b 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -254,26 +254,20 @@ xfs_qm_init_dquot_blk(
 	xfs_trans_dquot_buf(tp, bp, blftype);

 	/*
-	 * When quotacheck runs, we use delayed writes to update all the dquots
-	 * on disk in an efficient manner instead of logging the individual
-	 * dquot changes as they are made.
+	 * quotacheck uses delayed writes to update all the dquots on disk in an
+	 * efficient manner instead of logging the individual dquot changes as
+	 * they are made. However if we log the buffer allocated here and crash
+	 * after quotacheck while the logged initialisation is still in the
+	 * active region of the log, log recovery can replay the dquot buffer
+	 * initialisation over the top of the checked dquots and corrupt quota
+	 * accounting.
 	 *
-	 * Hence if we log the buffer that we allocate here, then crash
-	 * post-quotacheck while the logged initialisation is still in the
-	 * active region of the log, we can lose the information quotacheck
-	 * wrote directly to the buffer. That is, log recovery will replay the
-	 * dquot buffer initialisation over the top of whatever information
-	 * quotacheck had written to the buffer.
-	 *
-	 * To avoid this problem, dquot allocation during quotacheck needs to
-	 * avoid logging the initialised buffer, but we still need to have
-	 * writeback of the buffer pin the tail of the log so that it is
-	 * initialised on disk before we remove the allocation transaction from
-	 * the active region of the log. Marking the buffer as ordered instead
-	 * of logging it provides this behaviour.
-	 *
-	 * If we crash before quotacheck completes, a subsequent quotacheck run
-	 * will re-allocate and re-initialize the dquot records as needed.
+	 * To avoid this problem, quotacheck cannot log the initialised buffer.
+	 * We must still dirty the buffer and write it back before the
+	 * allocation transaction clears the log. Therefore, mark the buffer as
+	 * ordered instead of logging it directly. This is safe for quotacheck
+	 * because it detects and repairs allocated but initialized dquot blocks
+	 * in the quota inodes.
 	 */
 	if (!(mp->m_qflags & qflag))
 		xfs_trans_ordered_buf(tp, bp);
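
The replay hazard that both versions of the comment describe can be modelled in a few lines of C. This is a toy sketch and not XFS code: `toy_state`, `toy_alloc_dquot_blk`, `toy_quotacheck_write` and `toy_log_recovery` are hypothetical names, and the model assumes one counter per dquot block and a log that holds at most one initialisation record.

```c
#include <stdbool.h>

#define TOY_INIT_COUNT 0ULL	/* value a freshly initialised dquot block holds */

/* Hypothetical model of one dquot block and the log record that may cover it. */
struct toy_state {
	unsigned long long	disk_count;	/* usage counter as it sits on disk */
	bool			init_in_log;	/* logged copy of the initialised block
						 * is still in the active log */
};

/*
 * Allocate and initialise a dquot block. With log_init set, the initialised
 * contents are logged (the old behaviour); otherwise the buffer is treated
 * as ordered and its contents never enter the log.
 */
static void toy_alloc_dquot_blk(struct toy_state *s, bool log_init)
{
	s->disk_count = TOY_INIT_COUNT;
	s->init_in_log = log_init;
}

/* quotacheck updates the block with a delayed write, bypassing the log. */
static void toy_quotacheck_write(struct toy_state *s, unsigned long long count)
{
	s->disk_count = count;
}

/* Crash and recover: any logged initialisation is replayed over the block. */
static void toy_log_recovery(struct toy_state *s)
{
	if (s->init_in_log) {
		s->disk_count = TOY_INIT_COUNT;
		s->init_in_log = false;
	}
}
```

Calling the three functions in order with `log_init` set to true leaves the counter back at `TOY_INIT_COUNT` after recovery, which is the lost-update case. With `log_init` set to false the value written by the quotacheck step survives, because the log never held a copy of the block to replay.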
On Mon, May 18, 2020 at 09:16:25AM -0400, Brian Foster wrote:
> On Thu, May 14, 2020 at 09:56:58AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> >
> ...
> >
> > Fix this by changing the ondisk dquot initialization function to use
> > ordered buffers to write out fresh dquot blocks if it detects that we're
> > running quotacheck. If the system goes down before quotacheck can
> > complete, the CHKD flags will not be set in the superblock and the next
> > mount will run quotacheck again, which can fix uninitialized dquot
> > buffers. This requires amending the defer code to maintaine ordered
> > buffer state across defer rolls for the sake of the dquot allocation
> > code.
> >
> > For regular operations we preserve the current behavior since the dquot
> > items require properly initialized ondisk dquot records.
> >
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> > v2: rework the code comment explaining all this
> > ---
> >  fs/xfs/libxfs/xfs_defer.c | 10 +++++++
> >  fs/xfs/xfs_dquot.c        | 62 ++++++++++++++++++++++++++++++++++++---------
> >  2 files changed, 58 insertions(+), 14 deletions(-)
> >
> ...
> > diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> > index 52e0f7245afc..f60a8967f9d5 100644
> > --- a/fs/xfs/xfs_dquot.c
> > +++ b/fs/xfs/xfs_dquot.c
> ...
> > @@ -238,11 +240,45 @@ xfs_qm_init_dquot_blk(
> ...
> > +
> > +	/*
> > +	 * When quotacheck runs, we use delayed writes to update all the dquots
> > +	 * on disk in an efficient manner instead of logging the individual
> > +	 * dquot changes as they are made.
> > +	 *
> > +	 * Hence if we log the buffer that we allocate here, then crash
> > +	 * post-quotacheck while the logged initialisation is still in the
> > +	 * active region of the log, we can lose the information quotacheck
> > +	 * wrote directly to the buffer. That is, log recovery will replay the
> > +	 * dquot buffer initialisation over the top of whatever information
> > +	 * quotacheck had written to the buffer.
> > +	 *
> > +	 * To avoid this problem, dquot allocation during quotacheck needs to
> > +	 * avoid logging the initialised buffer, but we still need to have
> > +	 * writeback of the buffer pin the tail of the log so that it is
> > +	 * initialised on disk before we remove the allocation transaction from
> > +	 * the active region of the log. Marking the buffer as ordered instead
> > +	 * of logging it provides this behaviour.
> > +	 *
> > +	 * If we crash before quotacheck completes, a subsequent quotacheck run
> > +	 * will re-allocate and re-initialize the dquot records as needed.
> > +	 */
>
> I took a stab at condensing the comment a bit, FWIW (diff below). LGTM
> either way. Thanks for the update.
>
> Reviewed-by: Brian Foster <bfoster@redhat.com>
>
> > +	if (!(mp->m_qflags & qflag))
> > +		xfs_trans_ordered_buf(tp, bp);
> > +	else
> > +		xfs_trans_log_buf(tp, bp, 0, BBTOB(q->qi_dqchunklen) - 1);
> >  }
> >
> >  /*
> >
>
> diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> index f60a8967f9d5..55b95d45303b 100644
> --- a/fs/xfs/xfs_dquot.c
> +++ b/fs/xfs/xfs_dquot.c
> @@ -254,26 +254,20 @@ xfs_qm_init_dquot_blk(
>  	xfs_trans_dquot_buf(tp, bp, blftype);
>
>  	/*
> -	 * When quotacheck runs, we use delayed writes to update all the dquots
> -	 * on disk in an efficient manner instead of logging the individual
> -	 * dquot changes as they are made.
> +	 * quotacheck uses delayed writes to update all the dquots on disk in an
> +	 * efficient manner instead of logging the individual dquot changes as
> +	 * they are made. However if we log the buffer allocated here and crash
> +	 * after quotacheck while the logged initialisation is still in the
> +	 * active region of the log, log recovery can replay the dquot buffer
> +	 * initialisation over the top of the checked dquots and corrupt quota
> +	 * accounting.
>  	 *
> -	 * Hence if we log the buffer that we allocate here, then crash
> -	 * post-quotacheck while the logged initialisation is still in the
> -	 * active region of the log, we can lose the information quotacheck
> -	 * wrote directly to the buffer. That is, log recovery will replay the
> -	 * dquot buffer initialisation over the top of whatever information
> -	 * quotacheck had written to the buffer.
> -	 *
> -	 * To avoid this problem, dquot allocation during quotacheck needs to
> -	 * avoid logging the initialised buffer, but we still need to have
> -	 * writeback of the buffer pin the tail of the log so that it is
> -	 * initialised on disk before we remove the allocation transaction from
> -	 * the active region of the log. Marking the buffer as ordered instead
> -	 * of logging it provides this behaviour.
> -	 *
> -	 * If we crash before quotacheck completes, a subsequent quotacheck run
> -	 * will re-allocate and re-initialize the dquot records as needed.
> +	 * To avoid this problem, quotacheck cannot log the initialised buffer.
> +	 * We must still dirty the buffer and write it back before the
> +	 * allocation transaction clears the log. Therefore, mark the buffer as
> +	 * ordered instead of logging it directly. This is safe for quotacheck
> +	 * because it detects and repairs allocated but initialized dquot blocks
> +	 * in the quota inodes.

I think I like your revised comment better. :)

--D

>  	 */
>  	if (!(mp->m_qflags & qflag))
>  		xfs_trans_ordered_buf(tp, bp);
>
I would have split the addition of support for ordered buffers to the
defer mechanism into a separate patch.
Otherwise this looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 1172fbf072d8..d8f586256add 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -240,10 +240,13 @@ xfs_defer_trans_roll(
 	struct xfs_log_item		*lip;
 	struct xfs_buf			*bplist[XFS_DEFER_OPS_NR_BUFS];
 	struct xfs_inode		*iplist[XFS_DEFER_OPS_NR_INODES];
+	unsigned int			ordered = 0; /* bitmap */
 	int				bpcount = 0, ipcount = 0;
 	int				i;
 	int				error;

+	BUILD_BUG_ON(NBBY * sizeof(ordered) < XFS_DEFER_OPS_NR_BUFS);
+
 	list_for_each_entry(lip, &tp->t_items, li_trans) {
 		switch (lip->li_type) {
 		case XFS_LI_BUF:
@@ -254,7 +257,10 @@ xfs_defer_trans_roll(
 					ASSERT(0);
 					return -EFSCORRUPTED;
 				}
-				xfs_trans_dirty_buf(tp, bli->bli_buf);
+				if (bli->bli_flags & XFS_BLI_ORDERED)
+					ordered |= (1U << bpcount);
+				else
+					xfs_trans_dirty_buf(tp, bli->bli_buf);
 				bplist[bpcount++] = bli->bli_buf;
 			}
 			break;
@@ -295,6 +301,8 @@ xfs_defer_trans_roll(
 	/* Rejoin the buffers and dirty them so the log moves forward. */
 	for (i = 0; i < bpcount; i++) {
 		xfs_trans_bjoin(tp, bplist[i]);
+		if (ordered & (1U << i))
+			xfs_trans_ordered_buf(tp, bplist[i]);
 		xfs_trans_bhold(tp, bplist[i]);
 	}

diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 52e0f7245afc..f60a8967f9d5 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -205,16 +205,18 @@ xfs_qm_adjust_dqtimers(
  */
 STATIC void
 xfs_qm_init_dquot_blk(
-	xfs_trans_t	*tp,
-	xfs_mount_t	*mp,
-	xfs_dqid_t	id,
-	uint		type,
-	xfs_buf_t	*bp)
+	struct xfs_trans	*tp,
+	struct xfs_mount	*mp,
+	xfs_dqid_t		id,
+	uint			type,
+	struct xfs_buf		*bp)
 {
 	struct xfs_quotainfo	*q = mp->m_quotainfo;
-	xfs_dqblk_t	*d;
-	xfs_dqid_t	curid;
-	int		i;
+	struct xfs_dqblk	*d;
+	xfs_dqid_t		curid;
+	unsigned int		qflag;
+	unsigned int		blftype;
+	int			i;

 	ASSERT(tp);
 	ASSERT(xfs_buf_islocked(bp));
@@ -238,11 +240,45 @@ xfs_qm_init_dquot_blk(
 		}
 	}

-	xfs_trans_dquot_buf(tp, bp,
-			    (type & XFS_DQ_USER ? XFS_BLF_UDQUOT_BUF :
-			    ((type & XFS_DQ_PROJ) ? XFS_BLF_PDQUOT_BUF :
-			     XFS_BLF_GDQUOT_BUF)));
-	xfs_trans_log_buf(tp, bp, 0, BBTOB(q->qi_dqchunklen) - 1);
+	if (type & XFS_DQ_USER) {
+		qflag = XFS_UQUOTA_CHKD;
+		blftype = XFS_BLF_UDQUOT_BUF;
+	} else if (type & XFS_DQ_PROJ) {
+		qflag = XFS_PQUOTA_CHKD;
+		blftype = XFS_BLF_PDQUOT_BUF;
+	} else {
+		qflag = XFS_GQUOTA_CHKD;
+		blftype = XFS_BLF_GDQUOT_BUF;
+	}
+
+	xfs_trans_dquot_buf(tp, bp, blftype);
+
+	/*
+	 * When quotacheck runs, we use delayed writes to update all the dquots
+	 * on disk in an efficient manner instead of logging the individual
+	 * dquot changes as they are made.
+	 *
+	 * Hence if we log the buffer that we allocate here, then crash
+	 * post-quotacheck while the logged initialisation is still in the
+	 * active region of the log, we can lose the information quotacheck
+	 * wrote directly to the buffer. That is, log recovery will replay the
+	 * dquot buffer initialisation over the top of whatever information
+	 * quotacheck had written to the buffer.
+	 *
+	 * To avoid this problem, dquot allocation during quotacheck needs to
+	 * avoid logging the initialised buffer, but we still need to have
+	 * writeback of the buffer pin the tail of the log so that it is
+	 * initialised on disk before we remove the allocation transaction from
+	 * the active region of the log. Marking the buffer as ordered instead
+	 * of logging it provides this behaviour.
+	 *
+	 * If we crash before quotacheck completes, a subsequent quotacheck run
+	 * will re-allocate and re-initialize the dquot records as needed.
+	 */
+	if (!(mp->m_qflags & qflag))
+		xfs_trans_ordered_buf(tp, bp);
+	else
+		xfs_trans_log_buf(tp, bp, 0, BBTOB(q->qi_dqchunklen) - 1);
 }

 /*
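
The xfs_defer.c hunks above carry the ordered state of each held buffer across a transaction roll in a small bitmap. The sketch below isolates that pattern in standalone C so it can be compiled outside the kernel. It is an illustration under assumptions, not the kernel implementation: `toy_buf`, `toy_save_ordered`, `toy_roll`, `toy_restore_ordered` and `NR_BUFS` are hypothetical names, `_Static_assert` stands in for `BUILD_BUG_ON`, and the roll is assumed to do nothing except drop the ordered flag.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define NR_BUFS 2	/* hypothetical stand-in for XFS_DEFER_OPS_NR_BUFS */

/* Toy buffer: tracks only the two states this sketch cares about. */
struct toy_buf {
	bool	ordered;	/* attached to the transaction as ordered */
	bool	dirty;		/* dirtied (logged) in the transaction */
};

/* The bitmap must have at least one bit per held buffer. */
_Static_assert(CHAR_BIT * sizeof(unsigned int) >= NR_BUFS,
	       "ordered bitmap too small for NR_BUFS");

/*
 * Before the roll: remember which buffers are ordered, one bit per index,
 * and dirty every buffer that is not ordered.
 */
static unsigned int toy_save_ordered(struct toy_buf *bufs, int count)
{
	unsigned int	ordered = 0;	/* bitmap */
	int		i;

	assert(count <= NR_BUFS);
	for (i = 0; i < count; i++) {
		if (bufs[i].ordered)
			ordered |= 1U << i;
		else
			bufs[i].dirty = true;
	}
	return ordered;
}

/* Assumed model of a roll: the new transaction forgets the ordered flag. */
static void toy_roll(struct toy_buf *bufs, int count)
{
	int	i;

	for (i = 0; i < count; i++)
		bufs[i].ordered = false;
}

/* After the roll: re-mark exactly the buffers recorded in the bitmap. */
static void toy_restore_ordered(struct toy_buf *bufs, int count,
				unsigned int ordered)
{
	int	i;

	for (i = 0; i < count; i++) {
		if (ordered & (1U << i))
			bufs[i].ordered = true;
	}
}
```

A caller would run `toy_save_ordered`, then `toy_roll`, then `toy_restore_ordered` with the returned bitmap. Indexing the bitmap by position in the held-buffer array keeps the saved state in a single word on the stack, which is why the patch pairs it with a compile-time size check.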