Message ID | 154630855085.14372.11366121347834353110.stgit@magnolia (mailing list archive) |
---|---|
State | Accepted, archived |
Headers | show |
Series | xfs: inode scrubber fixes | expand |
On Mon, Dec 31, 2018 at 06:09:10PM -0800, Darrick J. Wong wrote: > From: Darrick J. Wong <darrick.wong@oracle.com> > > Teach scrub how to handle the case that there are one or more inobt > records covering a given inode cluster. This fixes the operation on big > block filesystems (e.g. 64k blocks, 512 byte inodes). > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > --- > fs/xfs/scrub/ialloc.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > > diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c > index 2f6c2d7fa3fd..1be6a5ebd61e 100644 > --- a/fs/xfs/scrub/ialloc.c > +++ b/fs/xfs/scrub/ialloc.c > @@ -162,6 +162,7 @@ xchk_iallocbt_check_cluster_ifree( > xfs_ino_t fsino; > xfs_agino_t agino; > unsigned int offset; > + unsigned int cluster_buf_base; > bool irec_free; > bool ino_inuse; > bool freemask_ok; > @@ -174,11 +175,14 @@ xchk_iallocbt_check_cluster_ifree( > * Given an inobt record, an offset of a cluster within the record, > * and an offset of an inode within a cluster, compute which fs inode > * we're talking about and the offset of the inode record within the > - * inode buffer. > + * inode buffer, being careful about inobt records that don't align > + * with the start of the inode buffer when block sizes are large. > */ > agino = irec->ir_startino + cluster_base + cluster_index; > fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino); > - offset = cluster_index * mp->m_sb.sb_inodesize; > + cluster_buf_base = XFS_INO_TO_OFFSET(mp, irec->ir_startino + > + cluster_base); > + offset = (cluster_buf_base + cluster_index) * mp->m_sb.sb_inodesize; So cluster_buf_base should always be 0 unless the record itself is not cluster aligned (i.e., the second record in a large FSB), right? If so, cluster_base should also be 0 in any case where cluster_buf_base != 0. I'm wondering if that can be made a little more explicit, or perhaps self-documented with an assert. Thinking more about it, it's kind of confusing to me either way because cluster_index isn't really a cluster index in this case, rather it's more of a record index because it doesn't account for the fact that the record is a subset of the cluster. Hmmm... could we simplify this by setting imap.im_boffset appropriately in the caller such that dip always points to either the first inode in the buffer or first inode in the record, then just pass in dip directly and let the caller increment it in the associated loop? Maybe that's something for another patch.. Brian > if (offset >= BBTOB(cluster_bp->b_length)) { > xchk_btree_set_corrupt(bs->sc, bs->cur, 0); > goto out; >
On Fri, Jan 04, 2019 at 01:38:56PM -0500, Brian Foster wrote: > On Mon, Dec 31, 2018 at 06:09:10PM -0800, Darrick J. Wong wrote: > > From: Darrick J. Wong <darrick.wong@oracle.com> > > > > Teach scrub how to handle the case that there are one or more inobt > > records covering a given inode cluster. This fixes the operation on big > > block filesystems (e.g. 64k blocks, 512 byte inodes). > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > > --- > > fs/xfs/scrub/ialloc.c | 8 ++++++-- > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > > > diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c > > index 2f6c2d7fa3fd..1be6a5ebd61e 100644 > > --- a/fs/xfs/scrub/ialloc.c > > +++ b/fs/xfs/scrub/ialloc.c > > @@ -162,6 +162,7 @@ xchk_iallocbt_check_cluster_ifree( > > xfs_ino_t fsino; > > xfs_agino_t agino; > > unsigned int offset; > > + unsigned int cluster_buf_base; > > bool irec_free; > > bool ino_inuse; > > bool freemask_ok; > > @@ -174,11 +175,14 @@ xchk_iallocbt_check_cluster_ifree( > > * Given an inobt record, an offset of a cluster within the record, > > * and an offset of an inode within a cluster, compute which fs inode > > * we're talking about and the offset of the inode record within the > > - * inode buffer. > > + * inode buffer, being careful about inobt records that don't align > > + * with the start of the inode buffer when block sizes are large. > > */ > > agino = irec->ir_startino + cluster_base + cluster_index; > > fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino); > > - offset = cluster_index * mp->m_sb.sb_inodesize; > > + cluster_buf_base = XFS_INO_TO_OFFSET(mp, irec->ir_startino + > > + cluster_base); > > + offset = (cluster_buf_base + cluster_index) * mp->m_sb.sb_inodesize; > > So cluster_buf_base should always be 0 unless the record itself is not > cluster aligned (i.e., the second record in a large FSB), right? Right. For the (n > 0)th inobt record for a large FSB, ir_startino's block offset will be less than inopblock, so cluster_buf_base will be greater than zero. > If so, cluster_base should also be 0 in any case where > cluster_buf_base != 0. I'm wondering if that can be made a little > more explicit, or perhaps self-documented with an assert. Right. For the more typical case where there are multiple clusters for each inobt record, ir_startino's block offset will always point to the start of an inode cluster (and therefore cluster_buf_base will be zero) but cluster_base will increment as we visit each cluster in the inobt. I think this code can become: cluster_buf_base = XFS_INO_TO_OFFSET(mp, irec->ir_startino); ASSERT(cluster_buf_base == 0 || cluster_base == 0); offset = (cluster_buf_base + cluster_index) << mp->m_sb.sb_inopblog; But I'll go feed that to the test machines before I commit to that. :) > Thinking more about it, it's kind of confusing to me either way because > cluster_index isn't really a cluster index in this case, rather it's > more of a record index because it doesn't account for the fact that the > record is a subset of the cluster. Hmmm... could we simplify this by > setting imap.im_boffset appropriately in the caller such that dip always > points to either the first inode in the buffer or first inode in the > record, then just pass in dip directly and let the caller increment it > in the associated loop? Maybe that's something for another patch.. I think that would be possible... --D > Brian > > > if (offset >= BBTOB(cluster_bp->b_length)) { > > xchk_btree_set_corrupt(bs->sc, bs->cur, 0); > > goto out; > >
On Fri, Jan 04, 2019 at 04:29:16PM -0800, Darrick J. Wong wrote: > On Fri, Jan 04, 2019 at 01:38:56PM -0500, Brian Foster wrote: > > On Mon, Dec 31, 2018 at 06:09:10PM -0800, Darrick J. Wong wrote: > > > From: Darrick J. Wong <darrick.wong@oracle.com> > > > > > > Teach scrub how to handle the case that there are one or more inobt > > > records covering a given inode cluster. This fixes the operation on big > > > block filesystems (e.g. 64k blocks, 512 byte inodes). > > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > > > --- > > > fs/xfs/scrub/ialloc.c | 8 ++++++-- > > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > > > > > > diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c > > > index 2f6c2d7fa3fd..1be6a5ebd61e 100644 > > > --- a/fs/xfs/scrub/ialloc.c > > > +++ b/fs/xfs/scrub/ialloc.c > > > @@ -162,6 +162,7 @@ xchk_iallocbt_check_cluster_ifree( > > > xfs_ino_t fsino; > > > xfs_agino_t agino; > > > unsigned int offset; > > > + unsigned int cluster_buf_base; > > > bool irec_free; > > > bool ino_inuse; > > > bool freemask_ok; > > > @@ -174,11 +175,14 @@ xchk_iallocbt_check_cluster_ifree( > > > * Given an inobt record, an offset of a cluster within the record, > > > * and an offset of an inode within a cluster, compute which fs inode > > > * we're talking about and the offset of the inode record within the > > > - * inode buffer. > > > + * inode buffer, being careful about inobt records that don't align > > > + * with the start of the inode buffer when block sizes are large. > > > */ > > > agino = irec->ir_startino + cluster_base + cluster_index; > > > fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino); > > > - offset = cluster_index * mp->m_sb.sb_inodesize; > > > + cluster_buf_base = XFS_INO_TO_OFFSET(mp, irec->ir_startino + > > > + cluster_base); > > > + offset = (cluster_buf_base + cluster_index) * mp->m_sb.sb_inodesize; > > > > So cluster_buf_base should always be 0 unless the record itself is not > > cluster aligned (i.e., the second record in a large FSB), right? > > Right. For the (n > 0)th inobt record for a large FSB, ir_startino's > block offset will be less than inopblock, so cluster_buf_base will be > greater than zero. > > > If so, cluster_base should also be 0 in any case where > > cluster_buf_base != 0. I'm wondering if that can be made a little > > more explicit, or perhaps self-documented with an assert. > > Right. For the more typical case where there are multiple clusters for > each inobt record, ir_startino's block offset will always point to the > start of an inode cluster (and therefore cluster_buf_base will be zero) > but cluster_base will increment as we visit each cluster in the inobt. > > I think this code can become: > > cluster_buf_base = XFS_INO_TO_OFFSET(mp, irec->ir_startino); > ASSERT(cluster_buf_base == 0 || cluster_base == 0); > offset = (cluster_buf_base + cluster_index) << mp->m_sb.sb_inopblog; > > But I'll go feed that to the test machines before I commit to that. :) > > > Thinking more about it, it's kind of confusing to me either way because > > cluster_index isn't really a cluster index in this case, rather it's > > more of a record index because it doesn't account for the fact that the > > record is a subset of the cluster. Hmmm... could we simplify this by > > setting imap.im_boffset appropriately in the caller such that dip always > > points to either the first inode in the buffer or first inode in the > > record, then just pass in dip directly and let the caller increment it > > in the associated loop? Maybe that's something for another patch.. > > I think that would be possible... > If so, I think it's worthwhile because then you'd only need to pass the record, the inode index and the dip pointer to this function. AFAICT you'd just need to pull up the cluster_buf_base logic here (along with a useful comment) to the im_boffset assignment in the caller. Brian > --D > > > Brian > > > > > if (offset >= BBTOB(cluster_bp->b_length)) { > > > xchk_btree_set_corrupt(bs->sc, bs->cur, 0); > > > goto out; > > >
On Mon, Jan 07, 2019 at 08:45:55AM -0500, Brian Foster wrote: > On Fri, Jan 04, 2019 at 04:29:16PM -0800, Darrick J. Wong wrote: > > On Fri, Jan 04, 2019 at 01:38:56PM -0500, Brian Foster wrote: > > > On Mon, Dec 31, 2018 at 06:09:10PM -0800, Darrick J. Wong wrote: > > > > From: Darrick J. Wong <darrick.wong@oracle.com> > > > > > > > > Teach scrub how to handle the case that there are one or more inobt > > > > records covering a given inode cluster. This fixes the operation on big > > > > block filesystems (e.g. 64k blocks, 512 byte inodes). > > > > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > > > > --- > > > > fs/xfs/scrub/ialloc.c | 8 ++++++-- > > > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > > > > > > > > > diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c > > > > index 2f6c2d7fa3fd..1be6a5ebd61e 100644 > > > > --- a/fs/xfs/scrub/ialloc.c > > > > +++ b/fs/xfs/scrub/ialloc.c > > > > @@ -162,6 +162,7 @@ xchk_iallocbt_check_cluster_ifree( > > > > xfs_ino_t fsino; > > > > xfs_agino_t agino; > > > > unsigned int offset; > > > > + unsigned int cluster_buf_base; > > > > bool irec_free; > > > > bool ino_inuse; > > > > bool freemask_ok; > > > > @@ -174,11 +175,14 @@ xchk_iallocbt_check_cluster_ifree( > > > > * Given an inobt record, an offset of a cluster within the record, > > > > * and an offset of an inode within a cluster, compute which fs inode > > > > * we're talking about and the offset of the inode record within the > > > > - * inode buffer. > > > > + * inode buffer, being careful about inobt records that don't align > > > > + * with the start of the inode buffer when block sizes are large. > > > > */ > > > > agino = irec->ir_startino + cluster_base + cluster_index; > > > > fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino); > > > > - offset = cluster_index * mp->m_sb.sb_inodesize; > > > > + cluster_buf_base = XFS_INO_TO_OFFSET(mp, irec->ir_startino + > > > > + cluster_base); > > > > + offset = (cluster_buf_base + cluster_index) * mp->m_sb.sb_inodesize; > > > > > > So cluster_buf_base should always be 0 unless the record itself is not > > > cluster aligned (i.e., the second record in a large FSB), right? > > > > Right. For the (n > 0)th inobt record for a large FSB, ir_startino's > > block offset will be less than inopblock, so cluster_buf_base will be > > greater than zero. > > > > > If so, cluster_base should also be 0 in any case where > > > cluster_buf_base != 0. I'm wondering if that can be made a little > > > more explicit, or perhaps self-documented with an assert. > > > > Right. For the more typical case where there are multiple clusters for > > each inobt record, ir_startino's block offset will always point to the > > start of an inode cluster (and therefore cluster_buf_base will be zero) > > but cluster_base will increment as we visit each cluster in the inobt. > > > > I think this code can become: > > > > cluster_buf_base = XFS_INO_TO_OFFSET(mp, irec->ir_startino); > > ASSERT(cluster_buf_base == 0 || cluster_base == 0); > > offset = (cluster_buf_base + cluster_index) << mp->m_sb.sb_inopblog; > > > > But I'll go feed that to the test machines before I commit to that. :) > > > > > Thinking more about it, it's kind of confusing to me either way because > > > cluster_index isn't really a cluster index in this case, rather it's > > > more of a record index because it doesn't account for the fact that the > > > record is a subset of the cluster. Hmmm... could we simplify this by > > > setting imap.im_boffset appropriately in the caller such that dip always > > > points to either the first inode in the buffer or first inode in the > > > record, then just pass in dip directly and let the caller increment it > > > in the associated loop? Maybe that's something for another patch.. > > > > I think that would be possible... > > > > If so, I think it's worthwhile because then you'd only need to pass the > record, the inode index and the dip pointer to this function. AFAICT > you'd just need to pull up the cluster_buf_base logic here (along with > a useful comment) to the im_boffset assignment in the caller. <nod> It looks like it cleans things up quite a bit. --D > Brian > > > --D > > > > > Brian > > > > > > > if (offset >= BBTOB(cluster_bp->b_length)) { > > > > xchk_btree_set_corrupt(bs->sc, bs->cur, 0); > > > > goto out; > > > >
diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c index 2f6c2d7fa3fd..1be6a5ebd61e 100644 --- a/fs/xfs/scrub/ialloc.c +++ b/fs/xfs/scrub/ialloc.c @@ -162,6 +162,7 @@ xchk_iallocbt_check_cluster_ifree( xfs_ino_t fsino; xfs_agino_t agino; unsigned int offset; + unsigned int cluster_buf_base; bool irec_free; bool ino_inuse; bool freemask_ok; @@ -174,11 +175,14 @@ xchk_iallocbt_check_cluster_ifree( * Given an inobt record, an offset of a cluster within the record, * and an offset of an inode within a cluster, compute which fs inode * we're talking about and the offset of the inode record within the - * inode buffer. + * inode buffer, being careful about inobt records that don't align + * with the start of the inode buffer when block sizes are large. */ agino = irec->ir_startino + cluster_base + cluster_index; fsino = XFS_AGINO_TO_INO(mp, bs->cur->bc_private.a.agno, agino); - offset = cluster_index * mp->m_sb.sb_inodesize; + cluster_buf_base = XFS_INO_TO_OFFSET(mp, irec->ir_startino + + cluster_base); + offset = (cluster_buf_base + cluster_index) * mp->m_sb.sb_inodesize; if (offset >= BBTOB(cluster_bp->b_length)) { xchk_btree_set_corrupt(bs->sc, bs->cur, 0); goto out;