Message ID | 20240821104412.8539-1-liuhuan01@kylinos.cn (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
Series | xfs_db: make sure agblocks is valid to prevent corruption | expand |
On Wed, Aug 21, 2024 at 06:44:12PM +0800, liuhuan01@kylinos.cn wrote: > From: liuh <liuhuan01@kylinos.cn> > > Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process. > xfs_db -c "sb 0" -c "p agblocks" /dev/loop1 > > System will generate signal SIGFPE corrupt the process. And the stack as follow: > corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags > #0 libxfs_getbuf_flags > #1 libxfs_getbuf_flags > #2 libxfs_buf_read_map > #3 libxfs_buf_read > #4 libxfs_mount > #5 init > #6 main > > The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0. > In this case, user cannot run in expert mode also. > > Never check (mp)->m_sb.sb_agblocks before use it cause this issue. > Make sure (mp)->m_sb.sb_agblocks > 0 before libxfs_mount to prevent corruption and leave a message. > > Signed-off-by: liuh <liuhuan01@kylinos.cn> > --- > db/init.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/db/init.c b/db/init.c > index cea25ae5..2d3295ba 100644 > --- a/db/init.c > +++ b/db/init.c > @@ -129,6 +129,13 @@ init( > } > } > > + if (unlikely(sbp->sb_agblocks == 0)) { > + fprintf(stderr, > + _("%s: device %s agblocks unexpected\n"), > + progname, x.data.name); > + exit(1); What if we set sb_agblocks to 1 and let the debugger continue? --D > + } > + > agcount = sbp->sb_agcount; > mp = libxfs_mount(&xmount, sbp, &x, LIBXFS_MOUNT_DEBUGGER); > if (!mp) { > -- > 2.43.0 >
On Thu, Aug 22, 2024 at 05:49:12PM -0700, Darrick J. Wong wrote: > On Wed, Aug 21, 2024 at 06:44:12PM +0800, liuhuan01@kylinos.cn wrote: > > From: liuh <liuhuan01@kylinos.cn> > > > > Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process. > > xfs_db -c "sb 0" -c "p agblocks" /dev/loop1 > > > > System will generate signal SIGFPE corrupt the process. And the stack as follow: > > corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags > > #0 libxfs_getbuf_flags > > #1 libxfs_getbuf_flags > > #2 libxfs_buf_read_map > > #3 libxfs_buf_read > > #4 libxfs_mount > > #5 init > > #6 main > > > > The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0. > > In this case, user cannot run in expert mode also. > > > > Never check (mp)->m_sb.sb_agblocks before use it cause this issue. > > Make sure (mp)->m_sb.sb_agblocks > 0 before libxfs_mount to prevent corruption and leave a message. > > > > Signed-off-by: liuh <liuhuan01@kylinos.cn> > > --- > > db/init.c | 7 +++++++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/db/init.c b/db/init.c > > index cea25ae5..2d3295ba 100644 > > --- a/db/init.c > > +++ b/db/init.c > > @@ -129,6 +129,13 @@ init( > > } > > } > > > > + if (unlikely(sbp->sb_agblocks == 0)) { > > + fprintf(stderr, > > + _("%s: device %s agblocks unexpected\n"), > > + progname, x.data.name); > > + exit(1); > > What if we set sb_agblocks to 1 and let the debugger continue? Yeah, I'd prefer that xfs_db will operate on a corrupt filesystem and maybe crash unexpectedly than to refuse to allow any diagnosis of the corrupt filesystem. xfs_db is a debug and forensic analysis tool. Having it crash because it didn't handle some corruption entirely corectly isn't something that we should be particularly worried about... -Dave.
在 2024/8/27 11:23, Dave Chinner 写道: > On Thu, Aug 22, 2024 at 05:49:12PM -0700, Darrick J. Wong wrote: >> On Wed, Aug 21, 2024 at 06:44:12PM +0800, liuhuan01@kylinos.cn wrote: >>> From: liuh <liuhuan01@kylinos.cn> >>> >>> Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process. >>> xfs_db -c "sb 0" -c "p agblocks" /dev/loop1 >>> >>> System will generate signal SIGFPE corrupt the process. And the stack as follow: >>> corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags >>> #0 libxfs_getbuf_flags >>> #1 libxfs_getbuf_flags >>> #2 libxfs_buf_read_map >>> #3 libxfs_buf_read >>> #4 libxfs_mount >>> #5 init >>> #6 main >>> >>> The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0. >>> In this case, user cannot run in expert mode also. >>> >>> Never check (mp)->m_sb.sb_agblocks before use it cause this issue. >>> Make sure (mp)->m_sb.sb_agblocks > 0 before libxfs_mount to prevent corruption and leave a message. >>> >>> Signed-off-by: liuh <liuhuan01@kylinos.cn> >>> --- >>> db/init.c | 7 +++++++ >>> 1 file changed, 7 insertions(+) >>> >>> diff --git a/db/init.c b/db/init.c >>> index cea25ae5..2d3295ba 100644 >>> --- a/db/init.c >>> +++ b/db/init.c >>> @@ -129,6 +129,13 @@ init( >>> } >>> } >>> >>> + if (unlikely(sbp->sb_agblocks == 0)) { >>> + fprintf(stderr, >>> + _("%s: device %s agblocks unexpected\n"), >>> + progname, x.data.name); >>> + exit(1); >> What if we set sb_agblocks to 1 and let the debugger continue? > Yeah, I'd prefer that xfs_db will operate on a corrupt filesystem and > maybe crash unexpectedly than to refuse to allow any diagnosis of > the corrupt filesystem. > > xfs_db is a debug and forensic analysis tool. Having it crash > because it didn't handle some corruption entirely corectly isn't > something that we should be particularly worried about... > > -Dave. I agree with both of you, xfs_db is just a debugger tool. But for the above case, xfs_db can do nothing, even do a simple view on primary superblock. The user all knowns is that xfs_db goto corrupt, but don't know what's cause the problem. If set sb_agblocks to 1, xfs_db can going to work to view on primary superblock, but can't relay on it to view more information. Maybe left a warning message and set sb_agblocks to 1 and let debugger continue is better. Thanks.
On Tue, Aug 27, 2024 at 06:24:25PM +0800, liuh wrote: > 在 2024/8/27 11:23, Dave Chinner 写道: > > On Thu, Aug 22, 2024 at 05:49:12PM -0700, Darrick J. Wong wrote: > > > On Wed, Aug 21, 2024 at 06:44:12PM +0800, liuhuan01@kylinos.cn wrote: > > > > From: liuh <liuhuan01@kylinos.cn> > > > > > > > > Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process. > > > > xfs_db -c "sb 0" -c "p agblocks" /dev/loop1 > > > > > > > > System will generate signal SIGFPE corrupt the process. And the stack as follow: > > > > corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags > > > > #0 libxfs_getbuf_flags > > > > #1 libxfs_getbuf_flags > > > > #2 libxfs_buf_read_map > > > > #3 libxfs_buf_read > > > > #4 libxfs_mount > > > > #5 init > > > > #6 main > > > > > > > > The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0. > > > > In this case, user cannot run in expert mode also. > > > > > > > > Never check (mp)->m_sb.sb_agblocks before use it cause this issue. > > > > Make sure (mp)->m_sb.sb_agblocks > 0 before libxfs_mount to prevent corruption and leave a message. > > > > > > > > Signed-off-by: liuh <liuhuan01@kylinos.cn> > > > > --- > > > > db/init.c | 7 +++++++ > > > > 1 file changed, 7 insertions(+) > > > > > > > > diff --git a/db/init.c b/db/init.c > > > > index cea25ae5..2d3295ba 100644 > > > > --- a/db/init.c > > > > +++ b/db/init.c > > > > @@ -129,6 +129,13 @@ init( > > > > } > > > > } > > > > + if (unlikely(sbp->sb_agblocks == 0)) { > > > > + fprintf(stderr, > > > > + _("%s: device %s agblocks unexpected\n"), > > > > + progname, x.data.name); > > > > + exit(1); > > > What if we set sb_agblocks to 1 and let the debugger continue? > > Yeah, I'd prefer that xfs_db will operate on a corrupt filesystem and > > maybe crash unexpectedly than to refuse to allow any diagnosis of > > the corrupt filesystem. > > > > xfs_db is a debug and forensic analysis tool. Having it crash > > because it didn't handle some corruption entirely corectly isn't > > something that we should be particularly worried about... > > > > -Dave. > > I agree with both of you, xfs_db is just a debugger tool. > But for the above case, xfs_db can do nothing, even do a simple view on > primary superblock. > The user all knowns is that xfs_db goto corrupt, but don't know what's cause > the problem. > > If set sb_agblocks to 1, xfs_db can going to work to view on primary > superblock, > but can't relay on it to view more information. Yes, a value of "1" will avoid the crash. However, it is obviously not an ideal choice because a value of 1 is not a valid value for sb_agblocks. IOWs, Darricks suggestion really was not to literally use a value of 1, but to substitute a non-zero value that allows xfs_db to largely function correctly on a filesystem corrupted in this way. > Maybe left a warning message and set sb_agblocks to 1 and let debugger > continue is better. When two senior developers both say "do it this way", it is worth trying to understand why they made that suggestion, not take the suggestion as the exact solution to the problem. We've provided the -framework- for the desired solution, but you still need to do some work to make it work. So what is a valid value for sb_agblocks? Is that value more functional that a value of 1? If it is more functional, can we calculate a better approximation of the correct value? Go an look at all the geometry information we have in the superblock that we can calculate an approximate AG size from. sb_agcount tells us how many AGs there are. sb_dblocks tells us the size of the filesystem. sb_agblklog tells us the maximum size the AG could possibly be. Go look for redundant metadata that we can get the exact value of agblocks from without relying on a specific sb_agblocks value. agf->agf_length should always equal sb_agblocks agi->agi_length should always equal sb_agblocks Look at the sb_agblocks verification/bounds checking code to determine absolute min/max valid values. XFS_MIN_AG_BYTES tells us the smallest valid AG size. XFS_MIN_AG_BLOCKS tells us the smallest valid AG block count. So, how would you go about calculating an approximate, if not exactly correct value for the missing sb_agblocks value? -Dave.
diff --git a/db/init.c b/db/init.c index cea25ae5..2d3295ba 100644 --- a/db/init.c +++ b/db/init.c @@ -129,6 +129,13 @@ init( } } + if (unlikely(sbp->sb_agblocks == 0)) { + fprintf(stderr, + _("%s: device %s agblocks unexpected\n"), + progname, x.data.name); + exit(1); + } + agcount = sbp->sb_agcount; mp = libxfs_mount(&xmount, sbp, &x, LIBXFS_MOUNT_DEBUGGER); if (!mp) {