diff mbox series

xfs_db: make sure agblocks is valid to prevent corruption

Message ID 20240821104412.8539-1-liuhuan01@kylinos.cn (mailing list archive)
State Superseded, archived
Headers show
Series xfs_db: make sure agblocks is valid to prevent corruption | expand

Commit Message

liuh Aug. 21, 2024, 10:44 a.m. UTC
From: liuh <liuhuan01@kylinos.cn>

Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process.
	xfs_db -c "sb 0" -c "p agblocks" /dev/loop1

System will generate signal SIGFPE corrupt the process. And the stack as follow:
corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags
	#0  libxfs_getbuf_flags
	#1  libxfs_getbuf_flags
	#2  libxfs_buf_read_map
	#3  libxfs_buf_read
	#4  libxfs_mount
	#5  init
	#6  main

The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0.
In this case, user cannot run in expert mode also.

Never check (mp)->m_sb.sb_agblocks before use it cause this issue.
Make sure (mp)->m_sb.sb_agblocks > 0 before libxfs_mount to prevent corruption and leave a message.

Signed-off-by: liuh <liuhuan01@kylinos.cn>
---
 db/init.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Darrick J. Wong Aug. 23, 2024, 12:49 a.m. UTC | #1
On Wed, Aug 21, 2024 at 06:44:12PM +0800, liuhuan01@kylinos.cn wrote:
> From: liuh <liuhuan01@kylinos.cn>
> 
> Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process.
> 	xfs_db -c "sb 0" -c "p agblocks" /dev/loop1
> 
> System will generate signal SIGFPE corrupt the process. And the stack as follow:
> corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags
> 	#0  libxfs_getbuf_flags
> 	#1  libxfs_getbuf_flags
> 	#2  libxfs_buf_read_map
> 	#3  libxfs_buf_read
> 	#4  libxfs_mount
> 	#5  init
> 	#6  main
> 
> The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0.
> In this case, user cannot run in expert mode also.
> 
> Never check (mp)->m_sb.sb_agblocks before use it cause this issue.
> Make sure (mp)->m_sb.sb_agblocks > 0 before libxfs_mount to prevent corruption and leave a message.
> 
> Signed-off-by: liuh <liuhuan01@kylinos.cn>
> ---
>  db/init.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/db/init.c b/db/init.c
> index cea25ae5..2d3295ba 100644
> --- a/db/init.c
> +++ b/db/init.c
> @@ -129,6 +129,13 @@ init(
>  		}
>  	}
>  
> +	if (unlikely(sbp->sb_agblocks == 0)) {
> +		fprintf(stderr,
> +			_("%s: device %s agblocks unexpected\n"),
> +			progname, x.data.name);
> +		exit(1);

What if we set sb_agblocks to 1 and let the debugger continue?

--D

> +	}
> +
>  	agcount = sbp->sb_agcount;
>  	mp = libxfs_mount(&xmount, sbp, &x, LIBXFS_MOUNT_DEBUGGER);
>  	if (!mp) {
> -- 
> 2.43.0
>
Dave Chinner Aug. 27, 2024, 3:23 a.m. UTC | #2
On Thu, Aug 22, 2024 at 05:49:12PM -0700, Darrick J. Wong wrote:
> On Wed, Aug 21, 2024 at 06:44:12PM +0800, liuhuan01@kylinos.cn wrote:
> > From: liuh <liuhuan01@kylinos.cn>
> > 
> > Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process.
> > 	xfs_db -c "sb 0" -c "p agblocks" /dev/loop1
> > 
> > System will generate signal SIGFPE corrupt the process. And the stack as follow:
> > corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags
> > 	#0  libxfs_getbuf_flags
> > 	#1  libxfs_getbuf_flags
> > 	#2  libxfs_buf_read_map
> > 	#3  libxfs_buf_read
> > 	#4  libxfs_mount
> > 	#5  init
> > 	#6  main
> > 
> > The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0.
> > In this case, user cannot run in expert mode also.
> > 
> > Never check (mp)->m_sb.sb_agblocks before use it cause this issue.
> > Make sure (mp)->m_sb.sb_agblocks > 0 before libxfs_mount to prevent corruption and leave a message.
> > 
> > Signed-off-by: liuh <liuhuan01@kylinos.cn>
> > ---
> >  db/init.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/db/init.c b/db/init.c
> > index cea25ae5..2d3295ba 100644
> > --- a/db/init.c
> > +++ b/db/init.c
> > @@ -129,6 +129,13 @@ init(
> >  		}
> >  	}
> >  
> > +	if (unlikely(sbp->sb_agblocks == 0)) {
> > +		fprintf(stderr,
> > +			_("%s: device %s agblocks unexpected\n"),
> > +			progname, x.data.name);
> > +		exit(1);
> 
> What if we set sb_agblocks to 1 and let the debugger continue?

Yeah, I'd prefer that xfs_db will operate on a corrupt filesystem and
maybe crash unexpectedly than to refuse to allow any diagnosis of
the corrupt filesystem.

xfs_db is a debug and forensic analysis tool. Having it crash
because it didn't handle some corruption entirely corectly isn't
something that we should be particularly worried about...

-Dave.
liuh Aug. 27, 2024, 10:24 a.m. UTC | #3
在 2024/8/27 11:23, Dave Chinner 写道:
> On Thu, Aug 22, 2024 at 05:49:12PM -0700, Darrick J. Wong wrote:
>> On Wed, Aug 21, 2024 at 06:44:12PM +0800, liuhuan01@kylinos.cn wrote:
>>> From: liuh <liuhuan01@kylinos.cn>
>>>
>>> Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process.
>>> 	xfs_db -c "sb 0" -c "p agblocks" /dev/loop1
>>>
>>> System will generate signal SIGFPE corrupt the process. And the stack as follow:
>>> corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags
>>> 	#0  libxfs_getbuf_flags
>>> 	#1  libxfs_getbuf_flags
>>> 	#2  libxfs_buf_read_map
>>> 	#3  libxfs_buf_read
>>> 	#4  libxfs_mount
>>> 	#5  init
>>> 	#6  main
>>>
>>> The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0.
>>> In this case, user cannot run in expert mode also.
>>>
>>> Never check (mp)->m_sb.sb_agblocks before use it cause this issue.
>>> Make sure (mp)->m_sb.sb_agblocks > 0 before libxfs_mount to prevent corruption and leave a message.
>>>
>>> Signed-off-by: liuh <liuhuan01@kylinos.cn>
>>> ---
>>>   db/init.c | 7 +++++++
>>>   1 file changed, 7 insertions(+)
>>>
>>> diff --git a/db/init.c b/db/init.c
>>> index cea25ae5..2d3295ba 100644
>>> --- a/db/init.c
>>> +++ b/db/init.c
>>> @@ -129,6 +129,13 @@ init(
>>>   		}
>>>   	}
>>>   
>>> +	if (unlikely(sbp->sb_agblocks == 0)) {
>>> +		fprintf(stderr,
>>> +			_("%s: device %s agblocks unexpected\n"),
>>> +			progname, x.data.name);
>>> +		exit(1);
>> What if we set sb_agblocks to 1 and let the debugger continue?
> Yeah, I'd prefer that xfs_db will operate on a corrupt filesystem and
> maybe crash unexpectedly than to refuse to allow any diagnosis of
> the corrupt filesystem.
>
> xfs_db is a debug and forensic analysis tool. Having it crash
> because it didn't handle some corruption entirely corectly isn't
> something that we should be particularly worried about...
>
> -Dave.

I agree with both of you, xfs_db is just a debugger tool.
But for the above case, xfs_db can do nothing, even do a simple view on 
primary superblock.
The user all knowns is that xfs_db goto corrupt, but don't know what's 
cause the problem.

If set sb_agblocks to 1, xfs_db can going to work to view on primary 
superblock,
but can't relay on it to view more information.

Maybe left a warning message and set sb_agblocks to 1 and let debugger 
continue is better.

Thanks.
Dave Chinner Aug. 27, 2024, 11:37 p.m. UTC | #4
On Tue, Aug 27, 2024 at 06:24:25PM +0800, liuh wrote:
> 在 2024/8/27 11:23, Dave Chinner 写道:
> > On Thu, Aug 22, 2024 at 05:49:12PM -0700, Darrick J. Wong wrote:
> > > On Wed, Aug 21, 2024 at 06:44:12PM +0800, liuhuan01@kylinos.cn wrote:
> > > > From: liuh <liuhuan01@kylinos.cn>
> > > > 
> > > > Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process.
> > > > 	xfs_db -c "sb 0" -c "p agblocks" /dev/loop1
> > > > 
> > > > System will generate signal SIGFPE corrupt the process. And the stack as follow:
> > > > corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags
> > > > 	#0  libxfs_getbuf_flags
> > > > 	#1  libxfs_getbuf_flags
> > > > 	#2  libxfs_buf_read_map
> > > > 	#3  libxfs_buf_read
> > > > 	#4  libxfs_mount
> > > > 	#5  init
> > > > 	#6  main
> > > > 
> > > > The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0.
> > > > In this case, user cannot run in expert mode also.
> > > > 
> > > > Never check (mp)->m_sb.sb_agblocks before use it cause this issue.
> > > > Make sure (mp)->m_sb.sb_agblocks > 0 before libxfs_mount to prevent corruption and leave a message.
> > > > 
> > > > Signed-off-by: liuh <liuhuan01@kylinos.cn>
> > > > ---
> > > >   db/init.c | 7 +++++++
> > > >   1 file changed, 7 insertions(+)
> > > > 
> > > > diff --git a/db/init.c b/db/init.c
> > > > index cea25ae5..2d3295ba 100644
> > > > --- a/db/init.c
> > > > +++ b/db/init.c
> > > > @@ -129,6 +129,13 @@ init(
> > > >   		}
> > > >   	}
> > > > +	if (unlikely(sbp->sb_agblocks == 0)) {
> > > > +		fprintf(stderr,
> > > > +			_("%s: device %s agblocks unexpected\n"),
> > > > +			progname, x.data.name);
> > > > +		exit(1);
> > > What if we set sb_agblocks to 1 and let the debugger continue?
> > Yeah, I'd prefer that xfs_db will operate on a corrupt filesystem and
> > maybe crash unexpectedly than to refuse to allow any diagnosis of
> > the corrupt filesystem.
> > 
> > xfs_db is a debug and forensic analysis tool. Having it crash
> > because it didn't handle some corruption entirely corectly isn't
> > something that we should be particularly worried about...
> > 
> > -Dave.
> 
> I agree with both of you, xfs_db is just a debugger tool.
> But for the above case, xfs_db can do nothing, even do a simple view on
> primary superblock.
> The user all knowns is that xfs_db goto corrupt, but don't know what's cause
> the problem.
> 
> If set sb_agblocks to 1, xfs_db can going to work to view on primary
> superblock,
> but can't relay on it to view more information.

Yes, a value of "1" will avoid the crash.

However, it is obviously not an ideal choice because a value of 1 is
not a valid value for sb_agblocks.

IOWs, Darricks suggestion really was not to literally use a value of
1, but to substitute a non-zero value that allows xfs_db to largely
function correctly on a filesystem corrupted in this way.

> Maybe left a warning message and set sb_agblocks to 1 and let debugger
> continue is better.

When two senior developers both say "do it this way", it is worth
trying to understand why they made that suggestion, not take the
suggestion as the exact solution to the problem. We've provided
the -framework- for the desired solution, but you still need to do
some work to make it work.

So what is a valid value for sb_agblocks? Is that value more
functional that a value of 1?  If it is more functional, can we
calculate a better approximation of the correct value?

Go an look at all the geometry information we have in
the superblock that we can calculate an approximate AG size from.

	sb_agcount tells us how many AGs there are.
	sb_dblocks tells us the size of the filesystem.
	sb_agblklog tells us the maximum size the AG could possibly be.

Go look for redundant metadata that we can get the exact value of
agblocks from without relying on a specific sb_agblocks value.

	agf->agf_length should always equal sb_agblocks
	agi->agi_length should always equal sb_agblocks

Look at the sb_agblocks verification/bounds checking code to
determine absolute min/max valid values.

	XFS_MIN_AG_BYTES tells us the smallest valid AG size.
	XFS_MIN_AG_BLOCKS tells us the smallest valid AG block count.

So, how would you go about calculating an approximate, if not
exactly correct value for the missing sb_agblocks value?

-Dave.
diff mbox series

Patch

diff --git a/db/init.c b/db/init.c
index cea25ae5..2d3295ba 100644
--- a/db/init.c
+++ b/db/init.c
@@ -129,6 +129,13 @@  init(
 		}
 	}
 
+	if (unlikely(sbp->sb_agblocks == 0)) {
+		fprintf(stderr,
+			_("%s: device %s agblocks unexpected\n"),
+			progname, x.data.name);
+		exit(1);
+	}
+
 	agcount = sbp->sb_agcount;
 	mp = libxfs_mount(&xmount, sbp, &x, LIBXFS_MOUNT_DEBUGGER);
 	if (!mp) {