
[V2] mkfs: avoid divide-by-zero when hardware reports optimal i/o size as 0

Message ID c347c3d7-2181-d114-7547-2649bb2f1022@redhat.com (mailing list archive)
State Accepted

Commit Message

Eric Sandeen Aug. 1, 2018, 8:49 p.m. UTC
From: Jeff Mahoney <jeffm@suse.com>

Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the
AG alignment code into a separate function.  It got rid of
redundant checks for dswidth != 0 since calc_stripe_factors was
supposed to guarantee that if dsunit is non-zero dswidth will be
as well.  Unfortunately, there's hardware out there that reports its
optimal i/o size as larger than the maximum i/o size; the kernel
treats that as broken and zeros out the optimal i/o size.  mkfs then
sees a non-zero dsunit with a zero dswidth and divides by zero in
the AG alignment code.

To resolve this we can check the topology before consuming it, and
ignore the bad stripe geometry.

[sandeen: remove guessing heuristic, just warn and ignore bad data.]

Fixes: 051b4e37f5e (mkfs: factor AG alignment)
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
---

So, I rewrote this a bit.  I'm not a fan of guessing what the kernel
really must have meant, because next time the root cause may be different.
In other cases we ignore bad geometry, and I think we should in this case
as well.  This will also let me go forward with a factored-out geometry
checker: for user-specified badness we'll warn and exit, and for
kernel-provided badness we'll warn and ignore.
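
For readers who want the failure mode spelled out, here is a minimal
sketch.  It is not the xfsprogs code: struct topo, ag_is_width_aligned()
and topo_is_consistent() are hypothetical names made up for illustration.
It only shows why a non-zero stripe unit paired with a zero stripe width
is dangerous once the dswidth != 0 checks are gone, and what "check the
topology before consuming it" amounts to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the device topology mkfs gets from the kernel. */
struct topo {
	int dsunit;	/* stripe unit, derived from the minimum I/O size */
	int dswidth;	/* stripe width, derived from the optimal I/O size */
};

/*
 * Hypothetical alignment helper.  It relies on the invariant that a
 * non-zero stripe unit implies a non-zero stripe width.
 */
static bool ag_is_width_aligned(uint64_t agsize, int dswidth)
{
	/* Without this guarantee the modulo below divides by zero. */
	assert(dswidth != 0);
	return agsize % dswidth == 0;
}

/* True when the reported geometry is safe to hand to the helper above. */
static bool topo_is_consistent(const struct topo *t)
{
	return !(t->dsunit != 0 && t->dswidth == 0);
}
```

A caller would test topo_is_consistent() first and only consume the
stripe geometry when it returns true.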


--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jeff Mahoney Aug. 1, 2018, 8:55 p.m. UTC | #1
Looks good to me.  I'm fine falling back to zeroed values.  Thanks for 
following up on this.

Reviewed-by: Jeff Mahoney <jeffm@suse.com>

-Jeff


On 8/1/18 4:49 PM, Eric Sandeen wrote:
> From: Jeff Mahoney <jeffm@suse.com>
> 
> Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the
> AG alignment code into a separate function.  It got rid of
> redundant checks for dswidth != 0 since calc_stripe_factors was
> supposed to guarantee that if dsunit is non-zero dswidth will be
> as well.  Unfortunately, there's hardware out there that reports its
> optimal i/o size as larger than the maximum i/o size, which the kernel
> treats as broken and zeros out the optimal i/o size.
> 
> To resolve this we can check the topology before consuming it, and
> ignore the bad stripe geometry.
> 
> [sandeen: remove guessing heuristic, just warn and ignore bad data.]
> 
> Fixes: 051b4e37f5e (mkfs: factor AG alignment)
> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
> ---
> 
> so, I rewrote this a bit.  I'm not a fan of guessing what the kernel
> really must have meant, because next time the root cause may be different.
> In other cases we ignore bad geometry, I think we should in this case as
> well.  This will also let me go forward with a factored-out geometry checker,
> and for user-specified badness we'll warn and exit, for kernel-provided
> badness we'll warn and ignore.
> 
> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> index 1074886..2e53c1e 100644
> --- a/mkfs/xfs_mkfs.c
> +++ b/mkfs/xfs_mkfs.c
> @@ -2281,11 +2281,20 @@ _("data stripe width (%d) must be a multiple of the data stripe unit (%d)\n"),
>   
>   	/* if no stripe config set, use the device default */
>   	if (!dsunit) {
> -		dsunit = ft->dsunit;
> -		dswidth = ft->dswidth;
> -		use_dev = true;
> +		/* Ignore nonsense from device.  XXX add more validation */
> +		if (ft->dsunit && ft->dswidth == 0) {
> +			fprintf(stderr,
> +_("%s: Volume reports stripe unit of %d bytes and stripe width of 0, ignoring.\n"),
> +				progname, BBTOB(ft->dsunit));
> +			ft->dsunit = 0;
> +			ft->dswidth = 0;
> +		} else {
> +			dsunit = ft->dsunit;
> +			dswidth = ft->dswidth;
> +			use_dev = true;
> +		}
>   	} else {
> -		/* check and warn is alignment is sub-optimal */
> +		/* check and warn if user-specified alignment is sub-optimal */
>   		if (ft->dsunit && ft->dsunit != dsunit) {
>   			fprintf(stderr,
>   _("%s: Specified data stripe unit %d is not the same as the volume stripe unit %d\n"),
> 
>
Carlos Maiolino Aug. 2, 2018, 9:34 a.m. UTC | #2
On Wed, Aug 01, 2018 at 03:49:45PM -0500, Eric Sandeen wrote:
> From: Jeff Mahoney <jeffm@suse.com>
> 
> Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the
> AG alignment code into a separate function.  It got rid of
> redundant checks for dswidth != 0 since calc_stripe_factors was
> supposed to guarantee that if dsunit is non-zero dswidth will be
> as well.  Unfortunately, there's hardware out there that reports its
> optimal i/o size as larger than the maximum i/o size, which the kernel
> treats as broken and zeros out the optimal i/o size.
> 
> To resolve this we can check the topology before consuming it, and
> ignore the bad stripe geometry.
> 
> [sandeen: remove guessing heuristic, just warn and ignore bad data.]
> 
> Fixes: 051b4e37f5e (mkfs: factor AG alignment)
> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
> ---
> 

I'm ok with it, but I'm starting to think calc_stripe_factors() is growing more
than it should, so I'm wondering whether we should factor all these validation
paths out into separate functions.  I'm ok doing that if you guys think it's not
a waste of time.

For the patch itself:

Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

> so, I rewrote this a bit.  I'm not a fan of guessing what the kernel
> really must have meant, because next time the root cause may be different.
> In other cases we ignore bad geometry, I think we should in this case as
> well.  This will also let me go forward with a factored-out geometry checker,
> and for user-specified badness we'll warn and exit, for kernel-provided
> badness we'll warn and ignore.
> 
> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> index 1074886..2e53c1e 100644
> --- a/mkfs/xfs_mkfs.c
> +++ b/mkfs/xfs_mkfs.c
> @@ -2281,11 +2281,20 @@ _("data stripe width (%d) must be a multiple of the data stripe unit (%d)\n"),
>  
>  	/* if no stripe config set, use the device default */
>  	if (!dsunit) {
> -		dsunit = ft->dsunit;
> -		dswidth = ft->dswidth;
> -		use_dev = true;
> +		/* Ignore nonsense from device.  XXX add more validation */
> +		if (ft->dsunit && ft->dswidth == 0) {
> +			fprintf(stderr,
> +_("%s: Volume reports stripe unit of %d bytes and stripe width of 0, ignoring.\n"),
> +				progname, BBTOB(ft->dsunit));
> +			ft->dsunit = 0;
> +			ft->dswidth = 0;
> +		} else {
> +			dsunit = ft->dsunit;
> +			dswidth = ft->dswidth;
> +			use_dev = true;
> +		}
>  	} else {
> -		/* check and warn is alignment is sub-optimal */
> +		/* check and warn if user-specified alignment is sub-optimal */
>  		if (ft->dsunit && ft->dsunit != dsunit) {
>  			fprintf(stderr,
>  _("%s: Specified data stripe unit %d is not the same as the volume stripe unit %d\n"),
> 
Eric Sandeen Aug. 2, 2018, 3:32 p.m. UTC | #3
On 8/2/18 4:34 AM, Carlos Maiolino wrote:
> On Wed, Aug 01, 2018 at 03:49:45PM -0500, Eric Sandeen wrote:
>> From: Jeff Mahoney <jeffm@suse.com>
>>
>> Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the
>> AG alignment code into a separate function.  It got rid of
>> redundant checks for dswidth != 0 since calc_stripe_factors was
>> supposed to guarantee that if dsunit is non-zero dswidth will be
>> as well.  Unfortunately, there's hardware out there that reports its
>> optimal i/o size as larger than the maximum i/o size, which the kernel
>> treats as broken and zeros out the optimal i/o size.
>>
>> To resolve this we can check the topology before consuming it, and
>> ignore the bad stripe geometry.
>>
>> [sandeen: remove guessing heuristic, just warn and ignore bad data.]
>>
>> Fixes: 051b4e37f5e (mkfs: factor AG alignment)
>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
>> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
>> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
>> ---
>>
> 
> I'm ok with it, but I'm starting to think calc_stripe_factors() is growing more
> than it should, so I'm wondering whether we should factor all these validation
> paths out into separate functions.  I'm ok doing that if you guys think it's not
> a waste of time.

Yes, that's what I said below.  ;)

>> This will also let me go forward with a factored-out geometry checker,
>> and for user-specified badness we'll warn and exit, for kernel-provided
>> badness we'll warn and ignore.

I've already written something up for this, sorry - was waiting for next
cycle to send it out.

-Eric

> For the patch itself:
> 
> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
> 
>> so, I rewrote this a bit.  I'm not a fan of guessing what the kernel
>> really must have meant, because next time the root cause may be different.
>> In other cases we ignore bad geometry, I think we should in this case as
>> well.  This will also let me go forward with a factored-out geometry checker,
>> and for user-specified badness we'll warn and exit, for kernel-provided
>> badness we'll warn and ignore.
>>
>> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
>> index 1074886..2e53c1e 100644
>> --- a/mkfs/xfs_mkfs.c
>> +++ b/mkfs/xfs_mkfs.c
>> @@ -2281,11 +2281,20 @@ _("data stripe width (%d) must be a multiple of the data stripe unit (%d)\n"),
>>  
>>  	/* if no stripe config set, use the device default */
>>  	if (!dsunit) {
>> -		dsunit = ft->dsunit;
>> -		dswidth = ft->dswidth;
>> -		use_dev = true;
>> +		/* Ignore nonsense from device.  XXX add more validation */
>> +		if (ft->dsunit && ft->dswidth == 0) {
>> +			fprintf(stderr,
>> +_("%s: Volume reports stripe unit of %d bytes and stripe width of 0, ignoring.\n"),
>> +				progname, BBTOB(ft->dsunit));
>> +			ft->dsunit = 0;
>> +			ft->dswidth = 0;
>> +		} else {
>> +			dsunit = ft->dsunit;
>> +			dswidth = ft->dswidth;
>> +			use_dev = true;
>> +		}
>>  	} else {
>> -		/* check and warn is alignment is sub-optimal */
>> +		/* check and warn if user-specified alignment is sub-optimal */
>>  		if (ft->dsunit && ft->dsunit != dsunit) {
>>  			fprintf(stderr,
>>  _("%s: Specified data stripe unit %d is not the same as the volume stripe unit %d\n"),
>>
> 

Dave Chinner Aug. 5, 2018, 10:20 p.m. UTC | #4
On Wed, Aug 01, 2018 at 03:49:45PM -0500, Eric Sandeen wrote:
> From: Jeff Mahoney <jeffm@suse.com>
> 
> Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the
> AG alignment code into a separate function.  It got rid of
> redundant checks for dswidth != 0 since calc_stripe_factors was
> supposed to guarantee that if dsunit is non-zero dswidth will be
> as well.  Unfortunately, there's hardware out there that reports its
> optimal i/o size as larger than the maximum i/o size, which the kernel
> treats as broken and zeros out the optimal i/o size.
> 
> To resolve this we can check the topology before consuming it, and
> ignore the bad stripe geometry.
> 
> [sandeen: remove guessing heuristic, just warn and ignore bad data.]
> 
> Fixes: 051b4e37f5e (mkfs: factor AG alignment)
> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
> ---
> 
> so, I rewrote this a bit.  I'm not a fan of guessing what the kernel
> really must have meant, because next time the root cause may be different.
> In other cases we ignore bad geometry, I think we should in this case as
> well.  This will also let me go forward with a factored-out geometry checker,
> and for user-specified badness we'll warn and exit, for kernel-provided
> badness we'll warn and ignore.
> 
> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> index 1074886..2e53c1e 100644
> --- a/mkfs/xfs_mkfs.c
> +++ b/mkfs/xfs_mkfs.c
> @@ -2281,11 +2281,20 @@ _("data stripe width (%d) must be a multiple of the data stripe unit (%d)\n"),
>  
>  	/* if no stripe config set, use the device default */
>  	if (!dsunit) {
> -		dsunit = ft->dsunit;
> -		dswidth = ft->dswidth;
> -		use_dev = true;
> +		/* Ignore nonsense from device.  XXX add more validation */
> +		if (ft->dsunit && ft->dswidth == 0) {
> +			fprintf(stderr,
> +_("%s: Volume reports stripe unit of %d bytes and stripe width of 0, ignoring.\n"),
> +				progname, BBTOB(ft->dsunit));
> +			ft->dsunit = 0;
> +			ft->dswidth = 0;

Not sure this is the right thing to do. If a stripe unit has been
given, then the device has an alignment requirement. If it hasn't
given an "optimal IO size", then shouldn't we just set ft->dswidth =
ft->dsunit to retain the alignment the device requested?
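
A rough sketch of that alternative, for comparison with the patch.  The
names struct topo and topo_fixup_width() are hypothetical and this is
not code from the thread; it assumes both fields are in the same units.

```c
#include <stdbool.h>

/* Hypothetical, reduced view of the device topology. */
struct topo {
	int dsunit;	/* stripe unit reported by the device */
	int dswidth;	/* stripe width; 0 when no optimal I/O size is given */
};

/*
 * Keep the alignment the device asked for: when a stripe unit is
 * reported without a stripe width, treat the width as one stripe unit.
 * Returns true when the topology was adjusted.
 */
static bool topo_fixup_width(struct topo *t)
{
	if (t->dsunit != 0 && t->dswidth == 0) {
		t->dswidth = t->dsunit;
		return true;
	}
	return false;
}
```

With this variant the stripe unit survives and the width is never zero
when the unit is set, so later divisions by dswidth stay safe.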

Cheers,

Dave.
Eric Sandeen Aug. 6, 2018, 4:06 a.m. UTC | #5
On 8/5/18 5:20 PM, Dave Chinner wrote:
> On Wed, Aug 01, 2018 at 03:49:45PM -0500, Eric Sandeen wrote:
>> From: Jeff Mahoney <jeffm@suse.com>
>>
>> Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the
>> AG alignment code into a separate function.  It got rid of
>> redundant checks for dswidth != 0 since calc_stripe_factors was
>> supposed to guarantee that if dsunit is non-zero dswidth will be
>> as well.  Unfortunately, there's hardware out there that reports its
>> optimal i/o size as larger than the maximum i/o size, which the kernel
>> treats as broken and zeros out the optimal i/o size.
>>
>> To resolve this we can check the topology before consuming it, and
>> ignore the bad stripe geometry.
>>
>> [sandeen: remove guessing heuristic, just warn and ignore bad data.]
>>
>> Fixes: 051b4e37f5e (mkfs: factor AG alignment)
>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
>> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
>> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
>> ---
>>
>> so, I rewrote this a bit.  I'm not a fan of guessing what the kernel
>> really must have meant, because next time the root cause may be different.
>> In other cases we ignore bad geometry, I think we should in this case as
>> well.  This will also let me go forward with a factored-out geometry checker,
>> and for user-specified badness we'll warn and exit, for kernel-provided
>> badness we'll warn and ignore.
>>
>> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
>> index 1074886..2e53c1e 100644
>> --- a/mkfs/xfs_mkfs.c
>> +++ b/mkfs/xfs_mkfs.c
>> @@ -2281,11 +2281,20 @@ _("data stripe width (%d) must be a multiple of the data stripe unit (%d)\n"),
>>  
>>  	/* if no stripe config set, use the device default */
>>  	if (!dsunit) {
>> -		dsunit = ft->dsunit;
>> -		dswidth = ft->dswidth;
>> -		use_dev = true;
>> +		/* Ignore nonsense from device.  XXX add more validation */
>> +		if (ft->dsunit && ft->dswidth == 0) {
>> +			fprintf(stderr,
>> +_("%s: Volume reports stripe unit of %d bytes and stripe width of 0, ignoring.\n"),
>> +				progname, BBTOB(ft->dsunit));
>> +			ft->dsunit = 0;
>> +			ft->dswidth = 0;
> 
> Not sure this is the right thing to do. If a stripe unit has been
> given, then the device has an alignment requirement. If it hasn't
> given an "optimal IO size", then shouldn't we just set ft->dswidth =
> ft->dsunit to retain the alignment the device requested?

Yeah, I'm on the fence about that.  If it's giving us inconsistent information,
how can we know what's right and wrong?

-Eric
Dave Chinner Aug. 6, 2018, 10:27 p.m. UTC | #6
On Sun, Aug 05, 2018 at 11:06:57PM -0500, Eric Sandeen wrote:
> On 8/5/18 5:20 PM, Dave Chinner wrote:
> > On Wed, Aug 01, 2018 at 03:49:45PM -0500, Eric Sandeen wrote:
> >> From: Jeff Mahoney <jeffm@suse.com>
> >>
> >> Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the
> >> AG alignment code into a separate function.  It got rid of
> >> redundant checks for dswidth != 0 since calc_stripe_factors was
> >> supposed to guarantee that if dsunit is non-zero dswidth will be
> >> as well.  Unfortunately, there's hardware out there that reports its
> >> optimal i/o size as larger than the maximum i/o size, which the kernel
> >> treats as broken and zeros out the optimal i/o size.
> >>
> >> To resolve this we can check the topology before consuming it, and
> >> ignore the bad stripe geometry.
> >>
> >> [sandeen: remove guessing heuristic, just warn and ignore bad data.]
> >>
> >> Fixes: 051b4e37f5e (mkfs: factor AG alignment)
> >> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> >> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
> >> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
> >> ---
> >>
> >> so, I rewrote this a bit.  I'm not a fan of guessing what the kernel
> >> really must have meant, because next time the root cause may be different.
> >> In other cases we ignore bad geometry, I think we should in this case as
> >> well.  This will also let me go forward with a factored-out geometry checker,
> >> and for user-specified badness we'll warn and exit, for kernel-provided
> >> badness we'll warn and ignore.
> >>
> >> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> >> index 1074886..2e53c1e 100644
> >> --- a/mkfs/xfs_mkfs.c
> >> +++ b/mkfs/xfs_mkfs.c
> >> @@ -2281,11 +2281,20 @@ _("data stripe width (%d) must be a multiple of the data stripe unit (%d)\n"),
> >>  
> >>  	/* if no stripe config set, use the device default */
> >>  	if (!dsunit) {
> >> -		dsunit = ft->dsunit;
> >> -		dswidth = ft->dswidth;
> >> -		use_dev = true;
> >> +		/* Ignore nonsense from device.  XXX add more validation */
> >> +		if (ft->dsunit && ft->dswidth == 0) {
> >> +			fprintf(stderr,
> >> +_("%s: Volume reports stripe unit of %d bytes and stripe width of 0, ignoring.\n"),
> >> +				progname, BBTOB(ft->dsunit));
> >> +			ft->dsunit = 0;
> >> +			ft->dswidth = 0;
> > 
> > Not sure this is the right thing to do. If a stripe unit has been
> > given, then the device has an alignment requirement. If it hasn't
> > given an "optimal IO size", then shouldn't we just set ft->dswidth =
> > ft->dsunit to retain the alignment the device requested?
> 
> Yeah, I'm on the fence about that.  If it's giving us inconsistent information,
> how can we know what's right and wrong?

In general, adding alignment when it's not needed does not hurt
performance. However, not having alignment when it is needed almost
always hurts performance.

From that perspective, I think what we should do here is obvious :P

Cheers,

Dave.

Patch

diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 1074886..2e53c1e 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -2281,11 +2281,20 @@  _("data stripe width (%d) must be a multiple of the data stripe unit (%d)\n"),
 
 	/* if no stripe config set, use the device default */
 	if (!dsunit) {
-		dsunit = ft->dsunit;
-		dswidth = ft->dswidth;
-		use_dev = true;
+		/* Ignore nonsense from device.  XXX add more validation */
+		if (ft->dsunit && ft->dswidth == 0) {
+			fprintf(stderr,
+_("%s: Volume reports stripe unit of %d bytes and stripe width of 0, ignoring.\n"),
+				progname, BBTOB(ft->dsunit));
+			ft->dsunit = 0;
+			ft->dswidth = 0;
+		} else {
+			dsunit = ft->dsunit;
+			dswidth = ft->dswidth;
+			use_dev = true;
+		}
 	} else {
-		/* check and warn is alignment is sub-optimal */
+		/* check and warn if user-specified alignment is sub-optimal */
 		if (ft->dsunit && ft->dsunit != dsunit) {
 			fprintf(stderr,
 _("%s: Specified data stripe unit %d is not the same as the volume stripe unit %d\n"),
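
To try the patched branch outside of mkfs, the logic can be sketched as
a standalone function.  This is an approximation under stated
assumptions: struct fs_topology_sketch and use_device_stripe() are
hypothetical names, the real code lives inside calc_stripe_factors()
with a different signature, and the gettext _() wrapper is dropped.
BBTOB is reproduced as a shift by 9 (512-byte basic blocks).

```c
#include <stdbool.h>
#include <stdio.h>

#define BBSHIFT		9	/* 512-byte basic blocks */
#define BBTOB(bbs)	((bbs) << BBSHIFT)

/* Hypothetical, reduced version of the topology structure. */
struct fs_topology_sketch {
	int dsunit;	/* device stripe unit, in basic blocks */
	int dswidth;	/* device stripe width, in basic blocks */
};

/*
 * Mirrors the "no stripe config set" branch of the patch.
 * Returns true when the device-provided geometry was adopted.
 */
static bool use_device_stripe(const char *progname,
			      struct fs_topology_sketch *ft,
			      int *dsunit, int *dswidth)
{
	if (*dsunit)
		return false;	/* user-specified geometry, not covered here */

	/* Ignore nonsense from the device: unit set but width zero. */
	if (ft->dsunit && ft->dswidth == 0) {
		fprintf(stderr,
			"%s: Volume reports stripe unit of %d bytes "
			"and stripe width of 0, ignoring.\n",
			progname, BBTOB(ft->dsunit));
		ft->dsunit = 0;
		ft->dswidth = 0;
		return false;
	}

	*dsunit = ft->dsunit;
	*dswidth = ft->dswidth;
	return true;
}
```

Calling it with a topology of { 128, 0 } prints the warning and leaves
both caller values at zero; a topology of { 128, 512 } is copied through
unchanged.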