diff mbox

mkfs: fix divide-by-zero in align_ag_geometry

Message ID 20180621025520.9115-1-jeffm@suse.com (mailing list archive)
State Superseded
Headers show

Commit Message

Jeff Mahoney June 21, 2018, 2:55 a.m. UTC
From: Jeff Mahoney <jeffm@suse.com>

Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the
AG alignment code into a separate function.  It got rid of
redundant checks for dswidth != 0 but did too good a job since now
it doesn't check at all.  When we hit the check to see if agsize
is a multiple of stripe width: (cfg->agsize % cfg->dswidth), we crash
on a divide by zero.

This patch re-adds the check to the top of align_ag_geometry.

Fixes: 051b4e37f5e (mkfs: factor AG alignment)
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
---
 mkfs/xfs_mkfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Dave Chinner June 21, 2018, 3:57 a.m. UTC | #1
On Wed, Jun 20, 2018 at 10:55:20PM -0400, jeffm@suse.com wrote:
> From: Jeff Mahoney <jeffm@suse.com>
> 
> Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the
> AG alignment code into a separate function.  It got rid of
> redundant checks for dswidth != 0 but did too good a job since now
> it doesn't check at all.

Of course they got removed - we've already validated the CLI input
and guaranteed that cfg->dswidth can only be zero iff cfg->dsunit is
zero in calc_stripe_factors().

i.e. We do input validation of CLI paramters before anything else so
that later users (like align_ag_geometry()) can assume the
parameters they are using are valid. In this case, the assumption is
that either both dsunit and dswidth are zero or that both are
non-zero and dswidth an integer multple of dsunit.

> When we hit the check to see if agsize
> is a multiple of stripe width: (cfg->agsize % cfg->dswidth), we crash
> on a divide by zero.

What CLI config did you use to hit this? I'd like to reproduce it so
I can see where calc_stripe_factors() is going wrong....

Cheers,

Dave.
Jeff Mahoney June 21, 2018, 7:15 p.m. UTC | #2
On 6/20/18 11:57 PM, Dave Chinner wrote:
> On Wed, Jun 20, 2018 at 10:55:20PM -0400, jeffm@suse.com wrote:
>> From: Jeff Mahoney <jeffm@suse.com>
>>
>> Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the
>> AG alignment code into a separate function.  It got rid of
>> redundant checks for dswidth != 0 but did too good a job since now
>> it doesn't check at all.
> 
> Of course they got removed - we've already validated the CLI input
> and guaranteed that cfg->dswidth can only be zero iff cfg->dsunit is
> zero in calc_stripe_factors().
> 
> i.e. We do input validation of CLI paramters before anything else so
> that later users (like align_ag_geometry()) can assume the
> parameters they are using are valid. In this case, the assumption is
> that either both dsunit and dswidth are zero or that both are
> non-zero and dswidth an integer multple of dsunit.

It's not coming from the CLI parameters.  It's coming from the topology.
 The blkid topology stuff is returning 8k for minimal i/o and 0 for
optimal.  Without a CLI config, we have dunit=0 in calc_stripe_factors,
which takes it from the device.  We set cfg->dsunit=16 and
cfg->dswidth=0, and then head down to align_ag_geometry.

The topology on this system looks like:

ft = {dsunit = 16, dswidth = 0, rtswidth = 0, lsectorsize = 512,
psectorsize = 512}

That matches with a few of the dm targets I see reported on this system.

Since minimal io size isn't sector size, we don't clear it.  Talking
with Eric, we should probably just extent that check in
blkid_get_topology to cover that case since it's not like we can just
reject what blkid gives us.

>> When we hit the check to see if agsize
>> is a multiple of stripe width: (cfg->agsize % cfg->dswidth), we crash
>> on a divide by zero.
> 
> What CLI config did you use to hit this? I'd like to reproduce it so
> I can see where calc_stripe_factors() is going wrong....

It was just "mkfs.xfs <path-to-mpath-device>"

-Jeff
Eric Sandeen June 21, 2018, 7:26 p.m. UTC | #3
On 6/21/18 2:15 PM, Jeff Mahoney wrote:
> On 6/20/18 11:57 PM, Dave Chinner wrote:
>> On Wed, Jun 20, 2018 at 10:55:20PM -0400, jeffm@suse.com wrote:
>>> From: Jeff Mahoney <jeffm@suse.com>
>>>
>>> Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the
>>> AG alignment code into a separate function.  It got rid of
>>> redundant checks for dswidth != 0 but did too good a job since now
>>> it doesn't check at all.
>>
>> Of course they got removed - we've already validated the CLI input
>> and guaranteed that cfg->dswidth can only be zero iff cfg->dsunit is
>> zero in calc_stripe_factors().
>>
>> i.e. We do input validation of CLI paramters before anything else so
>> that later users (like align_ag_geometry()) can assume the
>> parameters they are using are valid. In this case, the assumption is
>> that either both dsunit and dswidth are zero or that both are
>> non-zero and dswidth an integer multple of dsunit.
> 
> It's not coming from the CLI parameters.  It's coming from the topology.
>  The blkid topology stuff is returning 8k for minimal i/o and 0 for
> optimal.  Without a CLI config, we have dunit=0 in calc_stripe_factors,
> which takes it from the device.  We set cfg->dsunit=16 and
> cfg->dswidth=0, and then head down to align_ag_geometry.
> 
> The topology on this system looks like:
> 
> ft = {dsunit = 16, dswidth = 0, rtswidth = 0, lsectorsize = 512,
> psectorsize = 512}
> 
> That matches with a few of the dm targets I see reported on this system.
> 
> Since minimal io size isn't sector size, we don't clear it.  Talking
> with Eric, we should probably just extent that check in
> blkid_get_topology to cover that case since it's not like we can just
> reject what blkid gives us.
> 
>>> When we hit the check to see if agsize
>>> is a multiple of stripe width: (cfg->agsize % cfg->dswidth), we crash
>>> on a divide by zero.
>>
>> What CLI config did you use to hit this? I'd like to reproduce it so
>> I can see where calc_stripe_factors() is going wrong....
> 
> It was just "mkfs.xfs <path-to-mpath-device>"

yeah, so in blkid_get_topology we have:

        /*
         * If the reported values are the same as the physical sector size
         * do not bother to report anything.  It will only cause warnings
         * if people specify larger stripe units or widths manually.
         */
        if (*sunit == *psectorsize || *swidth == *psectorsize) {
                *sunit = 0;
                *swidth = 0;
        }

as a sanity check.  We should probably extend that (or add a similar test)
which sets both to zero if either is zero, for the same reason as the CLI
validation does it.  Comments should explain why....

        /*
         * Ensure that if either sunit or stripe width is zero, the other
         * is as well.  Having only one set is not valid stripe geometry.
         */
        if (*sunit == 0 || *swidth == 0) {
                *sunit = 0;
                *swidth = 0;
        }

FWIW if I set 

        *sunit = 8192;
        *swidth = 0;

manually in blkid_get_topology I do get the same floating point exception;
if that's what we get from libblkid for some weird device, we'd go boom.

-Eric
Dave Chinner June 21, 2018, 7:49 p.m. UTC | #4
On Thu, Jun 21, 2018 at 02:26:45PM -0500, Eric Sandeen wrote:
> On 6/21/18 2:15 PM, Jeff Mahoney wrote:
> > On 6/20/18 11:57 PM, Dave Chinner wrote:
> >> On Wed, Jun 20, 2018 at 10:55:20PM -0400, jeffm@suse.com wrote:
> >>> From: Jeff Mahoney <jeffm@suse.com>
> >>>
> >>> Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the
> >>> AG alignment code into a separate function.  It got rid of
> >>> redundant checks for dswidth != 0 but did too good a job since now
> >>> it doesn't check at all.
> >>
> >> Of course they got removed - we've already validated the CLI input
> >> and guaranteed that cfg->dswidth can only be zero iff cfg->dsunit is
> >> zero in calc_stripe_factors().
> >>
> >> i.e. We do input validation of CLI paramters before anything else so
> >> that later users (like align_ag_geometry()) can assume the
> >> parameters they are using are valid. In this case, the assumption is
> >> that either both dsunit and dswidth are zero or that both are
> >> non-zero and dswidth an integer multple of dsunit.
> > 
> > It's not coming from the CLI parameters.  It's coming from the topology.
> >  The blkid topology stuff is returning 8k for minimal i/o and 0 for
> > optimal.  Without a CLI config, we have dunit=0 in calc_stripe_factors,
> > which takes it from the device.  We set cfg->dsunit=16 and
> > cfg->dswidth=0, and then head down to align_ag_geometry.
> > 
> > The topology on this system looks like:
> > 
> > ft = {dsunit = 16, dswidth = 0, rtswidth = 0, lsectorsize = 512,
> > 
> > psectorsize = 512}
> > That matches with a few of the dm targets I see reported on this system.

Exactly what kind of dm device does that - we've never had anyone
report this before? Also, if it's a dm device, shouldn't it
also be fixed to output a sane set of IO characteristics in /sys?

> > Since minimal io size isn't sector size, we don't clear it.  Talking
> > with Eric, we should probably just extent that check in
> > blkid_get_topology to cover that case since it's not like we can just
> > reject what blkid gives us.
> > 
> >>> When we hit the check to see if agsize
> >>> is a multiple of stripe width: (cfg->agsize % cfg->dswidth), we crash
> >>> on a divide by zero.
> >>
> >> What CLI config did you use to hit this? I'd like to reproduce it so
> >> I can see where calc_stripe_factors() is going wrong....
> > 
> > It was just "mkfs.xfs <path-to-mpath-device>"
> 
> yeah, so in blkid_get_topology we have:
> 
>         /*
>          * If the reported values are the same as the physical sector size
>          * do not bother to report anything.  It will only cause warnings
>          * if people specify larger stripe units or widths manually.
>          */
>         if (*sunit == *psectorsize || *swidth == *psectorsize) {
>                 *sunit = 0;
>                 *swidth = 0;
>         }
> 
> as a sanity check.  We should probably extend that (or add a similar test)
> which sets both to zero if either is zero, for the same reason as the CLI
> validation does it.  Comments should explain why....
> 
>         /*
>          * Ensure that if either sunit or stripe width is zero, the other
>          * is as well.  Having only one set is not valid stripe geometry.
>          */
>         if (*sunit == 0 || *swidth == 0) {
>                 *sunit = 0;
>                 *swidth = 0;
>         }

Yup, we need to validate that input properly before using it. Silly
me - I should have known better that to assume external code would
give back sane results.....

Cheers,

Dave.
Jeff Mahoney June 21, 2018, 9:31 p.m. UTC | #5
On 6/21/18 3:49 PM, Dave Chinner wrote:
> On Thu, Jun 21, 2018 at 02:26:45PM -0500, Eric Sandeen wrote:
>> On 6/21/18 2:15 PM, Jeff Mahoney wrote:
>>> On 6/20/18 11:57 PM, Dave Chinner wrote:
>>>> On Wed, Jun 20, 2018 at 10:55:20PM -0400, jeffm@suse.com wrote:
>>>>> From: Jeff Mahoney <jeffm@suse.com>
>>>>>
>>>>> Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the
>>>>> AG alignment code into a separate function.  It got rid of
>>>>> redundant checks for dswidth != 0 but did too good a job since now
>>>>> it doesn't check at all.
>>>>
>>>> Of course they got removed - we've already validated the CLI input
>>>> and guaranteed that cfg->dswidth can only be zero iff cfg->dsunit is
>>>> zero in calc_stripe_factors().
>>>>
>>>> i.e. We do input validation of CLI paramters before anything else so
>>>> that later users (like align_ag_geometry()) can assume the
>>>> parameters they are using are valid. In this case, the assumption is
>>>> that either both dsunit and dswidth are zero or that both are
>>>> non-zero and dswidth an integer multple of dsunit.
>>>
>>> It's not coming from the CLI parameters.  It's coming from the topology.
>>>  The blkid topology stuff is returning 8k for minimal i/o and 0 for
>>> optimal.  Without a CLI config, we have dunit=0 in calc_stripe_factors,
>>> which takes it from the device.  We set cfg->dsunit=16 and
>>> cfg->dswidth=0, and then head down to align_ag_geometry.
>>>
>>> The topology on this system looks like:
>>>
>>> ft = {dsunit = 16, dswidth = 0, rtswidth = 0, lsectorsize = 512,
>>>
>>> psectorsize = 512}
>>> That matches with a few of the dm targets I see reported on this system.
> 
> Exactly what kind of dm device does that - we've never had anyone
> report this before? Also, if it's a dm device, shouldn't it
> also be fixed to output a sane set of IO characteristics in /sys?

It's multipath, so it just follows the stacking rules.  The underlying
SCSI devices report the same numbers.  The optimal io number is
documented as being optional, at least for the kernel, so we need to
handle it being 0 anyway.  I'm not sure if the device specifying a
minimum i/o size larger than the sector size and also not specifying an
optimal i/o size is valid SCSI.  I'll ask for more information since now
I'm also curious.

-Jeff
Dave Chinner June 21, 2018, 10:36 p.m. UTC | #6
On Thu, Jun 21, 2018 at 05:31:58PM -0400, Jeff Mahoney wrote:
> On 6/21/18 3:49 PM, Dave Chinner wrote:
> > On Thu, Jun 21, 2018 at 02:26:45PM -0500, Eric Sandeen wrote:
> >> On 6/21/18 2:15 PM, Jeff Mahoney wrote:
> >>> On 6/20/18 11:57 PM, Dave Chinner wrote:
> >>>> On Wed, Jun 20, 2018 at 10:55:20PM -0400, jeffm@suse.com wrote:
> >>>>> From: Jeff Mahoney <jeffm@suse.com>
> >>>>>
> >>>>> Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the
> >>>>> AG alignment code into a separate function.  It got rid of
> >>>>> redundant checks for dswidth != 0 but did too good a job since now
> >>>>> it doesn't check at all.
> >>>>
> >>>> Of course they got removed - we've already validated the CLI input
> >>>> and guaranteed that cfg->dswidth can only be zero iff cfg->dsunit is
> >>>> zero in calc_stripe_factors().
> >>>>
> >>>> i.e. We do input validation of CLI paramters before anything else so
> >>>> that later users (like align_ag_geometry()) can assume the
> >>>> parameters they are using are valid. In this case, the assumption is
> >>>> that either both dsunit and dswidth are zero or that both are
> >>>> non-zero and dswidth an integer multple of dsunit.
> >>>
> >>> It's not coming from the CLI parameters.  It's coming from the topology.
> >>>  The blkid topology stuff is returning 8k for minimal i/o and 0 for
> >>> optimal.  Without a CLI config, we have dunit=0 in calc_stripe_factors,
> >>> which takes it from the device.  We set cfg->dsunit=16 and
> >>> cfg->dswidth=0, and then head down to align_ag_geometry.
> >>>
> >>> The topology on this system looks like:
> >>>
> >>> ft = {dsunit = 16, dswidth = 0, rtswidth = 0, lsectorsize = 512,
> >>>
> >>> psectorsize = 512}
> >>> That matches with a few of the dm targets I see reported on this system.
> > 
> > Exactly what kind of dm device does that - we've never had anyone
> > report this before? Also, if it's a dm device, shouldn't it
> > also be fixed to output a sane set of IO characteristics in /sys?
> 
> It's multipath, so it just follows the stacking rules.  The underlying
> SCSI devices report the same numbers.  The optimal io number is
> documented as being optional, at least for the kernel, so we need to
> handle it being 0 anyway.  I'm not sure if the device specifying a
> minimum i/o size larger than the sector size and also not specifying an
> optimal i/o size is valid SCSI.  I'll ask for more information since now
> I'm also curious.

Ah, so it came from the hardware? In that case, we probably
shouldn't zero sunit when blkid reports this whacky case. i.e. I
think we should set swidth = sunit so that we retain allocation
alignment to the minimum IO size the device specified.

Cheers,

Dave.
Eric Sandeen June 21, 2018, 11:16 p.m. UTC | #7
On 6/21/18 5:36 PM, Dave Chinner wrote:
> On Thu, Jun 21, 2018 at 05:31:58PM -0400, Jeff Mahoney wrote:
>> On 6/21/18 3:49 PM, Dave Chinner wrote:
>>> On Thu, Jun 21, 2018 at 02:26:45PM -0500, Eric Sandeen wrote:
>>>> On 6/21/18 2:15 PM, Jeff Mahoney wrote:
>>>>> On 6/20/18 11:57 PM, Dave Chinner wrote:
>>>>>> On Wed, Jun 20, 2018 at 10:55:20PM -0400, jeffm@suse.com wrote:
>>>>>>> From: Jeff Mahoney <jeffm@suse.com>
>>>>>>>
>>>>>>> Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the
>>>>>>> AG alignment code into a separate function.  It got rid of
>>>>>>> redundant checks for dswidth != 0 but did too good a job since now
>>>>>>> it doesn't check at all.
>>>>>>
>>>>>> Of course they got removed - we've already validated the CLI input
>>>>>> and guaranteed that cfg->dswidth can only be zero iff cfg->dsunit is
>>>>>> zero in calc_stripe_factors().
>>>>>>
>>>>>> i.e. We do input validation of CLI paramters before anything else so
>>>>>> that later users (like align_ag_geometry()) can assume the
>>>>>> parameters they are using are valid. In this case, the assumption is
>>>>>> that either both dsunit and dswidth are zero or that both are
>>>>>> non-zero and dswidth an integer multple of dsunit.
>>>>>
>>>>> It's not coming from the CLI parameters.  It's coming from the topology.
>>>>>  The blkid topology stuff is returning 8k for minimal i/o and 0 for
>>>>> optimal.  Without a CLI config, we have dunit=0 in calc_stripe_factors,
>>>>> which takes it from the device.  We set cfg->dsunit=16 and
>>>>> cfg->dswidth=0, and then head down to align_ag_geometry.
>>>>>
>>>>> The topology on this system looks like:
>>>>>
>>>>> ft = {dsunit = 16, dswidth = 0, rtswidth = 0, lsectorsize = 512,
>>>>>
>>>>> psectorsize = 512}
>>>>> That matches with a few of the dm targets I see reported on this system.
>>>
>>> Exactly what kind of dm device does that - we've never had anyone
>>> report this before? Also, if it's a dm device, shouldn't it
>>> also be fixed to output a sane set of IO characteristics in /sys?
>>
>> It's multipath, so it just follows the stacking rules.  The underlying
>> SCSI devices report the same numbers.  The optimal io number is
>> documented as being optional, at least for the kernel, so we need to
>> handle it being 0 anyway.  I'm not sure if the device specifying a
>> minimum i/o size larger than the sector size and also not specifying an
>> optimal i/o size is valid SCSI.  I'll ask for more information since now
>> I'm also curious.
> 
> Ah, so it came from the hardware? In that case, we probably
> shouldn't zero sunit when blkid reports this whacky case. i.e. I
> think we should set swidth = sunit so that we retain allocation
> alignment to the minimum IO size the device specified.

Hrmph.

yeah, I'd really like to see 

# blockdev --getiomin --getioopt --getss --getpbsz

for all the devices in the stack, I guess...

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen June 22, 2018, 1:34 a.m. UTC | #8
On 6/21/18 6:16 PM, Eric Sandeen wrote:
>> Ah, so it came from the hardware? In that case, we probably
>> shouldn't zero sunit when blkid reports this whacky case. i.e. I
>> think we should set swidth = sunit so that we retain allocation
>> alignment to the minimum IO size the device specified.
> Hrmph.
> 
> yeah, I'd really like to see 
> 
> # blockdev --getiomin --getioopt --getss --getpbsz
> 
> for all the devices in the stack, I guess...

Ok, so Jeff shared the lsblk output with me; the underlying
device and the mpath device are both:

MIN-IO OPT-IO PHY-SEC LOG-SEC
8192      0     512     512

so logical/physical 512, with minimum io 8192 and optimal io 0.
On further inspection Jeff says this may be because an optimal
size is reported as greater than the maximum size, so we get 0
due to the inconsistency.

Ok, well, I guess I see your point that we should probably set
sunit=swidth=8192 here.  I was thinking that conflicted with
how we process it on the commandline but no, it doesn't; if you're
going to manually specify you can't set one to 0 and not the other,
but we have to make the best of what the hardware tells us.

So we could do:

        /*
         * Some odd hardware sets minimum IO but not optimal; assume
         * minimal is the smallest preferred IO size, and set optimal
	 * to the same value, i.e. a stripe of 1 disk.
         */
        if (*sunit && *swidth == 0)
                *swidth = *sunit;

But it's clearly buggy hardware.  How are we to know which values 
are accurate and which are not?  It still may be better to just set
to no stripe geometry if we've gotten nonsense from libblockdev,
if the device wants better performance it needs to have rational
firmware... trying to 2nd guess bad values is just a crapshoot.
I guess we could issue a warning ...

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Mahoney June 22, 2018, 1:40 a.m. UTC | #9
On 6/21/18 9:34 PM, Eric Sandeen wrote:
> On 6/21/18 6:16 PM, Eric Sandeen wrote:
>>> Ah, so it came from the hardware? In that case, we probably
>>> shouldn't zero sunit when blkid reports this whacky case. i.e. I
>>> think we should set swidth = sunit so that we retain allocation
>>> alignment to the minimum IO size the device specified.
>> Hrmph.
>>
>> yeah, I'd really like to see 
>>
>> # blockdev --getiomin --getioopt --getss --getpbsz
>>
>> for all the devices in the stack, I guess...
> 
> Ok, so Jeff shared the lsblk output with me; the underlying
> device and the mpath device are both:
> 
> MIN-IO OPT-IO PHY-SEC LOG-SEC
> 8192      0     512     512
> 
> so logical/physical 512, with minimum io 8192 and optimal io 0.
> On further inspection Jeff says this may be because an optimal
> size is reported as greater than the maximum size, so we get 0
> due to the inconsistency.

For this device, the user reported the following sg_vpd -a output:
  Optimal transfer length granularity: 16 blocks
   -- This is what gets exported as minimum_io_size
  Maximum transfer length: 32768 blocks
  Optimal transfer length: 65535 blocks

The SCSI layer checks to see if the optimal blocks value is larger than
what the device reports as its max and whether it's larger than 65535 or
smaller than page size, and refuses to set the optimal io size so it
defaults to 0.  Given the values above, that has to be what's happening
here.

> Ok, well, I guess I see your point that we should probably set
> sunit=swidth=8192 here.  I was thinking that conflicted with
> how we process it on the commandline but no, it doesn't; if you're
> going to manually specify you can't set one to 0 and not the other,
> but we have to make the best of what the hardware tells us.
> 
> So we could do:
> 
>         /*
>          * Some odd hardware sets minimum IO but not optimal; assume
>          * minimal is the smallest preferred IO size, and set optimal
> 	 * to the same value, i.e. a stripe of 1 disk.
>          */
>         if (*sunit && *swidth == 0)
>                 *swidth = *sunit;
> 
> But it's clearly buggy hardware.  How are we to know which values 
> are accurate and which are not?  It still may be better to just set
> to no stripe geometry if we've gotten nonsense from libblockdev,
> if the device wants better performance it needs to have rational
> firmware... trying to 2nd guess bad values is just a crapshoot.
> I guess we could issue a warning ...

I think the sane defaults + warning is the way to go here too.

-Jeff
Jeff Mahoney July 19, 2018, 9:09 p.m. UTC | #10
On 6/21/18 9:34 PM, Eric Sandeen wrote:
> On 6/21/18 6:16 PM, Eric Sandeen wrote:
>>> Ah, so it came from the hardware? In that case, we probably
>>> shouldn't zero sunit when blkid reports this whacky case. i.e. I
>>> think we should set swidth = sunit so that we retain allocation
>>> alignment to the minimum IO size the device specified.
>> Hrmph.
>>
>> yeah, I'd really like to see 
>>
>> # blockdev --getiomin --getioopt --getss --getpbsz
>>
>> for all the devices in the stack, I guess...
> 
> Ok, so Jeff shared the lsblk output with me; the underlying
> device and the mpath device are both:
> 
> MIN-IO OPT-IO PHY-SEC LOG-SEC
> 8192      0     512     512
> 
> so logical/physical 512, with minimum io 8192 and optimal io 0.
> On further inspection Jeff says this may be because an optimal
> size is reported as greater than the maximum size, so we get 0
> due to the inconsistency.
> 
> Ok, well, I guess I see your point that we should probably set
> sunit=swidth=8192 here.  I was thinking that conflicted with
> how we process it on the commandline but no, it doesn't; if you're
> going to manually specify you can't set one to 0 and not the other,
> but we have to make the best of what the hardware tells us.
> 
> So we could do:
> 
>         /*
>          * Some odd hardware sets minimum IO but not optimal; assume
>          * minimal is the smallest preferred IO size, and set optimal
> 	 * to the same value, i.e. a stripe of 1 disk.
>          */
>         if (*sunit && *swidth == 0)
>                 *swidth = *sunit;
> 
> But it's clearly buggy hardware.  How are we to know which values 
> are accurate and which are not?  It still may be better to just set
> to no stripe geometry if we've gotten nonsense from libblockdev,
> if the device wants better performance it needs to have rational
> firmware... trying to 2nd guess bad values is just a crapshoot.
> I guess we could issue a warning ...

Getting back to this after some time off.  I agree we should warn, but
if we warn, we need to do it when we have more context.  Otherwise we'll
warn even when the user specifies stripe parameters, which just looks
sloppy.  I have an updated patch that puts it in calc_stripe_factors and
only issues the warning if we will consume those values.

-Jeff
diff mbox

Patch

diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 78d0ce5d..28a7e70c 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -2670,7 +2670,7 @@  align_ag_geometry(
 	uint64_t	tmp_agsize;
 	int		dsunit = cfg->dsunit;
 
-	if (!dsunit)
+	if (!dsunit || !cfg->dswidth)
 		goto validate;
 
 	/*