Message ID | a673fbd3-5038-2dc8-8135-a58c24042734@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mkfs.xfs: sanity check stripe geometry from blkid | expand |
On Fri, May 15, 2020 at 02:14:17PM -0500, Eric Sandeen wrote: > We validate commandline options for stripe unit and stripe width, and > if a device returns nonsensical values via libblkid, the superbock write > verifier will eventually catch it and fail (noisily and cryptically) but > it seems a bit cleaner to just do a basic sanity check on the numbers > as soon as we get them from blkid, and if they're bogus, ignore them from > the start. > > Signed-off-by: Eric Sandeen <sandeen@redhat.com> > --- > > diff --git a/libfrog/topology.c b/libfrog/topology.c > index b1b470c9..38ed03b7 100644 > --- a/libfrog/topology.c > +++ b/libfrog/topology.c > @@ -213,6 +213,19 @@ static void blkid_get_topology( > val = blkid_topology_get_optimal_io_size(tp); > *swidth = val; > > + /* > + * Occasionally, firmware is broken and returns optimal < minimum, > + * or optimal which is not a multiple of minimum. > + * In that case just give up and set both to zero, we can't trust > + * information from this device. Similar to xfs_validate_sb_common(). > + */ > + if (*sunit) { > + if ((*sunit > *swidth) || (*swidth % *sunit != 0)) { I feel like we're copypasting this sunit/swidth checking logic all over xfsprogs and yet we're still losing the stripe unit validation whackamole game. In the end, we want to check more or less the same things for each pair of stripe unit and stripe width: * integer overflows of either value * sunit and swidth alignment wrt sector size * if either sunit or swidth are zero, both should be zero * swidth must be a multiple of sunit All four of these rules apply to the blkid_get_toplogy answers for the data device, the log device, and the realtime device; and any mkfs CLI overrides of those values. IOWs, is there some way to refactor those four rules into a single validation function and call that in the six(ish) places we need it? Especially since you're the one who played the last round of whackamole, back in May 2018. :) --D > + *sunit = 0; > + *swidth = 0; > + } > + } > + > /* > * If the reported values are the same as the physical sector size > * do not bother to report anything. It will only cause warnings >
On 5/15/20 3:48 PM, Darrick J. Wong wrote: > On Fri, May 15, 2020 at 02:14:17PM -0500, Eric Sandeen wrote: >> We validate commandline options for stripe unit and stripe width, and >> if a device returns nonsensical values via libblkid, the superbock write >> verifier will eventually catch it and fail (noisily and cryptically) but >> it seems a bit cleaner to just do a basic sanity check on the numbers >> as soon as we get them from blkid, and if they're bogus, ignore them from >> the start. >> >> Signed-off-by: Eric Sandeen <sandeen@redhat.com> >> --- >> >> diff --git a/libfrog/topology.c b/libfrog/topology.c >> index b1b470c9..38ed03b7 100644 >> --- a/libfrog/topology.c >> +++ b/libfrog/topology.c >> @@ -213,6 +213,19 @@ static void blkid_get_topology( >> val = blkid_topology_get_optimal_io_size(tp); >> *swidth = val; >> >> + /* >> + * Occasionally, firmware is broken and returns optimal < minimum, >> + * or optimal which is not a multiple of minimum. >> + * In that case just give up and set both to zero, we can't trust >> + * information from this device. Similar to xfs_validate_sb_common(). >> + */ >> + if (*sunit) { >> + if ((*sunit > *swidth) || (*swidth % *sunit != 0)) { > > I feel like we're copypasting this sunit/swidth checking logic all over > xfsprogs That's because we are! > and yet we're still losing the stripe unit validation whackamole > game. Need moar hammers! > In the end, we want to check more or less the same things for each pair > of stripe unit and stripe width: > > * integer overflows of either value > * sunit and swidth alignment wrt sector size > * if either sunit or swidth are zero, both should be zero > * swidth must be a multiple of sunit > > All four of these rules apply to the blkid_get_toplogy answers for the > data device, the log device, and the realtime device; and any mkfs CLI > overrides of those values. > > IOWs, is there some way to refactor those four rules into a single > validation function and call that in the six(ish) places we need it? > Especially since you're the one who played the last round of whackamole, > back in May 2018. :) So .... I would like to do that refactoring. I'd also like to fix this with some expediency, TBH... Refactoring is going to be a little more complicated, I fear, because sanity on "what came straight from blkid" is a little different from "what came from cmdline" and has slightly different checks than "how does it fit into the superblock we just read?" This (swidth-vs-sunit-is-borken) is common enough that I wanted to just kill it with fire, and um ... make it all better/cohesive at some later date. I don't like arguing for expediency over beauty but well... here I am. -Eric > --D > >> + *sunit = 0; >> + *swidth = 0; >> + } >> + } >> + >> /* >> * If the reported values are the same as the physical sector size >> * do not bother to report anything. It will only cause warnings >> >
On 5/15/20 3:54 PM, Eric Sandeen wrote: > On 5/15/20 3:48 PM, Darrick J. Wong wrote: >> On Fri, May 15, 2020 at 02:14:17PM -0500, Eric Sandeen wrote: >>> We validate commandline options for stripe unit and stripe width, and >>> if a device returns nonsensical values via libblkid, the superbock write >>> verifier will eventually catch it and fail (noisily and cryptically) but >>> it seems a bit cleaner to just do a basic sanity check on the numbers >>> as soon as we get them from blkid, and if they're bogus, ignore them from >>> the start. >>> >>> Signed-off-by: Eric Sandeen <sandeen@redhat.com> >>> --- >>> >>> diff --git a/libfrog/topology.c b/libfrog/topology.c >>> index b1b470c9..38ed03b7 100644 >>> --- a/libfrog/topology.c >>> +++ b/libfrog/topology.c >>> @@ -213,6 +213,19 @@ static void blkid_get_topology( >>> val = blkid_topology_get_optimal_io_size(tp); >>> *swidth = val; >>> >>> + /* >>> + * Occasionally, firmware is broken and returns optimal < minimum, >>> + * or optimal which is not a multiple of minimum. >>> + * In that case just give up and set both to zero, we can't trust >>> + * information from this device. Similar to xfs_validate_sb_common(). >>> + */ >>> + if (*sunit) { >>> + if ((*sunit > *swidth) || (*swidth % *sunit != 0)) { >> >> I feel like we're copypasting this sunit/swidth checking logic all over >> xfsprogs > > That's because we are! > >> and yet we're still losing the stripe unit validation whackamole >> game. > > Need moar hammers! > >> In the end, we want to check more or less the same things for each pair >> of stripe unit and stripe width: >> >> * integer overflows of either value >> * sunit and swidth alignment wrt sector size >> * if either sunit or swidth are zero, both should be zero >> * swidth must be a multiple of sunit >> >> All four of these rules apply to the blkid_get_toplogy answers for the >> data device, the log device, and the realtime device; and any mkfs CLI >> overrides of those values. >> >> IOWs, is there some way to refactor those four rules into a single >> validation function and call that in the six(ish) places we need it? >> Especially since you're the one who played the last round of whackamole, >> back in May 2018. :) > > So .... I would like to do that refactoring. I'd also like to fix this > with some expediency, TBH... > > Refactoring is going to be a little more complicated, I fear, because sanity > on "what came straight from blkid" is a little different from "what came from > cmdline" and has slightly different checks than "how does it fit into the > superblock we just read?" > > This (swidth-vs-sunit-is-borken) is common enough that I wanted to just kill > it with fire, and um ... make it all better/cohesive at some later date. > > I don't like arguing for expediency over beauty but well... here I am. A little more context, failures like this are caught today, but only in the validators, and it's very noisy and nonobvious when it happens. And we seem to keep finding devices that are broken like this, so would be nice to just ignore them and move on. -Eric
On Fri, May 15, 2020 at 03:54:34PM -0500, Eric Sandeen wrote: > On 5/15/20 3:48 PM, Darrick J. Wong wrote: > > On Fri, May 15, 2020 at 02:14:17PM -0500, Eric Sandeen wrote: > >> We validate commandline options for stripe unit and stripe width, and > >> if a device returns nonsensical values via libblkid, the superbock write > >> verifier will eventually catch it and fail (noisily and cryptically) but > >> it seems a bit cleaner to just do a basic sanity check on the numbers > >> as soon as we get them from blkid, and if they're bogus, ignore them from > >> the start. > >> > >> Signed-off-by: Eric Sandeen <sandeen@redhat.com> > >> --- > >> > >> diff --git a/libfrog/topology.c b/libfrog/topology.c > >> index b1b470c9..38ed03b7 100644 > >> --- a/libfrog/topology.c > >> +++ b/libfrog/topology.c > >> @@ -213,6 +213,19 @@ static void blkid_get_topology( > >> val = blkid_topology_get_optimal_io_size(tp); > >> *swidth = val; > >> > >> + /* Tabs not spaces > >> + * Occasionally, firmware is broken and returns optimal < minimum, > >> + * or optimal which is not a multiple of minimum. > >> + * In that case just give up and set both to zero, we can't trust > >> + * information from this device. Similar to xfs_validate_sb_common(). > >> + */ > >> + if (*sunit) { > >> + if ((*sunit > *swidth) || (*swidth % *sunit != 0)) { Why not combine these? if (*sunit != 0 && (*sunit > *swidth || *swidth % *sunit != 0)) { Aside from that the code looks fine I guess... > > I feel like we're copypasting this sunit/swidth checking logic all over > > xfsprogs > > That's because we are! > > > and yet we're still losing the stripe unit validation whackamole > > game. > > Need moar hammers! > > > In the end, we want to check more or less the same things for each pair > > of stripe unit and stripe width: > > > > * integer overflows of either value > > * sunit and swidth alignment wrt sector size > > * if either sunit or swidth are zero, both should be zero > > * swidth must be a multiple of sunit > > > > All four of these rules apply to the blkid_get_toplogy answers for the > > data device, the log device, and the realtime device; and any mkfs CLI > > overrides of those values. > > > > IOWs, is there some way to refactor those four rules into a single > > validation function and call that in the six(ish) places we need it? > > Especially since you're the one who played the last round of whackamole, > > back in May 2018. :) > > So .... I would like to do that refactoring. I'd also like to fix this > with some expediency, TBH... > > Refactoring is going to be a little more complicated, I fear, because sanity > on "what came straight from blkid" is a little different from "what came from > cmdline" and has slightly different checks than "how does it fit into the > superblock we just read?" Admittedly I wondered if "refactor all these checks" would fall apart because each tool has its own slightly different reporting and logging requirements. You could make a checker function return an enum of what it's mad about and each caller could either have a message catalogue or just bail depending on the circumstances, but now I've probably overengineered the corner case catching code. > This (swidth-vs-sunit-is-borken) is common enough that I wanted to just kill > it with fire, and um ... make it all better/cohesive at some later date. > > I don't like arguing for expediency over beauty but well... here I am. :( --D > -Eric > > > --D > > > >> + *sunit = 0; > >> + *swidth = 0; > >> + } > >> + } > >> + > >> /* > >> * If the reported values are the same as the physical sector size > >> * do not bother to report anything. It will only cause warnings > >> > > >
On 5/15/20 4:10 PM, Darrick J. Wong wrote: > On Fri, May 15, 2020 at 03:54:34PM -0500, Eric Sandeen wrote: >> On 5/15/20 3:48 PM, Darrick J. Wong wrote: >>> On Fri, May 15, 2020 at 02:14:17PM -0500, Eric Sandeen wrote: >>>> We validate commandline options for stripe unit and stripe width, and >>>> if a device returns nonsensical values via libblkid, the superbock write >>>> verifier will eventually catch it and fail (noisily and cryptically) but >>>> it seems a bit cleaner to just do a basic sanity check on the numbers >>>> as soon as we get them from blkid, and if they're bogus, ignore them from >>>> the start. >>>> >>>> Signed-off-by: Eric Sandeen <sandeen@redhat.com> >>>> --- >>>> >>>> diff --git a/libfrog/topology.c b/libfrog/topology.c >>>> index b1b470c9..38ed03b7 100644 >>>> --- a/libfrog/topology.c >>>> +++ b/libfrog/topology.c >>>> @@ -213,6 +213,19 @@ static void blkid_get_topology( >>>> val = blkid_topology_get_optimal_io_size(tp); >>>> *swidth = val; >>>> >>>> + /* > > Tabs not spaces yeah I have no idea how that happened :P >>>> + * Occasionally, firmware is broken and returns optimal < minimum, >>>> + * or optimal which is not a multiple of minimum. >>>> + * In that case just give up and set both to zero, we can't trust >>>> + * information from this device. Similar to xfs_validate_sb_common(). >>>> + */ >>>> + if (*sunit) { >>>> + if ((*sunit > *swidth) || (*swidth % *sunit != 0)) { > > Why not combine these? > > if (*sunit != 0 && (*sunit > *swidth || *swidth % *sunit != 0)) { was making it look a little like the kernel sb checks but *shrug* > Aside from that the code looks fine I guess... > >>> I feel like we're copypasting this sunit/swidth checking logic all over >>> xfsprogs >> >> That's because we are! >> >>> and yet we're still losing the stripe unit validation whackamole >>> game. >> >> Need moar hammers! >> >>> In the end, we want to check more or less the same things for each pair >>> of stripe unit and stripe width: >>> >>> * integer overflows of either value >>> * sunit and swidth alignment wrt sector size >>> * if either sunit or swidth are zero, both should be zero >>> * swidth must be a multiple of sunit >>> >>> All four of these rules apply to the blkid_get_toplogy answers for the >>> data device, the log device, and the realtime device; and any mkfs CLI >>> overrides of those values. >>> >>> IOWs, is there some way to refactor those four rules into a single >>> validation function and call that in the six(ish) places we need it? >>> Especially since you're the one who played the last round of whackamole, >>> back in May 2018. :) >> >> So .... I would like to do that refactoring. I'd also like to fix this >> with some expediency, TBH... >> >> Refactoring is going to be a little more complicated, I fear, because sanity >> on "what came straight from blkid" is a little different from "what came from >> cmdline" and has slightly different checks than "how does it fit into the >> superblock we just read?" > > Admittedly I wondered if "refactor all these checks" would fall apart > because each tool has its own slightly different reporting and logging > requirements. You could make a checker function return an enum of what > it's mad about and each caller could either have a message catalogue or > just bail depending on the circumstances, but now I've probably > overengineered the corner case catching code. > >> This (swidth-vs-sunit-is-borken) is common enough that I wanted to just kill >> it with fire, and um ... make it all better/cohesive at some later date. >> >> I don't like arguing for expediency over beauty but well... here I am. > > :( Well, I guess it's not actually that urgent; we don't handle it well today but maybe I should resist the urge to do another spot-fix that could(?) be handled better.... i'll put this on the backburner & give it a bit more thought I guess. Thanks, -Eric
diff --git a/libfrog/topology.c b/libfrog/topology.c index b1b470c9..38ed03b7 100644 --- a/libfrog/topology.c +++ b/libfrog/topology.c @@ -213,6 +213,19 @@ static void blkid_get_topology( val = blkid_topology_get_optimal_io_size(tp); *swidth = val; + /* + * Occasionally, firmware is broken and returns optimal < minimum, + * or optimal which is not a multiple of minimum. + * In that case just give up and set both to zero, we can't trust + * information from this device. Similar to xfs_validate_sb_common(). + */ + if (*sunit) { + if ((*sunit > *swidth) || (*swidth % *sunit != 0)) { + *sunit = 0; + *swidth = 0; + } + } + /* * If the reported values are the same as the physical sector size * do not bother to report anything. It will only cause warnings
We validate commandline options for stripe unit and stripe width, and if a device returns nonsensical values via libblkid, the superbock write verifier will eventually catch it and fail (noisily and cryptically) but it seems a bit cleaner to just do a basic sanity check on the numbers as soon as we get them from blkid, and if they're bogus, ignore them from the start. Signed-off-by: Eric Sandeen <sandeen@redhat.com> ---