diff mbox

Illegal value 0 for -l sunit option ?

Message ID 20171118215140.GL5858@dastard (mailing list archive)
State Deferred, archived
Headers show

Commit Message

Dave Chinner Nov. 18, 2017, 9:51 p.m. UTC
On Sat, Nov 18, 2017 at 04:00:08PM +0100, Vladimir Gozora wrote:
> Hello XFS team,
> 
> I'm working on ReaR project (https://github.com/rear/rear) which
> aims for bare-metal disaster recovery.
> Lately I’ve run across behavior of mkfs.xfs which I don’t really
> know if is correct or not.
> The thing is that when I try to create XFS file system with
> xfsprogs-4.5.0 following commands runs just fine:
> mkfs.xfs -f -i size=512 -d agcount=4 -s size=512 -i attr=2 -i
> projid32bit=1 -m crc=1 -m finobt=1 -b size=4096 -i maxpct=25 -d
> sunit=0 -d swidth=0 -l version=2 -l sunit=0 -l lazy-count=1 -n
> size=4096 -n version=2 -r extsize=4096 <destination>
> when I try same command with xfsprogs-4.10.0, I get following error:
> Illegal value 0 for -l sunit option. value is too small

Yup, old mkfs would accept values that were illegal and could do
completly unpredictable things with them. We tightened up the input
parsing to only accept valid values xfsprogs in 4.7.0, and in
particular this commit is relevant:

commit 2942ff49bdb6df08fb56674215b31c07cfc7c1fd
Author: Jan Tulak <jtulak@redhat.com>
Date:   Tue Jun 21 12:52:22 2016 +1000

    mkfs: fix -l su minval
    
    -l su should be in range BBTOB(1) <= L_SU <= XLOG_MAX_RECORD_BSIZE,
    because the upper limit is imposed by kernel on iclogbuf: stripe
    unit can't be bigger than the log buffer, but the log buffer can
    span multiple stripe units. L_SUNIT is changed in the same way.
    
    Signed-off-by: Jan Tulak <jtulak@redhat.com>
    Reviewed-by: Eric Sandeen <sandeen@redhat.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>


> In ReaR we are trying to backup whole OS and recreate it closest to
> original as possible, so I was collecting data obtained from
> xfs_info and used them during recovery phase. This approach would
> unfortunately not work any more, since I can't use (at least) log
> sections "sunit=0 blks"
> and pass it back to mkfs.xfs ...
> Is there some (correct) way how I could save options that were used
> during XFS file system creation and use them later, during recovery
> phase?

Why not just use xfs_copy? I mean, if all you're trying to do is
make a space efficient block-level clone of an XFS filesytem, then
we've already got a tool to do that...

Cheers,

Dave.

Comments

Eric Sandeen Nov. 19, 2017, 4:23 a.m. UTC | #1
On 11/18/17 3:51 PM, Dave Chinner wrote:
> On Sat, Nov 18, 2017 at 04:00:08PM +0100, Vladimir Gozora wrote:
>> Hello XFS team,
>>
>> I'm working on ReaR project (https://github.com/rear/rear) which
>> aims for bare-metal disaster recovery.
>> Lately I’ve run across behavior of mkfs.xfs which I don’t really
>> know if is correct or not.
>> The thing is that when I try to create XFS file system with
>> xfsprogs-4.5.0 following commands runs just fine:
>> mkfs.xfs -f -i size=512 -d agcount=4 -s size=512 -i attr=2 -i
>> projid32bit=1 -m crc=1 -m finobt=1 -b size=4096 -i maxpct=25 -d
>> sunit=0 -d swidth=0 -l version=2 -l sunit=0 -l lazy-count=1 -n
>> size=4096 -n version=2 -r extsize=4096 <destination>
>> when I try same command with xfsprogs-4.10.0, I get following error:
>> Illegal value 0 for -l sunit option. value is too small
> 
> Yup, old mkfs would accept values that were illegal and could do
> completly unpredictable things with them. We tightened up the input
> parsing to only accept valid values xfsprogs in 4.7.0, and in
> particular this commit is relevant:
> 
> commit 2942ff49bdb6df08fb56674215b31c07cfc7c1fd
> Author: Jan Tulak <jtulak@redhat.com>
> Date:   Tue Jun 21 12:52:22 2016 +1000
> 
>     mkfs: fix -l su minval
>     
>     -l su should be in range BBTOB(1) <= L_SU <= XLOG_MAX_RECORD_BSIZE,
>     because the upper limit is imposed by kernel on iclogbuf: stripe
>     unit can't be bigger than the log buffer, but the log buffer can
>     span multiple stripe units. L_SUNIT is changed in the same way.
>     
>     Signed-off-by: Jan Tulak <jtulak@redhat.com>
>     Reviewed-by: Eric Sandeen <sandeen@redhat.com>
>     Signed-off-by: Dave Chinner <david@fromorbit.com>

Honestly, the minimum changed before that - the commit below fixed an
incorrect minval of XLOG_MIN_RECORD_BSIZE (16k) to "1" -
which honestly doesn't make much sense to me.

It was 1974d3f1a6495978419d931020f63c9cac8ba230 which changed
it from accepting min 0 to accepting min XLOG_MIN_RECORD_BSIZE:

                { .index = L_SUNIT,
+                 .minval = BTOBB(XLOG_MIN_RECORD_BSIZE),
+                 .maxval = BTOBB(XLOG_MAX_RECORD_BSIZE),
                  .defaultval = SUBOPT_NEEDS_VAL,
                },

...

                                case L_SUNIT:
-                                       if (!value || *value == '\0')
-                                               reqval('l', subopts, L_SUNIT);
                                        if (lsunit)
                                                respec('l', subopts, L_SUNIT);
-                                       lsunit = getnum(value, 0, 0, false);
-                                       if (lsunit < 0)
-                                               illegal(value, "l sunit");
+                                       lsunit = getnum_checked(value, &lopts,
+                                                                L_SUNIT);
                                        lsunitflag = 1;
                                        break;

Specifying "0" as a value to mean "no stripe" is how I always
interpreted these things.  Not to mention that D_SUNIT accepts 0
without complaint, and  it's further compounded/confusing by xfs_info
reporting a stripe unit of 0 blocks, something it won't accept on a mkfs
commandline?  Makes little sense.

I'm not really convinced that a minval of 1 is correct.  The commitlog
changes the minval but speaks only to the rationale for the maxval, so
it's no help.

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vladimir Gozora Nov. 19, 2017, 10:40 a.m. UTC | #2
Hello Dave,

Thanks for your reply!

> Yup, old mkfs would accept values that were illegal and could do 
> completly unpredictable things with them. We tightened up the input 
> parsing to only accept valid values xfsprogs in 4.7.0

I fully agree here, you can't allow illegal values to enter file system 
structures. What is somehow confusing for me (as a user who don't have 
deep knowledge of XFS internals) is xfs_info output who proclaims XFS file 
system options that are set and in use (so I'd consider them valid), but 
on the other hand are evaluated by mkfs.xfs as invalid :-/.

Apart from that when I omit "-l sunit=0" in my mkfs.xfs command, 
file system is successfully created and consecutive call to xfs_info 
returns that my file system was created with "sunit=0 blks" ... This 
somehow don't feels right to me.

> Why not just use xfs_copy? I mean, if all you're trying to do is
> make a space efficient block-level clone of an XFS filesytem, then
> we've already got a tool to do that...

Honestly I've did not know much about this tool until you've mentioned it. 
I've tried it today for first time in my life and despite its certain 
usefulness I can't really use it for solving problem of mine. ReaR can 
operate in multiple modes where xfs_copy could be implemented, 
unfortunately I'm trying to solve problems in mode that is most frequently 
used by our users and implementing xfs_copy could cause havoc. The mode 
I'm talking about just collects information about your current 
partitioning, file systems, mount points etc, and stores them as config 
file into bootable ISO (which does not contains actual OS data). Once you 
boot ISO, all these data are pull out and passed to appropriate user space 
programs for structure recreation. So this is the main reason why I need 
to have original file system values at hand ...

Best regards!

Vlado
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Nov. 19, 2017, 10:34 p.m. UTC | #3
On Sat, Nov 18, 2017 at 10:23:26PM -0600, Eric Sandeen wrote:
> On 11/18/17 3:51 PM, Dave Chinner wrote:
> > On Sat, Nov 18, 2017 at 04:00:08PM +0100, Vladimir Gozora wrote:
> >> Hello XFS team,
> >>
> >> I'm working on ReaR project (https://github.com/rear/rear) which
> >> aims for bare-metal disaster recovery.
> >> Lately I’ve run across behavior of mkfs.xfs which I don’t really
> >> know if is correct or not.
> >> The thing is that when I try to create XFS file system with
> >> xfsprogs-4.5.0 following commands runs just fine:
> >> mkfs.xfs -f -i size=512 -d agcount=4 -s size=512 -i attr=2 -i
> >> projid32bit=1 -m crc=1 -m finobt=1 -b size=4096 -i maxpct=25 -d
> >> sunit=0 -d swidth=0 -l version=2 -l sunit=0 -l lazy-count=1 -n
> >> size=4096 -n version=2 -r extsize=4096 <destination>
> >> when I try same command with xfsprogs-4.10.0, I get following error:
> >> Illegal value 0 for -l sunit option. value is too small
> > 
> > Yup, old mkfs would accept values that were illegal and could do
> > completly unpredictable things with them. We tightened up the input
> > parsing to only accept valid values xfsprogs in 4.7.0, and in
> > particular this commit is relevant:
> > 
> > commit 2942ff49bdb6df08fb56674215b31c07cfc7c1fd
> > Author: Jan Tulak <jtulak@redhat.com>
> > Date:   Tue Jun 21 12:52:22 2016 +1000
> > 
> >     mkfs: fix -l su minval
> >     
> >     -l su should be in range BBTOB(1) <= L_SU <= XLOG_MAX_RECORD_BSIZE,
> >     because the upper limit is imposed by kernel on iclogbuf: stripe
> >     unit can't be bigger than the log buffer, but the log buffer can
> >     span multiple stripe units. L_SUNIT is changed in the same way.
> >     
> >     Signed-off-by: Jan Tulak <jtulak@redhat.com>
> >     Reviewed-by: Eric Sandeen <sandeen@redhat.com>
> >     Signed-off-by: Dave Chinner <david@fromorbit.com>
> 
> Honestly, the minimum changed before that - the commit below fixed an
> incorrect minval of XLOG_MIN_RECORD_BSIZE (16k) to "1" -
> which honestly doesn't make much sense to me.

Yeah, that was just plain wrong - I forgot that log stripe unit
could be any integer multiple of the fliesystem block size, not just
the valid sizes of the in-core log buffers. The intent was to remove
the specification of illegal values - many other illegal
values were also caught by that set of commits in 4.7.

> It was 1974d3f1a6495978419d931020f63c9cac8ba230 which changed
> it from accepting min 0 to accepting min XLOG_MIN_RECORD_BSIZE:
> 
>                 { .index = L_SUNIT,
> +                 .minval = BTOBB(XLOG_MIN_RECORD_BSIZE),
> +                 .maxval = BTOBB(XLOG_MAX_RECORD_BSIZE),
>                   .defaultval = SUBOPT_NEEDS_VAL,
>                 },
> 
> ...
> 
>                                 case L_SUNIT:
> -                                       if (!value || *value == '\0')
> -                                               reqval('l', subopts, L_SUNIT);
>                                         if (lsunit)
>                                                 respec('l', subopts, L_SUNIT);
> -                                       lsunit = getnum(value, 0, 0, false);
> -                                       if (lsunit < 0)
> -                                               illegal(value, "l sunit");
> +                                       lsunit = getnum_checked(value, &lopts,
> +                                                                L_SUNIT);
>                                         lsunitflag = 1;
>                                         break;
> 
> Specifying "0" as a value to mean "no stripe" is how I always
> interpreted these things.

Which you can't actually do for v2 logs.  V2 logs *always* has a
stripe unit set, it's supposed to be a multiple of the filesystem
block size, and it's supposed to be recorded in bytes in the
superblock sb_logsunit.  The thing is, this means that a 4k block
size filesystem would always have a stripe unit of 4k - which is
clearly doesn't - and that would have been a performance regression
vs v1 logs for fsync heavy workloads. I'll come back to this.

ALso, let's not forget that setting lsu = 0 is completely illegal on
devices with a sector size != BBSIZE - the log stripe unit is
supposed to indicate the sector size for such devices, and mkfs will
calculate that automatically based on topology info.

> Not to mention that D_SUNIT accepts 0 without complaint, and  it's
> further compounded/confusing by xfs_info reporting a stripe unit
> of 0 blocks, something it won't accept on a mkfs commandline?
> Makes little sense.

It's perfectly valid to have a zero data stripe unit - it has no
"atomic write" configuration infomration encoded in it. That's why
it's not valid to have a zero log stripe unit for v2 logs - it's
critical information for correct log behaviour.  Let's go
back to the mkfs commit that introduced V2 logs:

commit 73bf5988043fcb9d8981a1d8fb4f9936ba0a0551
Author: Steve Lord <lord@sgi.com>
Date:   Tue Jun 18 20:32:20 2002 +0000

    Bump revision number of version 2 log support

[.....]

+       if (logversion == 2)
+               sbp->sb_logsunit = (lsunit == 0) ? 1 : lsunit;
+       else
+               sbp->sb_logsunit = 0;


IOWs, mkfs has always written a non-zero value into sb_logsunit, and
it's always been a value of 1 when no log stripe unit has been
specified. What's that value of 1 mean, though? It's not a multiple
of the filesystem block size - it's a hack to make v2 logs use a
stripe unit of a single sector. i.e. all the code treats a v2 log
with a sb_logsunit == 1 as though it is a v1 log.  e.g.  when
calculating stripe unit alignemnt padding of log writes in
xlog_sync():

       /* Round out the log write size */
        if (v2 && log->l_mp->m_sb.sb_logsunit > 1) {
                /* we have a v2 stripe unit to use */
                count = XLOG_LSUNITTOB(log, XLOG_BTOLSUNIT(log, count_init));
        } else {
                count = BBTOB(BTOBB(count_init));
        }

IOWs, a value of 1 effectively means "use the v1 padding limits".

So, what does XFS_IOC_FSGEOMETRY tell userspace:

        if (new_version >= 4) {
                geo->flags |=
                        (xfs_sb_version_haslogv2(&mp->m_sb) ?
                                XFS_FSOP_GEOM_FLAGS_LOGV2 : 0);
                geo->logsunit = mp->m_sb.sb_logsunit;
        }

It tell userspace whatever is in the superblock.

And what does xfs_info then output?

	 "", geo.logsectsize, geo.logsunit / geo.blocksize, lazycount,

Yeah, it assumes the number is in bytes and a multiple of block
size.  IOWs, a log stripe unit of 0 indicates basic block stripe
alignment (i.e. none) for both v1 and v2 logs without a stripe unit
configured. 

Just because xfs_info reports a value of "X" for some paramter, it
does not mean that value of "X" is a valid mkfs CLI parameter.
You can't say "no log stripe unit" on a v2 log. At best we can
define "lsu=0/lsunit=0" to mean "use the default" but then people
are going to complain that "I set it to 0 but mkfs is reporting
4096 and I can't change it".

To summaries: a log stripe value of 0 is illegal, a value of 1 means
means "behave like a v1 log", and any other value means
"use this byte count as the stripe unit".

> I'm not really convinced that a minval of 1 is correct.  The commitlog
> changes the minval but speaks only to the rationale for the maxval, so
> it's no help.

The information is all there in the git history... :/

Cheers,

Dave.
diff mbox

Patch

diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 1593ee893f92..ce1ade25780e 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -452,7 +452,7 @@  struct opt_params lopts = {
                { .index = L_SUNIT,
                  .conflicts = { L_SU,
                                 LAST_CONFLICT },
-                 .minval = BTOBB(XLOG_MIN_RECORD_BSIZE),
+                 .minval = 1,
                  .maxval = BTOBB(XLOG_MAX_RECORD_BSIZE),
                  .defaultval = SUBOPT_NEEDS_VAL,
                },
@@ -460,7 +460,7 @@  struct opt_params lopts = {
                  .conflicts = { L_SUNIT,
                                 LAST_CONFLICT },
                  .convert = true,
-                 .minval = XLOG_MIN_RECORD_BSIZE,
+                 .minval = BBTOB(1),
                  .maxval = XLOG_MAX_RECORD_BSIZE,
                  .defaultval = SUBOPT_NEEDS_VAL,
                },