Message ID | 20171118215140.GL5858@dastard (mailing list archive) |
---|---|
State | Deferred, archived |
Headers | show |
On 11/18/17 3:51 PM, Dave Chinner wrote: > On Sat, Nov 18, 2017 at 04:00:08PM +0100, Vladimir Gozora wrote: >> Hello XFS team, >> >> I'm working on ReaR project (https://github.com/rear/rear) which >> aims for bare-metal disaster recovery. >> Lately I’ve run across behavior of mkfs.xfs which I don’t really >> know if is correct or not. >> The thing is that when I try to create XFS file system with >> xfsprogs-4.5.0 following commands runs just fine: >> mkfs.xfs -f -i size=512 -d agcount=4 -s size=512 -i attr=2 -i >> projid32bit=1 -m crc=1 -m finobt=1 -b size=4096 -i maxpct=25 -d >> sunit=0 -d swidth=0 -l version=2 -l sunit=0 -l lazy-count=1 -n >> size=4096 -n version=2 -r extsize=4096 <destination> >> when I try same command with xfsprogs-4.10.0, I get following error: >> Illegal value 0 for -l sunit option. value is too small > > Yup, old mkfs would accept values that were illegal and could do > completly unpredictable things with them. We tightened up the input > parsing to only accept valid values xfsprogs in 4.7.0, and in > particular this commit is relevant: > > commit 2942ff49bdb6df08fb56674215b31c07cfc7c1fd > Author: Jan Tulak <jtulak@redhat.com> > Date: Tue Jun 21 12:52:22 2016 +1000 > > mkfs: fix -l su minval > > -l su should be in range BBTOB(1) <= L_SU <= XLOG_MAX_RECORD_BSIZE, > because the upper limit is imposed by kernel on iclogbuf: stripe > unit can't be bigger than the log buffer, but the log buffer can > span multiple stripe units. L_SUNIT is changed in the same way. > > Signed-off-by: Jan Tulak <jtulak@redhat.com> > Reviewed-by: Eric Sandeen <sandeen@redhat.com> > Signed-off-by: Dave Chinner <david@fromorbit.com> Honestly, the minimum changed before that - the commit below fixed an incorrect minval of XLOG_MIN_RECORD_BSIZE (16k) to "1" - which honestly doesn't make much sense to me. It was 1974d3f1a6495978419d931020f63c9cac8ba230 which changed it from accepting min 0 to accepting min XLOG_MIN_RECORD_BSIZE: { .index = L_SUNIT, + .minval = BTOBB(XLOG_MIN_RECORD_BSIZE), + .maxval = BTOBB(XLOG_MAX_RECORD_BSIZE), .defaultval = SUBOPT_NEEDS_VAL, }, ... case L_SUNIT: - if (!value || *value == '\0') - reqval('l', subopts, L_SUNIT); if (lsunit) respec('l', subopts, L_SUNIT); - lsunit = getnum(value, 0, 0, false); - if (lsunit < 0) - illegal(value, "l sunit"); + lsunit = getnum_checked(value, &lopts, + L_SUNIT); lsunitflag = 1; break; Specifying "0" as a value to mean "no stripe" is how I always interpreted these things. Not to mention that D_SUNIT accepts 0 without complaint, and it's further compounded/confusing by xfs_info reporting a stripe unit of 0 blocks, something it won't accept on a mkfs commandline? Makes little sense. I'm not really convinced that a minval of 1 is correct. The commitlog changes the minval but speaks only to the rationale for the maxval, so it's no help. -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello Dave, Thanks for your reply! > Yup, old mkfs would accept values that were illegal and could do > completly unpredictable things with them. We tightened up the input > parsing to only accept valid values xfsprogs in 4.7.0 I fully agree here, you can't allow illegal values to enter file system structures. What is somehow confusing for me (as a user who don't have deep knowledge of XFS internals) is xfs_info output who proclaims XFS file system options that are set and in use (so I'd consider them valid), but on the other hand are evaluated by mkfs.xfs as invalid :-/. Apart from that when I omit "-l sunit=0" in my mkfs.xfs command, file system is successfully created and consecutive call to xfs_info returns that my file system was created with "sunit=0 blks" ... This somehow don't feels right to me. > Why not just use xfs_copy? I mean, if all you're trying to do is > make a space efficient block-level clone of an XFS filesytem, then > we've already got a tool to do that... Honestly I've did not know much about this tool until you've mentioned it. I've tried it today for first time in my life and despite its certain usefulness I can't really use it for solving problem of mine. ReaR can operate in multiple modes where xfs_copy could be implemented, unfortunately I'm trying to solve problems in mode that is most frequently used by our users and implementing xfs_copy could cause havoc. The mode I'm talking about just collects information about your current partitioning, file systems, mount points etc, and stores them as config file into bootable ISO (which does not contains actual OS data). Once you boot ISO, all these data are pull out and passed to appropriate user space programs for structure recreation. So this is the main reason why I need to have original file system values at hand ... Best regards! Vlado -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Nov 18, 2017 at 10:23:26PM -0600, Eric Sandeen wrote: > On 11/18/17 3:51 PM, Dave Chinner wrote: > > On Sat, Nov 18, 2017 at 04:00:08PM +0100, Vladimir Gozora wrote: > >> Hello XFS team, > >> > >> I'm working on ReaR project (https://github.com/rear/rear) which > >> aims for bare-metal disaster recovery. > >> Lately I’ve run across behavior of mkfs.xfs which I don’t really > >> know if is correct or not. > >> The thing is that when I try to create XFS file system with > >> xfsprogs-4.5.0 following commands runs just fine: > >> mkfs.xfs -f -i size=512 -d agcount=4 -s size=512 -i attr=2 -i > >> projid32bit=1 -m crc=1 -m finobt=1 -b size=4096 -i maxpct=25 -d > >> sunit=0 -d swidth=0 -l version=2 -l sunit=0 -l lazy-count=1 -n > >> size=4096 -n version=2 -r extsize=4096 <destination> > >> when I try same command with xfsprogs-4.10.0, I get following error: > >> Illegal value 0 for -l sunit option. value is too small > > > > Yup, old mkfs would accept values that were illegal and could do > > completly unpredictable things with them. We tightened up the input > > parsing to only accept valid values xfsprogs in 4.7.0, and in > > particular this commit is relevant: > > > > commit 2942ff49bdb6df08fb56674215b31c07cfc7c1fd > > Author: Jan Tulak <jtulak@redhat.com> > > Date: Tue Jun 21 12:52:22 2016 +1000 > > > > mkfs: fix -l su minval > > > > -l su should be in range BBTOB(1) <= L_SU <= XLOG_MAX_RECORD_BSIZE, > > because the upper limit is imposed by kernel on iclogbuf: stripe > > unit can't be bigger than the log buffer, but the log buffer can > > span multiple stripe units. L_SUNIT is changed in the same way. > > > > Signed-off-by: Jan Tulak <jtulak@redhat.com> > > Reviewed-by: Eric Sandeen <sandeen@redhat.com> > > Signed-off-by: Dave Chinner <david@fromorbit.com> > > Honestly, the minimum changed before that - the commit below fixed an > incorrect minval of XLOG_MIN_RECORD_BSIZE (16k) to "1" - > which honestly doesn't make much sense to me. Yeah, that was just plain wrong - I forgot that log stripe unit could be any integer multiple of the fliesystem block size, not just the valid sizes of the in-core log buffers. The intent was to remove the specification of illegal values - many other illegal values were also caught by that set of commits in 4.7. > It was 1974d3f1a6495978419d931020f63c9cac8ba230 which changed > it from accepting min 0 to accepting min XLOG_MIN_RECORD_BSIZE: > > { .index = L_SUNIT, > + .minval = BTOBB(XLOG_MIN_RECORD_BSIZE), > + .maxval = BTOBB(XLOG_MAX_RECORD_BSIZE), > .defaultval = SUBOPT_NEEDS_VAL, > }, > > ... > > case L_SUNIT: > - if (!value || *value == '\0') > - reqval('l', subopts, L_SUNIT); > if (lsunit) > respec('l', subopts, L_SUNIT); > - lsunit = getnum(value, 0, 0, false); > - if (lsunit < 0) > - illegal(value, "l sunit"); > + lsunit = getnum_checked(value, &lopts, > + L_SUNIT); > lsunitflag = 1; > break; > > Specifying "0" as a value to mean "no stripe" is how I always > interpreted these things. Which you can't actually do for v2 logs. V2 logs *always* has a stripe unit set, it's supposed to be a multiple of the filesystem block size, and it's supposed to be recorded in bytes in the superblock sb_logsunit. The thing is, this means that a 4k block size filesystem would always have a stripe unit of 4k - which is clearly doesn't - and that would have been a performance regression vs v1 logs for fsync heavy workloads. I'll come back to this. ALso, let's not forget that setting lsu = 0 is completely illegal on devices with a sector size != BBSIZE - the log stripe unit is supposed to indicate the sector size for such devices, and mkfs will calculate that automatically based on topology info. > Not to mention that D_SUNIT accepts 0 without complaint, and it's > further compounded/confusing by xfs_info reporting a stripe unit > of 0 blocks, something it won't accept on a mkfs commandline? > Makes little sense. It's perfectly valid to have a zero data stripe unit - it has no "atomic write" configuration infomration encoded in it. That's why it's not valid to have a zero log stripe unit for v2 logs - it's critical information for correct log behaviour. Let's go back to the mkfs commit that introduced V2 logs: commit 73bf5988043fcb9d8981a1d8fb4f9936ba0a0551 Author: Steve Lord <lord@sgi.com> Date: Tue Jun 18 20:32:20 2002 +0000 Bump revision number of version 2 log support [.....] + if (logversion == 2) + sbp->sb_logsunit = (lsunit == 0) ? 1 : lsunit; + else + sbp->sb_logsunit = 0; IOWs, mkfs has always written a non-zero value into sb_logsunit, and it's always been a value of 1 when no log stripe unit has been specified. What's that value of 1 mean, though? It's not a multiple of the filesystem block size - it's a hack to make v2 logs use a stripe unit of a single sector. i.e. all the code treats a v2 log with a sb_logsunit == 1 as though it is a v1 log. e.g. when calculating stripe unit alignemnt padding of log writes in xlog_sync(): /* Round out the log write size */ if (v2 && log->l_mp->m_sb.sb_logsunit > 1) { /* we have a v2 stripe unit to use */ count = XLOG_LSUNITTOB(log, XLOG_BTOLSUNIT(log, count_init)); } else { count = BBTOB(BTOBB(count_init)); } IOWs, a value of 1 effectively means "use the v1 padding limits". So, what does XFS_IOC_FSGEOMETRY tell userspace: if (new_version >= 4) { geo->flags |= (xfs_sb_version_haslogv2(&mp->m_sb) ? XFS_FSOP_GEOM_FLAGS_LOGV2 : 0); geo->logsunit = mp->m_sb.sb_logsunit; } It tell userspace whatever is in the superblock. And what does xfs_info then output? "", geo.logsectsize, geo.logsunit / geo.blocksize, lazycount, Yeah, it assumes the number is in bytes and a multiple of block size. IOWs, a log stripe unit of 0 indicates basic block stripe alignment (i.e. none) for both v1 and v2 logs without a stripe unit configured. Just because xfs_info reports a value of "X" for some paramter, it does not mean that value of "X" is a valid mkfs CLI parameter. You can't say "no log stripe unit" on a v2 log. At best we can define "lsu=0/lsunit=0" to mean "use the default" but then people are going to complain that "I set it to 0 but mkfs is reporting 4096 and I can't change it". To summaries: a log stripe value of 0 is illegal, a value of 1 means means "behave like a v1 log", and any other value means "use this byte count as the stripe unit". > I'm not really convinced that a minval of 1 is correct. The commitlog > changes the minval but speaks only to the rationale for the maxval, so > it's no help. The information is all there in the git history... :/ Cheers, Dave.
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c index 1593ee893f92..ce1ade25780e 100644 --- a/mkfs/xfs_mkfs.c +++ b/mkfs/xfs_mkfs.c @@ -452,7 +452,7 @@ struct opt_params lopts = { { .index = L_SUNIT, .conflicts = { L_SU, LAST_CONFLICT }, - .minval = BTOBB(XLOG_MIN_RECORD_BSIZE), + .minval = 1, .maxval = BTOBB(XLOG_MAX_RECORD_BSIZE), .defaultval = SUBOPT_NEEDS_VAL, }, @@ -460,7 +460,7 @@ struct opt_params lopts = { .conflicts = { L_SUNIT, LAST_CONFLICT }, .convert = true, - .minval = XLOG_MIN_RECORD_BSIZE, + .minval = BBTOB(1), .maxval = XLOG_MAX_RECORD_BSIZE, .defaultval = SUBOPT_NEEDS_VAL, },