Message ID: CAOQ4uxjEUAR1-1ikZSN2e99gK4k_mEMnhGGYvw=2Zj6oLOPVuQ@mail.gmail.com (mailing list archive)
State: New, archived
On Fri, Nov 03, 2017 at 02:19:15PM +0200, Amir Goldstein wrote:
> Whether or not we implement physical shrink is not the main issue.
> The main issue is implicitly changing the meaning of an existing API.

I don't intend on changing the existing API. I said "there are bugs
in the RFC code, and I'm addressing them".

> This is how I propose to smooth in the new API with as little pain as
> possible for existing deployments, yet making sure that "usable block"
> is only modified by new programs that intend to modify it:
>
> ----------------------
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 8f22fc579dbb..922798ebf3e8 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -171,10 +171,25 @@ xfs_growfs_data_private(
>  	xfs_rfsblock_t		nfree;
>  	xfs_agnumber_t		oagcount;
>  	int			pct;
> +	unsigned		ver;
>  	xfs_trans_t		*tp;
>
>  	nb = in->newblocks;
> -	pct = in->imaxpct;
> +	pct = in->imaxpct & 0xff;
> +	ver = in->imaxpct >> 8;
> +	if (ver > 1)
> +		return -EINVAL;
> +
> +	/*
> +	 * V0 API is only allowed to grow sb_usable_dblocks along with
> +	 * sb_dblocks. Any other change to sb_usable_dblocks requires
> +	 * V1 API to prove that userspace is aware of usable_dblocks.
> +	 */
> +	if (ver == 0 && xfs_sb_version_hasthinspace(mp->m_sb) &&
> +	    (mp->m_sb.sb_usable_dblocks != mp->m_sb.sb_dblocks ||
> +	     nb < mp->m_sb.sb_dblocks))
> +		return -EINVAL;
> +
>  	if (nb < mp->m_sb.sb_dblocks || pct < 0 || pct > 100)
>  		return -EINVAL;
>
> ----------------
>
> When xfs_growfs WANTS to change the size of a fs known to be thin, it
> should set in->imaxpct |= 1 << 8;
>
> Dave,
>
> Is that the complication of implementation you were talking about?
> Really?

No, it's not. Seems like everyone is still yelling over me instead of
listening.

Let's start with a demonstration. I'm going to make a thin filesystem
on a kernel running my current patchset, then I'm going to use an old
xfs_growfs binary (from 4.12) to try to grow and shrink that
filesystem.
So, create, mount:

$ sudo ~/packages/mkfs.xfs -f -dthin,size=1g /dev/pmem0
Default configuration sourced from package build definitions
meta-data=/dev/pmem0             isize=512    agcount=8, agsize=262144 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
data     =                       bsize=4096   blocks=2097152, imaxpct=25, thinblocks=262144
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
$ sudo mount /dev/pmem0 /mnt/test
$

Let's now use the old growfs to look at the filesystem:

$ sudo xfs_growfs -V
xfs_growfs version 4.12.0
$ sudo xfs_info /mnt/test
meta-data=/dev/pmem0             isize=512    agcount=8, agsize=262144 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1 spinodes=0 rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=262144, imaxpct=25, thin=0
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
$

It reports as a 1GB filesystem, but has 8 AGs of 1GB each. Clearly a
thinspace filesystem, but growfs doesn't know that. The kernel reports
its size as:

$ df -h /mnt/test
Filesystem      Size  Used Avail Use% Mounted on
/dev/pmem0     1006M   33M  974M   4% /mnt/test
$

Now, let's grow it:

$ sudo xfs_growfs -D 500000 /mnt/test
meta-data=/dev/pmem0             isize=512    agcount=8, agsize=262144 blks
[...]
data blocks changed from 262144 to 500000
$

And the kernel reported size:

$ df -h /mnt/test
Filesystem      Size  Used Avail Use% Mounted on
/dev/pmem0      1.9G   33M  1.9G   2% /mnt/test
$

And xfs_info for good measure:

$ sudo xfs_info /mnt/test
meta-data=/dev/pmem0             isize=512    agcount=8, agsize=262144 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1 spinodes=0 rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=500000, imaxpct=25, thin=0
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
$

The thinspace grow succeeded exactly as it should - it grew to 500k
blocks without changing the physical filesystem. Thinspace is
completely transparent to the grow operation now, and will work with
old growfs binaries and apps just fine.

So, onto shrink with an old growfs binary:

$ sudo xfs_growfs -D 400000 /mnt/test
meta-data=/dev/pmem0             isize=512    agcount=8, agsize=262144 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1 spinodes=0 rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=500000, imaxpct=25, thin=0
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data size 400000 too small, old size is 500000
$

Oh, look, the old growfs binary refused to shrink the filesystem. It
didn't even call into the filesystem - it just assumes that you can't
shrink XFS filesystems and doesn't even try.

So, this means old userspace and new thinspace filesystems are pretty
safe without having to change the interface. And it tells us that the
main change the userspace growfs code needs is to allow shrink to
proceed.
Hence, the new XFS_IOC_FSGEOMETRY ioctl and this change:

-	if (!error && dsize < geo.datablocks) {
+	if (dsize < geo.datablocks && !thin_enabled) {
 		fprintf(stderr, _("data size %lld too small,"
 			" old size is %lld\n"),
 			(long long)dsize, (long long)geo.datablocks);

So, now the growfs binary can only shrink filesystems that report as
thinspace filesystems. It will still refuse to shrink filesystems that
cannot be shrunk. With a new binary:

$ sudo xfs_growfs -V
xfs_growfs version 4.13.1
$ sudo xfs_growfs -D 400000 /mnt/test
meta-data=/dev/pmem0             isize=512    agcount=8, agsize=262144 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1 spinodes=0 rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=500000, imaxpct=25, thin=1
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 500000 to 400000
$ df -h /mnt/test
Filesystem      Size  Used Avail Use% Mounted on
/dev/pmem0      1.6G   33M  1.5G   3% /mnt/test
$

We can see it shrinks a thinspace filesystem just fine. A normal thick
filesystem:

$ sudo ~/packages/mkfs.xfs /dev/pmem1
Default configuration sourced from package build definitions
meta-data=/dev/pmem1             isize=512    agcount=4, agsize=524288 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
data     =                       bsize=4096   blocks=2097152, imaxpct=25, thinblocks=0
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
$ sudo mount /dev/pmem1 /mnt/scratch
$ sudo xfs_growfs -D 400000 /mnt/scratch
[....]
data size 400000 too small, old size is 2097152
$

Fails in exactly the same way as it did before.
IOWs, the only thing growfs required was awareness that it could shrink
a filesystem (a XFS_IOC_FSGEOMETRY change) and then, without any other
changes, it can operate sanely on thinspace filesystems. There is no
need to change the actual XFS_IOC_FSGROWDATA ioctl, because it doesn't
care what the underlying implementation does or supports....

----

That's a demonstration that the growfs interface does not need to
change for userspace to retain sane, predictable, obvious behaviour
across both old and new growfs binaries with the introduction of
thinspace filesystems. People calling the growfs ioctl directly without
adding thinspace awareness will just see some filesystems succeed when
shrinking - they'll need to add checks for the thinspace flag in the
XFS_IOC_FSGEOMETRY results if they want to do anything smarter, but
otherwise thinspace grow and shrink will just /work as expected/ with
existing binaries.

Amir, my previous comments about unnecessary interface complication are
based on these observations. You seem to be making an underlying
assumption that existing binaries can't grow/shrink thinspace
filesystems without modification. This assumption is incorrect.

Indeed, code complexity isn't the issue here - the issue is the
smearing of implementation details into an abstract control command
that is completely independent of the underlying implementations.
Indeed, userspace does not need to be aware of whether the filesystem
is a thinspace filesystem or not - it just needs to know the current
size, so that if it doesn't need to change the size (e.g. an imaxpct
change) it puts the right value into the growfs control structure. That
is the bit I got wrong in the RFC code I posted - the
XFS_IOC_FSGEOMETRY ioctl changes put the wrong size in geo->datablocks
for thinspace filesystems.
I've said this multiple times now, but the message doesn't seem to have
sunk in - if we put the right data in XFS_IOC_FSGEOMETRY, then
userspace doesn't need to care about whether it's a thin or a thick
filesystem that it is operating on. However, if we start changing the
growfs ioctl because it [apparently] needs to be aware of thin/thick
filesystems, we introduce kernel/userspace compatibility issues that
currently don't exist. And, as I've clearly demonstrated, those
interface differences *do not need to exist* for both old and new
growfs binaries to work correctly with thinspace filesystems.

That's the "complication of implementation" I was referring to -
introducing interface changes that would result in extra changes to
userspace binaries, and as such introduce user visible binary
compatibility issues that users do not need to (and should not) be
exposed to. Not to mention other application developers that might be
using the existing geometry and growfs ioctls - shrink will now just
work on existing binaries without them having to do anything....

> Don't you see that this is the right thing to do w.r.t. API design?

No, I don't, because you're trying to solve a problem that, quite
simply, doesn't exist.

Cheers,

Dave.
On Mon, Nov 6, 2017 at 3:16 AM, Dave Chinner <david@fromorbit.com> wrote:
...
> Not to mention other application developers that might
> be using the existing geometry and growfs ioctls - shrink will now

Acknowledging that those "other applications" may exist in the wild
makes it even harder to claim that allowing a change to usable_dblocks
with the existing API is not going to cause pain for users...

> just work on existing binaries without them having to do
> anything....
>
>> Don't you see that this is the right thing to do w.r.t. API design?
>
> No, I don't, because you're trying to solve a problem that, quite
> simply, doesn't exist.
>

It is *very* possible that you are right, but you have not proven that
the problem does not exist. You have proven that the problem does not
exist w.r.t. old xfs_growfs -D <size>, and you correctly claimed that
the problem with old xfs_growfs -m <imaxpct> is an implementation bug
in the RFC patches.

Let me give an example that will demonstrate my concern.

One of our older NAS products, still deployed with many customers, has
an LVM based volume manager and an ext3 file system. When a user
changes the size of a volume via the Web UI, lower level commands will
resize the LVM volume and then run resize2fs to max size. Because
"resize2fs to max size" is not an atomic operation, and because this is
a "for dummies" product, in order to recover from a "half-resize" there
is a post-mount script that runs resize2fs unconditionally after boot.

So in this product, the LVM volume size is treated as an "intent log"
for the file system grow operation. I find it hard to believe that this
practice is so novel that nobody else ever used it - and for that
matter, with an xfs file system over LVM and xfs_growfs -d.
Now imagine you upgrade such a system to a kernel that supports
"thinspace" and new xfsprogs and create thin file systems, and then
downgrade the system to a kernel that still supports "thinspace", but
xfsprogs that do not (or even a proprietary system component that uses
the XFS_IOC_FSGROWDATA ioctl to perform the "auto-grow"). The result
will be that all the thin file systems will "auto-grow" to the thick
size of the volume.

So the way I see it, my proposal to explicitly require the
XFS_IOC_FSGROWDATA V1 API for any change to usable_dblocks that is not
coupled with the same change to dblocks is meant to resolve
userspace/kernel compatibility issues. And I fail to see how that
requirement makes it hard to maintain userspace/kernel compatibility:

- xfs_growfs needs to check for the "thinspace" flag and, if it
  exists, use the V1 API
- an old kernel can't mount a "thinspace" fs, so it can never see the
  V1 API except from a buggy program, which will get -EINVAL
- old xfs_growfs will keep failing to shrink even a thin fs
- old xfs_growfs will succeed to grow, except (*) for a thin fs that
  was previously shrunk

(*) That exception relates to the example I described above, and we
seem to not be in agreement about the desired behavior. IIUC, you like
the fact that old xfs_growfs can grow a thin and shrunk fs, where I see
trouble lurking in this behavior.

So we can agree to disagree on the desired behavior, but for the
record, this and only this point is the API design flaw I am talking
about.

There may be complexities w.r.t. maintaining userspace/kernel
compatibility with the proposed solution. I trust you on this because
you have far more experience than me with maintaining the historic
baggage of wrongly designed APIs. If no one else is concerned about the
old xfs_growfs -d use case, and no one else shares my opinion about the
desired behavior in that use case, then I withdraw my claims.

Thanks,
Amir.
-- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Nov 06, 2017 at 11:48:05AM +0200, Amir Goldstein wrote:
> On Mon, Nov 6, 2017 at 3:16 AM, Dave Chinner <david@fromorbit.com> wrote:
> > just work on existing binaries without them having to do
> > anything....
> >
> >> Don't you see that this is the right thing to do w.r.t. API design?
> >
> > No, I don't, because you're trying to solve a problem that, quite
> > simply, doesn't exist.
> >
>
> It is *very* possible that you are right, but you have not proven that
> the problem does not exist. You have proven that the problem
> does not exist w.r.t. old xfs_growfs -D <size> and you correctly
> claimed that the problem with old xfs_growfs -m <imaxpct> is an
> implementation bug in the RFC patches.
>
> Let me give an example that will demonstrate my concern.
>
> One of our older NAS products, still deployed with many customers,
> has an LVM based volume manager and an ext3 file system.
> When a user changes the size of a volume via the Web UI, lower level
> commands will resize LVM and then resize2fs to max size.
> Because "resize2fs to max size" is not an atomic operation and
> because this is a "for dummies" product, in order to recover from
> "half-resize", there is a post-mount script that runs resize2fs
> unconditionally after boot.

Sure, but if you have a product using thinspace filesystems then you
are going to need to do something different. To think you can just plug
a thinspace filesystem into an existing stack and have it work
unmodified is naive at best.

Indeed, with a thinspace filesystem, the LVM volume *never needs
resizing*, because it will be created far larger than ever required in
the first place. And recovering from a crash half way through a
thinspace growfs operation? Well, it's all done through the filesystem
journal because it's all done through transactions. IOWs, thinspace
filesystems solve this "atomic resize" problem without needing crutches
or external constructs to hide non-atomic operations.

Essentially, if you have a storage product and you plug thinspace
filesystems into it without management modifications, then you get to
keep all the bits that break and don't work correctly. All I care about
is that stupid things don't happen if someone has integrated thinspace
filesystems correctly and then a user does this:

> Now imagine you upgrade such a system to a kernel that supports
> "thinspace" and new xfsprogs and create thin file systems, and then
> downgrade the system to a kernel that still supports "thinspace", but
> xfsprogs that do not (or even a proprietary system component that
> uses the XFS_IOC_FSGROWDATA ioctl to perform the "auto-grow").

In this case the thinspace filesystems will behave exactly like a
physical filesystem, i.e. they will physically grow to the size of the
underlying device. I can't stop this from happening, but I can ensure
that it doesn't do irreversible damage and that it's reversible as soon
as userspace is restored to support thinspace administration again.
i.e. just shrink it back down to the required thin size, and it's like
that grow never occurred...

i.e. it's not the end of the world, and you can recover cleanly from it
without any issues.

> The results will be that all the thin file systems will all "auto-grow"
> to the thick size of the volume.

Of course it will - the user/app/admin asked the kernel to grow the
filesystem to the size of the underlying device. I don't know what you
expect a thinspace filesystem to do here other than *grow the
filesystem to the size of the underlying device*.

> So we can agree to disagree on the desired behavior, but for the
> record, this and only this point is the API design flaw I am talking
> about.

What flaw? You've come up with a scenario where the thinspace
filesystem will behave like a physical filesystem because that's what
the thinspace-unaware admin program expects it to be. i.e. thinspace is
completely transparent to userspace, and a physical grow does not
prevent subsequent shrinks back to the original thinspace size.

None of this indicates that there is an interface problem...

Cheers,

Dave.
On Mon, Nov 6, 2017 at 11:46 PM, Dave Chinner <david@fromorbit.com> wrote:
[...]
>> commands will resize LVM and then resize2fs to max size.
>> Because "resize2fs to max size" is not an atomic operation and
>> because this is a "for dummies" product, in order to recover from
>> "half-resize", there is a post-mount script that runs resize2fs
>> unconditionally after boot.
>
> Sure, but if you have a product using thinspace filesystems then you
> are going to need to do something different. To think you can just
> plug a thinspace filesystem into an existing stack and have it work
> unmodified is naive at best.
>

I do not expect it to "work" - on the contrary, I expect it not to
work, just the same as xfs_repair will refuse to repair an xfs with
unknown feature flags.

[...]
>
>> Now imagine you upgrade such a system to a kernel that supports
>> "thinspace" and new xfsprogs and create thin file systems, and then
>> downgrade the system to a kernel that still supports "thinspace", but
>> xfsprogs that do not (or even a proprietary system component that
>> uses the XFS_IOC_FSGROWDATA ioctl to perform the "auto-grow").
>
> In this case the thinspace filesystems will behave exactly like a
> physical filesystem. i.e. they will physically grow to the size of
> the underlying device. I can't stop this from happening, but I can
> ensure that it doesn't do irreversible damage and that it's
> reversible as soon as userspace is restored to support thinspace
> administration again. i.e. just shrink it back down to the required
> thin size, and it's like that grow never occurred...
>
> i.e. it's not the end of the world, and you can recover cleanly from
> it without any issues.
>

Very true, not the end of the world. That is why your design is
something I can "live with". But it does have the potential to cause
pain downstream in the future, and I just don't see any reason why this
pain cannot be avoided. I fail to see the downside of not allowing old
xfs_growfs to modify thin space.

>> The results will be that all the thin file systems will all "auto-grow"
>> to the thick size of the volume.
>
> Of course it will - the user/app/admin asked the kernel to grow the
> filesystem to the size of the underlying device. I don't know what
> you expect a thinspace filesystem to do here other than *grow the
> filesystem to the size of the underlying device*.

I expect the kernel to tell the user EINVAL and warn that the user
needs newer xfsprogs to auto grow thin space.

Let me re-iterate the requirement we are disagreeing on:

- old xfs_growfs will succeed to grow, *except* for a thin fs that was
  previously shrunk (i.e. dblocks != usable_dblocks)

You explained at length why the exception is not a must. I do not
remember a single argument that explains what's wrong with keeping the
exception. I claimed that this exception can reduce pain for end users.
In response, you wrote that the "user/app/admin asked to grow fs to
maximum size" and, in so many words, that they can "keep the pieces".

What bad things can happen if the clueless user/app/admin is refused a
grow of the fs to maximum size? The practice of "not sure you know what
you are doing so please keep away" has been a very good practice for
xfs and file systems for years. Why not abide by this law in this case?

Cheers,
Amir.
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 8f22fc579dbb..922798ebf3e8 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -171,10 +171,25 @@ xfs_growfs_data_private(
 	xfs_rfsblock_t		nfree;
 	xfs_agnumber_t		oagcount;
 	int			pct;
+	unsigned		ver;
 	xfs_trans_t		*tp;
 
 	nb = in->newblocks;
-	pct = in->imaxpct;
+	pct = in->imaxpct & 0xff;
+	ver = in->imaxpct >> 8;
+	if (ver > 1)
+		return -EINVAL;
+
+	/*
+	 * V0 API is only allowed to grow sb_usable_dblocks along with
+	 * sb_dblocks. Any other change to sb_usable_dblocks requires
+	 * V1 API to prove that userspace is aware of usable_dblocks.
+	 */
+	if (ver == 0 && xfs_sb_version_hasthinspace(mp->m_sb) &&
+	    (mp->m_sb.sb_usable_dblocks != mp->m_sb.sb_dblocks ||
+	     nb < mp->m_sb.sb_dblocks))
+		return -EINVAL;
+
 	if (nb < mp->m_sb.sb_dblocks || pct < 0 || pct > 100)
 		return -EINVAL;