Message ID | cover.1647894991.git.dsterba@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [GIT,PULL] Btrfs updates for 5.18 | expand |
On Mon, Mar 21, 2022 at 2:37 PM David Sterba <dsterba@suse.com> wrote: > > this update contains feature updates, performance improvements, > preparatory and core work and some related VFS updates. Please pull, > thanks. Hmm. This was marked as spam for me. Not sure why - the email looks fine, and has proper DKIM and SPF records. Linus
On Mon, Mar 21, 2022 at 2:37 PM David Sterba <dsterba@suse.com> wrote: > > - allow reflinks/deduplication from two different mounts of the same > filesystem So I've pulled this, and it looks ok, but I'm not getting the warm and fuzzies. In particular, I'm not seeing any commentary about different filesystems for this. There are several filesystems that use that ->remap_file_range() operation, so these relaxed rules don't just affect btrfs. Yes, yes, checking for i_sb matching does seem sensible, but I'd *really* have liked some sign that people checked with other filesystem maintainers and this is ok for all of them, and they didn't make assumptions about "always same mount" rather than "always same filesystem". This affects at least cifs, nfs, overlayfs and ocfs2. Adding fsdevel, and pointing to that - if (src_file->f_path.mnt != dst_file->f_path.mnt) + if (file_inode(src_file)->i_sb != file_inode(dst_file)->i_sb) change in commit 9f5710bbfd30 ("fs: allow cross-vfsmount reflink/dedupe") And yes, there was already a comment about "Practically, they only need to be on the same file system" from before that matches the new behavior, but hey, comments have been known to be wrong in the past too. And yes, I'm also aware that do_clone_file_range() already had that exact same i_sb check and it's not new, but since ioctl_file_clone() cheched for the mount path, I don't think you could actually reach it without being on the same mount. And while discussing these sanity checks: wouldn't it make sense to check that *both* the source file and the destination file support that remap_file_range() op, and it's the same op? Yes, yes, it probably always is in practice, but I could imagine some type confusion thing. So wouldn't it be nice to also have something like if (dst_file->f_op != src_file->f_op) goto out_drop_write; in there? I'm thinking "how about dedupe from a directory to a regular file" kind of craziness... Linus
The pull request you sent on Mon, 21 Mar 2022 22:33:04 +0100:
> git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-5.18-tag
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/5191290407668028179f2544a11ae9b57f0bcf07
Thank you!
On Tue, Mar 22, 2022 at 11:23:21AM -0700, Linus Torvalds wrote: > On Mon, Mar 21, 2022 at 2:37 PM David Sterba <dsterba@suse.com> wrote: > > > > - allow reflinks/deduplication from two different mounts of the same > > filesystem > > So I've pulled this, and it looks ok, but I'm not getting the warm and fuzzies. > > In particular, I'm not seeing any commentary about different > filesystems for this. > > There are several filesystems that use that ->remap_file_range() > operation, so these relaxed rules don't just affect btrfs. > > Yes, yes, checking for i_sb matching does seem sensible, but I'd > *really* have liked some sign that people checked with other > filesystem maintainers and this is ok for all of them, and they didn't > make assumptions about "always same mount" rather than "always same > filesystem". > > This affects at least cifs, nfs, overlayfs and ocfs2. I had a talk with Darrick Wong about this on IRC, and his Reviewed-by is on the patch. This did surprise nfsd when xfstests started failing, but talking with Bruce he didn't complain once he understood what was going on. Believe me I have 0 interest in getting the other maintainers upset with me by sneaking something by them, I made sure to run it by people first, tho I probably should have checked with people directly other than Darrick. > > Adding fsdevel, and pointing to that > > - if (src_file->f_path.mnt != dst_file->f_path.mnt) > + if (file_inode(src_file)->i_sb != file_inode(dst_file)->i_sb) > > change in commit 9f5710bbfd30 ("fs: allow cross-vfsmount reflink/dedupe") > > And yes, there was already a comment about "Practically, they only > need to be on the same file system" from before that matches the new > behavior, but hey, comments have been known to be wrong in the past > too. > > And yes, I'm also aware that do_clone_file_range() already had that > exact same i_sb check and it's not new, but since ioctl_file_clone() > cheched for the mount path, I don't think you could actually reach it > without being on the same mount. > > And while discussing these sanity checks: wouldn't it make sense to > check that *both* the source file and the destination file support > that remap_file_range() op, and it's the same op? > > Yes, yes, it probably always is in practice, but I could imagine some > type confusion thing. So wouldn't it be nice to also have something > like > > if (dst_file->f_op != src_file->f_op) > goto out_drop_write; > > in there? I'm thinking "how about dedupe from a directory to a regular > file" kind of craziness... > This more fine-grained checking is handled by generic_remap_file_range_prep() to make sure we don't try to dedup a directory or pipe or some other nonsense. Thanks, Josef
On Tue, Mar 22, 2022 at 1:55 PM Josef Bacik <josef@toxicpanda.com> wrote: > > This more fine-grained checking is handled by generic_remap_file_range_prep() to > make sure we don't try to dedup a directory or pipe or some other nonsense. Yeah, that does seem to take care of the obvious cases, and requires that both files be regular files at least. I'm still not a huge fan of how we use the 'f_op->remap_file_range' of the source file, without really checking that the destination file is ok with remap_file_range. They end up _superficially_ very similar, yes, but I can point to filesystems that use different f_op's for different files. And some of those depend on - wait for it - how the filesystem was mounted. See for example cifs: if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_BRL) file->f_op = &cifs_file_direct_nobrl_ops; else file->f_op = &cifs_file_direct_ops; so I'm just thinking "what about doing remap_file_range between two regular files that act differently - either due to mount options or other details". In that cifs example, read_iter and write_iter are different. Yes, copy/remap_file_range uses the same function pointer, but it still worries me about copying from a mount to another if there might be different semantics for IO between them. I think in this cifs case, the superblock ends up being the same, so the mnt_cifs_flags end up being the same, and the above is not actually a real issue. But conceptually I could imagine cases where that wasn't the case - or even cases like /proc that have fundamentally different file operations for different files) Linus
On Tue, Mar 22, 2022 at 04:55:17PM -0400, Josef Bacik wrote: > On Tue, Mar 22, 2022 at 11:23:21AM -0700, Linus Torvalds wrote: > > On Mon, Mar 21, 2022 at 2:37 PM David Sterba <dsterba@suse.com> wrote: > > > > > > - allow reflinks/deduplication from two different mounts of the same > > > filesystem > > > > So I've pulled this, and it looks ok, but I'm not getting the warm and fuzzies. > > > > In particular, I'm not seeing any commentary about different > > filesystems for this. > > > > There are several filesystems that use that ->remap_file_range() > > operation, so these relaxed rules don't just affect btrfs. > > > > Yes, yes, checking for i_sb matching does seem sensible, but I'd > > *really* have liked some sign that people checked with other > > filesystem maintainers and this is ok for all of them, and they didn't > > make assumptions about "always same mount" rather than "always same > > filesystem". > > > > > This affects at least cifs, nfs, overlayfs and ocfs2. > > I had a talk with Darrick Wong about this on IRC, and his Reviewed-by is on the > patch. This did surprise nfsd when xfstests started failing, but talking with > Bruce he didn't complain once he understood what was going on. FWIW, I remember talking about this with Bruce and (probably Anna too) during a hallway BOF at the last LSFMMBPFBBQ that I went to, which was 2018(?) At the time, I think we resolved that nfs42_remap_file_range was capable of detecting and dealing with unsupported requests, so a direct comparison of the ->remap_file_range or ->f_op wasn't necessary for them. > Believe me I > have 0 interest in getting the other maintainers upset with me by sneaking > something by them, I made sure to run it by people first, tho I probably should > have checked with people directly other than Darrick. I /am/ a little curious what Steve French has to say w.r.t CIFS. AFAICT overlayfs passes the request down to the appropriate fs under-layer, so its correctness mostly depends on the under-layer's implementation. But I'll let Amir or someone chime in on that. ;) As for ocfs2, back when I added support for ->remap_file_range to ocfs2, cross-mount reflink and dedupe worked fine, or at least as well as anything works on ocfs2. (XFS has always supported cross-mount remappings.) > > > > Adding fsdevel, and pointing to that > > > > - if (src_file->f_path.mnt != dst_file->f_path.mnt) > > + if (file_inode(src_file)->i_sb != file_inode(dst_file)->i_sb) > > > > change in commit 9f5710bbfd30 ("fs: allow cross-vfsmount reflink/dedupe") > > > > And yes, there was already a comment about "Practically, they only > > need to be on the same file system" from before that matches the new > > behavior, but hey, comments have been known to be wrong in the past > > too. > > > > And yes, I'm also aware that do_clone_file_range() already had that > > exact same i_sb check and it's not new, but since ioctl_file_clone() > > cheched for the mount path, I don't think you could actually reach it > > without being on the same mount. > > > > And while discussing these sanity checks: wouldn't it make sense to > > check that *both* the source file and the destination file support > > that remap_file_range() op, and it's the same op? > > > > Yes, yes, it probably always is in practice, but I could imagine some > > type confusion thing. So wouldn't it be nice to also have something > > like > > > > if (dst_file->f_op != src_file->f_op) > > goto out_drop_write; > > > > in there? I'm thinking "how about dedupe from a directory to a regular > > file" kind of craziness... > > > > This more fine-grained checking is handled by generic_remap_file_range_prep() to > make sure we don't try to dedup a directory or pipe or some other nonsense. Yes. The VFS only allows remapping between regular files. --D > Thanks, > > Josef
On Tue, Mar 22, 2022 at 10:11 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Mon, Mar 21, 2022 at 2:37 PM David Sterba <dsterba@suse.com> wrote: > > > > - allow reflinks/deduplication from two different mounts of the same > > filesystem > > So I've pulled this, and it looks ok, but I'm not getting the warm and fuzzies. > > In particular, I'm not seeing any commentary about different > filesystems for this. > > There are several filesystems that use that ->remap_file_range() > operation, so these relaxed rules don't just affect btrfs. > > Yes, yes, checking for i_sb matching does seem sensible, but I'd > *really* have liked some sign that people checked with other > filesystem maintainers and this is ok for all of them, and they didn't > make assumptions about "always same mount" rather than "always same > filesystem". > > This affects at least cifs, nfs, overlayfs and ocfs2. overlayfs shouldn't have a problem with that change. IIUC, cifs would also gain from this, because clone is implemented on server side and even two different sb's could technically do server side duplicate_extents. Same goes for nfs v4.2. There was a lot of discussion on these aspects when cross server (i.e. cross sb) copy was implemented not so long ago. Relaxing cross-mnt clone is nothing compared to that. > > Adding fsdevel, and pointing to that > > - if (src_file->f_path.mnt != dst_file->f_path.mnt) > + if (file_inode(src_file)->i_sb != file_inode(dst_file)->i_sb) > > change in commit 9f5710bbfd30 ("fs: allow cross-vfsmount reflink/dedupe") > > And yes, there was already a comment about "Practically, they only > need to be on the same file system" from before that matches the new > behavior, but hey, comments have been known to be wrong in the past > too. As the one who left this comment I can say it is based only on common sense, similar to the rationale in this recent commit. If it is any help, overlayfs has been doing cross mnt clones since 913b86e92e1f vfs: allow vfs_clone_file_range() across mount points I left the comment because I did not need to take responsibility for changing user behavior at the time, but I do not see any immediate harm from the user behavior changes now. > > And yes, I'm also aware that do_clone_file_range() already had that > exact same i_sb check and it's not new, but since ioctl_file_clone() > cheched for the mount path, I don't think you could actually reach it > without being on the same mount. > > And while discussing these sanity checks: wouldn't it make sense to > check that *both* the source file and the destination file support > that remap_file_range() op, and it's the same op? > > Yes, yes, it probably always is in practice, but I could imagine some > type confusion thing. So wouldn't it be nice to also have something > like > > if (dst_file->f_op != src_file->f_op) > goto out_drop_write; > > in there? I'm thinking "how about dedupe from a directory to a regular > file" kind of craziness... Both S_ISDIR and !S_ISREG cases are already checked for both clone and dedupe on both files (twice in fact), so at least that is not a concern. There may be other reasons to worry about, but I can't think of any. Thanks, Amir.
On Thu, Mar 24, 2022 at 1:35 AM Darrick J. Wong <djwong@kernel.org> wrote: > > On Tue, Mar 22, 2022 at 04:55:17PM -0400, Josef Bacik wrote: > > On Tue, Mar 22, 2022 at 11:23:21AM -0700, Linus Torvalds wrote: > > > On Mon, Mar 21, 2022 at 2:37 PM David Sterba <dsterba@suse.com> wrote: > > > > > > > > - allow reflinks/deduplication from two different mounts of the same > > > > filesystem > > > > > > So I've pulled this, and it looks ok, but I'm not getting the warm and fuzzies. > > > > > > In particular, I'm not seeing any commentary about different > > > filesystems for this. > > > > > > There are several filesystems that use that ->remap_file_range() > > > operation, so these relaxed rules don't just affect btrfs. > > > > > > Yes, yes, checking for i_sb matching does seem sensible, but I'd > > > *really* have liked some sign that people checked with other > > > filesystem maintainers and this is ok for all of them, and they didn't > > > make assumptions about "always same mount" rather than "always same > > > filesystem". > > > > > > > > This affects at least cifs, nfs, overlayfs and ocfs2. > > > > I had a talk with Darrick Wong about this on IRC, and his Reviewed-by is on the > > patch. This did surprise nfsd when xfstests started failing, but talking with > > Bruce he didn't complain once he understood what was going on. > > FWIW, I remember talking about this with Bruce and (probably Anna too) > during a hallway BOF at the last LSFMMBPFBBQ that I went to, which was > 2018(?) At the time, I think we resolved that nfs42_remap_file_range > was capable of detecting and dealing with unsupported requests, so a > direct comparison of the ->remap_file_range or ->f_op wasn't necessary > for them. > > > Believe me I > > have 0 interest in getting the other maintainers upset with me by sneaking > > something by them, I made sure to run it by people first, tho I probably should > > have checked with people directly other than Darrick. > > I /am/ a little curious what Steve French has to say w.r.t CIFS. > > AFAICT overlayfs passes the request down to the appropriate fs > under-layer, so its correctness mostly depends on the under-layer's > implementation. But I'll let Amir or someone chime in on that. ;) > The thing is, overlayfs ALREADY does cross-mnt clone_file_range() on underlying layers. So if there was a bug with allowing cross mnt clones on xfs it would have been in the wild for a long time already. OTOH, overlayfs doesn't support nfs/cifs/ocfs2 as upper fs. If you mount an overlay with lower and upper layer on the same xfs/btrfs sb the original mnt of lower path and upper patch is irrelevant. Overlayfs uses different private mnt per layer anyway, so if the source file is on lower layer then even if originally the overlay mount was done with different upper/lower mounts of the same sb, clone via overlayfs would work. Allowing cross overlayfs mount clone makes very little difference. Thanks, Amir.