Message ID: 20180906160620.16277-1-lhenriques@suse.com (mailing list archive)
Series: copy_file_range in cephfs kernel client
I don't have much useful to say here (unless Zheng wants me to look
carefully at the use of copy-get), but I'm excited to see this getting
done! :)

One thing I will note is that it might be a good idea if at least one of
the system admin or the Ceph cluster admin can disable this behavior,
just in case bugs turn up with the copy-from op. I don't expect any as
this is a pretty friendly use-case for it (protected by FS caps,
hurray!) but it is the first non-cache-tiering user that will turn up in
the wild I'm aware of.
-Greg

On Thu, Sep 6, 2018 at 9:06 AM, Luis Henriques <lhenriques@suse.com> wrote:
> Changes since v2:
>
> - File size checks are now done after we have all the required caps
>
> Here are the main changes since v1, after Zheng's review:
>
> 1. ceph_osdc_copy_from() now receives source and destination snapids
>    instead of ceph_vino structs
>
> 2. Also get FILE_RD capabilities in ceph_copy_file_range() for the
>    source file, as other clients may have dirty data in their cache.
>
> 3. Fall back to the VFS copy_file_range default implementation if
>    we're copying beyond the source file's EOF
>
> Note that 2. required an extra patch modifying ceph_try_get_caps() so
> that it could perform a non-blocking attempt at getting
> CEPH_CAP_FILE_RD capabilities.
>
> And here's the original (v1) RFC cover letter just for reference:
>
> This series is my initial attempt at getting a copy_file_range syscall
> implementation in the kernel cephfs client using the 'copy-from' RADOS
> operation.
>
> The idea of getting this implemented was from Greg -- or, at least, he
> created a feature in the tracker [1]. I just decided to give it a try
> as the feature wasn't assigned to anyone ;-)
>
> I've had this patchset sitting on my laptop for a while already,
> waiting for me to revisit it and review some of its TODOs, but I
> finally decided to send it out as-is instead, to get some early
> feedback.
>
> The first patch implements the copy-from operation in the libceph
> module. Unfortunately, the documentation for this operation is
> nonexistent and I had to do a lot of digging to figure out the details
> (and I probably missed something!). For example, initially I was
> hoping that this operation could be used to copy more than one object
> at a time. Doing an OSD request per object copy is not ideal, but
> unfortunately it seems to be the only way. Anyway, my expectation is
> that this new operation will be useful for other features in the
> future.
>
> The 2nd patch is where copy_file_range is implemented and could
> probably be optimised, but I didn't bother with that for now. The
> important bit is that we may still need to do some manual copies if
> the offsets aren't object aligned or if the length is smaller than the
> object size. I'm using do_splice_direct() for the manual copies as it
> was the easiest way to get a PoC running, but maybe there are better
> ways.
>
> I've done some functional testing on this PoC. It also passes the
> generic xfstests suite, in particular the copy_file_range specific
> tests (430-434). But I haven't done any benchmarks to measure any
> performance changes in using this syscall.
>
> Any feedback is welcome, especially regarding the TODOs in the code.
>
> [1] https://tracker.ceph.com/issues/21944
>
> Luis Henriques (3):
>   ceph: add non-blocking parameter to ceph_try_get_caps()
>   ceph: support the RADOS copy-from operation
>   ceph: support copy_file_range file operation
>
>  fs/ceph/addr.c                  |   2 +-
>  fs/ceph/caps.c                  |   7 +-
>  fs/ceph/file.c                  | 221 ++++++++++++++++++++++++++++++++
>  fs/ceph/super.h                 |   2 +-
>  include/linux/ceph/osd_client.h |  17 +++
>  include/linux/ceph/rados.h      |  19 +++
>  net/ceph/osd_client.c           |  72 +++++++++++
>  7 files changed, 335 insertions(+), 5 deletions(-)
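The "important bit" of the 2nd patch, deciding which parts of a copy need a manual copy because they are not object aligned, comes down to arithmetic. A minimal sketch of that split (userspace Python rather than kernel C, and assuming a fixed hypothetical object size; real CephFS layouts come from the file's striping parameters):

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumed default CephFS object size (4 MiB)

def split_range(off, length, objsz=OBJECT_SIZE):
    """Split (off, length) into (head, whole_objects, tail).

    head and tail are byte counts that need a manual copy (the patch
    uses do_splice_direct() for these); only whole objects can use the
    'copy-from' op, at the cost of one OSD request per object.
    """
    end = off + length
    # First object boundary at or after `off`, clamped to the range end.
    head_end = min(end, (off + objsz - 1) // objsz * objsz)
    # Last object boundary at or before `end`, but never before head_end.
    tail_start = max(head_end, end // objsz * objsz)
    return head_end - off, (tail_start - head_end) // objsz, end - tail_start
```

For example, an aligned 8 MiB copy splits as `(0, 2, 0)`: two copy-from requests and no manual I/O, while a copy shorter than one object degrades entirely to a manual copy.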
Gregory Farnum <gfarnum@redhat.com> writes:

> I don't have much useful to say here (unless Zheng wants me to look
> carefully at the use of copy-get), but I'm excited to see this getting
> done! :)
>
> One thing I will note is that it might be a good idea if at least one
> of the system admin or the Ceph cluster admin can disable this
> behavior, just in case bugs turn up with the copy-from op. I don't
> expect any as this is a pretty friendly use-case for it (protected by
> FS caps, hurray!) but it is the first non-cache-tiering user that will
> turn up in the wild I'm aware of.

If I understand you correctly, you're talking about adding a knob so
that the OSDs could simply return an error when requested to perform a
'copy-from' op, right? That shouldn't be too difficult, I believe.
Returning an error in PrimaryLogPG::do_osd_ops if that knob is set
should do the job, but I really don't feel comfortable touching the OSD
code. If you really think this is something useful, I can try to spend
some time looking into that, but it will take me some time :)

Cheers,
On Fri, Sep 7, 2018 at 2:11 AM, Luis Henriques <lhenriques@suse.com> wrote:
> Gregory Farnum <gfarnum@redhat.com> writes:
>
>> One thing I will note is that it might be a good idea if at least one
>> of the system admin or the Ceph cluster admin can disable this
>> behavior, just in case bugs turn up with the copy-from op. [...]
>
> If I understand you correctly, you're talking about adding a knob so
> that the OSDs could simply return an error when requested to perform a
> 'copy-from' op, right? That shouldn't be too difficult, I believe.

Oh heavens no. I mean a switch on the client side to not invoke the
copy-from op at all. Probably that would just mean doing a full read of
the source and write of the destination, but perhaps you could also
just unhook the copy_file_range implementation from the VFS?

I would assume this is pretty trivial, but if it's complicated then
don't worry about it — I bring it up mostly because Josh thought there
were some outstanding issues with this RADOS op, but on deeper
inspection he was wrong and was conflating some other things. :)
-Greg

> Returning an error in PrimaryLogPG::do_osd_ops if that knob is set
> should do the job, but I really don't feel comfortable touching the
> OSD code. If you really think this is something useful, I can try to
> spend some time looking into that, but it will take me some time :)
>
> Cheers,
> --
> Luis
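The client-side fallback Greg describes, a full read of the source and a write of the destination, is easy to picture in userspace terms. A hedged sketch (Python with plain file descriptors; the kernel client would of course do this through its own page-cache I/O paths, not like this):

```python
import os

def copy_range_fallback(fd_src, fd_dst, count, chunk=64 * 1024):
    """Copy `count` bytes by reading the source and writing the
    destination in chunks: the behaviour a client-side
    'disable copy-from' switch would degrade to."""
    copied = 0
    while copied < count:
        data = os.read(fd_src, min(chunk, count - copied))
        if not data:  # source EOF reached early
            break
        os.write(fd_dst, data)
        copied += len(data)
    return copied
```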
Gregory Farnum <gfarnum@redhat.com> writes:

> On Fri, Sep 7, 2018 at 2:11 AM, Luis Henriques <lhenriques@suse.com> wrote:
>> If I understand you correctly, you're talking about adding a knob so
>> that the OSDs could simply return an error when requested to perform
>> a 'copy-from' op, right? That shouldn't be too difficult, I believe.
>
> Oh heavens no. I mean a switch on the client side to not invoke the
> copy-from op at all. Probably that would just mean doing a full read
> of the source and write of the destination, but perhaps you could also
> just unhook the copy_file_range implementation from the VFS?
>
> I would assume this is pretty trivial but if it's complicated then
> don't worry about it — I bring it up mostly because Josh thought there
> were some outstanding issues with this RADOS op but on deeper
> inspection he was wrong and conflating some other things. :)

Ah, sure! That could be done. I guess a new mount option
('nocopyfrom'?) could be added, which would cause the ceph
copy_file_range implementation to simply fall back to the default VFS
implementation (i.e. simply do a do_splice_direct). I'll suggest
something in an extra patch in v4 of this series. Thanks!

Cheers,