Message ID: 20240520102033.9361-1-nj.shetty@samsung.com (mailing list archive)
Series: Implement copy offload support
On 5/20/24 03:20, Nitesh Shetty wrote:
> 4. This bio is merged with the request containing the destination info.
Bios with different operation types must never be merged. From attempt_merge():

	if (req_op(req) != req_op(next))
		return NULL;
Thanks,
Bart.
On Mon, May 20, 2024 at 03:50:13PM +0530, Nitesh Shetty wrote:
> So copy offload works only for request based storage drivers.
I don't think that is actually true. It just requires a fair amount of
code in a bio-based driver to match the bios up.
I'm missing any kind of information on what this patch set as-is
actually helps with. What operations are sped up, for what operations
does it reduce resource usage?
Part of that might be that the included use case of offloading
copy_file_range doesn't seem particularly useful - on any advanced
file system that would be done using reflinks anyway.
Have you considered hooking into dm-kcopyd which would be an
instant win instead? Or into garbage collection in zoned or other
log structured file systems? Those would probably really like
multiple source bios, though.
On 01/06/24 07:47AM, Christoph Hellwig wrote:
>On Mon, May 20, 2024 at 03:50:13PM +0530, Nitesh Shetty wrote:
>> So copy offload works only for request based storage drivers.
>
>I don't think that is actually true. It just requires a fair amount of
>code in a bio based driver to match the bios up.
>
>I'm missing any kind of information on what this patch set as-is
>actually helps with. What operations are sped up, for what operations
>does it reduce resource usage?
>
The major benefit of this copy-offload/emulation framework is seen in a
fabrics setup, for copy workloads across the network. The host sends the
offload command over the network and the actual copy is carried out via
emulation on the target (hence patch 4). This gives higher performance
and lower network consumption than a read and a write travelling across
the network.

With this design of copy-offload/emulation we see the following
improvements, compared to userspace read + write, on an NVMe-oF TCP
setup:

Setup 1:
  Network speed: 1000Mb/s
  Host PC: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
  Target PC: AMD Ryzen 9 5900X 12-Core Processor

  block size 8k:
    IO BW improves from 106 MiB/s to 360 MiB/s.
    Network utilisation drops from 97% to 6%.
  block size 1M:
    IO BW improves from 104 MiB/s to 2677 MiB/s.
    Network utilisation drops from 92% to 0.66%.

Setup 2:
  Network speed: 100Gb/s
  Server: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz, 72 cores
  (host and target have the same configuration)

  block size 8k:
    17.5% improvement in IO BW (794 MiB/s to 933 MiB/s).
    Network utilisation drops from 6.75% to 0.16%.

>Part of that might be that the included use case of offloading
>copy_file_range doesn't seem particularly useful - on any advanced
>file system that would be done using reflinks anyway.
>
Instead of coining a new user interface just for copy, we thought of
using the existing infrastructure for plumbing. Once this series gets
merged, we can add an io_uring interface.
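[ As an aside, the copy_file_range() plumbing discussed above is driven
from plain userspace code such as the minimal sketch below. The file
names are hypothetical, and whether the kernel actually offloads the
copy or falls back to its internal read+write emulation depends on the
filesystem and the underlying block driver; the call itself is the
stock glibc wrapper (glibc >= 2.27). ]

	/* Minimal sketch: copy a small range between two files with
	 * copy_file_range(). File names are hypothetical. The kernel
	 * decides whether the copy is offloaded or emulated. */
	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>

	int main(void)
	{
		const char msg[] = "copy offload test";	/* 17 bytes */
		size_t left = sizeof(msg) - 1;
		int in = open("src.dat", O_RDWR | O_CREAT | O_TRUNC, 0644);
		int out = open("dst.dat", O_RDWR | O_CREAT | O_TRUNC, 0644);

		if (in < 0 || out < 0) {
			perror("open");
			return 1;
		}
		if (write(in, msg, left) != (ssize_t)left) {
			perror("write");
			return 1;
		}
		/* Rewind the source; NULL offsets mean "use and advance
		 * the file offsets". Loop because copy_file_range() may
		 * copy fewer bytes than requested. */
		lseek(in, 0, SEEK_SET);
		while (left > 0) {
			ssize_t n = copy_file_range(in, NULL, out, NULL,
						    left, 0);
			if (n <= 0) {
				perror("copy_file_range");
				return 1;
			}
			left -= (size_t)n;
		}
		printf("copied %zu bytes\n", sizeof(msg) - 1);
		close(in);
		close(out);
		return 0;
	}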
>Have you considered hooking into dm-kcopyd which would be an
>instant win instead? Or into garbage collection in zoned or other
>log structured file systems? Those would probably really like
>multiple source bios, though.
>
Our initial versions of the series had a dm-kcopyd use case. We dropped
it to keep the overall series lightweight and easier to review and
test. Once the current series gets merged, we will start adding more
in-kernel users in the next phase.

Thank you,
Nitesh Shetty