Message ID | 20240901-b4-fuse-uring-rfcv3-without-mmap-v3-0-9207f7391444@ddn.com (mailing list archive) |
---|---|
Series | fuse: fuse-over-io-uring |
Overall I think this looks pretty reasonable from an io_uring point of view. Some minor comments in the replies that would need to get resolved, and we'll need to get Ming's buffer work done to reap the dio benefits.

I ran a quick benchmark here, doing 4k buffered random reads from a big file. I see about 25% improvement for that case, and notably at half the CPU usage.
On 9/4/24 18:42, Jens Axboe wrote:
> Overall I think this looks pretty reasonable from an io_uring point of
> view. Some minor comments in the replies that would need to get
> resolved, and we'll need to get Ming's buffer work done to reap the dio
> benefits.
>
> I ran a quick benchmark here, doing 4k buffered random reads from a big
> file. I see about 25% improvement for that case, and notably at half the
> CPU usage.

That is a bit low for my needs, but you will definitely need to wake up on the same core - not applied in this patch version. I also need to re-test with current kernel versions, but I think even that is not perfect.

We had a rather long discussion here
https://lore.kernel.org/lkml/d9151806-c63a-c1da-12ad-c9c1c7039785@amd.com/T/#r58884ee2c68f9ac5fdb89c4e3a968007ff08468e
and there is a seesaw hack, which makes it work perfectly. Then I got persistently distracted with other work - so far I haven't tracked down why __wake_up_on_current_cpu didn't work. Back then it was also still only a patch and not in Linux yet. I need to retest and possibly figure out where the task switch happens.

Also, if you are testing with buffered writes, the v2 series had more optimizations, like a core+1 hack for async IO. I think in order to get it landed and to agree on the approach with Miklos it is better to first remove all these optimizations and then fix it later... Though for performance testing it is not optimal.

Thanks,
Bernd
On 9/4/24 1:37 PM, Bernd Schubert wrote:
>
>
> On 9/4/24 18:42, Jens Axboe wrote:
>> Overall I think this looks pretty reasonable from an io_uring point of
>> view. Some minor comments in the replies that would need to get
>> resolved, and we'll need to get Ming's buffer work done to reap the dio
>> benefits.
>>
>> I ran a quick benchmark here, doing 4k buffered random reads from a big
>> file. I see about 25% improvement for that case, and notably at half the
>> CPU usage.
>
> That is a bit low for my needs, but you will definitely need to wake up on
> the same core - not applied in this patch version. I also need to re-test
> with current kernel versions, but I think even that is not perfect.
>
> We had a rather long discussion here
> https://lore.kernel.org/lkml/d9151806-c63a-c1da-12ad-c9c1c7039785@amd.com/T/#r58884ee2c68f9ac5fdb89c4e3a968007ff08468e
> and there is a seesaw hack, which makes it work perfectly.
> Then I got persistently distracted with other work - so far I haven't
> tracked down why __wake_up_on_current_cpu didn't work. Back then it was
> also still only a patch and not in Linux yet. I need to retest and
> possibly figure out where the task switch happens.

I'll give it a look, wasn't too worried about it as we're also still missing the zero copy bits. More concerned with just getting the core of it sane, which I think we're pretty close to. Then we can work on making it even faster post that.

> Also, if you are testing with buffered writes,
> the v2 series had more optimizations, like a core+1 hack for async IO.
> I think in order to get it landed and to agree on the approach with
> Miklos it is better to first remove all these optimizations and then
> fix it later... Though for performance testing it is not optimal.

Exactly, that's why I objected to some of the v2 io_uring hackery that just wasn't palatable.