Message ID | 14ec5d880cfca878bf32e643243c7ab3f4a52440.1587309014.git.qemu_oss@crudebyte.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | 9pfs: readdir optimization | expand |
On Sonntag, 19. April 2020 17:06:17 CEST Christian Schoenebeck wrote: > Make top half really top half and bottom half really bottom half: > > Each T_readdir request handling is hopping between threads (main > I/O thread and background I/O driver threads) several times for > every individual directory entry, which sums up to huge latencies > for handling just a single T_readdir request. > > Instead of doing that, collect now all required directory entries > (including all potentially required stat buffers for each entry) in > one rush on a background I/O thread from fs driver by calling the > previously added function v9fs_co_readdir_many() instead of > v9fs_co_readdir(), then assemble the entire resulting network > response message for the readdir request on main I/O thread. The > fs driver is still aborting the directory entry retrieval loop > (on the background I/O thread inside of v9fs_co_readdir_many()) > as soon as it would exceed the client's requested maximum R_readdir > response size. So this will not introduce a performance penalty on > another end. > > Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> > --- > hw/9pfs/9p.c | 122 +++++++++++++++++++++++---------------------------- > 1 file changed, 55 insertions(+), 67 deletions(-) Ping. Anybody? I would like to roll out this patch set soon and this is the only patch in this series missing a review yet.
On Wed, 03 Jun 2020 19:16:08 +0200 Christian Schoenebeck <qemu_oss@crudebyte.com> wrote: > On Sonntag, 19. April 2020 17:06:17 CEST Christian Schoenebeck wrote: > > Make top half really top half and bottom half really bottom half: > > > > Each T_readdir request handling is hopping between threads (main > > I/O thread and background I/O driver threads) several times for > > every individual directory entry, which sums up to huge latencies > > for handling just a single T_readdir request. > > > > Instead of doing that, collect now all required directory entries > > (including all potentially required stat buffers for each entry) in > > one rush on a background I/O thread from fs driver by calling the > > previously added function v9fs_co_readdir_many() instead of > > v9fs_co_readdir(), then assemble the entire resulting network > > response message for the readdir request on main I/O thread. The > > fs driver is still aborting the directory entry retrieval loop > > (on the background I/O thread inside of v9fs_co_readdir_many()) > > as soon as it would exceed the client's requested maximum R_readdir > > response size. So this will not introduce a performance penalty on > > another end. > > > > Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> > > --- > > hw/9pfs/9p.c | 122 +++++++++++++++++++++++---------------------------- > > 1 file changed, 55 insertions(+), 67 deletions(-) > > Ping. Anybody? > > I would like to roll out this patch set soon and this is the only patch in > this series missing a review yet. > Hi Christian, Sorry for getting back to this only now :-\ So I still have some concerns about the locking of the directory stream pointer a fid. It was initially introduced to avoid concurrent accesses by multiple threads to the corresponding internal glibc object, as recommended in the readdir(3) manual page. Now, this patch considerably extends the critical section to also contain calls to telldir() and all the _many_ readdir()... so I'm not sure exactly what's the purpose of that mutex right now. Please provide more details. Cheers, -- Greg
On Montag, 29. Juni 2020 18:39:02 CEST Greg Kurz wrote: > On Wed, 03 Jun 2020 19:16:08 +0200 > > Christian Schoenebeck <qemu_oss@crudebyte.com> wrote: > > On Sonntag, 19. April 2020 17:06:17 CEST Christian Schoenebeck wrote: > > > Make top half really top half and bottom half really bottom half: > > > > > > Each T_readdir request handling is hopping between threads (main > > > I/O thread and background I/O driver threads) several times for > > > every individual directory entry, which sums up to huge latencies > > > for handling just a single T_readdir request. > > > > > > Instead of doing that, collect now all required directory entries > > > (including all potentially required stat buffers for each entry) in > > > one rush on a background I/O thread from fs driver by calling the > > > previously added function v9fs_co_readdir_many() instead of > > > v9fs_co_readdir(), then assemble the entire resulting network > > > response message for the readdir request on main I/O thread. The > > > fs driver is still aborting the directory entry retrieval loop > > > (on the background I/O thread inside of v9fs_co_readdir_many()) > > > as soon as it would exceed the client's requested maximum R_readdir > > > response size. So this will not introduce a performance penalty on > > > another end. > > > > > > Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> > > > --- > > > > > > hw/9pfs/9p.c | 122 +++++++++++++++++++++++---------------------------- > > > 1 file changed, 55 insertions(+), 67 deletions(-) > > > > Ping. Anybody? > > > > I would like to roll out this patch set soon and this is the only patch in > > this series missing a review yet. > > Hi Christian, Hi Greg, > Sorry for getting back to this only now :-\ > > So I still have some concerns about the locking of the directory stream > pointer a fid. It was initially introduced to avoid concurrent accesses > by multiple threads to the corresponding internal glibc object, as > recommended in the readdir(3) manual page. Now, this patch considerably > extends the critical section to also contain calls to telldir() and all > the _many_ readdir()... so I'm not sure exactly what's the purpose of > that mutex right now. Please provide more details. The intention of this lock is to prevent concurrent r/w (i.e. telldir()/ readdir()) of the dir stream position by two or more active readdir requests handled in parallel, provided that they would use the same FID. Due to the latter requirement for this to become relevant, we already agreed that this is not something that would usually happen with a Linux 9p client, but there is nothing from protocol point of view that would prohibit this to be done by a client, so the resulting undefined behaviour should be prevented, which this lock does. What's your precise concern? Best regards, Christian Schoenebeck
On Tue, 30 Jun 2020 17:16:40 +0200 Christian Schoenebeck <qemu_oss@crudebyte.com> wrote: > On Montag, 29. Juni 2020 18:39:02 CEST Greg Kurz wrote: > > On Wed, 03 Jun 2020 19:16:08 +0200 > > > > Christian Schoenebeck <qemu_oss@crudebyte.com> wrote: > > > On Sonntag, 19. April 2020 17:06:17 CEST Christian Schoenebeck wrote: > > > > Make top half really top half and bottom half really bottom half: > > > > > > > > Each T_readdir request handling is hopping between threads (main > > > > I/O thread and background I/O driver threads) several times for > > > > every individual directory entry, which sums up to huge latencies > > > > for handling just a single T_readdir request. > > > > > > > > Instead of doing that, collect now all required directory entries > > > > (including all potentially required stat buffers for each entry) in > > > > one rush on a background I/O thread from fs driver by calling the > > > > previously added function v9fs_co_readdir_many() instead of > > > > v9fs_co_readdir(), then assemble the entire resulting network > > > > response message for the readdir request on main I/O thread. The > > > > fs driver is still aborting the directory entry retrieval loop > > > > (on the background I/O thread inside of v9fs_co_readdir_many()) > > > > as soon as it would exceed the client's requested maximum R_readdir > > > > response size. So this will not introduce a performance penalty on > > > > another end. > > > > > > > > Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> > > > > --- > > > > > > > > hw/9pfs/9p.c | 122 +++++++++++++++++++++++---------------------------- > > > > 1 file changed, 55 insertions(+), 67 deletions(-) > > > > > > Ping. Anybody? > > > > > > I would like to roll out this patch set soon and this is the only patch in > > > this series missing a review yet. > > > > Hi Christian, > > Hi Greg, > > > Sorry for getting back to this only now :-\ > > > > So I still have some concerns about the locking of the directory stream > > pointer a fid. It was initially introduced to avoid concurrent accesses > > by multiple threads to the corresponding internal glibc object, as > > recommended in the readdir(3) manual page. Now, this patch considerably > > extends the critical section to also contain calls to telldir() and all > > the _many_ readdir()... so I'm not sure exactly what's the purpose of > > that mutex right now. Please provide more details. > > The intention of this lock is to prevent concurrent r/w (i.e. telldir()/ > readdir()) of the dir stream position by two or more active readdir requests > handled in parallel, provided that they would use the same FID. Due to the > latter requirement for this to become relevant, we already agreed that this is > not something that would usually happen with a Linux 9p client, but there is > nothing from protocol point of view that would prohibit this to be done by a > client, so the resulting undefined behaviour should be prevented, which this > lock does. > Makes sense. The dir stream position is no different from glibc's internal dirent in that respect: it shouldn't be used by concurrent threads. > What's your precise concern? > My overall concern is still the same. The patches are big and I've been away for too long, so I need some more discussion to reassemble my thoughts and the puzzle :) But now that I'm starting to understand why it's a good thing to extend the critical section, I realize it should be extended even more: we also have calls to seekdir() and rewindir() in v9fs_readdir() and friends that could _theoretically_ cause the very same kind of undefined behavior you're mentioning. I think the change is important enough that it deserves its own patch. I expect that moving the locking to the top-level handler might also simplify the existing code, and thus your series as well. Please let me come up with a patch first. > Best regards, > Christian Schoenebeck > > Cheers, -- Greg
On Dienstag, 30. Juni 2020 18:39:57 CEST Greg Kurz wrote: > On Tue, 30 Jun 2020 17:16:40 +0200 > > Christian Schoenebeck <qemu_oss@crudebyte.com> wrote: > > On Montag, 29. Juni 2020 18:39:02 CEST Greg Kurz wrote: > > > On Wed, 03 Jun 2020 19:16:08 +0200 > > > > > > Christian Schoenebeck <qemu_oss@crudebyte.com> wrote: > > > > On Sonntag, 19. April 2020 17:06:17 CEST Christian Schoenebeck wrote: > > > > > Make top half really top half and bottom half really bottom half: > > > > > > > > > > Each T_readdir request handling is hopping between threads (main > > > > > I/O thread and background I/O driver threads) several times for > > > > > every individual directory entry, which sums up to huge latencies > > > > > for handling just a single T_readdir request. > > > > > > > > > > Instead of doing that, collect now all required directory entries > > > > > (including all potentially required stat buffers for each entry) in > > > > > one rush on a background I/O thread from fs driver by calling the > > > > > previously added function v9fs_co_readdir_many() instead of > > > > > v9fs_co_readdir(), then assemble the entire resulting network > > > > > response message for the readdir request on main I/O thread. The > > > > > fs driver is still aborting the directory entry retrieval loop > > > > > (on the background I/O thread inside of v9fs_co_readdir_many()) > > > > > as soon as it would exceed the client's requested maximum R_readdir > > > > > response size. So this will not introduce a performance penalty on > > > > > another end. > > > > > > > > > > Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> > > > > > --- > > > > > > > > > > hw/9pfs/9p.c | 122 > > > > > +++++++++++++++++++++++---------------------------- > > > > > 1 file changed, 55 insertions(+), 67 deletions(-) > > > > > > > > Ping. Anybody? > > > > > > > > I would like to roll out this patch set soon and this is the only > > > > patch in > > > > this series missing a review yet. > > > > > > Hi Christian, > > > > Hi Greg, > > > > > Sorry for getting back to this only now :-\ > > > > > > So I still have some concerns about the locking of the directory stream > > > pointer a fid. It was initially introduced to avoid concurrent accesses > > > by multiple threads to the corresponding internal glibc object, as > > > recommended in the readdir(3) manual page. Now, this patch considerably > > > extends the critical section to also contain calls to telldir() and all > > > the _many_ readdir()... so I'm not sure exactly what's the purpose of > > > that mutex right now. Please provide more details. > > > > The intention of this lock is to prevent concurrent r/w (i.e. telldir()/ > > readdir()) of the dir stream position by two or more active readdir > > requests handled in parallel, provided that they would use the same FID. > > Due to the latter requirement for this to become relevant, we already > > agreed that this is not something that would usually happen with a Linux > > 9p client, but there is nothing from protocol point of view that would > > prohibit this to be done by a client, so the resulting undefined > > behaviour should be prevented, which this lock does. > > Makes sense. The dir stream position is no different from glibc's internal > dirent in that respect: it shouldn't be used by concurrent threads. Exactly, it is a conceptual issue per se in general, independent of whether glibc is used and independent of the fs driver. > > What's your precise concern? > > My overall concern is still the same. The patches are big and I've > been away for too long, so I need some more discussion to reassemble > my thoughts and the puzzle :) Np, if you have questions, just come along. > But now that I'm starting to understand why it's a good thing to > extend the critical section, I realize it should be extended > even more: we also have calls to seekdir() and rewindir() in > v9fs_readdir() and friends that could _theoretically_ cause the > very same kind of undefined behavior you're mentioning. You are talking about the "old" code now. Yes, it is wrong and I did not bother to fix the "old" implementation, since this optimized readdir implementation fixed this issue as well in one rush, but also note ... > I think the change is important enough that it deserves its > own patch. I expect that moving the locking to the top-level > handler might also simplify the existing code, and thus your > series as well. ... please reconsider if you really want to open that box. The "old" readdir implementation on main thread is inherently wrong, and handling the lock there and error pathes is extremely cumbersome and error prone. Also: it does not make sense to move locking on this series from fs driver back to main I/O thread. Atomicity must retain on driver side, not on top half. Best regards, Christian Schoenebeck
On Tue, 30 Jun 2020 20:00:08 +0200 Christian Schoenebeck <qemu_oss@crudebyte.com> wrote: > On Dienstag, 30. Juni 2020 18:39:57 CEST Greg Kurz wrote: > > On Tue, 30 Jun 2020 17:16:40 +0200 > > > > Christian Schoenebeck <qemu_oss@crudebyte.com> wrote: > > > On Montag, 29. Juni 2020 18:39:02 CEST Greg Kurz wrote: > > > > On Wed, 03 Jun 2020 19:16:08 +0200 > > > > > > > > Christian Schoenebeck <qemu_oss@crudebyte.com> wrote: > > > > > On Sonntag, 19. April 2020 17:06:17 CEST Christian Schoenebeck wrote: > > > > > > Make top half really top half and bottom half really bottom half: > > > > > > > > > > > > Each T_readdir request handling is hopping between threads (main > > > > > > I/O thread and background I/O driver threads) several times for > > > > > > every individual directory entry, which sums up to huge latencies > > > > > > for handling just a single T_readdir request. > > > > > > > > > > > > Instead of doing that, collect now all required directory entries > > > > > > (including all potentially required stat buffers for each entry) in > > > > > > one rush on a background I/O thread from fs driver by calling the > > > > > > previously added function v9fs_co_readdir_many() instead of > > > > > > v9fs_co_readdir(), then assemble the entire resulting network > > > > > > response message for the readdir request on main I/O thread. The > > > > > > fs driver is still aborting the directory entry retrieval loop > > > > > > (on the background I/O thread inside of v9fs_co_readdir_many()) > > > > > > as soon as it would exceed the client's requested maximum R_readdir > > > > > > response size. So this will not introduce a performance penalty on > > > > > > another end. > > > > > > > > > > > > Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> > > > > > > --- > > > > > > > > > > > > hw/9pfs/9p.c | 122 > > > > > > +++++++++++++++++++++++---------------------------- > > > > > > 1 file changed, 55 insertions(+), 67 deletions(-) > > > > > > > > > > Ping. Anybody? > > > > > > > > > > I would like to roll out this patch set soon and this is the only > > > > > patch in > > > > > this series missing a review yet. > > > > > > > > Hi Christian, > > > > > > Hi Greg, > > > > > > > Sorry for getting back to this only now :-\ > > > > > > > > So I still have some concerns about the locking of the directory stream > > > > pointer a fid. It was initially introduced to avoid concurrent accesses > > > > by multiple threads to the corresponding internal glibc object, as > > > > recommended in the readdir(3) manual page. Now, this patch considerably > > > > extends the critical section to also contain calls to telldir() and all > > > > the _many_ readdir()... so I'm not sure exactly what's the purpose of > > > > that mutex right now. Please provide more details. > > > > > > The intention of this lock is to prevent concurrent r/w (i.e. telldir()/ > > > readdir()) of the dir stream position by two or more active readdir > > > requests handled in parallel, provided that they would use the same FID. > > > Due to the latter requirement for this to become relevant, we already > > > agreed that this is not something that would usually happen with a Linux > > > 9p client, but there is nothing from protocol point of view that would > > > prohibit this to be done by a client, so the resulting undefined > > > behaviour should be prevented, which this lock does. > > > > Makes sense. The dir stream position is no different from glibc's internal > > dirent in that respect: it shouldn't be used by concurrent threads. > > Exactly, it is a conceptual issue per se in general, independent of whether > glibc is used and independent of the fs driver. > > > > What's your precise concern? > > > > My overall concern is still the same. The patches are big and I've > > been away for too long, so I need some more discussion to reassemble > > my thoughts and the puzzle :) > > Np, if you have questions, just come along. > > > But now that I'm starting to understand why it's a good thing to > > extend the critical section, I realize it should be extended > > even more: we also have calls to seekdir() and rewindir() in > > v9fs_readdir() and friends that could _theoretically_ cause the > > very same kind of undefined behavior you're mentioning. > > You are talking about the "old" code now. Yes, it is wrong and I did not No I'm talking about code that isn't changed by this series: if (initial_offset == 0) { v9fs_co_rewinddir(pdu, fidp); } else { v9fs_co_seekdir(pdu, fidp, initial_offset); } count = v9fs_do_readdir(pdu, fidp, max_count); Leaving these outside the critical section seems to negate your arguments... please clarify. > bother to fix the "old" implementation, since this optimized readdir > implementation fixed this issue as well in one rush, but also note ... > > > I think the change is important enough that it deserves its > > own patch. I expect that moving the locking to the top-level > > handler might also simplify the existing code, and thus your > > series as well. > > ... please reconsider if you really want to open that box. The "old" readdir > implementation on main thread is inherently wrong, and handling the lock there > and error pathes is extremely cumbersome and error prone. > There are indeed several issues with the current readdir: 1) potential inconsistency on concurrent Treaddir requests 2) excessive dispatching to worker threads So we agreed that 1) should never happen with a regular linux client (we could even dump the lock actually) but we want to make it clear in the code anyway that actions on the directory stream are serialized. 2) basically means moving some logic from the frontend to the backend, ie. called from v9fs_co_run_in_worker(). This implies that a very long request (ie. one that would span on many calls to readdir()) cannot be interrupted by the client anymore, as opposed to what we have now BTW. I tend to think that addressing both issues in one "rush" is as much "error prone". > Also: it does not make sense to move locking on this series from fs driver > back to main I/O thread. Atomicity must retain on driver side, not on top > half. > Then the whole seekdir/rewinddir/telldir/readdir sequence should be called with a single v9fs_co_run_in_worker() invocation, in which case the locking could just be something like: int coroutine_fn v9fs_co_readdir_many(V9fsPDU *pdu, V9fsFidState *fidp, struct V9fsDirEnt **entries, int32_t maxsize, bool dostat) { int err = 0; if (v9fs_request_cancelled(pdu)) { return -EINTR; } v9fs_readdir_lock(&fidp->fs.dir); v9fs_co_run_in_worker({ err = do_readdir_many(pdu, fidp, entries, maxsize, dostat); }); v9fs_readdir_unlock(&fidp->fs.dir); return err; } This would technically leave the locking in the main I/O thread, but it isn't conceptually different from locking at the beginning of do_readdir_lock() and unlocking before returning. My concern is that I don't know how coroutine mutexes behave with multiple worker threads... > Best regards, > Christian Schoenebeck > > Cheers, -- Greg
On Mittwoch, 1. Juli 2020 12:09:24 CEST Greg Kurz wrote: > No I'm talking about code that isn't changed by this series: > > if (initial_offset == 0) { > v9fs_co_rewinddir(pdu, fidp); > } else { > v9fs_co_seekdir(pdu, fidp, initial_offset); > } > count = v9fs_do_readdir(pdu, fidp, max_count); > > Leaving these outside the critical section seems to negate > your arguments... please clarify. Yeah, I admit I have neglected this issue, as concurrent readdir requests with same FID always was some kind of theoretical issue. But yes, you are right, that should be addressed by 1. entirely removing the rewinddir/seekdir code here 2. passing initial_offset to v9fs_do_readdir(), which in turn would pass it to readdir_many() 3. readdir_many() would handle rewinddir/seekdir exclusively in its crticial section. > There are indeed several issues with the current readdir: > > 1) potential inconsistency on concurrent Treaddir requests > > 2) excessive dispatching to worker threads > > So we agreed that 1) should never happen with a regular linux > client (we could even dump the lock actually) but we want to > make it clear in the code anyway that actions on the directory > stream are serialized. My point is you are trying to fix a merely thereotical issue on code that's conceptually, inherently wrong and dead end code at first place. Top half code should always be designed to be as thin as possible. The old readdir code though is the complete opposite. > 2) basically means moving some logic from the frontend to the > backend, ie. called from v9fs_co_run_in_worker(). This implies > that a very long request (ie. one that would span on many calls > to readdir()) cannot be interrupted by the client anymore, as > opposed to what we have now BTW. 3) Complexity of old readdir code is so much bigger compared to the new one such that probability of additional issues is also significantly higher. > I tend to think that addressing both issues in one "rush" is > as much "error prone". No it's not. The optimized readdir implementation is quantifyable, significantly less complex than the old implementation (i.e. significantly smaller amount of branches and error pathes, determenistic clear separation of thread's task assignments which includes much smaller amount of thread hops). Less complexity and increased determinism consequently means reduced chance of misbehaviours. So that's not a subjective, but rather a quantifyable, proven statement. You can't switch from the old (wrong) concept to the new concept without some minimum amount of changes, which yes are not small, but I don't see any way to make the change set smaller without yet introducing new negative side effects. I am now using words that I heard from your side before: please be realistic. Currently man power on 9p code is extremely limited to put it mildly. Yes, we could spend time on fixing this (theoretical) issue on the old readdir could. But what would it buy? Due to the limited man power we can only move forward with compromises; in this case you are fighting for code that's DOA. The only thing that you achieve by still sticking to the old readdir code is you are preventing that we move forward at all. Remember: I originally sent this patch almost 7 months ago. > > Also: it does not make sense to move locking on this series from fs driver > > back to main I/O thread. Atomicity must retain on driver side, not on top > > half. > > Then the whole seekdir/rewinddir/telldir/readdir sequence should > be called with a single v9fs_co_run_in_worker() invocation, in > which case the locking could just be something like: > > int coroutine_fn v9fs_co_readdir_many(V9fsPDU *pdu, V9fsFidState *fidp, > struct V9fsDirEnt **entries, > int32_t maxsize, bool dostat) > { > int err = 0; > > if (v9fs_request_cancelled(pdu)) { > return -EINTR; > } > > v9fs_readdir_lock(&fidp->fs.dir); > > v9fs_co_run_in_worker({ > err = do_readdir_many(pdu, fidp, entries, maxsize, dostat); > }); > > v9fs_readdir_unlock(&fidp->fs.dir); > return err; > } That's exactly what should be prevented. The lock must be on driver thread side, not on main thread side. The goal is to reduce the time slice of individual locks. If the lock is on main thread, the time slice of a lock (even on huge directories with thousands of entries) is multiple factors larger than if the lock is on driver thread side. So that's not just few percent or so, it is huge. > This would technically leave the locking in the main I/O thread, > but it isn't conceptually different from locking at the beginning > of do_readdir_lock() and unlocking before returning. My concern > is that I don't know how coroutine mutexes behave with multiple > worker threads... Well, your Mutex -> CoMutex change was intended to fix a deadlock with the old readdir implementation. That deadlock could not happen with the new readdir implementation, which suggests that it probably makes sense to revert this change (i.e. CoMutex -> Mutex) for this new implementation. Best regards, Christian Schoenebeck
On Wed, 01 Jul 2020 13:47:12 +0200 Christian Schoenebeck <qemu_oss@crudebyte.com> wrote: > On Mittwoch, 1. Juli 2020 12:09:24 CEST Greg Kurz wrote: > > No I'm talking about code that isn't changed by this series: > > > > if (initial_offset == 0) { > > v9fs_co_rewinddir(pdu, fidp); > > } else { > > v9fs_co_seekdir(pdu, fidp, initial_offset); > > } > > count = v9fs_do_readdir(pdu, fidp, max_count); > > > > Leaving these outside the critical section seems to negate > > your arguments... please clarify. > > Yeah, I admit I have neglected this issue, as concurrent readdir requests with > same FID always was some kind of theoretical issue. But yes, you are right, It's perfectly fine to miss things, that's what reviews are made for :) > that should be addressed by > > 1. entirely removing the rewinddir/seekdir code here > > 2. passing initial_offset to v9fs_do_readdir(), which in turn would > pass it to readdir_many() > > 3. readdir_many() would handle rewinddir/seekdir exclusively in its crticial > section. > Sounds good. v7 ? > > There are indeed several issues with the current readdir: > > > > 1) potential inconsistency on concurrent Treaddir requests > > > > 2) excessive dispatching to worker threads > > > > So we agreed that 1) should never happen with a regular linux > > client (we could even dump the lock actually) but we want to > > make it clear in the code anyway that actions on the directory > > stream are serialized. > > My point is you are trying to fix a merely thereotical issue on code that's > conceptually, inherently wrong and dead end code at first place. Top half code I'm not trying to fix anything. We already have locking in place to deal with this theoretical issue and it interferes with your changes. I don't care that much if a silly guest shoots itself in the foot actually, so it'll be fine with me if you just dump the lock, as long as it doesn't cause QEMU to hang or crash. > should always be designed to be as thin as possible. The old readdir code > though is the complete opposite. > It isn't readdir only, most requests span over multiple v9fs_co_*() calls... > > 2) basically means moving some logic from the frontend to the > > backend, ie. called from v9fs_co_run_in_worker(). This implies > > that a very long request (ie. one that would span on many calls > > to readdir()) cannot be interrupted by the client anymore, as > > opposed to what we have now BTW. > ... most likely to allow clients to interrupt a long request with a Tflush. Have you considered this aspect in your changes ? > 3) Complexity of old readdir code is so much bigger compared to the new one > such that probability of additional issues is also significantly higher. > > > I tend to think that addressing both issues in one "rush" is > > as much "error prone". > > No it's not. The optimized readdir implementation is quantifyable, > significantly less complex than the old implementation (i.e. significantly > smaller amount of branches and error pathes, determenistic clear separation of > thread's task assignments which includes much smaller amount of thread hops). > Less complexity and increased determinism consequently means reduced chance of > misbehaviours. So that's not a subjective, but rather a quantifyable, proven > statement. > > You can't switch from the old (wrong) concept to the new concept without some > minimum amount of changes, which yes are not small, but I don't see any way to > make the change set smaller without yet introducing new negative side effects. > > I am now using words that I heard from your side before: please be realistic. > Currently man power on 9p code is extremely limited to put it mildly. Yes, we > could spend time on fixing this (theoretical) issue on the old readdir could. > But what would it buy? Due to the limited man power we can only move forward > with compromises; in this case you are fighting for code that's DOA. The only > thing that you achieve by still sticking to the old readdir code is you are > preventing that we move forward at all. Remember: I originally sent this patch > almost 7 months ago. > > > > Also: it does not make sense to move locking on this series from fs driver > > > back to main I/O thread. Atomicity must retain on driver side, not on top > > > half. > > > > Then the whole seekdir/rewinddir/telldir/readdir sequence should > > be called with a single v9fs_co_run_in_worker() invocation, in > > which case the locking could just be something like: > > > > int coroutine_fn v9fs_co_readdir_many(V9fsPDU *pdu, V9fsFidState *fidp, > > struct V9fsDirEnt **entries, > > int32_t maxsize, bool dostat) > > { > > int err = 0; > > > > if (v9fs_request_cancelled(pdu)) { > > return -EINTR; > > } > > > > v9fs_readdir_lock(&fidp->fs.dir); > > > > v9fs_co_run_in_worker({ > > err = do_readdir_many(pdu, fidp, entries, maxsize, dostat); > > }); > > > > v9fs_readdir_unlock(&fidp->fs.dir); > > return err; > > } > > That's exactly what should be prevented. The lock must be on driver thread > side, not on main thread side. The goal is to reduce the time slice of > individual locks. If the lock is on main thread, the time slice of a lock > (even on huge directories with thousands of entries) is multiple factors > larger than if the lock is on driver thread side. So that's not just few > percent or so, it is huge. > Sorry I don't get it... In a contention-less situation, which is the case we really care for, qemu_co_mutex_lock() has no overhead. > > This would technically leave the locking in the main I/O thread, > > but it isn't conceptually different from locking at the beginning > > of do_readdir_lock() and unlocking before returning. My concern > > is that I don't know how coroutine mutexes behave with multiple > > worker threads... > > Well, your Mutex -> CoMutex change was intended to fix a deadlock with the old > readdir implementation. That deadlock could not happen with the new readdir > implementation, which suggests that it probably makes sense to revert this > change (i.e. CoMutex -> Mutex) for this new implementation. > No we can't because it is also used with 9p2000.u, that you said you don't want to fix. > Best regards, > Christian Schoenebeck > > Cheers, -- Greg
On Mittwoch, 1. Juli 2020 17:12:40 CEST Greg Kurz wrote: > On Wed, 01 Jul 2020 13:47:12 +0200 > > Christian Schoenebeck <qemu_oss@crudebyte.com> wrote: > > On Mittwoch, 1. Juli 2020 12:09:24 CEST Greg Kurz wrote: > > > No I'm talking about code that isn't changed by this series: > > > if (initial_offset == 0) { > > > > > > v9fs_co_rewinddir(pdu, fidp); > > > > > > } else { > > > > > > v9fs_co_seekdir(pdu, fidp, initial_offset); > > > > > > } > > > count = v9fs_do_readdir(pdu, fidp, max_count); > > > > > > Leaving these outside the critical section seems to negate > > > your arguments... please clarify. > > > > Yeah, I admit I have neglected this issue, as concurrent readdir requests > > with same FID always was some kind of theoretical issue. But yes, you are > > right, > It's perfectly fine to miss things, that's what reviews are made for :) > > > that should be addressed by > > > > 1. entirely removing the rewinddir/seekdir code here > > > > 2. passing initial_offset to v9fs_do_readdir(), which in turn would > > > > pass it to readdir_many() > > > > 3. readdir_many() would handle rewinddir/seekdir exclusively in its > > crticial> > > section. > > Sounds good. v7 ? Sure, that's not the problem, I can repost this handled appropriately of course. But would you please finalize your picture of this patch set first? I would really (try) to finally nail this thing with the next version. > > > There are indeed several issues with the current readdir: > > > > > > 1) potential inconsistency on concurrent Treaddir requests > > > > > > 2) excessive dispatching to worker threads > > > > > > So we agreed that 1) should never happen with a regular linux > > > client (we could even dump the lock actually) but we want to > > > make it clear in the code anyway that actions on the directory > > > stream are serialized. > > > > My point is you are trying to fix a merely thereotical issue on code > > that's > > conceptually, inherently wrong and dead end code at first place. Top half > > code > I'm not trying to fix anything. We already have locking in place to > deal with this theoretical issue and it interferes with your changes. > I don't care that much if a silly guest shoots itself in the foot > actually, so it'll be fine with me if you just dump the lock, as > long as it doesn't cause QEMU to hang or crash. Ah ok, I got you as if you wanted to fix more details on the old readdir code, which would be a clear blocker for this patch set to proceed. Good then. > > should always be designed to be as thin as possible. The old readdir code > > though is the complete opposite. > > It isn't readdir only, most requests span over multiple v9fs_co_*() calls... Right, I know! And that's actually my root motivation to finally bring this patch set forward, since I am very aware that this patch set is just a small brick in the overall procecss of fixing similarly affected code portions. So yes, that's my plan to fix them with upcoming patch sets, too. Having said that, could we probably try to make future reviews a bit more efficient at certain aspects? For instance it would help a lot if the patch set was reviewed entirely, and not stopped at the very first issue spotted and then suspended to ++version, as this creates large latencies in the overall review process? And likewise it would also help if review is prioritized on relevant behaviour aspects (concept, design) first before arguing about code style details like variable names or other details invisible to users. And finally: compromises. As man power on 9p is very limited, it would make sense to limit patch sets at a certain extent and agree to postpone certain non-critical issues to subsequent patches (or sets of), otherwise a patch set will grow and grow and it will take ages to get forward. > > > 2) basically means moving some logic from the frontend to the > > > backend, ie. called from v9fs_co_run_in_worker(). This implies > > > that a very long request (ie. one that would span on many calls > > > to readdir()) cannot be interrupted by the client anymore, as > > > opposed to what we have now BTW. > > ... most likely to allow clients to interrupt a long request with a > Tflush. Have you considered this aspect in your changes ? In this particular patch set, no, as for readdir this should not be an issue in practice for anybody. After this patch set is applied, even on huge directories, the fs driver's duration for readdir would barely be measurable. In fact the server's latency would always be much higher, yet. But also for the upcoming, planned patch sets (i.e. other request types): That would be an example where I would ask you to lower you expectations a bit and that we could agree to postpone such an aspect to a subsequent, separate patch (or set of). > > 3) Complexity of old readdir code is so much bigger compared to the new > > one > > > > such that probability of additional issues is also significantly > > higher. > > > > > > I tend to think that addressing both issues in one "rush" is > > > as much "error prone". > > > > No it's not. The optimized readdir implementation is quantifyable, > > significantly less complex than the old implementation (i.e. significantly > > smaller amount of branches and error pathes, determenistic clear > > separation of thread's task assignments which includes much smaller > > amount of thread hops). Less complexity and increased determinism > > consequently means reduced chance of misbehaviours. So that's not a > > subjective, but rather a quantifyable, proven statement. > > > > You can't switch from the old (wrong) concept to the new concept without > > some minimum amount of changes, which yes are not small, but I don't see > > any way to make the change set smaller without yet introducing new > > negative side effects. > > > > I am now using words that I heard from your side before: please be > > realistic. Currently man power on 9p code is extremely limited to put it > > mildly. Yes, we could spend time on fixing this (theoretical) issue on > > the old readdir could. But what would it buy? Due to the limited man > > power we can only move forward with compromises; in this case you are > > fighting for code that's DOA. The only thing that you achieve by still > > sticking to the old readdir code is you are preventing that we move > > forward at all. Remember: I originally sent this patch almost 7 months > > ago. > > > > > > Also: it does not make sense to move locking on this series from fs > > > > driver > > > > back to main I/O thread. Atomicity must retain on driver side, not on > > > > top > > > > half. > > > > > > Then the whole seekdir/rewinddir/telldir/readdir sequence should > > > be called with a single v9fs_co_run_in_worker() invocation, in > > > which case the locking could just be something like: > > > > > > int coroutine_fn v9fs_co_readdir_many(V9fsPDU *pdu, V9fsFidState *fidp, > > > > > > struct V9fsDirEnt **entries, > > > int32_t maxsize, bool dostat) > > > > > > { > > > > > > int err = 0; > > > > > > if (v9fs_request_cancelled(pdu)) { > > > > > > return -EINTR; > > > > > > } > > > > > > v9fs_readdir_lock(&fidp->fs.dir); > > > > > > v9fs_co_run_in_worker({ > > > > > > err = do_readdir_many(pdu, fidp, entries, maxsize, dostat); > > > > > > }); > > > > > > v9fs_readdir_unlock(&fidp->fs.dir); > > > return err; > > > > > > } > > > > That's exactly what should be prevented. The lock must be on driver thread > > side, not on main thread side. The goal is to reduce the time slice of > > individual locks. If the lock is on main thread, the time slice of a lock > > (even on huge directories with thousands of entries) is multiple factors > > larger than if the lock is on driver thread side. So that's not just few > > percent or so, it is huge. > > Sorry I don't get it... In a contention-less situation, which is the > case we really care for, qemu_co_mutex_lock() has no overhead. Yes, it only kicks in if there is concurrency. > > > This would technically leave the locking in the main I/O thread, > > > but it isn't conceptually different from locking at the beginning > > > of do_readdir_lock() and unlocking before returning. My concern > > > is that I don't know how coroutine mutexes behave with multiple > > > worker threads... > > > > Well, your Mutex -> CoMutex change was intended to fix a deadlock with the > > old readdir implementation. That deadlock could not happen with the new > > readdir implementation, which suggests that it probably makes sense to > > revert this change (i.e. CoMutex -> Mutex) for this new implementation. > > No we can't because it is also used with 9p2000.u, that you said you > don't want to fix. Yes and no, I did not say I don't want to fix readdir for 9p2000.u at all. What I said was I want to prioritize and concentrate on the most relevant issues first. 9p2000.L is the most commonly used protocol variant, so I would like to fix the most severe (e.g. performance) issues for 9p2000.L first before spending time on 9p2000.u which is AFAICS barely used in comparison, which I personally don't use at all, and which I am hence not testing in the same degree and cannot serve with the same quality as 9p2000.L right now. Plus if there are really users caring for 9p2000.u, I can gladly assist them to address these issues for 9p2000.u as well, provided that they help at least with testing the required 9p2000.u changes. Back to the actual topic: so what do we do about the mutex then? CoMutex for 9p2000.u and Mutex for 9p2000.L? I know you find that ugly, but it would just be a transitional measure. Best regards, Christian Schoenebeck
On Thu, 02 Jul 2020 13:43:11 +0200 Christian Schoenebeck <qemu_oss@crudebyte.com> wrote: > On Mittwoch, 1. Juli 2020 17:12:40 CEST Greg Kurz wrote: > > On Wed, 01 Jul 2020 13:47:12 +0200 > > > > Christian Schoenebeck <qemu_oss@crudebyte.com> wrote: > > > On Mittwoch, 1. Juli 2020 12:09:24 CEST Greg Kurz wrote: > > > > No I'm talking about code that isn't changed by this series: > > > > if (initial_offset == 0) { > > > > > > > > v9fs_co_rewinddir(pdu, fidp); > > > > > > > > } else { > > > > > > > > v9fs_co_seekdir(pdu, fidp, initial_offset); > > > > > > > > } > > > > count = v9fs_do_readdir(pdu, fidp, max_count); > > > > > > > > Leaving these outside the critical section seems to negate > > > > your arguments... please clarify. > > > > > > Yeah, I admit I have neglected this issue, as concurrent readdir requests > > > with same FID always was some kind of theoretical issue. But yes, you are > > > right, > > It's perfectly fine to miss things, that's what reviews are made for :) > > > > > that should be addressed by > > > > > > 1. entirely removing the rewinddir/seekdir code here > > > > > > 2. passing initial_offset to v9fs_do_readdir(), which in turn would > > > > > > pass it to readdir_many() > > > > > > 3. readdir_many() would handle rewinddir/seekdir exclusively in its > > > crticial> > > > section. > > > > Sounds good. v7 ? > > Sure, that's not the problem, I can repost this handled appropriately of > course. > > But would you please finalize your picture of this patch set first? I would > really (try) to finally nail this thing with the next version. > I'll do my best. > > > > There are indeed several issues with the current readdir: > > > > > > > > 1) potential inconsistency on concurrent Treaddir requests > > > > > > > > 2) excessive dispatching to worker threads > > > > > > > > So we agreed that 1) should never happen with a regular linux > > > > client (we could even dump the lock actually) but we want to > > > > make it clear in the code anyway that actions on the directory > > > > stream are serialized. > > > > > > My point is you are trying to fix a merely thereotical issue on code > > > that's > > > conceptually, inherently wrong and dead end code at first place. Top half > > > code > > I'm not trying to fix anything. We already have locking in place to > > deal with this theoretical issue and it interferes with your changes. > > I don't care that much if a silly guest shoots itself in the foot > > actually, so it'll be fine with me if you just dump the lock, as > > long as it doesn't cause QEMU to hang or crash. > > Ah ok, I got you as if you wanted to fix more details on the old readdir code, > which would be a clear blocker for this patch set to proceed. Good then. > > > > should always be designed to be as thin as possible. The old readdir code > > > though is the complete opposite. > > > > It isn't readdir only, most requests span over multiple v9fs_co_*() calls... > > Right, I know! And that's actually my root motivation to finally bring this > patch set forward, since I am very aware that this patch set is just a small > brick in the overall procecss of fixing similarly affected code portions. So > yes, that's my plan to fix them with upcoming patch sets, too. > > Having said that, could we probably try to make future reviews a bit more > efficient at certain aspects? For instance it would help a lot if the patch > set was reviewed entirely, and not stopped at the very first issue spotted and > then suspended to ++version, as this creates large latencies in the overall > review process? > Review of 9pfs is a best effort thing... I usually stop reviewing when I'm fed up or when all the time I could reasonably invest is elapsed. Breaking down the series in smaller patches is the usual improvement for that. > And likewise it would also help if review is prioritized on relevant behaviour > aspects (concept, design) first before arguing about code style details like > variable names or other details invisible to users. > I don't remember questioning the overall concept behind these changes because it looks reasonable enough (even if I would appreciate to be able to verify them with a working reproducer). Even if it is invisible to the users, coding style or how a series is broken down in patches is important for developers. > And finally: compromises. As man power on 9p is very limited, it would make > sense to limit patch sets at a certain extent and agree to postpone certain > non-critical issues to subsequent patches (or sets of), otherwise a patch set > will grow and grow and it will take ages to get forward. > FWIW this patchset was more than 10 patches and now it is only 5 :) The good news is that you'll soon be able to merge things by yourself. I'll help you when I can but I won't be the gating factor anymore. Hurrah ! > > > > 2) basically means moving some logic from the frontend to the > > > > backend, ie. called from v9fs_co_run_in_worker(). This implies > > > > that a very long request (ie. one that would span on many calls > > > > to readdir()) cannot be interrupted by the client anymore, as > > > > opposed to what we have now BTW. > > > > ... most likely to allow clients to interrupt a long request with a > > Tflush. Have you considered this aspect in your changes ? > > In this particular patch set, no, as for readdir this should not be an issue > in practice for anybody. After this patch set is applied, even on huge > directories, the fs driver's duration for readdir would barely be measurable. > In fact the server's latency would always be much higher, yet. > Reproducer ? Numbers ? ;) > But also for the upcoming, planned patch sets (i.e. other request types): That > would be an example where I would ask you to lower you expectations a bit and > that we could agree to postpone such an aspect to a subsequent, separate patch > (or set of). > > > > 3) Complexity of old readdir code is so much bigger compared to the new > > > one > > > > > > such that probability of additional issues is also significantly > > > higher. > > > > > > > > I tend to think that addressing both issues in one "rush" is > > > > as much "error prone". > > > > > > No it's not. The optimized readdir implementation is quantifyable, > > > significantly less complex than the old implementation (i.e. significantly > > > smaller amount of branches and error pathes, determenistic clear > > > separation of thread's task assignments which includes much smaller > > > amount of thread hops). Less complexity and increased determinism > > > consequently means reduced chance of misbehaviours. So that's not a > > > subjective, but rather a quantifyable, proven statement. > > > > > > You can't switch from the old (wrong) concept to the new concept without > > > some minimum amount of changes, which yes are not small, but I don't see > > > any way to make the change set smaller without yet introducing new > > > negative side effects. > > > > > > I am now using words that I heard from your side before: please be > > > realistic. Currently man power on 9p code is extremely limited to put it > > > mildly. Yes, we could spend time on fixing this (theoretical) issue on > > > the old readdir could. But what would it buy? Due to the limited man > > > power we can only move forward with compromises; in this case you are > > > fighting for code that's DOA. The only thing that you achieve by still > > > sticking to the old readdir code is you are preventing that we move > > > forward at all. Remember: I originally sent this patch almost 7 months > > > ago. > > > > > > > > Also: it does not make sense to move locking on this series from fs > > > > > driver > > > > > back to main I/O thread. Atomicity must retain on driver side, not on > > > > > top > > > > > half. > > > > > > > > Then the whole seekdir/rewinddir/telldir/readdir sequence should > > > > be called with a single v9fs_co_run_in_worker() invocation, in > > > > which case the locking could just be something like: > > > > > > > > int coroutine_fn v9fs_co_readdir_many(V9fsPDU *pdu, V9fsFidState *fidp, > > > > > > > > struct V9fsDirEnt **entries, > > > > int32_t maxsize, bool dostat) > > > > > > > > { > > > > > > > > int err = 0; > > > > > > > > if (v9fs_request_cancelled(pdu)) { > > > > > > > > return -EINTR; > > > > > > > > } > > > > > > > > v9fs_readdir_lock(&fidp->fs.dir); > > > > > > > > v9fs_co_run_in_worker({ > > > > > > > > err = do_readdir_many(pdu, fidp, entries, maxsize, dostat); > > > > > > > > }); > > > > > > > > v9fs_readdir_unlock(&fidp->fs.dir); > > > > return err; > > > > > > > > } > > > > > > That's exactly what should be prevented. The lock must be on driver thread > > > side, not on main thread side. The goal is to reduce the time slice of > > > individual locks. If the lock is on main thread, the time slice of a lock > > > (even on huge directories with thousands of entries) is multiple factors > > > larger than if the lock is on driver thread side. So that's not just few > > > percent or so, it is huge. > > > > Sorry I don't get it... In a contention-less situation, which is the > > case we really care for, qemu_co_mutex_lock() has no overhead. > > Yes, it only kicks in if there is concurrency. > And we don't care to preserve performance for silly clients, do we ? > > > > This would technically leave the locking in the main I/O thread, > > > > but it isn't conceptually different from locking at the beginning > > > > of do_readdir_lock() and unlocking before returning. My concern > > > > is that I don't know how coroutine mutexes behave with multiple > > > > worker threads... > > > > > > Well, your Mutex -> CoMutex change was intended to fix a deadlock with the > > > old readdir implementation. That deadlock could not happen with the new > > > readdir implementation, which suggests that it probably makes sense to > > > revert this change (i.e. CoMutex -> Mutex) for this new implementation. > > > > No we can't because it is also used with 9p2000.u, that you said you > > don't want to fix. > > Yes and no, I did not say I don't want to fix readdir for 9p2000.u at all. > What I said was I want to prioritize and concentrate on the most relevant > issues first. 9p2000.L is the most commonly used protocol variant, so I would > like to fix the most severe (e.g. performance) issues for 9p2000.L first > before spending time on 9p2000.u which is AFAICS barely used in comparison, > which I personally don't use at all, and which I am hence not testing in the > same degree and cannot serve with the same quality as 9p2000.L right now. > > Plus if there are really users caring for 9p2000.u, I can gladly assist them > to address these issues for 9p2000.u as well, provided that they help at least > with testing the required 9p2000.u changes. > I would personally do the opposite... again ;) Like you say we essentially care for 9p2000.L and we only do limited support for 9p2000.u. If we have a change that we really want for 9p2000.L but it has an impact on 9p2000.u because of shared code, it is fine to do the changes anyway, including changes to the 9p2000.u specific code. Very basic testing of 9p2000.u (mount/ls or find/umount) or maybe running pjd-fstest is enough IMHO. If we break someone's setup and he complains, then we can ask him to test a fix before we merge it. > Back to the actual topic: so what do we do about the mutex then? CoMutex for > 9p2000.u and Mutex for 9p2000.L? I know you find that ugly, but it would just > be a transitional measure. > That would ruin my day... More seriously, the recent fix for a deadlock condition that was present for years proves that nobody seems to be using silly clients with QEMU. So I think we should just dump the lock and add a one-time warning in the top level handlers when we detect a duplicate readdir request on the same fid. This should be a very simple patch (I can take care of it right away). > Best regards, > Christian Schoenebeck > > Cheers, -- Greg
On Donnerstag, 2. Juli 2020 17:35:00 CEST Greg Kurz wrote: > > > It isn't readdir only, most requests span over multiple v9fs_co_*() > > > calls...> > > Right, I know! And that's actually my root motivation to finally bring > > this > > patch set forward, since I am very aware that this patch set is just a > > small brick in the overall procecss of fixing similarly affected code > > portions. So yes, that's my plan to fix them with upcoming patch sets, > > too. > > > > Having said that, could we probably try to make future reviews a bit more > > efficient at certain aspects? For instance it would help a lot if the > > patch > > set was reviewed entirely, and not stopped at the very first issue spotted > > and then suspended to ++version, as this creates large latencies in the > > overall review process? > > Review of 9pfs is a best effort thing... I usually stop reviewing when I'm > fed up or when all the time I could reasonably invest is elapsed. Breaking > down the series in smaller patches is the usual improvement for that. No need to defend, you do these things voluntarily. I am glad that you spend time on this project at all. I would appreciate though if we could reduce the overall time for a patch set a bit, and my suggestion trying to walk through an entire set before re-posting a new version might indeed bring an improvement without anybody having to invest more time, but rather less. > > And likewise it would also help if review is prioritized on relevant > > behaviour aspects (concept, design) first before arguing about code style > > details like variable names or other details invisible to users. > > I don't remember questioning the overall concept behind these changes > because it looks reasonable enough (even if I would appreciate to be > able to verify them with a working reproducer). What exactly do you mean here with working reproducer? > Even if it is invisible to the users, coding style or how a series is > broken down in patches is important for developers. Dedication for detail beyond a certain fine graded level is luxury, and for that reason it is designated for projects which clearly can afford it. On (sub)projects with low activity and high latency (like this one) it easily leads to frustration, which is obviously contra productive. I don't say don't care about code style, details et al., but caring a bit less on reviews would not hurt, so to say. A bit of (invisible) tolerance avoids friction and stagnancy. > > And finally: compromises. As man power on 9p is very limited, it would > > make > > sense to limit patch sets at a certain extent and agree to postpone > > certain > > non-critical issues to subsequent patches (or sets of), otherwise a patch > > set will grow and grow and it will take ages to get forward. > > FWIW this patchset was more than 10 patches and now it is only 5 :) Hey, no misleading number crunching please. ;-) Three dropped patches were never intended to be merged, they were pure benchmarks. > The good news is that you'll soon be able to merge things by yourself. > I'll help you when I can but I won't be the gating factor anymore. > > Hurrah ! Yep, I'll do my best to pursue your work, I appreciate what you did and that you still volunteer to help! > > > > > 2) basically means moving some logic from the frontend to the > > > > > backend, ie. called from v9fs_co_run_in_worker(). This implies > > > > > that a very long request (ie. one that would span on many calls > > > > > to readdir()) cannot be interrupted by the client anymore, as > > > > > opposed to what we have now BTW. > > > > > > ... most likely to allow clients to interrupt a long request with a > > > Tflush. Have you considered this aspect in your changes ? > > > > In this particular patch set, no, as for readdir this should not be an > > issue in practice for anybody. After this patch set is applied, even on > > huge directories, the fs driver's duration for readdir would barely be > > measurable. In fact the server's latency would always be much higher, > > yet. > > Reproducer ? Numbers ? ;) Well, you have already seen and run the benchmark in this series yourself. You can easily hit several hundred ms server latency, which is very hard to top with any fs driver reading a directory. Ok, maybe with a tape drive you could, but that's pretty much it. :) With a 'normal' system the fs driver would rather block for something between <1ms .. 8ms, i.e. fs driver completes before it would be able to actually see a flush request. But to make it also clear: it would not bite with the new design either. If someone really would see a necessity to abort readdir requests, that would be possible without huge changes. > > > > > Then the whole seekdir/rewinddir/telldir/readdir sequence should > > > > > be called with a single v9fs_co_run_in_worker() invocation, in > > > > > which case the locking could just be something like: > > > > > > > > > > int coroutine_fn v9fs_co_readdir_many(V9fsPDU *pdu, V9fsFidState > > > > > *fidp, > > > > > > > > > > struct V9fsDirEnt **entries, > > > > > int32_t maxsize, bool dostat) > > > > > > > > > > { > > > > > > > > > > int err = 0; > > > > > > > > > > if (v9fs_request_cancelled(pdu)) { > > > > > > > > > > return -EINTR; > > > > > > > > > > } > > > > > > > > > > v9fs_readdir_lock(&fidp->fs.dir); > > > > > > > > > > v9fs_co_run_in_worker({ > > > > > > > > > > err = do_readdir_many(pdu, fidp, entries, maxsize, dostat); > > > > > > > > > > }); > > > > > > > > > > v9fs_readdir_unlock(&fidp->fs.dir); > > > > > return err; > > > > > > > > > > } > > > > > > > > That's exactly what should be prevented. The lock must be on driver > > > > thread > > > > side, not on main thread side. The goal is to reduce the time slice of > > > > individual locks. If the lock is on main thread, the time slice of a > > > > lock > > > > (even on huge directories with thousands of entries) is multiple > > > > factors > > > > larger than if the lock is on driver thread side. So that's not just > > > > few > > > > percent or so, it is huge. > > > > > > Sorry I don't get it... In a contention-less situation, which is the > > > case we really care for, qemu_co_mutex_lock() has no overhead. > > > > Yes, it only kicks in if there is concurrency. > > And we don't care to preserve performance for silly clients, do we ? We don't necessarily need to preserve performance for them, that's right. But it is already handled appropriately for them, so destroying it (i.e. slowing them down) only makes sense if there is a good reason for that. > > > > > This would technically leave the locking in the main I/O thread, > > > > > but it isn't conceptually different from locking at the beginning > > > > > of do_readdir_lock() and unlocking before returning. My concern > > > > > is that I don't know how coroutine mutexes behave with multiple > > > > > worker threads... > > > > > > > > Well, your Mutex -> CoMutex change was intended to fix a deadlock with > > > > the > > > > old readdir implementation. That deadlock could not happen with the > > > > new > > > > readdir implementation, which suggests that it probably makes sense to > > > > revert this change (i.e. CoMutex -> Mutex) for this new > > > > implementation. > > > > > > No we can't because it is also used with 9p2000.u, that you said you > > > don't want to fix. > > > > Yes and no, I did not say I don't want to fix readdir for 9p2000.u at all. > > What I said was I want to prioritize and concentrate on the most relevant > > issues first. 9p2000.L is the most commonly used protocol variant, so I > > would like to fix the most severe (e.g. performance) issues for 9p2000.L > > first before spending time on 9p2000.u which is AFAICS barely used in > > comparison, which I personally don't use at all, and which I am hence not > > testing in the same degree and cannot serve with the same quality as > > 9p2000.L right now. > > > > Plus if there are really users caring for 9p2000.u, I can gladly assist > > them to address these issues for 9p2000.u as well, provided that they > > help at least with testing the required 9p2000.u changes. > > I would personally do the opposite... again ;) > > Like you say we essentially care for 9p2000.L and we only do limited > support for 9p2000.u. If we have a change that we really want for > 9p2000.L but it has an impact on 9p2000.u because of shared code, > it is fine to do the changes anyway, including changes to the 9p2000.u > specific code. Very basic testing of 9p2000.u (mount/ls or find/umount) > or maybe running pjd-fstest is enough IMHO. If we break someone's setup > and he complains, then we can ask him to test a fix before we merge it. Well, so you want to handle the relevant 9p2000.u readdir optimizations by yourself, and you would send it as follow-up patch set (after this set is through), including test cases? > > Back to the actual topic: so what do we do about the mutex then? CoMutex > > for 9p2000.u and Mutex for 9p2000.L? I know you find that ugly, but it > > would just be a transitional measure. > > That would ruin my day... > > More seriously, the recent fix for a deadlock condition that was present > for years proves that nobody seems to be using silly clients with QEMU. > So I think we should just dump the lock and add a one-time warning in > the top level handlers when we detect a duplicate readdir request on > the same fid. This should be a very simple patch (I can take care of > it right away). Looks like we have a consensus! Then I wait for your patch removing the lock, and for your feedback whether or not you see anything else in this patch set? > > Best regards, > > Christian Schoenebeck > > Cheers, > > -- > Greg Best regards, Christian Schoenebeck
On Donnerstag, 2. Juli 2020 19:23:35 CEST Christian Schoenebeck wrote: > > > Back to the actual topic: so what do we do about the mutex then? CoMutex > > > for 9p2000.u and Mutex for 9p2000.L? I know you find that ugly, but it > > > would just be a transitional measure. > > > > That would ruin my day... > > > > More seriously, the recent fix for a deadlock condition that was present > > for years proves that nobody seems to be using silly clients with QEMU. > > So I think we should just dump the lock and add a one-time warning in > > the top level handlers when we detect a duplicate readdir request on > > the same fid. This should be a very simple patch (I can take care of > > it right away). > > Looks like we have a consensus! Then I wait for your patch removing the > lock, and for your feedback whether or not you see anything else in this > patch set? Please wait before you work on this patch. I really do think your patch should be based on/after this optimization patch for one reason: if (and even though it's a big if) somebody comes along with a silly client as you named it, then your patch can simply be reverted, which would not be possible if it's next. So I would really suggest I change this patch here to go the ugly way with 2 mutex types for readdir 9p2000.L vs 9p2000.L, and your patch would get rid of that mess by removing the lock entirely, okay? Best regards, Christian Schoenebeck
On Thu, 02 Jul 2020 19:23:35 +0200 Christian Schoenebeck <qemu_oss@crudebyte.com> wrote: > On Donnerstag, 2. Juli 2020 17:35:00 CEST Greg Kurz wrote: > > > > It isn't readdir only, most requests span over multiple v9fs_co_*() > > > > calls...> > > > Right, I know! And that's actually my root motivation to finally bring > > > this > > > patch set forward, since I am very aware that this patch set is just a > > > small brick in the overall procecss of fixing similarly affected code > > > portions. So yes, that's my plan to fix them with upcoming patch sets, > > > too. > > > > > > Having said that, could we probably try to make future reviews a bit more > > > efficient at certain aspects? For instance it would help a lot if the > > > patch > > > set was reviewed entirely, and not stopped at the very first issue spotted > > > and then suspended to ++version, as this creates large latencies in the > > > overall review process? > > > > Review of 9pfs is a best effort thing... I usually stop reviewing when I'm > > fed up or when all the time I could reasonably invest is elapsed. Breaking > > down the series in smaller patches is the usual improvement for that. > > No need to defend, you do these things voluntarily. I am glad that you spend > time on this project at all. I would appreciate though if we could reduce the > overall time for a patch set a bit, and my suggestion trying to walk through > an entire set before re-posting a new version might indeed bring an > improvement without anybody having to invest more time, but rather less. > I'll try to adjust but I think you should also try to split patches (like you did eventually with the addition of patch 3). > > > And likewise it would also help if review is prioritized on relevant > > > behaviour aspects (concept, design) first before arguing about code style > > > details like variable names or other details invisible to users. > > > > I don't remember questioning the overall concept behind these changes > > because it looks reasonable enough (even if I would appreciate to be > > able to verify them with a working reproducer). > > What exactly do you mean here with working reproducer? > Some test program to be run in a real guest that gets improved performance with this patch set for example. > > Even if it is invisible to the users, coding style or how a series is > > broken down in patches is important for developers. > > Dedication for detail beyond a certain fine graded level is luxury, and for > that reason it is designated for projects which clearly can afford it. On > (sub)projects with low activity and high latency (like this one) it easily > leads to frustration, which is obviously contra productive. > The only way to avoid frustration when things don't go to your pace is to take control, as you decided to do :) > I don't say don't care about code style, details et al., but caring a bit less > on reviews would not hurt, so to say. A bit of (invisible) tolerance avoids > friction and stagnancy. > No, this friction and stagnancy is essentially the result of not sharing the same agenda and different tastes also. > > > And finally: compromises. As man power on 9p is very limited, it would > > > make > > > sense to limit patch sets at a certain extent and agree to postpone > > > certain > > > non-critical issues to subsequent patches (or sets of), otherwise a patch > > > set will grow and grow and it will take ages to get forward. > > > > FWIW this patchset was more than 10 patches and now it is only 5 :) > > Hey, no misleading number crunching please. ;-) Three dropped patches were > never intended to be merged, they were pure benchmarks. > Heh :) > > The good news is that you'll soon be able to merge things by yourself. > > I'll help you when I can but I won't be the gating factor anymore. > > > > Hurrah ! > > Yep, I'll do my best to pursue your work, I appreciate what you did and that > you still volunteer to help! > And if I'm to picky on reviews, feel free to ignore my remarks ;) > > > > > > 2) basically means moving some logic from the frontend to the > > > > > > backend, ie. called from v9fs_co_run_in_worker(). This implies > > > > > > that a very long request (ie. one that would span on many calls > > > > > > to readdir()) cannot be interrupted by the client anymore, as > > > > > > opposed to what we have now BTW. > > > > > > > > ... most likely to allow clients to interrupt a long request with a > > > > Tflush. Have you considered this aspect in your changes ? > > > > > > In this particular patch set, no, as for readdir this should not be an > > > issue in practice for anybody. After this patch set is applied, even on > > > huge directories, the fs driver's duration for readdir would barely be > > > measurable. In fact the server's latency would always be much higher, > > > yet. > > > > Reproducer ? Numbers ? ;) > > Well, you have already seen and run the benchmark in this series yourself. You > can easily hit several hundred ms server latency, which is very hard to top > with any fs driver reading a directory. Ok, maybe with a tape drive you could, > but that's pretty much it. :) With a 'normal' system the fs driver would Or some network file system on an extremely slow connection... > rather block for something between <1ms .. 8ms, i.e. fs driver completes > before it would be able to actually see a flush request. > > But to make it also clear: it would not bite with the new design either. If > someone really would see a necessity to abort readdir requests, that would be > possible without huge changes. > Tflush requests are handled in the top-half exclusively by design, ie. they rely on the worker thread handling the targeted request to yield control back to the main I/O thread. Since this doesn't happen anymore with your patches, I'm not sure how you be able to abort a _many readdir_ request once it's been fired into a worker thread. A possible solution could be to break down a Treaddir into multiple _many readdirs_ jobs, and have a knob or some logic to control the latency ratio. > > > > > > Then the whole seekdir/rewinddir/telldir/readdir sequence should > > > > > > be called with a single v9fs_co_run_in_worker() invocation, in > > > > > > which case the locking could just be something like: > > > > > > > > > > > > int coroutine_fn v9fs_co_readdir_many(V9fsPDU *pdu, V9fsFidState > > > > > > *fidp, > > > > > > > > > > > > struct V9fsDirEnt **entries, > > > > > > int32_t maxsize, bool dostat) > > > > > > > > > > > > { > > > > > > > > > > > > int err = 0; > > > > > > > > > > > > if (v9fs_request_cancelled(pdu)) { > > > > > > > > > > > > return -EINTR; > > > > > > > > > > > > } > > > > > > > > > > > > v9fs_readdir_lock(&fidp->fs.dir); > > > > > > > > > > > > v9fs_co_run_in_worker({ > > > > > > > > > > > > err = do_readdir_many(pdu, fidp, entries, maxsize, dostat); > > > > > > > > > > > > }); > > > > > > > > > > > > v9fs_readdir_unlock(&fidp->fs.dir); > > > > > > return err; > > > > > > > > > > > > } > > > > > > > > > > That's exactly what should be prevented. The lock must be on driver > > > > > thread > > > > > side, not on main thread side. The goal is to reduce the time slice of > > > > > individual locks. If the lock is on main thread, the time slice of a > > > > > lock > > > > > (even on huge directories with thousands of entries) is multiple > > > > > factors > > > > > larger than if the lock is on driver thread side. So that's not just > > > > > few > > > > > percent or so, it is huge. > > > > > > > > Sorry I don't get it... In a contention-less situation, which is the > > > > case we really care for, qemu_co_mutex_lock() has no overhead. > > > > > > Yes, it only kicks in if there is concurrency. > > > > And we don't care to preserve performance for silly clients, do we ? > > We don't necessarily need to preserve performance for them, that's right. But > it is already handled appropriately for them, so destroying it (i.e. slowing > them down) only makes sense if there is a good reason for that. > Ending up with a mix of two different kind of mutexes for 9p2000.L and .u is a good enough reason for me. > > > > > > This would technically leave the locking in the main I/O thread, > > > > > > but it isn't conceptually different from locking at the beginning > > > > > > of do_readdir_lock() and unlocking before returning. My concern > > > > > > is that I don't know how coroutine mutexes behave with multiple > > > > > > worker threads... > > > > > > > > > > Well, your Mutex -> CoMutex change was intended to fix a deadlock with > > > > > the > > > > > old readdir implementation. That deadlock could not happen with the > > > > > new > > > > > readdir implementation, which suggests that it probably makes sense to > > > > > revert this change (i.e. CoMutex -> Mutex) for this new > > > > > implementation. > > > > > > > > No we can't because it is also used with 9p2000.u, that you said you > > > > don't want to fix. > > > > > > Yes and no, I did not say I don't want to fix readdir for 9p2000.u at all. > > > What I said was I want to prioritize and concentrate on the most relevant > > > issues first. 9p2000.L is the most commonly used protocol variant, so I > > > would like to fix the most severe (e.g. performance) issues for 9p2000.L > > > first before spending time on 9p2000.u which is AFAICS barely used in > > > comparison, which I personally don't use at all, and which I am hence not > > > testing in the same degree and cannot serve with the same quality as > > > 9p2000.L right now. > > > > > > Plus if there are really users caring for 9p2000.u, I can gladly assist > > > them to address these issues for 9p2000.u as well, provided that they > > > help at least with testing the required 9p2000.u changes. > > > > I would personally do the opposite... again ;) > > > > Like you say we essentially care for 9p2000.L and we only do limited > > support for 9p2000.u. If we have a change that we really want for > > 9p2000.L but it has an impact on 9p2000.u because of shared code, > > it is fine to do the changes anyway, including changes to the 9p2000.u > > specific code. Very basic testing of 9p2000.u (mount/ls or find/umount) > > or maybe running pjd-fstest is enough IMHO. If we break someone's setup > > and he complains, then we can ask him to test a fix before we merge it. > > Well, so you want to handle the relevant 9p2000.u readdir optimizations by > yourself, and you would send it as follow-up patch set (after this set is > through), including test cases? > Ah it wasn't my point. I was just saying that any valuable change for 9p2000.L prevails and if you have to change some common code (eg. locking) that could impact the 9p2000.u experience, you can do it anyway, even if you only do smoke testing with 9p2000.u. > > > Back to the actual topic: so what do we do about the mutex then? CoMutex > > > for 9p2000.u and Mutex for 9p2000.L? I know you find that ugly, but it > > > would just be a transitional measure. > > > > That would ruin my day... > > > > More seriously, the recent fix for a deadlock condition that was present > > for years proves that nobody seems to be using silly clients with QEMU. > > So I think we should just dump the lock and add a one-time warning in > > the top level handlers when we detect a duplicate readdir request on > > the same fid. This should be a very simple patch (I can take care of > > it right away). > > Looks like we have a consensus! Then I wait for your patch removing the lock, > and for your feedback whether or not you see anything else in this patch set? > I was ready to do that but now I'm reading you other mail... I'll continue the discussion there. > > > Best regards, > > > Christian Schoenebeck > > > > Cheers, > > > > -- > > Greg > > Best regards, > Christian Schoenebeck > >
On Fri, 03 Jul 2020 10:08:09 +0200 Christian Schoenebeck <qemu_oss@crudebyte.com> wrote: > On Donnerstag, 2. Juli 2020 19:23:35 CEST Christian Schoenebeck wrote: > > > > Back to the actual topic: so what do we do about the mutex then? CoMutex > > > > for 9p2000.u and Mutex for 9p2000.L? I know you find that ugly, but it > > > > would just be a transitional measure. > > > > > > That would ruin my day... > > > > > > More seriously, the recent fix for a deadlock condition that was present > > > for years proves that nobody seems to be using silly clients with QEMU. > > > So I think we should just dump the lock and add a one-time warning in > > > the top level handlers when we detect a duplicate readdir request on > > > the same fid. This should be a very simple patch (I can take care of > > > it right away). > > > > Looks like we have a consensus! Then I wait for your patch removing the > > lock, and for your feedback whether or not you see anything else in this > > patch set? > > Please wait before you work on this patch. I really do think your patch should > be based on/after this optimization patch for one reason: if (and even though > it's a big if) somebody comes along with a silly client as you named it, then > your patch can simply be reverted, which would not be possible if it's next. > > So I would really suggest I change this patch here to go the ugly way with 2 > mutex types for readdir 9p2000.L vs 9p2000.L, and your patch would get rid of > that mess by removing the lock entirely, okay? > If someones ever comes along with a silly client, she will get warnings explaining that she might get silly results. If it turns out that we really need to support that for valid reasons, it will be okay to focus on the appropriate fix when the time comes, not now. So I don't say any real benefit to postponing the removal of the lock after your series, but I do at least see one benefit, it will reduce the level of noise. > Best regards, > Christian Schoenebeck > >
On Freitag, 3. Juli 2020 17:53:15 CEST Greg Kurz wrote: > > > I don't remember questioning the overall concept behind these changes > > > because it looks reasonable enough (even if I would appreciate to be > > > able to verify them with a working reproducer). > > > > What exactly do you mean here with working reproducer? > > Some test program to be run in a real guest that gets improved performance > with this patch set for example. Mhm, for now the benchmark patches already provided should be proof enough that this optimization will bring significant improvements in the end. Like mentioned on a side note; to actually feel the difference with a real Linux client, a kernel patch is also required for the guest. But I definitely don't start dealing with LKML before the ground is laid on QEMU side (i.e. this patch set is merged on master). > > I don't say don't care about code style, details et al., but caring a bit > > less on reviews would not hurt, so to say. A bit of (invisible) tolerance > > avoids friction and stagnancy. > > No, this friction and stagnancy is essentially the result of not sharing > the same agenda and different tastes also. That summary is a bit too simple, and you know that. But okay, let's leave it that way. > > rather block for something between <1ms .. 8ms, i.e. fs driver completes > > before it would be able to actually see a flush request. > > > > But to make it also clear: it would not bite with the new design either. > > If > > someone really would see a necessity to abort readdir requests, that would > > be possible without huge changes. > > Tflush requests are handled in the top-half exclusively by design, ie. > they rely on the worker thread handling the targeted request to yield > control back to the main I/O thread. Since this doesn't happen anymore > with your patches, I'm not sure how you be able to abort a _many readdir_ > request once it's been fired into a worker thread. > > A possible solution could be to break down a Treaddir into multiple > _many readdirs_ jobs, and have a knob or some logic to control the > latency ratio. Too complicated, there is no need to break it down into subtasks or something. It can be handled with a simple abort condition in readdir_many()'s loop. But that's not an issue for now to discuss the details. > > > > > > > Then the whole seekdir/rewinddir/telldir/readdir sequence should > > > > > > > be called with a single v9fs_co_run_in_worker() invocation, in > > > > > > > which case the locking could just be something like: > > > > > > > > > > > > > > int coroutine_fn v9fs_co_readdir_many(V9fsPDU *pdu, V9fsFidState > > > > > > > *fidp, > > > > > > > > > > > > > > struct V9fsDirEnt > > > > > > > **entries, > > > > > > > int32_t maxsize, bool > > > > > > > dostat) > > > > > > > > > > > > > > { > > > > > > > > > > > > > > int err = 0; > > > > > > > > > > > > > > if (v9fs_request_cancelled(pdu)) { > > > > > > > > > > > > > > return -EINTR; > > > > > > > > > > > > > > } > > > > > > > > > > > > > > v9fs_readdir_lock(&fidp->fs.dir); > > > > > > > > > > > > > > v9fs_co_run_in_worker({ > > > > > > > > > > > > > > err = do_readdir_many(pdu, fidp, entries, maxsize, > > > > > > > dostat); > > > > > > > > > > > > > > }); > > > > > > > > > > > > > > v9fs_readdir_unlock(&fidp->fs.dir); > > > > > > > return err; > > > > > > > > > > > > > > } > > > > > > > > > > > > That's exactly what should be prevented. The lock must be on > > > > > > driver > > > > > > thread > > > > > > side, not on main thread side. The goal is to reduce the time > > > > > > slice of > > > > > > individual locks. If the lock is on main thread, the time slice of > > > > > > a > > > > > > lock > > > > > > (even on huge directories with thousands of entries) is multiple > > > > > > factors > > > > > > larger than if the lock is on driver thread side. So that's not > > > > > > just > > > > > > few > > > > > > percent or so, it is huge. > > > > > > > > > > Sorry I don't get it... In a contention-less situation, which is > > > > > the > > > > > case we really care for, qemu_co_mutex_lock() has no overhead. > > > > > > > > Yes, it only kicks in if there is concurrency. > > > > > > And we don't care to preserve performance for silly clients, do we ? > > > > We don't necessarily need to preserve performance for them, that's right. > > But it is already handled appropriately for them, so destroying it (i.e. > > slowing them down) only makes sense if there is a good reason for that. > > Ending up with a mix of two different kind of mutexes for 9p2000.L and .u is > a good enough reason for me. Correct behaviour always has precedence before code aesthetics, but we already agreed to remove the lock entirely anyway, so let's move on. > > > > > > > This would technically leave the locking in the main I/O thread, > > > > > > > but it isn't conceptually different from locking at the > > > > > > > beginning > > > > > > > of do_readdir_lock() and unlocking before returning. My concern > > > > > > > is that I don't know how coroutine mutexes behave with multiple > > > > > > > worker threads... > > > > > > > > > > > > Well, your Mutex -> CoMutex change was intended to fix a deadlock > > > > > > with > > > > > > the > > > > > > old readdir implementation. That deadlock could not happen with > > > > > > the > > > > > > new > > > > > > readdir implementation, which suggests that it probably makes > > > > > > sense to > > > > > > revert this change (i.e. CoMutex -> Mutex) for this new > > > > > > implementation. > > > > > > > > > > No we can't because it is also used with 9p2000.u, that you said you > > > > > don't want to fix. > > > > > > > > Yes and no, I did not say I don't want to fix readdir for 9p2000.u at > > > > all. > > > > What I said was I want to prioritize and concentrate on the most > > > > relevant > > > > issues first. 9p2000.L is the most commonly used protocol variant, so > > > > I > > > > would like to fix the most severe (e.g. performance) issues for > > > > 9p2000.L > > > > first before spending time on 9p2000.u which is AFAICS barely used in > > > > comparison, which I personally don't use at all, and which I am hence > > > > not > > > > testing in the same degree and cannot serve with the same quality as > > > > 9p2000.L right now. > > > > > > > > Plus if there are really users caring for 9p2000.u, I can gladly > > > > assist > > > > them to address these issues for 9p2000.u as well, provided that they > > > > help at least with testing the required 9p2000.u changes. > > > > > > I would personally do the opposite... again ;) > > > > > > Like you say we essentially care for 9p2000.L and we only do limited > > > support for 9p2000.u. If we have a change that we really want for > > > 9p2000.L but it has an impact on 9p2000.u because of shared code, > > > it is fine to do the changes anyway, including changes to the 9p2000.u > > > specific code. Very basic testing of 9p2000.u (mount/ls or find/umount) > > > or maybe running pjd-fstest is enough IMHO. If we break someone's setup > > > and he complains, then we can ask him to test a fix before we merge it. > > > > Well, so you want to handle the relevant 9p2000.u readdir optimizations by > > yourself, and you would send it as follow-up patch set (after this set is > > through), including test cases? > > Ah it wasn't my point. I was just saying that any valuable change for > 9p2000.L prevails and if you have to change some common code (eg. > locking) that could impact the 9p2000.u experience, you can do it > anyway, even if you only do smoke testing with 9p2000.u. Ah I see, so you would. But you don't. ;-) And yes, I could. But I won't either, for the following reasons: 1. I would have to write readdir test cases for 9p2000.u as well, because the 9p2000.L tests are incompatible. -> My time 2. I have to change the 9p2000.u server code -> My time 3. This patch grows substantially, by both lines and amount of patches -> My time and probably +10 versions more of this patch series. 4. I would have to do at least some basic testing of 9p2000.u behaviour -> My time All these things would bring me +x months farther away from reaching my actual goal: making 9p useable by Linux clients. So no, I don't waste time on 9p2000.u optimizations before I see somebody actually caring for 9p2000.u efficiency. Best regards, Christian Schoenebeck
On Freitag, 3. Juli 2020 18:08:21 CEST Greg Kurz wrote: > On Fri, 03 Jul 2020 10:08:09 +0200 > > Christian Schoenebeck <qemu_oss@crudebyte.com> wrote: > > On Donnerstag, 2. Juli 2020 19:23:35 CEST Christian Schoenebeck wrote: > > > > > Back to the actual topic: so what do we do about the mutex then? > > > > > CoMutex > > > > > for 9p2000.u and Mutex for 9p2000.L? I know you find that ugly, but > > > > > it > > > > > would just be a transitional measure. > > > > > > > > That would ruin my day... > > > > > > > > More seriously, the recent fix for a deadlock condition that was > > > > present > > > > for years proves that nobody seems to be using silly clients with > > > > QEMU. > > > > So I think we should just dump the lock and add a one-time warning in > > > > the top level handlers when we detect a duplicate readdir request on > > > > the same fid. This should be a very simple patch (I can take care of > > > > it right away). > > > > > > Looks like we have a consensus! Then I wait for your patch removing the > > > lock, and for your feedback whether or not you see anything else in this > > > patch set? > > > > Please wait before you work on this patch. I really do think your patch > > should be based on/after this optimization patch for one reason: if (and > > even though it's a big if) somebody comes along with a silly client as > > you named it, then your patch can simply be reverted, which would not be > > possible if it's next. > > > > So I would really suggest I change this patch here to go the ugly way with > > 2 mutex types for readdir 9p2000.L vs 9p2000.L, and your patch would get > > rid of that mess by removing the lock entirely, okay? > > If someones ever comes along with a silly client, she will get warnings > explaining that she might get silly results. If it turns out that we > really need to support that for valid reasons, it will be okay to focus > on the appropriate fix when the time comes, not now. So I don't say any > real benefit to postponing the removal of the lock after your series, but > I do at least see one benefit, it will reduce the level of noise. Reasons: 1. Less work for me, as I would have more conflicts to resolve manually if your patch would come next. 2. The dual mutex change is just like 3 lines of code more -> trivial (and my problem) 3. If somebody complains, I just have to revert your patch. 4. For you, it does neither mean more time nor more efforts, as you haven't started to write the patch yet. 5. The actual outcome from your side is the same: you get what you want, the lock will be gone entirely after all anyway. and most notably: 6. I don't see any advantage if your patch would come next. Best regards, Christian Schoenebeck
diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index 43584aca41..8283a3cfbb 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -971,30 +971,6 @@ static int coroutine_fn fid_to_qid(V9fsPDU *pdu, V9fsFidState *fidp, return 0; } -static int coroutine_fn dirent_to_qid(V9fsPDU *pdu, V9fsFidState *fidp, - struct dirent *dent, V9fsQID *qidp) -{ - struct stat stbuf; - V9fsPath path; - int err; - - v9fs_path_init(&path); - - err = v9fs_co_name_to_path(pdu, &fidp->path, dent->d_name, &path); - if (err < 0) { - goto out; - } - err = v9fs_co_lstat(pdu, &path, &stbuf); - if (err < 0) { - goto out; - } - err = stat_to_qid(pdu, &stbuf, qidp); - -out: - v9fs_path_free(&path); - return err; -} - V9fsPDU *pdu_alloc(V9fsState *s) { V9fsPDU *pdu = NULL; @@ -2337,6 +2313,18 @@ size_t v9fs_readdir_response_size(V9fsString *name) return 24 + v9fs_string_size(name); } +static void v9fs_free_dirents(struct V9fsDirEnt *e) +{ + struct V9fsDirEnt *next = NULL; + + for (; e; e = next) { + next = e->next; + g_free(e->dent); + g_free(e->st); + g_free(e); + } +} + static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, V9fsFidState *fidp, int32_t max_count) { @@ -2345,54 +2333,53 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, V9fsFidState *fidp, V9fsString name; int len, err = 0; int32_t count = 0; - off_t saved_dir_pos; struct dirent *dent; + struct stat *st; + struct V9fsDirEnt *entries = NULL; - /* save the directory position */ - saved_dir_pos = v9fs_co_telldir(pdu, fidp); - if (saved_dir_pos < 0) { - return saved_dir_pos; - } - - while (1) { - v9fs_readdir_lock(&fidp->fs.dir); + /* + * inode remapping requires the device id, which in turn might be + * different for different directory entries, so if inode remapping is + * enabled we have to make a full stat for each directory entry + */ + const bool dostat = pdu->s->ctx.export_flags & V9FS_REMAP_INODES; - err = v9fs_co_readdir(pdu, fidp, &dent); - if (err || !dent) { - break; - } - v9fs_string_init(&name); - v9fs_string_sprintf(&name, "%s", dent->d_name); - if ((count + v9fs_readdir_response_size(&name)) > max_count) { - v9fs_readdir_unlock(&fidp->fs.dir); + /* + * Fetch all required directory entries altogether on a background IO + * thread from fs driver. We don't want to do that for each entry + * individually, because hopping between threads (this main IO thread + * and background IO driver thread) would sum up to huge latencies. + */ + count = v9fs_co_readdir_many(pdu, fidp, &entries, max_count, dostat); + if (count < 0) { + err = count; + count = 0; + goto out; + } + count = 0; - /* Ran out of buffer. Set dir back to old position and return */ - v9fs_co_seekdir(pdu, fidp, saved_dir_pos); - v9fs_string_free(&name); - return count; - } + for (struct V9fsDirEnt *e = entries; e; e = e->next) { + dent = e->dent; if (pdu->s->ctx.export_flags & V9FS_REMAP_INODES) { - /* - * dirent_to_qid() implies expensive stat call for each entry, - * we must do that here though since inode remapping requires - * the device id, which in turn might be different for - * different entries; we cannot make any assumption to avoid - * that here. - */ - err = dirent_to_qid(pdu, fidp, dent, &qid); + st = e->st; + /* e->st should never be NULL, but just to be sure */ + if (!st) { + err = -1; + break; + } + + /* remap inode */ + err = stat_to_qid(pdu, st, &qid); if (err < 0) { - v9fs_readdir_unlock(&fidp->fs.dir); - v9fs_co_seekdir(pdu, fidp, saved_dir_pos); - v9fs_string_free(&name); - return err; + break; } } else { /* * Fill up just the path field of qid because the client uses * only that. To fill the entire qid structure we will have * to stat each dirent found, which is expensive. For the - * latter reason we don't call dirent_to_qid() here. Only drawback + * latter reason we don't call stat_to_qid() here. Only drawback * is that no multi-device export detection of stat_to_qid() * would be done and provided as error to the user here. But * user would get that error anyway when accessing those @@ -2405,25 +2392,26 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, V9fsFidState *fidp, qid.version = 0; } + v9fs_string_init(&name); + v9fs_string_sprintf(&name, "%s", dent->d_name); + /* 11 = 7 + 4 (7 = start offset, 4 = space for storing count) */ len = pdu_marshal(pdu, 11 + count, "Qqbs", &qid, dent->d_off, dent->d_type, &name); - v9fs_readdir_unlock(&fidp->fs.dir); + v9fs_string_free(&name); if (len < 0) { - v9fs_co_seekdir(pdu, fidp, saved_dir_pos); - v9fs_string_free(&name); - return len; + err = len; + break; } + count += len; - v9fs_string_free(&name); - saved_dir_pos = dent->d_off; } - v9fs_readdir_unlock(&fidp->fs.dir); - +out: + v9fs_free_dirents(entries); if (err < 0) { return err; }
Make top half really top half and bottom half really bottom half: Each T_readdir request handling is hopping between threads (main I/O thread and background I/O driver threads) several times for every individual directory entry, which sums up to huge latencies for handling just a single T_readdir request. Instead of doing that, collect now all required directory entries (including all potentially required stat buffers for each entry) in one rush on a background I/O thread from fs driver by calling the previously added function v9fs_co_readdir_many() instead of v9fs_co_readdir(), then assemble the entire resulting network response message for the readdir request on main I/O thread. The fs driver is still aborting the directory entry retrieval loop (on the background I/O thread inside of v9fs_co_readdir_many()) as soon as it would exceed the client's requested maximum R_readdir response size. So this will not introduce a performance penalty on another end. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> --- hw/9pfs/9p.c | 122 +++++++++++++++++++++++---------------------------- 1 file changed, 55 insertions(+), 67 deletions(-)