Message ID | 158826201391.1344781.9403916162733181811.stgit@bahia.lan (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | 9pfs: Fix potential deadlock of QEMU mainloop | expand |
On Donnerstag, 30. April 2020 17:53:34 CEST Greg Kurz wrote: > Locking was introduced to address the deprecation of readdir_r(3) > in glibc 2.24. It turns out that the frontend code is the worst > place to handle a critical section with a pthread mutex: the code > runs in a coroutine on behalf of the QEMU mainloop and then yields > control, waiting for the fsdev backend to process the request in a > worker thread. If the client resends another readdir request for the > same fid before the previous one finally unlocked the mutex, we're > deadlocked. This never bit us because the linux client serializes > readdir requests for the same fid, but the small qtest in this patch > demonstrates how easy it would be for a guest to hang QEMU. > > A solution could be to use CoMutex instead of QemuMutex, but it seems > better to narrow the critical section to the worker thread code and to > return a copy of the dirent to the frontend. Ok, but why not both? Moving locks to worker thread and QemuMutex -> CoMutex? > Signed-off-by: Greg Kurz <groug@kaod.org> > --- > hw/9pfs/9p-synth.c | 12 +++++++++++ > hw/9pfs/9p-synth.h | 2 ++ > hw/9pfs/9p.c | 45 > +++++++++++++----------------------------- hw/9pfs/codir.c | > 11 ++++++++-- > hw/9pfs/coth.h | 3 ++- > tests/qtest/virtio-9p-test.c | 33 +++++++++++++++++++++++++++++++ > 6 files changed, 72 insertions(+), 34 deletions(-) > > diff --git a/hw/9pfs/9p-synth.c b/hw/9pfs/9p-synth.c > index 7eb210ffa89b..fb0741569bcd 100644 > --- a/hw/9pfs/9p-synth.c > +++ b/hw/9pfs/9p-synth.c > @@ -256,6 +256,13 @@ static struct dirent *synth_readdir(FsContext *ctx, > V9fsFidOpenState *fs) if (entry) { > synth_open->offset++; > } > + > + if (!strcmp(node->name, QTEST_V9FS_SYNTH_READDIR_DEADLOCK_DIR)) { > + while (synth_open->offset != > QTEST_V9FS_SYNTH_READDIR_DEADLOCK_OFFSET) { + sched_yield(); > + } > + } > + > return entry; > } The while loop condition would deserve a short comment here of what it's about. > @@ -596,6 +603,11 @@ static int synth_init(FsContext *ctx, Error **errp) > assert(!ret); > g_free(name); > } > + > + ret = qemu_v9fs_synth_mkdir( > + NULL, 0700, QTEST_V9FS_SYNTH_READDIR_DEADLOCK_DIR, &dir > + ); > + assert(!ret); > } > } > > diff --git a/hw/9pfs/9p-synth.h b/hw/9pfs/9p-synth.h > index 036d7e4a5b1c..c9a65b61ed45 100644 > --- a/hw/9pfs/9p-synth.h > +++ b/hw/9pfs/9p-synth.h > @@ -59,6 +59,8 @@ int qemu_v9fs_synth_add_file(V9fsSynthNode *parent, int > mode, #define QTEST_V9FS_SYNTH_READDIR_DIR "ReadDirDir" > #define QTEST_V9FS_SYNTH_READDIR_FILE "ReadDirFile%d" > #define QTEST_V9FS_SYNTH_READDIR_NFILES 100 > +#define QTEST_V9FS_SYNTH_READDIR_DEADLOCK_DIR "ReadDirDeadLock" > +#define QTEST_V9FS_SYNTH_READDIR_DEADLOCK_OFFSET 3 > > /* Any write to the "FLUSH" file is handled one byte at a time by the > * backend. If the byte is zero, the backend returns success (ie, 1), > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c > index 9e046f7acb51..ac84ae804496 100644 > --- a/hw/9pfs/9p.c > +++ b/hw/9pfs/9p.c > @@ -2170,7 +2170,7 @@ static int coroutine_fn > v9fs_do_readdir_with_stat(V9fsPDU *pdu, int32_t count = 0; > struct stat stbuf; > off_t saved_dir_pos; > - struct dirent *dent; > + struct dirent dent; > > /* save the directory position */ > saved_dir_pos = v9fs_co_telldir(pdu, fidp); > @@ -2181,13 +2181,11 @@ static int coroutine_fn > v9fs_do_readdir_with_stat(V9fsPDU *pdu, while (1) { > v9fs_path_init(&path); > > - v9fs_readdir_lock(&fidp->fs.dir); > - That's the deadlock fix, but ... > err = v9fs_co_readdir(pdu, fidp, &dent); > - if (err || !dent) { > + if (err <= 0) { > break; > } ... even though this code simplification might make sense, I don't think it should be mixed with the deadlock fix together in one patch. They are not related with each other, nor is the code simplification you are aiming trivial enough to justify squashing. The deadlock fix should make it through the stable branches, while the code simplification should not. So that's better off as a separate cleanup patch. Plus the commit log does not even mention these cleanup changes. > - err = v9fs_co_name_to_path(pdu, &fidp->path, dent->d_name, &path); > + err = v9fs_co_name_to_path(pdu, &fidp->path, dent.d_name, &path); > if (err < 0) { > break; > } > @@ -2195,13 +2193,11 @@ static int coroutine_fn > v9fs_do_readdir_with_stat(V9fsPDU *pdu, if (err < 0) { > break; > } > - err = stat_to_v9stat(pdu, &path, dent->d_name, &stbuf, &v9stat); > + err = stat_to_v9stat(pdu, &path, dent.d_name, &stbuf, &v9stat); > if (err < 0) { > break; > } > if ((count + v9stat.size + 2) > max_count) { > - v9fs_readdir_unlock(&fidp->fs.dir); > - > /* Ran out of buffer. Set dir back to old position and return > */ v9fs_co_seekdir(pdu, fidp, saved_dir_pos); > v9fs_stat_free(&v9stat); > @@ -2212,8 +2208,6 @@ static int coroutine_fn > v9fs_do_readdir_with_stat(V9fsPDU *pdu, /* 11 = 7 + 4 (7 = start offset, 4 > = space for storing count) */ len = pdu_marshal(pdu, 11 + count, "S", > &v9stat); > > - v9fs_readdir_unlock(&fidp->fs.dir); > - > if (len < 0) { > v9fs_co_seekdir(pdu, fidp, saved_dir_pos); > v9fs_stat_free(&v9stat); > @@ -2223,11 +2217,9 @@ static int coroutine_fn > v9fs_do_readdir_with_stat(V9fsPDU *pdu, count += len; > v9fs_stat_free(&v9stat); > v9fs_path_free(&path); > - saved_dir_pos = dent->d_off; > + saved_dir_pos = dent.d_off; > } > > - v9fs_readdir_unlock(&fidp->fs.dir); > - > v9fs_path_free(&path); > if (err < 0) { > return err; > @@ -2340,7 +2332,7 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, > V9fsFidState *fidp, int len, err = 0; > int32_t count = 0; > off_t saved_dir_pos; > - struct dirent *dent; > + struct dirent dent; > > /* save the directory position */ > saved_dir_pos = v9fs_co_telldir(pdu, fidp); > @@ -2349,17 +2341,13 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU > *pdu, V9fsFidState *fidp, } > > while (1) { > - v9fs_readdir_lock(&fidp->fs.dir); > - > err = v9fs_co_readdir(pdu, fidp, &dent); > - if (err || !dent) { > + if (err <= 0) { > break; > } > v9fs_string_init(&name); > - v9fs_string_sprintf(&name, "%s", dent->d_name); > + v9fs_string_sprintf(&name, "%s", dent.d_name); > if ((count + v9fs_readdir_data_size(&name)) > max_count) { > - v9fs_readdir_unlock(&fidp->fs.dir); > - > /* Ran out of buffer. Set dir back to old position and return > */ v9fs_co_seekdir(pdu, fidp, saved_dir_pos); > v9fs_string_free(&name); > @@ -2374,9 +2362,8 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, > V9fsFidState *fidp, * different entries; we cannot make any assumption to > avoid * that here. > */ > - err = dirent_to_qid(pdu, fidp, dent, &qid); > + err = dirent_to_qid(pdu, fidp, &dent, &qid); > if (err < 0) { > - v9fs_readdir_unlock(&fidp->fs.dir); > v9fs_co_seekdir(pdu, fidp, saved_dir_pos); > v9fs_string_free(&name); > return err; > @@ -2392,8 +2379,8 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, > V9fsFidState *fidp, * user would get that error anyway when accessing those > * files/dirs through other ways. > */ > - size = MIN(sizeof(dent->d_ino), sizeof(qid.path)); > - memcpy(&qid.path, &dent->d_ino, size); > + size = MIN(sizeof(dent.d_ino), sizeof(qid.path)); > + memcpy(&qid.path, &dent.d_ino, size); > /* Fill the other fields with dummy values */ > qid.type = 0; > qid.version = 0; > @@ -2401,10 +2388,8 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, > V9fsFidState *fidp, > > /* 11 = 7 + 4 (7 = start offset, 4 = space for storing count) */ > len = pdu_marshal(pdu, 11 + count, "Qqbs", > - &qid, dent->d_off, > - dent->d_type, &name); > - > - v9fs_readdir_unlock(&fidp->fs.dir); > + &qid, dent.d_off, > + dent.d_type, &name); > > if (len < 0) { > v9fs_co_seekdir(pdu, fidp, saved_dir_pos); > @@ -2413,11 +2398,9 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, > V9fsFidState *fidp, } > count += len; > v9fs_string_free(&name); > - saved_dir_pos = dent->d_off; > + saved_dir_pos = dent.d_off; > } > > - v9fs_readdir_unlock(&fidp->fs.dir); > - > if (err < 0) { > return err; > } > diff --git a/hw/9pfs/codir.c b/hw/9pfs/codir.c > index 73f9a751e178..90356dc60140 100644 > --- a/hw/9pfs/codir.c > +++ b/hw/9pfs/codir.c > @@ -19,7 +19,7 @@ > #include "coth.h" > > int coroutine_fn v9fs_co_readdir(V9fsPDU *pdu, V9fsFidState *fidp, > - struct dirent **dent) > + struct dirent *dent) > { > int err; > V9fsState *s = pdu->s; > @@ -32,13 +32,20 @@ int coroutine_fn v9fs_co_readdir(V9fsPDU *pdu, > V9fsFidState *fidp, struct dirent *entry; > > errno = 0; > + > + v9fs_readdir_lock(&fidp->fs.dir); > + > entry = s->ops->readdir(&s->ctx, &fidp->fs); > if (!entry && errno) { > err = -errno; > + } else if (entry) { > + memcpy(dent, entry, sizeof(*dent)); > + err = 1; I find using sizeof(*dent) a bit dangerous considering potential type changes in future. I would rather use sizeof(struct dirent). It is also more human friendly to read IMO. > } else { > - *dent = entry; > err = 0; > } > + > + v9fs_readdir_unlock(&fidp->fs.dir); > }); > return err; > } > diff --git a/hw/9pfs/coth.h b/hw/9pfs/coth.h > index c2cdc7a9eaac..696c90111261 100644 > --- a/hw/9pfs/coth.h > +++ b/hw/9pfs/coth.h > @@ -48,7 +48,8 @@ > > void co_run_in_worker_bh(void *); > int coroutine_fn v9fs_co_readlink(V9fsPDU *, V9fsPath *, V9fsString *); > -int coroutine_fn v9fs_co_readdir(V9fsPDU *, V9fsFidState *, struct dirent > **); +/* Returns < 0 on error, 1 on success and 0 on end of directory */ > +int coroutine_fn v9fs_co_readdir(V9fsPDU *, V9fsFidState *, struct dirent > *); off_t coroutine_fn v9fs_co_telldir(V9fsPDU *, V9fsFidState *); > void coroutine_fn v9fs_co_seekdir(V9fsPDU *, V9fsFidState *, off_t); > void coroutine_fn v9fs_co_rewinddir(V9fsPDU *, V9fsFidState *); That API comment should better go into the .c file as a multi-line comment instead of squeezing it as single-line comment into the header file here. It would also make sense either defining macros, or enums as those 2 possible succesful return types of that function instead of simply using hard coded values (0 and 1). > diff --git a/tests/qtest/virtio-9p-test.c b/tests/qtest/virtio-9p-test.c > index 2167322985ec..f46ad888daa6 100644 > --- a/tests/qtest/virtio-9p-test.c > +++ b/tests/qtest/virtio-9p-test.c > @@ -793,6 +793,38 @@ static void fs_flush_ignored(void *obj, void *data, > QGuestAllocator *t_alloc) g_free(wnames[0]); > } > > +static void fs_readdir_deadlock(void *obj, void *data, QGuestAllocator > *t_alloc) +{ > + QVirtio9P *v9p = obj; > + alloc = t_alloc; > + char *const wnames[] = { > g_strdup(QTEST_V9FS_SYNTH_READDIR_DEADLOCK_DIR) }; + P9Req *req, *req2; > + > + fs_attach(v9p, NULL, t_alloc); > + req = v9fs_twalk(v9p, 0, 1, 1, wnames, 0); > + v9fs_req_wait_for_reply(req, NULL); > + v9fs_rwalk(req, NULL, NULL); > + > + req = v9fs_tlopen(v9p, 1, O_WRONLY, 0); > + v9fs_req_wait_for_reply(req, NULL); > + v9fs_rlopen(req, NULL, NULL); > + > + /* > + * The first request will remain pending in the backend until > + * a request with the magic offset is received. This deadlocks > + * the mainloop of an older QEMU that still handles the critical > + * section around readdir() in the frontend code. > + */ > + req = v9fs_treaddir(v9p, 1, 0, 128, 0); > + req2 = v9fs_treaddir(v9p, 1, QTEST_V9FS_SYNTH_READDIR_DEADLOCK_OFFSET, > 128, + 1); > + v9fs_req_wait_for_reply(req, NULL); > + v9fs_req_free(req); > + v9fs_req_free(req2); > + > + g_free(wnames[0]); > +} > + > static void register_virtio_9p_test(void) > { > qos_add_test("config", "virtio-9p", pci_config, NULL); > @@ -810,6 +842,7 @@ static void register_virtio_9p_test(void) > qos_add_test("fs/flush/ignored", "virtio-9p", fs_flush_ignored, > NULL); > qos_add_test("fs/readdir/basic", "virtio-9p", fs_readdir, NULL); > + qos_add_test("fs/readdir/deadlock", "virtio-9p", fs_readdir_deadlock, > NULL); } > > libqos_init(register_virtio_9p_test); Best regards, Christian Schoenebeck
On Mittwoch, 6. Mai 2020 15:05:23 CEST Christian Schoenebeck wrote: > > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c > > index 9e046f7acb51..ac84ae804496 100644 > > --- a/hw/9pfs/9p.c > > +++ b/hw/9pfs/9p.c > > @@ -2170,7 +2170,7 @@ static int coroutine_fn > > v9fs_do_readdir_with_stat(V9fsPDU *pdu, int32_t count = 0; > > > > struct stat stbuf; > > off_t saved_dir_pos; > > > > - struct dirent *dent; > > + struct dirent dent; One more: since this dirent structure is now on the stack, it should better be initialized for safety reasons. Best regards, Christian Schoenebeck
On Wed, 06 May 2020 15:05:23 +0200 Christian Schoenebeck <qemu_oss@crudebyte.com> wrote: > On Donnerstag, 30. April 2020 17:53:34 CEST Greg Kurz wrote: > > Locking was introduced to address the deprecation of readdir_r(3) > > in glibc 2.24. It turns out that the frontend code is the worst > > place to handle a critical section with a pthread mutex: the code > > runs in a coroutine on behalf of the QEMU mainloop and then yields > > control, waiting for the fsdev backend to process the request in a > > worker thread. If the client resends another readdir request for the > > same fid before the previous one finally unlocked the mutex, we're > > deadlocked. This never bit us because the linux client serializes > > readdir requests for the same fid, but the small qtest in this patch > > demonstrates how easy it would be for a guest to hang QEMU. > > > > A solution could be to use CoMutex instead of QemuMutex, but it seems > > better to narrow the critical section to the worker thread code and to > > return a copy of the dirent to the frontend. > > Ok, but why not both? Moving locks to worker thread and QemuMutex -> CoMutex? > Using CoMutex would be mandatory if we leave the locking where it sits today, so that the main thread can switch to other coroutines instead of blocking. We don't have the same requirement with the worker thread: it just needs to do the actual readdir() and then it goes back to the thread pool, waiting to be summoned again for some other work. So I'd rather use standard mutexes to keep things simple... why would you want to use a CoMutex here ? > > Signed-off-by: Greg Kurz <groug@kaod.org> > > --- > > hw/9pfs/9p-synth.c | 12 +++++++++++ > > hw/9pfs/9p-synth.h | 2 ++ > > hw/9pfs/9p.c | 45 > > +++++++++++++----------------------------- hw/9pfs/codir.c | > > 11 ++++++++-- > > hw/9pfs/coth.h | 3 ++- > > tests/qtest/virtio-9p-test.c | 33 +++++++++++++++++++++++++++++++ > > 6 files changed, 72 insertions(+), 34 deletions(-) > > > > diff --git a/hw/9pfs/9p-synth.c b/hw/9pfs/9p-synth.c > > index 7eb210ffa89b..fb0741569bcd 100644 > > --- a/hw/9pfs/9p-synth.c > > +++ b/hw/9pfs/9p-synth.c > > @@ -256,6 +256,13 @@ static struct dirent *synth_readdir(FsContext *ctx, > > V9fsFidOpenState *fs) if (entry) { > > synth_open->offset++; > > } > > + > > + if (!strcmp(node->name, QTEST_V9FS_SYNTH_READDIR_DEADLOCK_DIR)) { > > + while (synth_open->offset != > > QTEST_V9FS_SYNTH_READDIR_DEADLOCK_OFFSET) { + sched_yield(); > > + } > > + } > > + > > return entry; > > } > > The while loop condition would deserve a short comment here of what it's > about. > Will do. > > @@ -596,6 +603,11 @@ static int synth_init(FsContext *ctx, Error **errp) > > assert(!ret); > > g_free(name); > > } > > + > > + ret = qemu_v9fs_synth_mkdir( > > + NULL, 0700, QTEST_V9FS_SYNTH_READDIR_DEADLOCK_DIR, &dir > > + ); > > + assert(!ret); > > } > > } > > > > diff --git a/hw/9pfs/9p-synth.h b/hw/9pfs/9p-synth.h > > index 036d7e4a5b1c..c9a65b61ed45 100644 > > --- a/hw/9pfs/9p-synth.h > > +++ b/hw/9pfs/9p-synth.h > > @@ -59,6 +59,8 @@ int qemu_v9fs_synth_add_file(V9fsSynthNode *parent, int > > mode, #define QTEST_V9FS_SYNTH_READDIR_DIR "ReadDirDir" > > #define QTEST_V9FS_SYNTH_READDIR_FILE "ReadDirFile%d" > > #define QTEST_V9FS_SYNTH_READDIR_NFILES 100 > > +#define QTEST_V9FS_SYNTH_READDIR_DEADLOCK_DIR "ReadDirDeadLock" > > +#define QTEST_V9FS_SYNTH_READDIR_DEADLOCK_OFFSET 3 > > > > /* Any write to the "FLUSH" file is handled one byte at a time by the > > * backend. If the byte is zero, the backend returns success (ie, 1), > > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c > > index 9e046f7acb51..ac84ae804496 100644 > > --- a/hw/9pfs/9p.c > > +++ b/hw/9pfs/9p.c > > @@ -2170,7 +2170,7 @@ static int coroutine_fn > > v9fs_do_readdir_with_stat(V9fsPDU *pdu, int32_t count = 0; > > struct stat stbuf; > > off_t saved_dir_pos; > > - struct dirent *dent; > > + struct dirent dent; > > > > /* save the directory position */ > > saved_dir_pos = v9fs_co_telldir(pdu, fidp); > > @@ -2181,13 +2181,11 @@ static int coroutine_fn > > v9fs_do_readdir_with_stat(V9fsPDU *pdu, while (1) { > > v9fs_path_init(&path); > > > > - v9fs_readdir_lock(&fidp->fs.dir); > > - > > That's the deadlock fix, but ... > > > err = v9fs_co_readdir(pdu, fidp, &dent); > > - if (err || !dent) { > > + if (err <= 0) { > > break; > > } > > ... even though this code simplification might make sense, I don't think it > should be mixed with the deadlock fix together in one patch. They are not I could possibly split this in two patches, one for returning a copy and one for moving the locking around, but... > related with each other, nor is the code simplification you are aiming trivial ... this assertion is somewhat wrong: moving the locking to v9fs_co_readdir() really requires it returns a copy. > enough to justify squashing. The deadlock fix should make it through the > stable branches, while the code simplification should not. So that's better > off as a separate cleanup patch. > The issue has been there for such a long time without causing any trouble. Not worth adding churn in stable for a bug that is impossible to hit with a regular linux guest. > Plus the commit log does not even mention these cleanup changes. > What you call cleanup changes is simply "change the function to return a copy of the dirent by reference instead of an unsafe pointer"... and it is _briefly_ mentioned at the end of the changelog. I'll give a try at two patches and see if it's clearer. > > - err = v9fs_co_name_to_path(pdu, &fidp->path, dent->d_name, &path); > > + err = v9fs_co_name_to_path(pdu, &fidp->path, dent.d_name, &path); > > if (err < 0) { > > break; > > } > > @@ -2195,13 +2193,11 @@ static int coroutine_fn > > v9fs_do_readdir_with_stat(V9fsPDU *pdu, if (err < 0) { > > break; > > } > > - err = stat_to_v9stat(pdu, &path, dent->d_name, &stbuf, &v9stat); > > + err = stat_to_v9stat(pdu, &path, dent.d_name, &stbuf, &v9stat); > > if (err < 0) { > > break; > > } > > if ((count + v9stat.size + 2) > max_count) { > > - v9fs_readdir_unlock(&fidp->fs.dir); > > - > > /* Ran out of buffer. Set dir back to old position and return > > */ v9fs_co_seekdir(pdu, fidp, saved_dir_pos); > > v9fs_stat_free(&v9stat); > > @@ -2212,8 +2208,6 @@ static int coroutine_fn > > v9fs_do_readdir_with_stat(V9fsPDU *pdu, /* 11 = 7 + 4 (7 = start offset, 4 > > = space for storing count) */ len = pdu_marshal(pdu, 11 + count, "S", > > &v9stat); > > > > - v9fs_readdir_unlock(&fidp->fs.dir); > > - > > if (len < 0) { > > v9fs_co_seekdir(pdu, fidp, saved_dir_pos); > > v9fs_stat_free(&v9stat); > > @@ -2223,11 +2217,9 @@ static int coroutine_fn > > v9fs_do_readdir_with_stat(V9fsPDU *pdu, count += len; > > v9fs_stat_free(&v9stat); > > v9fs_path_free(&path); > > - saved_dir_pos = dent->d_off; > > + saved_dir_pos = dent.d_off; > > } > > > > - v9fs_readdir_unlock(&fidp->fs.dir); > > - > > v9fs_path_free(&path); > > if (err < 0) { > > return err; > > @@ -2340,7 +2332,7 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, > > V9fsFidState *fidp, int len, err = 0; > > int32_t count = 0; > > off_t saved_dir_pos; > > - struct dirent *dent; > > + struct dirent dent; > > > > /* save the directory position */ > > saved_dir_pos = v9fs_co_telldir(pdu, fidp); > > @@ -2349,17 +2341,13 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU > > *pdu, V9fsFidState *fidp, } > > > > while (1) { > > - v9fs_readdir_lock(&fidp->fs.dir); > > - > > err = v9fs_co_readdir(pdu, fidp, &dent); > > - if (err || !dent) { > > + if (err <= 0) { > > break; > > } > > v9fs_string_init(&name); > > - v9fs_string_sprintf(&name, "%s", dent->d_name); > > + v9fs_string_sprintf(&name, "%s", dent.d_name); > > if ((count + v9fs_readdir_data_size(&name)) > max_count) { > > - v9fs_readdir_unlock(&fidp->fs.dir); > > - > > /* Ran out of buffer. Set dir back to old position and return > > */ v9fs_co_seekdir(pdu, fidp, saved_dir_pos); > > v9fs_string_free(&name); > > @@ -2374,9 +2362,8 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, > > V9fsFidState *fidp, * different entries; we cannot make any assumption to > > avoid * that here. > > */ > > - err = dirent_to_qid(pdu, fidp, dent, &qid); > > + err = dirent_to_qid(pdu, fidp, &dent, &qid); > > if (err < 0) { > > - v9fs_readdir_unlock(&fidp->fs.dir); > > v9fs_co_seekdir(pdu, fidp, saved_dir_pos); > > v9fs_string_free(&name); > > return err; > > @@ -2392,8 +2379,8 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, > > V9fsFidState *fidp, * user would get that error anyway when accessing those > > * files/dirs through other ways. > > */ > > - size = MIN(sizeof(dent->d_ino), sizeof(qid.path)); > > - memcpy(&qid.path, &dent->d_ino, size); > > + size = MIN(sizeof(dent.d_ino), sizeof(qid.path)); > > + memcpy(&qid.path, &dent.d_ino, size); > > /* Fill the other fields with dummy values */ > > qid.type = 0; > > qid.version = 0; > > @@ -2401,10 +2388,8 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, > > V9fsFidState *fidp, > > > > /* 11 = 7 + 4 (7 = start offset, 4 = space for storing count) */ > > len = pdu_marshal(pdu, 11 + count, "Qqbs", > > - &qid, dent->d_off, > > - dent->d_type, &name); > > - > > - v9fs_readdir_unlock(&fidp->fs.dir); > > + &qid, dent.d_off, > > + dent.d_type, &name); > > > > if (len < 0) { > > v9fs_co_seekdir(pdu, fidp, saved_dir_pos); > > @@ -2413,11 +2398,9 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, > > V9fsFidState *fidp, } > > count += len; > > v9fs_string_free(&name); > > - saved_dir_pos = dent->d_off; > > + saved_dir_pos = dent.d_off; > > } > > > > - v9fs_readdir_unlock(&fidp->fs.dir); > > - > > if (err < 0) { > > return err; > > } > > diff --git a/hw/9pfs/codir.c b/hw/9pfs/codir.c > > index 73f9a751e178..90356dc60140 100644 > > --- a/hw/9pfs/codir.c > > +++ b/hw/9pfs/codir.c > > @@ -19,7 +19,7 @@ > > #include "coth.h" > > > > int coroutine_fn v9fs_co_readdir(V9fsPDU *pdu, V9fsFidState *fidp, > > - struct dirent **dent) > > + struct dirent *dent) > > { > > int err; > > V9fsState *s = pdu->s; > > @@ -32,13 +32,20 @@ int coroutine_fn v9fs_co_readdir(V9fsPDU *pdu, > > V9fsFidState *fidp, struct dirent *entry; > > > > errno = 0; > > + > > + v9fs_readdir_lock(&fidp->fs.dir); > > + > > entry = s->ops->readdir(&s->ctx, &fidp->fs); > > if (!entry && errno) { > > err = -errno; > > + } else if (entry) { > > + memcpy(dent, entry, sizeof(*dent)); > > + err = 1; > > I find using sizeof(*dent) a bit dangerous considering potential type changes > in future. I would rather use sizeof(struct dirent). It is also more human > friendly to read IMO. > Hmm... I believe it's the opposite actually: with sizeof(*dent), memcpy will always copy the number of bytes that are expected to fit in *dent, no matter the type. But yes, since memcpy() doesn't do any type checking for us, I think I'll just turn this into: *dent = *entry; > > } else { > > - *dent = entry; > > err = 0; > > } > > + > > + v9fs_readdir_unlock(&fidp->fs.dir); > > }); > > return err; > > } > > diff --git a/hw/9pfs/coth.h b/hw/9pfs/coth.h > > index c2cdc7a9eaac..696c90111261 100644 > > --- a/hw/9pfs/coth.h > > +++ b/hw/9pfs/coth.h > > @@ -48,7 +48,8 @@ > > > > void co_run_in_worker_bh(void *); > > int coroutine_fn v9fs_co_readlink(V9fsPDU *, V9fsPath *, V9fsString *); > > -int coroutine_fn v9fs_co_readdir(V9fsPDU *, V9fsFidState *, struct dirent > > **); +/* Returns < 0 on error, 1 on success and 0 on end of directory */ > > +int coroutine_fn v9fs_co_readdir(V9fsPDU *, V9fsFidState *, struct dirent > > *); off_t coroutine_fn v9fs_co_telldir(V9fsPDU *, V9fsFidState *); > > void coroutine_fn v9fs_co_seekdir(V9fsPDU *, V9fsFidState *, off_t); > > void coroutine_fn v9fs_co_rewinddir(V9fsPDU *, V9fsFidState *); > > That API comment should better go into the .c file as a multi-line comment > instead of squeezing it as single-line comment into the header file here. > > It would also make sense either defining macros, or enums as those 2 possible > succesful return types of that function instead of simply using hard coded > values (0 and 1). > Yeah, I'll drop the comment and go for an enum I guess. > > diff --git a/tests/qtest/virtio-9p-test.c b/tests/qtest/virtio-9p-test.c > > index 2167322985ec..f46ad888daa6 100644 > > --- a/tests/qtest/virtio-9p-test.c > > +++ b/tests/qtest/virtio-9p-test.c > > @@ -793,6 +793,38 @@ static void fs_flush_ignored(void *obj, void *data, > > QGuestAllocator *t_alloc) g_free(wnames[0]); > > } > > > > +static void fs_readdir_deadlock(void *obj, void *data, QGuestAllocator > > *t_alloc) +{ > > + QVirtio9P *v9p = obj; > > + alloc = t_alloc; > > + char *const wnames[] = { > > g_strdup(QTEST_V9FS_SYNTH_READDIR_DEADLOCK_DIR) }; + P9Req *req, *req2; > > + > > + fs_attach(v9p, NULL, t_alloc); > > + req = v9fs_twalk(v9p, 0, 1, 1, wnames, 0); > > + v9fs_req_wait_for_reply(req, NULL); > > + v9fs_rwalk(req, NULL, NULL); > > + > > + req = v9fs_tlopen(v9p, 1, O_WRONLY, 0); > > + v9fs_req_wait_for_reply(req, NULL); > > + v9fs_rlopen(req, NULL, NULL); > > + > > + /* > > + * The first request will remain pending in the backend until > > + * a request with the magic offset is received. This deadlocks > > + * the mainloop of an older QEMU that still handles the critical > > + * section around readdir() in the frontend code. > > + */ > > + req = v9fs_treaddir(v9p, 1, 0, 128, 0); > > + req2 = v9fs_treaddir(v9p, 1, QTEST_V9FS_SYNTH_READDIR_DEADLOCK_OFFSET, > > 128, + 1); > > + v9fs_req_wait_for_reply(req, NULL); > > + v9fs_req_free(req); > > + v9fs_req_free(req2); > > + > > + g_free(wnames[0]); > > +} > > + > > static void register_virtio_9p_test(void) > > { > > qos_add_test("config", "virtio-9p", pci_config, NULL); > > @@ -810,6 +842,7 @@ static void register_virtio_9p_test(void) > > qos_add_test("fs/flush/ignored", "virtio-9p", fs_flush_ignored, > > NULL); > > qos_add_test("fs/readdir/basic", "virtio-9p", fs_readdir, NULL); > > + qos_add_test("fs/readdir/deadlock", "virtio-9p", fs_readdir_deadlock, > > NULL); } > > > > libqos_init(register_virtio_9p_test); > > Best regards, > Christian Schoenebeck > > Thanks for the review ! Cheers, -- Greg
On Wed, 06 May 2020 15:36:16 +0200 Christian Schoenebeck <qemu_oss@crudebyte.com> wrote: > On Mittwoch, 6. Mai 2020 15:05:23 CEST Christian Schoenebeck wrote: > > > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c > > > index 9e046f7acb51..ac84ae804496 100644 > > > --- a/hw/9pfs/9p.c > > > +++ b/hw/9pfs/9p.c > > > @@ -2170,7 +2170,7 @@ static int coroutine_fn > > > v9fs_do_readdir_with_stat(V9fsPDU *pdu, int32_t count = 0; > > > > > > struct stat stbuf; > > > off_t saved_dir_pos; > > > > > > - struct dirent *dent; > > > + struct dirent dent; > > One more: since this dirent structure is now on the stack, it should better be > initialized for safety reasons. > I don't think so, for two reasons: - I can't think of an initializer that would make sense for a dirent - if a future change introduces a branch where dent could be used uninitialized, I'd rather give a chance to the compiler to bark > Best regards, > Christian Schoenebeck > >
On Mittwoch, 6. Mai 2020 19:49:10 CEST Greg Kurz wrote: > > Ok, but why not both? Moving locks to worker thread and QemuMutex -> > > CoMutex? > Using CoMutex would be mandatory if we leave the locking where it sits > today, so that the main thread can switch to other coroutines instead > of blocking. We don't have the same requirement with the worker thread: > it just needs to do the actual readdir() and then it goes back to the > thread pool, waiting to be summoned again for some other work. Yes, I know. > So I'd > rather use standard mutexes to keep things simple... why would you > want to use a CoMutex here ? Like you said, it would not be mandatory, nor a big deal, the idea was just if a lock takes longer than expected then a worker thread could already continue with another task. I mean the amount of worker threads are limited they are not growing on demand, are they? I also haven't reviewed QEMU's lock implementations in very detail, but IIRC CoMutexes are completely handled in user space, while QemuMutex uses regular OS mutexes and hence might cost context switches. > > > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c > > > index 9e046f7acb51..ac84ae804496 100644 > > > --- a/hw/9pfs/9p.c > > > +++ b/hw/9pfs/9p.c > > > @@ -2170,7 +2170,7 @@ static int coroutine_fn > > > v9fs_do_readdir_with_stat(V9fsPDU *pdu, int32_t count = 0; > > > > > > struct stat stbuf; > > > off_t saved_dir_pos; > > > > > > - struct dirent *dent; > > > + struct dirent dent; > > > > > > /* save the directory position */ > > > saved_dir_pos = v9fs_co_telldir(pdu, fidp); > > > > > > @@ -2181,13 +2181,11 @@ static int coroutine_fn > > > v9fs_do_readdir_with_stat(V9fsPDU *pdu, while (1) { > > > > > > v9fs_path_init(&path); > > > > > > - v9fs_readdir_lock(&fidp->fs.dir); > > > - > > > > That's the deadlock fix, but ... > > > > > err = v9fs_co_readdir(pdu, fidp, &dent); > > > > > > - if (err || !dent) { > > > + if (err <= 0) { > > > > > > break; > > > > > > } > > > > ... even though this code simplification might make sense, I don't think > > it > > should be mixed with the deadlock fix together in one patch. They are not > > I could possibly split this in two patches, one for returning a copy > and one for moving the locking around, but... > > > related with each other, nor is the code simplification you are aiming > > trivial > ... this assertion is somewhat wrong: moving the locking to > v9fs_co_readdir() really requires it returns a copy. Yeah, I am also not sure whether a split would make it more trivial enough in this case to be worth the hassle. If you find an acceptable solution, good, if not then leave it one patch. > > enough to justify squashing. The deadlock fix should make it through the > > stable branches, while the code simplification should not. So that's > > better > > off as a separate cleanup patch. > > The issue has been there for such a long time without causing any > trouble. Not worth adding churn in stable for a bug that is impossible > to hit with a regular linux guest. Who knows. There are also other clients out there. A potential deadlock is still a serious issue after all. > > > @@ -32,13 +32,20 @@ int coroutine_fn v9fs_co_readdir(V9fsPDU *pdu, > > > V9fsFidState *fidp, struct dirent *entry; > > > > > > errno = 0; > > > > > > + > > > + v9fs_readdir_lock(&fidp->fs.dir); > > > + > > > > > > entry = s->ops->readdir(&s->ctx, &fidp->fs); > > > if (!entry && errno) { > > > > > > err = -errno; > > > > > > + } else if (entry) { > > > + memcpy(dent, entry, sizeof(*dent)); > > > + err = 1; > > > > I find using sizeof(*dent) a bit dangerous considering potential type > > changes in future. I would rather use sizeof(struct dirent). It is also > > more human friendly to read IMO. > > Hmm... I believe it's the opposite actually: with sizeof(*dent), memcpy > will always copy the number of bytes that are expected to fit in *dent, > no matter the type. Yes, but what you intend is to flat copy a structure, not pointers. So no matter how the type is going to be changed you always actually wanted (semantically) copy(sizeof(struct dirent), nelements) Now it is nelements=1, in future it might also be nelements>1, but what you certainly don't ever want here is copy(sizeof(void*), nelements) > But yes, since memcpy() doesn't do any type checking for us, I think > I'll just turn this into: > > *dent = *entry; Ok Best regards, Christian Schoenebeck
On Mittwoch, 6. Mai 2020 19:54:15 CEST Greg Kurz wrote: > On Wed, 06 May 2020 15:36:16 +0200 > > Christian Schoenebeck <qemu_oss@crudebyte.com> wrote: > > On Mittwoch, 6. Mai 2020 15:05:23 CEST Christian Schoenebeck wrote: > > > > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c > > > > index 9e046f7acb51..ac84ae804496 100644 > > > > --- a/hw/9pfs/9p.c > > > > +++ b/hw/9pfs/9p.c > > > > @@ -2170,7 +2170,7 @@ static int coroutine_fn > > > > v9fs_do_readdir_with_stat(V9fsPDU *pdu, int32_t count = 0; > > > > > > > > struct stat stbuf; > > > > off_t saved_dir_pos; > > > > > > > > - struct dirent *dent; > > > > + struct dirent dent; > > > > One more: since this dirent structure is now on the stack, it should > > better be initialized for safety reasons. > > I don't think so, for two reasons: > - I can't think of an initializer that would make sense for a dirent The same as it would (implied - usually, e.g. with gmalloc0()) if you were allocating it on heap: by initializing it with all zeroes, e.g. just: struct dirent dent = {}; > - if a future change introduces a branch where dent could be used > uninitialized, I'd rather give a chance to the compiler to bark The compiler would reliably bark on using it unitialized if you were about to access it directly within the same function. But that's not the case here. dirent is passed by reference to a function which will be altering it. Such stacked relations usually require more sophisticated diagnostics, like e.g. exeuting the LLVM sanitizer. Best regards, Christian Schoenebeck
On Thu, 07 May 2020 13:46:50 +0200 Christian Schoenebeck <qemu_oss@crudebyte.com> wrote: > On Mittwoch, 6. Mai 2020 19:54:15 CEST Greg Kurz wrote: > > On Wed, 06 May 2020 15:36:16 +0200 > > > > Christian Schoenebeck <qemu_oss@crudebyte.com> wrote: > > > On Mittwoch, 6. Mai 2020 15:05:23 CEST Christian Schoenebeck wrote: > > > > > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c > > > > > index 9e046f7acb51..ac84ae804496 100644 > > > > > --- a/hw/9pfs/9p.c > > > > > +++ b/hw/9pfs/9p.c > > > > > @@ -2170,7 +2170,7 @@ static int coroutine_fn > > > > > v9fs_do_readdir_with_stat(V9fsPDU *pdu, int32_t count = 0; > > > > > > > > > > struct stat stbuf; > > > > > off_t saved_dir_pos; > > > > > > > > > > - struct dirent *dent; > > > > > + struct dirent dent; > > > > > > One more: since this dirent structure is now on the stack, it should > > > better be initialized for safety reasons. > > > > I don't think so, for two reasons: > > - I can't think of an initializer that would make sense for a dirent > > The same as it would (implied - usually, e.g. with gmalloc0()) if you were > allocating it on heap: by initializing it with all zeroes, e.g. just: > > struct dirent dent = {}; > Ok, so you have a zeroed dent, which is likely just as wrong as any other arbitrary value... I just agree that if we have some bug that prevents v9fs_co_readdir() to fill the dentry, this would prevent arbitrary host data to leak to the guest. > > - if a future change introduces a branch where dent could be used > > uninitialized, I'd rather give a chance to the compiler to bark > > The compiler would reliably bark on using it unitialized if you were about to > access it directly within the same function. But that's not the case here. Hmm... that's the case with the current code base: v9fs_do_readdir_with_stat: err = v9fs_co_readdir(pdu, fidp, &dent); if (err <= 0) { break; } err = v9fs_co_name_to_path(pdu, &fidp->path, dent.d_name, &path); ^^^^ v9fs_do_readdir: err = v9fs_co_readdir(pdu, fidp, &dent); if (err <= 0) { break; } v9fs_string_init(&name); v9fs_string_sprintf(&name, "%s", dent.d_name); ^^^^ > dirent is passed by reference to a function which will be altering it. Such > stacked relations usually require more sophisticated diagnostics, like e.g. > exeuting the LLVM sanitizer. > We also do Coverity runs on a regular basis. > Best regards, > Christian Schoenebeck > >
On Thu, 07 May 2020 13:37:30 +0200 Christian Schoenebeck <qemu_oss@crudebyte.com> wrote: > On Mittwoch, 6. Mai 2020 19:49:10 CEST Greg Kurz wrote: > > > Ok, but why not both? Moving locks to worker thread and QemuMutex -> > > > CoMutex? > > Using CoMutex would be mandatory if we leave the locking where it sits > > today, so that the main thread can switch to other coroutines instead > > of blocking. We don't have the same requirement with the worker thread: > > it just needs to do the actual readdir() and then it goes back to the > > thread pool, waiting to be summoned again for some other work. > > Yes, I know. > > > So I'd > > rather use standard mutexes to keep things simple... why would you > > want to use a CoMutex here ? > > Like you said, it would not be mandatory, nor a big deal, the idea was just if > a lock takes longer than expected then a worker thread could already continue > with another task. I mean the amount of worker threads are limited they are > not growing on demand, are they? > Yes, the pool is limited to a fixed number of 64 threads, but... > I also haven't reviewed QEMU's lock implementations in very detail, but IIRC > CoMutexes are completely handled in user space, while QemuMutex uses regular > OS mutexes and hence might cost context switches. > ... since the locking would only been exercised with an hypothetical client doing stupid things, this is beginning to look like bike-shedding to me. :) > > > > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c > > > > index 9e046f7acb51..ac84ae804496 100644 > > > > --- a/hw/9pfs/9p.c > > > > +++ b/hw/9pfs/9p.c > > > > @@ -2170,7 +2170,7 @@ static int coroutine_fn > > > > v9fs_do_readdir_with_stat(V9fsPDU *pdu, int32_t count = 0; > > > > > > > > struct stat stbuf; > > > > off_t saved_dir_pos; > > > > > > > > - struct dirent *dent; > > > > + struct dirent dent; > > > > > > > > /* save the directory position */ > > > > saved_dir_pos = v9fs_co_telldir(pdu, fidp); > > > > > > > > @@ -2181,13 +2181,11 @@ static int coroutine_fn > > > > v9fs_do_readdir_with_stat(V9fsPDU *pdu, while (1) { > > > > > > > > v9fs_path_init(&path); > > > > > > > > - v9fs_readdir_lock(&fidp->fs.dir); > > > > - > > > > > > That's the deadlock fix, but ... > > > > > > > err = v9fs_co_readdir(pdu, fidp, &dent); > > > > > > > > - if (err || !dent) { > > > > + if (err <= 0) { > > > > > > > > break; > > > > > > > > } > > > > > > ... even though this code simplification might make sense, I don't think > > > it > > > should be mixed with the deadlock fix together in one patch. They are not > > > > I could possibly split this in two patches, one for returning a copy > > and one for moving the locking around, but... > > > > > related with each other, nor is the code simplification you are aiming > > > trivial > > ... this assertion is somewhat wrong: moving the locking to > > v9fs_co_readdir() really requires it returns a copy. > > Yeah, I am also not sure whether a split would make it more trivial enough in > this case to be worth the hassle. If you find an acceptable solution, good, if > not then leave it one patch. > Another option would be to g_malloc() the dirent in v9fs_co_readdir() and g_free() in the callers. This would cause less churn since we could keep the same function signature. > > > enough to justify squashing. The deadlock fix should make it through the > > > stable branches, while the code simplification should not. So that's > > > better > > > off as a separate cleanup patch. > > > > The issue has been there for such a long time without causing any > > trouble. Not worth adding churn in stable for a bug that is impossible > > to hit with a regular linux guest. > > Who knows. There are also other clients out there. A potential deadlock is > still a serious issue after all. > Well, I guess Cc: qemu-stable doesn't cost much and then I let other people decide. I have enough in my plate with upstream. > > > > @@ -32,13 +32,20 @@ int coroutine_fn v9fs_co_readdir(V9fsPDU *pdu, > > > > V9fsFidState *fidp, struct dirent *entry; > > > > > > > > errno = 0; > > > > > > > > + > > > > + v9fs_readdir_lock(&fidp->fs.dir); > > > > + > > > > > > > > entry = s->ops->readdir(&s->ctx, &fidp->fs); > > > > if (!entry && errno) { > > > > > > > > err = -errno; > > > > > > > > + } else if (entry) { > > > > + memcpy(dent, entry, sizeof(*dent)); > > > > + err = 1; > > > > > > I find using sizeof(*dent) a bit dangerous considering potential type > > > changes in future. I would rather use sizeof(struct dirent). It is also > > > more human friendly to read IMO. > > > > Hmm... I believe it's the opposite actually: with sizeof(*dent), memcpy > > will always copy the number of bytes that are expected to fit in *dent, > > no matter the type. > > Yes, but what you intend is to flat copy a structure, not pointers. So no > matter how the type is going to be changed you always actually wanted > (semantically) > > copy(sizeof(struct dirent), nelements) > > Now it is nelements=1, in future it might also be nelements>1, but what you > certainly don't ever want here is > > copy(sizeof(void*), nelements) > > > But yes, since memcpy() doesn't do any type checking for us, I think > > I'll just turn this into: > > > > *dent = *entry; > > Ok > > Best regards, > Christian Schoenebeck > >
On Donnerstag, 7. Mai 2020 16:33:28 CEST Greg Kurz wrote: > > I also haven't reviewed QEMU's lock implementations in very detail, but > > IIRC CoMutexes are completely handled in user space, while QemuMutex uses > > regular OS mutexes and hence might cost context switches. > > ... since the locking would only been exercised with an hypothetical > client doing stupid things, this is beginning to look like bike-shedding > to me. :) Aha, keep that in mind when you're doing your next review. ;-) No seriously, like I said, I don't really care too much about Mutex vs. CoMutex in you patch here. It was actually more about wide-picture thinking, i.e. other places of (co)mutexes being used or other potential changes that would make this or other uses more relevant one day. > > > > > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c > > > > > index 9e046f7acb51..ac84ae804496 100644 > > > > > --- a/hw/9pfs/9p.c > > > > > +++ b/hw/9pfs/9p.c > > > > > @@ -2170,7 +2170,7 @@ static int coroutine_fn > > > > > v9fs_do_readdir_with_stat(V9fsPDU *pdu, int32_t count = 0; > > > > > > > > > > struct stat stbuf; > > > > > off_t saved_dir_pos; > > > > > > > > > > - struct dirent *dent; > > > > > + struct dirent dent; > > > > > > > > > > /* save the directory position */ > > > > > saved_dir_pos = v9fs_co_telldir(pdu, fidp); > > > > > > > > > > @@ -2181,13 +2181,11 @@ static int coroutine_fn > > > > > v9fs_do_readdir_with_stat(V9fsPDU *pdu, while (1) { > > > > > > > > > > v9fs_path_init(&path); > > > > > > > > > > - v9fs_readdir_lock(&fidp->fs.dir); > > > > > - > > > > > > > > That's the deadlock fix, but ... > > > > > > > > > err = v9fs_co_readdir(pdu, fidp, &dent); > > > > > > > > > > - if (err || !dent) { > > > > > + if (err <= 0) { > > > > > > > > > > break; > > > > > > > > > > } > > > > > > > > ... even though this code simplification might make sense, I don't > > > > think > > > > it > > > > should be mixed with the deadlock fix together in one patch. They are > > > > not > > > > > > I could possibly split this in two patches, one for returning a copy > > > and one for moving the locking around, but... > > > > > > > related with each other, nor is the code simplification you are aiming > > > > trivial > > > > > > ... this assertion is somewhat wrong: moving the locking to > > > v9fs_co_readdir() really requires it returns a copy. > > > > Yeah, I am also not sure whether a split would make it more trivial enough > > in this case to be worth the hassle. If you find an acceptable solution, > > good, if not then leave it one patch. > > Another option would be to g_malloc() the dirent in v9fs_co_readdir() and > g_free() in the callers. This would cause less churn since we could keep > the same function signature. I was actually just going to suggest the same. So yes, looks like a less invasive change to me. Best regards, Christian Schoenebeck
On Thu, 07 May 2020 17:03:46 +0200 Christian Schoenebeck <qemu_oss@crudebyte.com> wrote: > On Donnerstag, 7. Mai 2020 16:33:28 CEST Greg Kurz wrote: > > > I also haven't reviewed QEMU's lock implementations in very detail, but > > > IIRC CoMutexes are completely handled in user space, while QemuMutex uses > > > regular OS mutexes and hence might cost context switches. > > > > ... since the locking would only been exercised with an hypothetical > > client doing stupid things, this is beginning to look like bike-shedding > > to me. :) > > Aha, keep that in mind when you're doing your next review. ;-) > Fair enough. :) > No seriously, like I said, I don't really care too much about Mutex vs. > CoMutex in you patch here. It was actually more about wide-picture thinking, > i.e. other places of (co)mutexes being used or other potential changes that > would make this or other uses more relevant one day. > Then I agree with you that it would be better to use CoMutex if we were experiencing thread pool exhaustion indeed. > > > > > > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c > > > > > > index 9e046f7acb51..ac84ae804496 100644 > > > > > > --- a/hw/9pfs/9p.c > > > > > > +++ b/hw/9pfs/9p.c > > > > > > @@ -2170,7 +2170,7 @@ static int coroutine_fn > > > > > > v9fs_do_readdir_with_stat(V9fsPDU *pdu, int32_t count = 0; > > > > > > > > > > > > struct stat stbuf; > > > > > > off_t saved_dir_pos; > > > > > > > > > > > > - struct dirent *dent; > > > > > > + struct dirent dent; > > > > > > > > > > > > /* save the directory position */ > > > > > > saved_dir_pos = v9fs_co_telldir(pdu, fidp); > > > > > > > > > > > > @@ -2181,13 +2181,11 @@ static int coroutine_fn > > > > > > v9fs_do_readdir_with_stat(V9fsPDU *pdu, while (1) { > > > > > > > > > > > > v9fs_path_init(&path); > > > > > > > > > > > > - v9fs_readdir_lock(&fidp->fs.dir); > > > > > > - > > > > > > > > > > That's the deadlock fix, but ... > > > > > > > > > > > err = v9fs_co_readdir(pdu, fidp, &dent); > > > > > > > > > > > > - if (err || !dent) { > > > > > > + if (err <= 0) { > > > > > > > > > > > > break; > > > > > > > > > > > > } > > > > > > > > > > ... even though this code simplification might make sense, I don't > > > > > think > > > > > it > > > > > should be mixed with the deadlock fix together in one patch. They are > > > > > not > > > > > > > > I could possibly split this in two patches, one for returning a copy > > > > and one for moving the locking around, but... > > > > > > > > > related with each other, nor is the code simplification you are aiming > > > > > trivial > > > > > > > > ... this assertion is somewhat wrong: moving the locking to > > > > v9fs_co_readdir() really requires it returns a copy. > > > > > > Yeah, I am also not sure whether a split would make it more trivial enough > > > in this case to be worth the hassle. If you find an acceptable solution, > > > good, if not then leave it one patch. > > > > Another option would be to g_malloc() the dirent in v9fs_co_readdir() and > > g_free() in the callers. This would cause less churn since we could keep > > the same function signature. > > I was actually just going to suggest the same. So yes, looks like a less > invasive change to me. > I'll just do that then :) > Best regards, > Christian Schoenebeck > >
diff --git a/hw/9pfs/9p-synth.c b/hw/9pfs/9p-synth.c index 7eb210ffa89b..fb0741569bcd 100644 --- a/hw/9pfs/9p-synth.c +++ b/hw/9pfs/9p-synth.c @@ -256,6 +256,13 @@ static struct dirent *synth_readdir(FsContext *ctx, V9fsFidOpenState *fs) if (entry) { synth_open->offset++; } + + if (!strcmp(node->name, QTEST_V9FS_SYNTH_READDIR_DEADLOCK_DIR)) { + while (synth_open->offset != QTEST_V9FS_SYNTH_READDIR_DEADLOCK_OFFSET) { + sched_yield(); + } + } + return entry; } @@ -596,6 +603,11 @@ static int synth_init(FsContext *ctx, Error **errp) assert(!ret); g_free(name); } + + ret = qemu_v9fs_synth_mkdir( + NULL, 0700, QTEST_V9FS_SYNTH_READDIR_DEADLOCK_DIR, &dir + ); + assert(!ret); } } diff --git a/hw/9pfs/9p-synth.h b/hw/9pfs/9p-synth.h index 036d7e4a5b1c..c9a65b61ed45 100644 --- a/hw/9pfs/9p-synth.h +++ b/hw/9pfs/9p-synth.h @@ -59,6 +59,8 @@ int qemu_v9fs_synth_add_file(V9fsSynthNode *parent, int mode, #define QTEST_V9FS_SYNTH_READDIR_DIR "ReadDirDir" #define QTEST_V9FS_SYNTH_READDIR_FILE "ReadDirFile%d" #define QTEST_V9FS_SYNTH_READDIR_NFILES 100 +#define QTEST_V9FS_SYNTH_READDIR_DEADLOCK_DIR "ReadDirDeadLock" +#define QTEST_V9FS_SYNTH_READDIR_DEADLOCK_OFFSET 3 /* Any write to the "FLUSH" file is handled one byte at a time by the * backend. If the byte is zero, the backend returns success (ie, 1), diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index 9e046f7acb51..ac84ae804496 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -2170,7 +2170,7 @@ static int coroutine_fn v9fs_do_readdir_with_stat(V9fsPDU *pdu, int32_t count = 0; struct stat stbuf; off_t saved_dir_pos; - struct dirent *dent; + struct dirent dent; /* save the directory position */ saved_dir_pos = v9fs_co_telldir(pdu, fidp); @@ -2181,13 +2181,11 @@ static int coroutine_fn v9fs_do_readdir_with_stat(V9fsPDU *pdu, while (1) { v9fs_path_init(&path); - v9fs_readdir_lock(&fidp->fs.dir); - err = v9fs_co_readdir(pdu, fidp, &dent); - if (err || !dent) { + if (err <= 0) { break; } - err = v9fs_co_name_to_path(pdu, &fidp->path, dent->d_name, &path); + err = v9fs_co_name_to_path(pdu, &fidp->path, dent.d_name, &path); if (err < 0) { break; } @@ -2195,13 +2193,11 @@ static int coroutine_fn v9fs_do_readdir_with_stat(V9fsPDU *pdu, if (err < 0) { break; } - err = stat_to_v9stat(pdu, &path, dent->d_name, &stbuf, &v9stat); + err = stat_to_v9stat(pdu, &path, dent.d_name, &stbuf, &v9stat); if (err < 0) { break; } if ((count + v9stat.size + 2) > max_count) { - v9fs_readdir_unlock(&fidp->fs.dir); - /* Ran out of buffer. Set dir back to old position and return */ v9fs_co_seekdir(pdu, fidp, saved_dir_pos); v9fs_stat_free(&v9stat); @@ -2212,8 +2208,6 @@ static int coroutine_fn v9fs_do_readdir_with_stat(V9fsPDU *pdu, /* 11 = 7 + 4 (7 = start offset, 4 = space for storing count) */ len = pdu_marshal(pdu, 11 + count, "S", &v9stat); - v9fs_readdir_unlock(&fidp->fs.dir); - if (len < 0) { v9fs_co_seekdir(pdu, fidp, saved_dir_pos); v9fs_stat_free(&v9stat); @@ -2223,11 +2217,9 @@ static int coroutine_fn v9fs_do_readdir_with_stat(V9fsPDU *pdu, count += len; v9fs_stat_free(&v9stat); v9fs_path_free(&path); - saved_dir_pos = dent->d_off; + saved_dir_pos = dent.d_off; } - v9fs_readdir_unlock(&fidp->fs.dir); - v9fs_path_free(&path); if (err < 0) { return err; @@ -2340,7 +2332,7 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, V9fsFidState *fidp, int len, err = 0; int32_t count = 0; off_t saved_dir_pos; - struct dirent *dent; + struct dirent dent; /* save the directory position */ saved_dir_pos = v9fs_co_telldir(pdu, fidp); @@ -2349,17 +2341,13 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, V9fsFidState *fidp, } while (1) { - v9fs_readdir_lock(&fidp->fs.dir); - err = v9fs_co_readdir(pdu, fidp, &dent); - if (err || !dent) { + if (err <= 0) { break; } v9fs_string_init(&name); - v9fs_string_sprintf(&name, "%s", dent->d_name); + v9fs_string_sprintf(&name, "%s", dent.d_name); if ((count + v9fs_readdir_data_size(&name)) > max_count) { - v9fs_readdir_unlock(&fidp->fs.dir); - /* Ran out of buffer. Set dir back to old position and return */ v9fs_co_seekdir(pdu, fidp, saved_dir_pos); v9fs_string_free(&name); @@ -2374,9 +2362,8 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, V9fsFidState *fidp, * different entries; we cannot make any assumption to avoid * that here. */ - err = dirent_to_qid(pdu, fidp, dent, &qid); + err = dirent_to_qid(pdu, fidp, &dent, &qid); if (err < 0) { - v9fs_readdir_unlock(&fidp->fs.dir); v9fs_co_seekdir(pdu, fidp, saved_dir_pos); v9fs_string_free(&name); return err; @@ -2392,8 +2379,8 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, V9fsFidState *fidp, * user would get that error anyway when accessing those * files/dirs through other ways. */ - size = MIN(sizeof(dent->d_ino), sizeof(qid.path)); - memcpy(&qid.path, &dent->d_ino, size); + size = MIN(sizeof(dent.d_ino), sizeof(qid.path)); + memcpy(&qid.path, &dent.d_ino, size); /* Fill the other fields with dummy values */ qid.type = 0; qid.version = 0; @@ -2401,10 +2388,8 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, V9fsFidState *fidp, /* 11 = 7 + 4 (7 = start offset, 4 = space for storing count) */ len = pdu_marshal(pdu, 11 + count, "Qqbs", - &qid, dent->d_off, - dent->d_type, &name); - - v9fs_readdir_unlock(&fidp->fs.dir); + &qid, dent.d_off, + dent.d_type, &name); if (len < 0) { v9fs_co_seekdir(pdu, fidp, saved_dir_pos); @@ -2413,11 +2398,9 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, V9fsFidState *fidp, } count += len; v9fs_string_free(&name); - saved_dir_pos = dent->d_off; + saved_dir_pos = dent.d_off; } - v9fs_readdir_unlock(&fidp->fs.dir); - if (err < 0) { return err; } diff --git a/hw/9pfs/codir.c b/hw/9pfs/codir.c index 73f9a751e178..90356dc60140 100644 --- a/hw/9pfs/codir.c +++ b/hw/9pfs/codir.c @@ -19,7 +19,7 @@ #include "coth.h" int coroutine_fn v9fs_co_readdir(V9fsPDU *pdu, V9fsFidState *fidp, - struct dirent **dent) + struct dirent *dent) { int err; V9fsState *s = pdu->s; @@ -32,13 +32,20 @@ int coroutine_fn v9fs_co_readdir(V9fsPDU *pdu, V9fsFidState *fidp, struct dirent *entry; errno = 0; + + v9fs_readdir_lock(&fidp->fs.dir); + entry = s->ops->readdir(&s->ctx, &fidp->fs); if (!entry && errno) { err = -errno; + } else if (entry) { + memcpy(dent, entry, sizeof(*dent)); + err = 1; } else { - *dent = entry; err = 0; } + + v9fs_readdir_unlock(&fidp->fs.dir); }); return err; } diff --git a/hw/9pfs/coth.h b/hw/9pfs/coth.h index c2cdc7a9eaac..696c90111261 100644 --- a/hw/9pfs/coth.h +++ b/hw/9pfs/coth.h @@ -48,7 +48,8 @@ void co_run_in_worker_bh(void *); int coroutine_fn v9fs_co_readlink(V9fsPDU *, V9fsPath *, V9fsString *); -int coroutine_fn v9fs_co_readdir(V9fsPDU *, V9fsFidState *, struct dirent **); +/* Returns < 0 on error, 1 on success and 0 on end of directory */ +int coroutine_fn v9fs_co_readdir(V9fsPDU *, V9fsFidState *, struct dirent *); off_t coroutine_fn v9fs_co_telldir(V9fsPDU *, V9fsFidState *); void coroutine_fn v9fs_co_seekdir(V9fsPDU *, V9fsFidState *, off_t); void coroutine_fn v9fs_co_rewinddir(V9fsPDU *, V9fsFidState *); diff --git a/tests/qtest/virtio-9p-test.c b/tests/qtest/virtio-9p-test.c index 2167322985ec..f46ad888daa6 100644 --- a/tests/qtest/virtio-9p-test.c +++ b/tests/qtest/virtio-9p-test.c @@ -793,6 +793,38 @@ static void fs_flush_ignored(void *obj, void *data, QGuestAllocator *t_alloc) g_free(wnames[0]); } +static void fs_readdir_deadlock(void *obj, void *data, QGuestAllocator *t_alloc) +{ + QVirtio9P *v9p = obj; + alloc = t_alloc; + char *const wnames[] = { g_strdup(QTEST_V9FS_SYNTH_READDIR_DEADLOCK_DIR) }; + P9Req *req, *req2; + + fs_attach(v9p, NULL, t_alloc); + req = v9fs_twalk(v9p, 0, 1, 1, wnames, 0); + v9fs_req_wait_for_reply(req, NULL); + v9fs_rwalk(req, NULL, NULL); + + req = v9fs_tlopen(v9p, 1, O_WRONLY, 0); + v9fs_req_wait_for_reply(req, NULL); + v9fs_rlopen(req, NULL, NULL); + + /* + * The first request will remain pending in the backend until + * a request with the magic offset is received. This deadlocks + * the mainloop of an older QEMU that still handles the critical + * section around readdir() in the frontend code. + */ + req = v9fs_treaddir(v9p, 1, 0, 128, 0); + req2 = v9fs_treaddir(v9p, 1, QTEST_V9FS_SYNTH_READDIR_DEADLOCK_OFFSET, 128, + 1); + v9fs_req_wait_for_reply(req, NULL); + v9fs_req_free(req); + v9fs_req_free(req2); + + g_free(wnames[0]); +} + static void register_virtio_9p_test(void) { qos_add_test("config", "virtio-9p", pci_config, NULL); @@ -810,6 +842,7 @@ static void register_virtio_9p_test(void) qos_add_test("fs/flush/ignored", "virtio-9p", fs_flush_ignored, NULL); qos_add_test("fs/readdir/basic", "virtio-9p", fs_readdir, NULL); + qos_add_test("fs/readdir/deadlock", "virtio-9p", fs_readdir_deadlock, NULL); } libqos_init(register_virtio_9p_test);
Locking was introduced to address the deprecation of readdir_r(3) in glibc 2.24. It turns out that the frontend code is the worst place to handle a critical section with a pthread mutex: the code runs in a coroutine on behalf of the QEMU mainloop and then yields control, waiting for the fsdev backend to process the request in a worker thread. If the client resends another readdir request for the same fid before the previous one finally unlocked the mutex, we're deadlocked. This never bit us because the linux client serializes readdir requests for the same fid, but the small qtest in this patch demonstrates how easy it would be for a guest to hang QEMU. A solution could be to use CoMutex instead of QemuMutex, but it seems better to narrow the critical section to the worker thread code and to return a copy of the dirent to the frontend. Signed-off-by: Greg Kurz <groug@kaod.org> --- hw/9pfs/9p-synth.c | 12 +++++++++++ hw/9pfs/9p-synth.h | 2 ++ hw/9pfs/9p.c | 45 +++++++++++++----------------------------- hw/9pfs/codir.c | 11 ++++++++-- hw/9pfs/coth.h | 3 ++- tests/qtest/virtio-9p-test.c | 33 +++++++++++++++++++++++++++++++ 6 files changed, 72 insertions(+), 34 deletions(-)