Message ID | 20170104013355.4a8923b6@duuni (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Jan 04, 2017 at 01:34:50AM +0200, Tuomas Tynkkynen wrote: > I got these logs from the server & client with 9p tracepoints enabled: > > https://gist.githubusercontent.com/dezgeg/02447100b3182167403099fe7de4d941/raw/3772e408ddf586fb662ac9148fc10943529a6b99/dmesg%2520with%25209p%2520trace > https://gist.githubusercontent.com/dezgeg/e1e0c7f354042e1d9bdf7e9135934a65/raw/3a0e3b4f7a5229fd0be032c6839b578d47a21ce4/qemu.log > Lovely... 27349:out 8 0001 TSTATFS, tag 1 27350:out 12 0001 TLOPEN, tag 1 27351:complete 9 0001 RSTATFS, tag 1 27352:complete 13 0001 RLOPEN, tag 1 27677:out 8 0001 TSTATFS, tag 1 27678:out 12 0001 TLOPEN, tag 1 27679:complete 9 0001 RSTATFS, tag 1 27680:complete 13 0001 RLOPEN, tag 1 29667:out 8 0001 TSTATFS, tag 1 29668:out 110 0001 TWALK, tag 1 29669:complete 9 0001 RSTATFS, tag 1 29670:complete 7 0001 RLERROR, tag 1 42317:out 110 0001 TWALK, tag 1 42318:out 8 0001 TSTATFS, tag 1 42319:complete 9 0001 RSTATFS, tag 1 42320:complete 7 0001 RERROR, tag 1 Those are outright protocol violations: tag can be reused only after either a reply bearing the same tag has arrived *or* TFLUSH for that tag had been issued and successful (type == RFLUSH) reply bearing the tag of that TFLUSH has arrived. Your log doesn't contain any TFLUSH (108) at all, so it should've been plain and simple "no reuse until server sends a reply with matching tag". Otherwise the the dump looks sane. In the end all issued requests had been served, so it's not as if the client had been waiting for a free tag or for a response to be produced by the server. Interesting... dmesg doesn't seem to contain the beginning of the 9P trace - only 89858 out of 108818 in the dump, even though it claims to have lost only 4445 events. The last exchanges in the trace are > P9_TLOPEN tag 90 err 0 > P9_TLOPEN tag 25 err 0 > P9_TLOPEN tag 15 err 0 > P9_TLOPEN tag 104 err 0 > P9_TLOPEN tag 102 err 0 > P9_TLOPEN tag 91 err 0 > P9_TLOPEN tag 12 err 0 < P9_TREADLINK tag 12 which should've been complete 13 005a complete 13 0019 complete 13 000f complete 13 0068 complete 13 0066 complete 13 005b out 22 000c and the only plausible match for that is 108784:complete 13 005a 108798:complete 13 000f 108799:complete 13 0019 108800:complete 13 0068 108801:complete 13 0066 108802:complete 13 005b but there's a bunch of replies in between that hadn't been logged *and* such a TREADLINK request isn't seen after 108341:out 22 000c ... and sure enough, there's a whole lot of processes stuck in readlink. Interesting... Which version of qemu/what arguments are you using there? -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 4 Jan 2017 01:47:53 +0000 Al Viro <viro@ZenIV.linux.org.uk> wrote: > On Wed, Jan 04, 2017 at 01:34:50AM +0200, Tuomas Tynkkynen wrote: > > > I got these logs from the server & client with 9p tracepoints enabled: > > > > https://gist.githubusercontent.com/dezgeg/02447100b3182167403099fe7de4d941/raw/3772e408ddf586fb662ac9148fc10943529a6b99/dmesg%2520with%25209p%2520trace > > https://gist.githubusercontent.com/dezgeg/e1e0c7f354042e1d9bdf7e9135934a65/raw/3a0e3b4f7a5229fd0be032c6839b578d47a21ce4/qemu.log > > > > Lovely... > > 27349:out 8 0001 TSTATFS, tag 1 > 27350:out 12 0001 TLOPEN, tag 1 > 27351:complete 9 0001 RSTATFS, tag 1 > 27352:complete 13 0001 RLOPEN, tag 1 > > 27677:out 8 0001 TSTATFS, tag 1 > 27678:out 12 0001 TLOPEN, tag 1 > 27679:complete 9 0001 RSTATFS, tag 1 > 27680:complete 13 0001 RLOPEN, tag 1 > > 29667:out 8 0001 TSTATFS, tag 1 > 29668:out 110 0001 TWALK, tag 1 > 29669:complete 9 0001 RSTATFS, tag 1 > 29670:complete 7 0001 RLERROR, tag 1 > > 42317:out 110 0001 TWALK, tag 1 > 42318:out 8 0001 TSTATFS, tag 1 > 42319:complete 9 0001 RSTATFS, tag 1 > 42320:complete 7 0001 RERROR, tag 1 > > Those are outright protocol violations: tag can be reused only after either > a reply bearing the same tag has arrived *or* TFLUSH for that tag had been > issued and successful (type == RFLUSH) reply bearing the tag of that TFLUSH > has arrived. Your log doesn't contain any TFLUSH (108) at all, so it should've > been plain and simple "no reuse until server sends a reply with matching tag". > Argh, I had completely forgotten there is another 9p mount involved, so the log is mixed between the two mounts. Apologies, there is now a new trace attached with the V9fsState pointer dumped in the mix as well. > Otherwise the the dump looks sane. In the end all issued requests had been > served, so it's not as if the client had been waiting for a free tag or for > a response to be produced by the server. > > Interesting... dmesg doesn't seem to contain the beginning of the 9P trace - > only 89858 out of 108818 in the dump, even though it claims to have lost > only 4445 events. [...] Here's logs that should be complete this time: https://gist.githubusercontent.com/dezgeg/08629d4c8ca79da794bc087e5951e518/raw/a1a82b9bc24e5282c82beb43a9dc91974ffcf75a/9p.qemu.log https://gist.githubusercontent.com/dezgeg/1d5f1cc0647e336c59989fc62780eb2e/raw/d7623755a895b0441c608ddba366d20bbf47ab0b/9p.dmesg.log > > Interesting... Which version of qemu/what arguments are you using there? This is QEMU 2.7.0, with these on the server side: -virtfs local,path=/nix/store,security_model=none,mount_tag=store -virtfs local,path=$TMPDIR/xchg,security_model=none,mount_tag=xchg and on the client side: store on /nix/store type 9p (rw,relatime,dirsync,trans=virtio,version=9p2000.L,cache=loose) xchg on /tmp/xchg type 9p (rw,relatime,dirsync,trans=virtio,version=9p2000.L,cache=loose) 'store' is the one receiving the hammering, 'xchg' I think is mostly sitting idle. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> Here's logs that should be complete this time: > > https://gist.githubusercontent.com/dezgeg/08629d4c8ca79da794bc087e5951e518/raw/a1a82b9bc24e5282c82beb43a9dc91974ffcf75a/9p.qemu.log > https://gist.githubusercontent.com/dezgeg/1d5f1cc0647e336c59989fc62780eb2e/raw/d7623755a895b0441c608ddba366d20bbf47ab0b/9p.dmesg.log Fun. All requests prior to [ 360.110282] dd-1899 1.... 18497262us : 9p_client_req: client 18446612134390128640 request P9_TWALK tag 25 line in dmesg had been seen by the servers; all requests starting with that one had not. Replies to earlier requests kept arriving just fine. From the server side, everything looks nice and sane - it has processed all requests it had seen, and aside of slight difference in the arrival order server-side and client-side logs match, except for the last 26 requests the client claims to have sent and the server has never seen. All traffic for another client (there had been less of it) has ceased long before that point, so we can't really tell if it's just this client that got buggered. Interesting... Looking at the tracepoints, those requests got through p9_client_prepare_req(); we have no idea whether it got through p9_virtio_request(). OTOH, AFAICS nothing had been sleeping in there... FWIW, it might be interesting to try WARN_ON(!virtqueue_kick(chan->vq)); in p9_virtio_request() (instead of blind virtqueue_kick(chan->vq)) and see if it triggers. Incidentally, it looks like p9_virtio_request() ought to return an error if that happens... -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 4 Jan 2017 23:01:01 +0000 Al Viro <viro@ZenIV.linux.org.uk> wrote: > > Here's logs that should be complete this time: > > > > https://gist.githubusercontent.com/dezgeg/08629d4c8ca79da794bc087e5951e518/raw/a1a82b9bc24e5282c82beb43a9dc91974ffcf75a/9p.qemu.log > > https://gist.githubusercontent.com/dezgeg/1d5f1cc0647e336c59989fc62780eb2e/raw/d7623755a895b0441c608ddba366d20bbf47ab0b/9p.dmesg.log > > Fun. All requests prior to > [ 360.110282] dd-1899 1.... 18497262us : 9p_client_req: client 18446612134390128640 request P9_TWALK tag 25 > line in dmesg had been seen by the servers; all requests starting with that > one had not. Replies to earlier requests kept arriving just fine. > Looking at the tag numbers, I think we're hitting the hardcoded limit of 128 simultaneous requests in QEMU (which doesn't produce any error, new requests are silently dropped). Tuomas, can you change MAX_REQ to some higher value (< 65535 since tag is 2-byte and 0xffff is reserved) to confirm ? > >From the server side, everything looks nice and sane - it has processed > all requests it had seen, and aside of slight difference in the arrival > order server-side and client-side logs match, except for the last 26 > requests the client claims to have sent and the server has never seen. > > All traffic for another client (there had been less of it) has ceased long > before that point, so we can't really tell if it's just this client that > got buggered. Interesting... > > Looking at the tracepoints, those requests got through p9_client_prepare_req(); > we have no idea whether it got through p9_virtio_request(). OTOH, AFAICS > nothing had been sleeping in there... > > FWIW, it might be interesting to try > WARN_ON(!virtqueue_kick(chan->vq)); > in p9_virtio_request() (instead of blind virtqueue_kick(chan->vq)) and see > if it triggers. Incidentally, it looks like p9_virtio_request() ought to > return an error if that happens... > > ------------------------------------------------------------------------------ > Check out the vibrant tech community on one of the world's most > engaging tech sites, SlashDot.org! http://sdm.link/slashdot > _______________________________________________ > V9fs-developer mailing list > V9fs-developer@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/v9fs-developer -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jan 06, 2017 at 02:52:35PM +0100, Greg Kurz wrote: > Looking at the tag numbers, I think we're hitting the hardcoded limit of 128 > simultaneous requests in QEMU (which doesn't produce any error, new requests > are silently dropped). > > Tuomas, can you change MAX_REQ to some higher value (< 65535 since tag is > 2-byte and 0xffff is reserved) to confirm ? Huh? Just how is a client supposed to cope with that behaviour? 9P is not SunRPC - there's a reason why it doesn't live on top of UDP. Sure, it's datagram-oriented, but it really wants reliable transport... Setting the ring size at MAX_REQ is fine; that'll give you ENOSPC on attempt to put a request there, and p9_virtio_request() will wait for things to clear, but if you've accepted a request, that's bloody it - you really should go and handle it. How does it happen, anyway? qemu-side, I mean... Does it move the buffer to used ring as soon as it has fetched the request? AFAICS, it doesn't - virtqueue_push() is called just before pdu_free(); we might get complications in case of TFLUSH handling (queue with MAX_REQ-1 requests submitted, TFLUSH arrives, cancel_pdu is found and ->cancelled is set on it, then v9fs_flush() waits for it to complete. Once the damn thing is done, buffer is released by virtqueue_push(), but pdu freeing is delayed until v9fs_flush() gets woken up. In the meanwhile, another request arrives into the slot of freed by that virtqueue_push() and we are out of pdus. So it could happen, and the things might get unpleasant to some extent, but... no TFLUSH had been present in all that traffic. And none of the stuck processes had been spinning in p9_virtio_request(), so they *did* find ring slots... -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, 7 Jan 2017 06:26:47 +0000 Al Viro <viro@ZenIV.linux.org.uk> wrote: > On Fri, Jan 06, 2017 at 02:52:35PM +0100, Greg Kurz wrote: > > > Looking at the tag numbers, I think we're hitting the hardcoded limit of 128 > > simultaneous requests in QEMU (which doesn't produce any error, new requests > > are silently dropped). > > > > Tuomas, can you change MAX_REQ to some higher value (< 65535 since tag is > > 2-byte and 0xffff is reserved) to confirm ? > > Huh? > > Just how is a client supposed to cope with that behaviour? 9P is not > SunRPC - there's a reason why it doesn't live on top of UDP. Sure, it's > datagram-oriented, but it really wants reliable transport... > > Setting the ring size at MAX_REQ is fine; that'll give you ENOSPC on > attempt to put a request there, and p9_virtio_request() will wait for > things to clear, but if you've accepted a request, that's bloody it - > you really should go and handle it. > Yes you're right and "dropped" in my previous mail meant "not accepted" actually (virtqueue_pop() not called)... sorry for the confusion. :-\ > How does it happen, anyway? qemu-side, I mean... Does it move the buffer > to used ring as soon as it has fetched the request? AFAICS, it doesn't - > virtqueue_push() is called just before pdu_free(); we might get complications > in case of TFLUSH handling (queue with MAX_REQ-1 requests submitted, TFLUSH > arrives, cancel_pdu is found and ->cancelled is set on it, then v9fs_flush() > waits for it to complete. Once the damn thing is done, buffer is released by > virtqueue_push(), but pdu freeing is delayed until v9fs_flush() gets woken > up. In the meanwhile, another request arrives into the slot of freed by > that virtqueue_push() and we are out of pdus. > Indeed. Even if this doesn't seem to be the problem here, I guess this should be fixed. > So it could happen, and the things might get unpleasant to some extent, but... > no TFLUSH had been present in all that traffic. And none of the stuck > processes had been spinning in p9_virtio_request(), so they *did* find > ring slots... So we're back to your previous proposal of checking if virtqueue_kick() returned false... -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Jan 07, 2017 at 04:10:45PM +0100, Greg Kurz wrote: > > virtqueue_push(), but pdu freeing is delayed until v9fs_flush() gets woken > > up. In the meanwhile, another request arrives into the slot of freed by > > that virtqueue_push() and we are out of pdus. > > > > Indeed. Even if this doesn't seem to be the problem here, I guess this should > be fixed. FWIW, there's something that looks like an off-by-one in v9fs_device_realize_common(): /* initialize pdu allocator */ QLIST_INIT(&s->free_list); QLIST_INIT(&s->active_list); for (i = 0; i < (MAX_REQ - 1); i++) { QLIST_INSERT_HEAD(&s->free_list, &s->pdus[i], next); s->pdus[i].s = s; s->pdus[i].idx = i; } Had been there since the original merge of 9p support into qemu - that code had moved around a bit, but it had never inserted s->pdus[MAX_REQ - 1] into free list. So your scenario with failing pdu_alloc() is still possible. In that log the total amount of pending requests has reached 128 for the first time right when the requests had stopped being handled and even though it had dropped below that shortly after, extra requests being put into queue had not been processed at all... I'm not familiar with qemu guts enough to tell if that's a plausible scenario, though... shouldn't subsequent queue insertions (after enough slots had been released) simply trigger virtio_queue_notify_vq() again? It *is* a bug (if we get a burst filling a previously empty queue all at once, there won't be any slots becoming freed), but that's obviously not the case here - slots were getting freed, after all. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Jan 07, 2017 at 05:19:10PM +0000, Al Viro wrote: > released) simply trigger virtio_queue_notify_vq() again? It *is* a bug > (if we get a burst filling a previously empty queue all at once, there won't > be any slots becoming freed Umm... Nope, even in that scenario we'll have all requests except the last one processed. I'm trying to put together a simpler reproducer, but... no luck so far. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Jan 07, 2017 at 06:15:23PM +0000, Al Viro wrote: > On Sat, Jan 07, 2017 at 05:19:10PM +0000, Al Viro wrote: > > > released) simply trigger virtio_queue_notify_vq() again? It *is* a bug > > (if we get a burst filling a previously empty queue all at once, there won't > > be any slots becoming freed > > Umm... Nope, even in that scenario we'll have all requests except the last > one processed. I'm trying to put together a simpler reproducer, but... > no luck so far. FWIW, here's another fun bug in qemu 9pfs: client may issue multiple Tflush on the same pending request. Server must not reply to them out of order and it must reply to at least the last one. If client has sent more than one Tflush, it must treat an arrival of Rflush to one of those as if it has received an Rflush to each of the earlier Tflush as well (IOW, *their* tags are released). Client may not reuse the tag of victim request until it has received Rflush for the last Tflush it has sent. Linux kernel-side 9p client doesn't issue such multiple Tflush, but the protocol allows that. qemu server does not guarantee in-order replies to multiple Tflush; worse, while it is likely to reply only to one of those, it may very well be the _first_ one. Subsequent ones will be lost; moreover, it'll leak both coroutines and ring slots. AFAICS, a clean way to deal with that would be to handle Tflush synchronously, right in pdu_submit(): if pdu->type == Tflush look the victim request up. if !victim || victim == pdu // [*] send reply and free pdu immediately. if victim->type == Tflush // [**] pdu->victim = victim->victim else pdu->victim = victim mark the victim cancelled add pdu to the tail of pdu->victim->flushes and let pdu_complete(pdu) send a reply to each request in pdu->flushes as well (freeing them as we go) Some locking might be needed to avoid the races between pdu_submit() and pdu_complete(), but it's not going to be a wide critical area. I'm not familiar with qemu locking primitives, sorry; what's providing an exclusion between QLIST_INSERT_HEAD in pdu_alloc() (from handle_9p_output()) and QLIST_REMOVE in pdu_free() (from pdu_complete())? Looks like the same thing ought to suffice here... [*] a cute way to hurt the current qemu server, BTW - coroutine that waits for itself to complete... [**] Tflush to Tflush is another fun corner case - it does *not* do anything to the earlier Tflush, but reply to it must follow that to original Tflush to avoid tag reuse problems. Note that this is not the "multiple Tflush" case mentioned above - those would all have oldtag pointing to the same request; they are not chained and unlike this one can happen with legitimate clients. Tflush to Tflush, OTOH, is not something a sane client would do. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, 7 Jan 2017 17:19:10 +0000 Al Viro <viro@ZenIV.linux.org.uk> wrote: > On Sat, Jan 07, 2017 at 04:10:45PM +0100, Greg Kurz wrote: > > > virtqueue_push(), but pdu freeing is delayed until v9fs_flush() gets woken > > > up. In the meanwhile, another request arrives into the slot of freed by > > > that virtqueue_push() and we are out of pdus. > > > > > > > Indeed. Even if this doesn't seem to be the problem here, I guess this should > > be fixed. > > FWIW, there's something that looks like an off-by-one in > v9fs_device_realize_common(): > /* initialize pdu allocator */ > QLIST_INIT(&s->free_list); > QLIST_INIT(&s->active_list); > for (i = 0; i < (MAX_REQ - 1); i++) { > QLIST_INSERT_HEAD(&s->free_list, &s->pdus[i], next); > s->pdus[i].s = s; > s->pdus[i].idx = i; > } > > Had been there since the original merge of 9p support into qemu - that code > had moved around a bit, but it had never inserted s->pdus[MAX_REQ - 1] into > free list. So your scenario with failing pdu_alloc() is still possible. Indeed, this (MAX_REQ - 1) thing looks wrong. Thanks for poiting that out. > In that log the total amount of pending requests has reached 128 for the > first time right when the requests had stopped being handled and even > though it had dropped below that shortly after, extra requests being put > into queue had not been processed at all... > > I'm not familiar with qemu guts enough to tell if that's a plausible scenario, > though... shouldn't subsequent queue insertions (after enough slots had been > released) simply trigger virtio_queue_notify_vq() again? It *is* a bug > (if we get a burst filling a previously empty queue all at once, there won't > be any slots becoming freed), but that's obviously not the case here - > slots were getting freed, after all. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, 7 Jan 2017 17:19:10 +0000 Al Viro <viro@ZenIV.linux.org.uk> wrote: > On Sat, Jan 07, 2017 at 04:10:45PM +0100, Greg Kurz wrote: > > > virtqueue_push(), but pdu freeing is delayed until v9fs_flush() gets woken > > > up. In the meanwhile, another request arrives into the slot of freed by > > > that virtqueue_push() and we are out of pdus. > > > > > > > Indeed. Even if this doesn't seem to be the problem here, I guess this should > > be fixed. > > FWIW, there's something that looks like an off-by-one in > v9fs_device_realize_common(): > /* initialize pdu allocator */ > QLIST_INIT(&s->free_list); > QLIST_INIT(&s->active_list); > for (i = 0; i < (MAX_REQ - 1); i++) { > QLIST_INSERT_HEAD(&s->free_list, &s->pdus[i], next); > s->pdus[i].s = s; > s->pdus[i].idx = i; > } > > Had been there since the original merge of 9p support into qemu - that code > had moved around a bit, but it had never inserted s->pdus[MAX_REQ - 1] into > free list. So your scenario with failing pdu_alloc() is still possible. > In that log the total amount of pending requests has reached 128 for the > first time right when the requests had stopped being handled and even > though it had dropped below that shortly after, extra requests being put > into queue had not been processed at all... > Yes, this does seem to be related to this or otherwise MAX_REQ related! - Bumping MAX_REQ up to 1024 makes the hang go away (on 4.7). - Dropping it to 64 makes the same hang happen on kernels where it worked before (I tried 4.4.x). - Doing s/(MAX_REQ - 1)/MAX_REQ/ makes the hang go away. I tested QEMU 2.8.0 as well and the behaviour is the same there. Here are the logs for 4.4 hanging with MAX_REQ == 64 without the loop condition changed: https://gist.githubusercontent.com/dezgeg/b5f0b7f8a0f3d8b6acb1566d7edcb2f0/raw/00241426890ea28d844986243c3b706881432fb4/9p-44.dmesg.log https://gist.githubusercontent.com/dezgeg/2bffe6c0271c4c9c382ac6363ce1864b/raw/92329eaef38305f82090de5dde3c944561afa372/9p-44.qemu.log -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jan 09, 2017 at 08:39:31PM +0200, Tuomas Tynkkynen wrote: > Yes, this does seem to be related to this or otherwise MAX_REQ related! > - Bumping MAX_REQ up to 1024 makes the hang go away (on 4.7). > - Dropping it to 64 makes the same hang happen on kernels where it worked > before (I tried 4.4.x). > - Doing s/(MAX_REQ - 1)/MAX_REQ/ makes the hang go away. Note that it's still possible to trigger the same situation with that off-by-one taken care of; if client sends 64 Treadlink and 64 Tflush (one for each of those), then follows by another pile of Treadlink (feeding them in as soon as free slots appear), the server can end up with failing pdu_alloc() - completion of readlink will release its slot immediately, but pdu won't get freed until the corresponding flush has gotten the CPU. I'm _not_ familiar with scheduling in qemu and I don't quite understand the mechanism of getting from "handle_9p_output() bailed out with some requests still not processed" to "no further requests get processed", so it might be that for some reason triggering the former as described above won't escalate to the latter, but I wouldn't count upon that. Another thing I'm very certain about is that 9 0 0 0 108 1 0 1 0 sent by broken client (Tflush tag = 1 oldtag = 1) will do nasty things to qemu server. v9fs_flush() will try to find the pdu of request to cancel, find its own argument, put itself on its ->complete and yield CPU, expecting to get back once the victim gets through to pdu_complete(). Since the victim is itself... AFAICS, the things client can expect wrt Tflush handling are * in no case should Rflush be sent before the reply to request its trying to cancel * the only case when server _may_ not send Rflush is the arrival of more than one Tflush with the same oldtag; in that case it is allowed to suppress replies to earlier ones. If they are not suppressed, replies should come in the order of Tflush arrivals. * if reply to Tflush is sent (see above), it must be Rflush. * multiple Tflush with the same oldtag are allowed; Linux kernel client does not issue those, but other clients might. As the matter of fact, Plan 9 kernel client *does* issue those. * Tflush to Tflush is no-op; it still needs a reply, and ordering constraints apply (it can't be sent before the reply to Tflush it's been refering to, which, in turn, can't be sent before the reply to request the first Tflush refers to). Normally such requests are not sent, but in principle they are allowed. * Tflush to request that isn't being processed should be answered immediately. The same goes for Tflush refering to itself. The former is not an error (we might have already sent a reply), but the latter might be worth a loud warning - clients are definitely not supposed to do that. It still needs Rflush in response - Rerror is not allowed. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, 8 Jan 2017 05:46:39 +0000 Al Viro <viro@ZenIV.linux.org.uk> wrote: > On Sat, Jan 07, 2017 at 06:15:23PM +0000, Al Viro wrote: > > On Sat, Jan 07, 2017 at 05:19:10PM +0000, Al Viro wrote: > > > > > released) simply trigger virtio_queue_notify_vq() again? It *is* a bug > > > (if we get a burst filling a previously empty queue all at once, there won't > > > be any slots becoming freed > > > > Umm... Nope, even in that scenario we'll have all requests except the last > > one processed. I'm trying to put together a simpler reproducer, but... > > no luck so far. > > FWIW, here's another fun bug in qemu 9pfs: client may issue multiple > Tflush on the same pending request. Server must not reply to them > out of order and it must reply to at least the last one. If client has > sent more than one Tflush, it must treat an arrival of Rflush to one of > those as if it has received an Rflush to each of the earlier Tflush as > well (IOW, *their* tags are released). Client may not reuse the tag of > victim request until it has received Rflush for the last Tflush it has > sent. > > Linux kernel-side 9p client doesn't issue such multiple Tflush, but the > protocol allows that. > > qemu server does not guarantee in-order replies to multiple Tflush; worse, > while it is likely to reply only to one of those, it may very well be > the _first_ one. Subsequent ones will be lost; moreover, it'll leak > both coroutines and ring slots. > Yeah I'm aware about that, I had started to work on a fix but it's low priority on my TODO list since linux guests don't do that... and I got scheduled. > AFAICS, a clean way to deal with that would be to handle Tflush synchronously, > right in pdu_submit(): > if pdu->type == Tflush > look the victim request up. > if !victim || victim == pdu // [*] > send reply and free pdu immediately. > if victim->type == Tflush // [**] > pdu->victim = victim->victim > else > pdu->victim = victim > mark the victim cancelled > add pdu to the tail of pdu->victim->flushes > and let pdu_complete(pdu) send a reply to each request in pdu->flushes > as well (freeing them as we go) > I had kept the asynchronous way and it resulted in quite convoluted code. I'll give a try as per your suggestion. > Some locking might be needed to avoid the races between pdu_submit() and > pdu_complete(), but it's not going to be a wide critical area. I'm not > familiar with qemu locking primitives, sorry; what's providing an exclusion > between QLIST_INSERT_HEAD in pdu_alloc() (from handle_9p_output()) > and QLIST_REMOVE in pdu_free() (from pdu_complete())? Looks like the same > thing ought to suffice here... > This code runs in coroutines (stack switching, see include/qemu/coroutine.h for details): it is serialized. > [*] a cute way to hurt the current qemu server, BTW - coroutine that waits for > itself to complete... > Indeed :) > [**] Tflush to Tflush is another fun corner case - it does *not* do anything > to the earlier Tflush, but reply to it must follow that to original Tflush > to avoid tag reuse problems. Note that this is not the "multiple Tflush" > case mentioned above - those would all have oldtag pointing to the same > request; they are not chained and unlike this one can happen with legitimate > clients. Tflush to Tflush, OTOH, is not something a sane client would do. I agree I don't see when a client would want to do that but FWIW, it is mentioned in the last paragraph of http://man.cat-v.org/plan_9/5/flush :) Thanks for your valuable suggestions, Al! Cheers. -- Greg -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index aea7e9d..8a6b426 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -662,6 +662,7 @@ static void coroutine_fn pdu_complete(V9fsPDU *pdu, ssize_t len) /* fill out the header */ pdu_marshal(pdu, 0, "dbw", (int32_t)len, id, pdu->tag); + fprintf(stderr, "complete %d %04x\n", id, pdu->tag); /* keep these in sync */ pdu->size = len; @@ -2347,6 +2348,7 @@ static void v9fs_flush(void *opaque) return; } trace_v9fs_flush(pdu->tag, pdu->id, tag); + fprintf(stderr, "flush %04x %04x\n", tag, pdu->tag); QLIST_FOREACH(cancel_pdu, &s->active_list, next) { if (cancel_pdu->tag == tag) { diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c index 1782e4a..6a5ac04 100644 --- a/hw/9pfs/virtio-9p-device.c +++ b/hw/9pfs/virtio-9p-device.c @@ -76,6 +76,7 @@ static void handle_9p_output(VirtIODevice *vdev, VirtQueue *vq) pdu->id = out.id; pdu->tag = le16_to_cpu(out.tag_le); + fprintf(stderr, "out %d %04x\n", pdu->id, pdu->tag); qemu_co_queue_init(&pdu->complete); pdu_submit(pdu);