Message ID | 1374497956-32104-1-git-send-email-andi@etezian.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
From: Andi Shyti <andi@etezian.org> Date: Mon, 22 Jul 2013 14:59:16 +0200 > This patch gets rid of the following warning: > > net/9p/trans_rdma.c:594:12: warning: ‘rdma_cancelled’ defined but not used [-Wunused-function] > static int rdma_cancelled(struct p9_client *client, struct p9_req_t *req) > > The rdma_cancelled function is not called anywhere in the kernel > > Signed-off-by: Andi Shyti <andi@etezian.org> Applied to net-next, thanks. ------------------------------------------------------------------------------ See everything from the browser to the database with AppDynamics Get end-to-end visibility with application monitoring from AppDynamics Isolate bottlenecks and diagnose root cause in seconds. Start your free trial of AppDynamics Pro today! http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
On Wed, 2013-07-24 at 15:46 -0700, David Miller wrote: > From: Andi Shyti <andi@etezian.org> > > This patch gets rid of the following warning: > > > > net/9p/trans_rdma.c:594:12: warning: ‘rdma_cancelled’ defined but not used [-Wunused-function] > > static int rdma_cancelled(struct p9_client *client, struct p9_req_t *req) > > > > The rdma_cancelled function is not called anywhere in the kernel > > > > Signed-off-by: Andi Shyti <andi@etezian.org> > > Applied to net-next, thanks. After this patch one might as well revert the rest of commit 80b45261a0b2 ("9P: Add cancelled() to the transport functions.") too. It seems the entire cancelled callback stuff is now pointless. As I already asked in https://lkml.org/lkml/2013/7/15/87 : did that commit "forget to actually hook up rdma_cancelled() into p9_rdma_trans()?". It does look so to me. Paul Bolle ------------------------------------------------------------------------------ See everything from the browser to the database with AppDynamics Get end-to-end visibility with application monitoring from AppDynamics Isolate bottlenecks and diagnose root cause in seconds. Start your free trial of AppDynamics Pro today! http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
From: Paul Bolle <pebolle@tiscali.nl> Date: Thu, 25 Jul 2013 01:09:47 +0200 > On Wed, 2013-07-24 at 15:46 -0700, David Miller wrote: >> From: Andi Shyti <andi@etezian.org> >> > This patch gets rid of the following warning: >> > >> > net/9p/trans_rdma.c:594:12: warning: ‘rdma_cancelled’ defined but not used [-Wunused-function] >> > static int rdma_cancelled(struct p9_client *client, struct p9_req_t *req) >> > >> > The rdma_cancelled function is not called anywhere in the kernel >> > >> > Signed-off-by: Andi Shyti <andi@etezian.org> >> >> Applied to net-next, thanks. > > After this patch one might as well revert the rest of commit > 80b45261a0b2 ("9P: Add cancelled() to the transport functions.") too. It > seems the entire cancelled callback stuff is now pointless. > > As I already asked in https://lkml.org/lkml/2013/7/15/87 : did that > commit "forget to actually hook up rdma_cancelled() into > p9_rdma_trans()?". It does look so to me. If nobody responds to this in the next day or so, feel free to send me a patch which rips it all out. No response means they don't care, and neither, therefore, should we. Thanks. ------------------------------------------------------------------------------ See everything from the browser to the database with AppDynamics Get end-to-end visibility with application monitoring from AppDynamics Isolate bottlenecks and diagnose root cause in seconds. Start your free trial of AppDynamics Pro today! http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
Hi, David Miller wrote on Wed, Jul 24, 2013 : > From: Paul Bolle <pebolle@tiscali.nl> > Date: Thu, 25 Jul 2013 01:09:47 +0200 > > After this patch one might as well revert the rest of commit > > 80b45261a0b2 ("9P: Add cancelled() to the transport functions.") too. It > > seems the entire cancelled callback stuff is now pointless. > > > > As I already asked in https://lkml.org/lkml/2013/7/15/87 : did that > > commit "forget to actually hook up rdma_cancelled() into > > p9_rdma_trans()?". It does look so to me. > > If nobody responds to this in the next day or so, feel free to send me > a patch which rips it all out. > > No response means they don't care, and neither, therefore, should we. Well, I do care - but I couldn't find where the trans->cancelled member function was supposed to be called anyway... So adding it to the struct and fixing the warning is well and fine, but if it's still never called in the end I don't see much point and there's nothing to test. It's on my loooong todo list of things to look at, but as no other reply came and the patch was never picked up it kind of dropped in priority. I'll have another look next week I guess the worst that could happen would be submitting the rdma_cancelled function again when this is sorted out :) Regards,
Hard morning, sorry for the double mail. Dominique Martinet wrote on Thu, Jul 25, 2013 : > Well, I do care - but I couldn't find where the trans->cancelled member > function was supposed to be called anyway... > So adding it to the struct and fixing the warning is well and fine, but > if it's still never called in the end I don't see much point and there's > nothing to test. To be more precise, there's a single call to c->trans_mode->cancelled in net/9p/client.c, in p9_client_flush, which is called on subfunction returning -ERESTARTSYS... which never happens as far as I could see. This will be useful once/if we start working on client recovery, though - so the function in itself definitely does interest me, and I guess that thinking about I would have preferred to have the hook added rather than the function removed. But there definitely is no hurry to add this cancelled function till then. Regards,
From: Andi Shyti <andi@etezian.org> Date: Thu, 25 Jul 2013 10:27:41 +0200 > I would prefer to NACK my patch, it messes up things more than > fixes. Your patch it already applied to my tree, so you must send me something which fixes things. > If we really want to get rid of this function we should > completely revert this patch: > > 80b45261a0b263536b043c5ccfc4ba4fc27c2acc Such as this. ------------------------------------------------------------------------------ See everything from the browser to the database with AppDynamics Get end-to-end visibility with application monitoring from AppDynamics Isolate bottlenecks and diagnose root cause in seconds. Start your free trial of AppDynamics Pro today! http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
On Thu, Jul 25, 2013 at 1:48 AM, Dominique Martinet < dominique.martinet@cea.fr> wrote: > Hard morning, sorry for the double mail. > > Dominique Martinet wrote on Thu, Jul 25, 2013 : > > Well, I do care - but I couldn't find where the trans->cancelled member > > function was supposed to be called anyway... > > So adding it to the struct and fixing the warning is well and fine, but > > if it's still never called in the end I don't see much point and there's > > nothing to test. > > To be more precise, there's a single call to c->trans_mode->cancelled in > net/9p/client.c, in p9_client_flush, which is called on subfunction > returning > -ERESTARTSYS... which never happens as far as I could see. > > -ERESTARTSYS will happen in a variety of contexts, most likely though is that the user tried to interrupt a pending blocked operation on the remote file system (this is pretty common in Plan 9 world where I/O operations can be long-standing with synthetic file systems). > This will be useful once/if we start working on client recovery, though > - so the function in itself definitely does interest me, and I guess > that thinking about I would have preferred to have the hook added rather > than the function removed. > But there definitely is no hurry to add this cancelled function till > then. > So, the cancel function should be used to flush any pending requests that haven't actually been sent yet. Looking at the 9p RDMA code, it looks like the thought was that this wasn't going to be possible. Regardless of removing unsent requests, the flush will still be sent and if the server processes it before the original request and sends a flush response back then we need to clear the posted buffer. This is what rdma_cancelled is supposed to be doing. So, the fix is to hook it into the structure -- but looking at the code it seems like we probably need to do something more to reclaim the buffer rather than just incrementing a counter. To be clear this has less to do with recovery and more to do with the proper implementation of 9p flush semantics. By and large, those semantics won't impact static file system users -- but if anyone is using the transport to access synthetic filesystems or files then they'll definitely want to have a properly implemented flush setup. The way to test this is to get a blocking read on a remote named pipe or fifo and then ^C it. -eric ------------------------------------------------------------------------------ See everything from the browser to the database with AppDynamics Get end-to-end visibility with application monitoring from AppDynamics Isolate bottlenecks and diagnose root cause in seconds. Start your free trial of AppDynamics Pro today! http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
I think I need to stop sending mails before triple-checking things! So sorry for the multiple mails again. Dominique Martinet wrote on Thu, Jul 25, 2013 : > [rdma_cancelled] > There is one problem though - if the server handles the original request > before getting the flush, the receive buffer will be consumed and we > won't send a new one, so we'll starve the reception queue. > I'm afraid I don't have any bright idea there... This still looks correct to me. > While we are on reception buffer issues, there is another problem with > the queue of receive buffers, even without flush, in the following > scenario: > - post a buffer for tag 0, on a hanging request > - post a buffer for tag 1 > - reply for tag 1 will come on buffer from tag 0 > - post another request with tag 1.. the buffer already is in the queue, > and we don't know we can post the buffer associated with tag 0 back. It actually looks like the reply buffers are swapped properly - taken out of the req struct into the context on send, then given back to the appropriate req on reception, so on normal operation there's no problem with what I described - sorry for crying wolf. > I haven't found how to reproduce this perfectly yet, but a dd with > blocksize 1MB and one with blocksize 10B in parallel brought the > mountpoint down (and the whole server was completely unavailable for the > duration of the dd - TCP sessions timed out, I even got IO errors on the > local disk :D) I need to run more tests to explain what happens with the two dds, but it's easily reproductible with debugs on, I guess that helps with a race somewhere. Regards,
On Thu, Jul 25, 2013 at 2:05 PM, Dominique Martinet < dominique.martinet@cea.fr> wrote: > Eric Van Hensbergen wrote on Thu, Jul 25, 2013 : > > So, the cancel function should be used to flush any pending requests that > > haven't actually been sent yet. Looking at the 9p RDMA code, it looks > like > > the thought was that this wasn't going to be possible. Regardless of > > removing unsent requests, the flush will still be sent and if the server > > processes it before the original request and sends a flush response back > > then we need to clear the posted buffer. This is what rdma_cancelled is > > supposed to be doing. So, the fix is to hook it into the structure -- > but > > looking at the code it seems like we probably need to do something more > to > > reclaim the buffer rather than just incrementing a counter. > > > > To be clear this has less to do with recovery and more to do with the > > proper implementation of 9p flush semantics. By and large, those > semantics > > won't impact static file system users -- but if anyone is using the > > transport to access synthetic filesystems or files then they'll > definitely > > want to have a properly implemented flush setup. The way to test this is > > to get a blocking read on a remote named pipe or fifo and then ^C it. > > Ok, I knew about the concept of flush but didn't think a ^C would cause > a -ESYSRESTART, so didn't think of that. > That said, reading from, say, a fifo is an entierly local operation: the > client does a walk, getattr, doesn't do anything 9p-wise, and clunks > when it's done with it. > > > It depends on the flags you use with 9p. If you mount with nodevmap it'll treat them like normal files. Alternatively you can use synthetics from fuse or even 9p file systems from Plan 9 port. > > As for the function needing a bit more work, there's a race, but on > "normal" requests I think it is about right - the answer lays in a > comment in rdma_request: > > /* When an error occurs between posting the recv and the send, > * there will be a receive context posted without a pending request. > * Since there is no way to "un-post" it, we remember it and skip > * post_recv() for the next request. > * So here, > * see if we are this `next request' and need to absorb an excess rc. > * If yes, then drop and free our own, and do not recv_post(). > **/ > > Basically, receive buffers are sent in a queue, and we can't "retrieve" > it back, so we just don't sent next one. > > There is one problem though - if the server handles the original request > before getting the flush, the receive buffer will be consumed and we > won't send a new one, so we'll starve the reception queue. > I'm afraid I don't have any bright idea there... > > I need to go back and refresh more of my RDMA knowledge, I thought you could provide a better more direct coupling between the request and response buffers (similar to what we do in virtio). In such a case, you'd simply need to reclaim the "skipped" buffer not shuffle everything around. Ultimately that would seem to work against the point of having "zero copy" RDMA. > > I haven't found how to reproduce this perfectly yet, but a dd with > blocksize 1MB and one with blocksize 10B in parallel brought the > mountpoint down (and the whole server was completely unavailable for the > duration of the dd - TCP sessions timed out, I even got IO errors on the > local disk :D) > Sounds like you ran out of memory and things didn't degrade gracefully. Which server are you using? -eric ------------------------------------------------------------------------------ See everything from the browser to the database with AppDynamics Get end-to-end visibility with application monitoring from AppDynamics Isolate bottlenecks and diagnose root cause in seconds. Start your free trial of AppDynamics Pro today! http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
Eric Van Hensbergen wrote on Fri, Jul 26, 2013 : > It depends on the flags you use with 9p. If you mount with nodevmap it'll > treat them like normal files. Alternatively you can use synthetics from > fuse or even 9p file systems from Plan 9 port. Ok, thank you for the explanation. Now I need to see if my server supports this :) > I need to go back and refresh more of my RDMA knowledge, I thought you > could provide a better more direct coupling between the request and > response buffers (similar to what we do in virtio). In such a case, you'd > simply need to reclaim the "skipped" buffer not shuffle everything around. > Ultimately that would seem to work against the point of having "zero copy" > RDMA. What I said next was wrong, but the queue really isn't something you can control this finely. Once something is posted, you can't take it back, and you can't know which buffer will come back for which reply (but it's easy enough to read the tag from the reply and associate it back) For flush, I'm honestly not sure on how to deal with the possibility of the servers processing the original query and sending a reply after we've tagged our own buffer as free again, but I think there might actually be a problem even on sucessful flush if the next post is from a different tag (that has its own rc buffer) -- as far as I understand it, this one's reply will be considered duplicate reply in handle_recv. But I really need to do some more precise testing, I'll try playing with nodevmap, if I get the flush functions to be called I'll get this fixed. > > I haven't found how to reproduce this perfectly yet, but a dd with > > blocksize 1MB and one with blocksize 10B in parallel brought the > > mountpoint down (and the whole server was completely unavailable for the > > duration of the dd - TCP sessions timed out, I even got IO errors on the > > local disk :D) > > > > Sounds like you ran out of memory and things didn't degrade gracefully. I think it's actually "just" in the debug, there are two p9_debug printk in p9_client_cb which are called from an rdma irq handler, and it gets messy once the kernel buffer is full and it wants to flush to the console. I've got a short patch that adds an argument to p9_client_cb with an indication that it shouldn't output messages, I'll send it to the v9fs mailing list for comments if you think the idea is acceptable. > Which server are you using? I know of two servers out there which handle 9P/RDMA - diod ( http://code.google.com/p/diod/ ) and nfs-ganesha ( https://github.com/nfs-ganesha/nfs-ganesha ) I wrote the RDMA implementation for ganesha and am mostly running tests on that, but both should somewhat work with their quirks :) Regards,
From: Andi Shyti <andi@etezian.org> Date: Thu, 25 Jul 2013 10:54:24 +0200 > This patch reverts commit > > 80b45261a0b263536b043c5ccfc4ba4fc27c2acc > > which was implementing a 'cancelled' functionality to notify that > a cancelled request will not be replied. > > This implementation was not used anywhere and therefore removed. > > Signed-off-by: Andi Shyti <andi@etezian.org> Applied, thanks Andi. ------------------------------------------------------------------------------ Get your SQL database under version control now! Version control is standard for application code, but databases havent caught up. So what steps can you take to put your SQL databases under version control? Why should you start doing it? Read more to find out. http://pubads.g.doubleclick.net/gampad/clk?id=49501711&iu=/4140/ostg.clktrk
diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c index 928f2bb..8f68df5 100644 --- a/net/9p/trans_rdma.c +++ b/net/9p/trans_rdma.c @@ -588,17 +588,6 @@ static int rdma_cancel(struct p9_client *client, struct p9_req_t *req) return 1; } -/* A request has been fully flushed without a reply. - * That means we have posted one buffer in excess. - */ -static int rdma_cancelled(struct p9_client *client, struct p9_req_t *req) -{ - struct p9_trans_rdma *rdma = client->trans; - - atomic_inc(&rdma->excess_rc); - return 0; -} - /** * trans_create_rdma - Transport method for creating atransport instance * @client: client instance
This patch gets rid of the following warning: net/9p/trans_rdma.c:594:12: warning: ‘rdma_cancelled’ defined but not used [-Wunused-function] static int rdma_cancelled(struct p9_client *client, struct p9_req_t *req) The rdma_cancelled function is not called anywhere in the kernel Signed-off-by: Andi Shyti <andi@etezian.org> --- net/9p/trans_rdma.c | 11 ----------- 1 file changed, 11 deletions(-)