diff mbox

[V9fs-developer] net: trans_rdma: remove unused function

Message ID 1374497956-32104-1-git-send-email-andi@etezian.org (mailing list archive)
State New, archived
Headers show

Commit Message

Andi Shyti July 22, 2013, 12:59 p.m. UTC
This patch gets rid of the following warning:

net/9p/trans_rdma.c:594:12: warning: ‘rdma_cancelled’ defined but not used [-Wunused-function]
 static int rdma_cancelled(struct p9_client *client, struct p9_req_t *req)

The rdma_cancelled function is not called anywhere in the kernel

Signed-off-by: Andi Shyti <andi@etezian.org>
---
 net/9p/trans_rdma.c | 11 -----------
 1 file changed, 11 deletions(-)

Comments

David Miller July 24, 2013, 10:46 p.m. UTC | #1
From: Andi Shyti <andi@etezian.org>

Date: Mon, 22 Jul 2013 14:59:16 +0200

> This patch gets rid of the following warning:

> 

> net/9p/trans_rdma.c:594:12: warning: ‘rdma_cancelled’ defined but not used [-Wunused-function]

>  static int rdma_cancelled(struct p9_client *client, struct p9_req_t *req)

> 

> The rdma_cancelled function is not called anywhere in the kernel

> 

> Signed-off-by: Andi Shyti <andi@etezian.org>


Applied to net-next, thanks.
------------------------------------------------------------------------------
See everything from the browser to the database with AppDynamics
Get end-to-end visibility with application monitoring from AppDynamics
Isolate bottlenecks and diagnose root cause in seconds.
Start your free trial of AppDynamics Pro today!
http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
Paul Bolle July 24, 2013, 11:09 p.m. UTC | #2
On Wed, 2013-07-24 at 15:46 -0700, David Miller wrote:
> From: Andi Shyti <andi@etezian.org>
> > This patch gets rid of the following warning:
> > 
> > net/9p/trans_rdma.c:594:12: warning: ‘rdma_cancelled’ defined but not used [-Wunused-function]
> >  static int rdma_cancelled(struct p9_client *client, struct p9_req_t *req)
> > 
> > The rdma_cancelled function is not called anywhere in the kernel
> > 
> > Signed-off-by: Andi Shyti <andi@etezian.org>
> 
> Applied to net-next, thanks.

After this patch one might as well revert the rest of commit
80b45261a0b2 ("9P: Add cancelled() to the transport functions.") too. It
seems the entire cancelled callback stuff is now pointless.

As I already asked in https://lkml.org/lkml/2013/7/15/87 : did that
commit "forget to actually hook up rdma_cancelled() into
p9_rdma_trans()?". It does look so to me.


Paul Bolle



------------------------------------------------------------------------------
See everything from the browser to the database with AppDynamics
Get end-to-end visibility with application monitoring from AppDynamics
Isolate bottlenecks and diagnose root cause in seconds.
Start your free trial of AppDynamics Pro today!
http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
David Miller July 24, 2013, 11:45 p.m. UTC | #3
From: Paul Bolle <pebolle@tiscali.nl>

Date: Thu, 25 Jul 2013 01:09:47 +0200

> On Wed, 2013-07-24 at 15:46 -0700, David Miller wrote:

>> From: Andi Shyti <andi@etezian.org>

>> > This patch gets rid of the following warning:

>> > 

>> > net/9p/trans_rdma.c:594:12: warning: ‘rdma_cancelled’ defined but not used [-Wunused-function]

>> >  static int rdma_cancelled(struct p9_client *client, struct p9_req_t *req)

>> > 

>> > The rdma_cancelled function is not called anywhere in the kernel

>> > 

>> > Signed-off-by: Andi Shyti <andi@etezian.org>

>> 

>> Applied to net-next, thanks.

> 

> After this patch one might as well revert the rest of commit

> 80b45261a0b2 ("9P: Add cancelled() to the transport functions.") too. It

> seems the entire cancelled callback stuff is now pointless.

> 

> As I already asked in https://lkml.org/lkml/2013/7/15/87 : did that

> commit "forget to actually hook up rdma_cancelled() into

> p9_rdma_trans()?". It does look so to me.


If nobody responds to this in the next day or so, feel free to send me
a patch which rips it all out.

No response means they don't care, and neither, therefore, should we.

Thanks.
------------------------------------------------------------------------------
See everything from the browser to the database with AppDynamics
Get end-to-end visibility with application monitoring from AppDynamics
Isolate bottlenecks and diagnose root cause in seconds.
Start your free trial of AppDynamics Pro today!
http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
Dominique Martinet July 25, 2013, 6:14 a.m. UTC | #4
Hi,

David Miller wrote on Wed, Jul 24, 2013 :
> From: Paul Bolle <pebolle@tiscali.nl>
> Date: Thu, 25 Jul 2013 01:09:47 +0200
> > After this patch one might as well revert the rest of commit
> > 80b45261a0b2 ("9P: Add cancelled() to the transport functions.") too. It
> > seems the entire cancelled callback stuff is now pointless.
> > 
> > As I already asked in https://lkml.org/lkml/2013/7/15/87 : did that
> > commit "forget to actually hook up rdma_cancelled() into
> > p9_rdma_trans()?". It does look so to me.
> 
> If nobody responds to this in the next day or so, feel free to send me
> a patch which rips it all out.
> 
> No response means they don't care, and neither, therefore, should we.

Well, I do care - but I couldn't find where the trans->cancelled member
function was supposed to be called anyway...
So adding it to the struct and fixing the warning is well and fine, but
if it's still never called in the end I don't see much point and there's
nothing to test.

It's on my loooong todo list of things to look at, but as no other reply
came and the patch was never picked up it kind of dropped in priority.
I'll have another look next week

I guess the worst that could happen would be submitting the
rdma_cancelled function again when this is sorted out :)

Regards,
Dominique Martinet July 25, 2013, 6:48 a.m. UTC | #5
Hard morning, sorry for the double mail.

Dominique Martinet wrote on Thu, Jul 25, 2013 :
> Well, I do care - but I couldn't find where the trans->cancelled member
> function was supposed to be called anyway...
> So adding it to the struct and fixing the warning is well and fine, but
> if it's still never called in the end I don't see much point and there's
> nothing to test.

To be more precise, there's a single call to c->trans_mode->cancelled in
net/9p/client.c, in p9_client_flush, which is called on subfunction returning
-ERESTARTSYS... which never happens as far as I could see.

This will be useful once/if we start working on client recovery, though
- so the function in itself definitely does interest me, and I guess
that thinking about I would have preferred to have the hook added rather
than the function removed.
But there definitely is no hurry to add this cancelled function till
then.


Regards,
David Miller July 25, 2013, 8:35 a.m. UTC | #6
From: Andi Shyti <andi@etezian.org>
Date: Thu, 25 Jul 2013 10:27:41 +0200

> I would prefer to NACK my patch, it messes up things more than
> fixes.

Your patch it already applied to my tree, so you must send me
something which fixes things.

> If we really want to get rid of this function we should
> completely revert this patch:
> 
> 80b45261a0b263536b043c5ccfc4ba4fc27c2acc

Such as this.

------------------------------------------------------------------------------
See everything from the browser to the database with AppDynamics
Get end-to-end visibility with application monitoring from AppDynamics
Isolate bottlenecks and diagnose root cause in seconds.
Start your free trial of AppDynamics Pro today!
http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
Eric Van Hensbergen July 25, 2013, 5:31 p.m. UTC | #7
On Thu, Jul 25, 2013 at 1:48 AM, Dominique Martinet <
dominique.martinet@cea.fr> wrote:

> Hard morning, sorry for the double mail.
>
> Dominique Martinet wrote on Thu, Jul 25, 2013 :
> > Well, I do care - but I couldn't find where the trans->cancelled member
> > function was supposed to be called anyway...
> > So adding it to the struct and fixing the warning is well and fine, but
> > if it's still never called in the end I don't see much point and there's
> > nothing to test.
>
> To be more precise, there's a single call to c->trans_mode->cancelled in
> net/9p/client.c, in p9_client_flush, which is called on subfunction
> returning
> -ERESTARTSYS... which never happens as far as I could see.
>
>
-ERESTARTSYS will happen in a variety of contexts, most likely though is
that the user tried to interrupt a pending blocked operation on the remote
file system (this is pretty common in Plan 9 world where I/O operations can
be long-standing with synthetic file systems).


> This will be useful once/if we start working on client recovery, though
> - so the function in itself definitely does interest me, and I guess
> that thinking about I would have preferred to have the hook added rather
> than the function removed.
> But there definitely is no hurry to add this cancelled function till
> then.
>

So, the cancel function should be used to flush any pending requests that
haven't actually been sent yet.  Looking at the 9p RDMA code, it looks like
the thought was that this wasn't going to be possible.  Regardless of
removing unsent requests, the flush will still be sent and if the server
processes it before the original request and sends a flush response back
then we need to clear the posted buffer.  This is what rdma_cancelled is
supposed to be doing.  So, the fix is to hook it into the structure -- but
looking at the code it seems like we probably need to do something more to
reclaim the buffer rather than just incrementing a counter.

To be clear this has less to do with recovery and more to do with the
proper implementation of 9p flush semantics.  By and large, those semantics
won't impact static file system users -- but if anyone is using the
transport to access synthetic filesystems or files then they'll definitely
want to have a properly implemented flush setup.  The way to test this is
to get a blocking read on a remote named pipe or fifo and then ^C it.

      -eric
------------------------------------------------------------------------------
See everything from the browser to the database with AppDynamics
Get end-to-end visibility with application monitoring from AppDynamics
Isolate bottlenecks and diagnose root cause in seconds.
Start your free trial of AppDynamics Pro today!
http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
Dominique Martinet July 26, 2013, 7:01 a.m. UTC | #8
I think I need to stop sending mails before triple-checking things!
So sorry for the multiple mails again.

Dominique Martinet wrote on Thu, Jul 25, 2013 :
> [rdma_cancelled]
> There is one problem though - if the server handles the original request
> before getting the flush, the receive buffer will be consumed and we
> won't send a new one, so we'll starve the reception queue.
> I'm afraid I don't have any bright idea there...

This still looks correct to me.


> While we are on reception buffer issues, there is another problem with
> the queue of receive buffers, even without flush, in the following
> scenario:
>  - post a buffer for tag 0, on a hanging request
>  - post a buffer for tag 1
>  - reply for tag 1 will come on buffer from tag 0
>  - post another request with tag 1.. the buffer already is in the queue,
> and we don't know we can post the buffer associated with tag 0 back.

It actually looks like the reply buffers are swapped properly - taken
out of the req struct into the context on send, then given back to the
appropriate req on reception, so on normal operation there's no problem
with what I described - sorry for crying wolf.
 
> I haven't found how to reproduce this perfectly yet, but a dd with
> blocksize 1MB and one with blocksize 10B in parallel brought the
> mountpoint down (and the whole server was completely unavailable for the
> duration of the dd - TCP sessions timed out, I even got IO errors on the
> local disk :D)

I need to run more tests to explain what happens with the two dds, but
it's easily reproductible with debugs on, I guess that helps with a race
somewhere.

Regards,
Eric Van Hensbergen July 26, 2013, 1:42 p.m. UTC | #9
On Thu, Jul 25, 2013 at 2:05 PM, Dominique Martinet <
dominique.martinet@cea.fr> wrote:

> Eric Van Hensbergen wrote on Thu, Jul 25, 2013 :
> > So, the cancel function should be used to flush any pending requests that
> > haven't actually been sent yet.  Looking at the 9p RDMA code, it looks
> like
> > the thought was that this wasn't going to be possible.  Regardless of
> > removing unsent requests, the flush will still be sent and if the server
> > processes it before the original request and sends a flush response back
> > then we need to clear the posted buffer.  This is what rdma_cancelled is
> > supposed to be doing.  So, the fix is to hook it into the structure --
> but
> > looking at the code it seems like we probably need to do something more
> to
> > reclaim the buffer rather than just incrementing a counter.
> >
> > To be clear this has less to do with recovery and more to do with the
> > proper implementation of 9p flush semantics.  By and large, those
> semantics
> > won't impact static file system users -- but if anyone is using the
> > transport to access synthetic filesystems or files then they'll
> definitely
> > want to have a properly implemented flush setup.  The way to test this is
> > to get a blocking read on a remote named pipe or fifo and then ^C it.
>
> Ok, I knew about the concept of flush but didn't think a ^C would cause
> a -ESYSRESTART, so didn't think of that.
> That said, reading from, say, a fifo is an entierly local operation: the
> client does a walk, getattr, doesn't do anything 9p-wise, and clunks
> when it's done with it.
>
>
>
It depends on the flags you use with 9p.  If you mount with nodevmap it'll
treat them like normal files.  Alternatively you can use synthetics from
fuse or even 9p file systems from Plan 9 port.


>
> As for the function needing a bit more work, there's a race, but on
> "normal" requests I think it is about right - the answer lays in a
> comment in rdma_request:
>
>   /* When an error occurs between posting the recv and the send,
>    * there will be a receive context posted without a pending request.
>    * Since there is no way to "un-post" it, we remember it and skip
>    * post_recv() for the next request.
>    * So here,
>    * see if we are this `next request' and need to absorb an excess rc.
>    * If yes, then drop and free our own, and do not recv_post().
>    **/
>
> Basically, receive buffers are sent in a queue, and we can't "retrieve"
> it back, so we just don't sent next one.
>
> There is one problem though - if the server handles the original request
> before getting the flush, the receive buffer will be consumed and we
> won't send a new one, so we'll starve the reception queue.
> I'm afraid I don't have any bright idea there...
>
>
I need to go back and refresh more of my RDMA knowledge, I thought you
could provide a better more direct coupling between the request and
response buffers (similar to what we do in virtio).  In such a case, you'd
simply need to reclaim the "skipped" buffer not shuffle everything around.
Ultimately that would seem to work against the point of having "zero copy"
RDMA.


>
> I haven't found how to reproduce this perfectly yet, but a dd with
> blocksize 1MB and one with blocksize 10B in parallel brought the
> mountpoint down (and the whole server was completely unavailable for the
> duration of the dd - TCP sessions timed out, I even got IO errors on the
> local disk :D)
>

Sounds like you ran out of memory and things didn't degrade gracefully.
 Which server are you using?

        -eric
------------------------------------------------------------------------------
See everything from the browser to the database with AppDynamics
Get end-to-end visibility with application monitoring from AppDynamics
Isolate bottlenecks and diagnose root cause in seconds.
Start your free trial of AppDynamics Pro today!
http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
Dominique Martinet July 26, 2013, 3:17 p.m. UTC | #10
Eric Van Hensbergen wrote on Fri, Jul 26, 2013 :
> It depends on the flags you use with 9p.  If you mount with nodevmap it'll
> treat them like normal files.  Alternatively you can use synthetics from
> fuse or even 9p file systems from Plan 9 port.

Ok, thank you for the explanation. Now I need to see if my server
supports this :)

> I need to go back and refresh more of my RDMA knowledge, I thought you
> could provide a better more direct coupling between the request and
> response buffers (similar to what we do in virtio).  In such a case, you'd
> simply need to reclaim the "skipped" buffer not shuffle everything around.
> Ultimately that would seem to work against the point of having "zero copy"
> RDMA.

What I said next was wrong, but the queue really isn't something you can
control this finely. Once something is posted, you can't take it back,
and you can't know which buffer will come back for which reply (but it's
easy enough to read the tag from the reply and associate it back)

For flush, I'm honestly not sure on how to deal with the possibility of
the servers processing the original query and sending a reply after
we've tagged our own buffer as free again, but I think there might
actually be a problem even on sucessful flush if the next post is from a
different tag (that has its own rc buffer) -- as far as I understand it,
this one's reply will be considered duplicate reply in handle_recv.
But I really need to do some more precise testing, I'll try playing with
nodevmap, if I get the flush functions to be called I'll get this
fixed.


> > I haven't found how to reproduce this perfectly yet, but a dd with
> > blocksize 1MB and one with blocksize 10B in parallel brought the
> > mountpoint down (and the whole server was completely unavailable for the
> > duration of the dd - TCP sessions timed out, I even got IO errors on the
> > local disk :D)
> >
> 
> Sounds like you ran out of memory and things didn't degrade gracefully.

I think it's actually "just" in the debug, there are two p9_debug printk
in p9_client_cb which are called from an rdma irq handler, and it gets
messy once the kernel buffer is full and it wants to flush to the
console.
I've got a short patch that adds an argument to p9_client_cb with an
indication that it shouldn't output messages, I'll send it to the v9fs
mailing list for comments if you think the idea is acceptable.

>  Which server are you using?

I know of two servers out there which handle 9P/RDMA - diod (
http://code.google.com/p/diod/ ) and nfs-ganesha (
https://github.com/nfs-ganesha/nfs-ganesha )
I wrote the RDMA implementation for ganesha and am mostly running tests
on that, but both should somewhat work with their quirks :)

Regards,
David Miller July 30, 2013, 10:54 p.m. UTC | #11
From: Andi Shyti <andi@etezian.org>
Date: Thu, 25 Jul 2013 10:54:24 +0200

> This patch reverts commit
> 
> 80b45261a0b263536b043c5ccfc4ba4fc27c2acc
> 
> which was implementing a 'cancelled' functionality to notify that
> a cancelled request will not be replied.
> 
> This implementation was not used anywhere and therefore removed.
> 
> Signed-off-by: Andi Shyti <andi@etezian.org>

Applied, thanks Andi.

------------------------------------------------------------------------------
Get your SQL database under version control now!
Version control is standard for application code, but databases havent 
caught up. So what steps can you take to put your SQL databases under 
version control? Why should you start doing it? Read more to find out.
http://pubads.g.doubleclick.net/gampad/clk?id=49501711&iu=/4140/ostg.clktrk
diff mbox

Patch

diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
index 928f2bb..8f68df5 100644
--- a/net/9p/trans_rdma.c
+++ b/net/9p/trans_rdma.c
@@ -588,17 +588,6 @@  static int rdma_cancel(struct p9_client *client, struct p9_req_t *req)
 	return 1;
 }
 
-/* A request has been fully flushed without a reply.
- * That means we have posted one buffer in excess.
- */
-static int rdma_cancelled(struct p9_client *client, struct p9_req_t *req)
-{
-	struct p9_trans_rdma *rdma = client->trans;
-
-	atomic_inc(&rdma->excess_rc);
-	return 0;
-}
-
 /**
  * trans_create_rdma - Transport method for creating atransport instance
  * @client: client instance