[4/8] lustre: fld: retry fld rpc even for ESHUTDOWN

Message ID 1564022647-17351-5-git-send-email-jsimmons@infradead.org
State New
Series
  • lustre: some old patches from whamcloud tree

Commit Message

James Simmons July 25, 2019, 2:44 a.m. UTC
From: wang di <di.wang@intel.com>

When the LWP is being evicted, a request may return ESHUTDOWN or
EWOULDBLOCK because the LWP is not replayable. Instead of failing,
which might cause an application failure, the fld client retries
the RPC until the connection is set up again or the LWP is closed.

WC-bug-id: https://jira.whamcloud.com/browse/LU-4420
Lustre-commit: d335e310d4bf490509998ddbb1824e38cff20998
Signed-off-by: wang di <di.wang@intel.com>
Reviewed-on: http://review.whamcloud.com/10285
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
---
 fs/lustre/fld/fld_request.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

Comments

Andreas Dilger Aug. 14, 2019, 4:58 p.m. UTC | #1
This is mostly useful only on the servers, but clients also allocate "sequences" of 1, so it doesn't hurt to apply it to clients as well. 

Cheers, Andreas

Andreas Dilger Aug. 14, 2019, 4:58 p.m. UTC | #2
While FLD is used on both client and server, the LWP connections are only used on servers. However, I don't see any harm in keeping this code consistent with the master branch.

Otherwise, you would have to remember to apply this patch at some point in the future when the server code is landed, which is more likely to leave the bug in place.

Cheers, Andreas


Patch

diff --git a/fs/lustre/fld/fld_request.c b/fs/lustre/fld/fld_request.c
index ec45ea6..ba0ef82 100644
--- a/fs/lustre/fld/fld_request.c
+++ b/fs/lustre/fld/fld_request.c
@@ -367,12 +367,12 @@  int fld_client_rpc(struct obd_export *exp,
 	rc = ptlrpc_queue_wait(req);
 	obd_put_request_slot(&exp->exp_obd->u.cli);
 	if (rc != 0) {
-		if (rc == -EWOULDBLOCK) {
-			/* For no_delay req(see above), EWOULDBLOCK means the
-			 * connection is being evicted, but this seq lookup
-			 * should not return error, since it would cause
-			 * unecessary failure of the application, instead
-			 * it should retry here
+		if (rc == -EWOULDBLOCK || rc == -ESHUTDOWN) {
+			/* For no_delay req(see above), EWOULDBLOCK and
+			 * ESHUTDOWN means the connection is being evicted,
+			 * but this seq lookup should not return error,
+			 * since it would cause unecessary failure of the
+			 * application, instead it should retry here
 			 */
 			ptlrpc_req_finished(req);
 			goto again;
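
The retry decision in the hunk above can be illustrated outside the kernel with a small userspace sketch. Everything here is hypothetical and for illustration only: fake_queue_wait() stands in for ptlrpc_queue_wait(), and fld_rc_is_transient() and lookup_with_retry() are made-up names. The attempt cap is also an assumption of the sketch; the hunk itself only shows an unconditional "goto again".

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for ptlrpc_queue_wait(): fails once with each of
 * the two codes named in the patch, then succeeds.
 */
static int fake_calls;

static int fake_queue_wait(void)
{
	switch (fake_calls++) {
	case 0:
		return -EWOULDBLOCK;
	case 1:
		return -ESHUTDOWN;
	default:
		return 0;
	}
}

/* True for the two return codes the patch treats as "retry". */
static bool fld_rc_is_transient(int rc)
{
	return rc == -EWOULDBLOCK || rc == -ESHUTDOWN;
}

/* Retry the fake RPC while it reports a transient error.
 * max_attempts is an assumption of this sketch, not part of the patch.
 * Returns the last result seen, or -EAGAIN if no attempt was made.
 */
static int lookup_with_retry(int max_attempts)
{
	int rc = -EAGAIN;
	int i;

	for (i = 0; i < max_attempts; i++) {
		rc = fake_queue_wait();
		if (!fld_rc_is_transient(rc))
			break;
	}
	return rc;
}
```

With fake_calls reset to 0, lookup_with_retry(5) makes three calls and returns 0, whereas a check for EWOULDBLOCK alone would have stopped at the second call and returned -ESHUTDOWN to the caller.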