From patchwork Tue Mar 19 10:16:57 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sebastian Riemer
X-Patchwork-Id: 2299961
Return-Path:
X-Original-To: patchwork-linux-rdma@patchwork.kernel.org
Delivered-To: patchwork-process-083081@patchwork2.kernel.org
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
 by patchwork2.kernel.org (Postfix) with ESMTP id 27C2ADFB79
 for ; Tue, 19 Mar 2013 10:16:56 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1755523Ab3CSKQz (ORCPT ); Tue, 19 Mar 2013 06:16:55 -0400
Received: from mail-bk0-f47.google.com ([209.85.214.47]:41847 "EHLO
 mail-bk0-f47.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1755093Ab3CSKQx (ORCPT ); Tue, 19 Mar 2013 06:16:53 -0400
Received: by mail-bk0-f47.google.com with SMTP id jc3so136778bkc.20
 for ; Tue, 19 Mar 2013 03:16:52 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=google.com; s=20120113;
 h=x-received:message-id:date:from:user-agent:mime-version:to:cc
 :subject:x-enigmail-version:content-type:x-gm-message-state;
 bh=mzZ/Z5sb5DcCd7FQSwQT9p5DlTcWMji9VBoYbwGCLuw=;
 b=pNoF2IarDe4sR5/tQA/pVxmYwa0/bj4BnGLiGN3RRM1+OB1aIfZQaCwJQLWu32I8vh
 8uqKcII0YOaDYRjo+7HglAtfcLsr3KAy3JnaZ8HZLOxbApF89TyKkjgt5Uvbc58wE73a
 IG5lt4+LFyi1s3JtB9H0Cz2hVOv7TraLbnbQIgWE5bdIu+vMkV4LyaZ6BdWyGXgsj1+D
 DD+u1ygqeLq245As7D/H1GOwMohmRX0wFwDhlMyCNLbNR6gil6lfpg/wIxxIbiqWStKO
 zlW29tam5TWNJOwKA+tJqIoCbqPIS6r/83WStBi7v6KKxVDvAUmI8kJ/NkELXAgoF9Rn
 uQ6w==
X-Received: by 10.204.146.141 with SMTP id h13mr5340888bkv.127.1363688212110;
 Tue, 19 Mar 2013 03:16:52 -0700 (PDT)
Received: from [192.168.88.16] ([62.217.45.26]) by mx.google.com with ESMTPS
 id gi19sm6301801bkc.2.2013.03.19.03.16.50
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Tue, 19 Mar 2013 03:16:50 -0700 (PDT)
Message-ID: <51483B19.1070201@profitbricks.com>
Date: Tue, 19 Mar 2013 11:16:57 +0100
From: Sebastian Riemer
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130221
 Thunderbird/17.0.3
MIME-Version: 1.0
To: Bart Van Assche
CC: "linux-rdma@vger.kernel.org"
Subject: [RFC ib_srp-backport] ib_srp: bind fast IO failing to QP timeout
X-Enigmail-Version: 1.5.1
X-Gm-Message-State: ALoCoQniLB9Wh/ceXxlBobGgykmyQd1rdMOx3It0AnWvuNd+6qxpWV6z2TV4fmivt9ayXQK8v/QH
Sender: linux-rdma-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-rdma@vger.kernel.org

Hi Bart,

SRP is a priority for me again.

I've also noticed that your ib_srp-backport doesn't fail the IO fast
enough. The fast_io_fail_tmo only comes into play after the QP has already
timed out, and the "terminate_rport_io" function is missing.

My idea is to use the QP retry count directly for fast IO failing. It is
7 by default and the QP timeout is approx. 2s. The overall QP timeout is
already approx. 35s (1+7 tries * 2s * 2, I guess). Using only 3 retries,
I'm at approx. 18s.

My patches introduce the retry count as a module parameter, as it is quite
difficult to set the QP from RTS to RTR again. Only there can the QP
timeout parameters be set.

My patch series isn't complete yet, as paths aren't reconnected - they are
only failed fast, bound to the overall QP timeout. But it should give you
an idea of what I'm trying to do here.

What are your thoughts regarding this?

Attached patches:
ib_srp: register srp_fail_rport_io as terminate_rport_io
ib_srp: be quiet when failing SCSI commands
scsi_transport_srp: disable the fast_io_fail_tmo parameter
ib_srp: show the QP timeout and retry count in srp_host sysfs files
ib_srp: introduce qp_retry_cnt module parameter

Cheers,
Sebastian

Btw.: Before this, I hacked MD RAID-1 for high-performance replication, as
DRBD is crap for our purposes. But that's worthless without a reliably
working transport.
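
The timeout estimate in the mail can be checked with a small user-space C
sketch. It assumes the InfiniBand encoding of the local ACK timeout,
4.096 us * 2^exponent, and an exponent of 19 (about 2.1s); that exponent is
an assumption picked to match the "approx. 2s" figure, not a value stated
in the mail. The trailing factor of 2 is the mail's own guess, and both
function names are hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical helper: per-try local ACK timeout in seconds.
 * Assumes the InfiniBand encoding 4.096 us * 2^timeout_exp.
 */
static double qp_ack_timeout_s(unsigned int timeout_exp)
{
	assert(timeout_exp < 32);	/* the timeout field is 5 bits wide */
	return 4.096e-6 * (double)(1ULL << timeout_exp);
}

/*
 * Hypothetical helper: overall time until the QP gives up, following the
 * mail's estimate of (1 + retries) tries * per-try timeout * 2.
 * The factor of 2 is the mail's guess, not a specified constant.
 */
static double qp_overall_timeout_s(unsigned int timeout_exp,
				   unsigned int retry_cnt)
{
	return (1.0 + retry_cnt) * qp_ack_timeout_s(timeout_exp) * 2.0;
}
```

Under these assumptions, qp_overall_timeout_s(19, 7) comes to about 34.4s
and qp_overall_timeout_s(19, 3) to about 17.2s, in line with the roughly
35s and 18s quoted above.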
From c101d00fe529d845192dd6d5930a1b9c16c99b81 Mon Sep 17 00:00:00 2001
From: Sebastian Riemer
Date: Wed, 13 Mar 2013 16:16:28 +0100
Subject: [PATCH 1/5] ib_srp: register srp_fail_rport_io as terminate_rport_io

We need to fail the IO fast in the selected time. So register the
missing terminate_rport_io function.

Signed-off-by: Sebastian Riemer
---
 drivers/infiniband/ulp/srp/ib_srp.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index dc49dc8..64644c5 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -756,6 +756,29 @@ static void srp_reset_req(struct srp_target_port *target, struct srp_request *re
 	}
 }
 
+static void srp_fail_req(struct srp_target_port *target, struct srp_request *req)
+{
+	struct scsi_cmnd *scmnd = srp_claim_req(target, req, NULL);
+
+	if (scmnd) {
+		srp_free_req(target, req, scmnd, 0);
+		scmnd->result = DID_TRANSPORT_FAILFAST << 16;
+		scmnd->scsi_done(scmnd);
+	}
+}
+
+static void srp_fail_rport_io(struct srp_rport *rport)
+{
+	struct srp_target_port *target = rport->lld_data;
+	int i;
+
+	for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) {
+		struct srp_request *req = &target->req_ring[i];
+		if (req->scmnd)
+			srp_fail_req(target, req);
+	}
+}
+
 static int srp_reconnect_target(struct srp_target_port *target)
 {
 	struct Scsi_Host *shost = target->scsi_host;
@@ -2700,6 +2723,7 @@ static void srp_remove_one(struct ib_device *device)
 
 static struct srp_function_template ib_srp_transport_functions = {
 	.rport_delete = srp_rport_delete,
+	.terminate_rport_io = srp_fail_rport_io,
 };
 
 static int __init srp_init_module(void)
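
The pattern in this patch (walk the request ring, claim each outstanding
command, complete it with a fail-fast result) can be tried outside the
kernel with a small mock. This is only a sketch: the mock_* types and
functions are hypothetical stand-ins for srp_request, scsi_cmnd and
srp_claim_req(), RING_SIZE stands in for SRP_CMD_SQ_SIZE, the value 0x0f
for DID_TRANSPORT_FAILFAST is taken from memory of include/scsi/scsi.h,
and the mock is single-threaded, so any locking the real driver does is
left out.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 4			/* stand-in for SRP_CMD_SQ_SIZE */
#define DID_TRANSPORT_FAILFAST 0x0f	/* assumed host byte value */

/* Hypothetical stand-in for struct scsi_cmnd. */
struct mock_cmd {
	int result;	/* host byte lives in bits 16..23 */
	int done;	/* set once the command has been completed */
};

/* Hypothetical stand-in for struct srp_request. */
struct mock_req {
	struct mock_cmd *cmd;	/* NULL while the slot is idle */
};

/*
 * Take ownership of the command in a slot. Clearing the pointer means
 * a second caller finds nothing to complete.
 */
static struct mock_cmd *mock_claim_req(struct mock_req *req)
{
	struct mock_cmd *cmd = req->cmd;

	req->cmd = NULL;
	return cmd;
}

/* Fail every outstanding command; returns how many were failed. */
static int mock_fail_all(struct mock_req *ring, size_t n)
{
	int failed = 0;
	size_t i;

	for (i = 0; i < n; ++i) {
		struct mock_cmd *cmd = mock_claim_req(&ring[i]);

		if (!cmd)
			continue;
		cmd->result = DID_TRANSPORT_FAILFAST << 16;
		cmd->done = 1;	/* stands in for scmnd->scsi_done(scmnd) */
		++failed;
	}
	return failed;
}
```

Calling mock_fail_all() a second time on the same ring fails nothing,
because each slot was emptied when its command was claimed.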