From patchwork Tue Mar 21 13:52:30 2017
Subject: Re: SQ overflow seen running isert traffic
From: Sagi Grimberg
To: Potnuri Bharat Teja, "Nicholas A. Bellinger"
Cc: SWise OGC, "target-devel@vger.kernel.org", "linux-rdma@vger.kernel.org"
Date: Tue, 21 Mar 2017 15:52:30 +0200
Message-ID: <945e2947-f67a-4202-cd27-d4631fe10f68@grimberg.me>
In-Reply-To: <20170321075131.GA11565@chelsio.com>
References: <20161017111655.GA21245@chelsio.com>
 <021001d228a4$6cd6a6c0$4683f440$@opengridcomputing.com>
 <20161018112801.GA3117@chelsio.com>
 <005101d2294c$be5bb460$3b131d20$@opengridcomputing.com>
 <1477885208.27946.8.camel@haakon3.risingtidesystems.com>
 <20161108100617.GA2812@chelsio.com>
 <20170320101526.GA11699@chelsio.com>
 <1490077940.8236.77.camel@haakon3.risingtidesystems.com>
 <20170321075131.GA11565@chelsio.com>
List-ID: linux-rdma@vger.kernel.org

Hi Bharat and Nic,

Apologies for the late reply,

> Hi Nicholas,
> I see them from 2MB onwards.
>>
>> > Here is what I see with the 3 patches alone applied:
>> >
>> > -> In isert_put_datain() and isert_post_response() a corresponding recv WR
>> > is posted before posting a send, and hence for every post failure a recv is
>> > already posted into a tightly packed RQ, causing it to overflow.
>>
>> Just for me to understand: the intermittent TMR ABORT_TASKs are caused
>> by the repeated failure to post RDMA_WRITE WRs for a large iSER Data-In
>> payload, due to mis-sizing of the needed WRs from the RDMA R/W API vs.
>> underlying hardware capabilities.
> Yes.
>>
>> Moving the recv posts after the send post for RDMA_WRITEs helps to
>> reduce the severity of the issue with iw_cxgb4, but doesn't completely
>> eliminate the issue under load.
> Moving the recv posts only comes into effect along with your changes.
...
>> So the reason why your patch swapping post_recv -> post_send to
>> post_send -> post_recv makes a difference is that it allows enough of a
>> trickle of RDMA_WRITEs to make it through, so the iser initiator doesn't
>> attempt to escalate recovery and doesn't attempt session reinstatement.
> I don't know exactly whether the above comes into play, but the actual reason I
> swapped posting the RQ and SQ is that, unlike the SQ, the RQ is posted with WRs
> to the brim during initialisation itself. From then on we post an RQ WR for every
> RQ completion, which keeps it almost full at any point in time.
>
> Now in our scenario, the SQ is miscalculated and too small for a few adapters, and
> so it fills gradually as the IO starts. Once the SQ is full, according to your
> patches isert queues the command and tries to repost it. In iser functions
> like isert_post_response() and isert_put_datain(), the post send is done after
> the post recv. On the first post send failure in, say, isert_put_datain(), the
> corresponding post recv has already been posted; then, on queuing the command and
> retrying, an extra recv is posted again, which fills up the RQ as well.
>
> By swapping post recv and send as in my incremental patch, we don't post that
> extra recv, and post a recv only on a successful post send.
> Therefore I think this incremental patch is necessary.

Reversing the order to recv-after-send posting will cause problems in stress
IO workloads (especially for iWARP). The problem with sending a reply before
reposting the recv buffer is that the initiator can immediately send a new
request while we don't have a recv buffer waiting for it, which will cause an
RNR-NAK. This *will* cause performance drops and jitter for sure.

How about we just track the rx_desc to know if we already posted it, as a
start (untested, as I don't have access to RDMA HW this week):

Tested-by: Potnuri Bharat Teja

---
@@ -85,6 +85,7 @@ struct iser_rx_desc {
 	u64 dma_addr;
 	struct ib_sge rx_sg;
 	struct ib_cqe rx_cqe;
+	bool in_use;
 	char pad[ISER_RX_PAD_SIZE];
 } __packed;
--

We have a lot of room for cleanups in isert... I'll need to make some time
to get it going...

I'll be waiting to hear from you whether it makes your issue go away.

Cheers,
Sagi.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 9b33c0c97468..fcbed35e95a8 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -817,6 +817,7 @@ isert_post_recvm(struct isert_conn *isert_conn, u32 count)
 		rx_wr->sg_list = &rx_desc->rx_sg;
 		rx_wr->num_sge = 1;
 		rx_wr->next = rx_wr + 1;
+		rx_desc->in_use = false;
 	}
 	rx_wr--;
 	rx_wr->next = NULL; /* mark end of work requests list */
@@ -835,6 +836,15 @@ isert_post_recv(struct isert_conn *isert_conn, struct iser_rx_desc *rx_desc)
 	struct ib_recv_wr *rx_wr_failed, rx_wr;
 	int ret;
 
+	if (!rx_desc->in_use) {
+		/*
+		 * if the descriptor is not in-use we already reposted it
+		 * for recv, so just silently return
+		 */
+		return 0;
+	}
+
+	rx_desc->in_use = false;
 	rx_wr.wr_cqe = &rx_desc->rx_cqe;
 	rx_wr.sg_list = &rx_desc->rx_sg;
 	rx_wr.num_sge = 1;
@@ -1397,6 +1407,8 @@ isert_recv_done(struct ib_cq *cq, struct ib_wc *wc)
 		return;
 	}
 
+	rx_desc->in_use = true;
+
 	ib_dma_sync_single_for_cpu(ib_dev, rx_desc->dma_addr,
 			ISER_RX_PAYLOAD_SIZE, DMA_FROM_DEVICE);
diff --git a/drivers/infiniband/ulp/isert/ib_isert.h b/drivers/infiniband/ulp/isert/ib_isert.h
index c02ada57d7f5..87d994de8c91 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.h
+++ b/drivers/infiniband/ulp/isert/ib_isert.h
@@ -60,7 +60,7 @@
 #define ISER_RX_PAD_SIZE	(ISCSI_DEF_MAX_RECV_SEG_LEN + 4096 - \
 		(ISER_RX_PAYLOAD_SIZE + sizeof(u64) + sizeof(struct ib_sge) + \
-		 sizeof(struct ib_cqe)))
+		 sizeof(struct ib_cqe) + sizeof(bool)))
 
 #define ISCSI_ISER_SG_TABLESIZE		256