From patchwork Fri Oct 16 14:28:08 2020
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 11841805
From: Ming Lei
To: Jens Axboe, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
 Christoph Hellwig, Keith Busch
Cc: Ming Lei, Chao Leng, Sagi Grimberg
Subject: [PATCH 1/4] blk-mq: check rq->state explicitly in
 blk_mq_tagset_count_completed_rqs
Date: Fri, 16 Oct 2020 22:28:08 +0800
Message-Id: <20201016142811.1262214-2-ming.lei@redhat.com>
In-Reply-To: <20201016142811.1262214-1-ming.lei@redhat.com>
References: <20201016142811.1262214-1-ming.lei@redhat.com>
X-Mailing-List: linux-block@vger.kernel.org

blk_mq_tagset_count_completed_rqs() is called from
blk_mq_tagset_wait_completed_request() to drain requests that are being
completed remotely. What we need to wait for is request->state switching
away from MQ_RQ_COMPLETE, so check MQ_RQ_COMPLETE explicitly in
blk_mq_tagset_count_completed_rqs().

Meanwhile, mark the flush request as IDLE in its .end_io(), in line with
how normal requests are ended: in the !elevator case the flush request
may stay in the inflight tags, so its state has to be switched to IDLE
explicitly.
Cc: Chao Leng
Cc: Sagi Grimberg
Signed-off-by: Ming Lei
---
 block/blk-flush.c  | 2 ++
 block/blk-mq-tag.c | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 53abb5c73d99..31a2ae04d3f0 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -231,6 +231,8 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
 		return;
 	}
 
+	WRITE_ONCE(flush_rq->state, MQ_RQ_IDLE);
+
 	if (fq->rq_status != BLK_STS_OK)
 		error = fq->rq_status;
 
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 9c92053e704d..10ff8968b93b 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -367,7 +367,7 @@ static bool blk_mq_tagset_count_completed_rqs(struct request *rq,
 {
 	unsigned *count = data;
 
-	if (blk_mq_request_completed(rq))
+	if (blk_mq_rq_state(rq) == MQ_RQ_COMPLETE)
		(*count)++;
 	return true;
 }
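For reference, the request-state model the two hunks above rely on reduces
to a small stand-alone sketch. This is an illustration compiled outside the
kernel rather than kernel code: the enum values mirror MQ_RQ_* from
include/linux/blk-mq.h, the counting loop stands in for
blk_mq_tagset_count_completed_rqs() being run once per tag, and the rest is
assumed scaffolding.

#include <stdio.h>

/* mirrors enum mq_rq_state in include/linux/blk-mq.h */
enum mq_rq_state {
	MQ_RQ_IDLE	= 0,
	MQ_RQ_IN_FLIGHT	= 1,
	MQ_RQ_COMPLETE	= 2,
};

struct request {
	enum mq_rq_state state;
};

/*
 * Counts like blk_mq_tagset_count_completed_rqs() after this patch:
 * only requests still sitting in COMPLETE hold up the drain, so a
 * flush request moved to IDLE in its ->end_io() is no longer counted.
 */
static int count_completed(const struct request *rqs, int nr)
{
	int i, count = 0;

	for (i = 0; i < nr; i++)
		if (rqs[i].state == MQ_RQ_COMPLETE)
			count++;
	return count;
}

int main(void)
{
	struct request rqs[3] = {
		{ MQ_RQ_COMPLETE },	/* being completed remotely */
		{ MQ_RQ_IN_FLIGHT },	/* still owned by the driver */
		{ MQ_RQ_IDLE },		/* flush request already ended */
	};

	printf("completed: %d\n", count_completed(rqs, 3));	/* prints 1 */
	return 0;
}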
From patchwork Fri Oct 16 14:28:09 2020
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 11841807
From: Ming Lei
To: Jens Axboe, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
 Christoph Hellwig, Keith Busch
Cc: Ming Lei, Yi Zhang, Chao Leng, Sagi Grimberg
Subject: [PATCH 2/4] blk-mq: treat a request as completed if it isn't IN_FLIGHT
Date: Fri, 16 Oct 2020 22:28:09 +0800
Message-Id: <20201016142811.1262214-3-ming.lei@redhat.com>
In-Reply-To: <20201016142811.1262214-1-ming.lei@redhat.com>
References: <20201016142811.1262214-1-ming.lei@redhat.com>
X-Mailing-List: linux-block@vger.kernel.org

MQ_RQ_COMPLETE is a transient state: the .complete callback either ends
or requeues the request, after which the request state is updated to
IDLE.

blk_mq_request_completed() is often used by drivers to avoid double
completion, with the help of a driver-specific synchronization scheme.
For example, NVMe TCP calls blk_mq_request_completed() in both its
timeout handler and its abort handler to avoid double completion. If the
request's state is updated to IDLE in either one, the other code path may
treat the request as not completed and complete it one more time,
triggering a double completion.

Yi reported[1] that 'refcount_t: underflow; use-after-free' of rq->ref is
triggered in blktests (nvme/012) on one very slow machine.

Fix this issue by treating the request as completed once its state is no
longer IN_FLIGHT.

Reported-by: Yi Zhang
Cc: Chao Leng
Cc: Sagi Grimberg
Signed-off-by: Ming Lei
---
 include/linux/blk-mq.h | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index b23eeca4d677..9a0c1f8ac42d 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -486,9 +486,15 @@ static inline int blk_mq_request_started(struct request *rq)
 	return blk_mq_rq_state(rq) != MQ_RQ_IDLE;
 }
 
+/*
+ * Often called in abort handlers to avoid double completion.
+ * MQ_RQ_COMPLETE is a transient state: the .complete callback may end
+ * or requeue the request, and either way it is then marked IDLE. So
+ * return true if the request's state is no longer IN_FLIGHT.
+ */
 static inline int blk_mq_request_completed(struct request *rq)
 {
-	return blk_mq_rq_state(rq) == MQ_RQ_COMPLETE;
+	return blk_mq_rq_state(rq) != MQ_RQ_IN_FLIGHT;
 }
 
 void blk_mq_start_request(struct request *rq);
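The double completion closed by this patch can be reproduced in a
stand-alone sketch. Again an illustration, not driver code: the MQ_RQ_*
values are assumed as above, the two predicates model the old and new
bodies of blk_mq_request_completed(), and the "handlers" stand in for the
NVMe TCP timeout and abort paths.

#include <stdio.h>

enum mq_rq_state { MQ_RQ_IDLE, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE };

struct request {
	enum mq_rq_state state;
	int completions;	/* how many times the request was ended */
};

/* old predicate: misses a request that already went COMPLETE -> IDLE */
static int completed_old(struct request *rq)
{
	return rq->state == MQ_RQ_COMPLETE;
}

/* new predicate: anything that left IN_FLIGHT counts as completed */
static int completed_new(struct request *rq)
{
	return rq->state != MQ_RQ_IN_FLIGHT;
}

static void try_complete(struct request *rq,
			 int (*completed)(struct request *))
{
	if (!completed(rq)) {
		rq->completions++;	/* complete it "one more time" */
		rq->state = MQ_RQ_IDLE;
	}
}

int main(void)
{
	/* one handler already completed it; .complete moved it to IDLE */
	struct request rq = { .state = MQ_RQ_IDLE, .completions = 1 };
	struct request a = rq, b = rq;

	try_complete(&a, completed_old);	/* double completion */
	try_complete(&b, completed_new);	/* correctly skipped */
	printf("old: %d, new: %d\n", a.completions, b.completions);
	return 0;	/* prints "old: 2, new: 1" */
}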
From patchwork Fri Oct 16 14:28:10 2020
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 11841809
From: Ming Lei
To: Jens Axboe, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
 Christoph Hellwig, Keith Busch
Cc: Ming Lei, Chao Leng, Sagi Grimberg, Yi Zhang
Subject: [PATCH 3/4] nvme: tcp: fix race between timeout and normal completion
Date: Fri, 16 Oct 2020 22:28:10 +0800
Message-Id: <20201016142811.1262214-4-ming.lei@redhat.com>
In-Reply-To: <20201016142811.1262214-1-ming.lei@redhat.com>
References: <20201016142811.1262214-1-ming.lei@redhat.com>
X-Mailing-List: linux-block@vger.kernel.org

The NVMe TCP timeout handler is allowed to abort a request directly when
the controller isn't in the LIVE state. nvme_tcp_error_recovery() updates
the controller state to RESETTING and schedules the reset work function.
If a new timeout fires before the work function runs, the newly timed-out
request is aborted directly; at that point, however, the controller hasn't
been shut down yet, so the race between timeout abort and normal
completion is triggered.

Fix the race with the following approach:

1) Abort a timed-out request directly only when the controller is in the
CONNECTING or DELETING state. In these two states the controller has been
shut down, so doing so is safe. It is also enough for recovering the
controller: queues are only stopped/destroyed during RESETTING, all
in-flight requests are cancelled then, and no new request is required
while RESETTING.

2) Delay unquiescing the io queues and the admin queue until the
controller is LIVE, since it isn't necessary to start the queues during
RESETTING. Starting them there would risk the timeout vs. normal
completion race, because timed-out requests have to be aborted directly
during the CONNECTING state in order to set up the controller.

Cc: Chao Leng
Cc: Sagi Grimberg
Reported-by: Yi Zhang
Signed-off-by: Ming Lei
---
 drivers/nvme/host/tcp.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index d6a3e1487354..56ac61a90c1b 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1886,7 +1886,6 @@ static int nvme_tcp_configure_admin_queue(struct nvme_ctrl *ctrl, bool new)
 static void nvme_tcp_teardown_admin_queue(struct nvme_ctrl *ctrl,
 		bool remove)
 {
-	mutex_lock(&to_tcp_ctrl(ctrl)->teardown_lock);
 	blk_mq_quiesce_queue(ctrl->admin_q);
 	nvme_tcp_stop_queue(ctrl, 0);
 	if (ctrl->admin_tagset) {
@@ -1897,15 +1896,13 @@ static void nvme_tcp_teardown_admin_queue(struct nvme_ctrl *ctrl,
 	if (remove)
 		blk_mq_unquiesce_queue(ctrl->admin_q);
 	nvme_tcp_destroy_admin_queue(ctrl, remove);
-	mutex_unlock(&to_tcp_ctrl(ctrl)->teardown_lock);
 }
 
 static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
 		bool remove)
 {
-	mutex_lock(&to_tcp_ctrl(ctrl)->teardown_lock);
 	if (ctrl->queue_count <= 1)
-		goto out;
+		return;
 	blk_mq_quiesce_queue(ctrl->admin_q);
 	nvme_start_freeze(ctrl);
 	nvme_stop_queues(ctrl);
@@ -1918,8 +1915,6 @@ static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
 	if (remove)
 		nvme_start_queues(ctrl);
 	nvme_tcp_destroy_io_queues(ctrl, remove);
-out:
-	mutex_unlock(&to_tcp_ctrl(ctrl)->teardown_lock);
 }
 
 static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl)
@@ -1938,6 +1933,8 @@ static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl)
 				ctrl->opts->reconnect_delay * HZ);
 	} else {
 		dev_info(ctrl->device, "Removing controller...\n");
+		nvme_start_queues(ctrl);
+		blk_mq_unquiesce_queue(ctrl->admin_q);
 		nvme_delete_ctrl(ctrl);
 	}
 }
@@ -2030,11 +2027,11 @@ static void nvme_tcp_error_recovery_work(struct work_struct *work)
 	struct nvme_ctrl *ctrl = &tcp_ctrl->ctrl;
 
 	nvme_stop_keep_alive(ctrl);
+
+	mutex_lock(&tcp_ctrl->teardown_lock);
 	nvme_tcp_teardown_io_queues(ctrl, false);
-	/* unquiesce to fail fast pending requests */
-	nvme_start_queues(ctrl);
 	nvme_tcp_teardown_admin_queue(ctrl, false);
-	blk_mq_unquiesce_queue(ctrl->admin_q);
+	mutex_unlock(&tcp_ctrl->teardown_lock);
 
 	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_CONNECTING)) {
 		/* state change failure is ok if we started ctrl delete */
@@ -2051,6 +2048,7 @@ static void nvme_tcp_teardown_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
 	cancel_work_sync(&to_tcp_ctrl(ctrl)->err_work);
 	cancel_delayed_work_sync(&to_tcp_ctrl(ctrl)->connect_work);
 
+	mutex_lock(&to_tcp_ctrl(ctrl)->teardown_lock);
 	nvme_tcp_teardown_io_queues(ctrl, shutdown);
 	blk_mq_quiesce_queue(ctrl->admin_q);
 	if (shutdown)
@@ -2058,6 +2056,7 @@ static void nvme_tcp_teardown_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
 	else
 		nvme_disable_ctrl(ctrl);
 	nvme_tcp_teardown_admin_queue(ctrl, shutdown);
+	mutex_unlock(&to_tcp_ctrl(ctrl)->teardown_lock);
 }
 
 static void nvme_tcp_delete_ctrl(struct nvme_ctrl *ctrl)
@@ -2192,19 +2191,20 @@ nvme_tcp_timeout(struct request *rq, bool reserved)
 		"queue %d: timeout request %#x type %d\n",
 		nvme_tcp_queue_id(req->queue), rq->tag, pdu->hdr.type);
 
-	if (ctrl->state != NVME_CTRL_LIVE) {
+	/*
+	 * During CONNECTING or DELETING the controller has been shut down,
+	 * so it is safe to abort the request directly; otherwise the
+	 * timeout vs. normal completion race will be triggered.
+	 */
+	if (ctrl->state == NVME_CTRL_CONNECTING ||
+	    ctrl->state == NVME_CTRL_DELETING ||
+	    ctrl->state == NVME_CTRL_DELETING_NOIO) {
 		/*
-		 * If we are resetting, connecting or deleting we should
-		 * complete immediately because we may block controller
-		 * teardown or setup sequence
+		 * If we are connecting we should complete immediately
+		 * because we may block the controller setup sequence
 		 * - ctrl disable/shutdown fabrics requests
 		 * - connect requests
 		 * - initialization admin requests
-		 * - I/O requests that entered after unquiescing and
-		 *   the controller stopped responding
-		 *
-		 * All other requests should be cancelled by the error
-		 * recovery work, so it's fine that we fail it here.
 		 */
 		nvme_tcp_complete_timed_out(rq);
 		return BLK_EH_DONE;
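The new gate in nvme_tcp_timeout() boils down to a small decision function.
Below is a stand-alone sketch: the NVME_CTRL_* states are modeled as a
plain enum rather than taken from the driver headers, and abort_directly()
is an assumed name for the condition in the hunk above.

#include <stdio.h>

/* modeled on enum nvme_ctrl_state in drivers/nvme/host/nvme.h */
enum nvme_ctrl_state {
	NVME_CTRL_NEW,
	NVME_CTRL_LIVE,
	NVME_CTRL_RESETTING,
	NVME_CTRL_CONNECTING,
	NVME_CTRL_DELETING,
	NVME_CTRL_DELETING_NOIO,
	NVME_CTRL_DEAD,
};

/*
 * Abort directly only when the queues are already shut down
 * (CONNECTING/DELETING); in every other state, defer to error
 * recovery so a normal completion cannot race with the abort.
 */
static int abort_directly(enum nvme_ctrl_state state)
{
	return state == NVME_CTRL_CONNECTING ||
	       state == NVME_CTRL_DELETING ||
	       state == NVME_CTRL_DELETING_NOIO;
}

int main(void)
{
	/* before the patch, RESETTING also aborted directly: the race */
	printf("RESETTING  -> %s\n",
	       abort_directly(NVME_CTRL_RESETTING) ? "abort" : "recover");
	printf("CONNECTING -> %s\n",
	       abort_directly(NVME_CTRL_CONNECTING) ? "abort" : "recover");
	return 0;
}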
From patchwork Fri Oct 16 14:28:11 2020
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 11841811
From: Ming Lei
To: Jens Axboe, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
 Christoph Hellwig, Keith Busch
Cc: Ming Lei, Chao Leng, Sagi Grimberg, Yi Zhang
Subject: [PATCH 4/4] nvme: tcp: complete non-IO requests atomically
Date: Fri, 16 Oct 2020 22:28:11 +0800
Message-Id: <20201016142811.1262214-5-ming.lei@redhat.com>
In-Reply-To: <20201016142811.1262214-1-ming.lei@redhat.com>
References: <20201016142811.1262214-1-ming.lei@redhat.com>
X-Mailing-List: linux-block@vger.kernel.org

While the controller is in the CONNECTING state, admin/fabrics/connect
requests are submitted for recovery, and we allow these requests to be
aborted directly in the timeout handler so that the setup step is not
blocked. A timeout vs. normal completion race may therefore be triggered
on these requests.

Add atomic completion for requests from the connect/fabrics/admin queues
to avoid the race.

Cc: Chao Leng
Cc: Sagi Grimberg
Reported-by: Yi Zhang
Signed-off-by: Ming Lei
---
 drivers/nvme/host/tcp.c | 40 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 56ac61a90c1b..654061abdc5a 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -30,6 +30,8 @@ static int so_priority;
 module_param(so_priority, int, 0644);
 MODULE_PARM_DESC(so_priority, "nvme tcp socket optimize priority");
 
+#define REQ_STATE_COMPLETE	0
+
 enum nvme_tcp_send_state {
 	NVME_TCP_SEND_CMD_PDU = 0,
 	NVME_TCP_SEND_H2C_PDU,
@@ -56,6 +58,8 @@ struct nvme_tcp_request {
 	size_t			offset;
 	size_t			data_sent;
 	enum nvme_tcp_send_state state;
+
+	unsigned long		comp_state;
 };
 
 enum nvme_tcp_queue_flags {
@@ -469,6 +473,33 @@ static void nvme_tcp_error_recovery(struct nvme_ctrl *ctrl)
 	queue_work(nvme_reset_wq, &to_tcp_ctrl(ctrl)->err_work);
 }
 
+/*
+ * Requests originating from the admin, fabrics and connect_q queues
+ * have to be completed atomically, because the race between timeout
+ * and normal completion is not covered for these queues.
+ */
+static inline bool nvme_tcp_need_atomic_complete(struct request *rq)
+{
+	return !rq->rq_disk;
+}
+
+static inline void nvme_tcp_clear_rq_complete(struct request *rq)
+{
+	struct nvme_tcp_request *req = blk_mq_rq_to_pdu(rq);
+
+	if (unlikely(nvme_tcp_need_atomic_complete(rq)))
+		clear_bit(REQ_STATE_COMPLETE, &req->comp_state);
+}
+
+static inline bool nvme_tcp_mark_rq_complete(struct request *rq)
+{
+	struct nvme_tcp_request *req = blk_mq_rq_to_pdu(rq);
+
+	if (unlikely(nvme_tcp_need_atomic_complete(rq)))
+		return !test_and_set_bit(REQ_STATE_COMPLETE,
+				&req->comp_state);
+	return true;
+}
+
 static int nvme_tcp_process_nvme_cqe(struct nvme_tcp_queue *queue,
 		struct nvme_completion *cqe)
 {
@@ -483,7 +514,8 @@ static int nvme_tcp_process_nvme_cqe(struct nvme_tcp_queue *queue,
 		return -EINVAL;
 	}
 
-	if (!nvme_try_complete_req(rq, cqe->status, cqe->result))
+	if (nvme_tcp_mark_rq_complete(rq) &&
+	    !nvme_try_complete_req(rq, cqe->status, cqe->result))
 		nvme_complete_rq(rq);
 	queue->nr_cqe++;
 
@@ -674,7 +706,8 @@ static inline void nvme_tcp_end_request(struct request *rq, u16 status)
 {
 	union nvme_result res = {};
 
-	if (!nvme_try_complete_req(rq, cpu_to_le16(status << 1), res))
+	if (nvme_tcp_mark_rq_complete(rq) &&
+	    !nvme_try_complete_req(rq, cpu_to_le16(status << 1), res))
 		nvme_complete_rq(rq);
 }
 
@@ -2173,7 +2206,7 @@ static void nvme_tcp_complete_timed_out(struct request *rq)
 	/* fence other contexts that may complete the command */
 	mutex_lock(&to_tcp_ctrl(ctrl)->teardown_lock);
 	nvme_tcp_stop_queue(ctrl, nvme_tcp_queue_id(req->queue));
-	if (!blk_mq_request_completed(rq)) {
+	if (nvme_tcp_mark_rq_complete(rq) && !blk_mq_request_completed(rq)) {
 		nvme_req(rq)->status = NVME_SC_HOST_ABORTED_CMD;
 		blk_mq_complete_request(rq);
 	}
@@ -2315,6 +2348,7 @@ static blk_status_t nvme_tcp_queue_rq(struct blk_mq_hw_ctx *hctx,
 	if (unlikely(ret))
 		return ret;
 
+	nvme_tcp_clear_rq_complete(rq);
 	blk_mq_start_request(rq);
 
 	nvme_tcp_queue_request(req, true, bd->last);
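The "complete exactly once" discipline that nvme_tcp_mark_rq_complete()
enforces with test_and_set_bit() can be modeled in stand-alone C11. In
this sketch, atomic_flag stands in for the kernel's atomic bitops, and
the two mark_complete() calls stand in for the racing normal-completion
and timeout paths; only the first caller is allowed to end the request.

#include <stdatomic.h>
#include <stdio.h>

struct req {
	atomic_flag completed;	/* plays the role of REQ_STATE_COMPLETE */
	int completions;
};

/*
 * Like test_and_set_bit(): returns nonzero only for the first caller,
 * so exactly one of the racing paths may end the request.
 */
static int mark_complete(struct req *r)
{
	return !atomic_flag_test_and_set(&r->completed);
}

int main(void)
{
	struct req r = { ATOMIC_FLAG_INIT, 0 };

	if (mark_complete(&r))	/* normal completion path wins */
		r.completions++;
	if (mark_complete(&r))	/* timeout path loses, skips completion */
		r.completions++;

	printf("completions: %d\n", r.completions);	/* prints 1 */
	return 0;
}

nvme_tcp_clear_rq_complete() in .queue_rq() then plays the role of
re-arming the flag before each new submission, which is why the patch
clears the bit just before blk_mq_start_request().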