From patchwork Sun Dec 3 16:53:16 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sagi Grimberg
X-Patchwork-Id: 10089331
Subject: Re: possible core cq bug
To: Steve Wise, linux-rdma@vger.kernel.org
References: <052101d36ba2$7e5b4120$7b11c360$@opengridcomputing.com>
 <006c01d36c4a$c79cfb50$56d6f1f0$@opengridcomputing.com>
From: Sagi Grimberg
Date: Sun, 3 Dec 2017 18:53:16 +0200
In-Reply-To: <006c01d36c4a$c79cfb50$56d6f1f0$@opengridcomputing.com>
X-Mailing-List: linux-rdma@vger.kernel.org

>>> If an application creates its cq for DIRECT poll mode using ib_create_cq()
>>> instead of ib_alloc_cq(), and then uses ib_drain_qp() to drain its qp,
>>> ib_drain_sq/rq() will always hang forever because cq->wc is NULL. IE
>>> ib_create_cq() doesn't allocate cq->wc, and ib_alloc_cq() does. Yet the
>>> __ib_process_cq() requires cq->wc to actually complete any completions
>>> and calling the cqe_done function.
>>>
>>> Is this a bug in the CQ core code or the application?
>>
>> Take a look in __ib_drain_rq/__ib_drain_sq for
>> cq->poll_ctx == IB_POLL_DIRECT.
>> The drain routine polls the completion queue from time to time...
>
> Yes, but it ends up calling __ib_process_cq() which doesn't actually poll
> the CQ because cq->wc is NULL.

Do you mean that the CQ allocation wasn't done with ib_alloc_cq? That
indeed would be a bug. We can WARN on it as well so the application will
know to allocate its CQ with ib_alloc_cq.

Does something like this make sense?
---
diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index f2ae75fa3128..90eac56b5f1a 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -69,7 +69,7 @@ static int __ib_process_cq(struct ib_cq *cq, int budget)
  */
 int ib_process_cq_direct(struct ib_cq *cq, int budget)
 {
-	WARN_ON_ONCE(cq->poll_ctx != IB_POLL_DIRECT);
+	WARN_ON_ONCE(cq->poll_ctx != IB_POLL_DIRECT || !cq->wc);
 
 	return __ib_process_cq(cq, budget);
 }
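
For anyone following along, the failure mode and the extra check can be
modelled in a small user-space mock. This is only a sketch, not kernel
code: every mock_* name is hypothetical, and the "nothing is reaped when
wc is NULL" behaviour is an assumption taken from the report above rather
than from any particular driver.

```c
/*
 * User-space mock of the hang discussed in this thread. Not kernel code:
 * all mock_* names are hypothetical and only model the reported behaviour.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define MOCK_POLL_BATCH 16

enum mock_poll_ctx { MOCK_POLL_DIRECT, MOCK_POLL_SOFTIRQ };

struct mock_wc { int wr_id; };

struct mock_cq {
	enum mock_poll_ctx poll_ctx;
	struct mock_wc *wc;	/* scratch array used while polling */
	int pending;		/* completions not yet reaped */
};

/* Counts how often the mock "WARN" fired. */
static int mock_warn_count;

/* Models ib_create_cq(): the scratch array is never allocated. */
static void mock_create_cq(struct mock_cq *cq, int pending)
{
	cq->poll_ctx = MOCK_POLL_DIRECT;
	cq->wc = NULL;
	cq->pending = pending;
}

/* Models ib_alloc_cq(): the scratch array is allocated up front. */
static int mock_alloc_cq(struct mock_cq *cq, int pending)
{
	cq->poll_ctx = MOCK_POLL_DIRECT;
	cq->pending = pending;
	cq->wc = calloc(MOCK_POLL_BATCH, sizeof(*cq->wc));
	return cq->wc ? 0 : -1;
}

static void mock_free_cq(struct mock_cq *cq)
{
	free(cq->wc);
	cq->wc = NULL;
}

/* Assumed behaviour: without a scratch array nothing gets reaped. */
static int mock_process_cq(struct mock_cq *cq, int budget)
{
	int completed = 0;

	assert(cq != NULL);
	while (cq->wc && cq->pending > 0 && completed < budget) {
		cq->wc[completed % MOCK_POLL_BATCH].wr_id = completed;
		cq->pending--;
		completed++;
	}
	return completed;
}

/* Models ib_process_cq_direct() with the proposed extra !cq->wc check. */
static int mock_process_cq_direct(struct mock_cq *cq, int budget)
{
	if (cq->poll_ctx != MOCK_POLL_DIRECT || !cq->wc) {
		mock_warn_count++;
		fprintf(stderr, "mock WARN: CQ not usable for direct polling\n");
	}
	return mock_process_cq(cq, budget);
}

/* Bounded stand-in for the drain loop: true once the CQ is empty. */
static bool mock_drain(struct mock_cq *cq, int max_polls)
{
	while (cq->pending > 0 && max_polls-- > 0)
		mock_process_cq(cq, MOCK_POLL_BATCH);
	return cq->pending == 0;
}
```

In this model, mock_drain() never empties a CQ set up with
mock_create_cq(), while one set up with mock_alloc_cq() drains on the
first poll. As in the patch, the mock warning only reports the misuse;
the call still returns without reaping anything.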