From patchwork Mon Jun 18 11:27:22 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yishai Hadas X-Patchwork-Id: 10470951 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 32A876020C for ; Mon, 18 Jun 2018 11:27:34 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 21D01287DA for ; Mon, 18 Jun 2018 11:27:34 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 15EE628992; Mon, 18 Jun 2018 11:27:34 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI, T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 657F1287DA for ; Mon, 18 Jun 2018 11:27:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934140AbeFRL11 (ORCPT ); Mon, 18 Jun 2018 07:27:27 -0400 Received: from mail-wm0-f65.google.com ([74.125.82.65]:33132 "EHLO mail-wm0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933693AbeFRL10 (ORCPT ); Mon, 18 Jun 2018 07:27:26 -0400 Received: by mail-wm0-f65.google.com with SMTP id z6-v6so14636048wma.0 for ; Mon, 18 Jun 2018 04:27:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dev-mellanox-co-il.20150623.gappssmtp.com; s=20150623; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=5sAYW1kCFojoNphvxJRIC5jl4Zokw4DfcpUgxhXK0o8=; b=abUsDWVMIJoC1gYiAjvAYcQ60fqdqyT8MZmSyxl+zk3uTXwbDyUv5UppQxUfQy1YtO v7xTWKtXyUEhkgmM/fmHMmHAihHBRgZpRrHkjHaydD4U/8Yy3l7VGSKkYVLIL3hKjkGX qoBVMRKA8oS73nOxlROOi8OFCmdOiuR/5tQ9NgmjSVSeT8ojCL6cj6OGht6hX0722qQ5 Gxxm3L5E104skVxSu7hjI+xbERhXc7VKb7dnkv5K6YWFbVVGsCdyU5sixnoziD+vSYBz YudlH5tsbuGCSrsKCd7dyLDE+uQ7FvGcu7G07aAMtZZHP+IO6dVgtcMvPd2Fp9y4XVYF axZQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=5sAYW1kCFojoNphvxJRIC5jl4Zokw4DfcpUgxhXK0o8=; b=MMnLRx1JvcNKFe0Jtjx/dM6UYXwcHwWJSP4PQOKlxfkCGm0C1M4cyZ4TeULgPnaxyk IJ5AHklS1XsfJjgVasn3OeiW8sF9ShelTdLNyN2P5SVE0G8jkxJmPTR2wy64jVXAcJBx fTVnMbTW15Q7Rr9prHM3bviP9xSJffnxYxjhhLVNWL+/YzgHni/Wf6BNXb5OfD1YqSKN l4nscHQf0VRyFAGKgeO1m8wFCym49W33kJtvzjxBxjnV7jIqgsgZeVa1ohVFk/aBRpep eImVqD0VnmD9kaiqbabLKBtiqmYfXnw6hGLyw83rOVJ3AJ3VmFEVrwYOgiRJ+0ruczEx AIWw== X-Gm-Message-State: APt69E1WK5JxJwap6kr2+5QGc3BD3ICdPrWjqFq2DBNToayjKj7epdkz 8GThTHVt1qMs2Mcj7djv/BOoZw== X-Google-Smtp-Source: ADUXVKLNNS47+ByujHaV9LYf7aUBFVOu4Fd6ABt8KJIMAq+89YaLXWQW9YuxkNA/0dt+oOsC2T9RNQ== X-Received: by 2002:a1c:ae94:: with SMTP id x142-v6mr7947829wme.22.1529321244749; Mon, 18 Jun 2018 04:27:24 -0700 (PDT) Received: from [10.8.1.82] ([193.47.165.251]) by smtp.googlemail.com with ESMTPSA id d3-v6sm19313991wri.24.2018.06.18.04.27.22 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 18 Jun 2018 04:27:23 -0700 (PDT) Subject: Re: [PATCH rdma-next v2 09/20] IB/core: Improve uverbs_cleanup_ucontext algorithm To: Jason Gunthorpe Cc: Leon Romanovsky , Doug Ledford , Leon Romanovsky , RDMA mailing list , Joonas Lahtinen , Matan Barak , Yishai Hadas , Saeed Mahameed , linux-netdev , Majd Dibbiny References: <20180617100006.30663-1-leon@kernel.org> <20180617100006.30663-10-leon@kernel.org> <20180617195106.eai7iajx4y5w3ii6@mellanox.com> From: Yishai Hadas Message-ID: Date: Mon, 18 Jun 2018 14:27:22 +0300 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0 MIME-Version: 1.0 In-Reply-To: <20180617195106.eai7iajx4y5w3ii6@mellanox.com> Content-Language: en-US Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On 6/17/2018 10:51 PM, Jason Gunthorpe wrote: > On Sun, Jun 17, 2018 at 12:59:55PM +0300, Leon Romanovsky wrote: > >> +void uverbs_cleanup_ucontext(struct ib_ucontext *ucontext, bool device_removed) >> +{ >> /* >> * Waits for all remove_commit and alloc_commit to finish. Logically, We >> * want to hold this forever as the context is going to be destroyed, >> * but we'll release it since it causes a "held lock freed" BUG message. >> */ >> down_write(&ucontext->cleanup_rwsem); >> + while (!list_empty(&ucontext->uobjects)) >> + if (__uverbs_cleanup_ucontext(ucontext, RDMA_REMOVE_DESTROY)) >> + /* No entry was cleaned-up successfully during this iteration */ >> + break; > > No, this isn't right, it must remain REMOVE or CLOSE here. The enum is > a signal to the driver what is going on. DESTROY is only for user > triggered destroy called in a user context. > The algorithm must enable the drivers an option to fully cleanup their resources as was done before this change. Using REMOVE or CLOSE without some following change won't do the work as the infrastructure (e.g. remove_commit_idr_uobject) and other IB cleanup functions during the road (e.g. uverbs_free_qp, uverbs_free_cq) will force cleanup of some memory/idr/ref resources and prevent a second successful iteration in case of a failure. For that reason the initial iteration should be with some relaxed mode and just later in case were left uncleaned-up resources the code should use the REMOVE/CLOSE option. However, I do agree that we need to preserve the original signal to let downstream layers to know what happened, at the moment it looks like there is one place that it's even a must as part of uverbs_hot_unplug_completion_event_file() where 'RDMA_REMOVE_DRIVER_REMOVE' is used explicitly as part of the cleanup flow. For that I suggest below [1] patch which replaces RDMA_REMOVE_DESTROY in the first iteration with RDMA_REMOVE_DRIVER_REMOVE_RELAX/ RDMA_REMOVE_CLOSE_RELAX new enum values to do the job. > There needs to be some kind of guarenteed return from the driver that > destroy is failing due to elevated refcounts, and not some other > reason.. This is just checking for any ret? We can't rely on all drivers' return codes from legacy/current firmware code for all objects to return a specific ret code for that case. Even if we had such, the second iteration where we should force cleanup might still fail with that ret code and a kernel memory leak would occur. > >> - while (!list_empty(&ucontext->uobjects)) { >> - struct ib_uobject *obj, *next_obj; >> - unsigned int next_order = UINT_MAX; >> + if (!list_empty(&ucontext->uobjects)) >> + __uverbs_cleanup_ucontext(ucontext, device_removed ? >> + RDMA_REMOVE_DRIVER_REMOVE : RDMA_REMOVE_CLOSE); > > Failure to cleanup is a driver bug, and should be reported with > WARN_ON. This is also mis using the remove enum, CLOSE is not a > 'bigger hammer' > The patch saves the previous behavior that set a warn message [2] and not a WARN_ON, if you think that WARN_ON is better we can change to. [2] pr_warn("ib_uverbs: unable to remove uobject id %d err %d\n", obj->id, err); The suggested patch on top of current can look like below. ++ b/drivers/infiniband/core/rdma_core.c @@ -360,9 +360,10 @@ static int __must_check remove_commit_idr_uobject(struct ib_uobject *uobj, /* * We can only fail gracefully if the user requested to destroy the - * object. In the rest of the cases, just remove whatever you can. + * object or when a retry may be called upon an error. + * In the rest of the cases, just remove whatever you can. */ - if (why == RDMA_REMOVE_DESTROY && ret) + if (ret && ib_is_remove_retry(why)) return ret; ib_rdmacg_uncharge(&uobj->cg_obj, uobj->context->device, @@ -393,7 +394,7 @@ static int __must_check remove_commit_fd_uobject(struct ib_uobject *uobj, container_of(uobj, struct ib_uobject_file, uobj); int ret = fd_type->context_closed(uobj_file, why); - if (why == RDMA_REMOVE_DESTROY && ret) + if (ret && ib_is_remove_retry(why)) return ret; //Here is the specific place that checks for RDMA_REMOVE_DRIVER_REMOVE. // it should do the work also when RDMA_REMOVE_DRIVER_REMOVE was called. @@ -187,7 +187,7 @@ static int uverbs_hot_unplug_completion_event_file(struct ib_uobject_file *uobj_ event_queue->is_closed = 1; spin_unlock_irq(&event_queue->lock); - if (why == RDMA_REMOVE_DRIVER_REMOVE) { + if (why == RDMA_REMOVE_DRIVER_REMOVE || why == RDMA_REMOVE_DRIVER_REMOVE_RELAX) { wake_up_interruptible(&event_queue->poll_wait); kill_fasync(&event_queue->async_queue, SIGIO, POLL_IN); } What do you think ? Yishai --- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 4685ef5..6b03ca7 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1468,8 +1468,21 @@ enum rdma_remove_reason { RDMA_REMOVE_DRIVER_REMOVE, /* Context is being cleaned-up, but commit was just completed */ RDMA_REMOVE_DURING_CLEANUP, + /* Driver is being hot-unplugged, retry may be called upon an error */ + RDMA_REMOVE_DRIVER_REMOVE_RELAX, + /* Context deletion, retry may be called upon an error */ + RDMA_REMOVE_CLOSE_RELAX, }; +static inline bool ib_is_remove_retry(enum rdma_remove_reason why) +{ + if (why == RDMA_REMOVE_DESTROY || why == RDMA_REMOVE_DRIVER_REMOVE_RELAX || + why == RDMA_REMOVE_CLOSE_RELAX) + return true; + + return false; +} + // In the new algorithm below enums will be used instead of //RDMA_REMOVE_DESTROY @@ -700,7 +701,9 @@ void uverbs_cleanup_ucontext(struct ib_ucontext *ucontext, bool device_removed) */ down_write(&ucontext->cleanup_rwsem); while (!list_empty(&ucontext->uobjects)) - if (__uverbs_cleanup_ucontext(ucontext, RDMA_REMOVE_DESTROY)) + if (__uverbs_cleanup_ucontext(ucontext, device_removed ? + RDMA_REMOVE_DRIVER_REMOVE_RELAX : + RDMA_REMOVE_CLOSE_RELAX)) /* No entry was cleaned-up successfully during this iteration */ break; // Below logic will be replaced in all applicable places, I just put few // of to demonstrate the solution. --- a/drivers/infiniband/core/uverbs_std_types_cq.c +++ b/drivers/infiniband/core/uverbs_std_types_cq.c @@ -44,7 +44,7 @@ static int uverbs_free_cq(struct ib_uobject *uobject, int ret; ret = ib_destroy_cq(cq); - if (!ret || why != RDMA_REMOVE_DESTROY) + if (!ret || !ib_is_remove_retry(why)) ib_uverbs_release_ucq(uobject->context->ufile, ev_queue ? container_of(ev_queue, struct ib_uverbs_completion_event_file,