From patchwork Wed Dec 27 21:49:56 2017
X-Patchwork-Submitter: Jason Gunthorpe
X-Patchwork-Id: 10134079
Date: Wed, 27 Dec 2017 14:49:56 -0700
From: Jason Gunthorpe
To: Leon Romanovsky
Cc: Doug Ledford, linux-rdma@vger.kernel.org, Daniel Jurgens, Majd Dibbiny, Mark Bloch, Moni Shoua, Yishai Hadas, Leon Romanovsky, stable@vger.kernel.org
Subject: Re: [PATCH rdma-rc 1/4] RDMA/core: Enforce requirement to hold lists_rwsem semaphore
Message-ID: <20171227214956.GI25436@ziepe.ca>
References: <20171224115458.27577-1-leon@kernel.org> <20171224115458.27577-2-leon@kernel.org>
In-Reply-To: <20171224115458.27577-2-leon@kernel.org>

On Sun, Dec 24, 2017 at 01:54:55PM +0200, Leon Romanovsky wrote:
> From: Leon Romanovsky
>
> Add comment and run time check to the __ib_device_get_by_index()
> function to remind that the caller should hold lists_rwsem semaphore.

Upon closer inspection, this is not entirely right. There is no bug here; the locking is clearly explained in the comment for device_mutex.

down_write() on lists_rwsem may only be done while holding device_mutex. Holding device_mutex therefore implies there can be no concurrent writers, so a read lock of lists_rwsem is unnecessary.
> struct ib_device *__ib_device_get_by_index(u32 index)
> {
> 	struct ib_device *device;
>
> +	WARN_ON_ONCE(!rwsem_is_locked(&lists_rwsem));

So this is wrong; it needs to consider device_mutex too.

> @@ -526,8 +531,8 @@ int ib_register_device(struct ib_device *device,
> 	if (!add_client_context(device, client) && client->add)
> 		client->add(device);
>
> -	device->index = __dev_new_index();
> 	down_write(&lists_rwsem);
> +	device->index = __dev_new_index();
> 	list_add_tail(&device->core_list, &device_list);
> 	up_write(&lists_rwsem);

And this sequence was OK before - the only thing that needs to be protected by the write lock is the actual list manipulation, not the search.

The locking here has become a bit nutzo, and the sequencing is wrong too. Below is a draft of tidying at least some of this. Can you work from here? Will drop this patch.

* Get rid of going_down; we can use list_del and list_splice held under
  the locks to prevent access to the ib_client_data struct.
* This allows simplifying the removal locking to only hold write locks
  while doing the list edit.
* Correct the call ordering of removal - client remove should be called
  before we break set/get_client_data, otherwise the client has no
  control over when those calls start to fail.
* Client remove must prevent other threads from calling
  set/get_client_data, so the safety checks now become WARN_ON.
* The kfree doesn't need locking since the list manipulation no longer
  leaves dangling pointers.
* Add an ASSERT_LISTS_READ_LOCKED assertion in all the right places.

Jason

---
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 3aeaf11d0a83b7..9e973483b7b91d 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -52,12 +52,15 @@ MODULE_DESCRIPTION("core kernel InfiniBand API");
 MODULE_LICENSE("Dual BSD/GPL");
 
 struct ib_client_data {
+	/*
+	 * list uses a dual locking scheme, readers can either grab the global
+	 * read lists_rwsem or the per-device client_data_lock spin
+	 * lock. writers must grab both the write lists_rwsem and the spin
+	 * lock.
+	 */
 	struct list_head list;
 	struct ib_client *client;
 	void *            data;
-	/* The device or client is going down. Do not call client or device
-	 * callbacks other than remove(). */
-	bool              going_down;
 };
 
 struct workqueue_struct *ib_comp_wq;
@@ -84,6 +87,16 @@ static LIST_HEAD(client_list);
 static DEFINE_MUTEX(device_mutex);
 static DECLARE_RWSEM(lists_rwsem);
 
+/*
+ * Used to indicate the function is going to read from client_data_list/list
+ * or device_list/core_list.
+ */
+static void ASSERT_LISTS_READ_LOCKED(void)
+{
+	WARN_ON_ONCE(!rwsem_is_locked(&lists_rwsem) &&
+		     !mutex_is_locked(&device_mutex));
+}
+
 static int ib_security_change(struct notifier_block *nb, unsigned long event,
 			      void *lsm_data);
 static void ib_policy_change_task(struct work_struct *work);
@@ -141,7 +154,7 @@ struct ib_device *__ib_device_get_by_index(u32 index)
 {
 	struct ib_device *device;
 
-	WARN_ON_ONCE(!rwsem_is_locked(&lists_rwsem));
+	ASSERT_LISTS_READ_LOCKED();
 
 	list_for_each_entry(device, &device_list, core_list)
 		if (device->index == index)
@@ -154,6 +167,8 @@ static struct ib_device *__ib_device_get_by_name(const char *name)
 {
 	struct ib_device *device;
 
+	ASSERT_LISTS_READ_LOCKED();
+
 	list_for_each_entry(device, &device_list, core_list)
 		if (!strncmp(name, device->name, IB_DEVICE_NAME_MAX))
 			return device;
@@ -172,6 +187,8 @@ static int alloc_name(char *name)
 	if (!inuse)
 		return -ENOMEM;
 
+	ASSERT_LISTS_READ_LOCKED();
+
 	list_for_each_entry(device, &device_list, core_list) {
 		if (!sscanf(device->name, name, &i))
 			continue;
@@ -292,7 +309,6 @@ static int add_client_context(struct ib_device *device, struct ib_client *client
 
 	context->client = client;
 	context->data   = NULL;
-	context->going_down = false;
 
 	down_write(&lists_rwsem);
 	spin_lock_irqsave(&device->client_data_lock, flags);
@@ -531,8 +547,8 @@ int ib_register_device(struct ib_device *device,
 		if (!add_client_context(device, client) && client->add)
 			client->add(device);
 
-	down_write(&lists_rwsem);
 	device->index = __dev_new_index();
+	down_write(&lists_rwsem);
 	list_add_tail(&device->core_list, &device_list);
 	up_write(&lists_rwsem);
 	mutex_unlock(&device_mutex);
@@ -559,23 +575,27 @@ void ib_unregister_device(struct ib_device *device)
 {
 	struct ib_client_data *context, *tmp;
 	unsigned long flags;
+	struct list_head client_data_tmp;
+
+	INIT_LIST_HEAD(&client_data_tmp);
 
 	mutex_lock(&device_mutex);
 
+	list_for_each_entry(context, &device->client_data_list, list)
+		if (context->client->remove)
+			context->client->remove(device, context->data);
+
 	down_write(&lists_rwsem);
-	list_del(&device->core_list);
+
 	spin_lock_irqsave(&device->client_data_lock, flags);
-	list_for_each_entry_safe(context, tmp, &device->client_data_list, list)
-		context->going_down = true;
+	list_splice_init(&device->client_data_list, &client_data_tmp);
 	spin_unlock_irqrestore(&device->client_data_lock, flags);
-	downgrade_write(&lists_rwsem);
 
-	list_for_each_entry_safe(context, tmp, &device->client_data_list,
-				 list) {
-		if (context->client->remove)
-			context->client->remove(device, context->data);
-	}
-	up_read(&lists_rwsem);
+	list_for_each_entry_safe(context, tmp, &client_data_tmp, list)
+		kfree(context);
+
+	list_del(&device->core_list);
+	up_write(&lists_rwsem);
 
 	ib_device_unregister_rdmacg(device);
 	ib_device_unregister_sysfs(device);
@@ -587,13 +607,6 @@ void ib_unregister_device(struct ib_device *device)
 	ib_security_destroy_port_pkey_list(device);
 	kfree(device->port_pkey_list);
 
-	down_write(&lists_rwsem);
-	spin_lock_irqsave(&device->client_data_lock, flags);
-	list_for_each_entry_safe(context, tmp, &device->client_data_list, list)
-		kfree(context);
-	spin_unlock_irqrestore(&device->client_data_lock, flags);
-	up_write(&lists_rwsem);
-
 	device->reg_state = IB_DEV_UNREGISTERED;
 }
 EXPORT_SYMBOL(ib_unregister_device);
@@ -617,6 +630,8 @@ int ib_register_client(struct ib_client *client)
 
 	mutex_lock(&device_mutex);
 
+	ASSERT_LISTS_READ_LOCKED();
+
 	list_for_each_entry(device, &device_list, core_list)
 		if (!add_client_context(device, client) && client->add)
 			client->add(device);
@@ -654,33 +669,27 @@ void ib_unregister_client(struct ib_client *client)
 	list_for_each_entry(device, &device_list, core_list) {
 		struct ib_client_data *found_context = NULL;
 
-		down_write(&lists_rwsem);
-		spin_lock_irqsave(&device->client_data_lock, flags);
-		list_for_each_entry_safe(context, tmp, &device->client_data_list, list)
+		list_for_each_entry_safe(context, tmp, &device->client_data_list, list) {
 			if (context->client == client) {
-				context->going_down = true;
 				found_context = context;
 				break;
 			}
-		spin_unlock_irqrestore(&device->client_data_lock, flags);
-		up_write(&lists_rwsem);
+		}
 
-		if (client->remove)
-			client->remove(device, found_context ?
-					       found_context->data : NULL);
+		if (found_context) {
+			if (client->remove)
+				client->remove(device, found_context->data);
 
-		if (!found_context) {
-			pr_warn("No client context found for %s/%s\n",
-				device->name, client->name);
-			continue;
-		}
+			down_write(&lists_rwsem);
+			spin_lock_irqsave(&device->client_data_lock, flags);
+			list_del(&context->list);
+			spin_unlock_irqrestore(&device->client_data_lock,
+					       flags);
+			up_write(&lists_rwsem);
 
-		down_write(&lists_rwsem);
-		spin_lock_irqsave(&device->client_data_lock, flags);
-		list_del(&found_context->list);
-		kfree(found_context);
-		spin_unlock_irqrestore(&device->client_data_lock, flags);
-		up_write(&lists_rwsem);
+			kfree(found_context);
+		} else
+			WARN_ON(!found_context);
 	}
 
 	mutex_unlock(&device_mutex);
@@ -735,9 +744,7 @@ void ib_set_client_data(struct ib_device *device, struct ib_client *client,
 		goto out;
 	}
 
-	pr_warn("No client context found for %s/%s\n",
-		device->name, client->name);
-
+	WARN_ON(true);
 out:
 	spin_unlock_irqrestore(&device->client_data_lock, flags);
 }
@@ -1133,9 +1140,6 @@ struct net_device *ib_get_net_dev_by_params(struct ib_device *dev,
 	list_for_each_entry(context, &dev->client_data_list, list) {
 		struct ib_client *client = context->client;
 
-		if (context->going_down)
-			continue;
-
 		if (client->get_net_dev_by_params) {
 			net_dev = client->get_net_dev_by_params(dev, port, pkey,
								gid, addr,