[v8,09/12] sysfs: fix deadlock race with module removal

When a driver's sysfs attributes use a lock that is also taken on module
removal, the two can race and deadlock. This happens, for instance, when
a sysfs file on a driver is being used at the same time that module
removal is triggered. The module removal code holds the lock, and the
driver's sysfs file operation then waits for that same lock. While
holding the lock, module removal tries to remove the sysfs entries, but
these cannot be removed while one of them has an operation in flight.
That operation cannot complete because the lock is already held.
Likewise module removal cannot complete, and so we deadlock.

This can now be easily reproduced with our sysfs selftest as follows:

./tools/testing/selftests/sysfs/sysfs.sh -t 0027

This uses a local driver lock. Test 0028 can also be used; it uses
rtnl_lock():

./tools/testing/selftests/sysfs/sysfs.sh -t 0028

To fix this we extend the struct kernfs_node with a module reference
and use the try_module_get() after kernfs_get_active() is called. As
documented in the prior patch, we now know that once kernfs_get_active()
is called the module is implicitly guarded to exist and cannot be removed.
This is because the module is the one in charge of removing the same
sysfs file it created, and removal of sysfs files on module exit will wait
until they don't have any active references. By using a try_module_get()
after kernfs_get_active() we yield to let module removal trump calls to
process a sysfs operation, while also preventing module removal if a sysfs
operation is already in progress. This prevents the deadlock.
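
The ordering described above can be modeled in userspace as follows. This
is only a sketch, not the code in this patch: every model_* name and the
MODEL_GOING value are hypothetical stand-ins, and the real
kernfs_get_active() and try_module_get() have more state to deal with. The
point is the order: take the active reference first, then try to pin the
module, and back out if removal has already begun.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_GOING (-1)	/* hypothetical marker: removal has started */

/* Hypothetical stand-in for a module; not the kernel's struct module. */
struct model_module {
	atomic_int refcnt;	/* >= 0: reference count, MODEL_GOING: going away */
};

/* Hypothetical stand-in for a kernfs_node extended with an owner. */
struct model_node {
	atomic_int active;		/* sysfs operations in flight */
	struct model_module *owner;	/* module that created the file */
};

/* Models try_module_get(): fails once removal has begun. */
static bool model_try_module_get(struct model_module *mod)
{
	int old = atomic_load(&mod->refcnt);

	while (old >= 0) {
		/* On failure, old is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&mod->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

static void model_module_put(struct model_module *mod)
{
	atomic_fetch_sub(&mod->refcnt, 1);
}

/* Models the start of module removal: refused while references are held. */
static bool model_begin_removal(struct model_module *mod)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&mod->refcnt, &expected,
					      MODEL_GOING);
}

/* Returns 0 if the operation ran, -1 if it yielded to module removal. */
static int model_sysfs_op(struct model_node *kn, void (*op)(void))
{
	assert(kn->owner);

	atomic_fetch_add(&kn->active, 1);	/* models kernfs_get_active() */
	if (!model_try_module_get(kn->owner)) {
		atomic_fetch_sub(&kn->active, 1); /* models kernfs_put_active() */
		return -1;
	}
	if (op)
		op();	/* the driver's show/store would run here */
	model_module_put(kn->owner);
	atomic_fetch_sub(&kn->active, 1);
	return 0;
}
```

In this model an operation that arrives after removal has begun returns
without ever touching the driver's lock, and removal cannot begin while an
operation still holds a module reference.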

This deadlock was first reported with the zram driver. The live patching
folks have since acknowledged that they have observed it as well, when a
live patch is removed. I was then able to reproduce it easily by creating
a dedicated selftest for it.

A sketch of how this can happen follows. Consider foo, a local mutex
that is part of a driver and is used in the driver's module exit routine
and in one of its sysfs ops:

foo.c:
static DEFINE_MUTEX(foo);
static ssize_t foo_store(struct device *dev,
			 struct device_attribute *attr,
			 const char *buf, size_t count)
{
	...
	mutex_lock(&foo);
	...
	mutex_unlock(&foo);
	...
}
static DEVICE_ATTR_RW(foo);
...
void foo_exit(void)
{
	mutex_lock(&foo);
	...
	mutex_unlock(&foo);
}
module_exit(foo_exit);

And this can lead to this condition:

CPU A                              CPU B
                                   foo_store()
foo_exit()
  mutex_lock(&foo)
                                   mutex_lock(&foo)
   del_gendisk(some_struct->disk);
     device_del()
       device_remove_groups()

In this situation foo_store() is waiting for the mutex foo to
become unlocked, but that won't happen until module removal is complete.
But module removal won't complete until the sysfs operation in flight
completes, and that operation is waiting for a lock that is already held.
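
The circular wait above can be written down as a tiny wait-for graph. This
is an illustrative sketch only: CPU_A, CPU_B, NOBODY and
model_in_wait_cycle() are hypothetical names, and the kernel does not
detect this condition this way. Each entry records which actor the given
actor is blocked on.

```c
#include <stdbool.h>

enum { CPU_A, CPU_B, NR_ACTORS };	/* foo_exit() and foo_store() */
#define NOBODY (-1)			/* actor is not blocked */

/* Follows waits_for[] edges from start; true if they lead back to start. */
static bool model_in_wait_cycle(const int waits_for[NR_ACTORS], int start)
{
	int cur = waits_for[start];

	for (int steps = 0; steps < NR_ACTORS && cur != NOBODY; steps++) {
		if (cur == start)
			return true;
		cur = waits_for[cur];
	}
	return false;
}
```

Without the fix the graph is { CPU_B, CPU_A }: foo_exit() waits for the
sysfs operation to finish, and foo_store() waits for the mutex held by
foo_exit(). If foo_store() instead backs out once removal has begun, the
graph becomes { CPU_B, NOBODY } and the cycle is gone.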

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |  4 +-
 fs/kernfs/dir.c                        | 44 ++++++++++++++++++----
 fs/kernfs/file.c                       |  6 ++-
 fs/kernfs/kernfs-internal.h            |  3 +-
 fs/kernfs/symlink.c                    |  3 +-
 fs/sysfs/dir.c                         |  2 +-
 fs/sysfs/file.c                        |  6 ++-
 fs/sysfs/group.c                       |  3 +-
 include/linux/kernfs.h                 | 14 ++++---
 include/linux/sysfs.h                  | 52 ++++++++++++++++++++------
 kernel/cgroup/cgroup.c                 |  2 +-
 11 files changed, 105 insertions(+), 34 deletions(-)

Message ID	20210927163805.808907-10-mcgrof@kernel.org (mailing list archive)
State	New, archived
From: Luis Chamberlain <mcgrof@kernel.org>
To: tj@kernel.org, gregkh@linuxfoundation.org, akpm@linux-foundation.org,
    minchan@kernel.org, jeyu@kernel.org, shuah@kernel.org
Cc: bvanassche@acm.org, dan.j.williams@intel.com, joe@perches.com,
    tglx@linutronix.de, mcgrof@kernel.org, keescook@chromium.org,
    rostedt@goodmis.org, linux-spdx@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-block@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, linux-kselftest@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: [PATCH v8 09/12] sysfs: fix deadlock race with module removal
Date: Mon, 27 Sep 2021 09:38:02 -0700
Message-Id: <20210927163805.808907-10-mcgrof@kernel.org>
In-Reply-To: <20210927163805.808907-1-mcgrof@kernel.org>
References: <20210927163805.808907-1-mcgrof@kernel.org>
Series	syfs: generic deadlock fix with module removal
  [v8,00/12] syfs: generic deadlock fix with module removal
  [v8,01/12] LICENSES: Add the copyleft-next-0.3.1 license
  [v8,02/12] testing: use the copyleft-next-0.3.1 SPDX tag
  [v8,03/12] selftests: add tests_sysfs module
  [v8,04/12] kernfs: add initial failure injection support
  [v8,05/12] test_sysfs: add support to use kernfs failure injection
  [v8,06/12] kernel/module: add documentation for try_module_get()
  [v8,07/12] fs/kernfs/symlink.c: replace S_IRWXUGO with 0777 on kernfs_create_link()
  [v8,08/12] fs/sysfs/dir.c: replace S_IRWXU|S_IRUGO|S_IXUGO with 0755 sysfs_create_dir_ns()
  [v8,09/12] sysfs: fix deadlock race with module removal
  [v8,10/12] test_sysfs: enable deadlock tests by default
  [v8,11/12] zram: fix crashes with cpu hotplug multistate
  [v8,12/12] zram: use ATTRIBUTE_GROUPS to fix sysfs deadlock module removal
