diff mbox series

[v2] block: genhd: don't call probe function with major_names_lock held

Message ID f790f8fb-5758-ea4e-a527-0ee4af82dd44@i-love.sakura.ne.jp (mailing list archive)
State New, archived
Headers show
Series [v2] block: genhd: don't call probe function with major_names_lock held | expand

Commit Message

Tetsuo Handa June 19, 2021, 1:05 a.m. UTC
syzbot is reporting circular locking problem at blk_request_module() [1],
for blk_request_module() is calling probe function with major_names_lock
held while major_names_lock is held during module's __init and __exit
functions.

                                         loop_exit() {
                                           mutex_lock(&loop_ctl_mutex);
  blk_request_module() {
    mutex_lock(&major_names_lock);
    loop_probe() {
      mutex_lock(&loop_ctl_mutex); // Blocked by loop_exit()
      mutex_unlock(&loop_ctl_mutex);
    }
    mutex_unlock(&major_names_lock);
                                           unregister_blkdev() {
                                             mutex_lock(&major_names_lock); // Blocked by blk_request_module()
                                             mutex_unlock(&major_names_lock);
                                           }
                                           mutex_unlock(&loop_ctl_mutex);
  }
                                         }

Based on an assumption that a probe callback passed to __register_blkdev()
belongs to a module which calls __register_blkdev(), drop major_names_lock
before calling probe function by holding a reference to that module which
contains that probe function. If there is a module where this assumption
does not hold, such module can call ____register_blkdev() directly.

  blk_request_module() {
    mutex_lock(&major_names_lock);
    // Block loop_exit()
    mutex_unlock(&major_names_lock);
    loop_probe() {
      mutex_lock(&loop_ctl_mutex);
      mutex_unlock(&loop_ctl_mutex);
    }
    // Unblock loop_exit()
  }
                                         loop_exit() {
                                           mutex_lock(&loop_ctl_mutex);
                                           unregister_blkdev() {
                                             mutex_lock(&major_names_lock);
                                             mutex_unlock(&major_names_lock);
                                           }
                                           mutex_unlock(&loop_ctl_mutex);
                                         }

Note that regardless of this patch, it is up to probe function to
serialize module's __init function and probe function in that module
by using e.g. a mutex. This patch simply makes sure that module's __exit
function won't be called when probe function is about to be called.

While Desmond Cheong Zhi Xi already proposed a patch for breaking ABCD
circular dependency [2], I consider that this patch is still needed for
safely breaking AB-BA dependency upon module unloading. (Note that syzbot
does not test module unloading code because the syzbot kernels are built
with almost all modules built-in. We need manual inspection.)

By doing kmalloc(GFP_KERNEL) in ____register_blkdev() before holding
major_names_lock, we could convert major_names_lock from a mutex to
a spinlock. But that is beyond the scope of this patch.

Link: https://syzkaller.appspot.com/bug?id=7bd106c28e846d1023d4ca915718b1a0905444cb [1]
Link: https://lkml.kernel.org/r/20210617160904.570111-1-desmondcheongzx@gmail.com [2]
Reported-by: syzbot <syzbot+6a8a0d93c91e8fbf2e80@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot <syzbot+6a8a0d93c91e8fbf2e80@syzkaller.appspotmail.com>
---
 block/genhd.c         | 36 +++++++++++++++++++++++++++---------
 include/linux/genhd.h |  8 +++++---
 2 files changed, 32 insertions(+), 12 deletions(-)

Comments

kernel test robot June 19, 2021, 3:24 a.m. UTC | #1
Hi Tetsuo,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linux/master]
[also build test WARNING on linus/master v5.13-rc6]
[cannot apply to block/for-next next-20210618]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Tetsuo-Handa/block-genhd-don-t-call-probe-function-with-major_names_lock-held/20210619-090731
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git dd860052c99b1e088352bdd4fb7aef46f8d2ef47
config: um-x86_64_defconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/1de14b707f1a3e49fa4412b1eb8391f09747a005
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Tetsuo-Handa/block-genhd-don-t-call-probe-function-with-major_names_lock-held/20210619-090731
        git checkout 1de14b707f1a3e49fa4412b1eb8391f09747a005
        # save the attached .config to linux build tree
        make W=1 ARCH=um SUBARCH=x86_64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> block/genhd.c:223: warning: expecting prototype for __register_blkdev(). Prototype was for ____register_blkdev() instead


vim +223 block/genhd.c

^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  196  
9e8c0bccdc944b block/genhd.c         Márton Németh         2009-02-20  197  /**
e2b6b301871719 block/genhd.c         Christoph Hellwig     2020-11-14  198   * __register_blkdev - register a new block device
9e8c0bccdc944b block/genhd.c         Márton Németh         2009-02-20  199   *
f33ff110ef31bd block/genhd.c         Srivatsa S. Bhat      2018-02-05  200   * @major: the requested major device number [1..BLKDEV_MAJOR_MAX-1]. If
f33ff110ef31bd block/genhd.c         Srivatsa S. Bhat      2018-02-05  201   *         @major = 0, try to allocate any unused major number.
9e8c0bccdc944b block/genhd.c         Márton Németh         2009-02-20  202   * @name: the name of the new block device as a zero terminated string
1de14b707f1a3e block/genhd.c         Tetsuo Handa          2021-06-19  203   * @probe: callback that is called on access to any minor number of @major
1de14b707f1a3e block/genhd.c         Tetsuo Handa          2021-06-19  204   * @owner: the owner of @probe function (i.e. THIS_MODULE or NULL).
9e8c0bccdc944b block/genhd.c         Márton Németh         2009-02-20  205   *
9e8c0bccdc944b block/genhd.c         Márton Németh         2009-02-20  206   * The @name must be unique within the system.
9e8c0bccdc944b block/genhd.c         Márton Németh         2009-02-20  207   *
0e056eb5530da8 block/genhd.c         Mauro Carvalho Chehab 2017-03-30  208   * The return value depends on the @major input parameter:
0e056eb5530da8 block/genhd.c         Mauro Carvalho Chehab 2017-03-30  209   *
f33ff110ef31bd block/genhd.c         Srivatsa S. Bhat      2018-02-05  210   *  - if a major device number was requested in range [1..BLKDEV_MAJOR_MAX-1]
f33ff110ef31bd block/genhd.c         Srivatsa S. Bhat      2018-02-05  211   *    then the function returns zero on success, or a negative error code
9e8c0bccdc944b block/genhd.c         Márton Németh         2009-02-20  212   *  - if any unused major number was requested with @major = 0 parameter
9e8c0bccdc944b block/genhd.c         Márton Németh         2009-02-20  213   *    then the return value is the allocated major number in range
f33ff110ef31bd block/genhd.c         Srivatsa S. Bhat      2018-02-05  214   *    [1..BLKDEV_MAJOR_MAX-1] or a negative error code otherwise
f33ff110ef31bd block/genhd.c         Srivatsa S. Bhat      2018-02-05  215   *
f33ff110ef31bd block/genhd.c         Srivatsa S. Bhat      2018-02-05  216   * See Documentation/admin-guide/devices.txt for the list of allocated
f33ff110ef31bd block/genhd.c         Srivatsa S. Bhat      2018-02-05  217   * major numbers.
e2b6b301871719 block/genhd.c         Christoph Hellwig     2020-11-14  218   *
e2b6b301871719 block/genhd.c         Christoph Hellwig     2020-11-14  219   * Use register_blkdev instead for any new code.
9e8c0bccdc944b block/genhd.c         Márton Németh         2009-02-20  220   */
1de14b707f1a3e block/genhd.c         Tetsuo Handa          2021-06-19  221  int ____register_blkdev(unsigned int major, const char *name,
1de14b707f1a3e block/genhd.c         Tetsuo Handa          2021-06-19  222  			void (*probe)(dev_t devt), struct module *owner)
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16 @223  {
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  224  	struct blk_major_name **n, *p;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  225  	int index, ret = 0;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  226  
e49fbbbf0aa14f block/genhd.c         Christoph Hellwig     2020-10-29  227  	mutex_lock(&major_names_lock);
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  228  
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  229  	/* temporary */
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  230  	if (major == 0) {
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  231  		for (index = ARRAY_SIZE(major_names)-1; index > 0; index--) {
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  232  			if (major_names[index] == NULL)
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  233  				break;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  234  		}
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  235  
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  236  		if (index == 0) {
dfc76d11dd455a block/genhd.c         Keyur Patel           2019-02-17  237  			printk("%s: failed to get major for %s\n",
dfc76d11dd455a block/genhd.c         Keyur Patel           2019-02-17  238  			       __func__, name);
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  239  			ret = -EBUSY;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  240  			goto out;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  241  		}
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  242  		major = index;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  243  		ret = major;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  244  	}
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  245  
133d55cdb2f1f9 block/genhd.c         Logan Gunthorpe       2017-06-16  246  	if (major >= BLKDEV_MAJOR_MAX) {
dfc76d11dd455a block/genhd.c         Keyur Patel           2019-02-17  247  		pr_err("%s: major requested (%u) is greater than the maximum (%u) for %s\n",
dfc76d11dd455a block/genhd.c         Keyur Patel           2019-02-17  248  		       __func__, major, BLKDEV_MAJOR_MAX-1, name);
133d55cdb2f1f9 block/genhd.c         Logan Gunthorpe       2017-06-16  249  
133d55cdb2f1f9 block/genhd.c         Logan Gunthorpe       2017-06-16  250  		ret = -EINVAL;
133d55cdb2f1f9 block/genhd.c         Logan Gunthorpe       2017-06-16  251  		goto out;
133d55cdb2f1f9 block/genhd.c         Logan Gunthorpe       2017-06-16  252  	}
133d55cdb2f1f9 block/genhd.c         Logan Gunthorpe       2017-06-16  253  
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  254  	p = kmalloc(sizeof(struct blk_major_name), GFP_KERNEL);
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  255  	if (p == NULL) {
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  256  		ret = -ENOMEM;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  257  		goto out;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  258  	}
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  259  
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  260  	p->major = major;
a160c6159d4a0c block/genhd.c         Christoph Hellwig     2020-10-29  261  	p->probe = probe;
1de14b707f1a3e block/genhd.c         Tetsuo Handa          2021-06-19  262  	p->owner = owner;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  263  	strlcpy(p->name, name, sizeof(p->name));
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  264  	p->next = NULL;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  265  	index = major_to_index(major);
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  266  
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  267  	for (n = &major_names[index]; *n; n = &(*n)->next) {
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  268  		if ((*n)->major == major)
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  269  			break;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  270  	}
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  271  	if (!*n)
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  272  		*n = p;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  273  	else
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  274  		ret = -EBUSY;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  275  
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  276  	if (ret < 0) {
f33ff110ef31bd block/genhd.c         Srivatsa S. Bhat      2018-02-05  277  		printk("register_blkdev: cannot get major %u for %s\n",
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  278  		       major, name);
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  279  		kfree(p);
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  280  	}
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  281  out:
e49fbbbf0aa14f block/genhd.c         Christoph Hellwig     2020-10-29  282  	mutex_unlock(&major_names_lock);
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  283  	return ret;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  284  }
1de14b707f1a3e block/genhd.c         Tetsuo Handa          2021-06-19  285  EXPORT_SYMBOL(____register_blkdev);
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  286  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
kernel test robot June 19, 2021, 6:14 a.m. UTC | #2
Hi Tetsuo,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linux/master]
[also build test WARNING on linus/master v5.13-rc6]
[cannot apply to block/for-next next-20210618]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Tetsuo-Handa/block-genhd-don-t-call-probe-function-with-major_names_lock-held/20210619-090731
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git dd860052c99b1e088352bdd4fb7aef46f8d2ef47
config: x86_64-randconfig-a004-20210618 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project d1baf2895467735ab14f4b3415fce204c0cc8e7f)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/1de14b707f1a3e49fa4412b1eb8391f09747a005
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Tetsuo-Handa/block-genhd-don-t-call-probe-function-with-major_names_lock-held/20210619-090731
        git checkout 1de14b707f1a3e49fa4412b1eb8391f09747a005
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> block/genhd.c:223: warning: expecting prototype for __register_blkdev(). Prototype was for ____register_blkdev() instead


vim +223 block/genhd.c

^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  196  
9e8c0bccdc944b block/genhd.c         Márton Németh         2009-02-20  197  /**
e2b6b301871719 block/genhd.c         Christoph Hellwig     2020-11-14  198   * __register_blkdev - register a new block device
9e8c0bccdc944b block/genhd.c         Márton Németh         2009-02-20  199   *
f33ff110ef31bd block/genhd.c         Srivatsa S. Bhat      2018-02-05  200   * @major: the requested major device number [1..BLKDEV_MAJOR_MAX-1]. If
f33ff110ef31bd block/genhd.c         Srivatsa S. Bhat      2018-02-05  201   *         @major = 0, try to allocate any unused major number.
9e8c0bccdc944b block/genhd.c         Márton Németh         2009-02-20  202   * @name: the name of the new block device as a zero terminated string
1de14b707f1a3e block/genhd.c         Tetsuo Handa          2021-06-19  203   * @probe: callback that is called on access to any minor number of @major
1de14b707f1a3e block/genhd.c         Tetsuo Handa          2021-06-19  204   * @owner: the owner of @probe function (i.e. THIS_MODULE or NULL).
9e8c0bccdc944b block/genhd.c         Márton Németh         2009-02-20  205   *
9e8c0bccdc944b block/genhd.c         Márton Németh         2009-02-20  206   * The @name must be unique within the system.
9e8c0bccdc944b block/genhd.c         Márton Németh         2009-02-20  207   *
0e056eb5530da8 block/genhd.c         Mauro Carvalho Chehab 2017-03-30  208   * The return value depends on the @major input parameter:
0e056eb5530da8 block/genhd.c         Mauro Carvalho Chehab 2017-03-30  209   *
f33ff110ef31bd block/genhd.c         Srivatsa S. Bhat      2018-02-05  210   *  - if a major device number was requested in range [1..BLKDEV_MAJOR_MAX-1]
f33ff110ef31bd block/genhd.c         Srivatsa S. Bhat      2018-02-05  211   *    then the function returns zero on success, or a negative error code
9e8c0bccdc944b block/genhd.c         Márton Németh         2009-02-20  212   *  - if any unused major number was requested with @major = 0 parameter
9e8c0bccdc944b block/genhd.c         Márton Németh         2009-02-20  213   *    then the return value is the allocated major number in range
f33ff110ef31bd block/genhd.c         Srivatsa S. Bhat      2018-02-05  214   *    [1..BLKDEV_MAJOR_MAX-1] or a negative error code otherwise
f33ff110ef31bd block/genhd.c         Srivatsa S. Bhat      2018-02-05  215   *
f33ff110ef31bd block/genhd.c         Srivatsa S. Bhat      2018-02-05  216   * See Documentation/admin-guide/devices.txt for the list of allocated
f33ff110ef31bd block/genhd.c         Srivatsa S. Bhat      2018-02-05  217   * major numbers.
e2b6b301871719 block/genhd.c         Christoph Hellwig     2020-11-14  218   *
e2b6b301871719 block/genhd.c         Christoph Hellwig     2020-11-14  219   * Use register_blkdev instead for any new code.
9e8c0bccdc944b block/genhd.c         Márton Németh         2009-02-20  220   */
1de14b707f1a3e block/genhd.c         Tetsuo Handa          2021-06-19  221  int ____register_blkdev(unsigned int major, const char *name,
1de14b707f1a3e block/genhd.c         Tetsuo Handa          2021-06-19  222  			void (*probe)(dev_t devt), struct module *owner)
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16 @223  {
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  224  	struct blk_major_name **n, *p;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  225  	int index, ret = 0;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  226  
e49fbbbf0aa14f block/genhd.c         Christoph Hellwig     2020-10-29  227  	mutex_lock(&major_names_lock);
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  228  
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  229  	/* temporary */
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  230  	if (major == 0) {
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  231  		for (index = ARRAY_SIZE(major_names)-1; index > 0; index--) {
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  232  			if (major_names[index] == NULL)
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  233  				break;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  234  		}
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  235  
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  236  		if (index == 0) {
dfc76d11dd455a block/genhd.c         Keyur Patel           2019-02-17  237  			printk("%s: failed to get major for %s\n",
dfc76d11dd455a block/genhd.c         Keyur Patel           2019-02-17  238  			       __func__, name);
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  239  			ret = -EBUSY;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  240  			goto out;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  241  		}
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  242  		major = index;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  243  		ret = major;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  244  	}
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  245  
133d55cdb2f1f9 block/genhd.c         Logan Gunthorpe       2017-06-16  246  	if (major >= BLKDEV_MAJOR_MAX) {
dfc76d11dd455a block/genhd.c         Keyur Patel           2019-02-17  247  		pr_err("%s: major requested (%u) is greater than the maximum (%u) for %s\n",
dfc76d11dd455a block/genhd.c         Keyur Patel           2019-02-17  248  		       __func__, major, BLKDEV_MAJOR_MAX-1, name);
133d55cdb2f1f9 block/genhd.c         Logan Gunthorpe       2017-06-16  249  
133d55cdb2f1f9 block/genhd.c         Logan Gunthorpe       2017-06-16  250  		ret = -EINVAL;
133d55cdb2f1f9 block/genhd.c         Logan Gunthorpe       2017-06-16  251  		goto out;
133d55cdb2f1f9 block/genhd.c         Logan Gunthorpe       2017-06-16  252  	}
133d55cdb2f1f9 block/genhd.c         Logan Gunthorpe       2017-06-16  253  
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  254  	p = kmalloc(sizeof(struct blk_major_name), GFP_KERNEL);
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  255  	if (p == NULL) {
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  256  		ret = -ENOMEM;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  257  		goto out;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  258  	}
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  259  
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  260  	p->major = major;
a160c6159d4a0c block/genhd.c         Christoph Hellwig     2020-10-29  261  	p->probe = probe;
1de14b707f1a3e block/genhd.c         Tetsuo Handa          2021-06-19  262  	p->owner = owner;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  263  	strlcpy(p->name, name, sizeof(p->name));
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  264  	p->next = NULL;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  265  	index = major_to_index(major);
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  266  
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  267  	for (n = &major_names[index]; *n; n = &(*n)->next) {
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  268  		if ((*n)->major == major)
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  269  			break;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  270  	}
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  271  	if (!*n)
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  272  		*n = p;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  273  	else
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  274  		ret = -EBUSY;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  275  
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  276  	if (ret < 0) {
f33ff110ef31bd block/genhd.c         Srivatsa S. Bhat      2018-02-05  277  		printk("register_blkdev: cannot get major %u for %s\n",
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  278  		       major, name);
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  279  		kfree(p);
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  280  	}
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  281  out:
e49fbbbf0aa14f block/genhd.c         Christoph Hellwig     2020-10-29  282  	mutex_unlock(&major_names_lock);
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  283  	return ret;
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  284  }
1de14b707f1a3e block/genhd.c         Tetsuo Handa          2021-06-19  285  EXPORT_SYMBOL(____register_blkdev);
^1da177e4c3f41 drivers/block/genhd.c Linus Torvalds        2005-04-16  286  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Greg Kroah-Hartman June 19, 2021, 6:44 a.m. UTC | #3
On Sat, Jun 19, 2021 at 10:05:44AM +0900, Tetsuo Handa wrote:
> syzbot is reporting circular locking problem at blk_request_module() [1],
> for blk_request_module() is calling probe function with major_names_lock
> held while major_names_lock is held during module's __init and __exit
> functions.
> 
>                                          loop_exit() {
>                                            mutex_lock(&loop_ctl_mutex);
>   blk_request_module() {
>     mutex_lock(&major_names_lock);
>     loop_probe() {
>       mutex_lock(&loop_ctl_mutex); // Blocked by loop_exit()
>       mutex_unlock(&loop_ctl_mutex);
>     }
>     mutex_unlock(&major_names_lock);
>                                            unregister_blkdev() {
>                                              mutex_lock(&major_names_lock); // Blocked by blk_request_module()
>                                              mutex_unlock(&major_names_lock);
>                                            }
>                                            mutex_unlock(&loop_ctl_mutex);
>   }
>                                          }
> 
> Based on an assumption that a probe callback passed to __register_blkdev()
> belongs to a module which calls __register_blkdev(), drop major_names_lock
> before calling probe function by holding a reference to that module which
> contains that probe function. If there is a module where this assumption
> does not hold, such module can call ____register_blkdev() directly.
> 
>   blk_request_module() {
>     mutex_lock(&major_names_lock);
>     // Block loop_exit()
>     mutex_unlock(&major_names_lock);
>     loop_probe() {
>       mutex_lock(&loop_ctl_mutex);
>       mutex_unlock(&loop_ctl_mutex);
>     }
>     // Unblock loop_exit()
>   }
>                                          loop_exit() {
>                                            mutex_lock(&loop_ctl_mutex);
>                                            unregister_blkdev() {
>                                              mutex_lock(&major_names_lock);
>                                              mutex_unlock(&major_names_lock);
>                                            }
>                                            mutex_unlock(&loop_ctl_mutex);
>                                          }
> 
> Note that regardless of this patch, it is up to probe function to
> serialize module's __init function and probe function in that module
> by using e.g. a mutex. This patch simply makes sure that module's __exit
> function won't be called when probe function is about to be called.

module init functions ARE serialized.

> 
> While Desmond Cheong Zhi Xi already proposed a patch for breaking ABCD
> circular dependency [2], I consider that this patch is still needed for
> safely breaking AB-BA dependency upon module unloading. (Note that syzbot
> does not test module unloading code because the syzbot kernels are built
> with almost all modules built-in. We need manual inspection.)
> 
> By doing kmalloc(GFP_KERNEL) in ____register_blkdev() before holding
> major_names_lock, we could convert major_names_lock from a mutex to
> a spinlock. But that is beyond the scope of this patch.
> 
> Link: https://syzkaller.appspot.com/bug?id=7bd106c28e846d1023d4ca915718b1a0905444cb [1]
> Link: https://lkml.kernel.org/r/20210617160904.570111-1-desmondcheongzx@gmail.com [2]
> Reported-by: syzbot <syzbot+6a8a0d93c91e8fbf2e80@syzkaller.appspotmail.com>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Tested-by: syzbot <syzbot+6a8a0d93c91e8fbf2e80@syzkaller.appspotmail.com>
> ---
>  block/genhd.c         | 36 +++++++++++++++++++++++++++---------
>  include/linux/genhd.h |  8 +++++---
>  2 files changed, 32 insertions(+), 12 deletions(-)
> 
> diff --git a/block/genhd.c b/block/genhd.c
> index 9f8cb7beaad1..9577c70a6bd3 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -169,6 +169,7 @@ static struct blk_major_name {
>  	int major;
>  	char name[16];
>  	void (*probe)(dev_t devt);
> +	struct module *owner;

The module owner should not matter here.


>  } *major_names[BLKDEV_MAJOR_HASH_SIZE];
>  static DEFINE_MUTEX(major_names_lock);
>  
> @@ -197,7 +198,8 @@ void blkdev_show(struct seq_file *seqf, off_t offset)
>   * @major: the requested major device number [1..BLKDEV_MAJOR_MAX-1]. If
>   *         @major = 0, try to allocate any unused major number.
>   * @name: the name of the new block device as a zero terminated string
> - * @probe: allback that is called on access to any minor number of @major
> + * @probe: callback that is called on access to any minor number of @major
> + * @owner: the owner of @probe function (i.e. THIS_MODULE or NULL).

This feels wrong.

>   *
>   * The @name must be unique within the system.
>   *
> @@ -214,8 +216,8 @@ void blkdev_show(struct seq_file *seqf, off_t offset)
>   *
>   * Use register_blkdev instead for any new code.
>   */
> -int __register_blkdev(unsigned int major, const char *name,
> -		void (*probe)(dev_t devt))
> +int ____register_blkdev(unsigned int major, const char *name,
> +			void (*probe)(dev_t devt), struct module *owner)

That's just a crazy naming scheme, please no.

>  {
>  	struct blk_major_name **n, *p;
>  	int index, ret = 0;
> @@ -255,6 +257,7 @@ int __register_blkdev(unsigned int major, const char *name,
>  
>  	p->major = major;
>  	p->probe = probe;
> +	p->owner = owner;
>  	strlcpy(p->name, name, sizeof(p->name));
>  	p->next = NULL;
>  	index = major_to_index(major);
> @@ -277,7 +280,7 @@ int __register_blkdev(unsigned int major, const char *name,
>  	mutex_unlock(&major_names_lock);
>  	return ret;
>  }
> -EXPORT_SYMBOL(__register_blkdev);
> +EXPORT_SYMBOL(____register_blkdev);

That's crazy.

>  
>  void unregister_blkdev(unsigned int major, const char *name)
>  {
> @@ -676,14 +679,29 @@ void blk_request_module(dev_t devt)
>  {
>  	unsigned int major = MAJOR(devt);
>  	struct blk_major_name **n;
> +	void (*probe_fn)(dev_t devt);
>  
>  	mutex_lock(&major_names_lock);
>  	for (n = &major_names[major_to_index(major)]; *n; n = &(*n)->next) {
> -		if ((*n)->major == major && (*n)->probe) {
> -			(*n)->probe(devt);
> -			mutex_unlock(&major_names_lock);
> -			return;
> -		}
> +		if ((*n)->major != major || !(*n)->probe)
> +			continue;
> +		if (!try_module_get((*n)->owner))
> +			break;

So you just fail?  Are you sure that is ok?


> +		/*
> +		 * Calling probe function with major_names_lock held causes
> +		 * circular locking dependency problem. Thus, call it after
> +		 * releasing major_names_lock.
> +		 */
> +		probe_fn = (*n)->probe;
> +		mutex_unlock(&major_names_lock);

So you save a pointer off and then unlock?  Why does that lock matter?

> +		/*
> +		 * Assuming that unregister_blkdev() is called from module's
> +		 * __exit function, a module refcount taken above allows us
> +		 * to safely call probe function without major_names_lock held.
> +		 */
> +		probe_fn(devt);
> +		module_put((*n)->owner);

So you are trying to keep the module in memory while the probe is call,
ok.  But module removal is not an issue, you remove modules at your own
risk.  As syzbot isn't even testing this, why is this an issue?


> +		return;
>  	}
>  	mutex_unlock(&major_names_lock);
>  
> diff --git a/include/linux/genhd.h b/include/linux/genhd.h
> index 6fc26f7bdf71..070b73c043e6 100644
> --- a/include/linux/genhd.h
> +++ b/include/linux/genhd.h
> @@ -277,10 +277,12 @@ extern void put_disk(struct gendisk *disk);
>  
>  #define alloc_disk(minors) alloc_disk_node(minors, NUMA_NO_NODE)
>  
> -int __register_blkdev(unsigned int major, const char *name,
> -		void (*probe)(dev_t devt));
> +int ____register_blkdev(unsigned int major, const char *name,
> +			void (*probe)(dev_t devt), struct module *owner);
> +#define __register_blkdev(major, name, probe) \
> +	____register_blkdev(major, name, probe, THIS_MODULE)

This just feels wrong, you should only need 1 level deep of __ at most.

thanks,

greg k-h
Tetsuo Handa June 19, 2021, 8:47 a.m. UTC | #4
On 2021/06/19 15:44, Greg KH wrote:
>> Note that regardless of this patch, it is up to probe function to
>> serialize module's __init function and probe function in that module
>> by using e.g. a mutex. This patch simply makes sure that module's __exit
>> function won't be called when probe function is about to be called.
> 
> module init functions ARE serialized.

The __init functions between module foo and module bar are serialized.
But the __init function for module foo and the probe function for module
foo are not serialized.

> The module owner should not matter here.

The __exit functions between module foo and module bar are serialized.
But the __exit function for module bar and the probe function for module
bar are not serialized.

You can observe a deadlock via the following steps.

(1) Build loop.ko with CONFIG_BLK_DEV_LOOP_MIN_COUNT=0 and
    a delay injection patch shown below.

----------
diff --git a/block/genhd.c b/block/genhd.c
index 9f8cb7beaad1..fe0360dc8c5d 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -680,6 +680,11 @@ void blk_request_module(dev_t devt)
 	mutex_lock(&major_names_lock);
 	for (n = &major_names[major_to_index(major)]; *n; n = &(*n)->next) {
 		if ((*n)->major == major && (*n)->probe) {
+			if (!strcmp(current->comm, "bash")) {
+				pr_info("sleep injection start\n");
+				schedule_timeout_killable(10 * HZ); // Call "rmmod" here.
+				pr_info("sleep injection end\n");
+			}
 			(*n)->probe(devt);
 			mutex_unlock(&major_names_lock);
 			return;
----------

(2) Run the following commands from bash shell.

# modprobe loop
# mknod /dev/loop0 b 7 0
# exec 100<>/dev/loop0 & sleep 1; rmmod loop

(3) See dmesg output.

----------
[   32.260467][ T2873] loop: module loaded
[   32.289961][ T2880] sleep injection start
[   42.484039][ T2880] sleep injection end
[   42.484218][ T2880]
[   42.484248][ T2880] ======================================================
[   42.484381][ T2880] WARNING: possible circular locking dependency detected
[   42.484455][ T2880] 5.13.0-rc6+ #867 Not tainted
[   42.484508][ T2880] ------------------------------------------------------
[   42.484579][ T2880] bash/2880 is trying to acquire lock:
[   42.484638][ T2880] ffffffffc075b468 (loop_ctl_mutex){+.+.}-{3:3}, at: loop_probe+0x44/0x90 [loop]
[   42.484760][ T2880]
[   42.484760][ T2880] but task is already holding lock:
[   42.484836][ T2880] ffffffff873ffe28 (major_names_lock){+.+.}-{3:3}, at: blk_request_module+0x1f/0x100
[   42.484950][ T2880]
[   42.484950][ T2880] which lock already depends on the new lock.
[   42.484950][ T2880]
[   42.485053][ T2880]
[   42.485053][ T2880] the existing dependency chain (in reverse order) is:
[   42.485144][ T2880]
[   42.485144][ T2880] -> #1 (major_names_lock){+.+.}-{3:3}:
[   42.485230][ T2880]        lock_acquire+0xb3/0x380
[   42.485292][ T2880]        __mutex_lock+0x89/0x8f0
[   42.485350][ T2880]        mutex_lock_nested+0x16/0x20
[   42.485410][ T2880]        unregister_blkdev+0x38/0xb0
[   42.485469][ T2880]        loop_exit+0x44/0xd84 [loop]
[   42.485534][ T2880]        __x64_sys_delete_module+0x135/0x210
[   42.485602][ T2880]        do_syscall_64+0x36/0x70
[   42.485660][ T2880]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[   42.485733][ T2880]
[   42.485733][ T2880] -> #0 (loop_ctl_mutex){+.+.}-{3:3}:
[   42.485817][ T2880]        check_prevs_add+0x16a/0x1040
[   42.487091][ T2880]        __lock_acquire+0x118b/0x1580
[   42.488427][ T2880]        lock_acquire+0xb3/0x380
[   42.489701][ T2880]        __mutex_lock+0x89/0x8f0
[   42.490960][ T2880]        mutex_lock_nested+0x16/0x20
[   42.492374][ T2880]        loop_probe+0x44/0x90 [loop]
[   42.493756][ T2880]        blk_request_module+0xa3/0x100
[   42.495207][ T2880]        blkdev_get_no_open+0x93/0xc0
[   42.496516][ T2880]        blkdev_get_by_dev+0x56/0x200
[   42.497735][ T2880]        blkdev_open+0x5d/0xa0
[   42.498919][ T2880]        do_dentry_open+0x163/0x3b0
[   42.500157][ T2880]        vfs_open+0x28/0x30
[   42.501312][ T2880]        path_openat+0x7e6/0xad0
[   42.502443][ T2880]        do_filp_open+0x9f/0x110
[   42.503592][ T2880]        do_sys_openat2+0x245/0x310
[   42.504647][ T2880]        do_sys_open+0x48/0x80
[   42.505689][ T2880]        __x64_sys_open+0x1c/0x20
[   42.506730][ T2880]        do_syscall_64+0x36/0x70
[   42.507795][ T2880]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[   42.508890][ T2880]
[   42.508890][ T2880] other info that might help us debug this:
[   42.508890][ T2880]
[   42.511436][ T2880]  Possible unsafe locking scenario:
[   42.511436][ T2880]
[   42.512303][ T2880]        CPU0                    CPU1
[   42.512727][ T2880]        ----                    ----
[   42.513143][ T2880]   lock(major_names_lock);
[   42.513557][ T2880]                                lock(loop_ctl_mutex);
[   42.513987][ T2880]                                lock(major_names_lock);
[   42.514417][ T2880]   lock(loop_ctl_mutex);
[   42.514841][ T2880]
[   42.514841][ T2880]  *** DEADLOCK ***
[   42.514841][ T2880]
[   42.516112][ T2880] 1 lock held by bash/2880:
[   42.516518][ T2880]  #0: ffffffff873ffe28 (major_names_lock){+.+.}-{3:3}, at: blk_request_module+0x1f/0x100
[   42.517383][ T2880]
[   42.517383][ T2880] stack backtrace:
[   42.518228][ T2880] CPU: 3 PID: 2880 Comm: bash Not tainted 5.13.0-rc6+ #867
[   42.518679][ T2880] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
[   42.519650][ T2880] Call Trace:
[   42.520128][ T2880]  dump_stack+0x76/0x95
[   42.520608][ T2880]  print_circular_bug.isra.50.cold.71+0x13c/0x141
[   42.521105][ T2880]  check_noncircular+0xfe/0x110
[   42.521607][ T2880]  check_prevs_add+0x16a/0x1040
[   42.522106][ T2880]  __lock_acquire+0x118b/0x1580
[   42.522606][ T2880]  lock_acquire+0xb3/0x380
[   42.523244][ T2880]  ? loop_probe+0x44/0x90 [loop]
[   42.523753][ T2880]  __mutex_lock+0x89/0x8f0
[   42.524250][ T2880]  ? loop_probe+0x44/0x90 [loop]
[   42.524749][ T2880]  ? loop_probe+0x44/0x90 [loop]
[   42.525290][ T2880]  ? blkdev_get_by_dev+0x200/0x200
[   42.525790][ T2880]  ? vprintk_default+0x18/0x20
[   42.526286][ T2880]  ? vprintk+0x56/0x130
[   42.526797][ T2880]  ? blkdev_get_by_dev+0x200/0x200
[   42.527286][ T2880]  mutex_lock_nested+0x16/0x20
[   42.527768][ T2880]  ? mutex_lock_nested+0x16/0x20
[   42.528250][ T2880]  loop_probe+0x44/0x90 [loop]
[   42.528730][ T2880]  blk_request_module+0xa3/0x100
[   42.529209][ T2880]  blkdev_get_no_open+0x93/0xc0
[   42.529691][ T2880]  blkdev_get_by_dev+0x56/0x200
[   42.530176][ T2880]  ? blkdev_get_by_dev+0x200/0x200
[   42.530688][ T2880]  blkdev_open+0x5d/0xa0
[   42.531170][ T2880]  do_dentry_open+0x163/0x3b0
[   42.531656][ T2880]  vfs_open+0x28/0x30
[   42.532132][ T2880]  path_openat+0x7e6/0xad0
[   42.532612][ T2880]  do_filp_open+0x9f/0x110
[   42.533094][ T2880]  ? _raw_spin_unlock+0x1d/0x30
[   42.533582][ T2880]  ? alloc_fd+0x116/0x200
[   42.534069][ T2880]  do_sys_openat2+0x245/0x310
[   42.534566][ T2880]  do_sys_open+0x48/0x80
[   42.535015][ T2880]  __x64_sys_open+0x1c/0x20
[   42.535462][ T2880]  do_syscall_64+0x36/0x70
[   42.535913][ T2880]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   42.536364][ T2880] RIP: 0033:0x7f16f15737f0
[   42.536797][ T2880] Code: 48 8b 15 83 76 2d 00 f7 d8 64 89 02 48 83 c8 ff c3 66 0f 1f 84 00 00 00 00 00 83 3d cd d7 2d 00 00 75 10 b8 02 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 1e cf 01 00 48 89 04 24
[   42.538148][ T2880] RSP: 002b:00007ffde9e2df58 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
[   42.538671][ T2880] RAX: ffffffffffffffda RBX: 00000000021b5020 RCX: 00007f16f15737f0
[   42.539173][ T2880] RDX: 00000000000001b6 RSI: 0000000000000042 RDI: 00000000021bd310
[   42.539681][ T2880] RBP: 00007ffde9e2dfe0 R08: 0000000000000020 R09: 00000000021bd310
[   42.540187][ T2880] R10: 00000000fffffff0 R11: 0000000000000246 R12: 000000000000000b
[   42.540696][ T2880] R13: 0000000000000001 R14: 0000000000000064 R15: 0000000000000000
[  246.772906][   T35] INFO: task bash:2880 blocked for more than 122 seconds.
[  246.774289][   T35]       Not tainted 5.13.0-rc6+ #867
[  246.775478][   T35] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.776734][   T35] task:bash            state:D stack:    0 pid: 2880 ppid:  2856 flags:0x00004000
[  246.779550][   T35] Call Trace:
[  246.781054][   T35]  __schedule+0x271/0x9e0
[  246.782381][   T35]  schedule+0x50/0xc0
[  246.783671][   T35]  schedule_preempt_disabled+0x10/0x20
[  246.784978][   T35]  __mutex_lock+0x467/0x8f0
[  246.786277][   T35]  ? loop_probe+0x44/0x90 [loop]
[  246.787582][   T35]  ? blkdev_get_by_dev+0x200/0x200
[  246.789093][   T35]  ? blkdev_get_by_dev+0x200/0x200
[  246.790409][   T35]  mutex_lock_nested+0x16/0x20
[  246.791700][   T35]  ? mutex_lock_nested+0x16/0x20
[  246.792992][   T35]  loop_probe+0x44/0x90 [loop]
[  246.794286][   T35]  blk_request_module+0xa3/0x100
[  246.795757][   T35]  blkdev_get_no_open+0x93/0xc0
[  246.797066][   T35]  blkdev_get_by_dev+0x56/0x200
[  246.798360][   T35]  ? blkdev_get_by_dev+0x200/0x200
[  246.799661][   T35]  blkdev_open+0x5d/0xa0
[  246.800953][   T35]  do_dentry_open+0x163/0x3b0
[  246.802258][   T35]  vfs_open+0x28/0x30
[  246.803559][   T35]  path_openat+0x7e6/0xad0
[  246.805201][   T35]  do_filp_open+0x9f/0x110
[  246.807128][   T35]  ? _raw_spin_unlock+0x1d/0x30
[  246.808461][   T35]  ? alloc_fd+0x116/0x200
[  246.809763][   T35]  do_sys_openat2+0x245/0x310
[  246.811034][   T35]  do_sys_open+0x48/0x80
[  246.812257][   T35]  __x64_sys_open+0x1c/0x20
[  246.813416][   T35]  do_syscall_64+0x36/0x70
[  246.814522][   T35]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  246.815624][   T35] RIP: 0033:0x7f16f15737f0
[  246.816707][   T35] RSP: 002b:00007ffde9e2df58 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
[  246.817863][   T35] RAX: ffffffffffffffda RBX: 00000000021b5020 RCX: 00007f16f15737f0
[  246.819033][   T35] RDX: 00000000000001b6 RSI: 0000000000000042 RDI: 00000000021bd310
[  246.820059][   T35] RBP: 00007ffde9e2dfe0 R08: 0000000000000020 R09: 00000000021bd310
[  246.820687][   T35] R10: 00000000fffffff0 R11: 0000000000000246 R12: 000000000000000b
[  246.821292][   T35] R13: 0000000000000001 R14: 0000000000000064 R15: 0000000000000000
[  246.821882][   T35] INFO: task rmmod:2882 blocked for more than 122 seconds.
[  246.822456][   T35]       Not tainted 5.13.0-rc6+ #867
[  246.823011][   T35] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.823599][   T35] task:rmmod           state:D stack:    0 pid: 2882 ppid:  2856 flags:0x00000000
[  246.824799][   T35] Call Trace:
[  246.825405][   T35]  __schedule+0x271/0x9e0
[  246.826018][   T35]  schedule+0x50/0xc0
[  246.826621][   T35]  schedule_preempt_disabled+0x10/0x20
[  246.827232][   T35]  __mutex_lock+0x467/0x8f0
[  246.827854][   T35]  ? unregister_blkdev+0x38/0xb0
[  246.828608][   T35]  mutex_lock_nested+0x16/0x20
[  246.829237][   T35]  ? mutex_lock_nested+0x16/0x20
[  246.829858][   T35]  unregister_blkdev+0x38/0xb0
[  246.830478][   T35]  loop_exit+0x44/0xd84 [loop]
[  246.831112][   T35]  __x64_sys_delete_module+0x135/0x210
[  246.831749][   T35]  do_syscall_64+0x36/0x70
[  246.832379][   T35]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  246.833019][   T35] RIP: 0033:0x7ff4990fcee7
[  246.833655][   T35] RSP: 002b:00007ffe445ecab8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[  246.834326][   T35] RAX: ffffffffffffffda RBX: 0000000000a21220 RCX: 00007ff4990fcee7
[  246.835011][   T35] RDX: 00007ff499171a80 RSI: 0000000000000800 RDI: 0000000000a21288
[  246.835794][   T35] RBP: 0000000000000000 R08: 00007ff4993c6060 R09: 00007ff499171a80
[  246.836501][   T35] R10: 00007ffe445ec520 R11: 0000000000000206 R12: 00007ffe445ee840
[  246.837206][   T35] R13: 0000000000000000 R14: 0000000000a21220 R15: 0000000000a21010
[  246.837922][   T35] INFO: lockdep is turned off.
----------

Christoph and Desmond commented

  > > And now you can all probe after it has been freed and/or the module has
  > > been unloaded. The obviously correct fix is to only hold mtd_table_mutex
  > > for the actually required critical section:
  > > 
  > 
  > Thank you for the correction, Christoph. I hadn't thought of the 
  > scenario where the module is unloaded. I'll be more conscientious in the 
  > future.

at https://lkml.kernel.org/r/YMs3O/ce1567cf-bc94-790c-cfc0-e4e429e1a86a@gmail.com
but the fact that AB-BA deadlock is possible was forgotten. Therefore, I propose
this patch for fixing commit a160c6159d4a0cf8 ("block: add an optional probe callback
to major_names") which started calling the probe function without making sure that
the module will not go away.

>> @@ -676,14 +679,29 @@ void blk_request_module(dev_t devt)
>>  {
>>  	unsigned int major = MAJOR(devt);
>>  	struct blk_major_name **n;
>> +	void (*probe_fn)(dev_t devt);
>>  
>>  	mutex_lock(&major_names_lock);
>>  	for (n = &major_names[major_to_index(major)]; *n; n = &(*n)->next) {
>> -		if ((*n)->major == major && (*n)->probe) {
>> -			(*n)->probe(devt);
>> -			mutex_unlock(&major_names_lock);
>> -			return;
>> -		}
>> +		if ((*n)->major != major || !(*n)->probe)
>> +			continue;
>> +		if (!try_module_get((*n)->owner))
>> +			break;
> 
> So you just fail?  Are you sure that is ok?

After "break;", the control reaches request_module() (which waits for module
unloading to complete and then tries to load that module again).

I think failing open() for once due to racing with module unloading is acceptable
(as long as there is no leak/deadlock/oops etc.), but do we want to immediately
retry this loop after returning from request_module() ?

Also, to make sure that the __init function for module foo and the probe
function for module foo are serialized, should we also verify that
(*n)->owner && (*n)->owner->state == MODULE_STATE_LIVE (which indicates that
the __init function for that module has completed) because module_is_live()
 from try_module_get() is rather weak?

----------
/* FIXME: It'd be nice to isolate modules during init, too, so they
   aren't used before they (may) fail.  But presently too much code
   (IDE & SCSI) require entry into the module during init.*/
static inline bool module_is_live(struct module *mod)
{
	return mod->state != MODULE_STATE_GOING;
}
----------

>> +		/*
>> +		 * Assuming that unregister_blkdev() is called from module's
>> +		 * __exit function, a module refcount taken above allows us
>> +		 * to safely call probe function without major_names_lock held.
>> +		 */
>> +		probe_fn(devt);
>> +		module_put((*n)->owner);
> 
> So you are trying to keep the module in memory while the probe is call,
> ok.  But module removal is not an issue, you remove modules at your own
> risk.  As syzbot isn't even testing this, why is this an issue?

That's a crazy comment. A module removal bug (unless forced unloading is used)
is an issue. Why not fix bugs where syzbot cannot test?
Tetsuo Handa June 20, 2021, 1:54 p.m. UTC | #5
On 2021/06/20 11:44, Hillf Danton wrote:
> Good craft in regard to triggering the ABBA deadlock, but curious why not
> move unregister_blkdev out of and before loop_ctl_mutex, given it will also
> serialise with the prober.
> 

Well, something like this untested diff?

Call unregister_blkdev() as soon as __exit function starts, for calling
probe function after cleanup started will be unexpected for __exit function.

Keep probe function no-op until __init function ends, for probe function
might be called as soon as __register_blkdev() succeeded but calling probe
function before setup completes will be unexpected for __init function.

 drivers/block/ataflop.c |    6 +++++-
 drivers/block/brd.c     |    8 ++++++--
 drivers/block/floppy.c  |    4 ++++
 drivers/block/loop.c    |    4 ++--
 drivers/ide/ide-probe.c |    8 +++++++-
 drivers/md/md.c         |    5 +++++
 drivers/scsi/sd.c       |   10 +---------
 7 files changed, 30 insertions(+), 15 deletions(-)

diff --git a/drivers/block/ataflop.c b/drivers/block/ataflop.c
index d601e49f80e0..3681e8c493b1 100644
--- a/drivers/block/ataflop.c
+++ b/drivers/block/ataflop.c
@@ -1995,6 +1995,7 @@ static int ataflop_alloc_disk(unsigned int drive, unsigned int type)
 }
 
 static DEFINE_MUTEX(ataflop_probe_lock);
+static bool module_initialize_completed;
 
 static void ataflop_probe(dev_t dev)
 {
@@ -2006,6 +2007,8 @@ static void ataflop_probe(dev_t dev)
 
 	if (drive >= FD_MAX_UNITS || type >= NUM_DISK_MINORS)
 		return;
+	if (!module_initialize_completed)
+		return;
 	mutex_lock(&ataflop_probe_lock);
 	if (!unit[drive].disk[type]) {
 		if (ataflop_alloc_disk(drive, type) == 0)
@@ -2080,6 +2083,7 @@ static int __init atari_floppy_init (void)
 	       UseTrackbuffer ? "" : "no ");
 	config_types();
 
+	module_initialize_completed = true;
 	return 0;
 
 err:
@@ -2138,6 +2142,7 @@ static void __exit atari_floppy_exit(void)
 {
 	int i, type;
 
+	unregister_blkdev(FLOPPY_MAJOR, "fd");
 	for (i = 0; i < FD_MAX_UNITS; i++) {
 		for (type = 0; type < NUM_DISK_MINORS; type++) {
 			if (!unit[i].disk[type])
@@ -2148,7 +2153,6 @@ static void __exit atari_floppy_exit(void)
 		}
 		blk_mq_free_tag_set(&unit[i].tag_set);
 	}
-	unregister_blkdev(FLOPPY_MAJOR, "fd");
 
 	del_timer_sync(&fd_timer);
 	atari_stram_free( DMABuffer );
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 7562cf30b14e..91a10c938e65 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -371,6 +371,7 @@ __setup("ramdisk_size=", ramdisk_size);
 static LIST_HEAD(brd_devices);
 static DEFINE_MUTEX(brd_devices_mutex);
 static struct dentry *brd_debugfs_dir;
+static bool module_initialize_completed;
 
 static struct brd_device *brd_alloc(int i)
 {
@@ -439,6 +440,8 @@ static void brd_probe(dev_t dev)
 	struct brd_device *brd;
 	int i = MINOR(dev) / max_part;
 
+	if (!module_initialize_completed)
+		return;
 	mutex_lock(&brd_devices_mutex);
 	list_for_each_entry(brd, &brd_devices, brd_list) {
 		if (brd->brd_number == i)
@@ -530,6 +533,7 @@ static int __init brd_init(void)
 	mutex_unlock(&brd_devices_mutex);
 
 	pr_info("brd: module loaded\n");
+	module_initialize_completed = true;
 	return 0;
 
 out_free:
@@ -550,13 +554,13 @@ static void __exit brd_exit(void)
 {
 	struct brd_device *brd, *next;
 
+	unregister_blkdev(RAMDISK_MAJOR, "ramdisk");
+
 	debugfs_remove_recursive(brd_debugfs_dir);
 
 	list_for_each_entry_safe(brd, next, &brd_devices, brd_list)
 		brd_del_one(brd);
 
-	unregister_blkdev(RAMDISK_MAJOR, "ramdisk");
-
 	pr_info("brd: module unloaded\n");
 }
 
diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index 8a9d22207c59..37b8e53c183d 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -4523,6 +4523,7 @@ static int floppy_alloc_disk(unsigned int drive, unsigned int type)
 }
 
 static DEFINE_MUTEX(floppy_probe_lock);
+static bool module_initialize_completed;
 
 static void floppy_probe(dev_t dev)
 {
@@ -4533,6 +4534,8 @@ static void floppy_probe(dev_t dev)
 	    type >= ARRAY_SIZE(floppy_type))
 		return;
 
+	if (!module_initialize_completed)
+		return;
 	mutex_lock(&floppy_probe_lock);
 	if (!disks[drive][type]) {
 		if (floppy_alloc_disk(drive, type) == 0)
@@ -4705,6 +4708,7 @@ static int __init do_floppy_init(void)
 				NULL);
 	}
 
+	module_initialize_completed = true;
 	return 0;
 
 out_remove_drives:
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 76e12f3482a9..08aef61ab791 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -2386,13 +2386,13 @@ static int loop_exit_cb(int id, void *ptr, void *data)
 
 static void __exit loop_exit(void)
 {
+	unregister_blkdev(LOOP_MAJOR, "loop");
+
 	mutex_lock(&loop_ctl_mutex);
 
 	idr_for_each(&loop_index_idr, &loop_exit_cb, NULL);
 	idr_destroy(&loop_index_idr);
 
-	unregister_blkdev(LOOP_MAJOR, "loop");
-
 	misc_deregister(&loop_misc);
 
 	mutex_unlock(&loop_ctl_mutex);
diff --git a/drivers/ide/ide-probe.c b/drivers/ide/ide-probe.c
index aefd74c0d862..8c71356cdcfe 100644
--- a/drivers/ide/ide-probe.c
+++ b/drivers/ide/ide-probe.c
@@ -40,6 +40,8 @@
 #include <linux/uaccess.h>
 #include <asm/io.h>
 
+static bool module_initialize_completed;
+
 /**
  *	generic_id		-	add a generic drive id
  *	@drive:	drive to make an ID block for
@@ -904,6 +906,8 @@ static int init_irq (ide_hwif_t *hwif)
 
 static void ata_probe(dev_t dev)
 {
+	if (!module_initialize_completed)
+		return;
 	request_module("ide-disk");
 	request_module("ide-cd");
 	request_module("ide-tape");
@@ -1475,6 +1479,8 @@ int ide_host_register(struct ide_host *host, const struct ide_port_info *d,
 		}
 	}
 
+	if (j)
+		module_initialize_completed = true;
 	return j ? 0 : -1;
 }
 EXPORT_SYMBOL_GPL(ide_host_register);
@@ -1539,6 +1545,7 @@ EXPORT_SYMBOL_GPL(ide_port_unregister_devices);
 
 static void ide_unregister(ide_hwif_t *hwif)
 {
+	unregister_blkdev(hwif->major, hwif->name);
 	mutex_lock(&ide_cfg_mtx);
 
 	if (hwif->present) {
@@ -1559,7 +1566,6 @@ static void ide_unregister(ide_hwif_t *hwif)
 	 * Remove us from the kernel's knowledge
 	 */
 	kfree(hwif->sg_table);
-	unregister_blkdev(hwif->major, hwif->name);
 
 	ide_release_dma_engine(hwif);
 
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 49f897fbb89b..6603900062bc 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -68,6 +68,8 @@
 #include "md-bitmap.h"
 #include "md-cluster.h"
 
+static bool module_initialize_completed;
+
 /* pers_list is a list of registered personalities protected
  * by pers_lock.
  * pers_lock does extra service to protect accesses to
@@ -5776,6 +5778,8 @@ static void md_probe(dev_t dev)
 {
 	if (MAJOR(dev) == MD_MAJOR && MINOR(dev) >= 512)
 		return;
+	if (!module_initialize_completed)
+		return;
 	if (create_on_open)
 		md_alloc(dev, NULL);
 }
@@ -9590,6 +9594,7 @@ static int __init md_init(void)
 	raid_table_header = register_sysctl_table(raid_root_table);
 
 	md_geninit();
+	module_initialize_completed = true;
 	return 0;
 
 err_mdp:
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index cb3c37d1e009..4fc8f4db2ccf 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -629,14 +629,6 @@ static struct scsi_driver sd_template = {
 	.eh_reset		= sd_eh_reset,
 };
 
-/*
- * Don't request a new module, as that could deadlock in multipath
- * environment.
- */
-static void sd_default_probe(dev_t devt)
-{
-}
-
 /*
  * Device no to disk mapping:
  * 
@@ -3715,7 +3707,7 @@ static int __init init_sd(void)
 	SCSI_LOG_HLQUEUE(3, printk("init_sd: sd driver entry point\n"));
 
 	for (i = 0; i < SD_MAJORS; i++) {
-		if (__register_blkdev(sd_major(i), "sd", sd_default_probe))
+		if (register_blkdev(sd_major(i), "sd"))
 			continue;
 		majors++;
 	}
Christoph Hellwig June 21, 2021, 6:18 a.m. UTC | #6
The obvious fix is to not call unregister_blkdev under loop_ctl_mutex.
Which is pretty pointless anyway - I have a series to fix that but it
might need some reordering to move this to the front.
Greg Kroah-Hartman June 21, 2021, 8:54 a.m. UTC | #7
On Sun, Jun 20, 2021 at 10:54:20PM +0900, Tetsuo Handa wrote:
> On 2021/06/20 11:44, Hillf Danton wrote:
> > Good craft in regard to triggering the ABBA deadlock, but curious why not
> > move unregister_blkdev out of and before loop_ctl_mutex, given it will also
> > serialise with the prober.
> > 
> 
> Well, something like this untested diff?
> 
> Call unregister_blkdev() as soon as __exit function starts, for calling
> probe function after cleanup started will be unexpected for __exit function.
> 
> Keep probe function no-op until __init function ends, for probe function
> might be called as soon as __register_blkdev() succeeded but calling probe
> function before setup completes will be unexpected for __init function.
> 
>  drivers/block/ataflop.c |    6 +++++-
>  drivers/block/brd.c     |    8 ++++++--
>  drivers/block/floppy.c  |    4 ++++
>  drivers/block/loop.c    |    4 ++--
>  drivers/ide/ide-probe.c |    8 +++++++-
>  drivers/md/md.c         |    5 +++++
>  drivers/scsi/sd.c       |   10 +---------
>  7 files changed, 30 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/block/ataflop.c b/drivers/block/ataflop.c
> index d601e49f80e0..3681e8c493b1 100644
> --- a/drivers/block/ataflop.c
> +++ b/drivers/block/ataflop.c
> @@ -1995,6 +1995,7 @@ static int ataflop_alloc_disk(unsigned int drive, unsigned int type)
>  }
>  
>  static DEFINE_MUTEX(ataflop_probe_lock);
> +static bool module_initialize_completed;

This is almost always wrong.

>  
>  static void ataflop_probe(dev_t dev)
>  {
> @@ -2006,6 +2007,8 @@ static void ataflop_probe(dev_t dev)
>  
>  	if (drive >= FD_MAX_UNITS || type >= NUM_DISK_MINORS)
>  		return;
> +	if (!module_initialize_completed)
> +		return;

This is not correct, when you register a callback structure, it can be
instantly called.  Do not expect it to be delayed until later.

thanks,

greg k-h
diff mbox series

Patch

diff --git a/block/genhd.c b/block/genhd.c
index 9f8cb7beaad1..9577c70a6bd3 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -169,6 +169,7 @@  static struct blk_major_name {
 	int major;
 	char name[16];
 	void (*probe)(dev_t devt);
+	struct module *owner;
 } *major_names[BLKDEV_MAJOR_HASH_SIZE];
 static DEFINE_MUTEX(major_names_lock);
 
@@ -197,7 +198,8 @@  void blkdev_show(struct seq_file *seqf, off_t offset)
  * @major: the requested major device number [1..BLKDEV_MAJOR_MAX-1]. If
  *         @major = 0, try to allocate any unused major number.
  * @name: the name of the new block device as a zero terminated string
- * @probe: allback that is called on access to any minor number of @major
+ * @probe: callback that is called on access to any minor number of @major
+ * @owner: the owner of @probe function (i.e. THIS_MODULE or NULL).
  *
  * The @name must be unique within the system.
  *
@@ -214,8 +216,8 @@  void blkdev_show(struct seq_file *seqf, off_t offset)
  *
  * Use register_blkdev instead for any new code.
  */
-int __register_blkdev(unsigned int major, const char *name,
-		void (*probe)(dev_t devt))
+int ____register_blkdev(unsigned int major, const char *name,
+			void (*probe)(dev_t devt), struct module *owner)
 {
 	struct blk_major_name **n, *p;
 	int index, ret = 0;
@@ -255,6 +257,7 @@  int __register_blkdev(unsigned int major, const char *name,
 
 	p->major = major;
 	p->probe = probe;
+	p->owner = owner;
 	strlcpy(p->name, name, sizeof(p->name));
 	p->next = NULL;
 	index = major_to_index(major);
@@ -277,7 +280,7 @@  int __register_blkdev(unsigned int major, const char *name,
 	mutex_unlock(&major_names_lock);
 	return ret;
 }
-EXPORT_SYMBOL(__register_blkdev);
+EXPORT_SYMBOL(____register_blkdev);
 
 void unregister_blkdev(unsigned int major, const char *name)
 {
@@ -676,14 +679,29 @@  void blk_request_module(dev_t devt)
 {
 	unsigned int major = MAJOR(devt);
 	struct blk_major_name **n;
+	void (*probe_fn)(dev_t devt);
 
 	mutex_lock(&major_names_lock);
 	for (n = &major_names[major_to_index(major)]; *n; n = &(*n)->next) {
-		if ((*n)->major == major && (*n)->probe) {
-			(*n)->probe(devt);
-			mutex_unlock(&major_names_lock);
-			return;
-		}
+		if ((*n)->major != major || !(*n)->probe)
+			continue;
+		if (!try_module_get((*n)->owner))
+			break;
+		/*
+		 * Calling probe function with major_names_lock held causes
+		 * circular locking dependency problem. Thus, call it after
+		 * releasing major_names_lock.
+		 */
+		probe_fn = (*n)->probe;
+		mutex_unlock(&major_names_lock);
+		/*
+		 * Assuming that unregister_blkdev() is called from module's
+		 * __exit function, a module refcount taken above allows us
+		 * to safely call probe function without major_names_lock held.
+		 */
+		probe_fn(devt);
+		module_put((*n)->owner);
+		return;
 	}
 	mutex_unlock(&major_names_lock);
 
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 6fc26f7bdf71..070b73c043e6 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -277,10 +277,12 @@  extern void put_disk(struct gendisk *disk);
 
 #define alloc_disk(minors) alloc_disk_node(minors, NUMA_NO_NODE)
 
-int __register_blkdev(unsigned int major, const char *name,
-		void (*probe)(dev_t devt));
+int ____register_blkdev(unsigned int major, const char *name,
+			void (*probe)(dev_t devt), struct module *owner);
+#define __register_blkdev(major, name, probe) \
+	____register_blkdev(major, name, probe, THIS_MODULE)
 #define register_blkdev(major, name) \
-	__register_blkdev(major, name, NULL)
+	____register_blkdev(major, name, NULL, NULL)
 void unregister_blkdev(unsigned int major, const char *name);
 
 bool bdev_check_media_change(struct block_device *bdev);