blkdev loop UAF

Message ID 20180111170008.24crzej4utoh5s3p@quack2.suse.cz (mailing list archive)
State New, archived

Commit Message

Jan Kara Jan. 11, 2018, 5 p.m. UTC
On Thu 11-01-18 19:22:39, Hou Tao wrote:
> Hi,
> 
> On 2018/1/11 16:24, Dan Carpenter wrote:
> > Thanks for your report and the patch.  I am sending it to the
> > linux-block devs since it's already public.
> > 
> > regards,
> > dan carpenter
> 
> The use-after-free problem is not specific to the loop device; it can also
> be reproduced on SCSI devices, and there are more race problems caused by
> the race between bdev open and gendisk shutdown [1].
> 
> The cause of the UAF is that two instances of gendisk end up sharing the
> same bdev. After the process owning the new gendisk increments
> bdev->bd_openers, the other process, which owns the older gendisk, finds
> bdev->bd_openers non-zero, so it puts what is actually the last reference
> of the older gendisk and causes the use-after-free.
> 
> I had proposed a patch for the problem, but it's still an incomplete fix
> for the race between gendisk shutdown and bdev opening.
> 
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 4a181fc..5ecdb9f 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -1510,6 +1510,11 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
>                 if (bdev->bd_bdi == &noop_backing_dev_info)
>                         bdev->bd_bdi = bdi_get(disk->queue->backing_dev_info);
>         } else {
> +               if (bdev->bd_disk != disk) {
> +                       ret = -ENXIO;
> +                       goto out_unlock_bdev;
> +               }
> +
>                 if (bdev->bd_contains == bdev) {
>                         ret = 0;
>                         if (bdev->bd_disk->fops->open)
> 
> 
> As far as I know, Jan Kara is working on these problems. So, Jan, any suggestions?

Yeah, thanks for the ping. I was working today to get this fixed. I have
about 6 patches (attached) which should fix all the outstanding issues I'm
aware of.  So far they are only boot tested. I will give them more
stress-testing next week and then post them officially but if you have time,
feel free to try them out as well. Thanks!

								Honza

Comments

Eric Biggers Feb. 2, 2018, 3:58 a.m. UTC | #1
On Thu, Jan 11, 2018 at 06:00:08PM +0100, Jan Kara wrote:
> [...]
> 
> Yeah, thanks for the ping. I was working today to get this fixed. I have
> about 6 patches (attached) which should fix all the outstanding issues I'm
> aware of.  So far they are only boot tested. I will give them more
> stress-testing next week and then post them officially but if you have time,
> feel free to try them out as well. Thanks!
> 

Jan, are you still planning to fix this?  This was also reported by syzbot here:
https://groups.google.com/d/msg/syzkaller-bugs/9wXx4YULITQ/0SK8vqxhAgAJ.  Note
that the C reproducer attached to that report doesn't work for me anymore
following ae6650163c6 ("loop: fix concurrent lo_open/lo_release").  However,
syzbot has still hit this on more recent kernels.  Here's a C reproducer that
still works for me on Linus' tree as of today, though it can take 5-10 seconds:

#include <fcntl.h>
#include <linux/loop.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main()
{
	int fd = open("/dev/loop-control", 0);

	/* Child 1: repeatedly re-create the loop0 device. */
	if (!fork()) {
		for (;;)
			ioctl(fd, LOOP_CTL_ADD, 0);
	}

	/* Child 2: repeatedly tear the loop0 device down. */
	if (!fork()) {
		for (;;)
			ioctl(fd, LOOP_CTL_REMOVE, 0);
	}

	/* Parent plus one more child race to open/close the device. */
	fork();
	for (;;) {
		fd = open("/dev/loop0", O_EXCL);
		close(fd);
	}
}
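
(To run it: compile with gcc and execute as root on a kernel built with
CONFIG_KASAN; creating and removing loop devices needs root, and KASAN is
what produces the report below.)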

This is the KASAN splat I get from the above:

BUG: KASAN: use-after-free in disk_unblock_events+0x3e/0x40 block/genhd.c:1672
Read of size 8 at addr ffff880030743670 by task systemd-udevd/165

CPU: 1 PID: 165 Comm: systemd-udevd Not tainted 4.15.0-08279-gfe53d1443a146 #38
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0xb3/0x145 lib/dump_stack.c:53
perf: interrupt took too long (7334 > 4816), lowering kernel.perf_event_max_sample_rate to 27000
 print_address_description+0x73/0x290 mm/kasan/report.c:252
 kasan_report_error mm/kasan/report.c:351 [inline]
 kasan_report+0x235/0x350 mm/kasan/report.c:409
 disk_unblock_events+0x3e/0x40 block/genhd.c:1672
 __blkdev_get+0x41b/0xd30 fs/block_dev.c:1535
 blkdev_get+0x283/0x980 fs/block_dev.c:1591
 do_dentry_open.isra.1+0x6c0/0xaf0 fs/open.c:752
 do_last fs/namei.c:3378 [inline]
 path_openat+0x587/0x28b0 fs/namei.c:3518
 do_filp_open+0x231/0x3a0 fs/namei.c:3553
 do_sys_open+0x399/0x610 fs/open.c:1059
 do_syscall_64+0x213/0x740 arch/x86/entry/common.c:285
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f51d23262ce
RSP: 002b:00007ffdf1844310 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000562878961d30 RCX: 00007f51d23262ce
RDX: 00000000000a0800 RSI: 0000562878974860 RDI: ffffffffffffff9c
RBP: 0000562878965150 R08: 00007f51d1c9d914 R09: ffffffffffffffb0
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000e
R13: 0000562878965150 R14: 0000000000000004 R15: 0000000000000003

Allocated by task 160:
 kmem_cache_alloc_node_trace include/linux/slab.h:413 [inline]
 kmalloc_node include/linux/slab.h:537 [inline]
 kzalloc_node include/linux/slab.h:699 [inline]
 __alloc_disk_node+0xb6/0x450 block/genhd.c:1415
 loop_add+0x3fd/0x9e0 drivers/block/loop.c:1814
 loop_probe+0x152/0x180 drivers/block/loop.c:1921
 kobj_lookup+0x176/0x310 drivers/base/map.c:124
 get_gendisk+0x28/0x290 block/genhd.c:804
 __blkdev_get+0x2ec/0xd30 fs/block_dev.c:1433
 blkdev_get+0x485/0x980 fs/block_dev.c:1591
 do_dentry_open.isra.1+0x6c0/0xaf0 fs/open.c:752
 do_last fs/namei.c:3378 [inline]
 path_openat+0x587/0x28b0 fs/namei.c:3518
 do_filp_open+0x231/0x3a0 fs/namei.c:3553
 do_sys_open+0x399/0x610 fs/open.c:1059
 entry_SYSCALL_64_fastpath+0x1e/0x8b

Freed by task 165:
 __cache_free mm/slab.c:3484 [inline]
 kfree+0x8a/0xd0 mm/slab.c:3799
 disk_release+0x2c3/0x380 block/genhd.c:1264
 device_release+0x6e/0x1d0 drivers/base/core.c:811
 kobject_cleanup lib/kobject.c:646 [inline]
 kobject_release lib/kobject.c:675 [inline]
 kref_put include/linux/kref.h:70 [inline]
 kobject_put+0x14c/0x400 lib/kobject.c:692
 __blkdev_get+0x39f/0xd30 fs/block_dev.c:1528
 blkdev_get+0x283/0x980 fs/block_dev.c:1591
 do_dentry_open.isra.1+0x6c0/0xaf0 fs/open.c:752
 do_last fs/namei.c:3378 [inline]
 path_openat+0x587/0x28b0 fs/namei.c:3518
 do_filp_open+0x231/0x3a0 fs/namei.c:3553
 do_sys_open+0x399/0x610 fs/open.c:1059
 do_syscall_64+0x213/0x740 arch/x86/entry/common.c:285
 return_from_SYSCALL_64+0x0/0x75

The buggy address belongs to the object at ffff880030743280
 which belongs to the cache kmalloc-2048 of size 2048
The buggy address is located 1008 bytes inside of
 2048-byte region [ffff880030743280, ffff880030743a80)
The buggy address belongs to the page:
page:ffffea0000a99670 count:1 mapcount:0 mapping:ffff880030742180 index:0x0 compound_mapcount: 0
flags: 0x100000000008100(slab|head)
raw: 0100000000008100 ffff880030742180 0000000000000000 0000000100000003
raw: ffffea0000b895a0 ffffea0000b91720 ffff880036000800
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff880030743500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff880030743580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff880030743600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                             ^
 ffff880030743680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff880030743700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Jan Kara Feb. 5, 2018, 8:56 a.m. UTC | #2
On Thu 01-02-18 19:58:45, Eric Biggers wrote:
> [...]
> 
> Jan, are you still planning to fix this?  This was also reported by syzbot here:
> https://groups.google.com/d/msg/syzkaller-bugs/9wXx4YULITQ/0SK8vqxhAgAJ.  Note
> that the C reproducer attached to that report doesn't work for me anymore
> following ae6650163c6 ("loop: fix concurrent lo_open/lo_release").  However,
> syzbot has still hit this on more recent kernels.  Here's a C reproducer that
> still works for me on Linus' tree as of today, though it can take 5-10 seconds:

Yes, I do plan to post the patches. I was just struggling to reproduce the
original failure (plus I had quite a bit of other work and some vacation),
so I couldn't really test whether my patches fix the reported problem.
Thanks for your reproducer; I'll try it out and see whether my patches fix
the problem.

								Honza

> [...]

Patch

From f560a27dd0938300a6bb5b16d9164e29b5abf980 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Thu, 11 Jan 2018 17:29:03 +0100
Subject: [PATCH 6/6] blockdev: Avoid two active bdev inodes for one device

When blkdev_open() races with device removal and creation, it can happen
that an unhashed bdev inode gets associated with a newly created gendisk
like this:

CPU0					CPU1
blkdev_open()
  bdev = bd_acquire()
					del_gendisk()
					  bdev_unhash_inode(bdev);
					remove device
					create new device with the same number
  __blkdev_get()
    disk = get_gendisk()
      - gets reference to gendisk of the new device

Now another blkdev_open() will not find the original 'bdev' as it got
unhashed, so it creates a new one and associates it with the same 'disk',
at which point problems start as we have two independent page caches for
one device.

Fix the problem by verifying that the bdev inode didn't get unhashed
before we acquired the gendisk reference. That way we make sure a gendisk
can get associated only with visible bdev inodes.
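
With this check in place, an open that races with device removal now fails
with -ENXIO (both call sites converted below bail out when
bdev_get_gendisk() returns NULL) instead of attaching the stale bdev to
the new gendisk.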

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/block_dev.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index fe41a76769fa..fe09ef9c21f3 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1058,6 +1058,27 @@  static int bd_prepare_to_claim(struct block_device *bdev,
 	return 0;
 }
 
+static struct gendisk *bdev_get_gendisk(struct block_device *bdev, int *partno)
+{
+	struct gendisk *disk = get_gendisk(bdev->bd_dev, partno);
+
+	if (!disk)
+		return NULL;
+	/*
+	 * Now that we hold a gendisk reference, make sure the bdev we looked
+	 * up is not stale. If it is, the device got removed and re-created
+	 * before we looked up the gendisk, and we fail the open in that case.
+	 * Associating an unhashed bdev with a newly created gendisk could
+	 * lead to two bdevs (and thus two independent caches) being
+	 * associated with one device, which is bad.
+	 */
+	if (inode_unhashed(bdev->bd_inode)) {
+		put_disk_and_module(disk);
+		return NULL;
+	}
+	return disk;
+}
+
 /**
  * bd_start_claiming - start claiming a block device
  * @bdev: block device of interest
@@ -1094,7 +1115,7 @@  static struct block_device *bd_start_claiming(struct block_device *bdev,
 	 * @bdev might not have been initialized properly yet, look up
 	 * and grab the outer block device the hard way.
 	 */
-	disk = get_gendisk(bdev->bd_dev, &partno);
+	disk = bdev_get_gendisk(bdev, &partno);
 	if (!disk)
 		return ERR_PTR(-ENXIO);
 
@@ -1429,7 +1450,7 @@  static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
  restart:
 
 	ret = -ENXIO;
-	disk = get_gendisk(bdev->bd_dev, &partno);
+	disk = bdev_get_gendisk(bdev, &partno);
 	if (!disk)
 		goto out;
 
-- 
2.13.6