Message ID | 20211210120631.2578505-2-lijinlin3@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Series | Fix undefined behaviour during device synchronization |
On 12/10/21 1:06 PM, Li Jinlin wrote:
> UBSAN reports this problem:
>
> [ 5984.281385] UBSAN: Undefined behaviour in drivers/md/md.c:8175:15
> [ 5984.281390] signed integer overflow:
> [ 5984.281393] -2147483291 - 2072033152 cannot be represented in type 'int'
> [ 5984.281400] CPU: 25 PID: 1854 Comm: md101_resync Kdump: loaded Not tainted 4.19.90
> [ 5984.281404] Hardware name: Huawei TaiShan 200 (Model 5280)/BC82AMDDA
> [ 5984.281406] Call trace:
> [ 5984.281415]  dump_backtrace+0x0/0x310
> [ 5984.281418]  show_stack+0x28/0x38
> [ 5984.281425]  dump_stack+0xec/0x15c
> [ 5984.281430]  ubsan_epilogue+0x18/0x84
> [ 5984.281434]  handle_overflow+0x14c/0x19c
> [ 5984.281439]  __ubsan_handle_sub_overflow+0x34/0x44
> [ 5984.281445]  is_mddev_idle+0x338/0x3d8
> [ 5984.281449]  md_do_sync+0x1bb8/0x1cf8
> [ 5984.281452]  md_thread+0x220/0x288
> [ 5984.281457]  kthread+0x1d8/0x1e0
> [ 5984.281461]  ret_from_fork+0x10/0x18
>
> When the stat accum of the disk is greater than INT_MAX, its
> value becomes negative after casting to 'int', which may lead
> to overflow after subtracting a positive number. In the same
> way, when the value of sync_io is greater than INT_MAX,
> overflow may also occur. These situations will lead to
> undefined behavior.
>
> Otherwise, if the stat accum of the disk is close to INT_MAX
> when creating raid arrays, the initial value of last_events
> would be set close to INT_MAX when mddev initializes IO
> event counters. 'curr_events - rdev->last_events > 64' will
> always be false during synchronization. If all the disks of mddev
> are in this case, is_mddev_idle() will always return 1, which
> may cause non-sync IO to be very slow.
>
> Fix by using atomic64_t type for sync_io, and using s64 type
> for curr_events/last_events.
>
> Signed-off-by: Li Jinlin <lijinlin3@huawei.com>
> ---
>  drivers/md/md.c       | 6 +++---
>  drivers/md/md.h       | 4 ++--
>  include/linux/genhd.h | 2 +-
>  3 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 5111ed966947..be73a5ae6864 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -8429,14 +8429,14 @@ static int is_mddev_idle(struct mddev *mddev, int init)
>  {
>  	struct md_rdev *rdev;
>  	int idle;
> -	int curr_events;
> +	s64 curr_events;
>
>  	idle = 1;
>  	rcu_read_lock();
>  	rdev_for_each_rcu(rdev, mddev) {
>  		struct gendisk *disk = rdev->bdev->bd_disk;
> -		curr_events = (int)part_stat_read_accum(disk->part0, sectors) -
> -			      atomic_read(&disk->sync_io);
> +		curr_events = (s64)part_stat_read_accum(disk->part0, sectors) -
> +			      atomic64_read(&disk->sync_io);
>  		/* sync IO will cause sync_io to increase before the disk_stats
>  		 * as sync_io is counted when a request starts, and
>  		 * disk_stats is counted when it completes.
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 53ea7a6961de..e00d6730da13 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -50,7 +50,7 @@ struct md_rdev {
>
>  	sector_t sectors;		/* Device size (in 512bytes sectors) */
>  	struct mddev *mddev;		/* RAID array if running */
> -	int last_events;		/* IO event timestamp */
> +	s64 last_events;		/* IO event timestamp */
>
>  	/*
>  	 * If meta_bdev is non-NULL, it means that a separate device is
> @@ -551,7 +551,7 @@ extern void mddev_unlock(struct mddev *mddev);
>
>  static inline void md_sync_acct(struct block_device *bdev, unsigned long nr_sectors)
>  {
> -	atomic_add(nr_sectors, &bdev->bd_disk->sync_io);
> +	atomic64_add(nr_sectors, &bdev->bd_disk->sync_io);
>  }
>
>  static inline void md_sync_acct_bio(struct bio *bio, unsigned long nr_sectors)
> diff --git a/include/linux/genhd.h b/include/linux/genhd.h
> index 74c410263113..efa7884de11b 100644
> --- a/include/linux/genhd.h
> +++ b/include/linux/genhd.h
> @@ -150,7 +150,7 @@ struct gendisk {
>  	struct list_head slave_bdevs;
>  #endif
>  	struct timer_rand_state *random;
> -	atomic_t sync_io;		/* RAID */
> +	atomic64_t sync_io;		/* RAID */
>  	struct disk_events *ev;
>  #ifdef CONFIG_BLK_DEV_INTEGRITY
>  	struct kobject integrity_kobj;
>

You haven't answered my question.
This patch has exactly the same problem as the original, only shifted
to LONG_MAX instead of INT_MAX.

Have you considered decreasing 'sync_io' in the endio handler, and then
just using the 'sync_io' value to figure out if sync_io is active?

Cheers,

Hannes
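
For illustration, the idea of decrementing 'sync_io' when a request completes can be sketched in userspace C as below. This is only a rough sketch under stated assumptions: it uses C11 atomics rather than the kernel's atomic_t API, and every name in it (struct disk_sketch, disk_sketch_init, sync_io_start, sync_io_end, sync_io_active) is hypothetical and does not appear in md.c. It shows the bookkeeping only, not how is_mddev_idle() would consume the value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-disk counter; not the kernel's gendisk. */
struct disk_sketch {
	atomic_long sync_inflight;	/* sectors of sync IO currently in flight */
};

static void disk_sketch_init(struct disk_sketch *d)
{
	atomic_init(&d->sync_inflight, 0);
}

/* Called when a sync request is submitted (analogous to md_sync_acct()). */
static void sync_io_start(struct disk_sketch *d, unsigned long nr_sectors)
{
	atomic_fetch_add(&d->sync_inflight, (long)nr_sectors);
}

/* Called from the (hypothetical) completion handler for that request. */
static void sync_io_end(struct disk_sketch *d, unsigned long nr_sectors)
{
	long before = atomic_fetch_sub(&d->sync_inflight, (long)nr_sectors);

	/* Completing more than was submitted would be an accounting bug. */
	assert(before >= (long)nr_sectors);
	(void)before;
}

/* Sync IO is considered active while anything is still in flight. */
static bool sync_io_active(struct disk_sketch *d)
{
	return atomic_load(&d->sync_inflight) > 0;
}
```

Because every increment is matched by a decrement on completion, the counter is bounded by the amount of sync IO in flight rather than by the total IO ever issued, so it does not creep toward INT_MAX or LONG_MAX over the lifetime of the device.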
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 5111ed966947..be73a5ae6864 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8429,14 +8429,14 @@ static int is_mddev_idle(struct mddev *mddev, int init)
 {
 	struct md_rdev *rdev;
 	int idle;
-	int curr_events;
+	s64 curr_events;
 
 	idle = 1;
 	rcu_read_lock();
 	rdev_for_each_rcu(rdev, mddev) {
 		struct gendisk *disk = rdev->bdev->bd_disk;
-		curr_events = (int)part_stat_read_accum(disk->part0, sectors) -
-			      atomic_read(&disk->sync_io);
+		curr_events = (s64)part_stat_read_accum(disk->part0, sectors) -
+			      atomic64_read(&disk->sync_io);
 		/* sync IO will cause sync_io to increase before the disk_stats
 		 * as sync_io is counted when a request starts, and
 		 * disk_stats is counted when it completes.
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 53ea7a6961de..e00d6730da13 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -50,7 +50,7 @@ struct md_rdev {
 
 	sector_t sectors;		/* Device size (in 512bytes sectors) */
 	struct mddev *mddev;		/* RAID array if running */
-	int last_events;		/* IO event timestamp */
+	s64 last_events;		/* IO event timestamp */
 
 	/*
 	 * If meta_bdev is non-NULL, it means that a separate device is
@@ -551,7 +551,7 @@ extern void mddev_unlock(struct mddev *mddev);
 
 static inline void md_sync_acct(struct block_device *bdev, unsigned long nr_sectors)
 {
-	atomic_add(nr_sectors, &bdev->bd_disk->sync_io);
+	atomic64_add(nr_sectors, &bdev->bd_disk->sync_io);
 }
 
 static inline void md_sync_acct_bio(struct bio *bio, unsigned long nr_sectors)
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 74c410263113..efa7884de11b 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -150,7 +150,7 @@ struct gendisk {
 	struct list_head slave_bdevs;
 #endif
 	struct timer_rand_state *random;
-	atomic_t sync_io;		/* RAID */
+	atomic64_t sync_io;		/* RAID */
 	struct disk_events *ev;
 #ifdef CONFIG_BLK_DEV_INTEGRITY
 	struct kobject integrity_kobj;
UBSAN reports this problem:

[ 5984.281385] UBSAN: Undefined behaviour in drivers/md/md.c:8175:15
[ 5984.281390] signed integer overflow:
[ 5984.281393] -2147483291 - 2072033152 cannot be represented in type 'int'
[ 5984.281400] CPU: 25 PID: 1854 Comm: md101_resync Kdump: loaded Not tainted 4.19.90
[ 5984.281404] Hardware name: Huawei TaiShan 200 (Model 5280)/BC82AMDDA
[ 5984.281406] Call trace:
[ 5984.281415]  dump_backtrace+0x0/0x310
[ 5984.281418]  show_stack+0x28/0x38
[ 5984.281425]  dump_stack+0xec/0x15c
[ 5984.281430]  ubsan_epilogue+0x18/0x84
[ 5984.281434]  handle_overflow+0x14c/0x19c
[ 5984.281439]  __ubsan_handle_sub_overflow+0x34/0x44
[ 5984.281445]  is_mddev_idle+0x338/0x3d8
[ 5984.281449]  md_do_sync+0x1bb8/0x1cf8
[ 5984.281452]  md_thread+0x220/0x288
[ 5984.281457]  kthread+0x1d8/0x1e0
[ 5984.281461]  ret_from_fork+0x10/0x18

When the stat accum of the disk is greater than INT_MAX, its
value becomes negative after casting to 'int', which may lead
to overflow after subtracting a positive number. In the same
way, when the value of sync_io is greater than INT_MAX,
overflow may also occur. These situations will lead to
undefined behavior.

Otherwise, if the stat accum of the disk is close to INT_MAX
when creating raid arrays, the initial value of last_events
would be set close to INT_MAX when mddev initializes IO
event counters. 'curr_events - rdev->last_events > 64' will
always be false during synchronization. If all the disks of mddev
are in this case, is_mddev_idle() will always return 1, which
may cause non-sync IO to be very slow.

Fix by using atomic64_t type for sync_io, and using s64 type
for curr_events/last_events.

Signed-off-by: Li Jinlin <lijinlin3@huawei.com>
---
 drivers/md/md.c       | 6 +++---
 drivers/md/md.h       | 4 ++--
 include/linux/genhd.h | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)
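
The arithmetic in the UBSAN report can be reproduced in a small userspace sketch. The helper names below (truncate_to_s32, sub_overflows_s32, curr_events_s64) are hypothetical and are not kernel functions. The 64-bit input 2147484005 used in the usage note is also an assumption: it is simply one value just above INT_MAX whose low 32 bits, read as a signed int, give the -2147483291 operand seen in the log. The sketch does the check in 64-bit arithmetic so that it does not itself trigger the undefined behaviour it describes.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Mimic what the (int) cast does to a 64-bit sector count on a
 * two's-complement machine: keep the low 32 bits and reinterpret
 * them as signed, without relying on implementation-defined conversion.
 */
static int32_t truncate_to_s32(uint64_t accum)
{
	uint32_t low = (uint32_t)accum;

	if (low <= (uint32_t)INT32_MAX)
		return (int32_t)low;
	return (int32_t)((int64_t)low - 4294967296LL);
}

/* True if a - b cannot be represented in a 32-bit signed int. */
static bool sub_overflows_s32(int32_t a, int32_t b)
{
	int64_t wide = (int64_t)a - (int64_t)b;

	return wide < INT32_MIN || wide > INT32_MAX;
}

/* The same subtraction done in 64 bits, as the patch does with s64. */
static int64_t curr_events_s64(uint64_t accum, int64_t sync_io)
{
	/* The cast below is only safe while accum fits in a signed 64-bit value. */
	assert(accum <= (uint64_t)INT64_MAX);
	return (int64_t)accum - sync_io;
}
```

With these helpers, truncate_to_s32(2147484005) gives -2147483291, and sub_overflows_s32(-2147483291, 2072033152) reports that the 32-bit subtraction is not representable, matching the log. The 64-bit version yields 75450853 for the same inputs, although the assert in curr_events_s64() is a reminder that the wider type moves the limit rather than removing it.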