Message ID | 20221027100410.3891-1-joshi.k@samsung.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | block: fix bio-allocation from per-cpu cache | expand |
On 10/27/22 11:04, Kanchan Joshi wrote: > If cache does not have any entry, make sure to detect that and return > failure. Otherwise this leads to null pointer dereference. Damn, it was done right in v2 https://lore.kernel.org/all/9fd04486d972c1f3ef273fa26b4b6bf51a5e4270.1666122465.git.asml.silence@gmail.com/ Perhaps I based v3 on a wrong version. Thanks > Fixes: 13a184e26965 ("block/bio: add pcpu caching for non-polling bio_put") > Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> > --- > Can be reproduced by: > fio -direct=1 -iodepth=1 -rw=randread -ioengine=io_uring -bs=4k -numjobs=1 -size=4k -filename=/dev/nvme0n1 -hipri=1 -name=block > > BUG: KASAN: null-ptr-deref in bio_alloc_bioset.cold+0x2a/0x16a > Read of size 8 at addr 0000000000000000 by task fio/1835 > > CPU: 5 PID: 1835 Comm: fio Not tainted 6.1.0-rc2+ #226 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g > Call Trace: > <TASK> > dump_stack_lvl+0x34/0x48 > print_report+0x490/0x4a1 > ? __virt_addr_valid+0x28/0x140 > ? bio_alloc_bioset.cold+0x2a/0x16a > kasan_report+0xb3/0x130 > ? bio_alloc_bioset.cold+0x2a/0x16a > bio_alloc_bioset.cold+0x2a/0x16a > ? bvec_alloc+0xf0/0xf0 > ? iov_iter_is_aligned+0x130/0x2c0 > blkdev_direct_IO.part.0+0x16a/0x8d0 > > block/bio.c | 11 ++++++----- > 1 file changed, 6 insertions(+), 5 deletions(-) > > diff --git a/block/bio.c b/block/bio.c > index 8f624ffaf3d0..66f088bb3736 100644 > --- a/block/bio.c > +++ b/block/bio.c > @@ -439,13 +439,14 @@ static struct bio *bio_alloc_percpu_cache(struct block_device *bdev, > > cache = per_cpu_ptr(bs->cache, get_cpu()); > if (!cache->free_list && > - READ_ONCE(cache->nr_irq) >= ALLOC_CACHE_THRESHOLD) { > + READ_ONCE(cache->nr_irq) >= ALLOC_CACHE_THRESHOLD) > bio_alloc_irq_cache_splice(cache); > - if (!cache->free_list) { > - put_cpu(); > - return NULL; > - } > + > + if (!cache->free_list) { Let's nest it under the other "if (!cache->free_list)" > + put_cpu(); > + return NULL; > } > + > bio = cache->free_list; > cache->free_list = bio->bi_next; > cache->nr--;
On Thu, Oct 27, 2022 at 11:38:50AM +0100, Pavel Begunkov wrote: >On 10/27/22 11:04, Kanchan Joshi wrote: >>If cache does not have any entry, make sure to detect that and return >>failure. Otherwise this leads to null pointer dereference. > >Damn, it was done right in v2 > >https://lore.kernel.org/all/9fd04486d972c1f3ef273fa26b4b6bf51a5e4270.1666122465.git.asml.silence@gmail.com/ > >Perhaps I based v3 on a wrong version. Thanks > > >>Fixes: 13a184e26965 ("block/bio: add pcpu caching for non-polling bio_put") >>Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> >>--- >>Can be reproduced by: >>fio -direct=1 -iodepth=1 -rw=randread -ioengine=io_uring -bs=4k -numjobs=1 -size=4k -filename=/dev/nvme0n1 -hipri=1 -name=block >> >>BUG: KASAN: null-ptr-deref in bio_alloc_bioset.cold+0x2a/0x16a >>Read of size 8 at addr 0000000000000000 by task fio/1835 >> >>CPU: 5 PID: 1835 Comm: fio Not tainted 6.1.0-rc2+ #226 >>Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g >>Call Trace: >> <TASK> >> dump_stack_lvl+0x34/0x48 >> print_report+0x490/0x4a1 >> ? __virt_addr_valid+0x28/0x140 >> ? bio_alloc_bioset.cold+0x2a/0x16a >> kasan_report+0xb3/0x130 >> ? bio_alloc_bioset.cold+0x2a/0x16a >> bio_alloc_bioset.cold+0x2a/0x16a >> ? bvec_alloc+0xf0/0xf0 >> ? iov_iter_is_aligned+0x130/0x2c0 >> blkdev_direct_IO.part.0+0x16a/0x8d0 >> >> block/bio.c | 11 ++++++----- >> 1 file changed, 6 insertions(+), 5 deletions(-) >> >>diff --git a/block/bio.c b/block/bio.c >>index 8f624ffaf3d0..66f088bb3736 100644 >>--- a/block/bio.c >>+++ b/block/bio.c >>@@ -439,13 +439,14 @@ static struct bio *bio_alloc_percpu_cache(struct block_device *bdev, >> cache = per_cpu_ptr(bs->cache, get_cpu()); >> if (!cache->free_list && >>- READ_ONCE(cache->nr_irq) >= ALLOC_CACHE_THRESHOLD) { >>+ READ_ONCE(cache->nr_irq) >= ALLOC_CACHE_THRESHOLD) >> bio_alloc_irq_cache_splice(cache); >>- if (!cache->free_list) { >>- put_cpu(); >>- return NULL; >>- } >>+ >>+ if (!cache->free_list) { > >Let's nest it under the other "if (!cache->free_list)" Not sure if I got you. It was under that if condition earlier, and that part causes trouble. What you wrote in v2 is another way, but there also we have two checks on cache->free_list.
On 10/27/22 11:49, Kanchan Joshi wrote: > On Thu, Oct 27, 2022 at 11:38:50AM +0100, Pavel Begunkov wrote: >> On 10/27/22 11:04, Kanchan Joshi wrote: >>> If cache does not have any entry, make sure to detect that and return >>> failure. Otherwise this leads to null pointer dereference. >> >> Damn, it was done right in v2 >> >> https://lore.kernel.org/all/9fd04486d972c1f3ef273fa26b4b6bf51a5e4270.1666122465.git.asml.silence@gmail.com/ >> >> Perhaps I based v3 on a wrong version. Thanks >> >> >>> Fixes: 13a184e26965 ("block/bio: add pcpu caching for non-polling bio_put") >>> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> >>> --- >>> Can be reproduced by: >>> fio -direct=1 -iodepth=1 -rw=randread -ioengine=io_uring -bs=4k -numjobs=1 -size=4k -filename=/dev/nvme0n1 -hipri=1 -name=block >>> >>> BUG: KASAN: null-ptr-deref in bio_alloc_bioset.cold+0x2a/0x16a >>> Read of size 8 at addr 0000000000000000 by task fio/1835 >>> >>> CPU: 5 PID: 1835 Comm: fio Not tainted 6.1.0-rc2+ #226 >>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g >>> Call Trace: >>> <TASK> >>> dump_stack_lvl+0x34/0x48 >>> print_report+0x490/0x4a1 >>> ? __virt_addr_valid+0x28/0x140 >>> ? bio_alloc_bioset.cold+0x2a/0x16a >>> kasan_report+0xb3/0x130 >>> ? bio_alloc_bioset.cold+0x2a/0x16a >>> bio_alloc_bioset.cold+0x2a/0x16a >>> ? bvec_alloc+0xf0/0xf0 >>> ? iov_iter_is_aligned+0x130/0x2c0 >>> blkdev_direct_IO.part.0+0x16a/0x8d0 >>> >>> block/bio.c | 11 ++++++----- >>> 1 file changed, 6 insertions(+), 5 deletions(-) >>> >>> diff --git a/block/bio.c b/block/bio.c >>> index 8f624ffaf3d0..66f088bb3736 100644 >>> --- a/block/bio.c >>> +++ b/block/bio.c >>> @@ -439,13 +439,14 @@ static struct bio *bio_alloc_percpu_cache(struct block_device *bdev, >>> cache = per_cpu_ptr(bs->cache, get_cpu()); >>> if (!cache->free_list && >>> - READ_ONCE(cache->nr_irq) >= ALLOC_CACHE_THRESHOLD) { >>> + READ_ONCE(cache->nr_irq) >= ALLOC_CACHE_THRESHOLD) >>> bio_alloc_irq_cache_splice(cache); >>> - if (!cache->free_list) { >>> - put_cpu(); >>> - return NULL; >>> - } >>> + >>> + if (!cache->free_list) { >> >> Let's nest it under the other "if (!cache->free_list)" > > Not sure if I got you. It was under that if condition earlier, and that > part causes trouble. Under the free_list check specifically, the threshold would also go in a separate if, > What you wrote in v2 is another way, but there also we have two checks > on cache->free_list. Your version: if (cache_empty()) if (check_threshold()) try_replenish_cache(); // splice if (cache_empty()) // still empty return NULL; vs v2: if (cache_empty()) { if (check_threshold()) try_replenish_cache(); // splice if (cache_empty()) // still empty return NULL; } But on the other hand compilers should be smart enough to optimise repeated checks when the cache already have requests, so there should be no real difference.
On 10/27/22 4:04 AM, Kanchan Joshi wrote: > If cache does not have any entry, make sure to detect that and return > failure. Otherwise this leads to null pointer dereference. > > Fixes: 13a184e26965 ("block/bio: add pcpu caching for non-polling bio_put") > Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> > --- > Can be reproduced by: > fio -direct=1 -iodepth=1 -rw=randread -ioengine=io_uring -bs=4k -numjobs=1 -size=4k -filename=/dev/nvme0n1 -hipri=1 -name=block > > BUG: KASAN: null-ptr-deref in bio_alloc_bioset.cold+0x2a/0x16a > Read of size 8 at addr 0000000000000000 by task fio/1835 > > CPU: 5 PID: 1835 Comm: fio Not tainted 6.1.0-rc2+ #226 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g > Call Trace: > <TASK> > dump_stack_lvl+0x34/0x48 > print_report+0x490/0x4a1 > ? __virt_addr_valid+0x28/0x140 > ? bio_alloc_bioset.cold+0x2a/0x16a > kasan_report+0xb3/0x130 > ? bio_alloc_bioset.cold+0x2a/0x16a > bio_alloc_bioset.cold+0x2a/0x16a > ? bvec_alloc+0xf0/0xf0 > ? iov_iter_is_aligned+0x130/0x2c0 > blkdev_direct_IO.part.0+0x16a/0x8d0 Was going to apply this, but after running some testing, it does fix the initial crash but I still get weird corruption crashes with the series it's fixing. Pavel, I'm going to drop this series for now.
On 10/27/22 21:44, Jens Axboe wrote: > On 10/27/22 4:04 AM, Kanchan Joshi wrote: >> If cache does not have any entry, make sure to detect that and return >> failure. Otherwise this leads to null pointer dereference. >> >> Fixes: 13a184e26965 ("block/bio: add pcpu caching for non-polling bio_put") >> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> >> --- >> Can be reproduced by: >> fio -direct=1 -iodepth=1 -rw=randread -ioengine=io_uring -bs=4k -numjobs=1 -size=4k -filename=/dev/nvme0n1 -hipri=1 -name=block >> >> BUG: KASAN: null-ptr-deref in bio_alloc_bioset.cold+0x2a/0x16a >> Read of size 8 at addr 0000000000000000 by task fio/1835 >> >> CPU: 5 PID: 1835 Comm: fio Not tainted 6.1.0-rc2+ #226 >> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g >> Call Trace: >> <TASK> >> dump_stack_lvl+0x34/0x48 >> print_report+0x490/0x4a1 >> ? __virt_addr_valid+0x28/0x140 >> ? bio_alloc_bioset.cold+0x2a/0x16a >> kasan_report+0xb3/0x130 >> ? bio_alloc_bioset.cold+0x2a/0x16a >> bio_alloc_bioset.cold+0x2a/0x16a >> ? bvec_alloc+0xf0/0xf0 >> ? iov_iter_is_aligned+0x130/0x2c0 >> blkdev_direct_IO.part.0+0x16a/0x8d0 > > Was going to apply this, but after running some testing, it does > fix the initial crash but I still get weird corruption crashes > with the series it's fixing. > > Pavel, I'm going to drop this series for now. I found one yesterday. Is the issue reproducible?
On 10/27/22 21:45, Pavel Begunkov wrote: > On 10/27/22 21:44, Jens Axboe wrote: >> On 10/27/22 4:04 AM, Kanchan Joshi wrote: >>> If cache does not have any entry, make sure to detect that and return >>> failure. Otherwise this leads to null pointer dereference. >>> >>> Fixes: 13a184e26965 ("block/bio: add pcpu caching for non-polling bio_put") >>> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> >>> --- >>> Can be reproduced by: >>> fio -direct=1 -iodepth=1 -rw=randread -ioengine=io_uring -bs=4k -numjobs=1 -size=4k -filename=/dev/nvme0n1 -hipri=1 -name=block >>> >>> BUG: KASAN: null-ptr-deref in bio_alloc_bioset.cold+0x2a/0x16a >>> Read of size 8 at addr 0000000000000000 by task fio/1835 >>> >>> CPU: 5 PID: 1835 Comm: fio Not tainted 6.1.0-rc2+ #226 >>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g >>> Call Trace: >>> <TASK> >>> dump_stack_lvl+0x34/0x48 >>> print_report+0x490/0x4a1 >>> ? __virt_addr_valid+0x28/0x140 >>> ? bio_alloc_bioset.cold+0x2a/0x16a >>> kasan_report+0xb3/0x130 >>> ? bio_alloc_bioset.cold+0x2a/0x16a >>> bio_alloc_bioset.cold+0x2a/0x16a >>> ? bvec_alloc+0xf0/0xf0 >>> ? iov_iter_is_aligned+0x130/0x2c0 >>> blkdev_direct_IO.part.0+0x16a/0x8d0 >> >> Was going to apply this, but after running some testing, it does >> fix the initial crash but I still get weird corruption crashes >> with the series it's fixing. >> >> Pavel, I'm going to drop this series for now. > > I found one yesterday. Is the issue reproducible? found one bug *
On 10/27/22 2:45 PM, Pavel Begunkov wrote: > On 10/27/22 21:44, Jens Axboe wrote: >> On 10/27/22 4:04 AM, Kanchan Joshi wrote: >>> If cache does not have any entry, make sure to detect that and return >>> failure. Otherwise this leads to null pointer dereference. >>> >>> Fixes: 13a184e26965 ("block/bio: add pcpu caching for non-polling bio_put") >>> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> >>> --- >>> Can be reproduced by: >>> fio -direct=1 -iodepth=1 -rw=randread -ioengine=io_uring -bs=4k -numjobs=1 -size=4k -filename=/dev/nvme0n1 -hipri=1 -name=block >>> >>> BUG: KASAN: null-ptr-deref in bio_alloc_bioset.cold+0x2a/0x16a >>> Read of size 8 at addr 0000000000000000 by task fio/1835 >>> >>> CPU: 5 PID: 1835 Comm: fio Not tainted 6.1.0-rc2+ #226 >>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g >>> Call Trace: >>> <TASK> >>> dump_stack_lvl+0x34/0x48 >>> print_report+0x490/0x4a1 >>> ? __virt_addr_valid+0x28/0x140 >>> ? bio_alloc_bioset.cold+0x2a/0x16a >>> kasan_report+0xb3/0x130 >>> ? bio_alloc_bioset.cold+0x2a/0x16a >>> bio_alloc_bioset.cold+0x2a/0x16a >>> ? bvec_alloc+0xf0/0xf0 >>> ? iov_iter_is_aligned+0x130/0x2c0 >>> blkdev_direct_IO.part.0+0x16a/0x8d0 >> >> Was going to apply this, but after running some testing, it does >> fix the initial crash but I still get weird corruption crashes >> with the series it's fixing. >> >> Pavel, I'm going to drop this series for now. > > I found one yesterday. Is the issue reproducible? Oh yeah, triggers in < 1 second for me when running my usual irq peak bench: t/io_uring -p0 -d128 -b512 -s32 -c32 -F1 -B1 -R0 -X1 -n24 -P1 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 /dev/nvme5n1 /dev/nvme6n1 /dev/nvme7n1 /dev/nvme8n1 /dev/nvme9n1 /dev/nvme10n1 /dev/nvme11n1 /dev/nvme12n1 /dev/nvme13n1 /dev/nvme14n1 /dev/nvme15n1 /dev/nvme16n1 /dev/nvme17n1 /dev/nvme18n1 /dev/nvme19n1 /dev/nvme20n1 /dev/nvme21n1 /dev/nvme22n1 /dev/nvme23n1 Interestingly, doesn't trigger in qemu with just a single device.
On 10/27/22 21:52, Jens Axboe wrote: > On 10/27/22 2:45 PM, Pavel Begunkov wrote: >> On 10/27/22 21:44, Jens Axboe wrote: >>> On 10/27/22 4:04 AM, Kanchan Joshi wrote: >>>> If cache does not have any entry, make sure to detect that and return >>>> failure. Otherwise this leads to null pointer dereference. >>>> >>>> Fixes: 13a184e26965 ("block/bio: add pcpu caching for non-polling bio_put") >>>> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> >>>> --- >>>> Can be reproduced by: >>>> fio -direct=1 -iodepth=1 -rw=randread -ioengine=io_uring -bs=4k -numjobs=1 -size=4k -filename=/dev/nvme0n1 -hipri=1 -name=block >>>> >>>> BUG: KASAN: null-ptr-deref in bio_alloc_bioset.cold+0x2a/0x16a >>>> Read of size 8 at addr 0000000000000000 by task fio/1835 >>>> >>>> CPU: 5 PID: 1835 Comm: fio Not tainted 6.1.0-rc2+ #226 >>>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g >>>> Call Trace: >>>> <TASK> >>>> dump_stack_lvl+0x34/0x48 >>>> print_report+0x490/0x4a1 >>>> ? __virt_addr_valid+0x28/0x140 >>>> ? bio_alloc_bioset.cold+0x2a/0x16a >>>> kasan_report+0xb3/0x130 >>>> ? bio_alloc_bioset.cold+0x2a/0x16a >>>> bio_alloc_bioset.cold+0x2a/0x16a >>>> ? bvec_alloc+0xf0/0xf0 >>>> ? iov_iter_is_aligned+0x130/0x2c0 >>>> blkdev_direct_IO.part.0+0x16a/0x8d0 >>> >>> Was going to apply this, but after running some testing, it does >>> fix the initial crash but I still get weird corruption crashes >>> with the series it's fixing. >>> >>> Pavel, I'm going to drop this series for now. >> >> I found one yesterday. Is the issue reproducible? > > Oh yeah, triggers in < 1 second for me when running my usual irq > peak bench: > > t/io_uring -p0 -d128 -b512 -s32 -c32 -F1 -B1 -R0 -X1 -n24 -P1 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 /dev/nvme5n1 /dev/nvme6n1 /dev/nvme7n1 /dev/nvme8n1 /dev/nvme9n1 /dev/nvme10n1 /dev/nvme11n1 /dev/nvme12n1 /dev/nvme13n1 /dev/nvme14n1 /dev/nvme15n1 /dev/nvme16n1 /dev/nvme17n1 /dev/nvme18n1 /dev/nvme19n1 /dev/nvme20n1 /dev/nvme21n1 /dev/nvme22n1 /dev/nvme23n1 > > Interestingly, doesn't trigger in qemu with just a single device. The bug I mentioned is splicing from in-IRQ put, which modifies the non-irq list. We need to hit that ALLOC_CACHE_MAX + ALLOC_CACHE_SLACK in the cache to trigger it, so makes sense you see it only with very high qd tests, matches the profile. I'll resend the patch set with a few changes, but would be great if you can say if sth like below works for you diff --git a/block/bio.c b/block/bio.c index 0686a3774157..af715aee239b 100644 --- a/block/bio.c +++ b/block/bio.c @@ -764,6 +764,12 @@ static inline void bio_put_percpu_cache(struct bio *bio) struct bio_alloc_cache *cache; cache = per_cpu_ptr(bio->bi_pool->cache, get_cpu()); + if (READ_ONCE(cache->nr_irq) + cache->nr > ALLOC_CACHE_MAX) { + put_cpu(); + bio_free(bio); + return; + } + bio_uninit(bio); if ((bio->bi_opf & REQ_POLLED) && !WARN_ON_ONCE(in_interrupt())) { @@ -779,10 +785,6 @@ static inline void bio_put_percpu_cache(struct bio *bio) cache->nr_irq++; local_irq_restore(flags); } - - if (READ_ONCE(cache->nr_irq) + cache->nr > - ALLOC_CACHE_MAX + ALLOC_CACHE_SLACK) - bio_alloc_cache_prune(cache, ALLOC_CACHE_SLACK); put_cpu(); }
diff --git a/block/bio.c b/block/bio.c index 8f624ffaf3d0..66f088bb3736 100644 --- a/block/bio.c +++ b/block/bio.c @@ -439,13 +439,14 @@ static struct bio *bio_alloc_percpu_cache(struct block_device *bdev, cache = per_cpu_ptr(bs->cache, get_cpu()); if (!cache->free_list && - READ_ONCE(cache->nr_irq) >= ALLOC_CACHE_THRESHOLD) { + READ_ONCE(cache->nr_irq) >= ALLOC_CACHE_THRESHOLD) bio_alloc_irq_cache_splice(cache); - if (!cache->free_list) { - put_cpu(); - return NULL; - } + + if (!cache->free_list) { + put_cpu(); + return NULL; } + bio = cache->free_list; cache->free_list = bio->bi_next; cache->nr--;
If cache does not have any entry, make sure to detect that and return failure. Otherwise this leads to null pointer dereference. Fixes: 13a184e26965 ("block/bio: add pcpu caching for non-polling bio_put") Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> --- Can be reproduced by: fio -direct=1 -iodepth=1 -rw=randread -ioengine=io_uring -bs=4k -numjobs=1 -size=4k -filename=/dev/nvme0n1 -hipri=1 -name=block BUG: KASAN: null-ptr-deref in bio_alloc_bioset.cold+0x2a/0x16a Read of size 8 at addr 0000000000000000 by task fio/1835 CPU: 5 PID: 1835 Comm: fio Not tainted 6.1.0-rc2+ #226 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g Call Trace: <TASK> dump_stack_lvl+0x34/0x48 print_report+0x490/0x4a1 ? __virt_addr_valid+0x28/0x140 ? bio_alloc_bioset.cold+0x2a/0x16a kasan_report+0xb3/0x130 ? bio_alloc_bioset.cold+0x2a/0x16a bio_alloc_bioset.cold+0x2a/0x16a ? bvec_alloc+0xf0/0xf0 ? iov_iter_is_aligned+0x130/0x2c0 blkdev_direct_IO.part.0+0x16a/0x8d0 block/bio.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)