Message ID | 20240324133702.1328237-1-ming.lei@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | [V2] block: fail unaligned bio from submit_bio_noacct() |
On Sun, Mar 24 2024 at 9:37P -0400,
Ming Lei <ming.lei@redhat.com> wrote:

> For any FS bio, its start sector and size have to be aligned with the
> queue's logical block size from the beginning, because the bio split code
> can't turn an unaligned bio into an aligned one.
>
> This rule is obvious, but there are still users that may send unaligned
> bios to the block layer. dm-integrity has been observed to do so, causing
> a double free of the driver's DMA meta buffer.
>
> So fail unaligned bios fast in submit_bio_noacct() to avoid further
> trouble.
>
> Meanwhile, remove this kind of check from the dio and discard code paths.
>
> Cc: Keith Busch <kbusch@kernel.org>
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Mikulas Patocka <mpatocka@redhat.com>
> Cc: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
> V2:
> - remove the check in dio and discard code path
> - check .bi_sector with (logical_block_size >> 9) - 1
>
>  block/blk-core.c | 16 ++++++++++++++++
>  block/blk-lib.c  | 17 -----------------
>  block/fops.c     |  3 +--
>  3 files changed, 17 insertions(+), 19 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index a16b5abdbbf5..2d86922f95e3 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -729,6 +729,19 @@ void submit_bio_noacct_nocheck(struct bio *bio)
>  	__submit_bio_noacct(bio);
>  }
>
> +static bool bio_check_alignment(struct bio *bio, struct request_queue *q)
> +{
> +	unsigned int bs = q->limits.logical_block_size;
> +
> +	if (bio->bi_iter.bi_size & (bs - 1))
> +		return false;
> +
> +	if (bio->bi_iter.bi_sector & ((bs >> SECTOR_SHIFT) - 1))
> +		return false;
> +
> +	return true;
> +}
> +

You missed Christoph's reply to v1 where he offered:
"This should just use bdev_logical_block_size() on bio->bi_bdev."

Otherwise, looks good.

Mike
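Mike's point, relayed from Christoph's v1 review, is that the block size should come from the bio's own device via bdev_logical_block_size(bio->bi_bdev), not from a separate request_queue argument. The following is a user-space sketch of that shape. The model_* types and helpers are hypothetical stand-ins for struct block_device, struct bio and bdev_logical_block_size(). The bio fields are also flattened here, whereas the kernel keeps them in bio->bi_iter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT 9

/* Hypothetical stand-in for struct block_device. */
struct model_bdev {
	unsigned int logical_block_size;	/* bytes, power of two >= 512 */
};

/* Hypothetical, flattened stand-in for struct bio (+ bi_iter). */
struct model_bio {
	struct model_bdev *bi_bdev;
	uint64_t bi_sector;			/* start, in 512-byte sectors */
	unsigned int bi_size;			/* length, in bytes */
};

/* Stand-in for bdev_logical_block_size(). */
static unsigned int model_bdev_logical_block_size(const struct model_bdev *bdev)
{
	return bdev->logical_block_size;
}

/* Alignment check keyed off the bio's own device, no queue argument. */
static bool model_bio_check_alignment(const struct model_bio *bio)
{
	unsigned int bs = model_bdev_logical_block_size(bio->bi_bdev);

	assert(bs >= 512 && (bs & (bs - 1)) == 0);
	if (bio->bi_size & (bs - 1))
		return false;
	if (bio->bi_sector & ((bs >> SECTOR_SHIFT) - 1))
		return false;
	return true;
}
```

With a 4096-byte logical block size, a bio at sector 8 of 8192 bytes passes. A 512-byte bio fails, and so does a bio starting at sector 1.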
On Sun, Mar 24, 2024 at 09:37:02PM +0800, Ming Lei wrote:
> +static bool bio_check_alignment(struct bio *bio, struct request_queue *q)
> +{
> +	unsigned int bs = q->limits.logical_block_size;
> +
> +	if (bio->bi_iter.bi_size & (bs - 1))
> +		return false;
> +
> +	if (bio->bi_iter.bi_sector & ((bs >> SECTOR_SHIFT) - 1))
> +		return false;
> +
> +	return true;
> +}

This should still use bdev_logical_block_size. And maybe it's just me,
but I think dropping the blank lines after the false returns would
actually make it more readable.

> diff --git a/block/fops.c b/block/fops.c
> index 679d9b752fe8..75595c728190 100644
> --- a/block/fops.c
> +++ b/block/fops.c
> @@ -37,8 +37,7 @@ static blk_opf_t dio_bio_write_op(struct kiocb *iocb)
>  static bool blkdev_dio_unaligned(struct block_device *bdev, loff_t pos,
>  			      struct iov_iter *iter)
>  {
> -	return pos & (bdev_logical_block_size(bdev) - 1) ||
> -		!bdev_iter_is_aligned(bdev, iter);
> +	return !bdev_iter_is_aligned(bdev, iter);

If you drop this:

 - we now actually go all the way down to building and submitting a
   bio for a trivial bounds check.
 - you get a trivial-to-trigger WARN_ON.

I'd strongly advise against dropping this check.
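Christoph's objection is about the cheap position test in blkdev_dio_unaligned(). That test lets the O_DIRECT path reject an unaligned request with -EINVAL before any bio is allocated, so the request never reaches the new WARN_ON_ONCE in submit_bio_noacct(). The following is a hypothetical user-space model of that early return. model_dio_unaligned() mirrors the expression the patch removes, and the iter_aligned flag stands in for the result of bdev_iter_is_aligned().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the removed expression: unaligned position OR unaligned iter. */
static bool model_dio_unaligned(int64_t pos, unsigned int lbs, bool iter_aligned)
{
	assert(lbs >= 512 && (lbs & (lbs - 1)) == 0);
	return (pos & (int64_t)(lbs - 1)) || !iter_aligned;
}

/* Caller side: fail synchronously, before any bio would be built. */
static int model_dio_rw(int64_t pos, unsigned int lbs, bool iter_aligned)
{
	if (model_dio_unaligned(pos, lbs, iter_aligned))
		return -EINVAL;	/* nothing reaches submit_bio_noacct() */
	return 0;		/* the real path would now build and submit */
}
```

The design point is cost and error reporting together. The position check is a single AND, and the caller gets a synchronous -EINVAL rather than an asynchronous completion error plus a kernel warning.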
On Sun, Mar 24, 2024 at 04:25:04PM -0700, Christoph Hellwig wrote:
> On Sun, Mar 24, 2024 at 09:37:02PM +0800, Ming Lei wrote:
> > +static bool bio_check_alignment(struct bio *bio, struct request_queue *q)
> > +{
> > +	unsigned int bs = q->limits.logical_block_size;
> > +
> > +	if (bio->bi_iter.bi_size & (bs - 1))
> > +		return false;
> > +
> > +	if (bio->bi_iter.bi_sector & ((bs >> SECTOR_SHIFT) - 1))
> > +		return false;
> > +
> > +	return true;
> > +}
>
> This should still use bdev_logical_block_size. And maybe it's just me,
> but I think dropping the blank lines after the false returns would
> actually make it more readable.

OK, will remove the blank line.

>
> > diff --git a/block/fops.c b/block/fops.c
> > index 679d9b752fe8..75595c728190 100644
> > --- a/block/fops.c
> > +++ b/block/fops.c
> > @@ -37,8 +37,7 @@ static blk_opf_t dio_bio_write_op(struct kiocb *iocb)
> >  static bool blkdev_dio_unaligned(struct block_device *bdev, loff_t pos,
> >  			      struct iov_iter *iter)
> >  {
> > -	return pos & (bdev_logical_block_size(bdev) - 1) ||
> > -		!bdev_iter_is_aligned(bdev, iter);
> > +	return !bdev_iter_is_aligned(bdev, iter);
>
> If you drop this:
>
>  - we now actually go all the way down to building and submitting a
>    bio for a trivial bounds check.
>  - you get a trivial-to-trigger WARN_ON.
>
> I'd strongly advise against dropping this check.

OK.

Also, only q->limits.logical_block_size is fetched in the small-BS I/O
fast path; I think log(lbs) could be cached in request_queue to avoid
the extra fetch of q->limits. In particular, that should be easier to
do with your recent atomic queue limits update changes.

Thanks,
Ming
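Ming's caching idea amounts to storing the block-size shift instead of the byte size. Both alignment masks can then be derived from that one small value, without reading q->limits. The following is a sketch under that assumption. model_lbs_shift() and model_aligned_by_shift() are illustrative names, not kernel functions; in the kernel the shift would more naturally come from ilog2().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT 9

/* Compute the cached shift once: lbs == 1u << shift. */
static unsigned int model_lbs_shift(unsigned int lbs)
{
	unsigned int shift = 0;

	assert(lbs >= 512 && (lbs & (lbs - 1)) == 0);
	while ((1u << shift) != lbs)
		shift++;
	return shift;
}

/* Hot-path check using only the cached shift. */
static bool model_aligned_by_shift(uint64_t sector, unsigned int size,
				   unsigned int lbs_shift)
{
	uint64_t sector_mask;
	unsigned int size_mask;

	assert(lbs_shift >= SECTOR_SHIFT && lbs_shift < 32);
	sector_mask = (1ull << (lbs_shift - SECTOR_SHIFT)) - 1;	/* in sectors */
	size_mask = (1u << lbs_shift) - 1;				/* in bytes */
	return !(size & size_mask) && !(sector & sector_mask);
}
```

For 512-byte devices the sector mask collapses to 0, so any start sector passes and only the size is checked.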
On Mon, Mar 25, 2024 at 11:03:25AM +0800, Ming Lei wrote:
> Also, only q->limits.logical_block_size is fetched in the small-BS I/O
> fast path; I think log(lbs) could be cached in request_queue to avoid
> the extra fetch of q->limits. In particular, that should be easier to
> do with your recent atomic queue limits update changes.

So. One thing I've been thinking of for a while (and which Bart also
mentioned) is that queue_limits currently is a bit of a mess between
the actual queue limits and the gendisk configuration. The logical
block size is firmly in the latter, and we should probably move it
to the gendisk eventually. Depending on how converting the SCSI ULDs
to the atomic queue limits API goes, that might happen sooner rather
than later.
On Sun, Mar 24, 2024 at 08:12:01PM -0700, Christoph Hellwig wrote:
> On Mon, Mar 25, 2024 at 11:03:25AM +0800, Ming Lei wrote:
> > Also, only q->limits.logical_block_size is fetched in the small-BS I/O
> > fast path; I think log(lbs) could be cached in request_queue to avoid
> > the extra fetch of q->limits. In particular, that should be easier to
> > do with your recent atomic queue limits update changes.
>
> So. One thing I've been thinking of for a while (and which Bart also
> mentioned) is that queue_limits currently is a bit of a mess between
> the actual queue limits and the gendisk configuration. The logical
> block size is firmly in the latter, and we should probably move it

lbs and pbs belong to the disk, but for some others it may not be so
obvious. Strictly speaking, elevator/blkcg belong to the disk too, but
they still live in request_queue. :-)

Thanks,
Ming
On Sun, Mar 24, 2024 at 09:37:02PM +0800, Ming Lei wrote:
> @@ -780,6 +793,9 @@ void submit_bio_noacct(struct bio *bio)
>  		}
>  	}
>
> +	if (WARN_ON_ONCE(!bio_check_alignment(bio, q)))
> +		goto end_io;
> +

The "status" at this point is "BLK_STS_IOERR", so user space would see
EIO, but the existing checks return EINVAL. I'm not sure if that's "ok",
but assuming it is, I think the user-visible difference in behavior
should be mentioned in the changelog.

Alternatively, maybe we want an asynchronous way to return EINVAL for
these conditions. It tells the user more about where the problem is
than a generic EIO does. There is no BLK_STS_ value that translates to
EINVAL, though, so maybe we need a new block status code like
BLK_STS_INVALID_REQUEST.

> @@ -53,10 +52,6 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>  		return -EOPNOTSUPP;
>  	}
>
> -	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
> -	if ((sector | nr_sects) & bs_mask)
> -		return -EINVAL;
> -
>  	if (!nr_sects)
>  		return -EINVAL;
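Keith's point is that the errno user space eventually sees is whatever the bio's completion status maps to. The kernel does that translation in blk_status_to_errno(). The following is a minimal user-space model of that mapping. MODEL_STS_OK and MODEL_STS_IOERR mirror the real BLK_STS_OK and BLK_STS_IOERR. MODEL_STS_INVALID is a hypothetical code of the kind proposed in the thread (BLK_STS_INVALID_REQUEST).

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical subset of block status codes. */
enum model_blk_status {
	MODEL_STS_OK,
	MODEL_STS_IOERR,	/* what the patch's end_io path reports today */
	MODEL_STS_INVALID,	/* hypothetical: proposed invalid-request code */
};

/* Model of the status -> errno translation done at completion. */
static int model_status_to_errno(enum model_blk_status status)
{
	switch (status) {
	case MODEL_STS_OK:
		return 0;
	case MODEL_STS_INVALID:
		return -EINVAL;
	case MODEL_STS_IOERR:
	default:
		return -EIO;
	}
}
```

Under this model, the user-visible change Keith describes is the difference between MODEL_STS_IOERR (-EIO) and the -EINVAL the removed synchronous checks returned.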
On Mon, Mar 25, 2024 at 11:53:45AM -0700, Keith Busch wrote:
> On Sun, Mar 24, 2024 at 09:37:02PM +0800, Ming Lei wrote:
> > @@ -780,6 +793,9 @@ void submit_bio_noacct(struct bio *bio)
> >  		}
> >  	}
> >
> > +	if (WARN_ON_ONCE(!bio_check_alignment(bio, q)))
> > +		goto end_io;
> > +
>
> The "status" at this point is "BLK_STS_IOERR", so user space would see
> EIO, but the existing checks return EINVAL. I'm not sure if that's "ok",
> but assuming it is, I think the user-visible difference in behavior
> should be mentioned in the changelog.
>
> Alternatively, maybe we want an asynchronous way to return EINVAL for

It has to be returned asynchronously, because submit_bio*() returns void.

> these conditions. It tells the user more about where the problem is
> than a generic EIO does. There is no BLK_STS_ value that translates to
> EINVAL, though, so maybe we need a new block status code like
> BLK_STS_INVALID_REQUEST.

Yeah, I agree, but that is an existing issue. The 'status' should have
been initialized as 'BLK_STS_INVALID_REQUEST' or 'BLK_STS_INVALID' in
submit_bio_noacct(), and all check failures can be treated as -EINVAL.

Thanks,
Ming
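Because submit_bio*() returns void, a failed check can only reach the submitter through the bio's completion. The status is stored in the bio, and the bio's end_io callback then runs, which is what the `goto end_io` path does. The following is a user-space model of that path using Ming's suggested default, where the local status starts as an "invalid request" value so every early-exit check surfaces as -EINVAL. All types and names are hypothetical stand-ins.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical status codes; MODEL_INVALID is the proposed new one. */
enum model_status { MODEL_OK, MODEL_IOERR, MODEL_INVALID };

struct model_bio {
	bool aligned;				/* result of the alignment check */
	enum model_status bi_status;
	void (*bi_end_io)(struct model_bio *bio);
	int observed_errno;			/* written by the callback */
};

static int model_to_errno(enum model_status s)
{
	return s == MODEL_OK ? 0 : s == MODEL_INVALID ? -EINVAL : -EIO;
}

/* Completion callback: the only place the submitter learns the result. */
static void model_end_io(struct model_bio *bio)
{
	bio->observed_errno = model_to_errno(bio->bi_status);
}

/* Returns void, like submit_bio_noacct(). */
static void model_submit_bio_noacct(struct model_bio *bio)
{
	/* Default to "invalid" so any failed pre-check maps to -EINVAL. */
	enum model_status status = MODEL_INVALID;

	if (!bio->aligned)
		goto end_io;
	/* Real dispatch would happen here; model it as immediate success. */
	bio->bi_status = MODEL_OK;
	bio->bi_end_io(bio);
	return;
end_io:
	bio->bi_status = status;
	bio->bi_end_io(bio);
}
```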
```diff
diff --git a/block/blk-core.c b/block/blk-core.c
index a16b5abdbbf5..2d86922f95e3 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -729,6 +729,19 @@ void submit_bio_noacct_nocheck(struct bio *bio)
 	__submit_bio_noacct(bio);
 }
 
+static bool bio_check_alignment(struct bio *bio, struct request_queue *q)
+{
+	unsigned int bs = q->limits.logical_block_size;
+
+	if (bio->bi_iter.bi_size & (bs - 1))
+		return false;
+
+	if (bio->bi_iter.bi_sector & ((bs >> SECTOR_SHIFT) - 1))
+		return false;
+
+	return true;
+}
+
 /**
  * submit_bio_noacct - re-submit a bio to the block device layer for I/O
  * @bio:  The bio describing the location in memory and on the device.
@@ -780,6 +793,9 @@ void submit_bio_noacct(struct bio *bio)
 		}
 	}
 
+	if (WARN_ON_ONCE(!bio_check_alignment(bio, q)))
+		goto end_io;
+
 	if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
 		bio_clear_polled(bio);
 
diff --git a/block/blk-lib.c b/block/blk-lib.c
index a6954eafb8c8..ea1a7d16ffdf 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -39,7 +39,6 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, struct bio **biop)
 {
 	struct bio *bio = *biop;
-	sector_t bs_mask;
 
 	if (bdev_read_only(bdev))
 		return -EPERM;
@@ -53,10 +52,6 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		return -EOPNOTSUPP;
 	}
 
-	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
-	if ((sector | nr_sects) & bs_mask)
-		return -EINVAL;
-
 	if (!nr_sects)
 		return -EINVAL;
 
@@ -217,11 +212,6 @@ int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 		unsigned flags)
 {
 	int ret;
-	sector_t bs_mask;
-
-	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
-	if ((sector | nr_sects) & bs_mask)
-		return -EINVAL;
 
 	ret = __blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp_mask,
 			biop, flags);
@@ -250,15 +240,10 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, unsigned flags)
 {
 	int ret = 0;
-	sector_t bs_mask;
 	struct bio *bio;
 	struct blk_plug plug;
 	bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev);
 
-	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
-	if ((sector | nr_sects) & bs_mask)
-		return -EINVAL;
-
 retry:
 	bio = NULL;
 	blk_start_plug(&plug);
@@ -313,8 +298,6 @@ int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
 
 	if (max_sectors == 0)
 		return -EOPNOTSUPP;
-	if ((sector | nr_sects) & bs_mask)
-		return -EINVAL;
 	if (bdev_read_only(bdev))
 		return -EPERM;
 
diff --git a/block/fops.c b/block/fops.c
index 679d9b752fe8..75595c728190 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -37,8 +37,7 @@ static blk_opf_t dio_bio_write_op(struct kiocb *iocb)
 static bool blkdev_dio_unaligned(struct block_device *bdev, loff_t pos,
 			      struct iov_iter *iter)
 {
-	return pos & (bdev_logical_block_size(bdev) - 1) ||
-		!bdev_iter_is_aligned(bdev, iter);
+	return !bdev_iter_is_aligned(bdev, iter);
 }
 
 #define DIO_INLINE_BIO_VECS 4
```
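The checks this patch removes from blk-lib.c test the start sector and the length with a single mask. `(sector | nr_sects) & bs_mask` is zero only when both values are multiples of the logical block size expressed in 512-byte sectors, because any low bit set in either operand survives the OR. The following is a user-space sketch of that idiom; model_range_misaligned() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One-mask test of both start and length, both in 512-byte sectors. */
static bool model_range_misaligned(uint64_t sector, uint64_t nr_sects,
				   unsigned int lbs)
{
	uint64_t bs_mask;

	assert(lbs >= 512 && (lbs & (lbs - 1)) == 0);
	bs_mask = (lbs >> 9) - 1;		/* e.g. 4096-byte lbs -> 7 */
	return ((sector | nr_sects) & bs_mask) != 0;
}
```

For a 4096-byte device, (8, 16) passes because 8 | 16 = 24 and 24 & 7 = 0. Both (9, 16) and (8, 15) are rejected.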
For any FS bio, its start sector and size have to be aligned with the
queue's logical block size from the beginning, because the bio split code
can't turn an unaligned bio into an aligned one.

This rule is obvious, but there are still users that may send unaligned
bios to the block layer. dm-integrity has been observed to do so, causing
a double free of the driver's DMA meta buffer.

So fail unaligned bios fast in submit_bio_noacct() to avoid further
trouble.

Meanwhile, remove this kind of check from the dio and discard code paths.

Cc: Keith Busch <kbusch@kernel.org>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
V2:
- remove the check in dio and discard code path
- check .bi_sector with (logical_block_size >> 9) - 1

 block/blk-core.c | 16 ++++++++++++++++
 block/blk-lib.c  | 17 -----------------
 block/fops.c     |  3 +--
 3 files changed, 17 insertions(+), 19 deletions(-)
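The second V2 note matters because bi_sector counts 512-byte sectors while logical_block_size is in bytes. The sector mask therefore has to be `(logical_block_size >> 9) - 1`. The following sketch contrasts that with a hypothetical variant that applies the byte mask to a sector number. Both functions are illustrative and not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT 9

/* Correct: mask expressed in 512-byte sectors. */
static bool model_sector_aligned(uint64_t sector, unsigned int lbs)
{
	assert(lbs >= 512 && (lbs & (lbs - 1)) == 0);
	return (sector & ((lbs >> SECTOR_SHIFT) - 1)) == 0;
}

/* Hypothetical wrong variant: byte mask applied to a sector number. */
static bool model_sector_aligned_wrong(uint64_t sector, unsigned int lbs)
{
	return (sector & (lbs - 1)) == 0;
}
```

With a 4096-byte logical block size, sector 8 (byte offset 4096) is aligned. The byte-mask variant wrongly rejects it, because 8 & 4095 = 8.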