Message ID | 20231212110844.19698-9-john.g.garry@oracle.com
---|---
State | Not Applicable
Series | block atomic writes
On Tue, Dec 12, 2023 at 11:08:36AM +0000, John Garry wrote:
> Currently an IO size is limited to the request_queue limits max_sectors.
> Limit the size for an atomic write to queue limit atomic_write_max_sectors
> value.
>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  block/blk-merge.c | 12 +++++++++++-
>  block/blk.h       |  3 +++
>  2 files changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 0ccc251e22ff..8d4de9253fe9 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -171,7 +171,17 @@ static inline unsigned get_max_io_size(struct bio *bio,
>  {
>  	unsigned pbs = lim->physical_block_size >> SECTOR_SHIFT;
>  	unsigned lbs = lim->logical_block_size >> SECTOR_SHIFT;
> -	unsigned max_sectors = lim->max_sectors, start, end;
> +	unsigned max_sectors, start, end;
> +
> +	/*
> +	 * We ignore lim->max_sectors for atomic writes simply because
> +	 * it may less than bio->write_atomic_unit, which we cannot
> +	 * tolerate.
> +	 */
> +	if (bio->bi_opf & REQ_ATOMIC)
> +		max_sectors = lim->atomic_write_max_sectors;
> +	else
> +		max_sectors = lim->max_sectors;

I can understand the trouble for atomic writes from bio splitting, which
may simply split at the max_sectors boundary; however, this change is
still too fragile:

1) ->max_sectors may be set from userspace
   - so this change simply overrides the userspace setting

2) otherwise ->max_sectors is the same as ->max_hw_sectors:
   - then something must be wrong on the device or driver side, because
     ->write_atomic_unit conflicts with ->max_hw_sectors, which is
     supposed to be figured out before the device is set up

3) a too-big max_sectors may break the driver or device; for example,
   nvme-pci aligns max_hw_sectors with the DMA optimized mapping size

And there might be more (better) choices:

1) make sure the atomic write limit is respected when userspace updates
   ->max_sectors

2) when the driver finds that the atomic write limits conflict with
   other existing hardware limits, fail or solve the conflict (such as
   by reducing the atomic write unit) before the queue is started; with
   a single atomic write limits update API, the conflict can be figured
   out earlier by the block layer too.

Thanks,
Ming
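Ming's first suggested alternative can be sketched in a few lines of
standalone C. This is not kernel code: the struct, field, and function
names below are illustrative stand-ins for `queue_limits` and the sysfs
`max_sectors_kb` store path, and only model the idea that a userspace
update which would drop `max_sectors` below the atomic write limit is
rejected rather than silently overridden later.

```c
/* Hypothetical mirror of the limits fields under discussion. */
struct queue_limits_sketch {
	unsigned int max_sectors;
	unsigned int max_hw_sectors;
	unsigned int atomic_write_max_sectors;
};

/*
 * Sketch of "make sure the atomic write limit is respected when
 * userspace updates ->max_sectors": reject a new max_sectors that
 * exceeds the hardware limit or that would fall below the advertised
 * atomic write limit.  Returns 0 on success, -1 on conflict.
 */
static int set_max_sectors_checked(struct queue_limits_sketch *lim,
				   unsigned int new_max_sectors)
{
	if (new_max_sectors > lim->max_hw_sectors)
		return -1;	/* beyond what the hardware can do */
	if (new_max_sectors < lim->atomic_write_max_sectors)
		return -1;	/* would break advertised atomic writes */
	lim->max_sectors = new_max_sectors;
	return 0;
}
```

With this shape, `get_max_io_size()` would never need to special-case
`REQ_ATOMIC`, because `max_sectors` could never be smaller than the
atomic write limit in the first place.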
On 15/12/2023 02:27, Ming Lei wrote:
> On Tue, Dec 12, 2023 at 11:08:36AM +0000, John Garry wrote:
>> Currently an IO size is limited to the request_queue limits max_sectors.
>> Limit the size for an atomic write to queue limit atomic_write_max_sectors
>> value.
>>
>> Signed-off-by: John Garry <john.g.garry@oracle.com>
>> ---
>>  block/blk-merge.c | 12 +++++++++++-
>>  block/blk.h       |  3 +++
>>  2 files changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index 0ccc251e22ff..8d4de9253fe9 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -171,7 +171,17 @@ static inline unsigned get_max_io_size(struct bio *bio,
>>  {
>>  	unsigned pbs = lim->physical_block_size >> SECTOR_SHIFT;
>>  	unsigned lbs = lim->logical_block_size >> SECTOR_SHIFT;
>> -	unsigned max_sectors = lim->max_sectors, start, end;
>> +	unsigned max_sectors, start, end;
>> +
>> +	/*
>> +	 * We ignore lim->max_sectors for atomic writes simply because
>> +	 * it may less than bio->write_atomic_unit, which we cannot
>> +	 * tolerate.
>> +	 */
>> +	if (bio->bi_opf & REQ_ATOMIC)
>> +		max_sectors = lim->atomic_write_max_sectors;
>> +	else
>> +		max_sectors = lim->max_sectors;

Please note that I mentioned this issue in the cover letter, so you can
see some discussion there (if you missed it).

> I can understand the trouble for atomic writes from bio splitting,
> which may simply split at the max_sectors boundary; however, this
> change is still too fragile:
>
> 1) ->max_sectors may be set from userspace
>    - so this change simply overrides the userspace setting

Yes.

> 2) otherwise ->max_sectors is the same as ->max_hw_sectors:
>
>    - then something must be wrong on the device or driver side, because
>      ->write_atomic_unit conflicts with ->max_hw_sectors, which is
>      supposed to be figured out before the device is set up

Right, so I think that it is proper to limit atomic_write_unit_max et al.
to max_hw_sectors or whatever the DMA engine device limits are. I can
make that change.

> 3) a too-big max_sectors may break the driver or device; for example,
>    nvme-pci aligns max_hw_sectors with the DMA optimized mapping size
>
> And there might be more (better) choices:
>
> 1) make sure the atomic write limit is respected when userspace updates
>    ->max_sectors

My mind has changed and I would say no; we can treat
atomic_write_unit_max as special. Indeed, max_sectors and
atomic_write_unit_max are complementary. If someone sets max_sectors to
4KB and then tries an atomic write of 16KB, then they don't know what
they are doing.

My original idea was to dynamically limit atomic_write_unit_max et al.
to max_sectors (so that max_sectors is always respected for atomic
writes).

As an alternative, how about we keep the value of atomic_write_unit_max
static, but reject an atomic write if it exceeds max_sectors? It's not
too different from dynamically limiting atomic_write_unit_max. But as
mentioned, it may be asking for trouble...

> 2) when the driver finds that the atomic write limits conflict with
>    other existing hardware limits, fail or solve the conflict (such as
>    by reducing the atomic write unit) before the queue is started; with
>    a single atomic write limits update API, the conflict can be figured
>    out earlier by the block layer too.

I think that we can do this, but I am not sure what other queue limits
may conflict (apart from max_sectors / max_hw_sectors). There is max
segment size, but we just rely on a single PAGE per iovec to evaluate
atomic_write_unit_max, so that should not be an issue.

Thanks,
John
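The alternative John floats (keep the atomic write unit static but
reject an oversized atomic write at submission time) can be modelled in
standalone C. All names here are hypothetical stand-ins, not the
kernel's: `REQ_ATOMIC_SKETCH` mimics the `REQ_ATOMIC` flag, and the
check stands in for whatever submission-path validation would do.

```c
#include <stdbool.h>

#define REQ_ATOMIC_SKETCH (1u << 0)	/* stand-in for REQ_ATOMIC */

/* Hypothetical, minimal view of a bio for this check. */
struct bio_sketch {
	unsigned int opf;	/* operation flags */
	unsigned int nr_sectors;/* size of the write in sectors */
};

/*
 * Sketch of "reject an atomic write if it exceeds max_sectors": the
 * atomic write unit limits stay static, but a submitted atomic write
 * must fit within both the atomic write limit and the current
 * max_sectors.  Non-atomic IO is not rejected here; oversized
 * non-atomic bios are split instead.
 */
static bool atomic_write_acceptable(const struct bio_sketch *bio,
				    unsigned int max_sectors,
				    unsigned int atomic_write_max_sectors)
{
	if (!(bio->opf & REQ_ATOMIC_SKETCH))
		return true;
	if (bio->nr_sectors > atomic_write_max_sectors)
		return false;
	return bio->nr_sectors <= max_sectors;
}
```

This keeps the advertised limits stable from userspace's point of view
while still honouring a reduced `max_sectors`, at the cost of atomic
writes failing that would previously have been allowed.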
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 0ccc251e22ff..8d4de9253fe9 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -171,7 +171,17 @@ static inline unsigned get_max_io_size(struct bio *bio,
 {
 	unsigned pbs = lim->physical_block_size >> SECTOR_SHIFT;
 	unsigned lbs = lim->logical_block_size >> SECTOR_SHIFT;
-	unsigned max_sectors = lim->max_sectors, start, end;
+	unsigned max_sectors, start, end;
+
+	/*
+	 * We ignore lim->max_sectors for atomic writes simply because
+	 * it may less than bio->write_atomic_unit, which we cannot
+	 * tolerate.
+	 */
+	if (bio->bi_opf & REQ_ATOMIC)
+		max_sectors = lim->atomic_write_max_sectors;
+	else
+		max_sectors = lim->max_sectors;
 
 	if (lim->chunk_sectors) {
 		max_sectors = min(max_sectors,
diff --git a/block/blk.h b/block/blk.h
index 94e330e9c853..6f6cd5b1830a 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -178,6 +178,9 @@ static inline unsigned int blk_queue_get_max_sectors(struct request *rq)
 	if (unlikely(op == REQ_OP_WRITE_ZEROES))
 		return q->limits.max_write_zeroes_sectors;
 
+	if (rq->cmd_flags & REQ_ATOMIC)
+		return q->limits.atomic_write_max_sectors;
+
 	return q->limits.max_sectors;
 }
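The limit-selection logic the hunks above add can be expressed as a tiny
userspace model. This is a sketch only: `REQ_ATOMIC_SKETCH` and the
plain parameters stand in for `REQ_ATOMIC` and the `queue_limits`
fields, and the function compresses what both `get_max_io_size()` and
`blk_queue_get_max_sectors()` do for the atomic case.

```c
#define REQ_ATOMIC_SKETCH (1u << 0)	/* stand-in for REQ_ATOMIC */

/*
 * Model of the patch's behaviour: an atomic write is sized against
 * atomic_write_max_sectors and deliberately ignores max_sectors, since
 * max_sectors may be smaller than the atomic write unit, which an
 * atomic write cannot tolerate (it must not be split below its unit).
 */
static unsigned int pick_max_sectors(unsigned int opf,
				     unsigned int max_sectors,
				     unsigned int atomic_write_max_sectors)
{
	if (opf & REQ_ATOMIC_SKETCH)
		return atomic_write_max_sectors;
	return max_sectors;
}
```

Ming's objection, in these terms: when `max_sectors` was lowered by
userspace or by the driver for a real hardware reason, silently
returning the larger atomic limit bypasses that constraint.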
Currently an IO size is limited to the request_queue limits max_sectors.
Limit the size for an atomic write to queue limit atomic_write_max_sectors
value.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 block/blk-merge.c | 12 +++++++++++-
 block/blk.h       |  3 +++
 2 files changed, 14 insertions(+), 1 deletion(-)