
[v2,08/16] block: Limit atomic write IO size according to atomic_write_max_sectors

Message ID: 20231212110844.19698-9-john.g.garry@oracle.com
State: Not Applicable
Series: block atomic writes

Commit Message

John Garry Dec. 12, 2023, 11:08 a.m. UTC
Currently an IO size is limited to the request_queue limit max_sectors.
Limit the size of an atomic write to the queue limit atomic_write_max_sectors
value.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 block/blk-merge.c | 12 +++++++++++-
 block/blk.h       |  3 +++
 2 files changed, 14 insertions(+), 1 deletion(-)

Comments

Ming Lei Dec. 15, 2023, 2:27 a.m. UTC | #1
On Tue, Dec 12, 2023 at 11:08:36AM +0000, John Garry wrote:
> Currently an IO size is limited to the request_queue limit max_sectors.
> Limit the size of an atomic write to the queue limit atomic_write_max_sectors
> value.
> 
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  block/blk-merge.c | 12 +++++++++++-
>  block/blk.h       |  3 +++
>  2 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 0ccc251e22ff..8d4de9253fe9 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -171,7 +171,17 @@ static inline unsigned get_max_io_size(struct bio *bio,
>  {
>  	unsigned pbs = lim->physical_block_size >> SECTOR_SHIFT;
>  	unsigned lbs = lim->logical_block_size >> SECTOR_SHIFT;
> -	unsigned max_sectors = lim->max_sectors, start, end;
> +	unsigned max_sectors, start, end;
> +
> +	/*
> +	 * We ignore lim->max_sectors for atomic writes simply because
> +	 * it may be less than bio->write_atomic_unit, which we cannot
> +	 * tolerate.
> +	 */
> +	if (bio->bi_opf & REQ_ATOMIC)
> +		max_sectors = lim->atomic_write_max_sectors;
> +	else
> +		max_sectors = lim->max_sectors;

I can understand the trouble for atomic writes from bio splitting, which
may simply split at the max_sectors boundary; however, this change is
still too fragile:

1) ->max_sectors may be set from userspace
- so this change simply overrides the userspace setting

2) otherwise ->max_sectors is the same as ->max_hw_sectors:

- then something must be wrong on the device side or driver side, because
->write_atomic_unit conflicts with ->max_hw_sectors, which is supposed
to be figured out before the device is set up

3) too big a max_sectors may break the driver or device; for example, nvme-pci
aligns max_hw_sectors with the DMA-optimized mapping size

And there might be more (better) choices:

1) make sure the atomic write limit is respected when userspace updates
->max_sectors (see the sketch below)

2) when the driver finds that the atomic write limits conflict with other
existing hardware limits, fail or resolve the conflict (such as by reducing
the atomic write unit) before the queue is started; with a single atomic
write limits update API, the conflict can be figured out earlier by the
block layer too.
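
For 1), here is an untested sketch of the shape I mean (the helper name is
made up; only the atomic_write_max_sectors field added by this series is
assumed, called from the sysfs store path before applying the new value):

static int blk_validate_max_sectors_update(struct queue_limits *lim,
					   unsigned int new_max_sectors)
{
	/*
	 * Refuse a userspace max_sectors below the advertised atomic
	 * write limit, so an atomic write never has to be split to fit.
	 */
	if (new_max_sectors < lim->atomic_write_max_sectors)
		return -EINVAL;

	lim->max_sectors = new_max_sectors;
	return 0;
}

The other reading of 1) would be to shrink the atomic write limits to the new
max_sectors instead of rejecting the update.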



thanks, 
Ming
John Garry Dec. 15, 2023, 1:55 p.m. UTC | #2
On 15/12/2023 02:27, Ming Lei wrote:
> On Tue, Dec 12, 2023 at 11:08:36AM +0000, John Garry wrote:
>> Currently an IO size is limited to the request_queue limit max_sectors.
>> Limit the size of an atomic write to the queue limit atomic_write_max_sectors
>> value.
>>
>> Signed-off-by: John Garry <john.g.garry@oracle.com>
>> ---
>>   block/blk-merge.c | 12 +++++++++++-
>>   block/blk.h       |  3 +++
>>   2 files changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index 0ccc251e22ff..8d4de9253fe9 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -171,7 +171,17 @@ static inline unsigned get_max_io_size(struct bio *bio,
>>   {
>>   	unsigned pbs = lim->physical_block_size >> SECTOR_SHIFT;
>>   	unsigned lbs = lim->logical_block_size >> SECTOR_SHIFT;
>> -	unsigned max_sectors = lim->max_sectors, start, end;
>> +	unsigned max_sectors, start, end;
>> +
>> +	/*
>> +	 * We ignore lim->max_sectors for atomic writes simply because
>> +	 * it may be less than bio->write_atomic_unit, which we cannot
>> +	 * tolerate.
>> +	 */
>> +	if (bio->bi_opf & REQ_ATOMIC)
>> +		max_sectors = lim->atomic_write_max_sectors;
>> +	else
>> +		max_sectors = lim->max_sectors;

Please note that I mentioned this issue in the cover letter, so you can
see some discussion there (if you missed it).

> 
> I can understand the trouble for atomic writes from bio splitting, which
> may simply split at the max_sectors boundary; however, this change is
> still too fragile:
> 
> 1) ->max_sectors may be set from userspace
> - so this change simply overrides the userspace setting

yes

> 
> 2) otherwise ->max_sectors is the same as ->max_hw_sectors:
> 
> - then something must be wrong on the device side or driver side, because
> ->write_atomic_unit conflicts with ->max_hw_sectors, which is supposed
> to be figured out before the device is set up
> 

Right, so I think that it is proper to limit atomic_write_unit_max et al.
to max_hw_sectors or whatever the DMA engine limits the device to. I can
make that change.
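
Something like this untested sketch (the helper name is illustrative; it
assumes only the atomic_write_max_sectors field from this series, and
atomic_write_unit_max et al. would get the same treatment):

static void blk_atomic_write_clamp_limits(struct queue_limits *lim)
{
	/* never advertise an atomic write size beyond the hardware limit */
	lim->atomic_write_max_sectors = min(lim->atomic_write_max_sectors,
					    lim->max_hw_sectors);
}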

> 3) too big a max_sectors may break the driver or device; for example, nvme-pci
> aligns max_hw_sectors with the DMA-optimized mapping size
> 
> And there might be more (better) choices:
> 
> 1) make sure the atomic write limit is respected when userspace updates
> ->max_sectors

My mind has been changed, and I would say no; we can treat
atomic_write_unit_max as special. Indeed, max_sectors and
atomic_write_unit_max are complementary. If someone sets max_sectors to
4KB and then tries an atomic write of 16KB, then they don't know what
they are doing.

My original idea was to dynamically limit atomic_write_unit_max et al. to
max_sectors (so that max_sectors is always respected for atomic writes).

As an alternative, how about we keep the value of atomic_write_unit_max
static, but reject an atomic write if it exceeds max_sectors? That is not
too different from dynamically limiting atomic_write_unit_max. But as
mentioned, it may be asking for trouble...

> 
> 2) when the driver finds that the atomic write limits conflict with other
> existing hardware limits, fail or resolve the conflict (such as by reducing
> the atomic write unit) before the queue is started; with a single atomic
> write limits update API, the conflict can be figured out earlier by the
> block layer too.

I think that we can do this, but I am not sure what other queue limits
may conflict (apart from max_sectors / max_hw_sectors). There is max
segment size, but we just rely on a single PAGE per iovec to evaluate
atomic_write_unit_max, so that should not be an issue.

Thanks,
John

Patch

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 0ccc251e22ff..8d4de9253fe9 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -171,7 +171,17 @@ static inline unsigned get_max_io_size(struct bio *bio,
 {
 	unsigned pbs = lim->physical_block_size >> SECTOR_SHIFT;
 	unsigned lbs = lim->logical_block_size >> SECTOR_SHIFT;
-	unsigned max_sectors = lim->max_sectors, start, end;
+	unsigned max_sectors, start, end;
+
+	/*
+	 * We ignore lim->max_sectors for atomic writes simply because
+	 * it may be less than bio->write_atomic_unit, which we cannot
+	 * tolerate.
+	 */
+	if (bio->bi_opf & REQ_ATOMIC)
+		max_sectors = lim->atomic_write_max_sectors;
+	else
+		max_sectors = lim->max_sectors;
 
 	if (lim->chunk_sectors) {
 		max_sectors = min(max_sectors,
diff --git a/block/blk.h b/block/blk.h
index 94e330e9c853..6f6cd5b1830a 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -178,6 +178,9 @@ static inline unsigned int blk_queue_get_max_sectors(struct request *rq)
 	if (unlikely(op == REQ_OP_WRITE_ZEROES))
 		return q->limits.max_write_zeroes_sectors;
 
+	if (rq->cmd_flags & REQ_ATOMIC)
+		return q->limits.atomic_write_max_sectors;
+
 	return q->limits.max_sectors;
 }