[03/11] block: remove the BIP_IP_CHECKSUM flag

Message ID	20240607055912.3586772-4-hch@lst.de (mailing list archive)
State	New
Headers	show Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 160A314E2F7; Fri, 7 Jun 2024 05:59:34 +0000 (UTC) From: Christoph Hellwig <hch@lst.de> To: Jens Axboe <axboe@kernel.dk>, "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Mike Snitzer <snitzer@kernel.org>, Mikulas Patocka <mpatocka@redhat.com>, Song Liu <song@kernel.org>, Yu Kuai <yukuai3@huawei.com>, Dan Williams <dan.j.williams@intel.com>, Vishal Verma <vishal.l.verma@intel.com>, Dave Jiang <dave.jiang@intel.com>, Ira Weiny <ira.weiny@intel.com>, Keith Busch <kbusch@kernel.org>, Sagi Grimberg <sagi@grimberg.me>, Chaitanya Kulkarni <kch@nvidia.com>, linux-block@vger.kernel.org, dm-devel@lists.linux.dev, linux-raid@vger.kernel.org, nvdimm@lists.linux.dev, linux-nvme@lists.infradead.org, linux-scsi@vger.kernel.org, Bart Van Assche <bvanassche@acm.org>, Kanchan Joshi <joshi.k@samsung.com> Subject: [PATCH 03/11] block: remove the BIP_IP_CHECKSUM flag Date: Fri, 7 Jun 2024 07:58:57 +0200 Message-ID: <20240607055912.3586772-4-hch@lst.de> In-Reply-To: <20240607055912.3586772-1-hch@lst.de> References: <20240607055912.3586772-1-hch@lst.de> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	[01/11] dm-integrity: use the nop integrity profile \| expand [01/11] dm-integrity: use the nop integrity profile [02/11] block: remove the unused BIP_{CTRL,DISK}_NOCHECK flags [03/11] block: remove the BIP_IP_CHECKSUM flag [04/11] block: remove the blk_integrity_profile structure [05/11] block: remove the blk_flush_integrity call in blk_integrity_unregister [06/11] block: factor out flag_{store,show} helper for integrity [07/11] block: use kstrtoul in flag_store [08/11] block: don't require stable pages for non-PI metadata [09/11] block: bypass the STABLE_WRITES flag for protection information [10/11] block: invert the BLK_INTEGRITY_{GENERATE,VERIFY} flags [11/11] block: move integrity information into queue_limits

Christoph Hellwig June 7, 2024, 5:58 a.m. UTC

Remove the BIP_IP_CHECKSUM as sd can just look at the per-disk
checksum type instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
---
 block/bio-integrity.c | 3 ---
 drivers/scsi/sd.c     | 6 +++---
 include/linux/bio.h   | 5 ++---
 3 files changed, 5 insertions(+), 9 deletions(-)

Hannes Reinecke June 7, 2024, 6:11 a.m. UTC | #1

On 6/7/24 07:58, Christoph Hellwig wrote:
> Remove the BIP_IP_CHECKSUM as sd can just look at the per-disk
> checksum type instead.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
> ---
>   block/bio-integrity.c | 3 ---
>   drivers/scsi/sd.c     | 6 +++---
>   include/linux/bio.h   | 5 ++---
>   3 files changed, 5 insertions(+), 9 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes

Martin K. Petersen June 10, 2024, 11:56 a.m. UTC | #2

Christoph,

> Remove the BIP_IP_CHECKSUM as sd can just look at the per-disk
> checksum type instead.

This removes the ability to submit an individual I/O using a CRC instead
of the IP checksum. There are cases which can't be expressed when
the controller is operating in IP checksum mode.

Christoph Hellwig June 10, 2024, 11:57 a.m. UTC | #3

On Mon, Jun 10, 2024 at 07:56:00AM -0400, Martin K. Petersen wrote:
> > Remove the BIP_IP_CHECKSUM as sd can just look at the per-disk
> > checksum type instead.
> 
> This removes the ability to submit an individual I/O using a CRC instead
> of the IP checksum. There are cases which can't be expressed when
> the controller is operating in IP checksum mode.

Huh, how?

Martin K. Petersen June 10, 2024, 12:19 p.m. UTC | #4

Christoph,

>> This removes the ability to submit an individual I/O using a CRC
>> instead of the IP checksum. There are cases which can't be expressed
>> when the controller is operating in IP checksum mode.
>
> Huh, how?

On the wire between controller and target there's only CRC. If I want to
write a "bad" CRC to disk, I have switch the controller to CRC mode. The
controller can't convert a "bad" IP checksum to a "bad" CRC. The PI test
tooling relies heavily on being able to write "bad" things to disk and
read them back to validate that we detect the error.

Christoph Hellwig June 10, 2024, 12:24 p.m. UTC | #5

On Mon, Jun 10, 2024 at 08:19:33AM -0400, Martin K. Petersen wrote:
> On the wire between controller and target there's only CRC. If I want to
> write a "bad" CRC to disk, I have switch the controller to CRC mode. The
> controller can't convert a "bad" IP checksum to a "bad" CRC. The PI test
> tooling relies heavily on being able to write "bad" things to disk and
> read them back to validate that we detect the error.

But how do you even toggle the flag?  There is no no code to do that.
And if you already have a special kernel module for that it really
should just use a passthrough request to take care of that.

Note that unlike the NOCHECK flag which I just cleaned up because they
were unused, this one actually does get in the way of the architecture
of the whole series :(  We could add a per-bip csum_type but it would
feel really weird.

Martin K. Petersen June 11, 2024, 7:51 p.m. UTC | #6

Christoph,

Sorry about the delay. Travel got in the way.

>> On the wire between controller and target there's only CRC. If I want to
>> write a "bad" CRC to disk, I have switch the controller to CRC mode. The
>> controller can't convert a "bad" IP checksum to a "bad" CRC. The PI test
>> tooling relies heavily on being able to write "bad" things to disk and
>> read them back to validate that we detect the error.
>
> But how do you even toggle the flag?  There is no no code to do that.
> And if you already have a special kernel module for that it really
> should just use a passthrough request to take care of that.

A passthrough command to the controller?

> Note that unlike the NOCHECK flag which I just cleaned up because they
> were unused, this one actually does get in the way of the architecture
> of the whole series :( We could add a per-bip csum_type but it would
> feel really weird.

Why would it feel weird? That's how it currently works.

The qualification tool issues a flurry of commands injecting errors at
various places in the stack to identify that the right entity (block
layer, controller, storage device) catch a bad checksum, reference tag,
etc. Being able to enable/disable checking at each place in the stack is
important. I also have code for target that does the same thing in the
reverse direction.

Christoph Hellwig June 12, 2024, 3:51 a.m. UTC | #7

On Tue, Jun 11, 2024 at 03:51:27PM -0400, Martin K. Petersen wrote:
> > But how do you even toggle the flag?  There is no no code to do that.
> > And if you already have a special kernel module for that it really
> > should just use a passthrough request to take care of that.
> 
> A passthrough command to the controller?

A passthrough command to the LU that has a mismatching checksum.

> > Note that unlike the NOCHECK flag which I just cleaned up because they
> > were unused, this one actually does get in the way of the architecture
> > of the whole series :( We could add a per-bip csum_type but it would
> > feel really weird.
> 
> Why would it feel weird? That's how it currently works.

Because there's no way to have it set to anything but the per-queue
one.  

> The qualification tool issues a flurry of commands injecting errors at
> various places in the stack to identify that the right entity (block
> layer, controller, storage device) catch a bad checksum, reference tag,
> etc.

How does it do that?  There's no actualy way to make it mismatch.

Martin K. Petersen June 12, 2024, 5:27 p.m. UTC | #8

Christoph,

>> > Note that unlike the NOCHECK flag which I just cleaned up because they
>> > were unused, this one actually does get in the way of the architecture
>> > of the whole series :( We could add a per-bip csum_type but it would
>> > feel really weird.
>> 
>> Why would it feel weird? That's how it currently works.
>
> Because there's no way to have it set to anything but the per-queue
> one.

That's what the io_uring passthrough changes enable.

Note that the IP checksum is an optional performance feature. A SCSI
controller supporting IP-to-CRC conversion does not imply that all
submitted metadata must use IP checksum format.

The T10 CRC used to be painfully slow to calculate prior to processors
growing support for pclmulqdq or similar. Hence the optional IP
checksum. But on a modern CPU, the T10 CRC can often be calculated fast
enough that it is less of a performance impediment.

The interface was explicitly designed so that the entity which creates
the metadata decides which checksum it wants to use. And then it uses
the bip flag to communicate that to the HBA. The patch which allowed the
user to set the desired guard tag format for block layer-owned PI fell
by the wayside, apparently. Possibly lost track because the T10 CRC
hardware offload changes took a while to land.

Note that I would personally love to get rid of the IP checksum
altogether but I think it's too soon to make it obsolete. Still a lot of
SCSI stuff out there which runs in IP checksum mode. And it is still a
bit faster than CRC for many workloads. And as long as it is in use, we
need the ability to support it and qualify it.

All I'm asking is that we retain the ability to disable checking at the
controller level and at the target level. And that the optional IP
checksum can be selected on a per-I/O basis. IOW, please just retain the
three existing bip flags. Happy to look into what polarity-reversal
would look like but I don't think that should hold up your series.

>> The qualification tool issues a flurry of commands injecting errors at
>> various places in the stack to identify that the right entity (block
>> layer, controller, storage device) catch a bad checksum, reference tag,
>> etc.
>
> How does it do that?  There's no actualy way to make it mismatch.

Through a custom passthrough driver that we want to get rid of and
replace with the io_uring interface series.

Christoph Hellwig June 13, 2024, 5:35 a.m. UTC | #9

On Wed, Jun 12, 2024 at 01:27:47PM -0400, Martin K. Petersen wrote:
> >> > Note that unlike the NOCHECK flag which I just cleaned up because they
> >> > were unused, this one actually does get in the way of the architecture
> >> > of the whole series :( We could add a per-bip csum_type but it would
> >> > feel really weird.
> >> 
> >> Why would it feel weird? That's how it currently works.
> >
> > Because there's no way to have it set to anything but the per-queue
> > one.
> 
> That's what the io_uring passthrough changes enable.

The checksum type?  How is that compatible with nvme?

Anyway, I'll just leave this flag in for the resend, but if we can't
come up with a coherent user for it in a merge cycle or two (which
I very much doubt) I'll send another patch to remove it.

Martin K. Petersen June 14, 2024, 12:57 a.m. UTC | #10

Christoph,

> The checksum type?  How is that compatible with nvme?

NVMe does not support IP checksum.

Christoph Hellwig June 14, 2024, 3:16 a.m. UTC | #11

On Thu, Jun 13, 2024 at 08:57:26PM -0400, Martin K. Petersen wrote:
> 
> Christoph,
> 
> > The checksum type?  How is that compatible with nvme?
> 
> NVMe does not support IP checksum.

Yes, I know.  So how do you make an interface work where you can
arbitrarily force it?

[03/11] block: remove the BIP_IP_CHECKSUM flag

Commit Message

Comments

Patch