diff mbox series

[03/11] block: remove the BIP_IP_CHECKSUM flag

Message ID 20240607055912.3586772-4-hch@lst.de (mailing list archive)
State Superseded, archived
Delegated to: Benjamin Marzinski
Headers show
Series [01/11] dm-integrity: use the nop integrity profile | expand

Commit Message

Christoph Hellwig June 7, 2024, 5:58 a.m. UTC
Remove the BIP_IP_CHECKSUM as sd can just look at the per-disk
checksum type instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
---
 block/bio-integrity.c | 3 ---
 drivers/scsi/sd.c     | 6 +++---
 include/linux/bio.h   | 5 ++---
 3 files changed, 5 insertions(+), 9 deletions(-)

Comments

Hannes Reinecke June 7, 2024, 6:11 a.m. UTC | #1
On 6/7/24 07:58, Christoph Hellwig wrote:
> Remove the BIP_IP_CHECKSUM as sd can just look at the per-disk
> checksum type instead.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
> ---
>   block/bio-integrity.c | 3 ---
>   drivers/scsi/sd.c     | 6 +++---
>   include/linux/bio.h   | 5 ++---
>   3 files changed, 5 insertions(+), 9 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
Martin K. Petersen June 10, 2024, 11:56 a.m. UTC | #2
Christoph,

> Remove the BIP_IP_CHECKSUM as sd can just look at the per-disk
> checksum type instead.

This removes the ability to submit an individual I/O using a CRC instead
of the IP checksum. There are cases which can't be expressed when
the controller is operating in IP checksum mode.
Christoph Hellwig June 10, 2024, 11:57 a.m. UTC | #3
On Mon, Jun 10, 2024 at 07:56:00AM -0400, Martin K. Petersen wrote:
> > Remove the BIP_IP_CHECKSUM as sd can just look at the per-disk
> > checksum type instead.
> 
> This removes the ability to submit an individual I/O using a CRC instead
> of the IP checksum. There are cases which can't be expressed when
> the controller is operating in IP checksum mode.

Huh, how?
Martin K. Petersen June 10, 2024, 12:19 p.m. UTC | #4
Christoph,

>> This removes the ability to submit an individual I/O using a CRC
>> instead of the IP checksum. There are cases which can't be expressed
>> when the controller is operating in IP checksum mode.
>
> Huh, how?

On the wire between controller and target there's only CRC. If I want to
write a "bad" CRC to disk, I have switch the controller to CRC mode. The
controller can't convert a "bad" IP checksum to a "bad" CRC. The PI test
tooling relies heavily on being able to write "bad" things to disk and
read them back to validate that we detect the error.
Christoph Hellwig June 10, 2024, 12:24 p.m. UTC | #5
On Mon, Jun 10, 2024 at 08:19:33AM -0400, Martin K. Petersen wrote:
> On the wire between controller and target there's only CRC. If I want to
> write a "bad" CRC to disk, I have switch the controller to CRC mode. The
> controller can't convert a "bad" IP checksum to a "bad" CRC. The PI test
> tooling relies heavily on being able to write "bad" things to disk and
> read them back to validate that we detect the error.

But how do you even toggle the flag?  There is no no code to do that.
And if you already have a special kernel module for that it really
should just use a passthrough request to take care of that.

Note that unlike the NOCHECK flag which I just cleaned up because they
were unused, this one actually does get in the way of the architecture
of the whole series :(  We could add a per-bip csum_type but it would
feel really weird.
Martin K. Petersen June 11, 2024, 7:51 p.m. UTC | #6
Christoph,

Sorry about the delay. Travel got in the way.

>> On the wire between controller and target there's only CRC. If I want to
>> write a "bad" CRC to disk, I have switch the controller to CRC mode. The
>> controller can't convert a "bad" IP checksum to a "bad" CRC. The PI test
>> tooling relies heavily on being able to write "bad" things to disk and
>> read them back to validate that we detect the error.
>
> But how do you even toggle the flag?  There is no no code to do that.
> And if you already have a special kernel module for that it really
> should just use a passthrough request to take care of that.

A passthrough command to the controller?

> Note that unlike the NOCHECK flag which I just cleaned up because they
> were unused, this one actually does get in the way of the architecture
> of the whole series :( We could add a per-bip csum_type but it would
> feel really weird.

Why would it feel weird? That's how it currently works.

The qualification tool issues a flurry of commands injecting errors at
various places in the stack to identify that the right entity (block
layer, controller, storage device) catch a bad checksum, reference tag,
etc. Being able to enable/disable checking at each place in the stack is
important. I also have code for target that does the same thing in the
reverse direction.
Christoph Hellwig June 12, 2024, 3:51 a.m. UTC | #7
On Tue, Jun 11, 2024 at 03:51:27PM -0400, Martin K. Petersen wrote:
> > But how do you even toggle the flag?  There is no no code to do that.
> > And if you already have a special kernel module for that it really
> > should just use a passthrough request to take care of that.
> 
> A passthrough command to the controller?

A passthrough command to the LU that has a mismatching checksum.

> > Note that unlike the NOCHECK flag which I just cleaned up because they
> > were unused, this one actually does get in the way of the architecture
> > of the whole series :( We could add a per-bip csum_type but it would
> > feel really weird.
> 
> Why would it feel weird? That's how it currently works.

Because there's no way to have it set to anything but the per-queue
one.  

> The qualification tool issues a flurry of commands injecting errors at
> various places in the stack to identify that the right entity (block
> layer, controller, storage device) catch a bad checksum, reference tag,
> etc.

How does it do that?  There's no actualy way to make it mismatch.
Martin K. Petersen June 12, 2024, 5:27 p.m. UTC | #8
Christoph,

>> > Note that unlike the NOCHECK flag which I just cleaned up because they
>> > were unused, this one actually does get in the way of the architecture
>> > of the whole series :( We could add a per-bip csum_type but it would
>> > feel really weird.
>> 
>> Why would it feel weird? That's how it currently works.
>
> Because there's no way to have it set to anything but the per-queue
> one.

That's what the io_uring passthrough changes enable.

Note that the IP checksum is an optional performance feature. A SCSI
controller supporting IP-to-CRC conversion does not imply that all
submitted metadata must use IP checksum format.

The T10 CRC used to be painfully slow to calculate prior to processors
growing support for pclmulqdq or similar. Hence the optional IP
checksum. But on a modern CPU, the T10 CRC can often be calculated fast
enough that it is less of a performance impediment.

The interface was explicitly designed so that the entity which creates
the metadata decides which checksum it wants to use. And then it uses
the bip flag to communicate that to the HBA. The patch which allowed the
user to set the desired guard tag format for block layer-owned PI fell
by the wayside, apparently. Possibly lost track because the T10 CRC
hardware offload changes took a while to land.

Note that I would personally love to get rid of the IP checksum
altogether but I think it's too soon to make it obsolete. Still a lot of
SCSI stuff out there which runs in IP checksum mode. And it is still a
bit faster than CRC for many workloads. And as long as it is in use, we
need the ability to support it and qualify it.

All I'm asking is that we retain the ability to disable checking at the
controller level and at the target level. And that the optional IP
checksum can be selected on a per-I/O basis. IOW, please just retain the
three existing bip flags. Happy to look into what polarity-reversal
would look like but I don't think that should hold up your series.

>> The qualification tool issues a flurry of commands injecting errors at
>> various places in the stack to identify that the right entity (block
>> layer, controller, storage device) catch a bad checksum, reference tag,
>> etc.
>
> How does it do that?  There's no actualy way to make it mismatch.

Through a custom passthrough driver that we want to get rid of and
replace with the io_uring interface series.
Christoph Hellwig June 13, 2024, 5:35 a.m. UTC | #9
On Wed, Jun 12, 2024 at 01:27:47PM -0400, Martin K. Petersen wrote:
> >> > Note that unlike the NOCHECK flag which I just cleaned up because they
> >> > were unused, this one actually does get in the way of the architecture
> >> > of the whole series :( We could add a per-bip csum_type but it would
> >> > feel really weird.
> >> 
> >> Why would it feel weird? That's how it currently works.
> >
> > Because there's no way to have it set to anything but the per-queue
> > one.
> 
> That's what the io_uring passthrough changes enable.

The checksum type?  How is that compatible with nvme?

Anyway, I'll just leave this flag in for the resend, but if we can't
come up with a coherent user for it in a merge cycle or two (which
I very much doubt) I'll send another patch to remove it.
Martin K. Petersen June 14, 2024, 12:57 a.m. UTC | #10
Christoph,

> The checksum type?  How is that compatible with nvme?

NVMe does not support IP checksum.
Christoph Hellwig June 14, 2024, 3:16 a.m. UTC | #11
On Thu, Jun 13, 2024 at 08:57:26PM -0400, Martin K. Petersen wrote:
> 
> Christoph,
> 
> > The checksum type?  How is that compatible with nvme?
> 
> NVMe does not support IP checksum.

Yes, I know.  So how do you make an interface work where you can
arbitrarily force it?
diff mbox series

Patch

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index af7f71d16114de..c69da65759af89 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -488,9 +488,6 @@  bool bio_integrity_prep(struct bio *bio)
 	bip->bip_flags |= BIP_BLOCK_INTEGRITY;
 	bip_set_seed(bip, bio->bi_iter.bi_sector);
 
-	if (bi->flags & BLK_INTEGRITY_IP_CHECKSUM)
-		bip->bip_flags |= BIP_IP_CHECKSUM;
-
 	/* Map it */
 	offset = offset_in_page(buf);
 	for (i = 0; i < nr_pages && len > 0; i++) {
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index b477383ccc3b2a..e21b7df5c31b0d 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -43,7 +43,7 @@ 
 #include <linux/idr.h>
 #include <linux/interrupt.h>
 #include <linux/init.h>
-#include <linux/blkdev.h>
+#include <linux/blk-integrity.h>
 #include <linux/blkpg.h>
 #include <linux/blk-pm.h>
 #include <linux/delay.h>
@@ -799,12 +799,12 @@  static unsigned char sd_setup_protect_cmnd(struct scsi_cmnd *scmd,
 					   unsigned int dix, unsigned int dif)
 {
 	struct request *rq = scsi_cmd_to_rq(scmd);
-	struct bio *bio = rq->bio;
+	struct blk_integrity *bi = &rq->q->integrity;
 	unsigned int prot_op = sd_prot_op(rq_data_dir(rq), dix, dif);
 	unsigned int protect = 0;
 
 	if (dix) {				/* DIX Type 0, 1, 2, 3 */
-		if (bio_integrity_flagged(bio, BIP_IP_CHECKSUM))
+		if (bi->flags & BLK_INTEGRITY_IP_CHECKSUM)
 			scmd->prot_flags |= SCSI_PROT_IP_CHECKSUM;
 		scmd->prot_flags |= SCSI_PROT_GUARD_CHECK;
 	}
diff --git a/include/linux/bio.h b/include/linux/bio.h
index ec5dcf8635ac66..3295dd6021659b 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -324,9 +324,8 @@  static inline void bio_next_folio(struct folio_iter *fi, struct bio *bio)
 enum bip_flags {
 	BIP_BLOCK_INTEGRITY	= 1 << 0, /* block layer owns integrity data */
 	BIP_MAPPED_INTEGRITY	= 1 << 1, /* ref tag has been remapped */
-	BIP_IP_CHECKSUM		= 1 << 4, /* IP checksum */
-	BIP_INTEGRITY_USER	= 1 << 5, /* Integrity payload is user address */
-	BIP_COPY_USER		= 1 << 6, /* Kernel bounce buffer in use */
+	BIP_INTEGRITY_USER	= 1 << 2, /* Integrity payload is user address */
+	BIP_COPY_USER		= 1 << 3, /* Kernel bounce buffer in use */
 };
 
 /*