diff mbox

blk_integrity_revalidate() clears BDI_CAP_STABLE_WRITES

Message ID CAOi1vP-amFyFHbPV3FhG3F1Tp+df6V7a4mcP8LLWY0GZx+JsFw@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Ilya Dryomov Feb. 22, 2017, 2:41 p.m. UTC
On Wed, Feb 22, 2017 at 5:41 AM, Martin K. Petersen
<martin.petersen@oracle.com> wrote:
>>>>>> "Ilya" == Ilya Dryomov <idryomov@gmail.com> writes:
>
> Ilya,
>
> Ilya> could you please explain blk_integrity_revalidate() and
> Ilya> its GENHD_FL_UP check in particular?  We have the queue,
> Ilya> bi->profile can't be NULL after blk_integrity_register(), and
> Ilya> since the latter "must" be used for registering the profile with
> Ilya> the block layer, wouldn't the following be sufficient for
> Ilya> blk_integrity users?
>
> IIrc, the FL_UP check fixed a registration problem in the nvme driver.

Did it have something to do with a NULL disk->queue?

>
> The rationale behind revalidate was that we need to handle devices which
> lose the integrity capability at runtime (i.e. a integrity-enabled DM
> device is extended with a non-cable drive forcing the feature to be
> turned off). The clearing of the integrity profile is more important in
> that case than zapping the stable pages flag. But that was the original
> reason for not just ORing BDI_CAP_STABLE_WRITES.
>
> I don't have a huge problem with keeping stable pages on if a device
> suddenly stops being integrity capable. However, I'd like to understand
> your use case a bit better.

Well, blk_integrity_revalidate() doesn't clear the profile, it just
clears the stable pages flag.  Whoever calls blk_integrity_unregister()
to clear the profile can also clear the stable pages flag -- why not
let blk_integrity_unregister() clear the flag like I suggested?

>
> Ilya> The alternative seems to be to set up a bogus
> Ilya> blk_integrity_profile (nop_profile won't do -- this one would have
> Ilya> to be truly bogus w/ NULL *_fn) under BLK_DEV_INTEGRITY ifdefs and
> Ilya> hope that nothing breaks.
>
> Can you point me to the relevant code on your end?

This code doesn't exist yet -- it's just something I thought I'd have
to do if my patch or something along those lines isn't accepted.  There
is the nop_profile with stub *_fn callbacks, which is used by nvme and
nvdimm to make bio_integrity_enabled() return true, so that per-interval
buffers are allocated in bio_integrity_prep().  "I want some space for
per-interval metadata, but no integrity checksums" is use case there.

In the rbd case, we don't want that.  We just want to set the stable
pages flag and make sure it's not reset by blk_integrity code.  So, if
blk_integrity_revalidate() isn't fixed, rbd will need to set up a bogus
profile with NULL *_fn callbacks to make bio_integrity_enabled() return
false.  This is obviously fragile, because blk_get_integrity() will no
longer return NULL...

We are setting BDI_CAP_STABLE_WRITES in rbd_init_disk(), see commit
bae818ee1577 ("rbd: require stable pages if message data CRCs are
enabled").  The CRCs are generated in write_partial_message_data() and
verified in read_partial_msg_data().

I'm attaching the patch I had in mind.

Thanks,

                Ilya

Comments

Martin K. Petersen Feb. 23, 2017, 11:49 p.m. UTC | #1
>>>>> "Ilya" == Ilya Dryomov <idryomov@gmail.com> writes:

Ilya,

Ilya> Well, blk_integrity_revalidate() doesn't clear the profile, it
Ilya> just clears the stable pages flag.  Whoever calls
Ilya> blk_integrity_unregister() to clear the profile can also clear the
Ilya> stable pages flag -- why not let blk_integrity_unregister() clear
Ilya> the flag like I suggested?

That's what it used to do.

blk_integrity_revalidate() was obviously introduced to overcome some
problem. Unfortunately, I can't recall what that was and Google isn't
being particularly helpful. I suspect it was either in the NVDIMM or
NVMe camps since that's where the churn was.

I don't have a problem with your patch as long as we're sure there are
no regressions. I would carry the gendisk check over, though.
Ilya Dryomov Feb. 28, 2017, 8:19 a.m. UTC | #2
On Fri, Feb 24, 2017 at 12:49 AM, Martin K. Petersen
<martin.petersen@oracle.com> wrote:
>>>>>> "Ilya" == Ilya Dryomov <idryomov@gmail.com> writes:
>
> Ilya,
>
> Ilya> Well, blk_integrity_revalidate() doesn't clear the profile, it
> Ilya> just clears the stable pages flag.  Whoever calls
> Ilya> blk_integrity_unregister() to clear the profile can also clear the
> Ilya> stable pages flag -- why not let blk_integrity_unregister() clear
> Ilya> the flag like I suggested?
>
> That's what it used to do.
>
> blk_integrity_revalidate() was obviously introduced to overcome some
> problem. Unfortunately, I can't recall what that was and Google isn't
> being particularly helpful. I suspect it was either in the NVDIMM or
> NVMe camps since that's where the churn was.

Adding more people from 25520d55cdb6 -- does anyone remember the story
behind blk_integrity_revalidate()?

>
> I don't have a problem with your patch as long as we're sure there are
> no regressions. I would carry the gendisk check over, though.

I spent quite some time trying to do basic tests on nvme, since it uses
both nop_profile and real integrity profiles.  Unfortunately I don't
have access to an nvme card with separate-metadata capability and
upstream qemu driver doesn't support any metadata capabilities at all,
so I turned to https://github.com/OpenChannelSSD/qemu-nvme.  It lets
you configure mc and multiple lbafs, but

    commit ff646a5841f743fdd9cfef138ac6be1f0ba4fbfb
    Author: Keith Busch <keith.busch@intel.com> Thu Oct 23 20:02:44 2014

    As part of this, I am removing all meta-data support. It's not well
    supported in any immediatly available operating system, so the code
    is not well tested.

In the end I got it into a semi-working state by checking out an old
version and dropping some sanity checks.  I formatted the namespace in
a number of ways -- bdi/stable_pages_required and integrity/format
behaved as expected ("none", "nop", "T10-DIF-TYPE1-CRC").

Given the above, I'm not sure what the baseline is -- blk_integrity
code isn't invoked for data-only lbafs.  Could nvme folks please look
at this?  rbd regression caused by integrity revalidate change goes
back to 4.4 and I'd really like to get it fixed.  The proposed patch
is attached to my earlier reply on linux-block.

Thanks,

                Ilya
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Martin K. Petersen March 2, 2017, 3:24 a.m. UTC | #3
>>>>> "Ilya" == Ilya Dryomov <idryomov@gmail.com> writes:

Ilya,

Ilya> Given the above, I'm not sure what the baseline is --
Ilya> blk_integrity code isn't invoked for data-only lbafs.  Could nvme
Ilya> folks please look at this?  rbd regression caused by integrity
Ilya> revalidate change goes back to 4.4 and I'd really like to get it
Ilya> fixed.  The proposed patch is attached to my earlier reply on
Ilya> linux-block.

I've had a couple of fires to attend to this week. I'll try and give
your patch a spin on my FC setup tomorrow or Friday.
Ilya Dryomov March 10, 2017, 4:49 p.m. UTC | #4
On Thu, Mar 2, 2017 at 4:24 AM, Martin K. Petersen
<martin.petersen@oracle.com> wrote:
>>>>>> "Ilya" == Ilya Dryomov <idryomov@gmail.com> writes:
>
> Ilya,
>
> Ilya> Given the above, I'm not sure what the baseline is --
> Ilya> blk_integrity code isn't invoked for data-only lbafs.  Could nvme
> Ilya> folks please look at this?  rbd regression caused by integrity
> Ilya> revalidate change goes back to 4.4 and I'd really like to get it
> Ilya> fixed.  The proposed patch is attached to my earlier reply on
> Ilya> linux-block.
>
> I've had a couple of fires to attend to this week. I'll try and give
> your patch a spin on my FC setup tomorrow or Friday.

Hi Martin,

Ping...  Did you get a chance to try it?

Thanks,

                Ilya
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

From 24a0cf8bc437a65b3986e9ab25cf2f05e6bbd5d0 Mon Sep 17 00:00:00 2001
From: Ilya Dryomov <idryomov@gmail.com>
Date: Mon, 20 Feb 2017 18:50:50 +0100
Subject: [PATCH] block: get rid of blk_integrity_revalidate()

Commit 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
introduced blk_integrity_revalidate(), which clears the stable pages
bit if no blk_integrity profile is registered.  It's called from
revalidate_disk() and rescan_partitions(), making it impossible to set
BDI_CAP_STABLE_WRITES for drivers that support partitions and don't use
blk_integrity.  One such driver is rbd, where the ceph messenger is
responsible for generating/verifying CRCs.

Since blk_integrity_{un,}register() "must" be used for (un)registering
the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES
setting there.  This way drivers that call blk_integrity_register() and
use integrity infrastructure won't interfere with drivers that don't
but still want stable pages.

Fixes: 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 block/blk-integrity.c     | 19 ++-----------------
 block/partition-generic.c |  1 -
 fs/block_dev.c            |  1 -
 include/linux/genhd.h     |  2 --
 4 files changed, 2 insertions(+), 21 deletions(-)

diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index d69c5c79f98e..319f2e4f4a8b 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -417,7 +417,7 @@  void blk_integrity_register(struct gendisk *disk, struct blk_integrity *template
 	bi->tuple_size = template->tuple_size;
 	bi->tag_size = template->tag_size;
 
-	blk_integrity_revalidate(disk);
+	disk->queue->backing_dev_info.capabilities |= BDI_CAP_STABLE_WRITES;
 }
 EXPORT_SYMBOL(blk_integrity_register);
 
@@ -430,26 +430,11 @@  EXPORT_SYMBOL(blk_integrity_register);
  */
 void blk_integrity_unregister(struct gendisk *disk)
 {
-	blk_integrity_revalidate(disk);
+	disk->queue->backing_dev_info.capabilities &= ~BDI_CAP_STABLE_WRITES;
 	memset(&disk->queue->integrity, 0, sizeof(struct blk_integrity));
 }
 EXPORT_SYMBOL(blk_integrity_unregister);
 
-void blk_integrity_revalidate(struct gendisk *disk)
-{
-	struct blk_integrity *bi = &disk->queue->integrity;
-
-	if (!(disk->flags & GENHD_FL_UP))
-		return;
-
-	if (bi->profile)
-		disk->queue->backing_dev_info.capabilities |=
-			BDI_CAP_STABLE_WRITES;
-	else
-		disk->queue->backing_dev_info.capabilities &=
-			~BDI_CAP_STABLE_WRITES;
-}
-
 void blk_integrity_add(struct gendisk *disk)
 {
 	if (kobject_init_and_add(&disk->integrity_kobj, &integrity_ktype,
diff --git a/block/partition-generic.c b/block/partition-generic.c
index 7afb9907821f..0171a2faad68 100644
--- a/block/partition-generic.c
+++ b/block/partition-generic.c
@@ -497,7 +497,6 @@  int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
 
 	if (disk->fops->revalidate_disk)
 		disk->fops->revalidate_disk(disk);
-	blk_integrity_revalidate(disk);
 	check_disk_size_change(disk, bdev);
 	bdev->bd_invalidated = 0;
 	if (!get_capacity(disk) || !(state = check_partition(disk, bdev)))
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 3c47614a4b32..b94e2a4974a1 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1422,7 +1422,6 @@  int revalidate_disk(struct gendisk *disk)
 
 	if (disk->fops->revalidate_disk)
 		ret = disk->fops->revalidate_disk(disk);
-	blk_integrity_revalidate(disk);
 	bdev = bdget_disk(disk, 0);
 	if (!bdev)
 		return ret;
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 76f39754e7b0..76d6a1cd4153 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -722,11 +722,9 @@  static inline void part_nr_sects_write(struct hd_struct *part, sector_t size)
 #if defined(CONFIG_BLK_DEV_INTEGRITY)
 extern void blk_integrity_add(struct gendisk *);
 extern void blk_integrity_del(struct gendisk *);
-extern void blk_integrity_revalidate(struct gendisk *);
 #else	/* CONFIG_BLK_DEV_INTEGRITY */
 static inline void blk_integrity_add(struct gendisk *disk) { }
 static inline void blk_integrity_del(struct gendisk *disk) { }
-static inline void blk_integrity_revalidate(struct gendisk *disk) { }
 #endif	/* CONFIG_BLK_DEV_INTEGRITY */
 
 #else /* CONFIG_BLOCK */
-- 
2.4.3