Message ID | 20150213092432.GC11034@birch.djwong.org (mailing list archive)
---|---
State | Superseded, archived
Delegated to: | Mike Snitzer
On Fri, Feb 13 2015 at 4:24am -0500,
Darrick J. Wong <darrick.wong@oracle.com> wrote:

> I created a dm-raid1 device backed by a device that supports DISCARD
> and another device that does NOT support DISCARD with the following
> dm configuration:
>
> # echo '0 2048 mirror core 1 512 2 /dev/sda 0 /dev/sdb 0' | dmsetup create moo
> # lsblk -D
> NAME         DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
> sda                 0        4K       1G         0
> `-moo (dm-0)        0        4K       1G         0
> sdb                 0        0B       0B         0
> `-moo (dm-0)        0        4K       1G         0
>
> Notice that the mirror device /dev/mapper/moo advertises DISCARD
> support even though one of the mirror halves doesn't.
>
> If I issue a DISCARD request (via fstrim, mount -o discard, or ioctl
> BLKDISCARD) through the mirror, kmirrord gets stuck in an infinite
> loop in do_region() when it tries to issue a DISCARD request to sdb.
> The problem is that when we call do_region() against sdb, num_sectors
> is set to zero because q->limits.max_discard_sectors is zero.
> Therefore, "remaining" never decreases and the loop never terminates.
>
> Before entering the loop, check for the combination of REQ_DISCARD and
> no discard support and return -EOPNOTSUPP to avoid hanging up the
> mirror device. Fix the same problem with WRITE SAME while we're at it.
>
> This bug was found by the unfortunate coincidence of pvmove and a
> discard operation in the RHEL 6.5 kernel; 3.19 is also affected.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
> Cc: Srinivas Eeda <srinivas.eeda@oracle.com>

Your patch looks fine but it is laser focused on dm-io. Again, that is
fine (fixes a real problem). But I'm wondering how other targets will
respond in the face of partial discard support across the logical
address space of the DM device.

When I implemented dm_table_supports_discards() I consciously allowed a
DM table to contain a mix of discard support. I'm now wondering where
it is we benefit from that? Seems like more of a liability than
anything -- so a bigger hammer approach to fixing this would be to
require all targets and all devices in a DM table support discard.
Which amounts to changing dm_table_supports_discards() to be like
dm_table_supports_write_same().

BTW, given dm_table_supports_write_same(), your patch shouldn't need to
worry about WRITE SAME. Did you experience issues with WRITE SAME too
or were you just being proactive?

Mike
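For comparison, here is a rough sketch of what that bigger-hammer check might look like, modeled on dm_table_supports_write_same(). This is illustrative only, not the current upstream code, and the helper name device_not_discard_capable() is made up for the example:

/*
 * Illustrative sketch: require every target and every underlying device
 * to support discard, mirroring how dm_table_supports_write_same() works.
 */
static int device_not_discard_capable(struct dm_target *ti, struct dm_dev *dev,
				      sector_t start, sector_t len, void *data)
{
	struct request_queue *q = bdev_get_queue(dev->bdev);

	return q && !blk_queue_discard(q);
}

static bool dm_table_supports_discards(struct dm_table *t)
{
	struct dm_target *ti;
	unsigned i = 0;

	while (i < dm_table_get_num_targets(t)) {
		ti = dm_table_get_target(t, i++);

		if (!ti->num_discard_bios)
			return false;

		if (!ti->type->iterate_devices ||
		    ti->type->iterate_devices(ti, device_not_discard_capable, NULL))
			return false;
	}

	return true;
}

With a check along these lines, a table like the sda/sdb mirror above would simply stop advertising discard instead of passing discards to a device that cannot handle them.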
On Fri, Feb 13, 2015 at 10:55:50AM -0500, Mike Snitzer wrote:
> On Fri, Feb 13 2015 at 4:24am -0500,
> Darrick J. Wong <darrick.wong@oracle.com> wrote:
>
> > I created a dm-raid1 device backed by a device that supports DISCARD
> > and another device that does NOT support DISCARD with the following
> > dm configuration:
> >
> > # echo '0 2048 mirror core 1 512 2 /dev/sda 0 /dev/sdb 0' | dmsetup create moo
> > # lsblk -D
> > NAME         DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
> > sda                 0        4K       1G         0
> > `-moo (dm-0)        0        4K       1G         0
> > sdb                 0        0B       0B         0
> > `-moo (dm-0)        0        4K       1G         0
> >
> > Notice that the mirror device /dev/mapper/moo advertises DISCARD
> > support even though one of the mirror halves doesn't.
> >
> > If I issue a DISCARD request (via fstrim, mount -o discard, or ioctl
> > BLKDISCARD) through the mirror, kmirrord gets stuck in an infinite
> > loop in do_region() when it tries to issue a DISCARD request to sdb.
> > The problem is that when we call do_region() against sdb, num_sectors
> > is set to zero because q->limits.max_discard_sectors is zero.
> > Therefore, "remaining" never decreases and the loop never terminates.
> >
> > Before entering the loop, check for the combination of REQ_DISCARD and
> > no discard support and return -EOPNOTSUPP to avoid hanging up the
> > mirror device. Fix the same problem with WRITE SAME while we're at it.
> >
> > This bug was found by the unfortunate coincidence of pvmove and a
> > discard operation in the RHEL 6.5 kernel; 3.19 is also affected.
> >
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
> > Cc: Srinivas Eeda <srinivas.eeda@oracle.com>
>
> Your patch looks fine but it is laser focused on dm-io. Again, that is
> fine (fixes a real problem). But I'm wondering how other targets will
> respond in the face of partial discard support across the logical
> address space of the DM device.

I'm not sure. As far as I could tell, the simple dm targets seem happy
enough to pass the -EOPNOTSUPP code along to whoever gave them the bio.
I didn't look at complicated things like dm-thin*, dm-cache, or
dm-raid456. It might be worth having a regression test that can be
applied to all the dm* devices in rapid succession so we can find out.

Of the callers that I audited, ext4, xfs, btrfs, and f2fs silently
ignore the EOPNOTSUPP code. jfs and swapfiles printk about the error
but take no other action. BLKDISCARD (the ioctl) simply returns the
error code to userspace. gfs2 and nilfs2 will disable discards; and
fat doesn't check error codes at all. None of them panic or otherwise
go bonkers. It seems that most FSes expect transient discard failures
and/or partial support.

> When I implemented dm_table_supports_discards() I consciously allowed a
> DM table to contain a mix of discard support. I'm now wondering where
> it is we benefit from that? Seems like more of a liability than

<shrug> Given the result above, I think we can leave it this way if we
perform a code audit, though certainly this cautious approach would
mask any other bugs that could be lurking.

> anything -- so a bigger hammer approach to fixing this would be to
> require all targets and all devices in a DM table support discard.
> Which amounts to changing dm_table_supports_discards() to be like
> dm_table_supports_write_same().
>
> BTW, given dm_table_supports_write_same(), your patch shouldn't need to
> worry about WRITE SAME. Did you experience issues with WRITE SAME too
> or were you just being proactive?

I was being (perhaps overly) proactive, at 1am. :) That part of the if
statement can go away. I'll send a v2.

--D

> Mike
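To illustrate the caller-side behavior described above, the common in-kernel pattern looks roughly like the following. This is a sketch, not a verbatim copy of any one filesystem, and example_trim_range() is an invented name:

/*
 * Sketch of the usual in-kernel caller pattern: discard is treated as a
 * hint, so -EOPNOTSUPP is dropped rather than reported as an I/O error.
 */
static int example_trim_range(struct block_device *bdev, sector_t start,
			      sector_t nr_sects)
{
	int err = blkdev_issue_discard(bdev, start, nr_sects, GFP_NOFS, 0);

	if (err == -EOPNOTSUPP)
		err = 0;	/* device (or part of a stack) can't discard; ignore */

	return err;
}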
>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
Mike> When I implemented dm_table_supports_discards() I consciously
Mike> allowed a DM table to contain a mix of discard support. I'm now
Mike> wondering where it is we benefit from that? Seems like more of a
Mike> liability than anything -- so a bigger hammer approach to fixing
Mike> this would be to require all targets and all devices in a DM table
Mike> support discard.
I think our original rationale was that since discard is only a hint it
would be fine to mix and match. And at the time there seemed to be value
in supporting heterogeneous setups with, say, a disk drive and an SSD.
Back then the SSD vendors were all busy telling us how crucial discard
would be going forward. However, that turned out not to be the case and
discard often causes more problems than it solves. So I'm perfectly OK
with requiring all devices in a table to have the same capabilities. In
many ways I think that's a cleaner approach.
On Fri, Feb 13, 2015 at 02:21:01PM -0500, Martin K. Petersen wrote:
> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
>
> Mike> When I implemented dm_table_supports_discards() I consciously
> Mike> allowed a DM table to contain a mix of discard support. I'm now
> Mike> wondering where it is we benefit from that? Seems like more of a
> Mike> liability than anything -- so a bigger hammer approach to fixing
> Mike> this would be to require all targets and all devices in a DM table
> Mike> support discard.
>
> I think our original rationale was that since discard is only a hint it
> would be fine to mix and match. And at the time there seemed to be value
> in supporting heterogeneous setups with, say, a disk drive and an SSD.
>
> Back then the SSD vendors were all busy telling us how crucial discard
> would be going forward. However, that turned out not to be the case and
> discard often causes more problems than it solves. So I'm perfectly OK
> with requiring all devices in a table to have the same capabilities. In
> many ways I think that's a cleaner approach.

I agree that imposing that extra requirement would be cleaner from a
software engineering POV.

That said, given that discard is advisory and most of the callers seem
to anticipate that it might not work, why not allow heterogeneous
mixes? It seems unfortunate to remove that ability just because there
are kernel bugs. If you're implementing thin provisioning and are
unmapping storage when discard requests come through, wouldn't it be an
antifeature if that suddenly stopped just because something got in the
way? fstrim ought to be able to talk to the parts of the compound
device that can be trimmed.

Second question: can a dm device detect that
q->limits.max_discard_sectors has changed in one of the devices it's
sitting on? Say I have this:

X -> Y -> SSD1
 \-> Z -> SSD2

SSD*, Z, Y, and X all advertise discard. Then I change Y to point to a
spinny disk:

X -> Y -> HDD
 \-> Z -> SSD2

Even if Y notices that it no longer supports discard, will X figure
that out too?

--D

> --
> Martin K. Petersen      Oracle Linux Engineering
On Fri, Feb 13 2015 at 3:07pm -0500,
Darrick J. Wong <darrick.wong@oracle.com> wrote:

> On Fri, Feb 13, 2015 at 02:21:01PM -0500, Martin K. Petersen wrote:
> > >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
> >
> > Mike> When I implemented dm_table_supports_discards() I consciously
> > Mike> allowed a DM table to contain a mix of discard support. I'm now
> > Mike> wondering where it is we benefit from that? Seems like more of a
> > Mike> liability than anything -- so a bigger hammer approach to fixing
> > Mike> this would be to require all targets and all devices in a DM table
> > Mike> support discard.
> >
> > I think our original rationale was that since discard is only a hint it
> > would be fine to mix and match. And at the time there seemed to be value
> > in supporting heterogeneous setups with, say, a disk drive and an SSD.
> >
> > Back then the SSD vendors were all busy telling us how crucial discard
> > would be going forward. However, that turned out not to be the case and
> > discard often causes more problems than it solves. So I'm perfectly OK
> > with requiring all devices in a table to have the same capabilities. In
> > many ways I think that's a cleaner approach.
>
> I agree that imposing that extra requirement would be cleaner from a
> software engineering POV.
>
> That said, given that discard is advisory and most of the callers seem
> to anticipate that it might not work, why not allow heterogeneous
> mixes? It seems unfortunate to remove that ability just because there
> are kernel bugs. If you're implementing thin provisioning and are
> unmapping storage when discard requests come through, wouldn't it be an
> antifeature if that suddenly stopped just because something got in the
> way? fstrim ought to be able to talk to the parts of the compound
> device that can be trimmed.

Yeah, I'm OK with leaving it as is for now (allowing a mix). So I'll be
picking your dm-io patch up for 3.20-rc. But if other similar bugs
manifest then we can revisit disallowing a mix.

As for your question on thin provisioning: the DM thinp target
advertises support for discards even if the underlying data device
doesn't (unless the user explicitly asked for discards to be disabled
on the thin-pool). This is accomplished with the target's
'discards_supported' override. So even if we did make the change to
disallow a mix, the 'discards_supported' override would still enable
discards.

> Second question: can a dm device detect that
> q->limits.max_discard_sectors has changed in one of the devices it's
> sitting on? Say I have this:
>
> X -> Y -> SSD1
>  \-> Z -> SSD2
>
> SSD*, Z, Y, and X all advertise discard. Then I change Y to point to a
> spinny disk:
>
> X -> Y -> HDD
>  \-> Z -> SSD2
>
> Even if Y notices that it no longer supports discard, will X figure
> that out too?

No, the DM table reload for Y would alter the discard capability of Y
but it will _not_ bubble up to X.

(It would be nice if such limit stacking were refreshed on reload,
but...)
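As a rough illustration of that override (a simplified paraphrase, not the literal dm-thin constructor, and the helper name is invented): a target can declare discard support on itself, and the table capability check honors that independently of what the underlying data device reports.

/*
 * Simplified sketch: a target constructor can assert discard support on
 * the target itself; dm-thin does roughly this when the pool has discards
 * enabled, because thinp discards unmap pool blocks regardless of the
 * data device's own discard capability.
 */
static void example_thin_enable_discards(struct dm_target *ti)
{
	ti->num_discard_bios = 1;
	ti->discards_supported = true;
}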
>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
Mike> Yeah, I'm OK with leaving it as is for now (allowing a mix).
That's fine with me as long as you're comfortable that we're handling
the corner cases correctly.
Mike> So I'll be picking your dm-io patch up for 3.20-rc.
Should also go to stable.
Mike> No, the DM table reload for Y would alter the discard capability
Mike> of Y but it will _not_ bubble up to X.
Mike> (It would be nice if such limit stacking were refreshed on reload,
Mike> but...)
Yeah. We keep talking about this. Maybe we can have a quick brainstorm
at LSF/MM?
On Fri, 13 Feb 2015, Mike Snitzer wrote:

> On Fri, Feb 13 2015 at 4:24am -0500,
> Darrick J. Wong <darrick.wong@oracle.com> wrote:
>
> > I created a dm-raid1 device backed by a device that supports DISCARD
> > and another device that does NOT support DISCARD with the following
> > dm configuration:
> >
> > # echo '0 2048 mirror core 1 512 2 /dev/sda 0 /dev/sdb 0' | dmsetup create moo
> > # lsblk -D
> > NAME         DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
> > sda                 0        4K       1G         0
> > `-moo (dm-0)        0        4K       1G         0
> > sdb                 0        0B       0B         0
> > `-moo (dm-0)        0        4K       1G         0
> >
> > Notice that the mirror device /dev/mapper/moo advertises DISCARD
> > support even though one of the mirror halves doesn't.
> >
> > If I issue a DISCARD request (via fstrim, mount -o discard, or ioctl
> > BLKDISCARD) through the mirror, kmirrord gets stuck in an infinite
> > loop in do_region() when it tries to issue a DISCARD request to sdb.
> > The problem is that when we call do_region() against sdb, num_sectors
> > is set to zero because q->limits.max_discard_sectors is zero.
> > Therefore, "remaining" never decreases and the loop never terminates.
> >
> > Before entering the loop, check for the combination of REQ_DISCARD and
> > no discard support and return -EOPNOTSUPP to avoid hanging up the
> > mirror device. Fix the same problem with WRITE SAME while we're at it.
> >
> > This bug was found by the unfortunate coincidence of pvmove and a
> > discard operation in the RHEL 6.5 kernel; 3.19 is also affected.
> >
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
> > Cc: Srinivas Eeda <srinivas.eeda@oracle.com>
>
> Your patch looks fine but it is laser focused on dm-io. Again, that is
> fine (fixes a real problem). But I'm wondering how other targets will
> respond in the face of partial discard support across the logical
> address space of the DM device.
>
> When I implemented dm_table_supports_discards() I consciously allowed a
> DM table to contain a mix of discard support. I'm now wondering where
> it is we benefit from that? Seems like more of a liability than
> anything -- so a bigger hammer approach to fixing this would be to
> require all targets and all devices in a DM table support discard.
> Which amounts to changing dm_table_supports_discards() to be like
> dm_table_supports_write_same().
>
> BTW, given dm_table_supports_write_same(), your patch shouldn't need to
> worry about WRITE SAME. Did you experience issues with WRITE SAME too
> or were you just being proactive?
>
> Mike

I think that Darrick's patch is needed even for WRITE SAME.

Note that queue limits and flags can't reliably prevent bios from
coming in. For example:

1. Some piece of code tests queue flags and sees that
max_write_same_sectors is non-zero, it constructs a WRITE_SAME bio and
sends it with submit_bio.

2. Meanwhile, the device is reconfigured so that it doesn't support
WRITE_SAME. q->limits.max_write_same_sectors is set to zero.

3. The bio submitted at step 1 can't be reverted, so it arrives at the
device mapper even if it advertises that it doesn't support write same -
now, it causes the lockup that Darrick observed.

Another problem is that queue flags are not propagated up when you reload
a single device - someone could reload a mirror leg with a different dm
table that doesn't support write_same, and even after the reload, the
mirror keeps advertising that it does support WRITE_SAME.

Mikulas
On Thu, 26 Feb 2015, Mikulas Patocka wrote:

> On Fri, 13 Feb 2015, Mike Snitzer wrote:
>
> > On Fri, Feb 13 2015 at 4:24am -0500,
> > Darrick J. Wong <darrick.wong@oracle.com> wrote:
> >
> > > I created a dm-raid1 device backed by a device that supports DISCARD
> > > and another device that does NOT support DISCARD with the following
> > > dm configuration:
> > >
> > > # echo '0 2048 mirror core 1 512 2 /dev/sda 0 /dev/sdb 0' | dmsetup create moo
> > > # lsblk -D
> > > NAME         DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
> > > sda                 0        4K       1G         0
> > > `-moo (dm-0)        0        4K       1G         0
> > > sdb                 0        0B       0B         0
> > > `-moo (dm-0)        0        4K       1G         0
> > >
> > > Notice that the mirror device /dev/mapper/moo advertises DISCARD
> > > support even though one of the mirror halves doesn't.
> > >
> > > If I issue a DISCARD request (via fstrim, mount -o discard, or ioctl
> > > BLKDISCARD) through the mirror, kmirrord gets stuck in an infinite
> > > loop in do_region() when it tries to issue a DISCARD request to sdb.
> > > The problem is that when we call do_region() against sdb, num_sectors
> > > is set to zero because q->limits.max_discard_sectors is zero.
> > > Therefore, "remaining" never decreases and the loop never terminates.
> > >
> > > Before entering the loop, check for the combination of REQ_DISCARD and
> > > no discard support and return -EOPNOTSUPP to avoid hanging up the
> > > mirror device. Fix the same problem with WRITE SAME while we're at it.
> > >
> > > This bug was found by the unfortunate coincidence of pvmove and a
> > > discard operation in the RHEL 6.5 kernel; 3.19 is also affected.
> > >
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
> > > Cc: Srinivas Eeda <srinivas.eeda@oracle.com>
> >
> > Your patch looks fine but it is laser focused on dm-io. Again, that is
> > fine (fixes a real problem). But I'm wondering how other targets will
> > respond in the face of partial discard support across the logical
> > address space of the DM device.
> >
> > When I implemented dm_table_supports_discards() I consciously allowed a
> > DM table to contain a mix of discard support. I'm now wondering where
> > it is we benefit from that? Seems like more of a liability than
> > anything -- so a bigger hammer approach to fixing this would be to
> > require all targets and all devices in a DM table support discard.
> > Which amounts to changing dm_table_supports_discards() to be like
> > dm_table_supports_write_same().
> >
> > BTW, given dm_table_supports_write_same(), your patch shouldn't need to
> > worry about WRITE SAME. Did you experience issues with WRITE SAME too
> > or were you just being proactive?
> >
> > Mike
>
> I think that Darrick's patch is needed even for WRITE SAME.
>
> Note that queue limits and flags can't reliably prevent bios from
> coming in. For example:
>
> 1. Some piece of code tests queue flags and sees that
> max_write_same_sectors is non-zero, it constructs a WRITE_SAME bio and
> sends it with submit_bio.
>
> 2. Meanwhile, the device is reconfigured so that it doesn't support
> WRITE_SAME. q->limits.max_write_same_sectors is set to zero.
>
> 3. The bio submitted at step 1 can't be reverted, so it arrives at the
> device mapper even if it advertises that it doesn't support write same -
> now, it causes the lockup that Darrick observed.
>
> Another problem is that queue flags are not propagated up when you reload
> a single device - someone could reload a mirror leg with a different dm
> table that doesn't support write_same, and even after the reload, the
> mirror keeps advertising that it does support WRITE_SAME.

This leads to another idea - if the limits change while the do-while loop
is in progress, even Darrick's original patch is wrong and fails to
prevent the lockup. So we need to read the limits in advance, test them,
and never re-read them.

Mikulas
On Thu, Feb 26, 2015 at 02:56:20PM -0500, Mikulas Patocka wrote:
> On Fri, 13 Feb 2015, Mike Snitzer wrote:
>
> > On Fri, Feb 13 2015 at 4:24am -0500,
> > Darrick J. Wong <darrick.wong@oracle.com> wrote:
> >
> > > I created a dm-raid1 device backed by a device that supports DISCARD
> > > and another device that does NOT support DISCARD with the following
> > > dm configuration:
> > >
> > > # echo '0 2048 mirror core 1 512 2 /dev/sda 0 /dev/sdb 0' | dmsetup create moo
> > > # lsblk -D
> > > NAME         DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
> > > sda                 0        4K       1G         0
> > > `-moo (dm-0)        0        4K       1G         0
> > > sdb                 0        0B       0B         0
> > > `-moo (dm-0)        0        4K       1G         0
> > >
> > > Notice that the mirror device /dev/mapper/moo advertises DISCARD
> > > support even though one of the mirror halves doesn't.
> > >
> > > If I issue a DISCARD request (via fstrim, mount -o discard, or ioctl
> > > BLKDISCARD) through the mirror, kmirrord gets stuck in an infinite
> > > loop in do_region() when it tries to issue a DISCARD request to sdb.
> > > The problem is that when we call do_region() against sdb, num_sectors
> > > is set to zero because q->limits.max_discard_sectors is zero.
> > > Therefore, "remaining" never decreases and the loop never terminates.
> > >
> > > Before entering the loop, check for the combination of REQ_DISCARD and
> > > no discard support and return -EOPNOTSUPP to avoid hanging up the
> > > mirror device. Fix the same problem with WRITE SAME while we're at it.
> > >
> > > This bug was found by the unfortunate coincidence of pvmove and a
> > > discard operation in the RHEL 6.5 kernel; 3.19 is also affected.
> > >
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
> > > Cc: Srinivas Eeda <srinivas.eeda@oracle.com>
> >
> > Your patch looks fine but it is laser focused on dm-io. Again, that is
> > fine (fixes a real problem). But I'm wondering how other targets will
> > respond in the face of partial discard support across the logical
> > address space of the DM device.
> >
> > When I implemented dm_table_supports_discards() I consciously allowed a
> > DM table to contain a mix of discard support. I'm now wondering where
> > it is we benefit from that? Seems like more of a liability than
> > anything -- so a bigger hammer approach to fixing this would be to
> > require all targets and all devices in a DM table support discard.
> > Which amounts to changing dm_table_supports_discards() to be like
> > dm_table_supports_write_same().
> >
> > BTW, given dm_table_supports_write_same(), your patch shouldn't need to
> > worry about WRITE SAME. Did you experience issues with WRITE SAME too
> > or were you just being proactive?
> >
> > Mike
>
> I think that Darrick's patch is needed even for WRITE SAME.
>
> Note that queue limits and flags can't reliably prevent bios from
> coming in. For example:
>
> 1. Some piece of code tests queue flags and sees that
> max_write_same_sectors is non-zero, it constructs a WRITE_SAME bio and
> sends it with submit_bio.
>
> 2. Meanwhile, the device is reconfigured so that it doesn't support
> WRITE_SAME. q->limits.max_write_same_sectors is set to zero.
>
> 3. The bio submitted at step 1 can't be reverted, so it arrives at the
> device mapper even if it advertises that it doesn't support write same -
> now, it causes the lockup that Darrick observed.

I'd pondered patching the WRITE SAME case too.

> Another problem is that queue flags are not propagated up when you reload
> a single device - someone could reload a mirror leg with a different dm
> table that doesn't support write_same, and even after the reload, the
> mirror keeps advertising that it does support WRITE_SAME.

Not sure how to deal with that -- I suppose we could save the 'last
recorded q limits' and watch for changes, but that depends on how
resilient callers are against DISCARD and WRITE SAME returning
-EOPNOTSUPP. So far as I can tell the in-kernel callers handle it
gracefully enough, and I'd hope that any sane userland program would
know to follow up with a regular write (if applicable), but who knows...

> This leads to another idea - if the limits change while the do-while loop
> is in progress, even Darrick's original patch is wrong and fails to
> prevent the lockup. So we need to read the limits in advance, test them,
> and never re-read them.

Agreed. I /think/ issuing discard/write same to a device that no longer
supports it will return an error... or at least in my crummy 5-minute
test it seemed to work.

--D

> Mikulas
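To make the "read the limits in advance" idea concrete, here is an illustrative fragment of how do_region() might be restructured. It is an untested sketch, not a proposed patch; special_cmd_max_sectors is a made-up local name, and rw, where, q, io, num_sectors and remaining are do_region()'s existing parameters and locals:

	/*
	 * Sample the discard/write-same limit a single time before the
	 * loop.  If it is zero, fail the region up front; otherwise use
	 * the cached value so a concurrent table reload cannot zero it
	 * mid-loop and wedge the do-while loop.
	 */
	unsigned int special_cmd_max_sectors = 0;

	if (rw & REQ_DISCARD)
		special_cmd_max_sectors = q->limits.max_discard_sectors;
	else if (rw & REQ_WRITE_SAME)
		special_cmd_max_sectors = q->limits.max_write_same_sectors;

	if ((rw & (REQ_DISCARD | REQ_WRITE_SAME)) && !special_cmd_max_sectors) {
		dec_count(io, region, -EOPNOTSUPP);
		return;
	}

	/* ...and later, inside the do-while loop: */
	num_sectors = min_t(sector_t, special_cmd_max_sectors, remaining);
	remaining -= num_sectors;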
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index c09359d..32d330a 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -290,6 +290,13 @@ static void do_region(int rw, unsigned region, struct dm_io_region *where,
 	unsigned short logical_block_size = queue_logical_block_size(q);
 	sector_t num_sectors;
 
+	/* Reject unsupported discard/write same requests */
+	if (((rw & REQ_DISCARD) && !blk_queue_discard(q)) ||
+	    ((rw & REQ_WRITE_SAME) && !bdev_write_same(where->bdev))) {
+		dec_count(io, region, -EOPNOTSUPP);
+		return;
+	}
+
 	/*
 	 * where->count may be zero if rw holds a flush and we need to
 	 * send a zero-sized flush.
I created a dm-raid1 device backed by a device that supports DISCARD
and another device that does NOT support DISCARD with the following
dm configuration:

# echo '0 2048 mirror core 1 512 2 /dev/sda 0 /dev/sdb 0' | dmsetup create moo
# lsblk -D
NAME         DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sda                 0        4K       1G         0
`-moo (dm-0)        0        4K       1G         0
sdb                 0        0B       0B         0
`-moo (dm-0)        0        4K       1G         0

Notice that the mirror device /dev/mapper/moo advertises DISCARD
support even though one of the mirror halves doesn't.

If I issue a DISCARD request (via fstrim, mount -o discard, or ioctl
BLKDISCARD) through the mirror, kmirrord gets stuck in an infinite
loop in do_region() when it tries to issue a DISCARD request to sdb.
The problem is that when we call do_region() against sdb, num_sectors
is set to zero because q->limits.max_discard_sectors is zero.
Therefore, "remaining" never decreases and the loop never terminates.

Before entering the loop, check for the combination of REQ_DISCARD and
no discard support and return -EOPNOTSUPP to avoid hanging up the
mirror device. Fix the same problem with WRITE SAME while we're at it.

This bug was found by the unfortunate coincidence of pvmove and a
discard operation in the RHEL 6.5 kernel; 3.19 is also affected.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Srinivas Eeda <srinivas.eeda@oracle.com>
---
 drivers/md/dm-io.c | 7 +++++++
 1 file changed, 7 insertions(+)
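For context, the pre-patch discard path in do_region() boils down to roughly the following. This is a simplified paraphrase rather than the literal upstream code (rw, where, q and io are do_region()'s own parameters and locals), and it shows why the loop spins when max_discard_sectors is zero:

	/*
	 * Paraphrased sketch of the pre-patch discard path: with
	 * q->limits.max_discard_sectors == 0, num_sectors is 0, so
	 * remaining never shrinks and the loop never terminates.
	 */
	sector_t remaining = where->count;
	sector_t num_sectors;

	do {
		struct bio *bio = bio_alloc_bioset(GFP_NOIO, 1, io->client->bios);

		bio->bi_iter.bi_sector = where->sector + (where->count - remaining);
		bio->bi_bdev = where->bdev;

		num_sectors = min_t(sector_t, q->limits.max_discard_sectors,
				    remaining);			/* == 0 for sdb */
		bio->bi_iter.bi_size = num_sectors << SECTOR_SHIFT;	/* zero-length bio */
		remaining -= num_sectors;				/* remaining -= 0 */

		atomic_inc(&io->count);
		submit_bio(rw, bio);
	} while (remaining);	/* never becomes false */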