Message ID | 20170626090848.21585-1-zren@suse.com (mailing list archive) |
---|---|
State | Rejected, archived |
Delegated to: | Mike Snitzer |
Headers | show |
On Mon, Jun 26, 2017 at 05:08:48PM +0800, Eric Ren wrote: > When the primary mirror device fails, activating a mirrored > LV will crash the kernel. It can be reproduced 100% with the > scripts below: > > """ > dd if=/dev/zero of=file.raw bs=1G count=1 > loopdev=$(losetup -f) > losetup $loopdev file.raw > dmsetup create pv1 --table "0 102399 linear $loopdev 0" > dmsetup create pv2 --table "0 102399 linear $loopdev 102400" > vgcreate vgtest /dev/mapper/pv1 /dev/mapper/pv2 > lvcreate -l1 --type mirror -m1 -n mirror12 --mirrorlog core \ > vgtest /dev/mapper/pv1 /dev/mapper/pv2 > vgchange -an vgtest > echo 0 10000000 error | dmsetup load /dev/mapper/pv1 > dmsetup resume /dev/mapper/pv1 > vgchange -ay vgtest > """ So as you have a reproducer, can you eventually write a blktests [1] regression test for it? Thanks, Johannes [1] https://github.com/osandov/blktests
Dne 26.6.2017 v 11:08 Eric Ren napsal(a): > When the primary mirror device fails, activating a mirrored > LV will crash the kernel. It can be reproduced 100% with the > scripts below: > > """ > dd if=/dev/zero of=file.raw bs=1G count=1 > loopdev=$(losetup -f) > losetup $loopdev file.raw > dmsetup create pv1 --table "0 102399 linear $loopdev 0" > dmsetup create pv2 --table "0 102399 linear $loopdev 102400" > vgcreate vgtest /dev/mapper/pv1 /dev/mapper/pv2 > lvcreate -l1 --type mirror -m1 -n mirror12 --mirrorlog core \ > vgtest /dev/mapper/pv1 /dev/mapper/pv2 > vgchange -an vgtest > echo 0 10000000 error | dmsetup load /dev/mapper/pv1 > dmsetup resume /dev/mapper/pv1 > vgchange -ay vgtest > " > The call trace: > """ > [ 287.008629] device-mapper: raid1: Unable to read primary mirror during recovery > [ 287.008632] device-mapper: raid1: Primary mirror (254:10) failed while out-of-sync: Reads may fail. > ... > [ 287.012480] BUG: unable to handle kernel NULL pointer dereference at 0000000000000019 > [ 287.012515] IP: [<ffffffffa00d944f>] mirror_end_io+0x7f/0x130 [dm_mirror] > ... > [ 291.994645] Call Trace: > [ 291.994671] [<ffffffffa007b215>] clone_endio+0x35/0xe0 [dm_mod] > [ 291.994675] [<ffffffffa0589ced>] do_reads+0x17d/0x1d0 [dm_mirror] > [ 291.994680] [<ffffffffa058af5c>] do_mirror+0xec/0x250 [dm_mirror] > [ 291.994687] [<ffffffff810958fe>] process_one_work+0x14e/0x410 > [ 291.994691] [<ffffffff81096156>] worker_thread+0x116/0x490 > [ 291.994694] [<ffffffff8109b627>] kthread+0xc7/0xe0 > """ > > Fixes it by setting "details.bi_bdev" to NULL in error path beforing > calling into mirror_end_io(), which will fail the IO properly. Hi Which kernel version is this ? I'd thought we've already fixed this BZ for old mirrors: https://bugzilla.redhat.com/show_bug.cgi?id=1382382 There similar BZ for md-raid based mirrors (--type raid1) https://bugzilla.redhat.com/show_bug.cgi?id=1416099 Regards Zdenej -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Hi, On 06/26/2017 05:14 PM, Johannes Thumshirn wrote: > On Mon, Jun 26, 2017 at 05:08:48PM +0800, Eric Ren wrote: >> When the primary mirror device fails, activating a mirrored >> LV will crash the kernel. It can be reproduced 100% with the >> scripts below: >> >> """ >> dd if=/dev/zero of=file.raw bs=1G count=1 >> loopdev=$(losetup -f) >> losetup $loopdev file.raw >> dmsetup create pv1 --table "0 102399 linear $loopdev 0" >> dmsetup create pv2 --table "0 102399 linear $loopdev 102400" >> vgcreate vgtest /dev/mapper/pv1 /dev/mapper/pv2 >> lvcreate -l1 --type mirror -m1 -n mirror12 --mirrorlog core \ >> vgtest /dev/mapper/pv1 /dev/mapper/pv2 >> vgchange -an vgtest >> echo 0 10000000 error | dmsetup load /dev/mapper/pv1 >> dmsetup resume /dev/mapper/pv1 >> vgchange -ay vgtest >> """ > So as you have a reproducer, can you eventually write an blktest [1] > regression test for it? No problem, will do later on. Thanks, Eric > > > Thanks, > Johannes > > [1] https://github.com/osandov/blktests > -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Hi Zdenek, On 06/26/2017 05:46 PM, Zdenek Kabelac wrote: > Dne 26.6.2017 v 11:08 Eric Ren napsal(a): >> > [... snip...] > > Hi > > Which kernel version is this ? > > I'd thought we've already fixed this BZ for old mirrors: > https://bugzilla.redhat.com/show_bug.cgi?id=1382382 > > There similar BZ for md-raid based mirrors (--type raid1) > https://bugzilla.redhat.com/show_bug.cgi?id=1416099 My base kernel version is 4.4.68, but with these 2 latest fixes applied:

"""
Revert "dm mirror: use all available legs on multiple failures"
dm io: fix duplicate bio completion due to missing ref count
"""

Yes, I've been working on this dm-mirror crash issue for a couple of days, but only saw the fixes above when I rebased to send my fixes out, hah. But, with these 2 fixes, the reproducer can still crash the kernel. Anyway, I can test with the upstream kernel again ;-P Regards, Eric > > > Regards > > Zdenej > -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Dne 26.6.2017 v 12:55 Eric Ren napsal(a): > Hi Zdenek, > > > On 06/26/2017 05:46 PM, Zdenek Kabelac wrote: >> Dne 26.6.2017 v 11:08 Eric Ren napsal(a): >>> >> [... snip...] >> >> Hi >> >> Which kernel version is this ? >> >> I'd thought we've already fixed this BZ for old mirrors: >> https://bugzilla.redhat.com/show_bug.cgi?id=1382382 >> >> There similar BZ for md-raid based mirrors (--type raid1) >> https://bugzilla.redhat.com/show_bug.cgi?id=1416099 > My base kernel version is 4.4.68, but with this 2 latest fixes applied: > > """ > Revert "dm mirror: use all available legs on multiple failures" Ohh - I've -rc6 - while this 'revert' patch went to 4.12-rc7. I'm now starting to wonder why? It's been a real fix for a real issue - and 'revert' message states there is no such problem ?? I'm confused.... Mike - have you tried the sequence from BZ ? Zdenek -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Mon, Jun 26 2017 at 7:49am -0400, Zdenek Kabelac <zkabelac@redhat.com> wrote: > Dne 26.6.2017 v 12:55 Eric Ren napsal(a): > >Hi Zdenek, > > > > > >On 06/26/2017 05:46 PM, Zdenek Kabelac wrote: > >>Dne 26.6.2017 v 11:08 Eric Ren napsal(a): > >>> > >>[... snip...] > >> > >>Hi > >> > >>Which kernel version is this ? > >> > >>I'd thought we've already fixed this BZ for old mirrors: > >>https://bugzilla.redhat.com/show_bug.cgi?id=1382382 > >> > >>There similar BZ for md-raid based mirrors (--type raid1) > >>https://bugzilla.redhat.com/show_bug.cgi?id=1416099 > >My base kernel version is 4.4.68, but with this 2 latest fixes applied: > > > >""" > >Revert "dm mirror: use all available legs on multiple failures" > > Ohh - I've -rc6 - while this 'revert' patch went to 4.12-rc7. > > I'm now starting to wonder why? > > It's been a real fix for a real issue - and 'revert' message states > there is no such problem ?? > > I'm confused.... > > Mike - have you tried the sequence from BZ ? Jon (cc'd) tested it and found that the original issue wasn't reproducible. But that the commit in question was the source of crashes (likely due to needing the fix from this thread). We can revisit for 4.13 -- but I'll defer to Heinz. Eric, please share your results from testing the latest 4.12-rc7. Mike -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Hi, On 06/26/2017 06:55 PM, Eric Ren wrote: > Hi Zdenek, > > > On 06/26/2017 05:46 PM, Zdenek Kabelac wrote: >> Dne 26.6.2017 v 11:08 Eric Ren napsal(a): >>> >> [... snip...] >> >> Hi >> >> Which kernel version is this ? >> >> I'd thought we've already fixed this BZ for old mirrors: >> https://bugzilla.redhat.com/show_bug.cgi?id=1382382 >> >> There similar BZ for md-raid based mirrors (--type raid1) >> https://bugzilla.redhat.com/show_bug.cgi?id=1416099 > My base kernel version is 4.4.68, but with this 2 latest fixes applied: > > """ > Revert "dm mirror: use all available legs on multiple failures" > dm io: fix duplicate bio completion due to missing ref count

I'm confused about this "dm io..." fix. The fix itself is good. Without it, running mkfs.ext4 on a mirrored device whose primary leg has failed will crash the kernel via the discard operations issued by mkfs.ext4. However, mkfs.ext4 can succeed on a healthy mirrored device. That is the part I don't understand: I expected that, no matter whether the mirrored device is good or not, there would always be a duplicate bio completion without this fix, so write_callback() would be called twice and the crash would occur on the second write_callback():

"""
static void write_callback(unsigned long error, void *context)
{
        unsigned i;
        struct bio *bio = (struct bio *) context;
        struct mirror_set *ms;
        int should_wake = 0;
        unsigned long flags;

        ms = bio_get_m(bio)->ms;    ====> NULL pointer at the duplicate completion
        bio_set_m(bio, NULL);
"""

Without this fix, I expected the DISCARD IO to always crash the kernel, but that's not true when the mirrored device is good. I hope someone who happens to know the reason can give some hints ;-P Regards, Eric > """ > > Yes, I've been working on dm-mirror crash issue for couple days, > but only see this fixes above when I rebased to send my fixes out, hah. > > But, with this 2 fixes, the producer can still crash the kernel. > Anyway, I can > test with upstream kernel again ;-P > > Regards, > Eric > >> >> >> Regards >> >> Zdenej >> > > -- > dm-devel mailing list > dm-devel@redhat.com > https://www.redhat.com/mailman/listinfo/dm-devel > -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Hi Mike, On 06/26/2017 09:43 PM, Mike Snitzer wrote: > On Mon, Jun 26 2017 at 7:49am -0400, > Zdenek Kabelac <zkabelac@redhat.com> wrote: > >> [...snip...] >> Ohh - I've -rc6 - while this 'revert' patch went to 4.12-rc7. >> >> I'm now starting to wonder why? >> >> It's been a real fix for a real issue - and 'revert' message states >> there is no such problem ?? >> >> I'm confused.... >> >> Mike - have you tried the sequence from BZ ? > Jon (cc'd) tested it and found that the original issue wasn't > reproducible. But that the commit in question was the source of crashes > (likely due to needing the fix from this thread). > > We can revisit for 4.13 -- but I'll defer to Heinz. > > Eric, please share your results from testing the latest 4.12-rc7. Sure, but I will do it tomorrow, as I'm in a different timezone ;-) Maybe Jon can also try my reproducer on the latest kernel, so we can double-confirm with each other. Thanks, Eric > > Mike > -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Mon, Jun 26 2017 at 9:47am -0400, Eric Ren <zren@suse.com> wrote: > Hi, > > On 06/26/2017 06:55 PM, Eric Ren wrote: > >Hi Zdenek, > > > > > >On 06/26/2017 05:46 PM, Zdenek Kabelac wrote: > >>Dne 26.6.2017 v 11:08 Eric Ren napsal(a): > >>> > >>[... snip...] > >> > >>Hi > >> > >>Which kernel version is this ? > >> > >>I'd thought we've already fixed this BZ for old mirrors: > >>https://bugzilla.redhat.com/show_bug.cgi?id=1382382 > >> > >>There similar BZ for md-raid based mirrors (--type raid1) > >>https://bugzilla.redhat.com/show_bug.cgi?id=1416099 > >My base kernel version is 4.4.68, but with this 2 latest fixes applied: > > > >""" > >Revert "dm mirror: use all available legs on multiple failures" > >dm io: fix duplicate bio completion due to missing ref count > > I have a confusion about this "dm io..." fix. The fix itself is good. > > Without it, a mkfs.ext4 on a mirrored dev whose primary mirror dev > has failed, will crash the kernel with the discard operation from mkfs.ext4. > However, mkfs.ext4 can succeed on a healthy mirrored device. This > is the thing I don't understand, because no matter the mirrored device is > good or not, there's always a duplicate bio completion before having this > this fix, thus write_callback() will be called twice, crashing will > occur on the > second write_callback(): No, there is only a duplicate bio completion if the error path is taken (e.g. underlying device doesn't support discard). > """ > static void write_callback(unsigned long error, void *context) > { > unsigned i; > struct bio *bio = (struct bio *) context; > struct mirror_set *ms; > int should_wake = 0; > unsigned long flags; > > ms = bio_get_m(bio)->ms; ====> NULL pointer at the > duplicate completion > bio_set_m(bio, NULL); > """ > > If no this fix, I expected the DISCARD IO would always crash the > kernel, but it's not true when > the mirrored device is good. Hope someone happen to know the reason > can give some hints ;-P If the mirror is healthy then only one completion is returned to dm-mirror (via write_callback). The problem was that the error path wasn't managing the reference count as needed, whereas dm-io's normal discard IO path does. Mike -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Hi Mike, On 06/26/2017 10:37 PM, Mike Snitzer wrote: > On Mon, Jun 26 2017 at 9:47am -0400, > Eric Ren <zren@suse.com> wrote: > [...snip...] >>> """ >>> Revert "dm mirror: use all available legs on multiple failures" >>> dm io: fix duplicate bio completion due to missing ref count >> I have a confusion about this "dm io..." fix. The fix itself is good. >> >> Without it, a mkfs.ext4 on a mirrored dev whose primary mirror dev >> has failed, will crash the kernel with the discard operation from mkfs.ext4. >> However, mkfs.ext4 can succeed on a healthy mirrored device. This >> is the thing I don't understand, because no matter the mirrored device is >> good or not, there's always a duplicate bio completion before having this >> this fix, thus write_callback() will be called twice, crashing will >> occur on the >> second write_callback(): > No, there is only a duplicate bio completion if the error path is taken > (e.g. underlying device doesn't support discard).

Hmm, when "op == REQ_OP_DISCARD", please see the comments in do_region():

"""
static void do_region(int op, int op_flags, unsigned region,
                      struct dm_io_region *where, struct dpages *dp,
                      struct io *io)
{
        ...
        if (op == REQ_OP_DISCARD)
                special_cmd_max_sectors = q->limits.max_discard_sectors;
        ...
        if ((op == REQ_OP_DISCARD || op == REQ_OP_WRITE_ZEROES ||
             op == REQ_OP_WRITE_SAME) && special_cmd_max_sectors == 0) {
                atomic_inc(&io->count);                 ===> [1]
                dec_count(io, region, -EOPNOTSUPP);     ===> [2]
                return;
        }
"""

[1] the ref count added by the "dm io: ..." patch;
[2] we won't get here if "special_cmd_max_sectors != 0", which is true when both sides of the mirror are good.

So this error path is taken only when a mirror leg has failed, because only then is "max_discard_sectors" on its queue 0, right?

> >> """ >> static void write_callback(unsigned long error, void *context) >> { >> unsigned i; >> struct bio *bio = (struct bio *) context; >> struct mirror_set *ms; >> int should_wake = 0; >> unsigned long flags; >> >> ms = bio_get_m(bio)->ms; ====> NULL pointer at the >> duplicate completion >> bio_set_m(bio, NULL); >> """ >> >> If no this fix, I expected the DISCARD IO would always crash the >> kernel, but it's not true when >> the mirrored device is good. Hope someone happen to know the reason >> can give some hints ;-P > If the mirror is healthy then only one completion is returned to > dm-mirror (via write_callback). The problem was the error patch wasn't > managing the reference count as needed. Whereas dm-io's normal discard > IO path does. Yes, I can understand this. Thanks a lot, Eric > > Mike > > -- > dm-devel mailing list > dm-devel@redhat.com > https://www.redhat.com/mailman/listinfo/dm-devel > -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
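For reference, here is a simplified sketch of the dm-io completion accounting being discussed above (illustrative only, not the verbatim code in drivers/md/dm-io.c; the field and callback names are approximations):

```c
/*
 * Simplified sketch of dm-io completion accounting -- illustrative
 * only, not the verbatim kernel code.  One reference is held per bio
 * that do_region() submits; the notify callback (write_callback() for
 * dm-raid1 writes) must run exactly once, when the last reference on
 * the io is dropped.
 */
static void dec_count(struct io *io, unsigned int region, int error)
{
        if (error)
                set_bit(region, &io->error_bits);

        /* Completion fires only when every submitted region has ended. */
        if (atomic_dec_and_test(&io->count))
                io->callback(io->error_bits, io->context);
}
```

Under that model, the rejected-discard branch quoted from do_region() above used to call dec_count() without taking a matching reference, so io->count reached zero while the other leg's bio was still in flight; when that bio completed, the count was dropped a second time and write_callback() ran twice. The atomic_inc(&io->count) added by the "dm io: fix duplicate bio completion due to missing ref count" patch balances that extra dec_count().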
Hi, On 06/26/2017 11:27 PM, Eric Ren wrote: > Hi Mike, > > > On 06/26/2017 10:37 PM, Mike Snitzer wrote: >> On Mon, Jun 26 2017 at 9:47am -0400, >> Eric Ren <zren@suse.com> wrote: >> [...snip...] >>>> """ >>>> Revert "dm mirror: use all available legs on multiple failures" >>>> dm io: fix duplicate bio completion due to missing ref count >>> I have a confusion about this "dm io..." fix. The fix itself is good. >>> >>> Without it, a mkfs.ext4 on a mirrored dev whose primary mirror dev >>> has failed, will crash the kernel with the discard operation from >>> mkfs.ext4. >>> However, mkfs.ext4 can succeed on a healthy mirrored device. This >>> is the thing I don't understand, because no matter the mirrored >>> device is >>> good or not, there's always a duplicate bio completion before having >>> this >>> this fix, thus write_callback() will be called twice, crashing will >>> occur on the >>> second write_callback(): >> No, there is only a duplicate bio completion if the error path is taken >> (e.g. underlying device doesn't support discard). > Hmm, when "op == REQ_OP_DISCARD", please see comments in do_region(): > > """ > static void do_region(int op, int op_flags, unsigned region, > struct dm_io_region *where, struct dpages *dp, > struct io *io) > { > ... > if (op == REQ_OP_DISCARD) > special_cmd_max_sectors = q->limits.max_discard_sectors; > ... > if ((op == REQ_OP_DISCARD || op == REQ_OP_WRITE_ZEROES || > op == REQ_OP_WRITE_SAME) && special_cmd_max_sectors == 0) { > atomic_inc(&io->count); ===> [1] > dec_count(io, region, -EOPNOTSUPP); ===> [2] > return; > } > """ > > [1] ref count fixed by patch "dm io: ..."; > [2] we won't come here if "special_cmd_max_sectors != 0", which is > true when both sides > of the mirror is good. > > So only when a mirror device fails, "max_discard_sectors" on its queue > is 0, thus this error path > will be taken, right?

Yes, I checked this as below.

- debug print:

"""
@@ -310,8 +310,10 @@ static void do_region(int op, int op_flags, unsigned region,
        /*
         * Reject unsupported discard and write same requests.
         */
-       if (op == REQ_OP_DISCARD)
+       if (op == REQ_OP_DISCARD) {
                special_cmd_max_sectors = q->limits.max_discard_sectors;
+               DMERR("do_region: op=DISCARD, q->limits.max_discard_sectors=%d\n", special_cmd_max_sectors);
+       }
"""

- log messages when running mkfs on a mirrored device whose primary leg has failed:

[ 169.672880] device-mapper: io: do_region: op=DISCARD, q->limits.max_discard_sectors=0        ===> for the failed leg
[ 169.672885] device-mapper: io: do_region: op=DISCARD, q->limits.max_discard_sectors=8388607  ===> for the good leg

Thanks, Eric

> >> >>> """ >>> static void write_callback(unsigned long error, void *context) >>> { >>> unsigned i; >>> struct bio *bio = (struct bio *) context; >>> struct mirror_set *ms; >>> int should_wake = 0; >>> unsigned long flags; >>> >>> ms = bio_get_m(bio)->ms; ====> NULL pointer at the >>> duplicate completion >>> bio_set_m(bio, NULL); >>> """ >>> >>> If no this fix, I expected the DISCARD IO would always crash the >>> kernel, but it's not true when >>> the mirrored device is good. Hope someone happen to know the reason >>> can give some hints ;-P >> If the mirror is healthy then only one completion is returned to >> dm-mirror (via write_callback). The problem was the error patch wasn't >> managing the reference count as needed. Whereas dm-io's normal discard >> IO path does. > > Yes, I can understand this.
> > Thanks a lot, > Eric > >> >> Mike >> >> -- >> dm-devel mailing list >> dm-devel@redhat.com >> https://www.redhat.com/mailman/listinfo/dm-devel >> > > -- > dm-devel mailing list > dm-devel@redhat.com > https://www.redhat.com/mailman/listinfo/dm-devel -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Hi Mike, On 06/26/2017 09:43 PM, Mike Snitzer wrote: > On Mon, Jun 26 2017 at 7:49am -0400, > Zdenek Kabelac <zkabelac@redhat.com> wrote: > > [...snip...] > Jon (cc'd) tested it and found that the original issue wasn't > reproducible. But that the commit in question was the source of crashes > (likely due to needing the fix from this thread). > > We can revisit for 4.13 -- but I'll defer to Heinz. > > Eric, please share your results from testing the latest 4.12-rc7. Testing on 4.12-rc7 doesn't crash the kernel. Sorry for the noise, please ignore this patch. It's still unclear to me why the backported fixes still show the problem; I will update this thread if I find something. Regards, Eric > > Mike > > -- > dm-devel mailing list > dm-devel@redhat.com > https://www.redhat.com/mailman/listinfo/dm-devel > -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index 4da8858..696e308 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -568,6 +568,7 @@ static void do_reads(struct mirror_set *ms, struct bio_list *reads)
 	region_t region;
 	struct bio *bio;
 	struct mirror *m;
+	struct dm_raid1_bio_record *bio_record;
 
 	while ((bio = bio_list_pop(reads))) {
 		region = dm_rh_bio_to_region(ms->rh, bio);
@@ -583,8 +584,16 @@ static void do_reads(struct mirror_set *ms, struct bio_list *reads)
 
 		if (likely(m))
 			read_async_bio(m, bio);
-		else
+		else {
+			/*
+			 * In mirror_end_io(), we will fail the IO properly
+			 * if details.bi_bdev is NULL.
+			 */
+			bio_record = dm_per_bio_data(bio,
+					sizeof(struct dm_raid1_bio_record));
+			bio_record->details.bi_bdev = NULL;
 			bio_io_error(bio);
+		}
 	}
 }
 
When the primary mirror device fails, activating a mirrored
LV will crash the kernel. It can be reproduced 100% with the
scripts below:

"""
dd if=/dev/zero of=file.raw bs=1G count=1
loopdev=$(losetup -f)
losetup $loopdev file.raw
dmsetup create pv1 --table "0 102399 linear $loopdev 0"
dmsetup create pv2 --table "0 102399 linear $loopdev 102400"
vgcreate vgtest /dev/mapper/pv1 /dev/mapper/pv2
lvcreate -l1 --type mirror -m1 -n mirror12 --mirrorlog core \
    vgtest /dev/mapper/pv1 /dev/mapper/pv2
vgchange -an vgtest
echo 0 10000000 error | dmsetup load /dev/mapper/pv1
dmsetup resume /dev/mapper/pv1
vgchange -ay vgtest
"""

The call trace:

"""
[ 287.008629] device-mapper: raid1: Unable to read primary mirror during recovery
[ 287.008632] device-mapper: raid1: Primary mirror (254:10) failed while out-of-sync: Reads may fail.
...
[ 287.012480] BUG: unable to handle kernel NULL pointer dereference at 0000000000000019
[ 287.012515] IP: [<ffffffffa00d944f>] mirror_end_io+0x7f/0x130 [dm_mirror]
...
[ 291.994645] Call Trace:
[ 291.994671]  [<ffffffffa007b215>] clone_endio+0x35/0xe0 [dm_mod]
[ 291.994675]  [<ffffffffa0589ced>] do_reads+0x17d/0x1d0 [dm_mirror]
[ 291.994680]  [<ffffffffa058af5c>] do_mirror+0xec/0x250 [dm_mirror]
[ 291.994687]  [<ffffffff810958fe>] process_one_work+0x14e/0x410
[ 291.994691]  [<ffffffff81096156>] worker_thread+0x116/0x490
[ 291.994694]  [<ffffffff8109b627>] kthread+0xc7/0xe0
"""

Fix this by setting "details.bi_bdev" to NULL in the error path before
calling into mirror_end_io(), which will then fail the IO properly.

Reported-by: Jerry Tang <jtang@suse.com>
Signed-off-by: Eric Ren <zren@suse.com>
---
 drivers/md/dm-raid1.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)
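The comment added in do_reads() refers to the read-error handling in mirror_end_io(). A rough sketch of the shape of that handling (assumed for illustration, not the verbatim upstream function): clearing details.bi_bdev gives the end_io path a way to tell "no leg was ever chosen" apart from a normal read failure.

```c
/*
 * Rough sketch of the read-error handling in mirror_end_io() that the
 * patch relies on -- assumed for illustration, not the verbatim code in
 * drivers/md/dm-raid1.c.
 */
static int mirror_end_io(struct dm_target *ti, struct bio *bio, int error)
{
        struct dm_raid1_bio_record *bio_record =
                dm_per_bio_data(bio, sizeof(struct dm_raid1_bio_record));

        if (bio_data_dir(bio) == READ && unlikely(error)) {
                /*
                 * do_reads() found no usable leg: nothing was recorded
                 * in bio_record, so there is no mirror to fail over to.
                 * Just propagate the error instead of dereferencing the
                 * uninitialised record (the NULL pointer crash above).
                 */
                if (!bio_record->details.bi_bdev)
                        return error;

                /* ... otherwise mark the leg failed and retry the read ... */
        }

        return error;
}
```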