Message ID | 20150812114631.15268065@noble (mailing list archive) |
---|---|
State | Rejected, archived |
Delegated to: | Mike Snitzer |
On Wed, 12 Aug 2015, NeilBrown wrote:

> Ahh - thanks for the explanation.  That makes sense.
> Presumably you need to wait for old reads so that you don't have a read
> returning old data after a write has confirmed the new data is written?
> Is there some other reason?
>
> As soon as the new "proper" exception is installed, no new reads will
> use the old address, so it should be safe to wait without holding the
> lock.
>
> So what about this?

This is wrong too - when you add the exception to the complete hash table
and drop the lock, writes to the chunk in the origin target can modify the
origin volume. These writes race with pending reads to the snapshot
(__chunk_is_tracked tests for those reads). If the I/O scheduler schedules
the write to the origin before the read from the snapshot, the read from
the snapshot returns corrupted data.

There can be more problems with snapshot merging - if the chunk is on the
complete hash table, the merging code assumes that it can be safely
merged.

Mikulas

> Thanks for your time,
> NeilBrown
>
> diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
> index 7c82d3ccce87..c7a7ed2cfd6d 100644
> --- a/drivers/md/dm-snap.c
> +++ b/drivers/md/dm-snap.c
> @@ -1461,14 +1461,22 @@ static void pending_complete(struct dm_snap_pending_exception *pe, int success)
>  		goto out;
>  	}
> 
> -	/* Check for conflicting reads */
> -	__check_for_conflicting_io(s, pe->e.old_chunk);
> +	/* Add proper exception. Now all reads will be
> +	 * redirected, so no new reads can be started on
> +	 * 'old_chunk'.
> +	 */
> +	dm_insert_exception(&s->complete, e);
> +
> +	/* Drain conflicting reads */
> +	if (__chunk_is_tracked(s, pe->e.old_chunk)) {
> +		up_write(&s->lock);
> +		__check_for_conflicting_io(s, pe->e.old_chunk);
> +		down_write(&s->lock);
> +	}
> 
>  	/*
> -	 * Add a proper exception, and remove the
> -	 * in-flight exception from the list.
> +	 * And now we can remove the temporary exception.
>  	 */
> -	dm_insert_exception(&s->complete, e);
> 
>  out:
>  	dm_remove_exception(&pe->e);
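For reference, __check_for_conflicting_io() in dm-snap.c of this era is essentially the polling loop below; it is reproduced from memory and lightly commented, so treat it as a sketch rather than the exact source.

```c
/*
 * Sketch of __check_for_conflicting_io(): spin until no reads against
 * 'chunk' are still being tracked (__chunk_is_tracked()).  The unpatched
 * code drains these reads *before* dm_insert_exception() makes the
 * exception visible in the complete hash table, so origin writes to the
 * chunk cannot yet proceed.  The proposed patch inserts the exception
 * first and then drops s->lock to wait here, and that is the window in
 * which an origin write can overtake a still-pending snapshot read.
 */
static void __check_for_conflicting_io(struct dm_snapshot *s, chunk_t chunk)
{
	while (__chunk_is_tracked(s, chunk))
		msleep(1);	/* tracked reads complete asynchronously */
}
```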
On Wed, 12 Aug 2015, NeilBrown wrote:

> On Tue, 11 Aug 2015 05:14:33 -0400 (EDT) Mikulas Patocka
> <mpatocka@redhat.com> wrote:
>
> > Hi
> >
> > On Mon, 10 Aug 2015, NeilBrown wrote:
> >
> > >
> > > Hi Mikulas,
> > >  I have a customer hitting the deadlock you described over a year ago
> > >  in:
> > >
> > >  Subject: [dm-devel] [PATCH] block: flush queued bios when the process
> > >  blocks
> >
> > Ask block layer maintainers to accept that patch.
>
> Unfortunately I don't really like the patch ... or the bioset rescue
> workqueues that it is based on.  Sorry.
>
> So I might keep looking for a more localised approach....

The problem here is that other dm targets may deadlock in a similar way
too - for example, dm-thin could deadlock on pmd->pool_lock.

The cause of the bug is bio queuing on current->bio_list. There is an
assumption that if a dm target submits a bio to a lower-level target, the
bio finishes in finite time. Queuing on current->bio_list breaks that
assumption: bios can be held indefinitely on current->bio_list.

The patch that flushes current->bio_list is the correct way to fix it - it
makes sure that a bio can't be held indefinitely.

Another way to fix it would be to abandon current->bio_list --- but then
there would be a problem with stack overflow on deeply nested targets.

Mikulas
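For background on why a bio can sit on current->bio_list indefinitely: generic_make_request() in kernels of this era defers any bio submitted from inside a make_request_fn onto the calling task's bio_list and only processes it after that make_request_fn returns. The sketch below is simplified (request-validity checks and queue details omitted), but it shows the mechanism: if the make_request_fn blocks waiting for one of those deferred bios, nothing ever pops it off the list.

```c
/* Simplified sketch of generic_make_request() circa Linux 4.2. */
void generic_make_request(struct bio *bio)
{
	struct bio_list bio_list_on_stack;

	if (current->bio_list) {
		/*
		 * We are already inside a make_request_fn on this task:
		 * defer the bio instead of recursing.  It will not be
		 * looked at again until the active make_request_fn returns.
		 */
		bio_list_add(current->bio_list, bio);
		return;
	}

	/* Top-level entry: turn recursion into iteration over a FIFO. */
	bio_list_init(&bio_list_on_stack);
	current->bio_list = &bio_list_on_stack;
	do {
		struct request_queue *q = bdev_get_queue(bio->bi_bdev);

		q->make_request_fn(q, bio);	/* may queue more bios on the list */
		bio = bio_list_pop(current->bio_list);
	} while (bio);
	current->bio_list = NULL;
}
```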
On Wed, 12 Aug 2015 12:25:42 -0400 (EDT) Mikulas Patocka
<mpatocka@redhat.com> wrote:

>
>
> On Wed, 12 Aug 2015, NeilBrown wrote:
>
> > On Tue, 11 Aug 2015 05:14:33 -0400 (EDT) Mikulas Patocka
> > <mpatocka@redhat.com> wrote:
> >
> > > Hi
> > >
> > > On Mon, 10 Aug 2015, NeilBrown wrote:
> > >
> > > >
> > > > Hi Mikulas,
> > > >  I have a customer hitting the deadlock you described over a year ago
> > > >  in:
> > > >
> > > >  Subject: [dm-devel] [PATCH] block: flush queued bios when the process
> > > >  blocks
> > >
> > > Ask block layer maintainers to accept that patch.
> >
> > Unfortunately I don't really like the patch ... or the bioset rescue
> > workqueues that it is based on.  Sorry.
> >
> > So I might keep looking for a more localised approach....
>
> The problem here is that other dm targets may deadlock in a similar way
> too - for example, dm-thin could deadlock on pmd->pool_lock.
>
> The cause of the bug is bio queuing on current->bio_list. There is an
> assumption that if a dm target submits a bio to a lower-level target, the
> bio finishes in finite time. Queuing on current->bio_list breaks that
> assumption: bios can be held indefinitely on current->bio_list.
>
> The patch that flushes current->bio_list is the correct way to fix it - it
> makes sure that a bio can't be held indefinitely.
>
> Another way to fix it would be to abandon current->bio_list --- but then
> there would be a problem with stack overflow on deeply nested targets.
>

I think it is a bit simplistic to say that current->bio_list is the
"cause" of the problem.  It is certainly a part of the problem.  The
assumption you mention is another part - and the two conflict.

As you say, we could abandon current->bio_list, but then we risk stack
overflows again.
Or we could hand the bio_list to a work-queue whenever the make_request
function needs to schedule.... but then if handling one of those bios
needs to schedule... not sure, it might work.

Or we could change the assumption and never block in a make_request
function after calling generic_make_request().  Maybe that is difficult.

Or we could change __split_and_process_bio to use bio_split() to split
the bio, then handle the first part and call generic_make_request on the
second.  That would effectively put the second half at the end of
bio_list, so it wouldn't be tried until all requests derived from the
first half have been handled.

None of these is completely straightforward, but I suspect all are
possible.

I'm leaning towards the last one: when you want to split a bio, use
bio_split and handle the two halves separately.

Do you have thoughts on that?

Thanks,
NeilBrown
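A minimal sketch of the pattern Neil is suggesting - not the real __split_and_process_bio(); the helper name and the md->bs bio_set are illustrative, and this is essentially the shape the late-bio-splitting blk_queue_split() work uses: split off the front portion, chain the remainder to it, and resubmit the remainder through generic_make_request() so it lands at the tail of current->bio_list.

```c
/*
 * Illustrative only: process at most 'max_sectors' of 'bio' now and
 * requeue the rest via current->bio_list.  Hypothetical helper, not
 * actual dm code.
 */
static struct bio *dm_split_front(struct mapped_device *md, struct bio *bio,
				  unsigned int max_sectors)
{
	if (bio_sectors(bio) > max_sectors) {
		/* 'front' covers the first max_sectors; 'bio' becomes the remainder */
		struct bio *front = bio_split(bio, max_sectors, GFP_NOIO, md->bs);

		/*
		 * The remainder keeps the caller's bi_end_io and will not
		 * signal completion until 'front' has completed too.
		 */
		bio_chain(front, bio);

		/*
		 * Goes to the tail of current->bio_list, behind everything
		 * derived from 'front' that is already queued.
		 */
		generic_make_request(bio);

		bio = front;
	}

	return bio;	/* map and submit this part as usual */
}
```

As Mikulas points out in the next message, ordering the remainder behind the front half on current->bio_list is not by itself enough to avoid the deadlock.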
On Wed, Aug 12 2015 at 8:43pm -0400, NeilBrown <neilb@suse.com> wrote:

> On Wed, 12 Aug 2015 12:25:42 -0400 (EDT) Mikulas Patocka
> <mpatocka@redhat.com> wrote:
>
> >
> >
> > On Wed, 12 Aug 2015, NeilBrown wrote:
> >
> > > On Tue, 11 Aug 2015 05:14:33 -0400 (EDT) Mikulas Patocka
> > > <mpatocka@redhat.com> wrote:
> > >
> > > > Hi
> > > >
> > > > On Mon, 10 Aug 2015, NeilBrown wrote:
> > > >
> > > > >
> > > > > Hi Mikulas,
> > > > >  I have a customer hitting the deadlock you described over a year ago
> > > > >  in:
> > > > >
> > > > >  Subject: [dm-devel] [PATCH] block: flush queued bios when the process
> > > > >  blocks
> > > >
> > > > Ask block layer maintainers to accept that patch.
> > >
> > > Unfortunately I don't really like the patch ... or the bioset rescue
> > > workqueues that it is based on.  Sorry.
> > >
> > > So I might keep looking for a more localised approach....
> >
> > The problem here is that other dm targets may deadlock in a similar way
> > too - for example, dm-thin could deadlock on pmd->pool_lock.
> >
> > The cause of the bug is bio queuing on current->bio_list. There is an
> > assumption that if a dm target submits a bio to a lower-level target, the
> > bio finishes in finite time. Queuing on current->bio_list breaks that
> > assumption: bios can be held indefinitely on current->bio_list.
> >
> > The patch that flushes current->bio_list is the correct way to fix it - it
> > makes sure that a bio can't be held indefinitely.
> >
> > Another way to fix it would be to abandon current->bio_list --- but then
> > there would be a problem with stack overflow on deeply nested targets.
> >
>
> I think it is a bit simplistic to say that current->bio_list is the
> "cause" of the problem.  It is certainly a part of the problem.  The
> assumption you mention is another part - and the two conflict.
>
> As you say, we could abandon current->bio_list, but then we risk stack
> overflows again.
> Or we could hand the bio_list to a work-queue whenever the make_request
> function needs to schedule.... but then if handling one of those bios
> needs to schedule... not sure, it might work.
>
> Or we could change the assumption and never block in a make_request
> function after calling generic_make_request().  Maybe that is difficult.
>
> Or we could change __split_and_process_bio to use bio_split() to split
> the bio, then handle the first part and call generic_make_request on the
> second.  That would effectively put the second half at the end of
> bio_list, so it wouldn't be tried until all requests derived from the
> first half have been handled.
>
> None of these is completely straightforward, but I suspect all are
> possible.
>
> I'm leaning towards the last one: when you want to split a bio, use
> bio_split and handle the two halves separately.

Sounds worth pursuing.

> Do you have thoughts on that?

Once the late bio splitting work is merged for 4.3 I (along with Joe,
yourself, anyone interested) hope to review/audit/fix DM core to make
proper use of bio_split() et al.

So any work in this area needs to be based on the late-bio-splitting
patchset/branch that mlin has been pushing.
On Thu, 13 Aug 2015, NeilBrown wrote:

> On Wed, 12 Aug 2015 12:25:42 -0400 (EDT) Mikulas Patocka
> <mpatocka@redhat.com> wrote:
>
> >
> >
> > On Wed, 12 Aug 2015, NeilBrown wrote:
> >
> > > On Tue, 11 Aug 2015 05:14:33 -0400 (EDT) Mikulas Patocka
> > > <mpatocka@redhat.com> wrote:
> > >
> > > > Hi
> > > >
> > > > On Mon, 10 Aug 2015, NeilBrown wrote:
> > > >
> > > > >
> > > > > Hi Mikulas,
> > > > >  I have a customer hitting the deadlock you described over a year ago
> > > > >  in:
> > > > >
> > > > >  Subject: [dm-devel] [PATCH] block: flush queued bios when the process
> > > > >  blocks
> > > >
> > > > Ask block layer maintainers to accept that patch.
> > >
> > > Unfortunately I don't really like the patch ... or the bioset rescue
> > > workqueues that it is based on.  Sorry.
> > >
> > > So I might keep looking for a more localised approach....
> >
> > The problem here is that other dm targets may deadlock in a similar way
> > too - for example, dm-thin could deadlock on pmd->pool_lock.
> >
> > The cause of the bug is bio queuing on current->bio_list. There is an
> > assumption that if a dm target submits a bio to a lower-level target, the
> > bio finishes in finite time. Queuing on current->bio_list breaks that
> > assumption: bios can be held indefinitely on current->bio_list.
> >
> > The patch that flushes current->bio_list is the correct way to fix it - it
> > makes sure that a bio can't be held indefinitely.
> >
> > Another way to fix it would be to abandon current->bio_list --- but then
> > there would be a problem with stack overflow on deeply nested targets.
> >
>
> I think it is a bit simplistic to say that current->bio_list is the
> "cause" of the problem.  It is certainly a part of the problem.  The
> assumption you mention is another part - and the two conflict.
>
> As you say, we could abandon current->bio_list, but then we risk stack
> overflows again.
> Or we could hand the bio_list to a work-queue whenever the make_request
> function needs to schedule.... but then if handling one of those bios
> needs to schedule... not sure, it might work.
>
> Or we could change the assumption and never block in a make_request
> function after calling generic_make_request().  Maybe that is difficult.
>
> Or we could change __split_and_process_bio to use bio_split() to split
> the bio, then handle the first part and call generic_make_request on the
> second.  That would effectively put the second half at the end of
> bio_list, so it wouldn't be tried until all requests derived from the
> first half have been handled.

I don't think it will fix the bug - even if you put the second half of
the bio at the end of bio_list, it will still wait until other entries
on the bio list are processed.

For example:
- device 1 gets a bio, splits it to bio1 and bio2, forwards them to
  device 2 and puts them on current->bio_list
- device 2 receives bio1 (popped from current->bio_list), splits it to
  bio3 and bio4, forwards them to device 3 and puts them at the end of
  current->bio_list
- device 2 receives bio2 (popped from current->bio_list) and waits until
  bio1 finishes - but bio1 won't ever finish, because it depends on bio3
  and bio4, which are still on current->bio_list

> None of these is completely straightforward, but I suspect all are
> possible.
>
> I'm leaning towards the last one: when you want to split a bio, use
> bio_split and handle the two halves separately.
>
> Do you have thoughts on that?
>
> Thanks,
> NeilBrown

Mikulas
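A hypothetical trace of Mikulas's scenario, treating current->bio_list as the FIFO it is (bio_list_add() appends at the tail, bio_list_pop() removes the head):

```c
/*
 * Hypothetical trace of current->bio_list for the example above:
 *
 *   device 1 splits its bio into bio1, bio2:     list = [bio1, bio2]
 *   pop bio1 -> device 2 splits it into bio3,
 *               bio4 and requeues them:          list = [bio2, bio3, bio4]
 *   pop bio2 -> device 2 blocks waiting for
 *               bio1 to complete, but bio1
 *               depends on bio3 and bio4, which
 *               sit behind the blocked
 *               make_request_fn:                 list = [bio3, bio4]
 *
 * Nothing pops bio3/bio4 while the make_request_fn handling bio2 is
 * blocked, so the splitting order alone does not prevent the deadlock.
 */
```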
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index 7c82d3ccce87..c7a7ed2cfd6d 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -1461,14 +1461,22 @@ static void pending_complete(struct dm_snap_pending_exception *pe, int success)
 		goto out;
 	}
 
-	/* Check for conflicting reads */
-	__check_for_conflicting_io(s, pe->e.old_chunk);
+	/* Add proper exception. Now all reads will be
+	 * redirected, so no new reads can be started on
+	 * 'old_chunk'.
+	 */
+	dm_insert_exception(&s->complete, e);
+
+	/* Drain conflicting reads */
+	if (__chunk_is_tracked(s, pe->e.old_chunk)) {
+		up_write(&s->lock);
+		__check_for_conflicting_io(s, pe->e.old_chunk);
+		down_write(&s->lock);
+	}
 
 	/*
-	 * Add a proper exception, and remove the
-	 * in-flight exception from the list.
+	 * And now we can remove the temporary exception.
 	 */
-	dm_insert_exception(&s->complete, e);
 
 out:
 	dm_remove_exception(&pe->e);