Message ID | 6e32c568-731b-4e19-5e54-5e44aa129f37@redhat.com (mailing list archive) |
---|---|
State | Accepted |
Series | xfs_repair: kick processing thread if ra_count is at limit |
On Wed, Oct 24, 2018 at 06:11:46PM -0500, Eric Sandeen wrote:
> Zorro hit an xfs_repair hang on a 500T filesystem where
> all the prefetch threads were sleeping and nothing progressed.
>
> The problem is that if every buffer we tried to read ahead in
> phase6 was already up to date, pf_start_io_workers has no effect;
> there is no io to do, and the sem_wait in pf_queuing_worker waits
> forever.
>
> Kick the processing thread to avoid this situation.
>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201173
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---
>
> My brains started leaking out debugging this, but it works,
> and it seems harmless. :D Happy to have review from anyone who groks
> the prefetch thread management better than I do...
>
> diff --git a/repair/prefetch.c b/repair/prefetch.c
> index 9571b24..1de0e2f 100644
> --- a/repair/prefetch.c
> +++ b/repair/prefetch.c
> @@ -768,8 +768,12 @@ pf_queuing_worker(
>  			 * might get stuck on a buffer that has been locked
>  			 * and added to the I/O queue but is waiting for
>  			 * the thread to be woken.
> +			 * Start processing as well, in case everything so
> +			 * far was already prefetched and the queue is empty.
>  			 */
> +
>  			pf_start_io_workers(args);
> +			pf_start_processing(args);
>  			sem_wait(&args->ra_count);
>  		}

Looks reasonable. We've had other bugs like this in the prefetch
code, so I'm not surprised there are still some lurking.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

Cheers,

Dave.
diff --git a/repair/prefetch.c b/repair/prefetch.c
index 9571b24..1de0e2f 100644
--- a/repair/prefetch.c
+++ b/repair/prefetch.c
@@ -768,8 +768,12 @@ pf_queuing_worker(
 			 * might get stuck on a buffer that has been locked
 			 * and added to the I/O queue but is waiting for
 			 * the thread to be woken.
+			 * Start processing as well, in case everything so
+			 * far was already prefetched and the queue is empty.
 			 */
+
 			pf_start_io_workers(args);
+			pf_start_processing(args);
 			sem_wait(&args->ra_count);
 		}
Zorro hit an xfs_repair hang on a 500T filesystem where
all the prefetch threads were sleeping and nothing progressed.

The problem is that if every buffer we tried to read ahead in
phase6 was already up to date, pf_start_io_workers has no effect;
there is no io to do, and the sem_wait in pf_queuing_worker waits
forever.

Kick the processing thread to avoid this situation.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201173
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

My brains started leaking out debugging this, but it works,
and it seems harmless. :D Happy to have review from anyone who groks
the prefetch thread management better than I do...
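
For anyone trying to picture the hang described above, here is a rough standalone model of it. This is only a sketch and not the xfs_repair code: struct model_args, processing_thread(), model_start_io_workers() and run_model() are made-up names, the processing side is assumed to be the only thing that ever returns a readahead slot, and the "waits forever" case is stood in for by a 500ms sem_timedwait() so that the model terminates.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical, simplified stand-in for the prefetch state. */
struct model_args {
	sem_t ra_count;          /* free readahead slots; 0 means "at limit" */
	sem_t start_processing;  /* posted to wake the processing thread */
	int   queued_io;         /* buffers that still need to be read */
};

/* Processing thread: sleeps until kicked, then frees one readahead slot. */
static void *processing_thread(void *p)
{
	struct model_args *args = p;

	sem_wait(&args->start_processing);
	sem_post(&args->ra_count);
	return NULL;
}

/* Model of starting I/O workers: with nothing queued, nobody gets woken. */
static void model_start_io_workers(struct model_args *args)
{
	if (args->queued_io == 0)
		return;
	/* Assumption: completed I/O would eventually start processing. */
	sem_post(&args->start_processing);
}

/*
 * Returns true if the queuing side obtained a readahead slot within
 * 500ms, false if it would have kept sleeping (modelled as a timeout).
 */
static bool run_model(bool kick_processing)
{
	struct model_args args = { .queued_io = 0 };
	struct timespec deadline;
	pthread_t tid;
	bool progressed;

	sem_init(&args.ra_count, 0, 0);
	sem_init(&args.start_processing, 0, 0);
	pthread_create(&tid, NULL, processing_thread, &args);

	model_start_io_workers(&args);
	if (kick_processing)
		sem_post(&args.start_processing);

	/* sem_timedwait() takes an absolute CLOCK_REALTIME deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_nsec += 500 * 1000 * 1000;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}
	progressed = (sem_timedwait(&args.ra_count, &deadline) == 0);

	/* Release the model thread so it can be joined in either case. */
	sem_post(&args.start_processing);
	pthread_join(tid, NULL);
	sem_destroy(&args.ra_count);
	sem_destroy(&args.start_processing);
	return progressed;
}
```

In this model run_model(false) corresponds to the old behaviour, where only the I/O workers are started and the wait on ra_count never completes, and run_model(true) corresponds to also kicking the processing thread as the patch does. Build with -pthread.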