diff mbox

[4/7] dm mpath: remove process_queued_ios()

Message ID 1391511280-29325-5-git-send-email-hare@suse.de (mailing list archive)
State Superseded, archived
Headers show

Commit Message

Hannes Reinecke Feb. 4, 2014, 10:54 a.m. UTC
process_queued_ios() has served 3 functions:
  1) select pg and pgpath if none is selected
  2) start pg_init if requested
  3) dispatch queued IOs when pg is ready

Basically, a call to queue_work(process_queued_ios) can be replaced by
dm_table_run_md_queue_async(), which runs request queue and ends up
calling map_io(), which does 1), 2) and 3).

Exception is when !pg_ready() (which means either pg_init is running or
requested), then multipath_busy() prevents map_io() being called from
request_fn.

If pg_init is running, it should be ok as long as pg_init_done() does
the right thing when pg_init is completed, I.e.: restart pg_init if
!pg_ready() or call dm_table_run_md_queue_async() to kick map_io().

If pg_init is requested, we have to make sure the request is detected
and pg_init will be started.  pg_init is requested in 3 places:
  a) __choose_pgpath() in map_io()
  b) __choose_pgpath() in multipath_ioctl()
  c) pg_init retry in pg_init_done()
a) is ok because map_io() calls __pg_init_all_paths(), which does 2).
b) needs a call to __pg_init_all_paths(), which does 2).
c) needs a call to __pg_init_all_paths(), which does 2).

So this patch removes process_queued_ios() and ensures that
__pg_init_all_paths() is called at the appropriate locations.

We only need to take care to add a small delay when calling
__pg_init_all_paths() to move processing off to a workqueue; otherwise
pg_init_done() might end up calling scsi_dh_activate() directly, which
might use non-atomic memory allocations or issue I/O.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-mpath.c | 71 +++++++++++++++++++++------------------------------
 1 file changed, 29 insertions(+), 42 deletions(-)

Comments

Junichi Nomura Feb. 4, 2014, 11:26 a.m. UTC | #1
On 02/04/14 19:54, Hannes Reinecke wrote:
> We only need to take care to add a small delay when calling
> __pg_init_all_paths() to move processing off to a workqueue; otherwise
> pg_init_done() might end up calling scsi_dh_activate() directly, which
> might use non-atomic memory allocations or issue I/O.

Hi Hannes,

could you tell me how "might end up calling scsi_dh_active()" happens?

  queue_delayed_work()
    queue_delayed_work_on()
      __queue_delayed_work()
        if (!delay)
          __queue_work()
            get_work_pool()
            insert_work()
              set_work_pwq()
              get_pwq()
              if (__need_more_worker())
                wake_up_worker()

queue_work() doesn't execute the work itself.

What am I missing?
Hannes Reinecke Feb. 4, 2014, 11:31 a.m. UTC | #2
On 02/04/2014 12:26 PM, Junichi Nomura wrote:
> On 02/04/14 19:54, Hannes Reinecke wrote:
>> We only need to take care to add a small delay when calling
>> __pg_init_all_paths() to move processing off to a workqueue; otherwise
>> pg_init_done() might end up calling scsi_dh_activate() directly, which
>> might use non-atomic memory allocations or issue I/O.
> 
> Hi Hannes,
> 
> could you tell me how "might end up calling scsi_dh_active()" happens?
> 
>   queue_delayed_work()
>     queue_delayed_work_on()
>       __queue_delayed_work()
>         if (!delay)
>           __queue_work()
>             get_work_pool()
>             insert_work()
>               set_work_pwq()
>               get_pwq()
>               if (__need_more_worker())
>                 wake_up_worker()
> 
> queue_work() doesn't execute the work itself.
> 
> What am I missing?
> 
As mentioned, I stumbled across the same issue when developing the
asynchronous SCSI aborts. I'll see to have it recreated with this
patchset.

Cheers,

Hannes
Mike Snitzer Feb. 10, 2014, 1:30 p.m. UTC | #3
On Tue, Feb 04 2014 at  6:31am -0500,
Hannes Reinecke <hare@suse.de> wrote:

> On 02/04/2014 12:26 PM, Junichi Nomura wrote:
> > On 02/04/14 19:54, Hannes Reinecke wrote:
> >> We only need to take care to add a small delay when calling
> >> __pg_init_all_paths() to move processing off to a workqueue; otherwise
> >> pg_init_done() might end up calling scsi_dh_activate() directly, which
> >> might use non-atomic memory allocations or issue I/O.
> > 
> > Hi Hannes,
> > 
> > could you tell me how "might end up calling scsi_dh_active()" happens?
> > 
> >   queue_delayed_work()
> >     queue_delayed_work_on()
> >       __queue_delayed_work()
> >         if (!delay)
> >           __queue_work()
> >             get_work_pool()
> >             insert_work()
> >               set_work_pwq()
> >               get_pwq()
> >               if (__need_more_worker())
> >                 wake_up_worker()
> > 
> > queue_work() doesn't execute the work itself.
> > 
> > What am I missing?
> > 
> As mentioned, I stumbled across the same issue when developing the
> asynchronous SCSI aborts. I'll see to have it recreated with this
> patchset.

Just to verify, this seems to be the only outstanding question for this
patchset?

What value are you using for HZ?  If this portion of the change does
turn out to be meaningul: Rather than tieing to HZ should we just use an
explicitly non-zero value for __pg_init_all_paths()'s @min_delay?

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Hannes Reinecke Feb. 11, 2014, 9:46 a.m. UTC | #4
On 02/10/2014 02:30 PM, Mike Snitzer wrote:
> On Tue, Feb 04 2014 at  6:31am -0500,
> Hannes Reinecke <hare@suse.de> wrote:
> 
>> On 02/04/2014 12:26 PM, Junichi Nomura wrote:
>>> On 02/04/14 19:54, Hannes Reinecke wrote:
>>>> We only need to take care to add a small delay when calling
>>>> __pg_init_all_paths() to move processing off to a workqueue; otherwise
>>>> pg_init_done() might end up calling scsi_dh_activate() directly, which
>>>> might use non-atomic memory allocations or issue I/O.
>>>
>>> Hi Hannes,
>>>
>>> could you tell me how "might end up calling scsi_dh_active()" happens?
>>>
>>>   queue_delayed_work()
>>>     queue_delayed_work_on()
>>>       __queue_delayed_work()
>>>         if (!delay)
>>>           __queue_work()
>>>             get_work_pool()
>>>             insert_work()
>>>               set_work_pwq()
>>>               get_pwq()
>>>               if (__need_more_worker())
>>>                 wake_up_worker()
>>>
>>> queue_work() doesn't execute the work itself.
>>>
>>> What am I missing?
>>>
>> As mentioned, I stumbled across the same issue when developing the
>> asynchronous SCSI aborts. I'll see to have it recreated with this
>> patchset.
> 
> Just to verify, this seems to be the only outstanding question for this
> patchset?
> 
> What value are you using for HZ?  If this portion of the change does
> turn out to be meaningul: Rather than tieing to HZ should we just use an
> explicitly non-zero value for __pg_init_all_paths()'s @min_delay?
> 
The actual amount here is irrelevant, as long as it's non-zero.
It's just there to force execution of the work item off the current
thread.

Cheers,

Hannes
Mike Snitzer Feb. 11, 2014, 3:55 p.m. UTC | #5
On Tue, Feb 11 2014 at  4:46am -0500,
Hannes Reinecke <hare@suse.de> wrote:

> On 02/10/2014 02:30 PM, Mike Snitzer wrote:
> > 
> > Just to verify, this seems to be the only outstanding question for this
> > patchset?
> > 
> > What value are you using for HZ?  If this portion of the change does
> > turn out to be meaningul: Rather than tieing to HZ should we just use an
> > explicitly non-zero value for __pg_init_all_paths()'s @min_delay?
> > 
> The actual amount here is irrelevant, as long as it's non-zero.
> It's just there to force execution of the work item off the current
> thread.

I'm aware we just need a non-zero value.  My concern, as originally
raised by Junichi in an earlier reply when you had it as HZ/50, is that
the value could be 0 if HZ is really small.  While unlikely I see no
point allowing the variable nature of HZ compromise passing a non-zero
value here.  Best to just be explicit by passing 1 or something.

All said, the question of why this is actually needed remains.  I trust
you're working on answering that via reproducer (by not forcing the use
of workqueue context)?

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Mike Snitzer Feb. 11, 2014, 4:29 p.m. UTC | #6
On Tue, Feb 11 2014 at  1:03pm -0500,
Hannes Reinecke <hare@suse.de> wrote:

> On 02/11/2014 04:55 PM, Mike Snitzer wrote:
> >On Tue, Feb 11 2014 at  4:46am -0500,
> >Hannes Reinecke <hare@suse.de> wrote:
> >
> >>On 02/10/2014 02:30 PM, Mike Snitzer wrote:
> >>>
> >>>Just to verify, this seems to be the only outstanding question for this
> >>>patchset?
> >>>
> >>>What value are you using for HZ?  If this portion of the change does
> >>>turn out to be meaningul: Rather than tieing to HZ should we just use an
> >>>explicitly non-zero value for __pg_init_all_paths()'s @min_delay?
> >>>
> >>The actual amount here is irrelevant, as long as it's non-zero.
> >>It's just there to force execution of the work item off the current
> >>thread.
> >
> >I'm aware we just need a non-zero value.  My concern, as originally
> >raised by Junichi in an earlier reply when you had it as HZ/50, is that
> >the value could be 0 if HZ is really small.  While unlikely I see no
> >point allowing the variable nature of HZ compromise passing a non-zero
> >value here.  Best to just be explicit by passing 1 or something.
> >
> >All said, the question of why this is actually needed remains.  I trust
> >you're working on answering that via reproducer (by not forcing the use
> >of workqueue context)?
> >
> Precisely.
> 
> But as this is a bit hard to trigger it might take some time.
> (you'll only be hitting this issue if you have to retry
> scsi_dh_activate, so you'll need to trigger this somehow).

OK.

> I hope to get it done this week.
> Is there any deadline which I might miss with that?

No, that'll be great.  We have some time until the 3.15 merge window
opens.

Thanks,
Mike

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Hannes Reinecke Feb. 11, 2014, 6:03 p.m. UTC | #7
On 02/11/2014 04:55 PM, Mike Snitzer wrote:
> On Tue, Feb 11 2014 at  4:46am -0500,
> Hannes Reinecke <hare@suse.de> wrote:
>
>> On 02/10/2014 02:30 PM, Mike Snitzer wrote:
>>>
>>> Just to verify, this seems to be the only outstanding question for this
>>> patchset?
>>>
>>> What value are you using for HZ?  If this portion of the change does
>>> turn out to be meaningul: Rather than tieing to HZ should we just use an
>>> explicitly non-zero value for __pg_init_all_paths()'s @min_delay?
>>>
>> The actual amount here is irrelevant, as long as it's non-zero.
>> It's just there to force execution of the work item off the current
>> thread.
>
> I'm aware we just need a non-zero value.  My concern, as originally
> raised by Junichi in an earlier reply when you had it as HZ/50, is that
> the value could be 0 if HZ is really small.  While unlikely I see no
> point allowing the variable nature of HZ compromise passing a non-zero
> value here.  Best to just be explicit by passing 1 or something.
>
> All said, the question of why this is actually needed remains.  I trust
> you're working on answering that via reproducer (by not forcing the use
> of workqueue context)?
>
Precisely.

But as this is a bit hard to trigger it might take some time.
(you'll only be hitting this issue if you have to retry 
scsi_dh_activate, so you'll need to trigger this somehow).

I hope to get it done this week.
Is there any deadline which I might miss with that?

Cheers,

Hannes
Junichi Nomura Feb. 12, 2014, 2:37 a.m. UTC | #8
On 02/10/14 22:30, Mike Snitzer wrote:
> On Tue, Feb 04 2014 at  6:31am -0500,
> Hannes Reinecke <hare@suse.de> wrote:
> 
>> On 02/04/2014 12:26 PM, Junichi Nomura wrote:
>>> On 02/04/14 19:54, Hannes Reinecke wrote:
>>>> We only need to take care to add a small delay when calling
>>>> __pg_init_all_paths() to move processing off to a workqueue; otherwise
>>>> pg_init_done() might end up calling scsi_dh_activate() directly, which
>>>> might use non-atomic memory allocations or issue I/O.
>>>
>>> Hi Hannes,
>>>
>>> could you tell me how "might end up calling scsi_dh_active()" happens?
>>>
>>>   queue_delayed_work()
>>>     queue_delayed_work_on()
>>>       __queue_delayed_work()
>>>         if (!delay)
>>>           __queue_work()
>>>             get_work_pool()
>>>             insert_work()
>>>               set_work_pwq()
>>>               get_pwq()
>>>               if (__need_more_worker())
>>>                 wake_up_worker()
>>>
>>> queue_work() doesn't execute the work itself.
>>>
>>> What am I missing?
>>>
>> As mentioned, I stumbled across the same issue when developing the
>> asynchronous SCSI aborts. I'll see to have it recreated with this
>> patchset.
> 
> Just to verify, this seems to be the only outstanding question for this
> patchset?

Hi Mike, Hannes,

Yes, this is the only remaining question from me.
For other parts of this patch and patchset, issues I pointed were solved.
diff mbox

Patch

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 0c15847..0355adc 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -93,8 +93,6 @@  struct multipath {
 	unsigned pg_init_count;		/* Number of times pg_init called */
 	unsigned pg_init_delay_msecs;	/* Number of msecs before pg_init retry */
 
-	struct work_struct process_queued_ios;
-
 	struct work_struct trigger_event;
 
 	/*
@@ -119,7 +117,6 @@  typedef int (*action_fn) (struct pgpath *pgpath);
 static struct kmem_cache *_mpio_cache;
 
 static struct workqueue_struct *kmultipathd, *kmpath_handlerd;
-static void process_queued_ios(struct work_struct *work);
 static void trigger_event(struct work_struct *work);
 static void activate_path(struct work_struct *work);
 static int __pgpath_busy(struct pgpath *pgpath);
@@ -197,7 +194,6 @@  static struct multipath *alloc_multipath(struct dm_target *ti)
 		spin_lock_init(&m->lock);
 		m->queue_io = 1;
 		m->pg_init_delay_msecs = DM_PG_INIT_DELAY_DEFAULT;
-		INIT_WORK(&m->process_queued_ios, process_queued_ios);
 		INIT_WORK(&m->trigger_event, trigger_event);
 		init_waitqueue_head(&m->pg_init_wait);
 		mutex_init(&m->work_mutex);
@@ -254,16 +250,21 @@  static void clear_mapinfo(struct multipath *m, union map_info *info)
  * Path selection
  *-----------------------------------------------*/
 
-static void __pg_init_all_paths(struct multipath *m)
+static int __pg_init_all_paths(struct multipath *m, unsigned long min_delay)
 {
 	struct pgpath *pgpath;
-	unsigned long pg_init_delay = 0;
+	unsigned long pg_init_delay = min_delay;
 
 	if (m->pg_init_in_progress || m->pg_init_disabled)
-		return;
+		return 0;
 
 	m->pg_init_count++;
 	m->pg_init_required = 0;
+
+	/* Check here to reset pg_init_required */
+	if (!m->current_pg)
+		return 0;
+
 	if (m->pg_init_delay_retry)
 		pg_init_delay = msecs_to_jiffies(m->pg_init_delay_msecs != DM_PG_INIT_DELAY_DEFAULT ?
 						 m->pg_init_delay_msecs : DM_PG_INIT_DELAY_MSECS);
@@ -275,6 +276,7 @@  static void __pg_init_all_paths(struct multipath *m)
 				       pg_init_delay))
 			m->pg_init_in_progress++;
 	}
+	return m->pg_init_in_progress;
 }
 
 static void __switch_pg(struct multipath *m, struct pgpath *pgpath)
@@ -404,7 +406,7 @@  static int map_io(struct multipath *m, struct request *clone,
 							      &pgpath->path,
 							      nr_bytes);
 		} else {
-			__pg_init_all_paths(m);
+			__pg_init_all_paths(m, 0);
 			r = DM_MAPIO_REQUEUE;
 		}
 	} else {
@@ -436,40 +438,13 @@  static int queue_if_no_path(struct multipath *m, unsigned queue_if_no_path,
 		m->saved_queue_if_no_path = queue_if_no_path;
 	m->queue_if_no_path = queue_if_no_path;
 	if (!m->queue_if_no_path)
-		queue_work(kmultipathd, &m->process_queued_ios);
+		dm_table_run_md_queue_async(m->ti->table);
 
 	spin_unlock_irqrestore(&m->lock, flags);
 
 	return 0;
 }
 
-static void process_queued_ios(struct work_struct *work)
-{
-	struct multipath *m =
-		container_of(work, struct multipath, process_queued_ios);
-	struct pgpath *pgpath = NULL;
-	unsigned must_queue = 1;
-	unsigned long flags;
-
-	spin_lock_irqsave(&m->lock, flags);
-
-	if (!m->current_pgpath)
-		__choose_pgpath(m, 0);
-
-	pgpath = m->current_pgpath;
-
-	if ((pgpath && !m->queue_io) ||
-	    (!pgpath && !m->queue_if_no_path))
-		must_queue = 0;
-
-	if (pgpath && m->pg_init_required)
-		__pg_init_all_paths(m);
-
-	spin_unlock_irqrestore(&m->lock, flags);
-	if (!must_queue)
-		dm_table_run_md_queue_async(m->ti->table);
-}
-
 /*
  * An event is triggered whenever a path is taken out of use.
  * Includes path failure and PG bypass.
@@ -1016,7 +991,7 @@  static int reinstate_path(struct pgpath *pgpath)
 
 	if (!m->nr_valid_paths++) {
 		m->current_pgpath = NULL;
-		queue_work(kmultipathd, &m->process_queued_ios);
+		dm_table_run_md_queue_async(m->ti->table);
 	} else if (m->hw_handler_name && (m->current_pg == pgpath->pg)) {
 		if (queue_work(kmpath_handlerd, &pgpath->activate_path.work))
 			m->pg_init_in_progress++;
@@ -1214,9 +1189,12 @@  static void pg_init_done(void *data, int errors)
 
 	if (!m->pg_init_required)
 		m->queue_io = 0;
-
-	m->pg_init_delay_retry = delay_retry;
-	queue_work(kmultipathd, &m->process_queued_ios);
+	else if (m->current_pg) {
+		m->pg_init_delay_retry = delay_retry;
+		/* Use a small delay to force the use of workqueue context */
+		if (__pg_init_all_paths(m, HZ/100))
+			goto out;
+	}
 
 	/*
 	 * Wake up any thread waiting to suspend.
@@ -1589,8 +1567,17 @@  static int multipath_ioctl(struct dm_target *ti, unsigned int cmd,
 	if (!r && ti->len != i_size_read(bdev->bd_inode) >> SECTOR_SHIFT)
 		r = scsi_verify_blk_ioctl(NULL, cmd);
 
-	if (r == -ENOTCONN && !fatal_signal_pending(current))
-		queue_work(kmultipathd, &m->process_queued_ios);
+	if (r == -ENOTCONN && !fatal_signal_pending(current)) {
+		spin_lock_irqsave(&m->lock, flags);
+		if (!m->current_pg) {
+			/* Path status changed, redo selection */
+			__choose_pgpath(m, 0);
+		}
+		if (m->pg_init_required)
+			__pg_init_all_paths(m, 0);
+		spin_unlock_irqrestore(&m->lock, flags);
+		dm_table_run_md_queue_async(m->ti->table);
+	}
 
 	return r ? : __blkdev_driver_ioctl(bdev, mode, cmd, arg);
 }