Message ID | 1391511280-29325-5-git-send-email-hare@suse.de (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
On 02/04/14 19:54, Hannes Reinecke wrote: > We only need to take care to add a small delay when calling > __pg_init_all_paths() to move processing off to a workqueue; otherwise > pg_init_done() might end up calling scsi_dh_activate() directly, which > might use non-atomic memory allocations or issue I/O. Hi Hannes, could you tell me how "might end up calling scsi_dh_active()" happens? queue_delayed_work() queue_delayed_work_on() __queue_delayed_work() if (!delay) __queue_work() get_work_pool() insert_work() set_work_pwq() get_pwq() if (__need_more_worker()) wake_up_worker() queue_work() doesn't execute the work itself. What am I missing?
On 02/04/2014 12:26 PM, Junichi Nomura wrote: > On 02/04/14 19:54, Hannes Reinecke wrote: >> We only need to take care to add a small delay when calling >> __pg_init_all_paths() to move processing off to a workqueue; otherwise >> pg_init_done() might end up calling scsi_dh_activate() directly, which >> might use non-atomic memory allocations or issue I/O. > > Hi Hannes, > > could you tell me how "might end up calling scsi_dh_active()" happens? > > queue_delayed_work() > queue_delayed_work_on() > __queue_delayed_work() > if (!delay) > __queue_work() > get_work_pool() > insert_work() > set_work_pwq() > get_pwq() > if (__need_more_worker()) > wake_up_worker() > > queue_work() doesn't execute the work itself. > > What am I missing? > As mentioned, I stumbled across the same issue when developing the asynchronous SCSI aborts. I'll see to have it recreated with this patchset. Cheers, Hannes
On Tue, Feb 04 2014 at 6:31am -0500, Hannes Reinecke <hare@suse.de> wrote: > On 02/04/2014 12:26 PM, Junichi Nomura wrote: > > On 02/04/14 19:54, Hannes Reinecke wrote: > >> We only need to take care to add a small delay when calling > >> __pg_init_all_paths() to move processing off to a workqueue; otherwise > >> pg_init_done() might end up calling scsi_dh_activate() directly, which > >> might use non-atomic memory allocations or issue I/O. > > > > Hi Hannes, > > > > could you tell me how "might end up calling scsi_dh_active()" happens? > > > > queue_delayed_work() > > queue_delayed_work_on() > > __queue_delayed_work() > > if (!delay) > > __queue_work() > > get_work_pool() > > insert_work() > > set_work_pwq() > > get_pwq() > > if (__need_more_worker()) > > wake_up_worker() > > > > queue_work() doesn't execute the work itself. > > > > What am I missing? > > > As mentioned, I stumbled across the same issue when developing the > asynchronous SCSI aborts. I'll see to have it recreated with this > patchset. Just to verify, this seems to be the only outstanding question for this patchset? What value are you using for HZ? If this portion of the change does turn out to be meaningul: Rather than tieing to HZ should we just use an explicitly non-zero value for __pg_init_all_paths()'s @min_delay? -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On 02/10/2014 02:30 PM, Mike Snitzer wrote: > On Tue, Feb 04 2014 at 6:31am -0500, > Hannes Reinecke <hare@suse.de> wrote: > >> On 02/04/2014 12:26 PM, Junichi Nomura wrote: >>> On 02/04/14 19:54, Hannes Reinecke wrote: >>>> We only need to take care to add a small delay when calling >>>> __pg_init_all_paths() to move processing off to a workqueue; otherwise >>>> pg_init_done() might end up calling scsi_dh_activate() directly, which >>>> might use non-atomic memory allocations or issue I/O. >>> >>> Hi Hannes, >>> >>> could you tell me how "might end up calling scsi_dh_active()" happens? >>> >>> queue_delayed_work() >>> queue_delayed_work_on() >>> __queue_delayed_work() >>> if (!delay) >>> __queue_work() >>> get_work_pool() >>> insert_work() >>> set_work_pwq() >>> get_pwq() >>> if (__need_more_worker()) >>> wake_up_worker() >>> >>> queue_work() doesn't execute the work itself. >>> >>> What am I missing? >>> >> As mentioned, I stumbled across the same issue when developing the >> asynchronous SCSI aborts. I'll see to have it recreated with this >> patchset. > > Just to verify, this seems to be the only outstanding question for this > patchset? > > What value are you using for HZ? If this portion of the change does > turn out to be meaningul: Rather than tieing to HZ should we just use an > explicitly non-zero value for __pg_init_all_paths()'s @min_delay? > The actual amount here is irrelevant, as long as it's non-zero. It's just there to force execution of the work item off the current thread. Cheers, Hannes
On Tue, Feb 11 2014 at 4:46am -0500, Hannes Reinecke <hare@suse.de> wrote: > On 02/10/2014 02:30 PM, Mike Snitzer wrote: > > > > Just to verify, this seems to be the only outstanding question for this > > patchset? > > > > What value are you using for HZ? If this portion of the change does > > turn out to be meaningul: Rather than tieing to HZ should we just use an > > explicitly non-zero value for __pg_init_all_paths()'s @min_delay? > > > The actual amount here is irrelevant, as long as it's non-zero. > It's just there to force execution of the work item off the current > thread. I'm aware we just need a non-zero value. My concern, as originally raised by Junichi in an earlier reply when you had it as HZ/50, is that the value could be 0 if HZ is really small. While unlikely I see no point allowing the variable nature of HZ compromise passing a non-zero value here. Best to just be explicit by passing 1 or something. All said, the question of why this is actually needed remains. I trust you're working on answering that via reproducer (by not forcing the use of workqueue context)? -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Tue, Feb 11 2014 at 1:03pm -0500, Hannes Reinecke <hare@suse.de> wrote: > On 02/11/2014 04:55 PM, Mike Snitzer wrote: > >On Tue, Feb 11 2014 at 4:46am -0500, > >Hannes Reinecke <hare@suse.de> wrote: > > > >>On 02/10/2014 02:30 PM, Mike Snitzer wrote: > >>> > >>>Just to verify, this seems to be the only outstanding question for this > >>>patchset? > >>> > >>>What value are you using for HZ? If this portion of the change does > >>>turn out to be meaningul: Rather than tieing to HZ should we just use an > >>>explicitly non-zero value for __pg_init_all_paths()'s @min_delay? > >>> > >>The actual amount here is irrelevant, as long as it's non-zero. > >>It's just there to force execution of the work item off the current > >>thread. > > > >I'm aware we just need a non-zero value. My concern, as originally > >raised by Junichi in an earlier reply when you had it as HZ/50, is that > >the value could be 0 if HZ is really small. While unlikely I see no > >point allowing the variable nature of HZ compromise passing a non-zero > >value here. Best to just be explicit by passing 1 or something. > > > >All said, the question of why this is actually needed remains. I trust > >you're working on answering that via reproducer (by not forcing the use > >of workqueue context)? > > > Precisely. > > But as this is a bit hard to trigger it might take some time. > (you'll only be hitting this issue if you have to retry > scsi_dh_activate, so you'll need to trigger this somehow). OK. > I hope to get it done this week. > Is there any deadline which I might miss with that? No, that'll be great. We have some time until the 3.15 merge window opens. Thanks, Mike -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On 02/11/2014 04:55 PM, Mike Snitzer wrote: > On Tue, Feb 11 2014 at 4:46am -0500, > Hannes Reinecke <hare@suse.de> wrote: > >> On 02/10/2014 02:30 PM, Mike Snitzer wrote: >>> >>> Just to verify, this seems to be the only outstanding question for this >>> patchset? >>> >>> What value are you using for HZ? If this portion of the change does >>> turn out to be meaningul: Rather than tieing to HZ should we just use an >>> explicitly non-zero value for __pg_init_all_paths()'s @min_delay? >>> >> The actual amount here is irrelevant, as long as it's non-zero. >> It's just there to force execution of the work item off the current >> thread. > > I'm aware we just need a non-zero value. My concern, as originally > raised by Junichi in an earlier reply when you had it as HZ/50, is that > the value could be 0 if HZ is really small. While unlikely I see no > point allowing the variable nature of HZ compromise passing a non-zero > value here. Best to just be explicit by passing 1 or something. > > All said, the question of why this is actually needed remains. I trust > you're working on answering that via reproducer (by not forcing the use > of workqueue context)? > Precisely. But as this is a bit hard to trigger it might take some time. (you'll only be hitting this issue if you have to retry scsi_dh_activate, so you'll need to trigger this somehow). I hope to get it done this week. Is there any deadline which I might miss with that? Cheers, Hannes
On 02/10/14 22:30, Mike Snitzer wrote: > On Tue, Feb 04 2014 at 6:31am -0500, > Hannes Reinecke <hare@suse.de> wrote: > >> On 02/04/2014 12:26 PM, Junichi Nomura wrote: >>> On 02/04/14 19:54, Hannes Reinecke wrote: >>>> We only need to take care to add a small delay when calling >>>> __pg_init_all_paths() to move processing off to a workqueue; otherwise >>>> pg_init_done() might end up calling scsi_dh_activate() directly, which >>>> might use non-atomic memory allocations or issue I/O. >>> >>> Hi Hannes, >>> >>> could you tell me how "might end up calling scsi_dh_active()" happens? >>> >>> queue_delayed_work() >>> queue_delayed_work_on() >>> __queue_delayed_work() >>> if (!delay) >>> __queue_work() >>> get_work_pool() >>> insert_work() >>> set_work_pwq() >>> get_pwq() >>> if (__need_more_worker()) >>> wake_up_worker() >>> >>> queue_work() doesn't execute the work itself. >>> >>> What am I missing? >>> >> As mentioned, I stumbled across the same issue when developing the >> asynchronous SCSI aborts. I'll see to have it recreated with this >> patchset. > > Just to verify, this seems to be the only outstanding question for this > patchset? Hi Mike, Hannes, Yes, this is the only remaining question from me. For other parts of this patch and patchset, issues I pointed were solved.
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index 0c15847..0355adc 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -93,8 +93,6 @@ struct multipath { unsigned pg_init_count; /* Number of times pg_init called */ unsigned pg_init_delay_msecs; /* Number of msecs before pg_init retry */ - struct work_struct process_queued_ios; - struct work_struct trigger_event; /* @@ -119,7 +117,6 @@ typedef int (*action_fn) (struct pgpath *pgpath); static struct kmem_cache *_mpio_cache; static struct workqueue_struct *kmultipathd, *kmpath_handlerd; -static void process_queued_ios(struct work_struct *work); static void trigger_event(struct work_struct *work); static void activate_path(struct work_struct *work); static int __pgpath_busy(struct pgpath *pgpath); @@ -197,7 +194,6 @@ static struct multipath *alloc_multipath(struct dm_target *ti) spin_lock_init(&m->lock); m->queue_io = 1; m->pg_init_delay_msecs = DM_PG_INIT_DELAY_DEFAULT; - INIT_WORK(&m->process_queued_ios, process_queued_ios); INIT_WORK(&m->trigger_event, trigger_event); init_waitqueue_head(&m->pg_init_wait); mutex_init(&m->work_mutex); @@ -254,16 +250,21 @@ static void clear_mapinfo(struct multipath *m, union map_info *info) * Path selection *-----------------------------------------------*/ -static void __pg_init_all_paths(struct multipath *m) +static int __pg_init_all_paths(struct multipath *m, unsigned long min_delay) { struct pgpath *pgpath; - unsigned long pg_init_delay = 0; + unsigned long pg_init_delay = min_delay; if (m->pg_init_in_progress || m->pg_init_disabled) - return; + return 0; m->pg_init_count++; m->pg_init_required = 0; + + /* Check here to reset pg_init_required */ + if (!m->current_pg) + return 0; + if (m->pg_init_delay_retry) pg_init_delay = msecs_to_jiffies(m->pg_init_delay_msecs != DM_PG_INIT_DELAY_DEFAULT ? m->pg_init_delay_msecs : DM_PG_INIT_DELAY_MSECS); @@ -275,6 +276,7 @@ static void __pg_init_all_paths(struct multipath *m) pg_init_delay)) m->pg_init_in_progress++; } + return m->pg_init_in_progress; } static void __switch_pg(struct multipath *m, struct pgpath *pgpath) @@ -404,7 +406,7 @@ static int map_io(struct multipath *m, struct request *clone, &pgpath->path, nr_bytes); } else { - __pg_init_all_paths(m); + __pg_init_all_paths(m, 0); r = DM_MAPIO_REQUEUE; } } else { @@ -436,40 +438,13 @@ static int queue_if_no_path(struct multipath *m, unsigned queue_if_no_path, m->saved_queue_if_no_path = queue_if_no_path; m->queue_if_no_path = queue_if_no_path; if (!m->queue_if_no_path) - queue_work(kmultipathd, &m->process_queued_ios); + dm_table_run_md_queue_async(m->ti->table); spin_unlock_irqrestore(&m->lock, flags); return 0; } -static void process_queued_ios(struct work_struct *work) -{ - struct multipath *m = - container_of(work, struct multipath, process_queued_ios); - struct pgpath *pgpath = NULL; - unsigned must_queue = 1; - unsigned long flags; - - spin_lock_irqsave(&m->lock, flags); - - if (!m->current_pgpath) - __choose_pgpath(m, 0); - - pgpath = m->current_pgpath; - - if ((pgpath && !m->queue_io) || - (!pgpath && !m->queue_if_no_path)) - must_queue = 0; - - if (pgpath && m->pg_init_required) - __pg_init_all_paths(m); - - spin_unlock_irqrestore(&m->lock, flags); - if (!must_queue) - dm_table_run_md_queue_async(m->ti->table); -} - /* * An event is triggered whenever a path is taken out of use. * Includes path failure and PG bypass. @@ -1016,7 +991,7 @@ static int reinstate_path(struct pgpath *pgpath) if (!m->nr_valid_paths++) { m->current_pgpath = NULL; - queue_work(kmultipathd, &m->process_queued_ios); + dm_table_run_md_queue_async(m->ti->table); } else if (m->hw_handler_name && (m->current_pg == pgpath->pg)) { if (queue_work(kmpath_handlerd, &pgpath->activate_path.work)) m->pg_init_in_progress++; @@ -1214,9 +1189,12 @@ static void pg_init_done(void *data, int errors) if (!m->pg_init_required) m->queue_io = 0; - - m->pg_init_delay_retry = delay_retry; - queue_work(kmultipathd, &m->process_queued_ios); + else if (m->current_pg) { + m->pg_init_delay_retry = delay_retry; + /* Use a small delay to force the use of workqueue context */ + if (__pg_init_all_paths(m, HZ/100)) + goto out; + } /* * Wake up any thread waiting to suspend. @@ -1589,8 +1567,17 @@ static int multipath_ioctl(struct dm_target *ti, unsigned int cmd, if (!r && ti->len != i_size_read(bdev->bd_inode) >> SECTOR_SHIFT) r = scsi_verify_blk_ioctl(NULL, cmd); - if (r == -ENOTCONN && !fatal_signal_pending(current)) - queue_work(kmultipathd, &m->process_queued_ios); + if (r == -ENOTCONN && !fatal_signal_pending(current)) { + spin_lock_irqsave(&m->lock, flags); + if (!m->current_pg) { + /* Path status changed, redo selection */ + __choose_pgpath(m, 0); + } + if (m->pg_init_required) + __pg_init_all_paths(m, 0); + spin_unlock_irqrestore(&m->lock, flags); + dm_table_run_md_queue_async(m->ti->table); + } return r ? : __blkdev_driver_ioctl(bdev, mode, cmd, arg); }