Message ID | 20170517150153.2eamcjtc4frx2cae@linutronix.de (mailing list archive) |
---|---|
State | Changes Requested, archived |
Headers | show |
On Wed, 17 May 2017, 11:01am, Sebastian Andrzej Siewior wrote: > On 2017-05-12 11:55:52 [-0400], Chad Dupuis wrote: > > Ok, I believe I've found the issue here. The machine that the test has > > performed on had many more possible CPUs than active CPUs. We calculate > > which CPU to the work time on in bnx2fc_process_new_cqes() like this: > > > > unsigned int cpu = wqe % num_possible_cpus(); > > > > Since not all CPUs are active, we were trying to schedule work on > > non-active CPUs which meant that the upper layers were never notified of > > the completion. With this change: > > > > diff --git a/drivers/scsi/bnx2fc/bnx2fc_hwi.c > > b/drivers/scsi/bnx2fc/bnx2fc_hwi.c > > index c2288d6..6f08e43 100644 > > --- a/drivers/scsi/bnx2fc/bnx2fc_hwi.c > > +++ b/drivers/scsi/bnx2fc/bnx2fc_hwi.c > > @@ -1042,7 +1042,12 @@ static int bnx2fc_process_new_cqes(struct > > bnx2fc_rport *tgt) > > /* Pending work request completion */ > > struct bnx2fc_work *work = NULL; > > struct bnx2fc_percpu_s *fps = NULL; > > - unsigned int cpu = wqe % num_possible_cpus(); > > + unsigned int cpu = wqe % num_active_cpus(); > > + > > + /* Sanity check cpu to make sure it's online */ > > + if (!cpu_active(cpu)) > > + /* Default to CPU 0 */ > > + cpu = 0; > > > > work = bnx2fc_alloc_work(tgt, wqe); > > if (work) { > > > > The issue is fixed. > > > > Sebastian, can you add this change to your patch set? > > Are sure that you can reliably reproduce the issue and fix it with the > patch above? Because this patch: Yes it was reproducible each time we would start the FCoE interface. With the above patch, our sanity test passed with no issues seen. > > diff --git a/init/main.c b/init/main.c > index b0c11cbf5ddf..483a971b1fd2 100644 > --- a/init/main.c > +++ b/init/main.c > @@ -997,6 +997,12 @@ static int __ref kernel_init(void *unused) > "See Linux Documentation/admin-guide/init.rst for guidance."); > } > > +static void workfn(struct work_struct *work) > +{ > + pr_err("%s() %d\n", __func__, raw_smp_processor_id()); > +} > +static DECLARE_WORK(work, workfn); > + > static noinline void __init kernel_init_freeable(void) > { > /* > @@ -1040,6 +1046,15 @@ static noinline void __init kernel_init_freeable(void) > > (void) sys_dup(0); > (void) sys_dup(0); > + { > + > + cpu_down(3); > + pr_err("%s() num possible: %d\n", __func__, num_possible_cpus()); > + pr_err("%s() num online : %d\n", __func__, num_online_cpus()); > + pr_err("%s() 3 active : %d\n", __func__, cpu_active(3)); > + schedule_work_on(3, &work); > + ssleep(5); > + } > /* > * check if there is an early userspace init. If yes, let it do all > * the work > > produces this output: > [ 1.960313] Unregister pv shared memory for cpu 3 > [ 1.997000] kernel_init_freeable() num possible: 8 > [ 1.998073] kernel_init_freeable() num online : 7 > [ 1.999125] kernel_init_freeable() 3 active : 0 > [ 2.000337] workfn() 1 > > which means, CPU3 is offline and work runs on CPU1 instead. So it does > already what you suggest except that chances are, that it is not run on > CPU0 in this case (but on another CPU). > > So it either takes some time for wait_for_completion(&io_req->tm_done); > to come back _or_ there is a leak somewhere where a complete() is > somehow missing / racing against something. > > Sebastian >
On 2017-05-17 17:01:53 [+0200], To Chad Dupuis wrote: > On 2017-05-12 11:55:52 [-0400], Chad Dupuis wrote: > > Ok, I believe I've found the issue here. The machine that the test has > > performed on had many more possible CPUs than active CPUs. We calculate > > which CPU to the work time on in bnx2fc_process_new_cqes() like this: > > > > unsigned int cpu = wqe % num_possible_cpus(); > > > > Since not all CPUs are active, we were trying to schedule work on > > non-active CPUs which meant that the upper layers were never notified of > > the completion. With this change: > > > > diff --git a/drivers/scsi/bnx2fc/bnx2fc_hwi.c > > b/drivers/scsi/bnx2fc/bnx2fc_hwi.c > > index c2288d6..6f08e43 100644 > > --- a/drivers/scsi/bnx2fc/bnx2fc_hwi.c > > +++ b/drivers/scsi/bnx2fc/bnx2fc_hwi.c > > @@ -1042,7 +1042,12 @@ static int bnx2fc_process_new_cqes(struct > > bnx2fc_rport *tgt) > > /* Pending work request completion */ > > struct bnx2fc_work *work = NULL; > > struct bnx2fc_percpu_s *fps = NULL; > > - unsigned int cpu = wqe % num_possible_cpus(); > > + unsigned int cpu = wqe % num_active_cpus(); > > + > > + /* Sanity check cpu to make sure it's online */ > > + if (!cpu_active(cpu)) > > + /* Default to CPU 0 */ > > + cpu = 0; > > > > work = bnx2fc_alloc_work(tgt, wqe); > > if (work) { > > > > The issue is fixed. > > > > Sebastian, can you add this change to your patch set? > > Are sure that you can reliably reproduce the issue and fix it with the > patch above? Because this patch: oh. Okay. Now it clicked. It can fix the issue but it is still possible, that CPU0 goes down between your check for it and schedule_work_on() returning. Let my think of something… Sebastian
On 2017-05-17 17:07:34 [+0200], To Chad Dupuis wrote: > > > Sebastian, can you add this change to your patch set? > > > > Are sure that you can reliably reproduce the issue and fix it with the > > patch above? Because this patch: > > oh. Okay. Now it clicked. It can fix the issue but it is still possible, > that CPU0 goes down between your check for it and schedule_work_on() > returning. Let my think of something… Oh wait. I already thought about this: it may take bnx2fc_percpu from CPU7 and run the worker on CPU3. The job isn't lost, because the worker does: | static void bnx2fc_percpu_io_work(struct work_struct *work_s) | { | struct bnx2fc_percpu_s *p; … | p = container_of(work_s, struct bnx2fc_percpu_s, work); | | spin_lock_bh(&p->fp_work_lock); and so will access bnx2fc_percpu of CPU7 running on CPU3. So I *think* that your patch should make no difference and there should be no leak if schedule_work_on() is invoked on an offline CPU. Sebastian
diff --git a/init/main.c b/init/main.c index b0c11cbf5ddf..483a971b1fd2 100644 --- a/init/main.c +++ b/init/main.c @@ -997,6 +997,12 @@ static int __ref kernel_init(void *unused) "See Linux Documentation/admin-guide/init.rst for guidance."); } +static void workfn(struct work_struct *work) +{ + pr_err("%s() %d\n", __func__, raw_smp_processor_id()); +} +static DECLARE_WORK(work, workfn); + static noinline void __init kernel_init_freeable(void) { /* @@ -1040,6 +1046,15 @@ static noinline void __init kernel_init_freeable(void) (void) sys_dup(0); (void) sys_dup(0); + { + + cpu_down(3); + pr_err("%s() num possible: %d\n", __func__, num_possible_cpus()); + pr_err("%s() num online : %d\n", __func__, num_online_cpus()); + pr_err("%s() 3 active : %d\n", __func__, cpu_active(3)); + schedule_work_on(3, &work); + ssleep(5); + } /* * check if there is an early userspace init. If yes, let it do all * the work