diff mbox

[REEEEPOST] bnx2i + bnx2fc: convert to generic workqueue (#3)

Message ID 20170517150153.2eamcjtc4frx2cae@linutronix.de (mailing list archive)
State Changes Requested, archived
Headers show

Commit Message

Sebastian Andrzej Siewior May 17, 2017, 3:01 p.m. UTC
On 2017-05-12 11:55:52 [-0400], Chad Dupuis wrote:
> Ok, I believe I've found the issue here.  The machine that the test has 
> performed on had many more possible CPUs than active CPUs.  We calculate 
> which CPU to the work time on in bnx2fc_process_new_cqes() like this:
> 
> unsigned int cpu = wqe % num_possible_cpus();
> 
> Since not all CPUs are active, we were trying to schedule work on 
> non-active CPUs which meant that the upper layers were never notified of 
> the completion.  With this change:
> 
> diff --git a/drivers/scsi/bnx2fc/bnx2fc_hwi.c 
> b/drivers/scsi/bnx2fc/bnx2fc_hwi.c
> index c2288d6..6f08e43 100644
> --- a/drivers/scsi/bnx2fc/bnx2fc_hwi.c
> +++ b/drivers/scsi/bnx2fc/bnx2fc_hwi.c
> @@ -1042,7 +1042,12 @@ static int bnx2fc_process_new_cqes(struct 
> bnx2fc_rport *tgt)
>                         /* Pending work request completion */
>                         struct bnx2fc_work *work = NULL;
>                         struct bnx2fc_percpu_s *fps = NULL;
> -                       unsigned int cpu = wqe % num_possible_cpus();
> +                       unsigned int cpu = wqe % num_active_cpus();
> +
> +                       /* Sanity check cpu to make sure it's online */
> +                       if (!cpu_active(cpu))
> +                               /* Default to CPU 0 */
> +                               cpu = 0;
>  
>                         work = bnx2fc_alloc_work(tgt, wqe);
>                         if (work) {
> 
> The issue is fixed.
> 
> Sebastian, can you add this change to your patch set?

Are sure that you can reliably reproduce the issue and fix it with the
patch above? Because this patch:


produces this output:
[    1.960313] Unregister pv shared memory for cpu 3
[    1.997000] kernel_init_freeable() num possible: 8
[    1.998073] kernel_init_freeable() num online  : 7
[    1.999125] kernel_init_freeable() 3 active    : 0
[    2.000337] workfn() 1

which means, CPU3 is offline and work runs on CPU1 instead. So it does
already what you suggest except that chances are, that it is not run on
CPU0 in this case (but on another CPU).

So it either takes some time for wait_for_completion(&io_req->tm_done);
to come back _or_ there is a leak somewhere where a complete() is
somehow missing / racing against something.

Sebastian

Comments

Dupuis, Chad May 17, 2017, 3:06 p.m. UTC | #1
On Wed, 17 May 2017, 11:01am, Sebastian Andrzej Siewior wrote:

> On 2017-05-12 11:55:52 [-0400], Chad Dupuis wrote:
> > Ok, I believe I've found the issue here.  The machine that the test has 
> > performed on had many more possible CPUs than active CPUs.  We calculate 
> > which CPU to the work time on in bnx2fc_process_new_cqes() like this:
> > 
> > unsigned int cpu = wqe % num_possible_cpus();
> > 
> > Since not all CPUs are active, we were trying to schedule work on 
> > non-active CPUs which meant that the upper layers were never notified of 
> > the completion.  With this change:
> > 
> > diff --git a/drivers/scsi/bnx2fc/bnx2fc_hwi.c 
> > b/drivers/scsi/bnx2fc/bnx2fc_hwi.c
> > index c2288d6..6f08e43 100644
> > --- a/drivers/scsi/bnx2fc/bnx2fc_hwi.c
> > +++ b/drivers/scsi/bnx2fc/bnx2fc_hwi.c
> > @@ -1042,7 +1042,12 @@ static int bnx2fc_process_new_cqes(struct 
> > bnx2fc_rport *tgt)
> >                         /* Pending work request completion */
> >                         struct bnx2fc_work *work = NULL;
> >                         struct bnx2fc_percpu_s *fps = NULL;
> > -                       unsigned int cpu = wqe % num_possible_cpus();
> > +                       unsigned int cpu = wqe % num_active_cpus();
> > +
> > +                       /* Sanity check cpu to make sure it's online */
> > +                       if (!cpu_active(cpu))
> > +                               /* Default to CPU 0 */
> > +                               cpu = 0;
> >  
> >                         work = bnx2fc_alloc_work(tgt, wqe);
> >                         if (work) {
> > 
> > The issue is fixed.
> > 
> > Sebastian, can you add this change to your patch set?
> 
> Are sure that you can reliably reproduce the issue and fix it with the
> patch above? Because this patch:

Yes it was reproducible each time we would start the FCoE interface.  With 
the above patch, our sanity test passed with no issues seen.

> 
> diff --git a/init/main.c b/init/main.c
> index b0c11cbf5ddf..483a971b1fd2 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -997,6 +997,12 @@ static int __ref kernel_init(void *unused)
>               "See Linux Documentation/admin-guide/init.rst for guidance.");
>  }
>  
> +static void workfn(struct work_struct *work)
> +{
> +       pr_err("%s() %d\n", __func__, raw_smp_processor_id());
> +}
> +static DECLARE_WORK(work, workfn);
> +
>  static noinline void __init kernel_init_freeable(void)
>  {
>         /*
> @@ -1040,6 +1046,15 @@ static noinline void __init kernel_init_freeable(void)
>  
>         (void) sys_dup(0);
>         (void) sys_dup(0);
> +       {
> +
> +               cpu_down(3);
> +               pr_err("%s() num possible: %d\n", __func__, num_possible_cpus());
> +               pr_err("%s() num online  : %d\n", __func__, num_online_cpus());
> +               pr_err("%s() 3 active    : %d\n", __func__, cpu_active(3));
> +               schedule_work_on(3, &work);
> +               ssleep(5);
> +       }
>         /*
>          * check if there is an early userspace init.  If yes, let it do all
>          * the work
> 
> produces this output:
> [    1.960313] Unregister pv shared memory for cpu 3
> [    1.997000] kernel_init_freeable() num possible: 8
> [    1.998073] kernel_init_freeable() num online  : 7
> [    1.999125] kernel_init_freeable() 3 active    : 0
> [    2.000337] workfn() 1
> 
> which means, CPU3 is offline and work runs on CPU1 instead. So it does
> already what you suggest except that chances are, that it is not run on
> CPU0 in this case (but on another CPU).
> 
> So it either takes some time for wait_for_completion(&io_req->tm_done);
> to come back _or_ there is a leak somewhere where a complete() is
> somehow missing / racing against something.
> 
> Sebastian
>
Sebastian Andrzej Siewior May 17, 2017, 3:07 p.m. UTC | #2
On 2017-05-17 17:01:53 [+0200], To Chad Dupuis wrote:
> On 2017-05-12 11:55:52 [-0400], Chad Dupuis wrote:
> > Ok, I believe I've found the issue here.  The machine that the test has 
> > performed on had many more possible CPUs than active CPUs.  We calculate 
> > which CPU to the work time on in bnx2fc_process_new_cqes() like this:
> > 
> > unsigned int cpu = wqe % num_possible_cpus();
> > 
> > Since not all CPUs are active, we were trying to schedule work on 
> > non-active CPUs which meant that the upper layers were never notified of 
> > the completion.  With this change:
> > 
> > diff --git a/drivers/scsi/bnx2fc/bnx2fc_hwi.c 
> > b/drivers/scsi/bnx2fc/bnx2fc_hwi.c
> > index c2288d6..6f08e43 100644
> > --- a/drivers/scsi/bnx2fc/bnx2fc_hwi.c
> > +++ b/drivers/scsi/bnx2fc/bnx2fc_hwi.c
> > @@ -1042,7 +1042,12 @@ static int bnx2fc_process_new_cqes(struct 
> > bnx2fc_rport *tgt)
> >                         /* Pending work request completion */
> >                         struct bnx2fc_work *work = NULL;
> >                         struct bnx2fc_percpu_s *fps = NULL;
> > -                       unsigned int cpu = wqe % num_possible_cpus();
> > +                       unsigned int cpu = wqe % num_active_cpus();
> > +
> > +                       /* Sanity check cpu to make sure it's online */
> > +                       if (!cpu_active(cpu))
> > +                               /* Default to CPU 0 */
> > +                               cpu = 0;
> >  
> >                         work = bnx2fc_alloc_work(tgt, wqe);
> >                         if (work) {
> > 
> > The issue is fixed.
> > 
> > Sebastian, can you add this change to your patch set?
> 
> Are sure that you can reliably reproduce the issue and fix it with the
> patch above? Because this patch:

oh. Okay. Now it clicked. It can fix the issue but it is still possible,
that CPU0 goes down between your check for it and schedule_work_on()
returning. Let my think of something…

Sebastian
Sebastian Andrzej Siewior May 17, 2017, 5:18 p.m. UTC | #3
On 2017-05-17 17:07:34 [+0200], To Chad Dupuis wrote:
> > > Sebastian, can you add this change to your patch set?
> > 
> > Are sure that you can reliably reproduce the issue and fix it with the
> > patch above? Because this patch:
> 
> oh. Okay. Now it clicked. It can fix the issue but it is still possible,
> that CPU0 goes down between your check for it and schedule_work_on()
> returning. Let my think of something…

Oh wait. I already thought about this: it may take bnx2fc_percpu from
CPU7 and run the worker on CPU3. The job isn't lost, because the worker
does:

| static void bnx2fc_percpu_io_work(struct work_struct *work_s)
| {
|         struct bnx2fc_percpu_s *p;
 …
|         p = container_of(work_s, struct bnx2fc_percpu_s, work);
| 
|         spin_lock_bh(&p->fp_work_lock);

and so will access bnx2fc_percpu of CPU7 running on CPU3. So I *think*
that your patch should make no difference and there should be no leak if
schedule_work_on() is invoked on an offline CPU.
 
Sebastian
diff mbox

Patch

diff --git a/init/main.c b/init/main.c
index b0c11cbf5ddf..483a971b1fd2 100644
--- a/init/main.c
+++ b/init/main.c
@@ -997,6 +997,12 @@  static int __ref kernel_init(void *unused)
              "See Linux Documentation/admin-guide/init.rst for guidance.");
 }
 
+static void workfn(struct work_struct *work)
+{
+       pr_err("%s() %d\n", __func__, raw_smp_processor_id());
+}
+static DECLARE_WORK(work, workfn);
+
 static noinline void __init kernel_init_freeable(void)
 {
        /*
@@ -1040,6 +1046,15 @@  static noinline void __init kernel_init_freeable(void)
 
        (void) sys_dup(0);
        (void) sys_dup(0);
+       {
+
+               cpu_down(3);
+               pr_err("%s() num possible: %d\n", __func__, num_possible_cpus());
+               pr_err("%s() num online  : %d\n", __func__, num_online_cpus());
+               pr_err("%s() 3 active    : %d\n", __func__, cpu_active(3));
+               schedule_work_on(3, &work);
+               ssleep(5);
+       }
        /*
         * check if there is an early userspace init.  If yes, let it do all
         * the work