diff mbox

[1/2] IB/hfi1: Use preempt_{dis,en}able_nort()

Message ID 20171003154920.31566-2-acme@kernel.org (mailing list archive)
State Not Applicable
Headers show

Commit Message

Arnaldo Carvalho de Melo Oct. 3, 2017, 3:49 p.m. UTC
From: Arnaldo Carvalho de Melo <acme@redhat.com>

sc_buffer_alloc() disables preemption that will be reenabled by either
pio_copy() or seg_pio_copy_end(). But before disabling preemption it
grabs a spin lock that will be dropped after it disables preemption,
which ends up triggering a warning in migrate_disable() later on.

    spin_lock_irqsave(&sc->alloc_lock)
      migrate_disable() ++p->migrate_disable -> 2
    preempt_disable()
    spin_unlock_irqrestore(&sc->alloc_lock)
      migrate_enable() in_atomic(), so just returns, migrate_disable stays at 2
    spin_lock_irqsave(some other lock) -> b00m

And the WARN_ON code ends up tripping over this over and over in
log_store().

Sequence captured via ftrace_dump_on_oops + crash utility 'dmesg'
command.

[512258.613862] sm-3297 16 .....11 359465349134644: sc_buffer_alloc <-hfi1_verbs_send_pio
[512258.613876] sm-3297 16 .....11 359465349134719: migrate_disable <-sc_buffer_alloc
[512258.613890] sm-3297 16 .....12 359465349134798: rt_spin_lock <-sc_buffer_alloc
[512258.613903] sm-3297 16 ....112 359465349135481: rt_spin_unlock <-sc_buffer_alloc
[512258.613916] sm-3297 16 ....112 359465349135556: migrate_enable <-sc_buffer_alloc
[512258.613935] sm-3297 16 ....112 359465349135788: seg_pio_copy_start <-hfi1_verbs_send_pio
[512258.613954] sm-3297 16 ....112 359465349136273: update_sge <-hfi1_verbs_send_pio
[512258.613981] sm-3297 16 ....112 359465349136373: seg_pio_copy_mid <-hfi1_verbs_send_pio
[512258.613999] sm-3297 16 ....112 359465349136873: update_sge <-hfi1_verbs_send_pio
[512258.614017] sm-3297 16 ....112 359465349136956: seg_pio_copy_mid <-hfi1_verbs_send_pio
[512258.614035] sm-3297 16 ....112 359465349137221: seg_pio_copy_end <-hfi1_verbs_send_pio
[512258.614048] sm-3297 16 .....12 359465349137360: migrate_disable <-hfi1_verbs_send_pio
[512258.614065] sm-3297 16 .....12 359465349137476: warn_slowpath_null <-migrate_disable
[512258.614081] sm-3297 16 .....12 359465349137564: __warn <-warn_slowpath_null
[512258.614088] sm-3297 16 .....12 359465349137958: printk <-__warn
[512258.614096] sm-3297 16 .....12 359465349138055: vprintk_default <-printk
[512258.614104] sm-3297 16 .....12 359465349138144: vprintk_emit <-vprintk_default
[512258.614111] sm-3297 16 d....12 359465349138312: _raw_spin_lock <-vprintk_emit
[512258.614119] sm-3297 16 d...112 359465349138789: log_store <-vprintk_emit
[512258.614127] sm-3297 16 .....12 359465349139068: migrate_disable <-vprintk_emit

According to a discussion (see Link: below) on the linux-rt-users
mailing list, this locking is done for performance reasons, not for
correctness, so use the _nort() variants to avoid the above problem.

Suggested-by: Julia Cartwright <julia@ni.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Dean Luick <dean.luick@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Kaike Wan <kaike.wan@intel.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: linux-rdma@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <sebastian.siewior@linutronix.de>
Cc: Sebastian Sanchez <sebastian.sanchez@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170926210045.GO29872@jcartwri.amer.corp.natinst.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 drivers/infiniband/hw/hfi1/pio.c      | 2 +-
 drivers/infiniband/hw/hfi1/pio_copy.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

Comments

Julia Cartwright Oct. 5, 2017, 2:17 p.m. UTC | #1
On Tue, Oct 03, 2017 at 12:49:19PM -0300, Arnaldo Carvalho de Melo wrote:
> From: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> sc_buffer_alloc() disables preemption that will be reenabled by either
> pio_copy() or seg_pio_copy_end(). But before disabling preemption it
> grabs a spin lock that will be dropped after it disables preemption,
> which ends up triggering a warning in migrate_disable() later on.
> 
>     spin_lock_irqsave(&sc->alloc_lock)
>       migrate_disable() ++p->migrate_disable -> 2
>     preempt_disable()
>     spin_unlock_irqrestore(&sc->alloc_lock)
>       migrate_enable() in_atomic(), so just returns, migrate_disable stays at 2
>     spin_lock_irqsave(some other lock) -> b00m
> 
> And the WARN_ON code ends up tripping over this over and over in
> log_store().
> 
> Sequence captured via ftrace_dump_on_oops + crash utility 'dmesg'
> command.
> 
> [512258.613862] sm-3297 16 .....11 359465349134644: sc_buffer_alloc <-hfi1_verbs_send_pio
> [512258.613876] sm-3297 16 .....11 359465349134719: migrate_disable <-sc_buffer_alloc
> [512258.613890] sm-3297 16 .....12 359465349134798: rt_spin_lock <-sc_buffer_alloc
> [512258.613903] sm-3297 16 ....112 359465349135481: rt_spin_unlock <-sc_buffer_alloc
> [512258.613916] sm-3297 16 ....112 359465349135556: migrate_enable <-sc_buffer_alloc
> [512258.613935] sm-3297 16 ....112 359465349135788: seg_pio_copy_start <-hfi1_verbs_send_pio
> [512258.613954] sm-3297 16 ....112 359465349136273: update_sge <-hfi1_verbs_send_pio
> [512258.613981] sm-3297 16 ....112 359465349136373: seg_pio_copy_mid <-hfi1_verbs_send_pio
> [512258.613999] sm-3297 16 ....112 359465349136873: update_sge <-hfi1_verbs_send_pio
> [512258.614017] sm-3297 16 ....112 359465349136956: seg_pio_copy_mid <-hfi1_verbs_send_pio
> [512258.614035] sm-3297 16 ....112 359465349137221: seg_pio_copy_end <-hfi1_verbs_send_pio
> [512258.614048] sm-3297 16 .....12 359465349137360: migrate_disable <-hfi1_verbs_send_pio
> [512258.614065] sm-3297 16 .....12 359465349137476: warn_slowpath_null <-migrate_disable
> [512258.614081] sm-3297 16 .....12 359465349137564: __warn <-warn_slowpath_null
> [512258.614088] sm-3297 16 .....12 359465349137958: printk <-__warn
> [512258.614096] sm-3297 16 .....12 359465349138055: vprintk_default <-printk
> [512258.614104] sm-3297 16 .....12 359465349138144: vprintk_emit <-vprintk_default
> [512258.614111] sm-3297 16 d....12 359465349138312: _raw_spin_lock <-vprintk_emit
> [512258.614119] sm-3297 16 d...112 359465349138789: log_store <-vprintk_emit
> [512258.614127] sm-3297 16 .....12 359465349139068: migrate_disable <-vprintk_emit
> 
> According to a discussion (see Link: below) on the linux-rt-users
> mailing list, this locking is done for performance reasons, not for
> correctness, so use the _nort() variants to avoid the above problem.
> 
> Suggested-by: Julia Cartwright <julia@ni.com>
> Cc: Clark Williams <williams@redhat.com>
> Cc: Dean Luick <dean.luick@intel.com>
> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
> Cc: Doug Ledford <dledford@redhat.com>
> Cc: Kaike Wan <kaike.wan@intel.com>
> Cc: Leon Romanovsky <leonro@mellanox.com>
> Cc: linux-rdma@vger.kernel.org
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Sebastian Andrzej Siewior <sebastian.siewior@linutronix.de>
> Cc: Sebastian Sanchez <sebastian.sanchez@intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Link: https://urldefense.proofpoint.com/v2/url?u=http-3A__lkml.kernel.org_r_20170926210045.GO29872-40jcartwri.amer.corp.natinst.com&d=DwIBaQ&c=I_0YwoKy7z5LMTVdyO6YCiE2uzI1jjZZuIPelcSjixA&r=cAXq_8W9Othb2h8ZcWv8Vw&m=zzKtWJ595HB0jyuiFic0ZEkpmmjvGRXJHkGF27oyvCI&s=J4_Al0cbvQ9PCM3VbqzJ6apmpSZI9Xx7eq6Gcfucp24&e= 
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> ---
>  drivers/infiniband/hw/hfi1/pio.c      | 2 +-
>  drivers/infiniband/hw/hfi1/pio_copy.c | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/hfi1/pio.c b/drivers/infiniband/hw/hfi1/pio.c
> index 615be68e40b3..3a30bde9a07b 100644
> --- a/drivers/infiniband/hw/hfi1/pio.c
> +++ b/drivers/infiniband/hw/hfi1/pio.c
> @@ -1421,7 +1421,7 @@ struct pio_buf *sc_buffer_alloc(struct send_context *sc, u32 dw_len,
>  
>  	/* there is enough room */
>  
> -	preempt_disable();
> +	preempt_disable_nort();
>  	this_cpu_inc(*sc->buffers_allocated);

Have you tried this on RT w/ CONFIG_DEBUG_PREEMPT?  I believe that the
this_cpu_* operations perform a preemption check, which we'd trip.

You may also have to change these to the non-preempt checked variants.

   Julia
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Gleixner Oct. 5, 2017, 3:27 p.m. UTC | #2
On Thu, 5 Oct 2017, Julia Cartwright wrote:
> On Tue, Oct 03, 2017 at 12:49:19PM -0300, Arnaldo Carvalho de Melo wrote:
> > -	preempt_disable();
> > +	preempt_disable_nort();
> >  	this_cpu_inc(*sc->buffers_allocated);
> 
> Have you tried this on RT w/ CONFIG_DEBUG_PREEMPT?  I believe that the
> this_cpu_* operations perform a preemption check, which we'd trip.

Good point. Changing this to migrate_disable() would do the trick.

Thanks,

	tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Julia Cartwright Oct. 5, 2017, 3:37 p.m. UTC | #3
On Thu, Oct 05, 2017 at 05:27:30PM +0200, Thomas Gleixner wrote:
> On Thu, 5 Oct 2017, Julia Cartwright wrote:
> > On Tue, Oct 03, 2017 at 12:49:19PM -0300, Arnaldo Carvalho de Melo wrote:
> > > -	preempt_disable();
> > > +	preempt_disable_nort();
> > >  	this_cpu_inc(*sc->buffers_allocated);
> > 
> > Have you tried this on RT w/ CONFIG_DEBUG_PREEMPT?  I believe that the
> > this_cpu_* operations perform a preemption check, which we'd trip.
> 
> Good point. Changing this to migrate_disable() would do the trick.

Wouldn't we still trip the preempt check even with migration disabled?
In another thread I asked the same question: should the preemption
checks here be converted to migration-checks in RT?

   Julia
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steven Rostedt Oct. 5, 2017, 3:55 p.m. UTC | #4
On Thu, 5 Oct 2017 10:37:59 -0500
Julia Cartwright <julia@ni.com> wrote:

> On Thu, Oct 05, 2017 at 05:27:30PM +0200, Thomas Gleixner wrote:
> > On Thu, 5 Oct 2017, Julia Cartwright wrote:  
> > > On Tue, Oct 03, 2017 at 12:49:19PM -0300, Arnaldo Carvalho de Melo wrote:  
> > > > -	preempt_disable();
> > > > +	preempt_disable_nort();
> > > >  	this_cpu_inc(*sc->buffers_allocated);  
> > > 
> > > Have you tried this on RT w/ CONFIG_DEBUG_PREEMPT?  I believe that the
> > > this_cpu_* operations perform a preemption check, which we'd trip.  
> > 
> > Good point. Changing this to migrate_disable() would do the trick.  
> 
> Wouldn't we still trip the preempt check even with migration disabled?
> In another thread I asked the same question: should the preemption
> checks here be converted to migration-checks in RT?

Is it a "preemption check"? Getting a cpu # should only care about
migration. This isn't the same as a rcu_sched check is it? That does
care about preemption.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Julia Cartwright Oct. 5, 2017, 4:05 p.m. UTC | #5
On Thu, Oct 05, 2017 at 11:55:39AM -0400, Steven Rostedt wrote:
> On Thu, 5 Oct 2017 10:37:59 -0500
> Julia Cartwright <julia@ni.com> wrote:
> 
> > On Thu, Oct 05, 2017 at 05:27:30PM +0200, Thomas Gleixner wrote:
> > > On Thu, 5 Oct 2017, Julia Cartwright wrote:  
> > > > On Tue, Oct 03, 2017 at 12:49:19PM -0300, Arnaldo Carvalho de Melo wrote:  
> > > > > -	preempt_disable();
> > > > > +	preempt_disable_nort();
> > > > >  	this_cpu_inc(*sc->buffers_allocated);  
> > > > 
> > > > Have you tried this on RT w/ CONFIG_DEBUG_PREEMPT?  I believe that the
> > > > this_cpu_* operations perform a preemption check, which we'd trip.  
> > > 
> > > Good point. Changing this to migrate_disable() would do the trick.  
> > 
> > Wouldn't we still trip the preempt check even with migration disabled?
> > In another thread I asked the same question: should the preemption
> > checks here be converted to migration-checks in RT?
> 
> Is it a "preemption check"?

Sorry if I was unclear, more precisely: the this_cpu_* family of
accessors, w/ CONFIG_DEBUG_PREEMPT currently spits out a warning when
the caller is invoked in a context where preemption is enabled.

The check is shared w/ the smp_processor_id() check, as implemented in
lib/smp_processor_id.c.  It effectively boils down to a check of
preempt_count() and irqs_disabled().

> Getting a cpu # should only care about migration.

I think we're agreeing? :)

> This isn't the same as a rcu_sched check is it? That does care about
> preemption.

This is something totally different, I think.

   Julia
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Gleixner Oct. 5, 2017, 4:16 p.m. UTC | #6
On Thu, 5 Oct 2017, Julia Cartwright wrote:

> On Thu, Oct 05, 2017 at 11:55:39AM -0400, Steven Rostedt wrote:
> > On Thu, 5 Oct 2017 10:37:59 -0500
> > Julia Cartwright <julia@ni.com> wrote:
> > 
> > > On Thu, Oct 05, 2017 at 05:27:30PM +0200, Thomas Gleixner wrote:
> > > > On Thu, 5 Oct 2017, Julia Cartwright wrote:  
> > > > > On Tue, Oct 03, 2017 at 12:49:19PM -0300, Arnaldo Carvalho de Melo wrote:  
> > > > > > -	preempt_disable();
> > > > > > +	preempt_disable_nort();
> > > > > >  	this_cpu_inc(*sc->buffers_allocated);  
> > > > > 
> > > > > Have you tried this on RT w/ CONFIG_DEBUG_PREEMPT?  I believe that the
> > > > > this_cpu_* operations perform a preemption check, which we'd trip.  
> > > > 
> > > > Good point. Changing this to migrate_disable() would do the trick.  
> > > 
> > > Wouldn't we still trip the preempt check even with migration disabled?
> > > In another thread I asked the same question: should the preemption
> > > checks here be converted to migration-checks in RT?
> > 
> > Is it a "preemption check"?
> 
> Sorry if I was unclear, more precisely: the this_cpu_* family of
> accessors, w/ CONFIG_DEBUG_PREEMPT currently spits out a warning when
> the caller is invoked in a context where preemption is enabled.
> 
> The check is shared w/ the smp_processor_id() check, as implemented in
> lib/smp_processor_id.c.  It effectively boils down to a check of
> preempt_count() and irqs_disabled().

Except that on RT that check cares about the migrate disable state. You can
invoke this_cpu_* and smp_processor_id() in preemptible/interruptible
context because of:

	if (cpumask_equal(current->cpus_ptr, cpumask_of(this_cpu)))
		goto out;

That's true even on mainline.

But you are right that this check needs some update because
migrate_disable() does not immediately update the allowed cpumask IIRC.

Thanks,

	tglx




--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sebastian Andrzej Siewior Oct. 5, 2017, 4:30 p.m. UTC | #7
On 2017-10-03 12:49:19 [-0300], Arnaldo Carvalho de Melo wrote:
> From: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> diff --git a/drivers/infiniband/hw/hfi1/pio.c b/drivers/infiniband/hw/hfi1/pio.c
> index 615be68e40b3..3a30bde9a07b 100644
> --- a/drivers/infiniband/hw/hfi1/pio.c
> +++ b/drivers/infiniband/hw/hfi1/pio.c
> @@ -1421,7 +1421,7 @@ struct pio_buf *sc_buffer_alloc(struct send_context *sc, u32 dw_len,
>  
>  	/* there is enough room */
>  
> -	preempt_disable();
> +	preempt_disable_nort();
>  	this_cpu_inc(*sc->buffers_allocated);
>  
>  	/* read this once */

please replace the preempt_disable() / enable with local_lock() /
unlock. The section does not look like it could cope with multiple users
dereferencing / using the same per-CPU variables.

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnaldo Carvalho de Melo Oct. 5, 2017, 4:53 p.m. UTC | #8
Em Thu, Oct 05, 2017 at 09:17:44AM -0500, Julia Cartwright escreveu:
> On Tue, Oct 03, 2017 at 12:49:19PM -0300, Arnaldo Carvalho de Melo wrote:
> > +++ b/drivers/infiniband/hw/hfi1/pio.c
> > @@ -1421,7 +1421,7 @@ struct pio_buf *sc_buffer_alloc(struct send_context *sc, u32 dw_len,

> >  	/* there is enough room */

> > -	preempt_disable();
> > +	preempt_disable_nort();
> >  	this_cpu_inc(*sc->buffers_allocated);

> Have you tried this on RT w/ CONFIG_DEBUG_PREEMPT?

No

> I believe that the this_cpu_* operations perform a preemption check, which we'd trip.

Humm, looking at include/linux/percpu-defs.h on v4.11.12-rt14 I see
(trimmed to what we're discussing here):

#ifdef CONFIG_DEBUG_PREEMPT
extern void __this_cpu_preempt_check(const char *op);
#else
static inline void __this_cpu_preempt_check(const char *op) { }
#endif

#define __this_cpu_add(pcp, val)					\
({									\
	__this_cpu_preempt_check("add");				\
	raw_cpu_add(pcp, val);						\
})
#define __this_cpu_inc(pcp)		__this_cpu_add(pcp, 1)

/*
 * Operations with implied preemption/interrupt protection.  These
 * operations can be used without worrying about preemption or interrupt.
 */
#define this_cpu_add(pcp, val)          __pcpu_size_call(this_cpu_add_, pcp, val)
#define this_cpu_inc(pcp)               this_cpu_add(pcp, 1)
 
> You may also have to change these to the non-preempt checked variants.

So __this_cpu_inc() checks preemption but this_cpu_inc() doesn't and
thus we're ok here? Or am I getting lost in this maze of defines? :-)

- Arnaldo
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Julia Cartwright Oct. 5, 2017, 6:29 p.m. UTC | #9
On Thu, Oct 05, 2017 at 01:53:05PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Oct 05, 2017 at 09:17:44AM -0500, Julia Cartwright escreveu:
> > On Tue, Oct 03, 2017 at 12:49:19PM -0300, Arnaldo Carvalho de Melo wrote:
> > > +++ b/drivers/infiniband/hw/hfi1/pio.c
> > > @@ -1421,7 +1421,7 @@ struct pio_buf *sc_buffer_alloc(struct send_context *sc, u32 dw_len,
> 
> > >  	/* there is enough room */
> 
> > > -	preempt_disable();
> > > +	preempt_disable_nort();
> > >  	this_cpu_inc(*sc->buffers_allocated);
> 
> > Have you tried this on RT w/ CONFIG_DEBUG_PREEMPT?
> 
> No
> 
> > I believe that the this_cpu_* operations perform a preemption check, which we'd trip.
>
> Humm, looking at include/linux/percpu-defs.h on v4.11.12-rt14 I see
> (trimmed to what we're discussing here):
>
> #ifdef CONFIG_DEBUG_PREEMPT
> extern void __this_cpu_preempt_check(const char *op);
> #else
> static inline void __this_cpu_preempt_check(const char *op) { }
> #endif
>
> #define __this_cpu_add(pcp, val)					\
> ({									\
> 	__this_cpu_preempt_check("add");				\
> 	raw_cpu_add(pcp, val);						\
> })
> #define __this_cpu_inc(pcp)		__this_cpu_add(pcp, 1)
>
> /*
>  * Operations with implied preemption/interrupt protection.  These
>  * operations can be used without worrying about preemption or interrupt.
>  */
> #define this_cpu_add(pcp, val)          __pcpu_size_call(this_cpu_add_, pcp, val)
> #define this_cpu_inc(pcp)               this_cpu_add(pcp, 1)
>
> > You may also have to change these to the non-preempt checked variants.
>
> So __this_cpu_inc() checks preemption but this_cpu_inc() doesn't and
> thus we're ok here? Or am I getting lost in this maze of defines? :-)

I think I was the one lost in the maze.  You are correct.  Sorry for the
confusion.

My mind expected that the __ prefixed versions would be the "raw"
versions that are free of checks, with the checks made in the non
prefixed versions, but it is the other way around.

I'm happy to accept that the bug is within my mind.

   Julia
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnaldo Carvalho de Melo Oct. 5, 2017, 6:53 p.m. UTC | #10
Em Thu, Oct 05, 2017 at 01:29:00PM -0500, Julia Cartwright escreveu:
> On Thu, Oct 05, 2017 at 01:53:05PM -0300, Arnaldo Carvalho de Melo wrote:
> > So __this_cpu_inc() checks preemption but this_cpu_inc() doesn't and
> > thus we're ok here? Or am I getting lost in this maze of defines? :-)
 
> I think I was the one lost in the maze.  You are correct.  Sorry for the
> confusion.
 
> My mind expected that the __ prefixed versions would be the "raw"
> versions that are free of checks, with the checks made in the non
> prefixed versions, but it is the other way around.
 
> I'm happy to accept that the bug is within my mind.

:-) 

Ok, now I'm taking the opportunity to read more about local locks, as
Sebastian thinks are needed in this case, which I'm not entirely
convinced from the discussion that took place here, and as you took part
in that discussion and suggested using the nort variants of
preempt_disable, do you still think this is the way to go?

- Arnaldo
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sebastian Andrzej Siewior Oct. 6, 2017, 9:19 a.m. UTC | #11
On 2017-10-05 18:30:19 [+0200], To Arnaldo Carvalho de Melo wrote:
> On 2017-10-03 12:49:19 [-0300], Arnaldo Carvalho de Melo wrote:
> > From: Arnaldo Carvalho de Melo <acme@redhat.com>
> > 
> > diff --git a/drivers/infiniband/hw/hfi1/pio.c b/drivers/infiniband/hw/hfi1/pio.c
> > index 615be68e40b3..3a30bde9a07b 100644
> > --- a/drivers/infiniband/hw/hfi1/pio.c
> > +++ b/drivers/infiniband/hw/hfi1/pio.c
> > @@ -1421,7 +1421,7 @@ struct pio_buf *sc_buffer_alloc(struct send_context *sc, u32 dw_len,
> >  
> >  	/* there is enough room */
> >  
> > -	preempt_disable();
> > +	preempt_disable_nort();
> >  	this_cpu_inc(*sc->buffers_allocated);
> >  
> >  	/* read this once */
> 
> please replace the preempt_disable() / enable with local_lock() /
> unlock. The section does not look like it could cope with multiple users
> dereferencing / using the same per-CPU variables.

ignore me, there is a spin_lock to protect that…

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dennis Dalessandro Oct. 10, 2017, 6:59 p.m. UTC | #12
On 10/3/2017 11:49 AM, Arnaldo Carvalho de Melo wrote:
> From: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> sc_buffer_alloc() disables preemption that will be reenabled by either
> pio_copy() or seg_pio_copy_end(). But before disabling preemption it
> grabs a spin lock that will be dropped after it disables preemption,
> which ends up triggering a warning in migrate_disable() later on.
> 
>      spin_lock_irqsave(&sc->alloc_lock)
>        migrate_disable() ++p->migrate_disable -> 2
>      preempt_disable()
>      spin_unlock_irqrestore(&sc->alloc_lock)
>        migrate_enable() in_atomic(), so just returns, migrate_disable stays at 2
>      spin_lock_irqsave(some other lock) -> b00m
> 
> And the WARN_ON code ends up tripping over this over and over in
> log_store().
> 
> Sequence captured via ftrace_dump_on_oops + crash utility 'dmesg'
> command.
> 
> [512258.613862] sm-3297 16 .....11 359465349134644: sc_buffer_alloc <-hfi1_verbs_send_pio
> [512258.613876] sm-3297 16 .....11 359465349134719: migrate_disable <-sc_buffer_alloc
> [512258.613890] sm-3297 16 .....12 359465349134798: rt_spin_lock <-sc_buffer_alloc
> [512258.613903] sm-3297 16 ....112 359465349135481: rt_spin_unlock <-sc_buffer_alloc
> [512258.613916] sm-3297 16 ....112 359465349135556: migrate_enable <-sc_buffer_alloc
> [512258.613935] sm-3297 16 ....112 359465349135788: seg_pio_copy_start <-hfi1_verbs_send_pio
> [512258.613954] sm-3297 16 ....112 359465349136273: update_sge <-hfi1_verbs_send_pio
> [512258.613981] sm-3297 16 ....112 359465349136373: seg_pio_copy_mid <-hfi1_verbs_send_pio
> [512258.613999] sm-3297 16 ....112 359465349136873: update_sge <-hfi1_verbs_send_pio
> [512258.614017] sm-3297 16 ....112 359465349136956: seg_pio_copy_mid <-hfi1_verbs_send_pio
> [512258.614035] sm-3297 16 ....112 359465349137221: seg_pio_copy_end <-hfi1_verbs_send_pio
> [512258.614048] sm-3297 16 .....12 359465349137360: migrate_disable <-hfi1_verbs_send_pio
> [512258.614065] sm-3297 16 .....12 359465349137476: warn_slowpath_null <-migrate_disable
> [512258.614081] sm-3297 16 .....12 359465349137564: __warn <-warn_slowpath_null
> [512258.614088] sm-3297 16 .....12 359465349137958: printk <-__warn
> [512258.614096] sm-3297 16 .....12 359465349138055: vprintk_default <-printk
> [512258.614104] sm-3297 16 .....12 359465349138144: vprintk_emit <-vprintk_default
> [512258.614111] sm-3297 16 d....12 359465349138312: _raw_spin_lock <-vprintk_emit
> [512258.614119] sm-3297 16 d...112 359465349138789: log_store <-vprintk_emit
> [512258.614127] sm-3297 16 .....12 359465349139068: migrate_disable <-vprintk_emit
> 
> According to a discussion (see Link: below) on the linux-rt-users
> mailing list, this locking is done for performance reasons, not for
> correctness, so use the _nort() variants to avoid the above problem.
> 
> Suggested-by: Julia Cartwright <julia@ni.com>
> Cc: Clark Williams <williams@redhat.com>
> Cc: Dean Luick <dean.luick@intel.com>
> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
> Cc: Doug Ledford <dledford@redhat.com>
> Cc: Kaike Wan <kaike.wan@intel.com>
> Cc: Leon Romanovsky <leonro@mellanox.com>
> Cc: linux-rdma@vger.kernel.org
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Sebastian Andrzej Siewior <sebastian.siewior@linutronix.de>
> Cc: Sebastian Sanchez <sebastian.sanchez@intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Link: http://lkml.kernel.org/r/20170926210045.GO29872@jcartwri.amer.corp.natinst.com
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

I assume you are not asking for Doug to pick this up for linux-rdma and 
that this more of an RFC sort of deal and the intended destination is 
the -rt tree? Anyway, for this patch:

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnaldo Carvalho de Melo Oct. 10, 2017, 7:02 p.m. UTC | #13
Em Tue, Oct 10, 2017 at 02:59:18PM -0400, Dennis Dalessandro escreveu:
> On 10/3/2017 11:49 AM, Arnaldo Carvalho de Melo wrote:
> > From: Arnaldo Carvalho de Melo <acme@redhat.com>
> > 
> > sc_buffer_alloc() disables preemption that will be reenabled by either
> > pio_copy() or seg_pio_copy_end(). But before disabling preemption it
> > grabs a spin lock that will be dropped after it disables preemption,
> > which ends up triggering a warning in migrate_disable() later on.
> > 
> >      spin_lock_irqsave(&sc->alloc_lock)
> >        migrate_disable() ++p->migrate_disable -> 2
> >      preempt_disable()
> >      spin_unlock_irqrestore(&sc->alloc_lock)
> >        migrate_enable() in_atomic(), so just returns, migrate_disable stays at 2
> >      spin_lock_irqsave(some other lock) -> b00m
> > 
> > And the WARN_ON code ends up tripping over this over and over in
> > log_store().
> > 
> > Sequence captured via ftrace_dump_on_oops + crash utility 'dmesg'
> > command.
> > 
> > [512258.613862] sm-3297 16 .....11 359465349134644: sc_buffer_alloc <-hfi1_verbs_send_pio
> > [512258.613876] sm-3297 16 .....11 359465349134719: migrate_disable <-sc_buffer_alloc
> > [512258.613890] sm-3297 16 .....12 359465349134798: rt_spin_lock <-sc_buffer_alloc
> > [512258.613903] sm-3297 16 ....112 359465349135481: rt_spin_unlock <-sc_buffer_alloc
> > [512258.613916] sm-3297 16 ....112 359465349135556: migrate_enable <-sc_buffer_alloc
> > [512258.613935] sm-3297 16 ....112 359465349135788: seg_pio_copy_start <-hfi1_verbs_send_pio
> > [512258.613954] sm-3297 16 ....112 359465349136273: update_sge <-hfi1_verbs_send_pio
> > [512258.613981] sm-3297 16 ....112 359465349136373: seg_pio_copy_mid <-hfi1_verbs_send_pio
> > [512258.613999] sm-3297 16 ....112 359465349136873: update_sge <-hfi1_verbs_send_pio
> > [512258.614017] sm-3297 16 ....112 359465349136956: seg_pio_copy_mid <-hfi1_verbs_send_pio
> > [512258.614035] sm-3297 16 ....112 359465349137221: seg_pio_copy_end <-hfi1_verbs_send_pio
> > [512258.614048] sm-3297 16 .....12 359465349137360: migrate_disable <-hfi1_verbs_send_pio
> > [512258.614065] sm-3297 16 .....12 359465349137476: warn_slowpath_null <-migrate_disable
> > [512258.614081] sm-3297 16 .....12 359465349137564: __warn <-warn_slowpath_null
> > [512258.614088] sm-3297 16 .....12 359465349137958: printk <-__warn
> > [512258.614096] sm-3297 16 .....12 359465349138055: vprintk_default <-printk
> > [512258.614104] sm-3297 16 .....12 359465349138144: vprintk_emit <-vprintk_default
> > [512258.614111] sm-3297 16 d....12 359465349138312: _raw_spin_lock <-vprintk_emit
> > [512258.614119] sm-3297 16 d...112 359465349138789: log_store <-vprintk_emit
> > [512258.614127] sm-3297 16 .....12 359465349139068: migrate_disable <-vprintk_emit
> > 
> > According to a discussion (see Link: below) on the linux-rt-users
> > mailing list, this locking is done for performance reasons, not for
> > correctness, so use the _nort() variants to avoid the above problem.
> > 
> > Suggested-by: Julia Cartwright <julia@ni.com>
> > Cc: Clark Williams <williams@redhat.com>
> > Cc: Dean Luick <dean.luick@intel.com>
> > Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
> > Cc: Doug Ledford <dledford@redhat.com>
> > Cc: Kaike Wan <kaike.wan@intel.com>
> > Cc: Leon Romanovsky <leonro@mellanox.com>
> > Cc: linux-rdma@vger.kernel.org
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Sebastian Andrzej Siewior <sebastian.siewior@linutronix.de>
> > Cc: Sebastian Sanchez <sebastian.sanchez@intel.com>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Link: http://lkml.kernel.org/r/20170926210045.GO29872@jcartwri.amer.corp.natinst.com
> > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> I assume you are not asking for Doug to pick this up for linux-rdma and that
> this more of an RFC sort of deal and the intended destination is the -rt

Right, and so far there were no strong objection for this one to be
merged on the -rt tree, Sebastian, can you do it please? Adding Dennis'
reviewed-by, one of maintainers for this driver, ok?

> tree? Anyway, for this patch:
> 
> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sebastian Andrzej Siewior Oct. 11, 2017, 11:03 a.m. UTC | #14
On 2017-10-10 16:02:18 [-0300], Arnaldo Carvalho de Melo wrote:
> 
> Right, and so far there were no strong objection for this one to be
> merged on the -rt tree, Sebastian, can you do it please? Adding Dennis'
> reviewed-by, one of maintainers for this driver, ok?

I am still curious about the performance improvement that is with this
preempt disable section compared to without it compared to !PREEMPT
kernel..
If that is important then migrate_disable() would do that on RT.
I guess that there were no splat with CONFIG_DEBUG_PREEMPT ?
If that is all okay, please resend the patch with the explanation why
this preempt_disable() does not matter and I pick it up.

> > tree? Anyway, for this patch:
> > 
> > Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnaldo Carvalho de Melo Oct. 11, 2017, 1:43 p.m. UTC | #15
Em Wed, Oct 11, 2017 at 01:03:55PM +0200, Sebastian Andrzej Siewior escreveu:
> On 2017-10-10 16:02:18 [-0300], Arnaldo Carvalho de Melo wrote:
> > 
> > Right, and so far there were no strong objection for this one to be
> > merged on the -rt tree, Sebastian, can you do it please? Adding Dennis'
> > reviewed-by, one of maintainers for this driver, ok?
> 
> I am still curious about the performance improvement that is with this
> preempt disable section compared to without it compared to !PREEMPT
> kernel..
> If that is important then migrate_disable() would do that on RT.

I can try that

> I guess that there were no splat with CONFIG_DEBUG_PREEMPT ?

I haven't tried that

> If that is all okay, please resend the patch with the explanation why
> this preempt_disable() does not matter and I pick it up.

I can just pick the explanation given by the authors and stash it there,
I guess.
 
> > > tree? Anyway, for this patch:
> > > 
> > > Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
> 
> Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/hw/hfi1/pio.c b/drivers/infiniband/hw/hfi1/pio.c
index 615be68e40b3..3a30bde9a07b 100644
--- a/drivers/infiniband/hw/hfi1/pio.c
+++ b/drivers/infiniband/hw/hfi1/pio.c
@@ -1421,7 +1421,7 @@  struct pio_buf *sc_buffer_alloc(struct send_context *sc, u32 dw_len,
 
 	/* there is enough room */
 
-	preempt_disable();
+	preempt_disable_nort();
 	this_cpu_inc(*sc->buffers_allocated);
 
 	/* read this once */
diff --git a/drivers/infiniband/hw/hfi1/pio_copy.c b/drivers/infiniband/hw/hfi1/pio_copy.c
index 03024cec78dd..c3f48f705c97 100644
--- a/drivers/infiniband/hw/hfi1/pio_copy.c
+++ b/drivers/infiniband/hw/hfi1/pio_copy.c
@@ -162,7 +162,7 @@  void pio_copy(struct hfi1_devdata *dd, struct pio_buf *pbuf, u64 pbc,
 
 	/* finished with this buffer */
 	this_cpu_dec(*pbuf->sc->buffers_allocated);
-	preempt_enable();
+	preempt_enable_nort();
 }
 
 /*
@@ -753,5 +753,5 @@  void seg_pio_copy_end(struct pio_buf *pbuf)
 
 	/* finished with this buffer */
 	this_cpu_dec(*pbuf->sc->buffers_allocated);
-	preempt_enable();
+	preempt_enable_nort();
 }