Softirq priority inversion from "softirq: reduce latencies"

Message ID	1456762914.648.76.camel@edumazet-ThinkPad-T530 (mailing list archive)
State	Not Applicable
Headers	show Return-Path: <dmaengine-owner@kernel.org> Message-ID: <1456762914.648.76.camel@edumazet-ThinkPad-T530> Subject: Re: Softirq priority inversion from "softirq: reduce latencies" From: Eric Dumazet <eric.dumazet@gmail.com> To: Peter Hurley <peter@hurleysoftware.com> Cc: Mike Galbraith <umgwanakikbuti@gmail.com>, Francois Romieu <romieu@fr.zoreil.com>, Eric Dumazet <edumazet@google.com>, David Miller <davem@davemloft.net>, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Greg KH <gregkh@linuxfoundation.org>, dmaengine@vger.kernel.org, John Ogness <john.ogness@linutronix.de>, Sebastian Andrzej Siewior <bigeasy@linutronix.de>, Andrew Morton <akpm@linux-foundation.org>, Thomas Gleixner <tglx@linutronix.de> Date: Mon, 29 Feb 2016 08:21:54 -0800 In-Reply-To: <56D469B2.2070707@hurleysoftware.com> References: <56D1E8B6.6090003@hurleysoftware.com> <1456638957.3676.12.camel@gmail.com> <20160228170109.GA16322@electric-eye.fr.zoreil.com> <1456721889.3488.67.camel@gmail.com> <56D45DAF.5070709@hurleysoftware.com> <1456759153.648.61.camel@edumazet-ThinkPad-T530> <56D469B2.2070707@hurleysoftware.com> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: dmaengine-owner@vger.kernel.org Precedence: bulk

Eric Dumazet Feb. 29, 2016, 4:21 p.m. UTC

On lun., 2016-02-29 at 07:54 -0800, Peter Hurley wrote:

>  The current kernel is HZ=250 but this would occur on HZ=1000 as well.

Right. But the problem with HZ=100 and HZ=250 is that the detection can
happens because jiffy granularity is too coarse, since 

msecs_to_jiffies(2) -> 1

Following patch might reduce the probability, but wont really fix your
problem.

Fact that ksoftirqd prio is not what you want is completely orthogonal.




--
To unsubscribe from this list: send the line "unsubscribe dmaengine" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Peter Hurley Feb. 29, 2016, 6:05 p.m. UTC | #1

On 02/29/2016 08:21 AM, Eric Dumazet wrote:
> On lun., 2016-02-29 at 07:54 -0800, Peter Hurley wrote:
> 
>>  The current kernel is HZ=250 but this would occur on HZ=1000 as well.
> 
> Right. But the problem with HZ=100 and HZ=250 is that the detection can
> happens because jiffy granularity is too coarse, since 
> 
> msecs_to_jiffies(2) -> 1
> 
> Following patch might reduce the probability, but wont really fix your
> problem.
> 
> Fact that ksoftirqd prio is not what you want is completely orthogonal.
> 
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 479e443..f7cc594 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -180,7 +180,7 @@ EXPORT_SYMBOL(__local_bh_enable_ip);
>  
>  /*
>   * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times,
> - * but break the loop if need_resched() is set or after 2 ms.
> + * but break the loop if need_resched() is set or after 2 ms/ticks.
>   * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in
>   * certain cases, such as stop_machine(), jiffies may cease to
>   * increment and so we need the MAX_SOFTIRQ_RESTART limit as
> @@ -191,7 +191,7 @@ EXPORT_SYMBOL(__local_bh_enable_ip);
>   * we want to handle softirqs as soon as possible, but they
>   * should not be able to lock up the box.
>   */
> -#define MAX_SOFTIRQ_TIME  msecs_to_jiffies(2)
> +#define MAX_SOFTIRQ_TIME  (1 + msecs_to_jiffies(2))
>  #define MAX_SOFTIRQ_RESTART 10
>  
>  #ifdef CONFIG_TRACE_IRQFLAGS

While I appreciate the attempt, that's not the problem.

Just to be clear

		if (time_before(jiffies, end) && !need_resched() &&
		    --max_restart)
			goto restart;

aborts softirq *even if 0ns have elapsed*, if NET_RX has woken a process.




--
To unsubscribe from this list: send the line "unsubscribe dmaengine" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Eric Dumazet Feb. 29, 2016, 6:24 p.m. UTC | #2

On lun., 2016-02-29 at 10:05 -0800, Peter Hurley wrote:

> While I appreciate the attempt, that's not the problem.
> 
> Just to be clear
> 
> 		if (time_before(jiffies, end) && !need_resched() &&
> 		    --max_restart)
> 			goto restart;
> 
> aborts softirq *even if 0ns have elapsed*, if NET_RX has woken a process.

Sure, now remove the 1st and 2nd condition.

You would still 'abort' (ie wakeup ksoftirqd really) when --max_restart
becomes 0

So, instead of some subtle load dependent bug, you know have a reliable
trigger.

The fact it took 3 years for someone to complain about this change
should tell us something really.

The only way for your bug to hide would be to remove all the 'break
infinite loop' logic.

And this is not going to happen.

--
To unsubscribe from this list: send the line "unsubscribe dmaengine" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Peter Hurley Feb. 29, 2016, 6:53 p.m. UTC | #3

On 02/29/2016 10:24 AM, Eric Dumazet wrote:
> On lun., 2016-02-29 at 10:05 -0800, Peter Hurley wrote:
> 
>> While I appreciate the attempt, that's not the problem.
>>
>> Just to be clear
>>
>> 		if (time_before(jiffies, end) && !need_resched() &&
>> 		    --max_restart)
>> 			goto restart;
>>
>> aborts softirq *even if 0ns have elapsed*, if NET_RX has woken a process.
> 
> 
> Sure, now remove the 1st and 2nd condition.

Well just removing the 2nd condition has everything working fine,
because that fixes the priority inversion.

> You would still 'abort' (ie wakeup ksoftirqd really) when --max_restart
> becomes 0

Sure. Which would mean there's contended heavy i/o load so the driver
has to fallback to non-DMA. That's an acceptable outcome.

> So, instead of some subtle load dependent bug, you know have a reliable
> trigger.

There's no "subtle load dependent bug" here.

The driver has a fallback mode of operation that it relies on without
DMA. Of course, as I already wrote, this has consequences.

If system resources are _actually contended_, then naturally, fighting
for cpu and i/o time is fine, and I'm happy to do that in ksoftirqd.

However, when system resources are _not_ contended, it makes no
sense to be forced to revert to ksoftirqd resolution, which is strictly
intended as fallback.

Or flipping your argument on its head, why not just _always_ execute
softirq in ksoftirqd?

--
To unsubscribe from this list: send the line "unsubscribe dmaengine" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thomas Gleixner Feb. 29, 2016, 7:14 p.m. UTC | #4

On Mon, 29 Feb 2016, Peter Hurley wrote:
> On 02/29/2016 10:24 AM, Eric Dumazet wrote:
> >> Just to be clear
> >>
> >> 		if (time_before(jiffies, end) && !need_resched() &&
> >> 		    --max_restart)
> >> 			goto restart;
> >>
> >> aborts softirq *even if 0ns have elapsed*, if NET_RX has woken a process.
> > 
> > Sure, now remove the 1st and 2nd condition.
> 
> Well just removing the 2nd condition has everything working fine,
> because that fixes the priority inversion.

No. It does not fix anything. It hides the shortcomings of the driver.

> However, when system resources are _not_ contended, it makes no
> sense to be forced to revert to ksoftirqd resolution, which is strictly
> intended as fallback.

No. You claim it is simply because your driver does not handle that situation
properly.

> Or flipping your argument on its head, why not just _always_ execute
> softirq in ksoftirqd?

Which is what that change effectivley does. And that makes a lot of sense,
because you get the softirq load under scheduler control and do not let the
softirq run as a context stealing entity which is completely uncontrollable by
the scheduler.

Running the softirq on return from interrupt can cause real priority
inversions.

Thanks,

	tglx
--
To unsubscribe from this list: send the line "unsubscribe dmaengine" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

David Miller Feb. 29, 2016, 8:24 p.m. UTC | #5

From: Thomas Gleixner <tglx@linutronix.de>
Date: Mon, 29 Feb 2016 20:14:36 +0100 (CET)

> On Mon, 29 Feb 2016, Peter Hurley wrote:
>> Or flipping your argument on its head, why not just _always_ execute
>> softirq in ksoftirqd?
> 
> Which is what that change effectivley does. And that makes a lot of sense,
> because you get the softirq load under scheduler control and do not let the
> softirq run as a context stealing entity which is completely uncontrollable by
> the scheduler.

+1
--
To unsubscribe from this list: send the line "unsubscribe dmaengine" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Peter Hurley Feb. 29, 2016, 11:04 p.m. UTC | #6

On 02/29/2016 11:14 AM, Thomas Gleixner wrote:
> On Mon, 29 Feb 2016, Peter Hurley wrote:
>> On 02/29/2016 10:24 AM, Eric Dumazet wrote:
>>>> Just to be clear
>>>>
>>>> 		if (time_before(jiffies, end) && !need_resched() &&
>>>> 		    --max_restart)
>>>> 			goto restart;
>>>>
>>>> aborts softirq *even if 0ns have elapsed*, if NET_RX has woken a process.
>>>
>>> Sure, now remove the 1st and 2nd condition.
>>
>> Well just removing the 2nd condition has everything working fine,
>> because that fixes the priority inversion.
> 
> No. It does not fix anything. It hides the shortcomings of the driver.
>  
>> However, when system resources are _not_ contended, it makes no
>> sense to be forced to revert to ksoftirqd resolution, which is strictly
>> intended as fallback.
> 
> No. You claim it is simply because your driver does not handle that situation
> properly.
>  
>> Or flipping your argument on its head, why not just _always_ execute
>> softirq in ksoftirqd?
> 
> Which is what that change effectivley does. And that makes a lot of sense,
> because you get the softirq load under scheduler control and do not let the
> softirq run as a context stealing entity which is completely uncontrollable by
> the scheduler.

Ok, fair enough.

However, charging [in the scheduler sense] very lightweight DMA completion for
one subsystem collectively with very heavyweight NET_RX (doing garbage collection
in softirq!) is hardly ideal.

The alternative being threaded interrupt handlers (which are essentially treated
as 0.000000 scheduler cost).

I just want to make sure that's the conscious choice being made, when the
patches for converting from tasklet to threaded irq start hitting subsystem
maintainers.

Regards,
Peter Hurley


--
To unsubscribe from this list: send the line "unsubscribe dmaengine" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Softirq priority inversion from "softirq: reduce latencies"

Commit Message

Comments

Patch