
[7/8] sched, net: Fixup busy_loop_us_clock()

Message ID 20131126160815.293633156@infradead.org (mailing list archive)
State Not Applicable, archived

Commit Message

Peter Zijlstra Nov. 26, 2013, 3:57 p.m. UTC
The only valid use of preempt_enable_no_resched() is if the very next
line is schedule(), or if we know preemption cannot actually be enabled
by that statement because other preempt_count 'refs' are still held.
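
For illustration only (a sketch, not code from this patch), the
legitimate pattern is roughly:

	preempt_disable();
	/* ... work that must not be preempted ... */
	preempt_enable_no_resched();
	schedule();	/* the reschedule we skipped is taken here anyway */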

This busy_poll stuff looks to be completely and utterly broken:
sched_clock() can return utter garbage with interrupts enabled (rare
but still) and it can drift unbounded between CPUs.

This means that if you get preempted/migrated and your new CPU is
years behind the previous CPU, we get to busy spin for a _very_ long
time.

There is a _REASON_ sched_clock() warns about preemptability -
papering over it with a preempt_disable()/preempt_enable_no_resched()
is just terminal brain damage on so many levels.

Replace sched_clock() usage with local_clock() which has a bounded
drift between CPUs (<2 jiffies).
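
As a rough sketch of why the clock matters here (poll_end_us,
busy_poll_usecs and data_ready() are made-up names, not the actual
busy_poll.h helpers):

	u64 poll_end_us = (local_clock() >> 10) + busy_poll_usecs;

	while (!data_ready(sk)) {
		/* With sched_clock() a migration to a CPU whose clock is
		 * far behind keeps this test true almost forever; with
		 * local_clock() the error is bounded to under 2 jiffies.
		 */
		if ((local_clock() >> 10) > poll_end_us)
			break;
		cpu_relax();
	}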

There is a further problem with the entire busy wait poll thing in
that the spin time is additive to the syscall timeout, not inclusive.

Cc: David S. Miller <davem@davemloft.net>
Cc: rui.zhang@intel.com
Cc: jacob.jun.pan@linux.intel.com
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: hpa@zytor.com
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: lenb@kernel.org
Cc: rjw@rjwysocki.net
Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 include/net/busy_poll.h |   19 +------------------
 1 file changed, 1 insertion(+), 18 deletions(-)




Comments

Eliezer Tamir Nov. 28, 2013, 4:49 p.m. UTC | #1
On 26/11/2013 17:57, Peter Zijlstra wrote:
> 
> Replace sched_clock() usage with local_clock() which has a bounded
> drift between CPUs (<2 jiffies).
> 

Peter,

I have tested this patch and I see a performance regression of about
1.5%.

Maybe it would be better, rather than testing in the fast path, to
simply disallow busy polling altogether when sched_clock_stable is
not true?

Thanks,
Eliezer
Peter Zijlstra Nov. 28, 2013, 5:40 p.m. UTC | #2
On Thu, Nov 28, 2013 at 06:49:00PM +0200, Eliezer Tamir wrote:
> I have tested this patch and I see a performance regression of about
> 1.5%.

Cute, can you qualify your metric? Since this is a poll loop the only
metric that would be interesting is the response latency. Is that what's
increased by 1.5%? Also, what's the standard deviation of your result?

Also, can you provide relevant perf results for this? Is it really the
sti;cli pair that's degrading your latency?

Better yet, can you provide us with a simple test-case that we can run
locally (preferably a single-machine setup, using localnet or somesuch)?

> Maybe it would be better, rather than testing in the fast path, to
> simply disallow busy polling altogether when sched_clock_stable is
> not true?

Sadly that doesn't work; sched_clock_stable can become false at any time
after boot (and does, even on recent machines).

That said; let me see if I can come up with a few patches to optimize
the entire thing; that'd be something we all benefit from.
Eliezer Tamir Nov. 29, 2013, 1:52 p.m. UTC | #3
On 28/11/2013 19:40, Peter Zijlstra wrote:
> On Thu, Nov 28, 2013 at 06:49:00PM +0200, Eliezer Tamir wrote:
>> I have tested this patch and I see a performance regression of about
>> 1.5%.
> 
> Cute, can you qualify your metric? Since this is a poll loop the only
> metric that would be interesting is the response latency. Is that what's
> increased by 1.5%? Also, what's the standard deviation of your result?

Sorry, I should have been more specific.

I use netperf TCP_RR, with all settings at their defaults except the
test length (30s). The setup is exactly the same as in the commit
message of the original patch set.
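
For reference, that is roughly (server name is a placeholder):

	netperf -H <server> -t TCP_RR -l 30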

I get 91.5 KRR/s vs. 90.0 KRR/s.

Unfortunately you need two machines, both of which need NICs that have
driver support for busy poll. Currently, AFAIK, bnx2x, ixgbe, mlx4 and
myri10ge are the only ones, but it's not that hard to add to most
NAPI-based drivers.

I will try to test your latest patches and hopefully also get some perf
numbers on Sunday.

Thanks,
Eliezer

Patch

--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -42,27 +42,10 @@  static inline bool net_busy_loop_on(void
 	return sysctl_net_busy_poll;
 }
 
-/* a wrapper to make debug_smp_processor_id() happy
- * we can use sched_clock() because we don't care much about precision
- * we only care that the average is bounded
- */
-#ifdef CONFIG_DEBUG_PREEMPT
 static inline u64 busy_loop_us_clock(void)
 {
-	u64 rc;
-
-	preempt_disable_notrace();
-	rc = sched_clock();
-	preempt_enable_no_resched_notrace();
-
-	return rc >> 10;
-}
-#else /* CONFIG_DEBUG_PREEMPT */
-static inline u64 busy_loop_us_clock(void)
-{
-	return sched_clock() >> 10;
+	return local_clock() >> 10;
 }
-#endif /* CONFIG_DEBUG_PREEMPT */
 
 static inline unsigned long sk_busy_loop_end_time(struct sock *sk)
 {