Message ID: 20131126160815.293633156@infradead.org (mailing list archive)
State: Not Applicable, archived
On 26/11/2013 17:57, Peter Zijlstra wrote:
>
> Replace sched_clock() usage with local_clock() which has a bounded
> drift between CPUs (<2 jiffies).
>

Peter,

I have tested this patch and I see a performance regression of about 1.5%.

Maybe it would be better, rather than testing in the fast path, to simply
disallow busy polling altogether when sched_clock_stable is not true?

Thanks,
Eliezer
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Nov 28, 2013 at 06:49:00PM +0200, Eliezer Tamir wrote:
> I have tested this patch and I see a performance regression of about
> 1.5%.

Cute, can you qualify your metric? Since this is a poll loop the only
metric that would be interesting is the response latency. Is that what's
increased by 1.5%? Also, what's the standard deviation of your result?

Also, can you provide relevant perf results for this? Is it really the
sti;cli pair that's degrading your latency?

Better yet, can you provide us with a simple test-case that we can run
locally (preferably a single-machine setup, using localnet or somesuch).

> Maybe it would be better, rather than testing in the fast path, to
> simply disallow busy polling altogether when sched_clock_stable is
> not true?

Sadly that doesn't work; sched_clock_stable can become false at any time
after boot (and does, even on recent machines).

That said; let me see if I can come up with a few patches to optimize
the entire thing; that'd be something we all benefit from.
On 28/11/2013 19:40, Peter Zijlstra wrote:
> On Thu, Nov 28, 2013 at 06:49:00PM +0200, Eliezer Tamir wrote:
>> I have tested this patch and I see a performance regression of about
>> 1.5%.
>
> Cute, can you qualify your metric? Since this is a poll loop the only
> metric that would be interesting is the response latency. Is that what's
> increased by 1.5%? Also, what's the standard deviation of your result?

Sorry, I should have been more specific. I use netperf TCP_RR, with all
settings except the test length (30s) left at their defaults. The setup
is exactly the same as in the commit message of the original patch set.
I get 91.5 KRR/s vs. 90.0 KRR/s.

Unfortunately you need two machines, both of which need NICs with driver
support for busy poll. Currently, AFAIK, bnx2x, ixgbe, mlx4 and myri10ge
are the only ones, but it's not that hard to add to most NAPI-based
drivers.

I will try to test your latest patches and hopefully also get some perf
numbers on Sunday.

Thanks,
Eliezer
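[Note: the test described above can be sketched roughly as below. The netperf flags are the standard ones for a request/response test; the remote host is a placeholder, and the exact options used in the original runs are not stated in the thread.]

```shell
# 30-second TCP request/response test against a remote netserver
# (hypothetical host; requires netperf here and netserver on the peer):
#   netperf -H <remote-host> -t TCP_RR -l 30

# The quoted results (91.5 KRR/s before the patch, 90.0 KRR/s after)
# correspond to a relative regression of roughly:
awk 'BEGIN { printf "%.1f%%\n", (91.5 - 90.0) / 91.5 * 100 }'
```

This prints 1.6%, consistent with the "about 1.5%" figure quoted above.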
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -42,27 +42,10 @@ static inline bool net_busy_loop_on(void
 	return sysctl_net_busy_poll;
 }
 
-/* a wrapper to make debug_smp_processor_id() happy
- * we can use sched_clock() because we don't care much about precision
- * we only care that the average is bounded
- */
-#ifdef CONFIG_DEBUG_PREEMPT
 static inline u64 busy_loop_us_clock(void)
 {
-	u64 rc;
-
-	preempt_disable_notrace();
-	rc = sched_clock();
-	preempt_enable_no_resched_notrace();
-
-	return rc >> 10;
-}
-#else /* CONFIG_DEBUG_PREEMPT */
-static inline u64 busy_loop_us_clock(void)
-{
-	return sched_clock() >> 10;
+	return local_clock() >> 10;
 }
-#endif /* CONFIG_DEBUG_PREEMPT */
 
 static inline unsigned long sk_busy_loop_end_time(struct sock *sk)
 {