Date: Thu, 5 Mar 2015 01:02:25 +0100
From: Ingo Molnar
To: Andrew Morton
Cc: Jason Baron, peterz@infradead.org, mingo@redhat.com, viro@zeniv.linux.org.uk, normalperson@yhbt.net, davidel@xmailserver.org, mtk.manpages@gmail.com, luto@amacapital.net, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, Linus Torvalds, Alexander Viro
Subject: Re: [PATCH v3 0/3] epoll: introduce round robin wakeup mode
Message-ID: <20150305000225.GA27592@gmail.com>
In-Reply-To: <20150227143158.afa4ec39d49d5815cbbd6a6c@linux-foundation.org>

* Andrew Morton wrote:

> On Fri, 27 Feb 2015 17:01:32 -0500 Jason Baron wrote:
>
> > > I don't really understand the need for rotation/round-robin. We can
> > > solve the thundering herd via exclusive wakeups, but what is the point
> > > in choosing to wake the task which has been sleeping for the longest
> > > time? Why is that better than waking the task which has been sleeping
> > > for the *least* time? That's probably faster as that task's data is
> > > more likely to still be in cache.
> > >
> > > The changelogs talk about "starvation" but they don't really say what
> > > this term means in this context, nor why it is a bad thing.
>
> I'm still not getting it.
>
> > So the idea with the 'rotation' is to try and distribute the
> > workload more evenly across the worker threads.
>
> Why?
>
> > We currently
> > tend to wake up the 'head' of the queue over and over and
> > thus the workload for us is not evenly distributed.
>
> What's wrong with that?
>
> > In fact, we
> > have a workload where we have to remove all the epoll sets
> > and then re-add them in a different order to improve the situation.
>
> Why?

So my guess would be (but Jason would know this more precisely) that by
spreading the workload to more tasks in a FIFO manner, the individual
tasks can move between CPUs better and fill in available CPU bandwidth
better, increasing concurrency.

With the current LIFO distribution of wakeups, the 'busiest' threads will
get many wakeups (potentially from different CPUs), making them cache-hot,
which may interfere with them easily migrating across CPUs.

So while technically both approaches have similar concurrency, the more
'spread out' task hierarchy schedules in a more consistent manner.

But ... this is just a wild guess, and even if my description is accurate
it should still be backed by robust measurements and observations before
we extend the ABI.

This hypothesis could be tested by the patch below: if, with the patch
applied, the performance difference between FIFO and LIFO epoll wakeups
disappears, then the root cause is the cache-hotness code in the
scheduler.
Thanks,
Ingo

---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ee595ef30470..89af04e946d2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5354,7 +5354,7 @@ static int task_hot(struct task_struct *p, struct lb_env *env)
 
 	lockdep_assert_held(&env->src_rq->lock);
 
-	if (p->sched_class != &fair_sched_class)
+	if (1 || p->sched_class != &fair_sched_class)
 		return 0;
 
 	if (unlikely(p->policy == SCHED_IDLE))