From patchwork Fri Sep 18 01:50:29 2015
X-Patchwork-Submitter: Linus Torvalds
X-Patchwork-Id: 7212331
Subject: Re: [PATCH] fs-writeback: drop wb->list_lock during blk_finish_plug()
From: Linus Torvalds
To: Dave Chinner
Cc: Chris Mason, Jan Kara, Josef Bacik, LKML, linux-fsdevel, Neil Brown,
 Christoph Hellwig, Tejun Heo
Date: Thu, 17 Sep 2015 18:50:29 -0700
In-Reply-To: <20150918003735.GR3902@dastard>
On Thu, Sep 17, 2015 at 5:37 PM, Dave Chinner wrote:
>> >
>> > I'm not seeing why that should be an issue. Sure, there's some CPU
>> > overhead to context switching, but I don't see that it should be that
>> > big of a deal.
>
> It may well change the dispatch order of enough IOs for it to be
> significant on an IO bound device.

Hmm. Maybe. We obviously try to order the IOs a bit by inode, and I
could see the use of a workqueue maybe changing that sufficiently. But
it still sounds a bit unlikely.

And in fact, I think I have a better explanation.

> In outright performance on my test machine, the difference in
> files/s is noise. However, the consistency looks to be substantially
> improved and the context switch rate is now running at under
> 3,000/sec.

Hmm. I don't recall you mentioning how many context switches per
second you had before. What is it down from?

However, I think I may have found something more interesting here.

The fact is that a *normal* schedule will trigger that whole
blk_schedule_flush_plug(), but a cond_resched() or a
cond_resched_lock() doesn't actually do a normal schedule at all.
Those trigger a *preemption* schedule.

And a preemption schedule does not trigger that unplugging at all.

Why? A kernel "preemption" very much tries to avoid touching thread
state, because the whole point is that normally we may be preempting
threads in random places, so we don't run things like
sched_submit_work(): the thread may be in the middle of *creating*
that work, and we don't want to introduce races. The preemption
scheduling can also be done with "task->state" set to sleeping, and it
won't actually sleep.

Now, for the explicit schedules like cond_resched() and
cond_resched_lock(), those races obviously don't exist, but they
happen to share the same preemption scheduling logic.

So as far as I can see, cond_resched() will not start any IO at all;
the IO will just be left on the thread plug until we schedule back to
the thread.

So I don't think this has anything to do with kblockd_workqueue. I
don't think it even gets to that point.

I may be missing something, but just to humor me, can you test the
attached patch *without* Chris's patch to do explicit plugging? This
should make cond_resched() and cond_resched_lock() run the unplugging.

It may be entirely broken, I haven't thought this entirely through.

                      Linus

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 97d276ff1edb..388ea9e7ab8a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4548,6 +4548,7 @@ SYSCALL_DEFINE0(sched_yield)
 int __sched _cond_resched(void)
 {
 	if (should_resched(0)) {
+		sched_submit_work(current);
 		preempt_schedule_common();
 		return 1;
 	}
@@ -4572,9 +4573,10 @@ int __cond_resched_lock(spinlock_t *lock)
 
 	if (spin_needbreak(lock) || resched) {
 		spin_unlock(lock);
-		if (resched)
+		if (resched) {
+			sched_submit_work(current);
 			preempt_schedule_common();
-		else
+		} else
 			cpu_relax();
 		ret = 1;
 		spin_lock(lock);
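
For reference, the sleep-time hook that a normal schedule() runs, and
that the patch above wires into the cond_resched() paths, looks
roughly like this (paraphrased from kernel/sched/core.c of that era,
so the details may differ slightly):

static inline void sched_submit_work(struct task_struct *tsk)
{
	/* Not actually going to sleep (or PI-blocked)? Nothing to flush. */
	if (!tsk->state || tsk_is_pi_blocked(tsk))
		return;

	/*
	 * If we are going to sleep and we have plugged IO queued,
	 * make sure to submit it to avoid deadlocks.
	 */
	if (blk_needs_flush_plug(tsk))
		blk_schedule_flush_plug(tsk);
}

schedule() calls this before going into __schedule(), while
preempt_schedule_common() - which is where cond_resched() ends up -
goes straight into the scheduler core. That is exactly why the plugged
IO stays queued across a cond_resched().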
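And for anyone not following the thread closely: the "thread plug"
here is the blk_start_plug()/blk_finish_plug() batching that the
writeback path (and many other IO submission paths) uses. A minimal
sketch of the pattern, not the actual fs-writeback code:

	struct blk_plug plug;

	blk_start_plug(&plug);	/* bios now queue on current->plug */
	/* ... submit a batch of bios, e.g. via submit_bio() ... */
	blk_finish_plug(&plug);	/* dispatch everything we queued */

Between start and finish, the only other way the queued IO gets
dispatched is the implicit flush when the task really sleeps, via the
sched_submit_work() path above - which is precisely the flush that
cond_resched() was skipping.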