From patchwork Wed Apr 27 20:37:08 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe <axboe@kernel.dk>
X-Patchwork-Id: 8962671
Date: Wed, 27 Apr 2016 14:37:08 -0600
From: Jens Axboe <axboe@kernel.dk>
To: Jan Kara <jack@suse.cz>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-block@vger.kernel.org, dchinner@redhat.com, sedat.dilek@gmail.com
Subject: Re: [PATCHSET v5] Make background writeback great again for the
	first time
Message-ID: <20160427203708.GA25397@kernel.dk>
References: <1461686131-22999-1-git-send-email-axboe@fb.com>
	<20160427180105.GA17362@quack2.suse.cz> <5721021E.8060006@fb.com>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <5721021E.8060006@fb.com>
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-fsdevel.vger.kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org

On Wed, Apr 27 2016, Jens Axboe wrote:
> On 04/27/2016 12:01 PM, Jan Kara wrote:
> >Hi,
> >
> >On Tue 26-04-16 09:55:23, Jens Axboe wrote:
> >>Since the dawn of time, our background buffered writeback has sucked.
> >>When we do background buffered writeback, it should have little impact
> >>on foreground activity. That's the definition of background activity...
> >>But for as long as I can remember, heavy buffered writers have not
> >>behaved like that. For instance, if I do something like this:
> >>
> >>$ dd if=/dev/zero of=foo bs=1M count=10k
> >>
> >>on my laptop, and then try and start chrome, it basically won't start
> >>before the buffered writeback is done. Or, for server oriented
> >>workloads, where installation of a big RPM (or similar) adversely
> >>impacts database reads or sync writes. When that happens, I get people
> >>yelling at me.
> >>
> >>I have posted plenty of results previously, I'll keep it shorter
> >>this time. Here's a run on my laptop, using read-to-pipe-async for
> >>reading a 5g file, and rewriting it. You can find this test program
> >>in the fio git repo.
> >
> >I have tested your patchset on my test system. Generally I have observed a
> >noticeable drop in average throughput for heavy background writes without
> >any other disk activity, and also somewhat increased variance in the
> >runtimes. It is most visible in these simple test cases:
> >
> >dd if=/dev/zero of=/mnt/file bs=1M count=10000
> >
> >and
> >
> >dd if=/dev/zero of=/mnt/file bs=1M count=10000 conv=fsync
> >
> >The machine has 4GB of ram, /mnt is an ext3 filesystem that is freshly
> >created before each dd run on a dedicated disk.
> >
> >Without your patches I get pretty stable dd runtimes for both cases:
> >
> >dd if=/dev/zero of=/mnt/file bs=1M count=10000
> >Runtimes: 87.9611 87.3279 87.2554
> >
> >dd if=/dev/zero of=/mnt/file bs=1M count=10000 conv=fsync
> >Runtimes: 93.3502 93.2086 93.541
> >
> >With your patches the numbers look like:
> >
> >dd if=/dev/zero of=/mnt/file bs=1M count=10000
> >Runtimes: 108.183 97.184 99.9587
> >
> >dd if=/dev/zero of=/mnt/file bs=1M count=10000 conv=fsync
> >Runtimes: 104.9 102.775 102.892
> >
> >I have checked whether the variance is due to some interaction with CFQ,
> >which is used for the disk. When I switched the disk to deadline, I still
> >get some variance, and the throughput is still ~10% lower:
> >
> >dd if=/dev/zero of=/mnt/file bs=1M count=10000
> >Runtimes: 100.417 100.643 100.866
> >
> >dd if=/dev/zero of=/mnt/file bs=1M count=10000 conv=fsync
> >Runtimes: 104.208 106.341 105.483
> >
> >The disk is a rotational SATA drive with writeback cache; the queue depth of
> >the disk reported in /sys/block/sdb/device/queue_depth is 1.
> >
> >So I think we still need some tweaking on the low end of the storage
> >spectrum so that we don't lose 10% of throughput for simple cases like
> >this.
> 
> Thanks for testing, Jan! I haven't tried old QD=1 SATA. I wonder if
> you are seeing smaller requests, and that is why it both varies and
> you get lower throughput? I'll try to set up a test here similar to
> yours.

Jan, care to try the patch below? I can't fully reproduce your issue on
a SCSI disk limited to QD=1, but I have a feeling this might help. It's
a bit of a hack, but the general idea is to allow one more request to
build up for QD=1 devices. That eliminates the wait between one request
finishing and the next being submitted.

diff --git a/lib/wbt.c b/lib/wbt.c
index 650da911f24f..6b24c8525ace 100644
--- a/lib/wbt.c
+++ b/lib/wbt.c
@@ -93,23 +93,30 @@ void __wbt_done(struct rq_wb *rwb)
 	 * If the device does write back caching, drop further down
 	 * before we wake people up.
 	 */
-	if (rwb->wc && !atomic_read(&rwb->bdi->wb.dirty_sleeping))
+	if (rwb->queue_depth == 1)
+		limit = 2;
+	else if (rwb->wc && !atomic_read(&rwb->bdi->wb.dirty_sleeping))
 		limit = 0;
 	else
 		limit = rwb->wb_normal;
 
+	inflight = atomic_dec_return(&rwb->inflight);
+
 	/*
-	 * Don't wake anyone up if we are above the normal limit. If
-	 * throttling got disabled (limit == 0) with waiters, ensure
-	 * that we wake them up.
+	 * wbt got disabled with IO in flight. Wake up any potential
+	 * waiters, we don't have to do more than that.
 	 */
-	inflight = atomic_dec_return(&rwb->inflight);
-	if (limit && inflight >= limit) {
-		if (!rwb->wb_max)
-			wake_up_all(&rwb->wait);
+	if (!rwb_enabled(rwb)) {
+		wake_up_all(&rwb->wait);
 		return;
 	}
 
+	/*
+	 * Don't wake anyone up if we are above the normal limit.
+	 */
+	if (inflight >= limit)
+		return;
+
 	if (waitqueue_active(&rwb->wait)) {
 		int diff = limit - inflight;
 
@@ -366,6 +373,9 @@ static inline unsigned int get_limit(struct rq_wb *rwb, unsigned long rw)
 	} else
 		limit = rwb->wb_normal;
 
+	if (rwb->queue_depth == 1)
+		limit = 2;
+
 	return limit;
 }