[v2] mm: implement write-behind policy for sequential file writes

Traditional writeback tries to accumulate as much dirty data as possible.
This is a worthwhile strategy for extremely short-lived files and for batching
writes to save battery power. But for workloads where disk latency is
important, this policy generates periodic disk load spikes, which increase
latency for concurrent operations.

Also, dirty pages in the file cache cannot be reclaimed and reused
immediately. As a result, massive I/O such as file copying affects memory
allocation latency.

The present writeback engine only allows tuning the dirty data size or the
expiration time. Such tuning cannot eliminate spikes; it just makes them
smaller and more frequent. Another option is switching to sync mode, which
flushes written data right after each write; obviously this has a
significant performance impact. Such tuning is system-wide and also affects
memory-mapped and randomly written files, which flusher threads handle much
better.

This patch implements a write-behind policy which tracks sequential writes
and starts background writeback when a file has enough dirty pages.

Global switch in sysctl vm.dirty_write_behind:
=0: disabled, default
=1: enabled for strictly sequential writes (append, copying)
=2: enabled for all sequential writes
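A minimal userspace sketch of how the three modes could gate a single write.
The enum, the function write_behind_applies() and the way a write is
classified (it is "appending" when it starts at the current file size and
"sequential" when it starts where the previous write through the same file
ended) are assumptions made for illustration, not code from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values of vm.dirty_write_behind as listed above. */
enum write_behind_mode {
	WB_DISABLED = 0,          /* default */
	WB_STRICT_SEQUENTIAL = 1, /* append, copying */
	WB_SEQUENTIAL = 2,        /* all sequential writes */
};

/*
 * Hypothetical check for one write starting at byte 'pos', where the
 * previous write through this file ended at 'prev_end' and the file was
 * 'i_size' bytes long before this write.
 */
static bool write_behind_applies(enum write_behind_mode mode, uint64_t pos,
				 uint64_t prev_end, uint64_t i_size)
{
	bool appending = (pos == i_size);    /* extends the file at EOF */
	bool sequential = (pos == prev_end); /* continues the previous write */

	switch (mode) {
	case WB_DISABLED:
		return false;
	case WB_STRICT_SEQUENTIAL:
		return appending;            /* non-append writes are ignored */
	case WB_SEQUENTIAL:
		return appending || sequential;
	}
	assert(!"invalid vm.dirty_write_behind value");
	return false;
}
```

Under this model, rewriting the middle of an existing file sequentially is
tracked in mode 2 but ignored in mode 1, which only reacts to writes at EOF.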

The only parameter is the window size: the maximum amount of dirty pages
behind the current position and the maximum amount of pages under background
writeback.

It is set per disk in sysfs, in the file /sys/block/$DISK/bdi/write_behind_kb.
Default: 16MiB; '0' disables write-behind for this disk.
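The window is configured in KiB but naturally tracked in pages. The sketch
below converts the setting, assuming 4 KiB pages, and shows how a userspace
tool might store a new value. write_behind_window_pages() and
set_write_behind_kb() are hypothetical helpers, and the write_behind_kb
attribute only exists on a kernel carrying this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define ASSUMED_PAGE_KB 4u             /* assumption: 4 KiB pages */
#define WRITE_BEHIND_KB_DEFAULT 16384u /* 16MiB default from the text */

/* Window in pages for a given write_behind_kb value; 0 disables write-behind. */
static uint64_t write_behind_window_pages(uint64_t write_behind_kb)
{
	return write_behind_kb / ASSUMED_PAGE_KB;
}

/*
 * Hypothetical helper: store a new window for one disk.
 * Returns 0 on success, -1 on any failure (e.g. missing attribute).
 */
static int set_write_behind_kb(const char *disk, unsigned int kb)
{
	char path[256];
	FILE *f;
	int n;

	assert(disk != NULL);
	n = snprintf(path, sizeof(path), "/sys/block/%s/bdi/write_behind_kb", disk);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;                   /* disk name too long */
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	if (fprintf(f, "%u\n", kb) < 0) {
		fclose(f);
		return -1;
	}
	/* stdio buffers the value; the sysfs write happens (and may fail) at close */
	return fclose(f) == 0 ? 0 : -1;
}
```

With the 16MiB default and 4 KiB pages the window is 4096 pages.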

When the amount of unwritten pages exceeds the window size, write-behind
starts background writeback for max(excess, max_sectors_kb) and then waits
for the same amount of background writeback initiated previously.

 |<-wait-this->|           |<-send-this->|<---pending-write-behind--->|
 |<--async-write-behind--->|<--------previous-data------>|<-new-data->|
              current head-^    new head-^              file position-^
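The window arithmetic above can be simulated in userspace. This is a sketch
under stated assumptions, not the patch's implementation: positions are
counted in pages, the per-file state is a single hypothetical head cursor,
and the region to wait for is taken to start one window behind the current
head, as the diagram suggests. The clamp that never sends more pages than
are dirty is an extra safeguard added here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-file write-behind state, in pages. */
struct wb_state {
	uint64_t head;                   /* first page not yet sent for writeback */
};

/* Half-open page ranges [start, end); an empty wait range means no wait. */
struct wb_step {
	uint64_t send_start, send_end;   /* "send-this" in the diagram */
	uint64_t wait_start, wait_end;   /* "wait-this" in the diagram */
};

static uint64_t sat_sub(uint64_t a, uint64_t b)
{
	return a > b ? a - b : 0;
}

/*
 * Called after a sequential writer has dirtied pages up to 'pos'.
 * Returns 1 and fills *step if writeback should be started, else 0.
 */
static int wb_advance(struct wb_state *st, uint64_t pos, uint64_t window,
		      uint64_t max_sectors, struct wb_step *step)
{
	uint64_t unwritten, send;

	assert(pos >= st->head);
	unwritten = pos - st->head;
	if (window == 0 || unwritten <= window)
		return 0;                    /* still within the window */

	/* max(excess, max_sectors) ... */
	send = unwritten - window;
	if (send < max_sectors)
		send = max_sectors;
	/* ... but never more than is dirty (extra safeguard in this sketch) */
	if (send > unwritten)
		send = unwritten;

	step->send_start = st->head;
	step->send_end = st->head + send;
	/* Same amount, issued one window earlier, must finish first. */
	step->wait_start = sat_sub(step->send_start, window);
	step->wait_end = sat_sub(step->send_end, window);

	st->head = step->send_end;       /* "new head" */
	return 1;
}
```

For example, with an 8-page window and 2-page requests, a writer that dirties
one page at a time first sends pages [0, 2) when it reaches page 9, and first
has to wait (for pages [0, 2)) when it reaches page 17 and sends [8, 10).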

The remaining tail pages are flushed when the file is closed if async
write-behind was started, or if this is a new file at least max_sectors_kb
long.

Overall behavior depending on total data size:
< max_sectors_kb - no writes
> max_sectors_kb - write new files in background after close
> write_behind_kb - streaming write, write tail at close
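The close-time rule and the size table above can be folded into one decision
function. The sketch is hypothetical: wb_classify() and its outcome names are
made up here, sizes are in KiB, and it assumes the writes already passed the
sequential-write checks. The max_sectors_kb value used in the usage note is
only an example; the real limit depends on the disk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum wb_outcome {
	WB_NO_WRITES,      /* write-behind issues nothing; flusher threads handle it */
	WB_FLUSH_AT_CLOSE, /* new file written in background after close */
	WB_STREAMING,      /* async write-behind while writing, tail at close */
};

/* Hypothetical summary of the rules above; all sizes in KiB. */
static enum wb_outcome wb_classify(uint64_t written_kb, bool new_file,
				   uint64_t max_sectors_kb,
				   uint64_t write_behind_kb)
{
	assert(max_sectors_kb > 0);
	if (write_behind_kb == 0)
		return WB_NO_WRITES;           /* disabled for this disk */
	if (written_kb > write_behind_kb)
		return WB_STREAMING;           /* unwritten data exceeded the window */
	if (new_file && written_kb >= max_sectors_kb)
		return WB_FLUSH_AT_CLOSE;
	return WB_NO_WRITES;
}
```

For instance, with max_sectors_kb=1024 and the default 16MiB window, a new
4MiB file is written in the background after close, while a 64MiB copy
streams during the write and flushes its tail at close.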

Special cases:

* files with POSIX_FADV_RANDOM, O_DIRECT, O_[D]SYNC are ignored

* the write cursor for O_APPEND is aligned to cover previous small appends.
  Appends might happen via multiple files or via a new file each time.

* mode vm.dirty_write_behind=1 ignores non-append writes.
  It reacts only to completely sequential writes such as copying files,
  writing logs with O_APPEND, or rewriting files after O_TRUNC.
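The first special case amounts to a simple eligibility check. In this sketch
struct wb_file_flags and write_behind_eligible() are hypothetical stand-ins
for however the kernel records these properties of an open file; the
O_APPEND cursor alignment and the mode check are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of how a file was opened and advised. */
struct wb_file_flags {
	bool fadv_random; /* POSIX_FADV_RANDOM was applied */
	bool o_direct;    /* opened with O_DIRECT */
	bool o_sync;      /* opened with O_SYNC or O_DSYNC */
};

/* Random, direct and synchronous writers are left alone by write-behind. */
static bool write_behind_eligible(const struct wb_file_flags *f)
{
	assert(f != NULL);
	return !(f->fadv_random || f->o_direct || f->o_sync);
}
```

O_DIRECT and O_[D]SYNC writes already bypass or immediately flush the page
cache, so tracking them would only add overhead.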

Note: the ext4 feature "auto_da_alloc" also writes back cached data when a
file truncated to 0 is closed, and after renaming one file over another.

Changes since v1 (2017-10-02):
* rework window management:
* change default window 1MiB -> 16MiB
* change default request 256KiB -> max_sectors_kb
* drop always-async behavior for O_NONBLOCK
* drop handling of POSIX_FADV_NOREUSE (should be in a separate patch)
* ignore writes with O_DIRECT, O_SYNC, O_DSYNC
* align head position for O_APPEND
* add strictly sequential mode
* write tail pages for new files
* make write-behind return void, keep errors in the mapping

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Link: https://lore.kernel.org/patchwork/patch/836149/ (v1)
---
 Documentation/ABI/testing/sysfs-class-bdi |    5 +
 Documentation/admin-guide/sysctl/vm.rst   |   15 +++
 fs/file_table.c                           |    2 
 include/linux/backing-dev-defs.h          |    1 
 include/linux/fs.h                        |    8 +-
 include/linux/mm.h                        |    1 
 kernel/sysctl.c                           |    9 ++
 mm/backing-dev.c                          |   43 +++++----
 mm/filemap.c                              |  136 +++++++++++++++++++++++++++++
 9 files changed, 199 insertions(+), 21 deletions(-)

Message-ID: <156896493723.4334.13340481207144634918.stgit@buzz>
From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Date: Fri, 20 Sep 2019 10:35:37 +0300
To: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>, Michal Hocko <mhocko@suse.com>, Dave Chinner <david@fromorbit.com>, Mel Gorman <mgorman@suse.de>, Johannes Weiner <hannes@cmpxchg.org>, Tejun Heo <tj@kernel.org>, Linus Torvalds <torvalds@linux-foundation.org>