From patchwork Sat Nov 17 23:53:12 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10687643
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-aio@kvack.org, linux-fsdevel@vger.kernel.org
Subject: [PATCHSET 0/5] Support for polled aio
Date: Sat, 17 Nov 2018 16:53:12 -0700
Message-Id: <20181117235317.7366-1-axboe@kernel.dk>

Up until now, IO polling has been exclusively available through preadv2 and
pwritev2, both fully synchronous interfaces. This works fine for completely
synchronous use cases, but that's about it. If QD=1 wasn't enough to reach the
performance goals, the only alternative was to increase the thread count.
Unfortunately, that isn't very efficient, both in terms of CPU utilization
(each thread will use 100% of CPU time) and in terms of achievable performance.

With all of the recent advances in polling (non-irq polling, efficiency gains,
multiple pollable queues, etc), it's now feasible to add polling support to
aio - this patchset does just that.

An iocb flag is added, IOCB_FLAG_HIPRI, similar to the RWF_HIPRI flag we have
for preadv2/pwritev2. It's applicable to the commands that read/write data,
like IOCB_CMD_PREAD/IOCB_CMD_PWRITE and the vectored variants. Submission
works the same as before. The polling happens off io_getevents(), when the
application is looking for completions. That also works like before, with the
only difference being that events aren't waited for, they are actively found
and polled for on the device side.

The only real difference in terms of completions is that polling does NOT use
the user exposed libaio ring. That's just not feasible, as the application
needs to be the one that actively polls for the events. Because of this, the
ring isn't supported with polling, and the internals completely ignore it.

Outside of that, it's illegal to mix polled and non-polled IO on the same
io_context. There's no way to set up an io_context with the information that
we will be polling on it (always add flags to new syscalls...), hence we need
to track this internally. For polled IO we can never wait for events, we have
to actively find them. I didn't want to add counters to the io_context to
inc/dec for each IO, so I just made this illegal. If an application attempts
to submit both polled and non-polled IO on the same io_context, it will get an
-EINVAL return at io_submit() time.

Performance results have been very promising. For an internal Facebook flash
storage device, we're seeing a 20% increase in performance, with an identical
reduction in latencies. Notably, this compares a highly tuned setup against
just turning on polling, so I'm sure there's still extra room for performance
there. Note that at these speeds and feeds, polling ends up NOT using more CPU
time than we did without polling!

On that same box, I ran microbenchmarks and was able to increase peak
performance by 25%. The box was pegged at around 2.4M IOPS; with just turning
on polling, the bandwidth was maxed out at 12.5GB/sec doing 3.2M IOPS. All of
this with 2 million fewer interrupts/second, and 2M+ fewer context switches.
In terms of efficiency, a tester was able to get 800K+ IOPS out of a _single_
thread at QD=16 on a device - results like that are simply unheard of.
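To illustrate what this looks like from an application, here's a rough
userspace sketch of a single polled read through the raw aio syscalls (the
libaio ring can't be used with polling, as noted above). Treat it as
illustrative only: the placement of IOCB_FLAG_HIPRI in aio_flags and the value
defined below are assumptions for the sketch - see the patches for the actual
definition.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/aio_abi.h>

#ifndef IOCB_FLAG_HIPRI
#define IOCB_FLAG_HIPRI	(1 << 2)	/* illustrative value only */
#endif

/* thin syscall wrappers, since the libaio ring isn't used with polling */
static int io_setup(unsigned nr, aio_context_t *ctx)
{
	return syscall(__NR_io_setup, nr, ctx);
}

static int io_submit(aio_context_t ctx, long nr, struct iocb **iocbs)
{
	return syscall(__NR_io_submit, ctx, nr, iocbs);
}

static int io_getevents(aio_context_t ctx, long min_nr, long nr,
			struct io_event *events, struct timespec *timeout)
{
	return syscall(__NR_io_getevents, ctx, min_nr, nr, events, timeout);
}

int main(int argc, char **argv)
{
	struct iocb iocb, *iocbs[1] = { &iocb };
	aio_context_t ctx = 0;
	struct io_event ev;
	void *buf;
	int fd;

	if (argc < 2)
		return 1;
	fd = open(argv[1], O_RDONLY | O_DIRECT);
	if (fd < 0 || posix_memalign(&buf, 4096, 4096))
		return 1;
	if (io_setup(1, &ctx) < 0)
		return 1;

	memset(&iocb, 0, sizeof(iocb));
	iocb.aio_fildes = fd;
	iocb.aio_lio_opcode = IOCB_CMD_PREAD;
	iocb.aio_buf = (unsigned long) buf;
	iocb.aio_nbytes = 4096;
	iocb.aio_flags = IOCB_FLAG_HIPRI;	/* request a polled completion */

	if (io_submit(ctx, 1, iocbs) != 1)
		return 1;

	/* actively polls the device for the completion, doesn't sleep on it */
	if (io_getevents(ctx, 1, 1, &ev, NULL) == 1)
		printf("res=%lld\n", (long long) ev.res);
	return 0;
}

If the same ctx were later used for a non-polled submission, io_submit() would
return -EINVAL, per the mixing rule described above.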
You can find this code in my aio-poll branch; that branch (and these patches)
is on top of my mq-perf branch.

 fs/aio.c                     | 495 ++++++++++++++++++++++++++++++++---
 fs/block_dev.c               |   2 +
 fs/direct-io.c               |   4 +-
 fs/iomap.c                   |   7 +-
 include/linux/fs.h           |   1 +
 include/uapi/linux/aio_abi.h |   2 +
 6 files changed, 478 insertions(+), 33 deletions(-)