From patchwork Fri Nov 1 17:34:00 2019
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 11223471
Subject: [RFC PATCH 00/11] pipe: Notification queue preparation [ver #3]
From: David Howells
To: torvalds@linux-foundation.org
Cc: dhowells@redhat.com, Rasmus Villemoes, Greg Kroah-Hartman,
    Peter Zijlstra, nicolas.dichtel@6wind.com, raven@themaw.net,
    Christian Brauner, dhowells@redhat.com, keyrings@vger.kernel.org,
    linux-usb@vger.kernel.org, linux-block@vger.kernel.org,
    linux-security-module@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-api@vger.kernel.org, linux-security-module@vger.kernel.org,
    linux-kernel@vger.kernel.org
Date: Fri, 01 Nov 2019 17:34:00 +0000
Message-ID: <157262963995.13142.5568934007158044624.stgit@warthog.procyon.org.uk>
User-Agent: StGit/unknown-version
X-Mailing-List: linux-fsdevel@vger.kernel.org

Here's a set of preparatory patches for building a general notification
queue on top of pipes.  It makes a number of significant changes:

 (1) It removes the nr_exclusive argument from __wake_up_sync_key() as this
     is always 1.  This prepares for step 2.

 (2) Adds wake_up_interruptible_sync_poll_locked() so that poll can be
     woken up from a function that's holding the poll waitqueue spinlock.

 (3) Change the pipe buffer ring to be managed in terms of unbounded head
     and tail indices rather than bounded index and length.  This means
     that reading the pipe only needs to modify one index, not two.

 (4) A selection of helper functions are provided to query the state of
     the pipe buffer, plus a couple to apply updates to the pipe indices.

 (5) The pipe ring is allowed to have kernel-reserved slots.
     This allows many notification messages to be spliced in by the
     kernel without allowing userspace to pin too many pages if it writes
     to the same pipe.

 (6) Advance the head and tail indices inside the pipe waitqueue lock and
     use step 2 to poke poll without having to take the lock twice.

 (7) Rearrange pipe_write() to preallocate the buffer it is going to write
     into and then drop the spinlock.  This allows kernel notifications to
     then be added to the ring whilst it is filling the buffer it
     allocated.  The read side is stalled because the pipe mutex is still
     held.

 (8) Don't wake up readers on a pipe if there was already data in it when
     we added more.

 (9) Don't wake up writers on a pipe if the ring wasn't full before we
     removed a buffer.

The patches can be found here also:

	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=notifications-pipe-prep

PATCHES BENCHMARK       BEST            TOTAL BYTES     AVG BYTES       STDDEV
======= =============== =============== =============== =============== ===============
-       pipe                  307457969     36348556755       302904639        10622403
-       splice                287117614     26933658717       224447155       160777958
-       vmsplice              435180375     51302964090       427524700        19083037

rm-nrx  pipe                  311091179     37093181356       309109844         7221622
rm-nrx  splice                285628049     27916298942       232635824       158296431
rm-nrx  vmsplice              417703153     47570362546       396419687        33960822

wakesl  pipe                  310698731     36772541631       306437846         8249347
wakesl  splice                286193726     28600435451       238336962       141169318
wakesl  vmsplice              436175803     50723895824       422699131        40724240

ht      pipe                  305534565     36426079543       303550662         5673885
ht      splice                243632025     23319439010       194328658       150479853
ht      vmsplice              432825176     49101781001       409181508        44102509

k-rsv   pipe                  308691523     36652267561       305435563        12972559
k-rsv   splice                244793528     23625172865       196876440       125319143
k-rsv   vmsplice              436119082     49460808579       412173404        55547525

r-adv-t pipe                  310094218     36860182219       307168185         8081101
r-adv-t splice                285527382     27085052687       225708772       206918887
r-adv-t vmsplice              336885948     40128756927       334406307         5895935

r-cond  pipe                  308727804     36635828180       305298568         9976806
r-cond  splice                284467568     28445793054       237048275       200284329
r-cond  vmsplice              449679489     51134833848       426123615        66790875

w-preal pipe                  307416578     36662086426       305517386         6216663
w-preal splice                282655051     28455249109       237127075       194154549
w-preal vmsplice              437002601     47832160621       398601338        96513019

w-redun pipe                  307279630     36329750422       302747920         8913567
w-redun splice                284324488     27327152734       227726272       219735663
w-redun vmsplice              451141971     51485257719       429043814        51388217

w-ckful pipe                  305055247     36374947350       303124561         5400728
w-ckful splice                281575308     26841554544       223679621       215942886
w-ckful vmsplice              436653588     47564907110       396374225        82255342

The patches column indicates the point in the patchset at which the
benchmarks were taken:

	-	No patches
	rm-nrx	"Remove the nr_exclusive argument from __wake_up_sync_key()"
	wakesl	"Add wake_up_interruptible_sync_poll_locked()"
	ht	"pipe: Use head and tail pointers for the ring, not cursor and length"
	k-rsv	"pipe: Allow pipes to have kernel-reserved slots"
	r-adv-t	"pipe: Advance tail pointer inside of wait spinlock in pipe_read()"
	r-cond	"pipe: Conditionalise wakeup in pipe_read()"
	w-preal	"pipe: Rearrange sequence in pipe_write() to preallocate slot"
	w-redun	"pipe: Remove redundant wakeup from pipe_write()"
	w-ckful	"pipe: Check for ring full inside of the spinlock in pipe_write()"

Changes:

 ver #3:

 (*) Get rid of pipe_commit_{read,write}.

 (*) Port the virtio_console driver.

 (*) Fix pipe_zero().

 (*) Amend some comments.

 (*) Added an additional patch that changes the threshold at which readers
     wake writers for Konstantin Khlebnikov.

 ver #2:

 (*) Split the notification patches out into a separate branch.

 (*) Removed the nr_exclusive parameter from __wake_up_sync_key().

 (*) Renamed the locked wakeup function.

 (*) Add helpers for empty, full, occupancy.

 (*) Split the addition of ->max_usage out into its own patch.

 (*) Fixed some bits pointed out by Rasmus Villemoes.

 ver #1:

 (*) Build on top of standard pipes instead of having a driver.

David
---
David Howells (11):
      pipe: Reduce #inclusion of pipe_fs_i.h
      Remove the nr_exclusive argument from __wake_up_sync_key()
      Add wake_up_interruptible_sync_poll_locked()
      pipe: Use head and tail pointers for the ring, not cursor and length
      pipe: Allow pipes to have kernel-reserved slots
      pipe: Advance tail pointer inside of wait spinlock in pipe_read()
      pipe: Conditionalise wakeup in pipe_read()
      pipe: Rearrange sequence in pipe_write() to preallocate slot
      pipe: Remove redundant wakeup from pipe_write()
      pipe: Check for ring full inside of the spinlock in pipe_write()
      pipe: Increase the writer-wakeup threshold to reduce context-switch count


 drivers/char/virtio_console.c |   16 +-
 fs/exec.c                     |    1 
 fs/fuse/dev.c                 |   31 +++--
 fs/ocfs2/aops.c               |    1 
 fs/pipe.c                     |  228 +++++++++++++++++++++--------------
 fs/splice.c                   |  190 ++++++++++++++++++-----------
 include/linux/pipe_fs_i.h     |   64 +++++++++-
 include/linux/uio.h           |    4 -
 include/linux/wait.h          |   11 +-
 kernel/exit.c                 |    2 
 kernel/sched/wait.c           |   37 ++++--
 lib/iov_iter.c                |  269 +++++++++++++++++++++++------------------
 security/smack/smack_lsm.c    |    1 
 13 files changed, 527 insertions(+), 328 deletions(-)
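
For anyone unfamiliar with the indexing scheme in change (3), here is a
minimal userspace sketch.  It is not code from the patches: struct ring,
RING_SIZE and the ring_*() helpers are hypothetical names chosen for
illustration, and there is no locking.  It assumes a power-of-two ring so
that the free-running head and tail counters can be masked to find a slot.
The return values of ring_push() and ring_pop() illustrate the wakeup
conditions described in changes (8) and (9).

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 16u	/* hypothetical; must be a power of two */

struct ring {
	unsigned int head;	/* producer counter; only ever incremented */
	unsigned int tail;	/* consumer counter; only ever incremented */
	int slots[RING_SIZE];
};

/* Number of occupied slots.  Unsigned subtraction wraps modulo 2^32, so
 * this stays correct after the counters overflow. */
static inline unsigned int ring_occupancy(const struct ring *r)
{
	return r->head - r->tail;
}

static inline bool ring_empty(const struct ring *r)
{
	return r->head == r->tail;
}

/* 'limit' is the number of slots the caller is allowed to fill. */
static inline bool ring_full(const struct ring *r, unsigned int limit)
{
	return ring_occupancy(r) >= limit;
}

/* Add one item.  Returns true if the ring was empty beforehand, which is
 * the only case in which a sleeping reader would need waking. */
static inline bool ring_push(struct ring *r, int v)
{
	bool was_empty = ring_empty(r);

	assert(!ring_full(r, RING_SIZE));
	r->slots[r->head & (RING_SIZE - 1)] = v;
	r->head++;		/* the producer touches head only */
	return was_empty;
}

/* Remove one item into *v.  Returns true if the ring was full beforehand,
 * which is the only case in which a sleeping writer would need waking. */
static inline bool ring_pop(struct ring *r, int *v)
{
	bool was_full = ring_full(r, RING_SIZE);

	assert(!ring_empty(r));
	*v = r->slots[r->tail & (RING_SIZE - 1)];
	r->tail++;		/* the consumer touches tail only */
	return was_full;
}
```

Because the counters are never reduced modulo the ring size, a read
modifies only tail, whereas a cursor-and-length scheme has to update both
fields on every read.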