From: Josef Bacik
To: kernel-team@fb.com, linux-kernel@vger.kernel.org, hannes@cmpxchg.org, tj@kernel.org, linux-fsdevel@vger.kernel.org, akpm@linux-foundation.org, riel@redhat.com, linux-mm@kvack.org, linux-btrfs@vger.kernel.org
Subject: [RFC][PATCH 0/9][V2] drop the mmap_sem when doing IO in the fault path
Date: Wed, 26 Sep 2018 17:08:47 -0400
Message-Id: <20180926210856.7895-1-josef@toxicpanda.com>
X-Patchwork-Id: 10616831

v1->v2:
- reworked so it only affects x86, since it's the only arch I can build and test.
- fixed the fact that do_page_mkwrite wasn't actually sending ALLOW_RETRY down to ->page_mkwrite.
- fixed error handling in do_page_mkwrite/callers to explicitly catch VM_FAULT_RETRY.
- fixed btrfs to set ->cached_page properly.

This time I've verified that the ->page_mkwrite retry path is actually getting used (apparently I only verified the read side last time). xfstests is still running, but it passed the couple of mmap tests I ran directly. Again, this is an RFC and I'm still doing a bunch of testing, but I'd appreciate comments on the overall strategy.

-- Original message --

Now that we have proper isolation in place with cgroups2 we have started going through and fixing the various priority inversions. Most are gone now, but this one is sort of weird since it's not necessarily a priority inversion that happens within the kernel, but rather one caused by something userspace does.

We have giant applications that we want to protect, and parts of these giant applications do things like watch the system state to determine how healthy the box is for load balancing and such. This involves running 'ps' or other such utilities. These utilities will often walk /proc//whatever, and reading these files can sometimes need to down_read(&task->mmap_sem). Not usually a big deal, but we noticed during stress testing that our protected application sometimes has latency spikes trying to get the mmap_sem for tasks that are in lower priority cgroups.

This is because any pending down_write() on a semaphore essentially turns it into a mutex: even if it is currently held for reading, new readers are not let in, so that the writer is not starved. This is fine, except that a lower priority task could be stuck doing IO while holding the mmap_sem, because it has been throttled to the point that its IO is taking much longer than normal. Since a higher priority group depends on that IO completing, it is now stuck behind lower priority work.

In order to avoid this particular priority inversion we want to use the existing retry mechanism to avoid holding the mmap_sem at all if we are going to do IO. This already exists, sort of, in the read case, but it needed to be extended to cover more than just grabbing the page lock. With io.latency we throttle at submit_bio() time, so the readahead stuff can block and even page_cache_read can block, so all of these paths need to have the mmap_sem dropped.

The other big thing is ->page_mkwrite. btrfs is particularly shitty here because we have to reserve space for the dirty page, which can be a very expensive operation. We use the same retry method as the read path: we simply cache the page and verify that it is still set up properly on the next pass through ->page_mkwrite().

I've tested these patches with xfstests and there are no regressions. Let me know what you think. Thanks,

Josef
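
For illustration, the retry idea described above can be sketched in userspace C. This is a minimal sketch and not code from the patches: fake_mm, fake_fault, slow_io, fault_once and handle_fault are hypothetical names, the "lock" is a plain reader counter standing in for mmap_sem, and page_cached only loosely models caching the page and re-checking it on the second pass. It shows one pass dropping the lock before blocking IO and returning a retry result, and the caller re-taking the lock with retry disabled for the second pass.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names come from the patches. */
enum fault_result { FAULT_DONE, FAULT_RETRY };

struct fake_mm {
    int readers;            /* toy stand-in for mmap_sem held for read */
};

struct fake_fault {
    struct fake_mm *mm;
    bool allow_retry;       /* analogous to an "allow retry" fault flag */
    bool page_cached;       /* page was prepared by an earlier pass */
    int io_count;           /* blocking IOs performed */
    int io_with_lock_held;  /* IOs done while the lock was still held */
};

static void read_lock(struct fake_mm *mm)   { mm->readers++; }
static void read_unlock(struct fake_mm *mm) { assert(mm->readers > 0); mm->readers--; }

/* Pretend to do slow, possibly throttled IO and record the lock state. */
static void slow_io(struct fake_fault *f)
{
    f->io_count++;
    if (f->mm->readers > 0)
        f->io_with_lock_held++;
    f->page_cached = true;
}

/*
 * One pass of the fault handler, entered with the lock held for read.
 * Returns FAULT_DONE with the lock still held, or FAULT_RETRY after
 * dropping the lock and doing the IO without it.
 */
static enum fault_result fault_once(struct fake_fault *f)
{
    if (f->page_cached)
        return FAULT_DONE;          /* second pass: nothing left to wait on */

    if (f->allow_retry) {
        read_unlock(f->mm);         /* never block on IO under the lock */
        slow_io(f);
        return FAULT_RETRY;
    }

    slow_io(f);                     /* retry not allowed: IO under the lock */
    return FAULT_DONE;
}

/* Caller loop: retry at most once. Returns the number of passes taken. */
int handle_fault(struct fake_fault *f)
{
    int passes = 0;

    for (;;) {
        read_lock(f->mm);
        passes++;
        if (fault_once(f) == FAULT_DONE) {
            read_unlock(f->mm);
            return passes;
        }
        f->allow_retry = false;     /* lock was dropped; allow no more retries */
    }
}
```

With allow_retry set, handle_fault() takes two passes and the IO happens with the lock released; with it cleared, it takes one pass and the IO happens under the lock, which is the situation that stalls other readers once a writer is queued.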