From patchwork Tue Jun 11 12:24:46 2019
X-Patchwork-Submitter: Thomas Hellström (VMware)
X-Patchwork-Id: 10986763
From: Thomas Hellström (VMware)
To: dri-devel@lists.freedesktop.org
Cc: linux-graphics-maintainer@vmware.com, pv-drivers@vmware.com,
    linux-kernel@vger.kernel.org, Thomas Hellstrom, Andrew Morton,
    Matthew Wilcox, Will Deacon, Peter Zijlstra, Rik van Riel,
    Minchan Kim, Michal Hocko, Huang Ying, Souptick Joarder,
    Jérôme Glisse, linux-mm@kvack.org, Ralph Campbell
Subject: [PATCH v4 1/9] mm: Allow the [page|pfn]_mkwrite callbacks to drop the mmap_sem
Date: Tue, 11 Jun 2019 14:24:46 +0200
Message-Id: <20190611122454.3075-2-thellstrom@vmwopensource.org>
In-Reply-To: <20190611122454.3075-1-thellstrom@vmwopensource.org>
References: <20190611122454.3075-1-thellstrom@vmwopensource.org>

From: Thomas Hellstrom

Driver fault callbacks are allowed to drop the mmap_sem when expecting
long hardware waits to avoid blocking other mm users. Allow the mkwrite
callbacks to do the same by returning early on VM_FAULT_RETRY.

In particular we want to be able to drop the mmap_sem when waiting for
a reservation object lock on a GPU buffer object. These locks may be
held while waiting for the GPU.

Cc: Andrew Morton
Cc: Matthew Wilcox
Cc: Will Deacon
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: Minchan Kim
Cc: Michal Hocko
Cc: Huang Ying
Cc: Souptick Joarder
Cc: "Jérôme Glisse"
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Thomas Hellstrom
Reviewed-by: Ralph Campbell
---
 mm/memory.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index ddf20bd0c317..168f546af1ad 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2238,7 +2238,7 @@ static vm_fault_t do_page_mkwrite(struct vm_fault *vmf)
 	ret = vmf->vma->vm_ops->page_mkwrite(vmf);
 	/* Restore original flags so that caller is not surprised */
 	vmf->flags = old_flags;
-	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))
+	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		return ret;
 	if (unlikely(!(ret & VM_FAULT_LOCKED))) {
 		lock_page(page);
@@ -2515,7 +2515,7 @@ static vm_fault_t wp_pfn_shared(struct vm_fault *vmf)
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
 		vmf->flags |= FAULT_FLAG_MKWRITE;
 		ret = vma->vm_ops->pfn_mkwrite(vmf);
-		if (ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE))
+		if (ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY))
 			return ret;
 		return finish_mkwrite_fault(vmf);
 	}
@@ -2536,7 +2536,8 @@ static vm_fault_t wp_page_shared(struct vm_fault *vmf)
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
 		tmp = do_page_mkwrite(vmf);
 		if (unlikely(!tmp || (tmp &
-				      (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
+				      (VM_FAULT_ERROR | VM_FAULT_NOPAGE |
+				       VM_FAULT_RETRY)))) {
 			put_page(vmf->page);
 			return tmp;
 		}
@@ -3601,7 +3602,8 @@ static vm_fault_t do_shared_fault(struct vm_fault *vmf)
 		unlock_page(vmf->page);
 		tmp = do_page_mkwrite(vmf);
 		if (unlikely(!tmp ||
-			     (tmp & (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
+			     (tmp & (VM_FAULT_ERROR | VM_FAULT_NOPAGE |
+				     VM_FAULT_RETRY)))) {
 			put_page(vmf->page);
 			return tmp;
 		}
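For illustration, a driver's page_mkwrite() handler can now follow the same
convention as fault handlers. The sketch below is not part of the patch:
struct my_bo and the my_bo_*() helpers are hypothetical stand-ins for a
driver's buffer object and its reservation locking; only the
FAULT_FLAG/VM_FAULT_RETRY handling relies on the semantics introduced above.

static vm_fault_t my_bo_page_mkwrite(struct vm_fault *vmf)
{
	struct my_bo *bo = vmf->vma->vm_private_data;

	/* Try to take the reservation lock without sleeping. */
	if (!my_bo_reserve_trylock(bo)) {
		/*
		 * The lock may be held while waiting for the GPU. If the
		 * core allows a retry, drop the mmap_sem, wait for the
		 * lock outside of it and ask for the fault to be retried.
		 */
		if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) {
			if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {
				up_read(&vmf->vma->vm_mm->mmap_sem);
				my_bo_reserve(bo);	/* may wait for the GPU */
				my_bo_unreserve(bo);
			}
			return VM_FAULT_RETRY;
		}
		my_bo_reserve(bo);	/* No retry allowed: block here. */
	}

	my_bo_mark_dirty(bo);
	my_bo_unreserve(bo);
	return 0;
}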
From patchwork Tue Jun 11 12:24:47 2019
X-Patchwork-Submitter: Thomas Hellström (VMware)
X-Patchwork-Id: 10986769
From: Thomas Hellström (VMware)
To: dri-devel@lists.freedesktop.org
Cc: linux-graphics-maintainer@vmware.com, pv-drivers@vmware.com,
    linux-kernel@vger.kernel.org, Thomas Hellstrom, Andrew Morton,
    Matthew Wilcox, Will Deacon, Peter Zijlstra, Rik van Riel,
    Minchan Kim, Michal Hocko, Huang Ying, Souptick Joarder,
    Jérôme Glisse, linux-mm@kvack.org, Ralph Campbell
Subject: [PATCH v4 2/9] mm: Add an apply_to_pfn_range interface
Date: Tue, 11 Jun 2019 14:24:47 +0200
Message-Id: <20190611122454.3075-3-thellstrom@vmwopensource.org>
In-Reply-To: <20190611122454.3075-1-thellstrom@vmwopensource.org>
References: <20190611122454.3075-1-thellstrom@vmwopensource.org>

From: Thomas Hellstrom

This is basically apply_to_page_range() with added functionality:
allocating missing parts of the page table becomes optional, which
means that the function can be guaranteed not to fail if allocation
is disabled.
Also, the closure struct and the callback function are now passed
differently, more in line with how things are done elsewhere. Finally,
we keep apply_to_page_range() as a wrapper around apply_to_pfn_range().

The reason for not using the page-walk code is that we want to perform
the page-walk on vmas pointing to an address space, without requiring
the mmap_sem to be held, rather than on vmas belonging to a process
with the mmap_sem held.

Notable changes since RFC:
- Don't export apply_to_pfn_range().

Cc: Andrew Morton
Cc: Matthew Wilcox
Cc: Will Deacon
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: Minchan Kim
Cc: Michal Hocko
Cc: Huang Ying
Cc: Souptick Joarder
Cc: "Jérôme Glisse"
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Thomas Hellstrom
Reviewed-by: Ralph Campbell #v1
---
 include/linux/mm.h |  10 ++++
 mm/memory.c        | 135 ++++++++++++++++++++++++++++++++++-----------
 2 files changed, 113 insertions(+), 32 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0e8834ac32b7..3d06ce2a64af 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2675,6 +2675,16 @@ typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
 extern int apply_to_page_range(struct mm_struct *mm, unsigned long address,
 			       unsigned long size, pte_fn_t fn, void *data);
 
+struct pfn_range_apply;
+typedef int (*pter_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
+			 struct pfn_range_apply *closure);
+struct pfn_range_apply {
+	struct mm_struct *mm;
+	pter_fn_t ptefn;
+	unsigned int alloc;
+};
+extern int apply_to_pfn_range(struct pfn_range_apply *closure,
+			      unsigned long address, unsigned long size);
 
 #ifdef CONFIG_PAGE_POISONING
 extern bool page_poisoning_enabled(void);
diff --git a/mm/memory.c b/mm/memory.c
index 168f546af1ad..462aa47f8878 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2032,18 +2032,17 @@ int vm_iomap_memory(struct vm_area_struct *vma, phys_addr_t start, unsigned long
 }
 EXPORT_SYMBOL(vm_iomap_memory);
 
-static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
-			      unsigned long addr, unsigned long end,
-			      pte_fn_t fn, void *data)
+static int apply_to_pte_range(struct pfn_range_apply *closure, pmd_t *pmd,
+			      unsigned long addr, unsigned long end)
 {
 	pte_t *pte;
 	int err;
 	pgtable_t token;
 	spinlock_t *uninitialized_var(ptl);
 
-	pte = (mm == &init_mm) ?
 		pte_alloc_kernel(pmd, addr) :
-		pte_alloc_map_lock(mm, pmd, addr, &ptl);
+		pte_alloc_map_lock(closure->mm, pmd, addr, &ptl);
 	if (!pte)
 		return -ENOMEM;
 
@@ -2054,86 +2053,109 @@ static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
 	token = pmd_pgtable(*pmd);
 
 	do {
-		err = fn(pte++, token, addr, data);
+		err = closure->ptefn(pte++, token, addr, closure);
 		if (err)
 			break;
 	} while (addr += PAGE_SIZE, addr != end);
 
 	arch_leave_lazy_mmu_mode();
 
-	if (mm != &init_mm)
+	if (closure->mm != &init_mm)
 		pte_unmap_unlock(pte-1, ptl);
 	return err;
 }
 
-static int apply_to_pmd_range(struct mm_struct *mm, pud_t *pud,
-			      unsigned long addr, unsigned long end,
-			      pte_fn_t fn, void *data)
+static int apply_to_pmd_range(struct pfn_range_apply *closure, pud_t *pud,
+			      unsigned long addr, unsigned long end)
 {
 	pmd_t *pmd;
 	unsigned long next;
-	int err;
+	int err = 0;
 
 	BUG_ON(pud_huge(*pud));
 
-	pmd = pmd_alloc(mm, pud, addr);
+	pmd = pmd_alloc(closure->mm, pud, addr);
 	if (!pmd)
 		return -ENOMEM;
+
 	do {
 		next = pmd_addr_end(addr, end);
-		err = apply_to_pte_range(mm, pmd, addr, next, fn, data);
+		if (!closure->alloc && pmd_none_or_clear_bad(pmd))
+			continue;
+		err = apply_to_pte_range(closure, pmd, addr, next);
 		if (err)
 			break;
 	} while (pmd++, addr = next, addr != end);
 	return err;
 }
 
-static int apply_to_pud_range(struct mm_struct *mm, p4d_t *p4d,
-			      unsigned long addr, unsigned long end,
-			      pte_fn_t fn, void *data)
+static int apply_to_pud_range(struct pfn_range_apply *closure, p4d_t *p4d,
+			      unsigned long addr, unsigned long end)
 {
 	pud_t *pud;
 	unsigned long next;
-	int err;
+	int err = 0;
 
-	pud = pud_alloc(mm, p4d, addr);
+	pud = pud_alloc(closure->mm, p4d, addr);
 	if (!pud)
 		return -ENOMEM;
+
 	do {
 		next = pud_addr_end(addr, end);
-		err = apply_to_pmd_range(mm, pud, addr, next, fn, data);
+		if (!closure->alloc && pud_none_or_clear_bad(pud))
+			continue;
+		err = apply_to_pmd_range(closure, pud, addr, next);
 		if (err)
 			break;
 	} while (pud++, addr = next, addr != end);
 	return err;
 }
 
-static int apply_to_p4d_range(struct mm_struct *mm, pgd_t *pgd,
-			      unsigned long addr, unsigned long end,
-			      pte_fn_t fn, void *data)
+static int apply_to_p4d_range(struct pfn_range_apply *closure, pgd_t *pgd,
+			      unsigned long addr, unsigned long end)
 {
 	p4d_t *p4d;
 	unsigned long next;
-	int err;
+	int err = 0;
 
-	p4d = p4d_alloc(mm, pgd, addr);
+	p4d = p4d_alloc(closure->mm, pgd, addr);
 	if (!p4d)
 		return -ENOMEM;
+
 	do {
 		next = p4d_addr_end(addr, end);
-		err = apply_to_pud_range(mm, p4d, addr, next, fn, data);
+		if (!closure->alloc && p4d_none_or_clear_bad(p4d))
+			continue;
+		err = apply_to_pud_range(closure, p4d, addr, next);
 		if (err)
 			break;
 	} while (p4d++, addr = next, addr != end);
 	return err;
 }
 
-/*
- * Scan a region of virtual memory, filling in page tables as necessary
- * and calling a provided function on each leaf page table.
+/**
+ * apply_to_pfn_range - Scan a region of virtual memory, calling a provided
+ * function on each leaf page table entry
+ * @closure: Details about how to scan and what function to apply
+ * @addr: Start virtual address
+ * @size: Size of the region
+ *
+ * If @closure->alloc is set to 1, the function will fill in the page table
+ * as necessary. Otherwise it will skip non-present parts.
+ * Note: The caller must ensure that the range does not contain huge pages.
+ * The caller must also assure that the proper mmu_notifier functions are
+ * called before and after the call to apply_to_pfn_range.
+ *
+ * WARNING: Do not use this function unless you know exactly what you are
+ * doing. It is lacking support for huge pages and transparent huge pages.
+ *
+ * Return: Zero on success. If the provided function returns a non-zero status,
+ * the page table walk will terminate and that status will be returned.
+ * If @closure->alloc is set to 1, then this function may also return memory
+ * allocation errors arising from allocating page table memory.
  */
-int apply_to_page_range(struct mm_struct *mm, unsigned long addr,
-			unsigned long size, pte_fn_t fn, void *data)
+int apply_to_pfn_range(struct pfn_range_apply *closure,
+		       unsigned long addr, unsigned long size)
 {
 	pgd_t *pgd;
 	unsigned long next;
@@ -2143,16 +2165,65 @@ int apply_to_page_range(struct mm_struct *mm, unsigned long addr,
 	if (WARN_ON(addr >= end))
 		return -EINVAL;
 
-	pgd = pgd_offset(mm, addr);
+	pgd = pgd_offset(closure->mm, addr);
 	do {
 		next = pgd_addr_end(addr, end);
-		err = apply_to_p4d_range(mm, pgd, addr, next, fn, data);
+		if (!closure->alloc && pgd_none_or_clear_bad(pgd))
+			continue;
+		err = apply_to_p4d_range(closure, pgd, addr, next);
 		if (err)
 			break;
 	} while (pgd++, addr = next, addr != end);
 	return err;
 }
+
+/**
+ * struct page_range_apply - Closure structure for apply_to_page_range()
+ * @pter: The base closure structure we derive from
+ * @fn: The leaf pte function to call
+ * @data: The leaf pte function closure
+ */
+struct page_range_apply {
+	struct pfn_range_apply pter;
+	pte_fn_t fn;
+	void *data;
+};
+
+/*
+ * Callback wrapper to enable use of apply_to_pfn_range for
+ * the apply_to_page_range interface
+ */
+static int apply_to_page_range_wrapper(pte_t *pte, pgtable_t token,
+				       unsigned long addr,
+				       struct pfn_range_apply *pter)
+{
+	struct page_range_apply *pra =
+		container_of(pter, typeof(*pra), pter);
+
+	return pra->fn(pte, token, addr, pra->data);
+}
+
+/*
+ * Scan a region of virtual memory, filling in page tables as necessary
+ * and calling a provided function on each leaf page table.
+ *
+ * WARNING: Do not use this function unless you know exactly what you are
+ * doing. It is lacking support for huge pages and transparent huge pages.
+ */
+int apply_to_page_range(struct mm_struct *mm, unsigned long addr,
+			unsigned long size, pte_fn_t fn, void *data)
+{
+	struct page_range_apply pra = {
+		.pter = {.mm = mm,
+			 .alloc = 1,
+			 .ptefn = apply_to_page_range_wrapper },
+		.fn = fn,
+		.data = data
+	};
+
+	return apply_to_pfn_range(&pra.pter, addr, size);
+}
 EXPORT_SYMBOL_GPL(apply_to_page_range);
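As a usage illustration of the new interface (not taken from the series): a
caller that only wants to visit already-populated page-table entries embeds
a struct pfn_range_apply in its own closure, leaves ->alloc at 0 and provides
a pter_fn_t callback. The my_* names below are invented for the example, and
the mmu_notifier calls and locking that a real caller needs around the walk
are omitted.

struct my_walk {
	struct pfn_range_apply base;
	unsigned long nr_present;
};

static int my_count_pte(pte_t *pte, pgtable_t token, unsigned long addr,
			struct pfn_range_apply *closure)
{
	struct my_walk *walk = container_of(closure, typeof(*walk), base);

	if (pte_present(*pte))
		walk->nr_present++;
	return 0;
}

static unsigned long my_count_present(struct mm_struct *mm,
				      unsigned long start, unsigned long size)
{
	struct my_walk walk = {
		.base = {
			.mm = mm,
			.ptefn = my_count_pte,
			.alloc = 0,	/* skip non-present page-table levels */
		},
	};

	/* With ->alloc == 0 the walk itself cannot fail with -ENOMEM. */
	WARN_ON(apply_to_pfn_range(&walk.base, start, size));
	return walk.nr_present;
}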
From patchwork Tue Jun 11 12:24:48 2019
X-Patchwork-Submitter: Thomas Hellström (VMware)
X-Patchwork-Id: 10986767
From: Thomas Hellström (VMware)
To: dri-devel@lists.freedesktop.org
Cc: linux-graphics-maintainer@vmware.com, pv-drivers@vmware.com,
    linux-kernel@vger.kernel.org, Thomas Hellstrom, Andrew Morton,
    Matthew Wilcox, Will Deacon, Peter Zijlstra, Rik van Riel,
    Minchan Kim, Michal Hocko, Huang Ying, Souptick Joarder,
    Jérôme Glisse, linux-mm@kvack.org, Ralph Campbell
Subject: [PATCH v4 3/9] mm: Add write-protect and clean utilities for address space ranges
Date: Tue, 11 Jun 2019 14:24:48 +0200
Message-Id: <20190611122454.3075-4-thellstrom@vmwopensource.org>
In-Reply-To: <20190611122454.3075-1-thellstrom@vmwopensource.org>
References: <20190611122454.3075-1-thellstrom@vmwopensource.org>

From: Thomas Hellstrom

Add two utilities to a) write-protect and b) clean all ptes pointing into
a range of an address space. The utilities are intended to aid in tracking
dirty pages (either driver-allocated system memory or PCI device memory).

The write-protect utility should be used in conjunction with
page_mkwrite() and pfn_mkwrite() to trigger write page-faults on page
accesses. Typically one would want to use this for sparse accesses into
large memory regions. The clean utility should be used to utilize
hardware dirtying functionality and avoid the overhead of page-faults,
typically for large accesses into small memory regions.

The added file "as_dirty_helpers.c" is initially listed as maintained by
VMware under our DRM driver. If somebody would like it elsewhere,
that's of course no problem.

Notable changes since RFC:
- Added comments to help avoid the usage of these functions for VMAs
  they are not intended for. We also do advisory checks on the vm_flags
  and warn on illegal usage.
- Perform the pte modifications the same way softdirty does.
- Add mmu_notifier range invalidation calls.
- Add a config option so that this code is not unconditionally included.
- Tell the mmu_gather code about pending tlb flushes.
Cc: Andrew Morton Cc: Matthew Wilcox Cc: Will Deacon Cc: Peter Zijlstra Cc: Rik van Riel Cc: Minchan Kim Cc: Michal Hocko Cc: Huang Ying Cc: Souptick Joarder Cc: "Jérôme Glisse" Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Thomas Hellstrom Reviewed-by: Ralph Campbell #v1 --- MAINTAINERS | 1 + include/linux/mm.h | 9 +- mm/Kconfig | 3 + mm/Makefile | 1 + mm/as_dirty_helpers.c | 298 ++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 311 insertions(+), 1 deletion(-) create mode 100644 mm/as_dirty_helpers.c diff --git a/MAINTAINERS b/MAINTAINERS index 7a2f487ea49a..a55d4ef91b0b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5179,6 +5179,7 @@ T: git git://people.freedesktop.org/~thomash/linux S: Supported F: drivers/gpu/drm/vmwgfx/ F: include/uapi/drm/vmwgfx_drm.h +F: mm/as_dirty_helpers.c DRM DRIVERS M: David Airlie diff --git a/include/linux/mm.h b/include/linux/mm.h index 3d06ce2a64af..a0bc2a82917e 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2685,7 +2685,14 @@ struct pfn_range_apply { }; extern int apply_to_pfn_range(struct pfn_range_apply *closure, unsigned long address, unsigned long size); - +unsigned long apply_as_wrprotect(struct address_space *mapping, + pgoff_t first_index, pgoff_t nr); +unsigned long apply_as_clean(struct address_space *mapping, + pgoff_t first_index, pgoff_t nr, + pgoff_t bitmap_pgoff, + unsigned long *bitmap, + pgoff_t *start, + pgoff_t *end); #ifdef CONFIG_PAGE_POISONING extern bool page_poisoning_enabled(void); extern void kernel_poison_pages(struct page *page, int numpages, int enable); diff --git a/mm/Kconfig b/mm/Kconfig index f0c76ba47695..5006d0e6a5c7 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -765,4 +765,7 @@ config GUP_BENCHMARK config ARCH_HAS_PTE_SPECIAL bool +config AS_DIRTY_HELPERS + bool + endmenu diff --git a/mm/Makefile b/mm/Makefile index ac5e5ba78874..f5d412bbc2f7 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -104,3 +104,4 @@ obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o obj-$(CONFIG_PERCPU_STATS) += percpu-stats.o obj-$(CONFIG_HMM) += hmm.o obj-$(CONFIG_MEMFD_CREATE) += memfd.o +obj-$(CONFIG_AS_DIRTY_HELPERS) += as_dirty_helpers.o diff --git a/mm/as_dirty_helpers.c b/mm/as_dirty_helpers.c new file mode 100644 index 000000000000..ccaad41b70bc --- /dev/null +++ b/mm/as_dirty_helpers.c @@ -0,0 +1,298 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include +#include + +/** + * struct apply_as - Closure structure for apply_as_range + * @base: struct pfn_range_apply we derive from + * @start: Address of first modified pte + * @end: Address of last modified pte + 1 + * @total: Total number of modified ptes + * @vma: Pointer to the struct vm_area_struct we're currently operating on + */ +struct apply_as { + struct pfn_range_apply base; + unsigned long start; + unsigned long end; + unsigned long total; + struct vm_area_struct *vma; +}; + +/** + * apply_pt_wrprotect - Leaf pte callback to write-protect a pte + * @pte: Pointer to the pte + * @token: Page table token, see apply_to_pfn_range() + * @addr: The virtual page address + * @closure: Pointer to a struct pfn_range_apply embedded in a + * struct apply_as + * + * The function write-protects a pte and records the range in + * virtual address space of touched ptes for efficient range TLB flushes. + * + * Return: Always zero. 
+ */ +static int apply_pt_wrprotect(pte_t *pte, pgtable_t token, + unsigned long addr, + struct pfn_range_apply *closure) +{ + struct apply_as *aas = container_of(closure, typeof(*aas), base); + pte_t ptent = *pte; + + if (pte_write(ptent)) { + pte_t old_pte = ptep_modify_prot_start(aas->vma, addr, pte); + + ptent = pte_wrprotect(old_pte); + ptep_modify_prot_commit(aas->vma, addr, pte, old_pte, ptent); + aas->total++; + aas->start = min(aas->start, addr); + aas->end = max(aas->end, addr + PAGE_SIZE); + } + + return 0; +} + +/** + * struct apply_as_clean - Closure structure for apply_as_clean + * @base: struct apply_as we derive from + * @bitmap_pgoff: Address_space Page offset of the first bit in @bitmap + * @bitmap: Bitmap with one bit for each page offset in the address_space range + * covered. + * @start: Address_space page offset of first modified pte relative + * to @bitmap_pgoff + * @end: Address_space page offset of last modified pte relative + * to @bitmap_pgoff + */ +struct apply_as_clean { + struct apply_as base; + pgoff_t bitmap_pgoff; + unsigned long *bitmap; + pgoff_t start; + pgoff_t end; +}; + +/** + * apply_pt_clean - Leaf pte callback to clean a pte + * @pte: Pointer to the pte + * @token: Page table token, see apply_to_pfn_range() + * @addr: The virtual page address + * @closure: Pointer to a struct pfn_range_apply embedded in a + * struct apply_as_clean + * + * The function cleans a pte and records the range in + * virtual address space of touched ptes for efficient TLB flushes. + * It also records dirty ptes in a bitmap representing page offsets + * in the address_space, as well as the first and last of the bits + * touched. + * + * Return: Always zero. + */ +static int apply_pt_clean(pte_t *pte, pgtable_t token, + unsigned long addr, + struct pfn_range_apply *closure) +{ + struct apply_as *aas = container_of(closure, typeof(*aas), base); + struct apply_as_clean *clean = container_of(aas, typeof(*clean), base); + pte_t ptent = *pte; + + if (pte_dirty(ptent)) { + pgoff_t pgoff = ((addr - aas->vma->vm_start) >> PAGE_SHIFT) + + aas->vma->vm_pgoff - clean->bitmap_pgoff; + pte_t old_pte = ptep_modify_prot_start(aas->vma, addr, pte); + + ptent = pte_mkclean(old_pte); + ptep_modify_prot_commit(aas->vma, addr, pte, old_pte, ptent); + + aas->total++; + aas->start = min(aas->start, addr); + aas->end = max(aas->end, addr + PAGE_SIZE); + + __set_bit(pgoff, clean->bitmap); + clean->start = min(clean->start, pgoff); + clean->end = max(clean->end, pgoff + 1); + } + + return 0; +} + +/** + * apply_as_range - Apply a pte callback to all PTEs pointing into a range + * of an address_space. + * @mapping: Pointer to the struct address_space + * @aas: Closure structure + * @first_index: First page offset in the address_space + * @nr: Number of incremental page offsets to cover + * + * Return: Number of ptes touched. Note that this number might be larger + * than @nr if there are overlapping vmas + */ +static unsigned long apply_as_range(struct address_space *mapping, + struct apply_as *aas, + pgoff_t first_index, pgoff_t nr) +{ + struct vm_area_struct *vma; + pgoff_t vba, vea, cba, cea; + unsigned long start_addr, end_addr; + struct mmu_notifier_range range; + + i_mmap_lock_read(mapping); + vma_interval_tree_foreach(vma, &mapping->i_mmap, first_index, + first_index + nr - 1) { + unsigned long vm_flags = READ_ONCE(vma->vm_flags); + + /* + * We can only do advisory flag tests below, since we can't + * require the vm's mmap_sem to be held to protect the flags. 
+ * Therefore, callers that strictly depend on specific mmap + * flags to remain constant throughout the operation must + * either ensure those flags are immutable for all relevant + * vmas or can't use this function. Fixing this properly would + * require the vma::vm_flags to be protected by a separate + * lock taken after the i_mmap_lock + */ + + /* Skip non-applicable VMAs */ + if ((vm_flags & (VM_SHARED | VM_WRITE)) != + (VM_SHARED | VM_WRITE)) + continue; + + /* Warn on and skip VMAs whose flags indicate illegal usage */ + if (WARN_ON((vm_flags & (VM_HUGETLB | VM_IO)) != VM_IO)) + continue; + + /* Clip to the vma */ + vba = vma->vm_pgoff; + vea = vba + vma_pages(vma); + cba = first_index; + cba = max(cba, vba); + cea = first_index + nr; + cea = min(cea, vea); + + /* Translate to virtual address */ + start_addr = ((cba - vba) << PAGE_SHIFT) + vma->vm_start; + end_addr = ((cea - vba) << PAGE_SHIFT) + vma->vm_start; + if (start_addr >= end_addr) + continue; + + aas->base.mm = vma->vm_mm; + aas->vma = vma; + aas->start = end_addr; + aas->end = start_addr; + + mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_PAGE, 0, + vma, vma->vm_mm, start_addr, end_addr); + mmu_notifier_invalidate_range_start(&range); + + /* Needed when we only change protection? */ + flush_cache_range(vma, start_addr, end_addr); + + /* + * We're not using tlb_gather_mmu() since typically + * only a small subrange of PTEs are affected. + */ + inc_tlb_flush_pending(vma->vm_mm); + + /* Should not error since aas->base.alloc == 0 */ + WARN_ON(apply_to_pfn_range(&aas->base, start_addr, + end_addr - start_addr)); + if (aas->end > aas->start) + flush_tlb_range(vma, aas->start, aas->end); + + mmu_notifier_invalidate_range_end(&range); + dec_tlb_flush_pending(vma->vm_mm); + } + i_mmap_unlock_read(mapping); + + return aas->total; +} + +/** + * apply_as_wrprotect - Write-protect all ptes in an address_space range + * @mapping: The address_space we want to write protect + * @first_index: The first page offset in the range + * @nr: Number of incremental page offsets to cover + * + * WARNING: This function should only be used for address spaces that + * completely own the pages / memory the page table points to. Typically a + * device file. + * + * Return: The number of ptes actually write-protected. Note that + * already write-protected ptes are not counted. + */ +unsigned long apply_as_wrprotect(struct address_space *mapping, + pgoff_t first_index, pgoff_t nr) +{ + struct apply_as aas = { + .base = { + .alloc = 0, + .ptefn = apply_pt_wrprotect, + }, + .total = 0, + }; + + return apply_as_range(mapping, &aas, first_index, nr); +} +EXPORT_SYMBOL(apply_as_wrprotect); + +/** + * apply_as_clean - Clean all ptes in an address_space range + * @mapping: The address_space we want to clean + * @first_index: The first page offset in the range + * @nr: Number of incremental page offsets to cover + * @bitmap_pgoff: The page offset of the first bit in @bitmap + * @bitmap: Pointer to a bitmap of at least @nr bits. The bitmap needs to + * cover the whole range @first_index..@first_index + @nr. + * @start: Pointer to number of the first set bit in @bitmap. + * is modified as new bits are set by the function. + * @end: Pointer to the number of the last set bit in @bitmap. + * none set. The value is modified as new bits are set by the function. + * + * Note: When this function returns there is no guarantee that a CPU has + * not already dirtied new ptes. However it will not clean any ptes not + * reported in the bitmap. 
+ * + * If a caller needs to make sure all dirty ptes are picked up and none + * additional are added, it first needs to write-protect the address-space + * range and make sure new writers are blocked in page_mkwrite() or + * pfn_mkwrite(). And then after a TLB flush following the write-protection + * pick up all dirty bits. + * + * WARNING: This function should only be used for address spaces that + * completely own the pages / memory the page table points to. Typically a + * device file. + * + * Return: The number of dirty ptes actually cleaned. + */ +unsigned long apply_as_clean(struct address_space *mapping, + pgoff_t first_index, pgoff_t nr, + pgoff_t bitmap_pgoff, + unsigned long *bitmap, + pgoff_t *start, + pgoff_t *end) +{ + bool none_set = (*start >= *end); + struct apply_as_clean clean = { + .base = { + .base = { + .alloc = 0, + .ptefn = apply_pt_clean, + }, + .total = 0, + }, + .bitmap_pgoff = bitmap_pgoff, + .bitmap = bitmap, + .start = none_set ? nr : *start, + .end = none_set ? 0 : *end, + }; + unsigned long ret = apply_as_range(mapping, &clean.base, first_index, + nr); + + *start = clean.start; + *end = clean.end; + return ret; +} +EXPORT_SYMBOL(apply_as_clean);
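To put the two helpers in context, a driver tracking dirty pages of a
buffer object backed by an address_space might use them roughly as in the
sketch below. This is only an illustration of the call pattern described in
the commit message: struct my_bo, its fields and my_bo_upload_range() are
hypothetical, and the page_mkwrite()/pfn_mkwrite() side of the scheme as
well as locking against concurrent faults are omitted.

struct my_bo {
	struct address_space *mapping;	/* backing address_space */
	pgoff_t base_pgoff;		/* first page offset of the object */
	pgoff_t num_pages;
	unsigned long *dirty_bitmap;	/* at least num_pages bits */
};

/* Arm write notification: subsequent writes fault into page_mkwrite(). */
static void my_bo_start_dirty_tracking(struct my_bo *bo)
{
	apply_as_wrprotect(bo->mapping, bo->base_pgoff, bo->num_pages);
}

/* Harvest hardware dirty bits into the bitmap and clean the ptes. */
static void my_bo_flush_dirty(struct my_bo *bo)
{
	pgoff_t start = bo->num_pages;	/* start >= end: no bits set yet */
	pgoff_t end = 0;
	unsigned long nr;

	nr = apply_as_clean(bo->mapping, bo->base_pgoff, bo->num_pages,
			    bo->base_pgoff, bo->dirty_bitmap, &start, &end);
	if (nr)
		/* Bits [start, end) of dirty_bitmap were newly set. */
		my_bo_upload_range(bo, start, end);
}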