From patchwork Mon Mar 25 14:40:01 2019
X-Patchwork-Submitter: Jerome Glisse
X-Patchwork-Id: 10869401
From: jglisse@redhat.com
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Jérôme Glisse, Ralph Campbell,
    Andrew Morton, John Hubbard, Dan Williams
Subject: [PATCH v2 01/11] mm/hmm: select mmu notifier when selecting HMM
Date: Mon, 25 Mar 2019 10:40:01 -0400
Message-Id: <20190325144011.10560-2-jglisse@redhat.com>
In-Reply-To: <20190325144011.10560-1-jglisse@redhat.com>
References: <20190325144011.10560-1-jglisse@redhat.com>

From: Jérôme Glisse

To avoid random config build issues, select MMU_NOTIFIER when HMM is
selected. In any case, when HMM is selected it is by users that will
also want the mmu notifier.

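For context, HMM depends on the mmu notifier machinery because hmm_register()
wires an mmu_notifier into the mm being mirrored (see the later patches in
this series), and __mmu_notifier_register() is only available when
CONFIG_MMU_NOTIFIER is set. A minimal sketch of that registration, simplified
from mm/hmm.c (the struct hmm and hmm_mmu_notifier_ops definitions are assumed
from that file; this is not the exact kernel code):

    #include <linux/mmu_notifier.h>
    #include <linux/slab.h>

    /*
     * Sketch: HMM hooks into the mmu notifier machinery at registration
     * time, which is why CONFIG_HMM has to select CONFIG_MMU_NOTIFIER.
     */
    static struct hmm *hmm_register_sketch(struct mm_struct *mm)
    {
            struct hmm *hmm = kzalloc(sizeof(*hmm), GFP_KERNEL);

            if (!hmm)
                    return NULL;
            hmm->mmu_notifier.ops = &hmm_mmu_notifier_ops;
            if (__mmu_notifier_register(&hmm->mmu_notifier, mm)) {
                    kfree(hmm);
                    return NULL;
            }
            return hmm;
    }
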
Signed-off-by: Jérôme Glisse
Acked-by: Balbir Singh
Cc: Ralph Campbell
Cc: Andrew Morton
Cc: John Hubbard
Cc: Dan Williams
---
 mm/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index 25c71eb8a7db..0d2944278d80 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -694,6 +694,7 @@ config DEV_PAGEMAP_OPS
 
 config HMM
         bool
+        select MMU_NOTIFIER
         select MIGRATE_VMA_HELPER
 
 config HMM_MIRROR

From patchwork Mon Mar 25 14:40:02 2019
X-Patchwork-Submitter: Jerome Glisse
X-Patchwork-Id: 10869403

From: jglisse@redhat.com
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Jérôme Glisse, John Hubbard,
    Andrew Morton, Dan Williams
Subject: [PATCH v2 02/11] mm/hmm: use reference counting for HMM struct v2
Date: Mon, 25 Mar 2019 10:40:02 -0400
Message-Id: <20190325144011.10560-3-jglisse@redhat.com>
In-Reply-To: <20190325144011.10560-1-jglisse@redhat.com>
References: <20190325144011.10560-1-jglisse@redhat.com>

From: Jérôme Glisse

Every time I read the code to check that the HMM structure does not
vanish before it should, thanks to the many locks protecting its
removal, I get a headache. Switch to reference counting instead; it is
much easier to follow and harder to break. This also removes some code
that is no longer needed with refcounting.

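The lifetime rule this establishes is the usual kref pattern: every lookup
takes a reference with kref_get_unless_zero() and every user drops it with
kref_put(). A minimal sketch of that pattern, modeled on the
mm_get_hmm()/hmm_put() helpers in the diff below (the hmm_obj names here are
illustrative only, not the kernel code):

    #include <linux/kref.h>
    #include <linux/slab.h>

    struct hmm_obj {
            struct kref kref;
            /* ... mirrors list, ranges list, locks ... */
    };

    /* Take a reference only if the object has not already dropped to zero. */
    static struct hmm_obj *hmm_obj_get(struct hmm_obj *hmm)
    {
            if (hmm && kref_get_unless_zero(&hmm->kref))
                    return hmm;
            return NULL;
    }

    static void hmm_obj_free(struct kref *kref)
    {
            kfree(container_of(kref, struct hmm_obj, kref));
    }

    /* Every successful get is paired with exactly one put. */
    static void hmm_obj_put(struct hmm_obj *hmm)
    {
            kref_put(&hmm->kref, hmm_obj_free);
    }

Every code path that looks up the struct hmm (mirror registration, the mmu
notifier callbacks, range snapshot/fault) takes a reference on entry and drops
it on exit, so the structure can only be freed after its last user is done.
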
Changes since v1:
    - removed a bunch of useless checks (if the API is used with bogus
      arguments it is better to fail loudly so users fix their code)
    - s/hmm_get/mm_get_hmm/

Signed-off-by: Jérôme Glisse
Reviewed-by: Ralph Campbell
Cc: John Hubbard
Cc: Andrew Morton
Cc: Dan Williams
---
 include/linux/hmm.h |   2 +
 mm/hmm.c            | 170 ++++++++++++++++++++++++++++----------------
 2 files changed, 112 insertions(+), 60 deletions(-)

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index ad50b7b4f141..716fc61fa6d4 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -131,6 +131,7 @@ enum hmm_pfn_value_e {
 /*
  * struct hmm_range - track invalidation lock on virtual address range
  *
+ * @hmm: the core HMM structure this range is active against
  * @vma: the vm area struct for the range
  * @list: all range lock are on a list
  * @start: range virtual start address (inclusive)
@@ -142,6 +143,7 @@ enum hmm_pfn_value_e {
  * @valid: pfns array did not change since it has been fill by an HMM function
  */
 struct hmm_range {
+        struct hmm              *hmm;
         struct vm_area_struct   *vma;
         struct list_head        list;
         unsigned long           start;
diff --git a/mm/hmm.c b/mm/hmm.c
index fe1cd87e49ac..306e57f7cded 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -50,6 +50,7 @@ static const struct mmu_notifier_ops hmm_mmu_notifier_ops;
  */
 struct hmm {
         struct mm_struct        *mm;
+        struct kref             kref;
         spinlock_t              lock;
         struct list_head        ranges;
         struct list_head        mirrors;
@@ -57,6 +58,16 @@ struct hmm {
         struct rw_semaphore     mirrors_sem;
 };
 
+static inline struct hmm *mm_get_hmm(struct mm_struct *mm)
+{
+        struct hmm *hmm = READ_ONCE(mm->hmm);
+
+        if (hmm && kref_get_unless_zero(&hmm->kref))
+                return hmm;
+
+        return NULL;
+}
+
 /*
  * hmm_register - register HMM against an mm (HMM internal)
  *
@@ -67,14 +78,9 @@ struct hmm {
  */
 static struct hmm *hmm_register(struct mm_struct *mm)
 {
-        struct hmm *hmm = READ_ONCE(mm->hmm);
+        struct hmm *hmm = mm_get_hmm(mm);
         bool cleanup = false;
 
-        /*
-         * The hmm struct can only be freed once the mm_struct goes away,
-         * hence we should always have pre-allocated an new hmm struct
-         * above.
-         */
         if (hmm)
                 return hmm;
 
@@ -86,6 +92,7 @@ static struct hmm *hmm_register(struct mm_struct *mm)
         hmm->mmu_notifier.ops = NULL;
         INIT_LIST_HEAD(&hmm->ranges);
         spin_lock_init(&hmm->lock);
+        kref_init(&hmm->kref);
         hmm->mm = mm;
 
         spin_lock(&mm->page_table_lock);
@@ -106,7 +113,7 @@ static struct hmm *hmm_register(struct mm_struct *mm)
         if (__mmu_notifier_register(&hmm->mmu_notifier, mm))
                 goto error_mm;
 
-        return mm->hmm;
+        return hmm;
 
 error_mm:
         spin_lock(&mm->page_table_lock);
@@ -118,9 +125,41 @@ static struct hmm *hmm_register(struct mm_struct *mm)
         return NULL;
 }
 
+static void hmm_free(struct kref *kref)
+{
+        struct hmm *hmm = container_of(kref, struct hmm, kref);
+        struct mm_struct *mm = hmm->mm;
+
+        mmu_notifier_unregister_no_release(&hmm->mmu_notifier, mm);
+
+        spin_lock(&mm->page_table_lock);
+        if (mm->hmm == hmm)
+                mm->hmm = NULL;
+        spin_unlock(&mm->page_table_lock);
+
+        kfree(hmm);
+}
+
+static inline void hmm_put(struct hmm *hmm)
+{
+        kref_put(&hmm->kref, hmm_free);
+}
+
 void hmm_mm_destroy(struct mm_struct *mm)
 {
-        kfree(mm->hmm);
+        struct hmm *hmm;
+
+        spin_lock(&mm->page_table_lock);
+        hmm = mm_get_hmm(mm);
+        mm->hmm = NULL;
+        if (hmm) {
+                hmm->mm = NULL;
+                spin_unlock(&mm->page_table_lock);
+                hmm_put(hmm);
+                return;
+        }
+
+        spin_unlock(&mm->page_table_lock);
 }
 
 static int hmm_invalidate_range(struct hmm *hmm, bool device,
@@ -165,7 +204,7 @@ static int hmm_invalidate_range(struct hmm *hmm, bool device,
 static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
 {
         struct hmm_mirror *mirror;
-        struct hmm *hmm = mm->hmm;
+        struct hmm *hmm = mm_get_hmm(mm);
 
         down_write(&hmm->mirrors_sem);
         mirror = list_first_entry_or_null(&hmm->mirrors, struct hmm_mirror,
@@ -186,13 +225,16 @@ static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
                                           struct hmm_mirror, list);
         }
         up_write(&hmm->mirrors_sem);
+
+        hmm_put(hmm);
 }
 
 static int hmm_invalidate_range_start(struct mmu_notifier *mn,
                         const struct mmu_notifier_range *range)
 {
+        struct hmm *hmm = mm_get_hmm(range->mm);
         struct hmm_update update;
-        struct hmm *hmm = range->mm->hmm;
+        int ret;
 
         VM_BUG_ON(!hmm);
 
@@ -200,14 +242,16 @@ static int hmm_invalidate_range_start(struct mmu_notifier *mn,
         update.end = range->end;
         update.event = HMM_UPDATE_INVALIDATE;
         update.blockable = range->blockable;
-        return hmm_invalidate_range(hmm, true, &update);
+        ret = hmm_invalidate_range(hmm, true, &update);
+        hmm_put(hmm);
+        return ret;
 }
 
 static void hmm_invalidate_range_end(struct mmu_notifier *mn,
                         const struct mmu_notifier_range *range)
 {
+        struct hmm *hmm = mm_get_hmm(range->mm);
         struct hmm_update update;
-        struct hmm *hmm = range->mm->hmm;
 
         VM_BUG_ON(!hmm);
 
@@ -216,6 +260,7 @@ static void hmm_invalidate_range_end(struct mmu_notifier *mn,
         update.event = HMM_UPDATE_INVALIDATE;
         update.blockable = true;
         hmm_invalidate_range(hmm, false, &update);
+        hmm_put(hmm);
 }
 
 static const struct mmu_notifier_ops hmm_mmu_notifier_ops = {
@@ -241,24 +286,13 @@ int hmm_mirror_register(struct hmm_mirror *mirror, struct mm_struct *mm)
         if (!mm || !mirror || !mirror->ops)
                 return -EINVAL;
 
-again:
         mirror->hmm = hmm_register(mm);
         if (!mirror->hmm)
                 return -ENOMEM;
 
         down_write(&mirror->hmm->mirrors_sem);
-        if (mirror->hmm->mm == NULL) {
-                /*
-                 * A racing hmm_mirror_unregister() is about to destroy the hmm
-                 * struct. Try again to allocate a new one.
-                 */
-                up_write(&mirror->hmm->mirrors_sem);
-                mirror->hmm = NULL;
-                goto again;
-        } else {
-                list_add(&mirror->list, &mirror->hmm->mirrors);
-                up_write(&mirror->hmm->mirrors_sem);
-        }
+        list_add(&mirror->list, &mirror->hmm->mirrors);
+        up_write(&mirror->hmm->mirrors_sem);
 
         return 0;
 }
@@ -273,33 +307,18 @@ EXPORT_SYMBOL(hmm_mirror_register);
  */
 void hmm_mirror_unregister(struct hmm_mirror *mirror)
 {
-        bool should_unregister = false;
-        struct mm_struct *mm;
-        struct hmm *hmm;
+        struct hmm *hmm = READ_ONCE(mirror->hmm);
 
-        if (mirror->hmm == NULL)
+        if (hmm == NULL)
                 return;
 
-        hmm = mirror->hmm;
         down_write(&hmm->mirrors_sem);
         list_del_init(&mirror->list);
-        should_unregister = list_empty(&hmm->mirrors);
+        /* To protect us against double unregister ... */
         mirror->hmm = NULL;
-        mm = hmm->mm;
-        hmm->mm = NULL;
         up_write(&hmm->mirrors_sem);
 
-        if (!should_unregister || mm == NULL)
-                return;
-
-        mmu_notifier_unregister_no_release(&hmm->mmu_notifier, mm);
-
-        spin_lock(&mm->page_table_lock);
-        if (mm->hmm == hmm)
-                mm->hmm = NULL;
-        spin_unlock(&mm->page_table_lock);
-
-        kfree(hmm);
+        hmm_put(hmm);
 }
 EXPORT_SYMBOL(hmm_mirror_unregister);
 
@@ -708,6 +727,8 @@ int hmm_vma_get_pfns(struct hmm_range *range)
         struct mm_walk mm_walk;
         struct hmm *hmm;
 
+        range->hmm = NULL;
+
         /* Sanity check, this really should not happen ! */
         if (range->start < vma->vm_start || range->start >= vma->vm_end)
                 return -EINVAL;
@@ -717,14 +738,18 @@ int hmm_vma_get_pfns(struct hmm_range *range)
         hmm = hmm_register(vma->vm_mm);
         if (!hmm)
                 return -ENOMEM;
-        /* Caller must have registered a mirror, via hmm_mirror_register() ! */
-        if (!hmm->mmu_notifier.ops)
+
+        /* Check if hmm_mm_destroy() was call. */
+        if (hmm->mm == NULL) {
+                hmm_put(hmm);
                 return -EINVAL;
+        }
 
         /* FIXME support hugetlb fs */
         if (is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_SPECIAL) ||
                         vma_is_dax(vma)) {
                 hmm_pfns_special(range);
+                hmm_put(hmm);
                 return -EINVAL;
         }
 
@@ -736,6 +761,7 @@ int hmm_vma_get_pfns(struct hmm_range *range)
                  * operations such has atomic access would not work.
                  */
                 hmm_pfns_clear(range, range->pfns, range->start, range->end);
+                hmm_put(hmm);
                 return -EPERM;
         }
 
@@ -758,6 +784,12 @@ int hmm_vma_get_pfns(struct hmm_range *range)
         mm_walk.pte_hole = hmm_vma_walk_hole;
 
         walk_page_range(range->start, range->end, &mm_walk);
+        /*
+         * Transfer hmm reference to the range struct it will be drop inside
+         * the hmm_vma_range_done() function (which _must_ be call if this
+         * function return 0).
+         */
+        range->hmm = hmm;
         return 0;
 }
 EXPORT_SYMBOL(hmm_vma_get_pfns);
@@ -802,25 +834,27 @@ EXPORT_SYMBOL(hmm_vma_get_pfns);
  */
 bool hmm_vma_range_done(struct hmm_range *range)
 {
-        unsigned long npages = (range->end - range->start) >> PAGE_SHIFT;
-        struct hmm *hmm;
+        bool ret = false;
 
-        if (range->end <= range->start) {
+        /* Sanity check this really should not happen. */
+        if (range->hmm == NULL || range->end <= range->start) {
                 BUG();
                 return false;
         }
 
-        hmm = hmm_register(range->vma->vm_mm);
-        if (!hmm) {
-                memset(range->pfns, 0, sizeof(*range->pfns) * npages);
-                return false;
-        }
-
-        spin_lock(&hmm->lock);
+        spin_lock(&range->hmm->lock);
         list_del_rcu(&range->list);
-        spin_unlock(&hmm->lock);
+        ret = range->valid;
+        spin_unlock(&range->hmm->lock);
 
-        return range->valid;
+        /* Is the mm still alive ? */
+        if (range->hmm->mm == NULL)
+                ret = false;
+
+        /* Drop reference taken by hmm_vma_fault() or hmm_vma_get_pfns() */
+        hmm_put(range->hmm);
+        range->hmm = NULL;
+        return ret;
 }
 EXPORT_SYMBOL(hmm_vma_range_done);
 
@@ -880,6 +914,8 @@ int hmm_vma_fault(struct hmm_range *range, bool block)
         struct hmm *hmm;
         int ret;
 
+        range->hmm = NULL;
+
         /* Sanity check, this really should not happen ! */
         if (range->start < vma->vm_start || range->start >= vma->vm_end)
                 return -EINVAL;
@@ -891,14 +927,18 @@ int hmm_vma_fault(struct hmm_range *range, bool block)
                 hmm_pfns_clear(range, range->pfns, range->start, range->end);
                 return -ENOMEM;
         }
-        /* Caller must have registered a mirror using hmm_mirror_register() */
-        if (!hmm->mmu_notifier.ops)
+
+        /* Check if hmm_mm_destroy() was call. */
+        if (hmm->mm == NULL) {
+                hmm_put(hmm);
                 return -EINVAL;
+        }
 
         /* FIXME support hugetlb fs */
         if (is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_SPECIAL) ||
                         vma_is_dax(vma)) {
                 hmm_pfns_special(range);
+                hmm_put(hmm);
                 return -EINVAL;
         }
 
@@ -910,6 +950,7 @@ int hmm_vma_fault(struct hmm_range *range, bool block)
                  * operations such has atomic access would not work.
                  */
                 hmm_pfns_clear(range, range->pfns, range->start, range->end);
+                hmm_put(hmm);
                 return -EPERM;
         }
 
@@ -945,7 +986,16 @@ int hmm_vma_fault(struct hmm_range *range, bool block)
                 hmm_pfns_clear(range, &range->pfns[i], hmm_vma_walk.last,
                                range->end);
                 hmm_vma_range_done(range);
+                hmm_put(hmm);
+        } else {
+                /*
+                 * Transfer hmm reference to the range struct it will be drop
+                 * inside the hmm_vma_range_done() function (which _must_ be
+                 * call if this function return 0).
+                 */
+                range->hmm = hmm;
         }
+
         return ret;
 }
 EXPORT_SYMBOL(hmm_vma_fault);

From patchwork Mon Mar 25 14:40:03 2019
X-Patchwork-Submitter: Jerome Glisse
X-Patchwork-Id: 10869405

From: jglisse@redhat.com
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Jérôme Glisse, Andrew Morton, Dan Williams
Subject: [PATCH v2 03/11] mm/hmm: do not erase snapshot when a range is invalidated
Date: Mon, 25 Mar 2019 10:40:03 -0400
Message-Id: <20190325144011.10560-4-jglisse@redhat.com>
In-Reply-To: <20190325144011.10560-1-jglisse@redhat.com>
References: <20190325144011.10560-1-jglisse@redhat.com>

From: Jérôme Glisse

Users of HMM might be using the snapshot information to do preparatory
steps, like DMA mapping pages to a device, before checking for
invalidation through hmm_vma_range_done(). So do not erase that
information; assume users will do the right thing.

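The driver pattern this enables might look like the following sketch
(driver_dma_map_pages(), driver_update_device_page_table(),
take_lock()/release_lock() and struct driver are hypothetical placeholders;
only the hmm_vma_get_pfns()/hmm_vma_range_done() calls are real HMM API):

    /*
     * Sketch: snapshot the CPU page table, do the expensive preparatory
     * work (e.g. DMA-map the pages), and only then check whether an
     * invalidation raced with the snapshot. Because an invalidation no
     * longer wipes pfns[], the retry path keeps the snapshot it built.
     */
    static int driver_snapshot_and_map(struct driver *drv, struct hmm_range *range)
    {
            int ret;

    again:
            ret = hmm_vma_get_pfns(range);
            if (ret)
                    return ret;

            driver_dma_map_pages(drv, range->pfns);     /* preparatory step */

            take_lock(drv->update);
            if (!hmm_vma_range_done(range)) {
                    release_lock(drv->update);
                    goto again;         /* a CPU page table update raced */
            }
            driver_update_device_page_table(drv, range->pfns);
            release_lock(drv->update);
            return 0;
    }
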
Signed-off-by: Jérôme Glisse
Reviewed-by: Ralph Campbell
Reviewed-by: John Hubbard
Cc: Andrew Morton
Cc: Dan Williams
---
 mm/hmm.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 306e57f7cded..213b0beee8d3 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -170,16 +170,10 @@ static int hmm_invalidate_range(struct hmm *hmm, bool device,
 
         spin_lock(&hmm->lock);
         list_for_each_entry(range, &hmm->ranges, list) {
-                unsigned long addr, idx, npages;
-
                 if (update->end < range->start || update->start >= range->end)
                         continue;
 
                 range->valid = false;
-                addr = max(update->start, range->start);
-                idx = (addr - range->start) >> PAGE_SHIFT;
-                npages = (min(range->end, update->end) - addr) >> PAGE_SHIFT;
-                memset(&range->pfns[idx], 0, sizeof(*range->pfns) * npages);
         }
         spin_unlock(&hmm->lock);

From patchwork Mon Mar 25 14:40:04 2019
X-Patchwork-Submitter: Jerome Glisse
X-Patchwork-Id: 10869407

From: jglisse@redhat.com
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Jérôme Glisse, Andrew Morton, Dan Williams
Subject: [PATCH v2 04/11] mm/hmm: improve and rename hmm_vma_get_pfns() to hmm_range_snapshot() v2
Date: Mon, 25 Mar 2019 10:40:04 -0400
Message-Id: <20190325144011.10560-5-jglisse@redhat.com>
In-Reply-To: <20190325144011.10560-1-jglisse@redhat.com>
References: <20190325144011.10560-1-jglisse@redhat.com>

From: Jérôme Glisse

Rename for consistency between code, comments and documentation. Also
improve the comments on all the possible return values. Improve the
function by returning the number of populated entries in the pfns
array.

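A sketch of how a caller might consume the new return value (hypothetical
driver code; the only HMM call is hmm_range_snapshot(), whose semantics are
those documented in the diff below):

    /*
     * Sketch: a negative return is an error; a non-negative return is the
     * number of pfns[] entries populated starting at range->start, so a
     * partial snapshot can be detected without rescanning the array.
     */
    static long driver_snapshot(struct hmm_range *range)
    {
            unsigned long npages = (range->end - range->start) >> PAGE_SHIFT;
            long ret;

            ret = hmm_range_snapshot(range);
            if (ret < 0)
                    return ret;     /* -EINVAL, -EPERM, -EAGAIN, -EFAULT, ... */
            if ((unsigned long)ret < npages) {
                    /* fewer entries than requested were snapshotted */
            }
            return ret;
    }
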
Changes since v1:
    - updated documentation
    - reformatted some comments

Signed-off-by: Jérôme Glisse
Reviewed-by: Ralph Campbell
Reviewed-by: John Hubbard
Cc: Andrew Morton
Cc: Dan Williams
Reviewed-by: Ira Weiny
---
 Documentation/vm/hmm.rst | 26 ++++++++++++++++++--------
 include/linux/hmm.h      |  4 ++--
 mm/hmm.c                 | 31 +++++++++++++++++--------------
 3 files changed, 37 insertions(+), 24 deletions(-)

diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
index 44205f0b671f..d9b27bdadd1b 100644
--- a/Documentation/vm/hmm.rst
+++ b/Documentation/vm/hmm.rst
@@ -189,11 +189,7 @@ the driver callback returns.
 When the device driver wants to populate a range of virtual addresses, it can
 use either::
 
-  int hmm_vma_get_pfns(struct vm_area_struct *vma,
-                       struct hmm_range *range,
-                       unsigned long start,
-                       unsigned long end,
-                       hmm_pfn_t *pfns);
+  long hmm_range_snapshot(struct hmm_range *range);
   int hmm_vma_fault(struct vm_area_struct *vma,
                     struct hmm_range *range,
                     unsigned long start,
@@ -202,7 +198,7 @@ When the device driver wants to populate a range of virtual addresses, it can
                     bool write,
                     bool block);
 
-The first one (hmm_vma_get_pfns()) will only fetch present CPU page table
+The first one (hmm_range_snapshot()) will only fetch present CPU page table
 entries and will not trigger a page fault on missing or non-present entries.
 The second one does trigger a page fault on missing or read-only entry if the
 write parameter is true. Page faults use the generic mm page fault code path
@@ -220,19 +216,33 @@ Locking with the update() callback is the most important aspect the driver must
  {
       struct hmm_range range;
       ...
+
+      range.start = ...;
+      range.end = ...;
+      range.pfns = ...;
+      range.flags = ...;
+      range.values = ...;
+      range.pfn_shift = ...;
+
  again:
-      ret = hmm_vma_get_pfns(vma, &range, start, end, pfns);
-      if (ret)
+      down_read(&mm->mmap_sem);
+      range.vma = ...;
+      ret = hmm_range_snapshot(&range);
+      if (ret) {
+          up_read(&mm->mmap_sem);
           return ret;
+      }
 
       take_lock(driver->update);
       if (!hmm_vma_range_done(vma, &range)) {
           release_lock(driver->update);
+          up_read(&mm->mmap_sem);
           goto again;
       }
 
       // Use pfns array content to update device page table
 
       release_lock(driver->update);
+      up_read(&mm->mmap_sem);
       return 0;
  }
diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 716fc61fa6d4..32206b0b1bfd 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -365,11 +365,11 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror);
  * table invalidation serializes on it.
  *
  * YOU MUST CALL hmm_vma_range_done() ONCE AND ONLY ONCE EACH TIME YOU CALL
- * hmm_vma_get_pfns() WITHOUT ERROR !
+ * hmm_range_snapshot() WITHOUT ERROR !
  *
  * IF YOU DO NOT FOLLOW THE ABOVE RULE THE SNAPSHOT CONTENT MIGHT BE INVALID !
  */
-int hmm_vma_get_pfns(struct hmm_range *range);
+long hmm_range_snapshot(struct hmm_range *range);
 bool hmm_vma_range_done(struct hmm_range *range);
 
 
diff --git a/mm/hmm.c b/mm/hmm.c
index 213b0beee8d3..91361aa74b8b 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -698,23 +698,25 @@ static void hmm_pfns_special(struct hmm_range *range)
 }
 
 /*
- * hmm_vma_get_pfns() - snapshot CPU page table for a range of virtual addresses
- * @range: range being snapshotted
- * Returns: -EINVAL if invalid argument, -ENOMEM out of memory, -EPERM invalid
- *          vma permission, 0 success
+ * hmm_range_snapshot() - snapshot CPU page table for a range
+ * @range: range
+ * Returns: number of valid pages in range->pfns[] (from range start
+ *          address). This may be zero. If the return value is negative,
+ *          then one of the following values may be returned:
+ *
+ *           -EINVAL  invalid arguments or mm or virtual address are in an
+ *                    invalid vma (ie either hugetlbfs or device file vma).
+ *           -EPERM   For example, asking for write, when the range is
+ *                    read-only
+ *           -EAGAIN  Caller needs to retry
+ *           -EFAULT  Either no valid vma exists for this range, or it is
+ *                    illegal to access the range
  *
  * This snapshots the CPU page table for a range of virtual addresses. Snapshot
  * validity is tracked by range struct. See hmm_vma_range_done() for further
  * information.
- *
- * The range struct is initialized here. It tracks the CPU page table, but only
- * if the function returns success (0), in which case the caller must then call
- * hmm_vma_range_done() to stop CPU page table update tracking on this range.
- *
- * NOT CALLING hmm_vma_range_done() IF FUNCTION RETURNS 0 WILL LEAD TO SERIOUS
- * MEMORY CORRUPTION ! YOU HAVE BEEN WARNED !
  */
-int hmm_vma_get_pfns(struct hmm_range *range)
+long hmm_range_snapshot(struct hmm_range *range)
 {
         struct vm_area_struct *vma = range->vma;
         struct hmm_vma_walk hmm_vma_walk;
@@ -768,6 +770,7 @@ int hmm_vma_get_pfns(struct hmm_range *range)
         hmm_vma_walk.fault = false;
         hmm_vma_walk.range = range;
         mm_walk.private = &hmm_vma_walk;
+        hmm_vma_walk.last = range->start;
 
         mm_walk.vma = vma;
         mm_walk.mm = vma->vm_mm;
@@ -784,9 +787,9 @@ int hmm_vma_get_pfns(struct hmm_range *range)
          * function return 0).
          */
         range->hmm = hmm;
-        return 0;
+        return (hmm_vma_walk.last - range->start) >> PAGE_SHIFT;
 }
-EXPORT_SYMBOL(hmm_vma_get_pfns);
+EXPORT_SYMBOL(hmm_range_snapshot);
 
 /*
  * hmm_vma_range_done() - stop tracking change to CPU page table over a range

From patchwork Mon Mar 25 14:40:05 2019
X-Patchwork-Submitter: Jerome Glisse
X-Patchwork-Id: 10869409

From: jglisse@redhat.com
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Jérôme Glisse, Andrew Morton,
    John Hubbard, Dan Williams
Subject: [PATCH v2 05/11] mm/hmm: improve and rename hmm_vma_fault() to hmm_range_fault() v2
Date: Mon, 25 Mar 2019 10:40:05 -0400
Message-Id: <20190325144011.10560-6-jglisse@redhat.com>
In-Reply-To: <20190325144011.10560-1-jglisse@redhat.com>
References: <20190325144011.10560-1-jglisse@redhat.com>

From: Jérôme Glisse

Rename for consistency between code, comments and documentation. Also
improve the comments on all the possible return values. Improve the
function by returning the number of populated entries in the pfns
array.

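A sketch of a caller using the renamed function and its new return convention
(hypothetical driver code; the -EBUSY/-EAGAIN semantics are the ones
documented in the diff below):

    /*
     * Sketch: fault a range, retrying when the walk was interrupted.
     * -EBUSY:  the range was invalidated during the walk; mmap_sem is
     *          still held, so drop it and try again.
     * -EAGAIN: mmap_sem was already dropped (per the documentation this
     *          can only happen when block is false), so just retry.
     * On success the number of populated pfns[] entries is returned and
     * mmap_sem is kept held; the caller then takes its device page table
     * lock, calls hmm_vma_range_done(), and finally drops mmap_sem.
     */
    static long driver_fault_range(struct mm_struct *mm, struct hmm_range *range)
    {
            long ret;

    again:
            down_read(&mm->mmap_sem);
            ret = hmm_range_fault(range, true /* block */);
            if (ret == -EAGAIN)
                    goto again;
            if (ret == -EBUSY) {
                    up_read(&mm->mmap_sem);
                    goto again;
            }
            if (ret < 0)
                    up_read(&mm->mmap_sem);
            return ret;
    }
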
Changes since v1:
    - updated documentation
    - reformatted some comments

Signed-off-by: Jérôme Glisse
Reviewed-by: Ralph Campbell
Cc: Andrew Morton
Cc: John Hubbard
Cc: Dan Williams
---
 Documentation/vm/hmm.rst |  8 +---
 include/linux/hmm.h      | 13 +++++-
 mm/hmm.c                 | 91 +++++++++++++++++-----------------
 3 files changed, 52 insertions(+), 60 deletions(-)

diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
index d9b27bdadd1b..61f073215a8d 100644
--- a/Documentation/vm/hmm.rst
+++ b/Documentation/vm/hmm.rst
@@ -190,13 +190,7 @@ When the device driver wants to populate a range of virtual addresses, it can
 use either::
 
   long hmm_range_snapshot(struct hmm_range *range);
-  int hmm_vma_fault(struct vm_area_struct *vma,
-                    struct hmm_range *range,
-                    unsigned long start,
-                    unsigned long end,
-                    hmm_pfn_t *pfns,
-                    bool write,
-                    bool block);
+  long hmm_range_fault(struct hmm_range *range, bool block);
 
 The first one (hmm_range_snapshot()) will only fetch present CPU page table
 entries and will not trigger a page fault on missing or non-present entries.
diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 32206b0b1bfd..e9afd23c2eac 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -391,7 +391,18 @@ bool hmm_vma_range_done(struct hmm_range *range);
  *
  * See the function description in mm/hmm.c for further documentation.
  */
-int hmm_vma_fault(struct hmm_range *range, bool block);
+long hmm_range_fault(struct hmm_range *range, bool block);
+
+/* This is a temporary helper to avoid merge conflict between trees. */
+static inline int hmm_vma_fault(struct hmm_range *range, bool block)
+{
+        long ret = hmm_range_fault(range, block);
+        if (ret == -EBUSY)
+                ret = -EAGAIN;
+        else if (ret == -EAGAIN)
+                ret = -EBUSY;
+        return ret < 0 ? ret : 0;
+}
 
 /* Below are for HMM internal use only! Not to be used by device driver! */
 void hmm_mm_destroy(struct mm_struct *mm);
diff --git a/mm/hmm.c b/mm/hmm.c
index 91361aa74b8b..7860e63c3ba7 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -336,13 +336,13 @@ static int hmm_vma_do_fault(struct mm_walk *walk, unsigned long addr,
         flags |= write_fault ? FAULT_FLAG_WRITE : 0;
         ret = handle_mm_fault(vma, addr, flags);
         if (ret & VM_FAULT_RETRY)
-                return -EBUSY;
+                return -EAGAIN;
         if (ret & VM_FAULT_ERROR) {
                 *pfn = range->values[HMM_PFN_ERROR];
                 return -EFAULT;
         }
 
-        return -EAGAIN;
+        return -EBUSY;
 }
 
 static int hmm_pfns_bad(unsigned long addr,
@@ -368,7 +368,7 @@ static int hmm_pfns_bad(unsigned long addr,
  * @fault: should we fault or not ?
  * @write_fault: write fault ?
  * @walk: mm_walk structure
- * Returns: 0 on success, -EAGAIN after page fault, or page fault error
+ * Returns: 0 on success, -EBUSY after page fault, or page fault error
  *
  * This function will be called whenever pmd_none() or pte_none() returns true,
  * or whenever there is no page directory covering the virtual address range.
@@ -391,12 +391,12 @@ static int hmm_vma_walk_hole_(unsigned long addr, unsigned long end,
 
                         ret = hmm_vma_do_fault(walk, addr, write_fault,
                                                &pfns[i]);
-                        if (ret != -EAGAIN)
+                        if (ret != -EBUSY)
                                 return ret;
                 }
         }
 
-        return (fault || write_fault) ? -EAGAIN : 0;
+        return (fault || write_fault) ? -EBUSY : 0;
 }
 
 static inline void hmm_pte_need_fault(const struct hmm_vma_walk *hmm_vma_walk,
@@ -527,11 +527,11 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
         uint64_t orig_pfn = *pfn;
 
         *pfn = range->values[HMM_PFN_NONE];
-        cpu_flags = pte_to_hmm_pfn_flags(range, pte);
-        hmm_pte_need_fault(hmm_vma_walk, orig_pfn, cpu_flags,
-                           &fault, &write_fault);
+        fault = write_fault = false;
 
         if (pte_none(pte)) {
+                hmm_pte_need_fault(hmm_vma_walk, orig_pfn, 0,
+                                   &fault, &write_fault);
                 if (fault || write_fault)
                         goto fault;
                 return 0;
@@ -570,7 +570,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
                         hmm_vma_walk->last = addr;
                         migration_entry_wait(vma->vm_mm,
                                              pmdp, addr);
-                        return -EAGAIN;
+                        return -EBUSY;
                 }
                 return 0;
         }
@@ -578,6 +578,10 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
                 /* Report error for everything else */
                 *pfn = range->values[HMM_PFN_ERROR];
                 return -EFAULT;
+        } else {
+                cpu_flags = pte_to_hmm_pfn_flags(range, pte);
+                hmm_pte_need_fault(hmm_vma_walk, orig_pfn, cpu_flags,
+                                   &fault, &write_fault);
         }
 
         if (fault || write_fault)
@@ -628,7 +632,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
                 if (fault || write_fault) {
                         hmm_vma_walk->last = addr;
                         pmd_migration_entry_wait(vma->vm_mm, pmdp);
-                        return -EAGAIN;
+                        return -EBUSY;
                 }
                 return 0;
         } else if (!pmd_present(pmd))
@@ -856,53 +860,34 @@ bool hmm_vma_range_done(struct hmm_range *range)
 EXPORT_SYMBOL(hmm_vma_range_done);
 
 /*
- * hmm_vma_fault() - try to fault some address in a virtual address range
+ * hmm_range_fault() - try to fault some address in a virtual address range
  * @range: range being faulted
  * @block: allow blocking on fault (if true it sleeps and do not drop mmap_sem)
- * Returns: 0 success, error otherwise (-EAGAIN means mmap_sem have been drop)
+ * Returns: number of valid pages in range->pfns[] (from range start
+ *          address). This may be zero. If the return value is negative,
+ *          then one of the following values may be returned:
+ *
+ *           -EINVAL  invalid arguments or mm or virtual address are in an
+ *                    invalid vma (ie either hugetlbfs or device file vma).
+ *           -ENOMEM: Out of memory.
+ *           -EPERM:  Invalid permission (for instance asking for write and
+ *                    range is read only).
+ *           -EAGAIN: If you need to retry and mmap_sem was drop. This can only
+ *                    happens if block argument is false.
+ *           -EBUSY:  If the the range is being invalidated and you should wait
+ *                    for invalidation to finish.
+ *           -EFAULT: Invalid (ie either no valid vma or it is illegal to access
+ *                    that range), number of valid pages in range->pfns[] (from
+ *                    range start address).
  *
  * This is similar to a regular CPU page fault except that it will not trigger
- * any memory migration if the memory being faulted is not accessible by CPUs.
+ * any memory migration if the memory being faulted is not accessible by CPUs
+ * and caller does not ask for migration.
  *
  * On error, for one virtual address in the range, the function will mark the
 * corresponding HMM pfn entry with an error flag.
- * - * Expected use pattern: - * retry: - * down_read(&mm->mmap_sem); - * // Find vma and address device wants to fault, initialize hmm_pfn_t - * // array accordingly - * ret = hmm_vma_fault(range, write, block); - * switch (ret) { - * case -EAGAIN: - * hmm_vma_range_done(range); - * // You might want to rate limit or yield to play nicely, you may - * // also commit any valid pfn in the array assuming that you are - * // getting true from hmm_vma_range_monitor_end() - * goto retry; - * case 0: - * break; - * case -ENOMEM: - * case -EINVAL: - * case -EPERM: - * default: - * // Handle error ! - * up_read(&mm->mmap_sem) - * return; - * } - * // Take device driver lock that serialize device page table update - * driver_lock_device_page_table_update(); - * hmm_vma_range_done(range); - * // Commit pfns we got from hmm_vma_fault() - * driver_unlock_device_page_table_update(); - * up_read(&mm->mmap_sem) - * - * YOU MUST CALL hmm_vma_range_done() AFTER THIS FUNCTION RETURN SUCCESS (0) - * BEFORE FREEING THE range struct OR YOU WILL HAVE SERIOUS MEMORY CORRUPTION ! - * - * YOU HAVE BEEN WARNED ! */ -int hmm_vma_fault(struct hmm_range *range, bool block) +long hmm_range_fault(struct hmm_range *range, bool block) { struct vm_area_struct *vma = range->vma; unsigned long start = range->start; @@ -974,7 +959,8 @@ int hmm_vma_fault(struct hmm_range *range, bool block) do { ret = walk_page_range(start, range->end, &mm_walk); start = hmm_vma_walk.last; - } while (ret == -EAGAIN); + /* Keep trying while the range is valid. */ + } while (ret == -EBUSY && range->valid); if (ret) { unsigned long i; @@ -984,6 +970,7 @@ int hmm_vma_fault(struct hmm_range *range, bool block) range->end); hmm_vma_range_done(range); hmm_put(hmm); + return ret; } else { /* * Transfer hmm reference to the range struct it will be drop @@ -993,9 +980,9 @@ int hmm_vma_fault(struct hmm_range *range, bool block) range->hmm = hmm; } - return ret; + return (hmm_vma_walk.last - range->start) >> PAGE_SHIFT; } -EXPORT_SYMBOL(hmm_vma_fault); +EXPORT_SYMBOL(hmm_range_fault); #endif /* IS_ENABLED(CONFIG_HMM_MIRROR) */
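As an illustration of the new calling convention, a driver-side caller might now look roughly like the sketch below. This is only a sketch: the mydriver_commit_pfns() helper is hypothetical, and the serialization against invalidation (device page table lock plus hmm_vma_range_done()) is unchanged by this patch and therefore abbreviated. The only semantics relied on are the ones documented above: a negative errno on failure, otherwise the number of pages filled in range->pfns[] counted from range->start.

static long mydriver_mirror_range(struct hmm_range *range)
{
        long npages;

        /*
         * Caller holds mmap_sem in read mode; range->vma, range->start,
         * range->end and range->pfns[] have already been initialized.
         */
        npages = hmm_range_fault(range, true /* block */);
        if (npages < 0) {
                /*
                 * -EBUSY: the range is being invalidated, retry later.
                 * -EAGAIN: mmap_sem was dropped (block == false only).
                 * Anything else is a plain error.
                 */
                return npages;
        }

        /*
         * Only the first npages entries of range->pfns[] are valid. The
         * driver still serializes against CPU page table invalidation
         * (device page table lock + hmm_vma_range_done()) exactly as
         * before; that part is omitted here.
         */
        mydriver_commit_pfns(range, npages);    /* hypothetical helper */
        return 0;
}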
From patchwork Mon Mar 25 14:40:06 2019
X-Patchwork-Submitter: Jerome Glisse
X-Patchwork-Id: 10869413
From: jglisse@redhat.com
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Jérôme Glisse, Andrew Morton, John Hubbard, Dan Williams, Dan Carpenter, Matthew Wilcox
Subject: [PATCH v2 06/11] mm/hmm: improve driver API to work and wait over a range v2
Date: Mon, 25 Mar 2019 10:40:06 -0400
Message-Id: <20190325144011.10560-7-jglisse@redhat.com>
In-Reply-To: <20190325144011.10560-1-jglisse@redhat.com>
References: <20190325144011.10560-1-jglisse@redhat.com>

From: Jérôme Glisse

A common use case for HMM mirror is a user trying to mirror a range that gets invalidated by some core mm event before the user can program the hardware. Instead of having the user retry right away, provide a completion mechanism that lets them wait for any active invalidation affecting the range. This also changes how hmm_range_snapshot() and hmm_range_fault() work: they no longer rely on a vma, so that the mmap_sem can be dropped while waiting and the vma looked up again on retry.

Changes since v1: - squashed: Dan Carpenter: potential deadlock in nonblocking code

Signed-off-by: Jérôme Glisse
Reviewed-by: Ralph Campbell
Cc: Andrew Morton
Cc: John Hubbard
Cc: Dan Williams
Cc: Dan Carpenter
Cc: Matthew Wilcox
---
 include/linux/hmm.h | 208 ++++++++++++++---
 mm/hmm.c | 528 +++++++++++++++++++++-----------------
 2 files changed, 428 insertions(+), 308 deletions(-)

diff --git a/include/linux/hmm.h b/include/linux/hmm.h index e9afd23c2eac..79671036cb5f 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -77,8 +77,34 @@ #include #include #include +#include -struct hmm; + +/* + * struct hmm - HMM per mm struct + * + * @mm: mm struct this HMM struct is bound to + * @lock: lock protecting ranges list + * @ranges: list of range being snapshotted + * @mirrors: list of mirrors for this mm + * @mmu_notifier: mmu notifier to track updates to CPU page table + * @mirrors_sem: read/write semaphore protecting the mirrors list + * @wq: wait queue for user waiting on a range invalidation + * @notifiers: count of active mmu notifiers + * @dead: is the mm dead ?
+ */ +struct hmm { + struct mm_struct *mm; + struct kref kref; + struct mutex lock; + struct list_head ranges; + struct list_head mirrors; + struct mmu_notifier mmu_notifier; + struct rw_semaphore mirrors_sem; + wait_queue_head_t wq; + long notifiers; + bool dead; +}; /* * hmm_pfn_flag_e - HMM flag enums @@ -155,6 +181,38 @@ struct hmm_range { bool valid; }; +/* + * hmm_range_wait_until_valid() - wait for range to be valid + * @range: range affected by invalidation to wait on + * @timeout: time out for wait in ms (ie abort wait after that period of time) + * Returns: true if the range is valid, false otherwise. + */ +static inline bool hmm_range_wait_until_valid(struct hmm_range *range, + unsigned long timeout) +{ + /* Check if mm is dead ? */ + if (range->hmm == NULL || range->hmm->dead || range->hmm->mm == NULL) { + range->valid = false; + return false; + } + if (range->valid) + return true; + wait_event_timeout(range->hmm->wq, range->valid || range->hmm->dead, + msecs_to_jiffies(timeout)); + /* Return current valid status just in case we get lucky */ + return range->valid; +} + +/* + * hmm_range_valid() - test if a range is valid or not + * @range: range + * Returns: true if the range is valid, false otherwise. + */ +static inline bool hmm_range_valid(struct hmm_range *range) +{ + return range->valid; +} + /* * hmm_pfn_to_page() - return struct page pointed to by a valid HMM pfn * @range: range use to decode HMM pfn value @@ -357,51 +415,133 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror); /* - * To snapshot the CPU page table, call hmm_vma_get_pfns(), then take a device - * driver lock that serializes device page table updates, then call - * hmm_vma_range_done(), to check if the snapshot is still valid. The same - * device driver page table update lock must also be used in the - * hmm_mirror_ops.sync_cpu_device_pagetables() callback, so that CPU page - * table invalidation serializes on it. + * To snapshot the CPU page table you first have to call hmm_range_register() + * to register the range. If hmm_range_register() return an error then some- + * thing is horribly wrong and you should fail loudly. If it returned true then + * you can wait for the range to be stable with hmm_range_wait_until_valid() + * function, a range is valid when there are no concurrent changes to the CPU + * page table for the range. + * + * Once the range is valid you can call hmm_range_snapshot() if that returns + * without error then you can take your device page table lock (the same lock + * you use in the HMM mirror sync_cpu_device_pagetables() callback). After + * taking that lock you have to check the range validity, if it is still valid + * (ie hmm_range_valid() returns true) then you can program the device page + * table, otherwise you have to start again. Pseudo code: + * + * mydevice_prefault(mydevice, mm, start, end) + * { + * struct hmm_range range; + * ... * - * YOU MUST CALL hmm_vma_range_done() ONCE AND ONLY ONCE EACH TIME YOU CALL - * hmm_range_snapshot() WITHOUT ERROR ! + * ret = hmm_range_register(&range, mm, start, end); + * if (ret) + * return ret; * - * IF YOU DO NOT FOLLOW THE ABOVE RULE THE SNAPSHOT CONTENT MIGHT BE INVALID ! - */ -long hmm_range_snapshot(struct hmm_range *range); -bool hmm_vma_range_done(struct hmm_range *range); - - -/* - * Fault memory on behalf of device driver. Unlike handle_mm_fault(), this will - * not migrate any device memory back to system memory. 
The HMM pfn array will - * be updated with the fault result and current snapshot of the CPU page table - * for the range. + * down_read(mm->mmap_sem); + * again: + * + * if (!hmm_range_wait_until_valid(&range, TIMEOUT)) { + * up_read(&mm->mmap_sem); + * hmm_range_unregister(range); + * // Handle time out, either sleep or retry or something else + * ... + * return -ESOMETHING; || goto again; + * } + * + * ret = hmm_range_snapshot(&range); or hmm_range_fault(&range); + * if (ret == -EAGAIN) { + * down_read(mm->mmap_sem); + * goto again; + * } else if (ret == -EBUSY) { + * goto again; + * } + * + * up_read(&mm->mmap_sem); + * if (ret) { + * hmm_range_unregister(range); + * return ret; + * } + * + * // It might not have snap-shoted the whole range but only the first + * // npages, the return values is the number of valid pages from the + * // start of the range. + * npages = ret; * - * The mmap_sem must be taken in read mode before entering and it might be - * dropped by the function if the block argument is false. In that case, the - * function returns -EAGAIN. + * ... * - * Return value does not reflect if the fault was successful for every single - * address or not. Therefore, the caller must to inspect the HMM pfn array to - * determine fault status for each address. + * mydevice_page_table_lock(mydevice); + * if (!hmm_range_valid(range)) { + * mydevice_page_table_unlock(mydevice); + * goto again; + * } * - * Trying to fault inside an invalid vma will result in -EINVAL. + * mydevice_populate_page_table(mydevice, range, npages); + * ... + * mydevice_take_page_table_unlock(mydevice); + * hmm_range_unregister(range); * - * See the function description in mm/hmm.c for further documentation. + * return 0; + * } + * + * The same scheme apply to hmm_range_fault() (ie replace hmm_range_snapshot() + * with hmm_range_fault() in above pseudo code). + * + * YOU MUST CALL hmm_range_unregister() ONCE AND ONLY ONCE EACH TIME YOU CALL + * hmm_range_register() AND hmm_range_register() RETURNED TRUE ! IF YOU DO NOT + * FOLLOW THIS RULE MEMORY CORRUPTION WILL ENSUE ! */ +int hmm_range_register(struct hmm_range *range, + struct mm_struct *mm, + unsigned long start, + unsigned long end); +void hmm_range_unregister(struct hmm_range *range); +long hmm_range_snapshot(struct hmm_range *range); long hmm_range_fault(struct hmm_range *range, bool block); +/* + * HMM_RANGE_DEFAULT_TIMEOUT - default timeout (ms) when waiting for a range + * + * When waiting for mmu notifiers we need some kind of time out otherwise we + * could potentialy wait for ever, 1000ms ie 1s sounds like a long time to + * wait already. + */ +#define HMM_RANGE_DEFAULT_TIMEOUT 1000 + /* This is a temporary helper to avoid merge conflict between trees. */ +static inline bool hmm_vma_range_done(struct hmm_range *range) +{ + bool ret = hmm_range_valid(range); + + hmm_range_unregister(range); + return ret; +} + static inline int hmm_vma_fault(struct hmm_range *range, bool block) { - long ret = hmm_range_fault(range, block); - if (ret == -EBUSY) - ret = -EAGAIN; - else if (ret == -EAGAIN) - ret = -EBUSY; - return ret < 0 ? 
ret : 0; + long ret; + + ret = hmm_range_register(range, range->vma->vm_mm, + range->start, range->end); + if (ret) + return (int)ret; + + if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) { + up_read(&range->vma->vm_mm->mmap_sem); + return -EAGAIN; + } + + ret = hmm_range_fault(range, block); + if (ret <= 0) { + if (ret == -EBUSY || !ret) { + up_read(&range->vma->vm_mm->mmap_sem); + ret = -EBUSY; + } else if (ret == -EAGAIN) + ret = -EBUSY; + hmm_range_unregister(range); + return ret; + } + return 0; } /* Below are for HMM internal use only! Not to be used by device driver! */ diff --git a/mm/hmm.c b/mm/hmm.c index 7860e63c3ba7..fa9498eeb9b6 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -38,26 +38,6 @@ #if IS_ENABLED(CONFIG_HMM_MIRROR) static const struct mmu_notifier_ops hmm_mmu_notifier_ops; -/* - * struct hmm - HMM per mm struct - * - * @mm: mm struct this HMM struct is bound to - * @lock: lock protecting ranges list - * @ranges: list of range being snapshotted - * @mirrors: list of mirrors for this mm - * @mmu_notifier: mmu notifier to track updates to CPU page table - * @mirrors_sem: read/write semaphore protecting the mirrors list - */ -struct hmm { - struct mm_struct *mm; - struct kref kref; - spinlock_t lock; - struct list_head ranges; - struct list_head mirrors; - struct mmu_notifier mmu_notifier; - struct rw_semaphore mirrors_sem; -}; - static inline struct hmm *mm_get_hmm(struct mm_struct *mm) { struct hmm *hmm = READ_ONCE(mm->hmm); @@ -87,12 +67,15 @@ static struct hmm *hmm_register(struct mm_struct *mm) hmm = kmalloc(sizeof(*hmm), GFP_KERNEL); if (!hmm) return NULL; + init_waitqueue_head(&hmm->wq); INIT_LIST_HEAD(&hmm->mirrors); init_rwsem(&hmm->mirrors_sem); hmm->mmu_notifier.ops = NULL; INIT_LIST_HEAD(&hmm->ranges); - spin_lock_init(&hmm->lock); + mutex_init(&hmm->lock); kref_init(&hmm->kref); + hmm->notifiers = 0; + hmm->dead = false; hmm->mm = mm; spin_lock(&mm->page_table_lock); @@ -154,6 +137,7 @@ void hmm_mm_destroy(struct mm_struct *mm) mm->hmm = NULL; if (hmm) { hmm->mm = NULL; + hmm->dead = true; spin_unlock(&mm->page_table_lock); hmm_put(hmm); return; @@ -162,43 +146,22 @@ void hmm_mm_destroy(struct mm_struct *mm) spin_unlock(&mm->page_table_lock); } -static int hmm_invalidate_range(struct hmm *hmm, bool device, - const struct hmm_update *update) +static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm) { + struct hmm *hmm = mm_get_hmm(mm); struct hmm_mirror *mirror; struct hmm_range *range; - spin_lock(&hmm->lock); - list_for_each_entry(range, &hmm->ranges, list) { - if (update->end < range->start || update->start >= range->end) - continue; + /* Report this HMM as dying. */ + hmm->dead = true; + /* Wake-up everyone waiting on any range. 
*/ + mutex_lock(&hmm->lock); + list_for_each_entry(range, &hmm->ranges, list) { range->valid = false; } - spin_unlock(&hmm->lock); - - if (!device) - return 0; - - down_read(&hmm->mirrors_sem); - list_for_each_entry(mirror, &hmm->mirrors, list) { - int ret; - - ret = mirror->ops->sync_cpu_device_pagetables(mirror, update); - if (!update->blockable && ret == -EAGAIN) { - up_read(&hmm->mirrors_sem); - return -EAGAIN; - } - } - up_read(&hmm->mirrors_sem); - - return 0; -} - -static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm) -{ - struct hmm_mirror *mirror; - struct hmm *hmm = mm_get_hmm(mm); + wake_up_all(&hmm->wq); + mutex_unlock(&hmm->lock); down_write(&hmm->mirrors_sem); mirror = list_first_entry_or_null(&hmm->mirrors, struct hmm_mirror, @@ -224,36 +187,80 @@ static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm) } static int hmm_invalidate_range_start(struct mmu_notifier *mn, - const struct mmu_notifier_range *range) + const struct mmu_notifier_range *nrange) { - struct hmm *hmm = mm_get_hmm(range->mm); + struct hmm *hmm = mm_get_hmm(nrange->mm); + struct hmm_mirror *mirror; struct hmm_update update; - int ret; + struct hmm_range *range; + int ret = 0; VM_BUG_ON(!hmm); - update.start = range->start; - update.end = range->end; + update.start = nrange->start; + update.end = nrange->end; update.event = HMM_UPDATE_INVALIDATE; - update.blockable = range->blockable; - ret = hmm_invalidate_range(hmm, true, &update); + update.blockable = nrange->blockable; + + if (nrange->blockable) + mutex_lock(&hmm->lock); + else if (!mutex_trylock(&hmm->lock)) { + ret = -EAGAIN; + goto out; + } + hmm->notifiers++; + list_for_each_entry(range, &hmm->ranges, list) { + if (update.end < range->start || update.start >= range->end) + continue; + + range->valid = false; + } + mutex_unlock(&hmm->lock); + + if (nrange->blockable) + down_read(&hmm->mirrors_sem); + else if (!down_read_trylock(&hmm->mirrors_sem)) { + ret = -EAGAIN; + goto out; + } + list_for_each_entry(mirror, &hmm->mirrors, list) { + int ret; + + ret = mirror->ops->sync_cpu_device_pagetables(mirror, &update); + if (!update.blockable && ret == -EAGAIN) { + up_read(&hmm->mirrors_sem); + ret = -EAGAIN; + goto out; + } + } + up_read(&hmm->mirrors_sem); + +out: hmm_put(hmm); return ret; } static void hmm_invalidate_range_end(struct mmu_notifier *mn, - const struct mmu_notifier_range *range) + const struct mmu_notifier_range *nrange) { - struct hmm *hmm = mm_get_hmm(range->mm); - struct hmm_update update; + struct hmm *hmm = mm_get_hmm(nrange->mm); VM_BUG_ON(!hmm); - update.start = range->start; - update.end = range->end; - update.event = HMM_UPDATE_INVALIDATE; - update.blockable = true; - hmm_invalidate_range(hmm, false, &update); + mutex_lock(&hmm->lock); + hmm->notifiers--; + if (!hmm->notifiers) { + struct hmm_range *range; + + list_for_each_entry(range, &hmm->ranges, list) { + if (range->valid) + continue; + range->valid = true; + } + wake_up_all(&hmm->wq); + } + mutex_unlock(&hmm->lock); + hmm_put(hmm); } @@ -405,7 +412,6 @@ static inline void hmm_pte_need_fault(const struct hmm_vma_walk *hmm_vma_walk, { struct hmm_range *range = hmm_vma_walk->range; - *fault = *write_fault = false; if (!hmm_vma_walk->fault) return; @@ -444,10 +450,11 @@ static void hmm_range_need_fault(const struct hmm_vma_walk *hmm_vma_walk, return; } + *fault = *write_fault = false; for (i = 0; i < npages; ++i) { hmm_pte_need_fault(hmm_vma_walk, pfns[i], cpu_flags, fault, write_fault); - if ((*fault) || (*write_fault)) + if ((*write_fault)) return; 
} } @@ -702,162 +709,152 @@ static void hmm_pfns_special(struct hmm_range *range) } /* - * hmm_range_snapshot() - snapshot CPU page table for a range + * hmm_range_register() - start tracking change to CPU page table over a range * @range: range - * Returns: number of valid pages in range->pfns[] (from range start - * address). This may be zero. If the return value is negative, - * then one of the following values may be returned: + * @mm: the mm struct for the range of virtual address + * @start: start virtual address (inclusive) + * @end: end virtual address (exclusive) + * Returns 0 on success, -EFAULT if the address space is no longer valid * - * -EINVAL invalid arguments or mm or virtual address are in an - * invalid vma (ie either hugetlbfs or device file vma). - * -EPERM For example, asking for write, when the range is - * read-only - * -EAGAIN Caller needs to retry - * -EFAULT Either no valid vma exists for this range, or it is - * illegal to access the range - * - * This snapshots the CPU page table for a range of virtual addresses. Snapshot - * validity is tracked by range struct. See hmm_vma_range_done() for further - * information. + * Track updates to the CPU page table see include/linux/hmm.h */ -long hmm_range_snapshot(struct hmm_range *range) +int hmm_range_register(struct hmm_range *range, + struct mm_struct *mm, + unsigned long start, + unsigned long end) { - struct vm_area_struct *vma = range->vma; - struct hmm_vma_walk hmm_vma_walk; - struct mm_walk mm_walk; - struct hmm *hmm; - + range->start = start & PAGE_MASK; + range->end = end & PAGE_MASK; + range->valid = false; range->hmm = NULL; - /* Sanity check, this really should not happen ! */ - if (range->start < vma->vm_start || range->start >= vma->vm_end) - return -EINVAL; - if (range->end < vma->vm_start || range->end > vma->vm_end) + if (range->start >= range->end) return -EINVAL; - hmm = hmm_register(vma->vm_mm); - if (!hmm) - return -ENOMEM; + range->hmm = hmm_register(mm); + if (!range->hmm) + return -EFAULT; /* Check if hmm_mm_destroy() was call. */ - if (hmm->mm == NULL) { - hmm_put(hmm); - return -EINVAL; + if (range->hmm->mm == NULL || range->hmm->dead) { + hmm_put(range->hmm); + return -EFAULT; } - /* FIXME support hugetlb fs */ - if (is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_SPECIAL) || - vma_is_dax(vma)) { - hmm_pfns_special(range); - hmm_put(hmm); - return -EINVAL; - } + /* Initialize range to track CPU page table update */ + mutex_lock(&range->hmm->lock); - if (!(vma->vm_flags & VM_READ)) { - /* - * If vma do not allow read access, then assume that it does - * not allow write access, either. Architecture that allow - * write without read access are not supported by HMM, because - * operations such has atomic access would not work. 
- */ - hmm_pfns_clear(range, range->pfns, range->start, range->end); - hmm_put(hmm); - return -EPERM; - } + list_add_rcu(&range->list, &range->hmm->ranges); - /* Initialize range to track CPU page table update */ - spin_lock(&hmm->lock); - range->valid = true; - list_add_rcu(&range->list, &hmm->ranges); - spin_unlock(&hmm->lock); - - hmm_vma_walk.fault = false; - hmm_vma_walk.range = range; - mm_walk.private = &hmm_vma_walk; - hmm_vma_walk.last = range->start; - - mm_walk.vma = vma; - mm_walk.mm = vma->vm_mm; - mm_walk.pte_entry = NULL; - mm_walk.test_walk = NULL; - mm_walk.hugetlb_entry = NULL; - mm_walk.pmd_entry = hmm_vma_walk_pmd; - mm_walk.pte_hole = hmm_vma_walk_hole; - - walk_page_range(range->start, range->end, &mm_walk); /* - * Transfer hmm reference to the range struct it will be drop inside - * the hmm_vma_range_done() function (which _must_ be call if this - * function return 0). + * If there are any concurrent notifiers we have to wait for them for + * the range to be valid (see hmm_range_wait_until_valid()). */ - range->hmm = hmm; - return (hmm_vma_walk.last - range->start) >> PAGE_SHIFT; + if (!range->hmm->notifiers) + range->valid = true; + mutex_unlock(&range->hmm->lock); + + return 0; } -EXPORT_SYMBOL(hmm_range_snapshot); +EXPORT_SYMBOL(hmm_range_register); /* - * hmm_vma_range_done() - stop tracking change to CPU page table over a range - * @range: range being tracked - * Returns: false if range data has been invalidated, true otherwise + * hmm_range_unregister() - stop tracking change to CPU page table over a range + * @range: range * * Range struct is used to track updates to the CPU page table after a call to - * either hmm_vma_get_pfns() or hmm_vma_fault(). Once the device driver is done - * using the data, or wants to lock updates to the data it got from those - * functions, it must call the hmm_vma_range_done() function, which will then - * stop tracking CPU page table updates. - * - * Note that device driver must still implement general CPU page table update - * tracking either by using hmm_mirror (see hmm_mirror_register()) or by using - * the mmu_notifier API directly. - * - * CPU page table update tracking done through hmm_range is only temporary and - * to be used while trying to duplicate CPU page table contents for a range of - * virtual addresses. - * - * There are two ways to use this : - * again: - * hmm_vma_get_pfns(range); or hmm_vma_fault(...); - * trans = device_build_page_table_update_transaction(pfns); - * device_page_table_lock(); - * if (!hmm_vma_range_done(range)) { - * device_page_table_unlock(); - * goto again; - * } - * device_commit_transaction(trans); - * device_page_table_unlock(); - * - * Or: - * hmm_vma_get_pfns(range); or hmm_vma_fault(...); - * device_page_table_lock(); - * hmm_vma_range_done(range); - * device_update_page_table(range->pfns); - * device_page_table_unlock(); + * hmm_range_register(). See include/linux/hmm.h for how to use it. */ -bool hmm_vma_range_done(struct hmm_range *range) +void hmm_range_unregister(struct hmm_range *range) { - bool ret = false; - /* Sanity check this really should not happen. */ - if (range->hmm == NULL || range->end <= range->start) { - BUG(); - return false; - } + if (range->hmm == NULL || range->end <= range->start) + return; - spin_lock(&range->hmm->lock); + mutex_lock(&range->hmm->lock); list_del_rcu(&range->list); - ret = range->valid; - spin_unlock(&range->hmm->lock); - - /* Is the mm still alive ? 
*/ - if (range->hmm->mm == NULL) - ret = false; + mutex_unlock(&range->hmm->lock); - /* Drop reference taken by hmm_vma_fault() or hmm_vma_get_pfns() */ + /* Drop reference taken by hmm_range_register() */ + range->valid = false; hmm_put(range->hmm); range->hmm = NULL; - return ret; } -EXPORT_SYMBOL(hmm_vma_range_done); +EXPORT_SYMBOL(hmm_range_unregister); + +/* + * hmm_range_snapshot() - snapshot CPU page table for a range + * @range: range + * Returns: -EINVAL if invalid argument, -ENOMEM out of memory, -EPERM invalid + * permission (for instance asking for write and range is read only), + * -EAGAIN if you need to retry, -EFAULT invalid (ie either no valid + * vma or it is illegal to access that range), number of valid pages + * in range->pfns[] (from range start address). + * + * This snapshots the CPU page table for a range of virtual addresses. Snapshot + * validity is tracked by range struct. See in include/linux/hmm.h for example + * on how to use. + */ +long hmm_range_snapshot(struct hmm_range *range) +{ + unsigned long start = range->start, end; + struct hmm_vma_walk hmm_vma_walk; + struct hmm *hmm = range->hmm; + struct vm_area_struct *vma; + struct mm_walk mm_walk; + + /* Check if hmm_mm_destroy() was call. */ + if (hmm->mm == NULL || hmm->dead) + return -EFAULT; + + do { + /* If range is no longer valid force retry. */ + if (!range->valid) + return -EAGAIN; + + vma = find_vma(hmm->mm, start); + if (vma == NULL || (vma->vm_flags & VM_SPECIAL)) + return -EFAULT; + + /* FIXME support hugetlb fs/dax */ + if (is_vm_hugetlb_page(vma) || vma_is_dax(vma)) { + hmm_pfns_special(range); + return -EINVAL; + } + + if (!(vma->vm_flags & VM_READ)) { + /* + * If vma do not allow read access, then assume that it + * does not allow write access, either. HMM does not + * support architecture that allow write without read. + */ + hmm_pfns_clear(range, range->pfns, + range->start, range->end); + return -EPERM; + } + + range->vma = vma; + hmm_vma_walk.last = start; + hmm_vma_walk.fault = false; + hmm_vma_walk.range = range; + mm_walk.private = &hmm_vma_walk; + end = min(range->end, vma->vm_end); + + mm_walk.vma = vma; + mm_walk.mm = vma->vm_mm; + mm_walk.pte_entry = NULL; + mm_walk.test_walk = NULL; + mm_walk.hugetlb_entry = NULL; + mm_walk.pmd_entry = hmm_vma_walk_pmd; + mm_walk.pte_hole = hmm_vma_walk_hole; + + walk_page_range(start, end, &mm_walk); + start = end; + } while (start < range->end); + + return (hmm_vma_walk.last - range->start) >> PAGE_SHIFT; +} +EXPORT_SYMBOL(hmm_range_snapshot); /* * hmm_range_fault() - try to fault some address in a virtual address range @@ -889,96 +886,79 @@ EXPORT_SYMBOL(hmm_vma_range_done); */ long hmm_range_fault(struct hmm_range *range, bool block) { - struct vm_area_struct *vma = range->vma; - unsigned long start = range->start; + unsigned long start = range->start, end; struct hmm_vma_walk hmm_vma_walk; + struct hmm *hmm = range->hmm; + struct vm_area_struct *vma; struct mm_walk mm_walk; - struct hmm *hmm; int ret; - range->hmm = NULL; - - /* Sanity check, this really should not happen ! */ - if (range->start < vma->vm_start || range->start >= vma->vm_end) - return -EINVAL; - if (range->end < vma->vm_start || range->end > vma->vm_end) - return -EINVAL; + /* Check if hmm_mm_destroy() was call. */ + if (hmm->mm == NULL || hmm->dead) + return -EFAULT; - hmm = hmm_register(vma->vm_mm); - if (!hmm) { - hmm_pfns_clear(range, range->pfns, range->start, range->end); - return -ENOMEM; - } + do { + /* If range is no longer valid force retry. 
*/ + if (!range->valid) { + up_read(&hmm->mm->mmap_sem); + return -EAGAIN; + } - /* Check if hmm_mm_destroy() was call. */ - if (hmm->mm == NULL) { - hmm_put(hmm); - return -EINVAL; - } + vma = find_vma(hmm->mm, start); + if (vma == NULL || (vma->vm_flags & VM_SPECIAL)) + return -EFAULT; - /* FIXME support hugetlb fs */ - if (is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_SPECIAL) || - vma_is_dax(vma)) { - hmm_pfns_special(range); - hmm_put(hmm); - return -EINVAL; - } + /* FIXME support hugetlb fs/dax */ + if (is_vm_hugetlb_page(vma) || vma_is_dax(vma)) { + hmm_pfns_special(range); + return -EINVAL; + } - if (!(vma->vm_flags & VM_READ)) { - /* - * If vma do not allow read access, then assume that it does - * not allow write access, either. Architecture that allow - * write without read access are not supported by HMM, because - * operations such has atomic access would not work. - */ - hmm_pfns_clear(range, range->pfns, range->start, range->end); - hmm_put(hmm); - return -EPERM; - } + if (!(vma->vm_flags & VM_READ)) { + /* + * If vma do not allow read access, then assume that it + * does not allow write access, either. HMM does not + * support architecture that allow write without read. + */ + hmm_pfns_clear(range, range->pfns, + range->start, range->end); + return -EPERM; + } - /* Initialize range to track CPU page table update */ - spin_lock(&hmm->lock); - range->valid = true; - list_add_rcu(&range->list, &hmm->ranges); - spin_unlock(&hmm->lock); - - hmm_vma_walk.fault = true; - hmm_vma_walk.block = block; - hmm_vma_walk.range = range; - mm_walk.private = &hmm_vma_walk; - hmm_vma_walk.last = range->start; - - mm_walk.vma = vma; - mm_walk.mm = vma->vm_mm; - mm_walk.pte_entry = NULL; - mm_walk.test_walk = NULL; - mm_walk.hugetlb_entry = NULL; - mm_walk.pmd_entry = hmm_vma_walk_pmd; - mm_walk.pte_hole = hmm_vma_walk_hole; + range->vma = vma; + hmm_vma_walk.last = start; + hmm_vma_walk.fault = true; + hmm_vma_walk.block = block; + hmm_vma_walk.range = range; + mm_walk.private = &hmm_vma_walk; + end = min(range->end, vma->vm_end); + + mm_walk.vma = vma; + mm_walk.mm = vma->vm_mm; + mm_walk.pte_entry = NULL; + mm_walk.test_walk = NULL; + mm_walk.hugetlb_entry = NULL; + mm_walk.pmd_entry = hmm_vma_walk_pmd; + mm_walk.pte_hole = hmm_vma_walk_hole; + + do { + ret = walk_page_range(start, end, &mm_walk); + start = hmm_vma_walk.last; + + /* Keep trying while the range is valid. */ + } while (ret == -EBUSY && range->valid); + + if (ret) { + unsigned long i; + + i = (hmm_vma_walk.last - range->start) >> PAGE_SHIFT; + hmm_pfns_clear(range, &range->pfns[i], + hmm_vma_walk.last, range->end); + return ret; + } + start = end; - do { - ret = walk_page_range(start, range->end, &mm_walk); - start = hmm_vma_walk.last; - /* Keep trying while the range is valid. */ - } while (ret == -EBUSY && range->valid); - - if (ret) { - unsigned long i; - - i = (hmm_vma_walk.last - range->start) >> PAGE_SHIFT; - hmm_pfns_clear(range, &range->pfns[i], hmm_vma_walk.last, - range->end); - hmm_vma_range_done(range); - hmm_put(hmm); - return ret; - } else { - /* - * Transfer hmm reference to the range struct it will be drop - * inside the hmm_vma_range_done() function (which _must_ be - * call if this function return 0). 
- */ - range->hmm = hmm; + } + } while (start < range->end); return (hmm_vma_walk.last - range->start) >> PAGE_SHIFT; }
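Put together as real C, the register/wait/fault/commit loop described in the header comment of this patch might look like the following sketch. The mydevice_* helpers are the same placeholder names used in the pseudo code, the timeout and retry policy here are arbitrary, and note that a later patch in this series adds a page_shift argument to hmm_range_register().

static long mydevice_prefault(struct mydevice *mydevice, struct hmm_range *range,
                              struct mm_struct *mm, unsigned long start,
                              unsigned long end)
{
        long ret;

        ret = hmm_range_register(range, mm, start, end);
        if (ret)
                return ret;

        down_read(&mm->mmap_sem);
again:
        if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
                up_read(&mm->mmap_sem);
                hmm_range_unregister(range);
                return -EBUSY;  /* or retry; the policy is up to the driver */
        }

        ret = hmm_range_fault(range, true);
        if (ret == -EAGAIN) {
                /* hmm_range_fault() dropped mmap_sem; take it and retry. */
                down_read(&mm->mmap_sem);
                goto again;
        } else if (ret == -EBUSY) {
                /* The range was invalidated while walking it; retry. */
                goto again;
        }
        up_read(&mm->mmap_sem);
        if (ret < 0) {
                hmm_range_unregister(range);
                return ret;
        }

        /* ret is the number of valid pages from the start of the range. */
        mydevice_page_table_lock(mydevice);
        if (!hmm_range_valid(range)) {
                mydevice_page_table_unlock(mydevice);
                down_read(&mm->mmap_sem);
                goto again;
        }
        mydevice_populate_page_table(mydevice, range, ret);
        mydevice_page_table_unlock(mydevice);

        hmm_range_unregister(range);
        return 0;
}

The same structure applies to hmm_range_snapshot(); only the fault step changes.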
From patchwork Mon Mar 25 14:40:07 2019
X-Patchwork-Submitter: Jerome Glisse
X-Patchwork-Id: 10869411
From: jglisse@redhat.com
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Jérôme Glisse, Andrew Morton, John Hubbard, Dan Williams
Subject: [PATCH v2 07/11] mm/hmm: add default fault flags to avoid the need to pre-fill pfns arrays.
Date: Mon, 25 Mar 2019 10:40:07 -0400
Message-Id: <20190325144011.10560-8-jglisse@redhat.com>
In-Reply-To: <20190325144011.10560-1-jglisse@redhat.com>
References: <20190325144011.10560-1-jglisse@redhat.com>

From: Jérôme Glisse

The HMM mirror API can be used in two fashions. In the first one, the HMM user coalesces multiple page faults into one request and sets flags per pfn for those faults. In the second one, the HMM user wants to pre-fault a range with specific flags. For the latter it is a waste to have the user pre-fill the pfn array with a default flags value.
This patch adds a default flags value allowing user to set them for a range without having to pre-fill the pfn array. Signed-off-by: Jérôme Glisse Reviewed-by: Ralph Campbell Cc: Andrew Morton Cc: John Hubbard Cc: Dan Williams --- include/linux/hmm.h | 7 +++++++ mm/hmm.c | 12 ++++++++++++ 2 files changed, 19 insertions(+) diff --git a/include/linux/hmm.h b/include/linux/hmm.h index 79671036cb5f..13bc2c72f791 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -165,6 +165,8 @@ enum hmm_pfn_value_e { * @pfns: array of pfns (big enough for the range) * @flags: pfn flags to match device driver page table * @values: pfn value for some special case (none, special, error, ...) + * @default_flags: default flags for the range (write, read, ...) + * @pfn_flags_mask: allows to mask pfn flags so that only default_flags matter * @pfn_shifts: pfn shift value (should be <= PAGE_SHIFT) * @valid: pfns array did not change since it has been fill by an HMM function */ @@ -177,6 +179,8 @@ struct hmm_range { uint64_t *pfns; const uint64_t *flags; const uint64_t *values; + uint64_t default_flags; + uint64_t pfn_flags_mask; uint8_t pfn_shift; bool valid; }; @@ -521,6 +525,9 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block) { long ret; + range->default_flags = 0; + range->pfn_flags_mask = -1UL; + ret = hmm_range_register(range, range->vma->vm_mm, range->start, range->end); if (ret) diff --git a/mm/hmm.c b/mm/hmm.c index fa9498eeb9b6..4fe88a196d17 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -415,6 +415,18 @@ static inline void hmm_pte_need_fault(const struct hmm_vma_walk *hmm_vma_walk, if (!hmm_vma_walk->fault) return; + /* + * So we not only consider the individual per page request we also + * consider the default flags requested for the range. The API can + * be use in 2 fashions. The first one where the HMM user coalesce + * multiple page fault into one request and set flags per pfns for + * of those faults. The second one where the HMM user want to pre- + * fault a range with specific flags. For the latter one it is a + * waste to have the user pre-fill the pfn arrays with a default + * flags value. + */ + pfns = (pfns & range->pfn_flags_mask) | range->default_flags; + /* We aren't ask to do anything ... 
*/ if (!(pfns & range->flags[HMM_PFN_VALID])) return;
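As a concrete illustration of the two fashions described in this patch, a driver can now either pre-fault a whole range with one set of flags without touching the pfns array first, or keep the old per-pfn behaviour. The snippet below is a sketch only; it assumes the usual HMM_PFN_VALID and HMM_PFN_WRITE entries of the range->flags[] table and relies solely on the pfns = (pfns & pfn_flags_mask) | default_flags rule added above.

        /*
         * Fashion 2: pre-fault the whole range as valid and writable;
         * pfn_flags_mask == 0 means the per-pfn input values are ignored
         * and only default_flags drive the fault decision.
         */
        range->default_flags = range->flags[HMM_PFN_VALID] |
                               range->flags[HMM_PFN_WRITE];
        range->pfn_flags_mask = 0;
        ret = hmm_range_fault(range, true);

        /*
         * Fashion 1: keep the old behaviour, only the flags the caller
         * pre-filled in range->pfns[] matter (this is what the temporary
         * hmm_vma_fault() wrapper sets up).
         */
        range->default_flags = 0;
        range->pfn_flags_mask = -1UL;
        ret = hmm_range_fault(range, true);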
From patchwork Mon Mar 25 14:40:08 2019
X-Patchwork-Submitter: Jerome Glisse
X-Patchwork-Id: 10869415
From: jglisse@redhat.com
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Jérôme Glisse, Andrew Morton, John Hubbard, Dan Williams, Arnd Bergmann
Subject: [PATCH v2 08/11] mm/hmm: mirror hugetlbfs (snapshoting, faulting and DMA mapping) v2
Date: Mon, 25 Mar 2019 10:40:08 -0400
Message-Id: <20190325144011.10560-9-jglisse@redhat.com>
In-Reply-To: <20190325144011.10560-1-jglisse@redhat.com>
References: <20190325144011.10560-1-jglisse@redhat.com>

From: Jérôme Glisse

HMM mirror is a device driver helper to mirror a range of virtual addresses. It means that the process jobs running on the device can access the same virtual addresses as the CPU threads of that process. This patch adds support for hugetlbfs mappings (ie ranges of virtual addresses that are the result of an mmap of a hugetlbfs file).
Changes since v1: - improved commit message - squashed: Arnd Bergmann: fix unused variable warnings Signed-off-by: Jérôme Glisse Reviewed-by: Ralph Campbell Cc: Andrew Morton Cc: John Hubbard Cc: Dan Williams Cc: Arnd Bergmann Reviewed-by: Ira Weiny --- include/linux/hmm.h | 29 ++++++++-- mm/hmm.c | 126 +++++++++++++++++++++++++++++++++++++++----- 2 files changed, 138 insertions(+), 17 deletions(-) diff --git a/include/linux/hmm.h b/include/linux/hmm.h index 13bc2c72f791..f3b919b04eda 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -181,10 +181,31 @@ struct hmm_range { const uint64_t *values; uint64_t default_flags; uint64_t pfn_flags_mask; + uint8_t page_shift; uint8_t pfn_shift; bool valid; }; +/* + * hmm_range_page_shift() - return the page shift for the range + * @range: range being queried + * Returns: page shift (page size = 1 << page shift) for the range + */ +static inline unsigned hmm_range_page_shift(const struct hmm_range *range) +{ + return range->page_shift; +} + +/* + * hmm_range_page_size() - return the page size for the range + * @range: range being queried + * Returns: page size for the range in bytes + */ +static inline unsigned long hmm_range_page_size(const struct hmm_range *range) +{ + return 1UL << hmm_range_page_shift(range); +} + /* * hmm_range_wait_until_valid() - wait for range to be valid * @range: range affected by invalidation to wait on @@ -438,7 +459,7 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror); * struct hmm_range range; * ... * - * ret = hmm_range_register(&range, mm, start, end); + * ret = hmm_range_register(&range, mm, start, end, page_shift); * if (ret) * return ret; * @@ -498,7 +519,8 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror); int hmm_range_register(struct hmm_range *range, struct mm_struct *mm, unsigned long start, - unsigned long end); + unsigned long end, + unsigned page_shift); void hmm_range_unregister(struct hmm_range *range); long hmm_range_snapshot(struct hmm_range *range); long hmm_range_fault(struct hmm_range *range, bool block); @@ -529,7 +551,8 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block) range->pfn_flags_mask = -1UL; ret = hmm_range_register(range, range->vma->vm_mm, - range->start, range->end); + range->start, range->end, + PAGE_SHIFT); if (ret) return (int)ret; diff --git a/mm/hmm.c b/mm/hmm.c index 4fe88a196d17..64a33770813b 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -387,11 +387,13 @@ static int hmm_vma_walk_hole_(unsigned long addr, unsigned long end, struct hmm_vma_walk *hmm_vma_walk = walk->private; struct hmm_range *range = hmm_vma_walk->range; uint64_t *pfns = range->pfns; - unsigned long i; + unsigned long i, page_size; hmm_vma_walk->last = addr; - i = (addr - range->start) >> PAGE_SHIFT; - for (; addr < end; addr += PAGE_SIZE, i++) { + page_size = 1UL << range->page_shift; + i = (addr - range->start) >> range->page_shift; + + for (; addr < end; addr += page_size, i++) { pfns[i] = range->values[HMM_PFN_NONE]; if (fault || write_fault) { int ret; @@ -703,6 +705,69 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp, return 0; } +static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask, + unsigned long start, unsigned long end, + struct mm_walk *walk) +{ +#ifdef CONFIG_HUGETLB_PAGE + unsigned long addr = start, i, pfn, mask, size, pfn_inc; + struct hmm_vma_walk *hmm_vma_walk = walk->private; + struct hmm_range *range = hmm_vma_walk->range; + struct vm_area_struct *vma = walk->vma; + struct hstate *h = hstate_vma(vma); + uint64_t orig_pfn, cpu_flags; + bool fault, 
write_fault; + spinlock_t *ptl; + pte_t entry; + int ret = 0; + + size = 1UL << huge_page_shift(h); + mask = size - 1; + if (range->page_shift != PAGE_SHIFT) { + /* Make sure we are looking at full page. */ + if (start & mask) + return -EINVAL; + if (end < (start + size)) + return -EINVAL; + pfn_inc = size >> PAGE_SHIFT; + } else { + pfn_inc = 1; + size = PAGE_SIZE; + } + + + ptl = huge_pte_lock(hstate_vma(walk->vma), walk->mm, pte); + entry = huge_ptep_get(pte); + + i = (start - range->start) >> range->page_shift; + orig_pfn = range->pfns[i]; + range->pfns[i] = range->values[HMM_PFN_NONE]; + cpu_flags = pte_to_hmm_pfn_flags(range, entry); + fault = write_fault = false; + hmm_pte_need_fault(hmm_vma_walk, orig_pfn, cpu_flags, + &fault, &write_fault); + if (fault || write_fault) { + ret = -ENOENT; + goto unlock; + } + + pfn = pte_pfn(entry) + (start & mask); + for (; addr < end; addr += size, i++, pfn += pfn_inc) + range->pfns[i] = hmm_pfn_from_pfn(range, pfn) | cpu_flags; + hmm_vma_walk->last = end; + +unlock: + spin_unlock(ptl); + + if (ret == -ENOENT) + return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk); + + return ret; +#else /* CONFIG_HUGETLB_PAGE */ + return -EINVAL; +#endif +} + static void hmm_pfns_clear(struct hmm_range *range, uint64_t *pfns, unsigned long addr, @@ -726,6 +791,7 @@ static void hmm_pfns_special(struct hmm_range *range) * @mm: the mm struct for the range of virtual address * @start: start virtual address (inclusive) * @end: end virtual address (exclusive) + * @page_shift: expect page shift for the range * Returns 0 on success, -EFAULT if the address space is no longer valid * * Track updates to the CPU page table see include/linux/hmm.h @@ -733,16 +799,23 @@ static void hmm_pfns_special(struct hmm_range *range) int hmm_range_register(struct hmm_range *range, struct mm_struct *mm, unsigned long start, - unsigned long end) + unsigned long end, + unsigned page_shift) { - range->start = start & PAGE_MASK; - range->end = end & PAGE_MASK; + unsigned long mask = ((1UL << page_shift) - 1UL); + range->valid = false; range->hmm = NULL; - if (range->start >= range->end) + if ((start & mask) || (end & mask)) + return -EINVAL; + if (start >= end) return -EINVAL; + range->page_shift = page_shift; + range->start = start; + range->end = end; + range->hmm = hmm_register(mm); if (!range->hmm) return -EFAULT; @@ -809,6 +882,7 @@ EXPORT_SYMBOL(hmm_range_unregister); */ long hmm_range_snapshot(struct hmm_range *range) { + const unsigned long device_vma = VM_IO | VM_PFNMAP | VM_MIXEDMAP; unsigned long start = range->start, end; struct hmm_vma_walk hmm_vma_walk; struct hmm *hmm = range->hmm; @@ -825,15 +899,26 @@ long hmm_range_snapshot(struct hmm_range *range) return -EAGAIN; vma = find_vma(hmm->mm, start); - if (vma == NULL || (vma->vm_flags & VM_SPECIAL)) + if (vma == NULL || (vma->vm_flags & device_vma)) return -EFAULT; - /* FIXME support hugetlb fs/dax */ - if (is_vm_hugetlb_page(vma) || vma_is_dax(vma)) { + /* FIXME support dax */ + if (vma_is_dax(vma)) { hmm_pfns_special(range); return -EINVAL; } + if (is_vm_hugetlb_page(vma)) { + struct hstate *h = hstate_vma(vma); + + if (huge_page_shift(h) != range->page_shift && + range->page_shift != PAGE_SHIFT) + return -EINVAL; + } else { + if (range->page_shift != PAGE_SHIFT) + return -EINVAL; + } + if (!(vma->vm_flags & VM_READ)) { /* * If vma do not allow read access, then assume that it @@ -859,6 +944,7 @@ long hmm_range_snapshot(struct hmm_range *range) mm_walk.hugetlb_entry = NULL; mm_walk.pmd_entry = hmm_vma_walk_pmd; 
mm_walk.pte_hole = hmm_vma_walk_hole; + mm_walk.hugetlb_entry = hmm_vma_walk_hugetlb_entry; walk_page_range(start, end, &mm_walk); start = end; @@ -877,7 +963,7 @@ EXPORT_SYMBOL(hmm_range_snapshot); * then one of the following values may be returned: * * -EINVAL invalid arguments or mm or virtual address are in an - * invalid vma (ie either hugetlbfs or device file vma). + * invalid vma (for instance device file vma). * -ENOMEM: Out of memory. * -EPERM: Invalid permission (for instance asking for write and * range is read only). @@ -898,6 +984,7 @@ EXPORT_SYMBOL(hmm_range_snapshot); */ long hmm_range_fault(struct hmm_range *range, bool block) { + const unsigned long device_vma = VM_IO | VM_PFNMAP | VM_MIXEDMAP; unsigned long start = range->start, end; struct hmm_vma_walk hmm_vma_walk; struct hmm *hmm = range->hmm; @@ -917,15 +1004,25 @@ long hmm_range_fault(struct hmm_range *range, bool block) } vma = find_vma(hmm->mm, start); - if (vma == NULL || (vma->vm_flags & VM_SPECIAL)) + if (vma == NULL || (vma->vm_flags & device_vma)) return -EFAULT; - /* FIXME support hugetlb fs/dax */ - if (is_vm_hugetlb_page(vma) || vma_is_dax(vma)) { + /* FIXME support dax */ + if (vma_is_dax(vma)) { hmm_pfns_special(range); return -EINVAL; } + if (is_vm_hugetlb_page(vma)) { + if (huge_page_shift(hstate_vma(vma)) != + range->page_shift && + range->page_shift != PAGE_SHIFT) + return -EINVAL; + } else { + if (range->page_shift != PAGE_SHIFT) + return -EINVAL; + } + if (!(vma->vm_flags & VM_READ)) { /* * If vma do not allow read access, then assume that it @@ -952,6 +1049,7 @@ long hmm_range_fault(struct hmm_range *range, bool block) mm_walk.hugetlb_entry = NULL; mm_walk.pmd_entry = hmm_vma_walk_pmd; mm_walk.pte_hole = hmm_vma_walk_hole; + mm_walk.hugetlb_entry = hmm_vma_walk_hugetlb_entry; do { ret = walk_page_range(start, end, &mm_walk); From patchwork Mon Mar 25 14:40:09 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jerome Glisse X-Patchwork-Id: 10869417 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D1C8F1708 for ; Mon, 25 Mar 2019 14:40:51 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B939929420 for ; Mon, 25 Mar 2019 14:40:51 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B7444294A3; Mon, 25 Mar 2019 14:40:51 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EA07229474 for ; Mon, 25 Mar 2019 14:40:50 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 36C3B6B026A; Mon, 25 Mar 2019 10:40:24 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 2EDCB6B026B; Mon, 25 Mar 2019 10:40:24 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 11D676B026C; Mon, 25 Mar 2019 10:40:24 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-qt1-f197.google.com (mail-qt1-f197.google.com 
[209.85.160.197]) by kanga.kvack.org (Postfix) with ESMTP id D8DF36B026A for ; Mon, 25 Mar 2019 10:40:23 -0400 (EDT) Received: by mail-qt1-f197.google.com with SMTP id t22so9880809qtc.13 for ; Mon, 25 Mar 2019 07:40:23 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=omd/FHaKWfRUxUYh143eh+Ou5RpsvJY2kGVkaQZxymU=; b=CRfANYj29hvp2pb7GZrb+LjnUnVV+GHNZ3x26EBJiuT/8poB78EbY5+wo1+YMrtreA 27edHXE4mpTI0VjZFg7xEOjia1ttKSiVac4zpJKq9RzUitbWiCU5pX1t2zZmOKvUB6xG XvZa3OO+ynxWsQoyM0e3fbKddxahLagSW4H9tIfRWxcZv1OoQWaEFDLWRJaXT1mRQvUq 4ObNydS/gzbsqEDwL1l2nUEUF9jnwIK8Kb3UnvHqaZYeEFlT66eJ+emvjZRWp6/EQK2B t56Mnsa1GB/MV8wLpJVO9Ta8dqSLyY77LNjS+E8lJru77da8Z9RQu4fr1Bl8YEqx4SQR paxA== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com X-Gm-Message-State: APjAAAX3j2pXO4dks7HBzenRVSKBzDnI6W7ZgYHFkn81ugl9Qty62NGz Fi3vzwT9xs/rdlVerq3TKkH4pcIe2CgOVTWQ1rYeybSvSLWDkd8PRrCThFmZ1OjZgP9ioaCcbZk nCEtRtDHaNHBvEUI6MmNcGEQsf8pYqg3/v58iJWomcEd02ThxPVniWIOhMBFHUKVygQ== X-Received: by 2002:a0c:d4a2:: with SMTP id u31mr8581885qvh.139.1553524823604; Mon, 25 Mar 2019 07:40:23 -0700 (PDT) X-Google-Smtp-Source: APXvYqyZJLr3jjXb5MlA2DoTYeXRlZn4SXziHvU74vjAM4DdkycW58zGSgfu+wVryZxUm37J00hL X-Received: by 2002:a0c:d4a2:: with SMTP id u31mr8581824qvh.139.1553524822873; Mon, 25 Mar 2019 07:40:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1553524822; cv=none; d=google.com; s=arc-20160816; b=fOOSCKB7bAS4RZ8hpCQg/gwdU1FdKFBh1ZLOw+4dEm051bSAi7ZbMSUp3ORl7Oou0M ZkausWV+wIXXnQ81DITNxX2135aGxobylhU7f1iJFDL5k9idi9TezXg0eJ5xLNVnuZbL AvnpF9TqFUznNLdnghtn1AbAv3Fq3LnadyK+Aqhe4ly4WAYv6maIR+3jWKf5Roiu/14R 0RvDh8gC5B2e6wMj1U6UAB9In/reQ0RYCjPAJd/5F8JoVUkwg49eQzBN3ztQpewodlte N/ITk/nske6Tg4XaSXMzZ3dOlNRyZROtO/j6uyLwL+cIqdUxdcb+KUIJtor2eKk1JaR0 Y/4A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from; bh=omd/FHaKWfRUxUYh143eh+Ou5RpsvJY2kGVkaQZxymU=; b=lA7E/SARL77PgqX5p6zc/tzvzVQvBLFQ03asCGeq85AYqYZvP8MlO204RvBKjhMsDm TGyZLIOkzGxwkh78NqnkC7bZYIK0Pxeq3xuGtVVUw9ZpawhLtbTV6Ul20ZeFQ84cfAaN GQ9+H89Nwtd38rFt34fgXBw5iYA3vlTdKQq2rcR1s/JbdwgIF0JCupQc/bo5biH6fown A4FnhEek9Pmlb536HKsoDePJvWybdRO6nmaW5k3LO3ez4tcW695uPv9yXVREFcH+PwZY YrjG/vUiDwy4CBdE5ys1YSFhbU4/yMJq0F9Jqmd6OsrIz+jgnZWknH2aMyd0iUANVqmz xqjg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from mx1.redhat.com (mx1.redhat.com. 
[209.132.183.28]) by mx.google.com with ESMTPS id t29si1293834qvc.4.2019.03.25.07.40.22 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 25 Mar 2019 07:40:22 -0700 (PDT) Received-SPF: pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) client-ip=209.132.183.28; Authentication-Results: mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 0CA2866964; Mon, 25 Mar 2019 14:40:22 +0000 (UTC) Received: from localhost.localdomain.com (unknown [10.20.6.236]) by smtp.corp.redhat.com (Postfix) with ESMTP id 443361001DC8; Mon, 25 Mar 2019 14:40:21 +0000 (UTC) From: jglisse@redhat.com To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Andrew Morton , Dan Williams , John Hubbard , Arnd Bergmann Subject: [PATCH v2 09/11] mm/hmm: allow to mirror vma of a file on a DAX backed filesystem v2 Date: Mon, 25 Mar 2019 10:40:09 -0400 Message-Id: <20190325144011.10560-10-jglisse@redhat.com> In-Reply-To: <20190325144011.10560-1-jglisse@redhat.com> References: <20190325144011.10560-1-jglisse@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.38]); Mon, 25 Mar 2019 14:40:22 +0000 (UTC) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP From: Jérôme Glisse HMM mirror is a device driver helpers to mirror range of virtual address. It means that the process jobs running on the device can access the same virtual address as the CPU threads of that process. This patch adds support for mirroring mapping of file that are on a DAX block device (ie range of virtual address that is an mmap of a file in a filesystem on a DAX block device). There is no reason to not support such case when mirroring virtual address on a device. Note that unlike GUP code we do not take page reference hence when we back-off we have nothing to undo. Changes since v1: - improved commit message - squashed: Arnd Bergmann: fix unused variable warning in hmm_vma_walk_pud Signed-off-by: Jérôme Glisse Reviewed-by: Ralph Campbell Cc: Andrew Morton Cc: Dan Williams Cc: John Hubbard Cc: Arnd Bergmann --- mm/hmm.c | 132 ++++++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 111 insertions(+), 21 deletions(-) diff --git a/mm/hmm.c b/mm/hmm.c index 64a33770813b..ce33151c6832 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -325,6 +325,7 @@ EXPORT_SYMBOL(hmm_mirror_unregister); struct hmm_vma_walk { struct hmm_range *range; + struct dev_pagemap *pgmap; unsigned long last; bool fault; bool block; @@ -499,6 +500,15 @@ static inline uint64_t pmd_to_hmm_pfn_flags(struct hmm_range *range, pmd_t pmd) range->flags[HMM_PFN_VALID]; } +static inline uint64_t pud_to_hmm_pfn_flags(struct hmm_range *range, pud_t pud) +{ + if (!pud_present(pud)) + return 0; + return pud_write(pud) ? 
range->flags[HMM_PFN_VALID] | + range->flags[HMM_PFN_WRITE] : + range->flags[HMM_PFN_VALID]; +} + static int hmm_vma_handle_pmd(struct mm_walk *walk, unsigned long addr, unsigned long end, @@ -520,8 +530,19 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk, return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk); pfn = pmd_pfn(pmd) + pte_index(addr); - for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++) + for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++) { + if (pmd_devmap(pmd)) { + hmm_vma_walk->pgmap = get_dev_pagemap(pfn, + hmm_vma_walk->pgmap); + if (unlikely(!hmm_vma_walk->pgmap)) + return -EBUSY; + } pfns[i] = hmm_pfn_from_pfn(range, pfn) | cpu_flags; + } + if (hmm_vma_walk->pgmap) { + put_dev_pagemap(hmm_vma_walk->pgmap); + hmm_vma_walk->pgmap = NULL; + } hmm_vma_walk->last = end; return 0; } @@ -608,10 +629,24 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr, if (fault || write_fault) goto fault; + if (pte_devmap(pte)) { + hmm_vma_walk->pgmap = get_dev_pagemap(pte_pfn(pte), + hmm_vma_walk->pgmap); + if (unlikely(!hmm_vma_walk->pgmap)) + return -EBUSY; + } else if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && pte_special(pte)) { + *pfn = range->values[HMM_PFN_SPECIAL]; + return -EFAULT; + } + *pfn = hmm_pfn_from_pfn(range, pte_pfn(pte)) | cpu_flags; return 0; fault: + if (hmm_vma_walk->pgmap) { + put_dev_pagemap(hmm_vma_walk->pgmap); + hmm_vma_walk->pgmap = NULL; + } pte_unmap(ptep); /* Fault any virtual address we were asked to fault */ return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk); @@ -699,12 +734,83 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp, return r; } } + if (hmm_vma_walk->pgmap) { + put_dev_pagemap(hmm_vma_walk->pgmap); + hmm_vma_walk->pgmap = NULL; + } pte_unmap(ptep - 1); hmm_vma_walk->last = addr; return 0; } +static int hmm_vma_walk_pud(pud_t *pudp, + unsigned long start, + unsigned long end, + struct mm_walk *walk) +{ + struct hmm_vma_walk *hmm_vma_walk = walk->private; + struct hmm_range *range = hmm_vma_walk->range; + unsigned long addr = start, next; + pmd_t *pmdp; + pud_t pud; + int ret; + +again: + pud = READ_ONCE(*pudp); + if (pud_none(pud)) + return hmm_vma_walk_hole(start, end, walk); + + if (pud_huge(pud) && pud_devmap(pud)) { + unsigned long i, npages, pfn; + uint64_t *pfns, cpu_flags; + bool fault, write_fault; + + if (!pud_present(pud)) + return hmm_vma_walk_hole(start, end, walk); + + i = (addr - range->start) >> PAGE_SHIFT; + npages = (end - addr) >> PAGE_SHIFT; + pfns = &range->pfns[i]; + + cpu_flags = pud_to_hmm_pfn_flags(range, pud); + hmm_range_need_fault(hmm_vma_walk, pfns, npages, + cpu_flags, &fault, &write_fault); + if (fault || write_fault) + return hmm_vma_walk_hole_(addr, end, fault, + write_fault, walk); + + pfn = pud_pfn(pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT); + for (i = 0; i < npages; ++i, ++pfn) { + hmm_vma_walk->pgmap = get_dev_pagemap(pfn, + hmm_vma_walk->pgmap); + if (unlikely(!hmm_vma_walk->pgmap)) + return -EBUSY; + pfns[i] = hmm_pfn_from_pfn(range, pfn) | cpu_flags; + } + if (hmm_vma_walk->pgmap) { + put_dev_pagemap(hmm_vma_walk->pgmap); + hmm_vma_walk->pgmap = NULL; + } + hmm_vma_walk->last = end; + return 0; + } + + split_huge_pud(walk->vma, pudp, addr); + if (pud_none(*pudp)) + goto again; + + pmdp = pmd_offset(pudp, addr); + do { + next = pmd_addr_end(addr, end); + ret = hmm_vma_walk_pmd(pmdp, addr, next, walk); + if (ret) + return ret; + } while (pmdp++, addr = next, addr != end); + + return 0; +} + static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask, 
unsigned long start, unsigned long end, struct mm_walk *walk) @@ -777,14 +883,6 @@ static void hmm_pfns_clear(struct hmm_range *range, *pfns = range->values[HMM_PFN_NONE]; } -static void hmm_pfns_special(struct hmm_range *range) -{ - unsigned long addr = range->start, i = 0; - - for (; addr < range->end; addr += PAGE_SIZE, i++) - range->pfns[i] = range->values[HMM_PFN_SPECIAL]; -} - /* * hmm_range_register() - start tracking change to CPU page table over a range * @range: range @@ -902,12 +1000,6 @@ long hmm_range_snapshot(struct hmm_range *range) if (vma == NULL || (vma->vm_flags & device_vma)) return -EFAULT; - /* FIXME support dax */ - if (vma_is_dax(vma)) { - hmm_pfns_special(range); - return -EINVAL; - } - if (is_vm_hugetlb_page(vma)) { struct hstate *h = hstate_vma(vma); @@ -931,6 +1023,7 @@ long hmm_range_snapshot(struct hmm_range *range) } range->vma = vma; + hmm_vma_walk.pgmap = NULL; hmm_vma_walk.last = start; hmm_vma_walk.fault = false; hmm_vma_walk.range = range; @@ -942,6 +1035,7 @@ long hmm_range_snapshot(struct hmm_range *range) mm_walk.pte_entry = NULL; mm_walk.test_walk = NULL; mm_walk.hugetlb_entry = NULL; + mm_walk.pud_entry = hmm_vma_walk_pud; mm_walk.pmd_entry = hmm_vma_walk_pmd; mm_walk.pte_hole = hmm_vma_walk_hole; mm_walk.hugetlb_entry = hmm_vma_walk_hugetlb_entry; @@ -1007,12 +1101,6 @@ long hmm_range_fault(struct hmm_range *range, bool block) if (vma == NULL || (vma->vm_flags & device_vma)) return -EFAULT; - /* FIXME support dax */ - if (vma_is_dax(vma)) { - hmm_pfns_special(range); - return -EINVAL; - } - if (is_vm_hugetlb_page(vma)) { if (huge_page_shift(hstate_vma(vma)) != range->page_shift && @@ -1035,6 +1123,7 @@ long hmm_range_fault(struct hmm_range *range, bool block) } range->vma = vma; + hmm_vma_walk.pgmap = NULL; hmm_vma_walk.last = start; hmm_vma_walk.fault = true; hmm_vma_walk.block = block; @@ -1047,6 +1136,7 @@ long hmm_range_fault(struct hmm_range *range, bool block) mm_walk.pte_entry = NULL; mm_walk.test_walk = NULL; mm_walk.hugetlb_entry = NULL; + mm_walk.pud_entry = hmm_vma_walk_pud; mm_walk.pmd_entry = hmm_vma_walk_pmd; mm_walk.pte_hole = hmm_vma_walk_hole; mm_walk.hugetlb_entry = hmm_vma_walk_hugetlb_entry; From patchwork Mon Mar 25 14:40:10 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jerome Glisse X-Patchwork-Id: 10869419 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 832401708 for ; Mon, 25 Mar 2019 14:40:56 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6E7632946F for ; Mon, 25 Mar 2019 14:40:56 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6C40A29417; Mon, 25 Mar 2019 14:40:56 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D10752946F for ; Mon, 25 Mar 2019 14:40:55 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E887F6B026B; Mon, 25 Mar 2019 10:40:24 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id E13DF6B026C; Mon, 25 Mar 2019 10:40:24 -0400 
(EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C42C46B026D; Mon, 25 Mar 2019 10:40:24 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-qt1-f199.google.com (mail-qt1-f199.google.com [209.85.160.199]) by kanga.kvack.org (Postfix) with ESMTP id 975836B026B for ; Mon, 25 Mar 2019 10:40:24 -0400 (EDT) Received: by mail-qt1-f199.google.com with SMTP id x12so10376556qtk.2 for ; Mon, 25 Mar 2019 07:40:24 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=3Ti6IjAWZ9nxGji48U2finShhN9k8Ywg/5Fu3MYAh2E=; b=pHTejNRsOh2StvbfUohWf4zqb2jA/eYa8WZEOfK1lsdGUO9ev+w1k3peDK4bsiP7YP gGB81vfnTcituxN8HA+XDs9inPj9DukLe6rZi0eaJC67OUHMPVZbEvzIyw6HCzLHZFkF LKtclqXuUQMlyFK9aMoXLjjEeLFMiZo9a5GvH8JIg9+j8oGwbI9V+iQjTUoWqoZsR/XI vJta7t4gLCAuaMrpbrzz3xp8TlyODTUfc2iu4M0mYZjBNSbcU6yTVIJZeDVH8DhDQzSB FGbaDWt4JxKpJh6Rv+tRPgP0nlMZQInbCD7w33v0+8viW5Qj6iZYQnP3Dvqv3N4NJH4r wvEw== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com X-Gm-Message-State: APjAAAUuUgyLd55CP0pncw92O5rCGdVt3+9jdiSJzP9G1RouCOneDU02 dRdLvIDjB7CRNK+myUWdBn/5TevLMmXKHAS3PXPDJRgGGYCiPzvv3hGmmuh5lWkavTdRDpgbJzo P+kOL7p4OMBVeU8tMd4H917RNckj/K+R03dq7sGOP3oYGNgZC7d/FnvkL7xEIKR7SKQ== X-Received: by 2002:ae9:df41:: with SMTP id t62mr18588024qkf.150.1553524824359; Mon, 25 Mar 2019 07:40:24 -0700 (PDT) X-Google-Smtp-Source: APXvYqwsHbmjhw+8w4oeQfNeqRp0OA9YeOtyILg8DCdvm10zPywiGBTb/fWWFaP0qp+wdLt+TbfC X-Received: by 2002:ae9:df41:: with SMTP id t62mr18587972qkf.150.1553524823657; Mon, 25 Mar 2019 07:40:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1553524823; cv=none; d=google.com; s=arc-20160816; b=Ry72aUJD6DI9cFrz88TVBL5UfrNK2hX84NwKl/W8bLw6zcwY9Cag/S05Vgd/2hC8wP B1aLFJd7UqeLjY3uHc3NWDHzMWGMCU25SmQPK8xQUG4W8fWqkkRKYmZSbfypKdzTLkuV SgFcQPJWdtdffkn1W9qRBloGA2ydbtc0RP0NEDC0j92mB0FmT6PvWxfD/qtiMJ6wtNct qYIV0bQpMCvCWqh7fFiVeCFDfCZGbpW/+ItXMFdDX3DUghkpKGYjX4pN8vjFY9OVuozV JuycLp08T8K2LrMYGaLP+Ch/0D7LKwDDrXmi6gQ7FtNpMbHUgFAKOukVcrVyLySG/rmk 9oHg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from; bh=3Ti6IjAWZ9nxGji48U2finShhN9k8Ywg/5Fu3MYAh2E=; b=uMMv+DRon/mShq7Rv+DH3SFiGwcICNTKeIxP7KmUQnuoH8PQqq4tqkwv66jxVgXiTp 0Slsx/Ras3I6Kqu5GtwPuxNUucx/qPDSXAb4EYh4vnSerubB8cLV95EPbJDKjTyaZEow X6jB5AMY/cXqwdmuWefV82bHMP9VoJh4fIvenPOKAqrBV4snAeHIm5taWk9rtuRmByKG vg2GeLuo1YH2KXxObkF1zrOBWa8dWdNoALZwQshkzK0qHjniNMtF31bYcwHCniL5yIYo cv6KgiRQtqHqOXl7EDtiIHzDcrCUSNLwGwWuqIzHZlVtnx5j4tVH266SYdFZilYQLAfm CxbQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from mx1.redhat.com (mx1.redhat.com. 
[209.132.183.28]) by mx.google.com with ESMTPS id o13si5020813qtm.11.2019.03.25.07.40.23 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 25 Mar 2019 07:40:23 -0700 (PDT) Received-SPF: pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) client-ip=209.132.183.28; Authentication-Results: mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id CEC44821E5; Mon, 25 Mar 2019 14:40:22 +0000 (UTC) Received: from localhost.localdomain.com (unknown [10.20.6.236]) by smtp.corp.redhat.com (Postfix) with ESMTP id 2ABDC1001DC8; Mon, 25 Mar 2019 14:40:22 +0000 (UTC) From: jglisse@redhat.com To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Andrew Morton , John Hubbard , Dan Williams Subject: [PATCH v2 10/11] mm/hmm: add helpers for driver to safely take the mmap_sem v2 Date: Mon, 25 Mar 2019 10:40:10 -0400 Message-Id: <20190325144011.10560-11-jglisse@redhat.com> In-Reply-To: <20190325144011.10560-1-jglisse@redhat.com> References: <20190325144011.10560-1-jglisse@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Mon, 25 Mar 2019 14:40:22 +0000 (UTC) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP From: Jérôme Glisse The device driver context which holds reference to mirror and thus to core hmm struct might outlive the mm against which it was created. To avoid every driver to check for that case provide an helper that check if mm is still alive and take the mmap_sem in read mode if so. If the mm have been destroy (mmu_notifier release call back did happen) then we return -EINVAL so that calling code knows that it is trying to do something against a mm that is no longer valid. Changes since v1: - removed bunch of useless check (if API is use with bogus argument better to fail loudly so user fix their code) Signed-off-by: Jérôme Glisse Reviewed-by: Ralph Campbell Cc: Andrew Morton Cc: John Hubbard Cc: Dan Williams --- include/linux/hmm.h | 50 ++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 47 insertions(+), 3 deletions(-) diff --git a/include/linux/hmm.h b/include/linux/hmm.h index f3b919b04eda..5f9deaeb9d77 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -438,6 +438,50 @@ struct hmm_mirror { int hmm_mirror_register(struct hmm_mirror *mirror, struct mm_struct *mm); void hmm_mirror_unregister(struct hmm_mirror *mirror); +/* + * hmm_mirror_mm_down_read() - lock the mmap_sem in read mode + * @mirror: the HMM mm mirror for which we want to lock the mmap_sem + * Returns: -EINVAL if the mm is dead, 0 otherwise (lock taken). + * + * The device driver context which holds reference to mirror and thus to core + * hmm struct might outlive the mm against which it was created. To avoid every + * driver to check for that case provide an helper that check if mm is still + * alive and take the mmap_sem in read mode if so. 
If the mm have been destroy + * (mmu_notifier release call back did happen) then we return -EINVAL so that + * calling code knows that it is trying to do something against a mm that is + * no longer valid. + */ +static inline int hmm_mirror_mm_down_read(struct hmm_mirror *mirror) +{ + struct mm_struct *mm; + + /* Sanity check ... */ + if (!mirror || !mirror->hmm) + return -EINVAL; + /* + * Before trying to take the mmap_sem make sure the mm is still + * alive as device driver context might outlive the mm lifetime. + * + * FIXME: should we also check for mm that outlive its owning + * task ? + */ + mm = READ_ONCE(mirror->hmm->mm); + if (mirror->hmm->dead || !mm) + return -EINVAL; + + down_read(&mm->mmap_sem); + return 0; +} + +/* + * hmm_mirror_mm_up_read() - unlock the mmap_sem from read mode + * @mirror: the HMM mm mirror for which we want to lock the mmap_sem + */ +static inline void hmm_mirror_mm_up_read(struct hmm_mirror *mirror) +{ + up_read(&mirror->hmm->mm->mmap_sem); +} + /* * To snapshot the CPU page table you first have to call hmm_range_register() @@ -463,7 +507,7 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror); * if (ret) * return ret; * - * down_read(mm->mmap_sem); + * hmm_mirror_mm_down_read(mirror); * again: * * if (!hmm_range_wait_until_valid(&range, TIMEOUT)) { @@ -476,13 +520,13 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror); * * ret = hmm_range_snapshot(&range); or hmm_range_fault(&range); * if (ret == -EAGAIN) { - * down_read(mm->mmap_sem); + * hmm_mirror_mm_down_read(mirror); * goto again; * } else if (ret == -EBUSY) { * goto again; * } * - * up_read(&mm->mmap_sem); + * hmm_mirror_mm_up_read(mirror); * if (ret) { * hmm_range_unregister(range); * return ret; From patchwork Mon Mar 25 14:40:11 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jerome Glisse X-Patchwork-Id: 10869421 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5371E1390 for ; Mon, 25 Mar 2019 14:41:02 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3E0522945C for ; Mon, 25 Mar 2019 14:41:02 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 3C97229477; Mon, 25 Mar 2019 14:41:02 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6A56F2947A for ; Mon, 25 Mar 2019 14:41:01 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D29B66B026C; Mon, 25 Mar 2019 10:40:25 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id CB0E86B026D; Mon, 25 Mar 2019 10:40:25 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B2FC76B026E; Mon, 25 Mar 2019 10:40:25 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-qt1-f197.google.com (mail-qt1-f197.google.com [209.85.160.197]) by kanga.kvack.org (Postfix) with ESMTP id 876686B026C for ; Mon, 25 Mar 2019 10:40:25 -0400 (EDT) Received: 
by mail-qt1-f197.google.com with SMTP id q12so10387375qtr.3 for ; Mon, 25 Mar 2019 07:40:25 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=MqKfHNfNBWpbmcGqXPdVGblL10NQAvDWexvXCgPT6fQ=; b=B9zpUdMJjsEWKSYWGcTp0hFyX3vd0C1dkrzjJOubx5bVmiKQt9eXaYJGL5iiurw14k 959aeqLmJSR53T4B6iNV0y11PguizFxgI6A4hvvcNbIpCtwlBwJM8hzoypNLnUkZgpTn p94W5Em0UH5TP5YdYiaOGEXhwLKqK4QhZY4X98/390zVcseH6Dx+hzf2Mw6dUu24Tso1 jSqMNUj/36BPZUKsMlgDLPsQwGDyawj/Ns2rcPwt8eZMCP2yJ+Sgi3aN3p2Zyfn8RA8c 0Z/NvY3byKySe3EtOOma9NLgCPAODh0S2rFH85+hGuT45r9dOQhjz5U31thFwqIfCmR+ PFNA== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com X-Gm-Message-State: APjAAAWLJeIV9Ljtuwh8Rm+siQ1Co9cObZyypLJPGLqcvkeG0yJ0N8Oz EE0Meec3KDp1EpRWh7lK68974FycE/KnLWvE/JJsGBlPO5FnvbQdhoqXIsX9OFyo1WzAiO4DR4c xtNUmridut975cZ5GHSX3mMHl+D0+7QAalP9LYKUpWtFFyHT1fqcJUcZPYNWeJKfnpA== X-Received: by 2002:a05:620a:101b:: with SMTP id z27mr19164017qkj.160.1553524825309; Mon, 25 Mar 2019 07:40:25 -0700 (PDT) X-Google-Smtp-Source: APXvYqzqq/OVtCGsbaF2MyCQPfejjs57HSqvfAt6orqMpwBGoqfUZ5lmGt/RfpHJ3afKM3nGC5VV X-Received: by 2002:a05:620a:101b:: with SMTP id z27mr19163964qkj.160.1553524824574; Mon, 25 Mar 2019 07:40:24 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1553524824; cv=none; d=google.com; s=arc-20160816; b=jn3iYQDQGR1Z0xGJr3tyDD6Aqq3aX5iA/RNKFE4WU3cS+ERPUA5aNudxPc/wjgpt6d jIkWo2PzwfhO7+tn6N1cmxPmJ8pn3/yqAy6dh93sWv1TxztLgUC5twD+blczj/rV5qXu ZeC8aQPCcxiinNhXVJi46GipkICAWReduiEoc92J1dLIkOtdp8y2GWUw7YkKqg+V0EGj 88gIcWr/2jI/LD1+N5bKI6jmOLkxK7LcRXPURqNAWYaHUqrrtT9cSYPnz3VcV+Nnk6Oo ZkA94HAB2s25yvso/agdLxI+iaU1/wML3w16stz3qJ0zViTf0IThZFZd6+cFLqnu70F/ Ar8g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from; bh=MqKfHNfNBWpbmcGqXPdVGblL10NQAvDWexvXCgPT6fQ=; b=eQ+WlYjrO1nJuLjZOvuGCtSqcTLEN9iQVYnpFXuuk8OhQsNqsUUHHw/OUoK0p6DfmD P9RuPWwNv2mN5nlVRBAVMZ8U7F8tKSeU/d18QmX5ZHwiaqrmPxrMuMyeRt2kLYz7AYFz b/Hf1qGXFg5vyRT26sIfBl4IhK/Q8ug3ZTvhdPSUm8aYaulYb85jGcccOwDZ/Lsrs8mr io4H3E7H86H7lhZ8K6k1cA6A8yT3T2nMS/iKBK0UxD8xL35cm5p8Zqd0uNDyFfs+i+X1 6qI51qSzj+P8/oRg2YdEfNYejHzlz5To0DbNOa/nmbwVkkb29E+dxmleiSqUmClaxZQ4 ca3A== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from mx1.redhat.com (mx1.redhat.com. 
[209.132.183.28]) by mx.google.com with ESMTPS id 25si2754194qtq.283.2019.03.25.07.40.24 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 25 Mar 2019 07:40:24 -0700 (PDT) Received-SPF: pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) client-ip=209.132.183.28; Authentication-Results: mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id B07DB2026E; Mon, 25 Mar 2019 14:40:23 +0000 (UTC) Received: from localhost.localdomain.com (unknown [10.20.6.236]) by smtp.corp.redhat.com (Postfix) with ESMTP id EBEBD1001DC8; Mon, 25 Mar 2019 14:40:22 +0000 (UTC) From: jglisse@redhat.com To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Andrew Morton , Ralph Campbell , John Hubbard , Dan Williams Subject: [PATCH v2 11/11] mm/hmm: add an helper function that fault pages and map them to a device v2 Date: Mon, 25 Mar 2019 10:40:11 -0400 Message-Id: <20190325144011.10560-12-jglisse@redhat.com> In-Reply-To: <20190325144011.10560-1-jglisse@redhat.com> References: <20190325144011.10560-1-jglisse@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Mon, 25 Mar 2019 14:40:23 +0000 (UTC) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP From: Jérôme Glisse This is a all in one helper that fault pages in a range and map them to a device so that every single device driver do not have to re-implement this common pattern. This is taken from ODP RDMA in preparation of ODP RDMA convertion. It will be use by nouveau and other drivers. 
Changes since v1: - improved commit message Signed-off-by: Jérôme Glisse Cc: Andrew Morton Cc: Ralph Campbell Cc: John Hubbard Cc: Dan Williams --- include/linux/hmm.h | 9 +++ mm/hmm.c | 152 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 161 insertions(+) diff --git a/include/linux/hmm.h b/include/linux/hmm.h index 5f9deaeb9d77..7aadf18b29cb 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -568,6 +568,15 @@ int hmm_range_register(struct hmm_range *range, void hmm_range_unregister(struct hmm_range *range); long hmm_range_snapshot(struct hmm_range *range); long hmm_range_fault(struct hmm_range *range, bool block); +long hmm_range_dma_map(struct hmm_range *range, + struct device *device, + dma_addr_t *daddrs, + bool block); +long hmm_range_dma_unmap(struct hmm_range *range, + struct vm_area_struct *vma, + struct device *device, + dma_addr_t *daddrs, + bool dirty); /* * HMM_RANGE_DEFAULT_TIMEOUT - default timeout (ms) when waiting for a range diff --git a/mm/hmm.c b/mm/hmm.c index ce33151c6832..fd143251b157 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -30,6 +30,7 @@ #include #include #include +#include #include #include @@ -1163,6 +1164,157 @@ long hmm_range_fault(struct hmm_range *range, bool block) return (hmm_vma_walk.last - range->start) >> PAGE_SHIFT; } EXPORT_SYMBOL(hmm_range_fault); + +/* + * hmm_range_dma_map() - hmm_range_fault() and dma map page all in one. + * @range: range being faulted + * @device: device against to dma map page to + * @daddrs: dma address of mapped pages + * @block: allow blocking on fault (if true it sleeps and do not drop mmap_sem) + * Returns: number of pages mapped on success, -EAGAIN if mmap_sem have been + * drop and you need to try again, some other error value otherwise + * + * Note same usage pattern as hmm_range_fault(). + */ +long hmm_range_dma_map(struct hmm_range *range, + struct device *device, + dma_addr_t *daddrs, + bool block) +{ + unsigned long i, npages, mapped; + long ret; + + ret = hmm_range_fault(range, block); + if (ret <= 0) + return ret ? ret : -EBUSY; + + npages = (range->end - range->start) >> PAGE_SHIFT; + for (i = 0, mapped = 0; i < npages; ++i) { + enum dma_data_direction dir = DMA_FROM_DEVICE; + struct page *page; + + /* + * FIXME need to update DMA API to provide invalid DMA address + * value instead of a function to test dma address value. This + * would remove lot of dumb code duplicated accross many arch. + * + * For now setting it to 0 here is good enough as the pfns[] + * value is what is use to check what is valid and what isn't. + */ + daddrs[i] = 0; + + page = hmm_pfn_to_page(range, range->pfns[i]); + if (page == NULL) + continue; + + /* Check if range is being invalidated */ + if (!range->valid) { + ret = -EBUSY; + goto unmap; + } + + /* If it is read and write than map bi-directional. */ + if (range->pfns[i] & range->values[HMM_PFN_WRITE]) + dir = DMA_BIDIRECTIONAL; + + daddrs[i] = dma_map_page(device, page, 0, PAGE_SIZE, dir); + if (dma_mapping_error(device, daddrs[i])) { + ret = -EFAULT; + goto unmap; + } + + mapped++; + } + + return mapped; + +unmap: + for (npages = i, i = 0; (i < npages) && mapped; ++i) { + enum dma_data_direction dir = DMA_FROM_DEVICE; + struct page *page; + + page = hmm_pfn_to_page(range, range->pfns[i]); + if (page == NULL) + continue; + + if (dma_mapping_error(device, daddrs[i])) + continue; + + /* If it is read and write than map bi-directional. 
*/ + if (range->pfns[i] & range->values[HMM_PFN_WRITE]) + dir = DMA_BIDIRECTIONAL; + + dma_unmap_page(device, daddrs[i], PAGE_SIZE, dir); + mapped--; + } + + return ret; +} +EXPORT_SYMBOL(hmm_range_dma_map); + +/* + * hmm_range_dma_unmap() - unmap range of that was map with hmm_range_dma_map() + * @range: range being unmapped + * @vma: the vma against which the range (optional) + * @device: device against which dma map was done + * @daddrs: dma address of mapped pages + * @dirty: dirty page if it had the write flag set + * Returns: number of page unmapped on success, -EINVAL otherwise + * + * Note that caller MUST abide by mmu notifier or use HMM mirror and abide + * to the sync_cpu_device_pagetables() callback so that it is safe here to + * call set_page_dirty(). Caller must also take appropriate locks to avoid + * concurrent mmu notifier or sync_cpu_device_pagetables() to make progress. + */ +long hmm_range_dma_unmap(struct hmm_range *range, + struct vm_area_struct *vma, + struct device *device, + dma_addr_t *daddrs, + bool dirty) +{ + unsigned long i, npages; + long cpages = 0; + + /* Sanity check. */ + if (range->end <= range->start) + return -EINVAL; + if (!daddrs) + return -EINVAL; + if (!range->pfns) + return -EINVAL; + + npages = (range->end - range->start) >> PAGE_SHIFT; + for (i = 0; i < npages; ++i) { + enum dma_data_direction dir = DMA_FROM_DEVICE; + struct page *page; + + page = hmm_pfn_to_page(range, range->pfns[i]); + if (page == NULL) + continue; + + /* If it is read and write than map bi-directional. */ + if (range->pfns[i] & range->values[HMM_PFN_WRITE]) { + dir = DMA_BIDIRECTIONAL; + + /* + * See comments in function description on why it is + * safe here to call set_page_dirty() + */ + if (dirty) + set_page_dirty(page); + } + + /* Unmap and clear pfns/dma address */ + dma_unmap_page(device, daddrs[i], PAGE_SIZE, dir); + range->pfns[i] = range->values[HMM_PFN_NONE]; + /* FIXME see comments in hmm_vma_dma_map() */ + daddrs[i] = 0; + cpages++; + } + + return cpages; +} +EXPORT_SYMBOL(hmm_range_dma_unmap); #endif /* IS_ENABLED(CONFIG_HMM_MIRROR) */
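As a companion illustration for the page_shift argument added at the start of this series: a driver that wants to snapshot a hugetlbfs mapping at huge page granularity can register the range with the vma's huge page shift, so that range->pfns[] holds one entry per huge page rather than one per PAGE_SIZE page. The sketch below is hypothetical (no retry loop, pfns[] allocation left to the caller); only the hmm_*() and hugetlb helpers are interfaces from the kernel and this series:

static long example_snapshot_hugetlb(struct hmm_mirror *mirror,
				     struct mm_struct *mm,
				     struct vm_area_struct *vma,
				     struct hmm_range *range)
{
	unsigned page_shift = huge_page_shift(hstate_vma(vma));
	long ret;

	/* One range->pfns[] slot per (1UL << page_shift) bytes. */
	ret = hmm_range_register(range, mm, vma->vm_start, vma->vm_end,
				 page_shift);
	if (ret)
		return ret;

	ret = hmm_mirror_mm_down_read(mirror);
	if (ret)
		goto out_unregister;

	/* A real driver retries on -EAGAIN (range invalidated); omitted here. */
	ret = hmm_range_snapshot(range);

	hmm_mirror_mm_up_read(mirror);
out_unregister:
	hmm_range_unregister(range);
	return ret;
}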