From patchwork Mon Jan 27 09:35:23 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexandre Ghiti
X-Patchwork-Id: 13951118
From: Alexandre Ghiti
To: Catalin Marinas, Will Deacon, Ryan Roberts, Mark Rutland,
 Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrew Morton,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-riscv@lists.infradead.org, linux-mm@kvack.org
Cc: Alexandre Ghiti
Subject: [PATCH v4 2/9] riscv: Restore the pfn in a NAPOT pte when manipulated by core mm code
Date: Mon, 27 Jan 2025 10:35:23 +0100
Message-Id: <20250127093530.19548-3-alexghiti@rivosinc.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20250127093530.19548-1-alexghiti@rivosinc.com>
References: <20250127093530.19548-1-alexghiti@rivosinc.com>
Sender: owner-linux-mm@kvack.org
Precedence:
 bulk
X-Loop: owner-majordomo@kvack.org

The core mm code expects to be able to extract the pfn from a pte. NAPOT
mappings work differently since their ptes actually point to the first pfn
of the mapping, the other bits being used to encode the size of the
mapping.

So modify ptep_get() so that it returns a pte value that contains the
*real* pfn (which is then different from what the HW expects) and right
before storing the ptes to the page table, reset the pfn LSBs to the
size of the mapping.

And make sure that all NAPOT mappings are set using set_ptes().

Signed-off-by: Alexandre Ghiti
---
 arch/riscv/include/asm/pgtable-64.h | 11 ++++
 arch/riscv/include/asm/pgtable.h    | 91 ++++++++++++++++++++++++++---
 arch/riscv/mm/hugetlbpage.c         |  9 +--
 3 files changed, 96 insertions(+), 15 deletions(-)

diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
index 0897dd99ab8d..cddbe426f618 100644
--- a/arch/riscv/include/asm/pgtable-64.h
+++ b/arch/riscv/include/asm/pgtable-64.h
@@ -104,6 +104,17 @@ enum napot_cont_order {
 #define napot_cont_mask(order)	(~(napot_cont_size(order) - 1UL))
 #define napot_pte_num(order)	BIT(order)
 
+static inline bool is_napot_order(unsigned int order)
+{
+	unsigned int napot_order;
+
+	for_each_napot_order(napot_order)
+		if (order == napot_order)
+			return true;
+
+	return false;
+}
+
 #ifdef CONFIG_RISCV_ISA_SVNAPOT
 #define HUGE_MAX_HSTATE		(2 + (NAPOT_ORDER_MAX - NAPOT_CONT_ORDER_BASE))
 #else
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 050fdc49b5ad..82b264423b25 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -296,6 +296,8 @@ static inline unsigned long pte_napot(pte_t pte)
 	return pte_val(pte) & _PAGE_NAPOT;
 }
 
+#define pte_valid_napot(pte)	(pte_present(pte) && pte_napot(pte))
+
 static inline pte_t pte_mknapot(pte_t pte, unsigned int order)
 {
 	int pos = order - 1 + _PAGE_PFN_SHIFT;
@@ -305,6 +307,12 @@ static inline pte_t pte_mknapot(pte_t pte, unsigned int order)
 	return __pte((pte_val(pte) & napot_mask) | napot_bit | _PAGE_NAPOT);
 }
 
+/* pte at entry must *not* encode the mapping size in the pfn LSBs. */
+static inline pte_t pte_clear_napot(pte_t pte)
+{
+	return __pte(pte_val(pte) & ~_PAGE_NAPOT);
+}
+
 #else
 
 static __always_inline bool has_svnapot(void) { return false; }
@@ -314,17 +322,14 @@ static inline unsigned long pte_napot(pte_t pte)
 	return 0;
 }
 
+#define pte_valid_napot(pte)	false
+
 #endif /* CONFIG_RISCV_ISA_SVNAPOT */
 
 /* Yields the page frame number (PFN) of a page table entry */
 static inline unsigned long pte_pfn(pte_t pte)
 {
-	unsigned long res = __page_val_to_pfn(pte_val(pte));
-
-	if (has_svnapot() && pte_napot(pte))
-		res = res & (res - 1UL);
-
-	return res;
+	return __page_val_to_pfn(pte_val(pte));
 }
 
 #define pte_page(x)     pfn_to_page(pte_pfn(x))
@@ -559,8 +564,13 @@ static inline void __set_pte_at(struct mm_struct *mm, pte_t *ptep, pte_t pteval)
 
 #define PFN_PTE_SHIFT		_PAGE_PFN_SHIFT
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pteval, unsigned int nr)
+static inline pte_t __ptep_get(pte_t *ptep)
+{
+	return READ_ONCE(*ptep);
+}
+
+static inline void __set_ptes(struct mm_struct *mm, unsigned long addr,
+			      pte_t *ptep, pte_t pteval, unsigned int nr)
 {
 	page_table_check_ptes_set(mm, ptep, pteval, nr);
 
@@ -569,10 +579,13 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 		if (--nr == 0)
 			break;
 		ptep++;
+
+		if (unlikely(pte_valid_napot(pteval)))
+			continue;
+
 		pte_val(pteval) += 1 << _PAGE_PFN_SHIFT;
 	}
 }
-#define set_ptes set_ptes
 
 static inline void pte_clear(struct mm_struct *mm,
 	unsigned long addr, pte_t *ptep)
@@ -627,6 +640,66 @@ static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
 	return ptep_test_and_clear_young(vma, address, ptep);
 }
 
+#ifdef CONFIG_RISCV_ISA_SVNAPOT
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+			    pte_t *ptep, pte_t pteval, unsigned int nr)
+{
+	if (unlikely(pte_valid_napot(pteval))) {
+		unsigned int order = ilog2(nr);
+
+		if (!is_napot_order(order)) {
+			/*
+			 * Something's weird, we are given a NAPOT pte but the
+			 * size of the mapping is not a known NAPOT mapping
+			 * size, so clear the NAPOT bit and map this without
+			 * NAPOT support: core mm only manipulates pte with the
+			 * real pfn so we know the pte is valid without the N
+			 * bit.
+			 */
+			pr_err("Incorrect NAPOT mapping, resetting.\n");
+			pteval = pte_clear_napot(pteval);
+		} else {
+			/*
+			 * NAPOT ptes that arrive here only have the N bit set
+			 * and their pfn does not contain the mapping size, so
+			 * set that here.
+			 */
+			pteval = pte_mknapot(pteval, order);
+		}
+	}
+
+	__set_ptes(mm, addr, ptep, pteval, nr);
+}
+#define set_ptes set_ptes
+
+static inline pte_t ptep_get(pte_t *ptep)
+{
+	pte_t pte = __ptep_get(ptep);
+
+	/*
+	 * The pte we load has the N bit set and the size of the mapping in
+	 * the pfn LSBs: keep the N bit and replace the mapping size with
+	 * the *real* pfn since the core mm code expects to find it there.
+	 * The mapping size will be reset just before being written to the
+	 * page table in set_ptes().
+	 */
+	if (unlikely(pte_valid_napot(pte))) {
+		unsigned int order = napot_cont_order(pte);
+		int pos = order - 1 + _PAGE_PFN_SHIFT;
+		unsigned long napot_mask = ~GENMASK(pos, _PAGE_PFN_SHIFT);
+		pte_t *orig_ptep = PTR_ALIGN_DOWN(ptep, sizeof(*ptep) * napot_pte_num(order));
+
+		pte = __pte((pte_val(pte) & napot_mask) + ((ptep - orig_ptep) << _PAGE_PFN_SHIFT));
+	}
+
+	return pte;
+}
+#define ptep_get ptep_get
+#else
+#define set_ptes	__set_ptes
+#define ptep_get	__ptep_get
+#endif /* CONFIG_RISCV_ISA_SVNAPOT */
+
 #define pgprot_nx pgprot_nx
 static inline pgprot_t pgprot_nx(pgprot_t _prot)
 {
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index 6b09cd1ef41c..59ed26ce6857 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -256,8 +256,7 @@ void set_huge_pte_at(struct mm_struct *mm,
 
 	clear_flush(mm, addr, ptep, pgsize, pte_num);
 
-	for (i = 0; i < pte_num; i++, ptep++, addr += pgsize)
-		set_pte_at(mm, addr, ptep, pte);
+	set_ptes(mm, addr, ptep, pte, pte_num);
 }
 
 int huge_ptep_set_access_flags(struct vm_area_struct *vma,
@@ -284,8 +283,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 	if (pte_young(orig_pte))
 		pte = pte_mkyoung(pte);
 
-	for (i = 0; i < pte_num; i++, addr += PAGE_SIZE, ptep++)
-		set_pte_at(mm, addr, ptep, pte);
+	set_ptes(mm, addr, ptep, pte, pte_num);
 
 	return true;
 }
@@ -325,8 +323,7 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
 
 	orig_pte = pte_wrprotect(orig_pte);
 
-	for (i = 0; i < pte_num; i++, addr += PAGE_SIZE, ptep++)
-		set_pte_at(mm, addr, ptep, orig_pte);
+	set_ptes(mm, addr, ptep, orig_pte, pte_num);
 }
 
 pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,