From patchwork Tue Mar 11 12:25:07 2025
X-Patchwork-Submitter: Xu Lu
X-Patchwork-Id: 14011782
From: Xu Lu
To: akpm@linux-foundation.org, jhubbard@nvidia.com,
	kirill.shutemov@linux.intel.com, tjeznach@rivosinc.com,
	joro@8bytes.org, will@kernel.org, robin.murphy@arm.com
Cc: lihangjing@bytedance.com, xieyongji@bytedance.com,
	linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Xu Lu
Subject: [PATCH v2 1/4] mm/gup: Add huge pte handling logic in follow_page_pte()
Date: Tue, 11 Mar 2025 20:25:07 +0800
Message-Id: <20250311122510.72934-2-luxu.kernel@bytedance.com>
In-Reply-To: <20250311122510.72934-1-luxu.kernel@bytedance.com>
References: <20250311122510.72934-1-luxu.kernel@bytedance.com>

A page mapped at pte level can also be a huge page when ARM CONT_PTE or
RISC-V Svnapot is applied. The lack of huge pte handling logic in
follow_page_pte() can cause both performance and correctness issues.

For example, on RISC-V all ptes covering the same 64K huge page hold the
same value, so follow_page_pte() resolves each of them to the same page
via pte_pfn(). __get_user_pages() then returns an array of pages that all
share one pfn. Mapping these pages causes memory confusion, since every
4K chunk of the range ends up backed by the same physical page. The error
can be triggered by the following code:

	void *addr = mmap(NULL, 0x10000, PROT_READ | PROT_WRITE,
			  MAP_ANONYMOUS | MAP_PRIVATE | MAP_HUGETLB |
			  MAP_HUGE_64KB, -1, 0);

	struct vfio_iommu_type1_dma_map dma_map = {
		.argsz = sizeof(dma_map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = (uint64_t)addr,
		.size = 0x10000,
	};

	ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);

This commit supplies huge pte handling logic in follow_page_pte() to
avoid such problems.
Signed-off-by: Xu Lu
---
 arch/riscv/include/asm/pgtable.h |  6 ++++++
 include/linux/pgtable.h          |  8 ++++++++
 mm/gup.c                         | 17 +++++++++++------
 3 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 050fdc49b5ad7..40ae5979dd82c 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -800,6 +800,12 @@ static inline bool pud_user_accessible_page(pud_t pud)
 #endif
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define pte_trans_huge pte_trans_huge
+static inline int pte_trans_huge(pte_t pte)
+{
+	return pte_huge(pte) && pte_napot(pte);
+}
+
 static inline int pmd_trans_huge(pmd_t pmd)
 {
 	return pmd_leaf(pmd);
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 94d267d02372e..3f57ee6dcf017 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1584,6 +1584,14 @@ static inline unsigned long my_zero_pfn(unsigned long addr)
 
 #ifdef CONFIG_MMU
 
+#if (defined(CONFIG_TRANSPARENT_HUGEPAGE) && !defined(pte_trans_huge)) || \
+    (!defined(CONFIG_TRANSPARENT_HUGEPAGE))
+static inline int pte_trans_huge(pte_t pte)
+{
+	return 0;
+}
+#endif
+
 #ifndef CONFIG_TRANSPARENT_HUGEPAGE
 static inline int pmd_trans_huge(pmd_t pmd)
 {
diff --git a/mm/gup.c b/mm/gup.c
index 3883b307780ea..67981ee28df86 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -838,7 +838,7 @@ static inline bool can_follow_write_pte(pte_t pte, struct page *page,
 
 static struct page *follow_page_pte(struct vm_area_struct *vma,
 		unsigned long address, pmd_t *pmd, unsigned int flags,
-		struct dev_pagemap **pgmap)
+		struct follow_page_context *ctx)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct folio *folio;
@@ -879,8 +879,8 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 		 * case since they are only valid while holding the pgmap
 		 * reference.
 		 */
-		*pgmap = get_dev_pagemap(pte_pfn(pte), *pgmap);
-		if (*pgmap)
+		ctx->pgmap = get_dev_pagemap(pte_pfn(pte), ctx->pgmap);
+		if (ctx->pgmap)
 			page = pte_page(pte);
 		else
 			goto no_page;
@@ -940,6 +940,11 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 		 */
 		folio_mark_accessed(folio);
 	}
+	if (is_vm_hugetlb_page(vma) || pte_trans_huge(pte)) {
+		ctx->page_mask = (1 << folio_order(folio)) - 1;
+		page = folio_page(folio, 0) +
+		       ((address & (folio_size(folio) - 1)) >> PAGE_SHIFT);
+	}
 out:
 	pte_unmap_unlock(ptep, ptl);
 	return page;
@@ -975,7 +980,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 		return no_page_table(vma, flags, address);
 	}
 	if (likely(!pmd_leaf(pmdval)))
-		return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
+		return follow_page_pte(vma, address, pmd, flags, ctx);
 
 	if (pmd_protnone(pmdval) && !gup_can_follow_protnone(vma, flags))
 		return no_page_table(vma, flags, address);
@@ -988,14 +993,14 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 	}
 	if (unlikely(!pmd_leaf(pmdval))) {
 		spin_unlock(ptl);
-		return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
+		return follow_page_pte(vma, address, pmd, flags, ctx);
 	}
 	if (pmd_trans_huge(pmdval) && (flags & FOLL_SPLIT_PMD)) {
 		spin_unlock(ptl);
 		split_huge_pmd(vma, pmd, address);
 		/* If pmd was left empty, stuff a page table in there quickly */
 		return pte_alloc(mm, pmd) ? ERR_PTR(-ENOMEM) :
-			follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
+			follow_page_pte(vma, address, pmd, flags, ctx);
 	}
 	page = follow_huge_pmd(vma, address, pmd, flags, ctx);
 	spin_unlock(ptl);
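
To see what the new hunk in follow_page_pte() computes: for a 64K Svnapot
mapping, pte_pfn() of every pte in the range points at the folio head, so
the correct subpage has to be recovered from the low bits of the faulting
address. A minimal sketch of that arithmetic, using the folio helpers the
patch relies on (the helper name subpage_in_folio is illustrative, not
part of the patch):

	/*
	 * Sketch: pick the subpage of a large folio that backs 'address'.
	 * Mirrors the computation added to follow_page_pte() above.
	 */
	static struct page *subpage_in_folio(struct folio *folio,
					     unsigned long address)
	{
		/* page index of 'address' within the folio */
		unsigned long idx = (address & (folio_size(folio) - 1)) >> PAGE_SHIFT;

		return folio_page(folio, idx);
	}

For a 64K folio this yields indices 0-15 across the mapping, so each GUP
entry now receives a distinct subpage instead of sixteen copies of the
folio head.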
From patchwork Tue Mar 11 12:25:08 2025
X-Patchwork-Submitter: Xu Lu
X-Patchwork-Id: 14011779
From: Xu Lu
To: akpm@linux-foundation.org, jhubbard@nvidia.com,
	kirill.shutemov@linux.intel.com, tjeznach@rivosinc.com,
	joro@8bytes.org, will@kernel.org, robin.murphy@arm.com
Cc: lihangjing@bytedance.com, xieyongji@bytedance.com,
	linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Xu Lu
Subject: [PATCH v2 2/4] iommu/riscv: Use pte_t to represent page table entry
Date: Tue, 11 Mar 2025 20:25:08 +0800
Message-Id: <20250311122510.72934-3-luxu.kernel@bytedance.com>
In-Reply-To: <20250311122510.72934-1-luxu.kernel@bytedance.com>
References: <20250311122510.72934-1-luxu.kernel@bytedance.com>

The RISC-V IOMMU uses the same pte format and translation process as the
MMU, as specified in the RISC-V Privileged specification. Use pte_t to
represent IOMMU ptes as well, so that the existing pte operation
functions can be reused.
Signed-off-by: Xu Lu
---
 drivers/iommu/riscv/iommu.c | 79 ++++++++++++++++++-------------------
 1 file changed, 39 insertions(+), 40 deletions(-)

diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 8f049d4a0e2cb..3b0c934decd08 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -812,7 +812,7 @@ struct riscv_iommu_domain {
 	bool amo_enabled;
 	int numa_node;
 	unsigned int pgd_mode;
-	unsigned long *pgd_root;
+	pte_t *pgd_root;
 };
 
 #define iommu_domain_to_riscv(iommu_domain) \
@@ -1081,27 +1081,29 @@ static void riscv_iommu_iotlb_sync(struct iommu_domain *iommu_domain,
 
 #define PT_SHIFT (PAGE_SHIFT - ilog2(sizeof(pte_t)))
 
-#define _io_pte_present(pte)	((pte) & (_PAGE_PRESENT | _PAGE_PROT_NONE))
-#define _io_pte_leaf(pte)	((pte) & _PAGE_LEAF)
-#define _io_pte_none(pte)	((pte) == 0)
-#define _io_pte_entry(pn, prot)	((_PAGE_PFN_MASK & ((pn) << _PAGE_PFN_SHIFT)) | (prot))
+#define _io_pte_present(pte)	(pte_val(pte) & (_PAGE_PRESENT | _PAGE_PROT_NONE))
+#define _io_pte_leaf(pte)	(pte_val(pte) & _PAGE_LEAF)
+#define _io_pte_none(pte)	(pte_val(pte) == 0)
+#define _io_pte_entry(pn, prot)	(__pte((_PAGE_PFN_MASK & ((pn) << _PAGE_PFN_SHIFT)) | (prot)))
 
 static void riscv_iommu_pte_free(struct riscv_iommu_domain *domain,
-				 unsigned long pte, struct list_head *freelist)
+				 pte_t pte, struct list_head *freelist)
 {
-	unsigned long *ptr;
+	pte_t *ptr;
 	int i;
 
 	if (!_io_pte_present(pte) || _io_pte_leaf(pte))
 		return;
 
-	ptr = (unsigned long *)pfn_to_virt(__page_val_to_pfn(pte));
+	ptr = (pte_t *)pfn_to_virt(pte_pfn(pte));
 
 	/* Recursively free all sub page table pages */
 	for (i = 0; i < PTRS_PER_PTE; i++) {
-		pte = READ_ONCE(ptr[i]);
-		if (!_io_pte_none(pte) && cmpxchg_relaxed(ptr + i, pte, 0) == pte)
+		pte = ptr[i];
+		if (!_io_pte_none(pte)) {
+			ptr[i] = __pte(0);
 			riscv_iommu_pte_free(domain, pte, freelist);
+		}
 	}
 
 	if (freelist)
@@ -1110,12 +1112,12 @@ static void riscv_iommu_pte_free(struct riscv_iommu_domain *domain,
 		iommu_free_page(ptr);
 }
 
-static unsigned long *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
-					    unsigned long iova, size_t pgsize,
-					    gfp_t gfp)
+static pte_t *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
+				    unsigned long iova, size_t pgsize,
+				    gfp_t gfp)
 {
-	unsigned long *ptr = domain->pgd_root;
-	unsigned long pte, old;
+	pte_t *ptr = domain->pgd_root;
+	pte_t pte, old;
 	int level = domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2;
 	void *addr;
 
@@ -1131,7 +1133,7 @@ static unsigned long *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
 		if (((size_t)1 << shift) == pgsize)
 			return ptr;
 pte_retry:
-		pte = READ_ONCE(*ptr);
+		pte = ptep_get(ptr);
 		/*
 		 * This is very likely incorrect as we should not be adding
 		 * new mapping with smaller granularity on top
@@ -1147,38 +1149,37 @@ static unsigned long *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
 			addr = iommu_alloc_page_node(domain->numa_node, gfp);
 			if (!addr)
 				return NULL;
-			old = pte;
-			pte = _io_pte_entry(virt_to_pfn(addr), _PAGE_TABLE);
-			if (cmpxchg_relaxed(ptr, old, pte) != old) {
-				iommu_free_page(addr);
+			old = ptep_get(ptr);
+			if (!_io_pte_none(old))
 				goto pte_retry;
-			}
+			pte = _io_pte_entry(virt_to_pfn(addr), _PAGE_TABLE);
+			set_pte(ptr, pte);
 		}
-		ptr = (unsigned long *)pfn_to_virt(__page_val_to_pfn(pte));
+		ptr = (pte_t *)pfn_to_virt(pte_pfn(pte));
 	} while (level-- > 0);
 
 	return NULL;
 }
 
-static unsigned long *riscv_iommu_pte_fetch(struct riscv_iommu_domain *domain,
-					    unsigned long iova, size_t *pte_pgsize)
+static pte_t *riscv_iommu_pte_fetch(struct riscv_iommu_domain *domain,
+				    unsigned long iova, size_t *pte_pgsize)
 {
-	unsigned long *ptr = domain->pgd_root;
-	unsigned long pte;
+	pte_t *ptr = domain->pgd_root;
+	pte_t pte;
 	int level = domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2;
 
 	do {
 		const int shift = PAGE_SHIFT + PT_SHIFT * level;
 
 		ptr += ((iova >> shift) & (PTRS_PER_PTE - 1));
-		pte = READ_ONCE(*ptr);
+		pte = ptep_get(ptr);
 		if (_io_pte_present(pte) && _io_pte_leaf(pte)) {
 			*pte_pgsize = (size_t)1 << shift;
 			return ptr;
 		}
 		if (_io_pte_none(pte))
 			return NULL;
-		ptr = (unsigned long *)pfn_to_virt(__page_val_to_pfn(pte));
+		ptr = (pte_t *)pfn_to_virt(pte_pfn(pte));
 	} while (level-- > 0);
 
 	return NULL;
@@ -1191,8 +1192,9 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
 {
 	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
 	size_t size = 0;
-	unsigned long *ptr;
-	unsigned long pte, old, pte_prot;
+	pte_t *ptr;
+	pte_t pte, old;
+	unsigned long pte_prot;
 	int rc = 0;
 	LIST_HEAD(freelist);
 
@@ -1210,10 +1212,9 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
 			break;
 		}
 
-		old = READ_ONCE(*ptr);
+		old = ptep_get(ptr);
 		pte = _io_pte_entry(phys_to_pfn(phys), pte_prot);
-		if (cmpxchg_relaxed(ptr, old, pte) != old)
-			continue;
+		set_pte(ptr, pte);
 
 		riscv_iommu_pte_free(domain, old, &freelist);
 
@@ -1247,7 +1248,7 @@ static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
 {
 	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
 	size_t size = pgcount << __ffs(pgsize);
-	unsigned long *ptr, old;
+	pte_t *ptr;
 	size_t unmapped = 0;
 	size_t pte_size;
 
@@ -1260,9 +1261,7 @@ static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
 		if (iova & (pte_size - 1))
 			return unmapped;
 
-		old = READ_ONCE(*ptr);
-		if (cmpxchg_relaxed(ptr, old, 0) != old)
-			continue;
+		set_pte(ptr, __pte(0));
 
 		iommu_iotlb_gather_add_page(&domain->domain, gather, iova,
 					    pte_size);
@@ -1279,13 +1278,13 @@ static phys_addr_t riscv_iommu_iova_to_phys(struct iommu_domain *iommu_domain,
 {
 	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
 	size_t pte_size;
-	unsigned long *ptr;
+	pte_t *ptr;
 
 	ptr = riscv_iommu_pte_fetch(domain, iova, &pte_size);
-	if (_io_pte_none(*ptr) || !_io_pte_present(*ptr))
+	if (_io_pte_none(ptep_get(ptr)) || !_io_pte_present(ptep_get(ptr)))
 		return 0;
 
-	return pfn_to_phys(__page_val_to_pfn(*ptr)) | (iova & (pte_size - 1));
+	return pfn_to_phys(pte_pfn(ptep_get(ptr))) | (iova & (pte_size - 1));
 }
 
 static void riscv_iommu_free_paging_domain(struct iommu_domain *iommu_domain)
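
The conversion above is one mechanical pattern applied throughout: raw
unsigned long entries read with READ_ONCE() and decoded with
__page_val_to_pfn() become typed pte_t entries read with ptep_get() and
decoded with pte_pfn(). A before/after sketch of the pattern, assuming
ptr points at a non-leaf entry of the IOMMU page table:

	/* before: open-coded entry handling */
	unsigned long old_val = READ_ONCE(*(unsigned long *)ptr);
	void *next_table = pfn_to_virt(__page_val_to_pfn(old_val));

	/* after: reuse the MMU's typed pte accessors */
	pte_t pte = ptep_get(ptr);
	void *next_table2 = pfn_to_virt(pte_pfn(pte));

Note that the patch also drops the cmpxchg_relaxed() update scheme in
favor of plain set_pte(); the locking that makes this safe is introduced
in the next patch.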
From patchwork Tue Mar 11 12:25:09 2025
X-Patchwork-Submitter: Xu Lu
X-Patchwork-Id: 14011781
From: Xu Lu
To: akpm@linux-foundation.org, jhubbard@nvidia.com,
	kirill.shutemov@linux.intel.com, tjeznach@rivosinc.com,
	joro@8bytes.org, will@kernel.org, robin.murphy@arm.com
Cc: lihangjing@bytedance.com, xieyongji@bytedance.com,
	linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Xu Lu
Subject: [PATCH v2 3/4] iommu/riscv: Introduce IOMMU page table lock
Date: Tue, 11 Mar 2025 20:25:09 +0800
Message-Id: <20250311122510.72934-4-luxu.kernel@bytedance.com>
In-Reply-To: <20250311122510.72934-1-luxu.kernel@bytedance.com>
References: <20250311122510.72934-1-luxu.kernel@bytedance.com>

Introduce a page table lock to avoid races when several PTEs have to be
modified together, for example when applying Svnapot. Fine-grained
per-page-table locks are used to minimize lock contention.

Signed-off-by: Xu Lu
---
 drivers/iommu/riscv/iommu.c | 123 +++++++++++++++++++++++++++++++-----
 1 file changed, 107 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 3b0c934decd08..ce4cf6569ffb4 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -808,6 +808,7 @@ struct riscv_iommu_domain {
 	struct iommu_domain domain;
 	struct list_head bonds;
 	spinlock_t lock; /* protect bonds list updates. */
+	spinlock_t page_table_lock; /* protect page table updates. */
 	int pscid;
 	bool amo_enabled;
 	int numa_node;
@@ -1086,8 +1087,80 @@ static void riscv_iommu_iotlb_sync(struct iommu_domain *iommu_domain,
 #define _io_pte_none(pte)	(pte_val(pte) == 0)
 #define _io_pte_entry(pn, prot)	(__pte((_PAGE_PFN_MASK & ((pn) << _PAGE_PFN_SHIFT)) | (prot)))
 
+#define RISCV_IOMMU_PMD_LEVEL	1
+
+static bool riscv_iommu_ptlock_init(struct ptdesc *ptdesc, int level)
+{
+	if (level <= RISCV_IOMMU_PMD_LEVEL)
+		return ptlock_init(ptdesc);
+	return true;
+}
+
+static void riscv_iommu_ptlock_free(struct ptdesc *ptdesc, int level)
+{
+	if (level <= RISCV_IOMMU_PMD_LEVEL)
+		ptlock_free(ptdesc);
+}
+
+static spinlock_t *riscv_iommu_ptlock(struct riscv_iommu_domain *domain,
+				      pte_t *pte, int level)
+{
+	spinlock_t *ptl; /* page table page lock */
+
+#ifdef CONFIG_SPLIT_PTE_PTLOCKS
+	if (level <= RISCV_IOMMU_PMD_LEVEL)
+		ptl = ptlock_ptr(page_ptdesc(virt_to_page(pte)));
+	else
+#endif
+		ptl = &domain->page_table_lock;
+	spin_lock(ptl);
+
+	return ptl;
+}
+
+static void *riscv_iommu_alloc_pagetable_node(int numa_node, gfp_t gfp, int level)
+{
+	struct ptdesc *ptdesc;
+	void *addr;
+
+	addr = iommu_alloc_page_node(numa_node, gfp);
+	if (!addr)
+		return NULL;
+
+	ptdesc = page_ptdesc(virt_to_page(addr));
+	if (!riscv_iommu_ptlock_init(ptdesc, level)) {
+		iommu_free_page(addr);
+		addr = NULL;
+	}
+
+	return addr;
+}
+
+static void riscv_iommu_free_pagetable(void *addr, int level)
+{
+	struct ptdesc *ptdesc = page_ptdesc(virt_to_page(addr));
+
+	riscv_iommu_ptlock_free(ptdesc, level);
+	iommu_free_page(addr);
+}
+
+static int pgsize_to_level(size_t pgsize)
+{
+	int level = RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV57 -
+		    RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2;
+	int shift = PAGE_SHIFT + PT_SHIFT * level;
+
+	while (pgsize < ((size_t)1 << shift)) {
+		shift -= PT_SHIFT;
+		level--;
+	}
+
+	return level;
+}
+
 static void riscv_iommu_pte_free(struct riscv_iommu_domain *domain,
-				 pte_t pte, struct list_head *freelist)
+				 pte_t pte, int level,
+				 struct list_head *freelist)
 {
 	pte_t *ptr;
 	int i;
@@ -1102,10 +1175,11 @@ static void riscv_iommu_pte_free(struct riscv_iommu_domain *domain,
 		pte = ptr[i];
 		if (!_io_pte_none(pte)) {
 			ptr[i] = __pte(0);
-			riscv_iommu_pte_free(domain, pte, freelist);
+			riscv_iommu_pte_free(domain, pte, level - 1, freelist);
 		}
 	}
 
+	riscv_iommu_ptlock_free(page_ptdesc(virt_to_page(ptr)), level);
 	if (freelist)
 		list_add_tail(&virt_to_page(ptr)->lru, freelist);
 	else
@@ -1117,8 +1191,9 @@ static pte_t *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
 				    gfp_t gfp)
 {
 	pte_t *ptr = domain->pgd_root;
-	pte_t pte, old;
+	pte_t pte;
 	int level = domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2;
+	spinlock_t *ptl; /* page table page lock */
 	void *addr;
 
 	do {
@@ -1146,14 +1221,21 @@ static pte_t *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
 		 * page table. This might race with other mappings, retry.
 		 */
 		if (_io_pte_none(pte)) {
-			addr = iommu_alloc_page_node(domain->numa_node, gfp);
+			addr = riscv_iommu_alloc_pagetable_node(domain->numa_node, gfp,
+								level - 1);
 			if (!addr)
 				return NULL;
-			old = ptep_get(ptr);
-			if (!_io_pte_none(old))
+
+			ptl = riscv_iommu_ptlock(domain, ptr, level);
+			pte = ptep_get(ptr);
+			if (!_io_pte_none(pte)) {
+				spin_unlock(ptl);
+				riscv_iommu_free_pagetable(addr, level - 1);
 				goto pte_retry;
+			}
 			pte = _io_pte_entry(virt_to_pfn(addr), _PAGE_TABLE);
 			set_pte(ptr, pte);
+			spin_unlock(ptl);
 		}
 		ptr = (pte_t *)pfn_to_virt(pte_pfn(pte));
 	} while (level-- > 0);
@@ -1193,9 +1275,10 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
 	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
 	size_t size = 0;
 	pte_t *ptr;
-	pte_t pte, old;
+	pte_t pte;
 	unsigned long pte_prot;
-	int rc = 0;
+	int rc = 0, level;
+	spinlock_t *ptl; /* page table page lock */
 	LIST_HEAD(freelist);
 
 	if (!(prot & IOMMU_WRITE))
@@ -1212,11 +1295,12 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
 			break;
 		}
 
-		old = ptep_get(ptr);
+		level = pgsize_to_level(pgsize);
+		ptl = riscv_iommu_ptlock(domain, ptr, level);
+		riscv_iommu_pte_free(domain, ptep_get(ptr), level, &freelist);
 		pte = _io_pte_entry(phys_to_pfn(phys), pte_prot);
 		set_pte(ptr, pte);
-
-		riscv_iommu_pte_free(domain, old, &freelist);
+		spin_unlock(ptl);
 
 		size += pgsize;
 		iova += pgsize;
@@ -1251,6 +1335,7 @@ static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
 	pte_t *ptr;
 	size_t unmapped = 0;
 	size_t pte_size;
+	spinlock_t *ptl; /* page table page lock */
 
 	while (unmapped < size) {
 		ptr = riscv_iommu_pte_fetch(domain, iova, &pte_size);
@@ -1261,7 +1346,9 @@ static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
 		if (iova & (pte_size - 1))
 			return unmapped;
 
+		ptl = riscv_iommu_ptlock(domain, ptr, pgsize_to_level(pte_size));
 		set_pte(ptr, __pte(0));
+		spin_unlock(ptl);
 
 		iommu_iotlb_gather_add_page(&domain->domain, gather, iova,
 					    pte_size);
@@ -1291,13 +1378,14 @@ static void riscv_iommu_free_paging_domain(struct iommu_domain *iommu_domain)
 {
 	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
 	const unsigned long pfn = virt_to_pfn(domain->pgd_root);
+	int level = domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2;
 
 	WARN_ON(!list_empty(&domain->bonds));
 
 	if ((int)domain->pscid > 0)
 		ida_free(&riscv_iommu_pscids, domain->pscid);
 
-	riscv_iommu_pte_free(domain, _io_pte_entry(pfn, _PAGE_TABLE), NULL);
+	riscv_iommu_pte_free(domain, _io_pte_entry(pfn, _PAGE_TABLE), level, NULL);
 	kfree(domain);
 }
 
@@ -1358,7 +1446,7 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
 	struct riscv_iommu_device *iommu;
 	unsigned int pgd_mode;
 	dma_addr_t va_mask;
-	int va_bits;
+	int va_bits, level;
 
 	iommu = dev_to_iommu(dev);
 	if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV57) {
@@ -1381,11 +1469,14 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
 
 	INIT_LIST_HEAD_RCU(&domain->bonds);
 	spin_lock_init(&domain->lock);
+	spin_lock_init(&domain->page_table_lock);
 	domain->numa_node = dev_to_node(iommu->dev);
 	domain->amo_enabled = !!(iommu->caps & RISCV_IOMMU_CAPABILITIES_AMO_HWAD);
 	domain->pgd_mode = pgd_mode;
-	domain->pgd_root = iommu_alloc_page_node(domain->numa_node,
-						 GFP_KERNEL_ACCOUNT);
+	level = domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2;
+	domain->pgd_root = riscv_iommu_alloc_pagetable_node(domain->numa_node,
+							    GFP_KERNEL_ACCOUNT,
+							    level);
 	if (!domain->pgd_root) {
 		kfree(domain);
 		return ERR_PTR(-ENOMEM);
@@ -1394,7 +1485,7 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
 	domain->pscid = ida_alloc_range(&riscv_iommu_pscids, 1,
 					RISCV_IOMMU_MAX_PSCID, GFP_KERNEL);
 	if (domain->pscid < 0) {
-		iommu_free_page(domain->pgd_root);
+		riscv_iommu_free_pagetable(domain->pgd_root, level);
 		kfree(domain);
 		return ERR_PTR(-ENOMEM);
 	}
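
The lock selection rule can be summarized as: tables at or below
RISCV_IOMMU_PMD_LEVEL use the split ptlock embedded in their own struct
ptdesc, while higher-level tables share the domain-wide page_table_lock.
A sketch of the update pattern the rest of the series builds on (new_pte
stands for whatever entry the caller constructed):

	/* Sketch: publish a new entry under the appropriate lock. */
	spinlock_t *ptl = riscv_iommu_ptlock(domain, ptr, level); /* locks */
	pte_t old = ptep_get(ptr);

	if (_io_pte_none(old))
		set_pte(ptr, new_pte);
	spin_unlock(ptl);

Here level counts from 0 at the leaf, which is also where the recurring
pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2 expression comes from:
the root table sits at level 2 for Sv39, 3 for Sv48, and 4 for Sv57.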
From patchwork Tue Mar 11 12:25:10 2025
X-Patchwork-Submitter: Xu Lu
X-Patchwork-Id: 14011780
From: Xu Lu
To: akpm@linux-foundation.org, jhubbard@nvidia.com,
	kirill.shutemov@linux.intel.com, tjeznach@rivosinc.com,
	joro@8bytes.org, will@kernel.org, robin.murphy@arm.com
Cc: lihangjing@bytedance.com, xieyongji@bytedance.com,
	linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Xu Lu
Subject: [PATCH v2 4/4] iommu/riscv: Add support for Svnapot
Date: Tue, 11 Mar 2025 20:25:10 +0800
Message-Id: <20250311122510.72934-5-luxu.kernel@bytedance.com>
In-Reply-To: <20250311122510.72934-1-luxu.kernel@bytedance.com>
References: <20250311122510.72934-1-luxu.kernel@bytedance.com>

Add the Svnapot sizes as supported page sizes and apply Svnapot whenever
possible.
Signed-off-by: Xu Lu
---
 drivers/iommu/riscv/iommu.c | 86 +++++++++++++++++++++++++++++++++----
 1 file changed, 77 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index ce4cf6569ffb4..7cc736abd2a61 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -1158,6 +1158,26 @@ static int pgsize_to_level(size_t pgsize)
 	return level;
 }
 
+static unsigned long napot_size_to_order(unsigned long size)
+{
+	unsigned long order;
+
+	if (!has_svnapot())
+		return 0;
+
+	for_each_napot_order(order) {
+		if (size == napot_cont_size(order))
+			return order;
+	}
+
+	return 0;
+}
+
+static bool is_napot_size(unsigned long size)
+{
+	return napot_size_to_order(size) != 0;
+}
+
 static void riscv_iommu_pte_free(struct riscv_iommu_domain *domain,
 				 pte_t pte, int level,
 				 struct list_head *freelist)
@@ -1205,7 +1225,8 @@ static pte_t *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
 		 * existing mapping with smaller granularity. Up to the caller
 		 * to replace and invalidate.
 		 */
-		if (((size_t)1 << shift) == pgsize)
+		if ((((size_t)1 << shift) == pgsize) ||
+		    (is_napot_size(pgsize) && pgsize_to_level(pgsize) == level))
 			return ptr;
 pte_retry:
 		pte = ptep_get(ptr);
@@ -1256,7 +1277,10 @@ static pte_t *riscv_iommu_pte_fetch(struct riscv_iommu_domain *domain,
 		ptr += ((iova >> shift) & (PTRS_PER_PTE - 1));
 		pte = ptep_get(ptr);
 		if (_io_pte_present(pte) && _io_pte_leaf(pte)) {
-			*pte_pgsize = (size_t)1 << shift;
+			if (pte_napot(pte))
+				*pte_pgsize = napot_cont_size(napot_cont_order(pte));
+			else
+				*pte_pgsize = (size_t)1 << shift;
 			return ptr;
 		}
 		if (_io_pte_none(pte))
@@ -1274,13 +1298,18 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
 {
 	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
 	size_t size = 0;
-	pte_t *ptr;
-	pte_t pte;
-	unsigned long pte_prot;
-	int rc = 0, level;
+	pte_t *ptr, old, pte;
+	unsigned long pte_prot, order = 0;
+	int rc = 0, level, i;
 	spinlock_t *ptl; /* page table page lock */
 	LIST_HEAD(freelist);
 
+	if (iova & (pgsize - 1))
+		return -EINVAL;
+
+	if (is_napot_size(pgsize))
+		order = napot_size_to_order(pgsize);
+
 	if (!(prot & IOMMU_WRITE))
 		pte_prot = _PAGE_BASE | _PAGE_READ;
 	else if (domain->amo_enabled)
@@ -1297,9 +1326,27 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
 
 		level = pgsize_to_level(pgsize);
 		ptl = riscv_iommu_ptlock(domain, ptr, level);
-		riscv_iommu_pte_free(domain, ptep_get(ptr), level, &freelist);
+
+		old = ptep_get(ptr);
+		if (pte_napot(old) && napot_cont_size(napot_cont_order(old)) > pgsize) {
+			spin_unlock(ptl);
+			rc = -EFAULT;
+			break;
+		}
+
 		pte = _io_pte_entry(phys_to_pfn(phys), pte_prot);
-		set_pte(ptr, pte);
+		if (order) {
+			pte = pte_mknapot(pte, order);
+			for (i = 0; i < napot_pte_num(order); i++, ptr++) {
+				old = ptep_get(ptr);
+				riscv_iommu_pte_free(domain, old, level, &freelist);
+				set_pte(ptr, pte);
+			}
+		} else {
+			riscv_iommu_pte_free(domain, old, level, &freelist);
+			set_pte(ptr, pte);
+		}
+
 		spin_unlock(ptl);
 
 		size += pgsize;
@@ -1336,6 +1383,9 @@ static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
 	size_t unmapped = 0;
 	size_t pte_size;
 	spinlock_t *ptl; /* page table page lock */
+	unsigned long pte_num;
+	pte_t pte;
+	int i;
 
 	while (unmapped < size) {
 		ptr = riscv_iommu_pte_fetch(domain, iova, &pte_size);
@@ -1347,7 +1397,21 @@ static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
 			return unmapped;
 
 		ptl = riscv_iommu_ptlock(domain, ptr, pgsize_to_level(pte_size));
-		set_pte(ptr, __pte(0));
+		if (is_napot_size(pte_size)) {
+			pte = ptep_get(ptr);
+
+			if (!pte_napot(pte) ||
+			    napot_cont_size(napot_cont_order(pte)) != pte_size) {
+				spin_unlock(ptl);
+				return unmapped;
+			}
+
+			pte_num = napot_pte_num(napot_cont_order(pte));
+			for (i = 0; i < pte_num; i++, ptr++)
+				set_pte(ptr, __pte(0));
+		} else {
+			set_pte(ptr, __pte(0));
+		}
 		spin_unlock(ptl);
 
 		iommu_iotlb_gather_add_page(&domain->domain, gather, iova,
 					    pte_size);
@@ -1447,6 +1511,7 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
 	unsigned int pgd_mode;
 	dma_addr_t va_mask;
 	int va_bits, level;
+	size_t order;
 
 	iommu = dev_to_iommu(dev);
 	if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV57) {
@@ -1506,6 +1571,9 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
 	domain->domain.geometry.aperture_end = va_mask;
 	domain->domain.geometry.force_aperture = true;
 	domain->domain.pgsize_bitmap = va_mask & (SZ_4K | SZ_2M | SZ_1G | SZ_512G);
+	if (has_svnapot())
+		for_each_napot_order(order)
+			domain->domain.pgsize_bitmap |= napot_cont_size(order) & va_mask;
 
 	domain->domain.ops = &riscv_iommu_paging_domain_ops;
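
As a worked example of the new map path, take the only Svnapot size the
kernel currently defines for 4K base pages: order 4, i.e. 64K, with
napot_pte_num(4) == 16. Assuming iova and phys are 64K-aligned as the new
alignment check requires, the loop above effectively does:

	/* Sketch: a 64K Svnapot mapping writes 16 identical NAPOT ptes. */
	unsigned long order = 4;		/* 64K = 4K << 4 */
	pte_t pte = _io_pte_entry(phys_to_pfn(phys), pte_prot);

	pte = pte_mknapot(pte, order);		/* encode the NAPOT pattern */
	for (i = 0; i < napot_pte_num(order); i++, ptr++)
		set_pte(ptr, pte);		/* all 16 entries identical */

On unmap, riscv_iommu_pte_fetch() reports napot_cont_size() for such an
entry, so a single iommu_iotlb_gather_add_page() call covers the whole
64K range.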