From patchwork Mon Mar 10 13:22:19 2025
X-Patchwork-Submitter: Xu Lu
X-Patchwork-Id: 14010130
From: Xu Lu
To: akpm@linux-foundation.org, tjeznach@rivosinc.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com
Cc: lihangjing@bytedance.com, xieyongji@bytedance.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Xu Lu
Subject: [PATCH 1/4] mm/gup: Handle huge pte for follow_page_pte()
Date: Mon, 10 Mar 2025 21:22:19 +0800
Message-Id: <20250310132222.58378-2-luxu.kernel@bytedance.com>
In-Reply-To: <20250310132222.58378-1-luxu.kernel@bytedance.com>
References: <20250310132222.58378-1-luxu.kernel@bytedance.com>

A page mapped at the PTE level can also be a huge page when ARM CONT_PTE
or RISC-V Svnapot is enabled. Handle this scenario in follow_page_pte():
return the sub-page that backs the requested address and report the
mapping's page mask through struct follow_page_context.

Signed-off-by: Xu Lu
---
 arch/riscv/include/asm/pgtable.h |  6 ++++++
 include/linux/pgtable.h          |  8 ++++++++
 mm/gup.c                         | 22 ++++++++++++++++------
 3 files changed, 30 insertions(+), 6 deletions(-)
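(Illustration only, not part of the patch.) The new hunk in follow_page_pte()
derives both the sub-page and the reported page mask purely from the mapping
size. The standalone userspace sketch below mirrors that arithmetic; the 64K
mapping size and the example address are made-up values for illustration.

#include <stdio.h>

#define PAGE_SHIFT 12UL
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/*
 * Simplified model of the offset/mask math added to follow_page_pte():
 * for a huge mapping of 'map_size' bytes whose head page has index
 * 'head', compute the index of the sub-page backing 'address' and the
 * page mask reported to the caller (number of 4K pages minus one).
 */
static unsigned long subpage_index(unsigned long head, unsigned long address,
                                   unsigned long map_size)
{
        return head + ((address & (map_size - 1)) >> PAGE_SHIFT);
}

static unsigned long page_mask(unsigned long map_size)
{
        return (map_size >> PAGE_SHIFT) - 1;
}

int main(void)
{
        unsigned long map_size = 64 * 1024;          /* e.g. a 64K Svnapot mapping */
        unsigned long addr = 0x200000UL + 0x5000UL;  /* arbitrary example address */

        printf("sub-page index: %lu\n", subpage_index(0, addr, map_size)); /* 5 */
        printf("page mask: %lu\n", page_mask(map_size));                   /* 15 */
        return 0;
}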
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 050fdc49b5ad7..40ae5979dd82c 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -800,6 +800,12 @@ static inline bool pud_user_accessible_page(pud_t pud)
 #endif
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define pte_trans_huge pte_trans_huge
+static inline int pte_trans_huge(pte_t pte)
+{
+        return pte_huge(pte) && pte_napot(pte);
+}
+
 static inline int pmd_trans_huge(pmd_t pmd)
 {
         return pmd_leaf(pmd);
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 94d267d02372e..3f57ee6dcf017 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1584,6 +1584,14 @@ static inline unsigned long my_zero_pfn(unsigned long addr)
 
 #ifdef CONFIG_MMU
 
+#if (defined(CONFIG_TRANSPARENT_HUGEPAGE) && !defined(pte_trans_huge)) || \
+        (!defined(CONFIG_TRANSPARENT_HUGEPAGE))
+static inline int pte_trans_huge(pte_t pte)
+{
+        return 0;
+}
+#endif
+
 #ifndef CONFIG_TRANSPARENT_HUGEPAGE
 static inline int pmd_trans_huge(pmd_t pmd)
 {
diff --git a/mm/gup.c b/mm/gup.c
index 3883b307780ea..84710896f42eb 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -838,11 +838,12 @@ static inline bool can_follow_write_pte(pte_t pte, struct page *page,
 
 static struct page *follow_page_pte(struct vm_area_struct *vma,
                 unsigned long address, pmd_t *pmd, unsigned int flags,
-                struct dev_pagemap **pgmap)
+                struct follow_page_context *ctx)
 {
         struct mm_struct *mm = vma->vm_mm;
         struct folio *folio;
         struct page *page;
+        struct hstate *h;
         spinlock_t *ptl;
         pte_t *ptep, pte;
         int ret;
@@ -879,8 +880,8 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
                  * case since they are only valid while holding the pgmap
                  * reference.
                  */
-                *pgmap = get_dev_pagemap(pte_pfn(pte), *pgmap);
-                if (*pgmap)
+                ctx->pgmap = get_dev_pagemap(pte_pfn(pte), ctx->pgmap);
+                if (ctx->pgmap)
                         page = pte_page(pte);
                 else
                         goto no_page;
@@ -940,6 +941,15 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
                  */
                 folio_mark_accessed(folio);
         }
+        if (is_vm_hugetlb_page(vma)) {
+                h = hstate_vma(vma);
+                WARN_ON_ONCE(page_size(page) != huge_page_size(h));
+                page += (address & (huge_page_size(h) - 1)) >> PAGE_SHIFT;
+                ctx->page_mask = (1 << huge_page_order(h)) - 1;
+        } else if (pte_trans_huge(pte)) {
+                page += (address & (page_size(page) - 1)) >> PAGE_SHIFT;
+                ctx->page_mask = (page_size(page) >> PAGE_SHIFT) - 1;
+        }
 out:
         pte_unmap_unlock(ptep, ptl);
         return page;
@@ -975,7 +985,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
                 return no_page_table(vma, flags, address);
         }
         if (likely(!pmd_leaf(pmdval)))
-                return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
+                return follow_page_pte(vma, address, pmd, flags, ctx);
 
         if (pmd_protnone(pmdval) && !gup_can_follow_protnone(vma, flags))
                 return no_page_table(vma, flags, address);
@@ -988,14 +998,14 @@
         }
         if (unlikely(!pmd_leaf(pmdval))) {
                 spin_unlock(ptl);
-                return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
+                return follow_page_pte(vma, address, pmd, flags, ctx);
         }
         if (pmd_trans_huge(pmdval) && (flags & FOLL_SPLIT_PMD)) {
                 spin_unlock(ptl);
                 split_huge_pmd(vma, pmd, address);
                 /* If pmd was left empty, stuff a page table in there quickly */
                 return pte_alloc(mm, pmd) ? ERR_PTR(-ENOMEM) :
-                        follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
+                        follow_page_pte(vma, address, pmd, flags, ctx);
         }
         page = follow_huge_pmd(vma, address, pmd, flags, ctx);
         spin_unlock(ptl);

From patchwork Mon Mar 10 13:22:20 2025
X-Patchwork-Submitter: Xu Lu
X-Patchwork-Id: 14010131
From: Xu Lu
To: akpm@linux-foundation.org, tjeznach@rivosinc.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com
Cc: lihangjing@bytedance.com, xieyongji@bytedance.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Xu Lu
Subject: [PATCH 2/4] iommu/riscv: Use pte_t to represent page table entry
Date: Mon, 10 Mar 2025 21:22:20 +0800
Message-Id: <20250310132222.58378-3-luxu.kernel@bytedance.com>
In-Reply-To: <20250310132222.58378-1-luxu.kernel@bytedance.com>
References: <20250310132222.58378-1-luxu.kernel@bytedance.com>

The RISC-V IOMMU uses the same PTE format and translation process as the
MMU, as specified in the RISC-V Privileged specification. Represent IOMMU
PTEs with pte_t as well, so that the existing pte helper functions can be
reused.

Signed-off-by: Xu Lu
---
 drivers/iommu/riscv/iommu.c | 66 ++++++++++++++++++-------------------
 1 file changed, 33 insertions(+), 33 deletions(-)
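(Illustration only, not part of the patch.) The gain from switching the page
table walkers to pte_t is type safety on top of helper reuse. The simplified
userspace model below loosely mirrors how a wrapped PTE type behaves; the
macro and bit definitions here are stand-ins, not the kernel's own.

#include <stdio.h>

/*
 * Simplified stand-in for a typed PTE: the raw value is only reachable
 * through pte_val()/__pte(), so stray arithmetic on an entry no longer
 * compiles, unlike with a bare unsigned long.
 */
typedef struct { unsigned long pte; } pte_t;

#define pte_val(x)      ((x).pte)
#define __pte(x)        ((pte_t) { (x) })

#define _PAGE_PRESENT   (1UL << 0)
#define _PAGE_READ      (1UL << 1)

static int pte_present(pte_t pte)
{
        return (pte_val(pte) & _PAGE_PRESENT) != 0;
}

int main(void)
{
        pte_t pte = __pte(_PAGE_PRESENT | _PAGE_READ);

        /* pte + 1;  <-- would now be a compile error */
        printf("present: %d\n", pte_present(pte));
        return 0;
}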
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 8f049d4a0e2cb..f752096989a79 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -812,7 +812,7 @@ struct riscv_iommu_domain {
         bool amo_enabled;
         int numa_node;
         unsigned int pgd_mode;
-        unsigned long *pgd_root;
+        pte_t *pgd_root;
 };
 
 #define iommu_domain_to_riscv(iommu_domain) \
@@ -1081,27 +1081,29 @@ static void riscv_iommu_iotlb_sync(struct iommu_domain *iommu_domain,
 
 #define PT_SHIFT                        (PAGE_SHIFT - ilog2(sizeof(pte_t)))
 
-#define _io_pte_present(pte)    ((pte) & (_PAGE_PRESENT | _PAGE_PROT_NONE))
-#define _io_pte_leaf(pte)       ((pte) & _PAGE_LEAF)
-#define _io_pte_none(pte)       ((pte) == 0)
-#define _io_pte_entry(pn, prot) ((_PAGE_PFN_MASK & ((pn) << _PAGE_PFN_SHIFT)) | (prot))
+#define _io_pte_present(pte)    (pte_val(pte) & (_PAGE_PRESENT | _PAGE_PROT_NONE))
+#define _io_pte_leaf(pte)       (pte_val(pte) & _PAGE_LEAF)
+#define _io_pte_none(pte)       (pte_val(pte) == 0)
+#define _io_pte_entry(pn, prot) (__pte((_PAGE_PFN_MASK & ((pn) << _PAGE_PFN_SHIFT)) | (prot)))
 
 static void riscv_iommu_pte_free(struct riscv_iommu_domain *domain,
-                                 unsigned long pte, struct list_head *freelist)
+                                 pte_t pte, struct list_head *freelist)
 {
-        unsigned long *ptr;
+        pte_t *ptr;
         int i;
 
         if (!_io_pte_present(pte) || _io_pte_leaf(pte))
                 return;
 
-        ptr = (unsigned long *)pfn_to_virt(__page_val_to_pfn(pte));
+        ptr = (pte_t *)pfn_to_virt(pte_pfn(pte));
 
         /* Recursively free all sub page table pages */
         for (i = 0; i < PTRS_PER_PTE; i++) {
-                pte = READ_ONCE(ptr[i]);
-                if (!_io_pte_none(pte) && cmpxchg_relaxed(ptr + i, pte, 0) == pte)
+                pte = ptr[i];
+                if (!_io_pte_none(pte)) {
+                        ptr[i] = __pte(0);
                         riscv_iommu_pte_free(domain, pte, freelist);
+                }
         }
 
         if (freelist)
@@ -1110,12 +1112,12 @@ static void riscv_iommu_pte_free(struct riscv_iommu_domain *domain,
                 iommu_free_page(ptr);
 }
 
-static unsigned long *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
+static pte_t *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
                                             unsigned long iova, size_t pgsize,
                                             gfp_t gfp)
 {
-        unsigned long *ptr = domain->pgd_root;
-        unsigned long pte, old;
+        pte_t *ptr = domain->pgd_root;
+        pte_t pte, old;
         int level = domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2;
         void *addr;
 
@@ -1131,7 +1133,7 @@ static unsigned long *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
                 if (((size_t)1 << shift) == pgsize)
                         return ptr;
 pte_retry:
-                pte = READ_ONCE(*ptr);
+                pte = ptep_get(ptr);
                 /*
                  * This is very likely incorrect as we should not be adding
                  * new mapping with smaller granularity on top
@@ -1154,31 +1156,31 @@ static unsigned long *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
                                 goto pte_retry;
                         }
                 }
-                ptr = (unsigned long *)pfn_to_virt(__page_val_to_pfn(pte));
+                ptr = (pte_t *)pfn_to_virt(pte_pfn(pte));
         } while (level-- > 0);
 
         return NULL;
 }
 
-static unsigned long *riscv_iommu_pte_fetch(struct riscv_iommu_domain *domain,
-                                            unsigned long iova, size_t *pte_pgsize)
+static pte_t *riscv_iommu_pte_fetch(struct riscv_iommu_domain *domain,
+                                    unsigned long iova, size_t *pte_pgsize)
 {
-        unsigned long *ptr = domain->pgd_root;
-        unsigned long pte;
+        pte_t *ptr = domain->pgd_root;
+        pte_t pte;
         int level = domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2;
 
         do {
                 const int shift = PAGE_SHIFT + PT_SHIFT * level;
 
                 ptr += ((iova >> shift) & (PTRS_PER_PTE - 1));
-                pte = READ_ONCE(*ptr);
+                pte = ptep_get(ptr);
                 if (_io_pte_present(pte) && _io_pte_leaf(pte)) {
                         *pte_pgsize = (size_t)1 << shift;
                         return ptr;
                 }
                 if (_io_pte_none(pte))
                         return NULL;
-                ptr = (unsigned long *)pfn_to_virt(__page_val_to_pfn(pte));
+                ptr = (pte_t *)pfn_to_virt(pte_pfn(pte));
         } while (level-- > 0);
 
         return NULL;
@@ -1191,8 +1193,9 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
 {
         struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
         size_t size = 0;
-        unsigned long *ptr;
-        unsigned long pte, old, pte_prot;
+        pte_t *ptr;
+        pte_t pte, old;
+        unsigned long pte_prot;
         int rc = 0;
         LIST_HEAD(freelist);
 
@@ -1210,10 +1213,9 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
                         break;
                 }
 
-                old = READ_ONCE(*ptr);
+                old = ptep_get(ptr);
                 pte = _io_pte_entry(phys_to_pfn(phys), pte_prot);
-                if (cmpxchg_relaxed(ptr, old, pte) != old)
-                        continue;
+                set_pte(ptr, pte);
 
                 riscv_iommu_pte_free(domain, old, &freelist);
 
@@ -1247,7 +1249,7 @@ static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
 {
         struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
         size_t size = pgcount << __ffs(pgsize);
-        unsigned long *ptr, old;
+        pte_t *ptr;
         size_t unmapped = 0;
         size_t pte_size;
 
@@ -1260,9 +1262,7 @@ static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
                 if (iova & (pte_size - 1))
                         return unmapped;
 
-                old = READ_ONCE(*ptr);
-                if (cmpxchg_relaxed(ptr, old, 0) != old)
-                        continue;
+                set_pte(ptr, __pte(0));
 
                 iommu_iotlb_gather_add_page(&domain->domain, gather, iova,
                                             pte_size);
@@ -1279,13 +1279,13 @@ static phys_addr_t riscv_iommu_iova_to_phys(struct iommu_domain *iommu_domain,
 {
         struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
         size_t pte_size;
-        unsigned long *ptr;
+        pte_t *ptr;
 
         ptr = riscv_iommu_pte_fetch(domain, iova, &pte_size);
-        if (_io_pte_none(*ptr) || !_io_pte_present(*ptr))
+        if (_io_pte_none(ptep_get(ptr)) || !_io_pte_present(ptep_get(ptr)))
                 return 0;
 
-        return pfn_to_phys(__page_val_to_pfn(*ptr)) | (iova & (pte_size - 1));
+        return pfn_to_phys(pte_pfn(ptep_get(ptr))) | (iova & (pte_size - 1));
 }
 
 static void riscv_iommu_free_paging_domain(struct iommu_domain *iommu_domain)

From patchwork Mon Mar 10 13:22:21 2025
X-Patchwork-Submitter: Xu Lu
X-Patchwork-Id: 14010133
From: Xu Lu
To: akpm@linux-foundation.org, tjeznach@rivosinc.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com
Cc: lihangjing@bytedance.com, xieyongji@bytedance.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Xu Lu
Subject: [PATCH 3/4] iommu/riscv: Introduce IOMMU page table lock
Date: Mon, 10 Mar 2025 21:22:21 +0800
Message-Id: <20250310132222.58378-4-luxu.kernel@bytedance.com>
In-Reply-To: <20250310132222.58378-1-luxu.kernel@bytedance.com>
References: <20250310132222.58378-1-luxu.kernel@bytedance.com>

Introduce a page table lock to avoid races when several PTEs are modified
as a group, for example when applying Svnapot. Fine-grained per-page-table
locks are used to minimize lock contention.

Signed-off-by: Xu Lu
---
 drivers/iommu/riscv/iommu.c | 126 ++++++++++++++++++++++++++++++------
 1 file changed, 108 insertions(+), 18 deletions(-)
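(Illustration only, not part of the patch.) The sketch below is a userspace
pthread model of the locking policy introduced here: low-level page-table
pages get their own split lock, while higher levels fall back to a single
per-domain lock, so concurrent updates to distant parts of the IOVA space
rarely contend. The PMD_LEVEL constant and the struct names are simplified
stand-ins for the kernel's riscv_iommu_ptlock() machinery.

#include <pthread.h>
#include <stdio.h>

#define PMD_LEVEL 1     /* levels at or below this use a split lock */

struct pt_page {
        pthread_mutex_t ptl;             /* per-table lock (low levels) */
};

struct domain {
        pthread_mutex_t page_table_lock; /* shared lock (high levels) */
};

/* Pick and take the lock protecting one page-table page at 'level'. */
static pthread_mutex_t *pt_lock(struct domain *d, struct pt_page *p, int level)
{
        pthread_mutex_t *ptl;

        ptl = (level <= PMD_LEVEL) ? &p->ptl : &d->page_table_lock;
        pthread_mutex_lock(ptl);
        return ptl;
}

int main(void)
{
        struct domain d = { .page_table_lock = PTHREAD_MUTEX_INITIALIZER };
        struct pt_page leaf = { .ptl = PTHREAD_MUTEX_INITIALIZER };
        pthread_mutex_t *ptl;

        ptl = pt_lock(&d, &leaf, 0);    /* leaf level: uses the split lock */
        /* ... modify the entries of this page-table page ... */
        pthread_mutex_unlock(ptl);
        printf("done\n");
        return 0;
}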
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index f752096989a79..ffc474987a075 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -808,6 +808,7 @@ struct riscv_iommu_domain {
         struct iommu_domain domain;
         struct list_head bonds;
         spinlock_t lock;                /* protect bonds list updates. */
+        spinlock_t page_table_lock;     /* protect page table updates. */
         int pscid;
         bool amo_enabled;
         int numa_node;
@@ -1086,8 +1087,80 @@ static void riscv_iommu_iotlb_sync(struct iommu_domain *iommu_domain,
 #define _io_pte_none(pte)       (pte_val(pte) == 0)
 #define _io_pte_entry(pn, prot) (__pte((_PAGE_PFN_MASK & ((pn) << _PAGE_PFN_SHIFT)) | (prot)))
 
+#define RISCV_IOMMU_PMD_LEVEL   1
+
+static bool riscv_iommu_ptlock_init(struct ptdesc *ptdesc, int level)
+{
+        if (level <= RISCV_IOMMU_PMD_LEVEL)
+                return ptlock_init(ptdesc);
+        return true;
+}
+
+static void riscv_iommu_ptlock_free(struct ptdesc *ptdesc, int level)
+{
+        if (level <= RISCV_IOMMU_PMD_LEVEL)
+                ptlock_free(ptdesc);
+}
+
+static spinlock_t *riscv_iommu_ptlock(struct riscv_iommu_domain *domain,
+                                      pte_t *pte, int level)
+{
+        spinlock_t *ptl;
+
+#ifdef CONFIG_SPLIT_PTE_PTLOCKS
+        if (level <= RISCV_IOMMU_PMD_LEVEL)
+                ptl = ptlock_ptr(page_ptdesc(virt_to_page(pte)));
+        else
+#endif
+                ptl = &domain->page_table_lock;
+        spin_lock(ptl);
+
+        return ptl;
+}
+
+static void *riscv_iommu_alloc_pagetable_node(int numa_node, gfp_t gfp, int level)
+{
+        struct ptdesc *ptdesc;
+        void *addr;
+
+        addr = iommu_alloc_page_node(numa_node, gfp);
+        if (!addr)
+                return NULL;
+
+        ptdesc = page_ptdesc(virt_to_page(addr));
+        if (!riscv_iommu_ptlock_init(ptdesc, level)) {
+                iommu_free_page(addr);
+                addr = NULL;
+        }
+
+        return addr;
+}
+
+static void riscv_iommu_free_pagetable(void *addr, int level)
+{
+        struct ptdesc *ptdesc = page_ptdesc(virt_to_page(addr));
+
+        riscv_iommu_ptlock_free(ptdesc, level);
+        iommu_free_page(addr);
+}
+
+static int pgsize_to_level(size_t pgsize)
+{
+        int level = RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV57 -
+                    RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2;
+        int shift = PAGE_SHIFT + PT_SHIFT * level;
+
+        while (pgsize < ((size_t)1 << shift)) {
+                shift -= PT_SHIFT;
+                level--;
+        }
+
+        return level;
+}
+
 static void riscv_iommu_pte_free(struct riscv_iommu_domain *domain,
-                                 pte_t pte, struct list_head *freelist)
+                                 pte_t pte, int level,
+                                 struct list_head *freelist)
 {
         pte_t *ptr;
         int i;
@@ -1102,10 +1175,11 @@ static void riscv_iommu_pte_free(struct riscv_iommu_domain *domain,
                 pte = ptr[i];
                 if (!_io_pte_none(pte)) {
                         ptr[i] = __pte(0);
-                        riscv_iommu_pte_free(domain, pte, freelist);
+                        riscv_iommu_pte_free(domain, pte, level - 1, freelist);
                 }
         }
 
+        riscv_iommu_ptlock_free(page_ptdesc(virt_to_page(ptr)), level);
         if (freelist)
                 list_add_tail(&virt_to_page(ptr)->lru, freelist);
         else
@@ -1117,8 +1191,9 @@ static pte_t *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
                                     gfp_t gfp)
 {
         pte_t *ptr = domain->pgd_root;
-        pte_t pte, old;
+        pte_t pte;
         int level = domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2;
+        spinlock_t *ptl;
         void *addr;
 
         do {
@@ -1146,15 +1221,21 @@ static pte_t *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
                  * page table. This might race with other mappings, retry.
                  */
                 if (_io_pte_none(pte)) {
-                        addr = iommu_alloc_page_node(domain->numa_node, gfp);
+                        addr = riscv_iommu_alloc_pagetable_node(domain->numa_node, gfp,
+                                                                level - 1);
                         if (!addr)
                                 return NULL;
-                        old = pte;
-                        pte = _io_pte_entry(virt_to_pfn(addr), _PAGE_TABLE);
-                        if (cmpxchg_relaxed(ptr, old, pte) != old) {
-                                iommu_free_page(addr);
+
+                        ptl = riscv_iommu_ptlock(domain, ptr, level);
+                        pte = ptep_get(ptr);
+                        if (!_io_pte_none(pte)) {
+                                spin_unlock(ptl);
+                                riscv_iommu_free_pagetable(addr, level - 1);
                                 goto pte_retry;
                         }
+                        pte = _io_pte_entry(virt_to_pfn(addr), _PAGE_TABLE);
+                        set_pte(ptr, pte);
+                        spin_unlock(ptl);
                 }
                 ptr = (pte_t *)pfn_to_virt(pte_pfn(pte));
         } while (level-- > 0);
@@ -1194,9 +1275,10 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
         struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
         size_t size = 0;
         pte_t *ptr;
-        pte_t pte, old;
+        pte_t pte;
         unsigned long pte_prot;
-        int rc = 0;
+        int rc = 0, level;
+        spinlock_t *ptl;
         LIST_HEAD(freelist);
 
         if (!(prot & IOMMU_WRITE))
@@ -1213,11 +1295,12 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
                         break;
                 }
 
-                old = ptep_get(ptr);
+                level = pgsize_to_level(pgsize);
+                ptl = riscv_iommu_ptlock(domain, ptr, level);
+                riscv_iommu_pte_free(domain, ptep_get(ptr), level, &freelist);
                 pte = _io_pte_entry(phys_to_pfn(phys), pte_prot);
                 set_pte(ptr, pte);
-
-                riscv_iommu_pte_free(domain, old, &freelist);
+                spin_unlock(ptl);
 
                 size += pgsize;
                 iova += pgsize;
@@ -1252,6 +1335,7 @@ static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
         pte_t *ptr;
         size_t unmapped = 0;
         size_t pte_size;
+        spinlock_t *ptl;
 
         while (unmapped < size) {
                 ptr = riscv_iommu_pte_fetch(domain, iova, &pte_size);
@@ -1262,7 +1346,9 @@ static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
                 if (iova & (pte_size - 1))
                         return unmapped;
 
+                ptl = riscv_iommu_ptlock(domain, ptr, pgsize_to_level(pte_size));
                 set_pte(ptr, __pte(0));
+                spin_unlock(ptl);
 
                 iommu_iotlb_gather_add_page(&domain->domain, gather, iova,
                                             pte_size);
@@ -1292,13 +1378,14 @@ static void riscv_iommu_free_paging_domain(struct iommu_domain *iommu_domain)
 {
         struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
         const unsigned long pfn = virt_to_pfn(domain->pgd_root);
+        int level = domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2;
 
         WARN_ON(!list_empty(&domain->bonds));
 
         if ((int)domain->pscid > 0)
                 ida_free(&riscv_iommu_pscids, domain->pscid);
 
-        riscv_iommu_pte_free(domain, _io_pte_entry(pfn, _PAGE_TABLE), NULL);
+        riscv_iommu_pte_free(domain, _io_pte_entry(pfn, _PAGE_TABLE), level, NULL);
 
         kfree(domain);
 }
@@ -1359,7 +1446,7 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
         struct riscv_iommu_device *iommu;
         unsigned int pgd_mode;
         dma_addr_t va_mask;
-        int va_bits;
+        int va_bits, level;
 
         iommu = dev_to_iommu(dev);
         if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV57) {
@@ -1382,11 +1469,14 @@
 
         INIT_LIST_HEAD_RCU(&domain->bonds);
         spin_lock_init(&domain->lock);
+        spin_lock_init(&domain->page_table_lock);
         domain->numa_node = dev_to_node(iommu->dev);
         domain->amo_enabled = !!(iommu->caps & RISCV_IOMMU_CAPABILITIES_AMO_HWAD);
         domain->pgd_mode = pgd_mode;
-        domain->pgd_root = iommu_alloc_page_node(domain->numa_node,
-                                                 GFP_KERNEL_ACCOUNT);
+        level = domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2;
+        domain->pgd_root = riscv_iommu_alloc_pagetable_node(domain->numa_node,
+                                                            GFP_KERNEL_ACCOUNT,
+                                                            level);
         if (!domain->pgd_root) {
                 kfree(domain);
                 return ERR_PTR(-ENOMEM);
@@ -1395,7 +1485,7 @@
         domain->pscid = ida_alloc_range(&riscv_iommu_pscids, 1,
                                         RISCV_IOMMU_MAX_PSCID, GFP_KERNEL);
         if (domain->pscid < 0) {
-                iommu_free_page(domain->pgd_root);
+                riscv_iommu_free_pagetable(domain->pgd_root, level);
                 kfree(domain);
                 return ERR_PTR(-ENOMEM);
         }

From patchwork Mon Mar 10 13:22:22 2025
X-Patchwork-Submitter: Xu Lu
X-Patchwork-Id: 14010132
From: Xu Lu
To: akpm@linux-foundation.org, tjeznach@rivosinc.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com
Cc: lihangjing@bytedance.com, xieyongji@bytedance.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Xu Lu
Subject: [PATCH 4/4] iommu/riscv: Add support for Svnapot
Date: Mon, 10 Mar 2025 21:22:22 +0800
Message-Id: <20250310132222.58378-5-luxu.kernel@bytedance.com>
In-Reply-To: <20250310132222.58378-1-luxu.kernel@bytedance.com>
References: <20250310132222.58378-1-luxu.kernel@bytedance.com>

Advertise the Svnapot sizes as supported page sizes and apply Svnapot
mappings whenever possible.

Signed-off-by: Xu Lu
---
 drivers/iommu/riscv/iommu.c | 85 +++++++++++++++++++++++++++++++++----
 1 file changed, 76 insertions(+), 9 deletions(-)
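(Illustration only, not part of the patch.) A NAPOT mapping of order N covers
PAGE_SIZE << N bytes and is backed by 2^N identical PTEs, which is why
riscv_iommu_map_pages() and riscv_iommu_unmap_pages() in this patch loop over
napot_pte_num() entries. The standalone sketch below uses simplified stand-ins
for the kernel's napot_cont_size()/napot_pte_num() helpers; the order-4
(64 KiB) granule is only an example size.

#include <stdio.h>

#define PAGE_SHIFT 12UL
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Simplified stand-ins for the kernel's NAPOT helpers. */
static unsigned long napot_cont_size(unsigned long order)
{
        return PAGE_SIZE << order;      /* bytes covered by one NAPOT mapping */
}

static unsigned long napot_pte_num(unsigned long order)
{
        return 1UL << order;            /* identical PTEs backing that mapping */
}

int main(void)
{
        unsigned long order = 4;        /* example: the 64 KiB NAPOT granule */

        printf("size: %lu KiB, PTEs to write: %lu\n",
               napot_cont_size(order) / 1024, napot_pte_num(order));
        return 0;
}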
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index ffc474987a075..379875d637901 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -1158,6 +1158,26 @@ static int pgsize_to_level(size_t pgsize)
         return level;
 }
 
+static unsigned long napot_size_to_order(unsigned long size)
+{
+        unsigned long order;
+
+        if (!has_svnapot())
+                return 0;
+
+        for_each_napot_order(order) {
+                if (size == napot_cont_size(order))
+                        return order;
+        }
+
+        return 0;
+}
+
+static bool is_napot_size(unsigned long size)
+{
+        return napot_size_to_order(size) != 0;
+}
+
 static void riscv_iommu_pte_free(struct riscv_iommu_domain *domain,
                                  pte_t pte, int level,
                                  struct list_head *freelist)
@@ -1205,7 +1225,8 @@ static pte_t *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
                  * existing mapping with smaller granularity. Up to the caller
                  * to replace and invalidate.
                  */
-                if (((size_t)1 << shift) == pgsize)
+                if ((((size_t)1 << shift) == pgsize) ||
+                    (is_napot_size(pgsize) && pgsize_to_level(pgsize) == level))
                         return ptr;
 pte_retry:
                 pte = ptep_get(ptr);
@@ -1256,7 +1277,10 @@ static pte_t *riscv_iommu_pte_fetch(struct riscv_iommu_domain *domain,
                 ptr += ((iova >> shift) & (PTRS_PER_PTE - 1));
                 pte = ptep_get(ptr);
                 if (_io_pte_present(pte) && _io_pte_leaf(pte)) {
-                        *pte_pgsize = (size_t)1 << shift;
+                        if (pte_napot(pte))
+                                *pte_pgsize = napot_cont_size(napot_cont_order(pte));
+                        else
+                                *pte_pgsize = (size_t)1 << shift;
                         return ptr;
                 }
                 if (_io_pte_none(pte))
@@ -1274,13 +1298,18 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
 {
         struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
         size_t size = 0;
-        pte_t *ptr;
-        pte_t pte;
-        unsigned long pte_prot;
-        int rc = 0, level;
+        pte_t *ptr, old, pte;
+        unsigned long pte_prot, order = 0;
+        int rc = 0, level, i;
         spinlock_t *ptl;
         LIST_HEAD(freelist);
 
+        if (iova & (pgsize - 1))
+                return -EINVAL;
+
+        if (is_napot_size(pgsize))
+                order = napot_size_to_order(pgsize);
+
         if (!(prot & IOMMU_WRITE))
                 pte_prot = _PAGE_BASE | _PAGE_READ;
         else if (domain->amo_enabled)
@@ -1297,9 +1326,27 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
                 level = pgsize_to_level(pgsize);
                 ptl = riscv_iommu_ptlock(domain, ptr, level);
-                riscv_iommu_pte_free(domain, ptep_get(ptr), level, &freelist);
+
+                old = ptep_get(ptr);
+                if (pte_napot(old) && napot_cont_size(napot_cont_order(old)) > pgsize) {
+                        spin_unlock(ptl);
+                        rc = -EFAULT;
+                        break;
+                }
+
                 pte = _io_pte_entry(phys_to_pfn(phys), pte_prot);
-                set_pte(ptr, pte);
+                if (order) {
+                        pte = pte_mknapot(pte, order);
+                        for (i = 0; i < napot_pte_num(order); i++, ptr++) {
+                                old = ptep_get(ptr);
+                                riscv_iommu_pte_free(domain, old, level, &freelist);
+                                set_pte(ptr, pte);
+                        }
+                } else {
+                        riscv_iommu_pte_free(domain, old, level, &freelist);
+                        set_pte(ptr, pte);
+                }
+
                 spin_unlock(ptl);
 
                 size += pgsize;
@@ -1336,6 +1383,9 @@ static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
         size_t unmapped = 0;
         size_t pte_size;
         spinlock_t *ptl;
+        unsigned long pte_num;
+        pte_t pte;
+        int i;
 
         while (unmapped < size) {
                 ptr = riscv_iommu_pte_fetch(domain, iova, &pte_size);
@@ -1347,7 +1397,20 @@ static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
                         return unmapped;
 
                 ptl = riscv_iommu_ptlock(domain, ptr, pgsize_to_level(pte_size));
-                set_pte(ptr, __pte(0));
+                if (is_napot_size(pte_size)) {
+                        pte = ptep_get(ptr);
+
+                        if (!pte_napot(pte) ||
+                            napot_cont_size(napot_cont_order(pte)) != pte_size) {
+                                spin_unlock(ptl);
+                                return unmapped;
+                        }
+
+                        pte_num = napot_pte_num(napot_cont_order(pte));
+                        for (i = 0; i < pte_num; i++, ptr++)
+                                set_pte(ptr, __pte(0));
+                } else
+                        set_pte(ptr, __pte(0));
                 spin_unlock(ptl);
 
                 iommu_iotlb_gather_add_page(&domain->domain, gather, iova,
@@ -1447,6 +1510,7 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
         unsigned int pgd_mode;
         dma_addr_t va_mask;
         int va_bits, level;
+        size_t order;
 
         iommu = dev_to_iommu(dev);
         if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV57) {
@@ -1506,6 +1570,9 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
         domain->domain.geometry.aperture_end = va_mask;
         domain->domain.geometry.force_aperture = true;
         domain->domain.pgsize_bitmap = va_mask & (SZ_4K | SZ_2M | SZ_1G | SZ_512G);
+        if (has_svnapot())
+                for_each_napot_order(order)
+                        domain->domain.pgsize_bitmap |= napot_cont_size(order) & va_mask;
         domain->domain.ops = &riscv_iommu_paging_domain_ops;