From patchwork Thu Apr 18 16:32:25 2024
X-Patchwork-Submitter: Tomasz Jeznach
X-Patchwork-Id: 13635106
From: Tomasz Jeznach
To: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley
Cc: Anup Patel, devicetree@vger.kernel.org, Conor Dooley, Albert Ou,
    Tomasz Jeznach, linux@rivosinc.com, linux-kernel@vger.kernel.org,
    Rob Herring, Sebastien Boeuf, iommu@lists.linux.dev, Palmer Dabbelt,
    Nick Kossifidis, Krzysztof Kozlowski, linux-riscv@lists.infradead.org
Subject: [PATCH v2 7/7] iommu/riscv: Paging domain support
Date: Thu, 18 Apr 2024 09:32:25 -0700
Message-Id: <301244bc3ff5da484b46d3fecc931cdad7d2806f.1713456598.git.tjeznach@rivosinc.com>

Introduce first-stage address translation support. The page table
configured by the IOMMU driver will use the same format as the CPU's MMU,
and the driver will fall back to identity translation if the page table
format configured for the MMU is not supported by the IOMMU hardware.

This change introduces the IOTINVAL.VMA command, required to invalidate
any cached IOATC entries after a mapping is updated in or removed from the
paging domain. Invalidations for non-leaf page-table entries will be added
to the driver in a separate patch series, following a specification update
that clarifies the non-leaf cache invalidation command. Since this patch
allows only 4K mappings and keeps non-leaf page-table entries in memory,
this is a reasonable simplification for now.
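To make the invalidation cost concrete, the standalone sketch below
(illustrative only, not part of the patch) models how many command-queue
entries the 2 MiB cut-off used by the new riscv_iommu_iotlb_inval() helper
produces for a single attached device, assuming 4K pages. The SKETCH_*
names and the counting helper are hypothetical stand-ins for the
riscv_iommu_cmd_*() calls in the diff below.

/*
 * Illustrative model of the per-device command count for the IOTINVAL.VMA
 * policy added by this patch: ranges below 2 MiB are invalidated one 4K
 * page at a time, anything larger flushes the whole PSCID address space,
 * and one IOFENCE.C follows either way.
 */
#include <stdio.h>

#define SKETCH_PAGE_SIZE	4096UL
#define SKETCH_INVAL_LIMIT	(2UL << 20)	/* mirrors RISCV_IOMMU_IOTLB_INVAL_LIMIT */

static unsigned long sketch_inval_commands(unsigned long len)
{
	unsigned long n;

	if (len > 0 && len < SKETCH_INVAL_LIMIT)
		n = (len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;	/* one IOTINVAL.VMA per 4K page */
	else
		n = 1;							/* one PSCID-wide IOTINVAL.VMA */

	return n + 1;							/* plus one IOFENCE.C */
}

int main(void)
{
	const unsigned long sizes[] = { 4096UL, 1UL << 20, 2UL << 20, 1UL << 30 };
	unsigned int i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
		printf("invalidate %7lu KiB -> %lu queued commands\n",
		       sizes[i] >> 10, sketch_inval_commands(sizes[i]));
	return 0;
}

Anything at or above the cut-off collapses to a single PSCID-wide
IOTINVAL.VMA plus an IOFENCE.C, which keeps the command queue bounded for
large unmaps.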
Signed-off-by: Tomasz Jeznach
---
 drivers/iommu/riscv/Kconfig |   1 +
 drivers/iommu/riscv/iommu.c | 467 +++++++++++++++++++++++++++++++++++-
 2 files changed, 466 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/riscv/Kconfig b/drivers/iommu/riscv/Kconfig
index 711326992585..6f9fb396034a 100644
--- a/drivers/iommu/riscv/Kconfig
+++ b/drivers/iommu/riscv/Kconfig
@@ -7,6 +7,7 @@ config RISCV_IOMMU
 	select DMA_OPS
 	select IOMMU_API
 	select IOMMU_IOVA
+	select IOMMU_DMA
 	help
 	  Support for implementations of the RISC-V IOMMU architecture that
 	  complements the RISC-V MMU capabilities, providing similar address
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index a4f74588cdc2..32ddc372432d 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -46,6 +46,10 @@ MODULE_LICENSE("GPL");
 #define dev_to_iommu(dev) \
 	container_of((dev)->iommu->iommu_dev, struct riscv_iommu_device, iommu)
 
+/* IOMMU PSCID allocation namespace. */
+static DEFINE_IDA(riscv_iommu_pscids);
+#define RISCV_IOMMU_MAX_PSCID	BIT(20)
+
 /* Device resource-managed allocations */
 struct riscv_iommu_devres {
 	unsigned long addr;
@@ -752,12 +756,77 @@ static int riscv_iommu_ddt_alloc(struct riscv_iommu_device *iommu)
 	return 0;
 }
 
+struct riscv_iommu_bond {
+	struct list_head list;
+	struct rcu_head rcu;
+	struct device *dev;
+};
+
+/* This struct contains protection domain specific IOMMU driver data. */
+struct riscv_iommu_domain {
+	struct iommu_domain domain;
+	struct list_head bonds;
+	int pscid;
+	int numa_node;
+	int amo_enabled:1;
+	unsigned int pgd_mode;
+	/* paging domain */
+	unsigned long pgd_root;
+};
+
+#define iommu_domain_to_riscv(iommu_domain) \
+	container_of(iommu_domain, struct riscv_iommu_domain, domain)
+
+/*
+ * Send IOTLB.INVAL for the whole address space for ranges larger than 2MB.
+ * This limit will be replaced with range invalidations, if supported by
+ * the hardware, once the RISC-V IOMMU architecture specification update
+ * for range invalidations is available.
+ */
+#define RISCV_IOMMU_IOTLB_INVAL_LIMIT	(2 << 20)
+
+static void riscv_iommu_iotlb_inval(struct riscv_iommu_domain *domain,
+				    unsigned long start, unsigned long end)
+{
+	struct riscv_iommu_bond *bond;
+	struct riscv_iommu_device *iommu;
+	struct riscv_iommu_command cmd;
+	unsigned long len = end - start + 1;
+	unsigned long iova;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(bond, &domain->bonds, list) {
+		iommu = dev_to_iommu(bond->dev);
+		riscv_iommu_cmd_inval_vma(&cmd);
+		riscv_iommu_cmd_inval_set_pscid(&cmd, domain->pscid);
+		if (len > 0 && len < RISCV_IOMMU_IOTLB_INVAL_LIMIT) {
+			for (iova = start; iova < end; iova += PAGE_SIZE) {
+				riscv_iommu_cmd_inval_set_addr(&cmd, iova);
+				riscv_iommu_cmd_send(iommu, &cmd, 0);
+			}
+		} else {
+			riscv_iommu_cmd_send(iommu, &cmd, 0);
+		}
+	}
+
+	list_for_each_entry_rcu(bond, &domain->bonds, list) {
+		iommu = dev_to_iommu(bond->dev);
+
+		riscv_iommu_cmd_iofence(&cmd);
+		riscv_iommu_cmd_send(iommu, &cmd, RISCV_IOMMU_QUEUE_TIMEOUT);
+	}
+	rcu_read_unlock();
+}
+
 static int riscv_iommu_attach_domain(struct riscv_iommu_device *iommu,
 				     struct device *dev,
 				     struct iommu_domain *iommu_domain)
 {
 	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+	struct riscv_iommu_domain *domain;
 	struct riscv_iommu_dc *dc;
+	struct riscv_iommu_bond *bond = NULL, *b;
+	struct riscv_iommu_command cmd;
 	u64 fsc, ta, tc;
 	int i;
 
@@ -769,6 +838,20 @@ static int riscv_iommu_attach_domain(struct riscv_iommu_device *iommu,
 		ta = 0;
 		tc = RISCV_IOMMU_DC_TC_V;
 		fsc = FIELD_PREP(RISCV_IOMMU_DC_FSC_MODE, RISCV_IOMMU_DC_FSC_MODE_BARE);
+	} else if (iommu_domain->type & __IOMMU_DOMAIN_PAGING) {
+		domain = iommu_domain_to_riscv(iommu_domain);
+
+		ta = FIELD_PREP(RISCV_IOMMU_PC_TA_PSCID, domain->pscid);
+		tc = RISCV_IOMMU_DC_TC_V;
+		if (domain->amo_enabled)
+			tc |= RISCV_IOMMU_DC_TC_SADE;
+		fsc = FIELD_PREP(RISCV_IOMMU_PC_FSC_MODE, domain->pgd_mode) |
+		      FIELD_PREP(RISCV_IOMMU_PC_FSC_PPN, virt_to_pfn(domain->pgd_root));
+
+		bond = kzalloc(sizeof(*bond), GFP_KERNEL);
+		if (!bond)
+			return -ENOMEM;
+		bond->dev = dev;
 	} else {
 		/* This should never happen. */
 		return -ENODEV;
@@ -787,12 +870,390 @@ static int riscv_iommu_attach_domain(struct riscv_iommu_device *iommu,
 		xchg64(&dc->ta, ta);
 		xchg64(&dc->tc, tc);
 
-		/* Device context invalidation will be required. Ignoring for now. */
+		if (!(tc & RISCV_IOMMU_DC_TC_V))
+			continue;
+
+		/* Invalidate device context cache */
+		riscv_iommu_cmd_iodir_inval_ddt(&cmd);
+		riscv_iommu_cmd_iodir_set_did(&cmd, fwspec->ids[i]);
+		riscv_iommu_cmd_send(iommu, &cmd, 0);
+
+		if (FIELD_GET(RISCV_IOMMU_PC_FSC_MODE, fsc) == RISCV_IOMMU_DC_FSC_MODE_BARE)
+			continue;
+
+		/* Invalidate last valid PSCID */
+		riscv_iommu_cmd_inval_vma(&cmd);
+		riscv_iommu_cmd_inval_set_pscid(&cmd, FIELD_GET(RISCV_IOMMU_DC_TA_PSCID, ta));
+		riscv_iommu_cmd_send(iommu, &cmd, 0);
+	}
+
+	/* Synchronize directory update */
+	riscv_iommu_cmd_iofence(&cmd);
+	riscv_iommu_cmd_send(iommu, &cmd, RISCV_IOMMU_IOTINVAL_TIMEOUT);
+
+	/* Track domain to devices mapping. */
+	if (bond)
+		list_add_rcu(&bond->list, &domain->bonds);
+
+	/* Remove tracking from previous domain, if needed. */
+	iommu_domain = iommu_get_domain_for_dev(dev);
+	if (iommu_domain && !!(iommu_domain->type & __IOMMU_DOMAIN_PAGING)) {
+		domain = iommu_domain_to_riscv(iommu_domain);
+		bond = NULL;
+		rcu_read_lock();
+		list_for_each_entry_rcu(b, &domain->bonds, list) {
+			if (b->dev == dev) {
+				bond = b;
+				break;
+			}
+		}
+		rcu_read_unlock();
+
+		if (bond) {
+			list_del_rcu(&bond->list);
+			kfree_rcu(bond, rcu);
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * IOVA page translation tree management.
+ */
+
+#define IOMMU_PAGE_SIZE_4K	BIT_ULL(12)
+#define IOMMU_PAGE_SIZE_2M	BIT_ULL(21)
+#define IOMMU_PAGE_SIZE_1G	BIT_ULL(30)
+#define IOMMU_PAGE_SIZE_512G	BIT_ULL(39)
+
+#define PT_SHIFT (PAGE_SHIFT - ilog2(sizeof(pte_t)))
+
+static void riscv_iommu_flush_iotlb_all(struct iommu_domain *iommu_domain)
+{
+	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
+
+	riscv_iommu_iotlb_inval(domain, 0, ULONG_MAX);
+}
+
+static void riscv_iommu_iotlb_sync(struct iommu_domain *iommu_domain,
+				   struct iommu_iotlb_gather *gather)
+{
+	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
+
+	riscv_iommu_iotlb_inval(domain, gather->start, gather->end);
+}
+
+static inline size_t get_page_size(size_t size)
+{
+	if (size >= IOMMU_PAGE_SIZE_512G)
+		return IOMMU_PAGE_SIZE_512G;
+	if (size >= IOMMU_PAGE_SIZE_1G)
+		return IOMMU_PAGE_SIZE_1G;
+	if (size >= IOMMU_PAGE_SIZE_2M)
+		return IOMMU_PAGE_SIZE_2M;
+	return IOMMU_PAGE_SIZE_4K;
+}
+
+#define _io_pte_present(pte)	((pte) & (_PAGE_PRESENT | _PAGE_PROT_NONE))
+#define _io_pte_leaf(pte)	((pte) & _PAGE_LEAF)
+#define _io_pte_none(pte)	((pte) == 0)
+#define _io_pte_entry(pn, prot)	((_PAGE_PFN_MASK & ((pn) << _PAGE_PFN_SHIFT)) | (prot))
+
+static void riscv_iommu_pte_free(struct riscv_iommu_domain *domain,
+				 unsigned long pte, struct list_head *freelist)
+{
+	unsigned long *ptr;
+	int i;
+
+	if (!_io_pte_present(pte) || _io_pte_leaf(pte))
+		return;
+
+	ptr = (unsigned long *)pfn_to_virt(__page_val_to_pfn(pte));
+
+	/* Recursively free all sub page table pages */
+	for (i = 0; i < PTRS_PER_PTE; i++) {
+		pte = READ_ONCE(ptr[i]);
+		if (!_io_pte_none(pte) && cmpxchg_relaxed(ptr + i, pte, 0) == pte)
+			riscv_iommu_pte_free(domain, pte, freelist);
+	}
+
+	if (freelist)
+		list_add_tail(&virt_to_page(ptr)->lru, freelist);
+	else
+		free_page((unsigned long)ptr);
+}
+
+static unsigned long *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
+					    unsigned long iova, size_t pgsize, gfp_t gfp)
+{
+	unsigned long *ptr = (unsigned long *)domain->pgd_root;
+	unsigned long pte, old;
+	int level = domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2;
+	struct page *page;
+
+	do {
+		const int shift = PAGE_SHIFT + PT_SHIFT * level;
+
+		ptr += ((iova >> shift) & (PTRS_PER_PTE - 1));
+		/*
+		 * Note: returned entry might be a non-leaf if there was existing mapping
+		 * with smaller granularity. Up to the caller to replace and invalidate.
+		 */
+		if (((size_t)1 << shift) == pgsize)
+			return ptr;
+pte_retry:
+		pte = READ_ONCE(*ptr);
+		/*
+		 * This is very likely incorrect as we should not be adding new mapping
+		 * with smaller granularity on top of existing 2M/1G mapping. Fail.
+		 */
+		if (_io_pte_present(pte) && _io_pte_leaf(pte))
+			return NULL;
+		/*
+		 * Non-leaf entry is missing, allocate and try to add to the page table.
+		 * This might race with other mappings, retry on error.
+		 */
+		if (_io_pte_none(pte)) {
+			page = alloc_pages_node(domain->numa_node, __GFP_ZERO | gfp, 0);
+			if (!page)
+				return NULL;
+			old = pte;
+			pte = _io_pte_entry(page_to_pfn(page), _PAGE_TABLE);
+			if (cmpxchg_relaxed(ptr, old, pte) != old) {
+				__free_pages(page, 0);
+				goto pte_retry;
+			}
+		}
+		ptr = (unsigned long *)pfn_to_virt(__page_val_to_pfn(pte));
+	} while (level-- > 0);
+
+	return NULL;
+}
+
+static unsigned long *riscv_iommu_pte_fetch(struct riscv_iommu_domain *domain,
+					    unsigned long iova, size_t *pte_pgsize)
+{
+	unsigned long *ptr = (unsigned long *)domain->pgd_root;
+	unsigned long pte;
+	int level = domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2;
+
+	do {
+		const int shift = PAGE_SHIFT + PT_SHIFT * level;
+
+		ptr += ((iova >> shift) & (PTRS_PER_PTE - 1));
+		pte = READ_ONCE(*ptr);
+		if (_io_pte_present(pte) && _io_pte_leaf(pte)) {
+			*pte_pgsize = (size_t)1 << shift;
+			return ptr;
+		}
+		if (_io_pte_none(pte))
+			return NULL;
+		ptr = (unsigned long *)pfn_to_virt(__page_val_to_pfn(pte));
+	} while (level-- > 0);
+
+	return NULL;
+}
+
+static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
+				 unsigned long iova, phys_addr_t phys,
+				 size_t pgsize, size_t pgcount, int prot,
+				 gfp_t gfp, size_t *mapped)
+{
+	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
+	size_t size = 0;
+	size_t page_size = get_page_size(pgsize);
+	unsigned long *ptr;
+	unsigned long pte, old, pte_prot;
+
+	if (!(prot & IOMMU_WRITE))
+		pte_prot = _PAGE_BASE | _PAGE_READ;
+	else if (domain->amo_enabled)
+		pte_prot = _PAGE_BASE | _PAGE_READ | _PAGE_WRITE;
+	else
+		pte_prot = _PAGE_BASE | _PAGE_READ | _PAGE_WRITE | _PAGE_DIRTY;
+
+	while (pgcount) {
+		ptr = riscv_iommu_pte_alloc(domain, iova, page_size, gfp);
+		if (!ptr) {
+			*mapped = size;
+			return -ENOMEM;
+		}
+
+		old = READ_ONCE(*ptr);
+		pte = _io_pte_entry(phys_to_pfn(phys), pte_prot);
+		if (cmpxchg_relaxed(ptr, old, pte) != old)
+			continue;
+
+		/* TODO: non-leaf page invalidation is pending spec update */
+		riscv_iommu_pte_free(domain, old, NULL);
+
+		size += page_size;
+		iova += page_size;
+		phys += page_size;
+		--pgcount;
 	}
 
+	*mapped = size;
+
 	return 0;
 }
 
+static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
+				      unsigned long iova, size_t pgsize, size_t pgcount,
+				      struct iommu_iotlb_gather *gather)
+{
+	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
+	size_t size = pgcount << __ffs(pgsize);
+	unsigned long *ptr, old;
+	size_t unmapped = 0;
+	size_t pte_size;
+
+	while (unmapped < size) {
+		ptr = riscv_iommu_pte_fetch(domain, iova, &pte_size);
+		if (!ptr)
+			return unmapped;
+
+		/* partial unmap is not allowed, fail. */
+		if (iova & ~(pte_size - 1))
+			return unmapped;
+
+		old = READ_ONCE(*ptr);
+		if (cmpxchg_relaxed(ptr, old, 0) != old)
+			continue;
+
+		iommu_iotlb_gather_add_page(&domain->domain, gather, iova,
+					    pte_size);
+
+		iova += pte_size;
+		unmapped += pte_size;
+	}
+
+	return unmapped;
+}
+
+static phys_addr_t riscv_iommu_iova_to_phys(struct iommu_domain *iommu_domain, dma_addr_t iova)
+{
+	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
+	unsigned long pte_size;
+	unsigned long *ptr;
+
+	ptr = riscv_iommu_pte_fetch(domain, iova, &pte_size);
+	if (_io_pte_none(*ptr) || !_io_pte_present(*ptr))
+		return 0;
+
+	return pfn_to_phys(__page_val_to_pfn(*ptr)) | (iova & (pte_size - 1));
+}
+
+static void riscv_iommu_free_paging_domain(struct iommu_domain *iommu_domain)
+{
+	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
+
+	WARN_ON(!list_empty(&domain->bonds));
+
+	if (domain->pgd_root) {
+		const unsigned long pfn = virt_to_pfn(domain->pgd_root);
+
+		riscv_iommu_pte_free(domain, _io_pte_entry(pfn, _PAGE_TABLE), NULL);
+	}
+
+	if ((int)domain->pscid > 0)
+		ida_free(&riscv_iommu_pscids, domain->pscid);
+
+	kfree(domain);
+}
+
+static bool riscv_iommu_pt_supported(struct riscv_iommu_device *iommu, int pgd_mode)
+{
+	switch (pgd_mode) {
+	case RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39:
+		return iommu->caps & RISCV_IOMMU_CAP_S_SV39;
+
+	case RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV48:
+		return iommu->caps & RISCV_IOMMU_CAP_S_SV48;
+
+	case RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV57:
+		return iommu->caps & RISCV_IOMMU_CAP_S_SV57;
+	}
+	return false;
+}
+
+static int riscv_iommu_attach_paging_domain(struct iommu_domain *iommu_domain,
+					    struct device *dev)
+{
+	struct riscv_iommu_device *iommu = dev_to_iommu(dev);
+	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
+	struct page *page;
+
+	if (!riscv_iommu_pt_supported(iommu, domain->pgd_mode))
+		return -ENODEV;
+
+	domain->numa_node = dev_to_node(iommu->dev);
+	domain->amo_enabled = !!(iommu->caps & RISCV_IOMMU_CAP_AMO_HWAD);
+
+	if (!domain->pgd_root) {
+		page = alloc_pages_node(domain->numa_node,
+					GFP_KERNEL_ACCOUNT | __GFP_ZERO, 0);
+		if (!page)
+			return -ENOMEM;
+		domain->pgd_root = (unsigned long)page_to_virt(page);
+	}
+
+	return riscv_iommu_attach_domain(iommu, dev, iommu_domain);
+}
+
+static const struct iommu_domain_ops riscv_iommu_paging_domain_ops = {
+	.attach_dev = riscv_iommu_attach_paging_domain,
+	.free = riscv_iommu_free_paging_domain,
+	.map_pages = riscv_iommu_map_pages,
+	.unmap_pages = riscv_iommu_unmap_pages,
+	.iova_to_phys = riscv_iommu_iova_to_phys,
+	.iotlb_sync = riscv_iommu_iotlb_sync,
+	.flush_iotlb_all = riscv_iommu_flush_iotlb_all,
+};
+
+static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
+{
+	struct riscv_iommu_domain *domain;
+
+	domain = kzalloc(sizeof(*domain), GFP_KERNEL);
+	if (!domain)
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD_RCU(&domain->bonds);
+
+	domain->pscid = ida_alloc_range(&riscv_iommu_pscids, 1,
+					RISCV_IOMMU_MAX_PSCID - 1, GFP_KERNEL);
+	if (domain->pscid < 0) {
+		kfree(domain);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	/*
+	 * Note: RISC-V Privileged spec mandates that virtual addresses
+	 * need to be sign-extended, so if (VA_BITS - 1) is set, all
+	 * bits >= VA_BITS need to also be set or else we'll get a
+	 * page fault. However the code that creates the mappings
+	 * above us (e.g. iommu_dma_alloc_iova()) won't do that for us
+	 * for now, so we'll end up with invalid virtual addresses
+	 * to map. As a workaround until we get this sorted out,
+	 * limit the available virtual addresses to VA_BITS - 1.
+	 */
+	domain->domain.geometry.aperture_start = 0;
+	domain->domain.geometry.aperture_end = DMA_BIT_MASK(VA_BITS - 1);
+	domain->domain.geometry.force_aperture = true;
+
+	/*
+	 * Follow system address translation mode.
+	 * RISC-V IOMMU ATP mode values match RISC-V CPU SATP mode values.
+	 */
+	domain->pgd_mode = satp_mode >> SATP_MODE_SHIFT;
+	domain->numa_node = NUMA_NO_NODE;
+	domain->domain.ops = &riscv_iommu_paging_domain_ops;
+
+	return &domain->domain;
+}
+
 static int riscv_iommu_attach_identity_domain(struct iommu_domain *iommu_domain,
 					      struct device *dev)
 {
@@ -814,7 +1275,7 @@ static struct iommu_domain riscv_iommu_identity_domain = {
 
 static int riscv_iommu_device_domain_type(struct device *dev)
 {
-	return IOMMU_DOMAIN_IDENTITY;
+	return 0;
 }
 
 static struct iommu_group *riscv_iommu_device_group(struct device *dev)
@@ -858,8 +1319,10 @@ static void riscv_iommu_release_device(struct device *dev)
 
 static const struct iommu_ops riscv_iommu_ops = {
 	.owner = THIS_MODULE,
+	.pgsize_bitmap = SZ_4K,
 	.of_xlate = riscv_iommu_of_xlate,
 	.identity_domain = &riscv_iommu_identity_domain,
+	.domain_alloc_paging = riscv_iommu_alloc_paging_domain,
 	.def_domain_type = riscv_iommu_device_domain_type,
 	.device_group = riscv_iommu_device_group,
 	.probe_device = riscv_iommu_probe_device,