From: Pasha Tatashin
To: akpm@linux-foundation.org, linux-mm@kvack.org, pasha.tatashin@soleen.com, linux-kernel@vger.kernel.org, rientjes@google.com, dwmw2@infradead.org, baolu.lu@linux.intel.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, iommu@lists.linux.dev
Subject: [RFC 1/3] iommu/intel: Use page->refcount to count number of entries in IOMMU
Date: Thu, 21 Dec 2023 03:19:13 +0000
Message-ID: <20231221031915.619337-2-pasha.tatashin@soleen.com>
In-Reply-To: <20231221031915.619337-1-pasha.tatashin@soleen.com>
References: <20231221031915.619337-1-pasha.tatashin@soleen.com>
X-Patchwork-Id: 13500956

In order to be able to efficiently free empty page table levels, count the number of entries in each page table by incrementing and decrementing the page's refcount every time a PTE is inserted into or removed from the page table.

For this to work correctly, add dma_set_pte() and rework dma_clear_pte() so that both helpers perform the counting. Also, modify the code so that every page table entry is always updated through these two functions.
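The counting scheme can be modelled outside the kernel. The following is a minimal userspace sketch, not the driver code: `struct pt_page`, `pt_page_init()`, `pt_set_pte()`, `pt_clear_pte()` and `pt_page_empty()` are hypothetical names, C11 atomics stand in for the kernel's `cmpxchg64()`, `xchg()` and `page_ref_*()` helpers, and the baseline count of 1 assumes the reference a table page holds from its allocation.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PTES_PER_PAGE 512 /* assumed: 4 KiB table page / 8-byte entries */
/* Same present test as the patch's DMA_PTEVAL_PRESENT() (bits 0-1). */
#define PTEVAL_PRESENT(v) (((v) & 3) != 0)

/* Hypothetical userspace stand-in for one table page plus its struct page. */
struct pt_page {
	_Atomic uint64_t pte[PTES_PER_PAGE];
	atomic_int refcount; /* 1 (allocation) + number of present entries */
};

static void pt_page_init(struct pt_page *pg)
{
	for (int i = 0; i < PTES_PER_PAGE; i++)
		atomic_init(&pg->pte[i], 0);
	atomic_init(&pg->refcount, 1);
}

/*
 * Install pteval only if the slot is empty (0 -> pteval), in the spirit of
 * the cmpxchg64() in dma_set_pte(). Callers must pass a present value (the
 * patch WARNs otherwise). Returns 0 on success, or the value another writer
 * already installed, which is not counted again.
 */
static uint64_t pt_set_pte(struct pt_page *pg, int idx, uint64_t pteval)
{
	uint64_t expected = 0;

	if (atomic_compare_exchange_strong(&pg->pte[idx], &expected, pteval))
		atomic_fetch_add(&pg->refcount, 1);
	return expected; /* still 0 on success, old value on failure */
}

/*
 * Atomically clear a slot; only a present old value drops the count.
 * (The patch's dma_clear_pte() also WARNs on a non-present entry.)
 */
static void pt_clear_pte(struct pt_page *pg, int idx)
{
	uint64_t old = atomic_exchange(&pg->pte[idx], 0);

	if (PTEVAL_PRESENT(old))
		atomic_fetch_sub(&pg->refcount, 1);
}

/* The table has no present entries once only the base reference is left. */
static bool pt_page_empty(struct pt_page *pg)
{
	return atomic_load(&pg->refcount) == 1;
}
```

Setting two distinct slots raises the count from 1 to 3; a second set on an occupied slot returns the existing value without counting it, and clearing both slots brings the count back to 1, which is the point at which such a table could be freed.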
Signed-off-by: Pasha Tatashin
---
 drivers/iommu/intel/iommu.c | 40 +++++++++++++++++++++---------------
 drivers/iommu/intel/iommu.h | 41 +++++++++++++++++++++++++++++++------
 2 files changed, 58 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 897159dba47d..4688ef797161 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -949,7 +949,7 @@ static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain,
 			if (domain->use_first_level)
 				pteval |= DMA_FL_PTE_XD | DMA_FL_PTE_US | DMA_FL_PTE_ACCESS;
 
-			if (cmpxchg64(&pte->val, 0ULL, pteval))
+			if (dma_set_pte(pte, pteval))
 				/* Someone else set it while we were thinking; use theirs. */
 				free_pgtable_page(tmp_page);
 			else
@@ -1021,7 +1021,8 @@ static void dma_pte_clear_range(struct dmar_domain *domain,
 			continue;
 		}
 		do {
-			dma_clear_pte(pte);
+			if (dma_pte_present(pte))
+				dma_clear_pte(pte);
 			start_pfn += lvl_to_nr_pages(large_page);
 			pte++;
 		} while (start_pfn <= last_pfn && !first_pte_in_page(pte));
@@ -1062,7 +1063,8 @@ static void dma_pte_free_level(struct dmar_domain *domain, int level,
 		 */
 		if (level < retain_level && !(start_pfn > level_pfn ||
 		      last_pfn < level_pfn + level_size(level) - 1)) {
-			dma_clear_pte(pte);
+			if (dma_pte_present(pte))
+				dma_clear_pte(pte);
 			domain_flush_cache(domain, pte, sizeof(*pte));
 			free_pgtable_page(level_pte);
 		}
@@ -1093,12 +1095,13 @@ static void dma_pte_free_pagetable(struct dmar_domain *domain,
 	}
 }
 
-/* When a page at a given level is being unlinked from its parent, we don't
-   need to *modify* it at all. All we need to do is make a list of all the
-   pages which can be freed just as soon as we've flushed the IOTLB and we
-   know the hardware page-walk will no longer touch them.
-   The 'pte' argument is the *parent* PTE, pointing to the page that is to
-   be freed. */
+/*
+ * A given page at a given level is being unlinked from its parent.
+ * We need to make a list of all the pages which can be freed just as soon as
+ * we've flushed the IOTLB and we know the hardware page-walk will no longer
+ * touch them. The 'pte' argument is the *parent* PTE, pointing to the page
+ * that is to be freed.
+ */
 static void dma_pte_list_pagetables(struct dmar_domain *domain,
 				    int level, struct dma_pte *pte,
 				    struct list_head *freelist)
@@ -1106,17 +1109,20 @@ static void dma_pte_list_pagetables(struct dmar_domain *domain,
 	struct page *pg;
 
 	pg = pfn_to_page(dma_pte_addr(pte) >> PAGE_SHIFT);
-	list_add_tail(&pg->lru, freelist);
-
-	if (level == 1)
-		return;
-
 	pte = page_address(pg);
+
 	do {
-		if (dma_pte_present(pte) && !dma_pte_superpage(pte))
-			dma_pte_list_pagetables(domain, level - 1, pte, freelist);
+		if (dma_pte_present(pte)) {
+			if (level > 1 && !dma_pte_superpage(pte)) {
+				dma_pte_list_pagetables(domain, level - 1, pte,
+							freelist);
+			}
+			dma_clear_pte(pte);
+		}
 		pte++;
 	} while (!first_pte_in_page(pte));
+
+	list_add_tail(&pg->lru, freelist);
 }
 
 static void dma_pte_clear_level(struct dmar_domain *domain, int level,
@@ -2244,7 +2250,7 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 		/* We don't need lock here, nobody else
 		 * touches the iova range
 		 */
-		tmp = cmpxchg64_local(&pte->val, 0ULL, pteval);
+		tmp = dma_set_pte(pte, pteval);
 		if (tmp) {
 			static int dumps = 5;
 			pr_crit("ERROR: DMA PTE for vPFN 0x%lx already set (to %llx not %llx)\n",
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index ce030c5b5772..f1ea508f45bd 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -802,11 +802,6 @@ struct dma_pte {
 	u64 val;
 };
 
-static inline void dma_clear_pte(struct dma_pte *pte)
-{
-	pte->val = 0;
-}
-
 static inline u64 dma_pte_addr(struct dma_pte *pte)
 {
 #ifdef CONFIG_64BIT
@@ -818,9 +813,43 @@ static inline u64 dma_pte_addr(struct dma_pte *pte)
 #endif
 }
 
+#define DMA_PTEVAL_PRESENT(pteval) (((pteval) & 3) != 0)
 static inline bool dma_pte_present(struct dma_pte *pte)
 {
-	return (pte->val & 3) != 0;
+	return DMA_PTEVAL_PRESENT(pte->val);
+}
+
+static inline void dma_clear_pte(struct dma_pte *pte)
+{
+	u64 old_pteval;
+
+	old_pteval = xchg(&pte->val, 0ULL);
+	if (DMA_PTEVAL_PRESENT(old_pteval)) {
+		struct page *pg = virt_to_page(pte);
+		int rc = page_ref_dec_return(pg);
+
+		WARN_ON_ONCE(rc > 512 || rc < 1);
+	} else {
+		/* Ensure that we cleared a valid entry from the page table */
+		WARN_ON(1);
+	}
+}
+
+static inline u64 dma_set_pte(struct dma_pte *pte, u64 pteval)
+{
+	u64 old_pteval;
+
+	/* Ensure we are about to set a valid entry in the page table */
+	WARN_ON(!DMA_PTEVAL_PRESENT(pteval));
+	old_pteval = cmpxchg64(&pte->val, 0ULL, pteval);
+	if (old_pteval == 0) {
+		struct page *pg = virt_to_page(pte);
+		int rc = page_ref_inc_return(pg);
+
+		WARN_ON_ONCE(rc > 513 || rc < 2);
+	}
+
+	return old_pteval;
 }
 
 static inline bool dma_sl_pte_test_and_clear_dirty(struct dma_pte *pte,
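The bounds checked by the two WARN_ON_ONCE() calls follow from the table geometry. The sketch below restates that arithmetic, assuming a 4 KiB table page, 8-byte entries and a baseline refcount of 1 from allocation; `rc_ok_after_set()`, `rc_ok_after_clear()` and `table_is_empty()` are hypothetical helpers that express the patch's warning conditions as positive checks, not functions from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB table pages holding 8-byte entries. */
#define TABLE_BYTES       4096
#define ENTRIES_PER_TABLE ((int)(TABLE_BYTES / sizeof(uint64_t)))

/* Assumption: a freshly allocated table page carries one reference. */
#define BASE_REFCOUNT 1

_Static_assert(TABLE_BYTES / sizeof(uint64_t) == 512,
	       "a 4 KiB table holds 512 entries");

/* Mirrors !(rc > 513 || rc < 2) in dma_set_pte(): 1..512 entries present. */
static bool rc_ok_after_set(int rc)
{
	return rc >= BASE_REFCOUNT + 1 && rc <= BASE_REFCOUNT + ENTRIES_PER_TABLE;
}

/* Mirrors !(rc > 512 || rc < 1) in dma_clear_pte(): 0..511 entries left. */
static bool rc_ok_after_clear(int rc)
{
	return rc >= BASE_REFCOUNT && rc <= BASE_REFCOUNT + ENTRIES_PER_TABLE - 1;
}

/* Only the allocation reference remains: the table has no present entries. */
static bool table_is_empty(int rc)
{
	return rc == BASE_REFCOUNT;
}
```

Under these assumptions a successful set can never leave the count below 2 (at least the new entry) or above 513 (all 512 slots), and a clear can never leave it below 1 or above 512, which matches the limits hard-coded in the patch.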