From patchwork Thu Dec 21 03:19:13 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pasha Tatashin
X-Patchwork-Id: 13500956
From: Pasha Tatashin
To: akpm@linux-foundation.org, linux-mm@kvack.org, pasha.tatashin@soleen.com, linux-kernel@vger.kernel.org, rientjes@google.com, dwmw2@infradead.org, baolu.lu@linux.intel.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, iommu@lists.linux.dev
Subject: [RFC 1/3] iommu/intel: Use page->refcount to count number of entries in IOMMU
Date: Thu, 21 Dec 2023 03:19:13 +0000
Message-ID: <20231221031915.619337-2-pasha.tatashin@soleen.com>
X-Mailer: git-send-email 2.43.0.472.g3155946c3a-goog
In-Reply-To: <20231221031915.619337-1-pasha.tatashin@soleen.com>
References: <20231221031915.619337-1-pasha.tatashin@soleen.com>

In order to efficiently free empty page table levels, count the number of
entries in each page table by incrementing and decrementing the refcount of
the page that backs it every time a PTE is inserted into or removed from the
page table.

For this to work correctly, add two helper functions, dma_set_pte() and
dma_clear_pte(), where the counting is performed. Also, modify the code so
that every page table entry is always updated using these two functions.

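For illustration only (not part of this patch): a minimal userspace sketch
of the counting scheme, assuming C11 atomics in place of the kernel's
cmpxchg64()/xchg() and page_ref_*() helpers. Every sketch_* name is
hypothetical. A table is assumed to start at refcount 1 when allocated, so
refcount - 1 is the number of present entries and refcount == 1 means the
table is empty.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ENTRIES 512

/* Hypothetical stand-in for one page-table page plus its page refcount. */
struct sketch_table {
	_Atomic uint64_t entry[SKETCH_ENTRIES];
	atomic_int refcount;	/* 1 == allocated and empty */
};

/* An entry counts as present when its read or write bit is set. */
static inline bool sketch_present(uint64_t val)
{
	return (val & 3) != 0;
}

static void sketch_table_init(struct sketch_table *t)
{
	for (int i = 0; i < SKETCH_ENTRIES; i++)
		atomic_init(&t->entry[i], 0);
	atomic_init(&t->refcount, 1);
}

/*
 * Install val only if the slot is empty. Returns the previous value:
 * 0 means we installed the entry and took one reference.
 */
static uint64_t sketch_set(struct sketch_table *t, int idx, uint64_t val)
{
	uint64_t expected = 0;

	assert(sketch_present(val));
	if (atomic_compare_exchange_strong(&t->entry[idx], &expected, val)) {
		atomic_fetch_add(&t->refcount, 1);
		return 0;
	}
	return expected;	/* someone else got there first */
}

/* Clear a slot and drop the reference that its entry was holding. */
static void sketch_clear(struct sketch_table *t, int idx)
{
	uint64_t old = atomic_exchange(&t->entry[idx], 0);

	if (sketch_present(old))
		atomic_fetch_sub(&t->refcount, 1);
}

static bool sketch_empty(struct sketch_table *t)
{
	return atomic_load(&t->refcount) == 1;
}
```

Because the count changes only when the exchange actually installs or
removes a present entry, a racing second sketch_set() on the same slot
leaves the count untouched.
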
Signed-off-by: Pasha Tatashin
---
 drivers/iommu/intel/iommu.c | 40 +++++++++++++++++++++++-----------------
 drivers/iommu/intel/iommu.h | 41 +++++++++++++++++++++++++++++++++++------
 2 files changed, 58 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 897159dba47d..4688ef797161 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -949,7 +949,7 @@ static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain,
 			if (domain->use_first_level)
 				pteval |= DMA_FL_PTE_XD | DMA_FL_PTE_US | DMA_FL_PTE_ACCESS;
 
-			if (cmpxchg64(&pte->val, 0ULL, pteval))
+			if (dma_set_pte(pte, pteval))
 				/* Someone else set it while we were thinking; use theirs. */
 				free_pgtable_page(tmp_page);
 			else
@@ -1021,7 +1021,8 @@ static void dma_pte_clear_range(struct dmar_domain *domain,
 			continue;
 		}
 		do {
-			dma_clear_pte(pte);
+			if (dma_pte_present(pte))
+				dma_clear_pte(pte);
 			start_pfn += lvl_to_nr_pages(large_page);
 			pte++;
 		} while (start_pfn <= last_pfn && !first_pte_in_page(pte));
@@ -1062,7 +1063,8 @@ static void dma_pte_free_level(struct dmar_domain *domain, int level,
 		 */
 		if (level < retain_level && !(start_pfn > level_pfn ||
 		      last_pfn < level_pfn + level_size(level) - 1)) {
-			dma_clear_pte(pte);
+			if (dma_pte_present(pte))
+				dma_clear_pte(pte);
 			domain_flush_cache(domain, pte, sizeof(*pte));
 			free_pgtable_page(level_pte);
 		}
@@ -1093,12 +1095,13 @@ static void dma_pte_free_pagetable(struct dmar_domain *domain,
 	}
 }
 
-/* When a page at a given level is being unlinked from its parent, we don't
-   need to *modify* it at all. All we need to do is make a list of all the
-   pages which can be freed just as soon as we've flushed the IOTLB and we
-   know the hardware page-walk will no longer touch them.
-   The 'pte' argument is the *parent* PTE, pointing to the page that is to
-   be freed. */
+/*
+ * A given page at a given level is being unlinked from its parent.
+ * We need to make a list of all the pages which can be freed just as soon as
+ * we've flushed the IOTLB and we know the hardware page-walk will no longer
+ * touch them. The 'pte' argument is the *parent* PTE, pointing to the page
+ * that is to be freed.
+ */
 static void dma_pte_list_pagetables(struct dmar_domain *domain,
 				    int level, struct dma_pte *pte,
 				    struct list_head *freelist)
@@ -1106,17 +1109,20 @@ static void dma_pte_list_pagetables(struct dmar_domain *domain,
 	struct page *pg;
 
 	pg = pfn_to_page(dma_pte_addr(pte) >> PAGE_SHIFT);
-	list_add_tail(&pg->lru, freelist);
-
-	if (level == 1)
-		return;
-
 	pte = page_address(pg);
+
 	do {
-		if (dma_pte_present(pte) && !dma_pte_superpage(pte))
-			dma_pte_list_pagetables(domain, level - 1, pte, freelist);
+		if (dma_pte_present(pte)) {
+			if (level > 1 && !dma_pte_superpage(pte)) {
+				dma_pte_list_pagetables(domain, level - 1, pte,
+							freelist);
+			}
+			dma_clear_pte(pte);
+		}
 		pte++;
 	} while (!first_pte_in_page(pte));
+
+	list_add_tail(&pg->lru, freelist);
 }
 
 static void dma_pte_clear_level(struct dmar_domain *domain, int level,
@@ -2244,7 +2250,7 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 		/* We don't need lock here, nobody else
 		 * touches the iova range
 		 */
-		tmp = cmpxchg64_local(&pte->val, 0ULL, pteval);
+		tmp = dma_set_pte(pte, pteval);
 		if (tmp) {
 			static int dumps = 5;
 			pr_crit("ERROR: DMA PTE for vPFN 0x%lx already set (to %llx not %llx)\n",
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index ce030c5b5772..f1ea508f45bd 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -802,11 +802,6 @@ struct dma_pte {
 	u64 val;
 };
 
-static inline void dma_clear_pte(struct dma_pte *pte)
-{
-	pte->val = 0;
-}
-
 static inline u64 dma_pte_addr(struct dma_pte *pte)
 {
 #ifdef CONFIG_64BIT
@@ -818,9 +813,43 @@ static inline u64 dma_pte_addr(struct dma_pte *pte)
 #endif
 }
 
+#define DMA_PTEVAL_PRESENT(pteval) (((pteval) & 3) != 0)
 static inline bool dma_pte_present(struct dma_pte *pte)
 {
-	return (pte->val & 3) != 0;
+	return DMA_PTEVAL_PRESENT(pte->val);
+}
+
+static inline void dma_clear_pte(struct dma_pte *pte)
+{
+	u64 old_pteval;
+
+	old_pteval = xchg(&pte->val, 0ULL);
+	if (DMA_PTEVAL_PRESENT(old_pteval)) {
+		struct page *pg = virt_to_page(pte);
+		int rc = page_ref_dec_return(pg);
+
+		WARN_ON_ONCE(rc > 512 || rc < 1);
+	} else {
+		/* Ensure that we cleared a valid entry from the page table */
+		WARN_ON(1);
+	}
+}
+
+static inline u64 dma_set_pte(struct dma_pte *pte, u64 pteval)
+{
+	u64 old_pteval;
+
+	/* Ensure we about to set a valid entry to the page table */
+	WARN_ON(!DMA_PTEVAL_PRESENT(pteval));
+	old_pteval = cmpxchg64(&pte->val, 0ULL, pteval);
+	if (old_pteval == 0) {
+		struct page *pg = virt_to_page(pte);
+		int rc = page_ref_inc_return(pg);
+
+		WARN_ON_ONCE(rc > 513 || rc < 2);
+	}
+
+	return old_pteval;
 }
 
 static inline bool dma_sl_pte_test_and_clear_dirty(struct dma_pte *pte,

From patchwork Thu Dec 21 03:19:14 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pasha Tatashin
X-Patchwork-Id: 13500957
From: Pasha Tatashin
To: akpm@linux-foundation.org, linux-mm@kvack.org, pasha.tatashin@soleen.com, linux-kernel@vger.kernel.org, rientjes@google.com, dwmw2@infradead.org, baolu.lu@linux.intel.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, iommu@lists.linux.dev
Subject: [RFC 2/3] iommu/intel: synchronize page table map and unmap operations
Date: Thu, 21 Dec 2023 03:19:14 +0000
Message-ID: <20231221031915.619337-3-pasha.tatashin@soleen.com>
X-Mailer: git-send-email 2.43.0.472.g3155946c3a-goog
In-Reply-To: <20231221031915.619337-1-pasha.tatashin@soleen.com>
References: <20231221031915.619337-1-pasha.tatashin@soleen.com>

Since we are going to update parent page table entries when lower-level page
tables become empty and are added to the free list, we need a way to
synchronize these operations. Use domain->pgd_lock to protect all map and
unmap operations. This is a reader/writer lock.
Initially, every path takes the lock in read mode; a writer section is added
later, when freeing of page tables on unmap is introduced.

Signed-off-by: Pasha Tatashin
---
 drivers/iommu/intel/iommu.c | 21 +++++++++++++++++++--
 drivers/iommu/intel/iommu.h |  3 +++
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 4688ef797161..733f25b277a3 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1082,11 +1082,13 @@ static void dma_pte_free_pagetable(struct dmar_domain *domain,
 				   unsigned long last_pfn,
 				   int retain_level)
 {
+	read_lock(&domain->pgd_lock);
 	dma_pte_clear_range(domain, start_pfn, last_pfn);
 
 	/* We don't need lock here; nobody else touches the iova range */
 	dma_pte_free_level(domain, agaw_to_level(domain->agaw), retain_level,
 			   domain->pgd, 0, start_pfn, last_pfn);
+	read_unlock(&domain->pgd_lock);
 
 	/* free pgd */
 	if (start_pfn == 0 && last_pfn == DOMAIN_MAX_PFN(domain->gaw)) {
@@ -1179,9 +1181,11 @@ static void domain_unmap(struct dmar_domain *domain, unsigned long start_pfn,
 	    WARN_ON(start_pfn > last_pfn))
 		return;
 
+	read_lock(&domain->pgd_lock);
 	/* we don't need lock here; nobody else touches the iova range */
 	dma_pte_clear_level(domain, agaw_to_level(domain->agaw),
 			    domain->pgd, 0, start_pfn, last_pfn, freelist);
+	read_unlock(&domain->pgd_lock);
 
 	/* free pgd */
 	if (start_pfn == 0 && last_pfn == DOMAIN_MAX_PFN(domain->gaw)) {
@@ -2217,6 +2221,7 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 
 	pteval = ((phys_addr_t)phys_pfn << VTD_PAGE_SHIFT) | attr;
 
+	read_lock(&domain->pgd_lock);
 	while (nr_pages > 0) {
 		uint64_t tmp;
 
@@ -2226,8 +2231,10 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 
 			pte = pfn_to_dma_pte(domain, iov_pfn, &largepage_lvl,
 					     gfp);
-			if (!pte)
+			if (!pte) {
+				read_unlock(&domain->pgd_lock);
 				return -ENOMEM;
+			}
 			first_pte = pte;
 
 			lvl_pages = lvl_to_nr_pages(largepage_lvl);
@@ -2287,6 +2294,7 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 			pte = NULL;
 		}
 	}
+	read_unlock(&domain->pgd_lock);
 
 	return 0;
 }
@@ -4013,6 +4021,7 @@ static int md_domain_init(struct dmar_domain *domain, int guest_width)
 	domain->pgd = alloc_pgtable_page(domain->nid, GFP_ATOMIC);
 	if (!domain->pgd)
 		return -ENOMEM;
+	rwlock_init(&domain->pgd_lock);
 	domain_flush_cache(domain, domain->pgd, PAGE_SIZE);
 	return 0;
 }
@@ -4247,11 +4256,15 @@ static size_t intel_iommu_unmap(struct iommu_domain *domain,
 	unsigned long start_pfn, last_pfn;
 	int level = 0;
 
+	read_lock(&dmar_domain->pgd_lock);
 	/* Cope with horrid API which requires us to unmap more than the
 	   size argument if it happens to be a large-page mapping. */
 	if (unlikely(!pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT,
-				     &level, GFP_ATOMIC)))
+				     &level, GFP_ATOMIC))) {
+		read_unlock(&dmar_domain->pgd_lock);
 		return 0;
+	}
+	read_unlock(&dmar_domain->pgd_lock);
 
 	if (size < VTD_PAGE_SIZE << level_to_offset_bits(level))
 		size = VTD_PAGE_SIZE << level_to_offset_bits(level);
@@ -4315,8 +4328,10 @@ static phys_addr_t intel_iommu_iova_to_phys(struct iommu_domain *domain,
 	int level = 0;
 	u64 phys = 0;
 
+	read_lock(&dmar_domain->pgd_lock);
 	pte = pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT, &level,
 			     GFP_ATOMIC);
+	read_unlock(&dmar_domain->pgd_lock);
 	if (pte && dma_pte_present(pte))
 		phys = dma_pte_addr(pte) +
 			(iova & (BIT_MASK(level_to_offset_bits(level) +
@@ -4919,8 +4934,10 @@ static int intel_iommu_read_and_clear_dirty(struct iommu_domain *domain,
 		struct dma_pte *pte;
 		int lvl = 0;
 
+		read_lock(&dmar_domain->pgd_lock);
 		pte = pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT, &lvl,
 				     GFP_ATOMIC);
+		read_unlock(&dmar_domain->pgd_lock);
 		pgsize = level_size(lvl) << VTD_PAGE_SHIFT;
 		if (!pte || !dma_pte_present(pte)) {
 			iova += pgsize;
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index f1ea508f45bd..cb0577ec5166 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -618,6 +618,9 @@ struct dmar_domain {
 		struct {
 			/* virtual address */
 			struct dma_pte *pgd;
+
+			/* Synchronizes pgd map/unmap operations */
+			rwlock_t pgd_lock;
 			/* max guest address width */
 			int gaw;
 			/*

From patchwork Thu Dec 21 03:19:15 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pasha Tatashin
X-Patchwork-Id: 13500958
From: Pasha Tatashin
To: akpm@linux-foundation.org, linux-mm@kvack.org, pasha.tatashin@soleen.com, linux-kernel@vger.kernel.org, rientjes@google.com, dwmw2@infradead.org, baolu.lu@linux.intel.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, iommu@lists.linux.dev
Subject: [RFC 3/3] iommu/intel: free empty page tables on unmaps
Date: Thu, 21 Dec 2023 03:19:15 +0000
Message-ID: <20231221031915.619337-4-pasha.tatashin@soleen.com>
X-Mailer: git-send-email 2.43.0.472.g3155946c3a-goog
In-Reply-To: <20231221031915.619337-1-pasha.tatashin@soleen.com>
References: <20231221031915.619337-1-pasha.tatashin@soleen.com>

When page tables become empty, add them to the freelist so that they can
also be freed.

This means that page tables outside of the immediate IOVA range might be
freed as well; therefore, take the writer lock only in the case where such
page tables are going to be freed.

Signed-off-by: Pasha Tatashin
---
 drivers/iommu/intel/iommu.c | 92 +++++++++++++++++++++++++++++++------
 1 file changed, 78 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 733f25b277a3..141dc106fb01 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1130,7 +1130,7 @@ static void dma_pte_list_pagetables(struct dmar_domain *domain,
 static void dma_pte_clear_level(struct dmar_domain *domain, int level,
 				struct dma_pte *pte, unsigned long pfn,
 				unsigned long start_pfn, unsigned long last_pfn,
-				struct list_head *freelist)
+				struct list_head *freelist, int *freed_level)
 {
 	struct dma_pte *first_pte = NULL, *last_pte = NULL;
 
@@ -1156,11 +1156,48 @@ static void dma_pte_clear_level(struct dmar_domain *domain, int level,
 				first_pte = pte;
 			last_pte = pte;
 		} else if (level > 1) {
+			struct dma_pte *npte = phys_to_virt(dma_pte_addr(pte));
+			struct page *npage = virt_to_page(npte);
+
 			/* Recurse down into a level that isn't *entirely* obsolete */
-			dma_pte_clear_level(domain, level - 1,
-					    phys_to_virt(dma_pte_addr(pte)),
+			dma_pte_clear_level(domain, level - 1, npte,
 					    level_pfn, start_pfn, last_pfn,
-					    freelist);
+					    freelist, freed_level);
+
+			/*
+			 * Free next level page table if it became empty.
+			 *
+			 * We are only holding the reader lock, and it is possible
+			 * that other threads are accessing the page table as
+			 * readers as well. We can free a page table that
+			 * is outside of the requested IOVA space only if
+			 * we grab the writer lock. Since we need to drop the reader
+			 * lock, we increment the refcount of the npage
+			 * so it (and the current page table) does not
+			 * disappear due to concurrent unmapping threads.
+			 *
+			 * Store the maximum size of the freed page table
+			 * into freed_level, so the size of the IOTLB flush
+			 * can be determined.
+			 */
+			if (freed_level && page_count(npage) == 1) {
+				page_ref_inc(npage);
+				read_unlock(&domain->pgd_lock);
+				write_lock(&domain->pgd_lock);
+				if (page_count(npage) == 2) {
+					dma_clear_pte(pte);
+
+					if (!first_pte)
+						first_pte = pte;
+
+					last_pte = pte;
+					list_add_tail(&npage->lru, freelist);
+					*freed_level = level;
+				}
+				write_unlock(&domain->pgd_lock);
+				read_lock(&domain->pgd_lock);
+				page_ref_dec(npage);
+			}
 		}
 next:
 		pfn = level_pfn + level_size(level);
@@ -1175,7 +1212,8 @@ static void dma_pte_clear_level(struct dmar_domain *domain, int level,
    the page tables, and may have cached the intermediate levels. The
    pages can only be freed after the IOTLB flush has been done. */
 static void domain_unmap(struct dmar_domain *domain, unsigned long start_pfn,
-			 unsigned long last_pfn, struct list_head *freelist)
+			 unsigned long last_pfn, struct list_head *freelist,
+			 int *level)
 {
 	if (WARN_ON(!domain_pfn_supported(domain, last_pfn)) ||
 	    WARN_ON(start_pfn > last_pfn))
@@ -1184,7 +1222,8 @@ static void domain_unmap(struct dmar_domain *domain, unsigned long start_pfn,
 	read_lock(&domain->pgd_lock);
 	/* we don't need lock here; nobody else touches the iova range */
 	dma_pte_clear_level(domain, agaw_to_level(domain->agaw),
-			    domain->pgd, 0, start_pfn, last_pfn, freelist);
+			    domain->pgd, 0, start_pfn, last_pfn, freelist,
+			    level);
 	read_unlock(&domain->pgd_lock);
 
 	/* free pgd */
@@ -1524,11 +1563,11 @@ static void domain_flush_pasid_iotlb(struct intel_iommu *iommu,
 
 static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
 				  struct dmar_domain *domain,
-				  unsigned long pfn, unsigned int pages,
+				  unsigned long pfn, unsigned long pages,
 				  int ih, int map)
 {
-	unsigned int aligned_pages = __roundup_pow_of_two(pages);
-	unsigned int mask = ilog2(aligned_pages);
+	unsigned long aligned_pages = __roundup_pow_of_two(pages);
+	unsigned long mask = ilog2(aligned_pages);
 	uint64_t addr = (uint64_t)pfn << VTD_PAGE_SHIFT;
 	u16 did = domain_id_iommu(domain, iommu);
 
@@ -1872,7 +1911,8 @@ static void domain_exit(struct dmar_domain *domain)
 	if (domain->pgd) {
 		LIST_HEAD(freelist);
 
-		domain_unmap(domain, 0, DOMAIN_MAX_PFN(domain->gaw), &freelist);
+		domain_unmap(domain, 0, DOMAIN_MAX_PFN(domain->gaw), &freelist,
+			     NULL);
 		put_pages_list(&freelist);
 	}
 
@@ -3579,7 +3619,8 @@ static int intel_iommu_memory_notifier(struct notifier_block *nb,
 			struct intel_iommu *iommu;
 			LIST_HEAD(freelist);
 
-			domain_unmap(si_domain, start_vpfn, last_vpfn, &freelist);
+			domain_unmap(si_domain, start_vpfn, last_vpfn,
+				     &freelist, NULL);
 
 			rcu_read_lock();
 			for_each_active_iommu(iommu, drhd)
@@ -4253,6 +4294,7 @@ static size_t intel_iommu_unmap(struct iommu_domain *domain,
 				struct iommu_iotlb_gather *gather)
 {
 	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	bool queued = iommu_iotlb_gather_queued(gather);
 	unsigned long start_pfn, last_pfn;
 	int level = 0;
 
@@ -4272,7 +4314,16 @@ static size_t intel_iommu_unmap(struct iommu_domain *domain,
 	start_pfn = iova >> VTD_PAGE_SHIFT;
 	last_pfn = (iova + size - 1) >> VTD_PAGE_SHIFT;
 
-	domain_unmap(dmar_domain, start_pfn, last_pfn, &gather->freelist);
+	/*
+	 * Pass level only if !queued, which means we will do the iotlb
+	 * flush callback before freeing pages from freelist.
+	 *
+	 * When level is passed, domain_unmap() will attempt to add empty
+	 * page tables to freelist, and pass the level number of the highest
+	 * page table that was added to the freelist.
+	 */
+	domain_unmap(dmar_domain, start_pfn, last_pfn, &gather->freelist,
+		     queued ? NULL : &level);
 
 	if (dmar_domain->max_addr == iova + size)
 		dmar_domain->max_addr = iova;
@@ -4281,8 +4332,21 @@ static size_t intel_iommu_unmap(struct iommu_domain *domain,
 	 * We do not use page-selective IOTLB invalidation in flush queue,
 	 * so there is no need to track page and sync iotlb.
 	 */
-	if (!iommu_iotlb_gather_queued(gather))
-		iommu_iotlb_gather_add_page(domain, gather, iova, size);
+	if (!queued) {
+		size_t sz = size;
+
+		/*
+		 * Increase iova and sz for flushing if level was returned,
+		 * as it means we are also freeing some page tables.
+		 */
+		if (level) {
+			unsigned long pgsize = level_size(level) << VTD_PAGE_SHIFT;
+
+			iova = ALIGN_DOWN(iova, pgsize);
+			sz = ALIGN(size, pgsize);
+		}
+		iommu_iotlb_gather_add_page(domain, gather, iova, sz);
+	}
 
 	return size;
 }
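
To make the flush-range arithmetic in the last hunk concrete, here is a
small standalone sketch of the same computation. It is an illustration
under assumptions, not driver code: it assumes 4 KiB VT-d pages
(VTD_PAGE_SHIFT of 12) and a 9-bit stride per level, so that
level_size(level) is 1 << ((level - 1) * 9) pages. The names struct
flush_range, widen_flush() and ALIGN_UP (standing in for the kernel's
round-up ALIGN) are hypothetical.

```c
#include <stdint.h>

#define VTD_PAGE_SHIFT	12	/* assumed: 4 KiB base pages */
#define LEVEL_STRIDE	9	/* assumed: 512 entries per table */

/* Power-of-two rounding, mirroring the kernel's ALIGN_DOWN()/ALIGN(). */
#define ALIGN_DOWN(x, a)	((x) & ~((uint64_t)(a) - 1))
#define ALIGN_UP(x, a)		(((x) + (uint64_t)(a) - 1) & ~((uint64_t)(a) - 1))

/* Assumed: pages covered by one entry at this level (1, 512, 512^2, ...). */
static uint64_t level_size(int level)
{
	return 1ULL << ((level - 1) * LEVEL_STRIDE);
}

/* Hypothetical container for the range handed to the IOTLB gather. */
struct flush_range {
	uint64_t iova;
	uint64_t size;
};

/*
 * level == 0 means no page table was freed, so the range is unchanged.
 * Otherwise the start is rounded down and the size rounded up to the
 * span of one entry at the reported level, as in the hunk above.
 */
static struct flush_range widen_flush(uint64_t iova, uint64_t size, int level)
{
	struct flush_range r = { iova, size };

	if (level) {
		uint64_t pgsize = level_size(level) << VTD_PAGE_SHIFT;

		r.iova = ALIGN_DOWN(iova, pgsize);
		r.size = ALIGN_UP(size, pgsize);
	}
	return r;
}
```

Under these assumptions, unmapping one 4 KiB page at 0x201000 with
level == 2 reported would widen the gathered range to start at 0x200000
with a size of 2 MiB.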