From patchwork Wed Oct 2 08:44:31 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Borislav Petkov
X-Patchwork-Id: 2973611
Return-Path:
X-Original-To: patchwork-kvm@patchwork.kernel.org
Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org
Received: from mail.kernel.org (mail.kernel.org [198.145.19.201])
	by patchwork1.web.kernel.org (Postfix) with ESMTP id A463E9F527
	for ; Wed, 2 Oct 2013 08:45:17 +0000 (UTC)
Received: from mail.kernel.org (localhost [127.0.0.1])
	by mail.kernel.org (Postfix) with ESMTP id 58D6920413
	for ; Wed, 2 Oct 2013 08:45:07 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id 92636203DD
	for ; Wed, 2 Oct 2013 08:44:52 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753638Ab3JBIoj (ORCPT ); Wed, 2 Oct 2013 04:44:39 -0400
Received: from mail.skyhub.de ([78.46.96.112]:52008 "EHLO mail.skyhub.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753296Ab3JBIoh (ORCPT ); Wed, 2 Oct 2013 04:44:37 -0400
X-Virus-Scanned: Nedap ESD1 at mail.skyhub.de
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=alien8.de; s=alien8;
	t=1380703476; bh=zNJH397g46EkzdxT6mg1rp297vEhW/tMcQqnmeTWq24=;
	h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version:
	 Content-Type:In-Reply-To; b=Ro95OCJf27+b6T2stDsFWPZWVMnj/X3boIZdZe
	 o/eKlm3f7x4Fwal98SBb498K/nMRVyx4T8iBSrRtoWzib3gVcfPvuA54m0DhPKtgAjW
	 SJuyaT3lWTDJwz58MrGy5hSnnZ/MBGD4yAv9ciWcqteKv2wr/Bv1zP39ii3O3oa/Xw=
Received: from mail.skyhub.de ([127.0.0.1])
	by localhost (door.skyhub.de [127.0.0.1]) (amavisd-new, port 10026)
	with ESMTP id 0059Xrp7q-r3; Wed, 2 Oct 2013 10:44:35 +0200 (CEST)
Received: from liondog.tnic (p4FF1C06D.dip0.t-ipconnect.de [79.241.192.109])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.skyhub.de (SuperMail on ZX Spectrum 128k) with ESMTPSA id
	D96271D9600; Wed, 2 Oct 2013 10:44:34 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=alien8.de; s=alien8;
	t=1380703475; bh=zNJH397g46EkzdxT6mg1rp297vEhW/tMcQqnmeTWq24=;
	h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version:
	 Content-Type:In-Reply-To; b=LXbSTsaw5u8kwtrgkAQSUzmvBaM/NxgVuj1XAq
	 NOE10XxmNpRd5sYrfFr65wwAct0EbAumx1tZ6NvxF+EmM6h239jCokpVP2GILXYpEo1
	 WHlW/iBhh1jMScWWe3TGeJe5hGxnnqh8SJM98GLWVNXoUqtptMyAH+yJKSPHL2VTEo=
Received: by liondog.tnic (Postfix, from userid 1000)
	id 0316E102B99; Wed, 2 Oct 2013 10:44:31 +0200 (CEST)
Date: Wed, 2 Oct 2013 10:44:31 +0200
From: Borislav Petkov
To: Alex Williamson
Cc: dwmw2@infradead.org, iommu@lists.linux-foundation.org,
	ddutile@redhat.com, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH] intel-iommu: Fix leaks in pagetable freeing
Message-ID: <20131002084431.GA20568@pd.tnic>
References: <20130615161614.2107.41044.stgit@bling.home>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <20130615161614.2107.41044.stgit@bling.home>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org
X-Spam-Status: No, score=-7.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,
	RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD,T_DKIM_INVALID,UNPARSEABLE_RELAY
	autolearn=unavailable version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

On Sat, Jun 15, 2013 at 10:27:19AM -0600, Alex Williamson wrote:
> At best the current code only seems to free the leaf pagetables and
> the root. If you're unlucky enough to have a large gap (like any
> QEMU guest with more than 3G of memory), only the first chunk of leaf
> pagetables are freed (plus the root). This is a massive memory leak.
> This patch re-writes the pagetable freeing function to use a
> recursive algorithm and manages to not only free all the pagetables,
> but does it without any apparent performance loss versus the current
> broken version.
>
> Signed-off-by: Alex Williamson
> Cc: stable@vger.kernel.org
> ---
>
> Suggesting for stable, would like to see some soak time, but it's
> hard to imagine this being any worse than the current code.

Btw, I have a backport for the 3.0.x series which builds fine here, in
case you guys are interested :)

---
From: Alex Williamson
Date: Sat, 15 Jun 2013 10:27:19 -0600
Subject: [PATCH] intel-iommu: Fix leaks in pagetable freeing

upstream commit: 3269ee0bd6686baf86630300d528500ac5b516d7

At best the current code only seems to free the leaf pagetables and
the root. If you're unlucky enough to have a large gap (like any
QEMU guest with more than 3G of memory), only the first chunk of leaf
pagetables are freed (plus the root). This is a massive memory leak.
This patch re-writes the pagetable freeing function to use a
recursive algorithm and manages to not only free all the pagetables,
but does it without any apparent performance loss versus the current
broken version.

Signed-off-by: Alex Williamson
Reviewed-by: Marcelo Tosatti
Signed-off-by: Joerg Roedel
Signed-off-by: Borislav Petkov
---
 drivers/pci/intel-iommu.c | 72 +++++++++++++++++++++++------------------------
 1 file changed, 35 insertions(+), 37 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index ae762ecc658b..68baf178cede 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -853,56 +853,54 @@ static int dma_pte_clear_range(struct dmar_domain *domain,
 	return order;
 }
 
+static void dma_pte_free_level(struct dmar_domain *domain, int level,
+			       struct dma_pte *pte, unsigned long pfn,
+			       unsigned long start_pfn, unsigned long last_pfn)
+{
+	pfn = max(start_pfn, pfn);
+	pte = &pte[pfn_level_offset(pfn, level)];
+
+	do {
+		unsigned long level_pfn;
+		struct dma_pte *level_pte;
+
+		if (!dma_pte_present(pte) || dma_pte_superpage(pte))
+			goto next;
+
+		level_pfn = pfn & level_mask(level - 1);
+		level_pte = phys_to_virt(dma_pte_addr(pte));
+
+		if (level > 2)
+			dma_pte_free_level(domain, level - 1, level_pte,
+					   level_pfn, start_pfn, last_pfn);
+
+		/* If range covers entire pagetable, free it */
+		if (!(start_pfn > level_pfn ||
+		      last_pfn < level_pfn + level_size(level))) {
+			dma_clear_pte(pte);
+			domain_flush_cache(domain, pte, sizeof(*pte));
+			free_pgtable_page(level_pte);
+		}
+next:
+		pfn += level_size(level);
+	} while (!first_pte_in_page(++pte) && pfn <= last_pfn);
+}
+
 /* free page table pages. last level pte should already be cleared */
 static void dma_pte_free_pagetable(struct dmar_domain *domain,
 				   unsigned long start_pfn,
 				   unsigned long last_pfn)
 {
 	int addr_width = agaw_to_width(domain->agaw) - VTD_PAGE_SHIFT;
-	struct dma_pte *first_pte, *pte;
-	int total = agaw_to_level(domain->agaw);
-	int level;
-	unsigned long tmp;
-	int large_page = 2;
 
 	BUG_ON(addr_width < BITS_PER_LONG && start_pfn >> addr_width);
 	BUG_ON(addr_width < BITS_PER_LONG && last_pfn >> addr_width);
 	BUG_ON(start_pfn > last_pfn);
 
 	/* We don't need lock here; nobody else touches the iova range */
-	level = 2;
-	while (level <= total) {
-		tmp = align_to_level(start_pfn, level);
-
-		/* If we can't even clear one PTE at this level, we're done */
-		if (tmp + level_size(level) - 1 > last_pfn)
-			return;
-
-		do {
-			large_page = level;
-			first_pte = pte = dma_pfn_level_pte(domain, tmp, level, &large_page);
-			if (large_page > level)
-				level = large_page + 1;
-			if (!pte) {
-				tmp = align_to_level(tmp + 1, level + 1);
-				continue;
-			}
-			do {
-				if (dma_pte_present(pte)) {
-					free_pgtable_page(phys_to_virt(dma_pte_addr(pte)));
-					dma_clear_pte(pte);
-				}
-				pte++;
-				tmp += level_size(level);
-			} while (!first_pte_in_page(pte) &&
-				 tmp + level_size(level) - 1 <= last_pfn);
+	dma_pte_free_level(domain, agaw_to_level(domain->agaw),
+			   domain->pgd, 0, start_pfn, last_pfn);
 
-			domain_flush_cache(domain, first_pte,
-					   (void *)pte - (void *)first_pte);
-
-		} while (tmp && tmp + level_size(level) - 1 <= last_pfn);
-		level++;
-	}
 	/* free pgd */
 	if (start_pfn == 0 && last_pfn == DOMAIN_MAX_PFN(domain->gaw)) {
 		free_pgtable_page(domain->pgd);
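
Illustrative note, not part of the patch: the commit message describes
replacing a level-by-level loop with a recursive walk that frees every
pagetable fully covered by the requested pfn range. The userspace sketch
below shows that idea on a toy table. Everything in it is an assumption
made for illustration: `struct table`, `entry_span()`, `ensure_leaf()`,
`free_level()`, `demo_tables_left()`, the 4-entry stride and the
3-level depth are hypothetical, and the coverage test is a plain
inclusive range check rather than the exact expression used in the
kernel code.

```c
#include <assert.h>
#include <stdlib.h>

#define BITS	2			/* toy stride: 4 entries per table */
#define ENTRIES	(1UL << BITS)
#define LEVELS	3			/* root sits at level 3 */
#define MAX_PFN	((1UL << (BITS * LEVELS)) - 1)

struct table {
	struct table *child[ENTRIES];	/* NULL means "not present" */
};

static int live_tables;			/* tables currently allocated */

static struct table *table_alloc(void)
{
	struct table *t = calloc(1, sizeof(*t));

	assert(t != NULL);
	live_tables++;
	return t;
}

/* pfns covered by one entry of a table at 'level' (level 1 = leaf table) */
static unsigned long entry_span(int level)
{
	return 1UL << (BITS * (level - 1));
}

/* Make sure all tables down to the leaf table for 'pfn' exist. */
static void ensure_leaf(struct table *root, int levels, unsigned long pfn)
{
	struct table *t = root;

	for (int level = levels; level > 1; level--) {
		unsigned long idx = (pfn / entry_span(level)) % ENTRIES;

		if (!t->child[idx])
			t->child[idx] = table_alloc();
		t = t->child[idx];
	}
}

/*
 * Free every table below 't' whose whole range lies in [start, last].
 * 't' is a table at 'level' (>= 2) and 'base' is the first pfn it covers.
 */
static void free_level(struct table *t, int level, unsigned long base,
		       unsigned long start, unsigned long last)
{
	unsigned long span = entry_span(level);

	for (unsigned long i = 0; i < ENTRIES; i++) {
		unsigned long lo = base + i * span;
		unsigned long hi = lo + span - 1;
		struct table *c = t->child[i];

		if (!c || hi < start || lo > last)
			continue;	/* absent, or outside the range */

		if (level > 2)		/* child has tables of its own */
			free_level(c, level - 1, lo, start, last);

		if (start <= lo && hi <= last) {
			t->child[i] = NULL;
			free(c);
			live_tables--;
		}
	}
}

/* Build a table with a gap (pfns 16..47 unmapped), free [start, last],
 * and report how many tables are still allocated, root included. */
int demo_tables_left(unsigned long start, unsigned long last)
{
	struct table *root = table_alloc();
	int left;

	for (unsigned long pfn = 0; pfn < 16; pfn++)
		ensure_leaf(root, LEVELS, pfn);
	for (unsigned long pfn = 48; pfn <= MAX_PFN; pfn++)
		ensure_leaf(root, LEVELS, pfn);

	free_level(root, LEVELS, 0, start, last);
	left = live_tables;

	/* Tear down the rest so the demo itself does not leak. */
	free_level(root, LEVELS, 0, 0, MAX_PFN);
	free(root);
	live_tables--;
	return left;
}
```

In this toy layout, freeing the full range leaves only the root
allocated even though the mappings have a large hole in the middle,
which is the case the commit message says the old loop handled badly.
A partial range frees only the tables it covers completely.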