From patchwork Wed Feb 4 00:19:26 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bjorn Helgaas X-Patchwork-Id: 5773611 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: patchwork-linux-pci@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 5B6439F302 for ; Wed, 4 Feb 2015 00:19:34 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 213CD20272 for ; Wed, 4 Feb 2015 00:19:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C40F520259 for ; Wed, 4 Feb 2015 00:19:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755290AbbBDATb (ORCPT ); Tue, 3 Feb 2015 19:19:31 -0500 Received: from mail-ob0-f177.google.com ([209.85.214.177]:41794 "EHLO mail-ob0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752544AbbBDATa (ORCPT ); Tue, 3 Feb 2015 19:19:30 -0500 Received: by mail-ob0-f177.google.com with SMTP id wp18so21416264obc.8 for ; Tue, 03 Feb 2015 16:19:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=nOFpBX6LJ1ld3ANBMdhH8alSaUBMYcSNSMaroahrfNc=; b=n+kBdCH5o6LSrmZt4v8G2vbp7oBGvkYe03Xpx9mMtfVWJ3YuIF3ZG5QSaN10N9F6W5 MaFXX+NmoQlUeZqsQm2vOvHOe+L5eRKt9woZKFwZbi3310dn1IPvNoLnbvxpL99+ubBV bXiyFLZHMB14KlG95cSOIzRMjVpsyudyo2Q4Ew1pvs7dPY7PXTuUkzWoSESL1rNO3diS SDlnB3S4XD35RgvOv1TGXxQPrfdlin1C1bdwOBumKqqonFb6+9WrfSJ1t38/O61tI/mS ddFmqeoEF4B8CG9vgXbqIB3cfSLanxK+cJf9zGW+SXVdaxjLGyMVHlnHy9ooGeaWuV/x MdbQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-type:content-disposition:in-reply-to :user-agent; bh=nOFpBX6LJ1ld3ANBMdhH8alSaUBMYcSNSMaroahrfNc=; b=kFauFYB0XHcIcWfCWe3Ve91brM3hklERO5+HJC9F8J5EiutUl76rpCt6uci828wekV NCdj/UbcIbi7Bsh8S+AprqVLMk2WB3ZziU7+9yCJd/AtGZesJC9yIJgsfnTLA+kPlyS8 vHc8E7SEo5dJH+rPJshY6vxH1IR9GVkOuZ5VKIO5nwo0J7AXC0gJZSwwV+0mDbK/a9+O UYhRluNEXhpcFFmZxY7GcHFs7XK652BACUn27E/t3toHgxoRBVsuuPQYiF8zWKI3lNtk 2QkI4dQJ8UHxTjokfr+349oc7x9qG1iKd77GxBen5UJn3xgODQM2oseYkUbJ2IV8h9Gq wR5w== X-Gm-Message-State: ALoCoQluahSualZ2NoLOMpwx4krkuVpsX8hC2c9vu3pqykwCFGX3vrJVK5zmifmqkNccytGdpH2e X-Received: by 10.60.134.178 with SMTP id pl18mr2750526oeb.6.1423009169422; Tue, 03 Feb 2015 16:19:29 -0800 (PST) Received: from google.com ([69.71.1.1]) by mx.google.com with ESMTPSA id fi4sm76318obb.16.2015.02.03.16.19.28 (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Tue, 03 Feb 2015 16:19:28 -0800 (PST) Date: Tue, 3 Feb 2015 18:19:26 -0600 From: Bjorn Helgaas To: Wei Yang Cc: gwshan@linux.vnet.ibm.com, benh@au1.ibm.com, linux-pci@vger.kernel.org, linuxppc-dev@lists.ozlabs.org Subject: Re: [PATCH] powerpc/powernv: make sure the IOV BAR will not exceed limit after shifting Message-ID: <20150204001926.GA19540@google.com> References: <20150130230803.GA6795@google.com> <1422946903-12958-1-git-send-email-weiyang@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <1422946903-12958-1-git-send-email-weiyang@linux.vnet.ibm.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org X-Spam-Status: No, score=-4.0 required=5.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED, FSL_HELO_FAKE, RCVD_IN_DNSWL_HI, T_DKIM_INVALID, T_RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Tue, Feb 03, 2015 at 03:01:43PM +0800, Wei Yang wrote: > The actual IOV BAR range is determined by the start address and the actual > size for vf_num VFs BAR. After shifting the IOV BAR, there would be a > chance the actual end address exceed the limit and overlap with other > devices. > > This patch adds a check to make sure after shifting, the range will not > overlap with other devices. I folded this into the previous patch (the one that adds pnv_pci_vf_resource_shift()). And I think that needs to be folded together with the following one ("powerpc/powernv: Allocate VF PE") because this one references pdn->vf_pes, which is added by "Allocate VF PE". > Signed-off-by: Wei Yang > --- > arch/powerpc/platforms/powernv/pci-ioda.c | 53 ++++++++++++++++++++++++++--- > 1 file changed, 48 insertions(+), 5 deletions(-) > > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c > index 8456ae8..1a1e74b 100644 > --- a/arch/powerpc/platforms/powernv/pci-ioda.c > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c > @@ -854,16 +854,18 @@ static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev) > } > > #ifdef CONFIG_PCI_IOV > -static void pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset) > +static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset) > { > struct pci_dn *pdn = pci_get_pdn(dev); > int i; > struct resource *res; > resource_size_t size; > + u16 vf_num; > > if (!dev->is_physfn) > - return; > + return -EINVAL; > > + vf_num = pdn->vf_pes; I can't actually build this, but I don't think pdn->vf_pes is defined yet. > for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++) { > res = &dev->resource[i]; > if (!res->flags || !res->parent) > @@ -875,11 +877,49 @@ static void pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset) > dev_info(&dev->dev, " Shifting VF BAR %pR to\n", res); > size = pci_iov_resource_size(dev, i); > res->start += size*offset; > - > dev_info(&dev->dev, " %pR\n", res); > + > + /* > + * The actual IOV BAR range is determined by the start address > + * and the actual size for vf_num VFs BAR. The check here is > + * to make sure after shifting, the range will not overlap > + * with other device. > + */ > + if ((res->start + (size * vf_num)) > res->end) { > + dev_err(&dev->dev, "VF BAR%d: %pR will conflict with" > + " other device after shift\n"); sriov_init() sets up "res" with enough space to contain TotalVF copies of the VF BAR. By the time we get here, that "res" is in the resource tree, and you should be able to see it in /proc/iomem. For example, if TotalVFs is 128 and VF BAR0 is 1MB in size, the resource size would be 128 * 1MB = 0x800_0000. If the VF BAR0 in the SR-IOV Capability contains a base address of 0x8000_0000, the resource would be: [mem 0x8000_0000-0x87ff_ffff] We have to assume there's another resource starting immediately after this one, i.e., at 0x8800_0000, and we have to make sure that when we change this resource and turn on SR-IOV, we don't overlap with it. The shifted resource will start at 0x8000_0000 + 1MB * "offset". The hardware will respond to a range whose size is 1MB * NumVFs (NumVFs may be smaller than TotalVFs). If we enable 16 VFs and shift by 23, we set VF BAR0 to 0x8000_0000 + 1MB * 23 = 0x8170_0000, and the size is 1MB * 16 = 0x100_0000, so the new resource will be: [mem 0x8170_0000-0x826f_ffff] That's fine; it doesn't extend past the original end of 0x87ff_ffff. But if we enable those same 16 VFs with a shift of 120, we set VF BAR0 to 0x8000_0000 + 1MB * 120 = 0x8780_0000, and the size stays the same, so the new resource will be: [mem 0x8780_0000-0x887f_ffff] and that's a problem because we have two devices responding at 0x8800_0000. Your test of "res->start + (size * vf_num)) > res->end" is not strict enough to catch this problem. I think we need something like the patch below. I restructured it so we don't have to back out any resource changes if we fail. This shifting strategy seems to imply that the closer NumVFs is to TotalVFs, the less flexibility you have to assign PEs, e.g., if NumVFs == TotalVFs, you wouldn't be able to shift at all. In this example, you could shift by anything from 0 to 128 - 16 = 112, but if you wanted NumVFs = 64, you could only shift by 0 to 64. Is that true? I think your M64 BAR gets split into 256 segments, regardless of what TotalVFs is, so if you expanded the resource to 256 * 1MB for this example, you would be able to shift by up to 256 - NumVFs. Do you actually do this somewhere? I pushed an updated pci/virtualization branch with these updates. I think there's also a leak that needs to be fixed that Dan Carpenter pointed out. Bjorn > + goto failed; > + } > + } > + > + for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++) { > + res = &dev->resource[i]; > + if (!res->flags || !res->parent) > + continue; > + > + if (!pnv_pci_is_mem_pref_64(res->flags)) > + continue; > + > pci_update_resource(dev, i); > } > pdn->max_vfs -= offset; > + return 0; > + > +failed: > + for (; i >= PCI_IOV_RESOURCES; i--) { > + res = &dev->resource[i]; > + if (!res->flags || !res->parent) > + continue; > + > + if (!pnv_pci_is_mem_pref_64(res->flags)) > + continue; > + > + dev_info(&dev->dev, " Shifting VF BAR %pR to\n", res); > + size = pci_iov_resource_size(dev, i); > + res->start += size*(-offset); > + dev_info(&dev->dev, " %pR\n", res); > + } > + return -EBUSY; > } > #endif /* CONFIG_PCI_IOV */ > > @@ -1480,8 +1520,11 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 vf_num) > } > > /* Do some magic shift */ > - if (pdn->m64_per_iov == 1) > - pnv_pci_vf_resource_shift(pdev, pdn->offset); > + if (pdn->m64_per_iov == 1) { > + ret = pnv_pci_vf_resource_shift(pdev, pdn->offset); > + if (ret) > + goto m64_failed; > + } > } > > /* Setup VF PEs */ commit 9849fc0c807d3544dbd682354325b2454f139ca4 Author: Wei Yang Date: Tue Feb 3 13:09:30 2015 -0600 powerpc/powernv: Shift VF resource with an offset On PowerNV platform, resource position in M64 implies the PE# the resource belongs to. In some cases, adjustment of a resource is necessary to locate it to a correct position in M64. Add pnv_pci_vf_resource_shift() to shift the 'real' PF IOV BAR address according to an offset. [bhelgaas: rework loops, rework overlap check, index resource[] conventionally, remove pci_regs.h include] Signed-off-by: Wei Yang Signed-off-by: Bjorn Helgaas --- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 74de05e58e1d..6ffedcc291a8 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -749,6 +749,74 @@ static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev) return 10; } +#ifdef CONFIG_PCI_IOV +static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset) +{ + struct pci_dn *pdn = pci_get_pdn(dev); + int i; + struct resource *res, res2; + resource_size_t size, end; + u16 vf_num; + + if (!dev->is_physfn) + return -EINVAL; + + /* + * "offset" is in VFs. The M64 BARs are sized so that when they + * are segmented, each segment is the same size as the IOV BAR. + * Each segment is in a separate PE, and the high order bits of the + * address are the PE number. Therefore, each VF's BAR is in a + * separate PE, and changing the IOV BAR start address changes the + * range of PEs the VFs are in. + */ + vf_num = pdn->vf_pes; // FIXME not defined yet + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { + res = &dev->resource[i + PCI_IOV_RESOURCES]; + if (!res->flags || !res->parent) + continue; + + if (!pnv_pci_is_mem_pref_64(res->flags)) + continue; + + /* + * The actual IOV BAR range is determined by the start address + * and the actual size for vf_num VFs BAR. This check is to + * make sure that after shifting, the range will not overlap + * with another device. + */ + size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES); + res2.flags = res->flags; + res2.start = res->start + (size * offset); + res2.end = res2.start + (size * vf_num) - 1; + + if (res2.end > res->end) { + dev_err(&dev->dev, "VF BAR%d: %pR would extend past %pR (trying to enable %d VFs shifted by %d)\n", + i, &res2, res, vf_num, offset); + return -EBUSY; + } + } + + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { + res = &dev->resource[i + PCI_IOV_RESOURCES]; + if (!res->flags || !res->parent) + continue; + + if (!pnv_pci_is_mem_pref_64(res->flags)) + continue; + + size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES); + res2 = *res; + res->start += size * offset; + + dev_info(&dev->dev, "VF BAR%d: %pR shifted to %pR (enabling %d VFs shifted by %d)\n", + i, &res2, res, vf_num, offset); + pci_update_resource(dev, i + PCI_IOV_RESOURCES); + } + pdn->max_vfs -= offset; + return 0; +} +#endif /* CONFIG_PCI_IOV */ + #if 0 static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev) {