Message ID | 0132edfec66a6bd413823d43ccdf1c4d6aae2b60.camel@exaion.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [1/2] Revert "PCI/VPD: Allow access to valid parts of VPD if some is invalid" | expand |
[+cc Hannes, who did a lot of related VPD work and reviewed the original posting at https://lore.kernel.org/r/20210715215959.2014576-6-helgaas@kernel.org/] On Thu, Mar 07, 2024 at 05:09:27PM +0100, Josselin Mouette wrote: > When a device returns invalid VPD data, it can be misused by other > code paths in kernel space or user space, and there are instances > in which this seems to cause memory corruption. More of the background from Josselin at https://lore.kernel.org/r/aaea0b30c35bb73b947727e4b3ec354d6b5c399c.camel@exaion.com This is a regression, and obviously needs to be fixed somehow, but I'm a bit hesitant to revert this until we understand the problem better. If there's a memory corruption lurking and a revert hides the corruption so we never fix it, I'm not sure that's an improvement overall. > There is no sensible reason why the kernel would provide userspace > or drivers with invalid and potentially dangerous data. > > This reverts commit 5fe204eab174fd474227f23fd47faee4e7a6c000. > --- > drivers/pci/vpd.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c > index 485a642b9304..daaa208c9d9c 100644 > --- a/drivers/pci/vpd.c > +++ b/drivers/pci/vpd.c > @@ -68,7 +68,7 @@ static size_t pci_vpd_size(struct pci_dev *dev) > if (pci_read_vpd_any(dev, off + 1, 2, > &header[1]) != 2) { > pci_warn(dev, "failed VPD read at > offset %zu\n", > off + 1); > - return off ?: PCI_VPD_SZ_INVALID; > + return PCI_VPD_SZ_INVALID; > } > size = pci_vpd_lrdt_size(header); > if (off + size > PCI_VPD_MAX_SIZE) > @@ -87,13 +87,13 @@ static size_t pci_vpd_size(struct pci_dev *dev) > return off; > } > } > - return off; > + return PCI_VPD_SZ_INVALID; > > error: > pci_info(dev, "invalid VPD tag %#04x (size %zu) at offset > %zu%s\n", > header[0], size, off, off == 0 ? > "; assume missing optional EEPROM" : ""); > - return off ?: PCI_VPD_SZ_INVALID; > + return PCI_VPD_SZ_INVALID; > } > > static bool pci_vpd_available(struct pci_dev *dev, bool check_size) > -- > 2.39.2 >
On Thu, Mar 07, 2024 at 04:36:06PM -0600, Bjorn Helgaas wrote: > [+cc Hannes, who did a lot of related VPD work and reviewed the > original posting at > https://lore.kernel.org/r/20210715215959.2014576-6-helgaas@kernel.org/] > > On Thu, Mar 07, 2024 at 05:09:27PM +0100, Josselin Mouette wrote: > > When a device returns invalid VPD data, it can be misused by other > > code paths in kernel space or user space, and there are instances > > in which this seems to cause memory corruption. > > More of the background from Josselin at > https://lore.kernel.org/r/aaea0b30c35bb73b947727e4b3ec354d6b5c399c.camel@exaion.com > > This is a regression, and obviously needs to be fixed somehow, but I'm > a bit hesitant to revert this until we understand the problem better. > If there's a memory corruption lurking and a revert hides the > corruption so we never fix it, I'm not sure that's an improvement > overall. I don't want to drop this, but we're kind of stuck here: - I'd still like to understand the problem better. - Trivially, I can't apply patches lacking the appropriate signed-off-by; see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=v6.6#n396 > > There is no sensible reason why the kernel would provide userspace > > or drivers with invalid and potentially dangerous data. > > > > This reverts commit 5fe204eab174fd474227f23fd47faee4e7a6c000. > > --- > > drivers/pci/vpd.c | 6 +++--- > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c > > index 485a642b9304..daaa208c9d9c 100644 > > --- a/drivers/pci/vpd.c > > +++ b/drivers/pci/vpd.c > > @@ -68,7 +68,7 @@ static size_t pci_vpd_size(struct pci_dev *dev) > > if (pci_read_vpd_any(dev, off + 1, 2, > > &header[1]) != 2) { > > pci_warn(dev, "failed VPD read at > > offset %zu\n", > > off + 1); > > - return off ?: PCI_VPD_SZ_INVALID; > > + return PCI_VPD_SZ_INVALID; > > } > > size = pci_vpd_lrdt_size(header); > > if (off + size > PCI_VPD_MAX_SIZE) > > @@ -87,13 +87,13 @@ static size_t pci_vpd_size(struct pci_dev *dev) > > return off; > > } > > } > > - return off; > > + return PCI_VPD_SZ_INVALID; > > > > error: > > pci_info(dev, "invalid VPD tag %#04x (size %zu) at offset > > %zu%s\n", > > header[0], size, off, off == 0 ? > > "; assume missing optional EEPROM" : ""); > > - return off ?: PCI_VPD_SZ_INVALID; > > + return PCI_VPD_SZ_INVALID; > > } > > > > static bool pci_vpd_available(struct pci_dev *dev, bool check_size) > > -- > > 2.39.2 > >
On 5/3/24 00:23, Bjorn Helgaas wrote: > On Thu, Mar 07, 2024 at 04:36:06PM -0600, Bjorn Helgaas wrote: >> [+cc Hannes, who did a lot of related VPD work and reviewed the >> original posting at >> https://lore.kernel.org/r/20210715215959.2014576-6-helgaas@kernel.org/] >> >> On Thu, Mar 07, 2024 at 05:09:27PM +0100, Josselin Mouette wrote: >>> When a device returns invalid VPD data, it can be misused by other >>> code paths in kernel space or user space, and there are instances >>> in which this seems to cause memory corruption. >> >> More of the background from Josselin at >> https://lore.kernel.org/r/aaea0b30c35bb73b947727e4b3ec354d6b5c399c.camel@exaion.com >> >> This is a regression, and obviously needs to be fixed somehow, but I'm >> a bit hesitant to revert this until we understand the problem better. >> If there's a memory corruption lurking and a revert hides the >> corruption so we never fix it, I'm not sure that's an improvement >> overall. > > I don't want to drop this, but we're kind of stuck here: > > - I'd still like to understand the problem better. > > - Trivially, I can't apply patches lacking the appropriate > signed-off-by; see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=v6.6#n396 > Right. Again. So: Problem is that VPD data is a self-describing stream of serialized TLV records. The records themselves have no internal consistency check, so any data read is assumed to be correct. If there is an error during decoding we really should not continue, as everything beyond that error cannot be relied on. However, the assumption was that the read data is correct, so in getting an error the very assumption leading us to interpret the data as VPD records is invalid, too. But question remains: was the data correct until the error? Or was the error earlier, and we just mis-interpreted invalid data? In my original patch I used the second attempt, as this will pretty much guarantee that all VPD data will be correct. It has the drawback, though, that for some cards we won't be able to read the VPD data, even if they would provide valid data until the error. As there is no good way out of this I'd rather see a blacklist of cards which _can_ be read despite the error, and don't return VPD data otherwise. Cheers, Hannes
diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c index 485a642b9304..daaa208c9d9c 100644 --- a/drivers/pci/vpd.c +++ b/drivers/pci/vpd.c @@ -68,7 +68,7 @@ static size_t pci_vpd_size(struct pci_dev *dev) if (pci_read_vpd_any(dev, off + 1, 2, &header[1]) != 2) { pci_warn(dev, "failed VPD read at offset %zu\n", off + 1); - return off ?: PCI_VPD_SZ_INVALID; + return PCI_VPD_SZ_INVALID; }