diff mbox series

fpga: altera_cvp: restrict registration to CvP enabled devices

Message ID 78c44ad0b2344a4490ffd300cf0df746@SRV177.busymouse24.de (mailing list archive)
State Superseded
Headers show
Series fpga: altera_cvp: restrict registration to CvP enabled devices | expand

Commit Message

Andreas Puhm Oct. 22, 2018, 1:15 p.m. UTC
Hi Moritz,

Thank you, for your fast response!
Below you can find the updated patch.

--------------------------------------------------------------------
Full description:
The altera_cvp probe function only checks, 
if the Altera/Intel PCI device configuration space contains a vendor specific entry (VSEC Capability Header 0x000b) at offset 0x200.
 But the probe function does not verify, if the PCI device (and further the FPGA), 
for which it has been called, actually supports the Configure-via-Protocol feature.

The PCI device (FPGA) can explicitly disable the Configur-via-Protocol (CvP) feature by setting the CVP_EN bit, index 20 of CVP_STATUS register, to '0'. 
As the altera_cvp probe function does not check this it registers the device in any way. 
At this point, the altera_cvp module cannot be used to program this device via CvP. 
In addition no other module can use the device, as it is still registered by altera_cvp.

Keywords: altera_cvp module, PCI, Configure-via-Protocol

Kernel version: problem occured with v4.15, should occur from 4.14+

Instructions to reproduce: 
Proper hardware is necessary to reproduce this, i.e., FPGA with instantiated Altera/Intel PCIe IP Core with disabled CvP feature.

Workaround:
It is possible to circumvent this problem by manually unloading or blacklisting the altera_cvp module.
--------------------------------------------------------------------
Suggested Patch:
This patch was successfully build and tested for 4.15 and 4.18

The patch is based on: git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tag/?h=v4.18

Subject: [PATCH] fpga: altera_cvp: restrict registration to CvP enabled devices

The altera-cvp probe function now verifies, that the PCI device supports
the CvP feature, before it registers the device.
This is done by reading the CVP_EN bit,
Bit 20 of the CVP_STATUS register (@ PCI Config Address 0x21C).

If this bit is '1' (CvP enabled), altera-cvp will register the device
for further interaction.
If this bit is '0' (CvP disabled), altera-cvp will not register the device.

Signed-off-by: Andreas Puhm <puhm@oregano.at>
---
 drivers/fpga/altera-cvp.c | 9 +++++++++
 1 file changed, 9 insertions(+)

--


With best regards,
Andreas Puhm <puhm@oregano.at>

Comments

Eric Schwarz Oct. 22, 2018, 1:34 p.m. UTC | #1
Hi Andreas,

Am 22.10.2018 15:15, schrieb Andreas Puhm:

> Hi Moritz,
> 
> Thank you, for your fast response!
> Below you can find the updated patch.
> 
> --------------------------------------------------------------------
> Full description:
> The altera_cvp probe function only checks,
> if the Altera/Intel PCI device configuration space contains a vendor 
> specific entry (VSEC Capability Header 0x000b) at offset 0x200.
> But the probe function does not verify, if the PCI device (and further 
> the FPGA),
> for which it has been called, actually supports the 
> Configure-via-Protocol feature.
> 
> The PCI device (FPGA) can explicitly disable the Configur-via-Protocol 
> (CvP) feature by setting the CVP_EN bit, index 20 of CVP_STATUS 
> register, to '0'.
> As the altera_cvp probe function does not check this it registers the 
> device in any way.
> At this point, the altera_cvp module cannot be used to program this 
> device via CvP.
> In addition no other module can use the device, as it is still 
> registered by altera_cvp.
> 
> Keywords: altera_cvp module, PCI, Configure-via-Protocol
> 
> Kernel version: problem occured with v4.15, should occur from 4.14+
> 
> Instructions to reproduce:
> Proper hardware is necessary to reproduce this, i.e., FPGA with 
> instantiated Altera/Intel PCIe IP Core with disabled CvP feature.
> 
> Workaround:
> It is possible to circumvent this problem by manually unloading or 
> blacklisting the altera_cvp module.
> --------------------------------------------------------------------
> Suggested Patch:
> This patch was successfully build and tested for 4.15 and 4.18
> 
> The patch is based on: 
> git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tag/?h=v4.18
> 
> Subject: [PATCH] fpga: altera_cvp: restrict registration to CvP enabled 
> devices
> 
> The altera-cvp probe function now verifies, that the PCI device 
> supports
> the CvP feature, before it registers the device.
> This is done by reading the CVP_EN bit,
> Bit 20 of the CVP_STATUS register (@ PCI Config Address 0x21C).
> 
> If this bit is '1' (CvP enabled), altera-cvp will register the device
> for further interaction.
> If this bit is '0' (CvP disabled), altera-cvp will not register the 
> device.
> 
> Signed-off-by: Andreas Puhm <puhm@oregano.at>
> ---
> drivers/fpga/altera-cvp.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/fpga/altera-cvp.c b/drivers/fpga/altera-cvp.c
> index 7fa793672a7a..838abcfca0fb 100644
> --- a/drivers/fpga/altera-cvp.c
> +++ b/drivers/fpga/altera-cvp.c
> @@ -403,6 +403,7 @@ static int altera_cvp_probe(struct pci_dev *pdev,
> struct altera_cvp_conf *conf;
> struct fpga_manager *mgr;
> u16 cmd, val;
> +    u32 val32;

Please do not use variable size in variable name.

> int ret;
> 
> /*
> @@ -416,6 +417,14 @@ static int altera_cvp_probe(struct pci_dev *pdev,
> return -ENODEV;
> }
> 
> +    pci_read_config_dword(pdev, VSE_CVP_STATUS, &val32);
> +    if (!(val32 & VSE_CVP_STATUS_CVP_EN)) {
> +        dev_err(&pdev->dev,
> +            "CVP is disabled for this device: CVP_STATUS Reg 0x%x\n",
> +            val32);
> +        return -ENODEV;
> +    }
> +
> conf = devm_kzalloc(&pdev->dev, sizeof(*conf), GFP_KERNEL);
> if (!conf)
> return -ENOMEM;
> --
> 
> With best regards,
> Andreas Puhm <puhm@oregano.at>

Cheers
Eric
Moritz Fischer Oct. 22, 2018, 2:04 p.m. UTC | #2
Hi Andreas,

On Mon, Oct 22, 2018 at 01:15:34PM +0000, Andreas Puhm wrote:

Can you please send your patch using git-send-email?

[..]

> Subject: [PATCH] fpga: altera_cvp: restrict registration to CvP enabled devices

How about:

fpga: altera-cvp: Fix registration for CvP incapable devices

The probe function needs to verify the CvP enable bit in order to
properly determine if FPGA Manager functionality can be safely
enabled.

> 
> The altera-cvp probe function now verifies, that the PCI device supports
> the CvP feature, before it registers the device.
> This is done by reading the CVP_EN bit,
> Bit 20 of the CVP_STATUS register (@ PCI Config Address 0x21C).
> 
> If this bit is '1' (CvP enabled), altera-cvp will register the device
> for further interaction.
> If this bit is '0' (CvP disabled), altera-cvp will not register the device.
> 

Could you add a Fixes <hash> ("Message") tag here, I believe we had this
issue since the very beginning, i.e

Fixes 34d1dc17ce97 ("fpga manager: Add Altera CvP driver")

Something like ^^^^

Thanks,

Moritz
Anatolij Gustschin Oct. 23, 2018, 4:26 p.m. UTC | #3
Hi Andreas,

On Mon, 22 Oct 2018 13:15:34 +0000
Andreas Puhm puhm@oregano.at wrote:
...
>Full description:
>The altera_cvp probe function only checks, 
>if the Altera/Intel PCI device configuration space contains a vendor
>specific entry (VSEC Capability Header 0x000b) at offset 0x200.
> But the probe function does not verify, if the PCI device (and further
>the FPGA), for which it has been called, actually supports the Configure-
>via-Protocol feature.
>
>The PCI device (FPGA) can explicitly disable the Configur-via-Protocol
>(CvP) feature by setting the CVP_EN bit, index 20 of CVP_STATUS register,
>to '0'.
>As the altera_cvp probe function does not check this it registers the
>device in any way.

The CvP docs says that on some FPGAs (e.g. Arria 10) the assertion of CVP
status can take up to 500ms. However it is not clear whether this delay
might be required after peripheral image configuration and after PCIe
link activation. The diagram describing configuration sequence suggests
that CVP_EN should be polled until it is asserted. I can imaging the
situation that this bit is still not asserted when the device is being
probed. Maybe we should better defer device probing if CVP_EN bit is
cleared? When deferred probing fails again and sufficient period for
CVP_EN bit assertion elapsed, then stop deferred probing and return
-ENODEV?

Thanks,

Anatolij
Andreas Puhm Oct. 23, 2018, 6:46 p.m. UTC | #4
Hi Anatolij,

> Von: Anatolij Gustschin [mailto:agust@denx.de]
> Gesendet: Dienstag, 23. Oktober 2018 18:27
> An: Andreas Puhm <puhm@oregano.at>
> Cc: Moritz Fischer <mdf@kernel.org>; Alan Tull <atull@kernel.org>; linux-fpga@vger.kernel.org; linux-kernel@vger.kernel.org
> Betreff: Re: [PATCH] fpga: altera_cvp: restrict registration to CvP enabled devices
> 
> Hi Andreas,
> 
> On Mon, 22 Oct 2018 13:15:34 +0000
> Andreas Puhm puhm@oregano.at wrote:
> ...
> >Full description:
> >The altera_cvp probe function only checks,
> >if the Altera/Intel PCI device configuration space contains a vendor
> >specific entry (VSEC Capability Header 0x000b) at offset 0x200.
> > But the probe function does not verify, if the PCI device (and further
> >the FPGA), for which it has been called, actually supports the Configure-
> >via-Protocol feature.
> >
> >The PCI device (FPGA) can explicitly disable the Configur-via-Protocol
> >(CvP) feature by setting the CVP_EN bit, index 20 of CVP_STATUS register,
> >to '0'.
> >As the altera_cvp probe function does not check this it registers the
> >device in any way.
> 
> The CvP docs says that on some FPGAs (e.g. Arria 10) the assertion of CVP
> status can take up to 500ms. However it is not clear whether this delay
> might be required after peripheral image configuration and after PCIe
> link activation. The diagram describing configuration sequence suggests
> that CVP_EN should be polled until it is asserted. I can imaging the
> situation that this bit is still not asserted when the device is being
> probed. Maybe we should better defer device probing if CVP_EN bit is
> cleared? When deferred probing fails again and sufficient period for
> CVP_EN bit assertion elapsed, then stop deferred probing and return
> -ENODEV?
> 
> Thanks,
> 
> Anatolij

Anatolij, thank you for your feedback.

My rationale behind the patch is as follows:

The CVP_EN is part of the Hard PCIe IP core configuration,
and therefore, has a defined and static value right from "the start".

Remark in [1, fig 12]
" For high density devices such as Intel Cyclone 10 GX, 
it may be necessary to wait up to 500 ms for the CvP
status register bit assertion."
According to [2] the Cyclone 10 GX devices achieve proper operation
within 100 ms (via the PCIe IP core and CvP).

I think (and here the documentation is a bit lacking), 
that this remark is valid only for other bits of the status register,
e.g., CVP_CONFIG_DONE or USERMODE.
I also think, that the 500 ms delay is calculated from peripheral + core image programming
and that the time for peripheral image programming is far lower than that 
(i.e., low enough to allow PCI enumeration).

But if this actually means that it can take up to 500 ms to program the peripheral image, 
than such FPGAs would have different problems.
I.e., missing the deadline for PCI enumeration. 
This would need a solution outside of the scope of the
altera_cvp module (e.g., soft-reset to re-start enumeration with a stable system).

Bottom line: 
The CVP_EN should be deemed stable when altera_cvp is called, 
if not, 
the programming of the Intel/Altera FPGA and PCIe IP core has not been completed in time
for the enumeration of the PCI device. Hence it would be questionable or, more likely, would not
have completed successfully in the first place, i.e., altera_cvp would not have been called.

[1] "Intel(r) Cyclone(r) 10 GX CvP Initialization over PCI Express User Guide", 2018.01.02
[2] "Intel(r) Cyclone(r) 10 GX Device Overview", 2017.05.08

With best regards,
Andreas
Moritz Fischer Oct. 24, 2018, 9:51 a.m. UTC | #5
Hi Anatolij, Andreas,

On Tue, Oct 23, 2018 at 06:46:47PM +0000, Andreas Puhm wrote:
> Hi Anatolij,
> 
> > The CvP docs says that on some FPGAs (e.g. Arria 10) the assertion of CVP
> > status can take up to 500ms. However it is not clear whether this delay
> > might be required after peripheral image configuration and after PCIe
> > link activation. The diagram describing configuration sequence suggests
> > that CVP_EN should be polled until it is asserted. I can imaging the
> > situation that this bit is still not asserted when the device is being
> > probed. Maybe we should better defer device probing if CVP_EN bit is
> > cleared? When deferred probing fails again and sufficient period for
> > CVP_EN bit assertion elapsed, then stop deferred probing and return
> > -ENODEV?
> > 
> > Thanks,
> > 
> > Anatolij
> 
> Anatolij, thank you for your feedback.
> 
> My rationale behind the patch is as follows:
> 
> The CVP_EN is part of the Hard PCIe IP core configuration,
> and therefore, has a defined and static value right from "the start".
> 
> Remark in [1, fig 12]
> " For high density devices such as Intel Cyclone 10 GX, 
> it may be necessary to wait up to 500 ms for the CvP
> status register bit assertion."
> According to [2] the Cyclone 10 GX devices achieve proper operation
> within 100 ms (via the PCIe IP core and CvP).
> 
> I think (and here the documentation is a bit lacking), 
> that this remark is valid only for other bits of the status register,
> e.g., CVP_CONFIG_DONE or USERMODE.
> I also think, that the 500 ms delay is calculated from peripheral + core image programming
> and that the time for peripheral image programming is far lower than that 
> (i.e., low enough to allow PCI enumeration).
> 
> But if this actually means that it can take up to 500 ms to program the peripheral image, 
> than such FPGAs would have different problems.
> I.e., missing the deadline for PCI enumeration. 
> This would need a solution outside of the scope of the
> altera_cvp module (e.g., soft-reset to re-start enumeration with a stable system).
> 
> Bottom line: 
> The CVP_EN should be deemed stable when altera_cvp is called, 
> if not, 
> the programming of the Intel/Altera FPGA and PCIe IP core has not been completed in time
> for the enumeration of the PCI device. Hence it would be questionable or, more likely, would not
> have completed successfully in the first place, i.e., altera_cvp would not have been called.

Yeah I think this makes sense. If your config space isn't up on boot you
would run into issues. I agree the docs are soemwhat vague here. Maybe Matthew or Alan can shoot
an email to their HW folks internally to clarify?

Thanks,
Moritz
Matthew Gerlach Oct. 24, 2018, 11 p.m. UTC | #6
On Wed, 24 Oct 2018, Moritz Fischer wrote:

> Hi Anatolij, Andreas,
>
> On Tue, Oct 23, 2018 at 06:46:47PM +0000, Andreas Puhm wrote:
>> Hi Anatolij,
>>
>>> The CvP docs says that on some FPGAs (e.g. Arria 10) the assertion of CVP
>>> status can take up to 500ms. However it is not clear whether this delay
>>> might be required after peripheral image configuration and after PCIe
>>> link activation. The diagram describing configuration sequence suggests
>>> that CVP_EN should be polled until it is asserted. I can imaging the
>>> situation that this bit is still not asserted when the device is being
>>> probed. Maybe we should better defer device probing if CVP_EN bit is
>>> cleared? When deferred probing fails again and sufficient period for
>>> CVP_EN bit assertion elapsed, then stop deferred probing and return
>>> -ENODEV?
>>>
>>> Thanks,
>>>
>>> Anatolij
>>
>> Anatolij, thank you for your feedback.
>>
>> My rationale behind the patch is as follows:
>>
>> The CVP_EN is part of the Hard PCIe IP core configuration,
>> and therefore, has a defined and static value right from "the start".
>>
>> Remark in [1, fig 12]
>> " For high density devices such as Intel Cyclone 10 GX,
>> it may be necessary to wait up to 500 ms for the CvP
>> status register bit assertion."
>> According to [2] the Cyclone 10 GX devices achieve proper operation
>> within 100 ms (via the PCIe IP core and CvP).
>>
>> I think (and here the documentation is a bit lacking),
>> that this remark is valid only for other bits of the status register,
>> e.g., CVP_CONFIG_DONE or USERMODE.
>> I also think, that the 500 ms delay is calculated from peripheral + core image programming
>> and that the time for peripheral image programming is far lower than that
>> (i.e., low enough to allow PCI enumeration).
>>
>> But if this actually means that it can take up to 500 ms to program the peripheral image,
>> than such FPGAs would have different problems.
>> I.e., missing the deadline for PCI enumeration.
>> This would need a solution outside of the scope of the
>> altera_cvp module (e.g., soft-reset to re-start enumeration with a stable system).
>>
>> Bottom line:
>> The CVP_EN should be deemed stable when altera_cvp is called,
>> if not,
>> the programming of the Intel/Altera FPGA and PCIe IP core has not been completed in time
>> for the enumeration of the PCI device. Hence it would be questionable or, more likely, would not
>> have completed successfully in the first place, i.e., altera_cvp would not have been called.
>
> Yeah I think this makes sense. If your config space isn't up on boot you
> would run into issues. I agree the docs are soemwhat vague here. Maybe Matthew or Alan can shoot
> an email to their HW folks internally to clarify?

My experience with cvp is with Arria10 and Stratix 10.  The PCIe Hard IP 
gets configured when the IOring gets configured at power on.  The idea is 
that the load of the IOring is very fast, much before the infamous 100ms
PCIe timeout for link training.  When the Hard IP is configured, the 
CVP_EN is set or cleared according to how it was configured.  Yes, you 
will run into issues if config space is not up on boot.  The exact nature 
of the issues is dependent on the platform being used.

For the record, if cvp is not being used then the initial full 
configuration of the FPGA can take much longer than just configuring the 
IOring.  In the worst case, if the initial FPGA image in flash is 
corrupted, it can take a while before the failover image gets configured 
into the fpga.  This might be the explanation for the 500 ms for Cyclone 
10 GX devices.  For an Arria10, the flash failover can take much longer 
than even the 500ms, which has been shown to have issues for many platforms.

Thanks,
Matthew

>
> Thanks,
> Moritz
>
Andreas Puhm Oct. 25, 2018, 8:44 a.m. UTC | #7
Hi Moritz, Matthew,

>> Hi Anatolij, Andreas,
>>
>> On Tue, Oct 23, 2018 at 06:46:47PM +0000, Andreas Puhm wrote:
>>> Hi Anatolij,
>>>
>>>> The CvP docs says that on some FPGAs (e.g. Arria 10) the assertion of CVP
>>>> status can take up to 500ms. However it is not clear whether this delay
>>>> might be required after peripheral image configuration and after PCIe
>>>> link activation. The diagram describing configuration sequence suggests
>>>> that CVP_EN should be polled until it is asserted. I can imaging the
>>>> situation that this bit is still not asserted when the device is being
>>>> probed. Maybe we should better defer device probing if CVP_EN bit is
>>>> cleared? When deferred probing fails again and sufficient period for
>>>> CVP_EN bit assertion elapsed, then stop deferred probing and return
>>>> -ENODEV?
>>>>
>>>> Thanks,
>>>>
>>>> Anatolij
>>>
>>> Anatolij, thank you for your feedback.
>>>
>>> My rationale behind the patch is as follows:
>>>
>>> The CVP_EN is part of the Hard PCIe IP core configuration,
>>> and therefore, has a defined and static value right from "the start".
>>>
>>> Remark in [1, fig 12]
>>> " For high density devices such as Intel Cyclone 10 GX,
>>> it may be necessary to wait up to 500 ms for the CvP
>>> status register bit assertion."
>>> According to [2] the Cyclone 10 GX devices achieve proper operation
>>> within 100 ms (via the PCIe IP core and CvP).
>>>
>>> I think (and here the documentation is a bit lacking),
>>> that this remark is valid only for other bits of the status register,
>>> e.g., CVP_CONFIG_DONE or USERMODE.
>>> I also think, that the 500 ms delay is calculated from peripheral + core image programming
>>> and that the time for peripheral image programming is far lower than that
>>> (i.e., low enough to allow PCI enumeration).
>>>
>>> But if this actually means that it can take up to 500 ms to program the peripheral image,
>>> than such FPGAs would have different problems.
>>> I.e., missing the deadline for PCI enumeration.
>>> This would need a solution outside of the scope of the
>>> altera_cvp module (e.g., soft-reset to re-start enumeration with a stable system).
>>>
>>> Bottom line:
>>> The CVP_EN should be deemed stable when altera_cvp is called,
>>> if not,
>>> the programming of the Intel/Altera FPGA and PCIe IP core has not been completed in time
>>> for the enumeration of the PCI device. Hence it would be questionable or, more likely, would not
>>> have completed successfully in the first place, i.e., altera_cvp would not have been called.
>>
>> Yeah I think this makes sense. If your config space isn't up on boot you
>> would run into issues. I agree the docs are soemwhat vague here. Maybe Matthew or Alan can shoot
>> an email to their HW folks internally to clarify?
>
>My experience with cvp is with Arria10 and Stratix 10.  The PCIe Hard IP
>gets configured when the IOring gets configured at power on.  The idea is
>that the load of the IOring is very fast, much before the infamous 100ms
>PCIe timeout for link training.  When the Hard IP is configured, the
>CVP_EN is set or cleared according to how it was configured.  Yes, you

So is it correct that the value of CVP_EN can be evaluated by the altera_cvp right in the first call of its probe function
(as would be the case with my proposed patch).

If it is, I will fix the remaining issues with the patch and submit it.

>will run into issues if config space is not up on boot.  The exact nature
>of the issues is dependent on the platform being used.
>
>For the record, if cvp is not being used then the initial full
>configuration of the FPGA can take much longer than just configuring the
>IOring.  In the worst case, if the initial FPGA image in flash is
>corrupted, it can take a while before the failover image gets configured
>into the fpga.  This might be the explanation for the 500 ms for Cyclone
>10 GX devices.  For an Arria10, the flash failover can take much longer
>than even the 500ms, which has been shown to have issues for many platforms.
>
>Thanks,
>Matthew
>
>>
>> Thanks,
>> Moritz
>>

Thanks,
Andreas
Moritz Fischer Oct. 28, 2018, 5:35 p.m. UTC | #8
Hi Andreas,

On Thu, Oct 25, 2018 at 08:44:06AM +0000, Andreas Puhm wrote:

> >My experience with cvp is with Arria10 and Stratix 10.  The PCIe Hard IP
> >gets configured when the IOring gets configured at power on.  The idea is
> >that the load of the IOring is very fast, much before the infamous 100ms
> >PCIe timeout for link training.  When the Hard IP is configured, the
> >CVP_EN is set or cleared according to how it was configured.  Yes, you
> 
> So is it correct that the value of CVP_EN can be evaluated by the altera_cvp right in the first call of its probe function
> (as would be the case with my proposed patch).
> 
> If it is, I will fix the remaining issues with the patch and submit it.

Yes please, go ahead and fix up the remaining issues (+ send it using
git send-email)

Thanks,
Moritz
diff mbox series

Patch

diff --git a/drivers/fpga/altera-cvp.c b/drivers/fpga/altera-cvp.c
index 7fa793672a7a..838abcfca0fb 100644
--- a/drivers/fpga/altera-cvp.c
+++ b/drivers/fpga/altera-cvp.c
@@ -403,6 +403,7 @@  static int altera_cvp_probe(struct pci_dev *pdev,
 	struct altera_cvp_conf *conf;
 	struct fpga_manager *mgr;
 	u16 cmd, val;
+	u32 val32;
 	int ret;
 
 	/*
@@ -416,6 +417,14 @@  static int altera_cvp_probe(struct pci_dev *pdev,
 		return -ENODEV;
 	}
 
+	pci_read_config_dword(pdev, VSE_CVP_STATUS, &val32);
+	if (!(val32 & VSE_CVP_STATUS_CVP_EN)) {
+		dev_err(&pdev->dev,
+			"CVP is disabled for this device: CVP_STATUS Reg 0x%x\n",
+			val32);
+		return -ENODEV;
+	}
+
 	conf = devm_kzalloc(&pdev->dev, sizeof(*conf), GFP_KERNEL);
 	if (!conf)
 		return -ENOMEM;