diff mbox series

[v2] fpga: dfl: afu: support Rev 2 of DFL Port feature

Message ID 20240125233715.861883-1-matthew.gerlach@linux.intel.com (mailing list archive)
State New
Headers show
Series [v2] fpga: dfl: afu: support Rev 2 of DFL Port feature | expand

Commit Message

Matthew Gerlach Jan. 25, 2024, 11:37 p.m. UTC
Revision 2 of the Device Feature List (DFL) Port feature
adds support for connecting the contents of the port to
multiple PCIe Physical Functions (PF).

This new functionality requires changing the port reset
behavior during FPGA and software initialization from
revision 1 of the port feature. With revision 1, the initial
state of the logic inside the port was not guaranteed to
be valid until a port reset was performed by software during
driver initialization. With revision 2, the initial state
of the logic inside the port is guaranteed to be valid,
and a port reset is not required during driver initialization.

This change in port reset behavior avoids a potential race
condition during PCI enumeration when a port is connected to
multiple PFs. Problems can occur if the driver attached to
the PF managing the port asserts reset in its probe function
when a driver attached to another PF accesses the port in its
own probe function. The potential problems include failed or hung
transaction layer packet (TLP) transactions and invalid data
being returned.

Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com>

---
v2:
- Update commit message for clarity
- Remove potentially confusing dev_info message.
---
 drivers/fpga/dfl-afu-main.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Xu Yilun Jan. 30, 2024, 9:55 a.m. UTC | #1
On Thu, Jan 25, 2024 at 03:37:15PM -0800, Matthew Gerlach wrote:
> Revision 2 of the Device Feature List (DFL) Port feature
> adds support for connecting the contents of the port to
> multiple PCIe Physical Functions (PF).
> 
> This new functionality requires changing the port reset
> behavior during FPGA and software initialization from
> revision 1 of the port feature. With revision 1, the initial
> state of the logic inside the port was not guaranteed to
> be valid until a port reset was performed by software during
> driver initialization. With revision 2, the initial state
> of the logic inside the port is guaranteed to be valid,
> and a port reset is not required during driver initialization.
> 
> This change in port reset behavior avoids a potential race
> condition during PCI enumeration when a port is connected to
> multiple PFs. Problems can occur if the driver attached to
> the PF managing the port asserts reset in its probe function
> when a driver attached to another PF accesses the port in its
> own probe function. The potential problems include failed or hung

Only racing during probe functions? I assume any time port_reset()
would fail TLPs for the other PF. And port_reset() could be triggered
at runtime by ioctl().

Thanks,
Yilun

> transaction layer packet (TLP) transactions and invalid data
> being returned.
> 
> Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com>
> 
> ---
> v2:
> - Update commit message for clarity
> - Remove potentially confusing dev_info message.
> ---
>  drivers/fpga/dfl-afu-main.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
> index c0a75ca360d6..42fe27660ab7 100644
> --- a/drivers/fpga/dfl-afu-main.c
> +++ b/drivers/fpga/dfl-afu-main.c
> @@ -417,7 +417,15 @@ static const struct attribute_group port_hdr_group = {
>  static int port_hdr_init(struct platform_device *pdev,
>  			 struct dfl_feature *feature)
>  {
> -	port_reset(pdev);
> +	void __iomem *base;
> +	u8 rev;
> +
> +	base = dfl_get_feature_ioaddr_by_id(&pdev->dev, PORT_FEATURE_ID_HEADER);
> +
> +	rev = dfl_feature_revision(base);
> +
> +	if (rev < 2)
> +		port_reset(pdev);
>  
>  	return 0;
>  }
> -- 
> 2.34.1
> 
>
Matthew Gerlach Jan. 30, 2024, 6 p.m. UTC | #2
On Tue, 30 Jan 2024, Xu Yilun wrote:

> On Thu, Jan 25, 2024 at 03:37:15PM -0800, Matthew Gerlach wrote:
>> Revision 2 of the Device Feature List (DFL) Port feature
>> adds support for connecting the contents of the port to
>> multiple PCIe Physical Functions (PF).
>>
>> This new functionality requires changing the port reset
>> behavior during FPGA and software initialization from
>> revision 1 of the port feature. With revision 1, the initial
>> state of the logic inside the port was not guaranteed to
>> be valid until a port reset was performed by software during
>> driver initialization. With revision 2, the initial state
>> of the logic inside the port is guaranteed to be valid,
>> and a port reset is not required during driver initialization.
>>
>> This change in port reset behavior avoids a potential race
>> condition during PCI enumeration when a port is connected to
>> multiple PFs. Problems can occur if the driver attached to
>> the PF managing the port asserts reset in its probe function
>> when a driver attached to another PF accesses the port in its
>> own probe function. The potential problems include failed or hung
>
> Only racing during probe functions? I assume any time port_reset()
> would fail TLPs for the other PF. And port_reset() could be triggered
> at runtime by ioctl().

Yes, a port_reset() triggered by ioctl could result in failed TLP for the 
other PFs. The user space SW performing the ioctl needs to ensure all PFs 
involved are properly quiesced before the port_reset is performed.

Do you want me to update the commit message with this information?

Thanks,
Matthew

>
> Thanks,
> Yilun
>
>> transaction layer packet (TLP) transactions and invalid data
>> being returned.
>>
>> Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com>
>>
>> ---
>> v2:
>> - Update commit message for clarity
>> - Remove potentially confusing dev_info message.
>> ---
>>  drivers/fpga/dfl-afu-main.c | 10 +++++++++-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
>> index c0a75ca360d6..42fe27660ab7 100644
>> --- a/drivers/fpga/dfl-afu-main.c
>> +++ b/drivers/fpga/dfl-afu-main.c
>> @@ -417,7 +417,15 @@ static const struct attribute_group port_hdr_group = {
>>  static int port_hdr_init(struct platform_device *pdev,
>>  			 struct dfl_feature *feature)
>>  {
>> -	port_reset(pdev);
>> +	void __iomem *base;
>> +	u8 rev;
>> +
>> +	base = dfl_get_feature_ioaddr_by_id(&pdev->dev, PORT_FEATURE_ID_HEADER);
>> +
>> +	rev = dfl_feature_revision(base);
>> +
>> +	if (rev < 2)
>> +		port_reset(pdev);
>>
>>  	return 0;
>>  }
>> --
>> 2.34.1
>>
>>
>
Xu Yilun Jan. 31, 2024, 5:43 a.m. UTC | #3
On Tue, Jan 30, 2024 at 10:00:16AM -0800, matthew.gerlach@linux.intel.com wrote:
> 
> 
> On Tue, 30 Jan 2024, Xu Yilun wrote:
> 
> > On Thu, Jan 25, 2024 at 03:37:15PM -0800, Matthew Gerlach wrote:
> > > Revision 2 of the Device Feature List (DFL) Port feature
> > > adds support for connecting the contents of the port to
> > > multiple PCIe Physical Functions (PF).
> > > 
> > > This new functionality requires changing the port reset
> > > behavior during FPGA and software initialization from
> > > revision 1 of the port feature. With revision 1, the initial
> > > state of the logic inside the port was not guaranteed to
> > > be valid until a port reset was performed by software during
> > > driver initialization. With revision 2, the initial state
> > > of the logic inside the port is guaranteed to be valid,
> > > and a port reset is not required during driver initialization.
> > > 
> > > This change in port reset behavior avoids a potential race
> > > condition during PCI enumeration when a port is connected to
> > > multiple PFs. Problems can occur if the driver attached to
> > > the PF managing the port asserts reset in its probe function
> > > when a driver attached to another PF accesses the port in its
> > > own probe function. The potential problems include failed or hung
> > 
> > Only racing during probe functions? I assume any time port_reset()
> > would fail TLPs for the other PF. And port_reset() could be triggered
> > at runtime by ioctl().
> 
> Yes, a port_reset() triggered by ioctl could result in failed TLP for the
> other PFs. The user space SW performing the ioctl needs to ensure all PFs
> involved are properly quiesced before the port_reset is performed.

How would user get an insight into other PF drivers to know everything
is quiesced?  I mean do we need driver level management for this?

Thanks,
Yilun

> 
> Do you want me to update the commit message with this information?
> 
> Thanks,
> Matthew
Matthew Gerlach Feb. 1, 2024, 12:26 a.m. UTC | #4
On Wed, 31 Jan 2024, Xu Yilun wrote:

> On Tue, Jan 30, 2024 at 10:00:16AM -0800, matthew.gerlach@linux.intel.com wrote:
>>
>>
>> On Tue, 30 Jan 2024, Xu Yilun wrote:
>>
>>> On Thu, Jan 25, 2024 at 03:37:15PM -0800, Matthew Gerlach wrote:
>>>> Revision 2 of the Device Feature List (DFL) Port feature
>>>> adds support for connecting the contents of the port to
>>>> multiple PCIe Physical Functions (PF).
>>>>
>>>> This new functionality requires changing the port reset
>>>> behavior during FPGA and software initialization from
>>>> revision 1 of the port feature. With revision 1, the initial
>>>> state of the logic inside the port was not guaranteed to
>>>> be valid until a port reset was performed by software during
>>>> driver initialization. With revision 2, the initial state
>>>> of the logic inside the port is guaranteed to be valid,
>>>> and a port reset is not required during driver initialization.
>>>>
>>>> This change in port reset behavior avoids a potential race
>>>> condition during PCI enumeration when a port is connected to
>>>> multiple PFs. Problems can occur if the driver attached to
>>>> the PF managing the port asserts reset in its probe function
>>>> when a driver attached to another PF accesses the port in its
>>>> own probe function. The potential problems include failed or hung
>>>
>>> Only racing during probe functions? I assume any time port_reset()
>>> would fail TLPs for the other PF. And port_reset() could be triggered
>>> at runtime by ioctl().
>>
>> Yes, a port_reset() triggered by ioctl could result in failed TLP for the
>> other PFs. The user space SW performing the ioctl needs to ensure all PFs
>> involved are properly quiesced before the port_reset is performed.
>
> How would user get an insight into other PF drivers to know everything
> is quiesced?  I mean do we need driver level management for this?

Since this is an FPGA, the number of other PFs and the drivers bound to 
those PFs depends on the FPGA image. There would also be user space 
software stacks involved with the other PFs as well. The user would have 
to ensure all the SW stacks and drivers are quiesced as appropriate for the 
FPGA image. I don't think the driver performing the port_reset() can know 
all the components to be able to provide any meaningful management.

Thanks,
Matthew

>
> Thanks,
> Yilun
>
>>
>> Do you want me to update the commit message with this information?
>>
>> Thanks,
>> Matthew
>
>
Xu Yilun Feb. 5, 2024, 2:31 a.m. UTC | #5
On Wed, Jan 31, 2024 at 04:26:27PM -0800, matthew.gerlach@linux.intel.com wrote:
> 
> 
> On Wed, 31 Jan 2024, Xu Yilun wrote:
> 
> > On Tue, Jan 30, 2024 at 10:00:16AM -0800, matthew.gerlach@linux.intel.com wrote:
> > > 
> > > 
> > > On Tue, 30 Jan 2024, Xu Yilun wrote:
> > > 
> > > > On Thu, Jan 25, 2024 at 03:37:15PM -0800, Matthew Gerlach wrote:
> > > > > Revision 2 of the Device Feature List (DFL) Port feature
> > > > > adds support for connecting the contents of the port to
> > > > > multiple PCIe Physical Functions (PF).
> > > > > 
> > > > > This new functionality requires changing the port reset
> > > > > behavior during FPGA and software initialization from
> > > > > revision 1 of the port feature. With revision 1, the initial
> > > > > state of the logic inside the port was not guaranteed to
> > > > > be valid until a port reset was performed by software during
> > > > > driver initialization. With revision 2, the initial state
> > > > > of the logic inside the port is guaranteed to be valid,
> > > > > and a port reset is not required during driver initialization.
> > > > > 
> > > > > This change in port reset behavior avoids a potential race
> > > > > condition during PCI enumeration when a port is connected to
> > > > > multiple PFs. Problems can occur if the driver attached to
> > > > > the PF managing the port asserts reset in its probe function
> > > > > when a driver attached to another PF accesses the port in its
> > > > > own probe function. The potential problems include failed or hung
> > > > 
> > > > Only racing during probe functions? I assume any time port_reset()
> > > > would fail TLPs for the other PF. And port_reset() could be triggered
> > > > at runtime by ioctl().
> > > 
> > > Yes, a port_reset() triggered by ioctl could result in failed TLP for the
> > > other PFs. The user space SW performing the ioctl needs to ensure all PFs
> > > involved are properly quiesced before the port_reset is performed.
> > 
> > How would user get an insight into other PF drivers to know everything
> > is quiesced?  I mean do we need driver level management for this?
> 
> Since this is an FPGA, the number of other PFs and the drivers bound to
> those PFs depends on the FPGA image. There would also be user space software
> stacks involved with the other PFs as well. The user would have to ensure
> all the SW stacks and drivers are quiesced as appropriate for the FPGA

User may not know everything about the device, they only get part of the
controls that drivers grant. This is still true for vfio + userspace
drivers.

> image. I don't think the driver performing the port_reset() can know all the

Other PF drivers should know their own components. They should be aware
that their devices are being reset. 

> components to be able to provide any meaningful management.

If the reset provider and reset consumer are not in the same driver,
they should interact with each other. IIRC, some reset controller class
works for this purpose.

Thanks,
Yilun

> 
> Thanks,
> Matthew
> 
> > 
> > Thanks,
> > Yilun
> > 
> > > 
> > > Do you want me to update the commit message with this information?
> > > 
> > > Thanks,
> > > Matthew
> > 
> >
Matthew Gerlach Feb. 7, 2024, 4:40 p.m. UTC | #6
On Mon, 5 Feb 2024, Xu Yilun wrote:

> On Wed, Jan 31, 2024 at 04:26:27PM -0800, matthew.gerlach@linux.intel.com wrote:
>>
>>
>> On Wed, 31 Jan 2024, Xu Yilun wrote:
>>
>>> On Tue, Jan 30, 2024 at 10:00:16AM -0800, matthew.gerlach@linux.intel.com wrote:
>>>>
>>>>
>>>> On Tue, 30 Jan 2024, Xu Yilun wrote:
>>>>
>>>>> On Thu, Jan 25, 2024 at 03:37:15PM -0800, Matthew Gerlach wrote:
>>>>>> Revision 2 of the Device Feature List (DFL) Port feature
>>>>>> adds support for connecting the contents of the port to
>>>>>> multiple PCIe Physical Functions (PF).
>>>>>>
>>>>>> This new functionality requires changing the port reset
>>>>>> behavior during FPGA and software initialization from
>>>>>> revision 1 of the port feature. With revision 1, the initial
>>>>>> state of the logic inside the port was not guaranteed to
>>>>>> be valid until a port reset was performed by software during
>>>>>> driver initialization. With revision 2, the initial state
>>>>>> of the logic inside the port is guaranteed to be valid,
>>>>>> and a port reset is not required during driver initialization.
>>>>>>
>>>>>> This change in port reset behavior avoids a potential race
>>>>>> condition during PCI enumeration when a port is connected to
>>>>>> multiple PFs. Problems can occur if the driver attached to
>>>>>> the PF managing the port asserts reset in its probe function
>>>>>> when a driver attached to another PF accesses the port in its
>>>>>> own probe function. The potential problems include failed or hung
>>>>>
>>>>> Only racing during probe functions? I assume any time port_reset()
>>>>> would fail TLPs for the other PF. And port_reset() could be triggered
>>>>> at runtime by ioctl().
>>>>
>>>> Yes, a port_reset() triggered by ioctl could result in failed TLP for the
>>>> other PFs. The user space SW performing the ioctl needs to ensure all PFs
>>>> involved are properly quiesced before the port_reset is performed.
>>>
>>> How would user get an insight into other PF drivers to know everything
>>> is quiesced?  I mean do we need driver level management for this?
>>
>> Since this is an FPGA, the number of other PFs and the drivers bound to
>> those PFs depends on the FPGA image. There would also be user space software
>> stacks involved with the other PFs as well. The user would have to ensure
>> all the SW stacks and drivers are quiesced as appropriate for the FPGA
>
> User may not know everything about the device, they only get part of the
> controls that drivers grant. This is still true for vfio + userspace
> drivers.

A user performing a port reset would have to know the impact to the 
specific FPGA image being run in order to ensure all SW stacks are ready 
for the reset.

>
>> image. I don't think the driver performing the port_reset() can know all the
>
> Other PF drivers should know their own components. They should be aware
> that their devices are being reset.

The other PF drivers depend on the actual FPGA image being run.

>
>> components to be able to provide any meaningful management.
>
> If the reset provider and reset consumer are not in the same driver,
> they should interact with each other. IIRC, some reset controller class
> works for this purpose.

The other PFs in many cases can present as standard devices with existing 
drivers like virtio-net or virtio-blk. It does not seem desireable 
to have to change existing drivers for a particular FPGA implementation

Thanks,
Matthew
>
> Thanks,
> Yilun
>
>>
>> Thanks,
>> Matthew
>>
>>>
>>> Thanks,
>>> Yilun
>>>
>>>>
>>>> Do you want me to update the commit message with this information?
>>>>
>>>> Thanks,
>>>> Matthew
>>>
>>>
>
>
Xu Yilun Feb. 18, 2024, 3:23 a.m. UTC | #7
On Wed, Feb 07, 2024 at 08:40:55AM -0800, matthew.gerlach@linux.intel.com wrote:
> 
> 
> On Mon, 5 Feb 2024, Xu Yilun wrote:
> 
> > On Wed, Jan 31, 2024 at 04:26:27PM -0800, matthew.gerlach@linux.intel.com wrote:
> > > 
> > > 
> > > On Wed, 31 Jan 2024, Xu Yilun wrote:
> > > 
> > > > On Tue, Jan 30, 2024 at 10:00:16AM -0800, matthew.gerlach@linux.intel.com wrote:
> > > > > 
> > > > > 
> > > > > On Tue, 30 Jan 2024, Xu Yilun wrote:
> > > > > 
> > > > > > On Thu, Jan 25, 2024 at 03:37:15PM -0800, Matthew Gerlach wrote:
> > > > > > > Revision 2 of the Device Feature List (DFL) Port feature
> > > > > > > adds support for connecting the contents of the port to
> > > > > > > multiple PCIe Physical Functions (PF).
> > > > > > > 
> > > > > > > This new functionality requires changing the port reset
> > > > > > > behavior during FPGA and software initialization from
> > > > > > > revision 1 of the port feature. With revision 1, the initial
> > > > > > > state of the logic inside the port was not guaranteed to
> > > > > > > be valid until a port reset was performed by software during
> > > > > > > driver initialization. With revision 2, the initial state
> > > > > > > of the logic inside the port is guaranteed to be valid,
> > > > > > > and a port reset is not required during driver initialization.
> > > > > > > 
> > > > > > > This change in port reset behavior avoids a potential race
> > > > > > > condition during PCI enumeration when a port is connected to
> > > > > > > multiple PFs. Problems can occur if the driver attached to
> > > > > > > the PF managing the port asserts reset in its probe function
> > > > > > > when a driver attached to another PF accesses the port in its
> > > > > > > own probe function. The potential problems include failed or hung
> > > > > > 
> > > > > > Only racing during probe functions? I assume any time port_reset()
> > > > > > would fail TLPs for the other PF. And port_reset() could be triggered
> > > > > > at runtime by ioctl().
> > > > > 
> > > > > Yes, a port_reset() triggered by ioctl could result in failed TLP for the
> > > > > other PFs. The user space SW performing the ioctl needs to ensure all PFs
> > > > > involved are properly quiesced before the port_reset is performed.
> > > > 
> > > > How would user get an insight into other PF drivers to know everything
> > > > is quiesced?  I mean do we need driver level management for this?
> > > 
> > > Since this is an FPGA, the number of other PFs and the drivers bound to
> > > those PFs depends on the FPGA image. There would also be user space software
> > > stacks involved with the other PFs as well. The user would have to ensure
> > > all the SW stacks and drivers are quiesced as appropriate for the FPGA
> > 
> > User may not know everything about the device, they only get part of the
> > controls that drivers grant. This is still true for vfio + userspace
> > drivers.
> 
> A user performing a port reset would have to know the impact to the specific
> FPGA image being run in order to ensure all SW stacks are ready for the
> reset.

We are not going to change the logic of the whole driver model just
because the device is backed up by an FPGA image.  The *driver* should be
fully responsible for matched devices.  A HW reset unaware to the
device driver is not wanted.  Assuming that the userspace could control
every access to device makes no sense.

For your case, there is no garantee userspace could block every access
to "other PF" initiated by "other PF" driver.  There is also no
notification to "other PF" driver that userspace is doing reset to
"other PF" via "management PF" interface.  "other PF" driver just break
on reset.

> 
> > 
> > > image. I don't think the driver performing the port_reset() can know all the
> > 
> > Other PF drivers should know their own components. They should be aware
> > that their devices are being reset.
> 
> The other PF drivers depend on the actual FPGA image being run.
> 
> > 
> > > components to be able to provide any meaningful management.
> > 
> > If the reset provider and reset consumer are not in the same driver,
> > they should interact with each other. IIRC, some reset controller class
> > works for this purpose.
> 
> The other PFs in many cases can present as standard devices with existing
> drivers like virtio-net or virtio-blk. It does not seem desireable to have
> to change existing drivers for a particular FPGA implementation

If you have to use a specific method for reset, it is not a standard virtio
pci device, and you have to make some change.

Thanks,
Yilun

> 
> Thanks,
> Matthew
> > 
> > Thanks,
> > Yilun
> > 
> > > 
> > > Thanks,
> > > Matthew
> > > 
> > > > 
> > > > Thanks,
> > > > Yilun
> > > > 
> > > > > 
> > > > > Do you want me to update the commit message with this information?
> > > > > 
> > > > > Thanks,
> > > > > Matthew
> > > > 
> > > > 
> > 
> >
Matthew Gerlach Feb. 21, 2024, 1:49 a.m. UTC | #8
On Sun, 18 Feb 2024, Xu Yilun wrote:

> On Wed, Feb 07, 2024 at 08:40:55AM -0800, matthew.gerlach@linux.intel.com wrote:
>>
>>
>> On Mon, 5 Feb 2024, Xu Yilun wrote:
>>
>>> On Wed, Jan 31, 2024 at 04:26:27PM -0800, matthew.gerlach@linux.intel.com wrote:
>>>>
>>>>
>>>> On Wed, 31 Jan 2024, Xu Yilun wrote:
>>>>
>>>>> On Tue, Jan 30, 2024 at 10:00:16AM -0800, matthew.gerlach@linux.intel.com wrote:
>>>>>>
>>>>>>
>>>>>> On Tue, 30 Jan 2024, Xu Yilun wrote:
>>>>>>
>>>>>>> On Thu, Jan 25, 2024 at 03:37:15PM -0800, Matthew Gerlach wrote:
>>>>>>>> Revision 2 of the Device Feature List (DFL) Port feature
>>>>>>>> adds support for connecting the contents of the port to
>>>>>>>> multiple PCIe Physical Functions (PF).
>>>>>>>>
>>>>>>>> This new functionality requires changing the port reset
>>>>>>>> behavior during FPGA and software initialization from
>>>>>>>> revision 1 of the port feature. With revision 1, the initial
>>>>>>>> state of the logic inside the port was not guaranteed to
>>>>>>>> be valid until a port reset was performed by software during
>>>>>>>> driver initialization. With revision 2, the initial state
>>>>>>>> of the logic inside the port is guaranteed to be valid,
>>>>>>>> and a port reset is not required during driver initialization.
>>>>>>>>
>>>>>>>> This change in port reset behavior avoids a potential race
>>>>>>>> condition during PCI enumeration when a port is connected to
>>>>>>>> multiple PFs. Problems can occur if the driver attached to
>>>>>>>> the PF managing the port asserts reset in its probe function
>>>>>>>> when a driver attached to another PF accesses the port in its
>>>>>>>> own probe function. The potential problems include failed or hung
>>>>>>>
>>>>>>> Only racing during probe functions? I assume any time port_reset()
>>>>>>> would fail TLPs for the other PF. And port_reset() could be triggered
>>>>>>> at runtime by ioctl().
>>>>>>
>>>>>> Yes, a port_reset() triggered by ioctl could result in failed TLP for the
>>>>>> other PFs. The user space SW performing the ioctl needs to ensure all PFs
>>>>>> involved are properly quiesced before the port_reset is performed.
>>>>>
>>>>> How would user get an insight into other PF drivers to know everything
>>>>> is quiesced?  I mean do we need driver level management for this?
>>>>
>>>> Since this is an FPGA, the number of other PFs and the drivers bound to
>>>> those PFs depends on the FPGA image. There would also be user space software
>>>> stacks involved with the other PFs as well. The user would have to ensure
>>>> all the SW stacks and drivers are quiesced as appropriate for the FPGA
>>>
>>> User may not know everything about the device, they only get part of the
>>> controls that drivers grant. This is still true for vfio + userspace
>>> drivers.
>>
>> A user performing a port reset would have to know the impact to the specific
>> FPGA image being run in order to ensure all SW stacks are ready for the
>> reset.
>
> We are not going to change the logic of the whole driver model just
> because the device is backed up by an FPGA image.  The *driver* should be
> fully responsible for matched devices.  A HW reset unaware to the
> device driver is not wanted.  Assuming that the userspace could control
> every access to device makes no sense.
>
> For your case, there is no garantee userspace could block every access
> to "other PF" initiated by "other PF" driver.  There is also no
> notification to "other PF" driver that userspace is doing reset to
> "other PF" via "management PF" interface.  "other PF" driver just break
> on reset.

Hi Yilun,

I think this conversation has gotten a little off track. This patch 
only changes the port reset behavior at driver initialization for revision 
2 of the port IP. The behavior and the requirements of port reset during 
run time have not changed. The existing implementation requires the user 
performing the port reset to ensure appropriate SW was quiesced. This 
patch does not change this requirement.

>
>>
>>>
>>>> image. I don't think the driver performing the port_reset() can know all the
>>>
>>> Other PF drivers should know their own components. They should be aware
>>> that their devices are being reset.
>>
>> The other PF drivers depend on the actual FPGA image being run.
>>
>>>
>>>> components to be able to provide any meaningful management.
>>>
>>> If the reset provider and reset consumer are not in the same driver,
>>> they should interact with each other. IIRC, some reset controller class
>>> works for this purpose.
>>
>> The other PFs in many cases can present as standard devices with existing
>> drivers like virtio-net or virtio-blk. It does not seem desireable to have
>> to change existing drivers for a particular FPGA implementation
>
> If you have to use a specific method for reset, it is not a standard virtio
> pci device, and you have to make some change.

From the perspective of the PCI PF/VF implementing the virtio or 
other standard device, the pci endpoint is completely compliant to the 
standard, and no driver changes should be required. As mentioned above, 
this patch does nothing to change any of this behavior. Consider a port 
reset that is part of a partial configuration flow. The virtio endpoint 
can become something completely different with a completely different 
driver. This patch does not affect this flow either.

Thanks,
Matthew

>
> Thanks,
> Yilun
>
>>
>> Thanks,
>> Matthew
>>>
>>> Thanks,
>>> Yilun
>>>
>>>>
>>>> Thanks,
>>>> Matthew
>>>>
>>>>>
>>>>> Thanks,
>>>>> Yilun
>>>>>
>>>>>>
>>>>>> Do you want me to update the commit message with this information?
>>>>>>
>>>>>> Thanks,
>>>>>> Matthew
>>>>>
>>>>>
>>>
>>>
>
Xu Yilun Feb. 21, 2024, 4:14 p.m. UTC | #9
On Tue, Feb 20, 2024 at 05:49:04PM -0800, matthew.gerlach@linux.intel.com wrote:
> 
> 
> On Sun, 18 Feb 2024, Xu Yilun wrote:
> 
> > On Wed, Feb 07, 2024 at 08:40:55AM -0800, matthew.gerlach@linux.intel.com wrote:
> > > 
> > > 
> > > On Mon, 5 Feb 2024, Xu Yilun wrote:
> > > 
> > > > On Wed, Jan 31, 2024 at 04:26:27PM -0800, matthew.gerlach@linux.intel.com wrote:
> > > > > 
> > > > > 
> > > > > On Wed, 31 Jan 2024, Xu Yilun wrote:
> > > > > 
> > > > > > On Tue, Jan 30, 2024 at 10:00:16AM -0800, matthew.gerlach@linux.intel.com wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > On Tue, 30 Jan 2024, Xu Yilun wrote:
> > > > > > > 
> > > > > > > > On Thu, Jan 25, 2024 at 03:37:15PM -0800, Matthew Gerlach wrote:
> > > > > > > > > Revision 2 of the Device Feature List (DFL) Port feature
> > > > > > > > > adds support for connecting the contents of the port to
> > > > > > > > > multiple PCIe Physical Functions (PF).
> > > > > > > > > 
> > > > > > > > > This new functionality requires changing the port reset
> > > > > > > > > behavior during FPGA and software initialization from
> > > > > > > > > revision 1 of the port feature. With revision 1, the initial
> > > > > > > > > state of the logic inside the port was not guaranteed to
> > > > > > > > > be valid until a port reset was performed by software during
> > > > > > > > > driver initialization. With revision 2, the initial state
> > > > > > > > > of the logic inside the port is guaranteed to be valid,
> > > > > > > > > and a port reset is not required during driver initialization.
> > > > > > > > > 
> > > > > > > > > This change in port reset behavior avoids a potential race
> > > > > > > > > condition during PCI enumeration when a port is connected to
> > > > > > > > > multiple PFs. Problems can occur if the driver attached to
> > > > > > > > > the PF managing the port asserts reset in its probe function
> > > > > > > > > when a driver attached to another PF accesses the port in its
> > > > > > > > > own probe function. The potential problems include failed or hung
> > > > > > > > 
> > > > > > > > Only racing during probe functions? I assume any time port_reset()
> > > > > > > > would fail TLPs for the other PF. And port_reset() could be triggered
> > > > > > > > at runtime by ioctl().
> > > > > > > 
> > > > > > > Yes, a port_reset() triggered by ioctl could result in failed TLP for the
> > > > > > > other PFs. The user space SW performing the ioctl needs to ensure all PFs
> > > > > > > involved are properly quiesced before the port_reset is performed.
> > > > > > 
> > > > > > How would user get an insight into other PF drivers to know everything
> > > > > > is quiesced?  I mean do we need driver level management for this?
> > > > > 
> > > > > Since this is an FPGA, the number of other PFs and the drivers bound to
> > > > > those PFs depends on the FPGA image. There would also be user space software
> > > > > stacks involved with the other PFs as well. The user would have to ensure
> > > > > all the SW stacks and drivers are quiesced as appropriate for the FPGA
> > > > 
> > > > User may not know everything about the device, they only get part of the
> > > > controls that drivers grant. This is still true for vfio + userspace
> > > > drivers.
> > > 
> > > A user performing a port reset would have to know the impact to the specific
> > > FPGA image being run in order to ensure all SW stacks are ready for the
> > > reset.
> > 
> > We are not going to change the logic of the whole driver model just
> > because the device is backed up by an FPGA image.  The *driver* should be
> > fully responsible for matched devices.  A HW reset unaware to the
> > device driver is not wanted.  Assuming that the userspace could control
> > every access to device makes no sense.
> > 
> > For your case, there is no garantee userspace could block every access
> > to "other PF" initiated by "other PF" driver.  There is also no
> > notification to "other PF" driver that userspace is doing reset to
> > "other PF" via "management PF" interface.  "other PF" driver just break
> > on reset.
> 
> Hi Yilun,
> 
> I think this conversation has gotten a little off track. This patch only
> changes the port reset behavior at driver initialization for revision 2 of
> the port IP. The behavior and the requirements of port reset during run time
> have not changed. The existing implementation requires the user performing
> the port reset to ensure appropriate SW was quiesced. This patch does not
> change this requirement.

You are actually adding support to the new devices behavior defined by
revision 2.  Previously the user requires the management PF driver to do port
reset, and this only affects some logic in the PF itself and the impact could
be handled inside the driver.  The current PF driver doesn't actually do anything
because it (or any kernel component) never touches the disabled logic, and the
user accessing of the mmapped but disabled logic won't cause system issues.

But the new management PF does port reset and affect other PFs, and may
cause disorder in other drivers. so you need extra code to support the
behavior.

This patch does make some progress, as you said, "With revision 2,the
initial state of the logic inside the port is guaranteed to be valid, and
a port reset is not required during driver initialization".  But it
should not be the only patch to enable the new port reset behavior.

> 
> > 
> > > 
> > > > 
> > > > > image. I don't think the driver performing the port_reset() can know all the
> > > > 
> > > > Other PF drivers should know their own components. They should be aware
> > > > that their devices are being reset.
> > > 
> > > The other PF drivers depend on the actual FPGA image being run.
> > > 
> > > > 
> > > > > components to be able to provide any meaningful management.
> > > > 
> > > > If the reset provider and reset consumer are not in the same driver,
> > > > they should interact with each other. IIRC, some reset controller class
> > > > works for this purpose.
> > > 
> > > The other PFs in many cases can present as standard devices with existing
> > > drivers like virtio-net or virtio-blk. It does not seem desireable to have
> > > to change existing drivers for a particular FPGA implementation
> > 
> > If you have to use a specific method for reset, it is not a standard virtio
> > pci device, and you have to make some change.
> 
> From the perspective of the PCI PF/VF implementing the virtio or other
> standard device, the pci endpoint is completely compliant to the standard,
> and no driver changes should be required. As mentioned above, this patch
> does nothing to change any of this behavior. Consider a port reset that is
> part of a partial configuration flow. The virtio endpoint can become
> something completely different with a completely different driver. This

Then how could the virtio driver stop and remove itself to avoid crashing the
kernel, and how could the new driver be probed.  If the answer is still
userspace responsible for everything, that's not good to me.

Thanks,
Yilun

> patch does not affect this flow either.
> 
> Thanks,
> Matthew
> 
> > 
> > Thanks,
> > Yilun
> > 
> > > 
> > > Thanks,
> > > Matthew
> > > > 
> > > > Thanks,
> > > > Yilun
> > > > 
> > > > > 
> > > > > Thanks,
> > > > > Matthew
> > > > > 
> > > > > > 
> > > > > > Thanks,
> > > > > > Yilun
> > > > > > 
> > > > > > > 
> > > > > > > Do you want me to update the commit message with this information?
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > Matthew
> > > > > > 
> > > > > > 
> > > > 
> > > > 
> >
Matthew Gerlach March 1, 2024, 8:39 p.m. UTC | #10
[snip]

>>
>> Hi Yilun,
>>
>> I think this conversation has gotten a little off track. This patch only
>> changes the port reset behavior at driver initialization for revision 2 of
>> the port IP. The behavior and the requirements of port reset during run time
>> have not changed. The existing implementation requires the user performing
>> the port reset to ensure appropriate SW was quiesced. This patch does not
>> change this requirement.
>
> You are actually adding support to the new devices behavior defined by
> revision 2.  Previously the user requires the management PF driver to do port
> reset, and this only affects some logic in the PF itself and the impact could
> be handled inside the driver.  The current PF driver doesn't actually do anything
> because it (or any kernel component) never touches the disabled logic, and the
> user accessing of the mmapped but disabled logic won't cause system issues.

I'm wondering about the use case when Virtual Functions (VFs) 
are enabled with the current driver. If partial reconfiguration of the port 
occurs, which includes a reset, what happens to the user software attached 
to the VFs when the hardware has changed out from under it? While the 
current port implementation won't cause PCIe errors when the logic is 
disable, something has to notify the user software that the hardware 
changed. I think the user space software would be typically notified of 
the change by the orchestration software performing the partial 
reconfiguration. I think the use case of VFs with the original port 
implementation is logically equivalent to the new port implementation with 
multiple PFs/VFs.

>
> But the new management PF does port reset and affect other PFs, and may
> cause disorder in other drivers. so you need extra code to support the
> behavior.

I think the real problem is that the new port is not doing as good of a 
job preventing PCIe errors during port reset as the previous version.

Matthew
>
> This patch does make some progress, as you said, "With revision 2,the
> initial state of the logic inside the port is guaranteed to be valid, and
> a port reset is not required during driver initialization".  But it
> should not be the only patch to enable the new port reset behavior.
>
>>
>>>
>>>>
>>>>>
>>>>>> image. I don't think the driver performing the port_reset() can know all the
>>>>>
>>>>> Other PF drivers should know their own components. They should be aware
>>>>> that their devices are being reset.
>>>>
>>>> The other PF drivers depend on the actual FPGA image being run.
>>>>
>>>>>
>>>>>> components to be able to provide any meaningful management.
>>>>>
>>>>> If the reset provider and reset consumer are not in the same driver,
>>>>> they should interact with each other. IIRC, some reset controller class
>>>>> works for this purpose.
>>>>
>>>> The other PFs in many cases can present as standard devices with existing
>>>> drivers like virtio-net or virtio-blk. It does not seem desireable to have
>>>> to change existing drivers for a particular FPGA implementation
>>>
>>> If you have to use a specific method for reset, it is not a standard virtio
>>> pci device, and you have to make some change.
>>
>> From the perspective of the PCI PF/VF implementing the virtio or other
>> standard device, the pci endpoint is completely compliant to the standard,
>> and no driver changes should be required. As mentioned above, this patch
>> does nothing to change any of this behavior. Consider a port reset that is
>> part of a partial configuration flow. The virtio endpoint can become
>> something completely different with a completely different driver. This
>
> Then how could the virtio driver stop and remove itself to avoid crashing the
> kernel, and how could the new driver be probed.  If the answer is still
> userspace responsible for everything, that's not good to me.
>
> Thanks,
> Yilun
>
>> patch does not affect this flow either.
>>
>> Thanks,
>> Matthew
>>
>>>
>>> Thanks,
>>> Yilun
>>>
>>>>
>>>> Thanks,
>>>> Matthew
>>>>>
>>>>> Thanks,
>>>>> Yilun
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Matthew
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Yilun
>>>>>>>
>>>>>>>>
>>>>>>>> Do you want me to update the commit message with this information?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Matthew
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>
>
Xu Yilun March 2, 2024, 11:39 a.m. UTC | #11
On Fri, Mar 01, 2024 at 12:39:24PM -0800, matthew.gerlach@linux.intel.com wrote:
> 
> 
> [snip]
> 
> > > 
> > > Hi Yilun,
> > > 
> > > I think this conversation has gotten a little off track. This patch only
> > > changes the port reset behavior at driver initialization for revision 2 of
> > > the port IP. The behavior and the requirements of port reset during run time
> > > have not changed. The existing implementation requires the user performing
> > > the port reset to ensure appropriate SW was quiesced. This patch does not
> > > change this requirement.
> > 
> > You are actually adding support to the new devices behavior defined by
> > revision 2.  Previously the user requires the management PF driver to do port
> > reset, and this only affects some logic in the PF itself and the impact could
> > be handled inside the driver.  The current PF driver doesn't actually do anything
> > because it (or any kernel component) never touches the disabled logic, and the
> > user accessing of the mmapped but disabled logic won't cause system issues.
> 
> I'm wondering about the use case when Virtual Functions (VFs) are enabled
> with the current driver. If partial reconfiguration of the port occurs,
> which includes a reset, what happens to the user software attached to the
> VFs when the hardware has changed out from under it? While the current port

Ah, yes. That may be the problem for user, you can fix it.  But the fix
could actually be easier, cause the port driver is fully aware of the
reset request and the reset flow is under its control.

> implementation won't cause PCIe errors when the logic is disable, something
> has to notify the user software that the hardware changed. I think the user
> space software would be typically notified of the change by the
> orchestration software performing the partial reconfiguration. I think the
> use case of VFs with the original port implementation is logically
> equivalent to the new port implementation with multiple PFs/VFs.

The difference is that, the original port driver is fully responsible for the
reset flow and the AFU mapping, while the new port implementation has no
control to the to-be-resetted logic, it blindly does the reset.

> 
> > 
> > But the new management PF does port reset and affect other PFs, and may
> > cause disorder in other drivers. so you need extra code to support the
> > behavior.
> 
> I think the real problem is that the new port is not doing as good of a job
> preventing PCIe errors during port reset as the previous version.

Even if the reset won't cause PCIe errors, the other PF/VF drivers may
still meet HW access issues not understandable by themselves and cause
kernel level panic.

Thanks,
Yilun

> 
> Matthew
> > 
> > This patch does make some progress, as you said, "With revision 2,the
> > initial state of the logic inside the port is guaranteed to be valid, and
> > a port reset is not required during driver initialization".  But it
> > should not be the only patch to enable the new port reset behavior.
> > 
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > > image. I don't think the driver performing the port_reset() can know all the
> > > > > > 
> > > > > > Other PF drivers should know their own components. They should be aware
> > > > > > that their devices are being reset.
> > > > > 
> > > > > The other PF drivers depend on the actual FPGA image being run.
> > > > > 
> > > > > > 
> > > > > > > components to be able to provide any meaningful management.
> > > > > > 
> > > > > > If the reset provider and reset consumer are not in the same driver,
> > > > > > they should interact with each other. IIRC, some reset controller class
> > > > > > works for this purpose.
> > > > > 
> > > > > The other PFs in many cases can present as standard devices with existing
> > > > > drivers like virtio-net or virtio-blk. It does not seem desireable to have
> > > > > to change existing drivers for a particular FPGA implementation
> > > > 
> > > > If you have to use a specific method for reset, it is not a standard virtio
> > > > pci device, and you have to make some change.
> > > 
> > > From the perspective of the PCI PF/VF implementing the virtio or other
> > > standard device, the pci endpoint is completely compliant to the standard,
> > > and no driver changes should be required. As mentioned above, this patch
> > > does nothing to change any of this behavior. Consider a port reset that is
> > > part of a partial configuration flow. The virtio endpoint can become
> > > something completely different with a completely different driver. This
> > 
> > Then how could the virtio driver stop and remove itself to avoid crashing the
> > kernel, and how could the new driver be probed.  If the answer is still
> > userspace responsible for everything, that's not good to me.
> > 
> > Thanks,
> > Yilun
> > 
> > > patch does not affect this flow either.
> > > 
> > > Thanks,
> > > Matthew
> > > 
> > > > 
> > > > Thanks,
> > > > Yilun
> > > > 
> > > > > 
> > > > > Thanks,
> > > > > Matthew
> > > > > > 
> > > > > > Thanks,
> > > > > > Yilun
> > > > > > 
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > Matthew
> > > > > > > 
> > > > > > > > 
> > > > > > > > Thanks,
> > > > > > > > Yilun
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Do you want me to update the commit message with this information?
> > > > > > > > > 
> > > > > > > > > Thanks,
> > > > > > > > > Matthew
> > > > > > > > 
> > > > > > > > 
> > > > > > 
> > > > > > 
> > > > 
> >
diff mbox series

Patch

diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
index c0a75ca360d6..42fe27660ab7 100644
--- a/drivers/fpga/dfl-afu-main.c
+++ b/drivers/fpga/dfl-afu-main.c
@@ -417,7 +417,15 @@  static const struct attribute_group port_hdr_group = {
 static int port_hdr_init(struct platform_device *pdev,
 			 struct dfl_feature *feature)
 {
-	port_reset(pdev);
+	void __iomem *base;
+	u8 rev;
+
+	base = dfl_get_feature_ioaddr_by_id(&pdev->dev, PORT_FEATURE_ID_HEADER);
+
+	rev = dfl_feature_revision(base);
+
+	if (rev < 2)
+		port_reset(pdev);
 
 	return 0;
 }