diff mbox

PCI: ACPI: Fix ThunderX PEM initialization

Message ID 1485793552-12614-1-git-send-email-Vadim.Lomovtsev@caviumnetworks.com (mailing list archive)
State New, archived
Headers show

Commit Message

Vadim Lomovtsev Jan. 30, 2017, 4:25 p.m. UTC
This patch is to address PEM initialization issue
which causes network issues.

It is necessary to search for _HID:PNP0A08 while requesting
PEM resources via ACPI instead of "THRX0002".

Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
---
 drivers/pci/host/pci-thunder-pem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

David Daney Jan. 30, 2017, 5:39 p.m. UTC | #1
On 01/30/2017 08:25 AM, Vadim Lomovtsev wrote:
> This patch is to address PEM initialization issue
> which causes network issues.
>
> It is necessary to search for _HID:PNP0A08 while requesting
> PEM resources via ACPI instead of "THRX0002".

Since this is fixing new code, there should be no chance to break 
previously working systems.

Bjorn: If possible, this should be merged for v4.10

>
> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>

Acked-by: David Daney <david.daney@cavium.com>


> ---
>  drivers/pci/host/pci-thunder-pem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/pci/host/pci-thunder-pem.c b/drivers/pci/host/pci-thunder-pem.c
> index af722eb..aec30b8 100644
> --- a/drivers/pci/host/pci-thunder-pem.c
> +++ b/drivers/pci/host/pci-thunder-pem.c
> @@ -331,7 +331,7 @@ static int thunder_pem_acpi_init(struct pci_config_window *cfg)
>  	if (!res_pem)
>  		return -ENOMEM;
>
> -	ret = acpi_get_rc_resources(dev, "THRX0002", root->segment, res_pem);
> +	ret = acpi_get_rc_resources(dev, "PNP0A08", root->segment, res_pem);
>  	if (ret) {
>  		dev_err(dev, "can't get rc base address\n");
>  		return ret;
>
Bjorn Helgaas Jan. 30, 2017, 9:12 p.m. UTC | #2
Hi Vadim,

On Mon, Jan 30, 2017 at 08:25:52AM -0800, Vadim Lomovtsev wrote:
> This patch is to address PEM initialization issue
> which causes network issues.
> 
> It is necessary to search for _HID:PNP0A08 while requesting
> PEM resources via ACPI instead of "THRX0002".
> 
> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
> ---
>  drivers/pci/host/pci-thunder-pem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/host/pci-thunder-pem.c b/drivers/pci/host/pci-thunder-pem.c
> index af722eb..aec30b8 100644
> --- a/drivers/pci/host/pci-thunder-pem.c
> +++ b/drivers/pci/host/pci-thunder-pem.c
> @@ -331,7 +331,7 @@ static int thunder_pem_acpi_init(struct pci_config_window *cfg)
>  	if (!res_pem)
>  		return -ENOMEM;
>  
> -	ret = acpi_get_rc_resources(dev, "THRX0002", root->segment, res_pem);
> +	ret = acpi_get_rc_resources(dev, "PNP0A08", root->segment, res_pem);

This doesn't smell right: PNP0A08 is the generic ACPI ID.  There's no
guarantee that if we find a PNP0A08 device, it is a ThunderX device.

I think the only way to call thunder_pem_acpi_init() is via an MCFG
quirk that mentions thunder_pem_ecam_ops, which means we only call it
if we find an MCFG with "CAVIUM" "THUNDERX" OEM and table IDs, so it's
probably safe in that sense.

But it's an abuse of the ACPI _HID model.  If you match a device using
PNP0A08, all you can assume about it is that it uses the generic
PNP0A08 programming model, and I don't think that includes "the first
memory resource in _CRS contains ECAM space and MSI-X tables."

I expect this is a teething issue because you have firmware in the
field that uses PNP0A08 and it's not feasible to update it.  If that's
the case, the changelog should have details about it and we should
have a comment in the code, because I don't think this is the model we
want to end up with in future releases.

>  	if (ret) {
>  		dev_err(dev, "can't get rc base address\n");
>  		return ret;
> -- 
> 2.4.11
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Vadim Lomovtsev Jan. 31, 2017, 10:28 a.m. UTC | #3
Hi Bjorn,

On Mon, Jan 30, 2017 at 03:12:37PM -0600, Bjorn Helgaas wrote:
> Hi Vadim,
> 
> On Mon, Jan 30, 2017 at 08:25:52AM -0800, Vadim Lomovtsev wrote:
> > This patch is to address PEM initialization issue
> > which causes network issues.
> > 
> > It is necessary to search for _HID:PNP0A08 while requesting
> > PEM resources via ACPI instead of "THRX0002".
> > 
> > Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
> > ---
> >  drivers/pci/host/pci-thunder-pem.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pci/host/pci-thunder-pem.c b/drivers/pci/host/pci-thunder-pem.c
> > index af722eb..aec30b8 100644
> > --- a/drivers/pci/host/pci-thunder-pem.c
> > +++ b/drivers/pci/host/pci-thunder-pem.c
> > @@ -331,7 +331,7 @@ static int thunder_pem_acpi_init(struct pci_config_window *cfg)
> >  	if (!res_pem)
> >  		return -ENOMEM;
> >  
> > -	ret = acpi_get_rc_resources(dev, "THRX0002", root->segment, res_pem);
> > +	ret = acpi_get_rc_resources(dev, "PNP0A08", root->segment, res_pem);
> 
> This doesn't smell right: PNP0A08 is the generic ACPI ID.  There's no
> guarantee that if we find a PNP0A08 device, it is a ThunderX device.
> 
> I think the only way to call thunder_pem_acpi_init() is via an MCFG
> quirk that mentions thunder_pem_ecam_ops, which means we only call it
> if we find an MCFG with "CAVIUM" "THUNDERX" OEM and table IDs, so it's
> probably safe in that sense.

Agree, it is not the best solution.
We will implement such approach and send for review. 

> 
> But it's an abuse of the ACPI _HID model.  If you match a device using
> PNP0A08, all you can assume about it is that it uses the generic
> PNP0A08 programming model, and I don't think that includes "the first
> memory resource in _CRS contains ECAM space and MSI-X tables."
> 
> I expect this is a teething issue because you have firmware in the
> field that uses PNP0A08 and it's not feasible to update it.  If that's
> the case, the changelog should have details about it and we should
> have a comment in the code, because I don't think this is the model we
> want to end up with in future releases.

It could become so. However, for now I didn't get any reports on that,
(may be I miss something) except some internal emailings.
At my testing HW I was able to see some issues related to acpi-PEM stuff.

Thanks for feed-back, we will prepare another patch or patchset
implementing approach you've highlighted.

> >  	if (ret) {
> >  		dev_err(dev, "can't get rc base address\n");
> >  		return ret;
> > -- 
> > 2.4.11
> > 
> > 
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-kernel@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Bjorn Helgaas Jan. 31, 2017, 2:25 p.m. UTC | #4
On Tue, Jan 31, 2017 at 02:28:30AM -0800, Vadim Lomovtsev wrote:
> Hi Bjorn,
> 
> On Mon, Jan 30, 2017 at 03:12:37PM -0600, Bjorn Helgaas wrote:
> > Hi Vadim,
> > 
> > On Mon, Jan 30, 2017 at 08:25:52AM -0800, Vadim Lomovtsev wrote:
> > > This patch is to address PEM initialization issue
> > > which causes network issues.
> > > 
> > > It is necessary to search for _HID:PNP0A08 while requesting
> > > PEM resources via ACPI instead of "THRX0002".
> > > 
> > > Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
> > > ---
> > >  drivers/pci/host/pci-thunder-pem.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/pci/host/pci-thunder-pem.c b/drivers/pci/host/pci-thunder-pem.c
> > > index af722eb..aec30b8 100644
> > > --- a/drivers/pci/host/pci-thunder-pem.c
> > > +++ b/drivers/pci/host/pci-thunder-pem.c
> > > @@ -331,7 +331,7 @@ static int thunder_pem_acpi_init(struct pci_config_window *cfg)
> > >  	if (!res_pem)
> > >  		return -ENOMEM;
> > >  
> > > -	ret = acpi_get_rc_resources(dev, "THRX0002", root->segment, res_pem);
> > > +	ret = acpi_get_rc_resources(dev, "PNP0A08", root->segment, res_pem);
> > 
> > This doesn't smell right: PNP0A08 is the generic ACPI ID.  There's no
> > guarantee that if we find a PNP0A08 device, it is a ThunderX device.
> > 
> > I think the only way to call thunder_pem_acpi_init() is via an MCFG
> > quirk that mentions thunder_pem_ecam_ops, which means we only call it
> > if we find an MCFG with "CAVIUM" "THUNDERX" OEM and table IDs, so it's
> > probably safe in that sense.
> 
> Agree, it is not the best solution.
> We will implement such approach and send for review. 
> 
> > 
> > But it's an abuse of the ACPI _HID model.  If you match a device using
> > PNP0A08, all you can assume about it is that it uses the generic
> > PNP0A08 programming model, and I don't think that includes "the first
> > memory resource in _CRS contains ECAM space and MSI-X tables."
> > 
> > I expect this is a teething issue because you have firmware in the
> > field that uses PNP0A08 and it's not feasible to update it.  If that's
> > the case, the changelog should have details about it and we should
> > have a comment in the code, because I don't think this is the model we
> > want to end up with in future releases.
> 
> It could become so. However, for now I didn't get any reports on that,
> (may be I miss something) except some internal emailings.
> At my testing HW I was able to see some issues related to acpi-PEM stuff.
> 
> Thanks for feed-back, we will prepare another patch or patchset
> implementing approach you've highlighted.

The approach I would like best is to search for THRX0002, because then
you know you have a ThunderX PEM device, and you can assume
device-specific details about its _CRS.

But that's what the existing code does, and apparently that doesn't
work.  My guess is that you have firmware in the field where the host
bridge has only PNP0A08 (and maybe PNP0A03) as device IDs.

Per spec, the OS can only assume the generic PCI host bridge
programming model in that case.  It can't use any device-specific
features, like the register space you're extracting here, since
there's no way to tell what specific device you have.

If the OS needs to use device-specific features, there must be a
device-specific _HID, i.e., THRX0002.  This is the important thing to
get right in the future.

If existing firmware in the field has no device-specific _HID, we
might need something like this as a quirk to work around that firmware
deficiency.  If that's the case, all I'm really asking for is:

  - Some indication that you have a plan to change the firmware
    strategy so future releases don't require similar quirks, and

  - Some comments in the code pointing out that this is a workaround
    for a firmware deficiency.

As I mentioned, I think this change should be safe because we only run
this code if we find a CAVIUM THUNDERX MCFG table, and we search for a
PNP0A08 device that matches the segment from MCFG.

The only way we could accidentally match the wrong device would be if
we had another bridge in the same segment.  Maybe we should have made
acpi_get_rc_resources() match on the bus range as well as the segment.
But that feels like just a "pedantically correct" sort of thing and
probably over-engineering for this situation; I don't think it would
solve any current problem.

I know you'd like to see this in v4.10, so we need to sort this out
ASAP.

Bjorn
Vadim Lomovtsev Jan. 31, 2017, 2:57 p.m. UTC | #5
On Tue, Jan 31, 2017 at 08:25:25AM -0600, Bjorn Helgaas wrote:
> On Tue, Jan 31, 2017 at 02:28:30AM -0800, Vadim Lomovtsev wrote:
> > Hi Bjorn,
> > 
> > On Mon, Jan 30, 2017 at 03:12:37PM -0600, Bjorn Helgaas wrote:
> > > Hi Vadim,
> > > 
> > > On Mon, Jan 30, 2017 at 08:25:52AM -0800, Vadim Lomovtsev wrote:
> > > > This patch is to address PEM initialization issue
> > > > which causes network issues.
> > > > 
> > > > It is necessary to search for _HID:PNP0A08 while requesting
> > > > PEM resources via ACPI instead of "THRX0002".
> > > > 
> > > > Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
> > > > ---
> > > >  drivers/pci/host/pci-thunder-pem.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/pci/host/pci-thunder-pem.c b/drivers/pci/host/pci-thunder-pem.c
> > > > index af722eb..aec30b8 100644
> > > > --- a/drivers/pci/host/pci-thunder-pem.c
> > > > +++ b/drivers/pci/host/pci-thunder-pem.c
> > > > @@ -331,7 +331,7 @@ static int thunder_pem_acpi_init(struct pci_config_window *cfg)
> > > >  	if (!res_pem)
> > > >  		return -ENOMEM;
> > > >  
> > > > -	ret = acpi_get_rc_resources(dev, "THRX0002", root->segment, res_pem);
> > > > +	ret = acpi_get_rc_resources(dev, "PNP0A08", root->segment, res_pem);
> > > 
> > > This doesn't smell right: PNP0A08 is the generic ACPI ID.  There's no
> > > guarantee that if we find a PNP0A08 device, it is a ThunderX device.
> > > 
> > > I think the only way to call thunder_pem_acpi_init() is via an MCFG
> > > quirk that mentions thunder_pem_ecam_ops, which means we only call it
> > > if we find an MCFG with "CAVIUM" "THUNDERX" OEM and table IDs, so it's
> > > probably safe in that sense.
> > 
> > Agree, it is not the best solution.
> > We will implement such approach and send for review. 
> > 
> > > 
> > > But it's an abuse of the ACPI _HID model.  If you match a device using
> > > PNP0A08, all you can assume about it is that it uses the generic
> > > PNP0A08 programming model, and I don't think that includes "the first
> > > memory resource in _CRS contains ECAM space and MSI-X tables."
> > > 
> > > I expect this is a teething issue because you have firmware in the
> > > field that uses PNP0A08 and it's not feasible to update it.  If that's
> > > the case, the changelog should have details about it and we should
> > > have a comment in the code, because I don't think this is the model we
> > > want to end up with in future releases.
> > 
> > It could become so. However, for now I didn't get any reports on that,
> > (may be I miss something) except some internal emailings.
> > At my testing HW I was able to see some issues related to acpi-PEM stuff.
> > 
> > Thanks for feed-back, we will prepare another patch or patchset
> > implementing approach you've highlighted.
> 
> The approach I would like best is to search for THRX0002, because then
> you know you have a ThunderX PEM device, and you can assume
> device-specific details about its _CRS.
>
> But that's what the existing code does, and apparently that doesn't
> work.  My guess is that you have firmware in the field where the host
> bridge has only PNP0A08 (and maybe PNP0A03) as device IDs.

Because there is no such ACPI ID as "THRX0002" registered
(http://www.uefi.org/acpi_id_list).

From the other hand we may gather device resources through
_CRS (we have acpi_device already set by this moment) but
we need to be sure that we're running at Cavium ThunderX board then.

To do that we may add _SUB value (accrodingly to spec) with
exact ID string (a full PCI ID value "177DA22D" is suggested).
This will eventually become the main method of finding out
whether we run on a ThunderX PEM.

However the current FW have no such string and we
need to handle this case also.

> Per spec, the OS can only assume the generic PCI host bridge
> programming model in that case.  It can't use any device-specific
> features, like the register space you're extracting here, since
> there's no way to tell what specific device you have.
> 
> If the OS needs to use device-specific features, there must be a
> device-specific _HID, i.e., THRX0002.  This is the important thing to
> get right in the future.
> 
> If existing firmware in the field has no device-specific _HID, we

I'm afraid not.

> might need something like this as a quirk to work around that firmware
> deficiency.  If that's the case, all I'm really asking for is:
> 
>   - Some indication that you have a plan to change the firmware
>     strategy so future releases don't require similar quirks, and

I believe we have some work at FW level to get this things done
accordingly, but we have HW in the field and this could become
critical.

>   - Some comments in the code pointing out that this is a workaround
>     for a firmware deficiency.
> 
> As I mentioned, I think this change should be safe because we only run
> this code if we find a CAVIUM THUNDERX MCFG table, and we search for a
> PNP0A08 device that matches the segment from MCFG.

However, the patch is not in good shape (was done in rush to be honest)
so I prepare another one shortly, addressing your comments.

> The only way we could accidentally match the wrong device would be if
> we had another bridge in the same segment.  Maybe we should have made
> acpi_get_rc_resources() match on the bus range as well as the segment.
> But that feels like just a "pedantically correct" sort of thing and
> probably over-engineering for this situation; I don't think it would
> solve any current problem.
> 
> I know you'd like to see this in v4.10, so we need to sort this out
> ASAP.
> 
> Bjorn
Bjorn Helgaas Jan. 31, 2017, 8:31 p.m. UTC | #6
On Tue, Jan 31, 2017 at 06:57:20AM -0800, Vadim Lomovtsev wrote:
> On Tue, Jan 31, 2017 at 08:25:25AM -0600, Bjorn Helgaas wrote:
> > On Tue, Jan 31, 2017 at 02:28:30AM -0800, Vadim Lomovtsev wrote:
> > > Hi Bjorn,
> > > 
> > > On Mon, Jan 30, 2017 at 03:12:37PM -0600, Bjorn Helgaas wrote:
> > > > Hi Vadim,
> > > > 
> > > > On Mon, Jan 30, 2017 at 08:25:52AM -0800, Vadim Lomovtsev wrote:
> > > > > This patch is to address PEM initialization issue
> > > > > which causes network issues.
> > > > > 
> > > > > It is necessary to search for _HID:PNP0A08 while requesting
> > > > > PEM resources via ACPI instead of "THRX0002".
> > > > > 
> > > > > Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
> > > > > ---
> > > > >  drivers/pci/host/pci-thunder-pem.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/pci/host/pci-thunder-pem.c b/drivers/pci/host/pci-thunder-pem.c
> > > > > index af722eb..aec30b8 100644
> > > > > --- a/drivers/pci/host/pci-thunder-pem.c
> > > > > +++ b/drivers/pci/host/pci-thunder-pem.c
> > > > > @@ -331,7 +331,7 @@ static int thunder_pem_acpi_init(struct pci_config_window *cfg)
> > > > >  	if (!res_pem)
> > > > >  		return -ENOMEM;
> > > > >  
> > > > > -	ret = acpi_get_rc_resources(dev, "THRX0002", root->segment, res_pem);
> > > > > +	ret = acpi_get_rc_resources(dev, "PNP0A08", root->segment, res_pem);
> > > > 
> > > > This doesn't smell right: PNP0A08 is the generic ACPI ID.  There's no
> > > > guarantee that if we find a PNP0A08 device, it is a ThunderX device.
> > > > 
> > > > I think the only way to call thunder_pem_acpi_init() is via an MCFG
> > > > quirk that mentions thunder_pem_ecam_ops, which means we only call it
> > > > if we find an MCFG with "CAVIUM" "THUNDERX" OEM and table IDs, so it's
> > > > probably safe in that sense.
> > > 
> > > Agree, it is not the best solution.
> > > We will implement such approach and send for review. 
> > > 
> > > > 
> > > > But it's an abuse of the ACPI _HID model.  If you match a device using
> > > > PNP0A08, all you can assume about it is that it uses the generic
> > > > PNP0A08 programming model, and I don't think that includes "the first
> > > > memory resource in _CRS contains ECAM space and MSI-X tables."
> > > > 
> > > > I expect this is a teething issue because you have firmware in the
> > > > field that uses PNP0A08 and it's not feasible to update it.  If that's
> > > > the case, the changelog should have details about it and we should
> > > > have a comment in the code, because I don't think this is the model we
> > > > want to end up with in future releases.
> > > 
> > > It could become so. However, for now I didn't get any reports on that,
> > > (may be I miss something) except some internal emailings.
> > > At my testing HW I was able to see some issues related to acpi-PEM stuff.
> > > 
> > > Thanks for feed-back, we will prepare another patch or patchset
> > > implementing approach you've highlighted.
> > 
> > The approach I would like best is to search for THRX0002, because then
> > you know you have a ThunderX PEM device, and you can assume
> > device-specific details about its _CRS.
> >
> > But that's what the existing code does, and apparently that doesn't
> > work.  My guess is that you have firmware in the field where the host
> > bridge has only PNP0A08 (and maybe PNP0A03) as device IDs.
> 
> Because there is no such ACPI ID as "THRX0002" registered
> (http://www.uefi.org/acpi_id_list).

To be pedantically correct, I think you want "THRX" registered.  Then
you can manage the "0002" part internally without registering each
individual device.

> From the other hand we may gather device resources through
> _CRS (we have acpi_device already set by this moment) but
> we need to be sure that we're running at Cavium ThunderX board then.
> 
> To do that we may add _SUB value (accrodingly to spec) with
> exact ID string (a full PCI ID value "177DA22D" is suggested).
> This will eventually become the main method of finding out
> whether we run on a ThunderX PEM.

I don't understand why you would use _SUB.  I think the correct way to
handle this is to use a _HID of "THRX0002" with a _CID of "PNP0A08".
Is there some existing _SUB usage you're using as an example?

Bjorn
Vadim Lomovtsev Feb. 1, 2017, 12:53 p.m. UTC | #7
On Tue, Jan 31, 2017 at 02:31:09PM -0600, Bjorn Helgaas wrote:
> On Tue, Jan 31, 2017 at 06:57:20AM -0800, Vadim Lomovtsev wrote:
> > On Tue, Jan 31, 2017 at 08:25:25AM -0600, Bjorn Helgaas wrote:
> > > On Tue, Jan 31, 2017 at 02:28:30AM -0800, Vadim Lomovtsev wrote:
> > > > Hi Bjorn,
> > > > 
> > > > On Mon, Jan 30, 2017 at 03:12:37PM -0600, Bjorn Helgaas wrote:
> > > > > Hi Vadim,
> > > > > 
> > > > > On Mon, Jan 30, 2017 at 08:25:52AM -0800, Vadim Lomovtsev wrote:
> > > > > > This patch is to address PEM initialization issue
> > > > > > which causes network issues.
> > > > > > 
> > > > > > It is necessary to search for _HID:PNP0A08 while requesting
> > > > > > PEM resources via ACPI instead of "THRX0002".
> > > > > > 
> > > > > > Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
> > > > > > ---
> > > > > >  drivers/pci/host/pci-thunder-pem.c | 2 +-
> > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/drivers/pci/host/pci-thunder-pem.c b/drivers/pci/host/pci-thunder-pem.c
> > > > > > index af722eb..aec30b8 100644
> > > > > > --- a/drivers/pci/host/pci-thunder-pem.c
> > > > > > +++ b/drivers/pci/host/pci-thunder-pem.c
> > > > > > @@ -331,7 +331,7 @@ static int thunder_pem_acpi_init(struct pci_config_window *cfg)
> > > > > >  	if (!res_pem)
> > > > > >  		return -ENOMEM;
> > > > > >  
> > > > > > -	ret = acpi_get_rc_resources(dev, "THRX0002", root->segment, res_pem);
> > > > > > +	ret = acpi_get_rc_resources(dev, "PNP0A08", root->segment, res_pem);
> > > > > 
> > > > > This doesn't smell right: PNP0A08 is the generic ACPI ID.  There's no
> > > > > guarantee that if we find a PNP0A08 device, it is a ThunderX device.
> > > > > 
> > > > > I think the only way to call thunder_pem_acpi_init() is via an MCFG
> > > > > quirk that mentions thunder_pem_ecam_ops, which means we only call it
> > > > > if we find an MCFG with "CAVIUM" "THUNDERX" OEM and table IDs, so it's
> > > > > probably safe in that sense.
> > > > 
> > > > Agree, it is not the best solution.
> > > > We will implement such approach and send for review. 
> > > > 
> > > > > 
> > > > > But it's an abuse of the ACPI _HID model.  If you match a device using
> > > > > PNP0A08, all you can assume about it is that it uses the generic
> > > > > PNP0A08 programming model, and I don't think that includes "the first
> > > > > memory resource in _CRS contains ECAM space and MSI-X tables."
> > > > > 
> > > > > I expect this is a teething issue because you have firmware in the
> > > > > field that uses PNP0A08 and it's not feasible to update it.  If that's
> > > > > the case, the changelog should have details about it and we should
> > > > > have a comment in the code, because I don't think this is the model we
> > > > > want to end up with in future releases.
> > > > 
> > > > It could become so. However, for now I didn't get any reports on that,
> > > > (may be I miss something) except some internal emailings.
> > > > At my testing HW I was able to see some issues related to acpi-PEM stuff.
> > > > 
> > > > Thanks for feed-back, we will prepare another patch or patchset
> > > > implementing approach you've highlighted.
> > > 
> > > The approach I would like best is to search for THRX0002, because then
> > > you know you have a ThunderX PEM device, and you can assume
> > > device-specific details about its _CRS.
> > >
> > > But that's what the existing code does, and apparently that doesn't
> > > work.  My guess is that you have firmware in the field where the host
> > > bridge has only PNP0A08 (and maybe PNP0A03) as device IDs.
> > 
> > Because there is no such ACPI ID as "THRX0002" registered
> > (http://www.uefi.org/acpi_id_list).
> 
> To be pedantically correct, I think you want "THRX" registered.  Then
> you can manage the "0002" part internally without registering each
> individual device.

Not sure if it would be registered that way, because (AFAIK)
it expected to be string constructed from Vendor ID (not the Product ID) plus
four hex digit manged internaly. So we suggest to change it to 177DXXXX
which corresponds to Cavium PCI ID https://pci-ids.ucw.cz/pci.ids.
It's also possible to use the 3-digit PNP ID, "CAV", to construct these
_HID/_CID/_SUB values (http://www.uefi.org/pnp_id_list).

So the FW will be updated accordingly.

For old FW we'll implement fallback scenario gathering resource info
via _CRS object (comments will be provided in the code also).

> 
> > From the other hand we may gather device resources through
> > _CRS (we have acpi_device already set by this moment) but
> > we need to be sure that we're running at Cavium ThunderX board then.
> > 
> > To do that we may add _SUB value (accrodingly to spec) with
> > exact ID string (a full PCI ID value "177DA22D" is suggested).
> > This will eventually become the main method of finding out
> > whether we run on a ThunderX PEM.
> 
> I don't understand why you would use _SUB.  I think the correct way to
> handle this is to use a _HID of "THRX0002" with a _CID of "PNP0A08".
> Is there some existing _SUB usage you're using as an example?
>

Here we were talking about extra quirk to match PEM and ECAM and thus make
a decision. This object could also store some vendor-specific IDs.
To be honest I didn't see usage in kernel (didn't search), but ACPI Spec
provide an example as (p 6.1.9) :
  Name (_SUB, "MSFT3000")// Vendor-defined subsystem

However this is still debatable and we probably stick to _HID while
implementing additional match functions.


> Bjorn
Bjorn Helgaas Feb. 1, 2017, 3:18 p.m. UTC | #8
On Wed, Feb 01, 2017 at 04:53:25AM -0800, Vadim Lomovtsev wrote:
> On Tue, Jan 31, 2017 at 02:31:09PM -0600, Bjorn Helgaas wrote:
> > On Tue, Jan 31, 2017 at 06:57:20AM -0800, Vadim Lomovtsev wrote:
> > > On Tue, Jan 31, 2017 at 08:25:25AM -0600, Bjorn Helgaas wrote:
> > > > On Tue, Jan 31, 2017 at 02:28:30AM -0800, Vadim Lomovtsev wrote:
> > > > > Hi Bjorn,
> > > > > 
> > > > > On Mon, Jan 30, 2017 at 03:12:37PM -0600, Bjorn Helgaas wrote:
> > > > > > Hi Vadim,
> > > > > > 
> > > > > > On Mon, Jan 30, 2017 at 08:25:52AM -0800, Vadim Lomovtsev wrote:
> > > > > > > This patch is to address PEM initialization issue
> > > > > > > which causes network issues.
> > > > > > > 
> > > > > > > It is necessary to search for _HID:PNP0A08 while requesting
> > > > > > > PEM resources via ACPI instead of "THRX0002".
> > > > > > > 
> > > > > > > Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
> > > > > > > ---
> > > > > > >  drivers/pci/host/pci-thunder-pem.c | 2 +-
> > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > 
> > > > > > > diff --git a/drivers/pci/host/pci-thunder-pem.c b/drivers/pci/host/pci-thunder-pem.c
> > > > > > > index af722eb..aec30b8 100644
> > > > > > > --- a/drivers/pci/host/pci-thunder-pem.c
> > > > > > > +++ b/drivers/pci/host/pci-thunder-pem.c
> > > > > > > @@ -331,7 +331,7 @@ static int thunder_pem_acpi_init(struct pci_config_window *cfg)
> > > > > > >  	if (!res_pem)
> > > > > > >  		return -ENOMEM;
> > > > > > >  
> > > > > > > -	ret = acpi_get_rc_resources(dev, "THRX0002", root->segment, res_pem);
> > > > > > > +	ret = acpi_get_rc_resources(dev, "PNP0A08", root->segment, res_pem);
> > > > > > 
> > > > > > This doesn't smell right: PNP0A08 is the generic ACPI ID.  There's no
> > > > > > guarantee that if we find a PNP0A08 device, it is a ThunderX device.
> > > > > > 
> > > > > > I think the only way to call thunder_pem_acpi_init() is via an MCFG
> > > > > > quirk that mentions thunder_pem_ecam_ops, which means we only call it
> > > > > > if we find an MCFG with "CAVIUM" "THUNDERX" OEM and table IDs, so it's
> > > > > > probably safe in that sense.
> > > > > 
> > > > > Agree, it is not the best solution.
> > > > > We will implement such approach and send for review. 
> > > > > 
> > > > > > 
> > > > > > But it's an abuse of the ACPI _HID model.  If you match a device using
> > > > > > PNP0A08, all you can assume about it is that it uses the generic
> > > > > > PNP0A08 programming model, and I don't think that includes "the first
> > > > > > memory resource in _CRS contains ECAM space and MSI-X tables."
> > > > > > 
> > > > > > I expect this is a teething issue because you have firmware in the
> > > > > > field that uses PNP0A08 and it's not feasible to update it.  If that's
> > > > > > the case, the changelog should have details about it and we should
> > > > > > have a comment in the code, because I don't think this is the model we
> > > > > > want to end up with in future releases.
> > > > > 
> > > > > It could become so. However, for now I didn't get any reports on that,
> > > > > (may be I miss something) except some internal emailings.
> > > > > At my testing HW I was able to see some issues related to acpi-PEM stuff.
> > > > > 
> > > > > Thanks for feed-back, we will prepare another patch or patchset
> > > > > implementing approach you've highlighted.
> > > > 
> > > > The approach I would like best is to search for THRX0002, because then
> > > > you know you have a ThunderX PEM device, and you can assume
> > > > device-specific details about its _CRS.
> > > >
> > > > But that's what the existing code does, and apparently that doesn't
> > > > work.  My guess is that you have firmware in the field where the host
> > > > bridge has only PNP0A08 (and maybe PNP0A03) as device IDs.
> > > 
> > > Because there is no such ACPI ID as "THRX0002" registered
> > > (http://www.uefi.org/acpi_id_list).
> > 
> > To be pedantically correct, I think you want "THRX" registered.  Then
> > you can manage the "0002" part internally without registering each
> > individual device.
> 
> Not sure if it would be registered that way, because (AFAIK)
> it expected to be string constructed from Vendor ID (not the Product ID) plus
> four hex digit manged internaly. So we suggest to change it to 177DXXXX
> which corresponds to Cavium PCI ID https://pci-ids.ucw.cz/pci.ids.
> It's also possible to use the 3-digit PNP ID, "CAV", to construct these
> _HID/_CID/_SUB values (http://www.uefi.org/pnp_id_list).

My point was that you only need to register the prefix ("CAV" or
"THRX") of the PNP or ACPI ID.  Then you manage the suffixes
internally.  You as long as you register "CAV" or "THRX", you can
assign and use "THRX0002" yourself without registering that
specifically.

> So the FW will be updated accordingly.
> 
> For old FW we'll implement fallback scenario gathering resource info
> via _CRS object (comments will be provided in the code also).
> 
> > 
> > > From the other hand we may gather device resources through
> > > _CRS (we have acpi_device already set by this moment) but
> > > we need to be sure that we're running at Cavium ThunderX board then.
> > > 
> > > To do that we may add _SUB value (accrodingly to spec) with
> > > exact ID string (a full PCI ID value "177DA22D" is suggested).
> > > This will eventually become the main method of finding out
> > > whether we run on a ThunderX PEM.
> > 
> > I don't understand why you would use _SUB.  I think the correct way to
> > handle this is to use a _HID of "THRX0002" with a _CID of "PNP0A08".
> > Is there some existing _SUB usage you're using as an example?
> 
> Here we were talking about extra quirk to match PEM and ECAM and thus make
> a decision. This object could also store some vendor-specific IDs.
> To be honest I didn't see usage in kernel (didn't search), but ACPI Spec
> provide an example as (p 6.1.9) :
>   Name (_SUB, "MSFT3000")// Vendor-defined subsystem
> 
> However this is still debatable and we probably stick to _HID while
> implementing additional match functions.

If you want a driver to bind to a specific device, e.g., the ThunderX
PCI host bridge, the firmware should use a unique _HID in the ACPI
namespace and the driver should claim that _HID.

It's conceivable that you could use _SUB, but that's needlessly
different from everything else.

Bjorn
Vadim Lomovtsev Feb. 1, 2017, 3:34 p.m. UTC | #9
On Wed, Feb 01, 2017 at 09:18:07AM -0600, Bjorn Helgaas wrote:
> On Wed, Feb 01, 2017 at 04:53:25AM -0800, Vadim Lomovtsev wrote:
> > On Tue, Jan 31, 2017 at 02:31:09PM -0600, Bjorn Helgaas wrote:
> > > On Tue, Jan 31, 2017 at 06:57:20AM -0800, Vadim Lomovtsev wrote:
> > > > On Tue, Jan 31, 2017 at 08:25:25AM -0600, Bjorn Helgaas wrote:
> > > > > On Tue, Jan 31, 2017 at 02:28:30AM -0800, Vadim Lomovtsev wrote:
> > > > > > Hi Bjorn,
> > > > > > 
> > > > > > On Mon, Jan 30, 2017 at 03:12:37PM -0600, Bjorn Helgaas wrote:
> > > > > > > Hi Vadim,
> > > > > > > 
> > > > > > > On Mon, Jan 30, 2017 at 08:25:52AM -0800, Vadim Lomovtsev wrote:
> > > > > > > > This patch is to address PEM initialization issue
> > > > > > > > which causes network issues.
> > > > > > > > 
> > > > > > > > It is necessary to search for _HID:PNP0A08 while requesting
> > > > > > > > PEM resources via ACPI instead of "THRX0002".
> > > > > > > > 
> > > > > > > > Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
> > > > > > > > ---
> > > > > > > >  drivers/pci/host/pci-thunder-pem.c | 2 +-
> > > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > 
> > > > > > > > diff --git a/drivers/pci/host/pci-thunder-pem.c b/drivers/pci/host/pci-thunder-pem.c
> > > > > > > > index af722eb..aec30b8 100644
> > > > > > > > --- a/drivers/pci/host/pci-thunder-pem.c
> > > > > > > > +++ b/drivers/pci/host/pci-thunder-pem.c
> > > > > > > > @@ -331,7 +331,7 @@ static int thunder_pem_acpi_init(struct pci_config_window *cfg)
> > > > > > > >  	if (!res_pem)
> > > > > > > >  		return -ENOMEM;
> > > > > > > >  
> > > > > > > > -	ret = acpi_get_rc_resources(dev, "THRX0002", root->segment, res_pem);
> > > > > > > > +	ret = acpi_get_rc_resources(dev, "PNP0A08", root->segment, res_pem);
> > > > > > > 
> > > > > > > This doesn't smell right: PNP0A08 is the generic ACPI ID.  There's no
> > > > > > > guarantee that if we find a PNP0A08 device, it is a ThunderX device.
> > > > > > > 
> > > > > > > I think the only way to call thunder_pem_acpi_init() is via an MCFG
> > > > > > > quirk that mentions thunder_pem_ecam_ops, which means we only call it
> > > > > > > if we find an MCFG with "CAVIUM" "THUNDERX" OEM and table IDs, so it's
> > > > > > > probably safe in that sense.
> > > > > > 
> > > > > > Agree, it is not the best solution.
> > > > > > We will implement such approach and send for review. 
> > > > > > 
> > > > > > > 
> > > > > > > But it's an abuse of the ACPI _HID model.  If you match a device using
> > > > > > > PNP0A08, all you can assume about it is that it uses the generic
> > > > > > > PNP0A08 programming model, and I don't think that includes "the first
> > > > > > > memory resource in _CRS contains ECAM space and MSI-X tables."
> > > > > > > 
> > > > > > > I expect this is a teething issue because you have firmware in the
> > > > > > > field that uses PNP0A08 and it's not feasible to update it.  If that's
> > > > > > > the case, the changelog should have details about it and we should
> > > > > > > have a comment in the code, because I don't think this is the model we
> > > > > > > want to end up with in future releases.
> > > > > > 
> > > > > > It could become so. However, for now I didn't get any reports on that,
> > > > > > (may be I miss something) except some internal emailings.
> > > > > > At my testing HW I was able to see some issues related to acpi-PEM stuff.
> > > > > > 
> > > > > > Thanks for feed-back, we will prepare another patch or patchset
> > > > > > implementing approach you've highlighted.
> > > > > 
> > > > > The approach I would like best is to search for THRX0002, because then
> > > > > you know you have a ThunderX PEM device, and you can assume
> > > > > device-specific details about its _CRS.
> > > > >
> > > > > But that's what the existing code does, and apparently that doesn't
> > > > > work.  My guess is that you have firmware in the field where the host
> > > > > bridge has only PNP0A08 (and maybe PNP0A03) as device IDs.
> > > > 
> > > > Because there is no such ACPI ID as "THRX0002" registered
> > > > (http://www.uefi.org/acpi_id_list).
> > > 
> > > To be pedantically correct, I think you want "THRX" registered.  Then
> > > you can manage the "0002" part internally without registering each
> > > individual device.
> > 
> > Not sure if it would be registered that way, because (AFAIK)
> > it expected to be string constructed from Vendor ID (not the Product ID) plus
> > four hex digit manged internaly. So we suggest to change it to 177DXXXX
> > which corresponds to Cavium PCI ID https://pci-ids.ucw.cz/pci.ids.
> > It's also possible to use the 3-digit PNP ID, "CAV", to construct these
> > _HID/_CID/_SUB values (http://www.uefi.org/pnp_id_list).
> 
> My point was that you only need to register the prefix ("CAV" or
> "THRX") of the PNP or ACPI ID.  Then you manage the suffixes
> internally.  You as long as you register "CAV" or "THRX", you can
> assign and use "THRX0002" yourself without registering that
> specifically.

Yes, exactly. And the "CAV" perfix is already registered.
And I think will'll use it to keep things aligned to specs & rules.

> 
> > So the FW will be updated accordingly.
> > 
> > For old FW we'll implement fallback scenario gathering resource info
> > via _CRS object (comments will be provided in the code also).
> > 
> > > 
> > > > From the other hand we may gather device resources through
> > > > _CRS (we have acpi_device already set by this moment) but
> > > > we need to be sure that we're running at Cavium ThunderX board then.
> > > > 
> > > > To do that we may add _SUB value (accrodingly to spec) with
> > > > exact ID string (a full PCI ID value "177DA22D" is suggested).
> > > > This will eventually become the main method of finding out
> > > > whether we run on a ThunderX PEM.
> > > 
> > > I don't understand why you would use _SUB.  I think the correct way to
> > > handle this is to use a _HID of "THRX0002" with a _CID of "PNP0A08".
> > > Is there some existing _SUB usage you're using as an example?
> > 
> > Here we were talking about extra quirk to match PEM and ECAM and thus make
> > a decision. This object could also store some vendor-specific IDs.
> > To be honest I didn't see usage in kernel (didn't search), but ACPI Spec
> > provide an example as (p 6.1.9) :
> >   Name (_SUB, "MSFT3000")// Vendor-defined subsystem
> > 
> > However this is still debatable and we probably stick to _HID while
> > implementing additional match functions.
> 
> If you want a driver to bind to a specific device, e.g., the ThunderX
> PCI host bridge, the firmware should use a unique _HID in the ACPI
> namespace and the driver should claim that _HID.
> 
> It's conceivable that you could use _SUB, but that's needlessly
> different from everything else.

See your point, thank you for quick and clear feed-back.

Just to let you know:
We are in the middle of developing solution for fall-back
scenario (when ACPI/PNP ID wasn't found) for PEM initalization.
Hope to be able to get it into 4.10.

> 
> Bjorn

Vadim
Jon Masters March 15, 2017, 11:14 a.m. UTC | #10
Hi Bjorn, Vadim,

Following up to this old thread...

On 02/01/2017 10:18 AM, Bjorn Helgaas wrote:
> On Wed, Feb 01, 2017 at 04:53:25AM -0800, Vadim Lomovtsev wrote:

>>>> Because there is no such ACPI ID as "THRX0002" registered
>>>> (http://www.uefi.org/acpi_id_list).

There is still no "THRX" prefix registered with UEFI as of this morning.

>>> To be pedantically correct, I think you want "THRX" registered.  Then
>>> you can manage the "0002" part internally without registering each
>>> individual device.

The upstream Linux kernel contains a quirk matching entry that looks for
"THRX". Therefore, you have already agreed (as of at least January) that
this is the prefix that you will use in any firmware updates to support
the latest upstream Linux kernel. Please register this prefix promptly.

>> Not sure if it would be registered that way, because (AFAIK)
>> it expected to be string constructed from Vendor ID (not the Product ID) plus
>> four hex digit manged internaly. So we suggest to change it to 177DXXXX
>> which corresponds to Cavium PCI ID https://pci-ids.ucw.cz/pci.ids.
>> It's also possible to use the 3-digit PNP ID, "CAV", to construct these
>> _HID/_CID/_SUB values (http://www.uefi.org/pnp_id_list).
> 
> My point was that you only need to register the prefix ("CAV" or
> "THRX") of the PNP or ACPI ID.  Then you manage the suffixes
> internally.  You as long as you register "CAV" or "THRX", you can
> assign and use "THRX0002" yourself without registering that
> specifically.
> 
>> So the FW will be updated accordingly.

Indeed.

The version Bjorn merged looks for "THRX". This is the version that you will
use, and you will promptly register that prefix with UEFI and provide fixes
for existing firmware to correctly use the solution that is upstream.

Thanks,

Jon.
Vadim Lomovtsev March 15, 2017, 11:33 a.m. UTC | #11
Hi Jon,

On Wed, Mar 15, 2017 at 07:14:38AM -0400, Jon Masters wrote:
> Hi Bjorn, Vadim,
> 
> Following up to this old thread...
> 
> On 02/01/2017 10:18 AM, Bjorn Helgaas wrote:
> > On Wed, Feb 01, 2017 at 04:53:25AM -0800, Vadim Lomovtsev wrote:
> 
> >>>> Because there is no such ACPI ID as "THRX0002" registered
> >>>> (http://www.uefi.org/acpi_id_list).
> 
> There is still no "THRX" prefix registered with UEFI as of this morning.
> 
> >>> To be pedantically correct, I think you want "THRX" registered.  Then
> >>> you can manage the "0002" part internally without registering each
> >>> individual device.
> 
> The upstream Linux kernel contains a quirk matching entry that looks for
> "THRX". Therefore, you have already agreed (as of at least January) that
> this is the prefix that you will use in any firmware updates to support
> the latest upstream Linux kernel. Please register this prefix promptly.

And from what I know for now - we wont going to register this
since we have already regsitered "CAV" prefix for that. And this was the part
of our discussion also.

We had a bit long review of proper implementation of legacy firmware support,
so my apologise on that.

Please take a look at link to the patchset posted by Tomasz.
https://www.spinics.net/lists/arm-kernel/msg568741.html

> 
> >> Not sure if it would be registered that way, because (AFAIK)
> >> it expected to be string constructed from Vendor ID (not the Product ID) plus
> >> four hex digit manged internaly. So we suggest to change it to 177DXXXX
> >> which corresponds to Cavium PCI ID https://pci-ids.ucw.cz/pci.ids.
> >> It's also possible to use the 3-digit PNP ID, "CAV", to construct these
> >> _HID/_CID/_SUB values (http://www.uefi.org/pnp_id_list).
> > 
> > My point was that you only need to register the prefix ("CAV" or
> > "THRX") of the PNP or ACPI ID.  Then you manage the suffixes
> > internally.  You as long as you register "CAV" or "THRX", you can
> > assign and use "THRX0002" yourself without registering that
> > specifically.
> >

And my reply here was :
"Yes, exactly. And the "CAV" perfix is already registered.
And I think will'll use it to keep things aligned to specs & rules."

> >> So the FW will be updated accordingly.
> 
> Indeed.

Yes, it is now contains "CAVxxx" as _HID for device config object.

> 
> The version Bjorn merged looks for "THRX". This is the version that you will
> use, and you will promptly register that prefix with UEFI and provide fixes
> for existing firmware to correctly use the solution that is upstream.

Cavium FW is updated accordingly to use already registered prefix.
For existent FW legacy support is posted by Tomasz.

> 
> Thanks,
> 
> Jon.
> 


WBR,
Vadim
Jon Masters March 16, 2017, 2:32 p.m. UTC | #12
Hi Vadim,

Thanks for your followup and attention to this matter. More below.

On 03/15/2017 07:33 AM, Vadim Lomovtsev wrote:

>> The upstream Linux kernel contains a quirk matching entry that looks for
>> "THRX". Therefore, you have already agreed (as of at least January) that
>> this is the prefix that you will use in any firmware updates to support
>> the latest upstream Linux kernel. Please register this prefix promptly.
> 
> And from what I know for now - we wont going to register this
> since we have already regsitered "CAV" prefix for that. And this was the part
> of our discussion also.
> 
> We had a bit long review of proper implementation of legacy firmware support,
> so my apologise on that.
> 
> Please take a look at link to the patchset posted by Tomasz.
> https://www.spinics.net/lists/arm-kernel/msg568741.html

I'll let others comment on the suitability of taking that for upstream.

>>>> So the FW will be updated accordingly.
>>
>> Indeed.
> 
> Yes, it is now contains "CAVxxx" as _HID for device config object.

Which is different from the version that was merged into upstream. That
should never have happened. It will never happen again. I have spent some
time over the past few days ensuring folks understand that I will not
allow a repeat of this to occur the next time around. We will have
platforms that are bulletproof and supported by upstream with any
errata fixes in a very carefully controlled manner. There will
under no circumstances ever be a situation like this again.

>> The version Bjorn merged looks for "THRX". This is the version that you will
>> use, and you will promptly register that prefix with UEFI and provide fixes
>> for existing firmware to correctly use the solution that is upstream.
> 
> Cavium FW is updated accordingly to use already registered prefix.
> For existent FW legacy support is posted by Tomasz.

I'm watching this to ensure it's cleaned up properly.

Jon.
David Daney March 16, 2017, 4:25 p.m. UTC | #13
On 03/16/2017 07:32 AM, Jon Masters wrote:
> Hi Vadim,
>
> Thanks for your followup and attention to this matter. More below.
>
> On 03/15/2017 07:33 AM, Vadim Lomovtsev wrote:
>
>>> The upstream Linux kernel contains a quirk matching entry that looks for
>>> "THRX". Therefore, you have already agreed (as of at least January) that
>>> this is the prefix that you will use in any firmware updates to support
>>> the latest upstream Linux kernel. Please register this prefix promptly.
>>
>> And from what I know for now - we wont going to register this
>> since we have already regsitered "CAV" prefix for that. And this was the part
>> of our discussion also.
>>
>> We had a bit long review of proper implementation of legacy firmware support,
>> so my apologise on that.
>>
>> Please take a look at link to the patchset posted by Tomasz.
>> https://www.spinics.net/lists/arm-kernel/msg568741.html
>
> I'll let others comment on the suitability of taking that for upstream.
>
>>>>> So the FW will be updated accordingly.
>>>
>>> Indeed.
>>
>> Yes, it is now contains "CAVxxx" as _HID for device config object.
>
> Which is different from the version that was merged into upstream. That
> should never have happened. It will never happen again. I have spent some
> time over the past few days ensuring folks understand that I will not
> allow a repeat of this to occur the next time around. We will have
> platforms that are bulletproof and supported by upstream with any
> errata fixes in a very carefully controlled manner. There will
> under no circumstances ever be a situation like this again.

We are still evaluating the merits of registering the values that 
appeared in v4.10, and not changing them.  We should know more in a 
couple of days.


>
>>> The version Bjorn merged looks for "THRX". This is the version that you will
>>> use, and you will promptly register that prefix with UEFI and provide fixes
>>> for existing firmware to correctly use the solution that is upstream.
>>
>> Cavium FW is updated accordingly to use already registered prefix.
>> For existent FW legacy support is posted by Tomasz.
>
> I'm watching this to ensure it's cleaned up properly.
>
> Jon.
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
Jon Masters March 21, 2017, 11:38 a.m. UTC | #14
On 03/16/2017 12:25 PM, David Daney wrote:
> On 03/16/2017 07:32 AM, Jon Masters wrote:

>>> Yes, it is now contains "CAVxxx" as _HID for device config object.
>>
>> Which is different from the version that was merged into upstream. That
>> should never have happened. It will never happen again. I have spent some
>> time over the past few days ensuring folks understand that I will not
>> allow a repeat of this to occur the next time around. We will have
>> platforms that are bulletproof and supported by upstream with any
>> errata fixes in a very carefully controlled manner. There will
>> under no circumstances ever be a situation like this again.
> 
> We are still evaluating the merits of registering the values that appeared
> in v4.10, and not changing them.  We should know more in a couple of days.

Thanks David. What was the verdict? (for the public record). If we need to
get a change into upstream, let's get that teed up before 4.12 merge.

And for other folks following along with this thread: I'm not just picking
on Cavium here. I'll be doing the same with *every* ARM server SoC company
as necessary over the coming months. We are going to have militantly
compliant standards adherence in this industry and every ARM server SoC is
going to "just work" with an upstream Linux kernel with an ACPI enabled 
platform. This will be so utterly clean and boring it'll be amazing.

Jon.
Bjorn Helgaas March 21, 2017, 1:47 p.m. UTC | #15
On Tue, Mar 21, 2017 at 07:38:07AM -0400, Jon Masters wrote:
> On 03/16/2017 12:25 PM, David Daney wrote:
> > On 03/16/2017 07:32 AM, Jon Masters wrote:
> 
> >>> Yes, it is now contains "CAVxxx" as _HID for device config object.
> >>
> >> Which is different from the version that was merged into upstream. That
> >> should never have happened. It will never happen again. I have spent some
> >> time over the past few days ensuring folks understand that I will not
> >> allow a repeat of this to occur the next time around. We will have
> >> platforms that are bulletproof and supported by upstream with any
> >> errata fixes in a very carefully controlled manner. There will
> >> under no circumstances ever be a situation like this again.
> > 
> > We are still evaluating the merits of registering the values that appeared
> > in v4.10, and not changing them.  We should know more in a couple of days.
> 
> Thanks David. What was the verdict? (for the public record). If we need to
> get a change into upstream, let's get that teed up before 4.12 merge.
> 
> And for other folks following along with this thread: I'm not just picking
> on Cavium here. I'll be doing the same with *every* ARM server SoC company
> as necessary over the coming months. We are going to have militantly
> compliant standards adherence in this industry and every ARM server SoC is
> going to "just work" with an upstream Linux kernel with an ACPI enabled 
> platform. This will be so utterly clean and boring it'll be amazing.

Thanks for keeping on top of this, Jon.  I agree, we should not be
using unregistered vendor prefixes, e.g., the "THRX" added by
44f22bd91e88 ("PCI: Add MCFG quirks for Cavium ThunderX pass2.x host
controller").  I'm sorry I merged that without doing the due
diligence.

I suspect the resolution will be to register "THRX".  If that doesn't
happen, I'll propose reverting 44f22bd91e88, not because I want to
break things, but only because I'm not personally in a position to do
anything smarter.  So please propose a better solution that fits
within the ACPI _HID/_CID model :)

Bjorn
Tomasz Nowicki March 21, 2017, 2:17 p.m. UTC | #16
Hi Bjorn,

On 21.03.2017 14:47, Bjorn Helgaas wrote:
> On Tue, Mar 21, 2017 at 07:38:07AM -0400, Jon Masters wrote:
>> On 03/16/2017 12:25 PM, David Daney wrote:
>>> On 03/16/2017 07:32 AM, Jon Masters wrote:
>>
>>>>> Yes, it is now contains "CAVxxx" as _HID for device config object.
>>>>
>>>> Which is different from the version that was merged into upstream. That
>>>> should never have happened. It will never happen again. I have spent some
>>>> time over the past few days ensuring folks understand that I will not
>>>> allow a repeat of this to occur the next time around. We will have
>>>> platforms that are bulletproof and supported by upstream with any
>>>> errata fixes in a very carefully controlled manner. There will
>>>> under no circumstances ever be a situation like this again.
>>>
>>> We are still evaluating the merits of registering the values that appeared
>>> in v4.10, and not changing them.  We should know more in a couple of days.
>>
>> Thanks David. What was the verdict? (for the public record). If we need to
>> get a change into upstream, let's get that teed up before 4.12 merge.
>>
>> And for other folks following along with this thread: I'm not just picking
>> on Cavium here. I'll be doing the same with *every* ARM server SoC company
>> as necessary over the coming months. We are going to have militantly
>> compliant standards adherence in this industry and every ARM server SoC is
>> going to "just work" with an upstream Linux kernel with an ACPI enabled
>> platform. This will be so utterly clean and boring it'll be amazing.
>
> Thanks for keeping on top of this, Jon.  I agree, we should not be
> using unregistered vendor prefixes, e.g., the "THRX" added by
> 44f22bd91e88 ("PCI: Add MCFG quirks for Cavium ThunderX pass2.x host
> controller").  I'm sorry I merged that without doing the due
> diligence.

Honestly, it is me who is responsible for this since I submitted the patch.

>
> I suspect the resolution will be to register "THRX".  If that doesn't
> happen, I'll propose reverting 44f22bd91e88, not because I want to
> break things, but only because I'm not personally in a position to do
> anything smarter.  So please propose a better solution that fits
> within the ACPI _HID/_CID model :)

I already submitted the patch to fix this. Please see:
https://patchwork.ozlabs.org/patch/739042/

Thanks,
Tomasz
David Daney March 21, 2017, 2:56 p.m. UTC | #17
On 03/21/2017 07:17 AM, Tomasz Nowicki wrote:
> Hi Bjorn,
>
> On 21.03.2017 14:47, Bjorn Helgaas wrote:
>> On Tue, Mar 21, 2017 at 07:38:07AM -0400, Jon Masters wrote:
>>> On 03/16/2017 12:25 PM, David Daney wrote:
>>>> On 03/16/2017 07:32 AM, Jon Masters wrote:
>>>
>>>>>> Yes, it is now contains "CAVxxx" as _HID for device config object.
>>>>>
>>>>> Which is different from the version that was merged into upstream.
>>>>> That
>>>>> should never have happened. It will never happen again. I have
>>>>> spent some
>>>>> time over the past few days ensuring folks understand that I will not
>>>>> allow a repeat of this to occur the next time around. We will have
>>>>> platforms that are bulletproof and supported by upstream with any
>>>>> errata fixes in a very carefully controlled manner. There will
>>>>> under no circumstances ever be a situation like this again.
>>>>
>>>> We are still evaluating the merits of registering the values that
>>>> appeared
>>>> in v4.10, and not changing them.  We should know more in a couple of
>>>> days.
>>>
>>> Thanks David. What was the verdict? (for the public record). If we
>>> need to
>>> get a change into upstream, let's get that teed up before 4.12 merge.
>>>
>>> And for other folks following along with this thread: I'm not just
>>> picking
>>> on Cavium here. I'll be doing the same with *every* ARM server SoC
>>> company
>>> as necessary over the coming months. We are going to have militantly
>>> compliant standards adherence in this industry and every ARM server
>>> SoC is
>>> going to "just work" with an upstream Linux kernel with an ACPI enabled
>>> platform. This will be so utterly clean and boring it'll be amazing.
>>
>> Thanks for keeping on top of this, Jon.  I agree, we should not be
>> using unregistered vendor prefixes, e.g., the "THRX" added by
>> 44f22bd91e88 ("PCI: Add MCFG quirks for Cavium ThunderX pass2.x host
>> controller").  I'm sorry I merged that without doing the due
>> diligence.
>
> Honestly, it is me who is responsible for this since I submitted the patch.

Yes.  After all this back and forth, Cavium has decided to deploy 
firmware with "CAVxxx" as _HID.

The deciding factor was that the prefix is already registered and there 
are probably fewer than 10 systems deployed with the experimental and 
erroneous  "THRXxxx" value.  Neither option (switching the kernel to 
"CAVxxx", or changing the firmware to use "THRXxxx") was without its 
drawbacks.  There were valid arguments on either side.  In the end 
internal momentum, among other factors, brought us to the conclusion 
that using the "CAVxxx" as _HID, and trying to get Tomasz' patches 
merged is what we will do. We fully realize that there may exists some 
combinations of the Linux kernel and Cavium firmware (both officially 
released, and experimental and unsupported) that don't result in 
functional PCIe.

I, personally, am sorry that this screw up happened in the first place. 
Let's hope that everyone learned something from the experience.

Thanks,
David Daney


>
>>
>> I suspect the resolution will be to register "THRX".  If that doesn't
>> happen, I'll propose reverting 44f22bd91e88, not because I want to
>> break things, but only because I'm not personally in a position to do
>> anything smarter.  So please propose a better solution that fits
>> within the ACPI _HID/_CID model :)
>
> I already submitted the patch to fix this. Please see:
> https://patchwork.ozlabs.org/patch/739042/
>
> Thanks,
> Tomasz
Bjorn Helgaas March 21, 2017, 5:45 p.m. UTC | #18
On Tue, Mar 21, 2017 at 03:17:17PM +0100, Tomasz Nowicki wrote:
> Hi Bjorn,
> 
> On 21.03.2017 14:47, Bjorn Helgaas wrote:
> >On Tue, Mar 21, 2017 at 07:38:07AM -0400, Jon Masters wrote:
> >>On 03/16/2017 12:25 PM, David Daney wrote:
> >>>On 03/16/2017 07:32 AM, Jon Masters wrote:
> >>
> >>>>>Yes, it is now contains "CAVxxx" as _HID for device config object.
> >>>>
> >>>>Which is different from the version that was merged into upstream. That
> >>>>should never have happened. It will never happen again. I have spent some
> >>>>time over the past few days ensuring folks understand that I will not
> >>>>allow a repeat of this to occur the next time around. We will have
> >>>>platforms that are bulletproof and supported by upstream with any
> >>>>errata fixes in a very carefully controlled manner. There will
> >>>>under no circumstances ever be a situation like this again.
> >>>
> >>>We are still evaluating the merits of registering the values that appeared
> >>>in v4.10, and not changing them.  We should know more in a couple of days.
> >>
> >>Thanks David. What was the verdict? (for the public record). If we need to
> >>get a change into upstream, let's get that teed up before 4.12 merge.
> >>
> >>And for other folks following along with this thread: I'm not just picking
> >>on Cavium here. I'll be doing the same with *every* ARM server SoC company
> >>as necessary over the coming months. We are going to have militantly
> >>compliant standards adherence in this industry and every ARM server SoC is
> >>going to "just work" with an upstream Linux kernel with an ACPI enabled
> >>platform. This will be so utterly clean and boring it'll be amazing.
> >
> >Thanks for keeping on top of this, Jon.  I agree, we should not be
> >using unregistered vendor prefixes, e.g., the "THRX" added by
> >44f22bd91e88 ("PCI: Add MCFG quirks for Cavium ThunderX pass2.x host
> >controller").  I'm sorry I merged that without doing the due
> >diligence.
> 
> Honestly, it is me who is responsible for this since I submitted the patch.
> 
> >
> >I suspect the resolution will be to register "THRX".  If that doesn't
> >happen, I'll propose reverting 44f22bd91e88, not because I want to
> >break things, but only because I'm not personally in a position to do
> >anything smarter.  So please propose a better solution that fits
> >within the ACPI _HID/_CID model :)
> 
> I already submitted the patch to fix this. Please see:
> https://patchwork.ozlabs.org/patch/739042/

Applied, thanks!  Glad to have this cleared up!

Bjorn
Jon Masters March 22, 2017, 2:28 p.m. UTC | #19
On 03/21/2017 10:56 AM, David Daney wrote:
> On 03/21/2017 07:17 AM, Tomasz Nowicki wrote:

>> On 21.03.2017 14:47, Bjorn Helgaas wrote:

>>>> And for other folks following along with this thread: I'm not just
>>>> picking on Cavium here. I'll be doing the same with *every* ARM server
>>>> SoC company as necessary over the coming months.

>>> Thanks for keeping on top of this, Jon.

You're welcome. I'm pleased (in some sense) that we're starting to see
enough systems shipping that unifying quirks and IDs such that ODMs
can bend metal easily is a problem that we want to solve. I am saddened
that there isn't an ARM swat team with black helicopters swooping in to
ensure zoo avoidance (and I do actually request this every year in my
own budget cycle), but I am very "happy" to serve that role for now.

As I said, this isn't Cavium's fault. They're a victim of their market
success. I'm super excited to see them shipping systems on which we
want to run general purpose Operating Systems. At the same time, as
with *every* other ARM vendor, I will keep my eye out for compliance
concerns and I will act to ensure that these things are flagged.

>>> I agree, we should not be
>>> using unregistered vendor prefixes, e.g., the "THRX" added by
>>> 44f22bd91e88 ("PCI: Add MCFG quirks for Cavium ThunderX pass2.x host
>>> controller").  I'm sorry I merged that without doing the due
>>> diligence.

Oh, it's difficult for you to police everything without having every
possible platform in front of you, with every firmware, and a lot of
time that none of us have :)

>> Honestly, it is me who is responsible for this since I submitted
>> the patch.

You're great Tomasz. You've done awesome stuff over the past few months.
I want to be /very/ clear that none of my pushback is directed at you,
David, or any specific individual. You're doing great. I'm going to
make sure that alignment happens in this industry because I need to
ship a "common core" single binary build OS that supports "ARM
servers". That means every server, from every vendor. Not all are
going to be "certified" to run RHEL, but all servers must be capable
of booting and working with upstream kernels, and running *ANY*
Linux distro, so that customers and users who try an "ARM server"
from a random ODM don't get upset. There will be no zoo. There will
only be "upstream first" driven development and the distros will
learn to consume only from upstream. They won't produce hacked up
nonsense with patches to support platforms that aren't upstream.

> Yes.  After all this back and forth, Cavium has decided to deploy
> firmware with "CAVxxx" as _HID.

Great. How about a stable backport for Greg K-H? I want to make sure
that everyone running "upstream" has a chance of booting.

> The deciding factor was that the prefix is already registered and there
> are probably fewer than 10 systems deployed with the experimental and
> erroneous  "THRXxxx" value.  Neither option (switching the kernel to "CAVxxx",
> or changing the firmware to use "THRXxxx") was without its drawbacks. 

Agree. Let's pick a solution and learn for the future. I know you know
this, but for everyone else (especially ARM vendors who follow):

The dirty secret to server is that we have software we ship, and hardware
that ships separately. The software lives for years. It's easier to change
the hardware than software that has already shipped. This is where very
rich and featurefull firmware comes in. The platform is defined by the
firmware, which should provide a fully standard interface that is SET
IN STONE. It's so utterly bulletproof that it's both forward and
backward compatible. Going forward, all of the ARM vendors are going
to have utterly bulletproof server platforms with an amazing level of
joined up cohesion in terms of tracking changes on the software and
hardware side in terms of the platform firmware gluing it together.

It's a dirty secret that x86 teaches us, and we're going to play
exactly the same model out again (this has been the evil plan for
many many years). But to do it right requires that we are very
very careful in connecting dots between the platform pieces.

Thanks,

Jon.
Bjorn Helgaas March 22, 2017, 2:48 p.m. UTC | #20
On Wed, Mar 22, 2017 at 10:28:27AM -0400, Jon Masters wrote:
> On 03/21/2017 10:56 AM, David Daney wrote:

> > Yes.  After all this back and forth, Cavium has decided to deploy
> > firmware with "CAVxxx" as _HID.
> 
> Great. How about a stable backport for Greg K-H? I want to make sure
> that everyone running "upstream" has a chance of booting.

The first patch, fee4d813850c ("PCI: Use Cavium assigned hardware ID for
ThunderX host controller"), has a stable tag already.

The second, 1dc94a38af89 ("PCI: Add legacy firmware support for Cavium
ThunderX host controller"), does not, but I can easily add it if needed.

Bjorn
Jon Masters March 22, 2017, 4:25 p.m. UTC | #21
On 03/22/2017 10:48 AM, Bjorn Helgaas wrote:
> On Wed, Mar 22, 2017 at 10:28:27AM -0400, Jon Masters wrote:
>> On 03/21/2017 10:56 AM, David Daney wrote:
> 
>>> Yes.  After all this back and forth, Cavium has decided to deploy
>>> firmware with "CAVxxx" as _HID.
>>
>> Great. How about a stable backport for Greg K-H? I want to make sure
>> that everyone running "upstream" has a chance of booting.
> 
> The first patch, fee4d813850c ("PCI: Use Cavium assigned hardware ID for
> ThunderX host controller"), has a stable tag already.

Thanks - I saw that after I mailed.

> The second, 1dc94a38af89 ("PCI: Add legacy firmware support for Cavium
> ThunderX host controller"), does not, but I can easily add it if needed.

I think that would be ideal. There is firmware out in the wild that
has neither identifier in it (for example, a bunch of folks in the
office bought platforms recently that don't boot upstream kernels).
My guys are used to just taking an upstream kernel and using that,
for development, and I have no intention of having the broader RH
org do any different from what they would do with an x86 box ;)

Jon.
Bjorn Helgaas March 22, 2017, 4:34 p.m. UTC | #22
On Wed, Mar 22, 2017 at 12:25:39PM -0400, Jon Masters wrote:
> On 03/22/2017 10:48 AM, Bjorn Helgaas wrote:
> > On Wed, Mar 22, 2017 at 10:28:27AM -0400, Jon Masters wrote:
> >> On 03/21/2017 10:56 AM, David Daney wrote:
> > 
> >>> Yes.  After all this back and forth, Cavium has decided to deploy
> >>> firmware with "CAVxxx" as _HID.
> >>
> >> Great. How about a stable backport for Greg K-H? I want to make sure
> >> that everyone running "upstream" has a chance of booting.
> > 
> > The first patch, fee4d813850c ("PCI: Use Cavium assigned hardware ID for
> > ThunderX host controller"), has a stable tag already.
> 
> Thanks - I saw that after I mailed.
> 
> > The second, 1dc94a38af89 ("PCI: Add legacy firmware support for Cavium
> > ThunderX host controller"), does not, but I can easily add it if needed.
> 
> I think that would be ideal. There is firmware out in the wild that
> has neither identifier in it (for example, a bunch of folks in the
> office bought platforms recently that don't boot upstream kernels).
> My guys are used to just taking an upstream kernel and using that,
> for development, and I have no intention of having the broader RH
> org do any different from what they would do with an x86 box ;)

I added the same stable tag (v4.10+) to the second patch.
Bjorn Helgaas March 23, 2017, 10:14 p.m. UTC | #23
On Wed, Mar 22, 2017 at 11:34:00AM -0500, Bjorn Helgaas wrote:
> On Wed, Mar 22, 2017 at 12:25:39PM -0400, Jon Masters wrote:
> > On 03/22/2017 10:48 AM, Bjorn Helgaas wrote:
> > > On Wed, Mar 22, 2017 at 10:28:27AM -0400, Jon Masters wrote:
> > >> On 03/21/2017 10:56 AM, David Daney wrote:
> > > 
> > >>> Yes.  After all this back and forth, Cavium has decided to deploy
> > >>> firmware with "CAVxxx" as _HID.
> > >>
> > >> Great. How about a stable backport for Greg K-H? I want to make sure
> > >> that everyone running "upstream" has a chance of booting.
> > > 
> > > The first patch, fee4d813850c ("PCI: Use Cavium assigned hardware ID for
> > > ThunderX host controller"), has a stable tag already.
> > 
> > Thanks - I saw that after I mailed.
> > 
> > > The second, 1dc94a38af89 ("PCI: Add legacy firmware support for Cavium
> > > ThunderX host controller"), does not, but I can easily add it if needed.
> > 
> > I think that would be ideal. There is firmware out in the wild that
> > has neither identifier in it (for example, a bunch of folks in the
> > office bought platforms recently that don't boot upstream kernels).
> > My guys are used to just taking an upstream kernel and using that,
> > for development, and I have no intention of having the broader RH
> > org do any different from what they would do with an x86 box ;)
> 
> I added the same stable tag (v4.10+) to the second patch.

I moved these from pci/host-thunder (targeted for v4.12) to for-linus so we
can get these in v4.11.
Jon Masters March 23, 2017, 10:16 p.m. UTC | #24
Thanks :) :) :)
diff mbox

Patch

diff --git a/drivers/pci/host/pci-thunder-pem.c b/drivers/pci/host/pci-thunder-pem.c
index af722eb..aec30b8 100644
--- a/drivers/pci/host/pci-thunder-pem.c
+++ b/drivers/pci/host/pci-thunder-pem.c
@@ -331,7 +331,7 @@  static int thunder_pem_acpi_init(struct pci_config_window *cfg)
 	if (!res_pem)
 		return -ENOMEM;
 
-	ret = acpi_get_rc_resources(dev, "THRX0002", root->segment, res_pem);
+	ret = acpi_get_rc_resources(dev, "PNP0A08", root->segment, res_pem);
 	if (ret) {
 		dev_err(dev, "can't get rc base address\n");
 		return ret;