
[2/4] x86/amd_nb: add support for newer PCI topologies

Message ID 20181102181055.130531-3-brian.woods@amd.com (mailing list archive)
State Not Applicable
Series Update DF/SMN access and k10temp for AMD F17h M30h

Commit Message

Woods, Brian Nov. 2, 2018, 6:11 p.m. UTC
Add support for new processors which have multiple PCI root complexes
per data fabric/SMN interface.  The interfaces per root complex are
redundant and should be skipped.  This makes sure the DF/SMN interfaces
get accessed via the correct root complex.

Ex:
DF/SMN 0 -> 60
	    40
	    20
	    00
DF/SMN 1 -> e0
	    c0
	    a0
	    80

Signed-off-by: Brian Woods <brian.woods@amd.com>
---
 arch/x86/kernel/amd_nb.c | 41 +++++++++++++++++++++++++++++++++++------
 1 file changed, 35 insertions(+), 6 deletions(-)

Comments

Bjorn Helgaas Nov. 2, 2018, 7:59 p.m. UTC | #1
On Fri, Nov 02, 2018 at 06:11:07PM +0000, Woods, Brian wrote:
> Add support for new processors which have multiple PCI root complexes
> per data fabric/SMN interface.  The interfaces per root complex are
> redundant and should be skipped.  This makes sure the DF/SMN interfaces
> get accessed via the correct root complex.

SMN?

> Ex:
> DF/SMN 0 -> 60
> 	    40
> 	    20
> 	    00
> DF/SMN 1 -> e0
> 	    c0
> 	    a0
> 	    80

This isn't my code, and I'm not really objecting to these changes, but
from where I sit, the fact that you need this sort of vendor-specific
topology discovery is a little bit ugly and seems like something of a
maintenance issue.  You could argue that this is sort of an "AMD CPU
driver", which is entitled to be device-specific, and that does make
some sense.

But device-specific code is typically packaged as a driver that uses
driver registration interfaces like acpi_bus_register_driver(),
pci_register_driver(), etc.  That gives you a consistent structure
and, more importantly, a framework for dealing with hotplug.  It
doesn't look like amd_nb.c would deal well with hot-add of CPUs.

> Signed-off-by: Brian Woods <brian.woods@amd.com>
> ---
>  arch/x86/kernel/amd_nb.c | 41 +++++++++++++++++++++++++++++++++++------
>  1 file changed, 35 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
> index 19d489ee2b1e..c0bf26aeb7c3 100644
> --- a/arch/x86/kernel/amd_nb.c
> +++ b/arch/x86/kernel/amd_nb.c
> @@ -213,7 +213,10 @@ int amd_cache_northbridges(void)
>  	const struct pci_device_id *root_ids = amd_root_ids;
>  	struct pci_dev *root, *misc, *link;
>  	struct amd_northbridge *nb;
> -	u16 i = 0;
> +	u16 roots_per_misc = 0;
> +	u16 misc_count = 0;
> +	u16 root_count = 0;
> +	u16 i, j;
>  
>  	if (amd_northbridges.num)
>  		return 0;
> @@ -226,26 +229,52 @@ int amd_cache_northbridges(void)
>  
>  	misc = NULL;
>  	while ((misc = next_northbridge(misc, misc_ids)) != NULL)
> -		i++;
> +		misc_count++;
>  
> -	if (!i)
> +	root = NULL;
> +	while ((root = next_northbridge(root, root_ids)) != NULL)
> +		root_count++;
> +
> +	if (!misc_count)
>  		return -ENODEV;
>  
> -	nb = kcalloc(i, sizeof(struct amd_northbridge), GFP_KERNEL);
> +	if (root_count) {
> +		roots_per_misc = root_count / misc_count;
> +
> +		/*
> +		 * There should be _exactly_ N roots for each DF/SMN
> +		 * interface.
> +		 */
> +		if (!roots_per_misc || (root_count % roots_per_misc)) {
> +			pr_info("Unsupported AMD DF/PCI configuration found\n");
> +			return -ENODEV;
> +		}
> +	}
> +
> +	nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL);
>  	if (!nb)
>  		return -ENOMEM;
>  
>  	amd_northbridges.nb = nb;
> -	amd_northbridges.num = i;
> +	amd_northbridges.num = misc_count;
>  
>  	link = misc = root = NULL;
> -	for (i = 0; i != amd_northbridges.num; i++) {
> +	for (i = 0; i < amd_northbridges.num; i++) {
>  		node_to_amd_nb(i)->root = root =
>  			next_northbridge(root, root_ids);
>  		node_to_amd_nb(i)->misc = misc =
>  			next_northbridge(misc, misc_ids);
>  		node_to_amd_nb(i)->link = link =
>  			next_northbridge(link, link_ids);
> +
> +		/*
> +		 * If there are more root devices than data fabric/SMN
> +		 * interfaces, then the root devices per DF/SMN
> +		 * interface are redundant and N-1 should be skipped so
> +		 * they aren't mapped incorrectly.
> +		 */
> +		for (j = 1; j < roots_per_misc; j++)
> +			root = next_northbridge(root, root_ids);
>  	}
>  
>  	if (amd_gart_present())
> -- 
> 2.11.0
>
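
For context, the counting loops in this patch rely on the next_northbridge()
helper already in amd_nb.c, which walks all PCI devices and returns the next
one matching the given ID table. A paraphrased sketch of that helper (from
the file as of this series, so treat details as approximate):

	static struct pci_dev *next_northbridge(struct pci_dev *dev,
						const struct pci_device_id *ids)
	{
		do {
			dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev);
			if (!dev)
				break;
		} while (!pci_match_id(ids, dev));
		return dev;
	}

Each call hands the previous hit back to pci_get_device(), so repeated calls
enumerate all matching devices in discovery order; that is why the patch can
count misc and root devices with simple while loops.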
Borislav Petkov Nov. 2, 2018, 11:29 p.m. UTC | #2
On Fri, Nov 02, 2018 at 02:59:25PM -0500, Bjorn Helgaas wrote:
> This isn't my code, and I'm not really objecting to these changes, but
> from where I sit, the fact that you need this sort of vendor-specific
> topology discovery is a little bit ugly and seems like something of a
> maintenance issue.  You could argue that this is sort of an "AMD CPU
> driver", which is entitled to be device-specific, and that does make
> some sense.

It is a bunch of glue code which enumerates the PCI devices a CPU
has, and other in-kernel users can use that instead of doing the
discovery/enumeration themselves.

> But device-specific code is typically packaged as a driver that uses
> driver registration interfaces like acpi_bus_register_driver(),
> pci_register_driver(), etc.  That gives you a consistent structure
> and, more importantly, a framework for dealing with hotplug.  It
> doesn't look like amd_nb.c would deal well with hot-add of CPUs.

If you mean physical hot-add, then that's a non-issue as, AFAIK, AMD
doesn't support that.

Now, TBH I've never tried soft-offlining the cores of a node and then
checking whether using the PCI devices of that node would work.

Now, I don't mind this getting converted to a proper PCI driver as long
as it is not a module, as it has to be present at all times. Other than
that, I'm a happy camper.

Thx.
Borislav Petkov Nov. 5, 2018, 7:38 p.m. UTC | #3
On Fri, Nov 02, 2018 at 06:11:07PM +0000, Woods, Brian wrote:
> Add support for new processors which have multiple PCI root complexes
> per data fabric/SMN interface.

Please write out abbreviations. I believe it is only you and I who know
what SMN means. :)

> The interfaces per root complex are redundant and should be skipped.

And I believe it is only you who understands that sentence. :)

Please elaborate why interfaces need to be skipped, *which* interfaces
need to be skipped and which is the correct interface to access DF/SMN
through?

> This makes sure the DF/SMN interfaces get accessed via the correct
> root complex.
>
> Ex:
> DF/SMN 0 -> 60
> 	    40
> 	    20
> 	    00
> DF/SMN 1 -> e0
> 	    c0
> 	    a0
> 	    80
> 
> Signed-off-by: Brian Woods <brian.woods@amd.com>
> ---
>  arch/x86/kernel/amd_nb.c | 41 +++++++++++++++++++++++++++++++++++------
>  1 file changed, 35 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
> index 19d489ee2b1e..c0bf26aeb7c3 100644
> --- a/arch/x86/kernel/amd_nb.c
> +++ b/arch/x86/kernel/amd_nb.c
> @@ -213,7 +213,10 @@ int amd_cache_northbridges(void)
>  	const struct pci_device_id *root_ids = amd_root_ids;
>  	struct pci_dev *root, *misc, *link;
>  	struct amd_northbridge *nb;
> -	u16 i = 0;
> +	u16 roots_per_misc = 0;
> +	u16 misc_count = 0;
> +	u16 root_count = 0;
> +	u16 i, j;
>  
>  	if (amd_northbridges.num)
>  		return 0;
> @@ -226,26 +229,52 @@ int amd_cache_northbridges(void)
>  
>  	misc = NULL;
>  	while ((misc = next_northbridge(misc, misc_ids)) != NULL)
> -		i++;
> +		misc_count++;
>  
> -	if (!i)
> +	root = NULL;
> +	while ((root = next_northbridge(root, root_ids)) != NULL)
> +		root_count++;
> +
> +	if (!misc_count)
>  		return -ENODEV;

So you're doing the root_count above but returning in the !misc_count
case. So that root_count iteration was unnecessary work. IOW, you should
keep the misc_count check after its loop.

>  
> -	nb = kcalloc(i, sizeof(struct amd_northbridge), GFP_KERNEL);
> +	if (root_count) {
> +		roots_per_misc = root_count / misc_count;
> +
> +		/*
> +		 * There should be _exactly_ N roots for each DF/SMN
> +		 * interface.
> +		 */
> +		if (!roots_per_misc || (root_count % roots_per_misc)) {
> +			pr_info("Unsupported AMD DF/PCI configuration found\n");
> +			return -ENODEV;
> +		}
> +	}
> +
> +	nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL);
>  	if (!nb)
>  		return -ENOMEM;
>  
>  	amd_northbridges.nb = nb;
> -	amd_northbridges.num = i;
> +	amd_northbridges.num = misc_count;
>  
>  	link = misc = root = NULL;
> -	for (i = 0; i != amd_northbridges.num; i++) {
> +	for (i = 0; i < amd_northbridges.num; i++) {
>  		node_to_amd_nb(i)->root = root =
>  			next_northbridge(root, root_ids);
>  		node_to_amd_nb(i)->misc = misc =
>  			next_northbridge(misc, misc_ids);
>  		node_to_amd_nb(i)->link = link =
>  			next_northbridge(link, link_ids);
> +
> +		/*
> +		 * If there are more root devices than data fabric/SMN
> +		 * interfaces, then the root devices per DF/SMN
> +		 * interface are redundant and N-1 should be skipped so
> +		 * they aren't mapped incorrectly.
> +		 */

This text is trying to explain it a bit better but you still need
to specify which are the redundant ones. All N-1 or is there a special
root device through which the DF/SMN gets accessed or?

Thx.
Woods, Brian Nov. 5, 2018, 8:33 p.m. UTC | #4
On Mon, Nov 05, 2018 at 08:38:40PM +0100, Borislav Petkov wrote:
> On Fri, Nov 02, 2018 at 06:11:07PM +0000, Woods, Brian wrote:
> > Add support for new processors which have multiple PCI root complexes
> > per data fabric/SMN interface.
> 
> Please write out abbreviations. I believe it is only you and I who know
> what SMN means. :)

Will do.

> > The interfaces per root complex are redundant and should be skipped.
> 
> And I believe it is only you who understands that sentence. :)
> 
> Please elaborate why interfaces need to be skipped, *which* interfaces
> need to be skipped and which is the correct interface to access DF/SMN
> through?

See last comment.

> > This makes sure the DF/SMN interfaces get accessed via the correct
> > root complex.
> >
> > Ex:
> > DF/SMN 0 -> 60
> > 	    40
> > 	    20
> > 	    00
> > DF/SMN 1 -> e0
> > 	    c0
> > 	    a0
> > 	    80
> > 
> > Signed-off-by: Brian Woods <brian.woods@amd.com>
> > ---
> >  arch/x86/kernel/amd_nb.c | 41 +++++++++++++++++++++++++++++++++++------
> >  1 file changed, 35 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
> > index 19d489ee2b1e..c0bf26aeb7c3 100644
> > --- a/arch/x86/kernel/amd_nb.c
> > +++ b/arch/x86/kernel/amd_nb.c
> > @@ -213,7 +213,10 @@ int amd_cache_northbridges(void)
> >  	const struct pci_device_id *root_ids = amd_root_ids;
> >  	struct pci_dev *root, *misc, *link;
> >  	struct amd_northbridge *nb;
> > -	u16 i = 0;
> > +	u16 roots_per_misc = 0;
> > +	u16 misc_count = 0;
> > +	u16 root_count = 0;
> > +	u16 i, j;
> >  
> >  	if (amd_northbridges.num)
> >  		return 0;
> > @@ -226,26 +229,52 @@ int amd_cache_northbridges(void)
> >  
> >  	misc = NULL;
> >  	while ((misc = next_northbridge(misc, misc_ids)) != NULL)
> > -		i++;
> > +		misc_count++;
> >  
> > -	if (!i)
> > +	root = NULL;
> > +	while ((root = next_northbridge(root, root_ids)) != NULL)
> > +		root_count++;
> > +
> > +	if (!misc_count)
> >  		return -ENODEV;
> 
> So you're doing the root_count above but returning in the !misc_count
> case. So that root_count iteration was unnecessary work. IOW, you should
> keep the misc_count check after its loop.

I think having them together is cleaner. If you aren't finding any
misc IDs, I highly doubt you'll find any root IDs.  There shouldn't
be much of a difference in how fast the function exits, either way.
If you want it the other way though, I don't mind changing it.

> >  
> > -	nb = kcalloc(i, sizeof(struct amd_northbridge), GFP_KERNEL);
> > +	if (root_count) {
> > +		roots_per_misc = root_count / misc_count;
> > +
> > +		/*
> > +		 * There should be _exactly_ N roots for each DF/SMN
> > +		 * interface.
> > +		 */
> > +		if (!roots_per_misc || (root_count % roots_per_misc)) {
> > +			pr_info("Unsupported AMD DF/PCI configuration found\n");
> > +			return -ENODEV;
> > +		}
> > +	}
> > +
> > +	nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL);
> >  	if (!nb)
> >  		return -ENOMEM;
> >  
> >  	amd_northbridges.nb = nb;
> > -	amd_northbridges.num = i;
> > +	amd_northbridges.num = misc_count;
> >  
> >  	link = misc = root = NULL;
> > -	for (i = 0; i != amd_northbridges.num; i++) {
> > +	for (i = 0; i < amd_northbridges.num; i++) {
> >  		node_to_amd_nb(i)->root = root =
> >  			next_northbridge(root, root_ids);
> >  		node_to_amd_nb(i)->misc = misc =
> >  			next_northbridge(misc, misc_ids);
> >  		node_to_amd_nb(i)->link = link =
> >  			next_northbridge(link, link_ids);
> > +
> > +		/*
> > +		 * If there are more root devices than data fabric/SMN
> > +		 * interfaces, then the root devices per DF/SMN
> > +		 * interface are redundant and N-1 should be skipped so
> > +		 * they aren't mapped incorrectly.
> > +		 */
> 
> This text is trying to explain it a bit better but you still still need
> to specify which are the redundant ones. All N-1 or is there a special
> root device through which the DF/SMN gets accessed or?
> 
> Thx.
Would

		/*
		 * If there are more PCI root devices than data fabric/
		 * system management network interfaces, then the (N)
		 * PCI roots per DF/SMN interface are functionally the
		 * same (for DF/SMN access) and N-1 are redundant.  The
		 * N-1 PCI roots should be skipped per DF/SMN interface
		 * so the DF/SMN interfaces get mapped to the correct
		 * PCI root.
		 */

be better?  I would update the commit msg also.

Borislav Petkov Nov. 5, 2018, 9:42 p.m. UTC | #5
On Mon, Nov 05, 2018 at 08:33:34PM +0000, Woods, Brian wrote:
> I think having them together is cleaner. If you aren't finding any
> misc IDs, I highly doubt you'll find any root IDs.  There shouldn't
> be much of a difference in how fast the function exits, either way.
> If you want it the other way though, I don't mind changing it.

Yes please. Because this is the usual kernel coding style of calling a
function (or a loop which has some result in this case) and testing that
result immediately after the function call.
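
A minimal sketch of the reordering being asked for here, i.e. the same
counting loops from the patch with the misc_count check moved up so the
root enumeration never runs when no misc devices exist:

	misc = NULL;
	while ((misc = next_northbridge(misc, misc_ids)) != NULL)
		misc_count++;

	if (!misc_count)
		return -ENODEV;

	root = NULL;
	while ((root = next_northbridge(root, root_ids)) != NULL)
		root_count++;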

> Would
> 
> 		/*
> 		 * If there are more PCI root devices than data fabric/
> 		 * system management network interfaces, then the (N)
> 		 * PCI roots per DF/SMN interface are functionally the
> 		 * same (for DF/SMN access) and N-1 are redundant.  The
> 		 * N-1 PCI roots should be skipped per DF/SMN interface
> 		 * so the DF/SMN interfaces get mapped to the correct
> 		 * PCI root.

You say "correct" as there is a special one. But the text before it says
they're "functionally the same" wrt DF/SMN access so it sounds to me
like we wanna map the first one we find and ignore the others.

I.e., we wanna say

"... so the DF/SMN interfaces get mapped to the *first* PCI root and the
others N-1 ignored."

Or am I misreading this?

Thx.
Bjorn Helgaas Nov. 5, 2018, 9:45 p.m. UTC | #6
[+cc Takashi, Andy, Colin, Myron for potential distro impact]

[Beginning of thread:
https://lore.kernel.org/linux-pci/20181102181055.130531-1-brian.woods@amd.com/]

On Sat, Nov 03, 2018 at 12:29:48AM +0100, Borislav Petkov wrote:
> On Fri, Nov 02, 2018 at 02:59:25PM -0500, Bjorn Helgaas wrote:
> > This isn't my code, and I'm not really objecting to these changes, but
> > from where I sit, the fact that you need this sort of vendor-specific
> > topology discovery is a little bit ugly and seems like something of a
> > maintenance issue.  

I think this is the most important part, and I should have elaborated
on it instead of getting into the driver structure details below.

It is a major goal of ACPI and PCI that an old kernel should work
unchanged on a new platform unless it needs to use new functionality
introduced in the new platform.

amd_nb.c prevents us from achieving that goal.  These patches don't
add new functionality; they merely describe minor topological
differences in new hardware.  We usually try to do that in a more
generic way, e.g., via an ACPI method, so the new platform can update
the ACPI method and use an old, already-qualified, already-shipped
kernel.

I'm not strenuously objecting to these because this isn't a *huge*
deal, but I suspect it is a source of friction for distros that don't
want to update and requalify their software for every new platform.

> > You could argue that this is sort of an "AMD CPU
> > driver", which is entitled to be device-specific, and that does make
> > some sense.
> 
> It is a bunch of glue code which enumerates the PCI devices a CPU
> has, and other in-kernel users can use that instead of doing the
> discovery/enumeration themselves.
> 
> > But device-specific code is typically packaged as a driver that uses
> > driver registration interfaces like acpi_bus_register_driver(),
> > pci_register_driver(), etc.  That gives you a consistent structure
> > and, more importantly, a framework for dealing with hotplug.  It
> > doesn't look like amd_nb.c would deal well with hot-add of CPUs.
> 
> If you mean physical hot-add, then that's a non-issue as, AFAIK, AMD
> doesn't support that.
> 
> Now, TBH I've never tried soft-offlining the cores of a node and then
> checking whether using the PCI devices of that node would work.
> 
> Now, I don't mind this getting converted to a proper PCI driver as long
> as it is not a module, as it has to be present at all times. Other than
> that, I'm a happy camper.

amd_nb.c uses pci_get_device(), which is incompatible with hotplug and
subverts the usual driver/device ownership model.  We could pursue
this part of the conversation, but I think it's more fruitful to
approach this from the "new machine, old kernel" angle above.
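
For contrast, the driver-model structure being discussed would look roughly
like the hypothetical skeleton below, with probe/remove callbacks replacing
the pci_get_device() enumeration loops. The names and the empty ID table are
placeholders for illustration, not something actually posted in this series:

	#include <linux/module.h>
	#include <linux/pci.h>

	static const struct pci_device_id amd_nb_ids[] = {
		/* the IDs currently hardcoded in amd_root_ids/amd_misc_ids */
		{ }
	};

	static int amd_nb_probe(struct pci_dev *pdev,
				const struct pci_device_id *id)
	{
		/* per-device setup, called automatically on hot-add */
		return 0;
	}

	static void amd_nb_remove(struct pci_dev *pdev)
	{
		/* teardown on hot-remove - the part pci_get_device() lacks */
	}

	static struct pci_driver amd_nb_driver = {
		.name     = "amd_nb",
		.id_table = amd_nb_ids,
		.probe    = amd_nb_probe,
		.remove   = amd_nb_remove,
	};
	builtin_pci_driver(amd_nb_driver);	/* built-in, per Boris's constraint */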
Borislav Petkov Nov. 5, 2018, 9:56 p.m. UTC | #7
On Mon, Nov 05, 2018 at 03:45:37PM -0600, Bjorn Helgaas wrote:
> amd_nb.c prevents us from achieving that goal.  These patches don't
> > add new functionality; they merely describe minor topological
> differences in new hardware.  We usually try to do that in a more
> generic way, e.g., via an ACPI method, so the new platform can update
> the ACPI method and use an old, already-qualified, already-shipped
> kernel.
> 
> I'm not strenuously objecting to these because this isn't a *huge*
> deal, but I suspect it is a source of friction for distros that don't
> want to update and requalify their software for every new platform.

Err, how is this any different from adding distro support for a new CPU
family?

This is basically the same thing. When distros add support for new
hardware, they have to backport patches from upstream. These PCI devices
which are part of the CPU are part of that hardware enablement.

So there's no way around doing that enablement. I don't think you can do
"old distro, new hardware" stuff without *some* hw enablement.
Woods, Brian Nov. 5, 2018, 11:32 p.m. UTC | #8
On Mon, Nov 05, 2018 at 10:42:33PM +0100, Borislav Petkov wrote:
> Yes please. Because this is the usual kernel coding style of calling a
> function (or a loop which has some result in this case) and testing that
> result immediately after the function call.

Done.

> You say "correct" as there is a special one. But the text before it says
> they're "functionally the same" wrt DF/SMN access so it sounds to me
> like we wanna map the first one we find and ignore the others.
> 
> I.e., we wanna say
> 
> "... so the DF/SMN interfaces get mapped to the *first* PCI root and the
> others N-1 ignored."
> 
> Or am I misreading this?
> 
> Thx.

Your understanding is correct.  It's more so that the following DF/SMN
interface gets mapped correctly.
		/*
		 * If there are more PCI root devices than data fabric/
		 * system management network interfaces, then the (N)
		 * PCI roots per DF/SMN interface are functionally the
		 * same (for DF/SMN access) and N-1 are redundant.  N-1
		 * PCI roots should be skipped per DF/SMN interface so
		 * the following DF/SMN interfaces get mapped to
		 * correct PCI roots.
		 */
Does that read clearer?
Borislav Petkov Nov. 6, 2018, 8:27 a.m. UTC | #9
On Mon, Nov 05, 2018 at 11:32:16PM +0000, Woods, Brian wrote:
> Your understanding is correct.  It's more so that the following DF/SMN
> interface gets mapped correctly.
> 		/*
> 		 * If there are more PCI root devices than data fabric/
> 		 * system management network interfaces, then the (N)
> 		 * PCI roots per DF/SMN interface are functionally the
> 		 * same (for DF/SMN access) and N-1 are redundant.  N-1
> 		 * PCI roots should be skipped per DF/SMN interface so
> 		 * the following DF/SMN interfaces get mapped to
> 		 * correct PCI roots.
> 		 */
> Does that read clearer?

Yap, thanks!
Bjorn Helgaas Nov. 6, 2018, 9:42 p.m. UTC | #10
On Mon, Nov 05, 2018 at 10:56:50PM +0100, Borislav Petkov wrote:
> On Mon, Nov 05, 2018 at 03:45:37PM -0600, Bjorn Helgaas wrote:
> > amd_nb.c prevents us from achieving that goal.  These patches don't
> > add new functionality; they merely describe minor topological
> > differences in new hardware.  We usually try to do that in a more
> > generic way, e.g., via an ACPI method, so the new platform can update
> > the ACPI method and use an old, already-qualified, already-shipped
> > kernel.
> > 
> > I'm not strenuously objecting to these because this isn't a *huge*
> > deal, but I suspect it is a source of friction for distros that don't
> > want to update and requalify their software for every new platform.
> 
> Err, how is this any different from adding distro support for a new CPU
> family?
> 
> This is basically the same thing. When distros add support for new
> hardware, they have to backport patches for upstream. These PCI devices
> which are part of the CPU are part of that hardware enablement.
> 
> So there's no way around doing that enablement. I don't think you can do
> "old distro, new hardware" stuff without *some* hw enablement.

Just depends on how hard you want to work on defining abstract
interfaces.

This isn't some complicated new device where the programming model
changed on the new CPU.  This is a thermometer that was already
supported.  ACPI provides plenty of functionality that could be used
to support this generically, e.g., see drivers/acpi/thermal.c,
drivers/thermal/int340x_thermal/processor_thermal_device.c, etc.

But maybe there's some real value in the nitty-gritty device-specific
code in amd_nb.c.  If so, I guess you're stuck with updates like this
and negotiating with the distros to do backports and new releases.

Bjorn
Borislav Petkov Nov. 6, 2018, 10 p.m. UTC | #11
On Tue, Nov 06, 2018 at 03:42:56PM -0600, Bjorn Helgaas wrote:
> This isn't some complicated new device where the programming model
> changed on the new CPU.  This is a thermometer that was already
> supported.  ACPI provides plenty of functionality that could be used
> to support this generically, e.g., see drivers/acpi/thermal.c,
> drivers/thermal/int340x_thermal/processor_thermal_device.c, etc.

Ok, you say ACPI but how do you envision practically doing that? I mean,
this is used by old boxes too - ever since K8. So how do we go and add
ACPI functionality to old boxes?

Or do you mean it should simply be converted to do pci_register_driver()
with a struct pci_driver pointer which has all those PCI device IDs in a
table? I'm looking at the last example
drivers/thermal/int340x_thermal/processor_thermal_device.c you gave above.

> But maybe there's some real value in the nitty-gritty device-specific
> code in amd_nb.c.  If so, I guess you're stuck with updates like this
> and negotiating with the distros to do backports and new releases.

Well, even if it is converted to a different registration scheme, you
still need to add new PCI device IDs to the table, no? So *some* sort of
enablement still needs to happen.

And then the argument about needing enablement for distros is moot
because it still needs enablement/backporting - regardless of the
registration scheme.

Or do you mean something else?

I mean, don't get me wrong, I don't mind if it gets converted to
pci_register_driver() if you think it fits better this way with the
drivers registering with PCI devices - I'm just trying to understand the
reasoning for it.

Thx.
Bjorn Helgaas Nov. 6, 2018, 11:20 p.m. UTC | #12
[+cc Sumeet, Srinivas for INT3401 questions below]
[Beginning of thread:
https://lore.kernel.org/linux-pci/20181102181055.130531-1-brian.woods@amd.com/]

On Tue, Nov 06, 2018 at 11:00:59PM +0100, Borislav Petkov wrote:
> On Tue, Nov 06, 2018 at 03:42:56PM -0600, Bjorn Helgaas wrote:
> > This isn't some complicated new device where the programming model
> > changed on the new CPU.  This is a thermometer that was already
> > supported.  ACPI provides plenty of functionality that could be used
> > to support this generically, e.g., see drivers/acpi/thermal.c,
> > drivers/thermal/int340x_thermal/processor_thermal_device.c, etc.
> 
> Ok, you say ACPI but how do you envision practically doing that? I mean,
> this is used by old boxes too - ever since K8. So how do we go and add
> ACPI functionality to old boxes?
> 
> Or do you mean it should simply be converted to do pci_register_driver()
> with a struct pci_driver pointer which has all those PCI device IDs in a
> table? I'm looking at the last example
> drivers/thermal/int340x_thermal/processor_thermal_device.c you gave above.

No, there would be no need to change anything for boxes already in the
field.  But for *new* systems, you could make devices or thermal zones
in the ACPI namespace (they might even already be there for use by
Windows).

drivers/thermal/int340x_thermal/processor_thermal_device.c claims
either INT3401 ACPI devices or listed PCI devices.  It looks like it
tries the platform (INT3401) devices first, and if it finds any, it
ignores any matching PCI devices.  This *could* be so it uses INT3401
devices on new platforms and falls back to the PCI devices otherwise,
although there *are* recent updates that add PCI IDs.

It looks like INT3401 is Intel-specific since it uses a PPCC method
which isn't defined by the ACPI spec.  But AMD could define a similar
PNP ID and have new platforms expose ACPI devices with _TMP methods
that know how to read the sensor on that platform.

Or maybe even drivers/acpi/thermal.c, which claims every Thermal Zone
(ACPI 6.2, sec 11), would be sufficient.  I don't know what the
relationship between hwmon and other thermal stuff, e.g.,
Documentation/thermal/sysfs-api.txt is.  acpi/thermal.c looks tied
into the drivers/thermal stuff (it registers "thermal_zone" devices),
but not to hwmon.

> > But maybe there's some real value in the nitty-gritty device-specific
> > code in amd_nb.c.  If so, I guess you're stuck with updates like this
> > and negotiating with the distros to do backports and new releases.
> 
> Well, even if it is converted to a different registration scheme, you
> still need to add new PCI device IDs to the table, no? So *some* sort of
> enablement still needs to happen.

As long as we have a driver that knows how to claim a known PNP ID,
and every new platform exposes a device compatible with that ID, the
driver should just work.

Bjorn
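
For reference, the dual-enumeration pattern described above boils down to
registering both a platform driver (matched by ACPI ID) and a PCI driver
from one module, with both probe paths converging on the same setup code.
A trimmed-down sketch of that shape, reusing the function names from the
call traces in this thread but with the bodies and real PCI IDs elided:

	static const struct acpi_device_id int3401_device_ids[] = {
		{ "INT3401", 0 },
		{ }
	};

	static struct platform_driver int3401_driver = {
		.probe = int3401_add,			/* ends up in proc_thermal_add() */
		.remove = int3401_remove,
		.driver = {
			.name = "int3401 thermal",
			.acpi_match_table = int3401_device_ids,
		},
	};

	static const struct pci_device_id proc_thermal_pci_ids[] = {
		/* one entry per platform generation: Skylake, Broxton, ... */
		{ }
	};

	static struct pci_driver proc_thermal_pci_driver = {
		.name = "proc_thermal",
		.probe = proc_thermal_pci_probe,	/* also ends up in proc_thermal_add() */
		.remove = proc_thermal_pci_remove,
		.id_table = proc_thermal_pci_ids,
	};

	static int __init proc_thermal_init(void)
	{
		int ret;

		/* register both; the real driver prefers INT3401 when present */
		ret = platform_driver_register(&int3401_driver);
		if (ret)
			return ret;

		ret = pci_register_driver(&proc_thermal_pci_driver);
		if (ret)
			platform_driver_unregister(&int3401_driver);
		return ret;
	}
	module_init(proc_thermal_init);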
Borislav Petkov Nov. 7, 2018, 9:18 a.m. UTC | #13
On Tue, Nov 06, 2018 at 05:20:41PM -0600, Bjorn Helgaas wrote:
> Or maybe even drivers/acpi/thermal.c, which claims every Thermal Zone
> (ACPI 6.2, sec 11), would be sufficient.  I don't know what the
> relationship between hwmon and other thermal stuff, e.g.,
> Documentation/thermal/sysfs-api.txt is.  acpi/thermal.c looks tied
> into the drivers/thermal stuff (it registers "thermal_zone" devices),
> but not to hwmon.

Err, I still don't think I'm catching your drift but let me stop you
right there: amd_nb is not there only for hwmon/k10temp. It is a small
interface glue if you will, which exports the CPU functionality in PCI
config space to other consumers.

So it is not really a driver - it is used by drivers to talk/query CPU
settings through it.

With that said, I don't think I understand all that talk about PNP IDs
and ACPI methods. But maybe I'm missing something...

So what's up?
Bjorn Helgaas Nov. 7, 2018, 1:38 p.m. UTC | #14
On Wed, Nov 07, 2018 at 10:18:38AM +0100, Borislav Petkov wrote:
> On Tue, Nov 06, 2018 at 05:20:41PM -0600, Bjorn Helgaas wrote:
> > Or maybe even drivers/acpi/thermal.c, which claims every Thermal Zone
> > (ACPI 6.2, sec 11), would be sufficient.  I don't know what the
> > relationship between hwmon and other thermal stuff, e.g.,
> > Documentation/thermal/sysfs-api.txt is.  acpi/thermal.c looks tied
> > into the drivers/thermal stuff (it registers "thermal_zone" devices),
> > but not to hwmon.
> 
> Err, I still don't think I'm catching your drift but let me stop you
> right there: amd_nb is not there only for hwmon/k10temp. It is a small
> interface glue if you will, which exports the CPU functionality in PCI
> config space to other consumers.
> 
> So it is not really a driver - it is used by drivers to talk/query CPU
> settings through it.
> 
> With that said, I don't think I understand all that talk about PNP IDs
> and ACPI methods. But maybe I'm missing something...

Firmware supplies ACPI namespace.  The namespace contains an abstract
description of the platform, including devices.  Devices are
identified by PNP IDs, which are analogous to PCI vendor/device IDs,
except that a device may have several generic "compatible device IDs"
in addition to an ID unique to the device.  Devices may also contain
methods (supplied by firmware as part of the namespace), which are
essentially bytecode that can be executed by the ACPI interpreter in
the kernel.  Linux drivers claim ACPI devices based on PNP ID and
operate them using either ACPI methods (which can decouple the driver
from device specifics) or the usual direct MMIO/IO port/MSR style.

Here's an outline of how it *could* work:

  - AMD defines "AMD0001" device ID for the CPU temp sensor
  - BIOS supplies AMD0001 devices in the ACPI namespace
  - Each AMD0001 device has a _TMP method (supplied by BIOS and
    specific to the CPU)
  - Linux driver claims AMD0001 devices
  - Driver reads temp sensors by executing _TMP methods (Linux ACPI
    interpreter runs the bytecode)

That way when you release a new platform with different temp sensors,
you update the BIOS AMD0001 devices and _TMP methods to know about
them, and the old Linux driver works unchanged.
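
A hypothetical sketch of the last two bullets; everything below, the
"AMD0001" ID included, is invented for illustration, though the ACPI
calls themselves are the standard kernel interfaces:

	#include <linux/acpi.h>
	#include <linux/module.h>

	static const struct acpi_device_id amd_temp_ids[] = {
		{ "AMD0001", 0 },	/* hypothetical PNP ID from the outline */
		{ }
	};
	MODULE_DEVICE_TABLE(acpi, amd_temp_ids);

	static int amd_temp_add(struct acpi_device *adev)
	{
		unsigned long long temp;
		acpi_status status;

		/*
		 * Run the firmware-supplied _TMP bytecode: the platform,
		 * not the driver, knows how to reach the sensor.
		 */
		status = acpi_evaluate_integer(adev->handle, "_TMP", NULL, &temp);
		if (ACPI_FAILURE(status))
			return -ENODEV;

		/* _TMP reports tenths of a degree Kelvin */
		dev_info(&adev->dev, "temp = %llu.%llu K\n", temp / 10, temp % 10);
		return 0;
	}

	static struct acpi_driver amd_temp_driver = {
		.name = "amd_temp",
		.ids = amd_temp_ids,
		.ops = { .add = amd_temp_add },
	};
	module_acpi_driver(amd_temp_driver);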

Bjorn
Guenter Roeck Nov. 7, 2018, 1:51 p.m. UTC | #15
On 11/7/18 1:18 AM, Borislav Petkov wrote:
> On Tue, Nov 06, 2018 at 05:20:41PM -0600, Bjorn Helgaas wrote:
>> Or maybe even drivers/acpi/thermal.c, which claims every Thermal Zone
>> (ACPI 6.2, sec 11), would be sufficient.  I don't know what the
>> relationship between hwmon and other thermal stuff, e.g.,
>> Documentation/thermal/sysfs-api.txt is.  acpi/thermal.c looks tied
>> into the drivers/thermal stuff (it registers "thermal_zone" devices),
>> but not to hwmon.
> 
> Err, I still don't think I'm catching your drift but let me stop you
> right there: amd_nb is not there only for hwmon/k10temp. It is a small
> interface glue if you will, which exports the CPU functionality in PCI
> config space to other consumers.
> 

Also, thermal and hwmon are orthogonal, just like hwmon and iio. One would
typically have a driver in one subsystem, in some cases bridging to the
other subsystem, but one would not have drivers in both subsystems.
I think Bjorn is suggesting that the k10temp driver should move to the
thermal subsystem, though I don't really understand what that has to do
with finding the correct PCI device(s) to query. Or maybe I misunderstand.

Guenter

> So it is not really a driver - it is used by drivers to talk/query CPU
> settings through it.
> 
> With that said, I don't think I understand all that talk about PNP IDs
> and ACPI methods. But maybe I'm missing something...
> 
> So what's up?
>
Borislav Petkov Nov. 7, 2018, 4:07 p.m. UTC | #16
On Wed, Nov 07, 2018 at 07:38:56AM -0600, Bjorn Helgaas wrote:
> Firmware supplies ACPI namespace.  The namespace contains an abstract
> description of the platform, including devices.  Devices are
> identified by PNP IDs, which are analogous to PCI vendor/device IDs,
> except that a device may have several generic "compatible device IDs"
> in addition to an ID unique to the device.  Devices may also contain
> methods (supplied by firmware as part of the namespace), which are
> essentially bytecode that can be executed by the ACPI interpreter in
> the kernel.  Linux drivers claim ACPI devices based on PNP ID and
> operate them using either ACPI methods (which can decouple the driver
> from device specifics) or the usual direct MMIO/IO port/MSR style.
> 
> Here's an outline of how it *could* work:
> 
>   - AMD defines "AMD0001" device ID for the CPU temp sensor
>   - BIOS supplies AMD0001 devices in the ACPI namespace
>   - Each AMD0001 device has a _TMP method (supplied by BIOS and
>     specific to the CPU)
>   - Linux driver claims AMD0001 devices
>   - Driver reads temp sensors by executing _TMP methods (Linux ACPI
>     interpreter runs the bytecode)

Thanks for explaining.

> That way when you release a new platform with different temp sensors,
> you update the BIOS AMD0001 devices and _TMP methods to know about
> them, and the old Linux driver works unchanged.

So I don't know about temp sensors - I'm talking about amd_nb which is
something... well, I explained already what it is in my previous mail so
I won't repeat myself.

Anyway, if there is such a PNP ID device - and I believe I have stumbled
upon some blurb about it in the BKDGs - which says "this device
represents the PCI device IDs of a CPU" and if that can be used to
register amd_nb through it, then sure, I don't see why not.

This way, when new CPU comes out and the same PNP ID device is present,
amd_nb would load, sure.

Maybe Brian knows more on the subject...
Bjorn Helgaas Nov. 7, 2018, 5:10 p.m. UTC | #17
On Wed, Nov 07, 2018 at 05:07:07PM +0100, Borislav Petkov wrote:
> On Wed, Nov 07, 2018 at 07:38:56AM -0600, Bjorn Helgaas wrote:
> > Firmware supplies ACPI namespace.  The namespace contains an abstract
> > description of the platform, including devices.  Devices are
> > identified by PNP IDs, which are analogous to PCI vendor/device IDs,
> > except that a device may have several generic "compatible device IDs"
> > in addition to an ID unique to the device.  Devices may also contain
> > methods (supplied by firmware as part of the namespace), which are
> > essentially bytecode that can be executed by the ACPI interpreter in
> > the kernel.  Linux drivers claim ACPI devices based on PNP ID and
> > operate them using either ACPI methods (which can decouple the driver
> > from device specifics) or the usual direct MMIO/IO port/MSR style.
> > 
> > Here's an outline of how it *could* work:
> > 
> >   - AMD defines "AMD0001" device ID for the CPU temp sensor
> >   - BIOS supplies AMD0001 devices in the ACPI namespace
> >   - Each AMD0001 device has a _TMP method (supplied by BIOS and
> >     specific to the CPU)
> >   - Linux driver claims AMD0001 devices
> >   - Driver reads temp sensors by executing _TMP methods (Linux ACPI
> >     interpreter runs the bytecode)
> 
> Thanks for explaining.
> 
> > That way when you release a new platform with different temp sensors,
> > you update the BIOS AMD0001 devices and _TMP methods to know about
> > them, and the old Linux driver works unchanged.
> 
> So I don't know about temp sensors - I'm talking about amd_nb which is
> something... well, I explained already what it is in my previous mail so
> I won't repeat myself.
> 
> Anyway, if there is such a PNP ID device - and I believe I have stumbled
> upon some blurb about it in the BKDGs - which says "this device
> represents the PCI device IDs of a CPU" and if that can be used to
> register amd_nb through it, then sure, I don't see why not.
> 
> This way, when new CPU comes out and the same PNP ID device is present,
> amd_nb would load, sure.

No, the idea was more that temp monitoring, e.g., k10temp, could
be independent of amd_nb.

But I can tell this idea isn't going anywhere, so let's just forget
that I stuck my neck out and let it die on the vine :)

Bjorn
Bjorn Helgaas Nov. 7, 2018, 5:16 p.m. UTC | #18
On Wed, Nov 07, 2018 at 05:51:22AM -0800, Guenter Roeck wrote:
> On 11/7/18 1:18 AM, Borislav Petkov wrote:
> > On Tue, Nov 06, 2018 at 05:20:41PM -0600, Bjorn Helgaas wrote:
> > > Or maybe even drivers/acpi/thermal.c, which claims every Thermal Zone
> > > (ACPI 6.2, sec 11), would be sufficient.  I don't know what the
> > > relationship between hwmon and other thermal stuff, e.g.,
> > > Documentation/thermal/sysfs-api.txt is.  acpi/thermal.c looks tied
> > > into the drivers/thermal stuff (it registers "thermal_zone" devices),
> > > but not to hwmon.
> > 
> > Err, I still don't think I'm catching your drift but let me stop you
> > right there: amd_nb is not there only for hwmon/k10temp. It is a small
> > interface glue if you will, which exports the CPU functionality in PCI
> > config space to other consumers.
> 
> Also, thermal and hwmon are orthogonal, just like hwmon and iio. One would
> typically have a driver in one subsystem, in some cases bridging to the
> other subsystem, but one would not have drivers in both subsystems.
> I think Bjorn is suggesting that the k10temp driver should move to the
> thermal subsystem, though I don't really understand what that has to do
> with finding the correct PCI device(s) to query. Or maybe I misunderstand.

Not really; I'm suggesting that it's possible to make k10temp work in
a way that requires less knowledge about the AMD topology and hence
fewer changes for new platforms.

Today, k10temp needs CPU/PCI topology info from amd_nb to read the
sensors via PCI registers.  k10temp could conceivably read the sensors
via ACPI methods, which means that topology info would be in the
firmware and k10temp wouldn't depend on amd_nb.

Bjorn
Borislav Petkov Nov. 7, 2018, 5:17 p.m. UTC | #19
On Wed, Nov 07, 2018 at 11:10:44AM -0600, Bjorn Helgaas wrote:
> No, the idea was more that that temp monitoring, e.g., k10temp, could
> be independent of amd_nb.
> 
> But I can tell this idea isn't going anywhere, so let's just forget
> that I stuck my neck out and let it die on the vine :)

No no, your idea makes sense. I see the advantage of not having to do
enablement so that old k10temp can load on new machines. Sure.

We just need to find out about the PNP ID...

:-)
srinivas pandruvada Nov. 7, 2018, 7:15 p.m. UTC | #20
On Tue, 2018-11-06 at 17:20 -0600, Bjorn Helgaas wrote:
> [+cc Sumeet, Srinivas for INT3401 questions below]
> [Beginning of thread:
> https://lore.kernel.org/linux-pci/20181102181055.130531-1-brian.woods@amd.com/]
> 
> On Tue, Nov 06, 2018 at 11:00:59PM +0100, Borislav Petkov wrote:
> > On Tue, Nov 06, 2018 at 03:42:56PM -0600, Bjorn Helgaas wrote:
> > > This isn't some complicated new device where the programming model
> > > changed on the new CPU.  This is a thermometer that was already
> > > supported.  ACPI provides plenty of functionality that could be used
> > > to support this generically, e.g., see drivers/acpi/thermal.c,
> > > drivers/thermal/int340x_thermal/processor_thermal_device.c, etc.
> > 
> > Ok, you say ACPI but how do you envision practically doing that? I mean,
> > this is used by old boxes too - ever since K8. So how do we go and add
> > ACPI functionality to old boxes?
> > 
> > Or do you mean it should simply be converted to do pci_register_driver()
> > with a struct pci_driver pointer which has all those PCI device IDs in a
> > table? I'm looking at the last example
> > drivers/thermal/int340x_thermal/processor_thermal_device.c you gave
> > above.
> 
> No, there would be no need to change anything for boxes already in the
> field.  But for *new* systems, you could make devices or thermal zones
> in the ACPI namespace (they might even already be there for use by
> Windows).
> 
> drivers/thermal/int340x_thermal/processor_thermal_device.c claims
> either INT3401 ACPI devices or listed PCI devices. 

To enumerate a driver to get processor temperature and get power
properties, we have two methods:
- The older Atom processors Valleyview and Baytrail had no PCI device
for the processor thermal management. There was an INT3401 ACPI device
to handle this.

- The newer processors for Core and Atom have a dedicated PCI device
and there is no INT3401 ACPI device anymore.
Since OEM systems will have different power properties and thermal
trips, there is a companion ACPI device, which provides PPCC and
thermal trips and, optionally, output from a temperature sensor other
than the default one on the processor cores.

Thanks,
Srinivas



>  It looks like it
> tries the platform (INT3401) devices first, and if it finds any, it
> ignores any matching PCI devices.  This *could* be so it uses INT3401
> devices on new platforms and falls back to the PCI devices otherwise,
> although there *are* recent updates that add PCI IDs.
> 
> It looks like INT3401 is Intel-specific since it uses a PPCC method
> which isn't defined by the ACPI spec.  But AMD could define a similar
> PNP ID and have new platforms expose ACPI devices with _TMP methods
> that know how to read the sensor on that platform.
> 
> Or maybe even drivers/acpi/thermal.c, which claims every Thermal Zone
> (ACPI 6.2, sec 11), would be sufficient.  I don't know what the
> relationship between hwmon and other thermal stuff, e.g.,
> Documentation/thermal/sysfs-api.txt is.  acpi/thermal.c looks tied
> into the drivers/thermal stuff (it registers "thermal_zone" devices),
> but not to hwmon.
> 
> > > But maybe there's some real value in the nitty-gritty device-
> > > specific
> > > code in amd_nb.c.  If so, I guess you're stuck with updates like
> > > this
> > > and negotiating with the distros to do backports and new
> > > releases.
> > 
> > Well, even if it is converted to a different registration scheme,
> > you
> > still need to add new PCI device IDs to the table, no? So *some*
> > sort of
> > enablement still needs to happen.
> 
> As long as we have a driver that knows how to claim a known PNP ID,
> and every new platform exposes a device compatible with that ID, the
> driver should just work.
> 
> Bjorn
Woods, Brian Nov. 7, 2018, 7:50 p.m. UTC | #21
On Wed, Nov 07, 2018 at 05:07:07PM +0100, Borislav Petkov wrote:
> Thanks for explaining.
> 
> So I don't know about temp sensors - I'm talking about amd_nb which is
> something... well, I explained already what it is in my previous mail so
> I won't repeat myself.
> 
> Anyway, if there is such a PNP ID device - and I believe I have stumbled
> upon some blurb about it in the BKDGs - which says "this device
> represents the PCI device IDs of a CPU" and if that can be used to
> register amd_nb through it, then sure, I don't see why not.
> 
> This way, when new CPU comes out and the same PNP ID device is present,
> amd_nb would load, sure.
> 
> Maybe Brian knows more on the subject...

More on exactly what?

If it's the PNP ID device, can you let me know where in the BKDG you
found it before, so I can look for keywords etc.?  Because I'd need to
find those for K8 onwards.
Bjorn Helgaas Nov. 7, 2018, 9:31 p.m. UTC | #22
On Wed, Nov 07, 2018 at 11:15:37AM -0800, Srinivas Pandruvada wrote:
> On Tue, 2018-11-06 at 17:20 -0600, Bjorn Helgaas wrote:
> > [+cc Sumeet, Srinivas for INT3401 questions below]
> > [Beginning of thread:
> > https://lore.kernel.org/linux-pci/20181102181055.130531-1-brian.woods@amd.com/]
> > 
> > On Tue, Nov 06, 2018 at 11:00:59PM +0100, Borislav Petkov wrote:
> > > On Tue, Nov 06, 2018 at 03:42:56PM -0600, Bjorn Helgaas wrote:
> > > > This isn't some complicated new device where the programming
> > > > model changed on the new CPU.  This is a thermometer that was
> > > > already supported.  ACPI provides plenty of functionality that
> > > > could be used to support this generically, e.g., see
> > > > drivers/acpi/thermal.c,
> > > > drivers/thermal/int340x_thermal/processor_thermal_device.c,
> > > > etc.
> > > 
> > > Ok, you say ACPI but how do you envision practically doing that?
> > > I mean, this is used by old boxes too - ever since K8. So how do
> > > we go and add ACPI functionality to old boxes?
> > > 
> > > Or do you mean it should simply be converted to do
> > > pci_register_driver() with a struct pci_driver pointer which has
> > > all those PCI device IDs in a table? I'm looking at the last
> > > example
> > > drivers/thermal/int340x_thermal/processor_thermal_device.c you
> > > gave above.
> > 
> > No, there would be no need to change anything for boxes already in
> > the field.  But for *new* systems, you could make devices or
> > thermal zones in the ACPI namespace (they might even already be
> > there for use by Windows).
> > 
> > drivers/thermal/int340x_thermal/processor_thermal_device.c claims
> > either INT3401 ACPI devices or listed PCI devices. 
> 
> To enumerate a driver to get processor temperature and get power
> properties, we have two methods:
> > - The older Atom processors Valleyview and Baytrail had no PCI device
> > for the processor thermal management. There was an INT3401 ACPI device
> > to handle this.
> > 
> > - The newer processors for Core and Atom have a dedicated PCI device
> > and there is no INT3401 ACPI device anymore.

This is really interesting because it's completely the reverse of what
I would have expected.

You use INT3401 on the *older* processors, where int3401_add() is
called for any platform devices with INT3401 PNP ID:

  int3401_add(plat_dev)                   # platform/ACPI .probe
    proc_thermal_add(dev)
      adev = ACPI_COMPANION(dev)
      int340x_thermal_zone_add(adev)
	thermal_zone_device_register()

The sensors are read in this path, where thermal_zone_get_temp() is
the generic thermal entry point:

  thermal_zone_get_temp()
    tz->ops->get_temp()
      int340x_thermal_get_zone_temp()    # ops.get_temp
	acpi_evaluate_integer(..., "_TMP", ...)

The above works for any platform that supplies the INT3401 device
because the _TMP method that actually reads the sensor is supplied by
the platform firmware.

On *newer* processors, you apparently use this path:

  proc_thermal_pci_probe(pci_dev)        # PCI .probe
    pci_enable_device(pci_dev)
    proc_thermal_add(dev)
      adev = ACPI_COMPANION(dev)
      int340x_thermal_zone_add(adev)
	thermal_zone_device_register()

Except for enabling the PCI device and a BSW_THERMAL hack, this is
exactly the *SAME*: you add a thermal zone for the ACPI device and
read the sensor using ACPI _TMP methods.

But now you have to add new PCI IDs (Skylake, Broxton, CannonLake,
CoffeeLake, GeminiLake, etc) for every new platform.  This seems like
a regression, not an improvement.  What am I missing?

> Since OEM systems will have different power properties and thermal
> trips, there is a companion ACPI device, which provides PPCC and
> thermal trips and optionally output from another temperature sensor
> from the default on processor cores.

Bjorn
srinivas pandruvada Nov. 7, 2018, 10:42 p.m. UTC | #23
On Wed, 2018-11-07 at 15:31 -0600, Bjorn Helgaas wrote:
> On Wed, Nov 07, 2018 at 11:15:37AM -0800, Srinivas Pandruvada wrote:
> > On Tue, 2018-11-06 at 17:20 -0600, Bjorn Helgaas wrote:
> > > [+cc Sumeet, Srinivas for INT3401 questions below]
> > > [Beginning of thread:
> > > https://lore.kernel.org/linux-pci/20181102181055.130531-1-brian.woods@amd.com/]
> > > 
> > > On Tue, Nov 06, 2018 at 11:00:59PM +0100, Borislav Petkov wrote:
> > > > On Tue, Nov 06, 2018 at 03:42:56PM -0600, Bjorn Helgaas wrote:
> > > > > This isn't some complicated new device where the programming
> > > > > model changed on the new CPU.  This is a thermometer that was
> > > > > already supported.  ACPI provides plenty of functionality that
> > > > > could be used to support this generically, e.g., see
> > > > > drivers/acpi/thermal.c,
> > > > > drivers/thermal/int340x_thermal/processor_thermal_device.c, etc.
> > > > 
> > > > Ok, you say ACPI but how do you envision practically doing that?
> > > > I mean, this is used by old boxes too - ever since K8. So how do
> > > > we go and add ACPI functionality to old boxes?
> > > > 
> > > > Or do you mean it should simply be converted to do
> > > > pci_register_driver() with a struct pci_driver pointer which has
> > > > all those PCI device IDs in a table? I'm looking at the last
> > > > example drivers/thermal/int340x_thermal/processor_thermal_device.c
> > > > you gave above.
> > > 
> > > No, there would be no need to change anything for boxes already in
> > > the field.  But for *new* systems, you could make devices or
> > > thermal zones in the ACPI namespace (they might even already be
> > > there for use by Windows).
> > > 
> > > drivers/thermal/int340x_thermal/processor_thermal_device.c claims
> > > either INT3401 ACPI devices or listed PCI devices.
> > 
> > To enumerate a driver to get processor temperature and get power
> > properties, we have two methods:
> > - The older Atom processors Valleyview and Baytrail had no PCI device
> > for the processor thermal management. There was an INT3401 ACPI
> > device to handle this.
> > 
> > - The newer processors for Core and Atom have a dedicated PCI device
> > and there is no INT3401 ACPI device anymore.
> 
> This is really interesting because it's completely the reverse of what
> I would have expected.
> 
> You use INT3401 on the *older* processors, where int3401_add() is
> called for any platform devices with INT3401 PNP ID:
> 
>   int3401_add(plat_dev)                   # platform/ACPI .probe
>     proc_thermal_add(dev)
>       adev = ACPI_COMPANION(dev)
>       int340x_thermal_zone_add(adev)
> 	thermal_zone_device_register()
> 
> The sensors are read in this path, where thermal_zone_get_temp() is
> the generic thermal entry point:
> 
>   thermal_zone_get_temp()
>     tz->ops->get_temp()
>       int340x_thermal_get_zone_temp()    # ops.get_temp
> 	acpi_evaluate_integer(..., "_TMP", ...)
> 
> The above works for any platform that supplies the INT3401 device
> because the _TMP method that actually reads the sensor is supplied by
> the platform firmware.
> 
> On *newer* processors, you apparently use this path:
> 
>   proc_thermal_pci_probe(pci_dev)        # PCI .probe
>     pci_enable_device(pci_dev)
>     proc_thermal_add(dev)
>       adev = ACPI_COMPANION(dev)
>       int340x_thermal_zone_add(adev)
> 	thermal_zone_device_register()
> 
> Except for enabling the PCI device and a BSW_THERMAL hack, this is
> exactly the *SAME*: you add a thermal zone for the ACPI device and
> read the sensor using ACPI _TMP methods.
> 
> But now you have to add new PCI IDs (Skylake, Broxton, CannonLake,
> CoffeeLake, GeminiLake, etc) for every new platform.  This seems like
> a regression, not an improvement.  What am I missing?


The path of activation via both devices is the same. Both will call
proc_thermal_add().

There is no INT3401 on any newer Atom or Core platforms, so you can't
enumerate on this device. We don't control what ACPI device is present
on a system. It depends on what the other non-Linux OS is using.
Also, the PCI device's ACPI companion doesn't have to supply a _TMP
method.


The INT3401 is a special device which must have _TMP, otherwise firmware
validation will fail. Yes, if there is INT3401 on all platforms we
don't need PCI enumeration just for temperature and trips. But this PCI
device brings in lots of other features which are still in the works.


Not sure about the context of the discussion here. Are you suggesting
some core changes where we don't have to add PCI IDs like Skylake etc.?


Thanks,
Srinivas

> 


> > Since OEM systems will have different power properties and thermal
> > trips, there is a companion ACPI device, which provides PPCC and
> > thermal trips and optionally output from another temperature sensor
> > from the default on processor cores.
> 
> Bjorn
Bjorn Helgaas Nov. 7, 2018, 11:14 p.m. UTC | #24
On Wed, Nov 07, 2018 at 02:42:00PM -0800, Srinivas Pandruvada wrote:
> On Wed, 2018-11-07 at 15:31 -0600, Bjorn Helgaas wrote:
> > On Wed, Nov 07, 2018 at 11:15:37AM -0800, Srinivas Pandruvada wrote:
> > > On Tue, 2018-11-06 at 17:20 -0600, Bjorn Helgaas wrote:
> > > > [+cc Sumeet, Srinivas for INT3401 questions below]
> > > > [Beginning of thread:
> > > > https://lore.kernel.org/linux-pci/20181102181055.130531-1-brian.woods@amd.com/]
> > > > 
> > > > On Tue, Nov 06, 2018 at 11:00:59PM +0100, Borislav Petkov wrote:
> > > > > On Tue, Nov 06, 2018 at 03:42:56PM -0600, Bjorn Helgaas wrote:
> > > > > > This isn't some complicated new device where the programming
> > > > > > model changed on the new CPU.  This is a thermometer that was
> > > > > > already supported.  ACPI provides plenty of functionality that
> > > > > > could be used to support this generically, e.g., see
> > > > > > drivers/acpi/thermal.c,
> > > > > > drivers/thermal/int340x_thermal/processor_thermal_device.c, etc.
> > > > > 
> > > > > Ok, you say ACPI but how do you envision practically doing that?
> > > > > I mean, this is used by old boxes too - ever since K8. So how do
> > > > > we go and add ACPI functionality to old boxes?
> > > > > 
> > > > > Or do you mean it should simply be converted to do
> > > > > pci_register_driver() with a struct pci_driver pointer which has
> > > > > all those PCI device IDs in a table? I'm looking at the last
> > > > > example drivers/thermal/int340x_thermal/processor_thermal_device.c
> > > > > you gave above.
> > > > 
> > > > No, there would be no need to change anything for boxes already in
> > > > the field.  But for *new* systems, you could make devices or
> > > > thermal zones in the ACPI namespace (they might even already be
> > > > there for use by Windows).
> > > > 
> > > > drivers/thermal/int340x_thermal/processor_thermal_device.c claims
> > > > either INT3401 ACPI devices or listed PCI devices.
> > > 
> > > To enumerate a driver to get processor temperature and get power
> > > properties, we have two methods:
> > > - The older Atom processors Valleyview and Baytrail had no PCI device
> > > for the processor thermal management. There was an INT3401 ACPI
> > > device to handle this.
> > > 
> > > - The newer processors for Core and Atom have a dedicated PCI device
> > > and there is no INT3401 ACPI device anymore.
> > 
> > This is really interesting because it's completely the reverse of what
> > I would have expected.
> > 
> > You use INT3401 on the *older* processors, where int3401_add() is
> > called for any platform devices with INT3401 PNP ID:
> > 
> >   int3401_add(plat_dev)                   # platform/ACPI .probe
> >     proc_thermal_add(dev)
> >       adev = ACPI_COMPANION(dev)
> >       int340x_thermal_zone_add(adev)
> > 	thermal_zone_device_register()
> > 
> > The sensors are read in this path, where thermal_zone_get_temp() is
> > the generic thermal entry point:
> > 
> >   thermal_zone_get_temp()
> >     tz->ops->get_temp()
> >       int340x_thermal_get_zone_temp()    # ops.get_temp
> > 	acpi_evaluate_integer(..., "_TMP", ...)
> > 
> > The above works for any platform that supplies the INT3401 device
> > because the _TMP method that actually reads the sensor is supplied by
> > the platform firmware.
> > 
> > On *newer* processors, you apparently use this path:
> > 
> >   proc_thermal_pci_probe(pci_dev)        # PCI .probe
> >     pci_enable_device(pci_dev)
> >     proc_thermal_add(dev)
> >       adev = ACPI_COMPANION(dev)
> >       int340x_thermal_zone_add(adev)
> > 	thermal_zone_device_register()
> > 
> > Except for enabling the PCI device and a BSW_THERMAL hack, this is
> > exactly the *SAME*: you add a thermal zone for the ACPI device and
> > read the sensor using ACPI _TMP methods.
> > 
> > But now you have to add new PCI IDs (Skylake, Broxton, CannonLake,
> > CoffeeLake, GeminiLake, etc) for every new platform.  This seems like
> > a regression, not an improvement.  What am I missing?
> 
> The path of activation via both devices is the same. Both will call
> proc_thermal_add().
> 
> There is no INT3401 on any newer Atom or Core platforms, so you can't
> enumerate on this device. We don't control what ACPI device is present
> on a system. It depends on what the other non-Linux OS is using.

Sure, you can't *force* OEMs to supply a given ACPI device, but you
can certainly say "if you want this functionality, supply INT3401
devices."  That's what you do with PNP0A03 (PCI host bridges), for
example.  If an OEM doesn't supply PNP0A03 devices, the system can
boot just fine as long as you don't need PCI.

This model of using the PCI IDs forces OS vendors to release updates
for every new platform.  I guess you must have considered that and
decided whatever benefit you're getting was worth the cost.
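
The pattern in question looks roughly like this; a sketch only, with
the device IDs and comments here being illustrative rather than the
driver's actual table:

#include <linux/module.h>
#include <linux/pci.h>

static int example_thermal_probe(struct pci_dev *pdev,
				 const struct pci_device_id *id)
{
	int ret = pci_enable_device(pdev);

	/* on success, register the thermal zone via the ACPI companion */
	return ret;
}

static void example_thermal_remove(struct pci_dev *pdev)
{
	pci_disable_device(pdev);
}

/* Every new generation means another exact-match entry here */
static const struct pci_device_id example_thermal_ids[] = {
	{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x1903) },	/* Skylake */
	{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x5a03) },	/* CannonLake */
	/* ...one more line for every new platform... */
	{ }
};
MODULE_DEVICE_TABLE(pci, example_thermal_ids);

static struct pci_driver example_thermal_driver = {
	.name		= "example_proc_thermal",
	.id_table	= example_thermal_ids,
	.probe		= example_thermal_probe,
	.remove		= example_thermal_remove,
};
module_pci_driver(example_thermal_driver);
MODULE_LICENSE("GPL");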

> Also, the PCI device's ACPI companion doesn't have to supply a _TMP
> method.

> The INT3401 is a special device which must have _TMP, otherwise
> firmware validation will fail. Yes, if there were an INT3401 on all
> platforms we wouldn't need PCI enumeration just for temperature and
> trips. But this PCI device brings in lots of other features which are
> still in the works.

You can add new features in ACPI devices just as well as with PCI
enumeration.

> Not sure about the context of the discussion here. Are you suggesting
> some core changes so that we don't have to add PCI IDs for Skylake etc.?

This started because I suggested that AMD was making work for
themselves by exposing more topology detail than necessary, which
means they have to change amd_nb.c and add PCI IDs to k10temp.c to
accommodate new platforms.

And it looks like Intel is doing the same thing by moving from a model
where some platform specifics can be encapsulated in ACPI methods so a
new platform doesn't force an OS release, to a model where new
platforms require OS driver updates.

Obviously I don't know all the new features you're planning, so I'll
stop pushing on this and wasting your time.

Bjorn
srinivas pandruvada Nov. 7, 2018, 11:30 p.m. UTC | #25
[...]

> Sure, you can't *force* OEMs to supply a given ACPI device, but you
> can certainly say "if you want this functionality, supply INT3401
> devices."  That's what you do with PNP0A03 (PCI host bridges), for
> example.  If an OEM doesn't supply PNP0A03 devices, the system can
> boot just fine as long as you don't need PCI.
> 
> This model of using the PCI IDs forces OS vendors to release updates
> for every new platform.  I guess you must have considered that and
> decided whatever benefit you're getting was worth the cost.

Not worth the cost. This is a pain. Every release we end up adding a
single-line change to many drivers just to add a PCI device ID.
Since there is no unique class/class_mask for these PCI devices, we
need to add a device_id for each generation even if nothing else has
changed.
Instead, if we had some way to say "don't enumerate PCI device IDs
below X" plus a blacklist, it would save a lot of work.
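
No such facility exists in the PCI core today, but as a hypothetical
sketch the idea would boil down to a probe-time gate like this (the
floor value and blocked IDs below are invented):

#include <linux/kernel.h>
#include <linux/pci.h>

/* Hypothetical: every device ID at or above this is assumed compatible */
#define EXAMPLE_DID_FLOOR	0x1903

/* Hypothetical blacklist for the odd incompatible newer part */
static const u16 example_did_blacklist[] = { 0x1913 };

static bool example_did_supported(u16 did)
{
	int i;

	if (did < EXAMPLE_DID_FLOOR)
		return false;
	for (i = 0; i < ARRAY_SIZE(example_did_blacklist); i++)
		if (did == example_did_blacklist[i])
			return false;
	return true;
}

static int example_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	/* matched via a broad id_table entry; filter here instead */
	if (!example_did_supported(pdev->device))
		return -ENODEV;
	return pci_enable_device(pdev);
}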

Thanks,
Srinivas
srinivas pandruvada Nov. 7, 2018, 11:44 p.m. UTC | #26
On Wed, 2018-11-07 at 15:30 -0800, Srinivas Pandruvada wrote:
> [...]
> 
> > Sure, you can't *force* OEMs to supply a given ACPI device, but you
> > can certainly say "if you want this functionality, supply INT3401
> > devices."  That's what you do with PNP0A03 (PCI host bridges), for
> > example.  If an OEM doesn't supply PNP0A03 devices, the system can
> > boot just fine as long as you don't need PCI.
> > 
> > This model of using the PCI IDs forces OS vendors to release
> > updates
> > for every new platform.  I guess you must have considered that and
> > decided whatever benefit you're getting was worth the cost.
> 
> Not worth the cost. This is a pain. Every release we end up adding a
> single-line change to many drivers just to add a PCI device ID.
> Since there is no unique class/class_mask for these PCI devices, we
> need to add a device_id for each generation even if nothing else has
> changed.
> Instead, if we had some way to say "don't enumerate PCI device IDs
> below X" plus a blacklist, it would save a lot of work.
This would still need some work on our internal PCI device ID
allocation scheme, so that we could reserve a block of IDs for a PCI
device with the same functionality from generation to generation.
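
If such a reserved block existed, matching could collapse to one mask
check instead of a per-generation entry. A hypothetical sketch (base
and mask invented); since struct pci_device_id only matches exact
device IDs or PCI_ANY_ID, the check would have to live in probe:

#include <linux/types.h>

/*
 * Hypothetical reserved block: IDs 0x9D00-0x9D0F are all the same
 * processor thermal function, across generations.
 */
#define EXAMPLE_DID_BASE	0x9D00
#define EXAMPLE_DID_MASK	0xFFF0

static bool example_in_reserved_block(u16 did)
{
	return (did & EXAMPLE_DID_MASK) == EXAMPLE_DID_BASE;
}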

Thanks,
Srinivas
Guenter Roeck Nov. 8, 2018, 1:40 a.m. UTC | #27
On 11/7/18 3:14 PM, Bjorn Helgaas wrote:
>
>>
>> There is no INT3401 on any newer atom or core platforms, so you can't
>> enumerate on this device. We don't control what ACPI device is present
>> on a system. It depends on what the other non-Linux OS is using.
> 
> Sure, you can't *force* OEMs to supply a given ACPI device, but you
> can certainly say "if you want this functionality, supply INT3401
> devices."  That's what you do with PNP0A03 (PCI host bridges), for
> example.  If an OEM doesn't supply PNP0A03 devices, the system can
> boot just fine as long as you don't need PCI.
> 
> This model of using the PCI IDs forces OS vendors to release updates
> for every new platform.  I guess you must have considered that and
> decided whatever benefit you're getting was worth the cost.
> 

I really dislike where this is going. Board vendors - and that included
Intel when Intel was still selling boards - have a long history of only
making mandatory methods available in ACPI. Pretty much all of them don't
make hardware monitoring information available via ACPI. This is a pain
especially for laptops where the information is provided by an embedded
controller. On systems with Super-IO chips with dedicated hardware
monitoring functionality, they often go as far as signing mutual NDAs
with chip vendors, which lets both the board and the chip vendor claim
that they cannot provide chip specifications to third parties, aka
users.

You are pretty much extending that to CPU temperature monitoring. The
fallout, if this approach is adopted, will be that it will effectively
no longer be possible to monitor the temperature on chips supporting
this "feature".

I do not think that would be a good idea.

Guenter
Bjorn Helgaas Nov. 8, 2018, 1:59 p.m. UTC | #28
On Wed, Nov 07, 2018 at 05:40:14PM -0800, Guenter Roeck wrote:
> On 11/7/18 3:14 PM, Bjorn Helgaas wrote:
> > 
> > > 
> > > There is no INT3401 on any newer atom or core platforms, so you can't
> > > enumerate on this device. We don't control what ACPI device is present
> > > on a system. It depends on what the other non-Linux OS is using.
> > 
> > Sure, you can't *force* OEMs to supply a given ACPI device, but you
> > can certainly say "if you want this functionality, supply INT3401
> > devices."  That's what you do with PNP0A03 (PCI host bridges), for
> > example.  If an OEM doesn't supply PNP0A03 devices, the system can
> > boot just fine as long as you don't need PCI.
> > 
> > This model of using the PCI IDs forces OS vendors to release updates
> > for every new platform.  I guess you must have considered that and
> > decided whatever benefit you're getting was worth the cost.
> > 
> 
> I really dislike where this is going. Board vendors - and that included
> Intel when Intel was still selling boards - have a long history of only
> making mandatory methods available in ACPI. Pretty much all of them don't
> make hardware monitoring information available via ACPI. This is a pain
> especially for laptops where the information is provided by an embedded
> controller. On systems with Super-IO chips with dedicated hardware
> monitoring functionality, they often go as far as signing mutual NDAs
> with chip vendors, which lets both the board and the chip vendor claim
> that they cannot provide chip specifications to third parties, aka
> users.
> 
> You are pretty much extending that to CPU temperature monitoring. The
> fallout, if this approach is adopted, will be that it will effectively
> no longer be possible to monitor the temperature on chips supporting
> this "feature".
> 
> I do not think that would be a good idea.

I wasn't aware of these political implications.  Thanks for raising
them.

I'm not in a position to balance those implications vs the technical
question of minimizing the burden of supporting new platforms, so I'll
try again to bow out of this.

Bjorn
diff mbox series

Patch

diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index 19d489ee2b1e..c0bf26aeb7c3 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -213,7 +213,10 @@  int amd_cache_northbridges(void)
 	const struct pci_device_id *root_ids = amd_root_ids;
 	struct pci_dev *root, *misc, *link;
 	struct amd_northbridge *nb;
-	u16 i = 0;
+	u16 roots_per_misc = 0;
+	u16 misc_count = 0;
+	u16 root_count = 0;
+	u16 i, j;
 
 	if (amd_northbridges.num)
 		return 0;
@@ -226,26 +229,52 @@  int amd_cache_northbridges(void)
 
 	misc = NULL;
 	while ((misc = next_northbridge(misc, misc_ids)) != NULL)
-		i++;
+		misc_count++;
 
-	if (!i)
+	root = NULL;
+	while ((root = next_northbridge(root, root_ids)) != NULL)
+		root_count++;
+
+	if (!misc_count)
 		return -ENODEV;
 
-	nb = kcalloc(i, sizeof(struct amd_northbridge), GFP_KERNEL);
+	if (root_count) {
+		roots_per_misc = root_count / misc_count;
+
+		/*
+		 * There should be _exactly_ N roots for each DF/SMN
+		 * interface.
+		 */
+		if (!roots_per_misc || (root_count % roots_per_misc)) {
+			pr_info("Unsupported AMD DF/PCI configuration found\n");
+			return -ENODEV;
+		}
+	}
+
+	nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL);
 	if (!nb)
 		return -ENOMEM;
 
 	amd_northbridges.nb = nb;
-	amd_northbridges.num = i;
+	amd_northbridges.num = misc_count;
 
 	link = misc = root = NULL;
-	for (i = 0; i != amd_northbridges.num; i++) {
+	for (i = 0; i < amd_northbridges.num; i++) {
 		node_to_amd_nb(i)->root = root =
 			next_northbridge(root, root_ids);
 		node_to_amd_nb(i)->misc = misc =
 			next_northbridge(misc, misc_ids);
 		node_to_amd_nb(i)->link = link =
 			next_northbridge(link, link_ids);
+
+		/*
+		 * If there are more root devices than data fabric/SMN
+		 * interfaces, then the root devices per DF/SMN
+		 * interface are redundant and N-1 should be skipped so
+		 * they aren't mapped incorrectly.
+		 */
+		for (j = 1; j < roots_per_misc; j++)
+			root = next_northbridge(root, root_ids);
 	}
 
 	if (amd_gart_present())
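
For illustration only, the pairing logic the hunk above implements can
be sketched standalone; the counts below are invented, and the real
code walks PCI devices via next_northbridge() rather than integers:

#include <stdio.h>

int main(void)
{
	/* e.g. 4 root complexes shared by 2 DF/SMN interfaces */
	const int root_count = 4, misc_count = 2;
	const int roots_per_misc = root_count / misc_count;	/* 2 */
	int root = 0;

	for (int i = 0; i < misc_count; i++) {
		printf("node %d -> root %d\n", i, root);
		root++;			/* consume the root we used */
		for (int j = 1; j < roots_per_misc; j++)
			root++;		/* skip the N-1 redundant roots */
	}
	return 0;
}

With these counts the sketch prints "node 0 -> root 0" and
"node 1 -> root 2", i.e. each DF/SMN interface is paired with the
first root of its group and the redundant ones are stepped over.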