Message ID | 20181102181055.130531-3-brian.woods@amd.com (mailing list archive)
---|---
State | Not Applicable
Series | Update DF/SMN access and k10temp for AMD F17h M30h
On Fri, Nov 02, 2018 at 06:11:07PM +0000, Woods, Brian wrote: > Add support for new processors which have multiple PCI root complexes > per data fabric/SMN interface. The interfaces per root complex are > redundant and should be skipped. This makes sure the DF/SMN interfaces > get accessed via the correct root complex. SMN? > Ex: > DF/SMN 0 -> 60 > 40 > 20 > 00 > DF/SMN 1 -> e0 > c0 > a0 > 80 This isn't my code, and I'm not really objecting to these changes, but from where I sit, the fact that you need this sort of vendor-specific topology discovery is a little bit ugly and seems like something of a maintenance issue. You could argue that this is sort of an "AMD CPU driver", which is entitled to be device-specific, and that does make some sense. But device-specific code is typically packaged as a driver that uses driver registration interfaces like acpi_bus_register_driver(), pci_register_driver(), etc. That gives you a consistent structure and, more importantly, a framework for dealing with hotplug. It doesn't look like amd_nb.c would deal well with hot-add of CPUs. > Signed-off-by: Brian Woods <brian.woods@amd.com> > --- > arch/x86/kernel/amd_nb.c | 41 +++++++++++++++++++++++++++++++++++------ > 1 file changed, 35 insertions(+), 6 deletions(-) > > diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c > index 19d489ee2b1e..c0bf26aeb7c3 100644 > --- a/arch/x86/kernel/amd_nb.c > +++ b/arch/x86/kernel/amd_nb.c > @@ -213,7 +213,10 @@ int amd_cache_northbridges(void) > const struct pci_device_id *root_ids = amd_root_ids; > struct pci_dev *root, *misc, *link; > struct amd_northbridge *nb; > - u16 i = 0; > + u16 roots_per_misc = 0; > + u16 misc_count = 0; > + u16 root_count = 0; > + u16 i, j; > > if (amd_northbridges.num) > return 0; > @@ -226,26 +229,52 @@ int amd_cache_northbridges(void) > > misc = NULL; > while ((misc = next_northbridge(misc, misc_ids)) != NULL) > - i++; > + misc_count++; > > - if (!i) > + root = NULL; > + while ((root = next_northbridge(root, root_ids)) != NULL) > + root_count++; > + > + if (!misc_count) > return -ENODEV; > > - nb = kcalloc(i, sizeof(struct amd_northbridge), GFP_KERNEL); > + if (root_count) { > + roots_per_misc = root_count / misc_count; > + > + /* > + * There should be _exactly_ N roots for each DF/SMN > + * interface. > + */ > + if (!roots_per_misc || (root_count % roots_per_misc)) { > + pr_info("Unsupported AMD DF/PCI configuration found\n"); > + return -ENODEV; > + } > + } > + > + nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL); > if (!nb) > return -ENOMEM; > > amd_northbridges.nb = nb; > - amd_northbridges.num = i; > + amd_northbridges.num = misc_count; > > link = misc = root = NULL; > - for (i = 0; i != amd_northbridges.num; i++) { > + for (i = 0; i < amd_northbridges.num; i++) { > node_to_amd_nb(i)->root = root = > next_northbridge(root, root_ids); > node_to_amd_nb(i)->misc = misc = > next_northbridge(misc, misc_ids); > node_to_amd_nb(i)->link = link = > next_northbridge(link, link_ids); > + > + /* > + * If there are more root devices than data fabric/SMN, > + * interfaces, then the root devices per DF/SMN > + * interface are redundant and N-1 should be skipped so > + * they aren't mapped incorrectly. > + */ > + for (j = 1; j < roots_per_misc; j++) > + root = next_northbridge(root, root_ids); > } > > if (amd_gart_present()) > -- > 2.11.0 >
On Fri, Nov 02, 2018 at 02:59:25PM -0500, Bjorn Helgaas wrote: > This isn't my code, and I'm not really objecting to these changes, but > from where I sit, the fact that you need this sort of vendor-specific > topology discovery is a little bit ugly and seems like something of a > maintenance issue. You could argue that this is sort of an "AMD CPU > driver", which is entitled to be device-specific, and that does make > some sense. It is a bunch of glue code which enumerates the PCI devices a CPU has and other in-kernel users can use that instead of doing the discovery/enumeration themselves. > But device-specific code is typically packaged as a driver that uses > driver registration interfaces like acpi_bus_register_driver(), > pci_register_driver(), etc. That gives you a consistent structure > and, more importantly, a framework for dealing with hotplug. It > doesn't look like amd_nb.c would deal well with hot-add of CPUs. If you mean physical hotadd, then that's a non-issue as, AFAIK, AMD doesn't support that. Now, TBH I've never tried soft-offlining the cores of a node and then check whether using the PCI devices of that node would work. Now, I don't mind this getting converted to a proper PCI driver as long as it is not a module as it has to be present at all times. Other than that, I'm a happy camper. Thx.
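For context, a minimal sketch of how an in-kernel consumer uses that glue layer, assuming the current amd_nb interfaces; the SMN offset below is a placeholder, not a real register address:

#include <linux/kernel.h>
#include <asm/amd_nb.h>

/*
 * Hypothetical consumer of the amd_nb glue: walk every enumerated
 * northbridge/data-fabric instance and read one SMN register through it.
 */
static int example_read_smn_everywhere(void)
{
	u32 val;
	u16 node;
	int err;

	err = amd_cache_northbridges();	/* builds the amd_northbridges array */
	if (err)
		return err;

	for (node = 0; node < amd_nb_num(); node++) {
		err = amd_smn_read(node, 0x0 /* placeholder offset */, &val);
		if (err)
			return err;
		pr_info("node %u: SMN value %#x\n", node, val);
	}

	return 0;
}

amd_smn_read() performs the indirect access through node_to_amd_nb(node)->root, which is why each DF/SMN interface has to be paired with a root complex that can actually reach it.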
On Fri, Nov 02, 2018 at 06:11:07PM +0000, Woods, Brian wrote: > Add support for new processors which have multiple PCI root complexes > per data fabric/SMN interface. Please write out abbreviations. I believe it is only you and I who know what SMN means. :) > The interfaces per root complex are redundant and should be skipped. And I believe it is only you who understands that sentence. :) Please elaborate why interfaces need to be skipped, *which* interfaces need to be skipped and which is the correct interface to access DF/SMN through? > This makes sure the DF/SMN interfaces get accessed via the correct > root complex. > > Ex: > DF/SMN 0 -> 60 > 40 > 20 > 00 > DF/SMN 1 -> e0 > c0 > a0 > 80 > > Signed-off-by: Brian Woods <brian.woods@amd.com> > --- > arch/x86/kernel/amd_nb.c | 41 +++++++++++++++++++++++++++++++++++------ > 1 file changed, 35 insertions(+), 6 deletions(-) > > diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c > index 19d489ee2b1e..c0bf26aeb7c3 100644 > --- a/arch/x86/kernel/amd_nb.c > +++ b/arch/x86/kernel/amd_nb.c > @@ -213,7 +213,10 @@ int amd_cache_northbridges(void) > const struct pci_device_id *root_ids = amd_root_ids; > struct pci_dev *root, *misc, *link; > struct amd_northbridge *nb; > - u16 i = 0; > + u16 roots_per_misc = 0; > + u16 misc_count = 0; > + u16 root_count = 0; > + u16 i, j; > > if (amd_northbridges.num) > return 0; > @@ -226,26 +229,52 @@ int amd_cache_northbridges(void) > > misc = NULL; > while ((misc = next_northbridge(misc, misc_ids)) != NULL) > - i++; > + misc_count++; > > - if (!i) > + root = NULL; > + while ((root = next_northbridge(root, root_ids)) != NULL) > + root_count++; > + > + if (!misc_count) > return -ENODEV; So you're doing the root_count above but returning in the !misc_count case. So that root_count iteration was unnecessary work. IOW, you should keep the misc_count check after its loop. > > - nb = kcalloc(i, sizeof(struct amd_northbridge), GFP_KERNEL); > + if (root_count) { > + roots_per_misc = root_count / misc_count; > + > + /* > + * There should be _exactly_ N roots for each DF/SMN > + * interface. > + */ > + if (!roots_per_misc || (root_count % roots_per_misc)) { > + pr_info("Unsupported AMD DF/PCI configuration found\n"); > + return -ENODEV; > + } > + } > + > + nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL); > if (!nb) > return -ENOMEM; > > amd_northbridges.nb = nb; > - amd_northbridges.num = i; > + amd_northbridges.num = misc_count; > > link = misc = root = NULL; > - for (i = 0; i != amd_northbridges.num; i++) { > + for (i = 0; i < amd_northbridges.num; i++) { > node_to_amd_nb(i)->root = root = > next_northbridge(root, root_ids); > node_to_amd_nb(i)->misc = misc = > next_northbridge(misc, misc_ids); > node_to_amd_nb(i)->link = link = > next_northbridge(link, link_ids); > + > + /* > + * If there are more root devices than data fabric/SMN, > + * interfaces, then the root devices per DF/SMN > + * interface are redundant and N-1 should be skipped so > + * they aren't mapped incorrectly. > + */ This text is trying to explain it a bit better but you still still need to specify which are the redundant ones. All N-1 or is there a special root device through which the DF/SMN gets accessed or? Thx.
On Mon, Nov 05, 2018 at 08:38:40PM +0100, Borislav Petkov wrote: > On Fri, Nov 02, 2018 at 06:11:07PM +0000, Woods, Brian wrote: > > Add support for new processors which have multiple PCI root complexes > > per data fabric/SMN interface. > > Please write out abbreviations. I believe it is only you and I who know > what SMN means. :) Will do. > > The interfaces per root complex are redundant and should be skipped. > > And I believe it is only you who understands that sentence. :) > > Please elaborate why interfaces need to be skipped, *which* interfaces > need to be skipped and which is the correct interface to access DF/SMN > through? See last comment. > > This makes sure the DF/SMN interfaces get accessed via the correct > > root complex. > > > > Ex: > > DF/SMN 0 -> 60 > > 40 > > 20 > > 00 > > DF/SMN 1 -> e0 > > c0 > > a0 > > 80 > > > > Signed-off-by: Brian Woods <brian.woods@amd.com> > > --- > > arch/x86/kernel/amd_nb.c | 41 +++++++++++++++++++++++++++++++++++------ > > 1 file changed, 35 insertions(+), 6 deletions(-) > > > > diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c > > index 19d489ee2b1e..c0bf26aeb7c3 100644 > > --- a/arch/x86/kernel/amd_nb.c > > +++ b/arch/x86/kernel/amd_nb.c > > @@ -213,7 +213,10 @@ int amd_cache_northbridges(void) > > const struct pci_device_id *root_ids = amd_root_ids; > > struct pci_dev *root, *misc, *link; > > struct amd_northbridge *nb; > > - u16 i = 0; > > + u16 roots_per_misc = 0; > > + u16 misc_count = 0; > > + u16 root_count = 0; > > + u16 i, j; > > > > if (amd_northbridges.num) > > return 0; > > @@ -226,26 +229,52 @@ int amd_cache_northbridges(void) > > > > misc = NULL; > > while ((misc = next_northbridge(misc, misc_ids)) != NULL) > > - i++; > > + misc_count++; > > > > - if (!i) > > + root = NULL; > > + while ((root = next_northbridge(root, root_ids)) != NULL) > > + root_count++; > > + > > + if (!misc_count) > > return -ENODEV; > > So you're doing the root_count above but returning in the !misc_count > case. So that root_count iteration was unnecessary work. IOW, you should > keep the misc_count check after its loop. I think having them togeter is cleaner. If you aren't finding any misc IDs, I highly doubt you'll find any root IDs. There shouldn't be much of a difference in how fast the function exits, either way. If you want it the other way though, I don't mind changing it. > > > > - nb = kcalloc(i, sizeof(struct amd_northbridge), GFP_KERNEL); > > + if (root_count) { > > + roots_per_misc = root_count / misc_count; > > + > > + /* > > + * There should be _exactly_ N roots for each DF/SMN > > + * interface. 
> > + */ > > + if (!roots_per_misc || (root_count % roots_per_misc)) { > > + pr_info("Unsupported AMD DF/PCI configuration found\n"); > > + return -ENODEV; > > + } > > + } > > + > > + nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL); > > if (!nb) > > return -ENOMEM; > > > > amd_northbridges.nb = nb; > > - amd_northbridges.num = i; > > + amd_northbridges.num = misc_count; > > > > link = misc = root = NULL; > > - for (i = 0; i != amd_northbridges.num; i++) { > > + for (i = 0; i < amd_northbridges.num; i++) { > > node_to_amd_nb(i)->root = root = > > next_northbridge(root, root_ids); > > node_to_amd_nb(i)->misc = misc = > > next_northbridge(misc, misc_ids); > > node_to_amd_nb(i)->link = link = > > next_northbridge(link, link_ids); > > + > > + /* > > + * If there are more root devices than data fabric/SMN, > > + * interfaces, then the root devices per DF/SMN > > + * interface are redundant and N-1 should be skipped so > > + * they aren't mapped incorrectly. > > + */ > > This text is trying to explain it a bit better but you still still need > to specify which are the redundant ones. All N-1 or is there a special > root device through which the DF/SMN gets accessed or? > > Thx. Would /* * If there are more PCI root devices than data fabric/ * system management network interfaces, then the (N) * PCI roots per DF/SMN interface are functionally the * same (for DF/SMN access) and N-1 are redundant. The * N-1 PCI roots should be skipped per DF/SMN interface * so the DF/SMN interfaces get mapped to the correct * PCI root. */ be better? I would update the commit msg also. > -- > Regards/Gruss, > Boris. > > Good mailing practices for 400: avoid top-posting and trim the reply.
On Mon, Nov 05, 2018 at 08:33:34PM +0000, Woods, Brian wrote: > I think having them togeter is cleaner. If you aren't finding any > misc IDs, I highly doubt you'll find any root IDs. There shouldn't > be much of a difference in how fast the function exits, either way. > If you want it the other way though, I don't mind changing it. Yes please. Because this is the usual kernel coding style of calling a function (or a loop which has some result in this case) and testing that result immediately after the function call. > Would > > /* > * If there are more PCI root devices than data fabric/ > * system management network interfaces, then the (N) > * PCI roots per DF/SMN interface are functionally the > * same (for DF/SMN access) and N-1 are redundant. The > * N-1 PCI roots should be skipped per DF/SMN interface > * so the DF/SMN interfaces get mapped to the correct > * PCI root. You say "correct" as there is a special one. But the text before it says they're "functionally the same" wrt DF/SMN access so it sounds to me like we wanna map the first one we find and ignore the others. I.e., we wanna say "... so the DF/SMN interfaces get mapped to the *first* PCI root and the others N-1 ignored." Or am I misreading this? Thx.
[+cc Takashi, Andy, Colin, Myron for potential distro impact] [Beginning of thread: https://lore.kernel.org/linux-pci/20181102181055.130531-1-brian.woods@amd.com/] On Sat, Nov 03, 2018 at 12:29:48AM +0100, Borislav Petkov wrote: > On Fri, Nov 02, 2018 at 02:59:25PM -0500, Bjorn Helgaas wrote: > > This isn't my code, and I'm not really objecting to these changes, but > > from where I sit, the fact that you need this sort of vendor-specific > > topology discovery is a little bit ugly and seems like something of a > > maintenance issue. I think this is the most important part, and I should have elaborated on it instead of getting into the driver structure details below. It is a major goal of ACPI and PCI that an old kernel should work unchanged on a new platform unless it needs to use new functionality introduced in the new platform. amd_nb.c prevents us from achieving that goal. These patches don't add new functionality; they merely describe minor topographical differences in new hardware. We usually try to do that in a more generic way, e.g., via an ACPI method, so the new platform can update the ACPI method and use an old, already-qualified, already-shipped kernel. I'm not strenuously objecting to these because this isn't a *huge* deal, but I suspect it is a source of friction for distros that don't want to update and requalify their software for every new platform. > > You could argue that this is sort of an "AMD CPU > > driver", which is entitled to be device-specific, and that does make > > some sense. > > It is a bunch of glue code which enumerates the PCI devices a CPU > has and other in-kernel users can use that instead of doing the > discovery/enumeration themselves. > > > But device-specific code is typically packaged as a driver that uses > > driver registration interfaces like acpi_bus_register_driver(), > > pci_register_driver(), etc. That gives you a consistent structure > > and, more importantly, a framework for dealing with hotplug. It > > doesn't look like amd_nb.c would deal well with hot-add of CPUs. > > If you mean physical hotadd, then that's a non-issue as, AFAIK, AMD > doesn't support that. > > Now, TBH I've never tried soft-offlining the cores of a node and then > check whether using the PCI devices of that node would work. > > Now, I don't mind this getting converted to a proper PCI driver as long > as it is not a module as it has to be present at all times. Other than > that, I'm a happy camper. amd_nb.c uses pci_get_device(), which is incompatible with hotplug and subverts the usual driver/device ownership model. We could pursue this part of the conversation, but I think it's more fruitful to approach this from the "new machine, old kernel" angle above.
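As a rough illustration of the registration model referred to here (not the existing amd_nb code; the device ID, names, and empty callbacks are placeholders), a pci_register_driver()-based skeleton might look like:

#include <linux/module.h>
#include <linux/pci.h>

#define PCI_DEVICE_ID_AMD_17H_DF_F3	0x1463	/* Fam17h DF function 3, shown only as an example */

static const struct pci_device_id amd_df_ids[] = {
	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_DF_F3) },
	{ }
};

static int amd_df_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	/* per-device setup runs here, including for devices that appear later */
	return 0;
}

static void amd_df_remove(struct pci_dev *pdev)
{
	/* hot-remove of a node's devices would be handled here */
}

static struct pci_driver amd_df_driver = {
	.name		= "amd_df",
	.id_table	= amd_df_ids,
	.probe		= amd_df_probe,
	.remove		= amd_df_remove,
};
builtin_pci_driver(amd_df_driver);	/* built in, per the "not a module" requirement above */

The difference from the current pci_get_device() scan is that probe/remove run per device, so devices appearing or disappearing after boot are handled by the driver core rather than by a one-shot table walk.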
On Mon, Nov 05, 2018 at 03:45:37PM -0600, Bjorn Helgaas wrote: > amd_nb.c prevents us from achieving that goal. These patches don't > add new functionality; they merely describe minor topographical > differences in new hardware. We usually try to do that in a more > generic way, e.g., via an ACPI method, so the new platform can update > the ACPI method and use an old, already-qualified, already-shipped > kernel. > > I'm not strenuously objecting to these because this isn't a *huge* > deal, but I suspect it is a source of friction for distros that don't > want to update and requalify their software for every new platform. Err, how is this any different from adding distro support for a new CPU family? This is basically the same thing. When distros add support for new hardware, they have to backport patches for upstream. These PCI devices which are part of the CPU are part of that hardware enablement. So there's no way around doing that enablement. I don't think you can do "old distro, new hardware" stuff without *some* hw enablement.
On Mon, Nov 05, 2018 at 10:42:33PM +0100, Borislav Petkov wrote: > Yes please. Because this is the usual kernel coding style of calling a > function (or a loop which has some result in this case) and testing that > result immediately after the function call. Done. > You say "correct" as there is a special one. But the text before it says > they're "functionally the same" wrt DF/SMN access so it sounds to me > like we wanna map the first one we find and ignore the others. > > I.e., we wanna say > > "... so the DF/SMN interfaces get mapped to the *first* PCI root and the > others N-1 ignored." > > Or am I misreading this? > > Thx. > > -- > Regards/Gruss, > Boris. > > Good mailing practices for 400: avoid top-posting and trim the reply. Your understanding is correct. It's more so that the following DF/SMN interface gets mapped correctly. /* * If there are more PCI root devices than data fabric/ * system management network interfaces, then the (N) * PCI roots per DF/SMN interface are functionally the * same (for DF/SMN access) and N-1 are redundant. N-1 * PCI roots should be skipped per DF/SMN interface so * the following DF/SMN interfaces get mapped to * correct PCI roots. */ Does that read clearer?
On Mon, Nov 05, 2018 at 11:32:16PM +0000, Woods, Brian wrote: > Your understanding is correct. It's more so that the following DF/SMN > interface gets mapped correctly. > /* > * If there are more PCI root devices than data fabric/ > * system management network interfaces, then the (N) > * PCI roots per DF/SMN interface are functionally the > * same (for DF/SMN access) and N-1 are redundant. N-1 > * PCI roots should be skipped per DF/SMN interface so > * the following DF/SMN interfaces get mapped to > * correct PCI roots. > */ > Does that read clearer? Yap, thanks!
On Mon, Nov 05, 2018 at 10:56:50PM +0100, Borislav Petkov wrote: > On Mon, Nov 05, 2018 at 03:45:37PM -0600, Bjorn Helgaas wrote: > > amd_nb.c prevents us from achieving that goal. These patches don't > > add new functionality; they merely describe minor topographical > > differences in new hardware. We usually try to do that in a more > > generic way, e.g., via an ACPI method, so the new platform can update > > the ACPI method and use an old, already-qualified, already-shipped > > kernel. > > > > I'm not strenuously objecting to these because this isn't a *huge* > > deal, but I suspect it is a source of friction for distros that don't > > want to update and requalify their software for every new platform. > > Err, how is this any different from adding distro support for a new CPU > family? > > This is basically the same thing. When distros add support for new > hardware, they have to backport patches for upstream. These PCI devices > which are part of the CPU are part of that hardware enablement. > > So there's no way around doing that enablement. I don't think you can do > "old distro, new hardware" stuff without *some* hw enablement. Just depends on how hard you want to work on defining abstract interfaces. This isn't some complicated new device where the programming model changed on the new CPU. This is a thermometer that was already supported. ACPI provides plenty of functionality that could be used to support this generically, e.g., see drivers/acpi/thermal.c, drivers/thermal/int340x_thermal/processor_thermal_device.c, etc. But maybe there's some real value in the nitty-gritty device-specific code in amd_nb.c. If so, I guess you're stuck with updates like this and negotiating with the distros to do backports and new releases. Bjorn
On Tue, Nov 06, 2018 at 03:42:56PM -0600, Bjorn Helgaas wrote: > This isn't some complicated new device where the programming model > changed on the new CPU. This is a thermometer that was already > supported. ACPI provides plenty of functionality that could be used > to support this generically, e.g., see drivers/acpi/thermal.c, > drivers/thermal/int340x_thermal/processor_thermal_device.c, etc. Ok, you say ACPI but how do you envision practically doing that? I mean, this is used by old boxes too - ever since K8. So how do we go and add ACPI functionality to old boxes? Or do you mean it should simply be converted to do pci_register_driver() with a struct pci_driver pointer which has all those PCI device IDs in a table? I'm looking at the last example drivers/thermal/int340x_thermal/processor_thermal_device.c you gave above. > But maybe there's some real value in the nitty-gritty device-specific > code in amd_nb.c. If so, I guess you're stuck with updates like this > and negotiating with the distros to do backports and new releases. Well, even if it is converted to a different registration scheme, you still need to add new PCI device IDs to the table, no? So *some* sort of enablement still needs to happen. And then the argument about needing enablement for distros is moot because it still needs enablement/backporting - regardless of the registration scheme. Or do you mean something else? I mean, don't get me wrong, I don't mind if it gets converted to pci_register_driver() if you think it fits better this way with the drivers registering with PCI devices - I'm just trying to understand the reasoning for it. Thx.
[+cc Sumeet, Srinivas for INT3401 questions below] [Beginning of thread: https://lore.kernel.org/linux-pci/20181102181055.130531-1-brian.woods@amd.com/] On Tue, Nov 06, 2018 at 11:00:59PM +0100, Borislav Petkov wrote: > On Tue, Nov 06, 2018 at 03:42:56PM -0600, Bjorn Helgaas wrote: > > This isn't some complicated new device where the programming model > > changed on the new CPU. This is a thermometer that was already > > supported. ACPI provides plenty of functionality that could be used > > to support this generically, e.g., see drivers/acpi/thermal.c, > > drivers/thermal/int340x_thermal/processor_thermal_device.c, etc. > > Ok, you say ACPI but how do you envision practically doing that? I mean, > this is used by old boxes too - ever since K8. So how do we go and add > ACPI functionality to old boxes? > > Or do you mean it should simply be converted to do pci_register_driver() > with a struct pci_driver pointer which has all those PCI device IDs in a > table? I'm looking at the last example > drivers/thermal/int340x_thermal/processor_thermal_device.c you gave above. No, there would be no need to change anything for boxes already in the field. But for *new* systems, you could make devices or thermal zones in the ACPI namespace (they might even already be there for use by Windows). drivers/thermal/int340x_thermal/processor_thermal_device.c claims either INT3401 ACPI devices or listed PCI devices. It looks like it tries the platform (INT3401) devices first, and if it finds any, it ignores any matching PCI devices. This *could* be so it uses INT3401 devices on new platforms and falls back to the PCI devices otherwise, although there *are* recent updates that add PCI IDs. It looks like INT3401 is Intel-specific since it uses a PPCC method which isn't defined by the ACPI spec. But AMD could define a similar PNP ID and have new platforms expose ACPI devices with _TMP methods that know how to read the sensor on that platform. Or maybe even drivers/acpi/thermal.c, which claims every Thermal Zone (ACPI 6.2, sec 11), would be sufficient. I don't know what the relationship between hwmon and other thermal stuff, e.g., Documentation/thermal/sysfs-api.txt is. acpi/thermal.c looks tied into the drivers/thermal stuff (it registers "thermal_zone" devices), but not to hwmon. > > But maybe there's some real value in the nitty-gritty device-specific > > code in amd_nb.c. If so, I guess you're stuck with updates like this > > and negotiating with the distros to do backports and new releases. > > Well, even if it is converted to a different registration scheme, you > still need to add new PCI device IDs to the table, no? So *some* sort of > enablement still needs to happen. As long as we have a driver that knows how to claim a known PNP ID, and every new platform exposes a device compatible with that ID, the driver should just work. Bjorn
On Tue, Nov 06, 2018 at 05:20:41PM -0600, Bjorn Helgaas wrote: > Or maybe even drivers/acpi/thermal.c, which claims every Thermal Zone > (ACPI 6.2, sec 11), would be sufficient. I don't know what the > relationship between hwmon and other thermal stuff, e.g., > Documentation/thermal/sysfs-api.txt is. acpi/thermal.c looks tied > into the drivers/thermal stuff (it registers "thermal_zone" devices), > but not to hwmon. Err, I still don't think I'm catching your drift but let me stop you right there: amd_nb is not there only for hwmon/k10temp. It is a small interface glue if you will, which exports the CPU functionality in PCI config space to other consumers. So it is not really a driver - it is used by drivers to talk/query CPU settings through it. With that said, I don't think I understand all that talk about PNP IDs and ACPI methods. But maybe I'm missing something... So what's up?
On Wed, Nov 07, 2018 at 10:18:38AM +0100, Borislav Petkov wrote:
> On Tue, Nov 06, 2018 at 05:20:41PM -0600, Bjorn Helgaas wrote:
> > Or maybe even drivers/acpi/thermal.c, which claims every Thermal Zone
> > (ACPI 6.2, sec 11), would be sufficient. I don't know what the
> > relationship between hwmon and other thermal stuff, e.g.,
> > Documentation/thermal/sysfs-api.txt is. acpi/thermal.c looks tied
> > into the drivers/thermal stuff (it registers "thermal_zone" devices),
> > but not to hwmon.
>
> Err, I still don't think I'm catching your drift but let me stop you
> right there: amd_nb is not there only for hwmon/k10temp. It is a small
> interface glue if you will, which exports the CPU functionality in PCI
> config space to other consumers.
>
> So it is not really a driver - it is used by drivers to talk/query CPU
> settings through it.
>
> With that said, I don't think I understand all that talk about PNP IDs
> and ACPI methods. But maybe I'm missing something...

Firmware supplies ACPI namespace. The namespace contains an abstract
description of the platform, including devices. Devices are identified
by PNP IDs, which are analogous to PCI vendor/device IDs, except that a
device may have several generic "compatible device IDs" in addition to
an ID unique to the device. Devices may also contain methods (supplied
by firmware as part of the namespace), which are essentially bytecode
that can be executed by the ACPI interpreter in the kernel. Linux
drivers claim ACPI devices based on PNP ID and operate them using
either ACPI methods (which can decouple the driver from device
specifics) or the usual direct MMIO/IO port/MSR style.

Here's an outline of how it *could* work:

  - AMD defines "AMD0001" device ID for the CPU temp sensor
  - BIOS supplies AMD0001 devices in the ACPI namespace
  - Each AMD0001 device has a _TMP method (supplied by BIOS and
    specific to the CPU)
  - Linux driver claims AMD0001 devices
  - Driver reads temp sensors by executing _TMP methods (Linux ACPI
    interpreter runs the bytecode)

That way when you release a new platform with different temp sensors,
you update the BIOS AMD0001 devices and _TMP methods to know about
them, and the old Linux driver works unchanged.

Bjorn
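A minimal sketch of the driver side of that outline, assuming the hypothetical "AMD0001" ID from the list above; the probe body, hwmon/thermal registration and error handling are elided:

#include <linux/acpi.h>
#include <linux/module.h>
#include <linux/platform_device.h>

static const struct acpi_device_id amd_temp_ids[] = {
	{ "AMD0001", 0 },	/* made-up ID from the outline, not an allocated PNP ID */
	{ }
};
MODULE_DEVICE_TABLE(acpi, amd_temp_ids);

static int amd_temp_get_mcelsius(struct device *dev, long *mc)
{
	struct acpi_device *adev = ACPI_COMPANION(dev);
	unsigned long long deci_kelvin;
	acpi_status status;

	/* firmware-supplied _TMP hides how the sensor is actually read */
	status = acpi_evaluate_integer(adev->handle, "_TMP", NULL, &deci_kelvin);
	if (ACPI_FAILURE(status))
		return -EIO;

	*mc = (deci_kelvin - 2732) * 100;	/* _TMP is in tenths of a kelvin, roughly */
	return 0;
}

static int amd_temp_probe(struct platform_device *pdev)
{
	/* register a hwmon/thermal interface that calls the helper above */
	return 0;
}

static struct platform_driver amd_temp_driver = {
	.probe = amd_temp_probe,
	.driver = {
		.name = "amd_temp",
		.acpi_match_table = amd_temp_ids,
	},
};
module_platform_driver(amd_temp_driver);

Because the sensor access lives in the firmware-supplied _TMP method, a new platform would only need updated AML, not a new entry in this driver.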
On 11/7/18 1:18 AM, Borislav Petkov wrote: > On Tue, Nov 06, 2018 at 05:20:41PM -0600, Bjorn Helgaas wrote: >> Or maybe even drivers/acpi/thermal.c, which claims every Thermal Zone >> (ACPI 6.2, sec 11), would be sufficient. I don't know what the >> relationship between hwmon and other thermal stuff, e.g., >> Documentation/thermal/sysfs-api.txt is. acpi/thermal.c looks tied >> into the drivers/thermal stuff (it registers "thermal_zone" devices), >> but not to hwmon. > > Err, I still don't think I'm catching your drift but let me stop you > right there: amd_nb is not there only for hwmon/k10temp. It is a small > interface glue if you will, which exports the CPU functionality in PCI > config space to other consumers. > Also, thermal and hwmon are orthogonal, just like hwmon and iio. One would typically have a driver in one subsystem, in some cases bridging to the other subsystem, but one would not have drivers in both subsystems. I think Bjorn is suggesting that the k10temp driver should move to the thermal subsystem, though I don't really understand what that has to do with finding the correct PCI device(s) to query. Or maybe I misunderstand. Guenter > So it is not really a driver - it is used by drivers to talk/query CPU > settings through it. > > With that said, I don't think I understand all that talk about PNP IDs > and ACPI methods. But maybe I'm missing something... > > So what's up? >
On Wed, Nov 07, 2018 at 07:38:56AM -0600, Bjorn Helgaas wrote: > Firmware supplies ACPI namespace. The namespace contains an abstract > description of the platform, including devices. Devices are > identified by PNP IDs, which are analogous to PCI vendor/device IDs, > except that a device may have several generic "compatible device IDs" > in addition to an ID unique to the device. Devices may also contain > methods (supplied by firmware as part of the namespace), which are > essentially bytecode that can be executed by the ACPI interpreter in > the kernel. Linux drivers claim ACPI devices based on PNP ID and > operate them using either ACPI methods (which can decouple the driver > from device specifics) or the usual direct MMIO/IO port/MSR style. > > Here's an outline of how it *could* work: > > - AMD defines "AMD0001" device ID for the CPU temp sensor > - BIOS supplies AMD0001 devices in the ACPI namespace > - Each AMD0001 device has a _TMP method (supplied by BIOS and > specific to the CPU) > - Linux driver claims AMD0001 devices > - Driver reads temp sensors by executing _TMP methods (Linux ACPI > interpreter runs the bytecode) Thanks for explaining. > That way when you release a new platform with different temp sensors, > you update the BIOS AMD0001 devices and _TMP methods to know about > them, and the old Linux driver works unchanged. So I don't know about temp sensors - I'm talking about amd_nb which is something... well, I explained already what it is in my previous mail so I won't repeat myself. Anyway, if there is such a PNP ID device - and I believe I have stumbled upon some blurb about it in the BKDGs - which says "this device represents the PCI device IDs of a CPU" and if that can be used to register amd_nb through it, then sure, I don't see why not. This way, when new CPU comes out and the same PNP ID device is present, amd_nb would load, sure. Maybe Brian knows more on the subject...
On Wed, Nov 07, 2018 at 05:07:07PM +0100, Borislav Petkov wrote: > On Wed, Nov 07, 2018 at 07:38:56AM -0600, Bjorn Helgaas wrote: > > Firmware supplies ACPI namespace. The namespace contains an abstract > > description of the platform, including devices. Devices are > > identified by PNP IDs, which are analogous to PCI vendor/device IDs, > > except that a device may have several generic "compatible device IDs" > > in addition to an ID unique to the device. Devices may also contain > > methods (supplied by firmware as part of the namespace), which are > > essentially bytecode that can be executed by the ACPI interpreter in > > the kernel. Linux drivers claim ACPI devices based on PNP ID and > > operate them using either ACPI methods (which can decouple the driver > > from device specifics) or the usual direct MMIO/IO port/MSR style. > > > > Here's an outline of how it *could* work: > > > > - AMD defines "AMD0001" device ID for the CPU temp sensor > > - BIOS supplies AMD0001 devices in the ACPI namespace > > - Each AMD0001 device has a _TMP method (supplied by BIOS and > > specific to the CPU) > > - Linux driver claims AMD0001 devices > > - Driver reads temp sensors by executing _TMP methods (Linux ACPI > > interpreter runs the bytecode) > > Thanks for explaining. > > > That way when you release a new platform with different temp sensors, > > you update the BIOS AMD0001 devices and _TMP methods to know about > > them, and the old Linux driver works unchanged. > > So I don't know about temp sensors - I'm talking about amd_nb which is > something... well, I explained already what it is in my previous mail so > I won't repeat myself. > > Anyway, if there is such a PNP ID device - and I believe I have stumbled > upon some blurb about it in the BKDGs - which says "this device > represents the PCI device IDs of a CPU" and if that can be used to > register amd_nb through it, then sure, I don't see why not. > > This way, when new CPU comes out and the same PNP ID device is present, > amd_nb would load, sure. No, the idea was more that that temp monitoring, e.g., k10temp, could be independent of amd_nb. But I can tell this idea isn't going anywhere, so let's just forget that I stuck my neck out and let it die on the vine :) Bjorn
On Wed, Nov 07, 2018 at 05:51:22AM -0800, Guenter Roeck wrote: > On 11/7/18 1:18 AM, Borislav Petkov wrote: > > On Tue, Nov 06, 2018 at 05:20:41PM -0600, Bjorn Helgaas wrote: > > > Or maybe even drivers/acpi/thermal.c, which claims every Thermal Zone > > > (ACPI 6.2, sec 11), would be sufficient. I don't know what the > > > relationship between hwmon and other thermal stuff, e.g., > > > Documentation/thermal/sysfs-api.txt is. acpi/thermal.c looks tied > > > into the drivers/thermal stuff (it registers "thermal_zone" devices), > > > but not to hwmon. > > > > Err, I still don't think I'm catching your drift but let me stop you > > right there: amd_nb is not there only for hwmon/k10temp. It is a small > > interface glue if you will, which exports the CPU functionality in PCI > > config space to other consumers. > > Also, thermal and hwmon are orthogonal, just like hwmon and iio. One would > typically have a driver in one subsystem, in some cases bridging to the > other subsystem, but one would not have drivers in both subsystems. > I think Bjorn is suggesting that the k10temp driver should move to the > thermal subsystem, though I don't really understand what that has to do > with finding the correct PCI device(s) to query. Or maybe I misunderstand. Not really; I'm suggesting that it's possible to make k10temp work in a way that requires less knowledge about the AMD topology and hence fewer changes for new platforms. Today, k10temp needs CPU/PCI topology info from amd_nb to read the sensors via PCI registers. k10temp could conceivably read the sensors via ACPI methods, which means that topology info would be in the firmware and k10temp wouldn't depend on amd_nb. Bjorn
On Wed, Nov 07, 2018 at 11:10:44AM -0600, Bjorn Helgaas wrote: > No, the idea was more that that temp monitoring, e.g., k10temp, could > be independent of amd_nb. > > But I can tell this idea isn't going anywhere, so let's just forget > that I stuck my neck out and let it die on the vine :) No no, your idea makes sense. I see the advantage of not having to do enablement so that old k10temp can load on new machines. Sure. We just need to find out about the PNP ID... :-)
On Tue, 2018-11-06 at 17:20 -0600, Bjorn Helgaas wrote: > [+cc Sumeet, Srinivas for INT3401 questions below] > [Beginning of thread: > https://lore.kernel.org/linux-pci/20181102181055.130531-1-brian.woods@amd.com/ > ] > > On Tue, Nov 06, 2018 at 11:00:59PM +0100, Borislav Petkov wrote: > > On Tue, Nov 06, 2018 at 03:42:56PM -0600, Bjorn Helgaas wrote: > > > This isn't some complicated new device where the programming > > > model > > > changed on the new CPU. This is a thermometer that was already > > > supported. ACPI provides plenty of functionality that could be > > > used > > > to support this generically, e.g., see drivers/acpi/thermal.c, > > > drivers/thermal/int340x_thermal/processor_thermal_device.c, etc. > > > > Ok, you say ACPI but how do you envision practically doing that? I > > mean, > > this is used by old boxes too - ever since K8. So how do we go and > > add > > ACPI functionality to old boxes? > > > > Or do you mean it should simply be converted to do > > pci_register_driver() > > with a struct pci_driver pointer which has all those PCI device IDs > > in a > > table? I'm looking at the last example > > drivers/thermal/int340x_thermal/processor_thermal_device.c you gave > > above. > > No, there would be no need to change anything for boxes already in > the > field. But for *new* systems, you could make devices or thermal > zones > in the ACPI namespace (they might even already be there for use by > Windows). > > drivers/thermal/int340x_thermal/processor_thermal_device.c claims > either INT3401 ACPI devices or listed PCI devices. To enumerate a driver to get processor temperature and get power properties, we have two methods: - The older Atom processors valleyview and Baytrail had no PCI device for the processor thermal management. There was INT3401 ACPI device to handle this. - The newer processors for core and Atom, has a dedicate PCI device and there is no INT3401 ACPI device anymore. Since OEM systems will have different power properties and thermal trips, there is a companion ACPI device, which provides PPCC and thermal trips and optionally output from another temperature sensor from the default on processor cores. Thanks, Srinivas > It looks like it > tries the platform (INT3401) devices first, and if it finds any, it > ignores any matching PCI devices. This *could* be so it uses INT3401 > devices on new platforms and falls back to the PCI devices otherwise, > although there *are* recent updates that add PCI IDs. > > It looks like INT3401 is Intel-specific since it uses a PPCC method > which isn't defined by the ACPI spec. But AMD could define a similar > PNP ID and have new platforms expose ACPI devices with _TMP methods > that know how to read the sensor on that platform. > > Or maybe even drivers/acpi/thermal.c, which claims every Thermal Zone > (ACPI 6.2, sec 11), would be sufficient. I don't know what the > relationship between hwmon and other thermal stuff, e.g., > Documentation/thermal/sysfs-api.txt is. acpi/thermal.c looks tied > into the drivers/thermal stuff (it registers "thermal_zone" devices), > but not to hwmon. > > > > But maybe there's some real value in the nitty-gritty device- > > > specific > > > code in amd_nb.c. If so, I guess you're stuck with updates like > > > this > > > and negotiating with the distros to do backports and new > > > releases. > > > > Well, even if it is converted to a different registration scheme, > > you > > still need to add new PCI device IDs to the table, no? So *some* > > sort of > > enablement still needs to happen. 
> > As long as we have a driver that knows how to claim a known PNP ID, > and every new platform exposes a device compatible with that ID, the > driver should just work. > > Bjorn
On Wed, Nov 07, 2018 at 05:07:07PM +0100, Borislav Petkov wrote: > Thanks for explaining. > > So I don't know about temp sensors - I'm talking about amd_nb which is > something... well, I explained already what it is in my previous mail so > I won't repeat myself. > > Anyway, if there is such a PNP ID device - and I believe I have stumbled > upon some blurb about it in the BKDGs - which says "this device > represents the PCI device IDs of a CPU" and if that can be used to > register amd_nb through it, then sure, I don't see why not. > > This way, when new CPU comes out and the same PNP ID device is present, > amd_nb would load, sure. > > Maybe Brian knows more on the subject... > > -- > Regards/Gruss, > Boris. > > Good mailing practices for 400: avoid top-posting and trim the reply. More on exactly what? If it's the PNP ID device, if you can let me know where in the BKDG you found it before so I can look for keywords etc? Because I'd need to find those for k8 onwards so.
On Wed, Nov 07, 2018 at 11:15:37AM -0800, Srinivas Pandruvada wrote: > On Tue, 2018-11-06 at 17:20 -0600, Bjorn Helgaas wrote: > > [+cc Sumeet, Srinivas for INT3401 questions below] > > [Beginning of thread: > > > https://lore.kernel.org/linux-pci/20181102181055.130531-1-brian.woods@amd.com/ > > ] > > > > On Tue, Nov 06, 2018 at 11:00:59PM +0100, Borislav Petkov wrote: > > > On Tue, Nov 06, 2018 at 03:42:56PM -0600, Bjorn Helgaas wrote: > > > > This isn't some complicated new device where the programming > > > > model changed on the new CPU. This is a thermometer that was > > > > already supported. ACPI provides plenty of functionality that > > > > could be used to support this generically, e.g., see > > > > drivers/acpi/thermal.c, > > > > drivers/thermal/int340x_thermal/processor_thermal_device.c, > > > > etc. > > > > > > Ok, you say ACPI but how do you envision practically doing that? > > > I mean, this is used by old boxes too - ever since K8. So how do > > > we go and add ACPI functionality to old boxes? > > > > > > Or do you mean it should simply be converted to do > > > pci_register_driver() with a struct pci_driver pointer which has > > > all those PCI device IDs in a table? I'm looking at the last > > > example > > > drivers/thermal/int340x_thermal/processor_thermal_device.c you > > > gave above. > > > > No, there would be no need to change anything for boxes already in > > the field. But for *new* systems, you could make devices or > > thermal zones in the ACPI namespace (they might even already be > > there for use by Windows). > > > > drivers/thermal/int340x_thermal/processor_thermal_device.c claims > > either INT3401 ACPI devices or listed PCI devices. > > To enumerate a driver to get processor temperature and get power > properties, we have two methods: > - The older Atom processors valleyview and Baytrail had no PCI device > for the processor thermal management. There was INT3401 ACPI device to > handle this. > > - The newer processors for core and Atom, has a dedicate PCI device and > there is no INT3401 ACPI device anymore. This is really interesting because it's completely the reverse of what I would have expected. You use INT3401 on the *older* processors, where int3401_add() is called for any platform devices with INT3401 PNP ID: int3401_add(plat_dev) # platform/ACPI .probe proc_thermal_add(dev) adev = ACPI_COMPANION(dev) int340x_thermal_zone_add(adev) thermal_zone_device_register() The sensors are read in this path, where thermal_zone_get_temp() is the generic thermal entry point: thermal_zone_get_temp() tz->ops->get_temp() int340x_thermal_get_zone_temp() # ops.get_temp acpi_evaluate_integer(..., "_TMP", ...) The above works for any platform that supplies the INT3401 device because the _TMP method that actually reads the sensor is supplied by the platform firmware. On *newer* processors, you apparently use this path: proc_thermal_pci_probe(pci_dev) # PCI .probe pci_enable_device(pci_dev) proc_thermal_add(dev) adev = ACPI_COMPANION(dev) int340x_thermal_zone_add(adev) thermal_zone_device_register() Except for enabling the PCI device and a BSW_THERMAL hack, this is exactly the *SAME*: you add a thermal zone for the ACPI device and read the sensor using ACPI _TMP methods. But now you have to add new PCI IDs (Skylake, Broxton, CannonLake, CoffeeLake, GeminiLake, etc) for every new platform. This seems like a regression, not an improvement. What am I missing? 
> Since OEM systems will have different power properties and thermal > trips, there is a companion ACPI device, which provides PPCC and > thermal trips and optionally output from another temperature sensor > from the default on processor cores. Bjorn
On Wed, 2018-11-07 at 15:31 -0600, Bjorn Helgaas wrote: > On Wed, Nov 07, 2018 at 11:15:37AM -0800, Srinivas Pandruvada wrote: > > On Tue, 2018-11-06 at 17:20 -0600, Bjorn Helgaas wrote: > > > [+cc Sumeet, Srinivas for INT3401 questions below] > > > [Beginning of thread: > > > > > > > https://lore.kernel.org/linux-pci/20181102181055.130531-1-brian.woods@amd.com/ > > > ] > > > > > > On Tue, Nov 06, 2018 at 11:00:59PM +0100, Borislav Petkov wrote: > > > > On Tue, Nov 06, 2018 at 03:42:56PM -0600, Bjorn Helgaas wrote: > > > > > This isn't some complicated new device where the programming > > > > > model changed on the new CPU. This is a thermometer that was > > > > > already supported. ACPI provides plenty of functionality > > > > > that > > > > > could be used to support this generically, e.g., see > > > > > drivers/acpi/thermal.c, > > > > > drivers/thermal/int340x_thermal/processor_thermal_device.c, > > > > > etc. > > > > > > > > Ok, you say ACPI but how do you envision practically doing > > > > that? > > > > I mean, this is used by old boxes too - ever since K8. So how > > > > do > > > > we go and add ACPI functionality to old boxes? > > > > > > > > Or do you mean it should simply be converted to do > > > > pci_register_driver() with a struct pci_driver pointer which > > > > has > > > > all those PCI device IDs in a table? I'm looking at the last > > > > example > > > > drivers/thermal/int340x_thermal/processor_thermal_device.c you > > > > gave above. > > > > > > No, there would be no need to change anything for boxes already > > > in > > > the field. But for *new* systems, you could make devices or > > > thermal zones in the ACPI namespace (they might even already be > > > there for use by Windows). > > > > > > drivers/thermal/int340x_thermal/processor_thermal_device.c claims > > > either INT3401 ACPI devices or listed PCI devices. > > > > To enumerate a driver to get processor temperature and get power > > properties, we have two methods: > > - The older Atom processors valleyview and Baytrail had no PCI > > device > > for the processor thermal management. There was INT3401 ACPI device > > to > > handle this. > > > > - The newer processors for core and Atom, has a dedicate PCI device > > and > > there is no INT3401 ACPI device anymore. > > This is really interesting because it's completely the reverse of > what > I would have expected. > > You use INT3401 on the *older* processors, where int3401_add() is > called for any platform devices with INT3401 PNP ID: > > int3401_add(plat_dev) # platform/ACPI .probe > proc_thermal_add(dev) > adev = ACPI_COMPANION(dev) > int340x_thermal_zone_add(adev) > thermal_zone_device_register() > > The sensors are read in this path, where thermal_zone_get_temp() is > the generic thermal entry point: > > thermal_zone_get_temp() > tz->ops->get_temp() > int340x_thermal_get_zone_temp() # ops.get_temp > acpi_evaluate_integer(..., "_TMP", ...) > > The above works for any platform that supplies the INT3401 device > because the _TMP method that actually reads the sensor is supplied by > the platform firmware. > > On *newer* processors, you apparently use this path: > > proc_thermal_pci_probe(pci_dev) # PCI .probe > pci_enable_device(pci_dev) > proc_thermal_add(dev) > adev = ACPI_COMPANION(dev) > int340x_thermal_zone_add(adev) > thermal_zone_device_register() > > Except for enabling the PCI device and a BSW_THERMAL hack, this is > exactly the *SAME*: you add a thermal zone for the ACPI device and > read the sensor using ACPI _TMP methods. 
> > But now you have to add new PCI IDs (Skylake, Broxton, CannonLake, > CoffeeLake, GeminiLake, etc) for every new platform. This seems like > a regression, not an improvement. What am I missing? Path of activation via both devices is same. Both will call proc_thermal_add(). There is no INT3401 on any newer atom or core platforms, so you can't enumerate on this device. We don't control what ACPI device is present on a system. It depends on what the other non-Linux OS is using. Also the PCI ACPI companion device doesn't have to supply a _TMP method. The INT3401 is a special device which must have _TMP otherwise firmware validation will fail. Yes, if there is INT3401 on all platforms we don't need PCI enumeration just for temperature and trips. But this PCI device brings in lots of other features which are still in works. Not sure about the context of discussion here. Are you suggesting some core changes where we don't have to add PCI ids like Skylake etc. ? Thanks, Srinivas > > > Since OEM systems will have different power properties and thermal > > trips, there is a companion ACPI device, which provides PPCC and > > thermal trips and optionally output from another temperature sensor > > from the default on processor cores. > > Bjorn
On Wed, Nov 07, 2018 at 02:42:00PM -0800, Srinivas Pandruvada wrote: > On Wed, 2018-11-07 at 15:31 -0600, Bjorn Helgaas wrote: > > On Wed, Nov 07, 2018 at 11:15:37AM -0800, Srinivas Pandruvada wrote: > > > On Tue, 2018-11-06 at 17:20 -0600, Bjorn Helgaas wrote: > > > > [+cc Sumeet, Srinivas for INT3401 questions below] > > > > [Beginning of thread: > https://lore.kernel.org/linux-pci/20181102181055.130531-1-brian.woods@amd.com/ > > > > ] > > > > > > > > On Tue, Nov 06, 2018 at 11:00:59PM +0100, Borislav Petkov wrote: > > > > > On Tue, Nov 06, 2018 at 03:42:56PM -0600, Bjorn Helgaas wrote: > > > > > > This isn't some complicated new device where the programming > > > > > > model changed on the new CPU. This is a thermometer that was > > > > > > already supported. ACPI provides plenty of functionality > > > > > > that > > > > > > could be used to support this generically, e.g., see > > > > > > drivers/acpi/thermal.c, > > > > > > drivers/thermal/int340x_thermal/processor_thermal_device.c, > > > > > > etc. > > > > > > > > > > Ok, you say ACPI but how do you envision practically doing > > > > > that? > > > > > I mean, this is used by old boxes too - ever since K8. So how > > > > > do > > > > > we go and add ACPI functionality to old boxes? > > > > > > > > > > Or do you mean it should simply be converted to do > > > > > pci_register_driver() with a struct pci_driver pointer which > > > > > has > > > > > all those PCI device IDs in a table? I'm looking at the last > > > > > example > > > > > drivers/thermal/int340x_thermal/processor_thermal_device.c you > > > > > gave above. > > > > > > > > No, there would be no need to change anything for boxes already > > > > in > > > > the field. But for *new* systems, you could make devices or > > > > thermal zones in the ACPI namespace (they might even already be > > > > there for use by Windows). > > > > > > > > drivers/thermal/int340x_thermal/processor_thermal_device.c claims > > > > either INT3401 ACPI devices or listed PCI devices. > > > > > > To enumerate a driver to get processor temperature and get power > > > properties, we have two methods: > > > - The older Atom processors valleyview and Baytrail had no PCI > > > device > > > for the processor thermal management. There was INT3401 ACPI device > > > to > > > handle this. > > > > > > - The newer processors for core and Atom, has a dedicate PCI device > > > and > > > there is no INT3401 ACPI device anymore. > > > > This is really interesting because it's completely the reverse of > > what > > I would have expected. > > > > You use INT3401 on the *older* processors, where int3401_add() is > > called for any platform devices with INT3401 PNP ID: > > > > int3401_add(plat_dev) # platform/ACPI .probe > > proc_thermal_add(dev) > > adev = ACPI_COMPANION(dev) > > int340x_thermal_zone_add(adev) > > thermal_zone_device_register() > > > > The sensors are read in this path, where thermal_zone_get_temp() is > > the generic thermal entry point: > > > > thermal_zone_get_temp() > > tz->ops->get_temp() > > int340x_thermal_get_zone_temp() # ops.get_temp > > acpi_evaluate_integer(..., "_TMP", ...) > > > > The above works for any platform that supplies the INT3401 device > > because the _TMP method that actually reads the sensor is supplied by > > the platform firmware. 
> > > > On *newer* processors, you apparently use this path: > > > > proc_thermal_pci_probe(pci_dev) # PCI .probe > > pci_enable_device(pci_dev) > > proc_thermal_add(dev) > > adev = ACPI_COMPANION(dev) > > int340x_thermal_zone_add(adev) > > thermal_zone_device_register() > > > > Except for enabling the PCI device and a BSW_THERMAL hack, this is > > exactly the *SAME*: you add a thermal zone for the ACPI device and > > read the sensor using ACPI _TMP methods. > > > > But now you have to add new PCI IDs (Skylake, Broxton, CannonLake, > > CoffeeLake, GeminiLake, etc) for every new platform. This seems like > > a regression, not an improvement. What am I missing? > > Path of activation via both devices is same. Both will call > proc_thermal_add(). > > There is no INT3401 on any newer atom or core platforms, so you can't > enumerate on this device. We don't control what ACPI device is present > on a system. It depends on what the other non-Linux OS is using. Sure, you can't *force* OEMs to supply a given ACPI device, but you can certainly say "if you want this functionality, supply INT3401 devices." That's what you do with PNP0A03 (PCI host bridges), for example. If an OEM doesn't supply PNP0A03 devices, the system can boot just fine as long as you don't need PCI. This model of using the PCI IDs forces OS vendors to release updates for every new platform. I guess you must have considered that and decided whatever benefit you're getting was worth the cost. > Also the PCI ACPI companion device doesn't have to supply a _TMP > method. > The INT3401 is a special device which must have _TMP otherwise firmware > validation will fail. Yes, if there is INT3401 on all platforms we > don't need PCI enumeration just for temperature and trips. But this PCI > device brings in lots of other features which are still in works. You can add new features in ACPI devices just as well as with PCI enumeration. > Not sure about the context of discussion here. Are you suggesting some > core changes where we don't have to add PCI ids like Skylake etc. ? This started because I suggested that AMD was making work for themselves by exposing more topology detail than necessary, which means they have to change amd_nb.c and add PCI IDs to k10temp.c to accommodate new platforms. And it looks like Intel is doing the same thing by moving from a model where some platform specifics can be encapsulated in ACPI methods so a new platform doesn't force an OS release, to a model where new platforms require OS driver updates. Obviously I don't know all the new features you're planning, so I'll stop pushing on this and wasting your time. Bjorn
[...] > Sure, you can't *force* OEMs to supply a given ACPI device, but you > can certainly say "if you want this functionality, supply INT3401 > devices." That's what you do with PNP0A03 (PCI host bridges), for > example. If an OEM doesn't supply PNP0A03 devices, the system can > boot just fine as long as you don't need PCI. > > This model of using the PCI IDs forces OS vendors to release updates > for every new platform. I guess you must have considered that and > decided whatever benefit you're getting was worth the cost. Not worth cost. This is a pain. Every release we end up adding a single line change to many drivers adding a PCI device id. Since there is no unique class_mask for PCI device for these devices, we need to add device_id for each generation even if there is no change. Instead if we have some feature to say don't enumerate for PCI device id < X and a black list, it will save lot of work. Thanks, Srinivas
On Wed, 2018-11-07 at 15:30 -0800, Srinivas Pandruvada wrote: > [...] > > > Sure, you can't *force* OEMs to supply a given ACPI device, but you > > can certainly say "if you want this functionality, supply INT3401 > > devices." That's what you do with PNP0A03 (PCI host bridges), for > > example. If an OEM doesn't supply PNP0A03 devices, the system can > > boot just fine as long as you don't need PCI. > > > > This model of using the PCI IDs forces OS vendors to release > > updates > > for every new platform. I guess you must have considered that and > > decided whatever benefit you're getting was worth the cost. > > Not worth cost. This is a pain. Every release we end up adding a > single > line change to many drivers adding a PCI device id. > Since there is no unique class_mask for PCI device for these devices, > we need to add device_id for each generation even if there is no > change. > Instead if we have some feature to say don't enumerate for PCI device > id < X and a black list, it will save lot of work. This still needs some work on our internal PCI device allocation scheme , where we can reserve a block of ids for a PCI device for same functionality from generation to generation. Thanks, Srinivas
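For illustration only, the two matching styles being contrasted above might look like this in a pci_device_id table; the class code 0x118000 and the device ID are placeholders, not real reservations:

#include <linux/pci.h>

/* Today: the table grows by one line for every SoC generation. */
static const struct pci_device_id proc_thermal_ids_today[] = {
	{ PCI_VDEVICE(INTEL, 0x1903) },	/* one device ID per generation (value illustrative) */
	/* ... plus Broxton, CannonLake, CoffeeLake, GeminiLake, ... */
	{ }
};

/*
 * With a reserved class code, a single class-based entry would match
 * every future generation without adding new device IDs:
 */
static const struct pci_device_id proc_thermal_ids_by_class[] = {
	{ .vendor = PCI_VENDOR_ID_INTEL, .device = PCI_ANY_ID,
	  .subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID,
	  .class = 0x118000, .class_mask = 0xffffff },
	{ }
};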
On 11/7/18 3:14 PM, Bjorn Helgaas wrote: > >> >> There is no INT3401 on any newer atom or core platforms, so you can't >> enumerate on this device. We don't control what ACPI device is present >> on a system. It depends on what the other non-Linux OS is using. > > Sure, you can't *force* OEMs to supply a given ACPI device, but you > can certainly say "if you want this functionality, supply INT3401 > devices." That's what you do with PNP0A03 (PCI host bridges), for > example. If an OEM doesn't supply PNP0A03 devices, the system can > boot just fine as long as you don't need PCI. > > This model of using the PCI IDs forces OS vendors to release updates > for every new platform. I guess you must have considered that and > decided whatever benefit you're getting was worth the cost. > I really dislike where this is going. Board vendors - and that included Intel when Intel was still selling boards - have a long history of only making mandatory methods available in ACPI. Pretty much all of them don't make hardware monitoring information available via ACPI. This is a pain especially for laptops where the information is provided by an embedded controller. On systems with Super-IO chips with dedicated hardware monitoring functionality, they often go as far as signing mutual NDAs with chip vendors, which lets both the board and the chip vendor claim that they can not provide chip specifications to third parties, aka users. You are pretty much extending that to CPU temperature monitoring. The fallout, if adopted, will be that it will effectively no longer be possible to monitor the temperature on chips supporting this "feature". I do not think that would be a good idea. Guenter
On Wed, Nov 07, 2018 at 05:40:14PM -0800, Guenter Roeck wrote: > On 11/7/18 3:14 PM, Bjorn Helgaas wrote: > > > > > > > > There is no INT3401 on any newer atom or core platforms, so you can't > > > enumerate on this device. We don't control what ACPI device is present > > > on a system. It depends on what the other non-Linux OS is using. > > > > Sure, you can't *force* OEMs to supply a given ACPI device, but you > > can certainly say "if you want this functionality, supply INT3401 > > devices." That's what you do with PNP0A03 (PCI host bridges), for > > example. If an OEM doesn't supply PNP0A03 devices, the system can > > boot just fine as long as you don't need PCI. > > > > This model of using the PCI IDs forces OS vendors to release updates > > for every new platform. I guess you must have considered that and > > decided whatever benefit you're getting was worth the cost. > > > > I really dislike where this is going. Board vendors - and that included > Intel when Intel was still selling boards - have a long history of only > making mandatory methods available in ACPI. Pretty much all of them don't > make hardware monitoring information available via ACPI. This is a pain > especially for laptops where the information is provided by an embedded > controller. On systems with Super-IO chips with dedicated hardware > monitoring functionality, they often go as far as signing mutual NDAs > with chip vendors, which lets both the board and the chip vendor claim > that they can not provide chip specifications to third parties, aka > users. > > You are pretty much extending that to CPU temperature monitoring. The > fallout, if adopted, will be that it will effectively no longer be > possible to monitor the temperature on chips supporting this > "feature". > > I do not think that would be a good idea. I wasn't aware of these political implications. Thanks for raising them. I'm not in a position to balance those implications vs the technical question of minimizing the burden of supporting new platforms, so I'll try again to bow out of this. Bjorn
Add support for new processors which have multiple PCI root complexes
per data fabric/SMN interface. The interfaces per root complex are
redundant and should be skipped. This makes sure the DF/SMN interfaces
get accessed via the correct root complex.

Ex:
DF/SMN 0 -> 60
            40
            20
            00
DF/SMN 1 -> e0
            c0
            a0
            80

Signed-off-by: Brian Woods <brian.woods@amd.com>
---
 arch/x86/kernel/amd_nb.c | 41 +++++++++++++++++++++++++++++++++++------
 1 file changed, 35 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index 19d489ee2b1e..c0bf26aeb7c3 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -213,7 +213,10 @@ int amd_cache_northbridges(void)
 	const struct pci_device_id *root_ids = amd_root_ids;
 	struct pci_dev *root, *misc, *link;
 	struct amd_northbridge *nb;
-	u16 i = 0;
+	u16 roots_per_misc = 0;
+	u16 misc_count = 0;
+	u16 root_count = 0;
+	u16 i, j;
 
 	if (amd_northbridges.num)
 		return 0;
@@ -226,26 +229,52 @@ int amd_cache_northbridges(void)
 
 	misc = NULL;
 	while ((misc = next_northbridge(misc, misc_ids)) != NULL)
-		i++;
+		misc_count++;
 
-	if (!i)
+	root = NULL;
+	while ((root = next_northbridge(root, root_ids)) != NULL)
+		root_count++;
+
+	if (!misc_count)
 		return -ENODEV;
 
-	nb = kcalloc(i, sizeof(struct amd_northbridge), GFP_KERNEL);
+	if (root_count) {
+		roots_per_misc = root_count / misc_count;
+
+		/*
+		 * There should be _exactly_ N roots for each DF/SMN
+		 * interface.
+		 */
+		if (!roots_per_misc || (root_count % roots_per_misc)) {
+			pr_info("Unsupported AMD DF/PCI configuration found\n");
+			return -ENODEV;
+		}
+	}
+
+	nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL);
 	if (!nb)
 		return -ENOMEM;
 
 	amd_northbridges.nb = nb;
-	amd_northbridges.num = i;
+	amd_northbridges.num = misc_count;
 
 	link = misc = root = NULL;
-	for (i = 0; i != amd_northbridges.num; i++) {
+	for (i = 0; i < amd_northbridges.num; i++) {
 		node_to_amd_nb(i)->root = root =
 			next_northbridge(root, root_ids);
 		node_to_amd_nb(i)->misc = misc =
 			next_northbridge(misc, misc_ids);
 		node_to_amd_nb(i)->link = link =
 			next_northbridge(link, link_ids);
+
+		/*
+		 * If there are more root devices than data fabric/SMN,
+		 * interfaces, then the root devices per DF/SMN
+		 * interface are redundant and N-1 should be skipped so
+		 * they aren't mapped incorrectly.
+		 */
+		for (j = 1; j < roots_per_misc; j++)
+			root = next_northbridge(root, root_ids);
 	}
 
 	if (amd_gart_present())
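Worked through with the example topology from the commit message (two DF/SMN interfaces, eight root devices), and assuming the roots are found in ascending bus order, the new code behaves as follows:

  root_count = 8, misc_count = 2  =>  roots_per_misc = 8 / 2 = 4

  i = 0: root = bus 00 for DF/SMN 0; the j loop skips 20, 40, 60
  i = 1: root = bus 80 for DF/SMN 1; the j loop skips a0, c0, e0

Without the skip loop, DF/SMN 1 would have been paired with the root on bus 20, i.e. one of DF/SMN 0's redundant roots.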