[00/16] AMD NB and SMN rework

Message ID 20241023172150.659002-1-yazen.ghannam@amd.com (mailing list archive)

Message

Yazen Ghannam Oct. 23, 2024, 5:21 p.m. UTC
Hi all,

The theme of this set is decoupling the "AMD node" concept from the
legacy northbridge support.

Additionally, AMD System Management Network (SMN) access code is
decoupled and expanded too.

Patches 1-3 begin reducing the scope of AMD_NB.

Patches 4-9 begin moving generic AMD node support out of AMD_NB.

Patches 10-13 move SMN support out of AMD_NB and do some refactoring.

Patch 14 has HSMP reuse SMN functionality.

Patches 15-16 address userspace access to SMN.

I say "begin" above because there is more to do here. Ultimately, AMD_NB
should only be needed for code used on legacy systems with northbridges.
Also, any and all SMN users in the kernel need to be updated to use the
central SMN code. Local solutions should be avoided.

Thanks,
Yazen

Mario Limonciello (4):
  x86/amd_nb, x86/amd_node: Simplify amd_pci_dev_to_node_id()
  x86/amd_nb: Move SMN access code to a new amd_smn driver
  x86/amd_smn: Add SMN offsets to exclusive region access
  x86/amd_smn: Add support for debugfs access to SMN registers

Yazen Ghannam (12):
  x86/mce/amd: Remove shared threshold bank plumbing
  x86/amd_nb: Restrict init function to AMD-based systems
  x86/amd_nb: Clean up early_is_amd_nb()
  x86: Start moving AMD Node functionality out of AMD_NB
  x86/amd_nb: Simplify function 4 search
  x86/amd_nb: Simplify root device search
  x86/amd_nb: Use topology info to get AMD node count
  x86/amd_nb: Simplify function 3 search
  x86/amd_smn: Fixup __amd_smn_rw()
  x86/amd_smn: Remove dependency on AMD_NB
  x86/amd_smn: Use defines for register offsets
  x86/amd_smn, platform/x86/amd/hsmp: Have HSMP use SMN

 MAINTAINERS                          |  15 ++
 arch/x86/Kconfig                     |   9 +-
 arch/x86/include/asm/amd_nb.h        |  53 +----
 arch/x86/include/asm/amd_node.h      |  39 ++++
 arch/x86/include/asm/amd_smn.h       |  14 ++
 arch/x86/kernel/Makefile             |   2 +
 arch/x86/kernel/amd_nb.c             | 294 ++-------------------------
 arch/x86/kernel/amd_node.c           |  91 +++++++++
 arch/x86/kernel/amd_smn.c            | 269 ++++++++++++++++++++++++
 arch/x86/kernel/cpu/mce/amd.c        | 127 +++---------
 arch/x86/pci/fixup.c                 |   4 +-
 drivers/edac/Kconfig                 |   1 +
 drivers/edac/amd64_edac.c            |   1 +
 drivers/hwmon/Kconfig                |   2 +-
 drivers/hwmon/k10temp.c              |   2 +-
 drivers/platform/x86/amd/Kconfig     |   2 +-
 drivers/platform/x86/amd/hsmp.c      |  32 +--
 drivers/platform/x86/amd/pmc/Kconfig |   2 +-
 drivers/platform/x86/amd/pmc/pmc.c   |   2 +-
 drivers/platform/x86/amd/pmf/Kconfig |   2 +-
 drivers/platform/x86/amd/pmf/core.c  |   2 +-
 drivers/ras/amd/atl/Kconfig          |   1 +
 drivers/ras/amd/atl/internal.h       |   1 +
 23 files changed, 495 insertions(+), 472 deletions(-)
 create mode 100644 arch/x86/include/asm/amd_node.h
 create mode 100644 arch/x86/include/asm/amd_smn.h
 create mode 100644 arch/x86/kernel/amd_node.c
 create mode 100644 arch/x86/kernel/amd_smn.c


base-commit: 94559bac4d403b1575b32a863f5c0429cdd33eaa

Comments

Bjorn Helgaas Oct. 23, 2024, 5:59 p.m. UTC | #1
On Wed, Oct 23, 2024 at 05:21:34PM +0000, Yazen Ghannam wrote:
> Hi all,
> 
> The theme of this set is decoupling the "AMD node" concept from the
> legacy northbridge support.
> 
> Additionally, AMD System Management Network (SMN) access code is
> decoupled and expanded too.
> 
> Patches 1-3 begin reducing the scope of AMD_NB.
> 
> Patches 4-9 begin moving generic AMD node support out of AMD_NB.
> 
> Patches 10-13 move SMN support out of AMD_NB and do some refactoring.
> 
> Patch 14 has HSMP reuse SMN functionality.
> 
> Patches 15-16 address userspace access to SMN.
> 
> I say "begin" above because there is more to do here. Ultimately, AMD_NB
> should only be needed for code used on legacy systems with northbridges.
> Also, any and all SMN users in the kernel need to be updated to use the
> central SMN code. Local solutions should be avoided.

Glad to see many of the PCI device IDs going away; thanks for working
on that!

The use of pci_get_slot() and pci_get_domain_bus_and_slot() is not
ideal since all those pci_get_*() interfaces are kind of ugly in my
opinion, and using them means we have to encode topology details in
the kernel.  But this still seems like a big improvement.

Bjorn

Yazen Ghannam Oct. 24, 2024, 4:01 p.m. UTC | #2
On Wed, Oct 23, 2024 at 12:59:28PM -0500, Bjorn Helgaas wrote:
> On Wed, Oct 23, 2024 at 05:21:34PM +0000, Yazen Ghannam wrote:
> > Hi all,
> > 
> > The theme of this set is decoupling the "AMD node" concept from the
> > legacy northbridge support.
> > 
> > Additionally, AMD System Management Network (SMN) access code is
> > decoupled and expanded too.
> > 
> > Patches 1-3 begin reducing the scope of AMD_NB.
> > 
> > Patches 4-9 begin moving generic AMD node support out of AMD_NB.
> > 
> > Patches 10-13 move SMN support out of AMD_NB and do some refactoring.
> > 
> > Patch 14 has HSMP reuse SMN functionality.
> > 
> > Patches 15-16 address userspace access to SMN.
> > 
> > I say "begin" above because there is more to do here. Ultimately, AMD_NB
> > should only be needed for code used on legacy systems with northbridges.
> > Also, any and all SMN users in the kernel need to be updated to use the
> > central SMN code. Local solutions should be avoided.
> 
> Glad to see many of the PCI device IDs going away; thanks for working
> on that!
> 
> The use of pci_get_slot() and pci_get_domain_bus_and_slot() is not
> ideal since all those pci_get_*() interfaces are kind of ugly in my
> opinion, and using them means we have to encode topology details in
> the kernel.  But this still seems like a big improvement.
> 

Thanks for the feedback. Hopefully, we'll come to some improved
solution. :)

Can you please elaborate on your concern? Is it about saying "thing X is
always at SBDF A:B:C.D" or something else?

Thanks,
Yazen

Bjorn Helgaas Oct. 24, 2024, 5:46 p.m. UTC | #3
On Thu, Oct 24, 2024 at 12:01:59PM -0400, Yazen Ghannam wrote:
> On Wed, Oct 23, 2024 at 12:59:28PM -0500, Bjorn Helgaas wrote:
> > On Wed, Oct 23, 2024 at 05:21:34PM +0000, Yazen Ghannam wrote:
> > > Hi all,
> > > 
> > > The theme of this set is decoupling the "AMD node" concept from the
> > > legacy northbridge support.
> > > 
> > > Additionally, AMD System Management Network (SMN) access code is
> > > decoupled and expanded too.
> > > 
> > > Patches 1-3 begin reducing the scope of AMD_NB.
> > > 
> > > Patches 4-9 begin moving generic AMD node support out of AMD_NB.
> > > 
> > > Patches 10-13 move SMN support out of AMD_NB and do some refactoring.
> > > 
> > > Patch 14 has HSMP reuse SMN functionality.
> > > 
> > > Patches 15-16 address userspace access to SMN.
> > > 
> > > I say "begin" above because there is more to do here. Ultimately, AMD_NB
> > > should only be needed for code used on legacy systems with northbridges.
> > > Also, any and all SMN users in the kernel need to be updated to use the
> > > central SMN code. Local solutions should be avoided.
> > 
> > Glad to see many of the PCI device IDs going away; thanks for working
> > on that!
> > 
> > The use of pci_get_slot() and pci_get_domain_bus_and_slot() is not
> > ideal since all those pci_get_*() interfaces are kind of ugly in my
> > opinion, and using them means we have to encode topology details in
> > the kernel.  But this still seems like a big improvement.
> 
> Thanks for the feedback. Hopefully, we'll come to some improved
> solution. :)
> 
> Can you please elaborate on your concern? Is it about saying "thing X is
> always at SBDF A:B:C.D" or something else?

"Thing X is always at SBDF A:B:C.D" is one big reason.  "A:B:C.D" says
nothing about the actual functionality of the device.  A PCI
Vendor/Device ID or a PNP ID identifies the device programming model
independent of its geographical location.  Inferring the functionality
and programming model from the location is a maintenance issue because
hardware may change the address.
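
To illustrate the distinction above, here is a minimal userspace sketch (not
code from this series). The struct mock_pci_dev type, both helper functions,
and the device ID 0x1234 are hypothetical stand-ins; only the AMD vendor ID
0x1022 is real. It contrasts a lookup keyed on the geographical address with
one keyed on the Vendor/Device ID.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a PCI function; NOT the kernel's struct pci_dev. */
struct mock_pci_dev {
	uint16_t segment;
	uint8_t  bus, dev, fn;     /* geographical address */
	uint16_t vendor, device;   /* identity / programming model */
};

/* Lookup by location: silently stops working if hardware moves the function. */
static const struct mock_pci_dev *
find_by_address(const struct mock_pci_dev *devs, size_t n,
		uint16_t segment, uint8_t bus, uint8_t dev, uint8_t fn)
{
	for (size_t i = 0; i < n; i++) {
		if (devs[i].segment == segment && devs[i].bus == bus &&
		    devs[i].dev == dev && devs[i].fn == fn)
			return &devs[i];
	}
	return NULL;
}

/* Lookup by identity: independent of where the function sits. */
static const struct mock_pci_dev *
find_by_id(const struct mock_pci_dev *devs, size_t n,
	   uint16_t vendor, uint16_t device)
{
	for (size_t i = 0; i < n; i++) {
		if (devs[i].vendor == vendor && devs[i].device == device)
			return &devs[i];
	}
	return NULL;
}
```

If an entry's dev number changes between hardware generations,
find_by_address() with the old address returns NULL (or, worse, some other
function that now lives there), while find_by_id() still finds the entry.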

PCI bus numbers are under software control, so in general it's not
safe to rely on them, although in this case these devices are probably
on root buses where the bus number is either fixed or determined by
BIOS configuration of the host bridge.

I don't like the pci_get_*() functions because they break the driver
model.  The usual .probe() model binds a device to a driver, which
essentially means the driver owns the device and its resources, and
the driver doesn't have to worry about other code interfering.

Unlike pci_get_*(), the .probe()/.remove() model automatically handles
hotplug without extra things like notifiers in the driver.  Hotplug
may not be an issue in this particular case, but it requires specific
platform knowledge to be sure.  Some platforms do support CPU and PCI
host bridge hotplug.

Thanks again for doing all this work.  It's a huge improvement
already!

Bjorn

Mario Limonciello Oct. 24, 2024, 8:08 p.m. UTC | #4
On 10/24/2024 12:46, Bjorn Helgaas wrote:
> On Thu, Oct 24, 2024 at 12:01:59PM -0400, Yazen Ghannam wrote:
>> On Wed, Oct 23, 2024 at 12:59:28PM -0500, Bjorn Helgaas wrote:
>>> On Wed, Oct 23, 2024 at 05:21:34PM +0000, Yazen Ghannam wrote:
>>>> Hi all,
>>>>
>>>> The theme of this set is decoupling the "AMD node" concept from the
>>>> legacy northbridge support.
>>>>
>>>> Additionally, AMD System Management Network (SMN) access code is
>>>> decoupled and expanded too.
>>>>
>>>> Patches 1-3 begin reducing the scope of AMD_NB.
>>>>
>>>> Patches 4-9 begin moving generic AMD node support out of AMD_NB.
>>>>
>>>> Patches 10-13 move SMN support out of AMD_NB and do some refactoring.
>>>>
>>>> Patch 14 has HSMP reuse SMN functionality.
>>>>
>>>> Patches 15-16 address userspace access to SMN.
>>>>
>>>> I say "begin" above because there is more to do here. Ultimately, AMD_NB
>>>> should only be needed for code used on legacy systems with northbridges.
>>>> Also, any and all SMN users in the kernel need to be updated to use the
>>>> central SMN code. Local solutions should be avoided.
>>>
>>> Glad to see many of the PCI device IDs going away; thanks for working
>>> on that!
>>>
>>> The use of pci_get_slot() and pci_get_domain_bus_and_slot() is not
>>> ideal since all those pci_get_*() interfaces are kind of ugly in my
>>> opinion, and using them means we have to encode topology details in
>>> the kernel.  But this still seems like a big improvement.
>>
>> Thanks for the feedback. Hopefully, we'll come to some improved
>> solution. :)
>>
>> Can you please elaborate on your concern? Is it about saying "thing X is
>> always at SBDF A:B:C.D" or something else?
> 
> "Thing X is always at SBDF A:B:C.D" is one big reason.  "A:B:C.D" says
> nothing about the actual functionality of the device.  A PCI
> Vendor/Device ID or a PNP ID identifies the device programming model
> independent of its geographical location.  Inferring the functionality
> and programming model from the location is a maintenance issue because
> hardware may change the address.
> 
> PCI bus numbers are under software control, so in general it's not
> safe to rely on them, although in this case these devices are probably
> on root buses where the bus number is either fixed or determined by
> BIOS configuration of the host bridge.
> 
> I don't like the pci_get_*() functions because they break the driver
> model.  The usual .probe() model binds a device to a driver, which
> essentially means the driver owns the device and its resources, and
> the driver doesn't have to worry about other code interfering.

Are you suggesting that perhaps we should be introducing amd_smn (patch 
10) as a PCI driver that binds "to the root device" instead?

If we made this change, I would wonder if it comes up early enough, 
particularly considering quirk_clear_strap_no_soft_reset_dev2_f0() uses 
the SMN symbols as PCI fixup final which happens before a driver 
attaches (pci_bus_add_device()).

There are some areas that do discovery (for example amd_node_get_root() 
in patch 6/16).

I think we should aspire to do as much discovery as possible, but I don't
know that we can get TOTALLY away from some hardcoded topology information.

> 
> Unlike pci_get_*(), the .probe()/.remove() model automatically handles
> hotplug without extra things like notifiers in the driver.  Hotplug
> may not be an issue in this particular case, but it requires specific
> platform knowledge to be sure.  Some platforms do support CPU and PCI
> host bridge hotplug.
> 

Yeah, hotplug won't matter for these.

> Thanks again for doing all this work.  It's a huge improvement
> already!

Bjorn Helgaas Oct. 24, 2024, 9:06 p.m. UTC | #5
On Thu, Oct 24, 2024 at 03:08:41PM -0500, Mario Limonciello wrote:
> On 10/24/2024 12:46, Bjorn Helgaas wrote:
> > On Thu, Oct 24, 2024 at 12:01:59PM -0400, Yazen Ghannam wrote:
> > > On Wed, Oct 23, 2024 at 12:59:28PM -0500, Bjorn Helgaas wrote:
> > > > On Wed, Oct 23, 2024 at 05:21:34PM +0000, Yazen Ghannam wrote:
> ...

> > > > The use of pci_get_slot() and pci_get_domain_bus_and_slot() is not
> > > > ideal since all those pci_get_*() interfaces are kind of ugly in my
> > > > opinion, and using them means we have to encode topology details in
> > > > the kernel.  But this still seems like a big improvement.
> > > 
> > > Thanks for the feedback. Hopefully, we'll come to some improved
> > > solution. :)
> > > 
> > > Can you please elaborate on your concern? Is it about saying "thing X is
> > > always at SBDF A:B:C.D" or something else?
> > 
> > "Thing X is always at SBDF A:B:C.D" is one big reason.  "A:B:C.D" says
> > nothing about the actual functionality of the device.  A PCI
> > Vendor/Device ID or a PNP ID identifies the device programming model
> > independent of its geographical location.  Inferring the functionality
> > and programming model from the location is a maintenance issue because
> > hardware may change the address.
> > 
> > PCI bus numbers are under software control, so in general it's not
> > safe to rely on them, although in this case these devices are probably
> > on root buses where the bus number is either fixed or determined by
> > BIOS configuration of the host bridge.
> > 
> > I don't like the pci_get_*() functions because they break the driver
> > model.  The usual .probe() model binds a device to a driver, which
> > essentially means the driver owns the device and its resources, and
> > the driver doesn't have to worry about other code interfering.
> 
> Are you suggesting that perhaps we should be introducing amd_smn (patch 10)
> as a PCI driver that binds "to the root device" instead?

I don't know any of the specifics, so I can't really opine on that.

The PCI specs envision that a Vendor/Device ID defines the programming
model of the device, and you would only use a new Device ID when that
programming model changes.

Of course, vendors like to define a new set of Device IDs for every
new chipset even when no driver changes are required, so even if a new
SMN works exactly the same as in previous chipsets, you're probably
back to having to add a new Device ID for every new chipset.

The Subsystem Vendor ID and Subsystem ID exist to solve a similar
problem (sort of in reverse).  If AMD could allocate a Subsystem ID
for this SMN programming model and use that same ID in every chipset,
you could make a pci_driver.id_table entry that would match them all,
e.g.,

  .vendor = PCI_VENDOR_ID_AMD,
  .device = PCI_ANY_ID,
  .subvendor = PCI_VENDOR_ID_AMD,
  .subdevice = PCI_SUBSYSTEM_AMD_SMN,

(pci_device_id.subdevice is misnamed; the spec calls it "Subsystem ID")
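
As a rough userspace sketch of how such an entry would match (not code from
this series): the mock_* types and helpers below are hypothetical and only
model wildcard-style matching in the spirit of PCI_ANY_ID. Note that
PCI_SUBSYSTEM_AMD_SMN above is itself a made-up name, so the value
MOCK_SUBSYSTEM_AMD_SMN is an arbitrary placeholder, as are the Device IDs
used to exercise it.

```c
#include <stdbool.h>
#include <stdint.h>

#define MOCK_ANY_ID            0xffffffffu  /* wildcard, in the spirit of PCI_ANY_ID */
#define MOCK_VENDOR_ID_AMD     0x1022u      /* AMD's PCI vendor ID */
#define MOCK_SUBSYSTEM_AMD_SMN 0x0001u      /* hypothetical; no such ID is allocated */

/* Simplified stand-ins; NOT the kernel's pci_device_id / pci_dev. */
struct mock_id  { uint32_t vendor, device, subvendor, subdevice; };
struct mock_dev { uint16_t vendor, device, subsystem_vendor, subsystem_device; };

/* A table field matches if it is the wildcard or equals the device's value. */
static bool field_matches(uint32_t want, uint16_t have)
{
	return want == MOCK_ANY_ID || want == have;
}

static bool mock_id_matches(const struct mock_id *id, const struct mock_dev *dev)
{
	return field_matches(id->vendor, dev->vendor) &&
	       field_matches(id->device, dev->device) &&
	       field_matches(id->subvendor, dev->subsystem_vendor) &&
	       field_matches(id->subdevice, dev->subsystem_device);
}

/* One entry covers every chipset that carries the shared Subsystem ID. */
static const struct mock_id smn_id = {
	.vendor    = MOCK_VENDOR_ID_AMD,
	.device    = MOCK_ANY_ID,
	.subvendor = MOCK_VENDOR_ID_AMD,
	.subdevice = MOCK_SUBSYSTEM_AMD_SMN,
};
```

With this entry, two devices with different Device IDs but the same Subsystem
Vendor ID and Subsystem ID both match, while a device whose subsystem fields
were set by a platform vendor does not.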

> There are some areas that do discovery (for example amd_node_get_root() in
> patch 6/16).

Sort of.  amd_node_get_root() and amd_node_get_func() both just grub
through all the devices that the PCI core has enumerated and return
the one that has the right geographical address.

There's no binding to a driver, so another driver could come along and
bind to the same device, and then you have a potential conflict.

You also give up all the standard driver model infrastructure for
hotplug, power management, etc.  Granted, you probably don't care
about those things here.

Bjorn

Mario Limonciello Oct. 24, 2024, 9:20 p.m. UTC | #6
On 10/24/2024 16:06, Bjorn Helgaas wrote:
> On Thu, Oct 24, 2024 at 03:08:41PM -0500, Mario Limonciello wrote:
>> On 10/24/2024 12:46, Bjorn Helgaas wrote:
>>> On Thu, Oct 24, 2024 at 12:01:59PM -0400, Yazen Ghannam wrote:
>>>> On Wed, Oct 23, 2024 at 12:59:28PM -0500, Bjorn Helgaas wrote:
>>>>> On Wed, Oct 23, 2024 at 05:21:34PM +0000, Yazen Ghannam wrote:
>> ...
> 
>>>>> The use of pci_get_slot() and pci_get_domain_bus_and_slot() is not
>>>>> ideal since all those pci_get_*() interfaces are kind of ugly in my
>>>>> opinion, and using them means we have to encode topology details in
>>>>> the kernel.  But this still seems like a big improvement.
>>>>
>>>> Thanks for the feedback. Hopefully, we'll come to some improved
>>>> solution. :)
>>>>
>>>> Can you please elaborate on your concern? Is it about saying "thing X is
>>>> always at SBDF A:B:C.D" or something else?
>>>
>>> "Thing X is always at SBDF A:B:C.D" is one big reason.  "A:B:C.D" says
>>> nothing about the actual functionality of the device.  A PCI
>>> Vendor/Device ID or a PNP ID identifies the device programming model
>>> independent of its geographical location.  Inferring the functionality
>>> and programming model from the location is a maintenance issue because
>>> hardware may change the address.
>>>
>>> PCI bus numbers are under software control, so in general it's not
>>> safe to rely on them, although in this case these devices are probably
>>> on root buses where the bus number is either fixed or determined by
>>> BIOS configuration of the host bridge.
>>>
>>> I don't like the pci_get_*() functions because they break the driver
>>> model.  The usual .probe() model binds a device to a driver, which
>>> essentially means the driver owns the device and its resources, and
>>> the driver doesn't have to worry about other code interfering.
>>
>> Are you suggesting that perhaps we should be introducing amd_smn (patch 10)
>> as a PCI driver that binds "to the root device" instead?
> 
> I don't know any of the specifics, so I can't really opine on that.
> 
> The PCI specs envision that a Vendor/Device ID defines the programming
> model of the device, and you would only use a new Device ID when that
> programming model changes.
> 
> Of course, vendors like to define a new set of Device IDs for every
> new chipset even when no driver changes are required, so even if a new
> SMN works exactly the same as in previous chipsets, you're probably
> back to having to add a new Device ID for every new chipset.

Yeah; I believe this is why we're here today, trying to find
something more manageable (i.e., this series).

> 
> The Subsystem Vendor ID and Subsystem ID exist to solve a similar
> problem (sort of in reverse).  If AMD could allocate a Subsystem ID
> for this SMN programming model and use that same ID in every chipset,
> you could make a pci_driver.id_table entry that would match them all,
> e.g.,
> 
>    .vendor = PCI_VENDOR_ID_AMD,
>    .device = PCI_ANY_ID,
>    .subvendor = PCI_VENDOR_ID_AMD,
>    .subdevice = PCI_SUBSYSTEM_AMD_SMN,
> 
> (pci_device_id.subdevice is misnamed; the spec calls it "Subsystem ID")

Isn't the subsystem ID typically based on the platform it's running
on?  For example, I seem to recall that Dell systems use the value
from the SMBIOS ProductSKU field here (in other words, not something
AMD would control).

I mean I guess maybe we could do a:

     .vendor = PCI_VENDOR_ID_AMD,
     .device = PCI_ANY_ID,
     .class = PCI_CLASS_BRIDGE_HOST << 8

And then in probe() figure out if it's the right one, but that's still 
pretty ugly, eh?

> 
>> There are some areas that do discovery (for example amd_node_get_root() in
>> patch 6/16).
> 
> Sort of.  amd_node_get_root() and amd_node_get_func() both just grub
> through all the devices that the PCI core has enumerated and return
> the one that has the right geographical address.
> 
> There's no binding to a driver, so another driver could come along and
> bind to the same device, and then you have a potential conflict.
> 
> You also give up all the standard driver model infrastructure for
> hotplug, power management, etc.  Granted, you probably don't care
> about those things here.

Right; I agree none of that matters here.

Bjorn Helgaas Oct. 24, 2024, 9:47 p.m. UTC | #7
On Thu, Oct 24, 2024 at 04:20:35PM -0500, Mario Limonciello wrote:
> On 10/24/2024 16:06, Bjorn Helgaas wrote:
> > On Thu, Oct 24, 2024 at 03:08:41PM -0500, Mario Limonciello wrote:
> > > On 10/24/2024 12:46, Bjorn Helgaas wrote:
> > > > On Thu, Oct 24, 2024 at 12:01:59PM -0400, Yazen Ghannam wrote:
> > > > > On Wed, Oct 23, 2024 at 12:59:28PM -0500, Bjorn Helgaas wrote:
> > > > > > On Wed, Oct 23, 2024 at 05:21:34PM +0000, Yazen Ghannam wrote:
> > > ...
> > 
> > > > > > The use of pci_get_slot() and pci_get_domain_bus_and_slot() is not
> > > > > > ideal since all those pci_get_*() interfaces are kind of ugly in my
> > > > > > opinion, and using them means we have to encode topology details in
> > > > > > the kernel.  But this still seems like a big improvement.
> > > > > 
> > > > > Thanks for the feedback. Hopefully, we'll come to some improved
> > > > > solution. :)
> > > > > 
> > > > > Can you please elaborate on your concern? Is it about saying "thing X is
> > > > > always at SBDF A:B:C.D" or something else?
> > > > 
> > > > "Thing X is always at SBDF A:B:C.D" is one big reason.  "A:B:C.D" says
> > > > nothing about the actual functionality of the device.  A PCI
> > > > Vendor/Device ID or a PNP ID identifies the device programming model
> > > > independent of its geographical location.  Inferring the functionality
> > > > and programming model from the location is a maintenance issue because
> > > > hardware may change the address.
> > > > 
> > > > PCI bus numbers are under software control, so in general it's not
> > > > safe to rely on them, although in this case these devices are probably
> > > > on root buses where the bus number is either fixed or determined by
> > > > BIOS configuration of the host bridge.
> > > > 
> > > > I don't like the pci_get_*() functions because they break the driver
> > > > model.  The usual .probe() model binds a device to a driver, which
> > > > essentially means the driver owns the device and its resources, and
> > > > the driver doesn't have to worry about other code interfering.
> > > 
> > > Are you suggesting that perhaps we should be introducing amd_smn (patch 10)
> > > as a PCI driver that binds "to the root device" instead?
> > 
> > I don't know any of the specifics, so I can't really opine on that.
> > 
> > The PCI specs envision that a Vendor/Device ID defines the programming
> > model of the device, and you would only use a new Device ID when that
> > programming model changes.
> > 
> > Of course, vendors like to define a new set of Device IDs for every
> > new chipset even when no driver changes are required, so even if a new
> > SMN works exactly the same as in previous chipsets, you're probably
> > back to having to add a new Device ID for every new chipset.
> 
> Yeah; this I believe is why we're here today and trying to find something
> more manageable (i.e., this series).

Another alternative would be an ACPI device where you can use the same
_HID (or at least a _CID) for all the chipsets.

> > The Subsystem Vendor ID and Subsystem ID exist to solve a similar
> > problem (sort of in reverse).  If AMD could allocate a Subsystem ID
> > for this SMN programming model and use that same ID in every chipset,
> > you could make a pci_driver.id_table entry that would match them all,
> > e.g.,
> > 
> >    .vendor = PCI_VENDOR_ID_AMD,
> >    .device = PCI_ANY_ID,
> >    .subvendor = PCI_VENDOR_ID_AMD,
> >    .subdevice = PCI_SUBSYSTEM_AMD_SMN,
> > 
> > (pci_device_id.subdevice is misnamed; the spec calls it "Subsystem ID")
> 
> Isn't the subsystem ID based typically upon the platform it's
> running on?  For example I seem to recall on Dell systems it's used
> the value that was in the SMBIOS ProductSKU field here (IoW not
> something AMD would control).

Right, it is typically based on the platform; that's why I said "in
reverse."  I think all these devices are integrated into the chipset,
so I'm speculating that platform vendors would have no need (maybe
even no way) to use the Subsystem ID.  But maybe that's not the case.

> I mean I guess maybe we could do a:
> 
>     .vendor = PCI_VENDOR_ID_AMD,
>     .device = PCI_ANY_ID,
>     .class = PCI_CLASS_BRIDGE_HOST << 8
> 
> And then in probe() figure out if it's the right one, but that's still
> pretty ugly, eh?

I think there are some drivers that do this, and it's not completely
terrible.  The probe() can just return failure if it doesn't want the
device.
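
A minimal userspace sketch of that pattern follows (not code from this
series). All mock_* names are hypothetical, and has_access_port stands in for
whatever check a real probe() would perform. One assumption worth flagging:
the sketch adds a class mask, since in the kernel's matching a zero
class_mask makes the class field irrelevant, so an entry like the one quoted
above would presumably need one too.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MOCK_VENDOR_ID_AMD     0x1022u  /* AMD's PCI vendor ID */
#define MOCK_CLASS_BRIDGE_HOST 0x0600u  /* base class 0x06, subclass 0x00 */

/* Simplified stand-ins; NOT the kernel's pci_dev / pci_device_id. */
struct mock_dev {
	uint16_t vendor;
	uint32_t class_code;      /* 24 bits: base class, subclass, prog-if */
	bool     has_access_port; /* hypothetical result of a probe-time check */
};

struct mock_id { uint32_t vendor, class_code, class_mask; };

static const struct mock_id host_bridge_id = {
	.vendor     = MOCK_VENDOR_ID_AMD,
	.class_code = MOCK_CLASS_BRIDGE_HOST << 8,
	.class_mask = 0xffff00u,  /* compare base class and subclass, ignore prog-if */
};

static bool mock_id_matches(const struct mock_id *id, const struct mock_dev *dev)
{
	return id->vendor == dev->vendor &&
	       !((id->class_code ^ dev->class_code) & id->class_mask);
}

/* Broad match first, then probe() declines devices it does not want. */
static int mock_probe(const struct mock_dev *dev)
{
	if (!dev->has_access_port)
		return -ENODEV;
	return 0;
}

static int mock_try_bind(const struct mock_dev *dev)
{
	if (!mock_id_matches(&host_bridge_id, dev))
		return -ENODEV;
	return mock_probe(dev);
}
```

Here mock_try_bind() returns 0 only for an AMD host bridge that passes the
probe-time check; every other device gets -ENODEV and stays unbound.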

Bjorn

Yazen Ghannam Oct. 31, 2024, 4:22 p.m. UTC | #8
On Thu, Oct 24, 2024 at 04:47:53PM -0500, Bjorn Helgaas wrote:
> On Thu, Oct 24, 2024 at 04:20:35PM -0500, Mario Limonciello wrote:
> > On 10/24/2024 16:06, Bjorn Helgaas wrote:
> > > On Thu, Oct 24, 2024 at 03:08:41PM -0500, Mario Limonciello wrote:
> > > > On 10/24/2024 12:46, Bjorn Helgaas wrote:
> > > > > On Thu, Oct 24, 2024 at 12:01:59PM -0400, Yazen Ghannam wrote:
> > > > > > On Wed, Oct 23, 2024 at 12:59:28PM -0500, Bjorn Helgaas wrote:
> > > > > > > On Wed, Oct 23, 2024 at 05:21:34PM +0000, Yazen Ghannam wrote:
> > > > ...
> > > 
> > > > > > > The use of pci_get_slot() and pci_get_domain_bus_and_slot() is not
> > > > > > > ideal since all those pci_get_*() interfaces are kind of ugly in my
> > > > > > > opinion, and using them means we have to encode topology details in
> > > > > > > the kernel.  But this still seems like a big improvement.
> > > > > > 
> > > > > > Thanks for the feedback. Hopefully, we'll come to some improved
> > > > > > solution. :)
> > > > > > 
> > > > > > Can you please elaborate on your concern? Is it about saying "thing X is
> > > > > > always at SBDF A:B:C.D" or something else?
> > > > > 
> > > > > "Thing X is always at SBDF A:B:C.D" is one big reason.  "A:B:C.D" says
> > > > > nothing about the actual functionality of the device.  A PCI
> > > > > Vendor/Device ID or a PNP ID identifies the device programming model
> > > > > independent of its geographical location.  Inferring the functionality
> > > > > and programming model from the location is a maintenance issue because
> > > > > hardware may change the address.
> > > > > 
> > > > > PCI bus numbers are under software control, so in general it's not
> > > > > safe to rely on them, although in this case these devices are probably
> > > > > on root buses where the bus number is either fixed or determined by
> > > > > BIOS configuration of the host bridge.
> > > > > 
> > > > > I don't like the pci_get_*() functions because they break the driver
> > > > > model.  The usual .probe() model binds a device to a driver, which
> > > > > essentially means the driver owns the device and its resources, and
> > > > > the driver doesn't have to worry about other code interfering.
> > > > 
> > > > Are you suggesting that perhaps we should be introducing amd_smn (patch 10)
> > > > as a PCI driver that binds "to the root device" instead?
> > > 
> > > I don't know any of the specifics, so I can't really opine on that.
> > > 
> > > The PCI specs envision that a Vendor/Device ID defines the programming
> > > model of the device, and you would only use a new Device ID when that
> > > programming model changes.
> > > 
> > > Of course, vendors like to define a new set of Device IDs for every
> > > new chipset even when no driver changes are required, so even if a new
> > > SMN works exactly the same as in previous chipsets, you're probably
> > > back to having to add a new Device ID for every new chipset.
> > 
> > Yeah; this I believe is why we're here today and trying to find something
> > > more manageable (i.e., this series).
> 
> Another alternative would be an ACPI device where you can use the same
> _HID (or at least a _CID) for all the chipsets.
>

Yes, we've had some internal discussions about something like this. Of
course, any new solution will only apply to future products.

Another option could be for the platform to provide an abstracted
interface for each unique access method. For example, we could
define UEFI PRM methods: the code can run in OS context, and the details
would be abstracted. But again, this would have to come for future
products. :/

> > > The Subsystem Vendor ID and Subsystem ID exist to solve a similar
> > > problem (sort of in reverse).  If AMD could allocate a Subsystem ID
> > > for this SMN programming model and use that same ID in every chipset,
> > > you could make a pci_driver.id_table entry that would match them all,
> > > e.g.,
> > > 
> > >    .vendor = PCI_VENDOR_ID_AMD,
> > >    .device = PCI_ANY_ID,
> > >    .subvendor = PCI_VENDOR_ID_AMD,
> > >    .subdevice = PCI_SUBSYSTEM_AMD_SMN,
> > > 
> > > (pci_device_id.subdevice is misnamed; the spec calls it "Subsystem ID")
> > 
> > Isn't the subsystem ID based typically upon the platform it's
> > running on?  For example I seem to recall on Dell systems it's used
> > > the value that was in the SMBIOS ProductSKU field here (IoW not
> > something AMD would control).
> 
> Right, it is typically based on the platform; that's why I said "in
> reverse."  I think all these devices are integrated into the chipset,
> so I'm speculating that platform vendors would have no need (maybe
> even no way) to use the Subsystem ID.  But maybe that's not the case.
>

The devices are integrated. However, they aren't solely used for the
register access interfaces. The index/data pairs just happen to reside
in a root complex device, but of course the root complex is not there
just for this use.
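
For readers unfamiliar with index/data pairs, here is a minimal userspace
simulation of the access pattern (not code from this series). Everything
below is a hypothetical mock: the register offsets are illustrative
assumptions, and struct mock_root only imitates a device that exposes a small
register space behind an index register and a data register.

```c
#include <stdint.h>

#define MOCK_INDEX_OFFSET 0x60u  /* illustrative offsets, assumed for this sketch */
#define MOCK_DATA_OFFSET  0x64u
#define MOCK_SPACE_WORDS  16u

/* Simulated device: an index latch plus the register space behind the port. */
struct mock_root {
	uint32_t index;
	uint32_t space[MOCK_SPACE_WORDS];
};

/* Simulated config write: latch the index, or store through the data port. */
static int mock_cfg_write(struct mock_root *r, uint32_t offset, uint32_t val)
{
	if (offset == MOCK_INDEX_OFFSET) {
		r->index = val;
		return 0;
	}
	if (offset == MOCK_DATA_OFFSET && r->index / 4 < MOCK_SPACE_WORDS) {
		r->space[r->index / 4] = val;
		return 0;
	}
	return -1;
}

/* Simulated config read: only the data port is readable in this mock. */
static int mock_cfg_read(struct mock_root *r, uint32_t offset, uint32_t *val)
{
	if (offset == MOCK_DATA_OFFSET && r->index / 4 < MOCK_SPACE_WORDS) {
		*val = r->space[r->index / 4];
		return 0;
	}
	return -1;
}

/*
 * Two-step indirect access: select the target address, then move the data.
 * Real code would need to hold a lock across both steps, because the pair
 * is shared state.
 */
static int mock_indirect_read(struct mock_root *r, uint32_t addr, uint32_t *val)
{
	if (mock_cfg_write(r, MOCK_INDEX_OFFSET, addr))
		return -1;
	return mock_cfg_read(r, MOCK_DATA_OFFSET, val);
}

static int mock_indirect_write(struct mock_root *r, uint32_t addr, uint32_t val)
{
	if (mock_cfg_write(r, MOCK_INDEX_OFFSET, addr))
		return -1;
	return mock_cfg_write(r, MOCK_DATA_OFFSET, val);
}
```

The two-step nature is the point: any user of the pair has to go through the
same serialized path, which is one argument for the central access code this
set introduces rather than local solutions.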

> > I mean I guess maybe we could do a:
> > 
> >     .vendor = PCI_VENDOR_ID_AMD,
> >     .device = PCI_ANY_ID,
> >     .class = PCI_CLASS_BRIDGE_HOST << 8
> > 
> > And then in probe() figure out if it's the right one, but that's still
> > pretty ugly, eh?
> 
> I think there are some drivers that do this, and it's not completely
> terrible.  The probe() can just return failure if it doesn't want the
> device.
>

Would it make sense to have a driver if we're not actually driving
anything? We really just need to read/write to a few registers, in the
same vein as using an I/O access port.

I think a driver would really work if there was a lot more functionality
and the access port was just one of many features.

But maybe a really bare-bones driver isn't too much more than what we
have. And it could be more maintainable.

And maybe we can treat the "AMD Node" as a logical device that
collects/manages interfaces across multiple devices.

Bjorn, would you mind if we pursued this as a follow-up to this set? I
think there's potential for some of these ideas. But I'll need to do
more research and discuss with others.

Thanks,
Yazen