Message ID | 20160818224610.GA28276@localhost.localdomain (mailing list archive)
---|---
State | New, archived
Delegated to | Bjorn Helgaas
On Thu, Aug 18, 2016 at 06:46:10PM -0400, Keith Busch wrote:
> On Thu, Aug 18, 2016 at 02:56:56PM -0500, Bjorn Helgaas wrote:
> > On Wed, Aug 17, 2016 at 07:09:51PM -0400, Keith Busch wrote:
> > > We've augmented 'ledmon' with libpci, and it toggles these indicators
> > > very much like 'setpci' can change PCI config registers. It is done only
> > > after the storage device is up and registered, which is well outside
> > > the time when pciehp is actively using the slot control.
> >
> > I assume this means ledmon writes Slot Control to manage the LEDs.
>
> Correct, the registers that are defined for AttnCtrl and PwrCtrl are
> what's being redefined for the new pattern.
>
> We are definitely not dead set on requiring that we let ledmon have direct
> access to SlotCtl through libpci. That's just the one way we thought of
> that didn't require new kernel dependencies.
>
> > That sounds pretty scary to me because it's impossible to enforce the
> > assumption that ledmon only uses Slot Control when pciehp isn't using
> > it. Hotplug includes surprise events, and I think pciehp is entitled
> > to assume it is the sole owner of Slot Control.
>
> I understand the concern here. You don't want independent programs
> vying to mask/unmask bits in SlotCtl and ending up with a state neither
> intended. The only scenario that I can come up with where that could
> happen is if ledmon changes the LED on a slot at the same time the drive
> is removed, but there are no MRL or attention buttons on this hardware,
> and the rest that we do care about looks totally harmless on these slots.
>
> This condition also isn't really new. While probably not recommended,
> I could blink the standard Attn and Pwr LEDs like this (the user, of
> course, assumes all the risks):
>
>   # setpci -s <B:D.f> CAP_EXP+18.w=280:3c0

Definitely not recommended. If you use setpci, all bets are off and we
can't really support the system afterwards. In fact, we probably should
taint the kernel when users write to config space.

If we tainted the kernel and put something in dmesg on the first user
config space write, I might be OK with the original quirk to make pciehp
ignore the attention and power indicators. The only reason I suggest a
dmesg note is because, while we can't *imagine* a scenario where pciehp
and ledmon stomp on each other, that doesn't mean such a scenario doesn't
exist, and a problem would be hard to reproduce and hard to debug.

> > I wonder if there's some way to implement the LED control in pciehp,
> > e.g., by enhancing pciehp_set_attention_status(). I assume the
> > desired indicator state is coming from some in-kernel storage driver,
> > and ledmon learns about that somehow, then fiddles with Slot Control
> > behind the kernel's back? That looping from kernel to user and back
> > to kernel sounds a little dodgy and fragile.
>
> That is definitely an interesting possibility if you are open to
> this. We'd "just" need the kernel to comprehend these vendor-specific
> bit patterns for this particular generation of hardware.
>
> If you're okay with that, then I'm more than happy to propose a patch
> for consideration. We can then have ledmon subscribe to the sysfs entry
> point instead of going through libpci, no problem.
I was imagining these LEDs as some sort of extension to the PCIe
hotplug model, but I think that was a mistake: logically, they have
nothing to do with hotplug, and the only reason they're currently
associated with hotplug is because you chose to re-use a bus (VPP)
that happens to be connected to the Slot Control registers.

From an architectural point of view, these LEDs seem device-specific
or storage-specific, with no connection to PCIe at all, so I don't
know why we would put them in the PCIe spec or teach pciehp about
them.

> > Can you share any details about how you plan to implement this on the
> > next generation of devices? Maybe we can enhance pciehp indicator
> > control in a way that supports both Sky Lake and future devices.
>
> One possibility is to define, through PCI-SIG, SlotCap2 and SlotCtl2
> registers with this kind of LED control.
>
> Another possibility I hear about through the NVMe-MI working group
> is creating a similar type of device to SES (SCSI Enclosure Services)
> for PCIe SSD enclosures.
>
> Since these are specific to VMD domains, there are folks at Intel who
> want to see the VMD driver mask off these capabilities from the driver.
> There is never a case where a slot in this domain should advertise these,
> so masking off the bits at the config read level would achieve the desired
> result. It'd be isolated to the VMD driver, and the quirk wouldn't need
> to be maintained as additional vendor devices implementing the same quirk
> are made.

My biggest concern is the ownership problem, and this strategy doesn't
address that.

> ---
> diff --git a/arch/x86/pci/vmd.c b/arch/x86/pci/vmd.c
> index b73da50..aa2791f 100644
> --- a/arch/x86/pci/vmd.c
> +++ b/arch/x86/pci/vmd.c
> @@ -452,6 +452,7 @@ static int vmd_pci_read(struct pci_bus *bus, unsigned int devfn, int reg,
>  {
>  	struct vmd_dev *vmd = vmd_from_bus(bus);
>  	char __iomem *addr = vmd_cfg_addr(vmd, bus, devfn, reg, len);
> +	struct pci_dev *dev;
>  	unsigned long flags;
>  	int ret = 0;
>
> @@ -474,6 +475,17 @@ static int vmd_pci_read(struct pci_bus *bus, unsigned int devfn, int reg,
>  		break;
>  	}
>  	spin_unlock_irqrestore(&vmd->cfg_lock, flags);
> +
> +	list_for_each_entry(dev, &bus->devices, bus_list) {
> +		if (dev->devfn == devfn) {
> +			if (dev->pcie_cap &&
> +			    reg == dev->pcie_cap + PCI_EXP_SLTCAP)
> +				*value &= ~(PCI_EXP_SLTCAP_AIP |
> +					    PCI_EXP_SLTCAP_PIP);
> +			break;
> +		}
> +	}
> +
> 	return ret;
> }
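As an aside to Bjorn's note above about tainting the kernel on user
config-space writes: a minimal sketch of what that could look like,
assuming it would be called from the PCI sysfs config-write path. The
function name, placement, and message are hypothetical; add_taint(),
TAINT_USER, and LOCKDEP_STILL_OK are existing kernel facilities.

/*
 * Hypothetical helper for the "taint on user config-space write" idea.
 * Something like pci-sysfs's write path could call this before
 * performing the write.  The name and the warn-once message are
 * illustrative, not an existing kernel interface.
 */
#include <linux/kernel.h>
#include <linux/device.h>
#include <linux/pci.h>

static void pci_taint_user_cfg_write(struct pci_dev *pdev)
{
	static bool warned;

	if (!warned) {
		dev_warn(&pdev->dev,
			 "config space written from user space; behavior may be unsupported\n");
		warned = true;
	}
	add_taint(TAINT_USER, LOCKDEP_STILL_OK);
}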
On Mon, Aug 22, 2016 at 11:55:24AM -0500, Bjorn Helgaas wrote:
> I was imagining these LEDs as some sort of extension to the PCIe
> hotplug model, but I think that was a mistake: logically, they have
> nothing to do with hotplug, and the only reason they're currently
> associated with hotplug is because you chose to re-use a bus (VPP)
> that happens to be connected to the Slot Control registers.
>
> From an architectural point of view, these LEDs seem device-specific
> or storage-specific, with no connection to PCIe at all, so I don't
> know why we would put them in the PCIe spec or teach pciehp about
> them.

It's not entirely for hotplug scenarios, but it does help with user pain
points locating devices they intend to hot remove.

I hear many vendors are for the concept of proposing new status
and location indicator definitions to PCIe. I don't think anyone is
suggesting the implementation requiring this patch be made standard;
this generation of hardware is just a non-standard implementation that
needs a quirk to help fill the gap.

Would it be more palatable if I modify the quirk such that when set,
pciehp provides a sysfs entry allowing arbitrary user defined Slot Control
commands? That removes the dangerous direct access from user space.
On Mon, Aug 22, 2016 at 05:15:36PM -0400, Keith Busch wrote:
> On Mon, Aug 22, 2016 at 11:55:24AM -0500, Bjorn Helgaas wrote:
> > I was imagining these LEDs as some sort of extension to the PCIe
> > hotplug model, but I think that was a mistake: logically, they have
> > nothing to do with hotplug, and the only reason they're currently
> > associated with hotplug is because you chose to re-use a bus (VPP)
> > that happens to be connected to the Slot Control registers.
> >
> > From an architectural point of view, these LEDs seem device-specific
> > or storage-specific, with no connection to PCIe at all, so I don't
> > know why we would put them in the PCIe spec or teach pciehp about
> > them.
>
> It's not entirely for hotplug scenarios, but it does help with user pain
> points locating devices they intend to hot remove.
>
> I hear many vendors are for the concept of proposing new status
> and location indicator definitions to PCIe. I don't think anyone is
> suggesting the implementation requiring this patch be made standard;
> this generation of hardware is just a non-standard implementation that
> needs a quirk to help fill the gap.

I'm still not buying the hotplug connection. The existing Attention
Indicator is already defined for the purpose of locating a device
(PCIe r3.0, sec 6.7.1.1.1):

  A blinking Attention Indicator indicates that system software is
  identifying this slot for a human operator to find. This behavior is
  controlled by a user (for example, from a software user interface or
  management tool).

The other IBPI states (fail, rebuild, predicted failure, hotspare,
degraded, failed) all seem like storage-related things without an
obvious connection to pciehp. I would expect a kernel driver like md
to manage those states.

PCIe hotplug is designed with a 1:1:1 relationship between the
Downstream Port, the slot, and a replaceable device. It's not obvious
to me how the IBPI signaling maps into that. The first diagram on the
IBPI wikipedia page shows a single iPass cable (e.g., a single PCIe
slot connection) connected to an enclosure of four drives. Each drive
has three LEDs, so we couldn't control all twelve of those LEDs using
the single Slot Control of the Downstream Port.

> Would it be more palatable if I modify the quirk such that when set,
> pciehp provides a sysfs entry allowing arbitrary user defined Slot Control
> commands? That removes the dangerous direct access from user space.

Maybe, I dunno. So far, it still seems like an unrelated wart on pciehp.

Bjorn
On Tue, Aug 23, 2016 at 08:39:13AM -0500, Bjorn Helgaas wrote:
> On Mon, Aug 22, 2016 at 05:15:36PM -0400, Keith Busch wrote:
>
> The other IBPI states (fail, rebuild, predicted failure, hotspare,
> degraded, failed) all seem like storage-related things without an
> obvious connection to pciehp. I would expect a kernel driver like md
> to manage those states.

Yes, that's a fairly accurate description of how this works with other
storage transports. While md doesn't directly affect LED states,
'ledmon' subscribes to md events and executes an appropriate LED
control method specific to that transport. It just happens that this
quirky hardware chose to re-purpose the Slot Control commands as the
PCIe transport-specific method to get feature parity with other
storage fabrics.

> PCIe hotplug is designed with a 1:1:1 relationship between the
> Downstream Port, the slot, and a replaceable device. It's not obvious
> to me how the IBPI signaling maps into that. The first diagram on the
> IBPI wikipedia page shows a single iPass cable (e.g., a single PCIe
> slot connection) connected to an enclosure of four drives. Each drive
> has three LEDs, so we couldn't control all twelve of those LEDs using
> the single Slot Control of the Downstream Port.

For a PCIe storage transport, each drive has its own slot, and the
relationship with the downstream port and its replaceable device is
still preserved; connections to each drive provide individual Slot
Control.

> > Would it be more palatable if I modify the quirk such that when set,
> > pciehp provides a sysfs entry allowing arbitrary user defined Slot Control
> > commands? That removes the dangerous direct access from user space.
>
> Maybe, I dunno. So far, it still seems like an unrelated wart on pciehp.

It's superficially related to pciehp only because that's the only module
that touches Slot Control, and this particular hardware interprets
pciehp's control commands differently than the specification.

Since the hardware can't be changed, is there any guidance you can
recommend we follow to appropriately fence off pciehp from attention and
power indicator control? I initially attempted the least invasive method,
but I'm happy to explore other possibilities.
On 8/23/2016 1:10 PM, Keith Busch wrote:
> It's superficially related to pciehp only because that's the only module
> that touches Slot Control, and this particular hardware interprets
> pciehp's control commands differently than the specification.
>
> Since the hardware can't be changed, is there any guidance you can
> recommend we follow to appropriately fence off pciehp from attention and
> power indicator control? I initially attempted the least invasive method,
> but I'm happy to explore other possibilities.

Most other non-standard HW requires an LED driver for hotplug in the
kernel. IBM has one. It sounds like you need another one.

static struct hotplug_slot_ops cpci_hotplug_slot_ops = {
	.enable_slot = enable_slot,
	.disable_slot = disable_slot,
	.set_attention_status = set_attention_status,
	.get_power_status = get_power_status,
	.get_attention_status = get_attention_status,
	.get_adapter_status = get_adapter_status,
	.get_latch_status = get_latch_status,
};

See cpci_hotplug_core.c and ibmphp_core.c.
On Tue, Aug 23, 2016 at 01:14:03PM -0400, Sinan Kaya wrote:
> On 8/23/2016 1:10 PM, Keith Busch wrote:
> > It's superficially related to pciehp only because that's the only module
> > that touches Slot Control, and this particular hardware interprets
> > pciehp's control commands differently than the specification.
> >
> > Since the hardware can't be changed, is there any guidance you can
> > recommend we follow to appropriately fence off pciehp from attention and
> > power indicator control? I initially attempted the least invasive method,
> > but I'm happy to explore other possibilities.
>
> Most other non-standard HW requires an LED driver for hotplug in the kernel.
> IBM has one. It sounds like you need another one.

We'd still need to fence off pciehp from LED control since it manipulates
these in response to hot plug events handled by that driver.
On Tue, Aug 23, 2016 at 03:23:31PM -0400, Keith Busch wrote:
> On Tue, Aug 23, 2016 at 01:14:03PM -0400, Sinan Kaya wrote:
> > On 8/23/2016 1:10 PM, Keith Busch wrote:
> > > It's superficially related to pciehp only because that's the only module
> > > that touches Slot Control, and this particular hardware interprets
> > > pciehp's control commands differently than the specification.
> > >
> > > Since the hardware can't be changed, is there any guidance you can
> > > recommend we follow to appropriately fence off pciehp from attention and
> > > power indicator control? I initially attempted the least invasive method,
> > > but I'm happy to explore other possibilities.
> >
> > Most other non-standard HW requires an LED driver for hotplug in the kernel.
> > IBM has one. It sounds like you need another one.
>
> We'd still need to fence off pciehp from LED control since it manipulates
> these in response to hot plug events handled by that driver.

Everybody probably knows this, but the problem is not with LED control
per se. The problem is multiple writers to the Slot Control register.
Each write to Slot Control is a "command", and software is responsible
for waiting for one command to complete before writing another (see
PCIe r3.0, sec 6.7.3.2).

It's no problem to make pciehp keep its mitts off the LEDs; the
problem is if ledmon writes Slot Control for the indicators, and
pciehp writes Slot Control for some other reason at about the same
time.

Sinan has a very interesting idea... Maybe we can work with that.
Hmm, it looks like if we have an attention indicator, we already put
an "attention" file in sysfs (see attention_write_file()). That
should eventually call into pciehp_set_attention_status(), where we
currently only support off/on/blink states.

What if you made a quirk that replaced pciehp_set_attention_status()
with a special version that supported the extra states you need and
made ledmon use that sysfs file instead of writing Slot Control
directly? Would everything just work?

Bjorn
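As a rough illustration of the quirk Bjorn describes (not an
implementation of it), the kernel-side callback could map extended
indicator states onto the vendor-specific Slot Control encodings. The
state names, bit patterns, and function name below are placeholders;
only the PCI_EXP_SLTCTL_* macros and pcie_capability_clear_and_set_word()
are existing kernel interfaces, and pciehp's real callback takes a
struct hotplug_slot rather than the pci_dev used here for brevity.

/*
 * Sketch of a quirked set_attention_status() that understands extra,
 * vendor-specific indicator states instead of just off/on/blink.
 * The state names and Slot Control bit patterns are placeholders,
 * not the real Sky Lake / VMD encodings.
 */
#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/pci.h>

#define VMD_IND_MASK	(PCI_EXP_SLTCTL_AIC | PCI_EXP_SLTCTL_PIC)

enum vmd_ibpi_state {			/* illustrative only */
	VMD_IBPI_NORMAL,
	VMD_IBPI_LOCATE,
	VMD_IBPI_FAIL,
	VMD_IBPI_REBUILD,
};

static const u16 vmd_ibpi_pattern[] = {	/* placeholder encodings */
	[VMD_IBPI_NORMAL]  = PCI_EXP_SLTCTL_ATTN_IND_OFF |
			     PCI_EXP_SLTCTL_PWR_IND_ON,
	[VMD_IBPI_LOCATE]  = PCI_EXP_SLTCTL_ATTN_IND_BLINK |
			     PCI_EXP_SLTCTL_PWR_IND_ON,
	[VMD_IBPI_FAIL]    = PCI_EXP_SLTCTL_ATTN_IND_ON |
			     PCI_EXP_SLTCTL_PWR_IND_ON,
	[VMD_IBPI_REBUILD] = PCI_EXP_SLTCTL_ATTN_IND_OFF |
			     PCI_EXP_SLTCTL_PWR_IND_BLINK,
};

/* Hypothetical replacement pciehp would call for quirked ports */
static int vmd_set_attention_status(struct pci_dev *port, u8 status)
{
	if (status >= ARRAY_SIZE(vmd_ibpi_pattern))
		return -EINVAL;

	/* One read-modify-write of Slot Control, done inside pciehp */
	return pcie_capability_clear_and_set_word(port, PCI_EXP_SLTCTL,
						  VMD_IND_MASK,
						  vmd_ibpi_pattern[status]);
}

With something like this wired up behind the existing
/sys/bus/pci/slots/<slot>/attention file, ledmon could write a state
number instead of touching config space, and the Slot Control write
would go through pciehp's own command handling.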
On Tue, Aug 23, 2016 at 01:10:18PM -0400, Keith Busch wrote:
> On Tue, Aug 23, 2016 at 08:39:13AM -0500, Bjorn Helgaas wrote:
> > On Mon, Aug 22, 2016 at 05:15:36PM -0400, Keith Busch wrote:
> >
> > The other IBPI states (fail, rebuild, predicted failure, hotspare,
> > degraded, failed) all seem like storage-related things without an
> > obvious connection to pciehp. I would expect a kernel driver like md
> > to manage those states.
>
> Yes, that's a fairly accurate description of how this works with other
> storage transports. While md doesn't directly affect LED states, 'ledmon'
> subscribes to md events and executes an appropriate LED control method
> specific to that transport.

This is a tangent, but is there some reason for putting ledmon in the
middle of this scheme? It seems like the important pieces (md and LED
control) are intimately related and both in the kernel, so why the
trip through userland? ledmon sounds like a glorified special-case
/dev/mem, with similar security and stability issues. Plus it must
have to associate a storage device with a PCIe port, grub through its
config space to find the Slot Control register, deal with possible
hotplug, etc. It all sounds icky.

Bjorn
On Tue, Aug 23, 2016 at 02:52:21PM -0500, Bjorn Helgaas wrote:
> Everybody probably knows this, but the problem is not with LED control
> per se. The problem is multiple writers to the Slot Control register.
> Each write to Slot Control is a "command", and software is responsible
> for waiting for one command to complete before writing another (see
> PCIe r3.0, sec 6.7.3.2).
>
> It's no problem to make pciehp keep its mitts off the LEDs; the
> problem is if ledmon writes Slot Control for the indicators, and
> pciehp writes Slot Control for some other reason at about the same
> time.
>
> Sinan has a very interesting idea... Maybe we can work with that.
> Hmm, it looks like if we have an attention indicator, we already put
> an "attention" file in sysfs (see attention_write_file()). That
> should eventually call into pciehp_set_attention_status(), where we
> currently only support off/on/blink states.
>
> What if you made a quirk that replaced pciehp_set_attention_status()
> with a special version that supported the extra states you need and
> made ledmon use that sysfs file instead of writing Slot Control
> directly? Would everything just work?

I'm on board with this idea. I'll get some testing started and send a
new version if/when successful.
On Tue, Aug 23, 2016 at 03:02:32PM -0500, Bjorn Helgaas wrote:
> On Tue, Aug 23, 2016 at 01:10:18PM -0400, Keith Busch wrote:
> > Yes, that's a fairly accurate description of how this works with other
> > storage transports. While md doesn't directly affect LED states, 'ledmon'
> > subscribes to md events and executes an appropriate LED control method
> > specific to that transport.
>
> This is a tangent, but is there some reason for putting ledmon in the
> middle of this scheme? It seems like the important pieces (md and LED
> control) are intimately related and both in the kernel, so why the
> trip through userland? ledmon sounds like a glorified special-case
> /dev/mem, with similar security and stability issues. Plus it must
> have to associate a storage device with a PCIe port, grub through its
> config space to find the Slot Control register, deal with possible
> hotplug, etc. It all sounds icky.

DM knows to report status on groups and members, but as far as I know,
there's no in-kernel API for setting storage status LEDs. If we want to
put it in the kernel, I think it'd be a longer-term project, but it
could be interesting to try if this is a desirable feature.

There's more than one way to set status (SGPIO, SES, IPMI), and
'ledmon' already knows how to figure out which method to use for many
storage devices. We will just need to teach it whatever method we
expose for PCIe-attached drives.

The direct SlotCtl config write method had no new kernel dependencies,
so it seemed desirable, but you raised a good point that it potentially
breaks the slot control command sequence.
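For the user-space side, a minimal sketch of what ledmon could do once
pciehp accepts extended states through its existing attention
attribute. The slot name and the numeric state value are assumptions;
only the /sys/bus/pci/slots/<slot>/attention path comes from the
pci_hotplug core.

/*
 * Hypothetical ledmon-side helper: set an indicator state by writing
 * pciehp's "attention" sysfs attribute instead of writing Slot Control
 * through libpci.  Slot name and state values are placeholders for
 * whatever a quirked pciehp would accept.
 */
#include <errno.h>
#include <stdio.h>
#include <string.h>

static int set_slot_indicator(const char *slot_name, unsigned int state)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/bus/pci/slots/%s/attention", slot_name);

	f = fopen(path, "w");
	if (!f)
		return -errno;

	fprintf(f, "%u\n", state);
	if (fclose(f) != 0)
		return -errno;

	return 0;
}

int main(void)
{
	/* e.g. ask slot "5" for a hypothetical "locate" state of 3 */
	int ret = set_slot_indicator("5", 3);

	if (ret)
		fprintf(stderr, "failed: %s\n", strerror(-ret));
	return ret ? 1 : 0;
}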
diff --git a/arch/x86/pci/vmd.c b/arch/x86/pci/vmd.c
index b73da50..aa2791f 100644
--- a/arch/x86/pci/vmd.c
+++ b/arch/x86/pci/vmd.c
@@ -452,6 +452,7 @@ static int vmd_pci_read(struct pci_bus *bus, unsigned int devfn, int reg,
 {
 	struct vmd_dev *vmd = vmd_from_bus(bus);
 	char __iomem *addr = vmd_cfg_addr(vmd, bus, devfn, reg, len);
+	struct pci_dev *dev;
 	unsigned long flags;
 	int ret = 0;