pci-express hotplug

Submitter Kenji Kaneshige
Date 2009-10-28 06:15:00
Message ID <4AE7E164.80408@jp.fujitsu.com>
Permalink /patch/56225/
State RFC

Comments

Kenji Kaneshige - 2009-10-28 06:15:00
Jens Axboe wrote:
> On Tue, Oct 27 2009, Kenji Kaneshige wrote:
>> Jens Axboe wrote:
>>> On Tue, Oct 20 2009, Alex Chiang wrote:
>>>> * Jens Axboe <jens.axboe@oracle.com>:
>>>>> On Tue, Oct 13 2009, Alex Chiang wrote:
>>>>>>>> Can you modprobe acpiphp with debug=1? And send the output?
>>>>>>> acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:05.0
>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 1 at PCI 0000:08:00
>>>>>>> acpiphp: Slot [1] registered
>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:07.0
>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 2 at PCI 0000:0b:00
>>>>>>> acpiphp: Slot [2] registered
>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:07.0
>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 6 at PCI 0000:84:00
>>>>>>> acpiphp: Slot [6] registered
>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:09.0
>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 7 at PCI 0000:87:00
>>>>>>> acpiphp: Slot [7] registered
>>>>>>> acpiphp_glue: Bus 0000:87 has 1 slot
>>>>>>> acpiphp_glue: Bus 0000:84 has 1 slot
>>>>>>> acpiphp_glue: Bus 0000:0b has 1 slot
>>>>>>> acpiphp_glue: Bus 0000:08 has 1 slot
>>>>>>> acpiphp_glue: Total 4 slots
>>>>>> You mentioned in another mail that you echoed 1 into the various
>>>>>> slots' power files.
>>>>>>
>>>>>> Did you do that after modprobing acpiphp with debug=1?
>>>>>>
>>>>>> If so, there should be debug output when you try and turn them
>>>>>> on.
>>>>> It produces:
>>>>>
>>>>> acpiphp: enable_slot - physical_slot = 1
>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>> acpiphp: enable_slot - physical_slot = 2
>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>> acpiphp: enable_slot - physical_slot = 6
>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>> acpiphp: enable_slot - physical_slot = 7
>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>> Hm, so for some reason, firmware on your machine is telling us
>>>> that it doesn't think cards are present and/or enabled.
>>>>
>>>> Unfortunately, I don't know why your firmware would be saying
>>>> that. We could add some more debug printks to see what firmware
>>>> thinks about your system... Or we could just wait and see what
>>>> happens after you get your hardware replaced.
>>> New board, the exact same thing happens.
>>>
>>>>> I have a card in one of the slots only this time.
>>>>>
>>>>>> Also, quick dummy check, you are trying to power on populated
>>>>>> slots, right? :)
>>>>> Yes :-)
>>>>>
>>>>>> Can you send the output of lspci -vv? And I like the output of
>>>>>> lspci -vt as well... Both before and after loading acpiphp
>>>>>> please.
>>>>> Send privately.
>>>> No difference in before and after. Odd.
>>>>
>>>> If you want to poke us again after your hardware swap, please do
>>>> so. Sorry for being not so helpful. :-/
>>> Poke :-)
>>>
>>> One more thing I tried was pushing the power button on the slot
>>> manually. With acpiphp, I get the same messages as above. Using pciehp,
>>> I get the same power fault bit interrupt storm. So no difference from
>>> using the sysfs interface or doing it on the box side, doesn't work
>>> either way.
>>>
>> I'd like to confirm power fault interrupt storm, just in case.
>> Could you get /proc/interrupts information after power fault
>> problem happens and send it to me?
> 
> The box pretty much hangs when I try to power on a slot with pciehp, so
> it's not easy to do... It doesn't hang with acpiphp, but doesn't work
> either (see previous reply to Alex).
> 

Could you try the attached debugging patch? With this patch, the power
fault interrupt should be disabled after 100 power faults have been
detected (I hope so). You can get /proc/interrupts after that.
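
The attached patch is not reproduced on this page. As a rough
illustration of the behaviour described above, here is a small
userspace model, not the driver code. The struct, the function name
and PF_LIMIT are hypothetical. The two bit masks are the Power Fault
Detected bits of the PCI Express Slot Status and Slot Control
registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLTSTA_PFD  0x0002u /* Slot Status: Power Fault Detected */
#define SLTCTL_PFDE 0x0002u /* Slot Control: Power Fault Detected Enable */

#define PF_LIMIT 100u /* threshold taken from the description above */

/* Hypothetical stand-in for controller state; not the driver's struct. */
struct slot_model {
	uint16_t slot_ctrl;   /* simulated Slot Control register value */
	unsigned int pf_seen; /* power fault events counted so far */
};

/*
 * Model one pass through the interrupt handler: count power fault
 * events and clear the enable bit once PF_LIMIT events were seen.
 * Returns 1 if this call masked the interrupt, 0 otherwise.
 */
int model_isr(struct slot_model *s, uint16_t intr_loc)
{
	assert(s != NULL);

	if (!(intr_loc & SLTSTA_PFD))
		return 0; /* not a power fault event */
	if (!(s->slot_ctrl & SLTCTL_PFDE))
		return 0; /* already masked */
	if (++s->pf_seen < PF_LIMIT)
		return 0; /* still below the threshold */

	s->slot_ctrl &= (uint16_t)~SLTCTL_PFDE;
	return 1;
}
```

Starting from a Slot Control value of 0x077b, the 100th call with
intr_loc 0x2 leaves 0x0779 in this model, and later power fault
events are ignored.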

Thanks,
Kenji Kaneshige
---
 drivers/pci/hotplug/pciehp_hpc.c |    8 ++++++++
 1 file changed, 8 insertions(+)
Jens Axboe - 2009-10-28 09:23:25
On Wed, Oct 28 2009, Kenji Kaneshige wrote:
> Could you try the attached debugging patch? With this patch, the power
> fault interrupt should be disabled after 100 power faults have been
> detected (I hope so). You can get /proc/interrupts after that.

Here is the output of doing the power on with that patch applied.

pciehp 0000:00:05.0:pcie04: enable_slot: physical_slot = 1
pciehp 0000:00:05.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 77b
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 10
pciehp 0000:00:05.0:pcie04: pciehp_power_on_slot: SLOTCTRL a8 write cmd 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 10
pciehp 0000:00:05.0:pcie04: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: Power fault interrupt received
pciehp 0000:00:05.0:pcie04: Power fault on Slot(1)
pciehp 0000:00:05.0:pcie04: Power fault bit 0 set
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: Data Link Layer Link Active not set in 1000 msec
pciehp 0000:00:05.0:pcie04: pciehp_check_link_status: lnk_status = 1001
pciehp 0000:00:05.0:pcie04: Link Training Error occurs 
pciehp 0000:00:05.0:pcie04: Failed to check link status
pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
pciehp 0000:00:05.0:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 12
pciehp 0000:00:05.0:pcie04: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 12
pciehp 0000:00:05.0:pcie04: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
pciehp 0000:00:05.0:pcie04: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
pciehp 0000:00:05.0:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
pciehp 0000:00:05.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 779
pciehp 0000:00:05.0:pcie04: pciehp_get_attention_status: SLOTCTRL a8, value read 779
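
For readers decoding the hex values in this log, a minimal sketch in C
follows. It assumes that intr_loc holds PCI Express Slot Status event
bits and that lnk_status is the raw Link Status register. The function
and table names are hypothetical. The bit positions are those of the
PCI Express capability registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct bit_name {
	uint16_t bit;
	const char *name;
};

/* Slot Status event bits (low five bits of the register). */
static const struct bit_name sltsta_events[] = {
	{ 0x0001, "attention-button" },
	{ 0x0002, "power-fault" },
	{ 0x0004, "mrl-sensor-changed" },
	{ 0x0008, "presence-changed" },
	{ 0x0010, "command-completed" },
};

/*
 * Write a space-separated list of the events set in intr_loc into buf
 * (silently truncated if buf is too small). Returns the event count.
 */
int decode_intr_loc(uint16_t intr_loc, char *buf, size_t len)
{
	size_t i, used;
	int n = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(sltsta_events) / sizeof(sltsta_events[0]); i++) {
		if (!(intr_loc & sltsta_events[i].bit))
			continue;
		used = strlen(buf);
		snprintf(buf + used, len - used, "%s%s",
			 n ? " " : "", sltsta_events[i].name);
		n++;
	}
	return n;
}

/* Link Status fields: current speed, negotiated width, DLL link active. */
unsigned int lnksta_speed(uint16_t v) { return v & 0x000f; }
unsigned int lnksta_width(uint16_t v) { return (v & 0x03f0) >> 4; }
int lnksta_dll_active(uint16_t v) { return (v & 0x2000) != 0; }
```

With the values from the log, 0x10 decodes to command-completed, 0x2
to power-fault and 0x12 to both. For lnk_status 0x1001 the negotiated
width is 0 and the Data Link Layer Link Active bit is clear, which
would fit the link training error reported above.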
Kenji Kaneshige - 2009-10-29 07:44:19
Jens Axboe wrote:
> On Wed, Oct 28 2009, Kenji Kaneshige wrote:
>> Could you try the attached debugging patch? With this patch, the power
>> fault interrupt should be disabled after 100 power faults have been
>> detected (I hope so). You can get /proc/interrupts after that.
> 
> Here is the output of doing the power on with that patch applied.
> 
> [...]
> 

From the console log, it seems that my debug patch worked as I expected
(power fault event interrupts were disabled after 100 power fault events).
But for some reason, /proc/interrupts shows only 5 interrupts for
pciehp. Just in case, did you get /proc/interrupts after doing power on?
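
One way to cross-check this against the log is to compare the two
SLOTCTRL readings, 77b before power-on and 779 at the end. The sketch
below assumes those are raw Slot Control register values read at
offset a8. The helper names are hypothetical. The masks are the
standard Slot Control bit positions.

```c
#include <assert.h>
#include <stdint.h>

#define SLTCTL_PFDE 0x0002u /* Power Fault Detected Enable */
#define SLTCTL_HPIE 0x0020u /* Hot-Plug Interrupt Enable */
#define SLTCTL_PCC  0x0400u /* Power Controller Control, 1 = power off */

/* Bits that differ between two Slot Control readings. */
uint16_t sltctl_diff(uint16_t before, uint16_t after)
{
	return (uint16_t)(before ^ after);
}

int pf_irq_enabled(uint16_t sltctl)  { return (sltctl & SLTCTL_PFDE) != 0; }
int hotplug_irq_enabled(uint16_t sltctl) { return (sltctl & SLTCTL_HPIE) != 0; }
int slot_power_off(uint16_t sltctl)  { return (sltctl & SLTCTL_PCC) != 0; }

/* Self-check against the two readings taken from the console log. */
void check_log_values(void)
{
	assert(pf_irq_enabled(0x077b));
	assert(!pf_irq_enabled(0x0779));
	assert(sltctl_diff(0x077b, 0x0779) == SLTCTL_PFDE);
}
```

The only bit that differs between the two readings is the power fault
enable bit. The hot-plug interrupt enable bit is set and the power
controller bit reads "off" in both.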

Thanks,
Kenji Kaneshige



Jens Axboe - 2009-10-29 08:58:25
On Thu, Oct 29 2009, Kenji Kaneshige wrote:
> Jens Axboe wrote:
>> On Wed, Oct 28 2009, Kenji Kaneshige wrote:
>>> Could you try the attached debugging patch? With this patch, the power
>>> fault interrupt should be disabled after 100 power faults have been
>>> detected (I hope so). You can get /proc/interrupts after that.
>>
>> Here is the output of doing the power on with that patch applied.
>>
>> pciehp 0000:00:05.0:pcie04: enable_slot: physical_slot = 1
>> pciehp 0000:00:05.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 77b
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 10
>> pciehp 0000:00:05.0:pcie04: pciehp_power_on_slot: SLOTCTRL a8 write cmd 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 10
>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: Power fault interrupt received
>> pciehp 0000:00:05.0:pcie04: Power fault on Slot(1)
>> pciehp 0000:00:05.0:pcie04: Power fault bit 0 set
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: Data Link Layer Link Active not set in 1000 msec
>> pciehp 0000:00:05.0:pcie04: pciehp_check_link_status: lnk_status = 1001
>> pciehp 0000:00:05.0:pcie04: Link Training Error occurs
>> pciehp 0000:00:05.0:pcie04: Failed to check link status
>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>> pciehp 0000:00:05.0:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 12
>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 12
>> pciehp 0000:00:05.0:pcie04: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>> pciehp 0000:00:05.0:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
>> pciehp 0000:00:05.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 779
>> pciehp 0000:00:05.0:pcie04: pciehp_get_attention_status: SLOTCTRL a8, value read 779
>>
>
> From the console log, it seems that my debug patch worked as I expected
> (power fault event interrupts were disabled after 100 power fault events).
> But for some reason, /proc/interrupts indicates only 5 interrupts of
> pciehp. Just in case, did you get /proc/interrupts after doing power on?

Nope, it was captured post the power on attempt and the above log dump.
Kenji Kaneshige - 2009-10-29 09:23:06
Jens Axboe wrote:
> On Thu, Oct 29 2009, Kenji Kaneshige wrote:
>> Jens Axboe wrote:
>>> On Wed, Oct 28 2009, Kenji Kaneshige wrote:
>>>> Jens Axboe wrote:
>>>>> On Tue, Oct 27 2009, Kenji Kaneshige wrote:
>>>>>> Jens Axboe wrote:
>>>>>>> On Tue, Oct 20 2009, Alex Chiang wrote:
>>>>>>>> * Jens Axboe <jens.axboe@oracle.com>:
>>>>>>>>> On Tue, Oct 13 2009, Alex Chiang wrote:
>>>>>>>>>>>> Can you modprobe acpiphp with debug=1? And send the output?
>>>>>>>>>>> acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
>>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:05.0
>>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 1 at PCI 0000:08:00
>>>>>>>>>>> acpiphp: Slot [1] registered
>>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:07.0
>>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 2 at PCI 0000:0b:00
>>>>>>>>>>> acpiphp: Slot [2] registered
>>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:07.0
>>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 6 at PCI 0000:84:00
>>>>>>>>>>> acpiphp: Slot [6] registered
>>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:09.0
>>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 7 at PCI 0000:87:00
>>>>>>>>>>> acpiphp: Slot [7] registered
>>>>>>>>>>> acpiphp_glue: Bus 0000:87 has 1 slot
>>>>>>>>>>> acpiphp_glue: Bus 0000:84 has 1 slot
>>>>>>>>>>> acpiphp_glue: Bus 0000:0b has 1 slot
>>>>>>>>>>> acpiphp_glue: Bus 0000:08 has 1 slot
>>>>>>>>>>> acpiphp_glue: Total 4 slots
>>>>>>>>>> You mentioned in another mail that you echoed 1 into the various
>>>>>>>>>> slots' power files.
>>>>>>>>>>
>>>>>>>>>> Did you do that after modprobing acpiphp with debug=1?
>>>>>>>>>>
>>>>>>>>>> If so, there should be debug output when you try and turn them
>>>>>>>>>> on.
>>>>>>>>> It produces:
>>>>>>>>>
>>>>>>>>> acpiphp: enable_slot - physical_slot = 1
>>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>>> acpiphp: enable_slot - physical_slot = 2
>>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>>> acpiphp: enable_slot - physical_slot = 6
>>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>>> acpiphp: enable_slot - physical_slot = 7
>>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>> Hm, so for some reason, firmware on your machine is telling us
>>>>>>>> that it doesn't think cards are present and/or enabled.
>>>>>>>>
>>>>>>>> Unfortunately, I don't know why your firmware would be saying
>>>>>>>> that. We could add some more debug printks to see what firmware
>>>>>>>> thinks about your system... Or we could just wait and see what
>>>>>>>> happens after you get your hardware replaced.
>>>>>>> New board, the exact same thing happens.
>>>>>>>
>>>>>>>>> I have a card in one of the slots only this time.
>>>>>>>>>
>>>>>>>>>> Also, quick dummy check, you are trying to power on populated
>>>>>>>>>> slots, right? :)
>>>>>>>>> Yes :-)
>>>>>>>>>
>>>>>>>>>> Can you send the output of lspci -vv? And I like the output of
>>>>>>>>>> lspci -vt as well... Both before and after loading acpiphp
>>>>>>>>>> please.
>>>>>>>>> Sent privately.
>>>>>>>> No difference in before and after. Odd.
>>>>>>>>
>>>>>>>> If you want to poke us again after your hardware swap, please do
>>>>>>>> so. Sorry for being not so helpful. :-/
>>>>>>> Poke :-)
>>>>>>>
>>>>>>> One more thing I tried was pushing the power button on the slot
>>>>>>> manually. With acpiphp, I get the same messages as above. Using pciehp,
>>>>>>> I get the same power fault bit interrupt storm. So no difference from
>>>>>>> using the sysfs interface or doing it on the box side, doesn't work
>>>>>>> either way.
>>>>>>>
>>>>>> I'd like to confirm power fault interrupt storm, just in case.
>>>>>> Could you get /proc/interrupts information after power fault
>>>>>> problem happens and send it to me?
>>>>> The box pretty much hangs when I try to power on a slot with pciehp, so
>>>>> it's not easy to do... It doesn't hang with acpiphp, but doesn't work
>>>>> either (see previous reply to Alex).
>>>>>
>>>> Could you try the attached debugging patch? With this patch, power
>>>> fault interrupt would be disabled after 100 power fault detected (
>>>> I hope so). You can get /proc/interrupts after that.
>>> Here is the output of doing the power on with that patch applied.
>>>
>>> pciehp 0000:00:05.0:pcie04: enable_slot: physical_slot = 1
>>> pciehp 0000:00:05.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 77b
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 10
>>> pciehp 0000:00:05.0:pcie04: pciehp_power_on_slot: SLOTCTRL a8 write cmd 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 10
>>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: Power fault interrupt received
>>> pciehp 0000:00:05.0:pcie04: Power fault on Slot(1)
>>> pciehp 0000:00:05.0:pcie04: Power fault bit 0 set
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: Data Link Layer Link Active not set in 1000 msec
>>> pciehp 0000:00:05.0:pcie04: pciehp_check_link_status: lnk_status = 1001
>>> pciehp 0000:00:05.0:pcie04: Link Training Error occurs
>>> pciehp 0000:00:05.0:pcie04: Failed to check link status
>>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>>> pciehp 0000:00:05.0:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 12
>>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 12
>>> pciehp 0000:00:05.0:pcie04: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
>>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
>>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>>> pciehp 0000:00:05.0:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
>>> pciehp 0000:00:05.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 779
>>> pciehp 0000:00:05.0:pcie04: pciehp_get_attention_status: SLOTCTRL a8, value read 779
>>>
>> From the console log, it seems that my debug patch worked as I expected
>> (power fault event interrupts were disabled after 100 power fault events).
>> But for some reason, /proc/interrupts indicates only 5 interrupts of
>> pciehp. Just in case, did you get /proc/interrupts after doing power on?
> 
> Nope, it was captured post the power on attempt and the above log dump.
> 

Can I confirm that? (sorry for my poor English skill)

The /proc/interrupts output was captured *before* the power on attempt and the log.
Correct?

Thanks,
Kenji Kaneshige
Jens Axboe - 2009-10-29 09:24:53
On Thu, Oct 29 2009, Kenji Kaneshige wrote:
> Jens Axboe wrote:
>> On Thu, Oct 29 2009, Kenji Kaneshige wrote:
>>> Jens Axboe wrote:
>>>> On Wed, Oct 28 2009, Kenji Kaneshige wrote:
>>>>> Jens Axboe wrote:
>>>>>> On Tue, Oct 27 2009, Kenji Kaneshige wrote:
>>>>>>> Jens Axboe wrote:
>>>>>>>> On Tue, Oct 20 2009, Alex Chiang wrote:
>>>>>>>>> * Jens Axboe <jens.axboe@oracle.com>:
>>>>>>>>>> On Tue, Oct 13 2009, Alex Chiang wrote:
>>>>>>>>>>>>> Can you modprobe acpiphp with debug=1? And send the output?
>>>>>>>>>>>> acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
>>>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:05.0
>>>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 1 at PCI 0000:08:00
>>>>>>>>>>>> acpiphp: Slot [1] registered
>>>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:07.0
>>>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 2 at PCI 0000:0b:00
>>>>>>>>>>>> acpiphp: Slot [2] registered
>>>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:07.0
>>>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 6 at PCI 0000:84:00
>>>>>>>>>>>> acpiphp: Slot [6] registered
>>>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:09.0
>>>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 7 at PCI 0000:87:00
>>>>>>>>>>>> acpiphp: Slot [7] registered
>>>>>>>>>>>> acpiphp_glue: Bus 0000:87 has 1 slot
>>>>>>>>>>>> acpiphp_glue: Bus 0000:84 has 1 slot
>>>>>>>>>>>> acpiphp_glue: Bus 0000:0b has 1 slot
>>>>>>>>>>>> acpiphp_glue: Bus 0000:08 has 1 slot
>>>>>>>>>>>> acpiphp_glue: Total 4 slots
>>>>>>>>>>> You mentioned in another mail that you echoed 1 into the various
>>>>>>>>>>> slots' power files.
>>>>>>>>>>>
>>>>>>>>>>> Did you do that after modprobing acpiphp with debug=1?
>>>>>>>>>>>
>>>>>>>>>>> If so, there should be debug output when you try and turn them
>>>>>>>>>>> on.
>>>>>>>>>> It produces:
>>>>>>>>>>
>>>>>>>>>> acpiphp: enable_slot - physical_slot = 1
>>>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>>>> acpiphp: enable_slot - physical_slot = 2
>>>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>>>> acpiphp: enable_slot - physical_slot = 6
>>>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>>>> acpiphp: enable_slot - physical_slot = 7
>>>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>>> Hm, so for some reason, firmware on your machine is telling us
>>>>>>>>> that it doesn't think cards are present and/or enabled.
>>>>>>>>>
>>>>>>>>> Unfortunately, I don't know why your firmware would be saying
>>>>>>>>> that. We could add some more debug printks to see what firmware
>>>>>>>>> thinks about your system... Or we could just wait and see what
>>>>>>>>> happens after you get your hardware replaced.
>>>>>>>> New board, the exact same thing happens.
>>>>>>>>
>>>>>>>>>> I have a card in one of the slots only this time.
>>>>>>>>>>
>>>>>>>>>>> Also, quick dummy check, you are trying to power on populated
>>>>>>>>>>> slots, right? :)
>>>>>>>>>> Yes :-)
>>>>>>>>>>
>>>>>>>>>>> Can you send the output of lspci -vv? And I like the output of
>>>>>>>>>>> lspci -vt as well... Both before and after loading acpiphp
>>>>>>>>>>> please.
>>>>>>>>>> Sent privately.
>>>>>>>>> No difference in before and after. Odd.
>>>>>>>>>
>>>>>>>>> If you want to poke us again after your hardware swap, please do
>>>>>>>>> so. Sorry for being not so helpful. :-/
>>>>>>>> Poke :-)
>>>>>>>>
>>>>>>>> One more thing I tried was pushing the power button on the slot
>>>>>>>> manually. With acpiphp, I get the same messages as above. Using pciehp,
>>>>>>>> I get the same power fault bit interrupt storm. So no difference from
>>>>>>>> using the sysfs interface or doing it on the box side, doesn't work
>>>>>>>> either way.
>>>>>>>>
>>>>>>> I'd like to confirm power fault interrupt storm, just in case.
>>>>>>> Could you get /proc/interrupts information after power fault
>>>>>>> problem happens and send it to me?
>>>>>> The box pretty much hangs when I try to power on a slot with pciehp, so
>>>>>> it's not easy to do... It doesn't hang with acpiphp, but doesn't work
>>>>>> either (see previous reply to Alex).
>>>>>>
>>>>> Could you try the attached debugging patch? With this patch, power
>>>>> fault interrupt would be disabled after 100 power fault detected (
>>>>> I hope so). You can get /proc/interrupts after that.
>>>> Here is the output of doing the power on with that patch applied.
>>>>
>>>> pciehp 0000:00:05.0:pcie04: enable_slot: physical_slot = 1
>>>> pciehp 0000:00:05.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 77b
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 10
>>>> pciehp 0000:00:05.0:pcie04: pciehp_power_on_slot: SLOTCTRL a8 write cmd 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 10
>>>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: Power fault interrupt received
>>>> pciehp 0000:00:05.0:pcie04: Power fault on Slot(1)
>>>> pciehp 0000:00:05.0:pcie04: Power fault bit 0 set
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: Data Link Layer Link Active not set in 1000 msec
>>>> pciehp 0000:00:05.0:pcie04: pciehp_check_link_status: lnk_status = 1001
>>>> pciehp 0000:00:05.0:pcie04: Link Training Error occurs
>>>> pciehp 0000:00:05.0:pcie04: Failed to check link status
>>>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>>>> pciehp 0000:00:05.0:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 12
>>>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 12
>>>> pciehp 0000:00:05.0:pcie04: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
>>>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>>>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
>>>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>>>> pciehp 0000:00:05.0:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
>>>> pciehp 0000:00:05.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 779
>>>> pciehp 0000:00:05.0:pcie04: pciehp_get_attention_status: SLOTCTRL a8, value read 779
>>>>
>>> From the console log, it seems that my debug patch worked as I expected
>>> (power fault event interrupts were disabled after 100 power fault events).
>>> But for some reason, /proc/interrupts indicates only 5 interrupts of
>>> pciehp. Just in case, did you get /proc/interrupts after doing power on?
>>
>> Nope, it was captured post the power on attempt and the above log dump.
>>
>
> Can I confirm that? (sorry for my poor English skill)
>
> The /proc/interrupts output was captured *before* the power on attempt and the log.
> Correct?

No, the /proc/interrupts output was captured AFTER the power on attempt
and the log capture shown above.

Patch

Index: 20091026/drivers/pci/hotplug/pciehp_hpc.c
===================================================================
--- 20091026.orig/drivers/pci/hotplug/pciehp_hpc.c
+++ 20091026/drivers/pci/hotplug/pciehp_hpc.c
@@ -612,6 +612,7 @@  static irqreturn_t pcie_isr(int irq, voi
 	struct controller *ctrl = (struct controller *)dev_id;
 	struct slot *slot = ctrl->slot;
 	u16 detected, intr_loc;
+	static int nr_power_faults = 0;
 
 	/*
 	 * In order to guarantee that all interrupt events are
@@ -664,6 +665,13 @@  static irqreturn_t pcie_isr(int irq, voi
 	if (intr_loc & PCI_EXP_SLTSTA_PDC)
 		pciehp_handle_presence_change(slot);
 
+	if ((intr_loc & PCI_EXP_SLTSTA_PFD) && (++nr_power_faults > 100)) {
+		u16 reg16;
+		pciehp_readw(ctrl, PCI_EXP_SLTCTL, &reg16);
+		reg16 &= ~PCI_EXP_SLTCTL_PFDE;
+		pciehp_writew(ctrl, PCI_EXP_SLTCTL, reg16);
+	}
+
 	/* Check Power Fault Detected */
 	if ((intr_loc & PCI_EXP_SLTSTA_PFD) && !ctrl->power_fault_detected) {
 		ctrl->power_fault_detected = 1;