From patchwork Mon Mar 11 21:18:40 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin Mokrejs X-Patchwork-Id: 2251321 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: patchwork-linux-pci@patchwork.kernel.org Delivered-To: patchwork-process-083081@patchwork1.kernel.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by patchwork1.kernel.org (Postfix) with ESMTP id A02D74020C for ; Mon, 11 Mar 2013 21:18:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932193Ab3CKVSr (ORCPT ); Mon, 11 Mar 2013 17:18:47 -0400 Received: from fold.natur.cuni.cz ([195.113.57.32]:53404 "HELO fold.natur.cuni.cz" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S932188Ab3CKVSp (ORCPT ); Mon, 11 Mar 2013 17:18:45 -0400 Received: (qmail 5211 invoked from network); 11 Mar 2013 21:18:41 -0000 Received: from unknown (HELO ?192.168.251.6?) (192.168.251.6) by 192.168.251.1 with SMTP; 11 Mar 2013 21:18:41 -0000 Message-ID: <513E4A30.5070909@fold.natur.cuni.cz> Date: Mon, 11 Mar 2013 22:18:40 +0100 From: Martin Mokrejs User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:19.0) Gecko/20100101 Firefox/19.0 SeaMonkey/2.16 MIME-Version: 1.0 To: "linux-pci@vger.kernel.org" , Bjorn Helgaas , "Rafael J. Wysocki" , Yinghai Lu Subject: 3.7.10: acpiphp with pcie_aspm=off and eSATA card Sil 3132 X-Enigmail-Version: 1.5 Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org Hi, while testing how acpiphp behaves when an eSATA express card is repeatedly inserted and removed from a slot, I realized few new things. The while true; do echo -n "$(date +%T) SlotStatus "; setpci -s1c.7 0x5a.w; sleep 1; done reveals that when I cold boot with a card in a slot, it reports 0040. When I boot with an empty slot, we start with 0000 instead. The first insertion of the card moves us to 0140. If the mechanics works well, value stays at 0140 and upon sucessfull eject it falls down to 0100. If the card slips out (like in this case) the value reports 0140 while the slot is actually empty. Then the next insert in unnoticed, value is still 0140, and following eject (even eject) rescues us and the slot reports 0100 properly. The above is a "general" behavior I would say. I haven't seen the 0040 status before but could be I did not test coldplug with pciehp yet while using the setpci shell command. A side note: To prepare for even dimmer future, let me tell you that I already saw more values! During eject there is also 0148 possible and that falls down to 0108 upon eject. Funny was that I left the eSATA card in the slot while starring at the new numbers and meanwhile they started due to some background activity changing back and forth (I am not showing it here). Probably because of the sata_sil24 being half dead. I am placing a file http://195.113.57.32/~mmokrejs/tmp/3.7.10_acpiphp_pcie_aspm_off.tar.bz2 (200kB) which contains lspci outputs every second and lsusb, dmesg: while true; do lspci -vvvxxx > "lspci_loop.$(date +%T)"; sleep 1; done It seems lspci does not always report all changes in sufficient detail unless I use -vvvxxx. Below is tracking by every second when I inserted a card and it slipped out, followed by another (successful) insertion into the slot. I am showing some of the diffs here but open along this email a file slot_status.txt from the tar.bz2 above. Definitely, below you can see PresDet is changed on the SltSta Status: line only while NOT on the Changed: line. From other threads I opened I conclude this is a common bug to pciehp and acpiphp, or is a lspci bug, or a hardware/bios bug. I speculate the "SltSta: Changed:" line of lspci is not updated properly, or at all, not only the PresDet in it. I am curious whether the below hex diffs could be transformed into some readable line in default, "lspci -vv" output. # diff -u -w lspci_loop.20:17:32 lspci_loop.20:17:33 Another insert ... # diff -u -w lspci_loop.20:18:49 lspci_loop.20:18:52 etc. Now, maybe somebody have the guts to look into the held locks in hung tasks as shown in dmesg_after_alt_sysrq_w.txt. ;-) [ 361.740663] INFO: task kworker/u:4:1615 blocked for more than 120 seconds. [ 361.740670] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 361.740675] kworker/u:4 D ffffffff81ca46c0 0 1615 2 0x00000000 [ 361.740688] ffff88040b3b9c88 0000000000000046 0000000000000096 ffff88040d2f5d00 [ 361.740698] ffffffff815e48c1 ffff88040b8a6c80 ffff88040b3b9fd8 ffff88040b3b9fd8 [ 361.740707] 0000000000011700 ffff88040b8a6c80 0000000000000292 ffff880405d08000 [ 361.740718] Call Trace: [ 361.740738] [] ? _raw_spin_unlock_irqrestore+0x3a/0x41 [ 361.740741] [] schedule+0x5f/0x61 [ 361.740745] [] ata_port_wait_eh+0x6d/0xcb [ 361.740750] [] ? __init_waitqueue_head+0x4c/0x4c [ 361.740752] [] ata_port_probe+0x25/0x32 [ 361.740754] [] async_port_probe+0x30/0x4d [ 361.740758] [] async_run_entry_fn+0xc0/0x1a4 [ 361.740760] [] process_one_work+0x1ae/0x2d2 [ 361.740762] [] ? process_one_work+0x144/0x2d2 [ 361.740771] [] ? async_schedule+0x12/0x12 [ 361.740773] [] worker_thread+0x13e/0x1d5 [ 361.740774] [] ? manage_workers+0x23d/0x23d [ 361.740776] [] kthread+0xac/0xb4 [ 361.740778] [] ? __init_kthread_worker+0x54/0x54 [ 361.740781] [] ret_from_fork+0x7c/0xb0 [ 361.740783] [] ? __init_kthread_worker+0x54/0x54 [ 361.740785] 2 locks held by kworker/u:4/1615: [ 361.740785] #0: (events_unbound){.+.+.+}, at: [] process_one_work+0x144/0x2d2 [ 361.740790] #1: ((&entry->work)){+.+.+.}, at: [] process_one_work+0x144/0x2d2 Sure, acpiphp (but also pciehp) should prevent any drivers to pickup the newly formed device and force 6sec delay, because the cards sometimes slip out of the slot. It is meaningless to introduce the delay in every PCI driver, that should be handled by those handling express slot hotplug. Until then, we will see in logs things like this. The eSATA used to OOPs my kernel hardly in 3.4 and the Firewire still kills the kernel with 3.7 and 3.9-rc1 (stacks not shown, were posted already in the past). Compared with that, the kworker hung task is an improvement as it does not kill my running kernel and my work. ;-) Martin --- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html --- lspci_loop.20:17:32 2013-03-11 20:17:32.000000000 +0100 +++ lspci_loop.20:17:33 2013-03-11 20:17:33.000000000 +0100 @@ -516,7 +516,7 @@ 30: 00 00 00 00 50 00 00 00 00 00 00 00 05 01 00 00 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 50: 01 58 c2 c9 00 00 00 00 0a 98 a0 20 00 00 00 00 -60: 20 20 ff 07 00 00 00 00 01 00 00 01 00 00 00 40 +60: 20 20 ff 07 00 00 00 00 01 00 00 01 00 00 08 40 70: 00 00 df 3f 00 00 00 00 00 00 00 00 00 00 00 00 80: 00 00 80 00 11 88 0c 93 30 0d 00 24 00 00 00 00 90: 00 00 00 00 00 00 00 00 13 00 06 03 00 00 00 00 # diff -u -w lspci_loop.20:17:33 lspci_loop.20:17:34 --- lspci_loop.20:17:33 2013-03-11 20:17:33.000000000 +0100 +++ lspci_loop.20:17:34 2013-03-11 20:17:34.000000000 +0100 @@ -516,15 +516,15 @@ 30: 00 00 00 00 50 00 00 00 00 00 00 00 05 01 00 00 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 50: 01 58 c2 c9 00 00 00 00 0a 98 a0 20 00 00 00 00 -60: 20 20 ff 07 00 00 00 00 01 00 00 01 00 00 08 40 +60: 20 20 ff 07 00 00 00 00 01 00 00 01 00 00 00 40 70: 00 00 df 3f 00 00 00 00 00 00 00 00 00 00 00 00 80: 00 00 80 00 11 88 0c 93 30 0d 00 24 00 00 00 00 -90: 00 00 00 00 00 00 00 00 13 00 06 03 00 00 00 00 +90: 00 00 00 00 00 00 00 00 13 00 06 03 00 01 00 00 a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0: 00 00 00 00 00 aa ff 00 00 00 00 00 00 00 00 00 -e0: 00 00 00 00 00 00 00 00 00 00 00 00 04 30 00 d6 +e0: 00 00 00 00 00 00 00 00 00 00 00 00 04 33 00 d6 f0: 00 00 00 00 88 85 80 00 87 0f 05 08 08 17 5b 20 # diff -u -w lspci_loop.20:17:34 lspci_loop.20:17:35 --- lspci_loop.20:17:34 2013-03-11 20:17:34.000000000 +0100 +++ lspci_loop.20:17:35 2013-03-11 20:17:35.000000000 +0100 @@ -516,7 +516,7 @@ 30: 00 00 00 00 50 00 00 00 00 00 00 00 05 01 00 00 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 50: 01 58 c2 c9 00 00 00 00 0a 98 a0 20 00 00 00 00 -60: 20 20 ff 07 00 00 00 00 01 00 00 01 00 00 00 40 +60: 20 20 ff 07 00 00 00 00 01 00 00 01 00 00 08 40 70: 00 00 df 3f 00 00 00 00 00 00 00 00 00 00 00 00 80: 00 00 80 00 11 88 0c 93 30 0d 00 24 00 00 00 00 90: 00 00 00 00 00 00 00 00 13 00 06 03 00 01 00 00 @@ -524,7 +524,7 @@ b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0: 00 00 00 00 00 aa ff 00 00 00 00 00 00 00 00 00 -e0: 00 00 00 00 00 00 00 00 00 00 00 00 04 33 00 d6 +e0: 00 00 00 00 80 8d 0d 00 00 c0 00 d6 04 33 00 d6 f0: 00 00 00 00 88 85 80 00 87 0f 05 08 08 17 5b 20 # diff -u -w lspci_loop.20:17:35 lspci_loop.20:17:37 --- lspci_loop.20:17:35 2013-03-11 20:17:35.000000000 +0100 +++ lspci_loop.20:17:37 2013-03-11 20:17:37.000000000 +0100 @@ -519,12 +519,12 @@ 60: 20 20 ff 07 00 00 00 00 01 00 00 01 00 00 08 40 70: 00 00 df 3f 00 00 00 00 00 00 00 00 00 00 00 00 80: 00 00 80 00 11 88 0c 93 30 0d 00 24 00 00 00 00 -90: 00 00 00 00 00 00 00 00 13 00 06 03 00 01 00 00 +90: 00 00 00 00 00 00 00 00 13 00 06 03 00 00 00 00 a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0: 00 00 00 00 00 aa ff 00 00 00 00 00 00 00 00 00 -e0: 00 00 00 00 80 8d 0d 00 00 c0 00 d6 04 33 00 d6 +e0: 00 00 00 00 00 00 00 00 00 00 00 00 04 33 00 d6 f0: 00 00 00 00 88 85 80 00 87 0f 05 08 08 17 5b 20 # diff -u -w lspci_loop.20:17:37 lspci_loop.20:17:38 --- lspci_loop.20:17:37 2013-03-11 20:17:37.000000000 +0100 +++ lspci_loop.20:17:38 2013-03-11 20:17:38.000000000 +0100 @@ -524,7 +524,7 @@ b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0: 00 00 00 00 00 aa ff 00 00 00 00 00 00 00 00 00 -e0: 00 00 00 00 00 00 00 00 00 00 00 00 04 33 00 d6 +e0: 00 00 00 00 00 00 00 00 00 00 00 00 04 30 00 d6 f0: 00 00 00 00 88 85 80 00 87 0f 05 08 08 17 5b 20 # diff -u -w lspci_loop.20:17:39 lspci_loop.20:17:40 --- lspci_loop.20:17:39 2013-03-11 20:17:39.000000000 +0100 +++ lspci_loop.20:17:40 2013-03-11 20:17:40.000000000 +0100 @@ -519,12 +519,12 @@ 60: 20 20 ff 07 00 00 00 00 01 00 00 01 00 00 08 40 70: 00 00 df 3f 00 00 00 00 00 00 00 00 00 00 00 00 80: 00 00 80 00 11 88 0c 93 30 0d 00 24 00 00 00 00 -90: 00 00 00 00 00 00 00 00 13 00 06 03 00 00 00 00 +90: 00 00 00 00 00 00 00 00 13 00 06 03 00 01 00 00 a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0: 00 00 00 00 00 aa ff 00 00 00 00 00 00 00 00 00 -e0: 00 00 00 00 00 00 00 00 00 00 00 00 04 30 00 d6 +e0: 00 00 00 00 80 8d 0d 80 00 c0 00 d6 04 33 00 d6 f0: 00 00 00 00 88 85 80 00 87 0f 05 08 08 17 5b 20 # diff -u -w lspci_loop.20:17:40 lspci_loop.20:17:41 --- lspci_loop.20:17:40 2013-03-11 20:17:40.000000000 +0100 +++ lspci_loop.20:17:41 2013-03-11 20:17:41.000000000 +0100 @@ -519,12 +519,12 @@ 60: 20 20 ff 07 00 00 00 00 01 00 00 01 00 00 08 40 70: 00 00 df 3f 00 00 00 00 00 00 00 00 00 00 00 00 80: 00 00 80 00 11 88 0c 93 30 0d 00 24 00 00 00 00 -90: 00 00 00 00 00 00 00 00 13 00 06 03 00 01 00 00 +90: 00 00 00 00 00 00 00 00 13 00 06 03 00 00 00 00 a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0: 00 00 00 00 00 aa ff 00 00 00 00 00 00 00 00 00 -e0: 00 00 00 00 80 8d 0d 80 00 c0 00 d6 04 33 00 d6 +e0: 00 00 00 00 00 00 00 00 00 00 00 00 04 33 00 d6 f0: 00 00 00 00 88 85 80 00 87 0f 05 08 08 17 5b 20 # diff -u -w lspci_loop.20:17:41 lspci_loop.20:17:42 --- lspci_loop.20:17:41 2013-03-11 20:17:41.000000000 +0100 +++ lspci_loop.20:17:42 2013-03-11 20:17:42.000000000 +0100 @@ -453,12 +453,12 @@ ClockPM- Surprise- LLActRep+ BwNot- LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- - LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt- + LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt- SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Slot #7, PowerLimit 10.000W; Interlock- NoCompl+ SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg- Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock- - SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock- + SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock- Changed: MRL- PresDet- LinkState+ RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible- RootCap: CRSVisible- @@ -482,7 +482,7 @@ 20: c0 f6 c0 f7 01 f0 01 f1 00 00 00 00 00 00 00 00 30: 00 00 00 00 40 00 00 00 00 00 00 00 0a 04 10 00 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 3c 12 08 -50: 40 00 11 50 60 b2 3c 00 00 00 00 01 00 00 00 00 +50: 40 00 11 70 60 b2 3c 00 00 00 40 01 00 00 00 00 60: 00 00 00 00 16 00 00 00 00 00 00 00 00 00 00 00 70: 01 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 80: 05 90 00 00 00 00 00 00 00 00 00 00 00 00 00 00 @@ -516,7 +516,7 @@ 30: 00 00 00 00 50 00 00 00 00 00 00 00 05 01 00 00 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 50: 01 58 c2 c9 00 00 00 00 0a 98 a0 20 00 00 00 00 -60: 20 20 ff 07 00 00 00 00 01 00 00 01 00 00 08 40 +60: 20 20 ff 07 00 00 00 00 01 00 00 01 00 00 00 40 70: 00 00 df 3f 00 00 00 00 00 00 00 00 00 00 00 00 80: 00 00 80 00 11 88 0c 93 30 0d 00 24 00 00 00 00 90: 00 00 00 00 00 00 00 00 13 00 06 03 00 00 00 00 @@ -524,7 +524,7 @@ b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0: 00 00 00 00 00 aa ff 00 00 00 00 00 00 00 00 00 -e0: 00 00 00 00 00 00 00 00 00 00 00 00 04 33 00 d6 +e0: 00 00 00 00 00 00 00 00 00 00 00 00 04 30 00 d6 f0: 00 00 00 00 88 85 80 00 87 0f 05 08 08 17 5b 20 00:1f.0 ISA bridge: Intel Corporation HM67 Express Chipset Family LPC Controller (rev 05) @@ -799,3 +799,56 @@ e0: 00 00 40 63 00 00 00 00 00 00 00 00 00 00 00 00 f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 +11:00.0 Mass storage controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01) + Subsystem: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller + Physical Slot: 1 + Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- + Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR-