Message ID | CALCETrWdQwvMcgC=PESyECzkyMD8-vzjNEHOCkm5z=ENH7coHg@mail.gmail.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
> > I doubt I can bisect -- the trigger was a new AP, not a new kernel. I > can't exactly cut the AP in half :) I see.. This is really weird though. Anyway. > > > Pre-suspend, i.e., working: > > [ 20.949900] enabled = 1, wowlan = 0 > [ 20.950177] enabled = 1, wowlan = 0 > [ 21.614016] enabled = 1, wowlan = 0 > [ 21.614658] enabled = 1, wowlan = 0 > [ 42.667586] enabled = 0, wowlan = 0 > [ 42.672514] enabled = 1, wowlan = 0 > [ 53.088165] fuse init (API version 7.23) > [ 53.102082] SELinux: initialized (dev fuse, type fuse), uses genfs_contexts > [ 53.130945] SELinux: initialized (dev fusectl, type fusectl), uses > genfs_contexts > [ 85.627558] enabled = 0, wowlan = 0 > [ 85.631686] enabled = 1, wowlan = 0 > [ 134.649346] e1000e: em1 NIC Link is Down > [ 137.682277] wlan0: deauthenticating from 02:c6:26:cc:b4:c7 by local > choice (Reason: 3=DEAUTH_LEAVING) > [ 137.682780] enabled = 0, wowlan = 0 > [ 137.693889] enabled = 0, wowlan = 0 > > Post-suspend, i.e., not working: > > [ 144.406303] enabled = 1, wowlan = 0 > [ 144.406496] enabled = 1, wowlan = 0 > [ 145.026827] enabled = 1, wowlan = 0 > [ 145.028211] enabled = 1, wowlan = 0 > [ 165.688632] enabled = 0, wowlan = 0 > [ 165.689960] enabled = 0, wowlan = 0 > [ 165.693988] enabled = 1, wowlan = 0 > [ 165.694245] enabled = 1, wowlan = 0 > [ 208.641426] enabled = 0, wowlan = 0 > [ 208.641786] enabled = 0, wowlan = 0 > [ 208.647499] enabled = 1, wowlan = 0 > [ 208.647639] enabled = 1, wowlan = 0 > [ 271.435558] enabled = 0, wowlan = 0 > [ 271.435767] enabled = 0, wowlan = 0 > [ 271.440125] enabled = 1, wowlan = 0 > [ 271.440405] enabled = 1, wowlan = 0 > > With even more instrumentation added, I did get a glitch before > suspend/resume, but it came with more than two power setting updates. > Logs and patch attached, complete with call stacks. > I don't see any callstacks? Doesn't matter though. > Is it possible that part of the problem is that the firmware doesn't > like being spammed with power config changes so frequently? > Don't think so. But the suspend / resume thing is not the kind of things that are fun to debug. I am afraid that I will disable power save completely since I don't really have the time to dig into that. -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, May 28, 2014 at 5:09 AM, Emmanuel Grumbach <egrumbach@gmail.com> wrote: >> >> I doubt I can bisect -- the trigger was a new AP, not a new kernel. I >> can't exactly cut the AP in half :) > > I see.. This is really weird though. Anyway. > >> >> >> Pre-suspend, i.e., working: >> >> [ 20.949900] enabled = 1, wowlan = 0 >> [ 20.950177] enabled = 1, wowlan = 0 >> [ 21.614016] enabled = 1, wowlan = 0 >> [ 21.614658] enabled = 1, wowlan = 0 >> [ 42.667586] enabled = 0, wowlan = 0 >> [ 42.672514] enabled = 1, wowlan = 0 >> [ 53.088165] fuse init (API version 7.23) >> [ 53.102082] SELinux: initialized (dev fuse, type fuse), uses genfs_contexts >> [ 53.130945] SELinux: initialized (dev fusectl, type fusectl), uses >> genfs_contexts >> [ 85.627558] enabled = 0, wowlan = 0 >> [ 85.631686] enabled = 1, wowlan = 0 >> [ 134.649346] e1000e: em1 NIC Link is Down >> [ 137.682277] wlan0: deauthenticating from 02:c6:26:cc:b4:c7 by local >> choice (Reason: 3=DEAUTH_LEAVING) >> [ 137.682780] enabled = 0, wowlan = 0 >> [ 137.693889] enabled = 0, wowlan = 0 >> >> Post-suspend, i.e., not working: >> >> [ 144.406303] enabled = 1, wowlan = 0 >> [ 144.406496] enabled = 1, wowlan = 0 >> [ 145.026827] enabled = 1, wowlan = 0 >> [ 145.028211] enabled = 1, wowlan = 0 >> [ 165.688632] enabled = 0, wowlan = 0 >> [ 165.689960] enabled = 0, wowlan = 0 >> [ 165.693988] enabled = 1, wowlan = 0 >> [ 165.694245] enabled = 1, wowlan = 0 >> [ 208.641426] enabled = 0, wowlan = 0 >> [ 208.641786] enabled = 0, wowlan = 0 >> [ 208.647499] enabled = 1, wowlan = 0 >> [ 208.647639] enabled = 1, wowlan = 0 >> [ 271.435558] enabled = 0, wowlan = 0 >> [ 271.435767] enabled = 0, wowlan = 0 >> [ 271.440125] enabled = 1, wowlan = 0 >> [ 271.440405] enabled = 1, wowlan = 0 >> >> With even more instrumentation added, I did get a glitch before >> suspend/resume, but it came with more than two power setting updates. >> Logs and patch attached, complete with call stacks. >> > > I don't see any callstacks? > Doesn't matter though. I think the callstacks were in the attachment. I could have messed up, though. Anyway, I don't buy the theory that this is caused by the firmware going out to lunch. The queues files in debugfs show the rx queue chugging along and all of the tx queues have read_ptr == write_ptr. Wireshark shows incoming broadcast traffic, too. I'd guess that the problem is more likely to be that the card is failing to wake up and notice pending data in the TIM. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jun 2, 2014 at 9:54 AM, Andy Lutomirski <luto@amacapital.net> wrote: > On Wed, May 28, 2014 at 5:09 AM, Emmanuel Grumbach <egrumbach@gmail.com> wrote: >>> >>> I doubt I can bisect -- the trigger was a new AP, not a new kernel. I >>> can't exactly cut the AP in half :) >> >> I see.. This is really weird though. Anyway. >> >>> >>> >>> Pre-suspend, i.e., working: >>> >>> [ 20.949900] enabled = 1, wowlan = 0 >>> [ 20.950177] enabled = 1, wowlan = 0 >>> [ 21.614016] enabled = 1, wowlan = 0 >>> [ 21.614658] enabled = 1, wowlan = 0 >>> [ 42.667586] enabled = 0, wowlan = 0 >>> [ 42.672514] enabled = 1, wowlan = 0 >>> [ 53.088165] fuse init (API version 7.23) >>> [ 53.102082] SELinux: initialized (dev fuse, type fuse), uses genfs_contexts >>> [ 53.130945] SELinux: initialized (dev fusectl, type fusectl), uses >>> genfs_contexts >>> [ 85.627558] enabled = 0, wowlan = 0 >>> [ 85.631686] enabled = 1, wowlan = 0 >>> [ 134.649346] e1000e: em1 NIC Link is Down >>> [ 137.682277] wlan0: deauthenticating from 02:c6:26:cc:b4:c7 by local >>> choice (Reason: 3=DEAUTH_LEAVING) >>> [ 137.682780] enabled = 0, wowlan = 0 >>> [ 137.693889] enabled = 0, wowlan = 0 >>> >>> Post-suspend, i.e., not working: >>> >>> [ 144.406303] enabled = 1, wowlan = 0 >>> [ 144.406496] enabled = 1, wowlan = 0 >>> [ 145.026827] enabled = 1, wowlan = 0 >>> [ 145.028211] enabled = 1, wowlan = 0 >>> [ 165.688632] enabled = 0, wowlan = 0 >>> [ 165.689960] enabled = 0, wowlan = 0 >>> [ 165.693988] enabled = 1, wowlan = 0 >>> [ 165.694245] enabled = 1, wowlan = 0 >>> [ 208.641426] enabled = 0, wowlan = 0 >>> [ 208.641786] enabled = 0, wowlan = 0 >>> [ 208.647499] enabled = 1, wowlan = 0 >>> [ 208.647639] enabled = 1, wowlan = 0 >>> [ 271.435558] enabled = 0, wowlan = 0 >>> [ 271.435767] enabled = 0, wowlan = 0 >>> [ 271.440125] enabled = 1, wowlan = 0 >>> [ 271.440405] enabled = 1, wowlan = 0 >>> >>> With even more instrumentation added, I did get a glitch before >>> suspend/resume, but it came with more than two power setting updates. >>> Logs and patch attached, complete with call stacks. >>> >> >> I don't see any callstacks? >> Doesn't matter though. > > I think the callstacks were in the attachment. I could have messed up, though. > > Anyway, I don't buy the theory that this is caused by the firmware > going out to lunch. The queues files in debugfs show the rx queue > chugging along and all of the tx queues have read_ptr == write_ptr. > Wireshark shows incoming broadcast traffic, too. I'd guess that the > problem is more likely to be that the card is failing to wake up and > notice pending data in the TIM. OTOH, with iwlwifi.11n_disable=4 (no rx A-MPDU), I seem to be doing pretty well. I'll test a stock kernel configured like that for the next few days. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 06/02/2014 09:11 PM, Andy Lutomirski wrote: > On Mon, Jun 2, 2014 at 9:54 AM, Andy Lutomirski <luto@amacapital.net> wrote: >> On Wed, May 28, 2014 at 5:09 AM, Emmanuel Grumbach <egrumbach@gmail.com> wrote: >>>> >>>> I doubt I can bisect -- the trigger was a new AP, not a new kernel. I >>>> can't exactly cut the AP in half :) >>> >>> I see.. This is really weird though. Anyway. >>> >>>> >>>> >>>> Pre-suspend, i.e., working: >>>> >>>> [ 20.949900] enabled = 1, wowlan = 0 >>>> [ 20.950177] enabled = 1, wowlan = 0 >>>> [ 21.614016] enabled = 1, wowlan = 0 >>>> [ 21.614658] enabled = 1, wowlan = 0 >>>> [ 42.667586] enabled = 0, wowlan = 0 >>>> [ 42.672514] enabled = 1, wowlan = 0 >>>> [ 53.088165] fuse init (API version 7.23) >>>> [ 53.102082] SELinux: initialized (dev fuse, type fuse), uses genfs_contexts >>>> [ 53.130945] SELinux: initialized (dev fusectl, type fusectl), uses >>>> genfs_contexts >>>> [ 85.627558] enabled = 0, wowlan = 0 >>>> [ 85.631686] enabled = 1, wowlan = 0 >>>> [ 134.649346] e1000e: em1 NIC Link is Down >>>> [ 137.682277] wlan0: deauthenticating from 02:c6:26:cc:b4:c7 by local >>>> choice (Reason: 3=DEAUTH_LEAVING) >>>> [ 137.682780] enabled = 0, wowlan = 0 >>>> [ 137.693889] enabled = 0, wowlan = 0 >>>> >>>> Post-suspend, i.e., not working: >>>> >>>> [ 144.406303] enabled = 1, wowlan = 0 >>>> [ 144.406496] enabled = 1, wowlan = 0 >>>> [ 145.026827] enabled = 1, wowlan = 0 >>>> [ 145.028211] enabled = 1, wowlan = 0 >>>> [ 165.688632] enabled = 0, wowlan = 0 >>>> [ 165.689960] enabled = 0, wowlan = 0 >>>> [ 165.693988] enabled = 1, wowlan = 0 >>>> [ 165.694245] enabled = 1, wowlan = 0 >>>> [ 208.641426] enabled = 0, wowlan = 0 >>>> [ 208.641786] enabled = 0, wowlan = 0 >>>> [ 208.647499] enabled = 1, wowlan = 0 >>>> [ 208.647639] enabled = 1, wowlan = 0 >>>> [ 271.435558] enabled = 0, wowlan = 0 >>>> [ 271.435767] enabled = 0, wowlan = 0 >>>> [ 271.440125] enabled = 1, wowlan = 0 >>>> [ 271.440405] enabled = 1, wowlan = 0 >>>> >>>> With even more instrumentation added, I did get a glitch before >>>> suspend/resume, but it came with more than two power setting updates. >>>> Logs and patch attached, complete with call stacks. >>>> >>> >>> I don't see any callstacks? >>> Doesn't matter though. >> >> I think the callstacks were in the attachment. I could have messed up, though. >> >> Anyway, I don't buy the theory that this is caused by the firmware >> going out to lunch. The queues files in debugfs show the rx queue >> chugging along and all of the tx queues have read_ptr == write_ptr. >> Wireshark shows incoming broadcast traffic, too. I'd guess that the >> problem is more likely to be that the card is failing to wake up and >> notice pending data in the TIM. Well... I might have been unclear here (I never know how much detail I should share with the recipient :)). From your log it appears that the NIC is in power save. So we can't increment the write pointer of the Tx ring (add a packet for transmission). So we simply remember that we need to do so (increment the write pointer) and request a wakeup so that we will update the write pointer in the wakeup interrupt... which doesn't happen. No power save - no need for wakeup interrupt. > > OTOH, with iwlwifi.11n_disable=4 (no rx A-MPDU), I seem to be doing > pretty well. I'll test a stock kernel configured like that for the > next few days. > That's interesting... -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jun 2, 2014 at 11:40 AM, Emmanuel Grumbach <egrumbach@gmail.com> wrote: > > > On 06/02/2014 09:11 PM, Andy Lutomirski wrote: >> On Mon, Jun 2, 2014 at 9:54 AM, Andy Lutomirski <luto@amacapital.net> wrote: >>> On Wed, May 28, 2014 at 5:09 AM, Emmanuel Grumbach <egrumbach@gmail.com> wrote: >>>>> >>>>> I doubt I can bisect -- the trigger was a new AP, not a new kernel. I >>>>> can't exactly cut the AP in half :) >>>> >>>> I see.. This is really weird though. Anyway. >>>> >>>>> >>>>> >>>>> Pre-suspend, i.e., working: >>>>> >>>>> [ 20.949900] enabled = 1, wowlan = 0 >>>>> [ 20.950177] enabled = 1, wowlan = 0 >>>>> [ 21.614016] enabled = 1, wowlan = 0 >>>>> [ 21.614658] enabled = 1, wowlan = 0 >>>>> [ 42.667586] enabled = 0, wowlan = 0 >>>>> [ 42.672514] enabled = 1, wowlan = 0 >>>>> [ 53.088165] fuse init (API version 7.23) >>>>> [ 53.102082] SELinux: initialized (dev fuse, type fuse), uses genfs_contexts >>>>> [ 53.130945] SELinux: initialized (dev fusectl, type fusectl), uses >>>>> genfs_contexts >>>>> [ 85.627558] enabled = 0, wowlan = 0 >>>>> [ 85.631686] enabled = 1, wowlan = 0 >>>>> [ 134.649346] e1000e: em1 NIC Link is Down >>>>> [ 137.682277] wlan0: deauthenticating from 02:c6:26:cc:b4:c7 by local >>>>> choice (Reason: 3=DEAUTH_LEAVING) >>>>> [ 137.682780] enabled = 0, wowlan = 0 >>>>> [ 137.693889] enabled = 0, wowlan = 0 >>>>> >>>>> Post-suspend, i.e., not working: >>>>> >>>>> [ 144.406303] enabled = 1, wowlan = 0 >>>>> [ 144.406496] enabled = 1, wowlan = 0 >>>>> [ 145.026827] enabled = 1, wowlan = 0 >>>>> [ 145.028211] enabled = 1, wowlan = 0 >>>>> [ 165.688632] enabled = 0, wowlan = 0 >>>>> [ 165.689960] enabled = 0, wowlan = 0 >>>>> [ 165.693988] enabled = 1, wowlan = 0 >>>>> [ 165.694245] enabled = 1, wowlan = 0 >>>>> [ 208.641426] enabled = 0, wowlan = 0 >>>>> [ 208.641786] enabled = 0, wowlan = 0 >>>>> [ 208.647499] enabled = 1, wowlan = 0 >>>>> [ 208.647639] enabled = 1, wowlan = 0 >>>>> [ 271.435558] enabled = 0, wowlan = 0 >>>>> [ 271.435767] enabled = 0, wowlan = 0 >>>>> [ 271.440125] enabled = 1, wowlan = 0 >>>>> [ 271.440405] enabled = 1, wowlan = 0 >>>>> >>>>> With even more instrumentation added, I did get a glitch before >>>>> suspend/resume, but it came with more than two power setting updates. >>>>> Logs and patch attached, complete with call stacks. >>>>> >>>> >>>> I don't see any callstacks? >>>> Doesn't matter though. >>> >>> I think the callstacks were in the attachment. I could have messed up, though. >>> >>> Anyway, I don't buy the theory that this is caused by the firmware >>> going out to lunch. The queues files in debugfs show the rx queue >>> chugging along and all of the tx queues have read_ptr == write_ptr. >>> Wireshark shows incoming broadcast traffic, too. I'd guess that the >>> problem is more likely to be that the card is failing to wake up and >>> notice pending data in the TIM. > > Well... I might have been unclear here (I never know how much detail I should share with the recipient :)). > From your log it appears that the NIC is in power save. So we can't increment the write pointer of the Tx ring (add a packet for transmission). So we simply remember that we need to do so (increment the write pointer) and request a wakeup so that we will update the write pointer in the wakeup interrupt... which doesn't happen. > No power save - no need for wakeup interrupt. I'm still unconvinced. One of the tx queues actually has a both read_ptr and write_ptr incrementing once or twice per second even when I can't ping the gateway. Can you point me at the right code or log stuff to look at? > >> >> OTOH, with iwlwifi.11n_disable=4 (no rx A-MPDU), I seem to be doing >> pretty well. I'll test a stock kernel configured like that for the >> next few days. >> > > That's interesting... --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>> >>>> Anyway, I don't buy the theory that this is caused by the firmware >>>> going out to lunch. The queues files in debugfs show the rx queue >>>> chugging along and all of the tx queues have read_ptr == write_ptr. >>>> Wireshark shows incoming broadcast traffic, too. I'd guess that the >>>> problem is more likely to be that the card is failing to wake up and >>>> notice pending data in the TIM. >> >> Well... I might have been unclear here (I never know how much detail I should share with the recipient :)). >> From your log it appears that the NIC is in power save. So we can't increment the write pointer of the Tx ring (add a packet for transmission). So we simply remember that we need to do so (increment the write pointer) and request a wakeup so that we will update the write pointer in the wakeup interrupt... which doesn't happen. >> No power save - no need for wakeup interrupt. > > I'm still unconvinced. One of the tx queues actually has a both > read_ptr and write_ptr incrementing once or twice per second even when > I can't ping the gateway. Can you point me at the right code or log > stuff to look at? drivers/net/wireless/iwlwifi/pcie/tx.c: static void iwl_pcie_txq_inc_wr_ptr(struct iwl_trans *trans, struct iwl_txq *txq) [snip] if (reg & CSR_UCODE_DRV_GP1_BIT_MAC_SLEEP) { IWL_DEBUG_INFO(trans, "Tx queue %d requesting wakeup, GP1 = 0x%x\n", txq_id, reg); iwl_set_bit(trans, CSR_GP_CNTRL, CSR_GP_CNTRL_REG_FLAG_MAC_ACCESS_REQ); txq->need_update = true; return; } [snip] } ISR - drivers/net/wireless/iwlwifi/pcie/tx.c: irqreturn_t iwl_pcie_irq_handler(int irq, void *dev_id) { [snip] /* uCode wakes up after power-down sleep */ if (inta & CSR_INT_BIT_WAKEUP) { IWL_DEBUG_ISR(trans, "Wakeup interrupt\n"); iwl_pcie_rxq_check_wrptr(trans); iwl_pcie_txq_check_wrptrs(trans); isr_stats->wakeup++; handled |= CSR_INT_BIT_WAKEUP; } [snip] } drivers/net/wireless/iwlwifi/pcie/tx.c: void iwl_pcie_txq_check_wrptrs(struct iwl_trans *trans) { struct iwl_trans_pcie *trans_pcie = IWL_TRANS_GET_PCIE_TRANS(trans); int i; for (i = 0; i < trans->cfg->base_params->num_of_queues; i++) { struct iwl_txq *txq = &trans_pcie->txq[i]; spin_lock_bh(&txq->lock); if (trans_pcie->txq[i].need_update) { iwl_pcie_txq_inc_wr_ptr(trans, txq); trans_pcie->txq[i].need_update = false; } spin_unlock_bh(&txq->lock); } } Note that the CMD queue (9 or 4 depending on your configuration) - uses another mechanism that is safer but consumes more power. This queue is intended for commands and not for real Tx packets. > >> >>> >>> OTOH, with iwlwifi.11n_disable=4 (no rx A-MPDU), I seem to be doing >>> pretty well. I'll test a stock kernel configured like that for the >>> next few days. >>> >> >> That's interesting... > > --Andy > -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jun 2, 2014 at 11:53 AM, Emmanuel Grumbach <egrumbach@gmail.com> wrote: > >>>>> >>>>> Anyway, I don't buy the theory that this is caused by the firmware >>>>> going out to lunch. The queues files in debugfs show the rx queue >>>>> chugging along and all of the tx queues have read_ptr == write_ptr. >>>>> Wireshark shows incoming broadcast traffic, too. I'd guess that the >>>>> problem is more likely to be that the card is failing to wake up and >>>>> notice pending data in the TIM. >>> >>> Well... I might have been unclear here (I never know how much detail I should share with the recipient :)). >>> From your log it appears that the NIC is in power save. So we can't increment the write pointer of the Tx ring (add a packet for transmission). So we simply remember that we need to do so (increment the write pointer) and request a wakeup so that we will update the write pointer in the wakeup interrupt... which doesn't happen. >>> No power save - no need for wakeup interrupt. >> >> I'm still unconvinced. One of the tx queues actually has a both >> read_ptr and write_ptr incrementing once or twice per second even when >> I can't ping the gateway. Can you point me at the right code or log >> stuff to look at? > > drivers/net/wireless/iwlwifi/pcie/tx.c: > static void iwl_pcie_txq_inc_wr_ptr(struct iwl_trans *trans, > struct iwl_txq *txq) > [snip] > > if (reg & CSR_UCODE_DRV_GP1_BIT_MAC_SLEEP) { > IWL_DEBUG_INFO(trans, "Tx queue %d requesting wakeup, GP1 = 0x%x\n", > txq_id, reg); > iwl_set_bit(trans, CSR_GP_CNTRL, > CSR_GP_CNTRL_REG_FLAG_MAC_ACCESS_REQ); > txq->need_update = true; > return; > } > [snip] > } > > ISR - drivers/net/wireless/iwlwifi/pcie/tx.c: > irqreturn_t iwl_pcie_irq_handler(int irq, void *dev_id) > { > [snip] > /* uCode wakes up after power-down sleep */ > if (inta & CSR_INT_BIT_WAKEUP) { > IWL_DEBUG_ISR(trans, "Wakeup interrupt\n"); > iwl_pcie_rxq_check_wrptr(trans); > iwl_pcie_txq_check_wrptrs(trans); > > isr_stats->wakeup++; > > handled |= CSR_INT_BIT_WAKEUP; > } > [snip] > } > > drivers/net/wireless/iwlwifi/pcie/tx.c: > void iwl_pcie_txq_check_wrptrs(struct iwl_trans *trans) > { > struct iwl_trans_pcie *trans_pcie = IWL_TRANS_GET_PCIE_TRANS(trans); > int i; > > for (i = 0; i < trans->cfg->base_params->num_of_queues; i++) { > struct iwl_txq *txq = &trans_pcie->txq[i]; > > spin_lock_bh(&txq->lock); > if (trans_pcie->txq[i].need_update) { > iwl_pcie_txq_inc_wr_ptr(trans, txq); > trans_pcie->txq[i].need_update = false; > } > spin_unlock_bh(&txq->lock); > } > } > > Note that the CMD queue (9 or 4 depending on your configuration) - uses another mechanism that is safer but consumes more power. This queue is intended for commands and not for real Tx packets. If that's queue 4 in the debugfs tx_queues file, then I think that that's the queue that keeps moving when the card is dead. I'll see if I can improve the tx_queues file to show pending updates. Yay troubleshooting old hardware. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>> >>>>>> >>>>>> Anyway, I don't buy the theory that this is caused by the firmware >>>>>> going out to lunch. The queues files in debugfs show the rx queue >>>>>> chugging along and all of the tx queues have read_ptr == write_ptr. >>>>>> Wireshark shows incoming broadcast traffic, too. I'd guess that the >>>>>> problem is more likely to be that the card is failing to wake up and >>>>>> notice pending data in the TIM. >>>> >>>> Well... I might have been unclear here (I never know how much detail I should share with the recipient :)). >>>> From your log it appears that the NIC is in power save. So we can't increment the write pointer of the Tx ring (add a packet for transmission). So we simply remember that we need to do so (increment the write pointer) and request a wakeup so that we will update the write pointer in the wakeup interrupt... which doesn't happen. >>>> No power save - no need for wakeup interrupt. >>> >>> I'm still unconvinced. One of the tx queues actually has a both >>> read_ptr and write_ptr incrementing once or twice per second even when >>> I can't ping the gateway. Can you point me at the right code or log >>> stuff to look at? >> >> drivers/net/wireless/iwlwifi/pcie/tx.c: >> static void iwl_pcie_txq_inc_wr_ptr(struct iwl_trans *trans, >> struct iwl_txq *txq) >> [snip] >> >> if (reg & CSR_UCODE_DRV_GP1_BIT_MAC_SLEEP) { >> IWL_DEBUG_INFO(trans, "Tx queue %d requesting wakeup, GP1 = 0x%x\n", >> txq_id, reg); >> iwl_set_bit(trans, CSR_GP_CNTRL, >> CSR_GP_CNTRL_REG_FLAG_MAC_ACCESS_REQ); >> txq->need_update = true; >> return; >> } >> [snip] >> } >> >> ISR - drivers/net/wireless/iwlwifi/pcie/tx.c: >> irqreturn_t iwl_pcie_irq_handler(int irq, void *dev_id) >> { >> [snip] >> /* uCode wakes up after power-down sleep */ >> if (inta & CSR_INT_BIT_WAKEUP) { >> IWL_DEBUG_ISR(trans, "Wakeup interrupt\n"); >> iwl_pcie_rxq_check_wrptr(trans); >> iwl_pcie_txq_check_wrptrs(trans); >> >> isr_stats->wakeup++; >> >> handled |= CSR_INT_BIT_WAKEUP; >> } >> [snip] >> } >> >> drivers/net/wireless/iwlwifi/pcie/tx.c: >> void iwl_pcie_txq_check_wrptrs(struct iwl_trans *trans) >> { >> struct iwl_trans_pcie *trans_pcie = IWL_TRANS_GET_PCIE_TRANS(trans); >> int i; >> >> for (i = 0; i < trans->cfg->base_params->num_of_queues; i++) { >> struct iwl_txq *txq = &trans_pcie->txq[i]; >> >> spin_lock_bh(&txq->lock); >> if (trans_pcie->txq[i].need_update) { >> iwl_pcie_txq_inc_wr_ptr(trans, txq); >> trans_pcie->txq[i].need_update = false; >> } >> spin_unlock_bh(&txq->lock); >> } >> } >> >> Note that the CMD queue (9 or 4 depending on your configuration) - uses another mechanism that is safer but consumes more power. This queue is intended for commands and not for real Tx packets. > > If that's queue 4 in the debugfs tx_queues file, then I think that > that's the queue that keeps moving when the card is dead. I'll see if > I can improve the tx_queues file to show pending updates. > > Yay troubleshooting old hardware. > If you plan to spend some time hacking on the code, please take into account this code has been changed in 3.14 (mostly cleanups by Johannes and myself). Also note that long ago, I disabled shadow registers (you'll see ifs all over the code). Shadow registers makes the whole wakup thing much easier since it is handled internally in the HW. BUT we had tons of issues with it so we disabled it. Most of the issues should have gone away with the cmd_in_flight thing. So I would recommend you to give it a try with shadow register enabled. Just check the relevant iwl-XXXX.c file. -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jun 2, 2014 at 12:32 PM, Emmanuel Grumbach <egrumbach@gmail.com> wrote: >>> Note that the CMD queue (9 or 4 depending on your configuration) - uses another mechanism that is safer but consumes more power. This queue is intended for commands and not for real Tx packets. >> >> If that's queue 4 in the debugfs tx_queues file, then I think that >> that's the queue that keeps moving when the card is dead. I'll see if >> I can improve the tx_queues file to show pending updates. >> >> Yay troubleshooting old hardware. >> > If you plan to spend some time hacking on the code, please take into > account this code has been changed in 3.14 (mostly cleanups by > Johannes and myself). > Also note that long ago, I disabled shadow registers (you'll see ifs > all over the code). Shadow registers makes the whole wakup thing much > easier since it is handled internally in the HW. BUT we had tons of > issues with it so we disabled it. Most of the issues should have gone > away with the cmd_in_flight thing. So I would recommend you to give it > a try with shadow register enabled. Just check the relevant iwl-XXXX.c > file. Will do. FWIW, I looked at the cmd_in_flight thing and, unless there's a bunch of locking magic going on, it looks racy. It seems like some code expects it to be protected by reg_lock, and other code expects it to be protected by txq_lock (and there are multiple txq->lock instances, I think). Also, what stops the card from going to sleep while processing a wakeup interrupt? For example, what prevents this chain of events? The only thing I can see is that the only thing that clears ACCESS_REQ is the rx interrupt processing, which happens after wakeup processing, but this seems rather fragile. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>>> If that's queue 4 in the debugfs tx_queues file, then I think that >>> that's the queue that keeps moving when the card is dead. I'll see if >>> I can improve the tx_queues file to show pending updates. >>> >>> Yay troubleshooting old hardware. >>> >> If you plan to spend some time hacking on the code, please take into >> account this code has been changed in 3.14 (mostly cleanups by >> Johannes and myself). >> Also note that long ago, I disabled shadow registers (you'll see ifs >> all over the code). Shadow registers makes the whole wakup thing much >> easier since it is handled internally in the HW. BUT we had tons of >> issues with it so we disabled it. Most of the issues should have gone >> away with the cmd_in_flight thing. So I would recommend you to give it >> a try with shadow register enabled. Just check the relevant iwl-XXXX.c >> file. > > Will do. > > FWIW, I looked at the cmd_in_flight thing and, unless there's a bunch > of locking magic going on, it looks racy. It seems like some code > expects it to be protected by reg_lock, and other code expects it to > be protected by txq_lock (and there are multiple txq->lock instances, > I think). :) Yeah - there is lockdep magic there. In any case, we set it when we enqueue and host command and clear it when the last command is being responded. When we set it, we request access, and vice versa. > > Also, what stops the card from going to sleep while processing a > wakeup interrupt? For example, what prevents this chain of events? > The only thing I can see is that the only thing that clears ACCESS_REQ > is the rx interrupt processing, which happens after wakeup processing, > but this seems rather fragile. > As long as ACCESS_REQ isn't cleared, the NIC is awake. We set it when we we want to wakeup the NIC. I can't really find where we clear it right now, but I am half sleeping :) -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
commit 5225b6b3256c399d29de1e0db1915e07005516e7 Author: Andy Lutomirski <luto@amacapital.net> Date: Tue May 27 10:06:09 2014 -0700 iwlwifi pm hack diff --git a/drivers/net/wireless/iwlwifi/dvm/power.c b/drivers/net/wireless/iwlwifi/dvm/power.c index b4e6141..5d94162 100644 --- a/drivers/net/wireless/iwlwifi/dvm/power.c +++ b/drivers/net/wireless/iwlwifi/dvm/power.c @@ -288,10 +288,15 @@ static void iwl_power_build_cmd(struct iwl_priv *priv, bool enabled = priv->hw->conf.flags & IEEE80211_CONF_PS; int dtimper; + printk(KERN_INFO "enabled = %d, wowlan = %d\n", (int)enabled, (int)priv->wowlan); + dump_stack(); + dtimper = priv->hw->conf.ps_dtim_period ?: 1; if (priv->wowlan) iwl_static_sleep_cmd(priv, cmd, IWL_POWER_INDEX_5, dtimper); + if (!enabled) + iwl_power_sleep_cam_cmd(priv, cmd); else if (!priv->lib->no_idle_support && priv->hw->conf.flags & IEEE80211_CONF_IDLE) iwl_static_sleep_cmd(priv, cmd, IWL_POWER_INDEX_5, 20); @@ -299,9 +304,7 @@ static void iwl_power_build_cmd(struct iwl_priv *priv, /* in thermal throttling low power state */ iwl_static_sleep_cmd(priv, cmd, iwl_tt_current_power_mode(priv), dtimper); - } else if (!enabled) - iwl_power_sleep_cam_cmd(priv, cmd); - else if (priv->power_data.debug_sleep_level_override >= 0) + } else if (priv->power_data.debug_sleep_level_override >= 0) iwl_static_sleep_cmd(priv, cmd, priv->power_data.debug_sleep_level_override, dtimper);