
amth10k: fix promisc handling

Message ID 1431434736-7077-1-git-send-email-michal.kazior@tieto.com (mailing list archive)
State Changes Requested

Commit Message

Michal Kazior May 12, 2015, 12:45 p.m. UTC
Patch df1404650ccb ("mac80211: remove support for
IFF_PROMISC") removed promiscuous flag propagation
to drivers.

However the patch was designed against ath10k
without 548462133d98 ("ath10k: fix interrupt
storm").

After the merge the code was no longer correct:
the monitor vdev was started overzealously, which
caused IBSS to crash on 999.999.0.636 for QCA988X
(this firmware revision is known to have issues
with monitor vdev).

This patch keeps expectations of commit
548462133d98 (i.e. reduce irq storm by not
enabling monitor vdev for AP) and doesn't break
existing (known) setups that imply promiscuous
mode on network interfaces.

Contrary to what it looks like, the 548462133d98
functionality is not reverted, since its intention
was a subset of what df1404650ccb did.

Fixes: c17c997d5613 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next")
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
---
 drivers/net/wireless/ath/ath10k/mac.c | 29 +----------------------------
 1 file changed, 1 insertion(+), 28 deletions(-)

Comments

Michal Kazior May 21, 2015, 5:40 a.m. UTC | #1
On 12 May 2015 at 14:45, Michal Kazior <michal.kazior@tieto.com> wrote:
> Patch df1404650ccb ("mac80211: remove support for
> IFF_PROMISC") removed promiscuous flag propagation
> to drivers.
>
> However the patch was designed against ath10k
> without 548462133d98 ("ath10k: fix interrupt
> storm").
>
> After merge the code drifted into being no longer
> correct and due to monitor vdev being
> overzealously started caused IBSS to crash on
> 999.999.0.636 for QCA988X (this firmware revision
> is known to have issues with monitor vdev).
>
> This patch keeps expectations of commit
> 548462133d98 (i.e. reduce irq storm by not
> enabling monitor vdev for AP) and doesn't break
> existing (known) setups that imply promiscuous
> mode on network interfaces.
>
> Contrary to what it looks like 548462133d98
> functionality is not reverted since the intention
> was a subset of what df1404650ccb did.
>
> Fixes: c17c997d5613 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next")
> Signed-off-by: Michal Kazior <michal.kazior@tieto.com>

Apparently this also fixes some weird issues with qca6174 hw2.1, notably:
 - ath10k causing other devices in a BSS to disconnect
 - random FW crashes

Both problems started to happen because c17c997d5613 enabled monitor
vdev by default on STA interfaces. It seems that qca6174 hw2.1
firmware has issues similar to those of qca988x 999.999.0.636
regarding monitor vdev operation.

Also, I've made a typo in the subject.

I'll post v2 with subject fixed and extended commit log later.


Michał
Kalle Valo May 21, 2015, 7:40 a.m. UTC | #2
Adding John as this involves wireless-testing.

Michal Kazior <michal.kazior@tieto.com> writes:

> On 12 May 2015 at 14:45, Michal Kazior <michal.kazior@tieto.com> wrote:
>> Patch df1404650ccb ("mac80211: remove support for
>> IFF_PROMISC") removed promiscuous flag propagation
>> to drivers.
>>
>> However the patch was designed against ath10k
>> without 548462133d98 ("ath10k: fix interrupt
>> storm").
>>
>> After merge the code drifted into being no longer
>> correct and due to monitor vdev being
>> overzealously started caused IBSS to crash on
>> 999.999.0.636 for QCA988X (this firmware revision
>> is known to have issues with monitor vdev).
>>
>> This patch keeps expectations of commit
>> 548462133d98 (i.e. reduce irq storm by not
>> enabling monitor vdev for AP) and doesn't break
>> existing (known) setups that imply promiscuous
>> mode on network interfaces.
>>
>> Contrary to what it looks like 548462133d98
>> functionality is not reverted since the intention
>> was a subset of what df1404650ccb did.
>>
>> Fixes: c17c997d5613 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next")
>> Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
>
> Apparently this also fixes some weird issues with qca6174 hw2.1 notably:
>  - ath10k causing disconnecting of other devices in a BSS
>  - random Fw crashes
>
> Both problems started to happen because c17c997d5613 enabled monitor
> vdev by default on STA interfaces. It seems that qca6174 hw2.1
> firmware has issues similar to those of qca988x 999.999.0.636
> regarding monitor vdev operation.
>
> Also, I've made a typo in the subject.
>
> I'll post v2 with subject fixed and extended commit log later.

Keep in mind that c17c997d5613 is actually from wireless-testing.git
which means that it will never go to wireless-drivers-next.git nor to
net-next.git. So the merge conflict bug is purely in
wireless-testing.git and in master branch of ath.git (but not in
ath-next branch!).

I think John should apply your v2 patch once you send it. But if you
have something which should be fixed in ath-next remember to send that
in a separate patch so that I can apply that directly to ath-next.
Kalle Valo May 25, 2015, 12:25 p.m. UTC | #3
Kalle Valo <kvalo@qca.qualcomm.com> writes:

> Adding John as this involved wireless-testing
>
> Michal Kazior <michal.kazior@tieto.com> writes:
>
>> On 12 May 2015 at 14:45, Michal Kazior <michal.kazior@tieto.com> wrote:
>>> Patch df1404650ccb ("mac80211: remove support for
>>> IFF_PROMISC") removed promiscuous flag propagation
>>> to drivers.
>>>
>>> However the patch was designed against ath10k
>>> without 548462133d98 ("ath10k: fix interrupt
>>> storm").
>>>
>>> After merge the code drifted into being no longer
>>> correct and due to monitor vdev being
>>> overzealously started caused IBSS to crash on
>>> 999.999.0.636 for QCA988X (this firmware revision
>>> is known to have issues with monitor vdev).
>>>
>>> This patch keeps expectations of commit
>>> 548462133d98 (i.e. reduce irq storm by not
>>> enabling monitor vdev for AP) and doesn't break
>>> existing (known) setups that imply promiscuous
>>> mode on network interfaces.
>>>
>>> Contrary to what it looks like 548462133d98
>>> functionality is not reverted since the intention
>>> was a subset of what df1404650ccb did.
>>>
>>> Fixes: c17c997d5613 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next")
>>> Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
>>
>> Apparently this also fixes some weird issues with qca6174 hw2.1 notably:
>>  - ath10k causing disconnecting of other devices in a BSS
>>  - random Fw crashes
>>
>> Both problems started to happen because c17c997d5613 enabled monitor
>> vdev by default on STA interfaces. It seems that qca6174 hw2.1
>> firmware has issues similar to those of qca988x 999.999.0.636
>> regarding monitor vdev operation.
>>
>> Also, I've made a typo in the subject.
>>
>> I'll post v2 with subject fixed and extended commit log later.
>
> Keep in mind that c17c997d5613 is actually from wireless-testing.git
> which means that it will never go to wireless-drivers-next.git nor to
> net-next.git. So the merge conflict bug is purely in
> wireless-testing.git and in master branch of ath.git (but not in
> ath-next branch!).
>
> I think John should apply your v2 patch once you send it. But if you
> have something which should be fixed in ath-next remember to send that
> in a separate patch so that I can apply that directly to ath-next.

Actually, now that Dave has pulled my pull request, the issue is already
fixed in wireless-drivers-next. So once John pulls from
wireless-drivers-next and makes sure that ath10k is 100% identical in
both trees, the issue should be sorted out with no need for extra patches.
Sebastian Gottschall May 25, 2015, 5:10 p.m. UTC | #4
Hello,

would it be possible to add an ACK timing feature to the ath10k firmware
(QCA9880 internal register 0x8014, mask 0x3FFF)?

regards,

Sebastian
Ben Greear May 25, 2015, 5:13 p.m. UTC | #5
On 05/25/2015 10:10 AM, Sebastian Gottschall wrote:
> Hello
>
> could it be possible to add a ACK timing feature to the ath10k firmware (QCA9880 internal register 0x8014, mask 0x3FFF)

You just need ability to set this register to some value?

If so, probably something I could add to CT firmware, at least.

Thanks,
Ben
Sebastian Gottschall May 25, 2015, 5:48 p.m. UTC | #6
On 25.05.2015 at 19:13, Ben Greear wrote:
>
>
> On 05/25/2015 10:10 AM, Sebastian Gottschall wrote:
>> Hello
>>
>> could it be possible to add a ACK timing feature to the ath10k 
>> firmware (QCA9880 internal register 0x8014, mask 0x3FFF)
>
> You just need ability to set this register to some value?
>
> If so, probably something I could add to CT firmware, at least.
>
Not only that. This register is rewritten on each reset (channel change
etc.), so it needs to be handled correctly.
Yes, just writing and handling the ACK value would be enough. The math
behind it is no problem.
Otherwise it's impossible to do long-range links with ath10k. (LSDK-based
firmware from Compex does support this feature, unlike ath10k.)

For distance handling the following parameters must be adjustable (in
ath9k we implemented the coverageclass attribute for it, which was based
on my previous work on madwifi).
Since I just have an old ath10k firmware source which I never got working
(a working toolchain is missing), I'll just write down the register
definitions which must be adjustable.
The math etc. for calculating these values can be done later by me in ath10k.
OS_REG_WRITE(MAC_DCU_GBL_IFS_SLOT_ADDRESS, MAC_DCU_GBL_IFS_SLOT_DURATION_SET(your_slot_time_here * 88)); // (default value is 9)
OS_REG_WRITE(MAC_DCU_GBL_IFS_SIFS_ADDRESS, MAC_DCU_GBL_IFS_SIFS_DURATION_SET(your_sifs_time_here * 88)); // (default value is 14)
OS_REG_WRITE(MAC_DCU_GBL_IFS_EIFS_ADDRESS, MAC_DCU_GBL_IFS_SIFS_DURATION_SET(your_eifs_time_here * 88)); // (default value is 92)
OS_REG_WRITE(MAC_PCU_ACK_CTS_TIMEOUT_ADDRESS, MAC_PCU_ACK_CTS_TIMEOUT_ACK_TIMEOUT_SET(your_ack_time_here * 88)); // (default value is 30)
OS_REG_WRITE(MAC_PCU_ACK_CTS_TIMEOUT_ADDRESS, MAC_PCU_ACK_CTS_TIMEOUT_CTS_TIMEOUT_SET(your_cts_time_here * 88)); // (default value is 30)


These registers are prewritten using the ini array named
qca9880_peregrine_bimodal_asic_mac.

It's possible to adjust them using debugfs reg_value and reg_addr, but as
I said, on each channel change or internal reset the registers are
overwritten with default values. So it is best to adjust them directly
after the registers are written from the ini array.
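
The timing values above could be computed along these lines. This is only a rough sketch, not firmware code. The factor of 88 ticks per microsecond, register 0x8014 and the 0x3FFF ACK mask come from this thread. The CTS field position (bits 16 and up, as in the ath9k layout), the helper names and the ~300 m/us propagation figure are assumptions made for illustration. The two timeout fields are packed into one value on the assumption that a plain register write replaces the whole register, in which case two separate writes to the same address would overwrite each other.

```c
#include <assert.h>
#include <stdint.h>

/* From the thread: timing values are written as microseconds * 88. */
#define TICKS_PER_US      88u
/* From the thread: ACK timeout field of register 0x8014, mask 0x3FFF. */
#define ACK_TIMEOUT_MASK  0x3FFFu
/* Assumption (ath9k-style layout): CTS timeout sits in bits 16..29. */
#define CTS_TIMEOUT_SHIFT 16
#define CTS_TIMEOUT_MASK  (0x3FFFu << CTS_TIMEOUT_SHIFT)

/* Extra round-trip air time in us for a link of distance_m metres,
 * rounded up, using roughly 300 m/us for the speed of light. */
static uint32_t round_trip_us(uint32_t distance_m)
{
	return (2u * distance_m + 299u) / 300u;
}

/* Convert microseconds to register ticks, clamped to the 14-bit field. */
static uint32_t us_to_field(uint32_t us)
{
	uint32_t ticks;

	assert(us <= UINT32_MAX / TICKS_PER_US); /* avoid overflow */
	ticks = us * TICKS_PER_US;
	return ticks > ACK_TIMEOUT_MASK ? ACK_TIMEOUT_MASK : ticks;
}

/* Build one combined value holding both the ACK and the CTS timeout. */
static uint32_t ack_cts_timeout_reg(uint32_t ack_us, uint32_t cts_us)
{
	uint32_t reg = us_to_field(ack_us) & ACK_TIMEOUT_MASK;

	reg |= (us_to_field(cts_us) << CTS_TIMEOUT_SHIFT) & CTS_TIMEOUT_MASK;
	return reg;
}
```

For example, if the default of 30 mentioned above is in microseconds, a 10 km link would add round_trip_us(10000) = 67 us, and ack_cts_timeout_reg(97, 97) would place 8536 ticks in each field. Whatever computes the value, it would still have to be reapplied after every reset, as described above.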


> Thanks,
> Ben
>
>
Ben Greear May 25, 2015, 5:53 p.m. UTC | #7
On 05/25/2015 10:48 AM, Sebastian Gottschall wrote:
> Am 25.05.2015 um 19:13 schrieb Ben Greear:
>>
>>
>> On 05/25/2015 10:10 AM, Sebastian Gottschall wrote:
>>> Hello
>>>
>>> could it be possible to add a ACK timing feature to the ath10k firmware (QCA9880 internal register 0x8014, mask 0x3FFF)
>>
>> You just need ability to set this register to some value?
>>
>> If so, probably something I could add to CT firmware, at least.
>>
> not alone. this register is rewritten on each reset (channel change etc.) so it needs to be correct handled.
> yes. just writing and handling the ack value would be enough. the math behind is no problem.
> otherwise its impossible todo long range links with ath10k. (LSDK based firmware from compex do support this feature unlike ath10k)

I'll see if I can add this to my firmware, probably will be a few days before I can get
time to work on it.  Will post to list when I have a FW build ready for testing.

Thanks,
Ben
Sebastian Gottschall May 25, 2015, 7:21 p.m. UTC | #8
On 25.05.2015 at 19:53, Ben Greear wrote:
>
>
> On 05/25/2015 10:48 AM, Sebastian Gottschall wrote:
>> Am 25.05.2015 um 19:13 schrieb Ben Greear:
>>>
>>>
>>> On 05/25/2015 10:10 AM, Sebastian Gottschall wrote:
>>>> Hello
>>>>
>>>> could it be possible to add a ACK timing feature to the ath10k 
>>>> firmware (QCA9880 internal register 0x8014, mask 0x3FFF)
>>>
>>> You just need ability to set this register to some value?
>>>
>>> If so, probably something I could add to CT firmware, at least.
>>>
>> not alone. this register is rewritten on each reset (channel change 
>> etc.) so it needs to be correct handled.
>> yes. just writing and handling the ack value would be enough. the 
>> math behind is no problem.
>> otherwise its impossible todo long range links with ath10k. (LSDK 
>> based firmware from compex do support this feature unlike ath10k)
>
> I'll see if I can add this to my firmware, probably will be a few days 
> before I can get
> time to work on it.  Will post to list when I have a FW build ready 
> for testing.
Do you plan to bring your codebase up to 10.2.4 with API 5 at some point?
Or is the code already up to date and just using the old API?
>
> Thanks,
> Ben
>
Ben Greear May 25, 2015, 7:32 p.m. UTC | #9
On 05/25/2015 12:21 PM, Sebastian Gottschall wrote:
> Am 25.05.2015 um 19:53 schrieb Ben Greear:
>>
>>
>> On 05/25/2015 10:48 AM, Sebastian Gottschall wrote:
>>> Am 25.05.2015 um 19:13 schrieb Ben Greear:
>>>>
>>>>
>>>> On 05/25/2015 10:10 AM, Sebastian Gottschall wrote:
>>>>> Hello
>>>>>
>>>>> could it be possible to add a ACK timing feature to the ath10k firmware (QCA9880 internal register 0x8014, mask 0x3FFF)
>>>>
>>>> You just need ability to set this register to some value?
>>>>
>>>> If so, probably something I could add to CT firmware, at least.
>>>>
>>> not alone. this register is rewritten on each reset (channel change etc.) so it needs to be correct handled.
>>> yes. just writing and handling the ack value would be enough. the math behind is no problem.
>>> otherwise its impossible todo long range links with ath10k. (LSDK based firmware from compex do support this feature unlike ath10k)
>>
>> I'll see if I can add this to my firmware, probably will be a few days before I can get
>> time to work on it.  Will post to list when I have a FW build ready for testing.
> do you plan to bring up your codebase to 10.2.4 with api 5 one time?
> or is the code already up to date, just using the old api?

I'm having a slow time getting updated source from QCA, but I plan to
move to a newer code base when I can get access.

For now, my firmware is based on 10.1.467, but it has quite a bit of improvements
and changes.  It does not support some of the newer chipsets that newer QCA
firmware supports.

Thanks,
Ben
Sebastian Gottschall May 25, 2015, 8:31 p.m. UTC | #10
On 25.05.2015 at 21:32, Ben Greear wrote:
>
>
> On 05/25/2015 12:21 PM, Sebastian Gottschall wrote:
>> Am 25.05.2015 um 19:53 schrieb Ben Greear:
>>>
>>>
>>> On 05/25/2015 10:48 AM, Sebastian Gottschall wrote:
>>>> Am 25.05.2015 um 19:13 schrieb Ben Greear:
>>>>>
>>>>>
>>>>> On 05/25/2015 10:10 AM, Sebastian Gottschall wrote:
>>>>>> Hello
>>>>>>
>>>>>> could it be possible to add a ACK timing feature to the ath10k 
>>>>>> firmware (QCA9880 internal register 0x8014, mask 0x3FFF)
>>>>>
>>>>> You just need ability to set this register to some value?
>>>>>
>>>>> If so, probably something I could add to CT firmware, at least.
>>>>>
>>>> not alone. this register is rewritten on each reset (channel change 
>>>> etc.) so it needs to be correct handled.
>>>> yes. just writing and handling the ack value would be enough. the 
>>>> math behind is no problem.
>>>> otherwise its impossible todo long range links with ath10k. (LSDK 
>>>> based firmware from compex do support this feature unlike ath10k)
>>>
>>> I'll see if I can add this to my firmware, probably will be a few 
>>> days before I can get
>>> time to work on it.  Will post to list when I have a FW build ready 
>>> for testing.
>> do you plan to bring up your codebase to 10.2.4 with api 5 one time?
>> or is the code already up to date, just using the old api?
>
> I'm having a slow time getting updated source from QCA, but I plan to
> move to a newer code base when I can get access.
>
> For now, my firmware is based on 10.1.467, but it has quite a bit of 
> improvements
> and changes.  It does not support some of the newer chipsets that 
> newer QCA
> firmware supports.
As far as I have seen, each new chipset has its own firmware. The standard
firmware will only support AR9880 v2.
Since I'm only working on embedded devices which only run on
AR9880 v2 based chipsets, this isn't a big issue.

Sebastian
>
> Thanks,
> Ben
>
>
Sebastian Gottschall May 25, 2015, 9:26 p.m. UTC | #11
Today, using the latest testing driver, I found that the memory
consumption is unbelievably high.
My router here has 64 MB of RAM. This RAM is fully taken by ath10k after
a few minutes, but only if there is data flow.

Here are the results of "free" after a few minutes:
root@DD-WRT:~# free
                    total         used         free       shared      buffers
Mem:                61636        58752         2884            0         2600
-/+ buffers:                     56152         5484
Swap:                   0            0            0


Now I terminate the hostapd which controls the ath10k chipset:


root@DD-WRT:~# kill 902
root@DD-WRT:~# free
                    total         used         free       shared      buffers
Mem:                61636        23212        38424            0         2416
-/+ buffers:                     20796        40840
Swap:                   0            0            0


Do you see the difference?


regards,
Sebastian Gottschall
Ben Greear May 25, 2015, 10:39 p.m. UTC | #12
Default firmware has a hard-coded minimum number of tx buffers (somewhere
more than 1k buffers I think).  Maybe driver is allocating all this
memory somehow?

If you do one-way traffic tests (udp), I wonder if you can tell if it is tx
or rx that consumes the memory?

CT firmware can be configured to use any multiple-of-8 amount of tx
buffers, though I have not tested below around 600.

Thanks,
Ben

On 05/25/2015 02:26 PM, Sebastian Gottschall wrote:
> today using the latest testing driver, i found out the memory consumption is unbelievable high.
> my router here has 64 mb ram. this ram is fully taken after some minutes by ath10k. but only if data flow present.
>
> here the results of "free" after some minutes
> root@DD-WRT:~# free
> total         used         free       shared      buffers
> Mem:         61636        58752         2884            0 2600
> -/+ buffers:              56152         5484
> Swap:            0            0            0
>
>
> now i terminate hostapd which controls the ath10k chipset
>
>
> root@DD-WRT:~# kill 902
> root@DD-WRT:~# free
> total         used         free       shared      buffers
> Mem:         61636        23212        38424            0 2416
> -/+ buffers:              20796        40840
> Swap:            0            0            0
>
>
> you see the difference?
>
>
> regards,
> Sebastian Gottschall
>
> _______________________________________________
> ath10k mailing list
> ath10k@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/ath10k
>
Sebastian Gottschall May 25, 2015, 11 p.m. UTC | #13
On 26.05.2015 at 00:39, Ben Greear wrote:
> Default firmware has a hard-coded minimum number of tx buffers (somewhere
> more than 1k buffers I think).  Maybe driver is allocating all this
> memory somehow?
>
> If you do one-way traffic tests (udp), I wonder if you can tell if it 
> is tx
> or rx that consumes the memory?
It's tx. I have an Ethernet-over-IP tunnel running on that link and I
broadcast IPTV that way (it's my way to convert multicast to unicast).
The tunnel itself is RFC Ethernet over IP, which is somewhat like UDP,
so a connectionless protocol.

Sebastian
>
> CT firmware can be configured to use any multiple-of-8 amount of tx
> buffers, though I have not tested below around 600.
>
> Thanks,
> Ben
>
> On 05/25/2015 02:26 PM, Sebastian Gottschall wrote:
>> today using the latest testing driver, i found out the memory 
>> consumption is unbelievable high.
>> my router here has 64 mb ram. this ram is fully taken after some 
>> minutes by ath10k. but only if data flow present.
>>
>> here the results of "free" after some minutes
>> root@DD-WRT:~# free
>> total         used         free       shared      buffers
>> Mem:         61636        58752         2884            0 2600
>> -/+ buffers:              56152         5484
>> Swap:            0            0            0
>>
>>
>> now i terminate hostapd which controls the ath10k chipset
>>
>>
>> root@DD-WRT:~# kill 902
>> root@DD-WRT:~# free
>> total         used         free       shared      buffers
>> Mem:         61636        23212        38424            0 2416
>> -/+ buffers:              20796        40840
>> Swap:            0            0            0
>>
>>
>> you see the difference?
>>
>>
>> regards,
>> Sebastian Gottschall
>>
>>
>
Ben Greear May 25, 2015, 11:42 p.m. UTC | #14
Can you test with ath9k to make sure it is actually ath10k related?

And/or try traffic in RX direction only to see if that still uses
lots of memory?

Does memory come back after you just stop traffic (w/out stopping
hostapd)?

Thanks,
Ben


On 05/25/2015 04:00 PM, Sebastian Gottschall wrote:
> Am 26.05.2015 um 00:39 schrieb Ben Greear:
>> Default firmware has a hard-coded minimum number of tx buffers (somewhere
>> more than 1k buffers I think).  Maybe driver is allocating all this
>> memory somehow?
>>
>> If you do one-way traffic tests (udp), I wonder if you can tell if it is tx
>> or rx that consumes the memory?
> its tx. i have a ethernet over ip tunnel running on that link and i broadcast iptv in that way. (its my way to convert multicast to unicast)
> the  tunnel itself is rfc ethernet over ip, which is somewhat like udp. so connectionless protocol
>
> Sebastian
>>
>> CT firmware can be configured to use any multiple-of-8 amount of tx
>> buffers, though I have not tested below around 600.
>>
>> Thanks,
>> Ben
>>
>> On 05/25/2015 02:26 PM, Sebastian Gottschall wrote:
>>> today using the latest testing driver, i found out the memory consumption is unbelievable high.
>>> my router here has 64 mb ram. this ram is fully taken after some minutes by ath10k. but only if data flow present.
>>>
>>> here the results of "free" after some minutes
>>> root@DD-WRT:~# free
>>> total         used         free       shared      buffers
>>> Mem:         61636        58752         2884            0 2600
>>> -/+ buffers:              56152         5484
>>> Swap:            0            0            0
>>>
>>>
>>> now i terminate hostapd which controls the ath10k chipset
>>>
>>>
>>> root@DD-WRT:~# kill 902
>>> root@DD-WRT:~# free
>>> total         used         free       shared      buffers
>>> Mem:         61636        23212        38424            0 2416
>>> -/+ buffers:              20796        40840
>>> Swap:            0            0            0
>>>
>>>
>>> you see the difference?
>>>
>>>
>>> regards,
>>> Sebastian Gottschall
>>>
>>>
>>
>
Sebastian Gottschall May 26, 2015, 12:07 a.m. UTC | #15
On 26.05.2015 at 01:42, Ben Greear wrote:
> Can you test with ath9k to make sure it is actually ath10k related?
Already tested. This device has 2 chipsets: one is ath9k-based and the
second is ath10k-based. :-)
The memory waste is gone only if I kill the hostapd process which
controls ath10k.
>
> And/or try traffic in RX direction only to see if that still uses
> lots of memory?

>
> Does memory come back after you just stop traffic (w/out stopping
> hostapd)?
Yes, slowly. It's fluctuating, so sometimes there are 30 MB free again
and seconds later just 2 MB, so the changes are very heavy. On bigger
routers with more than 64 MB (I have a second one here with 128 MB)
the total consumption stabilizes at 45 - 50 MB for the driver alone,
which is still too much for sure. So it may not be a leak, but ath10k or
the firmware is wasting too much memory for embedded devices,
and AR9880 is used almost exclusively on embedded devices.
>
> Thanks,
> Ben
>
>
> On 05/25/2015 04:00 PM, Sebastian Gottschall wrote:
>> Am 26.05.2015 um 00:39 schrieb Ben Greear:
>>> Default firmware has a hard-coded minimum number of tx buffers 
>>> (somewhere
>>> more than 1k buffers I think).  Maybe driver is allocating all this
>>> memory somehow?
>>>
>>> If you do one-way traffic tests (udp), I wonder if you can tell if 
>>> it is tx
>>> or rx that consumes the memory?
>> its tx. i have a ethernet over ip tunnel running on that link and i 
>> broadcast iptv in that way. (its my way to convert multicast to unicast)
>> the  tunnel itself is rfc ethernet over ip, which is somewhat like 
>> udp. so connectionless protocol
>>
>> Sebastian
>>>
>>> CT firmware can be configured to use any multiple-of-8 amount of tx
>>> buffers, though I have not tested below around 600.
>>>
>>> Thanks,
>>> Ben
>>>
>>> On 05/25/2015 02:26 PM, Sebastian Gottschall wrote:
>>>> today using the latest testing driver, i found out the memory 
>>>> consumption is unbelievable high.
>>>> my router here has 64 mb ram. this ram is fully taken after some 
>>>> minutes by ath10k. but only if data flow present.
>>>>
>>>> here the results of "free" after some minutes
>>>> root@DD-WRT:~# free
>>>> total         used         free       shared      buffers
>>>> Mem:         61636        58752         2884            0 2600
>>>> -/+ buffers:              56152         5484
>>>> Swap:            0            0            0
>>>>
>>>>
>>>> now i terminate hostapd which controls the ath10k chipset
>>>>
>>>>
>>>> root@DD-WRT:~# kill 902
>>>> root@DD-WRT:~# free
>>>> total         used         free       shared      buffers
>>>> Mem:         61636        23212        38424            0 2416
>>>> -/+ buffers:              20796        40840
>>>> Swap:            0            0            0
>>>>
>>>>
>>>> you see the difference?
>>>>
>>>>
>>>> regards,
>>>> Sebastian Gottschall
>>>>
>>>>
>>>
>>
>
Michal Kazior May 26, 2015, 5:42 a.m. UTC | #16
On 26 May 2015 at 02:07, Sebastian Gottschall <s.gottschall@dd-wrt.com> wrote:
> Am 26.05.2015 um 01:42 schrieb Ben Greear:
>>
>> Can you test with ath9k to make sure it is actually ath10k related?
>
> already tested. this device has 2 chipsets. one is ath9k based and the
> second is ath10k based. :-)
> only if i kill the hostapd process which controls ath10k. the memory waste
> is gone

Keep in mind that hostapd itself requires memory to function as well.
Each process (and thread) needs some internal kernel memory (stack, et
al).


>> And/or try traffic in RX direction only to see if that still uses
>> lots of memory?
>
>
>>
>> Does memory come back after you just stop traffic (w/out stopping
>> hostapd)?
>
> yes. slowly. its fluctuating. so sometimes there is 30 mb free again and
> seconds later just 2 mb. so very heavy changes. on bigger routers with more
> than 64 mb (i have a second here with 128 mb)
> the total consumption stabilizes at 45 - 50 mb for the driver only which is
> still too much for sure.  so it may not a leak. but ath10k or the firmware
> is wasting too much memory for embedded devices
> and ar9880 is just used on embedded devices almost

Using `free` is a pretty poor way of assessing the memory usage of a
kernel driver. It reports how much memory the OS has immediately
available to userspace (the kernel recycles some memory for performance
reasons, e.g. SLAB does it). There's a lot of metadata too, so what you
actually see is many other things that are involved when ath10k is used.
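
One way to look past the headline `free` number is to read the slab counters from /proc/meminfo. The following is a minimal sketch under stated assumptions: the helper names are made up for illustration, and the counters (e.g. Slab, SReclaimable, SUnreclaim) are kernel-wide, so they separate recyclable kernel memory from the rest but still do not attribute anything to a single driver.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the value in kB of a "Key:   123 kB" line in /proc/meminfo-style
 * text, or -1 if the key is not present. */
static long meminfo_kb(const char *text, const char *key)
{
	size_t keylen = strlen(key);
	const char *line = text;

	while (line && *line) {
		if (strncmp(line, key, keylen) == 0 && line[keylen] == ':') {
			long kb;

			if (sscanf(line + keylen + 1, "%ld", &kb) == 1)
				return kb;
			return -1;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/* Read /proc/meminfo into buf (NUL-terminated); returns 0 on success. */
static int read_meminfo(char *buf, size_t len)
{
	FILE *f;
	size_t n;

	assert(buf && len > 0);
	f = fopen("/proc/meminfo", "r");
	if (!f)
		return -1;
	n = fread(buf, 1, len - 1, f);
	buf[n] = '\0';
	fclose(f);
	return 0;
}
```

Comparing meminfo_kb(buf, "SUnreclaim") before and after stopping traffic would show how much of the change sits in non-reclaimable slab memory rather than in caches the kernel can hand back.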

The driver itself should be consuming around 5MB of memory at idle
(interface up, no significant traffic). Most of this goes to the Rx
ring, which has 1023*1920 bytes (+/- allocation and metadata waste).
Then there's a bunch of CE buffers as well which take up some memory
(used for driver-firmware communication), e.g. 2048*512 + 2048*128
(HTT and WMI, both target->host).

When transmitting it may eat up an additional 1424 * (MSDU size +
sizeof(skbuff)). Note that Tx queues can be longer: the driver isn't
aware of qdiscs and those can store frames as well.

11ac supports frame aggregates going up to 1MB so these queues pretty
much need to be this long if you want to be able to get highest
possible throughput.
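
The figures above can be added up as a quick sanity check. This is just the arithmetic from this message written out in C. The macro and function names are made up for illustration, and the MSDU size and per-skb overhead passed to tx_bytes() are caller-supplied assumptions, not values taken from the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Figures quoted in the message above. */
#define RX_RING_ENTRIES 1023u
#define RX_BUF_BYTES    1920u
#define CE_BYTES        (2048u * 512u + 2048u * 128u)
#define TX_MAX_PENDING  1424u

/* Bytes held at idle by the Rx ring plus the CE buffers listed above. */
static size_t idle_bytes(void)
{
	return (size_t)RX_RING_ENTRIES * RX_BUF_BYTES + CE_BYTES;
}

/* Additional bytes when all Tx slots are in use. */
static size_t tx_bytes(size_t msdu_bytes, size_t skb_overhead)
{
	assert(msdu_bytes + skb_overhead >= msdu_bytes); /* no wrap-around */
	return (size_t)TX_MAX_PENDING * (msdu_bytes + skb_overhead);
}
```

idle_bytes() comes to 3,274,880 bytes (about 3.1 MiB), so the itemized buffers account for a good part of the roughly 5MB idle figure. With 1500-byte MSDUs and an illustrative 256 bytes of per-skb overhead, tx_bytes(1500, 256) adds 2,500,544 bytes (about 2.4 MiB) before any qdisc queues are counted.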


Michał

>
>>
>> Thanks,
>> Ben
>>
>>
>> On 05/25/2015 04:00 PM, Sebastian Gottschall wrote:
>>>
>>> Am 26.05.2015 um 00:39 schrieb Ben Greear:
>>>>
>>>> Default firmware has a hard-coded minimum number of tx buffers
>>>> (somewhere
>>>> more than 1k buffers I think).  Maybe driver is allocating all this
>>>> memory somehow?
>>>>
>>>> If you do one-way traffic tests (udp), I wonder if you can tell if it is
>>>> tx
>>>> or rx that consumes the memory?
>>>
>>> its tx. i have a ethernet over ip tunnel running on that link and i
>>> broadcast iptv in that way. (its my way to convert multicast to unicast)
>>> the  tunnel itself is rfc ethernet over ip, which is somewhat like udp.
>>> so connectionless protocol
>>>
>>> Sebastian
>>>>
>>>>
>>>> CT firmware can be configured to use any multiple-of-8 amount of tx
>>>> buffers, though I have not tested below around 600.
>>>>
>>>> Thanks,
>>>> Ben
>>>>
>>>> On 05/25/2015 02:26 PM, Sebastian Gottschall wrote:
>>>>>
>>>>> today using the latest testing driver, i found out the memory
>>>>> consumption is unbelievable high.
>>>>> my router here has 64 mb ram. this ram is fully taken after some
>>>>> minutes by ath10k. but only if data flow present.
>>>>>
>>>>> here the results of "free" after some minutes
>>>>> root@DD-WRT:~# free
>>>>> total         used         free       shared      buffers
>>>>> Mem:         61636        58752         2884            0 2600
>>>>> -/+ buffers:              56152         5484
>>>>> Swap:            0            0            0
>>>>>
>>>>>
>>>>> now i terminate hostapd which controls the ath10k chipset
>>>>>
>>>>>
>>>>> root@DD-WRT:~# kill 902
>>>>> root@DD-WRT:~# free
>>>>> total         used         free       shared      buffers
>>>>> Mem:         61636        23212        38424            0 2416
>>>>> -/+ buffers:              20796        40840
>>>>> Swap:            0            0            0
>>>>>
>>>>>
>>>>> you see the difference?
>>>>>
>>>>>
>>>>> regards,
>>>>> Sebastian Gottschall
>>>>>
>>>>>
>>>>
>>>
>>
>
>
Rajkumar Manoharan May 26, 2015, 6:20 a.m. UTC | #17
On Tue, May 26, 2015 at 07:42:35AM +0200, Michal Kazior wrote:
> On 26 May 2015 at 02:07, Sebastian Gottschall <s.gottschall@dd-wrt.com> wrote:
> > Am 26.05.2015 um 01:42 schrieb Ben Greear:
> >>
> >> Can you test with ath9k to make sure it is actually ath10k related?
> >
> > already tested. this device has 2 chipsets. one is ath9k based and the
> > second is ath10k based. :-)
> > only if i kill the hostapd process which controls ath10k. the memory waste
> > is gone
> 
> Keep in mind that hostapd itself requires memory to function as well.
> Each process (and thread) need some internal kernel memory (stack, et
> al).
>
I have seen a similar issue during long-running tests in MBSSID mode with
multiple clients. Killing hostapd reclaims the memory.

[<c021dd44>] (unwind_backtrace) from [<c021ae0c>] (show_stack+0x10/0x14)
[<c021ae0c>] (show_stack) from [<c0336b9c>] (dump_stack+0x88/0xcc)
[<c0336b9c>] (dump_stack) from [<c0279804>] (dump_header.isra.11+0x64/0x178)
[<c0279804>] (dump_header.isra.11) from [<c0279b10>] (oom_kill_process+0x70/0x384)
[<c0279b10>] (oom_kill_process) from [<c027a2a0>] (out_of_memory+0x2d4/0x304)
[<c027a2a0>] (out_of_memory) from [<c027d180>] (__alloc_pages_nodemask+0x608/0x664)
[<c027d180>] (__alloc_pages_nodemask) from [<c0278780>] (filemap_fault+0x1f8/0x390)
[<c0278780>] (filemap_fault) from [<c028f45c>] (__do_fault+0xa4/0x42c)
[<c028f45c>] (__do_fault) from [<c0292494>] (handle_mm_fault+0x230/0x7b0)
[<c0292494>] (handle_mm_fault) from [<c021f70c>] (do_page_fault+0x114/0x26c)
[<c021f70c>] (do_page_fault) from [<c0208440>] (do_PrefetchAbort+0x34/0x98)

Need to check whether it is a regression or not.

-Rajkumar
Sebastian Gottschall May 26, 2015, 7:23 a.m. UTC | #18
On 26.05.2015 at 07:42, Michal Kazior wrote:
> On 26 May 2015 at 02:07, Sebastian Gottschall <s.gottschall@dd-wrt.com> wrote:
>> On 26.05.2015 at 01:42, Ben Greear wrote:
>>> Can you test with ath9k to make sure it is actually ath10k related?
>> already tested. this device has 2 chipsets. one is ath9k based and the
>> second is ath10k based. :-)
>> only if i kill the hostapd process which controls ath10k. the memory waste
>> is gone
> Keep in mind that hostapd itself requires memory to function as well.
> Each process (and thread) need some internal kernel memory (stack, et
> al).
>
I know. 1.8 MB is what I see in userspace. The hostapd instances that control
ath9k and ath10k use the same amount of memory;
there is no difference between them. hostapd itself never takes 50 MB.
Keep in mind that this embedded device (dlink-dir859) has just 64 MB of RAM.

>
>> yes. slowly. its fluctuating. so sometimes there is 30 mb free again and
>> seconds later just 2 mb. so very heavy changes. on bigger routers with more
>> than 64 mb (i have a second here with 128 mb)
>> the total consumption stabilizes at 45 - 50 mb for the driver only which is
>> still too much for sure.  so it may not a leak. but ath10k or the firmware
>> is wasting too much memory for embedded devices
>> and ar9880 is just used on embedded devices almost
> Using `free` is a pretty poor way of assessing memory usage of a
> kernel driver. It reports how much the OS has memory available to
> userspace immediately (kernel recycles some memory for performance
> reasons, e.g. SLAB does it). There's a lot of metadata too so what you
> actually see is many other things that involve ath10k being used.
I checked meminfo as well, but it shows nothing more: the only values
that change are the same ones `free` reports.
The slab-related and other counters do not change.
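For anyone who wants to compare the counters discussed here, the following is a minimal sketch in C that pulls individual fields out of text in /proc/meminfo format. The helper name `meminfo_kb` and the choice of fields are assumptions made for illustration; nothing like this was posted in the thread.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the value (in kB) of one "Key:   1234 kB" line from text in
 * /proc/meminfo format, or -1 if the key is not present.
 */
static long meminfo_kb(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		long val;

		/* Match "Key:" only at the start of a line. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ':' &&
		    sscanf(line + klen + 1, "%ld", &val) == 1)
			return val;

		/* Advance to the character after the next newline. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

#ifdef DEMO_MAIN /* build with -DDEMO_MAIN to run standalone on Linux */
int main(void)
{
	static char buf[8192];
	static const char *keys[] = { "MemFree", "Buffers", "Cached",
				      "Slab", "SReclaimable", "SUnreclaim" };
	FILE *f = fopen("/proc/meminfo", "r");
	size_t n, i;

	if (!f) {
		perror("/proc/meminfo");
		return 1;
	}
	n = fread(buf, 1, sizeof(buf) - 1, f);
	buf[n] = '\0';
	fclose(f);

	for (i = 0; i < sizeof(keys) / sizeof(keys[0]); i++)
		printf("%-13s %ld kB\n", keys[i], meminfo_kb(buf, keys[i]));
	return 0;
}
#endif
```

Running it once with hostapd up and once after killing it, then comparing the Slab, SReclaimable and SUnreclaim lines, would show whether the missing memory is visible in the slab counters at all.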
>
> The driver itself should be consuming around 5MB of memory at idle
> (interface up, no significant traffic). Most of this goes for the Rx
> ring which has 1023*1920 bytes (+/- allocation and metadata waste).
> Then there's a bunch of CE buffers as well which take up some memory
> (used for driver-firmware communication), e.g. 2048*512 + 2048*128
> (HTT and WMI, both target->host).
>
> When Txing it may eat up additional 1424 * (MSDU size +
> sizeof(skbuff)). Note that Tx queues can be longer - driver isn't
> aware of qdiscs and those can store frames as well.
>
> 11ac supports frame aggregates going up to 1MB so these queues pretty
> much need to be this long if you want to be able to get highest
> possible throughput.
Yes, but ath10k is mainly used on embedded devices, at least for
AR9880 chipsets,
since there is not even a Windows driver available for AR9880.
Now consider that ath10k cannot run with good stability on such
devices, while the QCA LSDK driver
does not seem to have such a big resource problem.
So it doesn't make much sense to continue this way.
This resource problem must be solved; about 50 MB is really too much.


Sebastian
Sebastian Gottschall May 26, 2015, 7:26 a.m. UTC | #19
Good point. In my setup ath10k is configured with one additional VAP, but
not with multiple clients.
Both VAPs are running in AP mode. Here is my hostapd config;
the passphrases have been masked.


driver=nl80211
ctrl_interface=/var/run/hostapd
wmm_ac_bk_cwmin=4
wmm_ac_bk_cwmax=10
wmm_ac_bk_aifs=7
wmm_ac_bk_txop_limit=0
wmm_ac_bk_acm=0
wmm_ac_be_aifs=3
wmm_ac_be_cwmin=4
wmm_ac_be_cwmax=10
wmm_ac_be_acm=0
wmm_ac_vi_aifs=2
wmm_ac_vi_cwmin=3
wmm_ac_vi_cwmax=4
wmm_ac_vi_txop_limit=94
wmm_ac_vi_acm=0
wmm_ac_vo_aifs=2
wmm_ac_vo_cwmin=2
wmm_ac_vo_cwmax=3
wmm_ac_vo_txop_limit=47
wmm_ac_vo_acm=0
tx_queue_data3_aifs=7
tx_queue_data3_cwmin=15
tx_queue_data3_cwmax=1023
tx_queue_data3_burst=0
tx_queue_data2_aifs=3
tx_queue_data2_cwmin=15
tx_queue_data2_cwmax=63
tx_queue_data1_aifs=1
tx_queue_data1_cwmin=7
tx_queue_data1_cwmax=15
tx_queue_data1_burst=3.0
tx_queue_data0_aifs=1
tx_queue_data0_cwmin=3
tx_queue_data0_cwmax=7
tx_queue_data0_burst=1.5
country_code=DE
tx_queue_data2_burst=2.0
wmm_ac_be_txop_limit=64
ieee80211n=1
dynamic_ht40=0
ht_capab=[HT40+][LDPC][SHORT-GI-20][SHORT-GI-40][TX-STBC][RX-STBC1][DSSS_CCK-40]
vht_capab=[RXLDPC][SHORT-GI-80][TX-STBC-2BY1][RX-STBC1][RX-ANTENNA-PATTERN][TX-ANTENNA-PATTERN][MAX-MPDU-11454][MAX-A-MPDU-LEN-EXP7]
ieee80211ac=1
vht_oper_chwidth=1
vht_oper_centr_freq_seg0_idx=106
hw_mode=a
channel=100
frequency=5500
beacon_int=100

dtim_period=2

interface=ath1
disassoc_low_ack=1
wds_sta=1
wmm_enabled=1
bssid=E8:CC:18:FF:E0:A4
ignore_broadcast_ssid=0
max_num_sta=256
ssid=dd-wrt-NA-5
bridge=br0
logger_syslog=-1
logger_syslog_level=2
logger_stdout=-1
logger_stdout_level=2
dump_file=/tmp/hostapd.dump
eapol_version=1
eapol_key_index_workaround=0
wpa=2
wpa_passphrase=***********
wpa_key_mgmt=WPA-PSK
wpa_pairwise=CCMP
wpa_group_rekey=3600


bss=ath1.1
disassoc_low_ack=1
wmm_enabled=1
bssid=EA:CC:18:FF:E0:A4
ignore_broadcast_ssid=0
max_num_sta=256
ssid=dd-wrt-TV
bridge=br0
logger_syslog=-1
logger_syslog_level=2
logger_stdout=-1
logger_stdout_level=2
dump_file=/tmp/hostapd.dump
eapol_version=1
eapol_key_index_workaround=0
wpa=2
wpa_passphrase=************
wpa_key_mgmt=WPA-PSK
wpa_pairwise=CCMP
wpa_group_rekey=3600



On 26.05.2015 at 08:20, Rajkumar Manoharan wrote:
> On Tue, May 26, 2015 at 07:42:35AM +0200, Michal Kazior wrote:
>> On 26 May 2015 at 02:07, Sebastian Gottschall <s.gottschall@dd-wrt.com> wrote:
>>> On 26.05.2015 at 01:42, Ben Greear wrote:
>>>> Can you test with ath9k to make sure it is actually ath10k related?
>>> already tested. this device has 2 chipsets. one is ath9k based and the
>>> second is ath10k based. :-)
>>> only if i kill the hostapd process which controls ath10k. the memory waste
>>> is gone
>> Keep in mind that hostapd itself requires memory to function as well.
>> Each process (and thread) need some internal kernel memory (stack, et
>> al).
>>
> I have seen a similar issue during long-running tests in mbssid mode with multiple clients.
> Killing hostapd reclaims the memory.
>
> [<c021dd44>] (unwind_backtrace) from [<c021ae0c>] (show_stack+0x10/0x14)
> [<c021ae0c>] (show_stack) from [<c0336b9c>] (dump_stack+0x88/0xcc)
> [<c0336b9c>] (dump_stack) from [<c0279804>] (dump_header.isra.11+0x64/0x178)
> [<c0279804>] (dump_header.isra.11) from [<c0279b10>] (oom_kill_process+0x70/0x384)
> [<c0279b10>] (oom_kill_process) from [<c027a2a0>] (out_of_memory+0x2d4/0x304)
> [<c027a2a0>] (out_of_memory) from [<c027d180>] (__alloc_pages_nodemask+0x608/0x664)
> [<c027d180>] (__alloc_pages_nodemask) from [<c0278780>] (filemap_fault+0x1f8/0x390)
> [<c0278780>] (filemap_fault) from [<c028f45c>] (__do_fault+0xa4/0x42c)
> [<c028f45c>] (__do_fault) from [<c0292494>] (handle_mm_fault+0x230/0x7b0)
> [<c0292494>] (handle_mm_fault) from [<c021f70c>] (do_page_fault+0x114/0x26c)
> [<c021f70c>] (do_page_fault) from [<c0208440>] (do_PrefetchAbort+0x34/0x98)
>
> Need to check whether it is a regression or not.
>
> -Rajkumar
>
Michal Kazior May 26, 2015, 8:26 a.m. UTC | #20
On 26 May 2015 at 09:23, Sebastian Gottschall <s.gottschall@dd-wrt.com> wrote:
> On 26.05.2015 at 07:42, Michal Kazior wrote:
[...]
>> The driver itself should be consuming around 5MB of memory at idle
>> (interface up, no significant traffic). Most of this goes for the Rx
>> ring which has 1023*1920 bytes (+/- allocation and metadata waste).
>> Then there's a bunch of CE buffers as well which take up some memory
>> (used for driver-firmware communication), e.g. 2048*512 + 2048*128
>> (HTT and WMI, both target->host).
>>
>> When Txing it may eat up additional 1424 * (MSDU size +
>> sizeof(skbuff)). Note that Tx queues can be longer - driver isn't
>> aware of qdiscs and those can store frames as well.
>>
>> 11ac supports frame aggregates going up to 1MB so these queues pretty
>> much need to be this long if you want to be able to get highest
>> possible throughput.
>
> yes. but ath10k has its main usage on embedded devices. at least for AR9880
> chipsets

I'm aware of that.


> since there is not even a windows driver available for AR9880.
> so now consider that ath10k is not able to run on devices with good
> stability, where the QCA LSDK Driver
> does not seem to have that big resource problem.

Did you measure LSDK the same way, under the same conditions? Same libc,
same kernel, etc.?

Do you see OOMs? What stability issues are we talking about?

Did you try stressing the system by actually trying to consume memory
until it's run out to see how much memory is _really_ left for the
system to use?
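A minimal sketch of such a stress test, in C. The function name `probe_usable_memory` and the chunk and limit values are assumptions for illustration. The limit is there because, with Linux's default overcommit behaviour, malloc() may keep succeeding and the OOM killer may step in before it ever returns NULL; on a 64 MB router, raise the limit in small steps.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Allocate and touch memory in chunk-sized steps until malloc() fails or
 * `limit` bytes are held, then release everything. Returns the number of
 * bytes that were successfully allocated and written.
 */
static size_t probe_usable_memory(size_t chunk, size_t limit)
{
	struct node { struct node *next; } *head = NULL, *n;
	size_t held = 0;

	if (chunk < sizeof(struct node))
		return 0;

	while (held + chunk <= limit) {
		n = malloc(chunk);
		if (!n)
			break;
		memset(n, 0xA5, chunk);	/* touch every page so it is really backed */
		n->next = head;		/* link written after memset so it survives */
		head = n;
		held += chunk;
	}

	/* Give everything back before returning. */
	while (head) {
		n = head->next;
		free(head);
		head = n;
	}
	return held;
}

#ifdef DEMO_MAIN /* build with -DDEMO_MAIN to run standalone */
int main(void)
{
	/* Hypothetical settings: 1 MiB steps, stop at 32 MiB. */
	size_t got = probe_usable_memory(1u << 20, 32u << 20);

	printf("touched %zu KiB\n", got / 1024);
	return 0;
}
#endif
```

If the reported figure is far above what `free` shows as free, most of the "used" memory was reclaimable rather than pinned by the driver.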


> so it doesnt make much sense to go on here in this way.
> this resource problem must be solved. about 50 MB is really too much.

I don't see this much memory being used with ath10k in my x86_64
virtual machine even with `free`. I see ~10MB less "free" memory
after starting hostapd and running traffic for some time vs no hostapd
and ath10k stopped.

I don't even see how ath10k could take 50MB directly. Perhaps there's
some lazy memory recycling going on in the system? Maybe more memory
is effectively consumed (compared to ath9k) due to alignment
requirements or memory paging (which become more apparent with
increased number of allocations)?
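The buffer figures quoted above can be added up with a short C sketch. The ring and pool sizes are the ones given in this thread; the per-frame sk_buff overhead passed to `tx_bytes` is a placeholder assumption, since sizeof(struct sk_buff) depends on the kernel configuration, and both function names are made up for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Figures quoted in this thread. */
#define RX_RING_ENTRIES	1023
#define RX_BUF_BYTES	1920
#define CE_POOL1_BYTES	(2048 * 512)	/* CE buffers (HTT and WMI, */
#define CE_POOL2_BYTES	(2048 * 128)	/* both target->host)       */
#define TX_MAX_PENDING	1424

/* Idle footprint: Rx ring plus the two CE buffer pools. */
static size_t idle_bytes(void)
{
	return (size_t)RX_RING_ENTRIES * RX_BUF_BYTES +
	       CE_POOL1_BYTES + CE_POOL2_BYTES;
}

/*
 * Additional memory while transmitting: 1424 * (MSDU size + skb overhead).
 * skb_overhead stands in for sizeof(skbuff) and is an assumption.
 */
static size_t tx_bytes(size_t msdu_size, size_t skb_overhead)
{
	return (size_t)TX_MAX_PENDING * (msdu_size + skb_overhead);
}

#ifdef DEMO_MAIN /* build with -DDEMO_MAIN to run standalone */
int main(void)
{
	/* Hypothetical inputs: 1500-byte MSDUs, 256 bytes of skb overhead. */
	printf("idle: %zu bytes\n", idle_bytes());
	printf("tx:   %zu bytes\n", tx_bytes(1500, 256));
	return 0;
}
#endif
```

With those inputs the enumerated idle buffers come to 3,274,880 bytes (about 3.1 MiB) and the Tx term to 2,500,544 bytes (about 2.4 MiB), so the listed buffers on their own are an order of magnitude below 50 MB.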


Michał

Sebastian Gottschall May 26, 2015, 8:37 a.m. UTC | #21
On 26.05.2015 at 10:26, Michal Kazior wrote:
> On 26 May 2015 at 09:23, Sebastian Gottschall <s.gottschall@dd-wrt.com> wrote:
>> On 26.05.2015 at 07:42, Michal Kazior wrote:
> [...]
>>> The driver itself should be consuming around 5MB of memory at idle
>>> (interface up, no significant traffic). Most of this goes for the Rx
>>> ring which has 1023*1920 bytes (+/- allocation and metadata waste).
>>> Then there's a bunch of CE buffers as well which take up some memory
>>> (used for driver-firmware communication), e.g. 2048*512 + 2048*128
>>> (HTT and WMI, both target->host).
>>>
>>> When Txing it may eat up additional 1424 * (MSDU size +
>>> sizeof(skbuff)). Note that Tx queues can be longer - driver isn't
>>> aware of qdiscs and those can store frames as well.
>>>
>>> 11ac supports frame aggregates going up to 1MB so these queues pretty
>>> much need to be this long if you want to be able to get highest
>>> possible throughput.
>> yes. but ath10k has its main usage on embedded devices. at least for AR9880
>> chipsets
> I'm aware of that.
>
>
>> since there is not even a windows driver available for AR9880.
>> so now consider that ath10k is not able to run on devices with good
>> stability, where the QCA LSDK Driver
>> does not seem to have that big resource problem.
> Did you measure LSDK the same way within same conditions? Same libc,
> same kernel, etc?
I measured userspace memory consumption; everything that is not visible
there can be counted as taken by the kernel.
>
> Do you see OOMs? What stability issues are we talking about?
>
> Did you try stressing the system by actually trying to consume memory
> until it's run out to see how much memory is _really_ left for the
> system to use?
No. The original dlink-dir859 firmware, which is based on QCA LSDK, does not
produce OOMs,
but with ath10k I was able to crash my device because it ran out
of memory.
And I don't need to stress the system: running a single client with an
8 Mbit Tx flow is enough to leave just 2 MB of RAM free on a 64 MB system.

>
>
>> so it doesnt make much sense to go on here in this way.
>> this resource problem must be solved. about 50 MB is really too much.
> I don't see this much memory being used with ath10k in my x86_64
> virtual machine even with `free`. I see ~10MB of less "free" memory
> after starting hostapd and running traffic for some time vs no hostapd
> and ath10k stopped.
You won't see the memory being taken that easily, and your x64 system likely
has a lot of RAM, so you don't notice that 50 MB are taken by ath10k.
If you kill the hostapd process for ath10k, you will likely see the
difference.
One point raised here is that QCA is aware of high memory
consumption with VAPs.
My example has 2 VAPs; I already posted my hostapd config file on
this mailing list.
>
> I don't even see how ath10k could take 50MB directly. Perhaps there's
> some lazy memory recycling going on in the system? Maybe more memory
> is effectively consumed (compared to ath9k) due to alignment
> requirements or memory paging (which become more apparent with
> increased number of allocations)?
ath9k takes about 2 - 3 MB of RAM when I compare the consumption before and
after destroying a running ath9k hostapd instance.

Michal Kazior May 26, 2015, 9:21 a.m. UTC | #22
On 26 May 2015 at 10:37, Sebastian Gottschall <s.gottschall@dd-wrt.com> wrote:
> On 26.05.2015 at 10:26, Michal Kazior wrote:
>>
>> On 26 May 2015 at 09:23, Sebastian Gottschall <s.gottschall@dd-wrt.com>
>> wrote:
[...]
>> Do you see OOMs? What stability issues are we talking about?
>>
>> Did you try stressing the system by actually trying to consume memory
>> until it's run out to see how much memory is _really_ left for the
>> system to use?
>
> no. the original dlink-dir859 firmware based on qca lsdk, does not provide
> oom's
> but with ath10k i was able to crash my device, since it was running out of
> memory.

How did it crash, i.e. did you manage to get a call trace? If not, can
you connect UART to the system and get one, please?


> and i dont need to stress the system. running with one single client and 8
> mbit tx flow is enough to just have 2 mb ram free on a 64 mb system

 1. Is the router acting as an endpoint in the traffic or a bridge?
 2. So does it crash or is free memory just low during traffic? It's
not clear to me.


>>> so it doesnt make much sense to go on here in this way.
>>> this resource problem must be solved. about 50 MB is really too much.
>>
>> I don't see this much memory being used with ath10k in my x86_64
>> virtual machine even with `free`. I see ~10MB of less "free" memory
>> after starting hostapd and running traffic for some time vs no hostapd
>> and ath10k stopped.
>
> You won't see the memory being taken that easily, and your x64 system likely has a lot
> of RAM, so you don't notice that 50 MB are taken by ath10k.

The amount of memory in a virtual machine doesn't matter. If anything
I should be seeing _more_ memory being consumed since kernel should be
more relaxed due to smaller memory pressure.

I have a very bare VM if you're implying I have a lot of background noise.

If you're still in doubt, here are a couple of printouts (I've run my VM
with 64MB of RAM; some of it is obviously reserved and unreachable):

user processes:
>     1 ?        S      0:01 /bin/sh /init
>  1189 ?        Ss     0:00 udevd --daemon
>  1471 ?        Ss     0:00 /usr/sbin/sshd
>  1530 ttyS0    Ss+    0:00 /bin/login -f
>  1533 ttyS0    S+     0:00  \_ -rc
>  1564 ttyS0    R+     0:00      \_ ps fax
(everything else is kernel threads)

after boot (ath10k module loaded and probed):
>              total       used       free     shared    buffers     cached
> Mem:         46928      33808      13120        152          0       5156
> -/+ buffers/cache:      28652      18276
> Swap:            0          0          0

hostapd+iperf:
>              total       used       free     shared    buffers     cached
> Mem:         46928      44672       2256        440          0       2952
> -/+ buffers/cache:      41720       5208
> Swap:            0          0          0

hostapd (no iperf):
>              total       used       free     shared    buffers     cached
> Mem:         46928      42436       4492        500          0       2784
> -/+ buffers/cache:      39652       7276
> Swap:            0          0          0

hostapd stopped:
>              total       used       free     shared    buffers     cached
> Mem:         46928      32220      14708        388          0       2604
> -/+ buffers/cache:      29616      17312
> Swap:            0          0          0

ath10k_pci and ath10k_core unloaded:
>              total       used       free     shared    buffers     cached
> Mem:         46928      28552      18376        144          0       4712
> -/+ buffers/cache:      23840      23088
> Swap:            0          0          0

While running iperf I was able to get 400mbps+ of UDP traffic with
another 2x2 11ac device without much trouble.

Do note: The VM is running a glibc based system and has kernel and
modules with full debugging hence the high base memory usage. Yet it
still manages to work just fine.
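To make the printouts above easier to compare, here is a small C sketch that tabulates the "-/+ buffers/cache" used column and computes differences between stages. The values are copied from the printouts; the enum and function names are assumptions for illustration.

```c
#include <stdio.h>

/* Stages in the order the printouts were taken. */
enum stage {
	BOOT,			/* ath10k module loaded and probed */
	HOSTAPD_IPERF,		/* hostapd running, iperf traffic */
	HOSTAPD_IDLE,		/* hostapd running, no iperf */
	HOSTAPD_STOPPED,
	UNLOADED,		/* ath10k_pci and ath10k_core unloaded */
	NSTAGES
};

/* "used" with buffers/cache subtracted, in kB, from the printouts above. */
static const long used_kb[NSTAGES] = { 28652, 41720, 39652, 29616, 23840 };

/* Memory gained (positive) or released (negative) going from one stage to another. */
static long delta_kb(enum stage from, enum stage to)
{
	return used_kb[to] - used_kb[from];
}

#ifdef DEMO_MAIN /* build with -DDEMO_MAIN to run standalone */
int main(void)
{
	printf("hostapd+iperf vs hostapd stopped: %ld kB\n",
	       delta_kb(HOSTAPD_STOPPED, HOSTAPD_IPERF));
	printf("hostapd idle vs hostapd stopped:  %ld kB\n",
	       delta_kb(HOSTAPD_STOPPED, HOSTAPD_IDLE));
	printf("modules loaded vs unloaded:       %ld kB\n",
	       delta_kb(UNLOADED, BOOT));
	return 0;
}
#endif
```

By these figures, hostapd with iperf traffic sits 12104 kB above the hostapd-stopped baseline, hostapd alone 10036 kB, and the boot snapshot with the modules loaded is 4812 kB above the unloaded one. The snapshots were taken at different times, so other activity may account for part of each difference.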


> If you kill the hostapd process for ath10k, you will likely see the
> difference.
> one point here raised up, is that qca is aware of high memory consumption
> with vap's
> my example has 2 vap's. i already provided a config file for hostapd on this
> mailing list

You must be aware you can't compare ath10k to LSDK apples to apples.
Their QSDK includes kernel customizations which makes it nearly
impossible to compare. They may have some fixes for the platform
itself that haven't been upstreamed for what it's worth.


Michał
Sebastian Gottschall May 26, 2015, 11:19 a.m. UTC | #23
On 26.05.2015 at 11:21, Michal Kazior wrote:
> On 26 May 2015 at 10:37, Sebastian Gottschall <s.gottschall@dd-wrt.com> wrote:
>> On 26.05.2015 at 10:26, Michal Kazior wrote:
>>> On 26 May 2015 at 09:23, Sebastian Gottschall <s.gottschall@dd-wrt.com>
>>> wrote:
> [...]
>>> Do you see OOMs? What stability issues are we talking about?
>>>
>>> Did you try stressing the system by actually trying to consume memory
>>> until it's run out to see how much memory is _really_ left for the
>>> system to use?
>> no. the original dlink-dir859 firmware based on qca lsdk, does not provide
>> oom's
>> but with ath10k i was able to crash my device, since it was running out of
>> memory.
> How did it crash, i.e. did you manage to get a call trace? If not, can
> you connect UART to the system and get one, please?
No real crash. It was an out-of-memory hang, so userspace no longer
works correctly.
>
>
>> and i dont need to stress the system. running with one single client and 8
>> mbit tx flow is enough to just have 2 mb ram free on a 64 mb system
>   1. Is the router acting as an endpoint in the traffic or a bridge?
>   2. So does it crash or is free memory just low during traffic? It's
> not clear to me.
D-Link asked me to port DD-WRT to this device, so the router can be used in
any role.
Right now it's configured as a standard access point with 2 interfaces for
ath10k (see the hostapd config I provided earlier today; it clearly
shows how it's configured).
The crash is purely an out-of-memory condition. The traffic is a constant
Tx flow of about 8 Mbit.
>
>>>> so it doesnt make much sense to go on here in this way.
>>>> this resource problem must be solved. about 50 MB is really too much.
>>> I don't see this much memory being used with ath10k in my x86_64
>>> virtual machine even with `free`. I see ~10MB of less "free" memory
>>> after starting hostapd and running traffic for some time vs no hostapd
>>> and ath10k stopped.
>> You won't see the memory being taken that easily, and your x64 system likely has a lot
>> of RAM, so you don't notice that 50 MB are taken by ath10k.
> The amount of memory in a virtual machine doesn't matter. If anything
> I should be seeing _more_ memory being consumed since kernel should be
> more relaxed due to smaller memory pressure.
If userspace has no memory left, the kernel will invoke the OOM handler.
>
> I have a very bare VM if you're implying I have a lot of background noise.
>
> If you're still doubting here's a couple of printouts (I've run my VM
> with 64MB of RAM; some of it is obviously reserved and unreachable):
>
> user processes:
>>      1 ?        S      0:01 /bin/sh /init
>>   1189 ?        Ss     0:00 udevd --daemon
>>   1471 ?        Ss     0:00 /usr/sbin/sshd
>>   1530 ttyS0    Ss+    0:00 /bin/login -f
>>   1533 ttyS0    S+     0:00  \_ -rc
>>   1564 ttyS0    R+     0:00      \_ ps fax
Yes. I have a lot of RAM free; the system itself takes just 16 - 20 MB
of RAM out of 64 MB.
If I now start the ath10k interface, almost the whole system memory is
gone (if traffic is flowing).
> (everything else is kernel threads)
>
> after boot (ath10k module loaded and probed):
>>               total       used       free     shared    buffers     cached
>> Mem:         46928      33808      13120        152          0       5156
>> -/+ buffers/cache:      28652      18276
>> Swap:            0          0          0
Use the config I provided and generate some traffic. Then you will see
the remaining memory run down to zero.
> hostapd+iperf:
>>               total       used       free     shared    buffers     cached
>> Mem:         46928      44672       2256        440          0       2952
>> -/+ buffers/cache:      41720       5208
>> Swap:            0          0          0
You can already see it here.
> hostapd (no iperf):
>>               total       used       free     shared    buffers     cached
>> Mem:         46928      42436       4492        500          0       2784
>> -/+ buffers/cache:      39652       7276
>> Swap:            0          0          0
> hostapd stopped:
>>               total       used       free     shared    buffers     cached
>> Mem:         46928      32220      14708        388          0       2604
>> -/+ buffers/cache:      29616      17312
>> Swap:            0          0          0
> ath10k_pci and ath10k_core unloaded:
>>               total       used       free     shared    buffers     cached
>> Mem:         46928      28552      18376        144          0       4712
>> -/+ buffers/cache:      23840      23088
>> Swap:            0          0          0
> While running iperf I was able to get 400mbps+ of UDP traffic with
> another 2x2 11ac device without much trouble.
>
> Do note: The VM is running a glibc based system and has kernel and
> modules with full debugging hence the high base memory usage. Yet it
> still manages to work just fine.
Mine is musl based, and has no debugging enabled.
>
>
>> If you kill the hostapd process for ath10k, you will likely see the
>> difference.
>> one point here raised up, is that qca is aware of high memory consumption
>> with vap's
>> my example has 2 vap's. i already provided a config file for hostapd on this
>> mailing list
> You must be aware you can't compare ath10k to LSDK apples to apples.
> Their QSDK includes kernel customizations which makes it nearly
> impossible to compare. They may have some fixes for the platform
> itself that haven't been upstreamed for what it's worth.
I know. But what we want to achieve is that ath10k can be used in routers
with 64 MB of RAM. Right now that is not enough.
DD-WRT is highly optimized for a small memory footprint, but it also has
features like NAS storage or even FreeRADIUS, which cannot be used
on such devices if all memory is already taken by a single driver.
>
>
> Michał
>
Kalle Valo May 27, 2015, 10:25 a.m. UTC | #24
Kalle Valo <kvalo@qca.qualcomm.com> writes:

>>>> Fixes: c17c997d5613 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next")
>>>> Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
>>>
>>> Apparently this also fixes some weird issues with qca6174 hw2.1 notably:
>>>  - ath10k causing disconnecting of other devices in a BSS
>>>  - random Fw crashes
>>>
>>> Both problems started to happen because c17c997d5613 enabled monitor
>>> vdev by default on STA interfaces. It seems that qca6174 hw2.1
>>> firmware has issues similar to those of qca988x 999.999.0.636
>>> regarding monitor vdev operation.
>>>
>>> Also, I've made a typo in the subject.
>>>
>>> I'll post v2 with subject fixed and extended commit log later.
>>
>> Keep in mind that c17c997d5613 is actually from wireless-testing.git
>> which means that it will never go to wireless-drivers-next.git nor to
>> net-next.git. So the merge conflict bug is purely in
>> wireless-testing.git and in master branch of ath.git (but not in
>> ath-next branch!).
>>
>> I think John should apply your v2 patch once you send it. But if you
>> have something which should be fixed in ath-next remember to send that
>> in a separate patch so that I can apply that directly to ath-next.
>
> Actually now that Dave pulled my pull request the issue is fixed in
> wireless-drivers-next already. So once John pulls from
> wireless-drivers-next and makes sure that ath10k is 100% identical in
> both trees the issue should be sorted out and no need for extra patches.

John now fixed this in wireless-testing, thanks John. And I now updated
ath.git master branch so it should be ok as well. Please let me know if
there are still problems.

Patch

diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 425dbe271495..594eb369ff7f 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -1031,22 +1031,6 @@  static int ath10k_monitor_stop(struct ath10k *ar)
 	return 0;
 }
 
-static bool ath10k_mac_should_disable_promisc(struct ath10k *ar)
-{
-	struct ath10k_vif *arvif;
-
-	if (!ar->num_started_vdevs)
-		return false;
-
-	list_for_each_entry(arvif, &ar->arvifs, list)
-		if (arvif->vdev_type != WMI_VDEV_TYPE_AP)
-			return false;
-
-	ath10k_dbg(ar, ATH10K_DBG_MAC,
-		   "mac disabling promiscuous mode because vdev is started\n");
-	return true;
-}
-
 static bool ath10k_mac_monitor_vdev_is_needed(struct ath10k *ar)
 {
 	int num_ctx;
@@ -1065,7 +1049,6 @@  static bool ath10k_mac_monitor_vdev_is_needed(struct ath10k *ar)
 		return false;
 
 	return ar->monitor ||
-	       !ath10k_mac_should_disable_promisc(ar) ||
 	       test_bit(ATH10K_CAC_RUNNING, &ar->dev_flags);
 }
 
@@ -1267,7 +1250,7 @@  static int ath10k_vdev_start_restart(struct ath10k_vif *arvif,
 {
 	struct ath10k *ar = arvif->ar;
 	struct wmi_vdev_start_request_arg arg = {};
-	int ret = 0, ret2;
+	int ret = 0;
 
 	lockdep_assert_held(&ar->conf_mutex);
 
@@ -1326,16 +1309,6 @@  static int ath10k_vdev_start_restart(struct ath10k_vif *arvif,
 	ar->num_started_vdevs++;
 	ath10k_recalc_radar_detection(ar);
 
-	ret = ath10k_monitor_recalc(ar);
-	if (ret) {
-		ath10k_warn(ar, "mac failed to recalc monitor for vdev %i restart %d: %d\n",
-			    arg.vdev_id, restart, ret);
-		ret2 = ath10k_vdev_stop(arvif);
-		if (ret2)
-			ath10k_warn(ar, "mac failed to stop vdev %i restart %d: %d\n",
-				    arg.vdev_id, restart, ret2);
-	}
-
 	return ret;
 }
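
To make the effect of the hunks above easier to see, here is a minimal standalone sketch of the final return expression of ath10k_mac_monitor_vdev_is_needed() before and after the patch. It is a simplified model, not driver code: the `model_*` names, the `struct model_ar` layout and the `MODEL_VDEV_*` enum are hypothetical stand-ins, and the earlier checks in the real function (the ones that return false before this expression is reached) are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for driver state; NOT the real ath10k types. */
enum model_vdev_type { MODEL_VDEV_AP, MODEL_VDEV_STA, MODEL_VDEV_IBSS };

struct model_ar {
	const enum model_vdev_type *vdevs; /* types of all configured vifs */
	size_t num_vdevs;
	int num_started_vdevs;
	bool monitor;     /* a monitor interface was explicitly requested */
	bool cac_running; /* stands in for the ATH10K_CAC_RUNNING flag */
};

/* Mirrors the removed ath10k_mac_should_disable_promisc() helper:
 * true only when a vdev is started and every vif is an AP. */
bool model_should_disable_promisc(const struct model_ar *ar)
{
	size_t i;

	if (!ar->num_started_vdevs)
		return false;

	for (i = 0; i < ar->num_vdevs; i++)
		if (ar->vdevs[i] != MODEL_VDEV_AP)
			return false;

	return true;
}

/* Final expression before the patch (earlier checks omitted). */
bool model_monitor_needed_old(const struct model_ar *ar)
{
	return ar->monitor ||
	       !model_should_disable_promisc(ar) ||
	       ar->cac_running;
}

/* Final expression after the patch (earlier checks omitted). */
bool model_monitor_needed_new(const struct model_ar *ar)
{
	return ar->monitor || ar->cac_running;
}

/* Self-check: one started STA vif, no monitor interface, no CAC. */
void model_self_check(void)
{
	const enum model_vdev_type sta[] = { MODEL_VDEV_STA };
	struct model_ar ar = { sta, 1, 1, false, false };

	assert(model_monitor_needed_old(&ar));  /* monitor vdev implied */
	assert(!model_monitor_needed_new(&ar)); /* no longer implied */
}
```

In this model a single started STA (or IBSS) vif makes the old expression true even though nobody asked for a monitor interface, which matches the behaviour described in the thread. With the middle term removed, only an explicit monitor request or a running CAC makes the expression true.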