
ath10k: Make HTT fill size configurable

Message ID 1420432796-10765-1-git-send-email-sujith@msujith.org (mailing list archive)
State Rejected
Headers show

Commit Message

Sujith Manoharan Jan. 5, 2015, 4:39 a.m. UTC
From: Sujith Manoharan <c_manoha@qca.qualcomm.com>

The HTT RX ring is replenished with a maximum of 16 buffers,
but this might be insufficient when RX traffic is high.
Not having enough RX buffers throttles the FW, resulting
in low throughput.

This patch adds a module parameter to adjust the fill size
based on the platform/usage.

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
---
 drivers/net/wireless/ath/ath10k/core.c   | 3 +++
 drivers/net/wireless/ath/ath10k/htt.h    | 2 ++
 drivers/net/wireless/ath/ath10k/htt_rx.c | 2 +-
 3 files changed, 6 insertions(+), 1 deletion(-)
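[Editor's note] The effect of the patch can be sketched outside the driver. The replenish path caps each refill batch, and the cap — previously the ATH10K_HTT_MAX_NUM_REFILL constant of 16 — becomes the htt_fill_size module parameter. This is a standalone illustration mirroring the diff below, not the actual driver code:

```c
#include <assert.h>

/* Standalone sketch (not ath10k code): each replenish run fills at
 * most htt_fill_size buffers of the outstanding deficit. The default
 * of 16 matches ATH10K_HTT_MAX_NUM_REFILL and is unchanged. */
static int htt_fill_size = 16;

static int num_to_fill(int fill_level, int fill_cnt)
{
	int num_deficit = fill_level - fill_cnt;

	return num_deficit < htt_fill_size ? num_deficit : htt_fill_size;
}
```

With a 1000-entry ring and 900 buffers outstanding, the deficit is 100 but only 16 are filled per run — hence the many replenish tasklet invocations discussed below.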

Comments

Michal Kazior Jan. 7, 2015, 10:14 a.m. UTC | #1
On 5 January 2015 at 05:39, Sujith Manoharan <sujith@msujith.org> wrote:
> From: Sujith Manoharan <c_manoha@qca.qualcomm.com>
>
> The HTT RX ring is replenished with a maximum of 16 buffers,
> but this might be insufficient when RX traffic is high.
> Not having enough RX buffers throttles the FW, resulting
> in low throughput.
>
> This patch adds a module parameter to adjust the fill size
> based on the platform/usage.

I don't think this should be adjustable.

I assume you tested this in conjunction with rx batching patch - what
fill level values did you use to see throughput improvement? How
(un)responsive was serial/ssh connection to the machine?

I have a patch internally which removes the replenishment tasklet and
I wonder how it will behave with rx batching then, hmm..


Michał
Sujith Manoharan Jan. 7, 2015, 10:37 a.m. UTC | #2
Michal Kazior wrote:
> I don't think this should be adjustable.

Why ?

> I assume you tested this in conjunction with rx batching patch - what
> fill level values did you use to see throughput improvement? How
> (un)responsive was serial/ssh connection to the machine?

Any value north of 128 made the console unresponsive, with
the CPU maxed out. But 16 appears to be rather low ? Performance
numbers are low and we end up with lots of replenish tasklets.

But - the internal driver somehow manages to keep CPU usage
down at a fill level of *1000*.

I think for ath10k, the default could stay the same on low end MIPS boards
like AP135, but on slightly more powerful boards like AP148 with dual-core
ARM processors, performance can be boosted by increasing the fill level
while still keeping responsiveness at a reasonable level.

> I have a patch internally which removes the replenishment tasklet and
> I wonder how it will behave with rx batching then, hmm..

I tried that earlier and could see only marginal improvement. The biggest
increase came when I replaced netif_receive_skb() with netif_rx()
in mac80211/rx.c.

Sujith
Vasanthakumar Thiagarajan Jan. 7, 2015, 10:53 a.m. UTC | #3
On Wednesday 07 January 2015 03:44 PM, Michal Kazior wrote:
> On 5 January 2015 at 05:39, Sujith Manoharan <sujith@msujith.org> wrote:
>> From: Sujith Manoharan <c_manoha@qca.qualcomm.com>
>>
>> The HTT RX ring is replenished with a maximum of 16 buffers,
>> but this might be insufficient when RX traffic is high.
>> Not having enough RX buffers throttles the FW, resulting
>> in low throughput.
>>
>> This patch adds a module parameter to adjust the fill size
>> based on the platform/usage.
>
> I don't think this should be adjustable.

Default value of 16 causes rx ring corruption within 1 hour of stress
testing. Increasing this limit to 1000 solves the issue, not sure what
max value in > 16 and <= 1000 works. This lower value obviously has issues
on some platforms.

Vasanth
Kalle Valo Jan. 13, 2015, 2:44 p.m. UTC | #4
Sujith Manoharan <sujith@msujith.org> writes:

>> I have a patch internally which removes the replenishment tasklet and
>> I wonder how it will behave with rx batching then, hmm..
>
> I tried that earlier and could see only marginal improvement. The biggest
> increase came when I replaced netif_receive_skb() with netif_rx()
> in mac80211/rx.c.

Any ideas why that increases throughput?
Kalle Valo Jan. 13, 2015, 2:48 p.m. UTC | #5
Vasanthakumar Thiagarajan <vthiagar@qti.qualcomm.com> writes:

>>> This patch adds a module parameter to adjust the fill size
>>> based on the platform/usage.
>>
>> I don't think this should be adjustable.
>
> Default value of 16 causes rx ring corruption within 1 hour of stress
> testing. Increasing this limit to 1000 solves the issue, not sure what
> max value in > 16 and <= 1000 works.

Does the patch below fix that issue? Or is this issue still open?

5de6dfc82f71 ath10k: Fix potential Rx ring corruption
Kalle Valo Jan. 13, 2015, 3:03 p.m. UTC | #6
Sujith Manoharan <sujith@msujith.org> writes:

> From: Sujith Manoharan <c_manoha@qca.qualcomm.com>
>
> The HTT RX ring is replenished with a maximum of 16 buffers,
> but this might be insufficient when RX traffic is high.
> Not having enough RX buffers throttles the FW, resulting
> in low throughput.
>
> This patch adds a module parameter to adjust the fill size
> based on the platform/usage.
>
> Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>

[...]

> --- a/drivers/net/wireless/ath/ath10k/core.c
> +++ b/drivers/net/wireless/ath/ath10k/core.c
> @@ -33,14 +33,17 @@
>  unsigned int ath10k_debug_mask;
>  static bool uart_print;
>  static bool skip_otp;
> +int htt_fill_size = ATH10K_HTT_MAX_NUM_REFILL;
>  
>  module_param_named(debug_mask, ath10k_debug_mask, uint, 0644);
>  module_param(uart_print, bool, 0644);
>  module_param(skip_otp, bool, 0644);
> +module_param(htt_fill_size, int, 0644);
>  
>  MODULE_PARM_DESC(debug_mask, "Debugging mask");
>  MODULE_PARM_DESC(uart_print, "Uart target debugging");
>  MODULE_PARM_DESC(skip_otp, "Skip otp failure for calibration in testmode");
> +MODULE_PARM_DESC(htt_fill_size, "HTT RX ring fill size");

I'm not sure if I really like this path of having interfaces to
configure driver internals. I suspect that if we add this, we will have
even more later. It's a lot more documenting for us and more work to the
users to understand what parameters they should use. Also this means
that testing will be more challenging as people will use different
values and results won't be comparable.

Isn't there any other way to solve the problem? Like automatically
changing the value somehow (no idea how) or some other means?

Or if nothing else helps, a crazy idea is to have some sort of platform
profile parameter:

0 = auto
1 = slow
2 = fast
3 = superfast (x86)

And then we would have preset values (not just htt_fill_size) within
ath10k and they get chosen based on the profile configured.
Sujith Manoharan Jan. 13, 2015, 8:56 p.m. UTC | #7
Kalle Valo wrote:
> > I tried that earlier and could see only marginal improvement. The biggest
> > increase came when I replaced netif_receive_skb() with netif_rx()
> > in mac80211/rx.c.
> 
> Any ideas why that increases throughput?

Not sure, but RPS is enabled by default on this platform,
so maybe that influences something...

Sujith
Sujith Manoharan Jan. 13, 2015, 9:08 p.m. UTC | #8
Kalle Valo wrote:
> I'm not sure if I really like this path of having interfaces to
> configure driver internals. I suspect that if we add this, we will have
> even more later. It's a lot more documenting for us and more work to the
> users to understand what parameters they should use. Also this means
> that testing will be more challenging as people will use different
> values and results won't be comparable.
> 
> Isn't there any other way to solve the problem? Like automatically
> changing the value somehow (no idea how) or some other means?

We are limiting performance by restricting the fill size. A user
will just use the default, which is still the same and remains
unchanged. But, having a way to adjust this based on the platform
seems reasonable to me and I think trying to change this value dynamically
is overkill.

> Or if nothing else helps, a crazy idea is to have some sort of platform
> profile parameter:
> 
> 0 = auto
> 1 = slow
> 2 = fast
> 3 = superfast (x86)
> 
> And then we would have preset values (not just htt_fill_size) within
> ath10k and they get chosen based on the profile configured.

I don't think a network driver should limit its performance with
such profiles. Moreover, x86 can be slow too - at least, my 4 year
old machine is. :-)

Sujith
Michal Kazior Jan. 14, 2015, 9:22 a.m. UTC | #9
On 13 January 2015 at 22:08, Sujith Manoharan <sujith@msujith.org> wrote:
> Kalle Valo wrote:
>> I'm not sure if I really like this path of having interfaces to
>> configure driver internals. I suspect that if we add this, we will have
>> even more later. It's a lot more documenting for us and more work to the
>> users to understand what parameters they should use. Also this means
>> that testing will be more challenging as people will use different
>> values and results won't be comparable.
>>
>> Isn't there any other way to solve the problem? Like automatically
>> changing the value somehow (no idea how) or some other means?
>
> We are limiting performance by restricting the fill size. A user
> will just use the default, which is still the same and remains
> unchanged. But, having a way to adjust this based on the platform
> seems reasonable to me and I think trying to change this value dynamically
> is overkill.

We should just fix the tx/rx processing instead. The HTT throttling
limit was originally introduced to deal with watchdog issues we've
seen on AP135. Tasklets were starving the system too much.

I've been playing around with threaded irqs in ath10k in my tree and
I've seen improvement with Rx. However Tx instead becomes broken in
the process and I'm yet to find a definite and final answer why that
is the case. My suspicion is that NAPI, which is used by the ethernet
driver, runs in tasklets and they aren't frequent enough to trigger
ksoftirqd so they starve the system. The current non-threaded irq
approach yields more tasklet schedules for Tx and hits ksoftirqd more
often making it nice on userspace. If that's the case I don't really
have an idea how to solve this now.


Michał
Sujith Manoharan Jan. 14, 2015, 9:50 a.m. UTC | #10
Michal Kazior wrote:
> We should just fix the tx/rx processing instead. The HTT throttling
> limit was originally introduced to deal with watchdog issues we've
> seen on AP135. Tasklets were starving the system too much.

Then the fill limit restriction is a workaround for AP135
and shouldn't be applicable for other platforms ?

> I've been playing around with threaded irqs in ath10k in my tree and
> I've seen improvement with Rx. However Tx instead becomes broken in
> the process and I'm yet to find a definite and final answer why that
> is the case. My suspicion is that NAPI, which is used by the ethernet
> driver, runs in tasklets and they aren't frequent enough to trigger
> ksoftirqd so they starve the system. The current non-threaded irq
> approach yields more tasklet schedules for Tx and hits ksoftirqd more
> often making it nice on userspace. If that's the case I don't really
> have an idea how to solve this now.

I haven't looked at the TX path in detail, but regarding RX, these
were the bottlenecks:

* RX batch indication.
* HTT fill level.
* netif_receive_skb() usage instead of netif_rx().

AFAIK, the internal driver schedules only one tasklet for a CE interrupt
and everything is done in that context, along with refilling HTT.
ath10k has several stages: ISR -> CE tasklet -> HTT tasklet -> Replenish tasklet.
The softirq count/load will definitely be high with so many tasklets ?

Sujith
Michal Kazior Jan. 14, 2015, 10:12 a.m. UTC | #11
On 14 January 2015 at 10:50, Sujith Manoharan <sujith@msujith.org> wrote:
> Michal Kazior wrote:
>> We should just fix the tx/rx processing instead. The HTT throttling
>> limit was originally introduced to deal with watchdog issues we've
>> seen on AP135. Tasklets were starving the system too much.
>
> Then the fill limit restriction is a workaround for AP135
> and shouldn't be applicable for other platforms ?

If I were to narrow it down I'd say all uniprocessors. AP135 is just an
example where the problem easily occurs since it has an underpowered
cpu for the task. Even a more powerful single-core system will get
into trouble - all you need is a couple extra netfilter rules, nat,
some running services or additional processing hardware (usb anyone?).


>> I've been playing around with threaded irqs in ath10k in my tree and
>> I've seen improvement with Rx. However Tx instead becomes broken in
>> the process and I'm yet to find a definite and final answer why that
>> is the case. My suspicion is that NAPI, which is used by the ethernet
>> driver, runs in tasklets and they aren't frequent enough to trigger
>> ksoftirqd so they starve the system. The current non-threaded irq
>> approach yields more tasklet schedules for Tx and hits ksoftirqd more
>> often making it nice on userspace. If that's the case I don't really
>> have an idea how to solve this now.
>
> I haven't looked at the TX path in detail, but regarding RX, these
> were the bottlenecks:
>
> * RX batch indication.
> * HTT fill level.
> * netif_receive_skb() usage instead of netif_rx().
>
> AFAIK, the internal driver schedules only one tasklet for a CE interrupt
> and everything is done in that context, along with refilling HTT.
> ath10k has several stages: ISR -> CE tasklet -> HTT tasklet -> Replenish tasklet.
> The softirq count/load will definitely be high with so many tasklets ?

Apparently this isn't enough. Also, tasklets aren't subject to regular
scheduling policies and they just steal time from other threads. This
is important if you consider how much time a single tasklet can run -
you can actually estimate this.

800mbps is 100mbytes/s. Assume this is what the host system can handle at 100% cpu.
HTT Rx ring is 1000 frames long which is ~1.5mbyte of data (assuming
1500 bytes for each packet).
You eventually end up with cycles which drain the entire htt rx ring and
then replenish it if you push more traffic than the host cpu can take.
100mbytes / 1.5mbytes = ~66 runs/s which is ~15ms for each tasklet run.
That's a lot. You might not get a chance to cycle through all the
running processes to give them their timeslice for a few seconds..
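[Editor's note] The back-of-envelope arithmetic above, spelled out. All inputs are the mail's stated assumptions (an 800mbps link saturating the cpu, a 1000-entry rx ring, 1500-byte frames), not measurements:

```c
#include <assert.h>

/* Estimate how often the rx ring is drained and how long each
 * drain/replenish tasklet run takes, using integer arithmetic.
 * Inputs are assumptions quoted from the thread. */
static long runs_per_sec(long bytes_per_sec, long ring_bytes)
{
	return bytes_per_sec / ring_bytes;	/* ~66 cycles/s */
}

static long ms_per_run(long runs)
{
	return 1000 / runs;			/* ~15 ms per tasklet run */
}
```

100 MB/s over a ~1.5 MB ring gives ~66 drain cycles per second, i.e. each tasklet run owns the cpu for roughly 15 ms — long enough, as noted above, to starve userspace.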

If you starve userspace which runs a watchdog process you'll end up
failing to poke the watchdog timer in kernel and you'll get a reboot.

I'm starting to think tasklets are plain evil for network drivers :P


Michał
Sujith Manoharan Jan. 15, 2015, 12:17 a.m. UTC | #12
Michal Kazior wrote:
> If I were to narrow it down I'd say all uniprocessors. AP135 is just an
> example where the problem easily occurs since it has an underpowered
> cpu for the task. Even a more powerful single-core system will get
> into trouble - all you need is a couple extra netfilter rules, nat,
> some running services or additional processing hardware (usb anyone?).

There are platforms which offload such network operations to the HW.
AP148 is an example.

> Apparently this isn't enough. Also, tasklets aren't subject to regular
> scheduling policies and they just steal time from other threads. This
> is important if you consider how much time a single tasklet can run -
> you can actually estimate this.
> 
> 800mbps is 100mbytes/s. Assume this is what the host system can handle at 100% cpu.
> HTT Rx ring is 1000 frames long which is ~1.5mbyte of data (assuming
> 1500 bytes for each packet).
> You eventually end up with cycles which drain the entire htt rx ring and
> then replenish it if you push more traffic than the host cpu can take.
> 100mbytes / 1.5mbytes = ~66 runs/s which is ~15ms for each tasklet run.
> That's a lot. You might not get a chance to cycle through all the
> running processes to give them their timeslice for a few seconds..
> 
> If you starve userspace which runs a watchdog process you'll end up
> failing to poke the watchdog timer in kernel and you'll get a reboot.
> 
> I'm starting to think tasklets are plain evil for network drivers :P

Well, we need a way to lift the fill level restriction for platforms
that can afford a higher value, until the tx/rx engine in ath10k is
rewritten - and I don't want to carry an internal patch...

Sujith
Kalle Valo Jan. 15, 2015, 7:19 a.m. UTC | #13
Michal Kazior <michal.kazior@tieto.com> writes:

> I'm starting to think tasklets are plain evil for network drivers :P

They are. Actually I remember years ago seeing a discussion about
removing tasklets altogether. But obviously it didn't go anywhere as we
still have them.
Kalle Valo Jan. 15, 2015, 7:38 a.m. UTC | #14
Sujith Manoharan <sujith@msujith.org> writes:

> Kalle Valo wrote:
>> I'm not sure if I really like this path of having interfaces to
>> configure driver internals. I suspect that if we add this, we will have
>> even more later. It's a lot more documenting for us and more work to the
>> users to understand what parameters they should use. Also this means
>> that testing will be more challenging as people will use different
>> values and results won't be comparable.
>> 
>> Isn't there any other way to solve the problem? Like automatically
>> changing the value somehow (no idea how) or some other means?
>
> We are limiting performance by restricting the fill size. A user
> will just use the default, which is still the same and remains
> unchanged. But, having a way to adjust this based on the platform
> seems reasonable to me and I think trying to change this value dynamically
> is overkill.

Did you read at all what I wrote above? For example, what if we later
don't actually use ATH10K_HTT_MAX_NUM_REFILL anymore?

And the fundamental problem is that I still don't know what fill value
you think is the best to use here, and that means neither will the
users. That's why the values need to be within ath10k, not expose driver
internals and use some magic numbers which are not available anywhere.

>> Or if nothing else helps, a crazy idea is to have some sort of platform
>> profile parameter:
>> 
>> 0 = auto
>> 1 = slow
>> 2 = fast
>> 3 = superfast (x86)
>> 
>> And then we would have preset values (not just htt_fill_size) within
>> ath10k and they get chosen based on the profile configured.
>
> I don't think a network driver should limit its performance with
> such profiles. Moreover, x86 can be slow too - at least, my 4 year
> old machine is. :-)

We are not limiting anything more. That's not any different from what
your patch does, just that the parameter name is different and the user
doesn't have direct access to the internal parameter. Let me explain a
bit more how that would work in this HTT fill case:

profile 0 (auto):
profile 1 (slow):
        htt_fill_size = ATH10K_HTT_MAX_NUM_REFILL
        break;
profile 2 (fast):
        htt_fill_size = 77 /* or whatever */
        break;

So in this method the user can still choose optimal settings for a
certain platform, I assume AP148 in this case, and not know about driver
internals. And if there are more parameters we need to change in the
future, we can use the same modparam to change that as well.
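[Editor's note] The profile idea above might look like this in code. The profile ids and the example value 77 are Kalle's, from this thread; the identifier names and the AUTO fallback behaviour are made up here for illustration:

```c
#include <assert.h>

/* Hypothetical sketch of a platform-profile modparam selecting preset
 * driver internals. Not actual ath10k code. */
enum perf_profile {
	PROFILE_AUTO = 0,
	PROFILE_SLOW = 1,
	PROFILE_FAST = 2,
};

#define ATH10K_HTT_MAX_NUM_REFILL 16	/* in-tree default at the time */

static int profile_htt_fill_size(enum perf_profile p)
{
	switch (p) {
	case PROFILE_FAST:
		return 77;		/* "or whatever" -- a tuned preset */
	case PROFILE_AUTO:		/* no auto-tuning yet: same as slow */
	case PROFILE_SLOW:
	default:
		return ATH10K_HTT_MAX_NUM_REFILL;
	}
}
```

The point of the design is that more presets (not just the fill size) can later hang off the same single, documented parameter.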
Sujith Manoharan Jan. 15, 2015, 7:50 a.m. UTC | #15
Kalle Valo wrote:
> Did you read at all what I wrote above? For example, what if we later
> don't actually use ATH10K_HTT_MAX_NUM_REFILL anymore?
> 
> And the fundamental problem is that I still don't know what fill value
> you think is the best to use here, and that means neither will the
> users. That's why the values need to be within ath10k, not expose driver
> internals and use some magic numbers which are not available anywhere.

What magic numbers ?

How is 16 less a magic number than 128 or 96 or the original 1000 ?

> We are not limitting anything more. That's not any different from what
> your patch does, just that the parameter name is different and the user
> doesn't have direct access to the internal parameter. Let me explain a
> bit more how that would work in this HTT fill case:
> 
> profile 0 (auto):
> profile 1 (slow):
>         htt_fill_size = ATH10K_HTT_MAX_NUM_REFILL
>         break;
> profile 2 (fast):
>         htt_fill_size = 77 /* or whatever */
>         break;
> 
> So in this method the user can still choose optimal settings for a
> certain platform, I assume AP148 in this case, and not know about driver
> internals. And if there are more parameters we need to change in the
> future, we can use the same modparam to change that as well.

You seem to be fixated on some imaginary user that is going to be
confused by a module parameter.

I don't see how adding more complexity by adding profiles and such
is going to make a user less confused.

Sujith
Kalle Valo Jan. 15, 2015, 8:09 a.m. UTC | #16
Sujith Manoharan <sujith@msujith.org> writes:

> Kalle Valo wrote:
>> Did you read at all what I wrote above? For example, what if we later
>> don't actually use ATH10K_HTT_MAX_NUM_REFILL anymore?
>> 
>> And the fundamental problem is that I still don't know what fill value
>> you think is the best to use here, and that means neither will the
>> users. That's why the values need to be within ath10k, not expose driver
>> internals and use some magic numbers which are not available anywhere.
>
> What magic numbers ?
>
> How is 16 less a magic number than 128 or 96 or the original 1000 ?

Magic number for the _user_. With an user I mean an engineer installing
ath10k to his device/distro/whatever.

Of course an ath10k developer will understand the internal numbers
inside out, but we are not here talking about ath10k developers.

>> We are not limiting anything more. That's not any different from what
>> your patch does, just that the parameter name is different and the user
>> doesn't have direct access to the internal parameter. Let me explain a
>> bit more how that would work in this HTT fill case:
>> 
>> profile 0 (auto):
>> profile 1 (slow):
>>         htt_fill_size = ATH10K_HTT_MAX_NUM_REFILL
>>         break;
>> profile 2 (fast):
>>         htt_fill_size = 77 /* or whatever */
>>         break;
>> 
>> So in this method the user can still choose optimal settings for a
>> certain platform, I assume AP148 in this case, and not know about driver
>> internals. And if there are more parameters we need to change in the
>> future, we can use the same modparam to change that as well.
>
> You seem to be fixated on some imaginary user that is going to be
> confused by a module parameter.

Imaginary? You should do some IT support sometime to see what it's really
like there :) People even have problems installing firmware images.

> I don't see how adding more complexity by adding profiles and such
> is going to make a user less confused.

Because it's easy to document three values, or actually two in this
case. I STILL don't know what values you think people should use with
the htt_fill parameter.

But of course I would prefer that there would be no need to configure
any module parameters at all and ath10k would work in optimal speed out
of box. That's what we should strive for.
Sujith Manoharan Jan. 15, 2015, 8:16 a.m. UTC | #17
Kalle Valo wrote:
> I STILL don't know what values you think people should use with
> the htt_fill parameter.

None. The default stays the same. But, it can be adjusted based
on the platform, if higher throughput is desired.

Sujith
Sujith Manoharan Jan. 15, 2015, 8:21 a.m. UTC | #18
Sujith Manoharan wrote:
> None. The default stays the same. But, it can be adjusted based
> on the platform, if higher throughput is desired.

But, anyway, this patch can be dropped. We'll change the fill level
restriction internally, based on the customer's platform.

Sujith
Kalle Valo Jan. 15, 2015, 8:32 a.m. UTC | #19
Sujith Manoharan <sujith@msujith.org> writes:

> Kalle Valo wrote:
>> I STILL don't know what values you think people should use with
>> the htt_fill parameter.
>
> None. The default stays the same. But, it can be adjusted based
> on the platform, if higher throughput is desired.

Yeah, but everyone wants higher throughput (including me). So what value
is best in your opinion?
Kalle Valo Jan. 15, 2015, 8:35 a.m. UTC | #20
Sujith Manoharan <sujith@msujith.org> writes:

> Sujith Manoharan wrote:
>> None. The default stays the same. But, it can be adjusted based
>> on the platform, if higher throughput is desired.
>
> But, anyway, this patch can be dropped. We'll change the fill level
> restriction internally, based on the customer's platform.

I'm sad to hear that, now others cannot benefit from your work.

What was wrong with my proposal? Would it help if I write an RFC patch?
Just tell me what HTT fill value I should use.
Sujith Manoharan Jan. 15, 2015, 8:43 a.m. UTC | #21
Kalle Valo wrote:
> Yeah, but everyone wants higher throughput (including me). So what value
> is best in your opinion?

Lower throughput has been okay for all this time, with a value
of 16. I suppose the value can be retained to not cause regressions
on AP135 or other platforms with a low-powered CPU. As Michal mentioned,
this value allows AP135 to function without the watchdog rebooting
the board.

Sujith
Sujith Manoharan Jan. 15, 2015, 8:47 a.m. UTC | #22
Kalle Valo wrote:
> Would it help if I write an RFC patch?

No, I think having profiles like 'slow', 'fast', 'super-fast' for a network
driver will make things complex. But, maybe others on this list
might want it...

Sujith
Julian Calaby Jan. 15, 2015, 11:03 p.m. UTC | #23
Hi Sujith,

On Thu, Jan 15, 2015 at 7:43 PM, Sujith Manoharan <sujith@msujith.org> wrote:
> Kalle Valo wrote:
>> Yeah, but everyone wants higher throughput (including me). So what value
>> is best in your opinion?
>
> Lower throughput has been okay for all this time, with a value
> of 16. I suppose the value can be retained to not cause regressions
> on AP135 or other platforms with a low-powered CPU. As Michal mentioned,
> this value allows AP135 to function without the watchdog rebooting
> the board.

Can we detect if this parameter is too high / low and adjust / limit
it on the fly?

Thanks,
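[Editor's note] One way to answer Julian's question is to clamp the requested value at use time rather than trusting the modparam verbatim. The bounds here are guesses taken from the thread (16 is the in-tree default; Sujith reported values north of 128 making the console unresponsive), not driver constants:

```c
#include <assert.h>

/* Hypothetical bounds check for an htt_fill_size-style parameter.
 * min_fill/max_fill are assumptions from the discussion, not
 * values from the ath10k source. */
static int clamp_fill_size(int requested)
{
	const int min_fill = 16;	/* current in-tree default */
	const int max_fill = 128;	/* reported responsiveness limit */

	if (requested < min_fill)
		return min_fill;
	if (requested > max_fill)
		return max_fill;
	return requested;
}
```

Detecting the limit "on the fly" (e.g. from observed softirq load) would be harder; a static clamp at least keeps a mistyped parameter from starving the system.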

Patch

diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
index 2d0671e..a74af21 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -33,14 +33,17 @@ 
 unsigned int ath10k_debug_mask;
 static bool uart_print;
 static bool skip_otp;
+int htt_fill_size = ATH10K_HTT_MAX_NUM_REFILL;
 
 module_param_named(debug_mask, ath10k_debug_mask, uint, 0644);
 module_param(uart_print, bool, 0644);
 module_param(skip_otp, bool, 0644);
+module_param(htt_fill_size, int, 0644);
 
 MODULE_PARM_DESC(debug_mask, "Debugging mask");
 MODULE_PARM_DESC(uart_print, "Uart target debugging");
 MODULE_PARM_DESC(skip_otp, "Skip otp failure for calibration in testmode");
+MODULE_PARM_DESC(htt_fill_size, "HTT RX ring fill size");
 
 static const struct ath10k_hw_params ath10k_hw_params_list[] = {
 	{
diff --git a/drivers/net/wireless/ath/ath10k/htt.h b/drivers/net/wireless/ath/ath10k/htt.h
index 1bd5545..1aa5db1 100644
--- a/drivers/net/wireless/ath/ath10k/htt.h
+++ b/drivers/net/wireless/ath/ath10k/htt.h
@@ -26,6 +26,8 @@ 
 #include "htc.h"
 #include "rx_desc.h"
 
+extern int htt_fill_size;
+
 enum htt_dbg_stats_type {
 	HTT_DBG_STATS_WAL_PDEV_TXRX = 1 << 0,
 	HTT_DBG_STATS_RX_REORDER    = 1 << 1,
diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index 9c782a4..1440b0e 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -128,7 +128,7 @@  static void ath10k_htt_rx_msdu_buff_replenish(struct ath10k_htt *htt)
 	 * improves the avarage and stability. */
 	spin_lock_bh(&htt->rx_ring.lock);
 	num_deficit = htt->rx_ring.fill_level - htt->rx_ring.fill_cnt;
-	num_to_fill = min(ATH10K_HTT_MAX_NUM_REFILL, num_deficit);
+	num_to_fill = min(htt_fill_size, num_deficit);
 	num_deficit -= num_to_fill;
 	ret = ath10k_htt_rx_ring_fill_n(htt, num_to_fill);
 	if (ret == -ENOMEM) {