diff mbox series

[v2,net] ibmveth: Reduce maximum tx queues to 8

Message ID 20221102183837.157966-1-nnac123@linux.ibm.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Series [v2,net] ibmveth: Reduce maximum tx queues to 8

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present fail Series targets non-next tree, but doesn't contain any Fixes tags
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 8 maintainers not CCed: linuxppc-dev@lists.ozlabs.org kuba@kernel.org mpe@ellerman.id.au davem@davemloft.net npiggin@gmail.com christophe.leroy@csgroup.eu edumazet@google.com pabeni@redhat.com
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning WARNING: 'Taht' may be misspelled - perhaps 'That'? WARNING: 'taht' may be misspelled - perhaps 'that'?
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Nick Child Nov. 2, 2022, 6:38 p.m. UTC
Previously, the maximum number of transmit queues allowed was 16. Due to
resource concerns, limit to 8 queues instead.

Since the driver is virtualized away from the physical NIC, the purpose
of multiple queues is purely to allow for parallel calls to the
hypervisor. Therefore, there is no noticeable effect on performance by
reducing queue count to 8.

Reported-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
---
Relevant links:
 - Introduce multiple tx queues (accepted in v6.1):
   https://lore.kernel.org/netdev/20220928214350.29795-2-nnac123@linux.ibm.com/
 - Resource concerns with 16 queues:
   https://lore.kernel.org/netdev/CAA93jw5reJmaOvt9vw15C1fo1AN7q5jVKzUocbAoNDC-cpi=KQ@mail.gmail.com/
 - v1 (only change is commit message length and typo in Reported-by tag):
   https://lore.kernel.org/netdev/20221102153040.149244-1-nnac123@linux.ibm.com/
 drivers/net/ethernet/ibm/ibmveth.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jakub Kicinski Nov. 4, 2022, 3:59 a.m. UTC | #1
On Wed,  2 Nov 2022 13:38:37 -0500 Nick Child wrote:
> Previously, the maximum number of transmit queues allowed was 16. Due to
> resource concerns, limit to 8 queues instead.
> 
> Since the driver is virtualized away from the physical NIC, the purpose
> of multiple queues is purely to allow for parallel calls to the
> hypervisor. Therefore, there is no noticeable effect on performance by
> reducing queue count to 8.

I'm not sure if that's the point Dave was making but we should be
influencing the default, not the MAX. Why limit the MAX?
Nick Child Nov. 4, 2022, 2:06 p.m. UTC | #2
On 11/3/22 22:59, Jakub Kicinski wrote:
> On Wed,  2 Nov 2022 13:38:37 -0500 Nick Child wrote:
>> Previously, the maximum number of transmit queues allowed was 16. Due to
>> resource concerns, limit to 8 queues instead.
>>
>> Since the driver is virtualized away from the physical NIC, the purpose
>> of multiple queues is purely to allow for parallel calls to the
>> hypervisor. Therefore, there is no noticeable effect on performance by
>> reducing queue count to 8.
> 
> I'm not sure if that's the point Dave was making but we should be
> influencing the default, not the MAX. Why limit the MAX?

The MAX is always allocated in the driver's probe function. In the
driver's open and ethtool set-channels functions we set
real_num_tx_queues. So the number of allocated queues is always MAX,
but the number of queues actually in use may differ and can be set by
the user.
I hope this explains; otherwise, please let me know.

Thanks again for taking the time to review,
Nick Child
Jakub Kicinski Nov. 4, 2022, 5:59 p.m. UTC | #3
On Fri, 4 Nov 2022 09:06:02 -0500 Nick Child wrote:
> On 11/3/22 22:59, Jakub Kicinski wrote:
> > On Wed,  2 Nov 2022 13:38:37 -0500 Nick Child wrote:  
> >> Previously, the maximum number of transmit queues allowed was 16. Due to
> >> resource concerns, limit to 8 queues instead.
> >>
> >> Since the driver is virtualized away from the physical NIC, the purpose
> >> of multiple queues is purely to allow for parallel calls to the
> >> hypervisor. Therefore, there is no noticeable effect on performance by
> >> reducing queue count to 8.  
> > 
> > I'm not sure if that's the point Dave was making but we should be
> > influencing the default, not the MAX. Why limit the MAX?  
> 
> The MAX is always allocated in the drivers probe function. In the 
> drivers open and ethtool-set-channels functions we set 
> real_num_tx_queues. So the number of allocated queues is always MAX
> but the number of queues actually in use may differ and can be set by 
> the user.
> I hope this explains. Otherwise, please let me know.

Perhaps I don't understand the worry. Is allowing 16 queues a problem
because it limits how many instances the hypervisor can support?
Or is the concern coming from your recent work on BQL and having many
queues exacerbating buffer bloat?
Nick Child Nov. 4, 2022, 6:15 p.m. UTC | #4
On 11/4/22 12:59, Jakub Kicinski wrote:
> On Fri, 4 Nov 2022 09:06:02 -0500 Nick Child wrote:
>> On 11/3/22 22:59, Jakub Kicinski wrote:
>>> On Wed,  2 Nov 2022 13:38:37 -0500 Nick Child wrote:
>>>> Previously, the maximum number of transmit queues allowed was 16. Due to
>>>> resource concerns, limit to 8 queues instead.
>>>>
>>>> Since the driver is virtualized away from the physical NIC, the purpose
>>>> of multiple queues is purely to allow for parallel calls to the
>>>> hypervisor. Therefore, there is no noticeable effect on performance by
>>>> reducing queue count to 8.
>>>
>>> I'm not sure if that's the point Dave was making but we should be
>>> influencing the default, not the MAX. Why limit the MAX?
>>
>> The MAX is always allocated in the drivers probe function. In the
>> drivers open and ethtool-set-channels functions we set
>> real_num_tx_queues. So the number of allocated queues is always MAX
>> but the number of queues actually in use may differ and can be set by
>> the user.
>> I hope this explains. Otherwise, please let me know.
> 
> Perhaps I don't understand the worry. Is allowing 16 queues a problem
> because it limits how many instances the hypervisor can support?

No, the hypervisor is unaware of the number of netdev queues. The reason
for adding more netdev queues in the first place is to allow the higher
networking layers to make parallel calls to the driver's xmit function,
which the hypervisor can handle.

> Or is the concern coming from your recent work on BQL and having many
> queues exacerbating buffer bloat?

Yes, and Dave can jump in here if I am wrong, but, from my 
understanding, if the NIC cannot send packets at the rate that
they are queued then these queues will inevitably fill to txqueuelen.
In this case, having more queues will not mean better throughput but
will result in a large number of allocations sitting in queues 
(bufferbloat). I believe Dave's point was: if more queues do not
allow for better performance (and can risk bufferbloat), then why
have so many at all?

After going through testing and seeing no difference in performance
with 8 vs 16 queues, I would rather not have the driver be a culprit
of potential resource hogging.
Jakub Kicinski Nov. 4, 2022, 6:39 p.m. UTC | #5
On Fri, 4 Nov 2022 13:15:39 -0500 Nick Child wrote:
> > Or is the concern coming from your recent work on BQL and having many
> > queues exacerbating buffer bloat?  
> 
> Yes, and Dave can jump in here if I am wrong, but, from my 
> understanding, if the NIC cannot send packets at the rate that
> they are queued then these queues will inevitably fill to txqueuelen.
> In this case, having more queues will not mean better throughput but
> will result in a large number of allocations sitting in queues 
> (bufferbloat). I believe Dave's point was, if more queues does not
> allow for better performance (and can risk bufferbloat) then why
> have so many at all.
> 
> After going through testing and seeing no difference in performance
> with 8 vs 16 queues, I would rather not have the driver be a culprit
> of potential resource hogging.

Right, so my point was that users can always shoot themselves in the
foot with bad configs. You can leave the MAX at 16, in case someone
needs it, and limit the default real queues setting instead; most users
will use the default.
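[Editor's note] Jakub's suggestion maps onto the standard ethtool channels interface. Assuming a hypothetical interface name of eth0, a user can inspect the preset maximum versus the current count, and move between them without the driver reallocating anything:

```shell
# Show the preset maximums (what the driver allocated at probe)
# alongside the current, in-use counts
ethtool -l eth0

# Use only 4 of the allocated tx queues; memory for the MAX stays allocated
ethtool -L eth0 tx 4

# Scale back up, at most to the preset maximum reported by 'ethtool -l'
ethtool -L eth0 tx 16
```

Under this model, a conservative default with a generous MAX gives cautious users low bufferbloat out of the box while still letting others opt into more queues.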
David Laight Nov. 5, 2022, 11:33 a.m. UTC | #6
From: Nick Child
> Sent: 04 November 2022 18:16
...
> Yes, and Dave can jump in here if I am wrong, but, from my
> understanding, if the NIC cannot send packets at the rate that
> they are queued then these queues will inevitably fill to txqueuelen.
> In this case, having more queues will not mean better throughput but
> will result in a large number of allocations sitting in queues
> (bufferbloat). I believe Dave's point was, if more queues does not
> allow for better performance (and can risk bufferbloat) then why
> have so many at all.

Isn't there another issue (noted in the tg3 driver): if the
underlying hardware (or other consumer) is doing a round-robin
scan of the transmit queues then (IIRC) a lot of small packets
going through one queue can starve queues sending big packets.
Certainly tg3 has multiple tx queues disabled.

There is an associated problem with drivers that force the
number of transmit and receive rings (or whatever) to be the same.
The receive processing is far more expensive than transmit
(it is also much more critical - it doesn't really matter if
transmits get slightly delayed, but dropping rx packets is a PITA).
If threaded NAPI is enabled (to avoid issues with softint
processing) then you don't really need lots of threads for
transmit queues, but do need ones for the rx queues.
I had to use threaded NAPI with the threads running under
the RT scheduler to avoid packet loss (at 500,000 pkts/sec).

With tg3 four rx queues and one tx worked fine.

	David (not Dave)

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
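[Editor's note] The threaded-NAPI setup David describes is switched per device through sysfs (available since kernel 5.12), after which the polling threads can be given an RT policy like any other task. A sketch, assuming a hypothetical interface eth0 and an arbitrary FIFO priority of 50:

```shell
# Move NAPI polling for eth0 from softirq context to per-queue kernel threads
echo 1 > /sys/class/net/eth0/threaded

# The poll threads appear as napi/<dev>-<id>; give them SCHED_FIFO
# so rx processing is not starved, as David describes
for pid in $(pgrep -f 'napi/eth0'); do
    chrt -f -p 50 "$pid"
done
```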

Patch

diff --git a/drivers/net/ethernet/ibm/ibmveth.h b/drivers/net/ethernet/ibm/ibmveth.h
index 4f8357187292..6b5faf1feb0b 100644
--- a/drivers/net/ethernet/ibm/ibmveth.h
+++ b/drivers/net/ethernet/ibm/ibmveth.h
@@ -99,7 +99,7 @@  static inline long h_illan_attributes(unsigned long unit_address,
 #define IBMVETH_FILT_LIST_SIZE 4096
 #define IBMVETH_MAX_BUF_SIZE (1024 * 128)
 #define IBMVETH_MAX_TX_BUF_SIZE (1024 * 64)
-#define IBMVETH_MAX_QUEUES 16U
+#define IBMVETH_MAX_QUEUES 8U
 
 static int pool_size[] = { 512, 1024 * 2, 1024 * 16, 1024 * 32, 1024 * 64 };
 static int pool_count[] = { 256, 512, 256, 256, 256 };