diff mbox series

[v2,net] ibmveth: Reduce maximum tx queues to 8

Message ID 20221102183837.157966-1-nnac123@linux.ibm.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Series [v2,net] ibmveth: Reduce maximum tx queues to 8

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present fail Series targets non-next tree, but doesn't contain any Fixes tags
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 8 maintainers not CCed: linuxppc-dev@lists.ozlabs.org kuba@kernel.org mpe@ellerman.id.au davem@davemloft.net npiggin@gmail.com christophe.leroy@csgroup.eu edumazet@google.com pabeni@redhat.com
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning WARNING: 'Taht' may be misspelled - perhaps 'That'? WARNING: 'taht' may be misspelled - perhaps 'that'?
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Nick Child Nov. 2, 2022, 6:38 p.m. UTC
Previously, the maximum number of transmit queues allowed was 16. Due to
resource concerns, limit to 8 queues instead.

Since the driver is virtualized away from the physical NIC, the purpose
of multiple queues is purely to allow for parallel calls to the
hypervisor. Therefore, there is no noticeable effect on performance by
reducing queue count to 8.

Reported-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
---
Relevant links:
 - Introduce multiple tx queues (accepted in v6.1):
   https://lore.kernel.org/netdev/20220928214350.29795-2-nnac123@linux.ibm.com/
 - Resource concerns with 16 queues:
   https://lore.kernel.org/netdev/CAA93jw5reJmaOvt9vw15C1fo1AN7q5jVKzUocbAoNDC-cpi=KQ@mail.gmail.com/
 - v1 (only change is commit message length and typo in Reported-by tag):
   https://lore.kernel.org/netdev/20221102153040.149244-1-nnac123@linux.ibm.com/
 drivers/net/ethernet/ibm/ibmveth.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jakub Kicinski Nov. 4, 2022, 3:59 a.m. UTC | #1
On Wed,  2 Nov 2022 13:38:37 -0500 Nick Child wrote:
> Previously, the maximum number of transmit queues allowed was 16. Due to
> resource concerns, limit to 8 queues instead.
> 
> Since the driver is virtualized away from the physical NIC, the purpose
> of multiple queues is purely to allow for parallel calls to the
> hypervisor. Therefore, there is no noticeable effect on performance by
> reducing queue count to 8.

I'm not sure if that's the point Dave was making but we should be
influencing the default, not the MAX. Why limit the MAX?
Nick Child Nov. 4, 2022, 2:06 p.m. UTC | #2
On 11/3/22 22:59, Jakub Kicinski wrote:
> On Wed,  2 Nov 2022 13:38:37 -0500 Nick Child wrote:
>> Previously, the maximum number of transmit queues allowed was 16. Due to
>> resource concerns, limit to 8 queues instead.
>>
>> Since the driver is virtualized away from the physical NIC, the purpose
>> of multiple queues is purely to allow for parallel calls to the
>> hypervisor. Therefore, there is no noticeable effect on performance by
>> reducing queue count to 8.
> 
> I'm not sure if that's the point Dave was making but we should be
> influencing the default, not the MAX. Why limit the MAX?

The MAX is always allocated in the driver's probe function. In the
driver's open and ethtool set-channels functions we set
real_num_tx_queues. So the number of allocated queues is always MAX,
but the number of queues actually in use may differ and can be set by
the user.
I hope this explains; otherwise, please let me know.

Thanks again for taking the time to review,
Nick Child
Jakub Kicinski Nov. 4, 2022, 5:59 p.m. UTC | #3
On Fri, 4 Nov 2022 09:06:02 -0500 Nick Child wrote:
> On 11/3/22 22:59, Jakub Kicinski wrote:
> > On Wed,  2 Nov 2022 13:38:37 -0500 Nick Child wrote:  
> >> Previously, the maximum number of transmit queues allowed was 16. Due to
> >> resource concerns, limit to 8 queues instead.
> >>
> >> Since the driver is virtualized away from the physical NIC, the purpose
> >> of multiple queues is purely to allow for parallel calls to the
> >> hypervisor. Therefore, there is no noticeable effect on performance by
> >> reducing queue count to 8.  
> > 
> > I'm not sure if that's the point Dave was making but we should be
> > influencing the default, not the MAX. Why limit the MAX?  
> 
> The MAX is always allocated in the drivers probe function. In the 
> drivers open and ethtool-set-channels functions we set 
> real_num_tx_queues. So the number of allocated queues is always MAX
> but the number of queues actually in use may differ and can be set by 
> the user.
> I hope this explains. Otherwise, please let me know.

Perhaps I don't understand the worry. Is allowing 16 queues a problem
because it limits how many instances the hypervisor can support?
Or is the concern coming from your recent work on BQL and having many
queues exacerbating buffer bloat?
Nick Child Nov. 4, 2022, 6:15 p.m. UTC | #4
On 11/4/22 12:59, Jakub Kicinski wrote:
> On Fri, 4 Nov 2022 09:06:02 -0500 Nick Child wrote:
>> On 11/3/22 22:59, Jakub Kicinski wrote:
>>> On Wed,  2 Nov 2022 13:38:37 -0500 Nick Child wrote:
>>>> Previously, the maximum number of transmit queues allowed was 16. Due to
>>>> resource concerns, limit to 8 queues instead.
>>>>
>>>> Since the driver is virtualized away from the physical NIC, the purpose
>>>> of multiple queues is purely to allow for parallel calls to the
>>>> hypervisor. Therefore, there is no noticeable effect on performance by
>>>> reducing queue count to 8.
>>>
>>> I'm not sure if that's the point Dave was making but we should be
>>> influencing the default, not the MAX. Why limit the MAX?
>>
>> The MAX is always allocated in the drivers probe function. In the
>> drivers open and ethtool-set-channels functions we set
>> real_num_tx_queues. So the number of allocated queues is always MAX
>> but the number of queues actually in use may differ and can be set by
>> the user.
>> I hope this explains. Otherwise, please let me know.
> 
> Perhaps I don't understand the worry. Is allowing 16 queues a problem
> because it limits how many instances the hypervisor can support?

No, the hypervisor is unaware of the number of netdev queues. The reason
for adding more netdev queues in the first place is to allow the higher
networking layers to make parallel calls to the driver's xmit function,
which the hypervisor can handle.

> Or is the concern coming from your recent work on BQL and having many
> queues exacerbating buffer bloat?

Yes, and Dave can jump in here if I am wrong, but, from my 
understanding, if the NIC cannot send packets at the rate that
they are queued then these queues will inevitably fill to txqueuelen.
In this case, having more queues will not mean better throughput but
will result in a large number of allocations sitting in queues 
(bufferbloat). I believe Dave's point was: if more queues do not
allow for better performance (and can risk bufferbloat), then why
have so many at all?

After going through testing and seeing no difference in performance
with 8 vs 16 queues, I would rather not have the driver be a culprit
of potential resource hogging.
Jakub Kicinski Nov. 4, 2022, 6:39 p.m. UTC | #5
On Fri, 4 Nov 2022 13:15:39 -0500 Nick Child wrote:
> > Or is the concern coming from your recent work on BQL and having many
> > queues exacerbating buffer bloat?  
> 
> Yes, and Dave can jump in here if I am wrong, but, from my 
> understanding, if the NIC cannot send packets at the rate that
> they are queued then these queues will inevitably fill to txqueuelen.
> In this case, having more queues will not mean better throughput but
> will result in a large number of allocations sitting in queues 
> (bufferbloat). I believe Dave's point was, if more queues does not
> allow for better performance (and can risk bufferbloat) then why
> have so many at all.
> 
> After going through testing and seeing no difference in performance
> with 8 vs 16 queues, I would rather not have the driver be a culprit
> of potential resource hogging.

Right, so my point was that users can always shoot themselves in the
foot with bad configs. You can leave the MAX at 16, in case someone
needs it, and limit the default real queues setting instead; most users
will use the default.
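[Editor's note] Jakub's suggestion maps onto the standard ethtool channels interface. Assuming a hypothetical interface name of eth0, a user can inspect the preset maximum versus the current count, and move between them without the driver reallocating anything:

```shell
# Show the preset maximums (what the driver allocated at probe)
# alongside the current, in-use counts
ethtool -l eth0

# Use only 4 of the allocated tx queues; memory for the MAX stays allocated
ethtool -L eth0 tx 4

# Scale back up, at most to the preset maximum reported by 'ethtool -l'
ethtool -L eth0 tx 16
```

Under this model, a conservative default with a generous MAX gives cautious users low bufferbloat out of the box while still letting others opt into more queues.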
David Laight Nov. 5, 2022, 11:33 a.m. UTC | #6
From: Nick Child
> Sent: 04 November 2022 18:16
...
> Yes, and Dave can jump in here if I am wrong, but, from my
> understanding, if the NIC cannot send packets at the rate that
> they are queued then these queues will inevitably fill to txqueuelen.
> In this case, having more queues will not mean better throughput but
> will result in a large number of allocations sitting in queues
> (bufferbloat). I believe Dave's point was, if more queues does not
> allow for better performance (and can risk bufferbloat) then why
> have so many at all.

Isn't there another issue (noted in the tg3 driver): if the
underlying hardware (or other consumer) is doing a round-robin
scan of the transmit queues then (IIRC) a lot of small packets
going through one queue can starve queues sending big packets.
Certainly tg3 has multiple tx queues disabled.

There is an associated problem with drivers that force the
number of transmit and receive rings (or whatever) to be the same.
The receive processing is far more expensive than transmit
(it is also much more critical - it doesn't really matter if
transmits get slightly delayed, but dropping rx packets is a PITA).
If threaded NAPI is enabled (to avoid issues with softint
processing) then you don't really need lots of threads for
transmit queues, but do need ones for the rx queues.
I had to use threaded NAPI with the threads running under
the RT scheduler to avoid packet loss (at 500,000 pkts/sec).

With tg3 four rx queues and one tx worked fine.

	David (not Dave)

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
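[Editor's note] The threaded-NAPI setup David describes is switched per device through sysfs (available since kernel 5.12), after which the polling threads can be given an RT policy like any other task. A sketch, assuming a hypothetical interface eth0 and an arbitrary FIFO priority of 50:

```shell
# Move NAPI polling for eth0 from softirq context to per-queue kernel threads
echo 1 > /sys/class/net/eth0/threaded

# The poll threads appear as napi/<dev>-<id>; give them SCHED_FIFO
# so rx processing is not starved, as David describes
for pid in $(pgrep -f 'napi/eth0'); do
    chrt -f -p 50 "$pid"
done
```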

Patch

diff --git a/drivers/net/ethernet/ibm/ibmveth.h b/drivers/net/ethernet/ibm/ibmveth.h
index 4f8357187292..6b5faf1feb0b 100644
--- a/drivers/net/ethernet/ibm/ibmveth.h
+++ b/drivers/net/ethernet/ibm/ibmveth.h
@@ -99,7 +99,7 @@  static inline long h_illan_attributes(unsigned long unit_address,
 #define IBMVETH_FILT_LIST_SIZE 4096
 #define IBMVETH_MAX_BUF_SIZE (1024 * 128)
 #define IBMVETH_MAX_TX_BUF_SIZE (1024 * 64)
-#define IBMVETH_MAX_QUEUES 16U
+#define IBMVETH_MAX_QUEUES 8U
 
 static int pool_size[] = { 512, 1024 * 2, 1024 * 16, 1024 * 32, 1024 * 64 };
 static int pool_count[] = { 256, 512, 256, 256, 256 };