[net] vmxnet3: use gro callback when UPT is enabled

Message ID 20230308222504.25675-1-doshir@vmware.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Series [net] vmxnet3: use gro callback when UPT is enabled

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 34 this patch: 34
netdev/cc_maintainers success CCed 8 of 8 maintainers
netdev/build_clang success Errors and warnings before: 18 this patch: 18
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 34 this patch: 34
netdev/checkpatch warning WARNING: line length of 99 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Ronak Doshi March 8, 2023, 10:25 p.m. UTC
Currently, vmxnet3 uses the GRO callback only if LRO is disabled. However,
on SmartNIC-based setups where UPT is supported, LRO can be enabled
from the guest VM, but the UPT device does not support LRO as of now. In such
cases, performance can degrade because GRO is not being done.

This patch fixes the issue by calling the GRO API when UPT is enabled. We
use updateRxProd to determine whether UPT mode is active.

Cc: stable@vger.kernel.org
Fixes: 6f91f4ba046e ("vmxnet3: add support for capability registers")
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
---
 drivers/net/vmxnet3/vmxnet3_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Yunsheng Lin March 9, 2023, 12:34 a.m. UTC | #1
On 2023/3/9 6:25, Ronak Doshi wrote:
> Currently, vmxnet3 uses the GRO callback only if LRO is disabled. However,
> on SmartNIC-based setups where UPT is supported, LRO can be enabled
> from the guest VM, but the UPT device does not support LRO as of now. In such
> cases, performance can degrade because GRO is not being done.
> 
> This patch fixes the issue by calling the GRO API when UPT is enabled. We
> use updateRxProd to determine whether UPT mode is active.
> 
> Cc: stable@vger.kernel.org
> Fixes: 6f91f4ba046e ("vmxnet3: add support for capability registers")
> Signed-off-by: Ronak Doshi <doshir@vmware.com>
> Acked-by: Guolin Yang <gyang@vmware.com>
> ---
>  drivers/net/vmxnet3/vmxnet3_drv.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
> index 682987040ea8..8f7ac7d85afc 100644
> --- a/drivers/net/vmxnet3/vmxnet3_drv.c
> +++ b/drivers/net/vmxnet3/vmxnet3_drv.c
> @@ -1688,7 +1688,8 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
>  			if (unlikely(rcd->ts))
>  				__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), rcd->tci);
>  
> -			if (adapter->netdev->features & NETIF_F_LRO)
> +			/* Use GRO callback if UPT is enabled */
> +			if ((adapter->netdev->features & NETIF_F_LRO) && !rq->shared->updateRxProd)

If the UPT device does not support LRO, why not just clear NETIF_F_LRO from
adapter->netdev->features?

With the above change, it seems that LRO is supported from the user's POV, but GRO
is actually being done.

Also, if NETIF_F_LRO is set, do we need to clear the NETIF_F_GRO bit so that
there is no confusion for the user?

>  				netif_receive_skb(skb);
>  			else
>  				napi_gro_receive(&rq->napi, skb);
>
Ronak Doshi March 9, 2023, 10:50 p.m. UTC | #2
> On 3/8/23, 4:34 PM, "Yunsheng Lin" <linyunsheng@huawei.com> wrote:
>
> > - if (adapter->netdev->features & NETIF_F_LRO)
> > + /* Use GRO callback if UPT is enabled */
> > + if ((adapter->netdev->features & NETIF_F_LRO) && !rq->shared->updateRxProd)
>
> If the UPT device does not support LRO, why not just clear NETIF_F_LRO from
> adapter->netdev->features?
>
> With the above change, it seems that LRO is supported from the user's POV, but GRO
> is actually being done.
>
> Also, if NETIF_F_LRO is set, do we need to clear the NETIF_F_GRO bit so that
> there is no confusion for the user?

We cannot clear the LRO bit, as the virtual nic can run in either emulation or UPT mode.
When the vnic switches between UPT and emulation mode, the guest VM is not
notified. Hence, we use updateRxProd, which is shared in the datapath, to check which mode
is active.

Also, we plan to add an event to notify the guest about this, but that is for a separate patch
and may take some time.

Thanks, 
Ronak
Yunsheng Lin March 10, 2023, 1:02 a.m. UTC | #3
On 2023/3/10 6:50, Ronak Doshi wrote:
> 
>> On 3/8/23, 4:34 PM, "Yunsheng Lin" <linyunsheng@huawei.com> wrote:
>>>
>>> - if (adapter->netdev->features & NETIF_F_LRO)
>>> + /* Use GRO callback if UPT is enabled */
>>> + if ((adapter->netdev->features & NETIF_F_LRO) && !rq->shared->updateRxProd)
>>>
>>>
>> If UPT devicve does not support LRO, why not just clear the NETIF_F_LRO from
>> adapter->netdev->features?
>>
>>
>> With above change, it seems that LRO is supported for user' POV, but the GRO
>> is actually being done.
>>
>>
>> Also, if NETIF_F_LRO is set, do we need to clear the NETIF_F_GRO bit, so that
>> there is no confusion for user?
> 
> We cannot clear LRO bit as the virtual nic can run in either emulation or UPT mode.
> When the vnic switches the mode between UPT and emulation, the guest vm is not
> notified. Hence, we use updateRxProd which is shared in datapath to check what mode
> is being run.

So it is a runtime thing? What happens if some LRO'ed packet is put in the rx queue
and the vnic then switches to UPT mode? Is it OK for those LRO'ed packets to go through
the software GSO processing? If yes, why not just call napi_gro_receive() for the LRO case too?

Looking closer, it seems the vnic is implementing hw GRO from the driver's view, as the driver is
setting skb_shinfo(skb)->gso_* accordingly:

https://elixir.bootlin.com/linux/latest/source/drivers/net/vmxnet3/vmxnet3_drv.c#L1665

In that case, you may call napi_gro_receive() for those GRO'ed skb too, see:

https://lore.kernel.org/netdev/166479721495.20474.5436625882203781290.git-patchwork-notify@kernel.org/T/

> 
> Also, we plan to add an event to notify the guest about this but that is for separate patch
> and may take some time.
> 
> Thanks, 
> Ronak 
>
Ronak Doshi March 14, 2023, 9:09 p.m. UTC | #4
> On 3/9/23, 5:02 PM, "Yunsheng Lin" <linyunsheng@huawei.com> wrote:
>
> So it is a run time thing? What happens if some LRO'ed packet is put in the rx queue,
> and the the vnic switches the mode to UPT, is it ok for those LRO'ed packets to go through
> the software GSO processing?
Yes, it should be fine.

> If yes, why not just call napi_gro_receive() for LRO case too?
>
We did perf measurements in the past, and it turned out that this results in a perf penalty.
See https://patchwork.ozlabs.org/project/netdev/patch/1308947605-4300-1-git-send-email-jesse@nicira.com/

In fact, we recently did some perf measurements internally on RHEL 9.0, and they still showed some penalty.

> Looking closer, it seems vnic is implementing hw GRO from driver' view, as the driver is
> setting skb_shinfo(skb)->gso_* accordingly:
>
>
> https://elixir.bootlin.com/linux/latest/source/drivers/net/vmxnet3/vmxnet3_drv.c#L1665
>
>
> In that case, you may call napi_gro_receive() for those GRO'ed skb too, see:
>

I see. It seems this was added recently. This will need re-evaluation by the team based on ToT Linux.
That can be done in the near future, but as it might take time, for now this patch should be applied, since the
UPT patches are already upstreamed.

Thanks, 
Ronak
Yunsheng Lin March 15, 2023, 1:51 a.m. UTC | #5
On 2023/3/15 5:09, Ronak Doshi wrote:
> 
>> On 3/9/23, 5:02 PM, "Yunsheng Lin" <linyunsheng@huawei.com> wrote:
>>
>> So it is a run time thing? What happens if some LRO'ed packet is put in the rx queue,
>> and the the vnic switches the mode to UPT, is it ok for those LRO'ed packets to go through
>> the software GSO processing?
> Yes, it should be fine.
> 
>> If yes, why not just call napi_gro_receive() for LRO case too?
>>
> We had done perf measurements in the past and it turned out this results in perf penalty.
> See https://patchwork.ozlabs.org/project/netdev/patch/1308947605-4300-1-git-send-email-jesse@nicira.com/
> 
> In fact, internally recently we did some perf measurements on RHEL 9.0, and it still showed some penalty.

Does clearing NETIF_F_GRO in netdev->features bring back the performance?
If not, maybe there is something that needs investigating.

> 
>> Looking closer, it seems vnic is implementing hw GRO from driver' view, as the driver is
>> setting skb_shinfo(skb)->gso_* accordingly:
>>
>>
>> https://elixir.bootlin.com/linux/latest/source/drivers/net/vmxnet3/vmxnet3_drv.c#L1665
>>
>>
>> In that case, you may call napi_gro_receive() for those GRO'ed skb too, see:
>>
> 
> I see. Seems this got added recently. This will need re-evaluation by the team based on ToT Linux.
> But this can be done in near future and as this might take time, for now this patch should be applied as
> UPT patches are already up-streamed.

Checking rq->shared->updateRxProd in the driver to decide whether GRO is allowed does not seem right to
me, as the netstack already checks NETIF_F_GRO in netif_elide_gro().

Does clearing NETIF_F_GRO in netdev->features during driver init work for your
case?

netdev->hw_features is for the driver to advertise the hw's capabilities. The driver
can enable or disable a specific capability by setting netdev->features during driver
init, and the user can enable or disable that capability later using ethtool if
needed.

> 
> Thanks, 
> Ronak 
> 
>
Ronak Doshi March 15, 2023, 2:27 a.m. UTC | #6
> On 3/14/23, 6:52 PM, "Yunsheng Lin" <linyunsheng@huawei.com> wrote:
>
> Does clearing the NETIF_F_GRO for netdev->features bring back the performance?
> If no, maybe there is something need investigating.

Yes, it does. Simply using netif_receive_skb works fine.

> Checking rq->shared->updateRxProd in the driver to decide if gro is allow does not seems right to
> me, as the netstack has used the NETIF_F_GRO checking in netif_elide_gro().
>
updateRxProd is NOT being used to determine whether GRO is allowed. It is being used to indicate that UPT is
active, so the driver should just use the GRO callback. This is as good as having only the GRO callback for the
UPT driver, which you were suggesting earlier.

> Does clearing NETIF_F_GRO for netdev->features during the driver init process works for your
> case?

No, this does not work, as UPT mode can be enabled/disabled at runtime without the guest being informed.
This is a para-virtualized driver and does not know whether the guest is running in emulation or UPT mode.

> As netdev->hw_features is for the driver to advertise the hw's capability, and the driver
> can enable/disable specific capability by setting netdev->features during the driver init
> process, and user can get to enable/disable specific capability using ethtool later if user
> need to.

As I mentioned above, the guest is not informed at runtime about UPT status. So we need this
mechanism to avoid the performance penalty.

Thanks,
Ronak
Yunsheng Lin March 15, 2023, 3:04 a.m. UTC | #7
On 2023/3/15 10:27, Ronak Doshi wrote:
> 
>> On 3/14/23, 6:52 PM, "Yunsheng Lin" <linyunsheng@huawei.com> wrote:
>>
>> Does clearing the NETIF_F_GRO for netdev->features bring back the performance?
>> If no, maybe there is something need investigating.
> 
> Yes, it does. Simply using netif_receive_skb works fine.
> 
>> Checking rq->shared->updateRxProd in the driver to decide if gro is allow does not seems right to
>> me, as the netstack has used the NETIF_F_GRO checking in netif_elide_gro().
>>
> updateRxProd is NOT being used to determine if GRO is allowed. It is being used to indicate UPT is
> active, so the driver should just use GRO callback. This is as good as having only GRO callback for UPT driver
> which you were suggesting earlier.
> 
>> Does clearing NETIF_F_GRO for netdev->features during the driver init process works for your
>> case?
> 
> No this does not work as UPT mode can be enabled/disabled at runtime without guest being informed.
> This is para-virtualized driver and does not know if the guest is being run in emulation or UPT.

I think checking updateRxProd in some way means that the para-virtualized driver does need to
know whether the guest is running in emulation or UPT mode.

I am not sure yet how we can handle hw capabilities changing at runtime; that is why
I suggested setting the hw capability during driver init, so that the user can then enable
or disable GRO if needed.

Suppose the user enables software GRO using ethtool; disabling GRO through some runtime
check seems to go against the will of the user.

Also, if you are able to "add an event to notify the guest about this", I suppose the
para-virtualized driver will clear the specific bit in netdev->hw_features and
netdev->features when handling the event? Does the user need to be notified about this? Would
the user be confused by this change without a notification?

IMHO, being a para-virtualized driver does not make any difference; users do not care whether
they are configuring a netdev behind a para-virtualized driver or not.

> 
>> As netdev->hw_features is for the driver to advertise the hw's capability, and the driver
>> can enable/disable specific capability by setting netdev->features during the driver init
>> process, and user can get to enable/disable specific capability using ethtool later if user
>> need to.
> 
> As I mentioned above, guest is not informed at runtime about UPT status. So, we need this
> mechanism to avoid performance penalty.
> 
> Thanks,
> Ronak
> 
> 
>
Ronak Doshi March 15, 2023, 11:44 p.m. UTC | #8
> On 3/14/23, 8:05 PM, "Yunsheng Lin" <linyunsheng@huawei.com> wrote:
>
> I am not sure how we can handle the runtime hw capability changing thing yet, that is why
> I suggested setting the hw capability during the driver init process, then user can enable
> or disable GRO if need to.
>
It is not about enabling or disabling LRO/GRO. It is about which callback is used to
deliver the packets to the stack.

During init, the vnic will always come up in emulation (non-UPT) mode, and the user can request
whichever feature they want (LRO, GRO, or both). If it is in UPT mode, since we know the UPT device
does not support LRO, we use the GRO API to deliver. If GRO is disabled by the user, then it can still
take the normal path. If in emulation (non-UPT) mode, ESXi will perform LRO.

> Suppose user enable the software GRO using ethtool, disabling the GRO through some runtime
> checking seems against the will of the user.
>
We are not disabling GRO here; either we perform LRO on ESXi or GRO in the guest stack.


> Also, if you are able to "add an event to notify the guest about this", I suppose the
> para-virtualized driver will clear the specific bit in netdev->hw_features and
> netdev->features when handling the event? does user need to be notified about this, does
> user get confusion about this change without notification?
>
We won’t be changing any feature bits. It is just to let the driver know that UPT is active and that it
should use the GRO path instead of relying on ESXi LRO.

Thanks,
Ronak
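
The behaviour described in the message above can be summarised in a small standalone C sketch. It is only a model for readers of the thread, not driver code: SKETCH_F_LRO, SKETCH_F_GRO, enum aggregator and who_aggregates() are hypothetical names, the bit values are arbitrary rather than the kernel's NETIF_F_* values, and the "GRO disabled" branch assumes, as stated in the thread, that the stack passes packets through unmerged in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for NETIF_F_LRO / NETIF_F_GRO (values are arbitrary). */
#define SKETCH_F_LRO (UINT64_C(1) << 0)
#define SKETCH_F_GRO (UINT64_C(1) << 1)

enum aggregator {
	AGG_NONE,      /* packets reach the stack uncoalesced */
	AGG_ESXI_LRO,  /* hypervisor coalesces; driver uses netif_receive_skb() */
	AGG_GUEST_GRO, /* guest stack coalesces via napi_gro_receive() */
};

/* Model of who coalesces received packets for a given configuration. */
enum aggregator who_aggregates(uint64_t features, bool upt_active)
{
	/* Emulation mode with LRO requested: ESXi performs LRO. */
	if ((features & SKETCH_F_LRO) && !upt_active)
		return AGG_ESXI_LRO;

	/* Otherwise the driver hands packets to the GRO entry point ... */
	if (features & SKETCH_F_GRO)
		return AGG_GUEST_GRO;

	/* ... which leaves them unmerged when the user has disabled GRO. */
	return AGG_NONE;
}
```

With both bits set, the model returns AGG_ESXI_LRO in emulation mode and AGG_GUEST_GRO in UPT mode, which is the switch the patch is meant to introduce.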
Yunsheng Lin March 16, 2023, 1:47 a.m. UTC | #9
On 2023/3/16 7:44, Ronak Doshi wrote:
> 
>> On 3/14/23, 8:05 PM, "Yunsheng Lin" <linyunsheng@huawei.com> wrote:
>>
>> I am not sure how we can handle the runtime hw capability changing thing yet, that is why
>> I suggested setting the hw capability during the driver init process, then user can enable
>> or disable GRO if need to.
>>
> It is not about enabling or disabling the LRO/GRO. It is about which callback to be used to
> deliver the packets to the stack.

That's the point I am trying to make.
If I understand it correctly, you cannot change the callback from napi_gro_receive() to
netif_receive_skb() when netdev->features has the NETIF_F_GRO bit set.

The NETIF_F_GRO bit in netdev->features tells the user that the netstack will perform
software GRO processing if the packet can be GRO'ed.

Calling netif_receive_skb() with the NETIF_F_GRO bit set in netdev->features will cause
confusion for the user, IMHO.

> 
> During init, the vnic will always come up in emulation (non-UPT) mode and user can request 
> whichever feature they want (lro or gro or both). If it is in UPT mode, as we know UPT device
> does not support LRO, we use gro API to deliver. If GRO is disabled by the user, then it can still
> take the normal path. If in emulation (non-UPT) mode, ESXi will perform LRO.
> 
>> Suppose user enable the software GRO using ethtool, disabling the GRO through some runtime
>> checking seems against the will of the user.
>>
> We are not disabling GRO here, it's either we perform LRO on ESXi or GRO in guest stack.

I mean software GRO performed by the netstack.
There are the NETIF_F_GRO_HW and NETIF_F_LRO bits for GRO and LRO performed by hw. LRO on ESXi
is like a hw offload in the eyes of the driver in the guest, even if it is processed by some
software in ESXi.

> 
> 
>> Also, if you are able to "add an event to notify the guest about this", I suppose the
>> para-virtualized driver will clear the specific bit in netdev->hw_features and
>> netdev->features when handling the event? does user need to be notified about this, does
>> user get confusion about this change without notification?
>>
> We won’t be changing any feature bits. It is just to let know the driver that UPT is active and it
> should use GRO path instead of relying on ESXi LRO.

As above, there are different feature bits for that: NETIF_F_LRO, NETIF_F_GRO and
NETIF_F_GRO_HW.
IMHO, deciding which callback to use based on some driver configuration,
without coordinating with the above feature bits, does not seem right to me.

> 
> Thanks,
> Ronak
>
Ronak Doshi March 16, 2023, 4:03 a.m. UTC | #10
> On 3/15/23, 6:47 PM, "Yunsheng Lin" <linyunsheng@huawei.com> wrote:
>
> That's the piont I am trying to make.
> If I understand it correctly, you can not change callback from napi_gro_receive() to
> netif_receive_skb() when netdev->features has the NETIF_F_GRO bit set.
Where are we doing this? Our preference is to use netif_receive_skb() only when LRO is enabled.
If both LRO and GRO are enabled on the vnic, which API should be used?

> NETIF_F_GRO bit in netdev->features is to tell user that netstack will perform the
> software GRO processing if the packet can be GRO'ed.
Even if the packet is already LRO'ed?

> Calling netif_receive_skb() with NETIF_F_GRO bit set in netdev->features will cause
> confusion for user, IMHO.
As long as LRO is enabled and performed by ESXi (which it will do), I don’t think the user cares about GRO.
Even if we use napi_gro_receive() in that case, it degrades performance, as unnecessary cycles
are spent on an already LRO'ed packet.


> As above, there is different feature bit for that, NETIF_F_LRO, NETIF_F_GRO and
> NETIF_F_GRO_HW.
> IMHO, deciding which callback to be used depending on some driver configuation
> without corporation with the above feature bits does not seems right to me.

We are not neglecting the feature bits. We just know that in UPT mode LRO won’t be done, so we
use the napi_gro_receive() callback by default.

Thanks, 
Ronak
Jakub Kicinski March 16, 2023, 4:13 a.m. UTC | #11
On Thu, 16 Mar 2023 04:03:52 +0000 Ronak Doshi wrote:
> > Calling netif_receive_skb() with NETIF_F_GRO bit set in netdev->features will cause
> > confusion for user, IMHO.  
> As long as LRO is enabled and performed by ESXi (which it will do), I don’t think user cares for GRO.
> Even if we use napi_gro_receive() for such case, it degrades the performance as unnecessary cycles
> are spend on an already LRO'ed packet.

Can you provide some numbers to illustrate what the slow down is?
Ronak Doshi March 16, 2023, 5:21 a.m. UTC | #12
> On 3/15/23, 9:13 PM, "Jakub Kicinski" <kuba@kernel.org> wrote:
> 
> Can you provide some numbers to illustrate what the slow down is?

Below are some sample test numbers collected by our perf team.
Test                                  socket & msg size             base        using only gro  change
1VM  14vcpu UDP stream receive        256K 256 bytes (packets/sec)  217.01 Kps  187.98 Kps      -13.37%
16VM 2vcpu  TCP stream send Thpt      8K   256 bytes (Gbps)         18.00 Gbps  17.02 Gbps      -5.44%
1VM  14vcpu ResponseTimeMean Receive  (in micro secs)               163 us      170 us          -4.29%

A similar test was also done in the past. See
https://patchwork.ozlabs.org/project/netdev/patch/1308947605-4300-1-git-send-email-jesse@nicira.com/

Unfortunately, there are no stats in that discussion.

Thanks,
Ronak
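
For readers who want to check the change column, the percentages can be recomputed from the base and "using only gro" values. The helper below is a hypothetical sketch (relative_change_pct() is not part of the thread or of any driver). Note that for the response-time row a higher value is worse, so the computed change is positive even though the table lists it with a minus sign, presumably to mark it as a regression.

```c
#include <assert.h>

/* Relative change of candidate versus base, in percent. */
double relative_change_pct(double base, double candidate)
{
	assert(base != 0.0); /* a zero baseline has no meaningful ratio */
	return (candidate - base) / base * 100.0;
}
```

Applied to the three rows, this gives roughly -13.4% for UDP receive, -5.4% for TCP send throughput, and +4.3% for mean response time, in agreement with the posted figures to within rounding.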
Jakub Kicinski March 16, 2023, 8:34 p.m. UTC | #13
On Thu, 16 Mar 2023 05:21:42 +0000 Ronak Doshi wrote:
> Below are some sample test numbers collected by our perf team.
> Test                                  socket & msg size             base        using only gro  change
> 1VM  14vcpu UDP stream receive        256K 256 bytes (packets/sec)  217.01 Kps  187.98 Kps      -13.37%
> 16VM 2vcpu  TCP stream send Thpt      8K   256 bytes (Gbps)         18.00 Gbps  17.02 Gbps      -5.44%
> 1VM  14vcpu ResponseTimeMean Receive  (in micro secs)               163 us      170 us          -4.29%

A bit more than I suspected, thanks for the data.
Yunsheng Lin March 17, 2023, 2:37 a.m. UTC | #14
On 2023/3/17 4:34, Jakub Kicinski wrote:
> On Thu, 16 Mar 2023 05:21:42 +0000 Ronak Doshi wrote:
>> Below are some sample test numbers collected by our perf team.
>> Test                                  socket & msg size             base        using only gro  change
>> 1VM  14vcpu UDP stream receive        256K 256 bytes (packets/sec)  217.01 Kps  187.98 Kps      -13.37%
>> 16VM 2vcpu  TCP stream send Thpt      8K   256 bytes (Gbps)         18.00 Gbps  17.02 Gbps      -5.44%
>> 1VM  14vcpu ResponseTimeMean Receive  (in micro secs)               163 us      170 us          -4.29%
> 
> A bit more than I suspected, thanks for the data.

Maybe we should first investigate why the performance loss is larger than
expected.

For example, is an LRO'ed skb added to gro_list->list, and does a new LRO'ed skb from
the same flow then go through the whole GSO processing only to find out that we have to
flush the old LRO'ed skb from gro_list->list and add the new LRO'ed skb to gro_list->list
again?


> .
>
Ronak Doshi March 17, 2023, 8:27 p.m. UTC | #15
> On 3/16/23, 7:37 PM, "Yunsheng Lin" <linyunsheng@huawei.com> wrote:
> > On 2023/3/17 4:34, Jakub Kicinski wrote:
> > On Thu, 16 Mar 2023 05:21:42 +0000 Ronak Doshi wrote:
> >> Below are some sample test numbers collected by our perf team.
> >> Test                                  socket & msg size             base        using only gro  change
> >> 1VM  14vcpu UDP stream receive        256K 256 bytes (packets/sec)  217.01 Kps  187.98 Kps      -13.37%
> >> 16VM 2vcpu  TCP stream send Thpt      8K   256 bytes (Gbps)         18.00 Gbps  17.02 Gbps      -5.44%
> >> 1VM  14vcpu ResponseTimeMean Receive  (in micro secs)               163 us      170 us          -4.29%
> >
> > A bit more than I suspected, thanks for the data.
>
> Maybe we do some investigation to find out why the performace lost is more than
> suspected first.
>

I don’t think holding this patch to investigate why it takes longer in GRO is worthwhile.
That is a separate issue. UPT patches are already upstreamed to Linux and cross-ported to
relevant distros for customers to use. We need to apply this patch to avoid the performance
degradation in UPT mode as LRO is not available on UPT device.

I don’t see a functional issue with this patch. In UPT mode, as LRO is not available, the driver needs to use GRO.

Thanks, 
Ronak
Jakub Kicinski March 18, 2023, 2:43 a.m. UTC | #16
On Fri, 17 Mar 2023 20:27:50 +0000 Ronak Doshi wrote:
> I don’t think holding this patch to investigate why it takes longer in GRO is worthwhile.
> That is a separate issue. UPT patches are already upstreamed to Linux and cross-ported to
> relevant distros for customers to use. We need to apply this patch to avoid the performance
> degradation in UPT mode as LRO is not available on UPT device.
> 
> I don’t see a functional issue with this patch. In UPT as LRO is not available, it needs to use GRO.

Fine by me, FWIW, but please respin the patch and feed some of 
the discussion into the commit message.
Patch

diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index 682987040ea8..8f7ac7d85afc 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -1688,7 +1688,8 @@  vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
 			if (unlikely(rcd->ts))
 				__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), rcd->tci);
 
-			if (adapter->netdev->features & NETIF_F_LRO)
+			/* Use GRO callback if UPT is enabled */
+			if ((adapter->netdev->features & NETIF_F_LRO) && !rq->shared->updateRxProd)
 				netif_receive_skb(skb);
 			else
 				napi_gro_receive(&rq->napi, skb);
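
The condition introduced by the hunk above can be read as a two-input decision. The standalone C sketch below models it outside the kernel so the four cases are easy to enumerate. It is an illustration under stated assumptions, not the driver's code: SKETCH_F_LRO, enum rx_path and pick_rx_path() are hypothetical names, and update_rx_prod stands in for rq->shared->updateRxProd, which the commit message uses as the indicator that UPT mode is active.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's NETIF_F_LRO bit (value is arbitrary). */
#define SKETCH_F_LRO (UINT64_C(1) << 0)

enum rx_path {
	RX_PATH_NETIF_RECEIVE_SKB, /* plain delivery, relies on LRO done by the device */
	RX_PATH_NAPI_GRO_RECEIVE,  /* delivery through the GRO callback */
};

/* Mirrors the patched condition: LRO requested and UPT not active. */
enum rx_path pick_rx_path(uint64_t features, bool update_rx_prod)
{
	if ((features & SKETCH_F_LRO) && !update_rx_prod)
		return RX_PATH_NETIF_RECEIVE_SKB;
	return RX_PATH_NAPI_GRO_RECEIVE;
}
```

Only the combination "LRO set, UPT inactive" selects netif_receive_skb(); before the patch, the LRO bit alone selected it, regardless of UPT state.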