
[net] hv_netvsc: Fix race between VF offering and VF association message from host

Message ID: 1664372913-26140-1-git-send-email-gauravkohli@linux.microsoft.com
State: Superseded
Delegated to: Netdev Maintainers

Checks

Context                         Check    Description
netdev/tree_selection           success  Clearly marked for net
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/subject_prefix           success  Link
netdev/cover_letter             success  Single patches do not need cover letters
netdev/patch_count              success  Link
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers           fail     1 blamed authors not CCed: davem@davemloft.net; 4 maintainers not CCed: kuba@kernel.org pabeni@redhat.com edumazet@google.com davem@davemloft.net
netdev/build_clang              success  Errors and warnings before: 0 this patch: 0
netdev/module_param             success  Was 0 now: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 0 this patch: 0
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 56 lines checked
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0

Commit Message

Gaurav Kohli Sept. 28, 2022, 1:48 p.m. UTC
During VM boot, the VF registration call may arrive before the VF
association message from the host to the VM.

This can break the netvsc VF path. To prevent this, block VF
registration until the VF bind message arrives from the host.

Cc: stable@vger.kernel.org
Fixes: 00d7ddba11436 ("hv_netvsc: pair VF based on serial number")
Signed-off-by: Gaurav Kohli <gauravkohli@linux.microsoft.com>
---
 drivers/net/hyperv/hyperv_net.h |  3 ++-
 drivers/net/hyperv/netvsc.c     |  5 +++++
 drivers/net/hyperv/netvsc_drv.c | 18 ++++++++++++++++++
 3 files changed, 25 insertions(+), 1 deletion(-)

Comments

Haiyang Zhang Sept. 29, 2022, 8:39 p.m. UTC | #1
> -----Original Message-----
> From: Gaurav Kohli <gauravkohli@linux.microsoft.com>
> Sent: Wednesday, September 28, 2022 9:49 AM
> To: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; Stephen Hemminger
> <sthemmin@microsoft.com>; wei.liu@kernel.org; Dexuan Cui
> <decui@microsoft.com>; linux-hyperv@vger.kernel.org;
> netdev@vger.kernel.org
> Subject: [PATCH net] hv_netvsc: Fix race between VF offering and VF
> association message from host
> 
> During vm boot, there might be possibility that vf registration
> call comes before the vf association from host to vm.
> 
> And this might break netvsc vf path, To prevent the same block
> vf registration until vf bind message comes from host.
> 
> Cc: stable@vger.kernel.org
> Fixes: 00d7ddba11436 ("hv_netvsc: pair VF based on serial number")
> Signed-off-by: Gaurav Kohli <gauravkohli@linux.microsoft.com>

Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>

By the way, did you use "git send-email"? I didn't see the stable@vger.kernel.org cc-ed in your original email.
Jakub Kicinski Sept. 30, 2022, 2:26 a.m. UTC | #2
On Wed, 28 Sep 2022 06:48:33 -0700 Gaurav Kohli wrote:
> During vm boot, there might be possibility that vf registration
> call comes before the vf association from host to vm.
> 
> And this might break netvsc vf path, To prevent the same block
> vf registration until vf bind message comes from host.
> 
> Cc: stable@vger.kernel.org
> Fixes: 00d7ddba11436 ("hv_netvsc: pair VF based on serial number")
> Signed-off-by: Gaurav Kohli <gauravkohli@linux.microsoft.com>

Is it possible to add a timeout or such? Waiting for an external 
event while holding rtnl lock seems a little scary.

The other question is - what protects the completion and ->vf_alloc
from races? Is there some locking? ->vf_alloc only goes from 0 to 1
and never back?
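
The timeout question above maps onto an existing kernel primitive:
wait_for_completion_timeout() takes a timeout in jiffies and returns 0 if
it expired, or the remaining jiffies otherwise. The sketch below is only a
userspace model of that behaviour built on pthreads, not driver code. The
names completion_model, model_init, model_complete and model_wait_timeout
are hypothetical, and the timeout is expressed in milliseconds purely for
illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace stand-in for the kernel's struct completion. */
struct completion_model {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	unsigned int done;	/* events signalled but not yet consumed */
};

static void model_init(struct completion_model *c)
{
	assert(c != NULL);
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = 0;
}

/* Record one event and wake one waiter, in the spirit of complete(). */
static void model_complete(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	c->done++;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/*
 * Wait for an event for at most timeout_ms milliseconds.
 * Returns true if an event was consumed, false if the wait timed out.
 */
static bool model_wait_timeout(struct completion_model *c, long timeout_ms)
{
	struct timespec deadline;
	bool ok;

	/* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	while (c->done == 0) {
		/* Non-zero means timeout (or error): stop waiting. */
		if (pthread_cond_timedwait(&c->cond, &c->lock, &deadline) != 0)
			break;
	}
	ok = c->done > 0;
	if (ok)
		c->done--;	/* consume the event */
	pthread_mutex_unlock(&c->lock);
	return ok;
}
```

A caller would treat a false return as "association not seen yet" and
decide whether to give up or retry. Which timeout value would be
appropriate is left open here.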
Haiyang Zhang Sept. 30, 2022, 1:03 p.m. UTC | #3
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Thursday, September 29, 2022 10:26 PM
> To: Gaurav Kohli <gauravkohli@linux.microsoft.com>
> Cc: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; Stephen Hemminger
> <sthemmin@microsoft.com>; wei.liu@kernel.org; Dexuan Cui
> <decui@microsoft.com>; linux-hyperv@vger.kernel.org;
> netdev@vger.kernel.org
> Subject: Re: [PATCH net] hv_netvsc: Fix race between VF offering and VF
> association message from host
> 
> On Wed, 28 Sep 2022 06:48:33 -0700 Gaurav Kohli wrote:
> > During vm boot, there might be possibility that vf registration
> > call comes before the vf association from host to vm.
> >
> > And this might break netvsc vf path, To prevent the same block
> > vf registration until vf bind message comes from host.
> >
> > Cc: stable@vger.kernel.org
> > Fixes: 00d7ddba11436 ("hv_netvsc: pair VF based on serial number")
> > Signed-off-by: Gaurav Kohli <gauravkohli@linux.microsoft.com>
> 
> Is it possible to add a timeout or such? Waiting for an external
> event while holding rtnl lock seems a little scary.

We used to have timeouts in many places in this driver. But the protocol
gives no guarantee on the host response time, so there is no sound value
to choose for the timeout. These timeouts were removed several years ago.


> The other question is - what protects the completion and ->vf_alloc
> from races? Is there some locking? ->vf_alloc only goes from 0 to 1
> and never back?

When the VF is removed, the vf_assoc message sets it back to 0 here:
        net_device_ctx->vf_alloc = nvmsg->msg.v4_msg.vf_assoc.allocated;
        net_device_ctx->vf_serial = nvmsg->msg.v4_msg.vf_assoc.serial;

Also, I think this condition can be changed from:
+	if (vf_is_up && !net_device_ctx->vf_alloc) {
to: 
+	if (vf_is_up) {
So when the VF comes up, it always waits for the completion, regardless
of vf_alloc.

Thanks,
- Haiyang
Gaurav Kohli Oct. 6, 2022, 4:13 a.m. UTC | #4
On 9/30/2022 6:33 PM, Haiyang Zhang wrote:
>
>> -----Original Message-----
>> From: Jakub Kicinski <kuba@kernel.org>
>> Sent: Thursday, September 29, 2022 10:26 PM
>> To: Gaurav Kohli <gauravkohli@linux.microsoft.com>
>> Cc: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
>> <haiyangz@microsoft.com>; Stephen Hemminger
>> <sthemmin@microsoft.com>; wei.liu@kernel.org; Dexuan Cui
>> <decui@microsoft.com>; linux-hyperv@vger.kernel.org;
>> netdev@vger.kernel.org
>> Subject: Re: [PATCH net] hv_netvsc: Fix race between VF offering and VF
>> association message from host
>>
>> On Wed, 28 Sep 2022 06:48:33 -0700 Gaurav Kohli wrote:
>>> During vm boot, there might be possibility that vf registration
>>> call comes before the vf association from host to vm.
>>>
>>> And this might break netvsc vf path, To prevent the same block
>>> vf registration until vf bind message comes from host.
>>>
>>> Cc: stable@vger.kernel.org
>>> Fixes: 00d7ddba11436 ("hv_netvsc: pair VF based on serial number")
>>> Signed-off-by: Gaurav Kohli <gauravkohli@linux.microsoft.com>
>> Is it possible to add a timeout or such? Waiting for an external
>> event while holding rtnl lock seems a little scary.
> We used to have time-out in many places of this driver. But there is
> no protocol guarantees of the host response time, so the time out value
> cannot be set. These time-outs were removed several years ago.
>
>
>> The other question is - what protects the completion and ->vf_alloc
>> from races? Is there some locking? ->vf_alloc only goes from 0 to 1
>> and never back?

Thanks for the comment. I understand your concern about vf_alloc and the
reinit_completion() part. I think we can move reinit_completion() to the
unregistration part of the VF code.

Let me send a v2 patch.

> When Vf is removed, the vf_assoc msg will set it to 0 here:
>          net_device_ctx->vf_alloc = nvmsg->msg.v4_msg.vf_assoc.allocated;
>          net_device_ctx->vf_serial = nvmsg->msg.v4_msg.vf_assoc.serial;
>
> Also, I think this condition can be changed from:
> +	if (vf_is_up && !net_device_ctx->vf_alloc) {

Thanks for the comment.

This check is needed to maintain the state machine, as the netvsc change
event can come multiple times. That is why I added the extra check, to
avoid any deadlock.

> to:
> +	if (vf_is_up) {
> So when VF comes up, it always wait for the completion without depending
> on the vf_alloc.
>
> Thanks,
> - Haiyang
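
One way to read the deadlock concern in this exchange: in the kernel, each
complete() increments the completion's done counter and each successful
wait_for_completion() decrements it, so a single association message
satisfies a single wait. The sketch below models only that counter, in
userspace. The names done_model, model_try_wait and up_event_would_block
are hypothetical, and the model deliberately ignores the driver's earlier
data_path_is_vf check, so it illustrates the semantics rather than the
exact driver behaviour.

```c
#include <stdbool.h>

/* Hypothetical model: only the 'done' counter of a completion. */
struct done_model {
	unsigned int done;
};

/* Like complete(): record one event. */
static void model_complete(struct done_model *c)
{
	c->done++;
}

/* Like reinit_completion(): forget any recorded events. */
static void model_reinit(struct done_model *c)
{
	c->done = 0;
}

/*
 * Returns true if a wait would return at once, consuming one event.
 * Returns false where a real wait_for_completion() would block.
 */
static bool model_try_wait(struct done_model *c)
{
	if (c->done == 0)
		return false;
	c->done--;
	return true;
}

/*
 * Model of a "VF is up" event.
 * guarded == true  mimics: if (vf_is_up && !vf_alloc) wait
 * guarded == false mimics: if (vf_is_up) wait
 * Returns true if this event would block waiting for the host.
 */
static bool up_event_would_block(struct done_model *c, bool vf_alloc,
				 bool guarded)
{
	if (guarded && vf_alloc)
		return false;	/* association already known: skip the wait */
	return !model_try_wait(c);
}
```

In this model, with one association message recorded, a second unguarded
"up" event finds the counter at zero and would block, while the guarded
form skips the wait whenever vf_alloc is already set.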

Patch

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 25b38a374e3c..dd5919ec408b 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -1051,7 +1051,8 @@  struct net_device_context {
 	u32 vf_alloc;
 	/* Serial number of the VF to team with */
 	u32 vf_serial;
-
+	/* completion variable to confirm vf association */
+	struct completion vf_add;
 	/* Is the current data path through the VF NIC? */
 	bool  data_path_is_vf;
 
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 8bcba6f21aa9..6531d8d9f17f 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -1582,6 +1582,11 @@  static void netvsc_send_vf(struct net_device *ndev,
 
 	net_device_ctx->vf_alloc = nvmsg->msg.v4_msg.vf_assoc.allocated;
 	net_device_ctx->vf_serial = nvmsg->msg.v4_msg.vf_assoc.serial;
+
+	if (net_device_ctx->vf_alloc)
+		complete(&net_device_ctx->vf_add);
+	else
+		reinit_completion(&net_device_ctx->vf_add);
 	netdev_info(ndev, "VF slot %u %s\n",
 		    net_device_ctx->vf_serial,
 		    net_device_ctx->vf_alloc ? "added" : "removed");
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index c27cb1267ca5..393c69430147 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2309,6 +2309,18 @@  static struct net_device *get_netvsc_byslot(const struct net_device *vf_netdev)
 
 	}
 
+	/* Fallback path to check synthetic vf with
+	 * help of mac addr
+	 */
+	list_for_each_entry(ndev_ctx, &netvsc_dev_list, list) {
+		ndev = hv_get_drvdata(ndev_ctx->device_ctx);
+		if (ether_addr_equal(vf_netdev->perm_addr, ndev->perm_addr)) {
+			netdev_notice(vf_netdev,
+				      "falling back to mac addr based matching\n");
+			return ndev;
+		}
+	}
+
 	netdev_notice(vf_netdev,
 		      "no netdev found for vf serial:%u\n", serial);
 	return NULL;
@@ -2405,6 +2417,11 @@  static int netvsc_vf_changed(struct net_device *vf_netdev, unsigned long event)
 	if (net_device_ctx->data_path_is_vf == vf_is_up)
 		return NOTIFY_OK;
 
+	if (vf_is_up && !net_device_ctx->vf_alloc) {
+		netdev_info(ndev, "Waiting for the VF association from host\n");
+		wait_for_completion(&net_device_ctx->vf_add);
+	}
+
 	ret = netvsc_switch_datapath(ndev, vf_is_up);
 
 	if (ret) {
@@ -2475,6 +2492,7 @@  static int netvsc_probe(struct hv_device *dev,
 
 	INIT_DELAYED_WORK(&net_device_ctx->dwork, netvsc_link_change);
 
+	init_completion(&net_device_ctx->vf_add);
 	spin_lock_init(&net_device_ctx->lock);
 	INIT_LIST_HEAD(&net_device_ctx->reconfig_events);
 	INIT_DELAYED_WORK(&net_device_ctx->vf_takeover, netvsc_vf_setup);
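
The get_netvsc_byslot() hunk adds a second pass that pairs the VF with a
synthetic device by permanent MAC address when the first pass finds
nothing. The userspace sketch below models that two-pass lookup. It
assumes the existing first pass matches on an allocated VF slot and its
serial number, which is not visible in the hunk context. The names
synth_model, MAC_LEN and match_synthetic are hypothetical, and memcmp()
stands in for the kernel's ether_addr_equal().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6	/* length of an Ethernet address in bytes */

/* Hypothetical model of the per-device state used for pairing. */
struct synth_model {
	bool vf_alloc;			/* host reported a VF slot */
	uint32_t vf_serial;		/* serial of the VF to pair with */
	uint8_t perm_addr[MAC_LEN];	/* permanent MAC address */
};

/*
 * Returns the index of the synthetic device to pair with the VF,
 * or -1 if neither pass finds one.
 */
static int match_synthetic(const struct synth_model *devs, size_t n,
			   uint32_t vf_serial, const uint8_t *vf_mac)
{
	size_t i;

	/* First pass (assumed): match on allocated slot and serial. */
	for (i = 0; i < n; i++)
		if (devs[i].vf_alloc && devs[i].vf_serial == vf_serial)
			return (int)i;

	/* Fallback pass: match on permanent MAC address. */
	for (i = 0; i < n; i++)
		if (memcmp(devs[i].perm_addr, vf_mac, MAC_LEN) == 0)
			return (int)i;

	return -1;
}
```

The fallback only runs when the serial pass fails, so a device whose
association message has not arrived yet can still be found by its MAC
address.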