Message ID | 1664372913-26140-1-git-send-email-gauravkohli@linux.microsoft.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | [net] hv_netvsc: Fix race between VF offering and VF association message from host |
> -----Original Message-----
> From: Gaurav Kohli <gauravkohli@linux.microsoft.com>
> Sent: Wednesday, September 28, 2022 9:49 AM
> To: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; Stephen Hemminger
> <sthemmin@microsoft.com>; wei.liu@kernel.org; Dexuan Cui
> <decui@microsoft.com>; linux-hyperv@vger.kernel.org;
> netdev@vger.kernel.org
> Subject: [PATCH net] hv_netvsc: Fix race between VF offering and VF
> association message from host
>
> During VM boot, there is a possibility that the VF registration
> call comes before the VF association from host to VM.
>
> This might break the netvsc VF path. To prevent this, block
> VF registration until the VF bind message comes from the host.
>
> Cc: stable@vger.kernel.org
> Fixes: 00d7ddba11436 ("hv_netvsc: pair VF based on serial number")
> Signed-off-by: Gaurav Kohli <gauravkohli@linux.microsoft.com>

Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>

By the way, did you use "git send-email"? I didn't see
stable@vger.kernel.org cc-ed in your original email.
On Wed, 28 Sep 2022 06:48:33 -0700 Gaurav Kohli wrote:
> During VM boot, there is a possibility that the VF registration
> call comes before the VF association from host to VM.
>
> This might break the netvsc VF path. To prevent this, block
> VF registration until the VF bind message comes from the host.
>
> Cc: stable@vger.kernel.org
> Fixes: 00d7ddba11436 ("hv_netvsc: pair VF based on serial number")
> Signed-off-by: Gaurav Kohli <gauravkohli@linux.microsoft.com>

Is it possible to add a timeout or such? Waiting for an external
event while holding rtnl lock seems a little scary.

The other question is - what protects the completion and ->vf_alloc
from races? Is there some locking? ->vf_alloc only goes from 0 to 1
and never back?
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Thursday, September 29, 2022 10:26 PM
> To: Gaurav Kohli <gauravkohli@linux.microsoft.com>
> Cc: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; Stephen Hemminger
> <sthemmin@microsoft.com>; wei.liu@kernel.org; Dexuan Cui
> <decui@microsoft.com>; linux-hyperv@vger.kernel.org;
> netdev@vger.kernel.org
> Subject: Re: [PATCH net] hv_netvsc: Fix race between VF offering and VF
> association message from host
>
> On Wed, 28 Sep 2022 06:48:33 -0700 Gaurav Kohli wrote:
> > During VM boot, there is a possibility that the VF registration
> > call comes before the VF association from host to VM.
> >
> > This might break the netvsc VF path. To prevent this, block
> > VF registration until the VF bind message comes from the host.
> >
> > Cc: stable@vger.kernel.org
> > Fixes: 00d7ddba11436 ("hv_netvsc: pair VF based on serial number")
> > Signed-off-by: Gaurav Kohli <gauravkohli@linux.microsoft.com>
>
> Is it possible to add a timeout or such? Waiting for an external
> event while holding rtnl lock seems a little scary.

We used to have time-outs in many places of this driver. But there are
no protocol guarantees of the host response time, so the time-out value
cannot be set. These time-outs were removed several years ago.

> The other question is - what protects the completion and ->vf_alloc
> from races? Is there some locking? ->vf_alloc only goes from 0 to 1
> and never back?

When the VF is removed, the vf_assoc msg will set it to 0 here:
	net_device_ctx->vf_alloc = nvmsg->msg.v4_msg.vf_assoc.allocated;
	net_device_ctx->vf_serial = nvmsg->msg.v4_msg.vf_assoc.serial;

Also, I think this condition can be changed from:
+	if (vf_is_up && !net_device_ctx->vf_alloc) {
to:
+	if (vf_is_up) {
So when the VF comes up, it always waits for the completion without
depending on the vf_alloc.

Thanks,
- Haiyang
On 9/30/2022 6:33 PM, Haiyang Zhang wrote:
>
>> -----Original Message-----
>> From: Jakub Kicinski <kuba@kernel.org>
>> Sent: Thursday, September 29, 2022 10:26 PM
>> To: Gaurav Kohli <gauravkohli@linux.microsoft.com>
>> Cc: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
>> <haiyangz@microsoft.com>; Stephen Hemminger
>> <sthemmin@microsoft.com>; wei.liu@kernel.org; Dexuan Cui
>> <decui@microsoft.com>; linux-hyperv@vger.kernel.org;
>> netdev@vger.kernel.org
>> Subject: Re: [PATCH net] hv_netvsc: Fix race between VF offering and VF
>> association message from host
>>
>> On Wed, 28 Sep 2022 06:48:33 -0700 Gaurav Kohli wrote:
>>> During VM boot, there is a possibility that the VF registration
>>> call comes before the VF association from host to VM.
>>>
>>> This might break the netvsc VF path. To prevent this, block
>>> VF registration until the VF bind message comes from the host.
>>>
>>> Cc: stable@vger.kernel.org
>>> Fixes: 00d7ddba11436 ("hv_netvsc: pair VF based on serial number")
>>> Signed-off-by: Gaurav Kohli <gauravkohli@linux.microsoft.com>
>> Is it possible to add a timeout or such? Waiting for an external
>> event while holding rtnl lock seems a little scary.
> We used to have time-outs in many places of this driver. But there are
> no protocol guarantees of the host response time, so the time-out value
> cannot be set. These time-outs were removed several years ago.
>
>> The other question is - what protects the completion and ->vf_alloc
>> from races? Is there some locking? ->vf_alloc only goes from 0 to 1
>> and never back?

Thanks for the comment. I understand your concern about vf_alloc and
the reinit_completion part. I think we can move reinit_completion to
the unregistration part of the VF code. Let me send a v2 patch.

> When the VF is removed, the vf_assoc msg will set it to 0 here:
> 	net_device_ctx->vf_alloc = nvmsg->msg.v4_msg.vf_assoc.allocated;
> 	net_device_ctx->vf_serial = nvmsg->msg.v4_msg.vf_assoc.serial;
>
> Also, I think this condition can be changed from:
> +	if (vf_is_up && !net_device_ctx->vf_alloc) {

Thanks for the comment. This is needed to maintain the state machine,
as the netvsc change event can come multiple times. That's why I have
put in an extra check to avoid any deadlock.

> to:
> +	if (vf_is_up) {
> So when the VF comes up, it always waits for the completion without
> depending on the vf_alloc.
>
> Thanks,
> - Haiyang
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 25b38a374e3c..dd5919ec408b 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -1051,7 +1051,8 @@ struct net_device_context {
 	u32 vf_alloc;
 	/* Serial number of the VF to team with */
 	u32 vf_serial;
-
+	/* completion variable to confirm vf association */
+	struct completion vf_add;
 	/* Is the current data path through the VF NIC? */
 	bool data_path_is_vf;
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 8bcba6f21aa9..6531d8d9f17f 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -1582,6 +1582,11 @@ static void netvsc_send_vf(struct net_device *ndev,
 	net_device_ctx->vf_alloc = nvmsg->msg.v4_msg.vf_assoc.allocated;
 	net_device_ctx->vf_serial = nvmsg->msg.v4_msg.vf_assoc.serial;
+
+	if (net_device_ctx->vf_alloc)
+		complete(&net_device_ctx->vf_add);
+	else
+		reinit_completion(&net_device_ctx->vf_add);
 	netdev_info(ndev, "VF slot %u %s\n",
 		    net_device_ctx->vf_serial,
 		    net_device_ctx->vf_alloc ? "added" : "removed");
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index c27cb1267ca5..393c69430147 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2309,6 +2309,18 @@ static struct net_device *get_netvsc_byslot(const struct net_device *vf_netdev)
 	}
+	/* Fallback path to check synthetic vf with
+	 * help of mac addr
+	 */
+	list_for_each_entry(ndev_ctx, &netvsc_dev_list, list) {
+		ndev = hv_get_drvdata(ndev_ctx->device_ctx);
+		if (ether_addr_equal(vf_netdev->perm_addr, ndev->perm_addr)) {
+			netdev_notice(vf_netdev,
+				      "falling back to mac addr based matching\n");
+			return ndev;
+		}
+	}
+
 	netdev_notice(vf_netdev,
 		      "no netdev found for vf serial:%u\n", serial);
 	return NULL;
@@ -2405,6 +2417,11 @@ static int netvsc_vf_changed(struct net_device *vf_netdev, unsigned long event)
 	if (net_device_ctx->data_path_is_vf == vf_is_up)
 		return NOTIFY_OK;
+	if (vf_is_up && !net_device_ctx->vf_alloc) {
+		netdev_info(ndev, "Waiting for the VF association from host\n");
+		wait_for_completion(&net_device_ctx->vf_add);
+	}
+
 	ret = netvsc_switch_datapath(ndev, vf_is_up);
 	if (ret) {
@@ -2475,6 +2492,7 @@ static int netvsc_probe(struct hv_device *dev,
 	INIT_DELAYED_WORK(&net_device_ctx->dwork, netvsc_link_change);
+	init_completion(&net_device_ctx->vf_add);
 	spin_lock_init(&net_device_ctx->lock);
 	INIT_LIST_HEAD(&net_device_ctx->reconfig_events);
 	INIT_DELAYED_WORK(&net_device_ctx->vf_takeover, netvsc_vf_setup);
During VM boot, there is a possibility that the VF registration
call comes before the VF association from host to VM.

This might break the netvsc VF path. To prevent this, block
VF registration until the VF bind message comes from the host.

Cc: stable@vger.kernel.org
Fixes: 00d7ddba11436 ("hv_netvsc: pair VF based on serial number")
Signed-off-by: Gaurav Kohli <gauravkohli@linux.microsoft.com>
---
 drivers/net/hyperv/hyperv_net.h |  3 ++-
 drivers/net/hyperv/netvsc.c     |  5 +++++
 drivers/net/hyperv/netvsc_drv.c | 18 ++++++++++++++++++
 3 files changed, 25 insertions(+), 1 deletion(-)
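The get_netvsc_byslot() hunk in this patch adds a second lookup pass keyed on the permanent MAC address. The userspace sketch below illustrates that two-pass idea only. The `model_*` names and the struct layout are hypothetical, and the first pass is assumed to be a serial-number match among devices with a VF allocated (suggested by the Fixes: tag and the "no netdev found for vf serial" message, but not visible in the diff context).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_ETH_ALEN 6	/* length of an Ethernet MAC address */

/* Hypothetical, trimmed-down view of one synthetic netvsc device. */
struct model_netvsc_dev {
	uint32_t vf_alloc;			/* nonzero once host associated a VF */
	uint32_t vf_serial;			/* serial number of the VF to team with */
	uint8_t perm_addr[MODEL_ETH_ALEN];	/* permanent MAC address */
};

/*
 * Return the device paired with a VF, or NULL if none matches.
 * Pass 1 (assumed): match by serial among devices with a VF allocated.
 * Pass 2 (from the patch): fall back to comparing permanent MAC addresses.
 */
static const struct model_netvsc_dev *
model_find_netvsc(const struct model_netvsc_dev *devs, size_t n,
		  uint32_t serial, const uint8_t *vf_mac)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (devs[i].vf_alloc && devs[i].vf_serial == serial)
			return &devs[i];
	}

	/* Fallback: the association may not have arrived yet, so use the MAC. */
	for (i = 0; i < n; i++) {
		if (memcmp(devs[i].perm_addr, vf_mac, MODEL_ETH_ALEN) == 0)
			return &devs[i];
	}

	return NULL;
}
```

A device whose vf_alloc is still 0 can only be found through the second pass, which is the case the fallback appears to be aimed at.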