Message ID | 980c6c3d-e10e-4459-8565-e8fbde122f00@suse.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 7b55984c96ffe9e236eb9c82a2196e0b1f84990d |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] xen-netback: properly sync TX responses | expand |
On Mon, 29 Jan 2024 14:03:08 +0100 Jan Beulich wrote: > Invoking the make_tx_response() / push_tx_responses() pair with no lock > held would be acceptable only if all such invocations happened from the > same context (NAPI instance or dealloc thread). Since this isn't the > case, and since the interface "spec" also doesn't demand that multicast > operations may only be performed with no in-flight transmits, > MCAST_{ADD,DEL} processing also needs to acquire the response lock > around the invocations. > > To prevent similar mistakes going forward, "downgrade" the present > functions to private helpers of just the two remaining ones using them > directly, with no forward declarations anymore. This involves renaming > what so far was make_tx_response(), for the new function of that name > to serve the new (wrapper) purpose. > > While there, > - constify the txp parameters, > - correct xenvif_idx_release()'s status parameter's type, > - rename {,_}make_tx_response()'s status parameters for consistency with > xenvif_idx_release()'s. Hi Paul, is this one on your TODO list to review or should we do our best? :)
On 01/02/2024 00:23, Jakub Kicinski wrote: > On Mon, 29 Jan 2024 14:03:08 +0100 Jan Beulich wrote: >> Invoking the make_tx_response() / push_tx_responses() pair with no lock >> held would be acceptable only if all such invocations happened from the >> same context (NAPI instance or dealloc thread). Since this isn't the >> case, and since the interface "spec" also doesn't demand that multicast >> operations may only be performed with no in-flight transmits, >> MCAST_{ADD,DEL} processing also needs to acquire the response lock >> around the invocations. >> >> To prevent similar mistakes going forward, "downgrade" the present >> functions to private helpers of just the two remaining ones using them >> directly, with no forward declarations anymore. This involves renaming >> what so far was make_tx_response(), for the new function of that name >> to serve the new (wrapper) purpose. >> >> While there, >> - constify the txp parameters, >> - correct xenvif_idx_release()'s status parameter's type, >> - rename {,_}make_tx_response()'s status parameters for consistency with >> xenvif_idx_release()'s. > > Hi Paul, is this one on your TODO list to review or should > we do our best? :) Sorry for the delay. I'll take a look at this now. Paul
On 29/01/2024 13:03, Jan Beulich wrote: > Invoking the make_tx_response() / push_tx_responses() pair with no lock > held would be acceptable only if all such invocations happened from the > same context (NAPI instance or dealloc thread). Since this isn't the > case, and since the interface "spec" also doesn't demand that multicast > operations may only be performed with no in-flight transmits, > MCAST_{ADD,DEL} processing also needs to acquire the response lock > around the invocations. > > To prevent similar mistakes going forward, "downgrade" the present > functions to private helpers of just the two remaining ones using them > directly, with no forward declarations anymore. This involves renaming > what so far was make_tx_response(), for the new function of that name > to serve the new (wrapper) purpose. > > While there, > - constify the txp parameters, > - correct xenvif_idx_release()'s status parameter's type, > - rename {,_}make_tx_response()'s status parameters for consistency with > xenvif_idx_release()'s. > > Fixes: 210c34dcd8d9 ("xen-netback: add support for multicast control") > Cc: stable@vger.kernel.org > Signed-off-by: Jan Beulich <jbeulich@suse.com> > --- > Of course this could be split into two or even more separate changes, > but I think these transformations are best done all in one go. > > It remains questionable whether push_tx_responses() really needs > invoking after every single _make_tx_response(). > > MCAST_{ADD,DEL} are odd also from another perspective: They're supposed > to come with "dummy requests", with the comment in the public header > leaving open what that means. IIRC the only reference I had at the time was Solaris code... I don't really there being any documentation of the feature. I think that https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/common/xen/io/xnb.c is probably similar to the code I looked at to determine what the Solaris frontend expected. So I think 'dummy' means 'ignored'. > Netback doesn't check dummy-ness (e.g. > size being zero). Furthermore the description in the public header > doesn't really make clear that there's a restriction of one such "extra" > per dummy request. Yet the way xenvif_get_extras() works precludes > multiple ADDs or multiple DELs in a single dummy request (only the last > one would be honored afaict). While the way xenvif_tx_build_gops() works > precludes an ADD and a DEL coming together in a single dummy request > (the DEL would be ignored). It appears the Solaris backend never coped with multiple extra_info so what the 'correct' semantic would be is unclear. Anyway... Reviewed-by: Paul Durrant <paul@xen.org> > > --- a/drivers/net/xen-netback/netback.c > +++ b/drivers/net/xen-netback/netback.c > @@ -104,13 +104,12 @@ bool provides_xdp_headroom = true; > module_param(provides_xdp_headroom, bool, 0644); > > static void xenvif_idx_release(struct xenvif_queue *queue, u16 pending_idx, > - u8 status); > + s8 status); > > static void make_tx_response(struct xenvif_queue *queue, > - struct xen_netif_tx_request *txp, > + const struct xen_netif_tx_request *txp, > unsigned int extra_count, > - s8 st); > -static void push_tx_responses(struct xenvif_queue *queue); > + s8 status); > > static void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx); > > @@ -208,13 +207,9 @@ static void xenvif_tx_err(struct xenvif_ > unsigned int extra_count, RING_IDX end) > { > RING_IDX cons = queue->tx.req_cons; > - unsigned long flags; > > do { > - spin_lock_irqsave(&queue->response_lock, flags); > make_tx_response(queue, txp, extra_count, XEN_NETIF_RSP_ERROR); > - push_tx_responses(queue); > - spin_unlock_irqrestore(&queue->response_lock, flags); > if (cons == end) > break; > RING_COPY_REQUEST(&queue->tx, cons++, txp); > @@ -465,12 +460,7 @@ static void xenvif_get_requests(struct x > for (shinfo->nr_frags = 0; nr_slots > 0 && shinfo->nr_frags < MAX_SKB_FRAGS; > nr_slots--) { > if (unlikely(!txp->size)) { > - unsigned long flags; > - > - spin_lock_irqsave(&queue->response_lock, flags); > make_tx_response(queue, txp, 0, XEN_NETIF_RSP_OKAY); > - push_tx_responses(queue); > - spin_unlock_irqrestore(&queue->response_lock, flags); > ++txp; > continue; > } > @@ -496,14 +486,8 @@ static void xenvif_get_requests(struct x > > for (shinfo->nr_frags = 0; shinfo->nr_frags < nr_slots; ++txp) { > if (unlikely(!txp->size)) { > - unsigned long flags; > - > - spin_lock_irqsave(&queue->response_lock, flags); > make_tx_response(queue, txp, 0, > XEN_NETIF_RSP_OKAY); > - push_tx_responses(queue); > - spin_unlock_irqrestore(&queue->response_lock, > - flags); > continue; > } > > @@ -995,7 +979,6 @@ static void xenvif_tx_build_gops(struct > (ret == 0) ? > XEN_NETIF_RSP_OKAY : > XEN_NETIF_RSP_ERROR); > - push_tx_responses(queue); > continue; > } > > @@ -1007,7 +990,6 @@ static void xenvif_tx_build_gops(struct > > make_tx_response(queue, &txreq, extra_count, > XEN_NETIF_RSP_OKAY); > - push_tx_responses(queue); > continue; > } > > @@ -1433,8 +1415,35 @@ int xenvif_tx_action(struct xenvif_queue > return work_done; > } > > +static void _make_tx_response(struct xenvif_queue *queue, > + const struct xen_netif_tx_request *txp, > + unsigned int extra_count, > + s8 status) > +{ > + RING_IDX i = queue->tx.rsp_prod_pvt; > + struct xen_netif_tx_response *resp; > + > + resp = RING_GET_RESPONSE(&queue->tx, i); > + resp->id = txp->id; > + resp->status = status; > + > + while (extra_count-- != 0) > + RING_GET_RESPONSE(&queue->tx, ++i)->status = XEN_NETIF_RSP_NULL; > + > + queue->tx.rsp_prod_pvt = ++i; > +} > + > +static void push_tx_responses(struct xenvif_queue *queue) > +{ > + int notify; > + > + RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&queue->tx, notify); > + if (notify) > + notify_remote_via_irq(queue->tx_irq); > +} > + > static void xenvif_idx_release(struct xenvif_queue *queue, u16 pending_idx, > - u8 status) > + s8 status) > { > struct pending_tx_info *pending_tx_info; > pending_ring_idx_t index; > @@ -1444,8 +1453,8 @@ static void xenvif_idx_release(struct xe > > spin_lock_irqsave(&queue->response_lock, flags); > > - make_tx_response(queue, &pending_tx_info->req, > - pending_tx_info->extra_count, status); > + _make_tx_response(queue, &pending_tx_info->req, > + pending_tx_info->extra_count, status); > > /* Release the pending index before pusing the Tx response so > * its available before a new Tx request is pushed by the > @@ -1459,32 +1468,19 @@ static void xenvif_idx_release(struct xe > spin_unlock_irqrestore(&queue->response_lock, flags); > } > > - > static void make_tx_response(struct xenvif_queue *queue, > - struct xen_netif_tx_request *txp, > + const struct xen_netif_tx_request *txp, > unsigned int extra_count, > - s8 st) > + s8 status) > { > - RING_IDX i = queue->tx.rsp_prod_pvt; > - struct xen_netif_tx_response *resp; > - > - resp = RING_GET_RESPONSE(&queue->tx, i); > - resp->id = txp->id; > - resp->status = st; > - > - while (extra_count-- != 0) > - RING_GET_RESPONSE(&queue->tx, ++i)->status = XEN_NETIF_RSP_NULL; > + unsigned long flags; > > - queue->tx.rsp_prod_pvt = ++i; > -} > + spin_lock_irqsave(&queue->response_lock, flags); > > -static void push_tx_responses(struct xenvif_queue *queue) > -{ > - int notify; > + _make_tx_response(queue, txp, extra_count, status); > + push_tx_responses(queue); > > - RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&queue->tx, notify); > - if (notify) > - notify_remote_via_irq(queue->tx_irq); > + spin_unlock_irqrestore(&queue->response_lock, flags); > } > > static void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx)
Hello: This patch was applied to netdev/net.git (main) by Jakub Kicinski <kuba@kernel.org>: On Mon, 29 Jan 2024 14:03:08 +0100 you wrote: > Invoking the make_tx_response() / push_tx_responses() pair with no lock > held would be acceptable only if all such invocations happened from the > same context (NAPI instance or dealloc thread). Since this isn't the > case, and since the interface "spec" also doesn't demand that multicast > operations may only be performed with no in-flight transmits, > MCAST_{ADD,DEL} processing also needs to acquire the response lock > around the invocations. > > [...] Here is the summary with links: - [net] xen-netback: properly sync TX responses https://git.kernel.org/netdev/net/c/7b55984c96ff You are awesome, thank you!
--- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -104,13 +104,12 @@ bool provides_xdp_headroom = true; module_param(provides_xdp_headroom, bool, 0644); static void xenvif_idx_release(struct xenvif_queue *queue, u16 pending_idx, - u8 status); + s8 status); static void make_tx_response(struct xenvif_queue *queue, - struct xen_netif_tx_request *txp, + const struct xen_netif_tx_request *txp, unsigned int extra_count, - s8 st); -static void push_tx_responses(struct xenvif_queue *queue); + s8 status); static void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx); @@ -208,13 +207,9 @@ static void xenvif_tx_err(struct xenvif_ unsigned int extra_count, RING_IDX end) { RING_IDX cons = queue->tx.req_cons; - unsigned long flags; do { - spin_lock_irqsave(&queue->response_lock, flags); make_tx_response(queue, txp, extra_count, XEN_NETIF_RSP_ERROR); - push_tx_responses(queue); - spin_unlock_irqrestore(&queue->response_lock, flags); if (cons == end) break; RING_COPY_REQUEST(&queue->tx, cons++, txp); @@ -465,12 +460,7 @@ static void xenvif_get_requests(struct x for (shinfo->nr_frags = 0; nr_slots > 0 && shinfo->nr_frags < MAX_SKB_FRAGS; nr_slots--) { if (unlikely(!txp->size)) { - unsigned long flags; - - spin_lock_irqsave(&queue->response_lock, flags); make_tx_response(queue, txp, 0, XEN_NETIF_RSP_OKAY); - push_tx_responses(queue); - spin_unlock_irqrestore(&queue->response_lock, flags); ++txp; continue; } @@ -496,14 +486,8 @@ static void xenvif_get_requests(struct x for (shinfo->nr_frags = 0; shinfo->nr_frags < nr_slots; ++txp) { if (unlikely(!txp->size)) { - unsigned long flags; - - spin_lock_irqsave(&queue->response_lock, flags); make_tx_response(queue, txp, 0, XEN_NETIF_RSP_OKAY); - push_tx_responses(queue); - spin_unlock_irqrestore(&queue->response_lock, - flags); continue; } @@ -995,7 +979,6 @@ static void xenvif_tx_build_gops(struct (ret == 0) ? XEN_NETIF_RSP_OKAY : XEN_NETIF_RSP_ERROR); - push_tx_responses(queue); continue; } @@ -1007,7 +990,6 @@ static void xenvif_tx_build_gops(struct make_tx_response(queue, &txreq, extra_count, XEN_NETIF_RSP_OKAY); - push_tx_responses(queue); continue; } @@ -1433,8 +1415,35 @@ int xenvif_tx_action(struct xenvif_queue return work_done; } +static void _make_tx_response(struct xenvif_queue *queue, + const struct xen_netif_tx_request *txp, + unsigned int extra_count, + s8 status) +{ + RING_IDX i = queue->tx.rsp_prod_pvt; + struct xen_netif_tx_response *resp; + + resp = RING_GET_RESPONSE(&queue->tx, i); + resp->id = txp->id; + resp->status = status; + + while (extra_count-- != 0) + RING_GET_RESPONSE(&queue->tx, ++i)->status = XEN_NETIF_RSP_NULL; + + queue->tx.rsp_prod_pvt = ++i; +} + +static void push_tx_responses(struct xenvif_queue *queue) +{ + int notify; + + RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&queue->tx, notify); + if (notify) + notify_remote_via_irq(queue->tx_irq); +} + static void xenvif_idx_release(struct xenvif_queue *queue, u16 pending_idx, - u8 status) + s8 status) { struct pending_tx_info *pending_tx_info; pending_ring_idx_t index; @@ -1444,8 +1453,8 @@ static void xenvif_idx_release(struct xe spin_lock_irqsave(&queue->response_lock, flags); - make_tx_response(queue, &pending_tx_info->req, - pending_tx_info->extra_count, status); + _make_tx_response(queue, &pending_tx_info->req, + pending_tx_info->extra_count, status); /* Release the pending index before pusing the Tx response so * its available before a new Tx request is pushed by the @@ -1459,32 +1468,19 @@ static void xenvif_idx_release(struct xe spin_unlock_irqrestore(&queue->response_lock, flags); } - static void make_tx_response(struct xenvif_queue *queue, - struct xen_netif_tx_request *txp, + const struct xen_netif_tx_request *txp, unsigned int extra_count, - s8 st) + s8 status) { - RING_IDX i = queue->tx.rsp_prod_pvt; - struct xen_netif_tx_response *resp; - - resp = RING_GET_RESPONSE(&queue->tx, i); - resp->id = txp->id; - resp->status = st; - - while (extra_count-- != 0) - RING_GET_RESPONSE(&queue->tx, ++i)->status = XEN_NETIF_RSP_NULL; + unsigned long flags; - queue->tx.rsp_prod_pvt = ++i; -} + spin_lock_irqsave(&queue->response_lock, flags); -static void push_tx_responses(struct xenvif_queue *queue) -{ - int notify; + _make_tx_response(queue, txp, extra_count, status); + push_tx_responses(queue); - RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&queue->tx, notify); - if (notify) - notify_remote_via_irq(queue->tx_irq); + spin_unlock_irqrestore(&queue->response_lock, flags); } static void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx)
Invoking the make_tx_response() / push_tx_responses() pair with no lock held would be acceptable only if all such invocations happened from the same context (NAPI instance or dealloc thread). Since this isn't the case, and since the interface "spec" also doesn't demand that multicast operations may only be performed with no in-flight transmits, MCAST_{ADD,DEL} processing also needs to acquire the response lock around the invocations. To prevent similar mistakes going forward, "downgrade" the present functions to private helpers of just the two remaining ones using them directly, with no forward declarations anymore. This involves renaming what so far was make_tx_response(), for the new function of that name to serve the new (wrapper) purpose. While there, - constify the txp parameters, - correct xenvif_idx_release()'s status parameter's type, - rename {,_}make_tx_response()'s status parameters for consistency with xenvif_idx_release()'s. Fixes: 210c34dcd8d9 ("xen-netback: add support for multicast control") Cc: stable@vger.kernel.org Signed-off-by: Jan Beulich <jbeulich@suse.com> --- Of course this could be split into two or even more separate changes, but I think these transformations are best done all in one go. It remains questionable whether push_tx_responses() really needs invoking after every single _make_tx_response(). MCAST_{ADD,DEL} are odd also from another perspective: They're supposed to come with "dummy requests", with the comment in the public header leaving open what that means. Netback doesn't check dummy-ness (e.g. size being zero). Furthermore the description in the public header doesn't really make clear that there's a restriction of one such "extra" per dummy request. Yet the way xenvif_get_extras() works precludes multiple ADDs or multiple DELs in a single dummy request (only the last one would be honored afaict). While the way xenvif_tx_build_gops() works precludes an ADD and a DEL coming together in a single dummy request (the DEL would be ignored).