Message ID | 20231006001614.1678782-1-quic_subashab@quicinc.com (mailing list archive) |
---|---|
State | Rejected |
Delegated to: | Netdev Maintainers |
Series | [net-next,v4] net: qualcomm: rmnet: Add side band flow control support |
On Thu, 5 Oct 2023 17:16:14 -0700 Subash Abhinov Kasiviswanathan wrote:
> Individual rmnet devices map to specific network types such as internet,
> multimedia messaging services, IP multimedia subsystem etc. Each of
> these network types may support varying quality of service for different
> bearers or traffic types.
>
> The physical device interconnect to radio hardware may support a
> higher data rate than what is actually supported by the radio network.
> Any packets transmitted to the radio hardware which exceed the radio
> network data rate limit may be dropped. This patch tries to minimize the
> loss of packets by adding support for bearer level flow control within
> an rmnet device, by ensuring that the packets transmitted do not exceed
> the limit allowed by the radio network.
>
> In order to support multiple bearers, rmnet must be created as a
> multiqueue TX netdevice. Radio hardware communicates the supported
> bearer information for a given network via side band signalling.
> Consider the following mapping -
>
> IPv4 UDP port 1234 - Mark 0x1001 - Queue 1
> IPv6 TCP port 2345 - Mark 0x2001 - Queue 2
>
> iptables can be used to install filters which mark packets matching
> these specific traffic patterns, and the RMNET_QUEUE_MAPPING_ADD
> operation can then be used to install the mapping of the mark to the
> specific txqueue.

I don't understand why you need driver specific commands to do this.
It should be easily achievable using existing TC qdisc infra.
What's the gap?
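For illustration, the marking step described in the commit message could
be done with rules along these lines (a sketch only: the device name
rmnet0 is assumed, and the ports and marks are taken from the example
mapping above):

  # Mark egress flows so they can later be steered to specific TX queues.
  iptables  -t mangle -A POSTROUTING -o rmnet0 -p udp --dport 1234 \
            -j MARK --set-mark 0x1001
  ip6tables -t mangle -A POSTROUTING -o rmnet0 -p tcp --dport 2345 \
            -j MARK --set-mark 0x2001

The ndo_select_queue() proposed in the patch then reads skb->mark and
looks up the configured txqueue.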
On 10/9/2023 8:42 PM, Jakub Kicinski wrote:
> On Thu, 5 Oct 2023 17:16:14 -0700 Subash Abhinov Kasiviswanathan wrote:
>> [commit message snipped]
>
> I don't understand why you need driver specific commands to do this.
> It should be easily achievable using existing TC qdisc infra.
> What's the gap?

tc doesn't allow userspace to manipulate the flow state (allow /
disallow traffic) on a specific queue. As I understand it, the traffic
dequeued / queued / dropped on a specific queue of an existing qdisc is
controlled by the implementation of the qdisc itself.
On Mon, 9 Oct 2023 22:00:40 -0600 Subash Abhinov Kasiviswanathan (KS) wrote:
>> I don't understand why you need driver specific commands to do this.
>> It should be easily achievable using existing TC qdisc infra.
>> What's the gap?
>
> tc doesn't allow userspace to manipulate the flow state (allow /
> disallow traffic) on a specific queue. As I understand it, the traffic
> dequeued / queued / dropped on a specific queue of an existing qdisc is
> controlled by the implementation of the qdisc itself.

I'm not sure what you mean. Qdiscs can form hierarchies.
You put mq first and then whatever child qdisc you want for individual
queues.
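For reference, the hierarchy described here can be built with existing
tc commands; a minimal sketch, assuming a multiqueue rmnet device named
rmnet0 (mq exposes one class per TX queue, so class 1:2 is TX queue 1):

  # Root mq qdisc; one class is created per hardware TX queue.
  tc qdisc add dev rmnet0 root handle 1: mq
  # Attach any child qdisc to an individual queue, e.g. fq_codel.
  tc qdisc add dev rmnet0 parent 1:2 handle 12: fq_codel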
On 10/10/2023 8:56 AM, Jakub Kicinski wrote:
> On Mon, 9 Oct 2023 22:00:40 -0600 Subash Abhinov Kasiviswanathan (KS)
> wrote:
>>> I don't understand why you need driver specific commands to do this.
>>> It should be easily achievable using existing TC qdisc infra.
>>> What's the gap?
>>
>> tc doesn't allow userspace to manipulate the flow state (allow /
>> disallow traffic) on a specific queue. As I understand it, the traffic
>> dequeued / queued / dropped on a specific queue of an existing qdisc is
>> controlled by the implementation of the qdisc itself.
>
> I'm not sure what you mean. Qdiscs can form hierarchies.
> You put mq first and then whatever child qdisc you want for individual
> queues.

There is no userspace interface exposed today to invoke
netif_tx_stop_queue() / netif_tx_wake_queue() on a specific queue. The
API itself can only be invoked within the kernel.

I was wondering if it would be acceptable to add a user accessible
interface in core networking to stop_queue / wake_queue instead of in
the driver.
On Tue, 10 Oct 2023 09:23:12 -0600 Subash Abhinov Kasiviswanathan (KS) wrote:
>> I'm not sure what you mean. Qdiscs can form hierarchies.
>> You put mq first and then whatever child qdisc you want for individual
>> queues.
>
> There is no userspace interface exposed today to invoke
> netif_tx_stop_queue() / netif_tx_wake_queue() on a specific queue. The
> API itself can only be invoked within the kernel.
>
> I was wondering if it would be acceptable to add a user accessible
> interface in core networking to stop_queue / wake_queue instead of in
> the driver.

Maybe not driver queue control, but if there's no qdisc which allows
users to pause traffic from user space, I think that would be a much
easier sell.

That said, the flow of the whole thing seems a bit complex.
Can't the driver somehow be notified by the device directly?
User space will suffer from all sorts of wake up / scheduling
latencies; it'd be better if the whole sleep / wake thing was
handled in the kernel.
On 10/10/2023 12:21 PM, Jakub Kicinski wrote:
> On Tue, 10 Oct 2023 09:23:12 -0600 Subash Abhinov Kasiviswanathan (KS)
> wrote:
>>
>> I was wondering if it would be acceptable to add a user accessible
>> interface in core networking to stop_queue / wake_queue instead of in
>> the driver.
>
> Maybe not driver queue control, but if there's no qdisc which allows
> users to pause traffic from user space, I think that would be a much
> easier sell.
>
> That said, the flow of the whole thing seems a bit complex.
> Can't the driver somehow be notified by the device directly?
> User space will suffer from all sorts of wake up / scheduling
> latencies; it'd be better if the whole sleep / wake thing was
> handled in the kernel.

Our userspace module relies on various inputs from radio hardware and
has proprietary logic to determine when to transmit / stop sending
packets corresponding to a specific bearer. I agree that an in-kernel
scheme might be faster than a userspace-kernel solution. However, I
believe that this latency impact could be reduced through schemes like
setting process priority, pinning applications to isolated cores, etc.
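As an aside, those mitigations map to standard tooling; a sketch,
assuming the flow-control daemon's PID is in $DAEMON_PID (a placeholder)
and core 3 has been isolated from the scheduler (e.g. via isolcpus=3):

  # Run the daemon with SCHED_FIFO real-time priority 50.
  chrt -f -p 50 "$DAEMON_PID"
  # Pin the daemon to the isolated core.
  taskset -cp 3 "$DAEMON_PID"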
On 10/10/2023 3:32 PM, Subash Abhinov Kasiviswanathan (KS) wrote:
> On 10/10/2023 12:21 PM, Jakub Kicinski wrote:
>> That said, the flow of the whole thing seems a bit complex.
>> Can't the driver somehow be notified by the device directly?
>> User space will suffer from all sorts of wake up / scheduling
>> latencies; it'd be better if the whole sleep / wake thing was
>> handled in the kernel.
>
> Our userspace module relies on various inputs from radio hardware and
> has proprietary logic to determine when to transmit / stop sending
> packets corresponding to a specific bearer.

After reviewing the qdisc set more closely, it appears that my
understanding was incorrect, as tc-plug provides userspace-controllable
queuing. Thanks for the help and discussion!
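For completeness, the tc-plug approach settled on here could look
roughly like the following (a sketch; the device name and queue
numbering are illustrative, and class 1:2 of mq corresponds to TX
queue 1):

  tc qdisc add dev rmnet0 root handle 1: mq
  tc qdisc add dev rmnet0 parent 1:2 handle 12: plug limit 65536
  # Buffer (stop) traffic on that queue from user space ...
  tc qdisc change dev rmnet0 parent 1:2 handle 12: plug block
  # ... and release it again when the radio network allows.
  tc qdisc change dev rmnet0 parent 1:2 handle 12: plug release_indefinite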
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
index 39d24e07f306..038d32ab84d6 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /* Copyright (c) 2013-2018, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2023, Qualcomm Innovation Center, Inc. All rights reserved.
  *
  * RMNET configuration engine
  */
@@ -19,6 +20,7 @@ static const struct nla_policy rmnet_policy[IFLA_RMNET_MAX + 1] = {
 	[IFLA_RMNET_MUX_ID] = { .type = NLA_U16 },
 	[IFLA_RMNET_FLAGS] = { .len = sizeof(struct ifla_rmnet_flags) },
+	[IFLA_RMNET_QUEUE] = { .len = sizeof(struct rmnet_queue_mapping) },
 };
 
 static int rmnet_is_real_dev_registered(const struct net_device *real_dev)
@@ -88,6 +90,66 @@ static int rmnet_register_real_device(struct net_device *real_dev,
 	return 0;
 }
 
+static int rmnet_update_queue_map(struct net_device *dev, u8 operation,
+				  u8 txqueue, u32 mark,
+				  struct netlink_ext_ack *extack)
+{
+	struct rmnet_priv *priv = netdev_priv(dev);
+	struct netdev_queue *q;
+	void *p;
+	u8 txq;
+
+	if (unlikely(txqueue >= dev->num_tx_queues)) {
+		NL_SET_ERR_MSG_MOD(extack, "invalid txqueue");
+		return -EINVAL;
+	}
+
+	switch (operation) {
+	case RMNET_QUEUE_MAPPING_ADD:
+		p = xa_store(&priv->queue_map, mark, xa_mk_value(txqueue),
+			     GFP_ATOMIC);
+		if (xa_is_err(p)) {
+			NL_SET_ERR_MSG_MOD(extack, "unable to add mapping");
+			return xa_err(p);
+		}
+		break;
+	case RMNET_QUEUE_MAPPING_REMOVE:
+		p = xa_erase(&priv->queue_map, mark);
+		if (xa_is_err(p)) {
+			NL_SET_ERR_MSG_MOD(extack, "unable to remove mapping");
+			return xa_err(p);
+		}
+		break;
+	case RMNET_QUEUE_ENABLE:
+	case RMNET_QUEUE_DISABLE:
+		p = xa_load(&priv->queue_map, mark);
+		if (p && xa_is_value(p)) {
+			txq = xa_to_value(p);
+
+			q = netdev_get_tx_queue(dev, txq);
+			if (unlikely(!q)) {
+				NL_SET_ERR_MSG_MOD(extack,
+						   "unsupported queue mapping");
+				return -EINVAL;
+			}
+
+			if (operation == RMNET_QUEUE_ENABLE)
+				netif_tx_wake_queue(q);
+			else
+				netif_tx_stop_queue(q);
+		} else {
+			NL_SET_ERR_MSG_MOD(extack, "invalid queue mapping");
+			return -EINVAL;
+		}
+		break;
+	default:
+		NL_SET_ERR_MSG_MOD(extack, "unsupported queue operation");
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
 static void rmnet_unregister_bridge(struct rmnet_port *port)
 {
 	struct net_device *bridge_dev, *real_dev, *rmnet_dev;
@@ -175,8 +237,26 @@ static int rmnet_newlink(struct net *src_net, struct net_device *dev,
 	netdev_dbg(dev, "data format [0x%08X]\n", data_format);
 	port->data_format = data_format;
 
+	if (data[IFLA_RMNET_QUEUE]) {
+		struct rmnet_queue_mapping *queue_map;
+
+		queue_map = nla_data(data[IFLA_RMNET_QUEUE]);
+		err = rmnet_update_queue_map(dev, queue_map->operation,
+					     queue_map->txqueue,
+					     queue_map->mark, extack);
+		if (err < 0)
+			goto err3;
+
+		netdev_dbg(dev, "op %02x txq %02x mark %08x\n",
+			   queue_map->operation, queue_map->txqueue,
+			   queue_map->mark);
+	}
+
 	return 0;
 
+err3:
+	hlist_del_init_rcu(&ep->hlnode);
+	netdev_upper_dev_unlink(real_dev, dev);
 err2:
 	unregister_netdevice(dev);
 	rmnet_vnd_dellink(mux_id, port, ep);
@@ -352,6 +432,22 @@ static int rmnet_changelink(struct net_device *dev, struct nlattr *tb[],
 		}
 	}
 
+	if (data[IFLA_RMNET_QUEUE]) {
+		struct rmnet_queue_mapping *queue_map;
+		int err;
+
+		queue_map = nla_data(data[IFLA_RMNET_QUEUE]);
+		err = rmnet_update_queue_map(dev, queue_map->operation,
+					     queue_map->txqueue,
+					     queue_map->mark, extack);
+		if (err < 0)
+			return err;
+
+		netdev_dbg(dev, "op %02x txq %02x mark %08x\n",
+			   queue_map->operation, queue_map->txqueue,
+			   queue_map->mark);
+	}
+
 	return 0;
 }
 
@@ -361,7 +457,9 @@ static size_t rmnet_get_size(const struct net_device *dev)
 		/* IFLA_RMNET_MUX_ID */
 		nla_total_size(2) +
 		/* IFLA_RMNET_FLAGS */
-		nla_total_size(sizeof(struct ifla_rmnet_flags));
+		nla_total_size(sizeof(struct ifla_rmnet_flags)) +
+		/* IFLA_RMNET_QUEUE */
+		nla_total_size(sizeof(struct rmnet_queue_mapping));
 }
 
 static int rmnet_fill_info(struct sk_buff *skb, const struct net_device *dev)
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
index ed112d51ac5a..ae8300fc5ed7 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 /* Copyright (c) 2013-2014, 2016-2018, 2021 The Linux Foundation.
  * All rights reserved.
+ * Copyright (c) 2023, Qualcomm Innovation Center, Inc. All rights reserved.
  *
  * RMNET Data configuration engine
  */
@@ -87,6 +88,7 @@ struct rmnet_priv {
 	struct rmnet_pcpu_stats __percpu *pcpu_stats;
 	struct gro_cells gro_cells;
 	struct rmnet_priv_stats stats;
+	struct xarray queue_map;
 };
 
 struct rmnet_port *rmnet_get_port_rcu(struct net_device *real_dev);
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
index 046b5f7d8e7c..de2792231293 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /* Copyright (c) 2013-2018, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2023, Qualcomm Innovation Center, Inc. All rights reserved.
  *
  * RMNET Data virtual network driver
  */
@@ -158,6 +159,24 @@ static void rmnet_get_stats64(struct net_device *dev,
 	s->tx_dropped = total_stats.tx_drops;
 }
 
+static u16 rmnet_vnd_select_queue(struct net_device *dev,
+				  struct sk_buff *skb,
+				  struct net_device *sb_dev)
+{
+	struct rmnet_priv *priv = netdev_priv(dev);
+	void *p;
+	u8 txq;
+
+	p = xa_load(&priv->queue_map, skb->mark);
+	if (!p || !xa_is_value(p))
+		return 0;
+
+	txq = xa_to_value(p);
+
+	netdev_dbg(dev, "mark %08x -> txq %02x\n", skb->mark, txq);
+	return txq;
+}
+
 static const struct net_device_ops rmnet_vnd_ops = {
 	.ndo_start_xmit = rmnet_vnd_start_xmit,
 	.ndo_change_mtu = rmnet_vnd_change_mtu,
@@ -167,6 +186,7 @@ static const struct net_device_ops rmnet_vnd_ops = {
 	.ndo_init = rmnet_vnd_init,
 	.ndo_uninit = rmnet_vnd_uninit,
 	.ndo_get_stats64 = rmnet_get_stats64,
+	.ndo_select_queue = rmnet_vnd_select_queue,
 };
 
 static const char rmnet_gstrings_stats[][ETH_GSTRING_LEN] = {
@@ -334,6 +354,8 @@ int rmnet_vnd_newlink(u8 id, struct net_device *rmnet_dev,
 
 	priv->mux_id = id;
 
+	xa_init(&priv->queue_map);
+
 	netdev_dbg(rmnet_dev, "rmnet dev created\n");
 }
 
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index fac351a93aed..452867d5246a 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -1368,6 +1368,7 @@ enum {
 	IFLA_RMNET_UNSPEC,
 	IFLA_RMNET_MUX_ID,
 	IFLA_RMNET_FLAGS,
+	IFLA_RMNET_QUEUE,
 	__IFLA_RMNET_MAX,
 };
 
@@ -1378,6 +1379,21 @@ struct ifla_rmnet_flags {
 	__u32 flags;
 	__u32 mask;
 };
 
+enum {
+	RMNET_QUEUE_OPERATION_UNSPEC,
+	RMNET_QUEUE_MAPPING_ADD,	/* Add new queue <-> mark mapping */
+	RMNET_QUEUE_MAPPING_REMOVE,	/* Remove queue <-> mark mapping */
+	RMNET_QUEUE_ENABLE,		/* Allow traffic on an existing queue */
+	RMNET_QUEUE_DISABLE,		/* Stop traffic on an existing queue */
+};
+
+struct rmnet_queue_mapping {
+	__u8 operation;
+	__u8 txqueue;
+	__u16 padding;
+	__u32 mark;
+};
+
 /* MCTP section */
 enum {