Message ID | 20241102125136.5030-3-annaemesenyiri@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | support SO_PRIORITY cmsg | expand |
Anna Emese Nyiri wrote: > The Linux socket API currently supports setting SO_PRIORITY at the > socket level, which applies a uniform priority to all packets sent > through that socket. The only exception is IP_TOS; if specified as > ancillary data, the packet does not inherit the socket's priority. > Instead, the priority value is computed when handling the ancillary > data (as implemented in commit <f02db315b8d88> nit: drop the brackets > ("ipv4: IP_TOS and IP_TTL can be specified as ancillary data")). If > the priority is set via IP_TOS, then skb->priority derives its value > from the rt_tos2priority function, which calculates the priority > based on the value of ipc->tos obtained from IP_TOS. However, if > IP_TOS is not used and the priority has been set through a control > message, skb->priority will take the value provided by that control > message. The above describes the new situation? There is no way to set priority to a control message prior to this patch. > Therefore, when both options are available, the primary > source for skb->priority is the value set via IP_TOS. > > Currently, there is no option to set the priority directly from > userspace on a per-packet basis. The following changes allow > SO_PRIORITY to be set through control messages (CMSG), giving > userspace applications more granular control over packet priorities. > > This patch enables setting skb->priority using CMSG. If SO_PRIORITY Duplicate statement. Overall, the explanation can perhaps be condensed and made more clear. > is specified as ancillary data, the packet is sent with the priority > value set through sockc->priority_cmsg_value, overriding the No longer matches the code. > socket-level values set via the traditional setsockopt() method. This > is analogous to existing support for SO_MARK (as implemented in commit > <c6af0c227a22> ("ip: support SO_MARK cmsg")). > > Suggested-by: Ferenc Fejes <fejes@inf.elte.hu> > Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com> > --- > include/net/inet_sock.h | 2 +- > include/net/ip.h | 3 ++- > include/net/sock.h | 4 +++- > net/can/raw.c | 2 +- > net/core/sock.c | 8 ++++++++ > net/ipv4/ip_output.c | 7 +++++-- > net/ipv4/raw.c | 2 +- > net/ipv6/ip6_output.c | 3 ++- > net/ipv6/raw.c | 2 +- > net/packet/af_packet.c | 2 +- > 10 files changed, 25 insertions(+), 10 deletions(-) > > diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h > index 56d8bc5593d3..3ccbad881d74 100644 > --- a/include/net/inet_sock.h > +++ b/include/net/inet_sock.h > @@ -172,7 +172,7 @@ struct inet_cork { > u8 tx_flags; > __u8 ttl; > __s16 tos; > - char priority; > + u32 priority; Let's check with pahole how this affects struct size and holes. It likely adds a hole, but unavoidably so. > __u16 gso_size; > u32 ts_opt_id; > u64 transmit_time; > diff --git a/include/net/ip.h b/include/net/ip.h > index 0e548c1f2a0e..e8f71a191277 100644 > --- a/include/net/ip.h > +++ b/include/net/ip.h > @@ -81,7 +81,7 @@ struct ipcm_cookie { > __u8 protocol; > __u8 ttl; > __s16 tos; > - char priority; > + u32 priority; No need for a field in ipcm_cookie, when also present in sockcm_cookie. As SO_PRIORITY is not limited to IP, sockcm_cookie is the right location. If cmsg IP_TOS is present, that can overridde ipc->sockc.priority with rt_tos2priority. Interesting that this override by IP_TOS seems to be IPV4 only. There is no equivalent call to rt_tos2priority when setting IPV6_TCLASS. > __u16 gso_size; > }; > > @@ -96,6 +96,7 @@ static inline void ipcm_init_sk(struct ipcm_cookie *ipcm, > ipcm_init(ipcm); > > ipcm->sockc.mark = READ_ONCE(inet->sk.sk_mark); > + ipcm->sockc.priority = READ_ONCE(inet->sk.sk_priority); > ipcm->sockc.tsflags = READ_ONCE(inet->sk.sk_tsflags); > ipcm->oif = READ_ONCE(inet->sk.sk_bound_dev_if); > ipcm->addr = inet->inet_saddr; > diff --git a/include/net/sock.h b/include/net/sock.h > index 7464e9f9f47c..316a34d6c48b 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -1814,13 +1814,15 @@ struct sockcm_cookie { > u32 mark; > u32 tsflags; > u32 ts_opt_id; > + u32 priority; > }; > > static inline void sockcm_init(struct sockcm_cookie *sockc, > const struct sock *sk) > { > *sockc = (struct sockcm_cookie) { > - .tsflags = READ_ONCE(sk->sk_tsflags) > + .tsflags = READ_ONCE(sk->sk_tsflags), > + .priority = READ_ONCE(sk->sk_priority), > }; > } > > diff --git a/net/can/raw.c b/net/can/raw.c > index 255c0a8f39d6..46e8ed9d64da 100644 > --- a/net/can/raw.c > +++ b/net/can/raw.c > @@ -962,7 +962,7 @@ static int raw_sendmsg(struct socket *sock, struct msghdr *msg, size_t size) > } > > skb->dev = dev; > - skb->priority = READ_ONCE(sk->sk_priority); > + skb->priority = sockc.priority; > skb->mark = READ_ONCE(sk->sk_mark); > skb->tstamp = sockc.transmit_time; > > diff --git a/net/core/sock.c b/net/core/sock.c > index 5ecf6f1a470c..d5586b9212dd 100644 > --- a/net/core/sock.c > +++ b/net/core/sock.c > @@ -2941,6 +2941,14 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg, > case SCM_RIGHTS: > case SCM_CREDENTIALS: > break; > + case SO_PRIORITY: > + if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32))) > + return -EINVAL; > + if (sk_set_prio_allowed(sk, *(u32 *)CMSG_DATA(cmsg))) { > + sockc->priority = *(u32 *)CMSG_DATA(cmsg); > + break; > + } > + return -EPERM; nit: invert to make the error case the (speculated as unlikely) branch and have the common path unindented. > default: > return -EINVAL; > }
Willem de Bruijn <willemdebruijn.kernel@gmail.com> ezt írta (időpont: 2024. nov. 3., V, 2:27): > > Anna Emese Nyiri wrote: > > The Linux socket API currently supports setting SO_PRIORITY at the > > socket level, which applies a uniform priority to all packets sent > > through that socket. The only exception is IP_TOS; if specified as > > ancillary data, the packet does not inherit the socket's priority. > > Instead, the priority value is computed when handling the ancillary > > data (as implemented in commit <f02db315b8d88> > > nit: drop the brackets > > > ("ipv4: IP_TOS and IP_TTL can be specified as ancillary data")). If > > the priority is set via IP_TOS, then skb->priority derives its value > > from the rt_tos2priority function, which calculates the priority > > based on the value of ipc->tos obtained from IP_TOS. However, if > > IP_TOS is not used and the priority has been set through a control > > message, skb->priority will take the value provided by that control > > message. > > The above describes the new situation? There is no way to set > priority to a control message prior to this patch. > > > Therefore, when both options are available, the primary > > source for skb->priority is the value set via IP_TOS. > > > > Currently, there is no option to set the priority directly from > > userspace on a per-packet basis. The following changes allow > > SO_PRIORITY to be set through control messages (CMSG), giving > > userspace applications more granular control over packet priorities. > > > > This patch enables setting skb->priority using CMSG. If SO_PRIORITY > > Duplicate statement. Overall, the explanation can perhaps be > condensed and made more clear. > > > is specified as ancillary data, the packet is sent with the priority > > value set through sockc->priority_cmsg_value, overriding the > > No longer matches the code. > > > socket-level values set via the traditional setsockopt() method. This > > is analogous to existing support for SO_MARK (as implemented in commit > > <c6af0c227a22> ("ip: support SO_MARK cmsg")). > > > > Suggested-by: Ferenc Fejes <fejes@inf.elte.hu> > > Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com> > > --- > > include/net/inet_sock.h | 2 +- > > include/net/ip.h | 3 ++- > > include/net/sock.h | 4 +++- > > net/can/raw.c | 2 +- > > net/core/sock.c | 8 ++++++++ > > net/ipv4/ip_output.c | 7 +++++-- > > net/ipv4/raw.c | 2 +- > > net/ipv6/ip6_output.c | 3 ++- > > net/ipv6/raw.c | 2 +- > > net/packet/af_packet.c | 2 +- > > 10 files changed, 25 insertions(+), 10 deletions(-) > > > > diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h > > index 56d8bc5593d3..3ccbad881d74 100644 > > --- a/include/net/inet_sock.h > > +++ b/include/net/inet_sock.h > > @@ -172,7 +172,7 @@ struct inet_cork { > > u8 tx_flags; > > __u8 ttl; > > __s16 tos; > > - char priority; > > + u32 priority; > > Let's check with pahole how this affects struct size and holes. > It likely adds a hole, but unavoidably so. > > > __u16 gso_size; > > u32 ts_opt_id; > > u64 transmit_time; > > diff --git a/include/net/ip.h b/include/net/ip.h > > index 0e548c1f2a0e..e8f71a191277 100644 > > --- a/include/net/ip.h > > +++ b/include/net/ip.h > > @@ -81,7 +81,7 @@ struct ipcm_cookie { > > __u8 protocol; > > __u8 ttl; > > __s16 tos; > > - char priority; > > + u32 priority; > > No need for a field in ipcm_cookie, when also present in > sockcm_cookie. As SO_PRIORITY is not limited to IP, sockcm_cookie is > the right location. I think there could be a problem if the priority is set by IP_TOS for some reason, and then also via cmsg. The latter value may overwrite it. In the ip_setup_cork() function, there is therefore a check for the value cork->tos != -1 to give priority to the value set by IP_TOS. And that's why I thought that there should be a priority field in both ipcm_cookie and sockcm_cookie. The priority field already existed in ipcm_cookie, I didn't add it. I just changed the type. > > If cmsg IP_TOS is present, that can overridde ipc->sockc.priority with > rt_tos2priority. > > Interesting that this override by IP_TOS seems to be IPV4 only. There > is no equivalent call to rt_tos2priority when setting IPV6_TCLASS. > > > __u16 gso_size; > > }; > > > > @@ -96,6 +96,7 @@ static inline void ipcm_init_sk(struct ipcm_cookie *ipcm, > > ipcm_init(ipcm); > > > > ipcm->sockc.mark = READ_ONCE(inet->sk.sk_mark); > > + ipcm->sockc.priority = READ_ONCE(inet->sk.sk_priority); > > ipcm->sockc.tsflags = READ_ONCE(inet->sk.sk_tsflags); > > ipcm->oif = READ_ONCE(inet->sk.sk_bound_dev_if); > > ipcm->addr = inet->inet_saddr; > > diff --git a/include/net/sock.h b/include/net/sock.h > > index 7464e9f9f47c..316a34d6c48b 100644 > > --- a/include/net/sock.h > > +++ b/include/net/sock.h > > @@ -1814,13 +1814,15 @@ struct sockcm_cookie { > > u32 mark; > > u32 tsflags; > > u32 ts_opt_id; > > + u32 priority; > > }; > > > > static inline void sockcm_init(struct sockcm_cookie *sockc, > > const struct sock *sk) > > { > > *sockc = (struct sockcm_cookie) { > > - .tsflags = READ_ONCE(sk->sk_tsflags) > > + .tsflags = READ_ONCE(sk->sk_tsflags), > > + .priority = READ_ONCE(sk->sk_priority), > > }; > > } > > > > diff --git a/net/can/raw.c b/net/can/raw.c > > index 255c0a8f39d6..46e8ed9d64da 100644 > > --- a/net/can/raw.c > > +++ b/net/can/raw.c > > @@ -962,7 +962,7 @@ static int raw_sendmsg(struct socket *sock, struct msghdr *msg, size_t size) > > } > > > > skb->dev = dev; > > - skb->priority = READ_ONCE(sk->sk_priority); > > + skb->priority = sockc.priority; > > skb->mark = READ_ONCE(sk->sk_mark); > > skb->tstamp = sockc.transmit_time; > > > > diff --git a/net/core/sock.c b/net/core/sock.c > > index 5ecf6f1a470c..d5586b9212dd 100644 > > --- a/net/core/sock.c > > +++ b/net/core/sock.c > > @@ -2941,6 +2941,14 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg, > > case SCM_RIGHTS: > > case SCM_CREDENTIALS: > > break; > > + case SO_PRIORITY: > > + if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32))) > > + return -EINVAL; > > + if (sk_set_prio_allowed(sk, *(u32 *)CMSG_DATA(cmsg))) { > > + sockc->priority = *(u32 *)CMSG_DATA(cmsg); > > + break; > > + } > > + return -EPERM; > > nit: invert to make the error case the (speculated as unlikely) branch > and have the common path unindented. > > > default: > > return -EINVAL; > > }
Anna Nyiri wrote: > Willem de Bruijn <willemdebruijn.kernel@gmail.com> ezt írta (időpont: > 2024. nov. 3., V, 2:27): > > > > Anna Emese Nyiri wrote: > > > The Linux socket API currently supports setting SO_PRIORITY at the > > > socket level, which applies a uniform priority to all packets sent > > > through that socket. The only exception is IP_TOS; if specified as > > > ancillary data, the packet does not inherit the socket's priority. > > > Instead, the priority value is computed when handling the ancillary > > > data (as implemented in commit <f02db315b8d88> > > > > nit: drop the brackets > > > > > ("ipv4: IP_TOS and IP_TTL can be specified as ancillary data")). If > > > the priority is set via IP_TOS, then skb->priority derives its value > > > from the rt_tos2priority function, which calculates the priority > > > based on the value of ipc->tos obtained from IP_TOS. However, if > > > IP_TOS is not used and the priority has been set through a control > > > message, skb->priority will take the value provided by that control > > > message. > > > > The above describes the new situation? There is no way to set > > priority to a control message prior to this patch. > > > > > Therefore, when both options are available, the primary > > > source for skb->priority is the value set via IP_TOS. > > > > > > Currently, there is no option to set the priority directly from > > > userspace on a per-packet basis. The following changes allow > > > SO_PRIORITY to be set through control messages (CMSG), giving > > > userspace applications more granular control over packet priorities. > > > > > > This patch enables setting skb->priority using CMSG. If SO_PRIORITY > > > > Duplicate statement. Overall, the explanation can perhaps be > > condensed and made more clear. > > > > > is specified as ancillary data, the packet is sent with the priority > > > value set through sockc->priority_cmsg_value, overriding the > > > > No longer matches the code. > > > > > socket-level values set via the traditional setsockopt() method. This > > > is analogous to existing support for SO_MARK (as implemented in commit > > > <c6af0c227a22> ("ip: support SO_MARK cmsg")). > > > > > > Suggested-by: Ferenc Fejes <fejes@inf.elte.hu> > > > Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com> > > > --- > > > include/net/inet_sock.h | 2 +- > > > include/net/ip.h | 3 ++- > > > include/net/sock.h | 4 +++- > > > net/can/raw.c | 2 +- > > > net/core/sock.c | 8 ++++++++ > > > net/ipv4/ip_output.c | 7 +++++-- > > > net/ipv4/raw.c | 2 +- > > > net/ipv6/ip6_output.c | 3 ++- > > > net/ipv6/raw.c | 2 +- > > > net/packet/af_packet.c | 2 +- > > > 10 files changed, 25 insertions(+), 10 deletions(-) > > > > > > diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h > > > index 56d8bc5593d3..3ccbad881d74 100644 > > > --- a/include/net/inet_sock.h > > > +++ b/include/net/inet_sock.h > > > @@ -172,7 +172,7 @@ struct inet_cork { > > > u8 tx_flags; > > > __u8 ttl; > > > __s16 tos; > > > - char priority; > > > + u32 priority; > > > > Let's check with pahole how this affects struct size and holes. > > It likely adds a hole, but unavoidably so. > > > > > __u16 gso_size; > > > u32 ts_opt_id; > > > u64 transmit_time; > > > diff --git a/include/net/ip.h b/include/net/ip.h > > > index 0e548c1f2a0e..e8f71a191277 100644 > > > --- a/include/net/ip.h > > > +++ b/include/net/ip.h > > > @@ -81,7 +81,7 @@ struct ipcm_cookie { > > > __u8 protocol; > > > __u8 ttl; > > > __s16 tos; > > > - char priority; > > > + u32 priority; > > > > No need for a field in ipcm_cookie, when also present in > > sockcm_cookie. As SO_PRIORITY is not limited to IP, sockcm_cookie is > > the right location. > > I think there could be a problem if the priority is set by IP_TOS for > some reason, and then also via cmsg. The latter value may overwrite > it. In the ip_setup_cork() function, there is therefore a check for > the value cork->tos != -1 to give priority to the value set by IP_TOS. > And that's why I thought that there should be a priority field in both > ipcm_cookie and sockcm_cookie. The priority field already existed in > ipcm_cookie, I didn't add it. I just changed the type. The existing behavior that adds a branch in the hot path is actually not needed. The preferred pattern is that the cookie is initialized with the sk field, and then optionally overwritten when parsing cmsgs. The path is slightly complicated by the fact that ipcm_init_sk does not call sockcm_init, but more or less open codes that. The callers of ipcm_init_sk are datagram sockets that have more opportunities to override per-socket options on a per-packet basis. So I was wrong that the field only has to be initialized in sockcm_init. It will have to be initialized by both initializers. But still only a u32 single field is needed.
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h index 56d8bc5593d3..3ccbad881d74 100644 --- a/include/net/inet_sock.h +++ b/include/net/inet_sock.h @@ -172,7 +172,7 @@ struct inet_cork { u8 tx_flags; __u8 ttl; __s16 tos; - char priority; + u32 priority; __u16 gso_size; u32 ts_opt_id; u64 transmit_time; diff --git a/include/net/ip.h b/include/net/ip.h index 0e548c1f2a0e..e8f71a191277 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -81,7 +81,7 @@ struct ipcm_cookie { __u8 protocol; __u8 ttl; __s16 tos; - char priority; + u32 priority; __u16 gso_size; }; @@ -96,6 +96,7 @@ static inline void ipcm_init_sk(struct ipcm_cookie *ipcm, ipcm_init(ipcm); ipcm->sockc.mark = READ_ONCE(inet->sk.sk_mark); + ipcm->sockc.priority = READ_ONCE(inet->sk.sk_priority); ipcm->sockc.tsflags = READ_ONCE(inet->sk.sk_tsflags); ipcm->oif = READ_ONCE(inet->sk.sk_bound_dev_if); ipcm->addr = inet->inet_saddr; diff --git a/include/net/sock.h b/include/net/sock.h index 7464e9f9f47c..316a34d6c48b 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1814,13 +1814,15 @@ struct sockcm_cookie { u32 mark; u32 tsflags; u32 ts_opt_id; + u32 priority; }; static inline void sockcm_init(struct sockcm_cookie *sockc, const struct sock *sk) { *sockc = (struct sockcm_cookie) { - .tsflags = READ_ONCE(sk->sk_tsflags) + .tsflags = READ_ONCE(sk->sk_tsflags), + .priority = READ_ONCE(sk->sk_priority), }; } diff --git a/net/can/raw.c b/net/can/raw.c index 255c0a8f39d6..46e8ed9d64da 100644 --- a/net/can/raw.c +++ b/net/can/raw.c @@ -962,7 +962,7 @@ static int raw_sendmsg(struct socket *sock, struct msghdr *msg, size_t size) } skb->dev = dev; - skb->priority = READ_ONCE(sk->sk_priority); + skb->priority = sockc.priority; skb->mark = READ_ONCE(sk->sk_mark); skb->tstamp = sockc.transmit_time; diff --git a/net/core/sock.c b/net/core/sock.c index 5ecf6f1a470c..d5586b9212dd 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2941,6 +2941,14 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg, case SCM_RIGHTS: case SCM_CREDENTIALS: break; + case SO_PRIORITY: + if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32))) + return -EINVAL; + if (sk_set_prio_allowed(sk, *(u32 *)CMSG_DATA(cmsg))) { + sockc->priority = *(u32 *)CMSG_DATA(cmsg); + break; + } + return -EPERM; default: return -EINVAL; } diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 0065b1996c94..72b37321c0ea 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1328,7 +1328,10 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork, cork->ttl = ipc->ttl; cork->tos = ipc->tos; cork->mark = ipc->sockc.mark; - cork->priority = ipc->priority; + if (cork->tos != -1) + cork->priority = ipc->priority; + else + cork->priority = ipc->sockc.priority; cork->transmit_time = ipc->sockc.transmit_time; cork->tx_flags = 0; sock_tx_timestamp(sk, &ipc->sockc, &cork->tx_flags); @@ -1465,7 +1468,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk, ip_options_build(skb, opt, cork->addr, rt); } - skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority); + skb->priority = cork->priority; skb->mark = cork->mark; if (sk_is_tcp(sk)) skb_set_delivery_time(skb, cork->transmit_time, SKB_CLOCK_MONOTONIC); diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index 0e9e01967ec9..4304a68d1db0 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -358,7 +358,7 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4, skb_reserve(skb, hlen); skb->protocol = htons(ETH_P_IP); - skb->priority = READ_ONCE(sk->sk_priority); + skb->priority = sockc->priority; skb->mark = sockc->mark; skb_set_delivery_type_by_clockid(skb, sockc->transmit_time, sk->sk_clockid); skb_dst_set(skb, &rt->dst); diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index f7b4608bb316..ec9673b7ab16 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1401,6 +1401,7 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork, cork->base.gso_size = ipc6->gso_size; cork->base.tx_flags = 0; cork->base.mark = ipc6->sockc.mark; + cork->base.priority = ipc6->sockc.priority; sock_tx_timestamp(sk, &ipc6->sockc, &cork->base.tx_flags); if (ipc6->sockc.tsflags & SOCKCM_FLAG_TS_OPT_ID) { cork->base.flags |= IPCORK_TS_OPT_ID; @@ -1939,7 +1940,7 @@ struct sk_buff *__ip6_make_skb(struct sock *sk, hdr->saddr = fl6->saddr; hdr->daddr = *final_dst; - skb->priority = READ_ONCE(sk->sk_priority); + skb->priority = cork->base.priority; skb->mark = cork->base.mark; if (sk_is_tcp(sk)) skb_set_delivery_time(skb, cork->base.transmit_time, SKB_CLOCK_MONOTONIC); diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index 8476a3944a88..c82cc6cfdbd2 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -619,7 +619,7 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length, skb_reserve(skb, hlen); skb->protocol = htons(ETH_P_IPV6); - skb->priority = READ_ONCE(sk->sk_priority); + skb->priority = sockc->priority; skb->mark = sockc->mark; skb_set_delivery_type_by_clockid(skb, sockc->transmit_time, sk->sk_clockid); diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 886c0dd47b66..f8d87d622699 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -3126,7 +3126,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len) skb->protocol = proto; skb->dev = dev; - skb->priority = READ_ONCE(sk->sk_priority); + skb->priority = sockc.priority; skb->mark = sockc.mark; skb_set_delivery_type_by_clockid(skb, sockc.transmit_time, sk->sk_clockid);
The Linux socket API currently supports setting SO_PRIORITY at the socket level, which applies a uniform priority to all packets sent through that socket. The only exception is IP_TOS; if specified as ancillary data, the packet does not inherit the socket's priority. Instead, the priority value is computed when handling the ancillary data (as implemented in commit <f02db315b8d88> ("ipv4: IP_TOS and IP_TTL can be specified as ancillary data")). If the priority is set via IP_TOS, then skb->priority derives its value from the rt_tos2priority function, which calculates the priority based on the value of ipc->tos obtained from IP_TOS. However, if IP_TOS is not used and the priority has been set through a control message, skb->priority will take the value provided by that control message. Therefore, when both options are available, the primary source for skb->priority is the value set via IP_TOS. Currently, there is no option to set the priority directly from userspace on a per-packet basis. The following changes allow SO_PRIORITY to be set through control messages (CMSG), giving userspace applications more granular control over packet priorities. This patch enables setting skb->priority using CMSG. If SO_PRIORITY is specified as ancillary data, the packet is sent with the priority value set through sockc->priority_cmsg_value, overriding the socket-level values set via the traditional setsockopt() method. This is analogous to existing support for SO_MARK (as implemented in commit <c6af0c227a22> ("ip: support SO_MARK cmsg")). Suggested-by: Ferenc Fejes <fejes@inf.elte.hu> Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com> --- include/net/inet_sock.h | 2 +- include/net/ip.h | 3 ++- include/net/sock.h | 4 +++- net/can/raw.c | 2 +- net/core/sock.c | 8 ++++++++ net/ipv4/ip_output.c | 7 +++++-- net/ipv4/raw.c | 2 +- net/ipv6/ip6_output.c | 3 ++- net/ipv6/raw.c | 2 +- net/packet/af_packet.c | 2 +- 10 files changed, 25 insertions(+), 10 deletions(-)