[net-next,v2,2/2] support SO_PRIORITY cmsg

Message ID 20241102125136.5030-3-annaemesenyiri@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers

Checks

Context                         Check    Description
netdev/series_format            success  Posting correctly formatted
netdev/tree_selection           success  Clearly marked for net-next, async
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 16 this patch: 16
netdev/build_tools              success  Errors and warnings before: 2 (+0) this patch: 2 (+0)
netdev/cc_maintainers           warning  5 maintainers not CCed: horms@kernel.org dsahern@kernel.org mkl@pengutronix.de linux-can@vger.kernel.org socketcan@hartkopp.net
netdev/build_clang              success  Errors and warnings before: 27 this patch: 27
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  No Fixes tag
netdev/build_allmodconfig_warn  success  Errors and warnings before: 2606 this patch: 2606
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 119 lines checked
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 18 this patch: 18
netdev/source_inline            success  Was 0 now: 0
netdev/contest                  fail     net-next-2024-11-02--18-00 (tests: 781)

Commit Message

Anna Nyiri Nov. 2, 2024, 12:51 p.m. UTC
The Linux socket API currently supports setting SO_PRIORITY at the
socket level, which applies a uniform priority to all packets sent
through that socket. The only exception is IP_TOS; if specified as
ancillary data, the packet does not inherit the socket's priority.
Instead, the priority value is computed when handling the ancillary
data (as implemented in commit <f02db315b8d88>
("ipv4: IP_TOS and IP_TTL can be specified as ancillary data")). If
the priority is set via IP_TOS, then skb->priority derives its value
from the rt_tos2priority function, which calculates the priority
based on the value of ipc->tos obtained from IP_TOS. However, if
IP_TOS is not used and the priority has been set through a control
message, skb->priority will take the value provided by that control
message. Therefore, when both options are available, the primary
source for skb->priority is the value set via IP_TOS.

Currently, there is no option to set the priority directly from
userspace on a per-packet basis. The following changes allow
SO_PRIORITY to be set through control messages (CMSG), giving
userspace applications more granular control over packet priorities.

This patch enables setting skb->priority using CMSG. If SO_PRIORITY
is specified as ancillary data, the packet is sent with the priority
value set through sockc->priority_cmsg_value, overriding the
socket-level values set via the traditional setsockopt() method. This
is analogous to existing support for SO_MARK (as implemented in commit
<c6af0c227a22> ("ip: support SO_MARK cmsg")).

Suggested-by: Ferenc Fejes <fejes@inf.elte.hu>
Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
---
 include/net/inet_sock.h | 2 +-
 include/net/ip.h        | 3 ++-
 include/net/sock.h      | 4 +++-
 net/can/raw.c           | 2 +-
 net/core/sock.c         | 8 ++++++++
 net/ipv4/ip_output.c    | 7 +++++--
 net/ipv4/raw.c          | 2 +-
 net/ipv6/ip6_output.c   | 3 ++-
 net/ipv6/raw.c          | 2 +-
 net/packet/af_packet.c  | 2 +-
 10 files changed, 25 insertions(+), 10 deletions(-)

Comments

Willem de Bruijn Nov. 3, 2024, 1:27 a.m. UTC | #1
Anna Emese Nyiri wrote:
> The Linux socket API currently supports setting SO_PRIORITY at the
> socket level, which applies a uniform priority to all packets sent
> through that socket. The only exception is IP_TOS; if specified as
> ancillary data, the packet does not inherit the socket's priority.
> Instead, the priority value is computed when handling the ancillary
> data (as implemented in commit <f02db315b8d88>

nit: drop the brackets

> ("ipv4: IP_TOS and IP_TTL can be specified as ancillary data")). If
> the priority is set via IP_TOS, then skb->priority derives its value
> from the rt_tos2priority function, which calculates the priority
> based on the value of ipc->tos obtained from IP_TOS. However, if
> IP_TOS is not used and the priority has been set through a control
> message, skb->priority will take the value provided by that control
> message.

The above describes the new situation? There is no way to set the
priority through a control message prior to this patch.

> Therefore, when both options are available, the primary
> source for skb->priority is the value set via IP_TOS.
> 
> Currently, there is no option to set the priority directly from
> userspace on a per-packet basis. The following changes allow
> SO_PRIORITY to be set through control messages (CMSG), giving
> userspace applications more granular control over packet priorities.
> 
> This patch enables setting skb->priority using CMSG. If SO_PRIORITY

Duplicate statement. Overall, the explanation can perhaps be
condensed and made more clear.

> is specified as ancillary data, the packet is sent with the priority
> value set through sockc->priority_cmsg_value, overriding the

No longer matches the code.

> socket-level values set via the traditional setsockopt() method. This
> is analogous to existing support for SO_MARK (as implemented in commit
> <c6af0c227a22> ("ip: support SO_MARK cmsg")).
> 
> Suggested-by: Ferenc Fejes <fejes@inf.elte.hu>
> Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
> ---
>  include/net/inet_sock.h | 2 +-
>  include/net/ip.h        | 3 ++-
>  include/net/sock.h      | 4 +++-
>  net/can/raw.c           | 2 +-
>  net/core/sock.c         | 8 ++++++++
>  net/ipv4/ip_output.c    | 7 +++++--
>  net/ipv4/raw.c          | 2 +-
>  net/ipv6/ip6_output.c   | 3 ++-
>  net/ipv6/raw.c          | 2 +-
>  net/packet/af_packet.c  | 2 +-
>  10 files changed, 25 insertions(+), 10 deletions(-)
> 
> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
> index 56d8bc5593d3..3ccbad881d74 100644
> --- a/include/net/inet_sock.h
> +++ b/include/net/inet_sock.h
> @@ -172,7 +172,7 @@ struct inet_cork {
>  	u8			tx_flags;
>  	__u8			ttl;
>  	__s16			tos;
> -	char			priority;
> +	u32			priority;

Let's check with pahole how this affects struct size and holes.
It likely adds a hole, but unavoidably so.
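
To illustrate the kind of change pahole would report, here is a rough userspace sketch. It mirrors only the members visible in the hunk, so the structs cork_before and cork_after are simplified stand-ins, not the real struct inet_cork, which has more members ahead of these. Offsets depend on the ABI, so the numbers it prints are indicative only.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified mirror of the visible members, with the old char priority. */
struct cork_before {
	uint8_t  tx_flags;
	uint8_t  ttl;
	int16_t  tos;
	char     priority;
	uint16_t gso_size;
	uint32_t ts_opt_id;
	uint64_t transmit_time;
};

/* Same members, with priority widened to 32 bits as in the patch. */
struct cork_after {
	uint8_t  tx_flags;
	uint8_t  ttl;
	int16_t  tos;
	uint32_t priority;
	uint16_t gso_size;
	uint32_t ts_opt_id;
	uint64_t transmit_time;
};

/* Print offset and size of one member of one of the mirror structs. */
#define REPORT(type, member) \
	printf("%-12s %-10s offset %2zu size %zu\n", #type, #member, \
	       offsetof(struct type, member), \
	       sizeof(((struct type *)0)->member))

static void report_layouts(void)
{
	REPORT(cork_before, priority);
	REPORT(cork_before, gso_size);
	REPORT(cork_before, ts_opt_id);
	REPORT(cork_after, priority);
	REPORT(cork_after, gso_size);
	REPORT(cork_after, ts_opt_id);
	printf("sizeof: before %zu, after %zu\n",
	       sizeof(struct cork_before), sizeof(struct cork_after));
}
```

In this simplified mirror, on common ABIs, the single padding byte after the char goes away and a two-byte hole opens between gso_size and ts_opt_id. The authoritative answer still comes from running pahole -C inet_cork on a built vmlinux.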

>  	__u16			gso_size;
>  	u32			ts_opt_id;
>  	u64			transmit_time;
> diff --git a/include/net/ip.h b/include/net/ip.h
> index 0e548c1f2a0e..e8f71a191277 100644
> --- a/include/net/ip.h
> +++ b/include/net/ip.h
> @@ -81,7 +81,7 @@ struct ipcm_cookie {
>  	__u8			protocol;
>  	__u8			ttl;
>  	__s16			tos;
> -	char			priority;
> +	u32			priority;

No need for a field in ipcm_cookie, when also present in
sockcm_cookie. As SO_PRIORITY is not limited to IP, sockcm_cookie is
the right location.

If cmsg IP_TOS is present, that can override ipc->sockc.priority with
rt_tos2priority.

Interesting that this override by IP_TOS seems to be IPv4 only. There
is no equivalent call to rt_tos2priority when setting IPV6_TCLASS.
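
As an illustration of the single-field idea, the following userspace model shows an IP_TOS cmsg writing straight into the one priority field. Everything prefixed with model_ is hypothetical. The lookup table is a re-creation from memory of the kernel's ip_tos2prio table and should be checked against net/ipv4/route.c before anyone relies on it.

```c
#include <stdint.h>

/* Traffic-control priority values, redefined locally to stay standalone
 * (assumption: they match the TC_PRIO_* values in <linux/pkt_sched.h>). */
enum {
	MODEL_PRIO_BESTEFFORT       = 0,
	MODEL_PRIO_BULK             = 2,
	MODEL_PRIO_INTERACTIVE_BULK = 4,
	MODEL_PRIO_INTERACTIVE      = 6,
};

/* Assumed copy of ip_tos2prio[]: indexed by the four TOS bits. */
static const uint8_t model_tos2prio[16] = {
	MODEL_PRIO_BESTEFFORT,       MODEL_PRIO_BESTEFFORT,
	MODEL_PRIO_BESTEFFORT,       MODEL_PRIO_BESTEFFORT,
	MODEL_PRIO_BULK,             MODEL_PRIO_BULK,
	MODEL_PRIO_BULK,             MODEL_PRIO_BULK,
	MODEL_PRIO_INTERACTIVE,      MODEL_PRIO_INTERACTIVE,
	MODEL_PRIO_INTERACTIVE,      MODEL_PRIO_INTERACTIVE,
	MODEL_PRIO_INTERACTIVE_BULK, MODEL_PRIO_INTERACTIVE_BULK,
	MODEL_PRIO_INTERACTIVE_BULK, MODEL_PRIO_INTERACTIVE_BULK,
};

/* Model of rt_tos2priority(): mask the TOS bits (0x1E) and index the table. */
static uint8_t model_rt_tos2priority(uint8_t tos)
{
	return model_tos2prio[(tos & 0x1E) >> 1];
}

/* Cut-down cookies: only the members relevant to this discussion. */
struct model_sockcm {
	uint32_t priority;
};

struct model_ipcm {
	struct model_sockcm sockc;
	int16_t tos;	/* -1 means "no IP_TOS cmsg seen" */
};

/* Model of handling an IP_TOS cmsg: record the TOS and override the
 * single priority field, so no second field in the IP cookie is needed. */
static void model_cmsg_ip_tos(struct model_ipcm *ipc, uint8_t tos)
{
	ipc->tos = tos;
	ipc->sockc.priority = model_rt_tos2priority(tos);
}
```

With this shape the transmit path can read ipc->sockc.priority unconditionally, whichever cmsg last wrote it.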

>  	__u16			gso_size;
>  };
>  
> @@ -96,6 +96,7 @@ static inline void ipcm_init_sk(struct ipcm_cookie *ipcm,
>  	ipcm_init(ipcm);
>  
>  	ipcm->sockc.mark = READ_ONCE(inet->sk.sk_mark);
> +	ipcm->sockc.priority = READ_ONCE(inet->sk.sk_priority);
>  	ipcm->sockc.tsflags = READ_ONCE(inet->sk.sk_tsflags);
>  	ipcm->oif = READ_ONCE(inet->sk.sk_bound_dev_if);
>  	ipcm->addr = inet->inet_saddr;
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 7464e9f9f47c..316a34d6c48b 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1814,13 +1814,15 @@ struct sockcm_cookie {
>  	u32 mark;
>  	u32 tsflags;
>  	u32 ts_opt_id;
> +	u32 priority;
>  };
>  
>  static inline void sockcm_init(struct sockcm_cookie *sockc,
>  			       const struct sock *sk)
>  {
>  	*sockc = (struct sockcm_cookie) {
> -		.tsflags = READ_ONCE(sk->sk_tsflags)
> +		.tsflags = READ_ONCE(sk->sk_tsflags),
> +		.priority = READ_ONCE(sk->sk_priority),
>  	};
>  }
>  
> diff --git a/net/can/raw.c b/net/can/raw.c
> index 255c0a8f39d6..46e8ed9d64da 100644
> --- a/net/can/raw.c
> +++ b/net/can/raw.c
> @@ -962,7 +962,7 @@ static int raw_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
>  	}
>  
>  	skb->dev = dev;
> -	skb->priority = READ_ONCE(sk->sk_priority);
> +	skb->priority = sockc.priority;
>  	skb->mark = READ_ONCE(sk->sk_mark);
>  	skb->tstamp = sockc.transmit_time;
>  
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 5ecf6f1a470c..d5586b9212dd 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2941,6 +2941,14 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg,
>  	case SCM_RIGHTS:
>  	case SCM_CREDENTIALS:
>  		break;
> +	case SO_PRIORITY:
> +		if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32)))
> +			return -EINVAL;
> +		if (sk_set_prio_allowed(sk, *(u32 *)CMSG_DATA(cmsg))) {
> +			sockc->priority = *(u32 *)CMSG_DATA(cmsg);
> +			break;
> +		}
> +		return -EPERM;

nit: invert to make the error case the (speculated as unlikely) branch
and have the common path unindented.

>  	default:
>  		return -EINVAL;
>  	}
Anna Nyiri Nov. 3, 2024, 1:39 p.m. UTC | #2
Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote (on Sun, Nov 3,
2024, 2:27):
> No need for a field in ipcm_cookie, when also present in
> sockcm_cookie. As SO_PRIORITY is not limited to IP, sockcm_cookie is
> the right location.

I think there could be a problem if the priority is set by IP_TOS and
then also via the SO_PRIORITY cmsg: the latter value may overwrite the
former. That is why ip_setup_cork() checks cork->tos != -1, so that the
value derived from IP_TOS takes precedence, and why I thought a priority
field was needed in both ipcm_cookie and sockcm_cookie. The priority
field already existed in ipcm_cookie; I did not add it, I only changed
its type.

Willem de Bruijn Nov. 3, 2024, 1:57 p.m. UTC | #3
Anna Nyiri wrote:
> Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote (on Sun, Nov 3,
> 2024, 2:27):
> > No need for a field in ipcm_cookie, when also present in
> > sockcm_cookie. As SO_PRIORITY is not limited to IP, sockcm_cookie is
> > the right location.
> 
> I think there could be a problem if the priority is set by IP_TOS and
> then also via the SO_PRIORITY cmsg: the latter value may overwrite the
> former. That is why ip_setup_cork() checks cork->tos != -1, so that the
> value derived from IP_TOS takes precedence, and why I thought a priority
> field was needed in both ipcm_cookie and sockcm_cookie. The priority
> field already existed in ipcm_cookie; I did not add it, I only changed
> its type.

The existing behavior that adds a branch in the hot path is actually
not needed.

The preferred pattern is that the cookie is initialized with the sk
field, and then optionally overwritten when parsing cmsgs.

The path is slightly complicated by the fact that ipcm_init_sk does
not call sockcm_init, but more or less open codes that.

The callers of ipcm_init_sk are datagram sockets that have more
opportunities to override per-socket options on a per-packet basis.

So I was wrong that the field only has to be initialized in
sockcm_init. It will have to be initialized by both initializers.

But still only a single u32 field is needed.
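
The pattern described in this reply (seed the cookie from the socket, optionally overwrite it while parsing cmsgs, then copy it without a branch) can be modelled in userspace as below. All model_ types and functions are hypothetical, cut-down stand-ins for struct sock, sockcm_cookie, ipcm_cookie and sk_buff. This is a sketch of the idea, not the code that should go into the next revision.

```c
#include <stdint.h>

struct model_sock {
	uint32_t sk_priority;
	uint32_t sk_mark;
};

struct model_sockcm {
	uint32_t mark;
	uint32_t priority;	/* the single u32 field */
};

struct model_ipcm {
	struct model_sockcm sockc;
};

struct model_skb {
	uint32_t priority;
};

/* Models sockcm_init(): seed the cookie from the socket. */
static void model_sockcm_init(struct model_sockcm *sockc,
			      const struct model_sock *sk)
{
	*sockc = (struct model_sockcm) {
		.priority = sk->sk_priority,
	};
}

/* Models ipcm_init_sk(), which open codes the same seeding. */
static void model_ipcm_init_sk(struct model_ipcm *ipcm,
			       const struct model_sock *sk)
{
	ipcm->sockc.mark = sk->sk_mark;
	ipcm->sockc.priority = sk->sk_priority;
}

/* Models cmsg parsing: SO_PRIORITY overwrites the seeded value. */
static void model_cmsg_so_priority(struct model_sockcm *sockc, uint32_t val)
{
	sockc->priority = val;
}

/* Models the transmit path: a plain copy, no test for "was a cmsg seen". */
static void model_xmit(struct model_skb *skb,
		       const struct model_sockcm *sockc)
{
	skb->priority = sockc->priority;
}
```

Because both initializers seed the same field, the hot path needs neither the cork->tos != -1 test nor a second priority member in the IP cookie.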
Patch

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 56d8bc5593d3..3ccbad881d74 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -172,7 +172,7 @@  struct inet_cork {
 	u8			tx_flags;
 	__u8			ttl;
 	__s16			tos;
-	char			priority;
+	u32			priority;
 	__u16			gso_size;
 	u32			ts_opt_id;
 	u64			transmit_time;
diff --git a/include/net/ip.h b/include/net/ip.h
index 0e548c1f2a0e..e8f71a191277 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -81,7 +81,7 @@  struct ipcm_cookie {
 	__u8			protocol;
 	__u8			ttl;
 	__s16			tos;
-	char			priority;
+	u32			priority;
 	__u16			gso_size;
 };
 
@@ -96,6 +96,7 @@  static inline void ipcm_init_sk(struct ipcm_cookie *ipcm,
 	ipcm_init(ipcm);
 
 	ipcm->sockc.mark = READ_ONCE(inet->sk.sk_mark);
+	ipcm->sockc.priority = READ_ONCE(inet->sk.sk_priority);
 	ipcm->sockc.tsflags = READ_ONCE(inet->sk.sk_tsflags);
 	ipcm->oif = READ_ONCE(inet->sk.sk_bound_dev_if);
 	ipcm->addr = inet->inet_saddr;
diff --git a/include/net/sock.h b/include/net/sock.h
index 7464e9f9f47c..316a34d6c48b 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1814,13 +1814,15 @@  struct sockcm_cookie {
 	u32 mark;
 	u32 tsflags;
 	u32 ts_opt_id;
+	u32 priority;
 };
 
 static inline void sockcm_init(struct sockcm_cookie *sockc,
 			       const struct sock *sk)
 {
 	*sockc = (struct sockcm_cookie) {
-		.tsflags = READ_ONCE(sk->sk_tsflags)
+		.tsflags = READ_ONCE(sk->sk_tsflags),
+		.priority = READ_ONCE(sk->sk_priority),
 	};
 }
 
diff --git a/net/can/raw.c b/net/can/raw.c
index 255c0a8f39d6..46e8ed9d64da 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -962,7 +962,7 @@  static int raw_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
 	}
 
 	skb->dev = dev;
-	skb->priority = READ_ONCE(sk->sk_priority);
+	skb->priority = sockc.priority;
 	skb->mark = READ_ONCE(sk->sk_mark);
 	skb->tstamp = sockc.transmit_time;
 
diff --git a/net/core/sock.c b/net/core/sock.c
index 5ecf6f1a470c..d5586b9212dd 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2941,6 +2941,14 @@  int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg,
 	case SCM_RIGHTS:
 	case SCM_CREDENTIALS:
 		break;
+	case SO_PRIORITY:
+		if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32)))
+			return -EINVAL;
+		if (sk_set_prio_allowed(sk, *(u32 *)CMSG_DATA(cmsg))) {
+			sockc->priority = *(u32 *)CMSG_DATA(cmsg);
+			break;
+		}
+		return -EPERM;
 	default:
 		return -EINVAL;
 	}
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 0065b1996c94..72b37321c0ea 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1328,7 +1328,10 @@  static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
 	cork->ttl = ipc->ttl;
 	cork->tos = ipc->tos;
 	cork->mark = ipc->sockc.mark;
-	cork->priority = ipc->priority;
+	if (cork->tos != -1)
+		cork->priority = ipc->priority;
+	else
+		cork->priority = ipc->sockc.priority;
 	cork->transmit_time = ipc->sockc.transmit_time;
 	cork->tx_flags = 0;
 	sock_tx_timestamp(sk, &ipc->sockc, &cork->tx_flags);
@@ -1465,7 +1468,7 @@  struct sk_buff *__ip_make_skb(struct sock *sk,
 		ip_options_build(skb, opt, cork->addr, rt);
 	}
 
-	skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
+	skb->priority = cork->priority;
 	skb->mark = cork->mark;
 	if (sk_is_tcp(sk))
 		skb_set_delivery_time(skb, cork->transmit_time, SKB_CLOCK_MONOTONIC);
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 0e9e01967ec9..4304a68d1db0 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -358,7 +358,7 @@  static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
 	skb_reserve(skb, hlen);
 
 	skb->protocol = htons(ETH_P_IP);
-	skb->priority = READ_ONCE(sk->sk_priority);
+	skb->priority = sockc->priority;
 	skb->mark = sockc->mark;
 	skb_set_delivery_type_by_clockid(skb, sockc->transmit_time, sk->sk_clockid);
 	skb_dst_set(skb, &rt->dst);
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index f7b4608bb316..ec9673b7ab16 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1401,6 +1401,7 @@  static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
 	cork->base.gso_size = ipc6->gso_size;
 	cork->base.tx_flags = 0;
 	cork->base.mark = ipc6->sockc.mark;
+	cork->base.priority = ipc6->sockc.priority;
 	sock_tx_timestamp(sk, &ipc6->sockc, &cork->base.tx_flags);
 	if (ipc6->sockc.tsflags & SOCKCM_FLAG_TS_OPT_ID) {
 		cork->base.flags |= IPCORK_TS_OPT_ID;
@@ -1939,7 +1940,7 @@  struct sk_buff *__ip6_make_skb(struct sock *sk,
 	hdr->saddr = fl6->saddr;
 	hdr->daddr = *final_dst;
 
-	skb->priority = READ_ONCE(sk->sk_priority);
+	skb->priority = cork->base.priority;
 	skb->mark = cork->base.mark;
 	if (sk_is_tcp(sk))
 		skb_set_delivery_time(skb, cork->base.transmit_time, SKB_CLOCK_MONOTONIC);
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 8476a3944a88..c82cc6cfdbd2 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -619,7 +619,7 @@  static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length,
 	skb_reserve(skb, hlen);
 
 	skb->protocol = htons(ETH_P_IPV6);
-	skb->priority = READ_ONCE(sk->sk_priority);
+	skb->priority = sockc->priority;
 	skb->mark = sockc->mark;
 	skb_set_delivery_type_by_clockid(skb, sockc->transmit_time, sk->sk_clockid);
 
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 886c0dd47b66..f8d87d622699 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3126,7 +3126,7 @@  static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
 
 	skb->protocol = proto;
 	skb->dev = dev;
-	skb->priority = READ_ONCE(sk->sk_priority);
+	skb->priority = sockc.priority;
 	skb->mark = sockc.mark;
 	skb_set_delivery_type_by_clockid(skb, sockc.transmit_time, sk->sk_clockid);