Message ID | 20240930092416.80830-2-kerneljasonxing@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net-timestamp: add some trivial | expand |
Jason Xing wrote: > From: Jason Xing <kernelxing@tencent.com> > > Even though this case is unlikely to happen, we have to avoid such > a case occurring at an earlier point: the sk_rmem_alloc could get > increased because of inserting more and more skbs into the errqueue > when calling __skb_complete_tx_timestamp(). This bad case would stop > the socket transmitting soon. It is up to the application to read from the error queue frequently enough and/or increase SO_RCVBUF. > Signed-off-by: Jason Xing <kernelxing@tencent.com> > --- > net/core/sock.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/net/core/sock.c b/net/core/sock.c > index fe87f9bd8f16..4bddd6f62e4f 100644 > --- a/net/core/sock.c > +++ b/net/core/sock.c > @@ -905,6 +905,10 @@ int sock_set_timestamping(struct sock *sk, int optname, > if (val & ~SOF_TIMESTAMPING_MASK) > return -EINVAL; > > + if (val & SOF_TIMESTAMPING_TX_RECORD_MASK && > + !(val & SOF_TIMESTAMPING_SOFTWARE)) > + return -EINVAL; > + This breaks hardware timestamping > if (val & SOF_TIMESTAMPING_OPT_ID_TCP && > !(val & SOF_TIMESTAMPING_OPT_ID)) > return -EINVAL; > -- > 2.37.3 >
On 30/09/2024 10:24, Jason Xing wrote: > From: Jason Xing <kernelxing@tencent.com> > > Even though this case is unlikely to happen, we have to avoid such > a case occurring at an earlier point: the sk_rmem_alloc could get > increased because of inserting more and more skbs into the errqueue > when calling __skb_complete_tx_timestamp(). This bad case would stop > the socket transmitting soon. > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > --- > net/core/sock.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/net/core/sock.c b/net/core/sock.c > index fe87f9bd8f16..4bddd6f62e4f 100644 > --- a/net/core/sock.c > +++ b/net/core/sock.c > @@ -905,6 +905,10 @@ int sock_set_timestamping(struct sock *sk, int optname, > if (val & ~SOF_TIMESTAMPING_MASK) > return -EINVAL; > > + if (val & SOF_TIMESTAMPING_TX_RECORD_MASK && > + !(val & SOF_TIMESTAMPING_SOFTWARE)) > + return -EINVAL; SOF_TIMESTAMPING_TX_RECORD_MASK contains SOF_TIMESTAMPING_TX_HARDWARE. That means that there will be no option to enable HW TX timestamping without enabling software timestamping. I believe this is wrong restriction. > + > if (val & SOF_TIMESTAMPING_OPT_ID_TCP && > !(val & SOF_TIMESTAMPING_OPT_ID)) > return -EINVAL;
On Mon, Sep 30, 2024 at 6:48 PM Vadim Fedorenko <vadim.fedorenko@linux.dev> wrote: > > On 30/09/2024 10:24, Jason Xing wrote: > > From: Jason Xing <kernelxing@tencent.com> > > > > Even though this case is unlikely to happen, we have to avoid such > > a case occurring at an earlier point: the sk_rmem_alloc could get > > increased because of inserting more and more skbs into the errqueue > > when calling __skb_complete_tx_timestamp(). This bad case would stop > > the socket transmitting soon. > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > --- > > net/core/sock.c | 4 ++++ > > 1 file changed, 4 insertions(+) > > > > diff --git a/net/core/sock.c b/net/core/sock.c > > index fe87f9bd8f16..4bddd6f62e4f 100644 > > --- a/net/core/sock.c > > +++ b/net/core/sock.c > > @@ -905,6 +905,10 @@ int sock_set_timestamping(struct sock *sk, int optname, > > if (val & ~SOF_TIMESTAMPING_MASK) > > return -EINVAL; > > > > + if (val & SOF_TIMESTAMPING_TX_RECORD_MASK && > > + !(val & SOF_TIMESTAMPING_SOFTWARE)) > > + return -EINVAL; > > SOF_TIMESTAMPING_TX_RECORD_MASK contains SOF_TIMESTAMPING_TX_HARDWARE. > That means that there will be no option to enable HW TX timestamping > without enabling software timestamping. I believe this is wrong > restriction. Thanks! You are right. I should have realized that. I need to get rid of this one. Thanks, Jason
On Mon, Sep 30, 2024 at 6:39 PM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > Jason Xing wrote: > > From: Jason Xing <kernelxing@tencent.com> > > > > Even though this case is unlikely to happen, we have to avoid such > > a case occurring at an earlier point: the sk_rmem_alloc could get > > increased because of inserting more and more skbs into the errqueue > > when calling __skb_complete_tx_timestamp(). This bad case would stop > > the socket transmitting soon. > > It is up to the application to read from the error queue frequently > enough and/or increase SO_RCVBUF. Sure thing. If we test it without setting SOF_TIMESTAMPING_SOFTWARE on the loopback, it will soon stop. That's the reason why I tried to add the restriction just in case. > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > --- > > net/core/sock.c | 4 ++++ > > 1 file changed, 4 insertions(+) > > > > diff --git a/net/core/sock.c b/net/core/sock.c > > index fe87f9bd8f16..4bddd6f62e4f 100644 > > --- a/net/core/sock.c > > +++ b/net/core/sock.c > > @@ -905,6 +905,10 @@ int sock_set_timestamping(struct sock *sk, int optname, > > if (val & ~SOF_TIMESTAMPING_MASK) > > return -EINVAL; > > > > + if (val & SOF_TIMESTAMPING_TX_RECORD_MASK && > > + !(val & SOF_TIMESTAMPING_SOFTWARE)) > > + return -EINVAL; > > + > > This breaks hardware timestamping Yes, and sorry about that. I'll fix this. > > > if (val & SOF_TIMESTAMPING_OPT_ID_TCP && > > !(val & SOF_TIMESTAMPING_OPT_ID)) > > return -EINVAL; > > -- > > 2.37.3 > > > >
Jason Xing wrote: > On Mon, Sep 30, 2024 at 6:39 PM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > Jason Xing wrote: > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > Even though this case is unlikely to happen, we have to avoid such > > > a case occurring at an earlier point: the sk_rmem_alloc could get > > > increased because of inserting more and more skbs into the errqueue > > > when calling __skb_complete_tx_timestamp(). This bad case would stop > > > the socket transmitting soon. > > > > It is up to the application to read from the error queue frequently > > enough and/or increase SO_RCVBUF. > > Sure thing. If we test it without setting SOF_TIMESTAMPING_SOFTWARE on > the loopback, it will soon stop. That's the reason why I tried to add > the restriction just in case. I don't follow at all. That bit does not affect the core issue: that the application is not clearing its error queue quickly enough. > > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > --- > > > net/core/sock.c | 4 ++++ > > > 1 file changed, 4 insertions(+) > > > > > > diff --git a/net/core/sock.c b/net/core/sock.c > > > index fe87f9bd8f16..4bddd6f62e4f 100644 > > > --- a/net/core/sock.c > > > +++ b/net/core/sock.c > > > @@ -905,6 +905,10 @@ int sock_set_timestamping(struct sock *sk, int optname, > > > if (val & ~SOF_TIMESTAMPING_MASK) > > > return -EINVAL; > > > > > > + if (val & SOF_TIMESTAMPING_TX_RECORD_MASK && > > > + !(val & SOF_TIMESTAMPING_SOFTWARE)) > > > + return -EINVAL; > > > + > > > > This breaks hardware timestamping > > Yes, and sorry about that. I'll fix this. As is I don't understand the purpose of this patch. Please do not just resubmit with a change, but explain the problem and suggested solution first. > > > > > if (val & SOF_TIMESTAMPING_OPT_ID_TCP && > > > !(val & SOF_TIMESTAMPING_OPT_ID)) > > > return -EINVAL; > > > -- > > > 2.37.3 > > > > > > >
On Mon, Sep 30, 2024 at 7:49 PM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > Jason Xing wrote: > > On Mon, Sep 30, 2024 at 6:39 PM Willem de Bruijn > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > Jason Xing wrote: > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > Even though this case is unlikely to happen, we have to avoid such > > > > a case occurring at an earlier point: the sk_rmem_alloc could get > > > > increased because of inserting more and more skbs into the errqueue > > > > when calling __skb_complete_tx_timestamp(). This bad case would stop > > > > the socket transmitting soon. > > > > > > It is up to the application to read from the error queue frequently > > > enough and/or increase SO_RCVBUF. > > > > Sure thing. If we test it without setting SOF_TIMESTAMPING_SOFTWARE on > > the loopback, it will soon stop. That's the reason why I tried to add > > the restriction just in case. > > I don't follow at all. > > That bit does not affect the core issue: that the application is not > clearing its error queue quickly enough. > > > > > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > > --- > > > > net/core/sock.c | 4 ++++ > > > > 1 file changed, 4 insertions(+) > > > > > > > > diff --git a/net/core/sock.c b/net/core/sock.c > > > > index fe87f9bd8f16..4bddd6f62e4f 100644 > > > > --- a/net/core/sock.c > > > > +++ b/net/core/sock.c > > > > @@ -905,6 +905,10 @@ int sock_set_timestamping(struct sock *sk, int optname, > > > > if (val & ~SOF_TIMESTAMPING_MASK) > > > > return -EINVAL; > > > > > > > > + if (val & SOF_TIMESTAMPING_TX_RECORD_MASK && > > > > + !(val & SOF_TIMESTAMPING_SOFTWARE)) > > > > + return -EINVAL; > > > > + > > > > > > This breaks hardware timestamping > > > > Yes, and sorry about that. I'll fix this. > > As is I don't understand the purpose of this patch. Please do not > just resubmit with a change, but explain the problem and suggested > solution first. > I will drop this patch because I just tested with my program in the local machine and found there is one mistake I made about calculating the diff between those two . Sorry for the noise. Well, I only need to send a V2 patch of patch [3/3] in the next few days. BTW, please allow me to ask one question unrelated to this patch again. I do wonder: if we batch the recv skbs from the errqueue when calling tcp_recvmsg() -> inet_recv_error(), it could break users, right? Thanks, Jason
Jason Xing wrote: > On Mon, Sep 30, 2024 at 7:49 PM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > Jason Xing wrote: > > > On Mon, Sep 30, 2024 at 6:39 PM Willem de Bruijn > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > Jason Xing wrote: > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > > > Even though this case is unlikely to happen, we have to avoid such > > > > > a case occurring at an earlier point: the sk_rmem_alloc could get > > > > > increased because of inserting more and more skbs into the errqueue > > > > > when calling __skb_complete_tx_timestamp(). This bad case would stop > > > > > the socket transmitting soon. > > > > > > > > It is up to the application to read from the error queue frequently > > > > enough and/or increase SO_RCVBUF. > > > > > > Sure thing. If we test it without setting SOF_TIMESTAMPING_SOFTWARE on > > > the loopback, it will soon stop. That's the reason why I tried to add > > > the restriction just in case. > > > > I don't follow at all. > > > > That bit does not affect the core issue: that the application is not > > clearing its error queue quickly enough. > > > > > > > > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > > > --- > > > > > net/core/sock.c | 4 ++++ > > > > > 1 file changed, 4 insertions(+) > > > > > > > > > > diff --git a/net/core/sock.c b/net/core/sock.c > > > > > index fe87f9bd8f16..4bddd6f62e4f 100644 > > > > > --- a/net/core/sock.c > > > > > +++ b/net/core/sock.c > > > > > @@ -905,6 +905,10 @@ int sock_set_timestamping(struct sock *sk, int optname, > > > > > if (val & ~SOF_TIMESTAMPING_MASK) > > > > > return -EINVAL; > > > > > > > > > > + if (val & SOF_TIMESTAMPING_TX_RECORD_MASK && > > > > > + !(val & SOF_TIMESTAMPING_SOFTWARE)) > > > > > + return -EINVAL; > > > > > + > > > > > > > > This breaks hardware timestamping > > > > > > Yes, and sorry about that. I'll fix this. > > > > As is I don't understand the purpose of this patch. Please do not > > just resubmit with a change, but explain the problem and suggested > > solution first. > > > > I will drop this patch because I just tested with my program in the > local machine and found there is one mistake I made about calculating > the diff between those two . Sorry for the noise. > > Well, I only need to send a V2 patch of patch [3/3] in the next few days. > > BTW, please allow me to ask one question unrelated to this patch > again. I do wonder: if we batch the recv skbs from the errqueue when > calling tcp_recvmsg() -> inet_recv_error(), it could break users, > right? Analogous to __msg_zerocopy_callback with __msg_zerocopy_callback. Only here we cannot return range-based results and thus cannot just expand the range of the one outstanding notification. This would mean in ip(v6)_recv_error calling sock_dequeue_err_skb, sock_recv_timestamp and put_cmsg IP_RECVERR multiple times. And ip_cmsg_recv if needed. Existing applications do not have to expect multiple results per single recvmsg call. So enabling that unconditionally could break them. Adding this will require a new flag. An sk_tsflag is the obvious approach. Interpreting a MSG_* flag passed to recvmsg would be another option. If there is a bit that can be set with MSG_ERRQUEUE and cannot already be set currently. But I don't think that's the case. We allow all bits and ignore any undefined ones.
On Tue, Oct 1, 2024 at 1:14 AM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > Jason Xing wrote: > > On Mon, Sep 30, 2024 at 7:49 PM Willem de Bruijn > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > Jason Xing wrote: > > > > On Mon, Sep 30, 2024 at 6:39 PM Willem de Bruijn > > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > > > Jason Xing wrote: > > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > > > > > Even though this case is unlikely to happen, we have to avoid such > > > > > > a case occurring at an earlier point: the sk_rmem_alloc could get > > > > > > increased because of inserting more and more skbs into the errqueue > > > > > > when calling __skb_complete_tx_timestamp(). This bad case would stop > > > > > > the socket transmitting soon. > > > > > > > > > > It is up to the application to read from the error queue frequently > > > > > enough and/or increase SO_RCVBUF. > > > > > > > > Sure thing. If we test it without setting SOF_TIMESTAMPING_SOFTWARE on > > > > the loopback, it will soon stop. That's the reason why I tried to add > > > > the restriction just in case. > > > > > > I don't follow at all. > > > > > > That bit does not affect the core issue: that the application is not > > > clearing its error queue quickly enough. > > > > > > > > > > > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > > > > --- > > > > > > net/core/sock.c | 4 ++++ > > > > > > 1 file changed, 4 insertions(+) > > > > > > > > > > > > diff --git a/net/core/sock.c b/net/core/sock.c > > > > > > index fe87f9bd8f16..4bddd6f62e4f 100644 > > > > > > --- a/net/core/sock.c > > > > > > +++ b/net/core/sock.c > > > > > > @@ -905,6 +905,10 @@ int sock_set_timestamping(struct sock *sk, int optname, > > > > > > if (val & ~SOF_TIMESTAMPING_MASK) > > > > > > return -EINVAL; > > > > > > > > > > > > + if (val & SOF_TIMESTAMPING_TX_RECORD_MASK && > > > > > > + !(val & SOF_TIMESTAMPING_SOFTWARE)) > > > > > > + return -EINVAL; > > > > > > + > > > > > > > > > > This breaks hardware timestamping > > > > > > > > Yes, and sorry about that. I'll fix this. > > > > > > As is I don't understand the purpose of this patch. Please do not > > > just resubmit with a change, but explain the problem and suggested > > > solution first. > > > > > > > I will drop this patch because I just tested with my program in the > > local machine and found there is one mistake I made about calculating > > the diff between those two . Sorry for the noise. > > > > Well, I only need to send a V2 patch of patch [3/3] in the next few days. > > > > BTW, please allow me to ask one question unrelated to this patch > > again. I do wonder: if we batch the recv skbs from the errqueue when > > calling tcp_recvmsg() -> inet_recv_error(), it could break users, > > right? > > Analogous to __msg_zerocopy_callback with __msg_zerocopy_callback. > > Only here we cannot return range-based results and thus cannot just > expand the range of the one outstanding notification. > > This would mean in ip(v6)_recv_error calling sock_dequeue_err_skb, > sock_recv_timestamp and put_cmsg IP_RECVERR multiple times. And > ip_cmsg_recv if needed. > > Existing applications do not have to expect multiple results per > single recvmsg call. So enabling that unconditionally could break > them. Thanks for your explanation! I was unsure because I read some use cases in github and txtimestamp.c, they can only handle one err skb at one time. > > Adding this will require a new flag. An sk_tsflag is the obvious > approach. > > Interpreting a MSG_* flag passed to recvmsg would be > another option. If there is a bit that can be set with MSG_ERRQUEUE > and cannot already be set currently. But I don't think that's the > case. We allow all bits and ignore any undefined ones. Do you feel it is necessary that we can implement this idea to optimize it, saving 2 or 3 syscalls at most at one time? IIRC, it's you who proposed that we can batch them when applying the tracepoints mechanism after I gave a presentation at netconf :) It's really good. That inspires me a lot and makes me keep wondering if we can do this these days. Since I've already finished the bpf for timestamping feature locally which bypasses receiving skbs from errqueue, I believe it could be helpful for those applications that still have tendency to use the "traditional way" to trace. What are your thoughts on this? If you agree, do you want to do this on your own or allow me to give it a try? Thanks, Jason
Jason Xing wrote: > On Tue, Oct 1, 2024 at 1:14 AM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > Jason Xing wrote: > > > On Mon, Sep 30, 2024 at 7:49 PM Willem de Bruijn > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > Jason Xing wrote: > > > > > On Mon, Sep 30, 2024 at 6:39 PM Willem de Bruijn > > > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > > > > > Jason Xing wrote: > > > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > > > > > > > Even though this case is unlikely to happen, we have to avoid such > > > > > > > a case occurring at an earlier point: the sk_rmem_alloc could get > > > > > > > increased because of inserting more and more skbs into the errqueue > > > > > > > when calling __skb_complete_tx_timestamp(). This bad case would stop > > > > > > > the socket transmitting soon. > > > > > > > > > > > > It is up to the application to read from the error queue frequently > > > > > > enough and/or increase SO_RCVBUF. > > > > > > > > > > Sure thing. If we test it without setting SOF_TIMESTAMPING_SOFTWARE on > > > > > the loopback, it will soon stop. That's the reason why I tried to add > > > > > the restriction just in case. > > > > > > > > I don't follow at all. > > > > > > > > That bit does not affect the core issue: that the application is not > > > > clearing its error queue quickly enough. > > > > > > > > > > > > > > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > > > > > --- > > > > > > > net/core/sock.c | 4 ++++ > > > > > > > 1 file changed, 4 insertions(+) > > > > > > > > > > > > > > diff --git a/net/core/sock.c b/net/core/sock.c > > > > > > > index fe87f9bd8f16..4bddd6f62e4f 100644 > > > > > > > --- a/net/core/sock.c > > > > > > > +++ b/net/core/sock.c > > > > > > > @@ -905,6 +905,10 @@ int sock_set_timestamping(struct sock *sk, int optname, > > > > > > > if (val & ~SOF_TIMESTAMPING_MASK) > > > > > > > return -EINVAL; > > > > > > > > > > > > > > + if (val & SOF_TIMESTAMPING_TX_RECORD_MASK && > > > > > > > + !(val & SOF_TIMESTAMPING_SOFTWARE)) > > > > > > > + return -EINVAL; > > > > > > > + > > > > > > > > > > > > This breaks hardware timestamping > > > > > > > > > > Yes, and sorry about that. I'll fix this. > > > > > > > > As is I don't understand the purpose of this patch. Please do not > > > > just resubmit with a change, but explain the problem and suggested > > > > solution first. > > > > > > > > > > I will drop this patch because I just tested with my program in the > > > local machine and found there is one mistake I made about calculating > > > the diff between those two . Sorry for the noise. > > > > > > Well, I only need to send a V2 patch of patch [3/3] in the next few days. > > > > > > BTW, please allow me to ask one question unrelated to this patch > > > again. I do wonder: if we batch the recv skbs from the errqueue when > > > calling tcp_recvmsg() -> inet_recv_error(), it could break users, > > > right? > > > > Analogous to __msg_zerocopy_callback with __msg_zerocopy_callback. > > > > Only here we cannot return range-based results and thus cannot just > > expand the range of the one outstanding notification. > > > > This would mean in ip(v6)_recv_error calling sock_dequeue_err_skb, > > sock_recv_timestamp and put_cmsg IP_RECVERR multiple times. And > > ip_cmsg_recv if needed. > > > > Existing applications do not have to expect multiple results per > > single recvmsg call. So enabling that unconditionally could break > > them. > > Thanks for your explanation! I was unsure because I read some use > cases in github and txtimestamp.c, they can only handle one err skb at > one time. > > > > > Adding this will require a new flag. An sk_tsflag is the obvious > > approach. > > > > Interpreting a MSG_* flag passed to recvmsg would be > > another option. If there is a bit that can be set with MSG_ERRQUEUE > > and cannot already be set currently. But I don't think that's the > > case. We allow all bits and ignore any undefined ones. > > Do you feel it is necessary that we can implement this idea to > optimize it, saving 2 or 3 syscalls at most at one time? IIRC, it's > you who proposed that we can batch them when applying the tracepoints > mechanism after I gave a presentation at netconf :) It's really good. > That inspires me a lot and makes me keep wondering if we can do this > these days. > > Since I've already finished the bpf for timestamping feature locally > which bypasses receiving skbs from errqueue, That's great! > I believe it could be > helpful for those applications that still have tendency to use the > "traditional way" to trace. > > What are your thoughts on this? If you agree, do you want to do this > on your own or allow me to give it a try? I'd focus on the workload that you care about most, which is the administrator driven interface, which will use BPF. This micro optimization would need some benchmarks that show that it has a measurable effect.
On Tue, Oct 1, 2024 at 2:15 AM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > Jason Xing wrote: > > On Tue, Oct 1, 2024 at 1:14 AM Willem de Bruijn > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > Jason Xing wrote: > > > > On Mon, Sep 30, 2024 at 7:49 PM Willem de Bruijn > > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > > > Jason Xing wrote: > > > > > > On Mon, Sep 30, 2024 at 6:39 PM Willem de Bruijn > > > > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > > > > > > > Jason Xing wrote: > > > > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > > > > > > > > > Even though this case is unlikely to happen, we have to avoid such > > > > > > > > a case occurring at an earlier point: the sk_rmem_alloc could get > > > > > > > > increased because of inserting more and more skbs into the errqueue > > > > > > > > when calling __skb_complete_tx_timestamp(). This bad case would stop > > > > > > > > the socket transmitting soon. > > > > > > > > > > > > > > It is up to the application to read from the error queue frequently > > > > > > > enough and/or increase SO_RCVBUF. > > > > > > > > > > > > Sure thing. If we test it without setting SOF_TIMESTAMPING_SOFTWARE on > > > > > > the loopback, it will soon stop. That's the reason why I tried to add > > > > > > the restriction just in case. > > > > > > > > > > I don't follow at all. > > > > > > > > > > That bit does not affect the core issue: that the application is not > > > > > clearing its error queue quickly enough. > > > > > > > > > > > > > > > > > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > > > > > > --- > > > > > > > > net/core/sock.c | 4 ++++ > > > > > > > > 1 file changed, 4 insertions(+) > > > > > > > > > > > > > > > > diff --git a/net/core/sock.c b/net/core/sock.c > > > > > > > > index fe87f9bd8f16..4bddd6f62e4f 100644 > > > > > > > > --- a/net/core/sock.c > > > > > > > > +++ b/net/core/sock.c > > > > > > > > @@ -905,6 +905,10 @@ int sock_set_timestamping(struct sock *sk, int optname, > > > > > > > > if (val & ~SOF_TIMESTAMPING_MASK) > > > > > > > > return -EINVAL; > > > > > > > > > > > > > > > > + if (val & SOF_TIMESTAMPING_TX_RECORD_MASK && > > > > > > > > + !(val & SOF_TIMESTAMPING_SOFTWARE)) > > > > > > > > + return -EINVAL; > > > > > > > > + > > > > > > > > > > > > > > This breaks hardware timestamping > > > > > > > > > > > > Yes, and sorry about that. I'll fix this. > > > > > > > > > > As is I don't understand the purpose of this patch. Please do not > > > > > just resubmit with a change, but explain the problem and suggested > > > > > solution first. > > > > > > > > > > > > > I will drop this patch because I just tested with my program in the > > > > local machine and found there is one mistake I made about calculating > > > > the diff between those two . Sorry for the noise. > > > > > > > > Well, I only need to send a V2 patch of patch [3/3] in the next few days. > > > > > > > > BTW, please allow me to ask one question unrelated to this patch > > > > again. I do wonder: if we batch the recv skbs from the errqueue when > > > > calling tcp_recvmsg() -> inet_recv_error(), it could break users, > > > > right? > > > > > > Analogous to __msg_zerocopy_callback with __msg_zerocopy_callback. > > > > > > Only here we cannot return range-based results and thus cannot just > > > expand the range of the one outstanding notification. > > > > > > This would mean in ip(v6)_recv_error calling sock_dequeue_err_skb, > > > sock_recv_timestamp and put_cmsg IP_RECVERR multiple times. And > > > ip_cmsg_recv if needed. > > > > > > Existing applications do not have to expect multiple results per > > > single recvmsg call. So enabling that unconditionally could break > > > them. > > > > Thanks for your explanation! I was unsure because I read some use > > cases in github and txtimestamp.c, they can only handle one err skb at > > one time. > > > > > > > > Adding this will require a new flag. An sk_tsflag is the obvious > > > approach. > > > > > > Interpreting a MSG_* flag passed to recvmsg would be > > > another option. If there is a bit that can be set with MSG_ERRQUEUE > > > and cannot already be set currently. But I don't think that's the > > > case. We allow all bits and ignore any undefined ones. > > > > Do you feel it is necessary that we can implement this idea to > > optimize it, saving 2 or 3 syscalls at most at one time? IIRC, it's > > you who proposed that we can batch them when applying the tracepoints > > mechanism after I gave a presentation at netconf :) It's really good. > > That inspires me a lot and makes me keep wondering if we can do this > > these days. > > > > Since I've already finished the bpf for timestamping feature locally > > which bypasses receiving skbs from errqueue, > > That's great! > > > I believe it could be > > helpful for those applications that still have tendency to use the > > "traditional way" to trace. > > > > What are your thoughts on this? If you agree, do you want to do this > > on your own or allow me to give it a try? > > I'd focus on the workload that you care about most, which is the > administrator driven interface, which will use BPF. > > This micro optimization would need some benchmarks that show that it > has a measurable effect. Got it. I will post that series soon. Thanks, Jason
diff --git a/net/core/sock.c b/net/core/sock.c index fe87f9bd8f16..4bddd6f62e4f 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -905,6 +905,10 @@ int sock_set_timestamping(struct sock *sk, int optname, if (val & ~SOF_TIMESTAMPING_MASK) return -EINVAL; + if (val & SOF_TIMESTAMPING_TX_RECORD_MASK && + !(val & SOF_TIMESTAMPING_SOFTWARE)) + return -EINVAL; + if (val & SOF_TIMESTAMPING_OPT_ID_TCP && !(val & SOF_TIMESTAMPING_OPT_ID)) return -EINVAL;