Message ID | 20230325152417.5403-1-kerneljasonxing@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] net: fix raising a softirq on the current cpu with rps enabled | expand |
On Sat, Mar 25, 2023 at 8:26 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > From: Jason Xing <kernelxing@tencent.com> > > Since we decide to put the skb into a backlog queue of another > cpu, we should not raise the softirq for the current cpu. When > to raise a softirq is based on whether we have more data left to > process later. As to the current cpu, there is no indication of > more data enqueued, so we do not need this action. After enqueuing > to another cpu, net_rx_action() function will call ipi and then > another cpu will raise the softirq as expected. > > Also, raising more softirqs which set the corresponding bit field > can make the IRQ mechanism think we probably need to start ksoftirqd > on the current cpu. Actually it shouldn't happen. > > Fixes: 0a9627f2649a ("rps: Receive Packet Steering") > Signed-off-by: Jason Xing <kernelxing@tencent.com> > --- > net/core/dev.c | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/net/core/dev.c b/net/core/dev.c > index 1518a366783b..bfaaa652f50c 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -4594,8 +4594,6 @@ static int napi_schedule_rps(struct softnet_data *sd) > if (sd != mysd) { > sd->rps_ipi_next = mysd->rps_ipi_list; > mysd->rps_ipi_list = sd; > - > - __raise_softirq_irqoff(NET_RX_SOFTIRQ); > return 1; > } > #endif /* CONFIG_RPS */ > -- > 2.37.3 > This is not going to work in some cases. Please take a deeper look. I have to run, if you (or others) do not find the reason, I will give more details when I am done traveling.
On Sat, Mar 25, 2023 at 11:57 PM Eric Dumazet <edumazet@google.com> wrote: > > On Sat, Mar 25, 2023 at 8:26 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > From: Jason Xing <kernelxing@tencent.com> > > > > Since we decide to put the skb into a backlog queue of another > > cpu, we should not raise the softirq for the current cpu. When > > to raise a softirq is based on whether we have more data left to > > process later. As to the current cpu, there is no indication of > > more data enqueued, so we do not need this action. After enqueuing > > to another cpu, net_rx_action() function will call ipi and then > > another cpu will raise the softirq as expected. > > > > Also, raising more softirqs which set the corresponding bit field > > can make the IRQ mechanism think we probably need to start ksoftirqd > > on the current cpu. Actually it shouldn't happen. > > > > Fixes: 0a9627f2649a ("rps: Receive Packet Steering") > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > --- > > net/core/dev.c | 2 -- > > 1 file changed, 2 deletions(-) > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > index 1518a366783b..bfaaa652f50c 100644 > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -4594,8 +4594,6 @@ static int napi_schedule_rps(struct softnet_data *sd) > > if (sd != mysd) { > > sd->rps_ipi_next = mysd->rps_ipi_list; > > mysd->rps_ipi_list = sd; > > - > > - __raise_softirq_irqoff(NET_RX_SOFTIRQ); > > return 1; > > } > > #endif /* CONFIG_RPS */ > > -- > > 2.37.3 > > > > This is not going to work in some cases. Please take a deeper look. I'll do it. I've already been struggling with this for a few days. I still have no clue. I found out in some cases that frequently starting ksoftirqd is not that good because this thread may be blocked when waiting in the runqueue. So I tried to avoid raising softirqs to mitigate the issue and then I noticed this one. > > I have to run, if you (or others) do not find the reason, I will give > more details when I am done traveling. Happy traveling :) Thanks, Jason
On Sun, Mar 26, 2023 at 9:39 AM Hillf Danton <hdanton@sina.com> wrote: > > On Sat, Mar 25, 2023 at 8:26 AM Jason Xing <kerneljasonxing@gmail.com> > > > > @@ -4594,8 +4594,6 @@ static int napi_schedule_rps(struct softnet_data *sd) > > if (sd !=3D mysd) { > > sd->rps_ipi_next =3D mysd->rps_ipi_list; > > mysd->rps_ipi_list =3D sd; > > - > > - __raise_softirq_irqoff(NET_RX_SOFTIRQ); > > return 1; > > } > > Nope, ipi should be sent. But no ipi can go without irq enabled. > Sorry, I didn't get it. IPI is sent in net_rx_action() and apparently I didn't touch this part in this patch. Here is only about whether we should raise an IRQ even if the skb will be enqueued into another cpu. > So feel free to work out what sense made by disabling irq at the call site > then directly send ipi instead.
On Sat, Mar 25, 2023 at 11:57 PM Eric Dumazet <edumazet@google.com> wrote: > > On Sat, Mar 25, 2023 at 8:26 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > From: Jason Xing <kernelxing@tencent.com> > > > > Since we decide to put the skb into a backlog queue of another > > cpu, we should not raise the softirq for the current cpu. When > > to raise a softirq is based on whether we have more data left to > > process later. As to the current cpu, there is no indication of > > more data enqueued, so we do not need this action. After enqueuing > > to another cpu, net_rx_action() function will call ipi and then > > another cpu will raise the softirq as expected. > > > > Also, raising more softirqs which set the corresponding bit field > > can make the IRQ mechanism think we probably need to start ksoftirqd > > on the current cpu. Actually it shouldn't happen. > > > > Fixes: 0a9627f2649a ("rps: Receive Packet Steering") > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > --- > > net/core/dev.c | 2 -- > > 1 file changed, 2 deletions(-) > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > index 1518a366783b..bfaaa652f50c 100644 > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -4594,8 +4594,6 @@ static int napi_schedule_rps(struct softnet_data *sd) > > if (sd != mysd) { > > sd->rps_ipi_next = mysd->rps_ipi_list; > > mysd->rps_ipi_list = sd; > > - > > - __raise_softirq_irqoff(NET_RX_SOFTIRQ); > > return 1; > > } > > #endif /* CONFIG_RPS */ > > -- > > 2.37.3 > > > > This is not going to work in some cases. Please take a deeper look. > > I have to run, if you (or others) do not find the reason, I will give > more details when I am done traveling. I'm wondering whether we could use @mysd instead of @sd like this: if (!__test_and_set_bit(NAPI_STATE_SCHED, &mysd->backlog.state)) __raise_softirq_irqoff(NET_RX_SOFTIRQ); I traced back to some historical changes and saw some relations with this commit ("net: solve a NAPI race"): https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=39e6c8208d7b6fb9d2047850fb3327db567b564b Thanks, Jason
On Sun, Mar 26, 2023 at 12:04 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > On Sat, Mar 25, 2023 at 11:57 PM Eric Dumazet <edumazet@google.com> wrote: > > > > On Sat, Mar 25, 2023 at 8:26 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > Since we decide to put the skb into a backlog queue of another > > > cpu, we should not raise the softirq for the current cpu. When > > > to raise a softirq is based on whether we have more data left to > > > process later. As to the current cpu, there is no indication of > > > more data enqueued, so we do not need this action. After enqueuing > > > to another cpu, net_rx_action() function will call ipi and then > > > another cpu will raise the softirq as expected. > > > > > > Also, raising more softirqs which set the corresponding bit field > > > can make the IRQ mechanism think we probably need to start ksoftirqd > > > on the current cpu. Actually it shouldn't happen. > > > > > > Fixes: 0a9627f2649a ("rps: Receive Packet Steering") > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > --- > > > net/core/dev.c | 2 -- > > > 1 file changed, 2 deletions(-) > > > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > > index 1518a366783b..bfaaa652f50c 100644 > > > --- a/net/core/dev.c > > > +++ b/net/core/dev.c > > > @@ -4594,8 +4594,6 @@ static int napi_schedule_rps(struct softnet_data *sd) > > > if (sd != mysd) { > > > sd->rps_ipi_next = mysd->rps_ipi_list; > > > mysd->rps_ipi_list = sd; > > > - > > > - __raise_softirq_irqoff(NET_RX_SOFTIRQ); > > > return 1; > > > } > > > #endif /* CONFIG_RPS */ > > > -- > > > 2.37.3 > > > > > > > This is not going to work in some cases. Please take a deeper look. > > > > I have to run, if you (or others) do not find the reason, I will give > > more details when I am done traveling. > > I'm wondering whether we could use @mysd instead of @sd like this: > > if (!__test_and_set_bit(NAPI_STATE_SCHED, &mysd->backlog.state)) > __raise_softirq_irqoff(NET_RX_SOFTIRQ); Ah, I have to add more precise code because the above codes may mislead people. diff --git a/net/core/dev.c b/net/core/dev.c index 1518a366783b..9ac9b32e392f 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4594,8 +4594,9 @@ static int napi_schedule_rps(struct softnet_data *sd) if (sd != mysd) { sd->rps_ipi_next = mysd->rps_ipi_list; mysd->rps_ipi_list = sd; + if (!__test_and_set_bit(NAPI_STATE_SCHED, &mysd->backlog.state)) + __raise_softirq_irqoff(NET_RX_SOFTIRQ); - __raise_softirq_irqoff(NET_RX_SOFTIRQ); return 1; } #endif /* CONFIG_RPS */ Eric, I realized that some paths don't call the ipi to notify another cpu. If someone grabs the NAPI_STATE_SCHED flag, we know that at the end of net_rx_action() or the beginning of process_backlog(), the net_rps_action_and_irq_enable() will handle the information delivery. However, if no one grabs the flag, in some paths we could not have a chance immediately to tell another cpu to raise the softirq and then process those pending data. Thus, I have to make sure if someone owns the napi poll as shown above. If I get this wrong, please correct me if you're available. Thanks in advance. > > I traced back to some historical changes and saw some relations with > this commit ("net: solve a NAPI race"): > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=39e6c8208d7b6fb9d2047850fb3327db567b564b > > Thanks, > Jason
On Sun, Mar 26, 2023 at 6:10 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > On Sun, Mar 26, 2023 at 12:04 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > On Sat, Mar 25, 2023 at 11:57 PM Eric Dumazet <edumazet@google.com> wrote: > > > > > > On Sat, Mar 25, 2023 at 8:26 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > Since we decide to put the skb into a backlog queue of another > > > > cpu, we should not raise the softirq for the current cpu. When > > > > to raise a softirq is based on whether we have more data left to > > > > process later. As to the current cpu, there is no indication of > > > > more data enqueued, so we do not need this action. After enqueuing > > > > to another cpu, net_rx_action() function will call ipi and then > > > > another cpu will raise the softirq as expected. > > > > > > > > Also, raising more softirqs which set the corresponding bit field > > > > can make the IRQ mechanism think we probably need to start ksoftirqd > > > > on the current cpu. Actually it shouldn't happen. > > > > > > > > Fixes: 0a9627f2649a ("rps: Receive Packet Steering") > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > > --- > > > > net/core/dev.c | 2 -- > > > > 1 file changed, 2 deletions(-) > > > > > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > > > index 1518a366783b..bfaaa652f50c 100644 > > > > --- a/net/core/dev.c > > > > +++ b/net/core/dev.c > > > > @@ -4594,8 +4594,6 @@ static int napi_schedule_rps(struct softnet_data *sd) > > > > if (sd != mysd) { > > > > sd->rps_ipi_next = mysd->rps_ipi_list; > > > > mysd->rps_ipi_list = sd; > > > > - > > > > - __raise_softirq_irqoff(NET_RX_SOFTIRQ); > > > > return 1; > > > > } > > > > #endif /* CONFIG_RPS */ > > > > -- > > > > 2.37.3 > > > > > > > > > > This is not going to work in some cases. Please take a deeper look. > > > > > > I have to run, if you (or others) do not find the reason, I will give > > > more details when I am done traveling. > > > > I'm wondering whether we could use @mysd instead of @sd like this: > > > > if (!__test_and_set_bit(NAPI_STATE_SCHED, &mysd->backlog.state)) > > __raise_softirq_irqoff(NET_RX_SOFTIRQ); > > Ah, I have to add more precise code because the above codes may mislead people. > > diff --git a/net/core/dev.c b/net/core/dev.c > index 1518a366783b..9ac9b32e392f 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -4594,8 +4594,9 @@ static int napi_schedule_rps(struct softnet_data *sd) > if (sd != mysd) { > sd->rps_ipi_next = mysd->rps_ipi_list; > mysd->rps_ipi_list = sd; > + if (!__test_and_set_bit(NAPI_STATE_SCHED, &mysd->backlog.state)) Forgive me. Really I need some coffee. I made a mistake. This line above should be: + if (!test_bit(NAPI_STATE_SCHED, &mysd->backlog.state)) But the whole thing doesn't feel right. I need a few days to dig into this part until Eric can help me with more of it. Thanks, Jason > + __raise_softirq_irqoff(NET_RX_SOFTIRQ); > > - __raise_softirq_irqoff(NET_RX_SOFTIRQ); > return 1; > } > #endif /* CONFIG_RPS */ > > Eric, I realized that some paths don't call the ipi to notify another > cpu. If someone grabs the NAPI_STATE_SCHED flag, we know that at the > end of net_rx_action() or the beginning of process_backlog(), the > net_rps_action_and_irq_enable() will handle the information delivery. > However, if no one grabs the flag, in some paths we could not have a > chance immediately to tell another cpu to raise the softirq and then > process those pending data. Thus, I have to make sure if someone owns > the napi poll as shown above. > > If I get this wrong, please correct me if you're available. Thanks in advance. > > > > > I traced back to some historical changes and saw some relations with > > this commit ("net: solve a NAPI race"): > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=39e6c8208d7b6fb9d2047850fb3327db567b564b > > > > Thanks, > > Jason
> > Forgive me. Really I need some coffee. I made a mistake. This line > above should be: > > + if (!test_bit(NAPI_STATE_SCHED, &mysd->backlog.state)) > > But the whole thing doesn't feel right. I need a few days to dig into > this part until Eric can help me with more of it. > I am still traveling, and this is weekend time :/ It should not be too hard to read net/core/dev.c and remember that not _all_ drivers (or some core networking functions) use the NAPI model. eg look at netif_rx() and ask yourself why your patch is buggy. Just look at callers of enqueue_to_backlog() and ask yourself if all of them are called from net_rx_action() [The answer is no, just in case you wonder] In order to add your optimization, more work is needed, like adding new parameters so that we do not miss critical __raise_softirq_irqoff(NET_RX_SOFTIRQ) when _needed_. We keep going circles around softirq deficiencies, I feel you are trying to fix a second-order 'issue'. Real cause is elsewhere, look at recent patches from Jakub. Thanks.
On Mon, Mar 27, 2023 at 1:35 AM Eric Dumazet <edumazet@google.com> wrote: > > > > > Forgive me. Really I need some coffee. I made a mistake. This line > > above should be: > > > > + if (!test_bit(NAPI_STATE_SCHED, &mysd->backlog.state)) > > > > But the whole thing doesn't feel right. I need a few days to dig into > > this part until Eric can help me with more of it. > > > > I am still traveling, and this is weekend time :/ Thanks for your time, Eric, really appreciate it. > > It should not be too hard to read net/core/dev.c and remember that not > _all_ drivers (or some core networking functions) use the NAPI model. > > eg look at netif_rx() and ask yourself why your patch is buggy. Yes, it is. In my last email I sent yesterday I encountered one issue which exactly happened when I started hundreds of iperf processes transmitting data to loopback. It got stuck :( So I realized it is the non-napi case that triggers such a problem. > > Just look at callers of enqueue_to_backlog() and ask yourself if all > of them are called from net_rx_action() > > [The answer is no, just in case you wonder] > > In order to add your optimization, more work is needed, like adding > new parameters so that we do not miss critical > __raise_softirq_irqoff(NET_RX_SOFTIRQ) when _needed_. Thanks, I need to do more work/study on it. > > We keep going circles around softirq deficiencies, I feel you are > trying to fix a second-order 'issue'. Right, going circles gives me a headache. > > Real cause is elsewhere, look at recent patches from Jakub. After you pointed out, I searched and found there is indeed one patchset in 2022 The tile like this: [PATCH 0/3] softirq: uncontroversial change Thanks, Jason > > Thanks.
diff --git a/net/core/dev.c b/net/core/dev.c index 1518a366783b..bfaaa652f50c 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4594,8 +4594,6 @@ static int napi_schedule_rps(struct softnet_data *sd) if (sd != mysd) { sd->rps_ipi_next = mysd->rps_ipi_list; mysd->rps_ipi_list = sd; - - __raise_softirq_irqoff(NET_RX_SOFTIRQ); return 1; } #endif /* CONFIG_RPS */