Message ID | YbN1OL0I1ja4Fwkb@linutronix.de (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net-next] net: dev: Always serialize on Qdisc::busylock in __dev_xmit_skb() on PREEMPT_RT. | expand |
On Fri, 2021-12-10 at 16:41 +0100, Sebastian Andrzej Siewior wrote: > The root-lock is dropped before dev_hard_start_xmit() is invoked and after > setting the __QDISC___STATE_RUNNING bit. If the Qdisc owner is preempted > by another sender/task with a higher priority then this new sender won't > be able to submit packets to the NIC directly instead they will be > enqueued into the Qdisc. The NIC will remain idle until the Qdisc owner > is scheduled again and finishes the job. > > By serializing every task on the ->busylock then the task will be > preempted by a sender only after the Qdisc has no owner. > > Always serialize on the busylock on PREEMPT_RT. Not sure how much is relevant in the RT context, but this should impact the xmit tput in a relevant, negative way. If I read correctly, you use the busylock to trigger priority ceiling on each sender. I'm wondering if there are other alternative ways (no contended lock, just some RT specific annotation) to mark a whole section of code for priority ceiling ?!? Thanks! Paolo > > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > --- > net/core/dev.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/net/core/dev.c b/net/core/dev.c > index 2a352e668d103..4a701cf2e2c10 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -3836,8 +3836,16 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q, > * separate lock before trying to get qdisc main lock. > * This permits qdisc->running owner to get the lock more > * often and dequeue packets faster. > + * On PREEMPT_RT it is possible to preempt the qdisc owner during xmit > + * and then other tasks will only enqueue packets. The packets will be > + * sent after the qdisc owner is scheduled again. To prevent this > + * scenario the task always serialize on the lock. > */ > - contended = qdisc_is_running(q); > + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) > + contended = qdisc_is_running(q); > + else > + contended = true; > + > if (unlikely(contended)) > spin_lock(&q->busylock); >
On 2021-12-10 17:35:21 [+0100], Paolo Abeni wrote: > On Fri, 2021-12-10 at 16:41 +0100, Sebastian Andrzej Siewior wrote: > > The root-lock is dropped before dev_hard_start_xmit() is invoked and after > > setting the __QDISC___STATE_RUNNING bit. If the Qdisc owner is preempted > > by another sender/task with a higher priority then this new sender won't > > be able to submit packets to the NIC directly instead they will be > > enqueued into the Qdisc. The NIC will remain idle until the Qdisc owner > > is scheduled again and finishes the job. > > > > By serializing every task on the ->busylock then the task will be > > preempted by a sender only after the Qdisc has no owner. > > > > Always serialize on the busylock on PREEMPT_RT. > > Not sure how much is relevant in the RT context, but this should impact > the xmit tput in a relevant, negative way. Negative because everyone blocks on lock and transmits packets directly instead of adding it to the queue and leaving for more? > If I read correctly, you use the busylock to trigger priority ceiling > on each sender. I'm wondering if there are other alternative ways (no > contended lock, just some RT specific annotation) to mark a whole > section of code for priority ceiling ?!? priority ceiling as you call it, happens always with the help of a lock. The root_lock is dropped in sch_direct_xmit(). qdisc_run_begin() sets only a bit with no owner association. If I know why the busy-lock bad than I could add another one. The important part is force the sende out of the section so the task with the higher priority can send packets instead of queueing them only. > Thanks! > > Paolo Sebastian
On Fri, Dec 10, 2021 at 8:46 AM Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote: > > On 2021-12-10 17:35:21 [+0100], Paolo Abeni wrote: > > On Fri, 2021-12-10 at 16:41 +0100, Sebastian Andrzej Siewior wrote: > > > The root-lock is dropped before dev_hard_start_xmit() is invoked and after > > > setting the __QDISC___STATE_RUNNING bit. If the Qdisc owner is preempted > > > by another sender/task with a higher priority then this new sender won't > > > be able to submit packets to the NIC directly instead they will be > > > enqueued into the Qdisc. The NIC will remain idle until the Qdisc owner > > > is scheduled again and finishes the job. > > > > > > By serializing every task on the ->busylock then the task will be > > > preempted by a sender only after the Qdisc has no owner. > > > > > > Always serialize on the busylock on PREEMPT_RT. > > > > Not sure how much is relevant in the RT context, but this should impact > > the xmit tput in a relevant, negative way. > > Negative because everyone blocks on lock and transmits packets directly > instead of adding it to the queue and leaving for more? > > > If I read correctly, you use the busylock to trigger priority ceiling > > on each sender. I'm wondering if there are other alternative ways (no > > contended lock, just some RT specific annotation) to mark a whole > > section of code for priority ceiling ?!? > > priority ceiling as you call it, happens always with the help of a lock. > The root_lock is dropped in sch_direct_xmit(). > qdisc_run_begin() sets only a bit with no owner association. > If I know why the busy-lock bad than I could add another one. The > important part is force the sende out of the section so the task with > the higher priority can send packets instead of queueing them only. Problem is that if you have a qdisc, qdisc_dequeue() will not know that the packet queued by high prio thread needs to shortcut all prior packets and be sent right away. Because of that, it seems just better that a high prio thread returns immediately and let the dirty work (dequeue packets and send them to devices) be done by other threads ? > > > Thanks! > > > > Paolo > > Sebastian
On 2021-12-10 08:52:55 [-0800], Eric Dumazet wrote: > > Problem is that if you have a qdisc, qdisc_dequeue() will not know that the > packet queued by high prio thread needs to shortcut all prior packets and > be sent right away. That is correct. > Because of that, it seems just better that a high prio thread returns > immediately and let the dirty work (dequeue packets and send them to devices) > be done by other threads ? Ah okay. Lets say you have a task sending a packet every 100ms. And you want that packet to leave the NIC asap. This is your task with SCHED_FIFO 70 priority. IMPORTANT otherwise. Lets say you have a ssh session on the same NIC running at SCHED_OTHER. Nothing important obviously. The NIC is idle, ssh sends a keep-alive, 80 bytes, one skb. SCHED_OTHER, SSH, CPU0 FIFO 70, IMPORTANT, CPU1 __dev_xmit_skb() -> spin_lock(root_lock); -> qdisc_run_begin() -> __test_and_set_bit(__QDISC_STATE2_RUNNING); -> sch_direct_xmit() *PREEMPT* (by something else) __dev_xmit_skb() -> spin_lock(root_lock); <--- PI-boost -> spin_unlock(root_lock); *PREEMPT* again ---> PI boost ends after unlock -> qdisc_run_begin() -> __test_and_set_bit(__QDISC_STATE2_RUNNING) (nope); -> dev_qdisc_enqueue() -> qdisc_run_begin() returns false -> spin_unlock(root_lock); at this point, we don't return to the SSH thread, instead other RT tasks are running. That means that the SSH thread, which owns the qdisc and the NIC, remains preempted and the NIC idle. 300ms pass by, the IMPORTANT thread queued two additional packets. Now, a few usecs later, all SCHED_FIFO tasks go idle on CPU0, and the SSH thread can continue with dev_hard_start_xmit() and send its packet, -> dev_hard_start_xmit() __qdisc_run() (yes, and the three additional packets) … By always acquiring the busy-lock, in the previous scenario, the task with the high-priority allows the low-priority task to run until qdisc_run_end() at which point it releases the qdisc so the high-priority task can actually send packets. One side note: This scenario requires two CPUs (one for low priority task that gets preempted and owns the qdisc, and one one for high priority task that wants to sends packets). With one CPU only the local_bh_disable() invocation would ensure that low-priority thread left the bh-disable section entirely. > > > Thanks! > > > > > > Paolo > > Sebastian
On Fri, 10 Dec 2021 16:41:44 +0100 Sebastian Andrzej Siewior wrote: > - contended = qdisc_is_running(q); > + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) > + contended = qdisc_is_running(q); > + else > + contended = true; Why not: contended = qdisc_is_running(q) || IS_ENABLED(CONFIG_PREEMPT_RT);
On Fri, 2021-12-10 at 17:46 +0100, Sebastian Andrzej Siewior wrote: > On 2021-12-10 17:35:21 [+0100], Paolo Abeni wrote: > > On Fri, 2021-12-10 at 16:41 +0100, Sebastian Andrzej Siewior wrote: > > > The root-lock is dropped before dev_hard_start_xmit() is invoked and after > > > setting the __QDISC___STATE_RUNNING bit. If the Qdisc owner is preempted > > > by another sender/task with a higher priority then this new sender won't > > > be able to submit packets to the NIC directly instead they will be > > > enqueued into the Qdisc. The NIC will remain idle until the Qdisc owner > > > is scheduled again and finishes the job. > > > > > > By serializing every task on the ->busylock then the task will be > > > preempted by a sender only after the Qdisc has no owner. > > > > > > Always serialize on the busylock on PREEMPT_RT. > > > > Not sure how much is relevant in the RT context, but this should impact > > the xmit tput in a relevant, negative way. > > Negative because everyone blocks on lock and transmits packets directly > instead of adding it to the queue and leaving for more? In the uncontended scenario this adds an atomic operation and increases the data set size. Thinking again about it, in RT build this is much less noticeable (I forgot spinlock are actually mutex in RT...) > > If I read correctly, you use the busylock to trigger priority > > ceiling > > on each sender. I'm wondering if there are other alternative ways > > (no > > contended lock, just some RT specific annotation) to mark a whole > > section of code for priority ceiling ?!? > > priority ceiling as you call it, happens always with the help of a > lock. > The root_lock is dropped in sch_direct_xmit(). > qdisc_run_begin() sets only a bit with no owner association. > If I know why the busy-lock bad than I could add another one. The > important part is force the sende out of the section so the task with > the higher priority can send packets instead of queueing them only. I thought about adding a local_lock() instead, but that will not solve. I don't see a better solution than this patch, so I'm ok with that. Please follow Jakub's suggestion to clean the code a bit. Thanks! Paolo
On 2021-12-10 20:32:56 [-0800], Jakub Kicinski wrote: > On Fri, 10 Dec 2021 16:41:44 +0100 Sebastian Andrzej Siewior wrote: > > - contended = qdisc_is_running(q); > > + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) > > + contended = qdisc_is_running(q); > > + else > > + contended = true; > > Why not: > > contended = qdisc_is_running(q) || IS_ENABLED(CONFIG_PREEMPT_RT); I could do that. But I would swap the two arguments so that the IS_ENABLED macro comes first and if true qdisc_is_running() is optimized away. Sebastian
On Mon, 13 Dec 2021 11:45:59 +0100 Sebastian Andrzej Siewior wrote: > On 2021-12-10 20:32:56 [-0800], Jakub Kicinski wrote: > > On Fri, 10 Dec 2021 16:41:44 +0100 Sebastian Andrzej Siewior wrote: > > > - contended = qdisc_is_running(q); > > > + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) > > > + contended = qdisc_is_running(q); > > > + else > > > + contended = true; > > > > Why not: > > > > contended = qdisc_is_running(q) || IS_ENABLED(CONFIG_PREEMPT_RT); > > I could do that. But I would swap the two arguments so that the > IS_ENABLED macro comes first and if true qdisc_is_running() is optimized > away. FWIW I disagree. My version was more readable. The sprinkling of the PREEMPT_RT tosh is getting a little annoying. Trying to regress the clarity of the code should be top of mind here.
On 2021-12-13 08:15:56 [-0800], Jakub Kicinski wrote: > FWIW I disagree. My version was more readable. The sprinkling of the > PREEMPT_RT tosh is getting a little annoying. Trying to regress the > clarity of the code should be top of mind here. No worries. Let me spin a new version with a swap. Sebastian
On Mon, 13 Dec 2021 17:18:05 +0100 Sebastian Andrzej Siewior wrote: > On 2021-12-13 08:15:56 [-0800], Jakub Kicinski wrote: > > FWIW I disagree. My version was more readable. The sprinkling of the > > PREEMPT_RT tosh is getting a little annoying. Trying to regress the > > clarity of the code should be top of mind here. > > No worries. Let me spin a new version with a swap. Dave applied your previous version, it's not a big deal, we'll live.
On 2021-12-13 08:20:46 [-0800], Jakub Kicinski wrote: > On Mon, 13 Dec 2021 17:18:05 +0100 Sebastian Andrzej Siewior wrote: > > On 2021-12-13 08:15:56 [-0800], Jakub Kicinski wrote: > > > FWIW I disagree. My version was more readable. The sprinkling of the > > > PREEMPT_RT tosh is getting a little annoying. Trying to regress the > > > clarity of the code should be top of mind here. > > > > No worries. Let me spin a new version with a swap. > > Dave applied your previous version, it's not a big deal, we'll live. but happy. Not just live. I've sent a follow-up. I need the network department in a happy mood :) Sebastian
diff --git a/net/core/dev.c b/net/core/dev.c index 2a352e668d103..4a701cf2e2c10 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3836,8 +3836,16 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q, * separate lock before trying to get qdisc main lock. * This permits qdisc->running owner to get the lock more * often and dequeue packets faster. + * On PREEMPT_RT it is possible to preempt the qdisc owner during xmit + * and then other tasks will only enqueue packets. The packets will be + * sent after the qdisc owner is scheduled again. To prevent this + * scenario the task always serialize on the lock. */ - contended = qdisc_is_running(q); + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) + contended = qdisc_is_running(q); + else + contended = true; + if (unlikely(contended)) spin_lock(&q->busylock);
The root-lock is dropped before dev_hard_start_xmit() is invoked and after setting the __QDISC___STATE_RUNNING bit. If the Qdisc owner is preempted by another sender/task with a higher priority then this new sender won't be able to submit packets to the NIC directly instead they will be enqueued into the Qdisc. The NIC will remain idle until the Qdisc owner is scheduled again and finishes the job. By serializing every task on the ->busylock then the task will be preempted by a sender only after the Qdisc has no owner. Always serialize on the busylock on PREEMPT_RT. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- net/core/dev.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)