| Message ID | 828f9d2e9fa1738463a443140eb8f071e41a6fb9.1667897099.git.geliang.tang@suse.com (mailing list archive) |
| --- | --- |
| State | Superseded, archived |
| Series | BPF redundant scheduler |
| Context | Check | Description |
| --- | --- | --- |
| matttbe/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 202 lines checked |
| matttbe/build | success | Build and static analysis OK |
| matttbe/KVM_Validation__normal | success | Success! ✅ |
| matttbe/KVM_Validation__debug | fail | Critical: 2 Call Trace(s) ❌ |
On Tue, 8 Nov 2022, Geliang Tang wrote:

> To support redundant package schedulers more easily, this patch refactors
> __mptcp_push_pending() logic from:
>
> For each dfrag:
>     While sends succeed:
>         Call the scheduler (selects subflow and msk->snd_burst)
>         Update subflow locks (push/release/acquire as needed)
>         Send the dfrag data with mptcp_sendmsg_frag()
>         Update already_sent, snd_nxt, snd_burst
>     Update msk->first_pending
> Push/release on final subflow
>
> ->
>
> While first_pending isn't empty:
>     Call the scheduler (selects subflow and msk->snd_burst)
>     Update subflow locks (push/release/acquire as needed)
>     For each pending dfrag:
>         While sends succeed:
>             Send the dfrag data with mptcp_sendmsg_frag()
>             Update already_sent, snd_nxt, snd_burst
>         Update msk->first_pending
>         Break if required by msk->snd_burst / etc
> Push/release on final subflow
>
> Refactors __mptcp_subflow_push_pending logic from:
>
> For each dfrag:
>     While sends succeed:
>         Call the scheduler (selects subflow and msk->snd_burst)
>         Send the dfrag data with mptcp_subflow_delegate(), break
>         Send the dfrag data with mptcp_sendmsg_frag()
>         Update dfrag->already_sent, msk->snd_nxt, msk->snd_burst
>     Update msk->first_pending
>
> ->
>
> While first_pending isn't empty:
>     Call the scheduler (selects subflow and msk->snd_burst)
>     Send the dfrag data with mptcp_subflow_delegate(), break
>     Send the dfrag data with mptcp_sendmsg_frag()
>     For each pending dfrag:
>         While sends succeed:
>             Send the dfrag data with mptcp_sendmsg_frag()
>             Update already_sent, snd_nxt, snd_burst
>         Update msk->first_pending
>         Break if required by msk->snd_burst / etc
>
> Move the duplicate code from __mptcp_push_pending() and
> __mptcp_subflow_push_pending() into a new helper function, named
> __subflow_push_pending(). Simplify __mptcp_push_pending() and
> __mptcp_subflow_push_pending() by invoking this helper.
>
> Also move the burst check conditions out of the function
> mptcp_subflow_get_send(), check them in __subflow_push_pending() in
> the inner "for each pending dfrag" loop.
>
> Signed-off-by: Geliang Tang <geliang.tang@suse.com>
> ---
>  net/mptcp/protocol.c | 155 +++++++++++++++++++++++--------------------
>  1 file changed, 82 insertions(+), 73 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index ede644556b20..1fb3b46fa427 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -1426,14 +1426,6 @@ static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk)
>             sk_stream_memory_free(msk->first) ? msk->first : NULL;
>     }
>
> -   /* re-use last subflow, if the burst allow that */
> -   if (msk->last_snd && msk->snd_burst > 0 &&
> -       sk_stream_memory_free(msk->last_snd) &&
> -       mptcp_subflow_active(mptcp_subflow_ctx(msk->last_snd))) {
> -           mptcp_set_timeout(sk);
> -           return msk->last_snd;
> -   }
> -
>     /* pick the subflow with the lower wmem/wspace ratio */
>     for (i = 0; i < SSK_MODE_MAX; ++i) {
>             send_info[i].ssk = NULL;
> @@ -1537,57 +1529,86 @@ void mptcp_check_and_set_pending(struct sock *sk)
>             mptcp_sk(sk)->push_pending |= BIT(MPTCP_PUSH_PENDING);
>  }
>
> -void __mptcp_push_pending(struct sock *sk, unsigned int flags)
> +static int __subflow_push_pending(struct sock *sk, struct sock *ssk,
> +                                 struct mptcp_sendmsg_info *info)
>  {
> -   struct sock *prev_ssk = NULL, *ssk = NULL;
>     struct mptcp_sock *msk = mptcp_sk(sk);
> -   struct mptcp_sendmsg_info info = {
> -           .flags = flags,
> -   };
> -   bool do_check_data_fin = false;
>     struct mptcp_data_frag *dfrag;
> -   int len;
> +   int len, copied = 0, err = 0;
>
>     while ((dfrag = mptcp_send_head(sk))) {
> -           info.sent = dfrag->already_sent;
> -           info.limit = dfrag->data_len;
> +           info->sent = dfrag->already_sent;
> +           info->limit = dfrag->data_len;
>             len = dfrag->data_len - dfrag->already_sent;
>             while (len > 0) {
>                     int ret = 0;
>
> -                   prev_ssk = ssk;
> -                   ssk = mptcp_subflow_get_send(msk);
> -
> -                   /* First check. If the ssk has changed since
> -                    * the last round, release prev_ssk
> -                    */
> -                   if (ssk != prev_ssk && prev_ssk)
> -                           mptcp_push_release(prev_ssk, &info);
> -                   if (!ssk)
> -                           goto out;
> -
> -                   /* Need to lock the new subflow only if different
> -                    * from the previous one, otherwise we are still
> -                    * helding the relevant lock
> -                    */
> -                   if (ssk != prev_ssk)
> -                           lock_sock(ssk);
> -
> -                   ret = mptcp_sendmsg_frag(sk, ssk, dfrag, &info);
> +                   ret = mptcp_sendmsg_frag(sk, ssk, dfrag, info);
>                     if (ret <= 0) {
> -                           if (ret == -EAGAIN)
> -                                   continue;
> -                           mptcp_push_release(ssk, &info);
> +                           err = copied ? : ret;
>                             goto out;
>                     }
>
> -                   do_check_data_fin = true;
> -                   info.sent += ret;
> +                   info->sent += ret;
> +                   copied += ret;
>                     len -= ret;
>
>                     mptcp_update_post_push(msk, dfrag, ret);
>             }
>             WRITE_ONCE(msk->first_pending, mptcp_send_next(sk));
> +
> +           if (msk->snd_burst <= 0 ||
> +               !sk_stream_memory_free(ssk) ||
> +               !mptcp_subflow_active(mptcp_subflow_ctx(ssk))) {
> +                   err = copied ? : -EAGAIN;
> +                   goto out;
> +           }
> +           mptcp_set_timeout(sk);
> +   }
> +   err = copied;
> +
> +out:
> +   return err;
> +}
> +
> +void __mptcp_push_pending(struct sock *sk, unsigned int flags)
> +{
> +   struct sock *prev_ssk = NULL, *ssk = NULL;
> +   struct mptcp_sock *msk = mptcp_sk(sk);
> +   struct mptcp_sendmsg_info info = {
> +           .flags = flags,
> +   };
> +   bool do_check_data_fin = false;
> +
> +   while (mptcp_send_head(sk)) {
> +           int ret = 0;
> +
> +           prev_ssk = ssk;
> +           ssk = mptcp_subflow_get_send(msk);
> +
> +           /* First check. If the ssk has changed since
> +            * the last round, release prev_ssk
> +            */
> +           if (ssk != prev_ssk && prev_ssk)
> +                   mptcp_push_release(prev_ssk, &info);
> +           if (!ssk)
> +                   goto out;
> +
> +           /* Need to lock the new subflow only if different
> +            * from the previous one, otherwise we are still
> +            * helding the relevant lock
> +            */
> +           if (ssk != prev_ssk)
> +                   lock_sock(ssk);
> +
> +           ret = __subflow_push_pending(sk, ssk, &info);
> +           if (ret <= 0) {
> +                   if (ret == -EAGAIN)
> +                           continue;
> +                   mptcp_push_release(ssk, &info);
> +                   goto out;
> +           }
> +           do_check_data_fin = true;
>     }
>
>     /* at this point we held the socket lock for the last subflow we used */
> @@ -1608,49 +1629,37 @@ static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk, bool
>     struct mptcp_sendmsg_info info = {
>             .data_lock_held = true,
>     };
> -   struct mptcp_data_frag *dfrag;
>     struct sock *xmit_ssk;
> -   int len, copied = 0;

I'm not sure why copied is removed here, it's still useful for knowing when
a tcp_push() is required at the end of the function. The push needs to
happen if data has been sent. __subflow_push_pending() can be called more
than once, and if it returned a positive number on *any* call then
tcp_push() is required.

Option A: Continue using the 'copied' variable
Option B: Use a bool instead (like "bool do_push;")

> +   int ret = 0;
>
>     info.flags = 0;
> -   while ((dfrag = mptcp_send_head(sk))) {
> -           info.sent = dfrag->already_sent;
> -           info.limit = dfrag->data_len;
> -           len = dfrag->data_len - dfrag->already_sent;
> -           while (len > 0) {
> -                   int ret = 0;
> -
> -                   /* check for a different subflow usage only after
> -                    * spooling the first chunk of data
> -                    */
> -                   xmit_ssk = first ? ssk : mptcp_subflow_get_send(msk);
> -                   if (!xmit_ssk)
> -                           goto out;
> -                   if (xmit_ssk != ssk) {
> -                           mptcp_subflow_delegate(mptcp_subflow_ctx(xmit_ssk),
> -                                                  MPTCP_DELEGATE_SEND);
> -                           goto out;
> -                   }
> -
> -                   ret = mptcp_sendmsg_frag(sk, ssk, dfrag, &info);
> -                   if (ret <= 0)
> -                           goto out;
> -
> -                   info.sent += ret;
> -                   copied += ret;
> -                   len -= ret;
> -                   first = false;
> +   while (mptcp_send_head(sk)) {
> +           /* check for a different subflow usage only after
> +            * spooling the first chunk of data
> +            */
> +           xmit_ssk = first ? ssk : mptcp_subflow_get_send(msk);
> +           if (!xmit_ssk)
> +                   goto out;
> +           if (xmit_ssk != ssk) {
> +                   mptcp_subflow_delegate(mptcp_subflow_ctx(xmit_ssk),
> +                                          MPTCP_DELEGATE_SEND);
> +                   goto out;
> +           }
>
> -                   mptcp_update_post_push(msk, dfrag, ret);
> +           ret = __subflow_push_pending(sk, ssk, &info);
> +           first = false;
> +           if (ret <= 0) {
> +                   if (ret == -EAGAIN)
> +                           continue;
> +                   break;
>             }
> -           WRITE_ONCE(msk->first_pending, mptcp_send_next(sk));

Option A: "copied += ret;" here
Option B: "do_push = true;"

>     }
>
> out:
>     /* __mptcp_alloc_tx_skb could have released some wmem and we are
>      * not going to flush it via release_sock()
>      */
> -   if (copied) {
> +   if (ret) {

Option A: leave this unchanged
Option B: replace with "if (do_push) {"

>             tcp_push(ssk, 0, info.mss_now, tcp_sk(ssk)->nonagle,
>                      info.size_goal);
>             if (!mptcp_timer_pending(sk))
> --
> 2.35.3

--
Mat Martineau
Intel