Message ID | 20230620102856.56074-5-hare@suse.de (mailing list archive)
State | Superseded
Delegated to: | Netdev Maintainers
Series | net/tls: fixes for NVMe-over-TLS
> Implement ->read_sock() function for use with nvme-tcp.
>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> Cc: Boris Pismenny <boris.pismenny@gmail.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: netdev@vger.kernel.org
> ---
>  net/tls/tls.h      |  2 ++
>  net/tls/tls_main.c |  2 ++
>  net/tls/tls_sw.c   | 78 ++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 82 insertions(+)

[...]

> +	err = tls_rx_reader_lock(sk, ctx, true);
> +	if (err < 0)
> +		return err;

Unlike recvmsg or splice_read, the caller of read_sock is assumed to
have the socket locked, and tls_rx_reader_lock also calls lock_sock;
how is this not a deadlock?

I'm not exactly clear why the lock is needed here, or what the subtle
distinction is between tls_rx_reader_lock and what lock_sock provides.
On Tue, 20 Jun 2023 16:21:22 +0300 Sagi Grimberg wrote:
> > +	err = tls_rx_reader_lock(sk, ctx, true);
> > +	if (err < 0)
> > +		return err;
>
> Unlike recvmsg or splice_read, the caller of read_sock is assumed to
> have the socket locked, and tls_rx_reader_lock also calls lock_sock,
> how is this not a deadlock?

Yeah :|

> I'm not exactly clear why the lock is needed here or what is the subtle
> distinction between tls_rx_reader_lock and what lock_sock provides.

It's a bit of a workaround for the consistency of the data stream.
There's a bunch of state in the TLS ULP, and waiting for mem or data
releases and re-takes the socket lock. So to stop the flow of annoying
corner-case races I slapped a lock around all of the reader paths.

IMHO depending on the socket lock for anything non-trivial and outside
of the socket itself is a bad idea in general.

The immediate need at the time was that if you did a read() and someone
else did a peek() at the same time, from a stream of A B C D you may
read A D B C.
On 6/20/23 19:08, Jakub Kicinski wrote:
> On Tue, 20 Jun 2023 16:21:22 +0300 Sagi Grimberg wrote:
>> Unlike recvmsg or splice_read, the caller of read_sock is assumed to
>> have the socket locked, and tls_rx_reader_lock also calls lock_sock,
>> how is this not a deadlock?
>
> Yeah :|
>
> It's a bit of a workaround for the consistency of the data stream.
> There's a bunch of state in the TLS ULP, and waiting for mem or data
> releases and re-takes the socket lock. So to stop the flow of annoying
> corner-case races I slapped a lock around all of the reader paths.
>
> IMHO depending on the socket lock for anything non-trivial and outside
> of the socket itself is a bad idea in general.
>
> The immediate need at the time was that if you did a read() and someone
> else did a peek() at the same time, from a stream of A B C D you may
> read A D B C.

Leaving me ever so confused.

read_sock() is a generic interface; we cannot require a protocol-specific
lock to be taken before calling it.

What to do now?
Drop the tls_rx_reader_lock() from read_sock() again?

Cheers,

Hannes
> Leaving me ever so confused.
>
> read_sock() is a generic interface; we cannot require a protocol
> specific lock before calling it.
>
> What to do now?
> Drop the tls_rx_reader_lock from read_sock() again?
Probably just need to synchronize the readers by splitting that from
tls_rx_reader_lock:
--
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 53f944e6d8ef..53404c3fdcc6 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1845,13 +1845,10 @@ tls_read_flush_backlog(struct sock *sk, struct tls_prot_info *prot,
 	return sk_flush_backlog(sk);
 }
 
-static int tls_rx_reader_lock(struct sock *sk, struct tls_sw_context_rx *ctx,
-			      bool nonblock)
+static int tls_rx_reader_acquire(struct sock *sk, struct tls_sw_context_rx *ctx,
+				 bool nonblock)
 {
 	long timeo;
-	int err;
-
-	lock_sock(sk);
 
 	timeo = sock_rcvtimeo(sk, nonblock);
 
@@ -1865,26 +1862,30 @@ static int tls_rx_reader_lock(struct sock *sk, struct tls_sw_context_rx *ctx,
 			    !READ_ONCE(ctx->reader_present), &wait);
 		remove_wait_queue(&ctx->wq, &wait);
 
-		if (timeo <= 0) {
-			err = -EAGAIN;
-			goto err_unlock;
-		}
-		if (signal_pending(current)) {
-			err = sock_intr_errno(timeo);
-			goto err_unlock;
-		}
+		if (timeo <= 0)
+			return -EAGAIN;
+		if (signal_pending(current))
+			return sock_intr_errno(timeo);
 	}
 
 	WRITE_ONCE(ctx->reader_present, 1);
 
 	return 0;
+}
 
-err_unlock:
-	release_sock(sk);
+static int tls_rx_reader_lock(struct sock *sk, struct tls_sw_context_rx *ctx,
+			      bool nonblock)
+{
+	int err;
+
+	lock_sock(sk);
+	err = tls_rx_reader_acquire(sk, ctx, nonblock);
+	if (err)
+		release_sock(sk);
 	return err;
 }
 
-static void tls_rx_reader_unlock(struct sock *sk, struct tls_sw_context_rx *ctx)
+static void tls_rx_reader_release(struct sock *sk, struct tls_sw_context_rx *ctx)
 {
 	if (unlikely(ctx->reader_contended)) {
 		if (wq_has_sleeper(&ctx->wq))
@@ -1896,6 +1897,11 @@ static void tls_rx_reader_unlock(struct sock *sk, struct tls_sw_context_rx *ctx)
 	}
 
 	WRITE_ONCE(ctx->reader_present, 0);
+}
+
+static void tls_rx_reader_unlock(struct sock *sk, struct tls_sw_context_rx *ctx)
+{
+	tls_rx_reader_release(sk, ctx);
 	release_sock(sk);
 }
--

Then read_sock can just acquire/release.
On 6/21/23 10:39, Sagi Grimberg wrote:
> Probably just need to synchronize the readers by splitting that from
> tls_rx_reader_lock:
[...]
> Then read_sock can just acquire/release.

Good suggestion.
Will be including it in the next round.

Cheers,

Hannes
On 6/21/23 12:08, Hannes Reinecke wrote:
> On 6/21/23 10:39, Sagi Grimberg wrote:
>> Then read_sock can just acquire/release.
>
> Good suggestion.
> Will be including it in the next round.

Maybe more appropriate helper names would be
tls_rx_reader_enter / tls_rx_reader_exit.

Whatever Jakub prefers...
On Wed, 21 Jun 2023 12:49:21 +0300 Sagi Grimberg wrote:
> > Good suggestion.
> > Will be including it in the next round.
>
> Maybe more appropriate helper names would be
> tls_rx_reader_enter / tls_rx_reader_exit.
>
> Whatever Jakub prefers...

I was thinking along the same lines but with __ in front of the names
of the factored out code. Your naming as suggested in the diff is better.
diff --git a/net/tls/tls.h b/net/tls/tls.h
index d002c3af1966..ba55cd5c4913 100644
--- a/net/tls/tls.h
+++ b/net/tls/tls.h
@@ -114,6 +114,8 @@ bool tls_sw_sock_is_readable(struct sock *sk);
 ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
			    struct pipe_inode_info *pipe,
			    size_t len, unsigned int flags);
+int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
+		     sk_read_actor_t read_actor);
 
 int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
 void tls_device_splice_eof(struct socket *sock);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 7b9c83dd7de2..1a062a8c6d33 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -963,10 +963,12 @@ static void build_proto_ops(struct proto_ops ops[TLS_NUM_CONFIG][TLS_NUM_CONFIG]
 	ops[TLS_BASE][TLS_SW  ] = ops[TLS_BASE][TLS_BASE];
 	ops[TLS_BASE][TLS_SW  ].splice_read	= tls_sw_splice_read;
 	ops[TLS_BASE][TLS_SW  ].poll		= tls_sk_poll;
+	ops[TLS_BASE][TLS_SW  ].read_sock	= tls_sw_read_sock;
 
 	ops[TLS_SW  ][TLS_SW  ] = ops[TLS_SW  ][TLS_BASE];
 	ops[TLS_SW  ][TLS_SW  ].splice_read	= tls_sw_splice_read;
 	ops[TLS_SW  ][TLS_SW  ].poll		= tls_sk_poll;
+	ops[TLS_SW  ][TLS_SW  ].read_sock	= tls_sw_read_sock;
 
 #ifdef CONFIG_TLS_DEVICE
 	ops[TLS_HW  ][TLS_BASE] = ops[TLS_BASE][TLS_BASE];
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 97379e34c997..e918c98bbeb2 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2231,6 +2231,84 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
 	goto splice_read_end;
 }
 
+int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
+		     sk_read_actor_t read_actor)
+{
+	struct tls_context *tls_ctx = tls_get_ctx(sk);
+	struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
+	struct strp_msg *rxm = NULL;
+	struct tls_msg *tlm;
+	struct sk_buff *skb;
+	ssize_t copied = 0;
+	int err, used;
+
+	err = tls_rx_reader_lock(sk, ctx, true);
+	if (err < 0)
+		return err;
+	if (!skb_queue_empty(&ctx->rx_list)) {
+		skb = __skb_dequeue(&ctx->rx_list);
+	} else {
+		struct tls_decrypt_arg darg;
+
+		err = tls_rx_rec_wait(sk, NULL, true, true);
+		if (err <= 0) {
+			tls_rx_reader_unlock(sk, ctx);
+			return err;
+		}
+
+		memset(&darg.inargs, 0, sizeof(darg.inargs));
+
+		err = tls_rx_one_record(sk, NULL, &darg);
+		if (err < 0) {
+			tls_err_abort(sk, -EBADMSG);
+			tls_rx_reader_unlock(sk, ctx);
+			return err;
+		}
+
+		tls_rx_rec_done(ctx);
+		skb = darg.skb;
+	}
+
+	do {
+		rxm = strp_msg(skb);
+		tlm = tls_msg(skb);
+
+		/* read_sock does not support reading control messages */
+		if (tlm->control != TLS_RECORD_TYPE_DATA) {
+			err = -EINVAL;
+			goto read_sock_requeue;
+		}
+
+		used = read_actor(desc, skb, rxm->offset, rxm->full_len);
+		if (used <= 0) {
+			err = used;
+			goto read_sock_end;
+		}
+
+		copied += used;
+		if (used < rxm->full_len) {
+			rxm->offset += used;
+			rxm->full_len -= used;
+			if (!desc->count)
+				goto read_sock_requeue;
+		} else {
+			consume_skb(skb);
+			if (desc->count && !skb_queue_empty(&ctx->rx_list))
+				skb = __skb_dequeue(&ctx->rx_list);
+			else
+				skb = NULL;
+		}
+	} while (skb);
+
+read_sock_end:
+	tls_rx_reader_unlock(sk, ctx);
+	return copied ? : err;
+
+read_sock_requeue:
+	__skb_queue_head(&ctx->rx_list, skb);
+	goto read_sock_end;
+}
+
 bool tls_sw_sock_is_readable(struct sock *sk)
 {
 	struct tls_context *tls_ctx = tls_get_ctx(sk);