Message ID | alpine.DEB.2.10.1711151100490.22767@sstabellini-ThinkPad-X260 (mailing list archive) |
---|---|
State | New, archived |
On 11/15/2017 02:09 PM, Stefano Stabellini wrote:
> On Wed, 15 Nov 2017, Juergen Gross wrote:
>>>>> while(mutex_is_locked(&map->active.in_mutex.owner) ||
>>>>>       mutex_is_locked(&map->active.out_mutex.owner))
>>>>> 	cpu_relax();
>>>>>
>>>>> ?
>>>> I'm not convinced there isn't a race.
>>>>
>>>> In pvcalls_front_recvmsg() sock->sk->sk_send_head is being read and only
>>>> then in_mutex is taken. What happens if pvcalls_front_release() resets
>>>> sk_send_head and manages to test the mutex before the mutex is locked?
>>>>
>>>> Even in case this is impossible: the whole construct seems to be rather
>>>> fragile.
>
> I agree it looks fragile, and I agree that it might be best to avoid the
> usage of in_mutex and out_mutex as refcounts. More comments on this
> below.
>
>>> I think we can wait until pvcalls_refcount is 1 (i.e. it's only us) and
>>> not rely on mutex state.
>>
>> Yes, this would work.
>
> Yes, I agree it would work and for the sake of getting something in
> shape for the merge window I am attaching a patch for it. Please go
> ahead with it. Let me know if you need anything else immediately, and
> I'll work on it ASAP.
>
> However, I should note that this is a pretty big hammer we are using:
> the refcount is global, while we only need to wait until it's only us
> _on this specific socket_.

Can you explain why socket is important?

> We really need a per socket refcount. If we don't want to use the mutex
> internal counters, then we need another one.
>
> See the appended patch that introduces a per socket refcount. However,
> for the merge window, also using pvcalls_refcount is fine.
>
> The race Juergen is concerned about is only theoretically possible:
>
>   recvmsg:                    release:
>
>   test sk_send_head           clear sk_send_head
>   <only few instructions>     <prepare a message>
>   grab in_mutex               <send a message to the server>
>                               <wait for an answer>
>                               test in_mutex
>
> Without kernel preemption it is not possible for release to clear
> sk_send_head and test in_mutex after recvmsg tests sk_send_head and
> before recvmsg grabs in_mutex.

Sorry, I don't follow --- what does preemption have to do with this? If
recvmsg and release happen on different processors, the order of
operations can be

  CPU0                        CPU1

  test sk_send_head
  <interrupt>
                              clear sk_send_head
                              <send a message to the server>
                              <wait for an answer>
                              test in_mutex
                              free everything
  grab in_mutex

I actually think RCU should take care of all of this. But for now I will
take your refcount-based patch. However, it also needs a comment update.
How about

/*
 * We need to make sure that send/rcvmsg on this socket has not started
 * before we've cleared sk_send_head here. The easiest (though not
 * optimal) way to guarantee this is to see that no pvcall (other than
 * us) is in progress.
 */

-boris

> But maybe we need to disable kernel preemption in recvmsg and sendmsg to
> stay on the safe side?
>
> The patch below introduces a per active socket refcount, so that we
> don't have to rely on in_mutex and out_mutex for refcounting. It also
> disables preemption in sendmsg and recvmsg in the region described
> above.
>
> I don't think this patch should go in immediately. We can take our time
> to figure out the best fix.
>
>
> diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
> index 0c1ec68..8c1030b 100644
> --- a/drivers/xen/pvcalls-front.c
> +++ b/drivers/xen/pvcalls-front.c
> @@ -68,6 +68,7 @@ struct sock_mapping {
>  			struct pvcalls_data data;
>  			struct mutex in_mutex;
>  			struct mutex out_mutex;
> +			atomic_t sock_refcount;
>  
>  			wait_queue_head_t inflight_conn_req;
>  		} active;
> @@ -497,15 +498,20 @@ int pvcalls_front_sendmsg(struct socket *sock, struct msghdr *msg,
>  	}
>  	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
>  
> +	preempt_disable();
>  	map = (struct sock_mapping *) sock->sk->sk_send_head;
>  	if (!map) {
> +		preempt_enable();
>  		pvcalls_exit();
>  		return -ENOTSOCK;
>  	}
>  
> +	atomic_inc(&map->active.sock_refcount);
>  	mutex_lock(&map->active.out_mutex);
> +	preempt_enable();
>  	if ((flags & MSG_DONTWAIT) && !pvcalls_front_write_todo(map)) {
>  		mutex_unlock(&map->active.out_mutex);
> +		atomic_dec(&map->active.sock_refcount);
>  		pvcalls_exit();
>  		return -EAGAIN;
>  	}
> @@ -528,6 +534,7 @@ int pvcalls_front_sendmsg(struct socket *sock, struct msghdr *msg,
>  	tot_sent = sent;
>  
>  	mutex_unlock(&map->active.out_mutex);
> +	atomic_dec(&map->active.sock_refcount);
>  	pvcalls_exit();
>  	return tot_sent;
>  }
> @@ -600,13 +607,17 @@ int pvcalls_front_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
>  	}
>  	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
>  
> +	preempt_disable();
>  	map = (struct sock_mapping *) sock->sk->sk_send_head;
>  	if (!map) {
> +		preempt_enable();
>  		pvcalls_exit();
>  		return -ENOTSOCK;
>  	}
>  
> +	atomic_inc(&map->active.sock_refcount);
>  	mutex_lock(&map->active.in_mutex);
> +	preempt_enable();
>  	if (len > XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER))
>  		len = XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER);
>  
> @@ -625,6 +636,7 @@ int pvcalls_front_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
>  	ret = 0;
>  
>  	mutex_unlock(&map->active.in_mutex);
> +	atomic_dec(&map->active.sock_refcount);
>  	pvcalls_exit();
>  	return ret;
>  }
> @@ -1048,8 +1060,7 @@ int pvcalls_front_release(struct socket *sock)
>  	 * is set to NULL -- we only need to wait for the existing
>  	 * waiters to return.
>  	 */
> -	while (!mutex_trylock(&map->active.in_mutex) ||
> -	       !mutex_trylock(&map->active.out_mutex))
> +	while (atomic_read(&map->active.sock_refcount) > 0)
>  		cpu_relax();
>  
>  	pvcalls_front_free_map(bedata, map);
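
As a side note on the RCU remark above, here is a minimal sketch of how
an RCU-published pointer combined with the per-socket refcount could
close the window without preempt_disable(). This is illustrative only,
not the actual driver code: the "pvcalls_map" pointer and both helper
names are hypothetical (mainline reuses sk->sk_send_head instead), and
sock_refcount is the field added by the patch above.

static struct sock_mapping __rcu *pvcalls_map;	/* hypothetical field */

/* Reader side: what sendmsg/recvmsg would do to pin the mapping. */
static struct sock_mapping *pvcalls_pin_map_sketch(void)
{
	struct sock_mapping *map;

	rcu_read_lock();
	map = rcu_dereference(pvcalls_map);
	if (map)
		atomic_inc(&map->active.sock_refcount);	/* pin it */
	rcu_read_unlock();
	return map;	/* caller may now sleep on in_mutex/out_mutex */
}

/* Release side: make the mapping unreachable, then drain the pins. */
static void pvcalls_release_map_sketch(struct sock_mapping *map)
{
	RCU_INIT_POINTER(pvcalls_map, NULL);
	synchronize_rcu();	/* no reader can still observe the pointer */
	while (atomic_read(&map->active.sock_refcount) > 0)
		cpu_relax();	/* wait out readers that pinned in time */
	/* now safe to free map */
}

Any reader that saw the non-NULL pointer increments the refcount inside
its RCU read-side critical section, so by the time synchronize_rcu()
returns, the refcount accounts for every remaining user.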
On 11/15/2017 02:50 PM, Boris Ostrovsky wrote:
> On 11/15/2017 02:09 PM, Stefano Stabellini wrote:
>>
>> However, I should note that this is a pretty big hammer we are using:
>> the refcount is global, while we only need to wait until it's only us
>> _on this specific socket_.
>
> Can you explain why socket is important?

Nevermind. I was thinking about *processor* socket (as in cores,
threads, packages etc. I am right now looking at a bug that deals with
core behavior ;-))

-boris
xen/pvcalls: fix potential endless loop in pvcalls-front.c

mutex_trylock() returns 1 if you take the lock and 0 if not. Assume you
take in_mutex on the first try, but you can't take out_mutex. The next
time you call mutex_trylock(), in_mutex is going to fail: it's an
endless loop.

Solve the problem by waiting until the global refcount is 1 instead (the
refcount is 1 when the only active pvcalls frontend function is
pvcalls_front_release).

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index 0c1ec68..72a74c3 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -1048,8 +1048,7 @@ int pvcalls_front_release(struct socket *sock)
 	 * is set to NULL -- we only need to wait for the existing
 	 * waiters to return.
 	 */
-	while (!mutex_trylock(&map->active.in_mutex) ||
-	       !mutex_trylock(&map->active.out_mutex))
+	while (atomic_read(&pvcalls_refcount) > 1)
 		cpu_relax();
 
 	pvcalls_front_free_map(bedata, map);
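
To see the endless loop concretely, here is a small userspace sketch of
the same pattern (illustrative only, not driver code). Note that
pthread_mutex_trylock() returns 0 on success, so the kernel's
!mutex_trylock() test becomes "!= 0", and the spin is capped so the
program terminates. Build with: cc demo.c -lpthread

#include <pthread.h>
#include <stdio.h>

int main(void)
{
	pthread_mutex_t in_mutex  = PTHREAD_MUTEX_INITIALIZER;
	pthread_mutex_t out_mutex = PTHREAD_MUTEX_INITIALIZER;
	int spins = 0;

	/* Stand-in for a concurrent sendmsg holding out_mutex. */
	pthread_mutex_lock(&out_mutex);

	/*
	 * First pass: in_mutex is acquired, out_mutex fails.  Every
	 * later pass fails on in_mutex -- we already hold it and never
	 * release it -- so the condition stays true forever.  Capped
	 * here so the demo exits instead of spinning indefinitely.
	 */
	while (pthread_mutex_trylock(&in_mutex) != 0 ||
	       pthread_mutex_trylock(&out_mutex) != 0) {
		if (++spins == 1000000) {
			printf("still spinning after %d iterations\n",
			       spins);
			return 1;
		}
	}

	printf("acquired both mutexes (never reached)\n");
	return 0;
}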