Message ID | alpine.DEB.2.10.1711151319250.12676@sstabellini-ThinkPad-X260 (mailing list archive)
---|---
State | New, archived
On Wed, 15 Nov 2017, Stefano Stabellini wrote:
> On Wed, 15 Nov 2017, Boris Ostrovsky wrote:
> > On 11/15/2017 02:09 PM, Stefano Stabellini wrote:
> > > On Wed, 15 Nov 2017, Juergen Gross wrote:
> > >>>>> while(mutex_is_locked(&map->active.in_mutex.owner) ||
> > >>>>>       mutex_is_locked(&map->active.out_mutex.owner))
> > >>>>> 	cpu_relax();
> > >>>>>
> > >>>>> ?
> > >>>> I'm not convinced there isn't a race.
> > >>>>
> > >>>> In pvcalls_front_recvmsg() sock->sk->sk_send_head is being read and only
> > >>>> then in_mutex is taken. What happens if pvcalls_front_release() resets
> > >>>> sk_send_head and manages to test the mutex before the mutex is locked?
> > >>>>
> > >>>> Even in case this is impossible: the whole construct seems to be rather
> > >>>> fragile.
> > > I agree it looks fragile, and I agree that it might be best to avoid the
> > > usage of in_mutex and out_mutex as refcounts. More comments on this
> > > below.
> > >
> > >
> > >>> I think we can wait until pvcalls_refcount is 1 (i.e. it's only us) and
> > >>> not rely on mutex state.
> > >> Yes, this would work.
> > > Yes, I agree it would work and for the sake of getting something in
> > > shape for the merge window I am attaching a patch for it. Please go
> > > ahead with it. Let me know if you need anything else immediately, and
> > > I'll work on it ASAP.
> > >
> > >
> > > However, I should note that this is a pretty big hammer we are using:
> > > the refcount is global, while we only need to wait until it's only us
> > > _on this specific socket_.
> >
> > Can you explain why socket is important?
>
> Yes, of course: there are going to be many open sockets on a given
> pvcalls connection. pvcalls_refcount is global: waiting on
> pvcalls_refcount means waiting until any operations on any unrelated
> sockets stop. While we only need to wait until the operations on the one
> socket we want to close stop.
>
> > >
> > > We really need a per socket refcount. If we don't want to use the mutex
> > > internal counters, then we need another one.
> > >
> > > See the appended patch that introduces a per socket refcount. However,
> > > for the merge window, also using pvcalls_refcount is fine.
> > >
> > > The race Juergen is concerned about is only theoretically possible:
> > >
> > >   recvmsg:                 release:
> > >
> > >   test sk_send_head        clear sk_send_head
> > >   <only few instructions>  <prepare a message>
> > >   grab in_mutex            <send a message to the server>
> > >                            <wait for an answer>
> > >                            test in_mutex
> > >
> > > Without kernel preemption is not possible for release to clear
> > > sk_send_head and test in_mutex after recvmsg tests sk_send_head and
> > > before recvmsg grabs in_mutex.
> >
> > Sorry, I don't follow --- what does preemption have to do with this? If
> > recvmsg and release happen on different processors the order of
> > operations can be
> >
> > CPU0                   CPU1
> >
> > test sk_send_head
> > <interrupt>
> >                        clear sk_send_head
> >                        <send a message to the server>
> >                        <wait for an answer>
> >                        test in_mutex
> >                        free everything
> > grab in_mutex
> >
> > I actually think RCU should take care of all of this.
>
> Preemption could cause something very similar to happen, but your
> example is very good too, even better, because it could trigger the
> issue even with preemption disabled. I'll think more about this and
> submit a separate patch on top of the simple pvcalls_refcount patch
> below.
>
> >
> > But for now I will take your refcount-based patch. However, it also
> > needs comment update.
> >
> > How about
> >
> > /*
> >  * We need to make sure that send/rcvmsg on this socket has not started
> >  * before we've cleared sk_send_head here. The easiest (though not optimal)
> >  * way to guarantee this is to see that no pvcall (other than us) is in
> >  * progress.
> >  */
>
> Yes, this is the patch:
>
> ---
>
> xen/pvcalls: fix potential endless loop in pvcalls-front.c
>
> mutex_trylock() returns 1 if you take the lock and 0 if not. Assume you
> take in_mutex on the first try, but you can't take out_mutex. Next times
> you call mutex_trylock() in_mutex is going to fail. It's an endless
> loop.
>
> Solve the problem by waiting until the global refcount is 1 instead (the
> refcount is 1 when the only active pvcalls frontend function is
> pvcalls_front_release).
>
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
> CC: boris.ostrovsky@oracle.com
> CC: jgross@suse.com
>
>
> diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
> index 0c1ec68..54c0fda 100644
> --- a/drivers/xen/pvcalls-front.c
> +++ b/drivers/xen/pvcalls-front.c
> @@ -1043,13 +1043,12 @@ int pvcalls_front_release(struct socket *sock)
>  	wake_up_interruptible(&map->active.inflight_conn_req);
>  
>  	/*
> -	 * Wait until there are no more waiters on the mutexes.
> -	 * We know that no new waiters can be added because sk_send_head
> -	 * is set to NULL -- we only need to wait for the existing
> -	 * waiters to return.
> -	 */
> -	while (!mutex_trylock(&map->active.in_mutex) ||
> -	       !mutex_trylock(&map->active.out_mutex))
> +	 * We need to make sure that sendmsg/recvmsg on this socket have
> +	 * not started before we've cleared sk_send_head here. The
> +	 * easiest (though not optimal) way to guarantee this is to see
> +	 * that no pvcall (other than us) is in progress.
> +	 */
> +	while (atomic_read(&pvcalls_refcount) > 1)
>  		cpu_relax();
>  
>  	pvcalls_front_free_map(bedata, map);

Sorry, code style issue: one missing space in the comment. I'll send it
again separately.
On 11/15/2017 04:50 PM, Stefano Stabellini wrote:
>
> Sorry, code style issue: one missing space in the comment. I'll send it
> again separately

I've already fixed this, no worries.

-boris
On Wed, 15 Nov 2017, Boris Ostrovsky wrote:
> On 11/15/2017 04:50 PM, Stefano Stabellini wrote:
> >
> > Sorry, code style issue: one missing space in the comment. I'll send it
> > again separately
>
> I've already fixed this, no worries.

Thank you!!
On 15/11/17 22:20, Stefano Stabellini wrote:
> On Wed, 15 Nov 2017, Boris Ostrovsky wrote:
>> On 11/15/2017 02:09 PM, Stefano Stabellini wrote:
>>> On Wed, 15 Nov 2017, Juergen Gross wrote:
>>>>>>> while(mutex_is_locked(&map->active.in_mutex.owner) ||
>>>>>>>       mutex_is_locked(&map->active.out_mutex.owner))
>>>>>>> 	cpu_relax();
>>>>>>>
>>>>>>> ?
>>>>>> I'm not convinced there isn't a race.
>>>>>>
>>>>>> In pvcalls_front_recvmsg() sock->sk->sk_send_head is being read and only
>>>>>> then in_mutex is taken. What happens if pvcalls_front_release() resets
>>>>>> sk_send_head and manages to test the mutex before the mutex is locked?
>>>>>>
>>>>>> Even in case this is impossible: the whole construct seems to be rather
>>>>>> fragile.
>>> I agree it looks fragile, and I agree that it might be best to avoid the
>>> usage of in_mutex and out_mutex as refcounts. More comments on this
>>> below.
>>>
>>>
>>>>> I think we can wait until pvcalls_refcount is 1 (i.e. it's only us) and
>>>>> not rely on mutex state.
>>>> Yes, this would work.
>>> Yes, I agree it would work and for the sake of getting something in
>>> shape for the merge window I am attaching a patch for it. Please go
>>> ahead with it. Let me know if you need anything else immediately, and
>>> I'll work on it ASAP.
>>>
>>>
>>>
>>> However, I should note that this is a pretty big hammer we are using:
>>> the refcount is global, while we only need to wait until it's only us
>>> _on this specific socket_.
>>
>> Can you explain why socket is important?
>
> Yes, of course: there are going to be many open sockets on a given
> pvcalls connection. pvcalls_refcount is global: waiting on
> pvcalls_refcount means waiting until any operations on any unrelated
> sockets stop. While we only need to wait until the operations on the one
> socket we want to close stop.
>
>
>>>
>>> We really need a per socket refcount. If we don't want to use the mutex
>>> internal counters, then we need another one.
>>>
>>> See the appended patch that introduces a per socket refcount. However,
>>> for the merge window, also using pvcalls_refcount is fine.
>>>
>>> The race Juergen is concerned about is only theoretically possible:
>>>
>>>   recvmsg:                 release:
>>>
>>>   test sk_send_head        clear sk_send_head
>>>   <only few instructions>  <prepare a message>
>>>   grab in_mutex            <send a message to the server>
>>>                            <wait for an answer>
>>>                            test in_mutex
>>>
>>> Without kernel preemption is not possible for release to clear
>>> sk_send_head and test in_mutex after recvmsg tests sk_send_head and
>>> before recvmsg grabs in_mutex.
>>
>> Sorry, I don't follow --- what does preemption have to do with this? If
>> recvmsg and release happen on different processors the order of
>> operations can be
>>
>> CPU0                   CPU1
>>
>> test sk_send_head
>> <interrupt>
>>                        clear sk_send_head
>>                        <send a message to the server>
>>                        <wait for an answer>
>>                        test in_mutex
>>                        free everything
>> grab in_mutex
>>
>> I actually think RCU should take care of all of this.
>
> Preemption could cause something very similar to happen, but your
> example is very good too, even better, because it could trigger the
> issue even with preemption disabled. I'll think more about this and
> submit a separate patch on top of the simple pvcalls_refcount patch
> below.

We are running as a guest. Even with interrupts off the vcpu could be off
the pcpu for several milliseconds!

Don't count on code length to avoid races!


Juergen
diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index 0c1ec68..54c0fda 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -1043,13 +1043,12 @@ int pvcalls_front_release(struct socket *sock)
 	wake_up_interruptible(&map->active.inflight_conn_req);
 
 	/*
-	 * Wait until there are no more waiters on the mutexes.
-	 * We know that no new waiters can be added because sk_send_head
-	 * is set to NULL -- we only need to wait for the existing
-	 * waiters to return.
-	 */
-	while (!mutex_trylock(&map->active.in_mutex) ||
-	       !mutex_trylock(&map->active.out_mutex))
+	 * We need to make sure that sendmsg/recvmsg on this socket have
+	 * not started before we've cleared sk_send_head here. The
+	 * easiest (though not optimal) way to guarantee this is to see
+	 * that no pvcall (other than us) is in progress.
+	 */
+	while (atomic_read(&pvcalls_refcount) > 1)
 		cpu_relax();
 
 	pvcalls_front_free_map(bedata, map);