Message ID | 8b21ff13-d6ea-3fa5-8d87-c05157112e4b@suse.com (mailing list archive)
---|---
State | Superseded
Series | evtchn: (not so) recent XSAs follow-on
Hi Jan,

On 05/01/2021 13:09, Jan Beulich wrote:
> The local domain's lock is needed for the port allocation, but for the
> remote side the per-channel lock is sufficient. The per-channel locks
> then need to be acquired slightly earlier, though.

I was expecting a little bit more information in the commit message,
because there are a few changes in behavior with this change:

1) AFAICT, evtchn_allocate_port() relies on rchn->state being protected
by rd->event_lock. Now that you have dropped rd->event_lock, rchn->state
may be accessed while it is updated in evtchn_bind_interdomain(). The
port cannot go back to ECS_FREE here, but I think the access needs to be
switched to {read,write}_atomic() or ACCESS_ONCE().

2) xsm_evtchn_interdomain() is now going to be called without
rd->event_lock held. Can you confirm that the lock is not needed by XSM?

> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v4: New.
>
> --- a/xen/common/event_channel.c
> +++ b/xen/common/event_channel.c
> @@ -355,18 +355,7 @@ static long evtchn_bind_interdomain(evtc
>      if ( (rd = rcu_lock_domain_by_id(rdom)) == NULL )
>          return -ESRCH;
>
> -    /* Avoid deadlock by first acquiring lock of domain with smaller id. */
> -    if ( ld < rd )
> -    {
> -        spin_lock(&ld->event_lock);
> -        spin_lock(&rd->event_lock);
> -    }
> -    else
> -    {
> -        if ( ld != rd )
> -            spin_lock(&rd->event_lock);
> -        spin_lock(&ld->event_lock);
> -    }
> +    spin_lock(&ld->event_lock);
>
>      if ( (lport = get_free_port(ld)) < 0 )
>          ERROR_EXIT(lport);
> @@ -375,15 +364,19 @@ static long evtchn_bind_interdomain(evtc
>      if ( !port_is_valid(rd, rport) )
>          ERROR_EXIT_DOM(-EINVAL, rd);
>      rchn = evtchn_from_port(rd, rport);
> +
> +    double_evtchn_lock(lchn, rchn);
> +
>      if ( (rchn->state != ECS_UNBOUND) ||
>           (rchn->u.unbound.remote_domid != ld->domain_id) )

You want to unlock the event channels here.

>          ERROR_EXIT_DOM(-EINVAL, rd);
>
>      rc = xsm_evtchn_interdomain(XSM_HOOK, ld, lchn, rd, rchn);
>      if ( rc )
> +    {
> +        double_evtchn_unlock(lchn, rchn);
>          goto out;
> -
> -    double_evtchn_lock(lchn, rchn);
> +    }
>
>      lchn->u.interdomain.remote_dom  = rd;
>      lchn->u.interdomain.remote_port = rport;
> @@ -407,8 +400,6 @@ static long evtchn_bind_interdomain(evtc
>   out:
>      check_free_port(ld, lport);
>      spin_unlock(&ld->event_lock);
> -    if ( ld != rd )
> -        spin_unlock(&rd->event_lock);
>
>      rcu_unlock_domain(rd);

Cheers,
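To illustrate point 1 above, here is a minimal, self-contained sketch of the ACCESS_ONCE pattern Julien is suggesting. The struct, the ECS_* values, and the two helpers are simplified stand-ins invented for this example rather than code from the Xen tree, and the macro definition is only an approximation of Xen's.

```c
#include <stdbool.h>

/*
 * Approximation of Xen's ACCESS_ONCE(): force the compiler to emit
 * exactly one, non-torn load or store for the access.
 */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Simplified stand-ins for Xen's ECS_* channel states. */
#define ECS_FREE        0
#define ECS_INTERDOMAIN 3

/* Hypothetical stand-in for struct evtchn and its state field. */
struct evtchn_sketch {
    unsigned char state;
};

/* Lock-free reader, cf. the free-port scan in evtchn_allocate_port():
 * without rd->event_lock held, the read must be a single tear-free load. */
static bool port_is_free(const struct evtchn_sketch *chn)
{
    return ACCESS_ONCE(chn->state) == ECS_FREE;
}

/* Writer side, run under the per-channel lock in
 * evtchn_bind_interdomain(): the matching single tear-free store. */
static void mark_interdomain(struct evtchn_sketch *chn)
{
    ACCESS_ONCE(chn->state) = ECS_INTERDOMAIN;
}
```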
On 09/01/2021 15:41, Julien Grall wrote:
> Hi Jan,
>
> On 05/01/2021 13:09, Jan Beulich wrote:
>> The local domain's lock is needed for the port allocation, but for the
>> remote side the per-channel lock is sufficient. The per-channel locks
>> then need to be acquired slightly earlier, though.
>
> I was expecting a little bit more information in the commit message,
> because there are a few changes in behavior with this change:
>
> 1) AFAICT, evtchn_allocate_port() relies on rchn->state being protected
> by rd->event_lock. Now that you have dropped rd->event_lock, rchn->state
> may be accessed while it is updated in evtchn_bind_interdomain(). The
> port cannot go back to ECS_FREE here, but I think the access needs to be
> switched to {read,write}_atomic() or ACCESS_ONCE().
>
> 2) xsm_evtchn_interdomain() is now going to be called without
> rd->event_lock held. Can you confirm that the lock is not needed by XSM?

Actually, I think there is a bigger issue. evtchn_close() will check
chn1->state with just d1->event_lock held (IOW, chn1->lock is not held
there).

If the remote domain happens to close the unbound port at the same time
as the local domain binds it, then you may end up in the following
situation:

evtchn_bind_interdomain()       | evtchn_close()
                                |
                                | switch ( chn1->state )
                                | case ECS_UNBOUND:
                                |     /* nothing to do */
double_evtchn_lock()            |
rchn->state = ECS_INTERDOMAIN   |
double_evtchn_unlock()          |
                                | evtchn_write_lock(chn1)
                                | evtchn_free(d1, chn1)
                                | evtchn_write_unlock(chn1)

When the local domain later tries to close the port, it will hit the
BUG_ON(chn2->state != ECS_INTERDOMAIN) because the remote port was
already freed.

I think this can be solved by acquiring the event lock earlier on in
evtchn_close(). Although this may become a can of worms, as it would be
more complex to prevent lock inversion between chn1->lock and chn2->lock.

Cheers,
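The "can of worms" at the end is the classic two-lock ordering problem. The sketch below, which is not code from the Xen tree, shows the discipline that a helper like double_evtchn_lock() relies on: always acquire the pair in one fixed global order (here, lowest address first), so two CPUs locking the same two channels in opposite argument order cannot deadlock. pthread mutexes stand in for Xen's per-channel locks, and struct chn_sketch is hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for struct evtchn and its per-channel lock. */
struct chn_sketch {
    pthread_mutex_t lock;
    int state;
};

/* Cf. double_evtchn_lock(): sort the pair into a fixed global order
 * before locking, so CPU0 doing (a, b) and CPU1 doing (b, a) both
 * acquire the two locks in the same sequence. */
static void double_chn_lock(struct chn_sketch *a, struct chn_sketch *b)
{
    if ( (uintptr_t)a > (uintptr_t)b )
    {
        struct chn_sketch *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(&a->lock);
    pthread_mutex_lock(&b->lock);
}

/* Unlock order does not matter for deadlock avoidance. */
static void double_chn_unlock(struct chn_sketch *a, struct chn_sketch *b)
{
    pthread_mutex_unlock(&a->lock);
    pthread_mutex_unlock(&b->lock);
}
```

Note that ordering alone does not cure the race in the diagram above: evtchn_close() reads chn1->state before taking chn1's lock at all, which is exactly what the interleaving exploits.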
On 09.01.2021 17:14, Julien Grall wrote:
> On 09/01/2021 15:41, Julien Grall wrote:
>> On 05/01/2021 13:09, Jan Beulich wrote:
>>> The local domain's lock is needed for the port allocation, but for the
>>> remote side the per-channel lock is sufficient. The per-channel locks
>>> then need to be acquired slightly earlier, though.
>>
>> I was expecting a little bit more information in the commit message,
>> because there are a few changes in behavior with this change:
>>
>> 1) AFAICT, evtchn_allocate_port() relies on rchn->state being protected
>> by rd->event_lock. Now that you have dropped rd->event_lock, rchn->state
>> may be accessed while it is updated in evtchn_bind_interdomain(). The
>> port cannot go back to ECS_FREE here, but I think the access needs to be
>> switched to {read,write}_atomic() or ACCESS_ONCE().
>>
>> 2) xsm_evtchn_interdomain() is now going to be called without
>> rd->event_lock held. Can you confirm that the lock is not needed by XSM?
>
> Actually, I think there is a bigger issue. evtchn_close() will check
> chn1->state with just d1->event_lock held (IOW, chn1->lock is not held
> there).
>
> If the remote domain happens to close the unbound port at the same time
> as the local domain binds it, then you may end up in the following
> situation:
>
> evtchn_bind_interdomain()       | evtchn_close()
>                                 |
>                                 | switch ( chn1->state )
>                                 | case ECS_UNBOUND:
>                                 |     /* nothing to do */
> double_evtchn_lock()            |
> rchn->state = ECS_INTERDOMAIN   |
> double_evtchn_unlock()          |
>                                 | evtchn_write_lock(chn1)
>                                 | evtchn_free(d1, chn1)
>                                 | evtchn_write_unlock(chn1)
>
> When the local domain later tries to close the port, it will hit the
> BUG_ON(chn2->state != ECS_INTERDOMAIN) because the remote port was
> already freed.

Hmm, yes, thanks for spotting (and sorry for taking a while to reply).

> I think this can be solved by acquiring the event lock earlier on in
> evtchn_close(). Although this may become a can of worms, as it would be
> more complex to prevent lock inversion between chn1->lock and chn2->lock.

Indeed. I think I'll give up on trying to eliminate the double
per-domain event locking for the time being, and resubmit with both
patches dropped.

Jan
The local domain's lock is needed for the port allocation, but for the
remote side the per-channel lock is sufficient. The per-channel locks
then need to be acquired slightly earlier, though.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v4: New.

```diff
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -355,18 +355,7 @@ static long evtchn_bind_interdomain(evtc
     if ( (rd = rcu_lock_domain_by_id(rdom)) == NULL )
         return -ESRCH;

-    /* Avoid deadlock by first acquiring lock of domain with smaller id. */
-    if ( ld < rd )
-    {
-        spin_lock(&ld->event_lock);
-        spin_lock(&rd->event_lock);
-    }
-    else
-    {
-        if ( ld != rd )
-            spin_lock(&rd->event_lock);
-        spin_lock(&ld->event_lock);
-    }
+    spin_lock(&ld->event_lock);

     if ( (lport = get_free_port(ld)) < 0 )
         ERROR_EXIT(lport);
@@ -375,15 +364,19 @@ static long evtchn_bind_interdomain(evtc
     if ( !port_is_valid(rd, rport) )
         ERROR_EXIT_DOM(-EINVAL, rd);
     rchn = evtchn_from_port(rd, rport);
+
+    double_evtchn_lock(lchn, rchn);
+
     if ( (rchn->state != ECS_UNBOUND) ||
          (rchn->u.unbound.remote_domid != ld->domain_id) )
         ERROR_EXIT_DOM(-EINVAL, rd);

     rc = xsm_evtchn_interdomain(XSM_HOOK, ld, lchn, rd, rchn);
     if ( rc )
+    {
+        double_evtchn_unlock(lchn, rchn);
         goto out;
-
-    double_evtchn_lock(lchn, rchn);
+    }

     lchn->u.interdomain.remote_dom  = rd;
     lchn->u.interdomain.remote_port = rport;
@@ -407,8 +400,6 @@ static long evtchn_bind_interdomain(evtc
  out:
     check_free_port(ld, lport);
     spin_unlock(&ld->event_lock);
-    if ( ld != rd )
-        spin_unlock(&rd->event_lock);

     rcu_unlock_domain(rd);
```
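For reference, the inline review comment "You want to unlock the event channels here" points at the ECS_UNBOUND check: once it runs under double_evtchn_lock(), its failure path must drop both per-channel locks before ERROR_EXIT_DOM() jumps to the out label, which only releases ld->event_lock. A hypothetical fragment showing the shape of that fix (mirroring what the patch already does on the xsm_evtchn_interdomain() failure path), not code that was committed:

```c
    double_evtchn_lock(lchn, rchn);

    if ( (rchn->state != ECS_UNBOUND) ||
         (rchn->u.unbound.remote_domid != ld->domain_id) )
    {
        /* Drop both per-channel locks before the error exit; the
         * 'out' path below only releases ld->event_lock. */
        double_evtchn_unlock(lchn, rchn);
        ERROR_EXIT_DOM(-EINVAL, rd);
    }
```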