
[v8,2/3] xen/events: rework fifo queue locking

Message ID 20201125105122.3650-3-jgross@suse.com (mailing list archive)
State New, archived
Series xen/events: further locking adjustments

Commit Message

Jürgen Groß Nov. 25, 2020, 10:51 a.m. UTC
Two CPUs entering evtchn_fifo_set_pending() for the same event channel
can race if the first one is interrupted after setting
EVTCHN_FIFO_PENDING and the other one manages to set
EVTCHN_FIFO_LINKED before the first one tests that bit. This can lead
to evtchn_check_pollers() being called before the event is properly
put into the queue, eventually resulting in the guest not seeing the
event pending and thus blocking forever afterwards.
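
For illustration, the problematic interleaving looks roughly like this
(both CPUs are inside evtchn_fifo_set_pending() for the same port):

    CPU A: test_and_set EVTCHN_FIFO_PENDING  -> was clear, was_pending = false
           <A is interrupted before it tests EVTCHN_FIFO_LINKED>
    CPU B: test_and_set EVTCHN_FIFO_PENDING  -> already set, was_pending = true
    CPU B: test_and_set EVTCHN_FIFO_LINKED   -> was clear, B will do the linking
    CPU A: resumes, sees EVTCHN_FIFO_LINKED  -> skips the linking code
    CPU A: !was_pending                      -> evtchn_check_pollers() runs
    CPU B: only now links the event into the queue

A vcpu woken by that poller check can thus look at the queue before the
event has been linked, find nothing pending, and block again.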

Note that commit 5f2df45ead7c1195 ("xen/evtchn: rework per event channel
lock") made the race just more obvious, while the fifo event channel
implementation had this race from the beginning when an unmask operation
was running in parallel with an event channel send operation.

Using a spinlock for the per event channel lock is problematic, as
some of the paths needing to take the lock are called with interrupts
off, so the lock would need to disable interrupts, which in turn
breaks some use cases related to vm events.

To avoid this race the queue locking in evtchn_fifo_set_pending()
needs to be reworked to also cover the test of EVTCHN_FIFO_PENDING,
EVTCHN_FIFO_MASKED and EVTCHN_FIFO_LINKED. Additionally, when an event
channel needs to change queues, both queues need to be locked
initially.
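
The double locking of the two queues follows the usual address
ordering idiom, so that two CPUs locking the same pair of queues
cannot deadlock; after both locks are taken, evtchn->fifo_lastq is
re-read and the whole sequence retried if the channel changed queues
in the meantime. A minimal, self-contained sketch of that idiom (plain
C with pthreads and made-up names, not the actual Xen code; the real
version is in the patch below):

    #include <pthread.h>

    struct queue {
        pthread_mutex_t lock;
        /* ... queue contents ... */
    };

    /*
     * Take both locks without risking an ABBA deadlock: acquire the
     * lower-addressed lock first, and take only one lock when both
     * pointers refer to the same queue.
     */
    static void lock_two_queues(struct queue *a, struct queue *b)
    {
        if ( a == b )
            pthread_mutex_lock(&a->lock);
        else if ( a < b )
        {
            pthread_mutex_lock(&a->lock);
            pthread_mutex_lock(&b->lock);
        }
        else
        {
            pthread_mutex_lock(&b->lock);
            pthread_mutex_lock(&a->lock);
        }
    }

    static void unlock_two_queues(struct queue *a, struct queue *b)
    {
        if ( a != b )
            pthread_mutex_unlock(&b->lock);
        pthread_mutex_unlock(&a->lock);
    }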

Reported-by: Jan Beulich <jbeulich@suse.com>
Fixes: 5f2df45ead7c1195 ("xen/evtchn: rework per event channel lock")
Fixes: de6acb78bf0e137c ("evtchn: use a per-event channel lock for sending events")
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V7:
- new patch

V8:
- update commit message (Jan Beulich)
- update double locking (Jan Beulich)
- add comment (Jan Beulich)
- fix handling when not getting lock (Jan Beulich)
- add const (Jan Beulich)

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/event_fifo.c | 128 ++++++++++++++++++++++------------------
 1 file changed, 70 insertions(+), 58 deletions(-)

Comments

Jan Beulich Nov. 27, 2020, 1:11 p.m. UTC | #1
On 25.11.2020 11:51, Juergen Gross wrote:
> Two cpus entering evtchn_fifo_set_pending() for the same event channel
> can race in case the first one gets interrupted after setting
> EVTCHN_FIFO_PENDING and when the other one manages to set
> EVTCHN_FIFO_LINKED before the first one is testing that bit. This can
> lead to evtchn_check_pollers() being called before the event is put
> properly into the queue, resulting eventually in the guest not seeing
> the event pending and thus blocking forever afterwards.
> 
> Note that commit 5f2df45ead7c1195 ("xen/evtchn: rework per event channel
> lock") made the race just more obvious, while the fifo event channel
> implementation had this race from the beginning when an unmask operation
> was running in parallel with an event channel send operation.

I notice you've altered the Fixes: tag, but you still say "from
the beginning" here.

> Using a spinlock for the per event channel lock is problematic due to
> some paths needing to take the lock are called with interrupts off, so
> the lock would need to disable interrupts, which in turn breaks some
> use cases related to vm events.

This reads as if it got put here by mistake. May I suggest to start
with "Using ... had turned out problematic ..." and then add
something like "Therefore that lock was switched to an rw one"?

> For avoiding this race the queue locking in evtchn_fifo_set_pending()
> needs to be reworked to cover the test of EVTCHN_FIFO_PENDING,
> EVTCHN_FIFO_MASKED and EVTCHN_FIFO_LINKED, too. Additionally when an
> event channel needs to change queues both queues need to be locked
> initially.

"... in order to avoid having a window with no lock held at all"?

> @@ -204,17 +175,67 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
>          return;
>      }
>  
> +    /*
> +     * Lock all queues related to the event channel (in case of a queue change
> +     * this might be two).
> +     * It is mandatory to do that before setting and testing the PENDING bit
> +     * and to hold the current queue lock until the event has put into the

"has been" or "was" I think?

With adjustments along these lines (which I guess could again be
done while committing) or reasons against supplied
Reviewed-by: Jan Beulich <jbeulich@suse.com>

One aspect which I wonder whether it wants adding to the description
is that this change makes a bad situation worse: Back at the time
per-channel locks were added to avoid the bottleneck of the per-
domain event lock. While a per-queue lock's scope at least isn't the
entire domain, their use on the send path still has been preventing
full parallelism here. And this patch widens the lock holding region.
If at least there was a fast path not requiring any locking ...

Jan
Jürgen Groß Nov. 27, 2020, 1:31 p.m. UTC | #2
On 27.11.20 14:11, Jan Beulich wrote:
> On 25.11.2020 11:51, Juergen Gross wrote:
>> Two cpus entering evtchn_fifo_set_pending() for the same event channel
>> can race in case the first one gets interrupted after setting
>> EVTCHN_FIFO_PENDING and when the other one manages to set
>> EVTCHN_FIFO_LINKED before the first one is testing that bit. This can
>> lead to evtchn_check_pollers() being called before the event is put
>> properly into the queue, resulting eventually in the guest not seeing
>> the event pending and thus blocking forever afterwards.
>>
>> Note that commit 5f2df45ead7c1195 ("xen/evtchn: rework per event channel
>> lock") made the race just more obvious, while the fifo event channel
>> implementation had this race from the beginning when an unmask operation
>> was running in parallel with an event channel send operation.
> 
> I notice you've altered the Fixes: tag, but you still say "from
> the beginning" here.

Oh, indeed. Thanks for spotting.

> 
>> Using a spinlock for the per event channel lock is problematic due to
>> some paths needing to take the lock are called with interrupts off, so
>> the lock would need to disable interrupts, which in turn breaks some
>> use cases related to vm events.
> 
> This reads as if it got put here by mistake. May I suggest to start
> with "Using ... had turned out problematic ..." and then add
> something like "Therefore that lock was switched to an rw one"?

Fine with me.

> 
>> For avoiding this race the queue locking in evtchn_fifo_set_pending()
>> needs to be reworked to cover the test of EVTCHN_FIFO_PENDING,
>> EVTCHN_FIFO_MASKED and EVTCHN_FIFO_LINKED, too. Additionally when an
>> event channel needs to change queues both queues need to be locked
>> initially.
> 
> "... in order to avoid having a window with no lock held at all"?

Yes, this is better.

> 
>> @@ -204,17 +175,67 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
>>           return;
>>       }
>>   
>> +    /*
>> +     * Lock all queues related to the event channel (in case of a queue change
>> +     * this might be two).
>> +     * It is mandatory to do that before setting and testing the PENDING bit
>> +     * and to hold the current queue lock until the event has put into the
> 
> "has been" or "was" I think?

"has been".

> 
> With adjustments along these lines (which I guess could again be
> done while committing) or reasons against supplied
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks.

> 
> One aspect which I wonder whether it wants adding to the description
> is that this change makes a bad situation worse: Back at the time
> per-channel locks were added to avoid the bottleneck of the per-
> domain event lock. While a per-queue lock's scope at least isn't the
> entire domain, their use on the send path still has been preventing
> full parallelism here. And this patch widens the lock holding region.

As the description already says that additional operations are to be
guarded by the lock I think it is rather clear that the lock holding
region is being widened.

OTOH I wouldn't reject such an addition to the commit message if you
think it is required.


Juergen
Julien Grall Nov. 27, 2020, 1:58 p.m. UTC | #3
Hi Juergen,

On 25/11/2020 10:51, Juergen Gross wrote:
> -static struct evtchn_fifo_queue *lock_old_queue(const struct domain *d,
> -                                                struct evtchn *evtchn,
> -                                                unsigned long *flags)
> -{
> -    struct vcpu *v;
> -    struct evtchn_fifo_queue *q, *old_q;
> -    unsigned int try;
> -    union evtchn_fifo_lastq lastq;
> -
> -    for ( try = 0; try < 3; try++ )
> -    {
> -        lastq.raw = read_atomic(&evtchn->fifo_lastq);
> -        v = d->vcpu[lastq.last_vcpu_id];
> -        old_q = &v->evtchn_fifo->queue[lastq.last_priority];
> -
> -        spin_lock_irqsave(&old_q->lock, *flags);
> -
> -        v = d->vcpu[lastq.last_vcpu_id];
> -        q = &v->evtchn_fifo->queue[lastq.last_priority];
> -
> -        if ( old_q == q )
> -            return old_q;
> -
> -        spin_unlock_irqrestore(&old_q->lock, *flags);
> -    }
> -
> -    gprintk(XENLOG_WARNING,
> -            "dom%d port %d lost event (too many queue changes)\n",
> -            d->domain_id, evtchn->port);
> -    return NULL;
> -}
> -
>   static int try_set_link(event_word_t *word, event_word_t *w, uint32_t link)
>   {
>       event_word_t new, old;
> @@ -190,6 +158,9 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
>       event_word_t *word;
>       unsigned long flags;
>       bool_t was_pending;
> +    struct evtchn_fifo_queue *q, *old_q;
> +    unsigned int try;
> +    bool linked = true;
>   
>       port = evtchn->port;
>       word = evtchn_fifo_word_from_port(d, port);
> @@ -204,17 +175,67 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
>           return;
>       }
>   
> +    /*
> +     * Lock all queues related to the event channel (in case of a queue change
> +     * this might be two).
> +     * It is mandatory to do that before setting and testing the PENDING bit
> +     * and to hold the current queue lock until the event has put into the
> +     * list of pending events in order to avoid waking up a guest without the
> +     * event being visibly pending in the guest.
> +     */
> +    for ( try = 0; try < 4; try++ )

May I ask why the number of try is 4 rather than the original 3?

> +    {
> +        union evtchn_fifo_lastq lastq;
> +        const struct vcpu *old_v;
> +
> +        lastq.raw = read_atomic(&evtchn->fifo_lastq);
> +        old_v = d->vcpu[lastq.last_vcpu_id];
> +
> +        q = &v->evtchn_fifo->queue[evtchn->priority];
> +        old_q = &old_v->evtchn_fifo->queue[lastq.last_priority];
> +
> +        if ( q == old_q )
> +            spin_lock_irqsave(&q->lock, flags);
> +        else if ( q < old_q )
> +        {
> +            spin_lock_irqsave(&q->lock, flags);
> +            spin_lock(&old_q->lock);
> +        }
> +        else
> +        {
> +            spin_lock_irqsave(&old_q->lock, flags);
> +            spin_lock(&q->lock);
> +        }
> +
> +        lastq.raw = read_atomic(&evtchn->fifo_lastq);
> +        old_v = d->vcpu[lastq.last_vcpu_id];
> +        if ( q == &v->evtchn_fifo->queue[evtchn->priority] &&
> +             old_q == &old_v->evtchn_fifo->queue[lastq.last_priority] )
> +            break;
> +
> +        if ( q != old_q )
> +            spin_unlock(&old_q->lock);
> +        spin_unlock_irqrestore(&q->lock, flags);
> +    }
> +
>       was_pending = guest_test_and_set_bit(d, EVTCHN_FIFO_PENDING, word);
>   
> +    /* If we didn't get the lock bail out. */
> +    if ( try == 4 )
> +    {
> +        gprintk(XENLOG_WARNING,
> +                "dom%d port %d lost event (too many queue changes)\n",
> +                d->domain_id, evtchn->port);

NIT: You can use %pd in place of dom%d.

Cheers,
Jan Beulich Nov. 27, 2020, 2:03 p.m. UTC | #4
On 27.11.2020 14:58, Julien Grall wrote:
> On 25/11/2020 10:51, Juergen Gross wrote:
>> +    /* If we didn't get the lock bail out. */
>> +    if ( try == 4 )
>> +    {
>> +        gprintk(XENLOG_WARNING,
>> +                "dom%d port %d lost event (too many queue changes)\n",
>> +                d->domain_id, evtchn->port);
> 
> NIT: You can use %pd use in place of dom%d.

Oh, indeed - not just can, but imo really should. I'll record this
for on-commit adjustment, unless a v9 becomes necessary anyway.

Jan
Jürgen Groß Nov. 27, 2020, 2:05 p.m. UTC | #5
On 27.11.20 14:58, Julien Grall wrote:
> Hi Juergen,
> 
> On 25/11/2020 10:51, Juergen Gross wrote:
>> -static struct evtchn_fifo_queue *lock_old_queue(const struct domain *d,
>> -                                                struct evtchn *evtchn,
>> -                                                unsigned long *flags)
>> -{
>> -    struct vcpu *v;
>> -    struct evtchn_fifo_queue *q, *old_q;
>> -    unsigned int try;
>> -    union evtchn_fifo_lastq lastq;
>> -
>> -    for ( try = 0; try < 3; try++ )
>> -    {
>> -        lastq.raw = read_atomic(&evtchn->fifo_lastq);
>> -        v = d->vcpu[lastq.last_vcpu_id];
>> -        old_q = &v->evtchn_fifo->queue[lastq.last_priority];
>> -
>> -        spin_lock_irqsave(&old_q->lock, *flags);
>> -
>> -        v = d->vcpu[lastq.last_vcpu_id];
>> -        q = &v->evtchn_fifo->queue[lastq.last_priority];
>> -
>> -        if ( old_q == q )
>> -            return old_q;
>> -
>> -        spin_unlock_irqrestore(&old_q->lock, *flags);
>> -    }
>> -
>> -    gprintk(XENLOG_WARNING,
>> -            "dom%d port %d lost event (too many queue changes)\n",
>> -            d->domain_id, evtchn->port);
>> -    return NULL;
>> -}
>> -
>>   static int try_set_link(event_word_t *word, event_word_t *w, 
>> uint32_t link)
>>   {
>>       event_word_t new, old;
>> @@ -190,6 +158,9 @@ static void evtchn_fifo_set_pending(struct vcpu 
>> *v, struct evtchn *evtchn)
>>       event_word_t *word;
>>       unsigned long flags;
>>       bool_t was_pending;
>> +    struct evtchn_fifo_queue *q, *old_q;
>> +    unsigned int try;
>> +    bool linked = true;
>>       port = evtchn->port;
>>       word = evtchn_fifo_word_from_port(d, port);
>> @@ -204,17 +175,67 @@ static void evtchn_fifo_set_pending(struct vcpu 
>> *v, struct evtchn *evtchn)
>>           return;
>>       }
>> +    /*
>> +     * Lock all queues related to the event channel (in case of a 
>> queue change
>> +     * this might be two).
>> +     * It is mandatory to do that before setting and testing the 
>> PENDING bit
>> +     * and to hold the current queue lock until the event has put 
>> into the
>> +     * list of pending events in order to avoid waking up a guest 
>> without the
>> +     * event being visibly pending in the guest.
>> +     */
>> +    for ( try = 0; try < 4; try++ )
> 
> May I ask why the number of try is 4 rather than the original 3?

Oh, I think this is just a typo. OTOH it doesn't really matter.

> 
>> +    {
>> +        union evtchn_fifo_lastq lastq;
>> +        const struct vcpu *old_v;
>> +
>> +        lastq.raw = read_atomic(&evtchn->fifo_lastq);
>> +        old_v = d->vcpu[lastq.last_vcpu_id];
>> +
>> +        q = &v->evtchn_fifo->queue[evtchn->priority];
>> +        old_q = &old_v->evtchn_fifo->queue[lastq.last_priority];
>> +
>> +        if ( q == old_q )
>> +            spin_lock_irqsave(&q->lock, flags);
>> +        else if ( q < old_q )
>> +        {
>> +            spin_lock_irqsave(&q->lock, flags);
>> +            spin_lock(&old_q->lock);
>> +        }
>> +        else
>> +        {
>> +            spin_lock_irqsave(&old_q->lock, flags);
>> +            spin_lock(&q->lock);
>> +        }
>> +
>> +        lastq.raw = read_atomic(&evtchn->fifo_lastq);
>> +        old_v = d->vcpu[lastq.last_vcpu_id];
>> +        if ( q == &v->evtchn_fifo->queue[evtchn->priority] &&
>> +             old_q == &old_v->evtchn_fifo->queue[lastq.last_priority] )
>> +            break;
>> +
>> +        if ( q != old_q )
>> +            spin_unlock(&old_q->lock);
>> +        spin_unlock_irqrestore(&q->lock, flags);
>> +    }
>> +
>>       was_pending = guest_test_and_set_bit(d, EVTCHN_FIFO_PENDING, word);
>> +    /* If we didn't get the lock bail out. */
>> +    if ( try == 4 )
>> +    {
>> +        gprintk(XENLOG_WARNING,
>> +                "dom%d port %d lost event (too many queue changes)\n",
>> +                d->domain_id, evtchn->port);
> 
> NIT: You can use %pd use in place of dom%d.

Yes, indeed. This was just moved around. :-)


Juergen
Julien Grall Nov. 27, 2020, 2:11 p.m. UTC | #6
Hi Juergen,

On 27/11/2020 14:05, Jürgen Groß wrote:
> On 27.11.20 14:58, Julien Grall wrote:
>> Hi Juergen,
>>
>> On 25/11/2020 10:51, Juergen Gross wrote:
>>> -static struct evtchn_fifo_queue *lock_old_queue(const struct domain *d,
>>> -                                                struct evtchn *evtchn,
>>> -                                                unsigned long *flags)
>>> -{
>>> -    struct vcpu *v;
>>> -    struct evtchn_fifo_queue *q, *old_q;
>>> -    unsigned int try;
>>> -    union evtchn_fifo_lastq lastq;
>>> -
>>> -    for ( try = 0; try < 3; try++ )
>>> -    {
>>> -        lastq.raw = read_atomic(&evtchn->fifo_lastq);
>>> -        v = d->vcpu[lastq.last_vcpu_id];
>>> -        old_q = &v->evtchn_fifo->queue[lastq.last_priority];
>>> -
>>> -        spin_lock_irqsave(&old_q->lock, *flags);
>>> -
>>> -        v = d->vcpu[lastq.last_vcpu_id];
>>> -        q = &v->evtchn_fifo->queue[lastq.last_priority];
>>> -
>>> -        if ( old_q == q )
>>> -            return old_q;
>>> -
>>> -        spin_unlock_irqrestore(&old_q->lock, *flags);
>>> -    }
>>> -
>>> -    gprintk(XENLOG_WARNING,
>>> -            "dom%d port %d lost event (too many queue changes)\n",
>>> -            d->domain_id, evtchn->port);
>>> -    return NULL;
>>> -}
>>> -
>>>   static int try_set_link(event_word_t *word, event_word_t *w, 
>>> uint32_t link)
>>>   {
>>>       event_word_t new, old;
>>> @@ -190,6 +158,9 @@ static void evtchn_fifo_set_pending(struct vcpu 
>>> *v, struct evtchn *evtchn)
>>>       event_word_t *word;
>>>       unsigned long flags;
>>>       bool_t was_pending;
>>> +    struct evtchn_fifo_queue *q, *old_q;
>>> +    unsigned int try;
>>> +    bool linked = true;
>>>       port = evtchn->port;
>>>       word = evtchn_fifo_word_from_port(d, port);
>>> @@ -204,17 +175,67 @@ static void evtchn_fifo_set_pending(struct vcpu 
>>> *v, struct evtchn *evtchn)
>>>           return;
>>>       }
>>> +    /*
>>> +     * Lock all queues related to the event channel (in case of a 
>>> queue change
>>> +     * this might be two).
>>> +     * It is mandatory to do that before setting and testing the 
>>> PENDING bit
>>> +     * and to hold the current queue lock until the event has put 
>>> into the
>>> +     * list of pending events in order to avoid waking up a guest 
>>> without the
>>> +     * event being visibly pending in the guest.
>>> +     */
>>> +    for ( try = 0; try < 4; try++ )
>>
>> May I ask why the number of try is 4 rather than the original 3?
> 
> Oh, I think this is just a typo. OTOH it doesn't really matter.

I agree that the number of tries was likely arbitrary and therefore 
using a different number should not matter.

However, it makes the patch more difficult to review, because it is an 
unexplained change.

I would prefer if this is dropped. But if you want to keep this change, 
then it should be explained in the commit message.

Cheers,
Jürgen Groß Nov. 27, 2020, 2:14 p.m. UTC | #7
On 27.11.20 15:11, Julien Grall wrote:
> Hi Juergen,
> 
> On 27/11/2020 14:05, Jürgen Groß wrote:
>> On 27.11.20 14:58, Julien Grall wrote:
>>> Hi Juergen,
>>>
>>> On 25/11/2020 10:51, Juergen Gross wrote:
>>>> -static struct evtchn_fifo_queue *lock_old_queue(const struct domain 
>>>> *d,
>>>> -                                                struct evtchn *evtchn,
>>>> -                                                unsigned long *flags)
>>>> -{
>>>> -    struct vcpu *v;
>>>> -    struct evtchn_fifo_queue *q, *old_q;
>>>> -    unsigned int try;
>>>> -    union evtchn_fifo_lastq lastq;
>>>> -
>>>> -    for ( try = 0; try < 3; try++ )
>>>> -    {
>>>> -        lastq.raw = read_atomic(&evtchn->fifo_lastq);
>>>> -        v = d->vcpu[lastq.last_vcpu_id];
>>>> -        old_q = &v->evtchn_fifo->queue[lastq.last_priority];
>>>> -
>>>> -        spin_lock_irqsave(&old_q->lock, *flags);
>>>> -
>>>> -        v = d->vcpu[lastq.last_vcpu_id];
>>>> -        q = &v->evtchn_fifo->queue[lastq.last_priority];
>>>> -
>>>> -        if ( old_q == q )
>>>> -            return old_q;
>>>> -
>>>> -        spin_unlock_irqrestore(&old_q->lock, *flags);
>>>> -    }
>>>> -
>>>> -    gprintk(XENLOG_WARNING,
>>>> -            "dom%d port %d lost event (too many queue changes)\n",
>>>> -            d->domain_id, evtchn->port);
>>>> -    return NULL;
>>>> -}
>>>> -
>>>>   static int try_set_link(event_word_t *word, event_word_t *w, 
>>>> uint32_t link)
>>>>   {
>>>>       event_word_t new, old;
>>>> @@ -190,6 +158,9 @@ static void evtchn_fifo_set_pending(struct vcpu 
>>>> *v, struct evtchn *evtchn)
>>>>       event_word_t *word;
>>>>       unsigned long flags;
>>>>       bool_t was_pending;
>>>> +    struct evtchn_fifo_queue *q, *old_q;
>>>> +    unsigned int try;
>>>> +    bool linked = true;
>>>>       port = evtchn->port;
>>>>       word = evtchn_fifo_word_from_port(d, port);
>>>> @@ -204,17 +175,67 @@ static void evtchn_fifo_set_pending(struct 
>>>> vcpu *v, struct evtchn *evtchn)
>>>>           return;
>>>>       }
>>>> +    /*
>>>> +     * Lock all queues related to the event channel (in case of a 
>>>> queue change
>>>> +     * this might be two).
>>>> +     * It is mandatory to do that before setting and testing the 
>>>> PENDING bit
>>>> +     * and to hold the current queue lock until the event has put 
>>>> into the
>>>> +     * list of pending events in order to avoid waking up a guest 
>>>> without the
>>>> +     * event being visibly pending in the guest.
>>>> +     */
>>>> +    for ( try = 0; try < 4; try++ )
>>>
>>> May I ask why the number of try is 4 rather than the original 3?
>>
>> Oh, I think this is just a typo. OTOH it doesn't really matter.
> 
> I agree that the number of try was likely random and therefore using a 
> different number should not matter.
> 
> However, this is making more difficult to review the patch because this 
> is an unexplained change.
> 
> I would prefer if this is dropped. But if you want to keep this change, 
> then it should be explained in the commit message.

Well, I could argue that there is potentially one lock more to take, so
the retry number is increased by one, too. ;-)

I think we can just switch back to 3.


Juergen
Julien Grall Nov. 27, 2020, 2:26 p.m. UTC | #8
On 27/11/2020 14:14, Jürgen Groß wrote:
> On 27.11.20 15:11, Julien Grall wrote:
>> Hi Juergen,
>>
>> On 27/11/2020 14:05, Jürgen Groß wrote:
>>> On 27.11.20 14:58, Julien Grall wrote:
>>>> Hi Juergen,
>>>>
>>>> On 25/11/2020 10:51, Juergen Gross wrote:
>>>>> -static struct evtchn_fifo_queue *lock_old_queue(const struct 
>>>>> domain *d,
>>>>> -                                                struct evtchn 
>>>>> *evtchn,
>>>>> -                                                unsigned long *flags)
>>>>> -{
>>>>> -    struct vcpu *v;
>>>>> -    struct evtchn_fifo_queue *q, *old_q;
>>>>> -    unsigned int try;
>>>>> -    union evtchn_fifo_lastq lastq;
>>>>> -
>>>>> -    for ( try = 0; try < 3; try++ )
>>>>> -    {
>>>>> -        lastq.raw = read_atomic(&evtchn->fifo_lastq);
>>>>> -        v = d->vcpu[lastq.last_vcpu_id];
>>>>> -        old_q = &v->evtchn_fifo->queue[lastq.last_priority];
>>>>> -
>>>>> -        spin_lock_irqsave(&old_q->lock, *flags);
>>>>> -
>>>>> -        v = d->vcpu[lastq.last_vcpu_id];
>>>>> -        q = &v->evtchn_fifo->queue[lastq.last_priority];
>>>>> -
>>>>> -        if ( old_q == q )
>>>>> -            return old_q;
>>>>> -
>>>>> -        spin_unlock_irqrestore(&old_q->lock, *flags);
>>>>> -    }
>>>>> -
>>>>> -    gprintk(XENLOG_WARNING,
>>>>> -            "dom%d port %d lost event (too many queue changes)\n",
>>>>> -            d->domain_id, evtchn->port);
>>>>> -    return NULL;
>>>>> -}
>>>>> -
>>>>>   static int try_set_link(event_word_t *word, event_word_t *w, 
>>>>> uint32_t link)
>>>>>   {
>>>>>       event_word_t new, old;
>>>>> @@ -190,6 +158,9 @@ static void evtchn_fifo_set_pending(struct vcpu 
>>>>> *v, struct evtchn *evtchn)
>>>>>       event_word_t *word;
>>>>>       unsigned long flags;
>>>>>       bool_t was_pending;
>>>>> +    struct evtchn_fifo_queue *q, *old_q;
>>>>> +    unsigned int try;
>>>>> +    bool linked = true;
>>>>>       port = evtchn->port;
>>>>>       word = evtchn_fifo_word_from_port(d, port);
>>>>> @@ -204,17 +175,67 @@ static void evtchn_fifo_set_pending(struct 
>>>>> vcpu *v, struct evtchn *evtchn)
>>>>>           return;
>>>>>       }
>>>>> +    /*
>>>>> +     * Lock all queues related to the event channel (in case of a 
>>>>> queue change
>>>>> +     * this might be two).
>>>>> +     * It is mandatory to do that before setting and testing the 
>>>>> PENDING bit
>>>>> +     * and to hold the current queue lock until the event has put 
>>>>> into the
>>>>> +     * list of pending events in order to avoid waking up a guest 
>>>>> without the
>>>>> +     * event being visibly pending in the guest.
>>>>> +     */
>>>>> +    for ( try = 0; try < 4; try++ )
>>>>
>>>> May I ask why the number of try is 4 rather than the original 3?
>>>
>>> Oh, I think this is just a typo. OTOH it doesn't really matter.
>>
>> I agree that the number of try was likely random and therefore using a 
>> different number should not matter.
>>
>> However, this is making more difficult to review the patch because 
>> this is an unexplained change.
>>
>> I would prefer if this is dropped. But if you want to keep this 
>> change, then it should be explained in the commit message.
> 
> Well, I could argue that there is potentially one lock more to take, so
> the retry number is increased by one, too. ;-)

I will not argue against switching to 4 :). I care more about 
explaining what we do, because it is really frustrating to read some 
of the older commits where there is not much rationale (I probably 
wrote some).

Cheers,

Patch

diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
index f39e61105f..443593c3b3 100644
--- a/xen/common/event_fifo.c
+++ b/xen/common/event_fifo.c
@@ -87,38 +87,6 @@  static void evtchn_fifo_init(struct domain *d, struct evtchn *evtchn)
                  d->domain_id, evtchn->port);
 }
 
-static struct evtchn_fifo_queue *lock_old_queue(const struct domain *d,
-                                                struct evtchn *evtchn,
-                                                unsigned long *flags)
-{
-    struct vcpu *v;
-    struct evtchn_fifo_queue *q, *old_q;
-    unsigned int try;
-    union evtchn_fifo_lastq lastq;
-
-    for ( try = 0; try < 3; try++ )
-    {
-        lastq.raw = read_atomic(&evtchn->fifo_lastq);
-        v = d->vcpu[lastq.last_vcpu_id];
-        old_q = &v->evtchn_fifo->queue[lastq.last_priority];
-
-        spin_lock_irqsave(&old_q->lock, *flags);
-
-        v = d->vcpu[lastq.last_vcpu_id];
-        q = &v->evtchn_fifo->queue[lastq.last_priority];
-
-        if ( old_q == q )
-            return old_q;
-
-        spin_unlock_irqrestore(&old_q->lock, *flags);
-    }
-
-    gprintk(XENLOG_WARNING,
-            "dom%d port %d lost event (too many queue changes)\n",
-            d->domain_id, evtchn->port);
-    return NULL;
-}
-
 static int try_set_link(event_word_t *word, event_word_t *w, uint32_t link)
 {
     event_word_t new, old;
@@ -190,6 +158,9 @@  static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
     event_word_t *word;
     unsigned long flags;
     bool_t was_pending;
+    struct evtchn_fifo_queue *q, *old_q;
+    unsigned int try;
+    bool linked = true;
 
     port = evtchn->port;
     word = evtchn_fifo_word_from_port(d, port);
@@ -204,17 +175,67 @@  static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
         return;
     }
 
+    /*
+     * Lock all queues related to the event channel (in case of a queue change
+     * this might be two).
+     * It is mandatory to do that before setting and testing the PENDING bit
+     * and to hold the current queue lock until the event has put into the
+     * list of pending events in order to avoid waking up a guest without the
+     * event being visibly pending in the guest.
+     */
+    for ( try = 0; try < 4; try++ )
+    {
+        union evtchn_fifo_lastq lastq;
+        const struct vcpu *old_v;
+
+        lastq.raw = read_atomic(&evtchn->fifo_lastq);
+        old_v = d->vcpu[lastq.last_vcpu_id];
+
+        q = &v->evtchn_fifo->queue[evtchn->priority];
+        old_q = &old_v->evtchn_fifo->queue[lastq.last_priority];
+
+        if ( q == old_q )
+            spin_lock_irqsave(&q->lock, flags);
+        else if ( q < old_q )
+        {
+            spin_lock_irqsave(&q->lock, flags);
+            spin_lock(&old_q->lock);
+        }
+        else
+        {
+            spin_lock_irqsave(&old_q->lock, flags);
+            spin_lock(&q->lock);
+        }
+
+        lastq.raw = read_atomic(&evtchn->fifo_lastq);
+        old_v = d->vcpu[lastq.last_vcpu_id];
+        if ( q == &v->evtchn_fifo->queue[evtchn->priority] &&
+             old_q == &old_v->evtchn_fifo->queue[lastq.last_priority] )
+            break;
+
+        if ( q != old_q )
+            spin_unlock(&old_q->lock);
+        spin_unlock_irqrestore(&q->lock, flags);
+    }
+
     was_pending = guest_test_and_set_bit(d, EVTCHN_FIFO_PENDING, word);
 
+    /* If we didn't get the lock bail out. */
+    if ( try == 4 )
+    {
+        gprintk(XENLOG_WARNING,
+                "dom%d port %d lost event (too many queue changes)\n",
+                d->domain_id, evtchn->port);
+        goto done;
+    }
+
     /*
      * Link the event if it unmasked and not already linked.
      */
     if ( !guest_test_bit(d, EVTCHN_FIFO_MASKED, word) &&
          !guest_test_bit(d, EVTCHN_FIFO_LINKED, word) )
     {
-        struct evtchn_fifo_queue *q, *old_q;
         event_word_t *tail_word;
-        bool_t linked = 0;
 
         /*
          * Control block not mapped.  The guest must not unmask an
@@ -225,25 +246,11 @@  static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
         {
             printk(XENLOG_G_WARNING
                    "%pv has no FIFO event channel control block\n", v);
-            goto done;
+            goto unlock;
         }
 
-        /*
-         * No locking around getting the queue. This may race with
-         * changing the priority but we are allowed to signal the
-         * event once on the old priority.
-         */
-        q = &v->evtchn_fifo->queue[evtchn->priority];
-
-        old_q = lock_old_queue(d, evtchn, &flags);
-        if ( !old_q )
-            goto done;
-
         if ( guest_test_and_set_bit(d, EVTCHN_FIFO_LINKED, word) )
-        {
-            spin_unlock_irqrestore(&old_q->lock, flags);
-            goto done;
-        }
+            goto unlock;
 
         /*
          * If this event was a tail, the old queue is now empty and
@@ -262,8 +269,8 @@  static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
             lastq.last_priority = q->priority;
             write_atomic(&evtchn->fifo_lastq, lastq.raw);
 
-            spin_unlock_irqrestore(&old_q->lock, flags);
-            spin_lock_irqsave(&q->lock, flags);
+            spin_unlock(&old_q->lock);
+            old_q = q;
         }
 
         /*
@@ -276,6 +283,7 @@  static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
          * If the queue is empty (i.e., we haven't linked to the new
          * event), head must be updated.
          */
+        linked = false;
         if ( q->tail )
         {
             tail_word = evtchn_fifo_word_from_port(d, q->tail);
@@ -284,15 +292,19 @@  static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
         if ( !linked )
             write_atomic(q->head, port);
         q->tail = port;
+    }
 
-        spin_unlock_irqrestore(&q->lock, flags);
+ unlock:
+    if ( q != old_q )
+        spin_unlock(&old_q->lock);
+    spin_unlock_irqrestore(&q->lock, flags);
 
-        if ( !linked
-             && !guest_test_and_set_bit(d, q->priority,
-                                        &v->evtchn_fifo->control_block->ready) )
-            vcpu_mark_events_pending(v);
-    }
  done:
+    if ( !linked &&
+         !guest_test_and_set_bit(d, q->priority,
+                                 &v->evtchn_fifo->control_block->ready) )
+        vcpu_mark_events_pending(v);
+
     if ( !was_pending )
         evtchn_check_pollers(d, port);
 }