
[v9,0/5] irqfd fixes and enhancements

Message ID 4A521082.40209@novell.com (mailing list archive)
State New, archived

Commit Message

Gregory Haskins July 6, 2009, 2:56 p.m. UTC
Avi Kivity wrote:
> On 07/02/2009 06:50 PM, Avi Kivity wrote:
>> On 07/02/2009 06:37 PM, Gregory Haskins wrote:
>>> (Applies to kvm.git/master:1f9050fd)
>>>
>>> The following is the latest attempt to fix the races in
>>> irqfd/eventfd, as
>>> well as restore DEASSIGN support.  For more details, please read the
>>> patch
>>> headers.
>>>
>>> As always, this series has been tested against the kvm-eventfd unit
>>> test
>>> and everything appears to be functioning properly. You can download
>>> this
>>> test here:
>>
>> Applied, thanks.
>>
>
> ... and unapplied.  There's a refcounting mismatch in irqfd_cleanup: a
> reference is taken for each irqfd, but dropped for each guest.  This
> causes an oops if a guest with no irqfds is created and destroyed:

I was able to reproduce this issue.  The problem turned out to be that I
inadvertently always did a flush_workqueue(), even if the work-queue was
never initialized.   

The following interdiff applied to the reverted patch has been confirmed
to fix the issue:

-------------------


---------------------

You can pick up this fix folded into the original v9:5/5 patch here:

git pull
git://git.kernel.org/pub/scm/linux/kernel/git/ghaskins/linux-2.6-hacks.git
for-avi

Sorry for the sloppy patch in v9. :(  Will strive to do better next time.

Regards,
-Greg

Comments

Michael S. Tsirkin July 6, 2009, 4:13 p.m. UTC | #1
On Mon, Jul 06, 2009 at 10:56:02AM -0400, Gregory Haskins wrote:
> Avi Kivity wrote:
> > On 07/02/2009 06:50 PM, Avi Kivity wrote:
> >> On 07/02/2009 06:37 PM, Gregory Haskins wrote:
> >>> (Applies to kvm.git/master:1f9050fd)
> >>>
> >>> The following is the latest attempt to fix the races in
> >>> irqfd/eventfd, as
> >>> well as restore DEASSIGN support.  For more details, please read the
> >>> patch
> >>> headers.
> >>>
> >>> As always, this series has been tested against the kvm-eventfd unit
> >>> test
> >>> and everything appears to be functioning properly. You can download
> >>> this
> >>> test here:
> >>
> >> Applied, thanks.
> >>
> >
> > ... and unapplied.  There's a refcounting mismatch in irqfd_cleanup: a
> > reference is taken for each irqfd, but dropped for each guest.  This
> > causes an oops if a guest with no irqfds is created and destroyed:
> 
> I was able to reproduce this issue.  The problem turned out to be that I
> inadvertently always did a flush_workqueue(), even if the work-queue was
> never initialized.   
> 
> The following interdiff applied to the reverted patch has been confirmed
> to fix the issue:

Could you document the init boolean and its locking rules?
The best place to put it would be where the field is declared btw.
Is it true that init === list_empty(&kvm->irqfds.items)?
If yes maybe we don't need this field at all.


> -------------------
> 
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index fcc3469..52b0e04 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -318,6 +318,9 @@ kvm_irqfd_deassign(struct kvm *kvm, int fd, int gsi)
>         struct _irqfd *irqfd, *tmp;
>         struct eventfd_ctx *eventfd;
>  
> +       if (!kvm->irqfds.init)
> +               return -ENOENT;
> +
>         eventfd = eventfd_ctx_fdget(fd);
>         if (IS_ERR(eventfd))
>                 return PTR_ERR(eventfd);

wouldn't it be cleaner to error out in the for each loop if we don't
find an entry to deactivate?  Might be helpful for apps to get an error
if they didn't deassign anything.

> @@ -360,6 +363,9 @@ kvm_irqfd_release(struct kvm *kvm)
>  {
>         struct _irqfd *irqfd, *tmp;
>  
> +       if (!kvm->irqfds.init)
> +               return;
> +

So here, I recall some old comment that flush below was
needed even if list is empty. Is this no longer true?
If not it might be cleaner to only flush if list is not empty.


>         spin_lock_irq(&kvm->irqfds.lock);
>  
>         list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds.items, list)
> 
> ---------------------
> 
> You can pick up this fix folded into the original v9:5/5 patch here:
> 
> git pull
> git://git.kernel.org/pub/scm/linux/kernel/git/ghaskins/linux-2.6-hacks.git
> for-avi
> 
> Sorry for the sloppy patch in v9. :(  Will strive to do better next time.
> 
> Regards,
> -Greg
> 


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gregory Haskins July 6, 2009, 4:41 p.m. UTC | #2
Michael S. Tsirkin wrote:
> On Mon, Jul 06, 2009 at 10:56:02AM -0400, Gregory Haskins wrote:
>   
>> Avi Kivity wrote:
>>     
>>> On 07/02/2009 06:50 PM, Avi Kivity wrote:
>>>       
>>>> On 07/02/2009 06:37 PM, Gregory Haskins wrote:
>>>>         
>>>>> (Applies to kvm.git/master:1f9050fd)
>>>>>
>>>>> The following is the latest attempt to fix the races in
>>>>> irqfd/eventfd, as
>>>>> well as restore DEASSIGN support.  For more details, please read the
>>>>> patch
>>>>> headers.
>>>>>
>>>>> As always, this series has been tested against the kvm-eventfd unit
>>>>> test
>>>>> and everything appears to be functioning properly. You can download
>>>>> this
>>>>> test here:
>>>>>           
>>>> Applied, thanks.
>>>>
>>>>         
>>> ... and unapplied.  There's a refcounting mismatch in irqfd_cleanup: a
>>> reference is taken for each irqfd, but dropped for each guest.  This
>>> causes an oops if a guest with no irqfds is created and destroyed:
>>>       
>> I was able to reproduce this issue.  The problem turned out to be that I
>> inadvertently always did a flush_workqueue(), even if the work-queue was
>> never initialized.   
>>
>> The following interdiff applied to the reverted patch has been confirmed
>> to fix the issue:
>>     
>
> Could you document the init boolean and its locking rules?
> The best place to put it would be where the field is declared btw.
>   

Will do

> Is it true that init === list_empty(&kvm->irqfds.items)?
> If yes maybe we don't need this field at all.
>
>   
No, because it's more difficult to maintain the work-queue when
referenced against active irqfds (*).  So instead, it's maintained
against guests that use irqfd, whether they have an active irqfd or
not.  Otherwise you have to contend with the eventfd-side release, which
is a little tricky.

(*) I'm sure it's not rocket science to get this working, but it was
getting more complex than I thought it was worth, so I simplified the
model to be per-vm.  Note that this design decision/limitation is
declared in the patch header.
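
The distinction between "this VM has used irqfd" and "the list is currently non-empty" can be illustrated with a small userspace model. This is only a sketch: the `model_*` names and the struct layout are hypothetical and are not taken from virt/kvm/eventfd.c; only the idea of an `init` flag kept separately from the `items` list comes from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one irqfd entry. */
struct model_irqfd {
    struct model_irqfd *next;
    int fd;
    int gsi;
};

/* Hypothetical stand-in for the per-VM irqfd state. */
struct model_irqfds {
    bool init;                 /* set on first use, never cleared here */
    struct model_irqfd *items; /* currently active entries */
};

/* First assign marks the VM as an irqfd user and links the entry. */
static void model_assign(struct model_irqfds *s, struct model_irqfd *n)
{
    assert(s != NULL && n != NULL);
    s->init = true;
    n->next = s->items;
    s->items = n;
}

/* Unlink one entry; 'init' is deliberately left untouched. */
static void model_remove(struct model_irqfds *s, struct model_irqfd *n)
{
    struct model_irqfd **pp = &s->items;

    while (*pp && *pp != n)
        pp = &(*pp)->next;
    if (*pp)
        *pp = n->next;
}

static bool model_list_empty(const struct model_irqfds *s)
{
    return s->items == NULL;
}
```

After one assign followed by one remove, the model ends up with `init == true` and an empty list, which is the state that a plain list_empty() test cannot tell apart from "never used".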
>   
>> -------------------
>>
>> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
>> index fcc3469..52b0e04 100644
>> --- a/virt/kvm/eventfd.c
>> +++ b/virt/kvm/eventfd.c
>> @@ -318,6 +318,9 @@ kvm_irqfd_deassign(struct kvm *kvm, int fd, int gsi)
>>         struct _irqfd *irqfd, *tmp;
>>         struct eventfd_ctx *eventfd;
>>  
>> +       if (!kvm->irqfds.init)
>> +               return -ENOENT;
>> +
>>         eventfd = eventfd_ctx_fdget(fd);
>>         if (IS_ERR(eventfd))
>>                 return PTR_ERR(eventfd);
>>     
>
> wouldn't it be cleaner to error out in the for each loop if we don't
> find an entry to deactivate?  Might be helpful for apps to get an error
> if they didn't deassign anything.
>   

Again, irqfds.init is somewhat orthogonal to whether the list is
populated or not.  This check is for sanity (how can you deassign if you
didn't assign, etc).  Normally this would be a simple BUG_ON() sanity
check, but I don't want a malicious/broken userspace to gain an easy
attack vector ;)

>   
>> @@ -360,6 +363,9 @@ kvm_irqfd_release(struct kvm *kvm)
>>  {
>>         struct _irqfd *irqfd, *tmp;
>>  
>> +       if (!kvm->irqfds.init)
>> +               return;
>> +
>>     
>
> So here, I recall some old comment that flush below was
> needed even if list is empty. Is this no longer true?
>   

If you are using irqfd, it's true.  If irqfds.init == false, you are not
using irqfd and thus the flush cannot be needed.

> If not it might be cleaner to only flush if list is not empty.
>
>   
You have to flush if irqfds.init == true even if the list is empty
because you need to be sure that eventfd-side releases complete.  They
may have already removed themselves from the list, but the work-item is
still in flight.

Regards,
-Greg
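
The release-path rule described above (skip everything when irqfd was never used, but flush even with an empty list once it was) can be sketched in userspace as follows. All names here (`model_vm`, `model_wq`, `model_release`) are hypothetical, and the "flush" is just a counter reset standing in for flush_workqueue(); this is an illustration of the reasoning, not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work-queue model: counts release work items in flight. */
struct model_wq {
    int in_flight;
};

/* Hypothetical per-VM state. */
struct model_vm {
    bool init;           /* true once the VM has used irqfd */
    struct model_wq *wq; /* NULL until first irqfd use (assumption) */
    int active;          /* entries still on the list */
};

/*
 * Returns the number of work items drained by the flush,
 * or -1 if the VM never used irqfd and nothing was touched.
 */
static int model_release(struct model_vm *vm)
{
    int drained;

    if (!vm->init)
        return -1; /* no work-queue exists, so there is nothing to flush */

    assert(vm->wq != NULL); /* init implies the work-queue was set up */

    /* Deactivate every listed entry; each one queues a release item. */
    vm->wq->in_flight += vm->active;
    vm->active = 0;

    /*
     * Flush even if the list was empty: an eventfd-side release may
     * already have unlinked itself while its work item is still queued.
     */
    drained = vm->wq->in_flight;
    vm->wq->in_flight = 0;
    return drained;
}
```

In this model a VM with `init == true`, `active == 0`, and one item in flight still drains that item, while a VM with `init == false` returns early without dereferencing the NULL work-queue pointer.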
Michael S. Tsirkin July 6, 2009, 4:49 p.m. UTC | #3
On Mon, Jul 06, 2009 at 12:41:59PM -0400, Gregory Haskins wrote:
> Michael S. Tsirkin wrote:
> > On Mon, Jul 06, 2009 at 10:56:02AM -0400, Gregory Haskins wrote:
> >   
> >> Avi Kivity wrote:
> >>     
> >>> On 07/02/2009 06:50 PM, Avi Kivity wrote:
> >>>       
> >>>> On 07/02/2009 06:37 PM, Gregory Haskins wrote:
> >>>>         
> >>>>> (Applies to kvm.git/master:1f9050fd)
> >>>>>
> >>>>> The following is the latest attempt to fix the races in
> >>>>> irqfd/eventfd, as
> >>>>> well as restore DEASSIGN support.  For more details, please read the
> >>>>> patch
> >>>>> headers.
> >>>>>
> >>>>> As always, this series has been tested against the kvm-eventfd unit
> >>>>> test
> >>>>> and everything appears to be functioning properly. You can download
> >>>>> this
> >>>>> test here:
> >>>>>           
> >>>> Applied, thanks.
> >>>>
> >>>>         
> >>> ... and unapplied.  There's a refcounting mismatch in irqfd_cleanup: a
> >>> reference is taken for each irqfd, but dropped for each guest.  This
> >>> causes an oops if a guest with no irqfds is created and destroyed:
> >>>       
> >> I was able to reproduce this issue.  The problem turned out to be that I
> >> inadvertently always did a flush_workqueue(), even if the work-queue was
> >> never initialized.   
> >>
> >> The following interdiff applied to the reverted patch has been confirmed
> >> to fix the issue:
> >>     
> >
> > Could you document the init boolean and its locking rules?
> > The best place to put it would be where the field is declared btw.
> >   
> 
> Will do
> 
> > Is it true that init === list_empty(&kvm->irqfds.items)?
> > If yes maybe we don't need this field at all.
> >
> >   
> No,

OK, I thought it is. I'll wait for the documentation patch then.

> because it's more difficult to maintain the work-queue when
> referenced against active irqfds (*).  So instead, it's maintained
> against guests that use irqfd, whether they have an active irqfd or
> not.  Otherwise you have to contend with the eventfd-side release, which
> is a little tricky.
> 
> (*) I'm sure it's not rocket science to get this working, but it was
> getting more complex than I thought it was worth, so I simplified the
> model to be per-vm.  Note that this design decision/limitation is
> declared in the patch header.
> >   
> >> -------------------
> >>
> >> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> >> index fcc3469..52b0e04 100644
> >> --- a/virt/kvm/eventfd.c
> >> +++ b/virt/kvm/eventfd.c
> >> @@ -318,6 +318,9 @@ kvm_irqfd_deassign(struct kvm *kvm, int fd, int gsi)
> >>         struct _irqfd *irqfd, *tmp;
> >>         struct eventfd_ctx *eventfd;
> >>  
> >> +       if (!kvm->irqfds.init)
> >> +               return -ENOENT;
> >> +
> >>         eventfd = eventfd_ctx_fdget(fd);
> >>         if (IS_ERR(eventfd))
> >>                 return PTR_ERR(eventfd);
> >>     
> >
> > wouldn't it be cleaner to error out in the for each loop if we don't
> > find an entry to deactivate?  Might be helpful for apps to get an error
> > if they didn't deassign anything.
> >   
> 
> Again, irqfds.init is somewhat orthogonal to whether the list is
> populated or not.  This check is for sanity (how can you deassign if you
> didn't assign, etc).  Normally this would be a simple BUG_ON() sanity
> check, but I don't want a malicious/broken userspace to gain an easy
> attack vector ;)

What I'm saying is that deassign should return an error if it's passed
an entry that is not on the list.  And if you do this and return before
the flush, this check won't be needed.

> >   
> >> @@ -360,6 +363,9 @@ kvm_irqfd_release(struct kvm *kvm)
> >>  {
> >>         struct _irqfd *irqfd, *tmp;
> >>  
> >> +       if (!kvm->irqfds.init)
> >> +               return;
> >> +
> >>     
> >
> > So here, I recall some old comment that flush below was
> > needed even if list is empty. Is this no longer true?
> >   
> 
> If you are using irqfd, it's true.  If irqfds.init == false, you are not
> using irqfd and thus the flush cannot be needed.
> 
> > If not it might be cleaner to only flush if list is not empty.
> >
> >   
> You have to flush if irqfds.init == true even if the list is empty
> because you need to be sure that eventfd-side releases complete.  They
> may have already removed themselves from the list, but the work-item is
> still in flight.
> 
> Regards,
> -Greg
> 


Gregory Haskins July 6, 2009, 6:48 p.m. UTC | #4
Michael S. Tsirkin wrote:
> On Mon, Jul 06, 2009 at 12:41:59PM -0400, Gregory Haskins wrote:
>   
>> Michael S. Tsirkin wrote:
>>     
>>>>    
>>>>         
>>> wouldn't it be cleaner to error out in the for each loop if we don't
>>> find an entry to deactivate?  Might be helpful for apps to get an error
>>> if they didn't deassign anything.
>>>   
>>>       
>> Again, irqfds.init is somewhat orthogonal to whether the list is
>> populated or not.  This check is for sanity (how can you deassign if you
>> didn't assign, etc).  Normally this would be a simple BUG_ON() sanity
>> check, but I don't want a malicious/broken userspace to gain an easy
>> attack vector ;)
>>     
>
> What I'm saying is that deassign should return an error if it's passed
> an entry that is not on the list.

This isn't an unreasonable request, and I believe this is actually the
way the original deassign logic worked before we yanked the feature a
few weeks ago.  It's only slightly complicated by the fact that we may
match multiple irqfds, but I think we solved that before by returning
the number we matched.

If Avi answers the other mail stating he wants to still see the
on-demand work go in, let's use your suggestion.

Regards,
-Greg
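
A deassign that reports "nothing matched" as an error, and otherwise returns the number of entries it removed, might look roughly like the userspace sketch below. The `model_*` names are hypothetical and the match is on a plain (fd, gsi) pair for simplicity; the real code matches on the eventfd context and takes irqfds.lock, neither of which is modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one irqfd entry. */
struct model_irqfd {
    struct model_irqfd *next;
    int fd;
    int gsi;
};

/*
 * Unlink every entry matching (fd, gsi).
 * Returns the number of entries removed, or -ENOENT if none matched,
 * so the caller can tell that nothing was deassigned.
 */
static int model_deassign(struct model_irqfd **head, int fd, int gsi)
{
    struct model_irqfd **pp = head;
    int matched = 0;

    assert(head != NULL);

    while (*pp) {
        struct model_irqfd *cur = *pp;

        if (cur->fd == fd && cur->gsi == gsi) {
            *pp = cur->next; /* unlink and stay on the same slot */
            cur->next = NULL;
            matched++;
        } else {
            pp = &cur->next;
        }
    }

    return matched ? matched : -ENOENT;
}
```

With this shape, an empty list naturally yields -ENOENT, which is why the suggestion in the thread would make a separate up-front check in deassign unnecessary, provided the function returns before any flush.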

Patch

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index fcc3469..52b0e04 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -318,6 +318,9 @@  kvm_irqfd_deassign(struct kvm *kvm, int fd, int gsi)
        struct _irqfd *irqfd, *tmp;
        struct eventfd_ctx *eventfd;
 
+       if (!kvm->irqfds.init)
+               return -ENOENT;
+
        eventfd = eventfd_ctx_fdget(fd);
        if (IS_ERR(eventfd))
                return PTR_ERR(eventfd);
@@ -360,6 +363,9 @@  kvm_irqfd_release(struct kvm *kvm)
 {
        struct _irqfd *irqfd, *tmp;
 
+       if (!kvm->irqfds.init)
+               return;
+
        spin_lock_irq(&kvm->irqfds.lock);
 
        list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds.items, list)