diff mbox series

AW: KASAN: use-after-free Read in usbhid_power

Message ID caf422aebd314ab8a5dd96ac2d9bb198@SVR-IES-MBX-03.mgc.mentorg.com (mailing list archive)
State New, archived
Headers show
Series AW: KASAN: use-after-free Read in usbhid_power | expand

Commit Message

Schmid, Carsten Aug. 9, 2019, 7:35 a.m. UTC
Hi all having use-after-free issues in USB shutdowns:
I hunted for a similar case in the intel_xhci_usb_sw driver.
What i have found and proposed is (from yesterday):
---
[PATCH] kernel/resource.c: invalidate parent when freed resource has childs

When a resource is freed and has children, the childrens are
left without any hint that their parent is no more valid.
This caused at least one use-after-free in the xhci-hcd using
ext-caps driver when platform code released platform devices.

Fix this by setting child's parent to zero and warn.

Signed-off-by: Carsten Schmid <carsten_schmid@mentor.com>
---
Rationale:
When hunting for the root cause of a crash on a 4.14.86 kernel, i
have found the root cause and checked it being still present
upstream. Our case:
Having xhci-hcd and intel_xhci_usb_sw active we can see in
/proc/meminfo: (exceirpt)
  b3c00000-b3c0ffff : 0000:00:15.0
    b3c00000-b3c0ffff : xhci-hcd
      b3c08070-b3c0846f : intel_xhci_usb_sw
intel_xhci_usb_sw being a child of xhci-hcd.

Doing an unbind command
echo 0000:00:15.0 > /sys/bus/pci/drivers/xhci_hcd/unbind
leads to xhci-hcd being freed in __release_region.
The intel_xhci_usb_sw resource is accessed in platform code
in platform_device_del with
                for (i = 0; i < pdev->num_resources; i++) {
                        struct resource *r = &pdev->resource[i];
                        if (r->parent)
                                release_resource(r);
                }
as the resource's parent has not been updated, the release_resource
uses the parent:
        p = &old->parent->child;
which is now invalid.
Fix this by marking the parent invalid in the child and give a warning
in dmesg.
---
 kernel/resource.c | 9 +++++++++
 1 file changed, 9 insertions(+)

--
2.17.1

> -----Ursprüngliche Nachricht-----
> Von: linux-usb-owner@vger.kernel.org [mailto:linux-usb-
> owner@vger.kernel.org] Im Auftrag von Alan Stern
> Gesendet: Donnerstag, 8. August 2019 21:37
> An: Andrey Konovalov <andreyknvl@google.com>
> Cc: Oliver Neukum <oneukum@suse.com>; syzkaller-bugs <syzkaller-
> bugs@googlegroups.com>; syzbot
> <syzbot+ef5de9c4f99c4edb4e49@syzkaller.appspotmail.com>; USB list
> <linux-usb@vger.kernel.org>; Hillf Danton <hdanton@sina.com>
> Betreff: Re: KASAN: use-after-free Read in usbhid_power
> 
> On Thu, 8 Aug 2019, Andrey Konovalov wrote:
> 
> > On Thu, Jul 25, 2019 at 5:09 PM Alan Stern <stern@rowland.harvard.edu>
> wrote:
> > >
> > > On Thu, 25 Jul 2019, Oliver Neukum wrote:
> > >
> > > > Am Mittwoch, den 24.07.2019, 17:02 -0400 schrieb Alan Stern:
> > > > > On Wed, 24 Jul 2019, Oliver Neukum wrote:
> > > > >
> > > > > >  drivers/hid/usbhid/hid-core.c | 13 +++++++++++++
> > > > > >  1 file changed, 13 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/hid/usbhid/hid-core.c b/drivers/hid/usbhid/hid-
> core.c
> > > > > > index c7bc9db5b192..98b996ecf4d3 100644
> > > > > > --- a/drivers/hid/usbhid/hid-core.c
> > > > > > +++ b/drivers/hid/usbhid/hid-core.c
> > > > > > @@ -1229,6 +1229,17 @@ static int usbhid_power(struct hid_device
> *hid, int lvl)
> > > > > >   struct usbhid_device *usbhid = hid->driver_data;
> > > > > >   int r = 0;
> > > > > >
> > > > > > + spin_lock_irq(&usbhid->lock);
> > > > > > + if (test_bit(HID_DISCONNECTED, &usbhid->iofl)) {
> > > > > > +         r = -ENODEV;
> > > > > > +         spin_unlock_irq(&usbhid->lock);
> > > > > > +         goto bail_out;
> > > > > > + } else {
> > > > > > +         /* protect against disconnect */
> > > > > > +         usb_get_dev(interface_to_usbdev(usbhid->intf));
> > > > > > +         spin_unlock_irq(&usbhid->lock);
> > > > > > + }
> > > > > > +
> > > > > >   switch (lvl) {
> > > > > >   case PM_HINT_FULLON:
> > > > > >           r = usb_autopm_get_interface(usbhid->intf);
> > > > > > @@ -1238,7 +1249,9 @@ static int usbhid_power(struct hid_device
> *hid, int lvl)
> > > > > >           usb_autopm_put_interface(usbhid->intf);
> > > > > >           break;
> > > > > >   }
> > > > > > + usb_put_dev(interface_to_usbdev(usbhid->intf));
> > > > > >
> > > > > > +bail_out:
> > > > > >   return r;
> > > > > >  }
> 
> > This report looks like very similar to these two:
> >
> > https://syzkaller.appspot.com/bug?extid=b156665cf4d1b5e00c76
> > https://syzkaller.appspot.com/bug?extid=3cbe5cd105d2ad56a1df
> 
> It also seems to resemble extids a7a6b9c609b9457c62c6,
> 62a1e04fd3ec2abf099e, and 75e6910bf03e266a277f, although this may be an
> illusion.
> 
> > Maybe we should mark those two as duplicates.
> >
> > Hillf suggested a fix on one of them, but it looks different from what
> > you propose:
> >
> > https://groups.google.com/d/msg/syzkaller-
> bugs/xW7LvKfpyn0/SpKbs5ZLEAAJ
> 
> Go ahead and try it out on all of them.  I don't have a clear feeling
> about it, not having worked on usbhid in quite a while.
> 
> Alan Stern

Comments

Greg KH Aug. 9, 2019, 7:55 a.m. UTC | #1
On Fri, Aug 09, 2019 at 07:35:32AM +0000, Schmid, Carsten wrote:
> Hi all having use-after-free issues in USB shutdowns:
> I hunted for a similar case in the intel_xhci_usb_sw driver.
> What i have found and proposed is (from yesterday):
> ---
> [PATCH] kernel/resource.c: invalidate parent when freed resource has childs
> 
> When a resource is freed and has children, the childrens are
> left without any hint that their parent is no more valid.
> This caused at least one use-after-free in the xhci-hcd using
> ext-caps driver when platform code released platform devices.
> 
> Fix this by setting child's parent to zero and warn.
> 
> Signed-off-by: Carsten Schmid <carsten_schmid@mentor.com>
> ---
> Rationale:
> When hunting for the root cause of a crash on a 4.14.86 kernel, i
> have found the root cause and checked it being still present
> upstream. Our case:
> Having xhci-hcd and intel_xhci_usb_sw active we can see in
> /proc/meminfo: (exceirpt)
>   b3c00000-b3c0ffff : 0000:00:15.0
>     b3c00000-b3c0ffff : xhci-hcd
>       b3c08070-b3c0846f : intel_xhci_usb_sw
> intel_xhci_usb_sw being a child of xhci-hcd.
> 
> Doing an unbind command
> echo 0000:00:15.0 > /sys/bus/pci/drivers/xhci_hcd/unbind
> leads to xhci-hcd being freed in __release_region.
> The intel_xhci_usb_sw resource is accessed in platform code
> in platform_device_del with
>                 for (i = 0; i < pdev->num_resources; i++) {
>                         struct resource *r = &pdev->resource[i];
>                         if (r->parent)
>                                 release_resource(r);
>                 }
> as the resource's parent has not been updated, the release_resource
> uses the parent:
>         p = &old->parent->child;
> which is now invalid.
> Fix this by marking the parent invalid in the child and give a warning
> in dmesg.
> ---
>  kernel/resource.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 158f04ec1d4f..95340cb0b1c2 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -1200,6 +1200,15 @@ void __release_region(struct resource *parent, resource_size_t start,
>                         write_unlock(&resource_lock);
>                         if (res->flags & IORESOURCE_MUXED)
>                                 wake_up(&muxed_resource_wait);
> +
> +                       write_lock(&resource_lock);

Nit, should't this be written so that you only drop/get the lock if the
above if statement is true?

> +                       if (res->child) {
> +                               printk(KERN_WARNING "__release_region: %s has child %s,"
> +                                               "invalidating childs parent\n",
> +                                               res->name, res->child->name);

What can userspace/anyone do about this if it triggers?

Can't we fix the root problem in the xhci driver to properly order how
it tears things down?  If it has intel_xhci_usb_driver as a "child"
shouldn't that be unbound/freed before the parent is?

thanks,

greg k-h
Schmid, Carsten Aug. 9, 2019, 9:34 a.m. UTC | #2
> -----Ursprüngliche Nachricht-----
> Von: Greg KH [mailto:gregkh@linuxfoundation.org]
> Gesendet: Freitag, 9. August 2019 09:56
> An: Schmid, Carsten <Carsten_Schmid@mentor.com>
> Cc: Alan Stern <stern@rowland.harvard.edu>; Andrey Konovalov
> <andreyknvl@google.com>; Oliver Neukum <oneukum@suse.com>;
> syzkaller-bugs <syzkaller-bugs@googlegroups.com>; syzbot
> <syzbot+ef5de9c4f99c4edb4e49@syzkaller.appspotmail.com>; USB list
> <linux-usb@vger.kernel.org>; Hillf Danton <hdanton@sina.com>
> Betreff: Re: KASAN: use-after-free Read in usbhid_power
> 
> On Fri, Aug 09, 2019 at 07:35:32AM +0000, Schmid, Carsten wrote:
> > Hi all having use-after-free issues in USB shutdowns:
> > I hunted for a similar case in the intel_xhci_usb_sw driver.
> > What i have found and proposed is (from yesterday):
> > ---
> > [PATCH] kernel/resource.c: invalidate parent when freed resource has
> childs
> >
> > When a resource is freed and has children, the childrens are
> > left without any hint that their parent is no more valid.
> > This caused at least one use-after-free in the xhci-hcd using
> > ext-caps driver when platform code released platform devices.
> >
> > Fix this by setting child's parent to zero and warn.
> >
> > Signed-off-by: Carsten Schmid <carsten_schmid@mentor.com>
> > ---
> > Rationale:
> > When hunting for the root cause of a crash on a 4.14.86 kernel, i
> > have found the root cause and checked it being still present
> > upstream. Our case:
> > Having xhci-hcd and intel_xhci_usb_sw active we can see in
> > /proc/meminfo: (exceirpt)
> >   b3c00000-b3c0ffff : 0000:00:15.0
> >     b3c00000-b3c0ffff : xhci-hcd
> >       b3c08070-b3c0846f : intel_xhci_usb_sw
> > intel_xhci_usb_sw being a child of xhci-hcd.
> >
> > Doing an unbind command
> > echo 0000:00:15.0 > /sys/bus/pci/drivers/xhci_hcd/unbind
> > leads to xhci-hcd being freed in __release_region.
> > The intel_xhci_usb_sw resource is accessed in platform code
> > in platform_device_del with
> >                 for (i = 0; i < pdev->num_resources; i++) {
> >                         struct resource *r = &pdev->resource[i];
> >                         if (r->parent)
> >                                 release_resource(r);
> >                 }
> > as the resource's parent has not been updated, the release_resource
> > uses the parent:
> >         p = &old->parent->child;
> > which is now invalid.
> > Fix this by marking the parent invalid in the child and give a warning
> > in dmesg.
> > ---
> >  kernel/resource.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/kernel/resource.c b/kernel/resource.c
> > index 158f04ec1d4f..95340cb0b1c2 100644
> > --- a/kernel/resource.c
> > +++ b/kernel/resource.c
> > @@ -1200,6 +1200,15 @@ void __release_region(struct resource *parent,
> resource_size_t start,
> >                         write_unlock(&resource_lock);
> >                         if (res->flags & IORESOURCE_MUXED)
> >                                 wake_up(&muxed_resource_wait);
> > +
> > +                       write_lock(&resource_lock);
> 
> Nit, should't this be written so that you only drop/get the lock if the
> above if statement is true?
> 
What if some other async part invalidates the child while accessing it?
I wanted to make sure that the res->child is valid through the time it is accessed.

> > +                       if (res->child) {
> > +                               printk(KERN_WARNING "__release_region: %s has child
> %s,"
> > +                                               "invalidating childs parent\n",
> > +                                               res->name, res->child->name);
> 
> What can userspace/anyone do about this if it triggers?
> 
At least a platform driver developer can see he did something wrong.
I had a look at the code of Hans and did not see anything weird.
(platform driver is in drivers/usb/host/xhci-ext-caps.c)
The issue is very racy - what happens when the platform code accesses the
resource mainly depends on if the freed resource memory already has been
reused by someone.

It was hard to find that, and only a single core dump enabled me to find it.
A first attempt was to set resource count to 0 in Hans' driver in the unregister
pdev before calling platform_device_unregister. That worked.
But this is a dirty hack in my eyes. The framework should detect such issues,
so i decided to find the best place where to insert the check - and
found it is the place where the resource is freed and still has childrens.

> Can't we fix the root problem in the xhci driver to properly order how
> it tears things down?  If it has intel_xhci_usb_driver as a "child"
> shouldn't that be unbound/freed before the parent is?
> 
Hans, any idea?

> thanks,
> 
> greg k-h
Hans de Goede Aug. 9, 2019, 10:20 a.m. UTC | #3
Hi,

On 8/9/19 11:34 AM, Schmid, Carsten wrote:
>> -----Ursprüngliche Nachricht-----
>> Von: Greg KH [mailto:gregkh@linuxfoundation.org]
>> Gesendet: Freitag, 9. August 2019 09:56
>> An: Schmid, Carsten <Carsten_Schmid@mentor.com>
>> Cc: Alan Stern <stern@rowland.harvard.edu>; Andrey Konovalov
>> <andreyknvl@google.com>; Oliver Neukum <oneukum@suse.com>;
>> syzkaller-bugs <syzkaller-bugs@googlegroups.com>; syzbot
>> <syzbot+ef5de9c4f99c4edb4e49@syzkaller.appspotmail.com>; USB list
>> <linux-usb@vger.kernel.org>; Hillf Danton <hdanton@sina.com>
>> Betreff: Re: KASAN: use-after-free Read in usbhid_power
>>
>> On Fri, Aug 09, 2019 at 07:35:32AM +0000, Schmid, Carsten wrote:
>>> Hi all having use-after-free issues in USB shutdowns:
>>> I hunted for a similar case in the intel_xhci_usb_sw driver.
>>> What i have found and proposed is (from yesterday):
>>> ---
>>> [PATCH] kernel/resource.c: invalidate parent when freed resource has
>> childs
>>>
>>> When a resource is freed and has children, the childrens are
>>> left without any hint that their parent is no more valid.
>>> This caused at least one use-after-free in the xhci-hcd using
>>> ext-caps driver when platform code released platform devices.
>>>
>>> Fix this by setting child's parent to zero and warn.
>>>
>>> Signed-off-by: Carsten Schmid <carsten_schmid@mentor.com>
>>> ---
>>> Rationale:
>>> When hunting for the root cause of a crash on a 4.14.86 kernel, i
>>> have found the root cause and checked it being still present
>>> upstream. Our case:
>>> Having xhci-hcd and intel_xhci_usb_sw active we can see in
>>> /proc/meminfo: (exceirpt)
>>>    b3c00000-b3c0ffff : 0000:00:15.0
>>>      b3c00000-b3c0ffff : xhci-hcd
>>>        b3c08070-b3c0846f : intel_xhci_usb_sw
>>> intel_xhci_usb_sw being a child of xhci-hcd.
>>>
>>> Doing an unbind command
>>> echo 0000:00:15.0 > /sys/bus/pci/drivers/xhci_hcd/unbind
>>> leads to xhci-hcd being freed in __release_region.
>>> The intel_xhci_usb_sw resource is accessed in platform code
>>> in platform_device_del with
>>>                  for (i = 0; i < pdev->num_resources; i++) {
>>>                          struct resource *r = &pdev->resource[i];
>>>                          if (r->parent)
>>>                                  release_resource(r);
>>>                  }
>>> as the resource's parent has not been updated, the release_resource
>>> uses the parent:
>>>          p = &old->parent->child;
>>> which is now invalid.
>>> Fix this by marking the parent invalid in the child and give a warning
>>> in dmesg.
>>> ---
>>>   kernel/resource.c | 9 +++++++++
>>>   1 file changed, 9 insertions(+)
>>>
>>> diff --git a/kernel/resource.c b/kernel/resource.c
>>> index 158f04ec1d4f..95340cb0b1c2 100644
>>> --- a/kernel/resource.c
>>> +++ b/kernel/resource.c
>>> @@ -1200,6 +1200,15 @@ void __release_region(struct resource *parent,
>> resource_size_t start,
>>>                          write_unlock(&resource_lock);
>>>                          if (res->flags & IORESOURCE_MUXED)
>>>                                  wake_up(&muxed_resource_wait);
>>> +
>>> +                       write_lock(&resource_lock);
>>
>> Nit, should't this be written so that you only drop/get the lock if the
>> above if statement is true?
>>
> What if some other async part invalidates the child while accessing it?
> I wanted to make sure that the res->child is valid through the time it is accessed.
> 
>>> +                       if (res->child) {
>>> +                               printk(KERN_WARNING "__release_region: %s has child
>> %s,"
>>> +                                               "invalidating childs parent\n",
>>> +                                               res->name, res->child->name);
>>
>> What can userspace/anyone do about this if it triggers?
>>
> At least a platform driver developer can see he did something wrong.
> I had a look at the code of Hans and did not see anything weird.
> (platform driver is in drivers/usb/host/xhci-ext-caps.c)
> The issue is very racy - what happens when the platform code accesses the
> resource mainly depends on if the freed resource memory already has been
> reused by someone.

We are talking memory-mapped io here, so it cannot just be "re-used", it
is wat it is. I guess the PCI BAR could be released and then the physical
address the resource was at could be re-used for another piece of MMIo,
but AFAIK outside of PI=CI hotplug we never release BARs.

Maybe we need to ref-count resources and have the aprent free only be
a deref and not release the resource until the child resource also
is free-ed doing another deref?

I must say that to me it sometimes just seems like always allowing unbind
is a bad idea. Another example of this is things like virtio, where
we can have a filesystem based on virtio-block, but the virtio interface
between the hypervisor and the guest-kernel is a PCI-device and in theory
the user could unbind the virtio driver from that PCI-device, after which
the whole house comes crashing down.

I also know that the extcon framework in its current incarnaton
does not deal with unbind properly...

Maybe it is time that we allow drivers to block unbind instead of
trying to support unbind in really complex situations where normal
use-cases will never need it ?

I do realize unbind is very useful for driver developent without
rebooting.

> It was hard to find that, and only a single core dump enabled me to find it.
> A first attempt was to set resource count to 0 in Hans' driver in the unregister
> pdev before calling platform_device_unregister. That worked.
> But this is a dirty hack in my eyes. The framework should detect such issues,
> so i decided to find the best place where to insert the check - and
> found it is the place where the resource is freed and still has childrens.
> 
>> Can't we fix the root problem in the xhci driver to properly order how
>> it tears things down?  If it has intel_xhci_usb_driver as a "child"
>> shouldn't that be unbound/freed before the parent is?
>>
> Hans, any idea?

1) make resources refcounted, have child resources take a ref on the parent
2) Disallow unbind on devices with bound child-devices?

Regards,

Hans
Schmid, Carsten Aug. 9, 2019, 10:47 a.m. UTC | #4
> 
> We are talking memory-mapped io here, so it cannot just be "re-used", it
> is wat it is. I guess the PCI BAR could be released and then the physical
> address the resource was at could be re-used for another piece of MMIo,
> but AFAIK outside of PI=CI hotplug we never release BARs.
> 
> Maybe we need to ref-count resources and have the aprent free only be
> a deref and not release the resource until the child resource also
> is free-ed doing another deref?
> 
> I must say that to me it sometimes just seems like always allowing unbind
> is a bad idea. Another example of this is things like virtio, where
> we can have a filesystem based on virtio-block, but the virtio interface
> between the hypervisor and the guest-kernel is a PCI-device and in theory
> the user could unbind the virtio driver from that PCI-device, after which
> the whole house comes crashing down.
> 
> I also know that the extcon framework in its current incarnaton
> does not deal with unbind properly...
> 
> Maybe it is time that we allow drivers to block unbind instead of
> trying to support unbind in really complex situations where normal
> use-cases will never need it ?
> 
> I do realize unbind is very useful for driver developent without
> rebooting.
> 

Hey, i did not want to trigger an eartquake in the basement of the kernel ;-)
My intention was to prevent some crashes, and help developers to find their bugs.
I think my patch exactly does this.
 
> 1) make resources refcounted, have child resources take a ref on the parent
> 2) Disallow unbind on devices with bound child-devices?
> 

Exactly what i was thinking of in first attempts.
But i fear that would break even more use cases.

Hans, directly regarding the driver:
The problem i see is that the xhci_intel_unregister_pdev which is added
as an action with devm_add_action_or_reset() is called late by the framework,
later than the usb_hcd_pci_remove() in xhci_pci_remove.
Is there any chance to trigger this before?
This is what Greg meant with "right order".

Anyway, i really appreciate these discussions, thanks for all
your patience.

Best Regards
Carsten
Hans de Goede Aug. 9, 2019, 10:53 a.m. UTC | #5
Hi,

On 8/9/19 12:47 PM, Schmid, Carsten wrote:
>>
>> We are talking memory-mapped io here, so it cannot just be "re-used", it
>> is wat it is. I guess the PCI BAR could be released and then the physical
>> address the resource was at could be re-used for another piece of MMIo,
>> but AFAIK outside of PI=CI hotplug we never release BARs.
>>
>> Maybe we need to ref-count resources and have the aprent free only be
>> a deref and not release the resource until the child resource also
>> is free-ed doing another deref?
>>
>> I must say that to me it sometimes just seems like always allowing unbind
>> is a bad idea. Another example of this is things like virtio, where
>> we can have a filesystem based on virtio-block, but the virtio interface
>> between the hypervisor and the guest-kernel is a PCI-device and in theory
>> the user could unbind the virtio driver from that PCI-device, after which
>> the whole house comes crashing down.
>>
>> I also know that the extcon framework in its current incarnaton
>> does not deal with unbind properly...
>>
>> Maybe it is time that we allow drivers to block unbind instead of
>> trying to support unbind in really complex situations where normal
>> use-cases will never need it ?
>>
>> I do realize unbind is very useful for driver developent without
>> rebooting.
>>
> 
> Hey, i did not want to trigger an eartquake in the basement of the kernel ;-)
> My intention was to prevent some crashes, and help developers to find their bugs.
> I think my patch exactly does this.

Hehe, actually drivers not being able to block unbind has been bugging me for
a while now, because there are cases where this would be really helpful.
>> 1) make resources refcounted, have child resources take a ref on the parent
>> 2) Disallow unbind on devices with bound child-devices?
>>
> 
> Exactly what i was thinking of in first attempts.
> But i fear that would break even more use cases.
> 
> Hans, directly regarding the driver:
> The problem i see is that the xhci_intel_unregister_pdev which is added
> as an action with devm_add_action_or_reset() is called late by the framework,
> later than the usb_hcd_pci_remove() in xhci_pci_remove.
> Is there any chance to trigger this before?
> This is what Greg meant with "right order".

Ah, I missed that part, sure that should be easy, just stop using
devm_add_action_or_reset() and do the xhci_intel_unregister_pdev()
manually at the right time. The downside of this is that you also
need to make sure it happens at the right time from probe error-paths
but given the bug you are hitting, I guess that is probably
already a problem.

Regards,

Hans
Schmid, Carsten Aug. 9, 2019, 12:38 p.m. UTC | #6
Hi again,

>>
>> Hey, i did not want to trigger an eartquake in the basement of the kernel ;-)
>> My intention was to prevent some crashes, and help developers to find their bugs.
>> I think my patch exactly does this.
> 
> Hehe, actually drivers not being able to block unbind has been bugging me
> for
> a while now, because there are cases where this would be really helpful.
>>> 1) make resources refcounted, have child resources take a ref on the parent
>>> 2) Disallow unbind on devices with bound child-devices?
>>>
>> Exactly what i was thinking of in first attempts.
>> But i fear that would break even more use cases.
>>
>> Hans, directly regarding the driver:
>> The problem i see is that the xhci_intel_unregister_pdev which is added
>> as an action with devm_add_action_or_reset() is called late by the framework,
>> later than the usb_hcd_pci_remove() in xhci_pci_remove.
>> Is there any chance to trigger this before?
>> This is what Greg meant with "right order".
> 
> Ah, I missed that part, sure that should be easy, just stop using
> devm_add_action_or_reset() and do the xhci_intel_unregister_pdev()
> manually at the right time. The downside of this is that you also
> need to make sure it happens at the right time from probe error-paths
> but given the bug you are hitting, I guess that is probably
> already a problem.
> 
@Hans:
Sure, will have a look at this. I think i have found where to do that,
but need to check how to get the pdev pointer there ....

@Greg:
I am still confident that my patch in __release_region should be taken in.

Situation now without my patch:
If we have a device driver (or whatever) releasing a resource, the owner of
the child will have no notification that the parent is gone.
Accessing the parent (at least this will happen when trying to free the resource)
might have changed memory at the parent location, and what happens might
be an access to unmapped memory, whatever - an oops and we don't know why.
That's what i experienced and hunting.

Situation with the patch applied:
The owner gets a notification (parent=NULL) and we have an indication in
the kernel log.
If an owner of the resource where the parent is gone checks for the parent,
we are fine. If he doesn't check: we have a NULL pointer deref with a warning
message pointing to the root cause.

Isn't it better to have a pointer to a crash rather than having unreliable
racy crashes in such a case?

Have a nice weekend.

Best regards
Carsten
Greg KH Aug. 9, 2019, 12:54 p.m. UTC | #7
On Fri, Aug 09, 2019 at 12:38:35PM +0000, Schmid, Carsten wrote:
> Hi again,
> 
> >>
> >> Hey, i did not want to trigger an eartquake in the basement of the kernel ;-)
> >> My intention was to prevent some crashes, and help developers to find their bugs.
> >> I think my patch exactly does this.
> > 
> > Hehe, actually drivers not being able to block unbind has been bugging me
> > for
> > a while now, because there are cases where this would be really helpful.
> >>> 1) make resources refcounted, have child resources take a ref on the parent
> >>> 2) Disallow unbind on devices with bound child-devices?
> >>>
> >> Exactly what i was thinking of in first attempts.
> >> But i fear that would break even more use cases.
> >>
> >> Hans, directly regarding the driver:
> >> The problem i see is that the xhci_intel_unregister_pdev which is added
> >> as an action with devm_add_action_or_reset() is called late by the framework,
> >> later than the usb_hcd_pci_remove() in xhci_pci_remove.
> >> Is there any chance to trigger this before?
> >> This is what Greg meant with "right order".
> > 
> > Ah, I missed that part, sure that should be easy, just stop using
> > devm_add_action_or_reset() and do the xhci_intel_unregister_pdev()
> > manually at the right time. The downside of this is that you also
> > need to make sure it happens at the right time from probe error-paths
> > but given the bug you are hitting, I guess that is probably
> > already a problem.
> > 
> @Hans:
> Sure, will have a look at this. I think i have found where to do that,
> but need to check how to get the pdev pointer there ....
> 
> @Greg:
> I am still confident that my patch in __release_region should be taken in.

Ok, submit it in a "real" way and we can consider it :)

thanks,

greg k-h
Schmid, Carsten Aug. 9, 2019, 1 p.m. UTC | #8
>>
>> @Greg:
>> I am still confident that my patch in __release_region should be taken in.
>
> Ok, submit it in a "real" way and we can consider it :)
>
> thanks,
>
> greg k-h

Already done, linux-kernel@vger.kernel.org, see
https://www.spinics.net/lists/kernel/msg3218180.html

Thanks, and have a nice weekend.

Best regards
Carsten
Greg KH Aug. 9, 2019, 1:38 p.m. UTC | #9
On Fri, Aug 09, 2019 at 01:00:25PM +0000, Schmid, Carsten wrote:
> >>
> >> @Greg:
> >> I am still confident that my patch in __release_region should be taken in.
> >
> > Ok, submit it in a "real" way and we can consider it :)
> >
> > thanks,
> >
> > greg k-h
> 
> Already done, linux-kernel@vger.kernel.org, see
> https://www.spinics.net/lists/kernel/msg3218180.html

You didn't cc: any developer, that's a sure way to get a patch ignored
:(

Try resending it with at least the people who get_maintainer.pl says has
touched that file last in it.

Also, Linus is the unofficial resource.c maintainer.  I think he has a
set of userspace testing scripts for changes somewhere, so you should
 cc: him too.  And might as well add me :)

 thanks,

 greg k-h
Hans de Goede Aug. 10, 2019, 11:12 a.m. UTC | #10
Hi,

On 09-08-19 14:38, Schmid, Carsten wrote:
> Hi again,
> 
>>>
>>> Hey, i did not want to trigger an eartquake in the basement of the kernel ;-)
>>> My intention was to prevent some crashes, and help developers to find their bugs.
>>> I think my patch exactly does this.
>>
>> Hehe, actually drivers not being able to block unbind has been bugging me
>> for
>> a while now, because there are cases where this would be really helpful.
>>>> 1) make resources refcounted, have child resources take a ref on the parent
>>>> 2) Disallow unbind on devices with bound child-devices?
>>>>
>>> Exactly what i was thinking of in first attempts.
>>> But i fear that would break even more use cases.
>>>
>>> Hans, directly regarding the driver:
>>> The problem i see is that the xhci_intel_unregister_pdev which is added
>>> as an action with devm_add_action_or_reset() is called late by the framework,
>>> later than the usb_hcd_pci_remove() in xhci_pci_remove.
>>> Is there any chance to trigger this before?
>>> This is what Greg meant with "right order".
>>
>> Ah, I missed that part, sure that should be easy, just stop using
>> devm_add_action_or_reset() and do the xhci_intel_unregister_pdev()
>> manually at the right time. The downside of this is that you also
>> need to make sure it happens at the right time from probe error-paths
>> but given the bug you are hitting, I guess that is probably
>> already a problem.
>>
> @Hans:
> Sure, will have a look at this. I think i have found where to do that,
> but need to check how to get the pdev pointer there ....

You probably need to store the pdev pointer in one of the xhci driver's
private data structs.

Regards,

Hans
diff mbox series

Patch

diff --git a/kernel/resource.c b/kernel/resource.c
index 158f04ec1d4f..95340cb0b1c2 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -1200,6 +1200,15 @@  void __release_region(struct resource *parent, resource_size_t start,
                        write_unlock(&resource_lock);
                        if (res->flags & IORESOURCE_MUXED)
                                wake_up(&muxed_resource_wait);
+
+                       write_lock(&resource_lock);
+                       if (res->child) {
+                               printk(KERN_WARNING "__release_region: %s has child %s,"
+                                               "invalidating childs parent\n",
+                                               res->name, res->child->name);
+                               res->child->parent = NULL;
+                       }
+                       write_unlock(&resource_lock);
                        free_resource(res);
                        return;
                }