Message ID | caf422aebd314ab8a5dd96ac2d9bb198@SVR-IES-MBX-03.mgc.mentorg.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | AW: KASAN: use-after-free Read in usbhid_power | expand |
On Fri, Aug 09, 2019 at 07:35:32AM +0000, Schmid, Carsten wrote: > Hi all having use-after-free issues in USB shutdowns: > I hunted for a similar case in the intel_xhci_usb_sw driver. > What i have found and proposed is (from yesterday): > --- > [PATCH] kernel/resource.c: invalidate parent when freed resource has childs > > When a resource is freed and has children, the childrens are > left without any hint that their parent is no more valid. > This caused at least one use-after-free in the xhci-hcd using > ext-caps driver when platform code released platform devices. > > Fix this by setting child's parent to zero and warn. > > Signed-off-by: Carsten Schmid <carsten_schmid@mentor.com> > --- > Rationale: > When hunting for the root cause of a crash on a 4.14.86 kernel, i > have found the root cause and checked it being still present > upstream. Our case: > Having xhci-hcd and intel_xhci_usb_sw active we can see in > /proc/meminfo: (exceirpt) > b3c00000-b3c0ffff : 0000:00:15.0 > b3c00000-b3c0ffff : xhci-hcd > b3c08070-b3c0846f : intel_xhci_usb_sw > intel_xhci_usb_sw being a child of xhci-hcd. > > Doing an unbind command > echo 0000:00:15.0 > /sys/bus/pci/drivers/xhci_hcd/unbind > leads to xhci-hcd being freed in __release_region. > The intel_xhci_usb_sw resource is accessed in platform code > in platform_device_del with > for (i = 0; i < pdev->num_resources; i++) { > struct resource *r = &pdev->resource[i]; > if (r->parent) > release_resource(r); > } > as the resource's parent has not been updated, the release_resource > uses the parent: > p = &old->parent->child; > which is now invalid. > Fix this by marking the parent invalid in the child and give a warning > in dmesg. > --- > kernel/resource.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/kernel/resource.c b/kernel/resource.c > index 158f04ec1d4f..95340cb0b1c2 100644 > --- a/kernel/resource.c > +++ b/kernel/resource.c > @@ -1200,6 +1200,15 @@ void __release_region(struct resource *parent, resource_size_t start, > write_unlock(&resource_lock); > if (res->flags & IORESOURCE_MUXED) > wake_up(&muxed_resource_wait); > + > + write_lock(&resource_lock); Nit, should't this be written so that you only drop/get the lock if the above if statement is true? > + if (res->child) { > + printk(KERN_WARNING "__release_region: %s has child %s," > + "invalidating childs parent\n", > + res->name, res->child->name); What can userspace/anyone do about this if it triggers? Can't we fix the root problem in the xhci driver to properly order how it tears things down? If it has intel_xhci_usb_driver as a "child" shouldn't that be unbound/freed before the parent is? thanks, greg k-h
> -----Ursprüngliche Nachricht----- > Von: Greg KH [mailto:gregkh@linuxfoundation.org] > Gesendet: Freitag, 9. August 2019 09:56 > An: Schmid, Carsten <Carsten_Schmid@mentor.com> > Cc: Alan Stern <stern@rowland.harvard.edu>; Andrey Konovalov > <andreyknvl@google.com>; Oliver Neukum <oneukum@suse.com>; > syzkaller-bugs <syzkaller-bugs@googlegroups.com>; syzbot > <syzbot+ef5de9c4f99c4edb4e49@syzkaller.appspotmail.com>; USB list > <linux-usb@vger.kernel.org>; Hillf Danton <hdanton@sina.com> > Betreff: Re: KASAN: use-after-free Read in usbhid_power > > On Fri, Aug 09, 2019 at 07:35:32AM +0000, Schmid, Carsten wrote: > > Hi all having use-after-free issues in USB shutdowns: > > I hunted for a similar case in the intel_xhci_usb_sw driver. > > What i have found and proposed is (from yesterday): > > --- > > [PATCH] kernel/resource.c: invalidate parent when freed resource has > childs > > > > When a resource is freed and has children, the childrens are > > left without any hint that their parent is no more valid. > > This caused at least one use-after-free in the xhci-hcd using > > ext-caps driver when platform code released platform devices. > > > > Fix this by setting child's parent to zero and warn. > > > > Signed-off-by: Carsten Schmid <carsten_schmid@mentor.com> > > --- > > Rationale: > > When hunting for the root cause of a crash on a 4.14.86 kernel, i > > have found the root cause and checked it being still present > > upstream. Our case: > > Having xhci-hcd and intel_xhci_usb_sw active we can see in > > /proc/meminfo: (exceirpt) > > b3c00000-b3c0ffff : 0000:00:15.0 > > b3c00000-b3c0ffff : xhci-hcd > > b3c08070-b3c0846f : intel_xhci_usb_sw > > intel_xhci_usb_sw being a child of xhci-hcd. > > > > Doing an unbind command > > echo 0000:00:15.0 > /sys/bus/pci/drivers/xhci_hcd/unbind > > leads to xhci-hcd being freed in __release_region. > > The intel_xhci_usb_sw resource is accessed in platform code > > in platform_device_del with > > for (i = 0; i < pdev->num_resources; i++) { > > struct resource *r = &pdev->resource[i]; > > if (r->parent) > > release_resource(r); > > } > > as the resource's parent has not been updated, the release_resource > > uses the parent: > > p = &old->parent->child; > > which is now invalid. > > Fix this by marking the parent invalid in the child and give a warning > > in dmesg. > > --- > > kernel/resource.c | 9 +++++++++ > > 1 file changed, 9 insertions(+) > > > > diff --git a/kernel/resource.c b/kernel/resource.c > > index 158f04ec1d4f..95340cb0b1c2 100644 > > --- a/kernel/resource.c > > +++ b/kernel/resource.c > > @@ -1200,6 +1200,15 @@ void __release_region(struct resource *parent, > resource_size_t start, > > write_unlock(&resource_lock); > > if (res->flags & IORESOURCE_MUXED) > > wake_up(&muxed_resource_wait); > > + > > + write_lock(&resource_lock); > > Nit, should't this be written so that you only drop/get the lock if the > above if statement is true? > What if some other async part invalidates the child while accessing it? I wanted to make sure that the res->child is valid through the time it is accessed. > > + if (res->child) { > > + printk(KERN_WARNING "__release_region: %s has child > %s," > > + "invalidating childs parent\n", > > + res->name, res->child->name); > > What can userspace/anyone do about this if it triggers? > At least a platform driver developer can see he did something wrong. I had a look at the code of Hans and did not see anything weird. (platform driver is in drivers/usb/host/xhci-ext-caps.c) The issue is very racy - what happens when the platform code accesses the resource mainly depends on if the freed resource memory already has been reused by someone. It was hard to find that, and only a single core dump enabled me to find it. A first attempt was to set resource count to 0 in Hans' driver in the unregister pdev before calling platform_device_unregister. That worked. But this is a dirty hack in my eyes. The framework should detect such issues, so i decided to find the best place where to insert the check - and found it is the place where the resource is freed and still has childrens. > Can't we fix the root problem in the xhci driver to properly order how > it tears things down? If it has intel_xhci_usb_driver as a "child" > shouldn't that be unbound/freed before the parent is? > Hans, any idea? > thanks, > > greg k-h
Hi, On 8/9/19 11:34 AM, Schmid, Carsten wrote: >> -----Ursprüngliche Nachricht----- >> Von: Greg KH [mailto:gregkh@linuxfoundation.org] >> Gesendet: Freitag, 9. August 2019 09:56 >> An: Schmid, Carsten <Carsten_Schmid@mentor.com> >> Cc: Alan Stern <stern@rowland.harvard.edu>; Andrey Konovalov >> <andreyknvl@google.com>; Oliver Neukum <oneukum@suse.com>; >> syzkaller-bugs <syzkaller-bugs@googlegroups.com>; syzbot >> <syzbot+ef5de9c4f99c4edb4e49@syzkaller.appspotmail.com>; USB list >> <linux-usb@vger.kernel.org>; Hillf Danton <hdanton@sina.com> >> Betreff: Re: KASAN: use-after-free Read in usbhid_power >> >> On Fri, Aug 09, 2019 at 07:35:32AM +0000, Schmid, Carsten wrote: >>> Hi all having use-after-free issues in USB shutdowns: >>> I hunted for a similar case in the intel_xhci_usb_sw driver. >>> What i have found and proposed is (from yesterday): >>> --- >>> [PATCH] kernel/resource.c: invalidate parent when freed resource has >> childs >>> >>> When a resource is freed and has children, the childrens are >>> left without any hint that their parent is no more valid. >>> This caused at least one use-after-free in the xhci-hcd using >>> ext-caps driver when platform code released platform devices. >>> >>> Fix this by setting child's parent to zero and warn. >>> >>> Signed-off-by: Carsten Schmid <carsten_schmid@mentor.com> >>> --- >>> Rationale: >>> When hunting for the root cause of a crash on a 4.14.86 kernel, i >>> have found the root cause and checked it being still present >>> upstream. Our case: >>> Having xhci-hcd and intel_xhci_usb_sw active we can see in >>> /proc/meminfo: (exceirpt) >>> b3c00000-b3c0ffff : 0000:00:15.0 >>> b3c00000-b3c0ffff : xhci-hcd >>> b3c08070-b3c0846f : intel_xhci_usb_sw >>> intel_xhci_usb_sw being a child of xhci-hcd. >>> >>> Doing an unbind command >>> echo 0000:00:15.0 > /sys/bus/pci/drivers/xhci_hcd/unbind >>> leads to xhci-hcd being freed in __release_region. >>> The intel_xhci_usb_sw resource is accessed in platform code >>> in platform_device_del with >>> for (i = 0; i < pdev->num_resources; i++) { >>> struct resource *r = &pdev->resource[i]; >>> if (r->parent) >>> release_resource(r); >>> } >>> as the resource's parent has not been updated, the release_resource >>> uses the parent: >>> p = &old->parent->child; >>> which is now invalid. >>> Fix this by marking the parent invalid in the child and give a warning >>> in dmesg. >>> --- >>> kernel/resource.c | 9 +++++++++ >>> 1 file changed, 9 insertions(+) >>> >>> diff --git a/kernel/resource.c b/kernel/resource.c >>> index 158f04ec1d4f..95340cb0b1c2 100644 >>> --- a/kernel/resource.c >>> +++ b/kernel/resource.c >>> @@ -1200,6 +1200,15 @@ void __release_region(struct resource *parent, >> resource_size_t start, >>> write_unlock(&resource_lock); >>> if (res->flags & IORESOURCE_MUXED) >>> wake_up(&muxed_resource_wait); >>> + >>> + write_lock(&resource_lock); >> >> Nit, should't this be written so that you only drop/get the lock if the >> above if statement is true? >> > What if some other async part invalidates the child while accessing it? > I wanted to make sure that the res->child is valid through the time it is accessed. > >>> + if (res->child) { >>> + printk(KERN_WARNING "__release_region: %s has child >> %s," >>> + "invalidating childs parent\n", >>> + res->name, res->child->name); >> >> What can userspace/anyone do about this if it triggers? >> > At least a platform driver developer can see he did something wrong. > I had a look at the code of Hans and did not see anything weird. > (platform driver is in drivers/usb/host/xhci-ext-caps.c) > The issue is very racy - what happens when the platform code accesses the > resource mainly depends on if the freed resource memory already has been > reused by someone. We are talking memory-mapped io here, so it cannot just be "re-used", it is wat it is. I guess the PCI BAR could be released and then the physical address the resource was at could be re-used for another piece of MMIo, but AFAIK outside of PI=CI hotplug we never release BARs. Maybe we need to ref-count resources and have the aprent free only be a deref and not release the resource until the child resource also is free-ed doing another deref? I must say that to me it sometimes just seems like always allowing unbind is a bad idea. Another example of this is things like virtio, where we can have a filesystem based on virtio-block, but the virtio interface between the hypervisor and the guest-kernel is a PCI-device and in theory the user could unbind the virtio driver from that PCI-device, after which the whole house comes crashing down. I also know that the extcon framework in its current incarnaton does not deal with unbind properly... Maybe it is time that we allow drivers to block unbind instead of trying to support unbind in really complex situations where normal use-cases will never need it ? I do realize unbind is very useful for driver developent without rebooting. > It was hard to find that, and only a single core dump enabled me to find it. > A first attempt was to set resource count to 0 in Hans' driver in the unregister > pdev before calling platform_device_unregister. That worked. > But this is a dirty hack in my eyes. The framework should detect such issues, > so i decided to find the best place where to insert the check - and > found it is the place where the resource is freed and still has childrens. > >> Can't we fix the root problem in the xhci driver to properly order how >> it tears things down? If it has intel_xhci_usb_driver as a "child" >> shouldn't that be unbound/freed before the parent is? >> > Hans, any idea? 1) make resources refcounted, have child resources take a ref on the parent 2) Disallow unbind on devices with bound child-devices? Regards, Hans
> > We are talking memory-mapped io here, so it cannot just be "re-used", it > is wat it is. I guess the PCI BAR could be released and then the physical > address the resource was at could be re-used for another piece of MMIo, > but AFAIK outside of PI=CI hotplug we never release BARs. > > Maybe we need to ref-count resources and have the aprent free only be > a deref and not release the resource until the child resource also > is free-ed doing another deref? > > I must say that to me it sometimes just seems like always allowing unbind > is a bad idea. Another example of this is things like virtio, where > we can have a filesystem based on virtio-block, but the virtio interface > between the hypervisor and the guest-kernel is a PCI-device and in theory > the user could unbind the virtio driver from that PCI-device, after which > the whole house comes crashing down. > > I also know that the extcon framework in its current incarnaton > does not deal with unbind properly... > > Maybe it is time that we allow drivers to block unbind instead of > trying to support unbind in really complex situations where normal > use-cases will never need it ? > > I do realize unbind is very useful for driver developent without > rebooting. > Hey, i did not want to trigger an eartquake in the basement of the kernel ;-) My intention was to prevent some crashes, and help developers to find their bugs. I think my patch exactly does this. > 1) make resources refcounted, have child resources take a ref on the parent > 2) Disallow unbind on devices with bound child-devices? > Exactly what i was thinking of in first attempts. But i fear that would break even more use cases. Hans, directly regarding the driver: The problem i see is that the xhci_intel_unregister_pdev which is added as an action with devm_add_action_or_reset() is called late by the framework, later than the usb_hcd_pci_remove() in xhci_pci_remove. Is there any chance to trigger this before? This is what Greg meant with "right order". Anyway, i really appreciate these discussions, thanks for all your patience. Best Regards Carsten
Hi, On 8/9/19 12:47 PM, Schmid, Carsten wrote: >> >> We are talking memory-mapped io here, so it cannot just be "re-used", it >> is wat it is. I guess the PCI BAR could be released and then the physical >> address the resource was at could be re-used for another piece of MMIo, >> but AFAIK outside of PI=CI hotplug we never release BARs. >> >> Maybe we need to ref-count resources and have the aprent free only be >> a deref and not release the resource until the child resource also >> is free-ed doing another deref? >> >> I must say that to me it sometimes just seems like always allowing unbind >> is a bad idea. Another example of this is things like virtio, where >> we can have a filesystem based on virtio-block, but the virtio interface >> between the hypervisor and the guest-kernel is a PCI-device and in theory >> the user could unbind the virtio driver from that PCI-device, after which >> the whole house comes crashing down. >> >> I also know that the extcon framework in its current incarnaton >> does not deal with unbind properly... >> >> Maybe it is time that we allow drivers to block unbind instead of >> trying to support unbind in really complex situations where normal >> use-cases will never need it ? >> >> I do realize unbind is very useful for driver developent without >> rebooting. >> > > Hey, i did not want to trigger an eartquake in the basement of the kernel ;-) > My intention was to prevent some crashes, and help developers to find their bugs. > I think my patch exactly does this. Hehe, actually drivers not being able to block unbind has been bugging me for a while now, because there are cases where this would be really helpful. >> 1) make resources refcounted, have child resources take a ref on the parent >> 2) Disallow unbind on devices with bound child-devices? >> > > Exactly what i was thinking of in first attempts. > But i fear that would break even more use cases. > > Hans, directly regarding the driver: > The problem i see is that the xhci_intel_unregister_pdev which is added > as an action with devm_add_action_or_reset() is called late by the framework, > later than the usb_hcd_pci_remove() in xhci_pci_remove. > Is there any chance to trigger this before? > This is what Greg meant with "right order". Ah, I missed that part, sure that should be easy, just stop using devm_add_action_or_reset() and do the xhci_intel_unregister_pdev() manually at the right time. The downside of this is that you also need to make sure it happens at the right time from probe error-paths but given the bug you are hitting, I guess that is probably already a problem. Regards, Hans
Hi again, >> >> Hey, i did not want to trigger an eartquake in the basement of the kernel ;-) >> My intention was to prevent some crashes, and help developers to find their bugs. >> I think my patch exactly does this. > > Hehe, actually drivers not being able to block unbind has been bugging me > for > a while now, because there are cases where this would be really helpful. >>> 1) make resources refcounted, have child resources take a ref on the parent >>> 2) Disallow unbind on devices with bound child-devices? >>> >> Exactly what i was thinking of in first attempts. >> But i fear that would break even more use cases. >> >> Hans, directly regarding the driver: >> The problem i see is that the xhci_intel_unregister_pdev which is added >> as an action with devm_add_action_or_reset() is called late by the framework, >> later than the usb_hcd_pci_remove() in xhci_pci_remove. >> Is there any chance to trigger this before? >> This is what Greg meant with "right order". > > Ah, I missed that part, sure that should be easy, just stop using > devm_add_action_or_reset() and do the xhci_intel_unregister_pdev() > manually at the right time. The downside of this is that you also > need to make sure it happens at the right time from probe error-paths > but given the bug you are hitting, I guess that is probably > already a problem. > @Hans: Sure, will have a look at this. I think i have found where to do that, but need to check how to get the pdev pointer there .... @Greg: I am still confident that my patch in __release_region should be taken in. Situation now without my patch: If we have a device driver (or whatever) releasing a resource, the owner of the child will have no notification that the parent is gone. Accessing the parent (at least this will happen when trying to free the resource) might have changed memory at the parent location, and what happens might be an access to unmapped memory, whatever - an oops and we don't know why. That's what i experienced and hunting. Situation with the patch applied: The owner gets a notification (parent=NULL) and we have an indication in the kernel log. If an owner of the resource where the parent is gone checks for the parent, we are fine. If he doesn't check: we have a NULL pointer deref with a warning message pointing to the root cause. Isn't it better to have a pointer to a crash rather than having unreliable racy crashes in such a case? Have a nice weekend. Best regards Carsten
On Fri, Aug 09, 2019 at 12:38:35PM +0000, Schmid, Carsten wrote: > Hi again, > > >> > >> Hey, i did not want to trigger an eartquake in the basement of the kernel ;-) > >> My intention was to prevent some crashes, and help developers to find their bugs. > >> I think my patch exactly does this. > > > > Hehe, actually drivers not being able to block unbind has been bugging me > > for > > a while now, because there are cases where this would be really helpful. > >>> 1) make resources refcounted, have child resources take a ref on the parent > >>> 2) Disallow unbind on devices with bound child-devices? > >>> > >> Exactly what i was thinking of in first attempts. > >> But i fear that would break even more use cases. > >> > >> Hans, directly regarding the driver: > >> The problem i see is that the xhci_intel_unregister_pdev which is added > >> as an action with devm_add_action_or_reset() is called late by the framework, > >> later than the usb_hcd_pci_remove() in xhci_pci_remove. > >> Is there any chance to trigger this before? > >> This is what Greg meant with "right order". > > > > Ah, I missed that part, sure that should be easy, just stop using > > devm_add_action_or_reset() and do the xhci_intel_unregister_pdev() > > manually at the right time. The downside of this is that you also > > need to make sure it happens at the right time from probe error-paths > > but given the bug you are hitting, I guess that is probably > > already a problem. > > > @Hans: > Sure, will have a look at this. I think i have found where to do that, > but need to check how to get the pdev pointer there .... > > @Greg: > I am still confident that my patch in __release_region should be taken in. Ok, submit it in a "real" way and we can consider it :) thanks, greg k-h
>> >> @Greg: >> I am still confident that my patch in __release_region should be taken in. > > Ok, submit it in a "real" way and we can consider it :) > > thanks, > > greg k-h Already done, linux-kernel@vger.kernel.org, see https://www.spinics.net/lists/kernel/msg3218180.html Thanks, and have a nice weekend. Best regards Carsten
On Fri, Aug 09, 2019 at 01:00:25PM +0000, Schmid, Carsten wrote: > >> > >> @Greg: > >> I am still confident that my patch in __release_region should be taken in. > > > > Ok, submit it in a "real" way and we can consider it :) > > > > thanks, > > > > greg k-h > > Already done, linux-kernel@vger.kernel.org, see > https://www.spinics.net/lists/kernel/msg3218180.html You didn't cc: any developer, that's a sure way to get a patch ignored :( Try resending it with at least the people who get_maintainer.pl says has touched that file last in it. Also, Linus is the unofficial resource.c maintainer. I think he has a set of userspace testing scripts for changes somewhere, so you should cc: him too. And might as well add me :) thanks, greg k-h
Hi, On 09-08-19 14:38, Schmid, Carsten wrote: > Hi again, > >>> >>> Hey, i did not want to trigger an eartquake in the basement of the kernel ;-) >>> My intention was to prevent some crashes, and help developers to find their bugs. >>> I think my patch exactly does this. >> >> Hehe, actually drivers not being able to block unbind has been bugging me >> for >> a while now, because there are cases where this would be really helpful. >>>> 1) make resources refcounted, have child resources take a ref on the parent >>>> 2) Disallow unbind on devices with bound child-devices? >>>> >>> Exactly what i was thinking of in first attempts. >>> But i fear that would break even more use cases. >>> >>> Hans, directly regarding the driver: >>> The problem i see is that the xhci_intel_unregister_pdev which is added >>> as an action with devm_add_action_or_reset() is called late by the framework, >>> later than the usb_hcd_pci_remove() in xhci_pci_remove. >>> Is there any chance to trigger this before? >>> This is what Greg meant with "right order". >> >> Ah, I missed that part, sure that should be easy, just stop using >> devm_add_action_or_reset() and do the xhci_intel_unregister_pdev() >> manually at the right time. The downside of this is that you also >> need to make sure it happens at the right time from probe error-paths >> but given the bug you are hitting, I guess that is probably >> already a problem. >> > @Hans: > Sure, will have a look at this. I think i have found where to do that, > but need to check how to get the pdev pointer there .... You probably need to store the pdev pointer in one of the xhci driver's private data structs. Regards, Hans
diff --git a/kernel/resource.c b/kernel/resource.c index 158f04ec1d4f..95340cb0b1c2 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -1200,6 +1200,15 @@ void __release_region(struct resource *parent, resource_size_t start, write_unlock(&resource_lock); if (res->flags & IORESOURCE_MUXED) wake_up(&muxed_resource_wait); + + write_lock(&resource_lock); + if (res->child) { + printk(KERN_WARNING "__release_region: %s has child %s," + "invalidating childs parent\n", + res->name, res->child->name); + res->child->parent = NULL; + } + write_unlock(&resource_lock); free_resource(res); return; }