Message ID | 20240729022316.92219-1-andrey.konovalov@linux.dev (mailing list archive) |
---|---|
State | New |
Series | usb: gadget: dummy_hcd: execute hrtimer callback in softirq context |
Hi Andrey,

On Mon, 2024-07-29 at 04:23 +0200, andrey.konovalov@linux.dev wrote:
> From: Andrey Konovalov <andreyknvl@gmail.com>
>
> Commit a7f3813e589f ("usb: gadget: dummy_hcd: Switch to hrtimer transfer
> scheduler") switched dummy_hcd to use hrtimer and made the timer's
> callback be executed in the hardirq context.
>
> With that change, __usb_hcd_giveback_urb now gets executed in the hardirq
> context, which causes problems for KCOV and KMSAN.
>
> One problem is that KCOV now is unable to collect coverage from
> the USB code that gets executed from the dummy_hcd's timer callback,
> as KCOV cannot collect coverage in the hardirq context.
>
> Another problem is that the dummy_hcd hrtimer might get triggered in the
> middle of a softirq with KCOV remote coverage collection enabled, and that
> causes a WARNING in KCOV, as reported by syzbot. (I sent a separate patch
> to shut down this WARNING, but that doesn't fix the other two issues.)
>
> Finally, KMSAN appears to ignore tracking memory copying operations
> that happen in the hardirq context, which causes false positive
> kernel-infoleaks, as reported by syzbot.
>
> Change the hrtimer in dummy_hcd to execute the callback in the softirq
> context.
>
> Reported-by: syzbot+2388cdaeb6b10f0c13ac@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=2388cdaeb6b10f0c13ac
> Reported-by: syzbot+17ca2339e34a1d863aad@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=17ca2339e34a1d863aad
> Fixes: a7f3813e589f ("usb: gadget: dummy_hcd: Switch to hrtimer transfer scheduler")
> Cc: stable@vger.kernel.org
> Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com>
>
> ---
>
> Marcello, would this change be acceptable for your use case?

Thanks for investigating and finding the cause of this problem. I have
already submitted an identical patch to change the hrtimer to softirq:
https://lkml.org/lkml/2024/6/26/969

However, your commit messages contain more useful information about the
problem at hand. So I'm happy to drop my patch in favor of yours.

Btw, the same problem has also been reported by the intel kernel test
robot. So we should add additional tags to mark this patch as the fix.

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202406141323.413a90d2-lkp@intel.com

Acked-by: Marcello Sylvester Bauer <sylv@sylv.io>

Thanks,
Marcello

> If we wanted to keep the hardirq hrtimer, we would need teach KCOV to
> collect coverage in the hardirq context (or disable it, which would be
> unfortunate) and also fix whatever is wrong with KMSAN, but all that
> requires some work.
> ---
>  drivers/usb/gadget/udc/dummy_hcd.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/usb/gadget/udc/dummy_hcd.c b/drivers/usb/gadget/udc/dummy_hcd.c
> index f37b0d8386c1a..ff7bee78bcc49 100644
> --- a/drivers/usb/gadget/udc/dummy_hcd.c
> +++ b/drivers/usb/gadget/udc/dummy_hcd.c
> @@ -1304,7 +1304,8 @@ static int dummy_urb_enqueue(
>
>  	/* kick the scheduler, it'll do the rest */
>  	if (!hrtimer_active(&dum_hcd->timer))
> -		hrtimer_start(&dum_hcd->timer, ns_to_ktime(DUMMY_TIMER_INT_NSECS), HRTIMER_MODE_REL);
> +		hrtimer_start(&dum_hcd->timer, ns_to_ktime(DUMMY_TIMER_INT_NSECS),
> +				HRTIMER_MODE_REL_SOFT);
>
>   done:
>  	spin_unlock_irqrestore(&dum_hcd->dum->lock, flags);
> @@ -1325,7 +1326,7 @@ static int dummy_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
>  	rc = usb_hcd_check_unlink_urb(hcd, urb, status);
>  	if (!rc && dum_hcd->rh_state != DUMMY_RH_RUNNING &&
>  			!list_empty(&dum_hcd->urbp_list))
> -		hrtimer_start(&dum_hcd->timer, ns_to_ktime(0), HRTIMER_MODE_REL);
> +		hrtimer_start(&dum_hcd->timer, ns_to_ktime(0), HRTIMER_MODE_REL_SOFT);
>
>  	spin_unlock_irqrestore(&dum_hcd->dum->lock, flags);
>  	return rc;
> @@ -1995,7 +1996,8 @@ static enum hrtimer_restart dummy_timer(struct hrtimer *t)
>  		dum_hcd->udev = NULL;
>  	} else if (dum_hcd->rh_state == DUMMY_RH_RUNNING) {
>  		/* want a 1 msec delay here */
> -		hrtimer_start(&dum_hcd->timer, ns_to_ktime(DUMMY_TIMER_INT_NSECS), HRTIMER_MODE_REL);
> +		hrtimer_start(&dum_hcd->timer, ns_to_ktime(DUMMY_TIMER_INT_NSECS),
> +				HRTIMER_MODE_REL_SOFT);
>  	}
>
>  	spin_unlock_irqrestore(&dum->lock, flags);
> @@ -2389,7 +2391,7 @@ static int dummy_bus_resume(struct usb_hcd *hcd)
>  		dum_hcd->rh_state = DUMMY_RH_RUNNING;
>  		set_link_state(dum_hcd);
>  		if (!list_empty(&dum_hcd->urbp_list))
> -			hrtimer_start(&dum_hcd->timer, ns_to_ktime(0), HRTIMER_MODE_REL);
> +			hrtimer_start(&dum_hcd->timer, ns_to_ktime(0), HRTIMER_MODE_REL_SOFT);
>  		hcd->state = HC_STATE_RUNNING;
>  	}
>  	spin_unlock_irq(&dum_hcd->dum->lock);
> @@ -2467,7 +2469,7 @@ static DEVICE_ATTR_RO(urbs);
>
>  static int dummy_start_ss(struct dummy_hcd *dum_hcd)
>  {
> -	hrtimer_init(&dum_hcd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +	hrtimer_init(&dum_hcd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT);
>  	dum_hcd->timer.function = dummy_timer;
>  	dum_hcd->rh_state = DUMMY_RH_RUNNING;
>  	dum_hcd->stream_en_ep = 0;
> @@ -2497,7 +2499,7 @@ static int dummy_start(struct usb_hcd *hcd)
>  		return dummy_start_ss(dum_hcd);
>
>  	spin_lock_init(&dum_hcd->dum->lock);
> -	hrtimer_init(&dum_hcd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +	hrtimer_init(&dum_hcd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT);
>  	dum_hcd->timer.function = dummy_timer;
>  	dum_hcd->rh_state = DUMMY_RH_RUNNING;
>

On Mon, Jul 29, 2024 at 10:26 AM Marcello Sylvester Bauer <sylv@sylv.io> wrote:
>
> Hi Andrey,

Hi Marcello,

> Thanks for investigating and finding the cause of this problem. I have
> already submitted an identical patch to change the hrtimer to softirq:
> https://lkml.org/lkml/2024/6/26/969

Ah, I missed that, that's great!

> However, your commit messages contain more useful information about the
> problem at hand. So I'm happy to drop my patch in favor of yours.

That's very considerate, thank you. I'll leave this up to Greg - I
don't mind using either patch.

> Btw, the same problem has also been reported by the intel kernel test
> robot. So we should add additional tags to mark this patch as the fix.
>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202406141323.413a90d2-lkp@intel.com
> Acked-by: Marcello Sylvester Bauer <sylv@sylv.io>

Let's also add the syzbot reports mentioned in your patch:

Reported-by: syzbot+c793a7eca38803212c61@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c793a7eca38803212c61
Reported-by: syzbot+1e6e0b916b211bee1bd6@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1e6e0b916b211bee1bd6

And I also found one more:

Reported-by: syzbot+edd9fe0d3a65b14588d5@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=edd9fe0d3a65b14588d5

Thank you!

On Mon, Jul 29, 2024 at 06:14:30PM +0200, Andrey Konovalov wrote:
> On Mon, Jul 29, 2024 at 10:26 AM Marcello Sylvester Bauer <sylv@sylv.io> wrote:
> >
> > Hi Andrey,
>
> Hi Marcello,
>
> > Thanks for investigating and finding the cause of this problem. I have
> > already submitted an identical patch to change the hrtimer to softirq:
> > https://lkml.org/lkml/2024/6/26/969
>
> Ah, I missed that, that's great!
>
> > However, your commit messages contain more useful information about the
> > problem at hand. So I'm happy to drop my patch in favor of yours.
>
> That's very considerate, thank you. I'll leave this up to Greg - I
> don't mind using either patch.
>
> > Btw, the same problem has also been reported by the intel kernel test
> > robot. So we should add additional tags to mark this patch as the fix.
> >
> > Reported-by: kernel test robot <oliver.sang@intel.com>
> > Closes: https://lore.kernel.org/oe-lkp/202406141323.413a90d2-lkp@intel.com
> > Acked-by: Marcello Sylvester Bauer <sylv@sylv.io>
>
> Let's also add the syzbot reports mentioned in your patch:
>
> Reported-by: syzbot+c793a7eca38803212c61@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=c793a7eca38803212c61
> Reported-by: syzbot+1e6e0b916b211bee1bd6@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=1e6e0b916b211bee1bd6
>
> And I also found one more:
>
> Reported-by: syzbot+edd9fe0d3a65b14588d5@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=edd9fe0d3a65b14588d5

You need to be careful about claiming that this patch will fix those bug
reports. At least one of them (the last one above) still fails with the
patch applied. See:

https://lore.kernel.org/linux-usb/ade15714-6aa3-4988-8b45-719fc9d74727@rowland.harvard.edu/

and the following response.

Alan Stern

On Mon, Jul 29, 2024 at 8:01 PM Alan Stern <stern@rowland.harvard.edu> wrote:
>
> > And I also found one more:
> >
> > Reported-by: syzbot+edd9fe0d3a65b14588d5@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=edd9fe0d3a65b14588d5
>
> You need to be careful about claiming that this patch will fix those bug
> reports. At least one of them (the last one above) still fails with the
> patch applied. See:
>
> https://lore.kernel.org/linux-usb/ade15714-6aa3-4988-8b45-719fc9d74727@rowland.harvard.edu/
>
> and the following response.

Ah, right, that one is something else, so let's not add those last
Reported-by/Closes.

However, that crash was bisected to the same guilty patch, so the
issue is somehow related. Even if we were to mark it as to be fixed
with the patch I sent, this wouldn't be critical: syzbot would just
rereport it, and with fresher stack traces.

Thank you!

On Mon, Jul 29, 2024 at 4:23 AM <andrey.konovalov@linux.dev> wrote:
>
> From: Andrey Konovalov <andreyknvl@gmail.com>
>
> Commit a7f3813e589f ("usb: gadget: dummy_hcd: Switch to hrtimer transfer
> scheduler") switched dummy_hcd to use hrtimer and made the timer's
> callback be executed in the hardirq context.
>
> With that change, __usb_hcd_giveback_urb now gets executed in the hardirq
> context, which causes problems for KCOV and KMSAN.
>
> One problem is that KCOV now is unable to collect coverage from
> the USB code that gets executed from the dummy_hcd's timer callback,
> as KCOV cannot collect coverage in the hardirq context.
>
> Another problem is that the dummy_hcd hrtimer might get triggered in the
> middle of a softirq with KCOV remote coverage collection enabled, and that
> causes a WARNING in KCOV, as reported by syzbot. (I sent a separate patch
> to shut down this WARNING, but that doesn't fix the other two issues.)
>
> Finally, KMSAN appears to ignore tracking memory copying operations
> that happen in the hardirq context, which causes false positive
> kernel-infoleaks, as reported by syzbot.

Hi Andrey,

FWIW this problem is tracked as
https://github.com/google/kmsan/issues/92, I'll try to revisit it in
September.

On Mon, Jul 29, 2024 at 4:23 AM <andrey.konovalov@linux.dev> wrote:
>
> From: Andrey Konovalov <andreyknvl@gmail.com>
>
> Commit a7f3813e589f ("usb: gadget: dummy_hcd: Switch to hrtimer transfer
> scheduler") switched dummy_hcd to use hrtimer and made the timer's
> callback be executed in the hardirq context.
>
> With that change, __usb_hcd_giveback_urb now gets executed in the hardirq
> context, which causes problems for KCOV and KMSAN.
>
> One problem is that KCOV now is unable to collect coverage from
> the USB code that gets executed from the dummy_hcd's timer callback,
> as KCOV cannot collect coverage in the hardirq context.
>
> Another problem is that the dummy_hcd hrtimer might get triggered in the
> middle of a softirq with KCOV remote coverage collection enabled, and that
> causes a WARNING in KCOV, as reported by syzbot. (I sent a separate patch
> to shut down this WARNING, but that doesn't fix the other two issues.)
>
> Finally, KMSAN appears to ignore tracking memory copying operations
> that happen in the hardirq context, which causes false positive
> kernel-infoleaks, as reported by syzbot.
>
> Change the hrtimer in dummy_hcd to execute the callback in the softirq
> context.
>
> Reported-by: syzbot+2388cdaeb6b10f0c13ac@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=2388cdaeb6b10f0c13ac
> Reported-by: syzbot+17ca2339e34a1d863aad@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=17ca2339e34a1d863aad
> Fixes: a7f3813e589f ("usb: gadget: dummy_hcd: Switch to hrtimer transfer scheduler")
> Cc: stable@vger.kernel.org
> Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com>

Hi Greg,

Could you pick up either this or Marcello's patch
(https://lkml.org/lkml/2024/6/26/969)? In case they got lost.

Thank you!

On Tue, Aug 27, 2024 at 02:02:00AM +0200, Andrey Konovalov wrote:
> On Mon, Jul 29, 2024 at 4:23 AM <andrey.konovalov@linux.dev> wrote:
> >
> > From: Andrey Konovalov <andreyknvl@gmail.com>
> >
> > Commit a7f3813e589f ("usb: gadget: dummy_hcd: Switch to hrtimer transfer
> > scheduler") switched dummy_hcd to use hrtimer and made the timer's
> > callback be executed in the hardirq context.
> >
> > With that change, __usb_hcd_giveback_urb now gets executed in the hardirq
> > context, which causes problems for KCOV and KMSAN.
> >
> > One problem is that KCOV now is unable to collect coverage from
> > the USB code that gets executed from the dummy_hcd's timer callback,
> > as KCOV cannot collect coverage in the hardirq context.
> >
> > Another problem is that the dummy_hcd hrtimer might get triggered in the
> > middle of a softirq with KCOV remote coverage collection enabled, and that
> > causes a WARNING in KCOV, as reported by syzbot. (I sent a separate patch
> > to shut down this WARNING, but that doesn't fix the other two issues.)
> >
> > Finally, KMSAN appears to ignore tracking memory copying operations
> > that happen in the hardirq context, which causes false positive
> > kernel-infoleaks, as reported by syzbot.
> >
> > Change the hrtimer in dummy_hcd to execute the callback in the softirq
> > context.
> >
> > Reported-by: syzbot+2388cdaeb6b10f0c13ac@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=2388cdaeb6b10f0c13ac
> > Reported-by: syzbot+17ca2339e34a1d863aad@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=17ca2339e34a1d863aad
> > Fixes: a7f3813e589f ("usb: gadget: dummy_hcd: Switch to hrtimer transfer scheduler")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com>
>
> Hi Greg,
>
> Could you pick up either this or Marcello's patch
> (https://lkml.org/lkml/2024/6/26/969)? In case they got lost.

Both are lost now, (and please use lore.kernel.org, not lkml.org), can
you resend the one that you wish to see accepted?

thanks,

greg k-h

On Tue, Sep 3, 2024 at 9:09 AM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
>
> > Hi Greg,
> >
> > Could you pick up either this or Marcello's patch
> > (https://lkml.org/lkml/2024/6/26/969)? In case they got lost.
>
> Both are lost now, (and please use lore.kernel.org, not lkml.org), can
> you resend the one that you wish to see accepted?

Done: https://lore.kernel.org/linux-usb/20240904013051.4409-1-andrey.konovalov@linux.dev/T/#u

Thanks!

diff --git a/drivers/usb/gadget/udc/dummy_hcd.c b/drivers/usb/gadget/udc/dummy_hcd.c
index f37b0d8386c1a..ff7bee78bcc49 100644
--- a/drivers/usb/gadget/udc/dummy_hcd.c
+++ b/drivers/usb/gadget/udc/dummy_hcd.c
@@ -1304,7 +1304,8 @@ static int dummy_urb_enqueue(

 	/* kick the scheduler, it'll do the rest */
 	if (!hrtimer_active(&dum_hcd->timer))
-		hrtimer_start(&dum_hcd->timer, ns_to_ktime(DUMMY_TIMER_INT_NSECS), HRTIMER_MODE_REL);
+		hrtimer_start(&dum_hcd->timer, ns_to_ktime(DUMMY_TIMER_INT_NSECS),
+				HRTIMER_MODE_REL_SOFT);

  done:
 	spin_unlock_irqrestore(&dum_hcd->dum->lock, flags);
@@ -1325,7 +1326,7 @@ static int dummy_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
 	rc = usb_hcd_check_unlink_urb(hcd, urb, status);
 	if (!rc && dum_hcd->rh_state != DUMMY_RH_RUNNING &&
 			!list_empty(&dum_hcd->urbp_list))
-		hrtimer_start(&dum_hcd->timer, ns_to_ktime(0), HRTIMER_MODE_REL);
+		hrtimer_start(&dum_hcd->timer, ns_to_ktime(0), HRTIMER_MODE_REL_SOFT);

 	spin_unlock_irqrestore(&dum_hcd->dum->lock, flags);
 	return rc;
@@ -1995,7 +1996,8 @@ static enum hrtimer_restart dummy_timer(struct hrtimer *t)
 		dum_hcd->udev = NULL;
 	} else if (dum_hcd->rh_state == DUMMY_RH_RUNNING) {
 		/* want a 1 msec delay here */
-		hrtimer_start(&dum_hcd->timer, ns_to_ktime(DUMMY_TIMER_INT_NSECS), HRTIMER_MODE_REL);
+		hrtimer_start(&dum_hcd->timer, ns_to_ktime(DUMMY_TIMER_INT_NSECS),
+				HRTIMER_MODE_REL_SOFT);
 	}

 	spin_unlock_irqrestore(&dum->lock, flags);
@@ -2389,7 +2391,7 @@ static int dummy_bus_resume(struct usb_hcd *hcd)
 		dum_hcd->rh_state = DUMMY_RH_RUNNING;
 		set_link_state(dum_hcd);
 		if (!list_empty(&dum_hcd->urbp_list))
-			hrtimer_start(&dum_hcd->timer, ns_to_ktime(0), HRTIMER_MODE_REL);
+			hrtimer_start(&dum_hcd->timer, ns_to_ktime(0), HRTIMER_MODE_REL_SOFT);
 		hcd->state = HC_STATE_RUNNING;
 	}
 	spin_unlock_irq(&dum_hcd->dum->lock);
@@ -2467,7 +2469,7 @@ static DEVICE_ATTR_RO(urbs);

 static int dummy_start_ss(struct dummy_hcd *dum_hcd)
 {
-	hrtimer_init(&dum_hcd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	hrtimer_init(&dum_hcd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT);
 	dum_hcd->timer.function = dummy_timer;
 	dum_hcd->rh_state = DUMMY_RH_RUNNING;
 	dum_hcd->stream_en_ep = 0;
@@ -2497,7 +2499,7 @@ static int dummy_start(struct usb_hcd *hcd)
 		return dummy_start_ss(dum_hcd);

 	spin_lock_init(&dum_hcd->dum->lock);
-	hrtimer_init(&dum_hcd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	hrtimer_init(&dum_hcd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT);
 	dum_hcd->timer.function = dummy_timer;
 	dum_hcd->rh_state = DUMMY_RH_RUNNING;
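
The KCOV part of the commit message can be sketched as a small userspace model. This is not kernel code and not KCOV's actual implementation: the names exec_ctx, coverage_collected and timer_callback_ctx are hypothetical, and the rule is only a simplified reading of the description in this thread (coverage is recorded in task context, or in a softirq with remote coverage collection enabled, and not in hardirq context). It also assumes a non-PREEMPT_RT kernel, where a HRTIMER_MODE_REL callback runs in hardirq context.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified execution contexts; not kernel definitions. */
enum exec_ctx {
	CTX_TASK,	/* process context, e.g. inside a syscall */
	CTX_SOFTIRQ,	/* where a HRTIMER_MODE_REL_SOFT callback runs */
	CTX_HARDIRQ,	/* where a HRTIMER_MODE_REL callback runs (assumed non-RT) */
};

/*
 * Toy model of the rule described in the commit message. Assumes KCOV is
 * already enabled for the task; remote_softirq_enabled stands in for
 * "remote coverage collection is active in this softirq".
 */
static bool coverage_collected(enum exec_ctx ctx, bool remote_softirq_enabled)
{
	assert(ctx == CTX_TASK || ctx == CTX_SOFTIRQ || ctx == CTX_HARDIRQ);

	if (ctx == CTX_TASK)
		return true;
	if (ctx == CTX_SOFTIRQ)
		return remote_softirq_enabled;
	return false;	/* hardirq: nothing is recorded */
}

/* Hypothetical helper: context of the timer callback for a given mode. */
static enum exec_ctx timer_callback_ctx(bool soft_mode)
{
	return soft_mode ? CTX_SOFTIRQ : CTX_HARDIRQ;
}
```

Under these assumptions, coverage_collected(timer_callback_ctx(false), true) is false, which matches the lost coverage described for the hardirq timer, while coverage_collected(timer_callback_ctx(true), true) is true once the callback runs in softirq context.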