Message ID | 8735hniqcm.fsf@posteo.de (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [v3] net: atlantic: always deep reset on pm op, fixing null deref regression | expand |
On 06.05.22 00:09, Manuel Ullmann wrote: >>From d24052938345d456946be0e9ccc337e24d771c79 Mon Sep 17 00:00:00 2001 > Date: Wed, 4 May 2022 21:30:44 +0200 > > The impact of this regression is the same for resume that I saw on > thaw: the kernel hangs and nothing except SysRq rebooting can be done. > > The null deref occurs at the same position as on thaw. > BUG: kernel NULL pointer dereference > RIP: aq_ring_rx_fill+0xcf/0x210 [atlantic] > > Fixes regression in commit cbe6c3a8f8f4 ("net: atlantic: invert deep > par in pm functions, preventing null derefs"), where I disabled deep > pm resets in suspend and resume, trying to make sense of the > atl_resume_common deep parameter in the first place. > > It turns out, that atlantic always has to deep reset on pm operations > and the parameter is useless. Even though I expected that and tested > resume, I screwed up by kexec-rebooting into an unpatched kernel, thus > missing the breakage. > > This fixup obsoletes the deep parameter of atl_resume_common, but I > leave the cleanup for the maintainers to post to mainline. FWIW, this section starting here and... > PS: I'm very sorry for this regression. > > Changes in v2: > Patch formatting fixes > - Fix Fixes tag > – Simplify stable Cc tag > – Fix Signed-off-by tag > > Changes in v3: > – Prefixed commit reference with "commit" aka I managed to use > checkpatch.pl. > - Added Tested-by tags for the testing reporters. > – People start to get annoyed by my patch revision spamming. Should be > the last one. ...ending here needs should be below the "---" line you already have below. For details see: https://www.kernel.org/doc/html/latest/process/submitting-patches.html BTW, same goes for any "#regzbot" commands (like you had in cbe6c3a8f8f4), as things otherwise get confused when a patch for example is posted as part of a stable/longterm -rc review. But don't worry, no big deal, I handled that :-D Many thx for actually directly getting regzbot involved and taking care of this regression! > Fixes: cbe6c3a8f8f4 ("net: atlantic: invert deep par in pm functions, preventing null derefs") > Link: https://lore.kernel.org/regressions/9-Ehc_xXSwdXcvZqKD5aSqsqeNj5Izco4MYEwnx5cySXVEc9-x_WC4C3kAoCqNTi-H38frroUK17iobNVnkLtW36V6VWGSQEOHXhmVMm5iQ=@protonmail.com/ > Reported-by: Jordan Leppert <jordanleppert@protonmail.com> > Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com> > Tested-by: Jordan Leppert <jordanleppert@protonmail.com> > Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> > Cc: <stable@vger.kernel.org> # 5.10+ > Signed-off-by: Manuel Ullmann <labre@posteo.de> > --- > drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) Ciao, Thorsten
Thorsten Leemhuis <regressions@leemhuis.info> writes: > On 06.05.22 00:09, Manuel Ullmann wrote: >>>From d24052938345d456946be0e9ccc337e24d771c79 Mon Sep 17 00:00:00 2001 >> Date: Wed, 4 May 2022 21:30:44 +0200 >> >> The impact of this regression is the same for resume that I saw on >> thaw: the kernel hangs and nothing except SysRq rebooting can be done. >> >> The null deref occurs at the same position as on thaw. >> BUG: kernel NULL pointer dereference >> RIP: aq_ring_rx_fill+0xcf/0x210 [atlantic] >> >> Fixes regression in commit cbe6c3a8f8f4 ("net: atlantic: invert deep >> par in pm functions, preventing null derefs"), where I disabled deep >> pm resets in suspend and resume, trying to make sense of the >> atl_resume_common deep parameter in the first place. >> >> It turns out, that atlantic always has to deep reset on pm operations >> and the parameter is useless. Even though I expected that and tested >> resume, I screwed up by kexec-rebooting into an unpatched kernel, thus >> missing the breakage. >> >> This fixup obsoletes the deep parameter of atl_resume_common, but I >> leave the cleanup for the maintainers to post to mainline. > > FWIW, this section starting here and... > >> PS: I'm very sorry for this regression. >> >> Changes in v2: >> Patch formatting fixes >> - Fix Fixes tag >> – Simplify stable Cc tag >> – Fix Signed-off-by tag >> >> Changes in v3: >> – Prefixed commit reference with "commit" aka I managed to use >> checkpatch.pl. >> - Added Tested-by tags for the testing reporters. >> – People start to get annoyed by my patch revision spamming. Should be >> the last one. > > ...ending here needs should be below the "---" line you already have > below. For details see: > https://www.kernel.org/doc/html/latest/process/submitting-patches.html > > BTW, same goes for any "#regzbot" commands (like you had in > cbe6c3a8f8f4), as things otherwise get confused when a patch for example > is posted as part of a stable/longterm -rc review. Good to know. Maybe I could patch the handling-regressions documentation to include this. submitting-patches could also link the subsystem specific documentation, e.g. the netdev FAQ, since they handle patches with their more bot tests. Would have helped me a bit. Might be a nice exercise for properly formatted patching ;) > > But don't worry, no big deal, I handled that :-D Many thx for actually > directly getting regzbot involved and taking care of this regression! Thank you for the final cleanup and you’re welcome. :) Where did you handle this? I can’t seem to find the fixup anywhere, i.e. net-next, net, linux-next or lkml. I actually took the time and read that and all related documentation (stable, regressions, coding style) during my vacation a few weeks ago, but my memory was partially overwritten by less useful (work related) data. Instead of regression reports induced panic mode, I should have reread the submitting-patches, especially the last section. >> Fixes: cbe6c3a8f8f4 ("net: atlantic: invert deep par in pm functions, preventing null derefs") >> Link: https://lore.kernel.org/regressions/9-Ehc_xXSwdXcvZqKD5aSqsqeNj5Izco4MYEwnx5cySXVEc9-x_WC4C3kAoCqNTi-H38frroUK17iobNVnkLtW36V6VWGSQEOHXhmVMm5iQ=@protonmail.com/ >> Reported-by: Jordan Leppert <jordanleppert@protonmail.com> >> Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com> >> Tested-by: Jordan Leppert <jordanleppert@protonmail.com> >> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> >> Cc: <stable@vger.kernel.org> # 5.10+ >> Signed-off-by: Manuel Ullmann <labre@posteo.de> >> --- >> drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) > > Ciao, Thorsten Regards, Manuel
On 07.05.22 15:10, Manuel Ullmann wrote: > Thorsten Leemhuis <regressions@leemhuis.info> writes: > >> On 06.05.22 00:09, Manuel Ullmann wrote: >>> >From d24052938345d456946be0e9ccc337e24d771c79 Mon Sep 17 00:00:00 2001 >>> Date: Wed, 4 May 2022 21:30:44 +0200 >>> >>> The impact of this regression is the same for resume that I saw on >>> thaw: the kernel hangs and nothing except SysRq rebooting can be done. >>> >>> The null deref occurs at the same position as on thaw. >>> BUG: kernel NULL pointer dereference >>> RIP: aq_ring_rx_fill+0xcf/0x210 [atlantic] >>> >>> Fixes regression in commit cbe6c3a8f8f4 ("net: atlantic: invert deep >>> par in pm functions, preventing null derefs"), where I disabled deep >>> pm resets in suspend and resume, trying to make sense of the >>> atl_resume_common deep parameter in the first place. >>> >>> It turns out, that atlantic always has to deep reset on pm operations >>> and the parameter is useless. Even though I expected that and tested >>> resume, I screwed up by kexec-rebooting into an unpatched kernel, thus >>> missing the breakage. >>> >>> This fixup obsoletes the deep parameter of atl_resume_common, but I >>> leave the cleanup for the maintainers to post to mainline. >> >> FWIW, this section starting here and... >> >>> PS: I'm very sorry for this regression. >>> >>> Changes in v2: >>> Patch formatting fixes >>> - Fix Fixes tag >>> – Simplify stable Cc tag >>> – Fix Signed-off-by tag >>> >>> Changes in v3: >>> – Prefixed commit reference with "commit" aka I managed to use >>> checkpatch.pl. >>> - Added Tested-by tags for the testing reporters. >>> – People start to get annoyed by my patch revision spamming. Should be >>> the last one. >> >> ...ending here needs should be below the "---" line you already have >> below. For details see: >> https://www.kernel.org/doc/html/latest/process/submitting-patches.html Sorry, I caused a misunderstanding. I didn't handle the above, I'm not one of the net subsystem developers. Either you submit a v4 fixing this or hope the net maintainer take care of that when they look at it -- but I guess they would highly prefer it if you'd address this. >> BTW, same goes for any "#regzbot" commands (like you had in >> cbe6c3a8f8f4), as things otherwise get confused when a patch for example >> is posted as part of a stable/longterm -rc review. > Good to know. Maybe I could patch the handling-regressions documentation > to include this. Yeah, I have already thought about it, but didn't get down to it yet. Only so much hours in a day. > submitting-patches could also link the subsystem > specific documentation, e.g. the netdev FAQ, since they handle patches > with their more bot tests. Would have helped me a bit. Might be a nice > exercise for properly formatted patching ;) I agree that the docs for submitting patches could need a few improvements and that is likely one of them. >> But don't worry, no big deal, I handled that :-D Many thx for actually >> directly getting regzbot involved and taking care of this regression! > Thank you for the final cleanup and you’re welcome. :) Where did you > handle this? I can’t seem to find the fixup anywhere, i.e. net-next, > net, linux-next or lkml. See above, I only handled the regzbot issue, not the issue with this patch. Sorry for not being clear enough in my wording. > [...] Ciao, Thorsten
Thorsten Leemhuis <regressions@leemhuis.info> writes: > On 07.05.22 15:10, Manuel Ullmann wrote: >> Thorsten Leemhuis <regressions@leemhuis.info> writes: >> >>> On 06.05.22 00:09, Manuel Ullmann wrote: >>>> >From d24052938345d456946be0e9ccc337e24d771c79 Mon Sep 17 00:00:00 2001 >>>> Date: Wed, 4 May 2022 21:30:44 +0200 >>>> >>>> The impact of this regression is the same for resume that I saw on >>>> thaw: the kernel hangs and nothing except SysRq rebooting can be done. >>>> >>>> The null deref occurs at the same position as on thaw. >>>> BUG: kernel NULL pointer dereference >>>> RIP: aq_ring_rx_fill+0xcf/0x210 [atlantic] >>>> >>>> Fixes regression in commit cbe6c3a8f8f4 ("net: atlantic: invert deep >>>> par in pm functions, preventing null derefs"), where I disabled deep >>>> pm resets in suspend and resume, trying to make sense of the >>>> atl_resume_common deep parameter in the first place. >>>> >>>> It turns out, that atlantic always has to deep reset on pm operations >>>> and the parameter is useless. Even though I expected that and tested >>>> resume, I screwed up by kexec-rebooting into an unpatched kernel, thus >>>> missing the breakage. >>>> >>>> This fixup obsoletes the deep parameter of atl_resume_common, but I >>>> leave the cleanup for the maintainers to post to mainline. >>> >>> FWIW, this section starting here and... >>> >>>> PS: I'm very sorry for this regression. >>>> >>>> Changes in v2: >>>> Patch formatting fixes >>>> - Fix Fixes tag >>>> – Simplify stable Cc tag >>>> – Fix Signed-off-by tag >>>> >>>> Changes in v3: >>>> – Prefixed commit reference with "commit" aka I managed to use >>>> checkpatch.pl. >>>> - Added Tested-by tags for the testing reporters. >>>> – People start to get annoyed by my patch revision spamming. Should be >>>> the last one. >>> >>> ...ending here needs should be below the "---" line you already have >>> below. For details see: >>> https://www.kernel.org/doc/html/latest/process/submitting-patches.html > > Sorry, I caused a misunderstanding. I didn't handle the above, I'm not > one of the net subsystem developers. Either you submit a v4 fixing this > or hope the net maintainer take care of that when they look at it -- but > I guess they would highly prefer it if you'd address this. Never mind. Then I’ll post a v4. Thanks for handling the regzbot tracking. I indeed just assumed this already to be correctly regzbot tracked. Never post in a hurry. >>> BTW, same goes for any "#regzbot" commands (like you had in >>> cbe6c3a8f8f4), as things otherwise get confused when a patch for example >>> is posted as part of a stable/longterm -rc review. >> Good to know. Maybe I could patch the handling-regressions documentation >> to include this. > > Yeah, I have already thought about it, but didn't get down to it yet. Well, I could try it eventually. > Only so much hours in a day. I know that issue. ;) >> submitting-patches could also link the subsystem >> specific documentation, e.g. the netdev FAQ, since they handle patches >> with their more bot tests. Would have helped me a bit. Might be a nice >> exercise for properly formatted patching ;) > > I agree that the docs for submitting patches could need a few > improvements and that is likely one of them. Then I’ll try fixing this, too. After all most devs have scarce time for documentation. >>> But don't worry, no big deal, I handled that :-D Many thx for actually >>> directly getting regzbot involved and taking care of this regression! >> Thank you for the final cleanup and you’re welcome. :) Where did you >> handle this? I can’t seem to find the fixup anywhere, i.e. net-next, >> net, linux-next or lkml. > > See above, I only handled the regzbot issue, not the issue with this > patch. Sorry for not being clear enough in my wording. Thanks for clearing this up. Regards, Manuel
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c b/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c index 3a529ee8c834..831833911a52 100644 --- a/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c @@ -449,7 +449,7 @@ static int aq_pm_freeze(struct device *dev) static int aq_pm_suspend_poweroff(struct device *dev) { - return aq_suspend_common(dev, false); + return aq_suspend_common(dev, true); } static int aq_pm_thaw(struct device *dev) @@ -459,7 +459,7 @@ static int aq_pm_thaw(struct device *dev) static int aq_pm_resume_restore(struct device *dev) { - return atl_resume_common(dev, false); + return atl_resume_common(dev, true); } static const struct dev_pm_ops aq_pm_ops = {