Message ID | Y7RezbPSGrO37NZZ@shell.armlinux.org.uk (mailing list archive) |
---|---|
Headers | show |
Series | Fix a whole host of nvmem registration/cleanup issues | expand |
On 04/01/2023 01.58, Russell King (Oracle) wrote: > Hi, > > This series fixes a whole host of nvmem registration/error cleanup > issues that have been identified by both Hector and myself. It is a > substantial rework of my original patch fixing the first problem. > > The first most obvious problem is the race between nvmem registration > and use, which leads to sporadic failures of drivers to probe at boot > time. > > While fixing this, it has been noticed that a recent fix to check the > return value of dev_set_name() introduced a new bug where wp_gpio was > not being put in that newly introduced error path. > > Then there's a fix for a previous fix which itself purports to fix > another bug, but results in the allocated ID being leaked. Fix for a > fix for a fix is not good! > > Then there's an error in the docbook documentation for wp_gpio (it's > listed as wp-gpio instead) but as nothing seems to set wp_gpio, we > might as well get rid of it - which also solves the issue that we > call gpiod_put() on this whether we own it or not. > > Lastly, there's a fix for yet another spurious white-space in this > code, one of what seems to be a long history of past white-space > fixes. > > These patches have been individually build-tested in the order of > posting, but not run-time tested except for the entire series. > > drivers/nvmem/core.c | 51 ++++++++++++++++++------------------------ > include/linux/nvmem-provider.h | 2 -- > 2 files changed, 22 insertions(+), 31 deletions(-) > Uhh. The series itself looks fine as far as fixing the problems, but I fail to see how this is any better than my attempt as far as backporting or commit atomicity goes. Patch #4 fixes the newer gpio leak bug *and* half fixes the race condition bug, then patch #5 completes the race condition fix but now depends on #4, meaning you're left with exactly the same backporting mess since now you can't apply #5 to older kernels and #4 only to newer ones. Splitting the commits like this buys you nothing. I thought we were doing minimal backportable fixes to solve this, but your commit message for #4 literally says "While a minimal fix for this would be to add the gpiod_put() call, we can do better if we split device_register() [...]"... and then that whole "let's do better" part is what breaks the backportability again. And then of course if you *do* manage to queue at least #4 to be backported to a newer subset of stable trees, #3 certainly isn't going to get backported itself (since it's just removing dead code, not eligible for stable since it fixes no actual bugs), but then you're left with the same broken-on-paper-except-nobody-uses-it-anyway-so-it-doesn't-matter situation my v2 left us in for those stable kernels. That said, thanks for identifying that nobody uses the functionality I supposedly regressed (in a tiny corner case code path where it was already broken anyway) in my v2, and therefore I didn't actually regress anything in practice and strictly fixed real bugs. - Hector
Really not interested in your politics. Not interested in fixing this problem. I'll use these patches to fix the problem in my tree. I don't care about mainline. On Wed, Jan 04, 2023 at 03:12:44AM +0900, Hector Martin wrote: > On 04/01/2023 01.58, Russell King (Oracle) wrote: > > Hi, > > > > This series fixes a whole host of nvmem registration/error cleanup > > issues that have been identified by both Hector and myself. It is a > > substantial rework of my original patch fixing the first problem. > > > > The first most obvious problem is the race between nvmem registration > > and use, which leads to sporadic failures of drivers to probe at boot > > time. > > > > While fixing this, it has been noticed that a recent fix to check the > > return value of dev_set_name() introduced a new bug where wp_gpio was > > not being put in that newly introduced error path. > > > > Then there's a fix for a previous fix which itself purports to fix > > another bug, but results in the allocated ID being leaked. Fix for a > > fix for a fix is not good! > > > > Then there's an error in the docbook documentation for wp_gpio (it's > > listed as wp-gpio instead) but as nothing seems to set wp_gpio, we > > might as well get rid of it - which also solves the issue that we > > call gpiod_put() on this whether we own it or not. > > > > Lastly, there's a fix for yet another spurious white-space in this > > code, one of what seems to be a long history of past white-space > > fixes. > > > > These patches have been individually build-tested in the order of > > posting, but not run-time tested except for the entire series. > > > > drivers/nvmem/core.c | 51 ++++++++++++++++++------------------------ > > include/linux/nvmem-provider.h | 2 -- > > 2 files changed, 22 insertions(+), 31 deletions(-) > > > > Uhh. The series itself looks fine as far as fixing the problems, but I > fail to see how this is any better than my attempt as far as backporting > or commit atomicity goes. Patch #4 fixes the newer gpio leak bug *and* > half fixes the race condition bug, then patch #5 completes the race > condition fix but now depends on #4, meaning you're left with exactly > the same backporting mess since now you can't apply #5 to older kernels > and #4 only to newer ones. Splitting the commits like this buys you nothing. > > I thought we were doing minimal backportable fixes to solve this, but > your commit message for #4 literally says "While a minimal fix for this > would be to add the gpiod_put() call, we can do better if we split > device_register() [...]"... and then that whole "let's do better" part > is what breaks the backportability again. > > And then of course if you *do* manage to queue at least #4 to be > backported to a newer subset of stable trees, #3 certainly isn't going > to get backported itself (since it's just removing dead code, not > eligible for stable since it fixes no actual bugs), but then you're left > with the same > broken-on-paper-except-nobody-uses-it-anyway-so-it-doesn't-matter > situation my v2 left us in for those stable kernels. > > That said, thanks for identifying that nobody uses the functionality I > supposedly regressed (in a tiny corner case code path where it was > already broken anyway) in my v2, and therefore I didn't actually regress > anything in practice and strictly fixed real bugs. > > - Hector >
Hi Russell, On 03/01/2023 16:58, Russell King (Oracle) wrote: > Hi, > > This series fixes a whole host of nvmem registration/error cleanup > issues that have been identified by both Hector and myself. It is a > substantial rework of my original patch fixing the first problem. > > The first most obvious problem is the race between nvmem registration > and use, which leads to sporadic failures of drivers to probe at boot > time. > > While fixing this, it has been noticed that a recent fix to check the > return value of dev_set_name() introduced a new bug where wp_gpio was > not being put in that newly introduced error path. > > Then there's a fix for a previous fix which itself purports to fix > another bug, but results in the allocated ID being leaked. Fix for a > fix for a fix is not good! > > Then there's an error in the docbook documentation for wp_gpio (it's > listed as wp-gpio instead) but as nothing seems to set wp_gpio, we > might as well get rid of it - which also solves the issue that we > call gpiod_put() on this whether we own it or not. > > Lastly, there's a fix for yet another spurious white-space in this > code, one of what seems to be a long history of past white-space > fixes. > > These patches have been individually build-tested in the order of > posting, but not run-time tested except for the entire series. thanks for fixing these issues, I have applied these after fixing the subject on all the patches, as it ended up with email ids in subject. thanks, Srini > > drivers/nvmem/core.c | 51 ++++++++++++++++++------------------------ > include/linux/nvmem-provider.h | 2 -- > 2 files changed, 22 insertions(+), 31 deletions(-) >
On 04/01/2023 06.15, Srinivas Kandagatla wrote: > Hi Russell, > > > On 03/01/2023 16:58, Russell King (Oracle) wrote: >> Hi, >> >> This series fixes a whole host of nvmem registration/error cleanup >> issues that have been identified by both Hector and myself. It is a >> substantial rework of my original patch fixing the first problem. >> >> The first most obvious problem is the race between nvmem registration >> and use, which leads to sporadic failures of drivers to probe at boot >> time. >> >> While fixing this, it has been noticed that a recent fix to check the >> return value of dev_set_name() introduced a new bug where wp_gpio was >> not being put in that newly introduced error path. >> >> Then there's a fix for a previous fix which itself purports to fix >> another bug, but results in the allocated ID being leaked. Fix for a >> fix for a fix is not good! >> >> Then there's an error in the docbook documentation for wp_gpio (it's >> listed as wp-gpio instead) but as nothing seems to set wp_gpio, we >> might as well get rid of it - which also solves the issue that we >> call gpiod_put() on this whether we own it or not. >> >> Lastly, there's a fix for yet another spurious white-space in this >> code, one of what seems to be a long history of past white-space >> fixes. >> >> These patches have been individually build-tested in the order of >> posting, but not run-time tested except for the entire series. > > > thanks for fixing these issues, I have applied these after fixing the > subject on all the patches, as it ended up with email ids in subject. Right. I see none of the issues you two lectured me about actually mattered, it was all for show, and this isn't getting backported anyway. Good job you two. The day I finally rage quit the kernel after enough of this nonsense and make a big dossier of just how fucked up the kernel maintainer community's attitude is, I will be sure to use this as an example. That day is not today though. But I certainly won't be upstreaming any more patches to nvmem. - Hector
On Wed, Jan 04, 2023 at 10:15:14AM +0900, Hector Martin wrote: > On 04/01/2023 06.15, Srinivas Kandagatla wrote: > > Hi Russell, > > > > > > On 03/01/2023 16:58, Russell King (Oracle) wrote: > >> Hi, > >> > >> This series fixes a whole host of nvmem registration/error cleanup > >> issues that have been identified by both Hector and myself. It is a > >> substantial rework of my original patch fixing the first problem. > >> > >> The first most obvious problem is the race between nvmem registration > >> and use, which leads to sporadic failures of drivers to probe at boot > >> time. > >> > >> While fixing this, it has been noticed that a recent fix to check the > >> return value of dev_set_name() introduced a new bug where wp_gpio was > >> not being put in that newly introduced error path. > >> > >> Then there's a fix for a previous fix which itself purports to fix > >> another bug, but results in the allocated ID being leaked. Fix for a > >> fix for a fix is not good! > >> > >> Then there's an error in the docbook documentation for wp_gpio (it's > >> listed as wp-gpio instead) but as nothing seems to set wp_gpio, we > >> might as well get rid of it - which also solves the issue that we > >> call gpiod_put() on this whether we own it or not. > >> > >> Lastly, there's a fix for yet another spurious white-space in this > >> code, one of what seems to be a long history of past white-space > >> fixes. > >> > >> These patches have been individually build-tested in the order of > >> posting, but not run-time tested except for the entire series. > > > > > > thanks for fixing these issues, I have applied these after fixing the > > subject on all the patches, as it ended up with email ids in subject. > > Right. I see none of the issues you two lectured me about actually > mattered, it was all for show, and this isn't getting backported anyway. > > Good job you two. The day I finally rage quit the kernel after enough of > this nonsense and make a big dossier of just how fucked up the kernel > maintainer community's attitude is, I will be sure to use this as an > example. > > That day is not today though. But I certainly won't be upstreaming any > more patches to nvmem. You've really little clue, have you. I really don't see that you'll *ever* get apple hardware properly functional in mainline. Good luck maintaining a fork of the kernel for ever into the future. I've had enough of you.
On Tue, Jan 03, 2023 at 08:56:32PM +0000, Russell King (Oracle) wrote: > Really not interested in your politics. Not interested in fixing this > problem. > > I'll use these patches to fix the problem in my tree. I don't care about > mainline. Having thought this over, this was an unfair over-reaction for which I'd like to apologise. It was proving to be a very stressful couple of days. > > Uhh. The series itself looks fine as far as fixing the problems, but I > > fail to see how this is any better than my attempt as far as backporting > > or commit atomicity goes. Patch #4 fixes the newer gpio leak bug *and* > > half fixes the race condition bug, There are two choices for patch for: 1. add gpiod_put(). 2. make use of the gpiod_put() already present in the release function. Either way, patch 5 depends on patch 4, and there's no way around that. This does not mean that they can't be back ported though - patch 4 can be backported to the appropriate point, and further backported merely by dropping one hunk - and thus it becomes a preparatory patch for patch 5, and even more so, separates the conversion from device_register() to device_initialize()+device_add() from the actual fix itself (which in patch 5 merely moves device_add() and rearranges the cleanup.) However, where our patches differ is that my series fixes one problem in one patch, rather than trying to address multiple problems in one patch. As has been pointed out, this is a documented requirement in the submitting-patches.rst document, which has been there for a very long time. You had been pointed to this document already over this point. Therefore, I believe my series to be a technically better approach which addresses several more issues while conforming to the "Solve only one problem per patch." requirement which can be trivially backported - and I truely believe that even patch 4 complies with the requirement in submitting-patches.rst. I certainly do not believe patch 4 is a kind of "partial" fix for the race condition, since it in no way changes the presence of the race. I hope that we can continue to work to get the Apple M1 supported more fully in the future.
On 11/01/2023 01.25, Russell King (Oracle) wrote: > On Tue, Jan 03, 2023 at 08:56:32PM +0000, Russell King (Oracle) wrote: >> Really not interested in your politics. Not interested in fixing this >> problem. >> >> I'll use these patches to fix the problem in my tree. I don't care about >> mainline. > > Having thought this over, this was an unfair over-reaction for which > I'd like to apologise. It was proving to be a very stressful couple > of days. Thank you. > >>> Uhh. The series itself looks fine as far as fixing the problems, but I >>> fail to see how this is any better than my attempt as far as backporting >>> or commit atomicity goes. Patch #4 fixes the newer gpio leak bug *and* >>> half fixes the race condition bug, > > There are two choices for patch for: > > 1. add gpiod_put(). > 2. make use of the gpiod_put() already present in the release function. > > Either way, patch 5 depends on patch 4, and there's no way around that. How so? My idea was to just add gpiod_put() as one commit, do the race condition fix as another, and optionally clean up as a third commit after that to unify the error paths. I just tried it and it effectively avoids the inter-patch dependency. Done like this, both patches apply cleanly to the 6.1 stable tree, while the race fix applies cleanly to at least the 6.0 and 5.15 stable trees (haven't tried going further back). No manual backports needed. It just works with a Cc: stable@. > However, where our patches differ is that my series fixes one problem > in one patch, rather than trying to address multiple problems in one > patch. As has been pointed out, this is a documented requirement in > the submitting-patches.rst document, which has been there for a very > long time. You had been pointed to this document already over this > point. I'd like to direct your attention to submitting-patches.rst: > For example, if your changes include both bug fixes and performance > enhancements for a single driver, separate those changes into two > or more patches. Note how it says *bug fixes* (plural), and asks you to separate the changes into *two or more* patches (where one is the performance enhancements, and the other the multiple bug fixes, in the two case). The author of the document clearly understood that sometimes it makes perfect sense to group related changes together, when they are of the same type and touching closely related bits of code, and focused on the more egregious case of making wildly unrelated changes (performance and bugfixes, presumably in unrelated parts of the code) in one commit. *That* is what we are trying to avoid, because when done indiscriminately, it makes bisecting and backporting and review much harder. Yet, there is this persistent myth among some kernel maintainers that It Is The Rule Of The Kernel that You Do One, and Exactly One, Bug Fix Per Commit. And this kind of applying (not even actual, in this case) rules with an iron fist without any regard to nuance is one of the things that contributes to the unwelcoming nature of this community. The reality is that us engineers often notice little things to fix while working on code doing something else, and in those cases, it often makes little to no sense to split off the fix into another commit, especially if it's minor enough that it doesn't need backporting or the fix ends up getting subsumed into other changes being made. There is room for discretion in deciding how to split up commits, and it's not something you can make an objective, strict rule for that actually makes sense. When it comes to programming in a large collaborative project like this, strict rules can only be made and enforced for things that can be fully checked and made consistent by a machine... and the Linux kernel development process is too antiquated to have many of those (e.g. many projects are doing fully automated code formatting these days, while here all we have is checkpatch.pl and it's often wrong and unsuitable as an automated gating check). Everything else is at best a strong suggestion, but always subject to human interpretation and nuance, because we aren't robots. I do these kinds of drive-by fixes all the time, because sometimes it just *makes sense*. See for example 38892ea4cef, which fixed a typo while renaming a DT property. Or d182dc6de932, which refactored some OF-related code and in the process also fixed a reference leak in the original code. Those are things not worth backporting, and in those cases there is absolutely no good reason to split out the "fix" part from the "do" part into a tiny one-liner commit. Ultimately, drive-by fixes like this are a gift from the contributor to the maintainer, as they are almost always not something that is actually causing pain for anyone (at least not significant pain), and therefore there is very little reason to spend time on them. Thus, if maintainers attempt to enforce a strict one-patch-per-bugfix policy, it will simply result in (and I'm sure it already is) contributors just not doing those fixes at all, or choosing not to describe them in the commit message when they are "accidentally" fixed as part of a larger change. I know I would. When the patch submission process is painful, you naturally gravitate towards doing the least offensive thing that gets you where you want to be, even if that isn't necessarily the best place for the kernel to be in the end. That all said, you *did* have a very valid point with the merged commit being harder to backport in this case (where a backport is desirable), and checking the blame to see if it would apply cleanly further back was a good call - but then please appeal to that, and not to submitting-patches.rst. And then if we'd agreed on that being the desirable goal, I could've helped you structure the patches in a way that cleanly backports to stable, as I already locally confirmed is possible to do. > Therefore, I believe my series to be a technically better approach > which addresses several more issues while conforming to the "Solve > only one problem per patch." requirement which can be trivially > backported - and I truely believe that even patch 4 complies with > the requirement in submitting-patches.rst. I certainly do not > believe patch 4 is a kind of "partial" fix for the race condition, > since it in no way changes the presence of the race. It certainly fixed more things, but as I said, if this were about backporting, we could've done better ;) Patch 4 may not change the presence of the race, but both you and I know it is preparatory work for that (as if the goal were just to fix the gpiod leak, that would've just been adding the one-line gpiod_put()), so it is somewhat disingenuous to claim that it is a stand-alone fix. I posit that if nothing else it is harder to review as a fix for what it purports to fix, and not the best approach for a backport (which as I said doesn't work out of the box, but even if it did). Sure, it's splitting off the fixes in the sense that only one thing is actually fixed after each commit is applied, but it's not splitting them off at the sensible point for that. > I hope that we can continue to work to get the Apple M1 supported more > fully in the future. Thank you. (I still need to re-send those SMC patches... maybe next week I'll finally find some time) - Hector