Message ID | CAJFUiJizA=ZxXh5BNj-eL6xsVrNEbTnd0Z5yePPDxAR8YjGibw@mail.gmail.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
On Wed, 4 May 2016, Lianwei Wang wrote: > In this example, the unbalanced count is caused by the > cpu_hotplug_pm_callback pm notifier callback function. I doubt that. > We can add a variable to avoid the unbalanced call of cpu_hotplug_enable > ,e.g. > diff --git a/kernel/cpu.c b/kernel/cpu.c > index 3e3f6e49eabb..aa6694f0e9d3 100644 > --- a/kernel/cpu.c > +++ b/kernel/cpu.c > @@ -1140,16 +1140,21 @@ static int > cpu_hotplug_pm_callback(struct notifier_block *nb, > unsigned long action, void *ptr) > { > + static int disabled; > + > switch (action) { > > case PM_SUSPEND_PREPARE: > case PM_HIBERNATION_PREPARE: > cpu_hotplug_disable(); > + disabled = 1; > break; > > case PM_POST_SUSPEND: > case PM_POST_HIBERNATION: > - cpu_hotplug_enable(); > + if (disabled) > + cpu_hotplug_enable(); > + disabled = 0; > break; > > default: > > Please let me know if you like to fix it in this way. So you are moving the work around one step down w/o providing any reasonable explanation how this asymetric call of that callback can happen. Can you eventually come up with a coherent explanation of the problem down to the root cause or are we going to play this "move the workaround one step down" game for another 10 rounds? > +static void _cpu_hotplug_enable(void) > +{ > + if (WARN(!cpu_hotplug_disabled, "Unbalanced cpu hotplug enable\n")) > + return; > + > + cpu_hotplug_disabled--; > +} > > I like to fix it in the cpu_hotplug_enable because it is a public You CANNOT fix it there. The problem is the call site and NOT cpu_hotplug_enable(). Can you finally accept this? > kernel API and fix in it can prevent any other unbalanced calling. I It cannot prevent any unbalanced calls. It mitigates the issue, but that's a different problem. We can discuss that seperately after fixing the offending call site. Thanks, tglx -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 5, 2016 at 5:13 AM, Thomas Gleixner <tglx@linutronix.de> wrote: > On Wed, 4 May 2016, Lianwei Wang wrote: >> In this example, the unbalanced count is caused by the >> cpu_hotplug_pm_callback pm notifier callback function. > > I doubt that. > >> We can add a variable to avoid the unbalanced call of cpu_hotplug_enable >> ,e.g. > >> diff --git a/kernel/cpu.c b/kernel/cpu.c >> index 3e3f6e49eabb..aa6694f0e9d3 100644 >> --- a/kernel/cpu.c >> +++ b/kernel/cpu.c >> @@ -1140,16 +1140,21 @@ static int >> cpu_hotplug_pm_callback(struct notifier_block *nb, >> unsigned long action, void *ptr) >> { >> + static int disabled; >> + >> switch (action) { >> >> case PM_SUSPEND_PREPARE: >> case PM_HIBERNATION_PREPARE: >> cpu_hotplug_disable(); >> + disabled = 1; >> break; >> >> case PM_POST_SUSPEND: >> case PM_POST_HIBERNATION: >> - cpu_hotplug_enable(); >> + if (disabled) >> + cpu_hotplug_enable(); >> + disabled = 0; >> break; >> >> default: >> >> Please let me know if you like to fix it in this way. > > So you are moving the work around one step down w/o providing any reasonable > explanation how this asymetric call of that callback can happen. > > Can you eventually come up with a coherent explanation of the problem down to > the root cause or are we going to play this "move the workaround one step > down" game for another 10 rounds? > Do you agree that any driver can abort the suspend process by returning an error or NOTIFY_BAD if it is not ready to suspend? I have explain it and I also copied the example code that abort suspend by returning an error or NOTIFY_BAD in the pm notifier callback function. The cpu_hotplug_disable and cpu_hotplug_enable are called in one of the PM notifier callback. And they are called from two difference place. Below is how it happened: pm_suspend |--enter_state |--suspend_prepare |--pm_notifier_call_chain(PM_SUSPEND_PREPARE) | |--call_back_1 | |--call_back_.. | |--call_back_n ===> return NOTIFY_BAD to abort call chain and | | suspend process here | |--cpu_hotplug_pm_callback() | | |--cpu_hotplug_disable =====> remember it is not called yet | |--call_back_.. | |--pm_notifier_call_chain(PM_POST_SUSPEND) | |--call_back_1 | |--call_back_.. | |--call_back_n | |--cpu_hotplug_pm_callback() | | |--cpu_hotplug_enable =====> Here it is unbalanced called | |--call_back_.. | So, keep in mind that for pm notifier call chain, the PM_SUSPEND_PREPARE notifier and PM_POST_SUSPEND notifier is not always paired called. Sometimes for a driver's pm notifier callback, the PM_POST_SUSPEND is called without PM_SUSPEND_PREPARE. >> +static void _cpu_hotplug_enable(void) >> +{ >> + if (WARN(!cpu_hotplug_disabled, "Unbalanced cpu hotplug enable\n")) >> + return; >> + >> + cpu_hotplug_disabled--; >> +} >> >> I like to fix it in the cpu_hotplug_enable because it is a public > > You CANNOT fix it there. The problem is the call site and NOT > cpu_hotplug_enable(). Can you finally accept this? I know what you mean. But why let the driver unconditionally do "--cpu_hotplug_disabled" without any checking. It should do nothing if it detect a unbalanced enable. A good example for unbalanced checking from enable_irq is here: http://lxr.free-electrons.com/source/kernel/irq/manage.c#L512 It's the cpu hotplug driver's responsibility to guarantee that the cpu hotplug always working well even others failed do something to it. And the driver can check and handle it itself, why not let the driver to handle it and make the cpu hotplug driver more strong? > >> kernel API and fix in it can prevent any other unbalanced calling. > > It cannot prevent any unbalanced calls. It mitigates the issue, but that's a > different problem. It did not migrate the issue. It give a warning message to log the unbalanced issue and it also make sure the cpu hotplug continue to work well even someone do an unbalanced call. It is a good checking as the enable_irq/disable_irq do. There are some other unbalanced checking in kernel too. All make sure the kernel has a better stability. > > We can discuss that seperately after fixing the offending call site. > > Thanks, > > tglx -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 6 May 2016, Lianwei Wang wrote: > On Thu, May 5, 2016 at 5:13 AM, Thomas Gleixner <tglx@linutronix.de> wrote: > > Can you eventually come up with a coherent explanation of the problem down to > > the root cause or are we going to play this "move the workaround one step > > down" game for another 10 rounds? > > > Do you agree that any driver can abort the suspend process by > returning an error or NOTIFY_BAD if it is not ready to suspend? > I have explain it and I also copied the example code that abort > suspend by returning an error or NOTIFY_BAD in the pm notifier > callback function. I don't need copied example code which does not tell me what the real problem is. > The cpu_hotplug_disable and cpu_hotplug_enable are called in one of > the PM notifier callback. And they are called from two difference > place. > Below is how it happened: > pm_suspend > |--enter_state > |--suspend_prepare > |--pm_notifier_call_chain(PM_SUSPEND_PREPARE) > | |--call_back_1 > | |--call_back_.. > | |--call_back_n ===> return NOTIFY_BAD to abort call chain and > | | suspend process here > | |--cpu_hotplug_pm_callback() > | | |--cpu_hotplug_disable =====> remember it is not > called yet > | |--call_back_.. > | > |--pm_notifier_call_chain(PM_POST_SUSPEND) > | |--call_back_1 > | |--call_back_.. > | |--call_back_n > | |--cpu_hotplug_pm_callback() > | | |--cpu_hotplug_enable =====> Here it is unbalanced called > | |--call_back_.. > | > So, keep in mind that for pm notifier call chain, the > PM_SUSPEND_PREPARE notifier and PM_POST_SUSPEND notifier is not always > paired called. Sometimes for a driver's pm notifier callback, the > PM_POST_SUSPEND is called without PM_SUSPEND_PREPARE. So that is the real problem: cpu_hotplug_pm_callback(PM_POST_SUSPEND) can be called w/o a previous call to cpu_hotplug_pm_callback(PM_SUSPEND_PREPARE). > > It cannot prevent any unbalanced calls. It mitigates the issue, but that's a > > different problem. > It did not migrate the issue. It give a warning message to log the > unbalanced issue and it also make sure the cpu hotplug continue to > work well even someone do an unbalanced call. It is a good checking as > the enable_irq/disable_irq do. There are some other unbalanced > checking in kernel too. All make sure the kernel has a better > stability. I'm not opposed to do that and I said so several times. But I said as well, that we do not add this without fixing the problem which made you write that patch in the first place. So we have a proper explanation for the real problem now, but we have no fix. And again: Your patch is NOT a fix. Simply because it will emit a warning everytime the above happens. And that's wrong because the abort is a legitimate scenario. So please come up with a sensible fix for the suspend abort issue and then we can add the balance check/fixup to the hotplug_disable/enable() code. Thanks, tglx -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
I have come up a patch to make the pm notifier called symmetrically and currently being tested. I will send it out after pass the test. On Fri, May 6, 2016 at 12:18 AM, Thomas Gleixner <tglx@linutronix.de> wrote: > On Fri, 6 May 2016, Lianwei Wang wrote: >> On Thu, May 5, 2016 at 5:13 AM, Thomas Gleixner <tglx@linutronix.de> wrote: >> > Can you eventually come up with a coherent explanation of the problem down to >> > the root cause or are we going to play this "move the workaround one step >> > down" game for another 10 rounds? >> > >> Do you agree that any driver can abort the suspend process by >> returning an error or NOTIFY_BAD if it is not ready to suspend? >> I have explain it and I also copied the example code that abort >> suspend by returning an error or NOTIFY_BAD in the pm notifier >> callback function. > > I don't need copied example code which does not tell me what the real problem > is. > >> The cpu_hotplug_disable and cpu_hotplug_enable are called in one of >> the PM notifier callback. And they are called from two difference >> place. >> Below is how it happened: >> pm_suspend >> |--enter_state >> |--suspend_prepare >> |--pm_notifier_call_chain(PM_SUSPEND_PREPARE) >> | |--call_back_1 >> | |--call_back_.. >> | |--call_back_n ===> return NOTIFY_BAD to abort call chain and >> | | suspend process here >> | |--cpu_hotplug_pm_callback() >> | | |--cpu_hotplug_disable =====> remember it is not >> called yet >> | |--call_back_.. >> | >> |--pm_notifier_call_chain(PM_POST_SUSPEND) >> | |--call_back_1 >> | |--call_back_.. >> | |--call_back_n >> | |--cpu_hotplug_pm_callback() >> | | |--cpu_hotplug_enable =====> Here it is unbalanced called >> | |--call_back_.. >> | >> So, keep in mind that for pm notifier call chain, the >> PM_SUSPEND_PREPARE notifier and PM_POST_SUSPEND notifier is not always >> paired called. Sometimes for a driver's pm notifier callback, the >> PM_POST_SUSPEND is called without PM_SUSPEND_PREPARE. > > So that is the real problem: cpu_hotplug_pm_callback(PM_POST_SUSPEND) can be > called w/o a previous call to cpu_hotplug_pm_callback(PM_SUSPEND_PREPARE). > >> > It cannot prevent any unbalanced calls. It mitigates the issue, but that's a >> > different problem. >> It did not migrate the issue. It give a warning message to log the >> unbalanced issue and it also make sure the cpu hotplug continue to >> work well even someone do an unbalanced call. It is a good checking as >> the enable_irq/disable_irq do. There are some other unbalanced >> checking in kernel too. All make sure the kernel has a better >> stability. > > I'm not opposed to do that and I said so several times. But I said as well, > that we do not add this without fixing the problem which made you write that > patch in the first place. > > So we have a proper explanation for the real problem now, but we have no > fix. > > And again: Your patch is NOT a fix. Simply because it will emit a warning > everytime the above happens. And that's wrong because the abort is a > legitimate scenario. > > So please come up with a sensible fix for the suspend abort issue and then we > can add the balance check/fixup to the hotplug_disable/enable() code. > > Thanks, > > tglx > > -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 12, 2016 at 1:06 AM, Lianwei Wang <lianwei.wang@gmail.com> wrote: > I have come up a patch to make the pm notifier called symmetrically > and currently being tested. I will send it out after pass the test. > > On Fri, May 6, 2016 at 12:18 AM, Thomas Gleixner <tglx@linutronix.de> wrote: >> On Fri, 6 May 2016, Lianwei Wang wrote: >>> On Thu, May 5, 2016 at 5:13 AM, Thomas Gleixner <tglx@linutronix.de> wrote: >>> > Can you eventually come up with a coherent explanation of the problem down to >>> > the root cause or are we going to play this "move the workaround one step >>> > down" game for another 10 rounds? >>> > >>> Do you agree that any driver can abort the suspend process by >>> returning an error or NOTIFY_BAD if it is not ready to suspend? >>> I have explain it and I also copied the example code that abort >>> suspend by returning an error or NOTIFY_BAD in the pm notifier >>> callback function. >> >> I don't need copied example code which does not tell me what the real problem >> is. >> >>> The cpu_hotplug_disable and cpu_hotplug_enable are called in one of >>> the PM notifier callback. And they are called from two difference >>> place. >>> Below is how it happened: >>> pm_suspend >>> |--enter_state >>> |--suspend_prepare >>> |--pm_notifier_call_chain(PM_SUSPEND_PREPARE) >>> | |--call_back_1 >>> | |--call_back_.. >>> | |--call_back_n ===> return NOTIFY_BAD to abort call chain and >>> | | suspend process here >>> | |--cpu_hotplug_pm_callback() >>> | | |--cpu_hotplug_disable =====> remember it is not >>> called yet >>> | |--call_back_.. >>> | >>> |--pm_notifier_call_chain(PM_POST_SUSPEND) >>> | |--call_back_1 >>> | |--call_back_.. >>> | |--call_back_n >>> | |--cpu_hotplug_pm_callback() >>> | | |--cpu_hotplug_enable =====> Here it is unbalanced called >>> | |--call_back_.. >>> | >>> So, keep in mind that for pm notifier call chain, the >>> PM_SUSPEND_PREPARE notifier and PM_POST_SUSPEND notifier is not always >>> paired called. Sometimes for a driver's pm notifier callback, the >>> PM_POST_SUSPEND is called without PM_SUSPEND_PREPARE. >> >> So that is the real problem: cpu_hotplug_pm_callback(PM_POST_SUSPEND) can be >> called w/o a previous call to cpu_hotplug_pm_callback(PM_SUSPEND_PREPARE). >> >>> > It cannot prevent any unbalanced calls. It mitigates the issue, but that's a >>> > different problem. >>> It did not migrate the issue. It give a warning message to log the >>> unbalanced issue and it also make sure the cpu hotplug continue to >>> work well even someone do an unbalanced call. It is a good checking as >>> the enable_irq/disable_irq do. There are some other unbalanced >>> checking in kernel too. All make sure the kernel has a better >>> stability. >> >> I'm not opposed to do that and I said so several times. But I said as well, >> that we do not add this without fixing the problem which made you write that >> patch in the first place. >> >> So we have a proper explanation for the real problem now, but we have no >> fix. >> >> And again: Your patch is NOT a fix. Simply because it will emit a warning >> everytime the above happens. And that's wrong because the abort is a >> legitimate scenario. >> >> So please come up with a sensible fix for the suspend abort issue and then we >> can add the balance check/fixup to the hotplug_disable/enable() code. >> >> Thanks, >> >> tglx >> >> I think this is the best solution to fix this unbalanced issue for now. -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/kernel/cpu.c b/kernel/cpu.c index 3e3f6e49eabb..aa6694f0e9d3 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -1140,16 +1140,21 @@ static int cpu_hotplug_pm_callback(struct notifier_block *nb, unsigned long action, void *ptr) { + static int disabled; + switch (action) { case PM_SUSPEND_PREPARE: case PM_HIBERNATION_PREPARE: cpu_hotplug_disable(); + disabled = 1; break; case PM_POST_SUSPEND: case PM_POST_HIBERNATION: - cpu_hotplug_enable(); + if (disabled) + cpu_hotplug_enable(); + disabled = 0; break; default: