Message ID | 1493171393-1825-1-git-send-email-lv.zheng@intel.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
On Tue, Apr 25, 2017 at 6:49 PM, Lv Zheng <lv.zheng@intel.com> wrote:
> In the Linux kernel side, acpi_get_table() hasn't been fully balanced by
> acpi_put_table() invocations. So it is not a good timing to report errors.
> The strict balanced validation count check should only be enabled after
> confirming that all kernel side invocations are safe.

We've been living with this bug for 7 years, let's just go fix all
acpi_get_table() invocations to make sure they have a corresponding
acpi_put_table().

>
> Thus this patch removes the fatal error but leaves the error report to
> indicate the leak so that developers can notice the required engineering
> change. Reported by Dan Williams, fixed by Lv Zheng.
>
> Reported-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> ---
> drivers/acpi/acpica/tbutils.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
> index 5a968a7..9e7d95cf 100644
> --- a/drivers/acpi/acpica/tbutils.c
> +++ b/drivers/acpi/acpica/tbutils.c
> @@ -422,7 +422,6 @@ acpi_tb_get_table(struct acpi_table_desc *table_desc,
> "Table %p, Validation count is zero after increment\n",
> table_desc));
> table_desc->validation_count--;
> - return_ACPI_STATUS(AE_LIMIT);

If you want to leave the error report turn it into a WARN_ON_ONCE() so
it doesn't keep triggering, but I'd rather we just focus on the
missing acpi_put_table() calls.
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi,

> From: Dan Williams [mailto:dan.j.williams@intel.com]
> Subject: Re: [RFC PATCH] ACPICA: Tables: Fix regression introduced by a too early mechanism enabling
>
> On Tue, Apr 25, 2017 at 6:49 PM, Lv Zheng <lv.zheng@intel.com> wrote:
> > In the Linux kernel side, acpi_get_table() hasn't been fully balanced by
> > acpi_put_table() invocations. So it is not a good timing to report errors.
> > The strict balanced validation count check should only be enabled after
> > confirming that all kernel side invocations are safe.
>
> We've been living with this bug for 7 years, let's just go fix all
> acpi_get_table() invocations to make sure they have a corresponding
> acpi_put_table().

We knew that, you should have already seen a series internally or
externally from me achieving this.
It's done several years ago. But it takes long time to make the
ACPICA part upstreamed.

Now my plan is:
1. introduce the APIs but allow old usage models in order not to
   change old ACPICA behavior and its users.
2. fix all users
3. disallow old usage models.
It's just my mistake to leak the final stage approach to the ACPICA
upstream from my local repo.
Now we can try to jump to the final step, but as far as I know,
not only Linux, ACPICA itself also contains several broken cases.

Bottom line of Linux kernel is we shouldn't break any running system.
So IMO, we will need this commit during this special period.

I didn't say the final step is wrong or is not required.
We can do both in parallel.

So could you please help to confirm if it's working.
And I would like to suggest linux to take this first step fix along
with other final step fixes during this period.

Thanks and best regards
Lv

> >
> > Thus this patch removes the fatal error but leaves the error report to
> > indicate the leak so that developers can notice the required engineering
> > change. Reported by Dan Williams, fixed by Lv Zheng.
> >
> > Reported-by: Dan Williams <dan.j.williams@intel.com>
> > Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> > ---
> > drivers/acpi/acpica/tbutils.c | 1 -
> > 1 file changed, 1 deletion(-)
> >
> > diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
> > index 5a968a7..9e7d95cf 100644
> > --- a/drivers/acpi/acpica/tbutils.c
> > +++ b/drivers/acpi/acpica/tbutils.c
> > @@ -422,7 +422,6 @@ acpi_tb_get_table(struct acpi_table_desc *table_desc,
> > "Table %p, Validation count is zero after increment\n",
> > table_desc));
> > table_desc->validation_count--;
> > - return_ACPI_STATUS(AE_LIMIT);
>
> If you want to leave the error report turn it into a WARN_ON_ONCE() so
> it doesn't keep triggering, but I'd rather we just focus on the
> missing acpi_put_table() calls.
On Tue, Apr 25, 2017 at 10:15 PM, Zheng, Lv <lv.zheng@intel.com> wrote:
> Hi,
>
>> From: Dan Williams [mailto:dan.j.williams@intel.com]
>> Subject: Re: [RFC PATCH] ACPICA: Tables: Fix regression introduced by a too early mechanism enabling
>>
>> On Tue, Apr 25, 2017 at 6:49 PM, Lv Zheng <lv.zheng@intel.com> wrote:
>> > In the Linux kernel side, acpi_get_table() hasn't been fully balanced by
>> > acpi_put_table() invocations. So it is not a good timing to report errors.
>> > The strict balanced validation count check should only be enabled after
>> > confirming that all kernel side invocations are safe.
>>
>> We've been living with this bug for 7 years, let's just go fix all
>> acpi_get_table() invocations to make sure they have a corresponding
>> acpi_put_table().
>
> We knew that, you should have already seen a series internally or
> externally from me achieving this.
> It's done several years ago. But it takes long time to make the
> ACPICA part upstreamed.
>
> Now my plan is:
> 1. introduce the APIs but allow old usage models in order not to
>    change old ACPICA behavior and its users.
> 2. fix all users
> 3. disallow old usage models.
> It's just my mistake to leak the final stage approach to the ACPICA
> upstream from my local repo.
> Now we can try to jump to the final step, but as far as I know,
> not only Linux, ACPICA itself also contains several broken cases.
>
> Bottom line of Linux kernel is we shouldn't break any running system.
> So IMO, we will need this commit during this special period.
>
> I didn't say the final step is wrong or is not required.
> We can do both in parallel.
>
> So could you please help to confirm if it's working.
> And I would like to suggest linux to take this first step fix along
> with other final step fixes during this period.

I just think "this period" is very short and we can skip the band-aid
and go straight to auditing the 48 call sites of acpi_get_table.
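For readers following the audit argument, the balanced shape being asked of each call site can be sketched in user-space C roughly as follows. This is an illustrative model only, not kernel code: `struct model_table`, `model_get()`, `model_put()` and `model_parse()` are hypothetical names standing in for a table header, acpi_get_table(), acpi_put_table() and an arbitrary caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_table {
    int refs;    /* outstanding references handed out by model_get() */
    int length;  /* stands in for a field read from the table header */
};

/* Takes a reference, loosely mirroring what a "get" does. */
static struct model_table *model_get(struct model_table *t)
{
    t->refs++;
    return t;
}

/* Drops a reference; a put without a matching get is treated as a bug. */
static void model_put(struct model_table *t)
{
    assert(t->refs > 0);
    t->refs--;
}

/*
 * A call site in the audited shape: one get, and every exit path after it
 * passes through the single put at the "out" label.
 */
int model_parse(struct model_table *table, int min_length)
{
    struct model_table *t = model_get(table);
    int ret = 0;

    if (t->length < min_length) {
        ret = -1;  /* the error path still releases the reference */
        goto out;
    }

    /* ... consume the table contents here ... */

out:
    model_put(t);
    return ret;
}
```

The single exit label is one common way to keep the two calls paired; a call site that must keep the table mapped for the lifetime of a driver would instead defer the put to its teardown path.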
On Wed, Apr 26, 2017 at 7:13 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Tue, Apr 25, 2017 at 10:15 PM, Zheng, Lv <lv.zheng@intel.com> wrote:
>> Hi,
>>
>>> From: Dan Williams [mailto:dan.j.williams@intel.com]
>>> Subject: Re: [RFC PATCH] ACPICA: Tables: Fix regression introduced by a too early mechanism enabling
>>>
>>> On Tue, Apr 25, 2017 at 6:49 PM, Lv Zheng <lv.zheng@intel.com> wrote:
>>> > In the Linux kernel side, acpi_get_table() hasn't been fully balanced by
>>> > acpi_put_table() invocations. So it is not a good timing to report errors.
>>> > The strict balanced validation count check should only be enabled after
>>> > confirming that all kernel side invocations are safe.
>>>
>>> We've been living with this bug for 7 years, let's just go fix all
>>> acpi_get_table() invocations to make sure they have a corresponding
>>> acpi_put_table().
>>
>> We knew that, you should have already seen a series internally or
>> externally from me achieving this.
>> It's done several years ago. But it takes long time to make the
>> ACPICA part upstreamed.
>>
>> Now my plan is:
>> 1. introduce the APIs but allow old usage models in order not to
>>    change old ACPICA behavior and its users.
>> 2. fix all users
>> 3. disallow old usage models.
>> It's just my mistake to leak the final stage approach to the ACPICA
>> upstream from my local repo.
>> Now we can try to jump to the final step, but as far as I know,
>> not only Linux, ACPICA itself also contains several broken cases.
>>
>> Bottom line of Linux kernel is we shouldn't break any running system.
>> So IMO, we will need this commit during this special period.
>>
>> I didn't say the final step is wrong or is not required.
>> We can do both in parallel.
>>
>> So could you please help to confirm if it's working.
>> And I would like to suggest linux to take this first step fix along
>> with other final step fixes during this period.
>
> I just think "this period" is very short and we can skip the band-aid
> and go straight to auditing the 48 call sites of acpi_get_table.

Moreover, I don't think this workaround is a workable approach because
it leaves the ACPI_ERROR() in place to continue to spam the logs.
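To illustrate the difference between a report that fires on every hit and one that fires once, here is a minimal user-space sketch. It is only an analogue under stated assumptions: `model_warn_on_once()`, `model_detect_wraps()` and `model_report_count()` are hypothetical helpers, and unlike the kernel's WARN_ON_ONCE() this version prints a single line rather than a stack trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Counts emitted reports so the behaviour can be checked. */
static int model_reports;

/*
 * Hypothetical analogue of a warn-once check: returns the truth value of
 * cond on every call, but prints only the first time cond is true for the
 * given flag.
 */
static bool model_warn_on_once(bool cond, bool *warned, const char *what)
{
    assert(warned != NULL && what != NULL);
    if (cond && !*warned) {
        *warned = true;
        model_reports++;
        fprintf(stderr, "WARNING: %s\n", what);
    }
    return cond;
}

/* Simulates repeated wrap detections at one call site; returns the hits. */
int model_detect_wraps(int attempts)
{
    static bool warned;  /* one flag per call site */
    int hits = 0;

    for (int i = 0; i < attempts; i++) {
        if (model_warn_on_once(true, &warned, "validation count wrapped"))
            hits++;
    }
    return hits;
}

/* Number of lines actually written to the log so far. */
int model_report_count(void)
{
    return model_reports;
}
```

Every detection is still visible to the caller through the return value, so the condition can be handled each time while the log receives a single line.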
diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
index 5a968a7..9e7d95cf 100644
--- a/drivers/acpi/acpica/tbutils.c
+++ b/drivers/acpi/acpica/tbutils.c
@@ -422,7 +422,6 @@ acpi_tb_get_table(struct acpi_table_desc *table_desc,
 			    "Table %p, Validation count is zero after increment\n",
 			    table_desc));
 		table_desc->validation_count--;
-		return_ACPI_STATUS(AE_LIMIT);
 	}
 
 	*out_table = table_desc->pointer;
In the Linux kernel side, acpi_get_table() hasn't been fully balanced by
acpi_put_table() invocations. So it is not a good timing to report errors.
The strict balanced validation count check should only be enabled after
confirming that all kernel side invocations are safe.

Thus this patch removes the fatal error but leaves the error report to
indicate the leak so that developers can notice the required engineering
change. Reported by Dan Williams, fixed by Lv Zheng.

Reported-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
---
 drivers/acpi/acpica/tbutils.c | 1 -
 1 file changed, 1 deletion(-)
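
The effect of the one-line deletion can be modelled in user-space C as below. This is a sketch under assumptions, not the ACPICA source: the counter is assumed to be 16 bits wide, `struct model_table_desc`, `model_get_table()` and the `MODEL_*` status values are hypothetical stand-ins, and the `fatal_on_wrap` parameter exists only to show the behaviour before and after the patch side by side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for ACPICA status codes; values are arbitrary. */
enum model_status { MODEL_OK = 0, MODEL_LIMIT = 1 };

/* Simplified descriptor holding only the fields this sketch needs. */
struct model_table_desc {
    void *pointer;              /* stands in for table_desc->pointer */
    uint16_t validation_count;  /* assumed 16-bit, so it wraps at 65536 */
};

/*
 * Models the tail of the patched function. With fatal_on_wrap nonzero it
 * behaves like the code before the patch (returns a limit error); with
 * zero it behaves like the patched code (reports, then carries on).
 */
enum model_status model_get_table(struct model_table_desc *desc,
                                  void **out_table, int fatal_on_wrap)
{
    assert(desc != NULL && out_table != NULL);

    desc->validation_count++;
    if (desc->validation_count == 0) {
        fprintf(stderr,
                "Table %p, Validation count is zero after increment\n",
                (void *)desc);
        desc->validation_count--;  /* pin the counter at its maximum */
        if (fatal_on_wrap)
            return MODEL_LIMIT;
    }

    *out_table = desc->pointer;
    return MODEL_OK;
}
```

In this model, once unbalanced gets have pushed the counter to its maximum, the pre-patch path refuses every further request, while the patched path keeps handing out the table pointer and prints the report on each call.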