
[RFC] ACPICA: Tables: Fix regression introduced by a too early mechanism enabling

Message ID 1493171393-1825-1-git-send-email-lv.zheng@intel.com (mailing list archive)
State Not Applicable, archived

Commit Message

Lv Zheng April 26, 2017, 1:49 a.m. UTC
On the Linux kernel side, acpi_get_table() invocations are not yet fully
balanced by acpi_put_table() invocations, so it is too early to treat an
imbalance as a fatal error. The strict validation count check should be
enabled only after all kernel-side invocations have been confirmed safe.

Thus this patch removes the fatal error return but keeps the error report,
which indicates the leak, so that developers can notice the required
engineering change. Reported by Dan Williams, fixed by Lv Zheng.

Reported-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
---
 drivers/acpi/acpica/tbutils.c | 1 -
 1 file changed, 1 deletion(-)
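
For readers unfamiliar with the pairing the commit message refers to, here is
a minimal userspace sketch of balanced versus unbalanced usage. It is not
kernel code: struct demo_table_desc and the demo_* functions are hypothetical
stand-ins, and the real acpi_get_table()/acpi_put_table() signatures differ.
The 16-bit counter width is an assumption made for the model.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a table descriptor (not struct acpi_table_desc). */
struct demo_table_desc {
    const char *pointer;        /* table contents, a C string in this model */
    uint16_t validation_count;  /* outstanding references (width assumed) */
};

/* Model of a "get": take a reference and hand out the table pointer. */
static int demo_get_table(struct demo_table_desc *desc, const char **out)
{
    desc->validation_count++;
    *out = desc->pointer;
    return 0;
}

/* Model of a "put": drop one reference taken by demo_get_table(). */
static void demo_put_table(struct demo_table_desc *desc)
{
    if (desc->validation_count > 0)
        desc->validation_count--;
}

/* Balanced caller: the get is paired with a put once the table is used. */
static size_t demo_balanced_user(struct demo_table_desc *desc)
{
    const char *table;
    size_t len = 0;

    if (demo_get_table(desc, &table) != 0)
        return 0;
    while (table[len] != '\0')
        len++;
    demo_put_table(desc);
    return len;
}

/* Leaky caller: uses the table but never drops its reference. */
static size_t demo_leaky_user(struct demo_table_desc *desc)
{
    const char *table;
    size_t len = 0;

    if (demo_get_table(desc, &table) != 0)
        return 0;
    while (table[len] != '\0')
        len++;
    return len;  /* missing demo_put_table(): the count stays elevated */
}
```

In this model each call to demo_leaky_user() leaves the count one higher,
which is the kind of leak the validation count check is meant to expose.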

Comments

Dan Williams April 26, 2017, 5 a.m. UTC | #1
On Tue, Apr 25, 2017 at 6:49 PM, Lv Zheng <lv.zheng@intel.com> wrote:
> In the Linux kernel side, acpi_get_table() hasn't been fully balanced by
> acpi_put_table() invocations. So it is not a good timing to report errors.
> The strict balanced validation count check should only be enabled after
> confirming that all kernel side invocations are safe.

We've been living with this bug for 7 years, let's just go fix all
acpi_get_table() invocations to make sure they have a corresponding
acpi_put_table().

>
> Thus this patch removes the fatal error but leaves the error report to
> indicate the leak so that developers can notice the required engineering
> change. Reported by Dan Williams, fixed by Lv Zheng.
>
> Reported-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> ---
>  drivers/acpi/acpica/tbutils.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
> index 5a968a7..9e7d95cf 100644
> --- a/drivers/acpi/acpica/tbutils.c
> +++ b/drivers/acpi/acpica/tbutils.c
> @@ -422,7 +422,6 @@ acpi_tb_get_table(struct acpi_table_desc *table_desc,
>                             "Table %p, Validation count is zero after increment\n",
>                             table_desc));
>                 table_desc->validation_count--;
> -               return_ACPI_STATUS(AE_LIMIT);

If you want to leave the error report turn it into a WARN_ON_ONCE() so
it doesn't keep triggering, but I'd rather we just focus on the
missing acpi_put_table() calls.
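
The warn-once idea can be sketched in plain userspace C as follows. This is
only an illustration of the behavior, not the kernel's WARN_ON_ONCE()
implementation: demo_warn_once() and demo_reports are hypothetical names, and
unlike the kernel macro, which keeps its state per call site, this helper
keeps one flag for all callers.

```c
#include <stdbool.h>
#include <stdio.h>

/* Number of messages actually printed; exposed so callers can inspect it. */
static unsigned int demo_reports;

/*
 * Report a true condition only the first time it is seen.
 * Returns the condition so it can still be used in an if ().
 */
static bool demo_warn_once(bool condition, const char *msg)
{
    static bool warned;  /* single flag shared by every caller */

    if (condition && !warned) {
        warned = true;
        demo_reports++;
        fprintf(stderr, "WARNING: %s\n", msg);
    }
    return condition;
}
```

Calling demo_warn_once(true, "...") repeatedly prints one line in total, so a
condition that recurs on every table lookup would not keep filling the log.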
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lv Zheng April 26, 2017, 5:15 a.m. UTC | #2
Hi,

> From: Dan Williams [mailto:dan.j.williams@intel.com]
> Subject: Re: [RFC PATCH] ACPICA: Tables: Fix regression introduced by a too early mechanism enabling
>
> On Tue, Apr 25, 2017 at 6:49 PM, Lv Zheng <lv.zheng@intel.com> wrote:
> > In the Linux kernel side, acpi_get_table() hasn't been fully balanced by
> > acpi_put_table() invocations. So it is not a good timing to report errors.
> > The strict balanced validation count check should only be enabled after
> > confirming that all kernel side invocations are safe.
>
> We've been living with this bug for 7 years, let's just go fix all
> acpi_get_table() invocations to make sure they have a corresponding
> acpi_put_table().

We knew that. You should already have seen a series from me, internally
or externally, that achieves this.
It was done several years ago, but it takes a long time to get the
ACPICA part upstreamed.

Now my plan is:
1. Introduce the APIs but allow the old usage models, so as not to
   change the old ACPICA behavior for its users.
2. Fix all users.
3. Disallow the old usage models.
It was my mistake to leak the final-stage approach from my local repo
into ACPICA upstream.
We can try to jump to the final step now, but as far as I know,
not only Linux but ACPICA itself also contains several broken cases.

The bottom line for the Linux kernel is that we shouldn't break any
running system. So IMO, we will need this commit during this special period.

I didn't say the final step is wrong or is not required.
We can do both in parallel.

So could you please help confirm whether it works?
I would also like to suggest that Linux take this first-step fix along
with the other final-step fixes during this period.

Thanks and best regards
Lv

>
> >
> > Thus this patch removes the fatal error but leaves the error report to
> > indicate the leak so that developers can notice the required engineering
> > change. Reported by Dan Williams, fixed by Lv Zheng.
> >
> > Reported-by: Dan Williams <dan.j.williams@intel.com>
> > Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> > ---
> >  drivers/acpi/acpica/tbutils.c | 1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
> > index 5a968a7..9e7d95cf 100644
> > --- a/drivers/acpi/acpica/tbutils.c
> > +++ b/drivers/acpi/acpica/tbutils.c
> > @@ -422,7 +422,6 @@ acpi_tb_get_table(struct acpi_table_desc *table_desc,
> >                             "Table %p, Validation count is zero after increment\n",
> >                             table_desc));
> >                 table_desc->validation_count--;
> > -               return_ACPI_STATUS(AE_LIMIT);
>
> If you want to leave the error report turn it into a WARN_ON_ONCE() so
> it doesn't keep triggering, but I'd rather we just focus on the
> missing acpi_put_table() calls.
Dan Williams April 26, 2017, 2:13 p.m. UTC | #3
On Tue, Apr 25, 2017 at 10:15 PM, Zheng, Lv <lv.zheng@intel.com> wrote:
> Hi,
>
>> From: Dan Williams [mailto:dan.j.williams@intel.com]
>> Subject: Re: [RFC PATCH] ACPICA: Tables: Fix regression introduced by a too early mechanism enabling
>>
>> On Tue, Apr 25, 2017 at 6:49 PM, Lv Zheng <lv.zheng@intel.com> wrote:
>> > In the Linux kernel side, acpi_get_table() hasn't been fully balanced by
>> > acpi_put_table() invocations. So it is not a good timing to report errors.
>> > The strict balanced validation count check should only be enabled after
>> > confirming that all kernel side invocations are safe.
>>
>> We've been living with this bug for 7 years, let's just go fix all
>> acpi_get_table() invocations to make sure they have a corresponding
>> acpi_put_table().
>
> We knew that, you should have already seen a series internally or
> externally from me achieving this.
> It's done several years ago. But it takes long time to make the
> ACPICA part upstreamed.
>
> Now my plan is:
> 1. introduce the APIs but allow old usage models in order not to
>    change old ACPICA behavior and its users.
> 2. fix all users
> 3. disallow old usage models.
> It's just my mistake to leak the final stage approach to the ACPICA
> upstream from my local repo.
> Now we can try to jump to the final step, but as far as I know,
> not only Linux, ACPICA itself also contains several broken cases.
>
> Bottom line of Linux kernel is we shouldn't break any running system.
> So IMO, we will need this commit during this special period.
>
> I didn't say the final step is wrong or is not required.
> We can do both in parallel.
>
> So could you please help to confirm if it's working.
> And I would like to suggest linux to take this first step fix along
> with other final step fixes during this period.

I just think "this period" is very short and we can skip the band-aid
and go straight to auditing the 48 call sites of acpi_get_table.
Dan Williams April 26, 2017, 3:34 p.m. UTC | #4
On Wed, Apr 26, 2017 at 7:13 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Tue, Apr 25, 2017 at 10:15 PM, Zheng, Lv <lv.zheng@intel.com> wrote:
>> Hi,
>>
>>> From: Dan Williams [mailto:dan.j.williams@intel.com]
>>> Subject: Re: [RFC PATCH] ACPICA: Tables: Fix regression introduced by a too early mechanism enabling
>>>
>>> On Tue, Apr 25, 2017 at 6:49 PM, Lv Zheng <lv.zheng@intel.com> wrote:
>>> > In the Linux kernel side, acpi_get_table() hasn't been fully balanced by
>>> > acpi_put_table() invocations. So it is not a good timing to report errors.
>>> > The strict balanced validation count check should only be enabled after
>>> > confirming that all kernel side invocations are safe.
>>>
>>> We've been living with this bug for 7 years, let's just go fix all
>>> acpi_get_table() invocations to make sure they have a corresponding
>>> acpi_put_table().
>>
>> We knew that, you should have already seen a series internally or
>> externally from me achieving this.
>> It's done several years ago. But it takes long time to make the
>> ACPICA part upstreamed.
>>
>> Now my plan is:
>> 1. introduce the APIs but allow old usage models in order not to
>>    change old ACPICA behavior and its users.
>> 2. fix all users
>> 3. disallow old usage models.
>> It's just my mistake to leak the final stage approach to the ACPICA
>> upstream from my local repo.
>> Now we can try to jump to the final step, but as far as I know,
>> not only Linux, ACPICA itself also contains several broken cases.
>>
>> Bottom line of Linux kernel is we shouldn't break any running system.
>> So IMO, we will need this commit during this special period.
>>
>> I didn't say the final step is wrong or is not required.
>> We can do both in parallel.
>>
>> So could you please help to confirm if it's working.
>> And I would like to suggest linux to take this first step fix along
>> with other final step fixes during this period.
>
> I just think "this period" is very short and we can skip the band-aid
> and go straight to auditing the 48 call sites of acpi_get_table.

Moreover, I don't think this workaround is a workable approach because
it leaves the ACPI_ERROR() in place to continue to spam the logs.

Patch

diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
index 5a968a7..9e7d95cf 100644
--- a/drivers/acpi/acpica/tbutils.c
+++ b/drivers/acpi/acpica/tbutils.c
@@ -422,7 +422,6 @@  acpi_tb_get_table(struct acpi_table_desc *table_desc,
 			    "Table %p, Validation count is zero after increment\n",
 			    table_desc));
 		table_desc->validation_count--;
-		return_ACPI_STATUS(AE_LIMIT);
 	}
 
 	*out_table = table_desc->pointer;
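
To make the effect of the one-line deletion concrete, the sketch below models
only the lines visible in the hunk above in userspace C. It is an
illustration under assumptions, not the ACPICA function: demo_desc,
demo_status, demo_tb_get_table() and the strict parameter are hypothetical,
the counter is assumed to be 16 bits wide, and demo_error_reports stands in
for the ACPI_ERROR() message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum demo_status { DEMO_OK = 0, DEMO_LIMIT = 1 };

/* Hypothetical stand-in for the descriptor touched by the hunk. */
struct demo_desc {
    void *pointer;              /* mapped table */
    uint16_t validation_count;  /* reference count (width assumed) */
};

/* Counts how often the error would have been logged. */
static unsigned int demo_error_reports;

/*
 * strict == true  models the code before the patch (error is fatal).
 * strict == false models the code after the patch (error is only logged).
 */
static enum demo_status demo_tb_get_table(struct demo_desc *desc,
                                          void **out, bool strict)
{
    desc->validation_count++;
    if (desc->validation_count == 0) {
        /* The counter wrapped: report it and undo the increment. */
        demo_error_reports++;
        desc->validation_count--;
        if (strict)
            return DEMO_LIMIT;  /* the line removed by the patch */
    }

    *out = desc->pointer;
    return DEMO_OK;
}
```

With the counter already at its maximum, the strict variant fails the lookup
and leaves the output pointer untouched, while the relaxed variant hands out
the table. In both variants the report fires on every such call, which is the
log noise raised in comment #4.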