diff mbox series

thermal: Add a sanity check for invalid state at stats update

Message ID 20200330140859.12535-1-tiwai@suse.de (mailing list archive)
State New, archived
Delegated to: Zhang Rui
Headers show
Series thermal: Add a sanity check for invalid state at stats update | expand

Commit Message

Takashi Iwai March 30, 2020, 2:08 p.m. UTC
The thermal sysfs handler keeps the statistics table with the fixed
size that was determined from the initial max_states() call, and the
table entry is updated at each sysfs cur_state write call.  And, when
the driver's set_cur_state() ops accepts the value given from
user-space, the thermal sysfs core blindly applies it to the
statistics table entry, which may overflow and cause an Oops.
Although it's rather a bug in the driver's ops implementations, we
shouldn't crash but rather give a proper warning instead.

This patch adds a sanity check for avoiding such an OOB access and
warns with a stack trace to show the suspicious device in question.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
---

We've hit some crash by stress tests, and this patch at least works
around the crash itself.  While the actual bug fix of the buggy driver
is still being investigated, I submit the hardening in the core side
at first.

 drivers/thermal/thermal_sysfs.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Amit Kucheria April 7, 2020, 1:30 p.m. UTC | #1
On Mon, Mar 30, 2020 at 7:39 PM Takashi Iwai <tiwai@suse.de> wrote:
>
> The thermal sysfs handler keeps the statistics table with the fixed
> size that was determined from the initial max_states() call, and the
> table entry is updated at each sysfs cur_state write call.  And, when
> the driver's set_cur_state() ops accepts the value given from
> user-space, the thermal sysfs core blindly applies it to the
> statistics table entry, which may overflow and cause an Oops.
> Although it's rather a bug in the driver's ops implementations, we
> shouldn't crash but rather give a proper warning instead.
>
> This patch adds a sanity check for avoiding such an OOB access and
> warns with a stack trace to show the suspicious device in question.

Hi Takashi,

Instead of this warning, I think we should reject such input when
writing to cur_state.

See attached patch. If you think this OK, I'll submit it.

Regards,
Amit

> Signed-off-by: Takashi Iwai <tiwai@suse.de>
> ---
>
> We've hit some crash by stress tests, and this patch at least works
> around the crash itself.  While the actual bug fix of the buggy driver
> is still being investigated, I submit the hardening in the core side
> at first.
>
>  drivers/thermal/thermal_sysfs.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
> index aa99edb4dff7..a23c4e701d63 100644
> --- a/drivers/thermal/thermal_sysfs.c
> +++ b/drivers/thermal/thermal_sysfs.c
> @@ -772,6 +772,11 @@ void thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev,
>
>         spin_lock(&stats->lock);
>
> +       if (dev_WARN_ONCE(&cdev->device, new_state >= stats->max_states,
> +                         "new state %ld exceeds max_state %ld",
> +                         new_state, stats->max_states))
> +               goto unlock;
> +
>         if (stats->state == new_state)
>                 goto unlock;
>
> --
> 2.16.4
>
Takashi Iwai April 7, 2020, 2:12 p.m. UTC | #2
On Tue, 07 Apr 2020 15:30:51 +0200,
Amit Kucheria wrote:
> 
> On Mon, Mar 30, 2020 at 7:39 PM Takashi Iwai <tiwai@suse.de> wrote:
> >
> > The thermal sysfs handler keeps the statistics table with the fixed
> > size that was determined from the initial max_states() call, and the
> > table entry is updated at each sysfs cur_state write call.  And, when
> > the driver's set_cur_state() ops accepts the value given from
> > user-space, the thermal sysfs core blindly applies it to the
> > statistics table entry, which may overflow and cause an Oops.
> > Although it's rather a bug in the driver's ops implementations, we
> > shouldn't crash but rather give a proper warning instead.
> >
> > This patch adds a sanity check for avoiding such an OOB access and
> > warns with a stack trace to show the suspicious device in question.
> 
> Hi Takashi,
> 
> Instead of this warning, I think we should reject such input when
> writing to cur_state.
> 
> See attached patch. If you think this OK, I'll submit it.

Actually the input value itself is correct, the problem is rather
about the max_states that may vary depending on other driver.  So IMO,
we don't want to refuse the input completely.

Please see the thread:
  https://lore.kernel.org/linux-acpi/s5h5zeiwd01.wl-tiwai@suse.de/


thanks,

Takashi


> 
> Regards,
> Amit
> 
> > Signed-off-by: Takashi Iwai <tiwai@suse.de>
> > ---
> >
> > We've hit some crash by stress tests, and this patch at least works
> > around the crash itself.  While the actual bug fix of the buggy driver
> > is still being investigated, I submit the hardening in the core side
> > at first.
> >
> >  drivers/thermal/thermal_sysfs.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
> > index aa99edb4dff7..a23c4e701d63 100644
> > --- a/drivers/thermal/thermal_sysfs.c
> > +++ b/drivers/thermal/thermal_sysfs.c
> > @@ -772,6 +772,11 @@ void thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev,
> >
> >         spin_lock(&stats->lock);
> >
> > +       if (dev_WARN_ONCE(&cdev->device, new_state >= stats->max_states,
> > +                         "new state %ld exceeds max_state %ld",
> > +                         new_state, stats->max_states))
> > +               goto unlock;
> > +
> >         if (stats->state == new_state)
> >                 goto unlock;
> >
> > --
> > 2.16.4
> >
> From 54266260d483ab4476510dd4461a1cafc611e17d Mon Sep 17 00:00:00 2001
> Message-Id: <54266260d483ab4476510dd4461a1cafc611e17d.1586266224.git.amit.kucheria@linaro.org>
> From: Amit Kucheria <amit.kucheria@linaro.org>
> Date: Tue, 7 Apr 2020 18:48:14 +0530
> Subject: [PATCH] thermal: Reject invalid cur_state input from userspace
> 
> We don't check if the cur_state value input in sysfs is greater than the
> maximum cooling state that the cooling device supports. This can cause
> access to unallocated memory in case THERMAL_STATISTICS in enabled and
> could also crash cooling devices that don't check for an invalid state in
> their set_cur_state() callback.
> 
> Return an error if the state being requested in greater than the maximum
> cooling state the device supports.
> 
> Reported-by: Takashi Iwai <tiwai@suse.de>
> Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
> ---
>  drivers/thermal/thermal_sysfs.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
> index 7e1d11bdd258..8033e5a9386a 100644
> --- a/drivers/thermal/thermal_sysfs.c
> +++ b/drivers/thermal/thermal_sysfs.c
> @@ -703,7 +703,7 @@ cur_state_store(struct device *dev, struct device_attribute *attr,
>  		const char *buf, size_t count)
>  {
>  	struct thermal_cooling_device *cdev = to_cooling_device(dev);
> -	unsigned long state;
> +	unsigned long state, max_state;
>  	int result;
>  
>  	if (sscanf(buf, "%ld\n", &state) != 1)
> @@ -712,6 +712,13 @@ cur_state_store(struct device *dev, struct device_attribute *attr,
>  	if ((long)state < 0)
>  		return -EINVAL;
>  
> +	result = cdev->ops->get_max_state(cdev, &max_state);
> +	if (result)
> +		return result;
> +
> +	if (state >= max_state)
> +		return -EINVAL;
> +
>  	mutex_lock(&cdev->lock);
>  
>  	result = cdev->ops->set_cur_state(cdev, state);
> -- 
> 2.20.1
>
Zhang, Rui April 7, 2020, 2:17 p.m. UTC | #3
Right, I have a V2 patch series, which will be post soon.

Thanks,
rui

-----Original Message-----
From: linux-pm-owner@vger.kernel.org <linux-pm-owner@vger.kernel.org> On Behalf Of Takashi Iwai
Sent: Tuesday, April 07, 2020 10:13 PM
To: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: Takashi Iwai <tiwai@suse.de>; Zhang, Rui <rui.zhang@intel.com>; Daniel Lezcano <daniel.lezcano@linaro.org>; Linux PM list <linux-pm@vger.kernel.org>; LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] thermal: Add a sanity check for invalid state at stats update
Importance: High

On Tue, 07 Apr 2020 15:30:51 +0200,
Amit Kucheria wrote:
> 
> On Mon, Mar 30, 2020 at 7:39 PM Takashi Iwai <tiwai@suse.de> wrote:
> >
> > The thermal sysfs handler keeps the statistics table with the fixed 
> > size that was determined from the initial max_states() call, and the 
> > table entry is updated at each sysfs cur_state write call.  And, 
> > when the driver's set_cur_state() ops accepts the value given from 
> > user-space, the thermal sysfs core blindly applies it to the 
> > statistics table entry, which may overflow and cause an Oops.
> > Although it's rather a bug in the driver's ops implementations, we 
> > shouldn't crash but rather give a proper warning instead.
> >
> > This patch adds a sanity check for avoiding such an OOB access and 
> > warns with a stack trace to show the suspicious device in question.
> 
> Hi Takashi,
> 
> Instead of this warning, I think we should reject such input when 
> writing to cur_state.
> 
> See attached patch. If you think this OK, I'll submit it.

Actually the input value itself is correct, the problem is rather about the max_states that may vary depending on other driver.  So IMO, we don't want to refuse the input completely.

Please see the thread:
  https://lore.kernel.org/linux-acpi/s5h5zeiwd01.wl-tiwai@suse.de/


thanks,

Takashi


> 
> Regards,
> Amit
> 
> > Signed-off-by: Takashi Iwai <tiwai@suse.de>
> > ---
> >
> > We've hit some crash by stress tests, and this patch at least works 
> > around the crash itself.  While the actual bug fix of the buggy 
> > driver is still being investigated, I submit the hardening in the 
> > core side at first.
> >
> >  drivers/thermal/thermal_sysfs.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/thermal/thermal_sysfs.c 
> > b/drivers/thermal/thermal_sysfs.c index aa99edb4dff7..a23c4e701d63 
> > 100644
> > --- a/drivers/thermal/thermal_sysfs.c
> > +++ b/drivers/thermal/thermal_sysfs.c
> > @@ -772,6 +772,11 @@ void thermal_cooling_device_stats_update(struct 
> > thermal_cooling_device *cdev,
> >
> >         spin_lock(&stats->lock);
> >
> > +       if (dev_WARN_ONCE(&cdev->device, new_state >= stats->max_states,
> > +                         "new state %ld exceeds max_state %ld",
> > +                         new_state, stats->max_states))
> > +               goto unlock;
> > +
> >         if (stats->state == new_state)
> >                 goto unlock;
> >
> > --
> > 2.16.4
> >
> From 54266260d483ab4476510dd4461a1cafc611e17d Mon Sep 17 00:00:00 2001
> Message-Id: 
> <54266260d483ab4476510dd4461a1cafc611e17d.1586266224.git.amit.kucheria
> @linaro.org>
> From: Amit Kucheria <amit.kucheria@linaro.org>
> Date: Tue, 7 Apr 2020 18:48:14 +0530
> Subject: [PATCH] thermal: Reject invalid cur_state input from 
> userspace
> 
> We don't check if the cur_state value input in sysfs is greater than 
> the maximum cooling state that the cooling device supports. This can 
> cause access to unallocated memory in case THERMAL_STATISTICS in 
> enabled and could also crash cooling devices that don't check for an 
> invalid state in their set_cur_state() callback.
> 
> Return an error if the state being requested in greater than the 
> maximum cooling state the device supports.
> 
> Reported-by: Takashi Iwai <tiwai@suse.de>
> Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
> ---
>  drivers/thermal/thermal_sysfs.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/thermal/thermal_sysfs.c 
> b/drivers/thermal/thermal_sysfs.c index 7e1d11bdd258..8033e5a9386a 
> 100644
> --- a/drivers/thermal/thermal_sysfs.c
> +++ b/drivers/thermal/thermal_sysfs.c
> @@ -703,7 +703,7 @@ cur_state_store(struct device *dev, struct device_attribute *attr,
>  		const char *buf, size_t count)
>  {
>  	struct thermal_cooling_device *cdev = to_cooling_device(dev);
> -	unsigned long state;
> +	unsigned long state, max_state;
>  	int result;
>  
>  	if (sscanf(buf, "%ld\n", &state) != 1) @@ -712,6 +712,13 @@ 
> cur_state_store(struct device *dev, struct device_attribute *attr,
>  	if ((long)state < 0)
>  		return -EINVAL;
>  
> +	result = cdev->ops->get_max_state(cdev, &max_state);
> +	if (result)
> +		return result;
> +
> +	if (state >= max_state)
> +		return -EINVAL;
> +
>  	mutex_lock(&cdev->lock);
>  
>  	result = cdev->ops->set_cur_state(cdev, state);
> --
> 2.20.1
>
Zhang, Rui April 7, 2020, 3:37 p.m. UTC | #4
Too late for me to post it out by this midnight, will post it out tomorrow.

Thanks,
rui

-----Original Message-----
From: linux-pm-owner@vger.kernel.org <linux-pm-owner@vger.kernel.org> On Behalf Of Zhang, Rui
Sent: Tuesday, April 07, 2020 10:17 PM
To: Takashi Iwai <tiwai@suse.de>; Amit Kucheria <amit.kucheria@verdurent.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>; Linux PM list <linux-pm@vger.kernel.org>; LKML <linux-kernel@vger.kernel.org>
Subject: RE: [PATCH] thermal: Add a sanity check for invalid state at stats update

Right, I have a V2 patch series, which will be post soon.

Thanks,
rui

-----Original Message-----
From: linux-pm-owner@vger.kernel.org <linux-pm-owner@vger.kernel.org> On Behalf Of Takashi Iwai
Sent: Tuesday, April 07, 2020 10:13 PM
To: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: Takashi Iwai <tiwai@suse.de>; Zhang, Rui <rui.zhang@intel.com>; Daniel Lezcano <daniel.lezcano@linaro.org>; Linux PM list <linux-pm@vger.kernel.org>; LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] thermal: Add a sanity check for invalid state at stats update
Importance: High

On Tue, 07 Apr 2020 15:30:51 +0200,
Amit Kucheria wrote:
> 
> On Mon, Mar 30, 2020 at 7:39 PM Takashi Iwai <tiwai@suse.de> wrote:
> >
> > The thermal sysfs handler keeps the statistics table with the fixed 
> > size that was determined from the initial max_states() call, and the 
> > table entry is updated at each sysfs cur_state write call.  And, 
> > when the driver's set_cur_state() ops accepts the value given from 
> > user-space, the thermal sysfs core blindly applies it to the 
> > statistics table entry, which may overflow and cause an Oops.
> > Although it's rather a bug in the driver's ops implementations, we 
> > shouldn't crash but rather give a proper warning instead.
> >
> > This patch adds a sanity check for avoiding such an OOB access and 
> > warns with a stack trace to show the suspicious device in question.
> 
> Hi Takashi,
> 
> Instead of this warning, I think we should reject such input when 
> writing to cur_state.
> 
> See attached patch. If you think this OK, I'll submit it.

Actually the input value itself is correct, the problem is rather about the max_states that may vary depending on other driver.  So IMO, we don't want to refuse the input completely.

Please see the thread:
  https://lore.kernel.org/linux-acpi/s5h5zeiwd01.wl-tiwai@suse.de/


thanks,

Takashi


> 
> Regards,
> Amit
> 
> > Signed-off-by: Takashi Iwai <tiwai@suse.de>
> > ---
> >
> > We've hit some crash by stress tests, and this patch at least works 
> > around the crash itself.  While the actual bug fix of the buggy 
> > driver is still being investigated, I submit the hardening in the 
> > core side at first.
> >
> >  drivers/thermal/thermal_sysfs.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/thermal/thermal_sysfs.c 
> > b/drivers/thermal/thermal_sysfs.c index aa99edb4dff7..a23c4e701d63
> > 100644
> > --- a/drivers/thermal/thermal_sysfs.c
> > +++ b/drivers/thermal/thermal_sysfs.c
> > @@ -772,6 +772,11 @@ void thermal_cooling_device_stats_update(struct
> > thermal_cooling_device *cdev,
> >
> >         spin_lock(&stats->lock);
> >
> > +       if (dev_WARN_ONCE(&cdev->device, new_state >= stats->max_states,
> > +                         "new state %ld exceeds max_state %ld",
> > +                         new_state, stats->max_states))
> > +               goto unlock;
> > +
> >         if (stats->state == new_state)
> >                 goto unlock;
> >
> > --
> > 2.16.4
> >
> From 54266260d483ab4476510dd4461a1cafc611e17d Mon Sep 17 00:00:00 2001
> Message-Id: 
> <54266260d483ab4476510dd4461a1cafc611e17d.1586266224.git.amit.kucheria
> @linaro.org>
> From: Amit Kucheria <amit.kucheria@linaro.org>
> Date: Tue, 7 Apr 2020 18:48:14 +0530
> Subject: [PATCH] thermal: Reject invalid cur_state input from 
> userspace
> 
> We don't check if the cur_state value input in sysfs is greater than 
> the maximum cooling state that the cooling device supports. This can 
> cause access to unallocated memory in case THERMAL_STATISTICS in 
> enabled and could also crash cooling devices that don't check for an 
> invalid state in their set_cur_state() callback.
> 
> Return an error if the state being requested in greater than the 
> maximum cooling state the device supports.
> 
> Reported-by: Takashi Iwai <tiwai@suse.de>
> Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
> ---
>  drivers/thermal/thermal_sysfs.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/thermal/thermal_sysfs.c 
> b/drivers/thermal/thermal_sysfs.c index 7e1d11bdd258..8033e5a9386a
> 100644
> --- a/drivers/thermal/thermal_sysfs.c
> +++ b/drivers/thermal/thermal_sysfs.c
> @@ -703,7 +703,7 @@ cur_state_store(struct device *dev, struct device_attribute *attr,
>  		const char *buf, size_t count)
>  {
>  	struct thermal_cooling_device *cdev = to_cooling_device(dev);
> -	unsigned long state;
> +	unsigned long state, max_state;
>  	int result;
>  
>  	if (sscanf(buf, "%ld\n", &state) != 1) @@ -712,6 +712,13 @@ 
> cur_state_store(struct device *dev, struct device_attribute *attr,
>  	if ((long)state < 0)
>  		return -EINVAL;
>  
> +	result = cdev->ops->get_max_state(cdev, &max_state);
> +	if (result)
> +		return result;
> +
> +	if (state >= max_state)
> +		return -EINVAL;
> +
>  	mutex_lock(&cdev->lock);
>  
>  	result = cdev->ops->set_cur_state(cdev, state);
> --
> 2.20.1
>
diff mbox series

Patch

diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index aa99edb4dff7..a23c4e701d63 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -772,6 +772,11 @@  void thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev,
 
 	spin_lock(&stats->lock);
 
+	if (dev_WARN_ONCE(&cdev->device, new_state >= stats->max_states,
+			  "new state %ld exceeds max_state %ld",
+			  new_state, stats->max_states))
+		goto unlock;
+
 	if (stats->state == new_state)
 		goto unlock;