diff mbox series

drm/i915/hwmon: Remove i915_hwmon_unregister() during driver unbind

Message ID 20240326124838.3049215-1-badal.nilawar@intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915/hwmon: Remove i915_hwmon_unregister() during driver unbind | expand

Commit Message

Nilawar, Badal March 26, 2024, 12:48 p.m. UTC
i915_hwmon and its resources are managed resources of i915 dev.
During i915 driver unregister flow the function i915_hwmon_unregister()
explicitly makes i915_hwmon resource NULL. This happen before
hwmon is actually unregistered. Doing so may cause UAF if hwmon
sysfs is being accessed:

<7> [518.386591] i915 0000:03:00.0: [drm] intel_gt_set_wedged called from intel_gt_set_wedged_on_fini+0xd/0x30 [i915]
<7> [518.471128] i915 0000:03:00.0: [drm:drm_client_release] drm_fb_helper
<4> [518.501476] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6dbf: 0000 [#1] PREEMPT SMP NOPTI
<4> [518.512264] CPU: 6 PID: 679 Comm: prometheus-node Tainted: G     U             6.9.0-rc1-CI_DRM_14482-g4a8fabcf2f1a+ #1
<4> [518.522951] Hardware name: Intel Corporation Raptor Lake Client Platform/RPL-S ADP-S DDR5 UDIMM CRB, BIOS RPLSFWI1.R00.4221.A00.2305271351 05/27/2023
<4> [518.536217] RIP: 0010:hwm_energy+0x2b/0x100 [i915]
<4> [518.541159] Code: 48 89 e5 41 57 41 56 41 55 41 54 53 48 89 fb 48 83 e4 f0 48 83 ec 10 4c 8b 77 08 4c 8b 2f 8b 7f 34 48 89 74 24 08 85 ff 78 2b <45> 8b bd 54 02 00 00 49 8b 7e 18 e8 35 e4 ea ff 49 89 c4 48 85 c0
<4> [518.559746] RSP: 0018:ffffc900077efd00 EFLAGS: 00010202
<4> [518.564943] RAX: 0000000000000000 RBX: ffff8881e3078428 RCX: 0000000000000000
<4> [518.572025] RDX: 0000000000000001 RSI: ffffc900077efda0 RDI: 000000006b6b6b6b
<4> [518.579107] RBP: ffffc900077efd40 R08: ffffc900077efda0 R09: 0000000000000001
<4> [518.586186] R10: 0000000000000001 R11: ffff88810c19aa00 R12: ffff888243e1a010
<4> [518.593264] R13: 6b6b6b6b6b6b6b6b R14: 6b6b6b6b6b6b6b6b R15: ffff8881e3078428
<4> [518.600343] FS:  00007f9def400700(0000) GS:ffff88888d100000(0000) knlGS:0000000000000000
<4> [518.608373] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [518.614077] CR2: 0000564f19bff288 CR3: 0000000119f94000 CR4: 0000000000f50ef0
<4> [518.621158] PKRU: 55555554
<4> [518.623858] Call Trace:
<4> [518.626303]  <TASK>
<4> [518.628400]  ? __die_body+0x1a/0x60
<4> [518.631881]  ? die_addr+0x38/0x60
<4> [518.635182]  ? exc_general_protection+0x1a1/0x400
<4> [518.639862]  ? asm_exc_general_protection+0x26/0x30
<4> [518.644710]  ? hwm_energy+0x2b/0x100 [i915]
<4> [518.649007]  hwm_read+0x9a/0x310 [i915]
<4> [518.652953]  hwmon_attr_show+0x36/0x140
<4> [518.656775]  dev_attr_show+0x15/0x60

Fixes: b3b088e28183 ("drm/i915/hwmon: Add HWMON infrastructure")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/10366
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
---
 drivers/gpu/drm/i915/i915_driver.c | 2 --
 drivers/gpu/drm/i915/i915_hwmon.c  | 5 -----
 2 files changed, 7 deletions(-)

Comments

Krzysztofik, Janusz March 26, 2024, 9:28 p.m. UTC | #1
On Tuesday, 26 March 2024 13:48:38 CET Badal Nilawar wrote:
> i915_hwmon and its resources are managed resources of i915 dev.
> During i915 driver unregister flow the function i915_hwmon_unregister()
> explicitly makes i915_hwmon resource NULL. This happen before
> hwmon is actually unregistered. Doing so may cause UAF if hwmon
> sysfs is being accessed:
> 
> <7> [518.386591] i915 0000:03:00.0: [drm] intel_gt_set_wedged called from intel_gt_set_wedged_on_fini+0xd/0x30 [i915]
> <7> [518.471128] i915 0000:03:00.0: [drm:drm_client_release] drm_fb_helper
> <4> [518.501476] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6dbf: 0000 [#1] PREEMPT SMP NOPTI
> <4> [518.512264] CPU: 6 PID: 679 Comm: prometheus-node Tainted: G     U             6.9.0-rc1-CI_DRM_14482-g4a8fabcf2f1a+ #1
> <4> [518.522951] Hardware name: Intel Corporation Raptor Lake Client Platform/RPL-S ADP-S DDR5 UDIMM CRB, BIOS RPLSFWI1.R00.4221.A00.2305271351 05/27/2023
> <4> [518.536217] RIP: 0010:hwm_energy+0x2b/0x100 [i915]
> <4> [518.541159] Code: 48 89 e5 41 57 41 56 41 55 41 54 53 48 89 fb 48 83 e4 f0 48 83 ec 10 4c 8b 77 08 4c 8b 2f 8b 7f 34 48 89 74 24 08 85 ff 78 2b <45> 8b bd 54 02 00 00 49 8b 7e 18 e8 35 e4 ea ff 49 89 c4 48 85 c0
> <4> [518.559746] RSP: 0018:ffffc900077efd00 EFLAGS: 00010202
> <4> [518.564943] RAX: 0000000000000000 RBX: ffff8881e3078428 RCX: 0000000000000000
> <4> [518.572025] RDX: 0000000000000001 RSI: ffffc900077efda0 RDI: 000000006b6b6b6b
> <4> [518.579107] RBP: ffffc900077efd40 R08: ffffc900077efda0 R09: 0000000000000001
> <4> [518.586186] R10: 0000000000000001 R11: ffff88810c19aa00 R12: ffff888243e1a010
> <4> [518.593264] R13: 6b6b6b6b6b6b6b6b R14: 6b6b6b6b6b6b6b6b R15: ffff8881e3078428
> <4> [518.600343] FS:  00007f9def400700(0000) GS:ffff88888d100000(0000) knlGS:0000000000000000
> <4> [518.608373] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4> [518.614077] CR2: 0000564f19bff288 CR3: 0000000119f94000 CR4: 0000000000f50ef0
> <4> [518.621158] PKRU: 55555554
> <4> [518.623858] Call Trace:
> <4> [518.626303]  <TASK>
> <4> [518.628400]  ? __die_body+0x1a/0x60
> <4> [518.631881]  ? die_addr+0x38/0x60
> <4> [518.635182]  ? exc_general_protection+0x1a1/0x400
> <4> [518.639862]  ? asm_exc_general_protection+0x26/0x30
> <4> [518.644710]  ? hwm_energy+0x2b/0x100 [i915]
> <4> [518.649007]  hwm_read+0x9a/0x310 [i915]
> <4> [518.652953]  hwmon_attr_show+0x36/0x140
> <4> [518.656775]  dev_attr_show+0x15/0x60

I don't think that's a good example of what you are talking about in your 
commit description.  I haven't looked throughout the i915 code to find out who 
actually uses that i915->hwmon pointer and when, but while looking at the 
hwmon code that now fails on sysfs access, I haven't noticed any use of 
i915->hwmon.

I think that instead of dropping i915_hwmon_unregister() we have to actually 
unregister hwmon in the body of that function, which is called from 
i915_driver_unregister() intended for closing user interfaces, then called 
relatively early during driver unbind, so hwmon sysfs entries disappear before 
i915 structures, especially uncore used by hwmon code, are freed.

Thanks,
Janusz

> 
> Fixes: b3b088e28183 ("drm/i915/hwmon: Add HWMON infrastructure")
> Closes: https://gitlab.freedesktop.org/drm/intel/issues/10366
> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
> Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_driver.c | 2 --
>  drivers/gpu/drm/i915/i915_hwmon.c  | 5 -----
>  2 files changed, 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> index 4b9233c07a22..a95b455964b7 100644
> --- a/drivers/gpu/drm/i915/i915_driver.c
> +++ b/drivers/gpu/drm/i915/i915_driver.c
> @@ -660,8 +660,6 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
>  	for_each_gt(gt, dev_priv, i)
>  		intel_gt_driver_unregister(gt);
>  
> -	i915_hwmon_unregister(dev_priv);
> -
>  	i915_perf_unregister(dev_priv);
>  	i915_pmu_unregister(dev_priv);
>  
> diff --git a/drivers/gpu/drm/i915/i915_hwmon.c b/drivers/gpu/drm/i915/i915_hwmon.c
> index c9169e56b9a1..91f171752d34 100644
> --- a/drivers/gpu/drm/i915/i915_hwmon.c
> +++ b/drivers/gpu/drm/i915/i915_hwmon.c
> @@ -841,8 +841,3 @@ void i915_hwmon_register(struct drm_i915_private *i915)
>  			ddat_gt->hwmon_dev = hwmon_dev;
>  	}
>  }
> -
> -void i915_hwmon_unregister(struct drm_i915_private *i915)
> -{
> -	fetch_and_zero(&i915->hwmon);
> -}
> 

---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.
Spolka oswiadcza, ze posiada status duzego przedsiebiorcy w rozumieniu ustawy z dnia 8 marca 2013 r. o przeciwdzialaniu nadmiernym opoznieniom w transakcjach handlowych.

Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by others is strictly prohibited.
Nilawar, Badal March 27, 2024, 4:30 a.m. UTC | #2
On 27-03-2024 02:58, Krzysztofik, Janusz wrote:
> On Tuesday, 26 March 2024 13:48:38 CET Badal Nilawar wrote:
>> i915_hwmon and its resources are managed resources of i915 dev.
>> During i915 driver unregister flow the function i915_hwmon_unregister()
>> explicitly makes i915_hwmon resource NULL. This happen before
>> hwmon is actually unregistered. Doing so may cause UAF if hwmon
>> sysfs is being accessed:
>>
>> <7> [518.386591] i915 0000:03:00.0: [drm] intel_gt_set_wedged called from intel_gt_set_wedged_on_fini+0xd/0x30 [i915]
>> <7> [518.471128] i915 0000:03:00.0: [drm:drm_client_release] drm_fb_helper
>> <4> [518.501476] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6dbf: 0000 [#1] PREEMPT SMP NOPTI
>> <4> [518.512264] CPU: 6 PID: 679 Comm: prometheus-node Tainted: G     U             6.9.0-rc1-CI_DRM_14482-g4a8fabcf2f1a+ #1
>> <4> [518.522951] Hardware name: Intel Corporation Raptor Lake Client Platform/RPL-S ADP-S DDR5 UDIMM CRB, BIOS RPLSFWI1.R00.4221.A00.2305271351 05/27/2023
>> <4> [518.536217] RIP: 0010:hwm_energy+0x2b/0x100 [i915]
>> <4> [518.541159] Code: 48 89 e5 41 57 41 56 41 55 41 54 53 48 89 fb 48 83 e4 f0 48 83 ec 10 4c 8b 77 08 4c 8b 2f 8b 7f 34 48 89 74 24 08 85 ff 78 2b <45> 8b bd 54 02 00 00 49 8b 7e 18 e8 35 e4 ea ff 49 89 c4 48 85 c0
>> <4> [518.559746] RSP: 0018:ffffc900077efd00 EFLAGS: 00010202
>> <4> [518.564943] RAX: 0000000000000000 RBX: ffff8881e3078428 RCX: 0000000000000000
>> <4> [518.572025] RDX: 0000000000000001 RSI: ffffc900077efda0 RDI: 000000006b6b6b6b
>> <4> [518.579107] RBP: ffffc900077efd40 R08: ffffc900077efda0 R09: 0000000000000001
>> <4> [518.586186] R10: 0000000000000001 R11: ffff88810c19aa00 R12: ffff888243e1a010
>> <4> [518.593264] R13: 6b6b6b6b6b6b6b6b R14: 6b6b6b6b6b6b6b6b R15: ffff8881e3078428
>> <4> [518.600343] FS:  00007f9def400700(0000) GS:ffff88888d100000(0000) knlGS:0000000000000000
>> <4> [518.608373] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> <4> [518.614077] CR2: 0000564f19bff288 CR3: 0000000119f94000 CR4: 0000000000f50ef0
>> <4> [518.621158] PKRU: 55555554
>> <4> [518.623858] Call Trace:
>> <4> [518.626303]  <TASK>
>> <4> [518.628400]  ? __die_body+0x1a/0x60
>> <4> [518.631881]  ? die_addr+0x38/0x60
>> <4> [518.635182]  ? exc_general_protection+0x1a1/0x400
>> <4> [518.639862]  ? asm_exc_general_protection+0x26/0x30
>> <4> [518.644710]  ? hwm_energy+0x2b/0x100 [i915]
>> <4> [518.649007]  hwm_read+0x9a/0x310 [i915]
>> <4> [518.652953]  hwmon_attr_show+0x36/0x140
>> <4> [518.656775]  dev_attr_show+0x15/0x60
> 
> I don't think that's a good example of what you are talking about in your
> commit description.  I haven't looked throughout the i915 code to find out who
> actually uses that i915->hwmon pointer and when, but while looking at the
> hwmon code that now fails on sysfs access, I haven't noticed any use of
> i915->hwmon.

i915_hwmon is main structure and I think issue is ddat is invalid.

struct hwm_drvdata {
         struct i915_hwmon *hwmon;
         struct intel_uncore *uncore;
         struct device *hwmon_dev;
         struct hwm_energy_info ei;  /*  Energy info for energy1_input */
         char name[12];
         int gt_n;
         bool reset_in_progress;
         wait_queue_head_t waitq;
};

struct i915_hwmon {
         struct hwm_drvdata ddat;
         struct hwm_drvdata ddat_gt[I915_MAX_GT];
         struct mutex hwmon_lock;    /* counter overflow logic and rmw */
         struct hwm_reg rg;
         int scl_shift_power;
         int scl_shift_energy;
         int scl_shift_time;
};

(gdb) list *hwm_energy+0x2b
0x161f8b is in hwm_energy (drivers/gpu/drm/i915/i915_hwmon.c:134).
129             struct hwm_energy_info *ei = &ddat->ei;
130             intel_wakeref_t wakeref;
131             i915_reg_t rgaddr;
132             u32 reg_val;
133
134             if (ddat->gt_n >= 0)
135                     rgaddr = hwmon->rg.energy_status_tile;
136             else
137                     rgaddr = hwmon->rg.energy_status_all;
138

> 
> I think that instead of dropping i915_hwmon_unregister() we have to actually
> unregister hwmon in the body of that function, which is called from
> i915_driver_unregister() intended for closing user interfaces, then called
> relatively early during driver unbind, so hwmon sysfs entries disappear before
> i915 structures, especially uncore used by hwmon code, are freed.

You mean uncore are freed before hwmon get unregistered, I don't think 
so. uncore is also drm device managed resource so in sequence I think it 
should be freed after i915 dev managed resources freed.

  871 static int intel_gt_tile_setup(struct intel_gt *gt, phys_addr_t 
phys_addr)
  872 {
  873         int ret;
  874
  875         if (!gt_is_root(gt)) {
  876                 struct intel_uncore *uncore;
  877                 spinlock_t *irq_lock;
  878
  879                 uncore = drmm_kzalloc(&gt->i915->drm, 
sizeof(*uncore), GFP_KERNEL);
  880                 if (!uncore)
  881                         return -ENOMEM;
  882
  883                 irq_lock = drmm_kzalloc(&gt->i915->drm, 
sizeof(*irq_lock), GFP_KERNEL);
  884                 if (!irq_lock)
  885                         return -ENOMEM;
  886

Regards,
Badal
> 
> Thanks,
> Janusz
> 
>>
>> Fixes: b3b088e28183 ("drm/i915/hwmon: Add HWMON infrastructure")
>> Closes: https://gitlab.freedesktop.org/drm/intel/issues/10366
>> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
>> Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
>> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_driver.c | 2 --
>>   drivers/gpu/drm/i915/i915_hwmon.c  | 5 -----
>>   2 files changed, 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
>> index 4b9233c07a22..a95b455964b7 100644
>> --- a/drivers/gpu/drm/i915/i915_driver.c
>> +++ b/drivers/gpu/drm/i915/i915_driver.c
>> @@ -660,8 +660,6 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
>>   	for_each_gt(gt, dev_priv, i)
>>   		intel_gt_driver_unregister(gt);
>>   
>> -	i915_hwmon_unregister(dev_priv);
>> -
>>   	i915_perf_unregister(dev_priv);
>>   	i915_pmu_unregister(dev_priv);
>>   
>> diff --git a/drivers/gpu/drm/i915/i915_hwmon.c b/drivers/gpu/drm/i915/i915_hwmon.c
>> index c9169e56b9a1..91f171752d34 100644
>> --- a/drivers/gpu/drm/i915/i915_hwmon.c
>> +++ b/drivers/gpu/drm/i915/i915_hwmon.c
>> @@ -841,8 +841,3 @@ void i915_hwmon_register(struct drm_i915_private *i915)
>>   			ddat_gt->hwmon_dev = hwmon_dev;
>>   	}
>>   }
>> -
>> -void i915_hwmon_unregister(struct drm_i915_private *i915)
>> -{
>> -	fetch_and_zero(&i915->hwmon);
>> -}
>>
>
Krzysztofik, Janusz March 27, 2024, 9:15 a.m. UTC | #3
Hi Badal,

On Wednesday, 27 March 2024 05:30:17 CET Nilawar, Badal wrote:
> 
> On 27-03-2024 02:58, Krzysztofik, Janusz wrote:
> > On Tuesday, 26 March 2024 13:48:38 CET Badal Nilawar wrote:
> >> i915_hwmon and its resources are managed resources of i915 dev.
> >> During i915 driver unregister flow the function i915_hwmon_unregister()
> >> explicitly makes i915_hwmon resource NULL. This happen before
> >> hwmon is actually unregistered. Doing so may cause UAF if hwmon
> >> sysfs is being accessed:
> >>
> >> <7> [518.386591] i915 0000:03:00.0: [drm] intel_gt_set_wedged called from intel_gt_set_wedged_on_fini+0xd/0x30 [i915]
> >> <7> [518.471128] i915 0000:03:00.0: [drm:drm_client_release] drm_fb_helper
> >> <4> [518.501476] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6dbf: 0000 [#1] PREEMPT SMP NOPTI
> >> <4> [518.512264] CPU: 6 PID: 679 Comm: prometheus-node Tainted: G     U             6.9.0-rc1-CI_DRM_14482-g4a8fabcf2f1a+ #1
> >> <4> [518.522951] Hardware name: Intel Corporation Raptor Lake Client Platform/RPL-S ADP-S DDR5 UDIMM CRB, BIOS RPLSFWI1.R00.4221.A00.2305271351 05/27/2023
> >> <4> [518.536217] RIP: 0010:hwm_energy+0x2b/0x100 [i915]
> >> <4> [518.541159] Code: 48 89 e5 41 57 41 56 41 55 41 54 53 48 89 fb 48 83 e4 f0 48 83 ec 10 4c 8b 77 08 4c 8b 2f 8b 7f 34 48 89 74 24 08 85 ff 78 2b <45> 8b bd 54 02 00 00 49 8b 7e 18 e8 35 e4 ea ff 49 89 c4 48 85 c0
> >> <4> [518.559746] RSP: 0018:ffffc900077efd00 EFLAGS: 00010202
> >> <4> [518.564943] RAX: 0000000000000000 RBX: ffff8881e3078428 RCX: 0000000000000000
> >> <4> [518.572025] RDX: 0000000000000001 RSI: ffffc900077efda0 RDI: 000000006b6b6b6b
> >> <4> [518.579107] RBP: ffffc900077efd40 R08: ffffc900077efda0 R09: 0000000000000001
> >> <4> [518.586186] R10: 0000000000000001 R11: ffff88810c19aa00 R12: ffff888243e1a010
> >> <4> [518.593264] R13: 6b6b6b6b6b6b6b6b R14: 6b6b6b6b6b6b6b6b R15: ffff8881e3078428
> >> <4> [518.600343] FS:  00007f9def400700(0000) GS:ffff88888d100000(0000) knlGS:0000000000000000
> >> <4> [518.608373] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> <4> [518.614077] CR2: 0000564f19bff288 CR3: 0000000119f94000 CR4: 0000000000f50ef0
> >> <4> [518.621158] PKRU: 55555554
> >> <4> [518.623858] Call Trace:
> >> <4> [518.626303]  <TASK>
> >> <4> [518.628400]  ? __die_body+0x1a/0x60
> >> <4> [518.631881]  ? die_addr+0x38/0x60
> >> <4> [518.635182]  ? exc_general_protection+0x1a1/0x400
> >> <4> [518.639862]  ? asm_exc_general_protection+0x26/0x30
> >> <4> [518.644710]  ? hwm_energy+0x2b/0x100 [i915]
> >> <4> [518.649007]  hwm_read+0x9a/0x310 [i915]
> >> <4> [518.652953]  hwmon_attr_show+0x36/0x140
> >> <4> [518.656775]  dev_attr_show+0x15/0x60
> > 
> > I don't think that's a good example of what you are talking about in your
> > commit description.  I haven't looked throughout the i915 code to find out who
> > actually uses that i915->hwmon pointer and when, but while looking at the
> > hwmon code that now fails on sysfs access, I haven't noticed any use of
> > i915->hwmon.
> 
> i915_hwmon is main structure and I think issue is ddat is invalid.
> 
> struct hwm_drvdata {
>          struct i915_hwmon *hwmon;
>          struct intel_uncore *uncore;
>          struct device *hwmon_dev;
>          struct hwm_energy_info ei;  /*  Energy info for energy1_input */
>          char name[12];
>          int gt_n;
>          bool reset_in_progress;
>          wait_queue_head_t waitq;
> };
> 
> struct i915_hwmon {
>          struct hwm_drvdata ddat;
>          struct hwm_drvdata ddat_gt[I915_MAX_GT];
>          struct mutex hwmon_lock;    /* counter overflow logic and rmw */
>          struct hwm_reg rg;
>          int scl_shift_power;
>          int scl_shift_energy;
>          int scl_shift_time;
> };
> 
> (gdb) list *hwm_energy+0x2b
> 0x161f8b is in hwm_energy (drivers/gpu/drm/i915/i915_hwmon.c:134).
> 129             struct hwm_energy_info *ei = &ddat->ei;
> 130             intel_wakeref_t wakeref;
> 131             i915_reg_t rgaddr;
> 132             u32 reg_val;
> 133
> 134             if (ddat->gt_n >= 0)
> 135                     rgaddr = hwmon->rg.energy_status_tile;
> 136             else
> 137                     rgaddr = hwmon->rg.energy_status_all;
> 138

For me, that still doesn't explain why you think that i915->hwmon reset to 
NULL on i915 driver unregister can be the root cause of the reported UAF in 
hwmon sysfs and this patch is going to fix that UAF issue.  I can see no 
references to i915->hwmon in that code, and even then, that's not NULL what is 
reported here as non-canonical address.  The code is using a no longer valid 
pointer to an already freed (and poisoned) memory.

> 
> > 
> > I think that instead of dropping i915_hwmon_unregister() we have to actually
> > unregister hwmon in the body of that function, which is called from
> > i915_driver_unregister() intended for closing user interfaces, then called
> > relatively early during driver unbind, so hwmon sysfs entries disappear before
> > i915 structures, especially uncore used by hwmon code, are freed.
> 
> You mean uncore are freed before hwmon get unregistered, I don't think 
> so. uncore is also drm device managed resource so in sequence I think it 
> should be freed after i915 dev managed resources freed.

If both uncore and hwmon are managed resources of i915 device then how can you 
predict which of them is released first?

Thanks,
Janusz

> 
>   871 static int intel_gt_tile_setup(struct intel_gt *gt, phys_addr_t 
> phys_addr)
>   872 {
>   873         int ret;
>   874
>   875         if (!gt_is_root(gt)) {
>   876                 struct intel_uncore *uncore;
>   877                 spinlock_t *irq_lock;
>   878
>   879                 uncore = drmm_kzalloc(&gt->i915->drm, sizeof(*uncore), GFP_KERNEL);
>   880                 if (!uncore)
>   881                         return -ENOMEM;
>   882
>   883                 irq_lock = drmm_kzalloc(&gt->i915->drm, 
> sizeof(*irq_lock), GFP_KERNEL);
>   884                 if (!irq_lock)
>   885                         return -ENOMEM;
>   886
> 
> Regards,
> Badal
> > 
> > Thanks,
> > Janusz
> > 
> >>
> >> Fixes: b3b088e28183 ("drm/i915/hwmon: Add HWMON infrastructure")
> >> Closes: https://gitlab.freedesktop.org/drm/intel/issues/10366
> >> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
> >> Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> >> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
> >> ---
> >>   drivers/gpu/drm/i915/i915_driver.c | 2 --
> >>   drivers/gpu/drm/i915/i915_hwmon.c  | 5 -----
> >>   2 files changed, 7 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> >> index 4b9233c07a22..a95b455964b7 100644
> >> --- a/drivers/gpu/drm/i915/i915_driver.c
> >> +++ b/drivers/gpu/drm/i915/i915_driver.c
> >> @@ -660,8 +660,6 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
> >>   	for_each_gt(gt, dev_priv, i)
> >>   		intel_gt_driver_unregister(gt);
> >>   
> >> -	i915_hwmon_unregister(dev_priv);
> >> -
> >>   	i915_perf_unregister(dev_priv);
> >>   	i915_pmu_unregister(dev_priv);
> >>   
> >> diff --git a/drivers/gpu/drm/i915/i915_hwmon.c b/drivers/gpu/drm/i915/i915_hwmon.c
> >> index c9169e56b9a1..91f171752d34 100644
> >> --- a/drivers/gpu/drm/i915/i915_hwmon.c
> >> +++ b/drivers/gpu/drm/i915/i915_hwmon.c
> >> @@ -841,8 +841,3 @@ void i915_hwmon_register(struct drm_i915_private *i915)
> >>   			ddat_gt->hwmon_dev = hwmon_dev;
> >>   	}
> >>   }
> >> -
> >> -void i915_hwmon_unregister(struct drm_i915_private *i915)
> >> -{
> >> -	fetch_and_zero(&i915->hwmon);
> >> -}
> >>
> > 
> 

---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.
Spolka oswiadcza, ze posiada status duzego przedsiebiorcy w rozumieniu ustawy z dnia 8 marca 2013 r. o przeciwdzialaniu nadmiernym opoznieniom w transakcjach handlowych.

Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by others is strictly prohibited.
Dixit, Ashutosh March 27, 2024, 7:37 p.m. UTC | #4
On Tue, 26 Mar 2024 05:48:38 -0700, Badal Nilawar wrote:
>

Hi Badal,

> i915_hwmon and its resources are managed resources of i915 dev.
> During i915 driver unregister flow the function i915_hwmon_unregister()
> explicitly makes i915_hwmon resource NULL. This happen before
> hwmon is actually unregistered. Doing so may cause UAF if hwmon
> sysfs is being accessed:

I don't agree with this patch. For even if we remove setting to NULL, it
doesn't explain what we see, which is that the devm_kzalloc'd i915->hwmon
has been *freed* (since ddat (i915->hwmon->ddat) is 0x6b6b6b6b6b6b6dbf):

	(gdb) list *hwm_energy+0x2b
	0x161f8b is in hwm_energy (drivers/gpu/drm/i915/i915_hwmon.c:134).
	129             struct hwm_energy_info *ei = &ddat->ei;
	130             intel_wakeref_t wakeref;
	131             i915_reg_t rgaddr;
	132             u32 reg_val;
	133
	134             if (ddat->gt_n >= 0)
	135                     rgaddr = hwmon->rg.energy_status_tile;
	136             else
	137                     rgaddr = hwmon->rg.energy_status_all;
	138

If we don't have i915_hwmon_unregister equivalent in xe, that's because xe
has not implemented equivalents of i915_hwmon_power_max_disable and
i915_hwmon_power_max_restore, where the check is used.

Also, we should verify any fix before sending a patch out, either via local
repro (say use a script to read hwmon energy file while running the
selftests) or CI. Though in this case CI doesn't show any failures but I am
wondering if that is only by chance.

Fwiu, the issue is somehow being caused by ddat being *freed* before hwmon
sysfs has disappeared, at module unload time (since selftests use module
unload), allowing prometheus-node process to call into the energy read
sysfs and cause the crash.

Thanks.
--
Ashutosh


>
> <7> [518.386591] i915 0000:03:00.0: [drm] intel_gt_set_wedged called from intel_gt_set_wedged_on_fini+0xd/0x30 [i915]
> <7> [518.471128] i915 0000:03:00.0: [drm:drm_client_release] drm_fb_helper
> <4> [518.501476] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6dbf: 0000 [#1] PREEMPT SMP NOPTI
> <4> [518.512264] CPU: 6 PID: 679 Comm: prometheus-node Tainted: G     U             6.9.0-rc1-CI_DRM_14482-g4a8fabcf2f1a+ #1
> <4> [518.522951] Hardware name: Intel Corporation Raptor Lake Client Platform/RPL-S ADP-S DDR5 UDIMM CRB, BIOS RPLSFWI1.R00.4221.A00.2305271351 05/27/2023
> <4> [518.536217] RIP: 0010:hwm_energy+0x2b/0x100 [i915]
> <4> [518.541159] Code: 48 89 e5 41 57 41 56 41 55 41 54 53 48 89 fb 48 83 e4 f0 48 83 ec 10 4c 8b 77 08 4c 8b 2f 8b 7f 34 48 89 74 24 08 85 ff 78 2b <45> 8b bd 54 02 00 00 49 8b 7e 18 e8 35 e4 ea ff 49 89 c4 48 85 c0
> <4> [518.559746] RSP: 0018:ffffc900077efd00 EFLAGS: 00010202
> <4> [518.564943] RAX: 0000000000000000 RBX: ffff8881e3078428 RCX: 0000000000000000
> <4> [518.572025] RDX: 0000000000000001 RSI: ffffc900077efda0 RDI: 000000006b6b6b6b
> <4> [518.579107] RBP: ffffc900077efd40 R08: ffffc900077efda0 R09: 0000000000000001
> <4> [518.586186] R10: 0000000000000001 R11: ffff88810c19aa00 R12: ffff888243e1a010
> <4> [518.593264] R13: 6b6b6b6b6b6b6b6b R14: 6b6b6b6b6b6b6b6b R15: ffff8881e3078428
> <4> [518.600343] FS:  00007f9def400700(0000) GS:ffff88888d100000(0000) knlGS:0000000000000000
> <4> [518.608373] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4> [518.614077] CR2: 0000564f19bff288 CR3: 0000000119f94000 CR4: 0000000000f50ef0
> <4> [518.621158] PKRU: 55555554
> <4> [518.623858] Call Trace:
> <4> [518.626303]  <TASK>
> <4> [518.628400]  ? __die_body+0x1a/0x60
> <4> [518.631881]  ? die_addr+0x38/0x60
> <4> [518.635182]  ? exc_general_protection+0x1a1/0x400
> <4> [518.639862]  ? asm_exc_general_protection+0x26/0x30
> <4> [518.644710]  ? hwm_energy+0x2b/0x100 [i915]
> <4> [518.649007]  hwm_read+0x9a/0x310 [i915]
> <4> [518.652953]  hwmon_attr_show+0x36/0x140
> <4> [518.656775]  dev_attr_show+0x15/0x60
>
> Fixes: b3b088e28183 ("drm/i915/hwmon: Add HWMON infrastructure")
> Closes: https://gitlab.freedesktop.org/drm/intel/issues/10366
> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
> Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_driver.c | 2 --
>  drivers/gpu/drm/i915/i915_hwmon.c  | 5 -----
>  2 files changed, 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> index 4b9233c07a22..a95b455964b7 100644
> --- a/drivers/gpu/drm/i915/i915_driver.c
> +++ b/drivers/gpu/drm/i915/i915_driver.c
> @@ -660,8 +660,6 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
>	for_each_gt(gt, dev_priv, i)
>		intel_gt_driver_unregister(gt);
>
> -	i915_hwmon_unregister(dev_priv);
> -
>	i915_perf_unregister(dev_priv);
>	i915_pmu_unregister(dev_priv);
>
> diff --git a/drivers/gpu/drm/i915/i915_hwmon.c b/drivers/gpu/drm/i915/i915_hwmon.c
> index c9169e56b9a1..91f171752d34 100644
> --- a/drivers/gpu/drm/i915/i915_hwmon.c
> +++ b/drivers/gpu/drm/i915/i915_hwmon.c
> @@ -841,8 +841,3 @@ void i915_hwmon_register(struct drm_i915_private *i915)
>			ddat_gt->hwmon_dev = hwmon_dev;
>	}
>  }
> -
> -void i915_hwmon_unregister(struct drm_i915_private *i915)
> -{
> -	fetch_and_zero(&i915->hwmon);
> -}
> --
> 2.25.1
>
Dixit, Ashutosh March 27, 2024, 8:37 p.m. UTC | #5
On Wed, 27 Mar 2024 02:15:27 -0700, Krzysztofik, Janusz wrote:
>

Hi Janusz,

> For me, that still doesn't explain why you think that i915->hwmon reset to
> NULL on i915 driver unregister can be the root cause of the reported UAF in
> hwmon sysfs and this patch is going to fix that UAF issue.  I can see no
> references to i915->hwmon in that code, and even then, that's not NULL what is
> reported here as non-canonical address.  The code is using a no longer valid
> pointer to an already freed (and poisoned) memory.

Correct, I said basically the same thing in my reply to the patch. That the
patch doesn't explain that ddat seems to have been freed and poisoned.

> > > I think that instead of dropping i915_hwmon_unregister() we have to actually
> > > unregister hwmon in the body of that function, which is called from
> > > i915_driver_unregister() intended for closing user interfaces, then called
> > > relatively early during driver unbind, so hwmon sysfs entries disappear before
> > > i915 structures, especially uncore used by hwmon code, are freed.
> >
> > You mean uncore are freed before hwmon get unregistered, I don't think
> > so. uncore is also drm device managed resource so in sequence I think it
> > should be freed after i915 dev managed resources freed.
>
> If both uncore and hwmon are managed resources of i915 device then how can you
> predict which of them is released first?

Look at __hwmon_device_register. Here we see:

	hdev->parent = dev

So the hwmon device is a child device of the drm device (against which ddat
is devm_kzalloc'd). Since a child device holds a reference against the
parent (device_add() has get_device(dev->parent)), I would expect the hwmon
device to disappear before the parent drm device. And I am assuming hwmon
sysfs is linked to the hwmon device, so sysfs should disappear before ddat
getting freed. But apparently this is not what is happening, so there's
still something we are missing.

Thanks.
--
Ashutosh
Krzysztofik, Janusz March 28, 2024, 10:06 a.m. UTC | #6
Hi Ashutosh,

On Wednesday, 27 March 2024 21:37:19 CET Dixit, Ashutosh wrote:
> On Wed, 27 Mar 2024 02:15:27 -0700, Krzysztofik, Janusz wrote:
> >
> 
> Hi Janusz,
> 
> > For me, that still doesn't explain why you think that i915->hwmon reset to
> > NULL on i915 driver unregister can be the root cause of the reported UAF in
> > hwmon sysfs and this patch is going to fix that UAF issue.  I can see no
> > references to i915->hwmon in that code, and even then, that's not NULL what is
> > reported here as non-canonical address.  The code is using a no longer valid
> > pointer to an already freed (and poisoned) memory.
> 
> Correct, I said basically the same thing in my reply to the patch. That the
> patch doesn't explain that ddat seems to have been freed and poisoned.
> 
> > > > I think that instead of dropping i915_hwmon_unregister() we have to actually
> > > > unregister hwmon in the body of that function, which is called from
> > > > i915_driver_unregister() intended for closing user interfaces, then called
> > > > relatively early during driver unbind, so hwmon sysfs entries disappear before
> > > > i915 structures, especially uncore used by hwmon code, are freed.
> > >
> > > You mean uncore are freed before hwmon get unregistered, I don't think
> > > so. uncore is also drm device managed resource so in sequence I think it
> > > should be freed after i915 dev managed resources freed.
> >
> > If both uncore and hwmon are managed resources of i915 device then how can you
> > predict which of them is released first?
> 
> Look at __hwmon_device_register. Here we see:
> 
> 	hdev->parent = dev
> 
> So the hwmon device is a child device of the drm device (against which ddat
> is devm_kzalloc'd). Since a child device holds a reference against the
> parent (device_add() has get_device(dev->parent)), I would expect the hwmon
> device to disappear before the parent drm device. 

OK, but the parent device has a lot of managed resources, not only hwmon, and 
for me hwmon apparently depends on at least one of those resources, e.g., a 
structure with pointers to hardware accessors or hardware registers.  Release 
of all those i915 resources is not atomic, then it may happen that a resource 
which hwmon depends on is released while the hwmon resource still active and 
has its sysfs interface exposed, I believe.

> And I am assuming hwmon
> sysfs is linked to the hwmon device, so sysfs should disappear before ddat
> getting freed. But apparently this is not what is happening, 

If that occurred the actual scenario of the failure, i.e., from still user 
reachable code of our sysfs accessors, dev_get_drvdata(dev) pointing to the 
ddat field of an already freed hwmon structure, then I would blame hwmon core 
for that, not our usage of it.

Thanks,
Janusz

> so there's
> still something we are missing.
> 
> Thanks.
> --
> Ashutosh
> 

---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.
Spolka oswiadcza, ze posiada status duzego przedsiebiorcy w rozumieniu ustawy z dnia 8 marca 2013 r. o przeciwdzialaniu nadmiernym opoznieniom w transakcjach handlowych.

Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by others is strictly prohibited.
Dixit, Ashutosh March 29, 2024, 3:47 a.m. UTC | #7
On Wed, 27 Mar 2024 13:37:19 -0700, Dixit, Ashutosh wrote:
>
> On Wed, 27 Mar 2024 02:15:27 -0700, Krzysztofik, Janusz wrote:
> >
>
> Hi Janusz,
>
> > For me, that still doesn't explain why you think that i915->hwmon reset to
> > NULL on i915 driver unregister can be the root cause of the reported UAF in
> > hwmon sysfs and this patch is going to fix that UAF issue.  I can see no
> > references to i915->hwmon in that code, and even then, that's not NULL what is
> > reported here as non-canonical address.  The code is using a no longer valid
> > pointer to an already freed (and poisoned) memory.
>
> Correct, I said basically the same thing in my reply to the patch. That the
> patch doesn't explain that ddat seems to have been freed and poisoned.
>
> > > > I think that instead of dropping i915_hwmon_unregister() we have to actually
> > > > unregister hwmon in the body of that function, which is called from
> > > > i915_driver_unregister() intended for closing user interfaces, then called
> > > > relatively early during driver unbind, so hwmon sysfs entries disappear before
> > > > i915 structures, especially uncore used by hwmon code, are freed.
> > >
> > > You mean uncore are freed before hwmon get unregistered, I don't think
> > > so. uncore is also drm device managed resource so in sequence I think it
> > > should be freed after i915 dev managed resources freed.
> >
> > If both uncore and hwmon are managed resources of i915 device then how can you
> > predict which of them is released first?
>
> Look at __hwmon_device_register. Here we see:
>
>	hdev->parent = dev
>
> So the hwmon device is a child device of the drm device (against which ddat
> is devm_kzalloc'd). Since a child device holds a reference against the
> parent (device_add() has get_device(dev->parent)), I would expect the hwmon
> device to disappear before the parent drm device. And I am assuming hwmon
> sysfs is linked to the hwmon device, so sysfs should disappear before ddat
> getting freed. But apparently this is not what is happening, so there's
> still something we are missing.

I looked into this a bit more. It doesn't seem to have to do with device
references. Basically, managed resources are maintained in a list and they
are removed in a reverse order from which they are added. See
release_nodes(). I believe (haven't really checked it but it looks to be
the case from the logs below) the call stack during release is:

device_release_driver ->
 __device_release_driver ->
  device_unbind_cleanup ->
   devres_release_all ->
    release_nodes

With CONFIG_DEBUG_DEVRES=y and 'echo 1 > /sys/module/devres/parameters/log'
and running the selftests I always see:

Alloc:

[  878.406556] i915 0000:03:00.0: DEVRES ADD ffff88815d8d4000 devm_kzalloc_release (616 bytes)
[  878.407632] i915 0000:03:00.0: DEVRES ADD ffff888199e6dd40 devm_hwmon_release (8 bytes)

Dealloc:

[  879.317635] i915 0000:03:00.0: DEVRES REL ffff888199e6dd40 devm_hwmon_release (8 bytes)
[  879.318261] i915 0000:03:00.0: DEVRES REL ffff88815d8d4000 devm_kzalloc_release (616 bytes)

So from the above the memory is always freed after hwmon is
unregistered. 616 bytes is 'sizeof(struct i915_hwmon)'.

So the code seems to be correct. Unless something else is going on these CI
systems where we see this crash.

If we can repro the crash locally (as I described in my other email) that
would be the best. Then we can verify any fix. If we cannot repro it
locally, then we'll need to think of something else. Basically the
direction would be to see if we eliminate devm_ and if that makes the issue
go away.

Thanks.
--
Ashutosh
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index 4b9233c07a22..a95b455964b7 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -660,8 +660,6 @@  static void i915_driver_unregister(struct drm_i915_private *dev_priv)
 	for_each_gt(gt, dev_priv, i)
 		intel_gt_driver_unregister(gt);
 
-	i915_hwmon_unregister(dev_priv);
-
 	i915_perf_unregister(dev_priv);
 	i915_pmu_unregister(dev_priv);
 
diff --git a/drivers/gpu/drm/i915/i915_hwmon.c b/drivers/gpu/drm/i915/i915_hwmon.c
index c9169e56b9a1..91f171752d34 100644
--- a/drivers/gpu/drm/i915/i915_hwmon.c
+++ b/drivers/gpu/drm/i915/i915_hwmon.c
@@ -841,8 +841,3 @@  void i915_hwmon_register(struct drm_i915_private *i915)
 			ddat_gt->hwmon_dev = hwmon_dev;
 	}
 }
-
-void i915_hwmon_unregister(struct drm_i915_private *i915)
-{
-	fetch_and_zero(&i915->hwmon);
-}