diff mbox series

[v2,1/1] drm/i915/reset: Fix error_state_read ptr + offset use

Message ID 20220226062732.747531-2-alan.previn.teres.alexis@intel.com (mailing list archive)
State New, archived
Headers show
Series Fix i915 error_state_read ptr use | expand

Commit Message

Alan Previn Feb. 26, 2022, 6:27 a.m. UTC
Fix our pointer offset usage in error_state_read
when there is no i915_gpu_coredump but buf offset
is non-zero.

This fixes a kernel page fault can happen when
multiple tests are running concurrently in a loop
and one is producing engine resets and consuming
the i915 error_state dump while the other is
forcing full GT resets. (takes a while to trigger).

The dmesg call trace:

5014 [ 5590.803000] BUG: unable to handle page fault for address: ffffffffa0b0e000
5015 [ 5590.803009] #PF: supervisor read access in kernel mode
5016 [ 5590.803013] #PF: error_code(0x0000) - not-present page
5017 [ 5590.803016] PGD 5814067 P4D 5814067 PUD 5815063 PMD 109de4067 PTE 0
5018 [ 5590.803022] Oops: 0000 [#1] PREEMPT SMP NOPTI
5019 [ 5590.803026] CPU: 5 PID: 13656 Comm: i915_hangman Tainted: G U 5.17.0-rc5-
			ups69-guc-err-capt-rev6+ #136
5020 [ 5590.803033] Hardware name: Intel Corporation Alder Lake Client Platform/
			AlderLake-M LP4x RVP, BIOS ADLPFWI1.R00.3031.A02.2201171222
			01/17/2022
5021 [ 5590.803039] RIP: 0010:memcpy_erms+0x6/0x10
5022 [ 5590.803045] Code: fe ff ff cc eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83
			e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89
			d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe
5023 [ 5590.803054] RSP: 0018:ffffc90003a8fdf0 EFLAGS: 00010282
5024 [ 5590.803057] RAX: ffff888107ee9000 RBX: ffff888108cb1a00 RCX: 0000000000000f8f
5025 [ 5590.803061] RDX: 0000000000001000 RSI: ffffffffa0b0e000 RDI: ffff888107ee9071
5026 [ 5590.803065] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
5027 [ 5590.803069] R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000019
5028 [ 5590.803073] R13: 0000000000174fff R14: 0000000000001000 R15: ffff888107ee9000
5029 [ 5590.803077] FS: 00007f62a99bee80(0000) GS:ffff88849f880000(0000) knlGS:0000000000000000
5030 [ 5590.803082] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
5031 [ 5590.803085] CR2: ffffffffa0b0e000 CR3: 000000010a1a8004 CR4: 0000000000770ee0
5032 [ 5590.803089] PKRU: 55555554
5033 [ 5590.803091] Call Trace:
5034 [ 5590.803093] <TASK>
5035 [ 5590.803096] error_state_read+0xa1/0xd0 [i915]
5036 [ 5590.803175] kernfs_fop_read_iter+0xb2/0x1b0
5037 [ 5590.803180] new_sync_read+0x116/0x1a0
5038 [ 5590.803185] vfs_read+0x114/0x1b0
5039 [ 5590.803189] ksys_read+0x63/0xe0
5040 [ 5590.803193] do_syscall_64+0x38/0xc0
5041 [ 5590.803197] entry_SYSCALL_64_after_hwframe+0x44/0xae
5042 [ 5590.803201] RIP: 0033:0x7f62aaea5912
5043 [ 5590.803204] Code: c0 e9 b2 fe ff ff 50 48 8d 3d 5a b9 0c 00 e8 05 19 02 00 0f 1f 44 00
			00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0
			ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
5044 [ 5590.803213] RSP: 002b:00007fff5b659ae8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
5045 [ 5590.803218] RAX: ffffffffffffffda RBX: 0000000000100000 RCX: 00007f62aaea5912
5046 [ 5590.803221] RDX: 000000000008b000 RSI: 00007f62a8c4000f RDI: 0000000000000006
5047 [ 5590.803225] RBP: 00007f62a8bcb00f R08: 0000000000200010 R09: 0000000000101000
5048 [ 5590.803229] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000006
5049 [ 5590.803233] R13: 0000000000075000 R14: 00007f62a8acb010 R15: 0000000000200000
5050 [ 5590.803238] </TASK>
5051 [ 5590.803240] Modules linked in: i915 ttm drm_buddy drm_dp_helper drm_kms_helper syscopyarea
			sysfillrect sysimgblt fb_sys_fops prime_numbers nfnetlink br_netfilter overlay
			mei_pxp mei_hdcp x86_pkg_temp_thermal coretemp kvm_intel snd_hda_codec_hdmi
			snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me
			mei fuse ip_tables x_tables crct10dif_pclmul e1000e crc32_pclmul ptp i2c_i801
			ghash_clmulni_intel i2c_smbus pps_core [last unloa ded: ttm]
5052 [ 5590.803277] CR2: ffffffffa0b0e000
5053 [ 5590.803280] ---[ end trace 0000000000000000 ]---

Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
---
 drivers/gpu/drm/i915/i915_sysfs.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Comments

John Harrison March 1, 2022, 9:37 p.m. UTC | #1
On 2/25/2022 22:27, Alan Previn wrote:
> Fix our pointer offset usage in error_state_read
> when there is no i915_gpu_coredump but buf offset
> is non-zero.
>
> This fixes a kernel page fault can happen when
> multiple tests are running concurrently in a loop
> and one is producing engine resets and consuming
> the i915 error_state dump while the other is
> forcing full GT resets. (takes a while to trigger).
Does need a fixes tag given that it is fixing a bug in an earlier patch. 
Especially when it is a kernel BUG.
E.g.:
13:23> dim fixes 0e39037b31655
Fixes: 0e39037b3165 ("drm/i915: Cache the error string")
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v5.0+


> The dmesg call trace:
>
> 5014 [ 5590.803000] BUG: unable to handle page fault for address: ffffffffa0b0e000
> 5015 [ 5590.803009] #PF: supervisor read access in kernel mode
> 5016 [ 5590.803013] #PF: error_code(0x0000) - not-present page
> 5017 [ 5590.803016] PGD 5814067 P4D 5814067 PUD 5815063 PMD 109de4067 PTE 0
> 5018 [ 5590.803022] Oops: 0000 [#1] PREEMPT SMP NOPTI
> 5019 [ 5590.803026] CPU: 5 PID: 13656 Comm: i915_hangman Tainted: G U 5.17.0-rc5-
> 			ups69-guc-err-capt-rev6+ #136
> 5020 [ 5590.803033] Hardware name: Intel Corporation Alder Lake Client Platform/
> 			AlderLake-M LP4x RVP, BIOS ADLPFWI1.R00.3031.A02.2201171222
> 			01/17/2022
> 5021 [ 5590.803039] RIP: 0010:memcpy_erms+0x6/0x10
> 5022 [ 5590.803045] Code: fe ff ff cc eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83
> 			e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89
> 			d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe
> 5023 [ 5590.803054] RSP: 0018:ffffc90003a8fdf0 EFLAGS: 00010282
> 5024 [ 5590.803057] RAX: ffff888107ee9000 RBX: ffff888108cb1a00 RCX: 0000000000000f8f
> 5025 [ 5590.803061] RDX: 0000000000001000 RSI: ffffffffa0b0e000 RDI: ffff888107ee9071
> 5026 [ 5590.803065] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
> 5027 [ 5590.803069] R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000019
> 5028 [ 5590.803073] R13: 0000000000174fff R14: 0000000000001000 R15: ffff888107ee9000
> 5029 [ 5590.803077] FS: 00007f62a99bee80(0000) GS:ffff88849f880000(0000) knlGS:0000000000000000
> 5030 [ 5590.803082] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 5031 [ 5590.803085] CR2: ffffffffa0b0e000 CR3: 000000010a1a8004 CR4: 0000000000770ee0
> 5032 [ 5590.803089] PKRU: 55555554
> 5033 [ 5590.803091] Call Trace:
> 5034 [ 5590.803093] <TASK>
> 5035 [ 5590.803096] error_state_read+0xa1/0xd0 [i915]
> 5036 [ 5590.803175] kernfs_fop_read_iter+0xb2/0x1b0
> 5037 [ 5590.803180] new_sync_read+0x116/0x1a0
> 5038 [ 5590.803185] vfs_read+0x114/0x1b0
> 5039 [ 5590.803189] ksys_read+0x63/0xe0
> 5040 [ 5590.803193] do_syscall_64+0x38/0xc0
> 5041 [ 5590.803197] entry_SYSCALL_64_after_hwframe+0x44/0xae
> 5042 [ 5590.803201] RIP: 0033:0x7f62aaea5912
> 5043 [ 5590.803204] Code: c0 e9 b2 fe ff ff 50 48 8d 3d 5a b9 0c 00 e8 05 19 02 00 0f 1f 44 00
> 			00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0
> 			ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
> 5044 [ 5590.803213] RSP: 002b:00007fff5b659ae8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> 5045 [ 5590.803218] RAX: ffffffffffffffda RBX: 0000000000100000 RCX: 00007f62aaea5912
> 5046 [ 5590.803221] RDX: 000000000008b000 RSI: 00007f62a8c4000f RDI: 0000000000000006
> 5047 [ 5590.803225] RBP: 00007f62a8bcb00f R08: 0000000000200010 R09: 0000000000101000
> 5048 [ 5590.803229] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000006
> 5049 [ 5590.803233] R13: 0000000000075000 R14: 00007f62a8acb010 R15: 0000000000200000
> 5050 [ 5590.803238] </TASK>
> 5051 [ 5590.803240] Modules linked in: i915 ttm drm_buddy drm_dp_helper drm_kms_helper syscopyarea
> 			sysfillrect sysimgblt fb_sys_fops prime_numbers nfnetlink br_netfilter overlay
> 			mei_pxp mei_hdcp x86_pkg_temp_thermal coretemp kvm_intel snd_hda_codec_hdmi
> 			snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me
> 			mei fuse ip_tables x_tables crct10dif_pclmul e1000e crc32_pclmul ptp i2c_i801
> 			ghash_clmulni_intel i2c_smbus pps_core [last unloa ded: ttm]
> 5052 [ 5590.803277] CR2: ffffffffa0b0e000
> 5053 [ 5590.803280] ---[ end trace 0000000000000000 ]---
>
> Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_sysfs.c | 8 +++++---
>   1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
> index a4d1759375b9..c40e51298066 100644
> --- a/drivers/gpu/drm/i915/i915_sysfs.c
> +++ b/drivers/gpu/drm/i915/i915_sysfs.c
> @@ -432,7 +432,7 @@ static ssize_t error_state_read(struct file *filp, struct kobject *kobj,
>   	struct device *kdev = kobj_to_dev(kobj);
>   	struct drm_i915_private *i915 = kdev_minor_to_i915(kdev);
>   	struct i915_gpu_coredump *gpu;
> -	ssize_t ret;
> +	ssize_t ret = 0;
>   
>   	gpu = i915_first_error_state(i915);
>   	if (IS_ERR(gpu)) {
> @@ -444,8 +444,10 @@ static ssize_t error_state_read(struct file *filp, struct kobject *kobj,
>   		const char *str = "No error state collected\n";
>   		size_t len = strlen(str);
>   
> -		ret = min_t(size_t, count, len - off);
> -		memcpy(buf, str + off, ret);
> +		if (off < len) {
> +			ret = min_t(size_t, count, len - off);
> +			memcpy(buf, str + off, ret);
> +		}
So the problem is that the error dump disappeared while it was being 
read? That seems like it cause more problems than just this out-of-range 
access. E.g. what if the dump was freed and a new one created? The 
entity doing the partial reads would end up with half of one dump and 
half of the next.

I think we should at least add a FIXME comment to the code that fast 
recycling dumps could cause inconsistent sysfs reads.

I guess you could do something like add a unique id to the gpu coredump 
structure. Then, when a partial sysfs read occurs starting at zero (i.e. 
a new read), take a note of the id somewhere (e.g. inside the i915 
structure). When the next non-zero read comes in, compare the save id 
with the current coredump's id and return an error if there is a mis-match.

Not sure if this would be viewed as an important enough problem to be 
worth fixing. I'd be happy with just a FIXME comment for now.

I would also change the test to be 'if (off)' rather than 'if (off < 
len)'. Technically, the user could have read the first 10 bytes of a 
coredump and then gets "tate collected\n" as the remainder, for example. 
If we ensure that 'off' is zero then we know that this is a fresh read 
from scratch.

John.


>   	}
>   
>   	return ret;
Alan Previn March 8, 2022, 7:47 p.m. UTC | #2
On 3/1/2022 1:37 PM, John Harrison wrote:
> On 2/25/2022 22:27, Alan Previn wrote:
>> ...
>> This fixes a kernel page fault can happen when
>> multiple tests are running concurrently in a loop
>> and one is producing engine resets and consuming
>> the i915 error_state dump while the other is
>> forcing full GT resets. (takes a while to trigger).
> Does need a fixes tag given that it is fixing a bug in an earlier 
> patch. Especially when it is a kernel BUG.
> E.g.:
> 13:23> dim fixes 0e39037b31655
> Fixes: 0e39037b3165 ("drm/i915: Cache the error string")
>
okay, will add that.

diff --git a/drivers/gpu/drm/i915/i915_sysfs.c 
b/drivers/gpu/drm/i915/i915_sysfs.c
>> index a4d1759375b9..c40e51298066 100644
>> --- a/drivers/gpu/drm/i915/i915_sysfs.c
>> +++ b/drivers/gpu/drm/i915/i915_sysfs.c
>> @@ -432,7 +432,7 @@ static ssize_t error_state_read(struct file 
>> *filp, struct kobject *kobj,
>>       struct device *kdev = kobj_to_dev(kobj);
>>       struct drm_i915_private *i915 = kdev_minor_to_i915(kdev);
>>       struct i915_gpu_coredump *gpu;
>> -    ssize_t ret;
>> +    ssize_t ret = 0;
>>         gpu = i915_first_error_state(i915);
>>       if (IS_ERR(gpu)) {
>> @@ -444,8 +444,10 @@ static ssize_t error_state_read(struct file 
>> *filp, struct kobject *kobj,
>>           const char *str = "No error state collected\n";
>>           size_t len = strlen(str);
>>   -        ret = min_t(size_t, count, len - off);
>> -        memcpy(buf, str + off, ret);
>> +        if (off < len) {
>> +            ret = min_t(size_t, count, len - off);
>> +            memcpy(buf, str + off, ret);
>> +        }
> So the problem is that the error dump disappeared while it was being 
> read? That seems like it cause more problems than just this 
> out-of-range access. E.g. what if the dump was freed and a new one 
> created? The entity doing the partial reads would end up with half of 
> one dump and half of the next.
>
> I think we should at least add a FIXME comment to the code that fast 
> recycling dumps could cause inconsistent sysfs reads.
>
> I guess you could do something like add a unique id to the gpu 
> coredump structure. Then, when a partial sysfs read occurs starting at 
> zero (i.e. a new read), take a note of the id somewhere (e.g. inside 
> the i915 structure). When the next non-zero read comes in, compare the 
> save id with the current coredump's id and return an error if there is 
> a mis-match.
>
> Not sure if this would be viewed as an important enough problem to be 
> worth fixing. I'd be happy with just a FIXME comment for now.
For now I shall add a FIXME additional checks might impact IGT's that 
currently dump and check the error state. I would assume with that 
additional check in kernel, we would need to return a specific error 
value like -ENODATA or -ENOLINK and handle it accordingly in the igt.
>
> I would also change the test to be 'if (off)' rather than 'if (off < 
> len)'. Technically, the user could have read the first 10 bytes of a 
> coredump and then gets "tate collected\n" as the remainder, for 
> example. If we ensure that 'off' is zero then we know that this is a 
> fresh read from scratch.
>
> John.
>
I'm a little confused, did u mean: in the case we dont have a gpu-state 
to report, and then the user offset is zero (i.e. "if (!off)" ) then we 
copy the string vs if we dont have a gpu-state to report and the 
user-offset is non-zero, then we return an error (like the -ENOLINK?). 
If thats what you meant, then it does mean we are assuming that (in the 
case we dont have a gpu-state) the user will never come in with a 
first-time-read where the user's buffer size of less than the string 
length and be expected continue to read the rest of the string-length. 
This i guess is okay since even 6 chars are enough to say "No err".  :)
>>       }
>>         return ret;
>
John Harrison March 10, 2022, 1:18 a.m. UTC | #3
On 3/8/2022 11:47, Teres Alexis, Alan Previn wrote:
> On 3/1/2022 1:37 PM, John Harrison wrote:
>> On 2/25/2022 22:27, Alan Previn wrote:
>>> ...
>>> This fixes a kernel page fault can happen when
>>> multiple tests are running concurrently in a loop
>>> and one is producing engine resets and consuming
>>> the i915 error_state dump while the other is
>>> forcing full GT resets. (takes a while to trigger).
>> Does need a fixes tag given that it is fixing a bug in an earlier 
>> patch. Especially when it is a kernel BUG.
>> E.g.:
>> 13:23> dim fixes 0e39037b31655
>> Fixes: 0e39037b3165 ("drm/i915: Cache the error string")
>>
> okay, will add that.
>
> diff --git a/drivers/gpu/drm/i915/i915_sysfs.c 
> b/drivers/gpu/drm/i915/i915_sysfs.c
>>> index a4d1759375b9..c40e51298066 100644
>>> --- a/drivers/gpu/drm/i915/i915_sysfs.c
>>> +++ b/drivers/gpu/drm/i915/i915_sysfs.c
>>> @@ -432,7 +432,7 @@ static ssize_t error_state_read(struct file 
>>> *filp, struct kobject *kobj,
>>>       struct device *kdev = kobj_to_dev(kobj);
>>>       struct drm_i915_private *i915 = kdev_minor_to_i915(kdev);
>>>       struct i915_gpu_coredump *gpu;
>>> -    ssize_t ret;
>>> +    ssize_t ret = 0;
>>>         gpu = i915_first_error_state(i915);
>>>       if (IS_ERR(gpu)) {
>>> @@ -444,8 +444,10 @@ static ssize_t error_state_read(struct file 
>>> *filp, struct kobject *kobj,
>>>           const char *str = "No error state collected\n";
>>>           size_t len = strlen(str);
>>>   -        ret = min_t(size_t, count, len - off);
>>> -        memcpy(buf, str + off, ret);
>>> +        if (off < len) {
>>> +            ret = min_t(size_t, count, len - off);
>>> +            memcpy(buf, str + off, ret);
>>> +        }
>> So the problem is that the error dump disappeared while it was being 
>> read? That seems like it cause more problems than just this 
>> out-of-range access. E.g. what if the dump was freed and a new one 
>> created? The entity doing the partial reads would end up with half of 
>> one dump and half of the next.
>>
>> I think we should at least add a FIXME comment to the code that fast 
>> recycling dumps could cause inconsistent sysfs reads.
>>
>> I guess you could do something like add a unique id to the gpu 
>> coredump structure. Then, when a partial sysfs read occurs starting 
>> at zero (i.e. a new read), take a note of the id somewhere (e.g. 
>> inside the i915 structure). When the next non-zero read comes in, 
>> compare the save id with the current coredump's id and return an 
>> error if there is a mis-match.
>>
>> Not sure if this would be viewed as an important enough problem to be 
>> worth fixing. I'd be happy with just a FIXME comment for now.
> For now I shall add a FIXME additional checks might impact IGT's that 
> currently dump and check the error state. I would assume with that 
> additional check in kernel, we would need to return a specific error 
> value like -ENODATA or -ENOLINK and handle it accordingly in the igt.
If an an extra check against returning invalid data affects an existing 
IGT then that IGT is already broken. The check would to prevent userland 
reading the first half of one capture and then getting the second half 
of a completely different one. If that is already happening then the 
returned data is meaningless gibberish and even if the IGT says it is 
passing, it really isn't.


>>
>> I would also change the test to be 'if (off)' rather than 'if (off < 
>> len)'. Technically, the user could have read the first 10 bytes of a 
>> coredump and then gets "tate collected\n" as the remainder, for 
>> example. If we ensure that 'off' is zero then we know that this is a 
>> fresh read from scratch.
>>
>> John.
>>
> I'm a little confused, did u mean: in the case we dont have a 
> gpu-state to report, and then the user offset is zero (i.e. "if 
> (!off)" ) then we copy the string vs if we dont have a gpu-state to 
> report and the user-offset is non-zero, then we return an error (like 
> the -ENOLINK?). If thats what you meant, then it does mean we are 
> assuming that (in the case we dont have a gpu-state) the user will 
> never come in with a first-time-read where the user's buffer size of 
> less than the string length and be expected continue to read the rest 
> of the string-length. This i guess is okay since even 6 chars are 
> enough to say "No err".  :)
Sorry, yes. That was meant to say 'if (!off)'.

Hmm, I guess technically the user could be issuing single byte reads. 
User's can be evil.

Okay. Maybe just add a FIXME about needing to check for 
changed/missing/new error state since last read if the offset is 
non-zero and otherwise leave it as is.

John.


>>>       }
>>>         return ret;
>>
Alan Previn March 10, 2022, 1:45 a.m. UTC | #4
On 3/9/2022 5:18 PM, John Harrison wrote:
> On 3/8/2022 11:47, Teres Alexis, Alan Previn wrote:
>> On 3/1/2022 1:37 PM, John Harrison wrote:
>>> On 2/25/2022 22:27, Alan Previn wrote:
>>>> ...
>>>> This fixes a kernel page fault can happen when
>>>> multiple tests are running concurrently in a loop
>>>> and one is producing engine resets and consuming
>>>> the i915 error_state dump while the other is
>>>> forcing full GT resets. (takes a while to trigger).
>>> Does need a fixes tag given that it is fixing a bug in an earlier 
>>> patch. Especially when it is a kernel BUG.
>>> E.g.:
>>> 13:23> dim fixes 0e39037b31655
>>> Fixes: 0e39037b3165 ("drm/i915: Cache the error string")
>>>
>> okay, will add that.
>>
>> diff --git a/drivers/gpu/drm/i915/i915_sysfs.c 
>> b/drivers/gpu/drm/i915/i915_sysfs.c
>>>> index a4d1759375b9..c40e51298066 100644
>>>> --- a/drivers/gpu/drm/i915/i915_sysfs.c
>>>> +++ b/drivers/gpu/drm/i915/i915_sysfs.c
>>>> @@ -432,7 +432,7 @@ static ssize_t error_state_read(struct file 
>>>> *filp, struct kobject *kobj,
>>>>       struct device *kdev = kobj_to_dev(kobj);
>>>>       struct drm_i915_private *i915 = kdev_minor_to_i915(kdev);
>>>>       struct i915_gpu_coredump *gpu;
>>>> -    ssize_t ret;
>>>> +    ssize_t ret = 0;
>>>>         gpu = i915_first_error_state(i915);
>>>>       if (IS_ERR(gpu)) {
>>>> @@ -444,8 +444,10 @@ static ssize_t error_state_read(struct file 
>>>> *filp, struct kobject *kobj,
>>>>           const char *str = "No error state collected\n";
>>>>           size_t len = strlen(str);
>>>>   -        ret = min_t(size_t, count, len - off);
>>>> -        memcpy(buf, str + off, ret);
>>>> +        if (off < len) {
>>>> +            ret = min_t(size_t, count, len - off);
>>>> +            memcpy(buf, str + off, ret);
>>>> +        }
>>> So the problem is that the error dump disappeared while it was being 
>>> read? That seems like it cause more problems than just this 
>>> out-of-range access. E.g. what if the dump was freed and a new one 
>>> created? The entity doing the partial reads would end up with half 
>>> of one dump and half of the next.
>>>
>>> I think we should at least add a FIXME comment to the code that fast 
>>> recycling dumps could cause inconsistent sysfs reads.
>>>
>>> I guess you could do something like add a unique id to the gpu 
>>> coredump structure. Then, when a partial sysfs read occurs starting 
>>> at zero (i.e. a new read), take a note of the id somewhere (e.g. 
>>> inside the i915 structure). When the next non-zero read comes in, 
>>> compare the save id with the current coredump's id and return an 
>>> error if there is a mis-match.
>>>
>>> Not sure if this would be viewed as an important enough problem to 
>>> be worth fixing. I'd be happy with just a FIXME comment for now.
>> For now I shall add a FIXME additional checks might impact IGT's that 
>> currently dump and check the error state. I would assume with that 
>> additional check in kernel, we would need to return a specific error 
>> value like -ENODATA or -ENOLINK and handle it accordingly in the igt.
> If an an extra check against returning invalid data affects an 
> existing IGT then that IGT is already broken. The check would to 
> prevent userland reading the first half of one capture and then 
> getting the second half of a completely different one. If that is 
> already happening then the returned data is meaningless gibberish and 
> even if the IGT says it is passing, it really isn't.
>
>
>>>
>>> I would also change the test to be 'if (off)' rather than 'if (off < 
>>> len)'. Technically, the user could have read the first 10 bytes of a 
>>> coredump and then gets "tate collected\n" as the remainder, for 
>>> example. If we ensure that 'off' is zero then we know that this is a 
>>> fresh read from scratch.
>>>
>>> John.
>>>
>> I'm a little confused, did u mean: in the case we dont have a 
>> gpu-state to report, and then the user offset is zero (i.e. "if 
>> (!off)" ) then we copy the string vs if we dont have a gpu-state to 
>> report and the user-offset is non-zero, then we return an error (like 
>> the -ENOLINK?). If thats what you meant, then it does mean we are 
>> assuming that (in the case we dont have a gpu-state) the user will 
>> never come in with a first-time-read where the user's buffer size of 
>> less than the string length and be expected continue to read the rest 
>> of the string-length. This i guess is okay since even 6 chars are 
>> enough to say "No err". :)
> Sorry, yes. That was meant to say 'if (!off)'.
>
> Hmm, I guess technically the user could be issuing single byte reads. 
> User's can be evil.
>
> Okay. Maybe just add a FIXME about needing to check for 
> changed/missing/new error state since last read if the offset is 
> non-zero and otherwise leave it as is.
>
> John.
>
Sounds good - will do. Apologies on the tardiness with the responses.
>
>>>>       }
>>>>         return ret;
>>>
>
Alan Previn March 11, 2022, 4:52 a.m. UTC | #5
i missed something in rev3, but rev4 ended up as a different series.

Please review this new URL for this patch. Apologies for the confusion:

https://patchwork.freedesktop.org/series/101256/


...alan

On 3/9/2022 5:45 PM, Teres Alexis, Alan Previn wrote:
>
> On 3/9/2022 5:18 PM, John Harrison wrote:
>> On 3/8/2022 11:47, Teres Alexis, Alan Previn wrote:
>>> On 3/1/2022 1:37 PM, John Harrison wrote:
>>>> On 2/25/2022 22:27, Alan Previn wrote:
>>>>> ...
>>>>> This fixes a kernel page fault can happen when
>>>>> multiple tests are running concurrently in a loop
>>>>> and one is producing engine resets and consuming
>>>>> the i915 error_state dump while the other is
>>>>> forcing full GT resets. (takes a while to trigger).
>>>> Does need a fixes tag given that it is fixing a bug in an earlier 
>>>> patch. Especially when it is a kernel BUG.
>>>> E.g.:
>>>> 13:23> dim fixes 0e39037b31655
>>>> Fixes: 0e39037b3165 ("drm/i915: Cache the error string")
>>>>
>>> okay, will add that.
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_sysfs.c 
>>> b/drivers/gpu/drm/i915/i915_sysfs.c
>>>>> index a4d1759375b9..c40e51298066 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_sysfs.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_sysfs.c
>>>>> @@ -432,7 +432,7 @@ static ssize_t error_state_read(struct file 
>>>>> *filp, struct kobject *kobj,
>>>>>       struct device *kdev = kobj_to_dev(kobj);
>>>>>       struct drm_i915_private *i915 = kdev_minor_to_i915(kdev);
>>>>>       struct i915_gpu_coredump *gpu;
>>>>> -    ssize_t ret;
>>>>> +    ssize_t ret = 0;
>>>>>         gpu = i915_first_error_state(i915);
>>>>>       if (IS_ERR(gpu)) {
>>>>> @@ -444,8 +444,10 @@ static ssize_t error_state_read(struct file 
>>>>> *filp, struct kobject *kobj,
>>>>>           const char *str = "No error state collected\n";
>>>>>           size_t len = strlen(str);
>>>>>   -        ret = min_t(size_t, count, len - off);
>>>>> -        memcpy(buf, str + off, ret);
>>>>> +        if (off < len) {
>>>>> +            ret = min_t(size_t, count, len - off);
>>>>> +            memcpy(buf, str + off, ret);
>>>>> +        }
>>>> So the problem is that the error dump disappeared while it was 
>>>> being read? That seems like it cause more problems than just this 
>>>> out-of-range access. E.g. what if the dump was freed and a new one 
>>>> created? The entity doing the partial reads would end up with half 
>>>> of one dump and half of the next.
>>>>
>>>> I think we should at least add a FIXME comment to the code that 
>>>> fast recycling dumps could cause inconsistent sysfs reads.
>>>>
>>>> I guess you could do something like add a unique id to the gpu 
>>>> coredump structure. Then, when a partial sysfs read occurs starting 
>>>> at zero (i.e. a new read), take a note of the id somewhere (e.g. 
>>>> inside the i915 structure). When the next non-zero read comes in, 
>>>> compare the save id with the current coredump's id and return an 
>>>> error if there is a mis-match.
>>>>
>>>> Not sure if this would be viewed as an important enough problem to 
>>>> be worth fixing. I'd be happy with just a FIXME comment for now.
>>> For now I shall add a FIXME additional checks might impact IGT's 
>>> that currently dump and check the error state. I would assume with 
>>> that additional check in kernel, we would need to return a specific 
>>> error value like -ENODATA or -ENOLINK and handle it accordingly in 
>>> the igt.
>> If an an extra check against returning invalid data affects an 
>> existing IGT then that IGT is already broken. The check would to 
>> prevent userland reading the first half of one capture and then 
>> getting the second half of a completely different one. If that is 
>> already happening then the returned data is meaningless gibberish and 
>> even if the IGT says it is passing, it really isn't.
>>
>>
>>>>
>>>> I would also change the test to be 'if (off)' rather than 'if (off 
>>>> < len)'. Technically, the user could have read the first 10 bytes 
>>>> of a coredump and then gets "tate collected\n" as the remainder, 
>>>> for example. If we ensure that 'off' is zero then we know that this 
>>>> is a fresh read from scratch.
>>>>
>>>> John.
>>>>
>>> I'm a little confused, did u mean: in the case we dont have a 
>>> gpu-state to report, and then the user offset is zero (i.e. "if 
>>> (!off)" ) then we copy the string vs if we dont have a gpu-state to 
>>> report and the user-offset is non-zero, then we return an error 
>>> (like the -ENOLINK?). If thats what you meant, then it does mean we 
>>> are assuming that (in the case we dont have a gpu-state) the user 
>>> will never come in with a first-time-read where the user's buffer 
>>> size of less than the string length and be expected continue to read 
>>> the rest of the string-length. This i guess is okay since even 6 
>>> chars are enough to say "No err". :)
>> Sorry, yes. That was meant to say 'if (!off)'.
>>
>> Hmm, I guess technically the user could be issuing single byte reads. 
>> User's can be evil.
>>
>> Okay. Maybe just add a FIXME about needing to check for 
>> changed/missing/new error state since last read if the offset is 
>> non-zero and otherwise leave it as is.
>>
>> John.
>>
> Sounds good - will do. Apologies on the tardiness with the responses.
>>
>>>>>       }
>>>>>         return ret;
>>>>
>>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
index a4d1759375b9..c40e51298066 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -432,7 +432,7 @@  static ssize_t error_state_read(struct file *filp, struct kobject *kobj,
 	struct device *kdev = kobj_to_dev(kobj);
 	struct drm_i915_private *i915 = kdev_minor_to_i915(kdev);
 	struct i915_gpu_coredump *gpu;
-	ssize_t ret;
+	ssize_t ret = 0;
 
 	gpu = i915_first_error_state(i915);
 	if (IS_ERR(gpu)) {
@@ -444,8 +444,10 @@  static ssize_t error_state_read(struct file *filp, struct kobject *kobj,
 		const char *str = "No error state collected\n";
 		size_t len = strlen(str);
 
-		ret = min_t(size_t, count, len - off);
-		memcpy(buf, str + off, ret);
+		if (off < len) {
+			ret = min_t(size_t, count, len - off);
+			memcpy(buf, str + off, ret);
+		}
 	}
 
 	return ret;