Message ID | 20220226062732.747531-2-alan.previn.teres.alexis@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Fix i915 error_state_read ptr use | expand |
On 2/25/2022 22:27, Alan Previn wrote: > Fix our pointer offset usage in error_state_read > when there is no i915_gpu_coredump but buf offset > is non-zero. > > This fixes a kernel page fault can happen when > multiple tests are running concurrently in a loop > and one is producing engine resets and consuming > the i915 error_state dump while the other is > forcing full GT resets. (takes a while to trigger). Does need a fixes tag given that it is fixing a bug in an earlier patch. Especially when it is a kernel BUG. E.g.: 13:23> dim fixes 0e39037b31655 Fixes: 0e39037b3165 ("drm/i915: Cache the error string") Cc: Jason Ekstrand <jason@jlekstrand.net> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: intel-gfx@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v5.0+ > The dmesg call trace: > > 5014 [ 5590.803000] BUG: unable to handle page fault for address: ffffffffa0b0e000 > 5015 [ 5590.803009] #PF: supervisor read access in kernel mode > 5016 [ 5590.803013] #PF: error_code(0x0000) - not-present page > 5017 [ 5590.803016] PGD 5814067 P4D 5814067 PUD 5815063 PMD 109de4067 PTE 0 > 5018 [ 5590.803022] Oops: 0000 [#1] PREEMPT SMP NOPTI > 5019 [ 5590.803026] CPU: 5 PID: 13656 Comm: i915_hangman Tainted: G U 5.17.0-rc5- > ups69-guc-err-capt-rev6+ #136 > 5020 [ 5590.803033] Hardware name: Intel Corporation Alder Lake Client Platform/ > AlderLake-M LP4x RVP, BIOS ADLPFWI1.R00.3031.A02.2201171222 > 01/17/2022 > 5021 [ 5590.803039] RIP: 0010:memcpy_erms+0x6/0x10 > 5022 [ 5590.803045] Code: fe ff ff cc eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 > e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 > d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe > 5023 [ 5590.803054] RSP: 0018:ffffc90003a8fdf0 EFLAGS: 00010282 > 5024 [ 5590.803057] RAX: ffff888107ee9000 RBX: ffff888108cb1a00 RCX: 0000000000000f8f > 5025 [ 5590.803061] RDX: 0000000000001000 RSI: ffffffffa0b0e000 RDI: ffff888107ee9071 > 5026 [ 5590.803065] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001 > 5027 [ 5590.803069] R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000019 > 5028 [ 5590.803073] R13: 0000000000174fff R14: 0000000000001000 R15: ffff888107ee9000 > 5029 [ 5590.803077] FS: 00007f62a99bee80(0000) GS:ffff88849f880000(0000) knlGS:0000000000000000 > 5030 [ 5590.803082] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > 5031 [ 5590.803085] CR2: ffffffffa0b0e000 CR3: 000000010a1a8004 CR4: 0000000000770ee0 > 5032 [ 5590.803089] PKRU: 55555554 > 5033 [ 5590.803091] Call Trace: > 5034 [ 5590.803093] <TASK> > 5035 [ 5590.803096] error_state_read+0xa1/0xd0 [i915] > 5036 [ 5590.803175] kernfs_fop_read_iter+0xb2/0x1b0 > 5037 [ 5590.803180] new_sync_read+0x116/0x1a0 > 5038 [ 5590.803185] vfs_read+0x114/0x1b0 > 5039 [ 5590.803189] ksys_read+0x63/0xe0 > 5040 [ 5590.803193] do_syscall_64+0x38/0xc0 > 5041 [ 5590.803197] entry_SYSCALL_64_after_hwframe+0x44/0xae > 5042 [ 5590.803201] RIP: 0033:0x7f62aaea5912 > 5043 [ 5590.803204] Code: c0 e9 b2 fe ff ff 50 48 8d 3d 5a b9 0c 00 e8 05 19 02 00 0f 1f 44 00 > 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 > ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24 > 5044 [ 5590.803213] RSP: 002b:00007fff5b659ae8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 > 5045 [ 5590.803218] RAX: ffffffffffffffda RBX: 0000000000100000 RCX: 00007f62aaea5912 > 5046 [ 5590.803221] RDX: 000000000008b000 RSI: 00007f62a8c4000f RDI: 0000000000000006 > 5047 [ 5590.803225] RBP: 00007f62a8bcb00f R08: 0000000000200010 R09: 0000000000101000 > 5048 [ 5590.803229] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000006 > 5049 [ 5590.803233] R13: 0000000000075000 R14: 00007f62a8acb010 R15: 0000000000200000 > 5050 [ 5590.803238] </TASK> > 5051 [ 5590.803240] Modules linked in: i915 ttm drm_buddy drm_dp_helper drm_kms_helper syscopyarea > sysfillrect sysimgblt fb_sys_fops prime_numbers nfnetlink br_netfilter overlay > mei_pxp mei_hdcp x86_pkg_temp_thermal coretemp kvm_intel snd_hda_codec_hdmi > snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me > mei fuse ip_tables x_tables crct10dif_pclmul e1000e crc32_pclmul ptp i2c_i801 > ghash_clmulni_intel i2c_smbus pps_core [last unloa ded: ttm] > 5052 [ 5590.803277] CR2: ffffffffa0b0e000 > 5053 [ 5590.803280] ---[ end trace 0000000000000000 ]--- > > Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> > --- > drivers/gpu/drm/i915/i915_sysfs.c | 8 +++++--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c > index a4d1759375b9..c40e51298066 100644 > --- a/drivers/gpu/drm/i915/i915_sysfs.c > +++ b/drivers/gpu/drm/i915/i915_sysfs.c > @@ -432,7 +432,7 @@ static ssize_t error_state_read(struct file *filp, struct kobject *kobj, > struct device *kdev = kobj_to_dev(kobj); > struct drm_i915_private *i915 = kdev_minor_to_i915(kdev); > struct i915_gpu_coredump *gpu; > - ssize_t ret; > + ssize_t ret = 0; > > gpu = i915_first_error_state(i915); > if (IS_ERR(gpu)) { > @@ -444,8 +444,10 @@ static ssize_t error_state_read(struct file *filp, struct kobject *kobj, > const char *str = "No error state collected\n"; > size_t len = strlen(str); > > - ret = min_t(size_t, count, len - off); > - memcpy(buf, str + off, ret); > + if (off < len) { > + ret = min_t(size_t, count, len - off); > + memcpy(buf, str + off, ret); > + } So the problem is that the error dump disappeared while it was being read? That seems like it cause more problems than just this out-of-range access. E.g. what if the dump was freed and a new one created? The entity doing the partial reads would end up with half of one dump and half of the next. I think we should at least add a FIXME comment to the code that fast recycling dumps could cause inconsistent sysfs reads. I guess you could do something like add a unique id to the gpu coredump structure. Then, when a partial sysfs read occurs starting at zero (i.e. a new read), take a note of the id somewhere (e.g. inside the i915 structure). When the next non-zero read comes in, compare the save id with the current coredump's id and return an error if there is a mis-match. Not sure if this would be viewed as an important enough problem to be worth fixing. I'd be happy with just a FIXME comment for now. I would also change the test to be 'if (off)' rather than 'if (off < len)'. Technically, the user could have read the first 10 bytes of a coredump and then gets "tate collected\n" as the remainder, for example. If we ensure that 'off' is zero then we know that this is a fresh read from scratch. John. > } > > return ret;
On 3/1/2022 1:37 PM, John Harrison wrote: > On 2/25/2022 22:27, Alan Previn wrote: >> ... >> This fixes a kernel page fault can happen when >> multiple tests are running concurrently in a loop >> and one is producing engine resets and consuming >> the i915 error_state dump while the other is >> forcing full GT resets. (takes a while to trigger). > Does need a fixes tag given that it is fixing a bug in an earlier > patch. Especially when it is a kernel BUG. > E.g.: > 13:23> dim fixes 0e39037b31655 > Fixes: 0e39037b3165 ("drm/i915: Cache the error string") > okay, will add that. diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c >> index a4d1759375b9..c40e51298066 100644 >> --- a/drivers/gpu/drm/i915/i915_sysfs.c >> +++ b/drivers/gpu/drm/i915/i915_sysfs.c >> @@ -432,7 +432,7 @@ static ssize_t error_state_read(struct file >> *filp, struct kobject *kobj, >> struct device *kdev = kobj_to_dev(kobj); >> struct drm_i915_private *i915 = kdev_minor_to_i915(kdev); >> struct i915_gpu_coredump *gpu; >> - ssize_t ret; >> + ssize_t ret = 0; >> gpu = i915_first_error_state(i915); >> if (IS_ERR(gpu)) { >> @@ -444,8 +444,10 @@ static ssize_t error_state_read(struct file >> *filp, struct kobject *kobj, >> const char *str = "No error state collected\n"; >> size_t len = strlen(str); >> - ret = min_t(size_t, count, len - off); >> - memcpy(buf, str + off, ret); >> + if (off < len) { >> + ret = min_t(size_t, count, len - off); >> + memcpy(buf, str + off, ret); >> + } > So the problem is that the error dump disappeared while it was being > read? That seems like it cause more problems than just this > out-of-range access. E.g. what if the dump was freed and a new one > created? The entity doing the partial reads would end up with half of > one dump and half of the next. > > I think we should at least add a FIXME comment to the code that fast > recycling dumps could cause inconsistent sysfs reads. > > I guess you could do something like add a unique id to the gpu > coredump structure. Then, when a partial sysfs read occurs starting at > zero (i.e. a new read), take a note of the id somewhere (e.g. inside > the i915 structure). When the next non-zero read comes in, compare the > save id with the current coredump's id and return an error if there is > a mis-match. > > Not sure if this would be viewed as an important enough problem to be > worth fixing. I'd be happy with just a FIXME comment for now. For now I shall add a FIXME additional checks might impact IGT's that currently dump and check the error state. I would assume with that additional check in kernel, we would need to return a specific error value like -ENODATA or -ENOLINK and handle it accordingly in the igt. > > I would also change the test to be 'if (off)' rather than 'if (off < > len)'. Technically, the user could have read the first 10 bytes of a > coredump and then gets "tate collected\n" as the remainder, for > example. If we ensure that 'off' is zero then we know that this is a > fresh read from scratch. > > John. > I'm a little confused, did u mean: in the case we dont have a gpu-state to report, and then the user offset is zero (i.e. "if (!off)" ) then we copy the string vs if we dont have a gpu-state to report and the user-offset is non-zero, then we return an error (like the -ENOLINK?). If thats what you meant, then it does mean we are assuming that (in the case we dont have a gpu-state) the user will never come in with a first-time-read where the user's buffer size of less than the string length and be expected continue to read the rest of the string-length. This i guess is okay since even 6 chars are enough to say "No err". :) >> } >> return ret; >
On 3/8/2022 11:47, Teres Alexis, Alan Previn wrote: > On 3/1/2022 1:37 PM, John Harrison wrote: >> On 2/25/2022 22:27, Alan Previn wrote: >>> ... >>> This fixes a kernel page fault can happen when >>> multiple tests are running concurrently in a loop >>> and one is producing engine resets and consuming >>> the i915 error_state dump while the other is >>> forcing full GT resets. (takes a while to trigger). >> Does need a fixes tag given that it is fixing a bug in an earlier >> patch. Especially when it is a kernel BUG. >> E.g.: >> 13:23> dim fixes 0e39037b31655 >> Fixes: 0e39037b3165 ("drm/i915: Cache the error string") >> > okay, will add that. > > diff --git a/drivers/gpu/drm/i915/i915_sysfs.c > b/drivers/gpu/drm/i915/i915_sysfs.c >>> index a4d1759375b9..c40e51298066 100644 >>> --- a/drivers/gpu/drm/i915/i915_sysfs.c >>> +++ b/drivers/gpu/drm/i915/i915_sysfs.c >>> @@ -432,7 +432,7 @@ static ssize_t error_state_read(struct file >>> *filp, struct kobject *kobj, >>> struct device *kdev = kobj_to_dev(kobj); >>> struct drm_i915_private *i915 = kdev_minor_to_i915(kdev); >>> struct i915_gpu_coredump *gpu; >>> - ssize_t ret; >>> + ssize_t ret = 0; >>> gpu = i915_first_error_state(i915); >>> if (IS_ERR(gpu)) { >>> @@ -444,8 +444,10 @@ static ssize_t error_state_read(struct file >>> *filp, struct kobject *kobj, >>> const char *str = "No error state collected\n"; >>> size_t len = strlen(str); >>> - ret = min_t(size_t, count, len - off); >>> - memcpy(buf, str + off, ret); >>> + if (off < len) { >>> + ret = min_t(size_t, count, len - off); >>> + memcpy(buf, str + off, ret); >>> + } >> So the problem is that the error dump disappeared while it was being >> read? That seems like it cause more problems than just this >> out-of-range access. E.g. what if the dump was freed and a new one >> created? The entity doing the partial reads would end up with half of >> one dump and half of the next. >> >> I think we should at least add a FIXME comment to the code that fast >> recycling dumps could cause inconsistent sysfs reads. >> >> I guess you could do something like add a unique id to the gpu >> coredump structure. Then, when a partial sysfs read occurs starting >> at zero (i.e. a new read), take a note of the id somewhere (e.g. >> inside the i915 structure). When the next non-zero read comes in, >> compare the save id with the current coredump's id and return an >> error if there is a mis-match. >> >> Not sure if this would be viewed as an important enough problem to be >> worth fixing. I'd be happy with just a FIXME comment for now. > For now I shall add a FIXME additional checks might impact IGT's that > currently dump and check the error state. I would assume with that > additional check in kernel, we would need to return a specific error > value like -ENODATA or -ENOLINK and handle it accordingly in the igt. If an an extra check against returning invalid data affects an existing IGT then that IGT is already broken. The check would to prevent userland reading the first half of one capture and then getting the second half of a completely different one. If that is already happening then the returned data is meaningless gibberish and even if the IGT says it is passing, it really isn't. >> >> I would also change the test to be 'if (off)' rather than 'if (off < >> len)'. Technically, the user could have read the first 10 bytes of a >> coredump and then gets "tate collected\n" as the remainder, for >> example. If we ensure that 'off' is zero then we know that this is a >> fresh read from scratch. >> >> John. >> > I'm a little confused, did u mean: in the case we dont have a > gpu-state to report, and then the user offset is zero (i.e. "if > (!off)" ) then we copy the string vs if we dont have a gpu-state to > report and the user-offset is non-zero, then we return an error (like > the -ENOLINK?). If thats what you meant, then it does mean we are > assuming that (in the case we dont have a gpu-state) the user will > never come in with a first-time-read where the user's buffer size of > less than the string length and be expected continue to read the rest > of the string-length. This i guess is okay since even 6 chars are > enough to say "No err". :) Sorry, yes. That was meant to say 'if (!off)'. Hmm, I guess technically the user could be issuing single byte reads. User's can be evil. Okay. Maybe just add a FIXME about needing to check for changed/missing/new error state since last read if the offset is non-zero and otherwise leave it as is. John. >>> } >>> return ret; >>
On 3/9/2022 5:18 PM, John Harrison wrote: > On 3/8/2022 11:47, Teres Alexis, Alan Previn wrote: >> On 3/1/2022 1:37 PM, John Harrison wrote: >>> On 2/25/2022 22:27, Alan Previn wrote: >>>> ... >>>> This fixes a kernel page fault can happen when >>>> multiple tests are running concurrently in a loop >>>> and one is producing engine resets and consuming >>>> the i915 error_state dump while the other is >>>> forcing full GT resets. (takes a while to trigger). >>> Does need a fixes tag given that it is fixing a bug in an earlier >>> patch. Especially when it is a kernel BUG. >>> E.g.: >>> 13:23> dim fixes 0e39037b31655 >>> Fixes: 0e39037b3165 ("drm/i915: Cache the error string") >>> >> okay, will add that. >> >> diff --git a/drivers/gpu/drm/i915/i915_sysfs.c >> b/drivers/gpu/drm/i915/i915_sysfs.c >>>> index a4d1759375b9..c40e51298066 100644 >>>> --- a/drivers/gpu/drm/i915/i915_sysfs.c >>>> +++ b/drivers/gpu/drm/i915/i915_sysfs.c >>>> @@ -432,7 +432,7 @@ static ssize_t error_state_read(struct file >>>> *filp, struct kobject *kobj, >>>> struct device *kdev = kobj_to_dev(kobj); >>>> struct drm_i915_private *i915 = kdev_minor_to_i915(kdev); >>>> struct i915_gpu_coredump *gpu; >>>> - ssize_t ret; >>>> + ssize_t ret = 0; >>>> gpu = i915_first_error_state(i915); >>>> if (IS_ERR(gpu)) { >>>> @@ -444,8 +444,10 @@ static ssize_t error_state_read(struct file >>>> *filp, struct kobject *kobj, >>>> const char *str = "No error state collected\n"; >>>> size_t len = strlen(str); >>>> - ret = min_t(size_t, count, len - off); >>>> - memcpy(buf, str + off, ret); >>>> + if (off < len) { >>>> + ret = min_t(size_t, count, len - off); >>>> + memcpy(buf, str + off, ret); >>>> + } >>> So the problem is that the error dump disappeared while it was being >>> read? That seems like it cause more problems than just this >>> out-of-range access. E.g. what if the dump was freed and a new one >>> created? The entity doing the partial reads would end up with half >>> of one dump and half of the next. >>> >>> I think we should at least add a FIXME comment to the code that fast >>> recycling dumps could cause inconsistent sysfs reads. >>> >>> I guess you could do something like add a unique id to the gpu >>> coredump structure. Then, when a partial sysfs read occurs starting >>> at zero (i.e. a new read), take a note of the id somewhere (e.g. >>> inside the i915 structure). When the next non-zero read comes in, >>> compare the save id with the current coredump's id and return an >>> error if there is a mis-match. >>> >>> Not sure if this would be viewed as an important enough problem to >>> be worth fixing. I'd be happy with just a FIXME comment for now. >> For now I shall add a FIXME additional checks might impact IGT's that >> currently dump and check the error state. I would assume with that >> additional check in kernel, we would need to return a specific error >> value like -ENODATA or -ENOLINK and handle it accordingly in the igt. > If an an extra check against returning invalid data affects an > existing IGT then that IGT is already broken. The check would to > prevent userland reading the first half of one capture and then > getting the second half of a completely different one. If that is > already happening then the returned data is meaningless gibberish and > even if the IGT says it is passing, it really isn't. > > >>> >>> I would also change the test to be 'if (off)' rather than 'if (off < >>> len)'. Technically, the user could have read the first 10 bytes of a >>> coredump and then gets "tate collected\n" as the remainder, for >>> example. If we ensure that 'off' is zero then we know that this is a >>> fresh read from scratch. >>> >>> John. >>> >> I'm a little confused, did u mean: in the case we dont have a >> gpu-state to report, and then the user offset is zero (i.e. "if >> (!off)" ) then we copy the string vs if we dont have a gpu-state to >> report and the user-offset is non-zero, then we return an error (like >> the -ENOLINK?). If thats what you meant, then it does mean we are >> assuming that (in the case we dont have a gpu-state) the user will >> never come in with a first-time-read where the user's buffer size of >> less than the string length and be expected continue to read the rest >> of the string-length. This i guess is okay since even 6 chars are >> enough to say "No err". :) > Sorry, yes. That was meant to say 'if (!off)'. > > Hmm, I guess technically the user could be issuing single byte reads. > User's can be evil. > > Okay. Maybe just add a FIXME about needing to check for > changed/missing/new error state since last read if the offset is > non-zero and otherwise leave it as is. > > John. > Sounds good - will do. Apologies on the tardiness with the responses. > >>>> } >>>> return ret; >>> >
i missed something in rev3, but rev4 ended up as a different series. Please review this new URL for this patch. Apologies for the confusion: https://patchwork.freedesktop.org/series/101256/ ...alan On 3/9/2022 5:45 PM, Teres Alexis, Alan Previn wrote: > > On 3/9/2022 5:18 PM, John Harrison wrote: >> On 3/8/2022 11:47, Teres Alexis, Alan Previn wrote: >>> On 3/1/2022 1:37 PM, John Harrison wrote: >>>> On 2/25/2022 22:27, Alan Previn wrote: >>>>> ... >>>>> This fixes a kernel page fault can happen when >>>>> multiple tests are running concurrently in a loop >>>>> and one is producing engine resets and consuming >>>>> the i915 error_state dump while the other is >>>>> forcing full GT resets. (takes a while to trigger). >>>> Does need a fixes tag given that it is fixing a bug in an earlier >>>> patch. Especially when it is a kernel BUG. >>>> E.g.: >>>> 13:23> dim fixes 0e39037b31655 >>>> Fixes: 0e39037b3165 ("drm/i915: Cache the error string") >>>> >>> okay, will add that. >>> >>> diff --git a/drivers/gpu/drm/i915/i915_sysfs.c >>> b/drivers/gpu/drm/i915/i915_sysfs.c >>>>> index a4d1759375b9..c40e51298066 100644 >>>>> --- a/drivers/gpu/drm/i915/i915_sysfs.c >>>>> +++ b/drivers/gpu/drm/i915/i915_sysfs.c >>>>> @@ -432,7 +432,7 @@ static ssize_t error_state_read(struct file >>>>> *filp, struct kobject *kobj, >>>>> struct device *kdev = kobj_to_dev(kobj); >>>>> struct drm_i915_private *i915 = kdev_minor_to_i915(kdev); >>>>> struct i915_gpu_coredump *gpu; >>>>> - ssize_t ret; >>>>> + ssize_t ret = 0; >>>>> gpu = i915_first_error_state(i915); >>>>> if (IS_ERR(gpu)) { >>>>> @@ -444,8 +444,10 @@ static ssize_t error_state_read(struct file >>>>> *filp, struct kobject *kobj, >>>>> const char *str = "No error state collected\n"; >>>>> size_t len = strlen(str); >>>>> - ret = min_t(size_t, count, len - off); >>>>> - memcpy(buf, str + off, ret); >>>>> + if (off < len) { >>>>> + ret = min_t(size_t, count, len - off); >>>>> + memcpy(buf, str + off, ret); >>>>> + } >>>> So the problem is that the error dump disappeared while it was >>>> being read? That seems like it cause more problems than just this >>>> out-of-range access. E.g. what if the dump was freed and a new one >>>> created? The entity doing the partial reads would end up with half >>>> of one dump and half of the next. >>>> >>>> I think we should at least add a FIXME comment to the code that >>>> fast recycling dumps could cause inconsistent sysfs reads. >>>> >>>> I guess you could do something like add a unique id to the gpu >>>> coredump structure. Then, when a partial sysfs read occurs starting >>>> at zero (i.e. a new read), take a note of the id somewhere (e.g. >>>> inside the i915 structure). When the next non-zero read comes in, >>>> compare the save id with the current coredump's id and return an >>>> error if there is a mis-match. >>>> >>>> Not sure if this would be viewed as an important enough problem to >>>> be worth fixing. I'd be happy with just a FIXME comment for now. >>> For now I shall add a FIXME additional checks might impact IGT's >>> that currently dump and check the error state. I would assume with >>> that additional check in kernel, we would need to return a specific >>> error value like -ENODATA or -ENOLINK and handle it accordingly in >>> the igt. >> If an an extra check against returning invalid data affects an >> existing IGT then that IGT is already broken. The check would to >> prevent userland reading the first half of one capture and then >> getting the second half of a completely different one. If that is >> already happening then the returned data is meaningless gibberish and >> even if the IGT says it is passing, it really isn't. >> >> >>>> >>>> I would also change the test to be 'if (off)' rather than 'if (off >>>> < len)'. Technically, the user could have read the first 10 bytes >>>> of a coredump and then gets "tate collected\n" as the remainder, >>>> for example. If we ensure that 'off' is zero then we know that this >>>> is a fresh read from scratch. >>>> >>>> John. >>>> >>> I'm a little confused, did u mean: in the case we dont have a >>> gpu-state to report, and then the user offset is zero (i.e. "if >>> (!off)" ) then we copy the string vs if we dont have a gpu-state to >>> report and the user-offset is non-zero, then we return an error >>> (like the -ENOLINK?). If thats what you meant, then it does mean we >>> are assuming that (in the case we dont have a gpu-state) the user >>> will never come in with a first-time-read where the user's buffer >>> size of less than the string length and be expected continue to read >>> the rest of the string-length. This i guess is okay since even 6 >>> chars are enough to say "No err". :) >> Sorry, yes. That was meant to say 'if (!off)'. >> >> Hmm, I guess technically the user could be issuing single byte reads. >> User's can be evil. >> >> Okay. Maybe just add a FIXME about needing to check for >> changed/missing/new error state since last read if the offset is >> non-zero and otherwise leave it as is. >> >> John. >> > Sounds good - will do. Apologies on the tardiness with the responses. >> >>>>> } >>>>> return ret; >>>> >>
diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c index a4d1759375b9..c40e51298066 100644 --- a/drivers/gpu/drm/i915/i915_sysfs.c +++ b/drivers/gpu/drm/i915/i915_sysfs.c @@ -432,7 +432,7 @@ static ssize_t error_state_read(struct file *filp, struct kobject *kobj, struct device *kdev = kobj_to_dev(kobj); struct drm_i915_private *i915 = kdev_minor_to_i915(kdev); struct i915_gpu_coredump *gpu; - ssize_t ret; + ssize_t ret = 0; gpu = i915_first_error_state(i915); if (IS_ERR(gpu)) { @@ -444,8 +444,10 @@ static ssize_t error_state_read(struct file *filp, struct kobject *kobj, const char *str = "No error state collected\n"; size_t len = strlen(str); - ret = min_t(size_t, count, len - off); - memcpy(buf, str + off, ret); + if (off < len) { + ret = min_t(size_t, count, len - off); + memcpy(buf, str + off, ret); + } } return ret;
Fix our pointer offset usage in error_state_read when there is no i915_gpu_coredump but buf offset is non-zero. This fixes a kernel page fault can happen when multiple tests are running concurrently in a loop and one is producing engine resets and consuming the i915 error_state dump while the other is forcing full GT resets. (takes a while to trigger). The dmesg call trace: 5014 [ 5590.803000] BUG: unable to handle page fault for address: ffffffffa0b0e000 5015 [ 5590.803009] #PF: supervisor read access in kernel mode 5016 [ 5590.803013] #PF: error_code(0x0000) - not-present page 5017 [ 5590.803016] PGD 5814067 P4D 5814067 PUD 5815063 PMD 109de4067 PTE 0 5018 [ 5590.803022] Oops: 0000 [#1] PREEMPT SMP NOPTI 5019 [ 5590.803026] CPU: 5 PID: 13656 Comm: i915_hangman Tainted: G U 5.17.0-rc5- ups69-guc-err-capt-rev6+ #136 5020 [ 5590.803033] Hardware name: Intel Corporation Alder Lake Client Platform/ AlderLake-M LP4x RVP, BIOS ADLPFWI1.R00.3031.A02.2201171222 01/17/2022 5021 [ 5590.803039] RIP: 0010:memcpy_erms+0x6/0x10 5022 [ 5590.803045] Code: fe ff ff cc eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe 5023 [ 5590.803054] RSP: 0018:ffffc90003a8fdf0 EFLAGS: 00010282 5024 [ 5590.803057] RAX: ffff888107ee9000 RBX: ffff888108cb1a00 RCX: 0000000000000f8f 5025 [ 5590.803061] RDX: 0000000000001000 RSI: ffffffffa0b0e000 RDI: ffff888107ee9071 5026 [ 5590.803065] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001 5027 [ 5590.803069] R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000019 5028 [ 5590.803073] R13: 0000000000174fff R14: 0000000000001000 R15: ffff888107ee9000 5029 [ 5590.803077] FS: 00007f62a99bee80(0000) GS:ffff88849f880000(0000) knlGS:0000000000000000 5030 [ 5590.803082] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 5031 [ 5590.803085] CR2: ffffffffa0b0e000 CR3: 000000010a1a8004 CR4: 0000000000770ee0 5032 [ 5590.803089] PKRU: 55555554 5033 [ 5590.803091] Call Trace: 5034 [ 5590.803093] <TASK> 5035 [ 5590.803096] error_state_read+0xa1/0xd0 [i915] 5036 [ 5590.803175] kernfs_fop_read_iter+0xb2/0x1b0 5037 [ 5590.803180] new_sync_read+0x116/0x1a0 5038 [ 5590.803185] vfs_read+0x114/0x1b0 5039 [ 5590.803189] ksys_read+0x63/0xe0 5040 [ 5590.803193] do_syscall_64+0x38/0xc0 5041 [ 5590.803197] entry_SYSCALL_64_after_hwframe+0x44/0xae 5042 [ 5590.803201] RIP: 0033:0x7f62aaea5912 5043 [ 5590.803204] Code: c0 e9 b2 fe ff ff 50 48 8d 3d 5a b9 0c 00 e8 05 19 02 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24 5044 [ 5590.803213] RSP: 002b:00007fff5b659ae8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 5045 [ 5590.803218] RAX: ffffffffffffffda RBX: 0000000000100000 RCX: 00007f62aaea5912 5046 [ 5590.803221] RDX: 000000000008b000 RSI: 00007f62a8c4000f RDI: 0000000000000006 5047 [ 5590.803225] RBP: 00007f62a8bcb00f R08: 0000000000200010 R09: 0000000000101000 5048 [ 5590.803229] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000006 5049 [ 5590.803233] R13: 0000000000075000 R14: 00007f62a8acb010 R15: 0000000000200000 5050 [ 5590.803238] </TASK> 5051 [ 5590.803240] Modules linked in: i915 ttm drm_buddy drm_dp_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops prime_numbers nfnetlink br_netfilter overlay mei_pxp mei_hdcp x86_pkg_temp_thermal coretemp kvm_intel snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me mei fuse ip_tables x_tables crct10dif_pclmul e1000e crc32_pclmul ptp i2c_i801 ghash_clmulni_intel i2c_smbus pps_core [last unloa ded: ttm] 5052 [ 5590.803277] CR2: ffffffffa0b0e000 5053 [ 5590.803280] ---[ end trace 0000000000000000 ]--- Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> --- drivers/gpu/drm/i915/i915_sysfs.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)