Message ID | 20240821143826.3720-5-daniel.almeida@collabora.com (mailing list archive)
---|---
State | New, archived
Series | Panthor devcoredump support
Hi Daniel,

On 21.08.2024 11:37, Daniel Almeida wrote:
> It can be advantageous for userspace to have access to successful jobs.

While it's true that having additional jobs as part of the same devcoredump file could perhaps provide further information as to why a later job failed, I see a few snags with this approach:

- The time between the debugfs knob being triggered (and successful jobs starting to be dumped) and a later job failing might be very long, so the preceding jobs may not provide much context.

- Besides being mostly interested in the immediately preceding jobs, I think we'd want these to belong to the same scheduling group and VM as the failing job, but this approach will dump them all consecutively, even if they belong to a different open DRM file.

- In my experience writing a similar feature for the Panfrost driver, it's best to wait until users of the driver run into specific bugs before coming up with debug features for them, rather than trying to guess what they will need in advance, because there's a risk such features will never be used and just add cruft to the codebase.

Other than that, the preceding patches look absolutely gorgeous to me, so I think it's best if you resubmit them, and maybe keep patches 3-5 on hold until we run into a bug scenario where they might prove useful.

Cheers,
Adrian

> Allow this as an opt-in through a debugfs file.
>
> Note that the devcoredump infrastructure will discard new dumps if a
> previous dump hasn't been read. A future patch will add support for
> multi-job dumps which will work around this limitation.
>
> Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
> ---
>  drivers/gpu/drm/panthor/panthor_sched.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> index afd644c7d9f1..ea2696c1075a 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -10,6 +10,7 @@
>
>  #include <linux/build_bug.h>
>  #include <linux/clk.h>
> +#include <linux/debugfs.h>
>  #include <linux/delay.h>
>  #include <linux/dma-mapping.h>
>  #include <linux/dma-resv.h>
> @@ -317,6 +318,9 @@ struct panthor_scheduler {
>  		 */
>  		struct list_head stopped_groups;
>  	} reset;
> +
> +	/** @dump_successful_jobs: whether to dump successful jobs through coredumpv */
> +	bool dump_successful_jobs;
>  };
>
>  /**
> @@ -2946,6 +2950,16 @@ queue_run_job(struct drm_sched_job *sched_job)
>  	queue->iface.input->extract = queue->iface.output->extract;
>  	queue->iface.input->insert = job->ringbuf.end;
>
> +	if (sched->dump_successful_jobs) {
> +		struct panthor_core_dump_args core_dump_args = {
> +			.ptdev = ptdev,
> +			.group_vm = job->group->vm,
> +			.group = job->group,
> +		};
> +
> +		panthor_core_dump(&core_dump_args);
> +	}
> +
>  	if (group->csg_id < 0) {
>  		/* If the queue is blocked, we want to keep the timeout running, so we
>  		 * can detect unbounded waits and kill the group when that happens.
> @@ -3609,5 +3623,8 @@ void panthor_sched_debugfs_init(struct drm_minor *minor)
>  	struct panthor_device *ptdev =
>  		container_of(minor->dev, struct panthor_device, base);
>  	struct panthor_scheduler *sched = ptdev->scheduler;
> +
> +	debugfs_create_bool("dump_successful_jobs", 0644, minor->debugfs_root,
> +			    &sched->dump_successful_jobs);
>  }
>  #endif /* CONFIG_DEBUG_FS */
> --
> 2.45.2
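For reference, a rough sketch of how the knob in the quoted patch would be exercised from userspace. The DRI minor index (`dri/0`) and the devcoredump instance name (`devcd1`) are assumptions here; both vary per system.

```shell
# Opt in to dumping successful jobs via the debugfs knob added by this
# patch (requires CONFIG_DEBUG_FS and root).
echo 1 > /sys/kernel/debug/dri/0/dump_successful_jobs

# After a job runs, read the dump out of the devcoredump class device.
cat /sys/class/devcoredump/devcd1/data > panthor_dump.bin

# devcoredump keeps the dump pending until it is consumed; writing to
# the data node dismisses it so a subsequent dump can be captured.
echo 1 > /sys/class/devcoredump/devcd1/data
```

As the commit message notes, devcoredump discards new dumps while a previous one is pending, so with this patch as-is each dump has to be read (or dismissed) before the next job's dump can be taken.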