diff mbox series

[v2,3/5] drm/i915/guc: Provide debugfs for log relay sub-buf info

Message ID 20221206092100.274195-4-alan.previn.teres.alexis@intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915/guc: Update GuC relay logging debugfs | expand

Commit Message

Alan Previn Dec. 6, 2022, 9:20 a.m. UTC
In order to provide alignment between IGT intel_guc_logger tool and
i915 kernel's guc log relay channels without requiring updating both
repositories everytime a sizing change is made, provide that info
via debugfs files.

Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    | 10 ++++--
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.h    |  2 ++
 .../drm/i915/gt/uc/intel_guc_log_debugfs.c    | 34 +++++++++++++++++++
 3 files changed, 44 insertions(+), 2 deletions(-)

Comments

Dixit, Ashutosh Dec. 7, 2022, 4:43 p.m. UTC | #1
On Tue, 06 Dec 2022 01:20:58 -0800, Alan Previn wrote:
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log_debugfs.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log_debugfs.c
> index ddfbe334689f..27756640338e 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log_debugfs.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log_debugfs.c
> @@ -105,6 +105,38 @@ DEFINE_SIMPLE_ATTRIBUTE(guc_log_level_fops,
>			guc_log_level_get, guc_log_level_set,
>			"%lld\n");
>
> +static int guc_log_relay_subbuf_size_get(void *data, u64 *val)
> +{
> +	struct intel_guc_log *log = data;
> +
> +	if (!log->vma)
> +		return -ENODEV;

For the record, from the other email thread, the issue here is whether this
check is needed.

Also, the issue is what happens if the relay is open and we unload the
module, what happens?

> +
> +	*val = (u64)intel_guc_log_size(log);

Don't cast, shouldn't need it.

> +
> +	return 0;
> +}
> +
> +DEFINE_SIMPLE_ATTRIBUTE(guc_log_relay_subbuf_size_fops,
> +			guc_log_relay_subbuf_size_get, NULL,
> +			"%lld\n");
> +
> +static int guc_log_relay_subbuf_count_get(void *data, u64 *val)
> +{
> +	struct intel_guc_log *log = data;
> +
> +	if (!log->vma)
> +		return -ENODEV;

Same for this check too.

> +
> +	*val = intel_guc_log_relay_subbuf_count(log);
> +
> +	return 0;
> +}
> +
> +DEFINE_SIMPLE_ATTRIBUTE(guc_log_relay_subbuf_count_fops,
> +			guc_log_relay_subbuf_count_get, NULL,
> +			"%lld\n");
> +
>  static int guc_log_relay_open(struct inode *inode, struct file *file)
>  {
>	struct intel_guc_log *log = inode->i_private;
> @@ -166,6 +198,8 @@ void intel_guc_log_debugfs_register(struct intel_guc_log *log,
>		{ "guc_load_err_log_dump", &guc_load_err_log_dump_fops, NULL },
>		{ "guc_log_level", &guc_log_level_fops, NULL },
>		{ "guc_log_relay", &guc_log_relay_fops, NULL },
> +		{ "guc_log_relay_subbuf_size", &guc_log_relay_subbuf_size_fops, NULL },
> +		{ "guc_log_relay_subbuf_count", &guc_log_relay_subbuf_count_fops, NULL },
>	};
>
>	if (!intel_guc_is_supported(log_to_guc(log)))

Thanks.
--
Ashutosh
Alan Previn March 9, 2023, 11:29 p.m. UTC | #2
Finally got some time to relook at this series. Responses.
I'll re-rev this and re-connect with Ashutosh offline considering how long i've been silent on this.


On Wed, 2022-12-07 at 08:43 -0800, Dixit, Ashutosh wrote:
> On Tue, 06 Dec 2022 01:20:58 -0800, Alan Previn wrote:
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log_debugfs.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log_debugfs.c
> > index ddfbe334689f..27756640338e 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log_debugfs.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log_debugfs.c
> > @@ -105,6 +105,38 @@ DEFINE_SIMPLE_ATTRIBUTE(guc_log_level_fops,
> > 			guc_log_level_get, guc_log_level_set,
> > 			"%lld\n");
> > 
> > +static int guc_log_relay_subbuf_size_get(void *data, u64 *val)
> > +{
> > +	struct intel_guc_log *log = data;
> > +
> > +	if (!log->vma)
> > +		return -ENODEV;
> 
> For the record, from the other email thread, the issue here is whether this
> check is needed.
> 
> Also, the issue is what happens if the relay is open and we unload the
> module, what happens?
> 
I'll retest this - but I clearly remember that if the user space app was stil holding
onto the debugfs handle, the i915 unload would go through most of the driver unload /
unregister steps, while the app doesnt get any signals but if the app were to close that
handle after that, (guc_log_relay_ctl_release gets called), we do get invalid ptr access
in kernel. Take note the logger tool runs with sudo. That said something "like" above check
is required but perhaps hanging off a still-valid ptr (like i915->foo - maybe gt-struct validity
- but needs something that is explicitly cleared on unload, not left around with stale ptrs.

>
Alan Previn March 14, 2023, 10:09 p.m. UTC | #3
On Thu, 2023-03-09 at 15:29 -0800, Teres Alexis, Alan Previn wrote:
> > > 
alan:snip

> > > +static int guc_log_relay_subbuf_size_get(void *data, u64 *val)
> > > +{
> > > +	struct intel_guc_log *log = data;
> > > +
> > > +	if (!log->vma)
> > > +		return -ENODEV;
> > 
> > For the record, from the other email thread, the issue here is whether this
> > check is needed.
> > 
> > Also, the issue is what happens if the relay is open and we unload the
> > module, what happens?
> > 
> I'll retest this - but I clearly remember that if the user space app was stil holding
> onto the debugfs handle, the i915 unload would go through most of the driver unload /
> unregister steps, while the app doesnt get any signals but if the app were to close that
> handle after that, (guc_log_relay_ctl_release gets called), we do get invalid ptr access
> in kernel. Take note the logger tool runs with sudo. That said something "like" above check
> is required but perhaps hanging off a still-valid ptr (like i915->foo - maybe gt-struct validity
> - but needs something that is explicitly cleared on unload, not left around with stale ptrs.
> 

An update on this above after some digging / testing : I believe we dont we need to check
for "log->vma" validity as you had suspected. However, I did find other legacy debugfs
functions for relay logging that DID check for it - so I must have been trying to maintain
consistency. That said, i will probably remove the check from other legacy functions as well
so they are all consistently not checking for it since its not required.

However, in the process of testing, i found an issue when connecting relay logger tool
and unloading driver. On one hand this is a debugfs interface and we may be able to fix that
later as the use-case doesnt really expect used to run this tool while unloading the driver.
On the other hand some of my colleagues did stress that crashing in kernel is something we cannot
igore and knowably allow. Considering the fact that relay logging tool is not working at all
upstream today, this patch could "unmask" that error. Finally, i too find myself, as part of testing /
debugging, occasionally forgetting to stop the relay logger tool when unloading and i cant even do
simple soft-reboot because of how bad things get in the i915. Given all considerations, I'm compelled
to fix that properly now. Previously, the majority of the time taken for this series was mostly
tied to the intel_guc_logger side of the effort, not the kernel changes. But for this fix, i think
more time + changes will be required on the kernel side.
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
index 2fa952916120..6e880d9f42d4 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
@@ -141,7 +141,7 @@  u32 intel_guc_log_section_size_capture(struct intel_guc_log *log)
 	return log->sizes[GUC_LOG_SECTIONS_CAPTURE].bytes;
 }
 
-static u32 intel_guc_log_size(struct intel_guc_log *log)
+u32 intel_guc_log_size(struct intel_guc_log *log)
 {
 	/*
 	 *  GuC Log buffer Layout:
@@ -170,6 +170,12 @@  static u32 intel_guc_log_size(struct intel_guc_log *log)
 		intel_guc_log_section_size_capture(log);
 }
 
+#define GUC_LOG_RELAY_SUBBUF_COUNT 8
+u32 intel_guc_log_relay_subbuf_count(struct intel_guc_log *log)
+{
+	return GUC_LOG_RELAY_SUBBUF_COUNT;
+}
+
 /**
  * DOC: GuC firmware log
  *
@@ -543,7 +549,7 @@  static int guc_log_relay_create(struct intel_guc_log *log)
 	 * latency, for consuming the logs from relay. Also doesn't take
 	 * up too much memory.
 	 */
-	n_subbufs = 8;
+	n_subbufs = intel_guc_log_relay_subbuf_count(log);
 
 	guc_log_relay_chan = relay_open("guc_log",
 					dev_priv->drm.primary->debugfs_root,
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
index 02127703be80..c981eb8f4990 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
@@ -81,6 +81,8 @@  unsigned int intel_guc_get_log_buffer_size(struct intel_guc_log *log,
 size_t intel_guc_get_log_buffer_offset(struct intel_guc_log *log, enum guc_log_buffer_type type);
 int intel_guc_log_create(struct intel_guc_log *log);
 void intel_guc_log_destroy(struct intel_guc_log *log);
+u32 intel_guc_log_size(struct intel_guc_log *log);
+u32 intel_guc_log_relay_subbuf_count(struct intel_guc_log *log);
 
 int intel_guc_log_set_level(struct intel_guc_log *log, u32 level);
 bool intel_guc_log_relay_created(const struct intel_guc_log *log);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log_debugfs.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log_debugfs.c
index ddfbe334689f..27756640338e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log_debugfs.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log_debugfs.c
@@ -105,6 +105,38 @@  DEFINE_SIMPLE_ATTRIBUTE(guc_log_level_fops,
 			guc_log_level_get, guc_log_level_set,
 			"%lld\n");
 
+static int guc_log_relay_subbuf_size_get(void *data, u64 *val)
+{
+	struct intel_guc_log *log = data;
+
+	if (!log->vma)
+		return -ENODEV;
+
+	*val = (u64)intel_guc_log_size(log);
+
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(guc_log_relay_subbuf_size_fops,
+			guc_log_relay_subbuf_size_get, NULL,
+			"%lld\n");
+
+static int guc_log_relay_subbuf_count_get(void *data, u64 *val)
+{
+	struct intel_guc_log *log = data;
+
+	if (!log->vma)
+		return -ENODEV;
+
+	*val = intel_guc_log_relay_subbuf_count(log);
+
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(guc_log_relay_subbuf_count_fops,
+			guc_log_relay_subbuf_count_get, NULL,
+			"%lld\n");
+
 static int guc_log_relay_open(struct inode *inode, struct file *file)
 {
 	struct intel_guc_log *log = inode->i_private;
@@ -166,6 +198,8 @@  void intel_guc_log_debugfs_register(struct intel_guc_log *log,
 		{ "guc_load_err_log_dump", &guc_load_err_log_dump_fops, NULL },
 		{ "guc_log_level", &guc_log_level_fops, NULL },
 		{ "guc_log_relay", &guc_log_relay_fops, NULL },
+		{ "guc_log_relay_subbuf_size", &guc_log_relay_subbuf_size_fops, NULL },
+		{ "guc_log_relay_subbuf_count", &guc_log_relay_subbuf_count_fops, NULL },
 	};
 
 	if (!intel_guc_is_supported(log_to_guc(log)))