diff mbox series

[rdma-core,03/27] mlx5: Enable debug functionality for vfio

Message ID 20210720081647.1980-4-yishaih@nvidia.com (mailing list archive)
State Not Applicable
Headers show
Series Introduce mlx5 user space driver over VFIO | expand

Commit Message

Yishai Hadas July 20, 2021, 8:16 a.m. UTC
From: Maor Gottlieb <maorg@nvidia.com>

Usage will be in next patches.

Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 providers/mlx5/mlx5.c | 28 ++++++++++++++--------------
 providers/mlx5/mlx5.h |  4 ++++
 2 files changed, 18 insertions(+), 14 deletions(-)

Comments

Leon Romanovsky July 20, 2021, 8:51 a.m. UTC | #1
On Tue, Jul 20, 2021 at 11:16:23AM +0300, Yishai Hadas wrote:
> From: Maor Gottlieb <maorg@nvidia.com>
> 
> Usage will be in next patches.
> 
> Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> ---
>  providers/mlx5/mlx5.c | 28 ++++++++++++++--------------
>  providers/mlx5/mlx5.h |  4 ++++
>  2 files changed, 18 insertions(+), 14 deletions(-)

Probably, this patch will be needed to be changed after
"Verbs logging API" PR https://github.com/linux-rdma/rdma-core/pull/1030

Thanks
Yishai Hadas July 20, 2021, 9:27 a.m. UTC | #2
On 7/20/2021 11:51 AM, Leon Romanovsky wrote:
> On Tue, Jul 20, 2021 at 11:16:23AM +0300, Yishai Hadas wrote:
>> From: Maor Gottlieb <maorg@nvidia.com>
>>
>> Usage will be in next patches.
>>
>> Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>> ---
>>   providers/mlx5/mlx5.c | 28 ++++++++++++++--------------
>>   providers/mlx5/mlx5.h |  4 ++++
>>   2 files changed, 18 insertions(+), 14 deletions(-)
> Probably, this patch will be needed to be changed after
> "Verbs logging API" PR https://github.com/linux-rdma/rdma-core/pull/1030
>
> Thanks

Well, not really, this patch just reorganizes things inside mlx5 for 
code sharing.

Yishai
Leon Romanovsky July 20, 2021, 12:27 p.m. UTC | #3
On Tue, Jul 20, 2021 at 12:27:46PM +0300, Yishai Hadas wrote:
> On 7/20/2021 11:51 AM, Leon Romanovsky wrote:
> > On Tue, Jul 20, 2021 at 11:16:23AM +0300, Yishai Hadas wrote:
> > > From: Maor Gottlieb <maorg@nvidia.com>
> > > 
> > > Usage will be in next patches.
> > > 
> > > Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
> > > Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> > > ---
> > >   providers/mlx5/mlx5.c | 28 ++++++++++++++--------------
> > >   providers/mlx5/mlx5.h |  4 ++++
> > >   2 files changed, 18 insertions(+), 14 deletions(-)
> > Probably, this patch will be needed to be changed after
> > "Verbs logging API" PR https://github.com/linux-rdma/rdma-core/pull/1030
> > 
> > Thanks
> 
> Well, not really, this patch just reorganizes things inside mlx5 for code
> sharing.

After Gal's PR, I expect to see all mlx5 file/debug logic gone except
some minimal conversion logic to support old legacy variables.

All that get_env_... code will go.

Thanks

> 
> Yishai
>
Yishai Hadas July 20, 2021, 2:57 p.m. UTC | #4
On 7/20/2021 3:27 PM, Leon Romanovsky wrote:
> On Tue, Jul 20, 2021 at 12:27:46PM +0300, Yishai Hadas wrote:
>> On 7/20/2021 11:51 AM, Leon Romanovsky wrote:
>>> On Tue, Jul 20, 2021 at 11:16:23AM +0300, Yishai Hadas wrote:
>>>> From: Maor Gottlieb <maorg@nvidia.com>
>>>>
>>>> Usage will be in next patches.
>>>>
>>>> Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
>>>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>>>> ---
>>>>    providers/mlx5/mlx5.c | 28 ++++++++++++++--------------
>>>>    providers/mlx5/mlx5.h |  4 ++++
>>>>    2 files changed, 18 insertions(+), 14 deletions(-)
>>> Probably, this patch will be needed to be changed after
>>> "Verbs logging API" PR https://github.com/linux-rdma/rdma-core/pull/1030
>>>
>>> Thanks
>> Well, not really, this patch just reorganizes things inside mlx5 for code
>> sharing.
> After Gal's PR, I expect to see all mlx5 file/debug logic gone except
> some minimal conversion logic to support old legacy variables.
>
> All that get_env_... code will go.
>
> Thanks
>
Looking on current VERBs logging PR, it doesn't give a clean path 
conversion from mlx5.

For example it missed a debug granularity per object (e.g. QP, CQ, etc.) 
, in addition it doesn't define a driver specific options as we have 
today in mlx5 (e.g. MLX5_DBG_DR).

I believe that this should be added before going forward with the 
logging PR to enable a clean transition later on.

The transition of mlx5 should preserve current API/semantics (ENV, etc.) 
and is an orthogonal task.

Yishai
Gal Pressman July 21, 2021, 7:05 a.m. UTC | #5
On 20/07/2021 17:57, Yishai Hadas wrote:
> On 7/20/2021 3:27 PM, Leon Romanovsky wrote:
>> On Tue, Jul 20, 2021 at 12:27:46PM +0300, Yishai Hadas wrote:
>>> On 7/20/2021 11:51 AM, Leon Romanovsky wrote:
>>>> On Tue, Jul 20, 2021 at 11:16:23AM +0300, Yishai Hadas wrote:
>>>>> From: Maor Gottlieb <maorg@nvidia.com>
>>>>>
>>>>> Usage will be in next patches.
>>>>>
>>>>> Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
>>>>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>>>>> ---
>>>>>    providers/mlx5/mlx5.c | 28 ++++++++++++++--------------
>>>>>    providers/mlx5/mlx5.h |  4 ++++
>>>>>    2 files changed, 18 insertions(+), 14 deletions(-)
>>>> Probably, this patch will be needed to be changed after
>>>> "Verbs logging API" PR https://github.com/linux-rdma/rdma-core/pull/1030
>>>>
>>>> Thanks
>>> Well, not really, this patch just reorganizes things inside mlx5 for code
>>> sharing.
>> After Gal's PR, I expect to see all mlx5 file/debug logic gone except
>> some minimal conversion logic to support old legacy variables.
>>
>> All that get_env_... code will go.
>>
>> Thanks
>>
> Looking on current VERBs logging PR, it doesn't give a clean path conversion
> from mlx5.
> 
> For example it missed a debug granularity per object (e.g. QP, CQ, etc.) , in
> addition it doesn't define a driver specific options as we have today in mlx5
> (e.g. MLX5_DBG_DR).
> 
> I believe that this should be added before going forward with the logging PR to
> enable a clean transition later on.
> 
> The transition of mlx5 should preserve current API/semantics (ENV, etc.) and is
> an orthogonal task.

Yishai, if you have any more concerns please share them in the PR.. The
discussion there is going on for a while and you've been quiet so I assumed
you're fine with it.

I disagree about needing to support everything that exists in mlx5 today, the
purpose of the generic mechanism is to unify the environment variables, driver
specific options do the opposite. IMO masking out a few prints isn't worth the
divergence.

If the mlx5 provider has backwards compatibility issues it doesn't necessarily
have to use this API, but we can at least covert most existing providers and all
future providers that wish to support such functionality in a unified way.

BTW, why even insist on having backwards compatibility here? Do you actually
have useful code that relies on debug environment variables?
Yishai Hadas July 21, 2021, 7:58 a.m. UTC | #6
On 7/21/2021 10:05 AM, Gal Pressman wrote:
> On 20/07/2021 17:57, Yishai Hadas wrote:
>> On 7/20/2021 3:27 PM, Leon Romanovsky wrote:
>>> On Tue, Jul 20, 2021 at 12:27:46PM +0300, Yishai Hadas wrote:
>>>> On 7/20/2021 11:51 AM, Leon Romanovsky wrote:
>>>>> On Tue, Jul 20, 2021 at 11:16:23AM +0300, Yishai Hadas wrote:
>>>>>> From: Maor Gottlieb <maorg@nvidia.com>
>>>>>>
>>>>>> Usage will be in next patches.
>>>>>>
>>>>>> Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
>>>>>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>>>>>> ---
>>>>>>     providers/mlx5/mlx5.c | 28 ++++++++++++++--------------
>>>>>>     providers/mlx5/mlx5.h |  4 ++++
>>>>>>     2 files changed, 18 insertions(+), 14 deletions(-)
>>>>> Probably, this patch will be needed to be changed after
>>>>> "Verbs logging API" PR https://github.com/linux-rdma/rdma-core/pull/1030
>>>>>
>>>>> Thanks
>>>> Well, not really, this patch just reorganizes things inside mlx5 for code
>>>> sharing.
>>> After Gal's PR, I expect to see all mlx5 file/debug logic gone except
>>> some minimal conversion logic to support old legacy variables.
>>>
>>> All that get_env_... code will go.
>>>
>>> Thanks
>>>
>> Looking on current VERBs logging PR, it doesn't give a clean path conversion
>> from mlx5.
>>
>> For example it missed a debug granularity per object (e.g. QP, CQ, etc.) , in
>> addition it doesn't define a driver specific options as we have today in mlx5
>> (e.g. MLX5_DBG_DR).
>>
>> I believe that this should be added before going forward with the logging PR to
>> enable a clean transition later on.
>>
>> The transition of mlx5 should preserve current API/semantics (ENV, etc.) and is
>> an orthogonal task.
> Yishai, if you have any more concerns please share them in the PR.. The
> discussion there is going on for a while and you've been quiet so I assumed
> you're fine with it.
>
> I disagree about needing to support everything that exists in mlx5 today, the
> purpose of the generic mechanism is to unify the environment variables, driver
> specific options do the opposite. IMO masking out a few prints isn't worth the
> divergence.


The options in mlx5 gives more granularity and supports vendor specific 
options, this may be needed down the road by other vendors as well.

If we come with a new API need to consider such options in the general case.

NP, I'll comment in the logging PR as well.

>
> If the mlx5 provider has backwards compatibility issues it doesn't necessarily
> have to use this API, but we can at least covert most existing providers and all
> future providers that wish to support such functionality in a unified way.


The point was that current suggestion doesn't allow a clean transition 
for mlx5, so we won't use it unless the API will give a matching 
alternative.

> BTW, why even insist on having backwards compatibility here? Do you actually
> have useful code that relies on debug environment variables?

Logging options (env, mask, etc.) are kind of API which need to be 
preserved, it's used in the field for debug purposes, no reason to loose 
granularity and trace.

Yishai
Gal Pressman July 21, 2021, 8:51 a.m. UTC | #7
On 21/07/2021 10:58, Yishai Hadas wrote:
> On 7/21/2021 10:05 AM, Gal Pressman wrote:
>> On 20/07/2021 17:57, Yishai Hadas wrote:
>>> On 7/20/2021 3:27 PM, Leon Romanovsky wrote:
>>>> On Tue, Jul 20, 2021 at 12:27:46PM +0300, Yishai Hadas wrote:
>>>>> On 7/20/2021 11:51 AM, Leon Romanovsky wrote:
>>>>>> On Tue, Jul 20, 2021 at 11:16:23AM +0300, Yishai Hadas wrote:
>>>>>>> From: Maor Gottlieb <maorg@nvidia.com>
>>>>>>>
>>>>>>> Usage will be in next patches.
>>>>>>>
>>>>>>> Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
>>>>>>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>>>>>>> ---
>>>>>>>     providers/mlx5/mlx5.c | 28 ++++++++++++++--------------
>>>>>>>     providers/mlx5/mlx5.h |  4 ++++
>>>>>>>     2 files changed, 18 insertions(+), 14 deletions(-)
>>>>>> Probably, this patch will be needed to be changed after
>>>>>> "Verbs logging API" PR https://github.com/linux-rdma/rdma-core/pull/1030
>>>>>>
>>>>>> Thanks
>>>>> Well, not really, this patch just reorganizes things inside mlx5 for code
>>>>> sharing.
>>>> After Gal's PR, I expect to see all mlx5 file/debug logic gone except
>>>> some minimal conversion logic to support old legacy variables.
>>>>
>>>> All that get_env_... code will go.
>>>>
>>>> Thanks
>>>>
>>> Looking on current VERBs logging PR, it doesn't give a clean path conversion
>>> from mlx5.
>>>
>>> For example it missed a debug granularity per object (e.g. QP, CQ, etc.) , in
>>> addition it doesn't define a driver specific options as we have today in mlx5
>>> (e.g. MLX5_DBG_DR).
>>>
>>> I believe that this should be added before going forward with the logging PR to
>>> enable a clean transition later on.
>>>
>>> The transition of mlx5 should preserve current API/semantics (ENV, etc.) and is
>>> an orthogonal task.
>> Yishai, if you have any more concerns please share them in the PR.. The
>> discussion there is going on for a while and you've been quiet so I assumed
>> you're fine with it.
>>
>> I disagree about needing to support everything that exists in mlx5 today, the
>> purpose of the generic mechanism is to unify the environment variables, driver
>> specific options do the opposite. IMO masking out a few prints isn't worth the
>> divergence.
> 
> 
> The options in mlx5 gives more granularity and supports vendor specific options,
> this may be needed down the road by other vendors as well.
> 
> If we come with a new API need to consider such options in the general case.
> 
> NP, I'll comment in the logging PR as well.
> 
>>
>> If the mlx5 provider has backwards compatibility issues it doesn't necessarily
>> have to use this API, but we can at least covert most existing providers and all
>> future providers that wish to support such functionality in a unified way.
> 
> 
> The point was that current suggestion doesn't allow a clean transition for mlx5,
> so we won't use it unless the API will give a matching alternative.
> 
>> BTW, why even insist on having backwards compatibility here? Do you actually
>> have useful code that relies on debug environment variables?
> 
> Logging options (env, mask, etc.) are kind of API which need to be preserved,
> it's used in the field for debug purposes, no reason to loose granularity and
> trace.

It's used in the field by *people*, which unlike legacy code can learn to use
the new debug env variables, they don't need backwards compatibility (the same
as users of debugfs).
I addressed the granularity issue in the PR.
diff mbox series

Patch

diff --git a/providers/mlx5/mlx5.c b/providers/mlx5/mlx5.c
index 46d7748..1abaa8c 100644
--- a/providers/mlx5/mlx5.c
+++ b/providers/mlx5/mlx5.c
@@ -583,7 +583,7 @@  static int get_total_uuars(int page_size)
 	return size;
 }
 
-static void open_debug_file(struct mlx5_context *ctx)
+void mlx5_open_debug_file(FILE **dbg_fp)
 {
 	char *env;
 	FILE *default_dbg_fp = NULL;
@@ -594,25 +594,25 @@  static void open_debug_file(struct mlx5_context *ctx)
 
 	env = getenv("MLX5_DEBUG_FILE");
 	if (!env) {
-		ctx->dbg_fp = default_dbg_fp;
+		*dbg_fp = default_dbg_fp;
 		return;
 	}
 
-	ctx->dbg_fp = fopen(env, "aw+");
-	if (!ctx->dbg_fp) {
-		ctx->dbg_fp = default_dbg_fp;
-		mlx5_err(ctx->dbg_fp, "Failed opening debug file %s\n", env);
+	*dbg_fp = fopen(env, "aw+");
+	if (!*dbg_fp) {
+		*dbg_fp = default_dbg_fp;
+		mlx5_err(*dbg_fp, "Failed opening debug file %s\n", env);
 		return;
 	}
 }
 
-static void close_debug_file(struct mlx5_context *ctx)
+void mlx5_close_debug_file(FILE *dbg_fp)
 {
-	if (ctx->dbg_fp && ctx->dbg_fp != stderr)
-		fclose(ctx->dbg_fp);
+	if (dbg_fp && dbg_fp != stderr)
+		fclose(dbg_fp);
 }
 
-static void set_debug_mask(void)
+void mlx5_set_debug_mask(void)
 {
 	char *env;
 
@@ -2036,7 +2036,7 @@  static int get_uar_info(struct mlx5_device *mdev,
 
 static void mlx5_uninit_context(struct mlx5_context *context)
 {
-	close_debug_file(context);
+	mlx5_close_debug_file(context->dbg_fp);
 
 	verbs_uninit_context(&context->ibv_ctx);
 	free(context);
@@ -2056,8 +2056,8 @@  static struct mlx5_context *mlx5_init_context(struct ibv_device *ibdev,
 	if (!context)
 		return NULL;
 
-	open_debug_file(context);
-	set_debug_mask();
+	mlx5_open_debug_file(&context->dbg_fp);
+	mlx5_set_debug_mask();
 	set_freeze_on_error();
 	if (gethostname(context->hostname, sizeof(context->hostname)))
 		strcpy(context->hostname, "host_unknown");
@@ -2377,7 +2377,7 @@  static void mlx5_free_context(struct ibv_context *ibctx)
 		       page_size);
 	if (context->clock_info_page)
 		munmap((void *)context->clock_info_page, page_size);
-	close_debug_file(context);
+	mlx5_close_debug_file(context->dbg_fp);
 	clean_dyn_uars(ibctx);
 	reserved_qpn_blks_free(context);
 
diff --git a/providers/mlx5/mlx5.h b/providers/mlx5/mlx5.h
index 3862007..7436bc8 100644
--- a/providers/mlx5/mlx5.h
+++ b/providers/mlx5/mlx5.h
@@ -992,6 +992,10 @@  static inline struct mlx5_flow *to_mflow(struct ibv_flow *flow_id)
 
 bool is_mlx5_dev(struct ibv_device *device);
 
+void mlx5_open_debug_file(FILE **dbg_fp);
+void mlx5_close_debug_file(FILE *dbg_fp);
+void mlx5_set_debug_mask(void);
+
 int mlx5_alloc_buf(struct mlx5_buf *buf, size_t size, int page_size);
 void mlx5_free_buf(struct mlx5_buf *buf);
 int mlx5_alloc_buf_contig(struct mlx5_context *mctx, struct mlx5_buf *buf,