diff mbox series

[ndctl] papr: Check for command type in papr_xlat_firmware_status()

Message ID 20200721114326.305790-1-vaibhav@linux.ibm.com (mailing list archive)
State Accepted
Commit 8a5b27ff18a561106926d108834b15249a1b102c
Headers show
Series [ndctl] papr: Check for command type in papr_xlat_firmware_status() | expand

Commit Message

Vaibhav Jain July 21, 2020, 11:43 a.m. UTC
We recently discovered intermittent failures while reading label-area
of PAPR-NVDimms and the command 'read-labels' would in such a case
generated empty output like below:

$ sudo ndctl read-labels -j nmem0
[
]
read 0 nmem

Upon investigation we found that this was caused by a spurious error
code returned from ndctl_cmd_submit_xlat() when its called from
ndctl_dimm_read_label_extent() while trying to read the label-area
contents of the NVDIMM.

Digging further it was relieved that ndctl_cmd_submit_xlat() would
always call papr_xlat_firmware_status() via pointer
'papr_dimm_ops->xlat_firmware_status' to retrieve translated firmware
status for all ndctl_cmds even though they arent really PAPR PDSM
commands.

In this case ndctl_cmd->type == ND_CMD_GET_CONFIG_DATA and was
represented by type 'struct nd_cmd_get_config_data_hdr' and
papr_xlat_firmware_status() incorrectly assumed it to be of type
'struct nd_pkg_pdsm' and wrongly dereferenced it returning an invalid
value.

A proper fix for this would probably need introducing a new ndctl_cmd
callback like 'ndctl_cmd.get_xlat_firmware_status' similar to one
introduced in [1]. However such a change could be disruptive, hence
the patch introduces a small workaround in papr_xlat_firmware_status()
that checks if the 'struct ndctl_cmd *' provided if of correct type
CMD_CALL and if not then it ignores it and return '0'

References:
[1]: commit fa754dd8acdb ("ndctl/dimm: Rework dimm command status
reporting")

Fixes: 151d2876c49e ("papr: Add scaffolding to issue and handle PDSM requests")
Reported-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
 ndctl/lib/papr.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

Comments

Verma, Vishal L July 21, 2020, 6:27 p.m. UTC | #1
On Tue, 2020-07-21 at 17:13 +0530, Vaibhav Jain wrote:
> We recently discovered intermittent failures while reading label-area
> of PAPR-NVDimms and the command 'read-labels' would in such a case
> generated empty output like below:
> 
> $ sudo ndctl read-labels -j nmem0
> [
> ]
> read 0 nmem
> 
> Upon investigation we found that this was caused by a spurious error
> code returned from ndctl_cmd_submit_xlat() when its called from
> ndctl_dimm_read_label_extent() while trying to read the label-area
> contents of the NVDIMM.
> 
> Digging further it was relieved that ndctl_cmd_submit_xlat() would
> always call papr_xlat_firmware_status() via pointer
> 'papr_dimm_ops->xlat_firmware_status' to retrieve translated firmware
> status for all ndctl_cmds even though they arent really PAPR PDSM
> commands.
> 
> In this case ndctl_cmd->type == ND_CMD_GET_CONFIG_DATA and was
> represented by type 'struct nd_cmd_get_config_data_hdr' and
> papr_xlat_firmware_status() incorrectly assumed it to be of type
> 'struct nd_pkg_pdsm' and wrongly dereferenced it returning an invalid
> value.
> 
> A proper fix for this would probably need introducing a new ndctl_cmd
> callback like 'ndctl_cmd.get_xlat_firmware_status' similar to one
> introduced in [1]. However such a change could be disruptive, hence
> the patch introduces a small workaround in papr_xlat_firmware_status()
> that checks if the 'struct ndctl_cmd *' provided if of correct type
> CMD_CALL and if not then it ignores it and return '0'
> 
> References:
> [1]: commit fa754dd8acdb ("ndctl/dimm: Rework dimm command status
> reporting")
> 
> Fixes: 151d2876c49e ("papr: Add scaffolding to issue and handle PDSM requests")
> Reported-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> ---
>  ndctl/lib/papr.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/ndctl/lib/papr.c b/ndctl/lib/papr.c
> index d9ce253369b3..63561f8f9797 100644
> --- a/ndctl/lib/papr.c
> +++ b/ndctl/lib/papr.c
> @@ -56,9 +56,7 @@ static u32 papr_get_firmware_status(struct ndctl_cmd *cmd)
>  
>  static int papr_xlat_firmware_status(struct ndctl_cmd *cmd)
>  {
> -	const struct nd_pkg_pdsm *pcmd = to_pdsm(cmd);
> -
> -	return pcmd->cmd_status;
> +	return (cmd->type == ND_CMD_CALL) ? to_pdsm(cmd)->cmd_status : 0;

Is this correct? -- for non ND_CMD_CALL commands this will always return a 0,
and it seems like you will lose any error state for commands
like ND_CMD_SET_CONFIG_DATA.

>  }
>  
>  /* Verify if the given command is supported and valid */
Vaibhav Jain July 22, 2020, 1:59 a.m. UTC | #2
Vishal Verma <vishal.l.verma@intel.com> writes:

<snip>
>>  static int papr_xlat_firmware_status(struct ndctl_cmd *cmd)
>>  {
>> -	const struct nd_pkg_pdsm *pcmd = to_pdsm(cmd);
>> -
>> -	return pcmd->cmd_status;
>> +	return (cmd->type == ND_CMD_CALL) ? to_pdsm(cmd)->cmd_status : 0;
>
> Is this correct? -- for non ND_CMD_CALL commands this will always return a 0,
> and it seems like you will lose any error state for commands
> like ND_CMD_SET_CONFIG_DATA.
This behaviour is similar to what ndctl_cmd_xlat_firmware_status() does
when corresponding dimm-ops are missing the 'xlat_firmware_status'
callback.

Also ndctl_cmd_submit_xlat() returns the rc from ndctl_cmd_submit()
in case ndctl_cmd_xlat_firmware_status() returns '0', which corresponds
to 'ndctl_cmd.status' field. So any error codes
returned from ndctl_cmd_submit() are still returned back to the caller
even though papr_xlat_firmware_status() returned '0'. 
>
>>  }
>>  
>>  /* Verify if the given command is supported and valid */
>
>
Verma, Vishal L July 22, 2020, 4:45 a.m. UTC | #3
On Wed, 2020-07-22 at 07:29 +0530, Vaibhav Jain wrote:
> Vishal Verma <vishal.l.verma@intel.com> writes:
> 
> <snip>
> > >  static int papr_xlat_firmware_status(struct ndctl_cmd *cmd)
> > >  {
> > > -	const struct nd_pkg_pdsm *pcmd = to_pdsm(cmd);
> > > -
> > > -	return pcmd->cmd_status;
> > > +	return (cmd->type == ND_CMD_CALL) ? to_pdsm(cmd)->cmd_status :
> > > 0;
> > 
> > Is this correct? -- for non ND_CMD_CALL commands this will always
> > return a 0,
> > and it seems like you will lose any error state for commands
> > like ND_CMD_SET_CONFIG_DATA.
> This behaviour is similar to what ndctl_cmd_xlat_firmware_status()
> does
> when corresponding dimm-ops are missing the 'xlat_firmware_status'
> callback.
> 
> Also ndctl_cmd_submit_xlat() returns the rc from ndctl_cmd_submit()
> in case ndctl_cmd_xlat_firmware_status() returns '0', which
> corresponds
> to 'ndctl_cmd.status' field. So any error codes
> returned from ndctl_cmd_submit() are still returned back to the caller
> even though papr_xlat_firmware_status() returned '0'. 

Ah yes you're right. I'll queue this up for v69 and we can do the more
involved fix in the next cycle.

Thanks!
-Vishal
diff mbox series

Patch

diff --git a/ndctl/lib/papr.c b/ndctl/lib/papr.c
index d9ce253369b3..63561f8f9797 100644
--- a/ndctl/lib/papr.c
+++ b/ndctl/lib/papr.c
@@ -56,9 +56,7 @@  static u32 papr_get_firmware_status(struct ndctl_cmd *cmd)
 
 static int papr_xlat_firmware_status(struct ndctl_cmd *cmd)
 {
-	const struct nd_pkg_pdsm *pcmd = to_pdsm(cmd);
-
-	return pcmd->cmd_status;
+	return (cmd->type == ND_CMD_CALL) ? to_pdsm(cmd)->cmd_status : 0;
 }
 
 /* Verify if the given command is supported and valid */