[2/3] fpga manager: xilinx-spi: provide better diagnostics on programming failure

Message ID 20200817165911.32589-2-luca@lucaceresoli.net (mailing list archive)
State Superseded
Series [1/3] fpga manager: xilinx-spi: remove stray comment

Commit Message

Luca Ceresoli Aug. 17, 2020, 4:59 p.m. UTC
When the DONE pin does not go high after programming to confirm programming
success, the INIT_B pin provides some info on the reason. Use it if
available to provide a more explanatory error message.

Signed-off-by: Luca Ceresoli <luca@lucaceresoli.net>
---
 drivers/fpga/xilinx-spi.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Tom Rix Aug. 17, 2020, 6:15 p.m. UTC | #1
The other two patches are fine.

On 8/17/20 9:59 AM, Luca Ceresoli wrote:
> When the DONE pin does not go high after programming to confirm programming
> success, the INIT_B pin provides some info on the reason. Use it if
> available to provide a more explanatory error message.
>
> Signed-off-by: Luca Ceresoli <luca@lucaceresoli.net>
> ---
>  drivers/fpga/xilinx-spi.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/fpga/xilinx-spi.c b/drivers/fpga/xilinx-spi.c
> index 502fae0d1d85..2aa942bb1114 100644
> --- a/drivers/fpga/xilinx-spi.c
> +++ b/drivers/fpga/xilinx-spi.c
> @@ -169,7 +169,16 @@ static int xilinx_spi_write_complete(struct fpga_manager *mgr,
>  			return xilinx_spi_apply_cclk_cycles(conf);
>  	}
>  
> -	dev_err(&mgr->dev, "Timeout after config data transfer.\n");
> +	if (conf->init_b) {
> +		int init_b_asserted = gpiod_get_value(conf->init_b);

gpiod_get_value() can fail, so the first statement may need to be split.

init_b_asserted < 0 ? "invalid device"

As the if-else statement is getting complicated, embedding the ?: makes this
hard to read. 'if, else if, else' would be better.

> +
> +		dev_err(&mgr->dev,
> +			init_b_asserted ? "CRC error or invalid device\n"
> +			: "Missing sync word or incomplete bitstream\n");
> +	} else {
> +		dev_err(&mgr->dev, "Timeout after config data transfer.\n");
patch 3 removes '.' s , and you just added one back in ?
> +	}
> +
>  	return -ETIMEDOUT;
>  }
>  

Reviewed-by: Tom Rix <trix@redhat.com>
Luca Ceresoli Aug. 18, 2020, 10:20 a.m. UTC | #2
[a question for GPIO maintainers below]

Hi Tom,

thanks for your review!

On 17/08/20 20:15, Tom Rix wrote:
> The other two patches are fine.
> 
> On 8/17/20 9:59 AM, Luca Ceresoli wrote:
>> When the DONE pin does not go high after programming to confirm programming
>> success, the INIT_B pin provides some info on the reason. Use it if
>> available to provide a more explanatory error message.
>>
>> Signed-off-by: Luca Ceresoli <luca@lucaceresoli.net>
>> ---
>>  drivers/fpga/xilinx-spi.c | 11 ++++++++++-
>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/fpga/xilinx-spi.c b/drivers/fpga/xilinx-spi.c
>> index 502fae0d1d85..2aa942bb1114 100644
>> --- a/drivers/fpga/xilinx-spi.c
>> +++ b/drivers/fpga/xilinx-spi.c
>> @@ -169,7 +169,16 @@ static int xilinx_spi_write_complete(struct fpga_manager *mgr,
>>  			return xilinx_spi_apply_cclk_cycles(conf);
>>  	}
>>  
>> -	dev_err(&mgr->dev, "Timeout after config data transfer.\n");
>> +	if (conf->init_b) {
>> +		int init_b_asserted = gpiod_get_value(conf->init_b);
> 
> gpiod_get_value can fail. So maybe need split the first statement.
> 
> init_b_asserted < 0 ? "invalid device"
> 
> As the if-else statement is getting complicated, embedding the ? : makes this hard to read.  'if,else if, else' would be better.

Thanks for the heads up. However I'm not sure which is the best thing to
do here.

First, I've been reading the gpiolib code after your email and yes, the
gpiolib code _could_ return runtime errors received from the gpiochip
driver, even though the docs state:

> The get/set calls do not return errors because “invalid GPIO”
> should have been reported earlier from gpiod_direction_*().
(https://www.kernel.org/doc/html/latest/driver-api/gpio/consumer.html)

On the other hand there are plenty of calls to gpiod_get/set_value in
the kernel that don't check for error values. I guess this is because
failures getting/setting a GPIO are very uncommon (perhaps impossible
with platform GPIO).

When a GPIO get/set operation does fail, I'm not sure adding thousands
of error-checking code lines in hundreds of drivers is the best way to
go. I feel like we should have a single, noisy dev_err() in the error
path in gpiolib, but I was surprised not to find one [1].

Linus, Bartosz, what's your opinion? Should all drivers check for errors
after every gpiod_[sg]et_value*() call?

>> +		dev_err(&mgr->dev,
>> +			init_b_asserted ? "CRC error or invalid device\n"
>> +			: "Missing sync word or incomplete bitstream\n");
>> +	} else {
>> +		dev_err(&mgr->dev, "Timeout after config data transfer.\n");
> patch 3 removes '.' s , and you just added one back in ?

Here I'm only changing indentation of this line. But OK, this is
misleading, so I'll swap patches 2 and 3 in the next patch iteration to
avoid confusion.

[1]
https://elixir.bootlin.com/linux/v5.8/source/drivers/gpio/gpiolib.c#L3646
Tom Rix Aug. 18, 2020, 2:21 p.m. UTC | #3
On 8/18/20 3:20 AM, Luca Ceresoli wrote:
> [a question for GPIO maintainers below]
>
> Hi Tom,
>
> thanks for your review!
>
> On 17/08/20 20:15, Tom Rix wrote:
>> The other two patches are fine.
>>
>> On 8/17/20 9:59 AM, Luca Ceresoli wrote:
>>> When the DONE pin does not go high after programming to confirm programming
>>> success, the INIT_B pin provides some info on the reason. Use it if
>>> available to provide a more explanatory error message.
>>>
>>> Signed-off-by: Luca Ceresoli <luca@lucaceresoli.net>
>>> ---
>>>  drivers/fpga/xilinx-spi.c | 11 ++++++++++-
>>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/fpga/xilinx-spi.c b/drivers/fpga/xilinx-spi.c
>>> index 502fae0d1d85..2aa942bb1114 100644
>>> --- a/drivers/fpga/xilinx-spi.c
>>> +++ b/drivers/fpga/xilinx-spi.c
>>> @@ -169,7 +169,16 @@ static int xilinx_spi_write_complete(struct fpga_manager *mgr,
>>>  			return xilinx_spi_apply_cclk_cycles(conf);
>>>  	}
>>>  
>>> -	dev_err(&mgr->dev, "Timeout after config data transfer.\n");
>>> +	if (conf->init_b) {
>>> +		int init_b_asserted = gpiod_get_value(conf->init_b);
>> gpiod_get_value can fail. So maybe need split the first statement.
>>
>> init_b_asserted < 0 ? "invalid device"
>>
>> As the if-else statement is getting complicated, embedding the ? : makes this hard to read.  'if,else if, else' would be better.
> Thanks for the heads up. However I'm not sure which is the best thing to
> do here.
>
> First, I've been reading the libgpiod code after your email and yes, the
> libgpiod code _could_ return runtime errors received from the gpiochip
> driver, even though the docs state:
>
>> The get/set calls do not return errors because “invalid GPIO”
>> should have been reported earlier from gpiod_direction_*().
> (https://www.kernel.org/doc/html/latest/driver-api/gpio/consumer.html)
>
> On the other hand there are plenty of calls to gpiod_get/set_value in
> the kernel that don't check for error values. I guess this is because
> failures getting/setting a GPIO are very uncommon (perhaps impossible
> with platform GPIO).
>
> When still a GPIO get/set operation fails I'm not sure adding thousands
> of error-checking code lines in hundreds of drivers is the best way to
> go. I feel like we should have a unique, noisy dev_err() in the error
> path in libgpio but I was surprised in not finding any [1].
>
> Linus, Bartosz, what's your opinion? Should all drivers check for errors
> after every gpiod_[sg]et_value*() call?

My opinion is that you know the driver / hw is in a bad state and you
are trying to convey useful information. So you should be as careful
as possible and not assume gpio did not fail.

>
>>> +		dev_err(&mgr->dev,
>>> +			init_b_asserted ? "CRC error or invalid device\n"
>>> +			: "Missing sync word or incomplete bitstream\n");
>>> +	} else {
>>> +		dev_err(&mgr->dev, "Timeout after config data transfer.\n");
>> patch 3 removes '.' s , and you just added one back in ?
> Here I'm only changing indentation of this line. But OK, this is
> misleading, so I'll swap patches 2 and 3 in the next patch iteration to
> avoid confusion.
Maybe just remove the '.' at the same time and/or collapse 2&3 into a single patch.
>
> [1]
> https://elixir.bootlin.com/linux/v5.8/source/drivers/gpio/gpiolib.c#L3646
>
Luca Ceresoli Aug. 19, 2020, 4:32 p.m. UTC | #4
On 18/08/20 16:21, Tom Rix wrote:
> 
> On 8/18/20 3:20 AM, Luca Ceresoli wrote:
>> [a question for GPIO maintainers below]
>>
>> Hi Tom,
>>
>> thanks for your review!
>>
>> On 17/08/20 20:15, Tom Rix wrote:
>>> The other two patches are fine.
>>>
>>> On 8/17/20 9:59 AM, Luca Ceresoli wrote:
>>>> When the DONE pin does not go high after programming to confirm programming
>>>> success, the INIT_B pin provides some info on the reason. Use it if
>>>> available to provide a more explanatory error message.
>>>>
>>>> Signed-off-by: Luca Ceresoli <luca@lucaceresoli.net>
>>>> ---
>>>>  drivers/fpga/xilinx-spi.c | 11 ++++++++++-
>>>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/fpga/xilinx-spi.c b/drivers/fpga/xilinx-spi.c
>>>> index 502fae0d1d85..2aa942bb1114 100644
>>>> --- a/drivers/fpga/xilinx-spi.c
>>>> +++ b/drivers/fpga/xilinx-spi.c
>>>> @@ -169,7 +169,16 @@ static int xilinx_spi_write_complete(struct fpga_manager *mgr,
>>>>  			return xilinx_spi_apply_cclk_cycles(conf);
>>>>  	}
>>>>  
>>>> -	dev_err(&mgr->dev, "Timeout after config data transfer.\n");
>>>> +	if (conf->init_b) {
>>>> +		int init_b_asserted = gpiod_get_value(conf->init_b);
>>> gpiod_get_value can fail. So maybe need split the first statement.
>>>
>>> init_b_asserted < 0 ? "invalid device"
>>>
>>> As the if-else statement is getting complicated, embedding the ? : makes this hard to read.  'if,else if, else' would be better.
>> Thanks for the heads up. However I'm not sure which is the best thing to
>> do here.
>>
>> First, I've been reading the libgpiod code after your email and yes, the
>> libgpiod code _could_ return runtime errors received from the gpiochip
>> driver, even though the docs state:
>>
>>> The get/set calls do not return errors because “invalid GPIO”
>>> should have been reported earlier from gpiod_direction_*().
>> (https://www.kernel.org/doc/html/latest/driver-api/gpio/consumer.html)
>>
>> On the other hand there are plenty of calls to gpiod_get/set_value in
>> the kernel that don't check for error values. I guess this is because
>> failures getting/setting a GPIO are very uncommon (perhaps impossible
>> with platform GPIO).
>>
>> When still a GPIO get/set operation fails I'm not sure adding thousands
>> of error-checking code lines in hundreds of drivers is the best way to
>> go. I feel like we should have a unique, noisy dev_err() in the error
>> path in libgpio but I was surprised in not finding any [1].
>>
>> Linus, Bartosz, what's your opinion? Should all drivers check for errors
>> after every gpiod_[sg]et_value*() call?
> 
> My opinion is that you know the driver / hw is in a bad state and you
> are trying to convey useful information. So you should be as careful
> as possible and not assume gpio did not fail.

This patch aims at providing better diagnostics after programming has
already gone bad. Neglecting an error might lead to a misleading error
message, but this doesn't lead programming to fail -- it has failed already.

On the other hand a gpiod_get/set_value() call might fail earlier, along
the normal execution path, and lead to real failures without an error
message emitted after the gpiod call that failed.

Which doesn't mean I'm against your proposal of adding error checking
code. Rather, if we want error checking, we want it mainly in other
places: at the very least at the first usage of each of the GPIOs, maybe
at each usage. Have a look at the beginning of
xilinx_spi_write_complete() [0] for example: if gpiod_get_value() fails
there the driver would think programming has been successfully completed
(DONE asserted). To me this is worse than just printing the wrong error
message.

[0]
https://elixir.bootlin.com/linux/v5.8.2/source/drivers/fpga/xilinx-spi.c#L114
Luca Ceresoli Aug. 27, 2020, 2:30 p.m. UTC | #5
Hi Tom,

On 19/08/20 18:32, Luca Ceresoli wrote:
> On 18/08/20 16:21, Tom Rix wrote:
>>
>> On 8/18/20 3:20 AM, Luca Ceresoli wrote:
>>> [a question for GPIO maintainers below]
>>>
>>> Hi Tom,
>>>
>>> thanks for your review!
>>>
>>> On 17/08/20 20:15, Tom Rix wrote:
>>>> The other two patches are fine.
>>>>
>>>> On 8/17/20 9:59 AM, Luca Ceresoli wrote:
>>>>> When the DONE pin does not go high after programming to confirm programming
>>>>> success, the INIT_B pin provides some info on the reason. Use it if
>>>>> available to provide a more explanatory error message.
>>>>>
>>>>> Signed-off-by: Luca Ceresoli <luca@lucaceresoli.net>
>>>>> ---
>>>>>  drivers/fpga/xilinx-spi.c | 11 ++++++++++-
>>>>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/fpga/xilinx-spi.c b/drivers/fpga/xilinx-spi.c
>>>>> index 502fae0d1d85..2aa942bb1114 100644
>>>>> --- a/drivers/fpga/xilinx-spi.c
>>>>> +++ b/drivers/fpga/xilinx-spi.c
>>>>> @@ -169,7 +169,16 @@ static int xilinx_spi_write_complete(struct fpga_manager *mgr,
>>>>>  			return xilinx_spi_apply_cclk_cycles(conf);
>>>>>  	}
>>>>>  
>>>>> -	dev_err(&mgr->dev, "Timeout after config data transfer.\n");
>>>>> +	if (conf->init_b) {
>>>>> +		int init_b_asserted = gpiod_get_value(conf->init_b);
>>>> gpiod_get_value can fail. So maybe need split the first statement.
>>>>
>>>> init_b_asserted < 0 ? "invalid device"
>>>>
>>>> As the if-else statement is getting complicated, embedding the ? : makes this hard to read.  'if,else if, else' would be better.
>>> Thanks for the heads up. However I'm not sure which is the best thing to
>>> do here.
>>>
>>> First, I've been reading the libgpiod code after your email and yes, the
>>> libgpiod code _could_ return runtime errors received from the gpiochip
>>> driver, even though the docs state:
>>>
>>>> The get/set calls do not return errors because “invalid GPIO”
>>>> should have been reported earlier from gpiod_direction_*().
>>> (https://www.kernel.org/doc/html/latest/driver-api/gpio/consumer.html)
>>>
>>> On the other hand there are plenty of calls to gpiod_get/set_value in
>>> the kernel that don't check for error values. I guess this is because
>>> failures getting/setting a GPIO are very uncommon (perhaps impossible
>>> with platform GPIO).
>>>
>>> When still a GPIO get/set operation fails I'm not sure adding thousands
>>> of error-checking code lines in hundreds of drivers is the best way to
>>> go. I feel like we should have a unique, noisy dev_err() in the error
>>> path in libgpio but I was surprised in not finding any [1].
>>>
>>> Linus, Bartosz, what's your opinion? Should all drivers check for errors
>>> after every gpiod_[sg]et_value*() call?
>>
>> My opinion is that you know the driver / hw is in a bad state and you
>> are trying to convey useful information. So you should be as careful
>> as possible and not assume gpio did not fail.
> 
> This patch aims at providing better diagnostics after programming has
> already gone bad. Neglecting an error might lead to a misleading error
> message, but this doesn't lead programming to fail -- it has failed already.
> 
> On the other hand a gpiod_get/set_value() call might fail earlier, along
> the normal execution path, and lead to real failures without an error
> message emitted after the gpiod call that failed.
> 
> Which doesn't mean I'm against your proposal of adding error checking
> code. Rather, if we want error checking, we want it mainly in other
> places: at the very least at the first usage of each of the GPIOs, maybe
> at each usage. Have a look at the beginning of
> xilinx_spi_write_complete() [0] for example: if gpiod_get_value() fails
> there the driver would think programming has been successfully completed
> (DONE asserted). To me this is worse than just printing the wrong error
> message.
> 
> [0]
> https://elixir.bootlin.com/linux/v5.8.2/source/drivers/fpga/xilinx-spi.c#L114

I added error checking wherever gpiod_get_value() is called to see what
happens, and I'm sending a v2 series with this change. The code got
longer, but I've kept it pretty readable. It still feels like a
half solution, as gpiod_set_value() returns void and thus no error checking
can be done on it, but let's see what you and others think.

Patch

diff --git a/drivers/fpga/xilinx-spi.c b/drivers/fpga/xilinx-spi.c
index 502fae0d1d85..2aa942bb1114 100644
--- a/drivers/fpga/xilinx-spi.c
+++ b/drivers/fpga/xilinx-spi.c
@@ -169,7 +169,16 @@  static int xilinx_spi_write_complete(struct fpga_manager *mgr,
 			return xilinx_spi_apply_cclk_cycles(conf);
 	}
 
-	dev_err(&mgr->dev, "Timeout after config data transfer.\n");
+	if (conf->init_b) {
+		int init_b_asserted = gpiod_get_value(conf->init_b);
+
+		dev_err(&mgr->dev,
+			init_b_asserted ? "CRC error or invalid device\n"
+			: "Missing sync word or incomplete bitstream\n");
+	} else {
+		dev_err(&mgr->dev, "Timeout after config data transfer.\n");
+	}
+
 	return -ETIMEDOUT;
 }