diff mbox

[1/2] IB/mlx5: Implementation of PCI error handler

Message ID 20140312034512.065218504@linux.vnet.ibm.com (mailing list archive)
State Superseded, archived
Headers show

Commit Message

clsoto@linux.vnet.ibm.com March 12, 2014, 3:42 a.m. UTC
This patch is to add PCI error handler function support for mlx5.
Created the functions for error_detected and slot_rest, plus will
send a port down event to users when the driver error_detected
function is invoked. This is to prevent a hang seeing in
mcast_remove_one at the time ib_unregister_device is called for the
ib_sa module. It will fail hardware commands while the driver is
handling a PCI error. It will reduce the hardware commands timeout
to 10 msecs so it does not hang waiting for an interrupt of the
completion of the hardware command.

Signed-off-by: Carol Soto <clsoto@linux.vnet.ibm.com>
---
 drivers/infiniband/hw/mlx5/main.c             |   32 +++++++++++++++++++++++++-
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c |    7 +++++
 include/linux/mlx5/driver.h                   |    4 +--
 3 files changed, 40 insertions(+), 3 deletions(-)

Comments

Ben Hutchings March 12, 2014, 6:34 p.m. UTC | #1
On Tue, 2014-03-11 at 22:42 -0500, clsoto@linux.vnet.ibm.com wrote:
[...]
> Index: b/include/linux/mlx5/driver.h
> ===================================================================
> --- a/include/linux/mlx5/driver.h
> +++ b/include/linux/mlx5/driver.h
> @@ -51,10 +51,10 @@ enum {
>  };
>  
>  enum {
> -       /* one minute for the sake of bringup. Generally, commands must always
> +       /* 10 msecs for the sake of bringup. Generally, commands must always
>          * complete and we may need to increase this timeout value
>          */
> -       MLX5_CMD_TIMEOUT_MSEC   = 7200 * 1000,
> +       MLX5_CMD_TIMEOUT_MSEC   = 10 * 1000,

You seem to be changing the timeout from 2 hours (not one minute) to 10
seconds (not milliseconds).

Ben.

>         MLX5_CMD_WQ_MAX_NAME    = 32,
>  };
>  
>
clsoto@linux.vnet.ibm.com March 12, 2014, 10 p.m. UTC | #2
On 3/12/2014 1:34 PM, Ben Hutchings wrote:
> On Tue, 2014-03-11 at 22:42 -0500, clsoto@linux.vnet.ibm.com wrote:
> [...]
>> Index: b/include/linux/mlx5/driver.h
>> ===================================================================
>> --- a/include/linux/mlx5/driver.h
>> +++ b/include/linux/mlx5/driver.h
>> @@ -51,10 +51,10 @@ enum {
>>   };
>>   
>>   enum {
>> -       /* one minute for the sake of bringup. Generally, commands must always
>> +       /* 10 msecs for the sake of bringup. Generally, commands must always
>>           * complete and we may need to increase this timeout value
>>           */
>> -       MLX5_CMD_TIMEOUT_MSEC   = 7200 * 1000,
>> +       MLX5_CMD_TIMEOUT_MSEC   = 10 * 1000,
> You seem to be changing the timeout from 2 hours (not one minute) to 10
> seconds (not milliseconds).
>
> Ben.
Yes you are right the comment should say 10 seconds instead of 10 msecs.

Carol
>
>>          MLX5_CMD_WQ_MAX_NAME    = 32,
>>   };
>>   
>>


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eli Cohen March 13, 2014, 6:45 a.m. UTC | #3
On Wed, Mar 12, 2014 at 06:34:12PM +0000, Ben Hutchings wrote:
> >  
> >  enum {
> > -       /* one minute for the sake of bringup. Generally, commands must always
> > +       /* 10 msecs for the sake of bringup. Generally, commands must always
> >          * complete and we may need to increase this timeout value
> >          */
> > -       MLX5_CMD_TIMEOUT_MSEC   = 7200 * 1000,
> > +       MLX5_CMD_TIMEOUT_MSEC   = 10 * 1000,
> 
> You seem to be changing the timeout from 2 hours (not one minute) to 10
> seconds (not milliseconds).
> 
Thanks for noting this. Actually, the time should remain 2 hours and
the comment should be fixed. Also note that long time/missing
completion of a command is, genrallly, not indicative of a PCI error.
If that happens, we would want to have enough time to do diagnostics
before timing out.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
clsoto@linux.vnet.ibm.com March 13, 2014, 3:12 p.m. UTC | #4
On 3/13/2014 1:45 AM, Eli Cohen wrote:
> On Wed, Mar 12, 2014 at 06:34:12PM +0000, Ben Hutchings wrote:
>>>   
>>>   enum {
>>> -       /* one minute for the sake of bringup. Generally, commands must always
>>> +       /* 10 msecs for the sake of bringup. Generally, commands must always
>>>           * complete and we may need to increase this timeout value
>>>           */
>>> -       MLX5_CMD_TIMEOUT_MSEC   = 7200 * 1000,
>>> +       MLX5_CMD_TIMEOUT_MSEC   = 10 * 1000,
>> You seem to be changing the timeout from 2 hours (not one minute) to 10
>> seconds (not milliseconds).
>>
> Thanks for noting this. Actually, the time should remain 2 hours and
> the comment should be fixed. Also note that long time/missing
> completion of a command is, genrallly, not indicative of a PCI error.
> If that happens, we would want to have enough time to do diagnostics
> before timing out.
>
Hi Eli

In mlx4 code, I do not recall a timeout for commands this big. So the 
reason in mlx5 is 2 hrs is just for
debugging purposes? So if for any reason a command hang then the user 
can not remove this module
for the next 2 hrs?

Carol

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eli Cohen March 13, 2014, 3:40 p.m. UTC | #5
On Thu, Mar 13, 2014 at 10:12:19AM -0500, Carol Soto wrote:
> 
> In mlx4 code, I do not recall a timeout for commands this big. So
> the reason in mlx5 is 2 hrs is just for
> debugging purposes? So if for any reason a command hang then the
> user can not remove this module
> for the next 2 hrs?
> 
Hi Carol,
well I haven't seen any such case with latest firmware releases.
Anyway, 10 msec is really too short timeout value since there are
commands that can take more than that (e.g. memory registartion of
regions larger then 512 MB - though this will be changed soon). I
wonder what was the original motivation and have you been able to
simulate PCI errors and see this in action.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
clsoto@linux.vnet.ibm.com March 13, 2014, 3:51 p.m. UTC | #6
On 3/13/2014 10:40 AM, Eli Cohen wrote:
> On Thu, Mar 13, 2014 at 10:12:19AM -0500, Carol Soto wrote:
>> In mlx4 code, I do not recall a timeout for commands this big. So
>> the reason in mlx5 is 2 hrs is just for
>> debugging purposes? So if for any reason a command hang then the
>> user can not remove this module
>> for the next 2 hrs?
>>
> Hi Carol,
> well I haven't seen any such case with latest firmware releases.
> Anyway, 10 msec is really too short timeout value since there are
> commands that can take more than that (e.g. memory registartion of
> regions larger then 512 MB - though this will be changed soon). I
> wonder what was the original motivation and have you been able to
> simulate PCI errors and see this in action.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hi Eli,

The motivation to reduce that timeout is that if there is a process in 
the middle of a HW command
in the middle of the PCI error, I probably did not want to wait 2hrs 
since the command will never complete
since the card is dead. Now you are right, I forgot the case of big 
memory registration where commands can
take longer than that. Do you have an idea of what is the longest time 
that a command can take in mlx5?


Carol


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eli Cohen March 13, 2014, 4:03 p.m. UTC | #7
On Thu, Mar 13, 2014 at 10:51:46AM -0500, Carol Soto wrote:
> 
> The motivation to reduce that timeout is that if there is a process
> in the middle of a HW command
> in the middle of the PCI error, I probably did not want to wait 2hrs
> since the command will never complete
> since the card is dead. Now you are right, I forgot the case of big
> memory registration where commands can
> take longer than that. Do you have an idea of what is the longest
> time that a command can take in mlx5?
>

There is no guranteed time for command completions. With current
driver/firmware, registration of 4.5 GB can take around 2 minutes; for
larger regions it can take even more time. As I mentioned earlier, I
will soon send a patch that reduces registation times for large
regions.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
clsoto@linux.vnet.ibm.com March 13, 2014, 4:26 p.m. UTC | #8
On 3/13/2014 11:03 AM, Eli Cohen wrote:
> On Thu, Mar 13, 2014 at 10:51:46AM -0500, Carol Soto wrote:
>> The motivation to reduce that timeout is that if there is a process
>> in the middle of a HW command
>> in the middle of the PCI error, I probably did not want to wait 2hrs
>> since the command will never complete
>> since the card is dead. Now you are right, I forgot the case of big
>> memory registration where commands can
>> take longer than that. Do you have an idea of what is the longest
>> time that a command can take in mlx5?
>>
> There is no guranteed time for command completions. With current
> driver/firmware, registration of 4.5 GB can take around 2 minutes; for
> larger regions it can take even more time. As I mentioned earlier, I
> will soon send a patch that reduces registation times for large
> regions.
Hi Eli,

I can wait for your patch to be available and try this again. For now,  
I can remove
my changes for the timeout and resubmit the patch.
Thanks for the feedback about the commands.

Carol
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

Index: b/drivers/infiniband/hw/mlx5/main.c
===================================================================
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1508,11 +1508,41 @@  static DEFINE_PCI_DEVICE_TABLE(mlx5_ib_p
 
 MODULE_DEVICE_TABLE(pci, mlx5_ib_pci_table);
 
+static pci_ers_result_t mlx5_pci_err_detected(struct pci_dev *pdev,
+					      pci_channel_state_t state)
+{
+	struct mlx5_ib_dev *dev = mlx5_pci2ibdev(pdev);
+	struct mlx5_core_dev *mdev = &dev->mdev;
+	u8 port;
+
+	/* To avoid the mcast hang with ipoib up */
+	for (port = 1; port <= dev->mdev.caps.num_ports; port++)
+		mlx5_ib_event(mdev, MLX5_DEV_EVENT_PORT_DOWN, &port);
+
+	remove_one(pdev);
+
+	return state == pci_channel_io_perm_failure ?
+		PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_NEED_RESET;
+}
+
+static pci_ers_result_t mlx5_pci_slot_reset(struct pci_dev *pdev)
+{
+	int ret = init_one(pdev, 0);
+
+	return ret ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED;
+}
+
+static const struct pci_error_handlers mlx5_err_handler = {
+	.error_detected = mlx5_pci_err_detected,
+	.slot_reset     = mlx5_pci_slot_reset,
+};
+
 static struct pci_driver mlx5_ib_driver = {
 	.name		= DRIVER_NAME,
 	.id_table	= mlx5_ib_pci_table,
 	.probe		= init_one,
-	.remove		= remove_one
+	.remove		= remove_one,
+	.err_handler	= &mlx5_err_handler,
 };
 
 static int __init mlx5_ib_init(void)
Index: b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
===================================================================
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -646,6 +646,13 @@  static int mlx5_cmd_invoke(struct mlx5_c
 	if (callback && page_queue)
 		return -EINVAL;
 
+	if (pci_channel_offline(dev->pdev)) {
+		/* Device is going through error recovery
+		 * and cannot accept commands.
+		 */
+		return -EIO;
+	}
+
 	ent = alloc_cmd(cmd, in, out, uout, uout_size, callback, context,
 			page_queue);
 	if (IS_ERR(ent))
Index: b/include/linux/mlx5/driver.h
===================================================================
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -51,10 +51,10 @@  enum {
 };
 
 enum {
-	/* one minute for the sake of bringup. Generally, commands must always
+	/* 10 msecs for the sake of bringup. Generally, commands must always
 	 * complete and we may need to increase this timeout value
 	 */
-	MLX5_CMD_TIMEOUT_MSEC	= 7200 * 1000,
+	MLX5_CMD_TIMEOUT_MSEC	= 10 * 1000,
 	MLX5_CMD_WQ_MAX_NAME	= 32,
 };