Message ID | 20140312034512.065218504@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
On Tue, 2014-03-11 at 22:42 -0500, clsoto@linux.vnet.ibm.com wrote: [...] > Index: b/include/linux/mlx5/driver.h > =================================================================== > --- a/include/linux/mlx5/driver.h > +++ b/include/linux/mlx5/driver.h > @@ -51,10 +51,10 @@ enum { > }; > > enum { > - /* one minute for the sake of bringup. Generally, commands must always > + /* 10 msecs for the sake of bringup. Generally, commands must always > * complete and we may need to increase this timeout value > */ > - MLX5_CMD_TIMEOUT_MSEC = 7200 * 1000, > + MLX5_CMD_TIMEOUT_MSEC = 10 * 1000, You seem to be changing the timeout from 2 hours (not one minute) to 10 seconds (not milliseconds). Ben. > MLX5_CMD_WQ_MAX_NAME = 32, > }; > >
On 3/12/2014 1:34 PM, Ben Hutchings wrote: > On Tue, 2014-03-11 at 22:42 -0500, clsoto@linux.vnet.ibm.com wrote: > [...] >> Index: b/include/linux/mlx5/driver.h >> =================================================================== >> --- a/include/linux/mlx5/driver.h >> +++ b/include/linux/mlx5/driver.h >> @@ -51,10 +51,10 @@ enum { >> }; >> >> enum { >> - /* one minute for the sake of bringup. Generally, commands must always >> + /* 10 msecs for the sake of bringup. Generally, commands must always >> * complete and we may need to increase this timeout value >> */ >> - MLX5_CMD_TIMEOUT_MSEC = 7200 * 1000, >> + MLX5_CMD_TIMEOUT_MSEC = 10 * 1000, > You seem to be changing the timeout from 2 hours (not one minute) to 10 > seconds (not milliseconds). > > Ben. Yes you are right the comment should say 10 seconds instead of 10 msecs. Carol > >> MLX5_CMD_WQ_MAX_NAME = 32, >> }; >> >> -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 12, 2014 at 06:34:12PM +0000, Ben Hutchings wrote: > > > > enum { > > - /* one minute for the sake of bringup. Generally, commands must always > > + /* 10 msecs for the sake of bringup. Generally, commands must always > > * complete and we may need to increase this timeout value > > */ > > - MLX5_CMD_TIMEOUT_MSEC = 7200 * 1000, > > + MLX5_CMD_TIMEOUT_MSEC = 10 * 1000, > > You seem to be changing the timeout from 2 hours (not one minute) to 10 > seconds (not milliseconds). > Thanks for noting this. Actually, the time should remain 2 hours and the comment should be fixed. Also note that long time/missing completion of a command is, genrallly, not indicative of a PCI error. If that happens, we would want to have enough time to do diagnostics before timing out. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 3/13/2014 1:45 AM, Eli Cohen wrote: > On Wed, Mar 12, 2014 at 06:34:12PM +0000, Ben Hutchings wrote: >>> >>> enum { >>> - /* one minute for the sake of bringup. Generally, commands must always >>> + /* 10 msecs for the sake of bringup. Generally, commands must always >>> * complete and we may need to increase this timeout value >>> */ >>> - MLX5_CMD_TIMEOUT_MSEC = 7200 * 1000, >>> + MLX5_CMD_TIMEOUT_MSEC = 10 * 1000, >> You seem to be changing the timeout from 2 hours (not one minute) to 10 >> seconds (not milliseconds). >> > Thanks for noting this. Actually, the time should remain 2 hours and > the comment should be fixed. Also note that long time/missing > completion of a command is, genrallly, not indicative of a PCI error. > If that happens, we would want to have enough time to do diagnostics > before timing out. > Hi Eli In mlx4 code, I do not recall a timeout for commands this big. So the reason in mlx5 is 2 hrs is just for debugging purposes? So if for any reason a command hang then the user can not remove this module for the next 2 hrs? Carol -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Mar 13, 2014 at 10:12:19AM -0500, Carol Soto wrote: > > In mlx4 code, I do not recall a timeout for commands this big. So > the reason in mlx5 is 2 hrs is just for > debugging purposes? So if for any reason a command hang then the > user can not remove this module > for the next 2 hrs? > Hi Carol, well I haven't seen any such case with latest firmware releases. Anyway, 10 msec is really too short timeout value since there are commands that can take more than that (e.g. memory registartion of regions larger then 512 MB - though this will be changed soon). I wonder what was the original motivation and have you been able to simulate PCI errors and see this in action. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 3/13/2014 10:40 AM, Eli Cohen wrote: > On Thu, Mar 13, 2014 at 10:12:19AM -0500, Carol Soto wrote: >> In mlx4 code, I do not recall a timeout for commands this big. So >> the reason in mlx5 is 2 hrs is just for >> debugging purposes? So if for any reason a command hang then the >> user can not remove this module >> for the next 2 hrs? >> > Hi Carol, > well I haven't seen any such case with latest firmware releases. > Anyway, 10 msec is really too short timeout value since there are > commands that can take more than that (e.g. memory registartion of > regions larger then 512 MB - though this will be changed soon). I > wonder what was the original motivation and have you been able to > simulate PCI errors and see this in action. > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html Hi Eli, The motivation to reduce that timeout is that if there is a process in the middle of a HW command in the middle of the PCI error, I probably did not want to wait 2hrs since the command will never complete since the card is dead. Now you are right, I forgot the case of big memory registration where commands can take longer than that. Do you have an idea of what is the longest time that a command can take in mlx5? Carol -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Mar 13, 2014 at 10:51:46AM -0500, Carol Soto wrote: > > The motivation to reduce that timeout is that if there is a process > in the middle of a HW command > in the middle of the PCI error, I probably did not want to wait 2hrs > since the command will never complete > since the card is dead. Now you are right, I forgot the case of big > memory registration where commands can > take longer than that. Do you have an idea of what is the longest > time that a command can take in mlx5? > There is no guranteed time for command completions. With current driver/firmware, registration of 4.5 GB can take around 2 minutes; for larger regions it can take even more time. As I mentioned earlier, I will soon send a patch that reduces registation times for large regions. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 3/13/2014 11:03 AM, Eli Cohen wrote: > On Thu, Mar 13, 2014 at 10:51:46AM -0500, Carol Soto wrote: >> The motivation to reduce that timeout is that if there is a process >> in the middle of a HW command >> in the middle of the PCI error, I probably did not want to wait 2hrs >> since the command will never complete >> since the card is dead. Now you are right, I forgot the case of big >> memory registration where commands can >> take longer than that. Do you have an idea of what is the longest >> time that a command can take in mlx5? >> > There is no guranteed time for command completions. With current > driver/firmware, registration of 4.5 GB can take around 2 minutes; for > larger regions it can take even more time. As I mentioned earlier, I > will soon send a patch that reduces registation times for large > regions. Hi Eli, I can wait for your patch to be available and try this again. For now, I can remove my changes for the timeout and resubmit the patch. Thanks for the feedback about the commands. Carol > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Index: b/drivers/infiniband/hw/mlx5/main.c =================================================================== --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -1508,11 +1508,41 @@ static DEFINE_PCI_DEVICE_TABLE(mlx5_ib_p MODULE_DEVICE_TABLE(pci, mlx5_ib_pci_table); +static pci_ers_result_t mlx5_pci_err_detected(struct pci_dev *pdev, + pci_channel_state_t state) +{ + struct mlx5_ib_dev *dev = mlx5_pci2ibdev(pdev); + struct mlx5_core_dev *mdev = &dev->mdev; + u8 port; + + /* To avoid the mcast hang with ipoib up */ + for (port = 1; port <= dev->mdev.caps.num_ports; port++) + mlx5_ib_event(mdev, MLX5_DEV_EVENT_PORT_DOWN, &port); + + remove_one(pdev); + + return state == pci_channel_io_perm_failure ? + PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_NEED_RESET; +} + +static pci_ers_result_t mlx5_pci_slot_reset(struct pci_dev *pdev) +{ + int ret = init_one(pdev, 0); + + return ret ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED; +} + +static const struct pci_error_handlers mlx5_err_handler = { + .error_detected = mlx5_pci_err_detected, + .slot_reset = mlx5_pci_slot_reset, +}; + static struct pci_driver mlx5_ib_driver = { .name = DRIVER_NAME, .id_table = mlx5_ib_pci_table, .probe = init_one, - .remove = remove_one + .remove = remove_one, + .err_handler = &mlx5_err_handler, }; static int __init mlx5_ib_init(void) Index: b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c =================================================================== --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c @@ -646,6 +646,13 @@ static int mlx5_cmd_invoke(struct mlx5_c if (callback && page_queue) return -EINVAL; + if (pci_channel_offline(dev->pdev)) { + /* Device is going through error recovery + * and cannot accept commands. + */ + return -EIO; + } + ent = alloc_cmd(cmd, in, out, uout, uout_size, callback, context, page_queue); if (IS_ERR(ent)) Index: b/include/linux/mlx5/driver.h =================================================================== --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -51,10 +51,10 @@ enum { }; enum { - /* one minute for the sake of bringup. Generally, commands must always + /* 10 msecs for the sake of bringup. Generally, commands must always * complete and we may need to increase this timeout value */ - MLX5_CMD_TIMEOUT_MSEC = 7200 * 1000, + MLX5_CMD_TIMEOUT_MSEC = 10 * 1000, MLX5_CMD_WQ_MAX_NAME = 32, };
This patch is to add PCI error handler function support for mlx5. Created the functions for error_detected and slot_rest, plus will send a port down event to users when the driver error_detected function is invoked. This is to prevent a hang seeing in mcast_remove_one at the time ib_unregister_device is called for the ib_sa module. It will fail hardware commands while the driver is handling a PCI error. It will reduce the hardware commands timeout to 10 msecs so it does not hang waiting for an interrupt of the completion of the hardware command. Signed-off-by: Carol Soto <clsoto@linux.vnet.ibm.com> --- drivers/infiniband/hw/mlx5/main.c | 32 +++++++++++++++++++++++++- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 7 +++++ include/linux/mlx5/driver.h | 4 +-- 3 files changed, 40 insertions(+), 3 deletions(-)