Message ID | 1449784350-30214-1-git-send-email-ira.weiny@intel.com (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
Doug, With your email troubles I just wanted to make sure you did not lose track of this patch. https://patchwork.kernel.org/patch/7822511/ Thanks, Ira On Thu, Dec 10, 2015 at 04:52:30PM -0500, ira.weiny@intel.com wrote: > From: Dean Luick <dean.luick@intel.com> > > It was found that when a process was rapidly sending MADs other processes could > be hung in their unregister calls. > > This would happen when process A was injecting packets fast enough that the > single threaded workqueue was never exiting ib_mad_completion_handler. > Therefore when process B called flush_workqueue via the unregister call it > would hang until process A stopped sending MADs. > > The fix is to periodically reschedule ib_mad_completion_handler after > processing a large number of completions. The number of completions chosen was > decided based on the defaults for the recv queue size. However, it was kept > fixed such that increasing those queue sizes would not adversely affect > fairness in the future. > > Reviewed-by: Ira Weiny <ira.weiny@intel.com> > Signed-off-by: Dean Luick <dean.luick@intel.com> > --- > drivers/infiniband/core/mad.c | 18 ++++++++++++++++++ > 1 file changed, 18 insertions(+) > > diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c > index 2281de122038..d4d2a618fd66 100644 > --- a/drivers/infiniband/core/mad.c > +++ b/drivers/infiniband/core/mad.c > @@ -61,6 +61,18 @@ MODULE_PARM_DESC(send_queue_size, "Size of send queue in number of work requests > module_param_named(recv_queue_size, mad_recvq_size, int, 0444); > MODULE_PARM_DESC(recv_queue_size, "Size of receive queue in number of work requests"); > > +/* > + * Define a limit on the number of completions which will be processed by the > + * worker thread in a single work item. This ensures that other work items > + * (potentially from other users) are processed fairly. > + * > + * The number of completions was derived from the default queue sizes above. > + * We use a value which is double the larger of the 2 queues (receive @ 512) > + * but keep it fixed such that an increase in that value does not introduce > + * unfairness. > + */ > +#define MAD_COMPLETION_PROC_LIMIT 1024 > + > static struct list_head ib_mad_port_list; > static u32 ib_mad_client_id = 0; > > @@ -2555,6 +2567,7 @@ static void ib_mad_completion_handler(struct work_struct *work) > { > struct ib_mad_port_private *port_priv; > struct ib_wc wc; > + int count = 0; > > port_priv = container_of(work, struct ib_mad_port_private, work); > ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP); > @@ -2574,6 +2587,11 @@ static void ib_mad_completion_handler(struct work_struct *work) > } > } else > mad_error_handler(port_priv, &wc); > + > + if (++count > MAD_COMPLETION_PROC_LIMIT) { > + queue_work(port_priv->wq, &port_priv->work); > + break; > + } > } > } > > -- > 1.8.2 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 12/23/2015 03:01 PM, ira.weiny wrote: > Doug, > > With your email troubles I just wanted to make sure you did not lose track of > this patch. > > https://patchwork.kernel.org/patch/7822511/ I've got it, thanks. > Thanks, > Ira > > On Thu, Dec 10, 2015 at 04:52:30PM -0500, ira.weiny@intel.com wrote: >> From: Dean Luick <dean.luick@intel.com> >> >> It was found that when a process was rapidly sending MADs other processes could >> be hung in their unregister calls. >> >> This would happen when process A was injecting packets fast enough that the >> single threaded workqueue was never exiting ib_mad_completion_handler. >> Therefore when process B called flush_workqueue via the unregister call it >> would hang until process A stopped sending MADs. >> >> The fix is to periodically reschedule ib_mad_completion_handler after >> processing a large number of completions. The number of completions chosen was >> decided based on the defaults for the recv queue size. However, it was kept >> fixed such that increasing those queue sizes would not adversely affect >> fairness in the future. >> >> Reviewed-by: Ira Weiny <ira.weiny@intel.com> >> Signed-off-by: Dean Luick <dean.luick@intel.com> >> --- >> drivers/infiniband/core/mad.c | 18 ++++++++++++++++++ >> 1 file changed, 18 insertions(+) >> >> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c >> index 2281de122038..d4d2a618fd66 100644 >> --- a/drivers/infiniband/core/mad.c >> +++ b/drivers/infiniband/core/mad.c >> @@ -61,6 +61,18 @@ MODULE_PARM_DESC(send_queue_size, "Size of send queue in number of work requests >> module_param_named(recv_queue_size, mad_recvq_size, int, 0444); >> MODULE_PARM_DESC(recv_queue_size, "Size of receive queue in number of work requests"); >> >> +/* >> + * Define a limit on the number of completions which will be processed by the >> + * worker thread in a single work item. This ensures that other work items >> + * (potentially from other users) are processed fairly. >> + * >> + * The number of completions was derived from the default queue sizes above. >> + * We use a value which is double the larger of the 2 queues (receive @ 512) >> + * but keep it fixed such that an increase in that value does not introduce >> + * unfairness. >> + */ >> +#define MAD_COMPLETION_PROC_LIMIT 1024 >> + >> static struct list_head ib_mad_port_list; >> static u32 ib_mad_client_id = 0; >> >> @@ -2555,6 +2567,7 @@ static void ib_mad_completion_handler(struct work_struct *work) >> { >> struct ib_mad_port_private *port_priv; >> struct ib_wc wc; >> + int count = 0; >> >> port_priv = container_of(work, struct ib_mad_port_private, work); >> ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP); >> @@ -2574,6 +2587,11 @@ static void ib_mad_completion_handler(struct work_struct *work) >> } >> } else >> mad_error_handler(port_priv, &wc); >> + >> + if (++count > MAD_COMPLETION_PROC_LIMIT) { >> + queue_work(port_priv->wq, &port_priv->work); >> + break; >> + } >> } >> } >> >> -- >> 1.8.2 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Dec 10, 2015 at 04:52:30PM -0500, ira.weiny@intel.com wrote: > From: Dean Luick <dean.luick@intel.com> > > > @@ -2555,6 +2567,7 @@ static void ib_mad_completion_handler(struct work_struct *work) > { > struct ib_mad_port_private *port_priv; > struct ib_wc wc; > + int count = 0; > > port_priv = container_of(work, struct ib_mad_port_private, work); > ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP); I think you shoudld push the call to ib_req_notify_cq outside the while loop. You don't need to arm the CQ if you re-queued the work. Only when you have drained the CQ should you re-arm. > @@ -2574,6 +2587,11 @@ static void ib_mad_completion_handler(struct work_struct *work) > } > } else > mad_error_handler(port_priv, &wc); > + > + if (++count > MAD_COMPLETION_PROC_LIMIT) { > + queue_work(port_priv->wq, &port_priv->work); > + break; > + } > } > } > > -- > 1.8.2 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Dec 28, 2015 at 06:51:30PM +0200, Eli Cohen wrote: > On Thu, Dec 10, 2015 at 04:52:30PM -0500, ira.weiny@intel.com wrote: > > From: Dean Luick <dean.luick@intel.com> > > > > > > @@ -2555,6 +2567,7 @@ static void ib_mad_completion_handler(struct work_struct *work) > > { > > struct ib_mad_port_private *port_priv; > > struct ib_wc wc; > > + int count = 0; > > > > port_priv = container_of(work, struct ib_mad_port_private, work); > > ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP); > > I think you shoudld push the call to ib_req_notify_cq outside the > while loop. You don't need to arm the CQ if you re-queued the work. > Only when you have drained the CQ should you re-arm. Will it hurt to rearm? The way the code stands I think the worse that will happen is an extra work item scheduled and an ib_poll_cq call. I'm not quite sure what you mean about moving the ib_req_notify_cq outside of the while loop. It seems like to do what you say we would need another work item which just does ib_poll_cq. Is that what you meant? Ira > > > @@ -2574,6 +2587,11 @@ static void ib_mad_completion_handler(struct work_struct *work) > > } > > } else > > mad_error_handler(port_priv, &wc); > > + > > + if (++count > MAD_COMPLETION_PROC_LIMIT) { > > + queue_work(port_priv->wq, &port_priv->work); > > + break; > > + } > > } > > } > > > > -- > > 1.8.2 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Dec 28, 2015 at 06:05:46PM -0500, ira.weiny wrote: > > Will it hurt to rearm? The way the code stands I think the worse that will > happen is an extra work item scheduled and an ib_poll_cq call. If you re-arm unconditionally you call for extra interrupts which you can do without. When you break the loop of processing completions since you exhausted the quota, you queue the work so you can continue processing completions in the next time slot of the work task. After queueing the work you should call "return" instead of "break". If you processed all the completions before reaching MAD_COMPLETION_PROC_LIMIT you will exit the while loop and then re-arming can take place. > > I'm not quite sure what you mean about moving the ib_req_notify_cq outside of > the while loop. It seems like to do what you say we would need another work > item which just does ib_poll_cq. Is that what you meant? > > Ira > > > > > > @@ -2574,6 +2587,11 @@ static void ib_mad_completion_handler(struct work_struct *work) > > > } > > > } else > > > mad_error_handler(port_priv, &wc); > > > + > > > + if (++count > MAD_COMPLETION_PROC_LIMIT) { > > > + queue_work(port_priv->wq, &port_priv->work); > > > + break; > > > + } > > > } > > > } > > > > > > -- > > > 1.8.2 > > > > > > -- > > > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > > > the body of a message to majordomo@vger.kernel.org > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Dec 29, 2015 at 01:25:33AM +0200, Eli Cohen wrote: > On Mon, Dec 28, 2015 at 06:05:46PM -0500, ira.weiny wrote: > > > > Will it hurt to rearm? The way the code stands I think the worse that will > > happen is an extra work item scheduled and an ib_poll_cq call. > > If you re-arm unconditionally you call for extra interrupts which you > can do without. When you break the loop of processing completions since > you exhausted the quota, you queue the work so you can continue > processing completions in the next time slot of the work task. After > queueing the work you should call "return" instead of "break". > If you processed all the completions before reaching > MAD_COMPLETION_PROC_LIMIT you will exit the while loop and then > re-arming can take place. I'm still confused. Here is the code with the patch applied: /* * IB MAD completion callback */ static void ib_mad_completion_handler(struct work_struct *work) { struct ib_mad_port_private *port_priv; struct ib_wc wc; int count = 0; port_priv = container_of(work, struct ib_mad_port_private, work); ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP); while (ib_poll_cq(port_priv->cq, 1, &wc) == 1) { if (wc.status == IB_WC_SUCCESS) { switch (wc.opcode) { case IB_WC_SEND: ib_mad_send_done_handler(port_priv, &wc); break; case IB_WC_RECV: ib_mad_recv_done_handler(port_priv, &wc); break; default: BUG_ON(1); break; } } else mad_error_handler(port_priv, &wc); if (++count > MAD_COMPLETION_PROC_LIMIT) { queue_work(port_priv->wq, &port_priv->work); break; } } } How is "return" any different than "break"? Calling return will still result in a rearm on the next work task. Perhaps this code is wrong in the first place? Should it call ib_req_notify_cq after the while loop? This code has been this way forever... 1da177e4c3f41 (Linus Torvalds 2005-04-16 15:20:36 -0700 2568) ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP); 1da177e4c3f41 (Linus Torvalds 2005-04-16 15:20:36 -0700 2569) 1da177e4c3f41 (Linus Torvalds 2005-04-16 15:20:36 -0700 2570) while (ib_poll_cq(port_priv->cq, 1, &wc) == 1) { Ira > > > > > I'm not quite sure what you mean about moving the ib_req_notify_cq outside of > > the while loop. It seems like to do what you say we would need another work > > item which just does ib_poll_cq. Is that what you meant? > > > > Ira > > > > > > > > > @@ -2574,6 +2587,11 @@ static void ib_mad_completion_handler(struct work_struct *work) > > > > } > > > > } else > > > > mad_error_handler(port_priv, &wc); > > > > + > > > > + if (++count > MAD_COMPLETION_PROC_LIMIT) { > > > > + queue_work(port_priv->wq, &port_priv->work); > > > > + break; > > > > + } > > > > } > > > > } > > > > > > > > -- > > > > 1.8.2 > > > > > > > > -- > > > > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > > > > the body of a message to majordomo@vger.kernel.org > > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > -- > > > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > > > the body of a message to majordomo@vger.kernel.org > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Please just convert the mad handler to the new CQ API in drivers/infiniband/core/cq.c. If you have any question about it I'd be glad to help you. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please just convert the mad handler to the new CQ API in > drivers/infiniband/core/cq.c. If you have any question about it I'd be > glad to help you. +1 on this suggestion. We had these sorts of questions in our ULPs as well. The CQ API should take care of all that for you and leaves you to just handle the completions... -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Dec 29, 2015 at 11:51:19AM +0200, Sagi Grimberg wrote: > > >Please just convert the mad handler to the new CQ API in > >drivers/infiniband/core/cq.c. If you have any question about it I'd be > >glad to help you. > > +1 on this suggestion. > > We had these sorts of questions in our ULPs as well. The CQ API should > take care of all that for you and leaves you to just handle the > completions... I saw your work and agree it would be nice but it will take some time to convert and debug the MAD stack. I'll try and find some time but it is unlikely I will anytime soon. We can hit this bug regularly with hfi1 but have not hit with qib or mlx4. I leave it up to Doug if he wants to take this fix before someone finds time to convert the MAD stack. Ira -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Dec 28, 2015 at 07:35:14PM -0500, ira.weiny wrote: > > I'm still confused. Here is the code with the patch applied: > > > /* > * IB MAD completion callback > */ > static void ib_mad_completion_handler(struct work_struct *work) > { > struct ib_mad_port_private *port_priv; > struct ib_wc wc; > int count = 0; > > port_priv = container_of(work, struct ib_mad_port_private, work); > ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP); > > while (ib_poll_cq(port_priv->cq, 1, &wc) == 1) { > if (wc.status == IB_WC_SUCCESS) { > switch (wc.opcode) { > case IB_WC_SEND: > ib_mad_send_done_handler(port_priv, &wc); > break; > case IB_WC_RECV: > ib_mad_recv_done_handler(port_priv, &wc); > break; > default: > BUG_ON(1); > break; > } > } else > mad_error_handler(port_priv, &wc); > > if (++count > MAD_COMPLETION_PROC_LIMIT) { > queue_work(port_priv->wq, &port_priv->work); > break; > } > } > } > > > How is "return" any different than "break"? Calling return will still result > in a rearm on the next work task. My argument is that if you break the loop since you exahsuted the quota you don't need to call ib_req_notify_cq(). If you think about this you will see that this was the original logic of this function. Only after you exhasted the quota you need to call ib_req_notify_cq() so the next completion will trigger the interrupt handler which calls the completion handler. The result is that you are generating less interrupts in the system. To achieve that you do like this: /* * IB MAD completion callback */ static void ib_mad_completion_handler(struct work_struct *work) { struct ib_mad_port_private *port_priv; struct ib_wc wc; int count = 0; port_priv = container_of(work, struct ib_mad_port_private, work); while (ib_poll_cq(port_priv->cq, 1, &wc) == 1) { if (wc.status == IB_WC_SUCCESS) { switch (wc.opcode) { case IB_WC_SEND: ib_mad_send_done_handler(port_priv, &wc); break; case IB_WC_RECV: ib_mad_recv_done_handler(port_priv, &wc); break; default: BUG_ON(1); break; } } else mad_error_handler(port_priv, &wc); if (++count > MAD_COMPLETION_PROC_LIMIT) { queue_work(port_priv->wq, &port_priv->work); return; <== return and avoid requesting for notification } } ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP); <== only called if there are no more completions to process } I hope my point is clear now. > > Perhaps this code is wrong in the first place? Should it call ib_req_notify_cq > after the while loop? This code has been this way forever... > > 1da177e4c3f41 (Linus Torvalds 2005-04-16 15:20:36 -0700 2568) ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP); > 1da177e4c3f41 (Linus Torvalds 2005-04-16 15:20:36 -0700 2569) > 1da177e4c3f41 (Linus Torvalds 2005-04-16 15:20:36 -0700 2570) while (ib_poll_cq(port_priv->cq, 1, &wc) == 1) { > > > Ira > > > > > > > > > > I'm not quite sure what you mean about moving the ib_req_notify_cq outside of > > > the while loop. It seems like to do what you say we would need another work > > > item which just does ib_poll_cq. Is that what you meant? > > > > > > Ira > > > > > > > > > > > > @@ -2574,6 +2587,11 @@ static void ib_mad_completion_handler(struct work_struct *work) > > > > > } > > > > > } else > > > > > mad_error_handler(port_priv, &wc); > > > > > + > > > > > + if (++count > MAD_COMPLETION_PROC_LIMIT) { > > > > > + queue_work(port_priv->wq, &port_priv->work); > > > > > + break; > > > > > + } > > > > > } > > > > > } > > > > > > > > > > -- > > > > > 1.8.2 > > > > > > > > > > -- > > > > > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > > > > > the body of a message to majordomo@vger.kernel.org > > > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > -- > > > > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > > > > the body of a message to majordomo@vger.kernel.org > > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > -- > > > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > > > the body of a message to majordomo@vger.kernel.org > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c index 2281de122038..d4d2a618fd66 100644 --- a/drivers/infiniband/core/mad.c +++ b/drivers/infiniband/core/mad.c @@ -61,6 +61,18 @@ MODULE_PARM_DESC(send_queue_size, "Size of send queue in number of work requests module_param_named(recv_queue_size, mad_recvq_size, int, 0444); MODULE_PARM_DESC(recv_queue_size, "Size of receive queue in number of work requests"); +/* + * Define a limit on the number of completions which will be processed by the + * worker thread in a single work item. This ensures that other work items + * (potentially from other users) are processed fairly. + * + * The number of completions was derived from the default queue sizes above. + * We use a value which is double the larger of the 2 queues (receive @ 512) + * but keep it fixed such that an increase in that value does not introduce + * unfairness. + */ +#define MAD_COMPLETION_PROC_LIMIT 1024 + static struct list_head ib_mad_port_list; static u32 ib_mad_client_id = 0; @@ -2555,6 +2567,7 @@ static void ib_mad_completion_handler(struct work_struct *work) { struct ib_mad_port_private *port_priv; struct ib_wc wc; + int count = 0; port_priv = container_of(work, struct ib_mad_port_private, work); ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP); @@ -2574,6 +2587,11 @@ static void ib_mad_completion_handler(struct work_struct *work) } } else mad_error_handler(port_priv, &wc); + + if (++count > MAD_COMPLETION_PROC_LIMIT) { + queue_work(port_priv->wq, &port_priv->work); + break; + } } }