| Message ID | 564F4F38.9040505@sandisk.com (mailing list archive) |
| --- | --- |
| State | Superseded |
> Hello Christoph,
>
> The comment about locality in the above quote is interesting. How about
> modifying patch 2/9 as indicated below? The modification below does not
> change the behavior of this patch if ib_cq.w.cpu is not modified, and it
> allows users who care about locality and who want to skip the scheduler
> overhead to do so by setting ib_cq.w.cpu to the index of the CPU they
> want the work to be processed on.

That sounds acceptable...
On Sun, Nov 22, 2015 at 11:51:13AM +0200, Sagi Grimberg wrote:
>> Hello Christoph,
>>
>> The comment about locality in the above quote is interesting. How about
>> modifying patch 2/9 as indicated below? The modification below does not
>> change the behavior of this patch if ib_cq.w.cpu is not modified, and it
>> allows users who care about locality and who want to skip the scheduler
>> overhead to do so by setting ib_cq.w.cpu to the index of the CPU they
>> want the work to be processed on.
>
> That sounds acceptable...

Wouldn't it be a better idea to set the WQ_SYSFS flag and use the
standard sysfs interface for specifying cpumasks or node affinity?
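For reference, a minimal sketch of that alternative, assuming the completion
workqueue is also made unbound, since only unbound workqueues expose a
writable cpumask attribute in sysfs; the workqueue name and exact flag
combination here are illustrative rather than quoted from the posted series:

	/*
	 * Sketch only: tag the IB completion workqueue with WQ_SYSFS so it
	 * shows up under /sys/devices/virtual/workqueue/ib-comp-wq/.
	 * WQ_UNBOUND is assumed here because the per-workqueue "cpumask"
	 * attribute is only provided for unbound workqueues.
	 */
	ib_comp_wq = alloc_workqueue("ib-comp-wq",
				     WQ_SYSFS | WQ_UNBOUND | WQ_HIGHPRI |
				     WQ_MEM_RECLAIM, 0);
	if (!ib_comp_wq)
		return -ENOMEM;

The affinity could then be changed from userspace with something like
echo 0-3 > /sys/devices/virtual/workqueue/ib-comp-wq/cpumask, though as the
follow-ups note this applies to the whole workqueue rather than to an
individual CQ.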
> Wouldn't it be a better idea to set the WQ_SYSFS flag and use the
> standard sysfs interface for specifying cpumasks or node affinity?

I think that Bart wants to allow the caller to select CPU affinity
per CQ. In this case ib_alloc_cq in workqueue mode would need to
accept an affinity_hint from the caller (defaulting to the wild-card
WORK_CPU_UNBOUND).
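Purely as an illustration of that idea, a hypothetical variant of
ib_alloc_cq() with an extra CPU-hint argument; the comp_cpu parameter and
the elided parts are assumptions for this sketch, not something from the
posted series:

	/*
	 * Hypothetical signature: callers that do not care pass a negative
	 * value and keep today's WORK_CPU_UNBOUND behaviour; callers that
	 * do care pass the CPU they want completions processed on.
	 */
	struct ib_cq *ib_alloc_cq(struct ib_device *dev, void *private,
				  int nr_cqe, int comp_vector,
				  enum ib_poll_context poll_ctx, int comp_cpu)
	{
		struct ib_cq *cq;

		/* ... allocation and vendor create_cq() call elided ... */

		switch (poll_ctx) {
		case IB_POLL_WORKQUEUE:
			cq->comp_handler = ib_cq_completion_workqueue;
			INIT_WORK(&cq->w.work, ib_cq_poll_work);
			cq->w.cpu = comp_cpu >= 0 ? comp_cpu : WORK_CPU_UNBOUND;
			ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
			break;
		/* ... other poll contexts unchanged ... */
		}

		return cq;
	}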
On Sun, Nov 22, 2015 at 12:36:00PM +0200, Sagi Grimberg wrote:
>> Wouldn't it be a better idea to set the WQ_SYSFS flag and use the
>> standard sysfs interface for specifying cpumasks or node affinity?
>
> I think that Bart wants to allow the caller to select CPU affinity
> per CQ. In this case ib_alloc_cq in workqueue mode would need to
> accept an affinity_hint from the caller (defaulting to the wild-card
> WORK_CPU_UNBOUND).

Hmm, true. How would we set that hint from userspace? I'd really prefer
to see a practical justification for it first.
>> I think that Bart wants to allow the caller to select CPU affinity
>> per CQ. In this case ib_alloc_cq in workqueue mode would need to
>> accept an affinity_hint from the caller (defaulting to the wild-card
>> WORK_CPU_UNBOUND).
>
> Hmm, true. How would we set that hint from userspace? I'd really prefer
> to see a practical justification for it first.

In order to assign CPUs from user space we'd need an ethtool-like
interface for isert/srpt/<xxxt>. Given that this is something we don't
want to get into right now, I assumed that Bart meant that srpt would
take a "least used" approach in the srpt driver (which isn't better than
taking the wild-card option, I'd say), so I'll let Bart answer...
On 11/22/15 06:57, Sagi Grimberg wrote:
>>> I think that Bart wants to allow the caller to select CPU affinity
>>> per CQ. In this case ib_alloc_cq in workqueue mode would need to
>>> accept an affinity_hint from the caller (defaulting to the wild-card
>>> WORK_CPU_UNBOUND).
>>
>> Hmm, true. How would we set that hint from userspace? I'd really prefer
>> to see a practical justification for it first.
>
> In order to assign CPUs from user space we'd need an ethtool-like
> interface for isert/srpt/<xxxt>. Given that this is something we don't
> want to get into right now, I assumed that Bart meant that srpt would
> take a "least used" approach in the srpt driver (which isn't better than
> taking the wild-card option, I'd say), so I'll let Bart answer...

Hello Christoph and Sagi,

My intention is indeed to allow CPU affinity to be controlled per CQ. One
use case is to implement a least-used policy in RDMA drivers that use
multiple completion queues. Another use case is to make CPU affinity
configurable from user space through something similar to ethtool or via
sysfs.

Bart.
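As a rough sketch of the first use case, a driver that creates several
completion queues could track how many of its CQs are pinned to each CPU and
pick the least-loaded one, relying on the cq->w.cpu field proposed in the
diff below. The counters, the lock and the helper name are invented for this
illustration and do not come from srpt or the posted series:

	static DEFINE_SPINLOCK(comp_cpu_lock);
	static unsigned int comp_cpu_users[NR_CPUS];

	/* Pick the online CPU currently serving the fewest CQs. */
	static int pick_least_used_cpu(void)
	{
		unsigned int min = UINT_MAX;
		int cpu, best = WORK_CPU_UNBOUND;

		spin_lock(&comp_cpu_lock);
		for_each_online_cpu(cpu) {
			if (comp_cpu_users[cpu] < min) {
				min = comp_cpu_users[cpu];
				best = cpu;
			}
		}
		if (best != WORK_CPU_UNBOUND)
			comp_cpu_users[best]++;
		spin_unlock(&comp_cpu_lock);

		return best;
	}

Each new CQ would then be assigned cq->w.cpu = pick_least_used_cpu(), with a
matching decrement when the CQ is freed.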
diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index bf2a079..4d80d8c 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -94,18 +94,18 @@ static void ib_cq_completion_softirq(struct ib_cq *cq, void *private)
 
 static void ib_cq_poll_work(struct work_struct *work)
 {
-	struct ib_cq *cq = container_of(work, struct ib_cq, work);
+	struct ib_cq *cq = container_of(work, struct ib_cq, w.work);
 	int completed;
 
 	completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE);
 	if (completed >= IB_POLL_BUDGET_WORKQUEUE ||
 	    ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
-		queue_work(ib_comp_wq, &cq->work);
+		queue_work_on(cq->w.cpu, ib_comp_wq, &cq->w.work);
 }
 
 static void ib_cq_completion_workqueue(struct ib_cq *cq, void *private)
 {
-	queue_work(ib_comp_wq, &cq->work);
+	queue_work_on(cq->w.cpu, ib_comp_wq, &cq->w.work);
 }
 
 /**
@@ -159,7 +159,8 @@ struct ib_cq *ib_alloc_cq(struct ib_device *dev, void *private,
 		break;
 	case IB_POLL_WORKQUEUE:
 		cq->comp_handler = ib_cq_completion_workqueue;
-		INIT_WORK(&cq->work, ib_cq_poll_work);
+		INIT_WORK(&cq->w.work, ib_cq_poll_work);
+		cq->w.cpu = WORK_CPU_UNBOUND;
 		ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
 		break;
 	default:
@@ -195,7 +196,7 @@ void ib_free_cq(struct ib_cq *cq)
 		irq_poll_disable(&cq->iop);
 		break;
 	case IB_POLL_WORKQUEUE:
-		flush_work(&cq->work);
+		flush_work(&cq->w.work);
 		break;
 	default:
 		WARN_ON_ONCE(1);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index f59a8d3..b1344f8 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1291,7 +1291,10 @@ struct ib_cq {
 	struct ib_wc *wc;
 	union {
 		struct irq_poll iop;
-		struct work_struct work;
+		struct {
+			struct work_struct work;
+			int cpu;
+		} w;
 	};
 };
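To show how a caller might use the field added above, here is a hedged
example; preferred_cpu, sdev and ch are made-up names standing in for
whatever ULP state and placement policy end up providing the value:

	/* Illustrative only, not part of the patch above: a caller that
	 * cares about locality overrides the default after allocating a
	 * workqueue-mode CQ. */
	cq = ib_alloc_cq(sdev->device, ch, nr_cqe, comp_vector,
			 IB_POLL_WORKQUEUE);
	if (IS_ERR(cq))
		return PTR_ERR(cq);

	/* ib_alloc_cq() initializes w.cpu to WORK_CPU_UNBOUND, so callers
	 * that never touch it keep the current scheduler-chosen behaviour. */
	cq->w.cpu = preferred_cpu;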