Message ID: 20181211233652.9705-1-sagi@grimberg.me (mailing list archive)
Series: restore polling to nvme-rdma

On Tue, Dec 11, 2018 at 03:36:47PM -0800, Sagi Grimberg wrote:
> Add an additional queue mapping for polling queues that will
> host polling for latency critical I/O.
>
> One caveat is that we don't want these queues to be pure polling
> as we don't want to bother with polling for the initial nvmf connect
> I/O. Hence, introduce ib_change_cq_ctx that will modify the cq polling
> context from SOFTIRQ to DIRECT.

So do we really care? Yes, polling for the initial connect is not
exactly efficient, but then again it doesn't happen all that often.

Except for efficiency is there any problem with just starting out
in polling mode?

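[For context, the transition the cover letter describes could look roughly
like the sketch below: bring the queue up with the CQ in IB_POLL_SOFTIRQ so
the fabrics connect completes via interrupts, then flip the CQ to
IB_POLL_DIRECT once the queue is idle. The exact signature of the proposed
ib_change_cq_ctx() and the nvme_rdma_start_poll_queue() helper name are
assumptions for illustration, not code taken from the series.]

--
/*
 * Illustrative sketch only (not from the posted patches): connect the
 * queue while the CQ still completes in IB_POLL_SOFTIRQ context, then
 * switch it to IB_POLL_DIRECT once no I/O is in flight.  The signature
 * of ib_change_cq_ctx() is assumed here.
 */
static int nvme_rdma_start_poll_queue(struct nvme_rdma_queue *queue)
{
	int ret;

	/* connect I/O completes via the interrupt-driven context */
	ret = nvmf_connect_io_queue(&queue->ctrl->ctrl,
			nvme_rdma_queue_idx(queue));
	if (ret)
		return ret;

	/*
	 * The queue is idle at this point, so switching the completion
	 * context is safe; from here on blk_poll() reaps completions.
	 */
	return ib_change_cq_ctx(queue->ib_cq, IB_POLL_DIRECT);
}
--
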
>> Add an additional queue mapping for polling queues that will
>> host polling for latency critical I/O.
>>
>> One caveat is that we don't want these queues to be pure polling
>> as we don't want to bother with polling for the initial nvmf connect
>> I/O. Hence, introduce ib_change_cq_ctx that will modify the cq polling
>> context from SOFTIRQ to DIRECT.
>
> So do we really care? Yes, polling for the initial connect is not
> exactly efficient, but then again it doesn't happen all that often.
>
> Except for efficiency is there any problem with just starting out
> in polling mode?

I found it cumbersome so I didn't really consider it...
Isn't it a bit awkward? we will need to implement polled connect
locally in nvme-rdma (because fabrics doesn't know anything about
queues, hctx or polling).

I'm open to looking at it if you think that this is better. Note that if
we had the CQ in our hands, we would do exactly what we did here
effectively: use interrupt for the connect and then simply not
re-arm it again and poll... Should we poll the connect just because
we are behind the CQ API?

On Tue, Dec 11, 2018 at 11:16:31PM -0800, Sagi Grimberg wrote:
>>> Add an additional queue mapping for polling queues that will
>>> host polling for latency critical I/O.
>>>
>>> One caveat is that we don't want these queues to be pure polling
>>> as we don't want to bother with polling for the initial nvmf connect
>>> I/O. Hence, introduce ib_change_cq_ctx that will modify the cq polling
>>> context from SOFTIRQ to DIRECT.
>>
>> So do we really care? Yes, polling for the initial connect is not
>> exactly efficient, but then again it doesn't happen all that often.
>>
>> Except for efficiency is there any problem with just starting out
>> in polling mode?
>
> I found it cumbersome so I didn't really consider it...
> Isn't it a bit awkward? we will need to implement polled connect
> locally in nvme-rdma (because fabrics doesn't know anything about
> queues, hctx or polling).

Well, it should just be a little blk_poll loop, right?

> I'm open to looking at it if you think that this is better. Note that if
> we had the CQ in our hands, we would do exactly what we did here
> effectively: use interrupt for the connect and then simply not
> re-arm it again and poll... Should we poll the connect just because
> we are behind the CQ API?

I'm just worried that the switch between the different contexts looks
like a way too easy way to shoot yourself in the foot, so if we can
avoid exposing that it would make for a harder to abuse API.

>> I found it cumbersome so I didn't really consider it...
>> Isn't it a bit awkward? we will need to implement polled connect
>> locally in nvme-rdma (because fabrics doesn't know anything about
>> queues, hctx or polling).
>
> Well, it should just be a little blk_poll loop, right?

Not so much about the poll loop, but the fact that we will need
to check if we need to poll for this special case every time in
.queue_rq and it's somewhat annoying...

>> I'm open to looking at it if you think that this is better. Note that if
>> we had the CQ in our hands, we would do exactly what we did here
>> effectively: use interrupt for the connect and then simply not
>> re-arm it again and poll... Should we poll the connect just because
>> we are behind the CQ API?
>
> I'm just worried that the switch between the different contexts looks
> like a way too easy way to shoot yourself in the foot, so if we can
> avoid exposing that it would make for a harder to abuse API.

Well, it would have been 100% safe if we could undo a cq re-arm that
we did in the past... The code documents that the caller must make
sure that there is no inflight I/O during the invocation of the
routine.

We could be creative if we really want to make it 100% safe for
inflight I/O (although no one should ever need to use that). We can
flush the current cq context (work/irq), switch to polling context,
then create a single entry QP attached to this CQ and drain it :)
That would make it safe but it's brain-dead...

Anyway, if people think it's really a bad idea we'll go ahead and
poll the nvmf connect...

On Wed, Dec 12, 2018 at 12:53:47AM -0800, Sagi Grimberg wrote:
>> Well, it should just be a little blk_poll loop, right?
>
> Not so much about the poll loop, but the fact that we will need
> to check if we need to poll for this special case every time in
> .queue_rq and it's somewhat annoying...

I don't think we need to add any special check in queue_rq. Any
polled command should be marked as HIPRI from the submitting code,
and that includes the queue creation.

> Anyway, if people think it's really a bad idea we'll go ahead and
> poll the nvmf connect...

I'm really just worried about the completion context switching. If
the RDMA folks are happy with that API and you strongly favour this
version I'll let you decide.

Hey Sagi,

> Subject: [PATCH RFC 0/4] restore polling to nvme-rdma
>
> Add an additional queue mapping for polling queues that will
> host polling for latency critical I/O.
>
> One caveat is that we don't want these queues to be pure polling
> as we don't want to bother with polling for the initial nvmf connect
> I/O. Hence, introduce ib_change_cq_ctx that will modify the cq polling
> context from SOFTIRQ to DIRECT. Note that this function is not safe
> with inflight I/O so the caller must make sure not to call it without
> having all I/O quiesced (we also relax the ib_cq_completion_direct warning
> as we have a scenario that this can happen).

Is there no way to handle this in the core? Maybe have the polling context
transition to DIRECT when the queue becomes empty and before re-arming the
CQ?

So ib_change_cq_ctx() would be called to indicate the change should
happen when it is safe to do so.

Just a thought..

> Hey Sagi,

Hi Steve,

> Is there no way to handle this in the core? Maybe have the polling context
> transition to DIRECT when the queue becomes empty and before re-arming the
> CQ?

That is what I suggested, but that would mean that we need to drain
the cq before making the switch, which means we need to allocate a
dedicated qp for that cq, and even that doesn't guarantee that the
ULP is not posting other wrs on its own qp(s)...

So making this safe for inflight I/O would be a challenge... If we end
up agreeing that we are ok with this functionality, I'd much rather not
deal with it and simply document "use with care".

> So ib_change_cq_ctx() would be called to indicate the change should
> happen when it is safe to do so.

You lost me here... ib_change_cq_ctx would get called by who and when?

> > Hey Sagi,
>
> Hi Steve,
>
> > Is there no way to handle this in the core? Maybe have the polling context
> > transition to DIRECT when the queue becomes empty and before re-arming the
> > CQ?
>
> That is what I suggested, but that would mean that we need to drain
> the cq before making the switch, which means we need to allocate a
> dedicated qp for that cq, and even that doesn't guarantee that the
> ULP is not posting other wrs on its own qp(s)...
>
> So making this safe for inflight I/O would be a challenge... If we end
> up agreeing that we are ok with this functionality, I'd much rather not
> deal with it and simply document "use with care".
>
> > So ib_change_cq_ctx() would be called to indicate the change should
> > happen when it is safe to do so.
>
> You lost me here... ib_change_cq_ctx would get called by who and when?

I didn't look in detail at your changes, but ib_change_cq_ctx() is called
by the application, right? I was just asking what if the semantics of the
call were "change the context when it is safe to do so" vs "do it
immediately and hope there are no outstanding WRs". But I don't think this
semantic change simplifies the problem.

>>> Well, it should just be a little blk_poll loop, right?
>>
>> Not so much about the poll loop, but the fact that we will need
>> to check if we need to poll for this special case every time in
>> .queue_rq and it's somewhat annoying...
>
> I don't think we need to add any special check in queue_rq. Any
> polled command should be marked as HIPRI from the submitting code,
> and that includes the queue creation.

Hmm, good idea, perhaps adding something like the following:
--
/**
 * blk_execute_rq_polled - execute a request and poll for its completion
 * @q:		queue to insert the request in
 * @bd_disk:	matching gendisk
 * @rq:		request to insert
 * @at_head:	insert request at head or tail of queue
 *
 * Description:
 *    Insert a fully prepared request at the back of the I/O scheduler
 *    queue for execution and poll for its completion.
 */
void blk_execute_rq_polled(struct request_queue *q, struct gendisk *bd_disk,
		struct request *rq, int at_head)
{
	DECLARE_COMPLETION_ONSTACK(wait);

	rq->cmd_flags |= REQ_HIPRI;
	rq->end_io_data = &wait;
	blk_execute_rq_nowait(q, bd_disk, rq, at_head, blk_end_sync_rq);

	while (!completion_done(&wait))
		blk_poll(q, 0, true);
}
EXPORT_SYMBOL(blk_execute_rq_polled);
--

and __nvme_submit_sync_cmd() can receive a poll argument as well to
call blk_execute_rq_polled()?

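[Following up on that last question, a minimal sketch of what threading a
poll flag through __nvme_submit_sync_cmd() might look like; the 'poll'
parameter name and its placement are assumptions, and the rest simply
mirrors the existing helper rather than any posted patch:]

--
int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
		union nvme_result *result, void *buffer, unsigned bufflen,
		unsigned timeout, int qid, int at_head,
		blk_mq_req_flags_t flags, bool poll)
{
	struct request *req;
	int ret;

	req = nvme_alloc_request(q, cmd, flags, qid);
	if (IS_ERR(req))
		return PTR_ERR(req);

	req->timeout = timeout ? timeout : ADMIN_TIMEOUT;

	if (buffer && bufflen) {
		ret = blk_rq_map_kern(q, req, buffer, bufflen, GFP_KERNEL);
		if (ret)
			goto out;
	}

	/* the only change: busy-poll for the completion when asked to */
	if (poll)
		blk_execute_rq_polled(req->q, NULL, req, at_head);
	else
		blk_execute_rq(req->q, NULL, req, at_head);

	if (result)
		*result = nvme_req(req)->result;
	if (nvme_req(req)->flags & NVME_REQ_CANCELLED)
		ret = -EINTR;
	else
		ret = nvme_req(req)->status;
 out:
	blk_mq_free_request(req);
	return ret;
}
--

[Existing callers would pass false, and the fabrics connect path could pass
true for queues mapped to the new polling queue map.]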