Message ID | 20181110151317.3813-5-axboe@kernel.dk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Various block optimizations | expand |
Hi Jens On 11/10/18 11:13 PM, Jens Axboe wrote: > We only really need the barrier if we're going to be sleeping, > if we're just polling we're fine with the __set_current_state() > variant. > > Signed-off-by: Jens Axboe <axboe@kernel.dk> > --- > fs/block_dev.c | 25 +++++++++++++++++++++---- > 1 file changed, 21 insertions(+), 4 deletions(-) > > diff --git a/fs/block_dev.c b/fs/block_dev.c > index c039abfb2052..e7550b1f9670 100644 > --- a/fs/block_dev.c > +++ b/fs/block_dev.c > @@ -237,12 +237,23 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter, > > qc = submit_bio(&bio); > for (;;) { > - set_current_state(TASK_UNINTERRUPTIBLE); > + /* > + * Using non-atomic task state for polling > + */ > + __set_current_state(TASK_UNINTERRUPTIBLE); > + > if (!READ_ONCE(bio.bi_private)) > break; When we don't use polling, the blkdev_bio_end_io_simple may come on a different cpu before submit_bio returns For example, __blkdev_direct_IO_simple qc = submit_bio(&bio) blkdev_bio_end_io_simple WRITE_ONCE(bio.bi_private, NULL) wake_up_process(waiter) __set_current_state(TASK_UNINTERRUPTIBLE) READ_ONCE(bio.bi_private) At the moment, a smp_rmb at least maybe needed before READ_ONCE(bio.bi_private) to ensure the WRITE_ONCE(bio.bi_private, NULL) is seen if we miss the wakeup. > + > if (!(iocb->ki_flags & IOCB_HIPRI) || > - !blk_poll(bdev_get_queue(bdev), qc)) > + !blk_poll(bdev_get_queue(bdev), qc)) { > + /* > + * If we're scheduling, ensure that task state > + * change is ordered and seen. > + */ > + smp_mb(); > io_schedule(); > + } > } > __set_current_state(TASK_RUNNING); > > @@ -403,13 +414,19 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages) > return -EIOCBQUEUED; > > for (;;) { > - set_current_state(TASK_UNINTERRUPTIBLE); > + /* > + * See __blkdev_direct_IO_simple() loop > + */ > + __set_current_state(TASK_UNINTERRUPTIBLE); > + Similar with above. > if (!READ_ONCE(dio->waiter)) > break; > > if (!(iocb->ki_flags & IOCB_HIPRI) || > - !blk_poll(bdev_get_queue(bdev), qc)) > + !blk_poll(bdev_get_queue(bdev), qc)) { > + smp_mb(); > io_schedule(); > + } > } > __set_current_state(TASK_RUNNING); > > Thanks Jianchao
On 11/12/18 2:35 AM, jianchao.wang wrote: > Hi Jens > > On 11/10/18 11:13 PM, Jens Axboe wrote: >> We only really need the barrier if we're going to be sleeping, >> if we're just polling we're fine with the __set_current_state() >> variant. >> >> Signed-off-by: Jens Axboe <axboe@kernel.dk> >> --- >> fs/block_dev.c | 25 +++++++++++++++++++++---- >> 1 file changed, 21 insertions(+), 4 deletions(-) >> >> diff --git a/fs/block_dev.c b/fs/block_dev.c >> index c039abfb2052..e7550b1f9670 100644 >> --- a/fs/block_dev.c >> +++ b/fs/block_dev.c >> @@ -237,12 +237,23 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter, >> >> qc = submit_bio(&bio); >> for (;;) { >> - set_current_state(TASK_UNINTERRUPTIBLE); >> + /* >> + * Using non-atomic task state for polling >> + */ >> + __set_current_state(TASK_UNINTERRUPTIBLE); >> + >> if (!READ_ONCE(bio.bi_private)) >> break; > > When we don't use polling, the blkdev_bio_end_io_simple may come on a different cpu before > submit_bio returns > For example, > __blkdev_direct_IO_simple > qc = submit_bio(&bio) > blkdev_bio_end_io_simple > WRITE_ONCE(bio.bi_private, NULL) > wake_up_process(waiter) > __set_current_state(TASK_UNINTERRUPTIBLE) > READ_ONCE(bio.bi_private) While I agree that you are right, that's an existing issue. The store barrier implied by set_current_state() doesn't fix that. How about this variant: http://git.kernel.dk/cgit/linux-block/commit/?h=mq-perf&id=036ba7a8d1334889c0fe55d4858d78f4e8022f12 which then changes the later wake up helper patch to be: http://git.kernel.dk/cgit/linux-block/commit/?h=mq-perf&id=f8c3f188425967adb040cfefb799b0a5a1df769d
Hi Jens On 11/13/18 12:26 AM, Jens Axboe wrote: > On 11/12/18 2:35 AM, jianchao.wang wrote: >> Hi Jens >> >> On 11/10/18 11:13 PM, Jens Axboe wrote: >>> We only really need the barrier if we're going to be sleeping, >>> if we're just polling we're fine with the __set_current_state() >>> variant. >>> >>> Signed-off-by: Jens Axboe <axboe@kernel.dk> >>> --- >>> fs/block_dev.c | 25 +++++++++++++++++++++---- >>> 1 file changed, 21 insertions(+), 4 deletions(-) >>> >>> diff --git a/fs/block_dev.c b/fs/block_dev.c >>> index c039abfb2052..e7550b1f9670 100644 >>> --- a/fs/block_dev.c >>> +++ b/fs/block_dev.c >>> @@ -237,12 +237,23 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter, >>> >>> qc = submit_bio(&bio); >>> for (;;) { >>> - set_current_state(TASK_UNINTERRUPTIBLE); >>> + /* >>> + * Using non-atomic task state for polling >>> + */ >>> + __set_current_state(TASK_UNINTERRUPTIBLE); >>> + >>> if (!READ_ONCE(bio.bi_private)) >>> break; >> >> When we don't use polling, the blkdev_bio_end_io_simple may come on a different cpu before >> submit_bio returns >> For example, >> __blkdev_direct_IO_simple >> qc = submit_bio(&bio) >> blkdev_bio_end_io_simple >> WRITE_ONCE(bio.bi_private, NULL) >> wake_up_process(waiter) >> __set_current_state(TASK_UNINTERRUPTIBLE) >> READ_ONCE(bio.bi_private) > > While I agree that you are right, that's an existing issue. The store > barrier implied by set_current_state() doesn't fix that. > > How about this variant: > > https://urldefense.proofpoint.com/v2/url?u=http-3A__git.kernel.dk_cgit_linux-2Dblock_commit_-3Fh-3Dmq-2Dperf-26id-3D036ba7a8d1334889c0fe55d4858d78f4e8022f12&d=DwICaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=7WdAxUBeiTUTCy8v-7zXyr4qk7sx26ATvfo6QSTvZyQ&m=pjbk_9rxuy7OS1pCkQ9MAHzFf4eT27Yt4RL8jASDJCA&s=AnmYMIcGpIILUrjDFsabj61pcTyMCRHB1Pu8XXi47t0&e= > > which then changes the later wake up helper patch to be: > > https://urldefense.proofpoint.com/v2/url?u=http-3A__git.kernel.dk_cgit_linux-2Dblock_commit_-3Fh-3Dmq-2Dperf-26id-3Df8c3f188425967adb040cfefb799b0a5a1df769d&d=DwICaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=7WdAxUBeiTUTCy8v-7zXyr4qk7sx26ATvfo6QSTvZyQ&m=pjbk_9rxuy7OS1pCkQ9MAHzFf4eT27Yt4RL8jASDJCA&s=PdajuUul55e6p0FED_AzzWt5Gip0ekLi1eGYBMLiZPo&e= > The wake up path has contained a full memory barrier with the raw_spin_lock and following smp_mb__after_spinlock wake_up_process -> try_to_wake_up raw_spin_lock_irqsave(&p->pi_lock, flags); smp_mb__after_spinlock(); So a smp_rmb() could be enough. :) Thanks Jianchao
diff --git a/fs/block_dev.c b/fs/block_dev.c index c039abfb2052..e7550b1f9670 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -237,12 +237,23 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter, qc = submit_bio(&bio); for (;;) { - set_current_state(TASK_UNINTERRUPTIBLE); + /* + * Using non-atomic task state for polling + */ + __set_current_state(TASK_UNINTERRUPTIBLE); + if (!READ_ONCE(bio.bi_private)) break; + if (!(iocb->ki_flags & IOCB_HIPRI) || - !blk_poll(bdev_get_queue(bdev), qc)) + !blk_poll(bdev_get_queue(bdev), qc)) { + /* + * If we're scheduling, ensure that task state + * change is ordered and seen. + */ + smp_mb(); io_schedule(); + } } __set_current_state(TASK_RUNNING); @@ -403,13 +414,19 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages) return -EIOCBQUEUED; for (;;) { - set_current_state(TASK_UNINTERRUPTIBLE); + /* + * See __blkdev_direct_IO_simple() loop + */ + __set_current_state(TASK_UNINTERRUPTIBLE); + if (!READ_ONCE(dio->waiter)) break; if (!(iocb->ki_flags & IOCB_HIPRI) || - !blk_poll(bdev_get_queue(bdev), qc)) + !blk_poll(bdev_get_queue(bdev), qc)) { + smp_mb(); io_schedule(); + } } __set_current_state(TASK_RUNNING);
We only really need the barrier if we're going to be sleeping, if we're just polling we're fine with the __set_current_state() variant. Signed-off-by: Jens Axboe <axboe@kernel.dk> --- fs/block_dev.c | 25 +++++++++++++++++++++---- 1 file changed, 21 insertions(+), 4 deletions(-)