Message ID | 20210302190555.201228400@debian-a64.vm (mailing list archive) |
---|---|
State | Superseded, archived |
Delegated to: | Mike Snitzer |
Headers | show |
Series | Reworked device mapper polling patchset | expand |
On 3/3/21 3:05 AM, Mikulas Patocka wrote: > Support I/O polling if submit_bio_noacct_mq_direct returned non-empty > cookie. > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > > --- > drivers/md/dm.c | 5 +++++ > 1 file changed, 5 insertions(+) > > Index: linux-2.6/drivers/md/dm.c > =================================================================== > --- linux-2.6.orig/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100 > +++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100 > @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru > } > } > > + if (ci.poll_cookie != BLK_QC_T_NONE) { > + while (atomic_read(&ci.io->io_count) > 1 && > + blk_poll(ci.poll_queue, ci.poll_cookie, true)) ; > + } > + > /* drop the extra reference count */ > dec_pending(ci.io, errno_to_blk_status(error)); > } It seems that the general idea of your design is to 1) submit *one* split bio 2) blk_poll(), waiting the previously submitted split bio complets and then submit next split bio, repeating the above process. I'm afraid the performance may be an issue here, since the batch every time blk_poll() reaps may decrease. Besides, the submitting routine and polling routine is bound together here, i.e., polling is always synchronous.
On Wed, 3 Mar 2021, JeffleXu wrote: > > > On 3/3/21 3:05 AM, Mikulas Patocka wrote: > > > Support I/O polling if submit_bio_noacct_mq_direct returned non-empty > > cookie. > > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > > > > --- > > drivers/md/dm.c | 5 +++++ > > 1 file changed, 5 insertions(+) > > > > Index: linux-2.6/drivers/md/dm.c > > =================================================================== > > --- linux-2.6.orig/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100 > > +++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100 > > @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru > > } > > } > > > > + if (ci.poll_cookie != BLK_QC_T_NONE) { > > + while (atomic_read(&ci.io->io_count) > 1 && > > + blk_poll(ci.poll_queue, ci.poll_cookie, true)) ; > > + } > > + > > /* drop the extra reference count */ > > dec_pending(ci.io, errno_to_blk_status(error)); > > } > > It seems that the general idea of your design is to > 1) submit *one* split bio > 2) blk_poll(), waiting the previously submitted split bio complets No, I submit all the bios and poll for the last one. > and then submit next split bio, repeating the above process. I'm afraid > the performance may be an issue here, since the batch every time > blk_poll() reaps may decrease. Could you benchmark it? > Besides, the submitting routine and polling routine is bound together > here, i.e., polling is always synchronous. __split_and_process_bio calls __split_and_process_non_flush in a loop and __split_and_process_non_flush records the poll cookie in ci.poll_cookie. When we processed all the bios, we poll for the last cookie here: if (ci.poll_cookie != BLK_QC_T_NONE) { while (atomic_read(&ci.io->io_count) > 1 && blk_poll(ci.poll_queue, ci.poll_cookie, true)) ; } Mikulas -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
On 3/3/21 6:09 PM, Mikulas Patocka wrote: > > > On Wed, 3 Mar 2021, JeffleXu wrote: > >> >> >> On 3/3/21 3:05 AM, Mikulas Patocka wrote: >> >>> Support I/O polling if submit_bio_noacct_mq_direct returned non-empty >>> cookie. >>> >>> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> >>> >>> --- >>> drivers/md/dm.c | 5 +++++ >>> 1 file changed, 5 insertions(+) >>> >>> Index: linux-2.6/drivers/md/dm.c >>> =================================================================== >>> --- linux-2.6.orig/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100 >>> +++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100 >>> @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru >>> } >>> } >>> >>> + if (ci.poll_cookie != BLK_QC_T_NONE) { >>> + while (atomic_read(&ci.io->io_count) > 1 && >>> + blk_poll(ci.poll_queue, ci.poll_cookie, true)) ; >>> + } >>> + >>> /* drop the extra reference count */ >>> dec_pending(ci.io, errno_to_blk_status(error)); >>> } >> >> It seems that the general idea of your design is to >> 1) submit *one* split bio >> 2) blk_poll(), waiting the previously submitted split bio complets > > No, I submit all the bios and poll for the last one. > >> and then submit next split bio, repeating the above process. I'm afraid >> the performance may be an issue here, since the batch every time >> blk_poll() reaps may decrease. > > Could you benchmark it? > I will once I finished some other issues. >> Besides, the submitting routine and polling routine is bound together >> here, i.e., polling is always synchronous. > > __split_and_process_bio calls __split_and_process_non_flush in a loop I also noticed that you sent this patch. https://patchwork.kernel.org/project/dm-devel/patch/alpine.LRH.2.02.2103010457510.631@file01.intranet.prod.int.rdu2.redhat.com/ I agree with you that this while() loop here is unnecessary. And thus there's no loop calling __split_and_process_non_flush() in __split_and_process_bio(). > __split_and_process_non_flush records the poll cookie in ci.poll_cookie. > When we processed all the bios, we poll for the last cookie here: > > if (ci.poll_cookie != BLK_QC_T_NONE) { > while (atomic_read(&ci.io->io_count) > 1 && > blk_poll(ci.poll_queue, ci.poll_cookie, true)) ; > } So what will happen if one bio submitted to dm device crosses the device boundary among several target devices (e.g., dm-stripe)? Please refer the following call graph. ``` submit_bio __submit_bio_noacct disk->fops->submit_bio(), calling into __split_and_process_bio(), call __split_and_process_non_flush() once, submitting the *first* split bio disk->fops->submit_bio(), calling into __split_and_process_bio(), call __split_and_process_non_flush() once, submitting the *second* split bio ... ``` So the loop is in __submit_bio_noacct(), rather than __split_and_process_bio(). Your design will send the first split bio, and then poll on this split bio, then send the next split bio, polling on this, go on and on...
On Thu, 4 Mar 2021, JeffleXu wrote: > > __split_and_process_non_flush records the poll cookie in ci.poll_cookie. > > When we processed all the bios, we poll for the last cookie here: > > > > if (ci.poll_cookie != BLK_QC_T_NONE) { > > while (atomic_read(&ci.io->io_count) > 1 && > > blk_poll(ci.poll_queue, ci.poll_cookie, true)) ; > > } > > So what will happen if one bio submitted to dm device crosses the device > boundary among several target devices (e.g., dm-stripe)? Please refer > the following call graph. > > ``` > submit_bio > __submit_bio_noacct > disk->fops->submit_bio(), calling into __split_and_process_bio(), > call __split_and_process_non_flush() once, submitting the *first* split bio > disk->fops->submit_bio(), calling into __split_and_process_bio(), > call __split_and_process_non_flush() once, submitting the *second* split bio > ... > ``` > > > So the loop is in __submit_bio_noacct(), rather than > __split_and_process_bio(). Your design will send the first split bio, > and then poll on this split bio, then send the next split bio, polling > on this, go on and on... No. It will send all the bios and poll for the last one. Mikulas -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Hi, Mikulas, Mikulas Patocka <mpatocka@redhat.com> writes: > On Wed, 3 Mar 2021, JeffleXu wrote: > >> >> >> On 3/3/21 3:05 AM, Mikulas Patocka wrote: >> >> > Support I/O polling if submit_bio_noacct_mq_direct returned non-empty >> > cookie. >> > >> > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> >> > >> > --- >> > drivers/md/dm.c | 5 +++++ >> > 1 file changed, 5 insertions(+) >> > >> > Index: linux-2.6/drivers/md/dm.c >> > =================================================================== >> > --- linux-2.6.orig/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100 >> > +++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100 >> > @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru >> > } >> > } >> > >> > + if (ci.poll_cookie != BLK_QC_T_NONE) { >> > + while (atomic_read(&ci.io->io_count) > 1 && >> > + blk_poll(ci.poll_queue, ci.poll_cookie, true)) ; >> > + } >> > + >> > /* drop the extra reference count */ >> > dec_pending(ci.io, errno_to_blk_status(error)); >> > } >> >> It seems that the general idea of your design is to >> 1) submit *one* split bio >> 2) blk_poll(), waiting the previously submitted split bio complets > > No, I submit all the bios and poll for the last one. What happens if the last bio completes first? It looks like you will call blk_poll with a cookie that already completed, and I'm pretty sure that's invalid. Thanks, Jeff -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
On Thu, Mar 04 2021 at 10:01am -0500, Jeff Moyer <jmoyer@redhat.com> wrote: > Hi, Mikulas, > > Mikulas Patocka <mpatocka@redhat.com> writes: > > > On Wed, 3 Mar 2021, JeffleXu wrote: > > > >> > >> > >> On 3/3/21 3:05 AM, Mikulas Patocka wrote: > >> > >> > Support I/O polling if submit_bio_noacct_mq_direct returned non-empty > >> > cookie. > >> > > >> > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > >> > > >> > --- > >> > drivers/md/dm.c | 5 +++++ > >> > 1 file changed, 5 insertions(+) > >> > > >> > Index: linux-2.6/drivers/md/dm.c > >> > =================================================================== > >> > --- linux-2.6.orig/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100 > >> > +++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100 > >> > @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru > >> > } > >> > } > >> > > >> > + if (ci.poll_cookie != BLK_QC_T_NONE) { > >> > + while (atomic_read(&ci.io->io_count) > 1 && > >> > + blk_poll(ci.poll_queue, ci.poll_cookie, true)) ; > >> > + } > >> > + > >> > /* drop the extra reference count */ > >> > dec_pending(ci.io, errno_to_blk_status(error)); > >> > } > >> > >> It seems that the general idea of your design is to > >> 1) submit *one* split bio > >> 2) blk_poll(), waiting the previously submitted split bio complets > > > > No, I submit all the bios and poll for the last one. > > What happens if the last bio completes first? It looks like you will > call blk_poll with a cookie that already completed, and I'm pretty sure > that's invalid. In addition, I'm concerned this approach to have DM internalize IO polling is a non-starter. I just don't think this approach adheres to the io_uring + IO polling interface.. it never returns a cookie to upper layers... so there is really no opportunity for standard io_uring + IO polling interface to work is there? But Heinz and Mikulas are about to kick off some fio io_uring + hipri=1 (io polling) testing of Jeffle's latest v5 patchset: https://patchwork.kernel.org/project/dm-devel/list/?series=442075 compared to Mikulas' patchset: https://patchwork.kernel.org/project/dm-devel/list/?series=440719 We should have definitive answers soon enough, just using Jeffle's fio config (with hipri=1 for IO polling) that was documented here: https://listman.redhat.com/archives/dm-devel/2020-October/msg00129.html Mike -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
On Thu, 4 Mar 2021, Jeff Moyer wrote: > Hi, Mikulas, > > Mikulas Patocka <mpatocka@redhat.com> writes: > > > On Wed, 3 Mar 2021, JeffleXu wrote: > > > >> > >> > >> On 3/3/21 3:05 AM, Mikulas Patocka wrote: > >> > >> > Support I/O polling if submit_bio_noacct_mq_direct returned non-empty > >> > cookie. > >> > > >> > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > >> > > >> > --- > >> > drivers/md/dm.c | 5 +++++ > >> > 1 file changed, 5 insertions(+) > >> > > >> > Index: linux-2.6/drivers/md/dm.c > >> > =================================================================== > >> > --- linux-2.6.orig/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100 > >> > +++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100 > >> > @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru > >> > } > >> > } > >> > > >> > + if (ci.poll_cookie != BLK_QC_T_NONE) { > >> > + while (atomic_read(&ci.io->io_count) > 1 && > >> > + blk_poll(ci.poll_queue, ci.poll_cookie, true)) ; > >> > + } > >> > + > >> > /* drop the extra reference count */ > >> > dec_pending(ci.io, errno_to_blk_status(error)); > >> > } > >> > >> It seems that the general idea of your design is to > >> 1) submit *one* split bio > >> 2) blk_poll(), waiting the previously submitted split bio complets > > > > No, I submit all the bios and poll for the last one. > > What happens if the last bio completes first? It looks like you will > call blk_poll with a cookie that already completed, and I'm pretty sure > that's invalid. > > Thanks, > Jeff If the last bio completes first, the other bios will use interrupt-driven endio. Calling blk_poll with already completed cookie is IMHO legal - it happens with the current direct-io code too. The direct-io code will check for bio completion and if the bio is not completed, call blk_poll. The bio may be completed from an interrupt after the check and before blk_poll - in this case, we will call blk_poll with already completed cookie. Mikulas -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
On 3/3/21 6:09 PM, Mikulas Patocka wrote: > > > On Wed, 3 Mar 2021, JeffleXu wrote: > >> >> >> On 3/3/21 3:05 AM, Mikulas Patocka wrote: >> >>> Support I/O polling if submit_bio_noacct_mq_direct returned non-empty >>> cookie. >>> >>> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> >>> >>> --- >>> drivers/md/dm.c | 5 +++++ >>> 1 file changed, 5 insertions(+) >>> >>> Index: linux-2.6/drivers/md/dm.c >>> =================================================================== >>> --- linux-2.6.orig/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100 >>> +++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100 >>> @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru >>> } >>> } >>> >>> + if (ci.poll_cookie != BLK_QC_T_NONE) { >>> + while (atomic_read(&ci.io->io_count) > 1 && >>> + blk_poll(ci.poll_queue, ci.poll_cookie, true)) ; >>> + } >>> + >>> /* drop the extra reference count */ >>> dec_pending(ci.io, errno_to_blk_status(error)); >>> } >> >> It seems that the general idea of your design is to >> 1) submit *one* split bio >> 2) blk_poll(), waiting the previously submitted split bio complets > > No, I submit all the bios and poll for the last one. > >> and then submit next split bio, repeating the above process. I'm afraid >> the performance may be an issue here, since the batch every time >> blk_poll() reaps may decrease. > > Could you benchmark it? I only tested dm-linear. The configuration (dm table) of dm-linear is: 0 1048576 linear /dev/nvme0n1 0 1048576 1048576 linear /dev/nvme2n1 0 2097152 1048576 linear /dev/nvme5n1 0 fio script used is: ``` $cat fio.conf [global] name=iouring-sqpoll-iopoll-1 ioengine=io_uring iodepth=128 numjobs=1 thread rw=randread direct=1 registerfiles=1 hipri=1 runtime=10 time_based group_reporting randrepeat=0 filename=/dev/mapper/testdev bs=4k [job-1] cpus_allowed=14 ``` IOPS (IRQ mode) | IOPS (iopoll mode (hipri=1)) --------------- | -------------------- 213k | 19k At least, it doesn't work well with io_uring interface.
On 3/5/21 10:52 AM, JeffleXu wrote: > > On 3/3/21 6:09 PM, Mikulas Patocka wrote: >> >> On Wed, 3 Mar 2021, JeffleXu wrote: >> >>> >>> On 3/3/21 3:05 AM, Mikulas Patocka wrote: >>> >>>> Support I/O polling if submit_bio_noacct_mq_direct returned non-empty >>>> cookie. >>>> >>>> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> >>>> >>>> --- >>>> drivers/md/dm.c | 5 +++++ >>>> 1 file changed, 5 insertions(+) >>>> >>>> Index: linux-2.6/drivers/md/dm.c >>>> =================================================================== >>>> --- linux-2.6.orig/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100 >>>> +++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100 >>>> @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru >>>> } >>>> } >>>> >>>> + if (ci.poll_cookie != BLK_QC_T_NONE) { >>>> + while (atomic_read(&ci.io->io_count) > 1 && >>>> + blk_poll(ci.poll_queue, ci.poll_cookie, true)) ; >>>> + } >>>> + >>>> /* drop the extra reference count */ >>>> dec_pending(ci.io, errno_to_blk_status(error)); >>>> } >>> It seems that the general idea of your design is to >>> 1) submit *one* split bio >>> 2) blk_poll(), waiting the previously submitted split bio complets >> No, I submit all the bios and poll for the last one. >> >>> and then submit next split bio, repeating the above process. I'm afraid >>> the performance may be an issue here, since the batch every time >>> blk_poll() reaps may decrease. >> Could you benchmark it? > I only tested dm-linear. > > The configuration (dm table) of dm-linear is: > 0 1048576 linear /dev/nvme0n1 0 > 1048576 1048576 linear /dev/nvme2n1 0 > 2097152 1048576 linear /dev/nvme5n1 0 > > > fio script used is: > ``` > $cat fio.conf > [global] > name=iouring-sqpoll-iopoll-1 > ioengine=io_uring > iodepth=128 > numjobs=1 > thread > rw=randread > direct=1 > registerfiles=1 > hipri=1 > runtime=10 > time_based > group_reporting > randrepeat=0 > filename=/dev/mapper/testdev > bs=4k > > [job-1] > cpus_allowed=14 > ``` > > IOPS (IRQ mode) | IOPS (iopoll mode (hipri=1)) > --------------- | -------------------- > 213k | 19k > > At least, it doesn't work well with io_uring interface. > > Jeffe, I ran your above fio test on a linear LV split across 3 NVMes to second your split mapping (system: 32 core Intel, 256GiB RAM) comparing io engines sync, libaio and io_uring, the latter w/ and w/o hipri (sync+libaio obviously w/o registerfiles and hipri) which resulted ok: sync | libaio | IRQ mode (hipri=0) | iopoll (hipri=1) ------|----------|---------------------|----------------- 56.3K | 290K | 329K | 351K I can't second your drastic hipri=1 drop here... Heinz -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
On 3/5/21 6:46 PM, Heinz Mauelshagen wrote: > On 3/5/21 10:52 AM, JeffleXu wrote: >> >> On 3/3/21 6:09 PM, Mikulas Patocka wrote: >>> >>> On Wed, 3 Mar 2021, JeffleXu wrote: >>> >>>> >>>> On 3/3/21 3:05 AM, Mikulas Patocka wrote: >>>> >>>>> Support I/O polling if submit_bio_noacct_mq_direct returned non-empty >>>>> cookie. >>>>> >>>>> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> >>>>> >>>>> --- >>>>> drivers/md/dm.c | 5 +++++ >>>>> 1 file changed, 5 insertions(+) >>>>> >>>>> Index: linux-2.6/drivers/md/dm.c >>>>> =================================================================== >>>>> --- linux-2.6.orig/drivers/md/dm.c 2021-03-02 >>>>> 19:26:34.000000000 +0100 >>>>> +++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100 >>>>> @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru >>>>> } >>>>> } >>>>> + if (ci.poll_cookie != BLK_QC_T_NONE) { >>>>> + while (atomic_read(&ci.io->io_count) > 1 && >>>>> + blk_poll(ci.poll_queue, ci.poll_cookie, true)) ; >>>>> + } >>>>> + >>>>> /* drop the extra reference count */ >>>>> dec_pending(ci.io, errno_to_blk_status(error)); >>>>> } >>>> It seems that the general idea of your design is to >>>> 1) submit *one* split bio >>>> 2) blk_poll(), waiting the previously submitted split bio complets >>> No, I submit all the bios and poll for the last one. >>> >>>> and then submit next split bio, repeating the above process. I'm >>>> afraid >>>> the performance may be an issue here, since the batch every time >>>> blk_poll() reaps may decrease. >>> Could you benchmark it? >> I only tested dm-linear. >> >> The configuration (dm table) of dm-linear is: >> 0 1048576 linear /dev/nvme0n1 0 >> 1048576 1048576 linear /dev/nvme2n1 0 >> 2097152 1048576 linear /dev/nvme5n1 0 >> >> >> fio script used is: >> ``` >> $cat fio.conf >> [global] >> name=iouring-sqpoll-iopoll-1 >> ioengine=io_uring >> iodepth=128 >> numjobs=1 >> thread >> rw=randread >> direct=1 >> registerfiles=1 >> hipri=1 >> runtime=10 >> time_based >> group_reporting >> randrepeat=0 >> filename=/dev/mapper/testdev >> bs=4k >> >> [job-1] >> cpus_allowed=14 >> ``` >> >> IOPS (IRQ mode) | IOPS (iopoll mode (hipri=1)) >> --------------- | -------------------- >> 213k | 19k >> >> At least, it doesn't work well with io_uring interface. >> >> > > > Jeffle, > > I ran your above fio test on a linear LV split across 3 NVMes to > second your split mapping > (system: 32 core Intel, 256GiB RAM) comparing io engines sync, libaio > and io_uring, > the latter w/ and w/o hipri (sync+libaio obviously w/o registerfiles > and hipri) which resulted ok: > > > > sync | libaio | IRQ mode (hipri=0) | iopoll (hipri=1) > ------|----------|---------------------|----------------- 56.3K | > 290K | 329K | 351K I can't second your > drastic hipri=1 drop here... Sorry, email mess. sync | libaio | IRQ mode (hipri=0) | iopoll (hipri=1) -------|----------|---------------------|----------------- 56.3K | 290K | 329K | 351K I can't second your drastic hipri=1 drop here... > Heinz -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
On Fri, Mar 05 2021 at 12:56pm -0500, Heinz Mauelshagen <heinzm@redhat.com> wrote: > > On 3/5/21 6:46 PM, Heinz Mauelshagen wrote: > >On 3/5/21 10:52 AM, JeffleXu wrote: > >> > >>On 3/3/21 6:09 PM, Mikulas Patocka wrote: > >>> > >>>On Wed, 3 Mar 2021, JeffleXu wrote: > >>> > >>>> > >>>>On 3/3/21 3:05 AM, Mikulas Patocka wrote: > >>>> > >>>>>Support I/O polling if submit_bio_noacct_mq_direct returned non-empty > >>>>>cookie. > >>>>> > >>>>>Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > >>>>> > >>>>>--- > >>>>> drivers/md/dm.c | 5 +++++ > >>>>> 1 file changed, 5 insertions(+) > >>>>> > >>>>>Index: linux-2.6/drivers/md/dm.c > >>>>>=================================================================== > >>>>>--- linux-2.6.orig/drivers/md/dm.c 2021-03-02 > >>>>>19:26:34.000000000 +0100 > >>>>>+++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100 > >>>>>@@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru > >>>>> } > >>>>> } > >>>>> + if (ci.poll_cookie != BLK_QC_T_NONE) { > >>>>>+ while (atomic_read(&ci.io->io_count) > 1 && > >>>>>+ blk_poll(ci.poll_queue, ci.poll_cookie, true)) ; > >>>>>+ } > >>>>>+ > >>>>> /* drop the extra reference count */ > >>>>> dec_pending(ci.io, errno_to_blk_status(error)); > >>>>> } > >>>>It seems that the general idea of your design is to > >>>>1) submit *one* split bio > >>>>2) blk_poll(), waiting the previously submitted split bio complets > >>>No, I submit all the bios and poll for the last one. > >>> > >>>>and then submit next split bio, repeating the above process. > >>>>I'm afraid > >>>>the performance may be an issue here, since the batch every time > >>>>blk_poll() reaps may decrease. > >>>Could you benchmark it? > >>I only tested dm-linear. > >> > >>The configuration (dm table) of dm-linear is: > >>0 1048576 linear /dev/nvme0n1 0 > >>1048576 1048576 linear /dev/nvme2n1 0 > >>2097152 1048576 linear /dev/nvme5n1 0 > >> > >> > >>fio script used is: > >>``` > >>$cat fio.conf > >>[global] > >>name=iouring-sqpoll-iopoll-1 > >>ioengine=io_uring > >>iodepth=128 > >>numjobs=1 > >>thread > >>rw=randread > >>direct=1 > >>registerfiles=1 > >>hipri=1 > >>runtime=10 > >>time_based > >>group_reporting > >>randrepeat=0 > >>filename=/dev/mapper/testdev > >>bs=4k > >> > >>[job-1] > >>cpus_allowed=14 > >>``` > >> > >>IOPS (IRQ mode) | IOPS (iopoll mode (hipri=1)) > >>--------------- | -------------------- > >> 213k | 19k > >> > >>At least, it doesn't work well with io_uring interface. > >> > >> > > > > > >Jeffle, > > > >I ran your above fio test on a linear LV split across 3 NVMes to > >second your split mapping > >(system: 32 core Intel, 256GiB RAM) comparing io engines sync, > >libaio and io_uring, > >the latter w/ and w/o hipri (sync+libaio obviously w/o > >registerfiles and hipri) which resulted ok: > > > > > > > >sync | libaio | IRQ mode (hipri=0) | iopoll (hipri=1) > >------|----------|---------------------|----------------- 56.3K > >| 290K | 329K | 351K I can't second > >your drastic hipri=1 drop here... > > > Sorry, email mess. > > > sync | libaio | IRQ mode (hipri=0) | iopoll (hipri=1) > -------|----------|---------------------|----------------- > 56.3K | 290K | 329K | 351K > > > > I can't second your drastic hipri=1 drop here... I think your result is just showcasing your powerful system's ability to poll every related HW queue.. whereas Jeffle's system is likely somehow more constrained (on a cpu level, memory, whatever). My basis for this is that Mikulas' changes simply always return an invalid cookie (BLK_QC_T_NONE) for purposes of intelligent IO polling. Such an implementation is completely invalid. I discussed with Jens and he said: "it needs to return something that f_op->iopoll() can make sense of. otherwise you have no option but to try everything." Mike -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
On 3/5/21 7:09 PM, Mike Snitzer wrote: > On Fri, Mar 05 2021 at 12:56pm -0500, > Heinz Mauelshagen <heinzm@redhat.com> wrote: > >> On 3/5/21 6:46 PM, Heinz Mauelshagen wrote: >>> On 3/5/21 10:52 AM, JeffleXu wrote: >>>> On 3/3/21 6:09 PM, Mikulas Patocka wrote: >>>>> On Wed, 3 Mar 2021, JeffleXu wrote: >>>>> >>>>>> On 3/3/21 3:05 AM, Mikulas Patocka wrote: >>>>>> >>>>>>> Support I/O polling if submit_bio_noacct_mq_direct returned non-empty >>>>>>> cookie. >>>>>>> >>>>>>> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> >>>>>>> >>>>>>> --- >>>>>>> drivers/md/dm.c | 5 +++++ >>>>>>> 1 file changed, 5 insertions(+) >>>>>>> >>>>>>> Index: linux-2.6/drivers/md/dm.c >>>>>>> =================================================================== >>>>>>> --- linux-2.6.orig/drivers/md/dm.c 2021-03-02 >>>>>>> 19:26:34.000000000 +0100 >>>>>>> +++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100 >>>>>>> @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru >>>>>>> } >>>>>>> } >>>>>>> + if (ci.poll_cookie != BLK_QC_T_NONE) { >>>>>>> + while (atomic_read(&ci.io->io_count) > 1 && >>>>>>> + blk_poll(ci.poll_queue, ci.poll_cookie, true)) ; >>>>>>> + } >>>>>>> + >>>>>>> /* drop the extra reference count */ >>>>>>> dec_pending(ci.io, errno_to_blk_status(error)); >>>>>>> } >>>>>> It seems that the general idea of your design is to >>>>>> 1) submit *one* split bio >>>>>> 2) blk_poll(), waiting the previously submitted split bio complets >>>>> No, I submit all the bios and poll for the last one. >>>>> >>>>>> and then submit next split bio, repeating the above process. >>>>>> I'm afraid >>>>>> the performance may be an issue here, since the batch every time >>>>>> blk_poll() reaps may decrease. >>>>> Could you benchmark it? >>>> I only tested dm-linear. >>>> >>>> The configuration (dm table) of dm-linear is: >>>> 0 1048576 linear /dev/nvme0n1 0 >>>> 1048576 1048576 linear /dev/nvme2n1 0 >>>> 2097152 1048576 linear /dev/nvme5n1 0 >>>> >>>> >>>> fio script used is: >>>> ``` >>>> $cat fio.conf >>>> [global] >>>> name=iouring-sqpoll-iopoll-1 >>>> ioengine=io_uring >>>> iodepth=128 >>>> numjobs=1 >>>> thread >>>> rw=randread >>>> direct=1 >>>> registerfiles=1 >>>> hipri=1 >>>> runtime=10 >>>> time_based >>>> group_reporting >>>> randrepeat=0 >>>> filename=/dev/mapper/testdev >>>> bs=4k >>>> >>>> [job-1] >>>> cpus_allowed=14 >>>> ``` >>>> >>>> IOPS (IRQ mode) | IOPS (iopoll mode (hipri=1)) >>>> --------------- | -------------------- >>>> 213k | 19k >>>> >>>> At least, it doesn't work well with io_uring interface. >>>> >>>> >>> >>> Jeffle, >>> >>> I ran your above fio test on a linear LV split across 3 NVMes to >>> second your split mapping >>> (system: 32 core Intel, 256GiB RAM) comparing io engines sync, >>> libaio and io_uring, >>> the latter w/ and w/o hipri (sync+libaio obviously w/o >>> registerfiles and hipri) which resulted ok: >>> >>> >>> >>> sync | libaio | IRQ mode (hipri=0) | iopoll (hipri=1) >>> ------|----------|---------------------|----------------- 56.3K >>> | 290K | 329K | 351K I can't second >>> your drastic hipri=1 drop here... >> >> Sorry, email mess. >> >> >> sync | libaio | IRQ mode (hipri=0) | iopoll (hipri=1) >> -------|----------|---------------------|----------------- >> 56.3K | 290K | 329K | 351K >> >> >> >> I can't second your drastic hipri=1 drop here... > I think your result is just showcasing your powerful system's ability to > poll every related HW queue.. whereas Jeffle's system is likely somehow > more constrained (on a cpu level, memory, whatever). Jeffle, mind explaining your test system configuration a bit more (cores, ram, i/o config to your NVMes) so that we have a better base for such assumption? Thanks, Heinz > > My basis for this is that Mikulas' changes simply always return an > invalid cookie (BLK_QC_T_NONE) for purposes of intelligent IO polling. > > Such an implementation is completely invalid. > > I discussed with Jens and he said: > "it needs to return something that f_op->iopoll() can make sense of. > otherwise you have no option but to try everything." > > Mike > > -- > dm-devel mailing list > dm-devel@redhat.com > https://listman.redhat.com/mailman/listinfo/dm-devel -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
On 3/4/21 3:09 AM, Mikulas Patocka wrote: > > > On Thu, 4 Mar 2021, JeffleXu wrote: > >>> __split_and_process_non_flush records the poll cookie in ci.poll_cookie. >>> When we processed all the bios, we poll for the last cookie here: >>> >>> if (ci.poll_cookie != BLK_QC_T_NONE) { >>> while (atomic_read(&ci.io->io_count) > 1 && >>> blk_poll(ci.poll_queue, ci.poll_cookie, true)) ; >>> } >> >> So what will happen if one bio submitted to dm device crosses the device >> boundary among several target devices (e.g., dm-stripe)? Please refer >> the following call graph. >> >> ``` >> submit_bio >> __submit_bio_noacct >> disk->fops->submit_bio(), calling into __split_and_process_bio(), >> call __split_and_process_non_flush() once, submitting the *first* split bio >> disk->fops->submit_bio(), calling into __split_and_process_bio(), >> call __split_and_process_non_flush() once, submitting the *second* split bio >> ... >> ``` >> >> >> So the loop is in __submit_bio_noacct(), rather than >> __split_and_process_bio(). Your design will send the first split bio, >> and then poll on this split bio, then send the next split bio, polling >> on this, go on and on... > > No. It will send all the bios and poll for the last one. I took a quick look, and this seems very broken. You must not poll off the submission path, polling should be invoked by the higher layer when someone wants to reap events. IOW, dm should not be calling blk_poll() by itself, only off mq_ops->poll(). Your patch seems to do it off submission once you submit the last bio in that batch, effectively implementing sync polling for that series. That's not right.
On 3/6/21 1:56 AM, Heinz Mauelshagen wrote: > > On 3/5/21 6:46 PM, Heinz Mauelshagen wrote: >> On 3/5/21 10:52 AM, JeffleXu wrote: >>> >>> On 3/3/21 6:09 PM, Mikulas Patocka wrote: >>>> >>>> On Wed, 3 Mar 2021, JeffleXu wrote: >>>> >>>>> >>>>> On 3/3/21 3:05 AM, Mikulas Patocka wrote: >>>>> >>>>>> Support I/O polling if submit_bio_noacct_mq_direct returned non-empty >>>>>> cookie. >>>>>> >>>>>> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> >>>>>> >>>>>> --- >>>>>> drivers/md/dm.c | 5 +++++ >>>>>> 1 file changed, 5 insertions(+) >>>>>> >>>>>> Index: linux-2.6/drivers/md/dm.c >>>>>> =================================================================== >>>>>> --- linux-2.6.orig/drivers/md/dm.c 2021-03-02 >>>>>> 19:26:34.000000000 +0100 >>>>>> +++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100 >>>>>> @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru >>>>>> } >>>>>> } >>>>>> + if (ci.poll_cookie != BLK_QC_T_NONE) { >>>>>> + while (atomic_read(&ci.io->io_count) > 1 && >>>>>> + blk_poll(ci.poll_queue, ci.poll_cookie, true)) ; >>>>>> + } >>>>>> + >>>>>> /* drop the extra reference count */ >>>>>> dec_pending(ci.io, errno_to_blk_status(error)); >>>>>> } >>>>> It seems that the general idea of your design is to >>>>> 1) submit *one* split bio >>>>> 2) blk_poll(), waiting the previously submitted split bio complets >>>> No, I submit all the bios and poll for the last one. >>>> >>>>> and then submit next split bio, repeating the above process. I'm >>>>> afraid >>>>> the performance may be an issue here, since the batch every time >>>>> blk_poll() reaps may decrease. >>>> Could you benchmark it? >>> I only tested dm-linear. >>> >>> The configuration (dm table) of dm-linear is: >>> 0 1048576 linear /dev/nvme0n1 0 >>> 1048576 1048576 linear /dev/nvme2n1 0 >>> 2097152 1048576 linear /dev/nvme5n1 0 >>> >>> >>> fio script used is: >>> ``` >>> $cat fio.conf >>> [global] >>> name=iouring-sqpoll-iopoll-1 >>> ioengine=io_uring >>> iodepth=128 >>> numjobs=1 >>> thread >>> rw=randread >>> direct=1 >>> registerfiles=1 >>> hipri=1 >>> runtime=10 >>> time_based >>> group_reporting >>> randrepeat=0 >>> filename=/dev/mapper/testdev >>> bs=4k >>> >>> [job-1] >>> cpus_allowed=14 >>> ``` >>> >>> IOPS (IRQ mode) | IOPS (iopoll mode (hipri=1)) >>> --------------- | -------------------- >>> 213k | 19k >>> >>> At least, it doesn't work well with io_uring interface. >>> >>> >> >> >> Jeffle, >> >> I ran your above fio test on a linear LV split across 3 NVMes to >> second your split mapping >> (system: 32 core Intel, 256GiB RAM) comparing io engines sync, libaio >> and io_uring, >> the latter w/ and w/o hipri (sync+libaio obviously w/o registerfiles >> and hipri) which resulted ok: >> >> >> >> sync | libaio | IRQ mode (hipri=0) | iopoll (hipri=1) >> ------|----------|---------------------|----------------- 56.3K | >> 290K | 329K | 351K I can't second your >> drastic hipri=1 drop here... > > > Sorry, email mess. > > > sync | libaio | IRQ mode (hipri=0) | iopoll (hipri=1) > -------|----------|---------------------|----------------- > 56.3K | 290K | 329K | 351K > > > > I can't second your drastic hipri=1 drop here... > Hummm, that's indeed somewhat strange... My test environment: - CPU: 128 cores, though only one CPU core is used since 'cpus_allowed=14' in fio configuration - memory: 983G memory free - NVMe: Huawai ES3510P (HWE52P434T0L005N), with 'nvme.poll_queues=3' Maybe you didn't specify 'nvme.poll_queues=XXX'? In this case, IO still goes into IRQ mode, even you have specified 'hipri=1'?
On 3/7/21 8:54 PM, JeffleXu wrote: > > > On 3/6/21 1:56 AM, Heinz Mauelshagen wrote: >> >> On 3/5/21 6:46 PM, Heinz Mauelshagen wrote: >>> On 3/5/21 10:52 AM, JeffleXu wrote: >>>> >>>> On 3/3/21 6:09 PM, Mikulas Patocka wrote: >>>>> >>>>> On Wed, 3 Mar 2021, JeffleXu wrote: >>>>> >>>>>> >>>>>> On 3/3/21 3:05 AM, Mikulas Patocka wrote: >>>>>> >>>>>>> Support I/O polling if submit_bio_noacct_mq_direct returned non-empty >>>>>>> cookie. >>>>>>> >>>>>>> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> >>>>>>> >>>>>>> --- >>>>>>> drivers/md/dm.c | 5 +++++ >>>>>>> 1 file changed, 5 insertions(+) >>>>>>> >>>>>>> Index: linux-2.6/drivers/md/dm.c >>>>>>> =================================================================== >>>>>>> --- linux-2.6.orig/drivers/md/dm.c 2021-03-02 >>>>>>> 19:26:34.000000000 +0100 >>>>>>> +++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100 >>>>>>> @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru >>>>>>> } >>>>>>> } >>>>>>> + if (ci.poll_cookie != BLK_QC_T_NONE) { >>>>>>> + while (atomic_read(&ci.io->io_count) > 1 && >>>>>>> + blk_poll(ci.poll_queue, ci.poll_cookie, true)) ; >>>>>>> + } >>>>>>> + >>>>>>> /* drop the extra reference count */ >>>>>>> dec_pending(ci.io, errno_to_blk_status(error)); >>>>>>> } >>>>>> It seems that the general idea of your design is to >>>>>> 1) submit *one* split bio >>>>>> 2) blk_poll(), waiting the previously submitted split bio complets >>>>> No, I submit all the bios and poll for the last one. >>>>> >>>>>> and then submit next split bio, repeating the above process. I'm >>>>>> afraid >>>>>> the performance may be an issue here, since the batch every time >>>>>> blk_poll() reaps may decrease. >>>>> Could you benchmark it? >>>> I only tested dm-linear. >>>> >>>> The configuration (dm table) of dm-linear is: >>>> 0 1048576 linear /dev/nvme0n1 0 >>>> 1048576 1048576 linear /dev/nvme2n1 0 >>>> 2097152 1048576 linear /dev/nvme5n1 0 >>>> >>>> >>>> fio script used is: >>>> ``` >>>> $cat fio.conf >>>> [global] >>>> name=iouring-sqpoll-iopoll-1 >>>> ioengine=io_uring >>>> iodepth=128 >>>> numjobs=1 >>>> thread >>>> rw=randread >>>> direct=1 >>>> registerfiles=1 >>>> hipri=1 >>>> runtime=10 >>>> time_based >>>> group_reporting >>>> randrepeat=0 >>>> filename=/dev/mapper/testdev >>>> bs=4k >>>> >>>> [job-1] >>>> cpus_allowed=14 >>>> ``` >>>> >>>> IOPS (IRQ mode) | IOPS (iopoll mode (hipri=1)) >>>> --------------- | -------------------- >>>> 213k | 19k >>>> >>>> At least, it doesn't work well with io_uring interface. >>>> >>>> >>> >>> >>> Jeffle, >>> >>> I ran your above fio test on a linear LV split across 3 NVMes to >>> second your split mapping >>> (system: 32 core Intel, 256GiB RAM) comparing io engines sync, libaio >>> and io_uring, >>> the latter w/ and w/o hipri (sync+libaio obviously w/o registerfiles >>> and hipri) which resulted ok: >>> >>> >>> >>> sync | libaio | IRQ mode (hipri=0) | iopoll (hipri=1) >>> ------|----------|---------------------|----------------- 56.3K | >>> 290K | 329K | 351K I can't second your >>> drastic hipri=1 drop here... >> >> >> Sorry, email mess. >> >> >> sync | libaio | IRQ mode (hipri=0) | iopoll (hipri=1) >> -------|----------|---------------------|----------------- >> 56.3K | 290K | 329K | 351K >> >> >> >> I can't second your drastic hipri=1 drop here... >> > > Hummm, that's indeed somewhat strange... > > My test environment: > - CPU: 128 cores, though only one CPU core is used since > 'cpus_allowed=14' in fio configuration > - memory: 983G memory free > - NVMe: Huawai ES3510P (HWE52P434T0L005N), with 'nvme.poll_queues=3' > > Maybe you didn't specify 'nvme.poll_queues=XXX'? In this case, IO still > goes into IRQ mode, even you have specified 'hipri=1'? That would be my guess too, and the patches also have a very suspicious clear of HIPRI which shouldn't be there (which would let that fly through).
On 3/8/21 4:54 AM, JeffleXu wrote: > > On 3/6/21 1:56 AM, Heinz Mauelshagen wrote: >> On 3/5/21 6:46 PM, Heinz Mauelshagen wrote: >>> On 3/5/21 10:52 AM, JeffleXu wrote: >>>> On 3/3/21 6:09 PM, Mikulas Patocka wrote: >>>>> On Wed, 3 Mar 2021, JeffleXu wrote: >>>>> >>>>>> On 3/3/21 3:05 AM, Mikulas Patocka wrote: >>>>>> >>>>>>> Support I/O polling if submit_bio_noacct_mq_direct returned non-empty >>>>>>> cookie. >>>>>>> >>>>>>> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> >>>>>>> >>>>>>> --- >>>>>>> drivers/md/dm.c | 5 +++++ >>>>>>> 1 file changed, 5 insertions(+) >>>>>>> >>>>>>> Index: linux-2.6/drivers/md/dm.c >>>>>>> =================================================================== >>>>>>> --- linux-2.6.orig/drivers/md/dm.c 2021-03-02 >>>>>>> 19:26:34.000000000 +0100 >>>>>>> +++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100 >>>>>>> @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru >>>>>>> } >>>>>>> } >>>>>>> + if (ci.poll_cookie != BLK_QC_T_NONE) { >>>>>>> + while (atomic_read(&ci.io->io_count) > 1 && >>>>>>> + blk_poll(ci.poll_queue, ci.poll_cookie, true)) ; >>>>>>> + } >>>>>>> + >>>>>>> /* drop the extra reference count */ >>>>>>> dec_pending(ci.io, errno_to_blk_status(error)); >>>>>>> } >>>>>> It seems that the general idea of your design is to >>>>>> 1) submit *one* split bio >>>>>> 2) blk_poll(), waiting the previously submitted split bio complets >>>>> No, I submit all the bios and poll for the last one. >>>>> >>>>>> and then submit next split bio, repeating the above process. I'm >>>>>> afraid >>>>>> the performance may be an issue here, since the batch every time >>>>>> blk_poll() reaps may decrease. >>>>> Could you benchmark it? >>>> I only tested dm-linear. >>>> >>>> The configuration (dm table) of dm-linear is: >>>> 0 1048576 linear /dev/nvme0n1 0 >>>> 1048576 1048576 linear /dev/nvme2n1 0 >>>> 2097152 1048576 linear /dev/nvme5n1 0 >>>> >>>> >>>> fio script used is: >>>> ``` >>>> $cat fio.conf >>>> [global] >>>> name=iouring-sqpoll-iopoll-1 >>>> ioengine=io_uring >>>> iodepth=128 >>>> numjobs=1 >>>> thread >>>> rw=randread >>>> direct=1 >>>> registerfiles=1 >>>> hipri=1 >>>> runtime=10 >>>> time_based >>>> group_reporting >>>> randrepeat=0 >>>> filename=/dev/mapper/testdev >>>> bs=4k >>>> >>>> [job-1] >>>> cpus_allowed=14 >>>> ``` >>>> >>>> IOPS (IRQ mode) | IOPS (iopoll mode (hipri=1)) >>>> --------------- | -------------------- >>>> 213k | 19k >>>> >>>> At least, it doesn't work well with io_uring interface. >>>> >>>> >>> >>> Jeffle, >>> >>> I ran your above fio test on a linear LV split across 3 NVMes to >>> second your split mapping >>> (system: 32 core Intel, 256GiB RAM) comparing io engines sync, libaio >>> and io_uring, >>> the latter w/ and w/o hipri (sync+libaio obviously w/o registerfiles >>> and hipri) which resulted ok: >>> >>> >>> >>> sync | libaio | IRQ mode (hipri=0) | iopoll (hipri=1) >>> ------|----------|---------------------|----------------- 56.3K | >>> 290K | 329K | 351K I can't second your >>> drastic hipri=1 drop here... >> >> Sorry, email mess. >> >> >> sync | libaio | IRQ mode (hipri=0) | iopoll (hipri=1) >> -------|----------|---------------------|----------------- >> 56.3K | 290K | 329K | 351K >> >> >> >> I can't second your drastic hipri=1 drop here... >> > Hummm, that's indeed somewhat strange... > > My test environment: > - CPU: 128 cores, though only one CPU core is used since > 'cpus_allowed=14' in fio configuration > - memory: 983G memory free > - NVMe: Huawai ES3510P (HWE52P434T0L005N), with 'nvme.poll_queues=3' > > Maybe you didn't specify 'nvme.poll_queues=XXX'? In this case, IO still > goes into IRQ mode, even you have specified 'hipri=1'? > Jeffle, nvme.poll_queues was zero indeed. At 3 results are hipri=0 / hipri=1 : 1699K / 1702K IOPS (all cores) Single core results : 315K / 329K Still no extreme drop... FWIW: Thanks, Heinz -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Index: linux-2.6/drivers/md/dm.c =================================================================== --- linux-2.6.orig/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100 +++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100 @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru } } + if (ci.poll_cookie != BLK_QC_T_NONE) { + while (atomic_read(&ci.io->io_count) > 1 && + blk_poll(ci.poll_queue, ci.poll_cookie, true)) ; + } + /* drop the extra reference count */ dec_pending(ci.io, errno_to_blk_status(error)); }
Support I/O polling if submit_bio_noacct_mq_direct returned non-empty cookie. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> --- drivers/md/dm.c | 5 +++++ 1 file changed, 5 insertions(+) -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel