diff mbox

[28/60] block: introduce QUEUE_FLAG_SPLIT_MP

Message ID 1477728600-12938-29-git-send-email-tom.leiming@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Ming Lei Oct. 29, 2016, 8:08 a.m. UTC
Some drivers(such as dm) should be capable of dealing with multipage
bvec, but the incoming bio may be too big, such as, a new singlepage bvec
bio can't be cloned from the bio, or can't be allocated to singlepage
bvec with same size.

At least crypt dm, log writes and bcache have this kind of issue.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 block/blk-merge.c      | 4 ++++
 include/linux/blkdev.h | 2 ++
 2 files changed, 6 insertions(+)

Comments

Christoph Hellwig Oct. 31, 2016, 3:39 p.m. UTC | #1
On Sat, Oct 29, 2016 at 04:08:27PM +0800, Ming Lei wrote:
> Some drivers(such as dm) should be capable of dealing with multipage
> bvec, but the incoming bio may be too big, such as, a new singlepage bvec
> bio can't be cloned from the bio, or can't be allocated to singlepage
> bvec with same size.
> 
> At least crypt dm, log writes and bcache have this kind of issue.

We already have the segment_size limitation for request based drivers.
I'd rather extent it to bio drivers if really needed.

But then again we should look into not having this limitation.  E.g.
for bcache I'd be really surprised if it's that limited, given that
Kent came up with this whole multipage bvec scheme.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ming Lei Oct. 31, 2016, 11:56 p.m. UTC | #2
On Mon, Oct 31, 2016 at 11:39 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Sat, Oct 29, 2016 at 04:08:27PM +0800, Ming Lei wrote:
>> Some drivers(such as dm) should be capable of dealing with multipage
>> bvec, but the incoming bio may be too big, such as, a new singlepage bvec
>> bio can't be cloned from the bio, or can't be allocated to singlepage
>> bvec with same size.
>>
>> At least crypt dm, log writes and bcache have this kind of issue.
>
> We already have the segment_size limitation for request based drivers.
> I'd rather extent it to bio drivers if really needed.

Yeah, just found dm actually don't need the flag, and it has its own way
for limitting bio size. For bcache, there is only place which need the
flag, so we can use max sectors limit to address it or use multiple bio
to read & check ony by one.

>
> But then again we should look into not having this limitation.  E.g.
> for bcache I'd be really surprised if it's that limited, given that
> Kent came up with this whole multipage bvec scheme.

As far as I find, the only one which need the flag is bch_data_verify().


Thanks,
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kent Overstreet Nov. 2, 2016, 3:08 a.m. UTC | #3
On Mon, Oct 31, 2016 at 08:39:15AM -0700, Christoph Hellwig wrote:
> On Sat, Oct 29, 2016 at 04:08:27PM +0800, Ming Lei wrote:
> > Some drivers(such as dm) should be capable of dealing with multipage
> > bvec, but the incoming bio may be too big, such as, a new singlepage bvec
> > bio can't be cloned from the bio, or can't be allocated to singlepage
> > bvec with same size.
> > 
> > At least crypt dm, log writes and bcache have this kind of issue.
> 
> We already have the segment_size limitation for request based drivers.
> I'd rather extent it to bio drivers if really needed.
> 
> But then again we should look into not having this limitation.  E.g.
> for bcache I'd be really surprised if it's that limited, given that
> Kent came up with this whole multipage bvec scheme.

AFAIK the only issue is with drivers that may have to bounce bios - pages that
were contiguous in the original bio won't necessarily be contiguous in the
bounced bio, thus bouncing might require more than BIO_MAX_SEGMENTS bvecs.

I don't know what Ming's referring to by "singlepage bvec bios".

Anyways, bouncing comes up in multiple places so we probably need to come up
with a generic solution for that. Other than that, there shouldn't be any issues
or limitations - if you're not bouncing, there's no need to clone the bvecs.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ming Lei Nov. 3, 2016, 10:38 a.m. UTC | #4
On Wed, Nov 2, 2016 at 11:08 AM, Kent Overstreet
<kent.overstreet@gmail.com> wrote:
> On Mon, Oct 31, 2016 at 08:39:15AM -0700, Christoph Hellwig wrote:
>> On Sat, Oct 29, 2016 at 04:08:27PM +0800, Ming Lei wrote:
>> > Some drivers(such as dm) should be capable of dealing with multipage
>> > bvec, but the incoming bio may be too big, such as, a new singlepage bvec
>> > bio can't be cloned from the bio, or can't be allocated to singlepage
>> > bvec with same size.
>> >
>> > At least crypt dm, log writes and bcache have this kind of issue.
>>
>> We already have the segment_size limitation for request based drivers.
>> I'd rather extent it to bio drivers if really needed.
>>
>> But then again we should look into not having this limitation.  E.g.
>> for bcache I'd be really surprised if it's that limited, given that
>> Kent came up with this whole multipage bvec scheme.
>
> AFAIK the only issue is with drivers that may have to bounce bios - pages that
> were contiguous in the original bio won't necessarily be contiguous in the
> bounced bio, thus bouncing might require more than BIO_MAX_SEGMENTS bvecs.
>
> I don't know what Ming's referring to by "singlepage bvec bios".
>
> Anyways, bouncing comes up in multiple places so we probably need to come up
> with a generic solution for that. Other than that, there shouldn't be any issues
> or limitations - if you're not bouncing, there's no need to clone the bvecs.

AFAIK, the only special case is bch_data_verify(): drivers/md/bcache/debug.c,
for other bio_clone(), no direct access to io vec table, so default
multipage bvec
copy is fine.

I will remove the flag and try to fix bch_data_verify() by using multiple bio,
and I remembered I cooked patch to do that long time ago, :-)


Thanks,
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kent Overstreet Nov. 3, 2016, 11:20 a.m. UTC | #5
On Thu, Nov 03, 2016 at 06:38:57PM +0800, Ming Lei wrote:
> On Wed, Nov 2, 2016 at 11:08 AM, Kent Overstreet
> <kent.overstreet@gmail.com> wrote:
> > On Mon, Oct 31, 2016 at 08:39:15AM -0700, Christoph Hellwig wrote:
> >> On Sat, Oct 29, 2016 at 04:08:27PM +0800, Ming Lei wrote:
> >> > Some drivers(such as dm) should be capable of dealing with multipage
> >> > bvec, but the incoming bio may be too big, such as, a new singlepage bvec
> >> > bio can't be cloned from the bio, or can't be allocated to singlepage
> >> > bvec with same size.
> >> >
> >> > At least crypt dm, log writes and bcache have this kind of issue.
> >>
> >> We already have the segment_size limitation for request based drivers.
> >> I'd rather extent it to bio drivers if really needed.
> >>
> >> But then again we should look into not having this limitation.  E.g.
> >> for bcache I'd be really surprised if it's that limited, given that
> >> Kent came up with this whole multipage bvec scheme.
> >
> > AFAIK the only issue is with drivers that may have to bounce bios - pages that
> > were contiguous in the original bio won't necessarily be contiguous in the
> > bounced bio, thus bouncing might require more than BIO_MAX_SEGMENTS bvecs.
> >
> > I don't know what Ming's referring to by "singlepage bvec bios".
> >
> > Anyways, bouncing comes up in multiple places so we probably need to come up
> > with a generic solution for that. Other than that, there shouldn't be any issues
> > or limitations - if you're not bouncing, there's no need to clone the bvecs.
> 
> AFAIK, the only special case is bch_data_verify(): drivers/md/bcache/debug.c,
> for other bio_clone(), no direct access to io vec table, so default
> multipage bvec
> copy is fine.
> 
> I will remove the flag and try to fix bch_data_verify() by using multiple bio,
> and I remembered I cooked patch to do that long time ago, :-)

You can #ifdef out the bch_data_verify() code, it's debug code that hasn't been
used in ages.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ming Lei Nov. 3, 2016, 11:26 a.m. UTC | #6
On Thu, Nov 3, 2016 at 7:20 PM, Kent Overstreet
<kent.overstreet@gmail.com> wrote:
> On Thu, Nov 03, 2016 at 06:38:57PM +0800, Ming Lei wrote:
>> On Wed, Nov 2, 2016 at 11:08 AM, Kent Overstreet
>> <kent.overstreet@gmail.com> wrote:
>> > On Mon, Oct 31, 2016 at 08:39:15AM -0700, Christoph Hellwig wrote:
>> >> On Sat, Oct 29, 2016 at 04:08:27PM +0800, Ming Lei wrote:
>> >> > Some drivers(such as dm) should be capable of dealing with multipage
>> >> > bvec, but the incoming bio may be too big, such as, a new singlepage bvec
>> >> > bio can't be cloned from the bio, or can't be allocated to singlepage
>> >> > bvec with same size.
>> >> >
>> >> > At least crypt dm, log writes and bcache have this kind of issue.
>> >>
>> >> We already have the segment_size limitation for request based drivers.
>> >> I'd rather extent it to bio drivers if really needed.
>> >>
>> >> But then again we should look into not having this limitation.  E.g.
>> >> for bcache I'd be really surprised if it's that limited, given that
>> >> Kent came up with this whole multipage bvec scheme.
>> >
>> > AFAIK the only issue is with drivers that may have to bounce bios - pages that
>> > were contiguous in the original bio won't necessarily be contiguous in the
>> > bounced bio, thus bouncing might require more than BIO_MAX_SEGMENTS bvecs.
>> >
>> > I don't know what Ming's referring to by "singlepage bvec bios".
>> >
>> > Anyways, bouncing comes up in multiple places so we probably need to come up
>> > with a generic solution for that. Other than that, there shouldn't be any issues
>> > or limitations - if you're not bouncing, there's no need to clone the bvecs.
>>
>> AFAIK, the only special case is bch_data_verify(): drivers/md/bcache/debug.c,
>> for other bio_clone(), no direct access to io vec table, so default
>> multipage bvec
>> copy is fine.
>>
>> I will remove the flag and try to fix bch_data_verify() by using multiple bio,
>> and I remembered I cooked patch to do that long time ago, :-)
>
> You can #ifdef out the bch_data_verify() code, it's debug code that hasn't been
> used in ages.

Though you didn't test it ages, it is sitll working in my last test, :-)

But someone can enable that for debug too, I don't want to make him/her sad.
Kent Overstreet Nov. 3, 2016, 11:30 a.m. UTC | #7
On Thu, Nov 03, 2016 at 07:26:52PM +0800, Ming Lei wrote:
> On Thu, Nov 3, 2016 at 7:20 PM, Kent Overstreet
> <kent.overstreet@gmail.com> wrote:
> > On Thu, Nov 03, 2016 at 06:38:57PM +0800, Ming Lei wrote:
> >> On Wed, Nov 2, 2016 at 11:08 AM, Kent Overstreet
> >> <kent.overstreet@gmail.com> wrote:
> >> > On Mon, Oct 31, 2016 at 08:39:15AM -0700, Christoph Hellwig wrote:
> >> >> On Sat, Oct 29, 2016 at 04:08:27PM +0800, Ming Lei wrote:
> >> >> > Some drivers(such as dm) should be capable of dealing with multipage
> >> >> > bvec, but the incoming bio may be too big, such as, a new singlepage bvec
> >> >> > bio can't be cloned from the bio, or can't be allocated to singlepage
> >> >> > bvec with same size.
> >> >> >
> >> >> > At least crypt dm, log writes and bcache have this kind of issue.
> >> >>
> >> >> We already have the segment_size limitation for request based drivers.
> >> >> I'd rather extent it to bio drivers if really needed.
> >> >>
> >> >> But then again we should look into not having this limitation.  E.g.
> >> >> for bcache I'd be really surprised if it's that limited, given that
> >> >> Kent came up with this whole multipage bvec scheme.
> >> >
> >> > AFAIK the only issue is with drivers that may have to bounce bios - pages that
> >> > were contiguous in the original bio won't necessarily be contiguous in the
> >> > bounced bio, thus bouncing might require more than BIO_MAX_SEGMENTS bvecs.
> >> >
> >> > I don't know what Ming's referring to by "singlepage bvec bios".
> >> >
> >> > Anyways, bouncing comes up in multiple places so we probably need to come up
> >> > with a generic solution for that. Other than that, there shouldn't be any issues
> >> > or limitations - if you're not bouncing, there's no need to clone the bvecs.
> >>
> >> AFAIK, the only special case is bch_data_verify(): drivers/md/bcache/debug.c,
> >> for other bio_clone(), no direct access to io vec table, so default
> >> multipage bvec
> >> copy is fine.
> >>
> >> I will remove the flag and try to fix bch_data_verify() by using multiple bio,
> >> and I remembered I cooked patch to do that long time ago, :-)
> >
> > You can #ifdef out the bch_data_verify() code, it's debug code that hasn't been
> > used in ages.
> 
> Though you didn't test it ages, it is sitll working in my last test, :-)
> 
> But someone can enable that for debug too, I don't want to make him/her sad.

Up to you :)

It's not useful for anything but debugging though, so I wouldn't worry about
impacting end users.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 2642e5fc8b69..266c94d1d82f 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -79,6 +79,10 @@  static inline unsigned get_max_io_size(struct request_queue *q,
 	/* aligned to logical block size */
 	sectors &= ~(mask >> 9);
 
+	/* some queues can't handle bigger bio even it is ready for mp bvecs */
+	if (blk_queue_split_mp(q) && sectors > BIO_SP_MAX_SECTORS)
+		sectors = BIO_SP_MAX_SECTORS;
+
 	return sectors;
 }
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e4dd25361bd6..7cee0179c9e6 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -506,6 +506,7 @@  struct request_queue {
 #define QUEUE_FLAG_FLUSH_NQ    25	/* flush not queueuable */
 #define QUEUE_FLAG_DAX         26	/* device supports DAX */
 #define QUEUE_FLAG_NO_MP       27	/* multipage bvecs isn't ready */
+#define QUEUE_FLAG_SPLIT_MP    28	/* split MP bvecs if too bigger */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_STACKABLE)	|	\
@@ -597,6 +598,7 @@  static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
 	(test_bit(QUEUE_FLAG_SECERASE, &(q)->queue_flags))
 #define blk_queue_dax(q)	test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
 #define blk_queue_no_mp(q)	test_bit(QUEUE_FLAG_NO_MP, &(q)->queue_flags)
+#define blk_queue_split_mp(q)	test_bit(QUEUE_FLAG_SPLIT_MP, &(q)->queue_flags)
 
 #define blk_noretry_request(rq) \
 	((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \