
[PATCHv4,18/43] block: define BIO_MAX_PAGES to HPAGE_PMD_NR if huge page cache enabled

Message ID 20161025001342.76126-19-kirill.shutemov@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Kirill A. Shutemov Oct. 25, 2016, 12:13 a.m. UTC
We are going to do IO one huge page at a time, so BIO_MAX_PAGES needs to be
at least HPAGE_PMD_NR. For x86-64, that's 512 pages.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 block/bio.c         | 3 ++-
 include/linux/bio.h | 4 ++++
 2 files changed, 6 insertions(+), 1 deletion(-)
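
For reference, the 512-page figure follows from the huge page geometry; a
sketch of the arithmetic using the kernel's HPAGE_PMD_NR definition, with
x86-64 values filled in:

	HPAGE_PMD_NR == 1 << (HPAGE_PMD_SHIFT - PAGE_SHIFT)
	             == 1 << (21 - 12)	/* 2MB huge page, 4KB base page */
	             == 512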

Comments

Christoph Hellwig Oct. 25, 2016, 7:21 a.m. UTC | #1
On Tue, Oct 25, 2016 at 03:13:17AM +0300, Kirill A. Shutemov wrote:
> We are going to do IO one huge page at a time, so BIO_MAX_PAGES needs to be
> at least HPAGE_PMD_NR. For x86-64, that's 512 pages.

NAK.  The maximum bio size should not depend on an obscure VM config
option. Please send a standalone patch increasing the size to the block
list, with a much longer explanation.  Also, you can't simply increase
the size of the largest pool; we'll probably need more pools instead, or
maybe even implement a chaining scheme similar to what we do for struct
scatterlist.
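
For readers unfamiliar with the chaining scheme Christoph refers to: the
last entry of a fixed-size scatterlist table is turned into a link to the
next table, so arbitrarily long lists are built from small chunks. A
minimal sketch using the kernel's sg_init_table()/sg_chain() helpers; the
table size of 16 is illustrative:

	#include <linux/scatterlist.h>

	struct scatterlist first[16], second[16];

	sg_init_table(first, 16);
	sg_init_table(second, 16);
	/*
	 * Entry 15 of 'first' becomes a chain pointer to 'second'
	 * instead of a data entry.
	 */
	sg_chain(first, 16, second);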
Kirill A. Shutemov Oct. 25, 2016, 12:54 p.m. UTC | #2
On Tue, Oct 25, 2016 at 12:21:22AM -0700, Christoph Hellwig wrote:
> On Tue, Oct 25, 2016 at 03:13:17AM +0300, Kirill A. Shutemov wrote:
> > We are going to do IO one huge page at a time, so BIO_MAX_PAGES needs to be
> > at least HPAGE_PMD_NR. For x86-64, that's 512 pages.
> 
> NAK.  The maximum bio size should not depend on an obscure VM config
> option. Please send a standalone patch increasing the size to the block
> list, with a much longer explanation.  Also, you can't simply increase
> the size of the largest pool; we'll probably need more pools instead, or
> maybe even implement a chaining scheme similar to what we do for struct
> scatterlist.

The size of the required pool depends on the architecture: different
architectures have different (huge page size)/(base page size) ratios.

Would it be okay if I add one more pool with size equal to HPAGE_PMD_NR,
if it's bigger than BIO_MAX_PAGES and huge pages are enabled?
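
For illustration, the ratio in question is HPAGE_PMD_NR, i.e. huge page
size divided by base page size. Example values for PMD-level huge pages
(the base page size is a build-time choice on some architectures):

	x86-64 (4KB base pages):    2MB / 4KB    = 512
	arm64  (4KB base pages):    2MB / 4KB    = 512
	arm64  (64KB base pages):   512MB / 64KB = 8192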
Andreas Dilger Oct. 26, 2016, 4:13 a.m. UTC | #3
On Oct 25, 2016, at 6:54 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> 
> On Tue, Oct 25, 2016 at 12:21:22AM -0700, Christoph Hellwig wrote:
>> On Tue, Oct 25, 2016 at 03:13:17AM +0300, Kirill A. Shutemov wrote:
>>> We are going to do IO one huge page at a time, so BIO_MAX_PAGES needs to be
>>> at least HPAGE_PMD_NR. For x86-64, that's 512 pages.
>> 
>> NAK.  The maximum bio size should not depend on an obscure VM config
>> option. Please send a standalone patch increasing the size to the block
>> list, with a much longer explanation.  Also, you can't simply increase
>> the size of the largest pool; we'll probably need more pools instead, or
>> maybe even implement a chaining scheme similar to what we do for struct
>> scatterlist.
> 
> The size of the required pool depends on the architecture: different
> architectures have different (huge page size)/(base page size) ratios.
> 
> Would it be okay if I add one more pool with size equal to HPAGE_PMD_NR,
> if it's bigger than BIO_MAX_PAGES and huge pages are enabled?

Why wouldn't you have all the pool sizes in between?  Definitely, 1MB has
already been too small for high-bandwidth IO.  I wouldn't mind BIOs up to
4MB or larger, since most high-end RAID hardware does best with 4MB IOs.

Cheers, Andreas
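
A sketch of what Andreas suggests, extending the existing bvec_slabs[]
table in block/bio.c; the entries beyond BV(128) are hypothetical, and
adding them would also require growing BVEC_POOL_NR to match:

	/* hypothetical: intermediate pools up to 1024 pages (4MB of 4KB pages) */
	static struct biovec_slab bvec_slabs[BVEC_POOL_NR] __read_mostly = {
		BV(1), BV(4), BV(16), BV(64), BV(128),
		BV(256), BV(512), BV(1024),
	};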
Ming Lei Oct. 26, 2016, 7:30 a.m. UTC | #4
On Wed, Oct 26, 2016 at 12:13 PM, Andreas Dilger <adilger@dilger.ca> wrote:
> On Oct 25, 2016, at 6:54 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
>>
>> On Tue, Oct 25, 2016 at 12:21:22AM -0700, Christoph Hellwig wrote:
>>> On Tue, Oct 25, 2016 at 03:13:17AM +0300, Kirill A. Shutemov wrote:
>>>> We are going to do IO one huge page at a time, so BIO_MAX_PAGES needs to be
>>>> at least HPAGE_PMD_NR. For x86-64, that's 512 pages.
>>>
>>> NAK.  The maximum bio size should not depend on an obscure VM config
>>> option. Please send a standalone patch increasing the size to the block
>>> list, with a much longer explanation.  Also, you can't simply increase
>>> the size of the largest pool; we'll probably need more pools instead, or
>>> maybe even implement a chaining scheme similar to what we do for struct
>>> scatterlist.
>>
>> The size of the required pool depends on the architecture: different
>> architectures have different (huge page size)/(base page size) ratios.
>>
>> Would it be okay if I add one more pool with size equal to HPAGE_PMD_NR,
>> if it's bigger than BIO_MAX_PAGES and huge pages are enabled?
>
> Why wouldn't you have all the pool sizes in between?  Definitely, 1MB has
> already been too small for high-bandwidth IO.  I wouldn't mind BIOs up to
> 4MB or larger, since most high-end RAID hardware does best with 4MB IOs.

I am preparing the multipage bvec support[1], and once it is ready the
default 256 bvecs should be enough for normal cases.

I will post the patches next month for review.

[1] https://github.com/ming1/linux/tree/mp-bvec-0.1-v4.9

Thanks,
Ming
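
The idea behind multipage bvecs, roughly: bv_len is no longer capped at
one page, so a single bio_vec can describe a physically contiguous
multi-page segment, and a 2MB huge page then needs one bvec instead of
512. For context, the existing structure (this is today's layout, not
Ming's actual patches):

	struct bio_vec {
		struct page	*bv_page;	/* first page of the segment */
		unsigned int	bv_len;		/* could span many pages, e.g. 2MB */
		unsigned int	bv_offset;	/* offset into the first page */
	};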

Christoph Hellwig Oct. 26, 2016, 7:35 a.m. UTC | #5
On Tue, Oct 25, 2016 at 03:54:31PM +0300, Kirill A. Shutemov wrote:
> The size of the required pool depends on the architecture: different
> architectures have different (huge page size)/(base page size) ratios.

Please explain first why they are required and not just nice to have.
Christoph Hellwig Oct. 26, 2016, 7:36 a.m. UTC | #6
On Tue, Oct 25, 2016 at 10:13:13PM -0600, Andreas Dilger wrote:
> Why wouldn't you have all the pool sizes in between?  Definitely, 1MB has
> already been too small for high-bandwidth IO.  I wouldn't mind BIOs up to
> 4MB or larger, since most high-end RAID hardware does best with 4MB IOs.

I/O sizes are not limited by the bio size; we have already supported
larger than 1MB I/O for a long time.
Christoph Hellwig Oct. 26, 2016, 7:36 a.m. UTC | #7
On Wed, Oct 26, 2016 at 03:30:05PM +0800, Ming Lei wrote:
> I am preparing the multipage bvec support[1], and once it is ready the
> default 256 bvecs should be enough for normal cases.

Yes, multipage bvecs are definitely the way to go to efficiently
support I/O on huge pages.

Patch

diff --git a/block/bio.c b/block/bio.c
index db85c5753a76..a69062bda3e0 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -44,7 +44,8 @@ 
  */
 #define BV(x) { .nr_vecs = x, .name = "biovec-"__stringify(x) }
 static struct biovec_slab bvec_slabs[BVEC_POOL_NR] __read_mostly = {
-	BV(1), BV(4), BV(16), BV(64), BV(128), BV(BIO_MAX_PAGES),
+	BV(1), BV(4), BV(16), BV(64), BV(128),
+	{ .nr_vecs = BIO_MAX_PAGES, .name = "biovec-max_pages" },
 };
 #undef BV
 
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 97cb48f03dc7..19d0fae9cdd0 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -38,7 +38,11 @@ 
 #define BIO_BUG_ON
 #endif
 
+#ifdef CONFIG_TRANSPARENT_HUGE_PAGECACHE
+#define BIO_MAX_PAGES		(HPAGE_PMD_NR > 256 ? HPAGE_PMD_NR : 256)
+#else
 #define BIO_MAX_PAGES		256
+#endif
 
 #define bio_prio(bio)			(bio)->bi_ioprio
 #define bio_set_prio(bio, prio)		((bio)->bi_ioprio = prio)