Message ID | 20161025001342.76126-19-kirill.shutemov@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
On Tue, Oct 25, 2016 at 03:13:17AM +0300, Kirill A. Shutemov wrote:
> We are going to do IO a huge page at a time. So we need BIO_MAX_PAGES to be
> at least HPAGE_PMD_NR. For x86-64, it's 512 pages.

NAK. The maximum bio size should not depend on an obscure vm config; please send a standalone patch increasing the size to the block list, with a much longer explanation. Also, you can't simply increase the size of the largest pool; we'll probably need more pools instead, or maybe even implement a chaining scheme similar to what we do for struct scatterlist.
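For readers who haven't seen it, the chaining scheme referenced above is the one struct scatterlist uses: a fixed-size array whose last slot can point at another array, so a logically long vector is stitched together from several small allocations (see sg_chain() in include/linux/scatterlist.h). Below is a minimal userspace sketch of just that idea; all names are made up for illustration and none of this is kernel API.

```c
#include <stdio.h>
#include <stdbool.h>
#include <stddef.h>

#define NENTS 4  /* small array size so the example stays readable */

struct vec_entry {
	void *data;     /* payload (a page pointer in the real scatterlist) */
	size_t len;     /* payload length */
	bool is_chain;  /* last slot may point at the next array instead */
};

/* Link two fixed-size arrays: the last slot of 'prev' becomes a chain entry. */
static void chain(struct vec_entry *prev, struct vec_entry *next)
{
	prev[NENTS - 1].data = next;
	prev[NENTS - 1].len = 0;
	prev[NENTS - 1].is_chain = true;
}

/* Visit every payload entry, hopping across chain links transparently. */
static void walk(struct vec_entry *v)
{
	for (size_t i = 0; i < NENTS; i++) {
		if (v[i].is_chain) {
			v = v[i].data;   /* jump into the next array */
			i = (size_t)-1;  /* wraps to 0 after the i++ */
			continue;
		}
		if (v[i].data)
			printf("entry: %zu bytes\n", v[i].len);
	}
}

int main(void)
{
	struct vec_entry a[NENTS] = { { "x", 1 }, { "y", 2 }, { "z", 3 } };
	struct vec_entry b[NENTS] = { { "w", 4 } };

	chain(a, b);  /* a's last slot now points at b */
	walk(a);      /* prints 4 payload entries across both arrays */
	return 0;
}
```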
On Tue, Oct 25, 2016 at 12:21:22AM -0700, Christoph Hellwig wrote:
> On Tue, Oct 25, 2016 at 03:13:17AM +0300, Kirill A. Shutemov wrote:
> > We are going to do IO a huge page at a time. So we need BIO_MAX_PAGES to be
> > at least HPAGE_PMD_NR. For x86-64, it's 512 pages.
>
> NAK. The maximum bio size should not depend on an obscure vm config;
> please send a standalone patch increasing the size to the block list,
> with a much longer explanation. Also, you can't simply increase the size
> of the largest pool; we'll probably need more pools instead, or maybe
> even implement a chaining scheme similar to what we do for struct
> scatterlist.

The size of the required pool depends on the architecture: different architectures have different (huge page size)/(base page size) ratios.

Would it be okay if I add one more pool with size equal to HPAGE_PMD_NR, if it's bigger than BIO_MAX_PAGES and huge pages are enabled?
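To make the architecture dependence concrete: the kernel computes HPAGE_PMD_NR as 1 << (PMD_SHIFT - PAGE_SHIFT), so the extra pool Kirill is asking about would hold 512 entries on x86-64 but far more on, say, arm64 with 64K pages. A quick userspace check of that arithmetic; the shift values are the usual ones for these configurations, supplied here by hand rather than taken from this thread.

```c
#include <stdio.h>

int main(void)
{
	struct { const char *config; int pmd_shift, page_shift; } cfg[] = {
		{ "x86-64, 4K pages", 21, 12 },  /* 2M   / 4K  = 512  */
		{ "arm64, 4K pages",  21, 12 },  /* 2M   / 4K  = 512  */
		{ "arm64, 64K pages", 29, 16 },  /* 512M / 64K = 8192 */
	};

	/* HPAGE_PMD_NR = 1 << (PMD_SHIFT - PAGE_SHIFT) */
	for (size_t i = 0; i < sizeof(cfg) / sizeof(cfg[0]); i++)
		printf("%-18s HPAGE_PMD_NR = %d\n", cfg[i].config,
		       1 << (cfg[i].pmd_shift - cfg[i].page_shift));
	return 0;
}
```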
On Oct 25, 2016, at 6:54 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> On Tue, Oct 25, 2016 at 12:21:22AM -0700, Christoph Hellwig wrote:
>> On Tue, Oct 25, 2016 at 03:13:17AM +0300, Kirill A. Shutemov wrote:
>>> We are going to do IO a huge page at a time. So we need BIO_MAX_PAGES to be
>>> at least HPAGE_PMD_NR. For x86-64, it's 512 pages.
>>
>> NAK. The maximum bio size should not depend on an obscure vm config;
>> please send a standalone patch increasing the size to the block list,
>> with a much longer explanation. Also, you can't simply increase the size
>> of the largest pool; we'll probably need more pools instead, or maybe
>> even implement a chaining scheme similar to what we do for struct
>> scatterlist.
>
> The size of the required pool depends on the architecture: different
> architectures have different (huge page size)/(base page size) ratios.
>
> Would it be okay if I add one more pool with size equal to HPAGE_PMD_NR,
> if it's bigger than BIO_MAX_PAGES and huge pages are enabled?

Why wouldn't you have all the pool sizes in between? Definitely 1MB has been too small already for high-bandwidth IO. I wouldn't mind BIOs up to 4MB or larger, since most high-end RAID hardware does best with 4MB IOs.

Cheers, Andreas
On Wed, Oct 26, 2016 at 12:13 PM, Andreas Dilger <adilger@dilger.ca> wrote:
> On Oct 25, 2016, at 6:54 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
>>
>> On Tue, Oct 25, 2016 at 12:21:22AM -0700, Christoph Hellwig wrote:
>>> On Tue, Oct 25, 2016 at 03:13:17AM +0300, Kirill A. Shutemov wrote:
>>>> We are going to do IO a huge page at a time. So we need BIO_MAX_PAGES to be
>>>> at least HPAGE_PMD_NR. For x86-64, it's 512 pages.
>>>
>>> NAK. The maximum bio size should not depend on an obscure vm config;
>>> please send a standalone patch increasing the size to the block list,
>>> with a much longer explanation. Also, you can't simply increase the size
>>> of the largest pool; we'll probably need more pools instead, or maybe
>>> even implement a chaining scheme similar to what we do for struct
>>> scatterlist.
>>
>> The size of the required pool depends on the architecture: different
>> architectures have different (huge page size)/(base page size) ratios.
>>
>> Would it be okay if I add one more pool with size equal to HPAGE_PMD_NR,
>> if it's bigger than BIO_MAX_PAGES and huge pages are enabled?
>
> Why wouldn't you have all the pool sizes in between? Definitely 1MB has
> been too small already for high-bandwidth IO. I wouldn't mind BIOs up to
> 4MB or larger, since most high-end RAID hardware does best with 4MB IOs.

I am preparing the multipage bvec support[1], and once it is ready the default 256 bvecs should be enough for normal cases. I will post the patches next month for review.

[1] https://github.com/ming1/linux/tree/mp-bvec-0.1-v4.9

Thanks, Ming
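For context on why multipage bvecs would make the extra pool unnecessary: today each bio_vec covers at most one base page, so a single 2 MiB huge page needs 512 slots while BIO_MAX_PAGES is only 256; once one bvec may describe any physically contiguous range, the same huge page fits in a single entry. A rough userspace sketch of that bookkeeping follows; the struct mirrors the kernel's bio_vec fields, everything else is illustrative.

```c
#include <stdio.h>

#define PAGE_SIZE    4096u
#define HPAGE_PMD_NR 512u   /* x86-64: 2 MiB / 4 KiB */

struct bio_vec {
	void *bv_page;          /* struct page * in the kernel */
	unsigned int bv_len;    /* bytes covered by this entry */
	unsigned int bv_offset; /* offset into the first page */
};

int main(void)
{
	char dummy_page;  /* stand-in for a struct page */

	/* Today: one bvec per base page, 512 entries for one huge page. */
	unsigned int singlepage_entries = HPAGE_PMD_NR;

	/* Multipage bvec: bv_len may exceed PAGE_SIZE, so one entry suffices. */
	struct bio_vec mp = {
		.bv_page = &dummy_page,
		.bv_len = HPAGE_PMD_NR * PAGE_SIZE,  /* 2 MiB in one entry */
		.bv_offset = 0,
	};

	printf("single-page bvecs needed: %u\n", singlepage_entries);
	printf("multipage bvecs needed:   1 (bv_len = %u bytes)\n", mp.bv_len);
	return 0;
}
```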
On Tue, Oct 25, 2016 at 03:54:31PM +0300, Kirill A. Shutemov wrote:
> The size of the required pool depends on the architecture: different
> architectures have different (huge page size)/(base page size) ratios.

Please explain first why they are required and not just nice to have.
On Tue, Oct 25, 2016 at 10:13:13PM -0600, Andreas Dilger wrote:
> Why wouldn't you have all the pool sizes in between? Definitely 1MB has
> been too small already for high-bandwidth IO. I wouldn't mind BIOs up to
> 4MB or larger, since most high-end RAID hardware does best with 4MB IOs.

I/O sizes are not limited by the bio size; we have been able to support I/O larger than 1MB for a long time.
On Wed, Oct 26, 2016 at 03:30:05PM +0800, Ming Lei wrote:
> I am preparing the multipage bvec support[1], and once it is ready the
> default 256 bvecs should be enough for normal cases.

Yes, multipage bvecs are definitely the way to go to efficiently support I/O on huge pages.
diff --git a/block/bio.c b/block/bio.c
index db85c5753a76..a69062bda3e0 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -44,7 +44,8 @@
  */
 #define BV(x) { .nr_vecs = x, .name = "biovec-"__stringify(x) }
 static struct biovec_slab bvec_slabs[BVEC_POOL_NR] __read_mostly = {
-	BV(1), BV(4), BV(16), BV(64), BV(128), BV(BIO_MAX_PAGES),
+	BV(1), BV(4), BV(16), BV(64), BV(128),
+	{ .nr_vecs = BIO_MAX_PAGES, .name ="biovec-max_pages" },
 };
 #undef BV
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 97cb48f03dc7..19d0fae9cdd0 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -38,7 +38,11 @@
 #define BIO_BUG_ON
 #endif
 
+#ifdef CONFIG_TRANSPARENT_HUGE_PAGECACHE
+#define BIO_MAX_PAGES (HPAGE_PMD_NR > 256 ? HPAGE_PMD_NR : 256)
+#else
 #define BIO_MAX_PAGES 256
+#endif
 
 #define bio_prio(bio) (bio)->bi_ioprio
 #define bio_set_prio(bio, prio) ((bio)->bi_ioprio = prio)
We are going to do IO a huge page at a time, so we need BIO_MAX_PAGES to be at least HPAGE_PMD_NR. For x86-64, that's 512 pages.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 block/bio.c         | 3 ++-
 include/linux/bio.h | 4 ++++
 2 files changed, 6 insertions(+), 1 deletion(-)
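To see what the include/linux/bio.h hunk in the patch evaluates to, here is a tiny userspace replica of the macro, with HPAGE_PMD_NR hard-coded to the x86-64 value (in the kernel it comes from huge_mm.h rather than being defined by hand):

```c
#include <stdio.h>

#define HPAGE_PMD_NR 512  /* x86-64: 2 MiB huge page / 4 KiB base page */

/* Same shape as the patched definition in include/linux/bio.h. */
#ifdef CONFIG_TRANSPARENT_HUGE_PAGECACHE
#define BIO_MAX_PAGES (HPAGE_PMD_NR > 256 ? HPAGE_PMD_NR : 256)
#else
#define BIO_MAX_PAGES 256
#endif

int main(void)
{
	/* Prints 256 as-is; compile with -DCONFIG_TRANSPARENT_HUGE_PAGECACHE
	 * to get 512, the value the patch wants on x86-64 with THP enabled. */
	printf("BIO_MAX_PAGES = %d\n", BIO_MAX_PAGES);
	return 0;
}
```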