[3/4] libceph: use kvmalloc to allocate page vector

Message ID 20180928094355.6049-3-zyan@redhat.com
State New
Series
  • [1/4] Revert "ceph: fix dentry leak in splice_dentry()"

Commit Message

Yan, Zheng Sept. 28, 2018, 9:43 a.m. UTC
A large read may require allocating a large page vector.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
---
 net/ceph/pagevec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Ilya Dryomov Sept. 28, 2018, 11:47 a.m. UTC | #1
On Fri, Sep 28, 2018 at 11:45 AM Yan, Zheng <zyan@redhat.com> wrote:
>
> large read may require allocating large page vector
>
> Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
> ---
>  net/ceph/pagevec.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/ceph/pagevec.c b/net/ceph/pagevec.c
> index d3736f5bffec..b1c47ba0c38a 100644
> --- a/net/ceph/pagevec.c
> +++ b/net/ceph/pagevec.c
> @@ -62,7 +62,7 @@ void ceph_release_page_vector(struct page **pages, int num_pages)
>
>         for (i = 0; i < num_pages; i++)
>                 __free_pages(pages[i], 0);
> -       kfree(pages);
> +       kvfree(pages);
>  }
>  EXPORT_SYMBOL(ceph_release_page_vector);
>
> @@ -74,7 +74,7 @@ struct page **ceph_alloc_page_vector(int num_pages, gfp_t flags)
>         struct page **pages;
>         int i;
>
> -       pages = kmalloc_array(num_pages, sizeof(*pages), flags);
> +       pages = kvmalloc_array(num_pages, sizeof(*pages), flags);
>         if (!pages)
>                 return ERR_PTR(-ENOMEM);
>         for (i = 0; i < num_pages; i++) {

Have you considered rewriting ceph_sync_read()?  In the end, it does
a synchronous OSD request per object, it's not atomic in any way, so in
theory it shouldn't need more than object_size worth of pages.

Fengguang's application is doing ~100M reads, and the one he reported
was likely a ~2G read.  The page pointer array itself is only part of
problem, allocating 2G worth of pages when we don't actually need all
of them at once is wrong.

Thanks,

                Ilya
Yan, Zheng Sept. 28, 2018, 12:58 p.m. UTC | #2
On Fri, Sep 28, 2018 at 7:50 PM Ilya Dryomov <idryomov@gmail.com> wrote:
>
> On Fri, Sep 28, 2018 at 11:45 AM Yan, Zheng <zyan@redhat.com> wrote:
> >
> > large read may require allocating large page vector
> >
> > Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
> > ---
> >  net/ceph/pagevec.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/ceph/pagevec.c b/net/ceph/pagevec.c
> > index d3736f5bffec..b1c47ba0c38a 100644
> > --- a/net/ceph/pagevec.c
> > +++ b/net/ceph/pagevec.c
> > @@ -62,7 +62,7 @@ void ceph_release_page_vector(struct page **pages, int num_pages)
> >
> >         for (i = 0; i < num_pages; i++)
> >                 __free_pages(pages[i], 0);
> > -       kfree(pages);
> > +       kvfree(pages);
> >  }
> >  EXPORT_SYMBOL(ceph_release_page_vector);
> >
> > @@ -74,7 +74,7 @@ struct page **ceph_alloc_page_vector(int num_pages, gfp_t flags)
> >         struct page **pages;
> >         int i;
> >
> > -       pages = kmalloc_array(num_pages, sizeof(*pages), flags);
> > +       pages = kvmalloc_array(num_pages, sizeof(*pages), flags);
> >         if (!pages)
> >                 return ERR_PTR(-ENOMEM);
> >         for (i = 0; i < num_pages; i++) {
>
> Have you considered rewriting ceph_sync_read()?  In the end, it does
> a synchronous OSD request per object, it's not atomic in any way, so in
> theory it shouldn't need more than object_size worth of pages.
>

Yes, that would be better. But I've been a bit busy recently; no idea when
I'll have time.

Regards
Yan, Zheng

> Fengguang's application is doing ~100M reads, and the one he reported
> was likely a ~2G read.  The page pointer array itself is only part of
> problem, allocating 2G worth of pages when we don't actually need all
> of them at once is wrong.
>
> Thanks,
>
>                 Ilya
Jeff Layton Oct. 11, 2018, 3:10 p.m. UTC | #3
On Fri, 2018-09-28 at 17:43 +0800, Yan, Zheng wrote:
> large read may require allocating large page vector
> 
> Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
> [...]

Reviewed-by: Jeff Layton <jlayton@redhat.com>

Patch

diff --git a/net/ceph/pagevec.c b/net/ceph/pagevec.c
index d3736f5bffec..b1c47ba0c38a 100644
--- a/net/ceph/pagevec.c
+++ b/net/ceph/pagevec.c
@@ -62,7 +62,7 @@ void ceph_release_page_vector(struct page **pages, int num_pages)
 
 	for (i = 0; i < num_pages; i++)
 		__free_pages(pages[i], 0);
-	kfree(pages);
+	kvfree(pages);
 }
 EXPORT_SYMBOL(ceph_release_page_vector);
 
@@ -74,7 +74,7 @@ struct page **ceph_alloc_page_vector(int num_pages, gfp_t flags)
 	struct page **pages;
 	int i;
 
-	pages = kmalloc_array(num_pages, sizeof(*pages), flags);
+	pages = kvmalloc_array(num_pages, sizeof(*pages), flags);
 	if (!pages)
 		return ERR_PTR(-ENOMEM);
 	for (i = 0; i < num_pages; i++) {