diff mbox series

[2/2] blktrace: limit allowed total trace buffer size

Message ID 20210323081440.81343-3-ming.lei@redhat.com (mailing list archive)
State New, archived
Series blktrace: fix trace buffer leak and limit trace buffer size

Commit Message

Ming Lei March 23, 2021, 8:14 a.m. UTC
On some ARCHs, such as aarch64, the page size may be 64K while there are
many CPU cores. relay_open() needs to allocate pages for blktrace on each
CPU, so blktrace can easily take too many pages. For example, on one ARM64
server with 224 CPU cores and 16G RAM, blktrace ended up allocating 7GB in
the case of 'blktrace -b 8192', which is used by the device-mapper test
suite[1]. This can easily cause OOM.

Fix the issue by limiting the max allowed pages to 1/8 of totalram_pages().

[1] https://github.com/jthornber/device-mapper-test-suite.git

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 kernel/trace/blktrace.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

Comments

Su Yue March 30, 2021, 2:57 a.m. UTC | #1
On Tue 23 Mar 2021 at 16:14, Ming Lei <ming.lei@redhat.com> wrote:

> On some ARCHs, such as aarch64, page size may be 64K, meantime there may
> be lots of CPU cores. relay_open() needs to allocate pages on each CPU
> blktrace, so easily too many pages are taken by blktrace. For example,
> on one ARM64 server: 224 CPU cores, 16G RAM, blktrace finally got
> allocated 7GB in case of 'blktrace -b 8192' which is used by device-mapper
> test suite[1]. This way could cause OOM easily.
>
> Fix the issue by limiting max allowed pages to be 1/8 of totalram_pages().
>
> [1] https://github.com/jthornber/device-mapper-test-suite.git
>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  kernel/trace/blktrace.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
>
> diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
> index c221e4c3f625..8403ff19d533 100644
> --- a/kernel/trace/blktrace.c
> +++ b/kernel/trace/blktrace.c
> @@ -466,6 +466,35 @@ static void blk_trace_setup_lba(struct blk_trace *bt,
>  	}
>  }
>
> +/* limit total allocated buffer size is <= 1/8 of total pages */
> +static void validate_and_adjust_buf(struct blk_user_trace_setup *buts)
> +{
> +	unsigned buf_size = buts->buf_size;
> +	unsigned buf_nr = buts->buf_nr;
> +	unsigned long max_allowed_pages = totalram_pages() >> 3;
> +	unsigned long req_pages = PAGE_ALIGN(buf_size * buf_nr) >> PAGE_SHIFT;
> +
> +	if (req_pages * num_online_cpus() <= max_allowed_pages)
> +		return;
> +
> +	req_pages = DIV_ROUND_UP(max_allowed_pages, num_online_cpus());
> +
> +	if (req_pages == 0) {
> +		buf_size = PAGE_SIZE;
> +		buf_nr = 1;
> +	} else {
> +		buf_size = req_pages << PAGE_SHIFT / buf_nr;
>
Should it be:
buf_size = (req_pages << PAGE_SHIFT) / buf_nr;
?
The priority of '<<' is lower than '/', right? :)

--
Su
> +		if (buf_size < PAGE_SIZE)
> +			buf_size = PAGE_SIZE;
> +		buf_nr = req_pages << PAGE_SHIFT / buf_size;
> +		if (buf_nr == 0)
> +			buf_nr = 1;
> +	}
> +
> +	buts->buf_size = min_t(unsigned, buf_size, buts->buf_size);
> +	buts->buf_nr = min_t(unsigned, buf_nr, buts->buf_nr);
> +}
> +
>  /*
>   * Setup everything required to start tracing
>   */
> @@ -482,6 +511,9 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
>  	if (!buts->buf_size || !buts->buf_nr)
>  		return -EINVAL;
>
> +	/* make sure not allocate too much for userspace */
> +	validate_and_adjust_buf(buts);
> +
>  	strncpy(buts->name, name, BLKTRACE_BDEV_SIZE);
>  	buts->name[BLKTRACE_BDEV_SIZE - 1] = '\0';
Ming Lei March 30, 2021, 3:55 a.m. UTC | #2
On Tue, Mar 30, 2021 at 10:57:04AM +0800, Su Yue wrote:
> 
> On Tue 23 Mar 2021 at 16:14, Ming Lei <ming.lei@redhat.com> wrote:
> 
> > On some ARCHs, such as aarch64, page size may be 64K, meantime there may
> > be lots of CPU cores. relay_open() needs to allocate pages on each CPU
> > blktrace, so easily too many pages are taken by blktrace. For example,
> > on one ARM64 server: 224 CPU cores, 16G RAM, blktrace finally got
> > allocated 7GB in case of 'blktrace -b 8192' which is used by
> > device-mapper
> > test suite[1]. This way could cause OOM easily.
> > 
> > Fix the issue by limiting max allowed pages to be 1/8 of
> > totalram_pages().
> > 
> > [1] https://github.com/jthornber/device-mapper-test-suite.git
> > 
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  kernel/trace/blktrace.c | 32 ++++++++++++++++++++++++++++++++
> >  1 file changed, 32 insertions(+)
> > 
> > diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
> > index c221e4c3f625..8403ff19d533 100644
> > --- a/kernel/trace/blktrace.c
> > +++ b/kernel/trace/blktrace.c
> > @@ -466,6 +466,35 @@ static void blk_trace_setup_lba(struct blk_trace
> > *bt,
> >  	}
> >  }
> > 
> > +/* limit total allocated buffer size is <= 1/8 of total pages */
> > +static void validate_and_adjust_buf(struct blk_user_trace_setup *buts)
> > +{
> > +	unsigned buf_size = buts->buf_size;
> > +	unsigned buf_nr = buts->buf_nr;
> > +	unsigned long max_allowed_pages = totalram_pages() >> 3;
> > +	unsigned long req_pages = PAGE_ALIGN(buf_size * buf_nr) >> PAGE_SHIFT;
> > +
> > +	if (req_pages * num_online_cpus() <= max_allowed_pages)
> > +		return;
> > +
> > +	req_pages = DIV_ROUND_UP(max_allowed_pages, num_online_cpus());
> > +
> > +	if (req_pages == 0) {
> > +		buf_size = PAGE_SIZE;
> > +		buf_nr = 1;
> > +	} else {
> > +		buf_size = req_pages << PAGE_SHIFT / buf_nr;
> > 
> Should it be:
> buf_size = (req_pages << PAGE_SHIFT) / buf_nr;
> ?
> The priority of '<<' is lower than '/', right? :)

Good catch, thanks!
Christoph Hellwig March 30, 2021, 4:57 p.m. UTC | #3
On Tue, Mar 23, 2021 at 04:14:40PM +0800, Ming Lei wrote:
> On some ARCHs, such as aarch64, page size may be 64K, meantime there may

Which we call arm64..

> be lots of CPU cores. relay_open() needs to allocate pages on each CPU
> blktrace, so easily too many pages are taken by blktrace. For example,
> on one ARM64 server: 224 CPU cores, 16G RAM, blktrace finally got
> allocated 7GB in case of 'blktrace -b 8192' which is used by device-mapper
> test suite[1]. This way could cause OOM easily.
> 
> Fix the issue by limiting max allowed pages to be 1/8 of totalram_pages().

Doesn't this break the blktrace ABI by using different buffer size
and numbers than the user asked for?  I think we can enforce an
upper limit and error out, but silently adjusting seems wrong.

Wouldn't it make more sense to fix userspace to not request so many
and so big buffers instead?

Patch

diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index c221e4c3f625..8403ff19d533 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -466,6 +466,35 @@  static void blk_trace_setup_lba(struct blk_trace *bt,
 	}
 }
 
+/* limit total allocated buffer size is <= 1/8 of total pages */
+static void validate_and_adjust_buf(struct blk_user_trace_setup *buts)
+{
+	unsigned buf_size = buts->buf_size;
+	unsigned buf_nr = buts->buf_nr;
+	unsigned long max_allowed_pages = totalram_pages() >> 3;
+	unsigned long req_pages = PAGE_ALIGN(buf_size * buf_nr) >> PAGE_SHIFT;
+
+	if (req_pages * num_online_cpus() <= max_allowed_pages)
+		return;
+
+	req_pages = DIV_ROUND_UP(max_allowed_pages, num_online_cpus());
+
+	if (req_pages == 0) {
+		buf_size = PAGE_SIZE;
+		buf_nr = 1;
+	} else {
+		buf_size = req_pages << PAGE_SHIFT / buf_nr;
+		if (buf_size < PAGE_SIZE)
+			buf_size = PAGE_SIZE;
+		buf_nr = req_pages << PAGE_SHIFT / buf_size;
+		if (buf_nr == 0)
+			buf_nr = 1;
+	}
+
+	buts->buf_size = min_t(unsigned, buf_size, buts->buf_size);
+	buts->buf_nr = min_t(unsigned, buf_nr, buts->buf_nr);
+}
+
 /*
  * Setup everything required to start tracing
  */
@@ -482,6 +511,9 @@  static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
 	if (!buts->buf_size || !buts->buf_nr)
 		return -EINVAL;
 
+	/* make sure not allocate too much for userspace */
+	validate_and_adjust_buf(buts);
+
 	strncpy(buts->name, name, BLKTRACE_BDEV_SIZE);
 	buts->name[BLKTRACE_BDEV_SIZE - 1] = '\0';