[1/2] block: shutdown blktrace in case of fatal signal pending

Message ID 20210323081440.81343-2-ming.lei@redhat.com (mailing list archive)
State New, archived
Series blktrace: fix trace buffer leak and limit trace buffer size

Commit Message

Ming Lei March 23, 2021, 8:14 a.m. UTC
blktrace may allocate lots of memory, if the process is terminated
by user or OOM, we need to provide one chance to remove the trace
buffer, otherwise memory leak may be caused.

Fix the issue by shutdown blktrace in case of task exiting in
blkdev_close().

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 fs/block_dev.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Christoph Hellwig March 30, 2021, 4:53 p.m. UTC | #1
On Tue, Mar 23, 2021 at 04:14:39PM +0800, Ming Lei wrote:
> blktrace may allocate lots of memory, if the process is terminated
> by user or OOM, we need to provide one chance to remove the trace
> buffer, otherwise memory leak may be caused.
> 
> Fix the issue by shutdown blktrace in case of task exiting in
> blkdev_close().
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

This just seems weird.  blktrace has no relationship to open
block device instances.
Ming Lei March 31, 2021, 12:16 a.m. UTC | #2
On Tue, Mar 30, 2021 at 06:53:30PM +0200, Christoph Hellwig wrote:
> On Tue, Mar 23, 2021 at 04:14:39PM +0800, Ming Lei wrote:
> > blktrace may allocate lots of memory, if the process is terminated
> > by user or OOM, we need to provide one chance to remove the trace
> > buffer, otherwise memory leak may be caused.
> > 
> > Fix the issue by shutdown blktrace in case of task exiting in
> > blkdev_close().
> > 
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> 
> This just seems weird.  blktrace has no relationship to open
> block device instances.

blktrace still needs to open one blkdev, then send its own ioctl
commands to block layer. In case of OOM, the allocated memory in
these ioctl commands won't be released.

Or any other suggestion?
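
For readers following the thread, the sequence described above can be sketched from user space roughly as follows. This is a minimal sketch, not the blktrace tool's actual code: the function name trace_briefly(), the buffer sizes, and the one-second trace window are assumptions chosen for illustration, and the step that reads trace data is omitted. The ioctl numbers and struct blk_user_trace_setup come from the kernel's exported headers.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>            /* BLKTRACESETUP, BLKTRACESTART, ... */
#include <linux/blktrace_api.h>  /* struct blk_user_trace_setup */

/*
 * Hypothetical helper: set up, run, and tear down a trace on one
 * block device. Returns 0 on success, -1 on failure.
 */
int trace_briefly(const char *dev_path)
{
	struct blk_user_trace_setup buts;
	int fd = open(dev_path, O_RDONLY | O_NONBLOCK);

	if (fd < 0)
		return -1;

	memset(&buts, 0, sizeof(buts));
	buts.buf_size = 512 * 1024;	/* illustrative sub-buffer size */
	buts.buf_nr = 4;		/* illustrative sub-buffer count */
	buts.act_mask = 0xffff;		/* trace every action category */

	/* The kernel allocates the trace buffers here. */
	if (ioctl(fd, BLKTRACESETUP, &buts) < 0) {
		close(fd);
		return -1;
	}

	if (ioctl(fd, BLKTRACESTART) == 0) {
		sleep(1);		/* tracing window */
		ioctl(fd, BLKTRACESTOP);
	}

	/*
	 * If the process is killed before this call, nothing else
	 * frees the buffers: closing the fd does not tear them down.
	 */
	ioctl(fd, BLKTRACETEARDOWN);
	close(fd);
	return 0;
}
```

The point of the sketch is the gap between BLKTRACESETUP and BLKTRACETEARDOWN: a task killed inside that window never issues the teardown.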
Christoph Hellwig April 2, 2021, 5:27 p.m. UTC | #3
On Wed, Mar 31, 2021 at 08:16:50AM +0800, Ming Lei wrote:
> On Tue, Mar 30, 2021 at 06:53:30PM +0200, Christoph Hellwig wrote:
> > On Tue, Mar 23, 2021 at 04:14:39PM +0800, Ming Lei wrote:
> > > blktrace may allocate lots of memory, if the process is terminated
> > > by user or OOM, we need to provide one chance to remove the trace
> > > buffer, otherwise memory leak may be caused.
> > > 
> > > Fix the issue by shutdown blktrace in case of task exiting in
> > > blkdev_close().
> > > 
> > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > 
> > This just seems weird.  blktrace has no relationship to open
> > block device instances.
> 
> blktrace still needs to open one blkdev, then send its own ioctl
> commands to block layer. In case of OOM, the allocated memory in
> these ioctl commands won't be released.
> 
> Or any other suggestion?

Not much we can do there I think.  If we want to autorelease memory
it needs to be an API that ties the memory allocation to an FD.
Ming Lei April 3, 2021, 8:10 a.m. UTC | #4
On Fri, Apr 02, 2021 at 07:27:30PM +0200, Christoph Hellwig wrote:
> On Wed, Mar 31, 2021 at 08:16:50AM +0800, Ming Lei wrote:
> > On Tue, Mar 30, 2021 at 06:53:30PM +0200, Christoph Hellwig wrote:
> > > On Tue, Mar 23, 2021 at 04:14:39PM +0800, Ming Lei wrote:
> > > > blktrace may allocate lots of memory, if the process is terminated
> > > > by user or OOM, we need to provide one chance to remove the trace
> > > > buffer, otherwise memory leak may be caused.
> > > > 
> > > > Fix the issue by shutdown blktrace in case of task exiting in
> > > > blkdev_close().
> > > > 
> > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > 
> > > This just seems weird.  blktrace has no relationship to open
> > > block device instances.
> > 
> > blktrace still needs to open one blkdev, then send its own ioctl
> > commands to block layer. In case of OOM, the allocated memory in
> > these ioctl commands won't be released.
> > 
> > Or any other suggestion?
> 
> Not much we can do there I think.  If we want to autorelease memory
> it needs to be an API that ties the memory allocation to an FD.

We still may shutdown blktrace if current is the last opener, otherwise
new blktrace can't be started and memory should be leaked forever, and
what do you think of the revised version?

From de33ec85ee1ce2865aa04f2639e480ea4db4eebf Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei@redhat.com>
Date: Tue, 23 Mar 2021 10:32:23 +0800
Subject: [PATCH] block: shutdown blktrace in case of task exiting

blktrace may allocate lots of memory, if the process is terminated
by user or OOM, we need to provide one chance to remove the trace
buffer, otherwise memory leak may be caused. Also new blktrace
instance can't be started too.

Fix the issue by shutdown blktrace in case of task exiting in
blkdev_close() when it is the last opener.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 fs/block_dev.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 92ed7d5df677..8fa59cecce72 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -34,6 +34,7 @@
 #include <linux/part_stat.h>
 #include <linux/uaccess.h>
 #include <linux/suspend.h>
+#include <linux/blktrace_api.h>
 #include "internal.h"
 
 struct bdev_inode {
@@ -1646,6 +1647,11 @@ EXPORT_SYMBOL(blkdev_put);
 static int blkdev_close(struct inode * inode, struct file * filp)
 {
 	struct block_device *bdev = I_BDEV(bdev_file_inode(filp));
+
+	/* shutdown blktrace in case of exiting which may be from OOM */
+	if ((current->flags & PF_EXITING) && (bdev->bd_openers == 1))
+		blk_trace_shutdown(bdev->bd_disk->queue);
+
 	blkdev_put(bdev, filp->f_mode);
 	return 0;
 }
Ming Lei April 3, 2021, 9:04 a.m. UTC | #5
On Sat, Apr 03, 2021 at 04:10:16PM +0800, Ming Lei wrote:
> On Fri, Apr 02, 2021 at 07:27:30PM +0200, Christoph Hellwig wrote:
> > On Wed, Mar 31, 2021 at 08:16:50AM +0800, Ming Lei wrote:
> > > On Tue, Mar 30, 2021 at 06:53:30PM +0200, Christoph Hellwig wrote:
> > > > On Tue, Mar 23, 2021 at 04:14:39PM +0800, Ming Lei wrote:
> > > > > blktrace may allocate lots of memory, if the process is terminated
> > > > > by user or OOM, we need to provide one chance to remove the trace
> > > > > buffer, otherwise memory leak may be caused.
> > > > > 
> > > > > Fix the issue by shutdown blktrace in case of task exiting in
> > > > > blkdev_close().
> > > > > 
> > > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > 
> > > > This just seems weird.  blktrace has no relationship to open
> > > > block device instances.
> > > 
> > > blktrace still needs to open one blkdev, then send its own ioctl
> > > commands to block layer. In case of OOM, the allocated memory in
> > > these ioctl commands won't be released.
> > > 
> > > Or any other suggestion?
> > 
> > Not much we can do there I think.  If we want to autorelease memory
> > it needs to be an API that ties the memory allocation to an FD.
> 
> We still may shutdown blktrace if current is the last opener, otherwise
> new blktrace can't be started and memory should be leaked forever, and
> what do you think of the revised version?

This approach doesn't seem good enough. A better one is to use
file->private_data for this purpose, since the blkdev fs doesn't use
file->private_data; then we can shut down blktrace just for the
blktrace FD:

From 191dff30abfd48c38a78dec78e011a39a3b606ca Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei@redhat.com>
Date: Tue, 23 Mar 2021 10:32:23 +0800
Subject: [PATCH] block: shutdown blktrace in case of task exiting

blktrace may allocate lots of memory, if the process is terminated
by user or OOM, we need to provide one chance to remove the trace
buffer, otherwise memory leak may be caused. Also new blktrace
instance can't be started too.

Fix the issue by shutting down blktrace in blkdev_close() if blktrace
was set up on this FD.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/ioctl.c                |  2 ++
 fs/block_dev.c               | 12 ++++++++++++
 include/linux/blktrace_api.h | 11 +++++++++++
 3 files changed, 25 insertions(+)

diff --git a/block/ioctl.c b/block/ioctl.c
index ff241e663c01..7dad4a546db3 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -611,6 +611,8 @@ long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 	else
 		mode &= ~FMODE_NDELAY;
 
+	blkdev_mark_blktrace(file, cmd);
+
 	switch (cmd) {
 	/* These need separate implementations for the data structure */
 	case HDIO_GETGEO:
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 92ed7d5df677..aaa7d7d1e5a4 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -34,6 +34,7 @@
 #include <linux/part_stat.h>
 #include <linux/uaccess.h>
 #include <linux/suspend.h>
+#include <linux/blktrace_api.h>
 #include "internal.h"
 
 struct bdev_inode {
@@ -1646,6 +1647,15 @@ EXPORT_SYMBOL(blkdev_put);
 static int blkdev_close(struct inode * inode, struct file * filp)
 {
 	struct block_device *bdev = I_BDEV(bdev_file_inode(filp));
+
+	/*
+	 * The task running blktrace is supposed to shutdown blktrace
+	 * by ioctl. If they forget to shutdown or can't do it because
+	 * of OOM or sort of situation, we shutdown for them.
+	 */
+	if (blkdev_has_run_blktrace(filp))
+		blk_trace_shutdown(bdev->bd_disk->queue);
+
 	blkdev_put(bdev, filp->f_mode);
 	return 0;
 }
@@ -1664,6 +1674,8 @@ static long block_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 	else
 		mode &= ~FMODE_NDELAY;
 
+	blkdev_mark_blktrace(file, cmd);
+
 	return blkdev_ioctl(bdev, mode, cmd, arg);
 }
 
diff --git a/include/linux/blktrace_api.h b/include/linux/blktrace_api.h
index a083e15df608..754058c1965c 100644
--- a/include/linux/blktrace_api.h
+++ b/include/linux/blktrace_api.h
@@ -135,4 +135,15 @@ static inline unsigned int blk_rq_trace_nr_sectors(struct request *rq)
 	return blk_rq_is_passthrough(rq) ? 0 : blk_rq_sectors(rq);
 }
 
+static inline void blkdev_mark_blktrace(struct file *file, unsigned int cmd)
+{
+	if (cmd == BLKTRACESETUP)
+		file->private_data = (void *)-1;
+}
+
+static inline bool blkdev_has_run_blktrace(struct file *file)
+{
+	return file->private_data == (void *)-1;
+}
+
 #endif
Christoph Hellwig April 6, 2021, 6:30 a.m. UTC | #6
On Sat, Apr 03, 2021 at 04:10:16PM +0800, Ming Lei wrote:
> We still may shutdown blktrace if current is the last opener, otherwise
> new blktrace can't be started and memory should be leaked forever, and
> what do you think of the revised version?

I don't think this works.  For one there might be users of the blktrace
ioctl that explicitly rely on this not happening, as different processes
might start the tracing vs actually consume the trace data.  Second this
might not actually work as another process could be the last opener.

If you want to fix this for the blktrace tool (common) case I think we
need a new ioctl that explicitly ties the buffer lifetime to the fd.
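
The difference between the existing behaviour and an fd-tied lifetime can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every name in it (model_queue, model_file, model_setup_legacy(), model_setup_fd_tied(), model_release()) is hypothetical, no such ioctl exists in the patches above, and the model ignores locking and reference counting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a request queue: at most one trace buffer. */
struct model_queue {
	void *trace_buf;	/* NULL while no trace is set up */
};

/* Hypothetical stand-in for an open file description on the device. */
struct model_file {
	struct model_queue *q;
	bool owns_trace;	/* true only after an fd-tied setup */
};

/* Allocate the buffer; fail if one already exists (think -EBUSY). */
static int model_alloc(struct model_queue *q, size_t size)
{
	if (q->trace_buf)
		return -1;
	q->trace_buf = malloc(size);
	return q->trace_buf ? 0 : -1;
}

/* Explicit teardown, the step a killed tracer never reaches. */
void model_teardown(struct model_queue *q)
{
	free(q->trace_buf);
	q->trace_buf = NULL;
}

/* Existing-style setup: the buffer belongs to the queue, not the fd. */
int model_setup_legacy(struct model_file *f, size_t size)
{
	return model_alloc(f->q, size);
}

/* Proposed-style setup: record that this fd owns the buffer. */
int model_setup_fd_tied(struct model_file *f, size_t size)
{
	if (model_alloc(f->q, size) < 0)
		return -1;
	f->owns_trace = true;
	return 0;
}

/* Runs on the final close of the fd, including close at process exit. */
void model_release(struct model_file *f)
{
	assert(f && f->q);
	if (f->owns_trace) {
		model_teardown(f->q);
		f->owns_trace = false;
	}
}
```

In this model a legacy setup survives model_release(), so the buffer stays allocated and a later setup fails until someone calls model_teardown(). An fd-tied setup is undone by model_release() itself, while users that rely on the trace outliving the opener can keep using the legacy path.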
Patch

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 92ed7d5df677..1370eb731cea 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -34,6 +34,7 @@ 
 #include <linux/part_stat.h>
 #include <linux/uaccess.h>
 #include <linux/suspend.h>
+#include <linux/blktrace_api.h>
 #include "internal.h"
 
 struct bdev_inode {
@@ -1646,6 +1647,11 @@  EXPORT_SYMBOL(blkdev_put);
 static int blkdev_close(struct inode * inode, struct file * filp)
 {
 	struct block_device *bdev = I_BDEV(bdev_file_inode(filp));
+
+	/* shutdown blktrace in case of exiting which may be from OOM */
+	if (current->flags & PF_EXITING)
+		blk_trace_shutdown(bdev->bd_disk->queue);
+
 	blkdev_put(bdev, filp->f_mode);
 	return 0;
 }