diff mbox series

[dhowells/mount-context] fs: don't call fs_context->free() from fsmount()

Message ID 20180731072928.2413-1-avagin@openvz.org (mailing list archive)
State New, archived
Headers show
Series [dhowells/mount-context] fs: don't call fs_context->free() from fsmount() | expand

Commit Message

Andrey Vagin July 31, 2018, 7:29 a.m. UTC
From: Andrei Vagin <avagin@gmail.com>

fs_context->free() will be called from fscontext_release().

Without this patch, a kernel bug is triggered:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
PGD 800000012b0a3067 P4D 800000012b0a3067 PUD 131f1a067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 612 Comm: test Not tainted 4.18.0-rc1-00037-g60023faf445f #12
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
RIP: 0010:proc_fs_context_free+0xd/0x30
Code: 78 dd ff b8 ea ff ff ff eb af e8 fe dd d5 ff 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 48 8b 9f 90 00 00 00 <48> 8b 3b 48 85 ff 74 05 e8 16 24 e3 ff 48 89 df 5b e9 fd f5 f3 ff
RSP: 0018:ffffa9b5c0b53e60 EFLAGS: 00010282
RAX: ffffffffaf332e20 RBX: 0000000000000000 RCX: 0000000000000005
RDX: ffff9bc77abbe1bc RSI: 0000000000000001 RDI: ffff9bc774209908
RBP: ffff9bc77abbe0c8 R08: 0000000000000000 R09: 0000000000000000
R10: ffff9bc77ac08d20 R11: ffffffffb1232378 R12: ffff9bc779af0f28
R13: 0000000000020003 R14: ffff9bc77aba9020 R15: ffff9bc773b6ab48
FS:  00007f725675a4c0(0000) GS:ffff9bc77fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000012b0c6003 CR4: 00000000003606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 put_fs_context+0x4f/0x150
 fscontext_release+0x21/0x30
 __fput+0xc0/0x230
 task_work_run+0xa1/0xd0
 exit_to_usermode_loop+0xbb/0xc0
 do_syscall_64+0x1e5/0x210
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f725627ff24
Code: eb 89 e8 0f f6 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 8b 05 2a e9 2c 00 48 63 ff 85 c0 75 13 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 44 f3 c3 66 90 48 83 ec 18 48 89 7c 24 08 e8

The following code can be used to reproduce the bug.

int main()
{
	int fsfd,  dfd;

	fsfd = fsopen("proc", 0);
	if (fsfd < 0)
		return 1;

	if (fsconfig(fsfd, fsconfig_cmd_create, NULL, NULL, 0) < 0)
		return 1;

	dfd = fsmount(fsfd, 0, 0);
	if (dfd < 0)
		return 1;

	if (close(dfd) < 0)
		return 1;
	if (close(fsfd) < 0)
		return 1;

	return 0;
}

Signed-off-by: Andrei Vagin <avagin@gmail.com>
---
 fs/namespace.c | 3 ---
 1 file changed, 3 deletions(-)

Comments

David Howells July 31, 2018, 8:52 a.m. UTC | #1
Andrei Vagin <avagin@openvz.org> wrote:

> @@ -3435,9 +3435,6 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags, unsigned int, ms_flags
>  	 * do any memory allocation or anything like that at this point as we
>  	 * don't want to have to handle any errors incurred.
>  	 */
> -	if (fc->ops && fc->ops->free)
> -		fc->ops->free(fc);
> -	fc->fs_private = NULL;
>  	fc->s_fs_info = NULL;
>  	fc->sb_flags = 0;
>  	fc->sloppy = false;

This isn't the right fix.  The context needs to be reset at this point so that
it's prepared to be reinitialised into in the same state as one generated by
fspick().

I can do this two ways: (1) stick a flag in the context that says if ->free()
needs calling, (2) make all the ->free() routines aware that they may see the
reset state.  I think (1) is less error prone.

David
Andrey Vagin July 31, 2018, 5:34 p.m. UTC | #2
On Tue, Jul 31, 2018 at 09:52:36AM +0100, David Howells wrote:
> Andrei Vagin <avagin@openvz.org> wrote:
> 
> > @@ -3435,9 +3435,6 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags, unsigned int, ms_flags
> >  	 * do any memory allocation or anything like that at this point as we
> >  	 * don't want to have to handle any errors incurred.
> >  	 */
> > -	if (fc->ops && fc->ops->free)
> > -		fc->ops->free(fc);
> > -	fc->fs_private = NULL;
> >  	fc->s_fs_info = NULL;
> >  	fc->sb_flags = 0;
> >  	fc->sloppy = false;
> 
> This isn't the right fix.  The context needs to be reset at this point so that
> it's prepared to be reinitialised into in the same state as one generated by
> fspick().

I understand this. init_fs_context() is called from fspick() and
fs_context->free() is called for contexts which have been created in
fspick().

> 
> I can do this two ways: (1) stick a flag in the context that says if ->free()
> needs calling, (2) make all the ->free() routines aware that they may see the
> reset state.  I think (1) is less error prone.

Does it mean that fc->fs_type->init_fs_context() should not be called
contexts which are created from fspick()?

> 
> David
David Howells July 31, 2018, 8:56 p.m. UTC | #3
Andrei Vagin <avagin@virtuozzo.com> wrote:

> > I can do this two ways: (1) stick a flag in the context that says if
> > ->free() needs calling, (2) make all the ->free() routines aware that they
> > may see the reset state.  I think (1) is less error prone.
> 
> Does it mean that fc->fs_type->init_fs_context() should not be called
> contexts which are created from fspick()?

No.  I've put a flag in the context that is set when ->init_fs_context() is
called and cleared when ->free() is called.  ->free() isn't called in the put
routine if the flag isn't set.

David
Andrey Vagin July 31, 2018, 10:48 p.m. UTC | #4
On Tue, Jul 31, 2018 at 09:56:45PM +0100, David Howells wrote:
> Andrei Vagin <avagin@virtuozzo.com> wrote:
> 
> > > I can do this two ways: (1) stick a flag in the context that says if
> > > ->free() needs calling, (2) make all the ->free() routines aware that they
> > > may see the reset state.  I think (1) is less error prone.
> > 
> > Does it mean that fc->fs_type->init_fs_context() should not be called
> > contexts which are created from fspick()?
> 
> No.  I've put a flag in the context that is set when ->init_fs_context() is
> called and cleared when ->free() is called.  ->free() isn't called in the put
> routine if the flag isn't set.

> /* We've done the mount bit - now move the file context into
>  * more or less the same state as if we'd done an fspick().

According to this comment, a context after fsmount() should be in
the same state as after fspick().

In fsmount(), we call ->free() which is oposite to init_fs_context(). If
we want to have "more or less the same state" and want to call
fs_context->free() in fsmount(), this means that we should not call
fc->fs_type->init_fs_context() in fspick()...  Where am I wrong?


> 
> David
David Howells Aug. 1, 2018, 12:42 a.m. UTC | #5
Andrei Vagin <avagin@virtuozzo.com> wrote:

> > > Does it mean that fc->fs_type->init_fs_context() should not be called
> > > contexts which are created from fspick()?
> > 
> > No.  I've put a flag in the context that is set when ->init_fs_context() is
> > called and cleared when ->free() is called.  ->free() isn't called in the put
> > routine if the flag isn't set.
> 
> > /* We've done the mount bit - now move the file context into
> >  * more or less the same state as if we'd done an fspick().
> 
> According to this comment, a context after fsmount() should be in
> the same state as after fspick().
> 
> In fsmount(), we call ->free() which is oposite to init_fs_context(). If
> we want to have "more or less the same state" and want to call
> fs_context->free() in fsmount(), this means that we should not call
> fc->fs_type->init_fs_context() in fspick()...  Where am I wrong?

vfs_fsconfig() calls ->init_fs_context() if fc->phase is
FS_CONTEXT_AWAITING_RECONF.  This is so that we don't actually fully reinit
the context unnecessarily if the fd is just going to be closed after fsmount()
is called.

fspick() assumes that you're definitely going to reconfigure the superblock
(presumably that's why you used it) and always reinitialises the context,
shoving fc->phase round to FS_CONTEXT_RECONF_PARAMS.

David
Andrey Vagin Aug. 1, 2018, 5:53 a.m. UTC | #6
On Wed, Aug 01, 2018 at 01:42:29AM +0100, David Howells wrote:
> Andrei Vagin <avagin@virtuozzo.com> wrote:
> 
> > > > Does it mean that fc->fs_type->init_fs_context() should not be called
> > > > contexts which are created from fspick()?
> > > 
> > > No.  I've put a flag in the context that is set when ->init_fs_context() is
> > > called and cleared when ->free() is called.  ->free() isn't called in the put
> > > routine if the flag isn't set.
> > 
> > > /* We've done the mount bit - now move the file context into
> > >  * more or less the same state as if we'd done an fspick().
> > 
> > According to this comment, a context after fsmount() should be in
> > the same state as after fspick().
> > 
> > In fsmount(), we call ->free() which is oposite to init_fs_context(). If
> > we want to have "more or less the same state" and want to call
> > fs_context->free() in fsmount(), this means that we should not call
> > fc->fs_type->init_fs_context() in fspick()...  Where am I wrong?
> 
> vfs_fsconfig() calls ->init_fs_context() if fc->phase is
> FS_CONTEXT_AWAITING_RECONF.  This is so that we don't actually fully reinit
> the context unnecessarily if the fd is just going to be closed after fsmount()
> is called.

I understood the idea. Thank you for the explanation.

> 
> fspick() assumes that you're definitely going to reconfigure the superblock
> (presumably that's why you used it) and always reinitialises the context,
> shoving fc->phase round to FS_CONTEXT_RECONF_PARAMS.
> 
> David
diff mbox series

Patch

diff --git a/fs/namespace.c b/fs/namespace.c
index 7e7b1145d15d..1899456194c3 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3435,9 +3435,6 @@  SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags, unsigned int, ms_flags
 	 * do any memory allocation or anything like that at this point as we
 	 * don't want to have to handle any errors incurred.
 	 */
-	if (fc->ops && fc->ops->free)
-		fc->ops->free(fc);
-	fc->fs_private = NULL;
 	fc->s_fs_info = NULL;
 	fc->sb_flags = 0;
 	fc->sloppy = false;