diff mbox series

[RFC] selinuxfs: saner handling of policy reloads

Message ID 20231016220835.GH800259@ZenIV (mailing list archive)
State Accepted
Delegated to: Paul Moore
Headers show
Series [RFC] selinuxfs: saner handling of policy reloads | expand

Commit Message

Al Viro Oct. 16, 2023, 10:08 p.m. UTC
[
That thing sits in viro/vfs.git#work.selinuxfs; I have
lock_rename()-related followups in another branch, so a pull would be more
convenient for me than cherry-pick.  NOTE: testing and comments would
be very welcome - as it is, the patch is pretty much untested beyond
"it builds".
]

On policy reload selinuxfs replaces two subdirectories (/booleans
and /class) with new variants.  Unfortunately, that's done with
serious abuses of directory locking.

1) lock_rename() should be done to parents, not to objects being
exchanged

2) there's a bunch of reasons why it should not be done for directories
that do not have a common ancestor; most of those do not apply to
selinuxfs, but even in the best case the proof is subtle and brittle.

3) failure halfway through the creation of /class will leak
names and values arrays.

4) use of d_genocide() is also rather brittle; it's probably not much of
a bug per se, but e.g. an overmount of /sys/fs/selinuxfs/classes/shm/index
with any regular file will end up with leaked mount on policy reload.
Sure, don't do it, but...

Let's stop messing with disconnected directories; just create
a temporary (/.swapover) with no permissions for anyone (on the
level of ->permission() returing -EPERM, no matter who's calling
it) and build the new /booleans and /class in there; then
lock_rename on root and that temporary directory and d_exchange()
old and new both for class and booleans.  Then unlock and use
simple_recursive_removal() to take the temporary out; it's much
more robust.

And instead of bothering with separate pathways for freeing
new (on failure halfway through) and old (on success) names/values,
do all freeing in one place.  With temporaries swapped with the
old ones when we are past all possible failures.

The only user-visible difference is that /.swapover shows up
(but isn't possible to open, look up into, etc.) for the
duration of policy reload.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---

Comments

Paul Moore Oct. 17, 2023, 8:28 p.m. UTC | #1
On Mon, Oct 16, 2023 at 6:08 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> [
> That thing sits in viro/vfs.git#work.selinuxfs; I have
> lock_rename()-related followups in another branch, so a pull would be more
> convenient for me than cherry-pick.  NOTE: testing and comments would
> be very welcome - as it is, the patch is pretty much untested beyond
> "it builds".
> ]
>
> On policy reload selinuxfs replaces two subdirectories (/booleans
> and /class) with new variants.  Unfortunately, that's done with
> serious abuses of directory locking.
>
> 1) lock_rename() should be done to parents, not to objects being
> exchanged
>
> 2) there's a bunch of reasons why it should not be done for directories
> that do not have a common ancestor; most of those do not apply to
> selinuxfs, but even in the best case the proof is subtle and brittle.
>
> 3) failure halfway through the creation of /class will leak
> names and values arrays.
>
> 4) use of d_genocide() is also rather brittle; it's probably not much of
> a bug per se, but e.g. an overmount of /sys/fs/selinuxfs/classes/shm/index
> with any regular file will end up with leaked mount on policy reload.
> Sure, don't do it, but...
>
> Let's stop messing with disconnected directories; just create
> a temporary (/.swapover) with no permissions for anyone (on the
> level of ->permission() returing -EPERM, no matter who's calling
> it) and build the new /booleans and /class in there; then
> lock_rename on root and that temporary directory and d_exchange()
> old and new both for class and booleans.  Then unlock and use
> simple_recursive_removal() to take the temporary out; it's much
> more robust.
>
> And instead of bothering with separate pathways for freeing
> new (on failure halfway through) and old (on success) names/values,
> do all freeing in one place.  With temporaries swapped with the
> old ones when we are past all possible failures.
>
> The only user-visible difference is that /.swapover shows up
> (but isn't possible to open, look up into, etc.) for the
> duration of policy reload.

Thanks Al.

Giving this a very quick look, I like the code simplifications that
come out of this change and I'll trust you on the idea that this
approach is better from a VFS perspective.

While the reject_all() permission hammer is good, I do want to make
sure we are covered from a file labeling perspective; even though the
DAC/reject_all() check hits first and avoids the LSM inode permission
hook, we still want to make sure the files are labeled properly.  It
looks like given the current SELinux Reference Policy this shouldn't
be a problem, it will be labeled like most everything else in
selinuxfs via genfscon (SELinux policy construct).  I expect those
with custom SELinux policies will have something similar in place with
a sane default that would cover the /sys/fs/selinux/.swapover
directory but I did add the selinux-refpol list to the CC line just in
case I'm being dumb and forgetting something important with respect to
policy.

The next step is to actually boot up a kernel with this patch and make
sure it doesn't break anything.  Simply booting up a SELinux system
and running 'load_policy' a handful of times should exercise the
policy (re)load path, and if you want a (relatively) simple SELinux
test suite you can find one here:

* https://github.com/SELinuxProject/selinux-testsuite

The README.md should have the instructions necessary to get it
running.  If you can't do that, and no one else on the mailing list is
able to test this out, I'll give it a go but expect it to take a while
as I'm currently swamped with reviews and other stuff.
Al Viro Oct. 18, 2023, 4:35 a.m. UTC | #2
On Tue, Oct 17, 2023 at 04:28:53PM -0400, Paul Moore wrote:
> Thanks Al.
> 
> Giving this a very quick look, I like the code simplifications that
> come out of this change and I'll trust you on the idea that this
> approach is better from a VFS perspective.
> 
> While the reject_all() permission hammer is good, I do want to make
> sure we are covered from a file labeling perspective; even though the
> DAC/reject_all() check hits first and avoids the LSM inode permission
> hook, we still want to make sure the files are labeled properly.  It
> looks like given the current SELinux Reference Policy this shouldn't
> be a problem, it will be labeled like most everything else in
> selinuxfs via genfscon (SELinux policy construct).  I expect those
> with custom SELinux policies will have something similar in place with
> a sane default that would cover the /sys/fs/selinux/.swapover
> directory but I did add the selinux-refpol list to the CC line just in
> case I'm being dumb and forgetting something important with respect to
> policy.
> 
> The next step is to actually boot up a kernel with this patch and make
> sure it doesn't break anything.  Simply booting up a SELinux system
> and running 'load_policy' a handful of times should exercise the
> policy (re)load path, and if you want a (relatively) simple SELinux
> test suite you can find one here:
> 
> * https://github.com/SELinuxProject/selinux-testsuite
> 
> The README.md should have the instructions necessary to get it
> running.  If you can't do that, and no one else on the mailing list is
> able to test this out, I'll give it a go but expect it to take a while
> as I'm currently swamped with reviews and other stuff.

It does survive repeated load_policy (as well as semodule -d/semodule -e,
with expected effect on /booleans, AFAICS).  As for the testsuite...
No regressions compared to clean -rc5, but then there are (identical)
failures on both - "Failed 8/76 test programs. 88/1046 subtests failed."
Incomplete defconfig, at a guess...
Paul Moore Oct. 19, 2023, 1:39 a.m. UTC | #3
On Wed, Oct 18, 2023 at 12:35 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Tue, Oct 17, 2023 at 04:28:53PM -0400, Paul Moore wrote:
> > Thanks Al.
> >
> > Giving this a very quick look, I like the code simplifications that
> > come out of this change and I'll trust you on the idea that this
> > approach is better from a VFS perspective.
> >
> > While the reject_all() permission hammer is good, I do want to make
> > sure we are covered from a file labeling perspective; even though the
> > DAC/reject_all() check hits first and avoids the LSM inode permission
> > hook, we still want to make sure the files are labeled properly.  It
> > looks like given the current SELinux Reference Policy this shouldn't
> > be a problem, it will be labeled like most everything else in
> > selinuxfs via genfscon (SELinux policy construct).  I expect those
> > with custom SELinux policies will have something similar in place with
> > a sane default that would cover the /sys/fs/selinux/.swapover
> > directory but I did add the selinux-refpol list to the CC line just in
> > case I'm being dumb and forgetting something important with respect to
> > policy.
> >
> > The next step is to actually boot up a kernel with this patch and make
> > sure it doesn't break anything.  Simply booting up a SELinux system
> > and running 'load_policy' a handful of times should exercise the
> > policy (re)load path, and if you want a (relatively) simple SELinux
> > test suite you can find one here:
> >
> > * https://github.com/SELinuxProject/selinux-testsuite
> >
> > The README.md should have the instructions necessary to get it
> > running.  If you can't do that, and no one else on the mailing list is
> > able to test this out, I'll give it a go but expect it to take a while
> > as I'm currently swamped with reviews and other stuff.
>
> It does survive repeated load_policy (as well as semodule -d/semodule -e,
> with expected effect on /booleans, AFAICS).  As for the testsuite...
> No regressions compared to clean -rc5, but then there are (identical)
> failures on both - "Failed 8/76 test programs. 88/1046 subtests failed."
> Incomplete defconfig, at a guess...

Thanks for the smoke testing, the tests should run clean, but if you
didn't adjust the Kconfig you're likely correct that it is the source
of the failures.  I'll build a kernel with the patch and give it a
test.

From what I can see, it doesn't look like this is a candidate for
stable, correct?
Stephen Smalley Oct. 19, 2023, 1:02 p.m. UTC | #4
On Mon, Oct 16, 2023 at 6:08 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> [
> That thing sits in viro/vfs.git#work.selinuxfs; I have
> lock_rename()-related followups in another branch, so a pull would be more
> convenient for me than cherry-pick.  NOTE: testing and comments would
> be very welcome - as it is, the patch is pretty much untested beyond
> "it builds".
> ]
>
> On policy reload selinuxfs replaces two subdirectories (/booleans
> and /class) with new variants.  Unfortunately, that's done with
> serious abuses of directory locking.
>
> 1) lock_rename() should be done to parents, not to objects being
> exchanged
>
> 2) there's a bunch of reasons why it should not be done for directories
> that do not have a common ancestor; most of those do not apply to
> selinuxfs, but even in the best case the proof is subtle and brittle.
>
> 3) failure halfway through the creation of /class will leak
> names and values arrays.
>
> 4) use of d_genocide() is also rather brittle; it's probably not much of
> a bug per se, but e.g. an overmount of /sys/fs/selinuxfs/classes/shm/index
> with any regular file will end up with leaked mount on policy reload.
> Sure, don't do it, but...
>
> Let's stop messing with disconnected directories; just create
> a temporary (/.swapover) with no permissions for anyone (on the
> level of ->permission() returing -EPERM, no matter who's calling
> it) and build the new /booleans and /class in there; then
> lock_rename on root and that temporary directory and d_exchange()
> old and new both for class and booleans.  Then unlock and use
> simple_recursive_removal() to take the temporary out; it's much
> more robust.
>
> And instead of bothering with separate pathways for freeing
> new (on failure halfway through) and old (on success) names/values,
> do all freeing in one place.  With temporaries swapped with the
> old ones when we are past all possible failures.
>
> The only user-visible difference is that /.swapover shows up
> (but isn't possible to open, look up into, etc.) for the
> duration of policy reload.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Stephen Smalley Oct. 19, 2023, 1:10 p.m. UTC | #5
On Wed, Oct 18, 2023 at 12:35 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Tue, Oct 17, 2023 at 04:28:53PM -0400, Paul Moore wrote:
> > Thanks Al.
> >
> > Giving this a very quick look, I like the code simplifications that
> > come out of this change and I'll trust you on the idea that this
> > approach is better from a VFS perspective.
> >
> > While the reject_all() permission hammer is good, I do want to make
> > sure we are covered from a file labeling perspective; even though the
> > DAC/reject_all() check hits first and avoids the LSM inode permission
> > hook, we still want to make sure the files are labeled properly.  It
> > looks like given the current SELinux Reference Policy this shouldn't
> > be a problem, it will be labeled like most everything else in
> > selinuxfs via genfscon (SELinux policy construct).  I expect those
> > with custom SELinux policies will have something similar in place with
> > a sane default that would cover the /sys/fs/selinux/.swapover
> > directory but I did add the selinux-refpol list to the CC line just in
> > case I'm being dumb and forgetting something important with respect to
> > policy.
> >
> > The next step is to actually boot up a kernel with this patch and make
> > sure it doesn't break anything.  Simply booting up a SELinux system
> > and running 'load_policy' a handful of times should exercise the
> > policy (re)load path, and if you want a (relatively) simple SELinux
> > test suite you can find one here:
> >
> > * https://github.com/SELinuxProject/selinux-testsuite
> >
> > The README.md should have the instructions necessary to get it
> > running.  If you can't do that, and no one else on the mailing list is
> > able to test this out, I'll give it a go but expect it to take a while
> > as I'm currently swamped with reviews and other stuff.
>
> It does survive repeated load_policy (as well as semodule -d/semodule -e,
> with expected effect on /booleans, AFAICS).  As for the testsuite...
> No regressions compared to clean -rc5, but then there are (identical)
> failures on both - "Failed 8/76 test programs. 88/1046 subtests failed."
> Incomplete defconfig, at a guess...

All tests passed for me using the defconfig fragment from the selinux-testsuite.
Paul Moore Nov. 13, 2023, 3:48 a.m. UTC | #6
On Thu, Oct 19, 2023 at 9:10 AM Stephen Smalley
<stephen.smalley.work@gmail.com> wrote:
> On Wed, Oct 18, 2023 at 12:35 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > On Tue, Oct 17, 2023 at 04:28:53PM -0400, Paul Moore wrote:
> > > Thanks Al.
> > >
> > > Giving this a very quick look, I like the code simplifications that
> > > come out of this change and I'll trust you on the idea that this
> > > approach is better from a VFS perspective.
> > >
> > > While the reject_all() permission hammer is good, I do want to make
> > > sure we are covered from a file labeling perspective; even though the
> > > DAC/reject_all() check hits first and avoids the LSM inode permission
> > > hook, we still want to make sure the files are labeled properly.  It
> > > looks like given the current SELinux Reference Policy this shouldn't
> > > be a problem, it will be labeled like most everything else in
> > > selinuxfs via genfscon (SELinux policy construct).  I expect those
> > > with custom SELinux policies will have something similar in place with
> > > a sane default that would cover the /sys/fs/selinux/.swapover
> > > directory but I did add the selinux-refpol list to the CC line just in
> > > case I'm being dumb and forgetting something important with respect to
> > > policy.
> > >
> > > The next step is to actually boot up a kernel with this patch and make
> > > sure it doesn't break anything.  Simply booting up a SELinux system
> > > and running 'load_policy' a handful of times should exercise the
> > > policy (re)load path, and if you want a (relatively) simple SELinux
> > > test suite you can find one here:
> > >
> > > * https://github.com/SELinuxProject/selinux-testsuite
> > >
> > > The README.md should have the instructions necessary to get it
> > > running.  If you can't do that, and no one else on the mailing list is
> > > able to test this out, I'll give it a go but expect it to take a while
> > > as I'm currently swamped with reviews and other stuff.
> >
> > It does survive repeated load_policy (as well as semodule -d/semodule -e,
> > with expected effect on /booleans, AFAICS).  As for the testsuite...
> > No regressions compared to clean -rc5, but then there are (identical)
> > failures on both - "Failed 8/76 test programs. 88/1046 subtests failed."
> > Incomplete defconfig, at a guess...
>
> All tests passed for me using the defconfig fragment from the selinux-testsuite.

I just merged this into selinux/dev, thanks everyone.
Paul Moore Nov. 13, 2023, 4:19 p.m. UTC | #7
On Mon, Oct 16, 2023 at 6:08 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> [
> That thing sits in viro/vfs.git#work.selinuxfs; I have
> lock_rename()-related followups in another branch, so a pull would be more
> convenient for me than cherry-pick.  NOTE: testing and comments would
> be very welcome - as it is, the patch is pretty much untested beyond
> "it builds".
> ]

Hi Al,

I will admit to glossing over the comment above when I merged this
into the selinux/dev branch last night.  As it's been a few weeks, I'm
not sure if the comment above still applies, but if it does let me
know and I can yank/revert the patch in favor of a larger pull.  Let
me know what you'd like to do.

> On policy reload selinuxfs replaces two subdirectories (/booleans
> and /class) with new variants.  Unfortunately, that's done with
> serious abuses of directory locking.
>
> 1) lock_rename() should be done to parents, not to objects being
> exchanged
>
> 2) there's a bunch of reasons why it should not be done for directories
> that do not have a common ancestor; most of those do not apply to
> selinuxfs, but even in the best case the proof is subtle and brittle.
>
> 3) failure halfway through the creation of /class will leak
> names and values arrays.
>
> 4) use of d_genocide() is also rather brittle; it's probably not much of
> a bug per se, but e.g. an overmount of /sys/fs/selinuxfs/classes/shm/index
> with any regular file will end up with leaked mount on policy reload.
> Sure, don't do it, but...
>
> Let's stop messing with disconnected directories; just create
> a temporary (/.swapover) with no permissions for anyone (on the
> level of ->permission() returing -EPERM, no matter who's calling
> it) and build the new /booleans and /class in there; then
> lock_rename on root and that temporary directory and d_exchange()
> old and new both for class and booleans.  Then unlock and use
> simple_recursive_removal() to take the temporary out; it's much
> more robust.
>
> And instead of bothering with separate pathways for freeing
> new (on failure halfway through) and old (on success) names/values,
> do all freeing in one place.  With temporaries swapped with the
> old ones when we are past all possible failures.
>
> The only user-visible difference is that /.swapover shows up
> (but isn't possible to open, look up into, etc.) for the
> duration of policy reload.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
Stephen Smalley Nov. 14, 2023, 8:57 p.m. UTC | #8
On Mon, Nov 13, 2023 at 11:19 AM Paul Moore <paul@paul-moore.com> wrote:
>
> On Mon, Oct 16, 2023 at 6:08 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > [
> > That thing sits in viro/vfs.git#work.selinuxfs; I have
> > lock_rename()-related followups in another branch, so a pull would be more
> > convenient for me than cherry-pick.  NOTE: testing and comments would
> > be very welcome - as it is, the patch is pretty much untested beyond
> > "it builds".
> > ]
>
> Hi Al,
>
> I will admit to glossing over the comment above when I merged this
> into the selinux/dev branch last night.  As it's been a few weeks, I'm
> not sure if the comment above still applies, but if it does let me
> know and I can yank/revert the patch in favor of a larger pull.  Let
> me know what you'd like to do.

Seeing this during testsuite runs:

[ 3550.206423] SELinux:  Converting 1152 SID table entries...
[ 3550.666195] ------------[ cut here ]------------
[ 3550.666201] WARNING: CPU: 3 PID: 12300 at fs/inode.c:330 drop_nlink+0x57/0x70
[ 3550.666214] Modules linked in: tun af_key crypto_user
scsi_transport_iscsi xt_multiport ip_gre gre ip_tunnel bluetooth
ecdh_generic sctp ip6_udp_tunnel udp_tunnel overlay xt_CONNSECMARK
xt_SECMARK ah6 ah4 vfat fat xt_CHECKSUM xt_MASQUERADE xt_conntrack
ipt_REJECT nf_nat_tftp nf_conntrack_tftp nft_fib_inet bridge
nft_fib_ipv4 nft_fib_ipv6 nft_fib stp llc nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat rfkill
ip6table_nat ip6table_mangle ip6table_raw ip6table_security
iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink
ip6table_filter iptable_filter vsock_loopback
vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock
intel_rapl_msr intel_rapl_common isst_if_mbox_msr isst_if_common
vmw_balloon rapl joydev vmw_vmci i2c_piix4 auth_rpcgss loop sunrpc
zram xfs vmwgfx drm_ttm_helper ttm crct10dif_pclmul drm_kms_helper
crc32_pclmul crc32c_intel ghash_clmulni_intel drm vmw_pvscsi vmxnet3
ata_generic
[ 3550.666453]  pata_acpi serio_raw ip6_tables ip_tables
pkcs8_key_parser fuse [last unloaded: setest_module_request(OE)]
[ 3550.666476] CPU: 3 PID: 12300 Comm: load_policy Tainted: G    B
 OE      6.7.0-rc1+ #68
[ 3550.666488] RIP: 0010:drop_nlink+0x57/0x70
[ 3550.666495] Code: 7b 28 e8 9c ab f4 ff 48 8b 5b 28 be 08 00 00 00
48 8d bb 40 07 00 00 e8 c7 be f4 ff f0 48 ff 83 40 07 00 00 5b c3 cc
cc cc cc <0f> 0b c7 43 48 ff ff ff ff 5b c3 cc cc cc cc 66 2e 0f 1f 84
00 00
[ 3550.666502] RSP: 0018:ffff88816efefb78 EFLAGS: 00010246
[ 3550.666508] RAX: 0000000000000000 RBX: ffff8881007e7a48 RCX: dffffc0000000000
[ 3550.666513] RDX: 0000000000000003 RSI: ffffffff9a6f30b6 RDI: ffff8881007e7a90
[ 3550.666518] RBP: ffff88816efefbf0 R08: 0000000000000000 R09: 0000000000000000
[ 3550.666523] R10: ffffffff9d8952e7 R11: 0000000000000000 R12: 000000006553dd4c
[ 3550.666527] R13: ffff8881007e7a48 R14: ffff8881d014a8c8 R15: ffff888319e42e68
[ 3550.666533] FS:  00007fa567f7fc40(0000) GS:ffff888dfed80000(0000)
knlGS:0000000000000000
[ 3550.666538] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3550.666542] CR2: 00007f48f4049024 CR3: 00000001f8174003 CR4: 00000000007706f0
[ 3550.666569] PKRU: 55555554
[ 3550.666573] Call Trace:
[ 3550.666576]  <TASK>
[ 3550.666580]  ? __warn+0xa5/0x200
[ 3550.666590]  ? drop_nlink+0x57/0x70
[ 3550.666598]  ? report_bug+0x1b2/0x1e0
[ 3550.666612]  ? handle_bug+0x79/0xa0
[ 3550.666621]  ? exc_invalid_op+0x17/0x40
[ 3550.666629]  ? asm_exc_invalid_op+0x1a/0x20
[ 3550.666643]  ? drop_nlink+0x16/0x70
[ 3550.666651]  ? drop_nlink+0x57/0x70
[ 3550.666659]  ? drop_nlink+0x16/0x70
[ 3550.666666]  simple_recursive_removal+0x405/0x430
[ 3550.666683]  sel_write_load+0x668/0xf30
[ 3550.666727]  ? __pfx_sel_write_load+0x10/0x10
[ 3550.666735]  ? __call_rcu_common.constprop.0+0x30b/0x980
[ 3550.666747]  ? __might_sleep+0x2b/0xb0
[ 3550.666756]  ? __pfx_lock_acquire+0x10/0x10
[ 3550.666764]  ? inode_security+0x6d/0x90
[ 3550.666775]  ? selinux_file_permission+0x1e4/0x220
[ 3550.666787]  vfs_write+0x18f/0x700
[ 3550.666799]  ? __pfx_vfs_write+0x10/0x10
[ 3550.666809]  ? do_sys_openat2+0xcb/0x110
[ 3550.666817]  ? __pfx_do_sys_openat2+0x10/0x10
[ 3550.666827]  ? __fget_light+0xdf/0x100
[ 3550.666838]  ksys_write+0xb7/0x140
[ 3550.666847]  ? __pfx_ksys_write+0x10/0x10
[ 3550.666856]  ? lockdep_hardirqs_on_prepare+0x12/0x200
[ 3550.666864]  ? syscall_enter_from_user_mode+0x24/0x80
[ 3550.666875]  do_syscall_64+0x43/0xf0
[ 3550.666882]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3550.666889] RIP: 0033:0x7fa56811c154
[ 3550.666920] Code: 89 02 48 c7 c0 ff ff ff ff eb bd 66 2e 0f 1f 84
00 00 00 00 00 90 f3 0f 1e fa 80 3d 8d b4 0d 00 00 74 13 b8 01 00 00
00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20
48 89
[ 3550.666926] RSP: 002b:00007fffe6bdf948 EFLAGS: 00000202 ORIG_RAX:
0000000000000001
[ 3550.666933] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fa56811c154
[ 3550.666938] RDX: 000000000038f3d2 RSI: 00007fa55a400000 RDI: 0000000000000004
[ 3550.666942] RBP: 00007fffe6be0980 R08: 0000000000000073 R09: 0000000000000001
[ 3550.666947] R10: 0000000000000000 R11: 0000000000000202 R12: 00007fa55a400000
[ 3550.666951] R13: 000000000038f3d2 R14: 0000000000000003 R15: 00007fa5682230a0
[ 3550.666965]  </TASK>
[ 3550.666969] irq event stamp: 0
[ 3550.666972] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[ 3550.666982] hardirqs last disabled at (0): [<ffffffff9a15878c>]
copy_process+0x114c/0x3580
[ 3550.666990] softirqs last  enabled at (0): [<ffffffff9a15878c>]
copy_process+0x114c/0x3580
[ 3550.666997] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 3550.667007] ---[ end trace 0000000000000000 ]---
Al Viro Nov. 14, 2023, 9:53 p.m. UTC | #9
On Tue, Nov 14, 2023 at 03:57:35PM -0500, Stephen Smalley wrote:
> On Mon, Nov 13, 2023 at 11:19 AM Paul Moore <paul@paul-moore.com> wrote:
> >
> > On Mon, Oct 16, 2023 at 6:08 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > [
> > > That thing sits in viro/vfs.git#work.selinuxfs; I have
> > > lock_rename()-related followups in another branch, so a pull would be more
> > > convenient for me than cherry-pick.  NOTE: testing and comments would
> > > be very welcome - as it is, the patch is pretty much untested beyond
> > > "it builds".
> > > ]
> >
> > Hi Al,
> >
> > I will admit to glossing over the comment above when I merged this
> > into the selinux/dev branch last night.  As it's been a few weeks, I'm
> > not sure if the comment above still applies, but if it does let me
> > know and I can yank/revert the patch in favor of a larger pull.  Let
> > me know what you'd like to do.
> 
> Seeing this during testsuite runs:

Interesting...  i_nlink decrement hitting an inode already with zero
nlink...

<pokes around>

Could you add
        inc_nlink(sb->s_root->d_inode);
in sel_make_swapover_dir() right before
        inode_unlock(sb->s_root->d_inode);

and check if that fixes the problem?
Al Viro Nov. 14, 2023, 9:57 p.m. UTC | #10
On Tue, Nov 14, 2023 at 09:53:41PM +0000, Al Viro wrote:
> On Tue, Nov 14, 2023 at 03:57:35PM -0500, Stephen Smalley wrote:
> > On Mon, Nov 13, 2023 at 11:19 AM Paul Moore <paul@paul-moore.com> wrote:
> > >
> > > On Mon, Oct 16, 2023 at 6:08 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > >
> > > > [
> > > > That thing sits in viro/vfs.git#work.selinuxfs; I have
> > > > lock_rename()-related followups in another branch, so a pull would be more
> > > > convenient for me than cherry-pick.  NOTE: testing and comments would
> > > > be very welcome - as it is, the patch is pretty much untested beyond
> > > > "it builds".
> > > > ]
> > >
> > > Hi Al,
> > >
> > > I will admit to glossing over the comment above when I merged this
> > > into the selinux/dev branch last night.  As it's been a few weeks, I'm
> > > not sure if the comment above still applies, but if it does let me
> > > know and I can yank/revert the patch in favor of a larger pull.  Let
> > > me know what you'd like to do.
> > 
> > Seeing this during testsuite runs:
> 
> Interesting...  i_nlink decrement hitting an inode already with zero
> nlink...
> 
> <pokes around>
> 
> Could you add
>         inc_nlink(sb->s_root->d_inode);
> in sel_make_swapover_dir() right before
>         inode_unlock(sb->s_root->d_inode);
> 
> and check if that fixes the problem?

See git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #for-selinux
Paul Moore Nov. 14, 2023, 10:24 p.m. UTC | #11
On Tue, Nov 14, 2023 at 3:57 PM Stephen Smalley
<stephen.smalley.work@gmail.com> wrote:
> On Mon, Nov 13, 2023 at 11:19 AM Paul Moore <paul@paul-moore.com> wrote:
> > On Mon, Oct 16, 2023 at 6:08 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > [
> > > That thing sits in viro/vfs.git#work.selinuxfs; I have
> > > lock_rename()-related followups in another branch, so a pull would be more
> > > convenient for me than cherry-pick.  NOTE: testing and comments would
> > > be very welcome - as it is, the patch is pretty much untested beyond
> > > "it builds".
> > > ]
> >
> > Hi Al,
> >
> > I will admit to glossing over the comment above when I merged this
> > into the selinux/dev branch last night.  As it's been a few weeks, I'm
> > not sure if the comment above still applies, but if it does let me
> > know and I can yank/revert the patch in favor of a larger pull.  Let
> > me know what you'd like to do.
>
> Seeing this during testsuite runs:
>
> [ 3550.206423] SELinux:  Converting 1152 SID table entries...
> [ 3550.666195] ------------[ cut here ]------------
> [ 3550.666201] WARNING: CPU: 3 PID: 12300 at fs/inode.c:330 drop_nlink+0x57/0x70

How are you running the test suite Stephen?  I haven't hit this in my
automated testing and I did another test run manually to make sure I
wasn't missing it and everything ran clean.

I'm running the latest selinux-testsuite on a current Rawhide system
with kernel 6.7.0-0.rc1.20231114git9bacdd89.17.1.secnext.fc40 (current
Rawhide kernel + the LSM, SELinux, and audit dev trees).
Stephen Smalley Nov. 15, 2023, 1:35 p.m. UTC | #12
On Tue, Nov 14, 2023 at 5:24 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Tue, Nov 14, 2023 at 3:57 PM Stephen Smalley
> <stephen.smalley.work@gmail.com> wrote:
> > On Mon, Nov 13, 2023 at 11:19 AM Paul Moore <paul@paul-moore.com> wrote:
> > > On Mon, Oct 16, 2023 at 6:08 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > >
> > > > [
> > > > That thing sits in viro/vfs.git#work.selinuxfs; I have
> > > > lock_rename()-related followups in another branch, so a pull would be more
> > > > convenient for me than cherry-pick.  NOTE: testing and comments would
> > > > be very welcome - as it is, the patch is pretty much untested beyond
> > > > "it builds".
> > > > ]
> > >
> > > Hi Al,
> > >
> > > I will admit to glossing over the comment above when I merged this
> > > into the selinux/dev branch last night.  As it's been a few weeks, I'm
> > > not sure if the comment above still applies, but if it does let me
> > > know and I can yank/revert the patch in favor of a larger pull.  Let
> > > me know what you'd like to do.
> >
> > Seeing this during testsuite runs:
> >
> > [ 3550.206423] SELinux:  Converting 1152 SID table entries...
> > [ 3550.666195] ------------[ cut here ]------------
> > [ 3550.666201] WARNING: CPU: 3 PID: 12300 at fs/inode.c:330 drop_nlink+0x57/0x70
>
> How are you running the test suite Stephen?  I haven't hit this in my
> automated testing and I did another test run manually to make sure I
> wasn't missing it and everything ran clean.
>
> I'm running the latest selinux-testsuite on a current Rawhide system
> with kernel 6.7.0-0.rc1.20231114git9bacdd89.17.1.secnext.fc40 (current
> Rawhide kernel + the LSM, SELinux, and audit dev trees).

Technically this was with a kernel built from your dev branch plus
Ondrej's selinux: introduce an initial SID for early boot processes
patch, but I don't see how the latter could introduce such a bug. Will
retry without it.
Stephen Smalley Nov. 16, 2023, 1:16 p.m. UTC | #13
On Wed, Nov 15, 2023 at 8:35 AM Stephen Smalley
<stephen.smalley.work@gmail.com> wrote:
>
> On Tue, Nov 14, 2023 at 5:24 PM Paul Moore <paul@paul-moore.com> wrote:
> >
> > On Tue, Nov 14, 2023 at 3:57 PM Stephen Smalley
> > <stephen.smalley.work@gmail.com> wrote:
> > > On Mon, Nov 13, 2023 at 11:19 AM Paul Moore <paul@paul-moore.com> wrote:
> > > > On Mon, Oct 16, 2023 at 6:08 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > > >
> > > > > [
> > > > > That thing sits in viro/vfs.git#work.selinuxfs; I have
> > > > > lock_rename()-related followups in another branch, so a pull would be more
> > > > > convenient for me than cherry-pick.  NOTE: testing and comments would
> > > > > be very welcome - as it is, the patch is pretty much untested beyond
> > > > > "it builds".
> > > > > ]
> > > >
> > > > Hi Al,
> > > >
> > > > I will admit to glossing over the comment above when I merged this
> > > > into the selinux/dev branch last night.  As it's been a few weeks, I'm
> > > > not sure if the comment above still applies, but if it does let me
> > > > know and I can yank/revert the patch in favor of a larger pull.  Let
> > > > me know what you'd like to do.
> > >
> > > Seeing this during testsuite runs:
> > >
> > > [ 3550.206423] SELinux:  Converting 1152 SID table entries...
> > > [ 3550.666195] ------------[ cut here ]------------
> > > [ 3550.666201] WARNING: CPU: 3 PID: 12300 at fs/inode.c:330 drop_nlink+0x57/0x70
> >
> > How are you running the test suite Stephen?  I haven't hit this in my
> > automated testing and I did another test run manually to make sure I
> > wasn't missing it and everything ran clean.
> >
> > I'm running the latest selinux-testsuite on a current Rawhide system
> > with kernel 6.7.0-0.rc1.20231114git9bacdd89.17.1.secnext.fc40 (current
> > Rawhide kernel + the LSM, SELinux, and audit dev trees).
>
> Technically this was with a kernel built from your dev branch plus
> Ondrej's selinux: introduce an initial SID for early boot processes
> patch, but I don't see how the latter could introduce such a bug. Will
> retry without it.

Reproduced without Ondrej's patch; the trick seems to be accessing
selinuxfs files during the testsuite run (likely interleaving with
policy reloads).
while true; do cat /sys/fs/selinux/initial_contexts/kernel ; done &
while running the testsuite seems to trigger.
Could also try while true; do sudo load_policy; done & in parallel
with the above loop.
In any event, will retry with Al's updated branch with the fix he proposed.
Stephen Smalley Nov. 16, 2023, 2:30 p.m. UTC | #14
On Thu, Nov 16, 2023 at 8:16 AM Stephen Smalley
<stephen.smalley.work@gmail.com> wrote:
>
> On Wed, Nov 15, 2023 at 8:35 AM Stephen Smalley
> <stephen.smalley.work@gmail.com> wrote:
> >
> > On Tue, Nov 14, 2023 at 5:24 PM Paul Moore <paul@paul-moore.com> wrote:
> > >
> > > On Tue, Nov 14, 2023 at 3:57 PM Stephen Smalley
> > > <stephen.smalley.work@gmail.com> wrote:
> > > > On Mon, Nov 13, 2023 at 11:19 AM Paul Moore <paul@paul-moore.com> wrote:
> > > > > On Mon, Oct 16, 2023 at 6:08 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > > > >
> > > > > > [
> > > > > > That thing sits in viro/vfs.git#work.selinuxfs; I have
> > > > > > lock_rename()-related followups in another branch, so a pull would be more
> > > > > > convenient for me than cherry-pick.  NOTE: testing and comments would
> > > > > > be very welcome - as it is, the patch is pretty much untested beyond
> > > > > > "it builds".
> > > > > > ]
> > > > >
> > > > > Hi Al,
> > > > >
> > > > > I will admit to glossing over the comment above when I merged this
> > > > > into the selinux/dev branch last night.  As it's been a few weeks, I'm
> > > > > not sure if the comment above still applies, but if it does let me
> > > > > know and I can yank/revert the patch in favor of a larger pull.  Let
> > > > > me know what you'd like to do.
> > > >
> > > > Seeing this during testsuite runs:
> > > >
> > > > [ 3550.206423] SELinux:  Converting 1152 SID table entries...
> > > > [ 3550.666195] ------------[ cut here ]------------
> > > > [ 3550.666201] WARNING: CPU: 3 PID: 12300 at fs/inode.c:330 drop_nlink+0x57/0x70
> > >
> > > How are you running the test suite Stephen?  I haven't hit this in my
> > > automated testing and I did another test run manually to make sure I
> > > wasn't missing it and everything ran clean.
> > >
> > > I'm running the latest selinux-testsuite on a current Rawhide system
> > > with kernel 6.7.0-0.rc1.20231114git9bacdd89.17.1.secnext.fc40 (current
> > > Rawhide kernel + the LSM, SELinux, and audit dev trees).
> >
> > Technically this was with a kernel built from your dev branch plus
> > Ondrej's selinux: introduce an initial SID for early boot processes
> > patch, but I don't see how the latter could introduce such a bug. Will
> > retry without it.
>
> Reproduced without Ondrej's patch; the trick seems to be accessing
> selinuxfs files during the testsuite run (likely interleaving with
> policy reloads).
> while true; do cat /sys/fs/selinux/initial_contexts/kernel ; done &
> while running the testsuite seems to trigger.
> Could also try while true; do sudo load_policy; done & in parallel
> with the above loop.
> In any event, will retry with Al's updated branch with the fix he proposed.

So far not showing up with Al's updated for-selinux branch. Difference
between that and your dev branch for selinuxfs is what he showed
earlier in the thread (pardon the whitespace damage):
diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index 36dc656a642a..0619a1cbbfbe 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -1960,6 +1960,7 @@ static struct dentry *sel_make_swapover_dir(struct super_b
lock *sb,
        inc_nlink(inode);
        inode_lock(sb->s_root->d_inode);
        d_add(dentry, inode);
+       inc_nlink(sb->s_root->d_inode);
        inode_unlock(sb->s_root->d_inode);
        return dentry;
 }
Paul Moore Nov. 16, 2023, 5:53 p.m. UTC | #15
On Thu, Nov 16, 2023 at 9:31 AM Stephen Smalley
<stephen.smalley.work@gmail.com> wrote:
> On Thu, Nov 16, 2023 at 8:16 AM Stephen Smalley
> <stephen.smalley.work@gmail.com> wrote:
> > On Wed, Nov 15, 2023 at 8:35 AM Stephen Smalley
> > <stephen.smalley.work@gmail.com> wrote:
> > > On Tue, Nov 14, 2023 at 5:24 PM Paul Moore <paul@paul-moore.com> wrote:
> > > >
> > > > On Tue, Nov 14, 2023 at 3:57 PM Stephen Smalley
> > > > <stephen.smalley.work@gmail.com> wrote:
> > > > > On Mon, Nov 13, 2023 at 11:19 AM Paul Moore <paul@paul-moore.com> wrote:
> > > > > > On Mon, Oct 16, 2023 at 6:08 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > > > > >
> > > > > > > [
> > > > > > > That thing sits in viro/vfs.git#work.selinuxfs; I have
> > > > > > > lock_rename()-related followups in another branch, so a pull would be more
> > > > > > > convenient for me than cherry-pick.  NOTE: testing and comments would
> > > > > > > be very welcome - as it is, the patch is pretty much untested beyond
> > > > > > > "it builds".
> > > > > > > ]
> > > > > >
> > > > > > Hi Al,
> > > > > >
> > > > > > I will admit to glossing over the comment above when I merged this
> > > > > > into the selinux/dev branch last night.  As it's been a few weeks, I'm
> > > > > > not sure if the comment above still applies, but if it does let me
> > > > > > know and I can yank/revert the patch in favor of a larger pull.  Let
> > > > > > me know what you'd like to do.
> > > > >
> > > > > Seeing this during testsuite runs:
> > > > >
> > > > > [ 3550.206423] SELinux:  Converting 1152 SID table entries...
> > > > > [ 3550.666195] ------------[ cut here ]------------
> > > > > [ 3550.666201] WARNING: CPU: 3 PID: 12300 at fs/inode.c:330 drop_nlink+0x57/0x70
> > > >
> > > > How are you running the test suite Stephen?  I haven't hit this in my
> > > > automated testing and I did another test run manually to make sure I
> > > > wasn't missing it and everything ran clean.
> > > >
> > > > I'm running the latest selinux-testsuite on a current Rawhide system
> > > > with kernel 6.7.0-0.rc1.20231114git9bacdd89.17.1.secnext.fc40 (current
> > > > Rawhide kernel + the LSM, SELinux, and audit dev trees).
> > >
> > > Technically this was with a kernel built from your dev branch plus
> > > Ondrej's selinux: introduce an initial SID for early boot processes
> > > patch, but I don't see how the latter could introduce such a bug. Will
> > > retry without it.
> >
> > Reproduced without Ondrej's patch; the trick seems to be accessing
> > selinuxfs files during the testsuite run (likely interleaving with
> > policy reloads).
> > while true; do cat /sys/fs/selinux/initial_contexts/kernel ; done &
> > while running the testsuite seems to trigger.
> > Could also try while true; do sudo load_policy; done & in parallel
> > with the above loop.
> > In any event, will retry with Al's updated branch with the fix he proposed.

Ah ha!  That would definitely explain why I didn't see it in my test
runs, I generally don't do anything on the test system when the test
is running.

> So far not showing up with Al's updated for-selinux branch. Difference
> between that and your dev branch for selinuxfs is what he showed
> earlier in the thread (pardon the whitespace damage) ...

I just updated the selinux/dev branch with the change from Al's tree.
Considering the minor nature of the fix, and the other patches
currently in flight on the list, I just updated the original commit
(with a note) and did a force push; I doubt that will cause issues
with anyone, but as I try to avoid force pushes on the */dev branches
I thought it was worth mentioning.

Thanks everyone!
diff mbox series

Patch

diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index 6fa640263216..ab4beb1cdf20 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -336,12 +336,9 @@  static struct dentry *sel_make_dir(struct dentry *dir, const char *name,
 			unsigned long *ino);
 
 /* declaration for sel_make_policy_nodes */
-static struct dentry *sel_make_disconnected_dir(struct super_block *sb,
+static struct dentry *sel_make_swapover_dir(struct super_block *sb,
 						unsigned long *ino);
 
-/* declaration for sel_make_policy_nodes */
-static void sel_remove_entries(struct dentry *de);
-
 static ssize_t sel_read_mls(struct file *filp, char __user *buf,
 				size_t count, loff_t *ppos)
 {
@@ -508,13 +505,13 @@  static int sel_make_policy_nodes(struct selinux_fs_info *fsi,
 				struct selinux_policy *newpolicy)
 {
 	int ret = 0;
-	struct dentry *tmp_parent, *tmp_bool_dir, *tmp_class_dir, *old_dentry;
-	unsigned int tmp_bool_num, old_bool_num;
-	char **tmp_bool_names, **old_bool_names;
-	int *tmp_bool_values, *old_bool_values;
+	struct dentry *tmp_parent, *tmp_bool_dir, *tmp_class_dir;
+	unsigned int bool_num = 0;
+	char **bool_names = NULL;
+	int *bool_values = NULL;
 	unsigned long tmp_ino = fsi->last_ino; /* Don't increment last_ino in this function */
 
-	tmp_parent = sel_make_disconnected_dir(fsi->sb, &tmp_ino);
+	tmp_parent = sel_make_swapover_dir(fsi->sb, &tmp_ino);
 	if (IS_ERR(tmp_parent))
 		return PTR_ERR(tmp_parent);
 
@@ -532,8 +529,8 @@  static int sel_make_policy_nodes(struct selinux_fs_info *fsi,
 		goto out;
 	}
 
-	ret = sel_make_bools(newpolicy, tmp_bool_dir, &tmp_bool_num,
-			     &tmp_bool_names, &tmp_bool_values);
+	ret = sel_make_bools(newpolicy, tmp_bool_dir, &bool_num,
+			     &bool_names, &bool_values);
 	if (ret)
 		goto out;
 
@@ -542,38 +539,30 @@  static int sel_make_policy_nodes(struct selinux_fs_info *fsi,
 	if (ret)
 		goto out;
 
+	lock_rename(tmp_parent, fsi->sb->s_root);
+
 	/* booleans */
-	old_dentry = fsi->bool_dir;
-	lock_rename(tmp_bool_dir, old_dentry);
 	d_exchange(tmp_bool_dir, fsi->bool_dir);
 
-	old_bool_num = fsi->bool_num;
-	old_bool_names = fsi->bool_pending_names;
-	old_bool_values = fsi->bool_pending_values;
-
-	fsi->bool_num = tmp_bool_num;
-	fsi->bool_pending_names = tmp_bool_names;
-	fsi->bool_pending_values = tmp_bool_values;
-
-	sel_remove_old_bool_data(old_bool_num, old_bool_names, old_bool_values);
+	swap(fsi->bool_num, bool_num);
+	swap(fsi->bool_pending_names, bool_names);
+	swap(fsi->bool_pending_values, bool_values);
 
 	fsi->bool_dir = tmp_bool_dir;
-	unlock_rename(tmp_bool_dir, old_dentry);
 
 	/* classes */
-	old_dentry = fsi->class_dir;
-	lock_rename(tmp_class_dir, old_dentry);
 	d_exchange(tmp_class_dir, fsi->class_dir);
 	fsi->class_dir = tmp_class_dir;
-	unlock_rename(tmp_class_dir, old_dentry);
+
+	unlock_rename(tmp_parent, fsi->sb->s_root);
 
 out:
+	sel_remove_old_bool_data(bool_num, bool_names, bool_values);
 	/* Since the other temporary dirs are children of tmp_parent
 	 * this will handle all the cleanup in the case of a failure before
 	 * the swapover
 	 */
-	sel_remove_entries(tmp_parent);
-	dput(tmp_parent); /* d_genocide() only handles the children */
+	simple_recursive_removal(tmp_parent, NULL);
 
 	return ret;
 }
@@ -1351,54 +1340,48 @@  static const struct file_operations sel_commit_bools_ops = {
 	.llseek		= generic_file_llseek,
 };
 
-static void sel_remove_entries(struct dentry *de)
-{
-	d_genocide(de);
-	shrink_dcache_parent(de);
-}
-
 static int sel_make_bools(struct selinux_policy *newpolicy, struct dentry *bool_dir,
 			  unsigned int *bool_num, char ***bool_pending_names,
 			  int **bool_pending_values)
 {
 	int ret;
-	ssize_t len;
-	struct dentry *dentry = NULL;
-	struct inode *inode = NULL;
-	struct inode_security_struct *isec;
-	char **names = NULL, *page;
+	char **names, *page;
 	u32 i, num;
-	int *values = NULL;
-	u32 sid;
 
-	ret = -ENOMEM;
 	page = (char *)get_zeroed_page(GFP_KERNEL);
 	if (!page)
-		goto out;
+		return -ENOMEM;
 
-	ret = security_get_bools(newpolicy, &num, &names, &values);
+	ret = security_get_bools(newpolicy, &num, &names, bool_pending_values);
 	if (ret)
 		goto out;
 
+	*bool_num = num;
+	*bool_pending_names = names;
+
 	for (i = 0; i < num; i++) {
-		ret = -ENOMEM;
+		struct dentry *dentry;
+		struct inode *inode;
+		struct inode_security_struct *isec;
+		ssize_t len;
+		u32 sid;
+
+		len = snprintf(page, PAGE_SIZE, "/%s/%s", BOOL_DIR_NAME, names[i]);
+		if (len >= PAGE_SIZE) {
+			ret = -ENAMETOOLONG;
+			break;
+		}
 		dentry = d_alloc_name(bool_dir, names[i]);
-		if (!dentry)
-			goto out;
+		if (!dentry) {
+			ret = -ENOMEM;
+			break;
+		}
 
-		ret = -ENOMEM;
 		inode = sel_make_inode(bool_dir->d_sb, S_IFREG | S_IRUGO | S_IWUSR);
 		if (!inode) {
 			dput(dentry);
-			goto out;
-		}
-
-		ret = -ENAMETOOLONG;
-		len = snprintf(page, PAGE_SIZE, "/%s/%s", BOOL_DIR_NAME, names[i]);
-		if (len >= PAGE_SIZE) {
-			dput(dentry);
-			iput(inode);
-			goto out;
+			ret = -ENOMEM;
+			break;
 		}
 
 		isec = selinux_inode(inode);
@@ -1416,23 +1399,8 @@  static int sel_make_bools(struct selinux_policy *newpolicy, struct dentry *bool_
 		inode->i_ino = i|SEL_BOOL_INO_OFFSET;
 		d_add(dentry, inode);
 	}
-	*bool_num = num;
-	*bool_pending_names = names;
-	*bool_pending_values = values;
-
-	free_page((unsigned long)page);
-	return 0;
 out:
 	free_page((unsigned long)page);
-
-	if (names) {
-		for (i = 0; i < num; i++)
-			kfree(names[i]);
-		kfree(names);
-	}
-	kfree(values);
-	sel_remove_entries(bool_dir);
-
 	return ret;
 }
 
@@ -1961,20 +1929,39 @@  static struct dentry *sel_make_dir(struct dentry *dir, const char *name,
 	return dentry;
 }
 
-static struct dentry *sel_make_disconnected_dir(struct super_block *sb,
+static int reject_all(struct mnt_idmap *idmap, struct inode *inode, int mask)
+{
+	return -EPERM;	// no access for anyone, root or no root.
+}
+
+static const struct inode_operations swapover_dir_inode_operations = {
+	.lookup		= simple_lookup,
+	.permission	= reject_all,
+};
+
+static struct dentry *sel_make_swapover_dir(struct super_block *sb,
 						unsigned long *ino)
 {
-	struct inode *inode = sel_make_inode(sb, S_IFDIR | S_IRUGO | S_IXUGO);
+	struct dentry *dentry = d_alloc_name(sb->s_root, ".swapover");
+	struct inode *inode;
 
-	if (!inode)
+	if (!dentry)
 		return ERR_PTR(-ENOMEM);
 
-	inode->i_op = &simple_dir_inode_operations;
-	inode->i_fop = &simple_dir_operations;
+	inode = sel_make_inode(sb, S_IFDIR);
+	if (!inode) {
+		dput(dentry);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	inode->i_op = &swapover_dir_inode_operations;
 	inode->i_ino = ++(*ino);
 	/* directory inodes start off with i_nlink == 2 (for "." entry) */
 	inc_nlink(inode);
-	return d_obtain_alias(inode);
+	inode_lock(sb->s_root->d_inode);
+	d_add(dentry, inode);
+	inode_unlock(sb->s_root->d_inode);
+	return dentry;
 }
 
 #define NULL_FILE_NAME "null"