diff mbox series

[RFC,3/3] selinux: track policy lifetime with refcount

Message ID 20200825152045.1719298-4-omosnace@redhat.com (mailing list archive)
State Superseded
Headers show
Series selinux: RCU conversion follow-ups | expand

Commit Message

Ondrej Mosnacek Aug. 25, 2020, 3:20 p.m. UTC
Instead of holding the RCU read lock the whole time while accessing the
policy, add a simple refcount mechanism to track its lifetime. After
this, the RCU read lock is held only for a brief time when fetching the
policy pointer and incrementing the refcount. The policy struct is then
guaranteed to stay alive until the refcount is decremented.

Freeing of the policy remains the responsibility of the task that does
the policy reload. In case the refcount drops to zero in a different
task, the policy load task is notified via a completion.

The advantage of this change is that the operations that access the
policy can now do sleeping allocations, since they don't need to hold
the RCU read lock anymore. This patch so far only leverages this in
security_read_policy() for the vmalloc_user() allocation (although this
function is always called under fsi->mutex and could just access the
policy pointer directly). The conversion of affected GFP_ATOMIC
allocations to GFP_KERNEL is left for a later patch, since auditing
which code paths may still need GFP_ATOMIC is not very easy.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
---
 security/selinux/ss/services.c | 286 ++++++++++++++++-----------------
 security/selinux/ss/services.h |   6 +
 2 files changed, 147 insertions(+), 145 deletions(-)

Comments

Stephen Smalley Aug. 25, 2020, 4:45 p.m. UTC | #1
On Tue, Aug 25, 2020 at 11:20 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
>
> Instead of holding the RCU read lock the whole time while accessing the
> policy, add a simple refcount mechanism to track its lifetime. After
> this, the RCU read lock is held only for a brief time when fetching the
> policy pointer and incrementing the refcount. The policy struct is then
> guaranteed to stay alive until the refcount is decremented.
>
> Freeing of the policy remains the responsibility of the task that does
> the policy reload. In case the refcount drops to zero in a different
> task, the policy load task is notified via a completion.

That's an interesting pattern.  Is this approach used anywhere else in
the kernel?  I didn't see any examples of it in the RCU documentation.

> The advantage of this change is that the operations that access the
> policy can now do sleeping allocations, since they don't need to hold
> the RCU read lock anymore. This patch so far only leverages this in
> security_read_policy() for the vmalloc_user() allocation (although this
> function is always called under fsi->mutex and could just access the
> policy pointer directly). The conversion of affected GFP_ATOMIC
> allocations to GFP_KERNEL is left for a later patch, since auditing
> which code paths may still need GFP_ATOMIC is not very easy.

Technically we don't need this patch for that purpose because
rcu_read_lock() isn't actually needed at all in
security_read_policy(), so I think we're better off just getting rid
of it there and letting it use rcu_dereference_check(..., 1) or
rcu_dereference_protected() instead.
Ondrej Mosnacek Aug. 25, 2020, 5:30 p.m. UTC | #2
On Tue, Aug 25, 2020 at 6:47 PM Stephen Smalley
<stephen.smalley.work@gmail.com> wrote:
> On Tue, Aug 25, 2020 at 11:20 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
> >
> > Instead of holding the RCU read lock the whole time while accessing the
> > policy, add a simple refcount mechanism to track its lifetime. After
> > this, the RCU read lock is held only for a brief time when fetching the
> > policy pointer and incrementing the refcount. The policy struct is then
> > guaranteed to stay alive until the refcount is decremented.
> >
> > Freeing of the policy remains the responsibility of the task that does
> > the policy reload. In case the refcount drops to zero in a different
> > task, the policy load task is notified via a completion.
>
> That's an interesting pattern.  Is this approach used anywhere else in
> the kernel?  I didn't see any examples of it in the RCU documentation.

If you mean RCU + reference counting, that's actually mentioned in RCU
documentation in quite a few places as an option, e.g. [1] or [2].

As for the completion, I'm not aware if it's been used like this yet,
but it seems to fit the purpose nicely. At least I hope there are no
hidden gotchas, but I couldn't think of any. I know it from the crypto
subsystem, where it's often used to wait for the result of an
asynchronous operation.

[1] https://www.kernel.org/doc/html/latest/RCU/whatisRCU.html#rcu-read-lock
[2] https://www.kernel.org/doc/html/latest/core-api/kernel-api.html?highlight=long_lived#c.rcu_pointer_handoff

>
> > The advantage of this change is that the operations that access the
> > policy can now do sleeping allocations, since they don't need to hold
> > the RCU read lock anymore. This patch so far only leverages this in
> > security_read_policy() for the vmalloc_user() allocation (although this
> > function is always called under fsi->mutex and could just access the
> > policy pointer directly). The conversion of affected GFP_ATOMIC
> > allocations to GFP_KERNEL is left for a later patch, since auditing
> > which code paths may still need GFP_ATOMIC is not very easy.
>
> Technically we don't need this patch for that purpose because
> rcu_read_lock() isn't actually needed at all in
> security_read_policy(), so I think we're better off just getting rid
> of it there and letting it use rcu_dereference_check(..., 1) or
> rcu_dereference_protected() instead.

Yes, I'll address that in the next revision.

--
Ondrej Mosnacek
Software Engineer, Platform Security - SELinux kernel
Red Hat, Inc.
Paul E. McKenney Aug. 25, 2020, 5:50 p.m. UTC | #3
On Tue, Aug 25, 2020 at 12:45:43PM -0400, Stephen Smalley wrote:
> On Tue, Aug 25, 2020 at 11:20 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
> >
> > Instead of holding the RCU read lock the whole time while accessing the
> > policy, add a simple refcount mechanism to track its lifetime. After
> > this, the RCU read lock is held only for a brief time when fetching the
> > policy pointer and incrementing the refcount. The policy struct is then
> > guaranteed to stay alive until the refcount is decremented.
> >
> > Freeing of the policy remains the responsibility of the task that does
> > the policy reload. In case the refcount drops to zero in a different
> > task, the policy load task is notified via a completion.
> 
> That's an interesting pattern.  Is this approach used anywhere else in
> the kernel?  I didn't see any examples of it in the RCU documentation.

The function txopt_get() in include/net/ipv6.h uses something quite
similar.  By convention, rcu_pointer_handoff() is (sometimes) used to
indicate that the pointer is now protected by something other than RCU,
as shown in that function.  And grepping for rcu_pointer_handoff()
can find you a few more.

							Thanx, Paul

> > The advantage of this change is that the operations that access the
> > policy can now do sleeping allocations, since they don't need to hold
> > the RCU read lock anymore. This patch so far only leverages this in
> > security_read_policy() for the vmalloc_user() allocation (although this
> > function is always called under fsi->mutex and could just access the
> > policy pointer directly). The conversion of affected GFP_ATOMIC
> > allocations to GFP_KERNEL is left for a later patch, since auditing
> > which code paths may still need GFP_ATOMIC is not very easy.
> 
> Technically we don't need this patch for that purpose because
> rcu_read_lock() isn't actually needed at all in
> security_read_policy(), so I think we're better off just getting rid
> of it there and letting it use rcu_dereference_check(..., 1) or
> rcu_dereference_protected() instead.
Peter Enderborg Aug. 25, 2020, 6:51 p.m. UTC | #4
On 8/25/20 5:20 PM, Ondrej Mosnacek wrote:
> Instead of holding the RCU read lock the whole time while accessing the
> policy, add a simple refcount mechanism to track its lifetime. After
> this, the RCU read lock is held only for a brief time when fetching the
> policy pointer and incrementing the refcount. The policy struct is then
> guaranteed to stay alive until the refcount is decremented.
>
> Freeing of the policy remains the responsibility of the task that does
> the policy reload. In case the refcount drops to zero in a different
> task, the policy load task is notified via a completion.
>
> The advantage of this change is that the operations that access the
> policy can now do sleeping allocations, since they don't need to hold
> the RCU read lock anymore. This patch so far only leverages this in
> security_read_policy() for the vmalloc_user() allocation (although this
> function is always called under fsi->mutex and could just access the
> policy pointer directly). The conversion of affected GFP_ATOMIC
> allocations to GFP_KERNEL is left for a later patch, since auditing
> which code paths may still need GFP_ATOMIC is not very easy.
>
> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
>
Very clever. But is it the right prioritization? We get a lot more
cpu synchronization need with two RCU in-out and refcounts inc/dec
instead of only one RCU in-out.  What is the problem with the atomic
allocations? And this if for each syscall, all caches are on the inside?
Lakshmi Ramasubramanian Sept. 5, 2020, 9:33 p.m. UTC | #5
On 8/25/20 8:20 AM, Ondrej Mosnacek wrote:

Hi Ondrej,

I am just trying understand the expected behavior w.r.t the usage of 
rcu_dereference_protected() for accessing SELinux policy. Could you 
please clarify?

> Instead of holding the RCU read lock the whole time while accessing the
> policy, add a simple refcount mechanism to track its lifetime. After
> this, the RCU read lock is held only for a brief time when fetching the
> policy pointer and incrementing the refcount. The policy struct is then
> guaranteed to stay alive until the refcount is decremented.
> 
> Freeing of the policy remains the responsibility of the task that does
> the policy reload. In case the refcount drops to zero in a different
> task, the policy load task is notified via a completion.
> 
> The advantage of this change is that the operations that access the
> policy can now do sleeping allocations, since they don't need to hold
> the RCU read lock anymore. This patch so far only leverages this in
> security_read_policy() for the vmalloc_user() allocation (although this
> function is always called under fsi->mutex and could just access the
> policy pointer directly). The conversion of affected GFP_ATOMIC
> allocations to GFP_KERNEL is left for a later patch, since auditing
> which code paths may still need GFP_ATOMIC is not very easy.
> 
> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
> ---
>   security/selinux/ss/services.c | 286 ++++++++++++++++-----------------
>   security/selinux/ss/services.h |   6 +
>   2 files changed, 147 insertions(+), 145 deletions(-)

int security_read_policy(struct selinux_state *state,
			 void **data, size_t *len)
{
	struct selinux_policy *policy;

	policy = rcu_dereference_protected(
			state->policy,
                         lockdep_is_held(&state->policy_mutex));
	if (!policy)
		return -EINVAL;
...
}

If "policy_mutex" is not held by the caller of security_read_policy() I 
was expecting the above rcu_dereference_protected() call to return NULL, 
but policy is non-NULL and I see the following messages in "dmesg" log.

Is this expected?

[   78.627152] =============================
[   78.627155] WARNING: suspicious RCU usage
[   78.627159] 5.9.0-rc1+ #10 Not tainted
[   78.627162] -----------------------------
[   78.627166] security/selinux/ss/services.c:3950 suspicious 
rcu_dereference_protected() usage!
[   78.627169]
                other info that might help us debug this:

[   78.627173]
                rcu_scheduler_active = 2, debug_locks = 1
[   78.627177] 1 lock held by bash/2446:
[   78.627179]  #0: ffff939ef5f69448 (sb_writers#7){.+.+}-{0:0}, at: 
vfs_write+0x1b8/0x230
[   78.627199]
                stack backtrace:
[   78.627205] CPU: 10 PID: 2446 Comm: bash Not tainted 5.9.0-rc1+ #10
[   78.627208] Hardware name: LENOVO 30BFS07500/1036, BIOS S03KT33A 
08/06/2019
[   78.627211] Call Trace:
[   78.627222]  dump_stack+0x9f/0xe5
[   78.627231]  lockdep_rcu_suspicious+0xce/0xf0
[   78.627256]  security_read_policy_kernel+0x10a/0x140
[   78.627268]  selinux_measure_state+0x1dc/0x270
[   78.627282]  sel_write_checkreqprot+0x129/0x1a0
[   78.627296]  vfs_write+0xdd/0x230
[   78.627300]  ? sel_read_handle_unknown+0xb0/0xb0
[   78.627304]  ? vfs_write+0xdd/0x230
[   78.627313]  ksys_write+0xad/0xf0
[   78.627324]  __x64_sys_write+0x1a/0x20
[   78.627333]  do_syscall_64+0x37/0x80
[   78.627341]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   78.627346] RIP: 0033:0x7faabb210264
[   78.627351] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 
00 00 00 66 90 48 8d 05 a1 06 2e 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 
05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 41 54 55 49 89 d4 53 48 89 f5

thanks,
  -lakshmi
Ondrej Mosnacek Sept. 7, 2020, 7:47 a.m. UTC | #6
On Sat, Sep 5, 2020 at 11:33 PM Lakshmi Ramasubramanian
<nramas@linux.microsoft.com> wrote:
> On 8/25/20 8:20 AM, Ondrej Mosnacek wrote:
>
> Hi Ondrej,
>
> I am just trying understand the expected behavior w.r.t the usage of
> rcu_dereference_protected() for accessing SELinux policy. Could you
> please clarify?
>
> > Instead of holding the RCU read lock the whole time while accessing the
> > policy, add a simple refcount mechanism to track its lifetime. After
> > this, the RCU read lock is held only for a brief time when fetching the
> > policy pointer and incrementing the refcount. The policy struct is then
> > guaranteed to stay alive until the refcount is decremented.
> >
> > Freeing of the policy remains the responsibility of the task that does
> > the policy reload. In case the refcount drops to zero in a different
> > task, the policy load task is notified via a completion.
> >
> > The advantage of this change is that the operations that access the
> > policy can now do sleeping allocations, since they don't need to hold
> > the RCU read lock anymore. This patch so far only leverages this in
> > security_read_policy() for the vmalloc_user() allocation (although this
> > function is always called under fsi->mutex and could just access the
> > policy pointer directly). The conversion of affected GFP_ATOMIC
> > allocations to GFP_KERNEL is left for a later patch, since auditing
> > which code paths may still need GFP_ATOMIC is not very easy.
> >
> > Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
> > ---
> >   security/selinux/ss/services.c | 286 ++++++++++++++++-----------------
> >   security/selinux/ss/services.h |   6 +
> >   2 files changed, 147 insertions(+), 145 deletions(-)
>
> int security_read_policy(struct selinux_state *state,
>                          void **data, size_t *len)
> {
>         struct selinux_policy *policy;
>
>         policy = rcu_dereference_protected(
>                         state->policy,
>                          lockdep_is_held(&state->policy_mutex));
>         if (!policy)
>                 return -EINVAL;
> ...
> }
>
> If "policy_mutex" is not held by the caller of security_read_policy() I
> was expecting the above rcu_dereference_protected() call to return NULL,

No, that's not how rcu_dereference_protected() works. You should only
call it when you know for sure that you are in a section that is
mutually exclusive with anything that might change the pointer. The
check argument is only used for sanity checking that this is indeed
true, but other than triggering a warning when RCU debugging is
enabled the result of the check is ignored.

If the returned pointer is NULL, it just means that the pointer hasn't
been assigned yet, i.e. that no policy has been loaded yet (this was
checked using selinux_initialized() before, but when we're under
policy_mutex, checking the pointer for NULL is possible and simpler).

BTW, you should've replied to [1], which is the merged patch that
introduced the code you are referencing :)

[1] https://patchwork.kernel.org/patch/11741025/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux.git/commit/?h=next&id=9ff9abc4c6be27ff27b6df625501a46711730520
[3] https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux.git/commit/?h=next&id=66ccd2560affc6e653ef7372ea36fb825743d186

> but policy is non-NULL and I see the following messages in "dmesg" log.
>
> Is this expected?

Yes, that's expected. The caller of security_read_policy() in this
case needs to ensure that state->policy_mutex is being held.

--
Ondrej Mosnacek
Software Engineer, Platform Security - SELinux kernel
Red Hat, Inc.
Lakshmi Ramasubramanian Sept. 7, 2020, 2:03 p.m. UTC | #7
On 9/7/20 12:47 AM, Ondrej Mosnacek wrote:
> On Sat, Sep 5, 2020 at 11:33 PM Lakshmi Ramasubramanian
> <nramas@linux.microsoft.com> wrote:
>> On 8/25/20 8:20 AM, Ondrej Mosnacek wrote:
>>
>> Hi Ondrej,
>>
>> I am just trying understand the expected behavior w.r.t the usage of
>> rcu_dereference_protected() for accessing SELinux policy. Could you
>> please clarify?
>>
>>> Instead of holding the RCU read lock the whole time while accessing the
>>> policy, add a simple refcount mechanism to track its lifetime. After
>>> this, the RCU read lock is held only for a brief time when fetching the
>>> policy pointer and incrementing the refcount. The policy struct is then
>>> guaranteed to stay alive until the refcount is decremented.
>>>
>>> Freeing of the policy remains the responsibility of the task that does
>>> the policy reload. In case the refcount drops to zero in a different
>>> task, the policy load task is notified via a completion.
>>>
>>> The advantage of this change is that the operations that access the
>>> policy can now do sleeping allocations, since they don't need to hold
>>> the RCU read lock anymore. This patch so far only leverages this in
>>> security_read_policy() for the vmalloc_user() allocation (although this
>>> function is always called under fsi->mutex and could just access the
>>> policy pointer directly). The conversion of affected GFP_ATOMIC
>>> allocations to GFP_KERNEL is left for a later patch, since auditing
>>> which code paths may still need GFP_ATOMIC is not very easy.
>>>
>>> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
>>> ---
>>>    security/selinux/ss/services.c | 286 ++++++++++++++++-----------------
>>>    security/selinux/ss/services.h |   6 +
>>>    2 files changed, 147 insertions(+), 145 deletions(-)
>>
>> int security_read_policy(struct selinux_state *state,
>>                           void **data, size_t *len)
>> {
>>          struct selinux_policy *policy;
>>
>>          policy = rcu_dereference_protected(
>>                          state->policy,
>>                           lockdep_is_held(&state->policy_mutex));
>>          if (!policy)
>>                  return -EINVAL;
>> ...
>> }
>>
>> If "policy_mutex" is not held by the caller of security_read_policy() I
>> was expecting the above rcu_dereference_protected() call to return NULL,
> 
> No, that's not how rcu_dereference_protected() works. You should only
> call it when you know for sure that you are in a section that is
> mutually exclusive with anything that might change the pointer. The
> check argument is only used for sanity checking that this is indeed
> true, but other than triggering a warning when RCU debugging is
> enabled the result of the check is ignored.
> 
> If the returned pointer is NULL, it just means that the pointer hasn't
> been assigned yet, i.e. that no policy has been loaded yet (this was
> checked using selinux_initialized() before, but when we're under
> policy_mutex, checking the pointer for NULL is possible and simpler).
> 
> BTW, you should've replied to [1], which is the merged patch that
> introduced the code you are referencing :)

Thanks for clarifying Ondrej.

  -lakshmi
diff mbox series

Patch

diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index d863b23f2a670..393bc7cc03d73 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -66,6 +66,33 @@ 
 #include "audit.h"
 #include "policycap_names.h"
 
+static struct selinux_policy *selinux_policy_get(struct selinux_state *state)
+{
+	struct selinux_policy *policy;
+
+	rcu_read_lock();
+	policy = rcu_dereference(state->policy);
+	if (policy)
+		refcount_inc(&policy->refcount);
+	policy = rcu_pointer_handoff(policy);
+	rcu_read_unlock();
+
+	return policy;
+}
+
+static void selinux_policy_put(struct selinux_policy *policy)
+{
+	if (!policy)
+		return;
+
+	/*
+	 * Decrement the refcount and if it becomes zero, tell
+	 * the loading task that it can now free the policy.
+	 */
+	if (unlikely(refcount_dec_and_test(&policy->refcount)))
+		complete(&policy->eol);
+}
+
 /* Forward declaration. */
 static int context_struct_to_string(struct policydb *policydb,
 				    struct context *context,
@@ -233,13 +260,12 @@  int security_mls_enabled(struct selinux_state *state)
 	int mls_enabled;
 	struct selinux_policy *policy;
 
-	if (!selinux_initialized(state))
+	policy = selinux_policy_get(state);
+	if (!policy)
 		return 0;
 
-	rcu_read_lock();
-	policy = rcu_dereference(state->policy);
 	mls_enabled = policy->policydb.mls_enabled;
-	rcu_read_unlock();
+	selinux_policy_put(policy);
 	return mls_enabled;
 }
 
@@ -758,12 +784,10 @@  static int security_compute_validatetrans(struct selinux_state *state,
 	int rc = 0;
 
 
-	if (!selinux_initialized(state))
+	policy = selinux_policy_get(state);
+	if (!policy)
 		return 0;
 
-	rcu_read_lock();
-
-	policy = rcu_dereference(state->policy);
 	policydb = &policy->policydb;
 	sidtab = policy->sidtab;
 
@@ -822,7 +846,7 @@  static int security_compute_validatetrans(struct selinux_state *state,
 	}
 
 out:
-	rcu_read_unlock();
+	selinux_policy_put(policy);
 	return rc;
 }
 
@@ -862,11 +886,10 @@  int security_bounded_transition(struct selinux_state *state,
 	int index;
 	int rc;
 
-	if (!selinux_initialized(state))
+	policy = selinux_policy_get(state);
+	if (!policy)
 		return 0;
 
-	rcu_read_lock();
-	policy = rcu_dereference(state->policy);
 	policydb = &policy->policydb;
 	sidtab = policy->sidtab;
 
@@ -929,7 +952,7 @@  int security_bounded_transition(struct selinux_state *state,
 		kfree(old_name);
 	}
 out:
-	rcu_read_unlock();
+	selinux_policy_put(policy);
 
 	return rc;
 }
@@ -1024,11 +1047,10 @@  void security_compute_xperms_decision(struct selinux_state *state,
 	memset(xpermd->auditallow->p, 0, sizeof(xpermd->auditallow->p));
 	memset(xpermd->dontaudit->p, 0, sizeof(xpermd->dontaudit->p));
 
-	rcu_read_lock();
-	if (!selinux_initialized(state))
+	policy = selinux_policy_get(state);
+	if (!policy)
 		goto allow;
 
-	policy = rcu_dereference(state->policy);
 	policydb = &policy->policydb;
 	sidtab = policy->sidtab;
 
@@ -1078,7 +1100,7 @@  void security_compute_xperms_decision(struct selinux_state *state,
 		}
 	}
 out:
-	rcu_read_unlock();
+	selinux_policy_put(policy);
 	return;
 allow:
 	memset(xpermd->allowed->p, 0xff, sizeof(xpermd->allowed->p));
@@ -1109,11 +1131,10 @@  void security_compute_av(struct selinux_state *state,
 	u16 tclass;
 	struct context *scontext = NULL, *tcontext = NULL;
 
-	rcu_read_lock();
-	policy = rcu_dereference(state->policy);
+	policy = selinux_policy_get(state);
 	avd_init(policy, avd);
 	xperms->len = 0;
-	if (!selinux_initialized(state))
+	if (!policy)
 		goto allow;
 
 	policydb = &policy->policydb;
@@ -1148,7 +1169,7 @@  void security_compute_av(struct selinux_state *state,
 	map_decision(&policy->map, orig_tclass, avd,
 		     policydb->allow_unknown);
 out:
-	rcu_read_unlock();
+	selinux_policy_put(policy);
 	return;
 allow:
 	avd->allowed = 0xffffffff;
@@ -1166,10 +1187,9 @@  void security_compute_av_user(struct selinux_state *state,
 	struct sidtab *sidtab;
 	struct context *scontext = NULL, *tcontext = NULL;
 
-	rcu_read_lock();
-	policy = rcu_dereference(state->policy);
+	policy = selinux_policy_get(state);
 	avd_init(policy, avd);
-	if (!selinux_initialized(state))
+	if (!policy)
 		goto allow;
 
 	policydb = &policy->policydb;
@@ -1201,8 +1221,8 @@  void security_compute_av_user(struct selinux_state *state,
 
 	context_struct_compute_av(policydb, scontext, tcontext, tclass, avd,
 				  NULL);
- out:
-	rcu_read_unlock();
+out:
+	selinux_policy_put(policy);
 	return;
 allow:
 	avd->allowed = 0xffffffff;
@@ -1290,16 +1310,15 @@  int security_sidtab_hash_stats(struct selinux_state *state, char *page)
 	struct selinux_policy *policy;
 	int rc;
 
-	if (!selinux_initialized(state)) {
+	policy = selinux_policy_get(state);
+	if (!policy) {
 		pr_err("SELinux: %s:  called before initial load_policy\n",
 		       __func__);
 		return -EINVAL;
 	}
 
-	rcu_read_lock();
-	policy = rcu_dereference(state->policy);
 	rc = sidtab_hash_stats(policy->sidtab, page);
-	rcu_read_unlock();
+	selinux_policy_put(policy);
 
 	return rc;
 }
@@ -1326,7 +1345,8 @@  static int security_sid_to_context_core(struct selinux_state *state,
 		*scontext = NULL;
 	*scontext_len  = 0;
 
-	if (!selinux_initialized(state)) {
+	policy = selinux_policy_get(state);
+	if (!policy) {
 		if (sid <= SECINITSID_NUM) {
 			char *scontextp;
 			const char *s = initial_sid_to_string[sid];
@@ -1346,8 +1366,6 @@  static int security_sid_to_context_core(struct selinux_state *state,
 		       "load_policy on unknown SID %d\n", __func__, sid);
 		return -EINVAL;
 	}
-	rcu_read_lock();
-	policy = rcu_dereference(state->policy);
 	policydb = &policy->policydb;
 	sidtab = policy->sidtab;
 
@@ -1368,7 +1386,7 @@  static int security_sid_to_context_core(struct selinux_state *state,
 				    scontext_len);
 
 out_unlock:
-	rcu_read_unlock();
+	selinux_policy_put(policy);
 	return rc;
 
 }
@@ -1519,7 +1537,8 @@  static int security_context_to_sid_core(struct selinux_state *state,
 	if (!scontext2)
 		return -ENOMEM;
 
-	if (!selinux_initialized(state)) {
+	policy = selinux_policy_get(state);
+	if (!policy) {
 		int i;
 
 		for (i = 1; i < SECINITSID_NUM; i++) {
@@ -1542,8 +1561,6 @@  static int security_context_to_sid_core(struct selinux_state *state,
 		if (!str)
 			goto out;
 	}
-	rcu_read_lock();
-	policy = rcu_dereference(state->policy);
 	policydb = &policy->policydb;
 	sidtab = policy->sidtab;
 	rc = string_to_context_struct(policydb, sidtab, scontext2,
@@ -1553,12 +1570,11 @@  static int security_context_to_sid_core(struct selinux_state *state,
 		context.len = strlen(str) + 1;
 		str = NULL;
 	} else if (rc)
-		goto out_unlock;
+		goto out;
 	rc = sidtab_context_to_sid(sidtab, &context, sid);
 	context_destroy(&context);
-out_unlock:
-	rcu_read_unlock();
 out:
+	selinux_policy_put(policy);
 	kfree(scontext2);
 	kfree(str);
 	return rc;
@@ -1714,7 +1730,8 @@  static int security_compute_sid(struct selinux_state *state,
 	int rc = 0;
 	bool sock;
 
-	if (!selinux_initialized(state)) {
+	policy = selinux_policy_get(state);
+	if (!policy) {
 		switch (orig_tclass) {
 		case SECCLASS_PROCESS: /* kernel value */
 			*out_sid = ssid;
@@ -1728,10 +1745,6 @@  static int security_compute_sid(struct selinux_state *state,
 
 	context_init(&newcontext);
 
-	rcu_read_lock();
-
-	policy = rcu_dereference(state->policy);
-
 	if (kern) {
 		tclass = unmap_class(&policy->map, orig_tclass);
 		sock = security_is_socket_class(orig_tclass);
@@ -1871,7 +1884,7 @@  static int security_compute_sid(struct selinux_state *state,
 	/* Obtain the sid for the context. */
 	rc = sidtab_context_to_sid(sidtab, &newcontext, out_sid);
 out_unlock:
-	rcu_read_unlock();
+	selinux_policy_put(policy);
 	context_destroy(&newcontext);
 out:
 	return rc;
@@ -2220,14 +2233,15 @@  void selinux_policy_commit(struct selinux_state *state,
 
 	if (!oldpolicy) {
 		/*
-		 * After first policy load, the security server is
-		 * marked as initialized and ready to handle requests and
-		 * any objects created prior to policy load are then labeled.
+		 * After first policy load, any objects created prior
+		 * to policy load need to be labeled.
 		 */
 		selinux_complete_init();
 	} else {
 		/* Free the old policy */
 		synchronize_rcu();
+		if (unlikely(!refcount_dec_and_test(&oldpolicy->refcount)))
+			wait_for_completion(&oldpolicy->eol);
 		selinux_policy_free(oldpolicy);
 	}
 
@@ -2258,6 +2272,9 @@  int security_load_policy(struct selinux_state *state, void *data, size_t len,
 	if (!newpolicy)
 		return -ENOMEM;
 
+	refcount_set(&newpolicy->refcount, 1);
+	init_completion(&newpolicy->eol);
+
 	newpolicy->sidtab = kzalloc(sizeof(*newpolicy->sidtab), GFP_KERNEL);
 	if (!newpolicy->sidtab)
 		goto err;
@@ -2340,13 +2357,10 @@  int security_port_sid(struct selinux_state *state,
 	struct ocontext *c;
 	int rc = 0;
 
-	if (!selinux_initialized(state)) {
-		*out_sid = SECINITSID_PORT;
+	policy = selinux_policy_get(state);
+	if (!policy)
 		return 0;
-	}
 
-	rcu_read_lock();
-	policy = rcu_dereference(state->policy);
 	policydb = &policy->policydb;
 	sidtab = policy->sidtab;
 
@@ -2372,7 +2386,7 @@  int security_port_sid(struct selinux_state *state,
 	}
 
 out:
-	rcu_read_unlock();
+	selinux_policy_put(policy);
 	return rc;
 }
 
@@ -2391,13 +2405,12 @@  int security_ib_pkey_sid(struct selinux_state *state,
 	struct ocontext *c;
 	int rc = 0;
 
-	if (!selinux_initialized(state)) {
+	policy = selinux_policy_get(state);
+	if (!policy) {
 		*out_sid = SECINITSID_UNLABELED;
 		return 0;
 	}
 
-	rcu_read_lock();
-	policy = rcu_dereference(state->policy);
 	policydb = &policy->policydb;
 	sidtab = policy->sidtab;
 
@@ -2424,7 +2437,7 @@  int security_ib_pkey_sid(struct selinux_state *state,
 		*out_sid = SECINITSID_UNLABELED;
 
 out:
-	rcu_read_unlock();
+	selinux_policy_put(policy);
 	return rc;
 }
 
@@ -2443,13 +2456,12 @@  int security_ib_endport_sid(struct selinux_state *state,
 	struct ocontext *c;
 	int rc = 0;
 
-	if (!selinux_initialized(state)) {
+	policy = selinux_policy_get(state);
+	if (!policy) {
 		*out_sid = SECINITSID_UNLABELED;
 		return 0;
 	}
 
-	rcu_read_lock();
-	policy = rcu_dereference(state->policy);
 	policydb = &policy->policydb;
 	sidtab = policy->sidtab;
 
@@ -2476,7 +2488,7 @@  int security_ib_endport_sid(struct selinux_state *state,
 		*out_sid = SECINITSID_UNLABELED;
 
 out:
-	rcu_read_unlock();
+	selinux_policy_put(policy);
 	return rc;
 }
 
@@ -2494,13 +2506,12 @@  int security_netif_sid(struct selinux_state *state,
 	int rc = 0;
 	struct ocontext *c;
 
-	if (!selinux_initialized(state)) {
+	policy = selinux_policy_get(state);
+	if (!policy) {
 		*if_sid = SECINITSID_NETIF;
 		return 0;
 	}
 
-	rcu_read_lock();
-	policy = rcu_dereference(state->policy);
 	policydb = &policy->policydb;
 	sidtab = policy->sidtab;
 
@@ -2527,7 +2538,7 @@  int security_netif_sid(struct selinux_state *state,
 		*if_sid = SECINITSID_NETIF;
 
 out:
-	rcu_read_unlock();
+	selinux_policy_put(policy);
 	return rc;
 }
 
@@ -2563,13 +2574,12 @@  int security_node_sid(struct selinux_state *state,
 	int rc;
 	struct ocontext *c;
 
-	if (!selinux_initialized(state)) {
+	policy = selinux_policy_get(state);
+	if (!policy) {
 		*out_sid = SECINITSID_NODE;
 		return 0;
 	}
 
-	rcu_read_lock();
-	policy = rcu_dereference(state->policy);
 	policydb = &policy->policydb;
 	sidtab = policy->sidtab;
 
@@ -2626,7 +2636,7 @@  int security_node_sid(struct selinux_state *state,
 
 	rc = 0;
 out:
-	rcu_read_unlock();
+	selinux_policy_put(policy);
 	return rc;
 }
 
@@ -2666,11 +2676,10 @@  int security_get_user_sids(struct selinux_state *state,
 	*sids = NULL;
 	*nel = 0;
 
-	if (!selinux_initialized(state))
+	policy = selinux_policy_get(state);
+	if (!policy)
 		goto out;
 
-	rcu_read_lock();
-	policy = rcu_dereference(state->policy);
 	policydb = &policy->policydb;
 	sidtab = policy->sidtab;
 
@@ -2723,7 +2732,7 @@  int security_get_user_sids(struct selinux_state *state,
 	}
 	rc = 0;
 out_unlock:
-	rcu_read_unlock();
+	selinux_policy_put(policy);
 	if (rc || !mynel) {
 		kfree(mysids);
 		goto out;
@@ -2837,16 +2846,15 @@  int security_genfs_sid(struct selinux_state *state,
 	struct selinux_policy *policy;
 	int retval;
 
-	if (!selinux_initialized(state)) {
+	policy = selinux_policy_get(state);
+	if (!policy) {
 		*sid = SECINITSID_UNLABELED;
 		return 0;
 	}
 
-	rcu_read_lock();
-	policy = rcu_dereference(state->policy);
 	retval = __security_genfs_sid(policy,
 				fstype, path, orig_sclass, sid);
-	rcu_read_unlock();
+	selinux_policy_put(policy);
 	return retval;
 }
 
@@ -2874,14 +2882,13 @@  int security_fs_use(struct selinux_state *state, struct super_block *sb)
 	struct superblock_security_struct *sbsec = sb->s_security;
 	const char *fstype = sb->s_type->name;
 
-	if (!selinux_initialized(state)) {
+	policy = selinux_policy_get(state);
+	if (!policy) {
 		sbsec->behavior = SECURITY_FS_USE_NONE;
 		sbsec->sid = SECINITSID_UNLABELED;
 		return 0;
 	}
 
-	rcu_read_lock();
-	policy = rcu_dereference(state->policy);
 	policydb = &policy->policydb;
 	sidtab = policy->sidtab;
 
@@ -2913,7 +2920,7 @@  int security_fs_use(struct selinux_state *state, struct super_block *sb)
 	}
 
 out:
-	rcu_read_unlock();
+	selinux_policy_put(policy);
 	return rc;
 }
 
@@ -2976,17 +2983,15 @@  int security_set_bools(struct selinux_state *state, u32 len, int *values)
 	int rc;
 	u32 i, seqno = 0;
 
-	if (!selinux_initialized(state))
-		return -EINVAL;
-
 	/*
 	 * NOTE: We do not need to take the rcu read lock
 	 * around the code below because other policy-modifying
 	 * operations are already excluded by selinuxfs via
 	 * fsi->mutex.
 	 */
-
 	oldpolicy = rcu_dereference_check(state->policy, 1);
+	if (!oldpolicy)
+		return -EINVAL;
 
 	/* Consistency check on number of booleans, should never fail */
 	if (WARN_ON(len != oldpolicy->policydb.p_bools.nprim))
@@ -2996,6 +3001,10 @@  int security_set_bools(struct selinux_state *state, u32 len, int *values)
 	if (!newpolicy)
 		return -ENOMEM;
 
+	/* Re-init lifecycle management fields. */
+	refcount_set(&newpolicy->refcount, 1);
+	init_completion(&newpolicy->eol);
+
 	/*
 	 * Deep copy only the parts of the policydb that might be
 	 * modified as a result of changing booleans.
@@ -3040,6 +3049,8 @@  int security_set_bools(struct selinux_state *state, u32 len, int *values)
 	 * structure itself but not what it references.
 	 */
 	synchronize_rcu();
+	if (unlikely(!refcount_dec_and_test(&oldpolicy->refcount)))
+		wait_for_completion(&oldpolicy->eol);
 	selinux_policy_cond_free(oldpolicy);
 
 	/* Notify others of the policy change */
@@ -3055,11 +3066,10 @@  int security_get_bool_value(struct selinux_state *state,
 	int rc;
 	u32 len;
 
-	if (!selinux_initialized(state))
+	policy = selinux_policy_get(state);
+	if (!policy)
 		return 0;
 
-	rcu_read_lock();
-	policy = rcu_dereference(state->policy);
 	policydb = &policy->policydb;
 
 	rc = -EFAULT;
@@ -3069,7 +3079,7 @@  int security_get_bool_value(struct selinux_state *state,
 
 	rc = policydb->bool_val_to_struct[index]->state;
 out:
-	rcu_read_unlock();
+	selinux_policy_put(policy);
 	return rc;
 }
 
@@ -3120,15 +3130,14 @@  int security_sid_mls_copy(struct selinux_state *state,
 	int rc;
 
 	rc = 0;
-	if (!selinux_initialized(state)) {
+	policy = selinux_policy_get(state);
+	if (!policy) {
 		*new_sid = sid;
 		goto out;
 	}
 
 	context_init(&newcon);
 
-	rcu_read_lock();
-	policy = rcu_dereference(state->policy);
 	policydb = &policy->policydb;
 	sidtab = policy->sidtab;
 
@@ -3184,7 +3193,7 @@  int security_sid_mls_copy(struct selinux_state *state,
 	}
 	rc = sidtab_context_to_sid(sidtab, &newcon, new_sid);
 out_unlock:
-	rcu_read_unlock();
+	selinux_policy_put(policy);
 	context_destroy(&newcon);
 out:
 	return rc;
@@ -3239,11 +3248,10 @@  int security_net_peersid_resolve(struct selinux_state *state,
 		return 0;
 	}
 
-	if (!selinux_initialized(state))
+	policy = selinux_policy_get(state);
+	if (!policy)
 		return 0;
 
-	rcu_read_lock();
-	policy = rcu_dereference(state->policy);
 	policydb = &policy->policydb;
 	sidtab = policy->sidtab;
 
@@ -3282,7 +3290,7 @@  int security_net_peersid_resolve(struct selinux_state *state,
 	 * expressive */
 	*peer_sid = xfrm_sid;
 out:
-	rcu_read_unlock();
+	selinux_policy_put(policy);
 	return rc;
 }
 
@@ -3389,13 +3397,12 @@  int security_get_reject_unknown(struct selinux_state *state)
 	struct selinux_policy *policy;
 	int value;
 
-	if (!selinux_initialized(state))
+	policy = selinux_policy_get(state);
+	if (!policy)
 		return 0;
 
-	rcu_read_lock();
-	policy = rcu_dereference(state->policy);
 	value = policy->policydb.reject_unknown;
-	rcu_read_unlock();
+	selinux_policy_put(policy);
 	return value;
 }
 
@@ -3404,13 +3411,12 @@  int security_get_allow_unknown(struct selinux_state *state)
 	struct selinux_policy *policy;
 	int value;
 
-	if (!selinux_initialized(state))
+	policy = selinux_policy_get(state);
+	if (!policy)
 		return 0;
 
-	rcu_read_lock();
-	policy = rcu_dereference(state->policy);
 	value = policy->policydb.allow_unknown;
-	rcu_read_unlock();
+	selinux_policy_put(policy);
 	return value;
 }
 
@@ -3430,13 +3436,12 @@  int security_policycap_supported(struct selinux_state *state,
 	struct selinux_policy *policy;
 	int rc;
 
-	if (!selinux_initialized(state))
+	policy = selinux_policy_get(state);
+	if (!policy)
 		return 0;
 
-	rcu_read_lock();
-	policy = rcu_dereference(state->policy);
 	rc = ebitmap_get_bit(&policy->policydb.policycaps, req_cap);
-	rcu_read_unlock();
+	selinux_policy_put(policy);
 
 	return rc;
 }
@@ -3470,9 +3475,6 @@  int selinux_audit_rule_init(u32 field, u32 op, char *rulestr, void **vrule)
 
 	*rule = NULL;
 
-	if (!selinux_initialized(state))
-		return -EOPNOTSUPP;
-
 	switch (field) {
 	case AUDIT_SUBJ_USER:
 	case AUDIT_SUBJ_ROLE:
@@ -3497,14 +3499,16 @@  int selinux_audit_rule_init(u32 field, u32 op, char *rulestr, void **vrule)
 		return -EINVAL;
 	}
 
+	policy = selinux_policy_get(state);
+	if (!policy)
+		return -EOPNOTSUPP;
+
 	tmprule = kzalloc(sizeof(struct selinux_audit_rule), GFP_KERNEL);
 	if (!tmprule)
 		return -ENOMEM;
 
 	context_init(&tmprule->au_ctxt);
 
-	rcu_read_lock();
-	policy = rcu_dereference(state->policy);
 	policydb = &policy->policydb;
 
 	tmprule->au_seqno = policy->latest_granting;
@@ -3546,7 +3550,7 @@  int selinux_audit_rule_init(u32 field, u32 op, char *rulestr, void **vrule)
 	}
 	rc = 0;
 out:
-	rcu_read_unlock();
+	selinux_policy_put(policy);
 
 	if (rc) {
 		selinux_audit_rule_free(tmprule);
@@ -3597,13 +3601,10 @@  int selinux_audit_rule_match(u32 sid, u32 field, u32 op, void *vrule)
 		return -ENOENT;
 	}
 
-	if (!selinux_initialized(state))
+	policy = selinux_policy_get(state);
+	if (!policy)
 		return 0;
 
-	rcu_read_lock();
-
-	policy = rcu_dereference(state->policy);
-
 	if (rule->au_seqno < policy->latest_granting) {
 		match = -ESTALE;
 		goto out;
@@ -3693,7 +3694,7 @@  int selinux_audit_rule_match(u32 sid, u32 field, u32 op, void *vrule)
 	}
 
 out:
-	rcu_read_unlock();
+	selinux_policy_put(policy);
 	return match;
 }
 
@@ -3778,13 +3779,12 @@  int security_netlbl_secattr_to_sid(struct selinux_state *state,
 	struct context *ctx;
 	struct context ctx_new;
 
-	if (!selinux_initialized(state)) {
+	policy = selinux_policy_get(state);
+	if (!policy) {
 		*sid = SECSID_NULL;
 		return 0;
 	}
 
-	rcu_read_lock();
-	policy = rcu_dereference(state->policy);
 	policydb = &policy->policydb;
 	sidtab = policy->sidtab;
 
@@ -3822,12 +3822,12 @@  int security_netlbl_secattr_to_sid(struct selinux_state *state,
 	} else
 		*sid = SECSID_NULL;
 
-	rcu_read_unlock();
+	selinux_policy_put(policy);
 	return 0;
 out_free:
 	ebitmap_destroy(&ctx_new.range.level[0].cat);
 out:
-	rcu_read_unlock();
+	selinux_policy_put(policy);
 	return rc;
 }
 
@@ -3849,11 +3849,10 @@  int security_netlbl_sid_to_secattr(struct selinux_state *state,
 	int rc;
 	struct context *ctx;
 
-	if (!selinux_initialized(state))
+	policy = selinux_policy_get(state);
+	if (!policy)
 		return 0;
 
-	rcu_read_lock();
-	policy = rcu_dereference(state->policy);
 	policydb = &policy->policydb;
 
 	rc = -ENOENT;
@@ -3872,7 +3871,7 @@  int security_netlbl_sid_to_secattr(struct selinux_state *state,
 	mls_export_netlbl_lvl(policydb, ctx, secattr);
 	rc = mls_export_netlbl_cat(policydb, ctx, secattr);
 out:
-	rcu_read_unlock();
+	selinux_policy_put(policy);
 	return rc;
 }
 #endif /* CONFIG_NETLABEL */
@@ -3890,25 +3889,22 @@  int security_read_policy(struct selinux_state *state,
 	int rc;
 	struct policy_file fp;
 
-	if (!selinux_initialized(state))
+	policy = selinux_policy_get(state);
+	if (!policy)
 		return -EINVAL;
 
-	rcu_read_lock();
-	policy = rcu_dereference(state->policy);
 	*len = policy->policydb.len;
-	rcu_read_unlock();
 
 	*data = vmalloc_user(*len);
-	if (!*data)
-		return -ENOMEM;
-
-	fp.data = *data;
-	fp.len = *len;
+	if (*data) {
+		fp.data = *data;
+		fp.len = *len;
 
-	rcu_read_lock();
-	policy = rcu_dereference(state->policy);
-	rc = policydb_write(&policy->policydb, &fp);
-	rcu_read_unlock();
+		rc = policydb_write(&policy->policydb, &fp);
+	} else {
+		rc = -ENOMEM;
+	}
+	selinux_policy_put(policy);
 
 	if (rc)
 		return rc;
diff --git a/security/selinux/ss/services.h b/security/selinux/ss/services.h
index 9555ad074303c..104ca16037baa 100644
--- a/security/selinux/ss/services.h
+++ b/security/selinux/ss/services.h
@@ -9,6 +9,9 @@ 
 
 #include "policydb.h"
 
+#include <linux/refcount.h>
+#include <linux/completion.h>
+
 /* Mapping for a single class */
 struct selinux_mapping {
 	u16 value; /* policy value for class */
@@ -23,6 +26,9 @@  struct selinux_map {
 };
 
 struct selinux_policy {
+	refcount_t refcount;
+	struct completion eol;
+
 	struct sidtab *sidtab;
 	struct policydb policydb;
 	struct selinux_map map;