diff mbox series

kvm: x86: keep srcu writer side operation mutually exclusive

Message ID CAPm50a+aygp3T1mNjzGXtL2nyNm-mHFZ3YO8F7eO0gCxZDuQsA@mail.gmail.com (mailing list archive)
State New, archived
Headers show
Series kvm: x86: keep srcu writer side operation mutually exclusive | expand

Commit Message

Hao Peng Oct. 7, 2022, 4 p.m. UTC
From: Peng Hao <flyingpeng@tencent.com>

Synchronization operations on the writer side of SRCU should be
invoked within the mutex.

Signed-off-by: Peng Hao <flyingpeng@tencent.com>
---
 arch/x86/kvm/pmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

        kfree(filter);
--
2.27.0

Comments

Sean Christopherson Oct. 7, 2022, 5:12 p.m. UTC | #1
On Sat, Oct 08, 2022, Hao Peng wrote:
> From: Peng Hao <flyingpeng@tencent.com>
> 
> Synchronization operations on the writer side of SRCU should be
> invoked within the mutex.

Why?  Synchronizing SRCU is necessary only to ensure that all previous readers go
away before the old filter is freed.  There's no need to serialize synchronization
between writers.  The mutex ensures each writer operates on the "new" filter that's
set by the previous writer, i.e. there's no danger of a double-free.  And the next
writer will wait for readers to _its_ "new" filter.

I think it's a moot point though, as this is a subset of patch I posted[*] to fix
other issues with the PMU event filter.

[*] https://lore.kernel.org/all/20220923001355.3741194-2-seanjc@google.com
Hao Peng Oct. 9, 2022, 11:45 a.m. UTC | #2
On Sat, Oct 8, 2022 at 1:12 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Sat, Oct 08, 2022, Hao Peng wrote:
> > From: Peng Hao <flyingpeng@tencent.com>
> >
> > Synchronization operations on the writer side of SRCU should be
> > invoked within the mutex.
>
> Why?  Synchronizing SRCU is necessary only to ensure that all previous readers go
> away before the old filter is freed.  There's no need to serialize synchronization
> between writers.  The mutex ensures each writer operates on the "new" filter that's
> set by the previous writer, i.e. there's no danger of a double-free.  And the next
> writer will wait for readers to _its_ "new" filter.
>
Array srcu_lock_count/srcu_unlock_count[] in srcu_data, which is used
alternately to determine
which readers need to wait to get out of the critical area. If  two
synchronize_srcu are initiated concurrently,
there may be a problem with the judgment of gp. But if it is confirmed
that there will be no writer concurrency,
it is not necessary to ensure that synchronize_srcu is executed within
the scope of the mutex lock.
Thanks.
> I think it's a moot point though, as this is a subset of patch I posted[*] to fix
> other issues with the PMU event filter.
>
> [*] https://lore.kernel.org/all/20220923001355.3741194-2-seanjc@google.com
Sean Christopherson Oct. 10, 2022, 5:38 p.m. UTC | #3
On Sun, Oct 09, 2022, Hao Peng wrote:
> On Sat, Oct 8, 2022 at 1:12 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Sat, Oct 08, 2022, Hao Peng wrote:
> > > From: Peng Hao <flyingpeng@tencent.com>
> > >
> > > Synchronization operations on the writer side of SRCU should be
> > > invoked within the mutex.
> >
> > Why?  Synchronizing SRCU is necessary only to ensure that all previous readers go
> > away before the old filter is freed.  There's no need to serialize synchronization
> > between writers.  The mutex ensures each writer operates on the "new" filter that's
> > set by the previous writer, i.e. there's no danger of a double-free.  And the next
> > writer will wait for readers to _its_ "new" filter.
> >
> Array srcu_lock_count/srcu_unlock_count[] in srcu_data, which is used
> alternately to determine
> which readers need to wait to get out of the critical area. If  two
> synchronize_srcu are initiated concurrently,
> there may be a problem with the judgment of gp. But if it is confirmed
> that there will be no writer concurrency,
> it is not necessary to ensure that synchronize_srcu is executed within
> the scope of the mutex lock.

I don't see anything in the RCU documentation or code that suggests that callers
need to serialize synchronization calls.  E.g. the "tree" SRCU implementation uses
a dedicated mutex to serialize grace period work 

	struct mutex srcu_gp_mutex;		/* Serialize GP work. */

static void srcu_advance_state(struct srcu_struct *ssp)
{
	int idx;

	mutex_lock(&ssp->srcu_gp_mutex);

	<magic>
}


and its state machine explicitly accounts for "Someone else" starting a grace
period

		if (idx != SRCU_STATE_IDLE) {
			mutex_unlock(&ssp->srcu_gp_mutex);
			return; /* Someone else started the grace period. */
		}

and srcu_gp_end() also guards against creating more than 2 grace periods.

	/* Prevent more than one additional grace period. */
	mutex_lock(&ssp->srcu_cb_mutex);

And if this is a subtle requirement, there is a lot of broken kernel code, e.g.
mmu_notifier, other KVM code, srcu_notifier_chain_unregister(), etc...
Hao Peng Oct. 24, 2022, 3:27 a.m. UTC | #4
On Tue, Oct 11, 2022 at 1:38 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Sun, Oct 09, 2022, Hao Peng wrote:
> > On Sat, Oct 8, 2022 at 1:12 AM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Sat, Oct 08, 2022, Hao Peng wrote:
> > > > From: Peng Hao <flyingpeng@tencent.com>
> > > >
> > > > Synchronization operations on the writer side of SRCU should be
> > > > invoked within the mutex.
> > >
> > > Why?  Synchronizing SRCU is necessary only to ensure that all previous readers go
> > > away before the old filter is freed.  There's no need to serialize synchronization
> > > between writers.  The mutex ensures each writer operates on the "new" filter that's
> > > set by the previous writer, i.e. there's no danger of a double-free.  And the next
> > > writer will wait for readers to _its_ "new" filter.
> > >
> > Array srcu_lock_count/srcu_unlock_count[] in srcu_data, which is used
> > alternately to determine
> > which readers need to wait to get out of the critical area. If  two
> > synchronize_srcu are initiated concurrently,
> > there may be a problem with the judgment of gp. But if it is confirmed
> > that there will be no writer concurrency,
> > it is not necessary to ensure that synchronize_srcu is executed within
> > the scope of the mutex lock.
>
> I don't see anything in the RCU documentation or code that suggests that callers
> need to serialize synchronization calls.  E.g. the "tree" SRCU implementation uses
> a dedicated mutex to serialize grace period work
>
>         struct mutex srcu_gp_mutex;             /* Serialize GP work. */
>
> static void srcu_advance_state(struct srcu_struct *ssp)
> {
>         int idx;
>
>         mutex_lock(&ssp->srcu_gp_mutex);
>
>         <magic>
> }
>
>
> and its state machine explicitly accounts for "Someone else" starting a grace
> period
>
>                 if (idx != SRCU_STATE_IDLE) {
>                         mutex_unlock(&ssp->srcu_gp_mutex);
>                         return; /* Someone else started the grace period. */
>                 }
>
> and srcu_gp_end() also guards against creating more than 2 grace periods.
>
>         /* Prevent more than one additional grace period. */
>         mutex_lock(&ssp->srcu_cb_mutex);
>
> And if this is a subtle requirement, there is a lot of broken kernel code, e.g.
> mmu_notifier, other KVM code, srcu_notifier_chain_unregister(), etc...

srcu_gp_mutex is meaningless because the workqueue already guarantees
that the same work_struct will not be reentrant.
If synchronize_srcu is not mutually exclusive on the update side, it may cause
a GP to fail for a long time. I will continue to analyze when I have time.
diff mbox series

Patch

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 8a7dbe2c469a..619151849980 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -602,9 +602,9 @@  int kvm_vm_ioctl_set_pmu_event_filter(struct kvm
*kvm, void __user *argp)

        mutex_lock(&kvm->lock);
        filter = rcu_replace_pointer(kvm->arch.pmu_event_filter, filter, 1);
+       synchronize_srcu_expedited(&kvm->srcu);
        mutex_unlock(&kvm->lock);

-       synchronize_srcu_expedited(&kvm->srcu);
        r = 0;
 cleanup: