[3/3] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd

Message ID 0e5ff7f7-855c-ea28-fdee-73c062c3d289@arm.com (mailing list archive)
State New, archived

Commit Message

Marc Zyngier March 15, 2017, 1:28 p.m. UTC
On 15/03/17 10:56, Christoffer Dall wrote:
> On Wed, Mar 15, 2017 at 09:39:26AM +0000, Marc Zyngier wrote:
>> On 15/03/17 09:21, Christoffer Dall wrote:
>>> On Tue, Mar 14, 2017 at 02:52:34PM +0000, Suzuki K Poulose wrote:
>>>> In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
>>>> unmap_stage2_range() on the entire memory range for the guest. This could
>>>> cause problems with other callers (e.g., munmap on a memslot) trying to
>>>> unmap a range.
>>>>
>>>> Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
>>>> Cc: stable@vger.kernel.org # v3.10+
>>>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>>>> Cc: Christoffer Dall <christoffer.dall@linaro.org>
>>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>>> ---
>>>>  arch/arm/kvm/mmu.c | 3 +++
>>>>  1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>>> index 13b9c1f..b361f71 100644
>>>> --- a/arch/arm/kvm/mmu.c
>>>> +++ b/arch/arm/kvm/mmu.c
>>>> @@ -831,7 +831,10 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
>>>>  	if (kvm->arch.pgd == NULL)
>>>>  		return;
>>>>  
>>>> +	spin_lock(&kvm->mmu_lock);
>>>>  	unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
>>>> +	spin_unlock(&kvm->mmu_lock);
>>>> +
>>>
>>> This ends up holding the spin lock for potentially quite a while, where
>>> we can do things like __flush_dcache_area(), which I think can fault.
>>
>> I believe we're always using the linear mapping (or kmap on 32bit) in
>> order not to fault.
>>
> 
> ok, then there's just the concern that we may be holding a spinlock for
> a very long time.  I seem to recall Mario once added something where he
> unlocked and gave a chance to schedule something else for each PUD or
> something like that, because he ran into the issue during migration.  Am
> I confusing this with something else?

That definitely rings a bell: stage2_wp_range() uses that kind of trick
to give the system a chance to breathe. Maybe we could use a similar
trick in our S2 unmapping code? How about this (completely untested) patch:

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 962616fd4ddd..1786c24212d4 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -292,8 +292,13 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
 	phys_addr_t addr = start, end = start + size;
 	phys_addr_t next;
 
+	BUG_ON(!spin_is_locked(&kvm->mmu_lock));
+
 	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
 	do {
+		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
+			cond_resched_lock(&kvm->mmu_lock);
+
 		next = stage2_pgd_addr_end(addr, end);
 		if (!stage2_pgd_none(*pgd))
 			unmap_stage2_puds(kvm, pgd, addr, next);

The additional BUG_ON() is just for my own peace of mind - we seem to
have missed a couple of these lately, and the "breathing" code makes
it imperative that this lock be taken before entering the
function.
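
For reference, the "breathing" pattern in stage2_wp_range() looks roughly
like this (quoting from memory, so treat it as a sketch rather than the
exact source):

	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
	do {
		/*
		 * Release kvm->mmu_lock periodically if the region is
		 * large; holding it for too long starves other vCPUs
		 * and can trip the lockup detectors.
		 */
		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
			cond_resched_lock(&kvm->mmu_lock);

		next = stage2_pgd_addr_end(addr, end);
		if (stage2_pgd_present(*pgd))
			stage2_wp_puds(pgd, addr, next);
	} while (pgd++, addr = next, addr != end);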

Thoughts?

	M.

Comments

Christoffer Dall March 15, 2017, 1:35 p.m. UTC | #1
On Wed, Mar 15, 2017 at 01:28:07PM +0000, Marc Zyngier wrote:
> On 15/03/17 10:56, Christoffer Dall wrote:
> > On Wed, Mar 15, 2017 at 09:39:26AM +0000, Marc Zyngier wrote:
> >> On 15/03/17 09:21, Christoffer Dall wrote:
> >>> On Tue, Mar 14, 2017 at 02:52:34PM +0000, Suzuki K Poulose wrote:
> >>>> In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
> >>>> unmap_stage2_range() on the entire memory range for the guest. This could
> >>>> cause problems with other callers (e.g., munmap on a memslot) trying to
> >>>> unmap a range.
> >>>>
> >>>> Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
> >>>> Cc: stable@vger.kernel.org # v3.10+
> >>>> Cc: Marc Zyngier <marc.zyngier@arm.com>
> >>>> Cc: Christoffer Dall <christoffer.dall@linaro.org>
> >>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> >>>> ---
> >>>>  arch/arm/kvm/mmu.c | 3 +++
> >>>>  1 file changed, 3 insertions(+)
> >>>>
> >>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> >>>> index 13b9c1f..b361f71 100644
> >>>> --- a/arch/arm/kvm/mmu.c
> >>>> +++ b/arch/arm/kvm/mmu.c
> >>>> @@ -831,7 +831,10 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
> >>>>  	if (kvm->arch.pgd == NULL)
> >>>>  		return;
> >>>>  
> >>>> +	spin_lock(&kvm->mmu_lock);
> >>>>  	unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
> >>>> +	spin_unlock(&kvm->mmu_lock);
> >>>> +
> >>>
> >>> This ends up holding the spin lock for potentially quite a while, where
> >>> we can do things like __flush_dcache_area(), which I think can fault.
> >>
> >> I believe we're always using the linear mapping (or kmap on 32bit) in
> >> order not to fault.
> >>
> > 
> > ok, then there's just the concern that we may be holding a spinlock for
> > a very long time.  I seem to recall Mario once added something where he
> > unlocked and gave a chance to schedule something else for each PUD or
> > something like that, because he ran into the issue during migration.  Am
> > I confusing this with something else?
> 
> That definitely rings a bell: stage2_wp_range() uses that kind of trick
> to give the system a chance to breathe. Maybe we could use a similar
> trick in our S2 unmapping code? How about this (completely untested) patch:
> 
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 962616fd4ddd..1786c24212d4 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -292,8 +292,13 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>  	phys_addr_t addr = start, end = start + size;
>  	phys_addr_t next;
>  
> +	BUG_ON(!spin_is_locked(&kvm->mmu_lock));
> +
>  	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>  	do {
> +		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
> +			cond_resched_lock(&kvm->mmu_lock);
> +
>  		next = stage2_pgd_addr_end(addr, end);
>  		if (!stage2_pgd_none(*pgd))
>  			unmap_stage2_puds(kvm, pgd, addr, next);
> 
> The additional BUG_ON() is just for my own peace of mind - we seem to
> have missed a couple of these lately, and the "breathing" code makes
> it imperative that this lock be taken before entering the
> function.
> 

Looks good to me!

-Christoffer
Marc Zyngier March 15, 2017, 1:43 p.m. UTC | #2
On 15/03/17 13:35, Christoffer Dall wrote:
> On Wed, Mar 15, 2017 at 01:28:07PM +0000, Marc Zyngier wrote:
>> On 15/03/17 10:56, Christoffer Dall wrote:
>>> On Wed, Mar 15, 2017 at 09:39:26AM +0000, Marc Zyngier wrote:
>>>> On 15/03/17 09:21, Christoffer Dall wrote:
>>>>> On Tue, Mar 14, 2017 at 02:52:34PM +0000, Suzuki K Poulose wrote:
>>>>>> In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
>>>>>> unmap_stage2_range() on the entire memory range for the guest. This could
> >>>>>> cause problems with other callers (e.g., munmap on a memslot) trying to
>>>>>> unmap a range.
>>>>>>
>>>>>> Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
>>>>>> Cc: stable@vger.kernel.org # v3.10+
>>>>>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>>>>>> Cc: Christoffer Dall <christoffer.dall@linaro.org>
>>>>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>>>>> ---
>>>>>>  arch/arm/kvm/mmu.c | 3 +++
>>>>>>  1 file changed, 3 insertions(+)
>>>>>>
>>>>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>>>>> index 13b9c1f..b361f71 100644
>>>>>> --- a/arch/arm/kvm/mmu.c
>>>>>> +++ b/arch/arm/kvm/mmu.c
>>>>>> @@ -831,7 +831,10 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
>>>>>>  	if (kvm->arch.pgd == NULL)
>>>>>>  		return;
>>>>>>  
>>>>>> +	spin_lock(&kvm->mmu_lock);
>>>>>>  	unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
>>>>>> +	spin_unlock(&kvm->mmu_lock);
>>>>>> +
>>>>>
>>>>> This ends up holding the spin lock for potentially quite a while, where
>>>>> we can do things like __flush_dcache_area(), which I think can fault.
>>>>
>>>> I believe we're always using the linear mapping (or kmap on 32bit) in
>>>> order not to fault.
>>>>
>>>
>>> ok, then there's just the concern that we may be holding a spinlock for
>>> a very long time.  I seem to recall Mario once added something where he
>>> unlocked and gave a chance to schedule something else for each PUD or
>>> something like that, because he ran into the issue during migration.  Am
>>> I confusing this with something else?
>>
>> That definitely rings a bell: stage2_wp_range() uses that kind of trick
>> to give the system a chance to breathe. Maybe we could use a similar
>> trick in our S2 unmapping code? How about this (completely untested) patch:
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 962616fd4ddd..1786c24212d4 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -292,8 +292,13 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>>  	phys_addr_t addr = start, end = start + size;
>>  	phys_addr_t next;
>>  
>> +	BUG_ON(!spin_is_locked(&kvm->mmu_lock));
>> +
>>  	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>>  	do {
>> +		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
>> +			cond_resched_lock(&kvm->mmu_lock);
>> +
>>  		next = stage2_pgd_addr_end(addr, end);
>>  		if (!stage2_pgd_none(*pgd))
>>  			unmap_stage2_puds(kvm, pgd, addr, next);
>>
>> The additional BUG_ON() is just for my own peace of mind - we seem to
>> have missed a couple of these lately, and the "breathing" code makes
>> it imperative that this lock be taken before entering the
>> function.
>>
> 
> Looks good to me!

OK. I'll stash that on top of Suzuki's series, and start running some
actual tests... ;-)

Thanks,

	M.
Robin Murphy March 15, 2017, 1:50 p.m. UTC | #3
Hi Marc,

On 15/03/17 13:43, Marc Zyngier wrote:
> On 15/03/17 13:35, Christoffer Dall wrote:
>> On Wed, Mar 15, 2017 at 01:28:07PM +0000, Marc Zyngier wrote:
>>> On 15/03/17 10:56, Christoffer Dall wrote:
>>>> On Wed, Mar 15, 2017 at 09:39:26AM +0000, Marc Zyngier wrote:
>>>>> On 15/03/17 09:21, Christoffer Dall wrote:
>>>>>> On Tue, Mar 14, 2017 at 02:52:34PM +0000, Suzuki K Poulose wrote:
>>>>>>> In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
>>>>>>> unmap_stage2_range() on the entire memory range for the guest. This could
> >>>>>>> cause problems with other callers (e.g., munmap on a memslot) trying to
>>>>>>> unmap a range.
>>>>>>>
>>>>>>> Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
>>>>>>> Cc: stable@vger.kernel.org # v3.10+
>>>>>>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>>>>>>> Cc: Christoffer Dall <christoffer.dall@linaro.org>
>>>>>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>>>>>> ---
>>>>>>>  arch/arm/kvm/mmu.c | 3 +++
>>>>>>>  1 file changed, 3 insertions(+)
>>>>>>>
>>>>>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>>>>>> index 13b9c1f..b361f71 100644
>>>>>>> --- a/arch/arm/kvm/mmu.c
>>>>>>> +++ b/arch/arm/kvm/mmu.c
>>>>>>> @@ -831,7 +831,10 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
>>>>>>>  	if (kvm->arch.pgd == NULL)
>>>>>>>  		return;
>>>>>>>  
>>>>>>> +	spin_lock(&kvm->mmu_lock);
>>>>>>>  	unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
>>>>>>> +	spin_unlock(&kvm->mmu_lock);
>>>>>>> +
>>>>>>
>>>>>> This ends up holding the spin lock for potentially quite a while, where
>>>>>> we can do things like __flush_dcache_area(), which I think can fault.
>>>>>
>>>>> I believe we're always using the linear mapping (or kmap on 32bit) in
>>>>> order not to fault.
>>>>>
>>>>
>>>> ok, then there's just the concern that we may be holding a spinlock for
>>>> a very long time.  I seem to recall Mario once added something where he
>>>> unlocked and gave a chance to schedule something else for each PUD or
>>>> something like that, because he ran into the issue during migration.  Am
>>>> I confusing this with something else?
>>>
>>> That definitely rings a bell: stage2_wp_range() uses that kind of trick
>>> to give the system a chance to breathe. Maybe we could use a similar
>>> trick in our S2 unmapping code? How about this (completely untested) patch:
>>>
>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>> index 962616fd4ddd..1786c24212d4 100644
>>> --- a/arch/arm/kvm/mmu.c
>>> +++ b/arch/arm/kvm/mmu.c
>>> @@ -292,8 +292,13 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>>>  	phys_addr_t addr = start, end = start + size;
>>>  	phys_addr_t next;
>>>  
>>> +	BUG_ON(!spin_is_locked(&kvm->mmu_lock));

Nit: assert_spin_locked() is somewhat more pleasant (and currently looks
to expand to the exact same code).
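
That is, the check would become:

	assert_spin_locked(&kvm->mmu_lock);

which, if I remember the definitions in include/linux/spinlock.h correctly,
boils down to the very same BUG_ON:

	#define assert_spin_locked(lock)	assert_raw_spin_locked(&(lock)->rlock)
	#define assert_raw_spin_locked(x)	BUG_ON(!raw_spin_is_locked(x))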

Robin.

>>> +
>>>  	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>>>  	do {
>>> +		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
>>> +			cond_resched_lock(&kvm->mmu_lock);
>>> +
>>>  		next = stage2_pgd_addr_end(addr, end);
>>>  		if (!stage2_pgd_none(*pgd))
>>>  			unmap_stage2_puds(kvm, pgd, addr, next);
>>>
>>> The additional BUG_ON() is just for my own peace of mind - we seem to
>>> have missed a couple of these lately, and the "breathing" code makes
>>> it imperative that this lock be taken before entering the
>>> function.
>>>
>>
>> Looks good to me!
> 
> OK. I'll stash that on top of Suzuki's series, and start running some
> actual tests... ;-)
> 
> Thanks,
> 
> 	M.
>
Marc Zyngier March 15, 2017, 1:55 p.m. UTC | #4
On 15/03/17 13:50, Robin Murphy wrote:
> Hi Marc,
> 
> On 15/03/17 13:43, Marc Zyngier wrote:
>> On 15/03/17 13:35, Christoffer Dall wrote:
>>> On Wed, Mar 15, 2017 at 01:28:07PM +0000, Marc Zyngier wrote:
>>>> On 15/03/17 10:56, Christoffer Dall wrote:
>>>>> On Wed, Mar 15, 2017 at 09:39:26AM +0000, Marc Zyngier wrote:
>>>>>> On 15/03/17 09:21, Christoffer Dall wrote:
>>>>>>> On Tue, Mar 14, 2017 at 02:52:34PM +0000, Suzuki K Poulose wrote:
>>>>>>>> In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
>>>>>>>> unmap_stage2_range() on the entire memory range for the guest. This could
> >>>>>>>> cause problems with other callers (e.g., munmap on a memslot) trying to
>>>>>>>> unmap a range.
>>>>>>>>
>>>>>>>> Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
>>>>>>>> Cc: stable@vger.kernel.org # v3.10+
>>>>>>>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>>>>>>>> Cc: Christoffer Dall <christoffer.dall@linaro.org>
>>>>>>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>>>>>>> ---
>>>>>>>>  arch/arm/kvm/mmu.c | 3 +++
>>>>>>>>  1 file changed, 3 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>>>>>>> index 13b9c1f..b361f71 100644
>>>>>>>> --- a/arch/arm/kvm/mmu.c
>>>>>>>> +++ b/arch/arm/kvm/mmu.c
>>>>>>>> @@ -831,7 +831,10 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
>>>>>>>>  	if (kvm->arch.pgd == NULL)
>>>>>>>>  		return;
>>>>>>>>  
>>>>>>>> +	spin_lock(&kvm->mmu_lock);
>>>>>>>>  	unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
>>>>>>>> +	spin_unlock(&kvm->mmu_lock);
>>>>>>>> +
>>>>>>>
>>>>>>> This ends up holding the spin lock for potentially quite a while, where
>>>>>>> we can do things like __flush_dcache_area(), which I think can fault.
>>>>>>
>>>>>> I believe we're always using the linear mapping (or kmap on 32bit) in
>>>>>> order not to fault.
>>>>>>
>>>>>
>>>>> ok, then there's just the concern that we may be holding a spinlock for
>>>>> a very long time.  I seem to recall Mario once added something where he
>>>>> unlocked and gave a chance to schedule something else for each PUD or
>>>>> something like that, because he ran into the issue during migration.  Am
>>>>> I confusing this with something else?
>>>>
>>>> That definitely rings a bell: stage2_wp_range() uses that kind of trick
>>>> to give the system a chance to breathe. Maybe we could use a similar
>>>> trick in our S2 unmapping code? How about this (completely untested) patch:
>>>>
>>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>>> index 962616fd4ddd..1786c24212d4 100644
>>>> --- a/arch/arm/kvm/mmu.c
>>>> +++ b/arch/arm/kvm/mmu.c
>>>> @@ -292,8 +292,13 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>>>>  	phys_addr_t addr = start, end = start + size;
>>>>  	phys_addr_t next;
>>>>  
>>>> +	BUG_ON(!spin_is_locked(&kvm->mmu_lock));
> 
> Nit: assert_spin_locked() is somewhat more pleasant (and currently looks
> to expand to the exact same code).

Fancy!

Thanks,

	M.
Suzuki K Poulose March 15, 2017, 2:33 p.m. UTC | #5
On 15/03/17 13:28, Marc Zyngier wrote:
> On 15/03/17 10:56, Christoffer Dall wrote:
>> On Wed, Mar 15, 2017 at 09:39:26AM +0000, Marc Zyngier wrote:
>>> On 15/03/17 09:21, Christoffer Dall wrote:
>>>> On Tue, Mar 14, 2017 at 02:52:34PM +0000, Suzuki K Poulose wrote:
>>>>> In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
>>>>> unmap_stage2_range() on the entire memory range for the guest. This could
>>>>> cause problems with other callers (e.g., munmap on a memslot) trying to
>>>>> unmap a range.
>>>>>
>>>>> Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
>>>>> Cc: stable@vger.kernel.org # v3.10+
>>>>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>>>>> Cc: Christoffer Dall <christoffer.dall@linaro.org>
>>>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

...

>> ok, then there's just the concern that we may be holding a spinlock for
>> a very long time.  I seem to recall Mario once added something where he
>> unlocked and gave a chance to schedule something else for each PUD or
>> something like that, because he ran into the issue during migration.  Am
>> I confusing this with something else?
>
> That definitely rings a bell: stage2_wp_range() uses that kind of trick
> to give the system a chance to breathe. Maybe we could use a similar
> trick in our S2 unmapping code? How about this (completely untested) patch:
>
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 962616fd4ddd..1786c24212d4 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -292,8 +292,13 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>  	phys_addr_t addr = start, end = start + size;
>  	phys_addr_t next;
>
> +	BUG_ON(!spin_is_locked(&kvm->mmu_lock));
> +
>  	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>  	do {
> +		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
> +			cond_resched_lock(&kvm->mmu_lock);

nit: I think we could call cond_resched_lock() unconditionally here,
given __cond_resched_lock() already does all the above checks:

kernel/sched/core.c:

int __cond_resched_lock(spinlock_t *lock)
{
         int resched = should_resched(PREEMPT_LOCK_OFFSET);

...

         if (spin_needbreak(lock) || resched) {


Suzuki
Marc Zyngier March 15, 2017, 3:07 p.m. UTC | #6
On 15/03/17 14:33, Suzuki K Poulose wrote:
> On 15/03/17 13:28, Marc Zyngier wrote:
>> On 15/03/17 10:56, Christoffer Dall wrote:
>>> On Wed, Mar 15, 2017 at 09:39:26AM +0000, Marc Zyngier wrote:
>>>> On 15/03/17 09:21, Christoffer Dall wrote:
>>>>> On Tue, Mar 14, 2017 at 02:52:34PM +0000, Suzuki K Poulose wrote:
>>>>>> In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
>>>>>> unmap_stage2_range() on the entire memory range for the guest. This could
>>>>>> cause problems with other callers (e.g., munmap on a memslot) trying to
>>>>>> unmap a range.
>>>>>>
>>>>>> Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
>>>>>> Cc: stable@vger.kernel.org # v3.10+
>>>>>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>>>>>> Cc: Christoffer Dall <christoffer.dall@linaro.org>
>>>>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> 
> ...
> 
>>> ok, then there's just the concern that we may be holding a spinlock for
>>> a very long time.  I seem to recall Mario once added something where he
>>> unlocked and gave a chance to schedule something else for each PUD or
>>> something like that, because he ran into the issue during migration.  Am
>>> I confusing this with something else?
>>
>> That definitely rings a bell: stage2_wp_range() uses that kind of trick
>> to give the system a chance to breathe. Maybe we could use a similar
>> trick in our S2 unmapping code? How about this (completely untested) patch:
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 962616fd4ddd..1786c24212d4 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -292,8 +292,13 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>>  	phys_addr_t addr = start, end = start + size;
>>  	phys_addr_t next;
>>
>> +	BUG_ON(!spin_is_locked(&kvm->mmu_lock));
>> +
>>  	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>>  	do {
>> +		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
>> +			cond_resched_lock(&kvm->mmu_lock);
> 
> nit: I think we could call cond_resched_lock() unconditionally here,
> given __cond_resched_lock() already does all the above checks:
> 
> kernel/sched/core.c:
> 
> int __cond_resched_lock(spinlock_t *lock)
> {
>          int resched = should_resched(PREEMPT_LOCK_OFFSET);
> 
> ...
> 
>          if (spin_needbreak(lock) || resched) {

Right. And should_resched() also contains a test for need_resched().

This means we can also simplify stage2_wp_range(). Awesome!
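
i.e. stage2_wp_range() could lose its explicit checks the same way
(untested sketch):

	do {
		/*
		 * cond_resched_lock() already folds in the need_resched()
		 * and spin_needbreak() tests, so call it unconditionally.
		 */
		cond_resched_lock(&kvm->mmu_lock);

		next = stage2_pgd_addr_end(addr, end);
		if (stage2_pgd_present(*pgd))
			stage2_wp_puds(pgd, addr, next);
	} while (pgd++, addr = next, addr != end);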

Thanks,

	M.

Patch

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 962616fd4ddd..1786c24212d4 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -292,8 +292,13 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
 	phys_addr_t addr = start, end = start + size;
 	phys_addr_t next;
 
+	BUG_ON(!spin_is_locked(&kvm->mmu_lock));
+
 	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
 	do {
+		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
+			cond_resched_lock(&kvm->mmu_lock);
+
 		next = stage2_pgd_addr_end(addr, end);
 		if (!stage2_pgd_none(*pgd))
 			unmap_stage2_puds(kvm, pgd, addr, next);