[3/3] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd

Message ID 0e5ff7f7-855c-ea28-fdee-73c062c3d289@arm.com (mailing list archive)
State New, archived

Commit Message

Marc Zyngier March 15, 2017, 1:28 p.m. UTC
On 15/03/17 10:56, Christoffer Dall wrote:
> On Wed, Mar 15, 2017 at 09:39:26AM +0000, Marc Zyngier wrote:
>> On 15/03/17 09:21, Christoffer Dall wrote:
>>> On Tue, Mar 14, 2017 at 02:52:34PM +0000, Suzuki K Poulose wrote:
>>>> In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
>>>> unmap_stage2_range() on the entire memory range for the guest. This could
>>>> cause problems with other callers (e.g., munmap on a memslot) trying to
>>>> unmap a range.
>>>>
>>>> Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
>>>> Cc: stable@vger.kernel.org # v3.10+
>>>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>>>> Cc: Christoffer Dall <christoffer.dall@linaro.org>
>>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>>> ---
>>>>  arch/arm/kvm/mmu.c | 3 +++
>>>>  1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>>> index 13b9c1f..b361f71 100644
>>>> --- a/arch/arm/kvm/mmu.c
>>>> +++ b/arch/arm/kvm/mmu.c
>>>> @@ -831,7 +831,10 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
>>>>  	if (kvm->arch.pgd == NULL)
>>>>  		return;
>>>>  
>>>> +	spin_lock(&kvm->mmu_lock);
>>>>  	unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
>>>> +	spin_unlock(&kvm->mmu_lock);
>>>> +
>>>
>>> This ends up holding the spin lock for potentially quite a while, where
>>> we can do things like __flush_dcache_area(), which I think can fault.
>>
>> I believe we're always using the linear mapping (or kmap on 32bit) in
>> order not to fault.
>>
> 
> ok, then there's just the concern that we may be holding a spinlock for
> a very long time.  I seem to recall Mario once added something where he
> unlocked and gave a chance to schedule something else for each PUD or
> something like that, because he ran into the issue during migration.  Am
> I confusing this with something else?

That definitely rings a bell: stage2_wp_range() uses that kind of trick
to give the system a chance to breathe. Maybe we could use a similar
trick in our S2 unmapping code? How about this (completely untested) patch:

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 962616fd4ddd..1786c24212d4 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -292,8 +292,13 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
 	phys_addr_t addr = start, end = start + size;
 	phys_addr_t next;
 
+	BUG_ON(!spin_is_locked(&kvm->mmu_lock));
+
 	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
 	do {
+		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
+			cond_resched_lock(&kvm->mmu_lock);
+
 		next = stage2_pgd_addr_end(addr, end);
 		if (!stage2_pgd_none(*pgd))
 			unmap_stage2_puds(kvm, pgd, addr, next);

The additional BUG_ON() is just for my own peace of mind - we seem to
have missed a couple of these lately, and the "breathing" code makes
it imperative that this lock be taken before entering the
function.
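
For reference, the "breathing" pattern in stage2_wp_range() looks roughly
like this (quoting from memory, so treat it as a sketch rather than the
exact source):

	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
	do {
		/*
		 * Release kvm->mmu_lock periodically if the region is
		 * large; holding it for too long starves other vCPUs
		 * and can trip the lockup detectors.
		 */
		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
			cond_resched_lock(&kvm->mmu_lock);

		next = stage2_pgd_addr_end(addr, end);
		if (stage2_pgd_present(*pgd))
			stage2_wp_puds(pgd, addr, next);
	} while (pgd++, addr = next, addr != end);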

Thoughts?

	M.

Comments

Christoffer Dall March 15, 2017, 1:35 p.m. UTC | #1
On Wed, Mar 15, 2017 at 01:28:07PM +0000, Marc Zyngier wrote:
> On 15/03/17 10:56, Christoffer Dall wrote:
> > On Wed, Mar 15, 2017 at 09:39:26AM +0000, Marc Zyngier wrote:
> >> On 15/03/17 09:21, Christoffer Dall wrote:
> >>> On Tue, Mar 14, 2017 at 02:52:34PM +0000, Suzuki K Poulose wrote:
> >>>> In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
> >>>> unmap_stage2_range() on the entire memory range for the guest. This could
> >>>> cause problems with other callers (e.g., munmap on a memslot) trying to
> >>>> unmap a range.
> >>>>
> >>>> Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
> >>>> Cc: stable@vger.kernel.org # v3.10+
> >>>> Cc: Marc Zyngier <marc.zyngier@arm.com>
> >>>> Cc: Christoffer Dall <christoffer.dall@linaro.org>
> >>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> >>>> ---
> >>>>  arch/arm/kvm/mmu.c | 3 +++
> >>>>  1 file changed, 3 insertions(+)
> >>>>
> >>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> >>>> index 13b9c1f..b361f71 100644
> >>>> --- a/arch/arm/kvm/mmu.c
> >>>> +++ b/arch/arm/kvm/mmu.c
> >>>> @@ -831,7 +831,10 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
> >>>>  	if (kvm->arch.pgd == NULL)
> >>>>  		return;
> >>>>  
> >>>> +	spin_lock(&kvm->mmu_lock);
> >>>>  	unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
> >>>> +	spin_unlock(&kvm->mmu_lock);
> >>>> +
> >>>
> >>> This ends up holding the spin lock for potentially quite a while, where
> >>> we can do things like __flush_dcache_area(), which I think can fault.
> >>
> >> I believe we're always using the linear mapping (or kmap on 32bit) in
> >> order not to fault.
> >>
> > 
> > ok, then there's just the concern that we may be holding a spinlock for
> > a very long time.  I seem to recall Mario once added something where he
> > unlocked and gave a chance to schedule something else for each PUD or
> > something like that, because he ran into the issue during migration.  Am
> > I confusing this with something else?
> 
> That definitely rings a bell: stage2_wp_range() uses that kind of trick
> to give the system a chance to breathe. Maybe we could use a similar
> trick in our S2 unmapping code? How about this (completely untested) patch:
> 
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 962616fd4ddd..1786c24212d4 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -292,8 +292,13 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>  	phys_addr_t addr = start, end = start + size;
>  	phys_addr_t next;
>  
> +	BUG_ON(!spin_is_locked(&kvm->mmu_lock));
> +
>  	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>  	do {
> +		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
> +			cond_resched_lock(&kvm->mmu_lock);
> +
>  		next = stage2_pgd_addr_end(addr, end);
>  		if (!stage2_pgd_none(*pgd))
>  			unmap_stage2_puds(kvm, pgd, addr, next);
> 
> The additional BUG_ON() is just for my own peace of mind - we seem to
> have missed a couple of these lately, and the "breathing" code makes
> it imperative that this lock be taken before entering the
> function.
> 

Looks good to me!

-Christoffer
Marc Zyngier March 15, 2017, 1:43 p.m. UTC | #2
On 15/03/17 13:35, Christoffer Dall wrote:
> On Wed, Mar 15, 2017 at 01:28:07PM +0000, Marc Zyngier wrote:
>> On 15/03/17 10:56, Christoffer Dall wrote:
>>> On Wed, Mar 15, 2017 at 09:39:26AM +0000, Marc Zyngier wrote:
>>>> On 15/03/17 09:21, Christoffer Dall wrote:
>>>>> On Tue, Mar 14, 2017 at 02:52:34PM +0000, Suzuki K Poulose wrote:
>>>>>> In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
>>>>>> unmap_stage2_range() on the entire memory range for the guest. This could
> >>>>>> cause problems with other callers (e.g., munmap on a memslot) trying to
>>>>>> unmap a range.
>>>>>>
>>>>>> Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
>>>>>> Cc: stable@vger.kernel.org # v3.10+
>>>>>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>>>>>> Cc: Christoffer Dall <christoffer.dall@linaro.org>
>>>>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>>>>> ---
>>>>>>  arch/arm/kvm/mmu.c | 3 +++
>>>>>>  1 file changed, 3 insertions(+)
>>>>>>
>>>>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>>>>> index 13b9c1f..b361f71 100644
>>>>>> --- a/arch/arm/kvm/mmu.c
>>>>>> +++ b/arch/arm/kvm/mmu.c
>>>>>> @@ -831,7 +831,10 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
>>>>>>  	if (kvm->arch.pgd == NULL)
>>>>>>  		return;
>>>>>>  
>>>>>> +	spin_lock(&kvm->mmu_lock);
>>>>>>  	unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
>>>>>> +	spin_unlock(&kvm->mmu_lock);
>>>>>> +
>>>>>
>>>>> This ends up holding the spin lock for potentially quite a while, where
>>>>> we can do things like __flush_dcache_area(), which I think can fault.
>>>>
>>>> I believe we're always using the linear mapping (or kmap on 32bit) in
>>>> order not to fault.
>>>>
>>>
>>> ok, then there's just the concern that we may be holding a spinlock for
>>> a very long time.  I seem to recall Mario once added something where he
>>> unlocked and gave a chance to schedule something else for each PUD or
>>> something like that, because he ran into the issue during migration.  Am
>>> I confusing this with something else?
>>
>> That definitely rings a bell: stage2_wp_range() uses that kind of trick
>> to give the system a chance to breathe. Maybe we could use a similar
>> trick in our S2 unmapping code? How about this (completely untested) patch:
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 962616fd4ddd..1786c24212d4 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -292,8 +292,13 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>>  	phys_addr_t addr = start, end = start + size;
>>  	phys_addr_t next;
>>  
>> +	BUG_ON(!spin_is_locked(&kvm->mmu_lock));
>> +
>>  	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>>  	do {
>> +		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
>> +			cond_resched_lock(&kvm->mmu_lock);
>> +
>>  		next = stage2_pgd_addr_end(addr, end);
>>  		if (!stage2_pgd_none(*pgd))
>>  			unmap_stage2_puds(kvm, pgd, addr, next);
>>
>> The additional BUG_ON() is just for my own peace of mind - we seem to
>> have missed a couple of these lately, and the "breathing" code makes
>> it imperative that this lock be taken before entering the
>> function.
>>
> 
> Looks good to me!

OK. I'll stash that on top of Suzuki's series, and start running some
actual tests... ;-)

Thanks,

	M.
Robin Murphy March 15, 2017, 1:50 p.m. UTC | #3
Hi Marc,

On 15/03/17 13:43, Marc Zyngier wrote:
> On 15/03/17 13:35, Christoffer Dall wrote:
>> On Wed, Mar 15, 2017 at 01:28:07PM +0000, Marc Zyngier wrote:
>>> On 15/03/17 10:56, Christoffer Dall wrote:
>>>> On Wed, Mar 15, 2017 at 09:39:26AM +0000, Marc Zyngier wrote:
>>>>> On 15/03/17 09:21, Christoffer Dall wrote:
>>>>>> On Tue, Mar 14, 2017 at 02:52:34PM +0000, Suzuki K Poulose wrote:
>>>>>>> In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
>>>>>>> unmap_stage2_range() on the entire memory range for the guest. This could
> >>>>>>> cause problems with other callers (e.g., munmap on a memslot) trying to
>>>>>>> unmap a range.
>>>>>>>
>>>>>>> Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
>>>>>>> Cc: stable@vger.kernel.org # v3.10+
>>>>>>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>>>>>>> Cc: Christoffer Dall <christoffer.dall@linaro.org>
>>>>>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>>>>>> ---
>>>>>>>  arch/arm/kvm/mmu.c | 3 +++
>>>>>>>  1 file changed, 3 insertions(+)
>>>>>>>
>>>>>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>>>>>> index 13b9c1f..b361f71 100644
>>>>>>> --- a/arch/arm/kvm/mmu.c
>>>>>>> +++ b/arch/arm/kvm/mmu.c
>>>>>>> @@ -831,7 +831,10 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
>>>>>>>  	if (kvm->arch.pgd == NULL)
>>>>>>>  		return;
>>>>>>>  
>>>>>>> +	spin_lock(&kvm->mmu_lock);
>>>>>>>  	unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
>>>>>>> +	spin_unlock(&kvm->mmu_lock);
>>>>>>> +
>>>>>>
>>>>>> This ends up holding the spin lock for potentially quite a while, where
>>>>>> we can do things like __flush_dcache_area(), which I think can fault.
>>>>>
>>>>> I believe we're always using the linear mapping (or kmap on 32bit) in
>>>>> order not to fault.
>>>>>
>>>>
>>>> ok, then there's just the concern that we may be holding a spinlock for
>>>> a very long time.  I seem to recall Mario once added something where he
>>>> unlocked and gave a chance to schedule something else for each PUD or
>>>> something like that, because he ran into the issue during migration.  Am
>>>> I confusing this with something else?
>>>
>>> That definitely rings a bell: stage2_wp_range() uses that kind of trick
>>> to give the system a chance to breathe. Maybe we could use a similar
>>> trick in our S2 unmapping code? How about this (completely untested) patch:
>>>
>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>> index 962616fd4ddd..1786c24212d4 100644
>>> --- a/arch/arm/kvm/mmu.c
>>> +++ b/arch/arm/kvm/mmu.c
>>> @@ -292,8 +292,13 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>>>  	phys_addr_t addr = start, end = start + size;
>>>  	phys_addr_t next;
>>>  
>>> +	BUG_ON(!spin_is_locked(&kvm->mmu_lock));

Nit: assert_spin_locked() is somewhat more pleasant (and currently looks
to expand to the exact same code).
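
That is, the check would become:

	assert_spin_locked(&kvm->mmu_lock);

which, if I remember the definitions in include/linux/spinlock.h correctly,
boils down to the very same BUG_ON:

	#define assert_spin_locked(lock)	assert_raw_spin_locked(&(lock)->rlock)
	#define assert_raw_spin_locked(x)	BUG_ON(!raw_spin_is_locked(x))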

Robin.

>>> +
>>>  	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>>>  	do {
>>> +		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
>>> +			cond_resched_lock(&kvm->mmu_lock);
>>> +
>>>  		next = stage2_pgd_addr_end(addr, end);
>>>  		if (!stage2_pgd_none(*pgd))
>>>  			unmap_stage2_puds(kvm, pgd, addr, next);
>>>
>>> The additional BUG_ON() is just for my own peace of mind - we seem to
>>> have missed a couple of these lately, and the "breathing" code makes
>>> it imperative that this lock be taken before entering the
>>> function.
>>>
>>
>> Looks good to me!
> 
> OK. I'll stash that on top of Suzuki's series, and start running some
> actual tests... ;-)
> 
> Thanks,
> 
> 	M.
>
Marc Zyngier March 15, 2017, 1:55 p.m. UTC | #4
On 15/03/17 13:50, Robin Murphy wrote:
> Hi Marc,
> 
> On 15/03/17 13:43, Marc Zyngier wrote:
>> On 15/03/17 13:35, Christoffer Dall wrote:
>>> On Wed, Mar 15, 2017 at 01:28:07PM +0000, Marc Zyngier wrote:
>>>> On 15/03/17 10:56, Christoffer Dall wrote:
>>>>> On Wed, Mar 15, 2017 at 09:39:26AM +0000, Marc Zyngier wrote:
>>>>>> On 15/03/17 09:21, Christoffer Dall wrote:
>>>>>>> On Tue, Mar 14, 2017 at 02:52:34PM +0000, Suzuki K Poulose wrote:
>>>>>>>> In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
>>>>>>>> unmap_stage2_range() on the entire memory range for the guest. This could
> >>>>>>>> cause problems with other callers (e.g., munmap on a memslot) trying to
>>>>>>>> unmap a range.
>>>>>>>>
>>>>>>>> Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
>>>>>>>> Cc: stable@vger.kernel.org # v3.10+
>>>>>>>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>>>>>>>> Cc: Christoffer Dall <christoffer.dall@linaro.org>
>>>>>>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>>>>>>> ---
>>>>>>>>  arch/arm/kvm/mmu.c | 3 +++
>>>>>>>>  1 file changed, 3 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>>>>>>> index 13b9c1f..b361f71 100644
>>>>>>>> --- a/arch/arm/kvm/mmu.c
>>>>>>>> +++ b/arch/arm/kvm/mmu.c
>>>>>>>> @@ -831,7 +831,10 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
>>>>>>>>  	if (kvm->arch.pgd == NULL)
>>>>>>>>  		return;
>>>>>>>>  
>>>>>>>> +	spin_lock(&kvm->mmu_lock);
>>>>>>>>  	unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
>>>>>>>> +	spin_unlock(&kvm->mmu_lock);
>>>>>>>> +
>>>>>>>
>>>>>>> This ends up holding the spin lock for potentially quite a while, where
>>>>>>> we can do things like __flush_dcache_area(), which I think can fault.
>>>>>>
>>>>>> I believe we're always using the linear mapping (or kmap on 32bit) in
>>>>>> order not to fault.
>>>>>>
>>>>>
>>>>> ok, then there's just the concern that we may be holding a spinlock for
>>>>> a very long time.  I seem to recall Mario once added something where he
>>>>> unlocked and gave a chance to schedule something else for each PUD or
>>>>> something like that, because he ran into the issue during migration.  Am
>>>>> I confusing this with something else?
>>>>
>>>> That definitely rings a bell: stage2_wp_range() uses that kind of trick
>>>> to give the system a chance to breathe. Maybe we could use a similar
>>>> trick in our S2 unmapping code? How about this (completely untested) patch:
>>>>
>>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>>> index 962616fd4ddd..1786c24212d4 100644
>>>> --- a/arch/arm/kvm/mmu.c
>>>> +++ b/arch/arm/kvm/mmu.c
>>>> @@ -292,8 +292,13 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>>>>  	phys_addr_t addr = start, end = start + size;
>>>>  	phys_addr_t next;
>>>>  
>>>> +	BUG_ON(!spin_is_locked(&kvm->mmu_lock));
> 
> Nit: assert_spin_locked() is somewhat more pleasant (and currently looks
> to expand to the exact same code).

Fancy!

Thanks,

	M.
Suzuki K Poulose March 15, 2017, 2:33 p.m. UTC | #5
On 15/03/17 13:28, Marc Zyngier wrote:
> On 15/03/17 10:56, Christoffer Dall wrote:
>> On Wed, Mar 15, 2017 at 09:39:26AM +0000, Marc Zyngier wrote:
>>> On 15/03/17 09:21, Christoffer Dall wrote:
>>>> On Tue, Mar 14, 2017 at 02:52:34PM +0000, Suzuki K Poulose wrote:
>>>>> In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
>>>>> unmap_stage2_range() on the entire memory range for the guest. This could
>>>>> cause problems with other callers (e.g., munmap on a memslot) trying to
>>>>> unmap a range.
>>>>>
>>>>> Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
>>>>> Cc: stable@vger.kernel.org # v3.10+
>>>>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>>>>> Cc: Christoffer Dall <christoffer.dall@linaro.org>
>>>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

...

>> ok, then there's just the concern that we may be holding a spinlock for
>> a very long time.  I seem to recall Mario once added something where he
>> unlocked and gave a chance to schedule something else for each PUD or
>> something like that, because he ran into the issue during migration.  Am
>> I confusing this with something else?
>
> That definitely rings a bell: stage2_wp_range() uses that kind of trick
> to give the system a chance to breathe. Maybe we could use a similar
> trick in our S2 unmapping code? How about this (completely untested) patch:
>
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 962616fd4ddd..1786c24212d4 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -292,8 +292,13 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>  	phys_addr_t addr = start, end = start + size;
>  	phys_addr_t next;
>
> +	BUG_ON(!spin_is_locked(&kvm->mmu_lock));
> +
>  	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>  	do {
> +		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
> +			cond_resched_lock(&kvm->mmu_lock);

nit: I think we could call cond_resched_lock() unconditionally here,
given __cond_resched_lock() already does all the above checks:

kernel/sched/core.c:

int __cond_resched_lock(spinlock_t *lock)
{
         int resched = should_resched(PREEMPT_LOCK_OFFSET);

...

         if (spin_needbreak(lock) || resched) {


Suzuki
Marc Zyngier March 15, 2017, 3:07 p.m. UTC | #6
On 15/03/17 14:33, Suzuki K Poulose wrote:
> On 15/03/17 13:28, Marc Zyngier wrote:
>> On 15/03/17 10:56, Christoffer Dall wrote:
>>> On Wed, Mar 15, 2017 at 09:39:26AM +0000, Marc Zyngier wrote:
>>>> On 15/03/17 09:21, Christoffer Dall wrote:
>>>>> On Tue, Mar 14, 2017 at 02:52:34PM +0000, Suzuki K Poulose wrote:
>>>>>> In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
>>>>>> unmap_stage2_range() on the entire memory range for the guest. This could
>>>>>> cause problems with other callers (e.g., munmap on a memslot) trying to
>>>>>> unmap a range.
>>>>>>
>>>>>> Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
>>>>>> Cc: stable@vger.kernel.org # v3.10+
>>>>>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>>>>>> Cc: Christoffer Dall <christoffer.dall@linaro.org>
>>>>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> 
> ...
> 
>>> ok, then there's just the concern that we may be holding a spinlock for
>>> a very long time.  I seem to recall Mario once added something where he
>>> unlocked and gave a chance to schedule something else for each PUD or
>>> something like that, because he ran into the issue during migration.  Am
>>> I confusing this with something else?
>>
>> That definitely rings a bell: stage2_wp_range() uses that kind of trick
>> to give the system a chance to breathe. Maybe we could use a similar
>> trick in our S2 unmapping code? How about this (completely untested) patch:
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 962616fd4ddd..1786c24212d4 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -292,8 +292,13 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>>  	phys_addr_t addr = start, end = start + size;
>>  	phys_addr_t next;
>>
>> +	BUG_ON(!spin_is_locked(&kvm->mmu_lock));
>> +
>>  	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>>  	do {
>> +		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
>> +			cond_resched_lock(&kvm->mmu_lock);
> 
> nit: I think we could call cond_resched_lock() unconditionally here,
> given __cond_resched_lock() already does all the above checks:
> 
> kernel/sched/core.c:
> 
> int __cond_resched_lock(spinlock_t *lock)
> {
>          int resched = should_resched(PREEMPT_LOCK_OFFSET);
> 
> ...
> 
>          if (spin_needbreak(lock) || resched) {

Right. And should_resched() also contains a test for need_resched().

This means we can also simplify stage2_wp_range(). Awesome!
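
i.e. stage2_wp_range() could lose its explicit checks the same way
(untested sketch):

	do {
		/*
		 * cond_resched_lock() already folds in the need_resched()
		 * and spin_needbreak() tests, so call it unconditionally.
		 */
		cond_resched_lock(&kvm->mmu_lock);

		next = stage2_pgd_addr_end(addr, end);
		if (stage2_pgd_present(*pgd))
			stage2_wp_puds(pgd, addr, next);
	} while (pgd++, addr = next, addr != end);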

Thanks,

	M.

Patch

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 962616fd4ddd..1786c24212d4 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -292,8 +292,13 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
 	phys_addr_t addr = start, end = start + size;
 	phys_addr_t next;
 
+	BUG_ON(!spin_is_locked(&kvm->mmu_lock));
+
 	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
 	do {
+		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
+			cond_resched_lock(&kvm->mmu_lock);
+
 		next = stage2_pgd_addr_end(addr, end);
 		if (!stage2_pgd_none(*pgd))
 			unmap_stage2_puds(kvm, pgd, addr, next);