Message ID | 20240726235234.228822-55-seanjc@google.com (mailing list archive) |
---|---|
State | Handled Elsewhere, archived |
Series | KVM: Stop grabbing references to PFNMAP'd pages |
[+cc Fuad]

Fuad, you mentioned in commit 9c30fc615daa ("KVM: arm64: Move setting
the page as dirty out of the critical section") that restructuring
around the MMU lock was helpful for reuse (presumably for pKVM), but I
lack the context there.

On Fri, Jul 26, 2024 at 04:52:03PM -0700, Sean Christopherson wrote:
> Mark pages/folios accessed+dirty prior to dropping mmu_lock, as marking a
> page/folio dirty after it has been written back can make some filesystems
> unhappy (backing KVM guests will such filesystem files is uncommon, and

typo: s/will/with/

> the race is minuscule, hence the lack of complaints). See the link below
> for details.
>
> This will also allow converting arm64 to kvm_release_faultin_page(), which
> requires that mmu_lock be held (for the aforementioned reason).
>
> Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/arm64/kvm/mmu.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 22ee37360c4e..ce13c3d884d5 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1685,15 +1685,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	}
>
>  out_unlock:
> +	if (writable && !ret)
> +		kvm_set_pfn_dirty(pfn);

I'm guessing you meant kvm_release_pfn_dirty() here, because this leaks
a reference.

> +	else
> +		kvm_release_pfn_clean(pfn);
> +
>  	read_unlock(&kvm->mmu_lock);
>
>  	/* Mark the page dirty only if the fault is handled successfully */
> -	if (writable && !ret) {
> -		kvm_set_pfn_dirty(pfn);
> +	if (writable && !ret)
>  		mark_page_dirty_in_slot(kvm, memslot, gfn);
> -	}
>
> -	kvm_release_pfn_clean(pfn);
>  	return ret != -EAGAIN ? ret : 0;
>  }
>
> --
> 2.46.0.rc1.232.g9752f9e123-goog
>
On Mon, Aug 05, 2024 at 11:26:03PM +0000, Oliver Upton wrote:
> [+cc Fuad]

Take 2!

> Fuad, you mentioned in commit 9c30fc615daa ("KVM: arm64: Move setting
> the page as dirty out of the critical section") that restructuring
> around the MMU lock was helpful for reuse (presumably for pKVM), but I
> lack the context there.
>
> On Fri, Jul 26, 2024 at 04:52:03PM -0700, Sean Christopherson wrote:
> > Mark pages/folios accessed+dirty prior to dropping mmu_lock, as marking a
> > page/folio dirty after it has been written back can make some filesystems
> > unhappy (backing KVM guests will such filesystem files is uncommon, and
>
> typo: s/will/with/
>
> > the race is minuscule, hence the lack of complaints). See the link below
> > for details.
> >
> > This will also allow converting arm64 to kvm_release_faultin_page(), which
> > requires that mmu_lock be held (for the aforementioned reason).
> >
> > Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  arch/arm64/kvm/mmu.c | 10 ++++++----
> >  1 file changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 22ee37360c4e..ce13c3d884d5 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1685,15 +1685,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  	}
> >
> >  out_unlock:
> > +	if (writable && !ret)
> > +		kvm_set_pfn_dirty(pfn);
>
> I'm guessing you meant kvm_release_pfn_dirty() here, because this leaks
> a reference.
>
> > +	else
> > +		kvm_release_pfn_clean(pfn);
> > +
> >  	read_unlock(&kvm->mmu_lock);
> >
> >  	/* Mark the page dirty only if the fault is handled successfully */
> > -	if (writable && !ret) {
> > -		kvm_set_pfn_dirty(pfn);
> > +	if (writable && !ret)
> >  		mark_page_dirty_in_slot(kvm, memslot, gfn);
> > -	}
> >
> > -	kvm_release_pfn_clean(pfn);
> >  	return ret != -EAGAIN ? ret : 0;
> >  }
> >
> > --
> > 2.46.0.rc1.232.g9752f9e123-goog
> >
>
> --
> Thanks,
> Oliver
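The leak Oliver flags comes down to the semantics of KVM's pfn helpers:
kvm_set_pfn_dirty() only marks the underlying page dirty, while the
kvm_release_pfn_*() variants also drop the reference taken when the pfn
was faulted in. A sketch of the exit path with the suggested fix applied
(assuming the rest of the hunk stays as posted):

	out_unlock:
		/*
		 * kvm_release_pfn_dirty() marks the page dirty *and* drops
		 * the reference; kvm_set_pfn_dirty() alone would leak the
		 * reference acquired by the fault-in path.
		 */
		if (writable && !ret)
			kvm_release_pfn_dirty(pfn);
		else
			kvm_release_pfn_clean(pfn);

		read_unlock(&kvm->mmu_lock);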
On Mon, Aug 05, 2024, Oliver Upton wrote:
> > > ---
> > >  arch/arm64/kvm/mmu.c | 10 ++++++----
> > >  1 file changed, 6 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > index 22ee37360c4e..ce13c3d884d5 100644
> > > --- a/arch/arm64/kvm/mmu.c
> > > +++ b/arch/arm64/kvm/mmu.c
> > > @@ -1685,15 +1685,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > >  	}
> > >
> > >  out_unlock:
> > > +	if (writable && !ret)
> > > +		kvm_set_pfn_dirty(pfn);
> >
> > I'm guessing you meant kvm_release_pfn_dirty() here, because this leaks
> > a reference.

Doh, I did indeed.  Alternatively, this could be:

	if (writable && !ret)
		kvm_set_pfn_dirty(pfn);

	kvm_release_pfn_clean(pfn);

It won't matter in the end, because this just becomes:

	kvm_release_faultin_page(kvm, page, !!ret, writable);

So I guess the question is if you prefer to make the switch to an if-else in this
path, or more implicitly in the conversion to kvm_release_faultin_page().

I made the same goof for RISC-V, perhaps to prove that I too can copy+paste arm64's
MMU code ;-)
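For reference, a rough sketch of the end state Sean describes once the
conversion to kvm_release_faultin_page() lands; the call itself is the one
named in the thread, but the surrounding arrangement here is an assumption
rather than the final patch:

	out_unlock:
		/* Marks the page accessed/dirty and drops the reference. */
		kvm_release_faultin_page(kvm, page, !!ret, writable);
		read_unlock(&kvm->mmu_lock);

		/* Mark the page dirty only if the fault is handled successfully */
		if (writable && !ret)
			mark_page_dirty_in_slot(kvm, memslot, gfn);

		return ret != -EAGAIN ? ret : 0;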
On Mon, Aug 05, 2024 at 04:53:01PM -0700, Sean Christopherson wrote:
> On Mon, Aug 05, 2024, Oliver Upton wrote:
> > > > ---
> > > >  arch/arm64/kvm/mmu.c | 10 ++++++----
> > > >  1 file changed, 6 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > > index 22ee37360c4e..ce13c3d884d5 100644
> > > > --- a/arch/arm64/kvm/mmu.c
> > > > +++ b/arch/arm64/kvm/mmu.c
> > > > @@ -1685,15 +1685,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > > >  	}
> > > >
> > > >  out_unlock:
> > > > +	if (writable && !ret)
> > > > +		kvm_set_pfn_dirty(pfn);
> > >
> > > I'm guessing you meant kvm_release_pfn_dirty() here, because this leaks
> > > a reference.
>
> Doh, I did indeed.  Alternatively, this could be:
>
> 	if (writable && !ret)
> 		kvm_set_pfn_dirty(pfn);
>
> 	kvm_release_pfn_clean(pfn);
>
> It won't matter in the end, because this just becomes:
>
> 	kvm_release_faultin_page(kvm, page, !!ret, writable);
>
> So I guess the question is if you prefer to make the switch to an if-else in this
> path, or more implicitly in the conversion to kvm_release_faultin_page().
>
> I made the same goof for RISC-V, perhaps to prove that I too can copy+paste arm64's
> MMU code ;-)

LOL, whatever way you want to address it is fine by me, just wanted to
make sure this intermediate bug wouldn't bite an unlucky bisection.
Hi Oliver,

On Tue, 6 Aug 2024 at 00:26, Oliver Upton <oliver.upton@linux.dev> wrote:
>
> [+cc Fuad]
>
> Fuad, you mentioned in commit 9c30fc615daa ("KVM: arm64: Move setting
> the page as dirty out of the critical section") that restructuring
> around the MMU lock was helpful for reuse (presumably for pKVM), but I
> lack the context there.

That was for some refactoring I'd done later on for mem_aborts in
pKVM. That said, I didn't know at the time that there might be a race
with some filesystems. I'll keep this in mind for the pKVM code we
have for now, and when upstreaming.

Thanks,
/fuad

> On Fri, Jul 26, 2024 at 04:52:03PM -0700, Sean Christopherson wrote:
> > Mark pages/folios accessed+dirty prior to dropping mmu_lock, as marking a
> > page/folio dirty after it has been written back can make some filesystems
> > unhappy (backing KVM guests will such filesystem files is uncommon, and
>
> typo: s/will/with/
>
> > the race is minuscule, hence the lack of complaints). See the link below
> > for details.
> >
> > This will also allow converting arm64 to kvm_release_faultin_page(), which
> > requires that mmu_lock be held (for the aforementioned reason).
> >
> > Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  arch/arm64/kvm/mmu.c | 10 ++++++----
> >  1 file changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 22ee37360c4e..ce13c3d884d5 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1685,15 +1685,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  	}
> >
> >  out_unlock:
> > +	if (writable && !ret)
> > +		kvm_set_pfn_dirty(pfn);
>
> I'm guessing you meant kvm_release_pfn_dirty() here, because this leaks
> a reference.
>
> > +	else
> > +		kvm_release_pfn_clean(pfn);
> > +
> >  	read_unlock(&kvm->mmu_lock);
> >
> >  	/* Mark the page dirty only if the fault is handled successfully */
> > -	if (writable && !ret) {
> > -		kvm_set_pfn_dirty(pfn);
> > +	if (writable && !ret)
> >  		mark_page_dirty_in_slot(kvm, memslot, gfn);
> > -	}
> >
> > -	kvm_release_pfn_clean(pfn);
> >  	return ret != -EAGAIN ? ret : 0;
> >  }
> >
> > --
> > 2.46.0.rc1.232.g9752f9e123-goog
> >
>
> --
> Thanks,
> Oliver
>
On Tue, 06 Aug 2024 00:26:54 +0100,
Oliver Upton <oliver.upton@linux.dev> wrote:
>
> On Mon, Aug 05, 2024 at 11:26:03PM +0000, Oliver Upton wrote:
> > [+cc Fuad]
>
> Take 2!
>
> > Fuad, you mentioned in commit 9c30fc615daa ("KVM: arm64: Move setting
> > the page as dirty out of the critical section") that restructuring
> > around the MMU lock was helpful for reuse (presumably for pKVM), but I
> > lack the context there.
> >
> > On Fri, Jul 26, 2024 at 04:52:03PM -0700, Sean Christopherson wrote:
> > > Mark pages/folios accessed+dirty prior to dropping mmu_lock, as marking a
> > > page/folio dirty after it has been written back can make some filesystems
> > > unhappy (backing KVM guests will such filesystem files is uncommon, and
> >
> > typo: s/will/with/
> >
> > > the race is minuscule, hence the lack of complaints). See the link below
> > > for details.

Should we consider reverting 9c30fc615daa then?

Thanks,

	M.
On Tue, Aug 06, 2024, Marc Zyngier wrote:
> On Tue, 06 Aug 2024 00:26:54 +0100,
> Oliver Upton <oliver.upton@linux.dev> wrote:
> >
> > On Mon, Aug 05, 2024 at 11:26:03PM +0000, Oliver Upton wrote:
> > > [+cc Fuad]
> >
> > Take 2!
> >
> > > Fuad, you mentioned in commit 9c30fc615daa ("KVM: arm64: Move setting
> > > the page as dirty out of the critical section") that restructuring
> > > around the MMU lock was helpful for reuse (presumably for pKVM), but I
> > > lack the context there.
> > >
> > > On Fri, Jul 26, 2024 at 04:52:03PM -0700, Sean Christopherson wrote:
> > > > Mark pages/folios accessed+dirty prior to dropping mmu_lock, as marking a
> > > > page/folio dirty after it has been written back can make some filesystems
> > > > unhappy (backing KVM guests will such filesystem files is uncommon, and
> > >
> > > typo: s/will/with/
> > >
> > > > the race is minuscule, hence the lack of complaints). See the link below
> > > > for details.
>
> Should we consider reverting 9c30fc615daa then?

Aha!  After thinking through things more, I don't think a revert is necessary.

I _think_ the worst case scenario is that KVM would trigger this WARN in
filemap_unaccount_folio():

	/*
	 * At this point folio must be either written or cleaned by
	 * truncate.  Dirty folio here signals a bug and loss of
	 * unwritten data - on ordinary filesystems.
	 *
	 * But it's harmless on in-memory filesystems like tmpfs; and can
	 * occur when a driver which did get_user_pages() sets page dirty
	 * before putting it, while the inode is being finally evicted.
	 *
	 * Below fixes dirty accounting after removing the folio entirely
	 * but leaves the dirty flag set: it has no effect for truncated
	 * folio and anyway will be cleared before returning folio to
	 * buddy allocator.
	 */
	if (WARN_ON_ONCE(folio_test_dirty(folio) &&
			 mapping_can_writeback(mapping)))
		folio_account_cleaned(folio, inode_to_wb(mapping->host));

KVM won't actually write memory because the stage-2 mappings are protected by the
mmu_notifier, i.e. there is no risk of loss of data, even if the VM were backed
by memory that needs writeback.

And FWIW, given that multiple other KVM architectures mark folios dirty outside
of mmu_notifier protection and have never tripped over this, I think it's highly
unlikely the WARN will ever be triggered by a sane virtualization setup.

I can add something to that effect to the changelog, e.g. to document that this
isn't super urgent.
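To make the ordering Sean describes concrete, the window being closed looks
roughly like the following; this timeline is a sketch reconstructed from the
discussion above, not code from the thread:

	/*
	 * Before this patch (dirty marking outside mmu_lock):
	 *
	 *   vCPU fault handler                    filesystem / MM
	 *   ------------------                    ---------------
	 *   install stage-2 mapping (mmu_lock)
	 *   read_unlock(&kvm->mmu_lock)
	 *                                         mmu_notifier zaps stage-2
	 *                                         writeback + truncate folio
	 *   kvm_set_pfn_dirty(pfn)
	 *     -> folio dirtied after writeback, which can trip the WARN in
	 *        filemap_unaccount_folio() at eviction
	 *
	 * With the dirty/accessed updates done under mmu_lock, the notifier
	 * invalidation (which also takes mmu_lock) cannot complete before the
	 * folio is marked dirty.  No guest data is at risk either way: the
	 * stage-2 zap prevents further guest writes, so the late marking is
	 * purely a bookkeeping problem.
	 */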
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 22ee37360c4e..ce13c3d884d5 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1685,15 +1685,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	}
 
 out_unlock:
+	if (writable && !ret)
+		kvm_set_pfn_dirty(pfn);
+	else
+		kvm_release_pfn_clean(pfn);
+
 	read_unlock(&kvm->mmu_lock);
 
 	/* Mark the page dirty only if the fault is handled successfully */
-	if (writable && !ret) {
-		kvm_set_pfn_dirty(pfn);
+	if (writable && !ret)
 		mark_page_dirty_in_slot(kvm, memslot, gfn);
-	}
 
-	kvm_release_pfn_clean(pfn);
 	return ret != -EAGAIN ? ret : 0;
 }
Mark pages/folios accessed+dirty prior to dropping mmu_lock, as marking a
page/folio dirty after it has been written back can make some filesystems
unhappy (backing KVM guests with such filesystem files is uncommon, and
the race is minuscule, hence the lack of complaints).  See the link below
for details.

This will also allow converting arm64 to kvm_release_faultin_page(), which
requires that mmu_lock be held (for the aforementioned reason).

Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm64/kvm/mmu.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)