[v2,1/2] mm/userfaultfd: don't place zeropages when zeropages are disallowed

Message ID 20240327171737.919590-2-david@redhat.com (mailing list archive)
State New
Series s390/mm: shared zeropage + KVM fixes

Commit Message

David Hildenbrand March 27, 2024, 5:17 p.m. UTC
s390x must disable shared zeropages for processes running VMs, because
the VMs could end up making use of "storage keys" or protected
virtualization, which are incompatible with shared zeropages.

Yet, with userfaultfd it is possible to insert shared zeropages into
such processes. Let's fall back to simply allocating a fresh zeroed
anonymous folio and inserting that instead.

mm_forbids_zeropage() was introduced in commit 593befa6ab74 ("mm: introduce
mm_forbids_zeropage function"), shortly before userfaultfd went
upstream.
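
For reference, the generic stub lives in include/linux/mm.h and simply
evaluates to 0 unless the architecture overrides it (as s390 does); a
simplified sketch:

/* include/linux/mm.h (simplified): architectures that cannot always use
 * the shared zeropage, such as s390 with storage keys, supply their own
 * mm_forbids_zeropage(); everyone else gets a constant 0, so the
 * argument may be ignored entirely.
 */
#ifndef mm_forbids_zeropage
#define mm_forbids_zeropage(X)	(0)
#endif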

Note that we don't want to fail the UFFDIO_ZEROPAGE request like we do
for hugetlb; that would be rather unexpected. Further, we also cannot
really indicate "not supported" to user space ahead of time: it could be
that the MM starts disallowing zeropages only after userfaultfd has
already been registered.
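
From user space the fallback is invisible; a minimal sketch (not part of
this patch) of resolving a fault with UFFDIO_ZEROPAGE, error handling
trimmed:

#include <linux/userfaultfd.h>
#include <sys/ioctl.h>

/* Resolve a page-sized fault at `addr` by placing a zeropage. With this
 * fix, an MM that forbids zeropages transparently gets a freshly zeroed
 * anonymous folio instead; the ioctl still succeeds.
 */
static int resolve_with_zeropage(int uffd, unsigned long addr,
				 unsigned long page_size)
{
	struct uffdio_zeropage zp = {
		.range = { .start = addr, .len = page_size },
		.mode = 0,
	};

	/* On success the kernel sets zp.zeropage to the number of bytes
	 * placed; on failure it holds a negative error code. */
	if (ioctl(uffd, UFFDIO_ZEROPAGE, &zp) == -1)
		return -1;
	return 0;
}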

Fixes: c1a4de99fada ("userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation")
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/userfaultfd.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

Comments

Alexander Gordeev April 11, 2024, 12:26 p.m. UTC | #1
On Wed, Mar 27, 2024 at 06:17:36PM +0100, David Hildenbrand wrote:

Hi David,
...
>  static int mfill_atomic_pte_zeropage(pmd_t *dst_pmd,
>  				     struct vm_area_struct *dst_vma,
>  				     unsigned long dst_addr)
> @@ -324,6 +355,9 @@ static int mfill_atomic_pte_zeropage(pmd_t *dst_pmd,
>  	spinlock_t *ptl;
>  	int ret;
>  
> +	if (mm_forbids_zeropage(dst_vma->mm))

I assume, you were going to pass dst_vma->vm_mm here?
This patch does not compile otherwise.
...

Thanks!
David Hildenbrand April 11, 2024, 12:30 p.m. UTC | #2
On 11.04.24 14:26, Alexander Gordeev wrote:
> On Wed, Mar 27, 2024 at 06:17:36PM +0100, David Hildenbrand wrote:
> 
> Hi David,
> ...
>>   static int mfill_atomic_pte_zeropage(pmd_t *dst_pmd,
>>   				     struct vm_area_struct *dst_vma,
>>   				     unsigned long dst_addr)
>> @@ -324,6 +355,9 @@ static int mfill_atomic_pte_zeropage(pmd_t *dst_pmd,
>>   	spinlock_t *ptl;
>>   	int ret;
>>   
>> +	if (mm_forbids_zeropage(dst_vma->mm))
> 
> I assume, you were going to pass dst_vma->vm_mm here?
> This patch does not compile otherwise.

Ah, I compiled it only on x86, where the parameter is ignored ... and 
for testing the code path I forced mm_forbids_zeropage to be 1 on x86.
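
(A hypothetical version of that test hack, not part of the series: make
the generic stub in include/linux/mm.h return 1 so every MM takes the
fallback path on x86 as well.)

/* Hypothetical test-only hack: force the zeroed-folio fallback
 * everywhere by flipping the generic stub's result from 0 to 1. */
#ifndef mm_forbids_zeropage
#define mm_forbids_zeropage(X)	(1)
#endif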

Yes, this must be dst_vma->vm_mm.

Thanks!
David Hildenbrand April 11, 2024, 12:55 p.m. UTC | #3
On 11.04.24 14:30, David Hildenbrand wrote:
> On 11.04.24 14:26, Alexander Gordeev wrote:
>> On Wed, Mar 27, 2024 at 06:17:36PM +0100, David Hildenbrand wrote:
>>
>> Hi David,
>> ...
>>>    static int mfill_atomic_pte_zeropage(pmd_t *dst_pmd,
>>>    				     struct vm_area_struct *dst_vma,
>>>    				     unsigned long dst_addr)
>>> @@ -324,6 +355,9 @@ static int mfill_atomic_pte_zeropage(pmd_t *dst_pmd,
>>>    	spinlock_t *ptl;
>>>    	int ret;
>>>    
>>> +	if (mm_forbids_zeropage(dst_vma->mm))
>>
>> I assume, you were going to pass dst_vma->vm_mm here?
>> This patch does not compile otherwise.
> 
> Ah, I compiled it only on x86, where the parameter is ignored ... and
> for testing the code path I forced mm_forbids_zeropage to be 1 on x86.

Now I get it: I compiled the whole series on s390x, but not the 
individual patches, so patch #2 hid the issue in patch #1. Sneaky. :)

Patch

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 712160cd41ec..9d385696fb89 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -316,6 +316,37 @@ static int mfill_atomic_pte_copy(pmd_t *dst_pmd,
 	goto out;
 }
 
+static int mfill_atomic_pte_zeroed_folio(pmd_t *dst_pmd,
+		 struct vm_area_struct *dst_vma, unsigned long dst_addr)
+{
+	struct folio *folio;
+	int ret = -ENOMEM;
+
+	folio = vma_alloc_zeroed_movable_folio(dst_vma, dst_addr);
+	if (!folio)
+		return ret;
+
+	if (mem_cgroup_charge(folio, dst_vma->vm_mm, GFP_KERNEL))
+		goto out_put;
+
+	/*
+	 * The memory barrier inside __folio_mark_uptodate makes sure that
+	 * zeroing out the folio become visible before mapping the page
+	 * using set_pte_at(). See do_anonymous_page().
+	 */
+	__folio_mark_uptodate(folio);
+
+	ret = mfill_atomic_install_pte(dst_pmd, dst_vma, dst_addr,
+				       &folio->page, true, 0);
+	if (ret)
+		goto out_put;
+
+	return 0;
+out_put:
+	folio_put(folio);
+	return ret;
+}
+
 static int mfill_atomic_pte_zeropage(pmd_t *dst_pmd,
 				     struct vm_area_struct *dst_vma,
 				     unsigned long dst_addr)
@@ -324,6 +355,9 @@ static int mfill_atomic_pte_zeropage(pmd_t *dst_pmd,
 	spinlock_t *ptl;
 	int ret;
 
+	if (mm_forbids_zeropage(dst_vma->mm))
+		return mfill_atomic_pte_zeroed_folio(dst_pmd, dst_vma, dst_addr);
+
 	_dst_pte = pte_mkspecial(pfn_pte(my_zero_pfn(dst_addr),
 					 dst_vma->vm_page_prot));
 	ret = -EAGAIN;
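
For reference, with the fix agreed on in the comments above
(dst_vma->vm_mm rather than dst_vma->mm), the new check would read:

	/* vm_area_struct reaches its mm_struct via ->vm_mm; ->mm does not
	 * exist. On x86 the macro ignores its argument, which is why the
	 * typo only broke s390x builds of this patch in isolation. */
	if (mm_forbids_zeropage(dst_vma->vm_mm))
		return mfill_atomic_pte_zeroed_folio(dst_pmd, dst_vma, dst_addr);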