From patchwork Mon Mar 30 23:59:19 2009
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Izik Eidus
X-Patchwork-Id: 15293
Received: from vger.kernel.org (vger.kernel.org [209.132.176.167])
	by demeter.kernel.org (8.14.2/8.14.2) with ESMTP id n2V00xss026854
	for ; Tue, 31 Mar 2009 00:01:02 GMT
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1760225AbZCaAAy (ORCPT );
	Mon, 30 Mar 2009 20:00:54 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1760204AbZCaAAy (ORCPT );
	Mon, 30 Mar 2009 20:00:54 -0400
Received: from mx2.redhat.com ([66.187.237.31]:59731 "EHLO mx2.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1760032AbZCaAAu (ORCPT );
	Mon, 30 Mar 2009 20:00:50 -0400
Received: from int-mx2.corp.redhat.com (int-mx2.corp.redhat.com [172.16.27.26])
	by mx2.redhat.com (8.13.8/8.13.8) with ESMTP id n2V00hYM028297;
	Mon, 30 Mar 2009 20:00:43 -0400
Received: from ns3.rdu.redhat.com (ns3.rdu.redhat.com [10.11.255.199])
	by int-mx2.corp.redhat.com (8.13.1/8.13.1) with ESMTP id n2V00hnk004587;
	Mon, 30 Mar 2009 20:00:43 -0400
Received: from localhost.localdomain (dhcp-0-45.tlv.redhat.com [10.35.0.45])
	by ns3.rdu.redhat.com (8.13.8/8.13.8) with ESMTP id n2V00FJr029158;
	Mon, 30 Mar 2009 20:00:40 -0400
From: Izik Eidus
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org,
	avi@redhat.com, aarcange@redhat.com, chrisw@redhat.com, riel@redhat.com,
	jeremy@goop.org, mtosatti@redhat.com, hugh@veritas.com, corbet@lwn.net,
	yaniv@redhat.com, dmonakhov@openvz.org, Izik Eidus
Subject: [PATCH 3/4] add replace_page(): change the page pte is pointing to.
Date: Tue, 31 Mar 2009 02:59:19 +0300
Message-Id: <1238457560-7613-4-git-send-email-ieidus@redhat.com>
In-Reply-To: <1238457560-7613-3-git-send-email-ieidus@redhat.com>
References: <1238457560-7613-1-git-send-email-ieidus@redhat.com>
	<1238457560-7613-2-git-send-email-ieidus@redhat.com>
	<1238457560-7613-3-git-send-email-ieidus@redhat.com>
X-Scanned-By: MIMEDefang 2.58 on 172.16.27.26
To: unlisted-recipients:; (no To-header on input)
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org

replace_page() allows changing the mapping of a pte from one physical
page to a different physical page.

It works by removing oldpage from the rmap and calling put_page() on it,
and by setting the pte to point to newpage and inserting newpage into the
rmap using page_add_file_rmap().

Note: newpage must be a non-anonymous page. The reason is that
replace_page() is built to allow mapping one page at more than one
virtual address. The page can be mapped at a different offset inside
each vma, and therefore we cannot trust page->index anymore.

The side effect of this is that newpage cannot be anything but a
kernel-allocated page that is not swappable.
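To make the page->index point concrete, here is a small userspace
sketch in C. It is not kernel code: toy_page, toy_vma,
toy_address_in_vma() and toy_index_for() are hypothetical names, and the
formula is a simplified model, assuming the usual relation between
page->index, vma->vm_pgoff and vma->vm_start that
page_address_in_vma() relies on.

```c
#include <assert.h>

#define TOY_PAGE_SHIFT 12UL	/* assumption: 4 KiB pages */

/* Toy stand-ins for the kernel structures; only the fields used here. */
struct toy_page { unsigned long index; };
struct toy_vma  { unsigned long vm_start, vm_end, vm_pgoff; };

/*
 * Simplified model of the rmap address lookup: the virtual address of a
 * page inside a vma is derived from the single page->index value.
 * Returns 0 when the result falls outside the vma (a convention of this
 * sketch only; the kernel helper returns -EFAULT instead).
 */
unsigned long toy_address_in_vma(const struct toy_page *page,
				 const struct toy_vma *vma)
{
	unsigned long addr;

	if (page->index < vma->vm_pgoff)
		return 0;
	addr = vma->vm_start +
	       ((page->index - vma->vm_pgoff) << TOY_PAGE_SHIFT);
	if (addr >= vma->vm_end)
		return 0;
	return addr;
}

/* Inverse: the index a page would need in order to be found at addr. */
unsigned long toy_index_for(const struct toy_vma *vma, unsigned long addr)
{
	return vma->vm_pgoff + ((addr - vma->vm_start) >> TOY_PAGE_SHIFT);
}
```

If the same page is mapped at 0x15000 in a vma starting at 0x10000 with
vm_pgoff 0, and at 0x47000 in a vma starting at 0x40000 with vm_pgoff 3,
toy_index_for() yields 5 for the first mapping and 10 for the second.
One index field cannot hold both values, which is the reason given above
for not trusting page->index once a page is shared this way.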

Signed-off-by: Izik Eidus
---
 include/linux/mm.h |    5 +++
 mm/memory.c        |   80 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+), 0 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 065cdf8..b19e4c2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1237,6 +1237,11 @@ int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
 			unsigned long pfn);
 
+#if defined(CONFIG_KSM) || defined(CONFIG_KSM_MODULE)
+int replace_page(struct vm_area_struct *vma, struct page *oldpage,
+		 struct page *newpage, pte_t orig_pte, pgprot_t prot);
+#endif
+
 struct page *follow_page(struct vm_area_struct *, unsigned long address,
 			unsigned int foll_flags);
 #define FOLL_WRITE	0x01	/* check pte is writable */
diff --git a/mm/memory.c b/mm/memory.c
index 0382a34..3946e79 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1562,6 +1562,86 @@ int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
 }
 EXPORT_SYMBOL(vm_insert_mixed);
 
+#if defined(CONFIG_KSM) || defined(CONFIG_KSM_MODULE)
+
+/**
+ * replace_page - replace page in vma with new page
+ * @vma: vma that hold the pte oldpage is pointed by.
+ * @oldpage: the page we are replacing with newpage
+ * @newpage: the page we replace oldpage with
+ * @orig_pte: the original value of the pte
+ * @prot: page protection bits
+ *
+ * Returns 0 on success, -EFAULT on failure.
+ *
+ * Note: @newpage must not be an anonymous page because replace_page() does
+ * not change the mapping of @newpage to have the same values as @oldpage.
+ * @newpage can be mapped in several vmas at different offsets (page->index).
+ */
+int replace_page(struct vm_area_struct *vma, struct page *oldpage,
+		 struct page *newpage, pte_t orig_pte, pgprot_t prot)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *ptep;
+	spinlock_t *ptl;
+	unsigned long addr;
+	int ret;
+
+	BUG_ON(PageAnon(newpage));
+
+	ret = -EFAULT;
+	addr = page_address_in_vma(oldpage, vma);
+	if (addr == -EFAULT)
+		goto out;
+
+	pgd = pgd_offset(mm, addr);
+	if (!pgd_present(*pgd))
+		goto out;
+
+	pud = pud_offset(pgd, addr);
+	if (!pud_present(*pud))
+		goto out;
+
+	pmd = pmd_offset(pud, addr);
+	if (!pmd_present(*pmd))
+		goto out;
+
+	ptep = pte_offset_map_lock(mm, pmd, addr, &ptl);
+	if (!ptep)
+		goto out;
+
+	if (!pte_same(*ptep, orig_pte)) {
+		pte_unmap_unlock(ptep, ptl);
+		goto out;
+	}
+
+	ret = 0;
+	get_page(newpage);
+	page_add_file_rmap(newpage);
+
+	flush_cache_page(vma, addr, pte_pfn(*ptep));
+	ptep_clear_flush(vma, addr, ptep);
+	set_pte_at_notify(mm, addr, ptep, mk_pte(newpage, prot));
+
+	page_remove_rmap(oldpage);
+	if (PageAnon(oldpage)) {
+		dec_mm_counter(mm, anon_rss);
+		inc_mm_counter(mm, file_rss);
+	}
+	put_page(oldpage);
+
+	pte_unmap_unlock(ptep, ptl);
+
+out:
+	return ret;
+}
+EXPORT_SYMBOL_GPL(replace_page);
+
+#endif
+
 /*
  * maps a range of physical memory into the requested pages. the old
  * mappings are removed. any references to nonexistent pages results
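
The bookkeeping that replace_page() does once it holds the pte lock can
be modeled in userspace roughly as follows. This is only a sketch under
stated assumptions, not the kernel implementation: toy_page, toy_pte,
toy_mm and toy_replace_page() are hypothetical names, refcount and
mapcount are plain integers, there is no page-table walk, locking, cache
flush or TLB flush, and an anonymous newpage returns -EINVAL here where
the patch uses BUG_ON().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct toy_page {
	int refcount;	/* models get_page()/put_page() */
	int mapcount;	/* models rmap add/remove */
	bool anon;	/* models PageAnon() */
};

struct toy_pte {
	struct toy_page *page;	/* page the pte points at */
	unsigned int prot;	/* protection bits */
};

struct toy_mm {
	long anon_rss;
	long file_rss;
};

bool toy_pte_same(struct toy_pte a, struct toy_pte b)
{
	return a.page == b.page && a.prot == b.prot;
}

int toy_replace_page(struct toy_mm *mm, struct toy_pte *ptep,
		     struct toy_page *oldpage, struct toy_page *newpage,
		     struct toy_pte orig_pte, unsigned int prot)
{
	if (newpage->anon)
		return -EINVAL;	/* sketch only: the patch uses BUG_ON() */

	/* The pte changed since the caller sampled it: give up. */
	if (!toy_pte_same(*ptep, orig_pte))
		return -EFAULT;

	/* Take a reference on newpage and account its new mapping. */
	newpage->refcount++;
	newpage->mapcount++;

	/* Point the pte at newpage with the requested protection. */
	ptep->page = newpage;
	ptep->prot = prot;

	/* Drop the old mapping and move the rss accounting if needed. */
	oldpage->mapcount--;
	if (oldpage->anon) {
		mm->anon_rss--;
		mm->file_rss++;
	}
	oldpage->refcount--;

	return 0;
}
```

In this model a second call with the same stale orig_pte snapshot fails
with -EFAULT, because the pte no longer matches it. That mirrors the
pte_same() check in the patch, which rejects the replacement when the
pte has changed between the caller sampling it and the lock being taken.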