@@ -250,7 +250,7 @@ static inline void unlock_anon_vma_root(struct anon_vma *root)
* Attach the anon_vmas from src to dst.
* Returns 0 on success, -ENOMEM on failure.
*
- * If dst->anon_vma is NULL this function tries to find and reuse existing
+ * If reuse is true, this function tries to find and reuse existing
* anon_vma which has no vmas and only one child anon_vma. This prevents
* degradation of anon_vma hierarchy to endless linear chain in case of
* constantly forking task. On the other hand, an anon_vma with more than one
@@ -263,6 +263,18 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src,
{
struct anon_vma_chain *avc, *pavc;
struct anon_vma *root = NULL;
+ struct vm_area_struct *prev = dst->vm_prev, *pprev = src->vm_prev;
+
+ /*
+ * If parent share anon_vma with its vm_prev, keep this sharing in in
+ * child.
+ *
+ * 1. Parent has vm_prev, which implies we have vm_prev.
+ * 2. Parent and its vm_prev have the same anon_vma.
+ */
+ if (reuse && pprev && pprev->anon_vma == src->anon_vma)
+ dst->anon_vma = prev->anon_vma;
+
list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
struct anon_vma *anon_vma;
In function __anon_vma_prepare(), we will try to find anon_vma if it is possible to reuse it. While on fork, the logic is different. Since commit 5beb49305251 ("mm: change anon_vma linking to fix multi-process server scalability issue"), function anon_vma_clone() tries to allocate new anon_vma for child process. But the logic here will allocate a new anon_vma for each vma, even in parent this vma is mergeable and share the same anon_vma with its sibling. This may do better for scalability issue, while it is not necessary to do so especially after interval tree is used. Commit 7a3ef208e662 ("mm: prevent endless growth of anon_vma hierarchy") tries to reuse some anon_vma by counting child anon_vma and attached vmas. While for those mergeable anon_vmas, we can just reuse it and not necessary to go through the logic. After this change, kernel build test reduces 20% anon_vma allocation. Do the same kernel build test, it shows run time in sys reduced 8.1%. Origin real 2m50.467s user 17m52.002s sys 1m51.953s real 2m48.662s user 17m55.464s sys 1m50.553s real 2m51.143s user 17m59.687s sys 1m53.600s Patched real 2m45.478s user 17m37.069s sys 1m42.671s real 2m46.420s user 17m45.970s sys 1m43.175s real 2m47.404s user 17m51.531s sys 1m43.005s Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> --- mm/rmap.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)