Message ID | 20240902225009.34576-1-21cnbao@gmail.com (mailing list archive) |
---|---|
State | New |
Series | binder_alloc: Move alloc_page() out of mmap_rwsem to reduce the lock duration |
On Tue, Sep 03, 2024 at 10:50:09AM +1200, Barry Song wrote: > From: Barry Song <v-songbaohua@oppo.com> > > The mmap_write_lock() can block all access to the VMAs, for example page > faults. Performing memory allocation while holding this lock may trigger > direct reclamation, leading to others being queued in the rwsem for an > extended period. > We've observed that the allocation can sometimes take more than 300ms, > significantly blocking other threads. The user interface sometimes > becomes less responsive as a result. To prevent this, let's move the > allocation outside of the write lock. > A potential side effect could be an extra alloc_page() for the second > thread executing binder_install_single_page() while the first thread > has done it earlier. However, according to Tangquan's 48-hour profiling > using monkey, the likelihood of this occurring is minimal, with a ratio > of only 1 in 2400. Compared to the significantly costly rwsem, this is > negligible. > On the other hand, holding a write lock without making any VMA > modifications appears questionable and likely incorrect. While this > patch focuses on reducing the lock duration, future updates may aim > to eliminate the write lock entirely. > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > Cc: "Arve Hjønnevåg" <arve@android.com> > Cc: Todd Kjos <tkjos@android.com> > Cc: Martijn Coenen <maco@android.com> > Cc: Joel Fernandes <joel@joelfernandes.org> > Cc: Christian Brauner <brauner@kernel.org> > Cc: Carlos Llamas <cmllamas@google.com> > Cc: Suren Baghdasaryan <surenb@google.com> > Tested-by: Tangquan Zheng <zhengtangquan@oppo.com> > Signed-off-by: Barry Song <v-songbaohua@oppo.com> > --- > drivers/android/binder_alloc.c | 18 ++++++++++++++---- > 1 file changed, 14 insertions(+), 4 deletions(-) > > diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c > index b3acbc4174fb..f20074e23a7c 100644 > --- a/drivers/android/binder_alloc.c > +++ b/drivers/android/binder_alloc.c > @@ -227,13 +227,23 @@ static int binder_install_single_page(struct binder_alloc *alloc, > if (!mmget_not_zero(alloc->mm)) > return -ESRCH; > > + /* > + * Don't allocate page in mmap_write_lock, this can block > + * mmap_rwsem for a long time; Meanwhile, allocation failure > + * doesn't necessarily need to return -ENOMEM, if lru_page > + * has been installed, we can still return 0(success). > + */ > + page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO); But now you are allocating new pages even if binder_get_installed_page() is an error, right? Doesn't that slow things down? How was this benchmarked? > + > /* > * Protected with mmap_sem in write mode as multiple tasks > * might race to install the same page. 
> */ > mmap_write_lock(alloc->mm); > - if (binder_get_installed_page(lru_page)) > + if (binder_get_installed_page(lru_page)) { > + ret = 1; That is not a valid error value :( > goto out; > + } > > if (!alloc->vma) { > pr_err("%d: %s failed, no vma\n", alloc->pid, __func__); > @@ -241,7 +251,6 @@ static int binder_install_single_page(struct binder_alloc *alloc, > goto out; > } > > - page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO); > if (!page) { > pr_err("%d: failed to allocate page\n", alloc->pid); > ret = -ENOMEM; > @@ -252,7 +261,6 @@ static int binder_install_single_page(struct binder_alloc *alloc, > if (ret) { > pr_err("%d: %s failed to insert page at offset %lx with %d\n", > alloc->pid, __func__, addr - alloc->buffer, ret); > - __free_page(page); > ret = -ENOMEM; > goto out; > } > @@ -262,7 +270,9 @@ static int binder_install_single_page(struct binder_alloc *alloc, > out: > mmap_write_unlock(alloc->mm); > mmput_async(alloc->mm); > - return ret; > + if (ret && page) > + __free_page(page); > + return ret < 0 ? ret : 0; Please only use ? : for when you have to, otherwise please spell it out with a normal if statement: if (ret < 0) return ret; return 0; But, this is abusing the fact that you set "ret = 1" above, which is going to trip someone up in the future as that is NOT a normal coding pattern we have in the kernel, sorry. If you insist on this change, please rework it to not have that type of "positive means one thing, 0 means another, and negative means yet something else" please. thanks, greg k-h
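[Editor's note] One way to address both review points above, sketched here purely for illustration and not compile-tested, keeps the early allocation but makes every exit path return either 0 or a negative errno: the page pointer is cleared once ownership passes to the mapping, so a single __free_page() at the common exit covers every failure and lost-race path and no "ret = 1" tri-state is needed. It assumes the helpers already present in drivers/android/binder_alloc.c (binder_get_installed_page(), binder_set_installed_page(), struct binder_lru_page).

/*
 * Illustrative sketch only: allocation stays outside mmap_rwsem, but the
 * function returns 0 on success (including "someone else installed it")
 * and a negative errno otherwise.
 */
static int binder_install_single_page(struct binder_alloc *alloc,
				      struct binder_lru_page *lru_page,
				      unsigned long addr)
{
	struct page *page;
	int ret = 0;

	if (!mmget_not_zero(alloc->mm))
		return -ESRCH;

	/* Allocate outside mmap_rwsem; this may sleep in direct reclaim. */
	page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);

	mmap_write_lock(alloc->mm);
	if (binder_get_installed_page(lru_page))
		goto out;	/* lost the race; ret stays 0 */

	if (!alloc->vma) {
		pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
		ret = -ESRCH;
		goto out;
	}

	if (!page) {
		pr_err("%d: failed to allocate page\n", alloc->pid);
		ret = -ENOMEM;
		goto out;
	}

	ret = vm_insert_page(alloc->vma, addr, page);
	if (ret) {
		pr_err("%d: %s failed to insert page at offset %lx with %d\n",
		       alloc->pid, __func__, addr - alloc->buffer, ret);
		ret = -ENOMEM;
		goto out;
	}

	/* Mark page installation complete and safe to use */
	binder_set_installed_page(lru_page, page);
	page = NULL;	/* ownership passed to the mapping */
out:
	mmap_write_unlock(alloc->mm);
	mmput_async(alloc->mm);
	if (page)
		__free_page(page);
	return ret;
}

Compared with the posted patch, the only behavioural difference is that the lost-race path frees the speculatively allocated page at the common exit instead of signalling it with a positive return value.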
On Tue, Sep 3, 2024 at 4:57 PM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > On Tue, Sep 03, 2024 at 10:50:09AM +1200, Barry Song wrote: > > From: Barry Song <v-songbaohua@oppo.com> > > > > The mmap_write_lock() can block all access to the VMAs, for example page > > faults. Performing memory allocation while holding this lock may trigger > > direct reclamation, leading to others being queued in the rwsem for an > > extended period. > > We've observed that the allocation can sometimes take more than 300ms, > > significantly blocking other threads. The user interface sometimes > > becomes less responsive as a result. To prevent this, let's move the > > allocation outside of the write lock. > > A potential side effect could be an extra alloc_page() for the second > > thread executing binder_install_single_page() while the first thread > > has done it earlier. However, according to Tangquan's 48-hour profiling > > using monkey, the likelihood of this occurring is minimal, with a ratio > > of only 1 in 2400. Compared to the significantly costly rwsem, this is > > negligible. > > On the other hand, holding a write lock without making any VMA > > modifications appears questionable and likely incorrect. While this > > patch focuses on reducing the lock duration, future updates may aim > > to eliminate the write lock entirely. > > > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > > Cc: "Arve Hjønnevåg" <arve@android.com> > > Cc: Todd Kjos <tkjos@android.com> > > Cc: Martijn Coenen <maco@android.com> > > Cc: Joel Fernandes <joel@joelfernandes.org> > > Cc: Christian Brauner <brauner@kernel.org> > > Cc: Carlos Llamas <cmllamas@google.com> > > Cc: Suren Baghdasaryan <surenb@google.com> > > Tested-by: Tangquan Zheng <zhengtangquan@oppo.com> > > Signed-off-by: Barry Song <v-songbaohua@oppo.com> > > --- > > drivers/android/binder_alloc.c | 18 ++++++++++++++---- > > 1 file changed, 14 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c > > index b3acbc4174fb..f20074e23a7c 100644 > > --- a/drivers/android/binder_alloc.c > > +++ b/drivers/android/binder_alloc.c > > @@ -227,13 +227,23 @@ static int binder_install_single_page(struct binder_alloc *alloc, > > if (!mmget_not_zero(alloc->mm)) > > return -ESRCH; > > > > + /* > > + * Don't allocate page in mmap_write_lock, this can block > > + * mmap_rwsem for a long time; Meanwhile, allocation failure > > + * doesn't necessarily need to return -ENOMEM, if lru_page > > + * has been installed, we can still return 0(success). > > + */ > > + page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO); > > But now you are allocating new pages even if binder_get_installed_page() > is an error, right? Doesn't that slow things down? very very unlikely, as the ratio is only 1/2400 while write lock 100% blocks everyone. > > How was this benchmarked? > i actually put Tangquan's profiling result: " However, according to Tangquan's 48-hour profiling using monkey, the likelihood of this occurring is minimal, with a ratio of only 1 in 2400. Compared to the significantly costly rwsem, this is negligible." > > + > > /* > > * Protected with mmap_sem in write mode as multiple tasks > > * might race to install the same page. 
> > */ > > mmap_write_lock(alloc->mm); > > - if (binder_get_installed_page(lru_page)) > > + if (binder_get_installed_page(lru_page)) { > > + ret = 1; > > That is not a valid error value :( > > > goto out; > > + } > > > > if (!alloc->vma) { > > pr_err("%d: %s failed, no vma\n", alloc->pid, __func__); > > @@ -241,7 +251,6 @@ static int binder_install_single_page(struct binder_alloc *alloc, > > goto out; > > } > > > > - page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO); > > if (!page) { > > pr_err("%d: failed to allocate page\n", alloc->pid); > > ret = -ENOMEM; > > @@ -252,7 +261,6 @@ static int binder_install_single_page(struct binder_alloc *alloc, > > if (ret) { > > pr_err("%d: %s failed to insert page at offset %lx with %d\n", > > alloc->pid, __func__, addr - alloc->buffer, ret); > > - __free_page(page); > > ret = -ENOMEM; > > goto out; > > } > > @@ -262,7 +270,9 @@ static int binder_install_single_page(struct binder_alloc *alloc, > > out: > > mmap_write_unlock(alloc->mm); > > mmput_async(alloc->mm); > > - return ret; > > + if (ret && page) > > + __free_page(page); > > + return ret < 0 ? ret : 0; > > Please only use ? : for when you have to, otherwise please spell it out > with a normal if statement: > if (ret < 0) > return ret; > return 0; > > But, this is abusing the fact that you set "ret = 1" above, which is > going to trip someone up in the future as that is NOT a normal coding > pattern we have in the kernel, sorry. > > If you insist on this change, please rework it to not have that type of > "positive means one thing, 0 means another, and negative means yet > something else" please. I was trying to consolidate all free_page() calls into one place. Otherwise, we would need multiple free_page() calls. I'm perfectly fine with having more free_page() calls in both the ret = 1 and ret < 0 paths. In that case, the ret = 1 path can be removed if you prefer. > > thanks, > > greg k-h Thanks Barry
On Tue, Sep 03, 2024 at 05:07:23PM +0800, Barry Song wrote: > On Tue, Sep 3, 2024 at 4:57 PM Greg Kroah-Hartman > <gregkh@linuxfoundation.org> wrote: > > > > On Tue, Sep 03, 2024 at 10:50:09AM +1200, Barry Song wrote: > > > From: Barry Song <v-songbaohua@oppo.com> > > > > > > The mmap_write_lock() can block all access to the VMAs, for example page > > > faults. Performing memory allocation while holding this lock may trigger > > > direct reclamation, leading to others being queued in the rwsem for an > > > extended period. > > > We've observed that the allocation can sometimes take more than 300ms, > > > significantly blocking other threads. The user interface sometimes > > > becomes less responsive as a result. To prevent this, let's move the > > > allocation outside of the write lock. > > > A potential side effect could be an extra alloc_page() for the second > > > thread executing binder_install_single_page() while the first thread > > > has done it earlier. However, according to Tangquan's 48-hour profiling > > > using monkey, the likelihood of this occurring is minimal, with a ratio > > > of only 1 in 2400. Compared to the significantly costly rwsem, this is > > > negligible. > > > On the other hand, holding a write lock without making any VMA > > > modifications appears questionable and likely incorrect. While this > > > patch focuses on reducing the lock duration, future updates may aim > > > to eliminate the write lock entirely. > > > > > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > > > Cc: "Arve Hjønnevåg" <arve@android.com> > > > Cc: Todd Kjos <tkjos@android.com> > > > Cc: Martijn Coenen <maco@android.com> > > > Cc: Joel Fernandes <joel@joelfernandes.org> > > > Cc: Christian Brauner <brauner@kernel.org> > > > Cc: Carlos Llamas <cmllamas@google.com> > > > Cc: Suren Baghdasaryan <surenb@google.com> > > > Tested-by: Tangquan Zheng <zhengtangquan@oppo.com> > > > Signed-off-by: Barry Song <v-songbaohua@oppo.com> > > > --- > > > drivers/android/binder_alloc.c | 18 ++++++++++++++---- > > > 1 file changed, 14 insertions(+), 4 deletions(-) > > > > > > diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c > > > index b3acbc4174fb..f20074e23a7c 100644 > > > --- a/drivers/android/binder_alloc.c > > > +++ b/drivers/android/binder_alloc.c > > > @@ -227,13 +227,23 @@ static int binder_install_single_page(struct binder_alloc *alloc, > > > if (!mmget_not_zero(alloc->mm)) > > > return -ESRCH; > > > > > > + /* > > > + * Don't allocate page in mmap_write_lock, this can block > > > + * mmap_rwsem for a long time; Meanwhile, allocation failure > > > + * doesn't necessarily need to return -ENOMEM, if lru_page > > > + * has been installed, we can still return 0(success). > > > + */ > > > + page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO); > > > > But now you are allocating new pages even if binder_get_installed_page() > > is an error, right? Doesn't that slow things down? > > very very unlikely, as the ratio is only 1/2400 while write lock 100% blocks > everyone. Ok, but: > > How was this benchmarked? > > i actually put Tangquan's profiling result: > " > However, according to Tangquan's 48-hour profiling > using monkey, the likelihood of this occurring is minimal, with a ratio > of only 1 in 2400. Compared to the significantly costly rwsem, this is > negligible." That's not a benchmark, or any real numbers of how this overall saves any time. > > > /* > > > * Protected with mmap_sem in write mode as multiple tasks > > > * might race to install the same page. 
> > > */ > > > mmap_write_lock(alloc->mm); > > > - if (binder_get_installed_page(lru_page)) > > > + if (binder_get_installed_page(lru_page)) { > > > + ret = 1; > > > > That is not a valid error value :( > > > > > goto out; > > > + } > > > > > > if (!alloc->vma) { > > > pr_err("%d: %s failed, no vma\n", alloc->pid, __func__); > > > @@ -241,7 +251,6 @@ static int binder_install_single_page(struct binder_alloc *alloc, > > > goto out; > > > } > > > > > > - page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO); > > > if (!page) { > > > pr_err("%d: failed to allocate page\n", alloc->pid); > > > ret = -ENOMEM; > > > @@ -252,7 +261,6 @@ static int binder_install_single_page(struct binder_alloc *alloc, > > > if (ret) { > > > pr_err("%d: %s failed to insert page at offset %lx with %d\n", > > > alloc->pid, __func__, addr - alloc->buffer, ret); > > > - __free_page(page); > > > ret = -ENOMEM; > > > goto out; > > > } > > > @@ -262,7 +270,9 @@ static int binder_install_single_page(struct binder_alloc *alloc, > > > out: > > > mmap_write_unlock(alloc->mm); > > > mmput_async(alloc->mm); > > > - return ret; > > > + if (ret && page) > > > + __free_page(page); > > > + return ret < 0 ? ret : 0; > > > > Please only use ? : for when you have to, otherwise please spell it out > > with a normal if statement: > > if (ret < 0) > > return ret; > > return 0; > > > > But, this is abusing the fact that you set "ret = 1" above, which is > > going to trip someone up in the future as that is NOT a normal coding > > pattern we have in the kernel, sorry. > > > > If you insist on this change, please rework it to not have that type of > > "positive means one thing, 0 means another, and negative means yet > > something else" please. > > I was trying to consolidate all free_page() calls into one place. Otherwise, > we would need multiple free_page() calls. I'm perfectly fine with having more > free_page() calls in both the ret = 1 and ret < 0 paths. In that case, > the ret = 1 > path can be removed if you prefer. Remember, we write code for people first, and compilers second. You have to maintain this code for the next 10+ years, make it _VERY_ obvious what is happening and how it works as you will be coming back to it and not remembering what was made for what reason at all. thanks, greg k-h
On Tue, Sep 3, 2024 at 6:37 PM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > On Tue, Sep 03, 2024 at 05:07:23PM +0800, Barry Song wrote: > > On Tue, Sep 3, 2024 at 4:57 PM Greg Kroah-Hartman > > <gregkh@linuxfoundation.org> wrote: > > > > > > On Tue, Sep 03, 2024 at 10:50:09AM +1200, Barry Song wrote: > > > > From: Barry Song <v-songbaohua@oppo.com> > > > > > > > > The mmap_write_lock() can block all access to the VMAs, for example page > > > > faults. Performing memory allocation while holding this lock may trigger > > > > direct reclamation, leading to others being queued in the rwsem for an > > > > extended period. > > > > We've observed that the allocation can sometimes take more than 300ms, > > > > significantly blocking other threads. The user interface sometimes > > > > becomes less responsive as a result. To prevent this, let's move the > > > > allocation outside of the write lock. > > > > A potential side effect could be an extra alloc_page() for the second > > > > thread executing binder_install_single_page() while the first thread > > > > has done it earlier. However, according to Tangquan's 48-hour profiling > > > > using monkey, the likelihood of this occurring is minimal, with a ratio > > > > of only 1 in 2400. Compared to the significantly costly rwsem, this is > > > > negligible. > > > > On the other hand, holding a write lock without making any VMA > > > > modifications appears questionable and likely incorrect. While this > > > > patch focuses on reducing the lock duration, future updates may aim > > > > to eliminate the write lock entirely. > > > > > > > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > > > > Cc: "Arve Hjønnevåg" <arve@android.com> > > > > Cc: Todd Kjos <tkjos@android.com> > > > > Cc: Martijn Coenen <maco@android.com> > > > > Cc: Joel Fernandes <joel@joelfernandes.org> > > > > Cc: Christian Brauner <brauner@kernel.org> > > > > Cc: Carlos Llamas <cmllamas@google.com> > > > > Cc: Suren Baghdasaryan <surenb@google.com> > > > > Tested-by: Tangquan Zheng <zhengtangquan@oppo.com> > > > > Signed-off-by: Barry Song <v-songbaohua@oppo.com> > > > > --- > > > > drivers/android/binder_alloc.c | 18 ++++++++++++++---- > > > > 1 file changed, 14 insertions(+), 4 deletions(-) > > > > > > > > diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c > > > > index b3acbc4174fb..f20074e23a7c 100644 > > > > --- a/drivers/android/binder_alloc.c > > > > +++ b/drivers/android/binder_alloc.c > > > > @@ -227,13 +227,23 @@ static int binder_install_single_page(struct binder_alloc *alloc, > > > > if (!mmget_not_zero(alloc->mm)) > > > > return -ESRCH; > > > > > > > > + /* > > > > + * Don't allocate page in mmap_write_lock, this can block > > > > + * mmap_rwsem for a long time; Meanwhile, allocation failure > > > > + * doesn't necessarily need to return -ENOMEM, if lru_page > > > > + * has been installed, we can still return 0(success). > > > > + */ > > > > + page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO); > > > > > > But now you are allocating new pages even if binder_get_installed_page() > > > is an error, right? Doesn't that slow things down? > > > > very very unlikely, as the ratio is only 1/2400 while write lock 100% blocks > > everyone. > > Ok, but: > > > > How was this benchmarked? > > > > i actually put Tangquan's profiling result: > > " > > However, according to Tangquan's 48-hour profiling > > using monkey, the likelihood of this occurring is minimal, with a ratio > > of only 1 in 2400. 
Compared to the significantly costly rwsem, this is > > negligible." > > That's not a benchmark, or any real numbers of how this overall saves > any time. My point is that mmap_rw is one of the most notorious locks impacting system performance. Our profiling indicates that this binder's rwsem lock is among the top two write locks, blocking page faults and other accesses to the VMA. From the memory management perspective, I believe the improvement will be quite evident. However, I agree that we should include more data to show how reducing lock contention can enhance overall user experience and minimize UI lag. > > > > > /* > > > > * Protected with mmap_sem in write mode as multiple tasks > > > > * might race to install the same page. > > > > */ > > > > mmap_write_lock(alloc->mm); > > > > - if (binder_get_installed_page(lru_page)) > > > > + if (binder_get_installed_page(lru_page)) { > > > > + ret = 1; > > > > > > That is not a valid error value :( > > > > > > > goto out; > > > > + } > > > > > > > > if (!alloc->vma) { > > > > pr_err("%d: %s failed, no vma\n", alloc->pid, __func__); > > > > @@ -241,7 +251,6 @@ static int binder_install_single_page(struct binder_alloc *alloc, > > > > goto out; > > > > } > > > > > > > > - page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO); > > > > if (!page) { > > > > pr_err("%d: failed to allocate page\n", alloc->pid); > > > > ret = -ENOMEM; > > > > @@ -252,7 +261,6 @@ static int binder_install_single_page(struct binder_alloc *alloc, > > > > if (ret) { > > > > pr_err("%d: %s failed to insert page at offset %lx with %d\n", > > > > alloc->pid, __func__, addr - alloc->buffer, ret); > > > > - __free_page(page); > > > > ret = -ENOMEM; > > > > goto out; > > > > } > > > > @@ -262,7 +270,9 @@ static int binder_install_single_page(struct binder_alloc *alloc, > > > > out: > > > > mmap_write_unlock(alloc->mm); > > > > mmput_async(alloc->mm); > > > > - return ret; > > > > + if (ret && page) > > > > + __free_page(page); > > > > + return ret < 0 ? ret : 0; > > > > > > Please only use ? : for when you have to, otherwise please spell it out > > > with a normal if statement: > > > if (ret < 0) > > > return ret; > > > return 0; > > > > > > But, this is abusing the fact that you set "ret = 1" above, which is > > > going to trip someone up in the future as that is NOT a normal coding > > > pattern we have in the kernel, sorry. > > > > > > If you insist on this change, please rework it to not have that type of > > > "positive means one thing, 0 means another, and negative means yet > > > something else" please. > > > > I was trying to consolidate all free_page() calls into one place. Otherwise, > > we would need multiple free_page() calls. I'm perfectly fine with having more > > free_page() calls in both the ret = 1 and ret < 0 paths. In that case, > > the ret = 1 > > path can be removed if you prefer. > > Remember, we write code for people first, and compilers second. You > have to maintain this code for the next 10+ years, make it _VERY_ > obvious what is happening and how it works as you will be coming back to > it and not remembering what was made for what reason at all. no objection to this. > > thanks, > > greg k-h Thanks Barry
On Tue, Sep 03, 2024 at 10:50:09AM +1200, Barry Song wrote: > From: Barry Song <v-songbaohua@oppo.com> > > The mmap_write_lock() can block all access to the VMAs, for example page > faults. Performing memory allocation while holding this lock may trigger > direct reclamation, leading to others being queued in the rwsem for an > extended period. > We've observed that the allocation can sometimes take more than 300ms, > significantly blocking other threads. The user interface sometimes > becomes less responsive as a result. To prevent this, let's move the > allocation outside of the write lock. I suspect concurrent allocators make things better wrt response, cutting alloc latency down to 10ms for instance in your scenario. Feel free to show figures given Tangquan's 48-hour profiling. > A potential side effect could be an extra alloc_page() for the second > thread executing binder_install_single_page() while the first thread > has done it earlier. However, according to Tangquan's 48-hour profiling > using monkey, the likelihood of this occurring is minimal, with a ratio > of only 1 in 2400. Compared to the significantly costly rwsem, this is > negligible. > On the other hand, holding a write lock without making any VMA > modifications appears questionable and likely incorrect. While this > patch focuses on reducing the lock duration, future updates may aim > to eliminate the write lock entirely. If spin, better not before taking a look at vm_insert_page().
On Tue, Sep 3, 2024 at 7:01 PM Hillf Danton <hdanton@sina.com> wrote: > > On Tue, Sep 03, 2024 at 10:50:09AM +1200, Barry Song wrote: > > From: Barry Song <v-songbaohua@oppo.com> > > > > The mmap_write_lock() can block all access to the VMAs, for example page > > faults. Performing memory allocation while holding this lock may trigger > > direct reclamation, leading to others being queued in the rwsem for an > > extended period. > > We've observed that the allocation can sometimes take more than 300ms, > > significantly blocking other threads. The user interface sometimes > > becomes less responsive as a result. To prevent this, let's move the > > allocation outside of the write lock. > > I suspect concurrent allocators make things better wrt response, cutting > alloc latency down to 10ms for instance in your scenario. Feel free to > show figures given Tangquan's 48-hour profiling. Likely. Concurrent allocators are quite common in PFs which occur in the same PTE. whoever gets PTL sets PTE, others free the allocated pages. > > > A potential side effect could be an extra alloc_page() for the second > > thread executing binder_install_single_page() while the first thread > > has done it earlier. However, according to Tangquan's 48-hour profiling > > using monkey, the likelihood of this occurring is minimal, with a ratio > > of only 1 in 2400. Compared to the significantly costly rwsem, this is > > negligible. > > On the other hand, holding a write lock without making any VMA > > modifications appears questionable and likely incorrect. While this > > patch focuses on reducing the lock duration, future updates may aim > > to eliminate the write lock entirely. > > If spin, better not before taking a look at vm_insert_page(). I have patch 2/3 transitioning to mmap_read_lock, and per_vma_lock is currently in the testing queue. At the moment, alloc->spin is in place, but I'm not entirely convinced it's the best replacement for the write lock. Let's wait for Tangquan's test results. Patch 2 is detailed below, but it has only passed the build-test phase so far, so its result is uncertain. I'm sharing it early in case you find it interesting. And I am not convinced Commit d1d8875c8c13 ("binder: fix UAF of alloc->vma in race with munmap()") is a correct fix to really avoid all UAF of alloc->vma. [PATCH] binder_alloc: Don't use mmap_write_lock for installing page Commit d1d8875c8c13 ("binder: fix UAF of alloc->vma in race with munmap()") uses the mmap_rwsem write lock to protect against a race condition with munmap, where the vma is detached by the write lock, but pages are zapped by the read lock. This approach is extremely expensive for the system, though perhaps less so for binder itself, as the write lock can block all other operations. As an alternative, we could hold only the read lock and re-check that the vma hasn't been detached. To protect simultaneous page installation, we could use alloc->lock instead. 
Signed-off-by: Barry Song <v-songbaohua@oppo.com> --- drivers/android/binder_alloc.c | 32 +++++++++++++++++--------------- 1 file changed, 17 insertions(+), 15 deletions(-) diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c index f20074e23a7c..a2281dfacbbc 100644 --- a/drivers/android/binder_alloc.c +++ b/drivers/android/binder_alloc.c @@ -228,24 +228,17 @@ static int binder_install_single_page(struct binder_alloc *alloc, return -ESRCH; /* - * Don't allocate page in mmap_write_lock, this can block - * mmap_rwsem for a long time; Meanwhile, allocation failure - * doesn't necessarily need to return -ENOMEM, if lru_page - * has been installed, we can still return 0(success). + * Allocation failure doesn't necessarily need to return -ENOMEM, + * if lru_page has been installed, we can still return 0(success). + * So, defer the !page check until after binder_get_installed_page() + * is completed. */ page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO); - /* - * Protected with mmap_sem in write mode as multiple tasks - * might race to install the same page. - */ - mmap_write_lock(alloc->mm); - if (binder_get_installed_page(lru_page)) { - ret = 1; - goto out; - } + mmap_read_lock(alloc->mm); - if (!alloc->vma) { + /* vma might have been dropped or deattached */ + if (!alloc->vma || !find_vma(alloc->mm, addr)) { pr_err("%d: %s failed, no vma\n", alloc->pid, __func__); ret = -ESRCH; goto out; @@ -257,18 +250,27 @@ static int binder_install_single_page(struct binder_alloc *alloc, goto out; } + spin_lock(&alloc->lock); + if (binder_get_installed_page(lru_page)) { + spin_unlock(&alloc->lock); + ret = 1; + goto out; + } + ret = vm_insert_page(alloc->vma, addr, page); if (ret) { pr_err("%d: %s failed to insert page at offset %lx with %d\n", alloc->pid, __func__, addr - alloc->buffer, ret); + spin_unlock(&alloc->lock); ret = -ENOMEM; goto out; } /* Mark page installation complete and safe to use */ binder_set_installed_page(lru_page, page); + spin_unlock(&alloc->lock); out: - mmap_write_unlock(alloc->mm); + mmap_read_unlock(alloc->mm); mmput_async(alloc->mm); if (ret && page) __free_page(page); -- 2.39.3 (Apple Git-146)
On Tue, Sep 03, 2024 at 07:45:12PM +0800, Barry Song wrote: > On Tue, Sep 3, 2024 at 7:01 PM Hillf Danton <hdanton@sina.com> wrote: > > > > On Tue, Sep 03, 2024 at 10:50:09AM +1200, Barry Song wrote: > > > From: Barry Song <v-songbaohua@oppo.com> > > > > > > The mmap_write_lock() can block all access to the VMAs, for example page > > > faults. Performing memory allocation while holding this lock may trigger > > > direct reclamation, leading to others being queued in the rwsem for an > > > extended period. > > > We've observed that the allocation can sometimes take more than 300ms, > > > significantly blocking other threads. The user interface sometimes > > > becomes less responsive as a result. To prevent this, let's move the > > > allocation outside of the write lock. Thanks for you patch Barry. So, we are aware of this contention and I've been working on a fix for it. See more about this below. > > > > I suspect concurrent allocators make things better wrt response, cutting > > alloc latency down to 10ms for instance in your scenario. Feel free to > > show figures given Tangquan's 48-hour profiling. > > Likely. > > Concurrent allocators are quite common in PFs which occur > in the same PTE. whoever gets PTL sets PTE, others free the allocated > pages. > > > > > > A potential side effect could be an extra alloc_page() for the second > > > thread executing binder_install_single_page() while the first thread > > > has done it earlier. However, according to Tangquan's 48-hour profiling > > > using monkey, the likelihood of this occurring is minimal, with a ratio > > > of only 1 in 2400. Compared to the significantly costly rwsem, this is > > > negligible. This is not negligible. In fact, it is the exact reason for the page allocation to be done with the mmap sem. If the first thread sleeps on vm_insert_page(), then binder gets into a bad state of multiple threads trying to reclaim pages that won't really be used. Memory pressure goes from bad to worst pretty quick. FWIW, I believe this was first talked about here: https://lore.kernel.org/all/ZWmNpxPXZSxdmDE1@google.com/ > > > On the other hand, holding a write lock without making any VMA > > > modifications appears questionable and likely incorrect. While this > > > patch focuses on reducing the lock duration, future updates may aim > > > to eliminate the write lock entirely. > > > > If spin, better not before taking a look at vm_insert_page(). > > I have patch 2/3 transitioning to mmap_read_lock, and per_vma_lock is > currently in the > testing queue. At the moment, alloc->spin is in place, but I'm not > entirely convinced > it's the best replacement for the write lock. Let's wait for > Tangquan's test results. > > Patch 2 is detailed below, but it has only passed the build-test phase > so far, so > its result is uncertain. I'm sharing it early in case you find it > interesting. And I > am not convinced Commit d1d8875c8c13 ("binder: fix UAF of alloc->vma in > race with munmap()") is a correct fix to really avoid all UAF of alloc->vma. > > [PATCH] binder_alloc: Don't use mmap_write_lock for installing page > > Commit d1d8875c8c13 ("binder: fix UAF of alloc->vma in race with > munmap()") uses the mmap_rwsem write lock to protect against a race > condition with munmap, where the vma is detached by the write lock, > but pages are zapped by the read lock. This approach is extremely > expensive for the system, though perhaps less so for binder itself, > as the write lock can block all other operations. 
> > As an alternative, we could hold only the read lock and re-check > that the vma hasn't been detached. To protect simultaneous page > installation, we could use alloc->lock instead. > > Signed-off-by: Barry Song <v-songbaohua@oppo.com> > --- > drivers/android/binder_alloc.c | 32 +++++++++++++++++--------------- > 1 file changed, 17 insertions(+), 15 deletions(-) > > diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c > index f20074e23a7c..a2281dfacbbc 100644 > --- a/drivers/android/binder_alloc.c > +++ b/drivers/android/binder_alloc.c > @@ -228,24 +228,17 @@ static int binder_install_single_page(struct > binder_alloc *alloc, > return -ESRCH; > > /* > - * Don't allocate page in mmap_write_lock, this can block > - * mmap_rwsem for a long time; Meanwhile, allocation failure > - * doesn't necessarily need to return -ENOMEM, if lru_page > - * has been installed, we can still return 0(success). > + * Allocation failure doesn't necessarily need to return -ENOMEM, > + * if lru_page has been installed, we can still return 0(success). > + * So, defer the !page check until after binder_get_installed_page() > + * is completed. > */ > page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO); > > - /* > - * Protected with mmap_sem in write mode as multiple tasks > - * might race to install the same page. > - */ > - mmap_write_lock(alloc->mm); > - if (binder_get_installed_page(lru_page)) { > - ret = 1; > - goto out; > - } > + mmap_read_lock(alloc->mm); > > - if (!alloc->vma) { > + /* vma might have been dropped or deattached */ > + if (!alloc->vma || !find_vma(alloc->mm, addr)) { > pr_err("%d: %s failed, no vma\n", alloc->pid, __func__); > ret = -ESRCH; > goto out; > @@ -257,18 +250,27 @@ static int binder_install_single_page(struct > binder_alloc *alloc, > goto out; > } > > + spin_lock(&alloc->lock); You can't hold a spinlock and then call vm_insert_page(). > + if (binder_get_installed_page(lru_page)) { > + spin_unlock(&alloc->lock); > + ret = 1; > + goto out; > + } > + > ret = vm_insert_page(alloc->vma, addr, page); > if (ret) { > pr_err("%d: %s failed to insert page at offset %lx with %d\n", > alloc->pid, __func__, addr - alloc->buffer, ret); > + spin_unlock(&alloc->lock); > ret = -ENOMEM; > goto out; > } > > /* Mark page installation complete and safe to use */ > binder_set_installed_page(lru_page, page); > + spin_unlock(&alloc->lock); > out: > - mmap_write_unlock(alloc->mm); > + mmap_read_unlock(alloc->mm); > mmput_async(alloc->mm); > if (ret && page) > __free_page(page); > -- > 2.39.3 (Apple Git-146) Sorry, but as I mentioned, I've been working on fixing this contention by supporting concurrent "faults" in binder_install_single_page(). This is the appropriate fix. I should be sending a patch soon after working out the conflicts with the shrinker's callback. Thanks, -- Carlos Llamas
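[Editor's note] For context on Carlos's objection: vm_insert_page() can sleep, for example when it has to allocate page-table pages, so it must not be called with a spinlock held. A hedged sketch of one ordering that avoids this, assuming the alloc->lock spinlock and helpers from Barry's draft, is to do the sleeping insert under mmap_read_lock() only and treat -EBUSY (the PTE was already populated by a racing installer) as success, taking the spinlock just to publish the installed page:

	/* Sketch only, not compile-tested. */
	mmap_read_lock(alloc->mm);
	if (!alloc->vma) {
		ret = -ESRCH;
		goto unlock;
	}

	/* May sleep; must not hold alloc->lock here. */
	ret = vm_insert_page(alloc->vma, addr, page);
	switch (ret) {
	case -EBUSY:
		ret = 0;	/* a racing task already mapped this offset */
		break;
	case 0:
		spin_lock(&alloc->lock);
		binder_set_installed_page(lru_page, page);
		spin_unlock(&alloc->lock);
		page = NULL;	/* ownership passed to the mapping */
		break;
	default:
		ret = -ENOMEM;
		break;
	}
unlock:
	mmap_read_unlock(alloc->mm);

This is only the skeleton: as Carlos points out later in the thread, the -EBUSY path and the shrinker callback in binder_alloc_free_page() still have to agree on who publishes and who frees the page.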
On Wed, Sep 4, 2024 at 1:55 AM Carlos Llamas <cmllamas@google.com> wrote: > > On Tue, Sep 03, 2024 at 07:45:12PM +0800, Barry Song wrote: > > On Tue, Sep 3, 2024 at 7:01 PM Hillf Danton <hdanton@sina.com> wrote: > > > > > > On Tue, Sep 03, 2024 at 10:50:09AM +1200, Barry Song wrote: > > > > From: Barry Song <v-songbaohua@oppo.com> > > > > > > > > The mmap_write_lock() can block all access to the VMAs, for example page > > > > faults. Performing memory allocation while holding this lock may trigger > > > > direct reclamation, leading to others being queued in the rwsem for an > > > > extended period. > > > > We've observed that the allocation can sometimes take more than 300ms, > > > > significantly blocking other threads. The user interface sometimes > > > > becomes less responsive as a result. To prevent this, let's move the > > > > allocation outside of the write lock. > > Thanks for you patch Barry. So, we are aware of this contention and I've > been working on a fix for it. See more about this below. Cool, Carlos! > > > > > > > I suspect concurrent allocators make things better wrt response, cutting > > > alloc latency down to 10ms for instance in your scenario. Feel free to > > > show figures given Tangquan's 48-hour profiling. > > > > Likely. > > > > Concurrent allocators are quite common in PFs which occur > > in the same PTE. whoever gets PTL sets PTE, others free the allocated > > pages. > > > > > > > > > A potential side effect could be an extra alloc_page() for the second > > > > thread executing binder_install_single_page() while the first thread > > > > has done it earlier. However, according to Tangquan's 48-hour profiling > > > > using monkey, the likelihood of this occurring is minimal, with a ratio > > > > of only 1 in 2400. Compared to the significantly costly rwsem, this is > > > > negligible. > > This is not negligible. In fact, it is the exact reason for the page > allocation to be done with the mmap sem. If the first thread sleeps on > vm_insert_page(), then binder gets into a bad state of multiple threads > trying to reclaim pages that won't really be used. Memory pressure goes > from bad to worst pretty quick. > > FWIW, I believe this was first talked about here: > https://lore.kernel.org/all/ZWmNpxPXZSxdmDE1@google.com/ However, I'm not entirely convinced that this is a problem :-) Concurrent allocations like this can occur in many places, especially in PFs. Reclamation is not useless because it helps free up memory for others; it's not without value. I also don't believe binder is one of the largest users executing concurrent allocations. > > > > > > On the other hand, holding a write lock without making any VMA > > > > modifications appears questionable and likely incorrect. While this > > > > patch focuses on reducing the lock duration, future updates may aim > > > > to eliminate the write lock entirely. > > > > > > If spin, better not before taking a look at vm_insert_page(). > > > > I have patch 2/3 transitioning to mmap_read_lock, and per_vma_lock is > > currently in the > > testing queue. At the moment, alloc->spin is in place, but I'm not > > entirely convinced > > it's the best replacement for the write lock. Let's wait for > > Tangquan's test results. > > > > Patch 2 is detailed below, but it has only passed the build-test phase > > so far, so > > its result is uncertain. I'm sharing it early in case you find it > > interesting. 
And I > > am not convinced Commit d1d8875c8c13 ("binder: fix UAF of alloc->vma in > > race with munmap()") is a correct fix to really avoid all UAF of alloc->vma. > > > > [PATCH] binder_alloc: Don't use mmap_write_lock for installing page > > > > Commit d1d8875c8c13 ("binder: fix UAF of alloc->vma in race with > > munmap()") uses the mmap_rwsem write lock to protect against a race > > condition with munmap, where the vma is detached by the write lock, > > but pages are zapped by the read lock. This approach is extremely > > expensive for the system, though perhaps less so for binder itself, > > as the write lock can block all other operations. > > > > As an alternative, we could hold only the read lock and re-check > > that the vma hasn't been detached. To protect simultaneous page > > installation, we could use alloc->lock instead. > > > > Signed-off-by: Barry Song <v-songbaohua@oppo.com> > > --- > > drivers/android/binder_alloc.c | 32 +++++++++++++++++--------------- > > 1 file changed, 17 insertions(+), 15 deletions(-) > > > > diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c > > index f20074e23a7c..a2281dfacbbc 100644 > > --- a/drivers/android/binder_alloc.c > > +++ b/drivers/android/binder_alloc.c > > @@ -228,24 +228,17 @@ static int binder_install_single_page(struct > > binder_alloc *alloc, > > return -ESRCH; > > > > /* > > - * Don't allocate page in mmap_write_lock, this can block > > - * mmap_rwsem for a long time; Meanwhile, allocation failure > > - * doesn't necessarily need to return -ENOMEM, if lru_page > > - * has been installed, we can still return 0(success). > > + * Allocation failure doesn't necessarily need to return -ENOMEM, > > + * if lru_page has been installed, we can still return 0(success). > > + * So, defer the !page check until after binder_get_installed_page() > > + * is completed. > > */ > > page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO); > > > > - /* > > - * Protected with mmap_sem in write mode as multiple tasks > > - * might race to install the same page. > > - */ > > - mmap_write_lock(alloc->mm); > > - if (binder_get_installed_page(lru_page)) { > > - ret = 1; > > - goto out; > > - } > > + mmap_read_lock(alloc->mm); > > > > - if (!alloc->vma) { > > + /* vma might have been dropped or deattached */ > > + if (!alloc->vma || !find_vma(alloc->mm, addr)) { > > pr_err("%d: %s failed, no vma\n", alloc->pid, __func__); > > ret = -ESRCH; > > goto out; > > @@ -257,18 +250,27 @@ static int binder_install_single_page(struct > > binder_alloc *alloc, > > goto out; > > } > > > > + spin_lock(&alloc->lock); > > You can't hold a spinlock and then call vm_insert_page(). Thanks! This patch has only passed the build test so far. It seems like we can hold off on further testing for now. 
> > > + if (binder_get_installed_page(lru_page)) { > > + spin_unlock(&alloc->lock); > > + ret = 1; > > + goto out; > > + } > > + > > ret = vm_insert_page(alloc->vma, addr, page); > > if (ret) { > > pr_err("%d: %s failed to insert page at offset %lx with %d\n", > > alloc->pid, __func__, addr - alloc->buffer, ret); > > + spin_unlock(&alloc->lock); > > ret = -ENOMEM; > > goto out; > > } > > > > /* Mark page installation complete and safe to use */ > > binder_set_installed_page(lru_page, page); > > + spin_unlock(&alloc->lock); > > out: > > - mmap_write_unlock(alloc->mm); > > + mmap_read_unlock(alloc->mm); > > mmput_async(alloc->mm); > > if (ret && page) > > __free_page(page); > > -- > > 2.39.3 (Apple Git-146) > > > Sorry, but as I mentioned, I've been working on fixing this contention > by supporting concurrent "faults" in binder_install_single_page(). This > is the appropriate fix. I should be sending a patch soon after working > out the conflicts with the shrinker's callback. Awesome! I’m eager to see your patch, and we’re ready to help with testing. I strongly recommend dropping the write lock entirely. Using `mmap_write_lock()` isn’t just a binder-specific concern; it has the potential to affect the entire Android system. In patch 3, I experimented with using `per_vma_lock` as well. I’m _not_ proposing it for merging since you’re already working on it, but I wanted to share the idea. (just like patch2, it has only passed build-test) [PATCH] binder_alloc: Further move to per_vma_lock from mmap_read_lock To further reduce the read lock duration, let's try using per_vma_lock first. If that fails, we can take the read lock, similar to how page fault handlers operate. Signed-off-by: Barry Song <v-songbaohua@oppo.com> --- drivers/android/binder_alloc.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c index a2281dfacbbc..b40a5dd650c8 100644 --- a/drivers/android/binder_alloc.c +++ b/drivers/android/binder_alloc.c @@ -221,6 +221,8 @@ static int binder_install_single_page(struct binder_alloc *alloc, struct binder_lru_page *lru_page, unsigned long addr) { + struct vm_area_struct *vma; + bool per_vma_lock = true; struct page *page; int ret = 0; @@ -235,10 +237,15 @@ static int binder_install_single_page(struct binder_alloc *alloc, */ page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO); - mmap_read_lock(alloc->mm); + vma = lock_vma_under_rcu(alloc->mm, addr); + if (!vma) { + per_vma_lock = false; + mmap_read_lock(alloc->mm); + vma = find_vma(alloc->mm, addr); + } - /* vma might have been dropped or deattached */ - if (!alloc->vma || !find_vma(alloc->mm, addr)) { + /* vma might have been dropped, deattached or changed to new one */ + if (!alloc->vma || !vma || vma != alloc->vma) { pr_err("%d: %s failed, no vma\n", alloc->pid, __func__); ret = -ESRCH; goto out; @@ -270,7 +277,10 @@ static int binder_install_single_page(struct binder_alloc *alloc, binder_set_installed_page(lru_page, page); spin_unlock(&alloc->lock); out: - mmap_read_unlock(alloc->mm); + if (per_vma_lock) + vma_end_read(vma); + else + mmap_read_unlock(alloc->mm); mmput_async(alloc->mm); if (ret && page) __free_page(page);
On Wed, Sep 04, 2024 at 09:10:20AM +1200, Barry Song wrote: > However, I'm not entirely convinced that this is a problem :-) Concurrent > allocations like this can occur in many places, especially in PFs. Reclamation > is not useless because it helps free up memory for others; it's not > without value. Yeah, for binder though there is a high chance of multiple callers trying to allocate the _same_ page concurrently. What I observed is that pages being reclaimed "in vain" are often in the same vma and this means subsequent callers will need to allocate these pages. But even if this wasn't an issue, the correct solution would still be to support concurrent faults. So, in reality it doesn't matter and we still need to refactor the call to be non-exclusive. > I also don't believe binder is one of the largest users executing concurrent > allocations. Oh, I have no statistics on this. > Awesome! I’m eager to see your patch, and we’re ready to help with testing. > I strongly recommend dropping the write lock entirely. Using > `mmap_write_lock()` isn’t just a binder-specific concern; it has the > potential to affect the entire Android system. > > In patch 3, I experimented with using `per_vma_lock` as well. I’m _not_ > proposing it for merging since you’re already working on it, but I wanted > to share the idea. (just like patch2, it has only passed build-test) Yeap, I'm using the per-vma-lock too per Suren's suggestion. > > [PATCH] binder_alloc: Further move to per_vma_lock from mmap_read_lock > > To further reduce the read lock duration, let's try using per_vma_lock > first. If that fails, we can take the read lock, similar to how page > fault handlers operate. > > Signed-off-by: Barry Song <v-songbaohua@oppo.com> > --- > drivers/android/binder_alloc.c | 18 ++++++++++++++---- > 1 file changed, 14 insertions(+), 4 deletions(-) > > diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c > index a2281dfacbbc..b40a5dd650c8 100644 > --- a/drivers/android/binder_alloc.c > +++ b/drivers/android/binder_alloc.c > @@ -221,6 +221,8 @@ static int binder_install_single_page(struct > binder_alloc *alloc, > struct binder_lru_page *lru_page, > unsigned long addr) > { > + struct vm_area_struct *vma; > + bool per_vma_lock = true; > struct page *page; > int ret = 0; > > @@ -235,10 +237,15 @@ static int binder_install_single_page(struct > binder_alloc *alloc, > */ > page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO); > > - mmap_read_lock(alloc->mm); > + vma = lock_vma_under_rcu(alloc->mm, addr); > + if (!vma) { > + per_vma_lock = false; > + mmap_read_lock(alloc->mm); > + vma = find_vma(alloc->mm, addr); > + } > > - /* vma might have been dropped or deattached */ > - if (!alloc->vma || !find_vma(alloc->mm, addr)) { > + /* vma might have been dropped, deattached or changed to new one */ > + if (!alloc->vma || !vma || vma != alloc->vma) { > pr_err("%d: %s failed, no vma\n", alloc->pid, __func__); > ret = -ESRCH; > goto out; > @@ -270,7 +277,10 @@ static int binder_install_single_page(struct > binder_alloc *alloc, > binder_set_installed_page(lru_page, page); > spin_unlock(&alloc->lock); > out: > - mmap_read_unlock(alloc->mm); > + if (per_vma_lock) > + vma_end_read(vma); > + else > + mmap_read_unlock(alloc->mm); > mmput_async(alloc->mm); > if (ret && page) > __free_page(page); > -- > 2.39.3 (Apple Git-146) This looks fairly similar to my patch. 
However, you would need to handle the case were vm_insert_page() returns -EBUSY (success that raced) and also sync with the shrinker callbacks in binder_alloc_free_page() which is the biggest concern. Let's not duplicate efforts though. Can we please wait for my patch? I'll add you as Co-Developed-by, since you've posted this already? Regards, Carlos Llamas
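[Editor's note] The fallback in Barry's third draft mirrors what architecture page-fault handlers do: try the per-VMA lock first and fall back to mmap_read_lock() if it cannot be taken. A minimal sketch of that pattern follows; the binder_lock_vma()/binder_unlock_vma() helper names are made up here for illustration and the code is not compile-tested.

/*
 * Hypothetical helpers showing the try-per-VMA-lock-then-fall-back pattern.
 * lock_vma_under_rcu() returns NULL when the VMA lock cannot be taken (or
 * when CONFIG_PER_VMA_LOCK is off), in which case we take mmap_read_lock()
 * like a conventional fault path.
 */
static struct vm_area_struct *binder_lock_vma(struct mm_struct *mm,
					      unsigned long addr,
					      bool *per_vma_lock)
{
	struct vm_area_struct *vma;

	*per_vma_lock = true;
	vma = lock_vma_under_rcu(mm, addr);
	if (vma)
		return vma;

	*per_vma_lock = false;
	mmap_read_lock(mm);
	vma = vma_lookup(mm, addr);	/* only a VMA that contains addr */
	if (!vma)
		mmap_read_unlock(mm);
	return vma;
}

static void binder_unlock_vma(struct mm_struct *mm,
			      struct vm_area_struct *vma,
			      bool per_vma_lock)
{
	if (per_vma_lock)
		vma_end_read(vma);
	else
		mmap_read_unlock(mm);
}

The caller would still need the vma == alloc->vma check from the draft, and, as noted above, the -EBUSY race and the shrinker synchronization are the harder parts.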
On Wed, Sep 4, 2024 at 10:23 AM Carlos Llamas <cmllamas@google.com> wrote: > > On Wed, Sep 04, 2024 at 09:10:20AM +1200, Barry Song wrote: > > However, I'm not entirely convinced that this is a problem :-) Concurrent > > allocations like this can occur in many places, especially in PFs. Reclamation > > is not useless because it helps free up memory for others; it's not > > without value. > > Yeah, for binder though there is a high chance of multiple callers > trying to allocate the _same_ page concurrently. What I observed is that > pages being reclaimed "in vain" are often in the same vma and this means > subsequent callers will need to allocate these pages. > > But even if this wasn't an issue, the correct solution would still be to > support concurrent faults. So, in reality it doesn't matter and we still > need to refactor the call to be non-exclusive. > > > I also don't believe binder is one of the largest users executing concurrent > > allocations. > > Oh, I have no statistics on this. > > > Awesome! I’m eager to see your patch, and we’re ready to help with testing. > > I strongly recommend dropping the write lock entirely. Using > > `mmap_write_lock()` isn’t just a binder-specific concern; it has the > > potential to affect the entire Android system. > > > > In patch 3, I experimented with using `per_vma_lock` as well. I’m _not_ > > proposing it for merging since you’re already working on it, but I wanted > > to share the idea. (just like patch2, it has only passed build-test) > > Yeap, I'm using the per-vma-lock too per Suren's suggestion. > > > > > [PATCH] binder_alloc: Further move to per_vma_lock from mmap_read_lock > > > > To further reduce the read lock duration, let's try using per_vma_lock > > first. If that fails, we can take the read lock, similar to how page > > fault handlers operate. 
> > > > Signed-off-by: Barry Song <v-songbaohua@oppo.com> > > --- > > drivers/android/binder_alloc.c | 18 ++++++++++++++---- > > 1 file changed, 14 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c > > index a2281dfacbbc..b40a5dd650c8 100644 > > --- a/drivers/android/binder_alloc.c > > +++ b/drivers/android/binder_alloc.c > > @@ -221,6 +221,8 @@ static int binder_install_single_page(struct > > binder_alloc *alloc, > > struct binder_lru_page *lru_page, > > unsigned long addr) > > { > > + struct vm_area_struct *vma; > > + bool per_vma_lock = true; > > struct page *page; > > int ret = 0; > > > > @@ -235,10 +237,15 @@ static int binder_install_single_page(struct > > binder_alloc *alloc, > > */ > > page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO); > > > > - mmap_read_lock(alloc->mm); > > + vma = lock_vma_under_rcu(alloc->mm, addr); > > + if (!vma) { > > + per_vma_lock = false; > > + mmap_read_lock(alloc->mm); > > + vma = find_vma(alloc->mm, addr); > > + } > > > > - /* vma might have been dropped or deattached */ > > - if (!alloc->vma || !find_vma(alloc->mm, addr)) { > > + /* vma might have been dropped, deattached or changed to new one */ > > + if (!alloc->vma || !vma || vma != alloc->vma) { > > pr_err("%d: %s failed, no vma\n", alloc->pid, __func__); > > ret = -ESRCH; > > goto out; > > @@ -270,7 +277,10 @@ static int binder_install_single_page(struct > > binder_alloc *alloc, > > binder_set_installed_page(lru_page, page); > > spin_unlock(&alloc->lock); > > out: > > - mmap_read_unlock(alloc->mm); > > + if (per_vma_lock) > > + vma_end_read(vma); > > + else > > + mmap_read_unlock(alloc->mm); > > mmput_async(alloc->mm); > > if (ret && page) > > __free_page(page); > > -- > > 2.39.3 (Apple Git-146) > > This looks fairly similar to my patch. However, you would need to handle > the case were vm_insert_page() returns -EBUSY (success that raced) and > also sync with the shrinker callbacks in binder_alloc_free_page() which > is the biggest concern. > > Let's not duplicate efforts though. Can we please wait for my patch? > I'll add you as Co-Developed-by, since you've posted this already? Yes, I’d be more than happy to wait for your patch, as I believe you have much much more experience with binder. > > Regards, > Carlos Llamas Thanks Barry
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index b3acbc4174fb..f20074e23a7c 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -227,13 +227,23 @@ static int binder_install_single_page(struct binder_alloc *alloc,
 	if (!mmget_not_zero(alloc->mm))
 		return -ESRCH;
 
+	/*
+	 * Don't allocate page in mmap_write_lock, this can block
+	 * mmap_rwsem for a long time; Meanwhile, allocation failure
+	 * doesn't necessarily need to return -ENOMEM, if lru_page
+	 * has been installed, we can still return 0(success).
+	 */
+	page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
+
 	/*
 	 * Protected with mmap_sem in write mode as multiple tasks
 	 * might race to install the same page.
 	 */
 	mmap_write_lock(alloc->mm);
-	if (binder_get_installed_page(lru_page))
+	if (binder_get_installed_page(lru_page)) {
+		ret = 1;
 		goto out;
+	}
 
 	if (!alloc->vma) {
 		pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
@@ -241,7 +251,6 @@ static int binder_install_single_page(struct binder_alloc *alloc,
 		goto out;
 	}
 
-	page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
 	if (!page) {
 		pr_err("%d: failed to allocate page\n", alloc->pid);
 		ret = -ENOMEM;
@@ -252,7 +261,6 @@ static int binder_install_single_page(struct binder_alloc *alloc,
 	if (ret) {
 		pr_err("%d: %s failed to insert page at offset %lx with %d\n",
 		       alloc->pid, __func__, addr - alloc->buffer, ret);
-		__free_page(page);
 		ret = -ENOMEM;
 		goto out;
 	}
@@ -262,7 +270,9 @@ static int binder_install_single_page(struct binder_alloc *alloc,
 out:
 	mmap_write_unlock(alloc->mm);
 	mmput_async(alloc->mm);
-	return ret;
+	if (ret && page)
+		__free_page(page);
+	return ret < 0 ? ret : 0;
 }
 
 static int binder_install_buffer_pages(struct binder_alloc *alloc,