Message ID | 20201016225254.3853109-1-guro@fb.com (mailing list archive) |
---|---|
Headers | show |
Series | mm: cma: make cma_release() non-blocking | expand |
On 10/16/20 3:52 PM, Roman Gushchin wrote: > This small patchset makes cma_release() non-blocking and simplifies > the code in hugetlbfs, where previously we had to temporarily drop > hugetlb_lock around the cma_release() call. > > It should help Zi Yan on his work on 1 GB THPs: splitting a gigantic > THP under a memory pressure requires a cma_release() call. If it's > a blocking function, it complicates the already complicated code. > Because there are at least two use cases like this (hugetlbfs is > another example), I believe it's just better to make cma_release() > non-blocking. > > It also makes it more consistent with other memory releasing functions > in the kernel: most of them are non-blocking. Thanks for looking into this Roman. I may be missing something, but why does cma_release have to be blocking today? Certainly, it takes the bitmap in cma_clear_bitmap and could block. However, I do not see why cma->lock has to be a mutex. I may be missing something, but I do not see any code protected by the mutex doing anything that could sleep? Could we simply change that mutex to a spinlock?
On 2020/10/17 6:52, Roman Gushchin wrote: > This small patchset makes cma_release() non-blocking and simplifies > the code in hugetlbfs, where previously we had to temporarily drop > hugetlb_lock around the cma_release() call. > > It should help Zi Yan on his work on 1 GB THPs: splitting a gigantic > THP under a memory pressure requires a cma_release() call. If it's > a blocking function, it complicates the already complicated code. > Because there are at least two use cases like this (hugetlbfs is > another example), I believe it's just better to make cma_release() > non-blocking. > > It also makes it more consistent with other memory releasing functions > in the kernel: most of them are non-blocking. > > > Roman Gushchin (2): > mm: cma: make cma_release() non-blocking > mm: hugetlb: don't drop hugetlb_lock around cma_release() call > > mm/cma.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++-- > mm/hugetlb.c | 6 ------ > 2 files changed, 49 insertions(+), 8 deletions(-) > I don't think this patch is a good idea.It transfers part or even all of the time of cma_release to cma_alloc, which is more concerned by performance indicators. On Android phones, CPU resource competition is intense in many scenarios, As a result, kernel threads and workers can be scheduled only after some ticks or more. In this case, the performance of cma_alloc will deteriorate significantly, which is not good news for many services on Android.
On Wed, Oct 21, 2020 at 05:15:53PM -0700, Mike Kravetz wrote: > On 10/16/20 3:52 PM, Roman Gushchin wrote: > > This small patchset makes cma_release() non-blocking and simplifies > > the code in hugetlbfs, where previously we had to temporarily drop > > hugetlb_lock around the cma_release() call. > > > > It should help Zi Yan on his work on 1 GB THPs: splitting a gigantic > > THP under a memory pressure requires a cma_release() call. If it's > > a blocking function, it complicates the already complicated code. > > Because there are at least two use cases like this (hugetlbfs is > > another example), I believe it's just better to make cma_release() > > non-blocking. > > > > It also makes it more consistent with other memory releasing functions > > in the kernel: most of them are non-blocking. > > Thanks for looking into this Roman. Hi Mike, > > I may be missing something, but why does cma_release have to be blocking > today? Certainly, it takes the bitmap in cma_clear_bitmap and could > block. However, I do not see why cma->lock has to be a mutex. I may be > missing something, but I do not see any code protected by the mutex doing > anything that could sleep? > > Could we simply change that mutex to a spinlock? I actually have suggested it few months ago, but the idea was opposed by Joonsoo: https://lkml.org/lkml/2020/4/3/12 . The time of a bitmap operation is definitely not an issue in my context, but I can't speak for something like an embedded/rt case. Thanks!
On Thu, Oct 22, 2020 at 09:54:53AM +0800, Xiaqing (A) wrote: > > > On 2020/10/17 6:52, Roman Gushchin wrote: > > > This small patchset makes cma_release() non-blocking and simplifies > > the code in hugetlbfs, where previously we had to temporarily drop > > hugetlb_lock around the cma_release() call. > > > > It should help Zi Yan on his work on 1 GB THPs: splitting a gigantic > > THP under a memory pressure requires a cma_release() call. If it's > > a blocking function, it complicates the already complicated code. > > Because there are at least two use cases like this (hugetlbfs is > > another example), I believe it's just better to make cma_release() > > non-blocking. > > > > It also makes it more consistent with other memory releasing functions > > in the kernel: most of them are non-blocking. > > > > > > Roman Gushchin (2): > > mm: cma: make cma_release() non-blocking > > mm: hugetlb: don't drop hugetlb_lock around cma_release() call > > > > mm/cma.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++-- > > mm/hugetlb.c | 6 ------ > > 2 files changed, 49 insertions(+), 8 deletions(-) > > > I don't think this patch is a good idea.It transfers part or even all of the time of > cma_release to cma_alloc, which is more concerned by performance indicators. I'm not quite sure: if cma_alloc() is racing with cma_release(), cma_alloc() will wait for the cma_lock mutex anyway. So we don't really transfer anything to cma_alloc(). > On Android phones, CPU resource competition is intense in many scenarios, > As a result, kernel threads and workers can be scheduled only after some ticks or more. > In this case, the performance of cma_alloc will deteriorate significantly, > which is not good news for many services on Android. Ok, I agree, if the cpu is heavily loaded, it might affect the total execution time. If we aren't going into the mutex->spinlock conversion direction (as Mike suggested), we can address the performance concerns by introducing a cma_release_nowait() function, so that the default cma_release() would work in the old way. cma_release_nowait() can set an atomic flag on a cma area, which will cause following cma_alloc()'s to flush the release queue. In this case there will be no performance penalty unless somebody is using cma_release_nowait(). Will it work for you? Thank you!
On 2020/10/22 10:45, Roman Gushchin wrote: > On Thu, Oct 22, 2020 at 09:54:53AM +0800, Xiaqing (A) wrote: >> >> On 2020/10/17 6:52, Roman Gushchin wrote: >> >>> This small patchset makes cma_release() non-blocking and simplifies >>> the code in hugetlbfs, where previously we had to temporarily drop >>> hugetlb_lock around the cma_release() call. >>> >>> It should help Zi Yan on his work on 1 GB THPs: splitting a gigantic >>> THP under a memory pressure requires a cma_release() call. If it's >>> a blocking function, it complicates the already complicated code. >>> Because there are at least two use cases like this (hugetlbfs is >>> another example), I believe it's just better to make cma_release() >>> non-blocking. >>> >>> It also makes it more consistent with other memory releasing functions >>> in the kernel: most of them are non-blocking. >>> >>> >>> Roman Gushchin (2): >>> mm: cma: make cma_release() non-blocking >>> mm: hugetlb: don't drop hugetlb_lock around cma_release() call >>> >>> mm/cma.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++-- >>> mm/hugetlb.c | 6 ------ >>> 2 files changed, 49 insertions(+), 8 deletions(-) >>> >> I don't think this patch is a good idea.It transfers part or even all of the time of >> cma_release to cma_alloc, which is more concerned by performance indicators. > I'm not quite sure: if cma_alloc() is racing with cma_release(), cma_alloc() will > wait for the cma_lock mutex anyway. So we don't really transfer anything to cma_alloc(). > >> On Android phones, CPU resource competition is intense in many scenarios, >> As a result, kernel threads and workers can be scheduled only after some ticks or more. >> In this case, the performance of cma_alloc will deteriorate significantly, >> which is not good news for many services on Android. > Ok, I agree, if the cpu is heavily loaded, it might affect the total execution time. > > If we aren't going into the mutex->spinlock conversion direction (as Mike suggested), > we can address the performance concerns by introducing a cma_release_nowait() function, > so that the default cma_release() would work in the old way. > cma_release_nowait() can set an atomic flag on a cma area, which will cause following > cma_alloc()'s to flush the release queue. In this case there will be no performance > penalty unless somebody is using cma_release_nowait(). > Will it work for you? That looks good to me. Thanks! > > Thank you! > >
On 10/21/20 7:33 PM, Roman Gushchin wrote: > On Wed, Oct 21, 2020 at 05:15:53PM -0700, Mike Kravetz wrote: >> On 10/16/20 3:52 PM, Roman Gushchin wrote: >>> This small patchset makes cma_release() non-blocking and simplifies >>> the code in hugetlbfs, where previously we had to temporarily drop >>> hugetlb_lock around the cma_release() call. >>> >>> It should help Zi Yan on his work on 1 GB THPs: splitting a gigantic >>> THP under a memory pressure requires a cma_release() call. If it's >>> a blocking function, it complicates the already complicated code. >>> Because there are at least two use cases like this (hugetlbfs is >>> another example), I believe it's just better to make cma_release() >>> non-blocking. >>> >>> It also makes it more consistent with other memory releasing functions >>> in the kernel: most of them are non-blocking. >> >> Thanks for looking into this Roman. > > Hi Mike, > >> >> I may be missing something, but why does cma_release have to be blocking >> today? Certainly, it takes the bitmap in cma_clear_bitmap and could >> block. However, I do not see why cma->lock has to be a mutex. I may be >> missing something, but I do not see any code protected by the mutex doing >> anything that could sleep? >> >> Could we simply change that mutex to a spinlock? > > I actually have suggested it few months ago, but the idea was > opposed by Joonsoo: https://lkml.org/lkml/2020/4/3/12 . > > The time of a bitmap operation is definitely not an issue in my context, > but I can't speak for something like an embedded/rt case. > I wonder if it may be time to look into replacing the cma area bitmap with some other data structure? Joonsoo was concerned about the time required to traverse the bitmap for an 8GB area. With new support for allocating 1GB hugetlb pages from cma, I can imagine someone setting up a cma area that is hundreds of GB if not TB in size. It is going take some time to traverse a bitmap describing such a huge area.
On Thu, Oct 22, 2020 at 09:42:11AM -0700, Mike Kravetz wrote: > On 10/21/20 7:33 PM, Roman Gushchin wrote: > > On Wed, Oct 21, 2020 at 05:15:53PM -0700, Mike Kravetz wrote: > >> On 10/16/20 3:52 PM, Roman Gushchin wrote: > >>> This small patchset makes cma_release() non-blocking and simplifies > >>> the code in hugetlbfs, where previously we had to temporarily drop > >>> hugetlb_lock around the cma_release() call. > >>> > >>> It should help Zi Yan on his work on 1 GB THPs: splitting a gigantic > >>> THP under a memory pressure requires a cma_release() call. If it's > >>> a blocking function, it complicates the already complicated code. > >>> Because there are at least two use cases like this (hugetlbfs is > >>> another example), I believe it's just better to make cma_release() > >>> non-blocking. > >>> > >>> It also makes it more consistent with other memory releasing functions > >>> in the kernel: most of them are non-blocking. > >> > >> Thanks for looking into this Roman. > > > > Hi Mike, > > > >> > >> I may be missing something, but why does cma_release have to be blocking > >> today? Certainly, it takes the bitmap in cma_clear_bitmap and could > >> block. However, I do not see why cma->lock has to be a mutex. I may be > >> missing something, but I do not see any code protected by the mutex doing > >> anything that could sleep? > >> > >> Could we simply change that mutex to a spinlock? > > > > I actually have suggested it few months ago, but the idea was > > opposed by Joonsoo: https://lkml.org/lkml/2020/4/3/12 . > > > > The time of a bitmap operation is definitely not an issue in my context, > > but I can't speak for something like an embedded/rt case. > > > > I wonder if it may be time to look into replacing the cma area bitmap > with some other data structure? Joonsoo was concerned about the time > required to traverse the bitmap for an 8GB area. With new support for > allocating 1GB hugetlb pages from cma, I can imagine someone setting > up a cma area that is hundreds of GB if not TB in size. It is going > take some time to traverse a bitmap describing such a huge area. If the cma area is used exclusively for 1 GB allocations, the bitmap can have only 1 bit per GB, so it shouldn't be a big problem. Long-term I have some hopes to be able to allocate 1 GB pages without a need to reserve a cma area: we can try to group pages based on their mobility on a 1 GB scale, so that all non-movable pages will reside in few 1 GB blocks. I'm looking into that direction, but don't have any results yet. If this idea fails and we'll end up allocating a large cma area unconditionally and shrink it on demand (I think Rik suggested something like this), replacing the bitmap with something else sounds like a good idea to me. As now, I want to unblock Zi Yan on his work on 1 GB THPs, so maybe we can go with introducing cma_release_nowait(), as I suggested in the other e-mail in this thread? Do you have any objections? Thanks!
On 10/22/20 10:16 AM, Roman Gushchin wrote: > As now, I want to unblock Zi Yan on his work on 1 GB THPs, so maybe > we can go with introducing cma_release_nowait(), as I suggested in > the other e-mail in this thread? Do you have any objections? No objections from me. And, I like that it could be used by the hugetlb code to make it simpler.