Message ID | 20201202052330.474592-1-pasha.tatashin@soleen.com (mailing list archive) |
---|---|
Headers | show |
Series | prohibit pinning pages in ZONE_MOVABLE | expand |
Hello, On Wed, Dec 02, 2020 at 12:23:24AM -0500, Pavel Tatashin wrote: > When page is pinned it cannot be moved and its physical address stays > the same until pages is unpinned. > > This is useful functionality to allows userland to implementation DMA > access. For example, it is used by vfio in vfio_pin_pages(). > > However, this functionality breaks memory hotplug/hotremove assumptions > that pages in ZONE_MOVABLE can always be migrated. > > This patch series fixes this issue by forcing new allocations during > page pinning to omit ZONE_MOVABLE, and also to migrate any existing > pages from ZONE_MOVABLE during pinning. I love what this patchset does, but, at least, it's better to consider the side-effect of this patchset and inform it in somewhere. IIUC, ZONE_MOVABLE exists for two purposes. 1) increasing availability of THP 2) memory hot-unplug Potential issue would come from the case 1). They uses ZONE_MOVABLE for THP availability and hard guarantee for migration isn't required until now. So, there would be a system with following congifuration. - memory layout: ZONE_NORMAL-512MB, ZONE_MOVABLE-512MB - memory usage: unmovable-256MB, movable pinned-256MB, movable unpinned-512MB With this patchset, movable pinned should be placed in ZONE_NORMAL so 512MB is required for ZONE_NORMAL. ZONE_NORMAL would be exhausted and system performance would be highly afftect according to memory usage pattern. I'm not sure whether such configuration exists or not, but, at least, it's better to write down this risk on commit message or something else. Thanks.
On Thu, Dec 3, 2020 at 11:03 PM Joonsoo Kim <js1304@gmail.com> wrote: > > Hello, > > On Wed, Dec 02, 2020 at 12:23:24AM -0500, Pavel Tatashin wrote: > > When page is pinned it cannot be moved and its physical address stays > > the same until pages is unpinned. > > > > This is useful functionality to allows userland to implementation DMA > > access. For example, it is used by vfio in vfio_pin_pages(). > > > > However, this functionality breaks memory hotplug/hotremove assumptions > > that pages in ZONE_MOVABLE can always be migrated. > > > > This patch series fixes this issue by forcing new allocations during > > page pinning to omit ZONE_MOVABLE, and also to migrate any existing > > pages from ZONE_MOVABLE during pinning. > > I love what this patchset does, but, at least, it's better to consider > the side-effect of this patchset and inform it in somewhere. IIUC, > ZONE_MOVABLE exists for two purposes. > > 1) increasing availability of THP > 2) memory hot-unplug > > Potential issue would come from the case 1). They uses ZONE_MOVABLE > for THP availability and hard guarantee for migration isn't required > until now. So, there would be a system with following congifuration. > > - memory layout: ZONE_NORMAL-512MB, ZONE_MOVABLE-512MB > - memory usage: unmovable-256MB, movable pinned-256MB, movable > unpinned-512MB > > With this patchset, movable pinned should be placed in ZONE_NORMAL so > 512MB is required for ZONE_NORMAL. ZONE_NORMAL would be exhausted and > system performance would be highly afftect according to memory usage > pattern. > > I'm not sure whether such configuration exists or not, but, at least, > it's better to write down this risk on commit message or something > else. Yes, this indeed could be a problem for some configurations. I will add your comment to the commit log of one of the patches. Thank you, Pasha
On Fri, Dec 04, 2020 at 10:55:30AM -0500, Pavel Tatashin wrote: > On Thu, Dec 3, 2020 at 11:03 PM Joonsoo Kim <js1304@gmail.com> wrote: > > > > Hello, > > > > On Wed, Dec 02, 2020 at 12:23:24AM -0500, Pavel Tatashin wrote: > > > When page is pinned it cannot be moved and its physical address stays > > > the same until pages is unpinned. > > > > > > This is useful functionality to allows userland to implementation DMA > > > access. For example, it is used by vfio in vfio_pin_pages(). > > > > > > However, this functionality breaks memory hotplug/hotremove assumptions > > > that pages in ZONE_MOVABLE can always be migrated. > > > > > > This patch series fixes this issue by forcing new allocations during > > > page pinning to omit ZONE_MOVABLE, and also to migrate any existing > > > pages from ZONE_MOVABLE during pinning. > > > > I love what this patchset does, but, at least, it's better to consider > > the side-effect of this patchset and inform it in somewhere. IIUC, > > ZONE_MOVABLE exists for two purposes. > > > > 1) increasing availability of THP > > 2) memory hot-unplug > > > > Potential issue would come from the case 1). They uses ZONE_MOVABLE > > for THP availability and hard guarantee for migration isn't required > > until now. So, there would be a system with following congifuration. > > > > - memory layout: ZONE_NORMAL-512MB, ZONE_MOVABLE-512MB > > - memory usage: unmovable-256MB, movable pinned-256MB, movable > > unpinned-512MB > > > > With this patchset, movable pinned should be placed in ZONE_NORMAL so > > 512MB is required for ZONE_NORMAL. ZONE_NORMAL would be exhausted and > > system performance would be highly afftect according to memory usage > > pattern. > > > > I'm not sure whether such configuration exists or not, but, at least, > > it's better to write down this risk on commit message or something > > else. > > Yes, this indeed could be a problem for some configurations. I will > add your comment to the commit log of one of the patches. It sounds like there is some inherent tension here, breaking THP's when doing pin_user_pages() is a really nasty thing to do. DMA benefits greatly from THP. I know nothing about ZONE_MOVABLE, is this auto-setup or an admin option? If the result of this patch is standard systems can no longer pin > 80% of their memory I have some regression concerns.. Jason
> > Yes, this indeed could be a problem for some configurations. I will > > add your comment to the commit log of one of the patches. > > It sounds like there is some inherent tension here, breaking THP's > when doing pin_user_pages() is a really nasty thing to do. DMA > benefits greatly from THP. > > I know nothing about ZONE_MOVABLE, is this auto-setup or an admin > option? If the result of this patch is standard systems can no longer > pin > 80% of their memory I have some regression concerns.. ZONE_MOVABLE can be configured via kernel parameter, or when memory nodes are onlined after hot-add; so this is something that admins configure. ZONE_MOVABLE is designed to gurantee memory hot-plug functionality, and not availability of THP, however, I did not know about the use case where some admins might configure ZONE_MOVABLE to increase availability of THP because pages are always migratable in them. The thing is, if we fragment ZONE_MOVABLE by pinning pages in it, the availability of THP also suffers. We can migrate pages in ZONE_NORMAL, just not guaranteed, so we can create THP in ZONE_NORMAL as well, which is the usual case.
On 04.12.20 18:50, Pavel Tatashin wrote: >>> Yes, this indeed could be a problem for some configurations. I will >>> add your comment to the commit log of one of the patches. >> >> It sounds like there is some inherent tension here, breaking THP's >> when doing pin_user_pages() is a really nasty thing to do. DMA >> benefits greatly from THP. >> >> I know nothing about ZONE_MOVABLE, is this auto-setup or an admin >> option? If the result of this patch is standard systems can no longer >> pin > 80% of their memory I have some regression concerns.. > > ZONE_MOVABLE can be configured via kernel parameter, or when memory > nodes are onlined after hot-add; so this is something that admins > configure. ZONE_MOVABLE is designed to gurantee memory hot-plug > functionality, and not availability of THP, however, I did not know > about the use case where some admins might configure ZONE_MOVABLE to > increase availability of THP because pages are always migratable in > them. The thing is, if we fragment ZONE_MOVABLE by pinning pages in > it, the availability of THP also suffers. We can migrate pages in > ZONE_NORMAL, just not guaranteed, so we can create THP in ZONE_NORMAL > as well, which is the usual case. Right, we should document this at some place to make admins aware of this. Something like "Techniques that rely on long-term pinnings of memory (especially, RDMA and vfio) are fundamentally problematic with ZONE_MOVABLE and, therefore, memory hotunplug. Pinned pages cannot reside on ZONE_MOVABLE, to guarantee that memory can still get hotunplugged - be aware that pinning can fail even if there is plenty of free memory in ZONE_MOVABLE. In addition, using ZONE_MOVABLE might make page pinning more expensive, because pages have to be migrated off that zone first." BTW, you might also want to update the comment for ZONE_MOVABLE in include/linux/mmzone.h at the end of this series, removing the special case of pinned pages (1.) and maybe adding what happens when trying to pin pages on ZONE_MOVABLE.
> > ZONE_MOVABLE can be configured via kernel parameter, or when memory > > nodes are onlined after hot-add; so this is something that admins > > configure. ZONE_MOVABLE is designed to gurantee memory hot-plug > > functionality, and not availability of THP, however, I did not know > > about the use case where some admins might configure ZONE_MOVABLE to > > increase availability of THP because pages are always migratable in > > them. The thing is, if we fragment ZONE_MOVABLE by pinning pages in > > it, the availability of THP also suffers. We can migrate pages in > > ZONE_NORMAL, just not guaranteed, so we can create THP in ZONE_NORMAL > > as well, which is the usual case. > > Right, we should document this at some place to make admins aware of > this. Something like > > "Techniques that rely on long-term pinnings of memory (especially, RDMA > and vfio) are fundamentally problematic with ZONE_MOVABLE and, > therefore, memory hotunplug. Pinned pages cannot reside on ZONE_MOVABLE, > to guarantee that memory can still get hotunplugged - be aware that > pinning can fail even if there is plenty of free memory in ZONE_MOVABLE. > In addition, using ZONE_MOVABLE might make page pinning more expensive, > because pages have to be migrated off that zone first." Thanks, I will add this. > > BTW, you might also want to update the comment for ZONE_MOVABLE in > include/linux/mmzone.h at the end of this series, removing the special > case of pinned pages (1.) and maybe adding what happens when trying to > pin pages on ZONE_MOVABLE. Will do it. Thank you, Pasha
On Fri, Dec 04, 2020 at 12:50:56PM -0500, Pavel Tatashin wrote: > > > Yes, this indeed could be a problem for some configurations. I will > > > add your comment to the commit log of one of the patches. > > > > It sounds like there is some inherent tension here, breaking THP's > > when doing pin_user_pages() is a really nasty thing to do. DMA > > benefits greatly from THP. > > > > I know nothing about ZONE_MOVABLE, is this auto-setup or an admin > > option? If the result of this patch is standard systems can no longer > > pin > 80% of their memory I have some regression concerns.. > > ZONE_MOVABLE can be configured via kernel parameter, or when memory > nodes are onlined after hot-add; so this is something that admins > configure. ZONE_MOVABLE is designed to gurantee memory hot-plug Just note, the origin of ZONE_MOVABLE is to provide availability of huge page, especially, hugetlb page. AFAIK, not guarantee memory hot-plug. See following commit that introduces the ZONE_MOVABLE. 2a1e274 Create the ZONE_MOVABLE zone > functionality, and not availability of THP, however, I did not know > about the use case where some admins might configure ZONE_MOVABLE to The usecase is lightly mentioned in previous discussion. http://lkml.kernel.org/r/alpine.DEB.2.23.453.2011221300100.2830030@chino.kir.corp.google.com Anyway, I agree with your other arguments and this patchset. Thanks.
On Mon 07-12-20 16:12:50, Joonsoo Kim wrote: > On Fri, Dec 04, 2020 at 12:50:56PM -0500, Pavel Tatashin wrote: > > > > Yes, this indeed could be a problem for some configurations. I will > > > > add your comment to the commit log of one of the patches. > > > > > > It sounds like there is some inherent tension here, breaking THP's > > > when doing pin_user_pages() is a really nasty thing to do. DMA > > > benefits greatly from THP. > > > > > > I know nothing about ZONE_MOVABLE, is this auto-setup or an admin > > > option? If the result of this patch is standard systems can no longer > > > pin > 80% of their memory I have some regression concerns.. > > > > ZONE_MOVABLE can be configured via kernel parameter, or when memory > > nodes are onlined after hot-add; so this is something that admins > > configure. ZONE_MOVABLE is designed to gurantee memory hot-plug > > Just note, the origin of ZONE_MOVABLE is to provide availability of > huge page, especially, hugetlb page. AFAIK, not guarantee memory > hot-plug. See following commit that introduces the ZONE_MOVABLE. > > 2a1e274 Create the ZONE_MOVABLE zone > > > functionality, and not availability of THP, however, I did not know > > about the use case where some admins might configure ZONE_MOVABLE to > > The usecase is lightly mentioned in previous discussion. > > http://lkml.kernel.org/r/alpine.DEB.2.23.453.2011221300100.2830030@chino.kir.corp.google.com > > Anyway, I agree with your other arguments and this patchset. Yes, historically the original motivation for the movable zone was to help creating large pages via compaction. I also do remember Mel not being particularly happy about that. The thing is that the movability constrain is just too strict for this usecases because the movable zone, especially a lot of it, might be causing similar to lowmem/highmem problems very well known from 32b world. So an admin had to be always very careful when configuring to not cause zone pressure problems. Later on, with a higher demand on the memory hotplug - especially the hotremove usecases - it has become clear that the only reliable way for the memory offlining is to rule out any unmovable memory out of the way and that is why a rather strong properly of movable zone was relied on. In the end we are in two rather different requirements here. One for optimization and one for correctness. In this case I would much rather focus on the correctness aspect.