Message ID | 148184524161.184728.14005697153880489871.stgit@djiang5-desk3.ch.intel.com (mailing list archive)
---|---
State | New, archived
On Thu, Dec 15, 2016 at 04:40:41PM -0700, Dave Jiang wrote:
> The caller into dax needs to clear __GFP_FS mask bit since it's
> responsible for acquiring locks / transactions that blocks __GFP_FS
> allocation. The caller will restore the original mask when dax function
> returns.

What's the allocation problem you're working around here? Can you
please describe the call chain that is the problem?

> 	xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
>
> 	if (IS_DAX(inode)) {
> +		gfp_t old_gfp = vmf->gfp_mask;
> +
> +		vmf->gfp_mask &= ~__GFP_FS;
> 		ret = dax_iomap_fault(vma, vmf, &xfs_iomap_ops);
> +		vmf->gfp_mask = old_gfp;

I really have to say that I hate code that clears and restores flags
without any explanation of why the code needs to play flag tricks. I
take one look at the XFS fault handling code and ask myself now "why
the hell do we need to clear those flags?" Especially as the other
paths into generic fault handlers /don't/ require us to do this.
What does DAX do that require us to treat memory allocation contexts
differently to the filemap_fault() path?

Cheers,

Dave.
On Fri, Dec 16, 2016 at 12:07:30PM +1100, Dave Chinner wrote:
> On Thu, Dec 15, 2016 at 04:40:41PM -0700, Dave Jiang wrote:
> > The caller into dax needs to clear __GFP_FS mask bit since it's
> > responsible for acquiring locks / transactions that blocks __GFP_FS
> > allocation. The caller will restore the original mask when dax function
> > returns.
>
> What's the allocation problem you're working around here? Can you
> please describe the call chain that is the problem?
>
> > 	xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
> >
> > 	if (IS_DAX(inode)) {
> > +		gfp_t old_gfp = vmf->gfp_mask;
> > +
> > +		vmf->gfp_mask &= ~__GFP_FS;
> > 		ret = dax_iomap_fault(vma, vmf, &xfs_iomap_ops);
> > +		vmf->gfp_mask = old_gfp;
>
> I really have to say that I hate code that clears and restores flags
> without any explanation of why the code needs to play flag tricks. I
> take one look at the XFS fault handling code and ask myself now "why
> the hell do we need to clear those flags?" Especially as the other
> paths into generic fault handlers /don't/ require us to do this.
> What does DAX do that require us to treat memory allocation contexts
> differently to the filemap_fault() path?

This was done in response to Jan Kara's concern:

  The gfp_mask that propagates from __do_fault() or do_page_mkwrite() is fine
  because at that point it is correct. But once we grab filesystem locks which
  are not reclaim safe, we should update vmf->gfp_mask we pass further down
  into DAX code to not contain __GFP_FS (that's a bug we apparently have
  there). And inside DAX code, we definitely are not generally safe to add
  __GFP_FS to mapping_gfp_mask(). Maybe we'd be better off propagating struct
  vm_fault into this function, using passed gfp_mask there and make sure
  callers update gfp_mask as appropriate.

https://lkml.org/lkml/2016/10/4/37

IIUC I think the concern is that, for example, in xfs_filemap_page_mkwrite()
we take a read lock on the struct inode.i_rwsem before we call
dax_iomap_fault(). dax_iomap_fault() then calls find_or_create_page(), etc.
with the vmf->gfp_mask we were given. I believe the concern is that if that
memory allocation tries to do FS operations to free memory because __GFP_FS
is part of the gfp mask, then we could end up deadlocking because we are
already holding FS locks.
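The save/clear/restore dance the patch performs around the fault call can be modeled outside the kernel. The sketch below is a simplified userspace illustration only - `GFP_IO`/`GFP_FS` values, `fault_ctx`, and all function names are made up for the example and are not the kernel's:

```c
#include <assert.h>

/* Simplified stand-ins for kernel gfp bits (illustrative values only). */
#define GFP_IO 0x1u
#define GFP_FS 0x2u

struct fault_ctx { unsigned gfp_mask; };

/* Records the mask the "DAX" layer saw, so it can be checked afterwards. */
static unsigned seen_mask;

static void fake_dax_fault(struct fault_ctx *f) { seen_mask = f->gfp_mask; }

/* The pattern from the patch: save the caller's mask, clear GFP_FS while
 * filesystem locks are held, call down into DAX, then restore the mask. */
static void fs_fault_handler(struct fault_ctx *f)
{
    unsigned old_gfp = f->gfp_mask;

    f->gfp_mask &= ~GFP_FS;   /* reclaim must not recurse into the FS */
    fake_dax_fault(f);
    f->gfp_mask = old_gfp;    /* restore the original mask for the caller */
}
```

With a mask of `GFP_IO | GFP_FS` on entry, the inner layer only ever sees `GFP_IO`, and the caller's mask is unchanged on return - which is exactly the invariant the patch is trying to maintain.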
On Fri, Dec 16, 2016 at 09:19:16AM -0700, Ross Zwisler wrote:
> On Fri, Dec 16, 2016 at 12:07:30PM +1100, Dave Chinner wrote:
> > On Thu, Dec 15, 2016 at 04:40:41PM -0700, Dave Jiang wrote:
[...]
> > What does DAX do that require us to treat memory allocation contexts
> > differently to the filemap_fault() path?
>
> This was done in response to Jan Kara's concern:
>
>   The gfp_mask that propagates from __do_fault() or do_page_mkwrite() is fine
>   because at that point it is correct. But once we grab filesystem locks which
>   are not reclaim safe, we should update vmf->gfp_mask we pass further down
>   into DAX code to not contain __GFP_FS (that's a bug we apparently have
>   there). And inside DAX code, we definitely are not generally safe to add
>   __GFP_FS to mapping_gfp_mask(). Maybe we'd be better off propagating struct
>   vm_fault into this function, using passed gfp_mask there and make sure
>   callers update gfp_mask as appropriate.
>
> https://lkml.org/lkml/2016/10/4/37
>
> IIUC I think the concern is that, for example, in xfs_filemap_page_mkwrite()
> we take a read lock on the struct inode.i_rwsem before we call
> dax_iomap_fault().

That, my friends, is exactly the problem that mapping_gfp_mask() is
meant to solve. This:

> > > +	vmf.gfp_mask = mapping_gfp_mask(mapping) | __GFP_FS | __GFP_IO;

Is just so wrong it's not funny.

The whole point of mapping_gfp_mask() is to remove flags from the
gfp_mask used to do mapping+page cache related allocations that the
mapping->host considers dangerous when the host may be holding locks.
This includes mapping tree allocations, and anything else required
to set up a new entry in the mapping during IO path operations. That
includes page fault operations...

e.g. in xfs_setup_inode():

	/*
	 * Ensure all page cache allocations are done from GFP_NOFS context to
	 * prevent direct reclaim recursion back into the filesystem and blowing
	 * stacks or deadlocking.
	 */
	gfp_mask = mapping_gfp_mask(inode->i_mapping);
	mapping_set_gfp_mask(inode->i_mapping, (gfp_mask & ~(__GFP_FS)));

i.e. XFS considers it invalid to use GFP_FS at all for mapping
allocations in the io path, because we *know* that we hold
filesystem locks over those allocations.

> dax_iomap_fault() then calls find_or_create_page(), etc. with the
> vmf->gfp_mask we were given.

Yup. Precisely why we should be using mapping_gfp_mask() as it was
intended for vmf.gfp_mask....

> I believe the concern is that if that memory allocation tries to do FS
> operations to free memory because __GFP_FS is part of the gfp mask, then we
> could end up deadlocking because we are already holding FS locks.

Which is a problem with the filesystem mapping mask setup, not a
reason to sprinkle random gfpmask clear/set pairs around the code.
i.e. For DAX inodes, the mapping mask should clear __GFP_FS as XFS
does above, and the mapping_gfp_mask() should be used unadulterated
by the DAX page fault code....

Cheers,

Dave.
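The alternative Dave describes - strip the flag once at inode setup and then consume the mapping mask unmodified on every fault - can be sketched the same way. This is again a simplified userspace model with invented values and names, not the kernel's actual address_space code:

```c
#include <assert.h>

/* Illustrative gfp bits, not the kernel's real values. */
#define GFP_IO     0x1u
#define GFP_FS     0x2u
#define GFP_KERNEL (GFP_IO | GFP_FS | 0x4u)

/* Toy stand-in for struct address_space's gfp mask. */
struct mapping { unsigned gfp_mask; };

static unsigned mapping_gfp(const struct mapping *m) { return m->gfp_mask; }

/* The xfs_setup_inode() approach: strip GFP_FS once, at setup time, so
 * every later mapping allocation is implicitly NOFS - no per-call tricks. */
static void setup_inode_mapping(struct mapping *m)
{
    m->gfp_mask = GFP_KERNEL & ~GFP_FS;
}

/* The fault path then uses the mapping mask unadulterated, only OR-ing
 * in bits the mapping never forbids (I/O is always allowed here). */
static unsigned fault_gfp(const struct mapping *m)
{
    return mapping_gfp(m) | GFP_IO;
}
```

The invariant falls out structurally: no fault-path caller can reintroduce `GFP_FS`, because the mask it starts from never contains it.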
On Sat, 2016-12-17 at 09:04 +1100, Dave Chinner wrote:
> On Fri, Dec 16, 2016 at 09:19:16AM -0700, Ross Zwisler wrote:
[...]
> Which is a problem with the filesystem mapping mask setup, not a
> reason to sprinkle random gfpmask clear/set pairs around the code.
> i.e. For DAX inodes, the mapping mask should clear __GFP_FS as XFS
> does above, and the mapping_gfp_mask() should be used unadulterated
> by the DAX page fault code....

I'll drop this patch. We can address the issue separately from the
pmd_fault changes.

> Cheers,
>
> Dave.
On Sat 17-12-16 09:04:50, Dave Chinner wrote:
> On Fri, Dec 16, 2016 at 09:19:16AM -0700, Ross Zwisler wrote:
[...]
> That, my friends, is exactly the problem that mapping_gfp_mask() is
> meant to solve. This:
>
> > > > +	vmf.gfp_mask = mapping_gfp_mask(mapping) | __GFP_FS | __GFP_IO;
>
> Is just so wrong it's not funny.

You mean like in mm/memory.c: __get_fault_gfp_mask()?

Which was introduced by commit c20cd45eb017 "mm: allow GFP_{FS,IO} for
page_cache_read page cache allocation" by Michal (added to CC) and you were
even on CC ;).

The code here was replicating __get_fault_gfp_mask() and in fact the idea
of the cleanup is to get rid of this code and take whatever is in
vmf.gfp_mask and mask off __GFP_FS in the filesystem if it deems it is
needed (e.g. ext4 really needs this as inode reclaim is depending on being
able to force a transaction commit). I agree with your point about
comments, we should add those when changing gfp_mask.

> The whole point of mapping_gfp_mask() is to remove flags from the
> gfp_mask used to do mapping+page cache related allocations that the
> mapping->host considers dangerous when the host may be holding locks.
[...]
> i.e. XFS considers it invalid to use GFP_FS at all for mapping
> allocations in the io path, because we *know* that we hold
> filesystem locks over those allocations.

Well, this is a discussion you should probably have with Michal. DAX code
was just mirroring what the generic code does. Michal had valid points
about why the page fault path is special and why allocation of pages for a
page fault should be fine with __GFP_FS - but if those assumptions are
wrong for XFS, the generic code needs to be fixed.

								Honza
On Mon, Dec 19, 2016 at 08:53:02PM +0100, Jan Kara wrote:
> On Sat 17-12-16 09:04:50, Dave Chinner wrote:
> > On Fri, Dec 16, 2016 at 09:19:16AM -0700, Ross Zwisler wrote:
[...]
> > That, my friends, is exactly the problem that mapping_gfp_mask() is
> > meant to solve. This:
> >
> > > > > +	vmf.gfp_mask = mapping_gfp_mask(mapping) | __GFP_FS | __GFP_IO;
> >
> > Is just so wrong it's not funny.
>
> You mean like in mm/memory.c: __get_fault_gfp_mask()?
>
> Which was introduced by commit c20cd45eb017 "mm: allow GFP_{FS,IO} for
> page_cache_read page cache allocation" by Michal (added to CC) and you were
> even on CC ;).

Sure, I was on the cc list, but that doesn't mean I /liked/ the
patch. It also doesn't mean I had the time or patience to argue
whether it was the right way to address whatever whacky OOM/reclaim
deficiency was being reported....

Oh, and this is a write fault, not a read fault. There's a big
difference in filesystem behaviour between those two types of
faults, so what might be fine for a page cache read (i.e. no
transactions) isn't necessarily correct for a write operation...

> The code here was replicating __get_fault_gfp_mask() and in fact the idea
> of the cleanup is to get rid of this code and take whatever is in
> vmf.gfp_mask and mask off __GFP_FS in the filesystem if it deems it is
> needed (e.g. ext4 really needs this as inode reclaim is depending on being
> able to force a transaction commit).

And so now we add a flag to the fault that the filesystem says not
to add to mapping masks, and now the filesystem has to mask off
that flag /again/ because its mapping gfp mask guidelines are
essentially being ignored.

Remind me again why we even have the mapping gfp_mask if we just
ignore it like this?

Cheers,

Dave.
On Tue 20-12-16 08:17:11, Dave Chinner wrote:
> On Mon, Dec 19, 2016 at 08:53:02PM +0100, Jan Kara wrote:
[...]
> And so now we add a flag to the fault that the filesystem says not
> to add to mapping masks, and now the filesystem has to mask off
> that flag /again/ because its mapping gfp mask guidelines are
> essentially being ignored.
>
> Remind me again why we even have the mapping gfp_mask if we just
> ignore it like this?

The mapping mask still serves its _main_ purpose - the allocation
placement/movability properties. This is something only the owner of the
mapping knows. The (ab)use of the mapping gfp_mask to drop GFP_FS was imho
a bad decision. As the above-mentioned commit noted, we were doing a lot of
GFP_NOFS allocations from paths which are inherently GFP_KERNEL, so they
couldn't prevent recursion problems while they still affected the direct
reclaim behavior.

On the other hand I do understand why the mapping's mask was used at the
time. We simply lacked a better api back then. But I believe that with the
scope nofs api [1] we can do much better and get rid of ~__GFP_FS in the
mapping's mask finally. c20cd45eb017 was an intermediate step until we get
there.

I am not fully familiar with the DAX changes which started this
discussion but if there is a reclaim recursion problem from within the
fault path then the scope api sounds like a good fit here.

[1] http://lkml.kernel.org/r/20161215140715.12732-1-mhocko@kernel.org
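The scope api Michal proposes in [1] (memalloc_nofs_save()/memalloc_nofs_restore()) marks a NOFS section on the task itself, and the allocator masks __GFP_FS off implicitly inside it. A rough userspace model of that idea follows - the underscore-suffixed names, flag values, and single-threaded global are all inventions for illustration; the real api keeps the flag in current->flags and applies it via the allocator's gfp-context handling:

```c
#include <assert.h>

/* Illustrative bits, not the kernel's real values. */
#define GFP_FS            0x2u
#define PF_MEMALLOC_NOFS  0x1u

/* Stand-in for the per-task current->flags word. */
static unsigned task_flags;

/* Enter a NOFS scope; returns the previous state so scopes can nest. */
static unsigned memalloc_nofs_save_(void)
{
    unsigned old = task_flags & PF_MEMALLOC_NOFS;
    task_flags |= PF_MEMALLOC_NOFS;
    return old;
}

/* Leave the scope, restoring whatever state the caller entered with. */
static void memalloc_nofs_restore_(unsigned old)
{
    task_flags = (task_flags & ~PF_MEMALLOC_NOFS) | old;
}

/* The allocator strips GFP_FS implicitly while the task flag is set,
 * so callees never need to know they run inside a NOFS scope. */
static unsigned current_gfp_context_(unsigned gfp)
{
    return (task_flags & PF_MEMALLOC_NOFS) ? (gfp & ~GFP_FS) : gfp;
}
```

The design point is that the restriction travels with the task rather than with a gfp_mask threaded through every call: the filesystem brackets its locked section once, and every allocation underneath - including ones in generic code that never saw vmf->gfp_mask - becomes NOFS automatically.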
On Tue 20-12-16 11:13:52, Michal Hocko wrote:
> I am not fully familiar with the DAX changes which started this
> discussion but if there is a reclaim recursion problem from within the
> fault path then the scope api sounds like a good fit here.
>
> [1] http://lkml.kernel.org/r/20161215140715.12732-1-mhocko@kernel.org

Yes, once your scope API and associated ext4 changes are in, we can stop
playing tricks with gfp_mask in DAX code at least for ext4.

								Honza
diff --git a/fs/dax.c b/fs/dax.c
index d3fe880..6395bc6 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1380,6 +1380,7 @@ int dax_iomap_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 	vmf.pgoff = pgoff;
 	vmf.flags = flags;
 	vmf.gfp_mask = mapping_gfp_mask(mapping) | __GFP_IO;
+	vmf.gfp_mask &= ~__GFP_FS;
 
 	switch (iomap.type) {
 	case IOMAP_MAPPED:
diff --git a/fs/ext2/file.c b/fs/ext2/file.c
index b0f2415..8422d5f 100644
--- a/fs/ext2/file.c
+++ b/fs/ext2/file.c
@@ -92,16 +92,19 @@ static int ext2_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	struct inode *inode = file_inode(vma->vm_file);
 	struct ext2_inode_info *ei = EXT2_I(inode);
 	int ret;
+	gfp_t old_gfp = vmf->gfp_mask;
 
 	if (vmf->flags & FAULT_FLAG_WRITE) {
 		sb_start_pagefault(inode->i_sb);
 		file_update_time(vma->vm_file);
 	}
+	vmf->gfp_mask &= ~__GFP_FS;
 	down_read(&ei->dax_sem);
 
 	ret = dax_iomap_fault(vma, vmf, &ext2_iomap_ops);
 
 	up_read(&ei->dax_sem);
+	vmf->gfp_mask = old_gfp;
 	if (vmf->flags & FAULT_FLAG_WRITE)
 		sb_end_pagefault(inode->i_sb);
 	return ret;
@@ -114,6 +117,7 @@ static int ext2_dax_pfn_mkwrite(struct vm_area_struct *vma,
 	struct ext2_inode_info *ei = EXT2_I(inode);
 	loff_t size;
 	int ret;
+	gfp_t old_gfp = vmf->gfp_mask;
 
 	sb_start_pagefault(inode->i_sb);
 	file_update_time(vma->vm_file);
@@ -123,8 +127,11 @@ static int ext2_dax_pfn_mkwrite(struct vm_area_struct *vma,
 	size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
 	if (vmf->pgoff >= size)
 		ret = VM_FAULT_SIGBUS;
-	else
+	else {
+		vmf->gfp_mask &= ~__GFP_FS;
 		ret = dax_pfn_mkwrite(vma, vmf);
+		vmf->gfp_mask = old_gfp;
+	}
 
 	up_read(&ei->dax_sem);
 	sb_end_pagefault(inode->i_sb);
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index d663d3d..a3f2bf0 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -261,14 +261,17 @@ static int ext4_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	struct inode *inode = file_inode(vma->vm_file);
 	struct super_block *sb = inode->i_sb;
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
+	gfp_t old_gfp = vmf->gfp_mask;
 
 	if (write) {
 		sb_start_pagefault(sb);
 		file_update_time(vma->vm_file);
 	}
+	vmf->gfp_mask &= ~__GFP_FS;
 	down_read(&EXT4_I(inode)->i_mmap_sem);
 	result = dax_iomap_fault(vma, vmf, &ext4_iomap_ops);
 	up_read(&EXT4_I(inode)->i_mmap_sem);
+	vmf->gfp_mask = old_gfp;
 	if (write)
 		sb_end_pagefault(sb);
 
@@ -320,8 +323,13 @@ static int ext4_dax_pfn_mkwrite(struct vm_area_struct *vma,
 	size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
 	if (vmf->pgoff >= size)
 		ret = VM_FAULT_SIGBUS;
-	else
+	else {
+		gfp_t old_gfp = vmf->gfp_mask;
+
+		vmf->gfp_mask &= ~__GFP_FS;
 		ret = dax_pfn_mkwrite(vma, vmf);
+		vmf->gfp_mask = old_gfp;
+	}
 
 	up_read(&EXT4_I(inode)->i_mmap_sem);
 	sb_end_pagefault(sb);
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index d818c16..52202b4 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1474,7 +1474,11 @@ xfs_filemap_page_mkwrite(
 	xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
 
 	if (IS_DAX(inode)) {
+		gfp_t old_gfp = vmf->gfp_mask;
+
+		vmf->gfp_mask &= ~__GFP_FS;
 		ret = dax_iomap_fault(vma, vmf, &xfs_iomap_ops);
+		vmf->gfp_mask = old_gfp;
 	} else {
 		ret = iomap_page_mkwrite(vma, vmf, &xfs_iomap_ops);
 		ret = block_page_mkwrite_return(ret);
@@ -1502,13 +1506,16 @@ xfs_filemap_fault(
 	xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
 
 	if (IS_DAX(inode)) {
+		gfp_t old_gfp = vmf->gfp_mask;
 		/*
 		 * we do not want to trigger unwritten extent conversion on read
 		 * faults - that is unnecessary overhead and would also require
 		 * changes to xfs_get_blocks_direct() to map unwritten extent
 		 * ioend for conversion on read-only mappings.
 		 */
+		vmf->gfp_mask &= ~__GFP_FS;
 		ret = dax_iomap_fault(vma, vmf, &xfs_iomap_ops);
+		vmf->gfp_mask = old_gfp;
 	} else
 		ret = filemap_fault(vma, vmf);
 	xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
@@ -1581,8 +1588,13 @@ xfs_filemap_pfn_mkwrite(
 	size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
 	if (vmf->pgoff >= size)
 		ret = VM_FAULT_SIGBUS;
-	else if (IS_DAX(inode))
+	else if (IS_DAX(inode)) {
+		gfp_t old_gfp = vmf->gfp_mask;
+
+		vmf->gfp_mask &= ~__GFP_FS;
 		ret = dax_pfn_mkwrite(vma, vmf);
+		vmf->gfp_mask = old_gfp;
+	}
 	xfs_iunlock(ip, XFS_MMAPLOCK_SHARED);
 	sb_end_pagefault(inode->i_sb);
 	return ret;