Message ID | a59510f4552a5d3557958cdb0ce1b23b3abfc75b.1418618044.git.osandov@osandov.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Sun 14-12-14 21:26:56, Omar Sandoval wrote: > The generic write code locks i_mutex for a direct_IO. Swap-over-NFS > doesn't grab the mutex because nfs_direct_IO doesn't expect i_mutex to > be held, but most direct_IO implementations do. I think you are speaking about direct IO writes only, aren't you? For DIO reads we don't hold i_mutex AFAICS. And also for DIO writes we don't necessarily hold i_mutex - see for example XFS which doesn't take i_mutex for direct IO writes. It uses it's internal rwlock for this (see xfs_file_dio_aio_write()). So I think this is just wrong. Honza > Signed-off-by: Omar Sandoval <osandov@osandov.com> > --- > mm/page_io.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/mm/page_io.c b/mm/page_io.c > index 955db8b..1630ac0 100644 > --- a/mm/page_io.c > +++ b/mm/page_io.c > @@ -263,6 +263,7 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc, > if (sis->flags & SWP_FILE) { > struct kiocb kiocb; > struct file *swap_file = sis->swap_file; > + struct inode *inode = file_inode(swap_file); > struct address_space *mapping = swap_file->f_mapping; > struct bio_vec bv = { > .bv_page = page, > @@ -283,9 +284,11 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc, > > set_page_writeback(page); > unlock_page(page); > + mutex_lock(&inode->i_mutex); > ret = mapping->a_ops->direct_IO(ITER_BVEC | WRITE, > &kiocb, &from, > kiocb.ki_pos); > + mutex_unlock(&inode->i_mutex); > if (ret == PAGE_SIZE) { > count_vm_event(PSWPOUT); > ret = 0; > -- > 2.1.3 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Dec 15, 2014 at 05:27:05PM +0100, Jan Kara wrote: > On Sun 14-12-14 21:26:56, Omar Sandoval wrote: > > The generic write code locks i_mutex for a direct_IO. Swap-over-NFS > > doesn't grab the mutex because nfs_direct_IO doesn't expect i_mutex to > > be held, but most direct_IO implementations do. > I think you are speaking about direct IO writes only, aren't you? For DIO > reads we don't hold i_mutex AFAICS. And also for DIO writes we don't > necessarily hold i_mutex - see for example XFS which doesn't take i_mutex > for direct IO writes. It uses it's internal rwlock for this (see > xfs_file_dio_aio_write()). So I think this is just wrong. The problem is that the use of ->direct_IO by the swap code is a gross layering violation. ->direct_IO is a callback for the filesystem, and the swap code need to call ->read_iter instead of ->readpage and ->write_tier instead of ->direct_IO, and leave the locking to the filesystem. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Dec 15, 2014 at 08:56:15AM -0800, Christoph Hellwig wrote: > On Mon, Dec 15, 2014 at 05:27:05PM +0100, Jan Kara wrote: > > On Sun 14-12-14 21:26:56, Omar Sandoval wrote: > > > The generic write code locks i_mutex for a direct_IO. Swap-over-NFS > > > doesn't grab the mutex because nfs_direct_IO doesn't expect i_mutex to > > > be held, but most direct_IO implementations do. > > I think you are speaking about direct IO writes only, aren't you? For DIO > > reads we don't hold i_mutex AFAICS. And also for DIO writes we don't > > necessarily hold i_mutex - see for example XFS which doesn't take i_mutex > > for direct IO writes. It uses it's internal rwlock for this (see > > xfs_file_dio_aio_write()). So I think this is just wrong. > > The problem is that the use of ->direct_IO by the swap code is a gross > layering violation. ->direct_IO is a callback for the filesystem, and > the swap code need to call ->read_iter instead of ->readpage and > ->write_tier instead of ->direct_IO, and leave the locking to the > filesystem. > Ok, I got the swap code working with ->read_iter/->write_iter without too much trouble. I wanted to double check before I submit if there's any gotchas involved with adding the O_DIRECT flag to a file pointer after it has been opened -- swapon opens the swapfile before we know if we're using the SWP_FILE infrastructure, and we need to add O_DIRECT so ->{read,write}_iter use direct I/O, but we can't add O_DIRECT to the original open without excluding filesystems that support the old bmap path but not direct I/O.
On Mon, Dec 15, 2014 at 02:11:00PM -0800, Omar Sandoval wrote: > Ok, I got the swap code working with ->read_iter/->write_iter without > too much trouble. I wanted to double check before I submit if there's > any gotchas involved with adding the O_DIRECT flag to a file pointer > after it has been opened -- swapon opens the swapfile before we know if > we're using the SWP_FILE infrastructure, and we need to add O_DIRECT so > ->{read,write}_iter use direct I/O, but we can't add O_DIRECT to the > original open without excluding filesystems that support the old bmap > path but not direct I/O. In general just adding O_DIRECT is a problem. However given that the swap file is locked against any other access while in use it seems ok in this particular case. Just make sure to clear it on swapoff, and write a detailed comment explaining the situation. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Dec 15, 2014 at 08:56:15AM -0800, Christoph Hellwig wrote: > On Mon, Dec 15, 2014 at 05:27:05PM +0100, Jan Kara wrote: > > On Sun 14-12-14 21:26:56, Omar Sandoval wrote: > > > The generic write code locks i_mutex for a direct_IO. Swap-over-NFS > > > doesn't grab the mutex because nfs_direct_IO doesn't expect i_mutex to > > > be held, but most direct_IO implementations do. > > I think you are speaking about direct IO writes only, aren't you? For DIO > > reads we don't hold i_mutex AFAICS. And also for DIO writes we don't > > necessarily hold i_mutex - see for example XFS which doesn't take i_mutex > > for direct IO writes. It uses it's internal rwlock for this (see > > xfs_file_dio_aio_write()). So I think this is just wrong. > > The problem is that the use of ->direct_IO by the swap code is a gross > layering violation. ->direct_IO is a callback for the filesystem, and > the swap code need to call ->read_iter instead of ->readpage and > ->write_tier instead of ->direct_IO, and leave the locking to the > filesystem. The thing is, ->read_iter() and ->write_iter() might decide to fall back to buffered IO path. XFS is unusual in that respect - there O_DIRECT ends up with short write in such case. Other filesystems, OTOH... -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Dec 20, 2014 at 06:51:33AM +0000, Al Viro wrote: > On Mon, Dec 15, 2014 at 08:56:15AM -0800, Christoph Hellwig wrote: > > On Mon, Dec 15, 2014 at 05:27:05PM +0100, Jan Kara wrote: > > > On Sun 14-12-14 21:26:56, Omar Sandoval wrote: > > > > The generic write code locks i_mutex for a direct_IO. Swap-over-NFS > > > > doesn't grab the mutex because nfs_direct_IO doesn't expect i_mutex to > > > > be held, but most direct_IO implementations do. > > > I think you are speaking about direct IO writes only, aren't you? For DIO > > > reads we don't hold i_mutex AFAICS. And also for DIO writes we don't > > > necessarily hold i_mutex - see for example XFS which doesn't take i_mutex > > > for direct IO writes. It uses it's internal rwlock for this (see > > > xfs_file_dio_aio_write()). So I think this is just wrong. > > > > The problem is that the use of ->direct_IO by the swap code is a gross > > layering violation. ->direct_IO is a callback for the filesystem, and > > the swap code need to call ->read_iter instead of ->readpage and > > ->write_tier instead of ->direct_IO, and leave the locking to the > > filesystem. > > The thing is, ->read_iter() and ->write_iter() might decide to fall back to > buffered IO path. XFS is unusual in that respect - there O_DIRECT ends up > with short write in such case. Other filesystems, OTOH... Alright, now what? Using ->direct_IO directly is pretty much a no go because of the different locking conventions as was pointed out. Maybe some "no, really, just direct I/O" iocb flag?
On Sat, Dec 20, 2014 at 06:51:33AM +0000, Al Viro wrote: > > The problem is that the use of ->direct_IO by the swap code is a gross > > layering violation. ->direct_IO is a callback for the filesystem, and > > the swap code need to call ->read_iter instead of ->readpage and > > ->write_tier instead of ->direct_IO, and leave the locking to the > > filesystem. > > The thing is, ->read_iter() and ->write_iter() might decide to fall back to > buffered IO path. XFS is unusual in that respect - there O_DIRECT ends up > with short write in such case. Other filesystems, OTOH... We'll just need a ->swap_activate method that makes sure we really do direct I/O. For all filesystems currently suporting swap just checking that all blocks are allocated (as the ->bmap path already does) should be enough. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/mm/page_io.c b/mm/page_io.c index 955db8b..1630ac0 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -263,6 +263,7 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc, if (sis->flags & SWP_FILE) { struct kiocb kiocb; struct file *swap_file = sis->swap_file; + struct inode *inode = file_inode(swap_file); struct address_space *mapping = swap_file->f_mapping; struct bio_vec bv = { .bv_page = page, @@ -283,9 +284,11 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc, set_page_writeback(page); unlock_page(page); + mutex_lock(&inode->i_mutex); ret = mapping->a_ops->direct_IO(ITER_BVEC | WRITE, &kiocb, &from, kiocb.ki_pos); + mutex_unlock(&inode->i_mutex); if (ret == PAGE_SIZE) { count_vm_event(PSWPOUT); ret = 0;
The generic write code locks i_mutex for a direct_IO. Swap-over-NFS doesn't grab the mutex because nfs_direct_IO doesn't expect i_mutex to be held, but most direct_IO implementations do. Signed-off-by: Omar Sandoval <osandov@osandov.com> --- mm/page_io.c | 3 +++ 1 file changed, 3 insertions(+)