[v3,9/9] mm: Remove swap BIO paths and only use DIO paths

Message ID 163250396319.2330363.10564506508011638258.stgit@warthog.procyon.org.uk (mailing list archive)
State New, archived
Series mm: Use DIO for swap and fix NFS swapfiles

Commit Message

David Howells Sept. 24, 2021, 5:19 p.m. UTC
Delete the BIO-generating swap read/write paths and always use ->swap_rw().
This puts the mapping layer in the filesystem.

[!] ALSO: Add a compile-time knob to disable swapping by asynchronous DIO and
    use only synchronous DIO.  Async DIO doesn't seem to work: ATA errors are
    chucked out with both swap-on-blockdev and swapfile-on-XFS.  It also
    misbehaves on NFS.
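
As a rough illustration of the knob described above, here is a minimal
user-space sketch of the decision it forces.  The macro name matches the one
in the patch; `enum swap_io_mode` and `swap_pick_io_mode()` are hypothetical
names made up for this sketch and are not part of the kernel.

```c
#include <stdbool.h>

/* Same name as the knob in the patch; set to 0 to allow async DIO again. */
#define ONLY_USE_SYNC_DIO 1

/* Hypothetical: the two ways a swap request could be issued. */
enum swap_io_mode { SWAP_IO_SYNC, SWAP_IO_ASYNC };

/*
 * Pick the I/O mode for one swap request.  caller_wants_sync stands in for
 * "wbc->sync_mode == WB_SYNC_ALL" on writes and "synchronous" on reads.
 */
enum swap_io_mode swap_pick_io_mode(bool caller_wants_sync)
{
	if (ONLY_USE_SYNC_DIO || caller_wants_sync)
		return SWAP_IO_SYNC;
	return SWAP_IO_ASYNC;
}
```

With the knob set to 1, every request takes the synchronous path regardless
of what the caller asked for.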

I have tested this with sync DIO on ext4-swapfile, xfs-swapfile, a raw
blockdev and NFS.  The first three work; NFS works for a while then grinds to
a halt, chucking out lists of blocked sunrpc operations (I suspect it can't
allocate memory somewhere).

Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Darrick J. Wong <djwong@kernel.org>
cc: linux-block@vger.kernel.org
cc: linux-xfs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
---

 mm/page_io.c  |  156 +++------------------------------------------------------
 mm/swapfile.c |    4 +
 2 files changed, 10 insertions(+), 150 deletions(-)

Comments

Matthew Wilcox Sept. 25, 2021, 2:56 p.m. UTC | #1
On Fri, Sep 24, 2021 at 06:19:23PM +0100, David Howells wrote:
> Delete the BIO-generating swap read/write paths and always use ->swap_rw().
> This puts the mapping layer in the filesystem.

Is SWP_FS_OPS now unused after this patch?

Also, do we still need ->swap_activate and ->swap_deactivate?

David Howells Sept. 25, 2021, 3:36 p.m. UTC | #2
Matthew Wilcox <willy@infradead.org> wrote:

> On Fri, Sep 24, 2021 at 06:19:23PM +0100, David Howells wrote:
> > Delete the BIO-generating swap read/write paths and always use ->swap_rw().
> > This puts the mapping layer in the filesystem.
> 
> Is SWP_FS_OPS now unused after this patch?

Ummm.  Interesting question - it's only used in swap_set_page_dirty():

int swap_set_page_dirty(struct page *page)
{
	struct swap_info_struct *sis = page_swap_info(page);

	if (data_race(sis->flags & SWP_FS_OPS)) {
		struct address_space *mapping = sis->swap_file->f_mapping;

		VM_BUG_ON_PAGE(!PageSwapCache(page), page);
		return mapping->a_ops->set_page_dirty(page);
	} else {
		return __set_page_dirty_no_writeback(page);
	}
}


> Also, do we still need ->swap_activate and ->swap_deactivate?

f2fs does quite a lot of work in its ->swap_activate(), as does btrfs.  I'm
not sure how necessary it is.  cifs looks like it intends to use it, but it's
not fully implemented yet.  zonefs and nfs do some checking, including hole
checking in nfs's case.  nfs also does some setting up for the sunrpc
transport.

btrfs, cifs, f2fs and nfs all supply ->swap_deactivate() to undo the effects
of the activation.

David

Matthew Wilcox Sept. 25, 2021, 5:09 p.m. UTC | #3
On Sat, Sep 25, 2021 at 04:36:42PM +0100, David Howells wrote:
> Matthew Wilcox <willy@infradead.org> wrote:
> 
> > On Fri, Sep 24, 2021 at 06:19:23PM +0100, David Howells wrote:
> > > Delete the BIO-generating swap read/write paths and always use ->swap_rw().
> > > This puts the mapping layer in the filesystem.
> > 
> > Is SWP_FS_OPS now unused after this patch?
> 
> Ummm.  Interesting question - it's only used in swap_set_page_dirty():
> 
> int swap_set_page_dirty(struct page *page)
> {
> 	struct swap_info_struct *sis = page_swap_info(page);
> 
> 	if (data_race(sis->flags & SWP_FS_OPS)) {
> 		struct address_space *mapping = sis->swap_file->f_mapping;
> 
> 		VM_BUG_ON_PAGE(!PageSwapCache(page), page);
> 		return mapping->a_ops->set_page_dirty(page);
> 	} else {
> 		return __set_page_dirty_no_writeback(page);
> 	}
> }

I suspect that's no longer necessary.  NFS was the only filesystem
using SWP_FS_OPS and ...

fs/nfs/file.c:  .set_page_dirty = __set_page_dirty_nobuffers,

so it's not like NFS does anything special to reserve memory to write
back swap pages.

> > Also, do we still need ->swap_activate and ->swap_deactivate?
> 
> f2fs does quite a lot of work in its ->swap_activate(), as does btrfs.  I'm
> not sure how necessary it is.  cifs looks like it intends to use it, but it's
> not fully implemented yet.  zonefs and nfs do some checking, including hole
> checking in nfs's case.  nfs also does some setting up for the sunrpc
> transport.
> 
> btrfs, cifs, f2fs and nfs all supply ->swap_deactivate() to undo the effects
> of the activation.

Right ... so my question really is, now that we're doing I/O through
aops->direct_IO (or ->swap_rw), do those magic things need to be done?
After all, open(O_DIRECT) doesn't do these same magic things.  They're
really there to allow the direct-to-BIO path to work, and you're removing
that here.

Damien Le Moal Sept. 26, 2021, 11:08 p.m. UTC | #4
On 2021/09/26 2:09, Matthew Wilcox wrote:
> On Sat, Sep 25, 2021 at 04:36:42PM +0100, David Howells wrote:
>> Matthew Wilcox <willy@infradead.org> wrote:
>>
>>> On Fri, Sep 24, 2021 at 06:19:23PM +0100, David Howells wrote:
>>>> Delete the BIO-generating swap read/write paths and always use ->swap_rw().
>>>> This puts the mapping layer in the filesystem.
>>>
>>> Is SWP_FS_OPS now unused after this patch?
>>
>> Ummm.  Interesting question - it's only used in swap_set_page_dirty():
>>
>> int swap_set_page_dirty(struct page *page)
>> {
>> 	struct swap_info_struct *sis = page_swap_info(page);
>>
>> 	if (data_race(sis->flags & SWP_FS_OPS)) {
>> 		struct address_space *mapping = sis->swap_file->f_mapping;
>>
>> 		VM_BUG_ON_PAGE(!PageSwapCache(page), page);
>> 		return mapping->a_ops->set_page_dirty(page);
>> 	} else {
>> 		return __set_page_dirty_no_writeback(page);
>> 	}
>> }
> 
> I suspect that's no longer necessary.  NFS was the only filesystem
> using SWP_FS_OPS and ...
> 
> fs/nfs/file.c:  .set_page_dirty = __set_page_dirty_nobuffers,
> 
> so it's not like NFS does anything special to reserve memory to write
> back swap pages.
> 
>>> Also, do we still need ->swap_activate and ->swap_deactivate?
>>
>> f2fs does quite a lot of work in its ->swap_activate(), as does btrfs.  I'm
>> not sure how necessary it is.  cifs looks like it intends to use it, but it's
>> not fully implemented yet.  zonefs and nfs do some checking, including hole
>> checking in nfs's case.  nfs also does some setting up for the sunrpc
>> transport.
>>
>> btrfs, cifs, f2fs and nfs all supply ->swap_deactivate() to undo the effects
>> of the activation.
> 
> Right ... so my question really is, now that we're doing I/O through
> aops->direct_IO (or ->swap_rw), do those magic things need to be done?
> After all, open(O_DIRECT) doesn't do these same magic things.  They're
> really there to allow the direct-to-BIO path to work, and you're removing
> that here.

For zonefs, ->swap_activate() checks that the user is not trying to use a
sequential write only file for swap. Swap cannot work on these files as there
are no guarantees that the writes will be sequential.
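
The check described above could be modelled roughly as follows.  This is a
user-space sketch only; `enum zone_file_type` and `zone_file_swap_allowed()`
are hypothetical names chosen for illustration, not the zonefs API.

```c
#include <errno.h>

/* Hypothetical stand-ins for the two kinds of zone file. */
enum zone_file_type { ZONE_FILE_CONVENTIONAL, ZONE_FILE_SEQUENTIAL };

/*
 * Return 0 if a zone file of this type may back swap, -EINVAL otherwise.
 * Swap writes land at arbitrary offsets, so a file that only accepts
 * sequential writes has to be rejected up front.
 */
int zone_file_swap_allowed(enum zone_file_type type)
{
	if (type != ZONE_FILE_CONVENTIONAL)
		return -EINVAL;
	return 0;
}
```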

Dave Chinner Sept. 27, 2021, 1:25 a.m. UTC | #5
On Mon, Sep 27, 2021 at 08:08:53AM +0900, Damien Le Moal wrote:
> On 2021/09/26 2:09, Matthew Wilcox wrote:
> > On Sat, Sep 25, 2021 at 04:36:42PM +0100, David Howells wrote:
> >> Matthew Wilcox <willy@infradead.org> wrote:
> >>
> >>> On Fri, Sep 24, 2021 at 06:19:23PM +0100, David Howells wrote:
> >>>> Delete the BIO-generating swap read/write paths and always use ->swap_rw().
> >>>> This puts the mapping layer in the filesystem.
> >>>
> >>> Is SWP_FS_OPS now unused after this patch?
> >>
> >> Ummm.  Interesting question - it's only used in swap_set_page_dirty():
> >>
> >> int swap_set_page_dirty(struct page *page)
> >> {
> >> 	struct swap_info_struct *sis = page_swap_info(page);
> >>
> >> 	if (data_race(sis->flags & SWP_FS_OPS)) {
> >> 		struct address_space *mapping = sis->swap_file->f_mapping;
> >>
> >> 		VM_BUG_ON_PAGE(!PageSwapCache(page), page);
> >> 		return mapping->a_ops->set_page_dirty(page);
> >> 	} else {
> >> 		return __set_page_dirty_no_writeback(page);
> >> 	}
> >> }
> > 
> > I suspect that's no longer necessary.  NFS was the only filesystem
> > using SWP_FS_OPS and ...
> > 
> > fs/nfs/file.c:  .set_page_dirty = __set_page_dirty_nobuffers,
> > 
> > so it's not like NFS does anything special to reserve memory to write
> > back swap pages.
> > 
> >>> Also, do we still need ->swap_activate and ->swap_deactivate?
> >>
> >> f2fs does quite a lot of work in its ->swap_activate(), as does btrfs.  I'm
> >> not sure how necessary it is.  cifs looks like it intends to use it, but it's
> >> not fully implemented yet.  zonefs and nfs do some checking, including hole
> >> checking in nfs's case.  nfs also does some setting up for the sunrpc
> >> transport.
> >>
> >> btrfs, cifs, f2fs and nfs all supply ->swap_deactivate() to undo the effects
> >> of the activation.
> > 
> > Right ... so my question really is, now that we're doing I/O through
> > aops->direct_IO (or ->swap_rw), do those magic things need to be done?
> > After all, open(O_DIRECT) doesn't do these same magic things.  They're
> > really there to allow the direct-to-BIO path to work, and you're removing
> > that here.
> 
> For zonefs, ->swap_activate() checks that the user is not trying to use a
> sequential write only file for swap. Swap cannot work on these files as there
> are no guarantees that the writes will be sequential.

iomap_swapfile_activate() is used by ext4, XFS and zonefs. It checks
there are no holes in the file, no shared extents, no inline
extents, the swap info block device matches the block device the
extent is mapped to (i.e. filesystems can have more than one bdev,
swapfile only supports files on sb->s_bdev), etc.
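
To make the list of checks concrete, here is a simplified user-space model
of the per-extent validation.  `struct extent_info` and `swap_extent_check()`
are hypothetical and much coarser than the real iomap code; they only mirror
the conditions named above.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified description of one mapped extent. */
struct extent_info {
	bool is_hole;    /* no blocks allocated */
	bool is_shared;  /* blocks shared with another file */
	bool is_inline;  /* data stored inline with the metadata */
	int  bdev_id;    /* device the extent lives on */
};

/* Return 0 if the extent is usable for swap, -EINVAL otherwise. */
int swap_extent_check(const struct extent_info *ext, int swap_bdev_id)
{
	if (ext->is_hole || ext->is_shared || ext->is_inline)
		return -EINVAL;
	/* Only extents on the device recorded for the swapfile are allowed. */
	if (ext->bdev_id != swap_bdev_id)
		return -EINVAL;
	return 0;
}
```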

Also, I noticed, iomap_swapfile_add_extent() filters out extents
that are smaller than PAGE_SIZE, and aligns larger extents to
PAGE_SIZE. This ensures that, when fs block size != PAGE_SIZE,
only a single IO per page being swapped is required. i.e. the
DIO path may change the "one page, one bio, one IO" behaviour that
the current swapfile mapping guarantees.
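
The trimming described above can be sketched in user space as rounding the
start of the extent up and the end down to a page boundary.  The 4 KiB page
size and the names `SKETCH_PAGE_SIZE` and `swap_extent_whole_pages()` are
assumptions made for this sketch, not kernel identifiers.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumption: 4 KiB pages */

/*
 * Trim the byte range [addr, addr + len) to whole pages.  Returns the number
 * of whole pages and stores the first page-aligned byte address in
 * *aligned_start; returns 0 (leaving *aligned_start untouched) if the range
 * does not contain a complete page.
 */
uint64_t swap_extent_whole_pages(uint64_t addr, uint64_t len,
				 uint64_t *aligned_start)
{
	/* First page index at or after the start of the extent. */
	uint64_t first = (addr + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
	/* Page index just past the last complete page in the extent. */
	uint64_t next = (addr + len) / SKETCH_PAGE_SIZE;

	if (first >= next)
		return 0;
	*aligned_start = first * SKETCH_PAGE_SIZE;
	return next - first;
}
```

For example, with 1 KiB filesystem blocks, an extent starting at byte 1024
and 8192 bytes long yields a single usable page starting at byte 4096.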

Cheers,

Dave.

Damien Le Moal Sept. 27, 2021, 1:41 a.m. UTC | #6
On 2021/09/27 10:25, Dave Chinner wrote:
> On Mon, Sep 27, 2021 at 08:08:53AM +0900, Damien Le Moal wrote:
>> On 2021/09/26 2:09, Matthew Wilcox wrote:
>>> On Sat, Sep 25, 2021 at 04:36:42PM +0100, David Howells wrote:
>>>> Matthew Wilcox <willy@infradead.org> wrote:
>>>>
>>>>> On Fri, Sep 24, 2021 at 06:19:23PM +0100, David Howells wrote:
>>>>>> Delete the BIO-generating swap read/write paths and always use ->swap_rw().
>>>>>> This puts the mapping layer in the filesystem.
>>>>>
>>>>> Is SWP_FS_OPS now unused after this patch?
>>>>
>>>> Ummm.  Interesting question - it's only used in swap_set_page_dirty():
>>>>
>>>> int swap_set_page_dirty(struct page *page)
>>>> {
>>>> 	struct swap_info_struct *sis = page_swap_info(page);
>>>>
>>>> 	if (data_race(sis->flags & SWP_FS_OPS)) {
>>>> 		struct address_space *mapping = sis->swap_file->f_mapping;
>>>>
>>>> 		VM_BUG_ON_PAGE(!PageSwapCache(page), page);
>>>> 		return mapping->a_ops->set_page_dirty(page);
>>>> 	} else {
>>>> 		return __set_page_dirty_no_writeback(page);
>>>> 	}
>>>> }
>>>
>>> I suspect that's no longer necessary.  NFS was the only filesystem
>>> using SWP_FS_OPS and ...
>>>
>>> fs/nfs/file.c:  .set_page_dirty = __set_page_dirty_nobuffers,
>>>
>>> so it's not like NFS does anything special to reserve memory to write
>>> back swap pages.
>>>
>>>>> Also, do we still need ->swap_activate and ->swap_deactivate?
>>>>
>>>> f2fs does quite a lot of work in its ->swap_activate(), as does btrfs.  I'm
>>>> not sure how necessary it is.  cifs looks like it intends to use it, but it's
>>>> not fully implemented yet.  zonefs and nfs do some checking, including hole
>>>> checking in nfs's case.  nfs also does some setting up for the sunrpc
>>>> transport.
>>>>
>>>> btrfs, cifs, f2fs and nfs all supply ->swap_deactivate() to undo the effects
>>>> of the activation.
>>>
>>> Right ... so my question really is, now that we're doing I/O through
>>> aops->direct_IO (or ->swap_rw), do those magic things need to be done?
>>> After all, open(O_DIRECT) doesn't do these same magic things.  They're
>>> really there to allow the direct-to-BIO path to work, and you're removing
>>> that here.
>>
>> For zonefs, ->swap_activate() checks that the user is not trying to use a
>> sequential write only file for swap. Swap cannot work on these files as there
>> are no guarantees that the writes will be sequential.
> 
> iomap_swapfile_activate() is used by ext4, XFS and zonefs. It checks
> there are no holes in the file, no shared extents, no inline
> extents, the swap info block device matches the block device the
> extent is mapped to (i.e. filesystems can have more than one bdev,
> swapfile only supports files on sb->s_bdev), etc.

OK. But I was referring to the additional check in zonefs_swap_activate() before
iomap_swapfile_activate() is called. We must prevent that function from being
called for a full sequential write only zone file, since such a file will pass all
checks (no holes, all extents written, etc.) but cannot be used for swap: it
is not writable when full (no overwrites allowed in sequential zones).

> 
> Also, I noticed, iomap_swapfile_add_extent() filters out extents
> that are smaller than PAGE_SIZE, and aligns larger extents to
> PAGE_SIZE. This ensures that, when fs block size != PAGE_SIZE,
> only a single IO per page being swapped is required. i.e. the
> DIO path may change the "one page, one bio, one IO" behaviour that
> the current swapfile mapping guarantees.
> 
> Cheers,
> 
> Dave.
>

David Sterba Sept. 27, 2021, 8:03 p.m. UTC | #7
On Sat, Sep 25, 2021 at 04:36:42PM +0100, David Howells wrote:
> Matthew Wilcox <willy@infradead.org> wrote:
> 
> > On Fri, Sep 24, 2021 at 06:19:23PM +0100, David Howells wrote:
> > > Delete the BIO-generating swap read/write paths and always use ->swap_rw().
> > > This puts the mapping layer in the filesystem.
> > 
> > Is SWP_FS_OPS now unused after this patch?
> 
> Ummm.  Interesting question - it's only used in swap_set_page_dirty():
> 
> int swap_set_page_dirty(struct page *page)
> {
> 	struct swap_info_struct *sis = page_swap_info(page);
> 
> 	if (data_race(sis->flags & SWP_FS_OPS)) {
> 		struct address_space *mapping = sis->swap_file->f_mapping;
> 
> 		VM_BUG_ON_PAGE(!PageSwapCache(page), page);
> 		return mapping->a_ops->set_page_dirty(page);
> 	} else {
> 		return __set_page_dirty_no_writeback(page);
> 	}
> }
> 
> 
> > Also, do we still need ->swap_activate and ->swap_deactivate?
> 
> f2fs does quite a lot of work in its ->swap_activate(), as does btrfs.  I'm
> not sure how necessary it is.

Yes we still need it for btrfs. Besides checking the conditions similar
to what iomap_swapfile_activate does on the file itself, we need to
exclude other operations potentially changing the mapping on the level
of block groups. This is namely relocation, used to implement several
other things like resize or balance. There's an exclusion at the
beginning of btrfs_swap_activate. Right now I don't see how we could
make sure that the swapfile requirements would be satisfied without it.
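
The kind of exclusion described above can be pictured as a single
"one exclusive operation at a time" slot.  The sketch below is a user-space
model built on C11 atomics; `struct fs_state`, `enum excl_op`,
`excl_op_start()` and `excl_op_finish()` are hypothetical names, and the
model ignores everything else btrfs does while a swapfile is active.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical set of operations that must not run concurrently. */
enum excl_op { EXCL_NONE, EXCL_RELOCATION, EXCL_SWAP_ACTIVATE };

/* Hypothetical per-filesystem state holding the one exclusive slot. */
struct fs_state {
	atomic_int running_op;
};

/* Try to claim the slot for op; fails if another operation holds it. */
bool excl_op_start(struct fs_state *fs, enum excl_op op)
{
	int expected = EXCL_NONE;

	return atomic_compare_exchange_strong(&fs->running_op, &expected, op);
}

/* Release the slot so that another exclusive operation can start. */
void excl_op_finish(struct fs_state *fs)
{
	atomic_store(&fs->running_op, EXCL_NONE);
}
```

In this model, swap activation that claims the slot first makes a
concurrent relocation attempt fail until the slot is released.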

Patch

diff --git a/mm/page_io.c b/mm/page_io.c
index 8f1199d59162..b48318951380 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -26,6 +26,8 @@ 
 #include <linux/uio.h>
 #include <linux/sched/task.h>
 
+#define ONLY_USE_SYNC_DIO 1
+
 /*
  * Keep track of the kiocb we're using to do async DIO.  We have to
  * refcount it until various things stop looking at the kiocb *after*
@@ -42,30 +44,6 @@  static void swapfile_put_kiocb(struct swapfile_kiocb *ki)
 		kfree(ki);
 }
 
-static void end_swap_bio_write(struct bio *bio)
-{
-	struct page *page = bio_first_page_all(bio);
-
-	if (bio->bi_status) {
-		SetPageError(page);
-		/*
-		 * We failed to write the page out to swap-space.
-		 * Re-dirty the page in order to avoid it being reclaimed.
-		 * Also print a dire warning that things will go BAD (tm)
-		 * very quickly.
-		 *
-		 * Also clear PG_reclaim to avoid rotate_reclaimable_page()
-		 */
-		set_page_dirty(page);
-		pr_alert_ratelimited("Write-error on swap-device (%u:%u:%llu)\n",
-				     MAJOR(bio_dev(bio)), MINOR(bio_dev(bio)),
-				     (unsigned long long)bio->bi_iter.bi_sector);
-		ClearPageReclaim(page);
-	}
-	end_page_writeback(page);
-	bio_put(bio);
-}
-
 static void swap_slot_free_notify(struct page *page)
 {
 	struct swap_info_struct *sis;
@@ -114,32 +92,6 @@  static void swap_slot_free_notify(struct page *page)
 	}
 }
 
-static void end_swap_bio_read(struct bio *bio)
-{
-	struct page *page = bio_first_page_all(bio);
-	struct task_struct *waiter = bio->bi_private;
-
-	if (bio->bi_status) {
-		SetPageError(page);
-		ClearPageUptodate(page);
-		pr_alert_ratelimited("Read-error on swap-device (%u:%u:%llu)\n",
-				     MAJOR(bio_dev(bio)), MINOR(bio_dev(bio)),
-				     (unsigned long long)bio->bi_iter.bi_sector);
-		goto out;
-	}
-
-	SetPageUptodate(page);
-	swap_slot_free_notify(page);
-out:
-	unlock_page(page);
-	WRITE_ONCE(bio->bi_private, NULL);
-	bio_put(bio);
-	if (waiter) {
-		blk_wake_io_task(waiter);
-		put_task_struct(waiter);
-	}
-}
-
 int generic_swapfile_activate(struct swap_info_struct *sis,
 				struct file *swap_file,
 				sector_t *span)
@@ -279,25 +231,6 @@  static inline void count_swpout_vm_event(struct page *page)
 	count_vm_events(PSWPOUT, thp_nr_pages(page));
 }
 
-#if defined(CONFIG_MEMCG) && defined(CONFIG_BLK_CGROUP)
-static void bio_associate_blkg_from_page(struct bio *bio, struct page *page)
-{
-	struct cgroup_subsys_state *css;
-	struct mem_cgroup *memcg;
-
-	memcg = page_memcg(page);
-	if (!memcg)
-		return;
-
-	rcu_read_lock();
-	css = cgroup_e_css(memcg->css.cgroup, &io_cgrp_subsys);
-	bio_associate_blkg_from_css(bio, css);
-	rcu_read_unlock();
-}
-#else
-#define bio_associate_blkg_from_page(bio, page)		do { } while (0)
-#endif /* CONFIG_MEMCG && CONFIG_BLK_CGROUP */
-
 static void swapfile_write_complete(struct page *page, long ret)
 {
 	if (ret == thp_size(page)) {
@@ -364,7 +297,7 @@  static int swapfile_write(struct swap_info_struct *sis,
 
 	iov_iter_bvec(&from, WRITE, &bv, 1, PAGE_SIZE);
 
-	if (wbc->sync_mode == WB_SYNC_ALL)
+	if (ONLY_USE_SYNC_DIO || wbc->sync_mode == WB_SYNC_ALL)
 		return swapfile_write_sync(sis, page, wbc, &from);
 
 	ki = kzalloc(sizeof(*ki), GFP_KERNEL);
@@ -390,40 +323,17 @@  static int swapfile_write(struct swap_info_struct *sis,
 
 int __swap_writepage(struct page *page, struct writeback_control *wbc)
 {
-	struct bio *bio;
-	int ret;
 	struct swap_info_struct *sis = page_swap_info(page);
 
 	VM_BUG_ON_PAGE(!PageSwapCache(page), page);
-	if (data_race(sis->flags & SWP_FS_OPS))
-		return swapfile_write(sis, page, wbc);
-
-	ret = bdev_write_page(sis->bdev, swap_page_sector(page), page, wbc);
-	if (!ret) {
-		count_swpout_vm_event(page);
-		return 0;
-	}
-
-	bio = bio_alloc(GFP_NOIO, 1);
-	bio_set_dev(bio, sis->bdev);
-	bio->bi_iter.bi_sector = swap_page_sector(page);
-	bio->bi_opf = REQ_OP_WRITE | REQ_SWAP | wbc_to_write_flags(wbc);
-	bio->bi_end_io = end_swap_bio_write;
-	bio_add_page(bio, page, thp_size(page), 0);
-
-	bio_associate_blkg_from_page(bio, page);
-	count_swpout_vm_event(page);
-	set_page_writeback(page);
-	unlock_page(page);
-	submit_bio(bio);
-
-	return 0;
+	return swapfile_write(sis, page, wbc);
 }
 
 static void swapfile_read_complete(struct page *page, long ret)
 {
 	if (ret == page_size(page)) {
 		count_vm_event(PSWPIN);
+		swap_slot_free_notify(page);
 		SetPageUptodate(page);
 	} else {
 		SetPageError(page);
@@ -473,7 +383,7 @@  static void swapfile_read(struct swap_info_struct *sis, struct page *page,
 
 	iov_iter_bvec(&to, READ, &bv, 1, thp_size(page));
 
-	if (synchronous)
+	if (ONLY_USE_SYNC_DIO || synchronous)
 		return swapfile_read_sync(sis, page, &to);
 
 	ki = kzalloc(sizeof(*ki), GFP_KERNEL);
@@ -495,10 +405,7 @@  static void swapfile_read(struct swap_info_struct *sis, struct page *page,
 
 void swap_readpage(struct page *page, bool synchronous)
 {
-	struct bio *bio;
 	struct swap_info_struct *sis = page_swap_info(page);
-	blk_qc_t qc;
-	struct gendisk *disk;
 	unsigned long pflags;
 
 	VM_BUG_ON_PAGE(!PageSwapCache(page) && !synchronous, page);
@@ -515,58 +422,9 @@  void swap_readpage(struct page *page, bool synchronous)
 	if (frontswap_load(page) == 0) {
 		SetPageUptodate(page);
 		unlock_page(page);
-		goto out;
-	}
-
-	if (data_race(sis->flags & SWP_FS_OPS)) {
+	} else {
 		swapfile_read(sis, page, synchronous);
-		goto out;
 	}
-
-	if (sis->flags & SWP_SYNCHRONOUS_IO) {
-		if (!bdev_read_page(sis->bdev, swap_page_sector(page), page)) {
-			if (trylock_page(page)) {
-				swap_slot_free_notify(page);
-				unlock_page(page);
-			}
-
-			count_vm_event(PSWPIN);
-			goto out;
-		}
-	}
-
-	bio = bio_alloc(GFP_KERNEL, 1);
-	bio_set_dev(bio, sis->bdev);
-	bio->bi_opf = REQ_OP_READ;
-	bio->bi_iter.bi_sector = swap_page_sector(page);
-	bio->bi_end_io = end_swap_bio_read;
-	bio_add_page(bio, page, thp_size(page), 0);
-
-	disk = bio->bi_bdev->bd_disk;
-	/*
-	 * Keep this task valid during swap readpage because the oom killer may
-	 * attempt to access it in the page fault retry time check.
-	 */
-	if (synchronous) {
-		bio->bi_opf |= REQ_HIPRI;
-		get_task_struct(current);
-		bio->bi_private = current;
-	}
-	count_vm_event(PSWPIN);
-	bio_get(bio);
-	qc = submit_bio(bio);
-	while (synchronous) {
-		set_current_state(TASK_UNINTERRUPTIBLE);
-		if (!READ_ONCE(bio->bi_private))
-			break;
-
-		if (!blk_poll(disk->queue, qc, true))
-			blk_io_schedule();
-	}
-	__set_current_state(TASK_RUNNING);
-	bio_put(bio);
-
-out:
 	psi_memstall_leave(&pflags);
 }
 
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 22d10f713848..95d2571e3727 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2918,6 +2918,8 @@  static int claim_swapfile(struct swap_info_struct *p, struct inode *inode)
 			return -EINVAL;
 		p->flags |= SWP_BLKDEV;
 	} else if (S_ISREG(inode->i_mode)) {
+		if (!inode->i_mapping->a_ops->swap_rw)
+			return -EINVAL;
 		p->bdev = inode->i_sb->s_bdev;
 	}
 
@@ -3165,7 +3167,7 @@  SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 		name = NULL;
 		goto bad_swap;
 	}
-	swap_file = file_open_name(name, O_RDWR|O_LARGEFILE, 0);
+	swap_file = file_open_name(name, O_RDWR | O_LARGEFILE | O_DIRECT, 0);
 	if (IS_ERR(swap_file)) {
 		error = PTR_ERR(swap_file);
 		swap_file = NULL;