Message ID: 20220316063927.2128383-1-yang.yang29@zte.com.cn (mailing list archive)
State:      New, archived
Series:     block/psi: make PSI annotations of submit_bio only work for file pages
> @@ -1035,8 +1035,9 @@ void __bio_add_page(struct bio *bio, struct page *page,
>  	bio->bi_iter.bi_size += len;
>  	bio->bi_vcnt++;
>
> -	if (!bio_flagged(bio, BIO_WORKINGSET) && unlikely(PageWorkingset(page)))
> -		bio_set_flag(bio, BIO_WORKINGSET);
> +	if (!bio_flagged(bio, BIO_WORKINGSET_FILE) &&
> +	    unlikely(PageWorkingset(page)) && !PageSwapBacked(page))
> +		bio_set_flag(bio, BIO_WORKINGSET_FILE);

This needs to go out of the block I/O fast path, not grow even more
checks.
On Wed, Mar 16, 2022 at 01:48:05AM -0700, Christoph Hellwig wrote:
> > @@ -1035,8 +1035,9 @@ void __bio_add_page(struct bio *bio, struct page *page,
> >  	bio->bi_iter.bi_size += len;
> >  	bio->bi_vcnt++;
> >
> > -	if (!bio_flagged(bio, BIO_WORKINGSET) && unlikely(PageWorkingset(page)))
> > -		bio_set_flag(bio, BIO_WORKINGSET);
> > +	if (!bio_flagged(bio, BIO_WORKINGSET_FILE) &&
> > +	    unlikely(PageWorkingset(page)) && !PageSwapBacked(page))
> > +		bio_set_flag(bio, BIO_WORKINGSET_FILE);
>
> This needs to go out of the block I/O fast path, not grow even more
> checks.

Thanks for your reply.

First, Johannes Weiner has already stated his position on this, see:
https://lore.kernel.org/all/Yio17pXawRuuVJFO@cmpxchg.org/

Second, I understand your concern, but there seems to be no better way
to do workingset delay accounting for file pages: there is no single
function that tracks the submission of file pages, the kernel does this
in multiple subsystems.

Third, this patch doesn't make things worse; in fact, it reduces
unnecessary psi accounting in submit_bio.
On Wed, Mar 16, 2022 at 06:39:28AM +0000, cgel.zte@gmail.com wrote:
> From: Yang Yang <yang.yang29@zte.com.cn>
>
> psi tracks the time spent on submitting the IO of refaulting file pages
> and anonymous pages[1]. But now that we track refaulting anonymous
> pages in swap_readpage[2][3], there is no need to track refaulting
> anonymous pages in submit_bio.
>
> So this patch reduces redundant calls of psi_memstall_enter, and makes
> it easier to track refaulting file pages and anonymous pages
> separately.

I don't think this is an improvement.

psi_memstall_enter() will check current->in_memstall once, detect the
nested call, and bail. Your patch checks PageSwapBacked for every page
being added. It's more branches for less robust code.
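For reference, below is a simplified sketch of the nesting guard being
discussed, based on psi_memstall_enter() in kernel/sched/psi.c; the
runqueue locking and the actual stall accounting are elided:

	void psi_memstall_enter(unsigned long *flags)
	{
		if (static_branch_likely(&psi_disabled))
			return;

		/* Remember whether the task is already in a memstall section */
		*flags = current->in_memstall;
		if (*flags)
			return;	/* nested call: bail, the outer section accounts */

		current->in_memstall = 1;
		/* ... mark the task TSK_MEMSTALL under the runqueue lock ... */
	}

So a nested annotation costs one extra branch and is otherwise harmless,
which is the robustness argument above.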
On Mon, Mar 21, 2022 at 10:33:20AM -0400, Johannes Weiner wrote:
> On Wed, Mar 16, 2022 at 06:39:28AM +0000, cgel.zte@gmail.com wrote:
> > [...]
> > So this patch reduces redundant calls of psi_memstall_enter, and makes
> > it easier to track refaulting file pages and anonymous pages
> > separately.
>
> I don't think this is an improvement.
>
> psi_memstall_enter() will check current->in_memstall once, detect the
> nested call, and bail. Your patch checks PageSwapBacked for every page
> being added. It's more branches for less robust code.

We are also working on a new patch that classifies the different causes
of psi_memstall_enter(): reclaim, thrashing, compaction, etc. This will
help users tune sysctls; for example, a user who sees high compaction
delay may adjust the THP sysctls to reduce that delay.

To support that, we need to distinguish what causes the psi stall in
submit_bio(), and this patch does that job.
On Mon, Mar 21, 2022 at 10:33:20AM -0400, Johannes Weiner wrote:
> [...]
> psi_memstall_enter() will check current->in_memstall once, detect the
> nested call, and bail. Your patch checks PageSwapBacked for every page
> being added. It's more branches for less robust code.

And the PageSwapBacked check comes after unlikely(PageWorkingset(page)),
so I think the impact is small.
On Tue, Mar 22, 2022 at 02:47:42AM +0000, CGEL wrote:
> [...]
> We are also working on a new patch that classifies the different causes
> of psi_memstall_enter(): reclaim, thrashing, compaction, etc. This will
> help users tune sysctls; for example, a user who sees high compaction
> delay may adjust the THP sysctls to reduce that delay.
>
> To support that, we need to distinguish what causes the psi stall in
> submit_bio(), and this patch does that job.

Please submit these patches together then. On its own, this patch isn't
desirable.
On Tue, Mar 22, 2022 at 09:27:58AM -0400, Johannes Weiner wrote:
> [...]
> Please submit these patches together then. On its own, this patch
> isn't desirable.

I think this patch has its own independent value; let me try to explain
it better.

1) This patch doesn't make things worse, and may even be better.

After this patch, swap workingset handling is simpler, and file
workingset handling has just one more check, as below (a sketch of the
swap_readpage() annotation follows this message).

Before this patch, handling swap workingset:
a) in swap_readpage() call psi_memstall_enter() ->
b) in __bio_add_page() test if bio_set_flag() should be called, true ->
c) in __bio_add_page() call bio_set_flag()
d) in submit_bio() test if psi_memstall_enter() should be called, true ->
e) call psi_memstall_enter(), detect the nested call, and bail.
f) call bio_clear_flag() if needed.

Before this patch, handling file page workingset:
a) in __bio_add_page() test if bio_set_flag() should be called, true ->
...
b) call bio_clear_flag() if needed.

After this patch, handling swap workingset:
a) in swap_readpage() call psi_memstall_enter() ->
b) in __bio_add_page() test if bio_set_flag() should be called: one more
   check, false, and return.
c) in submit_bio() test if psi_memstall_enter() should be called: false,
   and return.

After this patch, handling file page workingset:
a) in __bio_add_page() test if bio_set_flag() should be called: one more
   check, true ->
...
b) call bio_clear_flag() if needed.

2) This patch helps tools like kprobe trace the different workingset
types. After this patch, we know the workingset stalls in submit_bio()
are only caused by file pages.

3) This patch will help code evolution, such as psi classification, and
getdelays support for counting file page workingset submissions.
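For context, the swap-side annotation in step a) above is roughly the
following simplified sketch of swap_readpage() from mm/page_io.c; the
bio setup, error handling, and delay accounting are abbreviated, and
submit_swap_bio() is a hypothetical stand-in for the actual bio path:

	int swap_readpage(struct page *page, bool synchronous)
	{
		unsigned long pflags;
		bool workingset = PageWorkingset(page);
		int ret;

		/*
		 * Refaulting a page that was recently hot: the time spent
		 * submitting (and, if synchronous, waiting for) the read
		 * counts as a memory stall.
		 */
		if (workingset)
			psi_memstall_enter(&pflags);

		ret = submit_swap_bio(page, synchronous);	/* hypothetical helper */

		if (workingset)
			psi_memstall_leave(&pflags);
		return ret;
	}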
On Wed, Mar 23, 2022 at 06:11:03AM +0000, CGEL wrote:
> On Tue, Mar 22, 2022 at 09:27:58AM -0400, Johannes Weiner wrote:
> > [...]
> > Please submit these patches together then. On its own, this patch
> > isn't desirable.
> I think this patch has its own independent value; let me try to explain
> it better.
> [...]

Looking forward to your review, thanks!
On Wed, Mar 30, 2022 at 08:34:08AM +0000, CGEL wrote:
> On Wed, Mar 23, 2022 at 06:11:03AM +0000, CGEL wrote:
> > [...]
> > I think this patch has its own independent value; let me try to
> > explain it better.

You missed the point about it complicating semantics.

Right now, the bio layer annotates stalls from queue contention. This
is very simple. The swap code has relied on it in the past. It doesn't
now, but that doesn't change what the concept is at the bio layer.

Your patch explicitly codifies that the MM handles swap IOs, and the
lower bio layer handles files. This asymmetry is ugly and error prone.

If you want type distinction, we should move it all into MM code, like
Christoph is saying. Where the swap code handles anon refaults and the
page cache code handles file refaults. This would be my preferred
layering, and my original patch did that:
https://lkml.org/lkml/2019/7/22/1070

But it was NAKed, and I had to agree with the argument. The page cache
code is not very centralized, and the place where we deal with
individual pages (and detect refaults) and where we submit bios (where
the stalls occur) is spread out into multiple filesystems. There are
180 submit_bio() calls in fs/; you'd have to audit which ones are for
page cache submissions, and then add stall annotations or use a new
submit_bio_cache() wrapper that handles it. Changes in the filesystem
could easily miss this protocol and silently break pressure detection.

I would prefer the annotations to be at this level, I just don't see
how to do it cleanly/robustly. Maybe Christoph has an idea, he
understands the fs side much better than I do.
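To illustrate, the hypothetical submit_bio_cache() wrapper mentioned
above (it is not an existing API) might look like the sketch below: it
bounds the submission with a memstall section the way submit_bio() in
block/blk-core.c does today, but decides by scanning the bio's pages
instead of relying on a bio flag:

	/* Hypothetical sketch only, no such helper exists upstream. */
	static void submit_bio_cache(struct bio *bio)
	{
		unsigned long pflags;
		struct bio_vec *bvec;
		struct bvec_iter_all iter_all;
		bool stall = false;

		/* A read stalls on memory if any page in it is refaulting */
		if (bio_op(bio) == REQ_OP_READ) {
			bio_for_each_segment_all(bvec, bio, iter_all) {
				if (PageWorkingset(bvec->bv_page)) {
					stall = true;
					break;
				}
			}
		}

		if (stall)
			psi_memstall_enter(&pflags);
		submit_bio(bio);
		if (stall)
			psi_memstall_leave(&pflags);
	}

The robustness problem is exactly that every page cache submission site
in fs/ would have to remember to call this wrapper instead of
submit_bio().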
On Wed, Mar 30, 2022 at 09:00:46AM -0400, Johannes Weiner wrote:
> If you want type distinction, we should move it all into MM code, like
> Christoph is saying. Where the swap code handles anon refaults and the
> page cache code handles file refaults. This would be my preferred
> layering, and my original patch did that:
> https://lkml.org/lkml/2019/7/22/1070

FYI, I started redoing that version and I think with all the cleanups
to filemap.c and the readahead code this can be done fairly nicely now:

http://git.infradead.org/users/hch/block.git/commitdiff/666abb29c6db870d3941acc5ac19e83fbc72cfd4
On 3/30/22 7:04 AM, Christoph Hellwig wrote:
> On Wed, Mar 30, 2022 at 09:00:46AM -0400, Johannes Weiner wrote:
> > [...]
>
> FYI, I started redoing that version and I think with all the cleanups
> to filemap.c and the readahead code this can be done fairly nicely now:
>
> http://git.infradead.org/users/hch/block.git/commitdiff/666abb29c6db870d3941acc5ac19e83fbc72cfd4

This looks way better than hiding it deep down in the block layer.
On Wed, Mar 30, 2022 at 06:04:08AM -0700, Christoph Hellwig wrote:
> [...]
> FYI, I started redoing that version and I think with all the cleanups
> to filemap.c and the readahead code this can be done fairly nicely now:
>
> http://git.infradead.org/users/hch/block.git/commitdiff/666abb29c6db870d3941acc5ac19e83fbc72cfd4

Yes, it's definitely much nicer now with the MM instantiating the
pages for ->readpage(s).

But AFAICS this breaks compressed btrfs (and erofs?) because those
still do additional add_to_page_cache_lru() and bio submissions.
On Wed, Mar 30, 2022 at 11:45:56AM -0400, Johannes Weiner wrote:
> Yes, it's definitely much nicer now with the MM instantiating the
> pages for ->readpage(s).
>
> But AFAICS this breaks compressed btrfs (and erofs?) because those
> still do additional add_to_page_cache_lru() and bio submissions.

In btrfs, add_ra_bio_pages only passes freshly allocated pages to
add_to_page_cache_lru. These can't really have PageWorkingset set, can
they? In erofs they can also come from a local page pool, but I think
otherwise the same applies.
On Wed, Mar 30, 2022 at 08:54:09AM -0700, Christoph Hellwig wrote:
> In btrfs, add_ra_bio_pages only passes freshly allocated pages to
> add_to_page_cache_lru. These can't really have PageWorkingset set, can
> they? In erofs they can also come from a local page pool, but I think
> otherwise the same applies.

It's add_to_page_cache_lru() that sets the flag.

Basically, when a PageWorkingset (hot) page gets reclaimed, the bit is
stored in the vacated tree slot. When the entry is brought back in,
add_to_page_cache_lru() transfers it to the newly allocated page:

  add_to_page_cache_lru()
    filemap_add_folio()
      __filemap_add_folio(mapping, folio, index, gfp, &shadow)
        *shadow = *slot
        *slot = folio
      if (shadow)
        workingset_refault(folio, shadow)
          folio_set_workingset()
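To make the last step concrete, here is a heavily simplified sketch of
workingset_refault(); the real function in mm/workingset.c also does
memcg lookup, LRU balancing, and stats, and shadow_to_distance() and
workingset_size() below are hypothetical stand-ins for logic that is
open-coded there:

	void workingset_refault(struct folio *folio, void *shadow)
	{
		/* Unpack the eviction info packed into the shadow entry */
		unsigned long refault_distance = shadow_to_distance(shadow);

		/*
		 * The page was evicted while still hot, so this read only
		 * happens because memory is short. Mark the folio so the
		 * IO layers account the read as a memory stall.
		 */
		if (refault_distance <= workingset_size()) {
			folio_set_active(folio);
			folio_set_workingset(folio);
		}
	}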
On Wed, Mar 30, 2022 at 09:00:46AM -0400, Johannes Weiner wrote:
> [...]
> You missed the point about it complicating semantics.
>
> Right now, the bio layer annotates stalls from queue contention. This
> is very simple. The swap code has relied on it in the past. It doesn't
> now, but that doesn't change what the concept is at the bio layer.
>
> Your patch explicitly codifies that the MM handles swap IOs, and the
> lower bio layer handles files. This asymmetry is ugly and error prone.

Yes, this asymmetry is bad; we also don't think this patch is perfect.
But just as you said below, page cache creation is spread out across
multiple filesystems; if we move psi_memstall_enter() to the upper
layer, it could mean horrible repetition and would also be error prone.

I think the primary problem is that the page cache code is not
centralized. Until that is solved (not likely?), we may have to make a
compromise, and I think this patch is a simple one. Thanks.

> If you want type distinction, we should move it all into MM code, like
> Christoph is saying. Where the swap code handles anon refaults and the
> page cache code handles file refaults. This would be my preferred
> layering, and my original patch did that:
> https://lkml.org/lkml/2019/7/22/1070
>
> But it was NAKed, and I had to agree with the argument. The page cache
> code is not very centralized, and the place where we deal with
> individual pages (and detect refaults) and where we submit bios (where
> the stalls occur) is spread out into multiple filesystems. There are
> 180 submit_bio() calls in fs/; you'd have to audit which ones are for
> page cache submissions, and then add stall annotations or use a new
> submit_bio_cache() wrapper that handles it. Changes in the filesystem
> could easily miss this protocol and silently break pressure detection.
>
> I would prefer the annotations to be at this level, I just don't see
> how to do it cleanly/robustly. Maybe Christoph has an idea, he
> understands the fs side much better than I do.
On Wed, Mar 30, 2022 at 12:17:09PM -0400, Johannes Weiner wrote:
> It's add_to_page_cache_lru() that sets the flag.
>
> Basically, when a PageWorkingset (hot) page gets reclaimed, the bit is
> stored in the vacated tree slot. When the entry is brought back in,
> add_to_page_cache_lru() transfers it to the newly allocated page.

Ok. In this case my patch didn't quite do the right thing for readahead
either. But that does leave a question for the btrfs compressed case,
which only adds extra pages to a read to readahead a bigger cluster
size - that is, these pages are not read at the request of the VM. Does
it really make sense to do PSI accounting for them in that case?
On Wed, Mar 30, 2022 at 10:15:32PM -0700, Christoph Hellwig wrote:
> [...]
> Ok. In this case my patch didn't quite do the right thing for readahead
> either. But that does leave a question for the btrfs compressed case,
> which only adds extra pages to a read to readahead a bigger cluster
> size - that is, these pages are not read at the request of the VM. Does
> it really make sense to do PSI accounting for them in that case?

I think it does.

I suppose it's an argument about readahead pages in general, which
technically the workload itself doesn't commission explicitly. But
those pages are still triggered by a nearby access, their reads
contribute to device utilization, and if they're PageWorkingset it
means they're only being read because there is a lack of memory.

In a perfect world, readahead would stop when memory or IO are
contended. But it doesn't, and the stalls it can inject into the
workload are as real as stalls from directly requested reads.
diff --git a/block/bio.c b/block/bio.c
index 3c57b3ba727d..54b60be4f3b0 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1035,8 +1035,9 @@ void __bio_add_page(struct bio *bio, struct page *page,
 	bio->bi_iter.bi_size += len;
 	bio->bi_vcnt++;
 
-	if (!bio_flagged(bio, BIO_WORKINGSET) && unlikely(PageWorkingset(page)))
-		bio_set_flag(bio, BIO_WORKINGSET);
+	if (!bio_flagged(bio, BIO_WORKINGSET_FILE) &&
+	    unlikely(PageWorkingset(page)) && !PageSwapBacked(page))
+		bio_set_flag(bio, BIO_WORKINGSET_FILE);
 }
 EXPORT_SYMBOL_GPL(__bio_add_page);
 
@@ -1254,7 +1255,7 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
  * is returned only if 0 pages could be pinned.
  *
  * It's intended for direct IO, so doesn't do PSI tracking, the caller is
- * responsible for setting BIO_WORKINGSET if necessary.
+ * responsible for setting BIO_WORKINGSET_FILE if necessary.
  */
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 {
@@ -1274,7 +1275,7 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));
 
 	/* don't account direct I/O as memory stall */
-	bio_clear_flag(bio, BIO_WORKINGSET);
+	bio_clear_flag(bio, BIO_WORKINGSET_FILE);
 	return bio->bi_vcnt ? 0 : ret;
 }
 EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
diff --git a/block/blk-core.c b/block/blk-core.c
index ddac62aebc55..9a955323734b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -918,7 +918,7 @@ void submit_bio(struct bio *bio)
 	 * part of overall IO time.
 	 */
 	if (unlikely(bio_op(bio) == REQ_OP_READ &&
-	    bio_flagged(bio, BIO_WORKINGSET))) {
+	    bio_flagged(bio, BIO_WORKINGSET_FILE))) {
 		unsigned long pflags;
 
 		psi_memstall_enter(&pflags);
diff --git a/fs/direct-io.c b/fs/direct-io.c
index aef06e607b40..7cdec50fb27b 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -420,7 +420,7 @@ static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio)
 	bio->bi_private = dio;
 
 	/* don't account direct I/O as memory stall */
-	bio_clear_flag(bio, BIO_WORKINGSET);
+	bio_clear_flag(bio, BIO_WORKINGSET_FILE);
 
 	spin_lock_irqsave(&dio->bio_lock, flags);
 	dio->refcount++;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 0c7a9a1f06c8..a1aaba4767e9 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -314,7 +314,7 @@ enum {
 	BIO_NO_PAGE_REF,	/* don't put release vec pages */
 	BIO_CLONED,		/* doesn't own data */
 	BIO_BOUNCED,		/* bio is a bounce bio */
-	BIO_WORKINGSET,		/* contains userspace workingset pages */
+	BIO_WORKINGSET_FILE,	/* contains userspace workingset file pages */
 	BIO_QUIET,		/* Make BIO Quiet */
 	BIO_CHAIN,		/* chained bio, ->bi_remaining in effect */
 	BIO_REFFED,		/* bio has elevated ->bi_cnt */