[0/3] mincore() and IOCB_NOWAIT adjustments

Message ID 20190130124420.1834-1-vbabka@suse.cz (mailing list archive)

Message

Vlastimil Babka Jan. 30, 2019, 12:44 p.m. UTC
I've collected the patches from the discussion for formal posting. The first
two should be settled already, third one is the possible improvement I've
mentioned earlier, where only in restricted case we resort to existence of page
table mapping (the original and later reverted approach from Linus) instead of
faking the result completely. Review and testing welcome.

The consensus seems to be going through -mm tree for 5.1, unless Linus wants
them already for 5.0.

Jiri Kosina (2):
  mm/mincore: make mincore() more conservative
  mm/filemap: initiate readahead even if IOCB_NOWAIT is set for the I/O

Vlastimil Babka (1):
  mm/mincore: provide mapped status when cached status is not allowed

 mm/filemap.c |  2 --
 mm/mincore.c | 54 ++++++++++++++++++++++++++++++++++++++++------------
 2 files changed, 42 insertions(+), 14 deletions(-)
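For readers who have not used the syscall under discussion, here is a minimal userspace sketch of how mincore() is typically called. It is not part of the series. The helper names count_resident_pages() and demo_anonymous_resident() are hypothetical and exist only for this illustration. The demo uses a private anonymous mapping, which is the case the series leaves unrestricted, so the caller still sees real residency there.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Count how many pages of [addr, addr + len) mincore() reports as
 * resident. addr must be page-aligned. Returns -1 on failure.
 */
long count_resident_pages(void *addr, size_t len)
{
	size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
	size_t pages = (len + page_size - 1) / page_size;
	unsigned char *vec = malloc(pages);
	long resident = 0;

	if (!vec)
		return -1;
	if (mincore(addr, len, vec) != 0) {
		free(vec);
		return -1;
	}
	for (size_t i = 0; i < pages; i++)
		resident += vec[i] & 1;	/* only the low bit is defined */
	free(vec);
	return resident;
}

/*
 * Hypothetical demo helper: map n anonymous pages, write to each one so
 * it is faulted in, and ask mincore() how many are resident.
 */
long demo_anonymous_resident(size_t n)
{
	size_t len = n * (size_t)sysconf(_SC_PAGESIZE);
	long resident;
	char *p;

	assert(n > 0);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	memset(p, 0xaa, len);	/* touch every page */
	resident = count_resident_pages(p, len);
	munmap(p, len);
	return resident;
}
```

For a file-backed mapping that the caller could not open for writing, the same call would be expected to report every page as resident once patch 1/3 is applied, regardless of the actual page cache state.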

Comments

Jiri Kosina March 6, 2019, 12:11 p.m. UTC | #1
On Wed, 30 Jan 2019, Vlastimil Babka wrote:

> I've collected the patches from the discussion for formal posting. The first
> two should be settled already, third one is the possible improvement I've
> mentioned earlier, where only in restricted case we resort to existence of page
> table mapping (the original and later reverted approach from Linus) instead of
> faking the result completely. Review and testing welcome.
> 
> The consensus seems to be going through -mm tree for 5.1, unless Linus wants
> them already for 5.0.
> 
> Jiri Kosina (2):
>   mm/mincore: make mincore() more conservative
>   mm/filemap: initiate readahead even if IOCB_NOWAIT is set for the I/O
> 
> Vlastimil Babka (1):
>   mm/mincore: provide mapped status when cached status is not allowed

Andrew,

could you please take at least the correct and straightforward fix for 
mincore() before we figure out how to deal with the slightly less 
practical RWF_NOWAIT? Thanks.

Andrew Morton March 6, 2019, 10:35 p.m. UTC | #2
On Wed, 6 Mar 2019 13:11:39 +0100 (CET) Jiri Kosina <jikos@kernel.org> wrote:

> On Wed, 30 Jan 2019, Vlastimil Babka wrote:
> 
> > I've collected the patches from the discussion for formal posting. The first
> > two should be settled already, third one is the possible improvement I've
> > mentioned earlier, where only in restricted case we resort to existence of page
> > table mapping (the original and later reverted approach from Linus) instead of
> > faking the result completely. Review and testing welcome.
> > 
> > The consensus seems to be going through -mm tree for 5.1, unless Linus wants
> > them already for 5.0.
> > 
> > Jiri Kosina (2):
> >   mm/mincore: make mincore() more conservative
> >   mm/filemap: initiate readahead even if IOCB_NOWAIT is set for the I/O
> > 
> > Vlastimil Babka (1):
> >   mm/mincore: provide mapped status when cached status is not allowed
> 
> Andrew,
> 
> could you please take at least the correct and straightforward fix for 
> mincore() before we figure out how to deal with the slightly less 
> practical RWF_NOWAIT? Thanks.

I assume we're talking about [1/3] and [2/3] from this thread?

Can we have a resend please?  Gather the various acks and revisions,
make changelog changes to address the review questions and comments?

Thanks.

Jiri Kosina March 6, 2019, 10:48 p.m. UTC | #3
On Wed, 6 Mar 2019, Andrew Morton wrote:

> > could you please take at least the correct and straightforward fix for 
> > mincore() before we figure out how to deal with the slightly less 
> > practical RWF_NOWAIT? Thanks.
> 
> I assume we're talking about [1/3] and [2/3] from this thread?
> 
> Can we have a resend please?  Gather the various acks and revisions,
> make changelog changes to address the review questions and comments?

1/3 is clearly the one to be merged. The version with all the acks 
gathered is in this thread, at

	https://lore.kernel.org/lkml/de52b3bd-4e39-c133-542a-0a9c5e357404@suse.cz/

Attaching the patch also at the end of this mail so that it could be 
easily picked up.

I am unfortunately not sure what changelog changes you are talking about, 
there were none requested during the review as far as I know.

2/3 is clearly postponed for now, it needs more thinking.

3/3 is actually waiting for your decision, see

	https://lore.kernel.org/lkml/20190212063643.GL15609@dhcp22.suse.cz/

The 1/3 patch to be merged in any case:


=== cut here ===

From: Jiri Kosina <jkosina@suse.cz>
Date: Wed, 16 Jan 2019 20:53:17 +0100
Subject: [PATCH v2] mm/mincore: make mincore() more conservative

The semantics of what mincore() considers to be resident is not completely
clear, but Linux has always (since 2.3.52, which is when mincore() was
initially done) treated it as "page is available in page cache".

That's potentially a problem, as that [in]directly exposes meta-information
about pagecache / memory mapping state even about memory not strictly belonging
to the process executing the syscall, opening possibilities for sidechannel
attacks.

Change the semantics of mincore() so that it only reveals pagecache information
for non-anonymous mappings that belong to files that the calling process could
(if it tried to) successfully open for writing.

[mhocko@suse.com: restructure can_do_mincore() conditions]
Originally-by: Linus Torvalds <torvalds@linux-foundation.org>
Originally-by: Dominique Martinet <asmadeus@codewreck.org>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Kevin Easton <kevin@guarana.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Cyril Hrubis <chrubis@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Daniel Gruss <daniel@gruss.cc>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Josh Snyder <joshs@netflix.com>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 mm/mincore.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/mm/mincore.c b/mm/mincore.c
index 218099b5ed31..b8842b849604 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -169,6 +169,16 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	return 0;
 }
 
+static inline bool can_do_mincore(struct vm_area_struct *vma)
+{
+	if (vma_is_anonymous(vma))
+		return true;
+	if (!vma->vm_file)
+		return false;
+	return inode_owner_or_capable(file_inode(vma->vm_file)) ||
+		inode_permission(file_inode(vma->vm_file), MAY_WRITE) == 0;
+}
+
 /*
  * Do a chunk of "sys_mincore()". We've already checked
  * all the arguments, we hold the mmap semaphore: we should
@@ -189,8 +199,13 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v
 	vma = find_vma(current->mm, addr);
 	if (!vma || addr < vma->vm_start)
 		return -ENOMEM;
-	mincore_walk.mm = vma->vm_mm;
 	end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
+	if (!can_do_mincore(vma)) {
+		unsigned long pages = (end - addr) >> PAGE_SHIFT;
+		memset(vec, 1, pages);
+		return pages;
+	}
+	mincore_walk.mm = vma->vm_mm;
 	err = walk_page_range(addr, end, &mincore_walk);
 	if (err < 0)
 		return err;
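The decision made by can_do_mincore() and the early return in do_mincore() can be modelled in plain userspace C. This is only a sketch for following the logic, not kernel code. struct mapping_facts and both model_* functions are hypothetical names, and each boolean stands in for the kernel predicate named in its comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the facts the kernel derives from a VMA. */
struct mapping_facts {
	bool anonymous;		/* vma_is_anonymous(vma) */
	bool has_file;		/* vma->vm_file != NULL */
	bool owner_or_capable;	/* inode_owner_or_capable(inode) */
	bool may_write;		/* inode_permission(inode, MAY_WRITE) == 0 */
};

/* Same order of checks as can_do_mincore() in the patch. */
bool model_can_do_mincore(const struct mapping_facts *m)
{
	if (m->anonymous)
		return true;
	if (!m->has_file)
		return false;
	return m->owner_or_capable || m->may_write;
}

/*
 * Model of the early return added to do_mincore(): when the caller may
 * not see the real state, every page is reported as resident.
 * Returns true if vec was filled with the faked result.
 */
bool model_fake_if_restricted(const struct mapping_facts *m,
			      unsigned char *vec, size_t pages)
{
	assert(vec != NULL);
	if (model_can_do_mincore(m))
		return false;	/* the real page table walk would run here */
	memset(vec, 1, pages);
	return true;
}
```

In this model, a mapping of a file that the caller neither owns nor can write to takes the faked path, while an anonymous mapping or a writable file falls through to the real walk.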

Andrew Morton March 6, 2019, 11:23 p.m. UTC | #4
On Wed, 6 Mar 2019 23:48:03 +0100 (CET) Jiri Kosina <jikos@kernel.org> wrote:

> 3/3 is actually waiting for your decision, see
> 
> 	https://lore.kernel.org/lkml/20190212063643.GL15609@dhcp22.suse.cz/

I pity anyone who tried to understand this code by reading this code. 
Can we please get some careful commentary in there explaining what is
going on, and why things are thus?

I guess the [3/3] change makes sense, although it's unclear whether
anyone really needs it?  5.0 was released with 574823bfab8 ("Change
mincore() to count "mapped" pages rather than "cached" pages") so we'll
have a release cycle to somewhat determine how much impact 574823bfab8
has on users.  How about I queue up [3/3] and we reevaluate its
desirability in a couple of months?

Dominique Martinet March 6, 2019, 11:32 p.m. UTC | #5
Andrew Morton wrote on Wed, Mar 06, 2019:
> On Wed, 6 Mar 2019 23:48:03 +0100 (CET) Jiri Kosina <jikos@kernel.org> wrote:
> 
> > 3/3 is actually waiting for your decision, see
> > 
> > 	https://lore.kernel.org/lkml/20190212063643.GL15609@dhcp22.suse.cz/
> 
> I pity anyone who tried to understand this code by reading this code. 
> Can we please get some careful commentary in there explaining what is
> going on, and why things are thus?
> 
> I guess the [3/3] change makes sense, although it's unclear whether
> anyone really needs it?  5.0 was released with 574823bfab8 ("Change
> mincore() to count "mapped" pages rather than "cached" pages") so we'll
> have a release cycle to somewhat determine how much impact 574823bfab8
> has on users.  How about I queue up [3/3] and we reevaluate its
> desirability in a couple of months?

FWIW,

574823bfab8 has been reverted in 30bac164aca750, included in 5.0-rc4, so
the controversial change has only been there from 5.0-rc1 to 5.0-rc3

Andrew Morton March 6, 2019, 11:38 p.m. UTC | #6
On Thu, 7 Mar 2019 00:32:09 +0100 Dominique Martinet <asmadeus@codewreck.org> wrote:

> Andrew Morton wrote on Wed, Mar 06, 2019:
> > On Wed, 6 Mar 2019 23:48:03 +0100 (CET) Jiri Kosina <jikos@kernel.org> wrote:
> > 
> > > 3/3 is actually waiting for your decision, see
> > > 
> > > 	https://lore.kernel.org/lkml/20190212063643.GL15609@dhcp22.suse.cz/
> > 
> > I pity anyone who tried to understand this code by reading this code. 
> > Can we please get some careful commentary in there explaining what is
> > going on, and why things are thus?
> > 
> > I guess the [3/3] change makes sense, although it's unclear whether
> > anyone really needs it?  5.0 was released with 574823bfab8 ("Change
> > mincore() to count "mapped" pages rather than "cached" pages") so we'll
> > have a release cycle to somewhat determine how much impact 574823bfab8
> > has on users.  How about I queue up [3/3] and we reevaluate its
> > desirability in a couple of months?
> 
> FWIW,
> 
> 574823bfab8 has been reverted in 30bac164aca750, included in 5.0-rc4, so
> the controversial change has only been there from 5.0-rc1 to 5.0-rc3

Ah, OK, thanks, I misread.

Linus, do you have thoughts on
http://lkml.kernel.org/r/20190130124420.1834-4-vbabka@suse.cz ?

Linus Torvalds March 9, 2019, 4:53 p.m. UTC | #7
On Wed, Mar 6, 2019 at 3:38 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> Linus, do you have thoughts on
> http://lkml.kernel.org/r/20190130124420.1834-4-vbabka@suse.cz ?

I think that's fine, and probably the right thing to do, but I also
suspect that nobody actually cares ;(

                 Linus