diff mbox

mm: get_user_pages(write,force) refuse to COW in shared areas

Message ID 20140424133055.GA13269@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Oleg Nesterov April 24, 2014, 1:30 p.m. UTC
Hi Hugh,

Sorry for late reply. First of all, to avoid the confusion, I think the
patch is fine.

When I saw this patch I decided that uprobes should be updated accordingly,
but I just realized that I do not understand what should I write in the
changelog.

On 04/04, Hugh Dickins wrote:
>
> +		if (gup_flags & FOLL_WRITE) {
> +			if (!(vm_flags & VM_WRITE)) {
> +				if (!(gup_flags & FOLL_FORCE))
> +					goto efault;
> +				/*
> +				 * We used to let the write,force case do COW
> +				 * in a VM_MAYWRITE VM_SHARED !VM_WRITE vma, so
> +				 * ptrace could set a breakpoint in a read-only
> +				 * mapping of an executable, without corrupting
> +				 * the file (yet only when that file had been
> +				 * opened for writing!).  Anon pages in shared
> +				 * mappings are surprising: now just reject it.
> +				 */
> +				if (!is_cow_mapping(vm_flags)) {
> +					WARN_ON_ONCE(vm_flags & VM_MAYWRITE);
> +					goto efault;
> +				}

OK. But could you please clarify "Anon pages in shared mappings are surprising" ?
I mean, does this only apply to "VM_MAYWRITE VM_SHARED !VM_WRITE vma" mentioned
above or this is bad even if a !FMODE_WRITE file was mmaped as MAP_SHARED ?

Yes, in this case this vma is not VM_SHARED and it is not VM_MAYWRITE, it is only
VM_MAYSHARE. This is in fact private mapping except mprotect(PROT_WRITE) will not
work.

But with or without this patch gup(FOLL_WRITE | FOLL_FORCE) won't work in this case,
(although perhaps it could ?), is_cow_mapping() == F because of !VM_MAYWRITE.

However, currently uprobes assumes that a cowed anon page is fine in this case, and
this differs from gup().

So, what do you think about the patch below? It is probably fine in any case,
but is there any "strong" reason to follow the gup's behaviour and forbid the
anon page in VM_MAYSHARE && !VM_MAYWRITE vma?

Oleg.


--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Hugh Dickins April 24, 2014, 11:14 p.m. UTC | #1
On Thu, 24 Apr 2014, Oleg Nesterov wrote:

> Hi Hugh,
> 
> Sorry for late reply. First of all, to avoid the confusion, I think the
> patch is fine.
> 
> When I saw this patch I decided that uprobes should be updated accordingly,
> but I just realized that I do not understand what should I write in the
> changelog.

Thanks a lot for considering similar issues in uprobes, Oleg: I merely
checked that its uses of get_user_pages() would not be problematic,
and didn't look around to rediscover the worrying mm business that
goes on down there in kernel/events.

> 
> On 04/04, Hugh Dickins wrote:
> >
> > +		if (gup_flags & FOLL_WRITE) {
> > +			if (!(vm_flags & VM_WRITE)) {
> > +				if (!(gup_flags & FOLL_FORCE))
> > +					goto efault;
> > +				/*
> > +				 * We used to let the write,force case do COW
> > +				 * in a VM_MAYWRITE VM_SHARED !VM_WRITE vma, so
> > +				 * ptrace could set a breakpoint in a read-only
> > +				 * mapping of an executable, without corrupting
> > +				 * the file (yet only when that file had been
> > +				 * opened for writing!).  Anon pages in shared
> > +				 * mappings are surprising: now just reject it.
> > +				 */
> > +				if (!is_cow_mapping(vm_flags)) {
> > +					WARN_ON_ONCE(vm_flags & VM_MAYWRITE);
> > +					goto efault;
> > +				}
> 
> OK. But could you please clarify "Anon pages in shared mappings are surprising" ?
> I mean, does this only apply to "VM_MAYWRITE VM_SHARED !VM_WRITE vma" mentioned
> above or this is bad even if a !FMODE_WRITE file was mmaped as MAP_SHARED ?

Good question. I simply didn't consider that - and (as you have realized)
didn't need to consider it, because I was just stopping the problematic
behaviour in gup(), and didn't need to consider whether other behaviour
prohibited by gup() was actually unproblematic.

> 
> Yes, in this case this vma is not VM_SHARED and it is not VM_MAYWRITE, it is only
> VM_MAYSHARE. This is in fact private mapping except mprotect(PROT_WRITE) will not
> work.
> 
> But with or without this patch gup(FOLL_WRITE | FOLL_FORCE) won't work in this case,
                     "this" meaning my patch rather than yours below
> (although perhaps it could ?), is_cow_mapping() == F because of !VM_MAYWRITE.
> 
> However, currently uprobes assumes that a cowed anon page is fine in this case, and
> this differs from gup().
> 
> So, what do you think about the patch below? It is probably fine in any case,
> but is there any "strong" reason to follow the gup's behaviour and forbid the
> anon page in VM_MAYSHARE && !VM_MAYWRITE vma?

I don't think there is a "strong" reason to forbid it.

The strongest reason is simply that it's much safer if uprobes follows
the same conventions as mm, and get_user_pages() happens to have
forbidden that all along.

The philosophical reason to forbid it is that the user mmapped with
MAP_SHARED, and it's merely a kernel-internal detail that we flip off
VM_SHARED and treat these read-only shared mappings very much like
private mappings.  The user asked for MAP_SHARED, and we prefer to
respect that by not letting private COWs creep in.

We could treat those mappings even more like private mappings, and
allow the COWs; but better to be strict about it, so long as doing
so doesn't give you regressions.

> 
> Oleg.
> 
> --- x/kernel/events/uprobes.c
> +++ x/kernel/events/uprobes.c
> @@ -127,12 +127,13 @@ struct xol_area {
>   */
>  static bool valid_vma(struct vm_area_struct *vma, bool is_register)
>  {
> -	vm_flags_t flags = VM_HUGETLB | VM_MAYEXEC | VM_SHARED;
> +	vm_flags_t flags = VM_HUGETLB | VM_MAYEXEC;

I think a one-line patch changing VM_SHARED to VM_MAYSHARE would do it,
wouldn't it?  And save you from having to export is_cow_mapping()
from mm/memory.c.  (I used is_cow_mapping() because I had to make the
test more complex anyway, just to exclude the case which had been
oddly handled before.)

Hugh

>  
>  	if (is_register)
>  		flags |= VM_WRITE;
>  
> -	return vma->vm_file && (vma->vm_flags & flags) == VM_MAYEXEC;
> +	return 	vma->vm_file && is_cow_mapping(vma->vm_flags) &&
> +		(vma->vm_flags & flags) == VM_MAYEXEC;
>  }
>  
>  static unsigned long offset_to_vaddr(struct vm_area_struct *vma, loff_t offset)
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oleg Nesterov April 25, 2014, 7:09 p.m. UTC | #2
On 04/24, Hugh Dickins wrote:
>
> On Thu, 24 Apr 2014, Oleg Nesterov wrote:
>
> > So, what do you think about the patch below? It is probably fine in any case,
> > but is there any "strong" reason to follow the gup's behaviour and forbid the
> > anon page in VM_MAYSHARE && !VM_MAYWRITE vma?
>
> I don't think there is a "strong" reason to forbid it.
>
> The strongest reason is simply that it's much safer if uprobes follows
> the same conventions as mm, and get_user_pages() happens to have
> forbidden that all along.
>
> The philosophical reason to forbid it is that the user mmapped with
> MAP_SHARED, and it's merely a kernel-internal detail that we flip off
> VM_SHARED and treat these read-only shared mappings very much like
> private mappings.  The user asked for MAP_SHARED, and we prefer to
> respect that by not letting private COWs creep in.
>
> We could treat those mappings even more like private mappings, and
> allow the COWs; but better to be strict about it, so long as doing
> so doesn't give you regressions.

Great, thanks a lot! I was worried I missed something subtle.

And I forgot to mention, there is another reason why I would like to
change uprobes to follow the same convention. I still think it would
be better to kill __replace_page() and use gup(FOLL_WRITE | FORCE)
in uprobe_write_opcode().


> > --- x/kernel/events/uprobes.c
> > +++ x/kernel/events/uprobes.c
> > @@ -127,12 +127,13 @@ struct xol_area {
> >   */
> >  static bool valid_vma(struct vm_area_struct *vma, bool is_register)
> >  {
> > -	vm_flags_t flags = VM_HUGETLB | VM_MAYEXEC | VM_SHARED;
> > +	vm_flags_t flags = VM_HUGETLB | VM_MAYEXEC;
>
> I think a one-line patch changing VM_SHARED to VM_MAYSHARE would do it,
> wouldn't it?  And save you from having to export is_cow_mapping()
> from mm/memory.c.  (I used is_cow_mapping() because I had to make the
> test more complex anyway, just to exclude the case which had been
> oddly handled before.)

Indeed, thanks!

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hugh Dickins April 25, 2014, 8:07 p.m. UTC | #3
On Fri, 25 Apr 2014, Oleg Nesterov wrote:
> 
> And I forgot to mention, there is another reason why I would like to
> change uprobes to follow the same convention. I still think it would
> be better to kill __replace_page() and use gup(FOLL_WRITE | FORCE)
> in uprobe_write_opcode().

Oh, please please do!  I assumed there was a good atomicity reason
for doing it that way, but if you can do it with gup() please do so.

I went off into a little rant about __replace_page() in my reply to you;
but then felt that you had approached with an honest enquiry, and didn't
deserve a rant in response, so I deleted it.

And, of course, I'm conscious that I have from start to finish withheld
my involvement from uprobes, and refused to review that __replace_page()
(beyond insisting that it not be put in a common place for sharing with
KSM, because I just couldn't guarantee it for uprobes).  I'm afraid that
it's much like HWPoison to me, a complexity (nastiness?) way beyond what
I have time to support myself.

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- x/kernel/events/uprobes.c
+++ x/kernel/events/uprobes.c
@@ -127,12 +127,13 @@  struct xol_area {
  */
 static bool valid_vma(struct vm_area_struct *vma, bool is_register)
 {
-	vm_flags_t flags = VM_HUGETLB | VM_MAYEXEC | VM_SHARED;
+	vm_flags_t flags = VM_HUGETLB | VM_MAYEXEC;
 
 	if (is_register)
 		flags |= VM_WRITE;
 
-	return vma->vm_file && (vma->vm_flags & flags) == VM_MAYEXEC;
+	return 	vma->vm_file && is_cow_mapping(vma->vm_flags) &&
+		(vma->vm_flags & flags) == VM_MAYEXEC;
 }
 
 static unsigned long offset_to_vaddr(struct vm_area_struct *vma, loff_t offset)