[v34,10/24] mm: Add vm_ops->mprotect()

Message ID	20200707030204.126021-11-jarkko.sakkinen@linux.intel.com (mailing list archive)
State	Under Review
Headers	show Return-Path: <SRS0=kNj5=AS=vger.kernel.org=linux-sgx-owner@kernel.org> IronPort-SDR: 6L8KH1UkQ1aA1ULBmsYPYJTTNWHqZXHMN+turQbv0LYt23owDIIFbUrsJZQJuQlyv4uoQ/XPEz MOpLhZQomPZg== IronPort-SDR: sXpsBG7BvDvHFtYYp4eAWB3px0aBqlh0kR6o9U/nj0eyIRXJbD9BRJfLTh46JoaNuPRfkfIGcW SqO8x1ZhPc8w== From: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> To: x86@kernel.org, linux-sgx@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Sean Christopherson <sean.j.christopherson@intel.com>, linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>, Matthew Wilcox <willy@infradead.org>, Jethro Beekman <jethro@fortanix.com>, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>, andriy.shevchenko@linux.intel.com, asapek@google.com, bp@alien8.de, cedric.xing@intel.com, chenalexchen@google.com, conradparker@google.com, cyhanish@google.com, dave.hansen@intel.com, haitao.huang@intel.com, josh@joshtriplett.org, kai.huang@intel.com, kai.svahn@intel.com, kmoy@google.com, ludloff@google.com, luto@kernel.org, nhorman@redhat.com, npmccallum@redhat.com, puiterwijk@redhat.com, rientjes@google.com, tglx@linutronix.de, yaozhangx@google.com Subject: [PATCH v34 10/24] mm: Add vm_ops->mprotect() Date: Tue, 7 Jul 2020 06:01:50 +0300 Message-Id: <20200707030204.126021-11-jarkko.sakkinen@linux.intel.com> In-Reply-To: <20200707030204.126021-1-jarkko.sakkinen@linux.intel.com> References: <20200707030204.126021-1-jarkko.sakkinen@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-sgx-owner@vger.kernel.org Precedence: bulk
Series	Intel SGX foundations \| expand [v34,00/24] Intel SGX foundations [v34,01/24] x86/cpufeatures: x86/msr: Add Intel SGX hardware bits [v34,02/24] x86/cpufeatures: x86/msr: Add Intel SGX Launch Control hardware bits [v34,03/24] x86/mm: x86/sgx: Signal SIGSEGV with PF_SGX [v34,04/24] x86/sgx: Add SGX microarchitectural data structures [v34,05/24] x86/sgx: Add wrappers for ENCLS leaf functions [v34,06/24] x86/cpu/intel: Detect SGX support [v34,07/24] x86/cpu/intel: Add nosgx kernel parameter [v34,08/24] x86/sgx: Initialize metadata for Enclave Page Cache (EPC) sections [v34,09/24] x86/sgx: Add __sgx_alloc_epc_page() and sgx_free_epc_page() [v34,10/24] mm: Add vm_ops->mprotect() [v34,11/24] x86/sgx: Add SGX enclave driver [v34,12/24] x86/sgx: Add SGX_IOC_ENCLAVE_CREATE [v34,13/24] x86/sgx: Add SGX_IOC_ENCLAVE_ADD_PAGES [v34,14/24] x86/sgx: Add SGX_IOC_ENCLAVE_INIT [v34,15/24] x86/sgx: Allow a limited use of ATTRIBUTE.PROVISIONKEY for attestation [v34,16/24] x86/sgx: Add a page reclaimer [v34,17/24] x86/sgx: ptrace() support for the SGX driver [v34,18/24] x86/vdso: Add support for exception fixup in vDSO functions [v34,19/24] x86/fault: Add helper function to sanitize error code [v34,20/24] x86/traps: Attempt to fixup exceptions in vDSO before signaling [v34,21/24] x86/vdso: Implement a vDSO for Intel SGX enclave call [v34,22/24] selftests/x86: Add a selftest for SGX [v34,23/24] docs: x86/sgx: Document SGX micro architecture and kernel internals [v34,24/24] x86/sgx: Update MAINTAINERS

Jarkko Sakkinen July 7, 2020, 3:01 a.m. UTC

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add vm_ops()->mprotect() for additional constraints for a VMA.

Intel Software Guard eXtensions (SGX) will use this callback to add two
constraints:

1. Verify that the address range does not have holes: each page address
   must be filled with an enclave page.
2. Verify that VMA permissions won't surpass the permissions of any enclave
   page within the address range. Enclave cryptographically sealed
   permissions for each page address that set the upper limit for possible
   VMA permissions. Not respecting this can cause #GP's to be emitted.

Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
 include/linux/mm.h |  2 ++
 mm/mprotect.c      | 13 ++++++++++---
 2 files changed, 12 insertions(+), 3 deletions(-)

Matthew Wilcox July 7, 2020, 3:14 a.m. UTC | #1

On Tue, Jul 07, 2020 at 06:01:50AM +0300, Jarkko Sakkinen wrote:
> +++ b/mm/mprotect.c
> @@ -603,13 +603,20 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
>  			goto out;
>  		}
>  
> +		tmp = vma->vm_end;
> +		if (tmp > end)
> +			tmp = end;
> +
>  		error = security_file_mprotect(vma, reqprot, prot);
>  		if (error)
>  			goto out;
>  
> -		tmp = vma->vm_end;
> -		if (tmp > end)
> -			tmp = end;

You don't need to move this any more, right?

> +		if (vma->vm_ops && vma->vm_ops->mprotect) {
> +			error = vma->vm_ops->mprotect(vma, nstart, tmp, prot);
> +			if (error)
> +				goto out;
> +		}
> +
>  		error = mprotect_fixup(vma, &prev, nstart, tmp, newflags);
>  		if (error)
>  			goto out;
> -- 
> 2.25.1
>

Sean Christopherson July 7, 2020, 3:22 a.m. UTC | #2

On Tue, Jul 07, 2020 at 04:14:24AM +0100, Matthew Wilcox wrote:
> On Tue, Jul 07, 2020 at 06:01:50AM +0300, Jarkko Sakkinen wrote:
> > +++ b/mm/mprotect.c
> > @@ -603,13 +603,20 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
> >  			goto out;
> >  		}
> >  
> > +		tmp = vma->vm_end;
> > +		if (tmp > end)
> > +			tmp = end;
> > +
> >  		error = security_file_mprotect(vma, reqprot, prot);
> >  		if (error)
> >  			goto out;
> >  
> > -		tmp = vma->vm_end;
> > -		if (tmp > end)
> > -			tmp = end;
> 
> You don't need to move this any more, right?

Ya, I was typing up a longer version, you beat me to the punch...

The calculation of 'tmp' doesn't need to be moved.  The previous incarnation
only moved it so that that 'tmp would be available for .may_mprotect().

The only reason  .may_mprotect() was placed before security_file_mprotect()
was to avoid triggering LSM errors that would have been blocked by
.may_mprotect(), but that justification is no longer relevant as the
proposed SGX LSM hooks obviously didn't worm their way into this series.

> > +		if (vma->vm_ops && vma->vm_ops->mprotect) {
> > +			error = vma->vm_ops->mprotect(vma, nstart, tmp, prot);
> > +			if (error)
> > +				goto out;
> > +		}

Based on "... and then the vma owner can do whatever it needs to before
calling mprotect_fixup(), which is already not static", my interpretation
is that Matthew's intent was to do:

		if (vma->vm_ops && vma->vm_ops->mprotect)
			error =  = vma->vm_ops->mprotect(vma, nstart, tmp, prot);
		else
			error = mprotect_fixup(vma, &prev, nstart, tmp, newflags);
		if (error)
			goto out;

i.e. make .mprotect() a full replacement as opposed to a prereq hook.

> > +
> >  		error = mprotect_fixup(vma, &prev, nstart, tmp, newflags);
> >  		if (error)
> >  			goto out;
> > -- 
> > 2.25.1
> >

Matthew Wilcox July 7, 2020, 3:24 a.m. UTC | #3

On Mon, Jul 06, 2020 at 08:22:54PM -0700, Sean Christopherson wrote:
> On Tue, Jul 07, 2020 at 04:14:24AM +0100, Matthew Wilcox wrote:
> > > +		if (vma->vm_ops && vma->vm_ops->mprotect) {
> > > +			error = vma->vm_ops->mprotect(vma, nstart, tmp, prot);
> > > +			if (error)
> > > +				goto out;
> > > +		}
> 
> Based on "... and then the vma owner can do whatever it needs to before
> calling mprotect_fixup(), which is already not static", my interpretation
> is that Matthew's intent was to do:
> 
> 		if (vma->vm_ops && vma->vm_ops->mprotect)
> 			error =  = vma->vm_ops->mprotect(vma, nstart, tmp, prot);
> 		else
> 			error = mprotect_fixup(vma, &prev, nstart, tmp, newflags);
> 		if (error)
> 			goto out;
> 
> i.e. make .mprotect() a full replacement as opposed to a prereq hook.

Yes, it was.  I was just looking at the next patch to be sure this was
how I'd been misunderstood.

Jarkko Sakkinen July 7, 2020, 4:01 a.m. UTC | #4

On Tue, Jul 07, 2020 at 04:24:08AM +0100, Matthew Wilcox wrote:
> On Mon, Jul 06, 2020 at 08:22:54PM -0700, Sean Christopherson wrote:
> > On Tue, Jul 07, 2020 at 04:14:24AM +0100, Matthew Wilcox wrote:
> > > > +		if (vma->vm_ops && vma->vm_ops->mprotect) {
> > > > +			error = vma->vm_ops->mprotect(vma, nstart, tmp, prot);
> > > > +			if (error)
> > > > +				goto out;
> > > > +		}
> > 
> > Based on "... and then the vma owner can do whatever it needs to before
> > calling mprotect_fixup(), which is already not static", my interpretation
> > is that Matthew's intent was to do:
> > 
> > 		if (vma->vm_ops && vma->vm_ops->mprotect)
> > 			error =  = vma->vm_ops->mprotect(vma, nstart, tmp, prot);
> > 		else
> > 			error = mprotect_fixup(vma, &prev, nstart, tmp, newflags);
> > 		if (error)
> > 			goto out;
> > 
> > i.e. make .mprotect() a full replacement as opposed to a prereq hook.
> 
> Yes, it was.  I was just looking at the next patch to be sure this was
> how I'd been misunderstood.

I'm don't get this part. If mprotect_fixup is called in the tail of the
callback, why it has to be called inside the callback and not be called
after the callback?

The reason I only part did what you requested was to do only the part of
the change that I get. Not to oppose it.

/Jarkko

Jarkko Sakkinen July 7, 2020, 4:03 a.m. UTC | #5

On Tue, Jul 07, 2020 at 04:14:24AM +0100, Matthew Wilcox wrote:
> On Tue, Jul 07, 2020 at 06:01:50AM +0300, Jarkko Sakkinen wrote:
> > +++ b/mm/mprotect.c
> > @@ -603,13 +603,20 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
> >  			goto out;
> >  		}
> >  
> > +		tmp = vma->vm_end;
> > +		if (tmp > end)
> > +			tmp = end;
> > +
> >  		error = security_file_mprotect(vma, reqprot, prot);
> >  		if (error)
> >  			goto out;
> >  
> > -		tmp = vma->vm_end;
> > -		if (tmp > end)
> > -			tmp = end;
> 
> You don't need to move this any more, right?

My bad.

/Jarkko

Matthew Wilcox July 7, 2020, 4:10 a.m. UTC | #6

On Tue, Jul 07, 2020 at 07:01:51AM +0300, Jarkko Sakkinen wrote:
> On Tue, Jul 07, 2020 at 04:24:08AM +0100, Matthew Wilcox wrote:
> > On Mon, Jul 06, 2020 at 08:22:54PM -0700, Sean Christopherson wrote:
> > > On Tue, Jul 07, 2020 at 04:14:24AM +0100, Matthew Wilcox wrote:
> > > > > +		if (vma->vm_ops && vma->vm_ops->mprotect) {
> > > > > +			error = vma->vm_ops->mprotect(vma, nstart, tmp, prot);
> > > > > +			if (error)
> > > > > +				goto out;
> > > > > +		}
> > > 
> > > Based on "... and then the vma owner can do whatever it needs to before
> > > calling mprotect_fixup(), which is already not static", my interpretation
> > > is that Matthew's intent was to do:
> > > 
> > > 		if (vma->vm_ops && vma->vm_ops->mprotect)
> > > 			error =  = vma->vm_ops->mprotect(vma, nstart, tmp, prot);
> > > 		else
> > > 			error = mprotect_fixup(vma, &prev, nstart, tmp, newflags);
> > > 		if (error)
> > > 			goto out;
> > > 
> > > i.e. make .mprotect() a full replacement as opposed to a prereq hook.
> > 
> > Yes, it was.  I was just looking at the next patch to be sure this was
> > how I'd been misunderstood.
> 
> I'm don't get this part. If mprotect_fixup is called in the tail of the
> callback, why it has to be called inside the callback and not be called
> after the callback?

Because that's how every other VM operation works.  Look at your
implementation of get_unmapped_area() for example.

Jarkko Sakkinen July 8, 2020, 2:33 p.m. UTC | #7

On Tue, Jul 07, 2020 at 05:10:46AM +0100, Matthew Wilcox wrote:
> On Tue, Jul 07, 2020 at 07:01:51AM +0300, Jarkko Sakkinen wrote:
> > On Tue, Jul 07, 2020 at 04:24:08AM +0100, Matthew Wilcox wrote:
> > > On Mon, Jul 06, 2020 at 08:22:54PM -0700, Sean Christopherson wrote:
> > > > On Tue, Jul 07, 2020 at 04:14:24AM +0100, Matthew Wilcox wrote:
> > > > > > +		if (vma->vm_ops && vma->vm_ops->mprotect) {
> > > > > > +			error = vma->vm_ops->mprotect(vma, nstart, tmp, prot);
> > > > > > +			if (error)
> > > > > > +				goto out;
> > > > > > +		}
> > > > 
> > > > Based on "... and then the vma owner can do whatever it needs to before
> > > > calling mprotect_fixup(), which is already not static", my interpretation
> > > > is that Matthew's intent was to do:
> > > > 
> > > > 		if (vma->vm_ops && vma->vm_ops->mprotect)
> > > > 			error =  = vma->vm_ops->mprotect(vma, nstart, tmp, prot);
> > > > 		else
> > > > 			error = mprotect_fixup(vma, &prev, nstart, tmp, newflags);
> > > > 		if (error)
> > > > 			goto out;
> > > > 
> > > > i.e. make .mprotect() a full replacement as opposed to a prereq hook.
> > > 
> > > Yes, it was.  I was just looking at the next patch to be sure this was
> > > how I'd been misunderstood.
> > 
> > I'm don't get this part. If mprotect_fixup is called in the tail of the
> > callback, why it has to be called inside the callback and not be called
> > after the callback?
> 
> Because that's how every other VM operation works.  Look at your
> implementation of get_unmapped_area() for example.

I get the point but I don't think that your proposal could work given
that mprotect-callback takes neither 'prev' nor 'newflags' as its
parameters. The current callback has no means to call mprotect_fixup()
properly.

It would have to be extended

	int (*mprotect)(struct vm_area_struct *vma,
			struct vm_area_struct **pprev, unsigned long start,
			unsigned long end, unsigned long prot,
			unsigned long newflags);

Is this what you want?

/Jarkko

Matthew Wilcox July 8, 2020, 2:37 p.m. UTC | #8

On Wed, Jul 08, 2020 at 05:33:20PM +0300, Jarkko Sakkinen wrote:
> I get the point but I don't think that your proposal could work given
> that mprotect-callback takes neither 'prev' nor 'newflags' as its
> parameters. The current callback has no means to call mprotect_fixup()
> properly.
> 
> It would have to be extended
> 
> 	int (*mprotect)(struct vm_area_struct *vma,
> 			struct vm_area_struct **pprev, unsigned long start,
> 			unsigned long end, unsigned long prot,
> 			unsigned long newflags);
> 
> Is this what you want?

https://lore.kernel.org/linux-mm/20200625173050.GF7703@casper.infradead.org/

Jarkko Sakkinen July 8, 2020, 4:10 p.m. UTC | #9

On Wed, Jul 08, 2020 at 03:37:08PM +0100, Matthew Wilcox wrote:
> On Wed, Jul 08, 2020 at 05:33:20PM +0300, Jarkko Sakkinen wrote:
> > I get the point but I don't think that your proposal could work given
> > that mprotect-callback takes neither 'prev' nor 'newflags' as its
> > parameters. The current callback has no means to call mprotect_fixup()
> > properly.
> > 
> > It would have to be extended
> > 
> > 	int (*mprotect)(struct vm_area_struct *vma,
> > 			struct vm_area_struct **pprev, unsigned long start,
> > 			unsigned long end, unsigned long prot,
> > 			unsigned long newflags);
> > 
> > Is this what you want?
> 
> https://lore.kernel.org/linux-mm/20200625173050.GF7703@casper.infradead.org/

Ugh, it's there as it should be. I'm sorry - I just misread the code.

I think that should work, and we do not have to do extra conversion
inside the callback.

There is still one thing that I'm wondering. 'page->vm_max_prot_bits'
contains some an union of subset of {VM_READ, VM_WRITE, VM_EXEC}, and
'newflags' can contain other bits set too.

The old implementation of sgx_vma_mprotect() is like this:

static int sgx_vma_mprotect(struct vm_area_struct *vma, unsigned long start,
			    unsigned long end, unsigned long prot)
{
	return sgx_encl_may_map(vma->vm_private_data, start, end,
				calc_vm_prot_bits(prot, 0));
}

The new one should be probably the implemented along the lines of

static int sgx_vma_mprotect(struct vm_area_struct *vma,
			    struct vm_area_struct **pprev, unsigned long start,
			    unsigned long end, unsigned long newflags)
{
	unsigned long masked_newflags = newflags &
					(VM_READ | VM_WRITE | VM_EXEC);
	int ret;

	ret = sgx_encl_may_map(vma->vm_private_data, start, end,
				   masked_newflags);
	if (ret)
		return ret;

	return mprotect_fixup(vma, pprev, start, end, newflags);
}

Alternatively the filtering can be inside sgx_encl_may_map(). Perhaps
that is a better place for it. This was just easier function to
represent the idea.

/Jarkko

Jarkko Sakkinen July 8, 2020, 10:56 p.m. UTC | #10

On Wed, Jul 08, 2020 at 07:10:27PM +0300, Jarkko Sakkinen wrote:
> On Wed, Jul 08, 2020 at 03:37:08PM +0100, Matthew Wilcox wrote:
> > On Wed, Jul 08, 2020 at 05:33:20PM +0300, Jarkko Sakkinen wrote:
> > > I get the point but I don't think that your proposal could work given
> > > that mprotect-callback takes neither 'prev' nor 'newflags' as its
> > > parameters. The current callback has no means to call mprotect_fixup()
> > > properly.
> > > 
> > > It would have to be extended
> > > 
> > > 	int (*mprotect)(struct vm_area_struct *vma,
> > > 			struct vm_area_struct **pprev, unsigned long start,
> > > 			unsigned long end, unsigned long prot,
> > > 			unsigned long newflags);
> > > 
> > > Is this what you want?
> > 
> > https://lore.kernel.org/linux-mm/20200625173050.GF7703@casper.infradead.org/
> 
> Ugh, it's there as it should be. I'm sorry - I just misread the code.
> 
> I think that should work, and we do not have to do extra conversion
> inside the callback.
> 
> There is still one thing that I'm wondering. 'page->vm_max_prot_bits'
> contains some an union of subset of {VM_READ, VM_WRITE, VM_EXEC}, and
> 'newflags' can contain other bits set too.
> 
> The old implementation of sgx_vma_mprotect() is like this:
> 
> static int sgx_vma_mprotect(struct vm_area_struct *vma, unsigned long start,
> 			    unsigned long end, unsigned long prot)
> {
> 	return sgx_encl_may_map(vma->vm_private_data, start, end,
> 				calc_vm_prot_bits(prot, 0));
> }
> 
> The new one should be probably the implemented along the lines of
> 
> static int sgx_vma_mprotect(struct vm_area_struct *vma,
> 			    struct vm_area_struct **pprev, unsigned long start,
> 			    unsigned long end, unsigned long newflags)
> {
> 	unsigned long masked_newflags = newflags &
> 					(VM_READ | VM_WRITE | VM_EXEC);
> 	int ret;
> 
> 	ret = sgx_encl_may_map(vma->vm_private_data, start, end,
> 				   masked_newflags);
> 	if (ret)
> 		return ret;
> 
> 	return mprotect_fixup(vma, pprev, start, end, newflags);
> }
> 
> Alternatively the filtering can be inside sgx_encl_may_map(). Perhaps
> that is a better place for it. This was just easier function to
> represent the idea.

Turned out that mmap() handler was already masking with RWX. So I
removed masking from there and do it as the first step in
sgx_encl_may_map(), which is called by the both handlers:

int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
		     unsigned long end, unsigned long vm_flags)
{
	unsigned long vm_prot_bits = vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
	unsigned long idx, idx_start, idx_end;
	struct sgx_encl_page *page;

Also renamed the last function parameter from vm_flags to vm_port_bits.
Kind of makes the flow more understandable (i.e. vm_prot_bits is purely
internal representation, not something caller needs to be concerned of).

/Jarkko

[v34,10/24] mm: Add vm_ops->mprotect()

Commit Message

Comments

Patch