diff mbox series

[4/7] mm: Introduce verify_page_range()

Message ID 20210412080611.769864829@infradead.org (mailing list archive)
State New
Headers show
Series mm: Unexport apply_to_page_range() | expand

Commit Message

Peter Zijlstra April 12, 2021, 8 a.m. UTC
Introduce and EXPORT a read-only counterpart to apply_to_page_range().

It only exposes the PTE value, not a pointer to the pagetables itself
and is thus quite a bit safer to export. A number of
apply_to_page_range() users can be converted to this primitive.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/mm.h |    4 ++++
 mm/memory.c        |   24 ++++++++++++++++++++++++
 2 files changed, 28 insertions(+)

Comments

Christoph Hellwig April 12, 2021, 8:28 a.m. UTC | #1
On Mon, Apr 12, 2021 at 10:00:16AM +0200, Peter Zijlstra wrote:
> +extern int verify_page_range(struct mm_struct *mm,

No need for the extern here.

> +int verify_page_range(struct mm_struct *mm,
> +		      unsigned long addr, unsigned long size,
> +		      int (*fn)(pte_t pte, unsigned long addr, void *data),
> +		      void *data)

A kerneldoc comment would be nice for this function.

Otherwise this looks fine.
Peter Zijlstra April 12, 2021, 9:17 a.m. UTC | #2
On Mon, Apr 12, 2021 at 10:28:05AM +0200, Christoph Hellwig wrote:
> On Mon, Apr 12, 2021 at 10:00:16AM +0200, Peter Zijlstra wrote:
> > +extern int verify_page_range(struct mm_struct *mm,
> 
> No need for the extern here.

It's consistent with the rest of the functions there. Also, I personally
like that extern.

> > +int verify_page_range(struct mm_struct *mm,
> > +		      unsigned long addr, unsigned long size,
> > +		      int (*fn)(pte_t pte, unsigned long addr, void *data),
> > +		      void *data)
> 
> A kerneldoc comment would be nice for this function.
> 
> Otherwise this looks fine.

Something like so?

/**
 * verify_page_range() - Scan (and fill) a range of virtual memory and validate PTEs
 * @mm: mm identifying the virtual memory map
 * @addr: starting virtual address of the range
 * @size: size of the range
 * @fn: function that verifies the PTEs
 * @data: opaque data passed to @fn
 *
 * Scan a region of virtual memory, filling in page tables as necessary and
 * calling a provided function on each leaf, providing a copy of the
 * page-table-entry.
 *
 * Similar apply_to_page_range(), but does not provide direct access to the
 * page-tables.
 *
 * NOTE! this function does not work correctly vs large pages.
 *
 * Return: the first !0 return of the provided function, or 0 on completion.
 */
int verify_page_range(struct mm_struct *mm,
		      unsigned long addr, unsigned long size,
		      int (*fn)(pte_t pte, unsigned long addr, void *data),
		      void *data)
Kees Cook April 12, 2021, 8:05 p.m. UTC | #3
On Mon, Apr 12, 2021 at 10:00:16AM +0200, Peter Zijlstra wrote:
> +struct vpr_data {
> +	int (*fn)(pte_t pte, unsigned long addr, void *data);
> +	void *data;
> +};

Eeerg. This is likely to become an attack target itself. Stored function
pointer with stored (3rd) argument.

This doesn't seem needed: only DRM uses it, and that's for error
reporting. I'd rather plumb back errors in a way to not have to add
another place in the kernel where we do func+arg stored calling.
Peter Zijlstra April 13, 2021, 6:54 a.m. UTC | #4
On Mon, Apr 12, 2021 at 01:05:09PM -0700, Kees Cook wrote:
> On Mon, Apr 12, 2021 at 10:00:16AM +0200, Peter Zijlstra wrote:
> > +struct vpr_data {
> > +	int (*fn)(pte_t pte, unsigned long addr, void *data);
> > +	void *data;
> > +};
> 
> Eeerg. This is likely to become an attack target itself. Stored function
> pointer with stored (3rd) argument.

You got some further reading on that? How exactly are those exploited?
Peter Zijlstra April 13, 2021, 7:36 a.m. UTC | #5
On Mon, Apr 12, 2021 at 01:05:09PM -0700, Kees Cook wrote:
> On Mon, Apr 12, 2021 at 10:00:16AM +0200, Peter Zijlstra wrote:
> > +struct vpr_data {
> > +	int (*fn)(pte_t pte, unsigned long addr, void *data);
> > +	void *data;
> > +};
> 
> Eeerg. This is likely to become an attack target itself. Stored function
> pointer with stored (3rd) argument.
> 
> This doesn't seem needed: only DRM uses it, and that's for error
> reporting. I'd rather plumb back errors in a way to not have to add
> another place in the kernel where we do func+arg stored calling.

Is this any better? It does have the stored pointer, but not a stored
argument, assuming you don't count returns as arguments I suppose.

The alternative is refactoring apply_to_page_range() :-/

---

struct vpr_data {
	bool (*fn)(pte_t pte, unsigned long addr);
	unsigned long addr;
};

static int vpr_fn(pte_t *pte, unsigned long addr, void *data)
{
	struct vpr_data *vpr = data;
	if (!vpr->fn(*pte, addr)) {
		vpr->addr = addr;
		return -EINVAL;
	}
	return 0;
}

/**
 * verify_page_range() - Scan (and fill) a range of virtual memory and validate PTEs
 * @mm: mm identifying the virtual memory map
 * @addr: starting virtual address of the range
 * @size: size of the range
 * @fn: function that verifies the PTEs
 *
 * Scan a region of virtual memory, filling in page tables as necessary and
 * calling a provided function on each leaf, providing a copy of the
 * page-table-entry.
 *
 * Similar apply_to_page_range(), but does not provide direct access to the
 * page-tables.
 *
 * NOTE! this function does not work correctly vs large pages.
 *
 * Return: the address that failed verification or 0 on success.
 */
unsigned long verify_page_range(struct mm_struct *mm,
				unsigned long addr, unsigned long size,
				bool (*fn)(pte_t pte, unsigned long addr))
{
	struct vpr_data vpr = {
		.fn = fn,
		.addr = 0,
	};
	apply_to_page_range(mm, addr, size, vpr_fn, &vpr);
	return vpr.addr;
}
EXPORT_SYMBOL_GPL(verify_page_range);
Kees Cook April 14, 2021, 3:01 a.m. UTC | #6
On Tue, Apr 13, 2021 at 09:36:32AM +0200, Peter Zijlstra wrote:
> On Mon, Apr 12, 2021 at 01:05:09PM -0700, Kees Cook wrote:
> > On Mon, Apr 12, 2021 at 10:00:16AM +0200, Peter Zijlstra wrote:
> > > +struct vpr_data {
> > > +	int (*fn)(pte_t pte, unsigned long addr, void *data);
> > > +	void *data;
> > > +};
> > 
> > Eeerg. This is likely to become an attack target itself. Stored function
> > pointer with stored (3rd) argument.
> > 
> > This doesn't seem needed: only DRM uses it, and that's for error
> > reporting. I'd rather plumb back errors in a way to not have to add
> > another place in the kernel where we do func+arg stored calling.
> 
> Is this any better? It does have the stored pointer, but not a stored
> argument, assuming you don't count returns as arguments I suppose.

It's better in the sense that it's not the func/arg pair that really
bugs me, yes. :)

> The alternative is refactoring apply_to_page_range() :-/

Yeah, I'm looking now, I see what you mean.

> ---
> 
> struct vpr_data {
> 	bool (*fn)(pte_t pte, unsigned long addr);
> 	unsigned long addr;
> };
> 
> static int vpr_fn(pte_t *pte, unsigned long addr, void *data)
> {
> 	struct vpr_data *vpr = data;
> 	if (!vpr->fn(*pte, addr)) {
> 		vpr->addr = addr;
> 		return -EINVAL;
> 	}
> 	return 0;
> }

My point about passing "addr" was that nothing in the callback actually
needs it -- the top level can just as easily report the error. And that
the helper is always vpr_fn(), so it doesn't need to be passed either.

So the addr can just be encoded in "int", and no structure is needed at:

typedef bool (*vpr_fn_t)(pte_t pte);

static int vpr_fn(pte_t *pte, unsigned long addr, void *data)
{
	vpr_fn_t callback = data;

	if (!callback(*pte))
		return addr >> PAGE_SIZE;
	return 0;
}

unsigned long verify_page_range(struct mm_struct *mm,
				unsigned long addr, unsigned long size,
				vpr_fn_t callback)
{
	return apply_to_page_range(mm, addr, size, vpr_fn, callback) << PAGE_SIZE;
}

But maybe I'm missing something?
Peter Zijlstra April 14, 2021, 7 a.m. UTC | #7
On Tue, Apr 13, 2021 at 08:01:08PM -0700, Kees Cook wrote:
> So the addr can just be encoded in "int", and no structure is needed at:
> 
> typedef bool (*vpr_fn_t)(pte_t pte);
> 
> static int vpr_fn(pte_t *pte, unsigned long addr, void *data)
> {
> 	vpr_fn_t callback = data;
> 
> 	if (!callback(*pte))
> 		return addr >> PAGE_SIZE;
> 	return 0;
> }
> 
> unsigned long verify_page_range(struct mm_struct *mm,
> 				unsigned long addr, unsigned long size,
> 				vpr_fn_t callback)
> {
> 	return apply_to_page_range(mm, addr, size, vpr_fn, callback) << PAGE_SIZE;
> }
> 
> But maybe I'm missing something?

That covers only (32+12) bits of address space and will mostly work, but
we definitely support architectures (very much including x86_64) with
larger address spaces than that.
Kees Cook April 19, 2021, 11:36 p.m. UTC | #8
On Tue, Apr 13, 2021 at 08:54:06AM +0200, Peter Zijlstra wrote:
> On Mon, Apr 12, 2021 at 01:05:09PM -0700, Kees Cook wrote:
> > On Mon, Apr 12, 2021 at 10:00:16AM +0200, Peter Zijlstra wrote:
> > > +struct vpr_data {
> > > +	int (*fn)(pte_t pte, unsigned long addr, void *data);
> > > +	void *data;
> > > +};
> > 
> > Eeerg. This is likely to become an attack target itself. Stored function
> > pointer with stored (3rd) argument.
> 
> You got some further reading on that? How exactly are those exploited?

Sure, see "Executing code" in
https://googleprojectzero.blogspot.com/2017/05/exploiting-linux-kernel-via-packet.html

I killed the entire primitive (for timer_list)
https://outflux.net/blog/archives/2018/02/05/security-things-in-linux-v4-15/#v4.15-timer_list
but that was a lot of work, so I'm trying to avoid seeing more things
like it appear. :) (And I'm trying to get rid of similar APIs, like
tasklet.)

This new code is unlikely to ever be used as widely as timer_list,
but I just cringe when I see the code pattern. I'll understand if there
isn't a solution that doesn't require major refactoring, but I can
dream. :)
diff mbox series

Patch

--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2876,6 +2876,10 @@  extern int apply_to_page_range(struct mm
 extern int apply_to_existing_page_range(struct mm_struct *mm,
 				   unsigned long address, unsigned long size,
 				   pte_fn_t fn, void *data);
+extern int verify_page_range(struct mm_struct *mm,
+			     unsigned long addr, unsigned long size,
+			     int (*fn)(pte_t pte, unsigned long addr, void *data),
+			     void *data);
 
 extern void init_mem_debugging_and_hardening(void);
 #ifdef CONFIG_PAGE_POISONING
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2559,6 +2559,30 @@  int apply_to_existing_page_range(struct
 	return __apply_to_page_range(mm, addr, size, fn, data, false);
 }
 
+struct vpr_data {
+	int (*fn)(pte_t pte, unsigned long addr, void *data);
+	void *data;
+};
+
+static int vpr_fn(pte_t *pte, unsigned long addr, void *data)
+{
+	struct vpr_data *vpr = data;
+	return vpr->fn(*pte, addr, vpr->data);
+}
+
+int verify_page_range(struct mm_struct *mm,
+		      unsigned long addr, unsigned long size,
+		      int (*fn)(pte_t pte, unsigned long addr, void *data),
+		      void *data)
+{
+	struct vpr_data vpr = {
+		.fn = fn,
+		.data = data,
+	};
+	return apply_to_page_range(mm, addr, size, vpr_fn, &vpr);
+}
+EXPORT_SYMBOL_GPL(verify_page_range);
+
 /*
  * handle_pte_fault chooses page fault handler according to an entry which was
  * read non-atomically.  Before making any commitment, on those architectures