[v2,1/2] fs: Improve and simplify copy_mount_options

Message ID f2ad616567c7666cbba22f51c97fac1d09cfd200.1465871650.git.luto@kernel.org (mailing list archive)
State New, archived

Commit Message

Andy Lutomirski June 14, 2016, 2:36 a.m. UTC
copy_mount_options always tries to copy a full page even if the
string is shorter than a page.  If the string starts part-way into a
page and ends on the same page it started on, this means that
copy_mount_options can overrun the supplied buffer and read into the
next page.
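
The arithmetic behind that overrun can be sketched in userspace C. This is
only an illustration: the sketch_ names and the 4096-byte page size are
assumptions, not kernel code. A fixed full-page read that starts at offset k
within a page takes k bytes from the following page, even when the buffer
itself ends on the page it started on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for illustration; the real value is per-architecture. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Bytes that a fixed, full-page read starting at addr takes from the page
 * after the one containing addr.  This equals addr's offset in its page.
 */
static size_t sketch_bytes_read_past_page(uintptr_t addr)
{
	return (size_t)(addr & (SKETCH_PAGE_SIZE - 1));
}

/* Nonzero if a buffer of len bytes at addr ends on the page it started on. */
static int sketch_fits_in_page(uintptr_t addr, size_t len)
{
	size_t offset = (size_t)(addr & (SKETCH_PAGE_SIZE - 1));

	assert(len > 0 && len <= SKETCH_PAGE_SIZE);
	return offset + len <= SKETCH_PAGE_SIZE;
}
```

For an 11-byte option string at a hypothetical address 0x1fa0, the string
fits in its page, yet a full-page read would still take 0xfa0 bytes from the
next one.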

If the buffer came from userspace (USER_DS), then this could be a
performance issue (reading across the page boundary could block).
If the buffer came from the kernel (KERNEL_DS), then this could read
an unrelated page, and the kernel can have pages mapped in that have
side-effects.

I noticed this due to a new sanity-check I'm working on that tries
to make sure that we don't try to access nonexistent pages under
KERNEL_DS.

This is the same issue that was fixed by commit eca6f534e619 ("fs:
fix overflow in sys_mount() for in-kernel calls"), but for
copy_mount_options instead of copy_mount_string.

Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 fs/namespace.c | 56 ++++++++++++--------------------------------------------
 1 file changed, 12 insertions(+), 44 deletions(-)

Comments

Al Viro June 15, 2016, 11:50 p.m. UTC | #1
On Mon, Jun 13, 2016 at 07:36:04PM -0700, Andy Lutomirski wrote:
> copy_mount_options always tries to copy a full page even if the
> string is shorter than a page.  If the string starts part-way into a
> page and ends on the same page it started on, this means that
> copy_mount_options can overrun the supplied buffer and read into the
> next page.

Have you considered the possibility that there might be a reason for
having separate copy_mount_options() and copy_mount_string()?  Such as
options not being a string, perhaps?

In some filesystems (including older NFS variants) it is not a string
at all - a binary data structure, with quite a few zero bytes in it.
And no, we fucking *can't* break mount.nfs(8), no matter how we'd like
to get rid of that wart of an ABI.
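
A userspace sketch of why a string-style copy loses such data follows. The
struct layout is hypothetical and is NOT the real NFS mount data format, and
the sketch_ helpers are illustrative stand-ins, not kernel functions. A copy
that stops at the first zero byte, as strncpy_from_user() does, drops
everything after it, while a blob copy keeps every byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical binary option block; not the real NFS mount data layout. */
struct sketch_mount_data {
	uint32_t version;	/* small integers contain zero bytes */
	uint32_t flags;
	char hostname[16];
};

/*
 * String-style copy: copies up to the first zero byte and zero-fills the
 * rest of dst.  Returns the number of bytes copied before that zero byte.
 */
static size_t sketch_copy_as_string(void *dst, const void *src, size_t n)
{
	const unsigned char *nul;
	size_t len;

	assert(dst && src);
	nul = memchr(src, 0, n);
	len = nul ? (size_t)(nul - (const unsigned char *)src) : n;
	memset(dst, 0, n);
	memcpy(dst, src, len);
	return len;
}

/* Blob-style copy: copies all n bytes, zero bytes included. */
static size_t sketch_copy_as_blob(void *dst, const void *src, size_t n)
{
	assert(dst && src);
	memcpy(dst, src, n);
	return n;
}
```

With version = 6 and flags = 1, the string-style copy stops within the first
four bytes on either byte order, so the flags and hostname never arrive.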

IOW, NAK with prejudice - don't bring that thing back, it's hard no-go.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andy Lutomirski June 16, 2016, 12:01 a.m. UTC | #2
On Wed, Jun 15, 2016 at 4:50 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Mon, Jun 13, 2016 at 07:36:04PM -0700, Andy Lutomirski wrote:
>> copy_mount_options always tries to copy a full page even if the
>> string is shorter than a page.  If the string starts part-way into a
>> page and ends on the same page it started on, this means that
>> copy_mount_options can overrun the supplied buffer and read into the
>> next page.
>
> Have you considered the possibility that there might be a reason for
> having separate copy_mount_options() and copy_mount_string()?  Such as
> options not being a string, perhaps?
>
> In some filesystems (including older NFS variants) it is not a string
> at all - a binary data structure, with quite a few zero bytes in it.
> And no, we fucking *can't* break mount.nfs(8), no matter how we'd like
> to get rid of that wart of an ABI.
>
> IOW, NAK with prejudice - don't bring that thing back, it's hard no-go.

Well, that sucks.  I suppose we could make it conditional on the fs
type being "nfs", but yuck.

If we don't fix this, though, then we have other problems:

devtmpfsd does:

        *err = sys_mount("devtmpfs", "/", "devtmpfs", MS_SILENT, options);

where options points to the kernel stack.  This is bad.  do_mount_root
is similarly broken.

Is there any reason that these things use sys_mount instead of do_mount?

--Andy
Linus Torvalds June 16, 2016, 12:42 a.m. UTC | #3
On Wed, Jun 15, 2016 at 5:01 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> devtmpfsd does:
>
>         *err = sys_mount("devtmpfs", "/", "devtmpfs", MS_SILENT, options);
>
> where options points to the kernel stack.  This is bad.  do_mount_root
> is similarly broken.
>
> Is there any reason that these things use sys_mount instead of do_mount?

Not that I can see. But maybe copy_mount_options could also check for
KERNEL_DS, and use a strncpy instead of a copy_from_user() for that
case?

             Linus

Patch

diff --git a/fs/namespace.c b/fs/namespace.c
index 4fb1691b4355..8644f1961ca6 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2581,38 +2581,13 @@  static void shrink_submounts(struct mount *mnt)
 	}
 }
 
-/*
- * Some copy_from_user() implementations do not return the exact number of
- * bytes remaining to copy on a fault.  But copy_mount_options() requires that.
- * Note that this function differs from copy_from_user() in that it will oops
- * on bad values of `to', rather than returning a short copy.
+/* Copy the mount options string.  Always returns a full page padded
+ * with nulls.  If the input string is a full page or more, it may be
+ * truncated and the result will not be null-terminated.
  */
-static long exact_copy_from_user(void *to, const void __user * from,
-				 unsigned long n)
+void *copy_mount_options(const void __user *data)
 {
-	char *t = to;
-	const char __user *f = from;
-	char c;
-
-	if (!access_ok(VERIFY_READ, from, n))
-		return n;
-
-	while (n) {
-		if (__get_user(c, f)) {
-			memset(t, 0, n);
-			break;
-		}
-		*t++ = c;
-		f++;
-		n--;
-	}
-	return n;
-}
-
-void *copy_mount_options(const void __user * data)
-{
-	int i;
-	unsigned long size;
+	long size;
 	char *copy;
 
 	if (!data)
@@ -2622,22 +2597,15 @@  void *copy_mount_options(const void __user * data)
 	if (!copy)
 		return ERR_PTR(-ENOMEM);
 
-	/* We only care that *some* data at the address the user
-	 * gave us is valid.  Just in case, we'll zero
-	 * the remainder of the page.
-	 */
-	/* copy_from_user cannot cross TASK_SIZE ! */
-	size = TASK_SIZE - (unsigned long)data;
-	if (size > PAGE_SIZE)
-		size = PAGE_SIZE;
-
-	i = size - exact_copy_from_user(copy, data, size);
-	if (!i) {
+	size = strncpy_from_user(copy, data, PAGE_SIZE);
+	if (size < 0) {
 		kfree(copy);
-		return ERR_PTR(-EFAULT);
+		return ERR_PTR(size);
 	}
-	if (i != PAGE_SIZE)
-		memset(copy + i, 0, PAGE_SIZE - i);
+
+	/* If we got less than PAGE_SIZE bytes, zero out the remainder. */
+	memset(copy + size, 0, PAGE_SIZE - size);
+
 	return copy;
 }