diff mbox series

[2/2] tools/nolibc: add support for directory access

Message ID 20250130-nolibc-dir-v1-2-ea9950b52e29@weissschuh.net (mailing list archive)
State New
Headers show
Series tools/nolibc: add support for directory access | expand

Commit Message

Thomas Weißschuh Jan. 30, 2025, 7:54 p.m. UTC
From: Thomas Weißschuh <thomas.weissschuh@linutronix.de>

Add an allocation-free implementation of readdir() and related
functions. The implementation is modelled after the one for FILE.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>

---
I'm not entirely sure where to put it. It doesn't really belong into
stdio.h, but that's where the FILE stuff is.
sys.h wants alphabetical ordering, but IMO these functions should stick
together.
---
 tools/include/nolibc/stdio.h                 | 76 ++++++++++++++++++++++++++++
 tools/include/nolibc/types.h                 |  5 ++
 tools/testing/selftests/nolibc/nolibc-test.c | 37 ++++++++++++++
 3 files changed, 118 insertions(+)

Comments

Willy Tarreau Feb. 1, 2025, 10:34 a.m. UTC | #1
On Thu, Jan 30, 2025 at 08:54:03PM +0100, Thomas Weißschuh wrote:
> From: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
> 
> Add an allocation-free implementation of readdir() and related
> functions. The implementation is modelled after the one for FILE.

I think you'd need to mention/remind the two important points that
come out of that choice, one being that DIR is a fake pointer that
instead stores ~fd so that it can be turned back to a valid FD, and
that subsequent readdir() calls will only work from the same file
unit since it relies on a local static storage.

Better have this visible in the commit message so that in the event
someone faces a difficulty due to this, they can easily find that it's
an on-purpose design choice.

> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
> 
> ---
> I'm not entirely sure where to put it. It doesn't really belong into
> stdio.h, but that's where the FILE stuff is.
> sys.h wants alphabetical ordering, but IMO these functions should stick
> together.

My man pages suggest that userland code will include <dirent.h>, thus
I think it could be the moment to create it with that new code.

> diff --git a/tools/include/nolibc/stdio.h b/tools/include/nolibc/stdio.h
> index 3892034198dd566d21a5cc0a9f67cf097d428393..1f275a0a7b6b2c6f1c15405d027c282bb77aa618 100644
> --- a/tools/include/nolibc/stdio.h
> +++ b/tools/include/nolibc/stdio.h
(...)
> +static __attribute__((unused))
> +struct dirent *readdir(DIR *dirp)
> +{
> +	static struct dirent dirent;
> +
> +	char buf[sizeof(struct linux_dirent64) + NAME_MAX];

I'm uncertain where NAME_MAX is defined, I haven't found it in the
nolibc sources, just double-checking that it's not just in your build
environment by accident.

> +	struct linux_dirent64 *ldir = (void *)buf;
> +	intptr_t i = (intptr_t)dirp;
> +	int fd, ret;
> +
> +	if (i >= 0) {
> +		SET_ERRNO(EBADF);
> +		return NULL;
> +	}
> +
> +	fd = ~i;
> +
> +	ret = getdents64(fd, ldir, sizeof(buf));
> +	if (ret == -1 || ret == 0)
> +		return NULL;
> +
> +	/*
> +	 * getdents64() returns as many entries as fit the buffer.
> +	 * readdir() can only return one entry at a time.
> +	 * Make sure the non-returned ones are not skipped.
> +	 */
> +	ret = lseek(fd, ldir->d_off, SEEK_SET);
> +	if (ret == -1)
> +		return NULL;
> +
> +	dirent = (struct dirent) {
> +		.d_ino = ldir->d_ino,
> +	};
> +	strlcpy(dirent.d_name, ldir->d_name, sizeof(dirent.d_name));

Just out of curiosity, could this copy fail, and if so, should we handle
it (e.g. NAME_MAX != 255) ? My guess here is that if it could almost never
fail and checking it would needlessly complicate the function, let's just
handle it with a comment for now. And if it cannot at all, let's mention
why on top of it as well.

Thanks,
Willy
Thomas Weißschuh Feb. 1, 2025, 10:41 a.m. UTC | #2
On 2025-02-01 11:34:38+0100, Willy Tarreau wrote:
> On Thu, Jan 30, 2025 at 08:54:03PM +0100, Thomas Weißschuh wrote:
> > From: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
> > 
> > Add an allocation-free implementation of readdir() and related
> > functions. The implementation is modelled after the one for FILE.
> 
> I think you'd need to mention/remind the two important points that
> come out of that choice, one being that DIR is a fake pointer that
> instead stores ~fd so that it can be turned back to a valid FD, and
> that subsequent readdir() calls will only work from the same file
> unit since it relies on a local static storage.

Point one is true.
Point two not so much. It is fine to have multiple directories open at
the same time. All state is kept inside the kernel.
Only the *current* return value is in the static buffer.
So multithreading wouldn't work, but nolibc doesn't support that
anyways.

> Better have this visible in the commit message so that in the event
> someone faces a difficulty due to this, they can easily find that it's
> an on-purpose design choice.

Ack.

> > Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
> > Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
> > 
> > ---
> > I'm not entirely sure where to put it. It doesn't really belong into
> > stdio.h, but that's where the FILE stuff is.
> > sys.h wants alphabetical ordering, but IMO these functions should stick
> > together.
> 
> My man pages suggest that userland code will include <dirent.h>, thus
> I think it could be the moment to create it with that new code.

Ack. Now that you suggest it, it seems obvious.

> > diff --git a/tools/include/nolibc/stdio.h b/tools/include/nolibc/stdio.h
> > index 3892034198dd566d21a5cc0a9f67cf097d428393..1f275a0a7b6b2c6f1c15405d027c282bb77aa618 100644
> > --- a/tools/include/nolibc/stdio.h
> > +++ b/tools/include/nolibc/stdio.h
> (...)
> > +static __attribute__((unused))
> > +struct dirent *readdir(DIR *dirp)
> > +{
> > +	static struct dirent dirent;
> > +
> > +	char buf[sizeof(struct linux_dirent64) + NAME_MAX];
> 
> I'm uncertain where NAME_MAX is defined, I haven't found it in the
> nolibc sources, just double-checking that it's not just in your build
> environment by accident.

It comes from linux/limits.h. I don't think it can ever go away.

> > +	struct linux_dirent64 *ldir = (void *)buf;
> > +	intptr_t i = (intptr_t)dirp;
> > +	int fd, ret;
> > +
> > +	if (i >= 0) {
> > +		SET_ERRNO(EBADF);
> > +		return NULL;
> > +	}
> > +
> > +	fd = ~i;
> > +
> > +	ret = getdents64(fd, ldir, sizeof(buf));
> > +	if (ret == -1 || ret == 0)
> > +		return NULL;
> > +
> > +	/*
> > +	 * getdents64() returns as many entries as fit the buffer.
> > +	 * readdir() can only return one entry at a time.
> > +	 * Make sure the non-returned ones are not skipped.
> > +	 */
> > +	ret = lseek(fd, ldir->d_off, SEEK_SET);
> > +	if (ret == -1)
> > +		return NULL;
> > +
> > +	dirent = (struct dirent) {
> > +		.d_ino = ldir->d_ino,
> > +	};
> > +	strlcpy(dirent.d_name, ldir->d_name, sizeof(dirent.d_name));
> 
> Just out of curiosity, could this copy fail, and if so, should we handle
> it (e.g. NAME_MAX != 255) ? My guess here is that if it could almost never
> fail and checking it would needlessly complicate the function, let's just
> handle it with a comment for now. And if it cannot at all, let's mention
> why on top of it as well.

If NAME_MAX changes it would be a userspace regression IMO.
I'll add a comment.


Thanks!
Thomas
Willy Tarreau Feb. 1, 2025, 10:46 a.m. UTC | #3
On Sat, Feb 01, 2025 at 11:41:58AM +0100, Thomas Weißschuh wrote:
> On 2025-02-01 11:34:38+0100, Willy Tarreau wrote:
> > On Thu, Jan 30, 2025 at 08:54:03PM +0100, Thomas Weißschuh wrote:
> > > From: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
> > > 
> > > Add an allocation-free implementation of readdir() and related
> > > functions. The implementation is modelled after the one for FILE.
> > 
> > I think you'd need to mention/remind the two important points that
> > come out of that choice, one being that DIR is a fake pointer that
> > instead stores ~fd so that it can be turned back to a valid FD, and
> > that subsequent readdir() calls will only work from the same file
> > unit since it relies on a local static storage.
> 
> Point one is true.
> Point two not so much. It is fine to have multiple directories open at
> the same time. All state is kept inside the kernel.
> Only the *current* return value is in the static buffer.

Excellent point! It also needs to be mentioned.

> So multithreading wouldn't work, but nolibc doesn't support that
> anyways.

Not only that, but it also means recursive directory scanning is not
possible either, this definitely deserve being mentioned!

That's something we could improve in the future if there is some demand,
by also keeping a static "level" and use a hand-built stack. But let's
not overengineer something that nobody asked for yet ;-)

> > > I'm not entirely sure where to put it. It doesn't really belong into
> > > stdio.h, but that's where the FILE stuff is.
> > > sys.h wants alphabetical ordering, but IMO these functions should stick
> > > together.
> > 
> > My man pages suggest that userland code will include <dirent.h>, thus
> > I think it could be the moment to create it with that new code.
> 
> Ack. Now that you suggest it, it seems obvious.

Obvious is always the hardest thing to find as it's assumed to be known
and is rarely documented ;-)

> > I'm uncertain where NAME_MAX is defined, I haven't found it in the
> > nolibc sources, just double-checking that it's not just in your build
> > environment by accident.
> 
> It comes from linux/limits.h. I don't think it can ever go away.

Ah fine then!

> > Just out of curiosity, could this copy fail, and if so, should we handle
> > it (e.g. NAME_MAX != 255) ? My guess here is that if it could almost never
> > fail and checking it would needlessly complicate the function, let's just
> > handle it with a comment for now. And if it cannot at all, let's mention
> > why on top of it as well.
> 
> If NAME_MAX changes it would be a userspace regression IMO.
> I'll add a comment.

Perfect! Consider an acked-by from me on the next one as I'm generally
fine with the principles.

Willy
Thomas Weißschuh Feb. 2, 2025, 8:56 p.m. UTC | #4
On 2025-02-01 11:46:59+0100, Willy Tarreau wrote:
> On Sat, Feb 01, 2025 at 11:41:58AM +0100, Thomas Weißschuh wrote:
> > On 2025-02-01 11:34:38+0100, Willy Tarreau wrote:
> > > On Thu, Jan 30, 2025 at 08:54:03PM +0100, Thomas Weißschuh wrote:
> > > > From: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
> > > > 
> > > > Add an allocation-free implementation of readdir() and related
> > > > functions. The implementation is modelled after the one for FILE.
> > > 
> > > I think you'd need to mention/remind the two important points that
> > > come out of that choice, one being that DIR is a fake pointer that
> > > instead stores ~fd so that it can be turned back to a valid FD, and
> > > that subsequent readdir() calls will only work from the same file
> > > unit since it relies on a local static storage.
> > 
> > Point one is true.
> > Point two not so much. It is fine to have multiple directories open at
> > the same time. All state is kept inside the kernel.
> > Only the *current* return value is in the static buffer.
> 
> Excellent point! It also needs to be mentioned.

Unfortunately POSIX is more specific in it's definition and forbids this:

	The returned pointer, and pointers within the structure, might
	be invalidated or the structure or the storage areas might be
	overwritten by a subsequent call to readdir() on the same
	directory stream.

I see two possibilities:

Allocate one 'struct dirent' as part of DIR.
Only implement readdir_r().

While readdir_r() is deprecated (according to readdir(3) but not
readdir(3p)) this seems to be due to ABI issues, which shouldn't matter
for nolibc anyways.

Personally I would prefer readdir_r().

<snip>
diff mbox series

Patch

diff --git a/tools/include/nolibc/stdio.h b/tools/include/nolibc/stdio.h
index 3892034198dd566d21a5cc0a9f67cf097d428393..1f275a0a7b6b2c6f1c15405d027c282bb77aa618 100644
--- a/tools/include/nolibc/stdio.h
+++ b/tools/include/nolibc/stdio.h
@@ -387,6 +387,82 @@  const char *strerror(int errno)
 	return buf;
 }
 
+/* See comment of FILE above */
+typedef struct {
+	char dummy[1];
+} DIR;
+
+static __attribute__((unused))
+DIR *fdopendir(int fd)
+{
+	if (fd < 0) {
+		SET_ERRNO(EBADF);
+		return NULL;
+	}
+	return (DIR *)(intptr_t)~fd;
+}
+
+static __attribute__((unused))
+DIR *opendir(const char *name)
+{
+	int fd;
+
+	fd = open(name, O_RDONLY);
+	if (fd == -1)
+		return NULL;
+	return fdopendir(fd);
+}
+
+static __attribute__((unused))
+int closedir(DIR *dirp)
+{
+	intptr_t i = (intptr_t)dirp;
+
+	if (i >= 0) {
+		SET_ERRNO(EBADF);
+		return -1;
+	}
+	return close(~i);
+}
+
+static __attribute__((unused))
+struct dirent *readdir(DIR *dirp)
+{
+	static struct dirent dirent;
+
+	char buf[sizeof(struct linux_dirent64) + NAME_MAX];
+	struct linux_dirent64 *ldir = (void *)buf;
+	intptr_t i = (intptr_t)dirp;
+	int fd, ret;
+
+	if (i >= 0) {
+		SET_ERRNO(EBADF);
+		return NULL;
+	}
+
+	fd = ~i;
+
+	ret = getdents64(fd, ldir, sizeof(buf));
+	if (ret == -1 || ret == 0)
+		return NULL;
+
+	/*
+	 * getdents64() returns as many entries as fit the buffer.
+	 * readdir() can only return one entry at a time.
+	 * Make sure the non-returned ones are not skipped.
+	 */
+	ret = lseek(fd, ldir->d_off, SEEK_SET);
+	if (ret == -1)
+		return NULL;
+
+	dirent = (struct dirent) {
+		.d_ino = ldir->d_ino,
+	};
+	strlcpy(dirent.d_name, ldir->d_name, sizeof(dirent.d_name));
+
+	return &dirent;
+}
+
 /* make sure to include all global symbols */
 #include "nolibc.h"
 
diff --git a/tools/include/nolibc/types.h b/tools/include/nolibc/types.h
index b26a5d0c417c7c3b232f28e78cdd6d94a759b7bc..67885731fff3b2008efffe7c02f93f978ef7b078 100644
--- a/tools/include/nolibc/types.h
+++ b/tools/include/nolibc/types.h
@@ -179,6 +179,11 @@  struct linux_dirent64 {
 	char           d_name[];
 };
 
+struct dirent {
+	ino_t	d_ino;
+	char	d_name[256];
+};
+
 /* The format of the struct as returned by the libc to the application, which
  * significantly differs from the format returned by the stat() syscall flavours.
  */
diff --git a/tools/testing/selftests/nolibc/nolibc-test.c b/tools/testing/selftests/nolibc/nolibc-test.c
index 0e0e3b48a8c3a6802c6989954b6f3a7c7258db43..18f769bcf68dfb190b157aeeba14f7904965377b 100644
--- a/tools/testing/selftests/nolibc/nolibc-test.c
+++ b/tools/testing/selftests/nolibc/nolibc-test.c
@@ -767,6 +767,42 @@  int test_getdents64(const char *dir)
 	return ret;
 }
 
+int test_directories(void)
+{
+	int comm = 0, cmdline = 0;
+	struct dirent *dirent;
+	DIR *dir;
+	int ret;
+
+	dir = opendir("/proc/self");
+	if (!dir)
+		return 1;
+
+	while (1) {
+		errno = 0;
+		dirent = readdir(dir);
+		if (!dirent)
+			break;
+
+		if (strcmp(dirent->d_name, "comm") == 0)
+			comm++;
+		else if (strcmp(dirent->d_name, "cmdline") == 0)
+			cmdline++;
+	}
+
+	if (errno)
+		return 1;
+
+	ret = closedir(dir);
+	if (ret)
+		return 1;
+
+	if (comm != 1 || cmdline != 1)
+		return 1;
+
+	return 0;
+}
+
 int test_getpagesize(void)
 {
 	int x = getpagesize();
@@ -1059,6 +1095,7 @@  int run_syscall(int min, int max)
 		CASE_TEST(fork);              EXPECT_SYSZR(1, test_fork()); break;
 		CASE_TEST(getdents64_root);   EXPECT_SYSNE(1, test_getdents64("/"), -1); break;
 		CASE_TEST(getdents64_null);   EXPECT_SYSER(1, test_getdents64("/dev/null"), -1, ENOTDIR); break;
+		CASE_TEST(directories);       EXPECT_SYSZR(proc, test_directories()); break;
 		CASE_TEST(gettimeofday_tv);   EXPECT_SYSZR(1, gettimeofday(&tv, NULL)); break;
 		CASE_TEST(gettimeofday_tv_tz);EXPECT_SYSZR(1, gettimeofday(&tv, &tz)); break;
 		CASE_TEST(getpagesize);       EXPECT_SYSZR(1, test_getpagesize()); break;