diff mbox series

[RFC,v2,02/13] epoll: introduce user structures for polling from userspace

Message ID 20190121201456.28338-3-rpenyaev@suse.de (mailing list archive)
State New, archived
Headers show
Series epoll: support pollable epoll from userspace | expand

Commit Message

Roman Penyaev Jan. 21, 2019, 8:14 p.m. UTC
This one introduces structures of user items array:

struct user_epheader -
    describes inserted epoll items.

struct user_epitem -
    single epoll item, visible to userspace.

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 fs/eventpoll.c                 |  9 +++++++++
 include/uapi/linux/eventpoll.h | 19 +++++++++++++++++++
 2 files changed, 28 insertions(+)

Comments

Linus Torvalds Jan. 21, 2019, 9:34 p.m. UTC | #1
So I'm not entirely convinced, but I guess actual numbers and users
might convince me otherwise.

However, a quick comment:

On Tue, Jan 22, 2019 at 9:15 AM Roman Penyaev <rpenyaev@suse.de> wrote:
>
> +struct epoll_uitem {
> +       __poll_t ready_events;
> +       struct epoll_event event;
> +};

This really ends up being a horrible data structure.

struct epoll_event is declared as

    struct epoll_event {
            __poll_t events;
            __u64 data;
    } EPOLL_PACKED;

and __poll_t is "unsigned". So on pretty much all 64-bit architectures
except for x86-64 (which sets that packed attribute), you have a
packing hole there in between the events and the data, and "struct
epoll_event" has 8-byte alignment.

Now, in "struct epoll_uitem", you end up having *another* packing hold
in between "ready_events" and "struct epoll_event".

So this data structure that has 16 bytes of actual data, ends up being
24 bytes in size.

Again, x86-64 happens to be the exception to this, but that's a random
small implementation detail, not a design thing.

I think "struct epoll_event" was badly designed to begin with to have
this issue, but it shouldn't then be an excuse to make things even
worse with this array of "struct epoll_uitem" things.

Hmm?

              Linus
Roman Penyaev Jan. 22, 2019, 11:46 a.m. UTC | #2
On 2019-01-21 22:34, Linus Torvalds wrote:
> So I'm not entirely convinced, but I guess actual numbers and users
> might convince me otherwise.
> 
> However, a quick comment:
> 
> On Tue, Jan 22, 2019 at 9:15 AM Roman Penyaev <rpenyaev@suse.de> wrote:
>> 
>> +struct epoll_uitem {
>> +       __poll_t ready_events;
>> +       struct epoll_event event;
>> +};
> 
> This really ends up being a horrible data structure.
> 
> struct epoll_event is declared as
> 
>     struct epoll_event {
>             __poll_t events;
>             __u64 data;
>     } EPOLL_PACKED;
> 
> and __poll_t is "unsigned". So on pretty much all 64-bit architectures
> except for x86-64 (which sets that packed attribute), you have a
> packing hole there in between the events and the data, and "struct
> epoll_event" has 8-byte alignment.
> 
> Now, in "struct epoll_uitem", you end up having *another* packing hold
> in between "ready_events" and "struct epoll_event".
> 
> So this data structure that has 16 bytes of actual data, ends up being
> 24 bytes in size.
> 
> Again, x86-64 happens to be the exception to this, but that's a random
> small implementation detail, not a design thing.
> 
> I think "struct epoll_event" was badly designed to begin with to have
> this issue, but it shouldn't then be an excuse to make things even
> worse with this array of "struct epoll_uitem" things.
> 
> Hmm?

Ha! Yes, you are right.  Eyes see "packed" and brain responds
"ok, this is 12 bytes, + 4 for ready_events = 16, perfect".
I have not paid any attention to how actually this EPOLL_PACKED is
defined.  Not nice at all.  I will unfold the structure like this:

/*
  * Item, shared with userspace.  Unfortunately we can't embed 
epoll_event
  * structure, because it is badly aligned on all 64-bit archs, except
  * x86-64 (see EPOLL_PACKED).  sizeof(epoll_uitem) == 16
  */
struct epoll_uitem {
	__poll_t ready_events;
	__poll_t events;
	__u64 data;
};

Also BUILD_BUG_ON(sizeof(epoll_uitem) != 16) somewhere in alloc won't
hurt.

--
Roman
diff mbox series

Patch

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 2cc183e86a29..f598442512f3 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -9,6 +9,8 @@ 
  *
  *  Davide Libenzi <davidel@xmailserver.org>
  *
+ *  Polling from userspace support by Roman Penyaev <rpenyaev@suse.de>
+ *  (C) Copyright 2019 SUSE, All Rights Reserved
  */
 
 #include <linux/init.h>
@@ -109,6 +111,13 @@ 
 
 #define EP_ITEM_COST (sizeof(struct epitem) + sizeof(struct eppoll_entry))
 
+/*
+ * That is around 1.3mb of allocated memory for one epfd.  What is more
+ * important is ->index_length, which should be ^2, so do not increase
+ * max items number to avoid size doubling of user index.
+ */
+#define EP_USERPOLL_MAX_ITEMS_NR 65536
+
 struct epoll_filefd {
 	struct file *file;
 	int fd;
diff --git a/include/uapi/linux/eventpoll.h b/include/uapi/linux/eventpoll.h
index 39dfc29f0f52..690a625ddeb2 100644
--- a/include/uapi/linux/eventpoll.h
+++ b/include/uapi/linux/eventpoll.h
@@ -79,4 +79,23 @@  struct epoll_event {
 	__u64 data;
 } EPOLL_PACKED;
 
+struct epoll_uitem {
+	__poll_t ready_events;
+	struct epoll_event event;
+};
+
+#define EPOLL_USERPOLL_HEADER_SIZE  128
+#define EPOLL_USERPOLL_HEADER_MAGIC 0xeb01eb01
+
+struct epoll_uheader {
+	u32 magic;          /* epoll user header magic */
+	u32 header_length;  /* length of the header + items */
+	u32 index_length;   /* length of the index ring, always pow2 */
+	u32 max_items_nr;   /* max number of items */
+	u32 head;           /* updated by userland */
+	u32 tail;           /* updated by kernel */
+
+	struct epoll_uitem items[] __aligned(EPOLL_USERPOLL_HEADER_SIZE);
+};
+
 #endif /* _UAPI_LINUX_EVENTPOLL_H */