diff mbox series

kcmp: add separate Kconfig symbol for kcmp syscall

Message ID 20200710075632.14661-1-linux@rasmusvillemoes.dk (mailing list archive)
State New, archived
Headers show
Series kcmp: add separate Kconfig symbol for kcmp syscall | expand

Commit Message

Rasmus Villemoes July 10, 2020, 7:56 a.m. UTC
The ability to check open file descriptions for equality (without
resorting to unreliable fstat() and fcntl(F_GETFL) comparisons) can be
useful outside of the checkpoint/restore use case - for example,
systemd uses kcmp() to deduplicate the per-service file descriptor
store.

Make it possible to have the kcmp() syscall without the full
CONFIG_CHECKPOINT_RESTORE.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
---
I deliberately drop the ifdef in the eventpoll.h header rather than
replace with KCMP_SYSCALL; it's harmless to declare a function that
isn't defined anywhere.

 fs/eventpoll.c            |  4 ++--
 include/linux/eventpoll.h |  2 --
 init/Kconfig              | 11 +++++++++++
 kernel/Makefile           |  2 +-
 4 files changed, 14 insertions(+), 5 deletions(-)

Comments

Cyrill Gorcunov July 10, 2020, 8:30 a.m. UTC | #1
On Fri, Jul 10, 2020 at 09:56:31AM +0200, Rasmus Villemoes wrote:
> The ability to check open file descriptions for equality (without
> resorting to unreliable fstat() and fcntl(F_GETFL) comparisons) can be
> useful outside of the checkpoint/restore use case - for example,
> systemd uses kcmp() to deduplicate the per-service file descriptor
> store.
> 
> Make it possible to have the kcmp() syscall without the full
> CONFIG_CHECKPOINT_RESTORE.
> 
> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
> ---
> I deliberately drop the ifdef in the eventpoll.h header rather than
> replace with KCMP_SYSCALL; it's harmless to declare a function that
> isn't defined anywhere.

Could you please point why setting #fidef KCMP_SYSCALL in eventpoll.h
is not suitable? Still the overall idea is fine for me, thanks!

Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Rasmus Villemoes July 10, 2020, 9:05 a.m. UTC | #2
On 10/07/2020 10.30, Cyrill Gorcunov wrote:
> On Fri, Jul 10, 2020 at 09:56:31AM +0200, Rasmus Villemoes wrote:
>> The ability to check open file descriptions for equality (without
>> resorting to unreliable fstat() and fcntl(F_GETFL) comparisons) can be
>> useful outside of the checkpoint/restore use case - for example,
>> systemd uses kcmp() to deduplicate the per-service file descriptor
>> store.
>>
>> Make it possible to have the kcmp() syscall without the full
>> CONFIG_CHECKPOINT_RESTORE.
>>
>> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
>> ---
>> I deliberately drop the ifdef in the eventpoll.h header rather than
>> replace with KCMP_SYSCALL; it's harmless to declare a function that
>> isn't defined anywhere.
> 
> Could you please point why setting #fidef KCMP_SYSCALL in eventpoll.h
> is not suitable?

It's just from a general "avoid ifdef clutter if possible" POV. The
conditional declaration of the function doesn't really serve any purpose.

> Still the overall idea is fine for me, thanks!>
> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>

Thank you.

Rasmus
Cyrill Gorcunov July 10, 2020, 9:37 a.m. UTC | #3
On Fri, Jul 10, 2020 at 11:05:11AM +0200, Rasmus Villemoes wrote:
> >> I deliberately drop the ifdef in the eventpoll.h header rather than
> >> replace with KCMP_SYSCALL; it's harmless to declare a function that
> >> isn't defined anywhere.
> > 
> > Could you please point why setting #fidef KCMP_SYSCALL in eventpoll.h
> > is not suitable?
> 
> It's just from a general "avoid ifdef clutter if possible" POV. The
> conditional declaration of the function doesn't really serve any
> purpose.

OK, thanks for explanation.
Randy Dunlap July 10, 2020, 3:51 p.m. UTC | #4
On 7/10/20 12:56 AM, Rasmus Villemoes wrote:
> diff --git a/init/Kconfig b/init/Kconfig
> index 0498af567f70..95e9486d4217 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1158,9 +1158,20 @@ config NET_NS
>  
>  endif # NAMESPACES
>  
> +config KCMP_SYSCALL
> +	bool "kcmp system call"
> +	help
> +	  Enable the kcmp system call, which allows one to determine
> +	  whether to tasks share various kernel resources, for example

	Either s/to/two/ or s/to//
?

> +	  whether they share address space, or if two file descriptors
> +	  refer to the same open file description.
> +
> +	  If unsure, say N.
> +
>  config CHECKPOINT_RESTORE
>  	bool "Checkpoint/restore support"
>  	select PROC_CHILDREN
> +	select KCMP_SYSCALL
>  	default n
>  	help
>  	  Enables additional kernel features in a sake of checkpoint/restore.
Matthew Wilcox July 10, 2020, 3:57 p.m. UTC | #5
On Fri, Jul 10, 2020 at 09:56:31AM +0200, Rasmus Villemoes wrote:
> The ability to check open file descriptions for equality (without
> resorting to unreliable fstat() and fcntl(F_GETFL) comparisons) can be
> useful outside of the checkpoint/restore use case - for example,
> systemd uses kcmp() to deduplicate the per-service file descriptor
> store.
> 
> Make it possible to have the kcmp() syscall without the full
> CONFIG_CHECKPOINT_RESTORE.

If systemd is using it, is it even worth making it conditional any more?
Maybe for CONFIG_EXPERT builds, it could be de-selectable.
Rasmus Villemoes Oct. 13, 2020, 7:45 p.m. UTC | #6
On 10/07/2020 17.57, Matthew Wilcox wrote:
> On Fri, Jul 10, 2020 at 09:56:31AM +0200, Rasmus Villemoes wrote:
>> The ability to check open file descriptions for equality (without
>> resorting to unreliable fstat() and fcntl(F_GETFL) comparisons) can be
>> useful outside of the checkpoint/restore use case - for example,
>> systemd uses kcmp() to deduplicate the per-service file descriptor
>> store.
>>
>> Make it possible to have the kcmp() syscall without the full
>> CONFIG_CHECKPOINT_RESTORE.
> 
> If systemd is using it, is it even worth making it conditional any more?
> Maybe for CONFIG_EXPERT builds, it could be de-selectable.
> 

[hm, I dropped the ball, sorry for the necromancy]

Well, first, I don't want to change any defaults here, if that is to be
done, it should be a separate patch.

Second, yes, systemd uses it for the de-duplication, and for that reason
recommends CONFIG_CHECKPOINT_RESTORE (at least, according to their
README) - but I'm not aware of any daemons that actually make use of
systemd's file descriptor store, so it's not really something essential
to every systemd-based system out there. It would be nice if systemd
could change its recommendation to just CONFIG_KCMP_SYSCALL.

But it's also useful for others, e.g. I have some code that wants to
temporarily replace stdin/stdout/stderr with some other file
descriptors, but needs to preserve the '0 is a dup of 1, or not' state
(i.e., is the same struct file) - that cannot reliably be determined
from fstat()/lseek(SEEK_CUR)/F_GETFL or whatever else one could throw at
an fd.

Rasmus
diff mbox series

Patch

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 12eebcdea9c8..b0313ce2df73 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1064,7 +1064,7 @@  static struct epitem *ep_find(struct eventpoll *ep, struct file *file, int fd)
 	return epir;
 }
 
-#ifdef CONFIG_CHECKPOINT_RESTORE
+#ifdef CONFIG_KCMP_SYSCALL
 static struct epitem *ep_find_tfd(struct eventpoll *ep, int tfd, unsigned long toff)
 {
 	struct rb_node *rbp;
@@ -1106,7 +1106,7 @@  struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd,
 
 	return file_raw;
 }
-#endif /* CONFIG_CHECKPOINT_RESTORE */
+#endif /* CONFIG_KCMP_SYSCALL */
 
 /**
  * Adds a new entry to the tail of the list in a lockless way, i.e.
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 8f000fada5a4..aa799295e373 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -18,9 +18,7 @@  struct file;
 
 #ifdef CONFIG_EPOLL
 
-#ifdef CONFIG_CHECKPOINT_RESTORE
 struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd, unsigned long toff);
-#endif
 
 /* Used to initialize the epoll bits inside the "struct file" */
 static inline void eventpoll_init_file(struct file *file)
diff --git a/init/Kconfig b/init/Kconfig
index 0498af567f70..95e9486d4217 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1158,9 +1158,20 @@  config NET_NS
 
 endif # NAMESPACES
 
+config KCMP_SYSCALL
+	bool "kcmp system call"
+	help
+	  Enable the kcmp system call, which allows one to determine
+	  whether to tasks share various kernel resources, for example
+	  whether they share address space, or if two file descriptors
+	  refer to the same open file description.
+
+	  If unsure, say N.
+
 config CHECKPOINT_RESTORE
 	bool "Checkpoint/restore support"
 	select PROC_CHILDREN
+	select KCMP_SYSCALL
 	default n
 	help
 	  Enables additional kernel features in a sake of checkpoint/restore.
diff --git a/kernel/Makefile b/kernel/Makefile
index f3218bc5ec69..3daedba2146a 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -49,7 +49,7 @@  obj-y += rcu/
 obj-y += livepatch/
 obj-y += dma/
 
-obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o
+obj-$(CONFIG_KCMP_SYSCALL) += kcmp.o
 obj-$(CONFIG_FREEZER) += freezer.o
 obj-$(CONFIG_PROFILING) += profile.o
 obj-$(CONFIG_STACKTRACE) += stacktrace.o