diff mbox

[0/2] eventfd: new EFD_STATE flag

Message ID 1251363930-3916-1-git-send-email-pbonzini@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Paolo Bonzini Aug. 27, 2009, 9:05 a.m. UTC
> Ok, so why not using the eventfd counter as state?
> On the device side:
> 
> void write_state(int sfd, int state) {
> 	u64 cnt;
> 
> 	/* Clear the current state, sfd is in non-blocking mode */
> 	read(sfd,&cnt, sizeof(cnt));
> 	/* Writes new state */
> 	cnt = 1 + !!state;
> 	write(sfd,&cnt, sizeof(cnt));
> }

It's interesting [no sarcasm intended, mind] that EFD_SEMAPHORE was
added exactly to avoid a read+write combination for the case of
decrementing a value.  Here it's the same, just it's about the case of
writing a *given* value.  What about having EFD_STATE simply mean "do
not use a counter, just write the value" without affecting the way
read works, and use

	/* Writes new state */
	cnt = 1 + !!state;
	write(sfd,&cnt, sizeof(cnt));

See below?

Paolo

> On the hypervisor side:
> 
> int read_state(int sfd) {
> 	u64 cnt;
> 
> 	read(sfd,&cnt, sizeof(cnt));
> 	return state - 1;
> }


------------- 8<-- ---------------
Subject: [PATCH] eventfd: new EFD_ABSOLUTE flag

This implements a new EFD_ABSOLUTE flag for eventfd.
This changes eventfd behaviour so that write simply
stores the value written, and is always non-blocking.

Untested (I just modified Michael's patch, and given
the simpler code I'm not sure it's now worthwhile
introducing the inline functions), but otherwise

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 fs/eventfd.c            |   13 ++++++++-----
 include/linux/eventfd.h |    4 ++--
 2 files changed, 10 insertions(+), 7 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Michael S. Tsirkin Aug. 27, 2009, 9:09 a.m. UTC | #1
On Thu, Aug 27, 2009 at 11:05:30AM +0200, Paolo Bonzini wrote:
> > Ok, so why not using the eventfd counter as state?
> > On the device side:
> > 
> > void write_state(int sfd, int state) {
> > 	u64 cnt;
> > 
> > 	/* Clear the current state, sfd is in non-blocking mode */
> > 	read(sfd,&cnt, sizeof(cnt));
> > 	/* Writes new state */
> > 	cnt = 1 + !!state;
> > 	write(sfd,&cnt, sizeof(cnt));
> > }
> 
> It's interesting [no sarcasm intended, mind] that EFD_SEMAPHORE was
> added exactly to avoid a read+write combination for the case of
> decrementing a value.  Here it's the same, just it's about the case of
> writing a *given* value.  What about having EFD_STATE simply mean "do
> not use a counter, just write the value" without affecting the way
> read works, and use
> 
> 	/* Writes new state */
> 	cnt = 1 + !!state;
> 	write(sfd,&cnt, sizeof(cnt));

That would work for kvm.

> See below?
> 
> Paolo
> 
> > On the hypervisor side:
> > 
> > int read_state(int sfd) {
> > 	u64 cnt;
> > 
> > 	read(sfd,&cnt, sizeof(cnt));
> > 	return state - 1;
> > }
> 
> 
> ------------- 8<-- ---------------
> Subject: [PATCH] eventfd: new EFD_ABSOLUTE flag
> 
> This implements a new EFD_ABSOLUTE flag for eventfd.
> This changes eventfd behaviour so that write simply
> stores the value written, and is always non-blocking.
> 
> Untested (I just modified Michael's patch, and given
> the simpler code I'm not sure it's now worthwhile
> introducing the inline functions), but otherwise
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  fs/eventfd.c            |   13 ++++++++-----
>  include/linux/eventfd.h |    4 ++--
>  2 files changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/eventfd.c b/fs/eventfd.c
> index 347a0e0..7b279e3 100644
> --- a/fs/eventfd.c
> +++ b/fs/eventfd.c
> @@ -31,8 +31,6 @@
>  
>  static inline int eventfd_writeable(struct eventfd_ctx *ctx, u64 n)
>  {
> -	return ULLONG_MAX - n > ctx->count;
> -	return (ctx->flags & EFD_ABSOLUTE) || (ULLONG_MAX - n > ctx->count);
>  }
>  
>  static inline int eventfd_overflow(struct eventfd_ctx *ctx, u64 cnt)
> @@ -42,10 +40,14 @@
>  
>  static inline void eventfd_dowrite(struct eventfd_ctx *ctx, u64 ucnt)
>  {
> -	if (eventfd_writeable(ctx, ucnt))
> -		ucnt = ULLONG_MAX - ctx->count;
> +	if (ctx->flags & EFD_ABSOLUTE)
> +		ctx->count = ucnt;
> +	else {
> +		if (ULLONG_MAX - ctx->count < ucnt)
> +			ucnt = ULLONG_MAX - ctx->count;
>  
> -	ctx->count += ucnt;
> +		ctx->count += ucnt;
> +	}
>  }
>  
>  static inline u64 eventfd_doread(struct eventfd_ctx *ctx)
> diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h
> index 3b85ba6..78ff649 100644
> --- a/include/linux/eventfd.h
> +++ b/include/linux/eventfd.h
> @@ -19,11 +19,12 @@
>   * shared O_* flags.
>   */
>  #define EFD_SEMAPHORE (1 << 0)
> +#define EFD_ABSOLUTE (1 << 1)
>  #define EFD_CLOEXEC O_CLOEXEC
>  #define EFD_NONBLOCK O_NONBLOCK
>  
>  #define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
> -#define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE)
> +#define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE | EFD_ABSOLUTE)
>  
>  #ifdef CONFIG_EVENTFD
> --
> 1.6.2.5 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Davide Libenzi Aug. 27, 2009, 2:21 p.m. UTC | #2
On Thu, 27 Aug 2009, Paolo Bonzini wrote:

> > Ok, so why not using the eventfd counter as state?
> > On the device side:
> > 
> > void write_state(int sfd, int state) {
> > 	u64 cnt;
> > 
> > 	/* Clear the current state, sfd is in non-blocking mode */
> > 	read(sfd,&cnt, sizeof(cnt));
> > 	/* Writes new state */
> > 	cnt = 1 + !!state;
> > 	write(sfd,&cnt, sizeof(cnt));
> > }
> 
> It's interesting [no sarcasm intended, mind] that EFD_SEMAPHORE was
> added exactly to avoid a read+write combination for the case of
> decrementing a value.

Like I repeated 25 times already, EFD_SEMAPHORE was added, because a 
*semaphore* is a pretty widely known and used abstraction.


- Davide


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael S. Tsirkin Aug. 27, 2009, 2:30 p.m. UTC | #3
On Thu, Aug 27, 2009 at 07:21:49AM -0700, Davide Libenzi wrote:
> On Thu, 27 Aug 2009, Paolo Bonzini wrote:
> 
> > > Ok, so why not using the eventfd counter as state?
> > > On the device side:
> > > 
> > > void write_state(int sfd, int state) {
> > > 	u64 cnt;
> > > 
> > > 	/* Clear the current state, sfd is in non-blocking mode */
> > > 	read(sfd,&cnt, sizeof(cnt));
> > > 	/* Writes new state */
> > > 	cnt = 1 + !!state;
> > > 	write(sfd,&cnt, sizeof(cnt));
> > > }
> > 
> > It's interesting [no sarcasm intended, mind] that EFD_SEMAPHORE was
> > added exactly to avoid a read+write combination for the case of
> > decrementing a value.
> 
> Like I repeated 25 times already, EFD_SEMAPHORE was added, because a 
> *semaphore* is a pretty widely known and used abstraction.

what about an atomic variable, btw?  does it make sense to implement
write that does compare and exchange?

> 
> - Davide
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Davide Libenzi Aug. 27, 2009, 2:38 p.m. UTC | #4
On Thu, 27 Aug 2009, Michael S. Tsirkin wrote:

> On Thu, Aug 27, 2009 at 07:21:49AM -0700, Davide Libenzi wrote:
> > On Thu, 27 Aug 2009, Paolo Bonzini wrote:
> > 
> > > > Ok, so why not using the eventfd counter as state?
> > > > On the device side:
> > > > 
> > > > void write_state(int sfd, int state) {
> > > > 	u64 cnt;
> > > > 
> > > > 	/* Clear the current state, sfd is in non-blocking mode */
> > > > 	read(sfd,&cnt, sizeof(cnt));
> > > > 	/* Writes new state */
> > > > 	cnt = 1 + !!state;
> > > > 	write(sfd,&cnt, sizeof(cnt));
> > > > }
> > > 
> > > It's interesting [no sarcasm intended, mind] that EFD_SEMAPHORE was
> > > added exactly to avoid a read+write combination for the case of
> > > decrementing a value.
> > 
> > Like I repeated 25 times already, EFD_SEMAPHORE was added, because a 
> > *semaphore* is a pretty widely known and used abstraction.
> 
> what about an atomic variable, btw?  does it make sense to implement
> write that does compare and exchange?

It is surprising to me, that is front of a workable solution w/out any 
use-once additions, yet you want to try to add optimizations and new 
ad-hoc abstractions to user visible interfaces.
Now, you tell me what an atomic variable has to do with an eventfd.


- Davide


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael S. Tsirkin Aug. 27, 2009, 2:49 p.m. UTC | #5
On Thu, Aug 27, 2009 at 07:38:46AM -0700, Davide Libenzi wrote:
> On Thu, 27 Aug 2009, Michael S. Tsirkin wrote:
> 
> > On Thu, Aug 27, 2009 at 07:21:49AM -0700, Davide Libenzi wrote:
> > > On Thu, 27 Aug 2009, Paolo Bonzini wrote:
> > > 
> > > > > Ok, so why not using the eventfd counter as state?
> > > > > On the device side:
> > > > > 
> > > > > void write_state(int sfd, int state) {
> > > > > 	u64 cnt;
> > > > > 
> > > > > 	/* Clear the current state, sfd is in non-blocking mode */
> > > > > 	read(sfd,&cnt, sizeof(cnt));
> > > > > 	/* Writes new state */
> > > > > 	cnt = 1 + !!state;
> > > > > 	write(sfd,&cnt, sizeof(cnt));
> > > > > }
> > > > 
> > > > It's interesting [no sarcasm intended, mind] that EFD_SEMAPHORE was
> > > > added exactly to avoid a read+write combination for the case of
> > > > decrementing a value.
> > > 
> > > Like I repeated 25 times already, EFD_SEMAPHORE was added, because a 
> > > *semaphore* is a pretty widely known and used abstraction.
> > 
> > what about an atomic variable, btw?  does it make sense to implement
> > write that does compare and exchange?
> 
> It is surprising to me, that is front of a workable solution w/out any 
> use-once additions, yet you want to try to add optimizations and new 
> ad-hoc abstractions to user visible interfaces.
> Now, you tell me what an atomic variable has to do with an eventfd.
> 
> 
> - Davide
> 

Oh, I stopped pushing EFD_STATE since we have a solution.
I am just trying to grok what does and what does not consititute a
use-once addition, in your mind, and what does and what does not
belong in eventfd. The reason atomic does not belong there and
semaphore does is because one waits on semaphore but not
on atomic? Is that it?
Davide Libenzi Aug. 27, 2009, 3:29 p.m. UTC | #6
On Thu, 27 Aug 2009, Michael S. Tsirkin wrote:

> Oh, I stopped pushing EFD_STATE since we have a solution.
> I am just trying to grok what does and what does not consititute a
> use-once addition, in your mind, and what does and what does not
> belong in eventfd. The reason atomic does not belong there and
> semaphore does is because one waits on semaphore but not
> on atomic? Is that it?

An atomic variable is not a synchronization interface.
Even if it'd come up that we really need an atomic variable abstraction 
via an fd, this should not go via eventfd.
Yeah, maybe the name syncfd would have been better, but the very reason 
why eventfd born was to allow KAIO to signal events to poll/select/epoll.
That first implementation was also usable as a mutex.
Then it was propsed to make eventfd to act as a semaphore too. Code was 
trivial, a semaphore fitted the interface, and the abstraction was a 
pretty damn known too. So it went in.
Yes, you could have implemented a semaphore with the existing eventfd, by 
reading the counter and writing counter-1. But again, a semaphore was 
something widely known and generic enough, and was fitting the bill.


- Davide


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Davide Libenzi Aug. 27, 2009, 5:09 p.m. UTC | #7
On Thu, 27 Aug 2009, Michael S. Tsirkin wrote:

> Oh, I stopped pushing EFD_STATE since we have a solution.

Do you guys need the kernel-side eventfd_ctx_read() I posted or not?
Because if nobody uses it, I'm not going to push it.


- Davide


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 347a0e0..7b279e3 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -31,8 +31,6 @@ 
 
 static inline int eventfd_writeable(struct eventfd_ctx *ctx, u64 n)
 {
-	return ULLONG_MAX - n > ctx->count;
-	return (ctx->flags & EFD_ABSOLUTE) || (ULLONG_MAX - n > ctx->count);
 }
 
 static inline int eventfd_overflow(struct eventfd_ctx *ctx, u64 cnt)
@@ -42,10 +40,14 @@ 
 
 static inline void eventfd_dowrite(struct eventfd_ctx *ctx, u64 ucnt)
 {
-	if (eventfd_writeable(ctx, ucnt))
-		ucnt = ULLONG_MAX - ctx->count;
+	if (ctx->flags & EFD_ABSOLUTE)
+		ctx->count = ucnt;
+	else {
+		if (ULLONG_MAX - ctx->count < ucnt)
+			ucnt = ULLONG_MAX - ctx->count;
 
-	ctx->count += ucnt;
+		ctx->count += ucnt;
+	}
 }
 
 static inline u64 eventfd_doread(struct eventfd_ctx *ctx)
diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h
index 3b85ba6..78ff649 100644
--- a/include/linux/eventfd.h
+++ b/include/linux/eventfd.h
@@ -19,11 +19,12 @@ 
  * shared O_* flags.
  */
 #define EFD_SEMAPHORE (1 << 0)
+#define EFD_ABSOLUTE (1 << 1)
 #define EFD_CLOEXEC O_CLOEXEC
 #define EFD_NONBLOCK O_NONBLOCK
 
 #define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
-#define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE)
+#define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE | EFD_ABSOLUTE)
 
 #ifdef CONFIG_EVENTFD
--
1.6.2.5