[RFC,02/18] blktrace: add more definitions for BLK_TC_ACT
diff mbox series

Message ID 20190501042831.5313-3-chaitanya.kulkarni@wdc.com
State New
Headers show
Series
  • blktrace: add blktrace extension support
Related show

Commit Message

Chaitanya Kulkarni May 1, 2019, 4:28 a.m. UTC
Now that we have increase the size of action mask we can now safely add
new blktrace actions and extend the code to track more block layer
request flags.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 include/uapi/linux/blktrace_api.h | 63 +++++++++++++++++++++----------
 1 file changed, 44 insertions(+), 19 deletions(-)

Comments

hch@infradead.org May 1, 2019, 12:31 p.m. UTC | #1
On Tue, Apr 30, 2019 at 09:28:15PM -0700, Chaitanya Kulkarni wrote:
> @@ -104,7 +120,12 @@ struct blk_io_trace {
>  	__u64 time;		/* in nanoseconds */
>  	__u64 sector;		/* disk offset */
>  	__u32 bytes;		/* transfer length */
> +
> +#ifdef CONFIG_BLKTRACE_EXT
> +	__u64 action;		/* what happened */
> +#else
>  	__u32 action;		/* what happened */
> +#endif /* CONFIG_BLKTRACE_EXT */

You can't use CONFIG_ symbols in UAPI headers, as userspace
applications won't set it.  You also can't ever change the layout of an
existing structure in UAPI headers in not backward compatible way.
Jeff Moyer May 1, 2019, 12:56 p.m. UTC | #2
Christoph Hellwig <hch@infradead.org> writes:

> On Tue, Apr 30, 2019 at 09:28:15PM -0700, Chaitanya Kulkarni wrote:
>> @@ -104,7 +120,12 @@ struct blk_io_trace {
>>  	__u64 time;		/* in nanoseconds */
>>  	__u64 sector;		/* disk offset */
>>  	__u32 bytes;		/* transfer length */
>> +
>> +#ifdef CONFIG_BLKTRACE_EXT
>> +	__u64 action;		/* what happened */
>> +#else
>>  	__u32 action;		/* what happened */
>> +#endif /* CONFIG_BLKTRACE_EXT */
>
> You can't use CONFIG_ symbols in UAPI headers, as userspace
> applications won't set it.  You also can't ever change the layout of an
> existing structure in UAPI headers in not backward compatible way.

Right.  The blk_io_trace->magic has the lower 8 bits reserved for a
version number which is checked by userspace.  There's no way to
negotiate a supported version between userspace and the kernel,
unfortunately.  The version number is checked for each trace event.

What you *could* do is to add another trace event with a higher version
number that includes only the extra data.  So each event would be split
into two: the original event with original content and the new event
that only contains the new fields.  That way the old userspace would
continue to work, as it would discard the trace events it doesn't
recognize.  Newer userspace could handle both types of events, and merge
them back together.

There would be a ton of warnings spewed on stderr, unfortunately, but it
would at least work.  I don't see a lot of value in the kernel config
option, no matter which way we go with this.

-Jeff
Chaitanya Kulkarni May 2, 2019, 3:48 a.m. UTC | #3
Thanks for the reply Jeff.

On 5/1/19 5:56 AM, Jeff Moyer wrote:
> Christoph Hellwig <hch@infradead.org> writes:
> 
>> On Tue, Apr 30, 2019 at 09:28:15PM -0700, Chaitanya Kulkarni wrote:
>>> @@ -104,7 +120,12 @@ struct blk_io_trace {
>>>   	__u64 time;		/* in nanoseconds */
>>>   	__u64 sector;		/* disk offset */
>>>   	__u32 bytes;		/* transfer length */
>>> +
>>> +#ifdef CONFIG_BLKTRACE_EXT
>>> +	__u64 action;		/* what happened */
>>> +#else
>>>   	__u32 action;		/* what happened */
>>> +#endif /* CONFIG_BLKTRACE_EXT */
>>
>> You can't use CONFIG_ symbols in UAPI headers, as userspace
>> applications won't set it.  You also can't ever change the layout of an
>> existing structure in UAPI headers in not backward compatible way.
> 
> Right.  The blk_io_trace->magic has the lower 8 bits reserved for a
> version number which is checked by userspace.  There's no way to
> negotiate a supported version between userspace and the kernel,
> unfortunately.  The version number is checked for each trace event.
> 
> What you *could* do is to add another trace event with a higher version
> number that includes only the extra data.  So each event would be split
> into two: the original event with original content and the new event
> that only contains the new fields.  That way the old userspace would
> continue to work, as it would discard the trace events it doesn't
> recognize.  Newer userspace could handle both types of events, and merge
> them back together.
> 
> There would be a ton of warnings spewed on stderr, unfortunately, but it
> would at least work.  I don't see a lot of value in the kernel config
> option, no matter which way we go with this.
>
As you have mentioned this approach will have a lot of stderr, I was 
trying to avoid this scenario. If everyone is okay with this will make 
this change and resend the series.

> -Jeff
>
Chaitanya Kulkarni May 2, 2019, 3:49 a.m. UTC | #4
Thanks for looking into this.

On 5/1/19 5:31 AM, Christoph Hellwig wrote:
> On Tue, Apr 30, 2019 at 09:28:15PM -0700, Chaitanya Kulkarni wrote:
>> @@ -104,7 +120,12 @@ struct blk_io_trace {
>>   	__u64 time;		/* in nanoseconds */
>>   	__u64 sector;		/* disk offset */
>>   	__u32 bytes;		/* transfer length */
>> +
>> +#ifdef CONFIG_BLKTRACE_EXT
>> +	__u64 action;		/* what happened */
>> +#else
>>   	__u32 action;		/* what happened */
>> +#endif /* CONFIG_BLKTRACE_EXT */
> 
> You can't use CONFIG_ symbols in UAPI headers, as userspace
> applications won't set it.  You also can't ever change the layout of an
> existing structure in UAPI headers in not backward compatible way.
> 
Jeff has suggested another approach, if everyone is okay with that 
approach will send out the series with that change.

Please let me know if you have more comments.

Patch
diff mbox series

diff --git a/include/uapi/linux/blktrace_api.h b/include/uapi/linux/blktrace_api.h
index 690621b610e5..c34cf752a9a1 100644
--- a/include/uapi/linux/blktrace_api.h
+++ b/include/uapi/linux/blktrace_api.h
@@ -8,30 +8,42 @@ 
  * Trace categories
  */
 enum blktrace_cat {
-	BLK_TC_READ	= 1 << 0,	/* reads */
-	BLK_TC_WRITE	= 1 << 1,	/* writes */
-	BLK_TC_FLUSH	= 1 << 2,	/* flush */
-	BLK_TC_SYNC	= 1 << 3,	/* sync IO */
-	BLK_TC_SYNCIO	= BLK_TC_SYNC,
-	BLK_TC_QUEUE	= 1 << 4,	/* queueing/merging */
-	BLK_TC_REQUEUE	= 1 << 5,	/* requeueing */
-	BLK_TC_ISSUE	= 1 << 6,	/* issue */
-	BLK_TC_COMPLETE	= 1 << 7,	/* completions */
-	BLK_TC_FS	= 1 << 8,	/* fs requests */
-	BLK_TC_PC	= 1 << 9,	/* pc requests */
-	BLK_TC_NOTIFY	= 1 << 10,	/* special message */
-	BLK_TC_AHEAD	= 1 << 11,	/* readahead */
-	BLK_TC_META	= 1 << 12,	/* metadata */
-	BLK_TC_DISCARD	= 1 << 13,	/* discard requests */
-	BLK_TC_DRV_DATA	= 1 << 14,	/* binary per-driver data */
-	BLK_TC_FUA	= 1 << 15,	/* fua requests */
-
-	BLK_TC_END	= 1 << 15,	/* we've run out of bits! */
+	BLK_TC_READ		= 1 << 0,	/* reads */
+	BLK_TC_WRITE		= 1 << 1,	/* writes */
+	BLK_TC_FLUSH		= 1 << 2,	/* flush */
+	BLK_TC_SYNC		= 1 << 3,	/* sync IO */
+	BLK_TC_SYNCIO		= BLK_TC_SYNC,
+	BLK_TC_QUEUE		= 1 << 4,	/* queueing/merging */
+	BLK_TC_REQUEUE		= 1 << 5,	/* requeueing */
+	BLK_TC_ISSUE		= 1 << 6,	/* issue */
+	BLK_TC_COMPLETE		= 1 << 7,	/* completions */
+	BLK_TC_FS		= 1 << 8,	/* fs requests */
+	BLK_TC_PC		= 1 << 9,	/* pc requests */
+	BLK_TC_NOTIFY		= 1 << 10,	/* special message */
+	BLK_TC_AHEAD		= 1 << 11,	/* readahead */
+	BLK_TC_META		= 1 << 12,	/* metadata */
+	BLK_TC_DISCARD		= 1 << 13,	/* discard requests */
+	BLK_TC_DRV_DATA		= 1 << 14,	/* binary per-driver data */
+	BLK_TC_FUA		= 1 << 15,	/* fua requests */
+
+#ifdef CONFIG_BLKTRACE_EXT
+	BLK_TC_WRITE_ZEROES	= 1 << 16,	/* write-zeores */
+	BLK_TC_ZONE_RESET	= 1 << 17,	/* zone-reset */
+
+	BLK_TC_END		= 1 << 31,	/* we've run out of bits! */
+#else
+	BLK_TC_END		= 1 << 16,	/* we've run out of bits! */
+#endif /* CONFIG_BLKTRACE_EXT */
 };
 
+#ifdef CONFIG_BLKTRACE_EXT
+#define BLK_TC_SHIFT		(32)
+#define BLK_TC_ACT(act)		(((u64)act) << BLK_TC_SHIFT)
+#else
 #define BLK_TC_SHIFT		(16)
 #define BLK_TC_ACT(act)		((act) << BLK_TC_SHIFT)
 
+#endif /* CONFIG_BLKTRACE_EXT */
 /*
  * Basic trace actions
  */
@@ -93,7 +105,11 @@  enum blktrace_notify {
 #define BLK_TN_MESSAGE		(__BLK_TN_MESSAGE | BLK_TC_ACT(BLK_TC_NOTIFY))
 
 #define BLK_IO_TRACE_MAGIC	0x65617400
+#ifdef CONFIG_BLKTRACE_EXT
+#define BLK_IO_TRACE_VERSION	0x08
+#else
 #define BLK_IO_TRACE_VERSION	0x07
+#endif /* CONFIG_BLKTRACE_EXT */
 
 /*
  * The trace itself
@@ -104,7 +120,12 @@  struct blk_io_trace {
 	__u64 time;		/* in nanoseconds */
 	__u64 sector;		/* disk offset */
 	__u32 bytes;		/* transfer length */
+
+#ifdef CONFIG_BLKTRACE_EXT
+	__u64 action;		/* what happened */
+#else
 	__u32 action;		/* what happened */
+#endif /* CONFIG_BLKTRACE_EXT */
 	__u32 pid;		/* who did it */
 	__u32 device;		/* device number */
 	__u32 cpu;		/* on what cpu did it happen */
@@ -135,7 +156,11 @@  enum {
  */
 struct blk_user_trace_setup {
 	char name[BLKTRACE_BDEV_SIZE];	/* output */
+#ifdef CONFIG_BLKTRACE_EXT
+	__u64 act_mask;			/* input */
+#else
 	__u16 act_mask;			/* input */
+#endif /* CONFIG_BLKTRACE_EXT */
 	__u32 buf_size;			/* input */
 	__u32 buf_nr;			/* input */
 	__u64 start_lba;