Btrfs: Improve btrfs_ioctl_search_key documentation

Message ID 20170605152733.25441-1-hans.van.kranenburg@mendix.com
State New

Commit Message

Hans van Kranenburg June 5, 2017, 3:27 p.m. UTC
A programmer who is trying to implement calling the btrfs SEARCH
or SEARCH_V2 ioctl will probably soon end up reading this struct
definition.

Properly document the input fields to prevent common misconceptions:
 1. The search space is linear, not 3 dimensional.
 2. The transaction id (a.k.a. generation) filter applies only on
 transaction id of the last COW operation on a whole metadata page, not
 on individual items.

Ad 1. The first misunderstanding was helped by the previous misleading
comments on min/max type and offset: "keys returned will be
>= min and <= max".

Ad 2. For example, running btrfs balance will happily cause rewriting of
metadata pages that contain a filesystem tree of a read only subvolume,
causing transids to be increased.

Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
---
 include/uapi/linux/btrfs.h | 63 +++++++++++++++++++++++++++++++---------------
 1 file changed, 43 insertions(+), 20 deletions(-)
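
To illustrate point 2 of the commit message, here is a minimal sketch of a transid filter that looks only at the generation of the metadata block holding an item. It is a toy model, not kernel code: struct demo_leaf, demo_items_passing_transid() and the sample numbers are all hypothetical names and values chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a metadata leaf: one generation for the whole block. */
struct demo_leaf {
	uint64_t generation;	/* transid of the last COW of this block */
	size_t nr_items;	/* number of items currently stored in it */
};

/* Count the items that pass a transid filter applied per leaf, not per item. */
static size_t demo_items_passing_transid(const struct demo_leaf *leaves,
					 size_t n, uint64_t min_transid,
					 uint64_t max_transid)
{
	size_t count = 0;

	assert(min_transid <= max_transid);
	for (size_t i = 0; i < n; i++) {
		if (leaves[i].generation >= min_transid &&
		    leaves[i].generation <= max_transid)
			count += leaves[i].nr_items;
	}
	return count;
}

/*
 * Sample: the second leaf was rewritten in transaction 25 (for example by a
 * balance), although the items inside it may be much older.
 */
static size_t demo_transid_sample(void)
{
	const struct demo_leaf leaves[] = {
		{ .generation = 10, .nr_items = 3 },
		{ .generation = 25, .nr_items = 2 },
	};

	return demo_items_passing_transid(leaves, 2, 20, UINT64_MAX);
}
```

In this model, a filter of min_transid = 20 selects both items of the rewritten leaf, regardless of when those items were created or last changed.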

Comments

Hans van Kranenburg June 5, 2017, 4:03 p.m. UTC | #1
On 06/05/2017 05:27 PM, Hans van Kranenburg wrote:
> A programmer who is trying to implement calling the btrfs SEARCH
> or SEARCH_V2 ioctl will probably soon end up reading this struct
> definition.
> 
> Properly document the input fields to prevent common misconceptions:
>  1. The search space is linear, not 3 dimensional.
>  2. The transaction id (a.k.a. generation) filter applies only on
>  transaction id of the last COW operation on a whole metadata page, not
>  on individual items.
> 
> Ad 1. The first misunderstanding was helped by the previous misleading
> comments on min/max type and offset: "keys returned will be
> >= min and <= max".
> 
> Ad 2. For example, running btrfs balance will happily cause rewriting of
> metadata pages that contain a filesystem tree of a read only subvolume,
> causing transids to be increased.
> 
> Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
> ---
>  include/uapi/linux/btrfs.h | 63 +++++++++++++++++++++++++++++++---------------
>  1 file changed, 43 insertions(+), 20 deletions(-)
> 
> diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
> index a456e5309238..864ad86c5d80 100644
> --- a/include/uapi/linux/btrfs.h
> +++ b/include/uapi/linux/btrfs.h
> @@ -427,30 +427,53 @@ struct btrfs_ioctl_ino_lookup_args {
>  };
>  
>  struct btrfs_ioctl_search_key {
> -	/* which root are we searching.  0 is the tree of tree roots */
> -	__u64 tree_id;

Since this 0 is incorrect... I also fixed that...

> -
> -	/* keys returned will be >= min and <= max */
> -	__u64 min_objectid;
> -	__u64 max_objectid;
> -
> -	/* keys returned will be >= min and <= max */
> -	__u64 min_offset;
> -	__u64 max_offset;
> -
> -	/* max and min transids to search for */
> -	__u64 min_transid;
> -	__u64 max_transid;
> +	/*
> +	 * The tree we're searching in. 1 is the tree of tree roots, 2 is the
> +	 * extent tree, etc...

But after trying to feed a tree 0 to SEARCH, I got output, while this
tree does not exist at all...

Then I found this, in ioctl.c:

    if (sk->tree_id == 0) {
        /* search the root of the inode that was passed */
        root = BTRFS_I(inode)->root;
    }

I'll send an updated patch later to also mention that special case,
which is quite useful to know about actually...

Hans

> +	 */
> +	__u64 tree_id;	/* in */
>  
> -	/* keys returned will be >= min and <= max */
> -	__u32 min_type;
> -	__u32 max_type;
> +	/*
> +	 * This struct is used to provide the search key range for the SEARCH and
> +	 * SEARCH_V2 ioctls.
> +	 *
> +	 * When doing a tree search, we're actually taking a slice from a linear
> +	 * search space of 136-bit keys:
> +	 *
> +	 * Key of the first possible item to be returned:
> +	 *   (min_objectid << 72) + (min_type << 64) + min_offset
> +	 * Key of the last possible item to be returned:
> +	 *   (max_objectid << 72) + (max_type << 64) + max_offset
> +	 *
> +	 * All of the min/max input numbers only define the ultimate lower and
> +	 * upper boundary of the keys of items that will be returned. In other
> +	 * words, they are not used to filter the type or offset of intermediate
> +	 * keys encountered.
> +	 *
> +	 * Additionally, we can filter the items returned on transaction id of the
> +	 * metadata block they're stored in by specifying a transid range.  Be
> +	 * aware that this transaction id only denotes when the metadata page that
> +	 * currently contains the item got written the last time as result of a COW
> +	 * operation.  The number does not have any meaning related to the
> +	 * transaction in which an individual item that is being returned was
> +	 * created or changed.
> +	 */
> +	__u64 min_objectid;	/* in */
> +	__u64 max_objectid;	/* in */
> +	__u64 min_offset;	/* in */
> +	__u64 max_offset;	/* in */
> +	__u64 min_transid;	/* in */
> +	__u64 max_transid;	/* in */
> +	__u32 min_type;	/* in */
> +	__u32 max_type;	/* in */
>  
>  	/*
> -	 * how many items did userland ask for, and how many are we
> -	 * returning
> +	 * input: The maximum amount of results desired.
> +	 * output: The actual amount of items returned, restricted by either
> +	 *   stopping the search when reaching the input nr_items amount of items,
> +	 *   or restricted by the size of the supplied memory buffer.
>  	 */
> -	__u32 nr_items;
> +	__u32 nr_items;	/* in/out */
>  
>  	/* align to 64 bits */
>  	__u32 unused;
>
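
For illustration, the tree_id == 0 special case found in this reply lets a caller search the tree that contains the file the ioctl is issued on. The following is a minimal sketch of filling in a key for that case, not code from the patch. struct demo_search_key is a local, hypothetical mirror of the fields shown in the diff (real code would include <linux/btrfs.h> and use struct btrfs_ioctl_search_key), and demo_whole_tree_key() is an invented helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local mirror of the fields shown in the diff above. */
struct demo_search_key {
	uint64_t tree_id;
	uint64_t min_objectid;
	uint64_t max_objectid;
	uint64_t min_offset;
	uint64_t max_offset;
	uint64_t min_transid;
	uint64_t max_transid;
	uint32_t min_type;
	uint32_t max_type;
	uint32_t nr_items;
	uint32_t unused;
};

/*
 * Build a key range that spans a whole tree. Passing tree_id 0 selects the
 * tree of the inode the ioctl is called on, as found in ioctl.c.
 */
static struct demo_search_key demo_whole_tree_key(uint64_t tree_id,
						  uint32_t max_items)
{
	struct demo_search_key key = {
		.tree_id = tree_id,
		.min_objectid = 0,
		.max_objectid = UINT64_MAX,
		.min_offset = 0,
		.max_offset = UINT64_MAX,
		.min_transid = 0,
		.max_transid = UINT64_MAX,
		.min_type = 0,
		.max_type = UINT8_MAX,	/* key types are 8 bits wide */
		.nr_items = max_items,
	};

	assert(max_items > 0);	/* asking for zero results is pointless here */
	return key;
}
```

Calling demo_whole_tree_key(0, 16) yields a range covering every possible key, with at most 16 items requested. Passing the result to the actual ioctl is deliberately left out of this sketch.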
Goffredo Baroncelli June 5, 2017, 7 p.m. UTC | #2
On 2017-06-05 17:27, Hans van Kranenburg wrote:
> +	 * When doing a tree search, we're actually taking a slice from a linear
> +	 * search space of 136-bit keys:
> +	 *
> +	 * Key of the first possible item to be returned:
> +	 *   (min_objectid << 72) + (min_type << 64) + min_offset
> +	 * Key of the last possible item to be returned:
> +	 *   (max_objectid << 72) + (max_type << 64) + max_offset
> +	 *


As a non-native English speaker, I prefer a less verbose and more programmatic form, like:

+	 * When doing a tree search, we're actually taking a slice from a linear
+	 * search space of 136-bit keys:
+	 *
+	 * A key is returned if
+	 *   ((min_objectid << 72) + (min_type << 64) + min_offset <=
+	 *        (objectid << 72) + (type << 64) + offset) &&
+	 *   ((max_objectid << 72) + (max_type << 64) + max_offset >=
+	 *        (objectid << 72) + (type << 64) + offset)
+	 *




> +	 * [...] In other
> +	 * words, they are not used to filter the type or offset of intermediate
> +	 * keys encountered.

Even though this is correct, I still find it a bit complicated to fully understand the meaning.

I would prefer to replace "not used" with "not usable"... But as stated above, I am not a native English speaker :-)

BR
G.Baroncelli
Hans van Kranenburg June 5, 2017, 10:16 p.m. UTC | #3
On 06/05/2017 09:00 PM, Goffredo Baroncelli wrote:
> On 2017-06-05 17:27, Hans van Kranenburg wrote:
>> +	 * When doing a tree search, we're actually taking a slice from a linear
>> +	 * search space of 136-bit keys:
>> +	 *
>> +	 * Key of the first possible item to be returned:
>> +	 *   (min_objectid << 72) + (min_type << 64) + min_offset
>> +	 * Key of the last possible item to be returned:
>> +	 *   (max_objectid << 72) + (max_type << 64) + max_offset
>> +	 *
> As a non-native English speaker, I prefer a less verbose [...]

Yeah, it's a bit meh... I started to change the text again and ended up
rewriting it in a different way for patch V2 (sending in a minute).

> [...] and more programmatic form, like:
> 
> +	 * When doing a tree search, we're actually taking a slice from a linear
> +	 * search space of 136-bit keys:
> +	 *
> +	 * A key is returned if
> +	 *   ((min_objectid << 72) + (min_type << 64) + min_offset <=
> +	 *        (objectid << 72) + (type << 64) + offset) &&
> +	 *   ((max_objectid << 72) + (max_type << 64) + max_offset >=
> +	 *        (objectid << 72) + (type << 64) + offset)
> +	 *

TBH, these lines mostly have an effect of dancing around before my eyes.

The point is, the search starts somewhere, and it ends somewhere. All
intermediate objects are returned. The min/max values are not applied as
a check to every key found in that range again. This way of explaining
("is returned if") adds to that wrong idea again imho.

>> +	 * [...] In other
>> +	 * words, they are not used to filter the type or offset of intermediate
>> +	 * keys encountered.
> 
> Even though this is correct, I still find it a bit complicated to fully understand the meaning.
> 
> I would prefer to replace "not used" with "not usable"... But as stated above, I am not a native English speaker :-)

I'm Dutch. ;) But for the user, using "usable" instead of "used" is nice
indeed, because it provides something that can be acted on, instead of
having something somewhere that "uses" it and apparently makes decisions
about what it does for some reason. Anyway, in the rewrite of that part
above, it's gone.
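
The difference discussed in this thread can be made concrete with a small sketch. It compares keys field by field in the order objectid, type, offset (which gives the same ordering as the 136-bit number without needing wide integers) and contrasts the linear slice with the mistaken "3-dimensional box" reading. All names (struct demo_key, demo_count_slice(), demo_count_box()) and the sample keys are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a btrfs key: 64 + 8 + 64 = 136 bits. */
struct demo_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Order keys as one linear number: objectid first, then type, then offset. */
static int demo_key_cmp(const struct demo_key *a, const struct demo_key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Linear reading: count every key between the two boundary keys. */
static size_t demo_count_slice(const struct demo_key *keys, size_t n,
			       const struct demo_key *min,
			       const struct demo_key *max)
{
	size_t count = 0;

	assert(demo_key_cmp(min, max) <= 0);
	for (size_t i = 0; i < n; i++) {
		if (demo_key_cmp(&keys[i], min) >= 0 &&
		    demo_key_cmp(&keys[i], max) <= 0)
			count++;
	}
	return count;
}

/* Mistaken reading: every field is filtered independently. */
static size_t demo_count_box(const struct demo_key *keys, size_t n,
			     const struct demo_key *min,
			     const struct demo_key *max)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		const struct demo_key *k = &keys[i];

		if (k->objectid >= min->objectid && k->objectid <= max->objectid &&
		    k->type >= min->type && k->type <= max->type &&
		    k->offset >= min->offset && k->offset <= max->offset)
			count++;
	}
	return count;
}

/* Sample keys in sorted order; the type values are arbitrary examples. */
static const struct demo_key demo_keys[] = {
	{ 256, 1, 0 }, { 256, 12, 256 }, { 256, 84, 100 },
	{ 257, 1, 0 }, { 257, 108, 0 },
	{ 258, 1, 0 }, { 258, 12, 256 },
};
static const struct demo_key demo_min = { 256, 1, 0 };
static const struct demo_key demo_max = { 258, 1, UINT64_MAX };
```

With min_type = max_type = 1, the box reading would return only the three type-1 keys. The linear slice returns six: everything from (256, 1, 0) up to and including (258, 1, 0), with the intermediate keys of other types included. A caller who only wants one type therefore has to skip unwanted items itself.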

Patch

diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index a456e5309238..864ad86c5d80 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -427,30 +427,53 @@  struct btrfs_ioctl_ino_lookup_args {
 };
 
 struct btrfs_ioctl_search_key {
-	/* which root are we searching.  0 is the tree of tree roots */
-	__u64 tree_id;
-
-	/* keys returned will be >= min and <= max */
-	__u64 min_objectid;
-	__u64 max_objectid;
-
-	/* keys returned will be >= min and <= max */
-	__u64 min_offset;
-	__u64 max_offset;
-
-	/* max and min transids to search for */
-	__u64 min_transid;
-	__u64 max_transid;
+	/*
+	 * The tree we're searching in. 1 is the tree of tree roots, 2 is the
+	 * extent tree, etc...
+	 */
+	__u64 tree_id;	/* in */
 
-	/* keys returned will be >= min and <= max */
-	__u32 min_type;
-	__u32 max_type;
+	/*
+	 * This struct is used to provide the search key range for the SEARCH and
+	 * SEARCH_V2 ioctls.
+	 *
+	 * When doing a tree search, we're actually taking a slice from a linear
+	 * search space of 136-bit keys:
+	 *
+	 * Key of the first possible item to be returned:
+	 *   (min_objectid << 72) + (min_type << 64) + min_offset
+	 * Key of the last possible item to be returned:
+	 *   (max_objectid << 72) + (max_type << 64) + max_offset
+	 *
+	 * All of the min/max input numbers only define the ultimate lower and
+	 * upper boundary of the keys of items that will be returned. In other
+	 * words, they are not used to filter the type or offset of intermediate
+	 * keys encountered.
+	 *
+	 * Additionally, we can filter the items returned on transaction id of the
+	 * metadata block they're stored in by specifying a transid range.  Be
+	 * aware that this transaction id only denotes when the metadata page that
+	 * currently contains the item got written the last time as result of a COW
+	 * operation.  The number does not have any meaning related to the
+	 * transaction in which an individual item that is being returned was
+	 * created or changed.
+	 */
+	__u64 min_objectid;	/* in */
+	__u64 max_objectid;	/* in */
+	__u64 min_offset;	/* in */
+	__u64 max_offset;	/* in */
+	__u64 min_transid;	/* in */
+	__u64 max_transid;	/* in */
+	__u32 min_type;	/* in */
+	__u32 max_type;	/* in */
 
 	/*
-	 * how many items did userland ask for, and how many are we
-	 * returning
+	 * input: The maximum amount of results desired.
+	 * output: The actual amount of items returned, restricted by either
+	 *   stopping the search when reaching the input nr_items amount of items,
+	 *   or restricted by the size of the supplied memory buffer.
 	 */
-	__u32 nr_items;
+	__u32 nr_items;	/* in/out */
 
 	/* align to 64 bits */
 	__u32 unused;
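
The in/out behaviour of nr_items described in the new comment can be sketched with a simplified model. This is not the kernel's actual copy loop, only an approximation under stated assumptions: each returned item is taken to occupy a 32-byte header (the size of struct btrfs_ioctl_search_header in the same uapi header, which this patch does not touch) plus its data, and the search stops at whichever limit is hit first. demo_items_returned() and the sample lengths are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed size of the per-item result header: three 64-bit fields plus two
 * 32-bit fields.
 */
#define DEMO_HEADER_SIZE 32u

/*
 * Simplified model: return how many of the found items fit, limited by the
 * requested nr_items and by the size of the result buffer.
 */
static uint32_t demo_items_returned(const uint32_t *item_len, size_t n_found,
				    uint32_t nr_items_in, size_t buf_size)
{
	size_t used = 0;
	uint32_t returned = 0;

	assert(item_len != NULL || n_found == 0);
	for (size_t i = 0; i < n_found && returned < nr_items_in; i++) {
		size_t need = (size_t)DEMO_HEADER_SIZE + item_len[i];

		if (need > buf_size - used)
			break;	/* buffer full: stop before this item */
		used += need;
		returned++;
	}
	return returned;
}

/* Sample: four items of 160 bytes each, so 192 bytes per result entry. */
static const uint32_t demo_lens[] = { 160, 160, 160, 160 };
```

In this model, asking for 2 items with a large buffer returns 2, while asking for 10 items with a 600-byte buffer returns 3 because the fourth entry no longer fits. In both cases the caller should treat the output nr_items as the number of entries actually present in the buffer.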