
[v5,1/8] Btrfs: introduce a tree for items that map UUIDs to something

Message ID ef484aa41751485fc6301ce78b5e3cddaa3f9a52.1371749051.git.sbehrens@giantdisaster.de (mailing list archive)
State Superseded, archived

Commit Message

Stefan Behrens June 20, 2013, 5:45 p.m. UTC
Mapping UUIDs to subvolume IDs is an expensive operation today. The
current algorithm even has quadratic complexity (in the number of
existing subvolumes), which means that it takes minutes to send/receive
a single subvolume if 10,000 subvolumes exist. But even linear effort
would be too much, since it is wasted work: the data structures that
allow mapping UUIDs to subvolume IDs are rebuilt every time a btrfs
send/receive instance is started.

It is much more efficient to maintain a searchable, persistent data
structure in the filesystem, one that is updated whenever a
subvolume/snapshot is created or deleted, and whenever the received
subvolume UUID is set by the btrfs-receive tool.

Therefore this commit adds kernel code that maintains data structures
in the filesystem which allow quickly searching for a given UUID and
retrieving the data assigned to it, such as the subvolume ID that is
related to that UUID.

This commit adds a new tree to hold UUID-to-data mapping items. The
key of these items is the full UUID plus the key type BTRFS_UUID_KEY.
Multiple data blocks can be stored for a given UUID; a
type/length/value scheme is used.

Now follows the lengthy justification for why a new tree was added
instead of using the existing root tree:

The first approach was to not create another tree to hold UUID items,
but to put the items into the top root tree instead. Unfortunately this
confused the algorithm that assigns the objectid of subvolumes and
snapshots. The reason is that btrfs_find_free_objectid() calls
btrfs_find_highest_objectid() for the first subvolume or snapshot
created after mounting a filesystem, and this function simply searches
for the largest used objectid among the root tree keys to pick the next
objectid to assign. Of course, the UUID keys were always the ones with
the highest offset value, and the next assigned subvolume ID was
wastefully huge.

Using any other existing tree did not look proper. Applying a
workaround, such as setting the objectid to zero in the UUID item key
and implementing collision handling, would either add limitations (in
the case of a btrfs_extend_item() approach to handling the collisions)
or a lot of complexity and source code (in the case of looking up a
key that is free of collisions). Adding new code that introduces
limitations is not good, and adding code that is complex and lengthy
for no good reason is also not good. That is the justification for
introducing a completely new tree.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
---
 fs/btrfs/Makefile    |   3 +-
 fs/btrfs/ctree.h     |  50 ++++++
 fs/btrfs/uuid-tree.c | 480 +++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 532 insertions(+), 1 deletion(-)

Comments

Zach Brown June 20, 2013, 7:47 p.m. UTC | #1
> +/* for items that use the BTRFS_UUID_KEY */
> +#define BTRFS_UUID_ITEM_TYPE_SUBVOL	0 /* for UUIDs assigned to subvols */
> +#define BTRFS_UUID_ITEM_TYPE_RECEIVED_SUBVOL	1 /* for UUIDs assigned to
> +						   * received subvols */
> +
> +/* a sequence of such items is stored under the BTRFS_UUID_KEY */
> +struct btrfs_uuid_item {
> +	__le16 type;	/* refer to BTRFS_UUID_ITEM_TYPE* defines above */
> +	__le32 len;	/* number of following 64bit values */
> +	__le64 subid[0];	/* sequence of subids */
> +} __attribute__ ((__packed__));
> +

[...]

>  /*
> + * Stores items that allow to quickly map UUIDs to something else.
> + * These items are part of the filesystem UUID tree.
> + * The key is built like this:
> + * (UUID_upper_64_bits, BTRFS_UUID_KEY, UUID_lower_64_bits).
> + */
> +#if BTRFS_UUID_SIZE != 16
> +#error "UUID items require BTRFS_UUID_SIZE == 16!"
> +#endif
> +#define BTRFS_UUID_KEY	251

Why do we need this btrfs_uuid_item structure?  Why not set the key type
to either _SUBVOL or _RECEIVED_SUBVOL instead of embedding structs with
those types under items with the constant BTRFS_UUID_KEY.  Then use the
item size to determine the number of u64 subids.  Then the item has a
simple array of u64s in the data which will be a lot easier to work
with.

> +/* btrfs_uuid_item */
> +BTRFS_SETGET_FUNCS(uuid_type, struct btrfs_uuid_item, type, 16);
> +BTRFS_SETGET_FUNCS(uuid_len, struct btrfs_uuid_item, len, 32);
> +BTRFS_SETGET_STACK_FUNCS(stack_uuid_type, struct btrfs_uuid_item, type, 16);
> +BTRFS_SETGET_STACK_FUNCS(stack_uuid_len, struct btrfs_uuid_item, len, 32);

This would all go away.

> +/*
> + * One key is used to store a sequence of btrfs_uuid_item items.
> + * Each item in the sequence contains a type information and a sequence of
> + * ids (together with the information about the size of the sequence of ids).
> + * {[btrfs_uuid_item type0 {id0, id1, ..., idN}],
> + *  ...,
> + *  [btrfs_uuid_item typeZ {id0, id1, ..., idN}]}
> + *
> + * It is forbidden to put multiple items with the same type under the same key.
> + * Instead the sequence of ids is extended and used to store any additional
> + * ids for the same item type.

This constraint, and the cost of ensuring it and repairing violations,
would go away.

> +static struct btrfs_uuid_item *btrfs_match_uuid_item_type(
> +		struct btrfs_path *path, u16 type)
> +{
> +	struct extent_buffer *eb;
> +	int slot;
> +	struct btrfs_uuid_item *ptr;
> +	u32 item_size;
> +
> +	eb = path->nodes[0];
> +	slot = path->slots[0];
> +	ptr = btrfs_item_ptr(eb, slot, struct btrfs_uuid_item);
> +	item_size = btrfs_item_size_nr(eb, slot);
> +	do {
> +		u16 sub_item_type;
> +		u64 sub_item_len;
> +
> +		if (item_size < sizeof(*ptr)) {
> +			pr_warn("btrfs: uuid item too short (%lu < %d)!\n",
> +				(unsigned long)item_size, (int)sizeof(*ptr));
> +			return NULL;
> +		}
> +		item_size -= sizeof(*ptr);
> +		sub_item_type = btrfs_uuid_type(eb, ptr);
> +		sub_item_len = btrfs_uuid_len(eb, ptr);
> +		if (sub_item_len * sizeof(u64) > item_size) {
> +			pr_warn("btrfs: uuid item too short (%llu > %lu)!\n",
> +				(unsigned long long)(sub_item_len *
> +						     sizeof(u64)),
> +				(unsigned long)item_size);
> +			return NULL;
> +		}
> +		if (sub_item_type == type)
> +			return ptr;
> +		item_size -= sub_item_len * sizeof(u64);
> +		ptr = 1 + (struct btrfs_uuid_item *)
> +			(((char *)ptr) + (sub_item_len * sizeof(u64)));
> +	} while (item_size);
>
> +static int btrfs_uuid_tree_lookup_prepare(struct btrfs_root *uuid_root,
> +					  u8 *uuid, u16 type,
> +					  struct btrfs_path *path,
> +					  struct btrfs_uuid_item **ptr)
> +{
> +	int ret;
> +	struct btrfs_key key;
> +
> +	if (!uuid_root) {
> +		WARN_ON_ONCE(1);
> +		ret = -ENOENT;
> +		goto out;
> +	}
> +
> +	btrfs_uuid_to_key(uuid, &key);
> +
> +	ret = btrfs_search_slot(NULL, uuid_root, &key, path, 0, 0);
> +	if (ret < 0)
> +		goto out;
> +	if (ret > 0) {
> +		ret = -ENOENT;
> +		goto out;
> +	}
> +
> +	*ptr = btrfs_match_uuid_item_type(path, type);
> +	if (!*ptr) {
> +		ret = -ENOENT;
> +		goto out;
> +	}
> +
> +	ret = 0;
> +
> +out:
> +	return ret;
> +}

All of this is replaced with the simple search_slot in the caller.

> +	offset = (unsigned long)ptr;
> +	while (sub_item_len > 0) {
> +		u64 data;
> +
> +		read_extent_buffer(eb, &data, offset, sizeof(data));
> +		data = le64_to_cpu(data);
> +		if (data == subid) {
> +			ret = 0;
> +			break;
> +		}
> +		offset += sizeof(data);
> +		sub_item_len--;
> +	}

This could be cleaned up a bit by comparing an on-stack little-endian
input subid with each little-endian subid in the item with
memcmp_extent_buffer().

> +	ret = btrfs_insert_empty_item(trans, uuid_root, path, &key,
> +				      sizeof(*ptr) + sizeof(subid));
> +	if (ret == -EEXIST) {
> +		ptr = btrfs_match_uuid_item_type(path, type);
> +		if (ptr) {
[...]
> +			btrfs_extend_item(uuid_root, path, sizeof(subid));

How does this extension avoid the BUG when the leaf containing the item
doesn't have room for the new subid?

- z
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Behrens June 21, 2013, 8:47 a.m. UTC | #2
On Thu, 20 Jun 2013 12:47:04 -0700, Zach Brown wrote:
> 
>> +/* for items that use the BTRFS_UUID_KEY */
>> +#define BTRFS_UUID_ITEM_TYPE_SUBVOL	0 /* for UUIDs assigned to subvols */
>> +#define BTRFS_UUID_ITEM_TYPE_RECEIVED_SUBVOL	1 /* for UUIDs assigned to
>> +						   * received subvols */
>> +
>> +/* a sequence of such items is stored under the BTRFS_UUID_KEY */
>> +struct btrfs_uuid_item {
>> +	__le16 type;	/* refer to BTRFS_UUID_ITEM_TYPE* defines above */
>> +	__le32 len;	/* number of following 64bit values */
>> +	__le64 subid[0];	/* sequence of subids */
>> +} __attribute__ ((__packed__));
>> +
> 
> [...]
> 
>>  /*
>> + * Stores items that allow to quickly map UUIDs to something else.
>> + * These items are part of the filesystem UUID tree.
>> + * The key is built like this:
>> + * (UUID_upper_64_bits, BTRFS_UUID_KEY, UUID_lower_64_bits).
>> + */
>> +#if BTRFS_UUID_SIZE != 16
>> +#error "UUID items require BTRFS_UUID_SIZE == 16!"
>> +#endif
>> +#define BTRFS_UUID_KEY	251
> 
> Why do we need this btrfs_uuid_item structure?  Why not set the key type
> to either _SUBVOL or _RECEIVED_SUBVOL instead of embedding structs with
> those types under items with the constant BTRFS_UUID_KEY.  Then use the
> item size to determine the number of u64 subids.  Then the item has a
> simple array of u64s in the data which will be a lot easier to work
> with.

Maintaining a 16 bit uuid_type field inside the item avoids consuming
type values in the key. The type field in the key is only 8 bits wide,
and 34 type values are already defined in ctree.h.

As long as such items live in separate trees, their key type value
could be reused in the future once the 8 bit type field is exhausted.

There was some discussion about this topic in #btrfs on 2013-04-26.
There are pros and cons. The uuid_type field in the item allows the
uuid items to be used generically outside of the uuid tree, since they
occupy only one value in the key's 8 bit type field. On the other
hand, if the 8 bit type field in the key is ever exhausted, a common
way to expand this field would save lines of code and avoid the
complexity of implementing multiplexing layers in multiple places,
although possibly with some added performance penalty.

Anybody else with an opinion on this topic?


>> +/* btrfs_uuid_item */
>> +BTRFS_SETGET_FUNCS(uuid_type, struct btrfs_uuid_item, type, 16);
>> +BTRFS_SETGET_FUNCS(uuid_len, struct btrfs_uuid_item, len, 32);
>> +BTRFS_SETGET_STACK_FUNCS(stack_uuid_type, struct btrfs_uuid_item, type, 16);
>> +BTRFS_SETGET_STACK_FUNCS(stack_uuid_len, struct btrfs_uuid_item, len, 32);
> 
> This would all go away.
> 
>> +/*
>> + * One key is used to store a sequence of btrfs_uuid_item items.
>> + * Each item in the sequence contains a type information and a sequence of
>> + * ids (together with the information about the size of the sequence of ids).
>> + * {[btrfs_uuid_item type0 {id0, id1, ..., idN}],
>> + *  ...,
>> + *  [btrfs_uuid_item typeZ {id0, id1, ..., idN}]}
>> + *
>> + * It is forbidden to put multiple items with the same type under the same key.
>> + * Instead the sequence of ids is extended and used to store any additional
>> + * ids for the same item type.
> 
> This constraint, and the cost of ensuring it and repairing violations,
> would go away.
> 
>> +static struct btrfs_uuid_item *btrfs_match_uuid_item_type(
>> +		struct btrfs_path *path, u16 type)
>> +{
>> +	struct extent_buffer *eb;
>> +	int slot;
>> +	struct btrfs_uuid_item *ptr;
>> +	u32 item_size;
>> +
>> +	eb = path->nodes[0];
>> +	slot = path->slots[0];
>> +	ptr = btrfs_item_ptr(eb, slot, struct btrfs_uuid_item);
>> +	item_size = btrfs_item_size_nr(eb, slot);
>> +	do {
>> +		u16 sub_item_type;
>> +		u64 sub_item_len;
>> +
>> +		if (item_size < sizeof(*ptr)) {
>> +			pr_warn("btrfs: uuid item too short (%lu < %d)!\n",
>> +				(unsigned long)item_size, (int)sizeof(*ptr));
>> +			return NULL;
>> +		}
>> +		item_size -= sizeof(*ptr);
>> +		sub_item_type = btrfs_uuid_type(eb, ptr);
>> +		sub_item_len = btrfs_uuid_len(eb, ptr);
>> +		if (sub_item_len * sizeof(u64) > item_size) {
>> +			pr_warn("btrfs: uuid item too short (%llu > %lu)!\n",
>> +				(unsigned long long)(sub_item_len *
>> +						     sizeof(u64)),
>> +				(unsigned long)item_size);
>> +			return NULL;
>> +		}
>> +		if (sub_item_type == type)
>> +			return ptr;
>> +		item_size -= sub_item_len * sizeof(u64);
>> +		ptr = 1 + (struct btrfs_uuid_item *)
>> +			(((char *)ptr) + (sub_item_len * sizeof(u64)));
>> +	} while (item_size);
>>
>> +static int btrfs_uuid_tree_lookup_prepare(struct btrfs_root *uuid_root,
>> +					  u8 *uuid, u16 type,
>> +					  struct btrfs_path *path,
>> +					  struct btrfs_uuid_item **ptr)
>> +{
>> +	int ret;
>> +	struct btrfs_key key;
>> +
>> +	if (!uuid_root) {
>> +		WARN_ON_ONCE(1);
>> +		ret = -ENOENT;
>> +		goto out;
>> +	}
>> +
>> +	btrfs_uuid_to_key(uuid, &key);
>> +
>> +	ret = btrfs_search_slot(NULL, uuid_root, &key, path, 0, 0);
>> +	if (ret < 0)
>> +		goto out;
>> +	if (ret > 0) {
>> +		ret = -ENOENT;
>> +		goto out;
>> +	}
>> +
>> +	*ptr = btrfs_match_uuid_item_type(path, type);
>> +	if (!*ptr) {
>> +		ret = -ENOENT;
>> +		goto out;
>> +	}
>> +
>> +	ret = 0;
>> +
>> +out:
>> +	return ret;
>> +}
> 
> All of this is replaced with the simple search_slot in the caller.
> 
>> +	offset = (unsigned long)ptr;
>> +	while (sub_item_len > 0) {
>> +		u64 data;
>> +
>> +		read_extent_buffer(eb, &data, offset, sizeof(data));
>> +		data = le64_to_cpu(data);
>> +		if (data == subid) {
>> +			ret = 0;
>> +			break;
>> +		}
>> +		offset += sizeof(data);
>> +		sub_item_len--;
>> +	}
> 
> This could be cleaned up a bit by comparing an on-stack little-endian
> input subid with each little-endian subid in the item with
> memcmp_extent_buffer().

This would save some CPU cycles for the repeated le64_to_cpu() and for
the memcpy(). The number of lines of code is equal either way. The
read_extent_buffer() way is more straightforward and readable, IMO.


>> +	ret = btrfs_insert_empty_item(trans, uuid_root, path, &key,
>> +				      sizeof(*ptr) + sizeof(subid));
>> +	if (ret == -EEXIST) {
>> +		ptr = btrfs_match_uuid_item_type(path, type);
>> +		if (ptr) {
> [...]
>> +			btrfs_extend_item(uuid_root, path, sizeof(subid));
> 
> How does this extension avoid the BUG when the leaf containing the item
> doesn't have room for the new subid?

btrfs_extend_item() does not BUG() when btrfs_search_slot() has been
called beforehand with ins_len > 0.

btrfs_search_slot() calls split_leaf(), and split_leaf() checks whether
the requested additional data will fit and returns -EOVERFLOW if it
does not.

The call chain is:
btrfs_insert_empty_item() -> btrfs_search_slot() -> split_leaf()

Thanks for your review comments.

Zach Brown June 21, 2013, 4:11 p.m. UTC | #3
> > Why do we need this btrfs_uuid_item structure?  Why not set the key type
> > to either _SUBVOL or _RECEIVED_SUBVOL instead of embedding structs with
> > those types under items with the constant BTRFS_UUID_KEY.  Then use the
> > item size to determine the number of u64 subids.  Then the item has a
> > simple array of u64s in the data which will be a lot easier to work
> > with.
> 
> Maintaining a 16 bit uuid_type field inside the item is done in order to
> avoid wasting type values for the key. The type field in the key has a
> width of 8 bits, so far 34 type values are defined in ctree.h.
> 
> As long as such items are in separated trees, their type value could be
> reused in the future when the 8 bit type field is exhausted.
> 
> There was some discussion in #btrfs about this topic on 2013-04-26.
> There are pros and cons. The uuid_type field in the item allows to use
> the uuid items generically outside of the uuid tree since it occupies
> only one type value in the 8 bit type field. On the other hand, if the 8
> bit type field in the key is exhausted, it saves lines of codes and
> avoids complexity to implement a common way to expand this field instead
> of implementing multiplexing layers at multiple places, although maybe
> with some added performance penalty.
> 
> Anybody else with an opinion for this topic?

Perhaps obviously, I'm strongly in favour of not rolling another layer
of indirection inside the items, especially when the code to implement
it is full of fragile pointer math.

> >> +	offset = (unsigned long)ptr;
> >> +	while (sub_item_len > 0) {
> >> +		u64 data;
> >> +
> >> +		read_extent_buffer(eb, &data, offset, sizeof(data));
> >> +		data = le64_to_cpu(data);
> >> +		if (data == subid) {
> >> +			ret = 0;
> >> +			break;
> >> +		}
> >> +		offset += sizeof(data);
> >> +		sub_item_len--;
> >> +	}
> > 
> > This could be cleaned up a bit by comparing an on-stack little-endian
> > input subid with each little-endian subid in the item with
> > memcmp_extent_buffer().
> 
> This would save some CPU cycles for the repeated le64_to_cpu() and for
> the memcpy(). The number of lines of code is equal for both ways.

Hmm?  It would be many fewer lines of code.

> read_extent_buffer() way is more straight forward and readable IMO.

> > How does this extension avoid the BUG when the leaf containing the item
> > doesn't have room for the new subid?
> 
> btrfs_extend_item() does not BUG() when you have called
> btrfs_search_slot() before with ins_len > 0.

Ah, got it.  Thanks.

- z
Chris Mason June 21, 2013, 4:20 p.m. UTC | #4
Quoting Stefan Behrens (2013-06-21 04:47:20)
> On Thu, 20 Jun 2013 12:47:04 -0700, Zach Brown wrote:
> > 
> >> +/* for items that use the BTRFS_UUID_KEY */
> >> +#define BTRFS_UUID_ITEM_TYPE_SUBVOL 0 /* for UUIDs assigned to subvols */
> >> +#define BTRFS_UUID_ITEM_TYPE_RECEIVED_SUBVOL        1 /* for UUIDs assigned to
> >> +                                               * received subvols */
> >> +
> >> +/* a sequence of such items is stored under the BTRFS_UUID_KEY */
> >> +struct btrfs_uuid_item {
> >> +    __le16 type;    /* refer to BTRFS_UUID_ITEM_TYPE* defines above */
> >> +    __le32 len;     /* number of following 64bit values */
> >> +    __le64 subid[0];        /* sequence of subids */
> >> +} __attribute__ ((__packed__));
> >> +
> > 
> > [...]
> > 
> >>  /*
> >> + * Stores items that allow to quickly map UUIDs to something else.
> >> + * These items are part of the filesystem UUID tree.
> >> + * The key is built like this:
> >> + * (UUID_upper_64_bits, BTRFS_UUID_KEY, UUID_lower_64_bits).
> >> + */
> >> +#if BTRFS_UUID_SIZE != 16
> >> +#error "UUID items require BTRFS_UUID_SIZE == 16!"
> >> +#endif
> >> +#define BTRFS_UUID_KEY      251
> > 
> > Why do we need this btrfs_uuid_item structure?  Why not set the key type
> > to either _SUBVOL or _RECEIVED_SUBVOL instead of embedding structs with
> > those types under items with the constant BTRFS_UUID_KEY.  Then use the
> > item size to determine the number of u64 subids.  Then the item has a
> > simple array of u64s in the data which will be a lot easier to work
> > with.
> 
> Maintaining a 16 bit uuid_type field inside the item is done in order to
> avoid wasting type values for the key. The type field in the key has a
> width of 8 bits, so far 34 type values are defined in ctree.h.
> 
> As long as such items are in separated trees, their type value could be
> reused in the future when the 8 bit type field is exhausted.
> 
> There was some discussion in #btrfs about this topic on 2013-04-26.
> There are pros and cons. The uuid_type field in the item allows to use
> the uuid items generically outside of the uuid tree since it occupies
> only one type value in the 8 bit type field. On the other hand, if the 8
> bit type field in the key is exhausted, it saves lines of codes and
> avoids complexity to implement a common way to expand this field instead
> of implementing multiplexing layers at multiple places, although maybe
> with some added performance penalty.
> 
> Anybody else with an opinion for this topic?

I've been thinking about this, and while I clearly did not have this use
of uuids in mind originally, they have become a key part of the FS.  They
deserve their own bits in the type field.

I know this is not an easy thing to redo, but long term it really looks
worth it to me.

-chris
Stefan Behrens June 24, 2013, 4:42 p.m. UTC | #5
On Fri, 21 Jun 2013 09:11:38 -0700, Zach Brown wrote:
>>>> +	offset = (unsigned long)ptr;
>>>> +	while (sub_item_len > 0) {
>>>> +		u64 data;
>>>> +
>>>> +		read_extent_buffer(eb, &data, offset, sizeof(data));
>>>> +		data = le64_to_cpu(data);
>>>> +		if (data == subid) {
>>>> +			ret = 0;
>>>> +			break;
>>>> +		}
>>>> +		offset += sizeof(data);
>>>> +		sub_item_len--;
>>>> +	}
>>>
>>> This could be cleaned up a bit by comparing an on-stack little-endian
>>> input subid with each little-endian subid in the item with
>>> memcmp_extent_buffer().
>>
>> This would save some CPU cycles for the repeated le64_to_cpu() and for
>> the memcpy(). The number of lines of code is equal for both ways.
> 
> Hmm?  It would be many fewer lines of code.

Are you thinking of something shorter than the following?

offset = (unsigned long)ptr;
subid = cpu_to_le64(subid);
while (sub_item_len > 0) {
	if (memcmp_extent_buffer(eb, &subid, offset,
				 sizeof(subid)) == 0) {
		ret = 0;
		break;
	}
	offset += sizeof(subid);
	sub_item_len--;
}

Zach Brown June 24, 2013, 6:10 p.m. UTC | #6
> >> This would save some CPU cycles for the repeated le64_to_cpu() and for
> >> the memcpy(). The number of lines of code is equal for both ways.
> > 
> > Hmm?  It would be many fewer lines of code.
> 
> Are you thinking of something shorter than the following?

That's better, yeah, but there's still some room left to go.  Almost all
of the remaining code is boiler-plate iteration that could be done with
a for(;;).

This could be a mini OCC contest! :)

> offset = (unsigned long)ptr;
> subid = cpu_to_le64(subid);

> while (sub_item_len > 0) {
> 	if (memcmp_extent_buffer(eb, &subid, offset,
> 				 sizeof(subid)) == 0) {
> 		ret = 0;
> 		break;
> 	}
> 	offset += sizeof(subid);
> 	sub_item_len--;
> }

roughly:

unsigned long end = offset + sub_item_len * sizeof(subid);

for (ret = 1; offset < end && ret; offset += sizeof(subid))
	ret = !!memcmp_extent_buffer(eb, &subid, offset,
				     sizeof(subid));

- z

Patch

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 3932224..a550dfc 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -8,7 +8,8 @@  btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
 	   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
 	   export.o tree-log.o free-space-cache.o zlib.o lzo.o \
 	   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
-	   reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o
+	   reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
+	   uuid-tree.o
 
 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
 btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 76e4983..04447b6 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -91,6 +91,9 @@  struct btrfs_ordered_sum;
 /* holds quota configuration and tracking */
 #define BTRFS_QUOTA_TREE_OBJECTID 8ULL
 
+/* for storing items that use the BTRFS_UUID_KEY */
+#define BTRFS_UUID_TREE_OBJECTID 9ULL
+
 /* for storing balance parameters in the root tree */
 #define BTRFS_BALANCE_OBJECTID -4ULL
 
@@ -953,6 +956,18 @@  struct btrfs_dev_replace_item {
 	__le64 num_uncorrectable_read_errors;
 } __attribute__ ((__packed__));
 
+/* for items that use the BTRFS_UUID_KEY */
+#define BTRFS_UUID_ITEM_TYPE_SUBVOL	0 /* for UUIDs assigned to subvols */
+#define BTRFS_UUID_ITEM_TYPE_RECEIVED_SUBVOL	1 /* for UUIDs assigned to
+						   * received subvols */
+
+/* a sequence of such items is stored under the BTRFS_UUID_KEY */
+struct btrfs_uuid_item {
+	__le16 type;	/* refer to BTRFS_UUID_ITEM_TYPE* defines above */
+	__le32 len;	/* number of following 64bit values */
+	__le64 subid[0];	/* sequence of subids */
+} __attribute__ ((__packed__));
+
 /* different types of block groups (and chunks) */
 #define BTRFS_BLOCK_GROUP_DATA		(1ULL << 0)
 #define BTRFS_BLOCK_GROUP_SYSTEM	(1ULL << 1)
@@ -1922,6 +1937,17 @@  struct btrfs_ioctl_defrag_range_args {
 #define BTRFS_DEV_REPLACE_KEY	250
 
 /*
+ * Stores items that allow to quickly map UUIDs to something else.
+ * These items are part of the filesystem UUID tree.
+ * The key is built like this:
+ * (UUID_upper_64_bits, BTRFS_UUID_KEY, UUID_lower_64_bits).
+ */
+#if BTRFS_UUID_SIZE != 16
+#error "UUID items require BTRFS_UUID_SIZE == 16!"
+#endif
+#define BTRFS_UUID_KEY	251
+
+/*
  * string items are for debugging.  They just store a short string of
  * data in the FS
  */
@@ -2979,6 +3005,12 @@  BTRFS_SETGET_FUNCS(dev_replace_cursor_left, struct btrfs_dev_replace_item,
 BTRFS_SETGET_FUNCS(dev_replace_cursor_right, struct btrfs_dev_replace_item,
 		   cursor_right, 64);
 
+/* btrfs_uuid_item */
+BTRFS_SETGET_FUNCS(uuid_type, struct btrfs_uuid_item, type, 16);
+BTRFS_SETGET_FUNCS(uuid_len, struct btrfs_uuid_item, len, 32);
+BTRFS_SETGET_STACK_FUNCS(stack_uuid_type, struct btrfs_uuid_item, type, 16);
+BTRFS_SETGET_STACK_FUNCS(stack_uuid_len, struct btrfs_uuid_item, len, 32);
+
 BTRFS_SETGET_STACK_FUNCS(stack_dev_replace_src_devid,
 			 struct btrfs_dev_replace_item, src_devid, 64);
 BTRFS_SETGET_STACK_FUNCS(stack_dev_replace_cont_reading_from_srcdev_mode,
@@ -3414,6 +3446,24 @@  void btrfs_check_and_init_root_item(struct btrfs_root_item *item);
 void btrfs_update_root_times(struct btrfs_trans_handle *trans,
 			     struct btrfs_root *root);
 
+/* uuid-tree.c */
+int btrfs_lookup_uuid_subvol_item(struct btrfs_root *uuid_root, u8 *uuid,
+				  u64 *subvol_id);
+int btrfs_insert_uuid_subvol_item(struct btrfs_trans_handle *trans,
+				  struct btrfs_root *uuid_root, u8 *uuid,
+				  u64 subvol_id);
+int btrfs_del_uuid_subvol_item(struct btrfs_trans_handle *trans,
+			       struct btrfs_root *uuid_root, u8 *uuid,
+			       u64 subvol_id);
+int btrfs_lookup_uuid_received_subvol_item(struct btrfs_root *uuid_root,
+					   u8 *uuid, u64 *subvol_id);
+int btrfs_insert_uuid_received_subvol_item(struct btrfs_trans_handle *trans,
+					   struct btrfs_root *uuid_root,
+					   u8 *uuid, u64 subvol_id);
+int btrfs_del_uuid_received_subvol_item(struct btrfs_trans_handle *trans,
+					struct btrfs_root *uuid_root, u8 *uuid,
+					u64 subvol_id);
+
 /* dir-item.c */
 int btrfs_check_dir_item_collision(struct btrfs_root *root, u64 dir,
 			  const char *name, int name_len);
diff --git a/fs/btrfs/uuid-tree.c b/fs/btrfs/uuid-tree.c
new file mode 100644
index 0000000..3939a54
--- /dev/null
+++ b/fs/btrfs/uuid-tree.c
@@ -0,0 +1,480 @@ 
+/*
+ * Copyright (C) STRATO AG 2013.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+#include <linux/uuid.h>
+#include <asm/unaligned.h>
+#include "ctree.h"
+#include "transaction.h"
+#include "disk-io.h"
+#include "print-tree.h"
+
+
+/*
+ * One key is used to store a sequence of btrfs_uuid_item items.
+ * Each item in the sequence contains a type information and a sequence of
+ * ids (together with the information about the size of the sequence of ids).
+ * {[btrfs_uuid_item type0 {id0, id1, ..., idN}],
+ *  ...,
+ *  [btrfs_uuid_item typeZ {id0, id1, ..., idN}]}
+ *
+ * It is forbidden to put multiple items with the same type under the same key.
+ * Instead the sequence of ids is extended and used to store any additional
+ * ids for the same item type.
+ */
+
+static void btrfs_uuid_to_key(u8 *uuid, struct btrfs_key *key)
+{
+	key->type = BTRFS_UUID_KEY;
+	key->objectid = get_unaligned_le64(uuid);
+	key->offset = get_unaligned_le64(uuid + sizeof(u64));
+}
+
+static struct btrfs_uuid_item *btrfs_match_uuid_item_type(
+		struct btrfs_path *path, u16 type)
+{
+	struct extent_buffer *eb;
+	int slot;
+	struct btrfs_uuid_item *ptr;
+	u32 item_size;
+
+	eb = path->nodes[0];
+	slot = path->slots[0];
+	ptr = btrfs_item_ptr(eb, slot, struct btrfs_uuid_item);
+	item_size = btrfs_item_size_nr(eb, slot);
+	do {
+		u16 sub_item_type;
+		u64 sub_item_len;
+
+		if (item_size < sizeof(*ptr)) {
+			pr_warn("btrfs: uuid item too short (%lu < %d)!\n",
+				(unsigned long)item_size, (int)sizeof(*ptr));
+			return NULL;
+		}
+		item_size -= sizeof(*ptr);
+		sub_item_type = btrfs_uuid_type(eb, ptr);
+		sub_item_len = btrfs_uuid_len(eb, ptr);
+		if (sub_item_len * sizeof(u64) > item_size) {
+			pr_warn("btrfs: uuid item too short (%llu > %lu)!\n",
+				(unsigned long long)(sub_item_len *
+						     sizeof(u64)),
+				(unsigned long)item_size);
+			return NULL;
+		}
+		if (sub_item_type == type)
+			return ptr;
+		item_size -= sub_item_len * sizeof(u64);
+		ptr = 1 + (struct btrfs_uuid_item *)
+			(((char *)ptr) + (sub_item_len * sizeof(u64)));
+	} while (item_size);
+
+	return NULL;
+}
+
+static int btrfs_uuid_tree_lookup_prepare(struct btrfs_root *uuid_root,
+					  u8 *uuid, u16 type,
+					  struct btrfs_path *path,
+					  struct btrfs_uuid_item **ptr)
+{
+	int ret;
+	struct btrfs_key key;
+
+	if (!uuid_root) {
+		WARN_ON_ONCE(1);
+		ret = -ENOENT;
+		goto out;
+	}
+
+	btrfs_uuid_to_key(uuid, &key);
+
+	ret = btrfs_search_slot(NULL, uuid_root, &key, path, 0, 0);
+	if (ret < 0)
+		goto out;
+	if (ret > 0) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	*ptr = btrfs_match_uuid_item_type(path, type);
+	if (!*ptr) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	ret = 0;
+
+out:
+	return ret;
+}
+
+/* return -ENOENT for !found, < 0 for errors, or 0 if an item was found */
+static int btrfs_uuid_tree_lookup(struct btrfs_root *uuid_root, u8 *uuid,
+				  u16 type, u64 subid)
+{
+	int ret;
+	struct btrfs_path *path;
+	struct extent_buffer *eb;
+	struct btrfs_uuid_item *ptr;
+	u32 sub_item_len;
+	unsigned long offset;
+
+	path = btrfs_alloc_path();
+	if (!path) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ret = btrfs_uuid_tree_lookup_prepare(uuid_root, uuid, type, path, &ptr);
+	if (ret < 0)
+		goto out;
+
+	eb = path->nodes[0];
+	sub_item_len = btrfs_uuid_len(eb, ptr);
+	WARN_ON_ONCE(sub_item_len == 0);
+	ret = -ENOENT;
+	ptr++;
+	offset = (unsigned long)ptr;
+	while (sub_item_len > 0) {
+		__le64 data_le;
+
+		read_extent_buffer(eb, &data_le, offset, sizeof(data_le));
+		if (le64_to_cpu(data_le) == subid) {
+			ret = 0;
+			break;
+		}
+		offset += sizeof(data_le);
+		sub_item_len--;
+	}
+
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
+/*
+ * return 0 and set *subid to the first stored id for this uuid/type,
+ * -ENOENT if no such item exists, or < 0 on other errors
+ */
+static int btrfs_uuid_tree_lookup_any(struct btrfs_root *uuid_root, u8 *uuid,
+				      u16 type, u64 *subid)
+{
+	int ret;
+	struct btrfs_path *path;
+	struct extent_buffer *eb;
+	struct btrfs_uuid_item *ptr;
+	u32 sub_item_len;
+
+	path = btrfs_alloc_path();
+	if (!path) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ret = btrfs_uuid_tree_lookup_prepare(uuid_root, uuid, type, path, &ptr);
+	if (ret < 0)
+		goto out;
+
+	eb = path->nodes[0];
+	sub_item_len = btrfs_uuid_len(eb, ptr);
+	WARN_ON_ONCE(sub_item_len == 0);
+	if (sub_item_len > 0) {
+		__le64 subid_le;
+
+		/* return the first stored id */
+		read_extent_buffer(eb, &subid_le, (unsigned long)(ptr + 1),
+				   sizeof(subid_le));
+		*subid = le64_to_cpu(subid_le);
+		ret = 0;
+	} else {
+		ret = -ENOENT;
+	}
+
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
+/*
+ * The caller must make sure that the entry to add does not already
+ * exist, this function does not check for duplicates.
+ */
+static int btrfs_uuid_tree_add(struct btrfs_trans_handle *trans,
+			       struct btrfs_root *uuid_root, u8 *uuid,
+			       u16 type, u64 subid)
+{
+	int ret;
+	struct btrfs_path *path = NULL;
+	struct btrfs_key key;
+	struct extent_buffer *eb;
+	int slot;
+	struct btrfs_uuid_item *ptr;
+	u32 item_size;
+
+	if (!uuid_root) {
+		WARN_ON_ONCE(1);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	btrfs_uuid_to_key(uuid, &key);
+
+	path = btrfs_alloc_path();
+	if (!path) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ret = btrfs_insert_empty_item(trans, uuid_root, path, &key,
+				      sizeof(*ptr) + sizeof(subid));
+	if (ret == -EEXIST) {
+		ptr = btrfs_match_uuid_item_type(path, type);
+		if (ptr) {
+			/*
+			 * An item with that type already exists.
+			 * Extend the item and store the subid at the
+			 * location of the first id of this type.
+			 */
+			unsigned long start;
+			unsigned long move_dst;
+			unsigned long move_src;
+			unsigned long move_len;
+
+			btrfs_extend_item(uuid_root, path, sizeof(subid));
+			ptr = (struct btrfs_uuid_item *)
+				(((char *)ptr) - sizeof(subid));
+			eb = path->nodes[0];
+			slot = path->slots[0];
+			item_size = btrfs_item_size_nr(eb, slot);
+			WARN_ON(btrfs_uuid_len(eb, ptr) == 0);
+			btrfs_set_uuid_len(eb, ptr,
+					   btrfs_uuid_len(eb, ptr) + 1);
+			ptr++;
+			start = btrfs_item_ptr_offset(eb, slot);
+			move_dst = ((unsigned long)ptr) + sizeof(subid);
+			move_src = (unsigned long)ptr;
+			move_len = item_size - (move_dst - start);
+			memmove_extent_buffer(eb, move_dst, move_src, move_len);
+
+			goto write_subid;
+		} else {
+			btrfs_extend_item(uuid_root, path,
+					  sizeof(*ptr) + sizeof(subid));
+		}
+	} else if (ret < 0) {
+		pr_warn("btrfs: insert uuid item failed %d (0x%016llx, 0x%016llx) type %lu!\n",
+			ret, (unsigned long long)key.objectid,
+			(unsigned long long)key.offset,
+			(unsigned long)type);
+		goto out;
+	}
+
+	/*
+	 * This type occurs for the first time in this item. Either append
+	 * it after the existing sub-items of other types, or write the
+	 * very first sub-item.
+	 */
+	eb = path->nodes[0];
+	slot = path->slots[0];
+	ptr = btrfs_item_ptr(eb, slot, struct btrfs_uuid_item);
+	item_size = btrfs_item_size_nr(eb, slot);
+	ptr = (struct btrfs_uuid_item *)
+		(((char *)ptr) + item_size - (sizeof(*ptr) + sizeof(subid)));
+	btrfs_set_uuid_type(eb, ptr, type);
+	btrfs_set_uuid_len(eb, ptr, 1);
+	ptr++;
+
+write_subid:
+	ret = 0;
+	{
+		__le64 subid_le = cpu_to_le64(subid);
+
+		write_extent_buffer(eb, &subid_le, (unsigned long)ptr,
+				    sizeof(subid_le));
+	}
+	btrfs_mark_buffer_dirty(eb);
+
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
+static int btrfs_uuid_tree_rem(struct btrfs_trans_handle *trans,
+			       struct btrfs_root *uuid_root, u8 *uuid,
+			       u16 type, u64 subid)
+{
+	int ret;
+	struct btrfs_path *path = NULL;
+	struct btrfs_key key;
+	struct extent_buffer *eb;
+	int slot;
+	struct btrfs_uuid_item *ptr_start;
+	unsigned long offset;
+	u32 item_size;
+	u32 id_index;
+	unsigned long start;
+	unsigned long move_dst;
+	unsigned long move_src;
+	unsigned long move_len;
+
+	if (!uuid_root) {
+		WARN_ON_ONCE(1);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	btrfs_uuid_to_key(uuid, &key);
+
+	path = btrfs_alloc_path();
+	if (!path) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ret = btrfs_search_slot(trans, uuid_root, &key, path, -1, 1);
+	if (ret < 0) {
+		pr_warn("btrfs: error %d while searching for uuid item!\n",
+			ret);
+		goto out;
+	}
+	if (ret > 0) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	ptr_start = btrfs_match_uuid_item_type(path, type);
+	if (!ptr_start) {
+		/* case 1: no entry for that type is found */
+		ret = -ENOENT;
+		goto out;
+	}
+
+	eb = path->nodes[0];
+	slot = path->slots[0];
+	start = btrfs_item_ptr_offset(eb, slot);
+	item_size = btrfs_item_size_nr(eb, slot);
+	id_index = btrfs_uuid_len(eb, ptr_start);
+	offset = (unsigned long)(ptr_start + 1);
+	while (id_index > 0) {
+		__le64 subid_le;
+
+		read_extent_buffer(eb, &subid_le, offset, sizeof(subid_le));
+		if (le64_to_cpu(subid_le) == subid)
+			break;
+		offset += sizeof(subid_le);
+		id_index--;
+	}
+
+	if (!id_index) {
+		/*
+		 * case 2: an entry for that type is found, but no match
+		 * for the id
+		 */
+		ret = -ENOENT;
+		goto out;
+	}
+
+	if (btrfs_uuid_len(eb, ptr_start) == 1) {
+		if (item_size == sizeof(*ptr_start) + sizeof(subid)) {
+			/*
+			 * case 3: no other types and ids are stored,
+			 * delete the full item
+			 */
+			ret = btrfs_del_item(trans, uuid_root, path);
+			goto out;
+		} else {
+			/*
+			 * case 4: No other ids for the particular type
+			 * are stored, but other types exist. Shrink the
+			 * item by the size of an id plus the size of the
+			 * item for that type.
+			 */
+			move_dst = (unsigned long)ptr_start;
+		}
+	} else {
+		/*
+		 * case 5: Other ids for that particular type are stored.
+		 * Shrink the item by the size of an id.
+		 */
+		move_dst = offset;
+		WARN_ON(btrfs_uuid_len(eb, ptr_start) == 0);
+		btrfs_set_uuid_len(eb, ptr_start,
+				   btrfs_uuid_len(eb, ptr_start) - 1);
+	}
+	move_src = offset + sizeof(subid);
+	move_len = item_size - (move_src - start);
+
+	memmove_extent_buffer(eb, move_dst, move_src, move_len);
+	btrfs_truncate_item(uuid_root, path, item_size - (move_src - move_dst),
+			    1);
+
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
+int btrfs_lookup_uuid_subvol_item(struct btrfs_root *uuid_root, u8 *uuid,
+				  u64 *subvol_id)
+{
+	return btrfs_uuid_tree_lookup_any(uuid_root, uuid,
+					  BTRFS_UUID_ITEM_TYPE_SUBVOL,
+					  subvol_id);
+}
+
+int btrfs_insert_uuid_subvol_item(struct btrfs_trans_handle *trans,
+				  struct btrfs_root *uuid_root, u8 *uuid,
+				  u64 subvol_id)
+{
+	int ret;
+
+	ret = btrfs_uuid_tree_lookup(uuid_root, uuid,
+				     BTRFS_UUID_ITEM_TYPE_SUBVOL, subvol_id);
+	if (ret == -ENOENT)
+		ret = btrfs_uuid_tree_add(trans, uuid_root, uuid,
+					  BTRFS_UUID_ITEM_TYPE_SUBVOL,
+					  subvol_id);
+	return ret;
+}
+
+int btrfs_del_uuid_subvol_item(struct btrfs_trans_handle *trans,
+			       struct btrfs_root *uuid_root, u8 *uuid,
+			       u64 subvol_id)
+{
+	return btrfs_uuid_tree_rem(trans, uuid_root, uuid,
+				   BTRFS_UUID_ITEM_TYPE_SUBVOL, subvol_id);
+}
+
+int btrfs_lookup_uuid_received_subvol_item(struct btrfs_root *uuid_root,
+					   u8 *uuid, u64 *subvol_id)
+{
+	return btrfs_uuid_tree_lookup_any(uuid_root, uuid,
+					  BTRFS_UUID_ITEM_TYPE_RECEIVED_SUBVOL,
+					  subvol_id);
+}
+
+int btrfs_insert_uuid_received_subvol_item(struct btrfs_trans_handle *trans,
+					   struct btrfs_root *uuid_root,
+					   u8 *uuid, u64 subvol_id)
+{
+	int ret;
+
+	ret = btrfs_uuid_tree_lookup(uuid_root, uuid,
+				     BTRFS_UUID_ITEM_TYPE_RECEIVED_SUBVOL,
+				     subvol_id);
+	if (ret == -ENOENT)
+		ret = btrfs_uuid_tree_add(trans, uuid_root, uuid,
+					  BTRFS_UUID_ITEM_TYPE_RECEIVED_SUBVOL,
+					  subvol_id);
+	return ret;
+}
+
+int btrfs_del_uuid_received_subvol_item(struct btrfs_trans_handle *trans,
+					struct btrfs_root *uuid_root, u8 *uuid,
+					u64 subvol_id)
+{
+	return btrfs_uuid_tree_rem(trans, uuid_root, uuid,
+				   BTRFS_UUID_ITEM_TYPE_RECEIVED_SUBVOL,
+				   subvol_id);
+}