From patchwork Sun Oct 6 21:22:33 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 2993401 Return-Path: X-Original-To: patchwork-linux-btrfs@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 6DA249F1C4 for ; Sun, 6 Oct 2013 21:22:49 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 6AC75201FD for ; Sun, 6 Oct 2013 21:22:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 7930520116 for ; Sun, 6 Oct 2013 21:22:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754753Ab3JFVWo (ORCPT ); Sun, 6 Oct 2013 17:22:44 -0400 Received: from mail-wi0-f179.google.com ([209.85.212.179]:44566 "EHLO mail-wi0-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752380Ab3JFVWn (ORCPT ); Sun, 6 Oct 2013 17:22:43 -0400 Received: by mail-wi0-f179.google.com with SMTP id hm2so3937353wib.12 for ; Sun, 06 Oct 2013 14:22:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=TjZKgyqgRqZCpK9FrUErM/8gee2PZdJPvw1nCucL8Rg=; b=XfAnHQaKCQeSyrEm6pPuBnF0T7uAj7xjOOdR0CscR6duKwcoShYhqzJ6kHLi6Kok9u O/WePJr6RIJylbnNrZhSheSoZ+et9X2Z0KwT1LskrHGLVBIIWjiuYD8GzOubYB6rLzOr Glu1V2oUnrid/hSIV1BlPlI51ALBAVF+jcECCm6Oy3I3WepPKYaMTrarAVy4ldRBSHwv CyKQBoqy2X9VS54WOnpBusFUmlWmjlU69M1FhlYPsenVTi77bn5H9a2RD8EvvWsVxLYS Z0hhDl/ByNp+lmKvZprF/gyafYt9MbcWMkV6TV5Pu9ujFhSjUFh2y5ex475hvdZJ5TZK QkGw== X-Received: by 10.194.93.105 with SMTP id ct9mr21600022wjb.6.1381094562243; Sun, 06 Oct 2013 14:22:42 -0700 (PDT) Received: from storm-desktop.lan (bl13-154-188.dsl.telepac.pt. [85.246.154.188]) by mx.google.com with ESMTPSA id fb9sm32921738wid.7.1969.12.31.16.00.00 (version=TLSv1.1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Sun, 06 Oct 2013 14:22:41 -0700 (PDT) From: Filipe David Borba Manana To: linux-btrfs@vger.kernel.org Cc: Filipe David Borba Manana Subject: [PATCH v2] Btrfs: improve inode hash function/inode lookup Date: Sun, 6 Oct 2013 22:22:33 +0100 Message-Id: <1381094553-2190-1-git-send-email-fdmanana@gmail.com> X-Mailer: git-send-email 1.7.9.5 In-Reply-To: <1381092807-21422-1-git-send-email-fdmanana@gmail.com> References: <1381092807-21422-1-git-send-email-fdmanana@gmail.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Spam-Status: No, score=-6.8 required=5.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED, FREEMAIL_FROM, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, T_DKIM_INVALID, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Currently the hash value used for adding an inode to the VFS's inode hash table consists of the plain inode number, which is a 64 bits integer. This results in hash table buckets (hlist_head lists) with too many elements for at least 2 important scenarios: 1) When we have many subvolumes. Each subvolume has its own btree where its files and directories are added to, and each has its own objectid (inode number) namespace. This means that if we have N subvolumes, and all have inode number X associated to a file or directory, the corresponding inodes all map to the same hash table entry, resulting in a bucket (hlist_head list) with N elements; 2) On 32 bits machines. Th VFS hash values are unsigned longs, which are 32 bits wide on 32 bits machines, and the inode (objectid) numbers are 64 bits unsigned integers. We simply cast the inode numbers to hash values, which means that for all inodes with the same 32 bits lower half, the same hash bucket is used for all of them. For example, all inodes with a number (objectid) between 0x0000_0000_ffff_ffff and 0xffff_ffff_ffff_ffff will end up in the same hash table bucket. This change ensures the inode's hash value depends both on the objectid (inode number) and its subvolume's (btree root) objectid. For 32 bits machines, this change gives better entropy by making the hash value depend on both the upper and lower 32 bits of the 64 bits hash previously computed. Signed-off-by: Filipe David Borba Manana --- V2: Renamed function inode_hash() to btrfs_inode_hash(). fs/btrfs/btrfs_inode.h | 20 ++++++++++++++++++++ fs/btrfs/disk-io.c | 2 +- fs/btrfs/inode.c | 6 ++++-- 3 files changed, 25 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index 71f074e..ac0b39d 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -19,6 +19,7 @@ #ifndef __BTRFS_I__ #define __BTRFS_I__ +#include #include "extent_map.h" #include "extent_io.h" #include "ordered-data.h" @@ -179,6 +180,25 @@ static inline struct btrfs_inode *BTRFS_I(struct inode *inode) return container_of(inode, struct btrfs_inode, vfs_inode); } +static inline unsigned long btrfs_inode_hash(u64 objectid, + const struct btrfs_root *root) +{ + u64 h = objectid ^ (root->objectid * GOLDEN_RATIO_PRIME); + +#if BITS_PER_LONG == 32 + h = (h >> 32) ^ (h & 0xffffffff); +#endif + + return (unsigned long)h; +} + +static inline void btrfs_insert_inode_hash(struct inode *inode) +{ + unsigned long h = btrfs_inode_hash(inode->i_ino, BTRFS_I(inode)->root); + + __insert_inode_hash(inode, h); +} + static inline u64 btrfs_ino(struct inode *inode) { u64 ino = BTRFS_I(inode)->location.objectid; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index ade6c0e..d205bdd 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2294,7 +2294,7 @@ int open_ctree(struct super_block *sb, sizeof(struct btrfs_key)); set_bit(BTRFS_INODE_DUMMY, &BTRFS_I(fs_info->btree_inode)->runtime_flags); - insert_inode_hash(fs_info->btree_inode); + btrfs_insert_inode_hash(fs_info->btree_inode); spin_lock_init(&fs_info->block_group_cache_lock); fs_info->block_group_cache_tree = RB_ROOT; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 13b470c..cc3de9d 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4837,10 +4837,12 @@ static struct inode *btrfs_iget_locked(struct super_block *s, { struct inode *inode; struct btrfs_iget_args args; + unsigned long hashval = btrfs_inode_hash(objectid, root); + args.ino = objectid; args.root = root; - inode = iget5_locked(s, objectid, btrfs_find_actor, + inode = iget5_locked(s, hashval, btrfs_find_actor, btrfs_init_locked_inode, (void *)&args); return inode; @@ -5460,7 +5462,7 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans, BTRFS_INODE_NODATASUM; } - insert_inode_hash(inode); + btrfs_insert_inode_hash(inode); inode_tree_add(inode); trace_btrfs_inode_new(inode);