From patchwork Mon Jul 22 01:23:30 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11051291 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 63F5D13AC for ; Mon, 22 Jul 2019 01:24:06 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4D60926E4A for ; Mon, 22 Jul 2019 01:24:06 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 413E4284A3; Mon, 22 Jul 2019 01:24:06 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 009C126E4A for ; Mon, 22 Jul 2019 01:24:05 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CB35621FD28; Sun, 21 Jul 2019 18:24:03 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C2E6A21CB47 for ; Sun, 21 Jul 2019 18:23:57 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C5E2D26F; Sun, 21 Jul 2019 21:23:55 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B90C2BD; Sun, 21 Jul 2019 21:23:55 -0400 (EDT) From: 
James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown , Shaun Tancheff , Li Dongyang , Artem Blagodarenko , Yang Sheng Date: Sun, 21 Jul 2019 21:23:30 -0400 Message-Id: <1563758631-29550-2-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1563758631-29550-1-git-send-email-jsimmons@infradead.org> References: <1563758631-29550-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 01/22] ext4: add i_fs_version X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: James Simmons , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: James Simmons Add inode version field that osd-ldiskfs uses. Signed-off-by: James Simmons --- fs/ext4/ext4.h | 2 ++ fs/ext4/ialloc.c | 1 + fs/ext4/inode.c | 4 ++-- 3 files changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 1cb6785..8abbcab 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1063,6 +1063,8 @@ struct ext4_inode_info { struct dquot *i_dquot[MAXQUOTAS]; #endif + u64 i_fs_version; + /* Precomputed uuid+inum+igen checksum for seeding inode checksums */ __u32 i_csum_seed; diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 764ff4c..09ae4a4 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -1100,6 +1100,7 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir, ei->i_dtime = 0; ei->i_block_group = group; ei->i_last_alloc_group = ~0; + ei->i_fs_version = 0; ext4_set_inode_flags(inode); if (IS_DIRSYNC(inode)) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index c7f77c6..6e66175 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4806,14 +4806,14 @@ static inline void ext4_inode_set_iversion_queried(struct inode *inode, u64 val) if 
(unlikely(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL)) inode_set_iversion_raw(inode, val); else - inode_set_iversion_queried(inode, val); + EXT4_I(inode)->i_fs_version = val; } static inline u64 ext4_inode_peek_iversion(const struct inode *inode) { if (unlikely(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL)) return inode_peek_iversion_raw(inode); else - return inode_peek_iversion(inode); + return EXT4_I(inode)->i_fs_version; } struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,

From patchwork Mon Jul 22 01:23:31 2019
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11051295
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown, Shaun Tancheff, Li Dongyang, Artem Blagodarenko, Yang Sheng
Cc: James Simmons, Lustre Development List
Date: Sun, 21 Jul 2019 21:23:31 -0400
Message-Id: <1563758631-29550-3-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1563758631-29550-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 02/22] ext4: use d_find_alias() in ext4_lookup

Keep ".." out of the dcache in ext4_lookup() by reusing an existing alias via d_find_any_alias(); otherwise the parent dentry can end up as a child of its own child. This works around a very old bug (Lustre bug 10458) that may no longer be reproducible.

Signed-off-by: James Simmons --- fs/ext4/namei.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index cd01c4a..a616f58 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -1664,6 +1664,35 @@ static struct dentry *ext4_lookup(struct inode *dir, struct dentry *dentry, unsi } } + /* ".." shouldn't go into dcache to preserve dcache hierarchy + * otherwise we'll get parent being a child of actual child.
+ * see bug 10458 for details -bzzz + */ + if (inode && (dentry->d_name.name[0] == '.' && + (dentry->d_name.len == 1 || (dentry->d_name.len == 2 && + dentry->d_name.name[1] == '.')))) { + struct dentry *goal = NULL; + + /* first, look for an existing dentry - any one is good */ + goal = d_find_any_alias(inode); + if (!goal) { + spin_lock(&dentry->d_lock); + /* there is no alias, we need to make current dentry: + * a) inaccessible for __d_lookup() + * b) inaccessible for iopen + */ + J_ASSERT(hlist_unhashed(&dentry->d_u.d_alias)); + dentry->d_flags |= DCACHE_NFSFS_RENAMED; + /* this is d_instantiate() ... */ + hlist_add_head(&dentry->d_u.d_alias, &inode->i_dentry); + dentry->d_inode = inode; + spin_unlock(&dentry->d_lock); + } + if (goal) + iput(inode); + return goal; + } + #ifdef CONFIG_UNICODE if (!inode && IS_CASEFOLDED(dir)) { /* Eventually we want to call d_add_ci(dentry, NULL)

From patchwork Mon Jul 22 01:23:32 2019
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11051299
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown, Shaun Tancheff, Li Dongyang, Artem Blagodarenko, Yang Sheng
Cc: Lustre Development List
Date: Sun, 21 Jul 2019 21:23:32 -0400
Message-Id: <1563758631-29550-4-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1563758631-29550-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 03/22] ext4: prealloc table optimization

Optimize preallocation: replace the hardcoded file-size heuristics in mballoc with a tunable preallocation table exposed as a prealloc_table proc file, and split the single mb_stream_req threshold into separate mb_small_req and mb_large_req tunables.

Signed-off-by: James Simmons Reviewed-by: Artem Blagodarenko --- fs/ext4/ext4.h | 7 +- fs/ext4/inode.c | 3 + fs/ext4/mballoc.c | 221 +++++++++++++++++++++++++++++++++++++++++------------- fs/ext4/namei.c | 4 +- fs/ext4/sysfs.c | 8 +- 5 files changed, 186 insertions(+), 57 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 8abbcab..423ab4d 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1190,6 +1190,8 @@ struct ext4_inode_info { /* Metadata checksum algorithm codes */ #define EXT4_CRC32C_CHKSUM 1 +#define EXT4_MAX_PREALLOC_TABLE 64 + /* * Structure of the super block */ @@ -1447,12 +1449,14 @@ struct ext4_sb_info { /* tunables */ unsigned long s_stripe; - unsigned int s_mb_stream_request; + unsigned long s_mb_small_req; + unsigned long s_mb_large_req; unsigned int s_mb_max_to_scan; unsigned int s_mb_min_to_scan; unsigned int s_mb_stats; unsigned int s_mb_order2_reqs; unsigned int s_mb_group_prealloc; + unsigned long *s_mb_prealloc_table; unsigned int s_max_dir_size_kb; /* where last allocation was done - for stream allocation */ unsigned long s_mb_last_group; @@ -2457,6 +2461,7 @@ extern int ext4_init_inode_table(struct super_block *sb, extern void ext4_end_bitmap_read(struct buffer_head *bh, int uptodate); /* mballoc.c */ +extern const struct file_operations ext4_seq_prealloc_table_fops; extern const struct seq_operations ext4_mb_seq_groups_ops; extern long ext4_mb_stats; extern long ext4_mb_max_to_scan; diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 6e66175..c37418a 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -2796,6 +2796,9 @@ static int ext4_writepages(struct address_space *mapping, ext4_journal_stop(handle); } + if (wbc->nr_to_write < sbi->s_mb_small_req) + wbc->nr_to_write =
sbi->s_mb_small_req; + if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) range_whole = 1; diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 99ba720..3be3bef 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -2339,6 +2339,101 @@ static void ext4_mb_seq_groups_stop(struct seq_file *seq, void *v) .show = ext4_mb_seq_groups_show, }; +static int ext4_mb_check_and_update_prealloc(struct ext4_sb_info *sbi, + char *str, size_t cnt, + int update) +{ + unsigned long value; + unsigned long prev = 0; + char *cur; + char *next; + char *end; + int num = 0; + + cur = str; + end = str + cnt; + while (cur < end) { + while ((cur < end) && (*cur == ' ')) cur++; + /* Yuck - simple_strtol */ + value = simple_strtol(cur, &next, 0); + if (value == 0) + break; + if (cur == next) + return -EINVAL; + + cur = next; + + if (value > (sbi->s_blocks_per_group - 1 - 1 - sbi->s_itb_per_group)) + return -EINVAL; + + /* they should add values in order */ + if (value <= prev) + return -EINVAL; + + if (update) + sbi->s_mb_prealloc_table[num] = value; + + prev = value; + num++; + } + + if (num > EXT4_MAX_PREALLOC_TABLE - 1) + return -EOVERFLOW; + + if (update) + sbi->s_mb_prealloc_table[num] = 0; + + return 0; +} + +static ssize_t ext4_mb_prealloc_table_proc_write(struct file *file, + const char __user *buf, + size_t cnt, loff_t *pos) +{ + struct ext4_sb_info *sbi = EXT4_SB(PDE_DATA(file_inode(file))); + char str[128]; + int rc; + + if (cnt >= sizeof(str)) + return -EINVAL; + if (copy_from_user(str, buf, cnt)) + return -EFAULT; + + rc = ext4_mb_check_and_update_prealloc(sbi, str, cnt, 0); + if (rc) + return rc; + + rc = ext4_mb_check_and_update_prealloc(sbi, str, cnt, 1); + return rc ? 
rc : cnt; +} + +static int mb_prealloc_table_seq_show(struct seq_file *m, void *v) +{ + struct ext4_sb_info *sbi = EXT4_SB(m->private); + int i; + + for (i = 0; i < EXT4_MAX_PREALLOC_TABLE && + sbi->s_mb_prealloc_table[i] != 0; i++) + seq_printf(m, "%ld ", sbi->s_mb_prealloc_table[i]); + seq_printf(m, "\n"); + + return 0; +} + +static int mb_prealloc_table_seq_open(struct inode *inode, struct file *file) +{ + return single_open(file, mb_prealloc_table_seq_show, PDE_DATA(inode)); +} + +const struct file_operations ext4_seq_prealloc_table_fops = { + .owner = THIS_MODULE, + .open = mb_prealloc_table_seq_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, + .write = ext4_mb_prealloc_table_proc_write, +}; + static struct kmem_cache *get_groupinfo_cache(int blocksize_bits) { int cache_index = blocksize_bits - EXT4_MIN_BLOCK_LOG_SIZE; @@ -2567,7 +2662,7 @@ static int ext4_groupinfo_create_slab(size_t size) int ext4_mb_init(struct super_block *sb) { struct ext4_sb_info *sbi = EXT4_SB(sb); - unsigned i, j; + unsigned i, j, k, l; unsigned offset, offset_incr; unsigned max; int ret; @@ -2616,7 +2711,6 @@ int ext4_mb_init(struct super_block *sb) sbi->s_mb_max_to_scan = MB_DEFAULT_MAX_TO_SCAN; sbi->s_mb_min_to_scan = MB_DEFAULT_MIN_TO_SCAN; sbi->s_mb_stats = MB_DEFAULT_STATS; - sbi->s_mb_stream_request = MB_DEFAULT_STREAM_THRESHOLD; sbi->s_mb_order2_reqs = MB_DEFAULT_ORDER2_REQS; /* * The default group preallocation is 512, which for 4k block @@ -2640,9 +2734,28 @@ int ext4_mb_init(struct super_block *sb) * RAID stripe size so that preallocations don't fragment * the stripes. 
*/ - if (sbi->s_stripe > 1) { - sbi->s_mb_group_prealloc = roundup( - sbi->s_mb_group_prealloc, sbi->s_stripe); + /* Allocate table once */ + sbi->s_mb_prealloc_table = kzalloc( + EXT4_MAX_PREALLOC_TABLE * sizeof(unsigned long), GFP_NOFS); + if (!sbi->s_mb_prealloc_table) { + ret = -ENOMEM; + goto out; + } + + if (sbi->s_stripe == 0) { + for (k = 0, l = 4; k <= 9; ++k, l *= 2) + sbi->s_mb_prealloc_table[k] = l; + + sbi->s_mb_small_req = 256; + sbi->s_mb_large_req = 1024; + sbi->s_mb_group_prealloc = 512; + } else { + for (k = 0, l = sbi->s_stripe; k <= 2; ++k, l *= 2) + sbi->s_mb_prealloc_table[k] = l; + + sbi->s_mb_small_req = sbi->s_stripe; + sbi->s_mb_large_req = sbi->s_stripe * 8; + sbi->s_mb_group_prealloc = sbi->s_stripe * 4; } sbi->s_locality_groups = alloc_percpu(struct ext4_locality_group); @@ -2670,6 +2783,7 @@ int ext4_mb_init(struct super_block *sb) free_percpu(sbi->s_locality_groups); sbi->s_locality_groups = NULL; out: + kfree(sbi->s_mb_prealloc_table); kfree(sbi->s_mb_offsets); sbi->s_mb_offsets = NULL; kfree(sbi->s_mb_maxs); @@ -2932,7 +3046,6 @@ void ext4_exit_mballoc(void) int err, len; BUG_ON(ac->ac_status != AC_STATUS_FOUND); - BUG_ON(ac->ac_b_ex.fe_len <= 0); sb = ac->ac_sb; sbi = EXT4_SB(sb); @@ -3062,13 +3175,14 @@ static void ext4_mb_normalize_group_request(struct ext4_allocation_context *ac) struct ext4_allocation_request *ar) { struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb); - int bsbits, max; + int bsbits, i, wind; ext4_lblk_t end; - loff_t size, start_off; + loff_t size; loff_t orig_size __maybe_unused; ext4_lblk_t start; struct ext4_inode_info *ei = EXT4_I(ac->ac_inode); struct ext4_prealloc_space *pa; + unsigned long value, last_non_zero; /* do normalize only data requests, metadata requests do not need preallocation */ @@ -3097,51 +3211,47 @@ static void ext4_mb_normalize_group_request(struct ext4_allocation_context *ac) size = size << bsbits; if (size < i_size_read(ac->ac_inode)) size = i_size_read(ac->ac_inode); - orig_size = size; + 
size = (size + ac->ac_sb->s_blocksize - 1) >> bsbits; - /* max size of free chunks */ - max = 2 << bsbits; - -#define NRL_CHECK_SIZE(req, size, max, chunk_size) \ - (req <= (size) || max <= (chunk_size)) - - /* first, try to predict filesize */ - /* XXX: should this table be tunable? */ - start_off = 0; - if (size <= 16 * 1024) { - size = 16 * 1024; - } else if (size <= 32 * 1024) { - size = 32 * 1024; - } else if (size <= 64 * 1024) { - size = 64 * 1024; - } else if (size <= 128 * 1024) { - size = 128 * 1024; - } else if (size <= 256 * 1024) { - size = 256 * 1024; - } else if (size <= 512 * 1024) { - size = 512 * 1024; - } else if (size <= 1024 * 1024) { - size = 1024 * 1024; - } else if (NRL_CHECK_SIZE(size, 4 * 1024 * 1024, max, 2 * 1024)) { - start_off = ((loff_t)ac->ac_o_ex.fe_logical >> - (21 - bsbits)) << 21; - size = 2 * 1024 * 1024; - } else if (NRL_CHECK_SIZE(size, 8 * 1024 * 1024, max, 4 * 1024)) { - start_off = ((loff_t)ac->ac_o_ex.fe_logical >> - (22 - bsbits)) << 22; - size = 4 * 1024 * 1024; - } else if (NRL_CHECK_SIZE(ac->ac_o_ex.fe_len, - (8<<20)>>bsbits, max, 8 * 1024)) { - start_off = ((loff_t)ac->ac_o_ex.fe_logical >> - (23 - bsbits)) << 23; - size = 8 * 1024 * 1024; + start = wind = 0; + value = last_non_zero = 0; + + /* let's choose preallocation window depending on file size */ + for (i = 0; i < EXT4_MAX_PREALLOC_TABLE; i++) { + value = sbi->s_mb_prealloc_table[i]; + if (value == 0) + break; + else + last_non_zero = value; + + if (size <= value) { + wind = value; + break; + } + } + + if (wind == 0) { + if (last_non_zero != 0) { + u64 tstart, tend; + + /* file is quite large, we now preallocate with + * the biggest configured window with regard to + * logical offset + */ + wind = last_non_zero; + tstart = ac->ac_o_ex.fe_logical; + do_div(tstart, wind); + start = tstart * wind; + tend = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len - 1; + do_div(tend, wind); + tend = tend * wind + wind; + size = tend - start; + } } else { - start_off = (loff_t)
ac->ac_o_ex.fe_logical << bsbits; - size = (loff_t) EXT4_C2B(EXT4_SB(ac->ac_sb), - ac->ac_o_ex.fe_len) << bsbits; + size = wind; } - size = size >> bsbits; - start = start_off >> bsbits; + + orig_size = size; /* don't cover already allocated blocks in selected range */ if (ar->pleft && start <= ar->lleft) { @@ -3223,7 +3333,6 @@ static void ext4_mb_normalize_group_request(struct ext4_allocation_context *ac) (unsigned long) ac->ac_o_ex.fe_logical); BUG(); } - BUG_ON(size <= 0 || size > EXT4_BLOCKS_PER_GROUP(ac->ac_sb)); /* now prepare goal request */ @@ -4191,11 +4300,19 @@ static void ext4_mb_group_or_file(struct ext4_allocation_context *ac) /* don't use group allocation for large files */ size = max(size, isize); - if (size > sbi->s_mb_stream_request) { + if ((ac->ac_o_ex.fe_len >= sbi->s_mb_small_req) || + (size >= sbi->s_mb_large_req)) { ac->ac_flags |= EXT4_MB_STREAM_ALLOC; return; } + /* + * request is so large that we don't care about + * streaming - it outweighs any possible seek + */ + if (ac->ac_o_ex.fe_len >= sbi->s_mb_large_req) + return; + BUG_ON(ac->ac_lg != NULL); /* * locality group prealloc space are per cpu.
The reason for having diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index a616f58..a52b311 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -752,8 +752,8 @@ struct stats dx_show_entries(struct dx_hash_info *hinfo, struct inode *dir, if (root->info.hash_version != DX_HASH_TEA && root->info.hash_version != DX_HASH_HALF_MD4 && root->info.hash_version != DX_HASH_LEGACY) { - ext4_warning_inode(dir, "Unrecognised inode hash code %u", - root->info.hash_version); + ext4_warning_inode(dir, "Unrecognised inode hash code %u for directory %lu", + root->info.hash_version, dir->i_ino); goto fail; } if (fname) diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c index 04b4f53..1375815 100644 --- a/fs/ext4/sysfs.c +++ b/fs/ext4/sysfs.c @@ -184,7 +184,8 @@ static ssize_t journal_task_show(struct ext4_sb_info *sbi, char *buf) EXT4_RW_ATTR_SBI_UI(mb_max_to_scan, s_mb_max_to_scan); EXT4_RW_ATTR_SBI_UI(mb_min_to_scan, s_mb_min_to_scan); EXT4_RW_ATTR_SBI_UI(mb_order2_req, s_mb_order2_reqs); -EXT4_RW_ATTR_SBI_UI(mb_stream_req, s_mb_stream_request); +EXT4_RW_ATTR_SBI_UI(mb_small_req, s_mb_small_req); +EXT4_RW_ATTR_SBI_UI(mb_large_req, s_mb_large_req); EXT4_RW_ATTR_SBI_UI(mb_group_prealloc, s_mb_group_prealloc); EXT4_RW_ATTR_SBI_UI(extent_max_zeroout_kb, s_extent_max_zeroout_kb); EXT4_ATTR(trigger_fs_error, 0200, trigger_test_error); @@ -213,7 +214,8 @@ static ssize_t journal_task_show(struct ext4_sb_info *sbi, char *buf) ATTR_LIST(mb_max_to_scan), ATTR_LIST(mb_min_to_scan), ATTR_LIST(mb_order2_req), - ATTR_LIST(mb_stream_req), + ATTR_LIST(mb_small_req), + ATTR_LIST(mb_large_req), ATTR_LIST(mb_group_prealloc), ATTR_LIST(max_writeback_mb_bump), ATTR_LIST(extent_max_zeroout_kb), @@ -413,6 +415,8 @@ int ext4_register_sysfs(struct super_block *sb) sb); proc_create_seq_data("mb_groups", S_IRUGO, sbi->s_proc, &ext4_mb_seq_groups_ops, sb); + proc_create_data("prealloc_table", S_IRUGO, sbi->s_proc, + &ext4_seq_prealloc_table_fops, sb); } return 0; }

From patchwork Mon Jul 22 01:23:33 2019
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11051289
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown, Shaun Tancheff, Li Dongyang, Artem Blagodarenko, Yang Sheng
Cc: Lustre Development List
Date: Sun, 21 Jul 2019 21:23:33 -0400
Message-Id: <1563758631-29550-5-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1563758631-29550-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 04/22] ext4: export inode management

Make ext4_delete_entry() exportable for osd-ldiskfs. Also add exportable ext4_create_inode().

Signed-off-by: James Simmons --- fs/ext4/ext4.h | 6 ++++++ fs/ext4/namei.c | 30 ++++++++++++++++++++++++++---- 2 files changed, 32 insertions(+), 4 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 423ab4d..50f0c50 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -2579,6 +2579,12 @@ extern int ext4_dirent_csum_verify(struct inode *inode, struct ext4_dir_entry *dirent); extern int ext4_orphan_add(handle_t *, struct inode *); extern int ext4_orphan_del(handle_t *, struct inode *); +extern struct inode *ext4_create_inode(handle_t *handle, + struct inode *dir, int mode, + uid_t *owner); +extern int ext4_delete_entry(handle_t *handle, struct inode * dir, + struct ext4_dir_entry_2 *de_del, + struct buffer_head *bh); extern int ext4_htree_fill_tree(struct file *dir_file, __u32 start_hash, __u32 start_minor_hash, __u32 *next_hash); extern int ext4_search_dir(struct buffer_head *bh, diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index a52b311..a42a2db 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -2459,10 +2459,9 @@ int ext4_generic_delete_entry(handle_t
*handle, return -ENOENT; } -static int ext4_delete_entry(handle_t *handle, - struct inode *dir, - struct ext4_dir_entry_2 *de_del, - struct buffer_head *bh) +int ext4_delete_entry(handle_t *handle, struct inode *dir, + struct ext4_dir_entry_2 *de_del, + struct buffer_head *bh) { int err, csum_size = 0; @@ -2499,6 +2498,7 @@ static int ext4_delete_entry(handle_t *handle, ext4_std_error(dir->i_sb, err); return err; } +EXPORT_SYMBOL(ext4_delete_entry); /* * Set directory link count to 1 if nlinks > EXT4_LINK_MAX, or if nlinks == 2 @@ -2545,6 +2545,28 @@ static int ext4_add_nondir(handle_t *handle, return err; } +/* Return locked inode, then the caller can modify the inode's states/flags + * before others finding it. The caller should unlock the inode by itself. + */ +struct inode *ext4_create_inode(handle_t *handle, struct inode *dir, int mode, + uid_t *owner) +{ + struct inode *inode; + + inode = ext4_new_inode(handle, dir, mode, NULL, 0, owner, 0); + if (!IS_ERR(inode)) { + if (S_ISCHR(mode) || S_ISBLK(mode) || S_ISFIFO(mode)) { + inode->i_op = &ext4_special_inode_operations; + } else { + inode->i_op = &ext4_file_inode_operations; + inode->i_fop = &ext4_file_operations; + ext4_set_aops(inode); + } + } + return inode; +} +EXPORT_SYMBOL(ext4_create_inode); + /* * By the time this is called, we already have created * the directory cache entry for the new file, but it

From patchwork Mon Jul 22 01:23:34 2019
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11051305
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown, Shaun Tancheff, Li Dongyang, Artem Blagodarenko, Yang Sheng
Cc: Lustre Development List
Date: Sun, 21 Jul 2019 21:23:34 -0400
Message-Id: <1563758631-29550-6-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1563758631-29550-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 05/22] ext4: various misc changes

Mostly exporting symbols for osd-ldiskfs.

Signed-off-by: James Simmons --- fs/ext4/balloc.c | 1 + fs/ext4/ext4.h | 26 +++++++++++++++++++++++++- fs/ext4/ext4_jbd2.c | 4 ++++ fs/ext4/hash.c | 1 + fs/ext4/ialloc.c | 3 ++- fs/ext4/inode.c | 6 ++++++ fs/ext4/namei.c | 13 +++++++------ fs/ext4/super.c | 9 +++------ 8 files changed, 49 insertions(+), 14 deletions(-) diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c index e5d6ee6..bca75c1 100644 --- a/fs/ext4/balloc.c +++ b/fs/ext4/balloc.c @@ -294,6 +294,7 @@ struct ext4_group_desc * ext4_get_group_desc(struct super_block *sb, *bh = sbi->s_group_desc[group_desc]; return desc; } +EXPORT_SYMBOL(ext4_get_group_desc); /* * Return the block number which was discovered to be invalid, or 0 if diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 50f0c50..eb2d124 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1617,6 +1617,8 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei) #define NEXT_ORPHAN(inode) EXT4_I(inode)->i_dtime +#define JOURNAL_START_HAS_3ARGS 1 + /* * Codes for operating systems */ @@ -1839,7 +1841,24 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei) EXTN_FEATURE_FUNCS(2) EXTN_FEATURE_FUNCS(3) -EXTN_FEATURE_FUNCS(4) + +static inline bool ext4_has_unknown_ext4_compat_features(struct super_block *sb) +{ + return ((EXT4_SB(sb)->s_es->s_feature_compat & + cpu_to_le32(~EXT4_FEATURE_COMPAT_SUPP)) != 0); +} + +static inline bool ext4_has_unknown_ext4_ro_compat_features(struct super_block *sb) +{ + return ((EXT4_SB(sb)->s_es->s_feature_ro_compat & + cpu_to_le32(~EXT4_FEATURE_RO_COMPAT_SUPP)) != 0); +} + +static inline bool ext4_has_unknown_ext4_incompat_features(struct super_block *sb) +{ + return ((EXT4_SB(sb)->s_es->s_feature_incompat
& + cpu_to_le32(~EXT4_FEATURE_INCOMPAT_SUPP)) != 0); +} static inline bool ext4_has_compat_features(struct super_block *sb) { @@ -3192,6 +3211,11 @@ extern int ext4_check_blockref(const char *, unsigned int, extern int ext4_ext_tree_init(handle_t *handle, struct inode *); extern int ext4_ext_writepage_trans_blocks(struct inode *, int); +extern struct buffer_head *ext4_read_inode_bitmap(struct super_block *sb, + ext4_group_t block_group); +extern struct buffer_head *ext4_append(handle_t *handle, + struct inode *inode, + ext4_lblk_t *block); extern int ext4_ext_index_trans_blocks(struct inode *inode, int extents); extern int ext4_ext_map_blocks(handle_t *handle, struct inode *inode, struct ext4_map_blocks *map, int flags); diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c index 7c70b08..fcec082 100644 --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -81,6 +81,7 @@ handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line, return jbd2__journal_start(journal, blocks, rsv_blocks, GFP_NOFS, type, line); } +EXPORT_SYMBOL(__ext4_journal_start_sb); int __ext4_journal_stop(const char *where, unsigned int line, handle_t *handle) { @@ -108,6 +109,7 @@ int __ext4_journal_stop(const char *where, unsigned int line, handle_t *handle) __ext4_std_error(sb, where, line, err); return err; } +EXPORT_SYMBOL(__ext4_journal_stop); handle_t *__ext4_journal_start_reserved(handle_t *handle, unsigned int line, int type) @@ -173,6 +175,7 @@ int __ext4_journal_get_write_access(const char *where, unsigned int line, } return err; } +EXPORT_SYMBOL(__ext4_journal_get_write_access); /* * The ext4 forget function must perform a revoke if we are freeing data @@ -313,6 +316,7 @@ int __ext4_handle_dirty_metadata(const char *where, unsigned int line, } return err; } +EXPORT_SYMBOL(__ext4_handle_dirty_metadata); int __ext4_handle_dirty_super(const char *where, unsigned int line, handle_t *handle, struct super_block *sb) diff --git a/fs/ext4/hash.c b/fs/ext4/hash.c index 
d358bfc..4648a8f 100644 --- a/fs/ext4/hash.c +++ b/fs/ext4/hash.c @@ -300,3 +300,4 @@ int ext4fs_dirhash(const struct inode *dir, const char *name, int len, #endif return __ext4fs_dirhash(name, len, hinfo); } +EXPORT_SYMBOL(ext4fs_dirhash); diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 09ae4a4..68d41e6 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -114,7 +114,7 @@ static int ext4_validate_inode_bitmap(struct super_block *sb, * * Return buffer_head of bitmap on success or NULL. */ -static struct buffer_head * +struct buffer_head * ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t block_group) { struct ext4_group_desc *desc; @@ -211,6 +211,7 @@ static int ext4_validate_inode_bitmap(struct super_block *sb, put_bh(bh); return ERR_PTR(err); } +EXPORT_SYMBOL(ext4_read_inode_bitmap); /* * NOTE! When we get the inode, we're the only people diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index c37418a..5561351 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -741,6 +741,7 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode, } return retval; } +EXPORT_SYMBOL(ext4_map_blocks); /* * Update EXT4_MAP_FLAGS in bh->b_state. For buffer heads attached to pages @@ -1027,6 +1028,7 @@ struct buffer_head *ext4_bread(handle_t *handle, struct inode *inode, put_bh(bh); return ERR_PTR(-EIO); } +EXPORT_SYMBOL(ext4_bread); /* Read a contiguous batch of blocks. 
*/ int ext4_bread_batch(struct inode *inode, ext4_lblk_t block, int bh_count, @@ -4559,6 +4561,7 @@ int ext4_truncate(struct inode *inode) trace_ext4_truncate_exit(inode); return err; } +EXPORT_SYMBOL(ext4_truncate); /* * ext4_get_inode_loc returns with an extra refcount against the inode's @@ -4710,6 +4713,7 @@ int ext4_get_inode_loc(struct inode *inode, struct ext4_iloc *iloc) return __ext4_get_inode_loc(inode, iloc, !ext4_test_inode_state(inode, EXT4_STATE_XATTR)); } +EXPORT_SYMBOL(ext4_get_inode_loc); static bool ext4_should_use_dax(struct inode *inode) { @@ -5113,6 +5117,7 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino, iget_failed(inode); return ERR_PTR(ret); } +EXPORT_SYMBOL(__ext4_iget); static int ext4_inode_blocks_set(handle_t *handle, struct ext4_inode *raw_inode, @@ -6050,6 +6055,7 @@ int ext4_mark_inode_dirty(handle_t *handle, struct inode *inode) return ext4_mark_iloc_dirty(handle, inode, &iloc); } +EXPORT_SYMBOL(ext4_mark_inode_dirty); /* * ext4_dirty_inode() is called from __mark_inode_dirty() diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index a42a2db..91fc7fe 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -50,9 +50,9 @@ #define NAMEI_RA_BLOCKS 4 #define NAMEI_RA_SIZE (NAMEI_RA_CHUNKS * NAMEI_RA_BLOCKS) -static struct buffer_head *ext4_append(handle_t *handle, - struct inode *inode, - ext4_lblk_t *block) +struct buffer_head *ext4_append(handle_t *handle, + struct inode *inode, + ext4_lblk_t *block) { struct buffer_head *bh; int err; @@ -2511,24 +2511,25 @@ int ext4_delete_entry(handle_t *handle, struct inode *dir, * for checking S_ISDIR(inode) (since the INODE_INDEX feature will not be set * on regular files) and to avoid creating huge/slow non-HTREE directories. 
*/ -static void ext4_inc_count(handle_t *handle, struct inode *inode) +void ext4_inc_count(handle_t *handle, struct inode *inode) { inc_nlink(inode); if (is_dx(inode) && (inode->i_nlink > EXT4_LINK_MAX || inode->i_nlink == 2)) set_nlink(inode, 1); } +EXPORT_SYMBOL(ext4_inc_count); /* * If a directory had nlink == 1, then we should let it be 1. This indicates * directory has >EXT4_LINK_MAX subdirs. */ -static void ext4_dec_count(handle_t *handle, struct inode *inode) +void ext4_dec_count(handle_t *handle, struct inode *inode) { if (!S_ISDIR(inode->i_mode) || inode->i_nlink > 2) drop_nlink(inode); } - +EXPORT_SYMBOL(ext4_dec_count); static int ext4_add_nondir(handle_t *handle, struct dentry *dentry, struct inode *inode) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 4079605..d15c26c 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -279,6 +279,7 @@ __u32 ext4_itable_unused_count(struct super_block *sb, (EXT4_DESC_SIZE(sb) >= EXT4_MIN_DESC_SIZE_64BIT ? (__u32)le16_to_cpu(bg->bg_itable_unused_hi) << 16 : 0); } +EXPORT_SYMBOL(ext4_itable_unused_count); void ext4_block_bitmap_set(struct super_block *sb, struct ext4_group_desc *bg, ext4_fsblk_t blk) @@ -658,6 +659,7 @@ void __ext4_std_error(struct super_block *sb, const char *function, save_error_info(sb, function, line); ext4_handle_error(sb); } +EXPORT_SYMBOL(__ext4_std_error); /* * ext4_abort is a much stronger failure handler than ext4_error. 
The @@ -5101,6 +5103,7 @@ int ext4_force_commit(struct super_block *sb) journal = EXT4_SB(sb)->s_journal; return ext4_journal_force_commit(journal); } +EXPORT_SYMBOL(ext4_force_commit); static int ext4_sync_fs(struct super_block *sb, int wait) { @@ -6115,16 +6118,12 @@ static int __init ext4_init_fs(void) err = init_inodecache(); if (err) goto out1; - register_as_ext3(); - register_as_ext2(); err = register_filesystem(&ext4_fs_type); if (err) goto out; return 0; out: - unregister_as_ext2(); - unregister_as_ext3(); destroy_inodecache(); out1: ext4_exit_mballoc(); @@ -6145,8 +6144,6 @@ static int __init ext4_init_fs(void) static void __exit ext4_exit_fs(void) { ext4_destroy_lazyinit_thread(); - unregister_as_ext2(); - unregister_as_ext3(); unregister_filesystem(&ext4_fs_type); destroy_inodecache(); ext4_exit_mballoc();
From patchwork Mon Jul 22 01:23:35 2019
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11051293
From: James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown , Shaun Tancheff , Li Dongyang , Artem Blagodarenko , Yang Sheng
Date: Sun, 21 Jul 2019 21:23:35 -0400
Message-Id: <1563758631-29550-7-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1563758631-29550-1-git-send-email-jsimmons@infradead.org>
References: <1563758631-29550-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 06/22] ext4: add extra checks for mballoc
Cc: Lustre Development List

Handle mballoc corruptions.
Signed-off-by: James Simmons --- fs/ext4/ext4.h | 1 + fs/ext4/mballoc.c | 110 +++++++++++++++++++++++++++++++++++++++++++++--------- fs/ext4/mballoc.h | 2 +- 3 files changed, 94 insertions(+), 19 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index eb2d124..e321286 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -2957,6 +2957,7 @@ struct ext4_group_info { ext4_grpblk_t bb_fragments; /* nr of freespace fragments */ ext4_grpblk_t bb_largest_free_order;/* order of largest frag in BG */ struct list_head bb_prealloc_list; + unsigned long bb_prealloc_nr; #ifdef DOUBLE_CHECK void *bb_bitmap; #endif diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 3be3bef..483fc0f 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -352,8 +352,8 @@ "ext4_groupinfo_64k", "ext4_groupinfo_128k" }; -static void ext4_mb_generate_from_pa(struct super_block *sb, void *bitmap, - ext4_group_t group); +static int ext4_mb_generate_from_pa(struct super_block *sb, void *bitmap, + ext4_group_t group); static void ext4_mb_generate_from_freelist(struct super_block *sb, void *bitmap, ext4_group_t group); @@ -708,8 +708,8 @@ static void ext4_mb_mark_free_simple(struct super_block *sb, } static noinline_for_stack -void ext4_mb_generate_buddy(struct super_block *sb, - void *buddy, void *bitmap, ext4_group_t group) +int ext4_mb_generate_buddy(struct super_block *sb, + void *buddy, void *bitmap, ext4_group_t group) { struct ext4_group_info *grp = ext4_get_group_info(sb, group); struct ext4_sb_info *sbi = EXT4_SB(sb); @@ -752,6 +752,7 @@ void ext4_mb_generate_buddy(struct super_block *sb, grp->bb_free = free; ext4_mark_group_bitmap_corrupted(sb, group, EXT4_GROUP_INFO_BBITMAP_CORRUPT); + return -EIO; } mb_set_largest_free_order(sb, grp); @@ -762,6 +763,8 @@ void ext4_mb_generate_buddy(struct super_block *sb, sbi->s_mb_buddies_generated++; sbi->s_mb_generation_time += period; spin_unlock(&sbi->s_bal_lock); + + return 0; } static void mb_regenerate_buddy(struct ext4_buddy *e4b) @@ 
-882,7 +885,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore, gfp_t gfp) } first_block = page->index * blocks_per_page; - for (i = 0; i < blocks_per_page; i++) { + for (i = 0; i < blocks_per_page && err == 0; i++) { group = (first_block + i) >> 1; if (group >= ngroups) break; @@ -926,7 +929,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore, gfp_t gfp) ext4_lock_group(sb, group); /* init the buddy */ memset(data, 0xff, blocksize); - ext4_mb_generate_buddy(sb, data, incore, group); + err = ext4_mb_generate_buddy(sb, data, incore, group); ext4_unlock_group(sb, group); incore = NULL; } else { @@ -941,7 +944,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore, gfp_t gfp) memcpy(data, bitmap, blocksize); /* mark all preallocated blks used in in-core bitmap */ - ext4_mb_generate_from_pa(sb, data, group); + err = ext4_mb_generate_from_pa(sb, data, group); ext4_mb_generate_from_freelist(sb, data, group); ext4_unlock_group(sb, group); @@ -951,8 +954,8 @@ static int ext4_mb_init_cache(struct page *page, char *incore, gfp_t gfp) incore = data; } } - SetPageUptodate(page); - + if (likely(err == 0)) + SetPageUptodate(page); out: if (bh) { for (i = 0; i < groups_per_page; i++) @@ -2281,7 +2284,8 @@ static int ext4_mb_seq_groups_show(struct seq_file *seq, void *v) { struct super_block *sb = PDE_DATA(file_inode(seq->file)); ext4_group_t group = (ext4_group_t) ((unsigned long) v); - int i; + struct ext4_group_desc *gdp; + int free = 0, i; int err, buddy_loaded = 0; struct ext4_buddy e4b; struct ext4_group_info *grinfo; @@ -2295,7 +2299,7 @@ static int ext4_mb_seq_groups_show(struct seq_file *seq, void *v) group--; if (group == 0) - seq_puts(seq, "#group: free frags first [" + seq_puts(seq, "#group: bfree gfree free frags first [" " 2^0 2^1 2^2 2^3 2^4 2^5 2^6 " " 2^7 2^8 2^9 2^10 2^11 2^12 2^13 ]\n"); @@ -2313,13 +2317,19 @@ static int ext4_mb_seq_groups_show(struct seq_file *seq, void *v) buddy_loaded = 1; } + gdp = 
ext4_get_group_desc(sb, group, NULL); + if (gdp) + free = ext4_free_group_clusters(sb, gdp); + memcpy(&sg, ext4_get_group_info(sb, group), i); if (buddy_loaded) ext4_mb_unload_buddy(&e4b); - seq_printf(seq, "#%-5u: %-5u %-5u %-5u [", group, sg.info.bb_free, - sg.info.bb_fragments, sg.info.bb_first_free); + seq_printf(seq, "#%-5lu: %-5u %-5u %-5u %-5u %-5lu [", + (long unsigned int)group, sg.info.bb_free, free, + sg.info.bb_fragments, sg.info.bb_first_free, + sg.info.bb_prealloc_nr); for (i = 0; i <= 13; i++) seq_printf(seq, " %-5u", i <= blocksize_bits + 1 ? sg.info.bb_counters[i] : 0); @@ -3593,6 +3603,42 @@ static void ext4_mb_use_group_pa(struct ext4_allocation_context *ac, } /* + * check that free blocks in the bitmap match the free block count in + * the group descriptor. Do this before taking preallocated blocks into + * account to be able to detect on-disk corruptions. The group lock + * should be held by the caller. + */ +int ext4_mb_check_ondisk_bitmap(struct super_block *sb, void *bitmap, + struct ext4_group_desc *gdp, int group) +{ + unsigned short max = EXT4_CLUSTERS_PER_GROUP(sb); + unsigned short i, first, free = 0; + unsigned short free_in_gdp = ext4_free_group_clusters(sb, gdp); + + if (free_in_gdp == 0 && gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) + return 0; + + i = mb_find_next_zero_bit(bitmap, max, 0); + + while (i < max) { + first = i; + i = mb_find_next_bit(bitmap, max, i); + if (i > max) + i = max; + free += i - first; + if (i < max) + i = mb_find_next_zero_bit(bitmap, max, i); + } + + if (free != free_in_gdp) { + ext4_error(sb, "on-disk bitmap for group %d corrupted: %u blocks free in bitmap, %u - in gd\n", + group, free, free_in_gdp); + return -EIO; + } + return 0; +} + +/* * the function goes through all block freed in the group * but not yet committed and marks them used in in-core bitmap.
* buddy must be generated from this bitmap @@ -3622,16 +3668,27 @@ static void ext4_mb_generate_from_freelist(struct super_block *sb, void *bitmap, * Need to be called with ext4 group lock held */ static noinline_for_stack -void ext4_mb_generate_from_pa(struct super_block *sb, void *bitmap, - ext4_group_t group) +int ext4_mb_generate_from_pa(struct super_block *sb, void *bitmap, + ext4_group_t group) { struct ext4_group_info *grp = ext4_get_group_info(sb, group); struct ext4_prealloc_space *pa; + struct ext4_group_desc *gdp; struct list_head *cur; ext4_group_t groupnr; ext4_grpblk_t start; int preallocated = 0; - int len; + int skip = 0, count = 0; + int err, len; + + gdp = ext4_get_group_desc(sb, group, NULL); + if (!gdp) + return -EIO; + + /* before applying preallocations, check bitmap consistency */ + err = ext4_mb_check_ondisk_bitmap(sb, bitmap, gdp, group); + if (err) + return err; /* all form of preallocation discards first load group, * so the only competing code is preallocation use. 
@@ -3648,13 +3705,22 @@ void ext4_mb_generate_from_pa(struct super_block *sb, void *bitmap, &groupnr, &start); len = pa->pa_len; spin_unlock(&pa->pa_lock); - if (unlikely(len == 0)) + if (unlikely(len == 0)) { + skip++; continue; + } BUG_ON(groupnr != group); ext4_set_bits(bitmap, start, len); preallocated += len; + count++; + } + if (count + skip != grp->bb_prealloc_nr) { + ext4_error(sb, "lost preallocations: count %d, bb_prealloc_nr %lu, skip %d\n", + count, grp->bb_prealloc_nr, skip); + return -EIO; } mb_debug(1, "preallocated %u for group %u\n", preallocated, group); + return 0; } static void ext4_mb_pa_callback(struct rcu_head *head) @@ -3718,6 +3784,7 @@ static void ext4_mb_put_pa(struct ext4_allocation_context *ac, */ ext4_lock_group(sb, grp); list_del(&pa->pa_group_list); + ext4_get_group_info(sb, grp)->bb_prealloc_nr--; ext4_unlock_group(sb, grp); spin_lock(pa->pa_obj_lock); @@ -3812,6 +3879,7 @@ static void ext4_mb_put_pa(struct ext4_allocation_context *ac, ext4_lock_group(sb, ac->ac_b_ex.fe_group); list_add(&pa->pa_group_list, &grp->bb_prealloc_list); + grp->bb_prealloc_nr++; ext4_unlock_group(sb, ac->ac_b_ex.fe_group); spin_lock(pa->pa_obj_lock); @@ -3873,6 +3941,7 @@ static void ext4_mb_put_pa(struct ext4_allocation_context *ac, ext4_lock_group(sb, ac->ac_b_ex.fe_group); list_add(&pa->pa_group_list, &grp->bb_prealloc_list); + grp->bb_prealloc_nr++; ext4_unlock_group(sb, ac->ac_b_ex.fe_group); /* @@ -4044,6 +4113,8 @@ static int ext4_mb_new_preallocation(struct ext4_allocation_context *ac) spin_unlock(&pa->pa_lock); + BUG_ON(grp->bb_prealloc_nr == 0); + grp->bb_prealloc_nr--; list_del(&pa->pa_group_list); list_add(&pa->u.pa_tmp_list, &list); } @@ -4174,7 +4245,7 @@ void ext4_discard_preallocations(struct inode *inode) if (err) { ext4_error(sb, "Error %d loading buddy information for %u", err, group); - continue; + return; } bitmap_bh = ext4_read_block_bitmap(sb, group); @@ -4187,6 +4258,8 @@ void ext4_discard_preallocations(struct inode *inode) } 
ext4_lock_group(sb, group); + BUG_ON(e4b.bd_info->bb_prealloc_nr == 0); + e4b.bd_info->bb_prealloc_nr--; list_del(&pa->pa_group_list); ext4_mb_release_inode_pa(&e4b, bitmap_bh, pa); ext4_unlock_group(sb, group); @@ -4448,6 +4521,7 @@ static void ext4_mb_group_or_file(struct ext4_allocation_context *ac) } ext4_lock_group(sb, group); list_del(&pa->pa_group_list); + ext4_get_group_info(sb, group)->bb_prealloc_nr--; ext4_mb_release_group_pa(&e4b, pa); ext4_unlock_group(sb, group); diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h index 88c98f1..8325ad9 100644 --- a/fs/ext4/mballoc.h +++ b/fs/ext4/mballoc.h @@ -70,7 +70,7 @@ /* * for which requests use 2^N search using buddies */ -#define MB_DEFAULT_ORDER2_REQS 2 +#define MB_DEFAULT_ORDER2_REQS 8 /* * default group prealloc size 512 blocks
From patchwork Mon Jul 22 01:23:36 2019
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11051303
From: James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown , Shaun Tancheff , Li Dongyang , Artem Blagodarenko , Yang Sheng
Date: Sun, 21 Jul 2019 21:23:36 -0400
Message-Id: <1563758631-29550-8-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1563758631-29550-1-git-send-email-jsimmons@infradead.org>
References: <1563758631-29550-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 07/22] ext4: update .. for hash indexed directory
Cc: Lustre Development List

Handle special .. case.
Signed-off-by: James Simmons --- fs/ext4/namei.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 70 insertions(+) diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index 91fc7fe..dd64a10 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -2133,6 +2133,73 @@ static int make_indexed_dir(handle_t *handle, struct ext4_filename *fname, return retval; } +/* update ".." for hash-indexed directory, split the item "." if necessary */ +static int ext4_update_dotdot(handle_t *handle, struct dentry *dentry, + struct inode *inode) +{ + struct inode *dir = dentry->d_parent->d_inode; + struct buffer_head *dir_block; + struct ext4_dir_entry_2 *de; + int len, journal = 0, err = 0; + + if (IS_ERR(handle)) + return PTR_ERR(handle); + + if (IS_DIRSYNC(dir)) + handle->h_sync = 1; + + dir_block = ext4_bread(handle, dir, 0, 0); + if (IS_ERR(dir_block)) { + err = PTR_ERR(dir_block); + goto out; + } + + de = (struct ext4_dir_entry_2 *)dir_block->b_data; + /* the first item must be "." 
*/ + assert(de->name_len == 1 && de->name[0] == '.'); + len = le16_to_cpu(de->rec_len); + assert(len >= EXT4_DIR_REC_LEN(1)); + if (len > EXT4_DIR_REC_LEN(1)) { + BUFFER_TRACE(dir_block, "get_write_access"); + err = ext4_journal_get_write_access(handle, dir_block); + if (err) + goto out_journal; + + journal = 1; + de->rec_len = cpu_to_le16(EXT4_DIR_REC_LEN(1)); + } + + len -= EXT4_DIR_REC_LEN(1); + assert(len == 0 || len >= EXT4_DIR_REC_LEN(2)); + de = (struct ext4_dir_entry_2 *) + ((char *) de + le16_to_cpu(de->rec_len)); + if (!journal) { + BUFFER_TRACE(dir_block, "get_write_access"); + err = ext4_journal_get_write_access(handle, dir_block); + if (err) + goto out_journal; + } + + de->inode = cpu_to_le32(inode->i_ino); + if (len > 0) + de->rec_len = cpu_to_le16(len); + else + assert(le16_to_cpu(de->rec_len) >= EXT4_DIR_REC_LEN(2)); + de->name_len = 2; + strcpy(de->name, ".."); + ext4_set_de_type(dir->i_sb, de, S_IFDIR); + +out_journal: + if (journal) { + BUFFER_TRACE(dir_block, "call ext4_handle_dirty_metadata"); + err = ext4_handle_dirty_dirent_node(handle, dir, dir_block); + ext4_mark_inode_dirty(handle, dir); + } + brelse(dir_block); +out: + return err; +} + /* * ext4_add_entry() * @@ -2189,6 +2256,9 @@ static int ext4_add_entry(handle_t *handle, struct dentry *dentry, } if (is_dx(dir)) { + if (dentry->d_name.len == 2 && + memcmp(dentry->d_name.name, "..", 2) == 0) + return ext4_update_dotdot(handle, dentry, inode); retval = ext4_dx_add_entry(handle, &fname, dir, inode); if (!retval || (retval != ERR_BAD_DX_DIR)) goto out;
From patchwork Mon Jul 22 01:23:37 2019
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11051311
From: James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown , Shaun Tancheff , Li Dongyang , Artem Blagodarenko , Yang Sheng
Date: Sun, 21 Jul 2019 21:23:37 -0400
Message-Id: <1563758631-29550-9-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1563758631-29550-1-git-send-email-jsimmons@infradead.org>
References: <1563758631-29550-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 08/22] ext4: kill off struct dx_root
Cc: Lustre Development List

Use struct dx_root_info directly.
Signed-off-by: James Simmons --- fs/ext4/namei.c | 110 +++++++++++++++++++++++++++++--------------------------- 1 file changed, 58 insertions(+), 52 deletions(-) diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index dd64a10..4c45570 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -197,23 +197,13 @@ struct dx_entry * dirent the two low bits of the hash version will be zero. Therefore, the * hash version mod 4 should never be 0. Sincerely, the paranoia department. */ - -struct dx_root +struct dx_root_info { - struct fake_dirent dot; - char dot_name[4]; - struct fake_dirent dotdot; - char dotdot_name[4]; - struct dx_root_info - { - __le32 reserved_zero; - u8 hash_version; - u8 info_length; /* 8 */ - u8 indirect_levels; - u8 unused_flags; - } - info; - struct dx_entry entries[0]; + __le32 reserved_zero; + u8 hash_version; + u8 info_length; /* 8 */ + u8 indirect_levels; + u8 unused_flags; }; struct dx_node @@ -514,6 +504,16 @@ static inline int ext4_handle_dirty_dx_node(handle_t *handle, * Future: use high four bits of block for coalesce-on-delete flags * Mask them off for now.
*/ +struct dx_root_info *dx_get_dx_info(struct ext4_dir_entry_2 *de) +{ + /* get dotdot first */ + de = (struct ext4_dir_entry_2 *)((char *)de + EXT4_DIR_REC_LEN(1)); + + /* dx root info is after dotdot entry */ + de = (struct ext4_dir_entry_2 *)((char *)de + EXT4_DIR_REC_LEN(2)); + + return (struct dx_root_info *)de; +} static inline ext4_lblk_t dx_get_block(struct dx_entry *entry) { @@ -738,7 +738,7 @@ struct stats dx_show_entries(struct dx_hash_info *hinfo, struct inode *dir, { unsigned count, indirect; struct dx_entry *at, *entries, *p, *q, *m; - struct dx_root *root; + struct dx_root_info *info; struct dx_frame *frame = frame_in; struct dx_frame *ret_err = ERR_PTR(ERR_BAD_DX_DIR); u32 hash; @@ -748,17 +748,17 @@ struct stats dx_show_entries(struct dx_hash_info *hinfo, struct inode *dir, if (IS_ERR(frame->bh)) return (struct dx_frame *) frame->bh; - root = (struct dx_root *) frame->bh->b_data; - if (root->info.hash_version != DX_HASH_TEA && - root->info.hash_version != DX_HASH_HALF_MD4 && - root->info.hash_version != DX_HASH_LEGACY) { + info = dx_get_dx_info((struct ext4_dir_entry_2 *)frame->bh->b_data); + if (info->hash_version != DX_HASH_TEA && + info->hash_version != DX_HASH_HALF_MD4 && + info->hash_version != DX_HASH_LEGACY) { ext4_warning_inode(dir, "Unrecognised inode hash code %u for directory %lu", - root->info.hash_version, dir->i_ino); + info->hash_version, dir->i_ino); goto fail; } if (fname) hinfo = &fname->hinfo; - hinfo->hash_version = root->info.hash_version; + hinfo->hash_version = info->hash_version; if (hinfo->hash_version <= DX_HASH_TEA) hinfo->hash_version += EXT4_SB(dir->i_sb)->s_hash_unsigned; hinfo->seed = EXT4_SB(dir->i_sb)->s_hash_seed; @@ -766,13 +766,13 @@ struct stats dx_show_entries(struct dx_hash_info *hinfo, struct inode *dir, ext4fs_dirhash(dir, fname_name(fname), fname_len(fname), hinfo); hash = hinfo->hash; - if (root->info.unused_flags & 1) { + if (info->unused_flags & 1) { ext4_warning_inode(dir, "Unimplemented hash flags: 
%#06x", - root->info.unused_flags); + info->unused_flags); goto fail; } - indirect = root->info.indirect_levels; + indirect = info->indirect_levels; if (indirect >= ext4_dir_htree_level(dir->i_sb)) { ext4_warning(dir->i_sb, "Directory (ino: %lu) htree depth %#06x exceed" @@ -785,14 +785,13 @@ struct stats dx_show_entries(struct dx_hash_info *hinfo, struct inode *dir, goto fail; } - entries = (struct dx_entry *)(((char *)&root->info) + - root->info.info_length); + entries = (struct dx_entry *)(((char *)info) + info->info_length); if (dx_get_limit(entries) != dx_root_limit(dir, - root->info.info_length)) { + info->info_length)) { ext4_warning_inode(dir, "dx entry: limit %u != root limit %u", dx_get_limit(entries), - dx_root_limit(dir, root->info.info_length)); + dx_root_limit(dir, info->info_length)); goto fail; } @@ -877,7 +876,7 @@ static void dx_release(struct dx_frame *frames) if (frames[0].bh == NULL) return; - info = &((struct dx_root *)frames[0].bh->b_data)->info; + info = dx_get_dx_info((struct ext4_dir_entry_2 *)frames[0].bh->b_data); /* save local copy, "info" may be freed after brelse() */ indirect_levels = info->indirect_levels; for (i = 0; i <= indirect_levels; i++) { @@ -2020,17 +2019,16 @@ static int make_indexed_dir(handle_t *handle, struct ext4_filename *fname, struct inode *inode, struct buffer_head *bh) { struct buffer_head *bh2; - struct dx_root *root; struct dx_frame frames[EXT4_HTREE_LEVEL], *frame; struct dx_entry *entries; - struct ext4_dir_entry_2 *de, *de2; + struct ext4_dir_entry_2 *de, *de2, *dot_de, *dotdot_de; struct ext4_dir_entry_tail *t; char *data1, *top; unsigned len; int retval; unsigned blocksize; ext4_lblk_t block; - struct fake_dirent *fde; + struct dx_root_info *dx_info; int csum_size = 0; if (ext4_has_metadata_csum(inode->i_sb)) @@ -2045,18 +2043,19 @@ static int make_indexed_dir(handle_t *handle, struct ext4_filename *fname, brelse(bh); return retval; } - root = (struct dx_root *) bh->b_data; + + dot_de = (struct 
ext4_dir_entry_2 *)bh->b_data; + dotdot_de = ext4_next_entry(dot_de, blocksize); /* The 0th block becomes the root, move the dirents out */ - fde = &root->dotdot; - de = (struct ext4_dir_entry_2 *)((char *)fde + - ext4_rec_len_from_disk(fde->rec_len, blocksize)); - if ((char *) de >= (((char *) root) + blocksize)) { + de = (struct ext4_dir_entry_2 *)((char *)dotdot_de + + ext4_rec_len_from_disk(dotdot_de->rec_len, blocksize)); + if ((char *)de >= (((char *)dot_de) + blocksize)) { EXT4_ERROR_INODE(dir, "invalid rec_len for '..'"); brelse(bh); return -EFSCORRUPTED; } - len = ((char *) root) + (blocksize - csum_size) - (char *) de; + len = ((char *)dot_de) + (blocksize - csum_size) - (char *) de; /* Allocate new block for the 0th block's dirents */ bh2 = ext4_append(handle, dir, &block); @@ -2082,19 +2081,24 @@ static int make_indexed_dir(handle_t *handle, struct ext4_filename *fname, } /* Initialize the root; the dot dirents already exist */ - de = (struct ext4_dir_entry_2 *) (&root->dotdot); - de->rec_len = ext4_rec_len_to_disk(blocksize - EXT4_DIR_REC_LEN(2), - blocksize); - memset (&root->info, 0, sizeof(root->info)); - root->info.info_length = sizeof(root->info); - root->info.hash_version = EXT4_SB(dir->i_sb)->s_def_hash_version; - entries = root->entries; + dotdot_de->rec_len = + ext4_rec_len_to_disk(blocksize - le16_to_cpu(dot_de->rec_len), + blocksize); + + /* initialize hashing info */ + dx_info = dx_get_dx_info(dot_de); + memset(dx_info, 0, sizeof(*dx_info)); + dx_info->info_length = sizeof(*dx_info); + dx_info->hash_version = EXT4_SB(dir->i_sb)->s_def_hash_version; + + entries = (void *)dx_info + sizeof(*dx_info); + dx_set_block(entries, 1); dx_set_count(entries, 1); - dx_set_limit(entries, dx_root_limit(dir, sizeof(root->info))); + dx_set_limit(entries, dx_root_limit(dir, sizeof(*dx_info))); /* Initialize as for dx_probe */ - fname->hinfo.hash_version = root->info.hash_version; + fname->hinfo.hash_version = dx_info->hash_version; if 
(fname->hinfo.hash_version <= DX_HASH_TEA)
		fname->hinfo.hash_version += EXT4_SB(dir->i_sb)->s_hash_unsigned;
	fname->hinfo.seed = EXT4_SB(dir->i_sb)->s_hash_seed;
@@ -2443,7 +2447,8 @@ static int ext4_dx_add_entry(handle_t *handle, struct ext4_filename *fname,
 				goto journal_error;
 			}
 		} else {
-			struct dx_root *dxroot;
+			struct dx_root_info *info;
+
 			memcpy((char *) entries2, (char *) entries,
 			       icount * sizeof(struct dx_entry));
 			dx_set_limit(entries2, dx_node_limit(dir));
@@ -2451,8 +2456,9 @@ static int ext4_dx_add_entry(handle_t *handle, struct ext4_filename *fname,
 			/* Set up root */
 			dx_set_count(entries, 1);
 			dx_set_block(entries + 0, newblock);
-			dxroot = (struct dx_root *)frames[0].bh->b_data;
-			dxroot->info.indirect_levels += 1;
+			info = dx_get_dx_info((struct ext4_dir_entry_2 *)
+					      frames[0].bh->b_data);
+			info->indirect_levels += 1;
 			dxtrace(printk(KERN_DEBUG "Creating %d level index...\n",
-				       dxroot->info.indirect_levels));
+				       info->indirect_levels));

From patchwork Mon Jul 22 01:23:38 2019
X-Patchwork-Id: 11051313
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown, Shaun Tancheff, Li Dongyang, Artem Blagodarenko, Yang Sheng
Cc: Lustre Development List
Date: Sun, 21 Jul 2019 21:23:38 -0400
Message-Id: <1563758631-29550-10-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1563758631-29550-1-git-send-email-jsimmons@infradead.org>
References: <1563758631-29550-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 09/22] ext4: fix mballoc pa free mismatch

Introduce pa_error so we can find any mballoc pa cleanup problems.
Signed-off-by: James Simmons --- fs/ext4/mballoc.c | 45 +++++++++++++++++++++++++++++++++++++++------ fs/ext4/mballoc.h | 2 ++ 2 files changed, 41 insertions(+), 6 deletions(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 483fc0f..463fba6 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -3863,6 +3863,7 @@ static void ext4_mb_put_pa(struct ext4_allocation_context *ac, INIT_LIST_HEAD(&pa->pa_group_list); pa->pa_deleted = 0; pa->pa_type = MB_INODE_PA; + pa->pa_error = 0; mb_debug(1, "new inode pa %p: %llu/%u for %u\n", pa, pa->pa_pstart, pa->pa_len, pa->pa_lstart); @@ -3924,6 +3925,7 @@ static void ext4_mb_put_pa(struct ext4_allocation_context *ac, INIT_LIST_HEAD(&pa->pa_group_list); pa->pa_deleted = 0; pa->pa_type = MB_GROUP_PA; + pa->pa_error = 0; mb_debug(1, "new group pa %p: %llu/%u for %u\n", pa, pa->pa_pstart, pa->pa_len, pa->pa_lstart); @@ -3983,7 +3985,9 @@ static int ext4_mb_new_preallocation(struct ext4_allocation_context *ac) unsigned long long grp_blk_start; int free = 0; + assert_spin_locked(ext4_group_lock_ptr(sb, e4b->bd_group)); BUG_ON(pa->pa_deleted == 0); + BUG_ON(pa->pa_inode == NULL); ext4_get_group_no_and_offset(sb, pa->pa_pstart, &group, &bit); grp_blk_start = pa->pa_pstart - EXT4_C2B(sbi, bit); BUG_ON(group != e4b->bd_group && pa->pa_len != 0); @@ -4006,12 +4010,19 @@ static int ext4_mb_new_preallocation(struct ext4_allocation_context *ac) mb_free_blocks(pa->pa_inode, e4b, bit, next - bit); bit = next + 1; } - if (free != pa->pa_free) { - ext4_msg(e4b->bd_sb, KERN_CRIT, - "pa %p: logic %lu, phys. %lu, len %lu", - pa, (unsigned long) pa->pa_lstart, - (unsigned long) pa->pa_pstart, - (unsigned long) pa->pa_len); + + /* "free < pa->pa_free" means we maybe double alloc the same blocks, + * otherwise maybe leave some free blocks unavailable, no need to BUG. 
+ */ + if ((free > pa->pa_free && !pa->pa_error) || (free < pa->pa_free)) { + ext4_error(sb, "pa free mismatch: [pa %p] " + "[phy %lu] [logic %lu] [len %u] [free %u] " + "[error %u] [inode %lu] [freed %u]", pa, + (unsigned long)pa->pa_pstart, + (unsigned long)pa->pa_lstart, + (unsigned)pa->pa_len, (unsigned)pa->pa_free, + (unsigned)pa->pa_error, pa->pa_inode->i_ino, + free); ext4_grp_locked_error(sb, group, 0, 0, "free %u, pa_free %u", free, pa->pa_free); /* @@ -4019,6 +4030,8 @@ static int ext4_mb_new_preallocation(struct ext4_allocation_context *ac) * from the bitmap and continue. */ } + /* do not verify if the file system is being umounted */ + BUG_ON(atomic_read(&sb->s_active) > 0 && pa->pa_free != free); atomic_add(free, &sbi->s_mb_discarded); return 0; @@ -4764,6 +4777,26 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle, ac->ac_b_ex.fe_len = 0; ar->len = 0; ext4_mb_show_ac(ac); + if (ac->ac_pa) { + struct ext4_prealloc_space *pa = ac->ac_pa; + + /* We can not make sure whether the bitmap has + * been updated or not when fail case. So can + * not revert pa_free back, just mark pa_error + */ + pa->pa_error++; + ext4_error(sb, + "Updating bitmap error: [err %d] " + "[pa %p] [phy %lu] [logic %lu] " + "[len %u] [free %u] [error %u] " + "[inode %lu]", *errp, pa, + (unsigned long)pa->pa_pstart, + (unsigned long)pa->pa_lstart, + (unsigned)pa->pa_len, + (unsigned)pa->pa_free, + (unsigned)pa->pa_error, + pa->pa_inode ? pa->pa_inode->i_ino : 0); + } } ext4_mb_release_context(ac); out: diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h index 8325ad9..e00c3b7 100644 --- a/fs/ext4/mballoc.h +++ b/fs/ext4/mballoc.h @@ -20,6 +20,7 @@ #include #include #include +#include #include "ext4_jbd2.h" #include "ext4.h" @@ -111,6 +112,7 @@ struct ext4_prealloc_space { ext4_grpblk_t pa_len; /* len of preallocated chunk */ ext4_grpblk_t pa_free; /* how many blocks are free */ unsigned short pa_type; /* pa type. 
inode or group */
+	unsigned short	pa_error;
 	spinlock_t	*pa_obj_lock;
 	struct inode	*pa_inode;	/* hack, for history only */
 };

From patchwork Mon Jul 22 01:23:39 2019
X-Patchwork-Id: 11051297
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown, Shaun Tancheff, Li Dongyang, Artem Blagodarenko, Yang Sheng
Cc: Lustre Development List
Date: Sun, 21 Jul 2019 21:23:39 -0400
Message-Id: <1563758631-29550-11-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1563758631-29550-1-git-send-email-jsimmons@infradead.org>
References: <1563758631-29550-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 10/22] ext4: add data in dentry feature

Lustre likes to stuff extra data into dentries.
Signed-off-by: James Simmons --- fs/ext4/dir.c | 14 ++-- fs/ext4/ext4.h | 92 +++++++++++++++++++++-- fs/ext4/inline.c | 16 ++-- fs/ext4/namei.c | 217 +++++++++++++++++++++++++++++++++++++++++++------------ fs/ext4/super.c | 4 +- 5 files changed, 278 insertions(+), 65 deletions(-) diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c index c7843b1..20e0d32 100644 --- a/fs/ext4/dir.c +++ b/fs/ext4/dir.c @@ -70,11 +70,11 @@ int __ext4_check_dir_entry(const char *function, unsigned int line, const int rlen = ext4_rec_len_from_disk(de->rec_len, dir->i_sb->s_blocksize); - if (unlikely(rlen < EXT4_DIR_REC_LEN(1))) + if (unlikely(rlen < __EXT4_DIR_REC_LEN(1))) error_msg = "rec_len is smaller than minimal"; else if (unlikely(rlen % 4 != 0)) error_msg = "rec_len % 4 != 0"; - else if (unlikely(rlen < EXT4_DIR_REC_LEN(de->name_len))) + else if (unlikely(rlen < __EXT4_DIR_REC_LEN(de->name_len))) error_msg = "rec_len is too small for name_len"; else if (unlikely(((char *) de - buf) + rlen > size)) error_msg = "directory entry overrun"; @@ -219,7 +219,7 @@ static int ext4_readdir(struct file *file, struct dir_context *ctx) * failure will be detected in the * dirent test below. 
*/ if (ext4_rec_len_from_disk(de->rec_len, - sb->s_blocksize) < EXT4_DIR_REC_LEN(1)) + sb->s_blocksize) < __EXT4_DIR_REC_LEN(1)) break; i += ext4_rec_len_from_disk(de->rec_len, sb->s_blocksize); @@ -441,13 +441,17 @@ int ext4_htree_store_dirent(struct file *dir_file, __u32 hash, struct rb_node **p, *parent = NULL; struct fname *fname, *new_fn; struct dir_private_info *info; + int extra_data = 0; int len; info = dir_file->private_data; p = &info->root.rb_node; /* Create and allocate the fname structure */ - len = sizeof(struct fname) + ent_name->len + 1; + if (dirent->file_type & EXT4_DIRENT_LUFID) + extra_data = ext4_get_dirent_data_len(dirent); + + len = sizeof(struct fname) + ent_name->len + extra_data + 1; new_fn = kzalloc(len, GFP_KERNEL); if (!new_fn) return -ENOMEM; @@ -456,7 +460,7 @@ int ext4_htree_store_dirent(struct file *dir_file, __u32 hash, new_fn->inode = le32_to_cpu(dirent->inode); new_fn->name_len = ent_name->len; new_fn->file_type = dirent->file_type; - memcpy(new_fn->name, ent_name->name, ent_name->len); + memcpy(new_fn->name, ent_name->name, ent_name->len + extra_data); new_fn->name[ent_name->len] = 0; while (*p) { diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index e321286..51b6159 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1069,6 +1069,7 @@ struct ext4_inode_info { __u32 i_csum_seed; kprojid_t i_projid; + void *i_dirdata; }; /* @@ -1112,6 +1113,7 @@ struct ext4_inode_info { #define EXT4_MOUNT_POSIX_ACL 0x08000 /* POSIX Access Control Lists */ #define EXT4_MOUNT_NO_AUTO_DA_ALLOC 0x10000 /* No auto delalloc mapping */ #define EXT4_MOUNT_BARRIER 0x20000 /* Use block barriers */ +#define EXT4_MOUNT_DIRDATA 0x40000 /* Data in directory entries */ #define EXT4_MOUNT_QUOTA 0x40000 /* Some quota option set */ #define EXT4_MOUNT_USRQUOTA 0x80000 /* "old" user quota, * enable enforcement for hidden @@ -1805,6 +1807,7 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei) EXT4_FEATURE_INCOMPAT_FLEX_BG| \ 
EXT4_FEATURE_INCOMPAT_EA_INODE| \ EXT4_FEATURE_INCOMPAT_MMP | \ + EXT4_FEATURE_INCOMPAT_DIRDATA | \ EXT4_FEATURE_INCOMPAT_INLINE_DATA | \ EXT4_FEATURE_INCOMPAT_ENCRYPT | \ EXT4_FEATURE_INCOMPAT_CASEFOLD | \ @@ -1984,6 +1987,43 @@ struct ext4_dir_entry_tail { #define EXT4_FT_SYMLINK 7 #define EXT4_FT_MAX 8 +#define EXT4_FT_MASK 0xf + +#if EXT4_FT_MAX > EXT4_FT_MASK +#error "conflicting EXT4_FT_MAX and EXT4_FT_MASK" +#endif + +/* + * d_type has 4 unused bits, so it can hold four types data. these different + * type of data (e.g. lustre data, high 32 bits of 64-bit inode number) can be + * stored, in flag order, after file-name in ext4 dirent. +*/ +/* + * this flag is added to d_type if ext4 dirent has extra data after + * filename. this data length is variable and length is stored in first byte + * of data. data start after filename NUL byte. + * This is used by Lustre FS. + */ +#define EXT4_DIRENT_LUFID 0x10 + +#define EXT4_LUFID_MAGIC 0xAD200907UL +struct ext4_dentry_param { + __u32 edp_magic; /* EXT4_LUFID_MAGIC */ + char edp_len; /* size of edp_data in bytes */ + char edp_data[0]; /* packed array of data */ +} __packed; + +static inline unsigned char *ext4_dentry_get_data(struct super_block *sb, + struct ext4_dentry_param *p) +{ + if (!ext4_has_feature_dirdata(sb)) + return NULL; + + if (p && p->edp_magic == EXT4_LUFID_MAGIC) + return &p->edp_len; + else + return NULL; +} #define EXT4_FT_DIR_CSUM 0xDE @@ -1994,8 +2034,11 @@ struct ext4_dir_entry_tail { */ #define EXT4_DIR_PAD 4 #define EXT4_DIR_ROUND (EXT4_DIR_PAD - 1) -#define EXT4_DIR_REC_LEN(name_len) (((name_len) + 8 + EXT4_DIR_ROUND) & \ +#define __EXT4_DIR_REC_LEN(name_len) (((name_len) + 8 + EXT4_DIR_ROUND) & \ ~EXT4_DIR_ROUND) +#define EXT4_DIR_REC_LEN(de) (__EXT4_DIR_REC_LEN((de)->name_len +\ + ext4_get_dirent_data_len(de))) + #define EXT4_MAX_REC_LEN ((1<<16)-1) /* @@ -2422,11 +2465,11 @@ extern int ext4_find_dest_de(struct inode *dir, struct inode *inode, struct buffer_head *bh, void *buf, int 
buf_size, struct ext4_filename *fname, - struct ext4_dir_entry_2 **dest_de); + struct ext4_dir_entry_2 **dest_de, int *dlen); void ext4_insert_dentry(struct inode *inode, struct ext4_dir_entry_2 *de, int buf_size, - struct ext4_filename *fname); + struct ext4_filename *fname, void *data); static inline void ext4_update_dx_flag(struct inode *inode) { if (!ext4_has_feature_dir_index(inode->i_sb)) @@ -2438,11 +2481,18 @@ static inline void ext4_update_dx_flag(struct inode *inode) static inline unsigned char get_dtype(struct super_block *sb, int filetype) { - if (!ext4_has_feature_filetype(sb) || filetype >= EXT4_FT_MAX) + int fl_index = filetype & EXT4_FT_MASK; + + if (!ext4_has_feature_filetype(sb) || fl_index >= EXT4_FT_MAX) return DT_UNKNOWN; - return ext4_filetype_table[filetype]; + if (!test_opt(sb, DIRDATA)) + return ext4_filetype_table[fl_index]; + + return (ext4_filetype_table[fl_index]) | + (filetype & EXT4_DIRENT_LUFID); } + extern int ext4_check_all_de(struct inode *dir, struct buffer_head *bh, void *buf, int buf_size); @@ -2604,6 +2654,8 @@ extern struct inode *ext4_create_inode(handle_t *handle, extern int ext4_delete_entry(handle_t *handle, struct inode * dir, struct ext4_dir_entry_2 *de_del, struct buffer_head *bh); +extern int ext4_add_dot_dotdot(handle_t *handle, struct inode *dir, + struct inode *inode, const void *, const void *); extern int ext4_htree_fill_tree(struct file *dir_file, __u32 start_hash, __u32 start_minor_hash, __u32 *next_hash); extern int ext4_search_dir(struct buffer_head *bh, @@ -3339,6 +3391,36 @@ static inline void ext4_clear_io_unwritten_flag(ext4_io_end_t *io_end) extern const struct iomap_ops ext4_iomap_ops; +/* + * Compute the total directory entry data length. + * This includes the filename and an implicit NUL terminator (always present), + * and optional extensions. Each extension has a bit set in the high 4 bits of + * de->file_type, and the extension length is the first byte in each entry. 
+ */ +static inline int ext4_get_dirent_data_len(struct ext4_dir_entry_2 *de) +{ + char *len = de->name + de->name_len + 1 /* NUL terminator */; + int dlen = 0; + u8 extra_data_flags = (de->file_type & ~EXT4_FT_MASK) >> 4; + struct ext4_dir_entry_tail *t = (struct ext4_dir_entry_tail *)de; + + if (!t->det_reserved_zero1 && + le16_to_cpu(t->det_rec_len) == + sizeof(struct ext4_dir_entry_tail) && + !t->det_reserved_zero2 && + t->det_reserved_ft == EXT4_FT_DIR_CSUM) + return 0; + + while (extra_data_flags) { + if (extra_data_flags & 1) { + dlen += *len + (dlen == 0); + len += *len; + } + extra_data_flags >>= 1; + } + return dlen; +} + #endif /* __KERNEL__ */ #define EFSBADCRC EBADMSG /* Bad CRC detected */ diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c index f73bc39..7610cfe 100644 --- a/fs/ext4/inline.c +++ b/fs/ext4/inline.c @@ -1023,7 +1023,7 @@ static int ext4_add_dirent_to_inline(handle_t *handle, struct ext4_dir_entry_2 *de; err = ext4_find_dest_de(dir, inode, iloc->bh, inline_start, - inline_size, fname, &de); + inline_size, fname, &de, NULL); if (err) return err; @@ -1031,7 +1031,7 @@ static int ext4_add_dirent_to_inline(handle_t *handle, err = ext4_journal_get_write_access(handle, iloc->bh); if (err) return err; - ext4_insert_dentry(inode, de, inline_size, fname); + ext4_insert_dentry(inode, de, inline_size, fname, NULL); ext4_show_inline_dir(dir, iloc->bh, inline_start, inline_size); @@ -1100,7 +1100,7 @@ static int ext4_update_inline_dir(handle_t *handle, struct inode *dir, int old_size = EXT4_I(dir)->i_inline_size - EXT4_MIN_INLINE_DATA_SIZE; int new_size = get_max_inline_xattr_value_size(dir, iloc); - if (new_size - old_size <= EXT4_DIR_REC_LEN(1)) + if (new_size - old_size <= __EXT4_DIR_REC_LEN(1)) return -ENOSPC; ret = ext4_update_inline_data(handle, dir, @@ -1381,7 +1381,7 @@ int htree_inlinedir_to_tree(struct file *dir_file, fake.name_len = 1; strcpy(fake.name, "."); fake.rec_len = ext4_rec_len_to_disk( - EXT4_DIR_REC_LEN(fake.name_len), + 
EXT4_DIR_REC_LEN(&fake), inline_size); ext4_set_de_type(inode->i_sb, &fake, S_IFDIR); de = &fake; @@ -1391,7 +1391,7 @@ int htree_inlinedir_to_tree(struct file *dir_file, fake.name_len = 2; strcpy(fake.name, ".."); fake.rec_len = ext4_rec_len_to_disk( - EXT4_DIR_REC_LEN(fake.name_len), + EXT4_DIR_REC_LEN(&fake), inline_size); ext4_set_de_type(inode->i_sb, &fake, S_IFDIR); de = &fake; @@ -1489,8 +1489,8 @@ int ext4_read_inline_dir(struct file *file, * So we will use extra_offset and extra_size to indicate them * during the inline dir iteration. */ - dotdot_offset = EXT4_DIR_REC_LEN(1); - dotdot_size = dotdot_offset + EXT4_DIR_REC_LEN(2); + dotdot_offset = __EXT4_DIR_REC_LEN(1); + dotdot_size = dotdot_offset + __EXT4_DIR_REC_LEN(2); extra_offset = dotdot_size - EXT4_INLINE_DOTDOT_SIZE; extra_size = extra_offset + inline_size; @@ -1525,7 +1525,7 @@ int ext4_read_inline_dir(struct file *file, * failure will be detected in the * dirent test below. */ if (ext4_rec_len_from_disk(de->rec_len, extra_size) - < EXT4_DIR_REC_LEN(1)) + < __EXT4_DIR_REC_LEN(1)) break; i += ext4_rec_len_from_disk(de->rec_len, extra_size); diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index 4c45570..9cb86e4 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -243,7 +243,9 @@ struct dx_tail { static unsigned dx_get_limit(struct dx_entry *entries); static void dx_set_count(struct dx_entry *entries, unsigned value); static void dx_set_limit(struct dx_entry *entries, unsigned value); -static unsigned dx_root_limit(struct inode *dir, unsigned infosize); +static unsigned int inline dx_root_limit(struct inode *dir, + struct ext4_dir_entry_2 *dot_de, + unsigned int infosize); static unsigned dx_node_limit(struct inode *dir); static struct dx_frame *dx_probe(struct ext4_filename *fname, struct inode *dir, @@ -384,24 +386,26 @@ static struct dx_countlimit *get_dx_countlimit(struct inode *inode, struct ext4_dir_entry *dirent, int *offset) { + int dot_rec_len, dotdot_rec_len; struct ext4_dir_entry *dp; 
struct dx_root_info *root; int count_offset; if (le16_to_cpu(dirent->rec_len) == EXT4_BLOCK_SIZE(inode->i_sb)) count_offset = 8; - else if (le16_to_cpu(dirent->rec_len) == 12) { - dp = (struct ext4_dir_entry *)(((void *)dirent) + 12); + else { + dot_rec_len = le16_to_cpu(dirent->rec_len); + dp = (struct ext4_dir_entry *)(((void *)dirent) + dot_rec_len); if (le16_to_cpu(dp->rec_len) != - EXT4_BLOCK_SIZE(inode->i_sb) - 12) + EXT4_BLOCK_SIZE(inode->i_sb) - dot_rec_len) return NULL; - root = (struct dx_root_info *)(((void *)dp + 12)); + dotdot_rec_len = EXT4_DIR_REC_LEN((struct ext4_dir_entry_2 *)dp); + root = (struct dx_root_info *)(((void *)dp + dotdot_rec_len)); if (root->reserved_zero || root->info_length != sizeof(struct dx_root_info)) return NULL; - count_offset = 32; - } else - return NULL; + count_offset = 8 + dot_rec_len + dotdot_rec_len; + } if (offset) *offset = count_offset; @@ -507,10 +511,10 @@ static inline int ext4_handle_dirty_dx_node(handle_t *handle, struct dx_root_info *dx_get_dx_info(struct ext4_dir_entry_2 *de) { /* get dotdot first */ - de = (struct ext4_dir_entry_2 *)((char *)de + EXT4_DIR_REC_LEN(1)); + de = (struct ext4_dir_entry_2 *)((char *)de + EXT4_DIR_REC_LEN(de)); /* dx root info is after dotdot entry */ - de = (struct ext4_dir_entry_2 *)((char *)de + EXT4_DIR_REC_LEN(2)); + de = (struct ext4_dir_entry_2 *)((char *)de + EXT4_DIR_REC_LEN(de)); return (struct dx_root_info *)de; } @@ -555,10 +559,17 @@ static inline void dx_set_limit(struct dx_entry *entries, unsigned value) ((struct dx_countlimit *) entries)->limit = cpu_to_le16(value); } -static inline unsigned dx_root_limit(struct inode *dir, unsigned infosize) +static inline unsigned int dx_root_limit(struct inode *dir, + struct ext4_dir_entry_2 *dot_de, + unsigned int infosize) { - unsigned entry_space = dir->i_sb->s_blocksize - EXT4_DIR_REC_LEN(1) - - EXT4_DIR_REC_LEN(2) - infosize; + struct ext4_dir_entry_2 *dotdot_de; + unsigned int entry_space; + + BUG_ON(dot_de->name_len != 1); + 
dotdot_de = ext4_next_entry(dot_de, dir->i_sb->s_blocksize); + entry_space = dir->i_sb->s_blocksize - EXT4_DIR_REC_LEN(dot_de) - + EXT4_DIR_REC_LEN(dotdot_de) - infosize; if (ext4_has_metadata_csum(dir->i_sb)) entry_space -= sizeof(struct dx_tail); @@ -567,7 +578,7 @@ static inline unsigned dx_root_limit(struct inode *dir, unsigned infosize) static inline unsigned dx_node_limit(struct inode *dir) { - unsigned entry_space = dir->i_sb->s_blocksize - EXT4_DIR_REC_LEN(0); + unsigned entry_space = dir->i_sb->s_blocksize - __EXT4_DIR_REC_LEN(0); if (ext4_has_metadata_csum(dir->i_sb)) entry_space -= sizeof(struct dx_tail); @@ -679,7 +690,7 @@ static struct stats dx_show_leaf(struct inode *dir, (unsigned) ((char *) de - base)); #endif } - space += EXT4_DIR_REC_LEN(de->name_len); + space += EXT4_DIR_REC_LEN(de); names++; } de = ext4_next_entry(de, size); @@ -787,11 +798,14 @@ struct stats dx_show_entries(struct dx_hash_info *hinfo, struct inode *dir, entries = (struct dx_entry *)(((char *)info) + info->info_length); - if (dx_get_limit(entries) != dx_root_limit(dir, - info->info_length)) { + if (dx_get_limit(entries) != + dx_root_limit(dir, (struct ext4_dir_entry_2 *)frame->bh->b_data, + info->info_length)) { ext4_warning_inode(dir, "dx entry: limit %u != root limit %u", dx_get_limit(entries), - dx_root_limit(dir, info->info_length)); + dx_root_limit(dir, + (struct ext4_dir_entry_2 *)frame->bh->b_data, + info->info_length)); goto fail; } @@ -986,7 +1000,7 @@ static int htree_dirblock_to_tree(struct file *dir_file, de = (struct ext4_dir_entry_2 *) bh->b_data; top = (struct ext4_dir_entry_2 *) ((char *) de + dir->i_sb->s_blocksize - - EXT4_DIR_REC_LEN(0)); + __EXT4_DIR_REC_LEN(0)); #ifdef CONFIG_FS_ENCRYPTION /* Check if the directory is encrypted */ if (IS_ENCRYPTED(dir)) { @@ -1743,7 +1757,7 @@ struct dentry *ext4_get_parent(struct dentry *child) while (count--) { struct ext4_dir_entry_2 *de = (struct ext4_dir_entry_2 *) (from + (map->offs<<2)); - rec_len = 
EXT4_DIR_REC_LEN(de->name_len); + rec_len = EXT4_DIR_REC_LEN(de); memcpy (to, de, rec_len); ((struct ext4_dir_entry_2 *) to)->rec_len = ext4_rec_len_to_disk(rec_len, blocksize); @@ -1767,7 +1781,7 @@ static struct ext4_dir_entry_2* dx_pack_dirents(char *base, unsigned blocksize) while ((char*)de < base + blocksize) { next = ext4_next_entry(de, blocksize); if (de->inode && de->name_len) { - rec_len = EXT4_DIR_REC_LEN(de->name_len); + rec_len = EXT4_DIR_REC_LEN(de); if (de > to) memmove(to, de, rec_len); to->rec_len = ext4_rec_len_to_disk(rec_len, blocksize); @@ -1898,14 +1912,16 @@ int ext4_find_dest_de(struct inode *dir, struct inode *inode, struct buffer_head *bh, void *buf, int buf_size, struct ext4_filename *fname, - struct ext4_dir_entry_2 **dest_de) + struct ext4_dir_entry_2 **dest_de, int *dlen) { struct ext4_dir_entry_2 *de; - unsigned short reclen = EXT4_DIR_REC_LEN(fname_len(fname)); + unsigned short reclen = __EXT4_DIR_REC_LEN(fname_len(fname)) + + (dlen ? *dlen : 0); int nlen, rlen; unsigned int offset = 0; char *top; + dlen ? *dlen = 0 : 0; /* default set to 0 */ de = (struct ext4_dir_entry_2 *)buf; top = buf + buf_size - reclen; while ((char *) de <= top) { @@ -1914,10 +1930,29 @@ int ext4_find_dest_de(struct inode *dir, struct inode *inode, return -EFSCORRUPTED; if (ext4_match(dir, fname, de)) return -EEXIST; - nlen = EXT4_DIR_REC_LEN(de->name_len); + nlen = EXT4_DIR_REC_LEN(de); rlen = ext4_rec_len_from_disk(de->rec_len, buf_size); if ((de->inode ? rlen - nlen : rlen) >= reclen) break; + /* Then for dotdot entries, check for the smaller space + * required for just the entry, no FID + */ + if (fname_len(fname) == 2 && memcmp(fname_name(fname), "..", 2) == 0) { + if ((de->inode ? rlen - nlen : rlen) >= + __EXT4_DIR_REC_LEN(fname_len(fname))) { + /* set dlen=1 to indicate not + * enough space store fid + */ + dlen ? *dlen = 1 : 0; + break; + } + /* The new ".." entry must be written over the + * previous ".." 
entry, which is the first + * entry traversed by this scan. If it doesn't + * fit, something is badly wrong, so -EIO. + */ + return -EIO; + } de = (struct ext4_dir_entry_2 *)((char *)de + rlen); offset += rlen; } @@ -1931,12 +1966,12 @@ int ext4_find_dest_de(struct inode *dir, struct inode *inode, void ext4_insert_dentry(struct inode *inode, struct ext4_dir_entry_2 *de, int buf_size, - struct ext4_filename *fname) + struct ext4_filename *fname, void *data ) { int nlen, rlen; - nlen = EXT4_DIR_REC_LEN(de->name_len); + nlen = EXT4_DIR_REC_LEN(de); rlen = ext4_rec_len_from_disk(de->rec_len, buf_size); if (de->inode) { struct ext4_dir_entry_2 *de1 = @@ -1950,6 +1985,11 @@ void ext4_insert_dentry(struct inode *inode, ext4_set_de_type(inode->i_sb, de, inode->i_mode); de->name_len = fname_len(fname); memcpy(de->name, fname_name(fname), fname_len(fname)); + if (data) { + de->name[fname_len(fname)] = 0; + memcpy(&de->name[fname_len(fname) + 1], data, *(char *)data); + de->file_type |= EXT4_DIRENT_LUFID; + } } /* @@ -1966,15 +2006,21 @@ static int add_dirent_to_buf(handle_t *handle, struct ext4_filename *fname, struct buffer_head *bh) { unsigned int blocksize = dir->i_sb->s_blocksize; + unsigned char *data; int csum_size = 0; + int dlen = 0; int err; + data = ext4_dentry_get_data(inode->i_sb, + (struct ext4_dentry_param *)EXT4_I(inode)->i_dirdata); if (ext4_has_metadata_csum(inode->i_sb)) csum_size = sizeof(struct ext4_dir_entry_tail); if (!de) { + if (data) + dlen = (*data) + 1; err = ext4_find_dest_de(dir, inode, bh, bh->b_data, - blocksize - csum_size, fname, &de); + blocksize - csum_size, fname, &de, &dlen); if (err) return err; } @@ -1986,7 +2032,10 @@ static int add_dirent_to_buf(handle_t *handle, struct ext4_filename *fname, } /* By now the buffer is marked for journaling */ - ext4_insert_dentry(inode, de, blocksize, fname); + /* If writing the short form of "dotdot", don't add the data section */ + if (dlen == 1) + data = NULL; + ext4_insert_dentry(inode, de, 
blocksize, fname, data); /* * XXX shouldn't update any times until successful @@ -2095,7 +2144,8 @@ static int make_indexed_dir(handle_t *handle, struct ext4_filename *fname, dx_set_block(entries, 1); dx_set_count(entries, 1); - dx_set_limit(entries, dx_root_limit(dir, sizeof(*dx_info))); + dx_set_limit(entries, dx_root_limit(dir, dot_de, + sizeof(*dx_info))); /* Initialize as for dx_probe */ fname->hinfo.hash_version = dx_info->hash_version; @@ -2145,6 +2195,8 @@ static int ext4_update_dotdot(handle_t *handle, struct dentry *dentry, struct buffer_head *dir_block; struct ext4_dir_entry_2 *de; int len, journal = 0, err = 0; + int dlen = 0; + char *data; if (IS_ERR(handle)) return PTR_ERR(handle); @@ -2162,19 +2214,24 @@ static int ext4_update_dotdot(handle_t *handle, struct dentry *dentry, /* the first item must be "." */ assert(de->name_len == 1 && de->name[0] == '.'); len = le16_to_cpu(de->rec_len); - assert(len >= EXT4_DIR_REC_LEN(1)); - if (len > EXT4_DIR_REC_LEN(1)) { + assert(len >= __EXT4_DIR_REC_LEN(1)); + if (len > __EXT4_DIR_REC_LEN(1)) { BUFFER_TRACE(dir_block, "get_write_access"); err = ext4_journal_get_write_access(handle, dir_block); if (err) goto out_journal; journal = 1; - de->rec_len = cpu_to_le16(EXT4_DIR_REC_LEN(1)); + de->rec_len = cpu_to_le16(EXT4_DIR_REC_LEN(de)); } - len -= EXT4_DIR_REC_LEN(1); - assert(len == 0 || len >= EXT4_DIR_REC_LEN(2)); + len -= EXT4_DIR_REC_LEN(de); + data = ext4_dentry_get_data(dir->i_sb, + (struct ext4_dentry_param *)dentry->d_fsdata); + if (data) + dlen = *data + 1; + assert(len == 0 || len >= __EXT4_DIR_REC_LEN(2 + dlen)); + de = (struct ext4_dir_entry_2 *) ((char *) de + le16_to_cpu(de->rec_len)); if (!journal) { @@ -2188,10 +2245,15 @@ static int ext4_update_dotdot(handle_t *handle, struct dentry *dentry, if (len > 0) de->rec_len = cpu_to_le16(len); else - assert(le16_to_cpu(de->rec_len) >= EXT4_DIR_REC_LEN(2)); + assert(le16_to_cpu(de->rec_len) >= __EXT4_DIR_REC_LEN(2)); de->name_len = 2; strcpy(de->name, ".."); 
- ext4_set_de_type(dir->i_sb, de, S_IFDIR); + if (data && ext4_get_dirent_data_len(de) >= dlen) { + de->name[2] = 0; + memcpy(&de->name[2 + 1], data, *data); + ext4_set_de_type(dir->i_sb, de, S_IFDIR); + de->file_type |= EXT4_DIRENT_LUFID; + } out_journal: if (journal) { @@ -2230,6 +2292,7 @@ static int ext4_add_entry(handle_t *handle, struct dentry *dentry, ext4_lblk_t block, blocks; int csum_size = 0; + EXT4_I(inode)->i_dirdata = dentry->d_fsdata; if (ext4_has_metadata_csum(inode->i_sb)) csum_size = sizeof(struct ext4_dir_entry_tail); @@ -2757,37 +2820,71 @@ static int ext4_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode) return err; } +struct tp_block { + struct inode *inode; + void *data1; + void *data2; +}; + struct ext4_dir_entry_2 *ext4_init_dot_dotdot(struct inode *inode, struct ext4_dir_entry_2 *de, int blocksize, int csum_size, unsigned int parent_ino, int dotdot_real_len) { + void *data1 = NULL, *data2 = NULL; + int dot_reclen = 0; + + if (dotdot_real_len == 10) { + struct tp_block *tpb = (struct tp_block *)inode; + + data1 = tpb->data1; + data2 = tpb->data2; + inode = tpb->inode; + dotdot_real_len = 0; + } de->inode = cpu_to_le32(inode->i_ino); de->name_len = 1; - de->rec_len = ext4_rec_len_to_disk(EXT4_DIR_REC_LEN(de->name_len), - blocksize); strcpy(de->name, "."); ext4_set_de_type(inode->i_sb, de, S_IFDIR); + /* get packed fid data*/ + data1 = ext4_dentry_get_data(inode->i_sb, + (struct ext4_dentry_param *) data1); + if (data1) { + de->name[1] = 0; + memcpy(&de->name[2], data1, *(char *) data1); + de->file_type |= EXT4_DIRENT_LUFID; + } + de->rec_len = cpu_to_le16(EXT4_DIR_REC_LEN(de)); + dot_reclen = cpu_to_le16(de->rec_len); de = ext4_next_entry(de, blocksize); de->inode = cpu_to_le32(parent_ino); de->name_len = 2; + strcpy(de->name, ".."); + ext4_set_de_type(inode->i_sb, de, S_IFDIR); + data2 = ext4_dentry_get_data(inode->i_sb, + (struct ext4_dentry_param *) data2); + if (data2) { + de->name[2] = 0; + memcpy(&de->name[3], data2, 
*(char *) data2); + de->file_type |= EXT4_DIRENT_LUFID; + } if (!dotdot_real_len) de->rec_len = ext4_rec_len_to_disk(blocksize - - (csum_size + EXT4_DIR_REC_LEN(1)), + (csum_size + dot_reclen), blocksize); else de->rec_len = ext4_rec_len_to_disk( - EXT4_DIR_REC_LEN(de->name_len), blocksize); - strcpy(de->name, ".."); - ext4_set_de_type(inode->i_sb, de, S_IFDIR); + EXT4_DIR_REC_LEN(de), blocksize); return ext4_next_entry(de, blocksize); } static int ext4_init_new_dir(handle_t *handle, struct inode *dir, - struct inode *inode) + struct inode *inode, + const void *data1, const void *data2) { + struct tp_block param; struct buffer_head *dir_block = NULL; struct ext4_dir_entry_2 *de; struct ext4_dir_entry_tail *t; @@ -2812,7 +2909,11 @@ static int ext4_init_new_dir(handle_t *handle, struct inode *dir, if (IS_ERR(dir_block)) return PTR_ERR(dir_block); de = (struct ext4_dir_entry_2 *)dir_block->b_data; - ext4_init_dot_dotdot(inode, de, blocksize, csum_size, dir->i_ino, 0); + param.inode = inode; + param.data1 = (void *)data1; + param.data2 = (void *)data2; + ext4_init_dot_dotdot((struct inode *)(&param), de, blocksize, + csum_size, dir->i_ino, 10); set_nlink(inode, 2); if (csum_size) { t = EXT4_DIRENT_TAIL(dir_block->b_data, blocksize); @@ -2829,6 +2930,30 @@ static int ext4_init_new_dir(handle_t *handle, struct inode *dir, return err; } +/* Initialize @inode as a subdirectory of @dir, and add the + * "." and ".." entries into the first directory block.
+ */ +int ext4_add_dot_dotdot(handle_t *handle, struct inode *dir, + struct inode *inode, + const void *data1, const void *data2) +{ + int rc; + + if (IS_ERR(handle)) + return PTR_ERR(handle); + + if (IS_DIRSYNC(dir)) + ext4_handle_sync(handle); + + inode->i_op = &ext4_dir_inode_operations; + inode->i_fop = &ext4_dir_operations; + rc = ext4_init_new_dir(handle, dir, inode, data1, data2); + if (!rc) + rc = ext4_mark_inode_dirty(handle, inode); + return rc; +} +EXPORT_SYMBOL(ext4_add_dot_dotdot); + static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode) { handle_t *handle; @@ -2855,7 +2980,7 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode) inode->i_op = &ext4_dir_inode_operations; inode->i_fop = &ext4_dir_operations; - err = ext4_init_new_dir(handle, dir, inode); + err = ext4_init_new_dir(handle, dir, inode, NULL, NULL); if (err) goto out_clear_inode; err = ext4_mark_inode_dirty(handle, inode); @@ -2906,7 +3031,7 @@ bool ext4_empty_dir(struct inode *inode) } sb = inode->i_sb; - if (inode->i_size < EXT4_DIR_REC_LEN(1) + EXT4_DIR_REC_LEN(2)) { + if (inode->i_size < __EXT4_DIR_REC_LEN(1) + __EXT4_DIR_REC_LEN(2)) { EXT4_ERROR_INODE(inode, "invalid size"); return true; } diff --git a/fs/ext4/super.c b/fs/ext4/super.c index d15c26c..564eb35 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1447,7 +1447,7 @@ enum { Opt_data_err_abort, Opt_data_err_ignore, Opt_test_dummy_encryption, Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota, Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_jqfmt_vfsv1, Opt_quota, - Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err, + Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err, Opt_dirdata, Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version, Opt_dax, Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_warn_on_error, Opt_nowarn_on_error, Opt_mblk_io_submit, @@ -1474,6 +1474,7 @@ enum { {Opt_err_ro, "errors=remount-ro"}, {Opt_nouid32, "nouid32"}, {Opt_debug, "debug"}, + {Opt_dirdata, 
"dirdata"}, {Opt_removed, "oldalloc"}, {Opt_removed, "orlov"}, {Opt_user_xattr, "user_xattr"}, @@ -1747,6 +1748,7 @@ static int clear_qf_name(struct super_block *sb, int qtype) {Opt_grpjquota, 0, MOPT_Q}, {Opt_offusrjquota, 0, MOPT_Q}, {Opt_offgrpjquota, 0, MOPT_Q}, + {Opt_dirdata, EXT4_MOUNT_DIRDATA, MOPT_SET}, {Opt_jqfmt_vfsold, QFMT_VFS_OLD, MOPT_QFMT}, {Opt_jqfmt_vfsv0, QFMT_VFS_V0, MOPT_QFMT}, {Opt_jqfmt_vfsv1, QFMT_VFS_V1, MOPT_QFMT}, From patchwork Mon Jul 22 01:23:40 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11051317 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DB4E913AC for ; Mon, 22 Jul 2019 01:24:55 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C704826E4A for ; Mon, 22 Jul 2019 01:24:55 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id BB137284AA; Mon, 22 Jul 2019 01:24:55 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 850F126E4A for ; Mon, 22 Jul 2019 01:24:55 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B723021FDEB; Sun, 21 Jul 2019 18:24:35 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from 
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown, Shaun Tancheff, Li Dongyang, Artem Blagodarenko, Yang Sheng
Cc: Lustre Development List
Date: Sun, 21 Jul 2019 21:23:40 -0400
Message-Id: <1563758631-29550-12-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1563758631-29550-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 11/22] ext4: override current_time

Return i_ctime instead of the current time if IS_NOCMTIME is set on the inode.
Signed-off-by: James Simmons --- fs/ext4/ext4.h | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 51b6159..80601a9 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -661,6 +661,13 @@ enum { #define EXT4_GOING_FLAGS_LOGFLUSH 0x1 /* flush log but not data */ #define EXT4_GOING_FLAGS_NOLOGFLUSH 0x2 /* don't flush log nor data */ +static inline struct timespec64 ext4_current_time(struct inode *inode) +{ + if (IS_NOCMTIME(inode)) + return inode->i_ctime; + return current_time(inode); +} +#define current_time(a) ext4_current_time(a) #if defined(__KERNEL__) && defined(CONFIG_COMPAT) /* From patchwork Mon Jul 22 01:23:41 2019
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown, Shaun Tancheff, Li Dongyang, Artem Blagodarenko, Yang Sheng
Cc: Lustre Development List
Date: Sun, 21 Jul 2019 21:23:41 -0400
Message-Id: <1563758631-29550-13-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1563758631-29550-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 12/22] ext4: add htree lock implementation
Used for parallel lookup Signed-off-by: James Simmons --- fs/ext4/Makefile | 4 +- fs/ext4/ext4.h | 80 +++++ fs/ext4/htree_lock.c | 891 +++++++++++++++++++++++++++++++++++++++++++++++++++ fs/ext4/htree_lock.h | 187 +++++++++++ fs/ext4/namei.c | 461 +++++++++++++++++++++++--- fs/ext4/super.c | 1 + 6 files changed, 1583 insertions(+), 41 deletions(-) create mode 100644 fs/ext4/htree_lock.c create mode 100644 fs/ext4/htree_lock.h diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile index 8fdfcd3..0d0a28f 100644 --- a/fs/ext4/Makefile +++ b/fs/ext4/Makefile @@ -6,8 +6,8 @@ obj-$(CONFIG_EXT4_FS) += ext4.o ext4-y := balloc.o bitmap.o block_validity.o dir.o ext4_jbd2.o extents.o \ - extents_status.o file.o fsmap.o fsync.o hash.o ialloc.o \ - indirect.o inline.o inode.o ioctl.o mballoc.o migrate.o \ + extents_status.o file.o fsmap.o fsync.o hash.o htree_lock.o \ + ialloc.o indirect.o inline.o inode.o ioctl.o mballoc.o migrate.o \ mmp.o move_extent.o namei.o page-io.o readpage.o resize.o \ super.o symlink.o sysfs.o xattr.o xattr_trusted.o xattr_user.o diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 80601a9..bf74c7c 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -44,6 +44,8 @@ #include +#include "htree_lock.h" + /* * The fourth extended filesystem constants/structures */ @@ -936,6 +938,9 @@ struct ext4_inode_info { __u32 i_dtime; ext4_fsblk_t i_file_acl; + /* following fields for parallel directory operations -bzzz */ + struct semaphore i_append_sem; + /* * i_block_group is the number of the block group which contains * this file's inode.
Constant across the lifetime of the inode, @@ -2145,6 +2150,72 @@ struct dx_hash_info */ #define HASH_NB_ALWAYS 1 +/* assume name-hash is protected by upper layer */ +#define EXT4_HTREE_LOCK_HASH 0 + +enum ext4_pdo_lk_types { +#if EXT4_HTREE_LOCK_HASH + EXT4_LK_HASH, +#endif + EXT4_LK_DX, /* index block */ + EXT4_LK_DE, /* directory entry block */ + EXT4_LK_SPIN, /* spinlock */ + EXT4_LK_MAX, +}; + +/* read-only bit */ +#define EXT4_LB_RO(b) (1 << (b)) +/* read + write, high bits for writer */ +#define EXT4_LB_RW(b) ((1 << (b)) | (1 << (EXT4_LK_MAX + (b)))) + +enum ext4_pdo_lock_bits { + /* DX lock bits */ + EXT4_LB_DX_RO = EXT4_LB_RO(EXT4_LK_DX), + EXT4_LB_DX = EXT4_LB_RW(EXT4_LK_DX), + /* DE lock bits */ + EXT4_LB_DE_RO = EXT4_LB_RO(EXT4_LK_DE), + EXT4_LB_DE = EXT4_LB_RW(EXT4_LK_DE), + /* DX spinlock bits */ + EXT4_LB_SPIN_RO = EXT4_LB_RO(EXT4_LK_SPIN), + EXT4_LB_SPIN = EXT4_LB_RW(EXT4_LK_SPIN), + /* accurate searching */ + EXT4_LB_EXACT = EXT4_LB_RO(EXT4_LK_MAX << 1), +}; + +enum ext4_pdo_lock_opc { + /* external */ + EXT4_HLOCK_READDIR = (EXT4_LB_DE_RO | EXT4_LB_DX_RO), + EXT4_HLOCK_LOOKUP = (EXT4_LB_DE_RO | EXT4_LB_SPIN_RO | + EXT4_LB_EXACT), + EXT4_HLOCK_DEL = (EXT4_LB_DE | EXT4_LB_SPIN_RO | + EXT4_LB_EXACT), + EXT4_HLOCK_ADD = (EXT4_LB_DE | EXT4_LB_SPIN_RO), + + /* internal */ + EXT4_HLOCK_LOOKUP_SAFE = (EXT4_LB_DE_RO | EXT4_LB_DX_RO | + EXT4_LB_EXACT), + EXT4_HLOCK_DEL_SAFE = (EXT4_LB_DE | EXT4_LB_DX_RO | EXT4_LB_EXACT), + EXT4_HLOCK_SPLIT = (EXT4_LB_DE | EXT4_LB_DX | EXT4_LB_SPIN), +}; + +extern struct htree_lock_head *ext4_htree_lock_head_alloc(unsigned hbits); +#define ext4_htree_lock_head_free(lhead) htree_lock_head_free(lhead) + +extern struct htree_lock *ext4_htree_lock_alloc(void); +#define ext4_htree_lock_free(lck) htree_lock_free(lck) + +extern void ext4_htree_lock(struct htree_lock *lck, + struct htree_lock_head *lhead, + struct inode *dir, unsigned flags); +#define ext4_htree_unlock(lck) htree_unlock(lck) + +extern struct buffer_head 
*ext4_find_entry_locked(struct inode *dir, + const struct qstr *d_name, + struct ext4_dir_entry_2 **res_dir, + int *inlined, struct htree_lock *lck); +extern int ext4_add_entry_locked(handle_t *handle, struct dentry *dentry, + struct inode *inode, struct htree_lock *lck); + struct ext4_filename { const struct qstr *usr_fname; struct fscrypt_str disk_name; @@ -2479,8 +2550,17 @@ void ext4_insert_dentry(struct inode *inode, struct ext4_filename *fname, void *data); static inline void ext4_update_dx_flag(struct inode *inode) { + /* Disable it for ldiskfs, because going from a DX directory to + * a non-DX directory while it is in use will completely break + * the htree-locking. + * If we really want to support this operation in the future, + * we need to exclusively lock the directory at here which will + * increase complexity of code + */ +#if 0 if (!ext4_has_feature_dir_index(inode->i_sb)) ext4_clear_inode_flag(inode, EXT4_INODE_INDEX); +#endif } static const unsigned char ext4_filetype_table[] = { DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK diff --git a/fs/ext4/htree_lock.c b/fs/ext4/htree_lock.c new file mode 100644 index 0000000..9c1e8f0 --- /dev/null +++ b/fs/ext4/htree_lock.c @@ -0,0 +1,891 @@ +/* + * fs/ext4/htree_lock.c + * + * Copyright (c) 2011, 2012, Intel Corporation. 
+ * + * Author: Liang Zhen + */ +#include +#include +#include +#include "htree_lock.h" + +enum { + HTREE_LOCK_BIT_EX = (1 << HTREE_LOCK_EX), + HTREE_LOCK_BIT_PW = (1 << HTREE_LOCK_PW), + HTREE_LOCK_BIT_PR = (1 << HTREE_LOCK_PR), + HTREE_LOCK_BIT_CW = (1 << HTREE_LOCK_CW), + HTREE_LOCK_BIT_CR = (1 << HTREE_LOCK_CR), +}; + +enum { + HTREE_LOCK_COMPAT_EX = 0, + HTREE_LOCK_COMPAT_PW = HTREE_LOCK_COMPAT_EX | HTREE_LOCK_BIT_CR, + HTREE_LOCK_COMPAT_PR = HTREE_LOCK_COMPAT_PW | HTREE_LOCK_BIT_PR, + HTREE_LOCK_COMPAT_CW = HTREE_LOCK_COMPAT_PW | HTREE_LOCK_BIT_CW, + HTREE_LOCK_COMPAT_CR = HTREE_LOCK_COMPAT_CW | HTREE_LOCK_BIT_PR | + HTREE_LOCK_BIT_PW, +}; + +static int htree_lock_compat[] = { + [HTREE_LOCK_EX] HTREE_LOCK_COMPAT_EX, + [HTREE_LOCK_PW] HTREE_LOCK_COMPAT_PW, + [HTREE_LOCK_PR] HTREE_LOCK_COMPAT_PR, + [HTREE_LOCK_CW] HTREE_LOCK_COMPAT_CW, + [HTREE_LOCK_CR] HTREE_LOCK_COMPAT_CR, +}; + +/* max allowed htree-lock depth. + * We only need depth=3 for ext4 although user can have higher value. 
*/ +#define HTREE_LOCK_DEP_MAX 16 + +#ifdef HTREE_LOCK_DEBUG + +static char *hl_name[] = { + [HTREE_LOCK_EX] "EX", + [HTREE_LOCK_PW] "PW", + [HTREE_LOCK_PR] "PR", + [HTREE_LOCK_CW] "CW", + [HTREE_LOCK_CR] "CR", +}; + +/* lock stats */ +struct htree_lock_node_stats { + unsigned long long blocked[HTREE_LOCK_MAX]; + unsigned long long granted[HTREE_LOCK_MAX]; + unsigned long long retried[HTREE_LOCK_MAX]; + unsigned long long events; +}; + +struct htree_lock_stats { + struct htree_lock_node_stats nodes[HTREE_LOCK_DEP_MAX]; + unsigned long long granted[HTREE_LOCK_MAX]; + unsigned long long blocked[HTREE_LOCK_MAX]; +}; + +static struct htree_lock_stats hl_stats; + +void htree_lock_stat_reset(void) +{ + memset(&hl_stats, 0, sizeof(hl_stats)); +} + +void htree_lock_stat_print(int depth) +{ + int i; + int j; + + printk(KERN_DEBUG "HTREE LOCK STATS:\n"); + for (i = 0; i < HTREE_LOCK_MAX; i++) { + printk(KERN_DEBUG "[%s]: G [%10llu], B [%10llu]\n", + hl_name[i], hl_stats.granted[i], hl_stats.blocked[i]); + } + for (i = 0; i < depth; i++) { + printk(KERN_DEBUG "HTREE CHILD [%d] STATS:\n", i); + for (j = 0; j < HTREE_LOCK_MAX; j++) { + printk(KERN_DEBUG + "[%s]: G [%10llu], B [%10llu], R [%10llu]\n", + hl_name[j], hl_stats.nodes[i].granted[j], + hl_stats.nodes[i].blocked[j], + hl_stats.nodes[i].retried[j]); + } + } +} + +#define lk_grant_inc(m) do { hl_stats.granted[m]++; } while (0) +#define lk_block_inc(m) do { hl_stats.blocked[m]++; } while (0) +#define ln_grant_inc(d, m) do { hl_stats.nodes[d].granted[m]++; } while (0) +#define ln_block_inc(d, m) do { hl_stats.nodes[d].blocked[m]++; } while (0) +#define ln_retry_inc(d, m) do { hl_stats.nodes[d].retried[m]++; } while (0) +#define ln_event_inc(d) do { hl_stats.nodes[d].events++; } while (0) + +#else /* !DEBUG */ + +void htree_lock_stat_reset(void) {} +void htree_lock_stat_print(int depth) {} + +#define lk_grant_inc(m) do {} while (0) +#define lk_block_inc(m) do {} while (0) +#define ln_grant_inc(d, m) do {} while (0) +#define 
ln_block_inc(d, m) do {} while (0) +#define ln_retry_inc(d, m) do {} while (0) +#define ln_event_inc(d) do {} while (0) + +#endif /* DEBUG */ + +EXPORT_SYMBOL(htree_lock_stat_reset); +EXPORT_SYMBOL(htree_lock_stat_print); + +#define HTREE_DEP_ROOT (-1) + +#define htree_spin_lock(lhead, dep) \ + bit_spin_lock((dep) + 1, &(lhead)->lh_lock) +#define htree_spin_unlock(lhead, dep) \ + bit_spin_unlock((dep) + 1, &(lhead)->lh_lock) + +#define htree_key_event_ignore(child, ln) \ + (!((child)->lc_events & (1 << (ln)->ln_mode))) + +static int +htree_key_list_empty(struct htree_lock_node *ln) +{ + return list_empty(&ln->ln_major_list) && list_empty(&ln->ln_minor_list); +} + +static void +htree_key_list_del_init(struct htree_lock_node *ln) +{ + struct htree_lock_node *tmp = NULL; + + if (!list_empty(&ln->ln_minor_list)) { + tmp = list_entry(ln->ln_minor_list.next, + struct htree_lock_node, ln_minor_list); + list_del_init(&ln->ln_minor_list); + } + + if (list_empty(&ln->ln_major_list)) + return; + + if (tmp == NULL) { /* not on minor key list */ + list_del_init(&ln->ln_major_list); + } else { + BUG_ON(!list_empty(&tmp->ln_major_list)); + list_replace_init(&ln->ln_major_list, &tmp->ln_major_list); + } +} + +static void +htree_key_list_replace_init(struct htree_lock_node *old, + struct htree_lock_node *new) +{ + if (!list_empty(&old->ln_major_list)) + list_replace_init(&old->ln_major_list, &new->ln_major_list); + + if (!list_empty(&old->ln_minor_list)) + list_replace_init(&old->ln_minor_list, &new->ln_minor_list); +} + +static void +htree_key_event_enqueue(struct htree_lock_child *child, + struct htree_lock_node *ln, int dep, void *event) +{ + struct htree_lock_node *tmp; + + /* NB: ALWAYS called holding lhead::lh_lock(dep) */ + BUG_ON(ln->ln_mode == HTREE_LOCK_NL); + if (event == NULL || htree_key_event_ignore(child, ln)) + return; + + /* shouldn't be a very long list */ + list_for_each_entry(tmp, &ln->ln_alive_list, ln_alive_list) { + if (tmp->ln_mode == HTREE_LOCK_NL) { + 
ln_event_inc(dep); + if (child->lc_callback != NULL) + child->lc_callback(tmp->ln_ev_target, event); + } + } +} + +static int +htree_node_lock_enqueue(struct htree_lock *newlk, struct htree_lock *curlk, + unsigned dep, int wait, void *event) +{ + struct htree_lock_child *child = &newlk->lk_head->lh_children[dep]; + struct htree_lock_node *newln = &newlk->lk_nodes[dep]; + struct htree_lock_node *curln = &curlk->lk_nodes[dep]; + + /* NB: ALWAYS called holding lhead::lh_lock(dep) */ + /* NB: we only expect PR/PW lock mode at here, only these two modes are + * allowed for htree_node_lock(asserted in htree_node_lock_internal), + * NL is only used for listener, user can't directly require NL mode */ + if ((curln->ln_mode == HTREE_LOCK_NL) || + (curln->ln_mode != HTREE_LOCK_PW && + newln->ln_mode != HTREE_LOCK_PW)) { + /* no conflict, attach it on granted list of @curlk */ + if (curln->ln_mode != HTREE_LOCK_NL) { + list_add(&newln->ln_granted_list, + &curln->ln_granted_list); + } else { + /* replace key owner */ + htree_key_list_replace_init(curln, newln); + } + + list_add(&newln->ln_alive_list, &curln->ln_alive_list); + htree_key_event_enqueue(child, newln, dep, event); + ln_grant_inc(dep, newln->ln_mode); + return 1; /* still hold lh_lock */ + } + + if (!wait) { /* can't grant and don't want to wait */ + ln_retry_inc(dep, newln->ln_mode); + newln->ln_mode = HTREE_LOCK_INVAL; + return -1; /* don't wait and just return -1 */ + } + + newlk->lk_task = current; + set_current_state(TASK_UNINTERRUPTIBLE); + /* conflict, attach it on blocked list of curlk */ + list_add_tail(&newln->ln_blocked_list, &curln->ln_blocked_list); + list_add(&newln->ln_alive_list, &curln->ln_alive_list); + ln_block_inc(dep, newln->ln_mode); + + htree_spin_unlock(newlk->lk_head, dep); + /* wait to be given the lock */ + if (newlk->lk_task != NULL) + schedule(); + /* granted, no doubt, wake up will set me RUNNING */ + if (event == NULL || htree_key_event_ignore(child, newln)) + return 0; /* granted 
without lh_lock */ + + htree_spin_lock(newlk->lk_head, dep); + htree_key_event_enqueue(child, newln, dep, event); + return 1; /* still hold lh_lock */ +} + +/* + * get PR/PW access to particular tree-node according to @dep and @key, + * it will return -1 if @wait is false and can't immediately grant this lock. + * All listeners(HTREE_LOCK_NL) on @dep and with the same @key will get + * @event if it's not NULL. + * NB: ALWAYS called holding lhead::lh_lock + */ +static int +htree_node_lock_internal(struct htree_lock_head *lhead, struct htree_lock *lck, + htree_lock_mode_t mode, u32 key, unsigned dep, + int wait, void *event) +{ + LIST_HEAD(list); + struct htree_lock *tmp; + struct htree_lock *tmp2; + u16 major; + u16 minor; + u8 reverse; + u8 ma_bits; + u8 mi_bits; + + BUG_ON(mode != HTREE_LOCK_PW && mode != HTREE_LOCK_PR); + BUG_ON(htree_node_is_granted(lck, dep)); + + key = hash_long(key, lhead->lh_hbits); + + mi_bits = lhead->lh_hbits >> 1; + ma_bits = lhead->lh_hbits - mi_bits; + + lck->lk_nodes[dep].ln_major_key = major = key & ((1U << ma_bits) - 1); + lck->lk_nodes[dep].ln_minor_key = minor = key >> ma_bits; + lck->lk_nodes[dep].ln_mode = mode; + + /* + * The major key list is an ordered list, so searches are started + * at the end of the list that is numerically closer to major_key, + * so at most half of the list will be walked (for well-distributed + * keys). The list traversal aborts early if the expected key + * location is passed. 
+ */ + reverse = (major >= (1 << (ma_bits - 1))); + + if (reverse) { + list_for_each_entry_reverse(tmp, + &lhead->lh_children[dep].lc_list, + lk_nodes[dep].ln_major_list) { + if (tmp->lk_nodes[dep].ln_major_key == major) { + goto search_minor; + + } else if (tmp->lk_nodes[dep].ln_major_key < major) { + /* attach _after_ @tmp */ + list_add(&lck->lk_nodes[dep].ln_major_list, + &tmp->lk_nodes[dep].ln_major_list); + goto out_grant_major; + } + } + + list_add(&lck->lk_nodes[dep].ln_major_list, + &lhead->lh_children[dep].lc_list); + goto out_grant_major; + + } else { + list_for_each_entry(tmp, &lhead->lh_children[dep].lc_list, + lk_nodes[dep].ln_major_list) { + if (tmp->lk_nodes[dep].ln_major_key == major) { + goto search_minor; + + } else if (tmp->lk_nodes[dep].ln_major_key > major) { + /* insert _before_ @tmp */ + list_add_tail(&lck->lk_nodes[dep].ln_major_list, + &tmp->lk_nodes[dep].ln_major_list); + goto out_grant_major; + } + } + + list_add_tail(&lck->lk_nodes[dep].ln_major_list, + &lhead->lh_children[dep].lc_list); + goto out_grant_major; + } + + search_minor: + /* + * NB: minor_key list doesn't have a "head", @list is just a + * temporary stub for helping list searching, make sure it's removed + * after searching. + * minor_key list is an ordered list too. 
+ */ + list_add_tail(&list, &tmp->lk_nodes[dep].ln_minor_list); + + reverse = (minor >= (1 << (mi_bits - 1))); + + if (reverse) { + list_for_each_entry_reverse(tmp2, &list, + lk_nodes[dep].ln_minor_list) { + if (tmp2->lk_nodes[dep].ln_minor_key == minor) { + goto out_enqueue; + + } else if (tmp2->lk_nodes[dep].ln_minor_key < minor) { + /* attach _after_ @tmp2 */ + list_add(&lck->lk_nodes[dep].ln_minor_list, + &tmp2->lk_nodes[dep].ln_minor_list); + goto out_grant_minor; + } + } + + list_add(&lck->lk_nodes[dep].ln_minor_list, &list); + + } else { + list_for_each_entry(tmp2, &list, + lk_nodes[dep].ln_minor_list) { + if (tmp2->lk_nodes[dep].ln_minor_key == minor) { + goto out_enqueue; + + } else if (tmp2->lk_nodes[dep].ln_minor_key > minor) { + /* insert _before_ @tmp2 */ + list_add_tail(&lck->lk_nodes[dep].ln_minor_list, + &tmp2->lk_nodes[dep].ln_minor_list); + goto out_grant_minor; + } + } + + list_add_tail(&lck->lk_nodes[dep].ln_minor_list, &list); + } + + out_grant_minor: + if (list.next == &lck->lk_nodes[dep].ln_minor_list) { + /* new lock @lck is the first one on minor_key list, which + * means it has the smallest minor_key and it should + * replace @tmp as minor_key owner */ + list_replace_init(&tmp->lk_nodes[dep].ln_major_list, + &lck->lk_nodes[dep].ln_major_list); + } + /* remove the temporary head */ + list_del(&list); + + out_grant_major: + ln_grant_inc(dep, lck->lk_nodes[dep].ln_mode); + return 1; /* granted with holding lh_lock */ + + out_enqueue: + list_del(&list); /* remove temprary head */ + return htree_node_lock_enqueue(lck, tmp2, dep, wait, event); +} + +/* + * release the key of @lck at level @dep, and grant any blocked locks. + * caller will still listen on @key if @event is not NULL, which means + * caller can see a event (by event_cb) while granting any lock with + * the same key at level @dep. 
+ * NB: ALWAYS called holding lhead::lh_lock
+ * NB: the listener will not block anyone because the listening mode is
+ * HTREE_LOCK_NL
+ */
+static void
+htree_node_unlock_internal(struct htree_lock_head *lhead,
+			   struct htree_lock *curlk, unsigned dep, void *event)
+{
+	struct htree_lock_node *curln = &curlk->lk_nodes[dep];
+	struct htree_lock *grtlk = NULL;
+	struct htree_lock_node *grtln;
+	struct htree_lock *poslk;
+	struct htree_lock *tmplk;
+
+	if (!htree_node_is_granted(curlk, dep))
+		return;
+
+	if (!list_empty(&curln->ln_granted_list)) {
+		/* there is another granted lock */
+		grtlk = list_entry(curln->ln_granted_list.next,
+				   struct htree_lock,
+				   lk_nodes[dep].ln_granted_list);
+		list_del_init(&curln->ln_granted_list);
+	}
+
+	if (grtlk == NULL && !list_empty(&curln->ln_blocked_list)) {
+		/*
+		 * @curlk is the only granted lock, so we have confirmed:
+		 * a) curln is the key owner (attached on major/minor_list),
+		 *    so if there is any blocked lock, it should be attached
+		 *    on curln->ln_blocked_list
+		 * b) we can always grant the first blocked lock
+		 */
+		grtlk = list_entry(curln->ln_blocked_list.next,
+				   struct htree_lock,
+				   lk_nodes[dep].ln_blocked_list);
+		BUG_ON(grtlk->lk_task == NULL);
+		wake_up_process(grtlk->lk_task);
+	}
+
+	if (event != NULL &&
+	    lhead->lh_children[dep].lc_events != HTREE_EVENT_DISABLE) {
+		curln->ln_ev_target = event;
+		curln->ln_mode = HTREE_LOCK_NL; /* listen! */
+	} else {
+		curln->ln_mode = HTREE_LOCK_INVAL;
+	}
+
+	if (grtlk == NULL) { /* I must be the only one locking this key */
+		struct htree_lock_node *tmpln;
+
+		BUG_ON(htree_key_list_empty(curln));
+
+		if (curln->ln_mode == HTREE_LOCK_NL) /* listening */
+			return;
+
+		/* not listening */
+		if (list_empty(&curln->ln_alive_list)) { /* no more listeners */
+			htree_key_list_del_init(curln);
+			return;
+		}
+
+		tmpln = list_entry(curln->ln_alive_list.next,
+				   struct htree_lock_node, ln_alive_list);
+
+		BUG_ON(tmpln->ln_mode != HTREE_LOCK_NL);
+
+		htree_key_list_replace_init(curln, tmpln);
+		list_del_init(&curln->ln_alive_list);
+
+		return;
+	}
+
+	/* have a granted lock */
+	grtln = &grtlk->lk_nodes[dep];
+	if (!list_empty(&curln->ln_blocked_list)) {
+		/* only the key owner can be on both lists */
+		BUG_ON(htree_key_list_empty(curln));
+
+		if (list_empty(&grtln->ln_blocked_list)) {
+			list_add(&grtln->ln_blocked_list,
+				 &curln->ln_blocked_list);
+		}
+		list_del_init(&curln->ln_blocked_list);
+	}
+	/*
+	 * NB: this is the tricky part:
+	 * We have only two modes for child-locks (PR and PW); also, only
+	 * the owner of the key (attached on major/minor_list) can be on
+	 * both blocked_list and granted_list, so @grtlk must be one of
+	 * these two cases:
+	 *
+	 * a) @grtlk is taken from granted_list, which means we have granted
+	 *    more than one lock, so @grtlk has to be PR; the first blocked
+	 *    lock must be PW and we can't grant it at all.
+	 *    So even if @grtlk is not the owner of the key (empty
+	 *    blocked_list), we don't care because we can't grant any lock
+	 *    anyway.
+	 * b) we just granted a new lock taken from the head of the blocked
+	 *    list; it should be the first granted lock and the first one
+	 *    linked on blocked_list.
+	 *
+	 * Either way, we can get the correct result by iterating the
+	 * blocked_list of @grtlk, and don't have to bother finding the
+	 * owner of the current key.
+	 */
+	list_for_each_entry_safe(poslk, tmplk, &grtln->ln_blocked_list,
+				 lk_nodes[dep].ln_blocked_list) {
+		if (grtlk->lk_nodes[dep].ln_mode == HTREE_LOCK_PW ||
+		    poslk->lk_nodes[dep].ln_mode == HTREE_LOCK_PW)
+			break;
+		/* grant all readers */
+		list_del_init(&poslk->lk_nodes[dep].ln_blocked_list);
+		list_add(&poslk->lk_nodes[dep].ln_granted_list,
+			 &grtln->ln_granted_list);
+
+		BUG_ON(poslk->lk_task == NULL);
+		wake_up_process(poslk->lk_task);
+	}
+
+	/* if @curln is the owner of this key, replace it with @grtln */
+	if (!htree_key_list_empty(curln))
+		htree_key_list_replace_init(curln, grtln);
+
+	if (curln->ln_mode == HTREE_LOCK_INVAL)
+		list_del_init(&curln->ln_alive_list);
+}
+
+/*
+ * A wrapper of htree_node_lock_internal; it returns 1 when the lock is
+ * granted, and 0 only if @wait is false and the lock can't be granted
+ * immediately.
+ */
+int
+htree_node_lock_try(struct htree_lock *lck, htree_lock_mode_t mode,
+		    u32 key, unsigned dep, int wait, void *event)
+{
+	struct htree_lock_head *lhead = lck->lk_head;
+	int rc;
+
+	BUG_ON(dep >= lck->lk_depth);
+	BUG_ON(lck->lk_mode == HTREE_LOCK_INVAL);
+
+	htree_spin_lock(lhead, dep);
+	rc = htree_node_lock_internal(lhead, lck, mode, key, dep, wait, event);
+	if (rc != 0)
+		htree_spin_unlock(lhead, dep);
+	return rc >= 0;
+}
+EXPORT_SYMBOL(htree_node_lock_try);
+
+/* a wrapper of htree_node_unlock_internal */
+void
+htree_node_unlock(struct htree_lock *lck, unsigned dep, void *event)
+{
+	struct htree_lock_head *lhead = lck->lk_head;
+
+	BUG_ON(dep >= lck->lk_depth);
+	BUG_ON(lck->lk_mode == HTREE_LOCK_INVAL);
+
+	htree_spin_lock(lhead, dep);
+	htree_node_unlock_internal(lhead, lck, dep, event);
+	htree_spin_unlock(lhead, dep);
+}
+EXPORT_SYMBOL(htree_node_unlock);
+
+/* stop listening on child-lock level @dep */
+void
+htree_node_stop_listen(struct htree_lock *lck, unsigned dep)
+{
+	struct htree_lock_node *ln = &lck->lk_nodes[dep];
+	struct htree_lock_node *tmp;
+
+	BUG_ON(htree_node_is_granted(lck, dep));
+
BUG_ON(!list_empty(&ln->ln_blocked_list)); + BUG_ON(!list_empty(&ln->ln_granted_list)); + + if (!htree_node_is_listening(lck, dep)) + return; + + htree_spin_lock(lck->lk_head, dep); + ln->ln_mode = HTREE_LOCK_INVAL; + ln->ln_ev_target = NULL; + + if (htree_key_list_empty(ln)) { /* not owner */ + list_del_init(&ln->ln_alive_list); + goto out; + } + + /* I'm the owner... */ + if (list_empty(&ln->ln_alive_list)) { /* no more listener */ + htree_key_list_del_init(ln); + goto out; + } + + tmp = list_entry(ln->ln_alive_list.next, + struct htree_lock_node, ln_alive_list); + + BUG_ON(tmp->ln_mode != HTREE_LOCK_NL); + htree_key_list_replace_init(ln, tmp); + list_del_init(&ln->ln_alive_list); + out: + htree_spin_unlock(lck->lk_head, dep); +} +EXPORT_SYMBOL(htree_node_stop_listen); + +/* release all child-locks if we have any */ +static void +htree_node_release_all(struct htree_lock *lck) +{ + int i; + + for (i = 0; i < lck->lk_depth; i++) { + if (htree_node_is_granted(lck, i)) + htree_node_unlock(lck, i, NULL); + else if (htree_node_is_listening(lck, i)) + htree_node_stop_listen(lck, i); + } +} + +/* + * obtain htree lock, it could be blocked inside if there's conflict + * with any granted or blocked lock and @wait is true. 
+ * NB: ALWAYS called holding lhead::lh_lock
+ */
+static int
+htree_lock_internal(struct htree_lock *lck, int wait)
+{
+	struct htree_lock_head *lhead = lck->lk_head;
+	int granted = 0;
+	int blocked = 0;
+	int i;
+
+	for (i = 0; i < HTREE_LOCK_MAX; i++) {
+		if (lhead->lh_ngranted[i] != 0)
+			granted |= 1 << i;
+		if (lhead->lh_nblocked[i] != 0)
+			blocked |= 1 << i;
+	}
+	if ((htree_lock_compat[lck->lk_mode] & granted) != granted ||
+	    (htree_lock_compat[lck->lk_mode] & blocked) != blocked) {
+		/* block the current lock even if it only conflicts with
+		 * another blocked lock, so a lock like EX won't starve */
+		if (!wait)
+			return -1;
+		lhead->lh_nblocked[lck->lk_mode]++;
+		lk_block_inc(lck->lk_mode);
+
+		lck->lk_task = current;
+		list_add_tail(&lck->lk_blocked_list, &lhead->lh_blocked_list);
+
+retry:
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		htree_spin_unlock(lhead, HTREE_DEP_ROOT);
+		/* wait to be given the lock */
+		if (lck->lk_task != NULL)
+			schedule();
+		/* most likely granted; the wake-up sets me RUNNING.
+		 * Since the thread may be woken up spuriously, we need to
+		 * check again whether the lock has really been granted. */
+		if (!list_empty(&lck->lk_blocked_list)) {
+			htree_spin_lock(lhead, HTREE_DEP_ROOT);
+			if (list_empty(&lck->lk_blocked_list)) {
+				htree_spin_unlock(lhead, HTREE_DEP_ROOT);
+				return 0;
+			}
+			goto retry;
+		}
+		return 0; /* without lh_lock */
+	}
+	lhead->lh_ngranted[lck->lk_mode]++;
+	lk_grant_inc(lck->lk_mode);
+	return 1;
+}
+
+/* release htree lock.
+ * NB: ALWAYS called holding lhead::lh_lock */
+static void
+htree_unlock_internal(struct htree_lock *lck)
+{
+	struct htree_lock_head *lhead = lck->lk_head;
+	struct htree_lock *tmp;
+	struct htree_lock *tmp2;
+	int granted = 0;
+	int i;
+
+	BUG_ON(lhead->lh_ngranted[lck->lk_mode] == 0);
+
+	lhead->lh_ngranted[lck->lk_mode]--;
+	lck->lk_mode = HTREE_LOCK_INVAL;
+
+	for (i = 0; i < HTREE_LOCK_MAX; i++) {
+		if (lhead->lh_ngranted[i] != 0)
+			granted |= 1 << i;
+	}
+	list_for_each_entry_safe(tmp, tmp2,
+				 &lhead->lh_blocked_list, lk_blocked_list) {
+		/* conflict with any granted lock? */
+		if ((htree_lock_compat[tmp->lk_mode] & granted) != granted)
+			break;
+
+		list_del_init(&tmp->lk_blocked_list);
+
+		BUG_ON(lhead->lh_nblocked[tmp->lk_mode] == 0);
+
+		lhead->lh_nblocked[tmp->lk_mode]--;
+		lhead->lh_ngranted[tmp->lk_mode]++;
+		granted |= 1 << tmp->lk_mode;
+
+		BUG_ON(tmp->lk_task == NULL);
+		wake_up_process(tmp->lk_task);
+	}
+}
+
+/* A wrapper of htree_lock_internal and an exported interface.
+ * It always returns 1 with the lock granted if @wait is true; it can
+ * return 0 if @wait is false and the locking request can't be granted
+ * immediately. */
+int
+htree_lock_try(struct htree_lock *lck, struct htree_lock_head *lhead,
+	       htree_lock_mode_t mode, int wait)
+{
+	int rc;
+
+	BUG_ON(lck->lk_depth > lhead->lh_depth);
+	BUG_ON(lck->lk_head != NULL);
+	BUG_ON(lck->lk_task != NULL);
+
+	lck->lk_head = lhead;
+	lck->lk_mode = mode;
+
+	htree_spin_lock(lhead, HTREE_DEP_ROOT);
+	rc = htree_lock_internal(lck, wait);
+	if (rc != 0)
+		htree_spin_unlock(lhead, HTREE_DEP_ROOT);
+	return rc >= 0;
+}
+EXPORT_SYMBOL(htree_lock_try);
+
+/* A wrapper of htree_unlock_internal and an exported interface.
+ * It will release all htree_node_locks and the htree_lock itself. */
+void
+htree_unlock(struct htree_lock *lck)
+{
+	BUG_ON(lck->lk_head == NULL);
+	BUG_ON(lck->lk_mode == HTREE_LOCK_INVAL);
+
+	htree_node_release_all(lck);
+
+	htree_spin_lock(lck->lk_head, HTREE_DEP_ROOT);
+	htree_unlock_internal(lck);
+	htree_spin_unlock(lck->lk_head, HTREE_DEP_ROOT);
+	lck->lk_head = NULL;
+	lck->lk_task = NULL;
+}
+EXPORT_SYMBOL(htree_unlock);
+
+/* change lock mode */
+void
+htree_change_mode(struct htree_lock *lck, htree_lock_mode_t mode)
+{
+	BUG_ON(lck->lk_mode == HTREE_LOCK_INVAL);
+	lck->lk_mode = mode;
+}
+EXPORT_SYMBOL(htree_change_mode);
+
+/* release the htree lock, then lock it again with a new mode.
+ * This function first releases all htree_node_locks and the htree_lock,
+ * then tries to gain the htree_lock with the new @mode.
+ * It always returns 1 with the lock granted if @wait is true; it can
+ * return 0 if @wait is false and the locking request can't be granted
+ * immediately. */
+int
+htree_change_lock_try(struct htree_lock *lck, htree_lock_mode_t mode, int wait)
+{
+	struct htree_lock_head *lhead = lck->lk_head;
+	int rc;
+
+	BUG_ON(lhead == NULL);
+	BUG_ON(lck->lk_mode == mode);
+	BUG_ON(lck->lk_mode == HTREE_LOCK_INVAL || mode == HTREE_LOCK_INVAL);
+
+	htree_node_release_all(lck);
+
+	htree_spin_lock(lhead, HTREE_DEP_ROOT);
+	htree_unlock_internal(lck);
+	lck->lk_mode = mode;
+	rc = htree_lock_internal(lck, wait);
+	if (rc != 0)
+		htree_spin_unlock(lhead, HTREE_DEP_ROOT);
+	return rc >= 0;
+}
+EXPORT_SYMBOL(htree_change_lock_try);
+
+/* create a htree_lock head with @depth levels (number of child-locks);
+ * it is a per-resource structure */
+struct htree_lock_head *
+htree_lock_head_alloc(unsigned depth, unsigned hbits, unsigned priv)
+{
+	struct htree_lock_head *lhead;
+	int i;
+
+	if (depth > HTREE_LOCK_DEP_MAX) {
+		printk(KERN_ERR "%d is larger than max htree_lock depth %d\n",
+		       depth, HTREE_LOCK_DEP_MAX);
+		return NULL;
+	}
+
+	lhead = kzalloc(offsetof(struct htree_lock_head,
+				 lh_children[depth]) + priv, GFP_NOFS);
+	if (lhead == NULL)
+		return NULL;
+
+	if (hbits < HTREE_HBITS_MIN)
+		lhead->lh_hbits = HTREE_HBITS_MIN;
+	else if (hbits > HTREE_HBITS_MAX)
+		lhead->lh_hbits = HTREE_HBITS_MAX;
+	else
+		lhead->lh_hbits = hbits;
+
+	lhead->lh_lock = 0;
+	lhead->lh_depth = depth;
+	INIT_LIST_HEAD(&lhead->lh_blocked_list);
+	if (priv > 0) {
+		lhead->lh_private = (void *)lhead +
+			offsetof(struct htree_lock_head, lh_children[depth]);
+	}
+
+	for (i = 0; i < depth; i++) {
+		INIT_LIST_HEAD(&lhead->lh_children[i].lc_list);
+		lhead->lh_children[i].lc_events = HTREE_EVENT_DISABLE;
+	}
+	return lhead;
+}
+EXPORT_SYMBOL(htree_lock_head_alloc);
+
+/* free the htree_lock head */
+void
+htree_lock_head_free(struct htree_lock_head *lhead)
+{
+	int i;
+
+	BUG_ON(!list_empty(&lhead->lh_blocked_list));
+	for (i = 0; i < lhead->lh_depth; i++)
+		BUG_ON(!list_empty(&lhead->lh_children[i].lc_list));
+	kfree(lhead);
+}
+EXPORT_SYMBOL(htree_lock_head_free);
+
+/* register event callback for @events of child-lock at level @dep */
+void
+htree_lock_event_attach(struct htree_lock_head *lhead, unsigned dep,
+			unsigned events, htree_event_cb_t callback)
+{
+	BUG_ON(lhead->lh_depth <= dep);
+	lhead->lh_children[dep].lc_events = events;
+	lhead->lh_children[dep].lc_callback = callback;
+}
+EXPORT_SYMBOL(htree_lock_event_attach);
+
+/* allocate a htree_lock, which is a per-thread structure; @pbytes is
+ * extra bytes of private data for the caller */
+struct htree_lock *
+htree_lock_alloc(unsigned depth, unsigned pbytes)
+{
+	struct htree_lock *lck;
+	int i = offsetof(struct htree_lock, lk_nodes[depth]);
+
+	if (depth > HTREE_LOCK_DEP_MAX) {
+		printk(KERN_ERR "%d is larger than max htree_lock depth %d\n",
+		       depth, HTREE_LOCK_DEP_MAX);
+		return NULL;
+	}
+	lck = kzalloc(i + pbytes, GFP_NOFS);
+	if (lck == NULL)
+		return NULL;
+
+	if (pbytes != 0)
+		lck->lk_private = (void *)lck + i;
+	lck->lk_mode = HTREE_LOCK_INVAL;
+	lck->lk_depth = depth;
+	INIT_LIST_HEAD(&lck->lk_blocked_list);
+
+	for (i = 0; i < depth;
+	     i++) {
+		struct htree_lock_node *node = &lck->lk_nodes[i];
+
+		node->ln_mode = HTREE_LOCK_INVAL;
+		INIT_LIST_HEAD(&node->ln_major_list);
+		INIT_LIST_HEAD(&node->ln_minor_list);
+		INIT_LIST_HEAD(&node->ln_alive_list);
+		INIT_LIST_HEAD(&node->ln_blocked_list);
+		INIT_LIST_HEAD(&node->ln_granted_list);
+	}
+
+	return lck;
+}
+EXPORT_SYMBOL(htree_lock_alloc);
+
+/* free a htree_lock node */
+void
+htree_lock_free(struct htree_lock *lck)
+{
+	BUG_ON(lck->lk_mode != HTREE_LOCK_INVAL);
+	kfree(lck);
+}
+EXPORT_SYMBOL(htree_lock_free);
diff --git a/fs/ext4/htree_lock.h b/fs/ext4/htree_lock.h
new file mode 100644
index 0000000..9dc7788
--- /dev/null
+++ b/fs/ext4/htree_lock.h
@@ -0,0 +1,187 @@
+/*
+ * include/linux/htree_lock.h
+ *
+ * Copyright (c) 2011, 2012, Intel Corporation.
+ *
+ * Author: Liang Zhen
+ */
+
+/*
+ * htree lock
+ *
+ * htree_lock is an advanced lock; it supports five lock modes (the concept
+ * is taken from DLM) and it is a sleeping lock.
+ *
+ * The most common use case is:
+ * - create a htree_lock_head for the data
+ * - each thread (contender) creates its own htree_lock
+ * - a contender calls htree_lock(lock_node, mode) to protect the data and
+ *   htree_unlock to release the lock
+ *
+ * There is also a more complex, advanced use-case: a user can hold a
+ * PW/PR lock on a particular key; it is mostly used while the user is
+ * holding a shared lock on the htree (CW, CR):
+ *
+ * htree_lock(lock_node, HTREE_LOCK_CR); lock the htree with CR
+ * htree_node_lock(lock_node, HTREE_LOCK_PR, key...); lock @key with PR
+ * ...
+ * htree_node_unlock(lock_node); unlock the key
+ *
+ * Another tip: we can have N levels of this kind of keys; all we need to
+ * do is specify N levels while creating the htree_lock_head, then we can
+ * lock/unlock a specific level by:
+ * htree_node_lock(lock_node, mode1, key1, level1...);
+ * do something;
+ * htree_node_lock(lock_node, mode1, key2, level2...);
+ * do something;
+ * htree_node_unlock(lock_node, level2);
+ * htree_node_unlock(lock_node, level1);
+ *
+ * NB: for multi-level locking, be careful about the locking order to
+ * avoid deadlock
+ */
+
+#ifndef _LINUX_HTREE_LOCK_H
+#define _LINUX_HTREE_LOCK_H
+
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/sched.h>
+
+/*
+ * Lock Modes
+ * more details can be found here:
+ * http://en.wikipedia.org/wiki/Distributed_lock_manager
+ */
+typedef enum {
+	HTREE_LOCK_EX	= 0, /* exclusive lock: incompatible with all others */
+	HTREE_LOCK_PW,	     /* protected write: allows only CR users */
+	HTREE_LOCK_PR,	     /* protected read: allows PR, CR users */
+	HTREE_LOCK_CW,	     /* concurrent write: allows CR, CW users */
+	HTREE_LOCK_CR,	     /* concurrent read: allows all but EX users */
+	HTREE_LOCK_MAX,	     /* number of lock modes */
+} htree_lock_mode_t;
+
+#define HTREE_LOCK_NL		HTREE_LOCK_MAX
+#define HTREE_LOCK_INVAL	0xdead10c
+
+enum {
+	HTREE_HBITS_MIN		= 2,
+	HTREE_HBITS_DEF		= 14,
+	HTREE_HBITS_MAX		= 32,
+};
+
+enum {
+	HTREE_EVENT_DISABLE	= (0),
+	HTREE_EVENT_RD		= (1 << HTREE_LOCK_PR),
+	HTREE_EVENT_WR		= (1 << HTREE_LOCK_PW),
+	HTREE_EVENT_RDWR	= (HTREE_EVENT_RD | HTREE_EVENT_WR),
+};
+
+struct htree_lock;
+
+typedef void (*htree_event_cb_t)(void *target, void *event);
+
+struct htree_lock_child {
+	struct list_head	lc_list;	/* granted list */
+	htree_event_cb_t	lc_callback;	/* event callback */
+	unsigned		lc_events;	/* event types */
+};
+
+struct htree_lock_head {
+	unsigned long		lh_lock;	/* bits lock */
+	/* blocked lock list (htree_lock) */
+	struct list_head	lh_blocked_list;
+	/* # key levels */
+	u16			lh_depth;
+	/* hash bits for key and limit number of
locks */ + u16 lh_hbits; + /* counters for blocked locks */ + u16 lh_nblocked[HTREE_LOCK_MAX]; + /* counters for granted locks */ + u16 lh_ngranted[HTREE_LOCK_MAX]; + /* private data */ + void *lh_private; + /* array of children locks */ + struct htree_lock_child lh_children[0]; +}; + +/* htree_lock_node_t is child-lock for a specific key (ln_value) */ +struct htree_lock_node { + htree_lock_mode_t ln_mode; + /* major hash key */ + u16 ln_major_key; + /* minor hash key */ + u16 ln_minor_key; + struct list_head ln_major_list; + struct list_head ln_minor_list; + /* alive list, all locks (granted, blocked, listening) are on it */ + struct list_head ln_alive_list; + /* blocked list */ + struct list_head ln_blocked_list; + /* granted list */ + struct list_head ln_granted_list; + void *ln_ev_target; +}; + +struct htree_lock { + struct task_struct *lk_task; + struct htree_lock_head *lk_head; + void *lk_private; + unsigned lk_depth; + htree_lock_mode_t lk_mode; + struct list_head lk_blocked_list; + struct htree_lock_node lk_nodes[0]; +}; + +/* create a lock head, which stands for a resource */ +struct htree_lock_head *htree_lock_head_alloc(unsigned depth, + unsigned hbits, unsigned priv); +/* free a lock head */ +void htree_lock_head_free(struct htree_lock_head *lhead); +/* register event callback for child lock at level @depth */ +void htree_lock_event_attach(struct htree_lock_head *lhead, unsigned depth, + unsigned events, htree_event_cb_t callback); +/* create a lock handle, which stands for a thread */ +struct htree_lock *htree_lock_alloc(unsigned depth, unsigned pbytes); +/* free a lock handle */ +void htree_lock_free(struct htree_lock *lck); +/* lock htree, when @wait is true, 0 is returned if the lock can't + * be granted immediately */ +int htree_lock_try(struct htree_lock *lck, struct htree_lock_head *lhead, + htree_lock_mode_t mode, int wait); +/* unlock htree */ +void htree_unlock(struct htree_lock *lck); +/* unlock and relock htree with @new_mode */ +int 
htree_change_lock_try(struct htree_lock *lck,
+			  htree_lock_mode_t new_mode, int wait);
+void htree_change_mode(struct htree_lock *lck, htree_lock_mode_t mode);
+/* acquire the child lock (key) of the htree at level @dep; @event will be
+ * sent to all listeners on this @key while the lock is being granted */
+int htree_node_lock_try(struct htree_lock *lck, htree_lock_mode_t mode,
+			u32 key, unsigned dep, int wait, void *event);
+/* release the child lock at level @dep; this lock will listen on its key
+ * if @event isn't NULL, and event_cb will be called against @lck while
+ * granting any other lock at level @dep with the same key */
+void htree_node_unlock(struct htree_lock *lck, unsigned dep, void *event);
+/* stop listening on the child lock at level @dep */
+void htree_node_stop_listen(struct htree_lock *lck, unsigned dep);
+/* for debug */
+void htree_lock_stat_print(int depth);
+void htree_lock_stat_reset(void);
+
+#define htree_lock(lck, lh, mode)	htree_lock_try(lck, lh, mode, 1)
+#define htree_change_lock(lck, mode)	htree_change_lock_try(lck, mode, 1)
+
+#define htree_lock_mode(lck)		((lck)->lk_mode)
+
+#define htree_node_lock(lck, mode, key, dep)	\
+	htree_node_lock_try(lck, mode, key, dep, 1, NULL)
+/* this is only safe in the thread context of the lock owner */
+#define htree_node_is_granted(lck, dep)	\
+	((lck)->lk_nodes[dep].ln_mode != HTREE_LOCK_INVAL && \
+	 (lck)->lk_nodes[dep].ln_mode != HTREE_LOCK_NL)
+/* this is only safe in the thread context of the lock owner */
+#define htree_node_is_listening(lck, dep)	\
+	((lck)->lk_nodes[dep].ln_mode == HTREE_LOCK_NL)
+
+#endif
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 9cb86e4..0153c4d 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -54,6 +54,7 @@ struct buffer_head *ext4_append(handle_t *handle,
 			struct inode *inode,
 			ext4_lblk_t *block)
 {
+	struct ext4_inode_info *ei = EXT4_I(inode);
 	struct buffer_head *bh;
 	int err;
@@ -62,15 +63,24 @@ struct buffer_head *ext4_append(handle_t *handle,
 			EXT4_SB(inode->i_sb)->s_max_dir_size_kb)))
return ERR_PTR(-ENOSPC); + /* with parallel dir operations all appends + * have to be serialized -bzzz + */ + down(&ei->i_append_sem); + *block = inode->i_size >> inode->i_sb->s_blocksize_bits; bh = ext4_bread(handle, inode, *block, EXT4_GET_BLOCKS_CREATE); - if (IS_ERR(bh)) + if (IS_ERR(bh)) { + up(&ei->i_append_sem); return bh; + } + inode->i_size += inode->i_sb->s_blocksize; EXT4_I(inode)->i_disksize = inode->i_size; BUFFER_TRACE(bh, "get_write_access"); err = ext4_journal_get_write_access(handle, bh); + up(&ei->i_append_sem); if (err) { brelse(bh); ext4_std_error(inode->i_sb, err); @@ -250,7 +260,8 @@ static unsigned int inline dx_root_limit(struct inode *dir, static struct dx_frame *dx_probe(struct ext4_filename *fname, struct inode *dir, struct dx_hash_info *hinfo, - struct dx_frame *frame); + struct dx_frame *frame, + struct htree_lock *lck); static void dx_release(struct dx_frame *frames); static int dx_make_map(struct inode *dir, struct ext4_dir_entry_2 *de, unsigned blocksize, struct dx_hash_info *hinfo, @@ -264,12 +275,13 @@ static void dx_insert_block(struct dx_frame *frame, static int ext4_htree_next_block(struct inode *dir, __u32 hash, struct dx_frame *frame, struct dx_frame *frames, - __u32 *start_hash); + __u32 *start_hash, struct htree_lock *lck); static struct buffer_head * ext4_dx_find_entry(struct inode *dir, struct ext4_filename *fname, - struct ext4_dir_entry_2 **res_dir); + struct ext4_dir_entry_2 **res_dir, struct htree_lock *lck); static int ext4_dx_add_entry(handle_t *handle, struct ext4_filename *fname, - struct inode *dir, struct inode *inode); + struct inode *dir, struct inode *inode, + struct htree_lock *lck); /* checksumming functions */ void initialize_dirent_tail(struct ext4_dir_entry_tail *t, @@ -734,6 +746,226 @@ struct stats dx_show_entries(struct dx_hash_info *hinfo, struct inode *dir, } #endif /* DX_DEBUG */ +/* private data for htree_lock */ +struct ext4_dir_lock_data { + unsigned int ld_flags; /* bits-map for lock types */ + 
unsigned int ld_count; /* # entries of the last DX block */ + struct dx_entry ld_at_entry; /* copy of leaf dx_entry */ + struct dx_entry *ld_at; /* position of leaf dx_entry */ +}; + +#define ext4_htree_lock_data(l) ((struct ext4_dir_lock_data *)(l)->lk_private) +#define ext4_find_entry(dir, name, dirent, inline) \ + ext4_find_entry_locked(dir, name, dirent, inline, NULL) +#define ext4_add_entry(handle, dentry, inode) \ + ext4_add_entry_locked(handle, dentry, inode, NULL) + +/* NB: ext4_lblk_t is 32 bits so we use high bits to identify invalid blk */ +#define EXT4_HTREE_NODE_CHANGED (0xcafeULL << 32) + +static void ext4_htree_event_cb(void *target, void *event) +{ + u64 *block = (u64 *)target; + + if (*block == dx_get_block((struct dx_entry *)event)) + *block = EXT4_HTREE_NODE_CHANGED; +} + +struct htree_lock_head *ext4_htree_lock_head_alloc(unsigned hbits) +{ + struct htree_lock_head *lhead; + + lhead = htree_lock_head_alloc(EXT4_LK_MAX, hbits, 0); + if (lhead) { + htree_lock_event_attach(lhead, EXT4_LK_SPIN, HTREE_EVENT_WR, + ext4_htree_event_cb); + } + return lhead; +} +EXPORT_SYMBOL(ext4_htree_lock_head_alloc); + +struct htree_lock *ext4_htree_lock_alloc(void) +{ + return htree_lock_alloc(EXT4_LK_MAX, + sizeof(struct ext4_dir_lock_data)); +} +EXPORT_SYMBOL(ext4_htree_lock_alloc); + +static htree_lock_mode_t ext4_htree_mode(unsigned flags) +{ + switch (flags) { + default: /* 0 or unknown flags require EX lock */ + return HTREE_LOCK_EX; + case EXT4_HLOCK_READDIR: + return HTREE_LOCK_PR; + case EXT4_HLOCK_LOOKUP: + return HTREE_LOCK_CR; + case EXT4_HLOCK_DEL: + case EXT4_HLOCK_ADD: + return HTREE_LOCK_CW; + } +} + +/* return PR for read-only operations, otherwise return EX */ +static inline htree_lock_mode_t ext4_htree_safe_mode(unsigned flags) +{ + int writer = (flags & EXT4_LB_DE) == EXT4_LB_DE; + + /* 0 requires EX lock */ + return (flags == 0 || writer) ? 
HTREE_LOCK_EX : HTREE_LOCK_PR; +} + +static int ext4_htree_safe_locked(struct htree_lock *lck) +{ + int writer; + + if (!lck || lck->lk_mode == HTREE_LOCK_EX) + return 1; + + writer = (ext4_htree_lock_data(lck)->ld_flags & EXT4_LB_DE) == + EXT4_LB_DE; + if (writer) /* all readers & writers are excluded? */ + return lck->lk_mode == HTREE_LOCK_EX; + + /* all writers are excluded? */ + return lck->lk_mode == HTREE_LOCK_PR || + lck->lk_mode == HTREE_LOCK_PW || + lck->lk_mode == HTREE_LOCK_EX; +} + +/* relock htree_lock with EX mode if it's change operation, otherwise + * relock it with PR mode. It's noop if PDO is disabled. + */ +static void ext4_htree_safe_relock(struct htree_lock *lck) +{ + if (!ext4_htree_safe_locked(lck)) { + unsigned int flags = ext4_htree_lock_data(lck)->ld_flags; + + htree_change_lock(lck, ext4_htree_safe_mode(flags)); + } +} + +void ext4_htree_lock(struct htree_lock *lck, struct htree_lock_head *lhead, + struct inode *dir, unsigned flags) +{ + htree_lock_mode_t mode = is_dx(dir) ? ext4_htree_mode(flags) : + ext4_htree_safe_mode(flags); + + ext4_htree_lock_data(lck)->ld_flags = flags; + htree_lock(lck, lhead, mode); + if (!is_dx(dir)) + ext4_htree_safe_relock(lck); /* make sure it's safe locked */ +} +EXPORT_SYMBOL(ext4_htree_lock); + +static int ext4_htree_node_lock(struct htree_lock *lck, struct dx_entry *at, + unsigned int lmask, int wait, void *ev) +{ + u32 key = (at == NULL) ? 0 : dx_get_block(at); + u32 mode; + + /* NOOP if htree is well protected or caller doesn't require the lock */ + if (ext4_htree_safe_locked(lck) || + !(ext4_htree_lock_data(lck)->ld_flags & lmask)) + return 1; + + mode = (ext4_htree_lock_data(lck)->ld_flags & lmask) == lmask ? 
+		HTREE_LOCK_PW : HTREE_LOCK_PR;
+	while (1) {
+		if (htree_node_lock_try(lck, mode, key, ffz(~lmask), wait, ev))
+			return 1;
+		if (!(lmask & EXT4_LB_SPIN)) /* not a spinlock */
+			return 0;
+		cpu_relax(); /* spin until granted */
+	}
+}
+
+static int ext4_htree_node_locked(struct htree_lock *lck, unsigned lmask)
+{
+	return ext4_htree_safe_locked(lck) ||
+	       htree_node_is_granted(lck, ffz(~lmask));
+}
+
+static void ext4_htree_node_unlock(struct htree_lock *lck,
+				   unsigned int lmask, void *buf)
+{
+	/* NB: it's safe to call this multiple times, even when the node
+	 * isn't locked */
+	if (!ext4_htree_safe_locked(lck) &&
+	    htree_node_is_granted(lck, ffz(~lmask)))
+		htree_node_unlock(lck, ffz(~lmask), buf);
+}
+
+#define ext4_htree_dx_lock(lck, key)		\
+	ext4_htree_node_lock(lck, key, EXT4_LB_DX, 1, NULL)
+#define ext4_htree_dx_lock_try(lck, key)	\
+	ext4_htree_node_lock(lck, key, EXT4_LB_DX, 0, NULL)
+#define ext4_htree_dx_unlock(lck)		\
+	ext4_htree_node_unlock(lck, EXT4_LB_DX, NULL)
+#define ext4_htree_dx_locked(lck)		\
+	ext4_htree_node_locked(lck, EXT4_LB_DX)
+
+static void ext4_htree_dx_need_lock(struct htree_lock *lck)
+{
+	struct ext4_dir_lock_data *ld;
+
+	if (ext4_htree_safe_locked(lck))
+		return;
+
+	ld = ext4_htree_lock_data(lck);
+	switch (ld->ld_flags) {
+	default:
+		return;
+	case EXT4_HLOCK_LOOKUP:
+		ld->ld_flags = EXT4_HLOCK_LOOKUP_SAFE;
+		return;
+	case EXT4_HLOCK_DEL:
+		ld->ld_flags = EXT4_HLOCK_DEL_SAFE;
+		return;
+	case EXT4_HLOCK_ADD:
+		ld->ld_flags = EXT4_HLOCK_SPLIT;
+		return;
+	}
+}
+
+#define ext4_htree_de_lock(lck, key)		\
+	ext4_htree_node_lock(lck, key, EXT4_LB_DE, 1, NULL)
+#define ext4_htree_de_unlock(lck)		\
+	ext4_htree_node_unlock(lck, EXT4_LB_DE, NULL)
+
+#define ext4_htree_spin_lock(lck, key, event)	\
+	ext4_htree_node_lock(lck, key, EXT4_LB_SPIN, 0, event)
+#define ext4_htree_spin_unlock(lck)		\
+	ext4_htree_node_unlock(lck, EXT4_LB_SPIN, NULL)
+#define ext4_htree_spin_unlock_listen(lck, p)	\
+	ext4_htree_node_unlock(lck, EXT4_LB_SPIN, p)
+
+static void
ext4_htree_spin_stop_listen(struct htree_lock *lck) +{ + if (!ext4_htree_safe_locked(lck) && + htree_node_is_listening(lck, ffz(~EXT4_LB_SPIN))) + htree_node_stop_listen(lck, ffz(~EXT4_LB_SPIN)); +} + +enum { + DX_HASH_COL_IGNORE, /* ignore collision while probing frames */ + DX_HASH_COL_YES, /* there is collision and it does matter */ + DX_HASH_COL_NO, /* there is no collision */ +}; + +static int dx_probe_hash_collision(struct htree_lock *lck, + struct dx_entry *entries, + struct dx_entry *at, u32 hash) +{ + if (!(lck && ext4_htree_lock_data(lck)->ld_flags & EXT4_LB_EXACT)) { + return DX_HASH_COL_IGNORE; /* don't care about collision */ + } else if (at == entries + dx_get_count(entries) - 1) { + return DX_HASH_COL_IGNORE; /* not in any leaf of this DX */ + } else { /* hash collision? */ + return ((dx_get_hash(at + 1) & ~1) == hash) ? + DX_HASH_COL_YES : DX_HASH_COL_NO; + } +} + /* * Probe for a directory leaf block to search. * @@ -745,10 +977,11 @@ struct stats dx_show_entries(struct dx_hash_info *hinfo, struct inode *dir, */ static struct dx_frame * dx_probe(struct ext4_filename *fname, struct inode *dir, - struct dx_hash_info *hinfo, struct dx_frame *frame_in) + struct dx_hash_info *hinfo, struct dx_frame *frame_in, + struct htree_lock *lck) { unsigned count, indirect; - struct dx_entry *at, *entries, *p, *q, *m; + struct dx_entry *at, *entries, *p, *q, *m, *dx = NULL; struct dx_root_info *info; struct dx_frame *frame = frame_in; struct dx_frame *ret_err = ERR_PTR(ERR_BAD_DX_DIR); @@ -811,6 +1044,13 @@ struct stats dx_show_entries(struct dx_hash_info *hinfo, struct inode *dir, dxtrace(printk("Look up %x", hash)); while (1) { + if (indirect == 0) { /* the last index level */ + /* NB: ext4_htree_dx_lock() could be noop if + * DX-lock flag is not set for current operation + */ + ext4_htree_dx_lock(lck, dx); + ext4_htree_spin_lock(lck, dx, NULL); + } count = dx_get_count(entries); if (!count || count > dx_get_limit(entries)) { ext4_warning_inode(dir, @@ -851,8 
+1091,75 @@ struct stats dx_show_entries(struct dx_hash_info *hinfo, struct inode *dir,
 					dx_get_block(at)));
 	frame->entries = entries;
 	frame->at = at;
-	if (!indirect--)
+
+	if (indirect == 0) { /* the last index level */
+		struct ext4_dir_lock_data *ld;
+		u64 myblock;
+
+		/* By default we only lock the DE-block; however, we will
+		 * also lock the last level DX-block if:
+		 * a) there is a hash collision
+		 *    we will set the DX-lock flag (a few lines below)
+		 *    and redo to lock the DX-block
+		 *    see detail in dx_probe_hash_collision()
+		 * b) it's a retry from splitting
+		 *    we need to lock the last level DX-block so nobody
+		 *    else can split any leaf blocks under the same
+		 *    DX-block, see detail in ext4_dx_add_entry()
+		 */
+		if (ext4_htree_dx_locked(lck)) {
+			/* the DX-block is locked, just lock the DE-block
+			 * and return
+			 */
+			ext4_htree_spin_unlock(lck);
+			if (!ext4_htree_safe_locked(lck))
+				ext4_htree_de_lock(lck, frame->at);
+			return frame;
+		}
+		/* it's pdirop and no DX lock */
+		if (dx_probe_hash_collision(lck, entries, at, hash) ==
+		    DX_HASH_COL_YES) {
+			/* found a hash collision, set the DX-lock flag
+			 * and retry to obtain the DX-lock
+			 */
+			ext4_htree_spin_unlock(lck);
+			ext4_htree_dx_need_lock(lck);
+			continue;
+		}
+		ld = ext4_htree_lock_data(lck);
+		/* because we don't lock the DX-block, @at can't be trusted
+		 * after the spinlock is released, so save a copy of it
+		 */
+		ld->ld_at = at;
+		ld->ld_at_entry = *at;
+		ld->ld_count = dx_get_count(entries);
+
+		frame->at = &ld->ld_at_entry;
+		myblock = dx_get_block(at);
+
+		/* NB: ordering locking */
+		ext4_htree_spin_unlock_listen(lck, &myblock);
+		/* another thread can split this DE-block because:
+		 * a) we don't hold the lock for the DE-block yet
+		 * b) we released the spinlock on the DX-block
+		 * if that happens, we can detect it by listening for a
+		 * splitting event on this DE-block
+		 */
+		ext4_htree_de_lock(lck, frame->at);
+		ext4_htree_spin_stop_listen(lck);
+
+		if (myblock == EXT4_HTREE_NODE_CHANGED) {
+			/* someone split this DE-block before
+			 * we locked it, so we need to retry
and lock + * valid DE-block + */ + ext4_htree_de_unlock(lck); + continue; + } return frame; + } + dx = at; + indirect--; frame++; frame->bh = ext4_read_dirblock(dir, dx_get_block(at), INDEX); if (IS_ERR(frame->bh)) { @@ -921,7 +1228,7 @@ static void dx_release(struct dx_frame *frames) static int ext4_htree_next_block(struct inode *dir, __u32 hash, struct dx_frame *frame, struct dx_frame *frames, - __u32 *start_hash) + u32 *start_hash, struct htree_lock *lck) { struct dx_frame *p; struct buffer_head *bh; @@ -936,12 +1243,23 @@ static int ext4_htree_next_block(struct inode *dir, __u32 hash, * this loop, num_frames indicates the number of interior * nodes need to be read. */ + ext4_htree_de_unlock(lck); while (1) { - if (++(p->at) < p->entries + dx_get_count(p->entries)) - break; + if (num_frames > 0 || ext4_htree_dx_locked(lck)) { + /* num_frames > 0 : + * DX block + * ext4_htree_dx_locked: + * frame->at is reliable pointer returned by dx_probe, + * otherwise dx_probe already knew no collision + */ + if (++(p->at) < p->entries + dx_get_count(p->entries)) + break; + } if (p == frames) return 0; num_frames++; + if (num_frames == 1) + ext4_htree_dx_unlock(lck); p--; } @@ -964,6 +1282,14 @@ static int ext4_htree_next_block(struct inode *dir, __u32 hash, * block so no check is necessary */ while (num_frames--) { + if (num_frames == 0) { + /* it's not always necessary, we just don't want to + * detect hash collision again + */ + ext4_htree_dx_need_lock(lck); + ext4_htree_dx_lock(lck, p->at); + } + bh = ext4_read_dirblock(dir, dx_get_block(p->at), INDEX); if (IS_ERR(bh)) return PTR_ERR(bh); @@ -972,10 +1298,10 @@ static int ext4_htree_next_block(struct inode *dir, __u32 hash, p->bh = bh; p->at = p->entries = ((struct dx_node *) bh->b_data)->entries; } + ext4_htree_de_lock(lck, p->at); return 1; } - /* * This function fills a red-black tree with information from a * directory block. 
It returns the number directory entries loaded @@ -1119,7 +1445,8 @@ int ext4_htree_fill_tree(struct file *dir_file, __u32 start_hash, } hinfo.hash = start_hash; hinfo.minor_hash = 0; - frame = dx_probe(NULL, dir, &hinfo, frames); + /* assume it's PR locked */ + frame = dx_probe(NULL, dir, &hinfo, frames, NULL); if (IS_ERR(frame)) return PTR_ERR(frame); @@ -1162,7 +1489,7 @@ int ext4_htree_fill_tree(struct file *dir_file, __u32 start_hash, count += ret; hashval = ~0; ret = ext4_htree_next_block(dir, HASH_NB_ALWAYS, - frame, frames, &hashval); + frame, frames, &hashval, NULL); *next_hash = hashval; if (ret < 0) { err = ret; @@ -1400,7 +1727,7 @@ static int is_dx_internal_node(struct inode *dir, ext4_lblk_t block, static struct buffer_head *__ext4_find_entry(struct inode *dir, struct ext4_filename *fname, struct ext4_dir_entry_2 **res_dir, - int *inlined) + int *inlined, struct htree_lock *lck) { struct super_block *sb; struct buffer_head *bh_use[NAMEI_RA_SIZE]; @@ -1442,7 +1769,7 @@ static struct buffer_head *__ext4_find_entry(struct inode *dir, goto restart; } if (is_dx(dir)) { - ret = ext4_dx_find_entry(dir, fname, res_dir); + ret = ext4_dx_find_entry(dir, fname, res_dir, lck); /* * On success, or if the error was file not found, * return. 
Otherwise, fall back to doing a search the @@ -1452,6 +1779,7 @@ static struct buffer_head *__ext4_find_entry(struct inode *dir, goto cleanup_and_exit; dxtrace(printk(KERN_DEBUG "ext4_find_entry: dx failed, " "falling back\n")); + ext4_htree_safe_relock(lck); ret = NULL; } nblocks = dir->i_size >> EXT4_BLOCK_SIZE_BITS(sb); @@ -1540,10 +1868,10 @@ static struct buffer_head *__ext4_find_entry(struct inode *dir, return ret; } -static struct buffer_head *ext4_find_entry(struct inode *dir, +struct buffer_head *ext4_find_entry_locked(struct inode *dir, const struct qstr *d_name, struct ext4_dir_entry_2 **res_dir, - int *inlined) + int *inlined, struct htree_lock *lck) { int err; struct ext4_filename fname; @@ -1555,11 +1883,12 @@ static struct buffer_head *ext4_find_entry(struct inode *dir, if (err) return ERR_PTR(err); - bh = __ext4_find_entry(dir, &fname, res_dir, inlined); + bh = __ext4_find_entry(dir, &fname, res_dir, inlined, lck); ext4_fname_free_filename(&fname); return bh; } +EXPORT_SYMBOL(ext4_find_entry_locked); static struct buffer_head *ext4_lookup_entry(struct inode *dir, struct dentry *dentry, @@ -1575,7 +1904,7 @@ static struct buffer_head *ext4_lookup_entry(struct inode *dir, if (err) return ERR_PTR(err); - bh = __ext4_find_entry(dir, &fname, res_dir, NULL); + bh = __ext4_find_entry(dir, &fname, res_dir, NULL, NULL); ext4_fname_free_filename(&fname); return bh; @@ -1583,7 +1912,8 @@ static struct buffer_head *ext4_lookup_entry(struct inode *dir, static struct buffer_head * ext4_dx_find_entry(struct inode *dir, struct ext4_filename *fname, - struct ext4_dir_entry_2 **res_dir) + struct ext4_dir_entry_2 **res_dir, + struct htree_lock *lck) { struct super_block * sb = dir->i_sb; struct dx_frame frames[EXT4_HTREE_LEVEL], *frame; @@ -1594,7 +1924,7 @@ static struct buffer_head * ext4_dx_find_entry(struct inode *dir, #ifdef CONFIG_FS_ENCRYPTION *res_dir = NULL; #endif - frame = dx_probe(fname, dir, NULL, frames); + frame = dx_probe(fname, dir, NULL, frames, 
lck); if (IS_ERR(frame)) return (struct buffer_head *) frame; do { @@ -1616,7 +1946,7 @@ static struct buffer_head * ext4_dx_find_entry(struct inode *dir, /* Check to see if we should continue to search */ retval = ext4_htree_next_block(dir, fname->hinfo.hash, frame, - frames, NULL); + frames, NULL, lck); if (retval < 0) { ext4_warning_inode(dir, "error %d reading directory index block", @@ -1799,8 +2129,9 @@ static struct ext4_dir_entry_2* dx_pack_dirents(char *base, unsigned blocksize) * Returns pointer to de in block into which the new entry will be inserted. */ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir, - struct buffer_head **bh,struct dx_frame *frame, - struct dx_hash_info *hinfo) + struct buffer_head **bh, struct dx_frame *frames, + struct dx_frame *frame, struct dx_hash_info *hinfo, + struct htree_lock *lck) { unsigned blocksize = dir->i_sb->s_blocksize; unsigned count, continued; @@ -1862,8 +2193,15 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir, hash2, split, count-split)); /* Fancy dance to stay within two buffers */ - de2 = dx_move_dirents(data1, data2, map + split, count - split, - blocksize); + if (hinfo->hash < hash2) { + de2 = dx_move_dirents(data1, data2, map + split, + count - split, blocksize); + } else { + /* make sure we will add entry to the same block which + * we have already locked + */ + de2 = dx_move_dirents(data1, data2, map, split, blocksize); + } de = dx_pack_dirents(data1, blocksize); de->rec_len = ext4_rec_len_to_disk(data1 + (blocksize - csum_size) - (char *) de, @@ -1884,12 +2222,20 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir, dxtrace(dx_show_leaf(dir, hinfo, (struct ext4_dir_entry_2 *) data2, blocksize, 1)); - /* Which block gets the new entry? */ - if (hinfo->hash >= hash2) { - swap(*bh, bh2); - de = de2; + ext4_htree_spin_lock(lck, frame > frames ? 
(frame - 1)->at : NULL, + frame->at); /* notify block is being split */ + if (hinfo->hash < hash2) { + dx_insert_block(frame, hash2 + continued, newblock); + } else { + /* switch block number */ + dx_insert_block(frame, hash2 + continued, + dx_get_block(frame->at)); + dx_set_block(frame->at, newblock); + (frame->at)++; } - dx_insert_block(frame, hash2 + continued, newblock); + ext4_htree_spin_unlock(lck); + ext4_htree_dx_unlock(lck); + err = ext4_handle_dirty_dirent_node(handle, dir, bh2); if (err) goto journal_error; @@ -2167,7 +2513,7 @@ static int make_indexed_dir(handle_t *handle, struct ext4_filename *fname, if (retval) goto out_frames; - de = do_split(handle,dir, &bh2, frame, &fname->hinfo); + de = do_split(handle, dir, &bh2, frames, frame, &fname->hinfo, NULL); if (IS_ERR(de)) { retval = PTR_ERR(de); goto out_frames; @@ -2276,8 +2622,8 @@ static int ext4_update_dotdot(handle_t *handle, struct dentry *dentry, * may not sleep between calling this and putting something into * the entry, as someone else might have used it while you slept. 
*/ -static int ext4_add_entry(handle_t *handle, struct dentry *dentry, - struct inode *inode) +int ext4_add_entry_locked(handle_t *handle, struct dentry *dentry, + struct inode *inode, struct htree_lock *lck) { struct inode *dir = d_inode(dentry->d_parent); struct buffer_head *bh = NULL; @@ -2326,9 +2672,10 @@ static int ext4_add_entry(handle_t *handle, struct dentry *dentry, if (dentry->d_name.len == 2 && memcmp(dentry->d_name.name, "..", 2) == 0) return ext4_update_dotdot(handle, dentry, inode); - retval = ext4_dx_add_entry(handle, &fname, dir, inode); + retval = ext4_dx_add_entry(handle, &fname, dir, inode, lck); if (!retval || (retval != ERR_BAD_DX_DIR)) goto out; + ext4_htree_safe_relock(lck); ext4_clear_inode_flag(dir, EXT4_INODE_INDEX); dx_fallback++; ext4_mark_inode_dirty(handle, dir); @@ -2378,12 +2725,14 @@ static int ext4_add_entry(handle_t *handle, struct dentry *dentry, ext4_set_inode_state(inode, EXT4_STATE_NEWENTRY); return retval; } +EXPORT_SYMBOL(ext4_add_entry_locked); /* * Returns 0 for success, or a negative error value */ static int ext4_dx_add_entry(handle_t *handle, struct ext4_filename *fname, - struct inode *dir, struct inode *inode) + struct inode *dir, struct inode *inode, + struct htree_lock *lck) { struct dx_frame frames[EXT4_HTREE_LEVEL], *frame; struct dx_entry *entries, *at; @@ -2395,7 +2744,7 @@ static int ext4_dx_add_entry(handle_t *handle, struct ext4_filename *fname, again: restart = 0; - frame = dx_probe(fname, dir, NULL, frames); + frame = dx_probe(fname, dir, NULL, frames, lck); if (IS_ERR(frame)) return PTR_ERR(frame); entries = frame->entries; @@ -2430,6 +2779,12 @@ static int ext4_dx_add_entry(handle_t *handle, struct ext4_filename *fname, struct dx_node *node2; struct buffer_head *bh2; + if (!ext4_htree_safe_locked(lck)) { /* retry with EX lock */ + ext4_htree_safe_relock(lck); + restart = 1; + goto cleanup; + } + while (frame > frames) { if (dx_get_count((frame - 1)->entries) < dx_get_limit((frame - 1)->entries)) { @@ 
-2533,8 +2888,34 @@ static int ext4_dx_add_entry(handle_t *handle, struct ext4_filename *fname, restart = 1; goto journal_error; } + } else if (!ext4_htree_dx_locked(lck)) { + struct ext4_dir_lock_data *ld = ext4_htree_lock_data(lck); + + /* not well protected, require DX lock */ + ext4_htree_dx_need_lock(lck); + at = frame > frames ? (frame - 1)->at : NULL; + + /* NB: no risk of deadlock because it's just a try. + * + * NB: we check ld_count twice: the first time before + * taking the DX lock, the second time after holding it. + * + * NB: We never free blocks for directories so far, which + * means the value returned by dx_get_count() should equal + * ld->ld_count if nobody split any DE-block under @at, + * and ld->ld_at still points to a valid dx_entry. + */ + if ((ld->ld_count != dx_get_count(entries)) || + !ext4_htree_dx_lock_try(lck, at) || + (ld->ld_count != dx_get_count(entries))) { + restart = 1; + goto cleanup; + } + /* OK, I've got DX lock and nothing changed */ + frame->at = ld->ld_at; } - de = do_split(handle, dir, &bh, frame, &fname->hinfo); + + de = do_split(handle, dir, &bh, frames, frame, &fname->hinfo, lck); if (IS_ERR(de)) { err = PTR_ERR(de); goto cleanup; @@ -2545,6 +2926,8 @@ static int ext4_dx_add_entry(handle_t *handle, struct ext4_filename *fname, journal_error: ext4_std_error(dir->i_sb, err); /* this is a no-op if err == 0 */ cleanup: + ext4_htree_dx_unlock(lck); + ext4_htree_de_unlock(lck); brelse(bh); dx_release(frames); /* @restart being true means the htree-path has been changed; we need to diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 564eb35..07242d7 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1077,6 +1077,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb) return NULL; inode_set_iversion(&ei->vfs_inode, 1); + sema_init(&ei->i_append_sem, 1); spin_lock_init(&ei->i_raw_lock); INIT_LIST_HEAD(&ei->i_prealloc_list); spin_lock_init(&ei->i_prealloc_lock); From patchwork Mon Jul 22 01:23:42 2019 Content-Type:
text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11051321 From: James Simmons To: Andreas Dilger , Oleg Drokin , 
NeilBrown , Shaun Tancheff , Li Dongyang , Artem Blagodarenko , Yang Sheng Date: Sun, 21 Jul 2019 21:23:42 -0400 Message-Id: <1563758631-29550-14-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1563758631-29550-1-git-send-email-jsimmons@infradead.org> References: <1563758631-29550-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 13/22] ext4: Add a proc interface for max_dir_size. X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP Signed-off-by: James Simmons --- fs/ext4/sysfs.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c index 1375815..3a71a16 100644 --- a/fs/ext4/sysfs.c +++ b/fs/ext4/sysfs.c @@ -180,6 +180,8 @@ static ssize_t journal_task_show(struct ext4_sb_info *sbi, char *buf) EXT4_ATTR_OFFSET(inode_readahead_blks, 0644, inode_readahead, ext4_sb_info, s_inode_readahead_blks); EXT4_RW_ATTR_SBI_UI(inode_goal, s_inode_goal); +EXT4_RW_ATTR_SBI_UI(max_dir_size, s_max_dir_size_kb); +EXT4_RW_ATTR_SBI_UI(max_dir_size_kb, s_max_dir_size_kb); EXT4_RW_ATTR_SBI_UI(mb_stats, s_mb_stats); EXT4_RW_ATTR_SBI_UI(mb_max_to_scan, s_mb_max_to_scan); EXT4_RW_ATTR_SBI_UI(mb_min_to_scan, s_mb_min_to_scan); @@ -210,6 +212,8 @@ static ssize_t journal_task_show(struct ext4_sb_info *sbi, char *buf) ATTR_LIST(reserved_clusters), ATTR_LIST(inode_readahead_blks), ATTR_LIST(inode_goal), + ATTR_LIST(max_dir_size), + ATTR_LIST(max_dir_size_kb), ATTR_LIST(mb_stats), ATTR_LIST(mb_max_to_scan), ATTR_LIST(mb_min_to_scan), @@ -350,6 +354,8 @@ static ssize_t ext4_attr_store(struct kobject *kobj, ret = kstrtoul(skip_spaces(buf), 0, &t); if (ret) return ret; + if 
(strcmp("max_dir_size", a->attr.name) == 0) + t >>= 10; if (a->attr_ptr == ptr_ext4_super_block_offset) *((__le32 *) ptr) = cpu_to_le32(t); else From patchwork Mon Jul 22 01:23:43 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11051315 From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown , Shaun Tancheff , Li Dongyang , Artem Blagodarenko , Yang Sheng Date: Sun, 21 Jul 2019 21:23:43 -0400 Message-Id: <1563758631-29550-15-git-send-email-jsimmons@infradead.org> In-Reply-To: <1563758631-29550-1-git-send-email-jsimmons@infradead.org> References: <1563758631-29550-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 14/22] ext4: remove inode_lock handling Cc: Lustre Development List Invoking ext4_truncate() with i_mutex held can cause a deadlock in Lustre. Since Lustre provides its own locking for protection, we do not need this check at all. Signed-off-by: James Simmons --- fs/ext4/inode.c | 2 -- fs/ext4/namei.c | 4 ---- 2 files changed, 6 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 5561351..9296611 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4474,8 +4474,6 @@ int ext4_truncate(struct inode *inode) * or it's a completely new inode. In those cases we might not * have i_mutex locked because it's not necessary.
*/ - if (!(inode->i_state & (I_NEW|I_FREEING))) - WARN_ON(!inode_is_locked(inode)); trace_ext4_truncate_enter(inode); if (!ext4_can_truncate(inode)) diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index 0153c4d..1b6d22a 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -3485,8 +3485,6 @@ int ext4_orphan_add(handle_t *handle, struct inode *inode) if (!sbi->s_journal || is_bad_inode(inode)) return 0; - WARN_ON_ONCE(!(inode->i_state & (I_NEW | I_FREEING)) && - !inode_is_locked(inode)); /* * Exit early if inode already is on orphan list. This is a big speedup * since we don't have to contend on the global s_orphan_lock. @@ -3569,8 +3567,6 @@ int ext4_orphan_del(handle_t *handle, struct inode *inode) if (!sbi->s_journal && !(sbi->s_mount_state & EXT4_ORPHAN_FS)) return 0; - WARN_ON_ONCE(!(inode->i_state & (I_NEW | I_FREEING)) && - !inode_is_locked(inode)); /* Do this quick check before taking global s_orphan_lock. */ if (list_empty(&ei->i_orphan)) return 0; From patchwork Mon Jul 22 01:23:44 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11051327 From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown , Shaun Tancheff , Li Dongyang , Artem Blagodarenko , Yang Sheng Date: Sun, 21 Jul 2019 21:23:44 -0400 Message-Id: <1563758631-29550-16-git-send-email-jsimmons@infradead.org> In-Reply-To: <1563758631-29550-1-git-send-email-jsimmons@infradead.org> References: <1563758631-29550-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 15/22] ext4: remove bitmap corruption warnings Cc: Lustre Development List Since we can skip corrupt block groups, this patch uses ext4_warning() instead of ext4_error() so the filesystem is not remounted read-only by default.
Signed-off-by: James Simmons --- fs/ext4/balloc.c | 10 ++++----- fs/ext4/ialloc.c | 6 ++--- fs/ext4/mballoc.c | 66 +++++++++++++++++++++++-------------------------------- 3 files changed, 35 insertions(+), 47 deletions(-) diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c index bca75c1..aff3981 100644 --- a/fs/ext4/balloc.c +++ b/fs/ext4/balloc.c @@ -374,7 +374,7 @@ static int ext4_validate_block_bitmap(struct super_block *sb, if (unlikely(!ext4_block_bitmap_csum_verify(sb, block_group, desc, bh))) { ext4_unlock_group(sb, block_group); - ext4_error(sb, "bg %u: bad block bitmap checksum", block_group); + ext4_warning(sb, "bg %u: bad block bitmap checksum", block_group); ext4_mark_group_bitmap_corrupted(sb, block_group, EXT4_GROUP_INFO_BBITMAP_CORRUPT); return -EFSBADCRC; @@ -382,8 +382,8 @@ static int ext4_validate_block_bitmap(struct super_block *sb, blk = ext4_valid_block_bitmap(sb, desc, block_group, bh); if (unlikely(blk != 0)) { ext4_unlock_group(sb, block_group); - ext4_error(sb, "bg %u: block %llu: invalid block bitmap", - block_group, blk); + ext4_warning(sb, "bg %u: block %llu: invalid block bitmap", + block_group, blk); ext4_mark_group_bitmap_corrupted(sb, block_group, EXT4_GROUP_INFO_BBITMAP_CORRUPT); return -EFSCORRUPTED; @@ -459,8 +459,8 @@ struct buffer_head * ext4_unlock_group(sb, block_group); unlock_buffer(bh); if (err) { - ext4_error(sb, "Failed to init block bitmap for group " - "%u: %d", block_group, err); + ext4_warning(sb, "Failed to init block bitmap for group " + "%u: %d", block_group, err); goto out; } goto verify; diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 68d41e6..a72fe63 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -96,8 +96,8 @@ static int ext4_validate_inode_bitmap(struct super_block *sb, if (!ext4_inode_bitmap_csum_verify(sb, block_group, desc, bh, EXT4_INODES_PER_GROUP(sb) / 8)) { ext4_unlock_group(sb, block_group); - ext4_error(sb, "Corrupt inode bitmap - block_group = %u, " - "inode_bitmap = %llu", block_group, 
blk); + ext4_warning(sb, "Corrupt inode bitmap - block_group = %u, " + "inode_bitmap = %llu", block_group, blk); ext4_mark_group_bitmap_corrupted(sb, block_group, EXT4_GROUP_INFO_IBITMAP_CORRUPT); return -EFSBADCRC; @@ -346,7 +346,7 @@ void ext4_free_inode(handle_t *handle, struct inode *inode) if (!fatal) fatal = err; } else { - ext4_error(sb, "bit already cleared for inode %lu", ino); + ext4_warning(sb, "bit already cleared for inode %lu", ino); ext4_mark_group_bitmap_corrupted(sb, block_group, EXT4_GROUP_INFO_IBITMAP_CORRUPT); } diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 463fba6..82398b0 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -741,10 +741,15 @@ int ext4_mb_generate_buddy(struct super_block *sb, grp->bb_fragments = fragments; if (free != grp->bb_free) { - ext4_grp_locked_error(sb, group, 0, 0, - "block bitmap and bg descriptor " - "inconsistent: %u vs %u free clusters", - free, grp->bb_free); + struct ext4_group_desc *gdp; + + gdp = ext4_get_group_desc(sb, group, NULL); + ext4_warning(sb, "group %lu: block bitmap and bg descriptor " + "inconsistent: %u vs %u free clusters " + "%u in gd, %lu pa's", + (long unsigned int)group, free, grp->bb_free, + ext4_free_group_clusters(sb, gdp), + grp->bb_prealloc_nr); /* * If we intend to continue, we consider group descriptor * corrupt and update bb_free using bitmap value @@ -1106,7 +1111,7 @@ int ext4_mb_init_group(struct super_block *sb, ext4_group_t group, gfp_t gfp) int block; int pnum; int poff; - struct page *page; + struct page *page = NULL; int ret; struct ext4_group_info *grp; struct ext4_sb_info *sbi = EXT4_SB(sb); @@ -1132,7 +1137,7 @@ int ext4_mb_init_group(struct super_block *sb, ext4_group_t group, gfp_t gfp) */ ret = ext4_mb_init_group(sb, group, gfp); if (ret) - return ret; + goto err; } /* @@ -1235,6 +1240,7 @@ int ext4_mb_init_group(struct super_block *sb, ext4_group_t group, gfp_t gfp) put_page(e4b->bd_buddy_page); e4b->bd_buddy = NULL; e4b->bd_bitmap = NULL; + 
ext4_warning(sb, "Error loading buddy information for %u", group); return ret; } @@ -3631,8 +3637,10 @@ int ext4_mb_check_ondisk_bitmap(struct super_block *sb, void *bitmap, } if (free != free_in_gdp) { - ext4_error(sb, "on-disk bitmap for group %d corrupted: %u blocks free in bitmap, %u - in gd\n", - group, free, free_in_gdp); + ext4_warning(sb, "on-disk bitmap for group %d corrupted: %u blocks free in bitmap, %u - in gd\n", + group, free, free_in_gdp); + ext4_mark_group_bitmap_corrupted(sb, group, + EXT4_GROUP_INFO_BBITMAP_CORRUPT); return -EIO; } return 0; @@ -4015,14 +4023,6 @@ static int ext4_mb_new_preallocation(struct ext4_allocation_context *ac) * otherwise maybe leave some free blocks unavailable, no need to BUG. */ if ((free > pa->pa_free && !pa->pa_error) || (free < pa->pa_free)) { - ext4_error(sb, "pa free mismatch: [pa %p] " - "[phy %lu] [logic %lu] [len %u] [free %u] " - "[error %u] [inode %lu] [freed %u]", pa, - (unsigned long)pa->pa_pstart, - (unsigned long)pa->pa_lstart, - (unsigned)pa->pa_len, (unsigned)pa->pa_free, - (unsigned)pa->pa_error, pa->pa_inode->i_ino, - free); ext4_grp_locked_error(sb, group, 0, 0, "free %u, pa_free %u", free, pa->pa_free); /* @@ -4086,15 +4086,11 @@ static int ext4_mb_new_preallocation(struct ext4_allocation_context *ac) bitmap_bh = ext4_read_block_bitmap(sb, group); if (IS_ERR(bitmap_bh)) { err = PTR_ERR(bitmap_bh); - ext4_error(sb, "Error %d reading block bitmap for %u", - err, group); return 0; } err = ext4_mb_load_buddy(sb, group, &e4b); if (err) { - ext4_warning(sb, "Error %d loading buddy information for %u", - err, group); put_bh(bitmap_bh); return 0; } @@ -4255,17 +4251,12 @@ void ext4_discard_preallocations(struct inode *inode) err = ext4_mb_load_buddy_gfp(sb, group, &e4b, GFP_NOFS|__GFP_NOFAIL); - if (err) { - ext4_error(sb, "Error %d loading buddy information for %u", - err, group); + if (err) return; - } bitmap_bh = ext4_read_block_bitmap(sb, group); if (IS_ERR(bitmap_bh)) { err = PTR_ERR(bitmap_bh); - 
ext4_error(sb, "Error %d reading block bitmap for %u", - err, group); ext4_mb_unload_buddy(&e4b); continue; } @@ -4527,11 +4518,8 @@ static void ext4_mb_group_or_file(struct ext4_allocation_context *ac) group = ext4_get_group_number(sb, pa->pa_pstart); err = ext4_mb_load_buddy_gfp(sb, group, &e4b, GFP_NOFS|__GFP_NOFAIL); - if (err) { - ext4_error(sb, "Error %d loading buddy information for %u", - err, group); + if (err) continue; - } ext4_lock_group(sb, group); list_del(&pa->pa_group_list); ext4_get_group_info(sb, group)->bb_prealloc_nr--; @@ -4785,7 +4773,7 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle, * not revert pa_free back, just mark pa_error */ pa->pa_error++; - ext4_error(sb, + ext4_warning(sb, "Updating bitmap error: [err %d] " "[pa %p] [phy %lu] [logic %lu] " "[len %u] [free %u] [error %u] " @@ -4796,6 +4784,7 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle, (unsigned)pa->pa_free, (unsigned)pa->pa_error, pa->pa_inode ? pa->pa_inode->i_ino : 0); + ext4_mark_group_bitmap_corrupted(sb, 0, 0); } } ext4_mb_release_context(ac); @@ -5081,7 +5070,7 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode, err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b, GFP_NOFS|__GFP_NOFAIL); if (err) - goto error_return; + goto error_brelse; /* * We need to make sure we don't reuse the freed block until after the @@ -5171,8 +5160,9 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode, goto do_more; } error_return: - brelse(bitmap_bh); ext4_std_error(sb, err); +error_brelse: + brelse(bitmap_bh); return; } @@ -5272,7 +5262,7 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb, err = ext4_mb_load_buddy(sb, block_group, &e4b); if (err) - goto error_return; + goto error_brelse; /* * need to update group_info->bb_free and bitmap @@ -5310,8 +5300,9 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb, err = ret; error_return: - brelse(bitmap_bh); ext4_std_error(sb, err); +error_brelse: + brelse(bitmap_bh); return err; } 
@@ -5386,11 +5377,8 @@ static int ext4_trim_extent(struct super_block *sb, int start, int count, trace_ext4_trim_all_free(sb, group, start, max); ret = ext4_mb_load_buddy(sb, group, &e4b); - if (ret) { - ext4_warning(sb, "Error %d loading buddy information for %u", - ret, group); + if (ret) return ret; - } bitmap = e4b.bd_bitmap; ext4_lock_group(sb, group); From patchwork Mon Jul 22 01:23:45 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11051301
From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown , Shaun Tancheff , Li Dongyang , Artem Blagodarenko , Yang Sheng Date: Sun, 21 Jul 2019 21:23:45 -0400 Message-Id: <1563758631-29550-17-git-send-email-jsimmons@infradead.org> In-Reply-To: <1563758631-29550-1-git-send-email-jsimmons@infradead.org> References: <1563758631-29550-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 16/22] ext4: add warning for directory htree growth
Cc: Lustre Development List Signed-off-by: James Simmons --- fs/ext4/ext4.h | 1 + fs/ext4/namei.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-- fs/ext4/super.c | 2 ++ fs/ext4/sysfs.c | 2 ++ 4 files changed, 72 insertions(+), 2 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index bf74c7c..5f73e19 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1472,6 +1472,7 @@ struct ext4_sb_info { unsigned int s_mb_group_prealloc; unsigned long *s_mb_prealloc_table; unsigned int s_max_dir_size_kb; + unsigned long s_warning_dir_size; /* where last allocation was done - for stream allocation */ unsigned long s_mb_last_group; unsigned long s_mb_last_start; diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index 1b6d22a..9b30cc6 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -757,12 +757,20 @@ struct ext4_dir_lock_data { #define ext4_htree_lock_data(l) ((struct ext4_dir_lock_data *)(l)->lk_private) #define ext4_find_entry(dir, name, dirent, inline) \ ext4_find_entry_locked(dir, name, dirent, inline, NULL) -#define ext4_add_entry(handle, dentry, inode) \ - ext4_add_entry_locked(handle, dentry, inode, NULL) /* NB: ext4_lblk_t is 32 bits so we use high bits to identify invalid blk */ #define EXT4_HTREE_NODE_CHANGED (0xcafeULL << 32) +inline int ext4_add_entry(handle_t *handle, struct dentry *dentry, + struct inode *inode) +{ + int ret = ext4_add_entry_locked(handle, dentry, inode, NULL); + + if (ret == -ENOBUFS) + ret = 0; + return ret; +} + static void ext4_htree_event_cb(void *target, void *event) { u64 *block = (u64 *)target; @@ -2612,6 +2620,55 @@ static int ext4_update_dotdot(handle_t *handle, struct dentry *dentry, return err; } +static unsigned long __ext4_max_dir_size(struct dx_frame *frames, + struct dx_frame *frame, + struct inode *dir) +{
+	unsigned long max_dir_size;
+
+	if (EXT4_SB(dir->i_sb)->s_max_dir_size_kb) {
+		max_dir_size = EXT4_SB(dir->i_sb)->s_max_dir_size_kb << 10;
+	} else {
+		max_dir_size = EXT4_BLOCK_SIZE(dir->i_sb);
+		while (frame >= frames) {
+			max_dir_size *= dx_get_limit(frame->entries);
+			if (frame == frames)
+				break;
+			frame--;
+		}
+		/* use 75% of max dir size in average */
+		max_dir_size = max_dir_size / 4 * 3;
+	}
+	return max_dir_size;
+}
+
+/*
+ * With hash tree growing, it is easy to hit ENOSPC, but it is hard
+ * to predict when it will happen. Let's give administrators a warning
+ * when reaching 3/5 and 2/3 of the limit.
+ */
+static inline bool dir_size_in_warning_range(struct dx_frame *frames,
+					     struct dx_frame *frame,
+					     struct inode *dir)
+{
+	struct super_block *sb = dir->i_sb;
+	unsigned long size1, size2;
+
+	if (unlikely(!EXT4_SB(sb)->s_warning_dir_size))
+		EXT4_SB(sb)->s_warning_dir_size =
+			__ext4_max_dir_size(frames, frame, dir);
+
+	size1 = EXT4_SB(sb)->s_warning_dir_size / 16 * 10;
+	size1 = size1 & ~(EXT4_BLOCK_SIZE(sb) - 1);
+	size2 = EXT4_SB(sb)->s_warning_dir_size / 16 * 11;
+	size2 = size2 & ~(EXT4_BLOCK_SIZE(sb) - 1);
+	if (in_range(dir->i_size, size1, EXT4_BLOCK_SIZE(sb)) ||
+	    in_range(dir->i_size, size2, EXT4_BLOCK_SIZE(sb)))
+		return true;
+
+	return false;
+}
+
 /*
  * ext4_add_entry()
  *
@@ -2739,6 +2796,7 @@ static int ext4_dx_add_entry(handle_t *handle, struct ext4_filename *fname,
 	struct buffer_head *bh;
 	struct super_block *sb = dir->i_sb;
 	struct ext4_dir_entry_2 *de;
+	bool ret_warn = false;
 	int restart;
 	int err;
 
@@ -2769,6 +2827,11 @@ static int ext4_dx_add_entry(handle_t *handle, struct ext4_filename *fname,
 		/* Block full, should compress but for now just split */
 		dxtrace(printk(KERN_DEBUG "using %u of %u node entries\n",
 			       dx_get_count(entries), dx_get_limit(entries)));
+
+		if (frame - frames + 1 >= ext4_dir_htree_level(sb) ||
+		    EXT4_SB(sb)->s_warning_dir_size)
+			ret_warn = dir_size_in_warning_range(frames, frame, dir);
+
 		/* Need to split index?
 */
		if (dx_get_count(entries) == dx_get_limit(entries)) {
			ext4_lblk_t newblock;
@@ -2935,6 +2998,8 @@ static int ext4_dx_add_entry(handle_t *handle, struct ext4_filename *fname,
 	 */
 	if (restart && err == 0)
 		goto again;
+	if (err == 0 && ret_warn)
+		err = -ENOBUFS;
 	return err;
 }

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 07242d7..a3179b2 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1901,6 +1901,8 @@ static int handle_mount_opt(struct super_block *sb, char *opt, int token,
 		sbi->s_li_wait_mult = arg;
 	} else if (token == Opt_max_dir_size_kb) {
 		sbi->s_max_dir_size_kb = arg;
+		/* reset s_warning_dir_size and make it re-calculated */
+		sbi->s_warning_dir_size = 0;
 	} else if (token == Opt_stripe) {
 		sbi->s_stripe = arg;
 	} else if (token == Opt_resuid) {

diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index 3a71a16..575f318 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -182,6 +182,7 @@ static ssize_t journal_task_show(struct ext4_sb_info *sbi, char *buf)
 EXT4_RW_ATTR_SBI_UI(inode_goal, s_inode_goal);
 EXT4_RW_ATTR_SBI_UI(max_dir_size, s_max_dir_size_kb);
 EXT4_RW_ATTR_SBI_UI(max_dir_size_kb, s_max_dir_size_kb);
+EXT4_RW_ATTR_SBI_UI(warning_dir_size, s_warning_dir_size);
 EXT4_RW_ATTR_SBI_UI(mb_stats, s_mb_stats);
 EXT4_RW_ATTR_SBI_UI(mb_max_to_scan, s_mb_max_to_scan);
 EXT4_RW_ATTR_SBI_UI(mb_min_to_scan, s_mb_min_to_scan);
@@ -214,6 +215,7 @@ static ssize_t journal_task_show(struct ext4_sb_info *sbi, char *buf)
 	ATTR_LIST(inode_goal),
 	ATTR_LIST(max_dir_size),
 	ATTR_LIST(max_dir_size_kb),
+	ATTR_LIST(warning_dir_size),
 	ATTR_LIST(mb_stats),
 	ATTR_LIST(mb_max_to_scan),
 	ATTR_LIST(mb_min_to_scan),

From patchwork Mon Jul 22 01:23:46 2019
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11051325
From: James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown , Shaun Tancheff , Li Dongyang , Artem Blagodarenko , Yang Sheng
Date: Sun, 21 Jul 2019 21:23:46 -0400
Message-Id: <1563758631-29550-18-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1563758631-29550-1-git-send-email-jsimmons@infradead.org>
References:
<1563758631-29550-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 17/22] ext4: optimize ext4_journal_callback_add
Cc: Lustre Development List

Change list_add_tail to list_add. This benefits ldiskfs in
tgt_cb_last_committed: thandles with the highest transaction numbers
end up at the head of the list, so the first iterations already see the
highest transno, which avoids extra calls to ptlrpc_commit_replies.

Signed-off-by: James Simmons
---
 fs/ext4/ext4_jbd2.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index 75a5309..5ebf8ee 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -169,7 +169,7 @@ static inline void _ext4_journal_callback_add(handle_t *handle,
 			struct ext4_journal_cb_entry *jce)
 {
 	/* Add the jce to transaction's private list */
-	list_add_tail(&jce->jce_list, &handle->h_transaction->t_private_list);
+	list_add(&jce->jce_list, &handle->h_transaction->t_private_list);
 }
 
 static inline void ext4_journal_callback_add(handle_t *handle,

From patchwork Mon Jul 22 01:23:47 2019
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11051329
From: James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown , Shaun Tancheff , Li Dongyang , Artem Blagodarenko , Yang Sheng
Date: Sun, 21 Jul 2019 21:23:47 -0400
Message-Id: <1563758631-29550-19-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1563758631-29550-1-git-send-email-jsimmons@infradead.org>
References: <1563758631-29550-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 18/22] ext4: attach jinode in writepages
Cc: Lustre Development List
Signed-off-by: James Simmons
---
 fs/ext4/ext4.h  | 1 +
 fs/ext4/inode.c | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 5f73e19..0ee4606 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2643,6 +2643,7 @@ extern int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
 extern void ext4_process_freed_data(struct super_block *sb, tid_t commit_tid);
 
 /* inode.c */
+#define HAVE_LDISKFS_INFO_JINODE
 int ext4_inode_is_fast_symlink(struct inode *inode);
 struct buffer_head *ext4_getblk(handle_t *, struct inode *, ext4_lblk_t, int);
 struct buffer_head *ext4_bread(handle_t *, struct inode *, ext4_lblk_t, int);

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 9296611..bf00dfc 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -731,6 +731,9 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
 	    !(flags & EXT4_GET_BLOCKS_ZERO) &&
 	    !ext4_is_quota_file(inode) &&
 	    ext4_should_order_data(inode)) {
+		ret = ext4_inode_attach_jinode(inode);
+		if (ret)
+			return ret;
 		if (flags & EXT4_GET_BLOCKS_IO_SUBMIT)
 			ret = ext4_jbd2_inode_add_wait(handle, inode);
 		else
@@ -2815,6 +2818,9 @@ static int ext4_writepages(struct address_space *mapping,
 		mpd.last_page = wbc->range_end >> PAGE_SHIFT;
 	}
 
+	ret = ext4_inode_attach_jinode(inode);
+	if (ret)
+		return ret;
 	mpd.inode = inode;
 	mpd.wbc = wbc;
 	ext4_io_submit_init(&mpd.io_submit, wbc);
@@ -4432,6 +4438,7 @@ int ext4_inode_attach_jinode(struct inode *inode)
 	jbd2_free_inode(jinode);
 	return 0;
 }
+EXPORT_SYMBOL(ext4_inode_attach_jinode);
 
 /*
  * ext4_truncate()

From patchwork Mon Jul 22 01:23:48 2019
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11051309
From: James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown , Shaun Tancheff , Li Dongyang , Artem Blagodarenko , Yang Sheng
Date:
Sun, 21 Jul 2019 21:23:48 -0400
Message-Id: <1563758631-29550-20-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1563758631-29550-1-git-send-email-jsimmons@infradead.org>
References: <1563758631-29550-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 19/22] ext4: don't check before replay
Cc: Lustre Development List

When ldiskfs runs in failover mode with a read-only disk, part of the
allocation updates are lost, and ldiskfs may fail while mounting because
of the resulting inconsistent group-descriptor state. Move the
group-descriptor check to after journal replay.

Signed-off-by: James Simmons
---
 fs/ext4/super.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index a3179b2..b818acb 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4255,11 +4255,6 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		}
 	}
 	sbi->s_gdb_count = db_count;
-	if (!ext4_check_descriptors(sb, logical_sb_block, &first_not_zeroed)) {
-		ext4_msg(sb, KERN_ERR, "group descriptors corrupted!");
-		ret = -EFSCORRUPTED;
-		goto failed_mount2;
-	}
 
 	timer_setup(&sbi->s_err_report, print_daily_error_info, 0);
 
@@ -4401,6 +4396,12 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	sbi->s_journal->j_commit_callback = ext4_journal_commit_callback;
 
 no_journal:
+	if (!ext4_check_descriptors(sb, logical_sb_block, &first_not_zeroed)) {
+		ext4_msg(sb, KERN_ERR, "group descriptors corrupted!");
+		ret = -EFSCORRUPTED;
+		goto failed_mount_wq;
+	}
+
 	if (!test_opt(sb, NO_MBCACHE)) {
 		sbi->s_ea_block_cache =
			ext4_xattr_create_cache();
		if (!sbi->s_ea_block_cache) {

From patchwork Mon Jul 22 01:23:49 2019
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11051323
From: James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown , Shaun Tancheff , Li Dongyang , Artem Blagodarenko , Yang Sheng
Date: Sun, 21 Jul 2019 21:23:49 -0400
Message-Id: <1563758631-29550-21-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1563758631-29550-1-git-send-email-jsimmons@infradead.org>
References: <1563758631-29550-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 20/22] ext4: use GFP_NOFS in ext4_inode_attach_jinode
Cc: Lustre Development List

Signed-off-by: James Simmons
---
 fs/ext4/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index bf00dfc..ca72097 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4422,7 +4422,7 @@ int ext4_inode_attach_jinode(struct inode *inode)
 	if (ei->jinode || !EXT4_SB(inode->i_sb)->s_journal)
 		return 0;
 
-	jinode = jbd2_alloc_inode(GFP_KERNEL);
+	jinode = jbd2_alloc_inode(GFP_NOFS);
 	spin_lock(&inode->i_lock);
 	if (!ei->jinode) {
 		if (!jinode) {

From patchwork Mon Jul 22 01:23:50 2019
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11051319
From: James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown , Shaun Tancheff , Li Dongyang , Artem Blagodarenko , Yang Sheng
Date: Sun, 21 Jul 2019 21:23:50 -0400
Message-Id: <1563758631-29550-22-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1563758631-29550-1-git-send-email-jsimmons@infradead.org>
References: <1563758631-29550-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 21/22] ext4: export ext4_orphan_add
Cc: Lustre Development List
Signed-off-by: James Simmons
---
 fs/ext4/namei.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 9b30cc6..6cb3f63 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3615,6 +3615,7 @@ int ext4_orphan_add(handle_t *handle, struct inode *inode)
 	ext4_std_error(sb, err);
 	return err;
 }
+EXPORT_SYMBOL(ext4_orphan_add);
 
 /*
  * ext4_orphan_del() removes an unlinked or truncated inode from the list

From patchwork Mon Jul 22 01:23:51 2019
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11051331
From: James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown , Shaun Tancheff , Li Dongyang , Artem Blagodarenko , Yang Sheng
Date: Sun, 21 Jul 2019 21:23:51 -0400
Message-Id: <1563758631-29550-23-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1563758631-29550-1-git-send-email-jsimmons@infradead.org>
References: <1563758631-29550-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 22/22] ext4: export mb stream allocator variables
Cc: Lustre Development List
Signed-off-by: James Simmons
---
 fs/ext4/ext4.h    |  2 ++
 fs/ext4/mballoc.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/ext4/sysfs.c   |  4 ++++
 3 files changed, 65 insertions(+)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 0ee4606..46f2619 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2620,6 +2620,8 @@ extern int ext4_init_inode_table(struct super_block *sb,
 /* mballoc.c */
 extern const struct file_operations ext4_seq_prealloc_table_fops;
 extern const struct seq_operations ext4_mb_seq_groups_ops;
+extern const struct file_operations ext4_seq_mb_last_group_fops;
+extern int ext4_mb_seq_last_start_seq_show(struct seq_file *m, void *v);
 extern long ext4_mb_stats;
 extern long ext4_mb_max_to_scan;
 extern int ext4_mb_init(struct super_block *);

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 82398b0..8270e35 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2459,6 +2459,65 @@ static struct kmem_cache *get_groupinfo_cache(int blocksize_bits)
 	return cachep;
 }
 
+#define EXT4_MB_MAX_INPUT_STRING_SIZE 32
+
+static ssize_t ext4_mb_last_group_write(struct file *file,
+					const char __user *buf,
+					size_t cnt, loff_t *pos)
+{
+	char dummy[EXT4_MB_MAX_INPUT_STRING_SIZE + 1];
+	struct super_block *sb = PDE_DATA(file_inode(file));
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	unsigned long val;
+	char *end;
+
+	if (cnt > EXT4_MB_MAX_INPUT_STRING_SIZE)
+		return -EINVAL;
+	if (copy_from_user(dummy, buf, cnt))
+		return -EFAULT;
+	dummy[cnt] = '\0';
+	val = simple_strtoul(dummy, &end, 0);
+	if (dummy == end)
+		return -EINVAL;
+	if (val >= ext4_get_groups_count(sb))
+		return -ERANGE;
+	spin_lock(&sbi->s_md_lock);
+	sbi->s_mb_last_group = val;
+	sbi->s_mb_last_start = 0;
+	spin_unlock(&sbi->s_md_lock);
+	return cnt;
+}
+
+static int ext4_mb_seq_last_group_seq_show(struct seq_file *m, void *v)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(m->private);
+
+	seq_printf(m, "%ld\n", sbi->s_mb_last_group);
+	return 0;
+}
+
+static int ext4_mb_seq_last_group_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, ext4_mb_seq_last_group_seq_show, PDE_DATA(inode));
+}
+
+const struct file_operations ext4_seq_mb_last_group_fops = {
+	.owner		= THIS_MODULE,
+	.open		= ext4_mb_seq_last_group_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= seq_release,
+	.write		= ext4_mb_last_group_write,
+};
+
+int ext4_mb_seq_last_start_seq_show(struct seq_file *m, void *v)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(m->private);
+
+	seq_printf(m, "%ld\n", sbi->s_mb_last_start);
+	return 0;
+}
+
 /*
  * Allocate the top-level s_group_info array for the specified number
  * of groups

diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index 575f318..6bcb455 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -425,6 +425,10 @@ int ext4_register_sysfs(struct super_block *sb)
 				 &ext4_mb_seq_groups_ops, sb);
 		proc_create_data("prealloc_table", S_IRUGO, sbi->s_proc,
 				 &ext4_seq_prealloc_table_fops, sb);
+		proc_create_data("mb_last_group", S_IRUGO, sbi->s_proc,
+				 &ext4_seq_mb_last_group_fops, sb);
+		proc_create_single_data("mb_last_start", S_IRUGO, sbi->s_proc,
+					ext4_mb_seq_last_start_seq_show, sb);
 	}
 	return 0;
 }