diff mbox series

fs: change last_ino type to unsigned long

Message ID 1561811293-75769-1-git-send-email-zhengbin13@huawei.com (mailing list archive)
State New, archived
Headers show
Series fs: change last_ino type to unsigned long | expand

Commit Message

Zheng Bin June 29, 2019, 12:28 p.m. UTC
tmpfs use get_next_ino to get inode number, if last_ino wraps,
there will be files share the same inode number. Change last_ino
type to unsigned long.

PS: Type of __old_kernel_stat->st_ino(cp_old_stat) is unsigned short
in x86 & arm, if we want to avoid an EOVERFLOW error on non LFS
stat() call, we need to change last_ino type to unsigned short

Signed-off-by: zhengbin <zhengbin13@huawei.com>
---
 fs/inode.c         | 16 +++++++---------
 include/linux/fs.h |  2 +-
 2 files changed, 8 insertions(+), 10 deletions(-)

--
2.7.4

Comments

Matthew Wilcox (Oracle) June 29, 2019, 2:21 p.m. UTC | #1
On Sat, Jun 29, 2019 at 08:28:13PM +0800, zhengbin wrote:
> tmpfs use get_next_ino to get inode number, if last_ino wraps,
> there will be files share the same inode number. Change last_ino
> type to unsigned long.

Is this a serious problem?  I'd be more convinced by a patch to use
the sbitmap data structure than a blind conversion to use atomic64_t.
Zheng Bin July 1, 2019, 1:48 a.m. UTC | #2
On 2019/6/29 22:21, Matthew Wilcox wrote:
> On Sat, Jun 29, 2019 at 08:28:13PM +0800, zhengbin wrote:
>> tmpfs use get_next_ino to get inode number, if last_ino wraps,
>> there will be files share the same inode number. Change last_ino
>> type to unsigned long.
> Is this a serious problem?  I'd be more convinced by a patch to use
> the sbitmap data structure than a blind conversion to use atomic64_t.

Yes, if two files share the same inode number, when application uses dlopen to get

file handle,  there will be problems.


Maybe we could use iunique to try to get a unique i_ino value(when we allocate new inode,

we need to add it to the hashtable), the questions are:

1. inode creation will be slow down, as the comment of function  iunique says:

 *    BUGS:
 *    With a large number of inodes live on the file system this function
 *    currently becomes quite slow.

2. we need to convert all callers of  get_next_ino to use iunique (tmpfs, autofs, configfs...),

moreover, the more callers are converted, the function of iunique will be more slower.


Looking forward to your reply, thanks.

> .
>
Matthew Wilcox (Oracle) July 1, 2019, 1:52 a.m. UTC | #3
On Mon, Jul 01, 2019 at 09:48:03AM +0800, zhengbin (A) wrote:
> 
> On 2019/6/29 22:21, Matthew Wilcox wrote:
> > On Sat, Jun 29, 2019 at 08:28:13PM +0800, zhengbin wrote:
> >> tmpfs use get_next_ino to get inode number, if last_ino wraps,
> >> there will be files share the same inode number. Change last_ino
> >> type to unsigned long.
> > Is this a serious problem?
> 
> Yes, if two files share the same inode number, when application uses dlopen to get
> 
> file handle,  there will be problems.

That wasn't what I meant.  Does it happen in practice?  Have you observed
it, or are you just worrying about a potential problem?

> Maybe we could use iunique to try to get a unique i_ino value(when we allocate new inode,
> 
> we need to add it to the hashtable), the questions are:
> 
> 1. inode creation will be slow down, as the comment of function  iunique says:
> 
>  *    BUGS:
>  *    With a large number of inodes live on the file system this function
>  *    currently becomes quite slow.
> 
> 2. we need to convert all callers of  get_next_ino to use iunique (tmpfs, autofs, configfs...),
> 
> moreover, the more callers are converted, the function of iunique will be more slower.
> 
> 
> Looking forward to your reply, thanks.

> > I'd be more convinced by a patch to use
> > the sbitmap data structure than a blind conversion to use atomic64_t.

You ignored this sentence.
Zheng Bin July 1, 2019, 2:04 a.m. UTC | #4
On 2019/7/1 9:52, Matthew Wilcox wrote:
> On Mon, Jul 01, 2019 at 09:48:03AM +0800, zhengbin (A) wrote:
>> On 2019/6/29 22:21, Matthew Wilcox wrote:
>>> On Sat, Jun 29, 2019 at 08:28:13PM +0800, zhengbin wrote:
>>>> tmpfs use get_next_ino to get inode number, if last_ino wraps,
>>>> there will be files share the same inode number. Change last_ino
>>>> type to unsigned long.
>>> Is this a serious problem?
>> Yes, if two files share the same inode number, when application uses dlopen to get
>>
>> file handle,  there will be problems.
> That wasn't what I meant.  Does it happen in practice?  Have you observed
> it, or are you just worrying about a potential problem?

Yes, it happens in practice. we use a script to test it, after 10 days, this problem will happen:

while the script are as follows:

while(1) {

  create a file

   dlopen it

    ...

    remove it

}

>> Maybe we could use iunique to try to get a unique i_ino value(when we allocate new inode,
>>
>> we need to add it to the hashtable), the questions are:
>>
>> 1. inode creation will be slow down, as the comment of function  iunique says:
>>
>>  *    BUGS:
>>  *    With a large number of inodes live on the file system this function
>>  *    currently becomes quite slow.
>>
>> 2. we need to convert all callers of  get_next_ino to use iunique (tmpfs, autofs, configfs...),
>>
>> moreover, the more callers are converted, the function of iunique will be more slower.
>>
>>
>> Looking forward to your reply, thanks.
>>> I'd be more convinced by a patch to use
>>> the sbitmap data structure than a blind conversion to use atomic64_t.
> You ignored this sentence.
I will study sbitmap data structure
> .
>
Zheng Bin July 1, 2019, 11:21 a.m. UTC | #5
On 2019/7/1 9:52, Matthew Wilcox wrote:

> On Mon, Jul 01, 2019 at 09:48:03AM +0800, zhengbin (A) wrote:
>> On 2019/6/29 22:21, Matthew Wilcox wrote:
>>> On Sat, Jun 29, 2019 at 08:28:13PM +0800, zhengbin wrote:
>>>> tmpfs use get_next_ino to get inode number, if last_ino wraps,
>>>> there will be files share the same inode number. Change last_ino
>>>> type to unsigned long.
>>> Is this a serious problem?
>> Yes, if two files share the same inode number, when application uses dlopen to get
>>
>> file handle,  there will be problems.
> That wasn't what I meant.  Does it happen in practice?  Have you observed
> it, or are you just worrying about a potential problem?
>
>> Maybe we could use iunique to try to get a unique i_ino value(when we allocate new inode,
>>
>> we need to add it to the hashtable), the questions are:
>>
>> 1. inode creation will be slow down, as the comment of function  iunique says:
>>
>>  *    BUGS:
>>  *    With a large number of inodes live on the file system this function
>>  *    currently becomes quite slow.
>>
>> 2. we need to convert all callers of  get_next_ino to use iunique (tmpfs, autofs, configfs...),
>>
>> moreover, the more callers are converted, the function of iunique will be more slower.
>>
>>
>> Looking forward to your reply, thanks.
>>> I'd be more convinced by a patch to use
>>> the sbitmap data structure than a blind conversion to use atomic64_t.
> You ignored this sentence.

Hi, maybe sbitmap is not a good solution, max_inodes of tmpfs are controlled by mount options--nrinodes,

which can be modified by remountfs(bigger or smaller), as the comment of function sbitmap_resize says:

 * Doesn't reallocate anything. It's up to the caller to ensure that the new
 * depth doesn't exceed the depth that the sb was initialized with.

We can modify this to meet the growing requirements, there will still be questions as follows:

1. tmpfs is a ram filesystem, we need to allocate sbitmap memory for sbinfo->max_inodes(while this maybe huge)

2.If remountfs changes  max_inode, we have to deal with it, while this may take a long time

(bigger: we need to free the old sbitmap memory, allocate new memory, copy the old sbitmap to new sbitmap

smaller: How do we deal with it?ie: we use sb->map[inode number/8] to find the sbitmap, we need to change the exist

inode numbers?while this maybe used by userspace application.)

>
> .
>
diff mbox series

Patch

diff --git a/fs/inode.c b/fs/inode.c
index df6542e..84cf8e3 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -870,22 +870,20 @@  static struct inode *find_inode_fast(struct super_block *sb,
  * 2^32 range, and is a worst-case. Even a 50% wastage would only increase
  * overflow rate by 2x, which does not seem too significant.
  *
- * On a 32bit, non LFS stat() call, glibc will generate an EOVERFLOW
- * error if st_ino won't fit in target struct field. Use 32bit counter
- * here to attempt to avoid that.
  */
 #define LAST_INO_BATCH 1024
-static DEFINE_PER_CPU(unsigned int, last_ino);
+static DEFINE_PER_CPU(unsigned long, last_ino);

-unsigned int get_next_ino(void)
+unsigned long get_next_ino(void)
 {
-	unsigned int *p = &get_cpu_var(last_ino);
-	unsigned int res = *p;
+	unsigned long *p = &get_cpu_var(last_ino);
+	unsigned long res = *p;

 #ifdef CONFIG_SMP
 	if (unlikely((res & (LAST_INO_BATCH-1)) == 0)) {
-		static atomic_t shared_last_ino;
-		int next = atomic_add_return(LAST_INO_BATCH, &shared_last_ino);
+		static atomic64_t shared_last_ino;
+		long next = atomic64_add_return(LAST_INO_BATCH,
+						&shared_last_ino);

 		res = next - LAST_INO_BATCH;
 	}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f7fdfe9..51f153d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3005,7 +3005,7 @@  static inline void lockdep_annotate_inode_mutex_key(struct inode *inode) { };
 #endif
 extern void unlock_new_inode(struct inode *);
 extern void discard_new_inode(struct inode *);
-extern unsigned int get_next_ino(void);
+extern unsigned long get_next_ino(void);
 extern void evict_inodes(struct super_block *sb);

 extern void __iget(struct inode * inode);