[vfs.all,08/26] erofs: prevent direct access of bd_inode

Message ID	20240406090930.2252838-9-yukuai1@huaweicloud.com (mailing list archive)
State	New, archived
Headers	show Received: from dggsgout11.his.huawei.com (unknown [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 97D7E23767; Sat, 6 Apr 2024 09:17:48 +0000 (UTC) From: Yu Kuai <yukuai1@huaweicloud.com> To: jack@suse.cz, hch@lst.de, brauner@kernel.org, viro@zeniv.linux.org.uk, axboe@kernel.dk Cc: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, yi.zhang@huawei.com, yangerkun@huawei.com, yukuai3@huawei.com Subject: [PATCH vfs.all 08/26] erofs: prevent direct access of bd_inode Date: Sat, 6 Apr 2024 17:09:12 +0800 Message-Id: <20240406090930.2252838-9-yukuai1@huaweicloud.com> In-Reply-To: <20240406090930.2252838-1-yukuai1@huaweicloud.com> References: <20240406090930.2252838-1-yukuai1@huaweicloud.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	fs & block: remove bdev->bd_inode \| expand [vfs.all,00/26] fs & block: remove bdev->bd_inode [vfs.all,01/26] block: move two helpers into bdev.c [vfs.all,02/26] block: remove sync_blockdev_nowait() [vfs.all,03/26] block: remove sync_blockdev_range() [vfs.all,04/26] block: prevent direct access of bd_inode [vfs.all,05/26] block: add a helper bdev_read_folio() [vfs.all,06/26] bcachefs: remove dead function bdev_sectors() [vfs.all,07/26] cramfs: prevent direct access of bd_inode [vfs.all,08/26] erofs: prevent direct access of bd_inode [vfs.all,09/26] nilfs2: prevent direct access of bd_inode [vfs.all,10/26] gfs2: prevent direct access of bd_inode [vfs.all,11/26] btrfs: prevent direct access of bd_inode [vfs.all,12/26] ext4: remove block_device_ejected() [vfs.all,13/26] ext4: prevent direct access of bd_inode [vfs.all,14/26] jbd2: prevent direct access of bd_inode [vfs.all,15/26] s390/dasd: use bdev api in dasd_format() [vfs.all,16/26] bcache: prevent direct access of bd_inode [vfs.all,17/26] block2mtd: prevent direct access of bd_inode [vfs.all,18/26] scsi: use bdev helper in scsi_bios_ptable() [vfs.all,19/26] dm-vdo: convert to use bdev_file [vfs.all,20/26] block: factor out a helper init_bdev_file() [vfs.all,21/26] block: fix module reference leakage from bdev_open_by_dev error path [vfs.all,22/26] block: stash a bdev_file to read/write raw blcok_device [vfs.all,23/26] iomap: add helpers helpers to get and set bdev [vfs.all,24/26] iomap: convert to use bdev_file [vfs.all,25/26] buffer: add helpers to get and set bdev [vfs.all,26/26] buffer: convert to use bdev_file

Yu Kuai April 6, 2024, 9:09 a.m. UTC

From: Yu Kuai <yukuai3@huawei.com>

Now that all filesystems stash the bdev file, it's ok to get inode
for the file.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/data.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Al Viro April 7, 2024, 4:05 a.m. UTC | #1

On Sat, Apr 06, 2024 at 05:09:12PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Now that all filesystems stash the bdev file, it's ok to get inode
> for the file.

Looking at the only user of erofs_buf->inode (erofs_bread())...  We
use the inode for two things there - block size calculation (to get
from block number to position in bytes) and access to page cache.
We read in full pages anyway.  And frankly, looking at the callers,
we really would be better off if we passed position in bytes instead
of block number.  IOW, it smells like erofs_bread() having wrong type.

Look at the callers.  With 3 exceptions it's
fs/erofs/super.c:135:   ptr = erofs_bread(buf, erofs_blknr(sb, *offset), EROFS_KMAP);
fs/erofs/super.c:151:           ptr = erofs_bread(buf, erofs_blknr(sb, *offset), EROFS_KMAP);
fs/erofs/xattr.c:84:    it.kaddr = erofs_bread(&it.buf, erofs_blknr(sb, it.pos), EROFS_KMAP);
fs/erofs/xattr.c:105:           it.kaddr = erofs_bread(&it.buf, erofs_blknr(sb, it.pos),
fs/erofs/xattr.c:188:           it->kaddr = erofs_bread(&it->buf, erofs_blknr(sb, it->pos),
fs/erofs/xattr.c:294:           it->kaddr = erofs_bread(&it->buf, erofs_blknr(sb, it->pos),
fs/erofs/xattr.c:339:           it->kaddr = erofs_bread(&it->buf, erofs_blknr(it->sb, it->pos),
fs/erofs/xattr.c:378:           it->kaddr = erofs_bread(&it->buf, erofs_blknr(sb, it->pos),
fs/erofs/zdata.c:943:           src = erofs_bread(&buf, erofs_blknr(sb, pos), EROFS_KMAP);

and all of them actually want the return value + erofs_offset(...).  IOW,
we take a linear position (in bytes).  Divide it by block size (from sb).
Pass the factor to erofs_bread(), where we multiply that by block size
(from inode), see which page will that be in, get that page and return a
pointer *into* that page.  Then we again divide the same position
by block size (from sb) and add the remainder to the pointer returned
by erofs_bread().

IOW, it would be much easier to pass the position directly and to hell
with block size logics.  Three exceptions to that pattern:

fs/erofs/data.c:80:     return erofs_bread(buf, blkaddr, type);
fs/erofs/dir.c:66:              de = erofs_bread(&buf, i, EROFS_KMAP);
fs/erofs/namei.c:103:           de = erofs_bread(&buf, mid, EROFS_KMAP);

Those could bloody well multiply the argument by block size;
the first one (erofs_read_metabuf()) is also interesting - its
callers themselves follow the similar pattern.  So it might be
worth passing it a position in bytes as well...

In any case, all 3 have superblock reference, so they can convert
from blocks to bytes conveniently.  Which means that erofs_bread()
doesn't need to mess with block size considerations at all.

IOW, it might make sense to replace erofs_buf->inode with
pointer to address space.  And use file_mapping() instead of
file_inode() in that patch...

Al Viro April 7, 2024, 4:08 a.m. UTC | #2

On Sun, Apr 07, 2024 at 05:05:31AM +0100, Al Viro wrote:

> IOW, it might make sense to replace erofs_buf->inode with
> pointer to address space.  And use file_mapping() instead of
				     ->f_mapping, that is.
> file_inode() in that patch...
>

Gao Xiang April 11, 2024, 4:13 p.m. UTC | #3

Hi Al,

On 2024/4/7 12:05, Al Viro wrote:
> On Sat, Apr 06, 2024 at 05:09:12PM +0800, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Now that all filesystems stash the bdev file, it's ok to get inode
>> for the file.
> 
> Looking at the only user of erofs_buf->inode (erofs_bread())...  We
> use the inode for two things there - block size calculation (to get
> from block number to position in bytes) and access to page cache.
> We read in full pages anyway.  And frankly, looking at the callers,
> we really would be better off if we passed position in bytes instead
> of block number.  IOW, it smells like erofs_bread() having wrong type.
> 
> Look at the callers.  With 3 exceptions it's
> fs/erofs/super.c:135:   ptr = erofs_bread(buf, erofs_blknr(sb, *offset), EROFS_KMAP);
> fs/erofs/super.c:151:           ptr = erofs_bread(buf, erofs_blknr(sb, *offset), EROFS_KMAP);
> fs/erofs/xattr.c:84:    it.kaddr = erofs_bread(&it.buf, erofs_blknr(sb, it.pos), EROFS_KMAP);
> fs/erofs/xattr.c:105:           it.kaddr = erofs_bread(&it.buf, erofs_blknr(sb, it.pos),
> fs/erofs/xattr.c:188:           it->kaddr = erofs_bread(&it->buf, erofs_blknr(sb, it->pos),
> fs/erofs/xattr.c:294:           it->kaddr = erofs_bread(&it->buf, erofs_blknr(sb, it->pos),
> fs/erofs/xattr.c:339:           it->kaddr = erofs_bread(&it->buf, erofs_blknr(it->sb, it->pos),
> fs/erofs/xattr.c:378:           it->kaddr = erofs_bread(&it->buf, erofs_blknr(sb, it->pos),
> fs/erofs/zdata.c:943:           src = erofs_bread(&buf, erofs_blknr(sb, pos), EROFS_KMAP);
> 
> and all of them actually want the return value + erofs_offset(...).  IOW,
> we take a linear position (in bytes).  Divide it by block size (from sb).
> Pass the factor to erofs_bread(), where we multiply that by block size
> (from inode), see which page will that be in, get that page and return a
> pointer *into* that page.  Then we again divide the same position
> by block size (from sb) and add the remainder to the pointer returned
> by erofs_bread().
> 
> IOW, it would be much easier to pass the position directly and to hell
> with block size logics.  Three exceptions to that pattern:
> 
> fs/erofs/data.c:80:     return erofs_bread(buf, blkaddr, type);
> fs/erofs/dir.c:66:              de = erofs_bread(&buf, i, EROFS_KMAP);
> fs/erofs/namei.c:103:           de = erofs_bread(&buf, mid, EROFS_KMAP);
> 
> Those could bloody well multiply the argument by block size;
> the first one (erofs_read_metabuf()) is also interesting - its
> callers themselves follow the similar pattern.  So it might be
> worth passing it a position in bytes as well...
> 
> In any case, all 3 have superblock reference, so they can convert
> from blocks to bytes conveniently.  Which means that erofs_bread()
> doesn't need to mess with block size considerations at all.
> 
> IOW, it might make sense to replace erofs_buf->inode with
> pointer to address space.  And use file_mapping() instead of
> file_inode() in that patch...

Just saw this again by chance, which is unexpected.

Yeah, I think that is a good idea.  The story is that erofs_bread()
was derived from a page-based interface:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/erofs/data.c?h=v5.10#n35

so it was once a page index number.  I think a byte offset will be
a better interface to clean up these, thanks for your time and work
on this!

BTW, sightly off the topic:

I'm little confused why I'm not be looped for this version this time
even:

  1) I explicitly asked to Cc the mailing list so that I could find
     the latest discussion and respond in time:
      https://lore.kernel.org/r/5e04a86d-8bbd-41da-95f6-cf1562ed04f9@linux.alibaba.com

  2) I sent my r-v-b tag on RFC v4 (and the tag was added on this
     version) but I didn't receive this new version.

Thanks,
Gao Xiang

Yu Kuai April 12, 2024, 1:14 a.m. UTC | #4

Hi,

在 2024/04/12 0:13, Gao Xiang 写道:
> Hi Al,
> 
> On 2024/4/7 12:05, Al Viro wrote:
>> On Sat, Apr 06, 2024 at 05:09:12PM +0800, Yu Kuai wrote:
>>> From: Yu Kuai <yukuai3@huawei.com>
>>>
>>> Now that all filesystems stash the bdev file, it's ok to get inode
>>> for the file.
>>
>> Looking at the only user of erofs_buf->inode (erofs_bread())...  We
>> use the inode for two things there - block size calculation (to get
>> from block number to position in bytes) and access to page cache.
>> We read in full pages anyway.  And frankly, looking at the callers,
>> we really would be better off if we passed position in bytes instead
>> of block number.  IOW, it smells like erofs_bread() having wrong type.
>>
>> Look at the callers.  With 3 exceptions it's
>> fs/erofs/super.c:135:   ptr = erofs_bread(buf, erofs_blknr(sb, 
>> *offset), EROFS_KMAP);
>> fs/erofs/super.c:151:           ptr = erofs_bread(buf, erofs_blknr(sb, 
>> *offset), EROFS_KMAP);
>> fs/erofs/xattr.c:84:    it.kaddr = erofs_bread(&it.buf, 
>> erofs_blknr(sb, it.pos), EROFS_KMAP);
>> fs/erofs/xattr.c:105:           it.kaddr = erofs_bread(&it.buf, 
>> erofs_blknr(sb, it.pos),
>> fs/erofs/xattr.c:188:           it->kaddr = erofs_bread(&it->buf, 
>> erofs_blknr(sb, it->pos),
>> fs/erofs/xattr.c:294:           it->kaddr = erofs_bread(&it->buf, 
>> erofs_blknr(sb, it->pos),
>> fs/erofs/xattr.c:339:           it->kaddr = erofs_bread(&it->buf, 
>> erofs_blknr(it->sb, it->pos),
>> fs/erofs/xattr.c:378:           it->kaddr = erofs_bread(&it->buf, 
>> erofs_blknr(sb, it->pos),
>> fs/erofs/zdata.c:943:           src = erofs_bread(&buf, 
>> erofs_blknr(sb, pos), EROFS_KMAP);
>>
>> and all of them actually want the return value + erofs_offset(...).  IOW,
>> we take a linear position (in bytes).  Divide it by block size (from sb).
>> Pass the factor to erofs_bread(), where we multiply that by block size
>> (from inode), see which page will that be in, get that page and return a
>> pointer *into* that page.  Then we again divide the same position
>> by block size (from sb) and add the remainder to the pointer returned
>> by erofs_bread().
>>
>> IOW, it would be much easier to pass the position directly and to hell
>> with block size logics.  Three exceptions to that pattern:
>>
>> fs/erofs/data.c:80:     return erofs_bread(buf, blkaddr, type);
>> fs/erofs/dir.c:66:              de = erofs_bread(&buf, i, EROFS_KMAP);
>> fs/erofs/namei.c:103:           de = erofs_bread(&buf, mid, EROFS_KMAP);
>>
>> Those could bloody well multiply the argument by block size;
>> the first one (erofs_read_metabuf()) is also interesting - its
>> callers themselves follow the similar pattern.  So it might be
>> worth passing it a position in bytes as well...
>>
>> In any case, all 3 have superblock reference, so they can convert
>> from blocks to bytes conveniently.  Which means that erofs_bread()
>> doesn't need to mess with block size considerations at all.
>>
>> IOW, it might make sense to replace erofs_buf->inode with
>> pointer to address space.  And use file_mapping() instead of
>> file_inode() in that patch...
> 
> Just saw this again by chance, which is unexpected.
> 
> Yeah, I think that is a good idea.  The story is that erofs_bread()
> was derived from a page-based interface:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/erofs/data.c?h=v5.10#n35 
> 
> 
> so it was once a page index number.  I think a byte offset will be
> a better interface to clean up these, thanks for your time and work
> on this!
> 
> BTW, sightly off the topic:
> 
> I'm little confused why I'm not be looped for this version this time
> even:
> 
>   1) I explicitly asked to Cc the mailing list so that I could find
>      the latest discussion and respond in time:
>       
> https://lore.kernel.org/r/5e04a86d-8bbd-41da-95f6-cf1562ed04f9@linux.alibaba.com 
> 
> 
>   2) I sent my r-v-b tag on RFC v4 (and the tag was added on this
>      version) but I didn't receive this new version.

This is my fault to blame, I gave up to cc all address from 
get_maintainer.pl for this set, because I somehow can't send this set
with too much CC. However, I should still CC you.

Thanks,
Kuai

> 
> Thanks,
> Gao Xiang
> .
>

Al Viro April 25, 2024, 7:56 p.m. UTC | #5

On Fri, Apr 12, 2024 at 12:13:42AM +0800, Gao Xiang wrote:
> Hi Al,
> 
> On 2024/4/7 12:05, Al Viro wrote:
> > On Sat, Apr 06, 2024 at 05:09:12PM +0800, Yu Kuai wrote:
> > > From: Yu Kuai <yukuai3@huawei.com>
> > > 
> > > Now that all filesystems stash the bdev file, it's ok to get inode
> > > for the file.
> > 
> > Looking at the only user of erofs_buf->inode (erofs_bread())...  We
> > use the inode for two things there - block size calculation (to get
> > from block number to position in bytes) and access to page cache.
> > We read in full pages anyway.  And frankly, looking at the callers,
> > we really would be better off if we passed position in bytes instead
> > of block number.  IOW, it smells like erofs_bread() having wrong type.
> > 
> > Look at the callers.  With 3 exceptions it's
> > fs/erofs/super.c:135:   ptr = erofs_bread(buf, erofs_blknr(sb, *offset), EROFS_KMAP);
> > fs/erofs/super.c:151:           ptr = erofs_bread(buf, erofs_blknr(sb, *offset), EROFS_KMAP);
> > fs/erofs/xattr.c:84:    it.kaddr = erofs_bread(&it.buf, erofs_blknr(sb, it.pos), EROFS_KMAP);
> > fs/erofs/xattr.c:105:           it.kaddr = erofs_bread(&it.buf, erofs_blknr(sb, it.pos),
> > fs/erofs/xattr.c:188:           it->kaddr = erofs_bread(&it->buf, erofs_blknr(sb, it->pos),
> > fs/erofs/xattr.c:294:           it->kaddr = erofs_bread(&it->buf, erofs_blknr(sb, it->pos),
> > fs/erofs/xattr.c:339:           it->kaddr = erofs_bread(&it->buf, erofs_blknr(it->sb, it->pos),
> > fs/erofs/xattr.c:378:           it->kaddr = erofs_bread(&it->buf, erofs_blknr(sb, it->pos),
> > fs/erofs/zdata.c:943:           src = erofs_bread(&buf, erofs_blknr(sb, pos), EROFS_KMAP);
> > 
> > and all of them actually want the return value + erofs_offset(...).  IOW,
> > we take a linear position (in bytes).  Divide it by block size (from sb).
> > Pass the factor to erofs_bread(), where we multiply that by block size
> > (from inode), see which page will that be in, get that page and return a
> > pointer *into* that page.  Then we again divide the same position
> > by block size (from sb) and add the remainder to the pointer returned
> > by erofs_bread().
> > 
> > IOW, it would be much easier to pass the position directly and to hell
> > with block size logics.  Three exceptions to that pattern:
> > 
> > fs/erofs/data.c:80:     return erofs_bread(buf, blkaddr, type);
> > fs/erofs/dir.c:66:              de = erofs_bread(&buf, i, EROFS_KMAP);
> > fs/erofs/namei.c:103:           de = erofs_bread(&buf, mid, EROFS_KMAP);
> > 
> > Those could bloody well multiply the argument by block size;
> > the first one (erofs_read_metabuf()) is also interesting - its
> > callers themselves follow the similar pattern.  So it might be
> > worth passing it a position in bytes as well...
> > 
> > In any case, all 3 have superblock reference, so they can convert
> > from blocks to bytes conveniently.  Which means that erofs_bread()
> > doesn't need to mess with block size considerations at all.
> > 
> > IOW, it might make sense to replace erofs_buf->inode with
> > pointer to address space.  And use file_mapping() instead of
> > file_inode() in that patch...
> 
> Just saw this again by chance, which is unexpected.
> 
> Yeah, I think that is a good idea.  The story is that erofs_bread()
> was derived from a page-based interface:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/erofs/data.c?h=v5.10#n35
> 
> so it was once a page index number.  I think a byte offset will be
> a better interface to clean up these, thanks for your time and work
> on this!

FWIW, see #misc.erofs and #more.erofs in my tree; the former is the
minimal conversion of erofs_read_buf() and switch from buf->inode
to buf->mapping, the latter follows that up with massage for
erofs_read_metabuf().

Completely untested; it builds, but that's all I can promise.  Individual
patches in followups.

Al Viro April 25, 2024, 8:08 p.m. UTC | #6

On Thu, Apr 25, 2024 at 08:56:41PM +0100, Al Viro wrote:

> FWIW, see #misc.erofs and #more.erofs in my tree; the former is the
> minimal conversion of erofs_read_buf() and switch from buf->inode
> to buf->mapping, the latter follows that up with massage for
> erofs_read_metabuf().

First two and last four patches resp.  BTW, what are the intended rules
for inline symlinks?  "Should fit within the same block as the last
byte of on-disk erofs_inode_{compact,extended}"?  Feels like
erofs_read_inode() might be better off if it did copying the symlink
body instead of leaving it to erofs_fill_symlink(), complete with
the sanity checks...  I'd left that logics alone, though - I'm nowhere
near familiar enough with erofs layout.

Gao Xiang April 25, 2024, 9:56 p.m. UTC | #7

Hi Al,

On 2024/4/26 04:08, Al Viro wrote:
> On Thu, Apr 25, 2024 at 08:56:41PM +0100, Al Viro wrote:
> 
>> FWIW, see #misc.erofs and #more.erofs in my tree; the former is the
>> minimal conversion of erofs_read_buf() and switch from buf->inode
>> to buf->mapping, the latter follows that up with massage for
>> erofs_read_metabuf().
> 
> First two and last four patches resp.  BTW, what are the intended rules
> for inline symlinks?  "Should fit within the same block as the last

symlink on-disk layout follows the same rule of regular files.  The last
logical block can be inlined right after the on-disk inode (called tail
packing inline) or use a separate fs block to keep the symlink if tail
packing inline doesn't fit.

> byte of on-disk erofs_inode_{compact,extended}"?  Feels like
> erofs_read_inode() might be better off if it did copying the symlink
> body instead of leaving it to erofs_fill_symlink(), complete with
> the sanity checks...  I'd left that logics alone, though - I'm nowhere
> near familiar enough with erofs layout.
If I understand correctly, do you mean just fold erofs_fill_symlink()
into the caller?  That is fine with me, I can change this in the
future.

Thanks,
Gao Xiang

Al Viro April 25, 2024, 10:28 p.m. UTC | #8

On Fri, Apr 26, 2024 at 05:56:52AM +0800, Gao Xiang wrote:
> Hi Al,
> 
> On 2024/4/26 04:08, Al Viro wrote:
> > On Thu, Apr 25, 2024 at 08:56:41PM +0100, Al Viro wrote:
> > 
> > > FWIW, see #misc.erofs and #more.erofs in my tree; the former is the
> > > minimal conversion of erofs_read_buf() and switch from buf->inode
> > > to buf->mapping, the latter follows that up with massage for
> > > erofs_read_metabuf().
> > 
> > First two and last four patches resp.  BTW, what are the intended rules
> > for inline symlinks?  "Should fit within the same block as the last
> 
> symlink on-disk layout follows the same rule of regular files.  The last
> logical block can be inlined right after the on-disk inode (called tail
> packing inline) or use a separate fs block to keep the symlink if tail
> packing inline doesn't fit.
> 
> > byte of on-disk erofs_inode_{compact,extended}"?  Feels like
> > erofs_read_inode() might be better off if it did copying the symlink
> > body instead of leaving it to erofs_fill_symlink(), complete with
> > the sanity checks...  I'd left that logics alone, though - I'm nowhere
> > near familiar enough with erofs layout.
> If I understand correctly, do you mean just fold erofs_fill_symlink()
> into the caller?  That is fine with me, I can change this in the
> future.

It's just that the calling conventions of erofs_read_inode() feel wrong ;-/
We return a pointer and offset, with (ERR_PTR(...), anything) used to
indicate an error and (pointer into page, offset) used (in case of
fast symlinks and only in case of fast symlinks) to encode the address
of symlink body, with data starting at pointer + offset + vi->xattr_isize
and length being ->i_size, no greater than block size - offset - vi->xattr_size.

If anything, it would be easier to follow (and document) if you had
allocated and filled the symlink body right there in erofs_read_inode().
That way you could lift erofs_put_metabuf() call into erofs_read_inode(),
along with the variable itself.

Perhaps something like void *erofs_read_inode(inode) with
	ERR_PTR(-E...) => error
	NULL => success, not a fast symlink
	pointer to string => success, a fast symlink, body allocated and returned
to caller.

Or, for that matter, have it return an int and stuff the body into ->i_link -
it's just that you'd need to set ->i_op there with such approach.

Not sure, really.  BTW, one comment about erofs_fill_symlink() - it's probably
a good idea to use kmemdup_nul() rather than open-coding it (and do that after
the block overflow check, obviously).

Gao Xiang April 25, 2024, 11:11 p.m. UTC | #9

On 2024/4/26 06:28, Al Viro wrote:
> On Fri, Apr 26, 2024 at 05:56:52AM +0800, Gao Xiang wrote:
>> Hi Al,
>>
>> On 2024/4/26 04:08, Al Viro wrote:
>>> On Thu, Apr 25, 2024 at 08:56:41PM +0100, Al Viro wrote:
>>>
>>>> FWIW, see #misc.erofs and #more.erofs in my tree; the former is the
>>>> minimal conversion of erofs_read_buf() and switch from buf->inode
>>>> to buf->mapping, the latter follows that up with massage for
>>>> erofs_read_metabuf().
>>>
>>> First two and last four patches resp.  BTW, what are the intended rules
>>> for inline symlinks?  "Should fit within the same block as the last
>>
>> symlink on-disk layout follows the same rule of regular files.  The last
>> logical block can be inlined right after the on-disk inode (called tail
>> packing inline) or use a separate fs block to keep the symlink if tail
>> packing inline doesn't fit.
>>
>>> byte of on-disk erofs_inode_{compact,extended}"?  Feels like
>>> erofs_read_inode() might be better off if it did copying the symlink
>>> body instead of leaving it to erofs_fill_symlink(), complete with
>>> the sanity checks...  I'd left that logics alone, though - I'm nowhere
>>> near familiar enough with erofs layout.
>> If I understand correctly, do you mean just fold erofs_fill_symlink()
>> into the caller?  That is fine with me, I can change this in the
>> future.
> 
> It's just that the calling conventions of erofs_read_inode() feel wrong ;-/
> We return a pointer and offset, with (ERR_PTR(...), anything) used to
> indicate an error and (pointer into page, offset) used (in case of
> fast symlinks and only in case of fast symlinks) to encode the address
> of symlink body, with data starting at pointer + offset + vi->xattr_isize
> and length being ->i_size, no greater than block size - offset - vi->xattr_size.
> 
> If anything, it would be easier to follow (and document) if you had
> allocated and filled the symlink body right there in erofs_read_inode().
> That way you could lift erofs_put_metabuf() call into erofs_read_inode(),
> along with the variable itself.
> 
> Perhaps something like void *erofs_read_inode(inode) with
> 	ERR_PTR(-E...) => error
> 	NULL => success, not a fast symlink
> 	pointer to string => success, a fast symlink, body allocated and returned
> to caller.

Got it.  my original plan was that erofs_read_inode() didn't need
to handle inode->i_mode stuffs (IOWs, different type of inodes).

But I think symlink i_link cases can be handled specifically in
erofs_read_inode().

> 
> Or, for that matter, have it return an int and stuff the body into ->i_link -
> it's just that you'd need to set ->i_op there with such approach.
> 
> Not sure, really.  BTW, one comment about erofs_fill_symlink() - it's probably
> a good idea to use kmemdup_nul() rather than open-coding it (and do that after
> the block overflow check, obviously).

Yes, let me handle all of this later.

Thanks,
Gao Xiang

Gao Xiang April 25, 2024, 11:22 p.m. UTC | #10

Hi Al,

On 2024/4/26 03:56, Al Viro wrote:
> On Fri, Apr 12, 2024 at 12:13:42AM +0800, Gao Xiang wrote:

...

>>
>> Just saw this again by chance, which is unexpected.
>>
>> Yeah, I think that is a good idea.  The story is that erofs_bread()
>> was derived from a page-based interface:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/erofs/data.c?h=v5.10#n35
>>
>> so it was once a page index number.  I think a byte offset will be
>> a better interface to clean up these, thanks for your time and work
>> on this!
> 
> FWIW, see #misc.erofs and #more.erofs in my tree; the former is the
> minimal conversion of erofs_read_buf() and switch from buf->inode
> to buf->mapping, the latter follows that up with massage for
> erofs_read_metabuf().
> 
> Completely untested; it builds, but that's all I can promise.  Individual
> patches in followups.

Thanks for so much time on this, I will review/test/feedback
these patches by the end of this week since an internal project
for my employer is also ongoing.

Thanks,
Gao Xiang

[vfs.all,08/26] erofs: prevent direct access of bd_inode

Commit Message

Comments

Patch