[v17,00/10] NTFS read-write driver GPL implementation by Paragon Software

Message ID: 20201231152401.3162425-1-almaz.alexandrovich@paragon-software.com

Konstantin Komarov Dec. 31, 2020, 3:23 p.m. UTC
This patch adds NTFS Read-Write driver to fs/ntfs3.

Having decades of expertise in commercial file system development and extensive
test coverage, we at Paragon Software GmbH want to make our contribution to
the Open Source Community by providing an implementation of an NTFS read-write
driver for the Linux kernel.

This is a fully functional NTFS read-write driver. The current version works with
NTFS (including v3.1) and normal/compressed/sparse files, and supports journal replaying.

We plan to support this codebase once merged, and to add new features and fix
bugs. For example, full journaling support over JBD will be added in later
updates.

v2:
 - patch split into chunks (file-wise)
 - build issues fixed
 - sparse and checkpatch.pl errors fixed
 - NULL pointer dereference on mkfs.ntfs-formatted volume mount fixed
 - cosmetics + code cleanup

v3:
 - added acl, noatime, no_acs_rules, prealloc mount options
 - added fiemap support
 - fixed encoding support
 - removed typedefs
 - adapted kernel-style logging mechanisms
 - fixed typos and corner-case issues

v4:
 - atomic_open() refactored
 - code style updated
 - bugfixes

v5:
- nls/nls_alt mount options added
- Unicode conversion fixes
- improved operations with very fragmented files
- logging cosmetics

v6:
- Security Descriptors processing changed:
  added system.ntfs_security xattr to set SD
- atomic_open() optimized
- cosmetics

v7:
- Security Descriptors validity checks added (by Mark Harmstone)
- atomic_open() fixed for compressed file creation in the direct I/O case
- remount support
- temporarily removed readahead usage
- cosmetics

v8:
- Compressed files operations fixed

v9:
- further cosmetics applied as suggested by Joe Perches

v10:
- operations with compressed/sparse files on very fragmented volumes improved
- reduced memory consumption for above cases

v11:
- further compressed file optimizations: reads/writes now skip buffering
- journal wipe to the initial state optimized (buffering is also skipped)
- optimized run storage (re-packing cluster metainformation)
- fixes based on Matthew Wilcox's feedback on v10
- compressed/sparse/normal could be set for empty files with 'system.ntfs_attrib' xattr

v12:
- nls_alt mount option removed after discussion with Pali Rohar
- fixed ni_repack()
- fixed transition of resident files to non-resident when their size increases

v13:
- nested_lock fix (lockdep)
- out-of-bounds read fix (KASAN warning)
- resident->nonresident transition fixed for compressed files
- missing load_nls() fix applied
- some sparse utility warning fixes

v14:
- support for additional compression types (we've adapted WIMLIB's
  implementation, authored by Eric Biggers, into ntfs3)

v15:
- kernel test robot warnings fixed
- lzx/xpress compression license headers updated

v16:
- lzx/xpress moved to initial ntfs-3g plugin code
- mutexes instead of a global spinlock for compression
- FALLOC_FL_PUNCH_HOLE and FALLOC_FL_COLLAPSE_RANGE implemented
- CONFIG_NTFS3_FS_POSIX_ACL added

v17:
- FALLOC_FL_COLLAPSE_RANGE fixed
- fixes for Matthew Wilcox's and Andy Lavr's concerns

Konstantin Komarov (10):
  fs/ntfs3: Add headers and misc files
  fs/ntfs3: Add initialization of super block
  fs/ntfs3: Add bitmap
  fs/ntfs3: Add file operations and implementation
  fs/ntfs3: Add attrib operations
  fs/ntfs3: Add compression
  fs/ntfs3: Add NTFS journal
  fs/ntfs3: Add Kconfig, Makefile and doc
  fs/ntfs3: Add NTFS3 in fs/Kconfig and fs/Makefile
  fs/ntfs3: Add MAINTAINERS

 Documentation/filesystems/ntfs3.rst |  107 +
 MAINTAINERS                         |    7 +
 fs/Kconfig                          |    1 +
 fs/Makefile                         |    1 +
 fs/ntfs3/Kconfig                    |   41 +
 fs/ntfs3/Makefile                   |   31 +
 fs/ntfs3/attrib.c                   | 2081 +++++++++++
 fs/ntfs3/attrlist.c                 |  463 +++
 fs/ntfs3/bitfunc.c                  |  135 +
 fs/ntfs3/bitmap.c                   | 1504 ++++++++
 fs/ntfs3/debug.h                    |   61 +
 fs/ntfs3/dir.c                      |  570 +++
 fs/ntfs3/file.c                     | 1140 ++++++
 fs/ntfs3/frecord.c                  | 3088 ++++++++++++++++
 fs/ntfs3/fslog.c                    | 5220 +++++++++++++++++++++++++++
 fs/ntfs3/fsntfs.c                   | 2527 +++++++++++++
 fs/ntfs3/index.c                    | 2662 ++++++++++++++
 fs/ntfs3/inode.c                    | 2051 +++++++++++
 fs/ntfs3/lib/decompress_common.c    |  332 ++
 fs/ntfs3/lib/decompress_common.h    |  352 ++
 fs/ntfs3/lib/lib.h                  |   26 +
 fs/ntfs3/lib/lzx_decompress.c       |  683 ++++
 fs/ntfs3/lib/xpress_decompress.c    |  155 +
 fs/ntfs3/lznt.c                     |  452 +++
 fs/ntfs3/namei.c                    |  590 +++
 fs/ntfs3/ntfs.h                     | 1237 +++++++
 fs/ntfs3/ntfs_fs.h                  | 1049 ++++++
 fs/ntfs3/record.c                   |  614 ++++
 fs/ntfs3/run.c                      | 1254 +++++++
 fs/ntfs3/super.c                    | 1477 ++++++++
 fs/ntfs3/upcase.c                   |   77 +
 fs/ntfs3/xattr.c                    | 1072 ++++++
 32 files changed, 31060 insertions(+)
 create mode 100644 Documentation/filesystems/ntfs3.rst
 create mode 100644 fs/ntfs3/Kconfig
 create mode 100644 fs/ntfs3/Makefile
 create mode 100644 fs/ntfs3/attrib.c
 create mode 100644 fs/ntfs3/attrlist.c
 create mode 100644 fs/ntfs3/bitfunc.c
 create mode 100644 fs/ntfs3/bitmap.c
 create mode 100644 fs/ntfs3/debug.h
 create mode 100644 fs/ntfs3/dir.c
 create mode 100644 fs/ntfs3/file.c
 create mode 100644 fs/ntfs3/frecord.c
 create mode 100644 fs/ntfs3/fslog.c
 create mode 100644 fs/ntfs3/fsntfs.c
 create mode 100644 fs/ntfs3/index.c
 create mode 100644 fs/ntfs3/inode.c
 create mode 100644 fs/ntfs3/lib/decompress_common.c
 create mode 100644 fs/ntfs3/lib/decompress_common.h
 create mode 100644 fs/ntfs3/lib/lib.h
 create mode 100644 fs/ntfs3/lib/lzx_decompress.c
 create mode 100644 fs/ntfs3/lib/xpress_decompress.c
 create mode 100644 fs/ntfs3/lznt.c
 create mode 100644 fs/ntfs3/namei.c
 create mode 100644 fs/ntfs3/ntfs.h
 create mode 100644 fs/ntfs3/ntfs_fs.h
 create mode 100644 fs/ntfs3/record.c
 create mode 100644 fs/ntfs3/run.c
 create mode 100644 fs/ntfs3/super.c
 create mode 100644 fs/ntfs3/upcase.c
 create mode 100644 fs/ntfs3/xattr.c


base-commit: f6e1ea19649216156576aeafa784e3b4cee45549

Comments

Kari Argillander Jan. 3, 2021, 9:57 p.m. UTC | #1
On Thu, Dec 31, 2020 at 06:23:55PM +0300, Konstantin Komarov wrote:
> This adds file operations and implementation
> 
> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
> ---
>  fs/ntfs3/dir.c     |  570 ++++++++
>  fs/ntfs3/file.c    | 1140 ++++++++++++++++
>  fs/ntfs3/frecord.c | 3088 ++++++++++++++++++++++++++++++++++++++++++++
>  fs/ntfs3/namei.c   |  590 +++++++++
>  fs/ntfs3/record.c  |  614 +++++++++
>  fs/ntfs3/run.c     | 1254 ++++++++++++++++++
>  6 files changed, 7256 insertions(+)
>  create mode 100644 fs/ntfs3/dir.c
>  create mode 100644 fs/ntfs3/file.c
>  create mode 100644 fs/ntfs3/frecord.c
>  create mode 100644 fs/ntfs3/namei.c
>  create mode 100644 fs/ntfs3/record.c
>  create mode 100644 fs/ntfs3/run.c
> 
> diff --git a/fs/ntfs3/file.c b/fs/ntfs3/file.c

> +int ntfs_file_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
> +{
> +	return generic_file_fsync(filp, start, end, datasync);
> +}

Do we have a reason why we implement this if we just use the generic one?
Isn't it clearer to use the generic fsync straight away?
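
For illustration, the suggested change would look something like this (a
minimal sketch; the file_operations table is abridged and its exact
contents in the patch may differ):

/* Sketch: drop the pass-through wrapper and point the operations
 * table at the generic helper directly.
 */
const struct file_operations ntfs_file_operations = {
	/* ... other operations ... */
	.fsync = generic_file_fsync,	/* instead of ntfs_file_fsync() */
};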

> +static long ntfs_fallocate(struct file *file, int mode, loff_t vbo, loff_t len)
> +{

> +	/* Return error if mode is not supported */
> +	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
> +		     FALLOC_FL_COLLAPSE_RANGE))
> +		return -EOPNOTSUPP;

> +
> +	if (mode & FALLOC_FL_PUNCH_HOLE) {

> +	} else if (mode & FALLOC_FL_COLLAPSE_RANGE) {

> +	} else {
> +		/*
> +		 * normal file: allocate clusters, do not change 'valid' size
> +		 */
> +		err = ntfs_set_size(inode, max(end, i_size));
> +		if (err)
> +			goto out;
> +
> +		if (is_sparsed(ni) || is_compressed(ni)) {
> +			CLST vcn_v = ni->i_valid >> sbi->cluster_bits;
> +			CLST vcn = vbo >> sbi->cluster_bits;
> +			CLST cend = bytes_to_cluster(sbi, end);
> +			CLST lcn, clen;
> +			bool new;
> +
> +			/*
> +			 * allocate but not zero new clusters (see below comments)
> +			 * this breaks security (one can read unused on-disk areas)
> +			 * zeroing these clusters may be too long
> +			 * may be we should check here for root rights?
> +			 */
> +			for (; vcn < cend; vcn += clen) {
> +				err = attr_data_get_block(ni, vcn, cend - vcn,
> +							  &lcn, &clen, &new);
> +				if (err)
> +					goto out;
> +				if (!new || vcn >= vcn_v)
> +					continue;
> +
> +				/*
> +				 * Unwritten area
> +				 * NTFS is not able to store several unwritten areas
> +				 * Activate 'ntfs_sparse_cluster' to zero new allocated clusters
> +				 *
> +				 * Dangerous in case:
> +				 * 1G of sparsed clusters + 1 cluster of data =>
> +				 * valid_size == 1G + 1 cluster
> +				 * fallocate(1G) will zero 1G and this can be very long
> +				 * xfstest 016/086 will fail without 'ntfs_sparse_cluster'
> +				 */
> +				/*ntfs_sparse_cluster(inode, NULL, vcn,
> +				 *		    min(vcn_v - vcn, clen));
> +				 */
> +			}
> +		}
> +
> +		if (mode & FALLOC_FL_KEEP_SIZE) {

Isn't this whole else branch already (mode & FALLOC_FL_KEEP_SIZE)?

> +			ni_lock(ni);
> +			/*true - keep preallocated*/
> +			err = attr_set_size(ni, ATTR_DATA, NULL, 0,
> +					    &ni->file.run, i_size, &ni->i_valid,
> +					    true, NULL);
> +			ni_unlock(ni);
> +		}
> +	}
> +
> +	if (!err) {
> +		inode->i_ctime = inode->i_mtime = current_time(inode);
> +		mark_inode_dirty(inode);
> +	}
> +out:
> +	if (err == -EFBIG)
> +		err = -ENOSPC;
> +
> +	inode_unlock(inode);
> +	return err;
> +}

> diff --git a/fs/ntfs3/namei.c b/fs/ntfs3/namei.c

> +int mi_get(struct ntfs_sb_info *sbi, CLST rno, struct mft_inode **mi)
> +{
> +	int err;
> +	struct mft_inode *m = ntfs_alloc(sizeof(struct mft_inode), 1);
> +
> +	if (!m)
> +		return -ENOMEM;
> +
> +	err = mi_init(m, sbi, rno);

If an error happens, should we just free and exit? Right now we call mi_put()
to clean up.

> +	if (!err)
> +		err = mi_read(m, false);
> +
> +	if (err) {
> +		mi_put(m);
> +		return err;
> +	}
> +
> +	*mi = m;
> +	return 0;
> +}

> +struct ATTRIB *mi_enum_attr(struct mft_inode *mi, struct ATTRIB *attr)
> +{
> +	const struct MFT_REC *rec = mi->mrec;
> +	u32 used = le32_to_cpu(rec->used);
> +	u32 t32, off, asize;
> +	u16 t16;
> +
> +	if (!attr) {
> +		u32 total = le32_to_cpu(rec->total);
> +
> +		off = le16_to_cpu(rec->attr_off);
> +
> +		if (used > total)
> +			goto out;
> +
> +		if (off >= used || off < MFTRECORD_FIXUP_OFFSET_1 ||
> +		    !IsDwordAligned(off)) {
> +			goto out;
> +		}
> +
> +		/* Skip non-resident records */
> +		if (!is_rec_inuse(rec))
> +			goto out;
> +
> +		attr = Add2Ptr(rec, off);
> +	} else {
> +		/* Check if input attr inside record */
> +		off = PtrOffset(rec, attr);
> +		if (off >= used)
> +			goto out;
> +
> +		asize = le32_to_cpu(attr->size);
> +		if (asize < SIZEOF_RESIDENT)
> +			goto out;
> +
> +		attr = Add2Ptr(attr, asize);
> +		off += asize;
> +	}
> +
> +	asize = le32_to_cpu(attr->size);
> +
> +	/* Can we use the first field (attr->type) */
> +	if (off + 8 > used) {
> +		static_assert(QuadAlign(sizeof(enum ATTR_TYPE)) == 8);
> +		goto out;
> +	}
> +
> +	if (attr->type == ATTR_END) {
> +		if (used != off + 8)
> +			goto out;

This if is not needed if there is a return NULL right after. But returning
NULL might also be a bug.

> +		return NULL;
> +	}
> +
> +	t32 = le32_to_cpu(attr->type);
> +	if ((t32 & 0xf) || (t32 > 0x100))
> +		goto out;
> +
> +	/* Check boundary */
> +	if (off + asize > used)
> +		goto out;
> +
> +	/* Check size of attribute */
> +	if (!attr->non_res) {
> +		if (asize < SIZEOF_RESIDENT)
> +			goto out;
> +
> +		t16 = le16_to_cpu(attr->res.data_off);
> +
> +		if (t16 > asize)
> +			goto out;
> +
> +		t32 = le32_to_cpu(attr->res.data_size);
> +		if (t16 + t32 > asize)
> +			goto out;
> +
> +		return attr;
> +	}
> +
> +	/* Check some nonresident fields */
> +	if (attr->name_len &&
> +	    le16_to_cpu(attr->name_off) + sizeof(short) * attr->name_len >
> +		    le16_to_cpu(attr->nres.run_off)) {
> +		goto out;
> +	}
> +
> +	if (attr->nres.svcn || !is_attr_ext(attr)) {
> +		if (asize + 8 < SIZEOF_NONRESIDENT)
> +			goto out;
> +
> +		if (attr->nres.c_unit)
> +			goto out;
> +	} else if (asize + 8 < SIZEOF_NONRESIDENT_EX)
> +		goto out;
> +
> +	return attr;
> +
> +out:
> +	return NULL;
> +}

> diff --git a/fs/ntfs3/run.c b/fs/ntfs3/run.c

> +static inline int run_packed_size(const s64 *n)
> +{
> +#ifdef __BIG_ENDIAN

These are whole functions inside an #ifdef. It would maybe be clearer if
there really were separate whole functions for both endiannesses.

> +	const u8 *p = (const u8 *)n + sizeof(*n) - 1;
> +
> +	if (*n >= 0) {
> +		if (p[-7] || p[-6] || p[-5] || p[-4])
> +			p -= 4;
> +		if (p[-3] || p[-2])
> +			p -= 2;
> +		if (p[-1])
> +			p -= 1;
> +		if (p[0] & 0x80)
> +			p -= 1;
> +	} else {
> +		if (p[-7] != 0xff || p[-6] != 0xff || p[-5] != 0xff ||
> +		    p[-4] != 0xff)
> +			p -= 4;
> +		if (p[-3] != 0xff || p[-2] != 0xff)
> +			p -= 2;
> +		if (p[-1] != 0xff)
> +			p -= 1;
> +		if (!(p[0] & 0x80))
> +			p -= 1;
> +	}
> +	return (const u8 *)n + sizeof(*n) - p;

}
#else
static inline int run_packed_size(const s64 *n)
{

Something like this.
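
Spelled out, with one complete function per endianness (bodies elided;
they are the two halves quoted above and below):

#ifdef __BIG_ENDIAN
static inline int run_packed_size(const s64 *n)
{
	/* ... big-endian body as quoted above ... */
}
#else
static inline int run_packed_size(const s64 *n)
{
	/* ... little-endian body as quoted below ... */
}
#endif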

> +	const u8 *p = (const u8 *)n;
> +
> +	if (*n >= 0) {
> +		if (p[7] || p[6] || p[5] || p[4])
> +			p += 4;
> +		if (p[3] || p[2])
> +			p += 2;
> +		if (p[1])
> +			p += 1;
> +		if (p[0] & 0x80)
> +			p += 1;
> +	} else {
> +		if (p[7] != 0xff || p[6] != 0xff || p[5] != 0xff ||
> +		    p[4] != 0xff)
> +			p += 4;
> +		if (p[3] != 0xff || p[2] != 0xff)
> +			p += 2;
> +		if (p[1] != 0xff)
> +			p += 1;
> +		if (!(p[0] & 0x80))
> +			p += 1;
> +	}
> +
> +	return 1 + p - (const u8 *)n;
> +#endif
> +}
> +
> +/*
> + * run_pack
> + *
> + * packs runs into buffer
> + * packed_vcns - how much runs we have packed
> + * packed_size - how much bytes we have used run_buf
> + */
> +int run_pack(const struct runs_tree *run, CLST svcn, CLST len, u8 *run_buf,
> +	     u32 run_buf_size, CLST *packed_vcns)
> +{
> +	CLST next_vcn, vcn, lcn;
> +	CLST prev_lcn = 0;
> +	CLST evcn1 = svcn + len;
> +	int packed_size = 0;
> +	size_t i;
> +	bool ok;
> +	s64 dlcn, len64;
> +	int offset_size, size_size, t;
> +	const u8 *p;
> +
> +	next_vcn = vcn = svcn;
> +
> +	*packed_vcns = 0;
> +
> +	if (!len)
> +		goto out;
> +
> +	ok = run_lookup_entry(run, vcn, &lcn, &len, &i);
> +
> +	if (!ok)
> +		goto error;
> +
> +	if (next_vcn != vcn)
> +		goto error;
> +
> +	for (;;) {
> +		/* offset of current fragment relatively to previous fragment */
> +		dlcn = 0;

This dlcn (see my comment below).

> +		next_vcn = vcn + len;
> +
> +		if (next_vcn > evcn1)
> +			len = evcn1 - vcn;
> +
> +		/*
> +		 * mirror of len, but signed, because run_packed_size()
> +		 * works with signed int only
> +		 */
> +		len64 = len;
> +
> +		/* how much bytes is packed len64 */
> +		size_size = run_packed_size(&len64);

Does (s64 *)&len work just fine?

> +
> +		/* offset_size - how much bytes is packed dlcn */
> +		if (lcn == SPARSE_LCN) {
> +			offset_size = 0;

dlcn might be better to live here?

> +		} else {
> +			/* NOTE: lcn can be less than prev_lcn! */
> +			dlcn = (s64)lcn - prev_lcn;
> +			offset_size = run_packed_size(&dlcn);
> +			prev_lcn = lcn;
> +		}
> +
> +		t = run_buf_size - packed_size - 2 - offset_size;
> +		if (t <= 0)
> +			goto out;
> +
> +		/* can we store this entire run */
> +		if (t < size_size)
> +			goto out;
> +
> +		if (run_buf) {
> +			p = (u8 *)&len64;
> +
> +			/* pack run header */
> +			run_buf[0] = ((u8)(size_size | (offset_size << 4)));
> +			run_buf += 1;
> +
> +			/* Pack the length of run */
> +			switch (size_size) {
> +#ifdef __BIG_ENDIAN
> +			case 8:
> +				run_buf[7] = p[0];
> +				fallthrough;
> +			case 7:
> +				run_buf[6] = p[1];
> +				fallthrough;
> +			case 6:
> +				run_buf[5] = p[2];
> +				fallthrough;
> +			case 5:
> +				run_buf[4] = p[3];
> +				fallthrough;
> +			case 4:
> +				run_buf[3] = p[4];
> +				fallthrough;
> +			case 3:
> +				run_buf[2] = p[5];
> +				fallthrough;
> +			case 2:
> +				run_buf[1] = p[6];
> +				fallthrough;
> +			case 1:
> +				run_buf[0] = p[7];
> +#else
> +			case 8:
> +				run_buf[7] = p[7];
> +				fallthrough;
> +			case 7:
> +				run_buf[6] = p[6];
> +				fallthrough;
> +			case 6:
> +				run_buf[5] = p[5];
> +				fallthrough;
> +			case 5:
> +				run_buf[4] = p[4];
> +				fallthrough;
> +			case 4:
> +				run_buf[3] = p[3];
> +				fallthrough;
> +			case 3:
> +				run_buf[2] = p[2];
> +				fallthrough;
> +			case 2:
> +				run_buf[1] = p[1];
> +				fallthrough;
> +			case 1:
> +				run_buf[0] = p[0];
> +#endif
> +			}

Why is this not its own function? We use this in something like 5 places.
Also, isn't little-endian just a memcpy()?
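
Something like this hypothetical helper (the name run_pack_s64 is made
up here, and whether the compiler inlines it is an assumption):

#include <linux/string.h>

/* Store the 'size' least significant bytes of 'v' at 'run_buf' in
 * on-disk (little-endian) order; replaces the repeated switches.
 */
static inline void run_pack_s64(u8 *run_buf, u8 size, s64 v)
{
#ifdef __BIG_ENDIAN
	const u8 *p = (const u8 *)&v + sizeof(v) - size;
	int i;

	/* reverse the low 'size' bytes into little-endian order */
	for (i = 0; i < size; i++)
		run_buf[i] = p[size - 1 - i];
#else
	/* on little-endian the low bytes already come first in memory */
	memcpy(run_buf, &v, size);
#endif
}

Both "Pack the length of run" and "Pack the offset from previous lcn"
would then become run_pack_s64(run_buf, size_size, len64) and
run_pack_s64(run_buf, offset_size, dlcn), and the temporary p goes away
(see also my next comment below).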

> +
> +			run_buf += size_size;
> +			p = (u8 *)&dlcn;

I think that when we have a function for that switch, the temporary p is
not needed anymore.

> +
> +			/* Pack the offset from previous lcn */
> +			switch (offset_size) {
> +#ifdef __BIG_ENDIAN
> +			case 8:
> +				run_buf[7] = p[0];
> +				fallthrough;
> +			case 7:
> +				run_buf[6] = p[1];
> +				fallthrough;
> +			case 6:
> +				run_buf[5] = p[2];
> +				fallthrough;
> +			case 5:
> +				run_buf[4] = p[3];
> +				fallthrough;
> +			case 4:
> +				run_buf[3] = p[4];
> +				fallthrough;
> +			case 3:
> +				run_buf[2] = p[5];
> +				fallthrough;
> +			case 2:
> +				run_buf[1] = p[6];
> +				fallthrough;
> +			case 1:
> +				run_buf[0] = p[7];
> +#else
> +			case 8:
> +				run_buf[7] = p[7];
> +				fallthrough;
> +			case 7:
> +				run_buf[6] = p[6];
> +				fallthrough;
> +			case 6:
> +				run_buf[5] = p[5];
> +				fallthrough;
> +			case 5:
> +				run_buf[4] = p[4];
> +				fallthrough;
> +			case 4:
> +				run_buf[3] = p[3];
> +				fallthrough;
> +			case 3:
> +				run_buf[2] = p[2];
> +				fallthrough;
> +			case 2:
> +				run_buf[1] = p[1];
> +				fallthrough;
> +			case 1:
> +				run_buf[0] = p[0];
> +#endif
> +			}

> +int run_get_highest_vcn(CLST vcn, const u8 *run_buf, u64 *highest_vcn)
> +{

> +		/* skip size_size */
> +		run += size_size;
> +
> +		if (!len)
> +			goto error;
> +
> +		run += offset_size;

Can this simply be
run += size_size + offset_size;

> +
> +#ifdef NTFS3_64BIT_CLUSTER
> +		if ((vcn >> 32) || (len >> 32))
> +			goto error;
> +#endif
> +		vcn64 += len;
> +	}
> +
> +	*highest_vcn = vcn64 - 1;
> +	return 0;
> +}
Matthew Wilcox Jan. 3, 2021, 10:04 p.m. UTC | #2
On Sun, Jan 03, 2021 at 11:57:32PM +0200, Kari Argillander wrote:
> > +		/*
> > +		 * mirror of len, but signed, because run_packed_size()
> > +		 * works with signed int only
> > +		 */
> > +		len64 = len;
> > +
> > +		/* how much bytes is packed len64 */
> > +		size_size = run_packed_size(&len64);
> 
> Does (s64 *)&len work just fine?

No.  run_packed_size() is going to load/store eight bytes to/from that
pointer.  You can't just cast a pointer to a different size and expect
it to work (it might happen to work, particularly on little-endian,
but big-endian is going to get completely the wrong value).
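
A small userspace sketch of the pitfall (function names are made up;
stdint types used for clarity):

#include <stdint.h>

/* Broken: reads 8 bytes where only 4 belong to the object. Besides
 * being out of bounds, on big-endian the 4 real bytes land in the high
 * half of the result, so the value is wrong even when the stray bytes
 * happen to be zero.
 */
static int64_t widen_broken(const uint32_t *len)
{
	return *(const int64_t *)len;
}

/* Safe: materialize a real 64-bit object first, which is what
 * run_pack() does with its local len64 before calling
 * run_packed_size().
 */
static int64_t widen_ok(uint32_t len)
{
	int64_t len64 = len;

	return len64;
}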
Matthew Wilcox Jan. 4, 2021, 2:18 a.m. UTC | #3
On Thu, Dec 31, 2020 at 06:23:55PM +0300, Konstantin Komarov wrote:
> +/*
> + * file_operations::iterate_shared
> + *
> + * Use non sorted enumeration.
> + * We have an example of broken volume where sorted enumeration
> + * counts each name twice
> + */
> +static int ntfs_readdir(struct file *file, struct dir_context *ctx)
> +{
> +	const struct INDEX_ROOT *root;
> +	u64 vbo;
> +	size_t bit;
> +	loff_t eod;
> +	int err = 0;
> +	struct inode *dir = file_inode(file);
> +	struct ntfs_inode *ni = ntfs_i(dir);
> +	struct super_block *sb = dir->i_sb;
> +	struct ntfs_sb_info *sbi = sb->s_fs_info;
> +	loff_t i_size = i_size_read(dir);
> +	u32 pos = ctx->pos;
> +	u8 *name = NULL;
> +	struct indx_node *node = NULL;
> +	u8 index_bits = ni->dir.index_bits;
> +
> +	/* name is a buffer of PATH_MAX length */
> +	static_assert(NTFS_NAME_LEN * 4 < PATH_MAX);
> +
> +	eod = i_size + sbi->record_size;
> +
> +	if (pos >= eod)
> +		return 0;
> +
> +	if (!dir_emit_dots(file, ctx))
> +		return 0;
> +
> +	/* allocate PATH_MAX bytes */
> +	name = __getname();
> +	if (!name)
> +		return -ENOMEM;
> +
> +	ni_lock(ni);

What is this lock protecting?
Konstantin Komarov Jan. 18, 2021, 10 a.m. UTC | #4
From: Kari Argillander <kari.argillander@gmail.com>
Sent: Monday, January 4, 2021 12:58 AM
> To: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
> Cc: linux-fsdevel@vger.kernel.org; viro@zeniv.linux.org.uk; linux-kernel@vger.kernel.org; pali@kernel.org; dsterba@suse.cz;
> aaptel@suse.com; willy@infradead.org; rdunlap@infradead.org; joe@perches.com; mark@harmstone.com; nborisov@suse.com;
> linux-ntfs-dev@lists.sourceforge.net; anton@tuxera.com; dan.carpenter@oracle.com; hch@lst.de; ebiggers@kernel.org;
> andy.lavr@gmail.com
> Subject: Re: [PATCH v17 04/10] fs/ntfs3: Add file operations and implementation
> 
> On Thu, Dec 31, 2020 at 06:23:55PM +0300, Konstantin Komarov wrote:
> > This adds file operations and implementation
> >
> > Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
> > ---
> >  fs/ntfs3/dir.c     |  570 ++++++++
> >  fs/ntfs3/file.c    | 1140 ++++++++++++++++
> >  fs/ntfs3/frecord.c | 3088 ++++++++++++++++++++++++++++++++++++++++++++
> >  fs/ntfs3/namei.c   |  590 +++++++++
> >  fs/ntfs3/record.c  |  614 +++++++++
> >  fs/ntfs3/run.c     | 1254 ++++++++++++++++++
> >  6 files changed, 7256 insertions(+)
> >  create mode 100644 fs/ntfs3/dir.c
> >  create mode 100644 fs/ntfs3/file.c
> >  create mode 100644 fs/ntfs3/frecord.c
> >  create mode 100644 fs/ntfs3/namei.c
> >  create mode 100644 fs/ntfs3/record.c
> >  create mode 100644 fs/ntfs3/run.c
> >
> > diff --git a/fs/ntfs3/file.c b/fs/ntfs3/file.c
> 
> > +int ntfs_file_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
> > +{
> > +	return generic_file_fsync(filp, start, end, datasync);
> > +}
> 
> Do we have a reason why we implement this if we just use the generic one?
> Isn't it clearer to use the generic fsync straight away?
> 
Hi! Indeed. Migration to the generic fsync will be introduced in v18.

> > +static long ntfs_fallocate(struct file *file, int mode, loff_t vbo, loff_t len)
> > +{
> 
> > +	/* Return error if mode is not supported */
> > +	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
> > +		     FALLOC_FL_COLLAPSE_RANGE))
> > +		return -EOPNOTSUPP;
> 
> > +
> > +	if (mode & FALLOC_FL_PUNCH_HOLE) {
> 
> > +	} else if (mode & FALLOC_FL_COLLAPSE_RANGE) {
> 
> > +	} else {
> > +		/*
> > +		 * normal file: allocate clusters, do not change 'valid' size
> > +		 */
> > +		err = ntfs_set_size(inode, max(end, i_size));
> > +		if (err)
> > +			goto out;
> > +
> > +		if (is_sparsed(ni) || is_compressed(ni)) {
> > +			CLST vcn_v = ni->i_valid >> sbi->cluster_bits;
> > +			CLST vcn = vbo >> sbi->cluster_bits;
> > +			CLST cend = bytes_to_cluster(sbi, end);
> > +			CLST lcn, clen;
> > +			bool new;
> > +
> > +			/*
> > +			 * allocate but not zero new clusters (see below comments)
> > +			 * this breaks security (one can read unused on-disk areas)
> > +			 * zeroing these clusters may be too long
> > +			 * may be we should check here for root rights?
> > +			 */
> > +			for (; vcn < cend; vcn += clen) {
> > +				err = attr_data_get_block(ni, vcn, cend - vcn,
> > +							  &lcn, &clen, &new);
> > +				if (err)
> > +					goto out;
> > +				if (!new || vcn >= vcn_v)
> > +					continue;
> > +
> > +				/*
> > +				 * Unwritten area
> > +				 * NTFS is not able to store several unwritten areas
> > +				 * Activate 'ntfs_sparse_cluster' to zero new allocated clusters
> > +				 *
> > +				 * Dangerous in case:
> > +				 * 1G of sparsed clusters + 1 cluster of data =>
> > +				 * valid_size == 1G + 1 cluster
> > +				 * fallocate(1G) will zero 1G and this can be very long
> > +				 * xfstest 016/086 will fail without 'ntfs_sparse_cluster'
> > +				 */
> > +				/*ntfs_sparse_cluster(inode, NULL, vcn,
> > +				 *		    min(vcn_v - vcn, clen));
> > +				 */
> > +			}
> > +		}
> > +
> > +		if (mode & FALLOC_FL_KEEP_SIZE) {
> 
> Isn't this whole else branch already (mode & FALLOC_FL_KEEP_SIZE)?

Sorry, can you please clarify your question? Not sure I understood it.
> 
> > +			ni_lock(ni);
> > +			/*true - keep preallocated*/
> > +			err = attr_set_size(ni, ATTR_DATA, NULL, 0,
> > +					    &ni->file.run, i_size, &ni->i_valid,
> > +					    true, NULL);
> > +			ni_unlock(ni);
> > +		}
> > +	}
> > +
> > +	if (!err) {
> > +		inode->i_ctime = inode->i_mtime = current_time(inode);
> > +		mark_inode_dirty(inode);
> > +	}
> > +out:
> > +	if (err == -EFBIG)
> > +		err = -ENOSPC;
> > +
> > +	inode_unlock(inode);
> > +	return err;
> > +}
> 
> > diff --git a/fs/ntfs3/namei.c b/fs/ntfs3/namei.c
> 
> > +int mi_get(struct ntfs_sb_info *sbi, CLST rno, struct mft_inode **mi)
> > +{
> > +	int err;
> > +	struct mft_inode *m = ntfs_alloc(sizeof(struct mft_inode), 1);
> > +
> > +	if (!m)
> > +		return -ENOMEM;
> > +
> > +	err = mi_init(m, sbi, rno);
> 
> If an error happens, should we just free and exit? Right now we call mi_put()
> to clean up.
> 

Done, will be fixed in v18.

> > +	if (!err)
> > +		err = mi_read(m, false);
> > +
> > +	if (err) {
> > +		mi_put(m);
> > +		return err;
> > +	}
> > +
> > +	*mi = m;
> > +	return 0;
> > +}
> 
> > +struct ATTRIB *mi_enum_attr(struct mft_inode *mi, struct ATTRIB *attr)
> > +{
> > +	const struct MFT_REC *rec = mi->mrec;
> > +	u32 used = le32_to_cpu(rec->used);
> > +	u32 t32, off, asize;
> > +	u16 t16;
> > +
> > +	if (!attr) {
> > +		u32 total = le32_to_cpu(rec->total);
> > +
> > +		off = le16_to_cpu(rec->attr_off);
> > +
> > +		if (used > total)
> > +			goto out;
> > +
> > +		if (off >= used || off < MFTRECORD_FIXUP_OFFSET_1 ||
> > +		    !IsDwordAligned(off)) {
> > +			goto out;
> > +		}
> > +
> > +		/* Skip non-resident records */
> > +		if (!is_rec_inuse(rec))
> > +			goto out;
> > +
> > +		attr = Add2Ptr(rec, off);
> > +	} else {
> > +		/* Check if input attr inside record */
> > +		off = PtrOffset(rec, attr);
> > +		if (off >= used)
> > +			goto out;
> > +
> > +		asize = le32_to_cpu(attr->size);
> > +		if (asize < SIZEOF_RESIDENT)
> > +			goto out;
> > +
> > +		attr = Add2Ptr(attr, asize);
> > +		off += asize;
> > +	}
> > +
> > +	asize = le32_to_cpu(attr->size);
> > +
> > +	/* Can we use the first field (attr->type) */
> > +	if (off + 8 > used) {
> > +		static_assert(QuadAlign(sizeof(enum ATTR_TYPE)) == 8);
> > +		goto out;
> > +	}
> > +
> > +	if (attr->type == ATTR_END) {
> > +		if (used != off + 8)
> > +			goto out;
> 
> This if is not needed if there is a return NULL right after. But returning
> NULL might also be a bug.
> 

Thanks, will also be fixed in v18.

> > +		return NULL;
> > +	}
> > +
> > +	t32 = le32_to_cpu(attr->type);
> > +	if ((t32 & 0xf) || (t32 > 0x100))
> > +		goto out;
> > +
> > +	/* Check boundary */
> > +	if (off + asize > used)
> > +		goto out;
> > +
> > +	/* Check size of attribute */
> > +	if (!attr->non_res) {
> > +		if (asize < SIZEOF_RESIDENT)
> > +			goto out;
> > +
> > +		t16 = le16_to_cpu(attr->res.data_off);
> > +
> > +		if (t16 > asize)
> > +			goto out;
> > +
> > +		t32 = le32_to_cpu(attr->res.data_size);
> > +		if (t16 + t32 > asize)
> > +			goto out;
> > +
> > +		return attr;
> > +	}
> > +
> > +	/* Check some nonresident fields */
> > +	if (attr->name_len &&
> > +	    le16_to_cpu(attr->name_off) + sizeof(short) * attr->name_len >
> > +		    le16_to_cpu(attr->nres.run_off)) {
> > +		goto out;
> > +	}
> > +
> > +	if (attr->nres.svcn || !is_attr_ext(attr)) {
> > +		if (asize + 8 < SIZEOF_NONRESIDENT)
> > +			goto out;
> > +
> > +		if (attr->nres.c_unit)
> > +			goto out;
> > +	} else if (asize + 8 < SIZEOF_NONRESIDENT_EX)
> > +		goto out;
> > +
> > +	return attr;
> > +
> > +out:
> > +	return NULL;
> > +}
> 
> > diff --git a/fs/ntfs3/run.c b/fs/ntfs3/run.c
> 
> > +static inline int run_packed_size(const s64 *n)
> > +{
> > +#ifdef __BIG_ENDIAN
> 
> These are whole functions inside an #ifdef. It would maybe be clearer if
> there really were separate whole functions for both endiannesses.
> 

This is a controversial question, but fixed as well (in v18).

> > +	const u8 *p = (const u8 *)n + sizeof(*n) - 1;
> > +
> > +	if (*n >= 0) {
> > +		if (p[-7] || p[-6] || p[-5] || p[-4])
> > +			p -= 4;
> > +		if (p[-3] || p[-2])
> > +			p -= 2;
> > +		if (p[-1])
> > +			p -= 1;
> > +		if (p[0] & 0x80)
> > +			p -= 1;
> > +	} else {
> > +		if (p[-7] != 0xff || p[-6] != 0xff || p[-5] != 0xff ||
> > +		    p[-4] != 0xff)
> > +			p -= 4;
> > +		if (p[-3] != 0xff || p[-2] != 0xff)
> > +			p -= 2;
> > +		if (p[-1] != 0xff)
> > +			p -= 1;
> > +		if (!(p[0] & 0x80))
> > +			p -= 1;
> > +	}
> > +	return (const u8 *)n + sizeof(*n) - p;
> 
> }
> #else
> static inline int run_packed_size(const s64 *n)
> {
> 
> Something like this.
> 
> > +	const u8 *p = (const u8 *)n;
> > +
> > +	if (*n >= 0) {
> > +		if (p[7] || p[6] || p[5] || p[4])
> > +			p += 4;
> > +		if (p[3] || p[2])
> > +			p += 2;
> > +		if (p[1])
> > +			p += 1;
> > +		if (p[0] & 0x80)
> > +			p += 1;
> > +	} else {
> > +		if (p[7] != 0xff || p[6] != 0xff || p[5] != 0xff ||
> > +		    p[4] != 0xff)
> > +			p += 4;
> > +		if (p[3] != 0xff || p[2] != 0xff)
> > +			p += 2;
> > +		if (p[1] != 0xff)
> > +			p += 1;
> > +		if (!(p[0] & 0x80))
> > +			p += 1;
> > +	}
> > +
> > +	return 1 + p - (const u8 *)n;
> > +#endif
> > +}
> > +
> > +/*
> > + * run_pack
> > + *
> > + * packs runs into buffer
> > + * packed_vcns - how much runs we have packed
> > + * packed_size - how much bytes we have used run_buf
> > + */
> > +int run_pack(const struct runs_tree *run, CLST svcn, CLST len, u8 *run_buf,
> > +	     u32 run_buf_size, CLST *packed_vcns)
> > +{
> > +	CLST next_vcn, vcn, lcn;
> > +	CLST prev_lcn = 0;
> > +	CLST evcn1 = svcn + len;
> > +	int packed_size = 0;
> > +	size_t i;
> > +	bool ok;
> > +	s64 dlcn, len64;
> > +	int offset_size, size_size, t;
> > +	const u8 *p;
> > +
> > +	next_vcn = vcn = svcn;
> > +
> > +	*packed_vcns = 0;
> > +
> > +	if (!len)
> > +		goto out;
> > +
> > +	ok = run_lookup_entry(run, vcn, &lcn, &len, &i);
> > +
> > +	if (!ok)
> > +		goto error;
> > +
> > +	if (next_vcn != vcn)
> > +		goto error;
> > +
> > +	for (;;) {
> > +		/* offset of current fragment relatively to previous fragment */
> > +		dlcn = 0;
> 
> This dlcn (see my comment below).
> 
> > +		next_vcn = vcn + len;
> > +
> > +		if (next_vcn > evcn1)
> > +			len = evcn1 - vcn;
> > +
> > +		/*
> > +		 * mirror of len, but signed, because run_packed_size()
> > +		 * works with signed int only
> > +		 */
> > +		len64 = len;
> > +
> > +		/* how much bytes is packed len64 */
> > +		size_size = run_packed_size(&len64);
> 
> Does (s64 *)&len work just fine?
> 

In addition to Matthew's response:
NTFS by its spec supports 64-bit clusters; however, the ntfs.sys driver
uses 32-bit clusters only. That is the default for ntfs3 as well, though it
may be configured for 64-bit clusters. I.e., len may be either.
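
In other words, the cluster type is configurable at build time; roughly
(a sketch based on the explanation above, not a verbatim quote of the
patch headers):

#ifdef NTFS3_64BIT_CLUSTER
typedef u64 CLST;	/* full 64-bit clusters per the NTFS spec */
#else
typedef u32 CLST;	/* 32-bit clusters, matching ntfs.sys (default) */
#endif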

> > +
> > +		/* offset_size - how much bytes is packed dlcn */
> > +		if (lcn == SPARSE_LCN) {
> > +			offset_size = 0;
> 
> dlcn might be better to live here?
> 

You are right, done.

> > +		} else {
> > +			/* NOTE: lcn can be less than prev_lcn! */
> > +			dlcn = (s64)lcn - prev_lcn;
> > +			offset_size = run_packed_size(&dlcn);
> > +			prev_lcn = lcn;
> > +		}
> > +
> > +		t = run_buf_size - packed_size - 2 - offset_size;
> > +		if (t <= 0)
> > +			goto out;
> > +
> > +		/* can we store this entire run */
> > +		if (t < size_size)
> > +			goto out;
> > +
> > +		if (run_buf) {
> > +			p = (u8 *)&len64;
> > +
> > +			/* pack run header */
> > +			run_buf[0] = ((u8)(size_size | (offset_size << 4)));
> > +			run_buf += 1;
> > +
> > +			/* Pack the length of run */
> > +			switch (size_size) {
> > +#ifdef __BIG_ENDIAN
> > +			case 8:
> > +				run_buf[7] = p[0];
> > +				fallthrough;
> > +			case 7:
> > +				run_buf[6] = p[1];
> > +				fallthrough;
> > +			case 6:
> > +				run_buf[5] = p[2];
> > +				fallthrough;
> > +			case 5:
> > +				run_buf[4] = p[3];
> > +				fallthrough;
> > +			case 4:
> > +				run_buf[3] = p[4];
> > +				fallthrough;
> > +			case 3:
> > +				run_buf[2] = p[5];
> > +				fallthrough;
> > +			case 2:
> > +				run_buf[1] = p[6];
> > +				fallthrough;
> > +			case 1:
> > +				run_buf[0] = p[7];
> > +#else
> > +			case 8:
> > +				run_buf[7] = p[7];
> > +				fallthrough;
> > +			case 7:
> > +				run_buf[6] = p[6];
> > +				fallthrough;
> > +			case 6:
> > +				run_buf[5] = p[5];
> > +				fallthrough;
> > +			case 5:
> > +				run_buf[4] = p[4];
> > +				fallthrough;
> > +			case 4:
> > +				run_buf[3] = p[3];
> > +				fallthrough;
> > +			case 3:
> > +				run_buf[2] = p[2];
> > +				fallthrough;
> > +			case 2:
> > +				run_buf[1] = p[1];
> > +				fallthrough;
> > +			case 1:
> > +				run_buf[0] = p[0];
> > +#endif
> > +			}
> 
> Why is this not its own function? We use this in something like 5 places.
> Also, isn't little-endian just a memcpy()?
> 

You are right, for little-endian this is just a memcpy(), but if it isn't
inlined, we will have the overhead of calling the function.

> > +
> > +			run_buf += size_size;
> > +			p = (u8 *)&dlcn;
> 
> I think that when we have a function for that switch, the temporary p is
> not needed anymore.
> 

Yes, but I don't think this gives more readability or simplification.
More of a personal preference.

> > +
> > +			/* Pack the offset from previous lcn */
> > +			switch (offset_size) {
> > +#ifdef __BIG_ENDIAN
> > +			case 8:
> > +				run_buf[7] = p[0];
> > +				fallthrough;
> > +			case 7:
> > +				run_buf[6] = p[1];
> > +				fallthrough;
> > +			case 6:
> > +				run_buf[5] = p[2];
> > +				fallthrough;
> > +			case 5:
> > +				run_buf[4] = p[3];
> > +				fallthrough;
> > +			case 4:
> > +				run_buf[3] = p[4];
> > +				fallthrough;
> > +			case 3:
> > +				run_buf[2] = p[5];
> > +				fallthrough;
> > +			case 2:
> > +				run_buf[1] = p[6];
> > +				fallthrough;
> > +			case 1:
> > +				run_buf[0] = p[7];
> > +#else
> > +			case 8:
> > +				run_buf[7] = p[7];
> > +				fallthrough;
> > +			case 7:
> > +				run_buf[6] = p[6];
> > +				fallthrough;
> > +			case 6:
> > +				run_buf[5] = p[5];
> > +				fallthrough;
> > +			case 5:
> > +				run_buf[4] = p[4];
> > +				fallthrough;
> > +			case 4:
> > +				run_buf[3] = p[3];
> > +				fallthrough;
> > +			case 3:
> > +				run_buf[2] = p[2];
> > +				fallthrough;
> > +			case 2:
> > +				run_buf[1] = p[1];
> > +				fallthrough;
> > +			case 1:
> > +				run_buf[0] = p[0];
> > +#endif
> > +			}
> 
> > +int run_get_highest_vcn(CLST vcn, const u8 *run_buf, u64 *highest_vcn)
> > +{
> 
> > +		/* skip size_size */
> > +		run += size_size;
> > +
> > +		if (!len)
> > +			goto error;
> > +
> > +		run += offset_size;
> 
> Can this simply be
> run += size_size + offset_size;
> 

Done.

> > +
> > +#ifdef NTFS3_64BIT_CLUSTER
> > +		if ((vcn >> 32) || (len >> 32))
> > +			goto error;
> > +#endif
> > +		vcn64 += len;
> > +	}
> > +
> > +	*highest_vcn = vcn64 - 1;
> > +	return 0;
> > +}
Kari Argillander Jan. 18, 2021, 2:24 p.m. UTC | #5
On Mon, Jan 18, 2021 at 10:00:53AM +0000, Konstantin Komarov wrote:
> From: Kari Argillander <kari.argillander@gmail.com>
> Sent: Monday, January 4, 2021 12:58 AM
> > On Thu, Dec 31, 2020 at 06:23:55PM +0300, Konstantin Komarov wrote:
 
> > > +static long ntfs_fallocate(struct file *file, int mode, loff_t vbo, loff_t len)
> > > +{

> > > +	/* Return error if mode is not supported */
> > > +	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
> > > +		     FALLOC_FL_COLLAPSE_RANGE))
> > > +		return -EOPNOTSUPP;

> > > +
> > > +	if (mode & FALLOC_FL_PUNCH_HOLE) {

> > > +	} else if (mode & FALLOC_FL_COLLAPSE_RANGE) {

> > > +	} else {

> > > +		if (mode & FALLOC_FL_KEEP_SIZE) {
> > 
> > Isn't this hole else already (mode & FALLOC_FL_KEEP_SIZE?
> 
> Sorry, can you please clarify your question? Not sure, understood it.

I have hidden the irrelevant code now, so maybe you can see better what I
mean. The last else already has to be FALLOC_FL_KEEP_SIZE, so the if
statement is not needed.
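
Condensed, the branch structure under discussion is (the per-branch
comments are my reading of the quoted code):

/* Only these three flags survive the check at the top. */
if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
	     FALLOC_FL_COLLAPSE_RANGE))
	return -EOPNOTSUPP;

if (mode & FALLOC_FL_PUNCH_HOLE) {
	/* ... */
} else if (mode & FALLOC_FL_COLLAPSE_RANGE) {
	/* ... */
} else {
	/* plain allocation: mode is 0 or FALLOC_FL_KEEP_SIZE here */
}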