generic/453: Exclude filenames that are not supported by exfat

Message ID 20210425223105.1855098-1-shreeya.patel@collabora.com
State New
Series generic/453: Exclude filenames that are not supported by exfat

Commit Message

Shreeya Patel April 25, 2021, 10:31 p.m. UTC
exFAT filesystem does not support the following character codes
0x0000 - 0x001F ( Control Codes ), /, ?, :, ", \, *, <, |, >

Hence, exclude the filenames that create FAKESLASH and BOX, since they
use character codes that are not supported by exfat.

The filename creating the BOX uses the control code '\x0a' (line feed),
which is restricted by exfat.

Signed-off-by: Shreeya Patel <shreeya.patel@collabora.com>
---
 tests/generic/453 | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)
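As a rough illustration of the byte-level rule stated in the commit message, the sketch below checks a name against the listed set (control codes 0x01-0x1F plus / ? : " \ * < | >; NUL cannot occur in a shell string at all). The function name exfat_name_ok is hypothetical and not part of fstests, and it deliberately models only the listed characters, not any UTF-8 validation the exfat driver may perform.

```bash
#!/bin/bash
# Hypothetical helper (not part of fstests): succeed if a name avoids every
# character the commit message lists as unsupported by exFAT.
# Assumption: only the listed bytes are checked; UTF-8 validity is not.
exfat_name_ok()
{
	local stripped
	# tr -d drops control codes 0x01-0x1F and / ? : " \ * < | > byte by byte
	stripped=$(printf '%s' "$1" | LC_ALL=C tr -d '\001-\037/?:"\\*<|>')
	# if any byte was removed, the name used a listed character
	[ "$stripped" = "$1" ]
}
```

Under this check, exfat_name_ok $'urk\xc0\xafmoo' succeeds, because neither 0xc0 nor 0xaf is in the listed set, while the line-draw name fails on its embedded 0x0a bytes.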

Comments

Matthew Wilcox April 26, 2021, 12:34 a.m. UTC | #1
On Mon, Apr 26, 2021 at 04:01:05AM +0530, Shreeya Patel wrote:
> exFAT filesystem does not support the following character codes
> 0x0000 - 0x001F ( Control Codes ), /, ?, :, ", \, *, <, |, >

ummm ...

> -# Fake slash?
> -setf "urk\xc0\xafmoo" "FAKESLASH"

That doesn't use any of the explained banned characters.  It uses 0xc0,
0xaf.

Now, in utf-8, that's a nonconforming sequence.  "The Unicode and UCS
standards require that producers of UTF-8 shall use the shortest form
possible, for example, producing a two-byte sequence with first byte 0xc0
is nonconforming.  Unicode 3.1 has added the requirement that conforming
programs must not accept non-shortest forms in their input."

So is it that exfat is rejecting nonconforming sequences?  Or is it
converting the nonconforming sequence from 0xc0 0xaf to the conforming
sequence 0x2f, and then rejecting it (because it's '/')?
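The arithmetic behind this question can be sketched in a few lines of shell: a decoder that ignores the shortest-form rule turns 0xc0 0xaf into 0x2f ('/'), while a conforming decoder must reject any two-byte sequence whose lead byte is 0xc0 or 0xc1, since those can only encode values below 0x80. Both function names are illustrative only.

```bash
#!/bin/bash
# Illustrative sketch: decode a two-byte UTF-8 sequence naively, ignoring
# the shortest-form requirement.
naive_utf8_2byte()
{
	local b1=$(( $1 )) b2=$(( $2 ))
	# lead byte 110xxxxx carries 5 payload bits, continuation 10xxxxxx carries 6
	printf '0x%02x\n' $(( ((b1 & 0x1f) << 6) | (b2 & 0x3f) ))
}

# A conforming decoder rejects these: lead bytes 0xc0/0xc1 can only produce
# code points below 0x80, which must be encoded as a single byte.
is_overlong_2byte()
{
	local b1=$(( $1 ))
	(( b1 == 0xc0 || b1 == 0xc1 ))
}
```

For example, naive_utf8_2byte 0xc0 0xaf prints 0x2f, and naive_utf8_2byte 0xc3 0xa9 prints 0xe9 (a legitimate "é").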
Shreeya Patel April 26, 2021, 11:57 a.m. UTC | #2
On 26/04/21 6:04 am, Matthew Wilcox wrote:
> On Mon, Apr 26, 2021 at 04:01:05AM +0530, Shreeya Patel wrote:
>> exFAT filesystem does not support the following character codes
>> 0x0000 - 0x001F ( Control Codes ), /, ?, :, ", \, *, <, |, >
> ummm ...
>
>> -# Fake slash?
>> -setf "urk\xc0\xafmoo" "FAKESLASH"
> That doesn't use any of the explained banned characters.  It uses 0xc0,
> 0xaf.
>
> Now, in utf-8, that's a nonconforming sequence.  "The Unicode and UCS
> standards require that producers of UTF-8 shall use the shortest form
> possible, for example, producing a two-byte sequence with first byte 0xc0
> is nonconforming.  Unicode 3.1 has added the requirement that conforming
> programs must not accept non-shortest forms in their input."
>
> So is it that exfat is rejecting nonconforming sequences?  Or is it
> converting the nonconforming sequence from 0xc0 0xaf to the conforming
> sequence 0x2f, and then rejecting it (because it's '/')?
>

No, I don't think exfat is not converting the nonconforming sequence from
0xc0 0xaf to the conforming sequence 0x2f, because I get different results
for the two forms. When I create a file named "urk\xc0\xafmoo", I get
"Operation not permitted", and when I create it as "urk\x2fmoo", I get
"No such file or directory" (or you can think of this error as "Invalid
argument", because that's what I get when I try other characters like |,
:, ?, etc.).

The Box filename also fails with an "Invalid argument" error.
Shreeya Patel April 26, 2021, 12:03 p.m. UTC | #3
On 26/04/21 5:27 pm, Shreeya Patel wrote:
>
> On 26/04/21 6:04 am, Matthew Wilcox wrote:
>> On Mon, Apr 26, 2021 at 04:01:05AM +0530, Shreeya Patel wrote:
>>> exFAT filesystem does not support the following character codes
>>> 0x0000 - 0x001F ( Control Codes ), /, ?, :, ", \, *, <, |, >
>> ummm ...
>>
>>> -# Fake slash?
>>> -setf "urk\xc0\xafmoo" "FAKESLASH"
>> That doesn't use any of the explained banned characters.  It uses 0xc0,
>> 0xaf.
>>
>> Now, in utf-8, that's a nonconforming sequence.  "The Unicode and UCS
>> standards require that producers of UTF-8 shall use the shortest form
>> possible, for example, producing a two-byte sequence with first byte 
>> 0xc0
>> is nonconforming.  Unicode 3.1 has added the requirement that conforming
>> programs must not accept non-shortest forms in their input."
>>
>> So is it that exfat is rejecting nonconforming sequences?  Or is it
>> converting the nonconforming sequence from 0xc0 0xaf to the conforming
>> sequence 0x2f, and then rejecting it (because it's '/')?
>>
>
> No, I don't think exfat is not converting nonconforming sequence from 
> 0xc0 0xaf
> to the conforming sequence 0x2f.


Sorry, I meant "I don't think exfat is converting the nonconforming
sequence from 0xc0 0xaf to the conforming sequence 0x2f" here.


> Because I get different outputs when tried with both ways.
> When I create a file with "urk\xc0\xafmoo", I get output as "Operation 
> not permitted"
> and when I create it as "urk\x2fmoo", it gives "No such file or 
> directory error" or
> you can consider this error as "Invalid argument"
> ( because that's what I get when I try for other characters like |, :, 
> ?, etc )
>
> Box filename also fails with "Invalid argument" error.
>
>
Matthew Wilcox April 26, 2021, 12:37 p.m. UTC | #4
On Mon, Apr 26, 2021 at 05:27:51PM +0530, Shreeya Patel wrote:
> On 26/04/21 6:04 am, Matthew Wilcox wrote:
> > On Mon, Apr 26, 2021 at 04:01:05AM +0530, Shreeya Patel wrote:
> > > exFAT filesystem does not support the following character codes
> > > 0x0000 - 0x001F ( Control Codes ), /, ?, :, ", \, *, <, |, >
> > ummm ...
> > 
> > > -# Fake slash?
> > > -setf "urk\xc0\xafmoo" "FAKESLASH"
> > That doesn't use any of the explained banned characters.  It uses 0xc0,
> > 0xaf.
> > 
> > Now, in utf-8, that's a nonconforming sequence.  "The Unicode and UCS
> > standards require that producers of UTF-8 shall use the shortest form
> > possible, for example, producing a two-byte sequence with first byte 0xc0
> > is nonconforming.  Unicode 3.1 has added the requirement that conforming
> > programs must not accept non-shortest forms in their input."
> > 
> > So is it that exfat is rejecting nonconforming sequences?  Or is it
> > converting the nonconforming sequence from 0xc0 0xaf to the conforming
> > sequence 0x2f, and then rejecting it (because it's '/')?
> > 
> 
> No, I don't think exfat is not converting nonconforming sequence from 0xc0
> 0xaf
> to the conforming sequence 0x2f.
> Because I get different outputs when tried with both ways.
> When I create a file with "urk\xc0\xafmoo", I get output as "Operation not
> permitted"
> and when I create it as "urk\x2fmoo", it gives "No such file or directory
> error" or
> you can consider this error as "Invalid argument"
> ( because that's what I get when I try for other characters like |, :, ?,
> etc )

I think we need to understand this before skipping the test.  Does it
also fail, eg, on cifs, vfat, jfs or udf?

> Box filename also fails with "Invalid argument" error.
> 
>
Shreeya Patel April 27, 2021, 11:13 a.m. UTC | #5
On 26/04/21 6:07 pm, Matthew Wilcox wrote:
> On Mon, Apr 26, 2021 at 05:27:51PM +0530, Shreeya Patel wrote:
>> On 26/04/21 6:04 am, Matthew Wilcox wrote:
>>> On Mon, Apr 26, 2021 at 04:01:05AM +0530, Shreeya Patel wrote:
>>>> exFAT filesystem does not support the following character codes
>>>> 0x0000 - 0x001F ( Control Codes ), /, ?, :, ", \, *, <, |, >
>>> ummm ...
>>>
>>>> -# Fake slash?
>>>> -setf "urk\xc0\xafmoo" "FAKESLASH"
>>> That doesn't use any of the explained banned characters.  It uses 0xc0,
>>> 0xaf.
>>>
>>> Now, in utf-8, that's a nonconforming sequence.  "The Unicode and UCS
>>> standards require that producers of UTF-8 shall use the shortest form
>>> possible, for example, producing a two-byte sequence with first byte 0xc0
>>> is nonconforming.  Unicode 3.1 has added the requirement that conforming
>>> programs must not accept non-shortest forms in their input."
>>>
>>> So is it that exfat is rejecting nonconforming sequences?  Or is it
>>> converting the nonconforming sequence from 0xc0 0xaf to the conforming
>>> sequence 0x2f, and then rejecting it (because it's '/')?
>>>
>> No, I don't think exfat is not converting nonconforming sequence from 0xc0
>> 0xaf
>> to the conforming sequence 0x2f.
>> Because I get different outputs when tried with both ways.
>> When I create a file with "urk\xc0\xafmoo", I get output as "Operation not
>> permitted"
>> and when I create it as "urk\x2fmoo", it gives "No such file or directory
>> error" or
>> you can consider this error as "Invalid argument"
>> ( because that's what I get when I try for other characters like |, :, ?,
>> etc )
> I think we need to understand this before skipping the test.  Does it
> also fail, eg, on cifs, vfat, jfs or udf?


I tested it on VFAT, UDF and JFS, and the results are below.


1. VFAT (per Wikipedia, 0x00-0x1F, 0x7F, " * / : < > ? \ | are reserved
characters)

For \x2f - /var/mnt/scratch/test-453/urk/moo.txt: No such file or directory

For \xc0\xaf - /var/mnt/scratch/test-453/urk��moo.txt: Invalid argument

It also gives an error for the Box filename.

(This is very similar to exfat; the only difference is that with \xc0\xaf
I get "Invalid argument" instead of "Operation not permitted".)


2. UDF (per Wikipedia, only NULL cannot be used)

For \x2f - /var/mnt/scratch/test-453/urk/moo.txt: No such file or directory

For \xc0\xaf - it creates a filename like 'urk??moo.txt' and does not
throw any error. (But this seems invalid and should have thrown an error.)

It also gives an error for the dotdot entry.

I was not sure why UDF gives errors for / and the dotdot entry, but then
I read the following in one of the UDF man pages, which I think explains
the above errors:

"Invalid characters such as "NULL" and "/" and  invalid  file
names  such  as "." and ".." will be translated according to
the following rule:

Replace the invalid character with an "_," then  append  the
file name with # followed by a 4 digit hex representation of
the 16-bit CRC of the original FileIdentifier. For  example,
the file name ".." will become "__#4C05" "

Source - http://www-it.desy.de/cgi-bin/man-cgi?udfs+7


3. JFS (per Wikipedia, NULL cannot be used)

For \x2f - /var/mnt/scratch/test-453/urk/moo.txt: No such file or directory

For \xc0\xaf - Works fine

Again, I am not sure why / fails here. I did not find many resources about
the restricted filenames for JFS.


So, going by all the results above, \x2f fails on all of them, but
\xc0\xaf does work on JFS.
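To make the comparison above repeatable on each filesystem, a hypothetical probe (not part of fstests) can try both spellings in an empty scratch directory and print the error text reported for each. It assumes GNU touch, whose failure messages end in ": <strerror text>" under LC_ALL=C.

```bash
#!/bin/bash
# Hypothetical probe: try the slash and fake-slash names under an empty
# directory and print whether each was created or which error came back.
probe_slash_names()
{
	local dir=$1 name err
	for name in $'urk\x2fmoo.txt' $'urk\xc0\xafmoo.txt'; do
		if err=$(LC_ALL=C touch -- "$dir/$name" 2>&1); then
			printf '%q: created\n' "$name"
		else
			# keep only the strerror() text after the last ": "
			printf '%q: %s\n' "$name" "${err##*: }"
		fi
	done
}
```

Running probe_slash_names "$SCRATCH_MNT/test-453" on vfat, udf, jfs and exfat would print one line per name, which makes the differences listed above easy to diff.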


>
>> Box filename also fails with "Invalid argument" error.
>>
>>
Darrick J. Wong April 27, 2021, 6:11 p.m. UTC | #6
On Tue, Apr 27, 2021 at 04:43:05PM +0530, Shreeya Patel wrote:
> 
> On 26/04/21 6:07 pm, Matthew Wilcox wrote:
> > On Mon, Apr 26, 2021 at 05:27:51PM +0530, Shreeya Patel wrote:
> > > On 26/04/21 6:04 am, Matthew Wilcox wrote:
> > > > On Mon, Apr 26, 2021 at 04:01:05AM +0530, Shreeya Patel wrote:
> > > > > exFAT filesystem does not support the following character codes
> > > > > 0x0000 - 0x001F ( Control Codes ), /, ?, :, ", \, *, <, |, >
> > > > ummm ...
> > > > 
> > > > > -# Fake slash?
> > > > > -setf "urk\xc0\xafmoo" "FAKESLASH"
> > > > That doesn't use any of the explained banned characters.  It uses 0xc0,
> > > > 0xaf.
> > > > 
> > > > Now, in utf-8, that's a nonconforming sequence.  "The Unicode and UCS
> > > > standards require that producers of UTF-8 shall use the shortest form
> > > > possible, for example, producing a two-byte sequence with first byte 0xc0
> > > > is nonconforming.  Unicode 3.1 has added the requirement that conforming
> > > > programs must not accept non-shortest forms in their input."
> > > > 
> > > > So is it that exfat is rejecting nonconforming sequences?  Or is it
> > > > converting the nonconforming sequence from 0xc0 0xaf to the conforming
> > > > sequence 0x2f, and then rejecting it (because it's '/')?
> > > > 
> > > No, I don't think exfat is not converting nonconforming sequence from 0xc0
> > > 0xaf
> > > to the conforming sequence 0x2f.
> > > Because I get different outputs when tried with both ways.
> > > When I create a file with "urk\xc0\xafmoo", I get output as "Operation not
> > > permitted"
> > > and when I create it as "urk\x2fmoo", it gives "No such file or directory
> > > error" or
> > > you can consider this error as "Invalid argument"
> > > ( because that's what I get when I try for other characters like |, :, ?,
> > > etc )
> > I think we need to understand this before skipping the test.  Does it
> > also fail, eg, on cifs, vfat, jfs or udf?
> 
> 
> I tested it for VFAT, UDF and JFS and following are the results.
> 
> 
> 1. VFAT ( as per wikipedia 0x00-0x1F 0x7F " * / : < > ? \ | are reserved
> characters)
> 
> For \x2f - /var/mnt/scratch/test-453/urk/moo.txt: No such file or directory
> 
> For \xc0\xaf) - /var/mnt/scratch/test-453/urk��moo.txt: Invalid argument
> 
> Also gives error for Box filename
> 
> ( this is very much similar to exfat, the only difference is that I do not
> get Operation not permitted when
> using \xc0\xaf, instead it gives invalid argument.)

vfat checks for those invalid characters, see msdos_format_name() and
vfat_is_used_badchars().

TBH I think these tests (g/453 and g/454) are probably only useful for
filesystems that allow unrestricted byte streams for names.

> 2. UDF ( as per wikipedia - only NULL cannot be used )
> 
> For \x2f - /var/mnt/scratch/test-453/urk/moo.txt: No such file or directory
> 
> For \xc0\xaf - creates filename something like this 'urk??moo.txt' and does
> not throw any error.
> ( But this seems to be invalid and should have thrown some error)
> 
> Also gives error for dotdot entry.
> 
> I am not sure why UDF was giving error for / and dot dot entry but then
> I read the following for UDF in one of the man pages which justifies the
> above errors I think
> 
> "Invalid characters such as "NULL" and "/" and  invalid  file
> names  such  as "." and ".." will be translated according to
> the following rule:
> 
> Replace the invalid character with an "_," then  append  the
> file name with # followed by a 4 digit hex representation of
> the 16-bit CRC of the original FileIdentifier. For  example,
> the file name ".." will become "__#4C05" "
> 
> Source - http://www-it.desy.de/cgi-bin/man-cgi?udfs+7

That's Solaris.

> 3. JFS ( as per Wikipedia NULL cannot be used )
> 
> For \x2f - /var/mnt/scratch/test-453/urk/moo.txt: No such file or directory
> 
> For \xc0\xaf - Works fine
> 
> Again not sure why / is failing here. Did not find much resource about the
> restricted filenames for JFS.

"/" is a path separator, it should always return ENOENT (unless you
created $SCRATCH_MNT/test-453/urk/moo.txt).  0x2f is the ascii encoding
for a slash.

> So as per above all the results, it seems like using \x2f fails for all but
> \xc0\xaf does work for JFS.

<nod>

--D

> 
> 
> > 
> > > Box filename also fails with "Invalid argument" error.
> > > 
> > >
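Darrick's point that 0x2f is simply '/' can be checked directly: the escaped spelling yields exactly the bytes of "urk/moo.txt", so creation fails only until a directory named urk exists. The function below is an illustrative sketch, not part of fstests.

```bash
#!/bin/bash
# Illustrative sketch: "urk\x2fmoo.txt" is byte-for-byte "urk/moo.txt",
# so path lookup treats the 0x2f as a separator, never as part of a name.
demo_slash_is_separator()
{
	local dir=$1 name=$'urk\x2fmoo.txt'

	[ "$name" = "urk/moo.txt" ] || return 1

	# no "urk" directory yet, so this must fail (ENOENT)
	if touch -- "$dir/$name" 2>/dev/null; then
		return 1
	fi

	# once "urk" exists, the same string is an ordinary two-component path
	mkdir -- "$dir/urk" || return 1
	touch -- "$dir/$name" || return 1
	[ -f "$dir/urk/moo.txt" ]
}
```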
Shreeya Patel April 27, 2021, 9 p.m. UTC | #7
On 27/04/21 11:41 pm, Darrick J. Wong wrote:
> On Tue, Apr 27, 2021 at 04:43:05PM +0530, Shreeya Patel wrote:
>> On 26/04/21 6:07 pm, Matthew Wilcox wrote:
>>> On Mon, Apr 26, 2021 at 05:27:51PM +0530, Shreeya Patel wrote:
>>>> On 26/04/21 6:04 am, Matthew Wilcox wrote:
>>>>> On Mon, Apr 26, 2021 at 04:01:05AM +0530, Shreeya Patel wrote:
>>>>>> exFAT filesystem does not support the following character codes
>>>>>> 0x0000 - 0x001F ( Control Codes ), /, ?, :, ", \, *, <, |, >
>>>>> ummm ...
>>>>>
>>>>>> -# Fake slash?
>>>>>> -setf "urk\xc0\xafmoo" "FAKESLASH"
>>>>> That doesn't use any of the explained banned characters.  It uses 0xc0,
>>>>> 0xaf.
>>>>>
>>>>> Now, in utf-8, that's a nonconforming sequence.  "The Unicode and UCS
>>>>> standards require that producers of UTF-8 shall use the shortest form
>>>>> possible, for example, producing a two-byte sequence with first byte 0xc0
>>>>> is nonconforming.  Unicode 3.1 has added the requirement that conforming
>>>>> programs must not accept non-shortest forms in their input."
>>>>>
>>>>> So is it that exfat is rejecting nonconforming sequences?  Or is it
>>>>> converting the nonconforming sequence from 0xc0 0xaf to the conforming
>>>>> sequence 0x2f, and then rejecting it (because it's '/')?
>>>>>
>>>> No, I don't think exfat is not converting nonconforming sequence from 0xc0
>>>> 0xaf
>>>> to the conforming sequence 0x2f.
>>>> Because I get different outputs when tried with both ways.
>>>> When I create a file with "urk\xc0\xafmoo", I get output as "Operation not
>>>> permitted"
>>>> and when I create it as "urk\x2fmoo", it gives "No such file or directory
>>>> error" or
>>>> you can consider this error as "Invalid argument"
>>>> ( because that's what I get when I try for other characters like |, :, ?,
>>>> etc )
>>> I think we need to understand this before skipping the test.  Does it
>>> also fail, eg, on cifs, vfat, jfs or udf?
>>
>> I tested it for VFAT, UDF and JFS and following are the results.
>>
>>
>> 1. VFAT ( as per wikipedia 0x00-0x1F 0x7F " * / : < > ? \ | are reserved
>> characters)
>>
>> For \x2f - /var/mnt/scratch/test-453/urk/moo.txt: No such file or directory
>>
>> For \xc0\xaf) - /var/mnt/scratch/test-453/urk��moo.txt: Invalid argument
>>
>> Also gives error for Box filename
>>
>> ( this is very much similar to exfat, the only difference is that I do not
>> get Operation not permitted when
>> using \xc0\xaf, instead it gives invalid argument.)
> vfat checks for those invalid characters, see msdos_format_name() and
> vfat_is_used_badchars().
>
> TBH I think these tests (g/453 and g/454) are probably only useful for
> filesystems that allow unrestricted byte streams for names.


So does that mean I should just not run this test on any filesystem that
has some restricted characters? But what about the other filenames that
work fine? Don't we want to test them?


>> 2. UDF ( as per wikipedia - only NULL cannot be used )
>>
>> For \x2f - /var/mnt/scratch/test-453/urk/moo.txt: No such file or directory
>>
>> For \xc0\xaf - creates filename something like this 'urk??moo.txt' and does
>> not throw any error.
>> ( But this seems to be invalid and should have thrown some error)
>>
>> Also gives error for dotdot entry.
>>
>> I am not sure why UDF was giving error for / and dot dot entry but then
>> I read the following for UDF in one of the man pages which justifies the
>> above errors I think
>>
>> "Invalid characters such as "NULL" and "/" and  invalid  file
>> names  such  as "." and ".." will be translated according to
>> the following rule:
>>
>> Replace the invalid character with an "_," then  append  the
>> file name with # followed by a 4 digit hex representation of
>> the 16-bit CRC of the original FileIdentifier. For  example,
>> the file name ".." will become "__#4C05" "
>>
>> Source - http://www-it.desy.de/cgi-bin/man-cgi?udfs+7
> That's Solaris.
Sorry, I missed that.
>
>> 3. JFS ( as per Wikipedia NULL cannot be used )
>>
>> For \x2f - /var/mnt/scratch/test-453/urk/moo.txt: No such file or directory
>>
>> For \xc0\xaf - Works fine
>>
>> Again not sure why / is failing here. Did not find much resource about the
>> restricted filenames for JFS.
> "/" is a path separator, it should always return ENOENT (unless you
> created $SCRATCH_MNT/test-453/urk/moo.txt).  0x2f is the ascii encoding
> for a slash.


Hmm, that makes sense. Maybe that is why the test uses \xc0\xaf instead of \x2f.


>
>> So as per above all the results, it seems like using \x2f fails for all but
>> \xc0\xaf does work for JFS.
> <nod>
>
> --D
>
>>
>>>> Box filename also fails with "Invalid argument" error.
>>>>
>>>>
Theodore Ts'o April 28, 2021, 1:50 p.m. UTC | #8
On Tue, Apr 27, 2021 at 11:11:16AM -0700, Darrick J. Wong wrote:
> 
> TBH I think these tests (g/453 and g/454) are probably only useful for
> filesystems that allow unrestricted byte streams for names.

I'm actually a little puzzled about why these tests should exist:

# Create a directory with multiple filenames that all appear the same
# (in unicode, anyway) but point to different inodes.  In theory all
# Linux filesystems should allow this (filenames are a sequence of
# arbitrary bytes) even if the user implications are horrifying.

Why do we care about testing this?  The assertion "In theory all
Linux filesystems should allow this" is clearly not true --- if you
enable unicode support for ext4 or f2fs, this will no longer be true,
and this is considered by some a _feature_ not a bug --- precisely
_because_ the user implications are horrifying.

So why do these tests exist?  Darrick, I see you added them in 2017
to test whether or not xfs_scrub will warn about confusable names, if
_check_xfs_scrub_does_unicode is true.  So we already understand that
it's possible for a file system checker to complain that these file
names are bad.

It's not at all clear to me that asserting that all Linux file systems
_must_ treat file names as "bag of bits" and not apply any kind of
unicode normalization or strict unicode validation is a valid thing to
test for in 2021.

					- Ted
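The situation the test comment describes can be reproduced with two spellings of "é": precomposed U+00E9 (bytes c3 a9) and "e" followed by combining U+0301 (bytes cc 81). The sketch below, whose function name is illustrative, counts the resulting directory entries. A filesystem that treats names as bytes should report 2, while a directory with Unicode normalization (such as a casefolded ext4 or f2fs directory) would be expected to collapse them into 1.

```bash
#!/bin/bash
# Illustrative sketch: create NFC and NFD spellings of "café.txt" and
# report how many distinct directory entries the filesystem kept.
count_cafe_entries()
{
	local dir=$1
	local nfc=$'caf\xc3\xa9.txt'     # U+00E9 LATIN SMALL LETTER E WITH ACUTE
	local nfd=$'cafe\xcc\x81.txt'    # "e" + U+0301 COMBINING ACUTE ACCENT
	local -a entries

	touch -- "$dir/$nfc" "$dir/$nfd" || return 1
	# use a glob rather than parsing ls output
	entries=("$dir"/caf*.txt)
	echo "${#entries[@]}"
}
```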
Darrick J. Wong April 29, 2021, 12:37 a.m. UTC | #9
On Wed, Apr 28, 2021 at 09:50:56AM -0400, Theodore Ts'o wrote:
> On Tue, Apr 27, 2021 at 11:11:16AM -0700, Darrick J. Wong wrote:
> > 
> > TBH I think these tests (g/453 and g/454) are probably only useful for
> > filesystems that allow unrestricted byte streams for names.
> 
> I'm actually a little puzzled about why these tests should exist:
> 
> # Create a directory with multiple filenames that all appear the same
> # (in unicode, anyway) but point to different inodes.  In theory all
> # Linux filesystems should allow this (filenames are a sequence of
> # arbitrary bytes) even if the user implications are horrifying.
> 
> Why do we care about testing this?  The assertion "In theory all
> Linux filesystems should allow this" is clearly not true --- if you
> enable unicode support for ext4 or f2fs, this will no longer be true,
> and this is considered by some a _feature_ not a bug --- precisely
> _because_ the user implications are horrifying.
> 
> So why do these tests exist?  Darrick, I see you added them in 2017
> to test whether or not xfs_scrub will warn about confusable names, if
> _check_xfs_scrub_does_unicode is true.  So we already understand that
> it's possible for a file system checker to complain that these file
> names are bad.

Yes, that's exactly why this test (and generic/454) was created -- as a
functional test for xfs_scrub's unicode checking.

> It's not at all clear to me that asserting that all Linux file systems
> _must_ treat file names as "bag of bits" and not apply any kind of
> unicode normalization or strict unicode validation is a valid thing to
> test for in 2021.

Perhaps not.  These two tests do have the interesting side effect of
catching filesystems that don't hew to the "names are bytestreams"
philosophy.  In 2017, fstests usage seemed like it pretty narrowly
included only the big three filesystems, so it amuses me to no end that
four years went by before this discussion started. :P

Nowadays with wider testing of other filesystems (thanks, Red Hat!) we
should hide these behind _require_names_are_bytes or move them to
tests/xfs/.

Question -- the unicode case folding doesn't apply to xattr names,
right?

--D

> 
> 					- Ted
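The _require_names_are_bytes suggestion could take roughly the following shape. This is a hypothetical sketch: the probing logic, the helper name names_are_bytes, and the choice of probe name are assumptions; only _notrun, $SCRATCH_MNT and $FSTYP are existing fstests conventions.

```bash
#!/bin/bash
# Hypothetical sketch, not the fstests implementation: succeed only if the
# filesystem under $1 stores a name containing an overlong UTF-8 sequence
# and a control character, and readdir returns exactly those bytes.
names_are_bytes()
{
	local dir=$1 probe=$'probe_\xc0\xaf\x01.txt' sub f ret=1

	sub=$(mktemp -d "$dir/names.XXXXXX") || return 1
	if touch -- "$sub/$probe" 2>/dev/null; then
		# compare each readdir result byte-for-byte with what we created
		for f in "$sub"/*; do
			[ "$f" = "$sub/$probe" ] && ret=0
		done
	fi
	rm -rf -- "$sub"
	return $ret
}

# Possible fstests-style wrapper; assumes the scratch fs is already mounted.
_require_names_are_bytes()
{
	names_are_bytes "$SCRATCH_MNT" ||
		_notrun "$FSTYP does not treat filenames as arbitrary bytes"
}
```

Probing the mounted filesystem, rather than matching $FSTYP against a list, would also catch configurations such as casefolded directories on filesystems that otherwise accept arbitrary bytes.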
Gabriel Krisman Bertazi April 29, 2021, 2:32 p.m. UTC | #10
"Darrick J. Wong" <djwong@kernel.org> writes:

> On Wed, Apr 28, 2021 at 09:50:56AM -0400, Theodore Ts'o wrote:
>> On Tue, Apr 27, 2021 at 11:11:16AM -0700, Darrick J. Wong wrote:
>> > 
>> > TBH I think these tests (g/453 and g/454) are probably only useful for
>> > filesystems that allow unrestricted byte streams for names.
>> 
>> I'm actually a little puzzled about why these tests should exist:
>> 
>> # Create a directory with multiple filenames that all appear the same
>> # (in unicode, anyway) but point to different inodes.  In theory all
>> # Linux filesystems should allow this (filenames are a sequence of
>> # arbitrary bytes) even if the user implications are horrifying.
>> 
>> Why do we care about testing this?  The assertion "In theory all
>> Linux filesystems should allow this" is clearly not true --- if you
>> enable unicode support for ext4 or f2fs, this will no longer be true,
>> and this is considered by some a _feature_ not a bug --- precisely
>> _because_ the user implications are horrifying.
>> 
>> So why do these tests exist?  Darrick, I see you added them in 2017
>> to test whether or not xfs_scrub will warn about confusable names, if
>> _check_xfs_scrub_does_unicode is true.  So we already understand that
>> it's possible for a file system checker to complain that these file
>> names are bad.
>
> Yes, that's exactly why this test (and generic/454) was created -- as a
> functional test for xfs_scrub's unicode checking.
>
>> It's not at all clear to me that asserting that all Linux file systems
>> _must_ treat file names as "bag of bits" and not apply any kind of
>> unicode normalization or strict unicode validation is a valid thing to
>> test for in 2021.
>
> Perhaps not.  These two tests do have the interesting side effect of
> catching filesystems that don't hew to the "names are bytestreams"
> philosophy.  In 2017, fstests usage seemed like it pretty narrowly
> included only the big three filesystems, so it amuses me to no end that
> four years went by before this discussion started. :P
>
> Nowadays with wider testing of other filesystems (thanks, Red Hat!) we
> should hide these behind _require_names_are_bytes or move them to
> tests/xfs/.
>
> Question -- the unicode case folding doesn't apply to xattr names,
> right?

No, it doesn't apply to xattr names in ext4 and f2fs.

Patch

diff --git a/tests/generic/453 b/tests/generic/453
index d997736c..7fc73b4c 100755
--- a/tests/generic/453
+++ b/tests/generic/453
@@ -115,15 +115,9 @@  setf "greek_\xce\xa5\xcc\x81.txt" "GREEK UPSILON WITH ACUTE AND HOOK SYMBOL, NFK
 setf "arabic_\xef\xb7\xba.txt" "ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM, NFC"
 setf "arabic_\xd8\xb5\xd9\x84\xd9\x89\x20\xd8\xa7\xd9\x84\xd9\x84\xd9\x87\x20\xd8\xb9\xd9\x84\xd9\x8a\xd9\x87\x20\xd9\x88\xd8\xb3\xd9\x84\xd9\x85.txt" "ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM, NFKC"
 
-# Fake slash?
-setf "urk\xc0\xafmoo" "FAKESLASH"
-
 # Emoji: octopus butterfly owl giraffe
 setf "emoji_\xf0\x9f\xa6\x91\xf0\x9f\xa6\x8b\xf0\x9f\xa6\x89\xf0\x9f\xa6\x92.txt" "octopus butterfly owl giraffe emoji"
 
-# Line draw characters, because why not?
-setf "\x6c\x69\x6e\x65\x64\x72\x61\x77\x5f\x0a\xe2\x95\x94\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x97\x0a\xe2\x95\x91\x20\x6d\x65\x74\x61\x74\x61\x62\x6c\x65\x20\xe2\x95\x91\x0a\xe2\x95\x9f\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x95\xa2\x0a\xe2\x95\x91\x20\x5f\x5f\x69\x6e\x64\x65\x78\x20\x20\x20\xe2\x95\x91\x0a\xe2\x95\x9a\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x9d\x0a.txt" "ugly box because we can"
-
 # unicode rtl widgets too...
 setf "moo\xe2\x80\xaegnp.txt" "Well say hello,"
 setf "mootxt.png" "Harvey"
@@ -155,6 +149,16 @@  setf "zerojoin_moo\xe2\x80\x8dcow.txt" "zero width joiners"
 setf "combmark_\xe1\x80\x9c\xe1\x80\xad\xe1\x80\xaf.txt" "combining marks"
 setf "combmark_\xe1\x80\x9c\xe1\x80\xaf\xe1\x80\xad.txt" "combining marks"
 
+if [ "$FSTYP" != "exfat" ]; then
+
+	# Fake slash?
+	setf "urk\xc0\xafmoo" "FAKESLASH"
+
+	# Line draw characters, because why not?
+	setf "\x6c\x69\x6e\x65\x64\x72\x61\x77\x5f\x0a\xe2\x95\x94\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x97\x0a\xe2\x95\x91\x20\x6d\x65\x74\x61\x74\x61\x62\x6c\x65\x20\xe2\x95\x91\x0a\xe2\x95\x9f\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x95\xa2\x0a\xe2\x95\x91\x20\x5f\x5f\x69\x6e\x64\x65\x78\x20\x20\x20\xe2\x95\x91\x0a\xe2\x95\x9a\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x9d\x0a.txt" "ugly box because we can"
+
+fi
+
 # fake dotdot entry
 setd ".\xe2\x80\x8d" "zero width joiners in dot entry"
 setd "..\xe2\x80\x8d" "zero width joiners in dotdot entry"
@@ -176,12 +180,8 @@  testf "greek_\xce\xa5\xcc\x81.txt" "GREEK UPSILON WITH ACUTE AND HOOK SYMBOL, NF
 testf "arabic_\xef\xb7\xba.txt" "ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM, NFC"
 testf "arabic_\xd8\xb5\xd9\x84\xd9\x89\x20\xd8\xa7\xd9\x84\xd9\x84\xd9\x87\x20\xd8\xb9\xd9\x84\xd9\x8a\xd9\x87\x20\xd9\x88\xd8\xb3\xd9\x84\xd9\x85.txt" "ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM, NFKC"
 
-testf "urk\xc0\xafmoo" "FAKESLASH"
-
 testf "emoji_\xf0\x9f\xa6\x91\xf0\x9f\xa6\x8b\xf0\x9f\xa6\x89\xf0\x9f\xa6\x92.txt" "octopus butterfly owl giraffe emoji"
 
-testf "\x6c\x69\x6e\x65\x64\x72\x61\x77\x5f\x0a\xe2\x95\x94\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x97\x0a\xe2\x95\x91\x20\x6d\x65\x74\x61\x74\x61\x62\x6c\x65\x20\xe2\x95\x91\x0a\xe2\x95\x9f\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x95\xa2\x0a\xe2\x95\x91\x20\x5f\x5f\x69\x6e\x64\x65\x78\x20\x20\x20\xe2\x95\x91\x0a\xe2\x95\x9a\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x9d\x0a.txt" "ugly box because we can"
-
 testf "moo\xe2\x80\xaegnp.txt" "Well say hello,"
 testf "mootxt.png" "Harvey"
 
@@ -206,6 +206,14 @@  testf "zerojoin_moo\xe2\x80\x8dcow.txt" "zero width joiners"
 testf "combmark_\xe1\x80\x9c\xe1\x80\xad\xe1\x80\xaf.txt" "combining marks"
 testf "combmark_\xe1\x80\x9c\xe1\x80\xaf\xe1\x80\xad.txt" "combining marks"
 
+if [ "$FSTYP" != "exfat" ]; then
+
+	testf "urk\xc0\xafmoo" "FAKESLASH"
+
+	testf "\x6c\x69\x6e\x65\x64\x72\x61\x77\x5f\x0a\xe2\x95\x94\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x97\x0a\xe2\x95\x91\x20\x6d\x65\x74\x61\x74\x61\x62\x6c\x65\x20\xe2\x95\x91\x0a\xe2\x95\x9f\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x95\xa2\x0a\xe2\x95\x91\x20\x5f\x5f\x69\x6e\x64\x65\x78\x20\x20\x20\xe2\x95\x91\x0a\xe2\x95\x9a\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x90\xe2\x95\x9d\x0a.txt" "ugly box because we can"
+
+fi
+
 testd ".\xe2\x80\x8d" "zero width joiners in dot entry"
 testd "..\xe2\x80\x8d" "zero width joiners in dotdot entry"