[1/4] fs: add SEEK_HOLE and SEEK_DATA flags

Message ID 1309275199-10801-1-git-send-email-josef@redhat.com (mailing list archive)
State Not Applicable, archived

Commit Message

Josef Bacik June 28, 2011, 3:33 p.m. UTC
This just gets us ready to support the SEEK_HOLE and SEEK_DATA flags.  Turns out
using fiemap in things like cp causes more problems than it solves, so let's try
and give userspace an interface that doesn't suck.  We need to match Solaris
here, and the definitions are

*o* If /whence/ is SEEK_HOLE, the offset of the start of the
next hole greater than or equal to the supplied offset
is returned. The definition of a hole is provided near
the end of the DESCRIPTION.

*o* If /whence/ is SEEK_DATA, the file pointer is set to the
start of the next non-hole file region greater than or
equal to the supplied offset.

So in the generic case the entire file is data and there is a virtual hole at
the end.  That means we will just return i_size for SEEK_HOLE and will return
the same offset for SEEK_DATA.  This is how Solaris does it so we have to do it
the same way.
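
To make the generic case concrete, here is a small userspace model of it.
This is only a sketch: generic_seek_model() and the MODEL_* constants are
hypothetical names invented for illustration, not kernel symbols, and the
model only covers non-negative offsets.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical whence values for this model only (not the kernel constants). */
enum model_whence { MODEL_SEEK_DATA, MODEL_SEEK_HOLE };

/*
 * Model of the generic fallback: the whole range [0, i_size) is data and a
 * virtual hole starts at i_size.  Returns the resulting offset, or -ENXIO
 * when the supplied offset is at or past the end of the file.
 */
static long long generic_seek_model(long long i_size, long long offset,
                                    enum model_whence whence)
{
    /* Preconditions of the model: sizes and offsets are non-negative. */
    assert(i_size >= 0);
    assert(offset >= 0);

    if (offset >= i_size)
        return -ENXIO;          /* no data and no further hole to report */

    /* SEEK_DATA: offset is already data.  SEEK_HOLE: next hole is at EOF. */
    return whence == MODEL_SEEK_HOLE ? i_size : offset;
}
```

For a 100-byte file, the model returns 10 for SEEK_DATA at offset 10, 100 for
SEEK_HOLE at offset 10, and -ENXIO for either request at offset 100 or beyond.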

Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
---
 fs/read_write.c    |   44 +++++++++++++++++++++++++++++++++++++++++---
 include/linux/fs.h |    4 +++-
 2 files changed, 44 insertions(+), 4 deletions(-)

Comments

Marco July 29, 2011, 9:58 a.m. UTC | #1
Sorry maybe I'm a bit late? :)

On 28/06/2011 17:33, Josef Bacik wrote:
>
>   loff_t default_llseek(struct file *file, loff_t offset, int origin)
>   {
> +	struct inode *inode = file->f_path.dentry->d_inode;
>   	loff_t retval;
>
> -	mutex_lock(&file->f_dentry->d_inode->i_mutex);
> +	mutex_lock(&inode->i_mutex);
>   	switch (origin) {
>   		case SEEK_END:
> -			offset += i_size_read(file->f_path.dentry->d_inode);
> +			offset += i_size_read(inode);

Here we hold the mutex, so I think we can use i_size directly, without
i_size_read.

Marco
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sunil Mushran Aug. 20, 2011, 3:36 p.m. UTC | #2
On 08/20/2011 03:03 AM, Marco Stornelli wrote:
> On 20/08/2011 11:41, Marco Stornelli wrote:
>> I'm implementing the SEEK_DATA/SEEK_HOLE management for pramfs and I've
>> got some doubts about the right behavior:
>>
>> 1) when we use SEEK_DATA/SEEK_HOLE, the offset used in lseek means
>> always the offset from the start of the file, right?
>>
>> 2) in case of a file with hole at the beginning and data at the end, if
>> I do lseek(fd, 0, SEEK_HOLE) I should receive the end of the file
>> because the idea is to search the *next* hole and we have always a
>> virtual hole at the end of the file, right?
>
> Just to be precise about this question: the alternative here is to
> return the same position because we are already in a hole.

Yes, the offset is from the start of the file.

And yes, same offset is ok. I think the word next should be
dropped from the definition. It is misleading.
Marco Aug. 20, 2011, 4:32 p.m. UTC | #3
On 20/08/2011 17:36, Sunil Mushran wrote:
> Yes, the offset is from the start of the file.
>
> And yes, same offset is ok. I think the word next should be
> dropped from the definition. It is misleading.
>

Thanks. Yes, the word "next" is not very clear. I re-read the proposal for
the standard, and it actually seems to me that if we are in the last hole
we should return the file size, while if we are not in the last hole then
the same offset is fine: "...except that
if offset falls beyond the last byte not within a hole, then the file
offset may be set to the file size instead".

Marco
Sunil Mushran Aug. 22, 2011, 6:08 a.m. UTC | #4
On 08/20/2011 09:32 AM, Marco Stornelli wrote:
> Thank. Yes the word "next" is not very clear. I re-read the proposal 
> for the standard, actually it's seems to me that if we are in the last 
> hole we should return the file size, if we are not in the last hole 
> than it's ok the same offset - "....except that
> if offset falls beyond the last byte not within a hole, then the file
> offset may be set to the file size instead".

Any proposal that differentiates between holes is wrong. It should not
matter where the hole is.

Think of it from the usage-pov.

doff = 0;
while ((doff = lseek(SEEK_DATA, doff)) != -ENXIO) {
     hoff = lseek(SEEK_HOLE, doff);
     read_offset = doff;
     read_len = hoff -doff;
     process();
     doff = hoff;
}

The goal is to make this as efficient as possible. Treating the last
hole differently adds more code for no benefit.
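
A compilable userspace version of this loop might look like the sketch
below. In userspace, lseek() takes the file descriptor first and reports
failure by returning -1 with errno set, so the end of the data is detected
as errno == ENXIO. count_data_bytes() is a hypothetical helper name, the
printf() stands in for process(), and the fallback #define values are the
Linux ones from this patch.

```c
#define _GNU_SOURCE             /* glibc: expose SEEK_DATA and SEEK_HOLE */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA               /* fallback: the Linux values from this patch */
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

/*
 * Walk every data extent of fd, print it, and return the total number of
 * data bytes, or -1 on an unexpected lseek() error.
 */
static off_t count_data_bytes(int fd)
{
    off_t total = 0;
    off_t doff = 0;

    for (;;) {
        /* Find the next data at or after doff; ENXIO means none is left. */
        doff = lseek(fd, doff, SEEK_DATA);
        if (doff == (off_t)-1)
            return errno == ENXIO ? total : (off_t)-1;

        /* Find where that data ends (possibly the virtual hole at EOF). */
        off_t hoff = lseek(fd, doff, SEEK_HOLE);
        if (hoff == (off_t)-1)
            return (off_t)-1;
        assert(hoff >= doff);   /* lseek() never returns less than offset */

        printf("data: offset=%lld len=%lld\n",
               (long long)doff, (long long)(hoff - doff));
        total += hoff - doff;
        doff = hoff;
    }
}
```

The loop never needs to know whether a hole is the last one: it only asks
for the next data and the next hole, and stops on ENXIO.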

Marco Aug. 22, 2011, 10:56 a.m. UTC | #5
2011/8/22 Sunil Mushran <sunil.mushran@oracle.com>:
> Any proposal that differentiates between holes is wrong. It should not
> matter where the hole is.
>
> Think of it from the usage-pov.
>
> doff = 0;
> while ((doff = lseek(SEEK_DATA, doff)) != -ENXIO) {
>    hoff = lseek(SEEK_HOLE, doff);
>    read_offset = doff;
>    read_len = hoff -doff;
>    process();
>    doff = hoff;
> }
>
> The goal is to make this as efficient as follows. Treating the last
> hole differently adds more code for no benefit.
>
>

Hmm... It seems that Josef needs to clarify this point. I also looked
for a seek-hole test in the xfs test suite, but I didn't find
anything. Btrfs guys, how did you test the implementation? What
do you think about this corner case? Al, what do you think about it?

Marco
Sunil Mushran Aug. 22, 2011, 3:57 p.m. UTC | #6
On 08/22/2011 03:56 AM, Marco Stornelli wrote:
> 2011/8/22 Sunil Mushran<sunil.mushran@oracle.com>:
>> On 08/20/2011 09:32 AM, Marco Stornelli wrote:
>>> Thank. Yes the word "next" is not very clear. I re-read the proposal for
>>> the standard, actually it's seems to me that if we are in the last hole we
>>> should return the file size, if we are not in the last hole than it's ok the
>>> same offset - "....except that
>>> if offset falls beyond the last byte not within a hole, then the file
>>> offset may be set to the file size instead".
>> Any proposal that differentiates between holes is wrong. It should not
>> matter where the hole is.
>>
>> Think of it from the usage-pov.
>>
>> doff = 0;
>> while ((doff = lseek(SEEK_DATA, doff)) != -ENXIO) {
>>     hoff = lseek(SEEK_HOLE, doff);
>>     read_offset = doff;
>>     read_len = hoff -doff;
>>     process();
>>     doff = hoff;
>> }
>>
>> The goal is to make this as efficient as follows. Treating the last
>> hole differently adds more code for no benefit.
>>
> Mmmm.....It seems that Josef has to be clear in this point. However I
> looked for the seek hole test in xfs test suite, but I didn't find
> anything. Btrfs guys, how have you got tested the implementation? What
> do you think about this corner case? Al, what do you think about it?


The following test was used to test the early implementations.
http://oss.oracle.com/~smushran/seek_data/
Marco Aug. 22, 2011, 5:56 p.m. UTC | #7
On 22/08/2011 17:57, Sunil Mushran wrote:
>>> Any proposal that differentiates between holes is wrong. It should not
>>> matter where the hole is.
>>>
>>> Think of it from the usage-pov.
>>>
>>> doff = 0;
>>> while ((doff = lseek(SEEK_DATA, doff)) != -ENXIO) {
>>> hoff = lseek(SEEK_HOLE, doff);
>>> read_offset = doff;
>>> read_len = hoff -doff;
>>> process();
>>> doff = hoff;
>>> }
>>>
>>> The goal is to make this as efficient as follows. Treating the last
>>> hole differently adds more code for no benefit.
>>>
>> Mmmm.....It seems that Josef has to be clear in this point. However I
>> looked for the seek hole test in xfs test suite, but I didn't find
>> anything. Btrfs guys, how have you got tested the implementation? What
>> do you think about this corner case? Al, what do you think about it?
>
>
> The following test was used to test the early implementations.
> http://oss.oracle.com/~smushran/seek_data/
>

Thank you very much! I found another point. Your test fails with my
implementation because the proposal here
(http://www.austingroupbugs.net/view.php?id=415) says: "If whence is
SEEK_DATA, the file offset shall be set to the smallest location of a
byte not within a hole and not less than offset. It shall be an error if
no such byte exists." So in this case I return ENXIO but the test
expects another value. I have to say that there is a bit of confusion
about the real behavior of this new feature :)
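
The quoted rule can be checked from userspace with a sketch like the one
below. no_data_from() is a hypothetical helper name, and the fallback
#define values are the Linux ones from this patch. It assumes the error
for "no such byte exists" is ENXIO, as this thread does.

```c
#define _GNU_SOURCE             /* glibc: expose SEEK_DATA */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA               /* fallback: the Linux value from this patch */
#define SEEK_DATA 3
#endif

/*
 * Returns 1 if there is no data byte at or after offset (SEEK_DATA fails
 * with ENXIO), 0 if some data exists, and -1 on any other lseek() error.
 * Note that a successful lseek() moves the file offset of fd.
 */
static int no_data_from(int fd, off_t offset)
{
    off_t pos = lseek(fd, offset, SEEK_DATA);

    if (pos != (off_t)-1) {
        assert(pos >= offset);  /* result is never less than offset */
        return 0;
    }
    if (errno == ENXIO)
        return 1;               /* "It shall be an error if no such byte exists" */

    perror("lseek(SEEK_DATA)");
    return -1;
}
```

On a 10-byte file, no_data_from(fd, 0) should report data, while
no_data_from(fd, 10) should report that none is left.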

Marco
Sunil Mushran Aug. 22, 2011, 9:22 p.m. UTC | #8
On 08/22/2011 10:56 AM, Marco Stornelli wrote:
> On 22/08/2011 17:57, Sunil Mushran wrote:
>>
>> The following test was used to test the early implementations.
>> http://oss.oracle.com/~smushran/seek_data/
>>
>
> Thank you very much!! I found another point. Your test fails with my 
> implementation because here 
> (http://www.austingroupbugs.net/view.php?id=415) says: "If whence is 
> SEEK_DATA, the file offset shall be set to the smallest location of a 
> byte not within a hole and not less than offset. It shall be an error 
> if no such byte exists." So in this case I return ENXIO but the test 
> expects another value. I have to say that there is a bit of confusion 
> about the real behavior of this new feature :)
>

That's test 5.10, 5.12, 5.14. And it expects -ENXIO.

Which test is failing for you?
Marco Aug. 23, 2011, 5:44 p.m. UTC | #9
On 22/08/2011 23:22, Sunil Mushran wrote:
> On 08/22/2011 10:56 AM, Marco Stornelli wrote:
>> On 22/08/2011 17:57, Sunil Mushran wrote:
>>>
>>> The following test was used to test the early implementations.
>>> http://oss.oracle.com/~smushran/seek_data/
>>>
>>
>> Thank you very much!! I found another point. Your test fails with my
>> implementation because here
>> (http://www.austingroupbugs.net/view.php?id=415) says: "If whence is
>> SEEK_DATA, the file offset shall be set to the smallest location of a
>> byte not within a hole and not less than offset. It shall be an error
>> if no such byte exists." So in this case I return ENXIO but the test
>> expects another value. I have to say that there is a bit of confusion
>> about the real behavior of this new feature :)
>>
>
> That's test 5.10, 5.12, 5.14. And it expects -ENXIO.
>
> Which test is failing for you?
>

Sorry, I was misreading the results.

Marco
Dave Chinner Aug. 31, 2011, 12:35 a.m. UTC | #10
On Mon, Aug 22, 2011 at 07:56:31PM +0200, Marco Stornelli wrote:
> On 22/08/2011 17:57, Sunil Mushran wrote:
> >>>Any proposal that differentiates between holes is wrong. It should not
> >>>matter where the hole is.
> >>>
> >>>Think of it from the usage-pov.
> >>>
> >>>doff = 0;
> >>>while ((doff = lseek(SEEK_DATA, doff)) != -ENXIO) {
> >>>hoff = lseek(SEEK_HOLE, doff);
> >>>read_offset = doff;
> >>>read_len = hoff -doff;
> >>>process();
> >>>doff = hoff;
> >>>}
> >>>
> >>>The goal is to make this as efficient as follows. Treating the last
> >>>hole differently adds more code for no benefit.
> >>>
> >>Mmmm.....It seems that Josef has to be clear in this point. However I
> >>looked for the seek hole test in xfs test suite, but I didn't find
> >>anything. Btrfs guys, how have you got tested the implementation? What
> >>do you think about this corner case? Al, what do you think about it?
> >
> >
> >The following test was used to test the early implementations.
> >http://oss.oracle.com/~smushran/seek_data/
> >
> 
> Thank you very much!! I found another point. Your test fails with my
> implementation because here
> (http://www.austingroupbugs.net/view.php?id=415) says: "If whence is
> SEEK_DATA, the file offset shall be set to the smallest location of
> a byte not within a hole and not less than offset. It shall be an
> error if no such byte exists." So in this case I return ENXIO but
> the test expects another value. I have to say that there is a bit of
> confusion about the real behavior of this new feature :)

Which is exactly why I'm trying to get the definitions clarified
first, then the behaviour codified in a single test suite we can
call the 'authoritative test'.

Cheers,

Dave.

Patch

diff --git a/fs/read_write.c b/fs/read_write.c
index 5520f8a..5907b49 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -64,6 +64,23 @@  generic_file_llseek_unlocked(struct file *file, loff_t offset, int origin)
 			return file->f_pos;
 		offset += file->f_pos;
 		break;
+	case SEEK_DATA:
+		/*
+		 * In the generic case the entire file is data, so as long as
+		 * offset isn't at the end of the file then the offset is data.
+		 */
+		if (offset >= inode->i_size)
+			return -ENXIO;
+		break;
+	case SEEK_HOLE:
+		/*
+		 * There is a virtual hole at the end of the file, so as long as
+		 * offset isn't i_size or larger, return i_size.
+		 */
+		if (offset >= inode->i_size)
+			return -ENXIO;
+		offset = inode->i_size;
+		break;
 	}
 
 	if (offset < 0 && !unsigned_offsets(file))
@@ -128,12 +145,13 @@  EXPORT_SYMBOL(no_llseek);
 
 loff_t default_llseek(struct file *file, loff_t offset, int origin)
 {
+	struct inode *inode = file->f_path.dentry->d_inode;
 	loff_t retval;
 
-	mutex_lock(&file->f_dentry->d_inode->i_mutex);
+	mutex_lock(&inode->i_mutex);
 	switch (origin) {
 		case SEEK_END:
-			offset += i_size_read(file->f_path.dentry->d_inode);
+			offset += i_size_read(inode);
 			break;
 		case SEEK_CUR:
 			if (offset == 0) {
@@ -141,6 +159,26 @@  loff_t default_llseek(struct file *file, loff_t offset, int origin)
 				goto out;
 			}
 			offset += file->f_pos;
+			break;
+		case SEEK_DATA:
+			/*
+			 * In the generic case the entire file is data, so as
+			 * long as offset isn't at the end of the file then the
+			 * offset is data.
+			 */
+			if (offset >= inode->i_size)
+				return -ENXIO;
+			break;
+		case SEEK_HOLE:
+			/*
+			 * There is a virtual hole at the end of the file, so
+			 * as long as offset isn't i_size or larger, return
+			 * i_size.
+			 */
+			if (offset >= inode->i_size)
+				return -ENXIO;
+			offset = inode->i_size;
+			break;
 	}
 	retval = -EINVAL;
 	if (offset >= 0 || unsigned_offsets(file)) {
@@ -151,7 +189,7 @@  loff_t default_llseek(struct file *file, loff_t offset, int origin)
 		retval = offset;
 	}
 out:
-	mutex_unlock(&file->f_dentry->d_inode->i_mutex);
+	mutex_unlock(&inode->i_mutex);
 	return retval;
 }
 EXPORT_SYMBOL(default_llseek);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b5b9792..c9156f3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -32,7 +32,9 @@ 
 #define SEEK_SET	0	/* seek relative to beginning of file */
 #define SEEK_CUR	1	/* seek relative to current file position */
 #define SEEK_END	2	/* seek relative to end of file */
-#define SEEK_MAX	SEEK_END
+#define SEEK_DATA	3	/* seek to the next data */
+#define SEEK_HOLE	4	/* seek to the next hole */
+#define SEEK_MAX	SEEK_HOLE
 
 struct fstrim_range {
 	__u64 start;