Message ID | 1304531920-2890-1-git-send-email-josef@redhat.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
On Wed, 04 May 2011 13:58:39 EDT, Josef Bacik said:

> -SEEK_HOLE: this moves the file pos to the nearest hole in the file from the
> given position.

Nearest, or next? Solaris defines it as "next", for a good reason - otherwise
you can get stuck in a case where the "nearest" hole is back towards the
start of the file - and "seek data" will bounce back to the next byte at
the other end of the hole.

Consider a file with this layout:

< 40K of data> A < 32K hole> B < 32K data> C < 8K hole> D <32K data> E ....

If you're in the range between "8K-1 before C" and "8K-1 after D", there's no
application of seeks to "nearest" data/hole that doesn't leave you oscillating
between C and D, and unable to reach B or E. If you're at C, "nearest hole" is
where you are, and "nearest data" is at D, not B. Similarly for D - nearest
data is C, not E.

However, this is easily dealt with if you define it as "next", as then it is
simple to discover exactly where A/B/C/D/E are.

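To make the "next" argument concrete, here is a minimal user-space sketch that models the layout above as an array of extents and walks it with forward-only seeks. Everything in it is hypothetical and exists only for illustration: `struct extent`, `seek_next()` and `find_boundaries()` are not kernel or libc interfaces, and the file is assumed to end at E with a virtual hole at end of file.

```c
#include <assert.h>
#include <stddef.h>

#define K 1024L

/* Hypothetical in-memory model of a sparse file (illustration only). */
struct extent {
    long start;   /* first byte of the extent */
    long len;     /* length in bytes */
    int is_data;  /* 1 = data, 0 = hole */
};

/* Layout from the message above; assumption: the file ends at E. */
static const struct extent layout[] = {
    { 0,       40 * K, 1 },
    { 40 * K,  32 * K, 0 },  /* starts at A */
    { 72 * K,  32 * K, 1 },  /* starts at B */
    { 104 * K,  8 * K, 0 },  /* starts at C */
    { 112 * K, 32 * K, 1 },  /* starts at D */
};
static const size_t n_extents = sizeof(layout) / sizeof(layout[0]);
static const long file_size = 144 * K;  /* E */

/*
 * "Next" semantics: return the smallest position >= pos that lies in an
 * extent of the wanted kind. When looking for a hole, end of file counts
 * as a virtual hole. Returns -1 when nothing is found (models ENXIO).
 */
static long seek_next(long pos, int want_data)
{
    if (pos < 0 || pos >= file_size)
        return -1;
    for (size_t i = 0; i < n_extents; i++) {
        long end = layout[i].start + layout[i].len;
        if (pos >= end || layout[i].is_data != want_data)
            continue;  /* extent is behind us or of the wrong kind */
        return pos > layout[i].start ? pos : layout[i].start;
    }
    return want_data ? -1 : file_size;
}

/*
 * Walk forward from offset 0, alternating hole/data seeks, and record
 * every boundary in out[]. Assumes the file starts with data.
 * Returns the number of boundaries stored.
 */
static size_t find_boundaries(long *out, size_t max)
{
    size_t n = 0;
    long pos = 0;
    int want_data = 0;  /* offset 0 is data, so look for a hole first */

    assert(out != NULL);
    while (n < max) {
        long next = seek_next(pos, want_data);
        if (next < 0)
            break;
        out[n++] = next;
        pos = next;
        want_data = !want_data;
    }
    return n;
}
```

Because every seek only ever moves forward, the walk cannot get trapped between C and D: starting inside the 8K hole, the next data is D, and from D the next hole is E.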
On 05/04/2011 03:04 PM, Valdis.Kletnieks@vt.edu wrote:
> On Wed, 04 May 2011 13:58:39 EDT, Josef Bacik said:
>
>> -SEEK_HOLE: this moves the file pos to the nearest hole in the file from the
>> given position.
>
> Nearest, or next? Solaris defines it as "next", for a good reason - otherwise
> you can get stuck in a case where the "nearest" hole is back towards the
> start of the file - and "seek data" will bounce back to the next byte at
> the other end of the hole.
>

Yeah sorry the log says "nearest" but the code says "next", if you look at it
that's how it works. Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On Wed, 04 May 2011 15:10:20 EDT, Josef Bacik said:

> Yeah sorry the log says "nearest" but the code says "next", if you look
> at it that's how it works. Thanks,

Oh good - the changelog is usually easier to fix than the code is. :)

Probably want to fix the changelog before it gets committed, as there's a fair
chance that text will end up being used as the basis for a manpage or other
documentation.

On 05/04/2011 03:20 PM, Valdis.Kletnieks@vt.edu wrote:
> On Wed, 04 May 2011 15:10:20 EDT, Josef Bacik said:
>
>> Yeah sorry the log says "nearest" but the code says "next", if you look
>> at it that's how it works. Thanks,
>
> Oh good - the changelog is usually easier to fix than the code is. :)
>
> Probably want to fix the changelog before it gets committed, as there's a fair
> chance that text will end up being used as the basis for a manpage or other
> documentation.
>

Yeah agreed I meant to change it this time around but forgot, I will make the
log all nice and pretty next time around, as I doubt this will be the last
iteration of these patches ;). Thanks,

Josef

On Wed, 04 May 2011 13:58:39 EDT, Josef Bacik said:

> +#define SEEK_HOLE 3 /* seek to the closest hole */
> +#define SEEK_DATA 4 /* seek to the closest data */

Comments here need nearest/next fixing as well - otherwise the ext[34] crew may
actually implement the commented semantics. ;)

Other than that, patch 1/2 looks OK to me (not that there's much code to
review), and 2/2 *seems* sane and implements the "next" semantics, though I only
examined the while/if structure and am assuming the btrfs innards are done
correctly. In particular, that 'while (1)' looks like it can be painful for a
sufficiently large and fragmented file (think a gigabyte file in 4K chunks,
producing a million extents), but I'll let a btrfs expert analyse that
performance issue ;)

On 05/04/2011 03:31 PM, Valdis.Kletnieks@vt.edu wrote:
> On Wed, 04 May 2011 13:58:39 EDT, Josef Bacik said:
>
>> +#define SEEK_HOLE 3 /* seek to the closest hole */
>> +#define SEEK_DATA 4 /* seek to the closest data */
>
> Comments here need nearest/next fixing as well - otherwise the ext[34] crew may
> actually implement the commented semantics. ;)
>

Balls, thanks I'll fix that.

> Other than that, patch 1/2 looks OK to me (not that there's much code to
> review), and 2/2 *seems* sane and implements the "next" semantics, though I only
> examined the while/if structure and am assuming the btrfs innards are done
> correctly. In particular, that 'while (1)' looks like it can be painful for a
> sufficiently large and fragmented file (think a gigabyte file in 4K chunks,
> producing a million extents), but I'll let a btrfs expert analyse that
> performance issue ;)
>

Heh well we do while (1) in btrfs _everywhere_, so this isn't anything new,
though I should probably throw a cond_resched() in there. Thanks,

Josef

On 05/04/2011 02:10 PM, Josef Bacik wrote:
> On 05/04/2011 03:04 PM, Valdis.Kletnieks@vt.edu wrote:
>> On Wed, 04 May 2011 13:58:39 EDT, Josef Bacik said:
>>
>>> -SEEK_HOLE: this moves the file pos to the nearest hole in the file from the
>>> given position.
>>
>> Nearest, or next? Solaris defines it as "next", for a good reason - otherwise
>> you can get stuck in a case where the "nearest" hole is back towards the
>> start of the file - and "seek data" will bounce back to the next byte at
>> the other end of the hole.
>>
>
> Yeah sorry the log says "nearest" but the code says "next", if you look
> at it that's how it works. Thanks,

The comments in fs.h say "closest". You may want to change them to
"next" as well.

Thanks,
Shaggy

On 05/04/2011 04:54 PM, Dave Kleikamp wrote:

> The comments in fs.h say "closest". You may want to change them to
> "next" as well.

Sorry. Missed some of the replies before I responded. Already addressed.

Shaggy

On 04/05/2011 19:58, Josef Bacik wrote:
> +		if (offset >= i_size_read(inode)) {
> +			mutex_unlock(&inode->i_mutex);
> +			return -ENXIO;
> +		}
> +		offset = i_size_read(inode);
> +		break;

Here maybe it's possible to use offset bigger than i_size, because
i_size_read is "atomic" but something can happen between two calls,
isn't it?

Marco

On 05/05/2011 21:01, Josef Bacik wrote:
> On 05/05/2011 02:54 PM, Marco Stornelli wrote:
>> On 04/05/2011 19:58, Josef Bacik wrote:
>>> +		if (offset >= i_size_read(inode)) {
>>> +			mutex_unlock(&inode->i_mutex);
>>> +			return -ENXIO;
>>> +		}
>>> +		offset = i_size_read(inode);
>>> +		break;
>>
>> Here maybe it's possible to use offset bigger than i_size, because
>> i_size_read is "atomic" but something can happen between two calls,
>> isn't it?
>>
>
> We're holding the i_mutex so we are safe, i_size_read is used just for
> consistency's sake. Thanks,
>
> Josef
>

Oh, I'm sorry, I misread the patch, ok. Maybe we can use i_size at this
point without i_size_read.

Marco

On 05/05/2011 02:54 PM, Marco Stornelli wrote:
> On 04/05/2011 19:58, Josef Bacik wrote:
>> +		if (offset >= i_size_read(inode)) {
>> +			mutex_unlock(&inode->i_mutex);
>> +			return -ENXIO;
>> +		}
>> +		offset = i_size_read(inode);
>> +		break;
>
> Here maybe it's possible to use offset bigger than i_size, because
> i_size_read is "atomic" but something can happen between two calls,
> isn't it?
>

We're holding the i_mutex so we are safe, i_size_read is used just for
consistency's sake. Thanks,

Josef

On 04/05/2011 19:58, Josef Bacik wrote:
> +		if (offset >= i_size_read(inode)) {
> +			mutex_unlock(&inode->i_mutex);
> +			return -ENXIO;
> +		}
> +		offset = i_size_read(inode);
> +		break;

I can add that generic_file_llseek_unlocked means *unlocked* so you
shouldn't unlock any mutex but only return a value. The current version,
in case of SEEK_END, uses i_size directly, so maybe I'm missing
something.

Marco

On 05/05/2011 03:19 PM, Marco Stornelli wrote:
> On 04/05/2011 19:58, Josef Bacik wrote:
>> +		if (offset >= i_size_read(inode)) {
>> +			mutex_unlock(&inode->i_mutex);
>> +			return -ENXIO;
>> +		}
>> +		offset = i_size_read(inode);
>> +		break;
>
> I can add that generic_file_llseek_unlocked means *unlocked* so you
> shouldn't unlock any mutex but only return a value. The current version,
> in case of SEEK_END, uses i_size directly, so maybe I'm missing
> something.

Yeah this was a copy+paste mistake, ext4 has its own llseek that I modified to
run my tests against and then I just copied and pasted it over to the generic
things. I've fixed this earlier, I'll be sending a refreshed set out soon.
Thanks,

Josef

diff --git a/fs/read_write.c b/fs/read_write.c
index 5520f8a..6ee63a4 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -64,6 +64,28 @@ generic_file_llseek_unlocked(struct file *file, loff_t offset, int origin)
 			return file->f_pos;
 		offset += file->f_pos;
 		break;
+	case SEEK_DATA:
+		/*
+		 * In the generic case the entire file is data, so data only
+		 * starts at position 0 provided the file has an i_size,
+		 * otherwise it's an empty file and will always be ENXIO.
+		 */
+		if (offset != 0 || i_size_read(inode)) {
+			mutex_unlock(&inode->i_mutex);
+			return -ENXIO;
+		}
+		break;
+	case SEEK_HOLE:
+		/*
+		 * There is a virtual hole at the end of the file, so as long as
+		 * offset isn't i_size or larger, return i_size.
+		 */
+		if (offset >= i_size_read(inode)) {
+			mutex_unlock(&inode->i_mutex);
+			return -ENXIO;
+		}
+		offset = i_size_read(inode);
+		break;
 	}
 
 	if (offset < 0 && !unsigned_offsets(file))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index dbd860a..1b72e0c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -31,7 +31,9 @@
 #define SEEK_SET	0	/* seek relative to beginning of file */
 #define SEEK_CUR	1	/* seek relative to current file position */
 #define SEEK_END	2	/* seek relative to end of file */
-#define SEEK_MAX	SEEK_END
+#define SEEK_HOLE	3	/* seek to the closest hole */
+#define SEEK_DATA	4	/* seek to the closest data */
+#define SEEK_MAX	SEEK_DATA
 
 struct fstrim_range {
 	__u64 start;

This just gets us ready to support the SEEK_HOLE and SEEK_DATA flags. Turns out
using fiemap in things like cp causes more problems than it solves, so let's try
and give userspace an interface that doesn't suck. So we have

-SEEK_HOLE: this moves the file pos to the nearest hole in the file from the
given position. If the given position is a hole then pos won't move. A "hole"
is defined by whatever the fs feels like defining it to be. In simple things
like ext2/3 it will simply mean an unallocated space in the file. For more
complex things where you have preallocated space then that is left up to the
filesystem. Since preallocated space is supposed to return all 0's it is
perfectly legitimate to have SEEK_HOLE dump you out at the start of a
preallocated extent, but then again if this is not something you can do and be
sure the extent isn't in the middle of being converted to a real extent then it
is also perfectly legitimate to skip preallocated extents and only park f_pos at
a truly unallocated section.

-SEEK_DATA: this is obviously a little more self-explanatory. Again the only
ambiguity comes in with preallocated extents. If you have an fs that can't
reliably tell that the preallocated extent is in the process of turning into a
real extent, it is correct for SEEK_DATA to park you at a preallocated extent.

In the generic case we will just assume the entire file is data and there is a
virtual hole at i_size, so SEEK_DATA will return -ENXIO unless you provide an
offset of 0 and the file size is larger than 0, and SEEK_HOLE will put you at
i_size unless pos is i_size or larger, and i_size is larger than 0. Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
---
v1->v2: Make the generic case assume that the entire file is data and there is a
virtual hole at the end of the file.

 fs/read_write.c    |   22 ++++++++++++++++++++++
 include/linux/fs.h |    4 +++-
 2 files changed, 25 insertions(+), 1 deletions(-)

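The generic fallback described in the last paragraph of the changelog can be modelled in user space as a small pure function. This is only a sketch of the wording above, not of any particular revision of the kernel code: `generic_seek_model()` and the `MODEL_SEEK_*` constants are hypothetical names, and rejecting negative offsets with -EINVAL is an assumption made to keep the model total.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the values proposed in the patch. */
enum { MODEL_SEEK_HOLE = 3, MODEL_SEEK_DATA = 4 };

/*
 * Model of the generic case as the changelog words it:
 *  - the whole file is data, with a virtual hole at i_size;
 *  - SEEK_DATA succeeds only for offset 0 on a non-empty file;
 *  - SEEK_HOLE lands on i_size unless offset is already at or past it.
 * Returns the new position, or a negative errno value.
 */
static long long generic_seek_model(long long offset, long long i_size,
                                    int origin)
{
    assert(i_size >= 0);

    if (offset < 0)
        return -EINVAL;  /* assumption: negative offsets are rejected */

    switch (origin) {
    case MODEL_SEEK_DATA:
        if (offset != 0 || i_size == 0)
            return -ENXIO;
        return 0;
    case MODEL_SEEK_HOLE:
        if (offset >= i_size)
            return -ENXIO;
        return i_size;
    default:
        return -EINVAL;
    }
}
```

For a 4096-byte file, `generic_seek_model(100, 4096, MODEL_SEEK_HOLE)` yields 4096, while the same call on an empty file yields -ENXIO, since an empty file has no position in front of its virtual hole.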