CSE 451: Operating SystemsSpring 2012Module18Berkeley Log-Structured File System
Ed [email protected] Center 570
© 2012 Gribble, Lazowska, Levy, Zahorjan
2
LFS inspiration
Suppose, instead, what you wrote to disk was a log of changes made to fileslog includes modified data blocks and modified metadata blocksbuffer a huge block (“segment”) in memory – 512K or 1Mwhen full, write it to disk in one efficient contiguous transferright away, you’ve decreased seeks by a factor of 1M/4K = 250So the disk contains a single big long log of changes, consisting of threaded segments
© 2012 Gribble, Lazowska, Levy, Zahorjan
3
LFS basic approach
Use the disk as alogA log is a data structure that is written only at one endIf the disk were managed as a log, there would be effectively no seeksThe “file” is always added to sequentiallyNew data and metadata (i-nodes, directories) are accumulated in the buffer cache, then written all at once in large blocks (e.g., segments of .5M or 1M)This would greatly increase disk write throughputSounds simple – but really complicated under the covers
© 2012 Gribble, Lazowska, Levy, Zahorjan
4
LFS vs. UNIX File System or FFS
file1
file2
dir1
dir2
Unix FileSystem
file1
file2
dir1
dir2
Log-StructuredFile System
Log
i-nodedirectorydatai-node map
Blocks written tocreate two 1-blockfiles: dir1/file1 anddir2/file2, in UFS andLFS
© 2012 Gribble, Lazowska, Levy, Zahorjan
5
5
LFS Challenges
Locating data written in the logFS/FFS place files in a well-known location, LFS writes data “at the end of the log”Even locating i-nodes!In LFS, i-nodes too go into the log!Managing free space on the diskDisk is finite, and therefore log must be finiteSo cannot just keep appending to log, ad infinitum!need to recover deleted blocks in old part of logneed to fill holes created by recovered blocks(Note: Reads are the same as FS/FFS once you find the i-node – and writes are a ton faster!)
© 2012 Gribble, Lazowska, Levy, Zahorjan
6
6
LFS: Locating data and i-nodes
LFS uses i-nodes to locate data blocks, just like FS/FFSLFS appends i-nodes to end of log, just like datamakes them hard to findSolution:use another level of indirection:i-node mapsi-node maps map i-node #s to i-node locationso how do you find the i-node map?after all, changes to it must be appended to the loglocation of i-node map blocks are kept in acheckpoint regioncheckpoint region has a fixed locationcache i-node maps in memory for performance
© 2012 Gribble, Lazowska, Levy, Zahorjan
7
7
LFS: File reads and writes
Reads are no different than in FS/FFS, once we find the i-node for the fileThe i-node map, which is cached in memory, gets you to the i-node, which gets you to the blocksEvery write causes new blocks to be added to the tail end of the current “segment buffer” in memoryWhen the segment is full, it’s written to disk
© 2012 Gribble, Lazowska, Levy, Zahorjan
8
8
LFS: Free space management
Writing segments to the log eats up disk spaceOver time, segments in the log become fragmented as we replace old blocks of files with new blocksi-nodes no longer point to blocks, but those blocks still occupy their space in the logImagine modifying a single block of a file, over and over again – eventually this would chew up the entire disk!Solution: Garbage-collect segments with little “live” data and recover the disk space
© 2012 Gribble, Lazowska, Levy, Zahorjan
9
9
LFS: Segment cleaning
Log is divided into (large) segmentsSegments are threaded on disksegments can be anywhereReclaim space by cleaning segmentsread segmentcopy live data to end of lognow have free segment you can reuse!Cleaning is an issuecostly overhead, when do you do it?A cleaner daemon cleans old segments, based onutilization: how much is to be gained by cleaning?age: how likely is the segment to change soon?
© 2012 Gribble, Lazowska, Levy, Zahorjan
10
LFS summary
As caches get big, most reads will be satisfied from the cacheNo matter how you cache write operations, though, they are eventually going to have to get back to diskThus, most disk traffic will be write trafficIf you eventually put blocks (i-nodes, file content blocks) back where they came from, then even if you schedule disk writes cleverly, there’s still going to be a lot of head movement (which dominates disk performance)
© 2012 Gribble, Lazowska, Levy, Zahorjan
11
Suppose, instead, what you wrote to disk was a log of changes made to fileslog includes modified data blocks and modified metadata blocksbuffer a huge block (“segment”) in memory – 512K or 1Mwhen full, write it to disk in one efficient contiguous transferright away, you’ve decreased seeks by a factor of 1M/4K = 250So the disk is just one big long log, consisting of threaded segments
© 2012 Gribble, Lazowska, Levy, Zahorjan
12
What happens when a crash occurs?you lose some workbut the log that’s on disk represents a consistent view of the file system at some instant in timeSuppose you have to read a file?once you find its current i-node, you’re finei-node maps provide a level of indirection that makes this possibledetails aren’t that important
© 2012 Gribble, Lazowska, Levy, Zahorjan
13
How do you prevent overflowing the disk (because the log just keeps on growing)?segment cleaner coalesces the active blocks from multiple old log segments into a new log segment, freeing the old log segments for re-useAgain, the details aren’t that important
© 2012 Gribble, Lazowska, Levy, Zahorjan
14
Tradeoffs
LFS wins, relative to FFSmetadata-heavy workloadssmall file writesdeletes(metadata requires an additional write, and FFS does this synchronously)LFS loses, relative to FFSmany files are partially over-written in random orderfile gets splayed throughout the logLFS vs. JFSJFS is “robust” like LFS, but data must eventually be written back “where it came from” so disk bandwidth is still an issue
© 2012 Gribble, Lazowska, Levy, Zahorjan
15
LFS history
Designed by Mendel Rosenblum and his advisor John Ousterhout at Berkeley in 1991Rosenblum went on to become a Stanford professor and to co-found VMware, so even if this wasn’t his finest hour, he’s OKEx-Berkeley student Margo Seltzer (faculty at Harvard) published a 1995 paper comparing and contrasting LFS with conventional FFS, and claiming poor LFS performance in some realistic circumstancesOusterhout published a “Critique of Seltzer’s LFS Measurements,” rebutting her argumentsSeltzer published “A Response to Ousterhout’s Critique of LFS Measurements,” rebutting the rebuttalOusterhout published “A Response to Seltzer’s Response,” rebutting the rebuttal of the rebuttal
© 2012 Gribble, Lazowska, Levy, Zahorjan
16
Moral of the storyIf you’re going to do OS research, you need a thick skinVerydifficult to predict how a FS will be usedSo it’s hard to generate reasonable benchmarks, let alone a reasonable FS designVerydifficult to measure a FS in practicedepends on a HUGE number of parameters, involving both workload and hardware architecture
0
Embed
Upload