Squashfs 4.0 Filesystem — The Linux Kernel documentation (2024)

Squashfs is a compressed read-only filesystem for Linux.

It uses zlib, lz4, lzo, or xz compression to compress files, inodes and directories. Inodes in the system are very small and all blocks are packed to minimise data overhead. Block sizes greater than 4K are supported up to a maximum of 1 Mbyte (default block size 128K).

Squashfs is intended for general read-only filesystem use, for archival use (i.e. in cases where a .tar.gz file may be used), and in constrained block device/memory systems (e.g. embedded systems) where low overhead is needed.

Mailing list: squashfs-devel@lists.sourceforge.net

Web site: www.squashfs.org

1. Filesystem Features

Squashfs filesystem features versus Cramfs:

Feature                          Squashfs      Cramfs
-------------------------------  ------------  ------------
Max filesystem size              2^64          256 MiB
Max file size                    ~ 2 TiB       16 MiB
Max files                        unlimited     unlimited
Max directories                  unlimited     unlimited
Max entries per directory        unlimited     unlimited
Max block size                   1 MiB         4 KiB
Metadata compression             yes           no
Directory indexes                yes           no
Sparse file support              yes           no
Tail-end packing (fragments)     yes           no
Exportable (NFS etc.)            yes           no
Hard link support                yes           no
“.” and “..” in readdir          yes           no
Real inode numbers               yes           no
32-bit uids/gids                 yes           no
File creation time               yes           no
Xattr support                    yes           no
ACL support                      no            no

Squashfs compresses data, inodes and directories. In addition, inode and directory data are highly compacted, and packed on byte boundaries. Each compressed inode is on average 8 bytes in length (the exact length varies by file type, i.e. regular file, directory, symbolic link, and block/char device inodes have different sizes).

2. Using Squashfs

As squashfs is a read-only filesystem, the mksquashfs program must be used to create populated squashfs filesystems. This and other squashfs utilities can be obtained from http://www.squashfs.org. Usage instructions can be obtained from this site also.

The squashfs-tools development tree is now located on kernel.org:

git://git.kernel.org/pub/scm/fs/squashfs/squashfs-tools.git

2.1 Mount options

errors=%s
    Specify whether squashfs errors trigger a kernel panic or not.

    continue
        errors don’t trigger a panic (default)

    panic
        trigger a panic when errors are encountered, similar to several other filesystems (e.g. btrfs, ext4, f2fs, GFS2, jfs, ntfs, ubifs)

        This allows a kernel dump to be saved, useful for analyzing and debugging the corruption.

threads=%s
    Select the decompression mode or the number of threads.

    If SQUASHFS_CHOICE_DECOMP_BY_MOUNT is set:

    single
        use single-threaded decompression (default)

        Only one block (data or metadata) can be decompressed at any one time. This limits CPU and memory usage to a minimum, but it also gives poor performance on parallel I/O workloads when using multiple CPU machines due to waiting on decompressor availability.

    multi
        use up to two parallel decompressors per core

        If you have a parallel I/O workload and your system has enough memory, using this option may improve overall I/O performance. It dynamically allocates decompressors on a demand basis.

    percpu
        use a maximum of one decompressor per core

        It uses percpu variables to ensure decompression is load-balanced across the cores.

    1|2|3|...
        configure the number of threads used for decompression

        The upper limit is num_online_cpus() * 2.

    If SQUASHFS_CHOICE_DECOMP_BY_MOUNT is not set and SQUASHFS_DECOMP_MULTI, SQUASHFS_MOUNT_DECOMP_THREADS are both set:

    2|3|...
        configure the number of threads used for decompression

        The upper limit is num_online_cpus() * 2.
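
The rules for threads= can be summarised in a small user-space sketch. The function name classify_threads_option and its parameters are hypothetical, and rejecting out-of-range values with an error is an assumption of this sketch; the text above only states the accepted values and the upper limit, not what the kernel does with anything else.

```python
def classify_threads_option(value, online_cpus, choice_by_mount=True):
    """Classify a threads= value as (mode, thread_count).

    choice_by_mount models whether SQUASHFS_CHOICE_DECOMP_BY_MOUNT is set.
    Raising ValueError for unsupported values is an assumption of this sketch.
    """
    limit = online_cpus * 2  # documented upper limit: num_online_cpus() * 2

    # Named modes are only listed for the CHOICE_DECOMP_BY_MOUNT case.
    if choice_by_mount and value in ("single", "multi", "percpu"):
        return (value, None)

    if not (value.isascii() and value.isdigit()):
        raise ValueError(f"unsupported threads value: {value!r}")

    count = int(value)
    # Numeric values start at 1 when the mode is chosen at mount time,
    # and at 2 in the other documented configuration.
    minimum = 1 if choice_by_mount else 2
    if count < minimum or count > limit:
        raise ValueError(f"thread count {count} outside {minimum}..{limit}")
    return ("threads", count)
```

For example, on a machine with 4 online CPUs this sketch accepts threads=8 but rejects threads=9.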

3. Squashfs Filesystem Design

A squashfs filesystem consists of a maximum of nine parts, packed together on a byte alignment:

     ---------------
    |  superblock   |
    |---------------|
    |  compression  |
    |    options    |
    |---------------|
    |  datablocks   |
    |  & fragments  |
    |---------------|
    |  inode table  |
    |---------------|
    |   directory   |
    |     table     |
    |---------------|
    |   fragment    |
    |     table     |
    |---------------|
    |    export     |
    |     table     |
    |---------------|
    |    uid/gid    |
    |  lookup table |
    |---------------|
    |     xattr     |
    |     table     |
     ---------------

Compressed data blocks are written to the filesystem as files are read from the source directory, and checked for duplicates. Once all file data has been written the completed inode, directory, fragment, export, uid/gid lookup and xattr tables are written.

3.1 Compression options

Compressors can optionally support compression specific options (e.g. dictionary size). If non-default compression options have been used, then these are stored here.

3.2 Inodes

Metadata (inodes and directories) are compressed in 8 Kbyte blocks. Each compressed block is prefixed by a two-byte length; the top bit is set if the block is uncompressed. A block will be uncompressed if the -noI option is set, or if the compressed block was larger than the uncompressed block.
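
As a minimal sketch, the two-byte prefix could be decoded as follows. The function name parse_metadata_header is hypothetical, and the little-endian byte order is an assumption of this sketch, since the text above does not state it.

```python
import struct

UNCOMPRESSED_BIT = 0x8000  # top bit of the two-byte length prefix


def parse_metadata_header(raw):
    """Return (length, is_uncompressed) from the first two bytes of raw.

    Assumption: the prefix is stored little-endian.
    """
    (word,) = struct.unpack_from("<H", raw, 0)
    is_uncompressed = bool(word & UNCOMPRESSED_BIT)
    length = word & (UNCOMPRESSED_BIT - 1)  # remaining 15 bits hold the length
    return length, is_uncompressed
```

A reader would then take the next `length` bytes and either use them directly or pass them to the decompressor, depending on the flag.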

Inodes are packed into the metadata blocks, and are not aligned to block boundaries; inodes can therefore overlap compressed blocks. Inodes are identified by a 48-bit number which encodes the location of the compressed metadata block containing the inode, and the byte offset into that block where the inode is placed (<block, offset>).
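
One way to picture the <block, offset> pair is the sketch below. The split into a 32-bit block location and a 16-bit offset is an assumption of this sketch (it adds up to the 48 bits mentioned above, and 16 bits are enough to address any byte of an 8 Kbyte block); the helper names are hypothetical.

```python
OFFSET_BITS = 16                      # assumed width of the offset field
OFFSET_MASK = (1 << OFFSET_BITS) - 1
BLOCK_BITS = 32                       # assumed width of the block field


def make_inode_ref(block, offset):
    """Pack a metadata block location and a byte offset into one number."""
    if not 0 <= block < (1 << BLOCK_BITS):
        raise ValueError("block location does not fit in 32 bits")
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError("offset does not fit in 16 bits")
    return (block << OFFSET_BITS) | offset


def split_inode_ref(ref):
    """Recover the (block, offset) pair from a packed inode number."""
    return ref >> OFFSET_BITS, ref & OFFSET_MASK
```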

To maximise compression there are different inodes for each file type (regular file, directory, device, etc.), the inode contents and length varying with the type.

To further maximise compression, two types of regular file inode and directory inode are defined: inodes optimised for frequently occurring regular files and directories, and extended types where extra information has to be stored.

3.3 Directories

Like inodes, directories are packed into compressed metadata blocks, stored in a directory table. Directories are accessed using the start address of the metablock containing the directory and the offset into the decompressed block (<block, offset>).

Directories are organised in a slightly complex way, and are not simply a list of file names. The organisation takes advantage of the fact that (in most cases) the inodes of the files will be in the same compressed metadata block, and therefore, can share the start block. Directories are therefore organised in a two level list, a directory header containing the shared start block value, and a sequence of directory entries, each of which share the shared start block. A new directory header is written once/if the inode start block changes. The directory header/directory entry list is repeated as many times as necessary.
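
The two level list can be sketched as a grouping step over a sorted entry list. This is an illustration only: the tuple layout and the function name group_directory_entries are hypothetical, and the real on-disk header and entry fields are not modelled.

```python
from itertools import groupby


def group_directory_entries(entries):
    """Group entries into (start_block, [(name, offset), ...]) runs.

    entries: iterable of (name, inode_start_block, inode_offset) tuples,
    already in directory order. A new header is started whenever the
    inode start block changes, so each run shares one start block.
    """
    listing = []
    for start_block, run in groupby(entries, key=lambda entry: entry[1]):
        listing.append((start_block, [(name, offset) for name, _, offset in run]))
    return listing
```

Because groupby only merges consecutive items, a start block that reappears later in the directory gets a fresh header, which matches the "once/if the inode start block changes" rule.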

Directories are sorted, and can contain a directory index to speed up file lookup. Directory indexes store one entry per metablock, each entry storing the index/filename mapping to the first directory header in each metadata block. Directories are sorted in alphabetical order, and at lookup the index is scanned linearly looking for the first filename alphabetically larger than the filename being looked up. At this point the location of the metadata block the filename is in has been found. The general idea of the index is to ensure only one metadata block needs to be decompressed to do a lookup irrespective of the length of the directory. This scheme has the advantage that it doesn’t require extra memory overhead and doesn’t require much extra storage on disk.
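
The linear index scan can be sketched as follows. The index is modelled as a list of (first_filename, block_location) pairs, which is a simplification; find_directory_block and first_block are hypothetical names, and Python's default string ordering stands in for the alphabetical ordering used on disk.

```python
def find_directory_block(index, target, first_block):
    """Return the location of the metadata block that may hold target.

    index: list of (first_filename, block_location), sorted by filename.
    first_block: location of the block where the directory starts, used
    when target sorts before every indexed filename.
    """
    position = first_block
    for first_name, block_location in index:
        if first_name > target:
            break  # first filename larger than target: stop scanning
        position = block_location
    return position
```

Only the block at the returned location then needs to be decompressed and searched, regardless of how many blocks the directory spans.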

3.4 File data

Regular files consist of a sequence of contiguous compressed blocks, and/or a compressed fragment block (tail-end packed block). The compressed size of each datablock is stored in a block list contained within the file inode.

To speed up access to datablocks when reading ‘large’ files (256 Mbytes or larger), the code implements an index cache that caches the mapping from block index to datablock location on disk.
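
Because the block list stores only compressed sizes and the blocks are contiguous, finding a given datablock means summing the sizes of all blocks before it. The sketch below shows that walk; datablock_offsets and its parameters are hypothetical names, and any flag bits or special values that a real block list entry may carry are ignored here.

```python
def datablock_offsets(start, compressed_sizes):
    """Return the on-disk offset of each datablock of one file.

    start: offset of the file's first datablock.
    compressed_sizes: the block list, one compressed size per block.
    """
    offsets = []
    position = start
    for size in compressed_sizes:
        offsets.append(position)
        position += size  # blocks are contiguous, so the next one follows
    return offsets
```

For a large file this walk is long, which is the cost a cached mapping from block index to location avoids.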

The index cache allows Squashfs to handle large files (up to 1.75 TiB) while retaining a simple and space-efficient block list on disk. The cache is split into slots, caching up to eight 224 GiB files (128 KiB blocks). Larger files use multiple slots, with 1.75 TiB files using all 8 slots. The index cache is designed to be memory efficient, and by default uses 16 KiB.
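
The figures quoted above are consistent with each other, as this short arithmetic check shows. It only restates the numbers from the text; the variable names are illustrative.

```python
KIB, MIB, GIB, TIB = 2**10, 2**20, 2**30, 2**40

block_size = 128 * KIB        # block size the figures are quoted for
slots = 8                     # number of index cache slots
per_slot = 224 * GIB          # file size covered by one slot

max_cached = slots * per_slot                   # 1792 GiB in total
max_cached_tib = max_cached / TIB               # 1.75
blocks_per_slot = per_slot // block_size        # 1,835,008 datablocks
large_file_blocks = (256 * MIB) // block_size   # 2048 datablocks
```

So eight slots of 224 GiB give exactly 1.75 TiB, and the 256 Mbyte threshold for a 'large' file corresponds to 2048 datablocks of 128 KiB.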

3.5 Fragment lookup table

Regular files can contain a fragment index which is mapped to a fragment location on disk and compressed size using a fragment lookup table. This fragment lookup table is itself stored compressed into metadata blocks. A second index table is used to locate these. This second index table for speed of access (and because it is small) is read at mount time and cached in memory.
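
The two-step lookup can be sketched as follows. The 8192-byte metadata block size comes from section 3.2; the 16-byte size of one fragment table entry is an assumption of this sketch, and the function names are hypothetical.

```python
METADATA_BLOCK_SIZE = 8192   # uncompressed metadata block size (section 3.2)
FRAGMENT_ENTRY_SIZE = 16     # assumption: bytes per fragment table entry


def locate_table_entry(entry_index,
                       entry_size=FRAGMENT_ENTRY_SIZE,
                       block_size=METADATA_BLOCK_SIZE):
    """Return (metadata_block_number, byte_offset) for a table entry."""
    byte_position = entry_index * entry_size
    return byte_position // block_size, byte_position % block_size


def lookup_location(index_table, entry_index):
    """Map an entry index to (metadata block location on disk, offset).

    index_table: the in-memory second index table, one on-disk location
    per metadata block of the lookup table.
    """
    block_number, offset = locate_table_entry(entry_index)
    return index_table[block_number], offset
```

With these assumed sizes each metadata block holds 512 entries, so entry 513 lives 16 bytes into the second block listed in the index table.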

3.6 Uid/gid lookup table

For space efficiency regular files store uid and gid indexes, which are converted to 32-bit uids/gids using an id lookup table. This table is stored compressed into metadata blocks. A second index table is used to locate these. This second index table for speed of access (and because it is small) is read at mount time and cached in memory.

3.7 Export table

To enable Squashfs filesystems to be exportable (via NFS etc.) filesystems can optionally (disabled with the -no-exports Mksquashfs option) contain an inode number to inode disk location lookup table. This is required to enable Squashfs to map inode numbers passed in filehandles to the inode location on disk, which is necessary when the export code reinstantiates expired/flushed inodes.

This table is stored compressed into metadata blocks. A second index table is used to locate these. This second index table for speed of access (and because it is small) is read at mount time and cached in memory.

3.8 Xattr table

The xattr table contains extended attributes for each inode. The xattrs for each inode are stored in a list, each list entry containing a type, name and value field. The type field encodes the xattr prefix (“user.”, “trusted.” etc) and it also encodes how the name/value fields should be interpreted. Currently the type indicates whether the value is stored inline (in which case the value field contains the xattr value), or if it is stored out of line (in which case the value field stores a reference to where the actual value is stored). This allows large values to be stored out of line improving scanning and lookup performance and it also allows values to be de-duplicated, the value being stored once, and all other occurrences holding an out of line reference to that value.
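
A sketch of decoding such a type field is shown below. The numeric values (prefix ids 0, 1 and 2, an 8-bit prefix mask, and bit 0x100 as the out-of-line flag) are assumptions of this sketch and are not given in the text above; decode_xattr_type is a hypothetical name.

```python
PREFIX_MASK = 0xFF          # assumption: low byte selects the prefix
VALUE_OUT_OF_LINE = 0x100   # assumption: flag for an out-of-line value
PREFIXES = {0: "user.", 1: "trusted.", 2: "security."}  # assumed ids


def decode_xattr_type(type_field):
    """Return (prefix, stored_out_of_line) for an xattr type field."""
    prefix = PREFIXES.get(type_field & PREFIX_MASK)
    if prefix is None:
        raise ValueError(f"unknown xattr prefix id in type {type_field:#x}")
    return prefix, bool(type_field & VALUE_OUT_OF_LINE)
```

A reader would prepend the prefix to the stored name and, when the flag is set, treat the value field as a reference to follow rather than as the value itself.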

The xattr lists are packed into compressed 8K metadata blocks. To reduce overhead in inodes, rather than storing the on-disk location of the xattr list inside each inode, a 32-bit xattr id is stored. This xattr id is mapped into the location of the xattr list using a second xattr id lookup table.

4. TODOs and Outstanding Issues

4.1 TODO list

Implement ACL support.

4.2 Squashfs Internal Cache

Blocks in Squashfs are compressed. To avoid repeatedly decompressing recently accessed data Squashfs uses two small metadata and fragment caches.

The cache is not used for file datablocks; these are decompressed and cached in the page-cache in the normal way. The cache is used to temporarily cache fragment and metadata blocks which have been read as a result of a metadata (i.e. inode or directory) or fragment access. Because metadata and fragments are packed together into blocks (to gain greater compression), the read of a particular piece of metadata or fragment will retrieve other metadata/fragments which have been packed with it, and which, because of locality of reference, may be read in the near future. Temporarily caching them ensures they are available for near future access without requiring an additional read and decompress.
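
The effect of such a cache can be illustrated with a small user-space sketch. The least-recently-used eviction used here is a choice made for this illustration; the text above does not describe the replacement policy of the kernel cache. BlockCache and read_and_decompress are hypothetical names.

```python
from collections import OrderedDict


class BlockCache:
    """Keep a few decompressed blocks, keyed by their on-disk location."""

    def __init__(self, capacity, read_and_decompress):
        self._capacity = capacity
        self._read = read_and_decompress  # callable: location -> block data
        self._blocks = OrderedDict()

    def get(self, location):
        if location in self._blocks:
            # Hit: no read or decompression needed.
            self._blocks.move_to_end(location)
            return self._blocks[location]
        data = self._read(location)
        self._blocks[location] = data
        if len(self._blocks) > self._capacity:
            # Illustrative policy: drop the least recently used block.
            self._blocks.popitem(last=False)
        return data
```

Two inodes packed into the same metadata block then cost one read and one decompression instead of two, as long as the block is still cached when the second inode is looked up.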

In the future this internal cache may be replaced with an implementation which uses the kernel page cache. Because the page cache operates on page sized units this may introduce additional complexity in terms of locking and associated race conditions.
