NTFS has a special internal storage format for compressed files. This format is compatible with compressing and decompressing on the fly, which makes transparent reading and writing of compressed files by applications possible without explicitly starting a compression tool. Doing so, storage space is saved at the expense of computing requirements, and files stored on slow devices may be read faster than uncompressed ones.
Another compression method has been brought by Windows Server 2012 through the deduplication feature: similar parts of different files are only stored once and compressed. This requires an examination of all the files in the file system, so it is not compatible with deduplicating on the fly. Deduplicated files are however organized as reparse points, so that the reassembly of the parts needed for transparent reading can be triggered. Deduplication is mostly used for saving space on backup storage.
Yet newer compression methods have been brought by Windows 10 for saving space on the system partition. Each file is compressed individually, using a more efficient format not compatible with compressing on the fly. Such compressed files are also organized as reparse points and can be read transparently. Windows 10 apparently only uses these formats on computers which can decompress data faster than they can read it uncompressed.
Basic Compression
Currently reading compressed files is supported by all ntfs-3g versions. Creating new compressed files, clearing contents, and appending data to existing compressed files are supported since ntfs-3g-2009.11.14. Modifying existing compressed files by overwriting existing data (or existing holes) is supported since ntfs-3g-2010.8.8.
Ntfs-3g Options
When the mount option compression is set, files created in a directory marked for compression are created compressed. They remain compressed when they are moved (by renaming) to a regular directory in the same volume, and data appended to them after they have been moved is compressed. Conversely, files which were present in a directory before it is marked for compression, and files moved from a directory not marked for compression, are not compressed. Copying a compressed file always decompresses it, only to compress it again if the target directory is marked for compression.
A directory is marked for compression by setting the attribute flag FILE_ATTRIBUTE_COMPRESSED (hex value 0x00000800). This can be done by setfattr applied to the extended attribute system.ntfs_attrib_be. This attribute is not available in older versions, and system.ntfs_attrib has to be used instead, with the value shown as 0x00080000 on little-endian computers. Marking or unmarking a directory for compression has no effect on existing files or directories; the mark is only used when creating new files or directories in the marked directory.
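As an illustration, here is a minimal Python sketch of marking a directory for compression through the system.ntfs_attrib_be extended attribute. It assumes a Linux system where the volume is mounted with ntfs-3g, where Python's os.getxattr and os.setxattr are available, and where the attribute is exposed as a 4-byte big-endian value; the path /mnt/ntfs/newdir is a placeholder.

    import os
    import struct

    FILE_ATTRIBUTE_COMPRESSED = 0x00000800  # flag value given above

    def mark_compressed(directory):
        # Read the current NTFS attribute flags (4 bytes, big-endian variant).
        raw = os.getxattr(directory, "system.ntfs_attrib_be")
        flags = struct.unpack(">I", raw)[0]
        # OR in the compression flag and write the flags back.
        os.setxattr(directory, "system.ntfs_attrib_be",
                    struct.pack(">I", flags | FILE_ATTRIBUTE_COMPRESSED))

    mark_compressed("/mnt/ntfs/newdir")  # hypothetical mount point and directory

Reading the existing flags before writing them back preserves any other attribute bits already set on the directory.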
Notes
- compression is not recommended for files which are frequently read, such as system files or files made available on file servers. Moreover, compression is not effective on files already compressed by other means (such as zip, gz, jpg, gif, mp3, etc.)
- ntfs-3g tries to allocate consecutive clusters to a compressed file, thus avoiding fragmentation of the storage space when files are created without overwriting.
- some programs, like gcc or torrent-type downloaders, overwrite existing data or holes in files they are creating. This implies multiple decompressions and recompressions, and causes fragmentation when the recompressed data does not have the same size as the original. Such inefficient situations should be avoided.
- compression is not possible if the cluster size is greater than 4K bytes.
Deduplicated Files
The file deduplication feature can be enabled at the partition level on Windows Server since the 2012 edition. The deduplication itself is done by a background process which examines files from the partition to detect common parts. Small files are excluded, as are encrypted files and those which are compressed at the application level (by zip, gzip, etc.).
The deduplicated files are stored like sparse files with no data and they are referenced in normal directories. Their attributes (size, time stamps, permissions, etc.) are the original ones with reparse point information added to locate actual data possibly shared with other deduplicated files.
Each part of a file (a chunk) is compressed and stored within a chunks file. The size of each chunk is variable and limited to 131K. The chunks which are part of a file are listed in one or more smaps files. Finally the smaps designating parts of the file are listed in the reparse data of the file (see picture). The splitting of files into chunks strives to maximize the number of shared chunks.
Each smap entry records the size and position of the designated chunk in the file, and similarly each reparse data entry records the global size and position of the chunks recorded in the designated smap. Thus, when looking for some position in the file, the required chunk can easily be determined, and getting the data only requires decompressing the chunk from its beginning.
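The lookup can be pictured as two successive searches over entries sorted by file position. The Python sketch below is purely schematic: the tuple layouts, the smap_id and chunk_ref names are hypothetical stand-ins for the on-disk structures, and it assumes the reparse data and smap entries have already been parsed into sorted lists.

    import bisect

    def find_chunk(offset, reparse_entries, smaps):
        # reparse_entries: list of (start, size, smap_id) tuples covering the
        # whole file, sorted by starting position in the file.
        starts = [e[0] for e in reparse_entries]
        i = bisect.bisect_right(starts, offset) - 1
        start, size, smap_id = reparse_entries[i]
        # smaps: mapping smap_id -> list of (start, size, chunk_ref) tuples,
        # also sorted by starting position in the file.
        entries = smaps[smap_id]
        starts = [e[0] for e in entries]
        j = bisect.bisect_right(starts, offset) - 1
        chunk_start, chunk_size, chunk_ref = entries[j]
        # The chunk to decompress, and the offset of the wanted byte inside it.
        return chunk_ref, offset - chunk_start

Only the designated chunk then has to be decompressed, from its beginning, to obtain the requested data.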
The chunks, smaps and other technical files used by the background deduplication process are stored in standard files within the 'System Volume Information' directory. New files are created and stored the usual way, until the background process examines them for deduplication. Files which are opened for updating must be extracted and stored the usual way before the updating takes place.
Reading deduplicated files is only possible since ntfs-3g-2016.2.22AR.1 and requires a specific plugin (available on the download page). There is no deduplication background process. Newly created files are not deduplicated. Updating deduplicated files is not possible; they can be deleted and recreated as new files. When the partition is mounted on Windows, the deduplication process examines new files for deduplication and reclaims the space used by chunks of deleted files which are not used by other files.
System Compressed Files
The system compression feature is activated on some Windows 10 system partitions in order to reduce the system footprint. Several compression methods are available; they have a better compression rate than basic compression, but they require more CPU time and they are not compatible with updating on the fly. Windows uses this compression method for system files which are written during a system update and never updated subsequently.
The system compressed files are stored like sparse files with no data and they are referenced in normal directories. Their attributes (size, time stamps, permissions, etc.) are the original ones with a compressed stream added, in association with reparse point information to describe the compression method used.
The original file is split into fixed-size chunks which are compressed independently. A list of the compressed chunks is located at the beginning of the compressed stream, followed by the compressed data, with no space in between. By looking at the table, the compressed chunk for some uncompressed position can be determined and decompressed, so reading some portion of the file is possible without decompressing from the beginning of the file. However, creating a new compressed file is only possible when its size is known, so that the space for the chunks table can be reserved. Updating some part of the file without changing its size implies changing the size of some compressed chunks, and consequently having to relocate the chunks up to the end of the stream and to update the chunks table. In short, appending data to a compressed file or updating its contents requires decompressing the whole file and recompressing it.
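Because the chunks have a fixed uncompressed size, locating the compressed data for a given position only takes an index computation and a table lookup. The following sketch is schematic rather than a description of the real on-disk layout: the chunk_size parameter and the interpretation of the table entries as cumulative end offsets are assumptions made for illustration.

    def locate_chunk(offset, chunk_size, chunk_table):
        # chunk_table[i]: end offset of compressed chunk i inside the compressed
        # stream (cumulative compressed sizes), as read from the leading table.
        index = offset // chunk_size                    # chunk holding the byte
        start = chunk_table[index - 1] if index > 0 else 0
        end = chunk_table[index]
        return index, start, end                        # extent to decompress

    # Reading bytes at 'offset' then means decompressing only that chunk and
    # taking position offset % chunk_size inside the decompressed data.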
Basic Compression Method
The basic NTFS compression is based on the public domain algorithm LZ77 (Ziv and Lempel, 1977). It is faster than most widely used compression methods and does not require decompressing the beginning of the file to read or update a random part of it, but its compression rate is moderate.
The file to compress is split into 4096 byte blocks, and compression is applied on each block independently. In each block, when a sequence of three bytes or more appears twice, the second occurrence is replaced by the position and length of the first one. A block can thus be decompressed, provided its beginning can be located, by locating the references to a previous sequence and replacing the references by the designated bytes.
If such a block compresses to 4094 bytes or less, two bytes mentioning the new size are prepended to the block. If it does not, the block is not compressed and two bytes mentioning a count of 4096 are prepended.
Several compressed blocks representing 16 clusters of uncompressed data are then concatenated. If the total compressed size is 15 clusters or less, the needed clusters are written and marked as used, and the remaining ones are marked as unneeded. If they only contain zeroes, they are all marked as unneeded. If 16 or 17 clusters are needed, no compression is done and the 16 clusters are filled with uncompressed data. The cluster size is defined when formatting the volume (generally 512 bytes for small volumes and 4096 for big volumes).
Only the allocated clusters in a set of 16 or less are identified in the allocation tables, with neighbouring ones being grouped. When seeking to a random byte for reading, the first cluster in the relevant set is directly located. If the set is found to contain 16 allocated clusters, it is not compressed and the requested byte is directly located. If it contains 15 clusters or less, it contains blocks of compressed data, and the first couple of bytes of each block indicates its compressed size, so that the relevant block can be located; that block then has to be decompressed to access the requested byte.
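Locating the block which holds a given byte therefore amounts to walking the chain of two-byte size prefixes from the start of the compressed set. The sketch below mirrors the simplified description above, not the exact on-disk encoding: the little-endian reading of the prefix and the 4096 marker for an uncompressed block are assumptions for illustration.

    BLOCK_SIZE = 4096   # uncompressed size of each block

    def locate_block(data, byte_offset):
        # data: contents of a compressed set of clusters (15 clusters or less)
        # byte_offset: uncompressed position to reach inside that set
        pos = 0
        skipped = 0
        while skipped + BLOCK_SIZE <= byte_offset:
            # Two-byte prefix giving the stored size of the block (4096 for an
            # uncompressed block); byte order assumed little-endian here.
            stored = int.from_bytes(data[pos:pos + 2], "little")
            pos += 2 + stored          # skip this block's stored bytes
            skipped += BLOCK_SIZE      # each block expands to 4096 bytes
        return pos, byte_offset - skipped  # block start, offset inside the block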
When ntfs-3g appends data to a compressed file, the data is first written uncompressed, until 16 clusters are filled, which implies the 16 clusters are allocated to the file. When the set of 16 clusters is full, the data is read back and compressed. Then, if compression is effective, the needed clusters are written again and the unneeded ones are deallocated.
When the file is closed, the last set of clusters is compressed, and if the file is opened again for appending, the set is decompressed for merging the new data.