Deduplication (DeDup), also called data deduplication (DDD), is about identifying and eliminating redundant files that have been processed and stored multiple times, or that differ only slightly. The goal of deduplication is to optimize the capacity of storage media by reducing the stored data volume and saving storage space.
During deduplication, identical files are replaced by a pointer to the original data block, or by small placeholder files that hold the storage address of the original file and thus serve as references to data that has already been recorded. If files differ only slightly, only the differences between them are stored. There are two basic deduplication methods: inline processing and post-processing.
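The pointer mechanism described above can be sketched in a few lines. The following is a minimal, hypothetical in-memory model (the class and method names are illustrative, not from any real product): each file's content is hashed, identical content is stored once, and duplicate files keep only a small reference, the hash, that plays the role of the placeholder.

```python
import hashlib

class DedupStore:
    """Toy file store: identical contents are kept once; duplicate
    files become small references (hashes) to the original block."""

    def __init__(self):
        self.blocks = {}   # content hash -> original data block
        self.files = {}    # filename -> content hash (the "placeholder")

    def put(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blocks:
            self.blocks[digest] = data   # first copy: store the data
        self.files[name] = digest        # duplicates store only a pointer

    def get(self, name):
        # Follow the reference back to the original block
        return self.blocks[self.files[name]]

store = DedupStore()
store.put("a.txt", b"hello world")
store.put("b.txt", b"hello world")   # identical content, stored only once
assert store.get("b.txt") == b"hello world"
assert len(store.blocks) == 1        # one physical copy for two files
```

Here both files resolve to the same stored block; only the tiny hash reference is kept per duplicate, which is where the space saving comes from.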
Inline processing deduplicates data continuously as it is written, for example in virtual tape libraries (VTLs). This can be done by checking smaller files for duplicate content or by computing a hash value for them.
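A minimal sketch of the inline approach, assuming a simple stream of incoming chunks (the function name and data layout are illustrative): each chunk is hashed before it reaches the storage medium, and chunks whose hash has already been seen are never physically written.

```python
import hashlib

def inline_write(chunks, seen=None):
    """Inline dedup sketch: hash each incoming chunk before writing;
    a chunk whose hash was already seen is not stored again."""
    seen = {} if seen is None else seen   # hash -> stored chunk
    stored = []
    for chunk in chunks:
        h = hashlib.sha256(chunk).hexdigest()
        if h not in seen:
            seen[h] = chunk       # the physical write happens only here
            stored.append(chunk)
    return stored, seen

stored, _ = inline_write([b"aa", b"bb", b"aa"])
assert stored == [b"aa", b"bb"]   # the duplicate chunk never hits disk
```

Because the check happens on the write path, no extra space is ever consumed by duplicates, at the cost of hashing work during every write.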
Post-processing differs in that the data is first saved and only then checked for duplicates. This technique initially requires more storage during the save, which is reclaimed after the check.
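The post-processing variant can be sketched the same way, under the simplifying assumption that the already-saved data sits in a dict mapping file names to contents (names and structure are hypothetical): everything is on "disk" first, then a scan collapses duplicates into references and frees the redundant copies.

```python
import hashlib

def post_process(store):
    """Post-process dedup sketch: all data in `store` (name -> bytes)
    is already saved; scan it afterwards and replace duplicates with
    references to the first copy, freeing their space."""
    seen = {}   # content hash -> name of first occurrence
    refs = {}   # duplicate name -> name it now points to
    for name, data in store.items():
        h = hashlib.sha256(data).hexdigest()
        if h in seen:
            refs[name] = seen[h]   # keep only a pointer for this copy
        else:
            seen[h] = name
    for name in refs:
        del store[name]            # reclaim the duplicated space
    return refs

files = {"a": b"x" * 10, "b": b"x" * 10, "c": b"y"}
refs = post_process(files)
assert refs == {"b": "a"}            # "b" now points at "a"
assert set(files) == {"a", "c"}      # duplicate copy was freed
```

This illustrates the trade-off from the text: the full data volume must fit in storage before the scan runs, and the saving only materializes afterwards.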
The savings in storage usage, power, and processing costs achievable with deduplication depend on the size of the files, their number, and the frequency of backups. A significant savings effect occurs when distinguishing between identical, modified, and new files. In more complex applications, deduplication can also be applied to individual data blocks. In addition to classical deduplication, there is also global deduplication, which compares data across multiple copies and eliminates the redundant data.