ZPAQ ARCHIVER

This is a quick walkthrough of a small test I just ran with the ZPAQ archiver. The test was done in a 64-bit Windows environment (Win7 Pro), but ZPAQ is available for a variety of platforms, e.g. Win32 and Linux as well.

For more info, please refer to the ZPAQ homepage.

Disclaimer: I have nothing to do with this software, and I had never heard of it before today. I was curious to find a tool that did what I needed, and stumbled across this one. I wanted to test whether it worked as expected, and since I had prepped this small test doc for myself, I put it here for reference.

Context - What was I looking for anyway?

I was looking for an incremental archival compression tool that behaves sanely in the following use case:

  • A Windows machine dumps a fairly dense backup file from a DMS (Document Management System; in this case Autodesk Vault Freebie Edition) as part of a nightly backup job.
  • Currently the job runs a 7-Zip archive compression on the dump. The compressed size was around 1 GB at the time of deployment.
  • Actual change to the DMS is fairly modest on a day-by-day basis. Over the past 3 months, the compressed backup dump has grown from ~1 GB to ~1.2 GB.
  • This compressed backup file is replicated to an off-site backup location each night, as a way to be fairly sure the data can be recovered in case the on-site server blows up, burns down, etc.
  • Unfortunately, the way it works currently, the full (now 1.2 GB) file is pushed across the public internet to the off-site location every night. The actual "net delta" of any given night is approximately (1.2 GB minus 1 GB) divided by 90, i.e. roughly 2 MB, a trivial size, yet we end up pushing the whole file every night. Yuk.
  • So all this is driven by a desire to reduce the volume of data required for each nightly push to off-site: I still want a fresh nightly backup, but without pushing the whole mess of data each and every night.
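
The back-of-the-envelope arithmetic in the list above can be worked through as a short Python sketch. The sizes (1 GB at deployment, 1.2 GB about 90 days later) come from the text; decimal units (1 GB = 1000 MB) and the function name are assumptions for illustration. Note this is growth in compressed size, so it is only a rough proxy for the real nightly change.

```python
# Rough estimate of nightly growth versus what is actually pushed each night.
# Figures are from the text above; decimal units (1 GB = 1000 MB) are assumed.
START_GB = 1.0      # compressed dump size at deployment
CURRENT_GB = 1.2    # compressed dump size about 3 months later
DAYS = 90           # approximate length of that period

def nightly_delta_mb(start_gb, current_gb, days):
    """Average growth per night, in MB."""
    return (current_gb - start_gb) * 1000 / days

delta_mb = nightly_delta_mb(START_GB, CURRENT_GB, DAYS)
pushed_mb = CURRENT_GB * 1000

print(f"average nightly growth: {delta_mb:.1f} MB")
print(f"pushed each night now:  {pushed_mb:.0f} MB")
print(f"ratio pushed / needed:  {pushed_mb / delta_mb:.0f}x")
```

In other words, the nightly push is several hundred times larger than the average nightly growth.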

The Plan

  • What if there were a sensible archiver tool I could use, instead of vanilla 7-Zip compression, which lets me generate a series of incremental archive files that together form a full, latest backup? Adding new data to the archive would benefit from 'dedup' of identical blocks of data, and would not (significantly) taint the older files by changing their date/time stamp or size, which would force all that old data to be replicated off-site each and every night.
  • It looks like ZPAQ meets these requirements nicely.
  • An example walkthrough is below, using a trivial test case (TESTDIR): a few text files are added, then a binary (a ~14 MB installer), and then a copy of the same binary under a new name.
  • My likely end-point use case for the DMS will be something approximately like this:
  1. The schedule assumes one full cycle per week: basically one full plus 6 incremental backups.
  2. Sunday - delete all old backup data from the past week.
  3. Sunday, a bit later - run a fresh full backup, first using the Autodesk backup script, then using ZPAQ to make a sensible 'incremental-friendly' compression of the result.
  4. Sunday, a bit later still - allow the replicate-to-offsite task to happen. This normally takes ~2 hours based on ISP / internet access speed at the site.
  5. Monday - run a fresh pass of the Autodesk backup script, then let ZPAQ do an incremental backup of it. This generates a new increment file which mostly represents the data deltas of the last ~24 hours, plus a small delta to the index file, which will need to be re-pushed in addition to the new segment file.
  6. Tuesday through Saturday - rinse and repeat, same as Monday.
  7. Sunday, welcome back to the top of the cycle. Repeat as per step 2 in this list.
  • The net result is one full backup being pushed each week (Sunday) and much smaller incrementals being pushed on all other nights of the week. So we don't get away from pushing 'all the data' entirely, but it happens far less often (weekly instead of nightly).
  • Obviously, if I wanted, we could tweak the schedule to be more along the lines of:
  1. Day 1 of the month - run the cleanup, then the backup and FULL task
  2. All other days of the month - run the backup and incremental backup
  • However, by taking that approach we end up with a progressively longer 'chain' of incrementals (up to ~30 days long). Strictly speaking this isn't so bad, and maybe ZPAQ will deal with it as elegantly as I require. So in fact maybe I will go with this approach :) instead of the 1-week cycle.
  • Broadly speaking, I would assume there is some 'sane limit' beyond which you probably don't want your increments to stretch back into the past (1 year's worth of nightly increments? 3 years? Longer?). I guess it depends on your use case, how big the data footprint is, what your ISP speed limits are, how much you trust ZPAQ to be easy to use when restoring from a really long chain of increments, etc.
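
The weekly and monthly cycles described above could be driven by a small wrapper script. The following is a minimal dry-run sketch in Python, not a tested deployment: the directories `C:\backup\zpaq` and `C:\backup\dump` and all function names are hypothetical, and it assumes the same `zpaq64.exe add part??? <dir>` command from the walkthrough below serves for both the full and the incremental runs (with no old part files present, the first run produces the full archive). It only prints what it would do.

```python
import datetime
import glob
import os

ARCHIVE_PATTERN = "part???"   # multi-part archive pattern, as in the walkthrough

def is_full_backup_day(day, cycle="weekly"):
    """True on the day the chain restarts: Sunday (weekly) or day 1 (monthly)."""
    if cycle == "weekly":
        return day.weekday() == 6   # Monday is 0, Sunday is 6
    if cycle == "monthly":
        return day.day == 1
    raise ValueError(f"unknown cycle: {cycle}")

def old_part_files(archive_dir):
    """List existing part*.zpaq files that a full-backup day would delete."""
    return sorted(glob.glob(os.path.join(archive_dir, "part*.zpaq")))

def plan_for(day, archive_dir, dump_dir, cycle="weekly"):
    """Return (files_to_delete, zpaq_command) for the given day."""
    to_delete = old_part_files(archive_dir) if is_full_backup_day(day, cycle) else []
    # Same invocation for full and incremental runs (an assumption, see above).
    command = ["zpaq64.exe", "add",
               os.path.join(archive_dir, ARCHIVE_PATTERN), dump_dir]
    return to_delete, command

if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    to_delete, command = plan_for(datetime.date.today(),
                                  r"C:\backup\zpaq", r"C:\backup\dump")
    print("delete first:", to_delete)
    print("then run:   ", " ".join(command))
```

Switching between the two schedules is then just a matter of passing `cycle="monthly"` instead of the default.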

Anyhow. Lots of fun.

Kudos to the author of ZPAQ for making such a lovely tool, and for making it open source for all to benefit from.

- Tim Chipman Jan.2016


TEST / EXAMPLE WALKTHROUGH


STARTING POINT: WE HAVE ZPAQ BINARY READY TO USE. WE HAVE A TESTDIR WITH A SMALL TEXT FILE IN IT.



Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Windows\system32>cd \

C:\>dir
 Volume in drive C has no label.
 Volume Serial Number is 2E9C-DB05

 Directory of C:\

12/31/2015  11:47 AM    <DIR>          Brother
04/28/2014  03:13 PM    <DIR>          Intel
12/31/2015  11:17 AM    <DIR>          maintenance
07/13/2009  11:20 PM    <DIR>          PerfLogs
01/08/2016  01:09 PM    <DIR>          Program Files
01/08/2016  01:14 PM    <DIR>          Program Files (x86)
04/28/2014  03:06 PM               211 setup.log
12/31/2015  11:50 AM    <DIR>          temp
01/16/2016  10:31 AM    <DIR>          testdir
12/22/2015  04:40 AM    <DIR>          Users
01/13/2016  06:21 PM    <DIR>          Windows
01/16/2016  10:29 AM           657,920 zpaq64.exe
               2 File(s)        658,131 bytes
              10 Dir(s)  259,510,542,336 bytes free

C:\>zpaq64.exe add part??? testdir
zpaq v7.05 journaling archiver, compiled Apr 17 2015
Creating part001.zpaq dated 2016-01-16 14:32:56 assuming 0 prior bytes
Adding 0.000014 MB in 1 files -method 14 -threads 4 at 2016-01-16 14:32:56.
+ testdir/test document.txt 14
[1..1] 26 -method 14,14,0
+ testdir/
2 +added, 0 -removed.

0.000000 + (0.000014 -> 0.000014 -> 0.001137) = 0.001137 MB
0.047 seconds (all OK)


NOW TAKE A LOOK AT CHANGES:

C:\>dir
 Volume in drive C has no label.
 Volume Serial Number is 2E9C-DB05

 Directory of C:\

12/31/2015  11:47 AM    <DIR>          Brother
04/28/2014  03:13 PM    <DIR>          Intel
12/31/2015  11:17 AM    <DIR>          maintenance
01/16/2016  10:32 AM               705 part000.zpaq
01/16/2016  10:32 AM             1,137 part001.zpaq
07/13/2009  11:20 PM    <DIR>          PerfLogs
01/08/2016  01:09 PM    <DIR>          Program Files
01/08/2016  01:14 PM    <DIR>          Program Files (x86)
04/28/2014  03:06 PM               211 setup.log
12/31/2015  11:50 AM    <DIR>          temp
01/16/2016  10:31 AM    <DIR>          testdir
12/22/2015  04:40 AM    <DIR>          Users
01/13/2016  06:21 PM    <DIR>          Windows
01/16/2016  10:29 AM           657,920 zpaq64.exe
               4 File(s)        659,973 bytes
              10 Dir(s)  259,510,534,144 bytes free


NOTE: ZPAQ CREATED AN INDEX FILE (part000) AND ALSO A DATA FILE (part001)

NOW, BEHIND THE SCENES, DROP ANOTHER TEXT FILE INTO TESTDIR. NOTHING BIG YET. RE-RUN THE SAME ZPAQ COMMAND, WHICH NOW ACTS AS AN INCREMENTAL UPDATE:

		  
C:\>zpaq64.exe add part??? testdir
zpaq v7.05 journaling archiver, compiled Apr 17 2015
part???.zpaq: 1 versions, 2 files, 1 fragments, 0.001137 MB
part000.zpaq: 1 versions, 2 files, 1 fragments, 0.000705 MB
Adding 0.002332 MB in 1 files -method 14 -threads 4 at 2016-01-16 14:33:31.
+ testdir/another test.txt 2332
[2..2] 2344 -method 14,169,1
# testdir/
2 +added, 0 -removed.

0.001137 + (0.002332 -> 0.002332 -> 0.002976) = 0.004113 MB
0.093 seconds (all OK)


TAKE A LOOK AT THE OUTPUT ON DISK: PART002 NOW EXISTS. PART000 HAS GROWN SLIGHTLY (705 TO 1,409 BYTES); PART001 IS UNCHANGED.

C:\>dir
 Volume in drive C has no label.
 Volume Serial Number is 2E9C-DB05

 Directory of C:\

12/31/2015  11:47 AM    <DIR>          Brother
04/28/2014  03:13 PM    <DIR>          Intel
12/31/2015  11:17 AM    <DIR>          maintenance
01/16/2016  10:33 AM             1,409 part000.zpaq
01/16/2016  10:32 AM             1,137 part001.zpaq
01/16/2016  10:33 AM             2,976 part002.zpaq
07/13/2009  11:20 PM    <DIR>          PerfLogs
01/08/2016  01:09 PM    <DIR>          Program Files
01/08/2016  01:14 PM    <DIR>          Program Files (x86)
04/28/2014  03:06 PM               211 setup.log
12/31/2015  11:50 AM    <DIR>          temp
01/16/2016  10:33 AM    <DIR>          testdir
12/22/2015  04:40 AM    <DIR>          Users
01/13/2016  06:21 PM    <DIR>          Windows
01/16/2016  10:29 AM           657,920 zpaq64.exe
               5 File(s)        663,653 bytes
              10 Dir(s)  259,510,525,952 bytes free
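
The point of the listing above is that only new or changed part files would need to be pushed off-site. As a minimal sketch of that check (the function names are hypothetical, and a real replication tool would normally do this comparison itself), a script could compare a snapshot of file sizes and timestamps taken before and after the ZPAQ run:

```python
import os

def snapshot(archive_dir):
    """Map each part*.zpaq file name to its (size, mtime) pair."""
    result = {}
    for name in os.listdir(archive_dir):
        if name.startswith("part") and name.endswith(".zpaq"):
            info = os.stat(os.path.join(archive_dir, name))
            result[name] = (info.st_size, info.st_mtime)
    return result

def files_to_push(before, after):
    """Names that are new, or whose size or mtime changed, since `before`."""
    return sorted(name for name, meta in after.items()
                  if before.get(name) != meta)

# Sizes and times copied from the two directory listings above
# (the time is kept as a plain label here instead of a real mtime).
before = {"part000.zpaq": (705, "10:32"), "part001.zpaq": (1137, "10:32")}
after = {"part000.zpaq": (1409, "10:33"), "part001.zpaq": (1137, "10:32"),
         "part002.zpaq": (2976, "10:33")}

print(files_to_push(before, after))   # ['part000.zpaq', 'part002.zpaq']
```

For this step, that is 1,409 + 2,976 bytes to transfer, while part001 stays put.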


BEHIND THE SCENES, ADD ANOTHER FILE INTO TESTDIR. IT IS A COPY OF THE FILE ADDED IN THE LAST PASS: IDENTICAL CONTENT, JUST A SLIGHTLY DIFFERENT NAME (" - Copy" APPENDED TO THE NAME)

C:\>zpaq64.exe add part??? testdir
zpaq v7.05 journaling archiver, compiled Apr 17 2015
part???.zpaq: 2 versions, 3 files, 2 fragments, 0.004113 MB
part000.zpaq: 2 versions, 3 files, 2 fragments, 0.001409 MB
Adding 0.002332 MB in 1 files -method 14 -threads 4 at 2016-01-16 14:33:42.
+ testdir/another test - Copy.txt 2332 -> 0
# testdir/
2 +added, 0 -removed.

0.004113 + (0.002332 -> 0.000000 -> 0.000586) = 0.004699 MB
0.047 seconds (all OK)


LOOK AT WHERE WE END UP: part000 gets an index addition; part001 and part002 are unchanged; part003 is created and is quite tiny (586 bytes), since it holds very little new data.

C:\>dir
 Volume in drive C has no label.
 Volume Serial Number is 2E9C-DB05

 Directory of C:\

12/31/2015  11:47 AM    <DIR>          Brother
04/28/2014  03:13 PM    <DIR>          Intel
12/31/2015  11:17 AM    <DIR>          maintenance
01/16/2016  10:33 AM             1,995 part000.zpaq
01/16/2016  10:32 AM             1,137 part001.zpaq
01/16/2016  10:33 AM             2,976 part002.zpaq
01/16/2016  10:33 AM               586 part003.zpaq
07/13/2009  11:20 PM    <DIR>          PerfLogs
01/08/2016  01:09 PM    <DIR>          Program Files
01/08/2016  01:14 PM    <DIR>          Program Files (x86)
04/28/2014  03:06 PM               211 setup.log
12/31/2015  11:50 AM    <DIR>          temp
01/16/2016  10:33 AM    <DIR>          testdir
12/22/2015  04:40 AM    <DIR>          Users
01/13/2016  06:21 PM    <DIR>          Windows
01/16/2016  10:29 AM           657,920 zpaq64.exe
               6 File(s)        664,825 bytes
              10 Dir(s)  259,510,517,760 bytes free
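
The "2332 -> 0" in the ZPAQ output above is the dedup at work: the copied file contributes no new data, only index and metadata entries. The Python sketch below illustrates the general idea with a hash-keyed chunk store. It is a deliberate simplification and an assumption on my part, not ZPAQ's actual algorithm: it uses fixed-size chunks and a plain dictionary, and all names are hypothetical.

```python
import hashlib
import os

CHUNK_SIZE = 64 * 1024  # illustrative fixed chunk size, not ZPAQ's real fragmenting

def add_to_store(data, store):
    """Add `data` to `store` (hash -> chunk); return the bytes newly stored."""
    new_bytes = 0
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha1(chunk).hexdigest()
        if digest not in store:      # only unseen content costs space
            store[digest] = chunk
            new_bytes += len(chunk)
    return new_bytes

store = {}
original = os.urandom(200_000)       # stand-in for a file's contents
copy_of_original = bytes(original)   # same content, as with the " - Copy" file

print(add_to_store(original, store))          # first add stores all the data
print(add_to_store(copy_of_original, store))  # identical copy stores nothing new
```

The file name never enters the hash, which is why renaming or copying a file costs almost nothing in the archive.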


NOW, BEHIND THE SCENES, DROP A BINARY OF NON-TRIVIAL SIZE INTO TESTDIR: the installer EXE for Microsoft Security Essentials, approx 14 MB.

C:\>zpaq64.exe add part??? testdir
zpaq v7.05 journaling archiver, compiled Apr 17 2015
part???.zpaq: 3 versions, 4 files, 2 fragments, 0.004699 MB
part000.zpaq: 3 versions, 4 files, 2 fragments, 0.001995 MB
Adding 14.243034 MB in 2 files -method 14 -threads 4 at 2016-01-16 14:34:47.
100.00% 0:00:00 + testdir/mseinstall.exe 14243008
100.00% 0:00:00 + testdir/mseinstall.exe:Zone.Identifier:$DATA 26
100.00% 0:00:00 [3..199] 14243830 -method 14,4,0
# testdir/
3 +added, 0 -removed.

0.004699 + (14.243034 -> 14.243034 -> 14.251026) = 14.255725 MB
1.466 seconds (all OK)


SEE WHAT WE HAVE GOT: part000 (the index) is updated, a modest increase that is much smaller than the data delta; part001, part002 and part003 are unchanged; part004 is created and holds the majority of the data just added, about 14 MB of dense data.

C:\>dir
 Volume in drive C has no label.
 Volume Serial Number is 2E9C-DB05

 Directory of C:\

12/31/2015  11:47 AM    <DIR>          Brother
04/28/2014  03:13 PM    <DIR>          Intel
12/31/2015  11:17 AM    <DIR>          maintenance
01/16/2016  10:34 AM             8,213 part000.zpaq
01/16/2016  10:32 AM             1,137 part001.zpaq
01/16/2016  10:33 AM             2,976 part002.zpaq
01/16/2016  10:33 AM               586 part003.zpaq
01/16/2016  10:34 AM        14,251,026 part004.zpaq
07/13/2009  11:20 PM    <DIR>          PerfLogs
01/08/2016  01:09 PM    <DIR>          Program Files
01/08/2016  01:14 PM    <DIR>          Program Files (x86)
04/28/2014  03:06 PM               211 setup.log
12/31/2015  11:50 AM    <DIR>          temp
01/16/2016  10:34 AM    <DIR>          testdir
12/22/2015  04:40 AM    <DIR>          Users
01/13/2016  06:21 PM    <DIR>          Windows
01/16/2016  10:29 AM           657,920 zpaq64.exe
               7 File(s)     14,922,069 bytes
              10 Dir(s)  259,481,960,448 bytes free


FINALLY: MAKE A COPY OF THE 14 MB MSE INSTALLER, AND RE-RUN ZPAQ:


C:\>zpaq64.exe add part??? testdir
zpaq v7.05 journaling archiver, compiled Apr 17 2015
part???.zpaq: 4 versions, 6 files, 199 fragments, 14.255725 MB
part000.zpaq: 4 versions, 6 files, 199 fragments, 0.008213 MB
Adding 14.243034 MB in 2 files -method 14 -threads 4 at 2016-01-16 14:34:57.
100.00% 0:00:00 + testdir/mseinstall - Copy.exe 14243008 -> 0
100.00% 0:00:00 + testdir/mseinstall - Copy.exe:Zone.Identifier:$DATA 26 -> 0
# testdir/
3 +added, 0 -removed.

14.255725 + (14.243034 -> 0.000000 -> 0.001394) = 14.257119 MB
0.202 seconds (all OK)


SEE WHAT WE HAVE THIS TIME:

- another modest increment to the index file, part000
- part001, part002, part003 and part004 are unchanged
- part005 is created and is VERY SMALL (compared to part004), since the binary data is identical (just the filename is different)


C:\>dir
 Volume in drive C has no label.
 Volume Serial Number is 2E9C-DB05

 Directory of C:\

12/31/2015  11:47 AM    <DIR>          Brother
04/28/2014  03:13 PM    <DIR>          Intel
12/31/2015  11:17 AM    <DIR>          maintenance
01/16/2016  10:34 AM             9,607 part000.zpaq
01/16/2016  10:32 AM             1,137 part001.zpaq
01/16/2016  10:33 AM             2,976 part002.zpaq
01/16/2016  10:33 AM               586 part003.zpaq
01/16/2016  10:34 AM        14,251,026 part004.zpaq
01/16/2016  10:34 AM             1,394 part005.zpaq
07/13/2009  11:20 PM    <DIR>          PerfLogs
01/08/2016  01:09 PM    <DIR>          Program Files
01/08/2016  01:14 PM    <DIR>          Program Files (x86)
04/28/2014  03:06 PM               211 setup.log
12/31/2015  11:50 AM    <DIR>          temp
01/16/2016  10:34 AM    <DIR>          testdir
12/22/2015  04:40 AM    <DIR>          Users
01/13/2016  06:21 PM    <DIR>          Windows
01/16/2016  10:29 AM           657,920 zpaq64.exe
               8 File(s)     14,924,857 bytes
              10 Dir(s)  259,467,665,408 bytes free

C:\>
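
As a sanity check, the part file sizes in the final listing can be added up and compared with the total ZPAQ reported in its last run (14.257119 MB). The short Python sketch below does that, using only numbers copied from the transcript above; the helper name is hypothetical.

```python
# Part file sizes in bytes, copied from the final directory listing above.
part_sizes = {
    "part000.zpaq": 9607,        # index file
    "part001.zpaq": 1137,
    "part002.zpaq": 2976,
    "part003.zpaq": 586,
    "part004.zpaq": 14251026,
    "part005.zpaq": 1394,
}

def data_total(sizes):
    """Sum of all parts except the part000 index."""
    return sum(size for name, size in sizes.items() if name != "part000.zpaq")

total = data_total(part_sizes)
print(total)                             # 14257119
print(f"{total / 1_000_000:.6f} MB")     # 14.257119 MB, matching zpaq's total

# What the last run would actually require pushing: the index plus part005.
last_push = part_sizes["part000.zpaq"] + part_sizes["part005.zpaq"]
print(last_push)                         # 11001
```

So adding a second 14 MB installer meant about 11 KB to replicate, instead of another 14 MB.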