Testing.Jan-2016-zpaq-archiver-test-notes History


January 16, 2016, at 03:14 PM by 142.167.11.206 -

STARTING POINT: WE HAVE THE ZPAQ BINARY READY TO USE.

WE HAVE A TESTDIR WITH A SMALL TEXT FILE IN IT.


[@
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Windows\system32>cd C:\

C:\>dir
 Volume in drive C has no label.
 Volume Serial Number is 2E9C-DB05

 Directory of C:\

12/31/2015  11:47 AM    <DIR>          Brother
04/28/2014  03:13 PM    <DIR>          Intel
12/31/2015  11:17 AM    <DIR>          maintenance
07/13/2009  11:20 PM    <DIR>          PerfLogs
01/08/2016  01:09 PM    <DIR>          Program Files
01/08/2016  01:14 PM    <DIR>          Program Files (x86)
04/28/2014  03:06 PM               211 setup.log
12/31/2015  11:50 AM    <DIR>          temp
01/16/2016  10:31 AM    <DIR>          testdir
12/22/2015  04:40 AM    <DIR>          Users
01/13/2016  06:21 PM    <DIR>          Windows
01/16/2016  10:29 AM           657,920 zpaq64.exe
               2 File(s)        658,131 bytes
              10 Dir(s)  259,510,542,336 bytes free

C:\>zpaq64.exe add part??? testdir
zpaq v7.05 journaling archiver, compiled Apr 17 2015
Creating part001.zpaq dated 2016-01-16 14:32:56
assuming 0 prior bytes
Adding 0.000014 MB in 1 files -method 14 -threads 4 at 2016-01-16 14:32:56.
+ testdir/test document.txt 14
[1..1] 26 -method 14,14,0
+ testdir/
2 +added, 0 -removed.
0.000000 + (0.000014 -> 0.000014 -> 0.001137) = 0.001137 MB
0.047 seconds (all OK)
@]
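A quick note on the part??? argument: it names a multi-part archive, and zpaq substitutes a part number for the ??? digits (part000 being the index, as the listings below show). Here is a minimal sketch of that name expansion - my own illustration of the naming pattern only, not zpaq's actual code:

```python
# My own illustration of the multi-part naming pattern, not zpaq source:
# "part???" is expanded per part number, zero-padded to the '?' width.
# part000 holds the index; part001, part002, ... hold the data versions.

def expand(pattern: str, n: int) -> str:
    """Replace the run of '?' in pattern with n, zero-padded to its width."""
    width = pattern.count("?")
    return pattern.replace("?" * width, str(n).zfill(width))

print(expand("part???.zpaq", 0))   # part000.zpaq  (index)
print(expand("part???.zpaq", 1))   # part001.zpaq  (first data part)
```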

NOW TAKE A LOOK AT CHANGES:

[@
C:\>dir
 Volume in drive C has no label.
 Volume Serial Number is 2E9C-DB05

 Directory of C:\

12/31/2015  11:47 AM    <DIR>          Brother
04/28/2014  03:13 PM    <DIR>          Intel
12/31/2015  11:17 AM    <DIR>          maintenance
01/16/2016  10:32 AM               705 part000.zpaq
01/16/2016  10:32 AM             1,137 part001.zpaq
07/13/2009  11:20 PM    <DIR>          PerfLogs
01/08/2016  01:09 PM    <DIR>          Program Files
01/08/2016  01:14 PM    <DIR>          Program Files (x86)
04/28/2014  03:06 PM               211 setup.log
12/31/2015  11:50 AM    <DIR>          temp
01/16/2016  10:31 AM    <DIR>          testdir
12/22/2015  04:40 AM    <DIR>          Users
01/13/2016  06:21 PM    <DIR>          Windows
01/16/2016  10:29 AM           657,920 zpaq64.exe
               4 File(s)        659,973 bytes
              10 Dir(s)  259,510,534,144 bytes free
@]

NOTE: ZPAQ CREATED AN INDEX FILE (part000) AND ALSO A DATA FILE (part001).

NOW, BEHIND THE SCENES, DROP ANOTHER TEXT FILE INTO TESTDIR (NOTHING BIG YET) AND RE-RUN ZPAQ IN INCREMENTAL MODE:

[@
C:\>zpaq64.exe add part??? testdir
zpaq v7.05 journaling archiver, compiled Apr 17 2015
part???.zpaq: 1 versions, 2 files, 1 fragments, 0.001137 MB
part000.zpaq: 1 versions, 2 files, 1 fragments, 0.000705 MB
Adding 0.002332 MB in 1 files -method 14 -threads 4 at 2016-01-16 14:33:31.
+ testdir/another test.txt 2332
[2..2] 2344 -method 14,169,1
+ testdir/
2 +added, 0 -removed.
0.001137 + (0.002332 -> 0.002332 -> 0.002976) = 0.004113 MB
0.093 seconds (all OK)
@]

TAKE A LOOK AT THE OUTPUT ON DISK: NOTE THAT PART002 NOW EXISTS, THERE IS A SMALL DELTA IN SIZE TO PART000, AND NO DELTA TO PART001.

[@
C:\>dir
 Volume in drive C has no label.
 Volume Serial Number is 2E9C-DB05

 Directory of C:\

12/31/2015  11:47 AM    <DIR>          Brother
04/28/2014  03:13 PM    <DIR>          Intel
12/31/2015  11:17 AM    <DIR>          maintenance
01/16/2016  10:33 AM             1,409 part000.zpaq
01/16/2016  10:32 AM             1,137 part001.zpaq
01/16/2016  10:33 AM             2,976 part002.zpaq
07/13/2009  11:20 PM    <DIR>          PerfLogs
01/08/2016  01:09 PM    <DIR>          Program Files
01/08/2016  01:14 PM    <DIR>          Program Files (x86)
04/28/2014  03:06 PM               211 setup.log
12/31/2015  11:50 AM    <DIR>          temp
01/16/2016  10:33 AM    <DIR>          testdir
12/22/2015  04:40 AM    <DIR>          Users
01/13/2016  06:21 PM    <DIR>          Windows
01/16/2016  10:29 AM           657,920 zpaq64.exe
               5 File(s)        663,653 bytes
              10 Dir(s)  259,510,525,952 bytes free
@]

BEHIND THE SCENES, ADD ANOTHER FILE INTO TESTDIR. IT IS A COPY OF THE FILE ADDED IN THE LAST PASS: IDENTICAL CONTENT, JUST A SLIGHTLY DIFFERENT NAME (" - Copy" APPENDED TO THE NAME).

[@
C:\>zpaq64.exe add part??? testdir
zpaq v7.05 journaling archiver, compiled Apr 17 2015
part???.zpaq: 2 versions, 3 files, 2 fragments, 0.004113 MB
part000.zpaq: 2 versions, 3 files, 2 fragments, 0.001409 MB
Adding 0.002332 MB in 1 files -method 14 -threads 4 at 2016-01-16 14:33:42.
+ testdir/another test - Copy.txt 2332 -> 0
+ testdir/
2 +added, 0 -removed.
0.004113 + (0.002332 -> 0.000000 -> 0.000586) = 0.004699 MB
0.047 seconds (all OK)
@]

LOOK AT WHERE WE END UP: part000 gets an index addition; part001 AND part002 are unchanged; part003 is created and is quite tiny - very little new data.

[@
C:\>dir
 Volume in drive C has no label.
 Volume Serial Number is 2E9C-DB05

 Directory of C:\

12/31/2015  11:47 AM    <DIR>          Brother
04/28/2014  03:13 PM    <DIR>          Intel
12/31/2015  11:17 AM    <DIR>          maintenance
01/16/2016  10:33 AM             1,995 part000.zpaq
01/16/2016  10:32 AM             1,137 part001.zpaq
01/16/2016  10:33 AM             2,976 part002.zpaq
01/16/2016  10:33 AM               586 part003.zpaq
07/13/2009  11:20 PM    <DIR>          PerfLogs
01/08/2016  01:09 PM    <DIR>          Program Files
01/08/2016  01:14 PM    <DIR>          Program Files (x86)
04/28/2014  03:06 PM               211 setup.log
12/31/2015  11:50 AM    <DIR>          temp
01/16/2016  10:33 AM    <DIR>          testdir
12/22/2015  04:40 AM    <DIR>          Users
01/13/2016  06:21 PM    <DIR>          Windows
01/16/2016  10:29 AM           657,920 zpaq64.exe
               6 File(s)        664,825 bytes
              10 Dir(s)  259,510,517,760 bytes free
@]
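The behaviour just demonstrated - an identical file under a new name adds almost no data - is classic content-addressed dedup. Here is a tiny toy sketch of the idea, my own illustration rather than zpaq's actual on-disk format: data blocks are kept once, keyed by their hash, and file names are just index entries pointing at content hashes.

```python
import hashlib

# Toy content-addressed store (illustration only, not the zpaq format):
# blocks play the role of the data parts, index the role of part000.
class DedupStore:
    def __init__(self):
        self.blocks = {}   # sha256 hex -> bytes   (like the data parts)
        self.index = {}    # filename -> sha256 hex (like the index part)

    def add(self, name: str, data: bytes) -> int:
        """Record a file; return how many NEW data bytes were stored."""
        digest = hashlib.sha256(data).hexdigest()
        new_bytes = 0 if digest in self.blocks else len(data)
        self.blocks[digest] = data
        self.index[name] = digest
        return new_bytes

store = DedupStore()
print(store.add("another test.txt", b"x" * 2332))          # 2332
print(store.add("another test - Copy.txt", b"x" * 2332))   # 0
```

The second add stores nothing new: only a second index entry is written, which matches the tiny part003 seen above.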

NOW BEHIND THE SCENES, DROP A BINARY OF NON-TRIVIAL SIZE INTO THE TESTDIR: the installer EXE for MS Security Essentials, approx 14 megs.

[@
C:\>zpaq64.exe add part??? testdir
zpaq v7.05 journaling archiver, compiled Apr 17 2015
part???.zpaq: 3 versions, 4 files, 2 fragments, 0.004699 MB
part000.zpaq: 3 versions, 4 files, 2 fragments, 0.001995 MB
Adding 14.243034 MB in 2 files -method 14 -threads 4 at 2016-01-16 14:34:47.
100.00% 0:00:00 + testdir/mseinstall.exe 14243008
100.00% 0:00:00 + testdir/mseinstall.exe:Zone.Identifier:$DATA 26
100.00% 0:00:00 [3..199] 14243830 -method 14,4,0
+ testdir/
3 +added, 0 -removed.
0.004699 + (14.243034 -> 14.243034 -> 14.251026) = 14.255725 MB
1.466 seconds (all OK)
@]

SEE WHAT WE HAVE GOT:

- an update to the part000 index - a modest increase, but much less than the data delta
- part001, part002 and part003 are unchanged
- part004 is created, and holds the majority of the data just added - ~14 megs of dense data

[@
C:\>dir
 Volume in drive C has no label.
 Volume Serial Number is 2E9C-DB05

 Directory of C:\

12/31/2015  11:47 AM    <DIR>          Brother
04/28/2014  03:13 PM    <DIR>          Intel
12/31/2015  11:17 AM    <DIR>          maintenance
01/16/2016  10:34 AM             8,213 part000.zpaq
01/16/2016  10:32 AM             1,137 part001.zpaq
01/16/2016  10:33 AM             2,976 part002.zpaq
01/16/2016  10:33 AM               586 part003.zpaq
01/16/2016  10:34 AM        14,251,026 part004.zpaq
07/13/2009  11:20 PM    <DIR>          PerfLogs
01/08/2016  01:09 PM    <DIR>          Program Files
01/08/2016  01:14 PM    <DIR>          Program Files (x86)
04/28/2014  03:06 PM               211 setup.log
12/31/2015  11:50 AM    <DIR>          temp
01/16/2016  10:34 AM    <DIR>          testdir
12/22/2015  04:40 AM    <DIR>          Users
01/13/2016  06:21 PM    <DIR>          Windows
01/16/2016  10:29 AM           657,920 zpaq64.exe
               7 File(s)     14,922,069 bytes
              10 Dir(s)  259,481,960,448 bytes free
@]
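The [3..199] range in the zpaq output above refers to fragment numbers: journaling archivers like zpaq split input into content-defined fragments and store each unique fragment once. The sketch below is my own toy chunker - arbitrary parameters and a far weaker boundary test than real tools use - showing why boundaries chosen from local content let most fragments be reused even after an edit in the middle of a file:

```python
import hashlib
import random

def chunks(data: bytes, mask: int = 0x3FF, min_len: int = 64) -> list:
    """Split data where a crude rolling value hits a boundary pattern.
    The mod-1024 test effectively only 'sees' the last 10 bytes, so
    boundary positions depend on local content and re-align after an
    edit. Real archivers use much stronger rolling hashes; this is an
    illustration only."""
    out, start, acc = [], 0, 0
    for i, byte in enumerate(data):
        acc = ((acc << 1) + byte) & 0xFFFFFFFF
        if (acc & mask) == mask and i - start >= min_len:
            out.append(data[start:i + 1])
            start = i + 1
    out.append(data[start:])
    return out

def fragment_ids(data: bytes) -> set:
    """Set of fragment digests: duplicate fragments collapse to one entry."""
    return {hashlib.sha1(c).hexdigest() for c in chunks(data)}

random.seed(2016)                                  # repeatable demo data
orig = bytes(random.randrange(256) for _ in range(50_000))
edited = orig[:10_000] + b"SMALL EDIT" + orig[10_000:]

shared = fragment_ids(orig) & fragment_ids(edited)
print(len(fragment_ids(orig)), len(shared))   # fragments away from the edit are reused
```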

FINALLY: MAKE A COPY OF THE 14 MEG MSE INSTALLER, AND RE-RUN ZPAQ:

[@
C:\>zpaq64.exe add part??? testdir
zpaq v7.05 journaling archiver, compiled Apr 17 2015
part???.zpaq: 4 versions, 6 files, 199 fragments, 14.255725 MB
part000.zpaq: 4 versions, 6 files, 199 fragments, 0.008213 MB
Adding 14.243034 MB in 2 files -method 14 -threads 4 at 2016-01-16 14:34:57.
100.00% 0:00:00 + testdir/mseinstall - Copy.exe 14243008 -> 0
100.00% 0:00:00 + testdir/mseinstall - Copy.exe:Zone.Identifier:$DATA 26 -> 0
+ testdir/
3 +added, 0 -removed.
14.255725 + (14.243034 -> 0.000000 -> 0.001394) = 14.257119 MB
0.202 seconds (all OK)
@]

SEE WHAT WE HAVE THIS TIME:

- another modest increment to the index file part000
- parts 001, 002, 003 and 004 are unchanged
- part005 is created and is VERY SMALL (compared to part004), since the binary data is virtually identical (just the filename is different)

[@
C:\>dir
 Volume in drive C has no label.
 Volume Serial Number is 2E9C-DB05

 Directory of C:\

12/31/2015  11:47 AM    <DIR>          Brother
04/28/2014  03:13 PM    <DIR>          Intel
12/31/2015  11:17 AM    <DIR>          maintenance
01/16/2016  10:34 AM             9,607 part000.zpaq
01/16/2016  10:32 AM             1,137 part001.zpaq
01/16/2016  10:33 AM             2,976 part002.zpaq
01/16/2016  10:33 AM               586 part003.zpaq
01/16/2016  10:34 AM        14,251,026 part004.zpaq
01/16/2016  10:34 AM             1,394 part005.zpaq
07/13/2009  11:20 PM    <DIR>          PerfLogs
01/08/2016  01:09 PM    <DIR>          Program Files
01/08/2016  01:14 PM    <DIR>          Program Files (x86)
04/28/2014  03:06 PM               211 setup.log
12/31/2015  11:50 AM    <DIR>          temp
01/16/2016  10:34 AM    <DIR>          testdir
12/22/2015  04:40 AM    <DIR>          Users
01/13/2016  06:21 PM    <DIR>          Windows
01/16/2016  10:29 AM           657,920 zpaq64.exe
               8 File(s)     14,924,857 bytes
              10 Dir(s)  259,467,665,408 bytes free

C:\>
@]
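To tie this back to the offsite-replication motivation: a nightly sync job only needs to re-send files whose size or timestamp changed. A small sketch of picking out what would be pushed after that last run - my own illustration, with sizes and times copied from the two dir listings above:

```python
# Decide which archive parts a nightly sync job would re-send, given
# before/after snapshots as {name: (size, time)}. This mirrors the
# last two dir listings in the walkthrough above.

def to_push(old: dict, new: dict) -> list:
    """Names that are new, or whose (size, time) metadata changed."""
    return sorted(name for name, meta in new.items() if old.get(name) != meta)

before = {
    "part000.zpaq": (8213, "10:34"),
    "part001.zpaq": (1137, "10:32"),
    "part002.zpaq": (2976, "10:33"),
    "part003.zpaq": (586, "10:33"),
    "part004.zpaq": (14251026, "10:34"),
}
after = dict(before)
after["part000.zpaq"] = (9607, "10:34")   # index grew by ~1.4 KB
after["part005.zpaq"] = (1394, "10:34")   # tiny new data part

print(to_push(before, after))             # ['part000.zpaq', 'part005.zpaq']
```

So only ~11 KB would cross the wire, instead of the full ~14 MB archive set.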

January 16, 2016, at 03:02 PM by 142.167.11.206 -

The Plan

  • What if there were a sensible archiver tool I could use, instead of vanilla 7Zip compression, which lets me generate a series of incremental archive files that all together form a full latest backup? Adding new data to the archive would benefit from 'dedup' of identical blocks of data, and wouldn't taint the older files (significantly) by changing their time/date stamps or sizes - and thus wouldn't force all that old data to be replicated offsite each and every night that the replicate-to-offsite task runs.
  • So, it looks like ZPAQ meets the requirements nicely.
  • The example is walked through below, using a trivial test case (TESTDIR): a few text files are added, then a binary (a ~14 meg installer), and then a copy of the same binary under a new name.
  • And my likely end-point use case for the DMS system will be approximately as follows:
  1. Schedule will assume one full cycle per week, one full plus 6 incremental backups basically.
  2. Sunday - delete all old backup data from the past week.
  3. Sunday bit later - run a fresh full backup - first using the AutoDesk backup script, and then use ZPAQ to make a sensible 'incremental friendly' compression of this thing.
  4. Sunday bit later still - allow the replicate-to-offsite task to happen. This normally takes ~2 hours based on ISP / Internet access speed at the site.
  5. Monday - run a fresh pass of the AutoDesk backup script. And then let ZPAQ do an incremental backup of this; generating a new increment file which ~mostly represents the deltas of data in the last ~24 hours, plus a small delta to the index file, which will need to be re-pushed in addition to the new segment file.
  6. Tuesday through Saturday - rinse and repeat, same gig as per Monday.
  7. Sunday, welcome back to the top of the cycle. Repeat as per step 2 in this list.
  • The net result of this is one full backup being pushed each week (Sunday) and then much smaller incrementals being pushed all other nights of the week. So we don't get away from pushing 'all the data' entirely; but it happens far less often (weekly, instead of nightly)
  • Obviously, if I wanted, I could tweak the schedule to be more along the lines of:
  1. Day1 of the month - run the cleanup; then the backup and FULL task
  2. All other days of month - run the backup and incremental backup
  • However, by taking that approach, we end up with a progressively longer (up to ~30 days) 'chain' of incrementals. Strictly speaking this isn't so bad, and maybe ZPAQ will deal with it as elegantly as I require. So in fact maybe I will go with this approach :) instead of the 1-week cycle.
  • Broadly speaking, I would assume there is some 'sane limit' beyond which you probably don't want your increments to stretch forever back into the past (ie, 1 year worth of nightly increments? 3 years? Longer?). I guess it depends on your use case, how big the data footprint is, what your ISP speed limits are, how much you trust ZPAQ to be easy to use for restores from a really-long-chain of increments, etc.
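The weekly cycle above can be sketched as a trivial per-day dispatcher. This is an outline of the plan only - the step strings are placeholders for the real scheduled jobs, not actual commands:

```python
# Sketch of the weekly rotation described above: Sunday = cleanup plus a
# fresh full set; all other days = incremental. The step descriptions
# are placeholders for the real backup-script and zpaq jobs.

def plan_for(day: str) -> list:
    if day == "Sunday":
        return ["delete last week's backup set",
                "run the DMS backup script (fresh full dump)",
                "zpaq: create a new archive set (part000 index + part001)",
                "replicate all parts offsite (~full push)"]
    return ["run the DMS backup script",
            "zpaq: incremental add (new partNNN + updated part000 index)",
            "replicate only the new part and the index offsite"]

for day in ["Sunday", "Monday", "Tuesday"]:
    print(day, "->", plan_for(day)[-1])
```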

Anyhow. Lots of fun.

Kudos to the author of ZPAQ for making such a lovely tool / and making it open source for all to benefit from.

- Tim Chipman Jan.2016


TEST / EXAMPLE WALKTHROUGH




January 16, 2016, at 02:50 PM by 142.167.11.206 -

ZPAQ ARCHIVER

This is just a quick walk-through of a small test I just did with the ZPAQ archiver. The test was done in a Windows 64-bit (Win7 Pro) environment, but this app (ZPAQ) is available for a variety of platforms, ie, Win32 and Linux as well.

For more info, please refer to the ZPAQ homepage.

Disclaimer: I have nothing to do with this software, and I had never heard of it before today. I was just curious to find a tool that did what I needed, and stumbled across this. I wanted to test whether it worked as expected, and since I had prepped this small test doc for myself, I put it here for reference.

Context - What was I looking for anyway ?

I was looking for an incremental archival compression tool that behaved in a sane way for this use case scenario:

  • Windows machine dumps a fairly dense backup file from a DMS (DMS = Document Management System; Autodesk Vault Freebie Edition in this case) as part of a nightly backup job
  • Currently I've got the thing doing a 7Zip archive compression on the dump, and the compressed size was originally around 1 gig at time of deployment.
  • Actual change to the DMS is fairly modest on a day-by-day basis. Over the past 3 months, the compressed backup dump has grown in size from ~1 gig to ~1.2 gigs.
  • This compressed backup file is replicated to an off-site backup location each night, as a way to be fairly sure that it is possible to recover the data in case the on-site server blows up / burns down / etc.
  • Unfortunately, the way it works currently is that every night a full (now) 1.2 gig file is pushed across the public internet to the offsite location - even though the actual "net delta" on any given night is approximately (1.2 gigs minus 1 gig) divided by 90, ie, a modest / trivial delta each night. We end up pushing the whole file every night. Yuk.
  • So all this is driven by a desire to reduce the volume of data required for each night's push to offsite, so that I can still have a fresh nightly backup happen, but we don't have to push the whole mess of data each and every night.
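For what it's worth, the nightly-delta arithmetic in the bullet above works out like this (figures as quoted in the bullets):

```python
# Back-of-envelope check of the nightly-delta claim above.
compressed_now_mb = 1200     # ~1.2 gigs pushed offsite nightly (today)
compressed_start_mb = 1000   # ~1 gig at time of deployment
nights = 90                  # ~3 months of nightly backups

delta_per_night_mb = (compressed_now_mb - compressed_start_mb) / nights
print(round(delta_per_night_mb, 1))   # 2.2 -> ~2.2 MB of real change vs 1200 MB pushed
```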