CORRBITS

corrbits calculates bitwise correlation metrics between a pair of files. The two files can be identical, in which case self-correlation metrics are calculated. The calculated metrics are used to statistically analyse the information properties of binary data.

References:
1. Viznyuk S. 2008 Use of self-correlation metrics for evaluation of information properties of binary strings.
2. Cover, Thomas M., Thomas, Joy A. 2006 Elements of information theory. Second Edition. John Wiley & Sons, Inc. ISBN-13 978-0-471-24195-9.
3. Shannon, C.E. 1948 A Mathematical Theory of Communication. The Bell System Technical Journal 27, 379-423, 623-656.

Usage: corrbits [-d] [-b int64_buffer_size] [-t top_counts]
                [-s start_bit] [-e end_bit] [-i interval_bits]
                [-n threads] <-p|g|r> <fileA> [fileB]

corrbits performs the following computations:

  1. Metric CR(n) is calculated as $CR(n) = \sum_{i=1}^{M} (A_i \oplus B_{i+n})$, where $A_i$ and $B_i$ are the $i$-th bits of fileA and fileB, and $\oplus$ is the logical XOR operator. Bit indices wrap around: $A_i = A_{i-A}$ for $i > A$, where $A$ is the size of fileA in bits, and $B_{i+n} = B_{i+n-B}$ for $i+n > B$, where $B$ is the size of fileB in bits. $M$ is the least common multiple of $A$ and $B$ in bits. The range of possible values for $CR(n)$ is 0 to $M$.
  2. Metric MF is calculated as $MF = \sum_{n=0}^{G-1} MF(n)$, where $MF(n) = M - 2 \cdot CR(n)$, and $G$ is the greatest common divisor of $A$ and $B$ in bits.
  3. Metric DF is calculated as $DF = \left( \sum_{n=0}^{G-1} MF(n) \cdot MF(n) / M^2 \right) - 1$
  4. Adjusted MF is calculated as $Adj.MF = (1 - MF / M^2) \cdot MF$
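
The four formulas above can be checked on small inputs with a direct, brute-force sketch. This is not the corrbits source: the function name corr_metrics, the representation of a file as a Python list of 0/1 integers, and the 0-based indexing are assumptions made for illustration, and the trailing "−1" in the DF formula is read here as a subtraction. The loop performs M•G XOR operations, so it is only practical for short bit strings.

```python
from math import gcd


def corr_metrics(a_bits, b_bits=None):
    """Brute-force CR(n), MF(n), MF, DF and Adj.MF for two bit lists.

    a_bits, b_bits: lists of 0/1 integers (hypothetical in-memory
    stand-ins for fileA and fileB). If b_bits is omitted, a_bits is
    correlated with itself.
    """
    if b_bits is None:
        b_bits = a_bits
    A, B = len(a_bits), len(b_bits)
    G = gcd(A, B)          # greatest common divisor of the sizes, in bits
    M = A * B // G         # least common multiple of the sizes, in bits

    mf_n = []
    for n in range(G):
        # CR(n): count positions where the wrapped bits differ.
        cr = sum(a_bits[i % A] ^ b_bits[(i + n) % B] for i in range(M))
        mf_n.append(M - 2 * cr)

    mf = sum(mf_n)
    # Assumption: the final "-1" in the DF formula is a subtraction.
    df = sum(v * v for v in mf_n) / M**2 - 1
    adj_mf = (1 - mf / M**2) * mf
    return {"G": G, "M": M, "MF(n)": mf_n, "MF": mf, "DF": df, "Adj.MF": adj_mf}


if __name__ == "__main__":
    print(corr_metrics([1, 0, 1, 1]))
```

For the self-correlation of the 4-bit string 1011, only the n=0 term is non-zero, so MF(n) is 4, 0, 0, 0.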

Explanation of command-line options. Exactly one of -p, -g or -r must be specified:
fileA  (mandatory)  input fileA
fileB  (optional)  input fileB; if omitted, corrbits takes fileB=fileA, which results in calculation of self-correlation metrics on fileA
-p  prints n , MF(n) for n=0...G−1
-g  prints G , MF ; this is the aggregate of the -p option output
-r  prints G , M , total number of bitwise XOR operations performed , MF , Adj.MF , DF
-t top_counts  (optional)  used with the -p or -g option; prints the first top_counts pairs MF(n)max*N , MF(n)min*N , where MF(n)max and MF(n)min are the current running max and min values of MF(n) of the given rank, from rank 1 through top_counts, and N is the running total of times the given max or min value has been encountered. If [MF(n)max] or [MF(n)min] is enclosed in square brackets, the current output line is the one which provided the given max or min.
-s start_bit  (optional)  start the summation in formulas (2-3) for MF and DF at n=start_bit ; default start_bit=0
-e end_bit  (optional)  end the summation in formulas (2-3) for MF and DF at n=end_bit ; default end_bit=G−1
-i interval_bits  (optional)  summation interval in formulas (2-3) for MF and DF ; default interval_bits=1
-n threads  (optional)  allows corrbits to run in multi-threaded mode on SMP machines. Normally the execution time of corrbits grows as M•G. Using the threads parameter allows corrbits to execute ≈threads times faster, as long as at least threads CPU cores are available on the machine. This option requires a license.
-d  (optional)  debug flag; outputs some debugging info
-b int64_buffer_size  (optional)  by default corrbits may use up to nb=256Mb of memory to buffer the input files. Use this parameter to increase the buffer memory available to corrbits, up to nbmax=4Gb
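
The effect of the -s, -e and -i options on the MF summation can be illustrated with a short sketch. It is a guess at the semantics rather than the corrbits implementation: it assumes end_bit is inclusive (consistent with the default end_bit=G−1) and that interval_bits is the stride between consecutive values of n. The helper names mf_at and windowed_mf, and the list-of-0/1 bit representation, are hypothetical.

```python
from math import gcd


def mf_at(a_bits, b_bits, n):
    """MF(n) = M - 2*CR(n) for two lists of 0/1 integers."""
    A, B = len(a_bits), len(b_bits)
    M = A * B // gcd(A, B)  # least common multiple of the sizes
    cr = sum(a_bits[i % A] ^ b_bits[(i + n) % B] for i in range(M))
    return M - 2 * cr


def windowed_mf(a_bits, b_bits, start_bit=0, end_bit=None, interval_bits=1):
    """Sum MF(n) for n = start_bit, start_bit + interval_bits, ... <= end_bit.

    Assumption: end_bit is inclusive and defaults to G - 1.
    """
    G = gcd(len(a_bits), len(b_bits))
    if end_bit is None:
        end_bit = G - 1
    offsets = range(start_bit, end_bit + 1, interval_bits)
    return sum(mf_at(a_bits, b_bits, n) for n in offsets)


if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0, 0, 1, 0]
    print(windowed_mf(bits, bits))                   # all n = 0..G-1
    print(windowed_mf(bits, bits, start_bit=1))      # skip the n = 0 term
    print(windowed_mf(bits, bits, interval_bits=2))  # every second offset
```

Starting the summation at start_bit=1 drops the n=0 term, which in the self-correlation case always equals M.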

corrbits executables available for download:
Linux: corrbits_linux.tgz
Solaris 9 64-bit: corrbits_solaris9.tgz
MS Windows cygwin: corrbits_cygwin.tgz

Please email questions or comments to phystech@hotmail.com