perl-obfus - obfuscate (make more difficult to understand) perl source code



NAME

perl-obfus - obfuscate (make more difficult to understand) Perl source code


SYNOPSIS

perl-obfus [ -v|--version ] [ --noparsing ] [ --output-line-len N ] [ --jam 0|1 ] [ --end-handling keep|skip|mangle ] [ --pod-handling keep|skip ] [ --old-spacing-mode ] [ --keep-spaces ] [ --keep-newlines ] [ --bannerhead filename ] [ --bannertail filename ] [ --SN name_of_SN_sub ] [ --SNS name_of_SNS_sub ] [ --excludeidentsfile|-x filename ].. [ --excludeidentsfile-anycase filename ].. [ -X filename ].. [ --suffixes-asis-list filename ].. [ -F user-defined-mapping-filename ].. [ -I include-dirs ].. [ -m module ].. [ -M module ].. [ -o destination-filename ] [ -P backend-perl-path ] [ -d map-filename ] [ --embed-map ] [ -e encode-count ] [ -i idents-mangling-params ] [ -n number-mangling-params ] [ -s string-mangling-params ] [ -c charcode-mangling-params ] [ -T time-asserter-params ] [ -H hostname-asserter-params ] [ -G generic-asserter-params ] [ -O profile-name ] file-to-obfuscate


DESCRIPTION

This program turns Perl source code files into functionally equivalent Perl source code that is much more difficult to study, analyze and modify, thus helping you protect your code against intellectual property theft. It is not a compiler, so the code it outputs will run on every platform the original code was able to run on. It works on the parsed form of the program, which makes it MUCH more reliable than alternatives that don't; it supports all Perl features, including advanced ones such as nested regular expressions, expressions in the substitution part of the s/// operator, and Perl formats. It works with multi-module programs and with programs that depend on many third-party modules that are not themselves subject to obfuscation. By default it also encodes the obfuscated version of the file and makes it self-decoding at runtime, so no standalone decoder is required and the file is not readable as source code.

Perl-Obfus also lets you enforce the licensing conditions of the code at runtime through any combination of lifetime expiration control, advanced hostname checking and generic user-defined checks. If the licensing conditions are not met, it is possible to delete the obfuscated file automatically, print a user-defined message and terminate execution, or execute user-implemented code. All checks of licensing conditions are additionally encoded to make them very difficult to analyze. The block of code that checks the licensing conditions can't be removed from the obfuscated program, since the program is made dependent on the initialization performed by that block.

Perl-Obfus also has an auxiliary no-parsing mode in which it doesn't try to obfuscate the code at all; the code is only encoded. This mode is useful only for quick and imperfect source code hiding. It is not the default mode; it is activated by passing the --noparsing command-line switch.

This program obfuscates only one Perl source file at a time. By default it writes the obfuscated file to stdout, but it's strongly recommended to use the -o option to write the obfuscated version to the file specified (since a lot of additional operations are required when simply redirecting stdout to a file of your choice). Note that the same file can never be used as both input and output.

All comments except the one on the first line are omitted from the obfuscated file; there is no option to preserve them. POD documentation can be preserved in or omitted from the obfuscated file with the --pod-handling option. The text after the __DATA__ and __END__ markers can be stripped away, left as is or mangled, as chosen with the --end-handling option (sometimes people put test suites for their modules after __END__). It's possible to add comments with author and copyright information to the top and to the end of the obfuscated version of the file using the options --bannerhead and --bannertail respectively. Of course these comments and the POD documentation will appear in clear text in the obfuscated file, regardless of whether encoding was applied to it.

The obfuscation typically means

  • replacing every symbol name that can be replaced with a meaningless one, e.g. replacing the variable @files with @zcadaa4fc81, while preserving the syntactic and semantic correctness of the source code. Of course predefined symbols like @ARGV, and symbols from the third-party or standard Perl modules the source code uses, are left unchanged, so the obfuscated code still works without those third-party or standard Perl modules having to be obfuscated

  • substituting numeric values with arithmetic expressions built from decimal and hexadecimal numbers that evaluate to the same value (the expression for a given numeric value is either random or constant, as requested by the options)

  • using hexadecimal character codes for all characters in strings

  • replacing strings with interpolated values with the concatenation of the appropriate components

  • adding extra parentheses to expressions

  • removing extra white space

  • jamming as much code on each line (of length not more than specified using --output-line-len option) as possible if --jam=0 is not specified
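The transformations listed above can be illustrated with a small hand-written sketch. It is NOT actual Perl-Obfus output: the variable names and the particular arithmetic expression are made up for illustration, and identifier renaming is left out to keep the comparison readable. The point is only that each rewritten form evaluates to the same value as the original.

```perl
use strict;
use warnings;

# Hand-written illustration only; not produced by Perl-Obfus.
my $name = 'world';

# Original forms.
my $plain_count = 42;
my $plain_text  = "Hello, $name!";

# A number written as an arithmetic expression that mixes decimal and
# hexadecimal values: (28 + 7) * 2 - 28 == 42.
my $mangled_count = ((0x1c + 7) * 2 - 0x1c);

# A string written with hexadecimal character codes, with the interpolated
# variable turned into a concatenation and extra parentheses added.
my $mangled_text = (("\x48\x65\x6c\x6c\x6f\x2c\x20") . ($name) . ("\x21"));

print "numbers match\n" if $mangled_count == $plain_count;
print "strings match\n" if $mangled_text eq $plain_text;
```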

Add to that the fact that the obfuscated code will also be encoded thus making the source code completely unreadable.

The non-encoded obfuscated code is extremely difficult for a human to understand, since the names of variables, subroutines and other symbols are totally meaningless and hard to remember (e.g. @files becomes @zcadaa4fc81). Most aspects of obfuscation can be controlled with the command-line switches of Perl-Obfus.

If the file being obfuscated is a script (i.e. not a module), no modification to the original source file is needed for obfuscation to succeed. If the file being obfuscated is a module that exports some symbols using the standard Exporter package, and these symbols are used by other files that you also wish to obfuscate, then you have to make a minor modification to the file. Otherwise, after obfuscation, the @EXPORT variable would still contain the names of the non-obfuscated symbols, while the symbols themselves would have been renamed.

To overcome this, perl-obfus supports two special functions named SN and SNS (both names can be changed with --SN and --SNS). The first accepts a scalar as its argument, the second a list. For SN, the special support is enabled if its argument is a constant string in single quotes. For SNS, the special support is enabled if its argument is a constant list produced by a single qw() operator (with parentheses as the delimiters, exactly). The special support consists of treating the arguments as symbol names and mangling them the same way all other symbols are mangled. I.e. SN('$a') becomes SN('$MANGLED_a') and SNS(qw($a %b)) becomes SNS(qw($MANGLED_a %MANGLED_b)). The names of the functions treated as SN and SNS are never obfuscated, so you don't need to include them in an exceptions list.

Passing arguments to these two subroutines in any other way does not enable the special treatment, so use only the supported forms: SN('$' . "a"), SN("\$a"), SN(q($a)), SNS('$a','%b') and even SNS(qw[$a %b]) will be left exactly as they were before obfuscation (and thus some symbols won't be exported from the module being obfuscated). SN and SNS should also be used if your code generates strings that are then eval'ed, e.g. instead of eval('$abc = '. "$value;") you should write eval(SN('$abc') . " = $value;").

If you also need to run your code non-obfuscated, cut and paste the following definitions of the subroutines SN and SNS:

     sub SN { '';$_[0]; }
     sub SNS { '';@_; }

Note that sometimes you will have to put these definitions inside a BEGIN{} block in order for the subroutines to be visible at the point where they are used.
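As a rough sketch of how this might look in a module, the following defines the pass-through SN and SNS inside a BEGIN block, uses SNS for the export list and SN for an eval'ed string. The package name My::Config and the variables $debug and %limits are hypothetical, and the choice of @EXPORT_OK is only for illustration. Run non-obfuscated, the two subroutines simply return their arguments.

```perl
package My::Config;    # hypothetical module name
use strict;
use warnings;
use Exporter 'import';

BEGIN {
    # The definitions from the text, verbatim; the leading '' statement
    # would otherwise trigger a "void context" warning.
    no warnings 'void';
    sub SN  { ''; $_[0]; }
    sub SNS { ''; @_; }
}

our ($debug, %limits);

# Supported form: a single qw() with parentheses as delimiters, so the
# obfuscator can rewrite the names inside it.
our @EXPORT_OK = SNS(qw($debug %limits));

# Supported form: a constant single-quoted string passed to SN.
my $value = 3;
eval(SN('$debug') . " = $value;");
die $@ if $@;

print "exports: @EXPORT_OK\n";
print "debug:   $debug\n";
```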

The script opens a pipe to another (backend) perl process that does part of the processing. Note that a fairly recent version of perl is required for the backend, 5.7.2 or above, so in some cases you'll have to install it in parallel to the version of perl you are using. You may then need to pass the location, and possibly the invocation options, of the perl interpreter used as the backend with the -P switch, e.g. -P '/usr/local/bin/perl'.

You don't need to install all modules used by the code you are obfuscating for the version of perl used for backend.

If the code being obfuscated expects modules in non-standard locations or needs them preloaded, and this is normally arranged through perl's usual -I, -m and -M switches, then you have to pass the same set of switches to perl-obfus (they are passed on to the backend perl so that it can analyze the source code properly).

As was said above, symbols from third-party and standard modules won't be mangled. However, the user needs to gather a list of such symbol names (called exceptions from this point) using the dedicated utility gen-ident-exceptions.pl, and to pass the names of the files with exceptions using the --excludeidentsfile or --excludeidentsfile-anycase options. For convenience, there is also a -X switch that can be passed multiple times to specify the names of files listing exceptions that should be ignored.

It's possible to request Perl-Obfus to save the mapping between obfuscated symbol names and original symbol names in the external file by passing the filename after -d switch.

Encoding is controlled with the -e switch; to turn it off completely, add -e 0 to the perl-obfus command line.


Support for ensuring licensing conditions

Perl-Obfus has advanced support for ensuring licensing conditions (not available in Lite or Trial editions). It's possible to ensure licensing conditions by any combination of the following criteria:

lifetime expiration
hostname matching
arbitrary check implemented by user

Each type of check is implemented by subengines called asserters from here on; the specific asserters are called time asserters, hostname asserters and generic asserters respectively. There are several subtypes of asserters of each type, each with different behaviour; only one subtype of asserter of a given type can be enabled at a time (i.e. no more than one time asserter, no more than one hostname asserter, etc).

By default no asserters of any type are enabled.

If any asserter is enabled, the special block of code is prepended to the obfuscated version of original code; if it was requested to additionally encode the obfuscated code then the resultant code (special block and obfuscated version of original code) will be encoded as a whole.

In any case, that special block of code will actually be an encoded version of code that includes the implementation of all checks, the actions to be performed if the licensing conditions are not met, AND special initialization code without which the obfuscated version of the original code will not work correctly (the special initialization code is in fact an initialization of variables used in parts of expressions inside the obfuscated version of the original code). This means that it's impossible to remove the special block containing the licensing checks without making the rest of the code malfunction, even if the user chose to just obfuscate the original source (without applying encoding).

Asserters are configured from a command line in a similar way to token mangling parameters. Use -T option to configure time asserters, -H to configure hostname asserters, -G to configure generic asserters. See description of the individual asserters of each type for more information about their options.


OPTIONS

It's possible to store the default commandline options in the globally-visible file $instroot/lib/perl-obfus/perl-obfus-settings.pl (where $instroot is a directory in which the Perl-Obfus package was installed to). See comments in that file for more information.

Note that there is an interactive web-based command-line builder for Perl-Obfus available at http://www.stunnix.com/support/interactive/cmd-builder/.

-v|--version
Output version information and exit.

--output-line-len N
Set the maximum line length for the obfuscated file. However, if a string constant is longer than this limit, it won't be split or otherwise wrapped, resulting in a line longer than the limit specified. The default value for parameter N is 80.

--jam 0|1
Control whether to omit extra white space. When the argument is 1, all extra white space (including carriage returns) is omitted, which makes the obfuscated file look even less readable. When the argument is 0, the obfuscated (but not encoded) file looks like a prettyprinted version of the original file with respect to spaces and newlines. By default jamming is enabled (value is 1). It's highly recommended not to turn it off, since there are some bugs in the perl interpreter that may make the resulting obfuscated files syntactically incorrect.

--end-handling keep|skip|mangle
Select the way text after __DATA__ or __END__ is handled. mangle means the text after __DATA__ or __END__ is treated as Perl source code that should be obfuscated; keep leaves the text as is, and skip strips it away. Note that __END__ will become __DATA__ in the obfuscated file (this is harmless). The default value is keep.

--pod-handling keep|skip
Select the way POD (plain old documentation) inside source code is handled. Default value is keep.

--old-spacing-mode
Select the spacing algorithm (which decides where space characters are needed between lexemes) that was the only one available in versions of Perl-Obfus prior to 1.6. This option is present for backward compatibility.

--keep-spaces
Request not to skip extra spaces in the lines (whitespaces and tabs) if jamming is enabled (which is the default). By default extra spaces in the line are ignored if jamming is enabled.

Note that extra spaces in the lines (whitespaces and tabs) won't correspond to the ones in the original file, but to certain prettyprinted version of it.

--keep-newlines
Request not to skip newlines if jamming is enabled (which is the default). By default newlines are ignored if jamming is enabled.

Note that newlines won't correspond to the ones in the original file, but to certain prettyprinted version of it.

--bannerhead filename
--bannertail filename
Specify the file whose content will be prepended or appended to the obfuscated file. This is most useful for adding comments with copyright and license information. Such comments will be visible as clear text in any file that was obfuscated and/or encoded.

Note: use --bannertail only for files that don't have __END__ or __DATA__ sections, since otherwise these sections will be corrupted (since banner will be appended after the __END__ or __DATA__ sections).

--SN name_of_SN_sub
--SNS name_of_SNS_sub
Specify the names of the subs that will be treated as SN and SNS. See above for the description on what these subs are for. Default values are of course SN and SNS.

-I include-dirs
-m module
-M module
The options can be specified more than once and along with their values are passed down to the perl interpreter used as a backend, so read more about their meaning in the Perl documentation (perlrun). In most cases there is no need to use these options if the code being obfuscated runs without passing -I/-m/-M options. By default no additional -I/-m/-M options are passed to perl interpreter used for backend.

-P backend-perl-path
Specify the perl interpreter used for the backend and, if necessary, additional options to pass to that perl interpreter. Note: the perl interpreter must be version 5.7.2 or above! If you don't have such a recent perl installed, you'll have to install it in parallel into some non-standard directory. The default value for this option is perl (i.e. perl will be searched for in the default search path). If your base perl interpreter is older than 5.7.2, you may wish to store the default argument for this option for use by all users of your host; see the very beginning of the section OPTIONS for more information.

-o destination-filename
Output the obfuscated version to the named file. If such a file exists, it will be deleted before writing to it. On unix, the permissions of the output file will obey the current umask, and the executable bit will be set for everybody if the input file was executable. It's highly recommended to use this option instead of the redirection available in your shell, or you'll run into trouble if the file being obfuscated references packages defined in itself. There is no default value for this option, i.e. the obfuscated file is written to stdout by default.

--excludeidentsfile filename
-x filename
This option can be specified more than once. It specifies the names of files that contain a list of symbol names that won't be mangled, one symbol per line. Such symbols are called exceptions from this point. Comments are allowed in such files by placing a hash sign (#) as the first character of the line. The file name specified is first searched for in the current directory (if it's not an absolute path), and then in the subdirectory lib/perl-obfus/exceptions/ of the directory where Perl-Obfus was installed. Most of the exceptions are generated using the gen-ident-exceptions.pl script. In very few cases users will have to extend the set of exceptions manually using hand-written files; see the description of the syntax of such files in the gen-ident-exceptions.pl manual. There is no need to add Perl special variables like @ARGV and builtin subroutines like open, since they are already hardcoded in perl-obfus.

It's possible to remove symbols from lists of exceptions by passing names of files with these symbol names using -X switch.

The filename can be the name of a directory; in this case all files located in this directory and any of its subdirectories (at any depth) are loaded as if their names had been specified individually, one by one.

--excludeidentsfile-anycase filename
This option is very similar to --excludeidentsfile, except that symbols read from the specified file are treated as case-insensitive exceptions. This functionality is useful for listing methods and properties of ActiveX objects, that are case-insensitive.

The filename can be the name of a directory; in this case all files located in this directory and any of its subdirectories (at any depth) are loaded as if their names had been specified individually, one by one.

-X filename
This option can be specified more than once. It specifies the names of files that contain a list of symbol names that should be mangled even if those symbol names appear in the files passed with the -x switch (i.e. it disables some exceptions). Files specified with the -x switch are processed first, and then files specified with the -X switch are processed.

Comments are allowed in such files by placing a hash sign (#) as the first character of the line. The file name specified is first searched in the current directory (if it's not absolute path), and then in the subdirectory lib/perl-obfus/exceptions/ of the directory where Perl-Obfus was installed to.

This option is mostly useful in case the set of exceptions created from builtin list and content of files passed with -x switch is too broad.

The filename can be the name of a directory; in this case all files located in this directory and any of its subdirectories (at any depth) are loaded as if their names had been specified individually, one by one.
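The interplay of -x and -X can be pictured with a short sketch. This is an illustration of the file format and processing order described above, not the tool's own code: the helper read_symbol_list, the sample symbols and the treatment of blank lines and trailing white space are assumptions, and the file contents are given as in-memory strings to keep the example self-contained.

```perl
use strict;
use warnings;

# Read one list: one symbol per line; a line whose first character
# is '#' is a comment.
sub read_symbol_list {
    my ($text) = @_;
    my @symbols;
    for my $line (split /\n/, $text) {
        next if $line =~ /^#/;    # comment line
        $line =~ s/\s+$//;        # assumption: trailing white space is ignored
        next if $line eq '';      # assumption: blank lines are ignored
        push @symbols, $line;
    }
    return @symbols;
}

# Hypothetical contents of a file passed with -x and one passed with -X.
my $x_file = "# symbols of a third-party module\nnew\nconnect\ndisconnect\n";
my $X_file = "# mangle this one after all\nconnect\n";

# Files given with -x are processed first, then files given with -X.
my %exceptions = map { $_ => 1 } read_symbol_list($x_file);
delete @exceptions{ read_symbol_list($X_file) };

print join(' ', sort keys %exceptions), "\n";
```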

--suffixes-asis-list filename
This option can be specified more than once. It specifies the names of files that contain a list of suffixes that should be preserved in symbols being mangled. E.g. if the suffix onclick is listed in some file mentioned by this option, then the symbol myButton_onclick will be mangled to something like z2b9a0ec6d_onclick, rather than to something like za40f93e635d.

Comments are allowed in such files by placing a hash sign (#) as the first character of the line. The file name specified is first searched in the current directory (if it's not absolute path), and then in the subdirectory lib/perl-obfus/exceptions/ of the directory where Perl-Obfus was installed to.

This option is mostly useful for protecting code for environments that scan the name of a symbol for some suffix in order to treat the symbol specially.

The filename can be the name of a directory; in this case all files located in this directory and any of its subdirectories (at any depth) are loaded as if their names had been specified individually, one by one.

-F user-defined-mapping-filename
This option can be specified more than once. It specifies the names of files that contain a user-specified mapping of symbols.

Comments are allowed in such files by placing a hash sign (#) as the first character of the line. Each line in such a file consists of the name of the original symbol, one or more space characters, and the required resultant symbol.

In case some mangling engine decides to assign a symbol that is listed as a resultant symbol, special attempts will be made to guarantee that the symbol chosen by the obfuscation engine won't conflict with it (by adding prefixes until uniqueness is reached).

-d map-filename
This option specifies the name of file to write the mapping between obfuscated symbol names and non-obfuscated symbol names to. Such mapping may be useful for analyzing Perl error messages that contain obfuscated symbol names - just find the line with the symbol Perl interpreter complained about, and the second word on that line will be the original symbol name. Please keep in mind that it will be much easier for a person having access to such mapfile to study the code, so it's highly suggested to keep such map file in secure place and not to distribute it to the customers.

If the file specified with this option exists, the accumulated mapping information will be merged with mapping information previously stored in the file - this allows one to have map file for entire project.

By default no filename is specified, and thus mapping information is not saved anywhere.
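The lookup described above (find the line with the obfuscated symbol, take the second word) could be automated roughly as follows. This is only a sketch: the text states that the second word on the line is the original name, while treating the first word as the obfuscated name is an assumption, and the sample map lines and the helper original_name_for are made up for illustration.

```perl
use strict;
use warnings;

# Return the original name for an obfuscated symbol, or undef if the
# symbol is not listed. Assumes "obfuscated original" word order.
sub original_name_for {
    my ($map_text, $obfuscated) = @_;
    for my $line (split /\n/, $map_text) {
        my @words = split ' ', $line;
        next unless @words >= 2;
        return $words[1] if $words[0] eq $obfuscated;
    }
    return undef;
}

# Hypothetical map file content.
my $map_text = "zcadaa4fc81 files\nz2b9a0ec6d0 counter\n";

my $symbol   = 'zcadaa4fc81';
my $original = original_name_for($map_text, $symbol);
print "$symbol was originally ",
      defined $original ? $original : '(not in map file)', "\n";
```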

--embed-map
This option instructs Perl-Obfus to append the mapping between obfuscated symbol names and original symbol names, inside a comment, to the result of processing, as a pair of strings per line: the obfuscated symbol name and the original symbol name. For symbols that are exceptions (i.e. those whose obfuscated name is the same as the original), no lines are emitted. Lines are emitted only for symbols found in the subject file.

By default this mapping information is not embedded at all.

-e encode-count
This option controls the number of encoding iterations to be applied to obfuscated file. To disable encoding completely one should specify 0 as number of encoding iterations. It's not recommended to apply more than 30 encoding iterations. Each encoding iteration increases output file size. The relation between non-encoded obfuscated file size and encoded obfuscated file size is approximately the following: E=I*2+N*450 where E is encoded size, I is non-encoded obfuscated size, and N is the number of iterations applied. The default value for this option is 10.
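The approximate size relation above is easy to turn into a quick estimate. The sketch below is only a worked instance of the formula E=I*2+N*450; the function name is hypothetical, the sizes are presumably in bytes (the text does not name the unit), and the special case for 0 iterations reflects that encoding is then disabled, so the formula does not apply.

```perl
use strict;
use warnings;

# Approximate size of the encoded file, per the formula E = I*2 + N*450.
sub estimate_encoded_size {
    my ($obfuscated_size, $iterations) = @_;
    # With 0 iterations encoding is disabled, so the size is unchanged.
    return $obfuscated_size if $iterations == 0;
    return $obfuscated_size * 2 + $iterations * 450;
}

# A 10000-unit obfuscated file with the default 10 iterations,
# and with the recommended maximum of 30.
print estimate_encoded_size(10_000, 10), "\n";    # 20000 + 4500  = 24500
print estimate_encoded_size(10_000, 30), "\n";    # 20000 + 13500 = 33500
```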

--noparsing
This is an additional mode of operation in which Perl-Obfus only encodes the file. In this mode no parsing of the code is performed, no obfuscation of any type is applied (e.g. variables are not renamed, numbers are not turned into expressions, and so on) and comments are not removed, but the original code becomes unreadable. Code encoded in this mode is guaranteed to work the same way it worked before encoding, without the need for any modifications to it.

If this mode is activated, only the following options are in effect: those related to encoding (i.e. -e), and --bannerhead, --bannertail, --pod-handling and -o.

-i idents-mangling-params
-n number-mangling-params
-s string-mangling-params
-c charcode-mangling-params
Specify options for mangling of tokens of each type. The argument is mangling-specification, that has the following syntax:

obfuscator-title[,option=value]..

Tokens of each type can be mangled using different approaches, each approach corresponds to obfuscator, identified by obfuscator-title. Each obfuscator can have options that alter its behaviour, in order to specify them the comma separated option=value pairs may follow obfuscator-title after a comma.

The mangling-specification specifies all details of how to mangle tokens of each type, so if multiple occurrences of the option are specified, the last one takes effect.

For each type of token a special obfuscator with title none is available - it doesn't alter the tokens in any way.
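To make the obfuscator-title[,option=value].. syntax concrete, here is a small sketch that splits a mangling-specification into its title and options. It only illustrates the syntax described above and is not the parser Perl-Obfus uses; the function name is hypothetical, and it assumes that option values contain no commas.

```perl
use strict;
use warnings;

# Split "title,opt1=val1,opt2=val2" into the title and a hash of options.
sub parse_mangling_spec {
    my ($spec) = @_;
    my ($title, @pairs) = split /,/, $spec;
    my %options;
    for my $pair (@pairs) {
        my ($name, $value) = split /=/, $pair, 2;
        $options{$name} = $value;
    }
    return ($title, \%options);
}

# The kind of value one would pass after -i.
my ($title, $options) = parse_mangling_spec('md5,len=8,prefix=z');
print "obfuscator: $title\n";
print "  $_ = $options->{$_}\n" for sort keys %$options;
```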

Here is a list of obfuscators for each type of the token, with the options they support.

-i = obfuscators for symbol names
Symbols with the same name must be obfuscated to the same name, regardless of where in the program they are located. It follows that the entire set of modules and the scripts that use them should be obfuscated using the same value of mangling-specification; otherwise undeclared symbols will appear.
obfuscator none
Selecting this obfuscator will keep symbol names unchanged.

obfuscator combs
This obfuscator replaces names of symbols with names made of combinations of the characters specified via the spec option, with the minimal length specified via the minlen option. E.g. it can replace formname with IlI and mystr with llI (which look very similar to the human eye) if the user set the spec option to Il. The resulting symbol name depends only on the original symbol name and the value of the seed option: an md5 sum of a string formed from these two items is used to generate the resulting name. The md5 algorithm can produce the same sum for different arguments, in which case a so-called md5sum collision occurs. Collisions between symbols in the current file are detected automatically. Detection of collisions between symbols across the entire project can be activated with the adhere-mapfile option of this obfuscator.

If the adhere-mapfile option is specified for this obfuscator with a non-zero value, and a mapfile name is specified via the global option -d, then Perl-Obfus reads the specified mapfile at startup, looks up each original symbol name in it and uses the replacement from that file if one is found. It also ensures that protected symbols produced during this invocation of Perl-Obfus are not already assigned to any symbol listed in the mapfile. If it finds that an obfuscated symbol it was going to use is already the replacement for another symbol (i.e. a so-called ``hash-collision'' occurs), Perl-Obfus aborts with an error message; in that case it's necessary to clear the mapfile, change the seed and/or increase the value of the len option, and protect the entire application again. After processing completes, the mapfile is updated as usual.

Note that the shortest symbol obfuscator can also generate protected symbols from all possible combinations of characters, and it generates the shortest names possible at the same time (at the cost of requiring 2 passes over each source file).

Options:

adhere-mapfile
See description of combs obfuscator for more information on this option. The default value is 0.

seed
The value of this option affects the order in which all possible combinations of characters used for symbol name are chosen. The value can be arbitrary string.

minlen
len
Minimal length of a generated symbol. Once all combinations of characters of a given length have been used for generating symbol names, the length of the resultant symbol names is automatically increased. This means it's not necessary to make the value of this option very large; set it to a value big enough to make your code acceptably unreadable and acceptably big. It's impossible to assign a value lower than 4 to this option. The default value for this option is 10.

spec
The value of this option instructs which characters can be used for generating names of symbols, the value should either be string that is a concatenation of all characters possible in the resultant symbol name, e.g. Il or OQ, or a couple of such strings separated by colon, in which case a string before the colon specifies characters that can be used for leading symbol's character, and string after the colon specifies characters that can be used for all characters of the symbol except the first (the leading) - e.g. lI:lI1 or O:O0. The recommended values are lI, O:O0 and lI:lI1. If this option is not specified, it's assumed that all characters allowed for Perl language to be used for symbols can form a resultant symbol name.
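The two forms of the spec value determine how many distinct names exist at a given length, which is why the length grows automatically once a length is exhausted. The following sketch just does that arithmetic; the function name is hypothetical, and it assumes the characters listed in each part of spec are distinct.

```perl
use strict;
use warnings;

# Number of distinct names of a given length for a spec value such as
# "Il" (one set for every position) or "lI:lI1" (leading set : rest set).
sub count_names {
    my ($spec, $length) = @_;
    my ($lead, $rest) = split /:/, $spec, 2;
    $rest = $lead unless defined $rest;    # no colon: same set everywhere
    return length($lead) * length($rest) ** ($length - 1);
}

print count_names('Il', 4), "\n";        # 2 * 2**3 = 16
print count_names('lI:lI1', 4), "\n";    # 2 * 3**3 = 54
print count_names('O:O0', 10), "\n";     # 1 * 2**9 = 512
```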

obfuscator md5
This obfuscator calculates the md5 sum of the string produced by concatenating a constant prefix (which can be passed via the seed option) and the symbol name to be obfuscated. Several leading characters of the hexadecimal representation of the md5 sum (the exact number is specified using the len option) are then appended to another prefix (which can be set via the prefix option) to form the obfuscated symbol name.

In theory an md5sum collision is possible: the critical situation in which two different symbols would be obfuscated to the same symbol name. When such a situation is detected, obfuscation is aborted. Collisions between symbols in the current file are detected automatically. If detection of collisions between symbols across the entire project is required, the adhere-mapfile option can be used to enforce uniqueness of protected symbols across all files; see the description of the symbol name obfuscator combs. The only remedy when an md5sum collision is detected is to change the value of the seed option or to increase the value of the len option. However, such situations are very rare.

This is the default obfuscator for symbol names.

Options:

adhere-mapfile
See description of combs obfuscator for more information on this option. The default value is 0.

seed
See above for a description of this option. The value can be arbitrary string. The default value is generated as random string at the Perl-Obfus suite installation time, so it will be unique for each user of Perl-Obfus.

len
Specifies how many characters of the hexadecimal representation of the md5 sum to use for the obfuscated name of the symbol. The lower the value, the shorter all identifiers will be, the smaller the obfuscated code will become, and the easier it will be for a human to study the code. Increasing the value also lowers the probability of an md5sum collision. The default value is 10.

prefix
Specifies the prefix of all mangled symbol names. It should be a non-empty string (one character is enough), because the hex representation of an md5 sum can begin with a digit. There is no point in changing the prefix. The default value is z.
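The naming scheme of the md5 obfuscator can be approximated in a few lines using the core Digest::MD5 module. This is a sketch of the description above under stated assumptions: the exact way Perl-Obfus joins the seed and the symbol name is not documented here, the seed 'example-seed' is made up, and the names it prints will not match those of a real installation. The %seen hash mimics per-file collision detection.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# prefix + first len hex characters of md5(seed . name).
sub mangle_md5 {
    my ($name, %opt) = @_;
    my $seed   = $opt{seed}   // 'example-seed';    # hypothetical seed
    my $len    = $opt{len}    // 10;                # documented default
    my $prefix = $opt{prefix} // 'z';               # documented default
    return $prefix . substr(md5_hex($seed . $name), 0, $len);
}

# Abort when two different names map to the same mangled name.
my %seen;
for my $name (qw(files count total)) {
    my $mangled = mangle_md5($name);
    die "md5sum collision: $name and $seen{$mangled}\n"
        if exists $seen{$mangled};
    $seen{$mangled} = $name;
    print "$name -> $mangled\n";
}
```

Because the prefix is a letter, the result is a valid identifier even when the hex digest starts with a digit.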

obfuscator prefix
This obfuscator just prepends the same string (specified via the str option) to all symbol names to get the obfuscated symbol names. It is designed for initial testing of obfuscated code, to locate uses of undeclared symbols in the obfuscated code. While testing obfuscated code, it is much easier to find out which symbol is undeclared if it's trivial to correlate a symbol of the obfuscated program with the corresponding symbol of the non-obfuscated program.

Options:

str
The string to prepend to all symbol names. Default value is Z439Z_.

obfuscator shortest
This obfuscator replaces each symbol name with the shortest identifier possible, using the shorter identifiers for symbols that are used more times. Using this obfuscator and none obfuscators for strings and numbers will produce the most compact version of the code possible, that will be smaller than the original one. The presence of this obfuscator turns Perl-Obfus into so-called source code ``compressor''.

It's perfectly suitable for multi-module projects too. This obfuscator works in one of two modes, controlled by its parameter countupdate. In the first mode (used if countupdate is passed value 1) it scans through the project files, computes the use counts for all symbols and saves the counts to a special file hereafter called countsfile (whose name is specified as the value of parameter countsfile). In the second mode (used if countupdate is passed value 0, or is not specified at all) it performs the obfuscation itself, using the symbol use counts from countsfile gathered during the first mode. In the obfuscation mode the obfuscator maintains its state (a mapping between original symbols and obfuscated ones) in the file whose name is specified as the value of parameter statefile (hereafter such a file will be called statefile).

Note that the file with symbol counts should be up to date - at least it should mention all symbols that are subject to obfuscation - so if you added some code and introduced a new symbol, you'll have to regenerate countsfile. Perl-Obfus aborts execution if it finds that some symbol was not counted at all, with diagnostics indicating that countsfile needs to be rebuilt. Rebuilding countsfile means deleting (or truncating) the countsfile and statefile and running Perl-Obfus in symbol count gathering mode on all files of the project. If your change to the code didn't introduce new symbols but just increased or decreased the use of already existing ones, execution won't be aborted, but there is a chance that the resultant obfuscated file won't be the smallest possible.

So the common approach to using this obfuscator for symbol names is: develop and debug the code, delete files a-file-with-counts and a-file-with-state, then run the Perl-Obfus with options like this -i shortest,countupdate=1,countsfile=a-file-with-counts for all source files in the project to gather symbol counts to the file a-file-with-counts, and then run Perl-Obfus with options like this -i shortest,countupdate=0,countsfile=a-file-with-counts,statefile=a-file-with-state for all source files in the project.

By default each symbol name is obfuscated to a unique but random identifier whose length corresponds to the number of occurrences of the given symbol. That randomness of the identifier can be disabled by passing value 0 for parameter dontshuffle - this will force e.g. the first symbol in the first source file of the project to always be obfuscated to the name c (provided there is no exception with the same name).

It's possible to specify the set of characters that can be used for resultant symbol names with the spec option - e.g. one can make the code very hard to analyze without modification by asking to use only the characters I and l for names of symbols - that will produce symbols like IllII or IIlIIl, which look very similar in most fonts (but of course this won't result in the smallest output). The use of this option makes the shortest obfuscator a reliable version of the combs obfuscator for multi-module projects, since it eliminates the chance of two different symbols in two different modules (in each of which only one of the symbols is used) getting replaced with the same resultant symbol (which is possible in theory, but has a very small probability).

Options:

countsfile
Location of the countsfile - the file containing use counts for all symbols.

countupdate
Selects the mode of obfuscator - gathering symbol use counts (if value is 1) or obfuscating (value is 0). Default value is 0.

statefile
Specifies the name of the file with the state information used internally by obfuscator when it's in obfuscation mode.

dontshuffle
Instructs the obfuscator not to select a random name of the given length as the obfuscated symbol name, but to select the next name in alphabetical order that is not an exception.

minlen
Specifies the minimum length of the resultant symbol name. Default value is 1.

spec
The value of this option specifies which characters can be used for generating names of symbols. The value should either be a string that is a concatenation of all characters possible in the resultant symbol name, e.g. Il (that will produce symbols like IllII or IIlIIl) or OQ (that will produce symbols like OQOOQQ or QOOQQO), or a pair of such strings separated by a colon, in which case the string before the colon specifies characters that can be used for the leading character of the symbol, and the string after the colon specifies characters that can be used for all characters of the symbol except the first - e.g. lI:lI1 (that will produce symbols like I1lII1 or lI1IIl1) or O:O0. The recommended values are lI, O:O0 and lI:lI1. The use of this option makes the shortest obfuscator a reliable version of the combs obfuscator for multi-module projects. If this option is not specified, it's assumed that all characters allowed by the Perl language can form a resultant symbol name.
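
The idea behind the shortest obfuscator and its spec option can be sketched as follows. This is a simplified Python illustration, not the tool's real algorithm: the function names are hypothetical, the countsfile and statefile are replaced by in-memory data, and names are handed out in a fixed order (similar in spirit to the non-random mode) rather than shuffled.

```python
from collections import Counter
from itertools import count, product

def identifiers(spec="lI:lI1", minlen=1):
    """Yield candidate names, shortest first, from a spec like 'first:rest'."""
    first, _, rest = spec.partition(":")
    rest = rest or first  # a spec without a colon uses one set for every position
    for length in count(minlen):
        for head in first:
            for tail in product(rest, repeat=length - 1):
                yield head + "".join(tail)

def assign_shortest(symbols, spec="lI:lI1", exceptions=()):
    """Map each symbol to a name; more frequently used symbols get shorter names."""
    counts = Counter(symbols)
    # Skip any candidate that collides with an exception (a name that must stay as is).
    names = (n for n in identifiers(spec) if n not in exceptions)
    return {sym: next(names) for sym, _ in counts.most_common()}

# Every occurrence of a symbol in the project counts as one use.
usage = ["total", "i", "total", "i", "i", "helper"]
print(assign_shortest(usage))
```

Here i is used most often and therefore receives the first one-character name, while helper, used only once, is the first to receive a two-character name.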

-n = obfuscators for numeric constants
There is only one non-trivial obfuscator for numeric constants currently - sum3. It's the default.
obfuscator none
Selecting this obfuscator will keep numbers unchanged.

obfuscator sum3
This obfuscator replaces the constant value with an arithmetic expression consisting of addition and subtraction operations on either 3 constant numeric values (in case no asserters were enabled) or 2 constant numeric values and 1 constant variable (in case some asserters were enabled - please note that asserters are not supported in the Lite or Trial version of the Product), which are represented as decimal and hexadecimal values (their radixes can be changed by altering the format option). For different occurrences of the same constant numeric value, a choice is provided between using the same values in the expressions, or using 2 random values and one computed - this is controlled using the const option. If you wish to make analysis of the differences between revisions of your software more difficult, you should request the use of 2 random values and one computed in the expression - this way each obfuscated file will differ from the one produced by the previous run of the obfuscator. This is the default obfuscator (in fact it's the only non-trivial one for numeric constants).

Options:

const
Specifies whether, for the same numeric constant, the obfuscation should produce a different substitution expression each time (the value for the option is 0) or the same expression (the value for the option is 1). Default value is 0.

format
Specifies the sprintf format string for the obfuscated substitutor. The default value is (0x%04x+% 4d-0x%04x).

format_var
Specifies the sprintf format string for the obfuscated substitutor in case a variable and 2 constants are used. The default value is (0x%04x+% 4d-%s).

var_use_ratio
In case some asserters were enabled, specifies the ratio of occurrences of expressions that involve variables to all numeric constants occurring in the source code. E.g. if you wish half of the numeric constants in your code to reference variables, you should set the value of this parameter to 0.5. It's not recommended to set this parameter to 0 for obvious reasons. The more expressions in the code reference variables, the bigger the code is (an expression with a reference to a variable is larger than an expression without one by about 7-9 bytes). The default value is 0.25.
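
The rewriting performed by sum3 with three constants can be sketched as follows. This Python illustration only reuses the default format string documented above; the function name, the range of the random terms and the choice of which term is computed are assumptions, not the tool's actual behaviour.

```python
import random

def sum3(value, fmt="(0x%04x+% 4d-0x%04x)", rng=random):
    """Rewrite an integer as a + b - c with two random terms and one computed term."""
    a = rng.randrange(0, 0x10000)   # random term, printed in hexadecimal
    c = rng.randrange(0, 0x10000)   # random term, printed in hexadecimal
    b = value - a + c               # computed so that a + b - c == value
    return fmt % (a, b, c)

expr = sum3(42)
print(expr)        # a different expression on every run
print(eval(expr))  # the expression still evaluates to the original constant
```

Because two of the three terms are random, repeated runs yield different text for the same constant, which corresponds to the behaviour described for const=0.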

-s = obfuscators for string constants
These make constant strings more difficult to read. The default string obfuscator is hexchar. The default string obfuscator is none - that is, strings are not mangled at all.
obfuscator none
Selecting this obfuscator will keep strings unchanged.

obfuscator hexchar
This obfuscator substitutes each character of the string with a backslash and its code, by default in hexadecimal notation - e.g. string ``abc'' is substituted with ``\x61\x62\x63''.

Options:

format
Specifies the sprintf format string for each character's substitutor. The default value is '\\x%x'.
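
The substitution performed by hexchar amounts to applying the format string to every character code. A minimal Python sketch (the function name is hypothetical, and only the default format documented above is assumed):

```python
def hexchar(text, fmt="\\x%x"):
    """Replace every character with a backslash escape of its character code."""
    return "".join(fmt % ord(ch) for ch in text)

print(hexchar("abc"))  # prints \x61\x62\x63
```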

obfuscator list4chr
This obfuscator substitutes a string with the expression (join(``'',map { chr($_); } (@list_of_character_codes))). The @list_of_character_codes is an inline list of expressions that evaluate to integers. Each item in the list corresponds to one character of the string being obfuscated. Each character code is written as a numeric expression produced by the obfuscator for numeric constants; the parameters for that obfuscator are passed using the -c option and have the same meaning as the parameters to the obfuscator for numeric constants passed using the -n option. The default value of the parameter for the -c option is the same as for -n. The list4chr obfuscator is designed to make any automatic deobfuscation of string constants that someone may try to implement more complicated.
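
The shape of the expression generated by list4chr can be sketched like this. The Python function below only builds the text of the Perl expression; its name and the encode_number parameter (standing in for whatever numeric obfuscator is selected with -c) are hypothetical.

```python
def list4chr(text, encode_number=str):
    """Build the Perl expression join("", map { chr($_); } (codes...)) for a string."""
    # encode_number stands in for the numeric-constant obfuscator selected with -c;
    # the default, str, leaves each character code as a plain decimal number.
    codes = ",".join(encode_number(ord(ch)) for ch in text)
    return '(join("",map { chr($_); } (%s)))' % codes

print(list4chr("abc"))  # prints (join("",map { chr($_); } (97,98,99)))
```

Passing a function that turns each code into an arithmetic expression would give output closer to what the text describes for the default -c setting.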

-c = obfuscator for character codes
Please read the description of the list4chr obfuscator for string constants (the previous paragraph). The none obfuscator will not obfuscate character codes at all (but the string will still be unreadable).

-T time-asserter-params
-H hostname-asserter-params
-G generic-asserter-params
Specify options for asserters of each type (not available in Lite or Trial version of the product). The argument is asserter-specification, that has the following syntax:

asserter-title[,option=value]..

There are several subtypes of asserters for each type. The subtype is selected by asserter-title. Each asserter can have options that alter its behaviour, in order to specify them the comma separated option=value pairs may follow asserter-title after a comma.

For each type of asserters a special asserter with title none is available - it doesn't perform any action.

Here is a list of asserter-titles for each type of the asserters, with the options they support.

-T = time asserters
Time asserters ensure that some condition about the point in time at which the script was started is true.
asserter expire
This asserter ensures that the current date and time is earlier than the one specified via the whenexpires option. The way the current date and time is acquired is specified via the source option. If the condition is violated, the code finishes execution after performing all other actions requested. It's possible to request that the main module of the program be erased by setting option onviolated-destroy to 1. The warning specified via option onviolated-message will be printed if the value of option onviolated-warn is 1.

Options:

whenexpires
Specifies the date and time after which program should stop working. The format of this parameter can be either number of seconds since 1 Jan 1970, or human representation of the absolute date and time like 20 Apr 2004 15:43 or 2004/4/20 15:43, or relative date and time of the form ``now + count units time-spec'' like ``now + 2 weeks 9:00'' (i.e. any format Perl module Time::ParseDate recognizes). If the value is a string 0, then the asserter is disabled. Default value is 0.

onviolated-warn
If 1, directs message specified via onviolated-message to be printed. Default value is 0.

onviolated-message
Specifies the message to be printed in case the value of onviolated-warn is 1. The trailing newline will be automatically appended. Escape sequences are allowed. Please be sure to quote the message properly so that the shell passes it as a single string. Default value is ``Content-type: text/html\n\nThe script has expired, please contact webmaster.''.

onviolated-destroy
If 1, directs main module of the program to be destroyed in case licensing conditions are not met. Default value is 0.

source
Specifies the source of information about current time. The possible options are:
builtin-time
The value returned by builtin time() function.

atime-of-self
Access time of the script. May be non-informative on filesystems mounted with ``noatime'' option.

mtime-of-tmp
Modification time of /tmp. May be non-informative on some systems.

mtime-of-/proc/uptime
Modification time of /proc/uptime. This is Linux-specific, and even some Linux systems don't make /proc/uptime accessible to users.

/bin/date
The output of /bin/date +%s.

For all cases time() is used as a fallback source of information in case primary method is not available. For Unix systems, /bin/date seems to be the most reliable and trusted source of information.

Default value of this option is builtin-time.
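
The condition checked by the expire asserter can be sketched as follows. This is a simplified Python illustration with a hypothetical function name: it accepts whenexpires only as a number of seconds since 1 Jan 1970, and uses the equivalent of the builtin-time source unless a time is passed explicitly.

```python
import time

def is_expired(when_expires, now=None):
    """Return True when the current time has reached the expiry time."""
    if when_expires == 0:      # a value of 0 disables the asserter
        return False
    if now is None:
        now = time.time()      # corresponds to the builtin-time source
    return now >= when_expires

print(is_expired(0))                # disabled asserter never expires
print(is_expired(1, now=2))         # expiry time already passed
```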

-H = hostname asserters
Hostname asserters check the name of the host the script runs on. In case the conditions are not met, the script is terminated. If parameter onviolated-warn is 1, the message specified by parameter onviolated-message is displayed. It's possible to request that the main module of the program be erased by setting option onviolated-destroy to 1.

All of these asserters support the same set of parameters:

matches
The string interpreted differently by each asserter, it's a host name for single-host asserter, a plus-separated list of allowed host suffixes for hosttails asserter or a regular expression to which hostname should match for hostregex asserter. In all cases hostnames should be assumed in lowercase. See more details about treatment of hostname below. Default value is localhost.

onviolated-warn
If 1, the message specified with parameter onviolated-message is printed before program is terminated in case condition about hostname is not met. Default value is 0.

onviolated-message
Specifies the message to be printed in case the condition about the hostname is not met. The trailing newline will be automatically appended. Escape sequences are allowed. Please be sure to quote the message properly so that the shell passes it as a single string. Default value is ``Content-type: text/html\n\nThe script is not licensed to be run on this machine.''

onviolated-destroy
If 1, directs main module of the program to be destroyed in case licensing conditions are not met. Default value is 0.

source

The following sources of information are supported:

Sys::Hostname
This is a default source of information. Also it's used as a fallback when other sources of information are used.

/bin/hostname
Output of /bin/hostname (present in most Unix-like OSes) is used. Please note that for most hosting providers it will be some name inside the hosting provider's domain - something like uk3.valuehost.com - independent of the name of the web server; there is no guarantee that such an internal name of the server won't change. On the other hand, if one wants to license some software for all clients of a particular hosting provider, it may be a good idea to use this source of information.

/bin/uname
Output of /bin/uname -n 2 (present in most Unix-like OSes) is used. Please note that for most hosting providers it will be some name inside the hosting provider's domain - something like uk3.valuehost.com - independent of the name of the web server; there is no guarantee that such an internal name of the server won't change. On the other hand, if one wants to license some software for all clients of a particular hosting provider, it may be a good idea to use this source of information.

env_http_host
The value of environment variable HTTP_HOST will be used as the name of the host; this variable is preset when the program is serving dynamic web content. The value of this variable depends on the hostname the user requested - i.e. if the user visited http://www.site.com then HTTP_HOST will be www.site.com, but if the user visited http://site.com then HTTP_HOST will be site.com, so if both kinds of URLs are to be supported then the single-host asserter shouldn't be used. Please note that it's easy for a malicious user to spoof an arbitrary hostname by just prepending an assignment to this environment variable to the protected script, so it's not a very reliable source of information.

env_server_name
The value of environment variable SERVER_NAME will be used as the name of the host; this variable is preset when the program is serving dynamic web content. The value of this variable depends on the hostname the user requested - i.e. if the user visited http://www.site.com then SERVER_NAME will be www.site.com, but if the user visited http://site.com then SERVER_NAME will be site.com, so if both kinds of URLs are to be supported then the single-host asserter shouldn't be used. Please note that it's easy for a malicious user to spoof an arbitrary hostname by just prepending an assignment to this environment variable to the protected script, so it's not a very reliable source of information.

There is a plain CGI Perl script in lib/perl-obfus/print-hostname.pl in the directory where Perl-Obfus is installed that prints the value acquired by all sources of information.

All hostname asserters differ only in treatment of the parameter matches. The following hostname asserters are supported:

single-host
The parameter matches is a single name of host with domain part, e.g. uk3.valuehost.com.

hosttails
The parameter matches is a +-separated list of hostname tails of host to match, e.g. valuehost.com+valuehost.co.uk. Hosts with names uk3.valuehost.com and support.valuehost.co.uk and even ad.bestvaluehost.com will be considered as matching by this asserter.

hostregex
The parameter matches is a regular expression that the host name should match. Don't forget to insert anchors ^ and $ around it if you need them - they won't be automatically prepended and appended. Don't forget to quote shell special characters like * and $ (or just enclose the entire parameter inside single quotes if on Unix). E.g. the value [.]valuehost[.] of parameter matches will make hosts with names www.valuehost.com and support.valuehost.co.uk match, while hosts with names www.bestvaluehost.com and www.valuehosters.com won't be considered matching.
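
The difference between the three hostname asserters can be sketched as follows. The Python functions below are hypothetical stand-ins that only mirror the matching rules described above (exact comparison, suffix comparison, unanchored regular expression search); they are not the code Perl-Obfus embeds.

```python
import re

def match_single_host(hostname, matches):
    """True if the lowercased hostname equals the single allowed name."""
    return hostname.lower() == matches

def match_hosttails(hostname, matches):
    """True if the hostname ends with any of the '+'-separated tails."""
    return any(hostname.lower().endswith(tail) for tail in matches.split("+"))

def match_hostregex(hostname, matches):
    """True if the regular expression matches anywhere in the hostname."""
    return re.search(matches, hostname.lower()) is not None

tails = "valuehost.com+valuehost.co.uk"
for host in ("uk3.valuehost.com", "ad.bestvaluehost.com", "www.valuehosters.com"):
    print(host, match_hosttails(host, tails), match_hostregex(host, "[.]valuehost[.]"))
```

The hosttails check accepts ad.bestvaluehost.com because it is a plain suffix comparison, whereas the regular expression with explicit dots rejects it.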

-G = generic asserters
Generic asserters allow inserting custom checks and actions into the highly-protected block of code. There is no requirement on the code at all, and no parsing of it is performed.
asserter from-string
This asserter has only one parameter - code - a string of custom code.

asserter from-file
This asserter takes the code from file specified via filename parameter.

It's possible to use fake generic asserters with code ' ' (i.e. a single space character) to make the analysis of the program more complex (since in case any asserter is used, some fraction of numeric constants will be turned into arithmetic expressions involving constant variables initialized in the encoded block). This trick (passing -G from-string,code=' ') makes any custom decompiler that someone would have to write in order to analyze the code much more complex.

In order to report violation of licensing conditions, user's code should execute the following statements:

 exit 0;
-O profile-params
Tune the behaviour of the Perl-Obfus for some specific dialect or environment.

The argument is profile-params, that has the following syntax:

profile-name[,option=value]..

There are several profiles available. The profile is selected by profile-name. Each profile can have options that alter its behaviour, in order to specify them the comma separated option=value pairs may follow profile-name after a comma.

The following values for profile-name are available:

default
Selects default Perl dialect.

The profile with name default is the default profile.

All profiles have the following options (specified in the way options for manglers and extractors are specified):

handle-dynamic-scripts
Specifies whether dynamic Perl code (the code generated on the fly) should be obfuscated. See the description of the option with the same name for asp extractor.

dynamic-scripts-by
Specifies names of objects and methods whose arguments should be scanned for dynamic Perl code. See the description of the option with the same name for asp extractor.


RETURN VALUE

In case of an error, the exit code will be non-zero, otherwise the exit code will be zero.


DIAGNOSTICS

On successful processing of the file, the message 'input-filename syntax OK' is printed to stderr. The processing will stop if there is a syntax error in the file being obfuscated or in a file it uses - in that case the location and details of the syntax error will be printed to stderr.


EXAMPLES

The following commandline obfuscates and encodes file blah.pl using default parameters and exceptions from file named ./excepts, writing obfuscated and encoded version to oblah.pl:

    perl-obfus blah.pl -o oblah.pl -x ./excepts

The following commandline is the recommended way of obfuscating file blah.pl for shipping using default parameters and exceptions from file named ./excepts, writing obfuscated and encoded version to oblah.pl (the main difference from the previous example is passing the value of the seed parameter for the obfuscator routine for symbol names):

    perl-obfus blah.pl -o oblah.pl -x ./excepts -i md5,seed=SomeRandomString

The following commandline is recommended for producing a mildly-obfuscated non-encoded version of blah.pl that is ideal for testing whether the obfuscated code has problems like the use of undefined symbols (which may arise from an incomplete list of exceptions in file ./excepts):

    perl-obfus blah.pl -e 0 -o oblah.pl -x ./excepts -n none -s none -i prefix,str=ZZZ

The following commandlines are two equivalent ways of passing the same set of values for all options to the md5 obfuscator routine for symbol names. Each obfuscates and encodes file blah.pl, writing the obfuscated and encoded version of the file to oblah.pl:

    perl-obfus blah.pl -o oblah.pl -i md5,seed=57823,prefix=p,len=5
    perl-obfus blah.pl -o oblah.pl -i 'md5,prefix=p, seed=57823 , len=5'

The following example obfuscates and encodes file blah.pl, writing the obfuscated and encoded version of the file to oblah.pl, and embeds license-checking code that allows the program to run until 28 April 2005 and only on hosts whose names end in site.com; when a condition is violated, the corresponding default message is printed:

    perl-obfus blah.pl -o oblah.pl \
        -T 'expire,whenexpires=28 April 2005,onviolated-warn=1' \
        -H hosttails,matches=site.com+.site.com,onviolated-warn=1


FILES

It's possible to store the default commandline options in the globally-visible file $instroot/lib/perl-obfus/perl-obfus-settings.pl (where $instroot is the directory in which the Perl-Obfus package was installed), which is a Perl module. This file defines one sub, cmnargs, that should return a list of options to be prepended to the actual perl-obfus commandline, thus making it possible to store ``persistent settings'' for perl-obfus. It is most useful for specifying the location of the perl used for the backend (which should be a perl of version 5.7.2 or greater).


CAVEATS

Here is a list of mostly innocent caveats.

  • __END__ section becomes __DATA__ in the output. Mostly it won't make any difference to your code.

  • __LINE__ in the input file turns into line number (a numeric constant) in obfuscated file.

  • The perl code in the substitution part of s///e won't be obfuscated heavily - only the identifiers will be mangled properly so that the code works correctly, but no integer, string and whitespace mangling will be applied to the code.

  • It's recommended to turn some perl warnings off for obfuscated files, since due to jamming white space there will appear some constructs for which perl will issue a warning.


BUGS

  • With -jam=0 some obfuscated code does not work (gives syntax error due to strange sensitivity of perl to linebreaks). Better use -jam=1 with -output_line_len=80 to get working and somewhat readable code.

  • Constants implemented as subs may get inlined in the file in which they are defined, and the sub definitions themselves may disappear from the obfuscated file corresponding to the file they were defined in (and the values these subs return will be substituted in the places where these subs are invoked). To solve this problem, just add a my-scoped list variable initialized with references to these subroutines.

  • A class of problems may arise with the obfuscated code due to abnormal sensitivity of perl parser to the extra white space. The following subs act different due to features of (or bugs in) perl parser:
        sub f { { "blah"
        , 2}; };
        sub g { { "blah", 2}; };

    Here f() returns integer 2, g() returns a reference to an anonymous hash - though the only difference is in the amount of whitespace (whether there is a newline after ``blah''). Since perl-obfus removes extra whitespace (and wraps lines so that they are not longer than the limit you specified), the behaviour of functions can change. You should not write code that is sensitive to whitespace and perl parser bugs in general - so you should add an explicit return in f and g if you want them to return a ref to a hash.

See section NOTES for troubleshooting instructions.


NOTES

In most cases, once properly prepared for obfuscation, the obfuscated version of the code should work the same as the non-obfuscated one. It's recommended to check the obfuscated version of the code for the use of undeclared subroutines using the find-undeclared-subs.pl script - this will help to detect an incomplete set of symbol name exceptions. After fixing the issues with the incomplete set of exceptions, it's recommended to check whether the obfuscated code behaves exactly the same as the original - by using a pre-existing testsuite or by checking functionality manually.

If some obfuscated code is syntactically correct but works differently than the original version, obfuscate it without encoding and without string, integer and ident mangling (but with -jam=1), as follows:

    perl-obfus -i none -s none -n none -jam 1 -e 0

Then try to run it again. If it still does not work correctly, find the source file that is at fault by replacing the obfuscated files with the original ones, one by one. After you have found the file that contains the problem, append the definitions of all functions from the source file to that target file; by temporarily renaming functions in the appended part to something else (e.g. by suffixing the names with '1' or 'blah') you will be able to find the function that is at fault. The same process can be applied to the blocks in that function too (just replace obfuscated parts with source parts) to find out which part of the obfuscated function is misbehaving.

Once the misbehaving block of the function has been found, that block should be modified so that the obfuscated version has the same functionality as the original code.


SEE ALSO

gen-ident-exceptions.pl, find-undeclared-subs.pl.