| #+TITLE: UglifyJS -- a JavaScript parser/compressor/beautifier |
| #+KEYWORDS: javascript, js, parser, compiler, compressor, mangle, minify, minifier |
| #+DESCRIPTION: a JavaScript parser/compressor/beautifier in JavaScript |
| #+STYLE: <link rel="stylesheet" type="text/css" href="docstyle.css" /> |
| #+AUTHOR: Mihai Bazon |
| #+EMAIL: [email protected] |
| |
| * UglifyJS --- a JavaScript parser/compressor/beautifier |
| |
| This package implements a general-purpose JavaScript |
| parser/compressor/beautifier toolkit. It is developed on [[http://nodejs.org/][NodeJS]], but it |
| should work on any JavaScript platform supporting the CommonJS module system |
| (and if your platform of choice doesn't support CommonJS, you can easily |
| implement it, or discard the =exports.*= lines from UglifyJS sources). |
| |
| The tokenizer/parser generates an abstract syntax tree from JS code. You |
| can then traverse the AST to learn more about the code, or do various |
| manipulations on it. This part is implemented in [[../lib/parse-js.js][parse-js.js]] and it's a |
| port to JavaScript of the excellent [[http://marijn.haverbeke.nl/parse-js/][parse-js]] Common Lisp library from [[http://marijn.haverbeke.nl/][Marijn |
| Haverbeke]]. |
| |
| (See [[http://github.com/mishoo/cl-uglify-js][cl-uglify-js]] if you're looking for the Common Lisp version of |
| UglifyJS.) |
| |
| The second part of this package, implemented in [[../lib/process.js][process.js]], inspects and |
| manipulates the AST generated by the parser to provide the following: |
| |
| - ability to re-generate JavaScript code from the AST. Optionally |
| indented---you can use this if you want to “beautify” a program that has |
| been compressed, so that you can inspect the source. But you can also run |
| our code generator to print out an AST without any whitespace, so you |
| achieve compression as well. |
| |
| - shorten variable names (usually to single characters). Our mangler will |
| analyze the code and generate proper variable names, depending on scope |
| and usage, and is smart enough to deal with globals defined elsewhere, or |
| with =eval()= calls or =with{}= statements. In short, if =eval()= or |
| =with{}= are used in some scope, then all variables in that scope and any |
| variables in the parent scopes will remain unmangled, and any references |
| to such variables remain unmangled as well. |
| |
| - various small optimizations that may lead to faster code but certainly |
| lead to smaller code. Where possible, we do the following: |
| |
| - foo["bar"] ==> foo.bar |
| |
| - remove block brackets ={}= |
| |
| - join consecutive var declarations: |
| var a = 10; var b = 20; ==> var a=10,b=20; |
| |
| - resolve simple constant expressions: 1 + 2 * 3 ==> 7. We only do the |
| replacement if the result occupies fewer bytes; for example 1/3 would |
| translate to 0.333333333333, so in this case we don't replace it. |
| |
| - consecutive statements in blocks are merged into a sequence; in many |
| cases, this leaves blocks with a single statement, so then we can remove |
| the block brackets. |
| |
| - various optimizations for IF statements: |
| |
| - if (foo) bar(); else baz(); ==> foo?bar():baz(); |
| - if (!foo) bar(); else baz(); ==> foo?baz():bar(); |
| - if (foo) bar(); ==> foo&&bar(); |
| - if (!foo) bar(); ==> foo||bar(); |
| - if (foo) return bar(); else return baz(); ==> return foo?bar():baz(); |
| - if (foo) return bar(); else something(); ==> {if(foo)return bar();something()} |
| |
| - remove some unreachable code and warn about it (code that follows a |
| =return=, =throw=, =break= or =continue= statement, except |
| function/variable declarations). |
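
The "replace only if shorter" rule for constant expressions can be sketched as follows. This is a minimal illustration, not the code in process.js: =foldIfShorter= is a hypothetical helper that works on source text rather than on the AST, and it only accepts plain arithmetic.

```javascript
// Hypothetical helper, not part of UglifyJS: fold a constant arithmetic
// expression only when the printed result is shorter than the source text.
function foldIfShorter(source) {
    // Accept only digits, whitespace, parentheses, dots and basic operators,
    // so evaluating the text cannot reach variables or call functions.
    if (!/^[\d\s+\-*\/().]+$/.test(source)) return source;
    var value;
    try {
        value = Function('"use strict"; return (' + source + ');')();
    } catch (e) {
        return source; // not a valid expression, leave it alone
    }
    if (typeof value !== "number" || !isFinite(value)) return source;
    var folded = String(value);
    // Compare against the expression as it would be printed without spaces.
    var compact = source.replace(/\s+/g, "");
    return folded.length < compact.length ? folded : source;
}

console.log(foldIfShorter("1 + 2 * 3")); // 7
console.log(foldIfShorter("1/3"));       // 1/3 (the decimal form is longer)
```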
| |
| ** <<Unsafe transformations>> |
| |
| The following transformations can in theory break code, although they're |
| probably safe in most practical cases. To enable them you need to pass the |
| =--unsafe= flag. |
| |
| *** Calls involving the global Array constructor |
| |
| The following transformations occur: |
| |
| #+BEGIN_SRC js |
| new Array(1, 2, 3, 4) => [1,2,3,4] |
| Array(a, b, c) => [a,b,c] |
| new Array(5) => Array(5) |
| new Array(a) => Array(a) |
| #+END_SRC |
| |
| These are all safe if the Array name isn't redefined. JavaScript does allow |
| one to globally redefine Array (and pretty much everything, in fact) but I |
| personally don't see why anyone would do that. |
| |
| UglifyJS does handle the case where Array is redefined locally, or even |
| globally but with a =function= or =var= declaration. Therefore, in the |
| following cases UglifyJS *doesn't touch* calls or instantiations of Array: |
| |
| #+BEGIN_SRC js |
| // case 1. globally declared variable |
| var Array; |
| new Array(1, 2, 3); |
| Array(a, b); |
| |
| // or (can be declared later) |
| new Array(1, 2, 3); |
| var Array; |
| |
| // or (can be a function) |
| new Array(1, 2, 3); |
| function Array() { ... } |
| |
| // case 2. declared in a function |
| (function(){ |
| a = new Array(1, 2, 3); |
| b = Array(5, 6); |
| var Array; |
| })(); |
| |
| // or |
| (function(Array){ |
| return Array(5, 6, 7); |
| })(); |
| |
| // or |
| (function(){ |
| return new Array(1, 2, 3, 4); |
| function Array() { ... } |
| })(); |
| |
| // etc. |
| #+END_SRC |
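
The reason a declaration that appears /after/ the call still matters is hoisting. The sketch below is plain JavaScript and does not use UglifyJS; the function names are made up for the illustration. It shows that rewriting =new Array(1, 2, 3)= to =[1,2,3]= would change behavior in such scopes.

```javascript
// The function declaration below is hoisted, so the local Array shadows the
// built-in one even for the call that appears before the declaration.
function usesLocalArray() {
    return new Array(1, 2, 3);
    function Array(a, b, c) { // local replacement, unrelated to the built-in
        this.sum = a + b + c;
    }
}

// A hoisted `var Array;` leaves the local name undefined, so the call throws.
function usesUndefinedArray() {
    try {
        return new Array(1, 2, 3);
    } catch (e) {
        return e instanceof TypeError;
    }
    var Array;
}

// No local declaration: the built-in constructor is used.
function usesGlobalArray() {
    return new Array(1, 2, 3);
}

console.log(usesLocalArray().sum);                 // 6
console.log(usesUndefinedArray());                 // true
console.log(JSON.stringify(usesGlobalArray()));    // [1,2,3]
```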
| |
| *** =obj.toString()= ==> =obj+""= |
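
A short sketch of why this rewrite is only unsafe in contrived situations. It is plain JavaScript, independent of UglifyJS, and the object and function names are invented for the example. String concatenation consults =valueOf= before =toString=, and it does not throw for =null=.

```javascript
var plain = {
    toString: function () { return "from toString"; }
};
var tricky = {
    toString: function () { return "from toString"; },
    valueOf: function () { return 42; } // consulted first by the + operator
};

function viaToString(x) { return x.toString(); }
function viaConcat(x) { return x + ""; }

console.log(viaToString(plain) === viaConcat(plain)); // true
console.log(viaToString(tricky));                     // from toString
console.log(viaConcat(tricky));                       // 42
console.log(viaConcat(null));                         // null
// viaToString(null) would throw a TypeError instead.
```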
| |
| ** Install (NPM) |
| |
| UglifyJS is now available through NPM --- =npm install uglify-js= should do |
| the job. |
| |
| ** Install latest code from GitHub |
| |
| #+BEGIN_SRC sh |
| ## clone the repository |
| mkdir -p /where/you/wanna/put/it |
| cd /where/you/wanna/put/it |
| git clone git://github.com/mishoo/UglifyJS.git |
| |
| ## make the module available to Node |
| mkdir -p ~/.node_libraries/ |
| cd ~/.node_libraries/ |
| ln -s /where/you/wanna/put/it/UglifyJS/uglify-js.js |
| |
| ## and if you want the CLI script too: |
| mkdir -p ~/bin |
| cd ~/bin |
| ln -s /where/you/wanna/put/it/UglifyJS/bin/uglifyjs |
| # (then add ~/bin to your $PATH if it's not there already) |
| #+END_SRC |
| |
| ** Usage |
| |
| There is a command-line tool that exposes the functionality of this library |
| for your shell-scripting needs: |
| |
| #+BEGIN_SRC sh |
| uglifyjs [ options... ] [ filename ] |
| #+END_SRC |
| |
| =filename= should be the last argument and should name the file from which |
| to read the JavaScript code. If you don't specify it, it will read code |
| from STDIN. |
| |
| Supported options: |
| |
| - =-b= or =--beautify= --- output indented code; when passed, additional |
| options control the beautifier: |
| |
| - =-i N= or =--indent N= --- indentation level (number of spaces) |
| |
| - =-q= or =--quote-keys= --- quote keys in literal objects (by default, |
| only keys that cannot be identifier names will be quoted). |
| |
| - =--ascii= --- pass this argument to encode non-ASCII characters as |
| =\uXXXX= sequences. By default UglifyJS won't bother to do it and will |
| output Unicode characters instead. (The output is always encoded in UTF-8, |
| but if you pass this option you'll only get ASCII.) |
| |
| - =-nm= or =--no-mangle= --- don't mangle variable names |
| |
| - =-ns= or =--no-squeeze= --- don't call =ast_squeeze()= (which does various |
| optimizations that result in smaller, less readable code). |
| |
| - =-mt= or =--mangle-toplevel= --- mangle names in the toplevel scope too |
| (by default we don't do this). |
| |
| - =--no-seqs= --- when =ast_squeeze()= is called (thus, unless you pass |
| =--no-squeeze=) it will reduce consecutive statements in blocks into a |
| sequence. For example, "a = 10; b = 20; foo();" will be written as |
| "a=10,b=20,foo();". On various occasions, this allows us to discard the |
| block brackets (since the block becomes a single statement). This is ON |
| by default because it seems safe and saves a few hundred bytes on some |
| libs that I tested it on, but pass =--no-seqs= to disable it. |
| |
| - =--no-dead-code= --- by default, UglifyJS will remove code that is |
| obviously unreachable (code that follows a =return=, =throw=, =break= or |
| =continue= statement and is not a function/variable declaration). Pass |
| this option to disable this optimization. |
| |
| - =-nc= or =--no-copyright= --- by default, =uglifyjs= will keep the initial |
| comment tokens in the generated code (assumed to be copyright information |
| etc.). If you pass this it will discard it. |
| |
| - =-o filename= or =--output filename= --- put the result in =filename=. If |
| this isn't given, the result goes to standard output (or see =--overwrite= below). |
| |
| - =--overwrite= --- if the code is read from a file (not from STDIN) and you |
| pass =--overwrite= then the output will be written in the same file. |
| |
| - =--ast= --- pass this if you want to get the Abstract Syntax Tree instead |
| of JavaScript as output. Useful for debugging or learning more about the |
| internals. |
| |
| - =-v= or =--verbose= --- output some notes on STDERR (for now just how long |
| each operation takes). |
| |
| - =--unsafe= --- enable other additional optimizations that are known to be |
| unsafe in some contrived situations, but could still be generally useful. |
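| These are the transformations described under [[Unsafe transformations]]: |
| |
| - foo.toString() ==> foo+"" |
| - calls involving the global Array constructor, e.g. new Array(1, 2, 3) ==> [1,2,3] |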
| |
| - =--max-line-len= (default 32K characters) --- add a newline after around |
| 32K characters. I've seen both Firefox and Chrome croak when all the code was |
| on a single line of around 670K. Pass =--max-line-len 0= to disable this |
| safety feature. |
| |
| - =--reserved-names= --- some libraries rely on certain names to be used, as |
| pointed out in issues #92 and #81, so this option allows you to exclude such |
| names from the mangler. For example, to keep names =require= and =$super= |
| intact you'd specify =--reserved-names "require,$super"=. |
| |
| - =--inline-script= -- when you want to include the output literally in an |
| HTML =<script>= tag you can use this option to prevent =</script= from |
| showing up in the output. |
| |
| - =--lift-vars= -- when you pass this, UglifyJS will apply the following |
| transformations (see the notes in API, =ast_lift_variables=): |
| |
| - put all =var= declarations at the start of the scope |
| - make sure a variable is declared only once |
| - discard unused function arguments |
| - discard unused inner (named) functions |
| - finally, try to merge assignments into that one =var= declaration, if |
| possible. |
| |
| *** API |
| |
| To use the library from JavaScript, you'd do the following (example for |
| NodeJS): |
| |
| #+BEGIN_SRC js |
| var jsp = require("uglify-js").parser; |
| var pro = require("uglify-js").uglify; |
| |
| var orig_code = "... JS code here"; |
| var ast = jsp.parse(orig_code); // parse code and get the initial AST |
| ast = pro.ast_mangle(ast); // get a new AST with mangled names |
| ast = pro.ast_squeeze(ast); // get an AST with compression optimizations |
| var final_code = pro.gen_code(ast); // compressed code here |
| #+END_SRC |
| |
| The above performs the full compression that is possible right now. As you |
| can see, there is a sequence of steps which you can apply. For example if |
| you want compressed output but for some reason you don't want to mangle |
| variable names, you would simply skip the line that calls |
| =pro.ast_mangle(ast)=. |
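
The optional steps can be wrapped in a small helper. This is only a sketch: =compress= and the =mangle= / =squeeze= flag names are hypothetical and not part of the library; the helper receives =jsp= and =pro= as arguments and calls only the four functions shown in the example above.

```javascript
// Hypothetical convenience wrapper around the documented pipeline.
// `jsp` and `pro` are the parser and processor modules from the example
// above; `flags.mangle` and `flags.squeeze` are illustrative names.
function compress(jsp, pro, code, flags) {
    flags = flags || {};
    var ast = jsp.parse(code);                  // initial AST
    if (flags.mangle !== false) {
        ast = pro.ast_mangle(ast);              // shorten variable names
    }
    if (flags.squeeze !== false) {
        ast = pro.ast_squeeze(ast);             // compression optimizations
    }
    return pro.gen_code(ast);                   // back to JavaScript source
}
```

With the setup from the example above, =compress(jsp, pro, orig_code, { mangle: false })= would give compressed output with the original variable names.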
| |
| Some of these functions take optional arguments. Here's a description: |
| |
| - =jsp.parse(code, strict_semicolons)= -- parses JS code and returns an AST. |
| =strict_semicolons= is optional and defaults to =false=. If you pass |
| =true= then the parser will throw an error when it expects a semicolon and |
| it doesn't find it. For most JS code you don't want that, but it's useful |
| if you want to strictly sanitize your code. |
| |
| - =pro.ast_lift_variables(ast)= -- merge and move =var= declarations to the |
| top of the scope; discard unused function arguments or variables; discard |
| unused (named) inner functions. It also tries to merge assignments |
| following the =var= declaration into it. |
| |
| If your code is very hand-optimized concerning =var= declarations, |
| lifting variable declarations might actually increase size: on jQuery it |
| adds 865 bytes (243 after gzip). For me it helps out. YMMV. Also note |
| that (since it's not enabled by default) this operation isn't yet heavily |
| tested (please report if you find issues!). |
| |
| Although it might increase the output size, it's technically more |
| correct: in certain situations, dead code removal might drop variable |
| declarations, which would not happen if the variables are lifted in |
| advance. |
| |
| Here's an example of what it does: |
| |
| #+BEGIN_SRC js |
| function f(a, b, c, d, e) { |
| var q; |
| var w; |
| w = 10; |
| q = 20; |
| for (var i = 1; i < 10; ++i) { |
| var boo = foo(a); |
| } |
| for (var i = 0; i < 1; ++i) { |
| var boo = bar(c); |
| } |
| function foo(){ ... } |
| function bar(){ ... } |
| function baz(){ ... } |
| } |
| |
| // transforms into ==> |
| |
| function f(a, b, c) { |
| var i, boo, w = 10, q = 20; |
| for (i = 1; i < 10; ++i) { |
| boo = foo(a); |
| } |
| for (i = 0; i < 1; ++i) { |
| boo = bar(c); |
| } |
| function foo() { ... } |
| function bar() { ... } |
| } |
| #+END_SRC |
| |
| - =pro.ast_mangle(ast, options)= -- generates a new AST containing mangled |
| (compressed) variable and function names. It supports the following |
| options: |
| |
| - =toplevel= -- mangle toplevel names (by default we don't touch them). |
| - =except= -- an array of names to exclude from compression. |
| |
| - =pro.ast_squeeze(ast, options)= -- employs further optimizations designed |
| to reduce the size of the code that =gen_code= would generate from the |
| AST. Returns a new AST. =options= can be a hash; the supported options |
| are: |
| |
| - =make_seqs= (default true) which will cause consecutive statements in a |
| block to be merged using the "sequence" (comma) operator |
| |
| - =dead_code= (default true) which will remove unreachable code. |
| |
| - =pro.gen_code(ast, options)= -- generates JS code from the AST. By |
| default it's minified, but using the =options= argument you can get nicely |
| formatted output. =options= is, well, optional :-) and if you pass it it |
| must be an object and supports the following properties (below you can see |
| the default values): |
| |
| - =beautify: false= -- pass =true= if you want indented output |
| - =indent_start: 0= (only applies when =beautify= is =true=) -- initial |
| indentation in spaces |
| - =indent_level: 4= (only applies when =beautify= is =true=) -- |
| indentation level, in spaces (pass an even number) |
| - =quote_keys: false= -- if you pass =true= it will quote all keys in |
| literal objects |
| - =space_colon: false= (only applies when =beautify= is =true=) -- whether |
| to put a space before the colon in object literals |
| - =ascii_only: false= -- pass =true= if you want to encode non-ASCII |
| characters as =\uXXXX=. |
| - =inline_script: false= -- pass =true= to escape occurrences of |
| =</script= in strings |
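
To make the last two options concrete, here is a rough sketch of the kind of escaping they describe. The helper names =toAsciiOnly= and =escapeInlineScript= are hypothetical, and the exact patterns UglifyJS uses internally may differ; the sketch only shows the effect on the text of a string literal.

```javascript
// Assumed behavior of `ascii_only`: replace every non-ASCII UTF-16 code unit
// with a \uXXXX escape, so the output contains ASCII characters only.
function toAsciiOnly(str) {
    return str.replace(/[\u0080-\uffff]/g, function (ch) {
        var hex = ch.charCodeAt(0).toString(16);
        while (hex.length < 4) hex = "0" + hex;
        return "\\u" + hex;
    });
}

// Assumed behavior of `inline_script`: break up "</script" so an HTML parser
// does not close the surrounding <script> tag early. Inside a JavaScript
// string literal, "\/" still evaluates to "/".
function escapeInlineScript(code) {
    return code.replace(/<\/script/gi, "<\\/script");
}

console.log(toAsciiOnly("caf\u00e9"));              // caf\u00e9
console.log(escapeInlineScript('"</script>"'));     // "<\/script>"
```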
| |
| *** Beautifier shortcoming -- no more comments |
| |
| The beautifier can be used as a general purpose indentation tool. It's |
| useful when you want to make a minified file readable. One limitation, |
| though, is that it discards all comments, so you don't really want to use it |
| to reformat your code, unless you don't have, or don't care about, comments. |
| |
| In fact it's not the beautifier that discards comments: they are dropped at |
| the parsing stage, when we build the initial AST. Comments don't really |
| make sense in the AST, and while we could add nodes for them, it would be |
| inconvenient because we'd have to add special rules to ignore them at all |
| the processing stages. |
| |
| ** Compression -- how good is it? |
| |
| Here are updated statistics. (I also updated my Google Closure and YUI |
| installations). |
| |
| We're still a lot better than YUI in terms of compression, though slightly |
| slower. We're still a lot faster than Closure, and compression after gzip |
| is comparable. |
| |
| | File | UglifyJS | UglifyJS+gzip | Closure | Closure+gzip | YUI | YUI+gzip | |
| |-----------------------------+------------------+---------------+------------------+--------------+------------------+----------| |
| | jquery-1.6.2.js | 91001 (0:01.59) | 31896 | 90678 (0:07.40) | 31979 | 101527 (0:01.82) | 34646 | |
| | paper.js | 142023 (0:01.65) | 43334 | 134301 (0:07.42) | 42495 | 173383 (0:01.58) | 48785 | |
| | prototype.js | 88544 (0:01.09) | 26680 | 86955 (0:06.97) | 26326 | 92130 (0:00.79) | 28624 | |
| | thelib-full.js (DynarchLIB) | 251939 (0:02.55) | 72535 | 249911 (0:09.05) | 72696 | 258869 (0:01.94) | 76584 | |
| |
| ** Bugs? |
| |
| Unfortunately, for the time being there is no automated test suite. But I |
| ran the compressor manually on non-trivial code, and then I tested that the |
| generated code works as expected. A few hundred times. |
| |
| DynarchLIB was started in times when there was no good JS minifier. |
| Therefore I was quite religious about trying to write short code manually, |
| and as such DL contains a lot of syntactic hacks[1] such as |
| "foo == bar ? a = 10 : b = 20", though the more readable version would |
| clearly be to use "if/else". |
| |
| Since the parser/compressor runs fine on DL and jQuery, I'm quite confident |
| that it's solid enough for production use. If you can identify any bugs, |
| I'd love to hear about them ([[http://groups.google.com/group/uglifyjs][use the Google Group]] or email me directly). |
| |
| [1] I even reported a few bugs and suggested some fixes in the original |
| [[http://marijn.haverbeke.nl/parse-js/][parse-js]] library, and Marijn pushed fixes literally in minutes. |
| |
| ** Links |
| |
| - Twitter: [[http://twitter.com/UglifyJS][@UglifyJS]] |
| - Project at GitHub: [[http://github.com/mishoo/UglifyJS][http://github.com/mishoo/UglifyJS]] |
| - Google Group: [[http://groups.google.com/group/uglifyjs][http://groups.google.com/group/uglifyjs]] |
| - Common Lisp JS parser: [[http://marijn.haverbeke.nl/parse-js/][http://marijn.haverbeke.nl/parse-js/]] |
| - JS-to-Lisp compiler: [[http://github.com/marijnh/js][http://github.com/marijnh/js]] |
| - Common Lisp JS uglifier: [[http://github.com/mishoo/cl-uglify-js][http://github.com/mishoo/cl-uglify-js]] |
| |
| ** License |
| |
| UglifyJS is released under the BSD license: |
| |
| #+BEGIN_EXAMPLE |
| Copyright 2010 (c) Mihai Bazon <[email protected]> |
| Based on parse-js (http://marijn.haverbeke.nl/parse-js/). |
| |
| Redistribution and use in source and binary forms, with or without |
| modification, are permitted provided that the following conditions |
| are met: |
| |
| * Redistributions of source code must retain the above |
| copyright notice, this list of conditions and the following |
| disclaimer. |
| |
| * Redistributions in binary form must reproduce the above |
| copyright notice, this list of conditions and the following |
| disclaimer in the documentation and/or other materials |
| provided with the distribution. |
| |
| THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER “AS IS” AND ANY |
| EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE |
| IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR |
| PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER BE |
| LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, |
| OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, |
| PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR |
| PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY |
| THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR |
| TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF |
| THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF |
| SUCH DAMAGE. |
| #+END_EXAMPLE |