A JavaScript Modules Manager That Fits in a Tweet

29 Oct 2015 – experiments, javascript, packing

ES6 does provide modules, but unless you’re using Babel you’ll have to rely on third-party libraries such as RequireJS until all major browsers support them.

I use D3 every day to visualize data about ego networks and have a small (400-500 SLOC) JavaScript codebase I need to keep organized. In the context I work in, I must keep things simple because I won’t always be there to maintain the code I’m writing today.

How simple could a module implementation possibly be? It should at least be able to register modules and require a module inside another, much like Python’s import. It should also handle issues like circular dependencies (e.g. foo requires bar which requires foo) and undeclared modules. Modules should be lazily loaded, i.e. executed only when they are required, and requiring the same module twice shouldn’t execute it twice.

Well, here is one:

p={f:{},m:{},r:function(a,b){p.f[a]=b},g:function(a){if(!p.m[a]){if(p.m[a]<1|!p.f[a])throw"p:"+a;p.m[a]=0;p.m[a]=p.f[a](p)}return p.m[a]}};

It’s 136 bytes long, or 139 if you count the variable definition. At this size you can’t expect long function names, but here is a usage example:

// register a "main" module. A module consists of a name and a
// function that takes an object used to require other modules.
p.r("main", function(r) {
    // get the "num" module and store it in a `num` variable
    var num = r.g("num");

    // use it to print something
    console.log(num.add(20, 22));
});

// register a "num" module
p.r("num", function(r) {
    // a module can export bindings by returning an object
    return {
        add: function(a, b) { return a+b; },
    };
});

// call the "main" module
p.g("main");

This code will print 42 in the console. It only uses two modules but the implementation works with an arbitrary number of modules. A module can depend on any number of other modules that can be declared in an arbitrary order.

Consider this example:

p.r("m1", function(r) { r.g("m2"); });
p.r("m2", function(r) { r.g("m3"); });
p.r("m3", function(r) { r.g("m1"); });

p.g("m1");

m1 depends on m2 which depends on m3 which itself depends on m1. The implementation won’t die in an endless loop leading to a stack overflow but will fail as soon as it detects the loop:

p:m1

Admittedly this error message doesn’t give us much information, but we have to be thrifty in order to fit under 140 characters. The prefix p: tells you the error comes from p, and the part after it is the faulty module. The fault is either a wrong name (the module doesn’t exist) or a circular dependency.

Walk-through

Note: don’t use this at home. This is just an experiment; I eventually used Browserify for my project.

We need an object to map modules to their functions; we’ll populate it on calls to register. We need another object to store the result of their function call; i.e. what they export. I added a third object to “lock” a module while it’s executed in order to detect circular dependencies.

We’ll have something like this:

var p = {
    _fn: {},   // the functions
    _m: {},    // the modules’ exported values
    _lock: {}, // the locks

    register: function(name, callback) {
        // add the function in the object
        p._fn[name] = callback;
    },

    get: function(name) {
        // if we have a value for this module let’s return it.
        // Note that we should use `.hasOwnProperty` here
        // because this’ll fail if the module returns a falsy
        // value. This is not really important for this problem.
        if (p._m[name]) {
            return p._m[name];
        }

        // if it’s locked that’s because we’re already getting
        // it; so there’s a recursive requirement
        if (p._lock[name]) {
            throw "Recursive requirement: '" + name + "'";
        }

        // if we don’t have any function for this we can’t
        // execute it and get its value. See also the
        // remark about `.hasOwnProperty` above.
        if (!p._fn[name]) {
            throw "Unknown module '" + name + "'";
        }

        // we lock the module so we can detect circular
        // requirements.
        p._lock[name] = true;

        try {
            // execute the module's function and pass
            // ourselves to it so it can require other
            // modules with p.get.
            p._m[name] = p._fn[name](p);
        } finally {
            // ensure we *always* remove the lock.
            delete p._lock[name];
        }

        // return the result
        return p._m[name];
    },
};

This works and is pretty short; but that won’t fit in a Tweet ;)

Let’s compact the exceptions into one because those strings take up a lot of space:

if (p._lock[name] || !p._fn[name]) {
    throw "Module error: " + name;
}

The error is less explicit but we’ll accept that here.

We try to write as little code as possible, then use YUI Compressor to remove the whitespace and rename the variables. This means we can still work with (mostly) readable code and let YUI Compressor do the rest for us.

I measure the final code size with the following command:

yuicompressor p.js | wc -c

Right now we have 240 bytes. We need a way to remove 100 bytes. Let’s rename the attributes. _fn becomes f; _m becomes m, _lock becomes l and the public methods are reduced to their first letter. We can also remove the var since p will be global anyway. Let’s also reduce the error message prefix to "p:".

p = {
    f: {}, m: {}, l: {},

    r: function(name, callback) { p.f[name] = callback; },

    g: function(name) {
        if (p.m[name]) {
            return p.m[name];
        }

        if (p.l[name] || !p.f[name]) {
            throw "p:" + name;
        }

        p.l[name] = true;

        try {
            p.m[name] = p.f[name](p);
        } finally {
            delete p.l[name];
        }

        return p.m[name];
    },
};

That’s 186 bytes once compressed. Not bad! Note that the same line appears twice in the g function (previously known as “get”):

return p.m[name];

We can invert the first if condition and fit the whole code in it; combining both returns into one. This is equivalent to transforming this code:

function () {
    if (A) {
        return B;
    }

    // ...
    return B;
}

Into this one:

function () {
    if (!A) {
        // ...
    }

    return B;
}

The first form is usually preferable because it removes one indentation level from the function body. But here return is a keyword we can’t compress, so keeping only one of them makes the second form shorter.

Speaking of keywords we can’t compress, how could we remove the delete? All we care about is knowing whether there’s a lock or not, so we can set the value to false instead, at the expense of a little more memory. This saves us only one byte, but since we only care about the truthiness of the value we can replace true with 1 and false with 0.

We’re now at 166 bytes and the g function looks like this:

function(name) {
    if (!p.m[name]) {
        if (p.l[name] || !p.f[name]) {
            throw "p:" + name;
        }

        p.l[name] = 1;

        try {
            p.m[name] = p.f[name](p);
        } finally {
            p.l[name] = 0;
        }
    }

    return p.m[name];
}

Now, what if we tried to remove one of the three objects we’re using? We need to keep the functions and the results in separate objects but we might be able to remove the locks object without losing the functionality.

Assuming that modules only return objects, let’s merge m and l. We’ll set p.m[A] to 0 while module A is locked and then overwrite the lock with the result. p.m[A] then has the following possible values:

undefined: the key doesn’t exist; the module hasn’t been required yet
0: the module is currently being executed
something else: the module has already been executed; we have its return value

We need to modify our code a little bit for this:

function(name) {
    if (!p.m[name]) {
        if (p.m[name] === 0 || !p.f[name]) {
            throw "p:" + name;
        }

        p.m[name] = 0;
        p.m[name] = p.f[name](p);
    }

    return p.m[name];
}

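Note that this allowed us to get rid of the try/finally, which lets us go down to 143 bytes. We can already save two bytes by using < 1 instead of === 0: under the assumption above, p.m[name] is either 0 or undefined inside this branch, and undefined < 1 is false because undefined converts to NaN.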

Replacing || (logical OR) with | (bitwise OR) saves one more byte and allows us to fit in 140 bytes! We can go further and remove the braces of the inner if since it only contains one statement. We need to do that after the compression because YUI Compressor adds braces if they’re missing.

The final code looks like this:

p = {
    f: {}, m: {},

    r: function(name, callback) { p.f[name] = callback; },

    g: function(name) {
        if (!p.m[name]) {
            if (p.m[name] < 1 | !p.f[name])
                throw "p:" + name;

            p.m[name] = 0;
            p.m[name] = p.f[name](p);
        }

        return p.m[name];
    },
};

That’s 139 bytes once compressed! You can see the result at the top of this blog post.
Please add a comment below if you think of any way to reduce this further while preserving all existing features.

Thank you for reading!

•

Preventing Bash Pranks

17 Jan 2015 – bash, cli, tips

The easiest and most popular Bash pranks involve someone messing with your ~/.bashrc. Here is a real-life example:

#! /bin/bash
b=~/.bashrc
echo >>$b
echo "echo sleep 1 >>$b" >>$b

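If you execute this script, it’ll add a newline to your ~/.bashrc just in case it doesn’t end with one, then add this line (in the actual file, ~ appears expanded to your home directory, because Bash expands it when b is assigned):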

echo sleep 1 >>~/.bashrc

The effect of this isn’t immediately visible to the pranked user. When they start a new Bash session, e.g. by opening a new terminal window, the code in ~/.bashrc will be executed, and the previous line will add sleep 1 at the end of it, which means it’ll be executed and the user will have to wait one more second before getting their prompt. The next time they open a session, it’ll add one more line and they’ll wait 2 seconds, and so forth.

In this post, I’ll give you an overview of the existing solutions to prevent these pranks.

Note that I’m referring to ~/.bashrc as your startup Bash file because it’s commonly used, but some people directly use ~/.bash_profile or another file instead. When you start a login session, Bash reads /etc/profile, then the first file it finds among ~/.bash_profile, ~/.bash_login, and ~/.profile (in that order). In most environments the default ~/.bash_profile file sources ~/.bashrc.

User Rights

The first solution is to protect your ~/.bashrc by restricting access to it. Nobody should be able to edit your file except you (and root). That should be the default, but if you messed with the permissions, here is how to reset the file to a safe state (read and write for you, and that’s all):

$ chmod 600 ~/.bashrc

Most attacks thus involve you executing a script, which bypasses these permissions because the script runs as you, with your write access.

One solution would be to remove your own write permission and add it back only when you need it:

# add this in your ~/.bashrc
secure-edit() { chmod u+w "$@"; ${EDITOR:-vi} "$@"; chmod u-w "$@"; }

Then remove your write permission:

$ chmod 400 ~/.bashrc

You can’t edit your file anymore, but you can use your new secure-edit command:

$ secure-edit ~/.bashrc

It temporarily allows you to modify the file, opens your editor, then puts the restricted permissions back.

The “last line protection”

This one is easy to use but easy to circumvent. The goal is to prevent one-line insertions, such as:

echo "alias ls=cd" >> ~/.bashrc

and the solution is as simple as:

#

Yes, that’s just a hash symbol. If you end your ~/.bashrc with it, without a newline after it, the first inserted line will be commented out:

#alias ls=cd

It doesn’t work if the prankster adds multiple lines, or adds a newline before the prank.

`return`

You can exit from a Bash script with exit. Your ~/.bashrc is not executed like a script; it’s sourced. This means Bash doesn’t start a subshell for it and executes all its content in the current shell. This also means that if you write exit, it’ll exit your current shell.

The solution here is to use return at the end of your file:

return

Any line added after this one won’t be executed because Bash will stop the evaluation. Note that while it’s better than the previous solution, it can be nullified by a sed call (e.g. sed 's/return//').

The disguised `return`

This one is the same as the previous one, but prevents pranksters from removing it with calls to sed or similar search & replace techniques. It uses the fact that in Bash you can execute a command contained in a variable by using it at the proper place:

print_something=echo
$print_something hello

These lines are equivalent to echo hello. We use the same thing here with return. The idea is to execute an obfuscated version of return, e.g.:

b=tu
a=re
c=rn

$a$b$c

And voilà! It’s now nearly impossible to find and remove the return automatically; the prankster would have to edit the ~/.bashrc file manually.

This is still vulnerable to file replacement, e.g.:

rm ~/.bashrc
echo 'echo "sorry, no."' > ~/.bashrc

This wipes the existing ~/.bashrc file and replaces it with another one.

•

A Python toolbox

11 Nov 2014 – cli, python, tips

I started learning Python four years ago and have been heavily programming with it for nearly a year now. In this post I’ll share some tools I use to ease and speed up my workflow, either in the Python code or in the development environment.

These tips should be OS- and editor-independent. I have some useful Vim plugins to work with Python, but that’ll be for another post. You might have to adapt commands if you work on Windows.

Setting Things Up

Let’s say you’re starting a Python library. You have a couple of dependencies, and you’d like it to work on multiple versions, like 2.6, 2.7, and 3.x. How can you test that? You have to (a) manage your dependencies without messing with your user environment, and (b) test with multiple Python versions.

Introducing virtualenv.

Virtualenv lets you create isolated Python environments. That means you get a pristine environment where you’ll install your library’s dependencies, and nothing else. It’ll be independent of your user space. It’s important to work with an isolated environment because you don’t know which environment your users will have, so you shouldn’t make any assumption besides your own requirements. Working with your user environment means you might forget a dependency: everything works because it’s installed on your computer, but it’ll break when run on another computer without this dependency.

You should be able to install it with pip:

$ pip install virtualenv

It needs a directory to store the environment. I usually use venv, but you can choose whatever you want:

$ virtualenv venv

You can then either “activate” the environment, which adds its bin directory in your PATH:

$ source venv/bin/activate
$ python  # that's your virtualenv's Python
$ deactivate
$ python  # that's your Python

or prefix your commands with venv/bin/ (replace venv with your directory’s name):

$ venv/bin/python  # that's your virtualenv's Python
$ python  # that's your Python

I usually do the latter. Install dependencies in the environment:

$ venv/bin/pip install your-dependency

Don’t forget to tell Git or any tool you’re using for versioning to ignore this venv directory. It can take up some space (from 12MB to more than 40MB) depending on the number of dependencies you’re relying on.

To remove an environment, just delete its directory:

$ rm -rf venv

This is a convenient way to save space on your computer if you have dozens of environments for different projects, especially if you can quickly re-create any environment with its dependencies.

If you’re using pip to manage them, you should know you can install dependencies not only from the command line, but also from a file, with one dependency per line. Each one of them is processed as if it were given on the command-line.

For example, you could have a file containing this:

colorama==0.2.7
coverage==3.7.1
pep8==1.4.6
py==1.4.20
tox==1.7.0
argparse>=1.1
virtualenv==1.11.4

This file is usually called requirements.txt, but here again you can call it whatever you want. You can install all of these at once with this command:

$ pip install -r requirements.txt

But we’re programmers and we’re lazy, we don’t want to track down each installed library to include it in this file.

Here comes pip freeze.

pip freeze outputs all installed libraries with their version. It can be put in our requirements.txt for later use:

$ pip freeze > requirements.txt

This requirements.txt file becomes handy when used with virtualenv because you’re now able to fire up a new environment and install all required libraries, with two commands:

$ virtualenv venv
$ venv/bin/pip install -r requirements.txt

Note that these are libraries used in the environment, not necessarily your library’s dependencies. In the above example you can see we’re installing coverage and pep8, which respectively are a code coverage test tool and a lint one we’ll talk about later in this post, not libraries we’re depending on here.

You should thus add this file to your public repository, because it gives anyone the information they need to mirror your environment and contribute to your project.

Coding

Now that your local environment is ready, you can start coding your library. You’ll often have to fire up an interpreter to test some things, use help() to check a function’s arguments, etc. Having to type the same things over and over takes time, and remember: we’re lazy.

Like Bash and some other tools, the Python interpreter can be configured with a user file, whose path is given by the PYTHONSTARTUP environment variable. It allows you to add autocompletion, import common modules, and execute pretty much any code you want.

Start by setting PYTHONSTARTUP in your ~/.bashrc (if you’re using Bash):

export PYTHONSTARTUP="$HOME/.pythonrc.py"

Here we’re telling the interpreter to look for the $HOME/.pythonrc.py file and execute it before giving us a prompt.

Let’s initialize this file with autocompletion support:

try:
    import readline
except ImportError:
    pass
else:
    import rlcompleter
    readline.parse_and_bind("tab: complete")

You can add a lot more stuff in this file, like history support, colored prompts or common imports. For example, if you use sys and re a lot, you can save time by adding them in your startup file:

import sys
import re

You won’t need to type these two lines anymore in your interpreter. It doesn’t change how Python executes files, just the interactive interpreter.
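History support, mentioned above, could look something like the following. This is only a minimal sketch, not my actual startup file: the history file name (~/.python_history_example) and the setup_history and save_history helpers are made up for illustration, and it assumes the readline module is available on your platform.

```python
import atexit
import os

try:
    import readline
except ImportError:
    readline = None  # e.g. on a platform without readline

# Hypothetical location; any writable path works.
HISTORY_PATH = os.path.expanduser("~/.python_history_example")


def save_history(path):
    """Write the current session's history, ignoring I/O errors."""
    try:
        readline.write_history_file(path)
    except (IOError, OSError):
        pass


def setup_history(path=HISTORY_PATH):
    """Load past history now and save it again when the interpreter exits.

    Returns True if history support was enabled, False otherwise.
    """
    if readline is None:
        return False
    try:
        readline.read_history_file(path)
    except (IOError, OSError):
        pass  # no history file yet: it will be created on exit
    atexit.register(save_history, path)
    return True


setup_history()
```

With this in your startup file, the up arrow should recall commands typed in previous interpreter sessions.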

Testing

Four different kinds of tests are covered by this part: style checkers to ensure your code’s consistency, static analysis tools to detect problems before executing the code, unit tests to actually test your library, and code coverage tests to ensure you do test all the code.

Style Checking

These are tools which help you maintain a consistent coding style in your whole codebase. Most of these tools are easy to use, the hardest part being to choose which one fits your requirements.

Python has a PEP (Python Enhancement Proposal, a sort of RFC), PEP 8, dedicated to its coding conventions. If you want to follow it, a command-line tool, aptly named pep8, is available.

$ venv/bin/pip install pep8
$ venv/bin/pep8 your/lib/root/directory
...
foo/mod.py:84:80: E501 line too long (96 > 79 characters)
foo/mod.py:85:6: E203 whitespace before ':'
foo/mod.py:86:80: E501 line too long (87 > 79 characters)
foo/mod.py:87:4: E121 continuation line indentation is not a multiple of four
...

It’ll check each file and print a list of warnings. You can choose to hide some of them, or use a white list to decide which ones you want. It’s a good tool if you want to follow the PEP 8 conventions.

Another highly customizable tool is pylint. It reads its configuration from a file in your project, which can inherit from global and user configurations. It’ll warn you about bad naming, missing docstrings, functions which take too many arguments, duplicated code, etc. It also gives you some statistics about your code. It’s really powerful but can be a pain if you don’t configure it. For example, it warns you about one-letter variables while you might find them ok.

Enter prospector.

Prospector is built on top of pep8 and pylint and comes with sane defaults regarding the pickiness of both tools. You can tell it to only print important problems about your code:

$ venv/bin/pip install prospector
$ venv/bin/prospector --strictness high

You’ll get a much shorter output, which will hopefully help you find potential problems in your code.

Static Analysis

Here, we’re talking about analysing the code without executing it. Compiled languages get some of this for free at compilation time, but interpreted languages like Python have no such step, so we need dedicated tools.

One of the most popular tools for static analysis on Python code is Pyflakes. It doesn’t check your coding style like pep8 or pylint, but warns you about missing imports, dead code, unused variables, redefined functions and more. You can work without style checkers, but static analysis is really helpful to detect potential bugs before actually running the code.
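To give an idea of what “analysing the code without executing it” means, here is a toy sketch built on the standard ast module. It only reports imported names that are never used, and it is much cruder than Pyflakes (it is not how Pyflakes is implemented); the unused_imports function and the SOURCE sample are made up for illustration.

```python
import ast

# The code to analyse is just text: it is parsed, never executed.
SOURCE = '''
import os
import sys

def main():
    print(sys.argv)
'''


def unused_imports(source):
    """Return a sorted list of (line, name) for imports that are never used."""
    tree = ast.parse(source)
    imported = {}  # bound name -> line number of the import
    used = set()   # every bare name referenced anywhere in the code

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import os.path` binds the name `os`
                name = (alias.asname or alias.name).split(".")[0]
                imported[name] = node.lineno
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported[alias.asname or alias.name] = node.lineno
        elif isinstance(node, ast.Name):
            used.add(node.id)

    return sorted(
        (line, name) for name, line in imported.items() if name not in used
    )


print(unused_imports(SOURCE))  # [(2, 'os')]
```

The sample never runs main(), yet the unused os import is still found, which is the whole point of static analysis.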

Pyflakes can be integrated in editors like Vim with Syntastic, but its command-line usage is as easy as the previous tools:

$ venv/bin/pip install pyflakes
$ venv/bin/pyflakes your/directory

Prospector, mentioned in the previous section, also includes pyflakes. You might also want to try Flake8, which combines pyflakes and pep8.

Unit Tests

When talking about testing, we usually think of unit testing, which is testing small pieces of our code at a time, to make sure everything is working correctly. The goal is to test only one feature at a time, to quickly be able to find which parts of the code are not working. There are a lot of great testing frameworks, and Python comes with a built-in one, unittest, which I personally use. I won’t cover these, and I’ll instead cover the case when you need to test on multiple Python versions, which is often the case when you plan to release a public library. You obviously don’t want to manually switch to each Python version, install your dependencies, then run your test suite each time.

This is a job for tox.

Tox uses Virtualenv, which I talked about earlier, to create standalone Python environments for different Python versions, and test your code in each one of them.

$ venv/bin/pip install tox

Like some previous tools, it needs a configuration file. Here is a basic tox.ini to test on Python 2.6, 2.7, 3.3 and 3.4:

[tox]
envlist = py26, py27, py33, py34
downloadcache = {toxworkdir}/_download/

[testenv]
sitepackages = False
deps =
  colorama
commands =
  {envpython} {toxinidir}/tests/test.py

It declares one dependency, colorama, and tells tox to run tests by executing tests/test.py. That’s all. We can then run our tests:

$ venv/bin/tox

It’ll take some time on the first run to fetch dependencies and create environments, but all the following runs will be faster.

Like virtualenv, tox uses a directory to store these environments. You can safely delete it if you need more space, it’ll be re-created by tox the next time:

$ rm -rf .tox
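The tox.ini above runs tests/test.py, which this post doesn’t show. As a rough idea of what such a file could contain, here is a hypothetical minimal version based on unittest. The add function is a stand-in for something you would import from your own library, and run_tests is a helper name made up for this sketch.

```python
import sys
import unittest


def add(a, b):
    """Stand-in for a function you would import from your library."""
    return a + b


class AddTest(unittest.TestCase):
    def test_adds_numbers(self):
        self.assertEqual(add(20, 22), 42)

    def test_concatenates_strings(self):
        self.assertEqual(add("foo", "bar"), "foobar")


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AddTest)
    return unittest.TextTestRunner(verbosity=1).run(suite)


if __name__ == "__main__":
    # A non-zero exit status is how tox learns that the tests failed.
    if not run_tests().wasSuccessful():
        sys.exit(1)
```

Since tox runs this same file with each interpreter listed in envlist, it should stick to syntax that all of those versions understand.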

Code Coverage

This last part about testing talks about code coverage tests. These are tests about tests. The goal here is to ensure your tests cover all your code, and you don’t leave some parts untested. Most tools tell you how many lines were executed when running your test suite, and give you an overall coverage percentage.

One of them is coverage.

$ venv/bin/pip install coverage

Give it your project’s root directory as well as a file to run your tests:

$ venv/bin/coverage run --source=your/directory tests/test.py

It’ll run them. You can then ask for a nice coverage report:

$ venv/bin/coverage report -m
Name              Stmts   Miss  Cover   Missing
-----------------------------------------------
foobar/__init__       5      0   100%
foobar/base          46      2    96%   5-6
foobar/cli          159    159     0%   3-280
foobar/config        66     66     0%   3-150
foobar/session       64     14    78%   9-10, 106-124
foobar/barfooqux     22      0   100%
foobar/helloworld    25     20    20%   14-41
...
-----------------------------------------------
TOTAL               459    313    32%

It can also give you an HTML version:

$ venv/bin/coverage html

Getting to 100% is the ultimate goal, but you’ll quickly find that the first 80% are easy and the remaining 20% are the hardest part, especially when you have I/O, external dependencies like databases, and/or complicated corner-cases.

Check Coverage’s doc for more info.

Debugging

There are a lot of ways to debug, including logs, but the simplest debugging tool is the good old print. It becomes really impractical when you have to restart your server every time you add or remove one of them from your code. What if you could fire up a Python interpreter right from your code and inspect it while it’s running? Well, you can do that with Python’s code module! This trick is really handy, and I’ve been using it heavily instead of these prints we write everywhere since I discovered it.

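The code module provides you with an interact function, which takes a dict of variables to inject in the interpreter. You’ll be able to print them and play with them, but since we’ll pass it a copy, rebinding a variable in the interpreter won’t be reflected in the program you’re debugging (mutating a shared object, e.g. appending to a list, still will be).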

Remember that Python lets you get all global variables as a dict with globals() and all local ones with locals(). We thus start by creating a mirror of the local environment:

vars = globals().copy()
vars.update(locals())

These two lines get all global variables (including imported modules) in a dict called vars and add local variables in it. This can then be passed directly to the interpreter:

import code
code.interact(local=vars)

This will start an interpreter with all these variables already available in it. There’s nothing to install, this is a standard Python module.
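One subtlety of working on a copy of the variables: the copy is shallow. Rebinding a name in the interpreter leaves the program untouched, but the objects themselves are shared, so mutating one of them is visible on both sides. A small sketch of that behaviour, without any interpreter involved (the function and variable names are made up for illustration):

```python
def snapshot_demo():
    counter = 1
    names = ["foo"]

    # Same idea as the mirror above: a new dict holding the current variables.
    env = dict(locals())

    env["counter"] = 2          # rebinds the name in the copy only
    env["names"].append("bar")  # mutates the list both sides point to

    return counter, names


print(snapshot_demo())  # (1, ['foo', 'bar'])
```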

You can even inline the code and add it in your favorite snippets manager:

g=globals().copy();g.update(locals());import code;code.interact(local=g)

Don’t forget to remove it when you’re done!
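If you’d rather not paste the one-liner everywhere, the same idea can be wrapped in a small helper that looks at its caller’s frame. This is a sketch under assumptions: namespace_of, debug_here and compute are names made up for this example, and it relies on inspect.currentframe(), which may return None on Python implementations other than CPython.

```python
import code
import inspect


def namespace_of(frame):
    """Merge a frame's globals and locals into one new dict (locals win)."""
    namespace = dict(frame.f_globals)
    namespace.update(frame.f_locals)
    return namespace


def debug_here():
    """Open an interactive console on the caller's variables."""
    # f_back is the frame of whoever called debug_here().
    caller = inspect.currentframe().f_back
    code.interact(local=namespace_of(caller))


def compute():
    total = 20 + 22
    # debug_here()  # uncomment to inspect `total` from a prompt
    return total
```

Calling debug_here() anywhere in your code then behaves like the inlined version, with a single line to remove afterwards.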

TL;DR

Use virtualenv to isolate your Python environment, pip freeze and a requirements.txt file to keep track of your dependencies.
Write a pythonrc.py file to add autocomplete support to your interpreter
Use pep8, pylint and pyflakes to keep a high code quality
Use tox to test on multiple Python versions
Fire a local interpreter instead of printing variables

That was all. Please comment on this post if you think of any tool you use to speed up your Python development!

•
