Mutation Testing with Stryker in an AngularJS 1.6 App

In my current project I’m working on a large and messy Angular front-end. We are frequently making small changes, and bugs keep creeping into the site. Lately this has had me wondering about some questions:

How do we make sure that our unit tests are testing the right things? And how do we find out whether there is untestable code in the application, or whether there are tests that always pass no matter what?

If you think the answer is measuring code coverage, then I have to disappoint you: code coverage only counts which lines of code are executed while the unit tests run. It does not tell you which lines are actually tested, let alone how thorough the tests are.
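To make the problem concrete, here is a contrived sketch (hypothetical code, not from my project): the test below executes every line of add, so the coverage report shows 100 percent for it, yet the test would still pass if add were completely broken.

// a hypothetical function under test
function add(a, b) {
  var result = a + b;
  return result;
}

// this spec produces full line coverage for add() but verifies nothing:
it('exercises add without really testing it', function () {
  add(1, 2);               // every line of add runs...
  expect(true).toBe(true); // ...but the assertion is meaningless
});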

So how do we solve this problem? One potential answer is called mutation testing. What’s behind the interesting name? A mutation testing library takes the code that you are trying to test… and literally screws it up. It will, for example, remove function bodies or swap logical and binary operators for their counterparts. Each such change is called a mutation.

It then runs your tests against those destroyed versions of the code. Why does it do that, and what should happen? Each mutated piece of code should cause at least one test to fail. If all the tests still pass although a significant part of the code was altered, then you know something is wrong with your tests. It’s essentially a test suite for your test suite.

To illustrate the concept I’ve created a diagram:

[Diagram: the Stryker mutation testing principle]

To summarise: a tree of different so-called mutants of our code is generated. Borrowing terms from genetics: if at least one test fails for a particular mutant, we say the mutant is “killed” by the test suite; conversely, if all tests pass, we say the mutant survived. Our task is then to “kill” it by writing a better or an additional test.

The concept of mutation testing itself has been around for decades, but it used to be too computationally expensive to make practical use of it. Now, with fast and parallel computing power at our hands, we can give it another try… Among the several mutation testing libraries for JavaScript there is one that is quite mature: Stryker.

Stryker actually incorporates the technique of measuring code coverage to determine which unit tests need to be run against which mutants.

Examples please!

Of course, right ahead… I wanted to give mutation testing a try but I couldn’t find any running example Angular applications with Stryker online, so I created one myself.

Apart from a couple of hiccups due to the still thin documentation on the web, Stryker was easy to install: add a couple of dependencies to package.json and a stryker.conf.js to the project. In stryker.conf.js we define that we want to use Jasmine as the test framework and Karma as the test runner.

Stryker will then import karma.conf.js to determine the test settings. An important thing to highlight: the list of files to mutate is defined in stryker.conf.js, but the list of files needed to execute the tests is imported from karma.conf.js.
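For orientation, here is roughly what such a stryker.conf.js can look like. This is a hand-written sketch assuming the Stryker 0.x configuration style; option names (karmaConfigFile in particular) differ between Stryker and stryker-karma-runner versions, so treat it as illustrative rather than copy-paste ready:

// stryker.conf.js - a minimal sketch, not verbatim from the demo project
module.exports = function (config) {
  config.set({
    mutate: ['app.js'],               // the files Stryker is allowed to mutate
    testRunner: 'karma',              // run the tests through Karma...
    testFramework: 'jasmine',         // ...with Jasmine as the test framework
    karmaConfigFile: 'karma.conf.js', // assumption: reuse the Karma settings (and its file list)
    reporter: ['clearText', 'html']   // console output plus the HTML summary
  });
};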

So what does it look like when Stryker is run and how can we make its magic visible? Let’s run

node_modules/.bin/stryker run --logLevel debug stryker.conf.js

which will give us an actual output of what is mutated and how:

[DEBUG] ClearTextReporter - Mutant killed!
[DEBUG] ClearTextReporter - /Users/daten/Sites/AngularStrykerDemo/app.js: line 9:40
[DEBUG] ClearTextReporter - Mutator: BlockStatement
[DEBUG] ClearTextReporter - -               $ctrl.add = function (a, b) {
[DEBUG] ClearTextReporter - -                 var result = a + b;
[DEBUG] ClearTextReporter - -                 return result;
[DEBUG] ClearTextReporter - -               };
[DEBUG] ClearTextReporter - +               $ctrl.add = function (a, b) {
[DEBUG] ClearTextReporter - +   };
[DEBUG] ClearTextReporter - Ran all tests for this mutant.
[DEBUG] ClearTextReporter - Mutant killed!
[DEBUG] ClearTextReporter - /Users/daten/Sites/AngularStrykerDemo/app.js: line 10:27
[DEBUG] ClearTextReporter - Mutator: BinaryOperator
[DEBUG] ClearTextReporter - -                 var result = a + b;
[DEBUG] ClearTextReporter - +                 var result = a - b;
[DEBUG] ClearTextReporter -
[DEBUG] ClearTextReporter - Ran all tests for this mutant.
[DEBUG] ClearTextReporter - Mutant survived!
[DEBUG] ClearTextReporter -   /Users/daten/Sites/AngularStrykerDemo/app.js: line 15:27
[DEBUG] ClearTextReporter - Mutator: LogicalOperator
[DEBUG] ClearTextReporter - -                 var result = a && b;
[DEBUG] ClearTextReporter - +                 var result = a || b;

The first mutation deleted the entire body of our add method. The second exchanged a plus operator for a minus in that same method. Luckily, for both of these mutants at least one test failed, hence the Mutant killed! output.
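For reference, the two methods under test look roughly like this. I have reconstructed them from the diff output above, so the surrounding component and module names are my own invention:

// app.js (sketch) - the module and component names are guesses
angular.module('demo').component('calculator', {
  controller: function () {
    var $ctrl = this;
    $ctrl.add = function (a, b) {
      var result = a + b;
      return result;
    };
    $ctrl.and = function (a, b) {
      var result = a && b;
      return result;
    };
  }
});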

The third mutation – switching a logical and for an or – unfortunately was not caught by our unit tests, which is indicated by the Mutant survived! output. We can fix this by adding the following test in app.spec.js (it is currently commented out):

it('should return false', function () {
  expect($ctrl.and(false, true)).toEqual(false);
});

The next time we run Stryker, it will recognise that this new test fails for that particular modification (false || true evaluates to true, not false), and all will be good. No survivors.

To help you analyse the results, Stryker also gives you a handy summary in the form of an HTML page via the html reporter. And by the way, a full list of the mutators that Stryker has built in can be found here.

Summary

I hope that I have convinced you that mutation testing is a great idea. But how practical is it to use mutation testing in a production app? Unfortunately I don’t have the answer to that; I have only tried it in a sample app so far. Integrating it in a productive way into a live app would be a much bigger effort, and personally I am skeptical about the usefulness of doing so.

One problem that I see with using mutation testing in a large live application is the increased time needed to execute all the tests. From the diagram above we can see that the execution time of the test suite is multiplied by the number of mutations made to the code: every unit test is no longer run once, but once against every mutated version of the code.

Another problem that I see is the amount of time you’d have to spend fixing all your surviving mutants. Ideally this would result in great test quality, and that’s what we are after. But in my own experience it is hard enough to get a team of developers to write a test for every feature anyway. Getting them to also run mutations and fix another layer of breaking tests might meet a lot of resistance. But of course every project and team is different, so I can’t speak for everyone.

However, I think the concept is super interesting and can teach us a lot about how good tests are written. I really enjoyed my excursion into mutation testing and hope to someday use it in a way that goes beyond simple experiments. I’d be interested to hear about any experiences with mutation testing in production if there are people out there!

How does code coverage report generation with Istanbul work?

Recently I started working on a project that uses Istanbul to measure and report on the code coverage of the unit tests. I thought the tool was genius, but I asked myself: how does Istanbul actually track code coverage? It seemed like magic. Today I found the time to take a high-level look at the source code of the project, and I was able to shed some light on how it actually works.

Please note that this blog post is not meant to give you instructions on how to use Istanbul; I think there are plenty of those out there. Please consider this blog post a work in progress: the information in here might be up to 50 percent wrong (my own estimate). So far my inspection of the source code has been quite high-level, and here I document how I THINK it works. I will try to update this blog post once I spend more time looking at some of the details of the code. I also hope that some people will find this (googling “How do Javascript code coverage reports with Istanbul work” didn’t turn up any relevant results so far) and let me know if there is something wrong with my understanding.

Let’s start with a diagram of some of the core modules within Istanbul:

Diagram of Istanbul modules.

Now this diagram is certainly an outrageous oversimplification, but it will serve as a base for further discussion. Let’s take a closer look at those modules:

  • instrumenter: enriches the code that is to be tested with extra statements that log the coverage. For example, to count how many times a function is called, it inserts a counter that gets incremented in the first line of that function (see the sketch after this list).
  • walker: ‘walks’ through the code in the form of a tree.
  • esprima parser: an external library that is used to parse the code which is to be tested. Esprima spits out a syntax tree that can be walked by the walker.
  • code: well this is the code that is to be covered.
  • coverage object: at some point the code is executed, and through the instrumentation this execution is logged. The output of this process is the coverage object. It is simply a JSON structure with information about how many times the statements, functions and branches in the code have been executed (a simplified version appears in the sketch after this list).
  • collector: there might be several coverage objects generated for different parts or modules of the code. The collector takes these coverage objects and merges them into one coverage object that contains all the information.
  • reporter: takes the final coverage object and turns it into a human readable and condensed coverage report.
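To make the instrumenter and the coverage object more tangible, here is a hand-written sketch of the idea. The variable and key names are made up and heavily simplified; Istanbul’s real generated code additionally records the source locations of statements, functions and branches in separate maps:

// original code:
function add(a, b) {
  return a + b;
}

// conceptually, the instrumenter rewrites it to something like this:
var __cov = {
  f: { 'add': 0 }, // how often each function was entered
  s: { '1': 0 }    // how often each statement was executed
};
function add(a, b) {
  __cov.f['add']++; // log: function 'add' was called
  __cov.s['1']++;   // log: this statement was reached
  return a + b;
}

// after the tests have run, __cov holds the raw counts that end up in the
// coverage object; a counter that is still 0 means 'not covered'.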

In my opinion the instrumenter is at the heart of the whole tool. Understanding the concept of instrumentation was my entry point to understanding how code coverage can be measured; this is the ‘magic’ that I could not get my head around before.

It is important to say that the coverage reporting is virtually independent of the actual unit tests. The only requirement is that the code execution is monitored while the unit tests are executed, which means the unit tests need to execute the instrumented code in order for the coverage to be recorded. Before diving into the Istanbul source code I thought that the unit tests needed to integrate deeply with the coverage tool in order for everything to work. Luckily this is not the case, which makes the whole thing less error-prone, because it’s less interdependent.
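As a hedged sketch of what that decoupling means in practice (the global name __coverage__ is Istanbul’s default for the injected counters; the rest is illustrative):

// the unit test is completely ordinary and knows nothing about coverage:
it('should add two numbers', function () {
  expect(add(1, 2)).toEqual(3); // running this bumps the injected counters
});

// after the run, the counters are simply sitting in a global object that
// the instrumented code created, ready to be picked up by the collector:
console.log(Object.keys(window.__coverage__)); // one entry per instrumented file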

Tadaaa. This was my short overview of how code coverage with Istanbul works. As mentioned before, this is a work in progress and I hope to update it as soon as I have more time to dive into the internals of Istanbul. In the meantime, feel free to let me know if you find anything wrong with my explanation, or if you know more about a certain area and can help me understand.