How to tag and track changes in excel models

This article explains how to use XLtoy as a change management tool to follow workbook changes.

Writing models in Excel nowadays is de facto a standard, but for many reasons can become very painful maintain the development cycle for a long time. Excel was born as software for individual productivity, in fact, it lacks in collaborative tools, to make up for this lack, many teams tend to work on shared folders or write tons of extra-excel docs, underestimating the problem of concurrency or maintainability.

With the growth of the complexity of models and the number of people working on it, became very difficult to follow topics like versioning, topology, regressions and in general all topics related to what one identify as change management.

Programmers face this problem on a daily basis and they have refined some best practice which can be summarized in 2 main activities take a snapshot of the work in progress store it as a fingerprint and give it a tag, nevertheless a tool capable to make semantic differences between different versions.

XLtoy can interact easily with excel as saw in the last article and it is the right instrument to conquer these kinds of topics. Combining XLtoy collect and diff features, we can handle model versioning in these terms.

Here is a simple example showing these capabilities, i use YAML format for its simplicity and outrightness, but in the same manner is possible to use JSON format to manipulate it with other tools, this is the first step towards automation.

In this exercise, i’ve used a simple model in which a variable is described by receiving the variations of a reference signal and applying a trend. The example file is included in the package itself.

This is a graphic representation of our simple model that is composed of only 2 exogenous: Time, Signal and 2 processes: Trend, Model

Following the rules of XLtoy use the name manager to describe where is the model, in green and which cells must be used as labels, in red, this technique let XLtoy associate a label to each formula. One line is enough, all the others have been dragged, therefore from a relative point of view, all the rows contain the same formula but each row reads from the previous one.

This operation takes few seconds and now the name manager is filled like this

We’re done with Excel, let’s move on to XLtoy, use a shell to launch some commands.

The first step is to take a snapshot of all values. Remember that the model is in row 12 and we are using label/model representation:

$ xltoy collect simple_model1.xlsx --data --yaml Foglio1:
Model: 75.23377001879057
Signal: 44.93377001879057
Time: 0.25
Trend: 80.3

This allows us to see what’s happening to values contained in each cell, but some cases cannot be intercepted from a formula point of view, e.g., add an empty cell to an existing formula don’t change the result so this is not enough to track changes in the formulas.

$ xltoy collect simple_model1.xlsx --yamlFoglio1:
Model: =(G12-G11)*$H$3+I11+$F12*$H$4
Signal: 44.93377001879057
Time: 0.25
Trend: =H11+$F12*$H$4

The outcome of this command, shows us all formulas of the model indicated in the named range. Each formula is coupled with its label. This representation is very intuitive and easy to understand by humans or code.

This can be used to sign a model when it must be deployed to a production environment, for versioning documentation, like git tag do and, as well it can be used as a snapshot to use with XLtoy diff command, this work as explained in the previous article “How to efficiently find differences on BIG excel worksheet”.

To tag a version it is necessary to add more metadata, XLtoy allows us to add a fingerprint, a tag, and a description to freeze all the information about the version in the output:

$ xltoy collect simple_model1.xlsx --yaml --tag v1.0 --description 'Demo model, published in 2021M3'
Model: =(G12-G11)*$H$3+I11+$F12*$H$4
Signal: 44.93377001879057
Time: 0.25
Trend: =H11+$F12*$H$4
xltoy.datetime: 2021-03-12T13:50:16.222577
xltoy.description: Demo model, published in 2021M3
xltoy.filename: simple_model1.xlsx
xltoy.tag: v1.0
xltoy.version: 0.1.8

Let’s all, this command extracts model informations and tag it with some necessary metadata information.

Ok! now we know how to take a snapshot of the model, but how we use it to track changes between versions?

The XLtoy diff works like the diff command and can work on a previously collected snapshot or an xlsx file, so you can compute the difference between excel file, YAML, JSON snapshot, or any combination of these.

First of all, we need to store the current version of the model shown before in a file model_version1.json:

$ xltoy collect simple_model1.xlsx  --json --tag v1.0 --description 'Demo model, published in 2021M3' > model_version1.json

Here the menù for this second part of the exercise, we apply some changes to bring us to version 2 of the model:

  1. add a 0 parameter to the trend (it will not change the result, but will change the formula)
  2. rename time to Y_step
  3. do a change to the syntax of the model formula
  4. add a new formula to calculate step in days d_step
  5. add the new formula to named ranges.
  6. save the updated model.

After a minute of work on excel, we can run the diff command to see if these changes are tracked correctly:

$ xltoy diff model_version1.json simple_model1.xlsx add:
D_step: =360*F12
Y_step: 0.25
Model: =(G12-G11)*$H$3+I11+$F12*$H$4 -> =I11+(G12-G11)*$H$3+$F12*$H$4
Trend: =H11+$F12*$H$4 -> =H11+$F12*$H$4+$G$5
Time: 0.25

Wow ! it found all the changes. This report, show all changes between the versions, organized in a simple and readable hierarchy. It can be used as a report for change tracking or to investigate unexpected behavior, like a changelog, or in general as diff document.

In this article, we have seen how XLtoy can freeze an excel model in a JSON file and use it as a mold to control and track differences between releases.

In this case, we have used a simple model, but the diff command can handle models with higher complexity of orders of magnitude, it was tested on models of over than 5K formulas. To give you an example it takes 300 sec. to complete the collect command, but only 8 msec. to do a diff on a model with >2.5K formulas

In the next article, we’ll see how to use graph power to analyze the model topology in order to solve problems related to the interdependencies between formulas.


Pragmatic programmer | Data Engineer | Python architect

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store