Excel was born to organize data and is typically used to perform basic (sometimes not so basic) analysis and to build models. It is widespread in offices across the globe, used in every business area and by companies of all sizes.
If one billion people use Microsoft Office globally and roughly a third of them mainly use Excel, that is a surprising ~300M users, and the number is definitely not going down!
It is the de facto professional productivity tool for pretty much anything that requires organizing data.
But looking deeper, its strength lies in:
Every spreadsheet has its own syntax, and Excel is no exception. So implementing a model in a spreadsheet must take two main aspects into account: syntax and topology. My purpose is to break down these aspects, and of course others, to get a complete picture, and to implement it in the XLtoy package. Today we analyze how graph theory can be effectively applied to an Excel workbook to solve problems that cannot be solved with other approaches.
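To give an idea of the topological side, here is a minimal sketch, assuming a workbook reduced to a plain dict of cell addresses and formulas. The names `WORKBOOK`, `dependency_graph` and `evaluation_order` are hypothetical and not part of XLtoy. Each formula cell becomes a node linked to the cells it references, and a topological sort with the standard library's `graphlib` yields a valid calculation order or exposes a circular reference.

```python
import re
from graphlib import CycleError, TopologicalSorter

# Hypothetical toy workbook: cell address -> constant or formula string.
# This is only an illustration, not the XLtoy data model.
WORKBOOK = {
    "A1": 10,
    "A2": 32,
    "B1": "=A1+A2",
    "C1": "=B1*2",
    "C2": "=SUM(A1,B1)",
}

# Simplified A1-style reference: no ranges, sheet names or $ anchors.
REF = re.compile(r"\b([A-Z]{1,3}[0-9]+)\b")


def dependency_graph(workbook):
    """Map each cell to the set of cells its formula references (its precedents)."""
    graph = {}
    for cell, content in workbook.items():
        if isinstance(content, str) and content.startswith("="):
            graph[cell] = set(REF.findall(content))
        else:
            graph[cell] = set()  # constants depend on nothing
    return graph


def evaluation_order(workbook):
    """Return cells ordered so that every precedent comes before its dependents."""
    graph = dependency_graph(workbook)
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # err.args[1] holds the list of nodes forming the detected cycle
        raise ValueError(f"circular reference: {err.args[1]}") from err
```

Calling `evaluation_order(WORKBOOK)` places `A1` and `A2` before `B1`, and `B1` before both `C1` and `C2`; a workbook such as `{"A1": "=B1", "B1": "=A1"}` raises the circular-reference error instead.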
Spreadsheet syntax is heavily influenced by many aspects: the context of use, how symbols are identified, how references relate to the layout and order of the function…
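As a small illustration of how context decides what a symbol is, the sketch below assumes a deliberately tiny formula grammar (numbers, names, operators and parentheses only). `tokenize` is a hypothetical helper, not Excel's or XLtoy's parser. A name followed by `(` is read as a function; otherwise it is checked against the A1 address pattern, and anything else is treated as a defined name. `LOG10` is a good example, because it is both a function name and a valid cell address in the modern grid.

```python
import re

# Hypothetical, deliberately tiny grammar: far from the full Excel syntax.
TOKEN = re.compile(
    r"\s*(?:(?P<number>\d+(?:\.\d+)?)"
    r"|(?P<name>[A-Za-z_][A-Za-z0-9_.]*)"
    r"|(?P<op>[-+*/^&=<>,():]))"
)
# A1-style cell address: 1 to 3 column letters followed by a row number.
CELL = re.compile(r"^[A-Z]{1,3}[0-9]+$")


def tokenize(formula):
    """Split a formula (without the leading '=') into (kind, text) tuples."""
    formula = formula.strip()
    tokens = []
    pos = 0
    while pos < len(formula):
        match = TOKEN.match(formula, pos)
        if match is None:
            raise ValueError(f"unexpected character at {pos}: {formula[pos]!r}")
        pos = match.end()
        kind = match.lastgroup  # name of the alternative that matched
        tokens.append((kind, match.group(kind)))
    return _classify(tokens)


def _classify(tokens):
    """Resolve each name by its context: the same text can mean different things."""
    result = []
    for i, (kind, text) in enumerate(tokens):
        if kind == "name":
            followed_by_paren = i + 1 < len(tokens) and tokens[i + 1] == ("op", "(")
            if followed_by_paren:
                kind = "function"
            elif CELL.match(text.upper()):
                kind = "cell"
            else:
                kind = "defined_name"
        result.append((kind, text))
    return result
```

For instance, `tokenize("LOG10(LOG10)+1")` classifies the first `LOG10` as a function and the second as a cell, while in `tokenize("SUM(A1,Rate)")` the name `Rate` ends up as a defined name.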
Writing models in Excel is nowadays a de facto standard, but for many reasons maintaining the development cycle over a long period can become very painful. Excel was born as software for individual productivity and, in fact, it lacks collaborative tools. To make up for this, many teams work on shared folders or write tons of documentation outside Excel, underestimating the problems of concurrency and maintainability.
As the complexity of models and the number of people working on them grow, it becomes very difficult to keep track of topics like versioning, topology, regressions and, in general, all topics related…
Many times I have found myself in front of big datasets stored in xlsx format. To be honest, this open format is not designed to store lots of data; from this point of view, the old binary xls format (Excel Binary Format) performed better.
Anyway, handling these kinds of files is very tedious, because a lot of time is spent opening and comparing them, managing changes and, in this use case, finding the differences between two updated versions of the same data source.
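A minimal sketch of that comparison, assuming both versions have already been loaded as lists of row dictionaries (for instance with a reader such as openpyxl) and that one column with unique values can act as the key. `diff_versions` and the column names are hypothetical. Matching rows by key instead of by position keeps a single inserted or deleted row from shifting every later row into a false difference.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Change = Tuple[Any, str, Any, Any]  # (key, column, old value, new value)


def diff_versions(old: List[Row], new: List[Row], key: str):
    """Compare two versions of a table, matching rows by a unique key column."""
    old_by_key = {row[key]: row for row in old}
    new_by_key = {row[key]: row for row in new}

    # Keys present in only one of the two versions
    added = sorted(new_by_key.keys() - old_by_key.keys())
    removed = sorted(old_by_key.keys() - new_by_key.keys())

    # Cell-level differences for rows present in both versions
    changed: List[Change] = []
    for k in sorted(old_by_key.keys() & new_by_key.keys()):
        before, after = old_by_key[k], new_by_key[k]
        for column in sorted(before.keys() | after.keys()):
            if before.get(column) != after.get(column):
                changed.append((k, column, before.get(column), after.get(column)))
    return added, removed, changed
```

With `old = [{"id": 1, "value": 20.5}, {"id": 2, "value": 18.0}]` and `new = [{"id": 2, "value": 18.4}, {"id": 3, "value": 21.1}]`, calling `diff_versions(old, new, "id")` reports row 3 as added, row 1 as removed, and a single change `(2, "value", 18.0, 18.4)`.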
A friend of mine downloads new meteorological data every week. These data are produced in some manner…
Pragmatic programmer | Data Engineer | Python architect