The phrase 'Insanity is doing the same thing over and over again and expecting different results' has found a foothold in popular wisdom in recent decades, even if its provenance remains disputed. Such stubborn behavior confounds the scientific method, exposes an immature tendency in human psychology, and runs counter to our own experience of achieving change and progress. It applies everywhere.
Except in machine learning.
Machine learning has the opposite problem: a neural network cannot exactly reproduce a previous result even when every tightly controlled variable is held constant: the same data, the same hardware, the same methodologies.
When the need arises to migrate to new software versions, adopt better loss functions, upgrade hardware, revise or amend data, or add or reduce model complexity, precise reproducibility drops even further; and all of those circumstances are frequent and inevitable. In this article on the challenges of AI software development, we'll take a look at five key areas in setting up a machine learning model where minor changes can yield critical differences in usability and performance.
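The run-to-run variability described above can be sketched in a few lines of plain Python. The `train_toy_model` function below is a hypothetical stand-in for a real training loop, not any framework's API: it merely initializes random 'weights' and nudges them with random 'updates'. Unseeded runs produce different results; pinning the seed restores bit-for-bit reproducibility on a single machine, which is the simplest of the controls a real training pipeline needs.

```python
import random

def train_toy_model(seed=None):
    """Toy stand-in for a training run: random weight initialization
    followed by a few noisy 'update' steps. Illustrative only."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1, 1) for _ in range(4)]   # random initialization
    for _ in range(10):                                # fake training steps
        weights = [w + 0.1 * rng.gauss(0, 1) for w in weights]
    return weights

# Two unseeded runs: different initialization, different final weights.
run_a, run_b = train_toy_model(), train_toy_model()
print(run_a == run_b)   # almost certainly False

# Pinning the seed makes the run repeatable on the same machine.
print(train_toy_model(seed=42) == train_toy_model(seed=42))   # True
```

In practice, seeding is only the first step: parallel hardware, library versions, and non-deterministic GPU kernels can all reintroduce variation even with fixed seeds.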
Herding Cats in a Neural Network
Industry faith (and ongoing investment) in new technologies depends on reproducibility and on explicable and predictable processes. Where a process is successful but occult, it's expected to be a proprietary technology, such as the profitable Google search algorithm.
By contrast, nearly all machine learning frameworks are open-source and accessible to all. How is it possible, given this level of transparency, that the AI and machine learning sectors struggle against a popular perception that they are 'black-box' technologies? Why is it so difficult to industrialize complex reproducible outcomes from machine learning models? Why is extracting core truths from big data so annoyingly like herding cats?
Controlling Convergence
The goal in the development of a machine learning model is to identify central relationships and potential transformations in large amounts of data, in a manner that enables the model to repeat the process later on a similarly structured but different set of data.
To achieve this, the model must traverse large amounts of input training data and establish the 'neural pathways' through which similar information will travel in future sessions (hopefully in a profitable or otherwise beneficial way). When the model has understood and established the innate relationships in the data, it has achieved convergence.
In mathematics, a 'convergent sequence' is one whose terms ultimately settle toward a fixed point, or minimum: a specific outcome, or a narrow gamut of possible outcomes that will not become any narrower with further processing of the data.
So a convergent algorithm is a reductionist device designed to determine the most useful and generalized outcome from a large volume of possible outcomes, by systematically applying a formula and rejecting what it perceives to be the least accurate results in each iteration.
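The idea of convergence toward a minimum can be made concrete with the simplest possible example: gradient descent on a one-variable function. The sketch below (illustrative, not any particular framework's training loop) minimizes f(x) = (x - 3)^2, whose single minimum sits at x = 3; each iteration discards part of the remaining error, so the sequence of x values narrows toward the fixed point.

```python
def gradient_descent(start, lr=0.1, steps=50):
    """Minimize f(x) = (x - 3)**2 by stepping against its gradient."""
    x = start
    for _ in range(steps):
        grad = 2 * (x - 3)   # derivative of (x - 3)**2
        x -= lr * grad       # move downhill; the error shrinks each step
    return x

x_final = gradient_descent(start=0.0)
print(abs(x_final - 3) < 1e-3)   # True: the sequence has converged near 3
```

Training a neural network is this same process scaled up to millions of parameters, where the loss surface has many minima rather than one, which is part of why different runs can settle on different, equally plausible solutions.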