How do deep learning models learn?

"A model is like the brain of a newborn" by Kandinsky (DALL-E)

Models

Initially, a model (a task solver) is like the brain of a newborn: it has a tremendous capacity to learn, but it contains no useful information (knowledge). That is, as in the case of a fresh brain, our model is open to learning from new experiences and is flexible enough to do so.

Generalization

The first time a baby sees a dog and a parent says that it is a dog, the baby does not yet learn what a dog is. The child has seen only one specimen of a particular breed or mix of breeds, so the child cannot yet know what the enormous variety of dog breeds have in common (the patterns that all breeds share). The child will need many similar experiences, with very diverse dogs in different settings, to learn what a dog is: that is, to abstract what is common to all those experiences until the concept of a dog has been assimilated (generalization).

Learning principle

This learning principle carries over to the training of models: process many experiences, discover common patterns, and relate them to concepts. In essence, our model is a mathematical function: it transforms input data into a result. The model learns by repeatedly processing many examples of the task to be solved (e.g. dog images, X-ray images), comparing its result (the prediction) with the expected result (the expectation) in each case. If the two differ, we adjust the model so that the next time it "sees" the same examples, the difference between the prediction and the expectation is smaller. This process is repeated until the difference disappears or becomes acceptably small.
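For readers who would like to see the loop written down, here is a minimal sketch in Python. It is only an illustration under simple assumptions, not how a real deep learning model is trained: the "model" has a single adjustable number (`weight`), the examples are made up so that the expected result is always three times the input, and the names `predict`, `train`, `learning_rate` and `tolerance` are our own.

```python
# Toy "model": a function with one adjustable number, prediction = weight * x.
def predict(weight, x):
    return weight * x


def train(examples, weight=0.0, learning_rate=0.01, tolerance=1e-6, max_rounds=10_000):
    """Repeat predict -> compare -> adjust until the average error is acceptable."""
    mean_squared_error = float("inf")
    for _ in range(max_rounds):
        total_squared_error = 0.0
        adjustment_direction = 0.0
        for x, expected in examples:
            # Compare the prediction with the expectation.
            difference = predict(weight, x) - expected
            total_squared_error += difference ** 2
            # Slope of the squared difference with respect to the weight.
            adjustment_direction += 2 * difference * x
        mean_squared_error = total_squared_error / len(examples)
        if mean_squared_error < tolerance:
            break  # the difference is now acceptable
        # Nudge the weight so that the next round's difference is smaller.
        weight -= learning_rate * adjustment_direction / len(examples)
    return weight, mean_squared_error


# Made-up examples: the expected result is always 3 times the input.
examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]
learned_weight, final_error = train(examples)
print(learned_weight, final_error)
```

The model starts out knowing nothing (`weight = 0.0`) and ends up with a weight very close to 3, purely by seeing the examples again and again and shrinking the difference a little each time.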

Case study

In our case study (estimation of bone age), the model needs to process many X-ray images of the left hands of girls and boys until it is able to estimate a precise value of bone age in each case. But precise with respect to what? In this task there is no true reference value, because biological age is itself an estimate, and we do not have an oracle to solve the problem for us. Therefore, the model prediction will be compared against consensus bone age values based on individual opinions from as many clinicians or experts as possible. That is, we will ask the model to perform as well as a committee of human experts.
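As a rough sketch of what "comparing against a consensus" could look like, the snippet below takes the consensus to be the plain average of the experts' estimates. That choice is our assumption for illustration only (a median or a weighted average would be equally plausible), and both the numbers and the names `consensus_bone_age` and `absolute_error` are made up.

```python
from statistics import mean


def consensus_bone_age(expert_estimates_months):
    """Assumed consensus rule: the plain average of the experts' estimates."""
    return mean(expert_estimates_months)


def absolute_error(predicted_months, expert_estimates_months):
    """How far the model prediction is from the expert consensus, in months."""
    return abs(predicted_months - consensus_bone_age(expert_estimates_months))


# Made-up values for one X-ray image, in months (not real clinical data).
expert_estimates = [120, 126, 123, 129]
model_prediction = 127.0

consensus = consensus_bone_age(expert_estimates)
error = absolute_error(model_prediction, expert_estimates)
print(consensus, error)
```

During training, this difference plays the role of the "expectation versus prediction" gap described earlier: the model is adjusted so that it shrinks across many images.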

"The AI model is like the brain of a newborn" by an Impressionist painter (DALL-E)

Next steps

As the reader can imagine, a real training process is much more complex. However, the aim of these notes has been to convey some first ideas to people outside the field of machine learning. In future notes we will cover other analogies between the human brain and machine learning models, the importance of sample diversity in model learning, the relationship between task complexity and model capacity, the independent assessment of learning, and more. And of course, always keeping technical jargon to a minimum!