The cognitive cycle of life: born, learn, assess progress, correct deviations, learn...

"Photo of a silhouette of a robot in a color lit desert at night learning constellation stars"

"Photo of a silhouette of a robot in a color lit desert at night learning constellation stars" (DALL-E)


The learning mechanisms

You have just been born and are immediately evaluated: do you breathe normally? Mom and Dad cry with emotion. Weight, height, and head circumference within normal limits? Mom and Dad relax… for a couple of days. The pediatrician checks whether you respond to stimuli: any evidence of being social? We learn, we are evaluated, and, hopefully, we correct deviations and improve. And then we start again.

The mechanisms of learning are still under investigation. However, some universal principles are known to be effective, and assessment is one of them. It is useful if it faithfully reflects the state of the learning process and if it serves as a tool for improvement. And yes, the same holds for an artificial intelligence.

Machine learning, a discipline that covers the creation of intelligent solutions from experiences, has adopted these concepts in the design of evaluation protocols. These solutions consist of parameterizable mathematical models that learn to solve tasks by imitating human capacities such as abstraction and decision making.

As we explained in previous posts, these models learn from past 'experiences' of the specific 'task of interest'. Experiences are people's photos, medical images (e.g., an X-ray or a magnetic resonance image), video segments acquired by drones, footage from surveillance cameras, or interactions with a chatbot. Tasks, on the other hand, are identifying an individual from biometric traits, diagnosing diseases, detecting intrusions into private property or forest fires in aerial images, driving autonomous vehicles, holding a conversation, etc. In addition, for simplicity, we will refer to annotated experiences as examples: use cases accompanied by known and verified solutions.

What do we mean when we talk about a model?

Before explaining how to evaluate a machine learning model, we need to convey an idea of what a model is and what the learning goal is.

Let's suppose one of the biggest retail companies asks us to "design a new paper bag for customers". As a starting point, we should establish some rules about the shape or design of the bag: number of handles (one or two), size of the bag (volume), geometry, etc. These factors determine the general structure of the bag; we will refer to them as the hyperparameters of the "bag" model. On the other hand, we must define some specific properties of the paper we are going to use in the building process. Those properties depend on the purpose (task) for which the bag is being designed; ours could be stiffness, strength, and water absorption capacity, and we will call them the model parameters.

The objective of learning will be to find the optimal combination of values for the hyperparameters (geometry, number of handles, etc.) and parameters (stiffness, strength, etc.) of the model with respect to the stated objective: to create a comfortable and efficient bag, at a viable cost, for a small purchase in a supermarket.

Our AI solution will also be defined by hyperparameters and parameters. The former will determine the structure, size, and complexity of the model, while the latter will provide the logic that will allow the model to solve the task of interest. Once the model is defined, it is time to train it so that it learns to solve a particular task. This process rests on multiple phases of continuous assessment.
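To make the distinction concrete, here is a minimal sketch in Python. The class name and the polynomial form are illustrative assumptions, not taken from any specific library: the degree fixes the model's structure (a hyperparameter), while the coefficients carry the logic that learning will adjust (the parameters).

```python
# Illustrative sketch: hyperparameters fix structure, parameters hold the logic.
class PolynomialModel:
    def __init__(self, degree):
        # Hyperparameter: chosen before training, fixes the model's size/shape.
        self.degree = degree
        # Parameters: start with arbitrary values, adjusted during training.
        self.coeffs = [0.0] * (degree + 1)

    def predict(self, x):
        # Evaluate c0 + c1*x + c2*x**2 + ... up to the chosen degree.
        return sum(c * x ** i for i, c in enumerate(self.coeffs))
```

Changing the degree changes how many parameters exist at all, which is exactly why hyperparameters are decided in a separate phase from parameter learning.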

Three different evaluation phases can be identified. Each of them uses an independent set of annotated experiences (examples with verified solutions). These phases are often referred to as training, validation, and testing.
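As a sketch, assuming a toy set of annotated examples, the three independent sets can be obtained with a simple random split. The 70/15/15 proportions are a common convention used here for illustration, not a fixed rule:

```python
import random

# Toy annotated experiences: (input, verified solution) pairs.
examples = [(x, 2 * x + 1) for x in range(100)]

random.seed(0)            # fixed seed so the split is reproducible
random.shuffle(examples)  # avoid any ordering bias before splitting

# One common convention: 70% training, 15% validation, 15% testing.
n = len(examples)
train_set = examples[: int(0.70 * n)]
val_set = examples[int(0.70 * n): int(0.85 * n)]
test_set = examples[int(0.85 * n):]
```

The key property is not the exact proportions but the independence: no example may appear in more than one set.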

First phase of learning evaluation: parameter adjustment

This phase can be likened to the work of a teacher correcting exercises in the classroom. The student solves an exercise, the teacher checks the first solution and points out errors, the student improves it, the teacher checks it again and identifies new errors, the student improves it a little more, and after several iterations the student ends up solving the problem correctly. Then the teacher proposes a new exercise.

In the case of our AI model (our student), the process starts with an arbitrary choice of values for hyperparameters and parameters (an arbitrary knowledge base). That is, our artificial student will not be able to make any logical decision at first.

"A group of baby Droids in a mathematics high school class paying attention to their human

"A group of baby Droids in a mathematics high school class paying attention to their human teacher." (DALL-E)

The model is asked to solve all the examples (exercises) in the training set, and initial, arbitrary solutions are obtained. These first solutions are compared with the annotated (correct) ones, the differences between them are measured, and mathematical procedures are run to correct the parameter values so that, the next time the examples are processed, the model's solutions are more logical, that is, closer to the annotated ones (the learning increases). This process is repeated until the differences between the solutions generated by the model and the annotated ones are small enough. That is, we repeatedly evaluate the model (our student) to progressively correct its deviations until, finally, it ends up learning!
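The correction loop described above can be sketched as follows, assuming a deliberately tiny model y = w·x trained by gradient descent on squared error. The data, learning rate, and number of passes are illustrative choices:

```python
# Toy training set whose verified (annotated) solutions follow y = 3x.
train_set = [(x, 3.0 * x) for x in range(1, 6)]

w = 0.0      # the parameter starts with an arbitrary value: no logic yet
lr = 0.01    # learning rate, itself a hyperparameter

for epoch in range(200):
    for x, y_true in train_set:
        y_pred = w * x            # the model's current solution
        error = y_pred - y_true   # measured difference vs. the annotation
        w -= lr * error * x       # correct the parameter (gradient step)
```

After enough passes over the examples, w ends up very close to 3: the repeated measure-and-correct cycle has shrunk the deviation between the model's solutions and the annotated ones.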

Second phase of learning evaluation: hyperparameter adjustment

If our student were evaluated only with the exercises proposed in the classroom under the supervision of the teacher, he would soon learn that it is enough to memorize the solutions, and that it is not necessary to learn to generalize.

Our savvy teacher, to validate whether the student has genuinely learned before the final exam, gives a mock exam with exercises other than those solved in class. In addition, this time the student receives no feedback.

When the student has solved all the exercises, the teacher scores them, identifies biases and structural deficiencies, corrects learning methods, and plans a new training class from scratch.

Our AI model also goes through this independent evaluation process, based on the examples in the validation set (different from those seen in the first phase). Our model solves the new examples, the quality of the solutions is evaluated, biases are identified (it could solve one category of examples very well but fail on another), model deficiencies are detected (e.g., inappropriate hyperparameters), etc. We then use this information to improve the model (e.g., tune its hyperparameters) and repeat the first phase.

That is, the results of this second evaluation give second chances: they allow you to correct the model to improve performance on independent data.
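A minimal sketch of this hyperparameter-tuning loop, under illustrative assumptions: toy data generated by y = x², a one-parameter model y = a·xᵖ fitted in closed form, and the exponent p playing the role of the hyperparameter. All names here are hypothetical:

```python
# Toy data: the annotated solutions follow y = x**2.
train_set = [(x, x ** 2) for x in range(-5, 6)]
val_set = [(x, x ** 2) for x in (7, -8, 9)]   # unseen during phase one

def fit(data, power):
    """Phase one: least-squares fit of the single parameter a in y = a * x**power."""
    num = sum(y * x ** power for x, y in data)
    den = sum(x ** (2 * power) for x, y in data)
    return num / den

def val_error(a, power, data):
    """Phase two: score the trained model on independent validation examples."""
    return sum((a * x ** power - y) ** 2 for x, y in data)

# Try each candidate hyperparameter, retrain, and keep the best on validation.
best_power = min((1, 2, 3), key=lambda p: val_error(fit(train_set, p), p, val_set))
```

The validation set never influences the parameter fit itself; it only arbitrates between the candidate structures, which is what keeps this second evaluation independent.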

Third phase of learning assessment: what you haven't already learned...

Our student takes the final exam. He must solve exercises never seen before, but similar to those he has trained on. His solutions are evaluated, and the final score is sent to his parents. This score should be reliable enough to reflect the true level of the skills the student has acquired. If the student wants to learn more, it will no longer be in this subject.

And once again, our model shares the fate of our student. Test data, not involved in the previous phases, are used to obtain an independent (unbiased) measure of the model's ability to solve the task of interest. A definitive measurement of the model's performance is obtained, and stakeholders (e.g., clients, users, managers) are informed.
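This final, one-shot measurement can be sketched as follows, with a hypothetical already-trained model and a toy test set; accuracy is just one possible metric among many:

```python
def accuracy(model, test_set):
    """Fraction of test examples the model solves exactly as annotated."""
    correct = sum(1 for x, y in test_set if model(x) == y)
    return correct / len(test_set)

# Hypothetical trained model: decides whether a number is even.
def is_even(x):
    return x % 2 == 0

# Test examples never used for training or validation.
test_set = [(2, True), (3, False), (4, True), (7, False), (10, True)]

final_score = accuracy(is_even, test_set)  # the figure reported to stakeholders
```

Crucially, this number is computed once and not used to tweak the model further; otherwise the test set would stop being independent.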

In summary, proven practices in the training and evaluation of human intelligence have been transferred and adapted to the training and evaluation of artificial intelligence models. Why reinvent the wheel?