Artificial intelligence that mimics human thinking

Human semantic structure integrated into neural image processing models for the first time

November 13, 2025

To the Point

  • AI research: New approaches are improving the visual understanding of computer models. The research team developed AligNet to integrate human semantic structure into neural networks.
  • Hierarchical knowledge: Human knowledge is typically organized hierarchically, while machines have difficulty grasping this structure. AligNet enables models to mimic human judgments about image similarity.
  • Increased efficiency: Fine-tuning models with AligNet takes significantly less computing time than retraining them. The fine-tuned models show up to a 93.5 percent improvement in alignment with human evaluations.

The scientists are investigating how visual representations in modern deep neural networks are structured compared to human perception and conceptual knowledge, and how these can be better aligned. Although artificial intelligence (AI) today achieves impressive performance in image processing, machines often generalize less robustly than humans, for instance, when faced with new types of images or unfamiliar relations.

“The central question of our study is: what do modern machine learning systems lack to show human-like behavior, not only in terms of performance, but also in how they organize and form representations?” explains lead author Lukas Muttenthaler, scientist at the Max Planck Institute for Human Cognitive and Brain Sciences, the BIFOLD institute of TU Berlin, and former employee at Google DeepMind.

The researchers show that human knowledge is typically organized hierarchically, from fine-grained distinctions (e.g., “pet dog”) to coarse ones (e.g., “animal”). Machine learning systems, on the other hand, often fail to capture these different levels of abstraction and semantics. To align the models with human conceptual knowledge, the scientists first trained a teacher model to imitate human similarity judgments. This teacher model thus learned a representational structure that can be considered “human-like.” The learned structure was then used to improve already pretrained, high-performing Vision Foundation Models, the so-called student models, through a process called soft alignment. This fine-tuning requires several orders of magnitude less computational cost than retraining the models from scratch.
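The soft-alignment idea can be illustrated with a minimal sketch (illustrative only, not the study's actual implementation; the random embeddings and the interpolation step are stand-ins): the student is nudged to match the teacher's pairwise similarity structure rather than the teacher's raw features.

```python
import numpy as np

def cosine_sim_matrix(X):
    """Pairwise cosine similarities between the rows (embeddings) of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

def alignment_loss(student_emb, teacher_emb):
    """Soft alignment: penalize mismatch in similarity *structure*,
    not in the raw embedding values themselves."""
    S = cosine_sim_matrix(student_emb)
    T = cosine_sim_matrix(teacher_emb)
    return float(np.mean((S - T) ** 2))

rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 4))  # stand-in for human-aligned teacher embeddings
student = rng.normal(size=(8, 4))  # stand-in for a pretrained student's embeddings

before = alignment_loss(student, teacher)
# One illustrative "fine-tuning" step: move the student toward the
# teacher's representation (a real setup would use gradient descent).
student = 0.1 * student + 0.9 * teacher
after = alignment_loss(student, teacher)
assert after < before  # the student's similarity structure is now closer to the teacher's
```

Because the objective compares similarity matrices rather than embeddings directly, it operates on relations between images, which is the level at which human similarity judgments are collected.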

Cognitively grounded AI

The student models were fine-tuned using "AligNet", a large image dataset synthetically generated through the teacher model that incorporates similarity judgments corresponding to human perceptions. To evaluate the fine-tuned student models, the researchers used a specially collected dataset known as the "Levels" dataset.

“For this dataset, around 500 participants performed an image-similarity task that covered multiple levels of semantic abstraction, from very coarse categorizations to fine-grained distinctions and category boundaries. For each judgment, we recorded both full response distributions and reaction times to capture potential links with human decision uncertainty. The resulting dataset represents a newly established benchmark for human-machine alignment, which we open-sourced,” reports Frieda Born, PhD student at BIFOLD and the MPI for Human Development.

The models trained with "AligNet" show significant improvements in alignment with human judgments, including up to a 93.5 percent relative improvement in coarse semantic evaluations. In some cases, they even surpass the reliability of human ratings. Moreover, these models exhibit no loss in performance; on the contrary, they demonstrate consistent performance increases (25 to 150 percent relative improvement) across various complex real-world machine learning tasks, all at minimal computational cost.

Klaus-Robert Müller, Co-director at BIFOLD: “Our research methodologically bridges cognitive science (human levels of abstraction) and modern deep-learning practice (Vision Foundation Models), thus forming a link between the concept of representation in humans and in machines. This represents an important step toward more interpretable, cognitively grounded AI.”

AligNet demonstrates that hierarchical conceptual structures can be transferred to neural networks without explicit hierarchical training, with reorganization visible across network layers. These results suggest that AligNet achieves fundamental improvements in visual representations that better reflect the human level of conceptual understanding, thereby making AI less of a ‘black box’.

Andrew K. Lampinen from Google DeepMind adds: “For the first time, researchers have found an efficient way to teach computer vision models about the hierarchical structure of human conceptual knowledge. We show that this not only makes the representations of these models more human-like, and therefore more interpretable, but also improves their predictive power and robustness across a wide range of vision tasks.”
