Computational model · semantic associations · visual representations
This project builds a neural network model that predicts the concepts an image is likely to evoke—not simply what object is present. The model links visual input to semantic association profiles derived from large-scale human word-association norms.
The broader aim is to create a high-level semantic model that can be contrasted with lower-level visual models and with human or neural representational geometries. Instead of training the network to classify images into object labels, the target is a vector of associated concepts: for example, a dog image should activate associations such as leash, bone, or pet, while a piano should activate a different semantic neighbourhood.
This makes the model useful as a bridge between object recognition and richer semantic cognition: it still receives images as input, but its output approximates the associative structure that humans bring to those objects.
A subset of images is sampled for ImageNet classes that can be linked to SWOW cues.
Each class is represented as a distribution over a shared 5,000-word association vocabulary.
A pretrained visual encoder is fine-tuned to map images onto semantic association profiles.
We have trained a convolutional neural network to predict semantic association profiles directly from images. The model starts from a Taskonomy `class_object` encoder and fine-tunes it on ImageNet images paired with association vectors derived from SWOW free-association norms. The current implementation follows a static, reproducible pipeline that can later be used to compute representational dissimilarity matrices and compare this model with behavioural, fMRI, EEG, or other artificial-network representations. And many more options! If you have any, reach out!
ImageNet category names are matched to SWOW cue words. Classes that do not match directly are resolved through curated mappings to superordinate cues such as dog, bird, or fish.
For each mapped class, SWOW response strengths are projected onto a global vocabulary. The resulting target matrix contains one semantic association vector per ImageNet class.
A balanced subset of ImageNet images is downloaded for the mapped classes, creating image–association-vector pairs for training.
The model starts from a Taskonomy class_object encoder. A linear head maps the flattened encoder features to the association vocabulary, and the full network is fine-tuned end-to-end.
Predictions are evaluated by averaging over images within each class and comparing the predicted and target vectors using rank-based and top-k metrics.
Representative images are passed through the model and their predicted association strengths are shown over a compact vocabulary selected to make cross-class differences visible.
The model is not only asked to identify an object. It is asked to approximate the semantic neighbourhood of that object. This means that two images can be visually dissimilar yet become closer in the model's output space if they evoke similar associations, and visually similar images can separate if they imply different semantic contexts.
class_object
Provides an object-sensitive visual representation as the starting point for fine-tuning.
The head predicts association strengths over a fixed vocabulary derived from SWOW responses.
All images from a class share the same target vector, encouraging the network to learn class-level associative structure.
The learned features and outputs can be converted into RDMs for comparison with neural or behavioural data.
The panel below shows a compact probe of the trained model. Select one class to inspect its predicted association profile, or overlay all classes to compare the semantic signatures the model assigns to different images.
Fine-tuned Taskonomy encoder trained on SWOW semantic association norms
Predicted association strengths are sigmoid outputs over a fixed 20-term vocabulary selected from high-ranking model predictions across the five displayed classes. Each term's height reflects how strongly the model associates it with the image.