From images to semantic associates

Goal

The broader aim is to create a high-level semantic model that can be contrasted with lower-level visual models and with human or neural representational geometries. Instead of classifying images into object labels, the network is trained to predict a vector of associated concepts: for example, a dog image should activate associations such as leash, bone, or pet, while a piano should activate a different semantic neighbourhood.

This makes the model useful as a bridge between object recognition and richer semantic cognition: it still receives images as input, but its output approximates the associative structure that humans bring to those objects.

Input: ImageNet images

A subset of images is sampled for ImageNet classes that can be linked to cues in the Small World of Words (SWOW) free-association norms.

Target: SWOW association vectors

Each class is represented as a distribution over a shared 5,000-word association vocabulary.

Model: Taskonomy encoder + association head

A pretrained visual encoder is fine-tuned to map images onto semantic association profiles.

Pipeline

We trained a convolutional neural network to predict semantic association profiles directly from images. The model starts from a Taskonomy `class_object` encoder and is fine-tuned on ImageNet images paired with association vectors derived from SWOW free-association norms. The current implementation follows a static, reproducible pipeline whose outputs can later be used to compute representational dissimilarity matrices (RDMs) and to compare this model with behavioural, fMRI, EEG, or other artificial-network representations. Suggestions for further comparisons are welcome.

Map ImageNet classes to SWOW cues

ImageNet category names are matched to SWOW cue words. Classes that do not match directly are resolved through curated mappings to superordinate cues such as dog, bird, or fish.
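
The two-stage matching described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names, the normalization rule, and the toy class, cue, and curated tables are all assumptions made for the example.

```python
def normalize(name: str) -> str:
    """Lowercase a class name and collapse underscores and extra spaces."""
    return " ".join(name.lower().replace("_", " ").split())


def map_classes_to_cues(class_names, swow_cues, curated):
    """Return {class_name: cue} for every class that can be linked to a cue.

    Direct matches take priority; otherwise the curated table is consulted.
    Classes with neither a direct nor a curated match are left out.
    """
    cues = {normalize(c) for c in swow_cues}
    mapping = {}
    for name in class_names:
        key = normalize(name)
        if key in cues:
            mapping[name] = key
        elif key in curated and normalize(curated[key]) in cues:
            mapping[name] = normalize(curated[key])
    return mapping


# Hypothetical toy inputs, not the project's actual tables.
class_names = ["piano", "golden_retriever", "goldfish", "abacus"]
swow_cues = ["piano", "dog", "fish", "bird"]
curated = {"golden retriever": "dog", "goldfish": "fish"}

mapping = map_classes_to_cues(class_names, swow_cues, curated)
print(mapping)
```

In this toy run, "abacus" has neither a direct nor a curated match, so it is dropped from the mapped set.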

Build class-level association targets

For each mapped class, SWOW response strengths are projected onto a global vocabulary. The resulting target matrix contains one semantic association vector per ImageNet class.
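
A possible shape for this projection step is sketched below. The data layout (`strengths` as a nested dictionary), the function name, and the toy numbers are assumptions. The page does not say whether rows are renormalized after out-of-vocabulary responses are dropped, so this sketch leaves the raw strengths untouched.

```python
import numpy as np


def build_target_matrix(class_to_cue, strengths, vocabulary):
    """Return (class_names, targets): one row per class, one column per term.

    strengths: {cue: {response: strength}}, a hypothetical in-memory layout.
    Responses outside the vocabulary are dropped.
    """
    col = {term: j for j, term in enumerate(vocabulary)}
    class_names = sorted(class_to_cue)
    targets = np.zeros((len(class_names), len(vocabulary)), dtype=np.float32)
    for i, name in enumerate(class_names):
        for response, strength in strengths[class_to_cue[name]].items():
            if response in col:
                targets[i, col[response]] = strength
    return class_names, targets


# Toy values for illustration only; they are not taken from SWOW.
class_to_cue = {"golden_retriever": "dog", "piano": "piano"}
strengths = {
    "dog": {"leash": 0.05, "bone": 0.08, "pet": 0.10, "woof": 0.02},
    "piano": {"music": 0.30, "keys": 0.20},
}
vocabulary = ["bone", "keys", "leash", "music", "pet"]

class_names, targets = build_target_matrix(class_to_cue, strengths, vocabulary)
print(targets.shape)  # (2, 5)
```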

Sample visual training data

A balanced subset of ImageNet images is downloaded for the mapped classes, creating image–association-vector pairs for training.
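
One way to keep the subset balanced and reproducible is to draw the same number of images per class with a fixed seed. The sketch below only selects image identifiers and does not download anything; the function name, the `per_class` parameter, and the toy identifiers are assumptions.

```python
import random


def sample_balanced(images_by_class, per_class, seed=0):
    """Pick the same number of images for every class, reproducibly.

    images_by_class: {class_name: [image_id, ...]}
    Returns a list of (image_id, class_name) pairs.
    """
    rng = random.Random(seed)
    # Never ask for more images than the smallest class can supply.
    n = min(per_class, min(len(ids) for ids in images_by_class.values()))
    pairs = []
    for name in sorted(images_by_class):
        for image_id in rng.sample(sorted(images_by_class[name]), n):
            pairs.append((image_id, name))
    return pairs


# Hypothetical image identifiers.
images_by_class = {
    "piano": ["p1", "p2", "p3"],
    "golden_retriever": ["g1", "g2", "g3", "g4"],
}
pairs = sample_balanced(images_by_class, per_class=2)
print(len(pairs))  # 4
```

Each selected image can then be paired with the target vector of its class to form a training example.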

Fine-tune the network

Starting from the Taskonomy `class_object` encoder, a linear head maps the flattened encoder features to the association vocabulary, and the full network is fine-tuned end-to-end.
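
The head's forward pass can be illustrated in a few lines. This is a NumPy sketch of the computation only, not the training code: the encoder is replaced by random feature maps, the feature-map sizes are toy values, and the weights are random. The sigmoid follows the page's description of the outputs as sigmoid association strengths.

```python
import numpy as np


def association_head(features, weight, bias):
    """Flatten encoder feature maps and apply a linear layer plus sigmoid.

    features: (batch, channels, height, width) encoder output
    weight:   (vocab_size, channels * height * width)
    bias:     (vocab_size,)
    Returns association strengths in (0, 1), shape (batch, vocab_size).
    """
    flat = features.reshape(features.shape[0], -1)
    logits = flat @ weight.T + bias
    return 1.0 / (1.0 + np.exp(-logits))


rng = np.random.default_rng(0)
# Toy sizes for illustration; the page only fixes the 5,000-term output.
batch, channels, height, width, vocab_size = 2, 8, 4, 4, 5000
features = rng.standard_normal((batch, channels, height, width))
weight = rng.standard_normal((vocab_size, channels * height * width)) * 0.01
bias = np.zeros(vocab_size)

strengths = association_head(features, weight, bias)
print(strengths.shape)  # (2, 5000)
```

In end-to-end fine-tuning, gradients from the loss on these outputs would also update the encoder weights, which this sketch does not show.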

Evaluate association structure

Predictions are evaluated by averaging over images within each class and comparing the predicted and target vectors using rank-based and top-k metrics.
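
The page does not name the exact metrics, so the sketch below uses two common choices as stand-ins: Spearman rank correlation and top-k overlap between the class-averaged prediction and the class target. All function names and toy numbers are assumptions.

```python
import numpy as np


def class_means(predictions, labels, n_classes):
    """Average per-image predictions within each class."""
    return np.stack(
        [predictions[labels == c].mean(axis=0) for c in range(n_classes)]
    )


def spearman(a, b):
    """Spearman correlation as Pearson on ranks (ties are not averaged)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])


def top_k_overlap(pred, target, k):
    """Share of the k strongest target terms found among the k strongest predictions."""
    top_pred = set(np.argsort(-pred)[:k].tolist())
    top_true = set(np.argsort(-target)[:k].tolist())
    return len(top_pred & top_true) / k


# Toy predictions for three images (two of class 0, one of class 1).
predictions = np.array([[0.9, 0.1, 0.5, 0.2],
                        [0.7, 0.3, 0.6, 0.0],
                        [0.1, 0.8, 0.2, 0.6]])
labels = np.array([0, 0, 1])
targets = np.array([[0.5, 0.0, 0.3, 0.1],
                    [0.0, 0.6, 0.1, 0.4]])

means = class_means(predictions, labels, n_classes=2)
for c in range(2):
    print(c, spearman(means[c], targets[c]), top_k_overlap(means[c], targets[c], k=2))
```

With sparse targets many terms tie at zero, so a tie-aware rank correlation would be a safer choice on real data than this simplified version.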

Visualize predicted profiles

Representative images are passed through the model and their predicted association strengths are shown over a compact vocabulary selected to make cross-class differences visible.
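
How the compact vocabulary is chosen is not spelled out in detail. One plausible rule, shown here purely as an assumption, is to pool the strongest terms of each displayed class and keep those with the highest peak strength. The function name and the `n_terms` and `per_class` parameters are hypothetical.

```python
import numpy as np


def compact_vocabulary(class_predictions, vocabulary, n_terms, per_class):
    """Pick display terms from the strongest predictions of each class.

    class_predictions: (n_classes, vocab_size) predicted strengths
    Returns up to n_terms terms, ordered by peak strength across classes.
    """
    candidates = set()
    for row in class_predictions:
        candidates.update(np.argsort(-row)[:per_class].tolist())
    # Rank candidates by the highest strength any class assigns to them.
    ranked = sorted(candidates, key=lambda j: -class_predictions[:, j].max())
    return [vocabulary[j] for j in ranked[:n_terms]]


# Toy predictions for two classes over a six-term vocabulary.
vocabulary = ["bone", "keys", "leash", "music", "pet", "water"]
class_predictions = np.array([[0.70, 0.10, 0.60, 0.10, 0.80, 0.20],
                              [0.10, 0.90, 0.00, 0.85, 0.10, 0.05]])

terms = compact_vocabulary(class_predictions, vocabulary, n_terms=3, per_class=2)
print(terms)  # ['keys', 'music', 'pet']
```

Pooling per-class top terms tends to favour terms that are strong for one class and weak for others, which is what makes cross-class differences visible in a small chart.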

What the model learns

The model is not only asked to identify an object. It is asked to approximate the semantic neighbourhood of that object. This means that two images can be visually dissimilar yet become closer in the model's output space if they evoke similar associations, and visually similar images can separate if they imply different semantic contexts.

Visual backbone: Taskonomy class_object

Provides an object-sensitive visual representation as the starting point for fine-tuning.

Output layer: 5,000 association terms

The head predicts association strengths over a fixed vocabulary derived from SWOW responses.

Training signal: Class-level semantic profiles

All images from a class share the same target vector, encouraging the network to learn class-level associative structure.

Use case: Representational comparison

The learned features and outputs can be converted into RDMs for comparison with neural or behavioural data.
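
As an illustration of that conversion, the sketch below builds a correlation-distance RDM from one output vector per class and compares two RDMs through their upper triangles. The distance measure (1 minus Pearson r) and the comparison statistic are common defaults chosen here as assumptions; the page does not specify either, and rank-based comparisons are also widely used.

```python
import numpy as np


def correlation_rdm(patterns):
    """Return an RDM with entries 1 - Pearson r between rows of `patterns`.

    patterns: (n_conditions, n_features), e.g. one output vector per class.
    """
    return 1.0 - np.corrcoef(patterns)


def compare_rdms(rdm_a, rdm_b):
    """Pearson correlation between the upper triangles of two RDMs."""
    iu = np.triu_indices_from(rdm_a, k=1)
    return float(np.corrcoef(rdm_a[iu], rdm_b[iu])[0, 1])


rng = np.random.default_rng(0)
# Random stand-ins for class-level model outputs (5 classes, 20 terms).
patterns = rng.random((5, 20))
rdm = correlation_rdm(patterns)
print(rdm.shape)  # (5, 5)
```

Only the off-diagonal upper triangle enters the comparison, since the diagonal is zero by construction and the matrix is symmetric.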

Interactive visualization

The panel below shows a compact probe of the trained model. Select one class to inspect its predicted association profile, or overlay all classes to compare the semantic signatures the model assigns to different images.

Association model — predicted profiles

Fine-tuned Taskonomy encoder trained on SWOW semantic association norms


Predicted association strengths are sigmoid outputs over a fixed 20-term vocabulary selected from high-ranking model predictions across the five displayed classes. The height plotted for each term reflects how strongly the model associates that term with the image.

Static project page. Representative images are embedded directly in the HTML; the chart is rendered in the browser with Chart.js.