src.model.concept_history module
This module defines logic for initializing and updating the concept history.
- src.model.concept_history.get_initial_concepts(n_best_concepts: int, n_random_concepts: int, model_layer_activations_path: str, neuron_id: int) → Sequence[str]
Get the initial concepts by combining the best-activating and randomly selected concepts.
Returns a sequence of initial concepts consisting of the top n_best_concepts that most strongly activate the neuron, plus n_random_concepts randomly selected from all available control concepts.
- Parameters:
n_best_concepts – Number of top-activating concepts to include.
n_random_concepts – Number of random concepts to include.
model_layer_activations_path – Path to the directory containing the activation files.
neuron_id – Index of the neuron to select concepts for.
- Returns:
Sequence of initial concept names combining best and random selections.
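The selection rule described above can be illustrated with a small in-memory sketch. It is not the module's implementation: the function name select_initial_concepts, the concept_activations mapping (concept name to a vector of per-neuron activations), and the rng parameter are all hypothetical, since the real function loads activations from files under model_layer_activations_path in a format not documented here. It also assumes that higher activation means a better concept and that the random picks are drawn from concepts not already among the best ones.

```python
import random
from typing import List, Mapping, Optional, Sequence


def select_initial_concepts(
    concept_activations: Mapping[str, Sequence[float]],
    n_best_concepts: int,
    n_random_concepts: int,
    neuron_id: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Hypothetical in-memory analogue of get_initial_concepts.

    concept_activations maps each concept name to a vector of per-neuron
    activations (an assumed layout; the documented function reads files
    from model_layer_activations_path instead).
    """
    rng = rng or random.Random()
    # Rank every concept by how strongly it activates the chosen neuron.
    ranked = sorted(
        concept_activations,
        key=lambda concept: concept_activations[concept][neuron_id],
        reverse=True,
    )
    best = ranked[:n_best_concepts]
    # Assumption: random picks come only from the concepts not already
    # selected as "best", so the two groups never overlap.
    remaining = ranked[n_best_concepts:]
    random_picks = rng.sample(remaining, min(n_random_concepts, len(remaining)))
    return best + random_picks
```

For example, with activations {"dog": [0.1, 0.9], "cat": [0.2, 0.8], "car": [0.7, 0.1], "tree": [0.5, 0.3]}, n_best_concepts=2, n_random_concepts=1 and neuron_id=1, the result starts with "dog" and "cat" and ends with one of "tree" or "car". Seeding rng makes the random part reproducible across runs.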
- src.model.concept_history.update_concept_history(concept_history: Mapping[str, float], new_concept: str, score: float) → Mapping[str, float]
Update the concept history by replacing either the worst concept or a randomly selected one.
If the new concept’s score is better than that of the worst existing concept, the worst concept is replaced. Otherwise, an existing concept is removed at random, with its selection probability weighted by the distance of its score from the maximum score.
- Parameters:
concept_history – Dictionary mapping concept names to their scores.
new_concept – Name of the new concept to add.
score – Score of the new concept.
- Returns:
Updated concept history dictionary with the new concept added.
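The replacement policy above can be sketched as follows. This is an illustration under stated assumptions, not the module's code: the name update_history_sketch and the rng parameter are hypothetical, higher scores are assumed to be better, the history is assumed non-empty and not to contain new_concept already, and the fallback to a uniform choice when every score is equal is an added guard (all weights would otherwise be zero).

```python
import random
from typing import Dict, Mapping, Optional


def update_history_sketch(
    concept_history: Mapping[str, float],
    new_concept: str,
    score: float,
    rng: Optional[random.Random] = None,
) -> Dict[str, float]:
    """Hypothetical re-implementation of the documented replacement policy.

    Assumes higher scores are better and that concept_history is non-empty.
    """
    rng = rng or random.Random()
    updated = dict(concept_history)  # leave the caller's mapping untouched
    worst = min(updated, key=updated.get)
    if score > updated[worst]:
        # The new concept beats the weakest entry: evict that entry.
        evicted = worst
    else:
        # Otherwise evict a random concept; the farther its score lies below
        # the maximum, the more likely it is to be chosen.
        max_score = max(updated.values())
        names = list(updated)
        weights = [max_score - updated[name] for name in names]
        if sum(weights) == 0:
            # All scores are equal (assumed guard): choose uniformly instead.
            weights = [1.0] * len(names)
        evicted = rng.choices(names, weights=weights, k=1)[0]
    del updated[evicted]
    # The new concept is always added, matching the documented return value.
    updated[new_concept] = score
    return updated
```

With this weighting, a concept that holds the maximum score gets weight zero and is never evicted in the random branch, so the current best concept always survives unless the new concept displaces it as the worst entry.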