src.activation_sampling module
Activation sampling utilities for scoring generated concepts.
This module samples control concept activations according to their semantic similarity to newly generated concepts.
- class src.activation_sampling.ActivationSampler(model_layer_activations_path: str)
Bases: object
Sample control concept activations based on semantic similarity.
Uses a sentence transformer to compute embeddings, finds control concepts similar to a given new concept, and then samples from them with inverse similarity weighting.
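The retrieval step can be pictured with a minimal sketch. It assumes the embeddings have already been computed (for example by the sentence transformer) and are passed in as plain float lists. It also assumes cosine similarity as the metric and a top-k cutoff; neither is stated above. The names `cosine_similarity` and `top_k_similar` are hypothetical and are not part of `src.activation_sampling`.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: List[float], controls: Dict[str, List[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Hypothetical helper: the k control concepts most similar to the query embedding."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in controls.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


controls = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}
print(top_k_similar([1.0, 0.0], controls, k=2))
```

With real sentence embeddings, the vectors would be high-dimensional, and a vectorized similarity would replace the pure-Python loop.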
- sample_control_activations(new_concept: str) → torch.Tensor
Sample control activations similar to a new concept.
Finds control concepts semantically similar to the new concept, samples one of them with inverse similarity weighting, and returns that concept's activation tensor.
- Parameters:
new_concept – New concept to find similar control concepts for.
- Returns:
Control activation tensor for the sampled control concept.
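The following minimal sketch illustrates the inverse similarity weighting step. It assumes each candidate's sampling weight is proportional to 1 / similarity, with similarities clipped to a small positive `eps` so that zero or negative scores stay finite. Under this scheme, the less similar candidates are drawn more often. The helper names are hypothetical, and the real method may weight differently. It then returns the activation tensor for the chosen concept rather than its name.

```python
import random
from typing import List, Tuple


def inverse_similarity_weights(sims: List[float], eps: float = 1e-6) -> List[float]:
    """Hypothetical helper: normalized weights proportional to 1 / max(sim, eps)."""
    raw = [1.0 / max(s, eps) for s in sims]
    total = sum(raw)
    return [w / total for w in raw]


def sample_inverse_similarity(
    candidates: List[Tuple[str, float]], rng: random.Random, eps: float = 1e-6
) -> str:
    """Draw one candidate name, favouring lower-similarity candidates."""
    names = [name for name, _ in candidates]
    weights = inverse_similarity_weights([sim for _, sim in candidates], eps)
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded so repeated runs draw the same sequence
print(sample_inverse_similarity([("cat", 0.5), ("dog", 0.25)], rng))
```

Inverse weighting keeps the draw restricted to semantically related controls while avoiding repeatedly picking the single nearest neighbour.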