src.activation_sampling module

Activation sampling utilities for scoring generated concepts.

This module samples control-concept activations according to their semantic similarity to a newly generated concept.

class src.activation_sampling.ActivationSampler(model_layer_activations_path: str)

Bases: object

Sample control concept activations based on semantic similarity.

Uses a sentence-transformer model to embed concepts, finds the control concepts most semantically similar to a given new concept, and samples from them with inverse-similarity weighting.
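The similarity step can be sketched as follows. This is a minimal illustration under stated assumptions, not this module's implementation: it assumes cosine similarity between embeddings and keeps the top `k` control concepts, where `cosine_similarity`, `top_k_similar`, and `k` are hypothetical names. Plain Python lists stand in for the sentence-transformer embeddings so the sketch runs without extra dependencies.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_similar(
    new_embedding: list[float],
    control_embeddings: dict[str, list[float]],
    k: int = 3,
) -> list[tuple[str, float]]:
    """Return the k control concepts most similar to new_embedding.

    Hypothetical helper: the top-k cut-off and the value of k are
    assumptions, not documented behaviour of ActivationSampler.
    """
    # Score every control concept against the new concept.
    scored = [
        (name, cosine_similarity(new_embedding, emb))
        for name, emb in control_embeddings.items()
    ]
    # Highest similarity first, then keep the k best.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

For example, with toy two-dimensional embeddings `{"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}` and a new-concept embedding of `[1.0, 0.0]`, `top_k_similar(..., k=2)` keeps `"cat"` and `"dog"` and drops `"car"`.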

sample_control_activations(new_concept: str) → torch.Tensor

Sample control activations similar to a new concept.

Finds control concepts semantically similar to the new concept, samples one of them with inverse-similarity weighting, and returns the sampled control concept's activation tensor.

Parameters:

new_concept – The newly generated concept against which control concepts are compared.

Returns:

Control activation tensor for the sampled control concept.
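The sampling step described above might look like the sketch below. It assumes that "inverse-similarity weighting" means each candidate is drawn with probability proportional to 1 / similarity (so less similar controls are favored), with similarity clamped at a small `eps` to avoid division by zero. `sample_inverse_similarity` and its parameters are hypothetical, and plain lists stand in for `torch.Tensor` activations.

```python
from __future__ import annotations

import random


def sample_inverse_similarity(
    candidates: list[tuple[str, float]],
    activations: dict[str, list[float]],
    rng: random.Random,
    eps: float = 1e-8,
) -> list[float]:
    """Draw one candidate with weight 1 / similarity and return its activation.

    Assumptions: candidates are (concept_name, similarity) pairs, weights are
    1 / max(similarity, eps), and activations maps concept names to vectors.
    """
    names = [name for name, _ in candidates]
    # Lower similarity -> larger weight -> more likely to be drawn.
    weights = [1.0 / max(sim, eps) for _, sim in candidates]
    chosen = rng.choices(names, weights=weights, k=1)[0]
    # Look up the activation vector of the sampled control concept.
    return activations[chosen]
```

Passing an explicit `random.Random` instance keeps draws reproducible under a fixed seed, which is useful when comparing scores across runs.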