src.model.language_model module

Large language model wrapper for concept generation and reasoning.

This module wraps a language model that generates and reasons about concepts based on activation patterns and generation history.

class src.model.language_model.LanguageModel(model_id: str, device: str, model_swapping: bool, max_new_tokens: int, prompt_path: str, summary_prompt_path: str)

Bases: Model

Language model for generating and reasoning about neuron concepts.

Uses a large language model to generate concept names and explanations based on activation history and concept scores.

generate_concept(top_k: int | None = None)

Generate a new concept name with reasoning using the language model.

Uses the concept history and generation history to generate a new concept name. Optionally uses a summary prompt with top-k concepts for faster iteration. Parses the model response to extract both the answer (concept) and reasoning.

Parameters:

top_k – Optional number of top concepts to use in summary mode. If None, uses the full prompt template.

Returns:

Tuple of (concept_name, reasoning_text) from the model.
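
The two steps described above, selecting the top-k concepts for summary mode and splitting the model response into an answer and its reasoning, can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the helper names top_k_concepts and parse_response, the dict-based history, and the "Reasoning: ... Answer: ..." response layout are all assumptions made for the example.

```python
from __future__ import annotations

import re


def top_k_concepts(concept_history: dict[str, float], top_k: int | None) -> list[tuple[str, float]]:
    """Rank concepts by score (highest first); keep all of them when top_k is None.

    Hypothetical helper: assumes the history maps concept name -> score.
    """
    ranked = sorted(concept_history.items(), key=lambda item: item[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


def parse_response(response: str) -> tuple[str, str]:
    """Split a model response into (concept_name, reasoning_text).

    Hypothetical format: a "Reasoning:" section followed by an "Answer:" line.
    The real prompt templates may ask for a different layout.
    """
    reasoning_match = re.search(r"Reasoning:\s*(.*?)\s*(?=Answer:|$)", response, re.DOTALL)
    answer_match = re.search(r"Answer:\s*(.*)", response)
    reasoning = reasoning_match.group(1).strip() if reasoning_match else ""
    # Fall back to the whole response when no "Answer:" marker is present.
    concept = answer_match.group(1).strip() if answer_match else response.strip()
    return concept, reasoning


summary = top_k_concepts({"stripes": 0.42, "fur": 0.77, "grass": 0.10}, top_k=2)
concept, reasoning = parse_response("Reasoning: The images show stripes.\nAnswer: striped fur")
```

Passing top_k=None in this sketch mirrors the documented behaviour of falling back to the full history rather than a summary.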

get_best_concept() → str

Get the best concept and its score from the history.

Looks up the highest-scoring entry in the concept history and returns it together with its score.

Returns:

Tuple of (best_concept_name, score) for the highest-scoring concept.
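
A highest-score lookup of this kind can be sketched in a few lines. The function name best_concept and the assumption that the concept history is a dict mapping concept name to score are illustrative only; the class may store its history differently.

```python
from __future__ import annotations


def best_concept(concept_history: dict[str, float]) -> tuple[str, float]:
    """Return (name, score) for the highest-scoring concept.

    Hypothetical helper: assumes a name -> score mapping and raises on an
    empty history, which may not match the real method's behaviour.
    """
    if not concept_history:
        raise ValueError("concept history is empty")
    name = max(concept_history, key=concept_history.get)
    return name, concept_history[name]


name, score = best_concept({"stripes": 0.42, "fur": 0.77})
```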

update_concept_history(new_concept: str, score: float) → None

Update the concept history with a new concept and score.

Appends the new concept to the generation history and records it, with its score, in the concept history mapping.

Parameters:
  • new_concept – The new concept name to add.

  • score – The score associated with the new concept.
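
The bookkeeping described for update_concept_history can be sketched with a small stand-in class. ConceptTracker is a hypothetical name, and the choice of a list for the generation history and a dict for the concept history is an assumption; in particular, this sketch overwrites the score when the same concept is proposed twice, which the real class may handle differently.

```python
from __future__ import annotations


class ConceptTracker:
    """Hypothetical stand-in for the history handling in LanguageModel."""

    def __init__(self) -> None:
        # Ordered list of every concept proposed so far (assumed structure).
        self.generation_history: list[str] = []
        # Mapping of concept name -> latest score (assumed structure).
        self.concept_history: dict[str, float] = {}

    def update_concept_history(self, new_concept: str, score: float) -> None:
        """Record a newly generated concept and its score."""
        self.generation_history.append(new_concept)
        self.concept_history[new_concept] = score


tracker = ConceptTracker()
tracker.update_concept_history("striped fur", 0.42)
tracker.update_concept_history("tiger", 0.77)
```

Keeping the ordered list separate from the score mapping lets a prompt show the order in which concepts were tried while still supporting a direct lookup of each concept's score.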