src.model.language_model module
Large language model wrapper for concept generation and reasoning.
This module uses a language model to generate and reason about concepts, based on activation patterns and generation history.
- class src.model.language_model.LanguageModel(model_id: str, device: str, model_swapping: bool, max_new_tokens: int, prompt_path: str, summary_prompt_path: str)[source]
Bases: Model
Language model for generating and reasoning about neuron concepts.
Uses a large language model to generate concept names and explanations based on activation history and concept scores.
- generate_concept(top_k: int | None = None)[source]
Generate a new concept name with reasoning using the language model.
Uses the concept history and generation history to generate a new concept name. Optionally uses a summary prompt with top-k concepts for faster iteration. Parses the model response to extract both the answer (concept) and reasoning.
- Parameters:
top_k – Optional number of top concepts to use in summary mode. If None, uses the full prompt template.
- Returns:
Tuple of (concept_name, reasoning_text) from the model.
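The parsing step can be illustrated with a minimal sketch. The response format is not documented here, so the `Reasoning:` and `Answer:` markers and the `parse_response` helper below are hypothetical stand-ins, not the module's actual parser.

```python
import re


def parse_response(text: str) -> tuple[str, str]:
    """Split a raw model response into (concept_name, reasoning_text).

    Assumption: the response labels its parts with "Reasoning:" and
    "Answer:". The real prompt templates may use a different format.
    """
    answer = re.search(r"Answer:\s*(.+)", text)
    reasoning = re.search(
        r"Reasoning:\s*(.+?)(?=\n\s*Answer:|\Z)", text, flags=re.DOTALL
    )
    # Fall back to empty strings when a part is missing.
    concept = answer.group(1).strip() if answer else ""
    why = reasoning.group(1).strip() if reasoning else ""
    return concept, why


response = "Reasoning: The top images all show striped fur.\nAnswer: tiger stripes"
concept, why = parse_response(response)
```

Returning empty strings for missing parts is one possible design choice; a caller could equally raise an error or retry the generation.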
- get_best_concept() → str[source]
Get the best concept and its score from the history.
Looks up the concept with the highest score in the concept history and returns it together with that score.
- Returns:
Tuple of (best_concept_name, score) for the highest-scoring concept.
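As a rough illustration of the lookup, the sketch below assumes the concept history is a plain dict mapping concept names to scores and that the result is the (name, score) tuple described under Returns. The `concept_history` variable and its contents are made up for the example.

```python
# Hypothetical concept history: concept name -> score.
concept_history = {"stripes": 0.42, "tiger": 0.71, "fur": 0.55}

# Pick the (name, score) pair with the largest score.
best_concept, best_score = max(concept_history.items(), key=lambda kv: kv[1])
```

Note that `max` raises `ValueError` on an empty mapping, so a caller would need at least one recorded concept before asking for the best one.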
- update_concept_history(new_concept: str, score: float) → None[source]
Update the concept history with a new concept and score.
Appends the new concept to the generation history and records it, with its score, in the concept history mapping.
- Parameters:
new_concept – The new concept name to add.
score – The score associated with the new concept.
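The bookkeeping described above might look like the following sketch. `ConceptTracker` is a hypothetical stand-in written for this example; it assumes the generation history is a list and the concept history is a dict, which the documentation suggests but does not confirm.

```python
class ConceptTracker:
    """Hypothetical stand-in for the history handling; not the real LanguageModel."""

    def __init__(self) -> None:
        self.generation_history: list[str] = []      # concepts in generation order
        self.concept_history: dict[str, float] = {}  # concept name -> score

    def update_concept_history(self, new_concept: str, score: float) -> None:
        # Record the order of generation and the score of the concept.
        self.generation_history.append(new_concept)
        self.concept_history[new_concept] = score


tracker = ConceptTracker()
tracker.update_concept_history("stripes", 0.42)
tracker.update_concept_history("tiger", 0.71)
```

In this sketch a concept generated twice appears twice in the generation history, while the mapping keeps only its most recent score.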