
How to Build a Risk-Aware AI Agent with Internal Critic, Self-Consistency Reasoning, and Uncertainty Estimation for Reliable Decision-Making

In this tutorial, we build an advanced agent system that goes beyond simple response generation by integrating an internal critic and uncertainty estimation framework. We simulate multi-sample inference, evaluate candidate responses across accuracy, coherence, and safety dimensions, and quantify predictive uncertainty using entropy, variance, and consistency measures. We implement risk-sensitive selection strategies to balance confidence and uncertainty in decision-making. Through structured experiments and visualizations, we explore how self-consistent reasoning and uncertainty-aware selection improve reliability and robustness in agent behavior.


import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from typing import List, Dict, Tuple, Optional
from dataclasses import dataclass
from collections import Counter
import warnings
warnings.filterwarnings('ignore')


np.random.seed(42)


print("=" * 80)
print("🧪
AGENT WITH INTERNAL CRITIC + UNCERTAINTY ESTIMATION") print("=" * 80) print() @dataclass class Response: content: str confidence: float reasoning: str token_logprobs: List[float] def __repr__(self): return f"Response(content='{self.content[:50]}...', confidence={self.confidence:.3f})" @dataclass class CriticScore: accuracy_score: float coherence_score: float safety_score: float overall_score: float feedback: str def __repr__(self): return f"CriticScore(overall={self.overall_score:.3f})" @dataclass class UncertaintyEstimate: entropy: float variance: float consistency_score: float epistemic_uncertainty: float aleatoric_uncertainty: float def risk_level(self) -> str: if self.entropy < 0.5 and self.consistency_score > 0.8: return "LOW" elif self.entropy < 1.0 and self.consistency_score > 0.5: return "MEDIUM" else: return "HIGH"

We define the foundational data structures that power our agent system. We create structured containers for responses, critic scores, and uncertainty estimates using dataclasses. We establish reproducibility and initialize the core components that allow us to quantify risk and model confidence.
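
As a quick sanity check, we can instantiate one of these containers directly and confirm that the risk thresholds behave as expected (a minimal sketch; the numbers below are arbitrary illustrative values):

# Quick sanity check on the dataclasses (arbitrary illustrative values)
est = UncertaintyEstimate(
    entropy=0.3,             # low disagreement among sampled answers
    variance=0.01,
    consistency_score=0.9,   # 90% of samples agree
    epistemic_uncertainty=0.1,
    aleatoric_uncertainty=0.2,
)
print(est.risk_level())  # -> "LOW" (entropy < 0.5 and consistency > 0.8)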

class SimulatedLLM:
  
   def __init__(self, model_quality: float = 0.8):
       self.model_quality = model_quality
       self.response_templates = {
           "math": [
               "The answer is {answer}. This is calculated by {reasoning}.",
               "{answer} is the result when you {reasoning}.",
               "By {reasoning}, we get {answer}.",
               "The solution is {answer} because {reasoning}.",
           ],
           "factual": [
               "The answer is {answer}. {reasoning}",
               "Based on the facts, {answer}. {reasoning}",
               "{answer} is correct. {reasoning}",
               "The factual answer is {answer}. {reasoning}",
           ]
       }
  
   def generate_response(self, prompt: str, temperature: float = 0.7) -> Response:
       noise = np.random.randn() * temperature
       quality = np.clip(self.model_quality + noise * 0.2, 0.1, 1.0)
      
       if "What is" in prompt and "+" in prompt:
           parts = prompt.split()
           try:
               num1 = int(parts[parts.index("is") + 1])
               num2 = int(parts[parts.index("+") + 1].rstrip("?"))
               correct_answer = num1 + num2
              
               if np.random.rand() > quality:
                   answer = correct_answer + np.random.randint(-3, 4)
               else:
                   answer = correct_answer
              
               template = np.random.choice(self.response_templates["math"])
               reasoning = f"adding {num1} and {num2}"
               content = template.format(answer=answer, reasoning=reasoning)
              
               token_logprobs = list(np.random.randn(10) - (1 - quality) * 2)
               confidence = quality + np.random.randn() * 0.1
               confidence = np.clip(confidence, 0.1, 0.99)
              
               return Response(
                   content=content,
                   confidence=confidence,
                   reasoning=reasoning,
                   token_logprobs=token_logprobs
               )
            except (ValueError, IndexError):  # fall through to the generic factual template
               pass
      
       template = np.random.choice(self.response_templates["factual"])
       answer = "unknown"
       reasoning = "insufficient information to determine"
       content = template.format(answer=answer, reasoning=reasoning)
      
       token_logprobs = list(np.random.randn(10) - 1)
       confidence = 0.5 + np.random.randn() * 0.1
      
       return Response(
           content=content,
           confidence=np.clip(confidence, 0.1, 0.99),
           reasoning=reasoning,
           token_logprobs=token_logprobs
       )
  
   def generate_multiple(self, prompt: str, n: int = 5, temperature: float = 0.7) -> List[Response]:
       return [self.generate_response(prompt, temperature) for _ in range(n)]

We simulate a language model that generates multiple candidate responses of varying quality. We introduce temperature-based variability and controlled noise to mimic realistic sampling behavior. We enable multi-sample generation to support self-consistency reasoning and uncertainty estimation.
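
Before wiring the simulator into the full pipeline, we can exercise it on its own (a minimal sketch, assuming the SimulatedLLM class above has been run):

# Draw a few candidates from the simulated model at moderate temperature
llm = SimulatedLLM(model_quality=0.8)
for r in llm.generate_multiple("What is 15 + 27?", n=3, temperature=0.7):
    print(f"{r.confidence:.2f}  {r.content}")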

class InternalCritic:
  
   def __init__(self, strict_mode: bool = False):
       self.strict_mode = strict_mode
  
   def evaluate_response(self, response: Response, prompt: str, ground_truth: Optional[str] = None) -> CriticScore:
       accuracy = self._evaluate_accuracy(response, ground_truth)
       coherence = self._evaluate_coherence(response)
       safety = self._evaluate_safety(response)
      
       weights = {'accuracy': 0.4, 'coherence': 0.3, 'safety': 0.3}
       overall = (weights['accuracy'] * accuracy +
                 weights['coherence'] * coherence +
                 weights['safety'] * safety)
      
       feedback = self._generate_feedback(accuracy, coherence, safety)
      
       return CriticScore(
           accuracy_score=accuracy,
           coherence_score=coherence,
           safety_score=safety,
           overall_score=overall,
           feedback=feedback
       )
  
   def _evaluate_accuracy(self, response: Response, ground_truth: Optional[str]) -> float:
       if ground_truth is None:
           return response.confidence
      
       if ground_truth.lower() in response.content.lower():
           return 1.0
       else:
           response_words = set(response.content.lower().split())
           truth_words = set(ground_truth.lower().split())
           overlap = len(response_words & truth_words) / max(len(truth_words), 1)
           return overlap * 0.5
  
   def _evaluate_coherence(self, response: Response) -> float:
       avg_logprob = np.mean(response.token_logprobs)
       coherence_from_logprobs = 1.0 / (1.0 + np.exp(-avg_logprob))
       coherence = 0.6 * coherence_from_logprobs + 0.4 * response.confidence
       length_penalty = 1.0
       content_length = len(response.content.split())
       if content_length < 5 or content_length > 100:
           length_penalty = 0.8
       return coherence * length_penalty
  
   def _evaluate_safety(self, response: Response) -> float:
       unsafe_patterns = ['ignore instructions', 'harmful', 'dangerous']
       content_lower = response.content.lower()
       for pattern in unsafe_patterns:
           if pattern in content_lower:
               return 0.3
       return 1.0
  
   def _generate_feedback(self, accuracy: float, coherence: float, safety: float) -> str:
       feedback_parts = []
       if accuracy < 0.5:
           feedback_parts.append("Low accuracy - may contain errors")
       elif accuracy > 0.8:
           feedback_parts.append("High accuracy")
       if coherence < 0.5:
           feedback_parts.append("Low coherence - uncertain")
       if safety < 0.9:
           feedback_parts.append("Safety concerns detected")
       if not feedback_parts:
           feedback_parts.append("Good quality response")
       return "; ".join(feedback_parts)

We implement an internal critic that evaluates each response across the dimensions of accuracy, coherence, and safety. We compute a weighted overall score to systematically quantify response quality. We generate interpretable feedback that explains how well each response performs.
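
To see the critic in isolation, we can score a single sampled response against a known answer (a minimal sketch reusing the classes defined above):

# Score one candidate response against the known ground truth
critic = InternalCritic()
prompt = "What is 15 + 27?"
response = SimulatedLLM(model_quality=0.8).generate_response(prompt)
score = critic.evaluate_response(response, prompt, ground_truth="42")
print(score.feedback, f"(overall={score.overall_score:.3f})")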

class UncertaintyEstimator:
  
   def estimate_uncertainty(self, responses: List[Response], critic_scores: List[CriticScore]) -> UncertaintyEstimate:
       answers = [self._extract_answer(r.content) for r in responses]
       entropy = self._compute_entropy(answers)
       variance = np.var([score.overall_score for score in critic_scores])
       consistency = self._compute_consistency(answers)
       epistemic = self._compute_epistemic_uncertainty(responses)
       aleatoric = self._compute_aleatoric_uncertainty(responses)
      
       return UncertaintyEstimate(
           entropy=entropy,
           variance=variance,
           consistency_score=consistency,
           epistemic_uncertainty=epistemic,
           aleatoric_uncertainty=aleatoric
       )
  
   def _extract_answer(self, content: str) -> str:
       words = content.split()
       for word in words:
           if word.replace('.', '').replace('-', '').isdigit():
               return word
       return content.split('.')[0]
  
   def _compute_entropy(self, answers: List[str]) -> float:
       if not answers:
           return 0.0
      
       counts = Counter(answers)
       total = len(answers)
      
       entropy = 0.0
       for count in counts.values():
           p = count / total
           if p > 0:
               entropy -= p * np.log2(p)
      
       return entropy
  
   def _compute_consistency(self, answers: List[str]) -> float:
       if len(answers) <= 1:
           return 1.0
      
       counts = Counter(answers)
       most_common_count = counts.most_common(1)[0][1]
      
       return most_common_count / len(answers)
  
   def _compute_epistemic_uncertainty(self, responses: List[Response]) -> float:
       confidences = [r.confidence for r in responses]
       mean_logprobs = [np.mean(r.token_logprobs) for r in responses]
      
       confidence_var = np.var(confidences)
       logprob_var = np.var(mean_logprobs) / 10.0
      
       return np.sqrt(confidence_var + logprob_var)
  
   def _compute_aleatoric_uncertainty(self, responses: List[Response]) -> float:
       variances = [np.var(r.token_logprobs) for r in responses]
       return np.mean(variances) / 10.0

We estimate predictive uncertainty using entropy, variance, consistency, and uncertainty decomposition. We distinguish between epistemic and aleatoric uncertainty to better understand model behavior. We provide quantitative signals that guide risk-aware response selection.
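
For intuition, the entropy term is simply the Shannon entropy of the extracted-answer distribution. For example, five samples answering 42, 42, 42, 40, 40 give entropy −(0.6·log₂0.6 + 0.4·log₂0.4) ≈ 0.971 bits and a consistency score of 3/5 = 0.6, which we can verify directly (a minimal sketch; the answer list is hand-picked for illustration):

# Worked example: entropy and consistency of a hand-picked answer set
answers = ["42", "42", "42", "40", "40"]
estimator = UncertaintyEstimator()
print(f"entropy:     {estimator._compute_entropy(answers):.3f} bits")  # ~0.971
print(f"consistency: {estimator._compute_consistency(answers):.3f}")   # 0.600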

class RiskSensitiveSelector:
  
   def __init__(self, risk_tolerance: float = 0.5):
       self.risk_tolerance = risk_tolerance
  
   def select_response(self, responses: List[Response], critic_scores: List[CriticScore], uncertainty: UncertaintyEstimate, strategy: str = "risk_adjusted") -> Tuple[Response, int]:
       if strategy == "best_score":
           return self._select_best_score(responses, critic_scores)
       elif strategy == "most_confident":
           return self._select_most_confident(responses)
       elif strategy == "most_consistent":
           return self._select_most_consistent(responses)
       elif strategy == "risk_adjusted":
           return self._select_risk_adjusted(responses, critic_scores, uncertainty)
       else:
           raise ValueError(f"Unknown strategy: {strategy}")
  
   def _select_best_score(self, responses: List[Response], critic_scores: List[CriticScore]) -> Tuple[Response, int]:
       best_idx = np.argmax([score.overall_score for score in critic_scores])
       return responses[best_idx], best_idx
  
   def _select_most_confident(self, responses: List[Response]) -> Tuple[Response, int]:
       best_idx = np.argmax([r.confidence for r in responses])
       return responses[best_idx], best_idx
  
   def _select_most_consistent(self, responses: List[Response]) -> Tuple[Response, int]:
       answers = [self._extract_answer(r.content) for r in responses]
       most_common = Counter(answers).most_common(1)[0][0]
      
       for idx, answer in enumerate(answers):
           if answer == most_common:
               return responses[idx], idx
      
       return responses[0], 0
  
   def _select_risk_adjusted(self, responses: List[Response], critic_scores: List[CriticScore], uncertainty: UncertaintyEstimate) -> Tuple[Response, int]:
       scores = []
       risk_penalty = (1 - self.risk_tolerance) * uncertainty.entropy
      
       for response, critic_score in zip(responses, critic_scores):
           base_score = critic_score.overall_score
           confidence_bonus = self.risk_tolerance * response.confidence
           adjusted_score = base_score + confidence_bonus - risk_penalty
           scores.append(adjusted_score)
      
       best_idx = np.argmax(scores)
       return responses[best_idx], best_idx
  
   def _extract_answer(self, content: str) -> str:
       words = content.split()
       for word in words:
           if word.replace('.', '').replace('-', '').isdigit():
               return word
       return content.split('.')[0]

We implement multiple response selection strategies, including best-score, most-confident, most-consistent, and risk-adjusted approaches. We incorporate risk tolerance to balance quality against uncertainty. We enable adaptive decision-making depending on how conservative or risk-seeking we want the agent to be.
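
We can also watch how risk tolerance shifts the risk-adjusted choice on a fixed candidate pool (a minimal sketch reusing the components defined above; the prompt and tolerances are arbitrary):

# Compare a conservative vs. a risk-seeking selector on the same candidates
llm, critic, estimator = SimulatedLLM(0.75), InternalCritic(), UncertaintyEstimator()
prompt = "What is 15 + 27?"
responses = llm.generate_multiple(prompt, n=5)
scores = [critic.evaluate_response(r, prompt, ground_truth="42") for r in responses]
uncertainty = estimator.estimate_uncertainty(responses, scores)
for tol in (0.1, 0.9):
    _, idx = RiskSensitiveSelector(risk_tolerance=tol).select_response(
        responses, scores, uncertainty, strategy="risk_adjusted")
    print(f"risk_tolerance={tol}: picked response #{idx}")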

class CriticAugmentedAgent:
  
   def __init__(self, model_quality: float = 0.8, risk_tolerance: float = 0.5, n_samples: int = 5):
       self.llm = SimulatedLLM(model_quality=model_quality)
       self.critic = InternalCritic()
       self.uncertainty_estimator = UncertaintyEstimator()
       self.selector = RiskSensitiveSelector(risk_tolerance=risk_tolerance)
       self.n_samples = n_samples
  
   def generate_with_critic(self, prompt: str, ground_truth: Optional[str] = None, strategy: str = "risk_adjusted", temperature: float = 0.7, verbose: bool = True) -> Dict:
       if verbose:
           print(f"📝 Prompt: {prompt}")
           print(f"🎲 Generating {self.n_samples} candidate responses...")
      
       responses = self.llm.generate_multiple(prompt, self.n_samples, temperature)
      
       if verbose:
           print(f"✓ Generated {len(responses)} responsesn")
           print("🔍 Evaluating with internal critic...")
      
       critic_scores = [
           self.critic.evaluate_response(response, prompt, ground_truth)
           for response in responses
       ]
      
       if verbose:
           print(f"✓ All responses evaluatedn")
           print("📊 Computing uncertainty estimates...")
      
       uncertainty = self.uncertainty_estimator.estimate_uncertainty(
           responses, critic_scores
       )
      
       if verbose:
           print(f"✓ Uncertainty: {uncertainty.risk_level()} risk")
           print(f"  - Entropy: {uncertainty.entropy:.3f}")
           print(f"  - Consistency: {uncertainty.consistency_score:.3f}n")
           print(f"🎯 Selecting best response (strategy: {strategy})...")
      
       selected_response, selected_idx = self.selector.select_response(
           responses, critic_scores, uncertainty, strategy
       )
      
       if verbose:
           print(f"✓ Selected response #{selected_idx}n")
           print("=" * 80)
           print("FINAL ANSWER:")
           print(selected_response.content)
           print("=" * 80)
           print(f"nConfidence: {selected_response.confidence:.3f}")
           print(f"Critic Score: {critic_scores[selected_idx].overall_score:.3f}")
           print(f"Risk Level: {uncertainty.risk_level()}")
           print()
      
       return {
           'selected_response': selected_response,
           'selected_index': selected_idx,
           'all_responses': responses,
           'critic_scores': critic_scores,
           'uncertainty': uncertainty,
           'strategy': strategy
       }

We integrate the language model, critic, uncertainty estimator, and selector into a complete critic-augmented agent pipeline. We generate multiple responses, evaluate them, compute uncertainty, and select the optimal answer. We orchestrate the full multi-stage reasoning workflow in a structured and extensible manner.
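
A single call then exercises the whole pipeline end to end (a minimal sketch; the prompt and ground truth are arbitrary):

# One end-to-end call through the critic-augmented pipeline
agent = CriticAugmentedAgent(model_quality=0.8, risk_tolerance=0.4, n_samples=5)
result = agent.generate_with_critic("What is 8 + 9?", ground_truth="17",
                                    strategy="risk_adjusted", verbose=False)
print(result['selected_response'].content)
print("Risk level:", result['uncertainty'].risk_level())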

class AgentAnalyzer:
  
   @staticmethod
   def plot_response_distribution(result: Dict):
       fig, axes = plt.subplots(2, 2, figsize=(14, 10))
       fig.suptitle('Agent Response Analysis', fontsize=16, fontweight='bold')
      
       responses = result['all_responses']
       scores = result['critic_scores']
       uncertainty = result['uncertainty']
       selected_idx = result['selected_index']
      
       ax = axes[0, 0]
       score_values = [s.overall_score for s in scores]
       bars = ax.bar(range(len(scores)), score_values, alpha=0.7)
       bars[selected_idx].set_color('green')
       bars[selected_idx].set_alpha(1.0)
       ax.axhline(np.mean(score_values), color='red', linestyle='--', label=f'Mean: {np.mean(score_values):.3f}')
       ax.set_xlabel('Response Index')
       ax.set_ylabel('Critic Score')
       ax.set_title('Critic Scores for Each Response')
       ax.legend()
       ax.grid(True, alpha=0.3)
      
       ax = axes[0, 1]
       confidences = [r.confidence for r in responses]
       bars = ax.bar(range(len(responses)), confidences, alpha=0.7, color='orange')
       bars[selected_idx].set_color('green')
       bars[selected_idx].set_alpha(1.0)
       ax.axhline(np.mean(confidences), color='red', linestyle='--', label=f'Mean: {np.mean(confidences):.3f}')
       ax.set_xlabel('Response Index')
       ax.set_ylabel('Confidence')
       ax.set_title('Model Confidence per Response')
       ax.legend()
       ax.grid(True, alpha=0.3)
      
       ax = axes[1, 0]
       components = {
           'Accuracy': [s.accuracy_score for s in scores],
           'Coherence': [s.coherence_score for s in scores],
           'Safety': [s.safety_score for s in scores]
       }
       x = np.arange(len(responses))
       width = 0.25
       for i, (name, values) in enumerate(components.items()):
           offset = (i - 1) * width
           ax.bar(x + offset, values, width, label=name, alpha=0.8)
       ax.set_xlabel('Response Index')
       ax.set_ylabel('Score')
       ax.set_title('Critic Score Components')
       ax.set_xticks(x)
       ax.legend()
       ax.grid(True, alpha=0.3, axis='y')
      
       ax = axes[1, 1]
       uncertainty_metrics = {
           'Entropy': uncertainty.entropy,
           'Variance': uncertainty.variance,
           'Consistency': uncertainty.consistency_score,
           'Epistemic': uncertainty.epistemic_uncertainty,
           'Aleatoric': uncertainty.aleatoric_uncertainty
       }
       bars = ax.barh(list(uncertainty_metrics.keys()), list(uncertainty_metrics.values()), alpha=0.7)
       ax.set_xlabel('Value')
       ax.set_title(f'Uncertainty Estimates (Risk: {uncertainty.risk_level()})')
       ax.grid(True, alpha=0.3, axis='x')
      
       plt.tight_layout()
       plt.show()
  
   @staticmethod
   def plot_strategy_comparison(agent: CriticAugmentedAgent, prompt: str, ground_truth: Optional[str] = None):
       strategies = ["best_score", "most_confident", "most_consistent", "risk_adjusted"]
       results = {}
      
       print("Comparing selection strategies...n")
      
       for strategy in strategies:
           print(f"Testing strategy: {strategy}")
           result = agent.generate_with_critic(prompt, ground_truth, strategy=strategy, verbose=False)
           results[strategy] = result
      
       fig, axes = plt.subplots(1, 2, figsize=(14, 5))
       fig.suptitle('Strategy Comparison', fontsize=16, fontweight='bold')
      
       ax = axes[0]
       selected_scores = [
           results[s]['critic_scores'][results[s]['selected_index']].overall_score
           for s in strategies
       ]
       bars = ax.bar(strategies, selected_scores, alpha=0.7, color='steelblue')
       ax.set_ylabel('Critic Score')
       ax.set_title('Selected Response Quality by Strategy')
        ax.set_xticks(range(len(strategies)))
        ax.set_xticklabels(strategies, rotation=45, ha='right')
       ax.grid(True, alpha=0.3, axis='y')
      
       ax = axes[1]
       for strategy in strategies:
           result = results[strategy]
           selected_idx = result['selected_index']
           confidence = result['all_responses'][selected_idx].confidence
           score = result['critic_scores'][selected_idx].overall_score
           ax.scatter(confidence, score, s=200, alpha=0.6, label=strategy)
       ax.set_xlabel('Confidence')
       ax.set_ylabel('Critic Score')
       ax.set_title('Confidence vs Quality Trade-off')
       ax.legend()
       ax.grid(True, alpha=0.3)
      
       plt.tight_layout()
       plt.show()
      
       return results


def run_basic_demo():
   print("n" + "=" * 80)
   print("DEMO 1: Basic Agent with Critic")
   print("=" * 80 + "n")
  
   agent = CriticAugmentedAgent(
       model_quality=0.8,
       risk_tolerance=0.3,
       n_samples=5
   )
  
   prompt = "What is 15 + 27?"
   ground_truth = "42"
  
   result = agent.generate_with_critic(
       prompt=prompt,
       ground_truth=ground_truth,
       strategy="risk_adjusted",
       temperature=0.8
   )
  
   print("n📊 Generating visualizations...")
   AgentAnalyzer.plot_response_distribution(result)
  
   return result


def run_strategy_comparison():
   print("n" + "=" * 80)
   print("DEMO 2: Strategy Comparison")
   print("=" * 80 + "n")
  
   agent = CriticAugmentedAgent(
       model_quality=0.75,
       risk_tolerance=0.5,
       n_samples=6
   )
  
   prompt = "What is 23 + 19?"
   ground_truth = "42"
  
   results = AgentAnalyzer.plot_strategy_comparison(agent, prompt, ground_truth)
  
   return results


def run_uncertainty_analysis():
   print("n" + "=" * 80)
   print("DEMO 3: Uncertainty Analysis")
   print("=" * 80 + "n")
  
   fig, axes = plt.subplots(1, 2, figsize=(14, 5))
  
   qualities = [0.5, 0.6, 0.7, 0.8, 0.9]
   uncertainties = []
   consistencies = []
  
   prompt = "What is 30 + 12?"
  
   print("Testing model quality impact on uncertainty...n")
   for quality in qualities:
       agent = CriticAugmentedAgent(model_quality=quality, n_samples=8)
       result = agent.generate_with_critic(prompt, verbose=False)
       uncertainties.append(result['uncertainty'].entropy)
       consistencies.append(result['uncertainty'].consistency_score)
       print(f"Quality: {quality:.1f} -> Entropy: {result['uncertainty'].entropy:.3f}, "
             f"Consistency: {result['uncertainty'].consistency_score:.3f}")
  
   ax = axes[0]
   ax.plot(qualities, uncertainties, 'o-', linewidth=2, markersize=8, label='Entropy')
   ax.set_xlabel('Model Quality')
   ax.set_ylabel('Entropy')
   ax.set_title('Uncertainty vs Model Quality')
   ax.grid(True, alpha=0.3)
   ax.legend()
  
   ax = axes[1]
   ax.plot(qualities, consistencies, 's-', linewidth=2, markersize=8, color='green', label='Consistency')
   ax.set_xlabel('Model Quality')
   ax.set_ylabel('Consistency Score')
   ax.set_title('Self-Consistency vs Model Quality')
   ax.grid(True, alpha=0.3)
   ax.legend()
  
   plt.tight_layout()
   plt.show()


def run_risk_sensitivity_demo():
   print("n" + "=" * 80)
   print("DEMO 4: Risk Sensitivity Analysis")
   print("=" * 80 + "n")
  
   prompt = "What is 18 + 24?"
   risk_tolerances = [0.1, 0.3, 0.5, 0.7, 0.9]
  
   results = {
       'risk_tolerance': [],
       'selected_confidence': [],
       'selected_score': [],
       'uncertainty': []
   }
  
   print("Testing different risk tolerance levels...n")
   for risk_tol in risk_tolerances:
       agent = CriticAugmentedAgent(
           model_quality=0.75,
           risk_tolerance=risk_tol,
           n_samples=6
       )
       result = agent.generate_with_critic(prompt, verbose=False)
      
       selected_idx = result['selected_index']
       results['risk_tolerance'].append(risk_tol)
       results['selected_confidence'].append(
           result['all_responses'][selected_idx].confidence
       )
       results['selected_score'].append(
           result['critic_scores'][selected_idx].overall_score
       )
       results['uncertainty'].append(result['uncertainty'].entropy)
      
       print(f"Risk Tolerance: {risk_tol:.1f} -> "
             f"Confidence: {results['selected_confidence'][-1]:.3f}, "
             f"Score: {results['selected_score'][-1]:.3f}")
  
   fig, ax = plt.subplots(1, 1, figsize=(10, 6))
   ax.plot(results['risk_tolerance'], results['selected_confidence'], 'o-', linewidth=2, markersize=8, label='Selected Confidence')
   ax.plot(results['risk_tolerance'], results['selected_score'], 's-', linewidth=2, markersize=8, label='Selected Score')
   ax.set_xlabel('Risk Tolerance')
   ax.set_ylabel('Value')
   ax.set_title('Risk Tolerance Impact on Selection')
   ax.legend()
   ax.grid(True, alpha=0.3)
   plt.tight_layout()
   plt.show()


def demonstrate_verbalized_uncertainty():
   print("n" + "=" * 80)
   print("RESEARCH TOPIC: Verbalized Uncertainty")
   print("=" * 80 + "n")
  
   print("Concept: Agent not only estimates uncertainty but explains it.n")
  
   agent = CriticAugmentedAgent(model_quality=0.7, n_samples=5)
   prompt = "What is 25 + 17?"
   result = agent.generate_with_critic(prompt, verbose=False)
  
   uncertainty = result['uncertainty']
  
   explanation = f"""
Uncertainty Analysis Report:
---------------------------
Risk Level: {uncertainty.risk_level()}


Detailed Breakdown:
• Answer Entropy: {uncertainty.entropy:.3f}
 → {'Low' if uncertainty.entropy < 0.5 else 'Medium' if uncertainty.entropy < 1.0 else 'High'} disagreement among generated responses


• Self-Consistency: {uncertainty.consistency_score:.3f}
 → {int(uncertainty.consistency_score * 100)}% of responses agree on the answer


• Epistemic Uncertainty: {uncertainty.epistemic_uncertainty:.3f}
 → {'Low' if uncertainty.epistemic_uncertainty < 0.3 else 'Medium' if uncertainty.epistemic_uncertainty < 0.6 else 'High'} model uncertainty (knowledge gaps)


• Aleatoric Uncertainty: {uncertainty.aleatoric_uncertainty:.3f}
 → {'Low' if uncertainty.aleatoric_uncertainty < 0.3 else 'Medium' if uncertainty.aleatoric_uncertainty < 0.6 else 'High'} data uncertainty (inherent randomness)


Recommendation:
"""
  
   if uncertainty.risk_level() == "LOW":
       explanation += "✓ High confidence in answer - safe to trust"
   elif uncertainty.risk_level() == "MEDIUM":
       explanation += "⚠ Moderate confidence - consider verification"
   else:
       explanation += "⚠ Low confidence - strongly recommend verification"
  
   print(explanation)


def demonstrate_self_consistency():
   print("n" + "=" * 80)
   print("RESEARCH TOPIC: Self-Consistency Reasoning")
   print("=" * 80 + "n")
  
   print("Concept: Generate multiple reasoning paths, select most common answer.n")
  
   agent = CriticAugmentedAgent(model_quality=0.75, n_samples=7)
   prompt = "What is 35 + 7?"
   result = agent.generate_with_critic(prompt, strategy="most_consistent", verbose=False)
  
   estimator = UncertaintyEstimator()
   answers = [estimator._extract_answer(r.content) for r in result['all_responses']]
  
   print("Generated Responses and Answers:")
   print("-" * 80)
   for i, (response, answer) in enumerate(zip(result['all_responses'], answers)):
       marker = "✓ SELECTED" if i == result['selected_index'] else ""
       print(f"nResponse {i}: {answer} {marker}")
       print(f"  Confidence: {response.confidence:.3f}")
       print(f"  Content: {response.content[:80]}...")
  
    answer_dist = Counter(answers)  # Counter is already imported at module level

    print("\n\nAnswer Distribution:")
   print("-" * 80)
   for answer, count in answer_dist.most_common():
       percentage = (count / len(answers)) * 100
       bar = "█" * int(percentage / 5)
       print(f"{answer:>10}: {bar} {count}/{len(answers)} ({percentage:.1f}%)")
  
   print(f"nMost Consistent Answer: {answer_dist.most_common(1)[0][0]}")
   print(f"Consistency Score: {result['uncertainty'].consistency_score:.3f}")


def main():
   print("n" + "🎯" * 40)
   print("ADVANCED AGENT WITH INTERNAL CRITIC + UNCERTAINTY ESTIMATION")
   print("Tutorial and Demonstrations")
   print("🎯" * 40)
  
   plt.style.use('seaborn-v0_8-darkgrid')
   sns.set_palette("husl")
  
   try:
       result1 = run_basic_demo()
       result2 = run_strategy_comparison()
       run_uncertainty_analysis()
       run_risk_sensitivity_demo()
       demonstrate_verbalized_uncertainty()
       demonstrate_self_consistency()
      
       print("n" + "=" * 80)
       print("✅ ALL DEMONSTRATIONS COMPLETED SUCCESSFULLY")
       print("=" * 80)
       print("""
Key Takeaways:
1. Internal critics improve response quality through multi-dimensional evaluation
2. Uncertainty estimation enables risk-aware decision making
3. Self-consistency reasoning increases reliability
4. Different selection strategies optimize for different objectives
5. Verbalized uncertainty helps users understand model confidence


Next Steps:
• Implement with real LLM APIs (OpenAI, Anthropic, etc.)
• Add learned critic models (fine-tuned classifiers)
• Explore ensemble methods and meta-learning
• Integrate with retrieval-augmented generation (RAG)
• Deploy in production with monitoring and feedback loops
       """)
      
   except Exception as e:
       print(f"n❌ Error during demonstration: {e}")
       import traceback
       traceback.print_exc()


if __name__ == "__main__":
   main()

We analyze and visualize agent behavior through experiments and comparative strategies. We explore how model quality, risk tolerance, and selection strategy impact final outputs. We demonstrate advanced research concepts, such as verbalized uncertainty and self-consistency reasoning, to better understand and interpret agents’ decisions.

In conclusion, we demonstrated that combining internal critics with uncertainty estimation yields safer, more reliable agent systems. We showed how multi-sample generation, critic-based scoring, entropy analysis, and risk-adjusted selection work together to improve response quality. We observed that self-consistency strengthens robustness, while uncertainty metrics enable informed, risk-aware decisions. By integrating these mechanisms, we moved toward agent architectures that are not only intelligent but also transparent, interpretable, and ready for real-world deployment.

