
Development of Decision-Support Systems with Self-Criticism and Reason+Action Development in Agentic AI Tools— Part 1

6 min read · Sep 29, 2025
Source: https://cdn.midjourney.com/5681cac8-c769-4def-a191-7935151457a8/0_0.png

Over the past two years, AI Agents and agentic systems have rapidly entered our lives. However, these systems have mostly served as tools for automation. And nearly two years later, the systems we have built still primarily function as mechanisms that interpret our data or questions through AI and connect the outputs to automation pipelines. Unfortunately, this can be better described as ‘Automation Systems Based on AI-Mediated Data Interpretation’ rather than true AI Agents.

So, what level do we actually expect from a genuine ‘Agentic’ AI? Which of our needs can it fulfill? When we think about this, autonomous system capabilities immediately come to mind. That is, an agentic structure should employ its designated LLM to begin operating without human intervention, define its own goals and plans, create tasks, advance step by step, and revise its strategy and plan whenever execution fails or errors occur. This is precisely where ReAct (Reason + Act) and Reflection come into play.

Source: https://www.analyticsvidhya.com/blog/2024/10/agentic-ai-reflection-pattern/

Since LLMs offer both high manageability and accessibility, they are the first choice in agentic processes. However, in agentic systems, LLMs are highly prone to errors when used for tasks requiring planning, tool use, and multi-step reasoning. For this reason, recent research has introduced methods and patterns such as ‘Reflection’ (self-evaluation), ‘Reflexion’ (self-learning through verbal reinforcement), ‘Self-Refine’ (critiquing and correcting its own outputs), ‘Tree-of-Thoughts’ (exploring branching lines of reasoning), and ‘Chain-of-Verification’ (self-verification through confirmation questions) to make LLMs more compatible with agentic processes. With these patterns, researchers aim to develop agents that can examine their own outputs, detect mistakes, recognize errors in tool-using processes, and refine their operations and responses accordingly.

Reflection Pattern

The reflection pattern refers to the idea that an LLM, integrated into an agent structure, reviews its own response output, plan, or tool call results within the same task context through a secondary evaluation and critique step, making corrections when necessary. This enables improvement without updating weights, relying solely on linguistic feedback and architectural control flow. Alongside this pattern, several methods exist:

  • ReAct: Interleaves reasoning steps with tool-using actions, creating an interactive loop in which each observation feeds back into the next reasoning step.
  • Reflexion: In the agent system, the LLM provides linguistic feedback after failed attempts, stores it in episodic memory, and then alters strategies in subsequent trials (verbal RL).
  • Self-Refine: With this pattern, the model iterates through a draft → feedback → refinement loop multiple times within a single session, thus improving quality without additional training. Reported results indicate an absolute improvement of ~20% (a minimal code sketch follows this list).
Source: https://arxiv.org/pdf/2303.17651
  • Tree-of-Thoughts (ToT): Instead of proceeding along a single linear flow, the agentic system explores a tree of ‘thoughts.’ It then performs self-evaluation and backtracking as needed.
  • Self-Consistency: Samples multiple reasoning paths and reconciles them by majority vote, thereby reducing the likelihood of errors.
  • Chain-of-Verification (CoVe): The model first produces a draft; it then plans verification questions and answers them independently; finally, it generates a verified final response. Studies have reported that this reduces hallucination in outputs.
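
To make the Self-Refine loop above concrete, here is a minimal sketch in Python. It assumes a hypothetical llm(prompt) helper that wraps whatever chat-completion client you use; the prompts and the stopping rule are illustrative, not the exact setup from the paper.

# Minimal Self-Refine loop: draft -> critique -> revision, repeated a few times.
# `llm(prompt)` is a hypothetical helper wrapping your chat-completion client.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

def self_refine(task: str, max_rounds: int = 3) -> str:
    draft = llm(f"Solve the task:\n{task}")
    for _ in range(max_rounds):
        critique = llm(
            f"Task:\n{task}\n\nDraft answer:\n{draft}\n\n"
            "List concrete problems with this draft. "
            "If it needs no changes, reply exactly: NO ISSUES."
        )
        if "NO ISSUES" in critique:  # stop once the critic is satisfied
            break
        draft = llm(
            f"Task:\n{task}\n\nDraft answer:\n{draft}\n\n"
            f"Critique:\n{critique}\n\nRewrite the answer, fixing every issue."
        )
    return draft

The same session carries both the critique and the revision, which is why no weight updates or additional training are needed.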

Designing the Taxonomy of Agent and Tool Errors

Let us first examine the types of errors that occur in agentic systems built with LLMs:

Agent–LLM Integration Errors:

  • Logical: Situations where the AI model skips steps or produces inconsistent intermediate decisions due to an incomplete agent structure.
  • Factual: Instances where the AI model generates unsupported claims or nonexistent dates/names — in other words, hallucinations.
  • Plan–Execution Deviations: Cases where the model fails to carry out the planned actions or ends up looping indefinitely.
  • Uncertainty/Calibration Issues: When the model makes biased decisions due to overconfidence and low diversity in its outputs.
Source: https://www.kore.ai/blog/five-levels-of-ai-agents

Tool Errors:

  • Syntactic/Schema: Decision–action errors caused by JSON/function call formatting issues or missing parameters. (Can be caught with JSON Schema/Pydantic; see the sketch after this list.)
  • Semantic/Contractual: Errors arising from invalid identifiers, out-of-range values, or business rule violations. (Handled with field validators.)
  • Runtime: Actions may fail due to timeouts, 5xx server errors, or service interruptions. (Mitigated via retry/circuit-breaker mechanisms.)
  • Information Quality: Occurs in RAG when weak/misleading content or outdated data is used. (Addressed with CRAG/LLM-verified retrieval.)
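
To illustrate the syntactic/schema and runtime categories above, the sketch below validates a tool call's arguments with Pydantic (v2 is assumed) and wraps execution in a naive retry loop. The get_weather tool and its fields are invented for the example; a production circuit breaker would also stop calling a tool entirely after repeated failures.

import time
from pydantic import BaseModel, Field, ValidationError

# Hypothetical argument schema for a "get_weather" tool (illustrative fields only).
class GetWeatherArgs(BaseModel):
    city: str = Field(min_length=1)
    days: int = Field(ge=1, le=14)  # contractual rule: only 1-14 day forecasts

def validate_call(raw_args: dict) -> GetWeatherArgs | None:
    try:
        return GetWeatherArgs.model_validate(raw_args)  # catches syntactic/schema errors
    except ValidationError as err:
        print("Schema error; ask the model to re-emit the call:", err)
        return None

def execute_with_retry(tool_fn, args, attempts: int = 3, backoff: float = 1.0):
    # Naive retry for runtime errors (timeouts, 5xx responses, outages).
    for attempt in range(1, attempts + 1):
        try:
            return tool_fn(args)
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(backoff * attempt)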

To create a practical reflection skeleton, we can combine the following components:

  • Planner/Policy: Breaks the task into sub-goals and ToDos, creating an execution flow that ensures systematic progress, and determines the tool-selection strategy for each sub-goal.
  • Actor (Tool-User): Executes the function calls and feeds each observation back into the loop.
  • Critic/Refiner: Evaluates the draft response or trace against rubrics and suggests necessary corrections; this forms the basis of Self-Refine/Reflexion (a structured-flag sketch follows this list).
  • Verifier: Adds external, automated checks (schema validation, CoVe questions, tests, and RAG verifiers) to the flow.
  • Memory: Supported through episodic memory (Reflexion notes) and persistent task-specific findings.
  • Orchestrator (Graph Engine): Realized through a flow graph with branching/retries and circuit breakers.
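
One way to wire the Critic/Refiner into the Orchestrator is to have it return structured flags rather than free text, so the flow graph can branch on them; the critique.flags check in the skeleton at the end of this post follows the same idea. The rubric, the flag names, and the llm_json helper below are assumptions made for illustration.

from dataclasses import dataclass, field

# Hypothetical structured critique the orchestrator can branch on.
@dataclass
class Critique:
    flags: list[str] = field(default_factory=list)  # e.g. "FACTUAL_RISK", "RETRIEVAL_ISSUE"
    notes: str = ""

RUBRIC = (
    "Check the trace for: unsupported factual claims (FACTUAL_RISK), "
    "weak or irrelevant retrieved passages (RETRIEVAL_ISSUE), "
    "and skipped plan steps (PLAN_DEVIATION)."
)

def critic(state: str, llm_json) -> Critique:
    # `llm_json` is a hypothetical helper that prompts the model and parses its JSON reply.
    reply = llm_json(
        f"{RUBRIC}\n\nTrace:\n{state}\n\n"
        'Reply as JSON: {"flags": [...], "notes": "..."}'
    )
    return Critique(flags=reply.get("flags", []), notes=reply.get("notes", ""))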

Types of Reflection Loops and When to Use Them

  • Self-Refine: The flow is Draft → Critique → Revision.
    Input points: Text, summaries, essays, or long-form responses.
    Key advantage: Requires no additional training.
  • Reflexion: The flow is Trial → Feedback → Strategy Adjustment.
    Input points: Coding, game/environment interactions, multi-step decision-making tasks.
    Key advantage: Learns quickly within the same session through episodic memory.
  • ToT (Tree-of-Thoughts): The flow is Search-based reasoning.
    Input points: Tasks requiring combinatorial reasoning or exploration (e.g., puzzles, planning).
    Key advantage: Enables backtracking and global choice, thereby reducing the impact of a single erroneous step.
Source: https://learnprompting.org/docs/intermediate/self_consistency?srsltid=AfmBOoroV8QW8ovaxbmO1Nssvy0x7g-pAD5WlHCQOE0B2HANW1wRQUis
  • Self-Consistency: The flow is “Multiple sampling and consensus.”
    Input points: Used for problems that can be reduced to a single correct output.
    Key advantage: Estimates uncertainty through diversity (a minimal sketch appears below).
  • CoVe (Chain-of-Verification): The flow is verification-oriented, i.e., test-based validation.
    Input points: Tasks where factual correctness is critical and must be “proven.”
    Key advantage: Provides measurable assurance through verification questions and unit tests.
  • CRAG, LLM-verified retrieval: Frequently applied in scenarios involving RAG errors (e.g., incorrect/incomplete retrievals).
    Key advantage: Forms a loop that critiques the retrieved content and generates a refined query for re-retrieval.
Source: https://www.kore.ai/blog/corrective-rag-crag
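
The core mechanic of Self-Consistency is simple to show: sample the same question several times at a non-zero temperature, extract the final answer from each sample, and keep the majority. The sample_answer helper below is a placeholder for your own model call and answer-extraction logic.

from collections import Counter

def sample_answer(question: str) -> str:
    # Placeholder: call your model with temperature > 0 and return only the final answer.
    raise NotImplementedError

def self_consistency(question: str, n_samples: int = 5) -> tuple[str, float]:
    answers = [sample_answer(question) for _ in range(n_samples)]
    best, votes = Counter(answers).most_common(1)[0]
    agreement = votes / n_samples  # low agreement signals high uncertainty
    return best, agreement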

Example Generation-Calibration Reflection Skeleton:

function SOLVE_WITH_REFLECTION(query, tools, rag, budget):
    plan ← PLAN(query)                                # ReAct/ToT plan
    state ← ∅; memory ← ∅
    for step in 1..MAX_STEPS:
        act ← SELECT_ACTION(plan, state)
        call ← FORMAT_TOOL_CALL(act)
        assert SCHEMA_OK(call)                        # JSON Schema/Pydantic
        obs ← EXECUTE(call)                           # tool answers
        if RUNTIME_ERROR(obs):                        # timeout, 5xx, empty result
            obs ← RETRY_OR_FALLBACK(call)
        state ← state ⊕ obs
        if step % REFLECTION_INTERVAL == 0 or TRIGGERED(state):
            critique ← CRITIC(state, RUBRICS)         # Self-Refine/Reflexion
            if critique.flags.contains("FACTUAL_RISK"):
                verif ← COVE_VERIFY(state)            # CoVe
                state ← state ⊕ verif
            if critique.flags.contains("RETRIEVAL_ISSUE"):
                score ← RETRIEVAL_EVAL(state)         # CRAG evaluator
                if score < τ:
                    state ← state ⊕ RE_RETRIEVE(rag, state)
            plan ← UPDATE_PLAN(plan, critique, memory)
            memory ← UPDATE_MEMORY(memory, critique)  # Reflexion
        if STOP_CONDITION(state, plan): break
    drafts ← {GENERATE_DRAFTS(state, M)}              # Self-Consistency
    best ← MAJORITY_VOTE(drafts)
    return FINALIZE_WITH_EVIDENCE(best)
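
The RETRIEVAL_EVAL and RE_RETRIEVE steps above can be sketched as a CRAG-style loop: grade the retrieved passages, and if the grade falls below a threshold, rewrite the query and retrieve again. Everything here (the grading prompt, the 0-1 score, the retriever and llm helpers) is an illustrative assumption rather than a specific library API.

def crag_retrieve(question: str, retriever, llm, threshold: float = 0.6, max_rounds: int = 2):
    query = question
    for _ in range(max_rounds):
        passages = retriever(query)  # your vector/keyword search
        grade = float(llm(
            f"Question: {question}\nPassages:\n{passages}\n"
            "Rate from 0.0 to 1.0 how well these passages support answering the question. "
            "Reply with only the number."
        ))
        if grade >= threshold:
            return passages
        query = llm(  # critique-driven query rewrite
            f"The passages retrieved for '{query}' were insufficient. "
            "Write a sharper search query for the same information need."
        )
    return passages  # best effort after max_rounds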

Hopefully this has been a useful article. In the next part, I will examine example workflows and provide a detailed explanation of ReAct.

See you soon…


Written by Alican Kiraz

Sr. Staff Security Engineer @Trendyol | CSIE | CSAE | CCISO | CASP+ | OSCP | eCIR | CPENT | eWPTXv2 | eCDFP | eCTHPv2 | OSWP | CEH Master | Pentest+ | CySA+
