What is an LLM?

  • language models
    • text to text generation
  • that are large
    • Massive number of model parameters (weights)
    • Trained on huge amounts of data

How do LLMs learn?

  • Via a model architecture (giant graph of interconnected nodes)
    • transformer, mamba
  • Model weights “trained” with massive dataset
    • internet, books, code

What can LLMs do?

  • Text generation
  • Translation
  • Summarization
  • Question Answering
  • Conversational AI

What can’t LLMs do?

  • Hallucination: can generate seemingly correct but false information
  • Bias: can reflect biases present in their training data
  • Real-time info: no access to data not present in their training data
  • Privacy: your personal info can be part of the training data
  • Security: LLMs can be manipulated to give/not give certain answers

What is Infrastructure (systems)?

  • the backbone that powers applications
  • Traditional infra (systems)
    • OS, database, networking, hardware, compiler, security, virtualization, DevOps, etc
    • for data centers, cloud computing

What is LLM infra?

  • LLM training
    • Pre-training (foundational model training)
    • Post-training (fine-tuning, RLHF)
  • LLM serving
    • Single-GPU/CPU LLM inference
    • Distributed model serving
  • LLMOps
    • Training data collection, preparation, and synthesis
    • Experiment tracking, model registry
    • Monitoring and logging of LLM serving

What is an AI Agent?

Categories of AI Agents

  1. Single-agent applications
  2. Multi-agent systems
    • LLM agents interacting with each other in a collaborative or competitive manner
  3. Human-Agent cooperation
    • LLM agents can interact with humans, providing them with assistance and performing tasks more efficiently and safely

Agent-to-Agent (A2A) Protocol

  • A protocol enabling standardized communication across agents
    • Supports communication between agents built on different frameworks
    • A client (local) agent can discover agents by fetching the “Agent Card” of each available remote agent
    • It then delegates a task to the chosen remote agent
  • Supports streaming and asynchronous push notifications for long tasks
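
The discover-then-delegate flow above can be sketched in memory as follows. This is a minimal illustration, not the A2A wire format: `AgentCard`, `RemoteAgent`, `discover`, and `delegate` are hypothetical names, the card fields are a simplified assumption, and no network calls are made.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class AgentCard:
    """Hypothetical, simplified stand-in for an Agent Card (not the real schema)."""
    name: str
    description: str
    skills: List[str]


@dataclass
class RemoteAgent:
    card: AgentCard
    handler: Callable[[str], str]  # runs a task and returns its result


def discover(agents: List[RemoteAgent], skill: str) -> Optional[RemoteAgent]:
    """Client-side discovery: read each card, pick the first agent advertising the skill."""
    for agent in agents:
        if skill in agent.card.skills:
            return agent
    return None


def delegate(agent: RemoteAgent, task: str) -> Dict[str, str]:
    """Hand the task to the chosen remote agent and wrap its reply."""
    return {"agent": agent.card.name, "status": "completed", "result": agent.handler(task)}


# Two toy remote agents; the handlers are placeholders for real agent logic.
registry = [
    RemoteAgent(AgentCard("translator", "Translates text", ["translate"]),
                lambda task: f"[translated] {task}"),
    RemoteAgent(AgentCard("summarizer", "Summarizes text", ["summarize"]),
                lambda task: task[:20] + "..."),
]

chosen = discover(registry, "translate")
reply = delegate(chosen, "hello world")
print(reply)
```

In a real deployment the cards would be fetched over the network and long tasks would report progress through streaming or push notifications; the sketch only shows the control flow.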

What are LLM Agents?

Planning

  • Subgoal and decomposition: the agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks
  • Reflection and refinement: the agent critiques and reflects on its past actions, learns from mistakes, and refines future steps, thereby improving the quality of final results

Task Decomposition: Chain of Thought (CoT)

CoT has become a standard prompting technique for enhancing model performance on complex tasks

The model is instructed to “think step by step”, utilizing more computation time to decompose hard tasks into smaller and simpler steps.

CoT transforms a big task into multiple manageable steps and exposes the model’s thinking process.
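
As a small illustration, the difference between a direct prompt and a CoT prompt can be as little as one appended instruction. The helper names `direct_prompt` and `cot_prompt` are hypothetical, and the sketch only builds the prompt strings; sending them to a model is left out.

```python
def direct_prompt(question: str) -> str:
    """Ask for the answer immediately."""
    return f"Q: {question}\nA:"


def cot_prompt(question: str) -> str:
    """Same question, but the answer is primed to start with step-by-step reasoning."""
    return f"Q: {question}\nA: Let's think step by step."


question = "A shop sells pens at 3 for $2. How much do 12 pens cost?"
print(direct_prompt(question))
print(cot_prompt(question))
```

With the CoT version, a model would be expected to write out intermediate steps (12 pens is 4 groups of 3, and 4 groups at $2 each is $8) before stating the final answer.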

Task Decomposition: Tree of Thoughts

Tree of Thoughts extends CoT by exploring multiple reasoning possibilities at each step, creating a tree structure.

Design choices involved:

  • DFS or BFS search
  • how each state is evaluated by a classifier (via prompt or majority vote)
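
The two design choices can be made concrete with a toy breadth-first search. This is a sketch under stated assumptions: `propose` and `evaluate` are simple stand-ins for what would be LLM calls (thought generation and state scoring), and the task (pick three numbers summing to a target) is invented for illustration.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]


def tot_bfs(root: State,
            propose: Callable[[State], List[State]],
            evaluate: Callable[[State], float],
            depth: int,
            beam_width: int) -> State:
    """Breadth-first Tree of Thoughts: expand every state, keep the best few per level."""
    frontier = [root]
    for _ in range(depth):
        candidates = [child for state in frontier for child in propose(state)]
        if not candidates:
            break
        # State evaluation decides which partial solutions survive to the next level.
        candidates.sort(key=evaluate, reverse=True)
        frontier = candidates[:beam_width]
    return max(frontier, key=evaluate)


TARGET = 9
STEPS = (1, 3, 5)


def propose(state: State) -> List[State]:
    """Stand-in for LLM thought generation: try each possible next number."""
    return [state + (step,) for step in STEPS]


def evaluate(state: State) -> float:
    """Stand-in for an LLM-based evaluator: closer to the target is better."""
    return -abs(TARGET - sum(state))


best = tot_bfs((), propose, evaluate, depth=3, beam_width=2)
print(best, sum(best))
```

Swapping the level-by-level loop for a recursive descent with backtracking would give the DFS variant; the evaluator interface stays the same.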

Self-Reflection: ReAct

ReAct integrates reasoning and acting within the LLM by extending the action space to combine task-specific discrete actions with the language space.

Example format of the explicit steps for an LLM to think

Example of ReAct differences
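
A minimal sketch of a ReAct-style loop is shown below, assuming a Thought / Action / Observation text format. The function `scripted_llm` is a hard-coded stand-in for a real model call, the `lookup` tool and its data are invented, and the `name[argument]` action syntax is an assumption chosen for easy parsing.

```python
import re
from typing import Callable, Dict, Optional, Tuple

ACTION_RE = re.compile(r"Action: (\w+)\[(.*)\]")

# Toy knowledge source; the place and figure are fictional.
FACTS = {"population of Freedonia": "1.2 million"}


def scripted_llm(transcript: str) -> str:
    """Stand-in for an LLM: emits the next Thought and Action given the transcript so far."""
    if "Observation:" not in transcript:
        return "Thought: I need the population figure.\nAction: lookup[population of Freedonia]"
    return "Thought: I have the answer.\nAction: finish[1.2 million]"


def react(question: str,
          llm: Callable[[str], str],
          tools: Dict[str, Callable[[str], str]],
          max_steps: int = 5) -> Tuple[Optional[str], str]:
    """Alternate reasoning (Thought) and acting (Action), feeding Observations back in."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += step + "\n"
        match = ACTION_RE.search(step)
        if match is None:
            break  # the model produced no parsable action
        name, arg = match.groups()
        if name == "finish":
            return arg, transcript
        observation = tools[name](arg) if name in tools else f"unknown tool {name}"
        transcript += f"Observation: {observation}\n"
    return None, transcript


tools = {"lookup": lambda query: FACTS.get(query, "not found")}
answer, transcript = react("What is the population of Freedonia?", scripted_llm, tools)
print(transcript)
print("Answer:", answer)
```

The language-space part of the action space is the free-text Thought line; the discrete part is the tool call parsed from the Action line.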

Memory

  • Sensory memory: learned embedding representations of raw inputs, including text, images, or other modalities (e.g. vision encoder, speech encoder)
  • Short-term memory: in-context learning uses the model’s short-term memory. It is short and finite, restricted by the transformer’s context window length (prompt engineering)
  • Long-term memory: lets the agent retain and recall (effectively unlimited) information over extended periods, often by querying an external vector store with fast retrieval (RAG)
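
The contrast between a bounded short-term memory and an unbounded long-term store can be sketched as below. `AgentMemory` is a hypothetical class, and word overlap is used as a deliberately crude substitute for embedding similarity in a vector store.

```python
from collections import deque
from typing import List


class AgentMemory:
    """Toy agent memory: a fixed-size recent window plus an unbounded external store."""

    def __init__(self, window: int) -> None:
        # Oldest entries fall out automatically, like text leaving a context window.
        self.short_term = deque(maxlen=window)
        # Stand-in for an external store; grows without a fixed limit.
        self.long_term: List[str] = []

    def remember(self, text: str) -> None:
        self.short_term.append(text)
        self.long_term.append(text)

    def recall(self, query: str, k: int = 1) -> List[str]:
        """Return the k stored items sharing the most words with the query."""
        query_words = set(query.lower().split())
        ranked = sorted(self.long_term,
                        key=lambda item: len(query_words & set(item.lower().split())),
                        reverse=True)
        return ranked[:k]


memory = AgentMemory(window=2)
memory.remember("the user name is Ada")
memory.remember("the weather is sunny")
memory.remember("the meeting is at noon")

print(list(memory.short_term))                  # only the two most recent items remain
print(memory.recall("what is the user name"))   # the oldest item is still retrievable
```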

Agent Memory Types

  • Semantic: facts and general knowledge
  • Episodic: past requests/actions
  • Procedural: skills defined in programs or fine-tuned
  • Short-term: context within a request execution

RAG

Why RAG?

  • LLM’s knowledge is static at time of training
  • No proprietary info
  • Non-attributable: difficult to trace the source of an LLM generation
  • Hallucination: easier to hallucinate when asked about knowledge outside of training set
  • LLMs are large and expensive to train and run, so proprietary info is kept outside the model

Components:

  • Data source
    • preprocessed by chunking and embedding into vector DB
  • Retrieve
    • convert query into an embedding and look up the most similar chunks in the vector DB
  • Augment and generate
    • combine retrieved chunks with existing context, then generate the answer
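
The three components can be sketched end to end with the standard library only. This is an illustration under simplifying assumptions: a bag-of-words `Counter` stands in for a learned embedding model, a Python list stands in for the vector DB, and the final generation call to an LLM is omitted. All function names and the sample documents are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def chunk(text: str, size: int = 10) -> List[str]:
    """Split a document into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a real system would use a learned embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def build_index(docs: List[str], size: int = 10) -> List[Tuple[str, Counter]]:
    """Data source step: chunk every document and store (chunk, embedding) pairs."""
    return [(piece, embed(piece)) for doc in docs for piece in chunk(doc, size)]


def retrieve(index: List[Tuple[str, Counter]], query: str, k: int = 1) -> List[str]:
    """Retrieve step: embed the query and return the k most similar chunks."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [piece for piece, _ in ranked[:k]]


def augment(query: str, chunks: List[str]) -> str:
    """Augment step: place retrieved chunks in the prompt ahead of the question."""
    context = "\n".join(f"- {piece}" for piece in chunks)
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"


docs = [
    "Refunds are accepted within 30 days of purchase.",
    "Support is available on weekdays from 9 to 5.",
]
index = build_index(docs)
top = retrieve(index, "are refunds accepted")
prompt = augment("are refunds accepted", top)
print(prompt)
```

Because the retrieved chunk appears verbatim in the prompt, the answer can be traced back to its source, which addresses the attribution problem listed above.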

Tool Use

Agent learns to call external APIs for extra information missing from model weights

  • e.g.:
    • web search + crawling
    • browser
    • social media, email hooks
    • code + CLI execution
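
One common way to wire this up is a tool registry plus a dispatcher that executes structured calls emitted by the model. The sketch below assumes the model outputs a JSON object with `name` and `arguments` keys; that format, the `tool` decorator, and the two toy tools are illustrative assumptions, not any particular framework's API.

```python
import json
from typing import Any, Callable, Dict

TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name so the agent can call it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add(a: float, b: float) -> float:
    return a + b


@tool
def word_count(text: str) -> int:
    return len(text.split())


def dispatch(call_json: str) -> Dict[str, Any]:
    """Parse a tool call (assumed JSON format) and run the matching tool."""
    call = json.loads(call_json)
    name = call["name"]
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    return {"result": TOOLS[name](**call.get("arguments", {}))}


# In a real agent these strings would come from the model's output.
print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))
print(dispatch('{"name": "word_count", "arguments": {"text": "tools extend the model"}}'))
print(dispatch('{"name": "send_email", "arguments": {}}'))
```

Returning an error object for unknown tools, rather than raising, lets the result be fed back to the model so it can try a different call.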

What makes LLM agents stand out?

  • Language Mastery: comprehend and produce language
  • Decision-making: LLMs are equipped to reason and decide
  • Flexibility: adaptability ensures they can be molded for diverse applications
  • Collaborative Interactions: collaborate with humans or other agents → multifaceted interactions

Examples

  • HuggingGPT - LLM + APIs
    • LLM uses Hugging Face models as tools
  • TALM (tool augmented language model)
  • Toolformer