Large Behaviour Models
Large Language Models (LLMs) primarily focus on processing and generating text, relying on users to articulate their desired outcomes.
Achieving effective results often involves in-context learning (ICL), which supplies the model with contextual examples or references to guide its understanding and improve the relevance of its output. This method leverages the model’s ability to interpret prompts and align responses more closely with user expectations.
The terms Large Behaviour Model and Large Content & Behaviour Model appear to be used interchangeably.
Language Model Progress
Significant advancements have been made in the field of Language Models. A key milestone is the introduction of Small Language Models (SLMs), which are in many cases fine-tuned using custom, meticulously designed data and techniques such as partial masking and synthetic corpora like TinyStories to enhance their performance.
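Partial masking can be illustrated with a minimal sketch (the function name, mask rate, and `[MASK]` placeholder are my own assumptions for illustration, not details from the text): a random subset of tokens is hidden, and the model is trained to recover them.

```python
import random

def partially_mask(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Replace a random subset of tokens with a mask placeholder.

    Returns the masked sequence plus the (index, original_token) pairs
    the model would be trained to predict.
    """
    rng = random.Random(seed)
    masked, targets = [], []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append(mask_token)
            targets.append((i, tok))
        else:
            masked.append(tok)
    return masked, targets

tokens = "the cat sat on the mat".split()
masked, targets = partially_mask(tokens, mask_rate=0.5)
```

The training objective is then to predict each original token in `targets` from the surrounding unmasked context.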
In 2024, models with integrated vision and reasoning capabilities were introduced, expanding the applications of Generative AI.
Additionally, Large Action Models (LAMs) emerged, tailored for AI Agent tasks, tool utilisation, and delivering structured outputs for tool integration.
But Here Is The Problem
A large part of human communication is symbolic, relying on expressions such as words, gestures, speech, pictures, memes, musical sounds and more.
Large Content & Behaviour Models are intended to understand, simulate and optimise based on available content and user behaviour.
So communication is more than mere text…
Human communication should be studied on three levels: technical, semantic and effectiveness.
Technical: How accurately can the symbols of communication be transmitted?
Semantic: How precisely do the transmitted symbols convey the desired meaning?
Effectiveness: How well does the received meaning induce the desired conduct in the receiver?
These three levels build on one another; hence the problem needs to be solved at all three levels.
Consider the diagram below, where you are the information source, there is a message, and a transmitter, a way of transporting the message.
Noise that can be introduced, weakening the signal, includes ambiguous or poorly formed prompts, incomplete or misleading context, training data bias, hallucination, or mismatched model use cases.
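The technical level can be made concrete with a toy noisy-channel simulation (entirely illustrative; the noise model, error rate, and alphabet are my assumptions): each symbol of a transmitted message has some probability of being corrupted before it reaches the receiver.

```python
import random

def noisy_channel(message, error_rate=0.1, seed=42):
    """Simulate a channel that corrupts each character with some probability."""
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz "
    received = []
    for ch in message:
        if rng.random() < error_rate:
            received.append(rng.choice(alphabet))  # corrupted symbol
        else:
            received.append(ch)                    # transmitted intact
    return "".join(received)

sent = "summarise the report in three bullet points"
received = noisy_channel(sent, error_rate=0.2)
```

Even when every corrupted character is repaired (the technical level), the semantic and effectiveness levels remain: the receiver must still infer the intended meaning and act on it.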
Again considering the image below, communication can also be seen via five factors: a communicator, a channel or medium acting as a conduit for the message, the receiver, the interpretation, and the effect or next step induced in the receiver.
AI Agents
Large Behaviour Models (LBMs) are highly suited for AI Agents tasked with navigating complex environments like computer operating systems, where actions are both contextual and influenced by user behaviour.
These models leverage their ability to predict and simulate behaviours based on historical human interactions, enabling agents to anticipate user needs and perform tasks efficiently.
For example, when managing files or configuring settings, an LBM can incorporate behavioural cues to optimise workflows, ensuring more personalised and adaptive interactions. This integration enhances decision-making in dynamic, context-sensitive environments.
Behavioural Tokens
The study introduces what it refers to as behaviour tokens, an aspect I found very interesting. Behaviour tokens are elements like shares, likes, clicks, purchases, retweets and the like.
Modern data is littered with these elements, which act as cues to the behavioural patterns of users.
The LCBM takes concatenated inputs of visual tokens, scene ASR, caption, scene replay behaviour, channel information, video title, and behaviour metrics such as views and the ratio of likes to views.
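A minimal sketch of how such a concatenated input might be assembled into a single model prompt (the field names, template, and separator are my assumptions, not the paper’s actual serialisation format):

```python
def build_lcbm_input(visual_tokens, asr, caption, replay_behaviour,
                     channel, title, views, likes):
    """Concatenate content and behaviour fields into one model input string."""
    like_ratio = likes / views if views else 0.0
    parts = [
        f"Visual: {' '.join(visual_tokens)}",
        f"ASR: {asr}",
        f"Caption: {caption}",
        f"Replays: {replay_behaviour}",
        f"Channel: {channel}",
        f"Title: {title}",
        f"Views: {views}",
        f"Like ratio: {like_ratio:.4f}",
    ]
    return " | ".join(parts)

prompt = build_lcbm_input(
    visual_tokens=["person", "whiteboard"],
    asr="welcome to the tutorial",
    caption="a presenter explains a diagram",
    replay_behaviour="scene 3 replayed most",
    channel="TechTalks",
    title="Intro to LCBMs",
    views=10000,
    likes=450,
)
```

Serialising behaviour metrics alongside content in this way is what lets a language model condition on both in a shared token space.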
Behavioural & Content Understanding
Behavioural Understanding refers to a model’s ability to analyse and explain receiver behaviours, such as likes, comments, or views, in response to specific content.
For example, the model might predict the sentiment of user comments and provide reasoning for these behaviours. Evaluation involves human reviewers assessing how well the model’s reasoning aligns with actual user behaviours.
Content Understanding assesses a model’s ability to interpret and classify the features of content itself, such as the topic, emotions, or actions in videos. This ensures that behaviour focused training does not compromise the model’s original ability to understand content.
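As a toy illustration of evaluating behavioural understanding (the lexicon, comments, and labels below are my own invented example, not data from the study), a model’s predicted comment sentiments can be compared against observed labels:

```python
def predict_sentiment(comment, positive=("love", "great", "amazing"),
                      negative=("boring", "bad", "hate")):
    """Crude lexicon-based stand-in for a model's sentiment prediction."""
    text = comment.lower()
    score = sum(w in text for w in positive) - sum(w in text for w in negative)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

comments = ["Love this video, great pacing", "A bit boring in the middle"]
observed = ["positive", "negative"]  # hypothetical ground-truth labels
predicted = [predict_sentiment(c) for c in comments]
agreement = sum(p == o for p, o in zip(predicted, observed)) / len(observed)
```

In the study itself the reasoning behind these predictions is what human reviewers assess, not just the label agreement computed here.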
Practical Examples
The example below compares the LBM/LCBM to Vicuna and GPT-3.5; it is interesting how the LCBM can convey empathy and emotion based on visual input and human comments.
And again below, a number of examples show the LCBM’s ability to understand and explain the human behaviour of replayed scenes, compared against human-provided explanations.
Shannon and Weaver’s seminal information theory divides communication into three levels: technical, semantic, and effectiveness. While the technical level deals with the accurate reconstruction of transmitted symbols, the semantic and effectiveness levels deal with the inferred meaning and its effect on the receiver.
As seen below, encoding and predicting content (such as images, videos, and text) along with user behaviour in a shared language space involves fine-tuning LLMs for behaviour instruction to create LCBMs.
Visual concepts are captured and world knowledge is integrated; tools are used to convert visual data into language tokens.
The model is fine-tuned end-to-end to predict behaviour based on content or vice versa, with parts of the architecture selectively frozen or unfrozen for training efficiency.
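Selective freezing can be sketched without committing to any particular framework (the component names and which groups are frozen are illustrative assumptions on my part, not the paper’s published scheme): parameter groups are marked trainable or frozen so that only part of the architecture receives gradient updates.

```python
# Illustrative parameter groups for a content-and-behaviour model.
model = {
    "visual_encoder": {"trainable": False},   # frozen: reuse pretrained vision weights
    "projection_layer": {"trainable": True},  # trained: maps visual tokens into LM space
    "language_model": {"trainable": True},    # fine-tuned on behaviour instructions
}

def trainable_groups(model):
    """Return the names of parameter groups that will receive updates."""
    return [name for name, cfg in model.items() if cfg["trainable"]]

updated = trainable_groups(model)
```

Freezing the expensive pretrained components while training only the adapters and the language model is a common way to keep end-to-end fine-tuning affordable.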
Chief Evangelist @ Kore.ai | I’m passionate about exploring the intersection of AI and language. From Language Models, AI Agents to Agentic Applications, Development Frameworks & Data-Centric Productivity Tools, I share insights and ideas on how these technologies are shaping the future.