A New Prompt Technique From DeepMind Called Optimisation by PROmpting (OPRO)
A number of advancements have been made in the area of prompting techniques, enabling LLMs to achieve significantly improved performance across a variety of domains.
Overview
Lately there has been much focus on methods to optimise LLMs using a gradient-free approach (that is, without fine-tuning). An astute approach to LLM implementation would be to use gradient-free and gradient-based approaches in concert while orchestrating multiple LLMs.
The study clearly states that the purpose of this research is not to replace gradient-based approaches, but rather to explore the extent to which prompt engineering can be leveraged to improve LLM performance. OPRO is not intended to outperform gradient-based optimisation.
The study also highlights that LLMs are sensitive to different prompt formats, and that optimal prompt formats are model- and task-specific.
Another area of focus was stability, which can also be thought of as reliability. To improve stability, OPRO has the LLM generate multiple solutions at each optimisation step, allowing it to explore several possibilities simultaneously and quickly discover promising directions to move forward.
This creates a process of convergence towards an optimal solution, so OPRO can be thought of as a prompt-based optimisation trajectory.
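The following minimal Python sketch shows the idea of producing several solutions in one step. It is not the study's code. `optimisation_step`, the stub proposer and the stub scores are all hypothetical. In OPRO the proposals would come from sampling the optimiser LLM, and each score would come from evaluating the prompt on training data.

```python
import itertools
from typing import Callable, List, Tuple


def optimisation_step(
    propose: Callable[[], str],      # hypothetical: one sample from the optimiser LLM
    score: Callable[[str], float],   # hypothetical: task accuracy of a prompt
    num_candidates: int = 8,
) -> List[Tuple[str, float]]:
    """Sample several candidate prompts in one step and rank them by score."""
    # A set drops exact duplicates, which repeated sampling often produces.
    candidates = {propose() for _ in range(num_candidates)}
    scored = [(c, score(c)) for c in candidates]
    # Highest score first; ties broken alphabetically so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Stub "LLM" that cycles through canned proposals, plus made-up scores.
canned = itertools.cycle([
    "Let's solve the problem.",
    "Let's figure it out!",
    "Break the problem into steps.",
])
stub_scores = {
    "Let's solve the problem.": 63.0,
    "Let's figure it out!": 61.0,
    "Break the problem into steps.": 70.0,  # invented value for illustration
}

results = optimisation_step(lambda: next(canned), stub_scores.__getitem__, num_candidates=6)
best_prompt, best_score = results[0]
print(best_prompt, best_score)
```

Scoring a batch of candidates per step, rather than a single one, is what lets the search compare several directions at once before committing to the next step.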
Initial Considerations
This study again shows the flexibility and utility of LLMs, and the extent to which prompt engineering can be leveraged to achieve tasks not previously envisioned.
One of the challenges with LLMs in production is inference-time latency. OPRO's iterative process nonetheless seems an excellent approach to prompt optimisation, crafting prompts that outperform human-designed prompts, including prompts that are task- and model-specific.
At this stage I cannot see OPRO being used at run-time in a production environment, but it can work well at design-time.
I get the impression that, as prompt engineering techniques develop (especially considering the newly released studies), a more iterative approach is being followed, which brings cost, latency and complexity considerations.
It would be convenient if a UI existed that sits between a playground and autonomous agents, where these prompt engineering techniques can be experimented with. I envisage this as a more configurable playground where routines can be defined and processes with multiple iterative loops can be run.
Prompts optimised by OPRO outperform human-designed prompts by 8% to 50% at certain tasks.
— Source
The study states that OPRO follows an optimisation process with the LLM, which starts from 5 con…
Meta Prompt Example
The meta-prompt OPRO makes use of has four components, configured as shown below:
Meta-Prompt = Meta-Instructions + Solution-Score Pairs + Meta-Instructions + Optimisation Task & Output Format + Meta-Instructions
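The formula above can be sketched as a small string-assembly function. This is an illustration only. The function name `build_meta_prompt`, its parameters and the blank-line separators between sections are my assumptions, not details the study specifies.

```python
from typing import List, Tuple


def build_meta_prompt(
    intro: str,
    pairs: List[Tuple[str, int]],
    exemplar_instructions: str,
    exemplars: List[Tuple[str, str]],
    final_instructions: str,
) -> str:
    """Meta-Instructions + Solution-Score Pairs + Meta-Instructions
    + Optimisation Task & Output Format + Meta-Instructions."""
    # Solution-score pairs in ascending order of score, so the best come last.
    pair_block = "\n".join(
        f"text:\n{text}\nscore:\n{score}"
        for text, score in sorted(pairs, key=lambda p: p[1])
    )
    # Each exemplar keeps an <INS> slot where a candidate instruction is inserted.
    task_block = "\n".join(
        f"input:\nQ: {question}\nA: <INS>\noutput:\n{answer}"
        for question, answer in exemplars
    )
    sections = [intro, pair_block, exemplar_instructions, task_block, final_instructions]
    return "\n\n".join(sections)


meta = build_meta_prompt(
    intro="I have some texts along with their corresponding scores.",
    pairs=[("Let's solve the problem.", 63), ("Let's figure it out!", 61)],
    exemplar_instructions="The following exemplars show how to apply your text:",
    exemplars=[("If Beatrix has 30 books, how many books do the three have together?", "140")],
    final_instructions="Write your new text that is different from the old ones. "
    "Write the text in square brackets.",
)
print(meta)
```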
Meta-Instructions
I have some texts along with their corresponding scores.
The texts are arranged in ascending order based on their scores,
where higher scores indicate better quality.
Solution-Score Pairs
text:
Let’s figure it out!
score:
61
text:
Let’s solve the problem.
score:
63
(... more instructions and scores ...)
Meta-Instructions
The following exemplars show how to apply your text:
you replace <INS> in each input with your text, then read the input
and give an output. We say your output is wrong if your output is
different from the given output, and we say your output
is correct if they are the same.
Optimisation Task & Output Format
input:
Q: Alannah, Beatrix, and Queen are preparing for the new school year and
have been given books by their parents.
Alannah has 20 more books than Beatrix.
Queen has 1/5 times more books than Alannah.
If Beatrix has 30 books, how many books do the three have together?
A: <INS>
output:
140
(... more exemplars ...)
Meta-Instructions
Write your new text that is different from the old ones and has a score
as high as possible. Write the text in square brackets.
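The meta-prompt asks for the new text in square brackets, so the optimiser's reply has to be parsed. Below is a minimal sketch. It assumes the first non-empty bracketed span is the new instruction. The function name `extract_new_instruction` is hypothetical, and the study does not say how multiple or nested brackets are handled.

```python
import re
from typing import Optional

# Matches a [...] span that contains no further brackets.
BRACKETED = re.compile(r"\[([^\[\]]+)\]")


def extract_new_instruction(response: str) -> Optional[str]:
    """Return the first non-empty square-bracketed text, or None if there is none."""
    for match in BRACKETED.finditer(response):
        text = match.group(1).strip()
        if text:  # skip spans such as "[ ]"
            return text
    return None


print(extract_new_instruction("Sure! [Let's work through this step by step.]"))
```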
The Complete Prompt Concatenated:
I have some texts along with their corresponding scores.
The texts are arranged in ascending order based on their scores,
where higher scores indicate better quality.
text:
Let’s figure it out!
score:
61
text:
Let’s solve the problem.
score:
63
(... more instructions and scores ...)
The following exemplars show how to apply your text:
you replace <INS> in each input with your text, then read the input
and give an output. We say your output is wrong if your output is
different from the given output, and we say your output
is correct if they are the same.
input:
Q: Alannah, Beatrix, and Queen are preparing for the new school year and
have been given books by their parents.
Alannah has 20 more books than Beatrix.
Queen has 1/5 times more books than Alannah.
If Beatrix has 30 books, how many books do the three have together?
A: <INS>
output:
140
(... more exemplars ...)
Write your new text that is different from the old ones and has a score
as high as possible. Write the text in square brackets.
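The exemplar instructions describe how a candidate text is scored. The text replaces `<INS>`, the model answers, and the answer is compared with the expected output. Below is a minimal sketch of that accuracy calculation. It assumes exact string matching after trimming whitespace. `score_instruction`, the second exemplar and the stub scorer model are hypothetical. In practice, extracting a final answer from an LLM's free-form reply takes more work than a `strip()`.

```python
from typing import Callable, List, Tuple


def score_instruction(
    instruction: str,
    exemplars: List[Tuple[str, str]],   # (input containing <INS>, expected output)
    scorer_llm: Callable[[str], str],   # hypothetical: returns the model's final answer
) -> int:
    """Return the percentage of exemplars answered correctly with this instruction."""
    if not exemplars:
        raise ValueError("need at least one exemplar")
    correct = 0
    for template, expected in exemplars:
        prompt = template.replace("<INS>", instruction)
        # "Correct" means the output is the same as the given output.
        if scorer_llm(prompt).strip() == expected.strip():
            correct += 1
    return round(100 * correct / len(exemplars))


exemplars = [
    ("Q: Alannah has 20 more books than Beatrix ... If Beatrix has 30 books, "
     "how many books do the three have together?\nA: <INS>", "140"),
    ("Q: What is 2 + 3?\nA: <INS>", "5"),  # invented exemplar
]


def stub_llm(prompt: str) -> str:
    # Stand-in for a real model: right on the books question, wrong on the other.
    return "140" if "Beatrix" in prompt else "4"


print(score_instruction("Let's solve the problem.", exemplars, stub_llm))
```

An accuracy percentage like this is the kind of number that appears as the score next to each text in the meta-prompt.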
The meta-prompt contains two core pieces of information:
The previously generated prompts with their corresponding training accuracies.
The optimisation problem description, which includes several exemplars randomly selected from the training set to exemplify the task of interest.
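The two pieces of information listed above can be selected with a few lines of code. This sketch keeps the highest-scoring prompts, lists them in ascending order and samples exemplars at random. The function name, the defaults of 20 pairs and 3 exemplars, and the `Answer:` entry with its score of 40 are assumptions for illustration, not values taken from this article.

```python
import random
from typing import List, Sequence, Tuple


def select_meta_prompt_content(
    history: Sequence[Tuple[str, float]],
    training_set: Sequence[Tuple[str, str]],
    max_pairs: int = 20,
    num_exemplars: int = 3,
    seed: int = 0,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, str]]]:
    """Pick the prompt/accuracy pairs and the task exemplars for one meta-prompt."""
    # Keep only the highest-scoring prompts, then list them in ascending order of score.
    top = sorted(history, key=lambda pair: pair[1], reverse=True)[:max_pairs]
    pairs = sorted(top, key=lambda pair: pair[1])
    # Randomly sample exemplars from the training set to illustrate the task.
    rng = random.Random(seed)
    exemplars = rng.sample(list(training_set), k=min(num_exemplars, len(training_set)))
    return pairs, exemplars


history = [("Let's figure it out!", 61), ("Let's solve the problem.", 63), ("Answer:", 40)]
training = [(f"Q{i}", str(i)) for i in range(5)]
pairs, exemplars = select_meta_prompt_content(history, training, max_pairs=2)
print(pairs)
print(exemplars)
```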
OPRO Methodology
OPRO leverages an LLM to gradually generate new prompts via an iterative optimisation process.
Recent work has explored LLMs generating prompts and improving on human-level prompt engineering. What makes OPRO different is that it is an iterative process: each optimisation step generates new prompts that aim to increase the test accuracy, based on a trajectory of previously generated prompts.
By following a full optimisation trajectory, OPRO enables the LLM to gradually generate new prompts that improve task accuracy throughout the optimisation process, even when the initial prompts have low task accuracies.
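The optimisation trajectory can be sketched as a loop. Treat this as a toy under stated assumptions. `opro`, `stub_propose` and the word-count scorer are hypothetical stand-ins for the optimiser LLM and the training-set accuracy. They were chosen only so the example runs offline and the score visibly climbs from a low starting point.

```python
import itertools
from typing import Callable, Dict, List, Tuple

Trajectory = List[Tuple[str, float]]


def opro(
    initial_prompts: List[str],
    score: Callable[[str], float],          # hypothetical: training accuracy of a prompt
    propose: Callable[[Trajectory], str],   # hypothetical: optimiser-LLM call on the meta-prompt
    steps: int = 5,
    candidates_per_step: int = 4,
    max_pairs: int = 20,
) -> Tuple[str, float]:
    """Iteratively propose prompts from the trajectory of scored prompts so far."""
    history: Dict[str, float] = {p: score(p) for p in initial_prompts}
    for _ in range(steps):
        # The optimiser sees the best prompts so far, in ascending order of score.
        trajectory = sorted(history.items(), key=lambda kv: kv[1])[-max_pairs:]
        for _ in range(candidates_per_step):
            candidate = propose(trajectory)
            if candidate not in history:  # only score genuinely new prompts
                history[candidate] = score(candidate)
    return max(history.items(), key=lambda kv: kv[1])


# Toy stand-ins so the loop runs offline: longer prompts simply score higher.
suffixes = itertools.count(1)


def stub_propose(trajectory: Trajectory) -> str:
    best_so_far = trajectory[-1][0]
    return f"{best_so_far} step{next(suffixes)}"


def toy_score(prompt: str) -> float:
    return min(100.0, 10.0 * len(prompt.split()))


best_prompt, best_score = opro(["Solve."], toy_score, stub_propose)
print(best_prompt, best_score)
```

With these stubs the score rises from 10.0 for the initial prompt to 60.0 after five steps. A real run would replace both stubs with LLM calls and an evaluation on training exemplars.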
The optimal prompt formats can be model-specific and task-specific.
— Source
With a variety of LLMs, the study demonstrates that the best prompts optimised by OPRO outperform human-designed prompts by up to 8% on GSM8K, and by up to 50% on Big-Bench Hard tasks.
The image below shows where OPRO fits into the data delivery process of Large Language Models: in the gradient-free, in-context sphere.
I’m currently the Chief Evangelist @ Kore AI. I explore & write about all things at the intersection of AI & language, ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.