Microsoft researchers have introduced a new prompt optimization technique called Automatic Prompt Optimization (APO). APO is a general, nonparametric prompt optimization method developed in the spirit of numerical gradient descent, and it aims to automate and improve the prompt development process for LLMs.
The methodology builds on earlier automated approaches, such as training auxiliary models or differentiable representations of the prompt, and applying discrete manipulations via reinforcement learning or LLM-based feedback.
In contrast to these earlier approaches, APO tackles the discrete optimization challenge by mirroring gradient descent within a text-based Socratic dialogue: differentiation is replaced with LLM feedback, and backpropagation with LLM editing. The system first uses mini-batches of training data to obtain natural language "gradients" that describe the flaws of a given prompt.
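The gradient step can be sketched as follows. This is a minimal illustration, not the paper's exact prompt template: the `llm` callable is a hypothetical stand-in for any function that sends text to an LLM and returns its text response.

```python
def get_textual_gradient(llm, prompt, error_examples):
    """Ask an LLM to describe, in natural language, why `prompt`
    failed on a mini-batch of misclassified examples. The returned
    critique plays the role of a "gradient" in APO.

    `llm` is a hypothetical callable: str -> str.
    `error_examples` is a list of (input, expected, predicted) tuples.
    """
    errors = "\n".join(
        f"Input: {x!r}  Expected: {y!r}  Got: {pred!r}"
        for x, y, pred in error_examples
    )
    critique_request = (
        f"I'm using this prompt:\n{prompt}\n\n"
        f"But it got these examples wrong:\n{errors}\n\n"
        "Give a short reason why the prompt could have failed on them."
    )
    return llm(critique_request)
```

The critique returned here would then be fed to a second LLM call that edits the prompt, as described next.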
These gradients then guide an editing step, in which the prompt is modified in the opposite semantic direction of the gradient. A wider beam search over candidate prompts broadens the search space, recasting prompt optimization as a beam-candidate selection problem and improving the algorithm's efficiency.
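The beam-candidate selection loop can be sketched as below. This is a simplified illustration under stated assumptions: `expand` is a hypothetical stand-in for APO's gradient-guided editing (it maps one prompt to several edited variants), and `score` is a stand-in for evaluating a prompt's accuracy on a mini-batch.

```python
def beam_search_prompts(seed_prompt, expand, score, batch,
                        beam_width=4, steps=3):
    """Keep the `beam_width` best-scoring prompts each round,
    expanding every survivor into new candidate prompts.

    `expand(prompt)` -> list of edited prompt variants (hypothetical).
    `score(prompt, batch)` -> numeric quality on a mini-batch (hypothetical).
    """
    beam = [seed_prompt]
    for _ in range(steps):
        # Current prompts stay in the pool alongside their edits.
        candidates = list(beam)
        for p in beam:
            candidates.extend(expand(p))
        # Select the top-scoring candidates for the next round.
        beam = sorted(candidates, key=lambda p: score(p, batch),
                      reverse=True)[:beam_width]
    return beam[0]
```

The key design point is that weak candidates are pruned each round, so the LLM editing budget is spent only on the most promising prompts.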
The Microsoft research team evaluated APO against three state-of-the-art prompt learning baselines on a variety of NLP tasks: jailbreak detection, hate speech detection, fake news detection, and sarcasm detection. APO consistently beat the baselines on all four tasks, outperforming both the Monte Carlo (MC) and reinforcement learning (RL) baselines by a wide margin.