On 20 August 2024, Meta announced the Self-Taught Evaluator, a new method for assessing large language model (LLM) performance that sharply reduces manual annotation effort.
Training accurate LLM evaluators currently relies on human-annotated preference data. Collecting it costs time and money and requires specialized annotators, often creating a bottleneck in rapid model development.
The Self-Taught Evaluator eliminates the need for human-labeled data by building on the LLM-as-a-judge principle: when comparing candidate responses, the model generates a reasoning chain and then a final judgment.
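The LLM-as-a-judge setup can be sketched as a prompt asking the model to reason step by step and end with a verdict, plus a parser for that verdict. This is a minimal illustration, not Meta's actual prompt; the template wording and function names are assumptions.

```python
import re

# Hypothetical judge prompt: the model must compare two responses,
# write out its reasoning chain, and finish with an explicit verdict.
JUDGE_TEMPLATE = (
    "You are an impartial judge. Compare the two responses to the "
    "instruction below, reason step by step, then end with a line "
    "'Verdict: A' or 'Verdict: B'.\n\n"
    "Instruction:\n{instruction}\n\n"
    "Response A:\n{a}\n\n"
    "Response B:\n{b}\n"
)

def build_judge_prompt(instruction: str, a: str, b: str) -> str:
    """Fill the judge template with an instruction and two candidates."""
    return JUDGE_TEMPLATE.format(instruction=instruction, a=a, b=b)

def parse_verdict(judgment: str):
    """Extract the final 'A'/'B' verdict from the judge's output,
    or None if the reasoning chain never reached a verdict."""
    match = re.search(r"Verdict:\s*([AB])", judgment)
    return match.group(1) if match else None
```

The explicit verdict line is what makes the judge's free-form reasoning machine-checkable, which matters for the self-training loop described next.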
The Self-Taught Evaluator starts from a seed LLM and unlabeled instructions to synthesize its own training data: contrasting response pairs are generated, the current judge evaluates them, and only examples with a correct reasoning chain are added to the training set. Fine-tuning on this set and repeating the loop yields an iteratively stronger evaluator.
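One iteration of that data-curation loop can be sketched as below. The helper names (`make_pair`, `judge`) are hypothetical stand-ins: in the real pipeline both would be LLM calls, with `make_pair` producing a preferred response and a deliberately inferior variant so the preference label is known by construction, and the kept examples used to fine-tune the next judge.

```python
def curate_examples(instructions, make_pair, judge, n_samples=5):
    """Collect training examples where the current judge's reasoning
    chain leads it to prefer the known-better response.

    make_pair(instr) -> (good, bad): synthesizes a response pair whose
        preference label is known by construction.
    judge(instr, a, b) -> (reasoning, verdict): the current evaluator.
    """
    kept = []
    for instr in instructions:
        good, bad = make_pair(instr)
        # Sample several judgments; keep the first one whose verdict
        # agrees with the known label ("A" = the better response).
        for _ in range(n_samples):
            reasoning, verdict = judge(instr, good, bad)
            if verdict == "A":
                kept.append({
                    "instruction": instr,
                    "chosen": good,
                    "rejected": bad,
                    "reasoning": reasoning,
                })
                break
    return kept
```

Because a kept example pairs a verifiably correct verdict with the reasoning chain that produced it, fine-tuning on the curated set teaches the next-iteration judge both what to decide and how to justify it, without any human labels.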
Despite using no human labels, the Self-Taught Evaluator matched or surpassed some evaluators trained on human-annotated data, improving accuracy on benchmarks such as RewardBench and MT-Bench.
The approach depends heavily on the quality of the initial seed model, so it is important to choose seed and base models that fit your data and specific requirements.