Audiobox Released by Meta

www.analyticsdrift.com Image source: Analytics Drift

Meta introduced Audiobox as its latest foundational research model for audio generation. Image source: Meta

Within this family of models are specialized versions such as Audiobox Speech and Audiobox Sound. Image source: Canva

These models create voices and sound effects by combining voice inputs with natural-language prompts, covering a range of audio needs. Image source: Canva

Audiobox lets users specify and manipulate sound effects with text description prompts, expanding the range of controllable features. Image source: Canva

When the two are combined, the voice input establishes the fundamental timbre, while the text prompt alters other attributes. Image source: Canva

Audiobox inherits Voicebox’s guided audio generation training objective and flow-matching method, enabling audio infilling. Image source: Meta

This capability lets users refine sound effects, for example by adding different thunder sounds to a rain soundscape, which makes the model more versatile. Image source: Canva

Produced by: Analytics Drift (@analyticsdrift)
Designed by: Prathamesh
