Activeloop.ai, a company building data infrastructure for deep learning, has launched Deep Lake, a data lake designed for deep learning workloads. Deep Lake stores complex data, such as photos, videos, annotations, embeddings, and tabular data, in the form of tensors. It streams that data rapidly over the network to its Tensor Query Language, an in-browser visualization engine, and deep learning frameworks, without compromising GPU utilization.
A data lake is a centralized repository where companies store data for governance, analysis, and management. First-generation data lakes collect data into distributed storage platforms like HDFS or AWS S3.
The second generation of data lakes, led by Delta, Iceberg, and Hudi, emerged after unorganized data collections turned many first-generation data lakes into "data swamps." These data lakes readily connect to query engines to run analytical queries.
Over the past ten years, deep learning algorithms have effectively handled complex and unstructured data, including text, images, videos, and audio.
Deep Lake keeps the advantages of a typical data lake, with one notable difference: it stores complex data as tensors. It then streams those tensors quickly over the network to deep learning frameworks without reducing GPU utilization.
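To make the idea of "storing complex data as tensors" more concrete, here is a minimal sketch in plain Python. It is not Deep Lake's API or its storage format; the `TensorColumn` class and the `iter_batches` function are hypothetical names invented for illustration. The sketch assumes each column keeps one flat numeric buffer plus a per-sample shape, so samples of different sizes can live side by side and be handed to a training loop in batches.

```python
from array import array


class TensorColumn:
    """Hypothetical column: a flat numeric buffer plus one shape per sample."""

    def __init__(self, typecode):
        self.data = array(typecode)   # e.g. "B" for unsigned bytes, "i" for signed ints
        self.shapes = []              # shape of each stored sample
        self.offsets = [0]            # start/end positions of samples in the buffer

    def append(self, flat_values, shape):
        values = list(flat_values)
        expected = 1
        for dim in shape:
            expected *= dim
        if len(values) != expected:
            raise ValueError("number of values does not match shape")
        self.data.extend(values)
        self.shapes.append(tuple(shape))
        self.offsets.append(len(self.data))

    def __len__(self):
        return len(self.shapes)

    def __getitem__(self, index):
        start, end = self.offsets[index], self.offsets[index + 1]
        return self.data[start:end], self.shapes[index]


def iter_batches(columns, batch_size):
    """Yield dicts mapping column name to a list of (flat values, shape) samples."""
    total = len(next(iter(columns.values())))
    for start in range(0, total, batch_size):
        stop = min(start + batch_size, total)
        yield {
            name: [column[i] for i in range(start, stop)]
            for name, column in columns.items()
        }


# Three tiny "images" of different shapes, each with an integer label.
images = TensorColumn("B")
labels = TensorColumn("i")
images.append([0, 255, 128, 64], (2, 2))
images.append([1, 2, 3, 4, 5, 6], (2, 3))
images.append([9], (1, 1))
for label in (0, 1, 0):
    labels.append([label], (1,))

batches = list(iter_batches({"images": images, "labels": labels}, batch_size=2))
```

In a real pipeline the batches would be converted to framework tensors and fetched from remote storage ahead of time so the GPU is not left waiting; this sketch only shows the column-plus-shape layout.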
Deep Lake also integrates with deep learning frameworks such as PyTorch, TensorFlow, and JAX. Its resources are available on the project's GitHub page.