
Hexo Raises $270,000 in Pre-Seed Funding Led by Antler India

Generative AI startup Hexo has raised $270,000 in pre-seed funding led by Antler India through the venture capital firm's Antler India Residency initiative.

Hexo is building an open-source image generation API that provides a wide range of controls to make image generation more accurate and predictable. Its fine-tuner product will let businesses quickly build image generation engines based on their own design language, characters, products, and IP, and embed text-to-image generation in their workflows.

Read more: Infinity AI raises $5M in a seed funding round to use synthetic data

Vignesh Baskaran, co-founder of Hexo, said the pre-seed funding will be used to launch Hexo's new product, a fine-tuner that helps businesses build custom image generation engines.

Antler India said it is thrilled to lead Hexo's pre-seed round and to partner with the team on its generative AI journey. Hexo co-founder Vignesh is a deep learning expert who has worked on machine learning research projects globally, while Kunal Bhatia, a three-time founder, previously built SuperLearn (edtech) and Switch (IoT). Antler India believes that Vignesh's engineering strength and Kunal's business skills position the company to build great AI products.


AWS, Meta and Microsoft to develop Google Maps rival Overture Maps


The Linux Foundation has announced Overture Maps, a rival to Google Maps. It is a new collaborative effort to develop interoperable open map data as a shared asset that strengthens mapping services worldwide.

It’s an open-source mapping effort that includes big companies like Amazon Web Services (AWS), Meta, Microsoft, and TomTom. The project is open to all communities with the goal of building open map data. 

The Linux Foundation announced the initiative through a press release about the project and a new website for the Overture Maps Foundation. 

Read More: Biochemists Present AlphaFill, An Upgraded Version Of AlphaFold For Protein Folding

The project will focus on integrating existing open map data from city planning departments and several projects like OpenStreetMap. It will also use new map data contributed by members and built using AI/ML techniques and computer vision to create a living digital record of the physical world.

The Overture Maps Foundation aims to power new map products through openly available datasets that can be used and reused across businesses and applications, with each member contributing its own data and resources.


PubMed GPT: A GPT Model Trained on PubMed Biomedical Papers at Stanford


Researchers at the Stanford Center for Research on Foundation Models (CRFM) have been investigating industry-specific large language models (LLMs). As part of this research, they introduced PubMed GPT, a model focused on biomedicine.

Using the MosaicML cloud platform, CRFM researchers trained a GPT model on PubMed biomedical papers, and the resulting model is highly accurate on several NLP tasks. PubMed GPT builds on a Hugging Face GPT model and uses a custom biomedical tokenizer trained on the PubMed abstracts and PubMed Central sections of the Pile dataset. Training used the PyTorch framework and MosaicML's Composer library for training LLMs.
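
For readers who want to experiment with a model of this kind, the sketch below shows how a GPT-style checkpoint is typically loaded and sampled with the Hugging Face transformers library. The model identifier is a placeholder rather than the official release name, and the generation settings are illustrative only.

```python
# Minimal sketch: load a GPT-style causal language model and sample from it.
# The model identifier below is a placeholder; substitute the checkpoint
# actually published by CRFM.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "stanford-crfm/pubmed-gpt"  # placeholder identifier, not verified

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

prompt = "Metformin is a first-line treatment for"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a short continuation; decoding settings are illustrative only.
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```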

After training the model, the researchers evaluated it on several popular benchmarks, a critical one being the MedQA-USMLE question-answering challenge. In addition, they manually assessed its generations on a question-summarization task. For comparison, they used several earlier general-purpose and biomedical models, including GPT-Neo, Galactica, and PubMedBERT.

Read More: Meta Turned Down the Galactica Demo After Being Criticized as “Dangerous”

The researchers concluded that LLMs are versatile and have much to offer when trained on domain-specific datasets, but that versatility comes at a cost due to the large number of parameters. Model complexity, cost, specialized architectures, and domain knowledge all trade off against the performance of PubMed GPT.

The researchers plan to concentrate future work on enhancing the scope of the model and assessing it against a more extensive collection of NLP tasks. PubMed GPT is intended solely for research, as it has not yet been developed for production use.


Meta takes down 40 phishing accounts run by CyberRoot Risk Advisory


Meta has taken down more than 40 accounts operated by Indian firm CyberRoot Risk Advisory for phishing. The accounts were allegedly involved in hacking-for-hire services. 

The tech giant has also taken down a network of over 900 fake accounts on Facebook and Instagram operated by an unknown entity from China. 

These accounts were designed to collect data from people in the US, China, Myanmar, India, and Taiwan. According to Meta's Threat Report on the Surveillance-for-Hire Industry, released on December 15, the accounts focused on military personnel, pro-democracy activists, government employees, politicians, and journalists.

Read More: Biochemists Present AlphaFill, An Upgraded Version Of AlphaFold For Protein Folding

According to the report, CyberRoot used fake accounts to create fictitious personas tailored to build trust with the people they targeted worldwide. These accounts impersonated journalists, business executives, and media personalities to appear more credible.

In some cases, CyberRoot also created accounts identical to those connected to their targets, like their family members or friends, with only slightly changed usernames, to trick people into engaging, the report said.


Donald Trump NFTs Collection Sells Out Within a Day


Former US President Donald Trump's collection of non-fungible tokens (NFTs) sold out within a day of its launch, following a hyped announcement on Truth Social. The announcement featured an animated image of Trump in a superhero costume shooting beams of laser light from his eyes. While the announcement was not exactly what people expected, the speed of the sell-out surprised almost everyone.

As per OpenSea, the collection's trading volume is roughly 900 ETH (US$1.08 million), while its floor price is US$230, more than double the original price of US$99, and some select NFTs are being sold for even higher prices. The rarest kinds (roughly 1,000 NFTs) are selling for as much as 6 ETH.

Read More: Top Excel Formula Bots of 2022

One of these extremely uncommon trading cards, depicting the 45th president carrying a torch while standing in front of the Statue of Liberty, is listed for 20 ETH, or almost US$24,000. Many one-of-ones are currently held in a Gnosis Safe multi-signature wallet, a secure wallet used to receive royalty payments from secondary NFT sales.


As per Dune Analytics, over 115 people purchased the set of 45 NFTs required for a guaranteed dinner with Trump, and over 17 people bought the maximum quantity permitted by the Trump Card site. Other metrics, however, hint that some wallets hold far more.


Meta’s consulting CTO for virtual reality efforts, John Carmack, resigns


John Carmack, the consulting CTO for Meta’s virtual reality efforts, is leaving, according to two people familiar with the company. His exit came on Friday.

Carmack posted about his decision to leave on the company's internal Workplace forum. He was openly critical of Meta's progress in VR and AR, which are core to its metaverse ambitions. He later shared the full post on his Facebook profile.

While the company is transitioning its focus to the metaverse, Carmack said that Meta is running at half its effectiveness. He added that the company has “a ridiculous amount of people and resources” only to “squander effort and self-sabotage.”

Read More: Biochemists Present AlphaFill, An Upgraded Version Of AlphaFold For Protein Folding

“It has been a struggle for me. I have a voice at the highest levels here, so it feels like I should be able to move things, but I’m evidently not persuasive enough,” Carmack said in his post.

In 2014, Facebook acquired Oculus, then the leading virtual reality company, for about $2 billion. Carmack was one of the driving forces behind the development of Meta's virtual reality headsets.


Top Open-source ETL Tools of 2022


ETL stands for Extract, Transform, and Load, an umbrella term for the process of collecting, transforming, and storing data at a specified location to accomplish a business goal. The process is carried out using specially designed ETL tools. Depending on the volume and complexity of the data and the number of queries required, enterprises can either purchase commercial tools or use open-source ETL tools. But first, it is necessary to know what ETL tools do.

Extract: In the first data processing step, ETL tools “extract” or collect data from the source location. The tools recognize the data storage format and security controls, then issue queries to read the data and detect whether anything has changed since the last extraction.

Transform: ETL tools alter the extracted data to make it appropriate for the target location where it will be loaded. Depending on the queries, the tools may change certain values in table cells, add or delete rows and columns to maintain consistency, and interact with different applications to do so.

Load: After transforming the data, the ETL tool loads it into the target location. Most of the time, this is a data lake or a data warehouse used for analysis. The ETL tool also optimizes the loading process for maximum efficiency, bulk loading, and minimum loading time.
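
As a minimal sketch of the pattern these tools implement, the following Python script runs all three stages using only the standard library; the input file and schema are made up for illustration. Real ETL tools layer scheduling, connectors, change detection, and bulk loading on top of this skeleton.

```python
# A minimal, self-contained extract-transform-load pipeline.
import csv
import sqlite3

def extract(path):
    """Extract: read raw rows from a CSV source."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Transform: normalize fields and drop rows that fail validation."""
    cleaned = []
    for row in rows:
        if not row.get("email"):
            continue  # maintain consistency by skipping incomplete records
        cleaned.append({"name": row["name"].strip().title(),
                        "email": row["email"].strip().lower()})
    return cleaned

def load(rows, db_path="warehouse.db"):
    """Load: bulk-insert the transformed rows into the target store."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS users (name TEXT, email TEXT)")
    con.executemany("INSERT INTO users VALUES (:name, :email)", rows)
    con.commit()
    con.close()

if __name__ == "__main__":
    load(transform(extract("users.csv")))  # "users.csv" is a hypothetical input
```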

This article lists some of the best open-source ETL tools.

Top 10 Open-source ETL tools

Listed below are some of the most useful open-source ETL tools.

  1. Jaspersoft ETL

Jaspersoft ETL is a powerful, versatile open-source tool powered by Talend. The tool sits in umbrella company TIBCO's product portfolio and is specially designed for seamless integration of large volumes of complex data. Developers can graphically plan, schedule, and manage data workflows and transformations to load any target location, such as an Operational Data Store (ODS), data mart, or data warehouse. Once the data is loaded, it can be used for centralized reporting and advanced analytics. Jaspersoft ETL offers a Community Edition with over 500 connectors and components plus version control, and an Enterprise Edition with embeddable web reporting and self-service BI tools.

  2. CloverDX (CloverETL)

CloverETL was one of the first open-source ETL tools, developed when data warehousing started gaining momentum. Since then, it has improved dramatically as data has become progressively more complex. The company currently offers a global service, a flexible data integration platform, and strong support and services teams that actively aid enterprises in their data operations. Over the years, the product has evolved into CloverDX, an entire “Data Experience” with a holistic approach and greater flexibility. With CloverDX, enterprises can combine multiple data management tools while automating the entire ETL process. Nearly every data source or output can be connected using CloverDX. Additionally, it breaks down data silos, prevents vendor lock-in, and supports custom connections specific to your business requirements.

  3. Apache NiFi

Next on our list of open-source ETL tools is Apache NiFi, a robust and powerful ETL tool specially developed to fully leverage the capabilities of the host system. It helps process, distribute, route, transform, and mediate system data. NiFi provides a web-based user interface that allows users to switch between design, control, feedback, and monitoring. NiFi can build dataflows visually and in real time: any changes made to a data flow take effect right away.

Additionally, it is extensively configurable, offering low latency, runtime flow modification, dynamic prioritization, and back-pressure control for enhanced efficiency. These configurations can be extended with multi-tenant authorization and support for standard protocols and strategies.

  4. Scriptella ETL

Scriptella is another open-source ETL and script execution tool. Released under the Apache License, the tool is written in Java and can execute scripts written in SQL, JavaScript, JEXL, Velocity, and more. Scriptella handles cross-database ETL operations and provides a developer-friendly experience, interoperating with LDAP, JDBC, XML, and other technologies. Unlike many other ETL tools, it requires no prior knowledge of SQL (or any other extensive programming language) for basic ETL operations, making it very convenient for beginner and intermediate-level developers.

  5. Jedox ETL

While most other open-source ETL tools focus on simply completing the process, Jedox ETL focuses on strategizing, investigating, and monitoring performance during extraction, transformation, and loading. With its powerful data integration and preparation tool, developers can import and extract vast amounts of data from any source. Jedox also provides a user-friendly, web-based interface for visual data modeling, enabling non-technical users to undertake more complex projects.

Jedox Integrator offers preconfigured interfaces to all well-established relational databases and ERP/finance, CRM, HCM, and SCM applications. Any additional cloud or on-premises data source can be integrated using flexible connections, providing seamless authentication using a standard interface.

Read More: Solving the scaling errors in Optical Neural Networks

  6. KETL

KETL is a production-ready, open, multi-threaded ETL platform built on an XML-based architecture. It allows the management of complex data, scheduling, and ETL activities through an advanced data integration platform. The multi-threaded engine comprises several job executors, each of which performs a specific function. These executors mainly perform actions falling under three categories: SQL, XML, and OS. KETL also supports other job types via the KETL API. All kinds of data, including relational data, flat files, XML data sources, and proprietary database APIs, are supported. Data integration and time- or event-based scheduling require no third-party dependencies.

  7. GeoKettle

GeoKettle is a potent, metadata-driven ETL tool that integrates data from several sources to build and maintain geospatial databases. It is a “spatially enabled” version of the Pentaho Data Integration software, formerly Kettle. With GeoKettle, users can extract data, transform it to fix errors, clean it, change its structure, make it consistent with standards, and then load the modified data into a GIS file, a target DBMS, or a geographic web service. The tool is mainly used for automating repetitive jobs without code. Thanks to its functionality and read/write support for numerous file formats, services, and DBMSs, GeoKettle is fast, standards-compliant, and reliable, making it one of the best open-source ETL tools.

  8. Apache Camel

Another open-source tool from Apache, Camel is an integration framework that enables users to integrate multiple systems that consume and produce data. It is a standalone tool that can also be embedded as a Spring Boot or Quarkus library. Camel implements most of the standard enterprise integration patterns (EIPs) and keeps evolving to cover newer ones, using these patterns for data transformation and routing. Additionally, with support for several industry-standard formats from the financial, telco, healthcare, and other sectors, Camel handles around 50 data formats. Recently, Apache Camel 3.19 was released with several new features and significant improvements.

  9. Singer

Singer is one of the most capable open-source ETL tools for seamless data extraction and loading. Stitch, a fully managed data pipeline, sponsors Singer: with Stitch, you can automate monitoring and alerting while running Singer taps on a schedule and streaming the data to any target location. Singer handles data extraction via scripts called “taps” and data loading via scripts called “targets,” which together move data from any desired source to its destination. Taps extract data and output it in a JSON-based format, while targets consume the data extracted by taps and load it into a file, API, or database, as sketched below. Singer is available on GitHub for everyone to access for free.
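
The tap/target contract is easiest to see in code. The toy tap below emits the three Singer message types (SCHEMA, RECORD, STATE) as JSON lines on stdout, which is how taps communicate with targets per the Singer specification; the stream name and fields are invented for illustration.

```python
# A toy Singer-style tap: SCHEMA and RECORD messages go to stdout as JSON
# lines; a target reads them from stdin and loads the records.
import json
import sys

def emit(message):
    sys.stdout.write(json.dumps(message) + "\n")

emit({
    "type": "SCHEMA",
    "stream": "orders",  # made-up stream for illustration
    "schema": {"properties": {"id": {"type": "integer"},
                              "amount": {"type": "number"}}},
    "key_properties": ["id"],
})
for record in [{"id": 1, "amount": 9.99}, {"id": 2, "amount": 24.50}]:
    emit({"type": "RECORD", "stream": "orders", "record": record})
# STATE lets a scheduled run resume where the previous one stopped.
emit({"type": "STATE", "value": {"orders_last_id": 2}})
```

In practice, a tap like this is piped into a target process (for example, a community target that writes CSV files), which reads the messages from stdin and loads the records.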

  10. Matillion

The last entry on our list is Matillion, an advanced ETL service and part of a modern data stack, designed for cloud-agnostic enterprises to help them manage day-to-day business data operations. Users can collect data from any source using its connectors and pipelines, and Matillion simplifies pipeline management with batch loading from a single control panel. With Matillion's free basic plan, enterprises can integrate with Facebook, Gmail, Google BigQuery, Intercom, Azure SQL, LDAP, and many other sources to gather and analyze data. For more advanced features, Matillion offers paid plans depending on business needs.


Top Excel Formula Bots of 2022


Many people use Microsoft Excel or Google Sheets daily to identify trends, organize data, and sort it into meaningful categories. But when it comes to curating data presentations, these tools are less intuitive for the general public. Compared with more focused products like Microsoft Word, Excel and spreadsheets demand prior knowledge before users can get started with their advanced features: there are so many options, codes, and formulas to choose from that mastering the tools and easing your workflows is challenging. However, specific artificial intelligence tools and bots can help. These bots aim to alleviate the challenges of creating an Excel sheet, especially one with numerous formulas.

This article lists some potent Excel formula bots and alternatives. Have a look.

List of useful Excel Formula Bots

Here is a list of some useful Excel formula bots.

  1. Excel Formula Bot

Excel Formula Bot is probably the best-known AI bot that generates formulas from plain-text input.

The Excel Formula Bot site comprises an input field where the user describes what is needed. The website generates results based on the prompt and showcases an example so users understand how to phrase their inputs.

Developing formulas with an Excel formula bot is as easy as describing the desired result in plain English.

  2. QRS Toolbox for Excel

To save time while working with Excel spreadsheets, QRS Toolbox is an excellent Excel formula creator that provides custom functions for writing short, standard formulas. It requires no complex computation or VBA code, is available as an add-in, and processes data directly within Excel, reducing dependence on external software. As an alternative to Excel Formula Bot, the QRS Toolbox is the only publicly available add-in that fits all Pearson and Johnson distributions, along with other custom techniques not commonly found in other software.

  3. Excel CoPilot

Excel CoPilot is an excellent Excel formula bot alternative that can save numerous hours every week. The bot uses artificial intelligence to generate complex spreadsheet formulas precisely. It has been trained on millions of lines of text and code, eliminating the need for users to work through code to generate formulas. The AI-powered formula bot works on text-to-formula principles using natural language and is available as a free Chrome extension.

  4. Publisheet

Another Excel formula bot alternative, Publisheet, is an Excel add-in that gives users a dynamic, cloud-hosted worksheet experience. The add-in comes with formula support and requires no coding. It is an efficient tool for converting spreadsheets into web pages from within Excel, and it is compatible with Excel 2016 or later and the online version of Excel. Publisheet also allows users to create custom reports accessible via a public URL.

  5. Lumelixir (OneTap.ai)

Lumelixir, also called OneTap.ai, is one of the best AI-based Excel formula bots for avoiding the need to google formulas while working with spreadsheets. The people behind Lumelixir believe that time is the most essential resource, and the tool helps by generating complex formulas within seconds. Given the input “Find and replace ‘specific’ with ‘set’ from Column AQ,” Lumelixir will output a formula such as “=find(“specific”,AQ)&replace(“specific”,”set”,AQ).”

The bot is available as a Chrome extension.

  6. Formula Builder – Daniel’s XL Toolbox

Instead of using an Excel formula bot online, the Formula Builder from Daniel’s XL Toolbox can generate formulas by automatically collecting the relevant cell references. The Formula Builder only requires four cell ranges: input groups (group names), input data (cells that contain data), output groups (groups for which a formula is needed), and output formulas (cells where the Formula Builder writes the formula). Once these inputs are provided, the Formula Builder generates the desired output.

The tool works out the input groups and input data to generate the sorted output groups and the corresponding formulas.


Top Web Scraping Tools in 2022


Web scraping is an automated technique for extracting massive amounts of data from websites. The bulk of this data is semi-structured HTML that is later transformed into structured data and stored in a database or spreadsheet so that it can be used in multiple applications. Although web scraping can be done manually, automated methods using dedicated web scraping tools are typically preferred since they are less expensive and work more quickly.

Web scraping, however, is typically not a simple operation. First, the scraping tool is fed one or more URLs. The scraper then loads the entire HTML code for the requested page; some advanced web scraping tools can render the whole webpage, including CSS and JavaScript elements. Finally, the scraper extracts the desired data and exports everything it acquired in a more user-friendly format. Most online scrapers output data to Excel spreadsheets or CSV, but more sophisticated ones support other formats like JSON, as in the sketch below.
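
The export stage is the simplest to illustrate. The short Python sketch below serializes a handful of already-extracted records (made up here for illustration) to both CSV and JSON, the two output formats mentioned above.

```python
# Sketch of a scraper's final export stage: once records have been pulled
# out of the HTML, they are serialized to CSV and JSON.
import csv
import json

records = [  # placeholder records standing in for scraped data
    {"title": "Example product", "price": "19.99"},
    {"title": "Another product", "price": "4.50"},
]

with open("scraped.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "price"])
    writer.writeheader()
    writer.writerows(records)

with open("scraped.json", "w") as f:
    json.dump(records, f, indent=2)
```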

Because of digitization, websites have accumulated massive amounts of data, and web scraping techniques have gained popularity. Web scraping tools differ in functionality and features since website data comes in various kinds and sizes. Some of the standard data scraping techniques that the best web scraping tools use are:

  • HTTP programming
  • HTML parsing
  • DOM parsing
  • Semantic annotation
  • Computer-vision web-page analysis

Best Web Scraping Tools You Should Try

Since every web crawler is different, choosing the right one can be challenging. This article explains what a website scraper is and compiles a list of some of the best web scraping tools.

  1. Scraper API

Web scraping is a complex procedure, but Scraper API, one of the best web scrapers, simplifies it by handling proxies, browsers, and CAPTCHAs. The team behind Scraper API has built multiple web scrapers and repeatedly worked through setup procedures to create use-specific scrapers, and there are many scrapers to choose from depending on the data one needs to extract. Nevertheless, all such APIs work similarly, and so does Scraper API: the user requests a particular source, the API receives the request, connects to the target system, extracts data from it, and returns the data for processing and storage or immediate use.

This webpage scraper offers a new Async Scraper endpoint that enables web scraping jobs at scale without timeouts, making data scraping more resilient. The API offers seamless integration with NodeJS, NodeJS Puppeteer, and Cheerio.

You can avail of 5,000 free API credits with the 7-day trial; after that, plans start at US$49/month. For more pricing details, you can visit the website.
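
The request/response flow described above maps onto a few lines of Python. The sketch below shows the general call pattern for a proxy-style scraping API; the endpoint URL and parameter names are assumptions for illustration, so check the vendor's documentation before relying on them.

```python
# Illustrative call pattern for a proxy-style scraping API: pass your API
# key and the target URL, and the service handles proxies, browsers, and
# CAPTCHAs before returning the page HTML. Endpoint and parameter names
# here are assumptions, not verified documentation.
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
target = "https://example.com/products"

response = requests.get(
    "http://api.scraperapi.com/",  # assumed endpoint for illustration
    params={"api_key": API_KEY, "url": target},
    timeout=60,
)
response.raise_for_status()
print(response.text[:500])  # first 500 characters of the returned HTML
```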

  2. Smartproxy SERP Scraping API

Scraping Google Search results pages can be tedious, as Google does not allow it. Moreover, scraping at a rate higher than eight keyword requests per hour risks detection, and more than ten keyword requests per hour can result in blocking. An excellent solution is the SERP Scraping API, the best web scraping tool from Smartproxy. The API combines a web scraper, a data parser, and a sizable proxy network.

This full-stack web scraping tool allows users to send a single, successful API request to retrieve structured data from the major search engines. Smartproxy's search engine proxies can be used for everything from monitoring prices to retrieving paid and organic data to examining keyword ranks and other SEO metrics in real time.

Smartproxy's scraper service offers multiple plans based on your proxy-request requirements. There are four plans: Lite, Basic, Standard, and Solid. Additionally, Smartproxy offers enterprise-level plans for more complex needs. For more details, you can check their pricing page.

  3. ParseHub

The previous generation of scraping tools was built on code and hours of coding. To make web scraping tools more accessible and save coding time, no-code development platforms like ParseHub have come into the picture.

With this web scraping tool, users can create their own data extraction workflows without programming knowledge. ParseHub can manage all source-code element selection and neighbor-element prediction independently.

Data scrapers like ParseHub offer a free version, with no credit card required, that lets users extract 200 pages per run within 40 minutes. ParseHub also offers Standard, Professional, and Enterprise-level plans with better service and more pages. Check their pricing page for more details.

  4. Web Scraping using Beautiful Soup

The scraping tools mentioned above are third-party offerings. You can also scrape data manually using open-source libraries and code. Beautiful Soup is a Python library that extracts data from HTML, XML, and other similar formats. Simply put, it helps users pull specific content from a webpage by stripping away the HTML markup and saving the information. The library can be used to isolate titles, links, and text from HTML tags and to alter HTML within the document.

To scrape data, the user sends an HTTP request to the target URL. Once access is granted, the response has to be parsed using an HTML parser, like html5lib, which creates a nested data structure. The final step is to navigate and search the parse tree using Beautiful Soup.

It is a no-cost way to extract data from web pages. Install the third-party libraries requests, html5lib, and bs4 using the pip command and follow the steps above to scrape data, as in the example below.
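
Here are those three steps in a minimal, runnable form; the target URL is a placeholder.

```python
# Request the page, parse it with html5lib into a tree, then navigate it
# with Beautiful Soup. Requires: pip install requests html5lib bs4
import requests
from bs4 import BeautifulSoup

url = "https://example.com"  # placeholder target URL
response = requests.get(url, timeout=10)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html5lib")

# Isolate the title, link targets, and visible text from the markup.
print(soup.title.get_text())
for anchor in soup.find_all("a"):
    print(anchor.get("href"))
```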

  5. Octoparse

Octoparse is a cloud-based web data extraction tool that helps scrape data from various websites. Users can scrape product comments, reviews, social media channels, and other unstructured data and save it in different formats, including HTML, Excel, and plain text. Octoparse is capable of running multiple extraction tasks simultaneously, and these tasks can be scheduled in real time or at regular intervals.

Octoparse offers two customized modes: Wizard Mode and Advanced Mode. Wizard Mode provides step-by-step instructions for scraping data, while Advanced Mode offers features for more complex web pages. Additionally, an IP rotation feature helps prevent target sites from blocking the scraper.

Octoparse provides services, including email and online knowledge-base help, on a monthly subscription basis. The free plan has a cap of 10,000 records per export and a low number of concurrent crawlers and runs. For more information, refer to the pricing page.

Read More: Donald Trump Launches $99 Digital Trading Card NFTs Minted On Polygon.

  6. Helium Scraper

Most websites that display lists of information do so by querying a database and presenting the data intuitively. A web scraper reverses this procedure, taking unstructured websites and converting them back into a database. Helium Scraper is a web scraper that focuses on the kind of data to be extracted, not on how to extract it.

It performs web scraping using multiple off-screen Chromium browsers, presents a simple interface, and integrates web scraping and API calling into a single project. The tool also supports JavaScript code and function generation to match, split, or replace extracted text.

New users can get started with a 10-day trial; after that, a one-time purchase buys the software for a lifetime. For more information, refer to the pricing page.

  7. Apify

Apify is an automation, data extraction, and web scraping platform. With Apify, users can create an API backed by integrated datacenter and residential proxies for extraction. For websites like Instagram, Facebook, Twitter, and Google Maps, the Apify Store offers ready-made scraping solutions, and developers can build customized scraping tools for other websites while Apify handles infrastructure and payments.

Apify offers shared IPs and seamless integration with Keboola, Transposit, Airbyte, Zapier, and other similar platforms. It supports tools and languages such as Selenium, Python, and PHP.

Apify provides 1,000 free API requests. Beyond that, plans start at US$49/month and come at a 20% discount with yearly payment. For more information, refer to the pricing page.

  8. Zenscrape API

Despite the number of online scraping solutions available, Zenscrape is one of the most reliable data scrapers. It meets your requirements and performs web scraping at scale while resolving common problems. It is another online scraping tool with no coding requirements. With Zenscrape, users can extract data even from websites with anti-scraping measures, thanks to its IP rotation, CAPTCHA solving, and other features.

Zenscrape provides a user-friendly interface and JavaScript rendering, and supports many front-end frameworks, such as jQuery, Vue, and React. Additionally, Zenscrape does not limit the number of queries per second, and every request is allotted a unique IP address.

Zenscrape offers a free lifetime plan, a Small plan for US$24.99, a Medium plan for US$79.99, and a Large plan for US$199.99. For more information, refer to the pricing page.

  9. Import.io

There are plenty of ways to scrape data and mine information from a website. One of the numerous services that aims to streamline the scraping process is Import.io, a platform that helps enterprises build smarter analytics, particularly for e-commerce, and offers web scraping assistance. It provides a convenient, no-cost data scraping service, even for websites that employ JavaScript and spread results over numerous pages.

Users can download, install, and launch Import.io on Windows, OS X, and Linux from the website, then create an Import.io account, which is free for up to 250,000 page calls each day, or sign in with a GitHub, Google, or LinkedIn account. See the website to learn more about prices.

  10. Sequentum Content Grabber

Sequentum Content Grabber is yet another low-code web data extraction tool, one that automates the extraction process by adapting to recurrent data, code, and environment changes. The tool is aimed at enterprises that want to reduce coding labor and time by creating stand-alone web crawling agents.

The end-to-end data extraction platform can be used in-house or outsourced for web data. It offers total control over web data extraction, document management, and intelligent process automation (IPA). Users can write scripts for, or debug, the crawling process using C# or VB.NET. Almost any website's content can be extracted and saved as structured data in the desired format.

The annual enterprise license starts at US$15,000. Enterprises that need to scale their operations can add more licenses at additional cost. Refer to the main website for prices.


Infinity AI raises $5M in a seed funding round to use synthetic data

Infinity AI, a synthetic data generation startup, has raised $5M in a seed funding round led by Diana Kimball Berlin at Matrix to help teams build AI models faster using synthetic data. Founders and operators from companies like Tesla, Snorkel AI, and Google also participated in the round.

The company observed that AI models are only as good as the data they are trained on, making data collection one of the main challenges in building better AI models. According to Infinity AI's research, many data scientists spend 80% of their time gathering, organizing, and labeling AI training data. As a result, many AI projects never reach production.

According to Infinity, the training data collection problem can be solved with synthetic data. Its platform lets users upload a single real video and transform it into hundreds of perfectly labeled synthetic videos.

Read more: retrain.ai, one of the most promising startups of 2022: Globes

Over the past two years, several companies have relied on synthetic data to solve the training data collection problem and enhance their AI and machine learning models, including Tesla, Amazon, and Microsoft.

Infinity AI stated that an AI model's accuracy is directly correlated with its training data. Collecting real-world data is time-consuming and expensive, and after the data is collected, it must be correctly classified and annotated before it can be used for training. Therefore, many organizations are moving toward synthetic data, especially when budgets for data acquisition and training are limited.
