Researchers at the Stanford Center for Research on Foundation Models (CRFM) have been investigating domain-specific large language models (LLMs). As part of this work, they introduced PubMed GPT, a model focused on biomedicine.
Using the MosaicML Cloud platform, the CRFM researchers trained a GPT-style model on PubMed biomedical papers, and the resulting model achieves strong accuracy on several NLP tasks. PubMed GPT builds on a Hugging Face GPT foundation and uses a custom biomedical tokenizer trained on the PubMed abstracts and PubMed Central sections of the Pile dataset. Training relied on the PyTorch framework and MosaicML's Composer library for training LLMs.
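The article does not include code, but the tokenizer step can be illustrated with a short sketch. The snippet below trains a byte-level BPE tokenizer on biomedical text files with the Hugging Face tokenizers library; the file paths and vocabulary size are placeholders, not values from the CRFM work.

```python
# Sketch: training a domain-specific byte-level BPE tokenizer, similar in
# spirit to the biomedical tokenizer described above. Paths and vocab_size
# are illustrative placeholders, not the settings used by CRFM.
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["pubmed_abstracts.txt", "pmc_fulltext.txt"],  # placeholder corpus files
    vocab_size=32_000,                                   # illustrative size
    min_frequency=2,
    special_tokens=["<|endoftext|>"],
)
tokenizer.save_model("biomedical-tokenizer")  # writes vocab.json and merges.txt
```

Training the tokenizer on in-domain text lets frequent biomedical terms map to single tokens rather than being split into many sub-word pieces, which is the usual motivation for a custom tokenizer in a domain-specific model.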
After training, the researchers evaluated the model on several popular benchmarks, the most important being the MedQA-USMLE question-answering challenge. They also manually assessed its generations on a question-summarization task. For comparison, they included earlier CRFM and biomedical models, including GPT-Neo, Galactica, and PubMedBERT.
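To give a sense of how a decoder-only model can be scored on multiple-choice questions of the MedQA-USMLE kind, the sketch below ranks answer options by the average log-likelihood the model assigns to them. It uses the generic gpt2 checkpoint as a stand-in and is not the evaluation harness the CRFM researchers used, whose setup may differ.

```python
# Sketch: likelihood-based multiple-choice scoring with a causal LM.
# "gpt2" is a stand-in checkpoint, not PubMed GPT itself.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def option_score(question: str, option: str) -> float:
    """Average log-likelihood per token of the question followed by one answer option."""
    text = f"Question: {question}\nAnswer: {option}"
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean negative log-likelihood per token
    return -loss.item()

question = "Which vitamin deficiency causes scurvy?"
options = ["Vitamin A", "Vitamin B12", "Vitamin C", "Vitamin D"]
prediction = max(options, key=lambda o: option_score(question, o))
print(prediction)
```

Likelihood ranking of this sort is one common way to evaluate causal LMs on multiple-choice benchmarks; it is offered here only as a conceptual illustration of the task.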
The researchers concluded that LLMs are versatile and have much to offer when trained on domain-specific datasets. That versatility comes at a cost, however, because of the large number of parameters involved: model complexity, compute cost, specialized architectures, and built-in domain knowledge all trade off against the performance of PubMed GPT.
The researchers plan to focus future work on broadening the model's scope and evaluating it on a wider collection of NLP tasks. PubMed GPT is intended solely for research, as it has not yet been developed for production use.