A research team recently unveiled the CoT Collection, a new dataset for instruction tuning. The collection includes 1.88 million chain-of-thought (CoT) rationales spanning 1,060 tasks. Both the dataset and the trained models are accessible through the team's GitHub repository.
The team carefully evaluated the CoT Collection's trustworthiness, logical coherence, and informativeness against human-authored CoT rationales. They also introduced the C2F2 models, built by continually fine-tuning Flan-T5 LMs with 3B and 11B parameters on the CoT Collection. Fine-tuning on the CoT Collection has been shown to yield better zero-shot CoT performance on unseen tasks.
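The article does not reproduce the collection's actual prompt template, but the general idea of CoT fine-tuning is to pair each instruction with a rationale-then-answer target for a seq2seq model such as Flan-T5. A minimal illustrative sketch (the template and example below are assumptions, not the CoT Collection's real format):

```python
def format_cot_example(instruction: str, rationale: str, answer: str) -> tuple[str, str]:
    """Serialize one CoT instance into a (source, target) training pair
    for a seq2seq LM. This template is illustrative only; the CoT
    Collection's actual serialization may differ."""
    source = f"{instruction}\nLet's think step by step."
    target = f"{rationale} So the answer is {answer}."
    return source, target

# Hypothetical training instance for illustration.
src, tgt = format_cot_example(
    "Is 17 a prime number?",
    "17 has no divisors other than 1 and itself.",
    "yes",
)
```

During fine-tuning, the model learns to emit the rationale before the answer, which is what later enables zero-shot CoT behavior at inference time.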
The paper also examines how well C2F2 performs in few-shot settings, where only a handful of training instances are available. On domain-specific datasets from the legal and medical fields, parameter-efficient fine-tuning (PEFT) on C2F2 outperforms direct fine-tuning of Flan-T5. The authors further highlight the benefits of CoT rationales for improving task generalization and encourage future study in this direction.
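To see why PEFT suits few-shot domain adaptation, consider LoRA, a common PEFT method: instead of updating a full d × k weight matrix, it trains two low-rank factors, so only r·(d + k) parameters are learned. The dimensions below are illustrative, not Flan-T5's actual sizes:

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """LoRA approximates the update to a d x k weight matrix with two
    low-rank factors B (d x r) and A (r x k), so only r * (d + k)
    parameters are trained instead of d * k."""
    return r * (d + k)

# Hypothetical projection size, chosen for illustration only.
full_params = 4096 * 4096                       # full fine-tuning of one matrix
lora_params = lora_trainable_params(4096, 4096, r=8)
reduction = full_params / lora_params           # orders-of-magnitude fewer updates
```

With far fewer trainable parameters, PEFT is less prone to overfitting the small number of in-domain examples available in few-shot legal and medical tasks.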
To quantify the improvement from the CoT Collection, the researchers measured average zero-shot accuracy across 27 datasets of the BIG-Bench-Hard benchmark. The 3B and 11B LMs improved by +4.34% and +2.44%, respectively. CoT instruction tuning also strengthened the models' few-shot learning ability, yielding gains of +2.97% and +2.37% on four domain-specific tasks over the Flan-T5 baselines (3B and 11B, respectively).
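An "average accuracy across 27 datasets" of this kind is typically a macro-average: each dataset contributes equally regardless of its size, and the reported gain is the difference of the two macro-averages. A small sketch with hypothetical numbers (the real per-dataset scores are not given in this summary):

```python
from statistics import mean

def macro_average(per_dataset_acc: list[float]) -> float:
    """Macro-average: each dataset contributes equally, regardless of
    how many examples it contains."""
    return mean(per_dataset_acc)

# Hypothetical per-dataset accuracies for three datasets, illustration only.
baseline = [0.40, 0.55, 0.62]
tuned = [0.45, 0.58, 0.67]
gain = macro_average(tuned) - macro_average(baseline)
```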
Compared with earlier CoT datasets, the CoT Collection contains over 52 times as many CoT rationales and roughly 177 times as many tasks. In conclusion, the CoT Collection demonstrates the effectiveness of CoT rationales for improving task generalization in language models under both zero-shot and few-shot learning conditions, and it addresses the difficulties of applying CoT reasoning in smaller language models.