We will present BloombergGPT, a 50-billion-parameter language model purpose-built for finance and trained on a uniquely balanced mix of standard general-purpose datasets and a diverse array of financial documents from the Bloomberg archives. Building a large language model (LLM) is a costly and time-intensive endeavor. To reduce risk, we closely followed the model designs and training strategies of recent successful models such as OPT and BLOOM. Nevertheless, we faced numerous challenges during training, including loss spikes, unexpected parameter drifts, and performance plateaus.
In this talk, we will discuss these hurdles and our responses, which included a complete training restart after weeks of effort. Our persistence paid off: BloombergGPT ultimately outperformed existing models on financial tasks by significant margins, while maintaining competitive performance on general LLM benchmarks. We will also provide several examples illustrating how BloombergGPT stands apart from general-purpose models.
Our goal is to share insights into the specific challenges of building LLMs and to offer guidance both to those still deciding whether to embark on their own LLM journey and to those already committed to doing so.
David Rosenberg, Head of ML Strategy, Office of the CTO, Bloomberg
6 Sep 2024