It is no secret that building a large language model (LLM) requires vast amounts of data. In conventional training, an LLM is fed mountains of text,...