Scaling up LLMs poses significant challenges because it requires vast computational resources and high-quality datasets. Pre-training typically involves taking a model with billions of parameters and training it on a dataset containing trillions of tokens. This procedure demands considerable compute and access to high-quality data to improve performance on language understanding and generation tasks.
Researchers at UT Austin propose "Inheritune," a simple recipe for deriving a small base LM from a larger one. They inherit a few transformer blocks from a larger LM and then train the small model on a tiny fraction (0.1%) of the original pre-training data. This approach uses a single GPU to efficiently build an LM with 1.5 billion parameters in less than 12 hours from just 1 billion tokens. Despite using significantly less data, the resulting model performs on par with publicly available LMs trained on much larger datasets and remains effective across a variety of settings.
Previous approaches to training small base LMs have included large-scale training from scratch with trillions of tokens or leveraging high-quality synthetic data. For example, TinyLlama-1B was trained from scratch on 3 trillion tokens over roughly 90 days. In contrast, Inheritune trains small base LMs efficiently by inheriting transformer blocks from larger models and training them on a small subset of data, achieving comparable performance with far fewer computational resources. While model compression techniques have been successful for other kinds of neural networks, they are still not very effective for the complex internals of large-scale LMs.
In the Inheritune approach, a small base LM is created by inheriting a small portion of the pre-training data and some layers from an existing larger LM. First, the first n layers of the reference model are inherited to initialize the target model. The target model is then trained on the available subset of training data for a specified number of epochs, as in the sketch below. In one experiment, the researchers used a 1-billion-token subset of the RedPajama v1 dataset to train a 1.5-billion-parameter LM that achieved performance competitive with LMs trained from scratch and LMs derived by other methods. They evaluated the approach against several baseline models, taking care to match the quality of the pre-training data for a fair comparison.
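As a rough illustration of the inheritance step, here is a minimal sketch using Hugging Face transformers with a GPT-2-style reference model. The model choice, the variable n_inherit, and the training note are illustrative assumptions, not the authors' actual code:

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Load the larger reference model (GPT-2 medium here, 24 transformer blocks).
ref_model = GPT2LMHeadModel.from_pretrained("gpt2-medium")

# Build a smaller target config that keeps only the first n blocks.
n_inherit = 12  # illustrative choice, roughly half the reference depth
target_config = GPT2Config.from_pretrained("gpt2-medium")
target_config.n_layer = n_inherit
target_model = GPT2LMHeadModel(target_config)

# Inherit the embeddings and the first n transformer blocks.
target_model.transformer.wte.load_state_dict(ref_model.transformer.wte.state_dict())
target_model.transformer.wpe.load_state_dict(ref_model.transformer.wpe.state_dict())
for i in range(n_inherit):
    target_model.transformer.h[i].load_state_dict(ref_model.transformer.h[i].state_dict())
target_model.transformer.ln_f.load_state_dict(ref_model.transformer.ln_f.state_dict())

# The target model is then trained as usual (standard causal-LM loss)
# on a ~0.1% subset of the pre-training data for a few epochs.
```

Because GPT-2 ties the output head to the token embeddings, copying the embedding weights also initializes the language-model head in this sketch.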
Inheritune allows the extraction of smaller target LMs without sacrificing performance, yielding comparable zero-shot performance on related downstream tasks. These LMs also match or outperform similarly sized models trained from scratch while requiring fewer training steps. Experiments with the GPT-2 medium model demonstrate that initialization with Inheritune, especially inheriting both the attention and MLP weights, provides good convergence speed and final validation loss. Surprisingly, initializing only the attention weights or only the MLP weights improves convergence speed and validation loss to a similar degree.
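This ablation can be sketched by copying only one submodule of each inherited block. The helper below is hypothetical and assumes GPT-2 attribute names in transformers; it is not the authors' code:

```python
# Hypothetical helper for the ablation: inherit only the attention (or only
# the MLP) weights of the first n blocks, leaving everything else randomly
# initialized. Attribute names follow GPT-2 in transformers.
def inherit_submodule(target_model, ref_model, n_inherit, part="attn"):
    for i in range(n_inherit):
        ref_block = ref_model.transformer.h[i]
        tgt_block = target_model.transformer.h[i]
        if part == "attn":
            tgt_block.attn.load_state_dict(ref_block.attn.state_dict())
        elif part == "mlp":
            tgt_block.mlp.load_state_dict(ref_block.mlp.state_dict())

# For example, to initialize only the attention weights of the first 12 blocks:
# inherit_submodule(target_model, ref_model, n_inherit=12, part="attn")
```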
The Inheritune method also has limitations: the architecture cannot be changed other than by adjusting the number of transformer blocks, which limits flexibility in customizing hidden sizes and attention heads. Because the training dataset is small, sensitivity to dataset quality is also a concern, and better strategies for selecting the inherited blocks, curating the dataset, and tuning hyperparameters remain to be explored. Nevertheless, this study shows that Inheritune effectively pre-trains small base language models with minimal data and computational resources, offering a straightforward path to deriving smaller models from a large reference model.
Check out the Paper and GitHub. All credit for this research goes to the researchers of this project. Also, don't forget to follow us on Twitter and join our Telegram Channel, Discord Channel, and LinkedIn Group.
If you like our work, you will love our newsletter.
Don't forget to join our 40k+ ML SubReddit.
To learn more about content partnerships, please fill out the form here.
Sana Hassan, a consulting intern at Marktechpost and a dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.