Training of Large Language Models (LLMs) by Indian Firms

Syllabus: GS3/ Science and Technology

Context

  • Bengaluru-based startup Sarvam AI unveiled two indigenous Large Language Models (LLMs), underscoring India’s push for sovereign, multilingual, and compute-efficient AI amid global competition.

Large Language Models (LLMs)

  • A Large Language Model (LLM) is a type of artificial intelligence (AI) algorithm that uses deep learning techniques and massive data sets to understand, summarize, generate, and predict new content.
  • Deep learning involves the probabilistic analysis of unstructured data, which enables the model to recognize distinctions between pieces of content without human intervention.
  • Through this analysis, the model learns how characters, words, and sentences function together, as the sketch below illustrates.
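
To make “probabilistic analysis” concrete, here is a toy, pure-Python sketch of the idea: a bigram model that counts which word follows which in raw, unlabelled text and converts those counts into next-word probabilities. Real LLMs replace the counting with deep neural networks trained on billions of documents, but the underlying principle is the same.

```python
from collections import Counter, defaultdict

# Toy bigram model: learn next-word probabilities purely from raw,
# unlabelled text -- no human annotation is involved.
corpus = "the model reads text . the model predicts the next word .".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1  # count each observed word-to-word transition

def next_word_distribution(word):
    """Convert raw co-occurrence counts into probabilities."""
    total = sum(counts[word].values())
    return {w: c / total for w, c in counts[word].items()}

print(next_word_distribution("the"))
# -> {'model': 0.67, 'next': 0.33} (approx.) -- learned from statistics alone
```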

Indigenous LLM Ecosystem in India

  • Sarvam AI Models: Focus on efficiency, accuracy, and Indian-language capabilities. The models are intended to be open source, though broader public scrutiny of them is still ongoing.
  • BharatGen, incubated at IIT Bombay, trained a multilingual 17-billion-parameter model for sectors like education and healthcare.
  • Gnani.ai launched compact speech and text-to-speech models.

How Are LLMs Trained?

  • GPU Clusters: LLM training requires massive computational power using clusters of Graphics Processing Units (GPUs). Thousands of GPUs operate simultaneously for weeks or months.
  • Data as the Core Input: Training relies on enormous datasets, often scraped from the Internet.
  • Model Parameters: Parameters represent the internal weights through which models learn patterns. Sarvam AI trained models with 35 billion and 105 billion parameters.
    • Larger parameter counts generally improve capability but require far more computation, as the back-of-envelope sketch below shows.
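
To see why parameter counts drive compute requirements, here is a back-of-envelope estimate in Python. The training-FLOPs rule of thumb (~6 × parameters × tokens), the corpus size, and the per-GPU throughput are all illustrative assumptions, not Sarvam AI's actual figures.

```python
# Back-of-envelope cost of a 105-billion-parameter model. All figures are
# illustrative rules of thumb, NOT Sarvam AI's actual numbers.
n_params = 105e9              # model parameters (internal weights)
bytes_per_param = 2           # fp16/bf16 storage per weight
tokens = 1e12                 # assumed size of the training corpus

weight_memory_gb = n_params * bytes_per_param / 1e9
train_flops = 6 * n_params * tokens      # widely used ~6*N*D approximation

gpu_flops = 300e12            # rough sustained throughput of one modern GPU
n_gpus = 1000                 # size of the assumed training cluster
days = train_flops / (gpu_flops * n_gpus) / 86400

print(f"weights alone: {weight_memory_gb:.0f} GB of memory")   # ~210 GB
print(f"time on {n_gpus} GPUs: ~{days:.0f} days")              # ~24 days
```

Even under these generous assumptions, the weights alone occupy hundreds of gigabytes before activations and optimizer states are counted, which is why training must be spread across large GPU clusters for weeks.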

Key Training Methodologies Used

  • Data Curation: Collecting high-quality datasets in Indian languages.
    • Sources include government documents, literature, media, and synthetic data generation.
    • Curation is critical for improving performance beyond English-centric AI systems.
  • Pre-Training: The models learn general language patterns by predicting the next token in large unlabelled datasets (illustrated in the first sketch after this list).
    • This stage builds foundational reasoning and grammar capabilities.
  • Fine-Tuning: Models are adapted for specific tasks using curated datasets.
    • Tools such as Hugging Face and LangChain support instruction tuning, classification, and domain adaptation.
  • Alignment/RLHF (Reinforcement Learning from Human Feedback): Human raters rank model outputs, teaching the model to be safer, more accurate, and better aligned with human intent while discouraging harmful or biased responses (see the reward-model sketch after this list).
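
The two most technical steps above can be made concrete with small PyTorch sketches (assuming PyTorch is installed; the tiny models here are illustrative stand-ins, not real LLM architectures). The first shows the pre-training objective: shift the token sequence by one position and penalize the model, via cross-entropy, whenever it mispredicts the next token. Fine-tuning reuses exactly the same loss, only on smaller curated datasets.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, dim = 100, 32

# A deliberately tiny stand-in for an LLM: embedding -> output head.
# Real models place dozens of transformer layers in between.
embed = nn.Embedding(vocab_size, dim)
head = nn.Linear(dim, vocab_size)

tokens = torch.randint(0, vocab_size, (1, 16))    # one "sentence" of token ids
inputs, targets = tokens[:, :-1], tokens[:, 1:]   # each target is the NEXT token

logits = head(embed(inputs))                      # shape: (1, 15, vocab_size)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()                                   # gradients adjust the weights
print(f"next-token loss: {loss.item():.3f}")
```

The second sketches reward modeling, the core of RLHF: given features of a human-preferred response and a rejected one, a pairwise (Bradley-Terry-style) loss pushes the reward model to score the preferred response higher. The feature vectors here are random placeholders standing in for real response encodings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
dim = 32
reward_model = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

# Random placeholders for encoded responses: one the rater PREFERRED,
# one the rater REJECTED, over a batch of 8 prompts.
chosen, rejected = torch.randn(8, dim), torch.randn(8, dim)

# Loss shrinks when the chosen response outscores the rejected one,
# nudging the reward model toward human preferences.
margin = reward_model(chosen) - reward_model(rejected)
loss = -F.logsigmoid(margin).mean()
loss.backward()
print(f"preference loss: {loss.item():.3f}")
```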

Challenges in Training LLMs in India

  • Limited Indian Language Data: Scarcity of high-quality datasets in Indian languages reduces model performance.
    • Many systems translate Indian-language input into English before processing, which increases token usage and latency (see the tokenizer sketch below); the resulting suboptimal native performance limits adoption among non-English users.
  • High Capital Requirements: Training frontier models demands substantial financial investment. Startups often lack immediate commercial returns to justify such costs.
  • Infrastructure Constraints: Access to high-end computing facilities remains limited without government support.
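
The token-inflation problem noted under the first challenge is easy to demonstrate. The sketch below uses tiktoken (OpenAI's open-source tokenizer library) to compare how an English-centric tokenizer splits roughly equivalent English and Hindi sentences; exact counts vary by tokenizer, but Devanagari text typically fragments into far more tokens.

```python
# Requires: pip install tiktoken (the first run downloads the vocabulary file).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # an English-centric BPE vocabulary

english = "India is a vast and diverse country."
hindi = "भारत एक विशाल और विविध देश है।"      # roughly the same sentence in Hindi

for label, text in [("English", english), ("Hindi", hindi)]:
    n_tokens = len(enc.encode(text))
    print(f"{label}: {len(text)} characters -> {n_tokens} tokens")
# The Hindi sentence typically fragments into several times more tokens,
# which is why native Indic tokenizers matter for cost and latency.
```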

IndiaAI Mission

  • The IndiaAI Mission is the flagship initiative to build a comprehensive, sovereign AI ecosystem for India.
  • It focuses on developing high-performance computing infrastructure, indigenous foundational models, and safe, ethical AI, under the vision of “Making AI in India and Making AI Work for India”.
  • Under the Mission, India has built up a shared pool of 38,000 GPUs, providing affordable access to world-class AI resources.
  • A Graphics Processing Unit (GPU) is a specialized processor designed for massively parallel computation, which makes it far more efficient than a conventional CPU at tasks such as image processing and AI training, as the sketch below illustrates.
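
As a rough illustration of that difference, the following PyTorch sketch (an assumption of this example; any GPU-capable array library would do) times the same large matrix multiplication, the core operation inside neural networks, on the CPU and, if a CUDA device is available, on the GPU. Timings are indicative only and depend heavily on hardware.

```python
import time
import torch

a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

start = time.time()
a @ b                                   # runs on a handful of CPU cores
print(f"CPU matmul: {time.time() - start:.3f}s")

if torch.cuda.is_available():           # GPU path runs only if CUDA exists
    a_gpu, b_gpu = a.cuda(), b.cuda()
    a_gpu @ b_gpu                       # warm-up (the first CUDA call is slow)
    torch.cuda.synchronize()
    start = time.time()
    a_gpu @ b_gpu                       # same work, spread over thousands of cores
    torch.cuda.synchronize()            # wait for the GPU to finish
    print(f"GPU matmul: {time.time() - start:.3f}s")
```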

Source: TH

 
