Ministry of Science & Technology

PARLIAMENT QUESTION: BHARATGEN AI MODELS

Posted On: 06 AUG 2025 6:03PM by PIB Delhi

BharatGen is the first government-supported national initiative to develop a range of sovereign foundational AI models tailored to Indian languages and societal contexts. It spans multiple modalities, including text (Large Language Models), speech (Text-to-Speech and Automatic Speech Recognition) and vision-language systems.

Currently, BharatGen models cover nine Indian languages: Hindi, Marathi, Tamil, Malayalam, Bengali, Punjabi, Gujarati, Telugu and Kannada.

The roadmap for language coverage across these models includes the following milestones:

  • By December 2025, a total of 15 Indian languages (Assamese, Bengali, Gujarati, Hindi, Kannada, Maithili, Malayalam, Marathi, Nepali, Odia, Punjabi, Sanskrit, Sindhi, Tamil and Telugu) will be covered.
  • By June 2026, all 22 scheduled Indian languages will be covered.

BharatGen has developed applications in the agriculture, governance and defence sectors, and pilots have been carried out in these sectors. Once the applications are fully deployed, the plan is to make them available across all states and districts.

BharatGen is implemented as a project under the National Mission on Interdisciplinary Cyber-Physical Systems (NM-ICPS) of the Department of Science and Technology (DST). The following Technology Innovation Hubs (TIHs) are currently active as part of the BharatGen network:

  1. TIH Foundation for IoT and IoE, IIT Bombay (Maharashtra)

This TIH hosts the BharatGen project and serves as its central programme implementation and coordination hub. It is responsible for end-to-end execution, including managing the national academic consortium, overseeing the development of sovereign foundational AI models across text, speech and vision, and building ecosystem partnerships for compute, data and talent. The TIH also leads governance and strategic planning for the project, ensuring cohesive progress across all stakeholders.

  2. IITM Pravartak Technologies Foundation, IIT Madras (Tamil Nadu)

This TIH serves as an implementation partner, focusing on solution development and real-world deployment of BharatGen AI technologies. Its thematic focus areas include governance, security and media-related use cases.

The following institutions are part of the BharatGen consortium:

Institution Name: Role in BharatGen

  • Indian Institute of Technology, Bombay: Lead institution, guiding research and integration across consortium partners
  • International Institute of Information Technology, Hyderabad: Vision-language document modelling
  • Indian Institute of Technology, Madras: Speech foundation model development and evaluation
  • Indian Institute of Technology, Kanpur: Legal AI research, domain-specific datasets, and developing tokenization strategies for multilingual models
  • Indian Institute of Technology, Hyderabad: Advanced tokenization and vocabulary optimization for large multilingual LLMs
  • Indian Institute of Technology, Mandi: Inclusive multilingual model development and research on efficient training strategies for LLMs
  • Indian Institute of Management, Indore: Bharat-centric evaluation and benchmarking of LLMs, multilingual and multimodal data collection

While BharatGen is working with the above consortium members, it may explore partnerships with research institutions in Karnataka.

Since BharatGen AI is currently in the pilot deployment phase, it has not yet been released for public or institutional use. Once it is fully deployed, there is a plan to extend its applicability across all states and districts.

This information was given by Dr. Jitendra Singh, Union Minister of State (Independent Charge) for Science and Technology, Earth Sciences, MoS PMO, MoS Personnel, Public Grievances & Pensions, Department of Atomic Energy and Department of Space, in a written reply in the Lok Sabha today.

 ***

NKR/PSM



(Release ID: 2153187)