The BharatGen Initiative:

How IIT Bombay is Powering India’s AI Revolution

At IIT Bombay, research in artificial intelligence (AI) is driven by a clear national purpose: building sovereign capability, reducing dependence on external technologies, and translating cutting-edge innovation into systems that can operate at India’s scale. Faculty across AI and machine learning work at the frontier of language, speech, vision, and large-scale systems, anchoring solutions in Indian data, languages, and real-world conditions.

This commitment to Atmanirbhar Bharat, where critical digital infrastructure is designed, owned, and governed within the country, sets the foundation for BharatGen, a landmark initiative that positions India not just as a consumer of global AI but as a creator of AI built for its own people. BharatGen is bringing together research institutions, government, and industry to create a foundational AI infrastructure built for India.

Read on as the BharatGen team outlines in detail what this initiative is, how it works, why it matters, and how it is enabling developers, startups, governments, and enterprises to build real-world AI applications grounded in India’s languages, contexts, and needs.

BharatGen at IIT Bombay

BharatGen is India’s sovereign AI initiative. It builds large language models and foundational multimodal models spanning text, speech, and vision, enabling people across India to engage with advanced AI in their own languages instead of being restricted to English.

1. BharatGen and LLMs in simple terms

A large language model is a computer program trained on massive amounts of text so it can understand questions and write human-like answers in natural language. What makes BharatGen different is that it’s built from the ground up for India, with models trained heavily on Hindi and other Indian languages, not just adapted from Western models with a bit of Indian data added.

Speech is central to how BharatGen works. The initiative goes beyond reading and writing: it includes Automatic Speech Recognition (ASR), so users can speak in Hindi, Marathi, Tamil, or other Indian languages, and text-to-speech (TTS), so responses are read back to them in their own language. This matters for people who don’t use English or aren’t comfortable typing, a group that includes most of India’s farmers, informal workers, and first-time internet users.
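The voice loop described above (speech in, text reasoning, speech out) can be sketched as three pluggable stages. This is a minimal illustration, not BharatGen’s implementation: the `VoiceAssistant` class, the stage signatures, and the stub functions are all hypothetical stand-ins for a real ASR model, language model, and TTS model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (assumptions made for this sketch only).
Transcribe = Callable[[bytes, str], str]   # (audio, language code) -> text
Generate = Callable[[str], str]            # question text -> answer text
Synthesize = Callable[[str, str], bytes]   # (text, language code) -> audio


@dataclass
class VoiceAssistant:
    """Chains ASR -> language model -> TTS for one spoken question."""

    transcribe: Transcribe
    generate: Generate
    synthesize: Synthesize

    def respond(self, audio: bytes, language: str) -> bytes:
        question = self.transcribe(audio, language)  # speech to text
        answer = self.generate(question)             # text to text
        return self.synthesize(answer, language)     # text back to speech


# Stub stages so the sketch runs without any models; real systems would
# call actual ASR, LLM, and TTS models here.
def fake_asr(audio: bytes, language: str) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"echo: {prompt}"


def fake_tts(text: str, language: str) -> bytes:
    return f"[{language}] {text}".encode("utf-8")


assistant = VoiceAssistant(fake_asr, fake_llm, fake_tts)
reply = assistant.respond("गेहूं की बुवाई कब करें?".encode("utf-8"), "hi")
print(reply.decode("utf-8"))
```

Keeping each stage behind a plain callable means an ASR or TTS model for another language can be swapped in without touching the rest of the loop.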

BharatGen’s main text model is Param-1, a 2.9-billion-parameter model trained entirely on bilingual Hindi-English data and designed to handle code-mixing (when people switch between languages mid-sentence) and to understand Indian cultural references. For understanding documents such as forms, certificates, and handwritten pages, there is Patram-7B-Instruct, a 7-billion-parameter vision-language model trained specifically on Indian administrative documents, IDs, and certificates.
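To make code-mixing concrete, the sketch below flags sentences that mix writing systems, using only Python’s standard `unicodedata` module. It is an illustrative heuristic, not part of Param-1: the function names are hypothetical, and it only catches mixing across scripts, so romanized Hindi mixed with English would not be detected.

```python
import unicodedata
from collections import Counter


def script_of(word: str) -> str:
    """Return the dominant script of a word, read from Unicode character names."""
    counts = Counter(
        unicodedata.name(ch, "UNKNOWN").split()[0]  # e.g. "DEVANAGARI", "LATIN"
        for ch in word
        if ch.isalpha()  # skip digits, punctuation, and combining vowel signs
    )
    return counts.most_common(1)[0][0] if counts else "NONE"


def is_code_mixed(sentence: str) -> bool:
    """True when the words of a sentence use more than one script."""
    scripts = {script_of(word) for word in sentence.split()}
    scripts.discard("NONE")  # ignore tokens with no letters at all
    return len(scripts) > 1


print(is_code_mixed("मुझे train का ticket चाहिए"))  # True: Devanagari and Latin
print(is_code_mixed("I need a train ticket"))       # False: Latin only
```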

The speech side includes Shrutam, an ASR model trained on over 2,000 hours of Hindi audio recorded in real-world conditions such as call centers and busy streets, so it holds up when there is background noise. For converting text to natural-sounding speech, BharatGen has released 19 speech synthesis models across Indian languages including Marathi, Bengali, Tamil, Telugu, Kannada, Punjabi, Gujarati, and Malayalam.

2. What BharatGen enables for developers and startups

Developers and startups get direct access to Param-1 and Patram-7B-Instruct: they can download the models, fine-tune them for their own use cases, and deploy them without building models from scratch. The models are available on platforms such as Hugging Face and AIKosh, which makes building Indian-language applications easier and cheaper.
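For open-weight models published on Hugging Face, the download-and-run step typically looks like the sketch below, which uses the `transformers` library’s `AutoTokenizer` and `AutoModelForCausalLM` classes (PyTorch is assumed to be installed). The helper names are hypothetical, and `repo_id` is deliberately left as a parameter: take the exact repository name, and any required loading options such as `trust_remote_code`, from the model card rather than from this example.

```python
from typing import Any


def generation_settings(max_new_tokens: int = 128, sample: bool = False) -> dict[str, Any]:
    """Keyword arguments for model.generate(); greedy decoding by default."""
    settings: dict[str, Any] = {"max_new_tokens": max_new_tokens, "do_sample": sample}
    if sample:
        settings["temperature"] = 0.7  # illustrative value, not a recommendation
    return settings


def generate_reply(repo_id: str, prompt: str, **overrides: Any) -> str:
    """Download a causal language model from the Hugging Face Hub and answer one prompt."""
    # Imported lazily so generation_settings() works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **generation_settings(**overrides))

    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

A call would look like `generate_reply(repo_id, "भारत की राजधानी क्या है?", max_new_tokens=64)`, with `repo_id` set to the repository name listed on the model’s Hugging Face page. Fine-tuning builds on the same loaded `model` and `tokenizer` objects.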

BharatGen isn’t just releasing research models, though. It is building product-level proofs of concept (POCs) that show what’s possible. Krishi Saathi, for example, is a voice-enabled WhatsApp bot that gives farmers localized advice on crop management, pest control, and yield prediction in Hindi and English. Farmers don’t need to read instructions or type anything; they just speak.

e-VikrAI tackles a real problem in Indian e-commerce: most online sellers, especially in rural areas, struggle to write product descriptions in English. e-VikrAI lets a seller photograph a product and automatically generates a full listing with descriptions, category tags, and pricing suggestions in multiple languages. This matters because online retail is only 8% of India’s retail market, and much of that gap exists because sellers can’t easily move offline inventory online.

DocBodh is built on Patram-7B-Instruct and handles the document problem. India runs on paperwork (government forms, bank statements, legal notices, school certificates), but these documents are often long, full of jargon, and written in different languages. DocBodh lets you talk to a document in your language and get back simple, accurate answers about what it says.

These three sample applications show what the ecosystem can unlock. Startups can layer their own expertise on top of BharatGen’s foundation models, focusing on distribution and domain knowledge rather than spending months building language models.

3. Significance of IIT Bombay launching its first-ever AI company

Most research institutes focus primarily on academic work and contribute by publishing papers on their research. The IIT Bombay–led consortium went a step further by establishing the BharatGen Technology Foundation, a Section 8 (not-for-profit) company, effectively creating its own AI entity. This is significant because the institute is not only doing research but also building and operating long-term AI infrastructure for India, with a strong focus on industry readiness.

The foundation brings together eight leading institutions: IIT Madras, IIT Kanpur, IIT Hyderabad, IIIT Hyderabad, IIT Mandi, IIT Kharagpur, IIIT Delhi, and IIM Indore. This isn’t a single-institute project; it’s a national consortium pooling research, datasets, talent, and domain expertise.

The funding reflects how seriously the government takes this. The Department of Science and Technology allocated ₹235 crore, and the Ministry of Electronics and Information Technology committed an additional ₹1,058 crore through the IndiaAI Mission, for a total of ₹1,293 crore. To put this in perspective, that’s comparable to how ISRO is funded for space infrastructure; it shows that building sovereign AI is now treated as critical national infrastructure.

4. How corporates can engage with BharatGen

BharatGen is government-funded, and it’s working with industry partners on large-scale, production-ready AI applications.

For corporates wanting to engage, the model is straightforward: integrate BharatGen models into your platforms, help adapt models for specific sectors, license the technology, or contribute domain datasets. BharatGen is moving toward a licensing revenue model in 2026, under which companies license models for production use, making the initiative self-sustaining beyond government grants.

BharatGen is actively invested in early deployments and is helping corporates crack some of their use cases and core problems, applying its teams’ foundational understanding along three Cs: compute efficiency, colocation, and cost optimisation. This approach benefits corporates too. They get access to multilingual, multimodal AI capabilities without building those capabilities themselves, and they build on a sovereign stack that aligns with India’s data protection goals.

BharatGen is also working with other organizations in the ecosystem to enable AI. In September 2025, IBM announced a partnership with BharatGen to integrate IBM’s watsonx platform with BharatGen’s models, targeting agriculture and healthcare deployments while maintaining data sovereignty. This isn’t a research collaboration; IBM is bringing its enterprise platform expertise to help scale BharatGen’s models for companies and governments.

Other partners include Zoho (contributing enterprise datasets) and NASSCOM (providing industry expertise). Government agencies are also partners: the Ministry of Water and Sanitation and state governments like Maharashtra are co-creating applications for their specific sectors.

Corporates can also engage through CSR contributions to IIT Bombay (an approved CSR-eligible institution), funding model training, domain datasets, and sectoral pilots in areas aligned with CSR priorities such as education, healthcare, agriculture, skilling, financial inclusion, governance, and access to justice.

5. Consortium structure and IP ownership

BharatGen operates as a government-backed consortium under India’s National Mission on Interdisciplinary Cyber-Physical Systems (NM-ICPS), with partner institutions contributing research, datasets, benchmarks, and domain expertise.

The Section 8 structure allows it to function as a shared national platform. Core foundational models like Param-1 and Patram-7B-Instruct are owned by the BharatGen consortium, but the strategy is to release them as open-weight or open-source assets so they’re widely accessible for Indian innovation. The thinking is that if private companies and startups can build on these open foundations, India gets a richer ecosystem faster.

For partnerships with industry, the typical model is: BharatGen provides the models and stack readiness, partners co-develop domain-specific adaptations and applications for their sectors, and IP is co-owned or licensed appropriately. This both protects the public investment and incentivizes commercial innovation.

By doing this, BharatGen becomes a shared infrastructure layer—like digital public goods—that everyone builds on.

BharatGen at a Glance

  • BharatGen is India’s sovereign AI stack. It is built from scratch on Indian data by a national consortium led by IIT Bombay. Param-1 and Patram-7B-Instruct, along with other models, work across text, speech, and vision. They’re open-weight models, released so startups, governments, and companies can build applications without relying on foreign AI systems.
  • Voice is fundamental, not an afterthought. With speech recognition trained on real Hindi speech, text-to-speech synthesis in multiple languages, and voice-first interface examples like Krishi Saathi, BharatGen ensures AI works for non-English speakers and people with low digital literacy.
  • The Department of Science and Technology and the Ministry of Electronics and Information Technology are treating sovereign AI the way they treat space and cyber infrastructure. IBM, Zoho, NASSCOM, and government ministries are integrating BharatGen’s models into enterprise systems.
  • Real applications exist now, not just research prototypes. Krishi Saathi gives farmers voice-based crop advice. e-VikrAI helps rural sellers list products. DocBodh makes government documents readable to citizens in their languages.
  • The business model is moving to sustainability through licensing. BharatGen expects to generate revenue in 2026 through enterprise licensing, reducing reliance on government grants and proving that sovereign AI can be commercially viable.
  • It’s a complete multimodal stack—text, speech, and vision. Only a handful of countries have built the full range. This positions India to export AI technology to the world.

Prof. Ganesh Ramakrishnan

Principal Investigator of BharatGen


The following video is an official statement by Prof. Ramakrishnan, in which he discusses the core mission and focus areas of the BharatGen initiative. The video provides comprehensive information about BharatGen’s goals, including the Bharat Data Sagar dataset, multilingual and multimodal capabilities, workforce development, and industry-academic collaborations.

Prof Ganesh Ramakrishnan Speaks About BharatGen

(Ramakrishnan, 2024, 1:00-1:36)

Image sources: BharatGen 1, BharatGen 2