Presented at Open Knowledge Initiatives, IIIT-H
Easy and intuitive explorer for browsing affidavits
and parliamentary activity of elected MPs.
Data, development, design. We could do more.

Drainage patterns in Bangalore as a convenient map,
with curated historical context.
Building on the previous project, we analyzed election candidate names to find “namesakes” who could have flipped an election.
How do you find more namesakes like S Veeramani and S .V Ramani?
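One way to approach this, as a minimal sketch and not necessarily the method used in the project: strip punctuation, spaces and case from each name, then score every pair with a string-similarity ratio. The helper names, the 0.8 threshold and the third candidate name are assumptions for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Keep only letters and lowercase them, so 'S .V Ramani' becomes 'svramani'."""
    return "".join(ch for ch in name if ch.isalpha()).lower()


def similarity(a: str, b: str) -> float:
    """Similarity ratio between two normalized names, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_namesakes(candidates: list[str], threshold: float = 0.8) -> list[tuple[str, str]]:
    """Return every pair of candidate names whose similarity meets the threshold."""
    return [
        (a, b)
        for a, b in combinations(candidates, 2)
        if similarity(a, b) >= threshold
    ]


# Hypothetical candidate list; only the first two names come from the slide above.
names = ["S Veeramani", "S .V Ramani", "K Lakshmi"]
pairs = find_namesakes(names)
print(pairs)
```

Comparing every pair is fine within a single constituency, where the candidate list is short. The threshold would need tuning against known namesake cases before trusting the output.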
You’ve seen what we build. But how do we build it?
We’ll look at
India’s largest, and only, archive of film censorship


Each film leads you to others like it, creating links between 18,000 movies based on censorship

Normally the only way to find a certificate for a movie is to go around searching for something like this in a theatre
On opening the URL, the certificate is displayed with the list of cuts made to the film.


What I got, I accepted as my fate, What I lost, I gradually forgot.
This data was still unusable because the information that users cared about was hidden in piles of text and timestamps with no context.
Manual Classification

Large Language Models
Detailed prompts + Edge case examples = Text categorization that also cleans up messy content for better readability.
The interesting work—the analysis, the trends, the insights—that’s entirely (and necessarily) human.
01:32:59:00 Replaced the whole V.O. stanza about caste system of Manu Maharaj. Aabadi hain Aabad .Aur unka jivan sarthak hoga To Aabadi hain Aabad nahiazhadi ki
Clean Description: Replaced a voice-over passage discussing the caste system.
Categories: - REPLACEMENT - TEXT DIALOGUE - IDENTITY REFERENCE
Topic: CASTE
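A labelled record like this is only useful downstream if it can be parsed and checked. The sketch below assumes the model replies in exactly the three-field layout shown above. `ALLOWED_CATEGORIES` holds only the values visible in this one example, not the full taxonomy, and the function and class names are hypothetical.

```python
from dataclasses import dataclass

# Assumption: only the categories visible in the example; the real list is longer.
ALLOWED_CATEGORIES = {"REPLACEMENT", "TEXT DIALOGUE", "IDENTITY REFERENCE"}


@dataclass
class LabelledCut:
    description: str
    categories: list[str]
    topic: str


def parse_llm_reply(reply: str) -> LabelledCut:
    """Parse a 'Clean Description / Categories / Topic' reply and reject unknown categories."""
    fields = {}
    for line in reply.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip().lower()] = value.strip()

    # Categories arrive as "- A - B - C"; split on the dash separators.
    categories = [c.strip() for c in fields["categories"].split("- ") if c.strip()]
    unknown = set(categories) - ALLOWED_CATEGORIES
    if unknown:
        raise ValueError(f"Unexpected categories from the model: {sorted(unknown)}")

    return LabelledCut(fields["clean description"], categories, fields["topic"])


reply = """Clean Description: Replaced a voice-over passage discussing the caste system.
Categories: - REPLACEMENT - TEXT DIALOGUE - IDENTITY REFERENCE
Topic: CASTE"""
cut = parse_llm_reply(reply)
print(cut.categories, cut.topic)
```

Rejecting categories outside the allowed set is one way to catch replies where the model invents a label, so those records can be sent back for review.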
Why would you not want your users to share things? Everything is linkable!
https://cbfc.watch/film/sinners-2025
https://cbfc.watch/browse/actors/fahadh-faasil
https://cbfc.watch/browse/content/religious
https://cbfc.watch/search?q=maps+language%3AEnglish
All 1.2 lakh records scraped from E-Cinepramaan are live on Archive.org
People have used this data, and the way it is published, in many ways.
https://www.hollywoodreporterindia.com/features/insight/cbfc-data-malayalam-bhojpuri-u-rated-films
https://fortuneiascircle.com/uploads/download/FWD_03rd_November_to_09th_November,_2025.pdf
https://www.google.com/search?q=site%3Agrokipedia.com+%22cbfc.watch%22
How people in India spend their time each day.
Visualization by Nathan Yau, FlowingData
Other than creating fancy visuals, it can be used to answer many questions:
Suppose you want to answer such a question. How would you do it?
First you have to create an account on the MoSPI portal.
Then you get when you download the data.
To map the code in the data to an actual activity, you need to go through multiple documents.
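Once the code lists are pulled out of those documents, the mapping itself is a plain lookup. The sketch below assumes the raw records and the codebook are both available as CSV. The column names follow the ones used in the explorer, but the codes and labels are placeholders, not real survey values.

```python
import csv
import io

# Hypothetical extracts: codes and labels are placeholders, not real survey values.
RAW_RECORDS = """person_id,activity_code,time_from,time_to
P1,X01,22:00,06:00
P1,X02,06:00,06:30
"""
CODEBOOK = """activity_code,activity_label
X01,Placeholder activity one
X02,Placeholder activity two
"""


def load_codebook(text: str) -> dict[str, str]:
    """Build a code -> label dictionary from the codebook CSV."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["activity_code"]: row["activity_label"] for row in reader}


def label_records(records_text: str, codebook: dict[str, str]) -> list[dict[str, str]]:
    """Attach a readable activity label to every raw record."""
    labelled = []
    for row in csv.DictReader(io.StringIO(records_text)):
        # Codes missing from the codebook are flagged instead of silently dropped.
        row["activity"] = codebook.get(row["activity_code"], "UNKNOWN")
        labelled.append(row)
    return labelled


codebook = load_codebook(CODEBOOK)
labelled = label_records(RAW_RECORDS, codebook)
print(labelled[0]["activity"])
```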
Publicly accessible, browseable on the web and a single file
The data pipelines are replicable. Do this for yourself!
A web interface for the Excel file is good, but it can be even better.
View the raw data.
Run time-analysis queries.


Linkable and shareable URLs.
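How the explorer implements this is not shown here. The following is a sketch, assuming the view state is stored as JSON-encoded query parameters, which is what the explorer's URLs suggest. The function names are hypothetical.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://diagramchasing.fun/2025/time-use-explorer"


def state_to_url(state: dict) -> str:
    """Serialize view state into a query string; non-string values become compact JSON."""
    params = {
        key: value if isinstance(value, str) else json.dumps(value, separators=(",", ":"))
        for key, value in state.items()
    }
    return f"{BASE_URL}?{urlencode(params)}"


def url_to_state(url: str) -> dict:
    """Recover the view state from a shared URL."""
    state = {}
    for key, values in parse_qs(urlsplit(url).query).items():
        raw = values[0]
        # Only values that look like JSON arrays or objects are decoded.
        state[key] = json.loads(raw) if raw[:1] in ("[", "{") else raw
    return state


state = {
    "viewMode": "time_analysis",
    "filters": ["activity_code|=|Incidental sleep/naps"],
    "demographic": ["district"],
    "activity": "activity_code",
}
url = state_to_url(state)
print(url)
```

Because the whole query lives in the URL, anyone who receives the link sees the same filters and grouping without any server-side session.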
Districts with more incidental sleep/naps:
https://diagramchasing.fun/2025/time-use-explorer?viewMode=time_analysis&filters=%5B%22activity_code%7C%3D%7CIncidental+sleep%2Fnaps%22%5D&columns=%5B%22gender%22%2C%22age%22%2C%22state%22%2C%22district%22%2C%22activity_code%22%2C%22education%22%2C%22time_from%22%2C%22time_to%22%5D&demographic=%5B%22district%22%5D&activity=activity_code&agg=%5B%7B%22column%22%3A%22*%22%2C%22function%22%3A%22COUNT_DISTINCT_PERSON%22%7D%5D
How many people would use the data if they had to:
versus
MoSPI is one government entity that publicly releases data.
There are many such government entities. Most of them are worse than MoSPI.
Problem solved!
Haha, not really.
5 steps on a convoluted UI
…to access a single data point
They make the process of accessing them very inviting.
It’s never been easier to spin up a dashboard for your open data with a few prompts.

It’s also never been easier for your audience to ignore yet another dashboard.

The scarcity of open data in India makes poor execution particularly costly.
PDF / Dashboards are “read only”
You can only participate by observing.
CSV / JSON / API are open
You can participate by observing and building.
We love to bash PDFs, but do we do the same for dashboards?
Most dashboards, including government ones, pretend to be the final answer.
LOOK NO BEYOND ME!
Instead,
Think about the various kinds of users your data might attract. Can you make their lives easier?
Show them one way to slice the data. Get them thinking of more!

We extensively document all data releases because we want users to use this data.
No guesswork.

If there is a national disaster, the NDEM website hopes you are on a laptop.

Public data means public. Generating interest in information already involves friction, and every extra barrier loses more people.
https://www.mdpi.com/2076-3387/13/11/229
Websites regularly disappear.
Indian government websites especially so.
What to do about this?


Mirroring to archive.org turned out to be straightforward.
After CBFC, mirrored documents from a bunch of other sites:
archive.org automatically makes documents findable and searchable.
What was said in Parliament about the Cuban Missile Crisis?

Government data should be about more than just numbers and stats.
Government data is the result of government processes.
Everyone has a different answer.
Wikipedia says 780
Local Government Directory says 778
Google/Gemini says 780 to 806
But these issues are not unique to open data. They apply to all data.
Even if the data is not open, it still gets used to train LLMs.
As a team of two with day jobs, LLM assistance in code helps us maintain our pace.
Grunt work only. We use LLMs for code and data cleaning. We never use them for writing, analysis, or creating art.
Open source. We publish not just the code, but the prompts and the raw data. The pipeline must be reproducible.
Transparency. If a dataset was cleaned or summarized by AI, the metadata and UI must explicitly say so.
Keep us in your bookmarks!
diagramchasing.fun