The teams get dismantled. The testing tools are scientifically inadequate. The commitments don’t survive changes, corporate or political. And no jurisdiction has found a governance model that makes AI safe and responsible. That is the assessment of Rumman Chowdhury, who has led AI safety work at (erstwhile) Twitter, in the US government, and in civil society, speaking to HT on the sidelines of the India AI Impact Summit.

“One of the big things that happens at all these summits is new voluntary commitments,” she said. “It’s lovely PR and great optics, but what does it tangibly, fundamentally mean? These organisations cannot live at the mercy of companies or political organisations. In order to be codified or solidified, it needs to actually be something required by law.”
Chowdhury has reason to be direct. She directed the ML Ethics, Transparency, and Accountability (META) team at X (formerly Twitter), which published a study on algorithmic amplification of political content. The team was dissolved after Elon Musk’s acquisition. She subsequently led the US Department of Defense’s Responsible AI division, was designated the US Science Envoy for AI, and stepped down as the Trump administration reorganised the office and revoked Biden-era safeguards. Two responsible AI teams, two collapses — one corporate, one political.
For Chowdhury, CEO and co-founder of the AI safety nonprofit Humane Intelligence, the Twitter algorithm analysis — still the only such study by a Big Tech company to involve independent, outside researchers — exposed a problem that remains relevant to AI safety regulation today. The analysis had found that Twitter was organically surfacing more right-wing content.
Chowdhury’s working hypothesis was that it was “not algorithmic bias, but algorithmic reinforcement of human behaviour” — the system surfacing what people were already engaging with.
“If everyone’s interacting with right-wing content, then we’re surfacing more right-wing content — again, by design,” she said. The algorithm, in other words, was not malfunctioning — it was optimising for engagement, which happened to skew right during a period of intense political polarisation. That raised a question the team could not answer technically, because it was not a technical question: “Who decides what is fair? Does Jack Dorsey decide? There’s no regulatory body.”
The India AI Impact Summit, which has adopted a ‘seven chakras’ approach, lists safe and trusted AI as one of its key themes.
Institutionalising safety
Chowdhury said even where responsible AI teams survive leadership changes, their effectiveness depends on where they sit within an organisation — a question she said is rarely discussed.
Her ethics team at X was an engineering unit in core product. The alternative, in her experience, is less effective. “A lot of companies, when you mention theatre or optics — it sits as a pure policy or research arm, which is personally my least favourite place for it to sit,” she said. “You are only external facing. You have to be invited to the product table. You are not there by design.”
The same framework, she suggested, applies to government. Asked about India, where AI governance sits under the ministry of electronics and IT (Meity), Chowdhury, without commenting on the merits, drew a parallel with corporate IT: the traditional IT function in a company is concerned with “cost-benefit analysis and compute and password storage,” not broader risk. “Are IT and technology ministers the right people to opine on children and education, or mental health and wellbeing, or parasocial relationships, or bias and discrimination?” she asked. “That is not within their comfort zone.”
To be sure, India’s approach to AI regulation relies on sectoral regulators, although most punitive action and enforcement runs through Meity.
Testing tools don’t work
The tools available to evaluate AI systems, Chowdhury argued, are not fit for purpose — and the industry knows it. “I cannot emphasise enough that our testing mechanisms for gen AI models are insufficient. We don’t actually have rigorous scientific methods of testing these AI models,” she said. “Frontier model companies kind of handwave over it because they get to construct what safety means, absent scientific rigour.” Benchmarks, she said, “are really just question-answer pairs. Who creates them? Who decides these are representative?”
Red teaming, where experts try to make a tool break safeguards, fared no better: “a bunch of people in a room hacking at a model. That’s not really rigorous.” She described cherry-picking on both sides — companies testing only for harms they want to find, critics digging until they get the result they were looking for. The underlying problem is that the shift from deterministic machine-learning models to probabilistic generative AI has outpaced the evaluation methods built for an earlier generation of technology. “We have not developed the sort of scalable, rigorous technology” for testing probabilistic systems, she said. The result is what the industry calls “pilot purgatory” — models stuck in controlled demonstrations because evaluation tools are disconnected from real-world conditions.
The wrong conversation about jobs
Two days before this interview, Microsoft AI CEO Mustafa Suleyman had declared that all white-collar jobs would be automated within 18 months. Chowdhury was unequivocal: the claim is factually wrong and politically dangerous.
“Never in the history of human evolution have we ever done less work when we build more technology. We’ve always had more,” she said. The economist John Maynard Keynes had famously predicted 15-hour work weeks and lives of leisure. “That is an old story that never came to be,” Chowdhury said. AI improvement, she added, is not linear — she cited Yann LeCun and Fei-Fei Li, who are investigating alternative architectures because current approaches face diminishing returns, a problem compounded by AI-generated content poisoning training data.
The bigger problem, she argued, lies in what doomsday framing does to policy. Real displacement is happening among young people — internships and entry-level positions are being squeezed. “We’re failing young people because it is true that it’s hard to get a job as an intern, because AI can do that stuff for you,” she said. “What can we build so that kids today don’t become a lost generation? This is not a story for 18 months from now. This is a story for today.”
“We can’t make sweeping policy decisions about global joblessness. We can make policy decisions about helping young people or entry-level workers. That is actually a problem we can tackle and should tackle.” But apocalyptic framing, she argued, makes even tractable problems impossible to address — comparing it to the film Don’t Look Up, in which the certainty of catastrophe paralyses any response. “When you speak that kind of language, there’s nothing you can do. You’ve completely alienated any possibility of doing anything.”
The law should lead
No jurisdiction, in her assessment, has got AI governance right. Chowdhury’s preferred model is the EU’s Digital Services Act, not for its implementation but for its ambition. “They very explicitly codify the things against which social media platforms should be measured — adverse mental impact, impact on children, violation of fundamental human rights,” she said.
The technical capability to test against such standards does not yet exist. In her view, that is fine. “If the law exists, then we are obligated to create the way to implement the law. It will be poorly implemented for five to eight years and then maybe become better. But it agenda-sets.”
India’s policymakers have cited the EU experience — the GDPR in particular — as a cautionary tale against regulating too early. Chowdhury’s counter: “We’ve experimented in the [permissive] direction for two decades. Why not try the other option?”
www.hindustantimes.com