The genie won't go back in the bottle: AI, jobs and why student panic is miscalibrated

🕒 Published on AI Momentum: June 30, 2026 · 03:40
The article, published on June 26, 2026 in Forward Future and authored by independent analyst Deepanshu Sharma, opens with a striking image: at recent graduation ceremonies at several prestigious universities, guest speakers drew pushback from the student audience every time they mentioned artificial intelligence. This is more than an anecdote; it reflects a collective mood that has spread quickly among young people about to enter the job market. The question underlying that reaction is always the same: is AI going to eliminate every job?
To answer rigorously, Sharma begins by dissecting the GDPval benchmark, an evaluation framework introduced by OpenAI in September 2025. GDPval covers 44 occupations spread across the nine sectors that contribute most to U.S. GDP and assesses the performance of large language models (LLMs) on 1,320 specialized tasks. Its stated goal is to measure AI capabilities in direct comparison with the real work people do. The headlines it generated were striking: Claude Opus 4.1 reached 47.6% on the evaluation (the share of its deliverables that expert graders rated as good as or better than the human expert's), which was presented in many outlets as «almost at industry expert level».
However, Sharma warns that the reality is «slightly twisted», and explains why the raw number is misleading. First, GDPval does not evaluate all the tasks of any specific occupation, but only a subset of them; and from that subset, it includes only those that can be carried out entirely digitally. More restrictive still: it only considers tasks that can be clearly specified from the outset, which automatically excludes any scenario in which the information changes midway through the process—something that happens constantly in real work environments. These limitations are spelled out in the original paper itself, but rarely accompany the press headlines.
Sharma's second line of criticism focuses on speed and cost. One of the usual arguments in favor of adopting AI at work is that it completes tasks faster than a human, which should translate into time and money savings. The GDPval data sharply qualifies that argument: according to the original paper, only GPT-5 shows speed improvements of any practical significance. More surprising still, using GPT-4o resulted in a net slowdown: the total time spent, including reviewing the model's outputs, was greater than without the tool. This points to a systemic problem that the public debate tends to ignore: supervision is not free.
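The supervision argument can be made concrete with a small sketch. The model below is a simplifying assumption, not the method of the GDPval paper: a worker asks the model for a draft, reviews it, and with some probability has to redo the task by hand. The function name `net_minutes_saved` and every number in the example are hypothetical and chosen only for illustration.

```python
def net_minutes_saved(unaided: float, generate: float, review: float,
                      redo_probability: float) -> float:
    """Minutes saved per task by using a model; negative means a slowdown.

    Assumed workflow (illustrative only): wait for the draft, review it,
    and redo the task by hand when the draft is not usable.
    """
    if not 0.0 <= redo_probability <= 1.0:
        raise ValueError("redo_probability must be between 0 and 1")
    assisted = generate + review + redo_probability * unaided
    return unaided - assisted


# Hypothetical inputs: a 60-minute task, 5 minutes to get a draft,
# 20 minutes to review it, and two different redo probabilities.
for label, redo in [("more reliable model", 0.3), ("less reliable model", 0.7)]:
    saved = net_minutes_saved(unaided=60, generate=5, review=20,
                              redo_probability=redo)
    print(f"{label}: {saved:+.1f} minutes per task")
```

With these made-up inputs, the first case saves time and the second is a net slowdown, even though the model itself produces a draft in a fraction of the unaided time. The review and redo terms, not raw generation speed, decide the sign.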
Added to this is the risk of catastrophic errors. The GDPval paper itself acknowledges that, in approximately 2.7% of cases, the model makes mistakes whose severity can wipe out any economic benefit derived from its use. The examples the study mentions are telling: insulting a customer, issuing an erroneous medical diagnosis, recommending a fraudulent action, or suggesting something that could cause physical harm. An error of that magnitude in a professional context is not a minor statistical mishap; it can have legal, reputational, or even safety consequences that no productivity gain offsets.
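A rough expected-value sketch shows why a small catastrophic-error rate can dominate the economics. Only the 2.7% rate comes from the text above; the per-task gain, the cost of a catastrophic error, and the function names are hypothetical, and the model assumes (as a simplification) that a catastrophic task yields no gain and incurs a fixed cost.

```python
CATASTROPHIC_RATE = 0.027  # approximate rate reported in the article


def expected_value_per_task(gain: float, catastrophic_cost: float,
                            rate: float = CATASTROPHIC_RATE) -> float:
    """Expected net value of delegating one task to the model.

    Assumption: with probability `rate` the task ends in a catastrophic
    error that costs `catastrophic_cost`; otherwise it yields `gain`.
    """
    return (1 - rate) * gain - rate * catastrophic_cost


def break_even_cost(gain: float, rate: float = CATASTROPHIC_RATE) -> float:
    """Catastrophic cost at which the expected value drops to zero."""
    return (1 - rate) * gain / rate


# Hypothetical figures: each successful task is worth 50 (any currency).
gain = 50.0
print(f"break-even cost of one catastrophic error: {break_even_cost(gain):.2f}")
print(f"expected value if one error costs 10,000: "
      f"{expected_value_per_task(gain, 10_000):.2f}")
```

Under these assumptions, a single catastrophic error only has to cost about 36 times the per-task gain for delegation to lose money on average, a threshold that a lawsuit or a safety incident would clear easily.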
In this context, a public clarification by Sam Altman becomes especially relevant. When an interviewer asked him why there had been so much negative reaction following the announcement that GPT-5.2 «surpasses professionals in 44 occupations», Altman responded with a significant correction: «What I wish I had said then is that it surpasses professionals at small tasks within those 44 occupations, which I think is a more accurate claim». This distinction—between surpassing a professional at a narrow micro-task versus surpassing a professional at their job in a broad sense—is precisely what gets lost in the popular narrative, and it is what fuels the student panic the article seeks to defuse.
Sharma stresses that this misunderstanding matters, because GDPval remains the reference benchmark used to evaluate the most recent models. The problem is not the benchmark itself, but the way its results are communicated and interpreted outside the academic context. The emotional narrative—«AI already surpasses experts at everything»—spreads far faster than the methodological nuances.
But the article does not dwell on the past. Sharma recalls that in AI «a month feels like a year» and shifts the focus to the most recent developments. On June 19, 2026, Artificial Analysis presented a new benchmark called AA-Briefcase, designed specifically to evaluate models' performance on agentic knowledge work with real-world complexities. Unlike GDPval, AA-Briefcase simulates projects that span weeks, require processing thousands of inputs (corporate documents, meeting transcripts, massive data exports, Slack messages, emails), and demand contextual judgment sustained over time.
The AA-Briefcase results are revealing. The best-performing model, identified in the article as Fable 5, successfully completed only 3% of the tasks. In 31 of the 91 tasks included in the evaluation (about 34%), no model met even 50% of the rubric's criteria. Sharma sorts the failures into two categories: the most capable models fail by missing subtle requirements hidden within the task or context files; the less capable models do not even achieve basic execution, whether by ignoring the input files, delivering unusable work, or producing nothing at all.
This gap between performance on well-specified digital tasks (GDPval) and performance on complex knowledge work in real environments (AA-Briefcase) is probably the most valuable insight in the article. It shows that today's AI is extraordinarily good in a relatively narrow task space and markedly weak in another: the space of long, ambiguous, and multidimensional projects, which represents a good part of what highly qualified professionals do.
Sharma does not, however, fall into the naive optimism of saying that everything is fine and there is nothing to fear. His position is more nuanced: it is true that AI is far from eliminating professionals as a category, but it is also true that within any profession there are a significant number of subtasks at which AI already clearly surpasses humans. Ignoring that reality and failing to automate those subtasks would be, in his words, «a costly mistake». The point of balance between irrational fear and equally irrational complacency lies in precisely identifying which part of the work can and should be delegated to the machine, and which part requires human judgment, experience, and responsibility.
The article then returns to the initial question, what someone should study today in the face of AI's advance, through the answer that Google DeepMind CEO Demis Hassabis gave to Stanford President Jonathan Levin in a recent interview. Hassabis said that if he were back in university, he would feel tremendously excited. His advice was clear: those studying science, STEM, mathematics, or computer science should keep doing so, because understanding how these tools are built and what they are capable of doing will give them a real edge for at least the next ten years. And he added the phrase that gives the article its title: «The genie does not go back in the bottle». There is no going back to a pre-AI era, in either personal or professional life.
Sharma endorses that view and develops it. In his opinion, two profiles will come out ahead in the new landscape. The first is the domain expert—or whoever has the discipline to become one—because only someone who knows a field in depth knows exactly which tasks make sense to delegate to AI and which require their judgment. The second profile is the accelerated learner: someone who uses AI as a learning engine to expand their scope of competence into new domains quickly, what the article calls «vertical integration». AI democratizes access to knowledge in a way that was previously unthinkable; Sharma illustrates this with his own experience: today he can summarize multiple academic papers in minutes, extract data from lengthy documents, detect perspectives he would have overlooked, and start from scratch in a new field without the brake of not knowing the basic fundamentals.
The article concludes with a pragmatic call to action: the only sensible response to this generational transformation is to put the technology on your side. That does not mean surrendering to the hype or capitulating to fear, but rather developing the ability to identify which tasks can be outsourced to the machine in order to free up time and cognitive energy for what the machine still cannot do, and probably will not be able to do for a long time: expert judgment, responsibility, deep creativity, and navigating ambiguity in contexts where information changes constantly.
In terms of usefulness for the reader of the Agentic AI newsletter, this article provides three elements of specific value. First, a well-grounded demystification of the most-cited benchmarks in the AI-and-employment debate, with particular attention to the methodological limitations that press headlines systematically ignore. Second, a window onto the current frontier of agentic-systems evaluation—Artificial Analysis's AA-Briefcase benchmark—which confirms that complex, long-horizon knowledge work remains an open and difficult problem for today's models. Third, a simple but well-articulated strategic framework for thinking about how to position oneself professionally in the face of AI: not as the victim of an inevitable process, but as an agent who can decide what to delegate, what to learn, and what to become with AI as an ally.