AI-Run Fake Company Fails Miserably in Carnegie Mellon Experiment

AI isn’t ready to take over all jobs yet, as a recent experiment by Carnegie Mellon researchers demonstrated. A fake company, TheAgentCompany, staffed entirely by AI agents, achieved at best a 24% success rate in completing basic business tasks. This highlights the current limitations of AI in replacing human roles.

The Experiment

Researchers created a simulated software startup environment where AI models from companies like OpenAI, Anthropic, Meta, and Google were tasked with functions such as:

Analyzing spreadsheet data
Conducting performance reviews
Selecting a new office space

The results were far from promising. Claude from Anthropic performed the best, completing only 24% of tasks, while other models like Google’s Gemini and OpenAI’s ChatGPT managed around 10% success rates. The worst performer was Amazon’s Nova, which accomplished a mere 1.7% of tasks.

Cost and Efficiency Issues

The study also found that running the agents was expensive. Each task cost an average of $6, and with approximately 30 tasks per job, the expenses add up quickly. Combined with the low completion rates, that cost makes relying solely on AI for business operations impractical.
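To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. It takes the article's figures at face value ($6 per task, about 30 tasks per job, a 24% best-case success rate). The function names are illustrative, and pairing the $6 cost with the 24% success rate is an assumption, since the article does not say both numbers describe the same model.

```python
# Figures quoted in the article, taken at face value.
COST_PER_TASK_USD = 6.0    # average cost per attempted task
TASKS_PER_JOB = 30         # approximate number of tasks per job
BEST_SUCCESS_RATE = 0.24   # best completion rate reported


def cost_per_job(cost_per_task: float, tasks_per_job: int) -> float:
    """Total spend to attempt every task in one job."""
    return cost_per_task * tasks_per_job


def cost_per_completed_task(cost_per_task: float, success_rate: float) -> float:
    """Expected spend per task that is actually completed.

    Assumption (not from the study): every attempt costs the same and
    succeeds with probability `success_rate`, so the cost of failed
    attempts is spread across the successful ones.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_task / success_rate


if __name__ == "__main__":
    job = cost_per_job(COST_PER_TASK_USD, TASKS_PER_JOB)
    done = cost_per_completed_task(COST_PER_TASK_USD, BEST_SUCCESS_RATE)
    print(f"Cost to attempt one job: ${job:.2f}")
    print(f"Cost per completed task: ${done:.2f}")
```

Under these assumptions, attempting one job costs about $180, and each task that actually gets finished costs about $25 once the failed attempts are counted.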

Why AI Falls Short

The agents lacked common sense and stumbled over simple problems. In one example, when a pop-up interrupted a task, the agent failed to close it and abandoned the job entirely, a problem any human could solve effortlessly. This points to a critical flaw in current AI systems: they struggle to adapt to unexpected obstacles without human intervention.

The Bigger Picture

Despite the hype around AI’s potential, this experiment shows that human oversight remains essential. While AI can assist with specific tasks, it is far from capable of autonomously running a business. For more on AI’s limitations, check out this list of AI failures.

Key Takeaways:

AI models struggle with basic business tasks.
Costs for AI-run operations are high.
Human intervention is still necessary for problem-solving.



About the Author

Alex Thompson
