
There was a lot of hype around GPT-5 long before its release. OpenAI promised a revolution, Sam Altman painted a picture of a breakthrough, and many content creators online fueled the narrative, saying no other model would stand a chance. At the same time, the first critical voices started appearing: questionable demos, mediocre charts, the absence of promised multimodality, and suspicions of “creative” interpretations of performance metrics.
I decided to find out for myself how GPT-5 compares to Google’s current flagship model – Gemini 2.5. I wasn’t interested in sterile benchmarks. Instead, I prepared four practical analytical tasks – ranging from simple to genuinely complex – and tested both models to see which one would perform better. My goal was to answer one question: is GPT-5 truly a leap forward, or just a small incremental update disguised as a major release?
How the test was set up
I designed four analytical tasks with increasing levels of difficulty. The first was simple – analyzing an Excel file and creating a short summary of sales results. The second involved optimizing a not-so-great but functional SQL query. The third – analyzing the correlation between YouTube views, blog views, and sales. And the fourth – evaluating the business viability of introducing a subscription model.
Each task was meant to test different skills: response speed, code quality, business reasoning, and the ability to work with messy data. The goal was simple – simulate real-world challenges a data analyst faces.
Test 1 – Simple sales summary in Excel
The first task was almost trivial: a single spreadsheet and a request to produce a short LinkedIn-style summary of sales results.
GPT-5 jumped straight into generating code, while Gemini 2.5 described the process step by step. The execution time was almost identical, differing only by a few seconds. The results? Both models did a good job, but Gemini’s output felt more professional and less saturated with emojis – a clear plus for business communication. First (subjective) point goes to Google.
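For context, the kind of analysis this first task called for can be sketched in a few lines of pandas. The column names and figures below are hypothetical stand-ins, not the actual test file:

```python
import pandas as pd

# Hypothetical sales data standing in for the Excel file
# (a real run would start with pd.read_excel("sales.xlsx"))
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar"],
    "revenue": [12000, 15000, 13500],
})

total = df["revenue"].sum()
best = df.loc[df["revenue"].idxmax(), "month"]
growth = (df["revenue"].iloc[-1] / df["revenue"].iloc[0] - 1) * 100

# A short, LinkedIn-style summary built from the computed figures
summary = (
    f"Q1 revenue reached {total:,} with {best} as the strongest month "
    f"and {growth:.0f}% growth from January to March."
)
print(summary)
```

Both models produced something along these lines; the difference was in the tone and polish of the final text, not the arithmetic.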
Test 2 – SQL code optimization
The second task was more interesting: a long, mediocre SQL query in need of optimization. This is exactly the type of challenge where GPT-5 was said to crush the competition.
This time, Gemini was faster. Within moments, it provided optimized code with clear comments and explanations. GPT-5 took longer and delivered an overly complex solution – a bit of “over-engineering.” While GPT’s code was correct, for someone looking for a practical, ready-to-use answer, Gemini’s approach was more efficient.
Test 3 – Correlation analysis: YouTube, blog, and sales
The third task ramped up the difficulty. Three datasets and a business question: do my content views actually impact sales? The task was inspired by challenges I run in the KajoDataSpace community.
Both models quickly identified that direct correlation was close to zero. But their approaches differed significantly. Gemini expanded on the analytical and business side – suggesting further avenues of analysis and asking about context. GPT-5, on the other hand, simply provided code and a brief statement: “correlation is zero.” That’s too shallow, especially when the data begs for a time-lag analysis.
Worse still, running the same prompt multiple times produced completely different results. This inconsistency is a major limitation for analytical work.
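The time-lag analysis this data begs for is simple to sketch in pandas: shift the views series back by a few periods and recompute the correlation at each offset. The numbers below are hypothetical, deliberately constructed so that sales trail views by about two weeks:

```python
import pandas as pd

# Hypothetical weekly data (not the actual test files): sales are
# constructed to follow views with a roughly two-week delay
df = pd.DataFrame({
    "views": [100, 300, 250, 400, 150, 500, 350, 450],
    "sales": [8, 12, 11, 31, 24, 42, 16, 49],
})

# Correlate sales against views shifted back by 0..3 weeks;
# Series.corr drops the NaN pairs introduced by the shift
lag_corr = {lag: df["sales"].corr(df["views"].shift(lag)) for lag in range(4)}
for lag, r in lag_corr.items():
    print(f"lag={lag} weeks: r={r:.2f}")
```

With data like this, the lag-2 correlation is far stronger than the raw lag-0 one, which is exactly the pattern a flat “correlation is zero” answer would miss.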
Test 4 – Subscription model business analysis
The final task combined analytics and business strategy. I wanted to check whether experimenting with a cheaper monthly subscription instead of a more expensive annual plan made sense. The data was messier and required cleaning.
Here, GPT-5 stumbled. It stopped to say the files needed “parsing” before analysis and didn’t proceed further. Frankly, even a junior analyst could have cleaned this data quickly. Gemini, however, immediately generated a clear chart and a relevant summary: subscriptions attract new customers and grow revenue but may cannibalize sales of higher-priced products.
Yes, Gemini also hit a minor glitch while rendering one chart, but it still provided a usable business answer – something I could work with.
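To show how little was actually standing in GPT-5’s way, here is the sort of cleanup the task required, sketched on hypothetical data with the kind of inconsistencies the real export had (mixed plan labels and price formats; the column names are my invention):

```python
import pandas as pd

# Hypothetical messy export: inconsistent casing, stray whitespace,
# currency symbols, and comma decimal separators
raw = pd.DataFrame({
    "plan": [" annual", "Monthly ", "monthly", "ANNUAL"],
    "price": ["$290", "29", " 29.00 ", "290,00"],
})

clean = raw.copy()
clean["plan"] = clean["plan"].str.strip().str.lower()
clean["price"] = (
    clean["price"]
    .str.strip()
    .str.replace("$", "", regex=False)
    .str.replace(",", ".", regex=False)
    .astype(float)
)

# Once normalized, the business comparison is a one-liner
revenue_by_plan = clean.groupby("plan")["price"].sum()
print(revenue_by_plan)
```

A few string operations and a type cast, and the subscription-versus-annual comparison is ready. Refusing to proceed until the files were “parsed” is the part that felt below junior-analyst level.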
My takeaways from the whole test
Is GPT-5 a failure? Not exactly. In some cases, there’s noticeable improvement over previous versions. The issue is that the improvement is minimal – more like going from version 4.5.0 to 4.5.1, not from 4.5 to 5. And the marketing hype? Completely overblown, especially with no sign of the promised multimodality.
The biggest limitations I found:
- Lack of result consistency,
- Tendency to be either too shallow or unnecessarily complex,
- Too little initiative in expanding the analysis.
Gemini isn’t perfect either, but in my tests, it more often delivered responses closer to what I’d expect from a tool assisting a data analyst.
Why this matters for beginners in data analytics
If you’re just starting your career, the worst thing you can do is outsource all your knowledge to an AI model. If you can’t independently verify the correctness of Python code or assess the validity of business conclusions, the model could lead you astray – and you might not even notice.
GPT-5 and Gemini can be great accelerators for your work, but they’re not a substitute for solid fundamentals. Just as Excel didn’t eliminate the need for analysts, AI won’t make knowledge obsolete.
Conclusion
My test showed that while GPT-5 is a step forward, it’s not the leap many were expecting. OpenAI’s marketing narrative didn’t survive contact with reality. The lack of multimodality, marginal differences in answer quality, and inconsistency in results make it hard to call this a revolution.
Gemini 2.5, while outperforming GPT-5 in several areas, still isn’t flawless. Both tools require a vigilant user who can review and refine their output.
The most important conclusion for me: these tools won’t replace analysts. They can speed up certain tasks and handle straightforward work, but the ultimate responsibility for analysis quality will always rest with the human. And that’s a good thing – because it’s our knowledge and critical thinking that determine whether an analysis truly makes sense.
Want to read in Polish? No problem!
Other interesting articles:
- Power BI is not Excel. A conversation on implementation, Copilot and training with Iga Kwiecińska
- How to Become a Data Analyst with No Experience? Izabela Witczak’s Real Story
- How to Start Analyzing Your Own Data? 5 Steps for Aspiring Data Analysts
The article was written by Kajo Rudziński – analytical data architect, recognized expert in data analysis, creator of KajoData and of KajoDataSpace, the Polish community for analysts.
That’s all on this topic. Analyze in peace!
Did you like this article 🙂?
Share it on Social Media 📱
>>> You can share it on LinkedIn and show that you learn something new every day.
>>> You can throw it on Facebook – and perhaps help a friend of yours who is looking for this.
>>> And remember to bookmark this page – you never know when it might come in handy in the future.
You prefer to watch 📺 – no problem
>>> Subscribe and watch my English channel on YouTube.



