Whitepaper | May 26, 2026 | 8 min read

We Benchmarked CVoria Cover Letters. Here Is What Happened.

We ran a 60-comparison blind AI-judged benchmark of CVoria V2 against standard ChatGPT, Gemini, Claude, and CVoria V1 cover letters.

CVoria

We wanted to answer a simple but uncomfortable product question: are our cover letters actually better than what someone gets by pasting a CV and job ad into a general AI model? So we ran a blind benchmark. In 60 AI-judged head-to-head comparisons, CVoria V2 was preferred 57 times.

Short version

CVoria V2 won 57 of 60 blind comparisons overall. It won 44 of 45 comparisons against standard ChatGPT, Gemini, and Claude prompts, and 13 of 15 comparisons against CVoria V1.

Why we ran this benchmark

Cover letters are easy to generate now. That is exactly why the bar is higher. A product like CVoria should not only produce clean text; it should produce a letter that feels specific to the role, grounded in the candidate’s CV, and worth reading for a recruiter.

The weak version of this claim would be “AI can write cover letters.” Everyone knows that. The real question is whether a structured CVoria workflow can beat ordinary prompting and improve on our own previous prompt system.

How the test worked

We used five Swedish CV and job-ad pairs across different role types: sales, care work, software development, digital marketing, and warehouse work. For each pair, we generated one CVoria V2 letter with Gemini Flash 3.5 and compared it against four baselines.

  • ChatGPT 5.5 Thinking with a simple standard cover-letter prompt.
  • Gemini 3.5 Flash with the same standard prompt.
  • Claude Haiku 4.5 with the same standard prompt.
  • CVoria V1, our earlier cover-letter setup, running on Claude Haiku 4.5.

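Each head-to-head comparison was judged by three models: ChatGPT 5.5 Thinking, Claude Opus 4.6 Thinking, and Gemini 3.1 Pro. With five CV and job-ad pairs, four baselines, and three judges, that gives 5 × 4 × 3 = 60 judged comparisons. Each judge saw the CV, the job ad, Letter A, and Letter B. It did not see which system wrote which letter.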
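The structure of the test can be sketched in a few lines of Python. This is only an illustration of how the counts fit together, assuming every judge scored every pairing exactly once, which is consistent with the reported totals. The variable names and label strings are ours for this sketch and are not taken from the published data package.

```python
from itertools import product

# Role types, baselines, and judges as described in the text.
ROLES = [
    "sales",
    "care work",
    "software development",
    "digital marketing",
    "warehouse work",
]
BASELINES = [
    "ChatGPT 5.5 Thinking (standard prompt)",
    "Gemini 3.5 Flash (standard prompt)",
    "Claude Haiku 4.5 (standard prompt)",
    "Claude Haiku 4.5 CVoria V1",
]
JUDGES = ["ChatGPT 5.5 Thinking", "Claude Opus 4.6 Thinking", "Gemini 3.1 Pro"]

# Assumption: one judgment per (role, baseline, judge) combination.
comparisons = list(product(ROLES, BASELINES, JUDGES))

# Split into the two slices reported in the results.
standard = [c for c in comparisons if "standard prompt" in c[1]]
v1 = [c for c in comparisons if c[1].endswith("CVoria V1")]

print(len(comparisons), len(standard), len(v1))  # prints: 60 45 15
```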

The results

Comparison slice            CVoria V2 wins    Win rate
Overall benchmark           57 / 60           95.0%
Against standard prompts    44 / 45           97.8%
Against CVoria V1           13 / 15           86.7%
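The win rates follow directly from the counts. A short Python check of the arithmetic, using only the numbers in the results above (the function and variable names are ours):

```python
def win_rate(wins: int, total: int) -> float:
    """Win rate as a percentage, rounded to one decimal place."""
    return round(100 * wins / total, 1)

# Counts taken from the results above.
RESULTS = {
    "Overall benchmark": (57, 60),
    "Against standard prompts": (44, 45),
    "Against CVoria V1": (13, 15),
}

rates = {name: win_rate(w, t) for name, (w, t) in RESULTS.items()}

# The two sub-slices should add up to the overall row.
std_w, std_t = RESULTS["Against standard prompts"]
v1_w, v1_t = RESULTS["Against CVoria V1"]
slices_add_up = (std_w + v1_w, std_t + v1_t) == RESULTS["Overall benchmark"]

for name, rate in rates.items():
    print(f"{name}: {rate}%")
print("slices add up:", slices_add_up)
```

The two slices sum to the overall row (44 + 13 = 57 wins, 45 + 15 = 60 comparisons), so the three figures are consistent with each other.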

The V1 comparison matters most to us internally. Beating a weak generic prompt is useful, but it does not prove the product improved. Beating our previous Claude Haiku 4.5-based CVoria setup is a stronger signal that the new Gemini Flash 3.5-based workflow is moving in the right direction.

What the judges rewarded

The winning letters were not simply longer or more polished. The judges repeatedly rewarded letters that made the candidate’s fit easier to understand without inventing details.

  • Sharper openings that gave the recruiter a reason to keep reading.
  • More role-sensitive writing instead of one generic “professional” tone.
  • Better use of concrete evidence from the CV.
  • Cleaner bridges when the candidate was not a perfect match.
  • Less unsupported enthusiasm, tool claims, relocation certainty, or invented motivation.

Where V2 still lost

CVoria V2 lost three comparisons. All three were in the marketing/content profile, where the job ad emphasized practical requirements like hands-on video production, travel, location, and a driving license.

That is useful feedback. V2 was conservative when the CV did not fully support those practical commitments. Some judges rewarded that honesty. Others preferred a more aggressive letter that sounded more committed, even when that commitment was not clearly supported by the CV.

What we can and cannot claim

We are intentionally careful with the wording. This benchmark does not prove that CVoria gets more interviews, callbacks, or job offers. That would require recruiter testing or a real-world field study.

The CVoria V1 and V2 comparison also includes a model change: V1 was generated with Claude Haiku 4.5, while V2 was generated with Gemini Flash 3.5. The result should therefore be read as a benchmark of the current generation setup, not as a prompt-only A/B test.

What we can say is narrower and still meaningful: in this controlled benchmark, AI judges preferred CVoria V2 cover letters in 57 of 60 blind head-to-head comparisons.

Why we publish the data

We are publishing the whitepaper and a public data package because this kind of claim should be inspectable. The package includes anonymized CV/job inputs, generated letters, blind judgments, the standard baseline prompt, the comparison prompt template, and a verification script.

We do not publish the proprietary CVoria generation prompt. That is the product logic. But the inputs, outputs, judgments, and result calculation are available so readers can inspect the benchmark without exposing candidate or company identities.
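For readers who want to recompute the totals themselves, the calculation is a simple tally. The sketch below is not the verification script from the package, and the record fields (`baseline`, `winner`) and their values are hypothetical names chosen for illustration. The real files may be structured differently.

```python
from collections import Counter

def tally(judgments):
    """Count V2 wins and totals per slice from a list of judgment records.

    Each record is assumed (hypothetically) to be a dict with a
    "baseline" key and a "winner" key.
    """
    wins, totals = Counter(), Counter()
    for j in judgments:
        slice_name = "v1" if j["baseline"] == "cvoria_v1" else "standard"
        for key in (slice_name, "overall"):
            totals[key] += 1
            if j["winner"] == "cvoria_v2":
                wins[key] += 1
    return {key: (wins[key], totals[key]) for key in totals}

# Made-up toy records to show the shape of the output, NOT benchmark data.
toy = [
    {"baseline": "standard_chatgpt", "winner": "cvoria_v2"},
    {"baseline": "standard_gemini", "winner": "baseline"},
    {"baseline": "cvoria_v1", "winner": "cvoria_v2"},
]
print(tally(toy))
```

Run against the full set of published judgments, a tally like this should reproduce 57 of 60 overall, 44 of 45 against standard prompts, and 13 of 15 against CVoria V1.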

What this means for users

The practical takeaway is not that every generated letter is perfect. It is that a cover-letter system should do more than format generic enthusiasm. It should read the role, choose the right evidence, avoid unsupported claims, and create a first paragraph that makes the recruiter want to keep going.

Why user input still matters

The benchmark used only the CV and job description so the test stayed controlled. The real CVoria cover-letter workflow can use more than that. Users can add personal highlights, extra context, tone preferences, writing instructions, or an existing cover letter they want to adapt.

That matters because some of the strongest cover-letter details are not always visible in a CV: why the role interests you, whether you are open to travel or relocation, which achievement you want emphasized, or what kind of tone feels natural. The product should use those details when the user provides them, while still keeping the letter grounded in evidence.

You can try that workflow with your own CV and job description in the CVoria cover-letter generator.
