Research · May 26, 2026

CVoria publishes cover letter benchmark whitepaper

Our benchmark whitepaper shares the methodology and results behind CVoria V2 cover letter generation.

57 of 60

CVoria V2 was preferred in 95.0% of all blind head-to-head comparisons.

44 of 45

Against standard ChatGPT, Gemini, and Claude prompts, CVoria V2 won 97.8% of comparisons.

13 of 15

Against the previous CVoria V1 setup, CVoria V2 won 86.7% of comparisons.

5 role types

The benchmark used Swedish CV/job pairs across sales, care work, software, marketing, and warehouse work.

Benchmark results

The benchmark measured AI-judged letter quality in blind one-to-one comparisons. It does not measure recruiter callbacks or hiring outcomes.

Overall benchmark: 57/60 · 95.0%

All baselines and judge models combined.

Against standard prompts: 44/45 · 97.8%

Compared with standard cover-letter prompts in ChatGPT, Gemini, and Claude.

Against CVoria V1: 13/15 · 86.7%

Compared with the previous CVoria cover-letter setup.

Role breakdown

B2B sales: 12/12 · 100%

Care work: 12/12 · 100%

Software development: 12/12 · 100%

Digital marketing: 9/12 · 75%

Warehouse work: 12/12 · 100%

The headline result

CVoria V2 was preferred in 57 of 60 blind head-to-head comparisons. That is a 95.0% win rate across all baselines, role pairs, and judge models in the benchmark.

The strongest slice was against ordinary prompting. Compared with standard ChatGPT, Gemini, and Claude cover-letter prompts, CVoria V2 won 44 of 45 comparisons. Against the previous CVoria V1 setup, it won 13 of 15.

The purpose of the whitepaper is to make that claim inspectable. It includes the setup, role pairs, baseline prompt, blind judge prompt, anonymized outputs, judge decisions, limitations, and a verification script for recalculating the headline result.
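The headline percentages follow directly from the two slice counts reported above. The following is a minimal sketch of that arithmetic, not the verification script shipped in the public package; the `win_rate` helper and the `slices` dictionary are names assumed here for illustration.

```python
# Wins and totals per slice, copied from the results reported in this post.
# The dictionary layout is an assumption made for this sketch.
slices = {
    "standard_prompts": (44, 45),
    "cvoria_v1": (13, 15),
}


def win_rate(wins: int, total: int) -> float:
    """Return the win rate as a percentage rounded to one decimal place."""
    return round(100 * wins / total, 1)


# The overall result is the sum of the two slices.
overall_wins = sum(wins for wins, _ in slices.values())
overall_total = sum(total for _, total in slices.values())

for name, (wins, total) in slices.items():
    print(f"{name}: {wins}/{total} = {win_rate(wins, total)}%")
print(f"overall: {overall_wins}/{overall_total} = "
      f"{win_rate(overall_wins, overall_total)}%")
```

Summing the slices rather than hard-coding 57 and 60 makes the check fail loudly if any reported count is inconsistent with the others.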

How the benchmark worked

The test used five Swedish CV and job-ad pairs: B2B sales, care work, software development, digital marketing/content, and warehouse work. For each pair, CVoria V2 generated one cover letter and was compared against four baselines.

Those baselines were standard-prompt letters from ChatGPT, Gemini, and Claude, plus CVoria V1. Each comparison was judged by ChatGPT, Claude, and Gemini. The judge saw the CV, the job ad, Letter A, and Letter B, but did not see which system wrote which letter.

That produced 60 total judgments: 45 against standard prompts and 15 against CVoria V1.
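The count of 60 comes from crossing five role pairs with four baselines and three judges. As a rough sketch, the comparison grid can be enumerated like this; the variable names are illustrative and the labels are taken from the description above.

```python
from itertools import product

role_pairs = [
    "B2B sales",
    "care work",
    "software development",
    "digital marketing/content",
    "warehouse work",
]
standard_baselines = ["ChatGPT", "Gemini", "Claude"]
baselines = standard_baselines + ["CVoria V1"]
judges = ["ChatGPT", "Claude", "Gemini"]

# One judgment per (role pair, baseline, judge) combination.
judgments = list(product(role_pairs, baselines, judges))

# Split the grid by baseline type to recover the two reported slices.
vs_standard = [j for j in judgments if j[1] in standard_baselines]
vs_v1 = [j for j in judgments if j[1] == "CVoria V1"]

print(len(judgments), len(vs_standard), len(vs_v1))
```

Each role pair contributes 4 x 3 = 12 judgments, and each judge contributes 5 x 4 = 20.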

Where CVoria V2 won

CVoria V2 won every comparison in four of the five role pairs: B2B sales, care work, software development, and warehouse work. It also won every comparison judged by ChatGPT and 19 of 20 comparisons judged by Claude.
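The per-role and per-judge figures can be cross-checked against the overall total. The sketch below uses only counts stated in this post; the Gemini figure is not reported directly and is derived here by subtraction, which assumes every win is attributed to exactly one judge.

```python
# Wins per role pair, from the role breakdown.
# Each pair has 4 baselines x 3 judges = 12 judgments.
JUDGMENTS_PER_ROLE = 4 * 3
role_wins = {
    "B2B sales": 12,
    "care work": 12,
    "software development": 12,
    "digital marketing/content": 9,
    "warehouse work": 12,
}

total_wins = sum(role_wins.values())
total_judgments = JUDGMENTS_PER_ROLE * len(role_wins)
perfect_roles = [r for r, w in role_wins.items() if w == JUDGMENTS_PER_ROLE]

# Wins per judge as stated in the text; each judge made 5 x 4 = 20 judgments.
stated_judge_wins = {"ChatGPT": 20, "Claude": 19}

# Derived, not reported: whatever remains must have come from Gemini.
gemini_wins = total_wins - sum(stated_judge_wins.values())

print(f"total: {total_wins}/{total_judgments}")
print(f"roles with a clean sweep: {len(perfect_roles)}")
print(f"Gemini as judge (derived): {gemini_wins}/20")
```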

The judge comments repeatedly favored letters that selected specific evidence from the CV, opened with a clear professional angle, handled gaps without inventing facts, and avoided generic enthusiasm.

The average judged score for CVoria V2 was 7.72 out of 10, compared with 5.80 for the baseline letters. The average judge confidence was 8.13 out of 10.

Where it lost, and why that matters

The three losses all came from the digital marketing/content pair. The job ad emphasized practical requirements such as hands-on video production, travel, location, and a driving license.

CVoria V2 was more conservative when the CV did not fully support those requirements. Some judges rewarded that restraint, while others preferred a more assertive letter that directly stated availability or commitment even when the CV evidence was weaker.

That is useful product feedback. A better cover-letter system should be grounded and honest, but it also needs to help users add missing practical context when the job requires it.

What the result does and does not prove

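The benchmark measures judged cover-letter quality, not recruiter callbacks, interviews, or offers. The comparison against CVoria V1 also includes a model change: CVoria V1 used Claude Haiku 4.5, while CVoria V2 used Gemini Flash 3.5. That slice therefore does not separate the effect of the new generation setup from the effect of the new model.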

The safest claim is therefore precise: in this controlled benchmark, AI judges preferred CVoria V2 cover letters in 57 of 60 blind head-to-head comparisons.

That still matters. It gives CVoria a repeatable evaluation method for future cover-letter changes, and it creates a public baseline that can be challenged instead of accepted as marketing copy.

What is included in the public package

The public package includes the whitepaper PDF, the anonymized CV and job inputs, generated letters, blind judge decisions, the standard baseline prompt, the comparison prompt template, and a verification script.

The proprietary CVoria V2 generation prompt is not included, but the evidence needed to inspect the public claim is available: inputs, outputs, judgments, and result calculation.

That is the point of publishing the whitepaper as more than a blog post. If we make a quality claim, readers should be able to see how it was produced and where its boundaries are.
