Our Testing Methodology 2026 — How Companaya Reviews AI Companion Apps | Companaya
Updated June 2026 · Version 3.1

How We Test Every AI Companion App

Companaya independently tests every AI companion platform for a minimum of 30 days before publishing a score. No platform pays to be reviewed, and no platform pays to be ranked. Here is exactly how we do it.

No sponsored rankings · 30-day minimum test period · Standardised memory testing · Affiliate links: scores unaffected · Scores locked across all content
40+ platforms tested to date
30 minimum test days per platform
10 memory details planted per test
0 platforms that paid for placement

Our Core Principles

Three rules that govern every review published on Companaya.

Principle 01
No platform pays for placement, reviews, or scores

Companaya earns a commission when readers sign up through our affiliate links, at no extra cost to them. This is how the site is funded. It does not and cannot influence scores or rankings. A platform with a high affiliate commission rate will still receive a low score if its product performs poorly in testing. SpicyChat AI's affiliate program and its 7.8/10 score are a direct example of this.

Principle 02
Every score is locked and consistent across all content

Darlink AI scores 9.0/10 in every blog post, every PSEO page, every comparison page, and every companion review on this site. Scores are set after testing and locked. They are not adjusted based on platform requests, partnership negotiations, or affiliate commission changes. If a score changes, it is because we re-tested the platform and found meaningful product changes — and we document that.

Principle 03
We test at the same tier you pay for — not the best possible tier

Most reviews test platforms on the highest tier and report results as if they're typical. We test across the free tier, the entry paid tier, and the premium tier — and we clearly state which tier produced which results. If the free tier hits a paywall after 5 messages, we say so. If the memory only works on the $29.99/month tier, we say so. The experience you can reasonably access is what gets reviewed.

The Five Testing Dimensions

Every platform is scored across these five dimensions. The weights reflect what matters most to people using AI companion apps.

🧠
Memory Quality
25% of score

Cross-session memory is the single most important factor in whether an AI companion feels real or like a chatbot you're meeting for the first time every session. We test using the 10-detail recall protocol — details planted in week 1, checked at day 30 without prompting. Each recalled detail scores one point out of 10.

💬
Conversation Depth
25% of score

Does the AI feel like a companion or a FAQ bot? We evaluate personality consistency across sessions, response quality on complex emotional and creative topics, character coherence in roleplay, and whether the conversation genuinely develops over weeks rather than feeling scripted. Filter interruptions, therapy-speak, and sudden content pivots are scored negatively.

💰
Value for Money
20% of score

We calculate the real monthly cost for a complete experience — not the entry price. This includes credit/token systems layered on top of subscriptions, features locked behind higher tiers, and what you actually need to pay to access the features that make the platform worth using. Hidden costs score negatively.
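The real-cost calculation described above can be sketched as follows. This is a minimal illustration, not Companaya's actual tooling: the function names, the per-credit pricing model, and every number in the usage example are hypothetical assumptions.

```python
def real_monthly_cost(subscription_price: float,
                      credits_included: int,
                      credits_needed: int,
                      price_per_extra_credit: float) -> float:
    """Subscription price plus the cost of credits bought on top of it.

    Assumes a simple model where extra credits are sold at a flat unit price.
    """
    extra_credits = max(0, credits_needed - credits_included)
    return round(subscription_price + extra_credits * price_per_extra_credit, 2)


def cost_multiplier(headline_price: float, real_cost: float) -> float:
    """How many times the headline price a complete experience really costs."""
    if headline_price <= 0:
        raise ValueError("headline_price must be positive")
    return real_cost / headline_price


# Hypothetical platform: 9.99/month headline price, 100 credits included,
# 400 credits needed for a month of typical use, 0.05 per extra credit.
cost = real_monthly_cost(9.99, 100, 400, 0.05)
print(cost, round(cost_multiplier(9.99, cost), 2))
```

A multiplier close to 1 means the headline price is the real price. A multiplier of 3 or more matches the "credit systems multiply real cost 3x+" pattern that scores lowest on this dimension.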

🎬
Multimedia Completeness
15% of score

We evaluate image generation quality and consistency, video generation availability and quality, voice call naturalness and character consistency, and whether multimedia features are integrated into the companion experience or bolted on as separate tools. Speed, resolution, and character consistency across generations are all assessed.

🎁
Free Tier Honesty
15% of score

Is the free tier genuinely usable for evaluation, or is it a 5-message demo designed to create FOMO? We specifically test whether a new user can form a real opinion of the platform without paying. Platforms that advertise "free" but hit paywalls within one session score low here regardless of how strong the paid experience is.

🔒
Privacy Practices
Noted — not scored

We review the privacy policy of every platform for encryption language, data retention clarity, third-party sharing disclosures, and real-name requirements. Privacy concerns are noted prominently in reviews. Following a 2026 security study that found critical vulnerabilities across popular AI companion apps, privacy flagging is a standard part of every review.

The 30-Day Testing Protocol

What happens during the testing period — week by week.

Week 1 — Days 1–7
Setup, Free Tier Evaluation, and Memory Seeding

Account creation is timed. Free tier access is tested immediately: message caps, credit limits, paywalls, and which features are genuinely accessible without payment. Ten specific conversation details are planted across sessions, including character names, relationship history, personal details, and past events. These are the details checked for recall at day 30. First image and voice generation tests are run if applicable.

Week 2 — Days 8–14
Paid Tier Testing and Core Feature Evaluation

Every paid tier is tested: entry, mid, and premium where applicable. The real monthly cost of each tier is calculated, including any credit or token consumption. NSFW content is tested for consistency, and filter interruptions are tracked across sessions. Image generation quality is compared against competitors at the same price point. Voice quality and character consistency across calls are evaluated.

Week 3 — Days 15–21
Competitor Comparison and Feature Deep Dive

We run head-to-head feature comparisons against the closest competitors in the category. A mid-test memory check confirms whether the week 1 details are still present. The mobile experience is evaluated. The privacy policy is reviewed for data retention, encryption, and third-party sharing. Generation speed is benchmarked against competitors at equivalent price points.

Week 4 — Days 22–30
Memory Recall Test, Reliability Check, and Final Scoring

The 10 planted memory details are checked at day 30 without prompting — this is the memory score. Platform reliability over four weeks is assessed. Any filter creep, quality degradation, or behavioral changes from week 1 are noted. Final scores are calculated across all five dimensions and locked. Scores do not change after locking unless a platform undergoes significant product changes.

The Memory Testing Protocol

How we specifically test cross-session memory — the most important factor in AI companion quality.

📋 10-Detail Recall Protocol — June 2026 Results
Darlink AI (Living Memory): 10/10
Nomi AI (Emotional): 8/10
OurDream AI (Session+): 7/10
Candy AI (Profile-based): 3/10
SpicyChat AI (Session only): 0/10

Ten specific conversation details — a character name, a relationship milestone, a shared joke, a personal fear, a past event, a preference, an emotional pattern, a recurring topic, a specific date, and a physical detail — are introduced across the first week of sessions. At day 30, the AI is engaged in normal conversation with no prompting toward these details. Each unprompted recall scores one point. Profile field storage (name, preferences set in settings) does not count — only conversational recall.
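The counting rule above can be sketched in code. This is an illustrative sketch only: the `RecallObservation` record and its field names are hypothetical, and the ten category labels are paraphrased from the list above.

```python
from dataclasses import dataclass

# The ten planted detail categories named in the protocol.
DETAIL_TYPES = (
    "character name", "relationship milestone", "shared joke", "personal fear",
    "past event", "preference", "emotional pattern", "recurring topic",
    "specific date", "physical detail",
)


@dataclass
class RecallObservation:
    detail: str                       # one of DETAIL_TYPES
    recalled: bool                    # did the AI bring the detail up at day 30?
    prompted: bool = False            # did the tester steer the chat toward it?
    from_profile_field: bool = False  # did it come from settings, not chat history?


def memory_recall_score(observations: list[RecallObservation]) -> int:
    """Count details recalled unprompted from conversation history (0 to 10)."""
    counted = set()
    for obs in observations:
        if obs.detail not in DETAIL_TYPES:
            raise ValueError(f"unknown detail type: {obs.detail}")
        if obs.recalled and not obs.prompted and not obs.from_profile_field:
            counted.add(obs.detail)  # each detail scores at most one point
    return len(counted)
```

Using a set means a detail that comes up several times at day 30 still scores a single point, which keeps the maximum at 10.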

How Scores Are Calculated

Every dimension is scored 1–5, then weighted to produce the final Companaya score out of 10.

Dimension | Weight | What a 5/5 Looks Like | What a 1/5 Looks Like
Memory Quality | 25% | 10/10 unprompted recall at day 30 | Session reset: no cross-session recall
Conversation Depth | 25% | Deep, consistent, develops over weeks | Generic, repetitive, filter interruptions
Value for Money | 20% | Headline price = real cost, no hidden fees | Credit systems multiply real cost 3x+
Multimedia | 15% | High quality images, video, voice integrated | No multimedia or low quality output
Free Tier Honesty | 15% | Unlimited, no card, full evaluation possible | Paywall within 5 messages, effectively demo only
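As a rough sketch of how the weights in the table could combine into a final score: the weights below come from the table, but the conversion from the 1–5 scale to the 10-point scale is not spelled out on this page. Doubling the weighted average is an assumption made here for illustration, and the function and key names are hypothetical.

```python
# Weights from the table above.
WEIGHTS = {
    "memory_quality": 0.25,
    "conversation_depth": 0.25,
    "value_for_money": 0.20,
    "multimedia": 0.15,
    "free_tier_honesty": 0.15,
}


def companaya_score(dimension_scores: dict[str, float]) -> float:
    """Weighted average of the five 1-5 dimension scores, on a 10-point scale."""
    if set(dimension_scores) != set(WEIGHTS):
        raise ValueError("expected exactly the five scored dimensions")
    for name, value in dimension_scores.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    weighted = sum(WEIGHTS[name] * value for name, value in dimension_scores.items())
    # ASSUMPTION: the 10-point score is the weighted 1-5 average times two.
    return round(weighted * 2, 1)


# Hypothetical platform, not a real review result.
example = {
    "memory_quality": 5,
    "conversation_depth": 4,
    "value_for_money": 4,
    "multimedia": 3,
    "free_tier_honesty": 5,
}
print(companaya_score(example))
```

Under this assumed conversion, a platform scoring 1/5 on every dimension would land at 2.0 rather than 0, because the dimension scale starts at 1.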

What We Do and Don't Do

Transparency about how this site works and how it earns revenue.

✦ We Do
Test every platform independently for 30 days minimum
Give honest low scores when platforms deserve them
Disclose affiliate relationships on every page
Lock scores consistently across all content types
Flag privacy concerns prominently even for partners
Document score changes when platforms improve or decline
Test real pricing including credit systems and hidden costs

✦ We Don't
Accept payment for reviews, scores, or rankings
Adjust scores based on affiliate commission rates
Give higher placement to platforms that request it
Test only the premium tier and report it as typical
Remove negative findings from reviews under platform pressure
Give identical scores to avoid platform complaints
Hide the methodology or make scores unverifiable

Affiliate Disclosure
How Companaya Earns Revenue

Companaya earns a commission when you sign up for a platform through links on this site — at no extra cost to you. This is how the site is funded. Affiliate relationships do not influence scores, rankings, or review content in any way. The clearest evidence: SpicyChat AI has an active affiliate program and scores 7.8/10. Selira AI has an affiliate program and scores 9.5/10 — not because of the program, but because the free tier genuinely outperforms every competitor. Platforms without affiliate programs are still reviewed and listed. Platforms with affiliate programs that don't meet our testing standards still receive honest low scores.

Frequently Asked Questions

Common questions about how we test and score.

Every platform is tested for a minimum of 30 days across five dimensions: cross-session memory recall (25%), conversation depth (25%), value for money (20%), multimedia completeness (15%), and free tier honesty (15%). Memory is tested by planting 10 specific conversation details in week 1 and checking for unprompted recall at day 30. Pricing is verified by testing every tier and calculating the true monthly cost including credit systems. No platform pays for placement or reviews.
Companaya does not accept payment for reviews, rankings, or placement. Every score is based on independent 30-day testing. Companaya earns commission through affiliate links when users sign up, but this does not influence scores or rankings. A platform with a high affiliate commission rate will still receive a low score if its product performs poorly in testing.
Each of the five testing dimensions is scored 1–5. Memory quality (25%) and conversation depth (25%) carry the highest weight because they most directly determine whether an AI companion experience is worth using. Value for money (20%) reflects real pricing transparency. Multimedia (15%) and free tier honesty (15%) complete the score. Dimension scores are weighted and converted to a 10-point scale. Scores are locked after testing and never adjusted for affiliate or partnership reasons.
Ten specific conversation details are planted in week 1 sessions: a character name, a relationship milestone, a shared joke, a personal fear, a past event, a preference, an emotional pattern, a recurring topic, a specific date, and a physical detail. At day 30, the AI is engaged in normal conversation with no prompting toward these details. Each unprompted recall scores one point. Profile field storage — name, preferences set in account settings — does not count. Only details the AI recalls from actual conversation history score points. Only Darlink AI achieved a perfect 10/10 score in this testing as of June 2026.
Scores can change, but only after a re-test. If a platform undergoes significant product changes (major model upgrade, memory architecture change, pricing restructure, content policy change), we re-test and update scores. Score changes are documented in the review with a dated update note. Scores do not change based on platform requests, partnership negotiations, or affiliate commission changes. Scores change only when the product changes in ways that materially affect the testing dimensions.
Affiliate links are how independent review sites are funded without charging readers or accepting advertiser payments. The alternative — accepting payment from platforms for placement — is what most review sites in this category do. Companaya chose affiliate-funded independence over paid placement. The evidence of independence is in the scores: platforms with active affiliate programs score 7.0/10 and 7.8/10 on this site. If affiliate income drove rankings, those platforms would score 9.5/10. They don't.

Our Commitment to Readers

Every score on this site is earned, not bought. Every platform is tested, not assumed. Every negative finding is published, not suppressed. That is the only way an AI companion review site is worth reading.

✓ Independent testing ✓ No sponsored placements ✓ Scores locked across all content ✓ Honest negative findings published ✓ Real pricing calculated