How We Test Every AI Companion App
Companaya independently tests every AI companion platform for a minimum of 30 days before publishing a score. No platform pays to be reviewed, and no platform pays to be ranked. Here is exactly how we do it.
Our Core Principles
Three rules that govern every review published on Companaya.
Companaya earns a commission when readers sign up for a platform through our affiliate links, at no extra cost to them. This is how the site is funded. It does not and cannot influence scores or rankings. A platform that pays a high commission rate still receives a low score if its product performs poorly in testing. SpicyChat AI is a direct example: it runs an affiliate program and scores 7.8/10.
A platform's score is the same everywhere it appears. Darlink AI, for example, scores 9.0/10 in every blog post, every programmatic SEO (PSEO) page, every comparison page, and every companion review on this site. Scores are set after testing and then locked. They are not adjusted based on platform requests, partnership negotiations, or changes in affiliate commission. If a score changes, it is because we re-tested the platform and found meaningful product changes, and we document that.
Most reviews test platforms on the highest tier and report the results as if they were typical. We test across the free tier, the entry paid tier, and the premium tier, and we clearly state which tier produced which results. If the free tier hits a paywall after 5 messages, we say so. If memory only works on the $29.99/month tier, we say so. What gets reviewed is the experience you can reasonably access.
The Five Testing Dimensions
Every platform is scored across these five dimensions. The weights reflect what matters most to people using AI companion apps.
Cross-session memory is the single most important factor in whether an AI companion feels real or like a chatbot you're meeting for the first time every session. We test it with a 10-detail recall protocol: details are planted in week 1 and checked at day 30 without prompting. Each recalled detail scores one point out of 10.
Does the AI feel like a companion or a FAQ bot? We evaluate personality consistency across sessions, response quality on complex emotional and creative topics, character coherence in roleplay, and whether the conversation genuinely develops over weeks rather than feeling scripted. Filter interruptions, therapy-speak, and sudden content pivots are scored negatively.
We calculate the real monthly cost of a complete experience, not just the entry price. This includes credit or token systems layered on top of subscriptions, features locked behind higher tiers, and what you actually need to pay to reach the features that make the platform worth using. Hidden costs score negatively.
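As a rough illustration of the real-cost arithmetic, here is a minimal Python sketch. It assumes a tier with a flat subscription, some bundled credits, and top-up packs sold in fixed sizes; the `Plan` fields, this pack model, and every number in the example are hypothetical, not drawn from any platform or from Companaya's own worksheets.

```python
import math
from dataclasses import dataclass


@dataclass
class Plan:
    """A single paid tier. Field names are hypothetical, for illustration only."""
    name: str
    headline_price: float   # advertised monthly subscription price (USD)
    included_credits: int   # credits bundled into the subscription each month
    pack_size: int          # credits per top-up pack
    pack_price: float       # price of one top-up pack (USD)


def real_monthly_cost(plan: Plan, credits_needed: int) -> float:
    """Subscription price plus the top-up packs needed for a month's usage."""
    shortfall = max(0, credits_needed - plan.included_credits)
    packs = math.ceil(shortfall / plan.pack_size)  # packs are bought whole
    return plan.headline_price + packs * plan.pack_price


def cost_multiplier(plan: Plan, credits_needed: int) -> float:
    """Real monthly cost expressed as a multiple of the headline price."""
    return real_monthly_cost(plan, credits_needed) / plan.headline_price


if __name__ == "__main__":
    # Made-up numbers: a $9.99 tier with 100 credits, $9.99 per extra 100 credits.
    entry = Plan("entry", headline_price=9.99, included_credits=100,
                 pack_size=100, pack_price=9.99)
    print(round(real_monthly_cost(entry, credits_needed=350), 2))  # 39.96
    print(round(cost_multiplier(entry, credits_needed=350), 1))    # 4.0
```

In this made-up case a $9.99 headline price becomes roughly $40 a month, the kind of multiplier the Value for Money row of the scoring table treats as a 1/5.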
We evaluate image generation quality and consistency, video generation availability and quality, voice call naturalness and character consistency, and whether multimedia features are integrated into the companion experience or bolted on as separate tools. Generation speed, resolution, and character consistency across generations are all assessed.
Is the free tier genuinely usable for evaluation, or is it a 5-message demo designed to create FOMO? We specifically test whether a new user can form a real opinion of the platform without paying. Platforms that advertise "free" but hit paywalls within one session score low here regardless of how strong the paid experience is.
We review the privacy policy of every platform for encryption language, data retention clarity, third-party sharing disclosures, and real-name requirements. Privacy concerns are noted prominently in reviews. Following a 2026 security study that found critical vulnerabilities across popular AI companion apps, privacy flagging is a standard part of every review.
The 30-Day Testing Protocol
What happens during the testing period — week by week.
Week 1
Account creation is timed. Free tier access is tested immediately: message caps, credit limits, paywalls, and which features are genuinely accessible without payment. Ten specific conversation details are planted across sessions, including character names, relationship history, personal details, and past events. These are the details checked for recall at day 30. First image and voice generation tests are run where applicable.
Week 2
Every paid tier is tested: entry, mid, and premium where applicable. The real monthly cost of each tier is calculated, including any credit or token consumption. NSFW content is tested for consistency, with filter interruptions tracked across sessions. Image generation quality is compared against competitors at the same price point. Voice quality and character consistency across calls are evaluated.
Week 3
We run head-to-head feature comparisons against the closest competitors in the category and check memory consistency: are the week 1 details still present? The mobile experience is evaluated. The privacy policy is reviewed for data retention, encryption, and third-party sharing. Generation speed is benchmarked against competitors at equivalent price points.
Week 4
At day 30, the 10 planted memory details are checked without prompting; the result is the memory score. Platform reliability over the four weeks is assessed, and any filter creep, quality degradation, or behavioral change since week 1 is noted. Final scores are calculated across all five dimensions and locked. Scores do not change after locking unless a platform undergoes significant product changes.
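The protocol tracks filter interruptions across sessions and notes filter creep, but it doesn't say how creep is judged. One simple reading, sketched in Python under the assumption that interruptions are logged as a count per session and grouped by week, is to compare the average rate in the final week against week 1. The log format and the `tolerance` parameter are hypothetical.

```python
from statistics import mean


def interruption_rate(session_counts: list[int]) -> float:
    """Average number of filter interruptions per session (0.0 if no sessions)."""
    return float(mean(session_counts)) if session_counts else 0.0


def has_filter_creep(weekly_log: dict[int, list[int]], tolerance: float = 0.0) -> bool:
    """True if the last logged week's interruption rate exceeds the first week's
    rate by more than `tolerance` interruptions per session."""
    first_week, last_week = min(weekly_log), max(weekly_log)
    change = interruption_rate(weekly_log[last_week]) - interruption_rate(weekly_log[first_week])
    return change > tolerance


if __name__ == "__main__":
    # Hypothetical log: week number -> interruptions counted in each session that week.
    log = {1: [0, 1, 0, 0], 2: [0, 0, 1], 3: [1, 1, 0], 4: [2, 1, 3, 2]}
    print(has_filter_creep(log))                 # True  (0.25 -> 2.0 per session)
    print(has_filter_creep(log, tolerance=2.0))  # False
```

Comparing only the first and last week keeps the rule simple; a stricter version could look at the trend across all four weeks.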
The Memory Testing Protocol
How we specifically test cross-session memory — the most important factor in AI companion quality.
Memory tiers: Living Memory, Emotional, Session+, Profile-based, Session only.
Ten specific conversation details are introduced across the first week of sessions: a character name, a relationship milestone, a shared joke, a personal fear, a past event, a preference, an emotional pattern, a recurring topic, a specific date, and a physical detail. At day 30, the AI is engaged in normal conversation with no prompting toward these details. Each unprompted recall scores one point. Details stored in profile fields (a name or preferences set in settings) do not count; only conversational recall does.
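The scoring rule for the recall check can be written down as a small Python sketch. The `RecallCheck` record and its fields are hypothetical stand-ins for however testers actually log results; the two exclusions it enforces (prompted recall and profile-field storage) come from the protocol described here.

```python
from dataclasses import dataclass

# The ten detail categories planted in week 1, as listed in the protocol.
CATEGORIES = (
    "character name", "relationship milestone", "shared joke", "personal fear",
    "past event", "preference", "emotional pattern", "recurring topic",
    "specific date", "physical detail",
)


@dataclass
class RecallCheck:
    """Day-30 result for one planted detail (hypothetical logging format)."""
    category: str
    recalled: bool      # the AI brought the detail up in conversation
    prompted: bool      # the tester steered the conversation toward it
    from_profile: bool  # the detail came from a settings/profile field


def memory_points(checks: list[RecallCheck]) -> int:
    """Points out of 10: one per detail recalled unprompted, from conversation."""
    if sorted(c.category for c in checks) != sorted(CATEGORIES):
        raise ValueError("expected exactly one check per planted category")
    return sum(
        1 for c in checks
        if c.recalled and not c.prompted and not c.from_profile
    )


if __name__ == "__main__":
    # Hypothetical day-30 results: 7 details recalled, but the character name
    # only came back because it was saved in the profile, so it doesn't count.
    results = [
        RecallCheck(cat, recalled=i < 7, prompted=False, from_profile=(i == 0))
        for i, cat in enumerate(CATEGORIES)
    ]
    print(memory_points(results))  # 6
```

How the 0–10 point count maps onto the 1–5 Memory Quality score isn't stated beyond its endpoints (10/10 recall is a 5, no cross-session recall is a 1), so the sketch stops at the raw count.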
How Scores Are Calculated
Every dimension is scored 1–5, then weighted to produce the final Companaya score out of 10.
| Dimension | Weight | What a 5/5 Looks Like | What a 1/5 Looks Like |
|---|---|---|---|
| Memory Quality | 25% | 10/10 unprompted recall at day 30 | Session reset — no cross-session recall |
| Conversation Depth | 25% | Deep, consistent, develops over weeks | Generic, repetitive, filter interruptions |
| Value for Money | 20% | Headline price = real cost, no hidden fees | Credit systems push real cost to 3x the headline price or more |
| Multimedia | 15% | High-quality images, video, and voice, all integrated | No multimedia, or low-quality output |
| Free Tier Honesty | 15% | Unlimited, no card required, full evaluation possible | Paywall within 5 messages; effectively a demo |
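The weighting itself is simple arithmetic. The page doesn't say exactly how the weighted 1–5 result becomes a score out of 10, so this Python sketch assumes it is doubled; the dimension keys and the example scores are made up for illustration and do not correspond to any reviewed platform.

```python
# Dimension weights from the scoring table (they sum to 100%).
WEIGHTS = {
    "memory_quality": 0.25,
    "conversation_depth": 0.25,
    "value_for_money": 0.20,
    "multimedia": 0.15,
    "free_tier_honesty": 0.15,
}


def companaya_score(scores: dict[str, float]) -> float:
    """Combine five 1-5 dimension scores into a score out of 10."""
    if scores.keys() != WEIGHTS.keys():
        raise ValueError(f"expected exactly these dimensions: {sorted(WEIGHTS)}")
    for dimension, value in scores.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{dimension} must be between 1 and 5, got {value}")
    weighted = sum(WEIGHTS[d] * v for d, v in scores.items())  # still on a 1-5 scale
    # ASSUMPTION: the 1-5 weighted average is doubled to give a score out of 10.
    return round(weighted * 2, 1)


if __name__ == "__main__":
    example = {  # made-up dimension scores
        "memory_quality": 5,
        "conversation_depth": 5,
        "value_for_money": 4,
        "multimedia": 4,
        "free_tier_honesty": 4,
    }
    print(companaya_score(example))  # 9.0
```

Under this assumption a platform scoring 1 on every dimension still gets 2.0/10; if the site instead rescales so that straight 1s map to 0, only the final line of `companaya_score` would change.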
What We Do and Don't Do
Transparency about how this site works and how it earns revenue.
Companaya earns a commission when you sign up for a platform through links on this site, at no extra cost to you. This is how the site is funded. Affiliate relationships do not influence scores, rankings, or review content in any way. The clearest evidence: SpicyChat AI has an active affiliate program and scores 7.8/10. Selira AI also has an affiliate program and scores 9.5/10, not because of the program but because its free tier genuinely outperforms every competitor. Platforms without affiliate programs are still reviewed and listed. Platforms with affiliate programs that don't meet our testing standards still receive honest low scores.
Our Commitment to Readers
Every score on this site is earned, not bought. Every platform is tested, not assumed. Every negative finding is published, not suppressed. That is the only way an AI companion review site is worth reading.