Methodology

How we detect and eliminate AI fingerprints.

AI detectors don't use magic. They measure specific statistical properties of text that differ between human writers and language models. This page explains exactly what we measure, why those measurements detect AI, and how our humanization pipeline eliminates those signals.

Live demo

Check your text for AI patterns right now.

Paste any text below to run our 8-dimension analysis. The tool runs entirely in your browser — nothing is sent to a server at this stage. You'll see an AI score and a breakdown of which patterns were detected.


Scoring system

Eight independent signals, one composite score.

Our AI score is a weighted composite of eight independent measurements. Each dimension targets a distinct property of AI-generated text. The weights are calibrated against labeled corpora — texts we know to be AI-generated versus texts written by humans.
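As an illustration, the composite can be sketched as a weighted sum of normalized signals. The weights below are hypothetical placeholders for the sake of the example, not the calibrated production values:

```python
# Hypothetical weights for illustration only; the calibrated values
# used in production are not listed on this page.
WEIGHTS = {
    "burstiness": 0.20,
    "ai_vocabulary": 0.20,
    "parallel_structure": 0.14,
    "transitions": 0.12,
    "passive_voice": 0.12,
    "short_sentences": 0.08,
    "em_dashes": 0.08,
    "contractions": 0.06,
}

def composite_score(signals):
    """signals maps each dimension name to a 0..1 value (1 = AI-like)."""
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)
```

Because the weights sum to 1, the composite stays on the same 0..1 scale as the individual signals.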

01

Sentence Length Variance (Burstiness)

High signal

Human writers naturally mix very short and very long sentences. A paragraph might go: two words. Then a much longer sentence that develops an idea with subordinate clauses and qualifications. Then another short one. Language models produce sentences of predictably similar length — usually 18–25 words each. We measure this as the coefficient of variation (CV%) of sentence lengths. CV% below 40% strongly signals AI authorship.

AI PATTERN

"Each dimension is assessed with precision. The scoring system evaluates multiple factors. Results are presented in a clear format."

HUMAN PATTERN

"Short. Then a longer sentence that shows natural variation in rhythm and length. Then short again."
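The burstiness measurement can be sketched in a few lines, using a crude regex sentence splitter (the production tokenizer is more careful):

```python
import re
import statistics

def sentence_lengths(text):
    # Split on ., !, ? followed by whitespace; crude but fine for a demo.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]

def cv_percent(text):
    """Coefficient of variation of sentence lengths, as a percentage."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return statistics.stdev(lengths) / statistics.mean(lengths) * 100

ai = ("Each dimension is assessed with precision. "
      "The scoring system evaluates multiple factors. "
      "Results are presented in a clear format.")
human = ("Short. Then a longer sentence that shows natural variation "
         "in rhythm and length. Then short again.")

print(cv_percent(ai), cv_percent(human))
```

Run on the two pattern examples above, the AI sample lands well under the 40% threshold and the human sample well above it.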

02

AI Vocabulary Detection

High signal

Language models are trained on internet text and develop strong statistical preferences for certain words. "Delve" appears in LLM output at roughly 100x the rate it appears in natural human writing. "Crucial," "straightforward," "tapestry," "foster," "leverage," "embark," "realm," and "utilize" follow the same pattern. We maintain a list of 60+ overused AI terms. High density of these terms is a strong detection signal.

AI PATTERN

"It is crucial to delve into this realm and leverage straightforward approaches to foster meaningful results."

HUMAN PATTERN

"You need to dig into this area and use simple methods to get real results."
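A minimal density check against a small sample of that vocabulary list (the full list described above runs to 60+ entries) might look like:

```python
import re

# A small sample of the overused-AI-term list, not the full 60+ entries.
AI_TERMS = {"delve", "crucial", "straightforward", "tapestry",
            "foster", "leverage", "embark", "realm", "utilize"}

def ai_term_density(text):
    """AI-term hits per 100 words."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = sum(1 for w in words if w in AI_TERMS)
    return hits / max(len(words), 1) * 100

sample = ("It is crucial to delve into this realm and leverage "
          "straightforward approaches to foster meaningful results.")
print(ai_term_density(sample))  # 37.5: six flagged terms in sixteen words
```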

03

Transition Word Density

Medium signal

AI models overuse logical connectors — "furthermore," "moreover," "additionally," "in conclusion," "it is worth noting," "consequently." These words signal structured reasoning, which LLMs produce at a rate far above natural human writing. Humans connect ideas more implicitly, or simply start a new sentence without a connector. We count transition word frequency per 100 words. Above 5 per 100 words is a yellow flag.

AI PATTERN

"Furthermore, this approach is effective. Moreover, it has been tested extensively. Additionally, the results confirm the hypothesis."

HUMAN PATTERN

"This approach works well. We have tested it extensively and the results back it up."
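The transition-density count can be sketched with simple substring matching over a phrase list (the real detector's list is longer):

```python
import re

TRANSITIONS = ["furthermore", "moreover", "additionally",
               "in conclusion", "it is worth noting", "consequently"]

def transitions_per_100_words(text):
    lower = text.lower()
    hits = sum(lower.count(phrase) for phrase in TRANSITIONS)
    words = len(re.findall(r"[a-z']+", lower))
    return hits / max(words, 1) * 100

sample = ("Furthermore, this approach is effective. Moreover, it has been "
          "tested extensively. Additionally, the results confirm the hypothesis.")
print(transitions_per_100_words(sample))  # well above the 5-per-100 threshold
```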

04

Passive Voice Ratio

Medium signal

Passive voice is not an AI-specific pattern, but AI models overuse it relative to casual human writing. "The analysis was conducted" instead of "we ran the analysis." "Results were observed" instead of "we saw." Formal academic writing uses passive voice legitimately, but when passive constructions appear alongside other AI signals, the combination is diagnostic. We measure passive voice sentences as a percentage of total sentences.

AI PATTERN

"The experiment was conducted, results were analyzed, and conclusions were drawn by the research team."

HUMAN PATTERN

"The research team ran the experiment, analyzed the results, and drew their conclusions."
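A rough heuristic for passive constructions, a form of "to be" followed by a participle-like word, can be sketched as follows. Real detectors use a parser; this regex is only illustrative and misses irregular participles such as "drawn":

```python
import re

# Crude heuristic: a form of "to be" followed by a word ending in -ed/-en.
PASSIVE = re.compile(r"\b(?:is|are|was|were|been|being|be)\s+\w+(?:ed|en)\b",
                     re.IGNORECASE)

def passive_ratio(text):
    """Percentage of sentences containing a passive-looking construction."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    passive = sum(1 for s in sentences if PASSIVE.search(s))
    return passive / max(len(sentences), 1) * 100

ai = "The experiment was conducted. Results were analyzed. The report was written."
human = "The team ran the experiment and analyzed the results."
print(passive_ratio(ai), passive_ratio(human))  # 100.0 0.0
```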

05

Short Sentence Presence

Supporting signal

This dimension works in tandem with burstiness. We specifically check whether any sentences under 8 words exist in the text. Human writers naturally produce very short sentences — fragments, punchy one-liners, abrupt statements for emphasis. LLMs rarely produce sentences this short unless specifically prompted. Complete absence of short sentences is a weak but consistent AI signal.
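This check reduces to a single boolean, sketched here with the same crude sentence splitter:

```python
import re

def has_short_sentence(text, max_words=7):
    """True if any sentence is under 8 words."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return any(len(s.split()) <= max_words for s in sentences)

print(has_short_sentence("This works. It really does."))  # True
print(has_short_sentence(
    "This sentence contains exactly nine words in total, truly."))  # False
```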

06

Em Dash Usage

Medium signal

Em dashes are a stylistic marker that some models — particularly GPT-4 and Claude — overuse dramatically. A human writer might use one or two em dashes per page. AI-generated text sometimes places em dashes in nearly every paragraph. We flag texts with high em dash frequency relative to sentence count. This dimension matters specifically for GPT-4 and Claude output, which show this pattern most strongly.
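Measuring em dash frequency relative to sentence count is a one-liner in spirit; a sketch:

```python
import re

def em_dash_rate(text):
    """Em dashes (U+2014) per sentence."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return text.count("\u2014") / max(len(sentences), 1)

sample = ("Some models \u2014 notably a few chat models \u2014 "
          "overuse this mark. Humans rarely do.")
print(em_dash_rate(sample))  # 1.0: an em dash for every sentence
```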

07

List and Parallel Structure

High signal

LLMs default to structured output — numbered lists, bullet points, symmetrical three-part constructions. "First... Second... Third..." patterns. "Not only X, but also Y" constructions. Balanced parallel clauses. These structures signal organized, machine-generated reasoning. Human writing is messier — ideas trail off, points get developed unevenly, lists are less common. High parallel structure density is one of the strongest AI detection signals we measure.

AI PATTERN

"First, analyze the data. Second, identify the patterns. Third, draw your conclusions based on the evidence gathered."

HUMAN PATTERN

"Look at the data, see what patterns show up, and go from there."
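Two of the parallel-structure markers mentioned above (enumerator openers and "not only ... but also") can be counted with a simple heuristic; the production detector looks at more constructions than this sketch does:

```python
import re

# Heuristic markers of parallel structure: enumerator openers and
# "not only ... but also" constructions.
ENUMERATORS = re.compile(r"\b(?:First|Second|Third|Fourth)\b[,.]?",
                         re.IGNORECASE)
NOT_ONLY = re.compile(r"\bnot only\b.*?\bbut also\b",
                      re.IGNORECASE | re.DOTALL)

def parallel_markers(text):
    return len(ENUMERATORS.findall(text)) + len(NOT_ONLY.findall(text))

sample = ("First, analyze the data. Second, identify the patterns. "
          "Third, draw your conclusions.")
print(parallel_markers(sample))  # 3
```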

08

Contraction Frequency

Supporting signal

Contractions (it's, don't, you'll, we're) signal casual register. LLMs writing in formal mode avoid contractions entirely, producing text that reads as unnaturally stiff. Low contraction frequency combined with other signals suggests AI authorship. Note: this dimension is register-dependent — academic writing legitimately avoids contractions — so we weight it lower and primarily use it as a tiebreaker when other signals are mixed.

AI PATTERN

"It is important to understand that these patterns do not appear in isolation. They are part of a larger system."

HUMAN PATTERN

"It's worth knowing these patterns don't appear alone. They're part of something bigger."
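Counting contractions per 100 words is straightforward; note that this crude sketch would also count possessive apostrophes, which a careful implementation would exclude:

```python
import re

def contractions_per_100_words(text):
    # Crude: any apostrophe-bearing word counts, so possessives
    # ("the model's output") would be counted too.
    words = re.findall(r"[A-Za-z']+", text)
    contractions = sum(1 for w in words if "'" in w)
    return contractions / max(len(words), 1) * 100

formal = "It is important to understand that these patterns do not appear in isolation."
casual = "It's worth knowing these patterns don't appear alone."
print(contractions_per_100_words(formal), contractions_per_100_words(casual))
```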

Key insight

Why burstiness is the most important signal.

Burstiness — the term researchers use for high variance in sentence length — is the single most reliable indicator of human authorship. The reason is structural: language models are trained to produce coherent, well-organized text. That training systematically pushes toward uniform sentence length. A human writer doesn't optimize for coherence at the sentence level — they write until the thought is expressed, which produces wildly variable lengths.

The coefficient of variation (CV%) measures this directly. CV% is the standard deviation of sentence lengths divided by the mean, expressed as a percentage. Human writing typically has a CV% between 50% and 90%. AI-generated text clusters between 20% and 40%. This single measurement alone can separate human-written from AI-authored text with roughly 80% accuracy.

Our humanization prompt specifically instructs the model to vary sentence length dramatically — mixing sentences under 8 words with sentences over 35 words in the same paragraph. The post-processing layer then checks the resulting CV% and flags any output that remains in the AI range.

Per-token probability

Why synonym swapping doesn't fool Turnitin.

Academic AI detectors like Turnitin use a different approach from pattern-based tools: they measure per-token perplexity, which is a measure of how "surprising" each word choice is relative to what a language model would predict. LLMs tend to choose highly probable tokens — words that are the obvious continuation of the phrase. Human writers make unexpected word choices more often.

Low perplexity = each word was predictable = likely machine-generated. High perplexity = word choices were surprising = likely human. This is why simple synonym swapping doesn't work: if you replace "utilize" with "use," you've changed one high-probability word for another high-probability word — the perplexity stays low.
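The arithmetic behind this is simple. Given per-token log-probabilities from some language model (the values below are invented for illustration), perplexity is the exponential of the negative mean log-probability:

```python
import math

def perplexity(logprobs):
    """Perplexity from per-token natural-log probabilities:
    exp of the negative mean log-probability."""
    return math.exp(-sum(logprobs) / len(logprobs))

# Hypothetical log-probs: predictable tokens sit near 0 (probability
# near 1), surprising tokens are strongly negative.
predictable = [-0.1, -0.2, -0.1, -0.3]   # each token was expected
surprising = [-2.0, -3.5, -1.0, -4.0]    # unusual word choices
print(perplexity(predictable), perplexity(surprising))
```

The predictable sequence yields a perplexity near 1; the surprising one is an order of magnitude higher, which is the direction a detector reads as "more human".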

Refrazr addresses this by rewriting entire sentence structures. When the grammatical construction changes, the conditional probability of each subsequent word changes too. Structural rewriting raises per-token entropy in a way that vocabulary substitution cannot. This is why we see consistent results against Turnitin, which specifically targets perplexity, while tools that only paraphrase do not.

The pipeline

Four steps from AI text to human text.

01

Pattern Analysis

Your text is scored across all 8 dimensions. The results are packed into the LLM prompt — not just "humanize this," but "this text has CV% of 28%, remove the 4 list constructions, and eliminate these specific transition phrases." The model receives precise instructions, not vague goals.
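Prompt assembly under this design might look like the sketch below; the field names and wording are illustrative inventions, not the production prompt:

```python
def build_prompt(text, analysis):
    # Hypothetical prompt assembly; the analysis keys are illustrative.
    instructions = [f"This text has a CV% of {analysis['cv_percent']}%."]
    if analysis["list_constructions"]:
        instructions.append(
            f"Remove the {analysis['list_constructions']} list constructions.")
    if analysis["transition_phrases"]:
        phrases = ", ".join(analysis["transition_phrases"])
        instructions.append(f"Eliminate these transition phrases: {phrases}.")
    return ("Rewrite the text below to read as human-written. "
            + " ".join(instructions) + "\n\n" + text)

analysis = {"cv_percent": 28, "list_constructions": 4,
            "transition_phrases": ["furthermore", "moreover"]}
print(build_prompt("Sample text.", analysis))
```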

02

LLM Deep Rewriting

The LLM (DeepSeek V3 via OpenRouter) rewrites the text with structural-level instructions. It is explicitly told not to: use transition phrases, construct parallel lists, maintain uniform sentence length, or use any word from our AI vocabulary list. It is told to: vary sentence length dramatically, use casual register, write imperfect non-symmetrical sentences.

03

60+ Post-Processing Rules

The LLM output runs through a deterministic post-processor. These rules catch what the probabilistic model misses — specific AI vocabulary that survived the rewrite, any em dash overuse, transition phrases that reappeared, symmetrical constructions that snuck back in. This layer is fast, reliable, and runs every time regardless of model quality.
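A deterministic rule pass of this kind amounts to an ordered list of pattern-replacement pairs. The three rules below are a tiny illustrative subset, not the actual 60+ rule set:

```python
import re

# A tiny illustrative subset of deterministic cleanup rules; the real
# post-processor described above has 60+ of these.
RULES = [
    (re.compile(r"\butilize\b", re.IGNORECASE), "use"),
    (re.compile(r"\bdelve into\b", re.IGNORECASE), "dig into"),
    (re.compile(r"\bFurthermore,\s*", re.IGNORECASE), ""),
]

def post_process(text):
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text

print(post_process("Furthermore, we utilize this method to delve into the data."))
```

A real implementation would also repair sentence-initial capitalization after a deletion; this sketch only shows the substitution pass.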

04

Quality Scoring and Retry

The processed output is scored again. If the AI score is still high (above 30%), the pipeline retries — up to 4 attempts. When multiple attempts are made, the result with the lowest AI score is returned. In practice, the first pass usually achieves a score below 10%.
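The retry-and-keep-best logic can be sketched with stand-in functions; in production the rewrite step calls the LLM and the score step runs the 8-dimension analysis:

```python
def humanize_with_retry(text, rewrite, score,
                        threshold=30.0, max_attempts=4):
    """Retry the rewrite until the AI score drops to the threshold,
    keeping the best-scoring candidate seen so far."""
    best_text, best_score = None, float("inf")
    for _ in range(max_attempts):
        candidate = rewrite(text)
        s = score(candidate)
        if s < best_score:
            best_text, best_score = candidate, s
        if best_score <= threshold:
            break  # early exit once the target is reached
    return best_text, best_score

# Demo with stand-ins: a fake rewriter and a scorer whose canned
# values improve on each call.
scores = iter([55.0, 42.0, 18.0])
result, final = humanize_with_retry("draft", lambda t: t + "*",
                                    lambda t: next(scores))
print(final)  # 18.0: the third attempt crossed the 30% threshold
```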

Try it

Scroll up to test your own text.

Free, unlimited browser analysis. No account needed to run the 8-dimension breakdown.
