How we built the AI Optimize feature

Our plugin now features an AI-driven tool that suggests improvements to content based on SEO and readability assessments: Yoast AI Optimize. It’s our first AI-powered feature designed to optimize posts automatically. In this article, we’ll take you through how we built it and why we embarked on this journey.
If you’ve used our plugin before, you’re probably familiar with our SEO and readability assessments. They analyze your content and provide feedback on its readability and keyword optimization. While this feedback is helpful, manually implementing every suggestion can be time-consuming, especially for those new to writing or SEO.
This sparked an idea: What if we could integrate a Large Language Model (LLM) to automate these improvements? With the right instructions, such a model could suggest changes to the user, instantly optimizing their content.
What is a Large Language Model (LLM)?
Machine learning is a branch of AI focused on teaching machines to learn and perform tasks. Large Language Models (LLMs) are a product of machine learning—AI models specifically designed to generate human-like text.
There are several LLMs available; one of the most popular is the model family behind ChatGPT. There are also open-source models like LLaMA. We integrated one of the most advanced models into our code and instructed it to apply automatic SEO and readability fixes using a technique called prompt engineering.
How prompt engineering shapes AI behavior
Prompt engineering is the process of giving an AI model instructions to achieve a desired outcome; it’s similar to briefing an assistant. In our case, we designed specific prompts containing rules and examples for each SEO and readability fix. By doing so, we ensured the AI understood exactly what changes were needed.
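To make this concrete, here is a minimal sketch of what such a prompt could look like, using the OpenAI Python client. The model name, rule, and example below are purely illustrative; our actual prompts are more extensive.

```python
# A minimal sketch of prompt engineering with the OpenAI Python client.
# The model name, rule, and example below are illustrative; they are not
# our actual production prompts.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = """You are an SEO writing assistant.
Rules:
1. Shorten sentences longer than roughly 20 words.
2. Preserve the author's meaning, tone, and HTML markup.
3. Return only the revised text, with no commentary.

Example:
Input: Our plugin, which has been downloaded millions of times by users all over the world, analyzes your content.
Output: Our plugin analyzes your content. It has been downloaded millions of times worldwide.
"""

def suggest_fix(paragraph: str) -> str:
    """Ask the LLM for a readability suggestion for one paragraph."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": paragraph},
        ],
    )
    return response.choices[0].message.content
```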
But AI’s power is also its challenge. While LLMs are excellent at improving readability, detecting intent, and even grasping SEO fundamentals, they don’t always behave predictably. This confronted us with several hurdles we had to overcome.

We built AI Optimize with the help of multiple teams: product owners, strategists, developers, researchers (including linguists), UX designers, and QA engineers. Having a mix of expertise was key to solving tricky challenges around compatibility, design, and how the feature should work. For many of us, working with AI was new, and this was our first feature designed to co-write with users. That meant we had a lot of tough questions to figure out along the way.
Key challenges and how we solved them
1. AI fixing too much or too little
LLMs can be aggressively helpful. They are designed to be proactive. When given a task, they don’t just follow instructions; they try to anticipate the user’s needs and suggest additional help. For example, when asked to shorten long sentences in a text, they might also correct grammar, tweak wording, or restructure paragraphs. While this can be helpful for overall editing, it often leads to excessive changes that go beyond what the user intended.
On the other hand, sometimes AI doesn’t fix enough. If a model shortens only a couple of long sentences in a lengthy post, the overall readability might not improve significantly.
Our solution: We refined our prompts to be as precise as possible, clearly defining the AI’s scope of work. This helped balance the AI’s enthusiasm with controlled, purposeful changes. However, we also emphasized that human revision remains irreplaceable—AI may optimize content, but only a person can ensure it fully aligns with their intent and brand voice.
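As a hypothetical illustration of such scope constraints (not our production prompt), the boundaries can be spelled out as explicit do’s and don’ts:

```python
# Hypothetical scope constraints (illustrative, not our production prompt):
# explicit "do not" rules rein in the model's eagerness to over-edit.
SCOPE_RULES = """
Only shorten sentences that are too long.
Do NOT correct grammar or spelling unless a shortened sentence requires it.
Do NOT reorder or merge paragraphs.
Do NOT add or remove information.
If a paragraph contains no overly long sentences, return it unchanged.
"""
```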
2. Deciding on the scope of fixes
Should AI optimize everything at once or handle one issue at a time? We debated whether to bundle related assessments (like keyphrase density and placement) into a single fix or keep them separate.
Our solution: After weighing the pros and cons, we opted to have the AI suggest improvements for individual assessments, one at a time. Firstly, it’s hard for AI to follow many rules in one go. This approach also offers more flexibility: users can improve their content gradually rather than having all changes applied at once, and it helps them understand exactly what modifications are being made. Moreover, the changes the AI Optimize feature proposes are merely suggestions rather than automatic fixes, so providing separate suggestions per assessment leaves users with more choice over which improvements to accept.
3. Making the AI follow rules
While LLMs are trained on vast amounts of data, they don’t inherently know SEO best practices. Without clear guidance, they might optimize content based on assumptions rather than structured principles.
Our solution: We embedded strict rules within our prompts. These ensured that the AI adhered to key SEO requirements, such as maintaining keyword density and placing keyphrases appropriately. Moreover, we learned that the order of instructions matters—guiding AI through a logical reasoning process produces better results.
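To sketch what that ordering looks like in practice, here is a hypothetical prompt that walks the model through its reasoning step by step. The wording is ours for illustration, not the prompt we ship:

```python
# A hypothetical prompt (not the one we ship) that guides the model through
# a logical order of steps before it edits anything. Walking the LLM through
# its reasoning tends to produce more rule-abiding output than a flat list
# of requirements.
KEYPHRASE_PROMPT_TEMPLATE = """Improve the keyphrase distribution of the text below.
Work through these steps in order:
1. The focus keyphrase is: "{keyphrase}".
2. List the sentences where the keyphrase (or a grammatical form of it)
   already appears.
3. Find the paragraphs where it does not appear at all.
4. Rewrite one sentence in each of those paragraphs to include the keyphrase,
   keeping every content word of the keyphrase within a single sentence.
5. Return the full revised text and nothing else.
"""

prompt = KEYPHRASE_PROMPT_TEMPLATE.format(keyphrase="camping for beginners")
```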
While we did find a solution, we had to accept that AI wouldn’t always follow the rules. For example, for our keyphrase-based assessments, we instructed the LLM to use different grammatical forms of the keyphrase (e.g., both “tent for camping beginners” and “camping tents for beginners”) without changing any content words from the keyphrase. Something we didn’t consider at first is that while our plugin deals with keyphrases and their synonyms separately, AI treats them as the same. Like Google, AI understands synonyms and often swaps words for variation. As a result, the LLM sometimes dropped or replaced words in a keyphrase, changing “camping for beginners” to “camping for the first time,” for example. It also occasionally split keyphrase words across two sentences instead of keeping all the words of the keyphrase in one sentence. Is that incorrect? Not necessarily. Does it weaken keyphrase optimization? That depends on the keyphrase placement and distribution. Since these nuances are beyond the LLM’s current understanding, occasional rule-breaking is unavoidable.
4. Helping the AI estimate numbers
LLMs process numbers differently than humans or calculators do. They are good at estimating, but not at counting, so they try to predict the number you request based on context. This was illustrated well by the infamous “strawberry” example, where LLMs like GPT-4o and Claude counted two letters “r” in “strawberry” instead of three. This is because AI isn’t actually reading your text: it converts the input text into smaller units, or ‘tokens’, such as words or subwords. So while it knows that the tokens “straw” and “berry” make “strawberry”, it may not know which letters the word is composed of, let alone their order and number.
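You can observe this tokenization yourself with OpenAI’s open-source tiktoken library. The exact token boundaries depend on the encoding, but the point stands: the model sees chunks, not letters.

```python
# Demonstrating tokenization with OpenAI's open-source tiktoken library
# (pip install tiktoken). The exact token boundaries depend on the encoding;
# the point is that the model receives chunks, not individual letters.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("strawberry")
print(tokens)                             # a short list of token IDs
print([enc.decode([t]) for t in tokens])  # the subword pieces the model "sees"
```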
Our solution: In tasks where numbers matter (like reducing the number of words in sentences that were too long), we had to think beyond numbers. What is usually communicated through numbers, we communicated with language. For example, instead of telling the AI to target sentences above 20 words, we asked it to target sentences that are too long, even though our idea of what makes a sentence too long may differ from the LLM’s. So in some cases, especially readability assessments with concrete thresholds and percentages, we had to accept that not every change the LLM suggests will improve the assessment score.
This doesn’t mean it’s useless to include numbers in your prompts; the LLM does take them into consideration. However, mistakes are to be expected, because the LLM’s response is based on an assumption instead of a calculation.
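One way to compensate, sketched here as our own illustration rather than a description of the shipped feature, is to verify the LLM’s output with a deterministic count afterwards:

```python
# A sketch of deterministic post-checking (our illustration, not necessarily
# what ships in the plugin): since the LLM estimates rather than counts,
# verify its output with real word counts afterwards.
import re

def long_sentences(text: str, max_words: int = 20) -> list[str]:
    """Return the sentences in `text` that still exceed the word threshold."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if len(s.split()) > max_words]

revised = "A short sentence. Another one that is also short."
if long_sentences(revised):
    print("The LLM's idea of 'too long' missed our threshold; flag or retry.")
else:
    print("The revision passes the length check.")
```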
5. Managing AI’s content awareness and token limits
AI processing takes a lot of resources (including energy), and LLMs have token limits. Sending the entire post as input ensures maximum context, but drains resources and can slow performance. On the other hand, providing too little context results in edits that not only sound unnatural, but may also mistakenly change the meaning of some sentences.
We needed to decide how much of the post the LLM needs as context to change the user’s sentences correctly. For example, when working on the sentence length assessment, we had to decide whether to send the whole post as input to the AI or only the sentences that were too long.
Our solution: We settled on an approach in between the two options: sending the paragraphs containing overly long sentences rather than the full post. This ensured smoother edits that corresponded to the context of the post.
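A simplified sketch of that selection step, assuming a naive sentence split and the 20-word threshold mentioned earlier (the real assessment is more nuanced):

```python
# A simplified sketch of the selection step: send only the paragraphs that
# contain overly long sentences, not the whole post. The naive splitting and
# the 20-word threshold are simplifications of the real assessment.
import re

def has_long_sentence(paragraph: str, max_words: int = 20) -> bool:
    """True if the paragraph contains at least one overly long sentence."""
    sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return any(len(s.split()) > max_words for s in sentences)

def paragraphs_to_send(post: str) -> list[str]:
    """Select just enough context for natural edits, at far fewer tokens."""
    return [p for p in post.split("\n\n") if has_long_sentence(p)]
```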
6. Compatibility across editors
One of the biggest compatibility challenges was ensuring AI Optimize worked smoothly across different WordPress editors, like the Classic Editor, Gutenberg, and Elementor. Each editor handles formatting differently, which means the AI-generated changes had to be applied without breaking existing layouts or stripping essential HTML elements. For now, we only support the Gutenberg editor, as it’s the default WordPress editor and the most mature, with a great API for interacting with it. We do plan on supporting more editors soon, though, and we’ll share more about what we learned from that in a future article.
Lessons learned for future AI features
Developing this AI feature encouraged us to refine our SEO assessments and remain critical of our scoring criteria. Writing styles evolve, and user behavior changes, especially online. Analyzing data from modern online content (which AI uses as an example) helped us look at our requirements more critically. For instance, we lowered our text-length threshold for taxonomy pages after realizing that pages shorter than our expected minimum word count are acceptable.
Furthermore, we recognized that different AI-powered writing tools might have conflicting rules. A sentence optimized for readability by one tool might be flagged as “too long” by another. By maintaining logical, well-justified criteria, we ensure our tool follows widely accepted standards that conflict with other tools as little as possible.
Finally, while AI can significantly enhance content optimization, excessive automation isn’t always helpful. Human revision is still crucial, especially in cases where the LLM lacks specific context or information on the user’s intent.
Read more: Release post: Introducing Yoast AI Optimize »