What is BLAST patent sequence search?

BLAST patent sequence search is the process of using NCBI BLAST+ to compare biological sequences against patent databases to determine overlap, similarity, or potential infringement. Standard BLAST was built for research, not IP work: it ranks results by E-value rather than by percent identity, the measure in which patent claims are written.

Why doesn't standard NCBI BLAST work well for patent search?

Standard NCBI BLAST ranks results by statistical significance (E-value and bit score), not by percent identity or sequence coverage. Patent claims are defined by identity thresholds such as "at least 95% identical" or "at least 80% identical", so examiners must manually filter thousands of hits. This workflow is slow and error-prone, and it risks missing relevant matches.

What is percent identity in patent sequence search?

Percent identity is the proportion of matching residues between two aligned sequences. In patent law, it defines the scope of a sequence claim — a patent covering sequences at least 95% identical to a reference includes all sequences meeting that threshold. Standard BLAST reports percent identity per hit but does not filter by it, requiring manual post-processing.

What are the main scenarios in patent sequence search?

There are three core scenarios: fragment detection (finding short peptide sequences of 15-25 residues within longer database sequences), full-sequence containment (determining whether a query appears within a longer subject at high identity), and variant identification (finding near-identical sequences differing by only a few residues across their entire length).

How did Azati extend NCBI BLAST+ for patent search?

Azati extended the open-source NCBI BLAST+ engine with additional filtering parameters — including min_align_identity, min_query_identity, min_query_coverage, min_subject_identity, min_subject_coverage, subject_length_min, and subject_length_max — that operate inside the search algorithm. Filtering happens after alignment but before output, so only results meeting patent-relevant thresholds are returned.

What are the limitations of custom BLAST for patent search?

BLAST is a heuristic algorithm: if two sequences are similar but do not share a seed match, BLAST will miss the hit regardless of filtering. Additionally, BLAST performs local alignment, which approximates but does not replace formal global alignment. Custom identity and coverage filters closely approximate global identity when combined, but are not a substitute for global alignment tools in legally sensitive contexts.

What databases can be searched with Azati's enhanced BLAST?

Azati's enhanced BLAST+ can be applied to any sequence database formatted for BLAST, including the WIPO patent sequence collection, USPTO sequence databases, EPO data, and custom internal databases. The filtering parameters work at the algorithm level and are independent of the database source.

BLAST for Patent Sequence Search: Custom Filtering for IP Professionals

April 03, 2026

Expert Insight

Vladimir Khramkov

Lead Software Developer at Azati

Biotechnology sequence patents commonly define their scope through percent identity thresholds: 95%, 90%, 80%, 75%. These numbers determine what a patent covers and what it doesn't. Yet the primary tool used to search patent sequence databases, NCBI BLAST, was never designed to answer questions framed this way. Patent examiners and IP attorneys are left to filter results manually, which is slow, error-prone, and often incomplete.

At Azati, we've extended BLAST to speak the language of patent search. Here's why that matters and what it means for IP professionals.

The gap between biology and patent law

BLAST is the gold standard for biological sequence search. It's used by researchers worldwide to compare DNA, RNA, and protein sequences against massive databases. It's fast, reliable, and well understood.

But BLAST was built for biologists. It ranks results by statistical significance using E-values and bit scores, exactly the right measures for evolutionary biology and genomics research.

They are the wrong metrics for patent search.

When a patent examiner or IP attorney needs to determine whether a newly filed sequence infringes an existing patent, the question is straightforward: "Is this sequence at least 95% identical to the patented one?" BLAST doesn't answer this question directly. Instead, it returns thousands of hits ranked by statistical significance, leaving the searcher to figure out percent identity on their own.

💡 The result: a workflow that is slow, error-prone, and, most critically, incomplete.
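
To make the manual workflow concrete, here is a minimal sketch of external post-processing in Python. It assumes the search was run with BLAST+ tabular output, `-outfmt "6 qseqid sseqid pident length qlen slen evalue bitscore"`. The sample rows, the thresholds, and the function name `filter_by_identity` are invented for illustration and are not part of any Azati tooling.

```python
import csv
import io

# Column order must match the -outfmt string used for the search (assumed here).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "qlen", "slen", "evalue", "bitscore"]

# Hypothetical sample output; a real search can return thousands of rows.
SAMPLE_TSV = (
    "query1\tpatA\t98.0\t100\t100\t120\t1e-50\t200\n"
    "query1\tpatB\t100.0\t20\t100\t300\t1e-05\t45\n"
    "query1\tpatC\t82.0\t100\t100\t100\t1e-30\t150\n"
)


def filter_by_identity(tsv_text, min_pident, min_query_cov):
    """Keep subject IDs whose alignment identity and query coverage meet both thresholds."""
    kept = []
    reader = csv.DictReader(io.StringIO(tsv_text), fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        pident = float(row["pident"])
        # Approximation: alignment length over query length. Alignment length
        # includes gap columns, so this can slightly overestimate coverage.
        query_cov = 100.0 * int(row["length"]) / int(row["qlen"])
        if pident >= min_pident and query_cov >= min_query_cov:
            kept.append(row["sseqid"])
    return kept


kept_ids = filter_by_identity(SAMPLE_TSV, min_pident=95.0, min_query_cov=90.0)
print(kept_ids)  # ['patA']
```

Note that `patB` has the highest percent identity in the sample but covers only a fifth of the query, which is exactly the kind of hit a searcher has to weed out by hand.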

What patent searchers actually need

Patent sequence search involves three distinct scenarios that standard BLAST handles poorly:

Fragment detection. A patent may cover short peptide fragments (15 to 25 amino acid residues) derived from a larger protein. Finding these fragments requires filtering by sequence length and identity simultaneously. Standard BLAST has no mechanism for this.
Full-sequence containment. An examiner may need to know whether a query sequence appears within any longer database sequence with high identity over most or all of the query. BLAST reports local alignments that may cover only a small portion of the query, making it difficult to assess full-sequence matches.
Variant identification. Often the goal is to find sequences that are almost identical to the query across their entire length. BLAST's local alignment approach can miss these, reporting partial matches over short windows instead of surfacing near-identical full-length hits.

Our approach: filtering inside the algorithm

The Azati bioinformatics team extended the open-source NCBI BLAST+ engine with additional filtering parameters that operate inside the search algorithm itself. Instead of post-processing results externally, our version applies identity and coverage filters after alignment but before output, so only results that meet your criteria are returned.

This approach offers three key advantages:

Precision. Results are filtered by the exact metrics patent claims use: percent identity and percent coverage relative to the query, the subject, or the alignment, as well as subject sequence length. These filters can be applied individually or in combination.
Completeness. Because filtering happens inside the algorithm, BLAST processes every candidate alignment against your criteria before deciding what to keep. You get a complete set of results that meet your thresholds, rather than a statistically ranked subset.
Speed. No separate post-processing step. For large databases like the WIPO patent sequence collection, this eliminates hours of manual filtering work.
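
The difference between identity "relative to the query, the subject, or the alignment" is easiest to see with numbers. The sketch below is our illustration only: it assumes each metric is the count of identical positions (or covered residues) divided by the relevant length, which is one plausible reading and may differ in detail from how the actual parameters are computed. The `Alignment` class and `metrics` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    identical: int        # identical aligned positions
    align_length: int     # alignment length, including gap columns
    query_length: int     # full length of the query sequence
    subject_length: int   # full length of the database (subject) sequence
    query_aligned: int    # query residues covered by the alignment
    subject_aligned: int  # subject residues covered by the alignment


def metrics(a):
    """Return identity and coverage percentages under the assumed definitions."""
    return {
        "align_identity": 100.0 * a.identical / a.align_length,
        "query_identity": 100.0 * a.identical / a.query_length,
        "subject_identity": 100.0 * a.identical / a.subject_length,
        "query_coverage": 100.0 * a.query_aligned / a.query_length,
        "subject_coverage": 100.0 * a.subject_aligned / a.subject_length,
    }


# A 100-residue query aligned without gaps inside a 250-residue subject,
# with 96 identical positions.
example = Alignment(identical=96, align_length=100, query_length=100,
                    subject_length=250, query_aligned=100, subject_aligned=100)
m = metrics(example)
for name, value in m.items():
    print(f"{name}: {value:.1f}%")
```

The same alignment is 96% identical relative to the alignment and the query, but only 38.4% relative to the subject. Which denominator matters depends on how the claim is worded.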

Real-world search scenarios

Here's how our enhanced BLAST handles the three patent search scenarios described above.

Fragment search

Parameters: subject_length_min, subject_length_max, min_subject_identity

Search for short peptide fragments within a specific length range that match your query with high identity.
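
As a rough sketch of the logic, the filter below keeps only subjects whose length falls within a window and whose identity relative to the subject length meets a threshold. The keyword names mirror the parameters listed above, but the `Hit` record, the function, and the assumed definition of subject identity (identical positions divided by subject length) are ours, for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject_id: str
    subject_length: int
    identical: int  # identical positions in the alignment


def fragment_hits(hits, subject_length_min, subject_length_max, min_subject_identity):
    """Keep IDs of short subjects that match the query at high subject identity."""
    kept = []
    for h in hits:
        if not (subject_length_min <= h.subject_length <= subject_length_max):
            continue  # subject is outside the fragment length window
        subject_identity = 100.0 * h.identical / h.subject_length
        if subject_identity >= min_subject_identity:
            kept.append(h.subject_id)
    return kept


# Hypothetical hits for one query.
hits = [
    Hit("pep1", 20, 20),    # 20-mer, fully identical
    Hit("pep2", 18, 16),    # 18-mer, about 88.9% identical
    Hit("prot3", 450, 20),  # full-length protein, outside the window
    Hit("pep4", 12, 12),    # too short
]
fragments = fragment_hits(hits, subject_length_min=15, subject_length_max=25,
                          min_subject_identity=95.0)
print(fragments)  # ['pep1']
```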

Identity threshold search

Parameters: min_align_identity, min_query_identity, min_query_coverage

Find all database sequences that contain your query above a specified percent identity threshold. This directly answers the question patent claims ask: "Is there anything in the database that is at least X% identical to my sequence?"
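
The sketch below shows why the three thresholds are useful together. It assumes alignment identity is identical positions over alignment length, and that query identity and query coverage are both measured against the full query length. These definitions, the `Hit` record, and `passes_thresholds` are illustrative assumptions rather than the engine's actual code.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject_id: str
    identical: int      # identical aligned positions
    align_length: int   # alignment length, including gap columns
    query_aligned: int  # query residues covered by the alignment


def passes_thresholds(hit, query_length, min_align_identity,
                      min_query_identity, min_query_coverage):
    """True if the hit meets all three thresholds (all values in percent)."""
    align_identity = 100.0 * hit.identical / hit.align_length
    query_identity = 100.0 * hit.identical / query_length
    query_coverage = 100.0 * hit.query_aligned / query_length
    return (align_identity >= min_align_identity
            and query_identity >= min_query_identity
            and query_coverage >= min_query_coverage)


QUERY_LENGTH = 200
hits = [
    Hit("seqA", 196, 200, 200),  # 98% identical across the whole query
    Hit("seqB", 60, 60, 60),     # perfect match, but over only 30% of the query
    Hit("seqC", 185, 202, 200),  # full coverage, about 91.6% alignment identity
]
matching = [h.subject_id for h in hits
            if passes_thresholds(h, QUERY_LENGTH, 95.0, 95.0, 95.0)]
print(matching)  # ['seqA']
```

A filter on alignment identity alone would also let `seqB` through, even though it matches less than a third of the query.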

Variant detection

Parameters: min_subject_identity, min_subject_coverage

Find database sequences that are similar to your query across their entire length, not just in a local alignment window. By requiring both high coverage and high identity of the subject sequence, the search identifies true sequence variants while excluding partial or coincidental matches.
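
A small sketch of the idea, again under assumed definitions (subject identity and subject coverage both taken relative to the full subject length). The `Hit` record and `is_variant` helper are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject_id: str
    subject_length: int
    subject_aligned: int  # subject residues covered by the alignment
    identical: int        # identical aligned positions


def is_variant(hit, min_subject_identity, min_subject_coverage):
    """True if the subject matches the query over nearly its entire length."""
    subject_identity = 100.0 * hit.identical / hit.subject_length
    subject_coverage = 100.0 * hit.subject_aligned / hit.subject_length
    return (subject_identity >= min_subject_identity
            and subject_coverage >= min_subject_coverage)


# Hypothetical hits for a 300-residue query.
hits = [
    Hit("var1", 300, 300, 297),     # 99% identical over the full subject
    Hit("fusion2", 900, 300, 300),  # query embedded in a much longer protein
    Hit("var3", 298, 295, 280),     # about 94% identical, about 99% covered
]
variants = [h.subject_id for h in hits if is_variant(h, 95.0, 95.0)]
print(variants)  # ['var1']
```

`fusion2` contains a perfect copy of the query, yet it is excluded because the match spans only a third of the subject. That is the distinction between containment and a true variant.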

Understanding the limitations

These enhancements have limitations worth understanding.

BLAST is a heuristic algorithm that uses short "seed" sequences to identify candidate regions before performing full alignment. If two sequences are similar but don't share a seed match, BLAST will miss the hit entirely. Our enhancements improve filtering, not candidate detection.
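
The toy example below illustrates the seed problem with a deliberately simplified model: a "seed" is any exact word of a fixed size shared by both sequences. Real BLAST seeding is more elaborate (protein searches, for instance, use scored neighborhood words), so treat this as a sketch of the principle only. The sequences are synthetic, and the word size of 11 corresponds to the default for the `blastn` task.

```python
def words(seq, word_size):
    """Return the set of all exact substrings of length word_size."""
    return {seq[i:i + word_size] for i in range(len(seq) - word_size + 1)}


def shares_seed(query, subject, word_size):
    """True if the two sequences share at least one exact word of the given size."""
    return not words(query, word_size).isdisjoint(words(subject, word_size))


# Synthetic 50-base query containing no T.
query = "ACGGCAAGCA" * 5
# Subject: same sequence with every 10th base replaced by T.
subject = "".join("T" if i % 10 == 9 else base for i, base in enumerate(query))

identity = 100.0 * sum(q == s for q, s in zip(query, subject)) / len(query)
print(identity)                         # 90.0
print(shares_seed(query, subject, 11))  # False: every 11-base window spans a mismatch
print(shares_seed(query, subject, 7))   # True: shorter words fit between mismatches
```

The two sequences are 90% identical, yet under this model a word size of 11 finds no seed at all, so no alignment would ever be attempted and no downstream filter could recover the hit.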

Additionally, BLAST performs local alignment, finding the best-matching region between two sequences. When a patent says "95% identical," it typically implies identity over the entire sequence. Our identity and coverage filters closely approximate global identity, especially when combined, but they are not a substitute for formal global alignment tools in legally sensitive contexts.

Work with us

The Azati bioinformatics team develops custom sequence search algorithms and extends established open-source tools like NCBI BLAST+ to meet the specific demands of patent sequence work.

If you're facing complex challenges in sequence search or analysis, contact the Azati team. We're happy to discuss your specific use case and run a complimentary test search on your sequences.
