80-Fold Software Performance Improvement

Azati accelerated the client’s DNA sequence processing software by identifying and optimizing a critical bottleneck in the FASTAptamer toolkit. The team rewrote the core clustering logic from Perl to C++, cutting total execution time roughly 80-fold and speeding up the embedded Levenshtein calculation about 1,000-fold.

80x

overall software performance improvement

1,000x

Levenshtein algorithm execution speedup

30.5 min

end-to-end processing time after optimization

All Technologies Used

C++
Perl

Motivation

The client needed a dramatic acceleration of their DNA sequencing pipeline, which processed vast amounts of biological data. Manual research workflows were slowed by a software bottleneck, delaying experiments and reducing lab productivity. The goal was to shorten processing time while maintaining result accuracy, enabling faster research cycles and timely delivery of insights.

Main Challenges

Challenge 01
FASTAptamer Performance Limits Speed

The client’s DNA sequencing software took approximately 48 hours to process a dataset because the FASTAptamer toolkit had a major performance bottleneck. This delay disrupted research timelines and impacted productivity. Azati analyzed the pipeline, pinpointed the slowest steps, and proposed performance engineering solutions to optimize them using low-level programming techniques.

Challenge 02
Inefficient Sequence Clustering in Perl

The clustering step computed Levenshtein (edit) distances between sequences in Perl, and that implementation was too slow for high-throughput data. Azati proposed rewriting this logic in C++ to gain native compilation, faster execution, and better memory handling, drastically reducing processing time while maintaining accuracy.
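For readers unfamiliar with the calculation, the following is a minimal sketch of a Levenshtein distance in C++. It is an illustration only, not the code merged into FASTAptamer: the function name `levenshtein` and the two-row dynamic-programming layout are assumptions chosen for brevity.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper (not the FASTAptamer source): classic Levenshtein
// distance using two rows of the dynamic-programming table instead of the
// full matrix, so memory use is proportional to the shorter sequence.
std::size_t levenshtein(std::string_view a, std::string_view b) {
    if (a.size() < b.size()) std::swap(a, b);  // make b the shorter sequence

    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    // Row 0: turning an empty prefix of a into b[0..j) takes j insertions.
    std::iota(prev.begin(), prev.end(), std::size_t{0});

    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;  // turning a[0..i) into an empty string takes i deletions
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1,          // deletion
                                curr[j - 1] + 1,      // insertion
                                prev[j - 1] + cost}); // match or substitution
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}
```

Calling `levenshtein("GATTACA", "GACTATA")` counts the two substitutions that separate the sequences. A compiled inner loop like this one, working on contiguous buffers, is the kind of code where a native language tends to outperform an interpreted one.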


Our Approach

Pipeline Analysis
Analyzed the client's DNA sequencing pipeline to locate performance bottlenecks.
Bottleneck Identification
Determined that the clustering program in FASTAptamer, and the Levenshtein calculation in particular, consumed the majority of execution time.
Benchmarking and Root Cause Analysis
Benchmarked the Perl implementation and confirmed that the slowdown stemmed from the limits of Perl in high-throughput operations.
Algorithm Rewriting in C++
Rewrote the Levenshtein algorithm in C++ to enhance execution speed and memory efficiency.
Integration and Validation
Integrated the optimized C++ algorithm into the client’s pipeline and validated results to ensure consistency and accuracy.
Open Source Contribution
Submitted the improved algorithm to the official FASTAptamer repository, where it was merged in version 1.0.12, benefiting the global bioinformatics community.
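The benchmarking step in the approach above can be sketched with a small timing helper. This is a hedged illustration, not the harness Azati used: `elapsed_ms` is a hypothetical name, and it measures a single wall-clock run of whatever callable it is given.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in milliseconds. steady_clock is monotonic, so the measurement is not
// affected by system clock adjustments.
template <typename Work>
double elapsed_ms(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Timing the old and new implementations on the same input and dividing one result by the other gives a speedup factor. In practice, several runs per implementation would be needed to smooth out noise.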


Solution

01

Optimized Clusterization Logic

The original Levenshtein algorithm in Perl was a major performance bottleneck. Azati rewrote it in C++ to leverage efficient memory management and faster computation, making the calculation roughly 1,000 times faster without changing the output results.
Key capabilities:
  • High-speed Levenshtein calculation
  • Efficient memory management
  • Support for high-throughput sequence processing
  • Seamless integration with existing pipeline
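To show how an edit-distance routine feeds a clustering step, here is a minimal sketch. It assumes a greedy, seed-based scheme in which the first unassigned sequence opens a cluster and every later sequence within `max_distance` edits joins it. The names `edit_distance` and `greedy_cluster` are hypothetical, and the actual FASTAptamer logic may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Two-row Levenshtein distance (illustrative, not the FASTAptamer source).
std::size_t edit_distance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    std::iota(prev.begin(), prev.end(), std::size_t{0});
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Assumed greedy scheme: returns one cluster id per input sequence.
// Sequences are expected in priority order (for example, most abundant first).
std::vector<int> greedy_cluster(const std::vector<std::string>& seqs,
                                std::size_t max_distance) {
    std::vector<int> cluster(seqs.size(), -1);  // -1 means "not yet assigned"
    int next_id = 0;
    for (std::size_t seed = 0; seed < seqs.size(); ++seed) {
        if (cluster[seed] != -1) continue;      // already claimed by an earlier seed
        cluster[seed] = next_id;
        for (std::size_t j = seed + 1; j < seqs.size(); ++j) {
            if (cluster[j] != -1) continue;
            // The length difference is a lower bound on the edit distance,
            // so clearly distant pairs are skipped without running the DP.
            const std::size_t la = seqs[seed].size(), lb = seqs[j].size();
            const std::size_t diff = la > lb ? la - lb : lb - la;
            if (diff > max_distance) continue;
            if (edit_distance(seqs[seed], seqs[j]) <= max_distance) {
                cluster[j] = next_id;
            }
        }
        ++next_id;
    }
    return cluster;
}
```

In the worst case the number of distance calculations grows quadratically with the number of sequences. That is why the per-pair cost of the Levenshtein routine dominates the runtime of a step like this.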
02

Pipeline Acceleration

Beyond the algorithm rewrite, Azati optimized the full DNA sequencing workflow, removing unnecessary delays and improving data handling across modules. This reduced total execution time per dataset from approximately 48 hours to 30.5 minutes, greatly increasing research throughput and lab efficiency.
Key capabilities:
  • 80x overall pipeline speedup
  • 1,000x algorithm execution improvement
  • Maintains result accuracy
  • Significant reduction in research wait times
03

Open Source Integration

The optimized Levenshtein algorithm was submitted to FASTAptamer’s official repository and merged in version 1.0.12. This improved performance for the client and also contributed to the wider bioinformatics community, giving all users faster DNA sequence analysis.
Key capabilities:
  • Contribution to official toolkit
  • Ensures reproducibility for all users
  • Supports collaborative development
  • Widespread adoption of performance improvements

Business Value

Faster Data Processing: Reduced total runtime from approximately 48 hours to 30.5 minutes, drastically improving research productivity.

Validated Accuracy: Output results remained consistent, ensuring scientific integrity and reproducibility.

Community Benefit: Optimization accepted into the mainstream toolset, benefiting all FASTAptamer users worldwide.
