LLM Comparison: Find the Best Fit for Legacy System Rewrites


09/12/2024

About 19 min read

Legacy systems often struggle with performance, are vulnerable to security issues, and are expensive to maintain. Despite these challenges, over 65% of enterprises still rely on them for critical operations.

At the same time, modernization is becoming a pressing business need, with the application modernization services market valued at $17.8 billion in 2023 and expected to grow at a CAGR of 16.7%.

This growth highlights a clear trend: businesses recognize the need to update outdated systems to keep pace with industry demands.

The journey toward modernization varies widely. While 75% of organizations have started modernization projects, only 18% have reached a state of continuous improvement.

Figure: Legacy modernization progress

Data source: https://www.redhat.com/en/resources/app-modernization-report

For many, the process remains challenging, with a staggering 74% of companies failing to complete their legacy modernization efforts. Security and efficiency are the primary drivers, with over half of surveyed companies citing these as key motivators.

Given these complexities, the question arises: Could Generative AI simplify and accelerate this process?

With the surging adoption rates of AI technology, it’s worth exploring if Generative AI has a role in rewriting legacy systems.

This article compares six LLMs, evaluating the strengths, weaknesses, and potential risks of each GenAI tool. The decision to use them ultimately lies with you.

Here’s what we’ll discuss:

  • Why Generative AI?
  • The research methodology
  • Generative AI tools: six contenders for LLM comparison
    • OpenAI GPT-4o (ChatGPT)
    • Claude-3-sonnet
    • Claude-3-opus
    • Claude-3-haiku
    • Gemini 1.5 Flash
    • Gemini 1.5 Pro 
  • Comparison summary

Why Generative AI?

Traditionally, updating outdated systems has been a labor-intensive and error-prone process. Generative AI offers a solution by automating much of the code translation, which improves consistency and efficiency. This accelerates the modernization of legacy systems and supports cross-platform development and refactoring.

As businesses aim to remain competitive, using Generative AI for code transformation is crucial, allowing them to fully use modern technologies while reducing manual rewrite risks.

Here are key reasons to consider its use:

  • Uncovering dependencies and business logic – Generative AI can dissect legacy code to reveal dependencies and embedded business logic, ensuring essential functionalities are retained and improved in the updated system.
  • Decreased development time and expenses – automation drastically reduces the time and resources required for system re-writing. Quicker development cycles and fewer human hours needed for coding and testing decrease the overall project cost.
  • Consistency and accuracy – manual code translation is prone to human error. AI models can make code conversion more consistent and accurate, reducing bugs and enhancing reliability, although the output still needs review.
  • Optimized performance – Generative AI facilitates the creation of optimized code from the beginning, incorporating advanced algorithms that enhance efficiency and adaptability, often lacking in older systems.

The LLM comparison research methodology

It can be tough to compare different Generative AI models with each other, because it’s hard to find criteria that apply to all the available tools. Some are web-based, some are restricted to a specific IDE, some offer a “chat” feature, and others only propose code.

As our goal was the re-writing of existing projects, we aimed to create an LLM comparison based on the following six main challenges while working with existing code:

  • Analyzing project architecture –  understanding the architecture is crucial for maintaining the system’s integrity during re-writing. It ensures the new code aligns with the original design principles and system structure.
  • Analyzing data flows – proper analysis of data flows is essential to ensure that data is processed correctly and efficiently in the re-written application. This helps maintain functionality and performance.
  • Generating historical backlog –  this involves querying the Generative AI to create Jira (or any other tracking system) tickets that could potentially be used to rebuild the system from scratch. The aim is to replicate the workflow of the initial project implementation. These “tickets” should include component descriptions and acceptance criteria.
  • Converting code from one programming language to another –  language conversion is often necessary to leverage modern technologies. Accurate translation preserves functionality and enables integration with contemporary systems.
  • Generating new code –  the ability to generate new code, such as test cases or additional features, is important for enhancing the application’s capabilities and ensuring comprehensive testing.
  • Privacy and security of a Generative AI tool – businesses are concerned about sharing their source codebase with the public internet. Therefore, work with Generative AI must occur in an isolated environment to protect sensitive data.
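To make the "historical backlog" criterion above more concrete, here is a minimal sketch of what one reconstructed ticket could look like as a data structure. The class name, field names, and the completeness rule are assumptions made for illustration; the study does not prescribe a ticket format beyond "component descriptions and acceptance criteria".

```python
from dataclasses import dataclass, field


@dataclass
class BacklogTicket:
    """One reconstructed backlog item (hypothetical structure, not from the study)."""
    title: str
    component_description: str
    acceptance_criteria: list[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        # Assumption: a ticket is usable only if it has a non-empty
        # description and at least one acceptance criterion.
        return bool(self.component_description.strip()) and len(self.acceptance_criteria) > 0


def render_ticket(ticket: BacklogTicket) -> str:
    """Render a ticket as plain text that can be pasted into a tracker."""
    lines = [
        f"Title: {ticket.title}",
        f"Component: {ticket.component_description}",
        "Acceptance criteria:",
    ]
    lines += [f"  - {criterion}" for criterion in ticket.acceptance_criteria]
    return "\n".join(lines)
```

A check like is_complete() is one simple way to flag generated tickets that stop at the epic level and carry no acceptance criteria, which is the gap several of the models below show.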

Source projects overview

To test the capabilities of Generative AI, we used two projects:

  • Simple CRUD application – The project uses .NET Core as its framework, with Entity Framework Core serving as the ORM and SQL Server as the relational database. The target application is a backend system built with Java 17 and Spring Boot 3.
  • Microservice-based application – The application is developed with .NET Core as its framework, Entity Framework Core as the ORM, and the Command Query Responsibility Segregation (CQRS) pattern for handling entity operations. The target system includes a microservice-based backend built with Java 17 and Spring Boot 3, alongside a frontend developed using the React framework.

Generative AI tools: six contenders for LLM comparison

In this article, we will compare six different Generative AI tools used in these example projects:

  • OpenAI GPT-4o (ChatGPT) – context of 128k tokens
  • Claude-3-sonnet – context of 200k tokens
  • Claude-3-opus – context of 200k tokens
  • Claude-3-haiku – context of 200k tokens
  • Gemini 1.5 Flash – context of 1M tokens
  • Gemini 1.5 Pro  – context of 2M tokens
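The context sizes listed above decide how much of a codebase can be sent in a single prompt. The sketch below estimates which models could take a given amount of source text in one go. The four-characters-per-token ratio and the output reserve are rough assumptions; real tokenizers differ per model, so treat the result as a first filter only.

```python
# Context window sizes as listed above, in tokens.
CONTEXT_WINDOWS = {
    "GPT-4o": 128_000,
    "Claude-3-Sonnet": 200_000,
    "Claude-3-Opus": 200_000,
    "Claude-3-Haiku": 200_000,
    "Gemini 1.5 Flash": 1_000_000,
    "Gemini 1.5 Pro": 2_000_000,
}

# Assumption: a crude average; actual token counts depend on the tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate the token count by rounding up len(text) / CHARS_PER_TOKEN."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def models_that_fit(source_text: str, reserve_for_output: int = 8_000) -> list[str]:
    """Return the models whose window holds the input plus an output reserve."""
    needed = estimate_tokens(source_text) + reserve_for_output
    return [name for name, window in CONTEXT_WINDOWS.items() if needed <= window]
```

For example, roughly 600,000 characters of source code would exceed the GPT-4o window under these assumptions while still fitting the Claude 3 and Gemini 1.5 models.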

OpenAI

OpenAI’s ChatGPT-4o represents an advanced language model that showcases the leading edge of artificial intelligence technology. Known for its conversational prowess and ability to manage extensive contexts, it offers great potential for explaining and generating code.

  • Analyzing project architecture

ChatGPT faces challenges in analyzing project architecture, because architecture is abstract and requires high-level understanding. The model struggles to grasp the full context and intricacies of architectural design, especially concepts and relationships that are not explicitly defined in the code.

  • Analyzing data flows

ChatGPT performs better at analyzing data flows within a program. It can effectively trace how data moves through a program by examining function calls, variable assignments, and other code structures. This task aligns well with ChatGPT’s pattern recognition capabilities, making it a suitable application for the model.

  • Generating historical backlog

When given a project architecture as input, ChatGPT can generate high-level epics that capture the project’s overall goals and objectives. However, it struggles to produce detailed user stories suitable for project management tools like Jira, often lacking the necessary detail and precision for effective use.

  • Converting code from one programming language to another

ChatGPT performs reasonably well in converting code, such as from C# to Java Spring Boot, by mapping similar constructs and generating syntactically correct code. However, it encounters limitations when there is no direct mapping between frameworks, as it lacks the deep semantic understanding needed to translate unique framework-specific features.
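The "no direct mapping" limitation can be made visible before any conversion starts. The sketch below splits the constructs found in a .NET project into those with a rough Spring Boot counterpart and those that need manual design. The mapping table is a small, hand-written illustration of approximate equivalents, not a list taken from the study, and CustomCqrsDispatcher is a hypothetical name standing in for project-specific code.

```python
# Illustrative, hand-written mapping of rough equivalents; not exhaustive
# and not taken from the study.
DOTNET_TO_SPRING = {
    "[ApiController]": "@RestController",
    "[HttpGet]": "@GetMapping",
    "[HttpPost]": "@PostMapping",
    "DbSet<T>": "JpaRepository<T, ID>",
}


def split_by_mapping(constructs: list[str]) -> tuple[dict[str, str], list[str]]:
    """Separate constructs with a known counterpart from those without one."""
    mapped = {c: DOTNET_TO_SPRING[c] for c in constructs if c in DOTNET_TO_SPRING}
    unmapped = [c for c in constructs if c not in DOTNET_TO_SPRING]
    return mapped, unmapped


if __name__ == "__main__":
    # "CustomCqrsDispatcher" is a hypothetical, project-specific construct.
    found = ["[HttpGet]", "DbSet<T>", "CustomCqrsDispatcher"]
    mapped, unmapped = split_by_mapping(found)
    print("Direct counterparts:", mapped)
    print("Needs manual design:", unmapped)
```

Anything that ends up in the unmapped list is a candidate for a human decision rather than a one-shot translation prompt.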

  • Generating new code

ChatGPT excels in generating new code, particularly for unit tests and integration tests. Given a piece of code and a prompt, it can generate tests that accurately verify the code’s functionality, showcasing its strength in this area.

  • Privacy and security of the Generative AI tool

OpenAI’s ChatGPT, like many cloud-based AI services, typically operates over the internet. However, there are ways to use the underlying models in an isolated, private environment without exposing code or sensitive data to the public internet. One option is a private cloud deployment such as Azure OpenAI, a service offered by Microsoft that provides access to OpenAI models within Azure’s secure cloud environment.

Best tip

Use human feedback: Reinforcement Learning from Human Feedback (RLHF) is a training-time technique that end users generally cannot apply to GPT-4o themselves, but the same idea works at the prompt level. Review the AI’s outputs and feed your corrections back in follow-up prompts so that later outputs improve. This can be particularly useful for complex tasks like code migration.

Overall

OpenAI’s GPT-4o is a mature and robust language model that provides substantial support to developers in complex scenarios. It handles code conversion between programming languages reasonably well and is particularly strong at generating tests, though framework-specific features without a direct counterpart still need manual attention.

  • Possibilities 3/5
  • Correctness 3/5
  • Privacy 5/5
  • Maturity 4/5

Overall score: 4/5


Claude-3-sonnet

Claude-3-Sonnet is a language model developed by Anthropic, designed to provide advanced natural language processing capabilities. Its architecture is optimized for maintaining context over extended interactions, offering a balance of intelligence and speed.

  • Analyzing project architecture

Claude-3-Sonnet excels in analyzing and comprehending the architecture of existing projects. When presented with a codebase, it provides detailed insights into the project’s structure, identifying components, modules, and their interdependencies. Claude-3-Sonnet offers a comprehensive breakdown of project architecture, including class hierarchies, design patterns, and architectural principles employed.

  • Analyzing data flows

It struggles to grasp the full context and nuances of data flows, particularly in complex systems with sophisticated data transformations and conditional logic. This limitation can pose challenges when rewriting projects that heavily rely on intricate data flows or involve sophisticated data processing pipelines, necessitating manual intervention and verification by human developers.

  • Generating historical backlog

Claude-3-Sonnet can provide high-level epics that cover main functions and components when prompted with a project’s architecture. However, these epics lack detailed acceptance criteria and business requirements. While it may propose user stories to map to the epics, these stories also lack the details needed to create backlog items. It can help capture some user goals, but without clear confirmation points for completion.

  • Converting code from one programming language to another

Claude-3-Sonnet showcases impressive capabilities in converting code, such as translating C# code to Java Spring Boot applications. It effectively translates the logic and functionality of the original codebase into a new implementation, leveraging framework conventions and best practices. However, limitations arise when there is no direct mapping between frameworks, requiring additional manual adjustments and optimizations by developers.

  • Generating new code

Claude-3-Sonnet demonstrates remarkable proficiency in generating new code, particularly in unit and integration tests. The AI tool can analyze existing codebases and automatically generate comprehensive test suites covering various scenarios and edge cases.

  • Privacy and security of the Generative AI tool

Unfortunately, Anthropic’s privacy policy is quite confusing. Before January 2024, they used clients’ data to train their models. The updated legal document ostensibly provides protections and transparency for Anthropic’s commercial clients, but it’s recommended to consider the privacy of your data while using Claude.

Best tip

Be specific and detailed: provide the Generative AI tool with specific and detailed prompts to ensure it understands the task accurately. This includes clear descriptions of what needs to be rewritten, any constraints, and desired outcomes.
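One way to apply this tip consistently is to assemble every prompt from the same named parts. The sketch below is a plain string builder; the function name, parameters, and section labels are assumptions chosen for illustration, and no particular model API is involved.

```python
def build_rewrite_prompt(
    source_language: str,
    target_stack: str,
    code: str,
    constraints: list[str],
    desired_outcomes: list[str],
) -> str:
    """Assemble a rewrite prompt with explicit task, constraints, and outcomes."""
    sections = [
        f"Task: rewrite the following {source_language} code for {target_stack}.",
        "Constraints:",
        *[f"- {item}" for item in constraints],
        "Desired outcomes:",
        *[f"- {item}" for item in desired_outcomes],
        "Source code:",
        code,
    ]
    return "\n".join(sections)


if __name__ == "__main__":
    prompt = build_rewrite_prompt(
        source_language="C#",
        target_stack="Java 17 with Spring Boot 3",
        code="public class Customer { public int Id { get; set; } }",
        constraints=["Keep public method names unchanged"],
        desired_outcomes=["One Java class per C# class"],
    )
    print(prompt)
```

Keeping the constraints and outcomes as separate lists makes it easy to spot a prompt that states what to rewrite but not what the result must satisfy.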

Overall

The model’s ability to generate coherent and contextually relevant content makes it a valuable tool for developers and businesses seeking to enhance their AI-driven solutions. However, the model might have difficulty fully grasping intricate data flows, especially in systems with complex transformations and conditional logic.

  • Possibilities 3/5
  • Correctness 3/5
  • Privacy 3/5
  • Maturity 3/5

Overall score: 3/5


Claude-3-opus

Claude-3-Opus is another language model by Anthropic, designed for handling more extensive and complex interactions. This version of Claude models focuses on delivering high-quality code generation and analysis with high precision.

  • Analyzing project architecture

With its advanced natural language processing capabilities, it thoroughly examines the codebase, identifying various components, their relationships, and the overall structure. This analysis provides valuable insights into the project’s design, enabling developers to understand the system’s organization better and make decisions about potential refactoring or optimization efforts.

  • Analyzing data flows

While Claude-3-Opus performs reasonably well in analyzing data flows within a project, it may lack the context necessary to fully comprehend all possible scenarios. Compared to Claude-3-Sonnet, however, it demonstrates improved capabilities in this area. By examining the flow of data through the application, it can identify potential bottlenecks, inefficiencies, or areas where data integrity might be compromised.

  • Generating historical backlog

By providing the project architecture as an input prompt, it effectively creates high-level epics that encapsulate essential features and functionalities. One of its key strengths is generating detailed and precise acceptance criteria for each epic. However, it may struggle to create granular Jira user stories. Compared to other Claude models, Claude-3-Opus demonstrates superior performance in generating historical backlog based on project architecture.

  • Converting code from one programming language to another

Claude-3-Opus shows promising capabilities in converting code from one programming language to another, particularly in converting C# code to Java Spring Boot, a popular Java framework for building web applications. However, it has limitations when there is no direct mapping between frameworks in different programming languages.

  • Generating new code

The AI tool demonstrates proficiency in generating both unit tests and integration tests for existing codebases. By leveraging its understanding of the project’s architecture and data flows, Claude-3-Opus generates comprehensive test suites, ensuring thorough coverage and improving the overall quality of the codebase.

  • Privacy and security of the Generative AI tool

As with other Anthropic models, you need to consider the privacy of your data. For specific details about Anthropic’s data privacy and security practices, it is best to contact the company directly.

Best tip

Break down the existing project into components and functionality that need to be recreated. Reducing input complexity minimizes the risk of errors in output.
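As a starting point for this kind of breakdown, source files can be grouped into components before anything is sent to the model. The sketch below treats each top-level directory as one component, which is an assumption; real projects may need grouping by namespace, service, or feature instead. The file paths in the usage example are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_component(paths: list[str]) -> dict[str, list[str]]:
    """Group file paths by top-level directory, treated here as the component."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        parts = PurePosixPath(path).parts
        # Files that sit directly in the project root get their own bucket.
        component = parts[0] if len(parts) > 1 else "(root)"
        groups[component].append(path)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical project layout.
    files = [
        "Orders/OrderController.cs",
        "Orders/OrderService.cs",
        "Customers/Customer.cs",
        "Program.cs",
    ]
    for component, members in group_by_component(files).items():
        print(component, "->", members)
```

Each group can then be converted in its own prompt, which keeps the input small and makes errors easier to trace back to one component.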

Overall

Claude-3-Opus’s strengths are analyzing project architecture and data flows, converting code between languages, and generating new code, which makes the development process easier and improves code quality. This tool empowers developers to quickly deliver high-quality software solutions.

  • Possibilities 4/5
  • Correctness 4/5
  • Privacy 3/5
  • Maturity 4/5

Overall score: 4/5


Claude-3-haiku

Claude-3-Haiku is part of Anthropic’s suite of Generative AI models, declared as the fastest and most compact model in the Claude family for near-instant responsiveness. It excels in answering simple queries and requests with exceptional speed.

  • Analyzing project architecture

Claude-3-Haiku struggles with analyzing project architecture. The model tends to generate overly general responses that closely resemble the input data, limiting its ability to provide meaningful insights into a project’s overall structure and organization.

  • Analyzing data flows

Similar to its limitations in project architecture analysis, Claude-3-Haiku fails to effectively group components based on their data flow relationships. This lack of precision makes it difficult to clearly understand how data moves throughout the system.

  • Generating historical backlog

Claude-3-Haiku is unable to generate Jira user stories effectively. It struggles to produce user stories that meet the standard format and detail required for project management. Additionally, its performance in generating high-level epics is unsatisfactory: the epics lack detailed acceptance criteria and business requirements. These limitations likely stem from its training data, which focused on short forms and concise prompts, restricting its ability to handle more extensive and detailed inputs.

  • Converting code from one programming language to another

Claude-3-Haiku proved good at converting code between programming languages, demonstrating an impressive ability to accurately translate code snippets while preserving original functionality and structure.

  • Generating new code

Claude-3-Haiku performs well in generating new code, comparable to other Claude-3 models. It can produce code snippets based on given requirements or specifications, providing a useful starting point for developers.

  • Privacy and security of the Generative AI tool

As with other Anthropic models, you need to consider the privacy of your data, although according to official documentation, Claude 3 Haiku prioritizes enterprise-grade security and robustness. Also, keep in mind that security policies may vary for different Anthropic models.

Best tip

Be aware of Claude-3-Haiku’s capabilities: Claude-3-Haiku is geared toward short prompts and quick responses. It is not designed for complex tasks like converting an entire project from one programming language to another.

Overall

Its fast response time is a notable advantage, but its performance suffers when dealing with larger prompts and more intricate tasks. Other tools or manual analysis may prove more effective in analyzing project architecture and data flows. However, Claude-3-Haiku can be a valuable asset in a developer’s toolkit for straightforward code conversion and generation tasks.

  • Possibilities 2/5
  • Correctness 2/5
  • Privacy 3/5
  • Maturity 2/5

Overall score: 2/5


Gemini 1.5 Flash

Gemini 1.5 Flash represents Google’s commitment to advancing AI technology; it is designed to handle a wide range of natural language processing tasks, from text generation to complex data analysis. Google presents Gemini Flash as a lightweight, fast, and cost-efficient model featuring multimodal reasoning and a breakthrough long context window of up to one million tokens.

  • Analyzing project architecture

Gemini Flash’s performance in analyzing project architecture was found to be suboptimal. The AI tool struggled to provide concrete and actionable insights, often generating abstract and high-level observations instead.

  • Analyzing data flows

It effectively identified and traced the flow of data between different components and modules, offering developers valuable insights into how information is processed and transformed throughout the system. This capability aids in understanding the existing codebase and identifying potential bottlenecks or inefficiencies. However, the effectiveness of data flow analysis may vary depending on the project’s complexity and size.

  • Generating historical backlog

Gemini Flash can synthesize meaningful epics that capture overarching goals and functionalities required for the project by analyzing architectural components, dependencies, and interactions within a software system. However, it may fall short of providing granular acceptance criteria and detailed business requirements. The generated epics often lack the precision and specificity needed for effective backlog management and task execution, and it struggles to generate Jira user stories.

  • Converting code from one programming language to another

Gemini Flash showed promising results in converting code from one programming language to another, particularly when translating from C# to Java Spring Boot. It successfully mapped and transformed language-specific constructs, such as syntax, data types, and control structures. However, limitations exist, especially when dealing with frameworks or libraries that do not have direct equivalents in the target language.

  • Generating new code

Gemini Flash excels in generating new code, including test cases and additional features, enhancing application reliability and functionality. It analyzed the existing codebase and generated test cases that cover various scenarios and edge cases.

  • Privacy and security of the Generative AI tool

Google was one of the first in the industry to publish an AI/ML privacy commitment, which outlines its belief that customers should have the highest level of security and control over their data stored in the cloud. That commitment extends to Google Cloud Generative AI products. You can set up a Gemini AI model in Google Cloud and use an encrypted TLS connection over the internet to connect from your on-premises environment to Google Cloud.

Best tip

Use prompt engineering: Start by providing the necessary background information or context within the prompt, which helps the model understand the task’s scope and nuances. It’s beneficial to experiment with different phrasing and structures, refining prompts iteratively based on the quality of the outputs. Specifying any constraints or requirements directly in the prompt can further tailor the model’s output to meet your needs.

Overall

By using its AI capabilities in data flow analysis, code translation, and test creation, developers can optimize their workflow and concentrate on strategic tasks. However, it is important to remember that Gemini Flash is optimized for high-speed processing, which makes it less effective for complex tasks.

  • Possibilities 2/5
  • Correctness 2/5
  • Privacy 5/5
  • Maturity 2/5

Overall score: 2/5


Gemini 1.5 Pro

Gemini 1.5 Pro is the more capable of the two Gemini 1.5 models compared here, designed for handling highly complex tasks. While it is the slower of the two, it offers significant capabilities. The model targets professionals and developers needing a reliable assistant for intricate tasks.

  • Analyzing project architecture

Gemini Pro is highly effective in analyzing and understanding the architecture of existing programming projects, surpassing Gemini Flash in this area. It provides detailed insights into project structure and component relationships.

  • Analyzing data flows

The model demonstrates proficiency in analyzing data flows, similar to its performance in project architecture analysis. It accurately traces and understands data movement throughout the codebase, identifying how information is processed and exchanged between modules.

  • Generating historical backlog

By using project architecture as an input, it creates high-level epics that encapsulate main features and functionalities. While it may not generate specific Jira user stories, it excels at providing detailed acceptance criteria and precise details for each epic.

  • Converting code from one programming language to another

The model shows impressive results in code conversion, particularly from C# to Java Spring Boot. It effectively maps and transforms syntax, data structures, and constructs between languages. However, limitations exist when there is no direct mapping between frameworks or libraries.

  • Generating new code

Gemini Pro excels in generating new code, especially for unit and integration tests. It analyzes the existing codebase, understands functionality and requirements, and automatically generates comprehensive test cases.

  • Privacy and security of the Generative AI tool

Like other Gemini models, Gemini Pro comes with advanced security and data governance features, making it well suited to organizations with strict data security requirements.

Best tip

Manage context: Gemini Pro incorporates previous prompts into its input when generating responses. This use of historical context can significantly influence the model’s output and lead to different responses. Include only the necessary information in your input to avoid overwhelming the model with irrelevant details.
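A simple way to keep the accumulated history under control is to trim it before each request. The sketch below keeps only the most recent turns that fit a character budget. Dropping the oldest turns first is an assumption; in practice you may prefer to keep turns by relevance, and the budget would be set from the model's token limit rather than in characters.

```python
def trim_history(turns: list[str], max_chars: int) -> list[str]:
    """Keep the most recent turns whose combined length fits within max_chars."""
    kept: list[str] = []
    used = 0
    # Walk backwards from the newest turn and stop at the first one that does not fit.
    for turn in reversed(turns):
        if used + len(turn) > max_chars:
            break
        kept.append(turn)
        used += len(turn)
    kept.reverse()  # restore chronological order
    return kept
```

For instance, with three turns of 50, 30, and 20 characters and a budget of 60, only the last two turns are kept.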

Overall

Gemini Pro shows remarkable capabilities in areas such as project architecture analysis, data flow understanding, code conversion, and new code generation. However, there may be instances where the AI encounters challenges or limitations, especially with complex or highly specialized codebases. As such, while Gemini Pro offers significant advantages, developers should remain mindful of its current boundaries and use human expertise when necessary.

  • Possibilities 4/5
  • Correctness 3/5
  • Privacy 5/5
  • Maturity 3/5

Overall score: 4/5


LLM comparison summary

Figure: LLM comparison
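The scores given in the per-model sections can be collected into one table for a side-by-side view. The sketch below only restates the reported numbers; the article does not say how the overall score is derived from the four criteria, so nothing is recomputed here. The function names and column abbreviations are illustrative.

```python
# Scores as reported in the per-model sections:
# (possibilities, correctness, privacy, maturity, overall), each out of 5.
SCORES = {
    "GPT-4o": (3, 3, 5, 4, 4),
    "Claude-3-Sonnet": (3, 3, 3, 3, 3),
    "Claude-3-Opus": (4, 4, 3, 4, 4),
    "Claude-3-Haiku": (2, 2, 3, 2, 2),
    "Gemini 1.5 Flash": (2, 2, 5, 2, 2),
    "Gemini 1.5 Pro": (4, 3, 5, 3, 4),
}


def format_table(scores: dict[str, tuple[int, int, int, int, int]]) -> str:
    """Render the scores as a fixed-width plain-text table."""
    rows = [f"{'Model':<18}{'Poss.':>6}{'Corr.':>6}{'Priv.':>6}{'Mat.':>6}{'Overall':>9}"]
    for model, (poss, corr, priv, mat, overall) in scores.items():
        rows.append(f"{model:<18}{poss:>6}{corr:>6}{priv:>6}{mat:>6}{overall:>9}")
    return "\n".join(rows)


def top_models(scores: dict[str, tuple[int, int, int, int, int]]) -> list[str]:
    """Return the models that share the highest reported overall score."""
    best = max(row[4] for row in scores.values())
    return [model for model, row in scores.items() if row[4] == best]


if __name__ == "__main__":
    print(format_table(SCORES))
    print("Highest overall:", ", ".join(top_models(SCORES)))
```

On the reported numbers, GPT-4o, Claude-3-Opus, and Gemini 1.5 Pro share the highest overall score of 4/5, with the privacy column being the main difference between them.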

Embrace an AI-driven approach to legacy code modernization

Generative AI offers practical support for rewriting legacy systems. While tools like GPT-4o and Claude-3-opus can’t fully automate the process, they excel in tasks like analyzing codebases and refining requirements. Combined with advanced platforms for data analysis and workflows, they help create a more efficient and precise redevelopment process.

This synergy allows developers to focus on essential tasks, reducing project timelines and improving outcomes.

Looking to rewrite legacy systems?

We can help you leverage Generative AI alongside cutting-edge platforms to streamline code analysis, enhance documentation, and optimize workflows.

Contact us to get started.


