Beyond Vector Similarity: Hierarchical Context-Aware Graph RAG vs Standard RAG in Enterprise Code Migration

Suddhasatwa Bhaumik
Nilesh Jaiswal
Arjit Shukla
Divya Malhotra
Aniket Agrawal
Saurabh Garg
Suchit Puri
Google Cloud India, Google, S. No, AP81, 83, N Main Rd, near Hard Rock Cafe, Koregaon Park Annexe, Mundhwa, Pune, Maharashtra 411036 (2026)

Abstract

As enterprises modernize legacy systems (e.g., migrating monolithic Java architectures to Python microservices), Large Language Models (LLMs) have become instrumental in automated code translation.

However, traditional vector-based Retrieval-Augmented Generation (Standard RAG) struggles with topological relationships, fetching isolated text chunks that frequently sever inheritance chains and lead to high compilation failure rates.

This paper presents a comparative analysis between Standard RAG and a novel Hierarchical Context-Resident Graph (HCRG) methodology. Our pipeline uses tree-sitter for polyglot Abstract Syntax Tree (AST) extraction, maps architectural edges into a Google Cloud Spanner Property Graph, and serializes this structure into a Gemini (on Vertex AI) Context Cache to enable topological, parent-first code translation.
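The parent-first ordering described above can be illustrated with a minimal sketch. It assumes the architectural edges have already been extracted into a plain dictionary; the class names, the edge set, and the helper `parent_first_order` are hypothetical and stand in for the Spanner Property Graph, which is not modeled here. The sketch uses the standard-library `graphlib` module rather than the authors' pipeline.

```python
from graphlib import TopologicalSorter


def parent_first_order(depends_on):
    """Return node names so that every dependency precedes its dependents.

    `depends_on` maps each node to the set of nodes it extends or uses.
    """
    sorter = TopologicalSorter(depends_on)
    return list(sorter.static_order())


# Hypothetical class-level edges (illustrative only, not taken from the paper).
edges = {
    "BaseEntity": set(),
    "Person": {"BaseEntity"},
    "Owner": {"Person"},
    "Pet": {"BaseEntity", "Owner"},
}

order = parent_first_order(edges)
print(order)
```

Translating classes in this order means a parent's translated interface already exists when its children are translated, which is the property a parent-first strategy relies on. A cyclic edge set would make `static_order()` raise `graphlib.CycleError`, so cycles would need separate handling.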

We shift evaluation from naive text overlap to a custom 7-metric framework that measures Software Engineering (SE) utility. Empirical evaluations on the spring-petclinic-genai repository demonstrate significant structural improvements: Graph RAG substantially mitigates dependency loss, reducing the API hallucination rate from 56.4% to 16.2%.

Furthermore, it improves Dependency Resolution Quality (DRQ) from 34.8% to 65.9% and enhances Parent-Child Consistency (PCC) from 26.7% to 45.5%. Interestingly, traditional lexical metrics fail to capture this divergence; both methodologies achieved an identical 91% average CodeBLEU score, effectively masking Standard RAG’s structural failures behind syntactically plausible but broken code.
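The abstract does not define how the API hallucination rate is computed. One plausible reading, offered purely as an assumption, is the fraction of call sites in the generated Python code whose target name is absent from the set of known symbols. The helpers `called_names` and `hallucination_rate`, and the sample snippet, are hypothetical.

```python
import ast


def called_names(source):
    """Collect the function or method name at every call site in `source`."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):
                names.append(func.id)      # plain call: save(owner)
            elif isinstance(func, ast.Attribute):
                names.append(func.attr)    # method call: owner.get_pets()
    return names


def hallucination_rate(source, known_symbols):
    """Assumed definition: share of call sites whose name is not a known symbol."""
    calls = called_names(source)
    if not calls:
        return 0.0
    unknown = [name for name in calls if name not in known_symbols]
    return len(unknown) / len(calls)


# Hypothetical generated code: notify_vet() does not exist in the known API.
generated = (
    "owner = find_owner(1)\n"
    "pets = owner.get_pets()\n"
    "owner.notify_vet()\n"
    "save(owner)\n"
)
known = {"find_owner", "get_pets", "save"}
rate = hallucination_rate(generated, known)
print(rate)
```

A check of this kind inspects symbols rather than token overlap, which suggests how two outputs with the same CodeBLEU score could still differ sharply on a structural metric.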

However, the results indicate that Graph RAG is not strictly superior across all dimensions. Providing the LLM with dense, global structural context introduces new vulnerabilities: Graph RAG suffers a severe degradation in Cyclomatic Complexity Consistency (dropping from Standard RAG’s 71.6% to 46.7%) due to defensive over-engineering by the LLM, alongside a slight drop in Docstring Preservation (67.0% down to 61.0%) caused by prompt attention dilution.
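To make the effect of defensive over-engineering concrete, the sketch below counts decision points in Python code and compares two versions of the same function. Both the simplified complexity count and the min/max ratio used as a "consistency" score are assumptions for illustration; the abstract does not state the formula behind Cyclomatic Complexity Consistency.

```python
import ast

# Node types counted as one extra decision point each (a simplified approximation).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source):
    """Approximate cyclomatic complexity: 1 plus the number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a or b or c` adds two extra paths.
            complexity += len(node.values) - 1
    return complexity


def complexity_consistency(original, translated):
    """Assumed score in (0, 1]: 1.0 means identical complexity."""
    return min(original, translated) / max(original, translated)


lean = "def area(w, h):\n    return w * h\n"

defensive = (
    "def area(w, h):\n"
    "    if w is None or h is None:\n"
    "        raise ValueError('missing dimension')\n"
    "    try:\n"
    "        return w * h\n"
    "    except TypeError:\n"
    "        return 0\n"
)

score = complexity_consistency(
    cyclomatic_complexity(lean), cyclomatic_complexity(defensive)
)
print(score)
```

Under these assumptions, the added null checks and exception handler raise the complexity from 1 to 4, so the score falls to 0.25 even though both versions compute the same result on valid input.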

Ultimately, this research validates that while Graph RAG trades an increase in code complexity for critical reductions in API hallucinations, it offers a substantially more viable and architecturally sound path for automated enterprise codebase modernization.
