# **The Agentic-Monolith: A Framework for Implementing and Governing Multi-Agent Software Teams**
## **Executive Summary**
This report presents a framework for implementing, governing, and empirically validating an autonomous multi-agent software development team. The proposed system is architected to build and maintain applications using a Vertical Slice Architecture (VSA) within a Modular Monolith (MMA). This combination suits autonomous agents, which excel at small, self-contained tasks, a characteristic VSA provides by design.
Governance is achieved through a "Human-on-the-Loop" (HOTL) model, where human oversight shifts from manual code review to high-leverage plan approval and policy management. The query's specified enforcer, "Arela," is addressed by implementing its *functional intent* (a robust, automated policy engine) using the industry-standard Open Policy Agent (OPA). OPA is embedded within the CI/CD pipeline to provide immutable, automated guardrails against architectural drift, security vulnerabilities, and quality degradation.
The AI team structure is modeled on *Team Topologies*, inverting Conway's Law to use the agents' communication structure as a tool to enforce architectural boundaries. Validation of this system must extend beyond traditional DORA metrics to include AI-specific indicators, such as "Human Override Rate", and, most critically, the measurement of human *cognitive load* to prevent the "review overload" bottleneck identified in recent AI adoption research.
## **1\. Literature Synthesis: Foundations**
### **1.1. Team Design & Topology**
The design of an autonomous AI team must directly reflect the software architecture it is intended to build. This approach inverts the traditional observation of Conway's Law (system design follows human communication structures) and uses it as a prescriptive tool. By designing the agent team's communication protocols and roles *first*, we enforce the desired architecture.
A modern framework for this is *Team Topologies*. Prior work on applying Team Topologies to AI agents proposes a direct mapping from these human-centric topologies to AI agent roles:
* **Stream-Aligned Teams** map to **Task-Specialized AI Agents**. These are the "Developer" agents, each with a domain bounded by a specific module in the Modular Monolith.
* **Platform Teams** map to **Infrastructure AI Agents**. This is the core governance agent. It owns the CI/CD pipeline, manages API contracts, and—critically—enforces the OPA policies.
* **Complicated-Subsystem Teams** map to **Expert AI Agents**. These are specialized agents for complex, cross-cutting concerns (e.g., a "Fraud Detection AI" or "Security Audit AI") that own a single, complex VSA slice or module.
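
The mapping above can be made concrete as data. The following is a minimal sketch, not part of any cited framework: the `AgentRole` class, the `files_outside_ownership` helper, and the `modules/...` directory layout are all hypothetical names chosen for illustration. It shows how assigning each agent a set of owned paths lets the team structure itself flag work that strays across a module boundary.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import PurePosixPath


class Topology(Enum):
    """Team Topologies types and the agent kind each maps to in this report."""
    STREAM_ALIGNED = "task-specialized agent"
    PLATFORM = "infrastructure agent"
    COMPLICATED_SUBSYSTEM = "expert agent"


@dataclass(frozen=True)
class AgentRole:
    name: str
    topology: Topology
    owned_paths: tuple[str, ...]  # hypothetical repo-relative directories

    def may_edit(self, file_path: str) -> bool:
        """Return True if file_path lies inside one of the owned directories."""
        path = PurePosixPath(file_path)
        return any(path.is_relative_to(PurePosixPath(p)) for p in self.owned_paths)


def files_outside_ownership(role: AgentRole, changed_files: list[str]) -> list[str]:
    """List the changed files that the given agent does not own."""
    return [f for f in changed_files if not role.may_edit(f)]
```

A "Developer" agent for a hypothetical billing module would be declared as `AgentRole("billing-dev", Topology.STREAM_ALIGNED, ("modules/billing",))`. Any non-empty result from `files_outside_ownership` is a signal that the agent is working outside its bounded domain.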
Frameworks like MetaGPT provide a "Software Company" metaphor for this team, with roles like Product Manager, Architect, and Engineer coordinated by Standard Operating Procedures (SOPs).
For governance, a "Human-in-the-Loop" (HITL) model, where humans validate every step, is unscalable. The literature supports a "Human-on-the-Loop" (HOTL) model. In an HOTL framework, the human operator does not micromanage; they provide oversight. This shifts the human's role from a *code reviewer* to a *policy auditor* and *plan approver*, a scalable pattern for governing autonomous systems.
### **1.2. Architecture–Agent Integration**
The choice of a Modular Monolith (MMA) and Vertical Slice Architecture (VSA) is a prerequisite for successful agent-driven development. An MMA provides operational simplicity and a single deployment unit, while VSA structures work into "distinct requests". VSA's core principle, "minimize coupling between slices, and maximize coupling in a slice", creates exactly the small, self-contained, context-rich units of work at which AI agents excel. The VSA/MMA combination is complementary: the MMA defines the *macro-boundaries* (the modules), while VSA defines the *micro-boundaries* (the features within each module).
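
To illustrate what a single slice looks like as a unit of work, here is a minimal sketch assuming a hypothetical "create invoice" feature inside a hypothetical billing module. The request type, response type, handler, and in-memory store are all invented for illustration; the point is only that everything the feature needs lives together and nothing reaches into another slice.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CreateInvoiceRequest:
    """The "distinct request" that defines this slice (hypothetical)."""
    customer_id: str
    amount_cents: int


@dataclass(frozen=True)
class CreateInvoiceResponse:
    invoice_id: str


@dataclass
class InvoiceStore:
    """In-memory stand-in for the slice's own persistence code."""
    rows: dict = field(default_factory=dict)

    def insert(self, customer_id: str, amount_cents: int) -> str:
        invoice_id = f"inv-{len(self.rows) + 1}"
        self.rows[invoice_id] = (customer_id, amount_cents)
        return invoice_id


def handle_create_invoice(
    request: CreateInvoiceRequest, store: InvoiceStore
) -> CreateInvoiceResponse:
    """Validation, business rule, and persistence for one feature in one place."""
    if request.amount_cents <= 0:
        raise ValueError("amount_cents must be positive")
    invoice_id = store.insert(request.customer_id, request.amount_cents)
    return CreateInvoiceResponse(invoice_id=invoice_id)
```

Because the slice depends only on its own types, an agent can be handed this one unit with full context and without needing to load the rest of the module.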
For agents to collaborate, they require "agent-ready" APIs. Agents cannot "read between the lines" of poor documentation or navigate complex, human-centric onboarding. Therefore, a contract-first approach using OpenAPI is the *lingua franca* for all inter-agent and inter-module communication.
To enforce these contracts, the framework integrates AI-augmented contract testing with tools like Pactflow. This creates a self-validating loop:
1. An "Architect Agent" generates an OpenAPI specification.
2. A "Provider Agent" implements the API.
3. A "Consumer Agent" uses Pactflow's AI capabilities to generate a consumer contract test against the spec.
4. The Pactflow broker verifies the contract *before* the CI/CD pipeline executes.
This process, combined with tools to prevent architectural drift, ensures the MMA's modularity is actively maintained by the agents themselves.
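
The core idea behind the verification step in that loop can be sketched without any contract-testing tooling. The example below is a simplified stand-in, not the Pact or PactFlow API: the `EXPECTED_FIELDS` contract and the `contract_violations` helper are hypothetical, and a real contract would be derived from the OpenAPI specification rather than written by hand.

```python
# Hypothetical consumer expectation for the response of GET /invoices/{id}.
EXPECTED_FIELDS = {"id": str, "amount_cents": int, "status": str}


def contract_violations(expected_fields: dict, response: dict) -> list[str]:
    """Compare a provider response against the fields a consumer relies on."""
    problems = []
    for name, expected_type in expected_fields.items():
        if name not in response:
            problems.append(f"missing field: {name}")
        elif not isinstance(response[name], expected_type):
            problems.append(
                f"wrong type for {name}: expected {expected_type.__name__}, "
                f"got {type(response[name]).__name__}"
            )
    return problems
```

An empty list means the provider still satisfies the consumer. Extra fields in the response are deliberately tolerated, since a consumer-driven check only cares about what the consumer actually reads.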
### **1.3. Governance & Reliability**
The query specifies "Arela" as the policy enforcer. In the provided materials the term is ambiguous: the sources it points to either describe unrelated network or password tools or were inaccessible. However, the *function* of "Arela" (a universal, code-based policy engine for CI/CD governance) is critical. This report substitutes the industry-standard Open Policy Agent (OPA) to fulfill this role.
OPA, a CNCF-graduated project, provides a high-level declarative language, Rego, to "specify policy as code". This is the central mechanism for AI governance. The "Platform Agent" will embed opa eval commands into the CI/CD pipeline to act as an immutable "immune system" for the codebase.
Key policies include:
* **Architectural Drift Prevention:** The OPA policy will fail any build where an AI-generated commit introduces a dependency that crosses MMA module boundaries (e.g., by parsing import statements).
* **Supply Chain Security:** OPA will validate package.json or other manifests against an allow-list, *proactively blocking* "AI Package Hallucinations", a documented risk where agents invent dependency names that do not exist and that attackers can register as malicious packages.
* **Quality Governance:** Enforcing minimum test coverage and conventional commit formats.
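
In the proposed system these rules would be written in Rego and evaluated by OPA. The sketch below is a Python stand-in that shows only the logic of the first two checks, under stated assumptions: each MMA module is a top-level Python package, the function names are hypothetical, and the allow-list is a plain set of package names.

```python
import ast
import json


def cross_module_imports(source: str, own_module: str, all_modules: set[str]) -> list[str]:
    """Return absolute imports in `source` that target a different MMA module."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # relative imports stay inside the package
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in all_modules and top_level != own_module:
                violations.append(name)
    return violations


def unapproved_dependencies(package_json_text: str, allow_list: set[str]) -> list[str]:
    """Return declared dependencies that are not on the allow-list."""
    manifest = json.loads(package_json_text)
    declared = {
        **manifest.get("dependencies", {}),
        **manifest.get("devDependencies", {}),
    }
    return sorted(name for name in declared if name not in allow_list)
```

A pipeline step would fail the build whenever either function returns a non-empty list. A hallucinated package name is caught by the second check simply because it cannot already be on the allow-list.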
To ensure reliability, the system must include "circuit breakers". These are automated safety controls, including threshold-based cutoffs (e.g., on token usage or error rates), HOTL escalation triggers for human approval, and auto-rollback triggers.
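
A threshold-based circuit breaker of this kind can be sketched in a few lines. The names (`Limits`, `Action`, `evaluate`) and the specific thresholds are assumptions made for illustration; the source does not prescribe values or an ordering of the checks.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    ESCALATE = "escalate_to_human"
    ROLLBACK = "rollback"


@dataclass(frozen=True)
class Limits:
    max_tokens: int             # token budget before a human is consulted
    max_error_rate: float       # error rate that triggers HOTL escalation
    rollback_error_rate: float  # error rate that triggers automatic rollback


def evaluate(tokens_used: int, errors: int, attempts: int, limits: Limits) -> Action:
    """Decide what the orchestrator should do after the latest agent step."""
    error_rate = errors / attempts if attempts else 0.0
    if error_rate >= limits.rollback_error_rate:
        return Action.ROLLBACK
    if tokens_used > limits.max_tokens or error_rate > limits.max_error_rate:
        return Action.ESCALATE
    return Action.CONTINUE
```

Rollback is checked first so that the most severe condition always wins when several thresholds are crossed at once.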
Finally, auditability is achieved via "Explainable DevOps". Instead of a human reviewing thousands of lines of AI-generated code, the HOTL overseer reviews a high-level "XAI Log", which provides a transparent, auditable record of the agent's decisions, test results, and policy validations.
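
The source does not define a format for the XAI Log, so the following is only one possible shape: a hypothetical `XaiLogEntry` record holding the three things the overseer is said to review (decisions, test results, policy validations), serialized as one JSON object per line.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class XaiLogEntry:
    """One auditable record per agent decision (all field names are assumptions)."""
    agent: str
    slice_name: str
    decision: str
    rationale: str
    tests_passed: int
    tests_failed: int
    policy_results: dict[str, bool] = field(default_factory=dict)

    @property
    def needs_attention(self) -> bool:
        """True when a test failed or any policy check did not pass."""
        return self.tests_failed > 0 or not all(self.policy_results.values())


def to_json_line(entry: XaiLogEntry) -> str:
    """Serialize an entry as a single JSON line suitable for an append-only log."""
    return json.dumps(asdict(entry), sort_keys=True)
```

Filtering on `needs_attention` lets the overseer skim only the entries that warrant a closer look rather than reading every record.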
### **1.4. Empirical Evaluation Framework**
A robust validation blueprint is required to measure success and avoid common pitfalls. The 2025 DORA "State of AI-assisted Software Development" report notes that AI acts as an "amplifier", and analysis of its impact shows that while individual tasks accelerate, AI can create "review overload," increasing pull request size and review time.
Therefore, measuring DORA metrics alone is insufficient. The experimental blueprint must be a longitudinal A/B study (Human Team vs. Hybrid AI Team) measuring a balanced scorecard:
1. **DORA Metrics:**
    * **Velocity:** Deployment Frequency, Lead Time for Changes.
    * **Stability:** Change Failure Rate, Time to Restore Service (MTTR).
2. **AI-Specific Indicators:**
    * **Human Override Rate:** The percentage of agent-proposed plans or commits that are rejected by the human overseer. This is the primary metric for agent trust and capability.
    * **Policy Violation Rate:** The frequency of agent commits that are automatically blocked by OPA. This measures the agent's adherence to guardrails.
3. **Human-Centric Metrics:**
    * **Team Cognitive Load:** This is the most critical qualitative metric. If the AI team reduces developer load but creates an unsustainable cognitive bottleneck for the human architects (the overseers), the system has failed. This can be measured via surveys and qualitative interviews.
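
The four DORA metrics in the scorecard can be derived from a simple log of deployments. This is a minimal sketch under stated assumptions: the `Deployment` record and `dora_summary` function are hypothetical, lead time is summarized by its median, and time to restore by its mean. Other aggregation choices are equally defensible.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None


def dora_summary(deployments: list[Deployment], period_days: int) -> dict:
    """Compute the four DORA metrics for one observation period."""
    if not deployments:
        raise ValueError("at least one deployment is required")
    lead_seconds = [(d.deployed_at - d.committed_at).total_seconds() for d in deployments]
    failures = [d for d in deployments if d.failed]
    restore_seconds = [
        (d.restored_at - d.deployed_at).total_seconds()
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deployment_frequency_per_week": len(deployments) / (period_days / 7),
        "median_lead_time_hours": median(lead_seconds) / 3600,
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_restore_hours": (
            sum(restore_seconds) / len(restore_seconds) / 3600 if restore_seconds else None
        ),
    }
```

Running the same function over each team's deployment log keeps the comparison between the human and hybrid teams on identical definitions.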
### **1.5. Limitations, Risks, and Ethical Considerations**
The primary risks of this system are hallucination, security, and accountability.
* **Hallucination:** Agents may generate code that is plausible but incorrect, or fabricate non-existent dependencies. This is mitigated by retrieval-augmented generation (RAG), OPA policies, and rigorous automated testing.
* **Security:** AI-generated code has been shown to contain vulnerabilities. This requires specialized "Expert AI Agents" for security and OPA policies that enforce secure coding patterns.
* **Accountability:** The system raises complex ethical and legal questions about authorship and liability. If an AI-generated VSA slice causes a production outage, accountability is diffused between the agent, its training data, and the human policy author.
A critical industry data point must be considered: Google's Service Weaver framework, which had the *exact goal* of "write as a modular monolith and deploy it as a set of microservices", was discontinued in late 2024. This suggests that a hybrid *deployment* model is exceedingly complex. The framework proposed here is more robust because it *avoids* this complexity, adhering to a *true* MMA with a single, unified deployment unit, thereby realizing the "operational simplicity" that the Service Weaver model failed to achieve.
## **2\. Comparative Table: Human vs. AI vs. Hybrid Team Models**
| Dimension | Traditional Human Team | Fully Autonomous Team (Theoretical) | Hybrid "Human-on-the-Loop" (HOTL) Team (Proposed) |
| :---- | :---- | :---- | :---- |
| **Coordination Model** | Synchronous (meetings, stand-ups) and asynchronous (PRs, Slack). High communication overhead. | Agent-to-Agent (A2A) protocols. Coordination is programmatic, based on pre-defined SOPs. | Policy-gated orchestration. Agents run autonomously until they hit a policy gate (OPA) or an escalation trigger (HOTL). |
| **Efficiency (DORA Velocity)** | Variable. Limited by human cognitive capacity, context switching, and manual processes. | Extremely high (theoretically). Capable of 24/7/365 development. | High, but gated. Velocity is determined by the speed of automated policy checks and the (low) frequency of human plan approvals. |
| **Quality (DORA Stability)** | Variable. Depends on human discipline, manual reviews, and test culture. Prone to human error. | Consistent, but *brittle*. Quality is 100% (if policies are perfect) or 0% (if a hallucination occurs). No "common sense." | High and resilient. Quality is enforced *automatically* by OPA policies (architecture, testing, security) and contract tests. |
| **Governance Mechanism** | Manual pull requests (PRs). Peer review. Architectural review boards. Slow, expensive, and inconsistent. | Automated policy enforcement only. OPA is the *only* governor. | **Policy-as-Code (OPA) \+ XAI Log Review.** OPA governs all code. Humans govern the *plan* and *audit* the XAI logs. |
| **Key Risks** | Cognitive load, burnout, architectural drift, knowledge silos, slow delivery. | Catastrophic hallucination, prompt drift, security breaches, architectural "lock-in" to a bad policy, lack of accountability. | **Review overload / Cognitive load shift.** The human architect (overseer) becomes the bottleneck. System failure if policies are poorly written. |
| **Human Cognitive Load** | **High & Diffuse.** Distributed across all team members (developers, reviewers, PMs). | **Low.** Humans are out of the loop. Load shifts to (infrequent) incident response. | **High & Focused.** Load is *removed* from developers and *concentrated* on the few human architects writing policies and reviewing plans. |
## **3\. Proposed Implementation Framework**
This framework provides a phased-in approach to deploying the autonomous agent team, prioritizing safety and measurement.
* **Phase 1: Architectural & Policy Foundation (The "Chassis & Rulebook")**
    1. **Define Architecture:** Formally establish the MMA module boundaries (e.g., as separate projects/folders) and define the VSA request templates.
    2. **Establish Contracts:** Mandate an OpenAPI-first process for any new feature that crosses an MMA module boundary.
    3. **Implement Policy Engine:** Install Open Policy Agent (OPA) as the "Arela" enforcer. Integrate opa eval into the CI/CD pipeline, configured to fail builds on policy violation.
    4. **Author Initial Policies:** Author and version-control the initial Rego policy set (e.g., architecture.rego, security.rego).
        * architecture.rego: Fails build if a VSA slice in module\_A imports code from module\_B.
        * security.rego: Fails build if package.json contains dependencies *not* in a blessed allow-list (mitigates AI package hallucinations).
        * quality.rego: Fails build if test coverage for the VSA slice is \< 90%.
* **Phase 2: Agent Team Deployment (Hybrid HOTL Model)**
    1. **Deploy Platform Agent:** Instantiate the "Platform Agent" with the sole responsibility of managing and executing the OPA-gated CI/CD pipeline.
    2. **Deploy Development Agents:** Instantiate "Developer" and "Tester" Agents for a *single, non-critical VSA slice*.
    3. **Execute First Slice:** The agents operate in a strict HOTL model.
        * An "AI Planner" proposes a task breakdown.
        * A human architect *approves* the plan.
        * The agents generate code and tests.
        * The OPA-gated pipeline runs.
        * A human architect *manually reviews the code and the OPA result* before approving the merge.
* **Phase 3: Scaling to Measured Autonomy (The "Inner Loop")**
    1. **Automate Inner Loop:** Agents are now trusted to execute the full inner loop (plan, code, test, commit) for a new VSA slice autonomously.
    2. **Shift Governance Checkpoint:** The OPA pipeline automatically gates the commit.
    3. **Implement XAI Log Review:** The human architect's role *officially* shifts. They no longer review the raw code. They review the high-level, auditable "XAI Log" and the OPA policy decision log. The merge is approved based on a *passing policy check* and a *sensible plan*.
* **Phase 4: Full-Loop Autonomy (Target State)**
    1. **Grant Trusted Status:** Agents are granted permission to merge "green" (OPA-passed, contract-verified) builds to a staging environment autonomously.
    2. **Activate Circuit Breakers:** The automated "circuit breakers" and rollback triggers are fully activated. If DORA metrics (e.g., Change Failure Rate) spike in staging, the system automatically rolls back and escalates to the human (HOTL).
    3. **Evolve Human Role:** Human oversight shifts entirely to *systems-level review*: "Are our OPA policies correct?", "Is the agent's Human Override Rate decreasing?", "Is the cognitive load on our auditors manageable?"
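
The way the merge checkpoint moves across the phases can be summarized as a single decision function. This is a sketch under simplifying assumptions rather than a prescribed implementation: `Phase`, `BuildStatus`, and `may_merge` are hypothetical names, and the sketch requires a "green" build (OPA-passed and contract-verified) in every phase even though the text only defines that term for Phase 4.

```python
from dataclasses import dataclass
from enum import IntEnum


class Phase(IntEnum):
    FOUNDATION = 1         # policies and contracts only, no agents yet
    HYBRID_HOTL = 2        # human approves plan and reviews code
    MEASURED_AUTONOMY = 3  # human reviews the XAI log instead of code
    FULL_LOOP = 4          # agents merge green builds to staging


@dataclass(frozen=True)
class BuildStatus:
    opa_passed: bool
    contracts_verified: bool
    plan_approved: bool = False
    code_reviewed_by_human: bool = False
    xai_log_reviewed: bool = False


def may_merge(phase: Phase, build: BuildStatus) -> bool:
    """Return True if an agent-produced build may be merged in the given phase."""
    green = build.opa_passed and build.contracts_verified
    if not green:
        return False
    if phase == Phase.HYBRID_HOTL:
        return build.plan_approved and build.code_reviewed_by_human
    if phase == Phase.MEASURED_AUTONOMY:
        return build.xai_log_reviewed
    if phase == Phase.FULL_LOOP:
        return True  # autonomous merge, to staging only
    return False  # Phase 1 deploys no agents, so nothing is merged on their behalf
```

The policy gate never relaxes from one phase to the next. Only the human checkpoint changes, from code review to log review to none at merge time.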
## **4\. Experimental Validation Blueprint**
**Methodology:** A comparative, longitudinal A/B study.
* **Control Group (Team H):** One traditional human development team (e.g., 5 engineers) tasked with building VSA slices in the MMA.
* **Experimental Group (Team A):** The OPA-governed hybrid agent team (e.g., 2 human overseers/architects, 3 AI agents) following the Implementation Framework.
* **Duration:** 6 months, measuring metrics on a weekly basis.
* **Hypothesis:** Team H will have a faster *initial* Lead Time, but Team A will surpass it in Velocity (Deployment Frequency, Lead Time) within 3 months while maintaining a lower Change Failure Rate. Team A's human overseers will report high, but focused, cognitive load, which will be the primary scaling bottleneck.
### **4.1. Quantitative Success Metrics**
* **DORA Metrics:**
    * *Lead Time for Changes:* Time from commit (or plan approval for Team A) to production.
    * *Deployment Frequency:* Deploys per day/week.
    * *Change Failure Rate:* Percentage of deployments causing a failure in production.
    * *Time to Restore Service (MTTR):* Time to recover from a failure.
* **AI-Specific Indicators:**
    * *Human Override Rate (HOR):* (Required Human Rejections) / (Total Agent-Proposed Plans). A declining HOR indicates increasing agent trust and autonomy.
    * *Policy Violation Rate (PVR):* (OPA-Blocked Commits) / (Total Agent Commits). A high PVR indicates agent "drift" or poor policy design.
* **Economic Indicators:**
    * *Cost Per Slice:* (Total cost of human-hours \+ compute) / (VSA slices delivered).
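
The three ratios defined above translate directly into code. The function names and the example figures in the usage note are illustrative assumptions; only the formulas come from the definitions in this section.

```python
def _ratio(numerator: float, denominator: float) -> float:
    """Divide, rejecting an empty denominator instead of returning a misleading 0."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


def human_override_rate(rejected_plans: int, total_plans: int) -> float:
    """HOR = human-rejected plans / total agent-proposed plans."""
    return _ratio(rejected_plans, total_plans)


def policy_violation_rate(blocked_commits: int, total_commits: int) -> float:
    """PVR = OPA-blocked commits / total agent commits."""
    return _ratio(blocked_commits, total_commits)


def cost_per_slice(
    human_hours: float, hourly_rate: float, compute_cost: float, slices_delivered: int
) -> float:
    """Cost Per Slice = (human-hours cost + compute cost) / slices delivered."""
    return _ratio(human_hours * hourly_rate + compute_cost, slices_delivered)
```

For example, 3 rejected plans out of 20 gives an HOR of 15%, and 120 overseer hours at a rate of 100 per hour plus 3,000 in compute across 10 delivered slices gives a cost of 1,500 per slice.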
### **4.2. Qualitative Success Metrics**
* **Team Cognitive Load:**
    * *Method:* Use validated cognitive load surveys (e.g., NASA-TLX) and structured interviews with both Team H (developers) and Team A (overseers).
    * *Objective:* To test the "Cognitive Load Shift" hypothesis. Success is defined as Team A's overseers reporting their *focused* load is more manageable and sustainable than the *diffuse* load reported by Team H's developers.
* **Developer Satisfaction (DevEx):**
    * *Method:* Measure developer satisfaction.
    * *Objective:* For Team A, this measures the satisfaction of the *human architects* with their new role as policy-driven overseers.
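
To compare cognitive load between the two teams, survey responses need to be reduced to a comparable score. The sketch below assumes the unweighted "Raw TLX" variant of NASA-TLX, in which the six subscale ratings (each on a 0 to 100 scale) are averaged. The function names and dictionary keys are hypothetical, and the weighted variant of the instrument is not shown.

```python
from statistics import mean

TLX_SUBSCALES = (
    "mental_demand",
    "physical_demand",
    "temporal_demand",
    "performance",
    "effort",
    "frustration",
)


def raw_tlx(ratings: dict[str, float]) -> float:
    """Unweighted NASA-TLX score: the mean of the six subscale ratings."""
    missing = [s for s in TLX_SUBSCALES if s not in ratings]
    if missing:
        raise ValueError(f"missing subscales: {missing}")
    for subscale in TLX_SUBSCALES:
        if not 0 <= ratings[subscale] <= 100:
            raise ValueError(f"{subscale} must be between 0 and 100")
    return mean(ratings[s] for s in TLX_SUBSCALES)


def team_mean_tlx(responses: list[dict[str, float]]) -> float:
    """Average Raw TLX score across all respondents on one team."""
    return mean(raw_tlx(r) for r in responses)
```

Computing `team_mean_tlx` separately for Team H's developers and Team A's overseers at each survey interval gives the two series that the "Cognitive Load Shift" hypothesis compares.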
## **5\. Bibliography**
*Note: The following represents a selection of key sources used in this report, formatted in APA style based on available information.*
(2025). *AI-Augmented CI/CD Pipelines: From Code Commit to Production with Autonomous Decisions*. arXiv.
(2025). *DORA Research: 2025*. DORA.
(2025). *Explainable Artificial Intelligence Techniques for Software Development Lifecycle: A Phase-specific Survey*. arXiv.
(2025). *State of AI-assisted Software Development 2025*. Google Cloud.
Aitelharraj, A. (n.d.). *Team Topologies applied to AI Agents: Conway's Law for Agentic AI*. Medium.
Bogard, J. (n.d.). *Vertical Slice Architecture*. jimmybogard.com.
Datadog. (n.d.). *What are DORA metrics?* Datadog.
FoundationAgents. (2025). *MetaGPT*. GitHub.
Google. (2024). *Service Weaver*. serviceweaver.dev.
Hoop. (n.d.). *AI governance made simple with Open Policy Agent (OPA)*. hoop.dev.
InfoQ. (2023). *Google has released Service Weaver, an open-source framework for building and deploying distributed applications*. InfoQ.
IT Revolution. (2024). *Team Cognitive Load: The Hidden Crisis in Modern Tech Organizations*. IT Revolution.
Liu, C., Lin, H. Y., & Thongtanunam, P. (2025). *Hallucinations in Code Change to Natural Language Generation: Prevalence and Evaluation of Detection Metrics*. arXiv.
Open Policy Agent. (n.d.). *Introduction*. openpolicyagent.org.
Open Policy Agent. (n.d.). *Policy-Driven CI/CD*. openpolicyagent.org.
PactFlow. (2025). *AI-Augmented Contract Testing*. pactflow.io.
PactFlow. (2025). *Introducing the PactFlow MCP Server: AI-Powered Contract Testing, Now in Your IDE*. pactflow.io.
PactFlow. (n.d.). *How Pact works*. pactflow.io.
Skelton, M., & Pais, M. (2025, January 14). *The future of Team Topologies: when AI agents dominate*. teamtopologies.com.
Syntaxia. (n.d.). *AI Agent Safety: Circuit Breakers for Autonomous Systems*. syntaxia.com.
The New Stack. (n.d.). *Human-on-the-Loop: The New AI Control Model That Actually Works*. thenewstack.io.
Thoughtworks. (n.d.). *Vertical thin slicing*. thoughtworks.com.
Trend Micro. (2024). *The Mirage of AI Programming: Hallucinations and Code Integrity*. Trend Micro.
Red Hat. (n.d.). *When LLMs day dream: Hallucinations and how to prevent them*. Red Hat.
V. K. (2024). *Modular Monoliths: Is This the Trend in Software Architecture?* 2024 IEEE/ACM International Workshop on New Trends in Software Architecture (SATrends).
Vaz, A. P. P., et al. (2024). *Modular Monolith Architecture: A Systematic Grey Literature Review*. MDPI.
Zuplo. (2025, October 3). *AI Week: What Autonomous Agents Actually Need from Your APIs*. zuplo.com.
VFunction. (n.d.). *vFunction*. vfunction.com.
Wang, P., et al. (2024). *Human-in-the-Loop LLM-based Software Development Agent Framework (HULA)*. arXiv.
#### **Works cited**
1. From Monoliths to Composability: Aligning Architecture with AI's Modularity \- Medium, https://medium.com/software-architecture-in-the-age-of-ai/from-monoliths-to-composability-aligning-architecture-with-ais-modularity-55914fc86b16
2. Vertical Slice Architecture \- Jimmy Bogard, https://www.jimmybogard.com/vertical-slice-architecture/
3. Human-on-the-Loop: The New AI Control Model That Actually Works ..., https://thenewstack.io/human-on-the-loop-the-new-ai-control-model-that-actually-works/
4. Introduction | Open Policy Agent, https://openpolicyagent.org/docs
5. Using OPA in CI/CD Pipelines \- Open Policy Agent, https://openpolicyagent.org/docs/cicd
6. Policy-driven continuous integration with Open Policy Agent | by Luc Perkins, https://blog.openpolicyagent.org/policy-driven-continuous-integration-with-open-policy-agent-b98a8748e536
7. Team Topologies applied to AI Agents : Conway's Law for Agentic AI, https://medium.com/@amine.aitelharraj/-3eb5cd3dbcea
8. Conway's law \- Wikipedia, https://en.wikipedia.org/wiki/Conway%27s\_law
9. What Are DORA Metrics? \- Datadog, https://www.datadoghq.com/knowledge-center/dora-metrics/
10. AI-Augmented CI/CD Pipelines: From Code Commit to ... \- arXiv, https://arxiv.org/pdf/2508.11867
11. Team Cognitive Load: The Hidden Crisis in Modern Tech ..., https://itrevolution.com/articles/team-cognitive-load-the-hidden-crisis-in-modern-tech-organizations/
12. DORA Report 2025 Key Takeaways: AI Impact on Dev Metrics, https://www.faros.ai/blog/key-takeaways-from-the-dora-report-2025
13. Organizational design and Team Topologies after AI \- Thoughtworks, https://www.thoughtworks.com/insights/podcasts/technology-podcasts/organizational-design-team-topologies-ai
14. FoundationAgents/MetaGPT: The Multi-Agent Framework ... \- GitHub, https://github.com/FoundationAgents/MetaGPT
15. What Is Human In The Loop (HITL)? \- IBM, https://www.ibm.com/think/topics/human-in-the-loop
16. Human-In-the-Loop Software Development Agents \- arXiv, https://arxiv.org/html/2411.12924v1
17. Human in the Loop vs. Human on the Loop: Navigating the Future of AI \- Serco, https://www.serco.com/na/media-and-news/2025/human-in-the-loop-vs-human-on-the-loop-navigating-the-future-of-ai
18. Modular Monolith Architecture in Cloud Environments: A Systematic ..., https://www.mdpi.com/1999-5903/17/11/496
19. Exploring Software Architecture: Vertical Slice | by Andy MacConnell | Medium, https://medium.com/@andrew.macconnell/exploring-software-architecture-vertical-slice-789fa0a09be6
20. Clean Architecture with Modular Monolith and Vertical Slice | by Eda Belge | Medium, https://medium.com/@eda.belge/clean-architecture-with-modular-monolith-and-vertical-slice-896b7ee22e3e
21. AI Week: What Autonomous Agents Actually Need from Your APIs ..., https://zuplo.com/blog/what-autonomous-agents-actually-need-from-your-apis
22. OpenAPI Contract | OpenAPI Definition file \- 42Crunch, https://42crunch.com/openapi-contract/
23. From Specification to Service: Accelerating API-First Development Using Multi-Agent Systems \- arXiv, https://arxiv.org/html/2510.19274v1
24. AI-Augmented Contract Testing | PactFlow, https://pactflow.io/ai/
25. Introducing the PactFlow MCP Server: AI-Powered Contract Testing, Now in Your IDE, https://pactflow.io/blog/pactflow-mcp-server/
26. What is Pact contract testing & how does it work? | PactFlow, https://pactflow.io/how-pact-works/
27. vFunction | AI-Driven Architectural Modernization, https://vfunction.com/
28. Enforcer | Official Products and Services for Duende IdentityServer and IdentityServer4, https://www.identityserver.com/products/enforcer
29. Active Directory Password Policy Enforcer \- Netwrix, https://netwrix.com/en/products/password-policy-enforcer//
30. URL Filtering | Junos OS \- Juniper Networks, https://www.juniper.net/documentation/us/en/software/junos/interfaces-adaptive-services/topics/topic-map/url-filtering.html
31. Integration of Juniper ATP Cloud and Web Filtering on MX Series Routers | Junos OS, https://www.juniper.net/documentation/us/en/software/junos/interfaces-next-gen-services/interfaces-adaptive-services/topics/topic-map/sky-atp-mx-integration.html
32. AI Governance Made Simple with Open Policy Agent (OPA) \- hoop.dev, https://hoop.dev/blog/ai-governance-made-simple-with-open-policy-agent-opa/
33. When LLMs day dream: Hallucinations and how to prevent them \- Red Hat, https://www.redhat.com/en/blog/when-llms-day-dream-hallucinations-how-prevent-them
34. The Mirage of AI Programming: Hallucinations and Code Integrity | Trend Micro (US), https://www.trendmicro.com/vinfo/us/security/news/vulnerabilities-and-exploits/the-mirage-of-ai-programming-hallucinations-and-code-integrity
35. AI Agent Safety: Circuit Breakers for Autonomous Systems \- Syntaxia, https://www.syntaxia.com/post/ai-agent-safety-circuit-breakers-for-autonomous-systems
36. AI Issues? Take Control with Rubrik Agent Rewind, https://www.rubrik.com/insights/ai-issues-take-control-with-rubrik-agent-rewind
37. \[2505.07058\] Explainable Artificial Intelligence Techniques for Software Development Lifecycle: A Phase-specific Survey \- arXiv, https://arxiv.org/abs/2505.07058
38. Towards an MLOps Architecture for XAI in Industrial Applications \- arXiv, https://arxiv.org/html/2309.12756
39. The Role of Explainable AI in Automated Software Testing: Opportunities and Challenges \- Preprints.org, https://www.preprints.org/frontend/manuscript/ac4c260521783ab5c71424fcc1bb4bfa/download\_pub
40. State of AI-assisted Software Development 2025 \- DORA, https://dora.dev/dora-report-2025
41. DORA | Get Better at Getting Better, https://dora.dev/
42. DORA's software delivery metrics: the four keys, https://dora.dev/guides/dora-metrics-four-keys/
43. Cognitive processes while using Artificial Intelligence at work: a research agenda on challenges and opportunities Processi cogn, https://oaj.fupress.net/index.php/formare/article/download/17122/13856/80903
44. What Are AI Hallucinations? \- IBM, https://www.ibm.com/think/topics/ai-hallucinations
45. When AI Gets It Wrong: Addressing AI Hallucinations and Bias, https://mitsloanedtech.mit.edu/ai/basics/addressing-ai-hallucinations-and-bias/
46. Google Service Weaver Enables Coding as a Monolith and Deploying as Microservices, https://www.infoq.com/news/2023/03/google-weaver-framework/
47. Service Weaver, https://serviceweaver.dev/
48. Modular Monolith: Is This the Trend in Software Architecture? | IEEE ..., https://ieeexplore.ieee.org/document/10669865
49. LLM-Based Multi-Agent Systems for Software Engineering: Literature Review, Vision and the Road Ahead \- arXiv, https://arxiv.org/html/2404.04834v4
50. DORA Metrics: How to measure Open DevOps Success \- Atlassian, https://www.atlassian.com/devops/frameworks/dora-metrics
51. Measuring Developer Productivity in the LLM Era | by Yuji Isobe \- Medium, https://medium.com/@yujiisobe/measuring-developer-productivity-in-the-llm-era-b002cc0b5ab4
52. 2025 DORA State of AI-assisted Software Development Report \- Google Cloud, https://cloud.google.com/resources/content/2025-dora-ai-assisted-software-development-report
53. Hallucinations in Code Change to Natural Language Generation: Prevalence and Evaluation of Detection Metrics \- arXiv, https://arxiv.org/html/2508.08661v1
54. Open Policy Agent, https://openpolicyagent.org/
55. AWS Marketplace: PactFlow for AWS, https://aws.amazon.com/marketplace/pp/prodview-ccsfu3wgfn77k
56. Comprehensive Contract Testing | API Hub, https://pactflow.io/
57. The Future of Team Topologies: When AI Agents Dominate, https://teamtopologies.com/news-blogs-newsletters/2025/1/14/the-future-of-team-topologies-when-ai-agents-dominate
58. Three delivery planning principles for iterating towards the right data product \- Thoughtworks, https://www.thoughtworks.com/en-us/insights/e-books/modern-data-engineering-playbook/delivery-planning-principles