There is no shortage of articles declaring that AI has changed everything about software development. Most of them are wrong, or at least imprecise. They conflate what has changed with what people want to have changed, or they describe a future state as if it were the present.
This is not that article. We have been building software at CONFLICT for over thirteen years and building with AI for a meaningful subset of that time. What follows is a practitioner’s assessment of what has actually shifted in how software gets built over the last eighteen months, based on what we have shipped, what has worked, what has failed, and what we have watched the rest of the industry learn the hard way.
The short version: the change is real, but it is not the change most people are talking about. AI has not replaced developers. It has not made software engineering easy. What it has done is change the methodology of building software in ways that are structural, not cosmetic. And organizations that understand this distinction are pulling ahead fast.
AI Is No Longer a Feature. It Is a Methodology.
For most of the last decade, AI in software development meant one thing: adding AI-powered features to products. Recommendation engines, chatbots, image classification, natural language search. AI was a capability you bolted onto an application. The engineering process itself remained unchanged. You still wrote code the same way, reviewed it the same way, tested it the same way, and deployed it the same way.
That framing is obsolete.
The shift that has actually happened is not about adding AI features to products. It is about using AI as the methodology for building products. The engineering process itself has changed. Code generation, test writing, architecture exploration, specification validation, documentation, debugging – these are now AI-assisted activities regardless of whether the product you are building has anything to do with AI.
GitHub’s Octoverse 2025 report found that AI-assisted development has moved beyond early adoption into mainstream practice, with AI coding tools now integrated into the daily workflow of the majority of active developers on the platform. This is not a niche practice anymore. It is how software gets built.
The Stack Overflow 2025 Developer Survey tells a similar story: AI tool adoption among professional developers has crossed the tipping point from optional to expected. Developers who are not using AI tools are now the exception, not the norm, and the gap in perceived productivity between users and non-users continues to widen.
But adoption numbers do not tell you what actually changed in practice. They tell you that people are using tools. What matters is how those tools changed the work.
The Rise of Agentic Coding Tools
The most visible change is the maturation of agentic coding tools. Eighteen months ago, the state of the art was code completion – GitHub Copilot suggesting the next line. That was Level 1 on the maturity model we wrote about previously. Useful, but fundamentally a typing accelerator.
What exists now is categorically different. Claude Code, Cursor, Windsurf, Copilot Workspace, and a growing list of competitors operate as agentic coding environments. They do not suggest the next line. They implement features. They read your codebase, understand its architecture, write code across multiple files, run tests, interpret errors, fix bugs, and iterate until the implementation works.
The difference between autocomplete and agentic coding is the difference between a spell checker and a ghostwriter. One corrects your typos. The other produces the first draft.
This changes the developer’s role. The primary activity shifts from writing code to directing, reviewing, and refining code. The developer becomes an architect and reviewer who spends more time on specification, design decisions, and quality evaluation than on keystroke-level implementation. This is not a theoretical prediction. It is what we observe daily in our own delivery pipeline.
Google DeepMind’s research on AI coding capabilities has demonstrated that frontier models can now handle increasingly complex software engineering tasks, not just isolated function generation but multi-file changes that require understanding system-level context. The gap between what AI can generate and what production systems require is closing, though it has not closed.
The key insight from working with these tools daily: they are not uniformly capable. An agentic tool that excels at implementing a well-specified REST endpoint may struggle with a complex database migration or a nuanced architectural refactor. The skill is not in using the tool. The skill is in knowing what to hand to the tool, what to keep for yourself, and how to verify what comes back.
Context Engineering: The Skill That Actually Matters
We wrote an entire post about the death of prompt engineering and the rise of context engineering. Eighteen months of further practice has only strengthened that position.
Prompt engineering – the art of crafting the perfect instruction to an LLM – was a useful skill when the interaction model was a single request and response. You asked a question, the model answered, and the quality of your question determined the quality of the answer.
Modern AI-assisted development does not work that way. When you are using an agentic coding tool to implement a feature, the “prompt” is a small fraction of what the model sees. The context includes your codebase, your file structure, your recent changes, your test results, your error logs, relevant documentation, architectural conventions, and the accumulated history of the current session. The quality of this context – what is included, what is excluded, how it is structured – determines the quality of the output far more than the words in your instruction.
Anthropic has written extensively about context engineering as a discipline, describing it as the systematic design of the information environment that surrounds an LLM interaction. This framing matches our experience precisely. The teams that produce the best results with agentic tools are not the teams with the best prompts. They are the teams with the best-organized codebases, the clearest architectural documentation, the most disciplined file structures, and the most complete specifications.
Context engineering is a systems discipline, not a writing skill. It encompasses:
- Codebase organization that makes relevant code easy for agents to discover and understand. Clear module boundaries, consistent naming conventions, well-structured directories.
- Specification quality that gives agents unambiguous instructions. We have written about spec-driven development as the backbone of agentic engineering. This is where it pays off.
- Documentation as context. Architecture decision records, API contracts, coding standards documents – these are not just for humans anymore. They are context that agents consume to produce better output.
- Session management. Knowing when to start a fresh session, when to provide additional context, when to break a task into smaller pieces that fit within the agent’s effective context window.
- Retrieval infrastructure. For larger codebases, the ability to surface the right files and the right documentation to the agent at the right time. This is where tools like knowledge graphs and intelligent code search become critical.
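The selection discipline behind these practices can be made concrete. The sketch below packs the most relevant files and documents into a fixed context budget; the names and the greedy character-budget heuristic are our illustration, not any particular tool’s API (real agentic tools budget in tokens and use far richer retrieval).

```python
from dataclasses import dataclass

@dataclass
class ContextItem:
    name: str          # e.g. a file path or document title
    text: str          # the content to place in the agent's context
    relevance: float   # retrieval score (higher = more relevant)

def assemble_context(items: list[ContextItem], budget_chars: int) -> list[ContextItem]:
    """Greedily pack the most relevant items into a character budget.

    Illustrates the core trade-off of context engineering: what you
    exclude matters as much as what you include.
    """
    chosen: list[ContextItem] = []
    used = 0
    for item in sorted(items, key=lambda i: i.relevance, reverse=True):
        if used + len(item.text) <= budget_chars:
            chosen.append(item)
            used += len(item.text)
    return chosen
```

The point of the sketch is the shape of the problem: every candidate document competes for a finite window, so codebase organization and documentation quality directly determine what survives the cut.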
The practical implication: investing in codebase quality, documentation, and specification rigor now has a double return. It helps human developers, and it dramatically improves agent output. Organizations with messy codebases and sparse documentation get worse results from the same AI tools than organizations with clean codebases and thorough documentation. The tools amplify what you already have.
Multi-Model Strategies Are Table Stakes
Twelve months ago, many organizations were still asking “which AI model should we use?” as if the answer were singular. That question is already outdated.
The model landscape has fragmented in a way that makes single-model strategies untenable. Anthropic’s Claude excels at analysis, nuanced reasoning, and long-context tasks. OpenAI’s GPT-4o is strong at structured output and broad general knowledge. Google’s Gemini offers competitive performance with massive context windows and multimodal capabilities. And behind these frontier models sits a growing tier of cost-efficient options – Claude Haiku, GPT-4o-mini, Gemini Flash – that handle routine tasks at a fraction of the cost.
No single model is the best choice for every task. The engineering decision is not which model to use. It is which model to use for which task, and how to route between them.
This is why we built CalliopeAI. We hit this problem in our own work: different tasks in the same pipeline performed better on different models, and we needed an orchestration layer that could handle routing, prompt adaptation, failover, and quality evaluation across providers. The alternative – coupling every application to a single provider – is a bet against the pace of change in this market.
Multi-model routing in practice looks like this:
- Complex reasoning and analysis route to a frontier model (Claude Opus, GPT-4o) where quality justifies the cost.
- Code generation for well-specified tasks routes to mid-tier models that offer a good balance of capability and speed.
- Classification, extraction, and formatting route to cost-efficient models (Haiku, GPT-4o-mini, Flash) that handle structured tasks well at high throughput.
- Fallback chains ensure that if the primary model is unavailable or degraded, the request routes to an alternative without the application failing.
This is not optimization for the sake of optimization. It is a 3-5x cost reduction on AI spend with equivalent or better quality, because you are not paying frontier model prices for tasks that a smaller model handles just as well.
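The routing logic itself can be very small. The sketch below maps task categories to ordered fallback chains; the model tier names and task categories are placeholders for illustration, and a production router would also consult live pricing, latency, and evaluation data.

```python
# Hypothetical tier names and task categories, for illustration only.
ROUTES: dict[str, list[str]] = {
    "reasoning":      ["frontier-large", "frontier-alt"],
    "codegen":        ["mid-tier", "frontier-large"],
    "classification": ["small-fast", "mid-tier"],
}

def route(task_type: str, available: set[str]) -> str:
    """Return the first healthy model in the task's fallback chain.

    Unknown task types default to the reasoning chain, the most
    conservative choice.
    """
    for model in ROUTES.get(task_type, ROUTES["reasoning"]):
        if model in available:
            return model
    raise RuntimeError(f"no available model for task type {task_type!r}")
```

The design choice worth noting is that the fallback chain is data, not code: when a provider degrades or a new model ships, the change is a table edit, not an application redeploy.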
The Death of the Monolithic AI Vendor Relationship
This multi-model reality has a strategic implication that many organizations have not yet internalized: the era of the monolithic AI vendor relationship is over.
For the last two years, a common pattern was for an organization to sign an enterprise agreement with OpenAI or another provider, build everything on that provider’s models, and treat AI as a vendor relationship similar to a cloud provider. One contract, one integration, one dependency.
This was always a risky strategy, and now it is an actively harmful one. The model landscape shifts too fast. In the last eighteen months, we have seen:
- Models that were best-in-class get surpassed within weeks of their release
- Pricing changes that made previously cost-effective architectures expensive
- Provider outages that took down AI-dependent applications for hours
- New model capabilities that unlocked architectural patterns that did not exist before
Organizations locked into a single provider miss these shifts. They continue paying premium prices for tasks where cheaper alternatives perform equally well. They lack failover when their provider has issues. They cannot take advantage of new capabilities from competing providers without a major re-integration effort.
The organizations that are navigating this well treat AI models as a commodity layer with an abstraction layer above it. The abstraction handles provider management, routing, prompt adaptation, and quality evaluation. The application code never touches a provider API directly. When a new model emerges or a pricing change occurs, the adaptation happens at the abstraction layer, not in the application code.
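One common shape for that abstraction is a thin provider-agnostic interface with failover built in. The class and method names below are our sketch, not any vendor’s SDK; the stub provider stands in for a real adapter around an OpenAI, Anthropic, or Google client.

```python
from abc import ABC, abstractmethod

class ModelProvider(ABC):
    """Application code depends on this interface, never on a vendor SDK."""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class StubProvider(ModelProvider):
    # Stand-in for a real adapter; a real one would wrap a provider client.
    def __init__(self, name: str):
        self.name = name
    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"

class FailoverClient:
    """Tries providers in order; swapping vendors becomes a config change."""
    def __init__(self, providers: list[ModelProvider]):
        self.providers = providers
    def complete(self, prompt: str) -> str:
        last_error: Exception | None = None
        for provider in self.providers:
            try:
                return provider.complete(prompt)
            except Exception as exc:  # a real client would narrow this
                last_error = exc
        raise RuntimeError("all providers failed") from last_error
```

Because the application only ever sees `FailoverClient`, a pricing change or a new model is absorbed by editing the provider list, not by touching application code.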
This is not future-proofing. It is present-tense operational necessity.
Spec-Driven Development Replaces Ad-Hoc Prompting
The last major shift worth highlighting is in how work is defined for AI-assisted execution.
Early AI-assisted development was ad-hoc. A developer would open a chat window, describe what they wanted in natural language, get some code back, evaluate it, iterate, and eventually extract something usable. This worked for small tasks and fell apart for anything complex.
What has replaced it, in teams that are getting serious results, is spec-driven development. Instead of ad-hoc natural language instructions, work is defined in structured specifications that include:
- Functional requirements with precise input/output definitions
- Non-functional requirements (performance, security, accessibility)
- Interface contracts that define how the new code integrates with existing systems
- Validation criteria that define testable success conditions
- Domain context that provides the background knowledge the agent needs
We have written about this in detail. The practical result is that a well-specified task achieves an 80-95% first-pass success rate with agentic implementation, compared to 30-50% for ad-hoc prompting. The time spent writing specifications is recovered many times over in reduced iteration and rework.
This is also where context engineering and spec-driven development converge. The specification is not just an instruction to the agent. It is the primary context document that shapes the agent’s understanding of the task. A good specification includes exactly the information the agent needs and excludes information that would create noise or ambiguity.
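The spec fields listed above can be captured in a lightweight structure that a pipeline validates before handing work to an agent. The field names and the completeness check are our sketch, not a standard; the point is that a spec becomes a checkable artifact rather than free-form prose.

```python
from dataclasses import dataclass, field

@dataclass
class TaskSpec:
    functional: str                  # precise input/output behaviour
    interfaces: str                  # contracts with existing systems
    validation: list[str]            # testable success conditions
    non_functional: list[str] = field(default_factory=list)  # perf, security, a11y
    domain_context: str = ""         # background knowledge the agent needs

    def missing_fields(self) -> list[str]:
        """Name the gaps that most often cause failed first passes."""
        gaps = []
        if not self.functional.strip():
            gaps.append("functional requirements")
        if not self.interfaces.strip():
            gaps.append("interface contracts")
        if not self.validation:
            gaps.append("validation criteria")
        return gaps
```

A pipeline that refuses to dispatch a `TaskSpec` with non-empty `missing_fields()` enforces the discipline mechanically, which is where much of the first-pass success-rate gap comes from.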
What Has Not Changed
For balance, here is what has not changed despite eighteen months of rapid advancement:
Architecture still matters. AI tools can generate code faster, but they cannot decide the right architecture for a system. Architectural decisions – how to decompose a system, what tradeoffs to make between consistency and availability, how to handle data flow across service boundaries – still require experienced human judgment. Bad architectural decisions made faster are still bad architectural decisions.
Testing is still essential. In fact, it is more essential. When code is generated by agents, the test suite is the primary quality gate. The tests are what tell you whether the generated code actually works, not just whether it compiles. Teams that skimp on testing in an AI-assisted workflow produce unreliable systems faster than teams that skimp on testing in a manual workflow.
Security requires human oversight. AI tools can generate code that has security vulnerabilities just as easily as they generate code that does not. The surface area for security issues has not decreased. It has increased, because more code is being produced faster with less line-by-line human review. Security review processes need to adapt to higher throughput, not disappear.
Domain expertise is irreplaceable. AI tools do not understand your business. They do not know your regulatory environment, your customer expectations, your competitive dynamics, or your organizational constraints. Domain expertise – the understanding of what to build and why – remains a human responsibility.
Maintenance costs persist. Code generated by AI still needs to be maintained by humans (and other AI agents). A feature that takes an hour to generate still takes as much effort to maintain, monitor, debug, and support over its lifetime as one written by hand. Faster generation does not reduce total cost of ownership proportionally.
The Practitioner’s Summary
Here is what has actually changed, stripped of hype:
- AI is a methodology, not just a feature. The engineering process itself uses AI regardless of the product being built.
- Agentic tools have matured from autocomplete to genuine implementation partners, shifting developer work from writing code to directing and reviewing it.
- Context engineering has replaced prompt engineering as the critical skill. The quality of the information environment matters more than the quality of the instruction.
- Multi-model strategies are necessary, not optional. No single model wins across all tasks, and organizations need routing, abstraction, and fallback at the model layer.
- The single-vendor AI relationship is a liability. The market moves too fast for lock-in to be viable.
- Spec-driven development has replaced ad-hoc prompting for any team that is serious about quality and consistency.
These are not predictions. They are observations from daily practice. The organizations that understand these shifts and adapt their processes accordingly are shipping faster, at higher quality, and at lower cost than the organizations that are still treating AI as a code completion plugin.
The landscape has changed. The question is whether your engineering methodology has changed with it.