Shifting Responsibilities Since LLMs: How Will AI Affect Jobs?
It’s interesting how responsibilities for developers have shifted since the advent of AI code completion and LLMs. I spend less time doing low-level “grunt work”, and more time doing high-level project management. I spend less time staring at code, working out bugs, and going through functions and algorithms, and more time planning what tech stack to use, as well as reading LLM responses and source documentation.
It’s not that I have fewer responsibilities or easier tasks; rather, my responsibilities have shifted to those of a senior developer or project manager (even for solo projects), since I’ve offloaded a lot of the tedious coding work to LLMs.
How AI Will Affect Jobs (for Tech/IT)
If you’ve been feeling the same way—you’re not imagining it. The job has shifted from “typing code” to orchestrating systems and verifying outcomes. The developers who thrive treat LLMs like a squad of fallible junior devs: high output, uneven judgment. The leverage is real, but only if you own specification, architecture, and verification. If you delegate those, quality drifts and you become a babysitter for flaky code.
What Actually Changed Since LLMs
- From implementation to orchestration. The work moved up-stack: requirements framing → architecture → interface design → review/merge policies. Generation is cheap; deciding what to build and ensuring it’s correct is the scarce skill.
- From deterministic pipelines to probabilistic ones. Compilers are deterministic; LLMs aren’t. That pushes more energy into test design, property checks, and guardrails. “It compiles” is now table stakes, not evidence of correctness.
- Documentation is now a first-class dependency. Your prompt/RAG context is only as good as the internal docs, ADRs, and examples you give it. Maintaining that corpus is part of the job, not an afterthought.
- Glue beats grind. The value is in stitching services, contracts, and data flows; less in hand-writing boilerplate. This feels like senior/project-lead work even on solo projects.
What is an LLM?
LLM stands for Large Language Model.
In practical terms, an LLM is a neural network (typically a Transformer) trained to predict the next token in text. That simple objective, scaled over huge datasets, yields models that can summarize, translate, write code, and answer questions. Modern large language models (LLMs) are post-trained (instruction tuning, RLHF) so they follow directions better, and when you ground them with retrieval (RAG) they can cite sources and guess less. Think of them as fast, fallible pattern engines: you supply the constraints, data, and checks.
New Skill Stack That Matters
- Specification writing. Tight, testable specs with constraints and examples. If you can’t specify precisely, you’ll get plausible nonsense fast.
- Context engineering. Curating the right code, docs, and constraints into the model. Retrieval beats “longer prompts”.
- Verification engineering. Property-based tests, mutation testing, golden files, contract tests—because generation quality varies.
- Knowledge management. ADRs, style guides, canonical examples, decision logs, and “how we do X” patterns the model can learn from.
- Tooling fluency. Static analysis, type systems, CI, SBOMs, and policy-as-code to catch drift.
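To make “tight, testable spec” concrete, here is a minimal sketch (not a prescribed format): a hypothetical username rule where the constraint and its examples, including the negative cases, live in the repo as a table-driven test. It assumes a Jest-style runner.

```ts
// Hypothetical rule: usernames are 3-20 chars, lowercase letters, digits, or "_",
// and must not start with a digit. The regex *is* the spec's constraint.
export function isValidUsername(name: string): boolean {
  return /^[a-z_][a-z0-9_]{2,19}$/.test(name);
}

// Examples double as acceptance criteria; the negative cases matter most.
const cases: Array<[input: string, expected: boolean]> = [
  ["alice", true],
  ["a_b_c", true],
  ["ab", false],            // too short
  ["9lives", false],        // starts with a digit
  ["Name", false],          // uppercase not allowed
  ["a".repeat(21), false],  // too long
];

test.each(cases)("isValidUsername(%j) -> %j", (input, expected) => {
  expect(isValidUsername(input)).toBe(expected);
});
```

Hand the model the constraint and the table; if its implementation fails a row, the spec decides who is right, not vibes.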
What to Delegate vs. What to Keep
Delegate confidently
- CRUD scaffolding, DTOs, mappers, zod/validator schemas
- Boring tests (HTTP endpoint smoke, serialization round-trips)
- Regexes, migrations boilerplate, config templates
- Docs, README stubs, typed client generation from OpenAPI/GraphQL
- UI boilerplate (forms, table CRUD, state slice shells)
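As an illustration of the kind of boilerplate that is safe to hand off, here is a minimal zod sketch for a hypothetical “create user” payload; the field names and rules are invented for the example.

```ts
import { z } from "zod";

// Hypothetical "create user" payload: validator + DTO boilerplate that is safe
// to delegate, as long as the domain rules encoded here stay under your review.
export const CreateUserSchema = z.object({
  email: z.string().email(),
  displayName: z.string().min(1).max(80),
  role: z.enum(["admin", "member", "viewer"]),
});

export type CreateUserDto = z.infer<typeof CreateUserSchema>;

// Boundary check: parse untrusted input once, then pass a typed DTO inward.
export function parseCreateUser(input: unknown): CreateUserDto {
  return CreateUserSchema.parse(input); // throws ZodError on invalid input
}
```

The model can churn out dozens of these; your job is to review that the rules match the domain.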
Keep for yourself
- Domain models and invariants
- Public interfaces and versioning policy
- Security-sensitive paths (auth, crypto, money movement)
- Concurrency, cancellation/timeout semantics
- Data lifecycle, privacy boundaries, telemetry strategy
- Performance-critical sections and anything with tricky edge cases
Practical, High-Level Tech Stack Example (Node/Express + Angular/Vue/React)
- Write the contract first. OpenAPI schema + TypeScript interfaces + acceptance criteria. Include invariants and failure modes (see the sketch after this list).
- Generate the scaffolding, not the system. Use the model to draft controllers/services, Angular standalone components, and test skeletons. Keep side effects thin and injectable.
- Pin patterns. Provide a minimal “golden repo” of examples: one service, one component, one test. Reuse that in prompts or RAG so style stays consistent.
- Verification gates.
  - Unit + property tests for invariants
  - Contract tests against mock servers
  - Mutation testing to ensure tests bite
  - Lint/type/format must pass; block merges otherwise
- Review like a senior. Read diffs for invariant violations, unnecessary complexity, and hidden coupling, not just syntax.
- Refactor with intent. After generation, normalize to your architectural patterns (ports/adapters, feature modules, etc.). Don’t leave the “first draft” shape in place.
- Record decisions. ADR per notable choice (DB shape, cache policy, auth flow). LLMs reuse this context surprisingly well.
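A minimal sketch of the “write the contract first” and “verification gates” steps combined: a hypothetical `GET /users/:id` contract expressed as TypeScript types, a thin Express scaffold, and a supertest contract test that encodes the acceptance criteria. The endpoint, shapes, and error codes are invented for illustration, and a Jest/Vitest-style runner is assumed.

```ts
import express from "express";
import request from "supertest";

// Contract for a hypothetical GET /users/:id, mirroring what the OpenAPI schema
// and acceptance criteria would pin down: response shapes, invariants, failure modes.
interface UserResponse {
  id: string;
  email: string;
  createdAt: string; // ISO-8601; invariant: never in the future
}
interface ErrorResponse {
  error: { code: "NOT_FOUND" | "INVALID_ID"; message: string };
}

// Thin scaffold the model may draft; the contract above is what you own.
const app = express();
app.get("/users/:id", (req, res) => {
  if (!/^[0-9a-f-]{36}$/.test(req.params.id)) {
    const body: ErrorResponse = { error: { code: "INVALID_ID", message: "Bad id" } };
    return res.status(400).json(body);
  }
  const body: UserResponse = {
    id: req.params.id,
    email: "user@example.com",
    createdAt: new Date().toISOString(),
  };
  res.json(body);
});

// Contract test: the acceptance criteria, executable against the app (or a mock server).
test("GET /users/:id returns the documented shapes", async () => {
  const ok = await request(app).get("/users/00000000-0000-0000-0000-000000000000");
  expect(ok.status).toBe(200);
  expect(typeof ok.body.email).toBe("string");

  const bad = await request(app).get("/users/not-a-uuid");
  expect(bad.status).toBe(400);
  expect(bad.body.error.code).toBe("INVALID_ID");
});
```

The point is not the handler (the model can draft that); it is that the types and the test pin the contract before any generation happens.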
Guardrails and Things to Consider
- Type-first plus runtime validation. TS everywhere + zod/io-ts at boundaries.
- Property-based tests on core rules. One good property test beats 20 happy-path unit tests.
- Mutation testing to keep tests honest.
- Supply-chain checks. License and security scanners; LLMs love sprinkling random libs.
- Prompt hygiene. No secrets in prompts; redact inputs; prefer local models or self-hosted inference for sensitive code.
- Determinism when needed. Temperature=0 for codegen in CI; allow creative temps only in local exploration.
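Here is what “one good property test” can look like in practice: a sketch using fast-check against a hypothetical discount rule, again assuming a Jest-style runner.

```ts
import fc from "fast-check";

// Hypothetical core rule: a discount never pushes an order total below zero
// and never exceeds the subtotal.
function applyDiscount(subtotalCents: number, discountCents: number): number {
  return Math.max(0, subtotalCents - Math.min(discountCents, subtotalCents));
}

test("discounted total stays within [0, subtotal]", () => {
  fc.assert(
    fc.property(
      fc.integer({ min: 0, max: 1_000_000 }), // subtotal in cents
      fc.integer({ min: 0, max: 1_000_000 }), // requested discount in cents
      (subtotal, discount) => {
        const total = applyDiscount(subtotal, discount);
        return total >= 0 && total <= subtotal;
      }
    )
  );
});
```

The property pins the invariant across the whole input space, which is exactly the kind of net that catches plausible-but-wrong generated code.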
Risks to Watch (and how to counter)
- Quality drift. Counter: golden examples, style guide, templates, and CI checks that fail on pattern violations.
- Vendor lock-in. Counter: prompt+context kept in repo; support at least two providers; don’t rely on provider-specific quirks.
- Security/privacy leaks. Counter: redaction, allow-lists for outbound calls, test prompts with “toxic” inputs (prompt injection cases).
- License contamination. Counter: SBOMs, license policy enforcement, and “no new runtime deps without review”.
Career Implications of AI (how to stay sharp)
- Double down on architecture and debugging. When things go weird—queues stuck, deadlocks, cache incoherence—LLMs help, but only if you already understand the system. Keep those muscles by doing occasional “manual mode” tasks or running postmortems deeply.
- Develop taste. Models generate many OK solutions; your edge is choosing a simple, resilient one and saying “no” to clever noise.
- Own the evaluation harness. If you can measure correctness, perf, and regressions automatically, you can scale yourself.
Key Performance Indicators (KPIs) for LLM Workflows
- Escaped Defects per 1,000 Lines of Code (LOC) and Mean Time to Repair (MTTR):
  - Escaped Defects: the number of bugs that were not caught before release, normalized per 1,000 lines of code.
  - Mean Time to Repair: the average time from a bug being reported to the fix shipping.
- Mutation Score and Property-Covered Surface Area:
  - Mutation Score: the percentage of deliberately introduced code changes (“mutants”) that your test suite detects. A higher score means tests that actually bite.
  - Property-Covered Surface Area: how much of the core behavior (invariants, domain rules) is exercised by property-based tests rather than single examples.
- Test Runtime vs. Flake Rate:
  - Test Runtime: how long the full test suite takes to run.
  - Flake Rate: the share of tests that fail intermittently without any code changes, a sign of instability.
- Code Churn per Feature vs. Cycle Time:
  - Code Churn: the amount of code added, modified, or deleted for each feature.
  - Cycle Time: the total time taken to develop a feature from start to finish.
- Bundle Size and APIs Changed per Release:
  - Bundle Size: the size of the released artifact (e.g., the front-end bundle).
  - APIs Changed: the number of public Application Programming Interfaces (APIs) modified in each release.
- Dependency Count and Supply-Chain Alerts Resolved:
  - Dependency Count: the number of external libraries or tools your software relies on.
  - Supply-Chain Alerts Resolved: the number of security or compatibility issues related to those dependencies that have been addressed.
- Performance Budgets (P95 Latency, Memory):
  - Performance Budgets: agreed limits the software must stay within during operation.
  - P95 Latency: the latency at or below which 95% of requests complete.
  - Memory: the amount of memory used by the software during operation.
These KPIs help teams monitor the quality, efficiency, and performance of their software development processes, especially when integrating LLMs.
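As one concrete example, P95 latency is just the 95th percentile of observed request latencies. Here is a sketch of a budget check you might run in CI; the threshold and function names are hypothetical.

```ts
// Compute the p-th percentile (nearest-rank method) of observed latencies in ms.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no latency samples collected");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

const P95_BUDGET_MS = 250; // hypothetical budget

export function checkLatencyBudget(samplesMs: number[]): void {
  const p95 = percentile(samplesMs, 95);
  if (p95 > P95_BUDGET_MS) {
    // Failing loudly here is what turns a KPI into a merge gate.
    throw new Error(`P95 latency ${p95}ms exceeds budget ${P95_BUDGET_MS}ms`);
  }
}
```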
Prompt Patterns That Work
- Spec + Constraints + Negative examples. “Do X; must not do Y; reject Z; here are failing cases.”
- Interface-first. Provide types and signatures up front; forbid changing them.
- Review pass. Ask the model to critique its own diff against your ADRs before you read it.
- Tight loops. Small files, small deltas, frequent runs.
- Vibe coding, sparingly. Don’t just “vibe code”; use LLMs and AI as a tool to enhance your skills.
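One way to operationalize these patterns is a small prompt builder kept in the repo. The function below is a hypothetical sketch, not a library API.

```ts
// Hypothetical prompt builder combining "interface-first" and
// "spec + constraints + negative examples" into one reusable, versioned template.
export function buildCodegenPrompt(opts: {
  interfaceSource: string; // the TS types the model must not change
  task: string;            // what to implement
  constraints: string[];   // "must not" rules
  failingCases: string[];  // known-bad inputs or behaviors to reject
}): string {
  return [
    "You are implementing against a frozen interface. Do not modify it.",
    "## Interface (read-only)",
    opts.interfaceSource,
    "## Task",
    opts.task,
    "## Constraints",
    ...opts.constraints.map((c) => `- MUST NOT: ${c}`),
    "## Negative examples (reject these)",
    ...opts.failingCases.map((c) => `- ${c}`),
    "Return only the implementation file, with no commentary.",
  ].join("\n");
}
```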
Treat the LLM as a junior team you manage: you set architecture, spell out contracts, provide examples, and build a safety net that catches when it’s confidently wrong. That’s senior work. If you invest in specs, tests, and documentation, you get the leverage without the entropy. If you don’t, the short-term velocity turns into long-term maintenance drag.
Vibe Coding Meaning
“Vibe coding” is great for exploration and scaffolding, terrible as a default delivery mode. Use it like a spike from a junior pair: time-boxed, disposably creative, then rewritten against your architecture and tests. If you let vibe code merge unchallenged, you buy long-term entropy.
Take a look at our other vibe coding article for more details on what vibe coding is and its current 2025 definition.
Here’s the full picture.
What “Vibe Coding” Actually Is
Working by feel with an LLM as pattern-completion: sketch a prompt, accept plausible code, iterate until it runs. It’s fast because it borrows judgment from prior patterns—but those patterns aren’t your domain, your constraints, or your stack conventions.
- It accelerates generation and UI glue, but it degrades specification and verification unless you compensate.
- It magnifies the shift to orchestration: you must police architecture and tests harder, because the model will happily produce confident nonsense.
Where Vibe Coding Helps (use deliberately)
- Greenfield UI and layout: forms, tables, CSS scaffolding, component shells.
- Boring glue: DTOs, mappers, HTTP client wrappers, OpenAPI client gen, README stubs.
- Search-space exploration: “show three approaches” then you pick one and rewrite.
Where Vibe Coding Hurts (do not use)
- Domain invariants and core data models.
- Security paths (authN/Z, crypto, money movement).
- Concurrency/latency semantics, cancellation, resource lifetimes.
- Performance-critical code and anything with gnarly edge cases.
Vibe Coding Gotchas (things to be aware of)
Here are some of the big gotchas (and fixes):
- Hidden coupling & pattern drift
- Symptom: inconsistent names, ad-hoc cross-file contracts, surprise globals, “stringly-typed” soup.
- Fix: enforce ports/adapters. Domain types/interfaces first. CI rule: “public interfaces require ADR or issue link.”
- Dependency creep & supply-chain risk
- Symptom: five new packages per prompt, licenses all over, bundle bloat.
- Fix: No new deps without review. SBOM + license policy in CI. Prefer stdlib/known libs. Fail builds on unknown licenses.
- Type safety erosion
- Symptom: `any`, widened unions, “TODO: fix types later”.
- Fix: TS strictness on: `noImplicitAny`, `noUncheckedIndexedAccess`, `exactOptionalPropertyTypes`, `noImplicitOverride`. Block PRs that add `any`.
- Illusory tests
- Symptom: generated tests assert mocks, not behavior; snapshots that lock in bugs.
- Fix: property-based tests for invariants, mutation testing to ensure tests bite, contract tests at API boundaries, golden files only where stable.
- Context debt (prompt-only decisions)
- Symptom: design “rules” live in last week’s chat, not in the repo.
- Fix: commit prompts and decisions: `/docs/adr/NNN-*.md` + `/prompts/feature-x.md`. Make the repo the source of truth the model sees.
- Security foot-guns
- Symptom: wide-open CORS, secrets in code, “disable TLS verification”.
- Fix: policy-as-code: OPA/Conftest or custom ESLint rules to forbid certain patterns; secret scanners; deny-by-default CORS.
- Performance cliffs
- Symptom: N+1 queries, chatty APIs, accidental O(n²) in loops, memory leaks.
- Fix: perf budgets (P95 latency/mem) in CI, flamegraphs in PR template, load tests for critical flows.
- Non-determinism & flake
- Symptom: different runs produce different shapes; flaky unit tests.
- Fix: pin generation settings for CI (deterministic prompts/temperature), and never merge unreviewed “creative” drafts.
- Outdated patterns copied confidently
- Symptom: obsolete Angular module syntax, deprecated RxJS ops, old Express middleware (`body-parser` instead of `express.json()`).
- Fix: style guide + canonical examples in repo; lint rules that forbid banned APIs.
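For the security and outdated-pattern gotchas above, here is a minimal Express sketch showing deny-by-default CORS and built-in body parsing (no `body-parser`). It assumes the `cors` package and a hypothetical allow-listed origin.

```ts
import express from "express";
import cors from "cors";

const app = express();

// Built-in body parsing: no need for the standalone body-parser package.
app.use(express.json({ limit: "100kb" }));

// Deny-by-default CORS: only explicitly allow-listed origins get through.
const ALLOWED_ORIGINS = new Set(["https://app.example.com"]); // hypothetical origin
app.use(
  cors({
    origin: (origin, callback) => {
      // Allow non-browser clients (no Origin header) and known origins; reject the rest.
      if (!origin || ALLOWED_ORIGINS.has(origin)) {
        callback(null, true);
      } else {
        callback(new Error("Origin not allowed by CORS policy"));
      }
    },
  })
);
```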
Quick “Anti-Vibe” Checklist (before merging)
- No new runtime dependency without issue/ADR link
- No `any`/implicit `any`; strict TS flags enforced
- Angular: standalone imports explicit; no NgModules leaked
- RxJS: no deprecated syntax; `takeUntilDestroyed()` present
- Security: no secrets, permissive CORS, or “disable TLS” hacks
- Tests: at least one property test on a core invariant; mutation score ≥ target
- Perf: budget respected; no obvious N+1 / O(n²) in hot path
- ADR committed; prompts/decisions captured in repo
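For the Angular/RxJS items on the checklist, here is a minimal standalone component sketch using `takeUntilDestroyed()` (available since Angular 16); the component itself is hypothetical.

```ts
import { Component } from "@angular/core";
import { takeUntilDestroyed } from "@angular/core/rxjs-interop";
import { interval } from "rxjs";

@Component({
  selector: "app-ticker", // hypothetical component
  standalone: true,
  template: `{{ ticks }}`,
})
export class TickerComponent {
  ticks = 0;

  constructor() {
    // The constructor runs in an injection context, so takeUntilDestroyed()
    // resolves DestroyRef automatically and completes the stream on destroy.
    interval(1000)
      .pipe(takeUntilDestroyed())
      .subscribe(() => this.ticks++);
  }
}
```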
Organizational Implications (even for solo work)
- You become the editor-in-chief. Your taste and guardrails determine quality.
- Docs become runtime dependencies. If the model can’t “see” your conventions, it will invent them.
- Velocity without discipline = debt. If you can’t measure correctness, vibe coding turns into whack-a-mole maintenance.
Vibe coding is a tool, not a methodology. Use it to explore and scaffold; never to define the system. Anchor everything to contracts, ADRs, and verification. That way you keep the speed and dodge the entropy.
How Will AI Affect Jobs (in general)?
AI won’t “replace jobs” wholesale so much as re-price tasks. Exposure is largest where work is cognitive and repeatable; complementarity (humans + AI) is strongest where judgment, trust, and accountability matter. In rich economies, a majority of roles have meaningful AI exposure, with uneven gains and risks across workers and regions. Early field evidence shows real productivity lifts, especially for less-experienced workers, but economy-wide gains follow a J-curve: benefits arrive after firms invest in process redesign, data, training, and guardrails. (IMF, NBER, Science, American Economic Association)
What Changes Concretely (non-tech jobs included)?
Mechanisms
- Task substitution: routine analysis, drafting, summarizing, scheduling, and basic customer interactions compress in time/cost. (IMF)
- Task complementarity: AI amplifies people in roles heavy on judgment, persuasion, or duty-of-care (managers, clinicians, teachers). Demand shifts toward coordination and business/process skills. (OECD)
- Distributional effects: workers with higher education and AI-complementary skills benefit more; without policy, inequality can widen. (OECD)
Magnitude (credible estimates & evidence)
- Exposure: ~40% of global employment has AI exposure; in advanced economies it’s ~60% given the prevalence of cognitive work. Effects can be positive or negative depending on whether tasks are complemented or substituted. (IMF)
- Productivity in the field:
  - Call-center deployment: +14% issues resolved/hr on average, +34% for novices. (NBER, Stanford Graduate School of Business)
  - Professional writing experiment: ~40% faster, quality up ~18%. (Science)
- Timeline: Expect a Productivity J-curve: measured productivity often lags during the build-out of “complements” (data, workflows, training), then rises. (American Economic Association, NBER)
Sector map (outside core tech)
- Healthcare (clinical & admin): documentation, coding, triage, and prior-auth automation free clinician time; human oversight remains mandatory. Expect role mix to tilt toward care delivery + coordination over paperwork. (High exposure; high complementarity.) (IMF)
- Education: lesson planning, feedback, and differentiation scale up; classroom management and safeguarding stay human. Skill demand tilts to instructional design and student support. (OECD)
- Finance & professional services: faster drafting/review, reconciliations, KYC/AML screening, and analyst notes; humans keep risk, attestation, and client trust. Secretaries/admin, accountants, and analysts are among high-exposure occupations in OECD data. (OECD)
- Customer service & sales: assisted chat/phone, reply suggestions, and next-best-action; novices gain the most; escalation and relationship work shift human-heavy. (NBER)
- Public sector & law: summarization, records, benefits processing, discovery; humans still sign/decide; exposure high but bounded by regulation and accountability. (IMF)
- Retail/hospitality: scheduling, inventory text, product copy, and support automation; on-site service and operations remain embodied/relational. (OECD)
- Manufacturing/logistics/agriculture: Gen-AI affects planning, maintenance notes, SOPs; physical automation/vision handle more tasks, but open-world dexterity keeps humans central. (Mixed exposure; rising over time.) (IMF)
Who Gains from AI and Who Loses?
- Winners: workers who pair AI with business/process skills; novices in information work (AI narrows skill gaps); regions/firms that invest in complements (data, training, redesign). (NBER, OECD)
- Losers: roles dominated by routine cognitive tasks (basic admin, rote analysis) without customer/clinical ownership; workers without access to training; SMEs that adopt tools but skip process change. Inequality can widen without upskilling and mobility supports. (OECD)
Practical Moves for Workers, Managers, and Policymakers
For workers (any field)
- Reweight your job toward complementarity: take ownership of decisions, relationships, and outcomes; let AI draft and you decide and certify.
- Build the three durable complements: domain expertise, process/coordination skills, and AI operations (prompting + retrieval + verification). OECD vacancy data already shows rising demand for management/process skills in AI-exposed roles. (OECD)
- Quantify your lift: track your own metrics (turnaround time, quality scores, error rate) with and without AI—bring numbers to reviews.
For managers
- Redesign workflows, not just tools: add RAG/grounding, schema validation, and QA; measure latency, accuracy, cost per task; capture feedback loops. Field studies show the biggest, most reliable gains when AI is embedded in process, not just “made available.” (NBER)
- Target novice enablement: staff more junior talent where AI reliably boosts performance and route edge cases to seniors. (NBER)
For policymakers
- Focus on diffusion, not hype: fund training and transition supports where exposure is high; watch inequality channels. Advanced economies have more to gain and more to lose—exposure ~60%—so complements and safety nets matter. (IMF, OECD)
Conclusion
Across the economy, AI reallocates time from routine cognition to coordination, care, and judgment. The near-term wins come from augmentation, with solid evidence of productivity gains in real workplaces—especially for less-experienced workers. The long-term payoff depends on whether firms and governments invest in the complements (data, training, processes) that turn exposure into higher wages and better services, instead of wider inequality. (NBER, Science, IMF, OECD)