What Is an LLM? What Does LLM Stand For?
LLM stands for Large Language Model—a neural network (usually a Transformer) trained with self-supervised learning to predict the next token in text. In practice, this simple objective lets large language models (LLMs) handle summarization, translation, Q&A, code generation, and more. Transformer self-attention weighs relationships among tokens across long contexts, enabling parallel training and broad generalization. That’s the core of the LLM meaning in modern ML: a scaled text predictor whose capabilities emerge from size, data, and training method. (arXiv, NeurIPS Papers)
Over the past few years, scaling parameters and training tokens unlocked “few-shot” behavior—models can do new tasks from instructions and a handful of examples, often without task-specific fine-tuning. Data/compute trade-offs (e.g., train on more tokens, not just bigger models) further boosted accuracy. (arXiv, NeurIPS Proceedings)
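To make “scaled text predictor” concrete, here’s a toy sketch of the final decoding step: turning the model’s scores (logits) over a vocabulary into one sampled next token via a temperature-scaled softmax. The vocabulary and numbers are invented for illustration.

```ts
// Toy next-token sampling: temperature-scaled softmax over logits.
// The vocabulary and logit values are made up for illustration.
const vocab = ["the", "cat", "sat", "on", "mat"];
const logits = [2.1, 0.3, 1.7, 0.9, 0.5]; // model scores for each token

function sampleNextToken(scores: number[], temperature = 0.7): number {
  const scaled = scores.map((l) => l / temperature);
  const max = Math.max(...scaled);                 // subtract max for numerical stability
  const exps = scaled.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  const probs = exps.map((e) => e / sum);          // softmax -> probability distribution
  let r = Math.random();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r <= 0) return i;
  }
  return probs.length - 1;
}

console.log(vocab[sampleNextToken(logits)]); // "the" most often, other tokens sometimes
```

Lower temperature sharpens the distribution (more deterministic); higher temperature flattens it (more varied output).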
Check out our other article covering why LLMs cannot replace good coders.
LLM Meaning for Machine Learning
In ML terms, an LLM is a foundation model:
- Objective: autoregressive next-token prediction over massive corpora (code, web, books).
- Architecture: Transformer stacks (typically decoder-only) with positional encodings and multi-head self-attention (see the formula below).
- Capabilities: emerge from scale (parameters × tokens), optimization, and post-training.
- Post-training: instruction tuning and RLHF align raw models to follow instructions and prefer helpful, safer outputs. (arXiv, NeurIPS Papers)
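For reference, the scaled dot-product attention computed inside each head of that architecture (the standard formulation from the original Transformer paper), where Q, K, and V are the query, key, and value matrices and d_k is the key dimension:

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
$$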
Two production patterns matter:
- Retrieval-Augmented Generation (RAG): add retrieval so the model conditions on fresh, sourced documents—improving specificity and factuality.
- Decoding strategies: beyond greedy decoding, ensembles like self-consistency (sample multiple chains and vote) can raise accuracy on reasoning benchmarks. (arXiv)
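As a minimal sketch of the self-consistency idea (assuming a hypothetical `complete` function that returns one sampled answer per call):

```ts
// Self-consistency (sketch): sample several reasoning paths at a nonzero
// temperature, normalize each final answer, and return the majority vote.
// `complete` is a hypothetical wrapper around whatever LLM API you use.
async function selfConsistentAnswer(
  prompt: string,
  complete: (prompt: string, temperature: number) => Promise<string>,
  samples = 5
): Promise<string> {
  const answers = await Promise.all(
    Array.from({ length: samples }, () => complete(prompt, 0.8))
  );
  const counts = new Map<string, number>();
  for (const a of answers) {
    const key = a.trim().toLowerCase();       // naive normalization of the final answer
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  // Return the most frequent answer across the sampled chains.
  return [...counts.entries()].sort((a, b) => b[1] - a[1])[0][0];
}
```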
Why Are LLM Responses Often Accurate?
- Statistical competence at scale. The next-token objective + diverse data captures grammar, facts, and common patterns; scaling laws explain strong few-shot generalization. (arXiv, NeurIPS Proceedings)
- Alignment and instruction tuning. SFT and RLHF steer models toward user intent and more truthful/harmless outputs. (arXiv, NeurIPS Proceedings)
- Tool use & retrieval. With RAG or tools, models ground answers in sources, reducing parametric guesswork. (arXiv)
- Better decoding. Methods like self-consistency improve math/logic answers without changing the base model. (arXiv)
Caveats. Outputs can be fluent but false (“hallucinations”), especially on niche, time-sensitive, or adversarial prompts; models are sensitive to irrelevant context, and step-by-step “explanations” can be unfaithful to the actual decision process. Calibration is imperfect; results depend on prompt quality, context length, and domain shift. (arXiv)
Why LLMs Cannot Reason (Reliably)
LLMs don’t natively implement general symbolic reasoning. They generate tokens from learned associations; apparent “reasoning” emerges when those associations mirror valid patterns or when we scaffold with tools.
- Objective mismatch: The goal is likelihood (what looks right), not truth/validity—so confident fiction is possible. (arXiv)
- Planning & verification deficits: Autoregressive models struggle with multi-step planning and self-checks without external algorithms (search/solvers/verifiers). (arXiv)
- Explanation unfaithfulness: Chain-of-thought can help accuracy, but written steps may be post-hoc. (arXiv)
- Context fragility: Irrelevant details can derail solutions; distractors measurably lower accuracy. (arXiv, OpenReview)
Bottom line: Treat LLMs as fast, fallible pattern engines. You get reliable “reasoning” by pairing them with retrieval, programs, and verifiers. (arXiv)
What Jobs Can I Do with an LLM?
(Clarifying: LLM = Large Language Model, not the law degree.) With a Node/Express + Angular/Vue background, these roles are realistic:
Core Roles You Can Do Today
- LLM Application Engineer / AI Product Engineer — End-to-end features: prompts, RAG, tools, evals, UI.
- AI Integration Engineer (Platform) — Add LLM features to existing systems (chat, smart search, autodocs, agents).
- RAG/Search Engineer — Ingestion, embeddings, retrieval/reranking, citations, eval harnesses (pgvector/Weaviate/ES).
- Evals/Quality Engineer — Ground-truth sets, rubric graders, mutation tests for prompts, regression gates.
- AI Solutions Architect / Consultant — Scope, vendor choices, guardrails, TCO, rollout/governance.
- Agent/Workflow Engineer — Tool schemas, function calling, retries/timeouts, compensation.
- Technical Writer/DevRel (AI) — Guides, reference apps, SDK examples.
Contractable Offerings (Productize These)
- Docs/Support Copilot with citations and feedback loop.
- Codebase Modernization Assistant (e.g., Angular standalone migration + test-gen).
- Text-to-SQL over operational data with schema guards and RLS.
- Meeting/Call summarization with action items → Jira/Asana.
- Policy/Compliance checker (PII/license/security anti-patterns).
- Grounded content factory (RAG + style enforcement + plagiarism checks).
What Each Job Actually Entails (How to Stand Out)
- Own the contract & data flow: OpenAPI/JSON Schema first; deterministic tool schemas; I/O validation (see the validation sketch after this list).
- Guardrails: auth, redaction, rate limits, cost caps, content filters; no secrets in prompts; structured logging.
- Evals: golden sets, task metrics (exact match/F1/ROUGE), regression CI.
- Perf/cost: caching, short contexts, batching, streaming UIs, budget alerts.
- Ship-ready UI: Angular standalone components with explicit imports; SSR-safe.
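To illustrate the contract/validation point above, here’s a minimal sketch using Zod; the tool name and fields are invented for the example:

```ts
import { z } from "zod";

// Contract for a hypothetical "create ticket" tool the model is allowed to call.
const CreateTicketArgs = z.object({
  title: z.string().min(5).max(120),
  priority: z.enum(["low", "medium", "high"]),
  assignee: z.string().email().optional(),
});

function parseToolCall(rawModelOutput: string) {
  // Reject anything that is not valid JSON matching the schema;
  // never pass unvalidated model output into downstream systems.
  const parsed = CreateTicketArgs.safeParse(JSON.parse(rawModelOutput));
  if (!parsed.success) {
    throw new Error(`Tool call rejected: ${parsed.error.message}`);
  }
  return parsed.data;
}
```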
High-Signal Portfolio Pieces (Build 2–3)
- RAG starter with dashboard (hit-rate, accuracy, latency, cost).
- Tool-using agent (3 tools, timeouts, idempotency, rollback).
- Eval harness (CLI + dashboard + mutation tests).
- Code-mod kit (AST + LLM hints; before/after tests).
Hiring Signals Companies Look For
- Real metrics (“AHT −18%”, “search latency −40%”, “hallucinations <2% on 500-Q eval with citations”).
- Evidence of governance (PII handling, audit logs) and supply-chain hygiene (SBOM, license policy).
- Repeatable patterns (templates/CLIs), not one-off hacks.
Your 30-Day Break-In Plan
- Week 1: Minimal RAG app + 100-Q eval + CI gates (accuracy/latency/cost).
- Week 2: Add tool use (issue tracker + analytics), retries/timeouts, caching; add cost/latency dashboards.
- Week 3: Eval CLI with regression & perturbation tests; publish a short metrics post.
- Week 4: Package as a template + case study; apply with live demo + metrics.
Where to Spend Time (Given ~10x Throughput)
- More: specs/contracts, evals, security/perf budgets, reusable templates, docs.
- Less: hand-writing glue—generate then normalize to your patterns.
- Never: ship “vibe spikes” without tests/guardrails or let the model choose dependencies by default.
What Is AI, How Does AI Work, and How to Use AI?
What Is AI?
Artificial Intelligence (AI) encompasses systems performing tasks that usually require human intelligence: recognition, prediction, planning, translation, dialog, coding, etc. Most useful AI today is machine learning (ML); a large sub-slice is deep learning (neural networks). Generative AI (LLMs for text, diffusion for images/audio) creates content, not just classifies it.
How Does AI Work? (Short, Correct Version)
Three phases: data → training → inference.
Phase 1: Data
Collect labeled (supervised) or unlabeled (self/unsupervised) examples, plus feedback data for alignment. Data quality dominates results (coverage, recency, cleanliness, bias, label consistency).
Phase 2: Training
Optimize model parameters to minimize a loss using gradient descent/backprop (a toy sketch follows the list below). Families include:
- Classical ML: trees, SVMs, linear models (tabular workhorses).
- Deep learning: CNNs (vision), RNNs/Transformers (sequences; Transformers dominate text/code).
- Diffusion models: generate images/audio via denoising.
- Reinforcement Learning: learns actions via rewards; RLHF aligns LLMs to instructions.
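To ground “minimize a loss using gradient descent,” here’s a toy sketch that fits a one-feature linear model to made-up points; real training backpropagates the same kind of update through billions of parameters:

```ts
// Toy gradient descent: fit y ≈ w*x + b to a few made-up points by
// minimizing mean squared error.
const data = [
  { x: 1, y: 2.1 },
  { x: 2, y: 3.9 },
  { x: 3, y: 6.2 },
];

let w = 0, b = 0;
const lr = 0.05; // learning rate

for (let step = 0; step < 2000; step++) {
  let gradW = 0, gradB = 0;
  for (const { x, y } of data) {
    const err = w * x + b - y;              // prediction error
    gradW += (2 * err * x) / data.length;   // d(loss)/dw
    gradB += (2 * err) / data.length;       // d(loss)/db
  }
  w -= lr * gradW;                          // step parameters against the gradient
  b -= lr * gradB;
}

console.log({ w: w.toFixed(2), b: b.toFixed(2) }); // roughly slope 2, intercept near 0
```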
Phase 3: Inference (Serving)
The trained model receives inputs and outputs predictions (token-by-token for LLMs; ranked list for recommenders; label/probability for classifiers). Tooling—retrieval, caching, guardrails, evals, and cost/latency control—matters as much as the model.
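As a small illustration of token-by-token serving, here’s a sketch that consumes a token stream and flushes each piece to the client; the token source is a stand-in for whatever provider SDK you use:

```ts
// Token-by-token streaming: consume an async iterable of tokens (hypothetical
// wrapper around your model/provider SDK) and flush each token to the client.
async function streamToClient(
  tokens: AsyncIterable<string>,
  write: (chunk: string) => void
): Promise<string> {
  let output = "";
  for await (const token of tokens) {
    output += token;
    write(token); // flush immediately so the UI renders incrementally
  }
  return output;  // full completion, useful for logging and evals
}

// Usage sketch with a fake token stream (Node):
async function* fakeTokens() {
  for (const t of ["Hel", "lo", ", ", "world", "!"]) yield t;
}
streamToClient(fakeTokens(), (t) => process.stdout.write(t));
```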
Key Concepts in Modern AI
- Embeddings for semantic search and retrieval (see the sketch after this list).
- RAG to ground answers and reduce hallucinations.
- Fine-tuning vs. prompt engineering to adapt vs. steer models.
- Decoding & non-determinism: temperature controls creativity vs. determinism.
- Evaluation: hold-out sets/human rubrics + operational metrics (latency, cost, hallucination rate).
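To make the embeddings bullet concrete, here’s a toy cosine-similarity retrieval over precomputed vectors; real systems use an embedding model and a vector store (e.g., pgvector), and the 3-dimensional vectors below are invented:

```ts
// Cosine-similarity retrieval over precomputed embeddings (toy 3-dim vectors;
// real embeddings have hundreds or thousands of dimensions).
type Doc = { id: string; embedding: number[] };

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, v, i) => s + v * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

function topK(query: number[], docs: Doc[], k = 3): Doc[] {
  return [...docs]
    .sort((d1, d2) => cosine(query, d2.embedding) - cosine(query, d1.embedding))
    .slice(0, k);
}

const docs: Doc[] = [
  { id: "refund-policy", embedding: [0.9, 0.1, 0.0] },
  { id: "shipping-times", embedding: [0.2, 0.8, 0.1] },
  { id: "api-auth", embedding: [0.1, 0.2, 0.9] },
];
console.log(topK([0.85, 0.15, 0.05], docs, 1)); // → refund-policy
```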
How to Use AI (Practical Patterns)
1) Pick the Right Pattern
- Lookup/Q&A over your docs: RAG with citations.
- Structured extraction: examples + validators; escalate uncertain cases.
- Summarization/rewriting: constrain length/style; include source spans.
- Classification/routing: start with prompts; fine-tune when volume stabilizes.
- Code/automation: draft boilerplate; keep invariants/security logic hand-crafted.
- Tabular predictions: often classic ML beats LLMs—use the right tool.
2) Ground It and Guard It
Prefer RAG or explicit rules over parametric memory; require citations. Validate outputs (JSON Schema/Zod), enforce tool/API allow-lists, redact PII, cap rates/costs, and log prompts/outputs. Use low temperature and snapshot models for reproducibility.
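A minimal sketch of those guardrails for tool calls, with an allow-list, naive PII redaction, and a per-request cost cap; the tool names, regexes, and limits are illustrative, not a complete policy:

```ts
// Guardrails sketch: allow-listed tools, naive PII redaction, per-request cost cap.
const ALLOWED_TOOLS = new Set(["search_docs", "create_ticket"]);
const MAX_COST_USD = 0.01;

function redactPII(text: string): string {
  return text
    .replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, "[email]")  // email addresses
    .replace(/\b\d{3}-\d{2}-\d{4}\b/g, "[ssn]");     // US SSN-like patterns
}

function guardToolCall(tool: string, args: string, estimatedCostUsd: number): string {
  if (!ALLOWED_TOOLS.has(tool)) throw new Error(`Tool not allow-listed: ${tool}`);
  if (estimatedCostUsd > MAX_COST_USD) throw new Error("Per-request cost cap exceeded");
  return redactPII(args); // redact before logging or sending upstream
}
```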
3) Measure What Matters
Define acceptance criteria (e.g., ≥85% exact-match on 300 items, P95 < 1.2s, cost < $0.01/request). Track hallucination rate, citation correctness, escalation rate, and user satisfaction. Fail CI on regressions.
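For example, a tiny CI gate over a golden set that computes exact-match accuracy and P95 latency and fails the build on regression; the thresholds and result shape are assumptions to adapt:

```ts
// CI gate sketch: exact-match accuracy and P95 latency over a golden set.
type Result = { expected: string; actual: string; latencyMs: number };

function gate(results: Result[], minAccuracy = 0.85, maxP95Ms = 1200): void {
  const correct = results.filter(
    (r) => r.actual.trim().toLowerCase() === r.expected.trim().toLowerCase()
  ).length;
  const accuracy = correct / results.length;

  const latencies = results.map((r) => r.latencyMs).sort((a, b) => a - b);
  const idx = Math.min(latencies.length - 1, Math.ceil(0.95 * latencies.length) - 1);
  const p95 = latencies[idx];

  console.log(`accuracy=${(accuracy * 100).toFixed(1)}% p95=${p95}ms`);
  if (accuracy < minAccuracy || p95 > maxP95Ms) {
    process.exitCode = 1; // fail the CI job on regression
  }
}
```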
4) Ship the Smallest Trustworthy Thing
Start narrow (one document set/task). Add human-in-the-loop for edge cases and feedback. Expand only after metrics and operations (alerts, dashboards) are stable.
Practical Examples
For Knowledge Work
RAG copilots with citations; summarize-then-verify workflows; draft-then-edit pipelines.
For Software Teams
Code scaffolding; spec-first development; PR copilots gated by types, mutation tests, and policy lint.
For Business Processes
Triage/routing; document extraction with confidence scores; internal semantic search with access controls.
Gotchas
Hallucinations (fluent ≠ true), prompt fragility, data leakage/compliance risks, dependency sprawl, and overusing LLMs where classic ML is better. Mitigate with grounding, validation, governance, and metrics.
Quick Start Checklist
- Single, testable use-case + metrics
- Choose pattern: RAG vs. fine-tune vs. classic ML
- Build a 100–300 item eval set
- Add schema validation, rate/cost caps, logging
- Low temperature; require citations where facts matter
- Human review on low-confidence cases
- Track metrics; iterate; widen only after targets are met
What Jobs AI Can’t Replace (But Will Augment)
A Quick Framework
AI struggles to fully replace work that is embodied (physical/dexterous), relational (trust/duty-of-care/persuasion), or accountable (licensed, liable decisions).
Job Families
- Skilled trades & field service (electricians, plumbers, HVAC, lineworkers, welders).
- Hands-on healthcare & care work (nurses, EMTs, PT/OT, hygienists, elder/childcare).
- Early/special education (K–5 teachers, aides).
- Mental health & social services (therapists, social workers).
- Public safety & emergency response (firefighters, SAR, incident command).
- High-stakes persuasion/negotiation (enterprise sales, trial lawyers, mediators, diplomats).
- Leadership & governance (executives, judges, elected officials).
- Creative direction & brand ownership (creative directors, showrunners, editors-in-chief, lead designers).
- Inspection/compliance with human sign-off (building inspectors, safety auditors, airline captains).
Jobs AI Will Reshape (Not Remove Soon)
Pilots and surgeons (automation grows but humans command), accountants/actuaries/auditors (automation of prep; human attestation), journalists/analysts (draft/research speedups; human sourcing/ethics).
Practical Implications
Bias your role toward embodiment, relationships, accountability, and taste. Use AI for drafting/search/monitoring; you decide, constrain, and certify. Own guardrails (privacy/safety/cost/evals/compliance). Build meta-skills: problem framing, negotiation, change management.
If You’re a Software Pro
Move up-stack (requirements, architecture, evals, governance). Productize outcomes (SLAs, accuracy, compliance), not just code. Stay connected to ops (incidents, perf budgets) to remain accountable.
Entry-Level AI Jobs (Realistic On-ramps)
1) Junior LLM Application Engineer
- Do: small LLM features (chat, autocomplete, form helpers)
- Show: JS/TS, REST, JSON Schema, streaming UIs, basic prompt/RAG
- Screen: “Add chat with citations; budget < $0.01/request.”
- Portfolio: Help-center chat (Angular + Node) with cost/latency dashboard
2) RAG/Search Engineer (Junior)
- Do: ingest → chunk → embed → retrieve/rerank → cite (see the chunking sketch below)
- Show: pgvector/SQLite-vec or vector DB, metadata filters, evals
- Screen: “Q&A over 500 PDFs; measure accuracy & latency.”
- Portfolio: Pipeline + retrieval metrics + “confidence + source” UI
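A minimal sketch of the “chunk” step above: fixed-size character chunks with overlap so retrieval doesn’t lose context at boundaries (the sizes are common starting points, not rules; many pipelines chunk by tokens, sentences, or headings instead):

```ts
// Naive fixed-size chunking with overlap for RAG ingestion.
function chunkText(text: string, chunkSize = 800, overlap = 200): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize)); // overlapping windows
  }
  return chunks;
}
```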
3) AI QA / Evaluation Engineer (Junior)
- Do: test sets, graders, regression gates
- Show: test design, rubric graders, CI integration
- Screen: “Eval set; fail CI if accuracy drops 5%.”
- Portfolio: Eval CLI that outputs a markdown report
4) Prompt Engineer / Content Designer (Associate)
- Do: prompts, few-shot examples, safety tests, style guides
- Show: measurable lifts, adversarial tests
- Screen: “Reduce hallucinations; show metrics.”
- Portfolio: Case study with iterations + deltas + error taxonomy
5) AI Support Engineer / Solutions Analyst
- Do: wire LLM features to customer data; triage issues
- Show: APIs, auth, logging, redaction, cost caps
- Screen: “Connect CRM with OAuth + rate limits.”
- Portfolio: “LLM + CRM” demo with audit logs and back-pressure
6) Data Labeling / Annotation Specialist (Lead-track)
- Do: gold datasets, label guides, QA on annotators/tools
- Show: inter-annotator agreement, sampling strategy
- Screen: “Design labels and prove consistency.”
- Portfolio: Small labeled corpus + agreement analysis
7) Model Ops / MLOps Assistant
- Do: monitor latency/cost/drift; manage rollouts, A/B tests, and alerts
- Show: Grafana/Prometheus, tracing, canaries
- Screen: “Drift alerting + rollback on error spikes.”
- Portfolio: Dockerized demo with tracing and budgets
8) AI Technical Writer / Developer Advocate (Associate)
- Do: tutorials, sample apps, runnable repos
- Show: clear docs + screenshots/gifs
- Screen: “Write SDK tutorial + sample app.”
- Portfolio: RAG quickstart + tool-calling demo
9) Agent / Workflow Builder (Low-code + APIs)
- Do: tool schemas, retries/timeouts, compensation
- Show: deterministic contracts, idempotency
- Screen: “Agent files ticket, fetches data, emails—safely.”
- Portfolio: 3-tool agent with audit trail
10) AI Content Ops (Grounded Gen)
- Do: grounded summaries/snippets with style enforcement
- Show: RAG + templating, plagiarism/fact checks
- Screen: “20 product briefs with citations, no plagiarism.”
- Portfolio: Batch pipeline + QA report (hallucinations < 2%)
How to Find These Roles (Keywords)
Paste these into your favorite search engine, or a [job search website](https://careersherpa.net/best-job-search-websites/), to look for available roles:
- AI Product Engineer (junior)
- LLM Application Engineer
- RAG Engineer
- AI QA/Evals
- Prompt Engineer (Associate)
- AI Support Engineer
- Model Ops Analyst
- Developer Advocate (AI)
- TypeScript
- Node
- Angular/React
- RAG
- pgvector
- LangChain/LlamaIndex
What Hiring Managers Want to See
Live demo + repo (eval set, metrics, citations, logs, cost caps), guardrails (schema validation, redaction, rate limits, deterministic decoding), and numbers (e.g., accuracy 84% on 300 Qs; P95 900 ms; cost $0.006/request; hallucinations 1.7%).
Conclusion
LLMs are powerful pattern learners, not truth oracles. Treat them like a fast, fallible junior team: you decide contracts, provide examples, ground with retrieval, and enforce verification. Use AI where language and unstructured data dominate; use classic ML where numbers rule. If you’re breaking into AI work, bias your time toward specs, guardrails, and evals; ship small, measurable demos; and market outcomes, not hype. That’s how you turn today’s tooling into durable career leverage.