2026-05-18 · HexSaga

Why Codex and Claude Code Need Git Branches and Pull Requests

A practical guide to using Codex, Claude Code, and other coding agents with Git branches, diff review, validation, PR descriptions, command approval, and sane task scope.

Codex, Claude Code, and similar coding agents are not just chat tools that return code snippets. They can work inside a real repository: read files, edit files, run commands, inspect failures, check diffs, and keep iterating.

That is exactly why they need a real Git workflow.

If you let an agent make changes directly on your main branch, the speed feels good for a while. Then one bad task can mix unrelated edits, local config, generated files, half-finished experiments, and risky command output into the same working tree. The problem is not that agents are useless. The problem is that they are powerful enough to need boundaries.

The practical rule is simple: when a coding agent touches a real project, Git branches and pull requests are not ceremony. They are the minimum engineering guardrail.

Do not let the agent work on main

The main branch should represent a stable, understandable state of the project. Even in a solo project, it is useful to treat main as the current trusted version rather than a scratchpad.

Coding agents do not work exactly like humans. A human developer often carries an implicit boundary: fix this endpoint, edit this component, add these tests. An agent can follow instructions, but it also explores adjacent files and may decide that a nearby cleanup, rename, or config change is helpful.

That initiative is useful, but it should not land directly on main.

A branch gives you three protections:

  • Isolation for unfinished work: the agent can try, fail, revise, and try again without destabilizing the base branch.
  • Isolation from other work: in a shared repository, the working tree may already contain someone else's changes or generated files, and a dedicated branch keeps the agent's commits from mixing with them.
  • Cheap rollback: abandoning a branch is much easier than untangling a polluted main branch.

A safer starting point is: inspect the working tree, create a focused branch from the right base, then give the agent a bounded task.
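
That starting point can be sketched as a small shell function. This is a minimal sketch rather than a prescribed setup: the function name start_agent_branch and the branch name in the usage comment are hypothetical, and it assumes Git 2.23 or newer for git switch.

```shell
# Hypothetical helper: start agent work only from a clean tree.
# Assumes Git 2.23+ (git switch). Fetch or pull the base first if it may be stale.
start_agent_branch() {
  local base="$1" branch="$2"
  # --porcelain prints one line per modified or untracked file; empty output means clean.
  if [ -n "$(git status --porcelain)" ]; then
    echo "working tree is not clean; commit or stash before starting" >&2
    return 1
  fi
  # Create the task branch from the chosen base and switch to it.
  git switch -c "$branch" "$base"
}

# Example (branch name is illustrative):
#   start_agent_branch main agent/fix-role-filter
```

Refusing to start on a dirty tree is a deliberate choice here: it keeps existing local changes out of the agent's first diff.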

Diff review is where control actually happens

The agent's summary is not the source of truth. The diff is.

The diff shows which files changed, what logic was removed, which dependencies were added, which tests were touched, and whether the agent stayed inside the requested scope.

When reviewing an agent-produced diff, check at least:

  • whether the changed files match the requested task
  • whether unrelated formatting, generated files, local config, or debug code slipped in
  • whether the change respects existing architecture boundaries
  • whether error paths, permissions, cache behavior, concurrency, null states, and rollback cases were considered

If the diff is too large to review carefully, the task was probably too large. A diff you cannot read is not a diff you should merge.
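
The first check, whether the changed files match the task, is easy to make mechanical. The sketch below assumes the task was scoped to a single path prefix; the function name out_of_scope and the src/api/ prefix are hypothetical, and a real scope may need more than one prefix.

```shell
# Hypothetical helper: print changed paths that fall outside the agreed scope.
# Reads one path per line on stdin and takes the allowed prefix as $1.
out_of_scope() {
  local allowed="$1" path
  while IFS= read -r path; do
    case "$path" in
      "$allowed"*) ;;                 # inside the requested scope
      *) printf '%s\n' "$path" ;;     # outside: worth a closer look
    esac
  done
}

# Typical use on a task branch (three dots compare against the merge base with main):
#   git diff --stat main...HEAD
#   git diff --name-only main...HEAD | out_of_scope "src/api/"
```

An empty result does not mean the change is correct, only that the agent stayed inside the file boundary. The logic still has to be read.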

Tests turn the agent into a feedback loop

The value of Codex and Claude Code is not only that they can write code. It is that they can enter the loop: change code, run checks, read the failure, and fix again.

That means you should ask for real validation, not just implementation. Depending on the project, that might be:

npm test
pnpm build
mvn test
cargo test
go test ./...

The exact command depends on the repository. The important part is that the agent should use the same feedback a developer would use. Code can look reasonable and still fail type checks, unit tests, integration tests, or build steps.

Not every small edit needs the full test suite. A copy edit does not need a database integration run. But if the change touches business logic, permissions, authentication, caching, payments, deployment, or data persistence, it needs a matching validation step.
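
Whichever command applies, it helps to keep its output and exit status rather than rely on a recollection that "tests passed". A minimal sketch, assuming a hypothetical wrapper named run_check and a log file name of your choosing:

```shell
# Hypothetical wrapper: run a validation command, keep its full output,
# and record the exit status so later claims can be checked against it.
run_check() {
  local log="$1" rc
  shift
  "$@" >"$log" 2>&1
  rc=$?
  echo "exit status: $rc" >>"$log"
  return "$rc"
}

# Example (log file name is illustrative):
#   run_check validation.log npm test
```

The wrapper returns the command's own exit status, so it can still gate later steps.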

A PR description is an engineering handoff

A pull request description is not just something to make the review page look complete. For agent-assisted work, it is the handoff document.

It should answer:

  • what problem the PR solves
  • which modules changed
  • what behavior changed
  • which tests or checks were run
  • which risks are still not fully covered

This is a good job for the agent, but only if it uses the real diff and real command output. A useful prompt is:

Write a short PR description from the current git diff and validation output.
Include scope, implementation, validation, and remaining risk.
Do not claim tests were run if they were not run.

That last sentence matters. Agents can write confident summaries. The PR needs an accurate one.
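
One way to keep the description grounded is to hand the agent the facts directly instead of letting it summarize from memory. The sketch below is an assumption about how you might assemble that input: the function name pr_context, the section labels, and the log file are all hypothetical.

```shell
# Hypothetical helper: print the material a PR description should be based on.
# $1 is the base branch, $2 is a file holding saved validation output (may be missing).
pr_context() {
  local base="$1" log="$2"
  echo "Changed files:"
  git diff --stat "$base"...HEAD
  echo "Diff:"
  git diff "$base"...HEAD
  echo "Validation output:"
  if [ -s "$log" ]; then
    cat "$log"
  else
    # Say so explicitly, so the description cannot imply that checks ran.
    echo "(no validation was run)"
  fi
}

# Example (file names are illustrative):
#   pr_context main validation.log > pr-context.txt
```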

Commands you should not approve casually

Command execution is one of the biggest advantages of coding agents. It is also one of the biggest risks.

Commands that inspect state or run validation are usually reasonable: git status, git diff, rg, npm test, pnpm build. You should still read them, but they usually do not destroy work or rewrite shared history.

Commands that delete, overwrite, rewrite, or mutate persistent systems deserve a pause:

  • rm -rf
  • git reset --hard
  • git checkout -- .
  • git push --force
  • removing Docker volumes
  • running migrations against production
  • downloading and running unknown remote scripts
  • uploading logs that contain secrets or user data

The test is simple: if the command goes wrong, could it lose data, lose commits, affect another developer, or affect production? If yes, do not auto-approve it.

Be especially careful with two of these: git reset --hard, and git checkout -- . (the trailing dot is part of the command). They look like cleanup commands, but they can discard uncommitted work. In a shared or dirty working tree, that is exactly the wrong kind of cleanup.
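
The "pause" list can be turned into a rough pre-approval check. This is a sketch under stated assumptions: the function name needs_manual_approval is hypothetical, the patterns are plain substring matches taken from the list above plus a few common spellings, and the list is deliberately incomplete. It is a prompt for human attention, not a security boundary.

```shell
# Hypothetical classifier: succeed (exit 0) if a command line should get a manual look.
# Substring matching is crude and easy to evade; treat a miss as "unknown", not "safe".
needs_manual_approval() {
  case "$1" in
    *"rm -rf"*) return 0 ;;
    *"git reset --hard"*|*"git checkout -- ."*) return 0 ;;
    *"git push --force"*|*"git push -f"*) return 0 ;;
    *"docker volume rm"*|*"docker volume prune"*) return 0 ;;
    *"curl "*"| sh"*|*"curl "*"| bash"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   needs_manual_approval "git reset --hard HEAD~1" && echo "stop and read this one"
```

If a cleanup really is needed, running git stash push --include-untracked first keeps the uncommitted work recoverable.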

The right task size for an agent

Agents can handle large tasks, but large work should be split into reviewable slices.

Bad task shapes:

  • "Refactor the whole project."
  • "Improve the frontend."
  • "Find and fix all bugs."
  • "Redesign the permission system."

Better task shapes:

  • "Find why this endpoint does not filter by role and add the missing test."
  • "Move this page's API calls into the services layer without changing UI behavior."
  • "Fix duplicate submission in this form and extract a reusable submit lock."
  • "Apply this review comment and do not touch unrelated files."

A good agent task has a clear goal, an understandable file boundary, a plausible validation command, and a diff that can be reviewed in one sitting.

If a task will touch dozens of files, ask the agent for a plan first. Then split the work across multiple branches or PRs. Do not hide all the uncertainty inside one giant diff.

Branches and PRs usually make you faster

The objection is obvious: if AI is supposed to make coding faster, why slow it down with branches, diffs, tests, and PRs?

Because skipping the workflow does not remove the work. It postpones it.

You may save ten minutes up front, then spend hours later separating unrelated edits, recovering overwritten files, explaining a broken build, or figuring out which change caused a regression.

Branches and PRs make each agent-assisted change legible:

  • what the task was
  • what files changed
  • what validation ran
  • what risk remains
  • who owns the final merge decision

The clearer this boundary is, the more the agent behaves like an engineering tool. The blurrier it is, the more it becomes a fast source of uncertainty.

A simple workflow that works

For both solo projects and teams, this is a good default:

  1. Run git status and understand the current working tree.
  2. Create a new branch from the latest main or target branch.
  3. Give the agent a narrow task and specify allowed scope.
  4. Ask it to inspect and explain before making edits.
  5. Run the relevant test, build, or lint command.
  6. Review git diff for unrelated changes.
  7. Ask the agent to draft the PR description from the real diff and validation result.
  8. Open a PR, review it, then merge deliberately.

For high-risk repositories, add two more rules: the agent cannot run destructive commands automatically, and it cannot commit, push, or merge unless you explicitly ask.
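
Part of the second rule can be backed by a local Git hook. The sketch below is one possible pre-commit check, not an established convention of any particular agent: the function name protected_branch_check and the branch names main and master are assumptions to adapt to your repository.

```shell
# Hypothetical check for .git/hooks/pre-commit: refuse commits made directly on the base branch.
protected_branch_check() {
  local branch
  # symbolic-ref fails on a detached HEAD; there is no branch to protect in that case.
  branch=$(git symbolic-ref --quiet --short HEAD) || return 0
  case "$branch" in
    main|master)
      echo "refusing to commit directly on $branch; create a task branch first" >&2
      return 1
      ;;
  esac
}

# In the hook file itself, after defining the function:
#   protected_branch_check || exit 1
```

A local hook is a speed bump rather than access control, since git commit --no-verify skips it. It mostly catches honest mistakes, whether yours or the agent's.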

Conclusion

Codex and Claude Code can make development faster, but they should not bypass engineering control. The more capable the agent is, the more important branches, diffs, tests, and PRs become.

Branches isolate risk. Diffs expose facts. Tests validate behavior. PRs create review and memory.

With those guardrails, a coding agent becomes a useful collaborator. Without them, it becomes a very fast way to create changes nobody fully owns.
