2026-05-19HexSaga

What AI Code Review Should Check Before Merge

AI code review should not stop at style. Before merge, it should look for behavior regressions, permission mistakes, cache invalidation bugs, missing tests, and unreviewable scope.

What AI Code Review Should Check Before Merge

AI code review can drift toward the wrong things. It may comment on naming, suggest splitting a function, or recommend a cleaner abstraction. Sometimes that is useful. But before merge, those are rarely the most important risks.

The main thing to prevent is behavior regression.

For AI-generated or AI-modified code, the merge review should ask: did this change behavior that should not change? Did it loosen permissions? Did it leave cache invalidation wrong? Did it remove important tests? Did a local fix become a hidden refactor?

AI review should not only be a style pass. It should start as risk detection.

Check scope before style

The first question is not whether a line could be prettier. The first question is whether the change belongs in this pull request.

Check whether:

  • the changed files match the task
  • unrelated formatting slipped in
  • config, lockfiles, or generated files changed unexpectedly
  • new dependencies were added
  • tests changed without implementation, or implementation changed without tests
  • debug logs, local paths, or temporary flags were committed

If the task is “fix export button permissions,” but the diff touches database schema, global CSS, login flow, and a dozen unrelated components, the review should stop at scope.

When scope is out of control, do not line-review everything. Ask for the change to be split or cleaned up first.

Behavior regression matters more than code smell

Code smell can often be improved later. Behavior regression reaches users immediately.

An AI review should ask behavior questions:

  • Does the old success path still work?
  • Are old error paths still handled?
  • Did empty values, boundary values, duplicate submission, or concurrent requests change?
  • Is the API response still compatible?
  • Do page refresh, navigation, and form state still behave correctly?
  • Were logs, audit trails, or tracking calls removed by accident?

Be especially careful when a small bug fix changes a shared helper. The impact may be much wider than the current task.

A useful review comment is not just:

This could be optimized.

It is more like:

This helper now treats an empty array as no permission, but callers previously used an empty array to mean unrestricted.
Please confirm all callers accept that semantic change, or limit the fix to the current page.

That comment points to behavior risk.

Review permissions by failure mode

Permission code should not be reviewed by checking whether “a permission check exists.” The important question is what happens when data is missing, stale, or manipulated.

Look for:

  • unauthenticated access
  • accounts with no roles
  • resources with no permission binding
  • backend checks behind frontend visibility
  • admin and ordinary-user path separation
  • access to another user's resource by changing request parameters
  • permission cache leakage across users or roles

A common AI mistake is treating missing data as allowed. For example, an empty permission list returns true, or an unbound resource is allowed by default. That may feel convenient in a demo. In a real system, it is dangerous.

The default review stance should be fail closed: uncertain means deny, and missing binding means deny. If the business requires otherwise, the rule should be explicit and tested.

Cache review is about ownership and invalidation boundaries

Cache bugs do not always look suspicious in a diff. The write code looks right, but the user still sees stale data after saving. Or private state leaks through a shared cache.

Ask:

  • Is this cache global, per role, or per user?
  • Which key, path, or tag is invalidated after mutation?
  • Does the read path bypass the invalidated layer?
  • Is the invalidation too broad, causing unrelated refreshes?
  • Is user-private data stored in shared cache?

AI often fixes the mutation path and forgets the read boundary. It may also use a broad refresh because it seems to solve the symptom, even when the real issue is a missing cache key or invalidation tag.

The review question is not “did the code call revalidate?” The question is “does the invalidation boundary match the data owner?”

Be specific about missing tests

“Add tests” is not a useful review by itself. Say which behavior is missing coverage.

Examples:

  • no test for a user without permission
  • no test for reading fresh data after mutation
  • no test for duplicate submission
  • no compatibility test for older response shape
  • no test for error responses or empty values
  • no test for cross-user isolation

Also separate must-fix gaps from nice-to-have gaps. Changes involving permissions, authentication, payment, data migration, or cache invalidation usually need stronger coverage before merge. A copy edit, small visual tweak, or internal type rename may not.

Tests are not there to decorate the diff. They prove that the important behavior did not drift.

How to ask AI to review a diff

Do not just say “review this.” Give the model the review priority:

Review the current diff as a pre-merge code review.
Prioritize behavior regressions, permission loosening, cache invalidation errors, missing tests, and unrelated changes.
Do not lead with style suggestions.
For each issue, point to the concrete file and behavior impact.

If the project has security rules, include them:

Permissions fail closed.
Resources without permission bindings must not be allowed by default.
Do not suggest a global refresh to hide a cache boundary problem.

This directs the AI toward the risks that matter, instead of producing a generic style critique.

A short pre-merge checklist

Before merge, check:

  • Does the scope match the task?
  • Are there unrelated files, generated files, or config changes?
  • Did user-visible behavior change intentionally?
  • Are permission failure modes safe?
  • Does cache invalidation match the data boundary?
  • Are key tests present?
  • Are skipped validations clearly disclosed?
  • Is the PR description accurate and free of invented test claims?

If the task is still being shaped, start with AI Context Engineering for Real Projects. If the concern is repository safety while the agent works locally, read How to Keep an AI Agent from Breaking Your Repo.

Conclusion

The value of AI code review is not how many minor style comments it can produce. It is whether it helps you catch merge risk earlier.

Before merge, the important questions are behavior, permissions, caching, tests, and scope. Did behavior regress? Did permissions loosen? Is cache invalidation correct? Are test gaps acceptable? Does the diff still belong to this task?

Put those questions first, and AI review becomes useful engineering review. Otherwise, it can become a polished list of suggestions that misses the risks that matter.