The Entity
Every tool that works with code has to choose a unit of analysis. Text editors work in characters. Git works in lines. Compilers work in tokens and AST nodes. Each of these is the right choice for what that tool does. But when you want to do higher-level things with code, like understanding what changed, figuring out what depends on what, merging parallel edits, or deciding what to review, you need a different unit. You need something that corresponds to a meaningful piece of logic rather than a piece of text. We call that unit an entity, and choosing it turns out to be one of those decisions that looks obvious in retrospect but has surprisingly deep consequences.
An entity is a function, a class, or a method. That's it. Not a file, not a line, not an AST node, not a module. A function. This particular granularity isn't an arbitrary choice, and it's worth understanding why, because a lot of the power of entity-level tooling comes from properties that only hold at this specific level of abstraction.
Start by thinking about what makes a good unit of analysis for code. You want something that is self-contained: it has a clear boundary, a name, defined inputs, and defined outputs. You want something that is independently meaningful: you can understand what it does without reading everything around it. You want something that maps to how people actually think about code: when a developer says "I changed the payment logic," they're pointing at something, and you want your unit to be that thing. And you want something that has well-defined relationships to other units: it calls things and is called by things, and those relationships form a graph you can reason about.
Functions satisfy all of these properties. A function has a name, a signature, a body. You can read a function and understand what it does. You can talk about a function in conversation and your colleague knows exactly what you mean. Functions call other functions, forming a dependency graph that tells you how changes propagate through a codebase. Classes satisfy these properties too, as do methods within classes, which are essentially functions with an implicit receiver. These are the natural atoms of software: the smallest units that are self-contained, independently meaningful, and connected to each other through explicit interfaces.
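To make this concrete, here is a minimal sketch of what pulling entities out of a module might look like, using Python's standard `ast` module. The `extract_entities` helper and the sample `Cart` class are illustrative, not part of any particular tool; the point is only that functions, classes, and methods fall out of the syntax tree with natural names and boundaries.

```python
import ast

def extract_entities(source: str) -> list[str]:
    """Collect the qualified names of every function, class, and method
    in a module. Methods are reported as Class.method."""
    tree = ast.parse(source)
    entities: list[str] = []

    def walk(node: ast.AST, prefix: str = "") -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                name = prefix + child.name
                entities.append(name)
                walk(child, name + ".")  # recurse so methods get qualified
            else:
                walk(child, prefix)

    walk(tree)
    return entities

code = """
def top(): pass

class Cart:
    def add(self, item): pass
    def total(self): pass
"""
print(extract_entities(code))  # ['top', 'Cart', 'Cart.add', 'Cart.total']
```

Each name in that list is a thing you could point at in conversation, which is exactly the property the rest of this argument leans on.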
Now consider the alternatives and why they don't work as well. Files are too coarse. A single file might contain ten unrelated functions, and if you treat the file as your unit, you lose the ability to distinguish between them. Two developers editing different functions in the same file look like they're in conflict, even though they're working on completely independent things. Files also don't have a natural dependency structure at the right granularity: file A importing file B tells you almost nothing about which specific functions in A depend on which specific functions in B. You lose the fine-grained graph.
Lines are too fine. A single line of code has no independent meaning. It's a fragment of a larger thought. If you diff at the line level, you can see that something changed, but you can't tell what it means without reading the surrounding context. Lines also don't have dependencies. Line 42 doesn't "depend on" line 17 in any well-defined way. There's no graph to walk, no impact to trace. You're working with raw text, which is what git does, and git does it very well, but it's the wrong abstraction for understanding behavior.
AST nodes might seem like a good candidate, since they capture the structure of code precisely. But most AST nodes are too granular to be useful as units of analysis. An if-statement node, a variable declaration node, an assignment node: these are fragments of logic, not complete thoughts. You can't reason about an if-statement in isolation because it only makes sense inside the function that contains it. An AST node doesn't have a meaningful interface: it doesn't take inputs or produce outputs in the way a function does. And the AST node graph would be enormous, containing every syntactic element in the codebase, which makes it impractical for the kind of analysis you actually want to do.
The entity sits at the sweet spot. It's coarse enough to be meaningful on its own, fine enough to distinguish independent changes within a file, and connected to other entities through a dependency graph that's the right size and shape for practical analysis. When you say "this pull request changed three entities and affects seven dependents," that's a useful summary that a human or an agent can act on. Try saying the same thing about lines or AST nodes and it either doesn't make sense or doesn't help.
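As a sketch of how a summary like "three entities changed, seven dependents affected" might be computed, here is one way to trace dependents over a caller-to-callee graph. The entity names in the example graph are made up for illustration; the traversal is a plain breadth-first walk over the inverted edges.

```python
from collections import deque

def affected_entities(calls: dict[str, set[str]], changed: set[str]) -> set[str]:
    """Given caller -> callees edges, return every entity that transitively
    depends on a changed entity (the blast radius of the change)."""
    # Invert the graph: callee -> set of callers.
    callers: dict[str, set[str]] = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    seen = set(changed)
    queue = deque(changed)
    while queue:
        entity = queue.popleft()
        for caller in callers.get(entity, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen - changed  # dependents only, excluding what was edited

graph = {
    "checkout": {"validate_payment", "compute_total"},
    "compute_total": {"apply_discount"},
    "refund": {"validate_payment"},
}
# Changing apply_discount ripples up through compute_total to checkout.
print(sorted(affected_entities(graph, {"apply_discount"})))
```

The same walk over a line-level or AST-node-level graph would be either undefined or unusably large, which is the point of the comparison above.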
For agents specifically, the entity is the right unit because it matches the granularity at which agents naturally work. When an agent modifies code, it doesn't think in lines. It thinks: I need to change this function to handle the new edge case, and I need to update that function to call it with the new parameter. Those are entity-level operations. When an agent reviews code, the useful questions are entity-level: which functions changed? Are the changes structural or cosmetic? What depends on each changed function? If you give an agent a list of changed entities and their dependents, you've given it exactly the information it needs to do its job. If you give it a line diff, you've given it a puzzle to solve before it can even start.
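One way an entity-level diff might answer "which functions changed, and was the change structural or cosmetic?" is to compare a structural fingerprint of each entity across two versions of a file. This is a sketch, not any tool's actual diff algorithm; the `entity_diff` helper is hypothetical. It uses `ast.dump`, which by default omits positions and never sees comments or whitespace, so purely cosmetic edits and reordering within the file don't register as changes.

```python
import ast

def entity_diff(old_src: str, new_src: str) -> dict[str, list[str]]:
    """Classify top-level and nested definitions as added, removed, or changed."""
    def fingerprints(src: str) -> dict[str, str]:
        return {
            node.name: ast.dump(node)  # structure only: no positions, comments, or blank lines
            for node in ast.walk(ast.parse(src))
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        }

    old, new = fingerprints(old_src), fingerprints(new_src)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(n for n in old.keys() & new.keys() if old[n] != new[n]),
    }

old = "def f(x):\n    return x + 1\n\ndef g():\n    pass\n"
new = "def f(x):\n    return x + 2\n\ndef h():\n    pass\n"
print(entity_diff(old, new))  # {'added': ['h'], 'removed': ['g'], 'changed': ['f']}
```

A real implementation would qualify method names by class and compare more carefully, but even this sketch turns a line diff into the entity-level summary an agent can act on directly.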
There's also a coordination argument. When multiple agents work on the same codebase, they need a way to avoid stepping on each other. The entity is the natural unit of coordination because it's the smallest unit that can be independently owned. An agent can claim a function, work on it, and merge its changes with a guarantee that no other agent's work will conflict, as long as no other agent claimed the same function. You can't make this guarantee at the file level (two agents editing different functions in the same file would collide) or at the line level (there's no meaningful notion of "owning" a line). The entity is the Goldilocks unit: not too big, not too small, just right for ownership, merging, and independent reasoning.
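A minimal version of that coordination guarantee can be sketched as a claim ledger keyed by entity name. Everything here, including the agent and entity names, is illustrative; the only point is that exclusive ownership is trivial to express at this granularity.

```python
import threading

class EntityLedger:
    """Track which agent owns which entity; claims are exclusive."""

    def __init__(self) -> None:
        self._owners: dict[str, str] = {}
        self._lock = threading.Lock()

    def claim(self, agent: str, entity: str) -> bool:
        """Atomically claim an entity. Returns False if another agent holds it;
        re-claiming your own entity succeeds."""
        with self._lock:
            holder = self._owners.setdefault(entity, agent)
            return holder == agent

    def release(self, agent: str, entity: str) -> None:
        with self._lock:
            if self._owners.get(entity) == agent:
                del self._owners[entity]

ledger = EntityLedger()
ledger.claim("agent-a", "payments.validate")  # True: first claim wins
ledger.claim("agent-b", "payments.validate")  # False: already owned
ledger.claim("agent-b", "payments.refund")    # True: same file, different entity
```

That last line is the file-level comparison in miniature: two agents in the same module proceed without conflict because ownership attaches to the function, not the file.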
The interesting thing is that none of this is really a new idea. Developers have always thought in entities. When someone says "I refactored the auth handler," they're talking about an entity. When someone says "the bug is in the payment validation," they're pointing at an entity. When someone reviews a PR and says "the changes to the user model look fine but I'm worried about the session manager," they're reasoning at the entity level. The concept is intuitive and universal. What's new is building tools that actually operate at this level, so that the thing developers naturally think about and the thing their tools report on are the same thing. That alignment, between how you think about code and how your tools represent it, is where the leverage comes from.