sem: Version Control That Understands Code

Git tracks files. Developers think in functions. sem bridges that gap. It gives you diffs, blame, and dependency analysis at the function level, across 13 languages. When someone runs a formatter on your codebase and touches 500 lines, sem tells you which functions actually changed behavior and which ones just got reformatted.

Structural Hashing

The core idea: compute a hash of each function's AST structure, ignoring whitespace, comments, and variable names. Two versions of a function that differ only in formatting, comments, or naming get the same hash; any change to the shape of the code produces a different one. This lets sem classify every change as either structural (the code actually changed) or cosmetic (just formatting or renaming).

$ sem diff HEAD~1

src/payment.rs
  Modified: process_payment
    Type: STRUCTURAL (logic changed)

  Modified: validate_order
    Type: COSMETIC (reformatted only)

  Added: calculate_tax

Now you know exactly what to review. Skip the reformatted function. Focus on the one with real changes.
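
To make the structural-hash idea concrete, here is a minimal sketch in Rust. It is not sem's implementation: the Node type is a hypothetical toy AST standing in for real parser output, and the names feed, structural_hash, and classify are invented for illustration. Whitespace and comments never reach the AST, so they drop out on their own; variable names are replaced by the order in which they first appear, so a consistent rename hashes identically.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical toy AST; a real tool would get this from a parser.
#[derive(Debug, Clone)]
enum Node {
    /// A variable reference, by name.
    Var(String),
    /// A literal, kept as source text.
    Lit(String),
    /// A node kind ("add", "return", ...) with its children.
    Op(&'static str, Vec<Node>),
}

/// Feed a node into the hasher. Each variable name is replaced by the
/// index of its first appearance, so renaming `a` to `x` everywhere
/// leaves the hash unchanged, while swapping two variables does not.
fn feed<'a>(node: &'a Node, seen: &mut HashMap<&'a str, usize>, h: &mut DefaultHasher) {
    match node {
        Node::Var(name) => {
            let next = seen.len();
            let idx = *seen.entry(name.as_str()).or_insert(next);
            ("var", idx).hash(h);
        }
        Node::Lit(text) => ("lit", text).hash(h),
        Node::Op(kind, children) => {
            // Include the child count so sibling boundaries are unambiguous.
            ("op", kind, children.len()).hash(h);
            for child in children {
                feed(child, seen, h);
            }
        }
    }
}

/// Hash of a function body's structure, independent of names and layout.
fn structural_hash(body: &Node) -> u64 {
    let mut h = DefaultHasher::new();
    feed(body, &mut HashMap::new(), &mut h);
    h.finish()
}

#[derive(Debug, PartialEq)]
enum Change {
    Structural,
    Cosmetic,
}

/// Equal hashes mean the edit was cosmetic; anything else is structural.
fn classify(old: &Node, new: &Node) -> Change {
    if structural_hash(old) == structural_hash(new) {
        Change::Cosmetic
    } else {
        Change::Structural
    }
}
```

With this sketch, `return a + b` and `return x + y` classify as Cosmetic, while `return a + b` and `return a - b` classify as Structural. Equal hashes only show that the two ASTs have the same shape; a production tool would also need a hash function that stays stable across builds.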

Better Blame

git blame tells you who last touched each line. But if someone ran cargo fmt after the real author, blame points to whoever ran the formatter. sem blame ignores cosmetic changes and shows you who last made a structural change to each function. When a function breaks, you find the actual author, not the last person who changed whitespace.
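
One way to picture structural blame is as a walk over a function's history that skips every revision whose structural hash matches the one before it. The sketch below assumes that model. The Revision type and the structural_author function are hypothetical names, not sem's API, and the hashes are taken as given.

```rust
/// One historical version of a function. Hypothetical type: the hash is
/// assumed to be a structural hash computed elsewhere.
struct Revision<'a> {
    author: &'a str,
    structural_hash: u64,
}

/// Given a function's history ordered oldest to newest, return the author
/// of the most recent revision whose structural hash differs from its
/// predecessor. The first revision always counts as a structural change.
fn structural_author<'a>(history: &[Revision<'a>]) -> Option<&'a str> {
    let mut last_author = None;
    let mut prev_hash = None;
    for rev in history {
        if prev_hash != Some(rev.structural_hash) {
            // The structure changed here, so this author becomes the answer.
            last_author = Some(rev.author);
        }
        prev_hash = Some(rev.structural_hash);
    }
    last_author
}
```

If alice writes a function, bob changes its logic, and carol then runs a formatter over it, the last two revisions share a hash and the function is attributed to bob.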

Dependency Graph and Impact Analysis

sem builds a cross-file dependency graph by analyzing which functions call which other functions. Then sem impact answers the question: "If I change this function, what else might break?" It walks the graph transitively and lists every function that depends on the one you changed, directly or indirectly.

$ sem impact src/payment.rs::calculate_tax

Direct dependents:
  process_payment (src/payment.rs)

Transitive dependents:
  handle_checkout (src/api/checkout.rs)
  run_daily_billing (src/cron/billing.rs)

Total impact: 3 entities across 3 files

No grep chains. No guessing. The graph gives you the answer deterministically.
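
The transitive walk can be sketched as a breadth-first search over the reversed call graph. This is an illustration, not sem's code: the Callers alias and the impact function are hypothetical names, and the graph is supplied by hand rather than extracted from source.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical reversed call graph: each function maps to the functions
/// that call it directly.
type Callers<'a> = HashMap<&'a str, Vec<&'a str>>;

/// Breadth-first walk from the changed function through its callers.
/// Returns (dependent, distance) pairs in visit order; distance 1 is a
/// direct dependent, anything larger is transitive.
fn impact<'a>(callers: &Callers<'a>, changed: &'a str) -> Vec<(&'a str, usize)> {
    // The visited set keeps cycles (mutual recursion) from looping forever.
    let mut seen: HashSet<&str> = HashSet::from([changed]);
    let mut queue = VecDeque::from([(changed, 0usize)]);
    let mut dependents = Vec::new();
    while let Some((name, depth)) = queue.pop_front() {
        for &caller in callers.get(name).into_iter().flatten() {
            if seen.insert(caller) {
                dependents.push((caller, depth + 1));
                queue.push_back((caller, depth + 1));
            }
        }
    }
    dependents
}
```

Suppose, as an assumption about the example above, that process_payment calls calculate_tax and that handle_checkout and run_daily_billing both call process_payment. Then impact for calculate_tax returns process_payment at distance 1 and the other two at distance 2. Because every function is visited at most once, the walk is linear in the size of the graph.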

How sem and Weave Fit Together

sem is for understanding your code: what changed, who changed it, what depends on it. Weave is for merging code across branches and coordinating multiple developers or agents. Both use the same entity extraction library (sem-core) under the hood, so they agree on what a "function" is.

Try It

cd sem/crates && cargo build --release

sem diff HEAD~1
sem blame src/main.rs
sem graph src/lib.rs
sem impact src/lib.rs::my_function

Code is not text. Version control should know that.