After seven months of vibe coding it’s time to look at what came out of the box. Below are two angles on the same project: the C# codebase itself, and the Claude Code sessions that produced it.
Code, in lines
Hand-written, EF migrations excluded:
| Bucket | Files | Lines of code |
|---|---|---|
| Production | 1,065 | 51,441 |
| Tests | 263 | 58,690 |
| Total | 1,328 | 110,131 |
Test-to-production ratio is 1.14 — slightly more test code than production code. That is the goal: as many tests and as little production code as possible. I will write a separate post about why. Median hand-written file is 22 lines.
EF Core’s generated migration files weigh in at another 389k lines across 403 files, which I exclude from every metric here because they are not really code I wrote.
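The ratio comes straight from the table above. A minimal Python sketch of the arithmetic, with variable names that are mine rather than anything from the project's scripts:

```python
# Line counts from the table above (hand-written code, EF migrations excluded).
production_loc = 51_441
test_loc = 58_690

total_loc = production_loc + test_loc
ratio = test_loc / production_loc

print(f"total hand-written lines: {total_loc}")   # 110131
print(f"test-to-production ratio: {ratio:.2f}")   # 1.14
```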
Production, by group
| Group | Files | Code |
|---|---|---|
| Core | 160 | 7,219 |
| Domains (26 projects) | 754 | 33,843 |
| Application | 79 | 3,863 |
| Other (Cottonopolis/Diagnostics/ProfileManager) | 72 | 6,516 |
Domains is where the actual game lives — 26 isolated projects, each with its own logic/profiles/repositories, never referencing each other directly. The biggest by line count are History (3,863), Commerce (2,612), Rendering (2,571), Mining (2,121), and Game (2,037). The smallest is Warehouse at 92 lines.
The five chunkiest production files:
- TransportPickupService.cs: 691 lines, cyclomatic complexity 131
- ProducerPricingComputer.cs: 494 lines, complexity 79
- StreetRenderer.cs: 431 lines (rendering, gets a pass)
- GodotRenderingCallback.cs: 397 lines (also rendering)
- MarketClearingService.cs: 370 lines, complexity 52
The transport service is the obvious refactor target. It is doing too much.
Handlers stay thin
The architecture rule is that Application-layer event handlers should be thin orchestrators and never carry business logic. Across 51 handlers:
- median lines of code: 36
- median dependencies: 3
- median complexity: 4
The five fattest:
- RegionInitializationHandler: 197 LOC
- QuarterlyRentCollectionHandler: 154 LOC
- AnnualTitheCollectionHandler: 130 LOC
- MarketAnalysisRefreshHandler: 95 LOC
- EmploymentResultRecordingHandler: 92 LOC
Anything over 50 LOC is on the watchlist. The medians are healthy; the long tail is what I have to keep an eye on.
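The watchlist rule is simple enough to sketch. The snippet below is an illustration only, not the project's actual tooling: the five LOC figures are the ones listed above, while `ExampleThinHandler` is a hypothetical entry sitting at the median of 36 LOC, added just to show that the filter excludes it.

```python
WATCHLIST_THRESHOLD = 50  # "anything over 50 LOC" from the text above

handler_loc = {
    "RegionInitializationHandler": 197,
    "QuarterlyRentCollectionHandler": 154,
    "AnnualTitheCollectionHandler": 130,
    "MarketAnalysisRefreshHandler": 95,
    "EmploymentResultRecordingHandler": 92,
    "ExampleThinHandler": 36,  # hypothetical, not a real handler in the project
}

def watchlist(loc_by_handler, threshold=WATCHLIST_THRESHOLD):
    """Return handler names strictly above the threshold, fattest first."""
    over = [(name, loc) for name, loc in loc_by_handler.items() if loc > threshold]
    return [name for name, _ in sorted(over, key=lambda item: item[1], reverse=True)]

print(watchlist(handler_loc))
```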
Claude Code, in sessions
This is the part that still surprises me. The numbers below are just for the GrandStrategy project — there are also adjacent projects (vibe-overflow, the site, Pipeline, etc.) but the main repo dominates.
| Metric | GrandStrategy |
|---|---|
| Conversations | 581 |
| Messages | 210,210 |
| Input tokens | 8.1M |
| Output tokens | 26.6M |
| Cache writes | 388M |
| Cache reads | 9.83B |
| Equivalent API cost | ~$8,286 |
A few things stand out.
Cache reads are 25× cache writes. Each cached prefix gets reused about 25 times on average. Prompt caching is doing extremely heavy lifting here — without it the bill would be much, much higher.
Output is 3× input. The model is writing more than it is reading from me. This is mostly tool output, generated code, and reasoning, not dialogue.
Costs are subscription-flat. The ~$8,286 figure is what those tokens would have cost at API list prices. I am on a Claude subscription, so my actual spend is a flat monthly number. But it does give a sense of the volume.
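Both ratios follow directly from the table. A small Python sketch of the arithmetic, using the rounded figures shown above (the variable names are illustrative):

```python
# Token counts from the table above, as rounded there.
input_tokens = 8.1e6
output_tokens = 26.6e6
cache_writes = 388e6
cache_reads = 9.83e9

reads_per_write = cache_reads / cache_writes
output_per_input = output_tokens / input_tokens

print(f"cache reads per cache write: {reads_per_write:.1f}")   # 25.3
print(f"output tokens per input token: {output_per_input:.1f}")  # 3.3
```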
Models used
| Model | Messages |
|---|---|
| Opus 4.6 | 126,617 |
| Haiku 4.5 | 65,367 |
| Opus 4.5 | 18,588 |
| Opus 4.7 | 5,080 |
| Sonnet 4.6 | 3,934 |
| Sonnet 4.5 | 1,392 |
Opus 4.6 was the workhorse for most of the project. Haiku 4.5 shows up heavily for cheaper subagent and exploration work. I have only recently moved to Opus 4.7 — that number will keep growing.
Activity over time
Weekly messages, oldest first:
| Week starting | Messages |
|---|---|
| 2026-01-05 | 3,306 |
| 2026-01-12 | 6,706 |
| 2026-01-19 | 20,852 |
| 2026-01-26 | 19,436 |
| 2026-02-02 | 7,631 |
| 2026-02-09 | 36,783 |
| 2026-02-16 | 49,846 |
| 2026-02-23 | 10,461 |
| 2026-03-02 | 34,523 |
| 2026-03-09 | 27,486 |
| 2026-03-16 | 53,663 |
| 2026-03-23 | 43,653 |
| 2026-03-30 | 48,866 |
| 2026-04-20 | 12,600 |
| 2026-04-27 | 325 |
Peak week was 53,663 messages, about 7,700 per day. That includes every tool call, subagent message, and intermediate step, not only my prompts. The dip in late April is real: I was doing the outer-market integration and spending more time thinking than typing.
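The peak and the per-day average can be recomputed from the weekly table. A minimal sketch, assuming a plain seven-day average over the peak week:

```python
# Weekly message counts copied from the table above.
weekly_messages = {
    "2026-01-05": 3_306,
    "2026-01-12": 6_706,
    "2026-01-19": 20_852,
    "2026-01-26": 19_436,
    "2026-02-02": 7_631,
    "2026-02-09": 36_783,
    "2026-02-16": 49_846,
    "2026-02-23": 10_461,
    "2026-03-02": 34_523,
    "2026-03-09": 27_486,
    "2026-03-16": 53_663,
    "2026-03-23": 43_653,
    "2026-03-30": 48_866,
    "2026-04-20": 12_600,
    "2026-04-27": 325,
}

peak_week, peak_count = max(weekly_messages.items(), key=lambda item: item[1])
per_day = peak_count / 7

print(peak_week, peak_count, round(per_day))  # 2026-03-16 53663 7666
```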
Git, in commits
The Claude session counters are one half of the picture. Git history is the other half — what actually got written down.
Volume and churn
| Metric | Value |
|---|---|
| Commits | 555 |
| Active commit-days | 80 |
| Lines added | 890,427 |
| Lines deleted | 644,318 |
| Net | +246,109 |
| Median churn per commit | 526 |
| p90 churn per commit | 4,716 |
So roughly 890k lines were added and 644k were deleted, a net of +246k, against a current hand-written tree of 110k lines. The codebase has been written about three times over. The median commit is small (~500 lines of churn), but the p90 is about 9× the median: a barbell of tiny corrections and sweeping refactors with not much in between.
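For reference, the net and the p90-to-median spread are plain arithmetic on the table values. A short Python sketch with names of my own choosing:

```python
# Figures from the "Volume and churn" table above.
lines_added = 890_427
lines_deleted = 644_318
median_churn = 526
p90_churn = 4_716

net = lines_added - lines_deleted
spread = p90_churn / median_churn

print(f"net lines: {net:+}")                # +246109
print(f"p90 / median churn: {spread:.1f}")  # 9.0
```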
The five biggest single commits:
- "style: fix formatting issues (line endings and whitespace)": 214k churn (one-off, mass \r\n reflow)
- "clean up": 146k churn
- "fix: Fix test failures and improve employment system": 56k churn
- "move to ecs": 45k churn (a real architectural pivot)
- "fix: Complete trading system and price adjustment mechanism": 40k churn
What kind of commits
Top first-verb in commit messages, n=555:
| Verb | Count | Share |
|---|---|---|
| fix | 87 | 15.7% |
| refactor | 83 | 15.0% |
| feat | 68 | 12.3% |
| add | 52 | 9.4% |
| cycle | 43 | 7.7% |
| merge | 39 | 7.0% |
| docs | 25 | 4.5% |
| remove | 15 | 2.7% |
cycle is claude-cycles runs — autonomous iterative refactoring. Collapsed into semantic buckets:
- Cleanup/refactor (refactor + cycle + remove + clean + decouple + move + extract + decompose): 29.0%
- Net-new (feat + add + added + implement + create): 24.9%
- Bug-shaped (fix + fixed): 16.4%
- Plumbing (merge + docs + chore + wip + phase + specs + update + style): 16.2%
Cleanup beats net-new. That is the line that explains the 1.14 test ratio and the modest current LOC. The project spends more commits removing and reshaping code than adding features.
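One way the first-verb bucketing could work is sketched below in Python. This is an assumption about the approach, not the contents of the actual stats script: it takes the first alphabetic word of each commit subject (so a prefix such as "fix:" counts as "fix") and looks it up in the bucket lists above. The sample subjects are the five biggest commits listed earlier.

```python
import re
from collections import Counter

# Bucket membership copied from the list above.
BUCKETS = {
    "cleanup": {"refactor", "cycle", "remove", "clean", "decouple", "move", "extract", "decompose"},
    "net-new": {"feat", "add", "added", "implement", "create"},
    "bug": {"fix", "fixed"},
    "plumbing": {"merge", "docs", "chore", "wip", "phase", "specs", "update", "style"},
}

def first_verb(subject):
    """Return the lowercased first alphabetic word of a commit subject, or ''."""
    match = re.match(r"\s*([A-Za-z]+)", subject)
    return match.group(1).lower() if match else ""

def bucket_of(subject):
    """Map a commit subject to a bucket name; unknown verbs fall into 'other'."""
    verb = first_verb(subject)
    for bucket, verbs in BUCKETS.items():
        if verb in verbs:
            return bucket
    return "other"

subjects = [
    "style: fix formatting issues (line endings and whitespace)",
    "clean up",
    "fix: Fix test failures and improve employment system",
    "move to ecs",
    "fix: Complete trading system and price adjustment mechanism",
]

counts = Counter(bucket_of(subject) for subject in subjects)
print(counts)
```

Only the first word decides the bucket, so "style: fix formatting issues" lands in plumbing rather than bug-shaped.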
Commits vs Claude messages
Same weeks as the table above, with commits aligned alongside:
| Week starting | Messages | Commits |
|---|---|---|
| 2026-01-05 | 3,306 | 0 |
| 2026-01-12 | 6,706 | 19 |
| 2026-01-19 | 20,852 | 16 |
| 2026-01-26 | 19,436 | 3 |
| 2026-02-02 | 7,631 | 0 |
| 2026-02-09 | 36,783 | 0 |
| 2026-02-16 | 49,846 | 0 |
| 2026-02-23 | 10,461 | 0 |
| 2026-03-02 | 34,523 | 30 |
| 2026-03-09 | 27,486 | 38 |
| 2026-03-16 | 53,663 | 57 |
| 2026-03-23 | 43,653 | 18 |
| 2026-03-30 | 48,866 | 64 |
| 2026-04-20 | 12,600 | 3 |
| 2026-04-27 | 325 | 0 |
The four weeks of Feb 2 – Feb 23 racked up ~104k Claude messages and zero commits. Heavy-typing weeks were not shipping weeks; they were exploration and refactor-design weeks where the commit only landed at the end. March is the inverse — high messages and high commits, because by then I knew what I was building. The shape of the work is visible in this gap.
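The gap can be checked against the table. The Python sketch below sums the messages in the four zero-commit February weeks and, as an extra data point that I derived myself, computes messages per commit across the five March-start weeks:

```python
# (week starting, messages, commits) copied from the table above.
weeks = [
    ("2026-01-05", 3_306, 0),
    ("2026-01-12", 6_706, 19),
    ("2026-01-19", 20_852, 16),
    ("2026-01-26", 19_436, 3),
    ("2026-02-02", 7_631, 0),
    ("2026-02-09", 36_783, 0),
    ("2026-02-16", 49_846, 0),
    ("2026-02-23", 10_461, 0),
    ("2026-03-02", 34_523, 30),
    ("2026-03-09", 27_486, 38),
    ("2026-03-16", 53_663, 57),
    ("2026-03-23", 43_653, 18),
    ("2026-03-30", 48_866, 64),
    ("2026-04-20", 12_600, 3),
    ("2026-04-27", 325, 0),
]

# ISO dates compare correctly as strings.
feb = [(m, c) for w, m, c in weeks if "2026-02-02" <= w <= "2026-02-23"]
feb_messages = sum(m for m, _ in feb)
feb_commits = sum(c for _, c in feb)

march = [(m, c) for w, m, c in weeks if "2026-03-02" <= w <= "2026-03-30"]
march_messages = sum(m for m, _ in march)
march_commits = sum(c for _, c in march)

print(feb_messages, feb_commits)              # 104721 0
print(round(march_messages / march_commits))  # 1006 messages per commit
```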
Churn hotspots
Most-touched .cs files over the project’s life:
| Touches | File |
|---|---|
| 82 | Application/ServiceCollectionExtensions.cs |
| 38 | Domains.Market/Presentation/MarketFacade.cs |
| 34 | Domains.Merchant/Presentation/MerchantFacade.cs |
| 31 | Tests/Behaviors/ComprehensiveEconomyTests.cs |
| 30 | Domains.Transport/Presentation/TransportFacade.cs |
| 28 | Application/Handlers/Migration/ArrivalEvaluationHandler.cs |
| 26 | Domains.Trading/Presentation/TradingFacade.cs |
DI registration tops the list — ServiceCollectionExtensions.cs churns every time a new domain or handler is wired in, so it doubles as a temporal index of project growth. Below it, four of the next six are domain facades: the public seams between Application and the 26 domain projects. The architecture rule that “each domain exposes a public IXxxFacade” shows up here as a pattern in the version control data.
Worth noting: TransportPickupService.cs, the post’s #1 chunkiest production file at 691 LOC, does not appear in the top-touched list. It grew big without being repeatedly rewritten — so it accreted responsibilities rather than going through real iteration. That is a different and more concerning failure mode than “battle-scarred refactor target.”
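A touch count like the one in the table can be derived from `git log --name-only --pretty=format:` output, which prints the files changed by each commit with blank lines between commits. The Python sketch below is an assumption about the method, not the contents of scripts/git-stats.ps1, and the sample log is made up for illustration (only the file paths come from the table):

```python
from collections import Counter

def count_touches(name_only_log, suffix=".cs"):
    """Count commits touching each file, given `git log --name-only --pretty=format:` text."""
    touches = Counter()
    for line in name_only_log.splitlines():
        path = line.strip()
        # Blank lines separate commits; each path appears once per commit.
        if path.endswith(suffix):
            touches[path] += 1
    return touches

# Hypothetical two-commit log, for illustration only.
sample_log = """
Application/ServiceCollectionExtensions.cs
Domains.Market/Presentation/MarketFacade.cs

Application/ServiceCollectionExtensions.cs
README.md
"""

touches = count_touches(sample_log)
print(touches.most_common(2))
```

On a real repository the input would come from running that git command and passing its standard output to `count_touches`.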
What this does not measure
These numbers do not say anything about quality — whether the code is correct, whether the tests catch real bugs, or whether the architecture will hold. That is a different post. They do show the shape of the project: a lot of small handlers on top of 26 domain projects, with more test code than production code and a Claude Code interaction graph that is dominated by cache reuse.
Snapshot taken 2026-04-29 from scripts/codebase-stats.ps1, scripts/git-stats.ps1, and claude-explorer stats. I will probably re-run this every couple of months and see what moves.