State of the codebase, April 2026

After seven months of vibe coding it’s time to look at what came out of the box. Below are two angles on the same project: the C# codebase itself, and the Claude Code sessions that produced it.

Code, in lines

Hand-written, EF migrations excluded:

Bucket	Files	Lines of code
Production	1,065	51,441
Tests	263	58,690
Total	1,328	110,131

Test-to-production ratio is 1.14 — slightly more test code than production code. That is the goal: as many tests and as little production code as possible. I will write a separate post about why. Median hand-written file is 22 lines.
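The ratio is easy to re-derive from the table. A minimal Python sketch of that arithmetic follows; it is not the PowerShell script that produced the numbers, and the variable names are chosen here for illustration.

```python
# Figures copied from the table above (hand-written code, EF migrations excluded).
production_loc = 51_441
test_loc = 58_690

# Lines of test code per line of production code.
ratio = test_loc / production_loc
total_loc = production_loc + test_loc

print(f"test-to-production ratio: {ratio:.2f}")
print(f"total hand-written lines: {total_loc:,}")
```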

EF Core’s generated migration files weigh in at another 389k lines across 403 files, which I exclude from every metric here because they are not really code I wrote.

Production, by group

Group	Files	Code
Core	160	7,219
Domains (26 projects)	754	33,843
Application	79	3,863
Other (Cottonopolis/Diagnostics/ProfileManager)	72	6,516

Domains is where the actual game lives — 26 isolated projects, each with its own logic/profiles/repositories, never referencing each other directly. The biggest by line count are History (3,863), Commerce (2,612), Rendering (2,571), Mining (2,121), and Game (2,037). The smallest is Warehouse at 92 lines.

The five chunkiest production files:

TransportPickupService.cs — 691 lines, cyclomatic complexity 131
ProducerPricingComputer.cs — 494 lines, complexity 79
StreetRenderer.cs — 431 lines (rendering, gets a pass)
GodotRenderingCallback.cs — 397 lines (also rendering)
MarketClearingService.cs — 370 lines, complexity 52

The transport service is the obvious refactor target. It is doing too much.

Handlers stay thin

The architecture rule is that Application-layer event handlers should be thin orchestrators and never carry business logic. Across 51 handlers:

median lines of code: 36
median dependencies: 3
median complexity: 4

The five fattest:

RegionInitializationHandler — 197 LOC
QuarterlyRentCollectionHandler — 154 LOC
AnnualTitheCollectionHandler — 130 LOC
MarketAnalysisRefreshHandler — 95 LOC
EmploymentResultRecordingHandler — 92 LOC

Anything over 50 LOC is on the watchlist. The medians are healthy; the long tail is what I have to keep an eye on.
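The watchlist rule amounts to a one-line filter. The sketch below applies it to the five handlers listed above; the function name and data layout are hypothetical and not taken from scripts/codebase-stats.ps1.

```python
WATCHLIST_THRESHOLD_LOC = 50  # from the post: anything over 50 LOC is watched

# The five fattest handlers listed above, with their line counts.
handlers = {
    "RegionInitializationHandler": 197,
    "QuarterlyRentCollectionHandler": 154,
    "AnnualTitheCollectionHandler": 130,
    "MarketAnalysisRefreshHandler": 95,
    "EmploymentResultRecordingHandler": 92,
}


def watchlist(loc_by_handler, threshold=WATCHLIST_THRESHOLD_LOC):
    """Return (name, loc) pairs strictly over the threshold, largest first."""
    over = [(name, loc) for name, loc in loc_by_handler.items() if loc > threshold]
    return sorted(over, key=lambda pair: pair[1], reverse=True)


for name, loc in watchlist(handlers):
    print(f"{name}: {loc} LOC")
```

A handler at exactly 50 lines is not flagged here, which is one reading of "over 50 LOC".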

Claude Code, in sessions

This is the part that still surprises me. The numbers below are just for the GrandStrategy project — there are also adjacent projects (vibe-overflow, the site, Pipeline, etc.) but the main repo dominates.

Metric	GrandStrategy
Conversations	581
Messages	210,210
Input tokens	8.1M
Output tokens	26.6M
Cache writes	388M
Cache reads	9.83B
Equivalent API cost	~$8,286

A few things stand out.

Cache reads are 25× cache writes. Each cached prefix gets reused about 25 times on average. Prompt caching is doing extremely heavy lifting here — without it the bill would be much, much higher.

Output is 3× input. The model is writing more than it is reading from me. This is mostly tool output, generated code, and reasoning, not dialogue.
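Both ratios come straight from the table. A short Python sketch of the arithmetic, using the rounded figures as published (so the results are approximate):

```python
# Rounded token counts from the table above.
input_tokens = 8.1e6
output_tokens = 26.6e6
cache_writes = 388e6
cache_reads = 9.83e9

# How many tokens were read from cache per token written to it.
cache_reuse = cache_reads / cache_writes

# How many output tokens were produced per (uncached) input token.
output_to_input = output_tokens / input_tokens

print(f"cache reads per cache write: {cache_reuse:.1f}x")
print(f"output per input token: {output_to_input:.1f}x")
```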

Costs are subscription-flat. The ~$8,286 figure is what those tokens would have cost at API list prices. I am on a Claude subscription, so my actual spend is a flat monthly number. But it does give a sense of the volume.

Models used

Model	Messages
Opus 4.6	126,617
Haiku 4.5	65,367
Opus 4.5	18,588
Opus 4.7	5,080
Sonnet 4.6	3,934
Sonnet 4.5	1,392

Opus 4.6 was the workhorse for most of the project. Haiku 4.5 shows up heavily for cheaper subagent and exploration work. I have only recently moved to Opus 4.7 — that number will keep growing.

Activity over time

Weekly messages, oldest first:

Week starting	Messages
2026-01-05	3,306
2026-01-12	6,706
2026-01-19	20,852
2026-01-26	19,436
2026-02-02	7,631
2026-02-09	36,783
2026-02-16	49,846
2026-02-23	10,461
2026-03-02	34,523
2026-03-09	27,486
2026-03-16	53,663
2026-03-23	43,653
2026-03-30	48,866
2026-04-20	12,600
2026-04-27	325

Peak week was 53,663 messages, about 7,700 per day. That includes every tool call, subagent message, and intermediate step, not only my prompts. The dip in late April is real: I was doing the outer-market integration and spending more time thinking than typing.

Git, in commits

The Claude session counters are one half of the picture. Git history is the other half — what actually got written down.

Volume and churn

Metric	Value
Commits	555
Active commit-days	80
Lines added	890,427
Lines deleted	644,318
Net	+246,109
Median churn per commit	526
p90 churn per commit	4,716

So roughly 890k lines were added and 644k deleted over the life of the repo, for a net of +246k, against a hand-written tree that currently stands at 110k lines. The codebase has been written about three times over. Median commit is small (~500 lines of churn), but the p90 is about 9× the median: a barbell of tiny corrections and sweeping refactors with not much in between.
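The net and the p90-to-median spread can be checked directly against the table. A minimal Python sketch, with variable names chosen here for illustration:

```python
# Figures copied from the "Volume and churn" table.
lines_added = 890_427
lines_deleted = 644_318
median_churn = 526
p90_churn = 4_716

# Net growth recorded by git over the life of the repo.
net = lines_added - lines_deleted

# How much larger a 90th-percentile commit is than a typical one.
spread = p90_churn / median_churn

print(f"net: {net:+,}")
print(f"p90 / median churn: {spread:.1f}x")
```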

The five biggest single commits:

style: fix formatting issues (line endings and whitespace) — 214k churn (one-off, mass \r\n reflow)
clean up — 146k churn
fix: Fix test failures and improve employment system — 56k churn
move to ecs — 45k churn (a real architectural pivot)
fix: Complete trading system and price adjustment mechanism — 40k churn

What kind of commits

Top first verbs in commit messages, n=555:

Verb	Count	Share
`fix`	87	15.7%
`refactor`	83	15.0%
`feat`	68	12.3%
`add`	52	9.4%
`cycle`	43	7.7%
`merge`	39	7.0%
`docs`	25	4.5%
`remove`	15	2.7%

`cycle` marks commits from claude-cycles runs (autonomous iterative refactoring). Collapsed into semantic buckets:

Cleanup/refactor (refactor+cycle+remove+clean+decouple+move+extract+decompose): 29.0%
Net-new (feat+add+added+implement+create): 24.9%
Bug-shaped (fix+fixed): 16.4%
Plumbing (merge+docs+chore+wip+phase+specs+update+style): 16.2%
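The bucketing above can be sketched as follows. The verb sets are the ones listed; how the first verb is extracted (here, the leading run of letters, lowercased) is an assumption, since scripts/git-stats.ps1 is not shown. The sample messages are the five biggest commits quoted earlier.

```python
import re
from collections import Counter

# Verb sets exactly as listed in the buckets above.
BUCKETS = {
    "cleanup/refactor": {"refactor", "cycle", "remove", "clean",
                         "decouple", "move", "extract", "decompose"},
    "net-new": {"feat", "add", "added", "implement", "create"},
    "bug-shaped": {"fix", "fixed"},
    "plumbing": {"merge", "docs", "chore", "wip", "phase",
                 "specs", "update", "style"},
}


def first_verb(message):
    """Assumed rule: the leading run of letters in the subject, lowercased."""
    match = re.match(r"\s*([A-Za-z]+)", message)
    return match.group(1).lower() if match else ""


def bucket_of(message):
    """Map a commit subject to its bucket, or 'other' if no verb set matches."""
    verb = first_verb(message)
    for bucket, verbs in BUCKETS.items():
        if verb in verbs:
            return bucket
    return "other"


def bucket_shares(messages):
    """Percentage of commits per bucket, rounded to one decimal place."""
    if not messages:
        return {}
    counts = Counter(bucket_of(m) for m in messages)
    return {b: round(100 * n / len(messages), 1) for b, n in counts.items()}


sample = [
    "style: fix formatting issues (line endings and whitespace)",
    "clean up",
    "fix: Fix test failures and improve employment system",
    "move to ecs",
    "fix: Complete trading system and price adjustment mechanism",
]
print(bucket_shares(sample))
```

Verbs outside the four sets land in "other", which is why the four published shares do not have to sum to 100%.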

Cleanup beats net-new. That is the line that explains the 1.14 test ratio and the modest current LOC. The project spends more commits removing and reshaping code than adding features.

Commits vs Claude messages

Same weeks as the table above, with commits aligned alongside:

Week starting	Messages	Commits
2026-01-05	3,306	0
2026-01-12	6,706	19
2026-01-19	20,852	16
2026-01-26	19,436	3
2026-02-02	7,631	0
2026-02-09	36,783	0
2026-02-16	49,846	0
2026-02-23	10,461	0
2026-03-02	34,523	30
2026-03-09	27,486	38
2026-03-16	53,663	57
2026-03-23	43,653	18
2026-03-30	48,866	64
2026-04-20	12,600	3
2026-04-27	325	0

The four weeks starting Feb 2 through Feb 23 racked up ~105k Claude messages and zero commits. Heavy-typing weeks were not shipping weeks; they were exploration and refactor-design weeks where the commit only landed at the end. March is the contrast: high messages and high commits, because by then I knew what I was building. The shape of the work is visible in this gap.
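The February and March totals can be recomputed from the table. The sketch below groups weeks by the month of their start date; that grouping, and the helper name, are choices made here for illustration.

```python
# (week starting, messages, commits), copied from the table above.
weeks = [
    ("2026-01-05", 3_306, 0),
    ("2026-01-12", 6_706, 19),
    ("2026-01-19", 20_852, 16),
    ("2026-01-26", 19_436, 3),
    ("2026-02-02", 7_631, 0),
    ("2026-02-09", 36_783, 0),
    ("2026-02-16", 49_846, 0),
    ("2026-02-23", 10_461, 0),
    ("2026-03-02", 34_523, 30),
    ("2026-03-09", 27_486, 38),
    ("2026-03-16", 53_663, 57),
    ("2026-03-23", 43_653, 18),
    ("2026-03-30", 48_866, 64),
    ("2026-04-20", 12_600, 3),
    ("2026-04-27", 325, 0),
]


def totals(month_prefix):
    """Sum messages and commits for weeks whose start date has this prefix."""
    rows = [w for w in weeks if w[0].startswith(month_prefix)]
    return sum(m for _, m, _ in rows), sum(c for _, _, c in rows)


feb_messages, feb_commits = totals("2026-02")
mar_messages, mar_commits = totals("2026-03")

print(f"February: {feb_messages:,} messages, {feb_commits} commits")
print(f"March:    {mar_messages:,} messages, {mar_commits} commits")
```

Note that the week starting 2026-03-30 is counted as March under this grouping, even though most of its days fall in April.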

Churn hotspots

Most-touched .cs files over the project’s life:

Touches	File
82	`Application/ServiceCollectionExtensions.cs`
38	`Domains.Market/Presentation/MarketFacade.cs`
34	`Domains.Merchant/Presentation/MerchantFacade.cs`
31	`Tests/Behaviors/ComprehensiveEconomyTests.cs`
30	`Domains.Transport/Presentation/TransportFacade.cs`
28	`Application/Handlers/Migration/ArrivalEvaluationHandler.cs`
26	`Domains.Trading/Presentation/TradingFacade.cs`

DI registration tops the list — ServiceCollectionExtensions.cs churns every time a new domain or handler is wired in, so it doubles as a temporal index of project growth. Below it, four of the next six are domain facades: the public seams between Application and the 26 domain projects. The architecture rule that “each domain exposes a public IXxxFacade” shows up here as a pattern in the version control data.
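One way to produce a most-touched list like this is to count how often each path appears in `git log --name-only` output. The sketch below assumes "touches" means the number of commits that changed a file; the actual scripts/git-stats.ps1 is not shown and may count differently (for example around renames or merges). Function names are hypothetical.

```python
import subprocess
from collections import Counter


def count_touches(name_only_log, suffix=".cs"):
    """Count path occurrences in `git log --name-only` output.

    Each commit lists every changed file once, so a path's count is the
    number of commits that touched it. Blank lines between commits are
    dropped by the suffix filter.
    """
    paths = (line.strip() for line in name_only_log.splitlines())
    return Counter(p for p in paths if p.endswith(suffix))


def top_touched(repo_dir, n=7):
    """Run git in repo_dir and return the n most-touched .cs files."""
    log = subprocess.run(
        # An empty pretty format suppresses commit headers, leaving only paths.
        ["git", "log", "--name-only", "--pretty=format:"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return count_touches(log).most_common(n)


# Usage on a hand-made log fragment (three commits, made-up for illustration):
example_log = """Application/ServiceCollectionExtensions.cs
Domains.Market/Presentation/MarketFacade.cs

Application/ServiceCollectionExtensions.cs
README.md

Application/ServiceCollectionExtensions.cs
"""
print(count_touches(example_log).most_common(2))
```

A renamed file is counted under each of its paths separately in this sketch, so long-lived files that moved would be under-reported.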

Worth noting: TransportPickupService.cs, the post’s #1 chunkiest production file at 691 LOC, does not appear in the top-touched list. It grew big without being repeatedly rewritten — so it accreted responsibilities rather than going through real iteration. That is a different and more concerning failure mode than “battle-scarred refactor target.”

What this does not measure

These numbers do not say anything about quality — whether the code is correct, whether the tests catch real bugs, or whether the architecture will hold. That is a different post. They do show the shape of the project: a lot of small handlers on top of 26 domain projects, with more test code than production code and a Claude Code interaction graph that is dominated by cache reuse.

Snapshot taken 2026-04-29 from scripts/codebase-stats.ps1, scripts/git-stats.ps1, and claude-explorer stats. I will probably re-run this every couple of months and see what moves.
