
Building Stability: How Shipstry Navigated a Live Production Upgrade in 24 Minutes

How Shipstry handled a live production upgrade with comment and payment migrations, historical order backfills, and a short write freeze — all while keeping the site publicly accessible.

Puinoib
April 14, 2026 · 10 min read

I don't love touching production unless I have to.

Feature work is fun. Production upgrades are not. Feature work gives you screenshots. Production upgrades give you checklists, backups, validation queries, and a very clear picture of whether you actually trust your own system.

Today Shipstry went through one of those upgrades.

The maintenance window took about 24 minutes.

During that stretch I froze writes, backed up production D1, migrated comments, moved payments onto a cleaner internal model, backfilled historical orders, checked that old entitlements and old paid amounts still lined up, redeployed twice, and only reopened writes after every gate passed.

If you're a user, the visible part is pretty small. The site stayed up. Public pages kept loading. Things mostly looked normal.

That's exactly how I wanted it.

Why I bothered

The dev branch had already pushed Shipstry forward in a lot of visible ways:

  • a real blog with comments and likes
  • launch badges
  • better discovery through category pages
  • flexible featured-day pricing for Flagship
  • upgrades for already launched products
  • fairer upgrade pricing that respects what a maker already paid

That part is easy to talk about. It's the fun part.

The less fun part is what happens when the product underneath starts outgrowing yesterday's data model.

Two areas were starting to feel like that.

Comments were one. Product comments and blog comments had drifted enough that pretending they were the same thing was no longer buying simplicity. It was just storing up future mess.

Payments were the other. Once Shipstry started supporting more than one kind of paid action, the old order model stopped being a clean fit. I needed a clearer separation between:

  • the payment itself
  • what the user bought
  • what that purchase unlocks or changes

That sounds abstract until you hit real product questions:

  • Does an old backlinks purchase still grant access?
  • If a maker already paid before, does that amount still count toward the next upgrade?
  • Can I evolve the payment system without dragging a pile of legacy assumptions through every new feature?

At that point, "we'll clean it up later" stops being pragmatic. It's just procrastination with a nicer name.

The rule I cared about most

I did not want a full outage.

I also did not want live writes continuing while I changed the shape of production data underneath them.

So the rule was simple:

reads stay up, writes freeze first.

That only works if the app has a real maintenance switch. Not a plan to "be careful." Not a handful of TODOs. A real switch.

Before the release, I added a centralized MAINTENANCE_WRITE_FREEZE guard and wired it through the places that can mutate D1:

  • checkout creation
  • Stripe webhook mutations
  • comments and likes
  • votes
  • profile updates
  • admin write actions
  • draft save / submit / delete flows
  • notification endpoints that mutate state

I wasn't chasing elegance. I wanted control.

If I ever have to do this again, I want one switch that actually means one switch.
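Here's a minimal sketch of what "one switch that means one switch" looks like in practice. The names (`MAINTENANCE_WRITE_FREEZE`, `requireWritable`, `createComment`) mirror the idea, not Shipstry's actual code, and the D1 insert is stubbed out:

```typescript
// A centralized write-freeze guard, checked by every mutating handler
// before it touches the database.

type Env = { MAINTENANCE_WRITE_FREEZE?: string };

class WriteFrozenError extends Error {
  constructor() {
    super("Writes are temporarily frozen for maintenance");
  }
}

// The one switch: a single env var, read in a single place.
function requireWritable(env: Env): void {
  if (env.MAINTENANCE_WRITE_FREEZE === "1") {
    throw new WriteFrozenError();
  }
}

// Example mutating handler wrapped by the guard. Reads never call this,
// so public pages keep loading while the freeze is on.
function createComment(env: Env, body: string): { ok: boolean; status: number } {
  try {
    requireWritable(env);
    // ...insert the comment into D1 here (stubbed in this sketch)...
    return { ok: true, status: 201 };
  } catch (e) {
    if (e instanceof WriteFrozenError) return { ok: false, status: 503 };
    throw e;
  }
}
```

The point is that flipping the freeze is one env change, not a hunt through every endpoint that can write.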

The boring part that saves you later

Before I touched a migration, I exported production D1.

It was boring. Good. It should feel boring.

Backups are only exciting when you forgot to take one.

I also recorded baselines before the destructive parts:

  • legacy comment count
  • legacy comment-like count
  • top-level vs reply comments
  • soft-deleted comments
  • legacy order count
  • paid backlinks count

Those numbers became gates later. Not "seems fine." Not "it loaded on my laptop." Actual before-and-after checks.

I don't think there's a glamorous version of this. It's just the work.
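The gating itself is almost embarrassingly simple once the baselines exist. A sketch, assuming the baseline and post-migration counts have already been pulled out of D1 into plain records (the metric names here echo the checklist above but are otherwise invented):

```typescript
// Compare post-migration counts against the recorded baseline and report
// every metric that drifted. An empty result means the gate passes.

type Counts = Record<string, number>;

function gateDiff(before: Counts, after: Counts): string[] {
  const failures: string[] = [];
  for (const [metric, expected] of Object.entries(before)) {
    const actual = after[metric];
    if (actual !== expected) {
      failures.push(`${metric}: expected ${expected}, got ${actual}`);
    }
  }
  return failures;
}
```

Writes stay frozen until `gateDiff` comes back empty. "Seems fine" never gets a vote.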

Production had a surprise waiting

One thing I like about production is that it has no interest in your assumptions.

Before the main cutover, I checked for old pricing-tier values that should have been gone by now.

Production still had legacy expedition data in both:

  • product.pricing_tier
  • legacy order.tier

That was useful. Annoying, but useful.

It meant I couldn't trust migration history in the abstract. I had to trust the database I was actually about to operate on.

So I reran the tier-normalization migration before the comment and payment cutover. That cleaned up the last bit of visible tier drift before final deployment.

It's a small example, but it captures something important: production work gets easier the moment you stop arguing with reality.
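The normalization itself is just a mapping from the legacy tier names to the current ones. The legacy values ("expedition", "admiral") come from this post; the target names in this sketch are placeholders, not Shipstry's actual tiers:

```typescript
// Hypothetical tier normalization: fold legacy tier values still sitting
// in production onto the current tier names. Target names are assumed.

const LEGACY_TIER_MAP: Record<string, string> = {
  expedition: "standard", // assumed mapping, for illustration
  admiral: "flagship",    // assumed mapping, for illustration
};

// Unknown values pass through untouched, so the migration is idempotent
// and safe to rerun against a database that is already clean.
function normalizeTier(tier: string): string {
  return LEGACY_TIER_MAP[tier] ?? tier;
}
```

The pass-through behavior is what made rerunning the migration safe: running it against already-normalized rows is a no-op.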

The comment migration

The comment migration was straightforward on paper and high-risk in practice.

Old comment data had to be split into the new tables:

  • product_comment
  • product_comment_like
  • blog_comment
  • blog_comment_like

The schema work itself wasn't the scary part. The scary part was making sure nothing quietly went missing or got detached during the move.

So I checked what actually matters:

  • comment counts matched
  • comment-like counts matched
  • orphan replies were zero
  • orphan likes were zero

In this case the production dataset was small. I still treated it like a serious migration. Row count doesn't change the standard.
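The orphan checks above are the ones worth automating. In production these were SQL against D1; here's the same logic sketched over plain rows, with row shapes that are my assumption, not the real schema:

```typescript
// Post-migration integrity checks for the comment split: nothing should
// point at a row that didn't survive the move.

type CommentRow = { id: number; parentId: number | null };
type LikeRow = { commentId: number };

// Replies whose parent comment no longer exists.
function orphanReplies(comments: CommentRow[]): CommentRow[] {
  const ids = new Set(comments.map((c) => c.id));
  return comments.filter((c) => c.parentId !== null && !ids.has(c.parentId));
}

// Likes pointing at comments that no longer exist.
function orphanLikes(comments: CommentRow[], likes: LikeRow[]): LikeRow[] {
  const ids = new Set(comments.map((c) => c.id));
  return likes.filter((l) => !ids.has(l.commentId));
}
```

Both functions returning empty arrays is the "orphans were zero" gate from the list above.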

The payment migration was the real reason for the upgrade

The bigger job was payments.

Shipstry now reads runtime payment state through a cleaner domain:

  • payment_order
  • purchase
  • product_submission_purchase
  • product_upgrade_purchase

I wanted this because the old model was doing too much with too little structure.

Once Shipstry started to support:

  • backlinks access
  • paid submissions
  • upgrades
  • historical pricing continuity

...I needed the database to answer product questions without guesswork.

The biggest one was upgrade pricing.

If a maker already paid for an earlier tier, Shipstry should not pretend that money never existed. Their next upgrade should take that prior payment into account. From the user's perspective that's obvious. From the schema's perspective it's only obvious if the data model supports it.
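As a sketch of what "takes that prior payment into account" can mean: credit the historical paid total against the target price. The flat-credit rule below is my simplification for illustration, not necessarily Shipstry's exact formula:

```typescript
// Payment-aware upgrade pricing: what the maker already paid is credited
// against the target tier's price. Amounts are in cents.

function upgradePriceCents(targetPriceCents: number, alreadyPaidCents: number): number {
  // Never charge more than the target price, and never go negative if the
  // maker already paid more than the target costs.
  return Math.max(0, targetPriceCents - alreadyPaidCents);
}
```

So a maker who paid $29 toward an earlier tier owes $70 on a $99 upgrade, not the full $99. None of that works unless the schema can answer "how much has this product already paid?" without guesswork.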

The backfill was where I had to be careful

Schema migration alone wasn't enough.

The new read paths needed old production data to make sense inside the new payment model. That meant historical orders had to be backfilled into the new tables in a way the current code could actually use.

There were two things I absolutely did not want to break:

  1. historical backlinks access
  2. historical paid-amount continuity for product upgrades

I tested the backfill locally against production snapshots before touching live production. That helped answer the question I cared about most:

Could the old data safely reconstruct exact historical upgrade transitions?

Not reliably.

Legacy order rows could safely tell me:

  • this user bought backlinks access
  • this product had a paid submission
  • this product had this historical paid total

What they could not always tell me, with enough confidence, was the exact boundary between an original submission and a later upgrade in every case.

So I kept the migration conservative.

I backfilled what I could defend. I refused to fake the rest just to make the schema look neat.

Cleanly modeled fiction is still fiction.
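The shape of a conservative backfill is worth spelling out: migrate what the legacy row can prove, and set aside what it can't, rather than guessing. The row shape and `kind` values below are assumptions for the sketch:

```typescript
// Conservative backfill pass: rows whose meaning is certain get migrated;
// ambiguous rows are skipped and left for manual review, never faked.

type LegacyOrder = {
  id: number;
  kind: "backlinks" | "submission" | "unknown";
  paidCents: number;
};

type BackfillResult = {
  migrated: LegacyOrder[];
  skipped: LegacyOrder[]; // preserved as-is, not rewritten into fiction
};

function conservativeBackfill(orders: LegacyOrder[]): BackfillResult {
  const migrated: LegacyOrder[] = [];
  const skipped: LegacyOrder[] = [];
  for (const o of orders) {
    if (o.kind === "unknown") skipped.push(o);
    else migrated.push(o);
  }
  return { migrated, skipped };
}
```

The skipped pile is a feature, not a failure: it's an honest record of what the old data could not prove.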

The one thing that broke

There was exactly one real hiccup during the window.

The first run of 0040_backfill_order_to_payment_domain.sql failed.

The logic was fine. The data was fine. The problem was D1 remote execution refusing the explicit SQL transaction wrapper in the file.

This is the kind of failure you can live with if you planned the window properly:

  • write freeze still on
  • backup already taken
  • no pressure to reopen writes early

So I removed the explicit BEGIN TRANSACTION / COMMIT, reran the migration, and it finished cleanly.

No rollback. No restore. Just a contained operational fix.

Honestly, that felt like a win. Production rarely rewards you with a perfect script and zero surprises. What you want is a setup where surprises stay small.

I treated the database gate as the real release gate

I did not consider the release "done" because migrations stopped erroring.

I considered it done only after the database proved that the migration had actually preserved what mattered.

The gate looked roughly like this:

  • comment counts still matched
  • no orphaned comment relationships
  • no leftover expedition or admiral rows
  • payment_order count matched legacy order count
  • migrated purchase count matched expectations
  • active backlinks purchases matched the old paid backlinks count
  • a known historical backlinks buyer still resolved correctly
  • a known paid product still reported the exact same historical paid total under the new model

I also kept write freeze enabled while checking public routes:

  • homepage
  • product detail
  • explore
  • blog
  • RSS
  • sitemap

The site stayed readable the whole time, which was the goal from the start.

Only after the database gates passed did I flip write freeze back off and deploy the final version.

What this actually changes for users

From the outside, this could easily be described as "infra work."

That's true, but incomplete.

This release makes a bunch of user-facing work more trustworthy:

  • a blog with discussion
  • launch badges people can actually share
  • category pages and cleaner discovery
  • more flexible featured-day pricing
  • upgrades for launched products
  • fairer upgrade pricing based on previous payments

The invisible part is what makes the visible part credible.

Users don't care that I created payment_order and purchase. They care that:

  • paid access still works
  • previous payments still count
  • comments still load
  • upgrades make sense
  • the platform feels stable

That's the standard I was aiming for.

What I took away from it

The biggest lesson is the obvious one, but I keep relearning it anyway: if a release needs a maintenance switch, build the maintenance switch before release day.

The second lesson is that production always gets the last word. If the live database says an old tier is still there, it is still there, no matter how tidy your migration history looks in Git.

The third lesson is that conservative backfills are underrated. When historical data cannot safely reconstruct perfect semantics, preserve runtime correctness first. Do not invent neat stories for the database.

And maybe the biggest one: the release is not the migration script. The release is the gates. Backups, baselines, validation queries, smoke tests, and not reopening writes early just because you're tired.

Why this one feels important

Shipstry has had plenty of visible progress recently.

That's the easy part to point at. Blog. Comments. Badges. Discovery. Featured placement. Upgrades. Trade Ports.

But every product hits a point where moving faster means cleaning up the foundation under it, not layering another feature on top and hoping future-you figures it out.

This was one of those points.

It took about 24 minutes in production. It took much longer to make those 24 minutes safe.

And I think it was worth it, because Shipstry now feels closer to the thing I want to build: not just a place where products get listed, but a launch platform that can keep growing without dragging old assumptions behind it forever.
