billroper | The Digging Continues

So you remember the locking fixes that I've been working on for our server since August or thereabouts? The ones that were "finished"?

Yeah, not quite. My changes got through five days of successful Uptime testing (in that the server did not crash), but then the tester clicked on the tab in our admin app that shows the transaction results, and the server GPF'd. Sadly, we don't know where, because no one was logged in on the server machine, so just-in-time debugging didn't kick in. The logical assumption, though, is that something was wrong in the code that returns the transaction results.

I cleaned that up, then spent some time looking at other code in the class that manages the transactions. After finding at least one locking error, I decided to rewrite the class and bulletproof it as much as possible. That's almost done.

In the meantime, we weren't entirely happy with the memory profile that the version with the locking fixes was showing, so our QE team went back and ran the Uptime test against the baseline version.

And the server crashed within a day. More than once, as they restarted it and tried again.

I feel somewhat vindicated. :)


From: phillip2637.livejournal.com

Locking in a multithreaded system is tough... and also not well understood.

I once worked in a place with a reasonably large, cross-platform codebase where, before my arrival, the CTO* had gone directly to a junior programmer and told him to no-op some of the locking macros for one specific system type because he thought they were slowing it down. Nobody else knew it had been done (which was a different and tougher problem). The port of the product to that platform was alpha-ish, and the very few related hangs/crashes weren't an immediate priority. At some point they ran it under load on an 8-CPU system, and you can imagine what all broke loose. :)

(*The CTO was promoted -- no connection to this incident -- and relocated to a place about 2500 miles away, which was better for the tech environment but not so good for the company as a whole.)

billroper

After all of this, I'm getting a much better idea about locking. :)

On the subject of your former company, well, they shoot CTOs, don't they?
