billroper | An Old-Fashioned Butt Kicking

My calculations now appear to be running correctly when single-threaded, but do silly things when they are multi-threaded. I haven't yet figured out for sure what the problem is (I need to pull more data), but I strongly suspect that the failure is due to continuing problems with my multi-threaded topological sort of a doubly-linked list which is, at least at the moment, kicking my butt.

The problem -- I think -- is that I need to more aggressively sort cells that have been tentatively calculated to higher positions on the list so that they are guaranteed to be above any cell that depends on them. Right now, they may be calculated but below a cell that depends on them, which produces artifacts when looping. I can get rid of the artifacts via brute force, but the brute force approach is expensive for large lists, so I'd rather try a bit more finesse.

(If it were possible to trivially conclude that cell A is either above or below cell B in the list, that would make life simple, but that's not something that can be done without substantial maintenance of ordering information which is even more expensive. I considered a split-the-difference approach with ordered floating point values, but I would end up with all my cells with wonky values between 0 and 1. :) )

In any case, it is time to go to the swimming pool now... :)

S	M	T	W	T	F	S
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30

Most Popular Tags

apba - 84 uses
baseball - 898 uses
birthday - 53 uses
capricon - 144 uses
cardinals - 274 uses
cars - 95 uses
cats - 80 uses
christmas - 123 uses
coding - 67 uses
comics - 86 uses
computers - 514 uses
cons - 1810 uses
cool - 148 uses
cubs - 509 uses
dodeka - 284 uses
dogs - 131 uses
filk - 2753 uses
food - 127 uses
gretchen - 179 uses
health - 371 uses
home - 7719 uses
house - 61 uses
humor - 225 uses
isfic press - 134 uses
java - 126 uses
kids - 2385 uses
knee - 158 uses
lyrics - 282 uses
meme - 163 uses
microsoft - 127 uses
movies - 198 uses
music - 84 uses
musings - 12622 uses
photos - 123 uses
politics - 297 uses
review - 158 uses
school - 249 uses
sf - 64 uses
softball - 185 uses
space - 82 uses
spacetime - 168 uses
stuff - 340 uses
taxes - 218 uses
tech - 1499 uses
tv - 211 uses
vacation - 106 uses
weather - 295 uses
windycon - 348 uses
work - 2388 uses
worldcon - 68 uses

Flat | Top-Level Comments Only

$mdlbear: blue fractal bear with text "since 2002" (Default)$

From:

mdlbear

My inclination would be to keep a set of cells sorted by number of unprocessed cells they depend on. You can use a TreeSet for this, with a custom Comparator.

Whenever you process a cell you pull its dependencies out of the set, decrement their counts, and put them back. Access to the set has to be synchronized, of course.

Alternatively, you could pull down the sources to Ant or Maven, and see how they do it.

kyril.livejournal.com

You use "tentative" and "guarantee" together in a way that does not work...

I don't know which is the expensive part: the sorting of the nodes, or the processing of nodes whose dependencies are all processed. If the node processing is expensive, can you leave the sort itself single-threaded but start processing dependency-free/dependency-satisfied nodes with the other threads as nodes become available?

Depending on how your dependencies are expressed, does it work to just notice the artifacts and redo the "consequence" nodes? Or do the dependencies fan out rather than narrowing down?

Is it efficient to calculate the height/depth of each cell, so you can shorten the longest chains first? I assume not, or your "higher positions" calculation would not be tentative.

Is this topological sort in a loop such that your first pass might be slow/inaccurate but subsequent passes will be better?

A queue with just one producer and just one consumer needs no synchronization (though a semaphore might be polite; if you can't block the consumer, maybe you can yield, or just plain do something else for a while, rather than busy wait).

Hopefully you get the profiler going and can figure out where the time is being lost--and whether it's degenerating into linearity somehow. Otherwise it's down to the most generic of optimization strategies: what is taking the most time/effort to compute, and how can you either leave/send ourselves a note (if we ever knew it) or avoid needing to know that information?

(The thing my algorithms class in college never really got into was describing how you store the edges of a directed graph in a usable way...)

Bill Roper's Journal

An Old-Fashioned Butt Kicking

An Old-Fashioned Butt Kicking

no subject

no subject

Profile

April 2026

Most Popular Tags

Page Summary

Active Entries

Style Credit

Expand Cut Tags