billroper: (Default)
[personal profile] billroper
My calculations now appear to be running correctly when single-threaded, but do silly things when they are multi-threaded. I haven't yet figured out for sure what the problem is (I need to pull more data), but I strongly suspect that the failure is due to continuing problems with my multi-threaded topological sort of a doubly-linked list which is, at least at the moment, kicking my butt.

The problem -- I think -- is that I need to more aggressively sort cells that have been tentatively calculated to higher positions on the list so that they are guaranteed to be above any cell that depends on them. Right now, they may be calculated but below a cell that depends on them, which produces artifacts when looping. I can get rid of the artifacts via brute force, but the brute force approach is expensive for large lists, so I'd rather try a bit more finesse.

(If it were possible to trivially conclude that cell A is either above or below cell B in the list, that would make life simple, but that's not something that can be done without substantial maintenance of ordering information which is even more expensive. I considered a split-the-difference approach with ordered floating point values, but I would end up with all my cells with wonky values between 0 and 1. :) )

In any case, it is time to go to the swimming pool now... :)

Date: 2014-07-14 12:03 am (UTC)
mdlbear: blue fractal bear with text "since 2002" (Default)
From: [personal profile] mdlbear
My inclination would be to keep a set of cells sorted by number of unprocessed cells they depend on. You can use a TreeSet for this, with a custom Comparator.

Whenever you process a cell you pull its dependencies out of the set, decrement their counts, and put them back. Access to the set has to be synchronized, of course.

Alternatively, you could pull down the sources to Ant or Maven, and see how they do it.

Date: 2014-07-14 07:18 am (UTC)
From: [identity profile] kyril.livejournal.com
You use "tentative" and "guarantee" together in a way that does not work...

I don't know which is the expensive part: the sorting of the nodes, or the processing of nodes whose dependencies are all processed. If the node processing is expensive, can you leave the sort itself single-threaded but start processing dependency-free/dependency-satisfied nodes with the other threads as nodes become available?

Depending on how your dependencies are expressed, does it work to just notice the artifacts and redo the "consequence" nodes? Or do the dependencies fan out rather than narrowing down?

Is it efficient to calculate the height/depth of each cell, so you can shorten the longest chains first? I assume not, or your "higher positions" calculation would not be tentative.

Is this topological sort in a loop such that your first pass might be slow/inaccurate but subsequent passes will be better?

A queue with just one producer and just one consumer needs no synchronization (though a semaphore might be polite; if you can't block the consumer, maybe you can yield, or just plain do something else for a while, rather than busy wait).

Hopefully you get the profiler going and can figure out where the time is being lost--and whether it's degenerating into linearity somehow. Otherwise it's down to the most generic of optimization strategies: what is taking the most time/effort to compute, and how can you either leave/send ourselves a note (if we ever knew it) or avoid needing to know that information?

(The thing my algorithms class in college never really got into was describing how you store the edges of a directed graph in a usable way...)

Profile

billroper: (Default)
billroper

January 2026

S M T W T F S
     1 2 3
4 5 6 7 8910
11121314151617
18192021222324
25262728293031

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags
Page generated Jan. 11th, 2026 07:20 am
Powered by Dreamwidth Studios