Thursday, April 23, 2009

graph theory, probability, and project management

many buzzwords and visualization strategies have floated down the project management river over the years: gantt charts, pert, cpm, event chain methodology, etc. one thing they all seem to have in common, though, is that the underlying representation is a (possibly hierarchical, as with gantt or event chain) directed graph. so maybe the particular representation is not so important, and there have been few real advances in the way of thinking about project management over the last hundred years. the one rational explanation i can see for the diversity of planning methods is the application. sometimes, as in a factory/operational/repetitive setting, a process-based approach makes the most sense. one-time projects are fundamentally different, and this is where graph-based methods come in. but many of those, like construction projects, are built from well-defined, already-proven tasks even though the overall project is unique. there may be a little uncertainty in task duration, but there is not much uncertainty about success or failure on the small scale. these are also projects in which resources are easily scaled: pour in more materials, people, and workspace to reduce the project time. that is why so many of these graph methods focus on time-critical paths and slack times. there is an assumption that completion times are dependency-constrained, and that there are people available somewhere to work on any given task as long as you give them the required resources.

r&d projects are unique one-timers, which would indicate a graph method. but now there are significant probabilities that any given task on the small scale could fail, so alternate workaround processes are needed to mitigate failure risk on the large scale. also, the mythical man-month principle comes into play: many times these are small projects with a few highly specialized people (maybe just me!), and trying to bring in other people will only slow things down. so concepts like parallel work paths with slack time are irrelevant. i will have to do both parallel tasks even though they don't depend on each other, and the only question is which to do first.

methods like pert have always had a concept of task duration as a random variable. maybe i could extend this to a joint pdf with a succeed/fail discrete random variable. then i could have multiple nodes with the same input/output edges that represent alternative approaches to generating the same products. with two possibilities as an example, i would need to choose one to try first and only do the other if the first failed. the time rv given failure of both would not depend on the order; it is the (scaled) convolution of the two failure pdfs. the time rv given success would be a weighted mixture of the first approach's success pdf and the convolution of the first's failure pdf with the second's success pdf, so it would depend on the order. in a practical application, i don't think i would bother to compute convolutions. most graphs i'll be dealing with will be small enough that i could just monte carlo the bejesus out of it. with a hierarchical representation, i would only need to consider a part of the total project at a time, just chopping out a chunk for which all possibilities have all the same inputs and outputs. each choice among n possible approaches will have n! orderings, each with a different time pdf given success. but i should be able to throw away many of them automatically whenever another ordering's cdf is greater at every time and therefore indisputably better.
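as a sanity check on the ordering idea, here's a minimal monte carlo sketch in python. everything in it is hypothetical: the triangular duration distributions, the success probabilities, and the two example approaches (quick_hack, careful) are placeholder numbers, and for simplicity a task's duration is drawn from the same distribution whether it succeeds or fails, rather than from the separate success/failure pdfs described above. it simulates the two orderings of a pair of alternatives, collects the time-given-eventual-success samples, and checks whether either empirical cdf is greater at every time (the "indisputably better" test).

```python
import bisect
import random

def sample_attempt(p_success, t_min, t_mode, t_max):
    # one attempt at an approach: did it succeed, and how long did it take?
    t = random.triangular(t_min, t_max, t_mode)
    return random.random() < p_success, t

def sample_ordering(approaches):
    # try alternatives in the given order, stopping at the first success
    total = 0.0
    for p, t_min, t_mode, t_max in approaches:
        ok, t = sample_attempt(p, t_min, t_mode, t_max)
        total += t
        if ok:
            return True, total
    return False, total

def time_given_success(approaches, n=100_000):
    # empirical distribution of total time, conditioned on eventual success
    times = [t for ok, t in (sample_ordering(approaches) for _ in range(n)) if ok]
    return sorted(times)

def dominates(a, b):
    # True if a's empirical cdf >= b's cdf at every time point (a is never worse)
    grid = sorted(set(a) | set(b))
    return all(bisect.bisect_right(a, t) / len(a) >= bisect.bisect_right(b, t) / len(b) - 1e-9
               for t in grid)

# hypothetical alternatives: (p_success, min, mode, max duration in days)
quick_hack = (0.4, 1, 2, 4)
careful    = (0.8, 3, 5, 9)

hack_first    = time_given_success([quick_hack, careful])
careful_first = time_given_success([careful, quick_hack])
print("hack-first dominates careful-first:", dominates(hack_first, careful_first))
print("careful-first dominates hack-first:", dominates(careful_first, hack_first))
```

with these made-up numbers the two conditional cdfs cross (hack-first wins at short times, careful-first at longer ones), so neither ordering gets thrown out automatically and the subjective criterion below would have to break the tie.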
more subjective criteria will be needed to choose between cdfs that cross each other. (maybe which one is greater for more than 50%, a more-likely-better criterion?[1])

to choose which of the parallel tasks i should try first, i should take the attitude that, if the project is going to fail, i want to know as soon as possible. so i should seek short time rvs given failure. i think this is the complementary case to alternatives: i can think of all the alternates as being in the graph in series, and when an attempt succeeds, all the following alternates collapse to zero time. for parallel tasks, the rest of the graph disappears if one necessary task completely fails (with no available alternates). in both cases, i want to hit the collapsing event in the shortest time. for parallel tasks, analogous to complete failure of alternates, the total time given success in all of them (and the probability of that success) is the same regardless of order (assuming a one-person project). so for parallel tasks, order matters only in the case of eventual failure.

the discussion above assumes all individual tasks have independent pdfs. if my joint pdfs are conditional on other events, i think i should take that as a sign that i should split up the tasks rather than actually try to deal with probabilistic dependence.

so this approach would tell me what i should be working on at any given time in my individual work, thereby avoiding inefficient personal biases, procrastination, etc. i'd like to try it, but i want to make sure i can use a file format that allows easy data input with a gui and final gantt-chart-like output. i think i'll check openproj and taskjuggler for this.

[1] this will not always produce a unique ranking. for example, 3 cdfs for which the orderings 1>2>3, 3>1>2, and 2>3>1 each hold with probability 1/3 will give three circular comparisons: 1>2, 2>3, 3>1. but this criterion might at least weed out some that are worse than all others.
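reading the more-likely-better criterion as "pick whichever alternative is more likely to finish first", the circularity in [1] is easy to reproduce. the duration lists below are not the exact construction in the footnote, just the classic non-transitive dice relabeled as hypothetical task times; the pairwise comparisons still go around in a circle.

```python
from itertools import product

# hypothetical discrete duration distributions (days), each value equally likely;
# these are the classic non-transitive dice reused as task times
dur_1 = [2, 2, 4, 4, 9, 9]
dur_2 = [1, 1, 6, 6, 8, 8]
dur_3 = [3, 3, 5, 5, 7, 7]

def prob_faster(x, y):
    # exact P(a draw from x is strictly smaller than a draw from y)
    return sum(xi < yi for xi, yi in product(x, y)) / (len(x) * len(y))

print(prob_faster(dur_2, dur_1))  # P(2 faster than 1) = 5/9
print(prob_faster(dur_3, dur_2))  # P(3 faster than 2) = 5/9
print(prob_faster(dur_1, dur_3))  # P(1 faster than 3) = 5/9
# each comparison favors a different one: 2 beats 1, 3 beats 2, 1 beats 3
```

so, as the footnote says, the criterion can only be a heuristic filter, but anything that loses every pairwise comparison could still safely be dropped.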

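and going back to which of several independent, all-required parallel tasks to try first: if i add one more simplification that isn't in the discussion above, namely that a task's duration doesn't depend on whether it succeeds, then minimizing the expected time until work stops (first failure or completion) is the same as minimizing the expected time to discover failure, because the all-succeed branch costs the same in every order. a pairwise swap argument then gives a simple rule: do tasks in increasing order of expected duration divided by failure probability. the sketch below just brute-forces a tiny hypothetical example to check that rule; the task names and numbers are made up.

```python
from itertools import permutations

# hypothetical independent must-succeed tasks: (name, expected duration in days, p_success)
tasks = [("prototype", 2.0, 0.50),
         ("lit_review", 1.0, 0.95),
         ("key_experiment", 5.0, 0.70)]

def expected_stop_time(order):
    # expected time until the first failure or until everything is done,
    # assuming duration is independent of the success/fail outcome
    total, p_alive = 0.0, 1.0
    for _, e_t, p in order:
        total += p_alive * e_t   # this task is only reached if all earlier ones succeeded
        p_alive *= p
    return total

brute_force = min(permutations(tasks), key=expected_stop_time)
ratio_rule = sorted(tasks, key=lambda t: t[1] / (1.0 - t[2]))  # E[duration] / P(fail), ascending
print("brute force:", [name for name, _, _ in brute_force])
print("ratio rule: ", [name for name, _, _ in ratio_rule])
```

in other words, cheap tests of risky assumptions come first, which matches the fail-fast attitude above; for anything with alternates mixed in, the monte carlo machinery is still what i'd trust.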