Thursday, April 23, 2009

probability and project management, continued

one question that immediately arises with probabilistic scheduling: how do i define pdfs/cdfs for individual tasks? network combinations of tasks are straightforward from there, and the averaging that happens probably makes the small-scale choices less important as long as there aren't bias errors. it seems popular to use the beta or triangular distributions with upper and lower bounds and a most likely time. the bounds are important, i think, because no task will take <= 0 time and eventually i will need to quit.

to make this work with the idea in the last post about a success/failure + time joint pdf, i would need pdfs for both success and failure and a probability of success. there are constraints, though; the success and failure marginal pdfs should have the same upper bound. (once success is no longer possible, failure is guaranteed.) they could have different lower bounds. (the time required to realize something is impossible could be more or less than the minimum time to succeed.)

now here's an interesting question: will my pdfs tell me to try an alternate task before i've failed the first one? in other words, can a marginal pdf conditioned on a minimum time (the time i've already spent on it) tell me that my chance of success is so low that i might as well try the next approach? this could very well happen if the most likely time given success is shorter than the most likely time given failure, and i reach a time in between the two without finishing. hmmm, this could be useful. but it also makes the order optimization more complicated if i need to assume that i will switch to the next alternate before failing.

project management people have acknowledged that it's hard for people to estimate probabilities in the absence of data. i could at least evaluate my estimation in hindsight, however, by looking at the distribution of the estimated cdf value at the realized time. under the null hypothesis that i always get the right distribution, this empirical distro should be uniform on [0,1].
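to make the conditioning idea concrete, here's a minimal sketch assuming triangular marginals; the success probability and pdf parameters are made-up numbers for illustration, not from any reference:

    # success/failure + time joint model with triangular marginals.
    # scipy's triang takes the mode as a fraction c of the range.
    from scipy.stats import triang

    def triangular(lo, mode, hi):
        """triangular distribution on [lo, hi] with the given mode."""
        return triang(c=(mode - lo) / (hi - lo), loc=lo, scale=hi - lo)

    p_success = 0.7                    # prior probability the task succeeds
    t_success = triangular(2, 4, 20)   # time-to-success pdf (days)
    t_failure = triangular(5, 12, 20)  # time-to-realize-failure pdf; same upper
                                       # bound as success, different lower bound

    def p_success_given_elapsed(t):
        """p(success | task still unresolved after t days), by bayes' rule."""
        unresolved_if_success = p_success * t_success.sf(t)        # sf = 1 - cdf
        unresolved_if_failure = (1 - p_success) * t_failure.sf(t)
        return unresolved_if_success / (unresolved_if_success + unresolved_if_failure)

    for t in (0, 4, 8, 12):
        print(f"unresolved after {t:2d} days: p(success) = {p_success_given_elapsed(t):.2f}")

because the failure mode sits later than the success mode here, the conditional success probability decays as unresolved time accumulates (from 0.70 at the start to roughly 0.49 by day 12), which is exactly the switch-before-failing signal.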

graph theory, probability, and project management

many buzzwords and visualization strategies have floated down the project management river over the years: gantt chart, pert, cpm, event chain methodology, etc. one thing they all seem to have in common, though, is that the underlying representation is a (possibly hierarchical, as with gantt or event chain) directed graph. so maybe the particular representation is not so important, and there have been few real advances in the way of thinking about project management over the last hundred years.

the one rational explanation for the diversity of planning methods i can see is in the application. sometimes, as in a factory/operational/repetitive setting, a process-based approach makes the most sense. one-time projects are fundamentally different, and this is where graph-based methods come in. but many of those, like construction projects, have well-defined, already proven tasks even though the overall project is unique. there may be a little uncertainty in task duration, but there is not much uncertainty about success or failure on the small scale. these are also projects in which resources are easily scaled; pour in more materials, people, and workspace to reduce the project time. that is why so many of these graph methods focus on time-critical paths and slack times. there is an assumption that the completion times are dependency-constrained, and that there are people available somewhere to work on any given task as long as you give them the required resources.

r&d projects are unique one-timers, which would indicate a graph method. but now there are significant probabilities that any given task on the small scale could fail, so alternate workaround processes are needed to mitigate failure risk on the large scale. also, the mythical man-month principle comes into play: many times these are small projects with a few highly specialized people (maybe just me!) and trying to bring in other people will only slow things down. so concepts like parallel work paths with slack time are irrelevant. i will have to do both parallel tasks even though they don't depend on each other, and the only question is which to do first.

methods like pert have always had a concept of task duration as a random variable. maybe i could extend this to a joint pdf with a succeed/fail discrete random variable. then i could have multiple nodes with the same input/output edges that represent alternative approaches to generate the same products. with two possibilities as an example, i would need to choose one to try first and only do the other if the first failed. the time rv given failure of both would not depend on the order; it is the (scaled) convolution of the two failure pdfs. the time rv given success would be a mixture of the first approach's success pdf and the convolution of the first's failure pdf with the second's success pdf, so it would depend on the order. in a practical application, i don't think i would bother to compute convolutions. most graphs i'll be dealing with will be small enough that i could just monte carlo the bejesus out of it. with a hierarchical representation, i would only need to consider a part of the total project at a time, just chopping out a chunk for which all possibilities have all the same inputs and outputs. each set of n possible approaches has n! orderings, each with a different time pdf given success. but i should be able to throw away many of them automatically if there is another cdf which is greater for every time and therefore indisputably better (first-order stochastic dominance).
more subjective criteria will be needed to choose between cdfs that cross each other. (maybe whichever is better more than 50% of the time, a more-likely-better criterion?[1])

to choose which of two parallel tasks i should try first, i should take the attitude that, if the project is going to fail, i want to know as soon as possible. so i should seek short time rvs given failure. i think it's the complementary case to alternatives; i can think of all alternates as being in the graph in series and, when an attempt succeeds, all the following alternates collapse to zero time. for parallel tasks, the rest of the graph disappears if one necessary task completely fails (with no available alternates). in both cases, i want to hit the collapsing event in the shortest time. for parallel tasks, analogous to complete failure of alternates, the total time given success in all of them (and the probability of success) is the same regardless of order (assuming a one-person project). so for parallel tasks, order matters only in the case of eventual failure.

the discussion above assumes all individual tasks have independent pdfs. if my joint pdfs are conditional on other events, i think i should take that as a sign that i should split up the tasks rather than actually try to deal with probabilistic dependence.

so this approach would tell me what i should be working on at any given time in my individual work, thereby avoiding inefficient personal biases, procrastination, etc. i'd like to try it, but i want to make sure i can use a file format that will allow easy data input with a gui and final gantt-chart-like output. i think i'll check openproj and taskjuggler for this.

[1] this will not always produce a unique ranking. for example, three rvs whose realized orderings are 1>2>3, 3>1>2, and 2>3>1, each with 1/3 probability, give three circular pairwise comparisons: 1>2, 2>3, 3>1. but this criterion might at least weed out some that are worse than all others.
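here's a minimal monte carlo sketch of the two-alternative ordering question; all the numbers (success probabilities, triangular parameters) are made up for illustration:

    import random

    def tri(lo, mode, hi):
        return random.triangular(lo, hi, mode)   # random's arg order is (low, high, mode)

    # each approach: (p_success, success-time sampler, failure-time sampler)
    approaches = {
        "a": (0.6, lambda: tri(2, 4, 15),  lambda: tri(3, 8, 15)),
        "b": (0.8, lambda: tri(5, 10, 30), lambda: tri(2, 6, 30)),
    }

    def attempt(order):
        """try the alternatives in order; return (succeeded, total time)."""
        total = 0.0
        for name in order:
            p, t_succ, t_fail = approaches[name]
            if random.random() < p:
                return True, total + t_succ()
            total += t_fail()   # realized this approach failed; try the next
        return False, total

    n = 100_000
    for order in (("a", "b"), ("b", "a")):
        results = [attempt(order) for _ in range(n)]
        success_times = [t for ok, t in results if ok]
        print(order,
              "p(success) =", round(len(success_times) / n, 3),
              "mean time | success =", round(sum(success_times) / len(success_times), 1))

both orderings should report the same p(success) up to monte carlo noise, while the time-given-success statistics differ with order; the same samples could also feed the dominance test above by comparing the empirical cdfs on a grid of times.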

activity-on-node vs. activity-on-edge

the old pert planning book from the 60s that i'm reading uses an activity-on-edge approach, with an appendix contrasting it with activity-on-node. at first i thought aoe had conceptual advantages, but i'm not so sure anymore, and it seems that every piece of project management software out there today with a network representation uses aon. so what is really the difference?

with aoe, the edges represent work processes. people say the nodes represent events, but i tend to think of events as things that are externally imposed. i think of nodes on an aoe graph as collectively defining the project _state_; i.e., the nodes that do not depend on any unfinished edges at a given point in time define what work has been completed and what can now begin.

with aon, the nodes represent work processes and the edges represent products/resources. each task depends on incoming resources and sends its products out through the graph edge pipelines to other tasks. some dummy nodes might be necessary for interface events, just to make the math work, since true deliverables and external resources would otherwise be dangling edges.
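a minimal sketch of that aon picture, with hypothetical task and product names; the dummy source/sink nodes exist only so external resources and the final deliverable aren't dangling edges:

    # activity-on-node: nodes are work processes, edges carry products/resources.
    # dummy interface nodes terminate what would otherwise be dangling edges.
    aon = {
        # node: [(predecessor node, product carried on that edge), ...]
        "external_inputs": [],                            # dummy source
        "design":   [("external_inputs", "requirements")],
        "build":    [("design", "drawings")],
        "test":     [("build", "prototype")],
        "delivery": [("test", "report")],                 # dummy sink
    }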

planning quotes

a couple of quotes about planning that i find useful:

"In preparing for battle I have always found that plans are useless, but planning is indispensable." -- Dwight D. Eisenhower (1890-1969)

i've seen many variations of this, and even attributions to other people, but this one is from the columbia world of quotations, so i think it's more authoritative. it's a pithy summary of how the mental discipline and attention to detail required by planning are usually more important than the plan produced.

"the contemporary advanced-technology project demands advances that cannot depend on accident or chance; breakthroughs are increasingly the result of steady, planned, technological pressure under economical and political conditions which demand advances -- on ever-shortening time cycles. we have arrived at a point where we must _schedule_ creativity and invention, as well as the production which follows right on its heels, and must attempt to predict with some accuracy when operational hardware will be delivered and what it will cost." -- archibald and villoria, network-based management systems (pert/cpm), p. 77

i like the way this quote rejects the attitude that creative advances are haphazard and serendipitous. if i can plan well enough, i can predict and depend on them.

"the success or failure of projects is determined in the first 10% of their lifetimes." fergus o'connell, "how to run successful projects" (1994). very true in the instructor's experience: you have to start the right way and know how you'll get there.

definition: a project is a unique exercise which aims at a defined outcome; it is not part of otherwise routine operations.

don't use microsoft project; it is useless. people are notoriously bad at estimating the time required for tasks; usually it takes about twice as long as you think. visualizing task dependence can help to envision concurrency and to keep track of the critical path (the longest minimum time, not necessarily the most important tasks), which can change.
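a toy sketch of the critical-path idea (made-up tasks and durations): the critical path is the longest-minimum-time path through the dependency graph, found with a single pass in topological order:

    from graphlib import TopologicalSorter  # python 3.9+

    durations = {"spec": 2, "code": 5, "docs": 3, "test": 4}
    deps = {"spec": [], "code": ["spec"], "docs": ["spec"], "test": ["code"]}

    finish, via = {}, {}
    for task in TopologicalSorter(deps).static_order():   # predecessors come first
        start = max((finish[d] for d in deps[task]), default=0.0)
        finish[task] = start + durations[task]
        via[task] = max(deps[task], key=finish.get) if deps[task] else None

    # walk back from the latest finisher to recover the critical path
    task, path = max(finish, key=finish.get), []
    while task:
        path.append(task)
        task = via[task]
    print("critical path:", " -> ".join(reversed(path)), "| duration:", finish[path[0]])

bumping one duration (say, docs to 10) moves the critical path from spec -> code -> test to spec -> docs, which is the "can change" point above.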

Friday, April 17, 2009

project management software

i've been searching in vain lately for software that will magically make my projects organized. i'm most interested in using network-based methods, though these seem to be out of fashion since their peak in the 60s. the best book i've been able to find so far on this is 'network-based management systems (pert/cpm)' by archibald and villoria (available pretty cheap on amazon). it's so old that most of the computational stuff is probably worthless; just monte carlo the thing and go home.

the thing is, i don't care so much about resource leveling, since i'm the only resource. i do care about uncertain time estimates, but i would like to go beyond that to account for uncertain success among task alternatives. i tend to do more r&d type projects, where there is more than one way to approach a problem but not all approaches will work. i want my project plan to help me decide what to try first, either to eliminate unlikely possibilities or to confirm prerequisites quickly. i think this can fit the old network methodology if i use boolean inputs at event nodes or (probably better) a hierarchical network where multiple approaches are abstracted as a higher-level activity. i can't believe no one has done this before, but i can't find any refs. i guess the typical gantt/work breakdown structure has a hierarchical structure, so maybe i can take advantage of that when interfacing with project management software.

taskjuggler uses text-format input files for gantt-chart-like dependencies that i might reasonably generate from a network topology. that would allow me to make reports that are intelligible to other people, but i would still have to write all the code for the network operations, especially dealing with probabilistic time and success representations (probably unavoidable anyway). openproj.org is another one worth looking at: relatively feature-rich, including hosting and (i think) some m$ project compatibility. probably this or taskjuggler would be the best to work with. definitely best for simple and/or one-off projects; i've used it this way before, on windows and linux. kplato is less than impressive, and i saw some mention of them wanting to merge with taskjuggler anyway. www.sharedplan.com offers hosted planning; nice for security and collaboration, but i'm not sure exactly what their features are.