Tuesday, March 2, 2010

file readers

i've spent way too much time already writing a file reader for slurping up binary data from an experimental setup. i've been burned in so many different ways by incorrect or nonsensical data in these files that i've decided on a rule i should follow any time i need to do this again: read each atomic unit of data with as few context assumptions as possible; i.e., loop over units in the stream, not over any assumed structure for the units. both missing and duplicate data have wreaked havoc on my pretty little reader, and each has required a new refactoring. next time i will just start out iterating over the stream and plan on dealing with nonsensical structure, even when (!!) there is metadata that could turn out to be wrong. for now i'll just build the most convenient structure in memory for all the data that makes sense and throw the junk into an extra array that i can check later if i need to. i won't be sure that's the best way until i get a chance to use it.
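
something like this is the shape i have in mind. the record layout here (the sync bytes, the timestamp, the float sample, the struct format) is completely made up for illustration; the point is just that the loop trusts nothing beyond one record at a time, and anything suspicious goes into a junk list instead of breaking the reader:

import struct

RECORD = struct.Struct("<2sIf")   # hypothetical atomic unit: magic, timestamp, sample
MAGIC = b"\xAA\x55"               # hypothetical sync marker

def read_records(path):
    good, junk = [], []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(RECORD.size)
            if len(chunk) < RECORD.size:
                if chunk:
                    junk.append(chunk)        # trailing partial record
                break
            magic, timestamp, sample = RECORD.unpack(chunk)
            if magic != MAGIC:
                junk.append(chunk)            # nonsensical unit -- look at it later
                continue
            good.append((timestamp, sample))  # the only structure i actually trust
    return good, junk

a real version would probably want to resync on the magic bytes after a bad record rather than skipping a fixed-size chunk, and to notice duplicate timestamps, but that's the general idea.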
maybe i should use something like python-hachoir for this.
