A Few Simple Rules
Disclaimer
The views expressed in this course represent the views of Anne Deslattes Mays, PhD and do not represent the views of NICHD, NIH or the United States Government.
What follows is my development of a practice that enables workflow and platform independence, which in turn facilitates reproducibility.
Learning from those who have walked the journey
Elements of Programming Style
B. W. Kernighan and P. J. Plauger, The Elements of Programming Style, 2nd Edition, McGraw Hill, New York, 1978. ISBN 0-07-034207-5
The year was 1919, the First World War was at its close, and a student, E. B. White, took a course, English 8, taught by William Strunk Jr. The course featured a required textbook, a slim volume called The Elements of Style. The durability of this slim book informed the development of The Elements of Programming Style by Kernighan and Plauger, whose lessons we adapt here in this course. This shows again the durability of beginning with a philosophy as one approaches one's work, and of using programs and structure to achieve it.
So with this nod to E. B. White, William Strunk Jr., Brian Kernighan, and P. J. Plauger, we begin with our own Lessons and Pithy Phrases.
Lessons Translated to the Workflow/Containerized Process
(Truncated Pithy Phrases)
Where I have modified these Pithy Phrases to map them to what we are teaching here, the changes are italicized and emphasized.
The book's lessons are summarized at the end of each section in pithy maxims, such as "Let the machine do the dirty work":
- Write clearly – don't be too clever.
timeless
- Say what you mean, simply and directly.
timeless
- Use containerized processes (in a way similar to library functions) whenever feasible.
- Avoid too many temporary variables.
- Write clearly – don't sacrifice clarity for efficiency.
timeless
- Let the machine do the dirty work.
timeless
- Replace repetitive expressions by calls to common functions.
timeless - when you see yourself repeating an expression, replace it with a single function
- Parenthesize to avoid ambiguity.
- Choose variable names that won't be confused.
timeless
- Avoid unnecessary branches.
- If a logical expression is hard to understand, try transforming it.
timeless
- Choose a data representation that makes the program simple.
- Write first in easy-to-understand pseudo language; then translate into whatever language you have to use.
timeless
- Modularize. Use procedures and functions.
and containerize - use GitHub Actions to build, test and keep your containers up-to-date
- Avoid gotos completely if you can keep the program readable.
- Don't patch bad code – rewrite it.
- Write and test a big program in small pieces.
timeless - this can be done by using these tested, dockerized processes
- Use recursive procedures for recursively-defined data structures.
- Test input for plausibility and validity.
timeless - make sure you understand the source of your data
- Make sure input doesn't violate the limits of the program.
- Terminate input by end-of-file marker, not by count.
- Identify bad input; recover if possible.
- Make input easy to prepare and output self-explanatory.
- Use uniform input formats.
- Make input easy to proofread.
timeless
- Use self-identifying input. Allow defaults. Echo both on output.
- Make sure all variables are initialized before use.
- Don't stop at one bug.
- Use debugging compilers.
timeless - this is different with workflow languages: you can dockerize and test each step in the workflow, verifying its inputs, outputs and processes
- Watch out for off-by-one errors.
timeless
- Take care to branch the right way on equality.
- Be careful if a loop exits to the same place from the middle and the bottom.
- Make sure your code does "nothing" gracefully.
- Test programs at their boundary values.
- Check some answers by hand.
timeless
- 10.0 times 0.1 is hardly ever 1.0.
timeless - always aim for simplicity
- 7/8 is zero while 7.0/8.0 is not zero.
timeless - but this would be better covered in an R or a Python class
- Don't compare floating point numbers solely for equality.
timeless - but this would be better covered in an R or a Python class
- Make it right before you make it faster.
timeless for everything
- Make it fail-safe before you make it faster.
timeless
- Make it clear before you make it faster.
timeless
- Don't sacrifice clarity for small gains in efficiency.
timeless
- Let your compiler do the simple optimizations.
again, for our world of platforms, let your platform help you - Platforms as a Service, such as CAVATICA by Seven Bridges and CloudOS by Lifebit
- Don't strain to re-use code; reorganize instead.
timeless - the more you perform a task, the more simply you see how to get it done; exploit that simplicity and rewrite
- Make sure special cases are truly special.
timeless
- Keep it simple to make it faster.
timeless
- Don't diddle code to make it faster – find a better algorithm.
timeless - find another bioinformatics algorithm, collaborate, give attribution and expand your reach
- Instrument your programs. Measure before making efficiency changes.
timeless - this means that if you introduce changes, you should check that they are appropriate
- Make sure comments and code agree.
timeless
- Don't just echo the code with comments – make every comment count.
timeless
- Don't comment bad code – rewrite it.
timeless
- Use variable names that mean something.
- Use statement labels that mean something.
- Format a program to help the reader understand it.
timeless
- Document your data layouts.
timeless
- Don't over-comment.
timeless
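The floating-point maxims in the list above ("10.0 times 0.1 is hardly ever 1.0", "7/8 is zero while 7.0/8.0 is not zero", "Don't compare floating point numbers solely for equality") can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not part of the original book or course material: the helper names `repeated_sum` and `nearly_equal` are hypothetical, and the tolerance is an arbitrary choice. Note that in Python 3 the single product `10.0 * 0.1` happens to round to exactly `1.0`, so the sketch adds `0.1` ten times instead, which shows the same rounding problem.

```python
import math


def repeated_sum(step: float, count: int) -> float:
    """Add `step` to a running total `count` times (hypothetical helper)."""
    total = 0.0
    for _ in range(count):
        total += step
    return total


def nearly_equal(a: float, b: float, rel_tol: float = 1e-9) -> bool:
    """Compare two floats with a relative tolerance instead of `==`.

    The tolerance of 1e-9 is an assumption chosen for illustration.
    """
    return math.isclose(a, b, rel_tol=rel_tol)


# Ten additions of 0.1 accumulate rounding error and miss 1.0.
tenth_sum = repeated_sum(0.1, 10)
print(tenth_sum == 1.0)              # False
print(nearly_equal(tenth_sum, 1.0))  # True

# Integer (floor) division truncates; true division does not.
print(7 // 8)     # 0
print(7.0 / 8.0)  # 0.875
```

The design choice here is to wrap the tolerance comparison in one small function, so that every comparison in a workflow step uses the same rule rather than a scattering of ad hoc `==` checks.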