Scientific Software Development Lessons
The first few assignments of this course are highly focused on software development. This entails more than just writing a working program, as I am sure you have realised by now. Although your assignment submissions have received comments together with your grades, we thought it would be useful to list some of the more common mistakes and misconceptions across all student submissions, so that there is a single place of reference for what we expect in terms of proper software development with C++ in a Linux command line environment.
On files and filenames
- The content of code files should be plain ASCII text. No .doc, .docx, .rtf, .odt.
- The extension of a C++ source or header file should never be .txt. Instead, .C, .cc, .cxx, .cpp, .h, .hxx, .hpp are all reasonable choices.
- Filenames should not have spaces or punctuation, even if your filesystem allows this, because handling such filenames may cause errors in many scripts and tools.
- A filename should convey some sense of what the file contains or does. Filenames should not encode authorship or provenance. If the file is under version control, its filename should not encode any sort of versioning, unless the file is part of a test that compares with a fixed earlier version of the code.
- Always add a README file to explain what files are for what, and any other instructions that a new user of your code should know.
In code
- Comment and document your code.
- Give functions and variables meaningful names. Function names should indicate what they do. Variable names should indicate what they contain.
- Modularize what makes sense, but do not over-modularize. Having many modules in many files makes it harder for others to understand the code. Functions that are used in only one module can just be part of that module.
- Put every module in a separate file, with a header file for each module.
- Related functionality belongs in the same module. For instance, a single module could deal with both writing and reading NetCDF files, instead of having two modules, one for reading and one for writing.
- In high-level functions, like "int main()", use a bit of defensive programming. E.g. for programs that take arguments, check if the number of arguments given to a program is correct.
- Be consistent in your code indentation. Use spaces or tabs, but not both.
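As an illustration of the defensive-programming advice above, here is a minimal sketch of argument checking for a program that expects exactly two arguments. The names Settings, parse_arguments, NUM_STEPS and OUTPUT_FILE are hypothetical and chosen only for this example; they are not taken from any assignment.

```cpp
#include <cerrno>
#include <cstdlib>
#include <iostream>
#include <string>

// Hypothetical settings for an illustrative program invoked as:
//   program NUM_STEPS OUTPUT_FILE
struct Settings {
    long        num_steps = 0;
    std::string output_file;
};

// Check the command line before doing any real work.
// Returns true and fills 'settings' if the arguments are usable;
// otherwise prints a message to standard error and returns false.
bool parse_arguments(int argc, const char* const argv[], Settings& settings)
{
    // Check the number of arguments first.
    if (argc != 3) {
        std::cerr << "Usage: " << (argc > 0 ? argv[0] : "program")
                  << " NUM_STEPS OUTPUT_FILE\n";
        return false;
    }

    // Check that NUM_STEPS is a positive integer with no trailing junk.
    char* end = nullptr;
    errno = 0;
    const long num_steps = std::strtol(argv[1], &end, 10);
    if (end == argv[1] || *end != '\0' || errno == ERANGE || num_steps <= 0) {
        std::cerr << "Error: NUM_STEPS must be a positive integer, got '"
                  << argv[1] << "'\n";
        return false;
    }

    settings.num_steps   = num_steps;
    settings.output_file = argv[2];
    return true;
}
```

In "int main(int argc, char* argv[])", one would call parse_arguments(argc, argv, settings) first and return a non-zero exit code if it returns false, so that the rest of the program can assume valid input. Keeping the check in its own function also makes it easy to test separately.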
Compilation/makefiles
- For almost all projects, Makefiles should be called "Makefile", "makefile" or "GNUmakefile", with no extension, so that just typing the "make" command works.
- Specify your compilation options in variables, e.g.
CXX = g++
CXXFLAGS = -I.... -std=c++14 -Wall -g -O2
- By default, compile with at least the flags -g (for debugging symbols) and -O2 (to allow the compiler to optimize the object code), as above in the CXXFLAGS variable.
- Always include the flag which sets the C++ standard that your files are to be compiled with.
- Stick to standard makefile variables for compilation and linking flags in makefiles, such as CXX, CC, FC, CXXFLAGS, CFLAGS, FFLAGS, LDFLAGS and LDLIBS.
- Define any non-standard variable that you use with ?=, e.g.
NETCDF_LIB ?= .
NETCDF_INC ?= .
- Define every dependency and rule precisely and completely.
- 'Precisely' means that you specify which files depend on which other files. Do not combine rules or write rules that contain several compilation commands.
- 'Completely' means to specify all the dependencies between your files, e.g.
antsontable.o: antsontable.cc reporter.h timestepper.h
	${CXX} ${CXXFLAGS} -c -o antsontable.o antsontable.cc
reporter.o: reporter.cc ...
antsontable: antsontable.o reporter.o timestepper.o ...
Do not rely on wildcards because they make it easy to miss a dependency, and tend to break down when there are several executable targets.
- Do not specify dependencies on external or standard libraries.
- Makefiles are also code, and therefore should also be commented.
- A single makefile is advisable for most projects. Use different rules in the same makefile to incorporate more than one application and for tests.
- Include a 'clean' rule.
On Version Control (Git)
- Make sure to set up git config with your full name and a valid email address from the start.
- Commit frequently, with meaningful commit messages.
- If a file under version control has changed, you must explicitly add it to git again before committing, or else its changed version will not be in the repo. So always do something like:
$ git add FILETHATCHANGED
$ git commit -m 'What you did and why'
- Do not commit object files, binaries, or other 'derived' files. For instance, any file created by your makefile should not be in the repo. Output files shouldn't be in the repo either, unless they are part of a check or a test (in which case they are now input files). The reason is that none of these files are part of the codebase, and their binary content may change from one computer to the next even though the version of the code has not.
- Be careful with the command "git add .", which adds everything in the current directory, including hidden directories you may not even realise exist but that your operating system or editor may have put there. Just add what you know needs to be in the repo.
- When continuing to work on an existing codebase, do not start a new repo. Note that you can attach a version number to a particular state of the code using the "git tag" command.
On assignment submissions
- Make sure to have committed all changes to the repo before you submit your assignment; the working directory is not part of the repo!
- When asked to submit your git repo, use the 'git2zip' command on the Teach cluster in the directory where your code and your .git directory reside; this command zips up the content of the .git directory into a zip file with a name based on the directory name.
- Test the resulting zipfile with the 'zip2git' command in a separate directory. This will unzip the git repository and check out the 'master' version. Make sure this version of the code compiles, works, and contains the expected files (and only the expected files).
- Only after testing the zipfile of your repo, submit that zipfile to the course website.
- Do not submit your files separately in addition to submitting the zipped-up git repo; this is either superfluous, if the files are identical to those in the repo, or represents another version, in which case we cannot know which version is the one you intend to submit.
- When subsequent assignments are a continuation of work on the same codebase, as for assignments 3 and 4, there is no reason for a new repo.
- If you've been editing on your own laptop and then copying the files over to the Teach cluster, you may want to know that there are better ways with git. Look into the commands "git clone", "git pull", "git push", and "git reset". So-called "bare" repositories can be useful if you want a designated clone of the repo to serve as its central repository.
Last modified: Monday, 14 August 2023, 4:05 PM