For example, in output_cells(), do we need to parallelize the for-loop that writes individual lines to the output stream? I initially tried naively parallelizing this block of code, but because all the threads write to the same output stream variable, this seems to sometimes cause runtime errors.
I was thinking of changing this by: 1) creating a vector to store each line, 2) having the threads build the lines and store them in the vector in parallel, and 3) adding a serial step where one thread reads from the vector and writes each line to the output stream. But I think this would require including a new library (<vector>), initializing new variables, adding a new for-loop, etc., and I'm not sure whether that additional overhead would change the results we are supposed to get for the Amdahl's law part of the assignment. Thank you!
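In case it helps, here is a minimal sketch of the three steps above. It assumes the loop is parallelized with OpenMP, which may not match the assignment's setup, and `Cell`, `format_cell`, and `output_cells_buffered` are hypothetical names rather than anything from the starter code.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for whatever the assignment stores per cell.
struct Cell {
    int id;
    double value;
};

// Formats one cell as one line of text. The string stream is local,
// so concurrent calls from different threads share no state.
std::string format_cell(const Cell& c) {
    std::ostringstream line;
    line << c.id << ' ' << c.value;
    return line.str();
}

// Hypothetical replacement for the write loop in output_cells().
void output_cells_buffered(const std::vector<Cell>& cells, std::ostream& out) {
    // Step 1: one pre-sized slot per line, so no thread ever resizes the vector.
    std::vector<std::string> lines(cells.size());
    const long n = static_cast<long>(cells.size());

    // Step 2: build the lines in parallel. Iteration i writes only lines[i].
    // Assumes OpenMP; without -fopenmp the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        lines[i] = format_cell(cells[i]);
    }

    // Step 3: a single thread writes the lines to the shared stream, in order.
    for (const std::string& line : lines) {
        out << line << '\n';
    }
}
```

Calling `output_cells_buffered(cells, std::cout)` would print one line per cell in the original order. The write to the stream itself stays serial in this version, so only the formatting work would count toward the parallel fraction.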