Dr. Gavin A. Schmidt
GISS modelE coding guidance
- Access to the development code
- Controlling input/output
- Interfacing to a PHYSICS module
- Coding standards
- Transistion to the new standards
The code is manged with a git repository. This keeps track of changes, and ensures that everyone can keep up-to-date with innovations written by everybody else.
Getting the code
Please follow the instructions given here.
Running the model
In order to make a rundeck that you can use, go into the modelE/decks directory and type 'gmake rundeck RUN=E001xyz' where 'xyz' are some unique initials or other identifying tag (so that your test runs to not interfere with anyone else's). This will take the standard rundeck (E001.R) and make you a functional copy. If you want to work off a different source rundeck, use the argument "RUNSRC=". For instance, there is also a Qflux version of the new modelE, described in E001A.R. So you would use 'gmake rundeck RUN=E001Axyz RUNSRC=E001A' to create a new Qflux rundeck. Stratospheric and coupled versions are available with RUNSRC=E001M23, and RUNSRC=E001B, respectively. (Note that these are relatively coarse resolution configurations, not CMIP5 standards).
In the same directory, type 'gmake gcm RUN=E001xyz' to compile the code. The Makefile looks into the rundeck and derives the dependencies of each file on the others, so that if you edit a file, the gmake command will know that affected files should be recompiled also.
To run setup, type 'gmake setup RUN=E001xyz'. This will run the model for the first hour as usual. It will also create a link from this directory to the /raid on which the output will be stored. Thus the printout will be E001xyz/E001xyz.PRT.
To carry on past the first hour use 'run E001xyz'. The functions 'qrsfnt E001xyz' and 'sswB E001xyz' provide timing info, and stop the model.
Note that error files from the compilation will be in the model source code directory modelE/model.
You can edit your version of the code to your heart's content without affecting anyone else. You must check that any code you change still compiles and that it gives the same results (unless you are specifically changing the functionality). Bytewise differences don't usually show up in the first hour, and so comparing the print out at that stage is very useful. There are prt and rsf files for the first hour, and the first month, for all model configurations.
To check perfect bytewise compatibility (which you should generally strive for unless there is good reason not to), use the CMPE001 program. This is compiled using the 'gmake aux' command. Then type ../aux/CMPE001 file1 file2 to check two files. If the output is just a list of arrays, then it is fine. If there are any differences, the program will print out the first ten array elements that differ. This should help you resolve the error that caused them.
Saving your changes
If you have made an alteration to the model that is correct and checked and documented (!), you can add it to the source code using the git instructions.
I/O is now written at a higher level than before. This implies that the code in the main loop will not need to know the details of the arrays that need to be saved for each module. We have split the output file into records, and call a write_PHYSICS routine for each module. This too would hide the details from the main routine, and possibly would be simpler to implement. For example, writing out the restart file looks something like this:
C***** write out restart file
write(kunit) header information
The reading in follows the same pattern (calling the same routine with an iread arguement). The read/write statements are on seperate records since that is clearer. A small header label is also included to be helpful (a simple title with the version of the module and/or information about the length of the record).
Please note that all I/O is now via self-describing netcdf files.
The various functions (initialisation, driver routine, actual physics) for each physics routine are now mostly separated, so that it will be easy to changes the physics without having to change much else. And that if something else in the model changes, nothing should need to be changed in the physics (although changes in the initialisation and driver routines are possible). In order to this cleanly, and have the minimum number of dependencies between subroutines/files, we propose having three seperate files for each PHYSICS unit: Firstly, there will be the local PHYSICS module, that will only contain the local column physics, and which will not reference the main model variables at all. Secondly, there will be the PHYSICS interface, which will be a module that 'owns' the relevant model variables that are related to the local PHYSICS module. The I/O routines will also be included there. Thirdly, there will be the file that contains the driver and initiallisation routines that set up and call the local PHYSICS routines. Thus the local phyiscs should not be dependent on any other module, the interface file will only be dependent on the main model (but not the driver file, or (directly) on the local physics), while the driver file will be dependent on everything. This pattern has been adhered to a little fitfully. Sometimes two of the files are combined, however, the main PHYSICS common block should never be in the same file as the drivers (since that can give rise to circular dependencies).
Other modules that want to use any information from the PHYSICS module, will simply USE the interface module, and thus not be directly dependent either on the local physics or the drivers. The naming convention should reflect both variations in the local physics, and also in the main model, and so I suggest something along the lines of:
PHYS for the local physics PHYS_COM for the interface module PHYS_DRV for the drivers
The file that contains the PHYSICS code (ie. all the column physics, and all routines that are called from within those routines) will be written as a fortran90 module. This contains what would have been the local common blocks, plus all the internal routines. The format for this is
& , sh_wat=>shw
INTEGER, PARAMETER :: internal parameters
REAL*8 :: internal arrays
REAL*8, PUBLIC :: arrays that can be accessed externally
END SUBROUTINE ABC
END SUBROUTINE XYZ
END MODULE PHYSICS
The declarations at the beginning of the module take the required data from the main model module (equivalent to the main common block now), and do any dynamic mapping of names (if required). In the example the parameter shw in the main common block is mapped to sh_wat for use in these routines.
Within the module, no references to external model variables should be made. All the information should be passed in and out via the initialisation or driver routines.
The PHYSICS interface module contains the main model variables associated with the PHYSICS module, and will generally only contain the i/o routine. All variables that are used by another part of the code will be declared here, while only the variables required for a restart will need to be in the i/o routine.
USE MODEL_COM, only : IM,JM,LM,iwrite,iread
CHARACTER*20 :: MODULE_HEADER :: "PHYS_COM v1.4"
REAL*8, PUBLIC :: PHYS1(IM,JM,LM)
REAL*8, PUBLIC :: PHYS2(IM,JM,LM)
INTEGER iunit !@var iunit unit number of read/write
INTEGER irw !@var irw read or write flag
select case (irw)
write(iunit) MODULE_HEADER, PHYS1, PHYS2, ...
read(iunit) HEADER, PHYS1, PHYS2, ...
& print*, "Discrepency in module version",HEADER,MODULE_HEADER
Better checks and traps for unexpected conditions can of course be added.
if there are variables that need to be initialised from the main GCM (routine specific constants or operators, and/or initial values, including the contents of any IFIRST-type constructs) there should be a specific routine that does the initialisation. This will be included in a file with the driver routine (see below), and does not need to belong to a module (since variables local to here are not USE'd by any other part of the code).
call init_PHYSICS(io_option)which will be included in INPUT. The io_option for each PHYSICS module could choose between reading in from an initial condition file, or various other initialisation options.
The routine init_PHYSICS will use and write to the module that contains the internal parameters and arrays for the PHYSICS routine., and the intereface module. ie.
use PHYSICS, only : param, phys_init_ij
use PHYS_COM, only : PHYS1,PHYS2,.....
param = .......
select case (io_option)
PHYS1 = ....
PHYS2 = ....
This routine should supply the physics module with all the information it requires and write the results or apply the fluxes to the main model variables. The internal arrays should be set within the physics module, but the changes to the external (i.e. main model) variables are probably best expressed as fluxes/sources/sinks. With that, we can adjust the second order moments more appropriately and set diagnostics easily.
subroutine do_PHYSICS use MODEL_COM use PHYSICS, only : ABC, XYZ,.... use PHYS_COM set physics_var CALL ABC CALL XYZ set model_var RETURN
The physics variables can be passed either through the PHYSICS module (as above), or through an argument list if that is more appropriate.
All routines MUST use an IMPLICIT NONE statement. Thus every variable has to
- a) have a declared type (REAL, INTEGER etc),
- b) be declared as a SAVE (or not) (to avoid needing -static compilation)
- c) to be declared with an INTENT (IN, OUT, INOUT) for subroutine arguments in the list
- d) be initialised (if necessary)
- e) have some minimal documentation associated with it.
- b) be declared as a SAVE (or not) (to avoid needing -static compilation)
For variables that are just dummy variables (loop indexes, sums, partial calculations), the declarations can be lumped together.
Thus the beginning of a routine will now look something like this:
SUBROUTINE XYZ(DT) !@sum XYZ calculates the xyz contribution to abc !@auth Joe Bloggs !@vers 2.3 (2.2 with new parameterization of zzz) !@calls DEF IMPLICIT NONE use PHYSICS !@param TFO local freezing point (deg C) REAL*8, PARAMETER :: TFO = -1.8 REAL*8, INTENT (IN) :: DT !@var DT time step (s) !@var TL local potential enthalpy array (J/kg air) REAL*8 TL(LM) :: TL = 0. !(initiallised to zero) REAL*8, SAVE :: BYZ(LM) !@var BYZ reciprocal of height (1/m)
The comments should include a '@' sign so that they can be found easily by a script, and be transformed automatically into a real document. I favour allowing this declarations to be put in anywhere and just saying that any line with '!@xxx ' is some kind of declaration. The other comment characters 'C','c' can also be used. Currently, we are using '@sum' (summary of routine function), '@auth' (author of routine), '@ver' (version number and info), '@param' (parameter definition), '@var' (variable definition), '@cont' (list of contents for modules), '@calls' (list of routines/functions used). Continuation lines are denoted with @+. Some others have also been defined.
We intend to implement the UDUNITS2 convention for all units declarations for ease of diagnostic checks and extraction of information. See here for descriptions of these standards.
- all DO loops should end with END DO lines (the old style is obsolete, and may be removed in a new version of the fortran compiler)
- No 'goto's. Use the new CYCLE or EXIT commands to break out of do loops, or use DO WHILE and SELECT CASE constructs.
- All numbers that are either tuning parameters or arbitrary constants or hard-coded array sizes or loop limits should be replaced with variables or parameters.
- All physical constants should be 'USE'd from the shared constants files.
- EQUIVALENCE statements have to go. Be explicit and use the dynamic mapping of variable names available with USE.
- no ENTRY points (this functionality is replaced by the module, i.e just set the internal common block variables in the top level of the module, and they will be available for each internal subroutine)
- names of variables: ALL, HUGE, INDEX, INT, KIND, MASK, SCALE, SIZE, SUM, TINY and TRIM are all fortran90 intrinsics, Hence they should not be used as variable names.
- formatting/indenting. Try and indent DO loops and IF blocks by 2 spaces (this can be set as automatic in Emacs for instance. There is no need to indent every DO loop in a multiple DO loop block. Do not place multiple 'end do's on a single line, and don't use a common 'CONTINUE' statment for multiple loops.
Other fortran90 constructs and conventions will not be insisted upon, but using the array constructs in particular, produce much cleaner and compact code and is therefore highly encouraged.
These restrictions are implemented so that the GISS code will move to having a consistent look-and-feel that will make it easier to search through and understand what is being coded. This is a preliminary list which can (and will be) added to.
- The constants IM,JM,LM,NTM are restricted to the number of latitudinal, longitudinal, vertical boxes and tracers respectively. No other use is allowed.
- When iterating over partial grids, please use the I_0,I_1, J_0,J_1 variables.
- No explicit dependence on vertical/horizontal resolution (i.e. no statements like ``if (L.lt.3) call fred''. It should be related to the physical value of whatever is meant, ie. ``if (Z(L).lt.PBLZMAX)....''
- Do not refer directly to the indexes of the velocity (U,V arrays) points in calculations on the A-grid. This should be done using index arrays (see GEOM_B.f) that implicity contain the information. Thus, the column physics will not depend on whether the velocity points are on a B or C (or D or E...) grid.
- Do not calculate something that has already been calculated eleswhere. USE the variable directly from the primary module (i.e. pressure gradients should be USE'd from the dynamics, rather than recalculated).
- No COMMON blocks.
For the GCM code to be continue to conform to these standards, some work is required both by the gcm development team and also by the original authors of any physics modules. Tasks can be broken down into more easily digestible chunks, as outlined below. You may find the following fortran 90 sites useful: