Dr. Gavin A. Schmidt
The new GISS modelE
- Access to the development code
- Controlling input/output
- Interfacing to a PHYSICS module
- Coding standards
- Transistion to the new standards
The code is being developed using a version control system called CVS. This keeps track of changes, and ensures that everyone can keep up-to-date with innovations written by everybody else. This is a novel feature of working with this model and so needs to be explained a little.
Getting the code
On Ra, set the environmental variable CVSROOT to /u/cmrun/cvs/repository. Go to where you want the modelE directory to reside. Type 'cvs checkout modelE'. This will create your copies of the code for your personal use in the directory modelE. This includes model source code (in sub-directory model), some preliminary documentation (in doc), some auxilliary programs (in aux and exec), and a directory for your rundecks (decks).
There is a manual of sorts for CVS available, but that is probably more technical than is required.
Running the model
In order to make a rundeck that you can use, go into the modelE/decks directory and type 'gmake rundeck RUN=E001xyz' where 'xyz' are some unique initials or other identifying tag (so that your test runs to not interfere with anyone else's). This will take the standard rundeck (E001.R) and make you a functional copy. There is also a Qflux version of the new modelE, which is described in E001A.R. The only difference is that KOCEAN=1. (use 'gmake rundeck RUN=E001Axyz RUNSRC=E001A' to create a new Qflux rundeck). Stratospheric and coupled versions are available with RUNSRC=E001M23, and RUNSRC=E001B, respectively.
In the same directory, type 'gmake gcm RUN=E001xyz' to compile the code. The Makefile looks into the rundeck and derives the dependencies of each file on the others, so that if you edit a file, the gmake command will know that affected files should be recompiled also.
To run setup, type 'gmake setup RUN=E001xyz'. This will run the model for the first hour as usual. It will also create a link from this directory to the /raid on which the output will be stored. Thus the printout will be E001xyz/E001xyz.PRT.
To carry on past the first hour, 'run E001xyz' works as before. The functions 'qrsfnt E001xyz' and 'sswB E001xyz' provide timing info, and stop the model as before.
Note that error files from the compilation will be in the model source code directory modelE/model.
You can edit your version of the code to your heart's content without affecting anyone else. You must check that any code you change still compiles and that it gives the same results (unless you are specifically changing the functionality). Bytewise differences don't usually show up in the first hour, and so comparing the print out at that stage is very useful. We have placed some printouts and restart files in /u/cmrun/modelE for you to compare against. There are prt and rsf files for the first hour, and the first month, for both the fixed SST and Qflux model. To compare printout after the first hour use 'gdiff E001xyz/E001xyz.PRT /u/cmrun/prt_1jan1950_hr1'. If you are testing the Qflux model the file name is 'prtA_1jan1950_hr1'. After one month the file is 'prt_1feb1950' etc.
To check perfect bytewise compatibility (which you should generally strive for unless there is good reason not to), use the CMPE001 program. This is compiled using the 'gmake aux' command. Then type ../aux/CMPE001 E001xyz/fort.2 /u/cmrun/rsf_1jan1950_hr1 to check whether the first hour restart file is the same. If the output is just a list of arrays, then it is fine. If there are any differences, the program will print out the first ten array elements that differ. This should help you resolve the error that caused them.
Saving your changes
If you have made an alteration to the model that is correct and checked and documented (!), you can add it to the source code using 'cvs commit'. Before you run this command though, you must type 'cvs update' to ensure that your code is completely up to date. Once it is, you can then type 'cvs commit'. Depending on the whether you have set '$CVSEDITOR=emacs' this will bring up an emacs window, or vi, in which you must describe the changes you have made. Be as verbose as you like.
If there were any conflicts during the update process, you will need to resolve them before you can 'cvs commit' (possibly someone edited the same part of the code as you). These conflicts are marked in the file using '>>>>>>' and '<<<<<<' (see the CVS manual for details). Mostly these are trivial.
If your changes cause any bytewise changes to the prognostic variables, you will need to save the PRT and fort.x files in the central storage location, so that other people can keep checking against current output. Both model versions should be run.
I/O is now written at a higher level than before. This implies that the code in the main loop will not need to know the details of the arrays that need to be saved for each module. We have split the output file into records, and call a write_PHYSICS routine for each module. This too would hide the details from the main routine, and possibly would be simpler to implement. For example, writing out the restart file looks something like this:
C***** write out restart file
write(kunit) header information
The reading in follows the same pattern (calling the same routine with an iread arguement). The read/write statements are on seperate records since that is clearer. A small header label is also included to be helpful (a simple title with the version of the module and/or information about the length of the record).
The various functions (initialisation, driver routine, actual physics) for each physics routine are now mostly separated, so that it will be easy to changes the physics without having to change much else. And that if something else in the model changes, nothing should need to be changed in the physics (although changes in the initialisation and driver routines are possible). In order to this cleanly, and have the minimum number of dependencies between subroutines/files, we propose having three seperate files for each PHYSICS unit: Firstly, there will be the local PHYSICS module, that will only contain the local column physics, and which will not reference the main model variables at all. Secondly, there will be the PHYSICS interface, which will be a module that 'owns' the relevant model variables that are related to the local PHYSICS module. The I/O routines will also be included there. Thirdly, there will be the file that contains the driver and initiallisation routines that set up and call the local PHYSICS routines. Thus the local phyiscs should not be dependent on any other module, the interface file will only be dependent on the main model (but not the driver file, or (directly) on the local physics), while the driver file will be dependent on everything. This pattern has been adhered to a little fitfully. Sometimes two of the files are combined, however, the main PHYSICS common block should never be in the same file as the drivers (since that can give rise to circular dependencies).
Other modules that want to use any information from the PHYSICS module, will simply USE the interface module, and thus not be directly dependent either on the local physics or the drivers. The naming convention should reflect both variations in the local physics, and also in the main model, and so I suggest something along the lines of:
PHYS for the local physics PHYS_COM for the interface module PHYS_DRV for the drivers
The file that contains the PHYSICS code (ie. all the column physics, and all routines that are called from within those routines) will be written as a fortran90 module. This contains what would have been the local common blocks, plus all the internal routines. The format for this is
& , sh_wat=>shw
INTEGER, PARAMETER :: internal parameters
REAL*8 :: internal arrays
REAL*8, PUBLIC :: arrays that can be accessed externally
END SUBROUTINE ABC
END SUBROUTINE XYZ
END MODULE PHYSICS
The declarations at the beginning of the module take the required data from the main model module (equivalent to the main common block now), and do any dynamic mapping of names (if required). In the example the parameter shw in the main common block is mapped to sh_wat for use in these routines.
Within the module, no references to external model variables should be made. All the information should be passed in and out via the initialisation or driver routines.
The PHYSICS interface module contains the main model variables associated with the PHYSICS module, and will generally only contain the i/o routine. All variables that are used by another part of the code will be declared here, while only the variables required for a restart will need to be in the i/o routine.
USE MODEL_COM, only : IM,JM,LM,iwrite,iread
CHARACTER*20 :: MODULE_HEADER :: "PHYS_COM v1.4"
REAL*8, PUBLIC :: PHYS1(IM,JM,LM)
REAL*8, PUBLIC :: PHYS2(IM,JM,LM)
INTEGER iunit !@var iunit unit number of read/write
INTEGER irw !@var irw read or write flag
select case (irw)
write(iunit) MODULE_HEADER, PHYS1, PHYS2, ...
read(iunit) HEADER, PHYS1, PHYS2, ...
& print*, "Discrepency in module version",HEADER,MODULE_HEADER
Better checks and traps for unexpected conditions can of course be added.
if there are variables that need to be initialised from the main GCM (routine specific constants or operators, and/or initial values, including the contents of any IFIRST-type constructs) there should be a specific routine that does the initialisation. This will be included in a file with the driver routine (see below), and does not need to belong to a module (since variables local to here are not USE'd by any other part of the code).
call init_PHYSICS(io_option)which will be included in INPUT. The io_option for each PHYSICS module could choose between reading in from an initial condition file, or various other initialisation options.
The routine init_PHYSICS will use and write to the module that contains the internal parameters and arrays for the PHYSICS routine., and the intereface module. ie.
use PHYSICS, only : param, phys_init_ij
use PHYS_COM, only : PHYS1,PHYS2,.....
param = .......
select case (io_option)
PHYS1 = ....
PHYS2 = ....
This routine should supply the physics module with all the information it requires and write the results or apply the fluxes to the main model variables. The internal arrays should be set within the physics module, but the changes to the external (i.e. main model) variables are probably best expressed as fluxes/sources/sinks. With that, we can adjust the second order moments more appropriately and set diagnostics easily.
subroutine do_PHYSICS use MODEL_COM use PHYSICS, only : ABC, XYZ,.... use PHYS_COM set physics_var CALL ABC CALL XYZ set model_var RETURN
The physics variables can be passed either through the PHYSICS module (as above), or through an argument list if that is more appropriate.
All routines now use an IMPLICIT NONE statement. Thus every variable has to
- a) have a declared type (REAL, INTEGER etc),
- b) be declared as a SAVE (or not) (to avoid needing -static compilation)
- c) to be declared with an INTENT (IN, OUT, INOUT) for subroutine arguments in the list
- d) be initialised (if necessary)
- e) have some minimal documentation associated with it.
- b) be declared as a SAVE (or not) (to avoid needing -static compilation)
This may seem onerous, but it is going to help, and should be easy to maintain. For variables that are just dummy variables (loop indexes, sums, partial calculations), the declarations can be lumped together. (NB. For the time being, we are not going to insist on f90 KIND constructs for precision and length declarations, the old way is ok for now).
Thus the beginning of a routine will now look something like this:
SUBROUTINE XYZ(DT) !@sum XYZ calculates the xyz contribution to abc !@auth Joe Bloggs !@vers 2.3 (2.2 with new parameterization of zzz) !@calls DEF IMPLICIT NONE use PHYSICS !@param TFO local freezing point (deg C) REAL*8, PARAMETER :: TFO = -1.8 REAL*8, INTENT (IN) :: DT !@var DT time step (s) !@var TL local potential enthalpy array (J/kg air) REAL*8 TL(LM) :: TL = 0. !(initiallised to zero) REAL*8, SAVE :: BYZ(LM) !@var BYZ reciprocal of height (1/m)
The comments should include a '@' sign so that they can be found easily by a script, and be transformed automatically into a real document. I favour allowing this declarations to be put in anywhere and just saying that any line with '!@xxx ' is some kind of declaration. The other comment characters 'C','c' can also be used. Currently, we are using '@sum' (summary of routine function), '@auth' (author of routine), '@ver' (version number and info), '@param' (parameter definition), '@var' (variable definition), '@cont' (list of contents for modules), '@calls' (list of routines/functions used). Continuation lines are denoted with @+. Some others have also been defined.
- all DO loops should end with END DO lines (the old style is
obsolete, and may be removed in a new version of the fortran compiler)
- as few 'goto's as possible (use the new CYCLE or EXIT commands to
break out of do loops, or use DO WHILE and SELECT CASE constructs)
- All numbers that are either tuning parameters or arbitrary constants
or hard-coded array sizes or loop limits should be replaced with
variables or parameters.
- EQUIVALENCE statements have to go. Be explicit and use the
dynamic mapping of variable names available with USE.
- no ENTRY points (this functionality is replaced by the module, i.e
just set the internal common block variables in the top level of the
module, and they will be available for each internal subroutine)
- names of variables: ALL, HUGE, INDEX, INT, KIND, MASK, SCALE, SIZE,
SUM, TINY and TRIM are all fortran90 intrinsics, Hence they should
not be used as variable names.
- formatting/indenting. Try and indent DO loops and IF blocks by 2
spaces (this can be set as automatic in Emacs for instance. There
is no need to indent every DO loop in a multiple DO loop block.
Other fortran90 constructs and conventions will not be insisted upon, but using the array constructs in particular, produce much cleaner and compact code and is therefore highly encouraged.
These restrictions are implemented so that the GISS code will move to having a consistent look-and-feel that will make it easier to search through and understand what is being coded. This is a preliminary list which can (and will be) added to.
- The constants IM,JM,LM,NTM are restricted to the number of
latitudinal, longitudinal, vertical boxes and tracers
respectively. No other use is allowed.
- When iterating over the edges of boxes, the iteration should
always be DO I=1,IM-1 etc and not DO I=2,IM. This is so we have
a consistent pattern.
- No more LMM1 type variables (use LM-1 etc. wherever necessary)
- No explicit dependence on vertical/horizontal resolution (i.e. no
statements like ``if (L.lt.3) call fred''. It should be related to the
physical value of whatever is meant, ie. ``if (Z(L).lt.PBLZMAX)....''
- Do not refer directly to the indexes of the velocity (U,V
arrays) points in calculations on the A-grid. This should be
done using index arrays (see GEOM_B.f) that implicity contain the
information. Thus, the column physics will not depend on whether
the velocity points are on a B or C (or D or E...) grid.
- Do not calculate something that has already been calculated
eleswhere. USE the variable directly from the primary module
(i.e. pressure gradients should be USE'd from the dynamics,
rather than recalculated).
For the GCM code to be transformed to conform to these standards, some work is required both by the gcm development team and also by the original author of the physics module. The task can be broken down into more easily digestible chunks, as outlined below. You may find the following fortran 90 sites useful in getting to grips with the new style:
- i) COMMON blocks ==> Modules DONE
The PHYSICS MODULE is exactly made up of the original subroutines plus
anything that was in the local common blocks. The 'common'
declarations now go at the head of the module (as in the example
above), while the subroutines come below. Remember to add in the
MODULE wrappers (MODULE, CONTAINS, END MODULE etc.). Note that for a
routine within a module, the END statement must include the name of
the routine (i.e. SUBROUTINE ABC ..... END SUBROUTINE ABC). All
INCLUDES are replaced with USE statements. It is best to make the
attribute PRIVATE the default (as in the above example). Any variable
that you want to be available externally should be given the attribute
The MODULE is compiled just like any other *.f file. One thing to watch out for is references to variables without naming them (ie. looping across multiple common block arrays). In this case, the code should be rewritten to be explicit and the optimiser left to find the most efficient way of looping.
- ii) IMPLICIT REAL ==> IMPLICIT NONE DONE
If IMPLICIT NONE is used with no other changes, the compiler will
complain and output a list of the undeclared variables. Use this list
to declare all such variables according to what would have been the
implicit type. All DIMENSION statements should be replaced with a
specific type (REAL etc) - possibly with a DIMENSION attribute.
- iii) static ==> nostatic DONE
Every variable that is required to be saved between calls (ie. things
that are set in an IFIRST block), must be declared with a SAVE.
Variables that are declared in the upper part of the MODULE are
covered by the global SAVE at the start of the MODULE, and so this is
only required to be explicit for internal variables with each
subroutine. Jeff has already found most of these in his parallel
versions of the code. For each such variable, edit the declaration to
state something like REAL, SAVE :: xyz(LM).
- iv) adding standard documentation
For each variable that has now been specifically declared, some text
must accompany the declaration (as shown above). This will take a bit
of time (especially to get the units correct), and unfortunately needs
to be done by the primary author of the routine. Dummy variables, loop
indices and partial calculations do not need additional documentation
and can be lumped together, i.e. !@var i,j,l,k loop indices