Dr. Gavin A. Schmidt

The new GISS modelE


0) Access to the development code for modelE

The code is being developed using a version control system called CVS. This keeps track of changes, and ensures that everyone can keep up-to-date with innovations written by everybody else. This is a novel feature of working with this model and so needs to be explained a little.

Getting the code

On Ra, set the environmental variable CVSROOT to /u/cmrun/cvs/repository. Go to where you want the modelE directory to reside. Type 'cvs checkout modelE'. This will create your copies of the code for your personal use in the directory modelE. This includes model source code (in sub-directory model), some preliminary documentation (in doc), some auxilliary programs (in aux and exec), and a directory for your rundecks (decks).

There is a manual of sorts for CVS available, but that is probably more technical than is required.

Running the model

In order to make a rundeck that you can use, go into the modelE/decks directory and type 'gmake rundeck RUN=E001xyz' where 'xyz' are some unique initials or other identifying tag (so that your test runs to not interfere with anyone else's). This will take the standard rundeck (E001.R) and make you a functional copy. There is also a Qflux version of the new modelE, which is described in E001A.R. The only difference is that KOCEAN=1. (use 'gmake rundeck RUN=E001Axyz RUNSRC=E001A' to create a new Qflux rundeck). Stratospheric and coupled versions are available with RUNSRC=E001M23, and RUNSRC=E001B, respectively.

In the same directory, type 'gmake gcm RUN=E001xyz' to compile the code. The Makefile looks into the rundeck and derives the dependencies of each file on the others, so that if you edit a file, the gmake command will know that affected files should be recompiled also.

To run setup, type 'gmake setup RUN=E001xyz'. This will run the model for the first hour as usual. It will also create a link from this directory to the /raid on which the output will be stored. Thus the printout will be E001xyz/E001xyz.PRT.

To carry on past the first hour, 'run E001xyz' works as before. The functions 'qrsfnt E001xyz' and 'sswB E001xyz' provide timing info, and stop the model as before.

Note that error files from the compilation will be in the model source code directory modelE/model.

Making changes

You can edit your version of the code to your heart's content without affecting anyone else. You must check that any code you change still compiles and that it gives the same results (unless you are specifically changing the functionality). Bytewise differences don't usually show up in the first hour, and so comparing the print out at that stage is very useful. We have placed some printouts and restart files in /u/cmrun/modelE for you to compare against. There are prt and rsf files for the first hour, and the first month, for both the fixed SST and Qflux model. To compare printout after the first hour use 'gdiff E001xyz/E001xyz.PRT /u/cmrun/prt_1jan1950_hr1'. If you are testing the Qflux model the file name is 'prtA_1jan1950_hr1'. After one month the file is 'prt_1feb1950' etc.

To check perfect bytewise compatibility (which you should generally strive for unless there is good reason not to), use the CMPE001 program. This is compiled using the 'gmake aux' command. Then type ../aux/CMPE001 E001xyz/fort.2 /u/cmrun/rsf_1jan1950_hr1 to check whether the first hour restart file is the same. If the output is just a list of arrays, then it is fine. If there are any differences, the program will print out the first ten array elements that differ. This should help you resolve the error that caused them.

Saving your changes

If you have made an alteration to the model that is correct and checked and documented (!), you can add it to the source code using 'cvs commit'. Before you run this command though, you must type 'cvs update' to ensure that your code is completely up to date. Once it is, you can then type 'cvs commit'. Depending on the whether you have set '$CVSEDITOR=emacs' this will bring up an emacs window, or vi, in which you must describe the changes you have made. Be as verbose as you like.

If there were any conflicts during the update process, you will need to resolve them before you can 'cvs commit' (possibly someone edited the same part of the code as you). These conflicts are marked in the file using '>>>>>>' and '<<<<<<' (see the CVS manual for details). Mostly these are trivial.

If your changes cause any bytewise changes to the prognostic variables, you will need to save the PRT and fort.x files in the central storage location, so that other people can keep checking against current output. Both model versions should be run.

1) Controlling input/output for restarts and diagnostics

I/O is now written at a higher level than before. This implies that the code in the main loop will not need to know the details of the arrays that need to be saved for each module. We have split the output file into records, and call a write_PHYSICS routine for each module. This too would hide the details from the main routine, and possibly would be simpler to implement. For example, writing out the restart file looks something like this:

C***** write out restart file
       write(kunit) header information
       call io_RADIATION(kunit,iwrite) 
       call io_CLOUDS(kunit,iwrite) 
       call io_PBL(kunit,iwrite) 
       call io_OCEAN(kunit,iwrite) 
       call io_GROUND(kunit,iwrite) 
       call io_DIAGS(kunit,iwrite) 
...

The reading in follows the same pattern (calling the same routine with an iread arguement). The read/write statements are on seperate records since that is clearer. A small header label is also included to be helpful (a simple title with the version of the module and/or information about the length of the record).


2) Interfacing to a standard PHYSICS routine:

The various functions (initialisation, driver routine, actual physics) for each physics routine are now mostly separated, so that it will be easy to changes the physics without having to change much else. And that if something else in the model changes, nothing should need to be changed in the physics (although changes in the initialisation and driver routines are possible). In order to this cleanly, and have the minimum number of dependencies between subroutines/files, we propose having three seperate files for each PHYSICS unit: Firstly, there will be the local PHYSICS module, that will only contain the local column physics, and which will not reference the main model variables at all. Secondly, there will be the PHYSICS interface, which will be a module that 'owns' the relevant model variables that are related to the local PHYSICS module. The I/O routines will also be included there. Thirdly, there will be the file that contains the driver and initiallisation routines that set up and call the local PHYSICS routines. Thus the local phyiscs should not be dependent on any other module, the interface file will only be dependent on the main model (but not the driver file, or (directly) on the local physics), while the driver file will be dependent on everything. This pattern has been adhered to a little fitfully. Sometimes two of the files are combined, however, the main PHYSICS common block should never be in the same file as the drivers (since that can give rise to circular dependencies).

Other modules that want to use any information from the PHYSICS module, will simply USE the interface module, and thus not be directly dependent either on the local physics or the drivers. The naming convention should reflect both variations in the local physics, and also in the main model, and so I suggest something along the lines of:

 PHYS  for the local physics
 PHYS_COM  for the interface module
 PHYS_DRV  for the drivers 

The PHYSICS module:

The file that contains the PHYSICS code (ie. all the column physics, and all routines that are called from within those routines) will be written as a fortran90 module. This contains what would have been the local common blocks, plus all the internal routines. The format for this is

      MODULE PHYSICS
      IMPLICIT NONE
      PRIVATE
      SAVE
      use parameters1 
     &  , sh_wat=>shw

      INTEGER, PARAMETER :: internal parameters
      REAL*8 :: internal arrays

      REAL*8, PUBLIC :: arrays that can be accessed externally
....

      CONTAINS 
	
      SUBROUTINE ABC
....
      END SUBROUTINE ABC
      SUBROUTINE XYZ
....    
      END SUBROUTINE XYZ

      END MODULE PHYSICS

The declarations at the beginning of the module take the required data from the main model module (equivalent to the main common block now), and do any dynamic mapping of names (if required). In the example the parameter shw in the main common block is mapped to sh_wat for use in these routines.

Within the module, no references to external model variables should be made. All the information should be passed in and out via the initialisation or driver routines.

The PHYSICS interface module

The PHYSICS interface module contains the main model variables associated with the PHYSICS module, and will generally only contain the i/o routine. All variables that are used by another part of the code will be declared here, while only the variables required for a restart will need to be in the i/o routine.

      MODULE PHYS_COM
      USE MODEL_COM, only : IM,JM,LM,iwrite,iread
      IMPLICIT NONE
      SAVE

      CHARACTER*20 :: MODULE_HEADER :: "PHYS_COM v1.4"
      REAL*8, PUBLIC :: PHYS1(IM,JM,LM)
      REAL*8, PUBLIC :: PHYS2(IM,JM,LM)
.....
      CONTAINS

      SUBROUTINE io_PHYS(iunit,irw)

      INTEGER iunit  !@var iunit unit number of read/write
      INTEGER irw    !@var irw   read or write flag
      CHARACTER*20 HEADER

      select case (irw)        
      case (iwrite)
         write(iunit) MODULE_HEADER, PHYS1, PHYS2, ...
      case (iread)
         read(iunit) HEADER, PHYS1, PHYS2, ...
         if (HEADER.ne.MODULE_HEADER) 
     &      print*, "Discrepency in module version",HEADER,MODULE_HEADER
      end select 

      RETURN

Better checks and traps for unexpected conditions can of course be added.

Initialisation

if there are variables that need to be initialised from the main GCM (routine specific constants or operators, and/or initial values, including the contents of any IFIRST-type constructs) there should be a specific routine that does the initialisation. This will be included in a file with the driver routine (see below), and does not need to belong to a module (since variables local to here are not USE'd by any other part of the code).

      call init_PHYSICS(io_option)
which will be included in INPUT. The io_option for each PHYSICS module could choose between reading in from an initial condition file, or various other initialisation options.

The routine init_PHYSICS will use and write to the module that contains the internal parameters and arrays for the PHYSICS routine., and the intereface module. ie.

      subroutine init_PHYSICS(io_option)
      use MODEL_COM
      use PHYSICS, only : param, phys_init_ij
      use PHYS_COM, only :  PHYS1,PHYS2,.....

      param = .......

      select case (io_option)
      case (1)
	PHYS1 = ....
	PHYS2 = ....
      case (2)
        DO J=1,JM
          DO I=1,IM
 	     call phys_init_ij(PHYS1(I,J),PHYS2(I,J)...)
          END DO
        END DO
....
      end select

      RETURN

Driver routines

This routine should supply the physics module with all the information it requires and write the results or apply the fluxes to the main model variables. The internal arrays should be set within the physics module, but the changes to the external (i.e. main model) variables are probably best expressed as fluxes/sources/sinks. With that, we can adjust the second order moments more appropriately and set diagnostics easily.


      subroutine do_PHYSICS
      use MODEL_COM
      use PHYSICS, only : ABC, XYZ,....
      use PHYS_COM

      set physics_var

      CALL ABC
      CALL XYZ

      set model_var

      RETURN

The physics variables can be passed either through the PHYSICS module (as above), or through an argument list if that is more appropriate.

Standards for declarations and documentation

All routines now use an IMPLICIT NONE statement. Thus every variable has to

    a) have a declared type (REAL, INTEGER etc),
    b) be declared as a SAVE (or not) (to avoid needing -static compilation)
    c) to be declared with an INTENT (IN, OUT, INOUT) for subroutine arguments in the list
    d) be initialised (if necessary)
    e) have some minimal documentation associated with it.

This may seem onerous, but it is going to help, and should be easy to maintain. For variables that are just dummy variables (loop indexes, sums, partial calculations), the declarations can be lumped together. (NB. For the time being, we are not going to insist on f90 KIND constructs for precision and length declarations, the old way is ok for now).

Thus the beginning of a routine will now look something like this:


      SUBROUTINE XYZ(DT)
!@sum   XYZ calculates the xyz contribution to abc
!@auth  Joe Bloggs
!@vers  2.3 (2.2 with new parameterization of zzz)
!@calls DEF
      IMPLICIT NONE
      use PHYSICS

!@param TFO local freezing point (deg C)
      REAL*8, PARAMETER :: TFO = -1.8 

      REAL*8, INTENT (IN) :: DT  !@var   DT time step (s) 

!@var   TL local potential enthalpy array (J/kg air)
      REAL*8 TL(LM) :: TL = 0. !(initiallised to zero)

      REAL*8, SAVE :: BYZ(LM)    !@var   BYZ reciprocal of height (1/m)

The comments should include a '@' sign so that they can be found easily by a script, and be transformed automatically into a real document. I favour allowing this declarations to be put in anywhere and just saying that any line with '!@xxx ' is some kind of declaration. The other comment characters 'C','c' can also be used. Currently, we are using '@sum' (summary of routine function), '@auth' (author of routine), '@ver' (version number and info), '@param' (parameter definition), '@var' (variable definition), '@cont' (list of contents for modules), '@calls' (list of routines/functions used). Continuation lines are denoted with @+. Some others have also been defined.


Standards for coding within routines:

These code changes are not as fundamental (ie. the model will still run) but are being changed piecemeal as the code is upgraded. Many of these things can be changed quickly, but for those that can't, please bear these standards in mind as you upgrade code.

General good habits

  • all DO loops should end with END DO lines (the old style is obsolete, and may be removed in a new version of the fortran compiler)

  • as few 'goto's as possible (use the new CYCLE or EXIT commands to break out of do loops, or use DO WHILE and SELECT CASE constructs)

  • All numbers that are either tuning parameters or arbitrary constants or hard-coded array sizes or loop limits should be replaced with variables or parameters.

  • EQUIVALENCE statements have to go. Be explicit and use the dynamic mapping of variable names available with USE.

  • no ENTRY points (this functionality is replaced by the module, i.e just set the internal common block variables in the top level of the module, and they will be available for each internal subroutine)

  • names of variables: ALL, HUGE, INDEX, INT, KIND, MASK, SCALE, SIZE, SUM, TINY and TRIM are all fortran90 intrinsics, Hence they should not be used as variable names.

  • formatting/indenting. Try and indent DO loops and IF blocks by 2 spaces (this can be set as automatic in Emacs for instance. There is no need to indent every DO loop in a multiple DO loop block.

Other fortran90 constructs and conventions will not be insisted upon, but using the array constructs in particular, produce much cleaner and compact code and is therefore highly encouraged.

GISS Specific coding restrictions

These restrictions are implemented so that the GISS code will move to having a consistent look-and-feel that will make it easier to search through and understand what is being coded. This is a preliminary list which can (and will be) added to.

  • The constants IM,JM,LM,NTM are restricted to the number of latitudinal, longitudinal, vertical boxes and tracers respectively. No other use is allowed.

  • When iterating over the edges of boxes, the iteration should always be DO I=1,IM-1 etc and not DO I=2,IM. This is so we have a consistent pattern.

  • No more LMM1 type variables (use LM-1 etc. wherever necessary)

  • No explicit dependence on vertical/horizontal resolution (i.e. no statements like ``if (L.lt.3) call fred''. It should be related to the physical value of whatever is meant, ie. ``if (Z(L).lt.PBLZMAX)....''

  • Do not refer directly to the indexes of the velocity (U,V arrays) points in calculations on the A-grid. This should be done using index arrays (see GEOM_B.f) that implicity contain the information. Thus, the column physics will not depend on whether the velocity points are on a B or C (or D or E...) grid.

  • Do not calculate something that has already been calculated eleswhere. USE the variable directly from the primary module (i.e. pressure gradients should be USE'd from the dynamics, rather than recalculated).


Transition to the new standards

For the GCM code to be transformed to conform to these standards, some work is required both by the gcm development team and also by the original author of the physics module. The task can be broken down into more easily digestible chunks, as outlined below. You may find the following fortran 90 sites useful in getting to grips with the new style:

An introduction to Fortran 90 and a student course for conversion from F77 to F90.

    i) COMMON blocks ==> Modules DONE
    The PHYSICS MODULE is exactly made up of the original subroutines plus anything that was in the local common blocks. The 'common' declarations now go at the head of the module (as in the example above), while the subroutines come below. Remember to add in the MODULE wrappers (MODULE, CONTAINS, END MODULE etc.). Note that for a routine within a module, the END statement must include the name of the routine (i.e. SUBROUTINE ABC ..... END SUBROUTINE ABC). All INCLUDES are replaced with USE statements. It is best to make the attribute PRIVATE the default (as in the above example). Any variable that you want to be available externally should be given the attribute PUBLIC.

    The MODULE is compiled just like any other *.f file. One thing to watch out for is references to variables without naming them (ie. looping across multiple common block arrays). In this case, the code should be rewritten to be explicit and the optimiser left to find the most efficient way of looping.

    ii) IMPLICIT REAL ==> IMPLICIT NONE DONE
    If IMPLICIT NONE is used with no other changes, the compiler will complain and output a list of the undeclared variables. Use this list to declare all such variables according to what would have been the implicit type. All DIMENSION statements should be replaced with a specific type (REAL etc) - possibly with a DIMENSION attribute.

    iii) static ==> nostatic DONE
    Every variable that is required to be saved between calls (ie. things that are set in an IFIRST block), must be declared with a SAVE. Variables that are declared in the upper part of the MODULE are covered by the global SAVE at the start of the MODULE, and so this is only required to be explicit for internal variables with each subroutine. Jeff has already found most of these in his parallel versions of the code. For each such variable, edit the declaration to state something like REAL, SAVE :: xyz(LM).

    iv) adding standard documentation
    For each variable that has now been specifically declared, some text must accompany the declaration (as shown above). This will take a bit of time (especially to get the units correct), and unfortunately needs to be done by the primary author of the routine. Dummy variables, loop indices and partial calculations do not need additional documentation and can be lumped together, i.e. !@var i,j,l,k loop indices