Here is a selection of comments from various people that you might like to browse. There is some duplication, but there are also some very good ideas.
For reference, here are the links to the NCAR CCM4 standards for coding and netCDF output. There may be some ideas there that we could use.
Finally, a quote from Duane:
Just because it's called code, it doesn't mean it has to be cryptic
gavin@giss.nasa.gov
Preliminary wishlist:
=====================

DOCUMENTATION:
- definitions of every variable,
- a description of the purpose of each subroutine,
- a step map through the GCM's normal code operation (e.g. through a one-month simulation) and, finally,
- a manual describing how to actually run the model. (Mark)

DIAGNOSTICS:
- a routine should exist to transfer diagnostics in the TITLE,DATA format to netCDF for easier transfer of data and more efficient use of standard graphics packages. Possibly this should be an option in the diagnostic programs. (Duane)
- a systematic way should be devised to output regional data at relatively high frequency for use in initialising regional models. (Len)
- we need an easy way to change the model to produce daily or 6-hourly diagnostics. (Ron/Reha)
- the diagnostic output (accumulation or restart files) should be readable in a bootstrap way, i.e. all the information necessary to read it should be contained in a standard header. (Gavin)
- it should be easy to alter which variables are saved, at what frequency they are collected, and over what region they are accumulated. For users (not developers) there should be a simple way to select various options for a specified set of standard diagnostics. (Mark)
- the standard diagnostic accumulation files (.acc files) need to include header information at the beginning of the file AND labels within the file, to make these files much easier to work with when extracting and averaging subsets of information. (Mark)
- it would be useful to alter the standard diagnostic ".PRT" files so that different viewers can be used to look at the output, and to make them less cumbersome for higher-resolution models. (Mark)
- include surface pressure as an IJ diagnostic. (Jean)

MODEL COMPILATION AND SETUP:
- possibly we should abandon the priority system for linking subroutines; it can lead to errors if non-standard compilation is used. (Shan)
- month/day/year input should be used instead of Tau for model start and finish times. (Drew)
- an automatic makefile should be created during the setup process to prevent linking of out-of-date object files. (Gavin)
- use version control software so that all changes to the code are formally tracked and documented at the time they are made. (Mark/Igor/Duane)
- create a GISS-wide database that collects all pertinent info about simulations (with a short description). (Mark)
- the GISS model setup depends on a local environment that does not translate easily to other sites. Either this should be made easier, or it should be slimmed down and replaced with standard UNIX tools. (Mark)

MODEL ROUTINES:
- calculations for fluxes etc. should be done only once, in a place where they can be easily adjusted or replaced. (Shan)
- the main model should contain gravity wave drag + subsequent parameterizations. (David)
- all physical constants should be in a parameter common block. (Gavin/Reto)
- Fortran 90 modules could be used instead of includes (better control of exactly which variables are wanted etc.). (Max)
- all physical quantities should have meaningful names; the blocks O/G/GH/BL DATA etc. should only be used for I/O and inter-routine standardization. (Jean/Gavin)
- all hangovers from Fortran 66 should be eliminated (e.g. JMM1=JM-1, etc.). (Jean)
- atmospheric mass, P**kapa and P*T**kapa should be in common blocks and kept up to date. (Jean)
- the standard model should have all the parallelization optimisations as standard features. (Jean)
- WORK common blocks should not be used to pass variables. (Jean)
- all common blocks should be named. (Max)
- PTOP, PLE, SIGE should have more correct names (any other misnamed variables?). (Jean)
- INPUT should be written to allow more flexible starting options. (Jean)
- need a JMONTH variable. (Jean)
- naming conventions should be defined and adhered to!
- roughness length should be read in only once, in input. (Gavin)
- pbl and cloud arrays should be part of the model common block. (Gavin)
- SURFCE, PRECIP, GROUND should be split up and used to call separate routines for each surface type. (Gavin)
- lakes (open water + ice on land) should be completely divorced from open-ocean water/ice and kept as part of the atmospheric model. (Gavin)
- drastically reduce the number of GO TO statements in the model. (Mark)
- try to rewrite the radiation code to avoid using ENTRY points. (Mark)

EXTRA stuff added as an afterthought:
- fixed arrays should not need to be recalculated all the time. (Max)
- IMPLICIT NONE should be standard.
- diagnostic arrays should not be referenced by number but by an integer variable that explains what they are (e.g. IDSLP for the SLP diagnostic).
- equivalences should only be over a whole common block or not at all!
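A minimal Fortran 90 sketch of the IDSLP suggestion, assuming a module-based layout: each diagnostic slot gets a named integer PARAMETER, so code reads call accumulate_ij(IDSLP, slp) rather than indexing the accumulation array by a bare number. Only IDSLP comes from the list above; the module name, the other index names, the toy grid size and accumulate_ij are hypothetical.

```fortran
! Sketch: refer to diagnostic slots by named indices instead of numbers.
! IDSLP is from the wishlist; all other names and sizes are hypothetical.
module diag_indices
  implicit none
  private
  public :: IM, JM, IDSLP, IDPS, IDTSURF, NIJDIAG, aij, accumulate_ij

  integer, parameter :: IM = 4, JM = 3       ! toy grid size (assumption)
  integer, parameter :: IDSLP   = 1          ! sea level pressure
  integer, parameter :: IDPS    = 2          ! surface pressure
  integer, parameter :: IDTSURF = 3          ! surface air temperature
  integer, parameter :: NIJDIAG = 3          ! number of IJ diagnostics

  real :: aij(IM, JM, NIJDIAG) = 0.0         ! IJ accumulation array

contains

  ! Add one field to the accumulation slot selected by idiag.
  subroutine accumulate_ij(idiag, field)
    integer, intent(in) :: idiag
    real, intent(in)    :: field(IM, JM)
    aij(:, :, idiag) = aij(:, :, idiag) + field
  end subroutine accumulate_ij

end module diag_indices
```

Adding a diagnostic then means adding one named index and increasing NIJDIAG; code that uses the existing names does not change.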
Hi Gavin,
I do believe that the GCM would benefit from the inclusion of
gravity wave drag parameterizations, as we have in the stratospheric
model. In addition, the version of the stratospheric model that Jeff is
running includes the new high cloud parameterization based on the
presence/absence of (parameterized) gravity waves passing through.
Even if one doesn't want to use the gravity wave drag per se, using
this parameterization requires that the gravity wave
source/propagation be included. Of course, neither of these
parameterizations is necessary; they just improve the model's
temperature structure tremendously.
David
///////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////
Gavin,
Since you asked: I sort of gave up trying to simulate the impact of
orbital forcing on the Asian Monsoon system because the persistence
of a Tibetan snowfield in the control run of Model 2' messed up the
sensitivity of the model. There was a response, but it was shifted to
the east and complicated any data-model comparison. I am not sure if
this problem still exists in the model, but if it does, identifying
and then correcting the underlying processes would be of
benefit to me.
thanks -- Robin
Robert S. Webb
NOAA/OAR/CDC
325 Broadway
Boulder, CO USA 80303
Office ph. (303) 497 6967
Fax. (303) 497 7013
e-mail: rwebb@cdc.noaa.gov
Gavin-
In response to Sabrina's note:
I have been working with Matthew Fulakeza and Pat Lonergan to make
regional model simulations that can use GISS GCM results as lateral
boundary conditions. This technique "downscales" GISS GCM results to
50 km horizontal resolution over selected domains. The potential
contribution is significant for many research programs at GISS.
Matthew and Patrick have developed procedures to glean the needed
data from GCM simulations on an ad hoc basis. I suggest that more
permanent code be written which, when activated, would automatically
save the GCM data for selected target areas and at the high temporal
frequency required by the regional model (perhaps six times daily).
If this option is selected when making a GCM run, the resulting data
would be immediately available for a regional model run. Patrick and
Matthew can contribute to this effort should there be a consensus to
implement it.
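One way the automatic regional save could look, sketched in Fortran 90 under stated assumptions: the target area is given as grid-index bounds, and the save interval is expressed as a number of saves per model day (six in Len's example), which must divide the number of time steps per day evenly. The type region_t and the routines is_save_step and extract_region are hypothetical; writing the extracted fields in the regional model's input format is not shown.

```fortran
! Sketch: every few steps, cut a target sub-domain out of a global field
! so it can be saved as regional-model boundary data. Names hypothetical.
module regional_save
  implicit none
  private
  public :: region_t, is_save_step, extract_region

  ! Target area as grid-index bounds (i = longitude, j = latitude).
  type :: region_t
    integer :: i1, i2, j1, j2
  end type region_t

contains

  ! True on steps that fall on the save interval. Assumes that
  ! nsteps_per_day is an exact multiple of nsaves_per_day.
  logical function is_save_step(istep, nsteps_per_day, nsaves_per_day)
    integer, intent(in) :: istep, nsteps_per_day, nsaves_per_day
    is_save_step = mod(istep, nsteps_per_day / nsaves_per_day) == 0
  end function is_save_step

  ! Return the part of a global field that lies inside the target area.
  function extract_region(field, r) result(sub)
    real, intent(in)           :: field(:, :)
    type(region_t), intent(in) :: r
    real, allocatable          :: sub(:, :)
    sub = field(r%i1:r%i2, r%j1:r%j2)   ! allocated on assignment
  end function extract_region

end module regional_save
```

With an assumed 48 time steps per day and six saves per day, is_save_step is true every 8 steps.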
Len Druyan
Rm 516
ext: 5564
Gavin,
I hope that the 'frozen' model does not mean we have to freeze in
errors. The last time I talked to Reto he was not using the
latest CB265.S.
I found my old wish list. Here are some things on it.
note: When I say the 'C' array, I am talking about the combined real,
integer, and character components.
*) The names of source modules (.S files) should only reflect resolution
dependence if they are in fact resolution dependent. For example,
DB112M9.S has nothing in it that depends on 9 layers, so the '9' should
not be in the name. Likewise, the 'M' probably does not belong either (?).
*) Get rid of LMM1, JMM1, etc. in COMMON. Remove from code wherever
possible. If usage is impossible to eliminate, put in PARAMETER.
*) Expand the 'C-array'. This will make obsolete dozens of current
programs. Think about a way to maintain backward compatibility.
*) Most constants in the C-array should really be put into PARAMETER
statements. Actually, this may eliminate the need to expand C.
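A minimal sketch of the PARAMETER suggestion, assuming the constants move into a Fortran 90 module: derived values such as JMM1 are computed once at compile time, and the compiler rejects any statement that tries to overwrite them. The module name and the values shown (a 72 x 46 x 9 grid, GRAV, RGAS, KAPA, CP) are illustrative assumptions, not necessarily the model's current values.

```fortran
! Sketch: resolution and physical constants as named PARAMETERs instead
! of values stored in the 'C' array. All values here are illustrative.
module model_constants
  implicit none
  ! Resolution (illustrative choice)
  integer, parameter :: IM = 72, JM = 46, LM = 9
  ! If JMM1 really cannot be eliminated, derive it at compile time
  integer, parameter :: JMM1 = JM - 1
  ! Physical constants (SI units)
  real, parameter :: GRAV = 9.81         ! gravity (m/s^2)
  real, parameter :: RGAS = 287.0        ! gas constant for dry air (J/kg/K)
  real, parameter :: KAPA = 0.286        ! R/cp for dry air
  real, parameter :: CP   = RGAS / KAPA  ! specific heat at const. pressure (J/kg/K)
end module model_constants
```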
*) We need an integer in C that tells us what month we are in. This is
currently computed in DAILY but is not available to other routines.
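A sketch of the month computation, assuming a 365-day calendar without leap years and that the day of the year is available to any routine that needs it. The names month_of_day and JDENDOFM are hypothetical.

```fortran
! Sketch: compute the current month from the day of the year so that
! routines other than DAILY can use it. Assumes a 365-day calendar.
module calendar
  implicit none
  private
  public :: month_of_day
  ! Last day of each month as a day of the year (365-day calendar)
  integer, parameter :: JDENDOFM(0:12) = &
       [0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365]
contains
  ! Return the month (1-12) containing day-of-year jday (1-365).
  integer function month_of_day(jday)
    integer, intent(in) :: jday
    integer :: m
    do m = 1, 12
      if (jday <= JDENDOFM(m)) exit
    end do
    month_of_day = m
  end function month_of_day
end module calendar
```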
*) ISTART. Perhaps it could be an array or mask, so that different
starting conditions could be combined. This may require two different
variables, one for restarting (istart=10,11,12 (and 4?)) and one for
initial conditions. Currently, INPUT is a confusing tangle of this and
that, depending on such and such, with repetitious code and confusing
GOTOs. I'm thinking more in terms of a 'Chinese menu'.
*) Rename PLE as PLB, so it is clear that it refers to the bottom edge.
*) Rename SIGE as SIGB for the same reason.
*) PTOP used to refer to the actual top of the model, but no longer. It
would be useful to print the actual top somewhere, maybe in input,
with 3 significant digits. This may even be saved in C so that off-line
programs don't have to calculate it.
*) Arrays like GDATA, GHDATA, BLDATA, FDATA, ODATA should be
equivalenced to arrays with more meaningful names. In fact, they should
appear in the code only for I/O.
*) PBLPAR, PBLOUT commons should probably be put into an INCLUDE, since
they appear in more than 2 routines.
*) The names of COMMONs should be systematized and regulated. 'WORK'
means 'WORK'. This could result in some wasted memory space, but not
necessarily. Currently, people sometimes avoid putting arrays in a
'work' common because they are afraid to clobber something, so they make
up another common or just use DIMENSION.
*) Add a 3-d air mass array. It seems silly in the parts of the
atmosphere where there is constant pressure, but it would greatly
simplify the code. Also, code that is optimized for parallel processing
would be the same as code that is optimized for non-parallel processing.
*) P**kapa and P*T**kapa are frequently recalculated. Maybe they should
be in common.
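A sketch of keeping P**kapa in one place, assuming a module array that is refreshed once per time step after the pressures change; other routines then read pk instead of recomputing the power. The module name, update_pk and the value KAPA = 0.286 are illustrative assumptions. P*T**kapa could be cached the same way.

```fortran
! Sketch: compute P**KAPA once per time step into a module array and let
! the physics read it. Names and the KAPA value are illustrative.
module pk_cache
  implicit none
  private
  public :: KAPA, pk, update_pk
  real, parameter :: KAPA = 0.286
  real, allocatable :: pk(:, :, :)    ! P**KAPA at layer midpoints
contains
  ! Recompute pk from the current layer pressures. Call once after the
  ! pressure field changes (e.g. after the dynamics step).
  subroutine update_pk(p)
    real, intent(in) :: p(:, :, :)    ! layer pressures (mb)
    if (.not. allocated(pk)) allocate(pk(size(p, 1), size(p, 2), size(p, 3)))
    pk = p ** KAPA
  end subroutine update_pk
end module pk_cache
```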
*) If we continue to group several related routines into modules, I
suggest that we separate the dynamics routines from MAIN and INPUT. Do
ORBIT and DAILY stay with MAIN??. AVRX should be grouped with DYNAM,
PGF, AFLUX, etc. Since GWDRAG and DEFORM are for stratosphere model
only, it's possible they should be on their own...on the other hand,
maybe not.
*) No unnamed files, except, perhaps, fort.99. Or maybe a rule that
says no unnamed files in routines other than MAIN and INPUT.
*) Radiation should be initialized in INPUT.
*) Currently some versions of the model have 3 different namelists,
called from 3 different routines. I'm not sure this is a good idea.
Maybe all this should be done in INPUT, in which case separate namelists
would not be necessary. However, this would mean the user would need to
cook up a way to get the namelist info to the routine that needs it.
*) I have a note that 'Shapiro filter should have arg. list'. I don't
remember my train of thought on this.
*) we sometimes need the surface pressure or air mass in
the layers for post processing. The best we can do now is use the apj
array, which is zonal. It may be a good idea to save on the acc file an
IJ array of mean surface pressure as well.
*) Apropos the remodelling project, if we are to produce
documentation we should all get on the same page. For
example, Gary's recent 'MODELDATA.TXT' is fine and
good for Gary, but it does not speak for everyone,
and it does not even acknowledge the information a
file needs in order to produce a plot from data on
an irregular grid. If standards are to be
adopted, they should not be unilateral.
I personally dislike -999999. for missing data, and
prefer something like -1.e20. In any case, the exact
number for missing data should also depend on the
data itself. You don't want a number that can be
confused with the data (like zero, which many
nincompoops use).
The existence of Reto's wonderful 'fcop' program, unix
commands like 'cat', and the many other file query
and manipulation tools we've developed over the years
make strict rules like 'Model output should be
organized with separate datafiles for each climate
variable' totally unnecessary. They are reminiscent
of the days of MVS--the dark ages before unix.
I don't know if you were planning on getting into
these kinds of issues....
*) Processing of model output: the techniques I use in the 'pd'
procedure to obtain 'plot-ready' output that is fully scaled
can be extended and even put 'on line', so that the model
writes the files at the end of each month. Personally, I think
this is very wasteful, since usually people are interested in
annual and seasonal means, which are best obtained by averaging
the acc files. It would be a simple matter to 'turn on' this
option with a namelist array. I could work on it. The same
could be done with netcdf, but that would be even more wasteful.
Actually, one could replace the acc files completely with
scaled output, such as pd does. The hang-up is the non-linearity
of many of the quantities that are printed. Do you average the
individual terms over many years, or the monthly printout?
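A small worked example of the non-linearity, assuming a printed quantity that is the ratio of two accumulated terms (a hypothetical case). Averaging the monthly printout gives the mean of the ratios; averaging the accumulated terms first gives the ratio of the means, and the two generally differ.

```fortran
! Sketch: a printed quantity num/den averaged over months two ways.
! The ratio form of the printed quantity is a hypothetical example.
module averaging_demo
  implicit none
  private
  public :: mean_of_ratios, ratio_of_means
contains
  ! Average the monthly printout: mean over months of num(m)/den(m).
  real function mean_of_ratios(num, den)
    real, intent(in) :: num(:), den(:)
    mean_of_ratios = sum(num / den) / size(num)
  end function mean_of_ratios

  ! Average the accumulated terms first, then form the ratio.
  real function ratio_of_means(num, den)
    real, intent(in) :: num(:), den(:)
    ratio_of_means = sum(num) / sum(den)
  end function ratio_of_means
end module averaging_demo
```

For two months with num = [1.0, 4.0] and den = [1.0, 2.0], mean_of_ratios gives 1.5 while ratio_of_means gives 5/3 (about 1.67). Averaging the acc files and then printing yields the second form, which is why the individual terms have to be kept if that is the answer wanted.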
Gavin Schmidt wrote:
>
> I agree. I think we should probably retain the acc files in something
> similar to today's format and concentrate on giving the post-processing
> programs as much flexibility as possible (including netcdf output/seasonal
> annual means/ regional etc).
>
> Since so many people have brought up issues with the diagnostics, we should
> think about this very carefully before we make the changes. It is unlikely
> that everyone will be satisfied, but we can but try!
>
> Thanks
>
> Gavin
The beauty of the 'pd' process is that it is so flexible. Usually,
all you need is to know the diagnostics module that was used. But
certainly, making a new pd is confusing for the uninitiated, and we
could work on that. netcdf could be added to pd as an option.
-Jean
Gavin,
Thank you for making such an attempt!
I would like to emphasize modularity and a user manual. I know it would
not be practical to expect a highly complicated atmospheric model to be
"point and click", but there is definitely room for improvement there.
NCAR CCM3 has done a lot over the years toward making it a community
model, so that a person with basic Fortran knowledge can download the CCM3
code from the website and run it, guided by the manual. This may or
may not be our goal, given the shortage of our personnel, but how they
handled this problem can be a guideline for us. For example, in CCM3,
the latent heat calculation is in a subroutine where the user can easily
replace the formula with one of their choosing. The same applies to the
convection or radiation scheme.
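A sketch of how a CCM3-style replaceable formula could be done in the GISS code, assuming Fortran 2003 procedure pointers: the latent heat function is chosen once at start-up with select_lhe and then called through the pointer lhe. The module and routine names are hypothetical, and the linear formula L(T) = 2.501e6 - 2370 (T - 273.15) J/kg is a commonly used approximation included only for illustration.

```fortran
! Sketch: a user-selectable latent heat formula via a procedure pointer.
! Names are hypothetical; the formulas are illustrative approximations.
module latent_heat
  implicit none
  private
  public :: lhe_func, lhe_constant, lhe_linear, lhe, select_lhe

  abstract interface
    real function lhe_func(t)
      real, intent(in) :: t            ! temperature (K)
    end function lhe_func
  end interface

  ! Must be set by select_lhe during initialization before use.
  procedure(lhe_func), pointer :: lhe => null()

contains

  real function lhe_constant(t)
    real, intent(in) :: t
    lhe_constant = 2.5e6               ! J/kg, independent of t
  end function lhe_constant

  real function lhe_linear(t)
    real, intent(in) :: t
    lhe_linear = 2.501e6 - 2370.0 * (t - 273.15)   ! J/kg
  end function lhe_linear

  ! Choose the formula once, e.g. from a run-time option.
  subroutine select_lhe(name)
    character(len=*), intent(in) :: name
    select case (name)
    case ('linear')
      lhe => lhe_linear
    case default
      lhe => lhe_constant
    end select
  end subroutine select_lhe

end module latent_heat
```

A user who wants a different formula adds one function with the lhe_func interface and one case in select_lhe; the physics code that calls lhe is untouched.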
A few minor suggestions:
(1) some commonly used fields are not in arrays; for example, wind stresses
are expressed as "RTAUUS" and "RTAUVS".
(2) after being updated, some subroutines occur twice, in both the new and
old versions, and when I compile differently (alphabetically), I may
link to the old version by mistake. For example, setsur appears in both
R99G and R99E, where R99G has the latest version. So the order of
linking all subroutines is crucial, as the program picks up the first
one and ignores the rest with the same name. I have been a victim of this
several times, and consider it a potential hazard. I suggest
eliminating all unused routines from the program.
That is all I can think of for now. Let me know if I can help.
Shan
--
----------------------------------------------------------------------------
Shan Sun, Ph.D
NASA/Goddard Institute for Space Studies tel: 212-678-6031
2880 Broadway fax: 212-678-5622
New York, NY 10025 email: ssun@giss.nasa.gov
----------------------------------------------------------------------------
Sorry to be slow on this. Thanks for putting this stuff together.
Documentation and standard means of working with the model:
It seems to me that the most important forms of documentation
required are also the most basic. These include:
- definitions of every variable,
- a description of the purpose of each subroutine
- a step map through the GCM's normal code operation (e.g. through a
one month simulation) and, finally,
- a manual describing how to actually run the model.
Some of the above already exist, but the items need to be augmented
and brought together in a consistent (and concise) format.
The process becomes more involved if the purpose of this exercise
includes making it easier for individuals outside of GISS to operate
and alter the model (as opposed to a cleaning exercise designed to
optimize model code for use by experienced GISS GCM programmers).
Many operations involving the GCM are invoked by running scripts that
are specific to the systems at GISS. The scripts that deal with
rundecks, update decks, etc. do not translate well outside of the
GISS local environment. Some of these long-time GISS specific
techniques date back to the days of the mainframe and could be
replaced with more conventional methods of working with
multi-routine software that is developed by many people. Simple
changes would involve the use of make files for linking and compiling
the code, while more detailed adjustments would involve employing
version control software to track the many changes that are made to
the GCM on a regular basis.
Diagnostics:
It would be very useful to alter the way diagnostics are saved. Three
changes are needed most:
1) the ability to alter which variables are saved, at what frequency
they are collected, and over what region the variables are
accumulated. For users (not developers) there should be a simple way
to select various options for a specified set of standard
diagnostics. By "various options" I do not mean "budget pages", JK
tables, and IJ maps, rather, there should be a means by which a user
could select specific, common climate variables that are then saved
in a readily accessible format.
2) Improving the current "standard" output: the standard diagnostic
accumulation files (.acc files) need to include header information at
the beginning of the file AND labels within the file, the purpose
being to make these files much easier to work with when it comes to
extracting and averaging subsets of information.
3) In addition, it would be useful to alter the standard diagnostic
".PRT" files that are intended for the line printers. Restrictions of
the line printer format make these standard files cumbersome for
higher resolution versions of the model. This has, so far, been dealt
with by printing maps on multiple pages or by "skipping" or averaging
the grid cells that are actually reported (e.g. the budget pages for
the 2x2.5 GCM report only every fourth latitude zone). Furthermore,
though the files are in text format, the necessity of having control
characters and extra lines in order to create "overstrikes" on the
line printers makes the files illegible if viewed electronically (as
opposed to printing them on the line printers).
Other suggestions (some are wishful thinking, of course):
Remove excessive data statements from the code (e.g. radiation) and
substitute input files.
Create a user-friendly (JAVA-based?) technique for running the model
and for extracting basic climate variable information.
Have GISS scientists who work on model development use version
control software so that all changes to the code are formally tracked
and documented at the time that changes are made. This is the only
absolute way to avoid having to repeat the current exercise in
ten years.
Drastically reduce the number of GO TO statements in the model, in
general, and try to rewrite the radiation code to avoid using ENTRY
points. These things make it very difficult for others (besides the
authors) to evaluate the GISS GCM parameterizations.
Make sure that a "standard version" of the GCM includes all code
necessary to run with any of the major ocean parameterizations used
regularly at GISS (including the qflux, the qflux w/deep ocean,
Gary's dynamic ocean, and the modified MOM model).
Create a GISS-wide database that collects all pertinent info about
simulations (with a short description). The last time we had such a
thing was when Reto made everyone fill out a page in a notebook
describing your run BEFORE you were assigned a run number. This was a
minor inconvenience that kept things far more organized.
Finally, make sure that changes are communicated to, and implemented
in, other versions of the model (stratosphere, coupled O-A, Mars
(egad!))
Those are the major things that come to mind. We're continuing to try
and put together a version of the GCM that runs on a Mac. Model II
was fairly simple, but si99 is proving to be more stubborn - the
radiation code is by far the most difficult thing to move to a new
platform it seems.
Bye, Mark
Gavin, Jim, Reto,
Here are some of my thoughts on how to improve the structure of
the GISS GCM program and on some techniques that are used to
work with it. I can explain it in more detail if there is an
interest.
Igor
---------------------------------------------------------------------
General
The entire program should be split into several logical modules.
Each module is characterized by the specific functions it performs
and has its own data which is hidden from other modules. The
exchange of data between the modules is performed by passing
parameters to the corresponding subroutines (the use of common blocks
for this purpose should not be allowed).
Proposed list of modules:
- MAIN - main program
Performs the general management: calls input/output,
initialization procedures, does the main loop over the
time steps. Should be very short and very simple.
Basically should look like:
program main
  ! read parameters for the current run
  call read_run_params
  ! read all input data and store it in a database
  call input_data
  ! initialize all the modules
  call init_module_timestep
  call init_module_soils
  call init_module_radiation
  ! ... one init_module_* call for each remaining module
  ! now do the main loop
  do while ( time < time_end )
    ! the following subroutine should call all the subroutines
    ! which do the computations during the time step
    call time_step
    ! the following should write the restart file
    if ( some_condition ) call write_restart_file
    ! maybe write some diagnostics here
    if ( some_condition_1 ) call output_diagnostics
  end do
end program main
Basically that's all that should be present in the MAIN module. All
the computations should be hidden inside the TIME_STEP module and
all input/output should be performed by the DATABASE module.
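To check that the proposed control flow hangs together, here is a compilable sketch with stub routines, assuming the main loop is wrapped in a module procedure run_model so that the program itself shrinks to a single call. Time in hours, the step-count intervals for restarts and diagnostics, and the counters are hypothetical stand-ins for the real conditions.

```fortran
! Sketch of the proposed MAIN structure with stub routines. All names
! beyond those in the proposal, and the intervals, are hypothetical.
module gcm_driver
  implicit none
  private
  public :: run_model, nsteps_done, nrestarts, ndiags
  real    :: time = 0.0                       ! model time (hours)
  integer :: nsteps_done = 0, nrestarts = 0, ndiags = 0
contains
  subroutine run_model(time_end, dt, restart_every, diag_every)
    real, intent(in)    :: time_end, dt              ! hours
    integer, intent(in) :: restart_every, diag_every ! in time steps
    call init_modules()
    do while (time < time_end)
      call time_step(dt)
      if (mod(nsteps_done, restart_every) == 0) call write_restart_file()
      if (mod(nsteps_done, diag_every) == 0) call output_diagnostics()
    end do
  end subroutine run_model

  subroutine init_modules()
    time = 0.0
    nsteps_done = 0; nrestarts = 0; ndiags = 0
    ! init_module_timestep, init_module_soils, init_module_radiation, ...
  end subroutine init_modules

  subroutine time_step(dt)
    real, intent(in) :: dt
    ! all dynamics and physics for one step would be called from here
    time = time + dt
    nsteps_done = nsteps_done + 1
  end subroutine time_step

  subroutine write_restart_file()
    nrestarts = nrestarts + 1       ! stub: the DATABASE module would write
  end subroutine write_restart_file

  subroutine output_diagnostics()
    ndiags = ndiags + 1             ! stub
  end subroutine output_diagnostics
end module gcm_driver
```

For example, run_model(24.0, 0.5, 24, 6) takes 48 steps, writes 2 restart files and produces 8 diagnostic outputs.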
- DATABASE - performs reading and writing of data files and also
maintains the database of all the data.
The data is read into the internal structure of ``DATABASE''
and is provided to other modules upon request. Such
requests should be made when modules are initialized.
All memory allocation should be performed by ``DATABASE''
so that it knows which data to dump when it is writing
a restart file.
- TIME_STEP - this is the module where all the computational subroutines
are called. All the global data which has to be exchanged
between the modules should be stored here. It should be
requested from the ``DATABASE'' when init_module_timestep
is called.
All data exchange between the modules should be performed
here by means of formal parameters. No COMMON blocks should
be allowed.
- computational modules - i.e. soils, radiation, atmosphere dynamics, etc.
At the current stage, for most parts of the GCM such modules will
be just wrappers which fill common blocks when init_module
is called and copy data from formal parameters to common
blocks and back when some of the module programs are called
from the TIME_STEP module. Such common blocks should be
gradually replaced by direct passing of parameters to subroutines
or by use of global data in Fortran90 style.
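A sketch of such a transitional wrapper, assuming a legacy routine that still communicates through a COMMON block: the wrapper copies its formal parameters into the block, calls the old code unchanged, and copies the results back. COMMON /SOILIN/, soil_old, soil_step and the toy runoff formula are hypothetical.

```fortran
! Sketch: a clean argument list for TIME_STEP in front of legacy code
! that still uses COMMON. All names here are hypothetical.
module soils_wrapper
  implicit none
  private
  public :: soil_step
contains
  ! New interface: all data in and out through formal parameters.
  subroutine soil_step(tsurf, wetness, runoff)
    real, intent(in)  :: tsurf, wetness
    real, intent(out) :: runoff
    real :: ts_c, wet_c, ro_c
    common /soilin/ ts_c, wet_c, ro_c
    ts_c  = tsurf              ! copy arguments into the common block
    wet_c = wetness
    call soil_old()            ! unchanged legacy code
    runoff = ro_c              ! copy results back out
  end subroutine soil_step

  ! Stand-in for an existing routine that communicates via COMMON.
  subroutine soil_old()
    real :: ts_c, wet_c, ro_c
    common /soilin/ ts_c, wet_c, ro_c
    ro_c = max(0.0, wet_c - 1.0)   ! toy calculation
  end subroutine soil_old
end module soils_wrapper
```

Once soil_old itself takes arguments, only the body of soil_step changes; TIME_STEP keeps calling the same interface.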
On format of restart files and other data files.
Some general format should be adopted for all binary data files being
read / written by GCM and data processing programs. Those files should
have the structure of a simple database. One possible example of such a
structure is the ij.* files, which are currently used by diagnostics utilities.
The following format is proposed:
- each data unit ( like an array ) should be written as a separate
record
- the structure of the record should be approximately as follows:
| label area | description | binary data |
+------------+---------------+------------------------------+
where the fields are:
label - some short information describing the type and the length of
the data ( may be just two integer numbers )
description - human readable text describing the data, say, 80 characters
long ( for example: ``snow water content (m)'' )
binary data - the data itself
Such format will have the following advantages:
- allows easy extraction of data from any data file without looking
into the code of the program which has written such a file.
- provides easy way to add data to the restart file while preserving
compatibility with other versions.
- simplifies writing post-processing utilities
- allows more flexible diagnostic output, since data can be easily added
to diagnostic output or removed from it without changing post-processing
routines
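A sketch of the proposed record layout using Fortran unformatted sequential I/O, assuming the label is two integers (a type code and the number of values) and the description is 80 characters, as suggested above. The reader reads just the label and description, backspaces, and re-reads the whole record once the data length is known. The module name, the routines and the type code 1 for default REAL are hypothetical conventions.

```fortran
! Sketch: | label | description | binary data | as one unformatted
! sequential record. Type code 1 = default REAL is a hypothetical choice.
module labelled_records
  implicit none
  private
  public :: DESC_LEN, write_record, read_record
  integer, parameter :: DESC_LEN = 80
contains
  subroutine write_record(unit, desc, values)
    integer, intent(in)          :: unit
    character(len=*), intent(in) :: desc
    real, intent(in)             :: values(:)
    character(len=DESC_LEN) :: d
    d = desc                                 ! pad or truncate to 80 chars
    write(unit) 1, size(values), d, values   ! label, description, data
  end subroutine write_record

  ! Read label and description first, then re-read with the data.
  subroutine read_record(unit, desc, values)
    integer, intent(in)                  :: unit
    character(len=DESC_LEN), intent(out) :: desc
    real, allocatable, intent(out)       :: values(:)
    integer :: itype, n
    read(unit) itype, n, desc                ! partial read of the record
    backspace(unit)
    allocate(values(n))
    read(unit) itype, n, desc, values
  end subroutine read_record
end module labelled_records
```

Because the length and a readable description travel with each record, a post-processing program can list or extract a field such as 'snow water content (m)' without knowing which model version wrote the file.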
Some notes on programming languages and other
computing tools
Compilation of the program should be done using standard ``makefile''
approach. This will eliminate the danger of using obsolete object
files. It will also make the process of compilation much more
flexible in terms of specifying options, directories, libraries, etc.
Some version control utilities can also be included into the makefile.
I have a script which creates a Makefile from a *.R file if that
can make the transition easier.
About .U files: As far as I understand they were introduced to spare
the disk space so that only the difference between the current version
and control version is stored on the disk. The method which is being
used now relies on specific information in the files (numbers after
the 72nd column) and is non-portable and very inconvenient. It will
not even work with native Fortran90 format (free format).
I suggest that diff / patch commands be used for this purpose, as it
is done almost universally in UNIX. They work with any text file and
their output is portable between all UNIX systems.
Also, since the disk space is not as much an issue now, I would
suggest that the full text of programs is kept for the current version
(and maybe for some recent versions) to eliminate possible confusion.
I would strongly suggest that ``upper level'' modules (MAIN, DATABASE,
TIME_STEP) should be written in some modern language, preferably C/C++.
Fortran 90 would be a minimum requirement, since Fortran 77 doesn't
support any serious data management (pointers, memory allocation,
structures, global data, etc.). I want to stress that once some
``upper level'' program is written, all programs which are called
from it have to be of the same level of abstraction or lower. I.e.,
if the MAIN program is written in C++ then all languages (Fortran 90,
Fortran 77, C, C++) can be used in the package. But if MAIN is
written in Fortran 90 the use of C++/C becomes much more difficult.
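The C/C++ upper-level option can be illustrated with the Fortran 2003 ISO_C_BINDING module, assuming a C or C++ driver: a Fortran routine declared BIND(C) gets a C-callable name and C-compatible argument types. The routine radiation_step, its arguments and the toy heating formula are hypothetical.

```fortran
! Sketch: a Fortran physics routine with a C-compatible interface, so a
! C or C++ upper-level driver could call it. All names are hypothetical.
module radiation_c_api
  use, intrinsic :: iso_c_binding, only: c_double, c_int
  implicit none
  private
  public :: radiation_step
contains
  ! Callable from C as:
  !   void radiation_step(int n, const double *t, double *heating);
  subroutine radiation_step(n, t, heating) bind(c, name='radiation_step')
    integer(c_int), value, intent(in) :: n
    real(c_double), intent(in)        :: t(n)
    real(c_double), intent(out)       :: heating(n)
    heating = -1.0e-5_c_double * (t - 250.0_c_double)   ! toy relaxation
  end subroutine radiation_step
end module radiation_c_api
```

A C++ driver would declare the prototype from the comment inside an extern "C" block and pass plain double arrays.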
------------------------------------------------------------------------
Gavin,
This pretty much covers most of the things I wished for working
with the model over the years. I might add the following:
- The paleo year (years Before Present) as an input variable (in
NAMELIST, if we keep that) so that the insolation is automatically calculated.
- Get rid of NAMELIST as an input and use a standard text file
configuration so that all parameterizations are explicit and
explanations of each are accessible inside the file.
- I like the ideas about the acc and rsf files. To expand it I suggest
creating multiple records inside the rsf and acc files and each record
contains standard header information describing the length and type of
quantities in the rest of the record so that one can read the acc/rsf file
no matter what model generated it. The first record will tell how many
records are in the file. Of course, strict adherence to this format would
be necessary for it to work.
- The database of models should be just that - a database. There is a
free version of a SQL standard database called PostgreSQL which is easily
installed on unix and can have a web interface.
I'd like to attend this meeting. As I mentioned I'll be coming down
Memorial Day weekend and I can drop by Tuesday after.
Gavin,
I just thought of a couple more suggestions (I've been writing up some of the
modifications to the DEC version of the model).
- Change array INDEX in DB112M9.S to something else (NDEX?). INDEX
is an intrinsic function in Fortran.
- Set up platform-specific meta blocks (#ifdef - #endif) in the code and use
the precompiler option -D when compiling.
- Use a file manager system as in the GFDL MOM 2 model for opening and closing
files, and put the file names inside the configuration file (with an
appropriate conf name). The configuration file is the same as I mentioned
in the previous email.
- Let the tracegas constants be defined as input parameters.
There may be more as I think of them.
-Rick
Hi Gavin,
The list seems pretty good.
One suggestion is that it would be a good idea to know what everyone does
with the GCM, e.g. the users could provide 4-5 lines stating what they
use it for. This could go in the documentation manual.
I am quite unaware of what each person does and which modules do what.
E.g. we have 5 tracers for sulfate chemistry, and we have organics and
sea salt sources, which we use to study the indirect effect.
This would be helpful: if people were interested in some aspect of the
code we have, or we in theirs, we would know that it is available right
here and can be incorporated.
Thanks and good that this is being worked on.
Cheers
surabi
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Surabi Menon
NASA GISS/Columbia Univ
2880 Broadway
New York, NY 10025
Tel: 212 678 5592
Fax: 212 678 5552
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Hi, Gavin,
About the GCM documentation, perhaps we can develop
the following scheme (similar to javadoc):
(1)
Keep the doc and code in one place (in the source code).
At the beginning of every subroutine, the writer of the subroutine
would be required to write comments in a given format, like the following:
c/**
c @summary general comment here
c @author name(s) of author(s)
c @version versioning
c @param list and explain each parameter
c @param list continued
c @calling list all subroutines to be called by this subroutine
c @calling list continued
c @other stuff
c*/
In the comments, HTML tags can be used, including links.
All the comments between c/** and c*/ will be retrieved by a utility
(see below); while other comments will not.
(2)
Develop a doc utility (call it, say, gcmdoc) to automatically retrieve
the formatted comments (like the one above) from all subroutines, and turn
them (together with other info) into a nicely structured HTML file.
Any user can run gcmdoc on any rundeck at any time to get an updated,
professional-looking HTML document, view it in a browser,
and, with the built-in hyperlinks, jump freely from one module to another.
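A sketch of the extraction step of such a gcmdoc utility, assuming fixed-form source in which the doc block is delimited by lines starting with lower-case 'c/**' and 'c*/', as in the template above: each line in between is copied with its leading comment character removed. Building the HTML and following the rundeck hierarchy are not shown; the module name, extract_doc and the 256-character line limit are assumptions.

```fortran
! Sketch: copy every line between 'c/**' and 'c*/' to another unit,
! dropping the leading comment character. Names are hypothetical.
module gcmdoc_extract
  implicit none
  private
  public :: extract_doc
contains
  subroutine extract_doc(in_unit, out_unit, nlines)
    integer, intent(in)  :: in_unit, out_unit
    integer, intent(out) :: nlines          ! number of doc lines copied
    character(len=256) :: line              ! assumed maximum line length
    logical :: inside
    integer :: ios
    inside = .false.
    nlines = 0
    do
      read(in_unit, '(a)', iostat=ios) line
      if (ios /= 0) exit                    ! end of file (or read error)
      if (line(1:4) == 'c/**') then
        inside = .true.
      else if (line(1:3) == 'c*/') then
        inside = .false.
      else if (inside) then
        write(out_unit, '(a)') trim(line(2:))
        nlines = nlines + 1
      end if
    end do
  end subroutine extract_doc
end module gcmdoc_extract
```

Ordinary comments outside the markers are skipped, so authors keep full control over what ends up in the generated document.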
(3)
Traditional documentation suffers from the problem that the code
and the doc diverge over time. But the above scheme is free of this problem.
Also, imposing on the writers a given format of comments will
effectively (I hope) drive the writers to add more decent comments
(realizing that their comments will be easily retrieved and actually used
by other users).
(4)
The proposed doc utility may first look into a rundeck, get the development
hierarchy, get all the modules, and all subroutines' names and signatures
and the formatted comments. Put together, these form an HTML document
ready for the browser. The usage may be:
gcmdoc B567M12
and in a few moments, B567M12.html will be ready. When viewing it, you will
first see a tree structure, indicating B567M12's parent, grand, and grand-grand
parents. By clicking a link, you will see all the modules used by B567M12
(together with other resources used like initial files),
and see their relation among each other. You continue to click a module,
then you see all the subroutines, when clicking a subroutine, you see all its
parameters' definitions, which calls which, and so on.
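The retrieval step of the proposed gcmdoc utility might look something like the following Fortran 90 sketch. It assumes three things: the source file has already been read into an array of lines, the c/** and c*/ markers start in column 1 as in the format above, and the names gcmdoc_sketch, extract_doc and starts_with (all hypothetical) are acceptable. Producing the linked HTML pages would be a separate step built on top of this.

```fortran
module gcmdoc_sketch
  ! Hypothetical sketch of the extraction step of "gcmdoc": collect the text
  ! of all comment lines lying between a "c/**" opener and a "c*/" closer.
  implicit none
  private
  public :: extract_doc
contains
  subroutine extract_doc(lines, doc, ndoc)
    character(len=*), intent(in)  :: lines(:)  ! source lines of one file
    character(len=*), intent(out) :: doc(:)    ! retrieved comment text
    integer,          intent(out) :: ndoc      ! number of entries in doc
    logical :: inside
    integer :: i
    inside = .false.
    ndoc = 0
    do i = 1, size(lines)
      if (starts_with(lines(i), 'c/**')) then
        inside = .true.
      else if (starts_with(lines(i), 'c*/')) then
        inside = .false.
      else if (inside .and. ndoc < size(doc)) then
        ndoc = ndoc + 1
        ! drop the leading comment character and left-justify the rest
        doc(ndoc) = adjustl(lines(i)(2:))
      end if
    end do
  end subroutine extract_doc

  logical function starts_with(line, marker)
    ! True if line begins with marker. A capital C is also accepted,
    ! since fixed-form Fortran allows either as the comment character.
    character(len=*), intent(in) :: line, marker
    character(len=len(marker)) :: head
    starts_with = .false.
    if (len(line) < len(marker)) return
    head = line(1:len(marker))
    if (head(1:1) == 'C') head(1:1) = 'c'
    starts_with = (head == marker)
  end function starts_with
end module gcmdoc_sketch
```

Ordinary comments outside the markers are skipped, which matches the rule that only the tagged block is retrieved.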
Regards,
Ye, 5-19-00
Hi Gavin,
Finished that list trying not to sound
like too much of a complainer.
You can open this in your browser I hope.
There are a couple of ways to automate the tedious task
of enumerating the untyped variables in a particular subroutine.
The simplest method is to let the compiler do the work.
Compile a subroutine using IMPLICIT NONE,
and utilize the compiler's list of complaints to compile a list
of untyped variables in that subroutine, which is usually
stored in a file subroutine.ERR:
An example of (my) confusion arising from the storage of important
arrays like precss in workspace may be found in subroutine EARTH.
There will of course be a discussion on which "constants" are actually
constants. I think it is safe to say that variable "constants" will
not improve the performance of the current GCM by much, excepting
those "constants" having to do with snow and ice.
The quadratic upstream scheme was programmed so that
the moments of a given gridbox are separated from
one another in memory by im*jm*lm units.
A related issue, about which I have only an opinion:
why are so many arrays
(like odata,gdata,etc.) stored with i,j as their innermost indices?
Meanwhile, subroutine EARTH also acts as a driver routine
for the PBL code.
Thus to maintain the PBL code, one has
to keep track of two driver routines.
It would be nice if PBL were a separate latitude-longitude
subroutine, called perhaps from SURFCE,
which stores latitude-longitude arrays needed to calculate surface
fluxes over the different surface types. It would get its
input data from, say, the gdata array.
In this sense, a module is just a glorified hybrid of
INCLUDE files and COMMON blocks. So what would be the
point of adopting it? At the current time, one could argue
that perhaps there is no need, the same way that
no one ever thought that early computer software would
ever be used past the year 1999.
But I think that as the GCM grows (modularly!)
to encompass ever
more processes, the communications between its component
subroutines will grow in complexity to a point where
the INCLUDE/COMMON combination will no longer be practical.
Modules offer some additional features over the
INCLUDE/COMMON combination which greatly facilitate
the development and maintenance of code and foolproof it
against those who often forget or don't know what they are doing,
like me:
There are only a few main issues that would arise if simple
modules were used in the GISS model:
From shosein@giss.nasa.gov Mon May 8 14:55:45 2000
From rwebb@cdc.noaa.gov Mon May 8 15:49:37 2000
From LDruyan@giss.nasa.gov Tue May 9 11:44:12 2000
From jlerner@giss.nasa.gov Wed May 10 16:21:52 2000
From sun@venus2.giss.nasa.gov Mon May 15 14:27:40 2000
From mchandler@giss.nasa.gov Tue May 16 10:52:47 2000
From ialeinov@simplex.giss.nasa.gov Tue May 16 21:38:11 2000
From rhealy@whoi.edu Wed May 17 10:14:20 2000
From smenon@giss.nasa.gov Wed May 17 10:30:33 2000
From acyxc@kirk.giss.nasa.gov Thu May 18 11:05:34 2000
From kelley@giza.giss.nasa.gov Thu May 18 16:19:01 2000
GCM $.02 USD
Maxwell Kelley
f90-113 mfef90: ERROR MYSUB, File = mysub.f90, Line = 19, Column = 3
IMPLICIT NONE is specified in the local scope, therefore an explicit
type must be specified for data object "X".
With grep and a search-and-replace action in a text editor
one can strip away all but the names of the offending variables, and
then use a utility such as sort to alphabetize the names
and deduce their types. That's the easy part.
Then it of course takes someone who actually knows the subroutine to
describe the purpose of the variables.
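The grep/sort part of this procedure could also be written in Fortran itself. The following is a minimal sketch. It assumes the compiler quotes each offending name after the text 'data object "', as in the message above; other compilers word their complaints differently. The module and routine names are hypothetical.

```fortran
module untyped_names_sketch
  ! Hypothetical helper: pull variable names out of IMPLICIT NONE complaints
  ! and return them sorted, without duplicates. Assumes the compiler quotes
  ! the offending name after 'data object "', as in the mfef90 message.
  implicit none
  private
  public :: collect_untyped
contains
  subroutine collect_untyped(msgs, names, nnames)
    character(len=*), intent(in)  :: msgs(:)    ! lines of the .ERR file
    character(len=*), intent(out) :: names(:)   ! untyped variable names
    integer,          intent(out) :: nnames     ! how many were found
    character(len=*), parameter :: key = 'data object "'
    character(len=len(names)) :: tmp
    integer :: i, j, k, p, q
    nnames = 0
    do i = 1, size(msgs)
      p = index(msgs(i), key)
      if (p == 0) cycle
      p = p + len(key)                  ! first character of the name
      q = index(msgs(i)(p:), '"')       ! offset of the closing quote
      if (q <= 1) cycle
      tmp = msgs(i)(p:p+q-2)
      if (any(names(1:nnames) == tmp)) cycle   ! already listed
      if (nnames == size(names)) exit          ! output array is full
      nnames = nnames + 1
      names(nnames) = tmp
    end do
    ! insertion sort into alphabetical order
    do j = 2, nnames
      tmp = names(j)
      k = j - 1
      do while (k >= 1)
        if (names(k) <= tmp) exit
        names(k+1) = names(k)
        k = k - 1
      end do
      names(k+1) = tmp
    end do
  end subroutine collect_untyped
end module untyped_names_sketch
```

The names come back in whatever case the compiler reports them. Deducing their types and purposes is still up to someone who knows the subroutine.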
Many subroutines recompute the layer air mass as pij*dsig(l) all the time.
It would be nice to have the GCM provide this as a courtesy
to subroutines which are not changing the air masses.
There is absolutely no reason for any subroutine which is
not changing the air masses to know how they were computed
in the first place.
This would probably be advantageous if GISS ever adopted
a vertical coordinate which would not allow the air mass
to be computed so simply, e.g.
a smoothly varying hybrid sigma-pressure coordinate instead
of the current scheme which has an abrupt transition
between the two at the "tropopause."
It would also be nice if the array had mks units instead of millibars.
I've been burned in the past when I forgot to convert between the two.
Introduction of p(i,j,l) and/or
pk(i,j,l)=p**kapa arrays would also be nice.
RAM is large enough now that these arrays would not be a burden.
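A minimal sketch of what such a courtesy might look like follows. It assumes the sigma part of the grid, where the layer pressure is ptop + pij*sig(l) and pij is in mb. The conversion to kg/m^2 uses mass per unit area = (pressure thickness in Pa)/g. The values of grav and kapa are typical ones, not necessarily the model's, and all names are hypothetical.

```fortran
module air_mass_sketch
  ! Hypothetical sketch: compute layer air mass (mks units) and layer
  ! pressures once, so that other routines can use them directly instead
  ! of recomputing pij*dsig(l) in millibars.
  implicit none
  private
  public :: calc_air_mass, calc_p_pk
  real, parameter :: grav  = 9.81     ! m/s^2 (assumed value)
  real, parameter :: kapa  = 0.286    ! R/cp for dry air (assumed value)
  real, parameter :: mb2pa = 100.     ! Pa per mb
contains
  subroutine calc_air_mass(pij, dsig, am)
    real, intent(in)  :: pij       ! column pressure thickness (mb)
    real, intent(in)  :: dsig(:)   ! sigma thickness of each layer
    real, intent(out) :: am(:)     ! air mass of each layer (kg/m^2)
    ! mass per unit area = pressure thickness (Pa) / g
    am = pij*dsig*mb2pa/grav
  end subroutine calc_air_mass

  subroutine calc_p_pk(pij, ptop, sig, p, pk)
    real, intent(in)  :: pij, ptop ! column thickness and top pressure (mb)
    real, intent(in)  :: sig(:)    ! sigma at each layer midpoint
    real, intent(out) :: p(:)      ! layer pressure (mb)
    real, intent(out) :: pk(:)     ! p**kapa
    p  = ptop + pij*sig
    pk = p**kapa
  end subroutine calc_p_pk
end module air_mass_sketch
```

A routine that does not change the air masses would then only read am, p or pk, and would never need to know how they were computed.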
It would be nice if arrays like prec,tprec,precss,cosz existed in a sensible
location, rather than in workspace common blocks.
The relative cost of storing a few extra two-dimensional
arrays outside of workspace becomes negligible as the vertical
resolution of the model increases.
The storage of precss in workspace is a source of confusion
in subroutine EARTH. The amount of supersaturation
precipitation is a quantity passed by EARTH to the
land surface routine.
Although subroutine CONDSE goes
to the trouble of saving the amount of supersaturation precipitation
in addition to the total precipitation, it saves the precss
array in a workspace common block which is subsequently
overwritten in the radiation code before it can be accessed by
EARTH. The latter subroutine declares an
array for this purpose but the array is never accessed by
CONDSE and (probably) contains nothing but zeros.
Did someone decide that perhaps it was better that the land surface
routine receive all convective precipitation?
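One way to give such fields a sensible home is a module, sketched below with hypothetical names and a toy grid. The module variables are initialized, so they keep their values between calls. A routine that needs scratch space declares its own local array. As a result, the radiation stand-in cannot clobber precss before the EARTH stand-in reads it.

```fortran
module precip_com_sketch
  ! Hypothetical module holding 2-D fields shared between physics routines.
  ! Module variables persist between calls, so unlike a reused workspace
  ! COMMON block they cannot be overwritten by unrelated scratch work.
  implicit none
  integer, parameter :: im = 4, jm = 3      ! toy grid (assumption)
  real, dimension(im,jm) :: prec   = 0.     ! total precipitation
  real, dimension(im,jm) :: tprec  = 0.     ! precipitation temperature (assumed meaning)
  real, dimension(im,jm) :: precss = 0.     ! supersaturation precipitation
  real, dimension(im,jm) :: cosz   = 0.     ! cosine of solar zenith angle
end module precip_com_sketch

module physics_stubs
  ! Hypothetical stand-ins for CONDSE, the radiation code and EARTH.
  use precip_com_sketch, only: im, jm, prec, precss
  implicit none
contains
  subroutine condse_stub(i, j, pss, ptot)
    integer, intent(in) :: i, j
    real,    intent(in) :: pss, ptot
    precss(i,j) = pss
    prec(i,j)   = ptot
  end subroutine condse_stub

  subroutine radia_stub()
    real :: work(im,jm)          ! private scratch space, local to this routine
    work = -999.
  end subroutine radia_stub

  real function earth_precss(i, j)
    integer, intent(in) :: i, j
    earth_precss = precss(i,j)
  end function earth_precss
end module physics_stubs
```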
It would be nice if constants like lhe,grav,stbo,twopi
were made into centrally declared PARAMETERs rather than
residing in common blocks or being declared (and duplicated) in individual
subroutines.
When I am postprocessing GCM output and need to call GEOM,
it would be convenient if GEOM did not expect to find
constants like twopi stored in a common block.
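A sketch of such a central declaration follows, with hypothetical module names. The numerical values are typical textbook values rather than the GISS ones. The GEOM-like function takes only twopi from the module, so it can be called from a postprocessing program that knows nothing about the model's COMMON blocks.

```fortran
module constants_sketch
  ! Hypothetical central home for physical constants, declared once as
  ! PARAMETERs. Values are typical textbook numbers, not necessarily the
  ! exact values used in the GISS model.
  implicit none
  real, parameter :: pi    = 3.14159265358979
  real, parameter :: twopi = 2.*pi
  real, parameter :: grav  = 9.81       ! gravitational acceleration (m/s^2)
  real, parameter :: lhe   = 2.5e6      ! latent heat of evaporation (J/kg)
  real, parameter :: stbo  = 5.67e-8    ! Stefan-Boltzmann constant (W/m^2/K^4)
end module constants_sketch

module geom_sketch
  ! A GEOM-like routine that takes only what it needs from the constants
  ! module, so it can be used without any model COMMON blocks.
  use constants_sketch, only: twopi
  implicit none
contains
  real function dlon(im)
    integer, intent(in) :: im       ! number of longitudes
    dlon = twopi/real(im)           ! longitudinal grid spacing (radians)
  end function dlon
end module geom_sketch
```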
It would be nice if a single function qsat returning
saturation vapor pressure (or mixing ratio) existed,
accessible to all subroutines, rather than having each
subroutine declare its own favorite version of it.
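A shared version might be sketched as below. The vapor pressure uses a Magnus-type formula with Bolton's coefficients (6.112 mb, 17.67, 243.5 K). That formula, the module name and the interface (temperature in K, pressure in mb) are assumptions for illustration, not any existing GISS routine.

```fortran
module qsat_sketch
  ! Hypothetical shared saturation routines. The vapor pressure uses a
  ! Magnus-type formula (Bolton's coefficients); this is an assumed choice,
  ! not the formula used by any particular GISS subroutine.
  implicit none
  private
  public :: esat, qsat
  real, parameter :: eps = 0.622     ! ratio of molecular weights, water/dry air
contains
  real function esat(t)
    real, intent(in) :: t            ! temperature (K)
    ! saturation vapor pressure over liquid water (mb)
    esat = 6.112*exp(17.67*(t - 273.15)/(t - 29.65))
  end function esat

  real function qsat(t, p)
    real, intent(in) :: t            ! temperature (K)
    real, intent(in) :: p            ! pressure (mb)
    real :: es
    es = esat(t)
    ! saturation mixing ratio (kg/kg)
    qsat = eps*es/(p - es)
  end function qsat
end module qsat_sketch
```

Every subroutine would then USE qsat_sketch, ONLY: qsat rather than carrying its own copy, so a change of formula happens in one place.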
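In the quadratic upstream scheme, a variable q(i,j,l)
carries, in addition to the mean (zeroth-order moment),
9 first- and second-order moments qx,qy,qz,qxx,qyy,qzz,qxy,qyz,qzx(i,j,l),
so the moments of a given gridbox are separated in memory by im*jm*lm units.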
The separation grows proportionally worse when the moments
for several variables such as different tracers
are stored together in a common block.
I reprogrammed the scheme so that the moments for a variable
q are stored as qmom( (/mx,my,mz,mxx,myy,mzz,mxy,myz,mzx/) ,i,j,l),
where mx,my,mz etc. are integer indices identifying the moments.
I have programmed the tracers such that the tracer index
is the innermost rather than outermost index as it is currently.
This leads to considerable speedup
when the same operation is being carried out on all the tracers.
The rearrangement generally leads to more compact code when
a similar operation is being carried out upon all the moments
(especially in FORTRAN 90).
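A minimal sketch of this storage order follows, using the integer indices named above; the routine names are hypothetical. The 9 moments of a gridbox are adjacent in memory, so an operation on all of them is a single array-section statement. A subset can be picked out with a vector subscript, as in the qmom( (/.../) ,i,j,l) notation.

```fortran
module moments_sketch
  ! Hypothetical sketch of moment storage with the moment index first,
  ! qmom(nmom,i,j,l), so the 9 moments of one gridbox are contiguous.
  implicit none
  integer, parameter :: mx=1, my=2, mz=3, mxx=4, myy=5, mzz=6, &
                        mxy=7, myz=8, mzx=9
  integer, parameter :: nmom = 9
  integer, parameter :: secnd(6) = (/ mxx, myy, mzz, mxy, myz, mzx /)
contains
  subroutine scale_moments(qmom, factor)
    ! Multiply all moments of every gridbox by the same factor.
    real, intent(inout) :: qmom(:,:,:,:)
    real, intent(in)    :: factor
    integer :: i, j, l
    do l = 1, size(qmom,4)
      do j = 1, size(qmom,3)
        do i = 1, size(qmom,2)
          qmom(:,i,j,l) = factor*qmom(:,i,j,l)   ! one contiguous block
        end do
      end do
    end do
  end subroutine scale_moments

  subroutine zero_second_order(qmom, i, j, l)
    ! Zero the second-order moments of one gridbox, keeping the slopes.
    real,    intent(inout) :: qmom(:,:,:,:)
    integer, intent(in)    :: i, j, l
    qmom(secnd,i,j,l) = 0.
  end subroutine zero_second_order
end module moments_sketch
```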
Arrays like odata,gdata,etc. are
stored and referenced as follows:
      do j=1,jm
        do i=1,im
C
C lots of stuff going on here...
C and then:
          x11 = gdata(i,j,11)
          x16 = gdata(i,j,16)
C and then do something with x11,x16
C
C Or, like in the convection/condensation routines:
          do l=1,lm
            cldij(l)=cloud(i,j,l)
          enddo
C and then do something to cldij
C and then store cldij back in cloud
          do l=1,lm
            cloud(i,j,l)=cldij(l)
          enddo
C
        enddo
      enddo
It seems to me, not knowing that much about how the cache operates
on workstations, that this kind of code results in inefficient memory
access
and hinders parallelizations done over i,j.
If gdata/cloud are always involved in loops where i,j
are the outermost loop indices, then why aren't i,j the
outermost indices of gdata/cloud?
Most parameterizations in the
GCM have to do with _vertical_ processes.
I realize that it is often convenient to read or write a latitude-longitude
"map" of a particular outermost index, particularly when the array in
question is a diagnostic array.
And the problem might be alleviated somewhat if gdata
for example were not such a large all-purpose storage array for
all manner of ground variables, only some of which are being accessed
at a given time. But I think the benefits of parallelizing the
vertical parameterizations over latitude-longitude outweigh any
disadvantages of reordering the storage of certain arrays.
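A minimal sketch of the reordered storage follows, with toy dimensions and a placeholder vertical operation (all names hypothetical). With the vertical index first, cloud(:,i,j) is a contiguous column. It can be handed directly to a column routine, with no copy into and out of a cldij array, and each i,j iteration touches only its own column.

```fortran
module column_layout_sketch
  ! Hypothetical sketch: vertical index first, so each column is contiguous.
  implicit none
  integer, parameter :: im = 4, jm = 3, lm = 5   ! toy grid (assumption)
  real :: cloud(lm,im,jm) = 0.                   ! cloud(l,i,j)
contains
  subroutine column_physics(col)
    ! Placeholder vertical operation on one column (not real physics).
    real, intent(inout) :: col(:)
    integer :: l
    do l = 2, size(col)
      col(l) = col(l) + 0.5*col(l-1)
    end do
  end subroutine column_physics

  subroutine physics_driver()
    ! i,j outermost: each iteration works on its own contiguous column,
    ! so the iterations are independent of one another.
    integer :: i, j
    do j = 1, jm
      do i = 1, im
        call column_physics(cloud(:,i,j))
      end do
    end do
  end subroutine physics_driver
end module column_layout_sketch
```

Writing a latitude-longitude map of one level is still easy with this layout: cloud(l,:,:) is simply a strided section.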
It would be nice if an array ptype
explicitly containing these numbers were created, and some global
integer parameters like iocean,ioice,ilice,iearth were
used to refer to the different surface types.
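A sketch of such an array follows, with hypothetical names, a toy grid, and an assumed ordering of the four surface types. Quantities computed separately for each type can then be combined with a single area-weighted sum.

```fortran
module surface_types_sketch
  ! Hypothetical sketch: surface-type fractions held in one array, indexed
  ! by named integer parameters rather than bare numbers. The ordering of
  ! the four types is an assumption.
  implicit none
  integer, parameter :: iocean = 1, ioice = 2, ilice = 3, iearth = 4
  integer, parameter :: ntype = 4
  integer, parameter :: im = 4, jm = 3           ! toy grid (assumption)
  real :: ptype(ntype,im,jm) = 0.                ! fraction of each type
contains
  real function area_mean(flux, i, j)
    ! Area-weighted mean of a quantity computed separately for each type.
    real,    intent(in) :: flux(ntype)
    integer, intent(in) :: i, j
    area_mean = sum(ptype(:,i,j)*flux)
  end function area_mean
end module surface_types_sketch
```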
NDYN?
If NDYN*DT always has to equal an hour, why bother
computing it? Couldn't there be a coupling timestep called
DTGCM which could be an hour (or perhaps shorter
as the resolution increases)?
The individual routines (like SURFCE) could then choose their
internal timesteps based on DTGCM, not
NDYN*DT.
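A sketch of the idea follows, assuming DTGCM is one hour and using hypothetical names. Each routine picks the smallest number of equal substeps that keeps its internal step no longer than what it can stably take. It then uses DTGCM divided by that number.

```fortran
module timestep_sketch
  ! Hypothetical coupling timestep shared by all physics drivers. Each
  ! routine derives its own internal step from DTGCM instead of NDYN*DT.
  implicit none
  real, parameter :: dtgcm = 3600.   ! coupling timestep in seconds (assumed 1 hour)
contains
  integer function nsubsteps(dt_max)
    ! Smallest number of equal substeps whose length does not exceed dt_max.
    real, intent(in) :: dt_max      ! longest stable internal step (s)
    nsubsteps = max(1, ceiling(dtgcm/dt_max))
  end function nsubsteps
end module timestep_sketch
```

For example, a routine limited to 1000 s steps would take 4 substeps of 900 s each. If DTGCM were later shortened for higher resolution, only the one PARAMETER would change.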
If PBL got its input data from the gdata array, one would
not have to keep track of different driver code for different
surface types (currently, one has to watch both SURFCE and
EARTH, and if the ocean/ice code were split up, say, into
OCEAN, SEAICE, LNDICE then there would be that much more
driver code).
equivalence, when the common block contains
too many variable names to conveniently enumerate.
Often the communication between any two components of the GCM
only involves a few key variables. Is there a way for
subroutines to communicate without telling their whole life
story?
There may be a number of variables declared in a
module you are USEing which you do not want your program to see,
or perhaps you have variables with the same names and wish to
avoid conflicts.
You can specify exactly which variables you wish to access from
a particular module, and the rest are not visible.
On the other hand,
if you try to USE a variable which doesn't exist in the module
you are USEing, the compiler will tell you.
This feature is really practical in conjunction with IMPLICIT NONE.
Finally, if you wish to see variables from a particular module,
but your subroutine refers to them by different names, you
can easily map between them in the USE statement without employing
EQUIVALENCEs.
In summary, the USE statement is
a robust way to pass specific information between subroutines without
resorting to (long) argument lists which
have to be enumerated by the calling programs as well.
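A small sketch of these USE features follows, with hypothetical module names and example dimensions. The second module sees only IM (under the local name nlon) and JM. It can declare its own dt without any conflict. Asking for a name the first module does not contain (say, only: im2) would be rejected by the compiler.

```fortran
module bb_com_sketch
  ! Hypothetical stand-in for a large INCLUDE/COMMON file.
  implicit none
  integer, parameter :: im = 72, jm = 46, lm = 9   ! example resolution
  real :: dt = 450.                                ! example timestep (s)
  real :: u(im,jm,lm) = 0., v(im,jm,lm) = 0.       ! fields most users ignore
end module bb_com_sketch

module uses_only_sketch
  ! Sees exactly two names from bb_com_sketch. "nlon => im" maps the
  ! module's IM onto a local name without an EQUIVALENCE; nothing else
  ! (dt, u, v, ...) is visible here, so local names cannot clash with them.
  use bb_com_sketch, only: nlon => im, jm
  implicit none
  real :: dt                   ! a local dt, unrelated to bb_com_sketch's dt
contains
  integer function ncells()
    ncells = nlon*jm
  end function ncells
end module uses_only_sketch
```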
An additional feature of modules is that one can declare subroutines
and functions within them, leading to a foolproofing capability
in which variables in modules are either PUBLIC (accessible outside
the module) or PRIVATE (accessible only to functions and subroutines
within the module).
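A sketch of the PUBLIC/PRIVATE idea follows, using a hypothetical diagnostic accumulator. The running sums are PRIVATE, so other routines can change them only through accumulate and read them only through mean_value.

```fortran
module accum_sketch
  ! Hypothetical diagnostic accumulator. The running totals are PRIVATE:
  ! other routines can add to them or read them only through the PUBLIC
  ! procedures, so nobody can accidentally overwrite them.
  implicit none
  private
  public :: accumulate, mean_value
  real    :: total = 0.
  integer :: nsamp = 0
contains
  subroutine accumulate(x)
    real, intent(in) :: x
    total = total + x
    nsamp = nsamp + 1
  end subroutine accumulate

  real function mean_value()
    if (nsamp == 0) then
      mean_value = 0.
    else
      mean_value = total/real(nsamp)
    end if
  end function mean_value
end module accum_sketch
```

A statement such as total = 0. in a routine that USEs accum_sketch would not compile, which is the foolproofing described above.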
One module USEs other module(s) to create a
module customized for a particular purpose.
For example, the quadratic upstream scheme
INCLUDE file (SOMTQ.COM) declares arrays with
dimensions IM,JM,LM, but does not itself define
IM,JM,LM. The reason this works is that
all the routines INCLUDEing SOMTQ.COM also have
to include BBxxx.COM on a previous line, even if
they care only about IM,JM,LM and nothing else
from BBxxx.COM. In fact, the actual quadratic
upstream scheme advection code does not want
to see anything in BBxxx.COM, so it has to go
to the trouble of redeclaring IM,JM,LM itself.
Now, if BBxxx.COM and SOMTQ.COM were made into
modules BBxxx_COM and SOMTQ_COM, then SOMTQ_COM
could USE BBxxx_COM to get the values
of IM,JM,LM (only),
and any routine which USEd
SOMTQ_COM but not BBxxx_COM would still see the
values of IM,JM,LM.
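The BBxxx_COM / SOMTQ_COM arrangement might be sketched as follows. The example dimensions, the unrelated field ts and the qmom layout are assumptions. Names a module obtains by USE are PUBLIC by default, which is what lets a routine that USEs only somtq_com see IM,JM,LM.

```fortran
module bbxxx_com
  ! Hypothetical stand-in for BBxxx.COM: resolution plus many other fields.
  implicit none
  integer, parameter :: im = 72, jm = 46, lm = 9   ! example values
  real :: ts(im,jm) = 0.        ! an unrelated field the advection never needs
end module bbxxx_com

module somtq_com
  ! Hypothetical stand-in for SOMTQ.COM. It takes only IM,JM,LM from
  ! bbxxx_com; because USE-associated names are PUBLIC by default, any
  ! routine that USEs somtq_com sees IM,JM,LM too, without seeing TS.
  use bbxxx_com, only: im, jm, lm
  implicit none
  integer, parameter :: nmom = 9
  real :: qmom(nmom,im,jm,lm)   ! moment index first
end module somtq_com
```

The advection code would then contain just "use somtq_com" and would no longer need to redeclare IM,JM,LM itself.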
U,V,T,P,Q.
From adelgenio@giss.nasa.gov Fri May 19 09:43:08 2000