{"id":2080,"date":"2010-12-14T01:53:30","date_gmt":"2010-12-14T06:53:30","guid":{"rendered":"http:\/\/www.easterbrook.ca\/steve\/?p=2080"},"modified":"2010-12-14T01:53:45","modified_gmt":"2010-12-14T06:53:45","slug":"agu-session-on-software-engineering-for-climate-modeling","status":"publish","type":"post","link":"http:\/\/www.easterbrook.ca\/steve\/2010\/12\/agu-session-on-software-engineering-for-climate-modeling\/","title":{"rendered":"AGU session on Software Engineering for Climate Modeling"},"content":{"rendered":"<p>Here&#8217;s the first of a series of posts from the <a title=\"AGU fall meeting website\" href=\"http:\/\/www.agu.org\/meetings\/fm10\/\" target=\"_blank\">American Geophysical Union (AGU) Fall meeting<\/a>, which is happening this week in San Francisco. The meeting is huge &#8211; they&#8217;re expecting 19,000 scientists to attend, making it the largest such meeting in the physical sciences.<\/p>\n<p>The most interesting session today was a new session for the AGU: IN14B, &#8220;Software Engineering for Climate Modeling&#8221;. And I&#8217;m not just saying that because it included my talk &#8211; all the talks were fascinating. (I&#8217;ve posted the slides for my talk, &#8220;<a title=\"Slides for my AGU 2010 talk\" href=\"http:\/\/www.cs.toronto.edu\/~sme\/presentations\/Easterbrook-AGU-fall2010.pdf\" target=\"_blank\">Do Over or Make Do: Climate Models as a Software Development Challenge<\/a>&#8221;.)<\/p>\n<p>After my talk, the next speaker was\u00a0<a title=\"Cecelia's contact details\" href=\"http:\/\/www.cisl.ucar.edu\/css\/staff\/cecelia\/\" target=\"_blank\"><strong>Cecelia DeLuca<\/strong><\/a> of NOAA, with a talk entitled <em>&#8220;Emergence of a Common Modeling Architecture for Earth System Science&#8221;<\/em>. Cecelia gave a great overview of the <a title=\"ESMF homepage\" href=\"http:\/\/www.earthsystemmodeling.org\/\" target=\"_blank\">Earth System Modeling Framework<\/a>. 
She began by pointing out that climate models don&#8217;t just contain science code &#8211; they consist of a number of different kinds of software.\u00a0Much of the code is infrastructure code, which doesn&#8217;t necessarily need to be written by scientists.\u00a0Around ten years ago, a number of projects started up with the aim of building shared, standards-based infrastructure code. These projects needed to develop the technical and mathematical expertise to build infrastructure code, but the advantages of separating this code development from the science code were clear: the teams building infrastructure code could prioritize best practices, run nightly testing processes, etc., whereas typically the scientists would not do this.<\/p>\n<p>ESMF provides a common modelling architecture.\u00a0Native model data structures (modules, fields, grids, timekeeping) are wrapped into ESMF standard data structures, which conform to relevant standards (e.g. ISO standards, <a title=\"NetCDF Climate and Forecast Metadata conventions\" href=\"http:\/\/cf-pcmdi.llnl.gov\/\" target=\"_blank\">CF standards<\/a>, the <a title=\"Metafor Project Common Information Model\" href=\"http:\/\/metaforclimate.eu\/Table\/Work-Package-2\/Developing-the-CIM\/\" target=\"_blank\">Metafor common information model<\/a>, etc.).\u00a0The framework also offers runtime compliance checking (e.g. to check that timekeeping behaviour is correct), and automated documentation (e.g. the ability to write out model metadata in a standard XML format).<\/p>\n<p>Because of these efforts, earth system models in the US are converging on a common architecture. It&#8217;s built on standardized component interfaces, and creates a layer of structured information within Earth system codes. 
The lesson here is that if you\u00a0can take legacy code and express it in a standard way, you get tremendous power.<\/p>\n<p>The next speaker was\u00a0<a title=\"Amy's contact details\" href=\"http:\/\/www.gfdl.noaa.gov\/amy-langenhorst-homepage\" target=\"_blank\"><strong>Amy Langenhorst<\/strong><\/a> from GFDL, with <em>&#8220;Making sense of complexity with the FRE climate modelling workflow system&#8221;<\/em>. Amy explained the organisational setup at GFDL: there are approximately\u00a0300 people, organized into <a title=\"GFDL science groups\" href=\"http:\/\/www.gfdl.noaa.gov\/research\" target=\"_blank\">6 science-based groups<\/a>, plus a technical services group and a modelling services group. The latter consists of 15 people, with one of them acting as a\u00a0liaison for each of the science groups; this group provides the software engineering support for the science teams.<\/p>\n<p>The <a title=\"Flexible Modeling System at GFDL\" href=\"http:\/\/www.gfdl.noaa.gov\/fms\" target=\"_blank\">Flexible Modeling System (FMS)<\/a> is a software framework that provides a coupler and infrastructure support.\u00a0FMS releases happen about once per year; it provides an extensive testing framework that currently includes 209 different model configurations.<\/p>\n<p>One of the biggest challenges for modelling groups like GFDL is the IPCC cycle. Providing the model runs for each IPCC assessment involves massive, complex data processing, for which a good workflow manager is needed.\u00a0<a title=\"A similar talk by Amy on FRE\" href=\"http:\/\/www.gfdl.noaa.gov\/cms-filesystem-action\/user_files\/arl\/IntroToFRE.pdf\" target=\"_blank\">FRE is the workflow manager<\/a> for FMS. Development of FRE was started in 2002 by Amy, at a time when the model services group didn&#8217;t yet exist.<\/p>\n<p>FRE includes version control, configuration management, tools for building executables, control of execution, etc. 
It also provides facilities for creating\u00a0XML model description files, model configuration (using a\u00a0component-based approach), and integrated model testing (e.g. basic tests, restarts, scaling). It also supports experiment inheritance, so that it&#8217;s possible to set up new model configurations based on variants of previous runs, which is useful for perturbation studies.<\/p>\n<p>Next up was\u00a0<strong><a title=\"Rob's contact details\" href=\"https:\/\/modelingguru.nasa.gov\/people\/rwburns\" target=\"_blank\">Rob Burns<\/a><\/strong> from NASA GSFC, talking about &#8220;<em>Software Engineering Practices in the Development of NASA Unified Weather Research and Forecasting (NU-WRF) Model<\/em>&#8221;.\u00a0WRF is a weather forecasting model originally developed at NCAR, but widely used across the NWP community. <a title=\"NU-WRF project page\" href=\"https:\/\/modelingguru.nasa.gov\/community\/atmospheric\/nuwrf?view=documents\" target=\"_blank\">NU-WRF<\/a> is an attempt\u00a0to unify variants of NCAR WRF and to facilitate better use of WRF: it is built from versions of NCAR&#8217;s WRF, with a separate process for folding in enhancements.<\/p>\n<p>As is common with many modelling efforts, there were challenges arising from multiple science teams, each with its own goals, interests and expertise; scientists don&#8217;t consider software engineering their first priority. At NASA, the\u00a0Software Integration and Visualization Office (SIVO) provides software engineering support for the scientific modelling teams. SIVO helps to drive, but not to lead, the scientific modelling efforts. 
They help with full software lifecycle management, assisting with all software processes from requirements to release, but with domain experts still making the scientific decisions.\u00a0The code is under full version control, using Subversion, and the software engineering team coordinates the effort to get the codes into version control.<\/p>\n<p>The experience with NU-WRF shows that this kind of partnership between science teams and a software support team can work well.\u00a0Leadership and active engagement with the science teams are needed. However, involving the entire science team in every decision proved too slow, so a core team was formed to make such decisions.<\/p>\n<p>The next speaker was\u00a0<strong><a title=\"Tom's profile on Modeling Guru\" href=\"https:\/\/modelingguru.nasa.gov\/people\/tclune\" target=\"_blank\">Thomas Clune<\/a><\/strong> from NASA GISS, with a talk &#8220;<em>Constraints and Opportunities in GCM Model Development<\/em>&#8221;. Thomas began with the question:\u00a0How did we end up with the software we have today? From a software quality perspective, we wrote the wrong software. Over the years, improvements in the fidelity of the models have driven a disproportionate growth in the complexity of their implementations.<\/p>\n<p>One important constraint is that model codes change relatively slowly, partly because of the model validation processes &#8211; it&#8217;s important to be able to validate each code change individually, so changes can&#8217;t be bundled together &#8211; and partly because code familiarity is important: the scientists have to understand their code, and if it changes too fast, they lose this familiarity.<\/p>\n<p>However, the problem now is that software quality is incommensurate with the growing socioeconomic role of our models in understanding climate change. 
There&#8217;s a great quote from\u00a0<a title=\"wikipedia entry\" href=\"http:\/\/en.wikipedia.org\/wiki\/Technical_debt\" target=\"_blank\">Ward Cunningham<\/a>: &#8220;<em>Shipping first-time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite&#8230; The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as\u00a0interest on that debt. Entire engineering organizations can be brought to a stand-still under the debt load of an unconsolidated implementation,\u00a0object-oriented or otherwise\u2026<\/em>&#8221; Examples of this debt in climate models include long procedures, kludges, cut-and-paste duplication, short\/ambiguous names, and inconsistent style.<\/p>\n<p>The opportunities, then, are to exploit advances in software engineering from elsewhere to systematically and incrementally improve the software quality of climate models.\u00a0For example:<\/p>\n<ul>\n<li><a title=\"Serendipity: Climate Model Coding standards\" href=\"http:\/\/www.easterbrook.ca\/steve\/?p=1986\" target=\"_blank\">Coding standards<\/a> &#8211; these improve productivity through familiarity, reduce some types of bugs, and help newcomers. But they must be adopted from within the community, by negotiation.<\/li>\n<li>Abandon CVS. It has too many liabilities for managing legacy code, e.g. it treats directory structures as permanent. The community needs version control systems that handle branching and merging. NASA GISS is planning to switch to\u00a0Git in the new year, as soon as the IPCC runs are out of the way.<\/li>\n<li>Unit testing. There&#8217;s a great quote from\u00a0<a title=\"Working effectively with legacy code\" href=\"http:\/\/www.cs.helsinki.fi\/u\/vjkuusel\/gradu\/Working%20Effectively%20With%20Legacy%20Code.pdf\" target=\"_blank\">Michael Feathers<\/a>: <em>&#8220;The main thing that distinguishes legacy code from non-legacy code is tests. 
Or rather, lack of tests&#8221;<\/em>.\u00a0Lack of tests leads to fear of introducing subtle bugs.\u00a0Elsewhere, unit testing frameworks have caused a major shift in how commercial software development works, particularly in enabling test-driven development. Tom has been experimenting with\u00a0<a title=\"pFUnit at Sourceforge\" href=\"http:\/\/sourceforge.net\/projects\/pfunit\/\" target=\"_blank\">pFUnit<\/a>, a testing framework with support for parallel Fortran and MPI. The existence of such testing frameworks removes some of the excuses for not using unit testing for climate models (in most cases, the modeling community relies on regression testing in preference to unit testing).\u00a0Some of the reasons commonly given for not doing unit testing suggest confusion about what unit testing is for: e.g. that some constraints are unknown, that tests would just duplicate the implementation, or that it&#8217;s impossible to test emergent behaviour. These kinds of excuses indicate that modelers tend to conflate scientific validation with the verification offered by unit testing.<\/li>\n<li><a title=\"e.g. see Thomas LaToza's review for an introduction\" href=\"http:\/\/www.cs.cmu.edu\/~aldrich\/courses\/654\/tools\/latoza-clone-detection-05.pdf\" target=\"_blank\">Clone Detection<\/a>. Tools now exist to detect code clones (places where code has been copied, sometimes with minor modifications, across different parts of the software). 
Tom has experimented with some of these on the\u00a0NASA modelE, with promising results.<\/li>\n<\/ul>\n<p>The next talk was by <strong><a title=\"John's homepage at GFDL\" href=\"http:\/\/www.gfdl.noaa.gov\/john-krasting-homepage\" target=\"_blank\">John Krasting<\/a><\/strong> from GFDL, on <em>&#8220;NOAA-GFDL\u2019s Workflow for CMIP5\/IPCC AR5 Experiments&#8221;<\/em>. I didn&#8217;t take many notes, mainly because the subject was very familiar to me, having visited several modeling labs over the summer, all of which were in the middle of the frantic process of generating their <a title=\"CMIP5 overview\" href=\"http:\/\/cmip-pcmdi.llnl.gov\/cmip5\/\" target=\"_blank\">IPCC CMIP5<\/a> runs (or in some cases struggling to get started).<\/p>\n<p>John explained that CMIP5 is somewhat different from the earlier CMIP projects, because it is much more comprehensive, with a much larger set of model experiments, and a much larger set of model variables requested. CMIP1 focussed on pre-industrial control runs, while CMIP2 added some idealized climate change scenario experiments. For CMIP3, the entire archive (from all modeling centres) was 36 terabytes. For CMIP5, it is expected to be at least two orders of magnitude bigger.\u00a0Because of the larger number of experiments, CMIP5 has a tiered structure, so that some kinds of experiments are prioritized (e.g. see the diagram from <a title=\"A Summary of the CMIP5 Experiment Design\" href=\"http:\/\/cmip-pcmdi.llnl.gov\/cmip5\/docs\/Taylor_CMIP5_design.pdf\" target=\"_blank\">Taylor et al<\/a>).<\/p>\n<p>GFDL is expecting to generate around 15,000 model years of simulation, yielding around 10 petabytes of data, of which around 10-15% will be released to the public, distributed via the ESG Gateway. 
The remainder of the data represents some redundancy, plus diagnostic data that&#8217;s intended for internal analysis.<\/p>\n<p>The final speaker in the session was <strong><a title=\"Archer's homepage at UMich\" href=\"http:\/\/archerb.people.si.umich.edu\/\" target=\"_blank\">Archer Batcheller<\/a><\/strong>, from the University of Michigan, with a talk entitled <em>&#8220;Programming Makes Software; Support Makes Users&#8221;<\/em>. Archer was reporting on the results of a study he has been conducting of several software infrastructure projects in the earth system modeling community. His main observation is that e-Science is about growing socio-technical systems, and that people are a key part of these systems. Nurturing communities of users takes effort, but that effort is crucial for building the scientific cyberinfrastructure.<\/p>\n<p>From his studies, Archer found that most people developing modeling infrastructure software divide their time about 50:50 between coding and other activities, including:<\/p>\n<ul>\n<li>&#8220;selling&#8221; &#8211; explaining\/promoting the software in publications, at conferences, and at community meetings (even though the software is free, it still has to be &#8220;marketed&#8221;)<\/li>\n<li>support &#8211; helping users, which in turn helps with identifying new requirements<\/li>\n<li>training &#8211; including 1-on-1 sessions, workshops, online tutorials, etc.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Here&#8217;s the first of a series of posts from the American Geophysical Union (AGU) Fall meeting, which is happening this week in San Francisco. The meeting is huge &#8211; they&#8217;re expecting 19,000 scientists to attend, making it the largest such meeting in the physical sciences. 
The most interesting session today was a new session for [&hellip;]<\/p>\n","protected":false},"author":392,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[80],"tags":[],"aioseo_notices":[],"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"","_links":{"self":[{"href":"http:\/\/www.easterbrook.ca\/steve\/wp-json\/wp\/v2\/posts\/2080"}],"collection":[{"href":"http:\/\/www.easterbrook.ca\/steve\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.easterbrook.ca\/steve\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.easterbrook.ca\/steve\/wp-json\/wp\/v2\/users\/392"}],"replies":[{"embeddable":true,"href":"http:\/\/www.easterbrook.ca\/steve\/wp-json\/wp\/v2\/comments?post=2080"}],"version-history":[{"count":5,"href":"http:\/\/www.easterbrook.ca\/steve\/wp-json\/wp\/v2\/posts\/2080\/revisions"}],"predecessor-version":[{"id":2085,"href":"http:\/\/www.easterbrook.ca\/steve\/wp-json\/wp\/v2\/posts\/2080\/revisions\/2085"}],"wp:attachment":[{"href":"http:\/\/www.easterbrook.ca\/steve\/wp-json\/wp\/v2\/media?parent=2080"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.easterbrook.ca\/steve\/wp-json\/wp\/v2\/categories?post=2080"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.easterbrook.ca\/steve\/wp-json\/wp\/v2\/tags?post=2080"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}