Reproducible large-scale groundwater modelling projects using the iMOD Python package

crossref(2022)

Abstract
<p><strong>Introduction</strong></p><p>Scripting has many benefits for groundwater model development: scripts form transparent, reproducible steps and make it easy to automate many tasks. The Python scientific ecosystem provides a wealth of open-source libraries, which makes Python particularly suited to model development. However, additional tools are required to create reproducible large-scale groundwater model projects.</p><p><strong>Handling the large scale &#8211; iMOD Python</strong></p><p>Regional models in the Netherlands are characterized by large input and output files, a use case for which FloPy (Bakker et al., 2016) is not specifically tailored. We&#8217;ve built the iMOD Python package on top of modern Python packages such as xarray (Hoyer & Hamman, 2017) to provide greater convenience and performance. Xarray labels array dimensions with coordinates, which suits geospatial datasets; it also supports out-of-memory computation and reads and writes NetCDF files. The iMOD Python package (https://gitlab.com/deltares/imod/imod-python) provides the connection to model-specific file formats (iMOD & MODFLOW6) and a number of additional utilities, such as efficient regridding, GIS operations, and visualization.</p><p><strong>Reproducible projects &#8211; Git, DVC & Snakemake</strong></p><p>Our goal is to make entire model projects reproducible. Scripting makes the individual steps of the complex model development process reproducible, but not the process as a whole. We use Snakemake (K&#246;ster & Rahmann, 2012) as a workflow manager to combine individual scripts into an explicit workflow. Snakemake automatically determines the dependencies between the different steps and detects which parts of a workflow need to be executed when, for example, input data or steps change. Code and data version control keeps track of the project&#8217;s history. 
Version control serves as a logbook of past decisions. It also allows the project to be reverted to a previous state with a single command, which makes it easy to compare the results of two versions of the project. This is crucial for projects that span multiple years or involve multiple contributors. Online code repositories enable file sharing and further support collaboration through issue boards and browsable code.</p><p>In this presentation, we demonstrate a reproducible project for a regional model built with open-source tools. The project is shared on GitLab (https://gitlab.com/deltares/imod/california_model), from which users can clone the code to their machine using the version control software Git (Chacon & Straub, 2014). They can then fetch the data onto their machine with the data version control software DVC (https://dvc.org/). The complete workflow, from raw data to 3D images of the model output, relies heavily on iMOD Python and is executed with Snakemake.</p><p><strong>References</strong></p><p>Bakker, M., Post, V., Langevin, C. D., Hughes, J. D., White, J. T., Starn, J. J., & Fienen, M. N. (2016). Scripting MODFLOW Model Development Using Python and FloPy. Groundwater. https://doi.org/10.1111/gwat.12413</p><p>Chacon, S., & Straub, B. (2014). Pro Git (2nd ed.). Apress. https://doi.org/10.1007/978-1-4302-1834-0</p><p>Hoyer, S., & Hamman, J. J. (2017). xarray: N-D labeled Arrays and Datasets in Python. Journal of Open Research Software, 5, 1&#8211;6. https://doi.org/10.5334/jors.148</p><p>K&#246;ster, J., & Rahmann, S. (2012). Snakemake &#8211; a scalable bioinformatics workflow engine. Bioinformatics, 28(19), 2520&#8211;2522. https://doi.org/10.1093/bioinformatics/bts480</p>
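The abstract states that Snakemake determines the dependencies between workflow steps and detects which part of a workflow needs to be executed again. A minimal plain-Python sketch of that idea is given here. It assumes only the classic make-style criterion based on file modification times: a step runs if one of its outputs is missing, if an input is newer than its oldest output, or if an upstream step runs. This is not Snakemake's API or implementation. The `Rule` class, the `jobs_to_run` function, the rule names, and the file paths are all hypothetical. Changes to the steps themselves, which the abstract says Snakemake also detects, are ignored here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical workflow step that turns `inputs` into `outputs`."""
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def jobs_to_run(target: str, rules: list[Rule], mtime: dict[str, float]) -> list[str]:
    """Return the names of the rules that must run to bring `target` up to date.

    `mtime` maps each existing file to its modification time; files that do
    not exist are simply absent. A rule must run if an upstream rule runs,
    if one of its outputs is missing, or if one of its inputs is newer than
    its oldest output (the make-style staleness test assumed here).
    """
    # Infer the dependency graph: each output file points to the rule producing it.
    producer = {out: rule for rule in rules for out in rule.outputs}
    decided: dict[str, bool] = {}  # rule name -> must it run?
    order: list[str] = []          # rules to run, upstream steps first

    def visit(rule: Rule, active: set[str]) -> bool:
        if rule.name in decided:
            return decided[rule.name]
        if rule.name in active:
            raise ValueError(f"cyclic dependency at rule {rule.name!r}")
        active.add(rule.name)
        upstream_runs = False
        for inp in rule.inputs:
            if inp in producer:
                # Call visit first so every upstream rule gets decided (no short-circuit).
                upstream_runs = visit(producer[inp], active) or upstream_runs
            elif inp not in mtime:
                raise FileNotFoundError(f"missing raw input {inp!r}")
        active.discard(rule.name)

        if upstream_runs or any(out not in mtime for out in rule.outputs):
            run = True
        else:
            # All inputs exist here: raw inputs were checked, produced ones are up to date.
            oldest_output = min(mtime[out] for out in rule.outputs)
            run = any(mtime[inp] > oldest_output for inp in rule.inputs)
        decided[rule.name] = run
        if run:
            order.append(rule.name)  # appended after its upstream rules
        return run

    if target in producer:
        visit(producer[target], set())
    return order


# Hypothetical rules mirroring a "raw data -> 3D images of model output" workflow.
RULES = [
    Rule("regrid", ("raw/conductivity.tif",), ("interim/conductivity.nc",)),
    Rule("build_model", ("interim/conductivity.nc",), ("model/model.nam",)),
    Rule("run_model", ("model/model.nam",), ("output/heads.nc",)),
    Rule("plot_3d", ("output/heads.nc",), ("figures/heads_3d.png",)),
]

if __name__ == "__main__":
    times = {"raw/conductivity.tif": 1, "interim/conductivity.nc": 2,
             "model/model.nam": 3, "output/heads.nc": 4, "figures/heads_3d.png": 5}
    print(jobs_to_run("figures/heads_3d.png", RULES, times))  # []
    times["model/model.nam"] = 10  # e.g. the model input was edited by hand
    print(jobs_to_run("figures/heads_3d.png", RULES, times))  # ['run_model', 'plot_3d']
```

In this toy example, editing only the model input reruns just the simulation and the plot, and the upstream regridding is left alone. This kind of partial re-execution is what keeps updates to a large workflow cheap.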