Reproducible large-scale groundwater modelling projects using the iMOD Python package

Joeri van Engelen, Joost Delsman, Huite Bootsma

Crossref (2022)

Abstract

Introduction

Scripting has many benefits for groundwater model development: scripts form transparent, reproducible steps and enable easy automation of many tasks. The Python scientific ecosystem provides a wealth of open-source libraries, which makes Python particularly suited for model development. However, we require additional tools to create reproducible large-scale groundwater model projects.

Handling the large scale – iMOD Python

Regional models in the Netherlands are characterized by large input and output files; FloPy (Bakker et al., 2016) is not especially tailored to such use. We have built the iMOD Python package on top of modern Python packages such as xarray (Hoyer & Hamman, 2017) to provide greater convenience and performance. Xarray labels dimensions with coordinates for geospatial datasets, supports out-of-core computation on datasets larger than memory, and reads and writes NetCDF files. The iMOD Python package (https://gitlab.com/deltares/imod/imod-python) provides the connection to model-specific file formats (iMOD & MODFLOW6) and a number of additional utilities such as efficient regridding, GIS operations, and visualization.

Reproducible projects – Git, DVC & Snakemake

Our goal is to make entire model projects reproducible. Scripting succeeds in making individual steps of the complex model development process reproducible, but not the project as a whole. We use Snakemake (Köster & Rahmann, 2012) as workflow manager to tie individual scripts together into an explicit workflow. Snakemake automatically determines dependencies between the different steps, and detects which part of a workflow needs to be executed when, for example, some input data or steps change. Code and data version control keeps track of the history of the project. It functions as a log book of past decisions, and also allows reverting the project to a previous state with one command, which makes it easy to compare the results of two versions of the project. This is crucial for projects with a multiple-year duration and/or projects with multiple contributors. Online code repositories allow sharing of files, and further support collaboration with issue boards and by making code browsable.

In this presentation we show a reproducible project of a regional model, built with open-source tools. The project is shared on GitLab (https://gitlab.com/deltares/imod/california_model), where a user can pull the code to their machine using the version control software Git (Chacon & Straub, 2014). Subsequently, the user can fetch the data onto their machine with the data version control software DVC (https://dvc.org/). The complete workflow from raw data to 3D images of the model output relies heavily on iMOD Python and is executed with Snakemake.

References

Bakker, M., Post, V., Langevin, C. D., Hughes, J. D., White, J. T., Starn, J. J., & Fienen, M. N. (2016). Scripting MODFLOW Model Development Using Python and FloPy. Groundwater. https://doi.org/10.1111/gwat.12413

Chacon, S., & Straub, B. (2014). Pro Git (2nd ed.). Apress. https://doi.org/10.1007/978-1-4302-1834-0

Hoyer, S., & Hamman, J. J. (2017). xarray: N-D labeled Arrays and Datasets in Python. Journal of Open Research Software, 5, 1–6. https://doi.org/10.5334/jors.148

Köster, J., & Rahmann, S. (2012). Snakemake-a scalable bioinformatics workflow engine. Bioinformatics, 28(19), 2520–2522. https://doi.org/10.1093/bioinformatics/bts480
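Illustrative note on workflow dependencies

The abstract states that Snakemake determines dependencies between steps and detects which part of a workflow needs to be executed when input data change. The sketch below illustrates that general idea with the Python standard library only. It is not Snakemake's implementation and is not taken from the project repository: the `Step` class, the `stale_steps` function, the file names, and the timestamp-based rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One workflow step: a script that turns input files into output files."""
    name: str
    inputs: tuple
    outputs: tuple


def stale_steps(steps, mtimes):
    """Return the names of steps that must be re-run, in execution order.

    `steps` must already be listed in a valid execution order (producers
    before consumers). `mtimes` maps a file name to its modification time;
    a file that does not exist yet is simply absent from the mapping.
    """
    dirty = set()  # files that will be regenerated during this run
    to_run = []
    for step in steps:
        # An output that does not exist always forces a run.
        missing = any(out not in mtimes for out in step.outputs)
        # An input that an earlier stale step will regenerate forces a run.
        upstream_dirty = any(inp in dirty for inp in step.inputs)
        # An input newer than the oldest existing output forces a run.
        existing = [mtimes[out] for out in step.outputs if out in mtimes]
        newer_input = bool(existing) and any(
            mtimes.get(inp, 0) > min(existing) for inp in step.inputs
        )
        if missing or upstream_dirty or newer_input:
            to_run.append(step.name)
            dirty.update(step.outputs)
    return to_run


# Hypothetical three-step workflow: regrid raw data, run the model, plot.
workflow = [
    Step("regrid", ("raw_heads.nc",), ("regridded.nc",)),
    Step("run_model", ("regridded.nc",), ("heads_out.nc",)),
    Step("plot", ("heads_out.nc",), ("figure.png",)),
]
mtimes = {"raw_heads.nc": 1, "regridded.nc": 2, "heads_out.nc": 3, "figure.png": 4}

up_to_date = stale_steps(workflow, mtimes)

# Simulate new raw data arriving: every downstream step becomes stale.
mtimes["raw_heads.nc"] = 10
after_raw_change = stale_steps(workflow, mtimes)
```

In this toy setup nothing is re-run while all outputs are newer than their inputs, and touching the raw data marks every downstream step as stale. A workflow manager adds much more on top of this principle, such as inferring the execution order from file names and reacting to changed code or parameters.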
