There are four obstacles to the application of machine learning in subsurface geoscience, data science and engineering.  As a result, it is not generally possible to implement off-the-shelf solutions (methods and workflows) from other technical fields.  On the bright side, there are opportunities to map research activities to each of these obstacles and achieve great impact in the field of subsurface modeling.  My students are currently working on these at the University of Texas at Austin, and I’m always happy to discuss collaboration.  I call these obstacles the four horsemen of subsurface machine learning.

The Black Horse – Data Paucity

Often the project team is swimming in data.  In fact, many projects are bogged down by the workload of the data preparation tasks that typically consume 80% of the time allocated to a subsurface study.  Yet, if you consider the scale and complexity of the subsurface system, the data is almost always sparse.  How can that be, given all the data?

Heterogeneity – Our systems are heterogeneous and full of surprises (known as nonstationarities).  Invariably the degree of heterogeneity is high relative to the spacing and resolution of the data.  What does that mean?  There is a lot in the subsurface that we cannot observe; therefore, the uncertainty is high!

Curse of dimensionality – The subsurface description requires working with many properties.  It is common to consider local paleoenvironment and depositional conditions, multiple scales of facies, porosity, directional permeability, fluid saturation, seismic-based elastic properties and geomechanical properties.  Unconventional reservoirs add further properties such as total organic carbon and vitrinite reflectance as a measure of maturity.  With all these dimensions, even datasets with thousands of wells are quite sparsely sampled.  Also, with this multidimensional complexity, it is likely that any model enters the extrapolation-beyond-training regime more often than anticipated.
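
To make the sparsity concrete, here is a minimal Python sketch (an illustration, not taken from any specific study).  It assumes a hypothetical 1,000 wells sampled uniformly at random in a unit feature space and reports the mean distance from each sample to its nearest neighbor as the number of properties grows.  The uniform sampling, the sample count and the function name are all assumptions made for the illustration.

```python
import numpy as np

def mean_nearest_neighbor_distance(n_samples, n_dims, seed=0):
    """Mean distance from each sample to its nearest neighbor in the unit hypercube."""
    rng = np.random.default_rng(seed)
    x = rng.random((n_samples, n_dims))  # hypothetical samples, uniform in [0, 1)^n_dims
    # Pairwise squared distances via |a - b|^2 = |a|^2 + |b|^2 - 2 a.b
    sq = np.sum(x**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    d2 = np.maximum(d2, 0.0)       # guard against small negative round-off
    np.fill_diagonal(d2, np.inf)   # ignore each sample's distance to itself
    return float(np.mean(np.sqrt(d2.min(axis=1))))

n_wells = 1000  # assumed dataset size
results = {m: mean_nearest_neighbor_distance(n_wells, m) for m in (1, 2, 4, 8, 16)}

for m, d in results.items():
    # Express the distance relative to the hypercube diagonal, sqrt(m)
    print(f"{m:2d} features: mean nearest-neighbor distance = {d:.4f} "
          f"({d / np.sqrt(m):.1%} of the diagonal)")
```

With the number of wells held fixed, the nearest sample drifts farther away as features are added, so a growing share of predictions is made far from any training data.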

The Red Horse – Interpretation

Subsurface data integration usually requires a thick layer of irreducible interpretation based on domain expertise and experience.  Well logs require significant petrophysical modeling to yield reservoir properties and seismic response requires significant geophysical modeling to yield attributes that are locally informative of rock and fluid properties.  Furthermore, consider the development of stratigraphic frameworks essential to reservoir prediction and mapping and the integration of production data, the ultimate ground truth for any reservoir forecasting.  It is difficult to find any part of subsurface modeling that is not strongly dependent on interpretation.  How does this impact the use of machine learning?

Objectivity – it is not possible to remove all subjectivity in subsurface modeling.  Matheron (1989) taught improving objectivity by removing unsupported assumptions, but then yielded to the realization that we do not have access to complete objectivity.  We must include these subjective interpretations and expert decisions in all our models as they add value.

Metadata – in the realm of machine learning with standard quantitative and qualitative (categorical) inputs, we are challenged to transmit this important interpretation, along with assessments of its quality and uncertainty (veracity), through the model.  At times, this uncertainty is quite discontinuous, and we must deal with discrete scenarios.

The Pale Horse – Complicated Physics

There is a physical reality to the subsurface rock and fluid system.  Uncertainty is due to our ignorance, caused by an inability to observe enough of the system or by limited understanding of the physical processes.  Yet, there is much that we understand about the physics of geologic heterogeneity, geophysical rock and fluid response, and engineering fluid flow through porous media.  The subsurface is not a system like clicks on a website, void of physical explanation.  Any machine learning approach that naïvely omits the physics will inflate uncertainties and potentially produce non-physical results.

Integration of Physics – most machine learning approaches (barring computer vision) start with a data table.  In the absence of spatial descriptions, the data is treated as a multivariate set of independent, identically distributed samples.  We have found in our research (Nwachukwu et al., 2018) that the inclusion of spatial information and, furthermore, the concepts of spatial continuity and connectivity significantly improves model performance.
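
The effect of discarding spatial context can be sketched with a small synthetic experiment in Python.  This is not the method of Nwachukwu et al. (2018); it is a minimal illustration under assumed conditions: hypothetical well locations, a made-up spatially continuous trend, one non-spatial attribute and a simple k-nearest-neighbor predictor.  All names and parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic, hypothetical data: 600 wells in a 1 km x 1 km area
n = 600
xy = rng.random((n, 2)) * 1000.0                              # well coordinates (m)
trend = np.sin(xy[:, 0] / 300.0) + np.cos(xy[:, 1] / 400.0)   # spatially continuous part
attribute = rng.normal(size=n)                                # non-spatial predictor
response = trend + 0.3 * attribute + rng.normal(scale=0.1, size=n)

# Simple holdout split: first 450 wells train, last 150 test
train = np.arange(n) < 450
test = ~train

def standardize(train_X, test_X):
    """Scale features with the training mean and standard deviation."""
    mu, sd = train_X.mean(axis=0), train_X.std(axis=0)
    return (train_X - mu) / sd, (test_X - mu) / sd

def knn_predict(train_X, train_y, test_X, k=5):
    """Average the response of the k nearest training samples in feature space."""
    d = np.linalg.norm(test_X[:, None, :] - train_X[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return train_y[nearest].mean(axis=1)

def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

feature_sets = {
    "table only": attribute[:, None],                       # samples treated as i.i.d.
    "table + coordinates": np.column_stack([xy, attribute]),  # spatial context added
}

scores = {}
for name, X in feature_sets.items():
    tr, te = standardize(X[train], X[test])
    scores[name] = rmse(knn_predict(tr, response[train], te), response[test])
    print(f"{name:20s} test RMSE = {scores[name]:.3f}")
```

In this toy setting, the predictor that sees only the data table cannot explain the spatially continuous component, while the one given coordinates can borrow from nearby wells.  Real subsurface problems call for richer descriptions of continuity and connectivity than raw coordinates.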

The White Horse – Expensive Decisions

The subsurface is modeled for decision support.  Bentley (2015) challenges us to model for discomfort: not to use models to support preconceived ideas, but to use them to challenge our views and to discover the upside potential and the critical risks associated with the downside.  Subsurface decisions are expensive; deepwater wells cost hundreds of millions of dollars (US$) each, and a change in recovery factor of one percent can impact the recovered resource by tens of millions of barrels.  These decisions are not the same as Amazon’s recommendation engine deciding what product to advertise or Spotify matching a song to listening history.  Subsurface development decisions are expensive and often irreducible, and they impact large companies and nations.  If modeling is decision support, then the model must be understandable, transparent and defendable.
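
The recovery factor arithmetic is easy to check.  The short Python sketch below multiplies an in-place volume by a one percentage point change in recovery factor; the in-place volumes are hypothetical round numbers chosen for illustration, not data from any field.

```python
def recovered_volume_change(oil_in_place_bbl, delta_recovery_factor):
    """Change in recovered volume (barrels) for a change in recovery factor (fraction)."""
    return oil_in_place_bbl * delta_recovery_factor

# Hypothetical in-place volumes, in barrels
for oil_in_place in (1e9, 3e9, 5e9):
    delta = recovered_volume_change(oil_in_place, 0.01)  # one percentage point
    print(f"{oil_in_place / 1e9:.0f} billion bbl in place -> "
          f"{delta / 1e6:.0f} million bbl per 1% recovery factor")
```

Under these assumed volumes, a single percentage point of recovery factor is worth 10 to 50 million barrels.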

Authority – workflows may be adopted and become routine.  At this point they are ascribed authority.  The workflow results become the standard, and to deviate from the workflow requires explanation.  This is dangerous if the model and its limitations are not completely understood.  The model must be understandable and transparent to checking, quality control and performance diagnostics.

Creativity – machine learning is full of creative solutions.  Natural analogs flourish based on insect swarms, visual perception, neural cognition and forest systems.  Novel solutions build out with more layers, recurrent paths, bagging and boosting etc.  Likewise, creativity is required for solutions in subsurface modeling, where new things are encountered all the time with new subsurface objectives, data and physics.  Creativity is central to long-term success in subsurface-related industries.  It is essential that subsurface creativity is preserved in new machine learning-based approaches.

Bentley, M., 2015, Modelling for Comfort, Petroleum Geoscience, DOI: 10.1144/petgeo2014-089.

Matheron, G., 1989, Estimating and Choosing, Springer-Verlag Berlin Heidelberg, p. 141.

Nwachukwu, A., Jeong, H., Pyrcz, M.J. and Lake, L.W., 2018, Fast evaluation of well placements in heterogeneous reservoir models using machine learning, Journal of Petroleum Science and Engineering 163, 463-475.

The Four Horsemen of Subsurface Machine Learning

I could write a lengthy essay on my experience and feelings about this topic, or I could just give you a very concise list of the reasons.  I originally wrote and posted the list below on Twitter (@GeostatsGuy) in Q3 2017 as a response to Prof. Brian Romans, a good friend and well-known stratigrapher at Virginia Tech.  Go here to check out Prof. Romans's work:




I have assumed that engineers need no convincing.  My undergraduate engineering curriculum included one dedicated coding class and 2-3 classes that required extensive coding on the regular assignments.  Then came a computational geostatistics Ph.D. supervised by Prof. Clayton Deutsch, with a lot of Fortran, followed by learning the basics of C++ in a couple of weeks so I could code for Chevron during an internship.  Python and R happened later.


Subsequently, I have realized, to my surprise, that not all engineers enjoy coding.  This is extended to them as well.  Also, I want to acknowledge that not all geologists need convincing.  Consider the computational geologists who code for their numerical experiments (e.g. Profs. Chris Paola, David Mohrig and Kyle Straub's experimental stratigraphy groups), and those brave observational, outcrop-oriented geologists who recognize the benefit of coding for automation and quantification (more on quantification later).  I hope this is helpful.  I'm always happy to discuss.

MICHAEL J. PYRCZ, Ph.D., P.Eng.,  Associate Professor
H.B. Harkings, Jr. Professor of Petroleum Engineering 

Hildebrand Department of Petroleum and Geosystems Engineering and Bureau of Economic Geology, Jackson School of Geosciences

The University of Texas at Austin

Geoscientists and Geo-engineers Need to Code