Cross-Dataset Validation in Geotechnical Databases

Figure: Cross-dataset validation in geotechnical databases, showing lithology versus sampling, recovery versus RQD, SPT consistency, well construction validation, coordinate cross-checking, and an automated QA/QC dashboard.

Why Data Relationships Matter More Than Individual Data Fields

Modern geotechnical, geological, environmental, and mining projects generate vast amounts of interconnected data. A single borehole may contain lithology logs, sampling records, laboratory results, recovery measurements, Rock Quality Designation (RQD) values, Standard Penetration Test (SPT) results, well construction details, survey information, and spatial coordinates.

Most data validation systems focus on checking individual fields, for example ensuring that a depth is positive or that a required field is populated. These checks are important, but they detect only a portion of potential data quality issues.

Many of the most serious errors occur when related datasets become inconsistent with one another. A lithology interval may not align with sample intervals. Recovery values may contradict RQD measurements. Well construction records may extend beyond the borehole depth. Coordinates may place a borehole outside the project boundary.

Cross-dataset validation addresses these problems by examining relationships between multiple datasets rather than validating each table independently.

This article explores the importance of cross-dataset validation in geotechnical databases and examines common validation scenarios involving lithology, sampling, recovery, RQD, SPT data, well construction records, and spatial coordinates.


What Is Cross-Dataset Validation?

Cross-dataset validation is the process of verifying consistency between related data tables within a geotechnical database.

Unlike simple field validation, which evaluates a single value, cross-dataset validation compares information across multiple records and datasets.

Examples include:

  • Sample intervals should fall within lithology intervals.
  • RQD cannot exceed recovery.
  • Screen intervals should remain within borehole depth.
  • SPT tests should occur within valid sampling intervals.
  • Coordinates should fall within project boundaries.

These checks ensure that the database represents a logically consistent interpretation of the borehole rather than a collection of isolated records.


Why Cross-Dataset Validation Is Important

Data quality problems often emerge when information is entered by multiple users over long project durations.

For example:

  • A geologist enters lithology.
  • A technician records recovery.
  • A driller enters well construction details.
  • A surveyor updates coordinates.
  • A laboratory uploads testing results.

Each dataset may appear valid independently while still containing inconsistencies when compared with related data.

Without cross-dataset validation, these issues may remain hidden until:

  • Resource estimation begins
  • Geotechnical designs are developed
  • Regulatory reports are prepared
  • Construction decisions are made

Finding problems late in a project is significantly more expensive than identifying them during data entry.


Lithology vs Sampling Validation

One of the most common cross-dataset checks involves comparing lithology intervals with sample intervals.

Lithology provides the geological interpretation of the subsurface, while sampling records represent physical material collected for testing.

The two datasets should align logically.


Sample Intervals Within Borehole Limits

Every sample interval should fall within the borehole depth.

Example:

Borehole Depth    Sample Interval
50 m              10–12 m
50 m              25–26 m
50 m              48–55 m

The final sample exceeds the borehole depth and should be flagged.

Validation Rule:

SampleTo <= BoreholeDepth
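A minimal Python sketch of this rule is shown below. It assumes, hypothetically, that sample records are dictionaries with `sample_id`, `from_m`, and `to_m` keys. The extra check for inverted intervals (From not shallower than To) is an added assumption, not part of the rule as stated.

```python
def check_sample_depths(samples, borehole_depth):
    """Return (sample_id, message) pairs for samples that break SampleTo <= BoreholeDepth.

    `samples` uses hypothetical keys: sample_id, from_m, to_m (depths in metres).
    """
    issues = []
    for s in samples:
        # Assumed extra check: an interval must have a positive length.
        if s["from_m"] >= s["to_m"]:
            issues.append((s["sample_id"], "From depth is not shallower than To depth"))
        # The rule from the text: the sample must end at or above the base of the hole.
        if s["to_m"] > borehole_depth:
            issues.append((s["sample_id"],
                           f"SampleTo {s['to_m']} m exceeds borehole depth {borehole_depth} m"))
    return issues


# The three intervals from the table above; the sample IDs are illustrative.
samples = [
    {"sample_id": "S-1", "from_m": 10.0, "to_m": 12.0},
    {"sample_id": "S-2", "from_m": 25.0, "to_m": 26.0},
    {"sample_id": "S-3", "from_m": 48.0, "to_m": 55.0},
]
print(check_sample_depths(samples, borehole_depth=50.0))
# [('S-3', 'SampleTo 55.0 m exceeds borehole depth 50.0 m')]
```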

Sample Coverage Verification

Sampling intervals often need to correspond to logged lithology intervals.

Example:

Lithology    From (m)    To (m)
Clay         0           5
Sand         5           12
Till         12          20

Sample:

Sample    From (m)    To (m)
S-101     5           7

The sample falls entirely within the sand interval and is valid.

If a sample spans multiple lithology units:

Sample    From (m)    To (m)
S-102     4.8         6.5

The sample crosses a lithology boundary and may require review.

Such situations are not necessarily incorrect but should be identified for verification.
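The comparison can be sketched in Python as a containment test followed by an overlap test. This is an illustrative sketch, assuming lithology is held as a list of `(unit, from_m, to_m)` tuples; the status strings are hypothetical labels, with boundary crossings and partial coverage reported for review rather than rejected.

```python
# Lithology log from the example above: (unit, from_m, to_m).
LITHOLOGY = [("Clay", 0.0, 5.0), ("Sand", 5.0, 12.0), ("Till", 12.0, 20.0)]


def classify_sample(sample_from, sample_to, lithology):
    """Compare one sample interval with the lithology log and return a status string."""
    # Valid: the sample lies entirely inside a single lithology interval.
    for unit, top, base in lithology:
        if top <= sample_from and sample_to <= base:
            return f"OK: within {unit}"
    # Otherwise list every unit the sample overlaps (merely touching a boundary does not count).
    units = [unit for unit, top, base in lithology if sample_from < base and sample_to > top]
    if len(units) > 1:
        return "REVIEW: crosses " + "/".join(units) + " boundary"
    if units:
        return f"REVIEW: only partly logged ({units[0]})"
    return "ERROR: no lithology logged"


print(classify_sample(5.0, 7.0, LITHOLOGY))  # S-101 -> OK: within Sand
print(classify_sample(4.8, 6.5, LITHOLOGY))  # S-102 -> REVIEW: crosses Clay/Sand boundary
```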


Missing Lithology Coverage

Every sampled interval should have corresponding geological logging.

Validation should detect:

  • Samples without lithology
  • Gaps in geological interpretation
  • Unclassified intervals

These conditions can impact geological modeling and reporting.
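Gaps and unclassified intervals can be found by walking the lithology log from the collar downward. The sketch below assumes, hypothetically, that intervals are `(from_m, to_m, code)` tuples, that logging starts at 0 m, and that a small tolerance (0.01 m here, an arbitrary choice) absorbs depth rounding.

```python
def lithology_issues(intervals, borehole_depth, tol=0.01):
    """Find gaps and unclassified intervals in a lithology log.

    intervals: list of (from_m, to_m, code) tuples; an empty code means unclassified.
    Logging is assumed to start at the collar (0 m); tol absorbs rounding in depths.
    """
    issues = []
    covered_to = 0.0
    for top, base, code in sorted(intervals, key=lambda iv: (iv[0], iv[1])):
        if top > covered_to + tol:
            issues.append(("gap", covered_to, top))
        if not (code or "").strip():
            issues.append(("unclassified", top, base))
        covered_to = max(covered_to, base)
    # Anything between the deepest logged interval and the base of the hole is also a gap.
    if borehole_depth > covered_to + tol:
        issues.append(("gap", covered_to, borehole_depth))
    return issues


log = [(0, 5, "CLAY"), (5, 12, "SAND"), (14, 20, "TILL"), (20, 22, "")]
print(lithology_issues(log, borehole_depth=25))
# [('gap', 12, 14), ('unclassified', 20, 22), ('gap', 22, 25)]
```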


Recovery vs RQD Validation

Core recovery and Rock Quality Designation (RQD) are among the most important rock quality indicators in geotechnical investigations.

Because RQD is derived from recovered core, the two values must remain logically consistent.


Understanding the Relationship

Recovery represents:

Recovery (%) = (Recovered Core Length / Run Length) × 100

RQD represents:

RQD (%) = (Sum of Lengths of Sound Core Pieces Longer Than 10 cm / Run Length) × 100

Because the sound pieces counted in RQD are a subset of the recovered core, and both values are divided by the same run length, RQD cannot exceed recovery.
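Both formulas can be computed from the same core log. This sketch assumes a hypothetical input format in which each recovered piece is a `(length_cm, is_sound)` tuple; only sound pieces longer than 10 cm count toward RQD, following the definition above.

```python
def recovery_and_rqd(run_length_cm, pieces):
    """Compute core recovery and RQD (both in %) for one core run.

    pieces: list of (length_cm, is_sound) tuples describing the recovered core.
    Only sound pieces longer than 10 cm contribute to RQD.
    """
    recovered = sum(length for length, _ in pieces)
    sound_long = sum(length for length, is_sound in pieces if is_sound and length > 10)
    recovery = recovered / run_length_cm * 100
    rqd = sound_long / run_length_cm * 100
    return recovery, rqd


# A 150 cm run: 130 cm recovered, of which 95 cm is in sound pieces longer than 10 cm.
pieces = [(40, True), (35, True), (20, True), (8, True), (15, False), (12, False)]
recovery, rqd = recovery_and_rqd(150, pieces)
print(f"Recovery {recovery:.1f} %, RQD {rqd:.1f} %")  # Recovery 86.7 %, RQD 63.3 %
```

Because `sound_long` sums a subset of the pieces in `recovered` and both share the same denominator, a calculated RQD can never exceed recovery. The rule therefore matters most when the two values are typed in or imported separately.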


Validation Rule

RQD <= Recovery

Examples:

Recovery (%)    RQD (%)    Result
95              82         Valid
78              70         Valid
65              90         Invalid

The final example is physically impossible and should generate an error.
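When recovery and RQD are entered as separate values, the rule reduces to one comparison per core run. This is a minimal sketch assuming both values are stored as percentages.

```python
def rqd_rule(recovery_pct, rqd_pct):
    """Rule from the text: RQD <= Recovery. Returns 'Valid' or 'Invalid'."""
    return "Valid" if rqd_pct <= recovery_pct else "Invalid"


# (recovery %, RQD %) pairs from the table above.
rows = [(95, 82), (78, 70), (65, 90)]
print([rqd_rule(rec, rqd) for rec, rqd in rows])  # ['Valid', 'Valid', 'Invalid']
```

No rounding tolerance is needed as long as both values are rounded to the same number of decimal places, because rounding preserves the inequality.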


Recovery and Rock Type Relationships

Cross-dataset validation may also identify unusual combinations.

Examples include:

  • Very high recovery in highly weathered rock
  • Extremely low recovery in competent granite
  • Sudden recovery changes without lithological explanation

These conditions often warrant review.


SPT Consistency Validation

Standard Penetration Test (SPT) data is widely used in geotechnical investigations.

SPT values are often entered separately from lithology and sampling datasets.

Cross-dataset validation helps ensure consistency.


SPT Depth Verification

SPT tests should occur within borehole limits.

Example:

Borehole Depth    SPT Depth
20 m              18.5 m
20 m              25 m

The second record exceeds borehole depth and should be rejected.


SPT and Sample Alignment

Many organizations collect SPT data within sampling intervals.

Validation can verify:

  • SPT depth falls inside a sample interval
  • Sample identifiers exist
  • Associated recovery records are present

Missing relationships often indicate data entry errors.
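A sketch of these SPT cross-checks, combined with the borehole depth check above, might look like the following. The record layout (`depth_m` and `sample_id` keys, and samples keyed by ID) is a hypothetical assumption; checks for associated recovery records would follow the same pattern.

```python
def check_spt(spt_tests, samples, borehole_depth):
    """Cross-check SPT records against borehole depth and sample intervals.

    spt_tests: list of dicts with hypothetical keys 'depth_m' and 'sample_id'.
    samples: dict mapping sample_id -> (from_m, to_m).
    """
    issues = []
    for test in spt_tests:
        depth, sample_id = test["depth_m"], test.get("sample_id")
        if depth > borehole_depth:
            issues.append((depth, "SPT depth exceeds borehole depth"))
        if sample_id not in samples:
            issues.append((depth, f"sample {sample_id!r} not found"))
        elif not samples[sample_id][0] <= depth <= samples[sample_id][1]:
            issues.append((depth, f"depth outside sample {sample_id} interval"))
    return issues


samples = {"SS-1": (18.5, 18.95)}
spt_tests = [{"depth_m": 18.5, "sample_id": "SS-1"}, {"depth_m": 25.0, "sample_id": "SS-2"}]
print(check_spt(spt_tests, samples, borehole_depth=20.0))
```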


Duplicate SPT Depths

Multiple SPT tests recorded at the same depth in the same borehole may indicate duplicate records.

Example:

SPT Depth (m)
10.0
10.0
10.0

Cross-dataset validation can identify duplicates automatically.
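Duplicate detection amounts to counting records per borehole and depth. This sketch assumes records are `(borehole_id, depth_m)` tuples and rounds depths to two decimals (an arbitrary choice) so that floating-point noise does not hide duplicates. Grouping by borehole matters because equal depths in different boreholes are normal.

```python
from collections import Counter


def duplicate_spt_depths(records, precision=2):
    """records: iterable of (borehole_id, depth_m). Returns {(borehole_id, depth): count} for repeats."""
    counts = Counter((bh, round(depth, precision)) for bh, depth in records)
    return {key: n for key, n in counts.items() if n > 1}


records = [("BH-1", 10.0), ("BH-1", 10.0), ("BH-1", 10.0), ("BH-1", 11.5), ("BH-2", 10.0)]
print(duplicate_spt_depths(records))  # {('BH-1', 10.0): 3}
```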


SPT Value Range Checks

Range checks on N-values are technically field-level validation, but comparing them with lithology provides valuable context.

Examples:

  • N-value of 2 in dense gravel
  • N-value of 100 in soft clay

Such combinations may be valid but should be reviewed.
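One way to implement this review is to compare each N-value with the range commonly associated with the logged density or consistency descriptor. The ranges below follow the widely tabulated correlations after Terzaghi and Peck, but treat them as an assumed, configurable table. The sketch also assumes the log stores the descriptor (for example "dense" or "soft") as a separate field.

```python
# Approximate N-value ranges per density/consistency descriptor, as commonly tabulated
# (after Terzaghi and Peck). Assumed configuration; organizations often use their own ranges.
EXPECTED_N = {
    "very loose": (0, 4), "loose": (4, 10), "medium dense": (10, 30),
    "dense": (30, 50), "very dense": (50, None),
    "very soft": (0, 2), "soft": (2, 4), "firm": (4, 8),
    "stiff": (8, 15), "very stiff": (15, 30), "hard": (30, None),
}


def check_n_value(descriptor, n_value):
    """Return a warning if N is outside the range expected for the logged descriptor, else None."""
    expected = EXPECTED_N.get(descriptor.strip().lower())
    if expected is None:
        return None  # descriptor not covered by this rule
    low, high = expected
    if n_value < low or (high is not None and n_value > high):
        range_text = f"{low}+" if high is None else f"{low}-{high}"
        return f"Warning: N={n_value} unusual for '{descriptor}' (expected {range_text})"
    return None


print(check_n_value("dense", 2))   # flags the dense gravel example
print(check_n_value("soft", 100))  # flags the soft clay example
```

The function returns warnings rather than errors, matching the point that such combinations may still be valid.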


Well Construction Validation

Environmental and hydrogeological projects frequently include well construction information.

These records must remain consistent with borehole geometry.


Screen Interval Validation

Well screens must fit within the drilled borehole.

Example:

Borehole Depth    Screen Interval
30 m              10–20 m
30 m              25–35 m

The second screen extends beyond the borehole depth and is invalid.


Screen and Casing Overlap Checks

Validation should verify:

  • Screen intervals do not overlap casing intervals improperly
  • Filter packs surround screens
  • Annular seals are correctly positioned

These checks improve confidence in monitoring well design.
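These checks reduce to interval comparisons. The sketch below is illustrative only. It assumes each component is a hypothetical `(top_m, bottom_m)` tuple measured below ground, treats "casing" as the blank riser above the screen, and also applies the screen-within-borehole check from the previous section.

```python
def overlaps(a, b):
    """True if depth intervals a and b, each (top_m, bottom_m), share more than a single point."""
    return a[0] < b[1] and b[0] < a[1]


def contains(outer, inner):
    """True if interval `inner` lies entirely within interval `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def check_well(borehole_depth, screen, casing, filter_pack):
    """Cross-check well components (depths below ground, in metres) and return issue strings."""
    issues = []
    if screen[1] > borehole_depth:
        issues.append("screen extends below borehole depth")
    if overlaps(screen, casing):
        issues.append("screen overlaps blank casing")
    if not contains(filter_pack, screen):
        issues.append("filter pack does not fully surround screen")
    return issues


print(check_well(30.0, screen=(10.0, 20.0), casing=(0.0, 10.0), filter_pack=(8.0, 21.0)))  # []
print(check_well(30.0, screen=(25.0, 35.0), casing=(0.0, 25.0), filter_pack=(23.0, 30.0)))
```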


Construction Sequence Validation

A typical monitoring well contains:

  1. Borehole
  2. Screen
  3. Filter pack
  4. Bentonite seal
  5. Surface completion

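Cross-dataset validation checks that these components are arranged in a logical vertical order; for example, the filter pack should surround the screen, and the bentonite seal should sit above the filter pack.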

Incorrect sequences often indicate data entry mistakes.
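A sequence check can compare each annular component with the one expected directly below it. The list above is in installation order (bottom up); the sketch works top down. It assumes components are stored as a hypothetical dictionary of `(top_m, bottom_m)` intervals and uses a simplified expected order that omits grout.

```python
# Annular components expected from the surface downward (an assumed simplification;
# many wells also have a grout interval between the surface completion and the seal).
EXPECTED_TOP_DOWN = ["surface completion", "bentonite seal", "filter pack"]


def check_sequence(components):
    """components: dict name -> (top_m, bottom_m). Flags components that intrude into the one below."""
    issues = []
    present = [name for name in EXPECTED_TOP_DOWN if name in components]
    for upper, lower in zip(present, present[1:]):
        if components[upper][1] > components[lower][0]:
            issues.append(f"{upper} extends below the top of the {lower}")
    return issues


well = {"surface completion": (0.0, 0.5), "bentonite seal": (0.5, 8.0), "filter pack": (8.0, 21.0)}
print(check_sequence(well))  # []
well["bentonite seal"] = (0.5, 12.0)  # seal logged too deep
print(check_sequence(well))  # ['bentonite seal extends below the top of the filter pack']
```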


Material Consistency Checks

Validation can compare:

  • Screen type
  • Casing material
  • Well diameter
  • Installation method

This helps identify configuration problems before regulatory submissions.


Coordinate Cross-Checking

Spatial errors are among the most expensive mistakes in geotechnical databases.

Incorrect coordinates can affect:

  • Geological models
  • Resource estimates
  • Groundwater studies
  • Engineering designs
  • Regulatory compliance

Cross-dataset validation plays a critical role in spatial quality control.


Project Boundary Validation

Coordinates should fall within the project area.

Validation can automatically compare borehole locations against:

  • Site boundaries
  • Property limits
  • Mining leases
  • Construction corridors

Boreholes outside the project area should be flagged.
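A boundary check is a point-in-polygon test. This minimal sketch uses the even-odd ray-casting method and assumes the boundary and collars share one projected coordinate system; the site coordinates are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test: is (x, y) inside the polygon?

    polygon: list of (x, y) vertices in the same projected coordinate system as the point.
    Points exactly on an edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies to the right of the point
                inside = not inside
    return inside


# Hypothetical square site boundary (easting, northing in metres).
site = [(500000, 4000000), (501000, 4000000), (501000, 4001000), (500000, 4001000)]
boreholes = {"BH-1": (500500, 4000500), "BH-2": (502000, 4000500)}
outside = [bh for bh, (e, n) in boreholes.items() if not point_in_polygon(e, n, site)]
print(outside)  # ['BH-2']
```

In production, a GIS library such as Shapely or a spatial database would usually handle this, including multi-part boundaries and edge cases.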


Coordinate System Verification

Common issues include:

  • UTM vs State Plane confusion
  • Latitude/longitude reversal
  • Incorrect projection
  • Incorrect datum

Cross-dataset checks can identify coordinates that do not align with other boreholes in the project.
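A simple way to find coordinates that do not align with the rest of the project is to measure each collar's distance from the median location. The sketch assumes, hypothetically, that all collars are stored as `(easting, northing)` in one projected system and that a 1000 m tolerance suits the site. It does not identify the wrong coordinate system; it only flags holes for review.

```python
from math import hypot
from statistics import median


def spatial_outliers(boreholes, max_distance_m):
    """Flag boreholes lying more than max_distance_m from the median collar location.

    boreholes: dict borehole_id -> (easting, northing), all in one projected system.
    The median is used so that a single bad coordinate does not drag the reference point.
    """
    ref_e = median(e for e, _ in boreholes.values())
    ref_n = median(n for _, n in boreholes.values())
    flagged = {}
    for bh, (e, n) in boreholes.items():
        distance = hypot(e - ref_e, n - ref_n)
        if distance > max_distance_m:
            flagged[bh] = round(distance)
    return flagged


holes = {
    "BH-101": (500100, 4000200),
    "BH-102": (500150, 4000260),
    "BH-103": (500120, 4000230),
    "BH-104": (4000230, 500120),  # easting and northing entered in swapped columns
}
print(spatial_outliers(holes, max_distance_m=1000))
```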


Elevation Consistency

Collar elevations should remain consistent with site topography.

Examples:

Borehole    Collar Elevation
BH-101      325 m
BH-102      327 m
BH-103      1125 m

The final elevation may indicate a unit conversion or entry error.
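The example can be checked by comparing each collar elevation with the project median. The 50 m tolerance is a hypothetical value. On steep sites, a comparison against nearby boreholes or a terrain model would be more appropriate.

```python
from statistics import median


def elevation_outliers(elevations, tolerance_m):
    """elevations: dict borehole_id -> collar elevation (m). Flags holes more than tolerance_m from the median."""
    mid = median(elevations.values())
    return [bh for bh, z in elevations.items() if abs(z - mid) > tolerance_m]


print(elevation_outliers({"BH-101": 325, "BH-102": 327, "BH-103": 1125}, tolerance_m=50))
# ['BH-103']
```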


Duplicate Coordinate Detection

Multiple boreholes occupying identical coordinates should be reviewed.

Potential causes include:

  • Duplicate boreholes
  • Copy-and-paste errors
  • GPS recording mistakes

Automated validation can identify these issues immediately.
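Because GPS readings of the same point rarely match exactly, duplicate detection works better with a distance tolerance than with exact equality. This sketch assumes projected `(easting, northing)` collars and a hypothetical 0.5 m tolerance.

```python
from itertools import combinations
from math import hypot


def coincident_boreholes(boreholes, tolerance_m=0.5):
    """Return pairs of borehole IDs whose collars lie within tolerance_m of each other.

    boreholes: dict borehole_id -> (easting, northing). Compares every pair, which is fine
    for project-sized datasets; a spatial index would be needed for very large ones.
    """
    return [
        (a, b)
        for (a, (ea, na)), (b, (eb, nb)) in combinations(sorted(boreholes.items()), 2)
        if hypot(ea - eb, na - nb) <= tolerance_m
    ]


collars = {
    "BH-201": (500100.0, 4000200.0),
    "BH-202": (500100.0, 4000200.0),  # possible copy-and-paste duplicate
    "BH-203": (500180.0, 4000260.0),
}
print(coincident_boreholes(collars))  # [('BH-201', 'BH-202')]
```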


Workflow Integration

Cross-dataset validation delivers the greatest benefit when integrated into the project workflow.

Validation should occur:

During Data Entry

Immediate feedback prevents invalid records from accumulating.

During Import

Imported datasets should be checked before being accepted into the database.

During Review

Reviewers should see all unresolved cross-dataset issues.

Before Approval

Organizations often require:

  • No unresolved errors
  • No critical validation failures
  • All mandatory datasets present

before approving a borehole.
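An approval gate can combine these criteria in one function. This is a sketch under stated assumptions: issues are hypothetical dictionaries with `rule_id`, `severity`, and `resolved` keys. Unresolved "error" and "critical" issues block approval, but unresolved warnings do not. The severities assigned to the rule IDs in the example are illustrative.

```python
BLOCKING_SEVERITIES = {"critical", "error"}  # assumed: unresolved warnings do not block approval


def approval_status(issues, datasets_present, mandatory_datasets):
    """Return (approved, blocking_issues, missing_datasets) for one borehole.

    issues: list of dicts with hypothetical keys 'rule_id', 'severity', and 'resolved'.
    """
    blocking = [i for i in issues if i["severity"] in BLOCKING_SEVERITIES and not i["resolved"]]
    missing = sorted(set(mandatory_datasets) - set(datasets_present))
    return (not blocking and not missing), blocking, missing


issues = [
    {"rule_id": "R-313", "severity": "error", "resolved": False},
    {"rule_id": "R-201", "severity": "warning", "resolved": False},
]
approved, blocking, missing = approval_status(
    issues, {"collar", "lithology"}, {"collar", "lithology", "samples"})
print(approved, [i["rule_id"] for i in blocking], missing)  # False ['R-313'] ['samples']
```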


Benefits of Cross-Dataset Validation

Organizations implementing comprehensive cross-dataset validation often experience:

  • Improved data quality
  • Reduced manual review effort
  • Earlier error detection
  • Better regulatory compliance
  • More reliable geological models
  • Increased confidence in reporting
  • Reduced project risk

Most importantly, validation becomes proactive rather than reactive.

Problems are identified when data is entered rather than when reports are generated.


Best Practices

To maximize effectiveness:

Use Automated Rule Engines

Automated validation ensures consistent application of rules across all projects.

Assign Rule IDs

Examples:

  • R-201 Lithology Coverage Check
  • R-313 RQD Cannot Exceed Recovery
  • R-405 SPT Depth Validation
  • R-510 Screen Interval Validation
  • R-620 Coordinate Boundary Check

Differentiate Warnings and Errors

Not every inconsistency is invalid.

Warnings should highlight unusual conditions.

Errors should identify impossible conditions.
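Rule IDs and severities can live in a small rule registry. In the sketch below, R-313 comes from the list above. R-314 and its 50 % recovery threshold are hypothetical and only illustrate a warning-level rule. Records are assumed to be dictionaries with `recovery_pct` and `rqd_pct` keys.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    rule_id: str
    name: str
    severity: str                   # "error" = impossible condition, "warning" = unusual condition
    passes: Callable[[dict], bool]  # returns True when a record satisfies the rule


RULES = [
    Rule("R-313", "RQD Cannot Exceed Recovery", "error",
         lambda run: run["rqd_pct"] <= run["recovery_pct"]),
    # Hypothetical rule ID and threshold, shown only to illustrate a warning-level rule.
    Rule("R-314", "Low Core Recovery", "warning",
         lambda run: run["recovery_pct"] >= 50),
]


def run_rules(record, rules=RULES):
    """Return (rule_id, severity, name) for every rule the record fails."""
    return [(r.rule_id, r.severity, r.name) for r in rules if not r.passes(record)]


print(run_rules({"recovery_pct": 65, "rqd_pct": 90}))  # [('R-313', 'error', 'RQD Cannot Exceed Recovery')]
print(run_rules({"recovery_pct": 40, "rqd_pct": 20}))  # [('R-314', 'warning', 'Low Core Recovery')]
```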

Maintain Audit Trails

Track:

  • Validation date
  • User
  • Rule triggered
  • Resolution status

for complete accountability.
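An audit trail entry can be modelled as a record that keeps its own status history. The field names below are illustrative assumptions, not a prescribed schema; each status change stores the time, user, previous status, new status, and a note.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditEntry:
    """One validation finding and its resolution history (field names are illustrative)."""
    borehole_id: str
    rule_id: str
    user: str                       # user whose action triggered the validation
    status: str = "open"            # e.g. open, resolved, accepted
    validated_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    history: list = field(default_factory=list)

    def set_status(self, user, new_status, note=""):
        """Record who changed the status, when, from what, to what, and why."""
        self.history.append((datetime.now(timezone.utc), user, self.status, new_status, note))
        self.status = new_status


entry = AuditEntry("BH-103", "R-313", user="logger01")
entry.set_status("reviewer02", "resolved", note="RQD re-entered from core photos")
print(entry.status, len(entry.history))  # resolved 1
```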


Conclusion

Cross-dataset validation is one of the most powerful tools available for maintaining high-quality geotechnical databases. While field-level checks remain important, many of the most significant data quality issues occur between datasets rather than within them. By validating relationships between lithology, sampling, recovery, RQD, SPT records, well construction data, and coordinates, organizations can detect inconsistencies early, reduce project risk, improve reporting accuracy, and build greater confidence in their geological and geotechnical information.

As geotechnical databases continue to grow in size and complexity, cross-dataset validation will become an increasingly essential component of modern QA/QC programs and digital borehole data management systems.

