Why Data Relationships Matter More Than Individual Data Fields
Modern geotechnical, geological, environmental, and mining projects generate vast amounts of interconnected data. A single borehole may contain lithology logs, sampling records, laboratory results, recovery measurements, Rock Quality Designation (RQD) values, Standard Penetration Test (SPT) results, well construction details, survey information, and spatial coordinates.
Most data validation systems focus on checking individual fields, for example ensuring that a depth is positive or that a required field is populated. While these checks are important, they detect only a portion of potential data quality issues.
Many of the most serious errors occur when related datasets become inconsistent with one another. A lithology interval may not align with sample intervals. Recovery values may contradict RQD measurements. Well construction records may extend beyond the borehole depth. Coordinates may place a borehole outside the project boundary.
Cross-dataset validation addresses these problems by examining relationships between multiple datasets rather than validating each table independently.
This article explores the importance of cross-dataset validation in geotechnical databases and examines common validation scenarios involving lithology, sampling, recovery, RQD, SPT data, well construction records, and spatial coordinates.
What Is Cross-Dataset Validation?
Cross-dataset validation is the process of verifying consistency between related data tables within a geotechnical database.
Unlike simple field validation, which evaluates a single value, cross-dataset validation compares information across multiple records and datasets.
Examples include:
- Sample intervals should fall within lithology intervals.
- RQD cannot exceed recovery.
- Screen intervals should remain within borehole depth.
- SPT tests should occur within valid sampling intervals.
- Coordinates should fall within project boundaries.
These checks ensure that the database represents a logically consistent interpretation of the borehole rather than a collection of isolated records.
Why Cross-Dataset Validation Is Important
Data quality problems often emerge when information is entered by multiple users over long project durations.
For example:
- A geologist enters lithology.
- A technician records recovery.
- A driller enters well construction details.
- A surveyor updates coordinates.
- A laboratory uploads testing results.
Each dataset may appear valid independently while still containing inconsistencies when compared with related data.
Without cross-dataset validation, these issues may remain hidden until:
- Resource estimation begins
- Geotechnical designs are developed
- Regulatory reports are prepared
- Construction decisions are made
Finding problems late in a project is significantly more expensive than identifying them during data entry.
Lithology vs Sampling Validation
One of the most common cross-dataset checks involves comparing lithology intervals with sample intervals.
Lithology provides the geological interpretation of the subsurface, while sampling records represent physical material collected for testing.
The two datasets should align logically.
Sample Intervals Within Borehole Limits
Every sample interval should fall within the borehole depth.
Example:
| Borehole Depth | Sample Interval |
|---|---|
| 50 m | 10–12 m |
| 50 m | 25–26 m |
| 50 m | 48–55 m |
The final sample exceeds the borehole depth and should be flagged.
Validation Rule:
SampleTo <= BoreholeDepth
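The depth rule can be expressed as a short Python check. This is a minimal sketch, assuming depths in metres measured from the collar. The `Sample` class and `check_samples_within_borehole` function are hypothetical names, and the sample IDs are added for illustration because the table above has none. The sketch also flags negative or inverted intervals, which tend to come from the same kind of entry error.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    sample_id: str      # hypothetical identifier for illustration
    depth_from: float   # top of sample, metres below collar
    depth_to: float     # bottom of sample, metres below collar


def check_samples_within_borehole(samples, borehole_depth):
    """Return IDs of samples that break SampleTo <= BoreholeDepth,
    start above the collar, or have From >= To."""
    flagged = []
    for s in samples:
        if s.depth_from < 0 or s.depth_from >= s.depth_to or s.depth_to > borehole_depth:
            flagged.append(s.sample_id)
    return flagged


samples = [Sample("S-1", 10, 12), Sample("S-2", 25, 26), Sample("S-3", 48, 55)]
print(check_samples_within_borehole(samples, 50.0))  # ['S-3']
```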
Sample Coverage Verification
Sampling intervals often need to correspond to logged lithology intervals.
Example:
| Lithology | From (m) | To (m) |
|---|---|---|
| Clay | 0 | 5 |
| Sand | 5 | 12 |
| Till | 12 | 20 |
Sample:
| Sample | From (m) | To (m) |
|---|---|---|
| S-101 | 5 | 7 |
The sample falls entirely within the sand interval and is valid.
If a sample spans multiple lithology units:
| Sample | From (m) | To (m) |
|---|---|---|
| S-102 | 4.8 | 6.5 |
The sample crosses a lithology boundary and may require review.
Such situations are not necessarily incorrect but should be identified for verification.
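A minimal sketch of this comparison in Python, assuming non-overlapping lithology intervals in metres. `Interval` and `classify_sample` are hypothetical names. Intervals that merely touch at a contact (a sample starting at exactly 5 m) are not treated as overlapping, so S-101 is accepted while S-102 is routed to review rather than rejected.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    label: str          # lithology name or sample ID
    depth_from: float   # metres
    depth_to: float     # metres


def classify_sample(sample, lithology):
    """Classify a sample against the lithology log:
    'within'           - fully inside a single lithology unit
    'crosses_boundary' - overlaps logged units but is not contained in one
                         (spans a contact or runs into an unlogged gap)
    'no_lithology'     - overlaps no logged unit at all
    """
    overlapping = [
        unit for unit in lithology
        if unit.depth_from < sample.depth_to and sample.depth_from < unit.depth_to
    ]
    if not overlapping:
        return "no_lithology"
    for unit in overlapping:
        if unit.depth_from <= sample.depth_from and sample.depth_to <= unit.depth_to:
            return "within"
    return "crosses_boundary"


lithology = [Interval("Clay", 0, 5), Interval("Sand", 5, 12), Interval("Till", 12, 20)]
print(classify_sample(Interval("S-101", 5, 7), lithology))      # within
print(classify_sample(Interval("S-102", 4.8, 6.5), lithology))  # crosses_boundary
```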
Missing Lithology Coverage
Every sampled interval should have corresponding geological logging.
Validation should detect:
- Samples without lithology
- Gaps in geological interpretation
- Unclassified intervals
These conditions can impact geological modeling and reporting.
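Gap and unclassified-interval detection might look like the following sketch, assuming depths in metres and an arbitrary 1 cm tolerance to ignore rounding. `lithology_gaps` and `unclassified_units` are hypothetical names. Any sample whose interval falls inside a returned gap is a sample without lithology.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    label: str          # lithology name; blank if never classified
    depth_from: float   # metres
    depth_to: float     # metres


def lithology_gaps(lithology, borehole_depth, tol=0.01):
    """Return (from, to) depth ranges between the collar and the end of hole
    that no lithology interval covers. tol ignores rounding-sized gaps."""
    gaps = []
    cursor = 0.0
    for unit in sorted(lithology, key=lambda u: u.depth_from):
        if unit.depth_from > cursor + tol:
            gaps.append((cursor, unit.depth_from))
        cursor = max(cursor, unit.depth_to)
    if borehole_depth > cursor + tol:
        gaps.append((cursor, borehole_depth))
    return gaps


def unclassified_units(lithology):
    """Return intervals that were logged but never given a lithology name."""
    return [u for u in lithology if not u.label.strip()]


lithology = [Interval("Clay", 0, 5), Interval("Sand", 5, 12),
             Interval("", 12, 14), Interval("Till", 16, 20)]
print(lithology_gaps(lithology, 22))  # [(14, 16), (20, 22)]
print([(u.depth_from, u.depth_to) for u in unclassified_units(lithology)])  # [(12, 14)]
```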
Recovery vs RQD Validation
Core recovery and Rock Quality Designation (RQD) are among the most important rock quality indicators in geotechnical investigations.
Because RQD is derived from recovered core, the two values must remain logically consistent.
Understanding the Relationship
Recovery represents:
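Recovery (%) = (Recovered core length / Run length) × 100
RQD represents:
RQD (%) = (Sum of lengths of sound core pieces ≥ 10 cm / Run length) × 100
Because both values are normalized by the same run length, and the sound pieces counted in RQD are a subset of the recovered core, RQD cannot exceed recovery.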
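The two formulas can be computed from a list of core piece lengths for one run. This sketch assumes every listed piece is intact and sound, and that drilling-induced breaks have already been fitted back together by the logger; `recovery_and_rqd` is a hypothetical name. Because the RQD sum is taken over a subset of the pieces in the recovery sum, the calculation shows directly why RQD cannot exceed recovery.

```python
def recovery_and_rqd(run_length_cm, piece_lengths_cm, min_piece_cm=10.0):
    """Return (recovery %, RQD %) for a single core run.

    piece_lengths_cm lists every recovered piece; only pieces of at least
    min_piece_cm count toward RQD.
    """
    if run_length_cm <= 0:
        raise ValueError("run length must be positive")
    recovered = sum(piece_lengths_cm)
    sound = sum(p for p in piece_lengths_cm if p >= min_piece_cm)
    recovery = recovered / run_length_cm * 100
    rqd = sound / run_length_cm * 100
    return recovery, rqd


# 1.5 m run with 120 cm recovered, of which 107 cm is in pieces >= 10 cm
recovery, rqd = recovery_and_rqd(150, [25, 8, 40, 5, 30, 12])
print(f"Recovery {recovery:.1f}%, RQD {rqd:.1f}%")  # Recovery 80.0%, RQD 71.3%
```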
Validation Rule
RQD <= Recovery
Examples:
| Recovery (%) | RQD (%) | Result |
|---|---|---|
| 95 | 82 | Valid |
| 78 | 70 | Valid |
| 65 | 90 | Invalid |
The final example is physically impossible and should generate an error.
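Applied to the table above, the rule reduces to one comparison per run. A minimal sketch; `check_rqd_vs_recovery` is a hypothetical name and the inputs are assumed to be percentages for the same run. Recovery is deliberately not capped at 100% here, since over-recovery can occur when core left in the hole from a previous run is retrieved.

```python
def check_rqd_vs_recovery(rows):
    """rows: iterable of (recovery %, RQD %) pairs for individual core runs.
    Returns 'Valid' or 'Invalid' for each row."""
    results = []
    for recovery, rqd in rows:
        # RQD counts a subset of the recovered core, so it can never exceed
        # recovery; a negative value is also an entry error.
        results.append("Valid" if 0 <= rqd <= recovery else "Invalid")
    return results


print(check_rqd_vs_recovery([(95, 82), (78, 70), (65, 90)]))
# ['Valid', 'Valid', 'Invalid']
```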
Recovery and Rock Type Relationships
Cross-dataset validation may also identify unusual combinations.
Examples include:
- Very high recovery in highly weathered rock
- Extremely low recovery in competent granite
- Sudden recovery changes without lithological explanation
These conditions often warrant review.
SPT Consistency Validation
Standard Penetration Test (SPT) data is widely used in geotechnical investigations.
SPT values are often entered separately from lithology and sampling datasets.
Cross-dataset validation helps ensure consistency.
SPT Depth Verification
SPT tests should occur within borehole limits.
Example:
| Borehole Depth | SPT Depth |
|---|---|
| 20 m | 18.5 m |
| 20 m | 25 m |
The second record exceeds borehole depth and should be rejected.
SPT and Sample Alignment
Many organizations collect SPT data within sampling intervals.
Validation can verify:
- SPT depth falls inside a sample interval
- Sample identifiers exist
- Associated recovery records are present
Missing relationships often indicate data entry errors.
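The depth check from the previous subsection and the three relationship checks listed above can be combined in one pass. This sketch assumes each SPT record stores the ID of the sample it was taken in and that recovery records are keyed by sample ID; both are schema assumptions, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Sample:
    sample_id: str
    depth_from: float  # metres
    depth_to: float    # metres


@dataclass
class SptTest:
    depth: float               # test depth, metres
    sample_id: Optional[str]   # linked sample, if any


def check_spt(tests: List[SptTest], samples: List[Sample], borehole_depth: float,
              recovery_sample_ids: Set[str]) -> List[Tuple[float, str]]:
    """Return (depth, message) for every SPT record that fails a relationship check."""
    by_id = {s.sample_id: s for s in samples}
    issues = []
    for t in tests:
        if not 0 <= t.depth <= borehole_depth:
            issues.append((t.depth, "SPT depth outside borehole"))
            continue  # no point checking links for an impossible depth
        sample = by_id.get(t.sample_id)
        if sample is None:
            issues.append((t.depth, "linked sample does not exist"))
            continue
        if not sample.depth_from <= t.depth <= sample.depth_to:
            issues.append((t.depth, f"depth outside sample {sample.sample_id}"))
        if sample.sample_id not in recovery_sample_ids:
            issues.append((t.depth, f"no recovery record for {sample.sample_id}"))
    return issues


samples = [Sample("S-1", 18.0, 19.5), Sample("S-2", 5.0, 6.0)]
tests = [SptTest(18.5, "S-1"), SptTest(25.0, "S-1"), SptTest(10.0, "S-9"), SptTest(8.0, "S-2")]
for depth, message in check_spt(tests, samples, 20.0, {"S-1"}):
    print(depth, message)
```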
Duplicate SPT Depths
Multiple SPT tests at identical depths may indicate duplicate records.
Example:
| Depth (m) |
|---|
| 10.0 |
| 10.0 |
| 10.0 |
Automated validation can flag such duplicates within each borehole.
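A sketch of duplicate detection, assuming SPT records are available as (borehole ID, depth) pairs. Grouping by borehole matters because the same depth in different boreholes is normal. Rounding to the nearest centimetre is an assumed tolerance, and `duplicate_spt_depths` is a hypothetical name.

```python
from collections import Counter


def duplicate_spt_depths(records, decimals=2):
    """records: iterable of (borehole_id, depth_m) pairs.
    Depths are rounded to `decimals` places so 10.0 and 10.004 count as the same.
    Returns {(borehole_id, depth): count} for depths recorded more than once."""
    counts = Counter((bh, round(depth, decimals)) for bh, depth in records)
    return {key: n for key, n in counts.items() if n > 1}


records = [("BH-1", 10.0), ("BH-1", 10.0), ("BH-1", 10.0), ("BH-1", 11.5), ("BH-2", 10.0)]
print(duplicate_spt_depths(records))  # {('BH-1', 10.0): 3}
```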
SPT Value Range Checks
Range checks on N-values are technically field-level validation, but comparing them with the lithology logged at the same depth provides valuable context.
Examples:
- N-value of 2 in dense gravel
- N-value of 100 in soft clay
Such combinations may be valid but should be reviewed.
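A sketch of this lithology-aware review, which looks up the logged lithology at each SPT depth and raises a warning (not an error) when the N-value falls outside a review range. The ranges below are hypothetical placeholders, not a published correlation; each project would set its own. All names are hypothetical.

```python
# Illustrative review ranges only (hypothetical values, not a published
# correlation); each project should set its own.
REVIEW_RANGES = {
    "soft clay": (0, 8),
    "dense gravel": (30, 100),
}


def lithology_at(depth, lithology):
    """Return the lithology description logged at `depth`, or None."""
    for top, bottom, description in lithology:
        if top <= depth < bottom:
            return description
    return None


def spt_lithology_warnings(spt_results, lithology):
    """spt_results: iterable of (depth_m, n_value). Returns warning strings;
    these are prompts for review, not errors."""
    warnings = []
    for depth, n in spt_results:
        description = lithology_at(depth, lithology)
        if description is None:
            continue  # missing lithology belongs to a coverage check instead
        expected = REVIEW_RANGES.get(description.lower())
        if expected and not expected[0] <= n <= expected[1]:
            warnings.append(f"{depth} m: N={n} is unusual for {description}")
    return warnings


lithology = [(0.0, 4.0, "Soft clay"), (4.0, 10.0, "Dense gravel")]
for w in spt_lithology_warnings([(2.0, 100), (3.0, 4), (6.0, 2)], lithology):
    print(w)
# 2.0 m: N=100 is unusual for Soft clay
# 6.0 m: N=2 is unusual for Dense gravel
```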
Well Construction Validation
Environmental and hydrogeological projects frequently include well construction information.
These records must remain consistent with borehole geometry.
Screen Interval Validation
Well screens must fit within the drilled borehole.
Example:
| Borehole Depth | Screen Interval |
|---|---|
| 30 m | 10–20 m |
| 30 m | 25–35 m |
The second screen extends beyond the borehole depth and is invalid.
Screen and Casing Overlap Checks
Validation should verify:
- Screen intervals do not overlap blank casing intervals
- Filter pack intervals span the full screen interval
- Annular seals are positioned above the filter pack
These checks improve confidence in monitoring well design.
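A sketch of these checks, together with the screen-depth rule from the previous table. It assumes each well has a single blank casing string, one screen, one filter pack, and one seal, each stored as a (top, bottom) depth interval in metres. Real wells can have multiple screens or seals, so a fuller version would loop over them; all function names are hypothetical.

```python
def overlaps(a, b):
    """True if intervals a and b (top, bottom) share more than a contact."""
    return a[0] < b[1] and b[0] < a[1]


def covers(outer, inner):
    """True if interval `outer` fully contains interval `inner`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def check_well(borehole_depth, casing, screen, filter_pack, seal):
    """All intervals are (top, bottom) depths in metres below ground.
    Returns a list of issue descriptions (empty if none)."""
    issues = []
    if screen[1] > borehole_depth:
        issues.append("screen extends beyond borehole depth")
    if overlaps(screen, casing):
        issues.append("screen overlaps blank casing")
    if not covers(filter_pack, screen):
        issues.append("filter pack does not span the full screen")
    if seal[1] > filter_pack[0]:
        issues.append("annular seal extends into the filter pack")
    return issues


print(check_well(30, casing=(0, 10), screen=(10, 20), filter_pack=(9, 20.5), seal=(7, 9)))
# []
print(check_well(30, casing=(0, 25), screen=(25, 35), filter_pack=(24, 35), seal=(22, 24)))
# ['screen extends beyond borehole depth']
```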
Construction Sequence Validation
A typical monitoring well contains:
- Borehole
- Screen
- Filter pack
- Bentonite seal
- Surface completion
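Cross-dataset validation ensures that components are arranged in a physically plausible vertical order, for example with the bentonite seal above the filter pack and the filter pack spanning the screen.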
Incorrect sequences often indicate data entry mistakes.
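One simple way to test the sequence is to compare the top depths of the components against an expected top-to-bottom order. This is a sketch under an assumed order for a single-screen monitoring well; `EXPECTED_ORDER` and `check_sequence` are hypothetical names, and wells with multiple seals or screens would need a richer model.

```python
# Expected top-to-bottom order of components, by the depth of their tops.
# This order is an assumption for a simple single-screen monitoring well.
EXPECTED_ORDER = ["surface completion", "bentonite seal", "filter pack", "screen"]


def check_sequence(components):
    """components: dict of name -> (top, bottom) depth in metres.
    Returns (missing components, pairs whose tops are out of order)."""
    present = [name for name in EXPECTED_ORDER if name in components]
    missing = [name for name in EXPECTED_ORDER if name not in components]
    out_of_order = [
        (upper, lower)
        for upper, lower in zip(present, present[1:])
        if components[upper][0] >= components[lower][0]
    ]
    return missing, out_of_order


well = {
    "surface completion": (0.0, 0.5),
    "bentonite seal": (12.0, 14.0),   # entered below the top of the filter pack
    "filter pack": (9.0, 20.5),
    "screen": (10.0, 20.0),
}
print(check_sequence(well))  # ([], [('bentonite seal', 'filter pack')])
```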
Material Consistency Checks
Validation can compare:
- Screen type
- Casing material
- Well diameter
- Installation method
This helps identify configuration problems before regulatory submissions.
Coordinate Cross-Checking
Spatial errors are among the most expensive mistakes in geotechnical databases.
Incorrect coordinates can affect:
- Geological models
- Resource estimates
- Groundwater studies
- Engineering designs
- Regulatory compliance
Cross-dataset validation plays a critical role in spatial quality control.
Project Boundary Validation
Coordinates should fall within the project area.
Validation can automatically compare borehole locations against:
- Site boundaries
- Property limits
- Mining leases
- Construction corridors
Boreholes outside the project area should be flagged.
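A boundary check is a point-in-polygon test. The sketch below uses the standard ray-casting method and assumes borehole coordinates and the boundary polygon share one projected coordinate system; points lying exactly on the boundary may fall either way. The function names are hypothetical, and a GIS library such as Shapely provides a tested alternative.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count how many polygon edges a horizontal ray from
    (x, y) crosses; an odd count means the point is inside.
    polygon: list of (x, y) vertices in order, not repeating the first vertex."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y, so y2 != y1
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def boreholes_outside(boreholes, boundary):
    """boreholes: iterable of (id, easting, northing). Returns IDs outside boundary."""
    return [bh for bh, e, n in boreholes if not point_in_polygon(e, n, boundary)]


site = [(500000, 5000000), (501000, 5000000), (501000, 5001000), (500000, 5001000)]
bhs = [("BH-1", 500500, 5000500), ("BH-2", 502000, 5000500)]
print(boreholes_outside(bhs, site))  # ['BH-2']
```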
Coordinate System Verification
Common issues include:
- UTM vs State Plane confusion
- Latitude/longitude reversal
- Incorrect projection
- Incorrect datum
Cross-dataset checks can identify coordinates that do not align with other boreholes in the project.
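A sketch of this comparison: flag any borehole that lies far from the median location of the project's boreholes. The median is used because it is barely affected by the outliers being searched for. The 5 km threshold is a hypothetical value that should reflect the project's extent, and `spatial_outliers` is a hypothetical name. The check catches gross errors such as swapped axes or a wrong zone, but a shift that is small relative to the threshold, such as some datum errors, will not be caught.

```python
import math
from statistics import median


def spatial_outliers(boreholes, max_distance_m=5000.0):
    """boreholes: list of (id, easting, northing) in one projected system.
    Flags boreholes farther than max_distance_m from the median location."""
    cx = median(e for _, e, _ in boreholes)
    cy = median(n for _, _, n in boreholes)
    return [bh for bh, e, n in boreholes if math.hypot(e - cx, n - cy) > max_distance_m]


bhs = [
    ("BH-101", 500100, 5000200),
    ("BH-102", 500300, 5000250),
    ("BH-103", 500200, 5000100),
    ("BH-104", 5000150, 500250),  # easting and northing entered in swapped columns
]
print(spatial_outliers(bhs))  # ['BH-104']
```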
Elevation Consistency
Collar elevations should remain consistent with site topography.
Examples:
| Borehole | Elevation |
|---|---|
| BH-101 | 325 m |
| BH-102 | 327 m |
| BH-103 | 1125 m |
The final elevation may indicate a unit conversion or entry error.
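A robust way to flag values like BH-103 is the modified z-score, which measures distance from the median in units of the median absolute deviation (MAD); a cutoff of 3.5 is a commonly used convention. This is a sketch with a hypothetical function name. With only a handful of boreholes, or on sites with large relief, comparison against a surveyed terrain model would be more reliable where one exists.

```python
from statistics import median


def elevation_outliers(elevations, threshold=3.5):
    """elevations: dict of borehole ID -> collar elevation (m).
    Uses the modified z-score 0.6745 * |x - median| / MAD, which is less
    distorted by the outlier itself than a mean/standard-deviation test."""
    values = list(elevations.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the collars share the median elevation; flag any that differ.
        return [bh for bh, v in elevations.items() if v != med]
    return [bh for bh, v in elevations.items()
            if 0.6745 * abs(v - med) / mad > threshold]


print(elevation_outliers({"BH-101": 325, "BH-102": 327, "BH-103": 1125}))  # ['BH-103']
```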
Duplicate Coordinate Detection
Multiple boreholes occupying identical coordinates should be reviewed.
Potential causes include:
- Duplicate boreholes
- Copy-and-paste errors
- GPS recording mistakes
Automated validation can identify these issues immediately.
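A sketch of duplicate-location detection that compares every pair of boreholes and reports those closer than a tolerance. The 0.5 m tolerance is an assumption, and `near_duplicate_locations` is a hypothetical name. Pairwise comparison grows quadratically, which is adequate for modest numbers of boreholes; a spatial index would scale better.

```python
import math
from itertools import combinations


def near_duplicate_locations(boreholes, tolerance_m=0.5):
    """boreholes: list of (id, easting, northing) in metres.
    Returns pairs of IDs whose locations are within tolerance_m of each other."""
    pairs = []
    for (id_a, xa, ya), (id_b, xb, yb) in combinations(boreholes, 2):
        if math.dist((xa, ya), (xb, yb)) <= tolerance_m:
            pairs.append((id_a, id_b))
    return pairs


bhs = [
    ("BH-1", 500100.0, 5000200.0),
    ("BH-2", 500100.0, 5000200.0),   # exact copy of BH-1
    ("BH-3", 500300.0, 5000250.0),
    ("BH-4", 500300.2, 5000250.1),   # about 0.22 m from BH-3
]
print(near_duplicate_locations(bhs))  # [('BH-1', 'BH-2'), ('BH-3', 'BH-4')]
```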
Workflow Integration
Cross-dataset validation delivers the greatest benefit when integrated into the project workflow.
Validation should occur:
During Data Entry
Immediate feedback prevents invalid records from accumulating.
During Import
Imported datasets should be checked before being accepted into the database.
During Review
Reviewers should see all unresolved cross-dataset issues.
Before Approval
Before approving a borehole, organizations often require:
- No unresolved errors
- No critical validation failures
- All mandatory datasets present
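An approval gate of this kind can be sketched as a function that blocks on unresolved errors or critical failures and on missing datasets. The mandatory dataset list is hypothetical, as are the names `Issue` and `approval_status`; rule IDs R-313 and R-201 come from the rule list in this article. In this sketch unresolved warnings do not block approval, which is an assumption.

```python
from dataclasses import dataclass

# Hypothetical list of datasets a project might require before approval.
MANDATORY_DATASETS = {"collar", "lithology", "samples"}


@dataclass
class Issue:
    rule_id: str
    severity: str   # "warning", "error", or "critical"
    resolved: bool


def approval_status(issues, datasets_present):
    """Return (approved, blocking issues, missing datasets) for one borehole."""
    blocking = [i for i in issues
                if i.severity in ("error", "critical") and not i.resolved]
    missing = sorted(MANDATORY_DATASETS - set(datasets_present))
    return (not blocking and not missing), blocking, missing


issues = [Issue("R-313", "error", resolved=True), Issue("R-201", "warning", resolved=False)]
print(approval_status(issues, {"collar", "lithology", "samples", "spt"})[0])  # True
print(approval_status(issues, {"collar", "lithology"}))  # (False, [], ['samples'])
```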
Benefits of Cross-Dataset Validation
Organizations implementing comprehensive cross-dataset validation often experience:
- Improved data quality
- Reduced manual review effort
- Earlier error detection
- Better regulatory compliance
- More reliable geological models
- Increased confidence in reporting
- Reduced project risk
Most importantly, validation becomes proactive rather than reactive.
Problems are identified when data is entered rather than when reports are generated.
Best Practices
To maximize effectiveness:
Use Automated Rule Engines
Automated validation ensures consistent application of rules across all projects.
Assign Rule IDs
Examples:
- R-201 Lithology Coverage Check
- R-313 RQD Cannot Exceed Recovery
- R-405 SPT Depth Validation
- R-510 Screen Interval Validation
- R-620 Coordinate Boundary Check
Differentiate Warnings and Errors
Not every inconsistency is invalid.
Warnings should highlight unusual conditions.
Errors should identify impossible conditions.
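The rule-ID and severity conventions above fit naturally into a small rule registry. This is a sketch rather than any particular product's engine: `Rule`, `Finding`, and `run_rules` are hypothetical, R-313 is taken from the list above, and R-314 with its 50% threshold is invented purely to show a warning alongside an error.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    rule_id: str
    name: str
    severity: str                           # "error" = impossible, "warning" = unusual
    check: Callable[[dict], Optional[str]]  # returns a message when the rule fires


@dataclass
class Finding:
    rule_id: str
    severity: str
    message: str


def run_rules(record, rules):
    """Apply every rule to one record and collect the findings."""
    findings = []
    for rule in rules:
        message = rule.check(record)
        if message:
            findings.append(Finding(rule.rule_id, rule.severity, message))
    return findings


RULES = [
    Rule("R-313", "RQD Cannot Exceed Recovery", "error",
         lambda r: f"RQD {r['rqd']}% exceeds recovery {r['recovery']}%"
         if r["rqd"] > r["recovery"] else None),
    # Hypothetical rule ID and threshold, shown only to illustrate a warning.
    Rule("R-314", "Low Recovery", "warning",
         lambda r: f"recovery {r['recovery']}% is below 50%"
         if r["recovery"] < 50 else None),
]

for f in run_rules({"recovery": 65, "rqd": 90}, RULES):
    print(f.rule_id, f.severity, f.message)
# R-313 error RQD 90% exceeds recovery 65%
```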
Maintain Audit Trails
For complete accountability, track:
- Validation date
- User
- Rule triggered
- Resolution status
Conclusion
Cross-dataset validation is one of the most powerful tools available for maintaining high-quality geotechnical databases. While field-level checks remain important, many of the most significant data quality issues occur between datasets rather than within them. By validating relationships between lithology, sampling, recovery, RQD, SPT records, well construction data, and coordinates, organizations can detect inconsistencies early, reduce project risk, improve reporting accuracy, and build greater confidence in their geological and geotechnical information.
As geotechnical databases continue to grow in size and complexity, cross-dataset validation will become an increasingly essential component of modern QA/QC programs and digital borehole data management systems.


