Steady improvement in the datasource is yielding better results
- 594 training records with 49 total features (39 used for model training)
- Geographic coverage: 5 states, 5 crops, 10+ years (2012-2022)
Key improvements:
- Expanded region, crop, datetime diversity
- Added more granular soil features
- Expanded sample size
Future improvements:
- Add actual weather data for the region and time each sample was taken
- Expand record count
- Expand crop, date, and location selection diversity
- Relax the current soil selection limitations
** If you don’t understand this yet but want to, follow along and we can learn about training models together.
Here’s a Feature Breakdown by Source
1. Soil Chemistry Features (12 features)
Source: ssurgo_lab_chemical_properties table - Real laboratory analysis
- ph_h2o - Soil pH in water
- ph_cacl2 - Soil pH in calcium chloride
- estimated_organic_carbon - Organic carbon percentage
- total_carbon_ncs - Total carbon content
- total_nitrogen_ncs - Total nitrogen content
- carbon_to_nitrogen_ratio - C:N ratio
- cec_nh4_ph_7 - Cation exchange capacity
- base_sat_nh4oac_ph_7 - Base saturation percentage
- ca_nh4_ph_7 - Exchangeable calcium
- mg_nh4_ph_7 - Exchangeable magnesium
- k_nh4_ph_7 - Exchangeable potassium
- na_nh4_ph_7 - Exchangeable sodium
2. Soil Physics Features (6 features)
Source: ssurgo_lab_physical_properties table - Real laboratory analysis
- clay_total - Clay percentage
- silt_total - Silt percentage
- sand_total - Sand percentage
- bulk_density_oven_dry - Soil bulk density
- water_retention_15_bar - Water holding capacity
- particle_density_less_than_2mm - Particle density
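A quick sanity check that can be run on the texture columns: if clay_total, silt_total, and sand_total are percentages of the same soil fraction, they should add up to roughly 100. The sketch below assumes that is the case. The function name and the 2-point tolerance are made up for illustration and are not part of the actual pipeline.

```python
def texture_fractions_ok(clay, silt, sand, tolerance=2.0):
    """Return True when clay + silt + sand is within `tolerance` of 100 percent.

    Assumes all three inputs are percentages; the tolerance is a
    hypothetical choice, not a value taken from the dataset.
    """
    return abs((clay + silt + sand) - 100.0) <= tolerance


# Example: one plausible record and one record with a missing share
print(texture_fractions_ok(20.0, 40.0, 40.0))
print(texture_fractions_ok(20.0, 40.0, 30.0))
```

Records that fail a check like this are worth a look before they go into training.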
3. Location Features (8 features)
Source: ssurgo_lab_site + ssurgo_lab_layer tables - Real GPS coordinates from soil sampling
- latitude_std_decimal_degrees - GPS latitude
- longitude_std_decimal_degrees - GPS longitude
- layer_key - Unique soil layer identifier
- site_key - Unique soil site identifier
- user_site_id - Site identification code
- hzn_top - Soil horizon top depth (cm)
- hzn_bot - Soil horizon bottom depth (cm)
- texture_description - Soil texture classification
4. Crop Yield Features (6 features)
Source: nass_crops table - USDA NASS survey data
- commodity_desc - Crop type (Corn, Wheat, Soybeans, Barley, Cotton)
- year - Harvest year (2012-2022)
- yield_value - Crop yield (bu/acre or tons/acre)
- state_name - State where crop was grown
- county_name - County where crop was grown
- unit_desc - Yield measurement units
5. Weather Features (9 features) - SYNTHETIC DATA
Source: State-based climate modeling with regional averages
- climate_zone - Köppen climate classification
- avg_temperature - Average growing season temperature
- max_temperature - Maximum temperature
- min_temperature - Minimum temperature
- total_gdd - Growing degree days (calculated)
- avg_relative_humidity - Average humidity by state
- total_precipitation - Total precipitation with drought/wet year adjustments
- days_with_precipitation - Estimated precipitation days
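For anyone following along, here is one common way growing degree days can be calculated. This is only a sketch of the simple averaging method: it assumes temperatures in Celsius and a base temperature of 10 °C (a value often used for corn). How total_gdd is actually computed in this dataset is not specified here, and the function names are illustrative.

```python
def daily_gdd(t_max_c, t_min_c, t_base_c=10.0):
    """Growing degree days for one day: mean temperature above the base, floored at 0."""
    return max(0.0, (t_max_c + t_min_c) / 2.0 - t_base_c)


def total_gdd(daily_temps, t_base_c=10.0):
    """Sum daily GDD over an iterable of (t_max_c, t_min_c) pairs."""
    return sum(daily_gdd(t_max, t_min, t_base_c) for t_max, t_min in daily_temps)


# Three example days: warm, too cold to count, mild
season = [(30.0, 20.0), (12.0, 4.0), (25.0, 15.0)]
print(total_gdd(season))  # 15 + 0 + 10 = 25.0
```

Different crops use different base temperatures, so t_base_c would need to change per crop.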
6. Derived Features (7 features)
Source: Calculated from soil chemistry/physics and weather data
- soil_quality_score - 0-100 composite score (pH + organic matter + CEC + clay)
- soil_quality_class - Categorical: Poor/Good/Excellent
- temp_optimality - Temperature suitability for crop growth (0-1)
- precip_category - Drought/Normal/Wet classification
- gdd_suitability - Growing degree day suitability by crop
- ca_mg_ratio - Calcium to magnesium nutrient ratio
- soil_crop_distance_km - Distance from soil sample to crop location
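Two of these derived features are simple enough to sketch. The code below shows how ca_mg_ratio and soil_crop_distance_km might be computed, assuming the ratio is exchangeable calcium divided by exchangeable magnesium and the distance is a great-circle (haversine) distance between two latitude/longitude points. The function names and the choice of haversine are assumptions, and so are the example coordinates, since how a "crop location" is defined is not stated here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def ca_mg_ratio(ca, mg):
    """Calcium-to-magnesium ratio; returns None when a value is missing or mg is zero."""
    if ca is None or mg is None or mg == 0:
        return None
    return ca / mg


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical soil sample and crop location, one degree of longitude apart at the equator
print(ca_mg_ratio(12.0, 3.0))                       # 4.0
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 1))   # about 111.2
```

Returning None for a zero or missing magnesium value keeps bad divisions out of the training set; those rows can be dropped or imputed later.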