I was able to put together a small-scale proof of concept for a crop-yield prediction model that reached an R² of 0.996 on test data.
The training data:
- Soil type/composition, temperature, and historical crop yields, joined by datetime and location
- Crop types: soybean and corn
- Location: rural Minnesota counties (Pope, Mower, Stearns)
- Date range: 2018-2019 (2 years only)
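The post doesn't show how the sources were joined. Here is a minimal sketch of an inner join on location and date. It assumes yields are annual, so the join key is a hypothetical (county, year) pair. All field names and values below are illustrative placeholders, not the post's data. In practice, pandas `DataFrame.merge(..., on=[...], how="inner")` does the same job.

```python
# Hypothetical placeholder records; field names and values are illustrative only.
soil = [
    {"county": "Pope", "year": 2018, "soil_type": "loam"},
    {"county": "Stearns", "year": 2018, "soil_type": "silt loam"},
]
temperature = [
    {"county": "Pope", "year": 2018, "mean_temp_c": 7.1},
    {"county": "Stearns", "year": 2018, "mean_temp_c": 6.8},
]
yields = [
    {"county": "Pope", "year": 2018, "crop": "corn", "bu_per_acre": 180.0},
    {"county": "Mower", "year": 2018, "crop": "soybean", "bu_per_acre": 55.0},
]


def inner_join(left, right, keys=("county", "year")):
    """Inner-join two lists of dicts on the given key fields (assumed keys)."""
    # Index the right-hand rows by their key tuple for O(1) lookups.
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    # Keep only left rows that have at least one matching right row.
    joined = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            joined.append({**row, **match})
    return joined


# Yield records without matching soil/temperature rows (Mower here) are dropped.
training_rows = inner_join(inner_join(yields, soil), temperature)
```

An inner join silently discards unmatched records. That is one way a dataset can shrink to a few dozen rows.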
Model Summary:
- Random Forest Regressor from scikit-learn
- n_estimators: 50 trees (kept small for the small dataset)
- max_depth: 3 levels deep (to limit overfitting)
- random_state: 42 (for reproducible results)
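A sketch of the configuration above in scikit-learn follows. The features and target are random placeholders sized to the post's 27 records, not the actual data. The 80/20 train/test split and 5-fold cross-validation are assumptions, because the post doesn't state them. Reporting "±" as the standard deviation across folds is also an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score, train_test_split

# Placeholder data: 27 rows of random features standing in for soil/temperature
# columns. This is NOT the post's data; only the row count mirrors it.
rng = np.random.default_rng(0)
X = rng.uniform(size=(27, 3))
y = 150 + 40 * X[:, 0] - 20 * X[:, 1] + rng.normal(scale=2.0, size=27)

# Assumed 80/20 split; the post does not give its split ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hyperparameters as listed in the model summary.
model = RandomForestRegressor(n_estimators=50, max_depth=3, random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Assumed 5-fold cross-validated R²; "±" shown here as the std across folds.
cv_scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"CV R²: {cv_scores.mean():.4f} (±{cv_scores.std():.4f})")
```

With 27 rows, a 20% test split leaves only 6 test points. Any single test-set score therefore rests on very few predictions.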
Test Data Results:
- R² Score: 0.9963
- RMSE: 4.48 bushels/acre
- MAE: 3.49 bushels/acre
- MAPE: 4.4%
- Cross-Validation R²: 0.9990 (±0.0011)
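For reference, the four test metrics can be computed from paired true/predicted yields as below. scikit-learn offers equivalents in `sklearn.metrics`: `r2_score`, `mean_squared_error`, `mean_absolute_error`, and `mean_absolute_percentage_error`, the last of which returns a fraction rather than a percent. The sample values are made up for illustration. RMSE and MAE come out in the target's units (bushels/acre here).

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², RMSE, MAE and MAPE (in percent) for paired observations."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)     # total sum of squares
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        # MAPE is undefined if any true value is zero.
        "mape_pct": 100 * sum(abs(r / t) for r, t in zip(residuals, y_true)) / n,
    }


# Illustrative values only (bushels/acre), not the post's test set.
metrics = regression_metrics([50.0, 60.0, 180.0, 200.0], [52.0, 57.0, 185.0, 196.0])
```

In this example, mixing low-yield (soybean-like) and high-yield (corn-like) values inflates the total variance. That pushes R² close to 1 even though individual errors are several bushels.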
Data Quality Limitations:
- Geographic Matching: ~423 km average distance between soil samples and crop yield records
- Small Sample Size: 27 records is insufficient for robust modeling
- Limited Diversity: single state, 2 crops, 2 years
- Synthetic Weather: not real weather station observations
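A matching-distance figure like the ~423 km average above is commonly computed with the haversine great-circle formula. The post doesn't say which method produced its number, so treat this as one plausible sketch. The function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_match_distance_km(pairs):
    """Average distance over hypothetical ((soil_lat, soil_lon), (yield_lat, yield_lon)) pairs."""
    distances = [haversine_km(*soil, *crop) for soil, crop in pairs]
    return sum(distances) / len(distances)
```

For scale, one degree of latitude is about 111 km. An average gap of ~423 km means a soil sample is typically several counties away from the yield record it was paired with.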
This model shows promise for crop-yield prediction: it predicted the yield of two crops in a specific region with high accuracy on held-out data. Yet it has major limitations, especially in scalability. Addressing them will require broadening the training dataset to cover more crop varieties and historical weather patterns. This is only a proof of concept. Please reach out if you’d like to take a deeper look at my experiment, and I can link you to the GitHub repository.