A place for community data

“Mining data isn’t a substitute for doing actual research,” warns seed company scientist

In the past five years, precision agriculture systems have begun providing an exciting opportunity to collect and pool data on factors ranging from yields to soil quality and beyond. Each data set offers greater insight into the characteristics of a field and its variability, or into its potential response to different treatments and management practices.

To get a sense of the power of this technology, just talk to dealers and systems specialists and see how they integrate it in almost every decision.

Yet to paraphrase the old saying, with great potential comes great responsibility. Just because a yield monitor or other GPS-based data-gathering system can produce huge stacks of numbers doesn’t mean that the numbers will actually be useful, or that they will open up any “quick-fix” capability.

It takes years of theorizing, researching and correlating data to fine-tune management practices or to suggest changes to those practices. Farmers know what it’s like to engage in their own fact-finding process on their farms, tweaking and adjusting various aspects of their field management before making significant changes. Now, most advisers, retailers and company agronomists make the same recommendations, saying the long-term, total systems approach is best.

It’s why replicated data or multi-year results are more valuable than one-year data.

It’s against this backdrop that a trend is taking shape. In spite of advice to the contrary, more farmers are engaging in the use of so-called “community data.” It’s an amalgamation of results that is often inconclusive yet is used to change on-farm practices, despite the lack of any clear support for doing so.

Last March, Dr. Mark Jeschke, agronomy information manager with DuPont Pioneer in Johnston, Iowa, published an article with a cautionary tone regarding the use of community data (see “Further reading” at the bottom of this page). The piece focuses on the effect of community data versus trial data in decisions on corn-seeding rates only, and lists four shortfalls of community data while citing only one benefit to the practice.


One thing that has to be made clear right from the start is the definition of community data. As Jeschke notes, he is not talking about on-farm trials where there are comparisons set up on a farm and across numerous locations, compiling data from those settings. That type of information gathering is very valuable, and companies perform those all the time.

In this context, “community data” means pulling in normal production data, where there aren’t any comparisons set up ahead of time. It’s where growers or dealers are mining what could be superficial data from the yield monitor, creating breakouts and recommendations based solely on that data in order to identify trends or differences in treatments.

“It was GPS and yield monitoring in the mid-1990s that really opened the door to this, to where you’re able to collect spatial data on fields for the first time, really,” says Jeschke. “What we’re seeing now, and this has been a long time coming, are improvements in that data handling and transfer capabilities that make aggregating data much more seamless than it has been in the past… it’s really starting to open up some possibilities to pull together large data sets in a way that we haven’t been able to before.”

Jeschke doesn’t want to sound like a naysayer when it comes to gathering community data and extracting value. Possibilities are starting to open up in that field, and there are positive things a grower can do with that information. It can be useful in identifying trends or generating hypotheses or pointing researchers in a new direction. But on its own, the use of community data can be misleading, unless it involves some degree of standardization of conditions in fields, in-season treatments or ground-truthing.

As always, the more and better the information that’s part of the pool of data, the more reliable it may be. Otherwise, it’s not recommended that growers try to draw any concrete conclusions just from mining data without the proper controls and parameters from the outset.

Using it to identify trends can yield some definite value, provided it’s measured against the appropriate context of actual research data or other information to ensure someone isn’t basing a recommendation on incomplete interpretation.

Superficial results

Jeschke mentions a recent visit to a farm show where a farm network business had seeding rate data on display, citing a Pioneer hybrid among others, and he says it points to community data’s limitations.

“You have a large number of acres covering a range of different seeding rates,” he explains. “The fields planted to 30,000 seeds per acre are likely an entirely separate set of fields from those planted at 35,000 seeds per acre. The data sets have been pulled together and you don’t have any head-to-head comparisons of those two seeding rates; you only have one set of fields planted at 30,000 and another set of fields at 35,000. So it’s that fundamental limitation as to what conclusions you can draw from that data, because you’re not actually comparing those two seeding rates from the data you have. Also, you have no information as to whether or not the seeding rates used in any of the fields were actually the optimal rates for those environments.”

Another example he cites involved one of his agronomists, who ran a series of research trials with 30 locations, half of which were treated with a fungicide and half of which were not. What can be determined from the data concerning the value of a foliar fungicide? Not much, says Jeschke, because it involves two separate sets of locations. The fungicide-treated locations were quite a bit higher yielding, but they might have just been higher-yielding fields from the outset.

“It gives you something to start with but you can’t really draw any conclusions from that,” says Jeschke. “If you scale that up from 30 locations to 300, you have a lot more data, but you still haven’t overcome that underlying problem where you don’t have any head-to-head comparisons in the same environment.”
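The limitation Jeschke describes can be illustrated with a small sketch. All of the numbers below are hypothetical and are not taken from his article or from any trial: the code assumes a fungicide that truly adds five bushels per acre, and it assumes the treated fields happened to be more productive to begin with. Under those assumptions, comparing two separate sets of fields overstates the benefit, while a head-to-head comparison within the same fields recovers it.

```python
from statistics import mean

# Hypothetical true yield benefit of the fungicide, in bushels per acre.
TRUE_EFFECT = 5.0

# Hypothetical baseline yields (bu/ac) with no fungicide applied.
# Assumption: the fields that received fungicide were better fields anyway.
treated_baselines = [190.0, 195.0, 200.0]
untreated_baselines = [170.0, 175.0, 180.0]


def naive_difference(treated_base, untreated_base, effect):
    """Compare two separate sets of fields, as pooled production data does."""
    treated_yields = [b + effect for b in treated_base]
    return mean(treated_yields) - mean(untreated_base)


def paired_difference(baselines, effect):
    """Compare treated and untreated strips within the same field."""
    diffs = [(b + effect) - b for b in baselines]
    return mean(diffs)


naive = naive_difference(treated_baselines, untreated_baselines, TRUE_EFFECT)
paired = paired_difference(treated_baselines + untreated_baselines, TRUE_EFFECT)

print(f"Separate sets of fields: {naive:.1f} bu/ac apparent benefit")
print(f"Head-to-head comparison: {paired:.1f} bu/ac benefit")
```

In this made-up case the pooled comparison suggests a 25-bushel benefit when the real one is five. Adding more fields to each group would not close that gap, because the difference in underlying field productivity stays built into the comparison.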

Instead, in the case of Jeschke’s agronomist, she had these locations where there was a tremendous difference between treated and non-treated fields, and that provides an opportunity to extrapolate other possible conclusions based on actual conditions. For instance, that particular year, northern leaf blight was a considerable problem, so there was a higher-than-average probability of seeing a yield benefit from foliar fungicides. And actual plant research trials in that area also showed a higher benefit.

“You can use that bit of information, and you can’t draw any conclusions from it by itself, but putting it in a context of things that you know, it adds one facet to the overall story,” says Jeschke. “But mining data isn’t a substitute for doing actual research.”

The overall concern is that there is a lot more data available, together with more players in the industry offering data services and trying to make sense of that information. It’s an underlying trend that’s also coinciding with a decline in university and even industry-based research investment in crop management (one of his motivators in writing the article).

“You have new players coming into the marketplace that are pulling together data — and we’re seeing this already — and presenting it in ways where the inferences they’re trying to draw from the data you can’t fundamentally draw the way they’re trying to do it,” says Jeschke.

More information is better than less

As much as she agrees with Jeschke’s statements concerning the need for standardization and a better understanding of scientific principles in research, Karon Cowan believes there is an overriding positive spin to community data: learning. As president of AgTech GIS, she’s more familiar with the term “on-farm research,” adding that it’s been part of a trend she’s noticed that promotes growers gathering and networking, or companies trying to aggregate data on behalf of participating growers.

“In a lot of cases, I don’t know whether ‘research’ is even the right word but it’s a learning opportunity, for certain,” says Cowan, adding that “research” sounds better than “shared data” or “aggregated data.” “The great thing about growers or groups of growers wanting to collaborate is just that — that they do want to learn and they do want to collaborate, and they do want to learn from each other. I think they’re also very curious about ‘where do I stack up?’ So there are three things: one, we want to collaborate; two, we want to learn, and three, we’re curious. Those are all the good things to come out of this — they’re positive as are the effects.”

Where the problems arise, she says, is when growers gather this information and then use it as a measurement. If you’re going to call it research and have it truly be a yardstick, you have to define your yardstick. When data comes in and it’s not calibrated or it’s unbalanced or not set up according to some rigorous research standards, then it’s not a very good yardstick.

Again, those three traits — to collaborate, to learn and to be curious — are all extremely positive things. And Cowan agrees it’s hard to allow one thing (a lack of standards or proper calibration) to weaken or discourage another (collaboration or curiosity).

When aggregating data, the first thing is to share the methodology to see if the results can actually be considered together. If they can’t, don’t do it.

Know from the start

Cowan often encourages growers who are considering sharing data to define their goals together, set up the parameters and try to follow, as best they can, established research practices, ensuring all participants are following the same protocols. That includes documenting as much as possible in advance, such as soil types in the research areas and base-level fertility, so that the sites share as much common ground as possible, or at least so that their differences are known before the seed is planted or the ground is treated. Then they should track in-season conditions and treatments as well, such as plant growth stages, fertilizer application timing and amounts, and applications of herbicides, pesticides and fungicides. Again, the more information that’s included with the final data, the more useful it becomes.

Ultimately, the fewer exceptions to the rules, the more meaningful that information becomes, both as an opportunity to share, and for taking that information forward and using it to improve management practices and crop performance.

Good data can come from less-stringent conditions, but participants must define what it is they’re going after. Only then can they decide whether it was successful, based on what they were trying to accomplish. It may not be done according to research standards, but it may still satisfy their curiosity and what they’re hoping to learn. Separate research-grade from curiosity-grade work, and the latter still has value because it’s collaborative.

Cowan also believes there’s a mindset that causes many people to get stuck — and that includes the variety and hybrid trials currently available. Growers are curious about that data, and companies and dealers are doing a lot of that work. But there are other things they could be collaborating on, including tillage practices or new types of fertilizer.

“They have to document, they have to plan, they have to know what it is they’re trying to achieve,” says Cowan. “They have to replicate it and they have to be able to know enough other things to know about where those plots are placed to rule out the anomalies in the data.”

This article first appeared in the October 2016 issue of the Soybean Guide.

Further reading

Crop Focus (file will open in your browser as a PDF; courtesy of DuPont Pioneer)

About the author

CG Production Editor

Ralph Pearce
