API Reference
Basic Exploration
- GroClient.lookup(entity_type, entity_ids)
Retrieve details about a given id or list of ids of type entity_type.
https://developers.gro-intelligence.com/gro-ontology.html
- Parameters
entity_type ({ 'metrics', 'items', 'regions', 'frequencies', 'sources', 'units' }) –
entity_ids (int or list of ints) –
- Returns
A dict with entity details is returned if an integer is given for entity_ids. A dict of dicts with entity details, keyed by id, is returned if a list of integers is given for entity_ids.
Example:
{ 'id': 274, 'contains': [779, 780, ...], 'name': 'Corn', 'definition': 'The seeds of the widely cultivated corn plant <i>Zea mays</i>,' " which is one of the world's most popular grains." }
Example:
{ '274': { 'id': 274, 'contains': [779, 780, ...], 'belongsTo': [4138, 8830, ...], 'name': 'Corn', 'definition': 'The seeds of the widely cultivated corn plant' " <i>Zea mays</i>, which is one of the world's most popular" ' grains.' }, '270': { 'id': 270, 'contains': [1737, 7401, ...], 'belongsTo': [8830, 9053, ...], 'name': 'Soybeans', 'definition': 'The seeds and harvested crops of plants belonging to the' ' species <i>Glycine max</i> that are used in the production' ' of oil and both human and livestock consumption.' } }
- Return type
dict or dict of dicts
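Because lookup() returns either a single dict or a dict of dicts depending on the type of entity_ids, calling code may need to handle both shapes. The helper below is a minimal sketch of one way to do that. The name as_entity_list is hypothetical and not part of the client, and the sketch assumes the two return shapes shown in the examples above.

```python
def as_entity_list(lookup_result):
    """Return a list of entity dicts from either lookup() return shape.

    Hypothetical helper, not part of GroClient. Assumes a single-entity
    result carries an 'id' key at the top level, as in the examples above.
    """
    if "id" in lookup_result:
        # Single entity: lookup() was called with one integer id.
        return [lookup_result]
    # Multiple entities: a dict of dicts keyed by id.
    return list(lookup_result.values())


# Trimmed-down versions of the documented example outputs.
single = {"id": 274, "name": "Corn"}
many = {
    "274": {"id": 274, "name": "Corn"},
    "270": {"id": 270, "name": "Soybeans"},
}
print([entity["name"] for entity in as_entity_list(single)])
print([entity["name"] for entity in as_entity_list(many)])
```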
- GroClient.search(entity_type, search_terms)
Search for the given search term. Better matches appear first.
- Parameters
entity_type ({ 'metrics', 'items', 'regions', 'sources' }) –
search_terms (string) –
- Returns
Example:
[{'id': 5604}, {'id': 10204}, {'id': 10210}, ...]
- Return type
list of dicts
- GroClient.search_and_lookup(entity_type, search_terms, num_results=10)
Search for the given search terms and look up their details.
For each result, yield a dict of the entity and its properties.
- Parameters
entity_type ({ 'metrics', 'items', 'regions', 'sources' }) –
search_terms (string) –
num_results (int) – Maximum number of results to return. Defaults to 10.
- Yields
dict – Result from search() passed to lookup() to get additional details.
Example:
{ 'id': 274, 'contains': [779, 780, ...], 'name': 'Corn', 'definition': 'The seeds of the widely cultivated...' }
See output of lookup(). Note that, as with search(), the first result is the best match for the given search term(s).
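The description above reads as a composition of search() and lookup(). The sketch below illustrates that composition under stated assumptions: it takes the two functions as arguments so it can run without a client, and search_then_lookup, fake_search and fake_lookup are hypothetical names used only for illustration. It is not the library's implementation.

```python
from itertools import islice


def search_then_lookup(search, lookup, entity_type, search_terms, num_results=10):
    """Yield lookup details for the best search matches, best match first."""
    # Keep only the first num_results search hits, then look each one up.
    for result in islice(search(entity_type, search_terms), num_results):
        yield lookup(entity_type, result["id"])


# Stand-ins for client.search and client.lookup (illustration only).
def fake_search(entity_type, search_terms):
    return [{"id": 274}, {"id": 270}]


def fake_lookup(entity_type, entity_id):
    names = {274: "Corn", 270: "Soybeans"}
    return {"id": entity_id, "name": names[entity_id]}


best = list(search_then_lookup(fake_search, fake_lookup, "items", "corn", num_results=1))
print(best)
```

With a real client, the same pattern would pass client.search and client.lookup in place of the stand-ins.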
- GroClient.search_for_entity(entity_type, keywords)
Returns the first result of entity_type that matches the given keywords.
- Parameters
entity_type ({ 'metrics', 'items', 'regions', 'sources' }) –
keywords (string) –
- Returns
The id of the first search result
- Return type
integer
- GroClient.get_data_series(**selection)
Get available data series for the given selections.
https://developers.gro-intelligence.com/data-series-definition.html
- Parameters
metric_id (integer, optional) –
item_id (integer, optional) –
region_id (integer, optional) –
partner_region_id (integer, optional) –
source_id (integer, optional) –
frequency_id (integer, optional) –
- Returns
Example:
[{ 'metric_id': 2020032, 'metric_name': 'Seed Use', 'item_id': 274, 'item_name': 'Corn', 'region_id': 1215, 'region_name': 'United States', 'source_id': 24, 'source_name': 'USDA FEEDGRAINS', 'frequency_id': 7, 'start_date': '1975-03-01T00:00:00.000Z', 'end_date': '2018-05-31T00:00:00.000Z' }, { ... }, ... ]
- Return type
list of dicts
- GroClient.find_data_series(**kwargs)
Find data series matching a combination of entities specified by name and yield them ranked by coverage.
Example:
client.find_data_series(item="Corn", metric="Futures Open Interest", region="United States of America")
will yield a sequence of dictionaries of the form:
{ 'metric_id': 15610005, 'metric_name': 'Futures Open Interest', 'item_id': 274, 'item_name': 'Corn', 'region_id': 1215, 'region_name': 'United States', 'frequency_id': 15, 'source_id': 81, 'start_date': '1972-03-01T00:00:00.000Z', ...}, { ... }, ...
See https://developers.gro-intelligence.com/data-series-definition.html
result_filter can be used to filter entity searches. For example:
client.find_data_series(item="vegetation", metric="vegetation indices", region="Central", result_filter=lambda r: ('region_id' not in r or r['region_id'] == 10393))
will only consider that particular region, and not the many other regions with the same name.
This method uses search(), get_data_series(), get_available_timefrequency() and rank_series_by_source().
- Parameters
metric (string, optional) –
item (string, optional) –
region (string, optional) –
partner_region (string, optional) –
start_date (string, optional) – YYYY-MM-DD
end_date (string, optional) – YYYY-MM-DD
result_filter (function, optional) – function taking data series selection dict returning boolean
- Yields
dict – A sequence of data series matching the input selections
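The result_filter lambda shown above can also be written as a small reusable function. The sketch below is one way to do that. The factory name only_region is hypothetical and not part of the client, and the sample selection dicts are trimmed-down illustrations.

```python
def only_region(region_id):
    """Build a result_filter that accepts selections that either have no
    region_id yet or have exactly the given region_id."""

    def result_filter(selection):
        return "region_id" not in selection or selection["region_id"] == region_id

    return result_filter


keep_central = only_region(10393)
print(keep_central({"item_id": 274}))        # no region_id in the selection
print(keep_central({"region_id": 10393}))    # the wanted region
print(keep_central({"region_id": 1215}))     # a different region
```

The resulting function can then be passed as result_filter=keep_central.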
Data Retrieval
- GroClient.get_data_points(**selections)
Get all the data points for a given selection.
https://developers.gro-intelligence.com/data-point-definition.html
Example:
client.get_data_points(**{'metric_id': 860032, 'item_id': 274, 'region_id': 1215, 'frequency_id': 9, 'source_id': 2, 'start_date': '2017-01-01', 'end_date': '2017-12-31', 'unit_id': 15})
Returns:
[{ 'start_date': '2017-01-01T00:00:00.000Z', 'end_date': '2017-12-31T00:00:00.000Z', 'value': 408913833.8019222, 'unit_id': 15, 'reporting_date': None, 'metric_id': 860032, 'item_id': 274, 'region_id': 1215, 'partner_region_id': 0, 'frequency_id': 9, 'source_id': 2, 'belongs_to': { 'metric_id': 860032, 'item_id': 274, 'region_id': 1215, 'frequency_id': 9, 'source_id': 2 } }]
Note: you can pass the output of get_data_series() into get_data_points() to check what series exist for some selections and then retrieve the data points for those series. See quick_start.py for an example of this.
get_data_points() also allows passing a list of ids for metric_id, item_id, and/or region_id to get multiple series in a single request. This can be faster if requesting many series.
For example:
client.get_data_points(**{'metric_id': 860032, 'item_id': 274, 'region_id': [1215,1216], 'frequency_id': 9, 'source_id': 2, 'start_date': '2017-01-01', 'end_date': '2017-12-31', 'unit_id': 15})
Returns:
[{ 'start_date': '2017-01-01T00:00:00.000Z', 'end_date': '2017-12-31T00:00:00.000Z', 'value': 408913833.8019222, 'unit_id': 15, 'reporting_date': None, 'metric_id': 860032, 'item_id': 274, 'region_id': 1215, 'partner_region_id': 0, 'frequency_id': 9, 'source_id': 2, 'belongs_to': { 'metric_id': 860032, 'item_id': 274, 'region_id': 1215, 'frequency_id': 9, 'source_id': 2 } }, { 'start_date': '2017-01-01T00:00:00.000Z', 'end_date': '2017-12-31T00:00:00.000Z', 'value': 340614.19507563586, 'unit_id': 15, 'reporting_date': None, 'metric_id': 860032, 'item_id': 274, 'region_id': 1216, 'partner_region_id': 0, 'frequency_id': 9, 'source_id': 2, 'belongs_to': { 'metric_id': 860032, 'item_id': 274, 'region_id': 1216, 'frequency_id': 9, 'source_id': 2 } }]
- Parameters
metric_id (integer or list of integers) – How something is measured. e.g. “Export Value” or “Area Harvested”
item_id (integer or list of integers) – What is being measured. e.g. “Corn” or “Rainfall”
region_id (integer or list of integers) – Where something is being measured e.g. “United States Corn Belt” or “China”
partner_region_id (integer or list of integers, optional) – partner_region refers to an interaction between two regions, like trade or transportation. For example, for an Export metric, the “region” would be the exporter and the “partner_region” would be the importer. For most series, this can be excluded or set to 0 (“World”) by default.
source_id (integer) –
frequency_id (integer) –
unit_id (integer, optional) –
start_date (string, optional) – All points with end dates equal to or after this date
end_date (string, optional) – All points with start dates equal to or before this date
show_revisions (boolean, optional) – False by default, meaning only the latest value for each period. If true, will return all values for a given period, differentiated by the reporting_date field.
insert_null (boolean, optional) – False by default. If True, will include a data point with a None value for each period that does not have data.
at_time (string, optional) – Estimate what data would have been available via Gro at a given time in the past. See at-time-query-examples.ipynb for more details.
include_historical (boolean, optional) – True by default, will include historical regions that are part of your selections
- Returns
- Return type
list of dicts
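The start_date and end_date parameters select points by overlap rather than by containment: a point is returned if its own period touches the requested window. The sketch below restates that rule as a client-side check on a data point dict. It is an illustration of the documented semantics, not how the API implements them, and the function name overlaps is hypothetical.

```python
def overlaps(point, start_date=None, end_date=None):
    """Check whether a data point's period overlaps the requested window.

    A point is kept when its end date is on or after start_date and its
    start date is on or before end_date. Only the YYYY-MM-DD prefix of each
    timestamp is compared; ISO dates sort correctly as plain strings.
    """
    point_start = point["start_date"][:10]
    point_end = point["end_date"][:10]
    if start_date is not None and point_end < start_date:
        return False
    if end_date is not None and point_start > end_date:
        return False
    return True


point = {
    "start_date": "2017-01-01T00:00:00.000Z",
    "end_date": "2017-12-31T00:00:00.000Z",
}
print(overlaps(point, start_date="2017-06-01", end_date="2018-06-01"))
print(overlaps(point, start_date="2018-01-01"))
print(overlaps(point, end_date="2016-12-31"))
```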
Geographic
- GroClient.get_geojson(region_id)
Given a region ID, return shape information in geojson.
- Parameters
region_id (integer) –
- Returns
Example:
{ 'type': 'GeometryCollection', 'geometries': [{'type': 'MultiPolygon', 'coordinates': [[[[-38.394, -4.225], ...]]]}, ...]}
- Return type
a geojson object or None
- GroClient.get_descendant_regions(region_id, descendant_level=None, include_historical=True, include_details=True)
Look up details of all regions of the given level contained by a region.
Given any region by id, get all the descendant regions that are of the specified level.
- Parameters
region_id (integer) –
descendant_level (integer, optional) – The region level of interest. See REGION_LEVELS constant. If not provided, get all descendants.
include_historical (boolean, optional) – True by default. If False is specified, regions that only exist in historical data (e.g. the Soviet Union) will be excluded.
include_details (boolean, optional) – True by default. Will perform a lookup() on each descendant region to find name, latitude, longitude, etc. If this option is set to False, only ids of descendant regions will be returned, which makes execution significantly faster.
- Returns
Example:
[{ 'id': 13100, 'contains': [139839, 139857, ...], 'name': 'Wisconsin', 'level': 4 } , { 'id': 13101, 'contains': [139891, 139890, ...], 'name': 'Wyoming', 'level': 4 }, ...]
See output of
lookup()
- Return type
list of dicts
- GroClient.get_provinces(country_name)
Given the name of a country, find its provinces.
- Parameters
country_name (string) –
- Returns
Example:
[{ 'id': 13100, 'contains': [139839, 139857, ...], 'name': 'Wisconsin', 'level': 4 } , { 'id': 13101, 'contains': [139891, 139890, ...], 'name': 'Wyoming', 'level': 4 }, ...]
See output of
lookup()
- Return type
list of dicts
Advanced Exploration
- GroClient.lookup_belongs(entity_type, entity_id)
Look up details of entities containing the given entity.
- Parameters
entity_type ({ 'metrics', 'items', 'regions' }) –
entity_id (int) –
- Yields
dict – Result of lookup() on each entity the given entity belongs to.
For example: for the region ‘United States’, one yielded result will be for ‘North America’, the format of which matches the output of lookup():
{ 'id': 15, 'contains': [ 1008, 1009, 1012, 1215, ... ], 'name': 'North America', 'level': 2 }
- GroClient.rank_series_by_source(selections_list)
Given a list of series selections, for each unique combination excluding source, expand to all available sources and return them in ranked order. The order corresponds to how well that source covers the selection (metrics, items, regions, and time range and frequency).
- Parameters
selections_list (list of dicts) – See the output of get_data_series().
- Yields
dict – The input selections_list, expanded out to each possible source, ordered by coverage.
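The first step described above, finding each unique combination excluding source, can be sketched on plain dicts. This is only an illustration of that de-duplication step, under the assumption that each selection holds only id keys. The function name unique_without_source is hypothetical, and the expansion to sources and the coverage ranking are done by the library and are not reproduced here.

```python
def unique_without_source(selections_list):
    """Drop source_id and de-duplicate selections, keeping first-seen order."""
    seen = set()
    unique = []
    for selection in selections_list:
        stripped = {k: v for k, v in selection.items() if k != "source_id"}
        # A sorted tuple of items is hashable and ignores key order.
        key = tuple(sorted(stripped.items()))
        if key not in seen:
            seen.add(key)
            unique.append(stripped)
    return unique


selections = [
    {"metric_id": 860032, "item_id": 274, "region_id": 1215, "source_id": 2},
    {"metric_id": 860032, "item_id": 274, "region_id": 1215, "source_id": 24},
    {"metric_id": 860032, "item_id": 274, "region_id": 1216, "source_id": 2},
]
print(unique_without_source(selections))
```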
- GroClient.get_available_timefrequency(**selection)
Given a selection, return a list of frequencies and time ranges. The results are ordered by coverage-optimized ranking.
- Parameters
metric_id (integer, optional) –
item_id (integer, optional) –
region_id (integer, optional) –
partner_region_id (integer, optional) –
- Returns
Example:
[{ 'startDate': '2000-02-18T00:00:00.000Z', 'frequencyId': 3, 'endDate': '2020-03-12T00:00:00.000Z', 'name': '8-day' }, { 'startDate': '2019-09-02T00:00:00.000Z', 'frequencyId': 1, 'endDate': '2020-03-09T00:00:00.000Z', 'name': 'daily' }, ... ]
- Return type
list of dicts
- GroClient.get_top(entity_type, num_results=5, **selection)
Find the data series with the highest cumulative value for the given time range.
Examples:
# To get FAO's top 5 corn-producing countries of all time:
>>> get_top('regions', metric_id=860032, item_id=274, frequency_id=9, source_id=2)
# To get FAO's top 5 corn-producing countries of 2014:
>>> get_top('regions', metric_id=860032, item_id=274, frequency_id=9, source_id=2, start_date='2014-01-01', end_date='2014-12-31')
# To get the United States' top 15 exports in the decade of 2010-2019:
>>> get_top('items', num_results=15, metric_id=20032, region_id=1215, frequency_id=9, source_id=2, start_date='2010-01-01', end_date='2019-12-31')
- Parameters
entity_type ({ 'items', 'regions' }) – The entity type to rank, all other selections being the same. Only items and regions are rankable at this time.
num_results (integer, optional) – How many data series to rank. Top 5 by default.
metric_id (integer) –
item_id (integer) – Required if requesting top regions. Disallowed if requesting top items.
region_id (integer) – Required if requesting top items. Disallowed if requesting top regions.
partner_region_id (integer, optional) –
frequency_id (integer) –
source_id (integer) –
start_date (string, optional) – If not provided, the cumulative value used for ranking will include data points as far back as the source provides.
end_date (string, optional) –
- Returns
Example:
[ {'metricId': 860032, 'itemId': 274, 'regionId': 1215, 'frequencyId': 9, 'sourceId': 2, 'value': 400, 'unitId': 14}, {'metricId': 860032, 'itemId': 274, 'regionId': 1215, 'frequencyId': 9, 'sourceId': 2, 'value': 395, 'unitId': 14}, {'metricId': 860032, 'itemId': 274, 'regionId': 1215, 'frequencyId': 9, 'sourceId': 2, 'value': 12, 'unitId': 14}, ]
Along with the series attributes, value and unit are also given for the total cumulative value the series are ranked by. You may then use the results to call get_data_points() to get the individual time series points.
- Return type
list of dicts
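To make "highest cumulative value" concrete, the sketch below ranks entities by the sum of their data point values, using dicts shaped like the output of get_data_points(). The ranking behind get_top() is not shown in this reference, so this is only an illustration of the idea. The function name top_by_cumulative_value and the sample values are made up.

```python
from collections import defaultdict


def top_by_cumulative_value(points, key, num_results=5):
    """Rank the values of `key` (e.g. 'region_id') by the sum of 'value'."""
    totals = defaultdict(float)
    for point in points:
        # Skip null points such as those produced by insert_null=True.
        if point["value"] is not None:
            totals[point[key]] += point["value"]
    ranked = sorted(totals.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:num_results]


points = [
    {"region_id": 1215, "value": 300.0},
    {"region_id": 1216, "value": 50.0},
    {"region_id": 1215, "value": 100.0},
    {"region_id": 1216, "value": None},
]
print(top_by_cumulative_value(points, "region_id", num_results=1))
# prints [(1215, 400.0)]
```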
Pandas Utils
- GroClient.get_df(show_revisions=False, index_by_series=False)
Call get_data_points() for each saved data series and return as a combined dataframe.
Note you must have first called either add_data_series() or add_single_data_series() to save data series into the GroClient’s data_series_list. You can inspect the client’s saved list using get_data_series_list().
- Returns
The results of get_data_points() for all the saved series, appended together into a single dataframe. See https://developers.gro-intelligence.com/data-point-definition.html
If index_by_series is set, the dataframe is indexed by series. See https://developers.gro-intelligence.com/data-series-definition.html
- Return type
pandas.DataFrame
- GroClient.add_data_series(**kwargs)
Adds the top result of find_data_series() to the saved data series list.
For use with get_df().
- Parameters
metric (string, optional) –
item (string, optional) –
region (string, optional) –
partner_region (string, optional) –
start_date (string, optional) – YYYY-MM-DD
end_date (string, optional) – YYYY-MM-DD
result_filter (function, optional) – function taking data series selection dict returning boolean
- Returns
The data_series that was added or None if none were found.
- Return type
data_series object, as returned by get_data_series().
- GroClient.add_single_data_series(data_series)
Save a data series object to the GroClient’s data_series_list.
For use with get_df().
- Parameters
data_series (dict) – A single data_series object, as returned by get_data_series() or find_data_series(). See https://developers.gro-intelligence.com/data-series-definition.html
- Returns
- Return type
None
- GroClient.get_data_series_list()
Inspect the current list of saved data series contained in the GroClient.
For use with get_df(). Add new data series to the list using add_data_series() and add_single_data_series().
- Returns
A list of data_series objects, as returned by get_data_series().
- Return type
list of dicts
Crop Modeling
- CropModel.compute_weights(crop_name, metric_name, regions)
Compute a vector of ‘weights’ that can be used for crop-weighted average across regions, as in compute_crop_weighted_series().
For each region, the weight is the mean value over time of the given metric for the given crop, normalized so the sum across all regions is 1.0.
For example: say we have a region_list = [{'id': 1, 'name': 'Province1'}, {'id': 2, 'name': 'Province2'}]. This could be a list returned by search_and_lookup() or get_descendant_regions(), for example. Now say model.compute_weights('soybeans', 'land cover area', region_list) returns [0.6, 0.4]. That means Province1 has 60% and Province2 has 40% of the total area planted across the two regions, when averaged across all time.
- Parameters
crop_name (string) –
metric_name (string) –
regions (list of dicts) – Each entry is a region with id and name
- Returns
weights corresponding to the regions.
- Return type
list of floats
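The normalization described above (mean over time per region, scaled so the weights sum to 1.0) can be sketched on plain lists. This is an illustration of the arithmetic only, not the library's code. The function name mean_weights and the sample areas are made up, and the sketch assumes every region has at least one value and that the means do not sum to zero.

```python
def mean_weights(values_by_region):
    """Normalize each region's mean value over time so the weights sum to 1.0."""
    means = [sum(values) / len(values) for values in values_by_region]
    total = sum(means)
    return [mean / total for mean in means]


# Made-up yearly planted areas for Province1 and Province2.
weights = mean_weights([[50.0, 70.0], [30.0, 50.0]])
print(weights)
```

Here the means are 60 and 40, so the two weights come out as 60% and 40%, matching the [0.6, 0.4] example in the description.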
- CropModel.compute_crop_weighted_series(weighting_crop_name, weighting_metric_name, item_name, metric_name, regions, weighting_func=<function CropModel.<lambda>>)
Compute the ‘crop-weighted average’ of the series for the given item and metric, across regions. The weight of a region is the fraction of the value of the weighting series represented by that region, as explained in compute_weights().
For example: say we have a region_list = [{'id': 1, 'name': 'Province1'}, {'id': 2, 'name': 'Province2'}]. This could be a list returned by search_and_lookup() or get_descendant_regions(), for example. Now model.compute_crop_weighted_series('soybeans', 'land cover area', 'vegetation ndvi', 'vegetation indices index', region_list) will return a dataframe where the NDVI of each province is multiplied by the fraction of the total soybean area that is accounted for by that province. Thus taking the sum across provinces will give a crop-weighted average of NDVI.
- Parameters
weighting_crop_name (string) –
weighting_metric_name (string) –
item_name (string) –
metric_name (string) –
regions (list of dicts) – Each entry is a region with id and name
weighting_func (optional function) – A function of (weight, value) to apply. Default: weight*value
- Returns
A dataframe containing the data series for the given item_name and metric_name for each region in regions, with values adjusted by the crop weight for that region.
- Return type
pandas.DataFrame
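The weighting and the final sum across regions can be sketched on plain lists rather than dataframes. This is a hedged illustration, not the library's implementation. The function name crop_weighted_average and the NDVI numbers are made up, and the sketch assumes every region has a value at every time step. The weighting_func default mirrors the documented weight*value.

```python
def crop_weighted_average(weights, values_by_region,
                          weighting_func=lambda weight, value: weight * value):
    """Apply weighting_func per region, then sum across regions per time step."""
    weighted = [
        [weighting_func(weight, value) for value in values]
        for weight, values in zip(weights, values_by_region)
    ]
    # zip(*weighted) groups the values of all regions at the same time step.
    return [sum(step) for step in zip(*weighted)]


# Made-up NDVI values for two provinces over two time steps.
ndvi = [[0.5, 0.7], [0.25, 0.5]]
series = crop_weighted_average([0.6, 0.4], ndvi)
print(series)
```

Note that the method itself returns the per-region weighted series. Summing across regions, as done in the last line of the function, is the step left to the caller.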
- CropModel.compute_gdd(tmin_series, tmax_series, base_temperature, start_date, end_date, min_temporal_coverage, upper_temperature_cap)
Compute Growing Degree Days value from specific data series.
This function performs the low-level computation used in growing_degree_days().
- Parameters
tmin_series (dict) – A data series object for min temperature e.g. {'metric_id': 1, 'item_id': 2, 'region_id': 3, 'source_id': 4, 'frequency_id': 5}
tmax_series (dict) – A data series object for max temperature e.g. {'metric_id': 1, 'item_id': 2, 'region_id': 3, 'source_id': 4, 'frequency_id': 5}
base_temperature (number) –
start_date (string) – YYYY-MM-DD date
end_date (string) – YYYY-MM-DD date
min_temporal_coverage (float, optional) –
upper_temperature_cap (number, optional) –
- Returns
The sum of the GDD over all days in the interval
- Return type
number
- CropModel.growing_degree_days(region_name, base_temperature, start_date, end_date, min_temporal_coverage=1.0, upper_temperature_cap=inf)
Get Growing Degree Days (GDD) for a region.
Growing degree days (GDD) are a weather-based indicator that allows for assessing crop phenology and crop development, based on heat accumulation. GDD for one day is defined as max(T_mean - T_base, 0), where T_mean is the average temperature of that day if available. Typically T_mean is approximated as (T_max + T_min)/2. If upper_temperature_cap is specified, T_mean is capped to not exceed that value.
The GDD over a longer time interval is the sum of the GDD over all days in the interval. Days where the data is missing contribute 0 GDDs, i.e. are treated as if T_mean = T_base. Use the temporal coverage threshold to avoid computing GDD with too little data.
The threshold and the base temperature should be carefully selected based on a fundamental understanding of the crops and region of interest.
The region can be any of the Gro regions, from a point location to a district, province, etc. This will use the best available data series for T_max and T_min for the given region and time period, using find_data_series(). In the simplest case, if the given region is a weather station location which has data for the time period, then that will be used. If it’s a district or other region, the underlying data could be from one or more weather stations and/or satellite. To bypass the search for available series, use compute_gdd() directly.
- Parameters
region_name (string) –
base_temperature (number) –
start_date (string) – YYYY-MM-DD date
end_date (string) – YYYY-MM-DD date
min_temporal_coverage (float, optional) –
upper_temperature_cap (number, optional) –
- Returns
The sum of the GDD over all days in the interval
- Return type
number
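The GDD definition above can be sketched on plain per-day lists. This is a minimal illustration of the formula, not the library's implementation. The function name gdd_sum is hypothetical, None marks a missing day, and returning None when coverage falls below the threshold is an assumption of this sketch, since the reference does not say how that case is reported.

```python
def gdd_sum(tmin, tmax, base_temperature, min_temporal_coverage=1.0,
            upper_temperature_cap=float("inf")):
    """Sum daily GDD over per-day tmin/tmax lists (None marks missing data).

    Returns None when the share of days with data is below
    min_temporal_coverage (an assumption of this sketch).
    """
    days = list(zip(tmin, tmax))
    present = [(low, high) for low, high in days
               if low is not None and high is not None]
    if not days or len(present) / len(days) < min_temporal_coverage:
        return None
    total = 0.0
    for low, high in present:
        # T_mean is approximated as (T_max + T_min) / 2, capped from above.
        t_mean = min((low + high) / 2, upper_temperature_cap)
        total += max(t_mean - base_temperature, 0)
    # Missing days were skipped, which is the same as contributing 0 GDD.
    return total


tmin = [10, 12, None, 20]
tmax = [20, 22, 25, 40]
print(gdd_sum(tmin, tmax, 10, min_temporal_coverage=0.5, upper_temperature_cap=25))
# prints 27.0  (5 + 7 + 0 for the missing day + 15 for the capped day)
print(gdd_sum(tmin, tmax, 10))
# prints None  (only 3 of 4 days have data, below the default coverage of 1.0)
```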