Picterra, the leading provider of geospatial machine learning software, today announced powerful new data curation and exploration technology that helps users better understand their datasets and improve model accuracy. This industry-first innovation gives organizations and AI teams automated insights into their datasets so they can build more robust models at lower annotation cost.
This latest technology release builds upon Picterra’s recent market and platform momentum, during which the company announced the closing of a $6.5M investment and introduced powerful collaboration functionality. The company now serves more than 100 enterprises globally, helping leaders from General Motors to The World Bank to innovate operations, improve internal processes, and realize the strategic importance of Earth Observation (EO) data.
Visualizing data is the first step in any machine learning (ML) workflow and can often be challenging to perform when working with large and complex aerial imagery on a global scale.
The Data Exploration Report is an industry-first innovation that helps users reveal visual patterns in their data and provides key insights for building better, more robust detectors.
“Dataset exploration is a game changer for Picterra users. It’s the first in a series of advanced data curation tools that will enable users to effortlessly take the performance of their detectors to the next level.” – Julien Rebetez, Chief Technology Officer at Picterra
Accessible alongside the training report, the Data Exploration Report allows a quick assessment of the training coverage and identifies the areas where the user should concentrate in future iterations.
- Improve dataset quality: ensure the data covers the variety of appearances of an object that will be seen in production (e.g., “building on grass”, “building on snow”). Better datasets lead to better models.
- Ensure the validation set is representative: when the validation set covers the variety of the dataset, the validation score is more representative of how well the model will perform in production on new data.
- Data curation: distribute and focus annotation effort on the dataset’s most impactful images/regions.
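The second point can be illustrated with a small check. This is a minimal sketch rather than Picterra's implementation: it assumes every tile already carries a visual group label, and the function names, the `tolerance` parameter, and the sample labels are all hypothetical.

```python
from collections import Counter

def group_shares(labels):
    """Return the fraction of tiles that falls in each visual group."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

def underrepresented_groups(dataset_labels, validation_labels, tolerance=0.5):
    """List groups whose validation share is below tolerance * dataset share."""
    dataset = group_shares(dataset_labels)
    validation = group_shares(validation_labels) if validation_labels else {}
    return sorted(
        group
        for group, share in dataset.items()
        if validation.get(group, 0.0) < tolerance * share
    )

# Hypothetical labels, one per tile (assumed for illustration only).
dataset_labels = ["building on grass"] * 6 + ["building on snow"] * 4
validation_labels = ["building on grass"] * 3

flagged = underrepresented_groups(dataset_labels, validation_labels)
print(flagged)  # ['building on snow']
```

A group that is flagged here appears in the dataset but is barely present in the validation set, so the validation score says little about how the model behaves on it.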
The features are based on unsupervised learning and clustering techniques and allow a user to evaluate the distribution of their dataset. This is important because it allows users to spot “annotation gaps” in their datasets.
The report divides large imagery into small tiles and then groups the tiles by visual similarity (e.g., forest, water, urban). These groups are visualized in the interactive report, allowing users to understand which regions are covered by the current training dataset and make adjustments where necessary.
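The tile-and-group idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not the method used in the report: the feature (mean brightness), the plain k-means routine, the synthetic image, and all names are hypothetical stand-ins for whatever representation and clustering the product actually uses.

```python
import random

def split_into_tiles(image, tile_size):
    """Cut a 2D grid of pixel values into non-overlapping square tiles."""
    tiles = {}
    for row in range(0, len(image) - tile_size + 1, tile_size):
        for col in range(0, len(image[0]) - tile_size + 1, tile_size):
            tiles[(row, col)] = [
                image[r][col:col + tile_size] for r in range(row, row + tile_size)
            ]
    return tiles

def tile_feature(tile):
    """Toy feature: mean brightness (assumption; real features would be richer)."""
    pixels = [p for line in tile for p in line]
    return sum(pixels) / len(pixels)

def kmeans_1d(values, k, iterations=20, seed=0):
    """Plain k-means on scalar features; returns one cluster index per value."""
    rng = random.Random(seed)
    centers = rng.sample(sorted(set(values)), k)
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

# Synthetic 4x8 image: dark left half ("water"), bright right half ("urban").
image = [[10] * 4 + [200] * 4 for _ in range(4)]
tiles = split_into_tiles(image, tile_size=2)
positions = list(tiles)
labels = kmeans_1d([tile_feature(tiles[p]) for p in positions], k=2)
group_of = dict(zip(positions, labels))

# Hypothetical training coverage: only two tiles on the dark side are annotated.
annotated = {(0, 0), (2, 0)}
covered_groups = {group_of[p] for p in annotated}
gaps = [p for p in positions if group_of[p] not in covered_groups]
print(gaps)  # tiles whose visual group has no training coverage yet
```

In this example every tile on the bright side ends up in `gaps`, which is the kind of uncovered region a user would then consider adding to the training dataset.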
Dataset exploration can also be used for “data curation”, where a team of annotators needs to be assigned to the images they will annotate. By selecting the regions to annotate with the Data Exploration Report, you distribute the annotation workforce as efficiently as possible, because the annotators work on regions that maximize the diversity of appearance covered by the dataset. This leads to more robust detectors.
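One simple way to think about such an allocation is a greedy rule that always picks the next region from the visual group with the least annotation so far. The sketch below assumes each region already has a group label; the function name, region ids, and labels are hypothetical, and the greedy rule is only one possible strategy, not a description of the product.

```python
from collections import defaultdict

def pick_regions(candidates, already_annotated, budget):
    """Greedily pick regions from the visual groups with the least annotation.

    candidates and already_annotated both map a region id to its group label.
    """
    counts = defaultdict(int)
    for group in already_annotated.values():
        counts[group] += 1

    pool = defaultdict(list)
    for region, group in candidates.items():
        if region not in already_annotated:
            pool[group].append(region)

    picked = []
    while len(picked) < budget and any(pool.values()):
        # Least-covered group that still has unannotated regions; ties by name.
        group = min((g for g in pool if pool[g]), key=lambda g: (counts[g], g))
        picked.append(pool[group].pop(0))
        counts[group] += 1
    return picked

# Hypothetical regions and group labels (assumed for illustration only).
candidates = {"r1": "forest", "r2": "forest", "r3": "water", "r4": "urban", "r5": "urban"}
already_annotated = {"r1": "forest"}

picked = pick_regions(candidates, already_annotated, budget=3)
print(picked)  # ['r4', 'r3', 'r2']
```

With a budget of three regions, the two groups that have no annotation yet are served first, and only then does the already-covered group receive another region.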
The following client example, using satellite imagery from Morocco, shows how the Data Exploration Report can be used to solve real-world problems. The goal of the detector, in this case, was to identify man-made holes used for reforestation, a natural way to preserve and strengthen biodiversity and to combat climate change.
Following the initial detector training, the Data Exploration Report identified missing training coverage: regions where the detector had not been shown what the holes do not look like. Adding empty training areas within the identified region therefore reduces the risk of a higher rate of false positive detections when the detector is run at scale. A similar process can also improve the coverage of accuracy areas.
Founded in 2016 in Switzerland, Picterra helps clients worldwide solve some of the toughest geospatial problems to future-proof and scale their businesses and support a transition to a decarbonized economy. With access to more Earth Observation (EO) images than ever before, companies across industries are realizing the strategic importance of this data, even ones that traditionally never saw a use case for satellite, drone, and aerial imagery. Picterra is the connecting power between the raw data from satellite, drone, and aerial imagery providers, and the domain experts who deliver geospatial services and advisory to their clients.