Where the Data Comes From

This entry is part 7 of 12 in the series July 2018

Members of Esri’s Living Atlas unit detail the company’s wide variety of data sources.

For nearly 50 years, Esri has been developing and supporting GIS software. What is less well known is that for more than 20 years it has also been collecting and curating data and making it available to its users. “It has become harder and harder to acquire better, more consistent, and trusted data and to figure out which data is the best,” says Sean Breyer, program manager for Esri’s Living Atlas. This large team has the sole mission of expanding the content in Esri’s platform with regard to demographics, natural resources, weather, landscape areas, oceans, and more.

“Opening ArcGIS and starting with a blank slate is not the way GIS works anymore. Providing rich content at the beginning of your project will make things go much quicker. In the five years since The Living Atlas started, we have seen a massive uptick in the use of this ready-to-use content, on the order of hundreds of millions of views a year.”

“Sometimes, we build the data or bring it to life ourselves for some federal sources that don’t have a mandate to make it ready for the GIS community,” says Breyer. “In other cases, we partner with large data providers. One of the biggest challenges there is finding good, authoritative data that is available and licensable in such a way that we can distribute it to our entire GIS community.”

Esri develops some datasets entirely in house, such as Updated Demographics, Consumer Spending, Tapestry Segmentation, and Diversity Index. It builds on third-party data to develop other datasets, such as Market Potential, Business Summary, and Daytime Population. It also hosts datasets produced by third parties, such as Crime, Shopping Centers, Traffic, and Business Locations, as well as global data from Michael Bauer Research, AIS Group, Environics Analytics, Esri France, Esri Japan, MapData Services, Nexiga, and OPENmate.

2010 U.S. Census data showing vacant housing by owners and renters.

2010 U.S. Census data showing occupied housing by owners and renters.

U.S. Census data from the American Community Survey showing insured and uninsured neighborhoods.

Demographic Data

Esri’s suite of U.S. demographic products fills the gaps between federal censuses, which are conducted every ten years, with results released a year or two later. The suite is produced by a team of demographers and statisticians who use a variety of data sources to estimate figures for the current year and project figures for future years. These products help users find out, for example, where people are, what incomes they make, and how that has changed in the past year.

“Think back to the housing crisis of 2007-2008,” says Jim Herries, a project engineer and cartographer at Esri. “In 2006 everything looked great, right? So, it’s important to have that timely information available, not only as raw data, but also in instantly consumable forms.”

Most of this data feeds layers that might be used for political analysis of districts and their voting behavior, or for business analytics that study market potential in certain areas, says Breyer. Business analysts often need demographic information for small areas, such as the area around a local Walmart.

The Living Atlas, Herries says, “provides access to those datasets in a variety of useful formats—maps, charts, reports—as well as access to the raw data that you can plug into a process that you are trying to define.”

Accurate world population figures are very hard to obtain. Outside of the United States, Esri relies on the expertise of its large international distributor community to identify and work with local data providers to provide global demographics for more than 130 countries every year.

“Instead of locating just the polygonal area where a census might be done,” Breyer explains, “we use imagery, road intersections, and other data to move population to where people are, instead of laying them evenly out over the surface.”
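The idea Breyer describes resembles what cartographers call dasymetric mapping: a polygon's population is reallocated to sub-areas in proportion to evidence of settlement. The sketch below is only an illustration of that general idea, not Esri's actual method. The function name, the grid cells, and the weights (imagined here as a settlement score derived from imagery or road density) are all hypothetical.

```python
def redistribute_population(total_population, cell_weights):
    """Split a polygon's population across its cells in proportion to weights.

    cell_weights is a hypothetical per-cell settlement score (for example,
    derived from imagery or road-intersection density); higher means more
    likely to be inhabited.
    """
    if not cell_weights:
        return []
    weight_sum = sum(cell_weights)
    if weight_sum == 0:
        # No settlement evidence at all: fall back to an even spread.
        return [total_population / len(cell_weights)] * len(cell_weights)
    return [total_population * w / weight_sum for w in cell_weights]


# Hypothetical census polygon with 1,000 residents covering four grid cells.
# Two cells (weight 0.0) are, say, open water or forest.
weights = [0.0, 1.0, 3.0, 0.0]
allocated = redistribute_population(1000, weights)
print(allocated)  # [0.0, 250.0, 750.0, 0.0]
```

An even spread would have placed 250 people in each cell, including the uninhabited ones; the weighted version keeps the polygon total intact while concentrating people where the evidence says they live.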

Weather Data

Esri obtains real-time data on weather conditions, weather projections, floods, and stream flows from public agencies. In 2017, due in large part to the several hurricanes that hit parts of the United States, Esri’s stream gauge service—which aggregates data from NOAA, USGS, USACE, and other agencies, as well as private entities—received more than 24 million hits.

“Those are all real-time sensors that are part of our platform now,” says Breyer. The maps are updated, in some cases, as often as every five minutes.

Another dataset, called HYCOM, projects the direction and flow of currents in three dimensions across the ocean up to seven days into the future and is updated daily.

Jim Herries is a project engineer and cartographer with The Living Atlas.

Other very large datasets include one on precipitation, which shows how much water has fallen from the sky around the world, and one that shows how much water has evaporated.

“These are models and estimated numbers,” says Herries. “They enable you to produce information products that show such things as the areas that are currently in drought and those that are likely to be in drought given the current trend lines.”

A lot of accurate real-time data is provided by federal and state agencies and some commercial groups, but not in a GIS-ready format, Breyer points out.

“The data for many of these datasets has been available for years, but no one has put it in a useful format. Our Living Atlas content team is taking responsibility for making some of that system data come to life so that every user does not have to do the same thing.”

Crowd-sourced Data

Esri also uses crowd-sourced data, such as traffic data from Waze. To achieve real-time situational awareness during natural disasters and large-scale emergencies, first responders use Esri tools that often draw on crowd-sourced data to gain more accurate insight into the situation and to align decision-making with the community's response.

A screenshot from Esri’s Drought Tracker.

Collision warnings in New York City, data provided by Mobileye.

Mobile phone movement during Hurricane Harvey, data provided by Teralytics.

Another example is a process through which communities can improve Esri’s base maps and routing by contributing local information and authoritative data.

“For example,” says Breyer, “the City of Redlands’ City Manager probably has more accurate and more current information than we could get from a commercial provider and can provide that data to our system and then update it as needed so that we get better roads than we would with just a single commercial source.”

During Hurricane Harvey in Houston, working with Teralytics, Esri was able to map the locations of cell phones at the start of the hurricane, five days in, ten days in, and two weeks later, to help understand the flow of people.

“It was an amazing set of maps,” says Herries. “The flooded areas had some people who left, obviously, yet you also had plenty of areas where people stayed throughout the entire event, even though they were in or near a flood plain. How many people left this part of Houston and where did they disperse? Did they go to higher ground? Did they all go south? Or north?”

Esri has also built a GIS layer from data collected by an iPhone app that records 10-second sound snippets, geotags them, and then asks users to characterize them, for example by whether they were recorded in the city or in the country.

Formats

Many demographic attributes are available as prepared Web maps and map image layers in The Living Atlas. Some datasets are also available as standalone datasets in CSV, XLS, DBF, or file geodatabase (FGDB) formats. Both Esri’s desktop software and its online software include tools for exporting and importing data of all types, says Herries.

“When you are dealing with NOAA, you are going to have highly structured scientific data. When you are dealing with a business analyst working at a Walmart or Target or something like that, they are probably dealing with Excel or, perhaps, with an Oracle or SAP implementation. So a big part of our organization spends time thinking about the interchange of data from one format to another.”

For example, in ArcGIS Online, if you drag and drop an Excel spreadsheet onto a Web browser, it will ask you what the columns are and what you want to do with the data.

Esri has integrated Safe Software’s FME data interchange tool into its Pro package. “It imports and exports almost 200 different data formats, for both tabular and spatial information,” says Breyer. “Our own team uses it very commonly.”

In other cases, such as with its live weather feeds, Esri has built an automated live feed methodology using its own software that allows it to quickly import large volumes of data.

“Sometimes it is not just about importing the data. It is about generalizing it, summarizing it, and putting it into a better format, and to do that we need a lot of these other tabular editing environments that allow you to summarize.”

Esri demographics showing a change in population.

Esri demographics showing areas of predominant low incomes.

Data Sets

Data compiled by Esri and used for analysis in fields from retail optimization to government statistics include:

  • 2017 Daytime Population;
  • 2017/2022 Updated Demographics: derived from a variety of public and private sources;
  • Diversity Index: analysis and mapping of seven race groups that can be either of Hispanic or non-Hispanic origin, for a total of 14 separate race/ethnic groupings;
  • Tapestry Segmentation: a market segmentation system designed to identify consumer markets;
  • Business Locations and Business Summary: extracted from a comprehensive list of businesses licensed from Infogroup;
  • Consumer Spending: data from the latest Consumer Expenditure Surveys from the Bureau of Labor Statistics, combined with Tapestry Segmentation data, providing a comprehensive database of consumer expenditures; and
  • Market Potential: data measuring the likely demand for a product or service in an area, based on survey data from GfK MRI’s Survey of the American Consumer.
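To make the Diversity Index entry above more concrete, here is a minimal sketch of one common way such an index can be computed: the probability, scaled to 0-100, that two people picked at random from an area belong to different groups. This is an assumption for illustration only. The article does not describe Esri's methodology, which may treat race and Hispanic origin differently, and the function name and sample counts are hypothetical.

```python
def diversity_index(group_counts):
    """Return a 0-100 score: the chance that two randomly chosen residents
    belong to different groups (a Simpson-style formulation, assumed here
    for illustration; not taken from Esri's documentation).
    """
    total = sum(group_counts)
    if total == 0:
        return 0.0
    # Probability that both randomly chosen residents fall in the same group.
    same_group = sum((count / total) ** 2 for count in group_counts)
    return 100.0 * (1.0 - same_group)


# Hypothetical areas: one split evenly between two groups, one homogeneous.
print(diversity_index([50, 50]))   # 50.0
print(diversity_index([100, 0]))   # 0.0
```

With the 14 race/ethnic groupings mentioned above, the input would simply be a list of 14 counts per area; the more evenly residents are spread across groups, the closer the score gets to its maximum.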

Esri demographics are available for more than 130 countries and are accessible:

  • as prepared maps and map layers in Esri’s Living Atlas of the World;
  • via the ArcGIS Online GeoEnrichment Service, where developers can build custom applications with Esri software and content; and
  • in other products that access these maps and services, such as Business Analyst and Community Analyst Web apps, Business Analyst Desktop and Server, ArcGIS Pro and ArcMap, ArcGIS Maps for PowerBI, ArcGIS Maps for Office, and Esri Maps.

Some datasets are also available for purchase as standalone datasets and can be licensed for use in non-Esri analysis software.

“Esri has partnerships with other companies, such as Microsoft and IBM, and, where applicable, these tools allow users of other solutions to take advantage of Esri’s data,” says Breyer. “For instance, Microsoft Power BI lets its users have access to a whole host of Esri data that encompasses everything from map layers to statistics that are useful to businesses, such as demographics and market information.”

In other cases, the data are licensed to be used only with Esri client technology.

Challenges and Opportunities

In addition to finding trusted data sources and resolving data-licensing issues, the biggest challenge in sourcing and distributing these data is “ensuring that there is a clear understanding of the data and its fit for use,” Breyer says. “Clear definitions of the attributes of the data and methodologies used in the collection process can greatly impact results.”

Yet another challenge is to move users from the old method to the new Web GIS pattern. The old method consisted of installing a client application, downloading the data, querying it or writing an app against it, and then figuring out the best format for presenting the results, such as maps, charts, or tables. The new pattern consists of providing ready-to-use information products, apps, and maps.

“For example,” says Breyer, “it is still possible for someone to download Census data or call Infogroup to purchase and download data. However, today we also have online solutions like Business Analyst Online, which allows someone to begin their analysis immediately, because Census data and Infogroup data is already baked into the product and user experience. Customers can spend their time doing analysis, not downloading and preparing data for analysis.”

“As a geographer for about 20 years,” says Herries, “I have never seen opportunities like this to combine layers that are ready to use from a variety of fields and sources to solve problems. Compared to the old days—when I used to have to walk across campus to go find the right book, find the data I needed, and hand-key it in—I think it is a really good time to be in geography and spatial analysis.”

 

 
