Retrieval Scheme Sample: Eagle Mountain, UT

This is the second article about demographic retrieval schemes.  The first article, which described the methodologies behind the most popular retrieval schemes in use today, can be found here.  This article applies those methodologies to a real-world location in Eagle Mountain, UT.

The Study Area

We will analyze 1-mile, 1.5-mile, and 3-mile radii around the intersection of Cedar Fort Rd & Ranches Pkwy in Eagle Mountain, UT (click here to see it on a map).  According to this website, Eagle Mountain was incorporated in December 1996 and has grown from 250 residents to more than 20,000 today.  In addition, the population is expected to grow at 10% over the next several years.  Given its explosive growth, this is an excellent area in which to compare and contrast the various retrieval methodologies.  The aerials and the locations of the housing subdivisions indicate that the majority of Eagle Mountain’s population lives within the 3-mile trade area.

Summary of Findings

The following table outlines the total population that would be shown on a demographic report for the selected intersection.  The only difference between these numbers is the demographic retrieval scheme that was selected; the base block group population counts are exactly the same.

Table 1: Summary Results

Retrieval Scheme        1.0 Mile   1.5 Miles   3.0 Miles
Proportional Area       682        1,459       6,425
Block Group Centroid    0          1,817       9,729
Block Centroid          1,353      2,417       13,717
Postal/Building         2,733      8,132       19,294
Street                  1,485      3,064       10,262
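To put a number on how far apart the schemes land, the short Python sketch below reads the totals from Table 1 and reports the lowest and highest estimate at each radius. It only re-reads the published table; the dictionary layout and the function name `spread_by_radius` are illustrative choices, not part of any reporting tool.

```python
# Totals from Table 1: population by retrieval scheme and ring radius (miles).
TABLE_1 = {
    "Proportional Area":    {1.0: 682,  1.5: 1459, 3.0: 6425},
    "Block Group Centroid": {1.0: 0,    1.5: 1817, 3.0: 9729},
    "Block Centroid":       {1.0: 1353, 1.5: 2417, 3.0: 13717},
    "Postal/Building":      {1.0: 2733, 1.5: 8132, 3.0: 19294},
    "Street":               {1.0: 1485, 1.5: 3064, 3.0: 10262},
}


def spread_by_radius(table):
    """Return {radius: (lowest total, highest total, highest - lowest)}."""
    radii = sorted({r for totals in table.values() for r in totals})
    result = {}
    for r in radii:
        values = [totals[r] for totals in table.values()]
        low, high = min(values), max(values)
        result[r] = (low, high, high - low)
    return result


for radius, (low, high, gap) in spread_by_radius(TABLE_1).items():
    print(f"{radius} mi: {low:,} to {high:,} (gap {gap:,})")
```

At 3 miles the highest estimate is roughly three times the lowest, even though every scheme starts from the same block group counts.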
 

Finding Detail

Proportional Area:

Why are the population numbers generated using proportional area so much lower than the 20,000 people known to live in the area?  The discrepancy results from the greatest flaw in the proportional area methodology: the assumption that population is uniformly distributed.  As discussed in the previous blog entry, proportional area retrieval assumes that population is spread evenly throughout each block group.  Therefore, if 10% of a block group’s area is within your trade area, you get 10% of its population.  Figure 1 shows the location of the 3 trade areas we are analyzing along with the block groups that overlap them.

Figure 1: Proportional Area Detail


As you can see, several block groups have only a very slight area overlap with the trade areas.  For example, look at block group 490490101024.  Its area overlap with the trade areas is very small, so this methodology allocates only a very small percentage of its population to them.  Yet this block group has the greatest population of any block group in the area, and its population is concentrated in communities located within the 3-mile trade area.
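A minimal Python sketch of why a sliver overlap undercounts. All figures are hypothetical, chosen only to mirror the situation described for block group 490490101024: most of the block group’s residents live in subdivisions that sit inside the trade area, but those subdivisions cover a small fraction of its land area.

```python
def proportional_area_estimate(bg_population, bg_area, overlap_area):
    """Allocate population by the share of block group area inside the trade area."""
    return bg_population * overlap_area / bg_area


# Hypothetical block group: 6,000 people spread over 40 sq mi, but 5,400 of them
# live in subdivisions covering 3 sq mi, and only those 3 sq mi overlap the ring.
bg_population = 6000
bg_area = 40.0
overlap_area = 3.0
actual_inside = 5400

estimate = proportional_area_estimate(bg_population, bg_area, overlap_area)
print(round(estimate), "allocated vs", actual_inside, "actually inside")
```

Proportional area allocates 450 people here, while the hypothetical subdivisions hold 5,400.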

Block Group Centroid:

The block group centroid methodology includes or excludes a block group based on the location of the block group’s centroid in relation to the trade area.  The fact that this methodology returned more accurate numbers at 3 miles than proportional area retrieval is more dumb luck than anything else (it returned 0 people within 1 mile, which is completely wrong).  With this methodology, a small increase in the size of the trade area often produces a very large change in population.  This is a result of the all-or-nothing inclusion of each block group.  We recommend that this methodology NOT be used for small-scale trade area analysis.
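The all-or-nothing behavior is easy to see in a small sketch. The Python below uses hypothetical block group centroids given as planar (x, y) offsets in miles from the site and treats the ring as a simple distance test; a production system would run the test in a GIS against real centroid coordinates.

```python
import math

# Hypothetical block group centroids: ((x, y) offset in miles from the site, population).
CENTROIDS = [
    ((1.2, 0.3), 1800),
    ((-2.1, 1.7), 7900),
    ((2.9, -2.4), 4000),
]


def centroid_population(radius_miles, centroids=CENTROIDS):
    """Sum the population of every block group whose centroid lies within the ring."""
    total = 0
    for (x, y), population in centroids:
        if math.hypot(x, y) <= radius_miles:
            total += population  # wholly included
    return total  # block groups with centroids outside the ring are wholly excluded


for r in (1.0, 1.25, 2.5, 3.0, 4.0):
    print(r, centroid_population(r))
```

With these made-up centroids the total stays at 0 until the radius passes about 1.24 miles, jumps by 1,800, and then jumps again by 7,900 once the radius passes about 2.7 miles.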

Block Centroid:

If this is the most common form of demographic retrieval, why are the numbers so low?  The answer is that Census 2000 block weights were used for this analysis (which is the most common form of block centroid retrieval).  This assumes that the population distribution as it existed in April 2000 is the same today.  This area has changed dramatically since April 2000 and has seen substantial housing development within 3 miles of the site.  If you are going to use this retrieval methodology, make sure you use a block centroid file with updated weights.  A few demographic data providers offer up-to-date block centroid files.
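The effect of stale weights can be shown with a few lines of Python. The sketch assumes, hypothetically, that the block group’s total population has been updated to a current estimate while the block weights still reflect April 2000, when the blocks inside the trade area were mostly empty. All numbers are invented for illustration.

```python
def block_centroid_estimate(bg_population, weights_inside):
    """Allocate block group population by the summed weights of the blocks inside the ring."""
    return bg_population * sum(weights_inside)


# Hypothetical current population estimate for one block group.
bg_population_today = 6000

# Hypothetical weights for the same three blocks, all inside the trade area.
weights_2000 = [0.03, 0.04, 0.03]     # their share of the block group in April 2000
weights_updated = [0.20, 0.25, 0.15]  # their share after new subdivisions were built

print("2000 weights:   ", round(block_centroid_estimate(bg_population_today, weights_2000)))
print("updated weights:", round(block_centroid_estimate(bg_population_today, weights_updated)))
```

The same three blocks yield 600 people with the 2000 weights and 3,600 with the updated weights.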

Postal/Building:

The postal methodology performed very well here because it works with updated USPS zip+4 data.  As the new housing communities were built and started receiving mail, they became part of the base data used for retrieval.  Figure 2 shows the location of the USPS zip+4s in the area.

Figure 2: Postal Methodology Detail


Each of the red dots in Figure 2 shows the geographic location of a USPS zip+4 centroid.  As you can see, the zip+4s are concentrated around the planned communities.  This sample also highlights one of the weaknesses of the postal methodology: the houses need to be receiving mail directly.  There is a lack of zip+4s in areas of large farms and other very sparsely populated areas that are more than likely on rural delivery routes.  For most types of retail trade area analysis, these areas are not overly important and constitute a small percentage of the overall population.  However, if you are working in extremely rural areas that do not receive mail, then you should use an alternative retrieval methodology.

Street:

In theory, the street retrieval methodology has merit.  So what happened to it in this example?  The discrepancy comes from the rural areas of the block groups: they contain a large total length of streets but are very sparsely populated.  Figure 3 shows the streets as light grey lines.

Figure 3: Street Methodology Detail


The aerials show that several of the streets in this area connect large farms.  Thus you could have a 5-mile-long road with just a few households on it.  In contrast, the planned communities in this area are relatively dense, which means a very high population is located along a relatively short total length of street.  This methodology works in areas with similar-sized housing lots or relatively uniform population density.  However, as this sample shows, it should be avoided in non-homogeneous areas.
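A small hypothetical example makes the mismatch concrete. The Python sketch below describes one block group as two sets of street segments, a dense subdivision inside the trade area and long farm roads outside it; the mileages and household counts are invented for illustration.

```python
# Hypothetical block group: (miles of street, households on it, inside trade area?)
segments = [
    (3.0, 900, True),    # dense subdivision: short streets, many homes
    (12.0, 60, False),   # farm roads: long streets, few homes
]

total_miles = sum(miles for miles, _, _ in segments)
total_households = sum(hh for _, hh, _ in segments)
miles_inside = sum(miles for miles, _, inside in segments if inside)
households_inside = sum(hh for _, hh, inside in segments if inside)

street_share = miles_inside / total_miles             # what the street method allocates
actual_share = households_inside / total_households   # where the households really are

print(f"street method: {street_share:.0%}, actual: {actual_share:.0%}")
```

The street method allocates 20% of this block group to the trade area, while about 94% of its hypothetical households actually live there.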

Overview of Demographic Retrieval Schemes

This post explains the different demographic retrieval schemes that are in use today.  Illustrations along with pros and cons of each retrieval scheme are provided.  This is an extremely important concept in market analysis and can lead to very different numbers on your demographic reports.  In fact, the choice of retrieval scheme can often lead to larger differences in the demographic profile than if you switched demographic providers altogether!

What is a demographic retrieval scheme?

A demographic retrieval scheme refers to the methodology used to allocate block group data (the smallest unit of geography for which the US Census releases detailed data) to a trade area that encompasses a portion of the block group.  Figure 1 illustrates the need for allocation.

Figure 1: The need for retrieval schemes


In Figure 1, the black/white circle represents a trade area in which we want to calculate demographics.  The blue dashed polygons are block groups, and their corresponding population numbers are shown in blue.  If the trade area followed block group boundaries, the allocation would be simple: each block group would be either 100% inside or 100% outside the trade area.  However, the trade area above splits four block groups.  How do we determine how much of each block group’s population is within our trade area?  The answer depends on the retrieval scheme your system is using.

Proportional Area

The proportional area methodology assumes an even distribution of population within the block group.  The proportion of a block group’s population allocated to a trade area is based on the proportion of the block group’s land area that overlaps the trade area.  For example, let’s look at Figure 2, which shows the total area of each block group along with the amount of overlap between each block group and the trade area.

Figure 2: Proportional Area Overview


Given the data in Figure 2, the demographics would be calculated as shown in the following table.

Table 1: Proportional Area Example

Block Group (BG) ID   Population in BG   % Overlap between Polygon and BG   Population in Polygon
A                     100                22% (.22/1)                        22 (22% * 100)
B                     150                15.7% (.22/1.4)                    24 (15.7% * 150)
C                     50                 62% (.62/1)                        31 (62% * 50)
D                     500                25% (.35/1.4)                      125 (25% * 500)

In the above table, column 4 is calculated by multiplying the population in the block group (column 2) by the % overlap between the polygon and the block group (column 3).
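The calculation can be sketched in Python. The sketch works the arithmetic from the block group populations and the area values listed in the table (overlap area divided by block group area); in a real system the overlap areas would come from a polygon-polygon overlay in a GIS, which this sketch does not perform.

```python
# Inputs from the table: (population in BG, BG area, area of BG overlapping the polygon).
BLOCK_GROUPS = {
    "A": (100, 1.0, 0.22),
    "B": (150, 1.4, 0.22),
    "C": (50, 1.0, 0.62),
    "D": (500, 1.4, 0.35),
}


def proportional_area(block_groups):
    """Population in polygon = BG population * (overlap area / BG area)."""
    return {
        bg: population * overlap / area
        for bg, (population, area, overlap) in block_groups.items()
    }


allocations = proportional_area(BLOCK_GROUPS)
for bg, value in allocations.items():
    print(bg, round(value))
print("Trade area total:", round(sum(allocations.values())))
```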

Pros: Very easy to calculate from within a GIS or spatially enabled database.
Cons: The assumption of uniform distribution of population is greatly flawed, especially in high growth areas where block groups tend to be very geographically large.

This method can become slow since it requires a polygon-polygon overlay.

Block Group Centroid

The block group centroid methodology includes or excludes a block group based on the location of the block group’s centroid in relation to the polygon.  If the block group’s centroid is within the trade area, its population is wholly included.  If the block group’s centroid is outside the trade area, then it is completely excluded.

Figure 3: Block Group Centroid Sample


In the graphic above, the black/white ring represents the trade area, the blue squares are the centroids of the block groups, red indicates block groups whose population would be included based on the location of their centroids, and yellow indicates block groups that would be excluded.

Pros: Easy to calculate from within a GIS or spatially enabled database since it is a simple point in polygon operation on a relatively small number of points (there are approximately 211,000 block groups in the 2000 US Census).
Cons: Accuracy is a major issue.  As Figure 3 illustrates, numerous block groups can be excluded even when a large percentage of their land area is contained in the trade area.
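Block group centroid retrieval reduces to a point-in-polygon test. The Python sketch below uses the standard ray-casting test on a trade area given as a list of planar (x, y) vertices; the square trade area, centroid coordinates, and populations are hypothetical and are not the values behind Figure 3.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if point (x, y) lies inside polygon, a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing flips inside/outside
    return inside


def block_group_centroid_total(trade_area, block_groups):
    """Wholly include a block group's population when its centroid is inside the trade area."""
    return sum(
        population
        for (cx, cy), population in block_groups.values()
        if point_in_polygon(cx, cy, trade_area)
    )


# Hypothetical planar coordinates (miles): a square trade area and four block groups.
TRADE_AREA = [(0, 0), (4, 0), (4, 4), (0, 4)]
BLOCK_GROUPS = {
    "A": ((1, 1), 100),
    "B": ((5, 2), 150),   # centroid outside: excluded even if much of its area overlaps
    "C": ((3, 3), 50),
    "D": ((-1, 3), 500),  # centroid outside: excluded
}

print(block_group_centroid_total(TRADE_AREA, BLOCK_GROUPS))
```

Only A and C are counted (150 people); B and D contribute nothing, no matter how much of their area falls inside the square.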

Block Centroid

Block centroid is the most widely used retrieval scheme today and is based on the census block.  A census block is the smallest geographic area for which the Bureau of the Census collects and tabulates the decennial US Census.  There were approximately 8 million census blocks in the 2000 US Census.  However, only a very limited amount of data is released at this geographic level (population, households, group quarters, housing units).

The methodology for this retrieval scheme works as follows:

  1. Assume we have a base weight table which contains all the census blocks, the lat/long of their centroids, and the percent of the block group total for the data fields (e.g. population, households).
  2. Determine all census blocks that are contained in the polygon.  This is done via a point in polygon operation against the base weight table.
  3. Once all the census blocks that are contained in the polygon are identified, a query is performed which groups the blocks based on their block group id and sums the weights.  The result is a final weight table which includes the block group id and a total weight for each variable (e.g. population, housing units).
  4. The final weight table is used to proportion the block group data to the polygon.
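The four steps above can be sketched in Python. This is a minimal illustration under stated assumptions: the base weight table, block group populations, and coordinates are hypothetical; only population is allocated; and the trade area is a ring tested with a planar distance check rather than a general point-in-polygon operation on lat/long coordinates.

```python
import math
from collections import defaultdict

# Step 1: hypothetical base weight table. Each row is
# (block id, block group id, centroid x, centroid y, share of the block group's population).
BASE_WEIGHTS = [
    ("A-1", "A", 0.5, 0.5, 0.40),
    ("A-2", "A", 3.0, 0.0, 0.60),
    ("B-1", "B", 0.0, 1.5, 0.25),
    ("B-2", "B", 0.0, -0.8, 0.25),
    ("B-3", "B", 4.0, 4.0, 0.50),
]

# Hypothetical block group data (population only, for brevity).
BG_POPULATION = {"A": 100, "B": 150}


def in_ring(x, y, center, radius):
    """Containment test specialized to a circular ring in planar coordinates."""
    return math.hypot(x - center[0], y - center[1]) <= radius


def block_centroid_retrieval(base_weights, bg_data, center, radius):
    # Step 2: keep the blocks whose centroid falls inside the trade area.
    inside = [row for row in base_weights if in_ring(row[2], row[3], center, radius)]

    # Step 3: group the selected blocks by block group id and sum their weights.
    final_weights = defaultdict(float)
    for _block_id, bg_id, _x, _y, weight in inside:
        final_weights[bg_id] += weight

    # Step 4: apply the final weights to the block group data.
    return {bg_id: bg_data[bg_id] * weight for bg_id, weight in final_weights.items()}


print(block_centroid_retrieval(BASE_WEIGHTS, BG_POPULATION, center=(0.0, 0.0), radius=2.0))
```

Here block A-1 and blocks B-1 and B-2 fall inside the 2-mile ring, so 40% of block group A and 50% of block group B are allocated to the trade area.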

Figure 4: Block Centroid Sample

Figure 4 illustrates the block groups as blue polygons, the block centroids as blue squares, and the street network as light black lines.  Given Figure 4, the demographics would be calculated as shown in Table 2.  For the sake of simplicity, let’s assume each block centroid in a block group has the same weight.  For example, block group A has four block centroids, so we assume each has a weight of 25%.  In practice, each block centroid is typically given a different weight based on either the 2000 US Census or an updated number supplied by the demographic provider.

Table 2: Block Centroid Sample

Block Group (BG) ID   Population in BG   % of BG Population in Trade Area   Population in Trade Area
A                     100                25% (1 of 4)                       25 (25% * 100)
B                     150                16.7% (1 of 6)                     25 (16.7% * 150)
C                     50                 50% (2 of 4)                       25 (50% * 50)
D                     500                33.3% (2 of 6)                     167 (33.3% * 500)


Pros: Since each block centroid can be assigned different weights, this methodology does not assume equal distribution of population.

Fast to calculate.

Some demographic providers supply updated weights as part of their demographic products.  This is very important, since without them you are assuming the weights from the 2000 Census are still correct.

Cons: Census blocks are only updated once every 10 years.

Block centroids are just artificial points that represent the geographic centroid of each block.  In many cases, a centroid can fall in the water or in the middle of a cemetery.  As Figure 4 illustrates, the block centroids are not located along the road segments.

In rural areas (or areas that were rural in 2000), the blocks can still be very large.

Since it is a point based system, a census block is either wholly included or excluded.

Postal/Building Based

The methodology for the postal/building based retrieval scheme is very similar to the block centroid scheme except it is using a combination of USPS zip+4 data and housing start data instead of census blocks.  As of the time of this writing there are approximately 29 million residential based zip+4s in the United States.

Figure 5: Postal/Building Based Sample


Figure 5 illustrates the block groups as red polygons, the postal zip+4s as blue squares, and the street network as light black lines.  By comparing Figure 5 to Figure 4, you can see that the number of zip+4s in this area is much larger than the number of block centroids and that the zip+4s follow the road network.  As we did for block centroids, for the sake of simplicity let’s assume each zip+4 in a block group has the same weight.  The following table outlines how population would be calculated for the trade area.

Table 3: Postal Based Sample

Block Group (BG) ID   Population in BG   % of BG Population in Trade Area   Population in Trade Area
A                     100                7.6% (11 of 145)                   8 (7.6% * 100)
B                     150                2.2% (5 of 228)                    3 (2.2% * 150)
C                     50                 30.8% (64 of 208)                  15 (30.8% * 50)
D                     500                51.9% (112 of 216)                 259 (51.9% * 500)
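Because the zip+4 weights are assumed equal, this table reduces to counting points. The Python sketch below recomputes it from the zip+4 counts shown in the table; with real data each zip+4 would carry its own weight, and the aggregation would follow the same steps as block centroid retrieval.

```python
# Zip+4 counts from the table: (population in BG, zip+4s inside trade area, zip+4s in BG).
POSTAL_COUNTS = {
    "A": (100, 11, 145),
    "B": (150, 5, 228),
    "C": (50, 64, 208),
    "D": (500, 112, 216),
}


def equal_weight_allocation(counts):
    """Every zip+4 in a block group carries the same weight: allocate by the share inside."""
    return {
        bg: population * inside / total
        for bg, (population, inside, total) in counts.items()
    }


for bg, value in equal_weight_allocation(POSTAL_COUNTS).items():
    print(bg, round(value))
```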


Pros: Since this method relies on postal zip+4 data, it ignores areas where people are not living.

The USPS releases the zip+4 data monthly.

As new communities are built and start receiving mail, this methodology will account for the new postal deliveries.

The location of the zip+4s follows the distribution of population.

Cons: Since it is a point based system, a zip+4 is either wholly included or excluded.

Given the large number of zip+4 points, this method is slower than block centroid retrieval.

Street Based

A street based retrieval scheme assumes people are evenly distributed along the street network.  The proportion of a block group’s population allocated to a ring/polygon is based on the proportion of the block group’s total street length that lies within the ring/polygon.  The following table outlines how this is accomplished.

Table 4: Street Based Example

Block Group (BG) ID   Total Miles of Streets in BG   Total Miles of Streets in Polygon from BG   % of Population to Allocate
A                     2                              .35                                         17.5%
B                     3                              .35                                         11.7%
C                     2                              .7                                          35%
D                     4                              1.1                                         27.5%

In the above table, column 2 represents the total miles of streets within the block group.  Column 3 represents the total miles of streets from the block group that are within the polygon.  The percent of population to allocate is then simply calculated by dividing column 3 by column 2.
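The percentages follow from a single division per block group, sketched here in Python using the street mileages from the table. Measuring the miles of street inside the polygon, the computationally expensive part, is assumed to have been done already in a GIS.

```python
# Street miles from the table: (total miles of streets in BG, miles in polygon from BG).
STREET_MILES = {
    "A": (2.0, 0.35),
    "B": (3.0, 0.35),
    "C": (2.0, 0.70),
    "D": (4.0, 1.10),
}


def street_share(street_miles):
    """Share of each block group's population to allocate = miles inside / total miles."""
    return {bg: inside / total for bg, (total, inside) in street_miles.items()}


for bg, share in street_share(STREET_MILES).items():
    print(f"{bg}: {share:.1%}")
```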

Pros: This method does not rely on an all or nothing inclusion of a centroid.

Since it’s street based, this method ignores areas without streets, where people cannot be living.

As new communities are built and the streets get added, this methodology will account for the new streets.

Cons: The assumption of even distribution of population along the street network is questionable.  For example, think about cul-de-sacs which have a very high population to street length ratio.

The calculation of total street segment length within the polygon is computationally intensive.

Next Steps

In my next post, I will provide a real-world sample showing the differences that these retrieval schemes can make in a demographic profile around a location.

