Part I : Introduction and Quick Start
Experience teaches the GIS/CAD professional that the most valuable GIS data is rarely sitting prepackaged behind a search engine, waiting to be downloaded and used for free and without forethought. To succeed in finding data and the relationships within it, you should be prepared to collect your own when faced with a personal or professional challenge. This is where the process of data mining comes into play.
The Web is ripe with information to be mined. A little ingenuity can go a long way when mining interesting GIS data from the Internet for fun and profit. It can be a very rewarding experience, and the world could always use more gratis / open source GIS datasets (consider re-releasing your collected data back into the wild, especially through the CCURISA).
In my own personal experience, I recently encountered valuable 2005 demographic data released by ESRI that was not being released in any database or shapefile format. The Community Tapestry Segmentation System profiles the top three economic or commercial segments of most U.S. zipcodes. It is being presented as a search engine, which can be manipulated by a simple GET query (a string sent directly to their web server which simulates the action of actually typing a zipcode in the appropriate field on the page and submitting it). I immediately recognized the potential for this data to be used in ArcGIS and TatukGIS with the ZCTA boundary maps provided by the 2000 Census. Therefore, I went ahead and created shapefiles after spidering this data. The subset of data accessible by this search engine is obviously in the public domain.
Anyone involved in GIS should at least have the programming expertise to be familiar with a scripting language, such as Perl or Python. Both of these interpreted languages have libraries which make it easy to access documents and interactive content on the Web. For an application such as a web spider (an application which automatically submits, reacts to, and collects data), it is helpful to be able to code it in an object-oriented language with classes. Python is natively object-oriented, but Perl 5 can be coerced into an object-oriented structure. Perl is my interpreted language of choice, so I chose it for this project. I will not be releasing the code to my spider, but I will share its output with you - a dBASE database file that can be opened in ArcMap natively. I have also gone through the trouble of generating a complete basemap of the United States that is segmented by zipcode and is integrated with the tapestry segment data.
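As an illustration of the GET-query idea, the following Python sketch builds the kind of URL a spider would request for each zipcode. It is not the spider used for this project (which was written in Perl). The base URL and the `zipcode` parameter name are hypothetical placeholders, since the real search page's address and field name are not given here.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter name, used only for illustration;
# the real search page's URL and field name are not stated in this document.
BASE_URL = "https://example.com/tapestry/search"
PARAM_NAME = "zipcode"


def build_query_url(zip_code: str) -> str:
    """Return the GET URL that would look up a single 5-digit zipcode."""
    if len(zip_code) != 5 or not (zip_code.isascii() and zip_code.isdigit()):
        raise ValueError(f"not a 5-digit zipcode: {zip_code!r}")
    return f"{BASE_URL}?{urlencode({PARAM_NAME: zip_code})}"


def build_query_urls(zip_codes):
    """Return one URL per valid zipcode, skipping malformed entries."""
    urls = []
    for code in zip_codes:
        try:
            urls.append(build_query_url(code))
        except ValueError:
            continue  # ignore malformed codes instead of aborting the run
    return urls
```

Zipcodes are handled as strings so that leading zeros are preserved. A real spider would then fetch each URL (for example with `urllib.request.urlopen`), pause between requests, and parse the three segment numbers out of the returned page.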
If you know what you are doing, you can immediately import the Tapestries_in_America shapefiles into ArcMap or TatukGIS viewer and start querying data based upon tapestry segments (which are listed in the link provided in the next paragraph). In other words, the experienced user should be able to import the data provided in this tutorial and start generating SQL queries based upon the different demographic profiles just by skimming the appropriate sections of the document.
The numbers corresponding to the various tapestry segments can be referenced via this document. ArcMap, however, provides more advanced selection features than the TatukGIS viewer, which allow for better analysis. The original tapestry segments database is provided for your convenience.
The following data is included in the Tapestries_in_America.7z archive [22MB] (The archive is in the 7-zip compression format. You may download the archiving program at http://www.7-zip.org).
Tapestries_in_America.* - ZCTA basemap and Tapestry Segment dataset
Technology Tapestries.* - Shapefile corresponding to technology tapestries selection
Technology Yuppies in America.mxd - ArcMap Project file that contains all layers / selections.
Youth Tapestries.* - Shapefile corresponding to youth tapestries selection.
ZIP_CODE_TAPESTRIES.DBF - Database containing spidered data from ESRI's own tapestry segments database.
I typed up the following just in case you were wondering about how I generated the data and about Technology Yuppies in America, and Exclusive Areas (that contain the best and brightest).
Part II : The Tapestry Database
The tapestry database that I started out with (and which I am including in this package) has the following format:
FILENAME: ZIP_CODE_TAPESTRIES.DBF
Name: ZIP_CODE Type: CHARACTER Length: 5
Description: The 5-Digit ZCTA corresponding to the three major tapestries.
Name: SEGMENT_1 Type: NUMERIC Length: 2
Description: Number corresponding to the first major tapestry of the area. Keep in mind that some areas only are attributed to one dominant tapestry, leaving zeros for SEGMENT_2, SEGMENT_3, or both.
Name: SEGMENT_2 Type: NUMERIC Length: 2
Description: Number corresponding to the second major tapestry of the area. If this field contains zero, it is an indication that SEGMENT_1 is the only dominant tapestry in the area (in this case SEGMENT_3 will also contain a zero).
Name: SEGMENT_3 Type: NUMERIC Length: 2
Description: Number corresponding to the third major tapestry of the area. If this field contains zero, it is an indication that the area contains either one or two dominant tapestries (SEGMENT_2 might contain a zero as well).
Name: CITY Type: CHARACTER Length: 28
Description: The city, or post office name corresponding to the area in question.
Name: STATE Type: CHARACTER Length: 2
Description: The state corresponding to the area in question.
Name: COUNTY Type: NUMERIC Length: 3
Description: The FIPS county code corresponding to the area in question.
Name: LATITUDE Type: NUMERIC Length: 10.6
Description: The latitude corresponding to the area in question. This is a valid Y coordinate, and may be utilized as part of the “Add XY Data” process in ArcMap. However, since we have a ZCTA basemap, it is not recommended to add this data in this manner (rather, we will make a join to the basemap based upon zipcode).
Name: LONGITUDE Type: NUMERIC Length: 11.6
Description: The longitude corresponding to the area in question. This is a valid X coordinate, and may be utilized as part of the “Add XY Data” process in ArcMap. However, since we have a ZCTA basemap, it is not recommended to add this data in this manner (rather, we will make a join to the basemap based upon zipcode).
This table is based on the U.S. Census 1999 Zip Codes table. The ZIP_CODE, COUNTY, LATITUDE, and LONGITUDE fields were copied from this table. The COUNTY, LATITUDE, and LONGITUDE fields were converted from CHARACTER types to NUMERIC types so that they could be utilized natively by ArcMap.
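To make the field conventions above concrete, here is a small Python sketch that works on one record of the table represented as a dictionary. It assumes the record has already been read from the DBF by some other means (no DBF reader is shown), and the function names are my own, not part of any GIS package.

```python
SEGMENT_FIELDS = ("SEGMENT_1", "SEGMENT_2", "SEGMENT_3")


def dominant_segments(record: dict) -> list:
    """Return the non-zero tapestry segment numbers of a record, in order.

    A zero in SEGMENT_2 or SEGMENT_3 means "no further dominant tapestry",
    so zeros are dropped rather than treated as a segment number.
    """
    return [int(record[f]) for f in SEGMENT_FIELDS if int(record[f]) != 0]


def to_numeric(record: dict) -> dict:
    """Copy a record, converting COUNTY, LATITUDE and LONGITUDE from text.

    This mirrors the CHARACTER-to-NUMERIC conversion described above.
    ZIP_CODE is left as text so that leading zeros are not lost.
    """
    converted = dict(record)
    converted["COUNTY"] = int(record["COUNTY"])
    converted["LATITUDE"] = float(record["LATITUDE"])
    converted["LONGITUDE"] = float(record["LONGITUDE"])
    return converted
```

For a record whose segment fields hold 8, 27 and 0, `dominant_segments` returns `[8, 27]`, which signals that the area has two dominant tapestries.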
Part III : Generating the Tapestries Shapefile (Already Included in Project) With ArcMap 9
Now that I had a table that could be related to zip code data, I searched for a basemap that was segmented by zip codes. It turns out that such a basemap is available through the 2000 Census. There are boundary files available for every state. Because there are 52 individual files to download, I used HTTrack to download the files in a spider-like fashion.
The next step was to append all of this state data together into one shapefile so that this data could be easily analyzed.
First, I started a new project in ArcMap and used the “Add Data” function to import all of the shapefiles. Then, I opened the ArcToolbox and selected the Data Management Tools → Feature Class → Create Feature Class function. In the Create Feature Class dialog, I selected a directory as my Output Location, entered the name “ZCTA_Map” as the Output Feature Class, left the Geometry Type as POLYGON, and added all of the state ZCTA shapefiles to the Template Feature Class queue. Clicking the OK button processed all of the data and created a blank shapefile which would soon contain the appended shapefile data.
Second, I selected the Data Management Tools → General → Append function in the ArcToolbox. I selected all of the state shapefiles as the Input Data Element, and “ZCTA_MAP.shp” as the Output Data Element. After some time, I had my appended data file. I proceeded to remove unnecessary columns in the attribute table so as to make the table more efficient.
The final step was to join the basemap and zip code data and export the resulting dataset to a new shapefile.
I right-clicked on the ZCTA_MAP shapefile in the DISPLAY view and selected Joins and Relates → Join. Then, I selected “Join Attributes from a table” under the What do you want to join from this layer? dropdown menu. Then, under #1 (Choose the field in this layer that the join will be based on), I selected “ZCTA”. Then, under #2 (Choose the table to join to this layer), I selected “ZIP_CODE_TAPESTRIES”. Under #3 (Choose the field in the table to base the join on), I selected “ZIP_CODE”. Finally, I clicked on the Advanced button, selected Keep only matching records, and clicked OK twice.
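Conceptually, a join with Keep only matching records is an inner join on ZCTA = ZIP_CODE. The Python sketch below shows that logic on plain dictionaries. It is an illustration of the idea, not how ArcMap implements the join, and the function name is hypothetical.

```python
def join_matching(basemap_rows, tapestry_rows):
    """Inner-join basemap rows to tapestry rows on ZCTA == ZIP_CODE."""
    # Index the tapestry table by zipcode so each lookup is a dictionary hit.
    by_zip = {row["ZIP_CODE"]: row for row in tapestry_rows}
    joined = []
    for feature in basemap_rows:
        match = by_zip.get(feature["ZCTA"])
        if match is None:
            continue  # "Keep only matching records": drop unmatched features
        # Merge the attributes of both rows into one output record.
        joined.append({**feature, **match})
    return joined
```

Both key fields are compared as text here, which matches the CHARACTER type of ZIP_CODE and keeps zipcodes with leading zeros from failing to match.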
I right-clicked on the ZCTA_MAP shapefile and selected Data → Export Data. Under Export, I kept the default selection of “All Features”. I selected the radio button next to Use the same coordinate system as this layer's source data. Under Output shapefile or feature class, I typed in “Tapestries_in_America.shp”. Finally, I clicked OK.
So, this is how I generated the Tapestries_in_America.shp file that is included in this project. From this, it is possible to perform different selection queries (Selection → Select by Attributes) based upon the different tapestry segments for each area.
Part IV : Technology Yuppies in America
To demonstrate the utility of this dataset, I decided to use it to map all of the technology yuppies in America by finding areas where both technology-related and youth-related tapestries are prevalent. I did this by generating layers based upon queries which selected the individual tapestries present in each area, then combined these layers into two groups: youth tapestries and technology tapestries. Finally, I performed a spatial select on them to find where youth tapestry areas border technology tapestry areas.
I decided that the following tapestries represented youth tapestries:
• 19 - Milk and Cookies
• 28 - Aspiring Young Families
• 39 - Young and Restless
• 48 - Great Expectations
• 55 - College Towns
• 63 - Dorms to Diplomas
• 64 - City Commons
And I decided that the following tapestries represented technology tapestries:
• 13 - In Style
• 23 - Trendsetters
• 27 - Metro Renters
• 8 - Laptops & Lattes
• 9 - Urban Chic
After cleaning up (only leaving Tapestries_in_America in the Display view), I decided to generate layers based upon different tapestry queries.
For example, I generated the “Laptops & Lattes” selection layer by first selecting by attributes (Menu option Selection → Select by Attributes). I selected “Tapestries_in_America” as the Layer, “Create a new selection” as the Method, and then generated a SQL query such as the following:
SELECT * FROM tapestries_in_america WHERE:
"SEGMENT_1" = 8 OR "SEGMENT_2" = 8 OR "SEGMENT_3" = 8
I then clicked OK, switched to the Selection view, right-clicked on the Tapestries_in_America (X features selected) feature, and selected Create Layer from Selected Features.
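Since the same expression has to be typed once per tapestry, it can be convenient to generate the text with a short script and paste it into the Select by Attributes dialog. The following Python sketch does only that: it builds strings and does not talk to ArcMap. The function names are my own.

```python
SEGMENT_FIELDS = ("SEGMENT_1", "SEGMENT_2", "SEGMENT_3")


def where_for_segment(segment: int) -> str:
    """Build the expression that matches one tapestry in any of the three fields."""
    return " OR ".join(f'"{field}" = {int(segment)}' for field in SEGMENT_FIELDS)


def where_for_group(segments) -> str:
    """Build the expression that matches any tapestry of a group in any field."""
    values = ", ".join(str(int(s)) for s in sorted(segments))
    return " OR ".join(f'"{field}" IN ({values})' for field in SEGMENT_FIELDS)
```

For example, `where_for_segment(8)` yields `"SEGMENT_1" = 8 OR "SEGMENT_2" = 8 OR "SEGMENT_3" = 8`, the Laptops & Lattes expression, and `where_for_group([13, 23, 27, 8, 9])` produces a single expression covering the whole technology group, which is an alternative to building one layer per tapestry and combining them afterwards.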
After generating layers based upon all of the tapestry types relating to youth and technology, I performed a spatial select by selecting Selection → Select By Location. Under I want to I selected “select features from”. Under the following layer(s), I placed a checkmark next to “Youth Tapestries”. Under that, I selected “share a line segment with”. Under the features in this layer I selected “Technology Tapestries”.
After saving this selection as a layer, we now have a selection that represents Technology Yuppies in America.
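The idea behind the “share a line segment with” test can be sketched in a few lines of Python. This is a deliberately simplified illustration, not ArcMap's algorithm: polygons are plain lists of (x, y) vertex tuples, and two polygons are considered neighbors only when they have an edge with exactly the same two endpoints. Partially overlapping collinear edges, which a real GIS would also detect, are not handled.

```python
def edges(polygon):
    """Return the set of undirected edges of a polygon given as a vertex list."""
    result = set()
    for i, start in enumerate(polygon):
        end = polygon[(i + 1) % len(polygon)]  # wrap around to close the ring
        if start == end:
            continue  # skip the zero-length edge of an explicitly closed ring
        result.add(frozenset((start, end)))  # frozenset ignores edge direction
    return result


def shares_edge(poly_a, poly_b):
    """True if the two polygons have at least one identical edge."""
    return not edges(poly_a).isdisjoint(edges(poly_b))


def select_sharing_edge(targets, sources):
    """Return names of target polygons sharing an edge with any source polygon.

    Both arguments map a name (e.g. a zipcode) to a vertex list.
    """
    source_edges = set()
    for poly in sources.values():
        source_edges |= edges(poly)
    return [name for name, poly in targets.items()
            if not edges(poly).isdisjoint(source_edges)]
```

Two unit squares placed side by side share an edge and are selected, while two squares that touch only at a corner are not. That is the distinction between “share a line segment with” and a looser “touch” relationship.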
Part V : “Exclusive” Areas
In order to generate the Exclusive Areas layer, I generated a selection using the following query:
"SEGMENT_1" IN (8,9,13,23,39,63,16) AND
"SEGMENT_2" IN (8,9,13,23,39,63,16) AND
"SEGMENT_3" IN (8,9,13,23,39,63,16)
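The same test can be written in Python to make its logic explicit. This is only a sketch of what the query expresses, applied to records held as dictionaries; the names are my own.

```python
# The segment numbers listed in the query above.
EXCLUSIVE_SEGMENTS = {8, 9, 13, 16, 23, 39, 63}

SEGMENT_FIELDS = ("SEGMENT_1", "SEGMENT_2", "SEGMENT_3")


def is_exclusive(record: dict) -> bool:
    """True only if all three segment fields hold one of the listed numbers."""
    return all(int(record[f]) in EXCLUSIVE_SEGMENTS for f in SEGMENT_FIELDS)


def exclusive_areas(records):
    """Return the records that the query above would select."""
    return [r for r in records if is_exclusive(r)]
```

Because the three conditions are combined with AND and zero is not in the list, an area with only one or two dominant tapestries (a zero in SEGMENT_2 or SEGMENT_3) is never selected by this query, even if its non-zero segments are all in the list.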
Hopefully this document has already given you some ideas about how to proceed with utilizing this tapestry data in order to identify demographic segments of interest.
Part VI : Conclusion
Overall, I found the Technology Yuppies in America selection to be quite accurate from what I know about the state of Maryland. That is, areas in western Maryland near D.C., particularly Columbia, Laurel, and Fort Meade, are captured. In north-eastern Maryland there is a small, isolated area being captured, and that is none other than Aberdeen Proving Ground (APG). Certain parts of Baltimore City are captured, as well as other notable areas, such as Reisterstown and Frederick. The Eastern Shore is also captured. This is, for the most part, how I pictured the demographic profile before beginning this project.
One may also notice that while VA has its share of captured areas, the state's coverage is very sparse. This is most likely because VA has a greater number of older residents per zip code than Maryland.
Also, I was not surprised to find cities in CO, CA, VA, FL, NC, WA, and TX presented as some of the most exclusive (selected by the exclusive areas query).
One other thing that caught my eye was the lack of Laptops & Lattes segments across the Mid-Atlantic. Of course, D.C. seems to have plenty of areas like this, but this segment is pretty sparse across the area as a whole. I was surprised, however, to find a small, almost isolated area in PA called Haverford (19041) which is a “Laptops & Lattes” town. I decided to verify this finding by using Google.com and Googlism.com. Google (query: “Haverford”) revealed that the township not only has its own page, but its own set of community forums and blogs. It also has an excellent educational institution by the same name which apparently makes good use of technology through its courses. In addition, Googlism (query: “Haverford”) revealed that “Haverford is a great example of a school making the most of modern technology”. So, the theory behind the “Laptops & Lattes” concept appears to be supported by these findings. I feel that this town may deserve a visit in my book, and hopefully, you may find more interesting places as well through the investigation of this data.

