MSAN 622
Information Visualization

Final Project


By Monica Meyer

See Code


Visualization


Category Scores for Each Neighborhood


Discussion


Techniques

The map encodes six columns of the data, one at a time, as the fill color of each neighborhood. A drop-down menu lets the user choose which column to show: crime, overcrowding, poverty, education, employment or community score. The chosen column is then drawn on the map by shading each neighborhood according to its value for that metric. A tooltip provides details on demand, so the user can see the exact value for each neighborhood. One issue that affects the lie factor is that the data has separate rows for Mission Bay and South of Market, while the map does not have separate boundaries for these neighborhoods. This increases the lie factor, because the value displayed for South of Market applies to only a portion of the land area that is shaded. I tried to find a better shapefile to use in creating the topojson, but was unable to. The data ink ratio is very high, and the data density is also high, since very little of the area is not dedicated to displaying data. The map identifies clusters of neighborhoods with similar values and provides geographic context for the other visualizations, showing which neighborhoods are close together and tend to follow similar trends.

The bubble chart encodes four columns of the data: percent area in high heat vulnerability zones on the y-axis, percent impervious surface on the x-axis, percent area within 0.25 miles of a contamination risk as the bubble size, and overall environmental resiliency score as the color. A tooltip lets the user view all values for the neighborhood being hovered over. The lie factor is affected by the fact that bubble size starts at a radius of 5 pixels, which encodes 0%, so bubble sizes are not strictly proportional to the values they represent. The data ink ratio is high, although the legend takes up some ink that is not used strictly to display the data. The data density is quite low, since the bubbles cover little of the chart area. The bubble chart lets the user see the relationship between the variables on the x- and y-axes. It also shows that Treasure Island is an outlier in proximity to contaminated areas, since its circle is larger than those of the neighborhoods near it on the x- and y-axes.
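The effect of the 5-pixel minimum radius on the lie factor can be checked numerically. The sketch below is only an illustration: it assumes a linear radius scale, a hypothetical maximum radius of 25 pixels at 100%, and that readers compare bubbles by area. Only the 5-pixel minimum comes from the chart itself.

```javascript
// Hypothetical radius scale: 0% maps to 5 px (as in the chart); the 25 px
// maximum at 100% is an assumption made for this sketch.
function radiusFor(percent, minRadius = 5, maxRadius = 25) {
  return minRadius + (percent / 100) * (maxRadius - minRadius);
}

// Lie factor: relative change shown in the graphic divided by the relative
// change in the data. The "graphic" quantity here is circle area.
// percentA must be non-zero, otherwise the data effect is undefined.
function lieFactor(percentA, percentB) {
  const area = (p) => Math.PI * radiusFor(p) ** 2;
  const shownEffect = (area(percentB) - area(percentA)) / area(percentA);
  const dataEffect = (percentB - percentA) / percentA;
  return shownEffect / dataEffect;
}

// Compare a neighborhood at 20% with one at 40% under these assumptions.
console.log(lieFactor(20, 40).toFixed(2));
```

A value of 1 would mean no distortion. Trying different pairs of percentages, or a different assumed maximum radius, shows how much the non-zero starting radius matters for a given comparison.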

The small multiples bar chart encodes five columns of the data as bar size: percent of people living alone, residential violation rate, percent of households paying more than 50% of their monthly income in rent, preventable hospitalization rate, and average active minutes per resident per day. Overall resiliency score is encoded as the color of every bar. The data ink ratio is very high and the data density is moderate. Nothing affects the lie factor, since every bar starts at zero. This visualization is very good at showing an overview of multiple columns of the data. It also lets the user see trends among neighborhoods with similar overall resiliency scores, because the interactivity focuses on neighborhoods with the same score while fading the others. I chose small multiples because a single bar chart shows only one metric, while bar charts are simple enough that several can be shown together without losing the detail of each individual column.

The parallel coordinates chart encodes resiliency score as color and uses eight category scores, the overall resiliency rank and the district the neighborhood belongs to as its axes. For consistency, I used the same color scale here as in the small multiples bar chart, so in both charts color represents the neighborhood's overall resiliency score. This chart has a low lie factor, low data density and a high data ink ratio. This visualization is the best overview of the data. It gives the user insight into the overall trends across the main categories without getting bogged down in the more than 50 columns in the dataset. It also lets the user brush to see patterns and trends in the data, such as the fact that neighborhoods that score high in transportation score low in housing.

Interactivity

There is interactivity on each chart, as well as inter-chart interactivity between the map and the bubble chart. The map lets the user choose which metric is shown. I implemented a tooltip on the map so the user can see each neighborhood's value for whichever metric is currently chosen, which provides context for the shading of the neighborhoods. On both the map and the bubble chart, clicking a neighborhood fades all other neighborhoods on both charts, allowing the user to view both charts within the context of the neighborhood they are interested in.

In the small multiples bar chart, on mouseover, the graphic fades neighborhoods which do not have the same resiliency score, to highlight how different columns in the data affect the overall resiliency of a neighborhood. There is also a tooltip to provide details on demand for all the columns that are shown in the small multiples. This gives the user a better overall view of the bar charts.
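The fade-on-hover rule described above can be sketched as a pure function. This is a minimal illustration, not the project's code: the rows, the `score` field name and the faded opacity of 0.2 are all hypothetical. In a D3 chart the returned values would typically be applied through the selection's opacity style.

```javascript
// Hypothetical rows; the names and the "score" field are assumptions.
const bars = [
  { neighborhood: "A", score: 2 },
  { neighborhood: "B", score: 4 },
  { neighborhood: "C", score: 2 },
];

// On mouseover: full opacity for bars whose resiliency score matches the
// hovered bar, a faded opacity for every other bar.
function opacitiesOnHover(rows, hovered, faded = 0.2) {
  return rows.map((row) => (row.score === hovered.score ? 1 : faded));
}

// On mouseout: every bar returns to full opacity.
function opacitiesOnMouseout(rows) {
  return rows.map(() => 1);
}

console.log(opacitiesOnHover(bars, bars[0]));
```

Keeping the rule separate from the drawing code makes it easy to reuse the same highlight logic across every small multiple.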

The parallel coordinates chart allows for brushing and panning along each of the axes. This provides focus + context: when an axis is brushed, the lines outside the brush fade into the background, so the user can view specific aspects of the data without losing the overall context of all the data being shown.
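The focus + context rule behind brushing can be sketched as a filter over the rows. This is a simplified illustration that assumes numeric axes only; the row values, the `extents` shape and the faded opacity are hypothetical and not taken from the project.

```javascript
// extents: { axisName: [min, max] } for each axis that currently has a brush.
// A row stays in focus only if it falls inside every active brush.
function inFocus(row, extents) {
  return Object.entries(extents).every(
    ([axis, [lo, hi]]) => row[axis] >= lo && row[axis] <= hi
  );
}

// Lines in focus keep full opacity; the rest fade into the background.
function lineOpacities(rows, extents, faded = 0.1) {
  return rows.map((row) => (inFocus(row, extents) ? 1 : faded));
}

// Made-up scores for three placeholder neighborhoods.
const rows = [
  { name: "A", transportation: 5, housing: 1 },
  { name: "B", transportation: 2, housing: 4 },
  { name: "C", transportation: 4, housing: 2 },
];

// Brush only the transportation axis between 4 and 5.
console.log(lineOpacities(rows, { transportation: [4, 5] }));
```

With no active brushes the extents object is empty, so every line stays in focus, which matches the chart's unbrushed state.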

Feedback

In class, I demonstrated a prototype that did not yet show any data on the map, but that had a working bubble chart, small multiples bar chart and parallel coordinates chart. The feedback I found particularly helpful concerned color. The colors of the small multiples bar chart were originally too difficult to compare, since they came from a multi-hue color scale from colorbrewer. It was also suggested that I use the same color scale (and therefore the same data column for the color) in both the small multiples and the parallel coordinates chart, to tie them together. These comments were very useful because they gave me a way to connect the last two charts without adding interactivity between them, which did not make sense for these visualizations.

Another suggestion was to change the interactivity I had implemented in the parallel coordinates chart. Originally, I had a hover implemented which highlighted the line (neighborhood) that was hovered over, but I changed this to allow the user to brush the axes instead. This allows the user to still view one neighborhood at a time by brushing on the ordinal axis of neighborhood, but also allows them to brush other axes to see more than one neighborhood at a time. There were not any particular pieces of feedback that I did not agree with, because all of these changes enhanced my visualizations.

Challenges

The biggest challenges I encountered were in my implementation of the map visualization. The other three visualizations used methods I had worked with previously, so the challenges there came from making smaller changes. For the map, I started with a shapefile of the San Francisco neighborhoods. I converted it to geojson and then to topojson, but quickly realized that this alone was not enough: I needed an id that could be used to match each row of my data to the neighborhood it describes. The topojson allowed a field to be used as the id, but some of the neighborhood names have more than one word, and HTML id attributes may not contain spaces, which makes such ids unreliable to select from JavaScript. I therefore created an id field without spaces that could still be mapped back to a string with spaces, so that the tooltips show the neighborhood name correctly. I also combined my data into the topojson, so that I would not need multiple d3.json calls in my code. Instead, the topojson had two objects: one with the map boundaries and one with all the data needed to shade in the neighborhoods.

The next challenge was shading the neighborhoods correctly based on the data. I wanted a drop-down menu that let the user select which metric to view on the map, and since each metric has a different domain, I had to create a separate lookup from id to value for each metric to determine what color a given neighborhood would be shaded. I am sure there is a more efficient way of doing this, but I created 6 color scales and 6 map objects that could be used to go from id to value to color. Then I added some if/then statements to pick which map and which color scale to use depending on which metric was chosen from the drop-down menu.
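Both steps can be sketched in plain JavaScript. This is one possible arrangement under stated assumptions, not the code used in the project: `toId`, `makeQuantizeScale`, `buildScales` and `colorFor` are hypothetical helpers, the metric values and color names are made up, and the hand-written quantize function merely stands in for a D3 quantize scale. The point is that a single object keyed by metric name can replace the six separate scales and the if/then branching.

```javascript
// Build an id without whitespace, and keep a reverse lookup so the tooltip
// can still show the original neighborhood name.
function toId(name) {
  return name.trim().replace(/\s+/g, "_");
}

const names = ["South of Market", "Treasure Island", "Bayview"];
const nameById = new Map(names.map((n) => [toId(n), n]));

// Map a value in [min, max] to one of `colors` (equal-width bins).
function makeQuantizeScale(min, max, colors) {
  const span = max - min || 1; // avoid dividing by zero when all values match
  return (value) => {
    const bin = Math.floor(((value - min) / span) * colors.length);
    return colors[Math.min(colors.length - 1, Math.max(0, bin))];
  };
}

// One scale per metric, each with its own domain taken from the data.
function buildScales(rows, metrics, colors) {
  const scales = {};
  for (const metric of metrics) {
    const values = rows.map((row) => row[metric]);
    scales[metric] = makeQuantizeScale(Math.min(...values), Math.max(...values), colors);
  }
  return scales;
}

// The drop-down value selects both the column and its scale.
function colorFor(row, metric, scales) {
  return scales[metric](row[metric]);
}

// Made-up values for illustration only.
const sample = [
  { id: toId("South of Market"), crime: 5, poverty: 30 },
  { id: toId("Treasure Island"), crime: 1, poverty: 10 },
];
const scales = buildScales(sample, ["crime", "poverty"], ["light", "mid", "dark"]);
console.log(nameById.get(sample[0].id), colorFor(sample[0], "crime", scales));
```

Adding a seventh metric to the drop-down would then only require adding its column name to the list passed to `buildScales`.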

One challenge with the bubble chart was creating the bubble size legend. I had already created a color legend in a previous homework, so I knew how to do that, but I was unsure how to create a legend that explains bubble size. I started by drawing the circles one below another, but had issues with the spacing between bubbles. I therefore decided to draw the bubbles on top of one another, with the bottom of each circle aligned. This takes up much less space, which increases the data ink ratio, and it actually allows for better comparison between the sizes.
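The bottom-aligned layout reduces to one line of geometry: each circle's centre sits one radius above a shared baseline. The sketch below is a minimal illustration with a hypothetical helper name and made-up radii; it returns the attributes that would be set on each SVG circle.

```javascript
// Place legend circles so that all of their bottoms touch one baseline.
// SVG y coordinates grow downward, so the centre is baselineY minus r.
function nestedLegend(radii, cx, baselineY) {
  return [...radii]
    .sort((a, b) => b - a) // largest first, so smaller circles are drawn on top
    .map((r) => ({ cx, cy: baselineY - r, r }));
}

// Hypothetical radii in pixels.
console.log(nestedLegend([5, 15, 25], 40, 60));
```

Because every circle shares the same horizontal centre and baseline, the legend is only as wide and as tall as its largest bubble.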

I spent quite a bit of time trying to make the fade in and fade out of the bar charts smoother, but the transition caused problems when the user moved from one bar to another. If the user switched bars before the previous transition had finished, all the bars would fade out instead of just the ones not being hovered over. Instead of a longer transition, I added a small delay between the fade in and fade out on hover, so that the change is not quite as abrupt (avoiding change blindness).

Conclusion

There are some interesting conclusions that can be drawn from the visualizations of this dataset. Crime, overcrowding and poverty tend to be higher in the more densely populated regions, as can be seen on the map. In the bubble chart, we can see that although Treasure Island has a lower percent impervious surface and less area in high heat vulnerability zones, its environmental resiliency score is quite low because so much of its area is near contamination risks.

The bar chart is quite informative. It allows us to see that areas with a higher percentage of households paying more than 50% of their monthly income in rent also tend to have more residential violations and more people living alone. Bayview has a rather large number of preventable hospitalizations and also a lower average number of active minutes per resident per day, which is possibly a contributing factor to the hospitalizations. The places that tend to be more expensive to live in have lower resiliency scores. This leads me to believe that living beyond one's means could be a contributing factor in being unable to adapt to the health consequences that climate change will bring.

I added the district axis to the parallel coordinates chart because I thought it was interesting to see how neighborhoods in the same district tend to have similar overall resiliency scores. They tend to be fairly similar in the other categories as well. This is something we might expect, but seeing it explicitly in the data confirms the suspicion that residential areas are similar to one another but quite different from more densely populated, urban areas. This chart allows us to see the overall trends in the data and make more informed decisions about which neighborhoods may be better to live in. The data is originally meant to show which neighborhoods are most resilient to climate change and the health issues it will bring, but this last chart may also be used to see which neighborhoods may be better to live in. We can see that neighborhoods with higher transportation scores have lower housing scores. This means that someone who wants a nicer living situation may have to sacrifice quicker access to transportation. One especially interesting aspect of this chart is the hazard column. Neighborhoods that score 4 or 5 are more resilient in this category, which means that there are fewer risks from things such as contamination and toxic air.