
Data Science Reflection #4: Scale by Geoffrey West

Provide three examples from Geoff West’s reading that illustrate how scale can be used to more fully understand human development as a complex and adapting social and economic system. Additionally, what did Geoff West have to say about the use of theory and big data? According to West, can a theory be relevant in the face of big data? Please provide an example.

In his book “Scale: The Universal Laws of Life and Death in Organisms, Cities and Companies”, Geoffrey West describes the nature of complex systems and explains how thinking in terms of scale leads to a fuller understanding of human development as a complex, adapting social and economic system. Scaling describes how a system responds when its size changes; quantitatively, it describes how any measurable characteristic of plants, animals, ecosystems, companies, and cities changes with size.
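
As a rough sketch of what this means quantitatively, such relationships are usually written as power laws of the form Y = Y0 * N^beta, where N is the size of the system and beta is the scaling exponent. The short Python snippet below is purely illustrative and uses generic names, not anything taken from the book:

```python
def scale(y0: float, n: float, beta: float) -> float:
    """Generic power-law scaling: a quantity Y grows as Y = y0 * n**beta.

    beta < 1 -> sublinear (economies of scale)
    beta = 1 -> linear (simple proportionality)
    beta > 1 -> superlinear (increasing returns)
    """
    return y0 * n ** beta

# Doubling the size n multiplies Y by 2**beta, regardless of the starting size.
for beta in (0.75, 1.0, 1.15):
    print(f"beta={beta}: doubling size multiplies Y by {2 ** beta:.2f}")
```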

Early in the book, West introduces what he calls “generic laws”. He describes a study of heart rates across species: surprisingly, the total number of heartbeats over a lifetime is roughly constant, from mice (which have very short lifespans and fast heart rates) to whales (which live far longer with much slower heart rates). Noting that all organisms share this kind of underlying order, West discusses other examples where scaling is put to use. In medicine, many drugs and tests are first run on mice; by scaling up the results of those tests, researchers can estimate how a medicine may affect human beings. This is just one way scaling has expanded the realm of medicine.
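
As a hedged illustration of that mouse-to-human extrapolation, the sketch below assumes the effective dose tracks metabolic rate, which scales roughly as body mass to the 3/4 power; the masses and the dose are invented for the example, not values from the book:

```python
# Illustrative only: extrapolating a dose from mouse to human by assuming the
# effective dose tracks metabolic rate, which scales roughly as mass**0.75.
MOUSE_MASS_KG = 0.025   # hypothetical mouse body mass
HUMAN_MASS_KG = 70.0    # hypothetical human body mass
MOUSE_DOSE_MG = 0.5     # hypothetical dose found effective in mice

scale_factor = (HUMAN_MASS_KG / MOUSE_MASS_KG) ** 0.75
human_dose_mg = MOUSE_DOSE_MG * scale_factor

# The scaled dose is far smaller than a naive per-kilogram extrapolation, which
# would multiply by the full mass ratio (2800x) instead of roughly 385x.
print(f"Scaled human dose: {human_dose_mg:.1f} mg (factor {scale_factor:.0f}x)")
```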

West describes cities as engines of creativity and hubs where people interact and generate ideas. How cities respond to scaling is a major theme throughout the book. In developed countries such as the USA, infrastructure in urban areas shifts in a distinctive way: less infrastructure is needed per capita in a bigger city. For example, as a city’s population grows, the number of gas stations per capita decreases. Proportionally less infrastructure per person means greater efficiency, and this sublinear scaling of infrastructure echoes biological systems, which also scale sublinearly with size. Socioeconomic quantities, however, scale superlinearly: the number of patents produced, for instance, grows faster than the population does. But an increase in size means taking the good with the bad. While things like the number of patents and businesses increase disproportionately, crime and disease rise disproportionately as well.
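
A minimal sketch of that contrast, assuming the roughly 0.85 (infrastructure) and 1.15 (socioeconomic) exponents often cited in West’s research; the baseline constants and city sizes below are made up for illustration:

```python
# Illustrative comparison of sublinear vs. superlinear urban scaling.
# Exponents ~0.85 (infrastructure) and ~1.15 (socioeconomic outputs) are the
# approximate values reported in West's work; the baseline constants are invented.
INFRA_EXPONENT = 0.85
SOCIO_EXPONENT = 1.15

def gas_stations(pop: float, k: float = 0.05) -> float:
    return k * pop ** INFRA_EXPONENT   # sublinear: fewer per capita as cities grow

def patents(pop: float, k: float = 0.001) -> float:
    return k * pop ** SOCIO_EXPONENT   # superlinear: more per capita as cities grow

for pop in (100_000, 1_000_000, 10_000_000):
    print(f"pop={pop:>10,}: "
          f"gas stations per 1k people={1000 * gas_stations(pop) / pop:.2f}, "
          f"patents per 1k people={1000 * patents(pop) / pop:.2f}")
```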

Companies are the third major group West uses to show how scaling uncovers these underlying principles. Like biological organisms, companies scale according to simple power laws. Hence, sublinear scaling lets us make an educated guess about the characteristics of a larger company based on those of a smaller one, or vice versa. West describes companies like Walmart and Exxon, which take in half a trillion dollars in revenue, and notes that they are essentially scaled-up versions of companies that generate under ten million. In one of the book’s plots, the scaling exponent for organisms is about 0.75, while the exponent for companies is about 0.9. Because companies scale so much like biological organisms, it is reasonable to conclude that companies also have mortality: they may seem stable at first, but most of them wane over the years until they are eventually gone. West attributes this to companies losing dimensionality, becoming less diverse and less innovative, as they continue to grow.
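
Here is a small sketch of that kind of extrapolation, assuming the roughly 0.9 exponent from the plot; the company sizes and revenue figures are invented for illustration:

```python
# Minimal sketch: extrapolating one company's metric from another using the
# ~0.9 exponent West reports for companies. All figures below are invented.
COMPANY_EXPONENT = 0.9

def extrapolate(metric_small: float, size_small: float, size_large: float) -> float:
    """Predict a metric for a larger company from a smaller one, assuming both
    lie on the same power law: metric ~ size**0.9."""
    return metric_small * (size_large / size_small) ** COMPANY_EXPONENT

# A hypothetical firm with 100 employees and $10M in revenue, scaled up 1000x in size:
predicted = extrapolate(metric_small=10e6, size_small=100, size_large=100_000)
print(f"Predicted revenue at 100,000 employees: ${predicted / 1e9:.1f}B")
# Sublinearity (0.9 < 1) means revenue grows about 500x, not the full 1000x.
```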

At the end of his book, Geoffrey West defends the use of big data, but he argues that more data is only better when it is guided by a conceptual framework. Chris Anderson has claimed that models and the traditional way we conduct science are dead because of the arrival of big data. West responds that correlation does not supersede causation: while a correlation may hint at a relationship, direct causality can be hard to establish. For example, between 1999 and 2010, the variation in the amount of money spent on science and technology tracked the variation in the number of suicides almost exactly. One cannot look at that data and declare causation; a model is needed to make sense of it.

The Large Hadron Collider (LHC) offers a concrete example. The LHC is a particle accelerator that was used in the search for the Higgs particle, and it is enormous: it produced 60 million collisions each second and extracted 150 exabytes of data per day. Simply combing through the data to find the particle would have been nearly impossible, since the physicists had to focus on only 0.00001 percent of the data stream. But with a model telling them to ignore most of the collision debris and target specific regions, they were able to find the Higgs particle; theory, combined with the LHC’s algorithms, is what confirmed the discovery. These results support West’s view that scientific models are not dead and will continue to be essential in the face of big data.
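
As a loose, purely hypothetical illustration of how a model narrows the search (this toy filter is not how the LHC’s trigger systems actually work), consider keeping only the sliver of an event stream that a theory says is worth examining:

```python
import random

# Toy illustration of theory-guided search: a model tells us which tiny region
# of a huge event stream is worth keeping, instead of sifting through everything.
# The "signal window" and event values here are entirely made up.
random.seed(0)

def event_stream(n: int):
    """Simulate n events, each with a measured value in arbitrary units."""
    for _ in range(n):
        yield random.uniform(0, 1000)

# Hypothetical theory: interesting events fall inside a narrow predicted window.
PREDICTED_WINDOW = (124.0, 126.0)

kept = [e for e in event_stream(1_000_000)
        if PREDICTED_WINDOW[0] <= e <= PREDICTED_WINDOW[1]]

print(f"Kept {len(kept)} of 1,000,000 events "
      f"({100 * len(kept) / 1_000_000:.3f}% of the stream)")
```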