A great user experience in a bike sharing system has many components, including station location, ease of use, bike quality, and availability. For bike availability specifically, two factors matter:
- the start station has adequate bikes (influenced by a. total bikes in use, and b. bike distribution)
- the end station has adequate docks (influenced by a. total bike-to-dock ratio, and b. bike distribution)
Today’s analysis focuses on how the bike sharing user experience changes as the total number of bikes in use increases (up to peak usage), while also considering how bikes are distributed among stations (some stations tend to have a net-positive or net-negative flow, which requires rebalancing).
Using data from Capital Bikeshare in Washington, DC (a large, urban bike sharing system with over 1500 bikes and 165 stations), we can begin to build a case for maximum bike usage, both under an equal distribution of bikes and under a more likely distribution (one that accounts for each station’s natural flow). The dataset covers 147 stations with 2443 docks in the final two weeks of Q1 2012. The number of bikes, 1222, is estimated by halving the total number of docks, since Capital Bikeshare uses a 50% bike-to-dock ratio. Based on this information, we can build a model that simulates how many bikes the average station contains when x bikes are in use (i.e. not at any station).
Here’s how the graph looks when no bikes are in use. The smallest station has 10 docks and the largest has 39; we first model every station at half capacity:
Next, we’ll use this as a baseline and steadily increase the number of bikes in use, removing bikes from every station at an equal rate. To maintain our first user experience goal, that the start station has adequate bikes, peak usage must stay below the level at which the smallest stations run out of bikes. For Capital Bikeshare, the smallest station has 10 docks; as it is an outlier, we’ll optimize for the many stations that have 11 docks:
Once we reach 662 bikes in use, stations with 10 or 11 docks average fewer than one bike each, a total of 44 stations without bikes. Although the system still has 560 bikes available, a new user may now arrive at a station with no bikes available.
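As a rough sketch of the arithmetic behind that threshold, the Python below assumes that every station starts half full and that the bikes in use are withdrawn in equal numbers from each of the 147 stations. That reading of the model is an assumption, although it does reproduce the 662 and 560 figures above. The function names are illustrative only.

```python
import math

TOTAL_BIKES = 1222  # from the article: half of the 2443 docks
N_STATIONS = 147    # from the article


def avg_bikes_at_station(docks, bikes_in_use, n_stations=N_STATIONS):
    """Average bikes at a station that starts half full, assuming the bikes
    in use are withdrawn in equal numbers from every station."""
    return docks / 2 - bikes_in_use / n_stations


def first_shortfall(docks, n_stations=N_STATIONS, min_bikes=1):
    """Smallest number of bikes in use at which a station of this size
    averages fewer than min_bikes."""
    # In-use count at which the station average equals min_bikes exactly.
    limit = (docks / 2 - min_bikes) * n_stations
    return math.floor(limit) + 1


threshold = first_shortfall(11)
print(threshold)                # bikes in use when 11-dock stations drop below one bike
print(TOTAL_BIKES - threshold)  # bikes still parked somewhere in the system
```

Under these assumptions an 11-dock station still averages just over one bike at 661 bikes in use and just under one at 662.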
The above “average” case should also be viewed as the best-case scenario: bikes are evenly distributed, and new users check out bikes at an equal rate among all stations. However, bike sharing system rebalancers know better: some stations tend to always have a surplus, while others tend to have few bikes, if any. We can model this by showing the total in/out data from the Capital Bikeshare system for the same period (keeping in mind that any rebalancing is not reflected in this data):
As you can see, roughly 30% of stations sit at each end of the flow graph and require rebalancing (net positive or net negative), while about 40% mostly rebalance themselves. Interestingly, some stations have opposite trends on weekdays vs. weekends.
The modeling difficulty now: how can we apply this flow data to our static model of averages? My method is to take the measured flows (over the two-week period) and convert them to a 12-hour average (the window could be tweaked for more or less effect), which I then add to my baseline. Here’s the new graph, sorted by bikes available under typical use, starting with zero bikes in use:
Even with no bikes in use, some stations are empty or near empty. Using this new baseline, we’ll now assume 661 bikes in use, our potential optimal maximum from the previous model (one fewer than the 662 at which the 11-dock stations dropped below one bike):
With 661 bikes in use, the model now shows 40 stations without bikes, compared with only one in the nominal model. Some stations are deeply negative, which suggests even more unmet demand for bikes (i.e. unhappy users who can’t check out a bike at the station they want).
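To make the flow adjustment concrete, here is a minimal sketch in Python. It assumes that "a 12-hour average" means dividing each station's two-week net flow (arrivals minus departures) by the 28 twelve-hour windows in the sample, and that bikes in use are still withdrawn equally from every station. The four stations and their flows are invented for illustration; they are not Capital Bikeshare data.

```python
PERIODS = 14 * 2  # two weeks split into 12-hour windows (assumed interpretation)


def flow_adjusted_baseline(docks, net_flow_two_weeks, periods=PERIODS):
    """Half-capacity baseline plus the station's average net flow per window."""
    return docks / 2 + net_flow_two_weeks / periods


def bikes_under_load(baseline, bikes_in_use, n_stations):
    """Bikes left at a station after an equal share of the bikes in use is removed.
    Values are not clamped, so a negative result signals unmet demand."""
    return baseline - bikes_in_use / n_stations


# Hypothetical stations: (name, docks, arrivals minus departures over two weeks)
stations = [("A", 11, -168), ("B", 15, 0), ("C", 19, 140), ("D", 11, 28)]

baselines = {name: flow_adjusted_baseline(docks, flow) for name, docks, flow in stations}
loaded = {name: bikes_under_load(b, 8, len(stations)) for name, b in baselines.items()}
short = [name for name, bikes in loaded.items() if bikes < 1]

print(baselines)  # station A is already below zero with no bikes in use
print(loaded)
print(short)
```

Shrinking the window (say, to 6 hours) divides the flow by a larger number and softens the effect; widening it does the opposite, which is the tweak mentioned earlier.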
A system operator must understand the potential trade-offs between increasing levels of ridership and the cost of (and ability to perform) rebalancing. This last graph helps answer that question: it shows the number of stations with fewer than one bike as the number of concurrent riders increases:
Using this data, a system operator can make educated decisions about a target peak ridership (in order to maximize revenue) and about the level of service customers should expect at peak system use. The cost of lower-than-optimal peak usage is forgone revenue or reduced community benefit; the cost of higher-than-optimal usage is increased or unmanageable rebalancing, or a degraded user experience for the typical customer.
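For readers who want to reproduce that kind of trade-off curve on their own data, a small Python sketch follows. It assumes per-station baselines (already flow-adjusted) and equal withdrawal of bikes in use across stations; the baseline values and the helper names are hypothetical, not taken from the Capital Bikeshare dataset.

```python
def stations_short(baselines, bikes_in_use, min_bikes=1):
    """Count stations left with fewer than min_bikes at a given system load."""
    per_station = bikes_in_use / len(baselines)
    return sum(1 for b in baselines if b - per_station < min_bikes)


def shortfall_curve(baselines, max_in_use, step):
    """(bikes in use, stations short) pairs, i.e. the data behind the curve."""
    return [(x, stations_short(baselines, x)) for x in range(0, max_in_use + 1, step)]


def highest_load_within(curve, tolerated_short):
    """Largest sampled load at which no more than tolerated_short stations are short."""
    ok = [x for x, short in curve if short <= tolerated_short]
    return max(ok) if ok else None


# Hypothetical flow-adjusted baselines for six stations
baselines = [-0.5, 5.5, 6.5, 7.5, 9.5, 14.5]

curve = shortfall_curve(baselines, max_in_use=36, step=6)
for in_use, short in curve:
    print(in_use, short)

print(highest_load_within(curve, tolerated_short=1))
```

The tolerated number of short stations is the service-level knob: an operator who can rebalance more aggressively can accept a higher value and therefore a higher target peak.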
Read the first article in this series: bike sharing usage patterns in Washington DC’s Capital Bikeshare