SF Bay Area's Regional GTFS Feed Expanded



Earlier this year, Interline and the Metropolitan Transportation Commission released the Regional GTFS Feed for the San Francisco Bay Area. The Regional GTFS Feed is produced on a daily basis and made available through 511 SF Bay's Open Data Portal and its Open Transit Data API.

We're pleased to now share a series of additions to the Regional GTFS Feed that we've released together over recent months:

Historical Feeds

Since we released the Regional GTFS Feed in January, much has changed across the Bay Area. Transit agencies and their dedicated staff have rapidly reduced and re-targeted transit service for those riders who perform essential work or depend upon transit for their necessary journeys.

Using the newly released Historical Feeds component of the Regional GTFS Feed, we can visualize how transit service has changed throughout the Bay Area from January to July of 2020:

Animated series of maps showing bus, train, and ferry route lines throughout the Bay Area with the lines given different weights to show which routes provide the most frequent service
Buses, trains, and ferries throughout the San Francisco Bay Area color-coded by frequency: the reddest route lines are routes that operate at least every 10 minutes; medium hues of red are 10 - 20 minute headways; the lightest hue of red are routes with service less than once every 20 minutes. This animation is created by using the Historical Feeds for the months of January through July of 2020.

Historical Regional Feed products are fully valid GTFS feeds, but they differ somewhat in their contents from the daily Regional Feed products. Read on to understand the process used to produce the Historical Regional Feed products and their key differences, or skip to the end of this section to download Historical Feed products.

Slicing regional feeds

Each day, the Regional Feed is produced from the versions of agency feeds on 511.org that provide the best view of service on that day. Each month, the Historical Regional Feed creation process takes these Regional Feeds and combines them together, taking one day of service from each feed, which we are calling a "slice."

For example:

| Feed filename | Published | Contributes service slice for |
| --- | --- | --- |
| mtc-regional-feed-2020-04-24.zip | 2020-04-24 | 2020-04-24 |
| mtc-regional-feed-2020-04-23.zip | 2020-04-23 | 2020-04-23 |
| mtc-regional-feed-2020-04-22.zip | 2020-04-22 | 2020-04-22 |
| mtc-regional-feed-2020-04-21.zip | 2020-04-21 | 2020-04-21 |
| mtc-regional-feed-2020-04-20.zip | 2020-04-20 | 2020-04-20 |

If the Regional Feed for a given day is missing, the closest previous day provides service. For instance, if 2020-04-22 was missing, the 2020-04-21 feed slice would cover both 2020-04-21 and 2020-04-22.
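The fallback rule can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual merging program; the function name assign_slices and its arguments are hypothetical.

```python
from datetime import date, timedelta


def assign_slices(available, first_day, last_day):
    """Map each service day to the feed date that supplies its slice.

    `available` is the set of dates for which a daily Regional Feed exists.
    A day with no feed falls back to the closest previous available feed.
    """
    slices = {}
    current = None  # most recent feed date seen so far
    day = first_day
    while day <= last_day:
        if day in available:
            current = day
        if current is not None:
            slices[day] = current
        day += timedelta(days=1)
    return slices


# Hypothetical scenario from the text: the 2020-04-22 feed is missing.
available = {date(2020, 4, 20), date(2020, 4, 21), date(2020, 4, 23), date(2020, 4, 24)}
slices = assign_slices(available, date(2020, 4, 20), date(2020, 4, 24))
for day, feed in sorted(slices.items()):
    print(day, "<-", f"mtc-regional-feed-{feed}.zip")
```

In this sketch, a day that precedes every available feed simply gets no slice.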

Global entity copying

Agencies, stops, and routes are considered "global", and are handled using a simple ID-based merge with the most recent version winning. For example, if BART has a route with ID "OR-S" that is called "Richmond - Warm Springs", but then later renames it to "Richmond to Warm Springs", then the latter version will be used.
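A minimal sketch of this "most recent version wins" merge, assuming each input feed is represented as a list of route records keyed by route_id. The function name merge_global and the data layout are illustrative assumptions, not the actual implementation.

```python
def merge_global(feeds, id_field):
    """Merge records from several feeds by ID; the most recent feed wins.

    `feeds` is a list of (feed_date, records) pairs, where feed_date is an
    ISO date string and records is a list of dicts.
    """
    merged = {}
    # Process oldest first so that later feeds overwrite earlier ones.
    for _, records in sorted(feeds, key=lambda feed: feed[0]):
        for record in records:
            merged[record[id_field]] = record
    return merged


feeds = [
    ("2020-04-23", [{"route_id": "OR-S", "route_long_name": "Richmond to Warm Springs"}]),
    ("2020-04-22", [{"route_id": "OR-S", "route_long_name": "Richmond - Warm Springs"}]),
]
routes = merge_global(feeds, "route_id")
print(routes["OR-S"]["route_long_name"])
```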

Trip hashing, comparison, and copying

Trips are more complicated and handled separately. Simply combining all the trips and stop_times from all of the input files can easily create a GTFS feed that is too large for practical use, especially given that programs like OpenTripPlanner need to hold the entire schedule in memory. Therefore, duplicate copies of trips are detected using a hash-based approach and only copied to the output once. This reduces the output size by approximately 90%.

For example, here are three hypothetical versions of Trip ID "BA:2210503" from three consecutive days of input regional feeds.

| Feed filename | Trip ID | Route ID | Service ID | Headsign | 1st stop | 2nd stop | n stops | Hash |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2020-04-24.zip | BA:2210503 | BA:OR-S | BA:Wkd_BASE-Weekday-07 | Warm Springs/South Fremont | RICH 5:03 | DELN 5:07 | PLZA 5:10... | 8c4ecb |
| 2020-04-23.zip | BA:2210503 | BA:OR-S | BA:Wkd_BASE-Weekday-07 | Warm Springs/South Fremont | RICH 5:03 | DELN 5:07 | PLZA 5:10... | 8c4ecb |
| 2020-04-22.zip | BA:2210503 | BA:OR-S | BA:Wkd_BASE-Weekday-07 | Warm Springs to South Fremont | RICH 5:04 | DELN 5:06 | PLZA 5:12... | a4bf1a |

For each of these, the hashing function takes into account all trip attributes, all the calendar attributes for that trip, and the full details of each entry in stop_times.txt. Any change in any field will result in a different hash. This allows us to directly compare trips between versions of the input feed. Above, all details and the schedule for 2020-04-23 and 2020-04-24 match exactly, so these two trips will be considered identical. The trip for 2020-04-22 has some minor differences in name and schedule, so it will generate a different hash and be considered a different trip.
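To make the idea concrete, here is a rough sketch of such a hash. The post does not say which hash function or serialization is used, so SHA-1 over canonical JSON, truncated to six hex characters, is purely an assumption; the resulting values will not match the hashes in the table above.

```python
import hashlib
import json


def trip_hash(trip, calendar, stop_times, length=6):
    """Hash every trip, calendar, and stop_times field into a short hex string."""
    payload = json.dumps(
        {"trip": trip, "calendar": calendar, "stop_times": stop_times},
        sort_keys=True,          # field order must not affect the hash
        separators=(",", ":"),
    )
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()[:length]


calendar = {"service_id": "BA:Wkd_BASE-Weekday-07", "monday": "1"}
trip_a = {"trip_id": "BA:2210503", "route_id": "BA:OR-S",
          "trip_headsign": "Warm Springs/South Fremont"}
trip_b = dict(trip_a, trip_headsign="Warm Springs to South Fremont")
times_a = [("RICH", "5:03"), ("DELN", "5:07"), ("PLZA", "5:10")]
times_b = [("RICH", "5:04"), ("DELN", "5:06"), ("PLZA", "5:12")]

h_0424 = trip_hash(trip_a, calendar, times_a)
h_0423 = trip_hash(trip_a, calendar, times_a)
h_0422 = trip_hash(trip_b, calendar, times_b)
print(h_0424 == h_0423, h_0423 == h_0422)
```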

As the historical feed merging program processes each input feed, it calculates the hash of each trip in the feed. If it has not seen a trip before, it copies it to the output and notes the hash for future use. If it has been seen before, it is not copied again. To prevent clashes, the trip hash is appended to each original Trip ID (e.g. BA:2210503 -> BA:2210503:8c4ecb). The merging program then takes all trips in the input feed (both seen and unseen), examines the calendars to see which are active for each day in this slice, and creates calendar_dates.txt entries for each trip on each day where that trip is scheduled to run. The original service IDs are changed to match the hash-appended Trip ID, and the calendar is unrolled into a day-by-day format. The result is more verbose than the original calendars, but it works reliably. This hashing approach is resource-efficient and allows us to create historical feeds of arbitrary duration while minimizing the output size.

Example output calendar_dates.txt:

| Service ID | Date | Exception Type |
| --- | --- | --- |
| BA:2210503:8c4ecb | 2020-04-24 | 1 (Added) |
| BA:2210503:8c4ecb | 2020-04-23 | 1 (Added) |
| BA:2210503:a4bf1a | 2020-04-22 | 1 (Added) |

In this way, the 8c4ecb version of the trip is scheduled to run on the two days of input data where it was seen, and the a4bf1a version is scheduled to run on the other day.
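The copy-once and unroll steps might look roughly like the following. This is a simplified sketch under stated assumptions: each slice is given as a service date plus a mapping of the trips active on that date to their precomputed hashes, and the name merge_slices is hypothetical. Dates are written as in the table above; actual GTFS files store them as YYYYMMDD.

```python
def merge_slices(slices):
    """Copy each distinct trip once and emit one calendar_dates row per day.

    `slices` is a list of (service_date, trips) pairs, where trips maps an
    original trip_id to the hash of the trip version active on that date.
    """
    seen = set()
    output_trips = []      # hash-appended trip IDs, each copied only once
    calendar_dates = []    # (service_id, date, exception_type) rows
    for service_date, trips in slices:
        for trip_id, trip_hash in trips.items():
            new_id = f"{trip_id}:{trip_hash}"
            if new_id not in seen:
                seen.add(new_id)
                output_trips.append(new_id)
            # The service ID equals the hash-appended trip ID; 1 means "added".
            calendar_dates.append((new_id, service_date, 1))
    return output_trips, calendar_dates


slices = [
    ("2020-04-24", {"BA:2210503": "8c4ecb"}),
    ("2020-04-23", {"BA:2210503": "8c4ecb"}),
    ("2020-04-22", {"BA:2210503": "a4bf1a"}),
]
output_trips, calendar_dates = merge_slices(slices)
for row in calendar_dates:
    print(row)
```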

Differences between Regional and Historic feeds

Historic Regional Feeds are equivalent to the original daily Regional Feeds in the stops, routes, and scheduled services they contain. Using a Historic Feed will produce the same output in a routing engine or another type of analysis.

Historic Feeds are different from Regional Feeds in their specific GTFS structure:

  • calendar.txt records are removed and rewritten as calendar_dates.txt entries
  • trips.txt records are hashed and compared (as described above)
  • IDs for global records are namespaced (as described above)

These differences should not affect routing engines or similar types of analysis. However, keep these differences in mind if you are trying to use historical feeds to understand changes in GTFS data and its practices over time at Bay Area agencies.

Download Historical Feed products

To use the Historical Feed products:

  1. Sign up for a 511 Open Data API token
  2. Download from http://api.511.org/transit/datafeeds?api_key=[your_key]&operator_id=RG&historic=YYYY-MM (for example, to request May 2020: historic=2020-05)
  3. At the start of each month, the Historic Feed for the last month is created and posted. You may download as many months as you wish and combine them to analyze as many months/quarters/years as you wish at once.

As of this blog post, the following months are available for download (that is, you can use any of these values for the historic query parameter):

  • 2020-01
  • 2020-02
  • 2020-03
  • 2020-04
  • 2020-05
  • 2020-06
  • 2020-07
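For anyone scripting these downloads, a small Python sketch follows. The URL and query parameters are the ones given above; the function names and the local filename pattern are hypothetical, and the download function is defined but not called here since it needs a valid API token.

```python
from urllib.parse import urlencode
from urllib.request import urlretrieve

BASE_URL = "http://api.511.org/transit/datafeeds"


def historic_feed_url(api_key, month):
    """Build the download URL for one Historic Feed month (YYYY-MM)."""
    params = {"api_key": api_key, "operator_id": "RG", "historic": month}
    return f"{BASE_URL}?{urlencode(params)}"


def download_months(api_key, months, dest_dir="."):
    """Download several months; the local filename pattern is an assumption."""
    paths = []
    for month in months:
        path = f"{dest_dir}/mtc-historic-{month}.zip"
        urlretrieve(historic_feed_url(api_key, month), path)
        paths.append(path)
    return paths


print(historic_feed_url("YOUR_KEY", "2020-05"))
```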

Station Pathways and Levels

The Regional Feed exists both to merge together individual agency GTFS feeds and to serve as a home for new GTFS data that describes cross-agency conditions. The daily Regional Feed now comes with layouts for 35 key transit stations across the Bay Area, each of which serves multiple agencies. Using the newly added pathways.txt and levels.txt files in GTFS, Interline and MTC are now able to provide more detailed information about how to transfer from one agency to another. This includes information on elevators, escalators, and routes that may not be accessible to those in wheelchairs. Pathways and levels information will equip trip planning apps to provide more helpful wayfinding information to their users, particularly those who are new to the Bay Area or who have limited vision.

Animated views of pathways and levels through the University Ave/Downtown Palo Alto station served by Caltrain regional rail and buses
Pathways and levels throughout the University Ave/Downtown Palo Alto station served by Caltrain regional rail on two platforms and VTA and SamTrans buses at bus bays.

35 Regional Transit Hubs

Here is a list of the 35 stations currently in the Regional Feed:

| Transit station/hub | County |
| --- | --- |
| Embarcadero BART | San Francisco |
| Montgomery BART | San Francisco |
| Caltrain Station 4th & King | San Francisco |
| Salesforce Transit Center | San Francisco |
| 12th St Oakland City Center BART | Alameda |
| El Cerrito Del Norte BART | Contra Costa |
| 19TH St Oakland BART | Alameda |
| Powell ST BART | San Francisco |
| Civic Center BART | San Francisco |
| Walnut Creek BART | Contra Costa |
| Richmond BART/Amtrak | Contra Costa |
| San Jose Diridon Station | Santa Clara |
| Palo Alto Station | Santa Clara |
| San Rafael Transit Center | Marin |
| Millbrae BART | San Mateo |
| Pleasant Hill BART | Contra Costa |
| San Francisco Ferry Terminal | San Francisco |
| Daly City BART | San Francisco |
| Santa Rosa Transit Mall | Sonoma |
| Union City BART | Alameda |
| MacArthur BART | Alameda |
| Dublin/ Pleasanton BART | Alameda |
| Warm Springs/ South Fremont BART | Alameda |
| Santa Clara Caltrain | Santa Clara |
| Oakland Coliseum BART | Alameda |
| SFO | San Francisco |
| Fairfield Transportation Center | Solano |
| OAK | Alameda |
| Petaluma Transit Mall | Marin |
| Vallejo Ferry Terminal | Solano |
| Mountain View Station | Santa Clara |
| Great America | Santa Clara |
| Napa Intermodal | Napa |
| SJC | Santa Clara |
| Great Mall/Milpitas BART | Santa Clara |

Note that just as the ongoing pandemic has made our work on the Historic Feed all the more relevant, our work on Station Pathways and Levels has had to adapt to current circumstances. Our staff have been working from home using aerial imagery, architectural drawings, station maps, and other materials. Interline's Station Editor tool also works on tablets, and we look forward to again going out into the field to all of these stations to correct and improve details.

We welcome questions and corrections sent to the 511SFBayDeveloperResources mailing list.

Using Station Pathways and Levels

Download the daily Regional Feed like so:

  1. Sign up for a 511 Open Data API token
  2. Download from http://api.511.org/transit/datafeeds?api_key=[your_key]&operator_id=RG
  3. Look for the pathways.txt and levels.txt files inside the zip archive.
  4. For more information, see the static GTFS documentation and the GTFS-Pathways extension proposal document.
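As a starting point for exploring these files, the sketch below reads pathways.txt and levels.txt from a downloaded zip archive and counts pathways by type. The pathway_mode codes come from the GTFS specification; the function names are hypothetical, and the example assumes both files use the standard column names.

```python
import csv
import io
import zipfile
from collections import Counter

# pathway_mode values as defined in the GTFS specification.
PATHWAY_MODES = {
    "1": "walkway", "2": "stairs", "3": "moving sidewalk",
    "4": "escalator", "5": "elevator", "6": "fare gate", "7": "exit gate",
}


def read_table(archive, name):
    """Read one CSV file from an open GTFS zip archive into a list of dicts."""
    with archive.open(name) as raw:
        text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
        return list(csv.DictReader(text))


def summarize_pathways(zip_path):
    """Return (Counter of pathway types, number of levels) for a feed."""
    with zipfile.ZipFile(zip_path) as archive:
        pathways = read_table(archive, "pathways.txt")
        levels = read_table(archive, "levels.txt")
    modes = Counter(PATHWAY_MODES.get(p["pathway_mode"], "unknown") for p in pathways)
    return modes, len(levels)
```

Calling summarize_pathways with the path to the downloaded archive returns the counts; an app concerned with step-free routing could, for example, filter out rows whose mode is stairs or escalator.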

Fares and Transfer Discounts

To date, the Regional GTFS Feed has focused on how to plan a journey by transit. Now we're curating and adding additional data to help riders understand how to pay for their journeys. The Bay Area's Regional GTFS Feed is now the first in the world to include data using the GTFS Fares-v2 specification. This newly proposed specification allows us to model the many different fare products and discounts that are available to the Bay Area's transit riders. For example, we can now account for how some agencies provide riders a discount when they pay by Clipper Card rather than by cash. (Clipper Card is the contactless payment card and system operated by MTC and available on buses, trains, and ferries throughout the Bay Area.) We can also capture how riders can receive a discount when transferring from certain agencies to other agencies. Read on for a detailed example of how Fares-v2 works in practice, or skip ahead to use the Fares-v2 beta data.

An Example of Fares-v2 in Practice

map of a journey by BART and AC Transit originating in San Francisco, transferring in Oakland, and ending in Alameda
A journey by BART and AC Transit, from San Francisco to Oakland to Alameda, which we will use in the following example of how to calculate the total cost. (Credit: BART Trip Planner)

Fares-v2 expands the traditional route- and zone-based GTFS fares model with several additional files, each focused on modeling a different part of a complex fare scheme. Imagine a rider making a transfer from BART to AC Transit; both BART and AC Transit have different prices for adult fares, eligible discount fares, and different prices when paying with cash or when using a Clipper Card; additionally, the discount applied when transferring from BART to AC Transit has different values and rules when using Clipper. Calculating the individual fares for each leg of this trip requires a description of any zones that apply to routes and stops (fare_networks.txt and fare_areas.txt), the costs and requirements of the base fare for each leg (fare_leg_rules.txt), any discount categories that might include the rider (fare_profiles.txt), and the type of payment used by the rider (fare_containers.txt). All of these factors are considered and then used to search for any transfer discounts that may apply for the trip (fare_transfer_rules.txt).

Let's work through the hypothetical trip above, first taking BART from Embarcadero Station to 12th St. in Oakland, and then riding an AC Local bus, 51A, to Alameda.

The following is an excerpt from fare_leg_rules.txt with some columns removed for brevity. This file contains sets of rules that are matched against a single leg of a trip, without regard to transfers. The columns from_area_id, to_area_id, fare_container_id, and fare_category_id reference values defined in the other files mentioned above.

| order | leg_group_id | from_area_id | to_area_id | amount | fare_container_id | fare_category_id | notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 90 | BA: | EMBR | 12TH | 1.35 | clipper | BA:Senior/Disabled Clipper | Embarcadero to 12th with Clipper and discount |
| 100 | BA: | EMBR | 12TH | 3.70 | clipper | | Embarcadero to 12th with Clipper |
| 110 | BA: | EMBR | 12TH | 4.20 | | | Embarcadero to 12th with cash surcharge |

The BART leg would match all three of the rules above based on from_area_id and to_area_id, with fares of $1.35, $3.70, and $4.20. However, the first two rules are only available when using a Clipper Card (fare_container_id), and the first of these is only available to riders eligible for a discounted fare (fare_category_id). The order field is used to select the correct fare when multiple rules match the trip leg: any applicable discounted Clipper fare would match first, followed by the adult Clipper fare, then finally the fare including the additional charge for cash riders. In the event two (or more) rules match with the same order value, all are considered valid options to present to the rider.
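The selection logic can be sketched as follows, using the three BART rules from the excerpt. This is an illustration rather than the Interline rules engine: it assumes that an empty fare_container_id or fare_category_id places no restriction on the rider, and the function name matching_fares is hypothetical.

```python
from decimal import Decimal

LEG_RULES = [
    {"order": 90, "from_area_id": "EMBR", "to_area_id": "12TH", "amount": Decimal("1.35"),
     "fare_container_id": "clipper", "fare_category_id": "BA:Senior/Disabled Clipper"},
    {"order": 100, "from_area_id": "EMBR", "to_area_id": "12TH", "amount": Decimal("3.70"),
     "fare_container_id": "clipper", "fare_category_id": ""},
    {"order": 110, "from_area_id": "EMBR", "to_area_id": "12TH", "amount": Decimal("4.20"),
     "fare_container_id": "", "fare_category_id": ""},
]


def matching_fares(rules, from_area, to_area, container="", category=""):
    """Return the fare amounts of the matching rules with the lowest order."""
    def applies(rule):
        return (
            rule["from_area_id"] == from_area
            and rule["to_area_id"] == to_area
            # Assumption: an empty rule field means "no restriction".
            and rule["fare_container_id"] in ("", container)
            and rule["fare_category_id"] in ("", category)
        )

    candidates = [rule for rule in rules if applies(rule)]
    if not candidates:
        return []
    best = min(rule["order"] for rule in candidates)
    # Rules that tie on order are all valid options for the rider.
    return [rule["amount"] for rule in candidates if rule["order"] == best]


senior = matching_fares(LEG_RULES, "EMBR", "12TH", "clipper", "BA:Senior/Disabled Clipper")
adult = matching_fares(LEG_RULES, "EMBR", "12TH", "clipper")
cash = matching_fares(LEG_RULES, "EMBR", "12TH")
print(senior, adult, cash)
```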

The AC Transit fare rules are similar, with the addition of a few extra columns. The 51A is an East Bay local-only route (so to_area_id is assumed) and matches each rule; the rules for discounts and cash are applied as above. The fare_product_id field describes additional rules that are available to users of certain prepaid products, such as day passes and monthly passes. In this case, a rider with an Adult Local 31 Day Pass ($84, described in fare_products.txt) would enjoy a free ride. More complex AC Transit fares use additional columns not pictured here (fare_network_id) and handle cases such as riding a Transbay bus and the different prices applied when riding within the East Bay vs. taking the trip all the way to the Salesforce Transit Center in downtown San Francisco.

Any applicable transfers are calculated by reading the rules in fare_transfer_rules.txt and matching each leg of the trip against each subsequent leg of the trip. This can create quite complicated models, but let's start by looking at our BART and AC Transit legs. The clever bit is that many fare_leg_rules.txt rules can share the same leg_group_id, which simplifies the number and types of transfer rules that must be defined. For example, all of the BART rules above use BA: and all of the AC Transit rules use AC:local. The table below is an excerpt of fare_transfer_rules.txt.

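| order | from_leg_group_id | to_leg_group_id | fare_container_id | amount | duration_limit | duration_limit_type | fare_transfer_type | spanning_limit | notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 100 | BA: | AC:local | clipper | -0.5 | 90 | 2 | 1 | 2 | 1 credit of $-0.50 to AC w/in 90 minutes |
| 110 | BA: | AC:local | | -0.25 | | | 1 | 3 | Cash: 2 fare credits of $0.25 each |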

Two different discounts apply when taking an AC Transit leg after a BART leg. When using Clipper, a $0.50 (amount=-0.5, fare_transfer_type=1) discount is applied to the first AC Transit leg (spanning_limit=2, which means it can only match a subjourney of two legs). Alternatively, a paper transfer slip given to cash users contains two tabs, each of which applies a $0.25 discount to the AC Transit cash fare (spanning_limit=3, or a subjourney of up to three legs). Additionally, the Clipper discount expires within 90 minutes after tagging off BART (duration_limit=90, duration_limit_type=2). Because the order value is different, these two transfer options are mutually exclusive to Clipper and cash riders respectively.
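Putting the two legs together, the arithmetic might be sketched like this. Several things here are assumptions: the AC Transit fares in the usage lines are placeholder values not taken from this post, the rule with the lowest order is chosen before its time limit is checked, spanning_limit is carried in the data but only a single AC Transit leg is modeled, and the function names are hypothetical.

```python
from decimal import Decimal

TRANSFER_RULES = [
    {"order": 100, "from": "BA:", "to": "AC:local", "fare_container_id": "clipper",
     "amount": Decimal("-0.50"), "duration_limit": 90, "spanning_limit": 2},
    {"order": 110, "from": "BA:", "to": "AC:local", "fare_container_id": "",
     "amount": Decimal("-0.25"), "duration_limit": None, "spanning_limit": 3},
]


def transfer_adjustment(rules, from_group, to_group, container, minutes_elapsed):
    """Return the (negative) adjustment for the second leg, or zero."""
    candidates = [
        rule for rule in rules
        if rule["from"] == from_group
        and rule["to"] == to_group
        and rule["fare_container_id"] in ("", container)
    ]
    if not candidates:
        return Decimal("0")
    # Assumption: pick the lowest order first, then enforce its time limit.
    rule = min(candidates, key=lambda r: r["order"])
    limit = rule["duration_limit"]
    if limit is not None and minutes_elapsed > limit:
        return Decimal("0")
    return rule["amount"]


def journey_total(bart_fare, ac_fare, container, minutes_elapsed):
    """Total for a BART leg followed by one AC Transit local leg."""
    adjustment = transfer_adjustment(
        TRANSFER_RULES, "BA:", "AC:local", container, minutes_elapsed
    )
    # A discount never pushes the second leg below zero.
    return bart_fare + max(ac_fare + adjustment, Decimal("0"))


# AC Transit fares below are hypothetical placeholders for illustration.
clipper_total = journey_total(Decimal("3.70"), Decimal("2.25"), "clipper", 30)
cash_total = journey_total(Decimal("4.20"), Decimal("2.50"), "", 30)
print(clipper_total, cash_total)
```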

The combination of matching fare rules and fare transfer rules allows describing even complicated transfers, such as discounts to SamTrans local bus riders who hold a Caltrain 2-zone or higher monthly pass, the several categories of upgrade charges when transferring from an AC Transit Local to an AC Transit Transbay bus, and the many varied products and transfers that apply full or partial fare to trips on the Dumbarton Express.

Agencies in the Beta Release

Note that this is a "beta" release. The GTFS Fares-v2 specification is still being finalized. It does not fully capture some functionality, like how AC Transit and VTA use "fare capping" so that riders never have to pay more in one day than the cost of a day pass. We also expect that as trip planners and other apps begin to consume the Regional Feed's fare data that we may need to revise some of the data or our schema.

To our knowledge, Interline is the only organization to have a "rules engine" that can calculate journey costs using GTFS Fares-v2 data. We expect that as other organizations adopt the Interline fares rules engine or build their own equivalents, we may need to revisit some of the assumptions built into the engine.

The beta release of data includes fares and transfer discounts within and between eight agencies:

  • BART
  • SFMTA
  • AC Transit
  • SamTrans
  • VTA
  • Caltrain
  • Golden Gate Transit (bus only; not ferries)
  • Dumbarton Express

This list includes the seven largest agencies and a key connector across San Francisco Bay (Dumbarton Express). It should provide a broad enough sample of fares and transfer discounts to power a wide range of potential applications. This sample is also deep enough to inform analyses of fares, the cost of riding transit, and ideally even ways to reform fares and transfer discounts to produce more equitable outcomes for Bay Area transit riders.

Using Fares and Transfer Discounts

Download the daily Regional Feed like so:

  1. Sign up for a 511 Open Data API token
  2. Download from http://api.511.org/transit/datafeeds?api_key=[your_key]&operator_id=RG
  3. For more information about the fares and transfer discount files in the GTFS feed, see the GTFS-Fares v2 proposal document.

Interline and MTC expect to improve both the Fares-v2 beta data and schemas as we learn more. We welcome questions and corrections sent to the 511SFBayDeveloperResources mailing list.

Public Transit in 2020

A final note that behind every piece of transit data and every multi-modal trip plan are real people who drive vehicles, serve riders, and maintain equipment and facilities. Thank you to all the front-line workers keeping the Bay Area's public transit systems and riders moving safely!

Acknowledgements

Credit and many thanks to project team members including Ian Rees and Ruth Miller (Interline), Nisar Kapeel and Kapeel Daryani (MTC), and our partners at Bay Area transit agencies.

Originally posted to the Interline blog on August 10, 2020.