What we do
We manage a trillion-edge graph that connects anonymized identifiers for consumers and their devices.
Edge Ingestion and Partitioning Framework
We could never process all trillion edges at once, and luckily we don’t have to. Instead, we process subgraphs that contain specific types of edges. Our edge ingestion and partitioning framework manages separate Hadoop datastores for different types of edges and automates the ingestion of new edge data, leveraging LiveRamp’s Seek MSJ framework to efficiently incorporate new data into existing edge stores.
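As a rough illustration of the per-type partitioning idea (the class, record, and method names here are hypothetical, not LiveRamp's actual framework), incoming edges can be grouped by type so that each subgraph's store only ingests the edge types it manages:

```java
import java.util.*;

// Hypothetical sketch of per-type edge partitioning; names are illustrative only.
public class EdgePartitioner {
    // An edge between two anonymized identifiers, tagged with an edge type.
    record Edge(String source, String target, String type) {}

    // Group a batch of new edges by type, so each per-type datastore
    // receives only the edges it is responsible for.
    static Map<String, List<Edge>> partitionByType(List<Edge> batch) {
        Map<String, List<Edge>> byType = new HashMap<>();
        for (Edge e : batch) {
            byType.computeIfAbsent(e.type(), t -> new ArrayList<>()).add(e);
        }
        return byType;
    }

    public static void main(String[] args) {
        List<Edge> batch = List.of(
            new Edge("idA", "device1", "consumer-device"),
            new Edge("idA", "idB", "consumer-consumer"),
            new Edge("idC", "device2", "consumer-device"));
        // Two "consumer-device" edges land in the same store's batch.
        System.out.println(partitionByType(batch).get("consumer-device").size());
    }
}
```

In a real pipeline the per-type buckets would map to separate Hadoop datastores rather than in-memory lists, but the routing decision is the same.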
Path Computation as a Service
We provide a service to other LiveRamp engineering teams for finding specific types of paths within our massive graph. It handles 20,000 requests a day, which is possible thanks to caching and to intelligently batching similar requests together.
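To make the caching-plus-batching idea concrete, here is a minimal sketch (not LiveRamp's actual implementation; all names are invented): repeated requests hit a result cache, and uncached requests that share a source are grouped so one BFS traversal answers several of them at once.

```java
import java.util.*;

// Hypothetical sketch of a path service with caching and request batching.
public class PathService {
    private final Map<String, List<String>> adjacency;
    private final Map<String, List<String>> cache = new HashMap<>(); // "src->dst" -> path

    PathService(Map<String, List<String>> adjacency) { this.adjacency = adjacency; }

    // Answer a batch of {source, destination} requests. Cached pairs are
    // served directly; the rest are grouped by source so each source
    // needs only one traversal.
    Map<String, List<String>> findPaths(List<String[]> requests) {
        Map<String, List<String>> results = new HashMap<>();
        Map<String, Set<String>> bySource = new HashMap<>();
        for (String[] req : requests) {
            String key = req[0] + "->" + req[1];
            if (cache.containsKey(key)) results.put(key, cache.get(key));
            else bySource.computeIfAbsent(req[0], s -> new HashSet<>()).add(req[1]);
        }
        for (var entry : bySource.entrySet()) {
            for (var found : bfs(entry.getKey(), entry.getValue()).entrySet()) {
                String key = entry.getKey() + "->" + found.getKey();
                cache.put(key, found.getValue());
                results.put(key, found.getValue());
            }
        }
        return results;
    }

    // One BFS from `source`, reconstructing a shortest path to each target.
    private Map<String, List<String>> bfs(String source, Set<String> targets) {
        Map<String, String> parent = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>(List.of(source));
        parent.put(source, null);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            for (String next : adjacency.getOrDefault(node, List.of())) {
                if (!parent.containsKey(next)) { parent.put(next, node); queue.add(next); }
            }
        }
        Map<String, List<String>> paths = new HashMap<>();
        for (String t : targets) {
            if (!parent.containsKey(t)) continue; // unreachable target
            LinkedList<String> path = new LinkedList<>();
            for (String cur = t; cur != null; cur = parent.get(cur)) path.addFirst(cur);
            paths.put(t, path);
        }
        return paths;
    }
}
```

At production scale the traversal would run on the cluster (e.g. as a Giraph or Spark job) rather than in memory, but the same two ideas carry over: never recompute a path you already know, and amortize one traversal across many similar requests.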
We currently use a 79,800-core Hadoop cluster with 90 PB of disk space and 256 TB of RAM (shared across all of LiveRamp data engineering) to power our systems, and we’re in the process of moving everything to GCP. We develop in Java and use MapReduce, Giraph, and Spark. We’re always open to new technologies and languages if they help us better solve a problem — let us know if there’s something we should be looking into.
Identity Engineering Blog