• LiveRamp Identity Engineering

    Processing a trillion-edge graph as quickly and efficiently as possible

  • What we do

    We manage a trillion-edge graph that connects anonymized identifiers for consumers and their devices.

    Graph paths

    Pregel Graph Computer

    This system finds relevant graph paths using the Pregel graph computation framework as implemented in Apache Giraph. Running Giraph at the scale of our graph poses real challenges, and we're constantly refining our Pregel algorithms.
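
    To give a flavor of the Pregel model Giraph implements, here is a minimal single-source shortest-paths computation. This is essentially the canonical Giraph example rather than our production path-finding code; the class name and source vertex ID are illustrative.

      import org.apache.giraph.edge.Edge;
      import org.apache.giraph.graph.BasicComputation;
      import org.apache.giraph.graph.Vertex;
      import org.apache.hadoop.io.DoubleWritable;
      import org.apache.hadoop.io.FloatWritable;
      import org.apache.hadoop.io.LongWritable;

      // Each superstep, a vertex adopts the shortest distance offered by its
      // neighbors and forwards improved distances along its out-edges.
      public class ShortestPathsComputation extends BasicComputation<
          LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {

        private static final long SOURCE_ID = 1L; // illustrative source vertex

        @Override
        public void compute(
            Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
            Iterable<DoubleWritable> messages) {
          // Superstep 0: every vertex starts at "infinity".
          if (getSuperstep() == 0) {
            vertex.setValue(new DoubleWritable(Double.MAX_VALUE));
          }
          double minDist = vertex.getId().get() == SOURCE_ID ? 0d : Double.MAX_VALUE;
          for (DoubleWritable message : messages) {
            minDist = Math.min(minDist, message.get());
          }
          // A shorter path was found: record it and notify neighbors.
          if (minDist < vertex.getValue().get()) {
            vertex.setValue(new DoubleWritable(minDist));
            for (Edge<LongWritable, FloatWritable> edge : vertex.getEdges()) {
              sendMessage(edge.getTargetVertexId(),
                  new DoubleWritable(minDist + edge.getValue().get()));
            }
          }
          // Sleep until a new message arrives; the job ends when all vertices halt.
          vertex.voteToHalt();
        }
      }

    The bulk-synchronous structure is what lets this scale: vertices only exchange messages between supersteps, so there is no shared state to coordinate across the cluster.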

    Edge Ingestion and Partitioning Framework

    We could never process all trillion edges at once, and luckily we don't have to. Instead, we process subgraphs that contain specific types of edges. Our edge ingestion and partitioning framework manages separate Hadoop datastores for different types of edges and automates the ingestion of new edge data. It leverages LiveRamp's Seek MSJ framework to efficiently incorporate new data into existing edge stores.
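
    The splitting step itself is straightforward MapReduce. As a toy sketch of the idea (not our actual framework; the record layout and class name are hypothetical), a mapper can route a mixed edge feed into per-type named outputs that are then merged into the corresponding stores:

      import java.io.IOException;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

      // Routes each edge record to a named output for its edge type, so
      // downstream jobs can read just the subgraph they care about.
      // Assumes tab-separated records (edgeType, sourceId, targetId) and that
      // the driver registered each type via MultipleOutputs.addNamedOutput.
      public class EdgeSplitMapper extends Mapper<LongWritable, Text, Text, Text> {

        private MultipleOutputs<Text, Text> outputs;

        @Override
        protected void setup(Context context) {
          outputs = new MultipleOutputs<>(context);
        }

        @Override
        protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
          String[] fields = line.toString().split("\t", 3);
          if (fields.length < 3) {
            return; // skip malformed records
          }
          // The named output (e.g. "cookieToDevice") becomes its own file set.
          outputs.write(fields[0], new Text(fields[1]), new Text(fields[2]));
        }

        @Override
        protected void cleanup(Context context)
            throws IOException, InterruptedException {
          outputs.close();
        }
      }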

    Path Computation as a Service

    We provide a service to other LiveRamp engineering teams for finding specific types of paths within our massive graph. It handles 20,000 requests a day, which is possible because it caches results and intelligently batches similar requests together.
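
    Two simple ideas carry most of that load: answer repeated queries from a cache, and drain uncached queries into a single batched graph job instead of launching one per request. A minimal sketch, with hypothetical names and a stand-in for the real batched graph computation:

      import java.util.ArrayList;
      import java.util.HashMap;
      import java.util.List;
      import java.util.Map;
      import java.util.concurrent.ConcurrentHashMap;

      public class PathQueryService {

        // Answers for previously computed queries.
        private final Map<String, List<String>> cache = new ConcurrentHashMap<>();
        // Uncached queries waiting for the next batch.
        private final List<String> pending = new ArrayList<>();

        // Returns a cached answer immediately, or enqueues the query for the
        // next batch and returns null to signal "result pending".
        public synchronized List<String> lookup(String query) {
          List<String> cached = cache.get(query);
          if (cached == null) {
            pending.add(query);
          }
          return cached;
        }

        // Called on a timer: resolve all pending queries with one graph job.
        public synchronized void runBatch() {
          if (pending.isEmpty()) {
            return;
          }
          cache.putAll(computePathsTogether(pending));
          pending.clear();
        }

        // Stand-in for the real batched path computation over the graph;
        // one job serves every query that arrived since the last batch.
        private Map<String, List<String>> computePathsTogether(List<String> queries) {
          Map<String, List<String>> results = new HashMap<>();
          for (String query : queries) {
            results.put(query, new ArrayList<>());
          }
          return results;
        }
      }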

    Technologies

    We currently power our systems with a 79,800-core Hadoop cluster with 90 PB of disk space and 256 TB of RAM, shared across all of LiveRamp data engineering, and we're in the process of moving everything to GCP. We develop in Java and use MapReduce, Giraph, and Spark. We're always open to new technologies and languages if they help us better solve a problem, so let us know if there's something we should be looking into.

  • Meet Our Team

    Our team is made up of six data engineers and three engineering leads.

  • Identity Engineering Blog

    LiveRamp receives thousands of large files each day from our customers and we need column type configuration to know how to interpret these files. For many files, we expect them to conform to an existing configuration. For others we need to auto-detect the type of data within the...

    We do a lot of work with massive graphs at LiveRamp. In this post I'll share the story of how we analyze one such massive graph and discuss the Hadoop-based technology we use to efficiently perform this analysis in real-time. Note, this is only one of several massive graphs that we work with and...

    Technology today has made it really easy to create your own website. Through open-source content management systems like WordPress, you are ready to publish your ingenious blogs or demonstrate your amazing products with just a few clicks. Meanwhile, thanks to Google Analytics, you can easily...
  • We're hiring!

    • Develop distributed data processing workflows for managing our massive graph.
    • Help us migrate our systems to GCP.
    • Embrace and introduce technology innovations to keep LiveRamp on the cutting edge of industry tools and best practices.
    • Push your teammates to become stronger engineers. They will do the same for you.