Innovating on IBM’s Data Lineage

This project redesigned a 10-year-old feature to better capture the complexities and nuances of data models. It added much-needed ergonomics that help users understand more clearly how data is used across multiple databases and AI models.

Product
  • Watson Knowledge Catalog is a data catalog used by knowledge workers to find data for use in business intelligence reports and AI models.

Business Problems
  • Data lineage is a core tool for regulatory compliance because it maps how data is used across databases. IBM’s existing solution was no longer competitive.

Personas
  • Data Engineers & Analysts → They want to understand where data comes from, what it is, and what it affects. They use data lineage to generate compliance reports for government regulators and to debug applications.

Outcomes
  • Fully designed and handed off to development.

My Role
  • Project lead

Timeframe
  • 9 months

Several rounds of research, co-design, and iteration led the team to an understanding of this complex space.


Core User Needs

  • Clarity → Users can quickly become overwhelmed by the vast amount of information presented to them.

  • Autonomy → Users typically have no control over what information is shown to them, and data lineage is often bloated with information that is irrelevant to the task at hand.

Competitive Analysis

  • Most competitors require users to create their data lineage diagrams manually.

    • IBM would not take this approach, because a core requirement and value proposition was the ability to pull in all lineage information and generate the lineage diagram automatically.

  • Top Competitor: Collibra → Approaches clarity and autonomy by letting users reveal parts of the diagram step by step. (This may also be a way to reduce load times.)

Translating User Needs Into Design Hypotheses

  • Clarity → Including useful metadata, especially descriptive tags, will help users contextualize information as they see it.

  • Autonomy → Starting users at a high-level summary, with the option to drill in deep or go wide, will let users tailor the information they see.

Co-creation with external stakeholders across the world led us to a realistic solution.


Collaborators!

  • ING → Provided the analogy of seeing every step of production, as though you were at a manufacturing plant looking out onto the shop floor.

  • First Republic (Chase) → Emphasized the need to scope down the lineage to the task being done.

  • General Motors → Validated the need for descriptive metadata to contextualize data in the lineage diagram.

  • State Street → Participated in user testing and validated the desire to land on a high-level summary.

Going From Design Hypotheses to Concrete Direction

  • Autonomy → Start users on a summary view that lets them focus the lineage diagram on the specific data they need.

  • Clarity → Attach business terms (metadata tags) to the data so that users know what the data is for.

  • Clarity → Allow users to reveal segments of the lineage diagram as needed.

Designs were delivered!


Next Steps

  • In response to leadership decisions, development needed to architect new metadata capabilities, which has delayed the project for several years.