funded
Mumak by TxPipe: A PostgreSQL custom extension to process raw Cardano CBOR
Current Project Status
In Progress
Amount
Received
₳102,064
Amount
Requested
₳170,714
Percentage
Received
59.79%
Solution

A PostgreSQL extension that allows the database to understand CBOR Cardano structures so that queries can be performed directly without having to map all the details into a relational model.

Problem

Mumak PostgreSQL extension for Cardano CBOR data

Cardano data is usually serialized as CBOR. Storing Cardano in databases requires a mapping between CBOR and the relational model. This mapping is complex, expensive and usually suboptimal.

Impact Alignment
Feasibility
Value for money

txpipe

2 members

Mumak by TxPipe: A PostgreSQL custom extension to process raw Cardano CBOR

Please describe your proposed solution.

Executive Summary

  • We'll build a custom PostgreSQL extension that allows you to interact with cardano CBOR data directly.
  • Developers will be able to store raw CBOR as table columns and use our extensions to evaluate predicates and projections directly.
  • This simplifies many data indexing scenarios and provides improved performance over alternative queries requiring complex table joins.
  • We'll also provide a generic dataset, a simple PostgreSQL schema + data that represents much of the data found on-chain leveraging the Mumak extension.
  • We'll integrate Mumak with Mithril and Oura, providing a way for fast processing of the whole history of the chain.
  • We'll integrate Mumak into TxPipe's Scrolls indexer providing a way to create richer dataset in a simpler way.
  • We'll integrate the Mumak service on Demeter platform so that it can be accessed directly without the need of provisioning your own instance.

Introduction

The de facto wire format used to exchange Cardano data is CBOR. By Cardano data we refer to Blocks, Transactions, UTxOs, Certificates and all of its inner structures.

When indexing this data onto a relational database, such as PostgresQL, we usually deserialize the CBOR structures and map the values to different columns & tables in our schema.

Some times having a normalized data model is exactly what your use-case needs, but there are multiple scenarios where this mapping just adds complexity and performance penalties.

Mumak is a PostgreSQL extension that provides several custom functions to interact with Cardano CBOR data directly on the database.

For example, this means that you could store a whole Cardano Tx as CBOR in a bytes PostgreSQL column and then use SQL to ask things such as:

  • Does this transaction involve address X?
  • What's the total ADA output of this transaction?
  • Is this transaction minting a token for policy X?

The evaluation of these values will happen in-process, as part of the Db query execution.

Usage Example

Let say that we have a table where we store transactions in CBOR format. By doing this, we could connect to the PostgreSQL instance and run queries like these:

> SELECT

> tx_hash(tx_cbor, era),

> tx_lovelace_output(tx_cbor, era)

> FROM txs

> WHERE

> tx_has_address(tx_cbor, era, 'addr1xxx');

> SELECT

> tx_asset_mint(tx_cbor, era, '0x000000', 'myasset')

> FROM txs

> WHERE

> tx_has_asset(tx_cbor, era, '0x000000', 'myasset')

> SELECT

> tx_metadata_label(tx_cbor, era, 721)

> FROM txs

> WHERE

> tx_has_metadata_label(tx_cbor, era, 721)

Let's say that we now have a table where we store Cardano address as raw bytes. We could run the following queries:

> SELECT

> delegation_part(address)

> WHERE

> is_script_address(address)

> SELECT

> payment_part(address)

> WHERE

> delegation_part(address) = "0x000"

>

More information

If you're interested in learning more about the project, even outside the scope of the Catalyst proposal, please bookmark our git repository. We'll make sure to maintain the README updated with relevant information and news.

Please define the positive impact your project will have on the wider Cardano community.

Simplify development of Cardano dApps

By providing a versatile tool that integrates directly with PostgreSQL, we simplify the process of interacting with Cardano on-chain data.

Improve performance of Cardano dApps

dApps that need to query & process vast volumes of data not well-suited for a normalized data model will find in this project a more efficient way to fulfill their needs.

Reduce maintenance costs of Cardano dApps

dApps that are wasting storage and compute resources maintaining a suboptimal data schema will be able to explore other, more efficient, data models using CBOR data directly.

What is your capability to deliver your project with high levels of trust and accountability? How do you intend to validate if your approach is feasible?

TxPipe is an active member of the Cardano ecosystem

TxPipe has been developing open-source tools for the Cardano ecosystem for over 2 years and we're not going anywhere. Evidence of our commitment can be found by evaluating the continuous activity of our public code repositories.

Experience developing in the Cardano ecosystem

TxPipe has helped developed several dApps for the Cardano ecosystem. This experience allows us to evaluate the feasibility of the project and its potential benefit from a developer's perspective.

Succesful Catalyst proposals

We have successfully completed 2 prior Catalyst proposals. This may serve as evidence that our team has the required capabilities to fulfill these type of projects.

Development process will be public and open-source

Both the output and the development process will be public and open-source. This approach provides an easy way for the Catalyst team and the Cardano community to evaluate the progress at each step of the process.

What are the key milestones you need to achieve in order to complete your project successfully?

PostgreSQL extension

This milestone is about developing the business logic of the custom PostgreSQL functions that are installed into the PostgreSQL instance, allowing developers to interact with raw Cardano CBOR.

Outputs:

  • Source code for the PostgreSQL extension
  • Projection and filter functions as specified in repository's README
  • Binary release for the extension ready to be installed on a PostgreSQL instance
  • Instructions on how to install the extension

Acceptance Criteria:

  • A technical reviewer / developer is able to install the extension by following the provided instructions.
  • A technical reviewer / developer is able to execute queries against a suitable PostgreSQL instance using the custom projections & filters provided by the extension.

>Block dataset

This milestone is about providing the means to create datasets that hold Cardano block data leveraging the query capabilities of the Mumak PostgreSQL extension.

Outputs:

  • A relational schema to store Cardano block structures leveraging Mumak capabilities, presented as an SQL create script.
  • A dataset up to the latest available epoch containing mainnet blocks leveraging Mumak capabilities, presented as an SQL snapshot.
  • An ingestion tool using Oura and Mithril to create and update block datasets.

Acceptance Criteria:

  • A technical reviewer / developer is able to inspect the db schema and understand the presented structures.
  • A technical reviewer / developer is able to query the presented dataset snapshot using PostgreSQL and Mumak to explore on-chain data.

>Transaction dataset

This milestone is about providing the means to create datasets that hold Cardano transaction data leveraging the query capabilities of the Mumak PostgreSQL extension.

Outputs:

  • A relational schema to store Cardano transaction structures leveraging Mumak capabilities, presented as an SQL create script.
  • A dataset up to the latest available epoch containing mainnet transactions leveraging Mumak capabilities, presented as an SQL snapshot.
  • An ingestion tool using Oura and Mithril to create and update transaction datasets.

Acceptance Criteria:

  • A technical reviewer / developer is able to inspect the db schema and understand the presented structures.
  • A technical reviewer / developer is able to query the presented dataset using PostgreSQL and Mumak to explore on-chain data.

>Demeter Integration

This milestone is about the integrating all Mumak datasets as part of the Demeter platform to provide a zero-configuration experience for developers that want to query the data without provisioning their own infrastructure.

Outputs:

  • A Mumak instance already provisioned and available through Demeter platform
  • Instructions and documentation on how to query the instance using Demeter

Acceptance Criteria:

  • A technical reviewer / developer is able to sign-up to Demeter using the free-tier and query data from the Mumak dataset using the available instructions.

>Completion Report

This milestone is about wrapping up the project and providing relevant insights for the Catalyst team and larger Cardano community to evaluate the result of the proposal.

Outputs:

  • the complete source-code for the Mumak PostgreSQL extension
  • A binary release version of the extension ready to download
  • Complete instructions on how to install the Mumak extension
  • SQL scripts with the schema and latest snapshot for the Mumak dataset
  • a report detailing the usage metrics through Demeter since the release of the feature
  • a video that presents the tool and describes the development process

Acceptance Criteria:

  • A technical reviewer / developer is able to run it's own version of the Mumak extension and dataset by using the publicly available artifacts.
  • a reviewer is able to audit the development process ensuring that all required tasks have been completed.

Who is in the project team and what are their roles?

Assigned Team:

  • Santiago Carmuega, TxPipe - Tech Lead / Backend Developer - Github
  • Rodrigo Suaya, TxPipe - Project manager - Linkedin
  • Paulo Bressan, TxPipe - Rust Developer - Github
  • Alexsander Falcucci, TxPipe - Rust Developer - Github
  • Alejandro Avagnina, TxPipe - SRE - Github
  • Florencia Luna, TxPipe: Technical writer - Linkedin

Additional Team Members

  • We're planning on recruiting a Database Administrator with experience in PostgreSQL, performance tuning and troubleshooting.

Please provide a cost breakdown of the proposed work and resources.

FTE = Full-time equivalent

Total budget = ₳ 170,714

Software Development

  • Rust developer: 1 FTE x 2 months = ₳ 51,429
  • Backend developer: 1 FTE x 1 months = ₳ 25,714
  • Technical writer: 1/2 FTE x 2 months = ₳ 8,571

Infrastructure:

  • Database Administrator: 1 FTE x 2 months = ₳ 51,429
  • Site Reliability Engineer: 1 FTE x 1 months = ₳ 22,857

Project Management

  • Project Manager: 1/4 FTE x 5 months = ₳ 10,714

How does the cost of the project represent value for money for the Cardano ecosystem?

The bulk of the budget falls under the software development category. TxPipe has extensive experience in the field, allowing it to provide good value for money. The hourly rates are defined using fair market prices. The estimation for the level of effort takes into account all of the optimizations that our team is capable of providing after years of experience developing software solutions in the Cardano ecosystem.

close

Playlist

  • EP2: epoch_length

    Authored by: Darlington Kofa

    3m 24s
    Darlington Kofa
  • EP1: 'd' parameter

    Authored by: Darlington Kofa

    4m 3s
    Darlington Kofa
  • EP3: key_deposit

    Authored by: Darlington Kofa

    3m 48s
    Darlington Kofa
  • EP4: epoch_no

    Authored by: Darlington Kofa

    2m 16s
    Darlington Kofa
  • EP5: max_block_size

    Authored by: Darlington Kofa

    3m 14s
    Darlington Kofa
  • EP6: pool_deposit

    Authored by: Darlington Kofa

    3m 19s
    Darlington Kofa
  • EP7: max_tx_size

    Authored by: Darlington Kofa

    4m 59s
    Darlington Kofa
0:00
/
~0:00