Cardano blockchain data on BigQuery

Current Project Status: unfunded (over budget)

Total amount Received: $0
Total amount Requested: $44,661
Total Percentage Received: 0.00%
Solution

Load blockchain data into BigQuery to eliminate users’ database set-up process, lower costs, and enable more complex cross-chain queries.

Problem

Every user hosting an independent cardano-db-sync instance repeats the same set-up work. A centralized database will eliminate this redundancy and lower the barrier to entry.

Addresses Challenge: Feasibility, Auditability

Team

1 member

Detailed Plan

Why a centralized database?

As the blockchain matures into later epochs, it will become increasingly costly for the community to maintain separate databases. Hence, we aim to reduce user costs by utilizing community funds to maintain a centralized database.

How is this different?

We’ll increase accessibility by reducing users’ capital expenses, the knowledge required to set up cardano-db-sync, and the time required to sync with the blockchain. Unlike a self-hosted PostgreSQL instance, BigQuery can run complex analytical queries and aggregations at scale, including cross-chain analysis and ranking the most active/popular tokens, contracts, NFT artists, or applications on the blockchain. This gives the community data-driven insights into different areas of adoption.
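
For illustration only, a “most-minted tokens” aggregation might look like the sketch below, assuming a BigQuery dataset that mirrors cardano-db-sync’s multi_asset and ma_tx_mint tables (the project, dataset, and table names are hypothetical):

    # Hypothetical sketch: rank native tokens by number of mint events,
    # assuming a BigQuery dataset mirroring the cardano-db-sync schema.
    from google.cloud import bigquery

    client = bigquery.Client()
    QUERY = """
    SELECT ma.policy, ma.name, COUNT(*) AS mint_events
    FROM `my-project.cardano.ma_tx_mint` AS mint   -- hypothetical table
    JOIN `my-project.cardano.multi_asset` AS ma    -- hypothetical table
      ON ma.id = mint.ident
    GROUP BY ma.policy, ma.name
    ORDER BY mint_events DESC
    LIMIT 20
    """
    for row in client.query(QUERY).result():
        print(row.policy, row.name, row.mint_events)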

We are currently running a proof of concept.

We run cardano-db-sync locally to fetch data from the chain. The data is then extracted and inserted into a Google Cloud SQL database, which is accessed with PostgreSQL. The database is updated with the latest rows every 24 hours, exported to CSV, and saved in Google Cloud Storage.
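
A minimal sketch of that daily export step, assuming psycopg2 against cardano-db-sync’s default cexplorer database and a hypothetical staging bucket:

    # Hypothetical sketch: pull the newest rows from the cardano-db-sync
    # PostgreSQL database, write them to CSV, and upload the file to Google
    # Cloud Storage. The connection string, table choice, and bucket name
    # are illustrative assumptions.
    import csv
    import psycopg2
    from google.cloud import storage

    conn = psycopg2.connect("dbname=cexplorer user=postgres")
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT id, encode(hash, 'hex') AS hash, block_no, time "
            "FROM block WHERE time > now() - interval '24 hours'"
        )
        with open("block.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow([col.name for col in cur.description])  # header row
            writer.writerows(cur)

    # Stage the CSV in GCS so it can later be loaded into BigQuery.
    bucket = storage.Client().bucket("cardano-bq-staging")  # hypothetical bucket
    bucket.blob("exports/block.csv").upload_from_filename("block.csv")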

How will the funds be used?

In addition to covering the cost of maintaining the database, we will hire developers to write automation code that syncs the latest rows of the blockchain to Google Cloud Storage. To streamline and schedule this workflow, we will use the Google Cloud Composer service, built on Apache Airflow, as it is more stable than our current process. Once the exported CSV files are loaded into BigQuery, users can query the data in the BigQuery console or via an API.
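
A minimal Airflow DAG for this schedule might look like the sketch below; the task split and names are illustrative assumptions, not the final automation code:

    # Hypothetical sketch of the Cloud Composer (Apache Airflow) DAG that
    # would schedule the daily sync: export the latest rows to Cloud Storage,
    # then load them into BigQuery.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def export_latest_rows():
        ...  # query cardano-db-sync's PostgreSQL and stage CSVs in GCS

    def load_into_bigquery():
        ...  # run BigQuery load jobs over the staged CSVs

    with DAG(
        dag_id="cardano_bq_sync",       # hypothetical DAG name
        start_date=datetime(2021, 9, 1),
        schedule_interval="@daily",     # matches the 24-hour update cycle
        catchup=False,
    ) as dag:
        export = PythonOperator(task_id="export_latest_rows",
                                python_callable=export_latest_rows)
        load = PythonOperator(task_id="load_into_bigquery",
                              python_callable=load_into_bigquery)
        export >> load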

Our code is open source.

Users may replicate our solution, optimize it, or vary the interval at which the latest data is uploaded to BigQuery. Our current configuration extracts data from the Cardano blockchain daily, including the results of smart-contract transactions such as token transfers. The data is then de-normalized and stored in date-partitioned BigQuery tables for easy and cost-effective exploration.
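
Loading a staged CSV into a date-partitioned BigQuery table could look roughly like this sketch, assuming the google-cloud-bigquery client and hypothetical bucket and table names:

    # Hypothetical sketch: load a staged CSV from Cloud Storage into a
    # date-partitioned BigQuery table. The bucket, table, and partition
    # column ("time") are illustrative assumptions.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,    # skip the CSV header row
        autodetect=True,        # infer the schema from the file
        time_partitioning=bigquery.TimePartitioning(field="time"),  # daily partitions
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    job = client.load_table_from_uri(
        "gs://cardano-bq-staging/exports/block.csv",  # hypothetical bucket
        "my-project.cardano.block",                   # hypothetical table
        job_config=job_config,
    )
    job.result()  # wait for the load job to finish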

This solution has proven to be successful for other blockchains.

Bitcoin and Ethereum already have their blockchains hosted on BigQuery, and users have since been able to easily perform bulk operations on them. BigQuery also supports user-defined functions (UDFs), so it is easy to run a script over the whole dataset in seconds. Here’s an example of Tomasz Kolinko analyzing 1.2M Ethereum contracts in 20 seconds with BigQuery and a custom script: https://tinyurl.com/ethcontractsanalysis
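
As a rough illustration of a UDF (not Kolinko’s actual script), the sketch below averages contract bytecode sizes across the public Ethereum dataset with a temporary JavaScript function:

    # Hypothetical sketch: apply a JavaScript UDF to every contract in
    # BigQuery's public Ethereum dataset. The metric (average bytecode size)
    # is only an illustration of scripting over a whole dataset.
    from google.cloud import bigquery

    QUERY = """
    CREATE TEMP FUNCTION bytecode_size(bytecode STRING)
    RETURNS FLOAT64
    LANGUAGE js AS "return bytecode ? (bytecode.length - 2) / 2 : 0;";

    SELECT AVG(bytecode_size(bytecode)) AS avg_bytes
    FROM `bigquery-public-data.crypto_ethereum.contracts`
    """

    for row in bigquery.Client().query(QUERY).result():
        print(f"average contract size: {row.avg_bytes:.0f} bytes")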

The queries are low cost, and zero setup is required by users.

Furthermore, users no longer need to host an independent cardano-db-sync instance to perform data analytics. They may sign up for BigQuery and access the database within minutes.

Our goal is to have a self-sustaining database that grows with the Cardano community.

The funds will maintain the database for two years. After that, we will fund continued operation through running a Cardano stake pool.

The project will take 3 months.

  • Month 1: Host an independent version of the IOHK code with custom extraction code to stream data into BigQuery.
  • Month 2: Upload all datasets into BigQuery. Synchronization will take around 3 weeks.
  • Month 3: BigQuery datasets sync automatically with the blockchain.

Key performance indicators:

Run the interesting queries from the cardano-db-sync documentation in BigQuery: https://tinyurl.com/interestingqueriescardano
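
For example, the sync-progress query from that document could be adapted to BigQuery roughly as follows (project, dataset, and table names are hypothetical):

    # Hypothetical sketch: chain sync progress as a percentage, adapted from
    # the cardano-db-sync "interesting queries" document to BigQuery SQL.
    from google.cloud import bigquery

    QUERY = """
    SELECT
      100 * TIMESTAMP_DIFF(MAX(time), MIN(time), SECOND)
          / TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), MIN(time), SECOND) AS sync_percent
    FROM `my-project.cardano.block`   -- hypothetical table
    """

    row = next(iter(bigquery.Client().query(QUERY).result()))
    print(f"synced: {row.sync_percent:.2f}%")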

Our current pool: https://armadacardano.io/

Details on projected costs and schemas to load: https://tinyurl.com/bigquerycostprojections
