Cardano blockchain data on BigQuery

Current Project Status: unfunded (over budget)

Total amount Requested: $44,661
Total amount Received: $0
Total Percentage Received: 0.00%
Solution

Load blockchain data into BigQuery to eliminate the database setup process for users, reduce costs, and enable more complex cross-chain queries.

Problem

Each user hosting an independent cardano-db-sync instance duplicates the same work. A centralized database would eliminate this redundancy and lower the barrier to entry.

Addresses Challenge
Feasibility
Auditability

Team

1 member

Detailed Plan

Why a centralized database?

As the blockchain matures into later epochs, it will become increasingly costly for the community to maintain separate databases. Hence, we aim to reduce user costs by utilizing community funds to maintain a centralized database.

How is this different?

We’ll increase accessibility by reducing users’ capital expenses, the knowledge required to set up cardano-db-sync, and the time required to sync with the blockchain. Unlike PostgreSQL, BigQuery can perform analytics with complex queries and aggregations. This includes cross-chain analysis and determining the most active/popular tokens, contracts, NFT artists, or applications on the blockchain, giving the community data-driven insight into different areas of adoption.

We are currently running a proof of concept.

We run cardano-db-sync locally to fetch information from the chain. Data is then extracted and inserted into a Google Cloud SQL (PostgreSQL) database. Every 24 hours, the database is updated with the latest rows, exported to CSV, and saved in Google Cloud Storage.
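
The daily extract-and-export loop above can be sketched roughly as follows. This is a minimal stdlib-only illustration, not our production code: the column names and the `fetch_rows_since` helper are hypothetical stand-ins for the SQL query against cardano-db-sync and the Cloud SQL insert step.

```python
import csv
from datetime import datetime, timezone

# Hypothetical stand-in for querying the cardano-db-sync PostgreSQL
# instance for rows newer than the last exported block time. In the
# real pipeline this would be a SQL query against the block/tx tables;
# here we return fixed sample rows so the sketch is self-contained.
def fetch_rows_since(last_sync):
    return [
        {"block_no": 101, "tx_count": 3, "time": "2021-08-01T00:00:00Z"},
        {"block_no": 102, "tx_count": 1, "time": "2021-08-01T00:20:00Z"},
    ]

def export_daily_csv(last_sync, path):
    """Extract rows added since `last_sync` and write them to a CSV
    file, which would then be uploaded to Google Cloud Storage."""
    rows = fetch_rows_since(last_sync)
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["block_no", "tx_count", "time"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)

n = export_daily_csv(datetime(2021, 7, 31, tzinfo=timezone.utc), "blocks.csv")
print(n)  # number of rows exported in this run
```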

How will the funds be used?

In addition to the cost of maintaining the database, we will hire developers to write automation code that syncs the latest rows of the blockchain to Google Cloud Storage. To streamline and schedule this workflow, we will use the Google Cloud Composer service, built on Apache Airflow, as it is more stable than our current process. Once the exported CSV files are loaded into BigQuery, users can query the data in the BigQuery console or via an API.

Our code is open-sourced.

Users may replicate our solution to optimize it, or vary the interval at which the latest data is uploaded to BigQuery. Our current configuration performs a daily extraction of data from the Cardano blockchain, including the results of smart-contract transactions such as token transfers. The data is then de-normalized and stored in date-partitioned tables in BigQuery for easy, cost-effective exploration.
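
The de-normalization step can be illustrated like this: nested token-transfer records are flattened onto their parent transaction, and each row is keyed by its UTC date, which serves as the BigQuery partition key. The field names here are illustrative, not our actual schema.

```python
from collections import defaultdict

def denormalize(txs):
    """Flatten nested token transfers onto their parent transaction
    and bucket the resulting rows by UTC date (the partition key)."""
    partitions = defaultdict(list)
    for tx in txs:
        day = tx["time"][:10]  # 'YYYY-MM-DD' partition key
        for transfer in tx["token_transfers"]:
            partitions[day].append({
                "tx_hash": tx["hash"],
                "time": tx["time"],
                "token": transfer["token"],
                "amount": transfer["amount"],
            })
    return dict(partitions)

sample = [{
    "hash": "abc123",
    "time": "2021-08-01T12:00:00Z",
    "token_transfers": [
        {"token": "tokenA", "amount": 5},
        {"token": "tokenB", "amount": 2},
    ],
}]
parts = denormalize(sample)  # one partition, two flattened rows
```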

This solution has proven to be successful for other blockchains.

BTC and ETH already have their blockchains hosted on BigQuery, and users have since been able to perform bulk operations on them with ease. BigQuery also supports user-defined functions (UDFs), making it easy to run scripts over the whole dataset in seconds. Here’s an example of Tomasz Kolinko analyzing 1.2M ETH contracts in 20 seconds with BigQuery and a custom script: https://tinyurl.com/ethcontractsanalysis

The queries are low cost, and zero setup is required by users.

Furthermore, users no longer need to host an independent cardano-db-sync instance to perform data analytics. They can sign up for BigQuery and access the database within minutes.

Our goal is to have a self-sustaining database that grows with the Cardano community.

The funds will maintain the database for 2 years. After that, we will seek funding to keep the database running by operating a Cardano stake pool.

The project will take 3 months.

  • Month 1: Host an independent version of the IOHK code with custom extraction code to stream data into BigQuery.
  • Month 2: Upload all datasets into BigQuery. Synchronization will take around 3 weeks.
  • Month 3: BigQuery datasets sync automatically with the blockchain.
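
The Month 3 milestone amounts to a freshness check: compare the newest block time already in BigQuery with the chain tip, and trigger a new extract/load run when the dataset lags. A minimal sketch, assuming a 24-hour staleness threshold (the threshold and sample timestamps are illustrative):

```python
from datetime import datetime, timedelta, timezone

def needs_sync(latest_in_bq, chain_tip, max_lag=timedelta(hours=24)):
    """Return True when the BigQuery dataset lags the chain tip by
    more than `max_lag`, meaning a new sync run should be triggered."""
    return chain_tip - latest_in_bq > max_lag

tip = datetime(2021, 8, 2, 12, 0, tzinfo=timezone.utc)
stale = datetime(2021, 8, 1, 0, 0, tzinfo=timezone.utc)   # 36h behind
fresh = datetime(2021, 8, 2, 6, 0, tzinfo=timezone.utc)   # 6h behind
print(needs_sync(stale, tip), needs_sync(fresh, tip))  # True False
```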

Key performance indicators:

Run the “interesting queries” from the cardano-db-sync documentation in BigQuery: https://tinyurl.com/interestingqueriescardano

Our current pool: https://armadacardano.io/

Details on projected cost, and schemas to load: https://tinyurl.com/bigquerycostprojections

Community Advisor Reviews (1)
