not approved

Cardano on BigQuery: scalably querying Cardano’s authenticated blockchain data on BigQuery

₳80,000.00 Requested
Solution:

This project addresses the challenge of providing free and open data access to Cardano’s blockchain. We export all on-chain data to BigQuery and back-validate it, creating a proof of data authenticity.

Problem:

Current options for querying Cardano data are either restrictive (online data providers) or resource-intensive and complex to set up (db-sync/cardano-cli), which hinders efficient blockchain analysis.

Yes Votes: ₳ 41,898,573
No Votes:
Votes Cast: 261

[SOLUTION] Please describe your proposed solution.

This project addresses the challenge of providing free and open data access to Cardano’s blockchain, which grows constantly and demands ever more resources to process. Because the network has agreed on the finality of the on-chain data, these data can be shared with everybody in their most practical and cheapest form, trustlessly. Our back-validation procedure builds a proof of authenticity for the data exported to BigQuery.

[IMPACT] Please define the positive impact your project will have on the wider Cardano community.

This project will have a positive impact on the Cardano community by providing an accessible and affordable data foundation for launching DApps, web3 and NFT projects, transaction explorers, on-chain analytics, and other data-driven projects with ease.

It can also empower any member of the Cardano community with basic SQL skills to interrogate the Cardano blockchain for specific information of interest.

[CAPABILITY & FEASIBILITY] What is your capability to deliver your project with high levels of trust and accountability? How do you intend to validate if your approach is feasible?

We have more than two years of experience with our self-developed Db-Sync Enterprise protocol, which minimises the downtime of a Db-Sync pipeline.

The export of the Cardano blockchain data to BigQuery ran for ten continuous months without interruption, until June 2023.

The project team covers all aspects required to achieve the stated goals.

[Project Milestones] What are the key milestones you need to achieve in order to complete your project successfully?

M1: create infrastructure for the project (1 month)

This process entails setting up the servers to run the Cardano node, db-sync, and a Postgres database. The setup includes configuring a fully functional db-sync pipeline, including the mentioned components. The criteria for acceptance will focus on ensuring this pipeline runs smoothly, testing failover capabilities, and implementing comprehensive monitoring.

M2: setup of the export and its continuous processing (1 month)

This milestone involves setting up the export process from db-sync to BigQuery.

The export is split into 2 separate processes: exporting data at the end of every epoch and exporting data every 30 minutes.

The acceptance criteria will be testing and validating that both the continuous and the end-of-epoch updates successfully export the data to BigQuery.
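The split into two export processes could be coordinated along the following lines. This is a minimal sketch only: the function `due_exports`, the job names, and the rule that an epoch is exported once the chain has moved on to a later epoch are assumptions made for illustration, not the project's actual scheduler.

```python
from datetime import datetime, timedelta

# Cadence of the continuous export, as stated in the milestone.
CONTINUOUS_INTERVAL = timedelta(minutes=30)


def due_exports(last_continuous_run, now, last_exported_epoch, current_epoch):
    """Return the names of the export jobs that are due (hypothetical helper)."""
    jobs = []
    # Continuous export: run again once 30 minutes have elapsed.
    if now - last_continuous_run >= CONTINUOUS_INTERVAL:
        jobs.append("continuous")
    # End-of-epoch export: assumed due when a completed epoch
    # (anything before current_epoch) has not been exported yet.
    if current_epoch - 1 > last_exported_epoch:
        jobs.append("end_of_epoch")
    return jobs


start = datetime(2023, 1, 1, 12, 0)
print(due_exports(start, start + timedelta(minutes=45), 380, 382))
```

Keeping the decision in a pure function like this makes both cadences easy to test without touching db-sync or BigQuery.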

M3: deep comparison to create authoritative data equivalence with Cardano’s blockchain (1 month)

This milestone involves setting up the deep comparison process: we will use hashing to compare the data exported to BigQuery with the data in db-sync, creating a proof of data authenticity.

The acceptance criteria will be running the deep comparison process successfully for all past epochs and having all the exported data in BigQuery validated.
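One way a hash-based comparison per epoch might look is sketched below. It assumes that rows from both sides can be serialised identically and carry an `id` column for ordering; the helper names `epoch_digest` and `compare_epochs`, the use of SHA-256, and the JSON serialisation are illustrative choices, not the proposal's stated procedure.

```python
import hashlib
import json


def epoch_digest(rows):
    """SHA-256 over a canonical serialisation of one epoch's rows."""
    digest = hashlib.sha256()
    # Sort by id so the result does not depend on retrieval order.
    for row in sorted(rows, key=lambda r: r["id"]):
        canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
        digest.update(canonical.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def compare_epochs(source_by_epoch, export_by_epoch):
    """Return the sorted epochs whose digests differ between the two sides."""
    mismatches = []
    for epoch in sorted(set(source_by_epoch) | set(export_by_epoch)):
        source_hash = epoch_digest(source_by_epoch.get(epoch, []))
        export_hash = epoch_digest(export_by_epoch.get(epoch, []))
        if source_hash != export_hash:
            mismatches.append(epoch)
    return mismatches


# Toy rows standing in for db-sync (source) and BigQuery (export).
source = {1: [{"id": 1, "hash": "aa"}, {"id": 2, "hash": "bb"}]}
export = {1: [{"id": 2, "hash": "bb"}, {"id": 1, "hash": "aa"}]}
print(compare_epochs(source, export))
```

An epoch that is missing on either side is reported as a mismatch, because its digest is compared against that of an empty row set.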

Final Milestone: fully document the update process and the data quality monitoring (1 month)

This milestone involves creating extensive and descriptive documentation of the schema, the export process as well as the monitoring process.

The milestone output will be:

  • Documentation of all the data in BigQuery

  • Documentation of the update process

  • Documentation of the deep comparison process

  • Example code for connecting to BigQuery and querying the data

  • Example dashboards/data analyses based on the data

  • Pre-defined views for popular queries
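To give a rough idea of what the example-code deliverable could look like, here is a sketch using the `google-cloud-bigquery` Python client. The table id `my-project.cardano_mainnet.block` is a placeholder, and the column names `epoch_no` and `tx_count` assume that the export mirrors db-sync's `block` table; neither is confirmed by this proposal.

```python
def build_blocks_per_epoch_query(table: str) -> str:
    """Return SQL that counts blocks and transactions for one epoch."""
    return (
        "SELECT epoch_no, COUNT(*) AS blocks, SUM(tx_count) AS transactions\n"
        f"FROM `{table}`\n"
        "WHERE epoch_no = @epoch\n"
        "GROUP BY epoch_no"
    )


def run_query(table: str, epoch: int):
    """Run the query on BigQuery; needs credentials and the client library."""
    # Imported lazily so the query builder works without the library installed.
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client()  # picks up application default credentials
    job_config = bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("epoch", "INT64", epoch)]
    )
    sql = build_blocks_per_epoch_query(table)
    return list(client.query(sql, job_config=job_config).result())


# Placeholder table id (assumption); replace with the published dataset.
TABLE = "my-project.cardano_mainnet.block"
sql = build_blocks_per_epoch_query(TABLE)
print(sql)
```

Passing the epoch as a query parameter rather than formatting it into the SQL string keeps the query reusable and avoids quoting mistakes.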

[RESOURCES] Who is in the project team and what are their roles?

Alexander Diemand (cardanobigquery - <https://www.linkedin.com/in/axeld/>): architecture & design, project management, communication, documentation

Thomas Kaliakos (thomaska - <https://www.linkedin.com/in/tkaliakos/>): data engineering, data quality responsibility, documentation

Bitseat Tadesse (bitseatt - <https://www.linkedin.com/in/bitseat/>): data science, social networks, documentation

[BUDGET & COSTS] Please provide a cost breakdown of the proposed work and resources.

B1: 50 PD for architecture, design, project management, documentation, communication

B1.1: 10 PD architecture & design

B1.2: 10 PD project management

B1.3: 10 PD documentation

B1.4: 20 PD communication

B2: 20 PD for systems engineering (devops)

B2.1: 10 PD infrastructure setup (redundant hardware, high-availability)

B2.2: 10 PD process monitoring, alerting, mitigation procedures

B3: 55 PD for data engineering

B3.1: 5 PD PostgreSQL optimisations

B3.2: 10 PD BigQuery maintenance

B3.3: 20 PD Update process

B3.4: 20 PD Deep comparison process (back validation)

B4: Infrastructure costs

B4.1: $680 per month for redundant server hardware

(PD = person day; 8 hrs/day; 1 hr = $90)

Sum PD = 125 person days

At rate $90/h, 8 hrs/day: Sum budget PD = $90,000

12 months running costs: Hardware $680 x 12 = $8,160

Total Budget: $98,160
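The totals above can be re-derived from the line items. The short check below uses only the figures given in this breakdown; the variable names are illustrative.

```python
# Person days per budget block, as listed in B1 to B3.
person_days = {"B1": 50, "B2": 20, "B3": 55}
HOURS_PER_DAY = 8
RATE_USD_PER_HOUR = 90
HARDWARE_USD_PER_MONTH = 680
MONTHS = 12

total_pd = sum(person_days.values())                       # 125 person days
labour_usd = total_pd * HOURS_PER_DAY * RATE_USD_PER_HOUR  # 90,000
hardware_usd = HARDWARE_USD_PER_MONTH * MONTHS             # 8,160
total_usd = labour_usd + hardware_usd                      # 98,160

print(total_pd, labour_usd, hardware_usd, total_usd)
```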

[VALUE FOR MONEY] How does the cost of the project represent value for money for the Cardano ecosystem?

Blockchain data is by definition the same for all participants of the network, so it makes sense to share these data in their most practical form such that each participant can work with them independently. We believe that SQL is the most accessible way of querying data, and that everybody will find a way in their own setup to connect to and query the always-on BigQuery dataset.

Trustless data querying is enabled by our back validation, which proves that the data in BigQuery really represent the on-chain data of the Cardano blockchain.

Running the complete Db-Sync pipeline costs several hundred dollars per month. BigQuery, on the other hand, offers a free monthly quota of 1 TB of queried data and usually incurs no costs if used sparingly.
