completed

Dolos: Cardano “Data Node”

$75,680.00 Received
$75,680.00 Requested
Community Review Results (1 reviewers)
Solution

We'll develop a new node, fine-tuned for a very narrow scope: keeping an updated copy of the ledger and replying to queries from trusted clients, while requiring a small fraction of the resources a full Cardano node needs.

Problem:

Nodes used exclusively as a data source for client tools have different requirements than block producer / relay nodes. Many performance and cost optimizations are not currently possible.

Yes Votes:
₳ 129,206,023
No Votes:
₳ 22,608,188
Votes Cast:
612

This proposal was approved and funded by the Cardano Community via Project F9: Developer Ecosystem Catalyst funding round.

[IMPACT] Please describe your proposed solution.

Cardano nodes can assume one of two roles:

  • block producer: in charge of minting blocks
  • relay node: in charge of relaying blocks from / to peers.

Each of these roles has concrete responsibilities and runtime requirements. Criteria such as network topology, resource allocation and backup procedures vary by role.

We argue that there’s a third role that should be treated independently, with the goal of optimizing its workload: nodes that are used for the sole purpose of resolving local state queries or serving as a data source for downstream tools that require ledger data.

There are many potential optimizations for nodes performing this type of workload that are not currently possible with the Cardano node:

  • drastically limiting the amount of memory required to execute the node
  • switching to storage solutions with different trade-offs (eg: S3, NFS, etc)
  • providing alternative wire protocols more friendly for data queries (eg: REST, gRPC)
  • providing an auth layer in front of the API endpoints

The goal of this project is to provide a very limited and focused version of the Cardano node that can be used by DevOps as a cost-effective, performant option to deploy data nodes side-by-side with the producer / relay nodes.

This new role would be useful in the following scenarios:

  • As a data source for well-known tools such as DB-sync, Ogmios, CARP, Oura, etc.
  • As a fast, low resource node for syncing other producer / relay nodes.
  • As a ledger data source that scales dynamically according to query load.
  • As a node that leverages network / cloud storage technology instead of mounted drives.
  • As a node that scales horizontally, allowing high-availability topologies.
  • As a low resource local node for resolving local state queries.

Data nodes will share some of the features with the mainstream Cardano node:

  • Node-to-Node and Node-to-Client Chain-Sync mini-protocol
  • Node-to-Node Block-Fetch mini-protocol
  • Node-to-Client Local-State-Query mini-protocol
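To make the chain-sync item above more concrete, here is a minimal sketch of that mini-protocol's state machine as we understand it from the Ouroboros network specification. The type and function names (`State`, `Message`, `transition`) are illustrative assumptions; they are not taken from Pallas or from the Dolos codebase.

```rust
// Sketch of the chain-sync state machine. All names here are illustrative.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Idle,      // the client has agency and may send a request
    CanAwait,  // the server must answer, and may ask the client to wait
    MustReply, // the server must answer with a roll forward / backward
    Intersect, // the server is resolving an intersection request
    Done,      // the protocol has terminated
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Message {
    RequestNext,
    AwaitReply,
    RollForward,
    RollBackward,
    FindIntersect,
    IntersectFound,
    IntersectNotFound,
    Done,
}

/// Returns the next state, or `None` if `msg` is not valid in `state`.
fn transition(state: State, msg: Message) -> Option<State> {
    match (state, msg) {
        (State::Idle, Message::RequestNext) => Some(State::CanAwait),
        (State::Idle, Message::FindIntersect) => Some(State::Intersect),
        (State::Idle, Message::Done) => Some(State::Done),
        (State::CanAwait, Message::AwaitReply) => Some(State::MustReply),
        (State::CanAwait, Message::RollForward) => Some(State::Idle),
        (State::CanAwait, Message::RollBackward) => Some(State::Idle),
        (State::MustReply, Message::RollForward) => Some(State::Idle),
        (State::MustReply, Message::RollBackward) => Some(State::Idle),
        (State::Intersect, Message::IntersectFound) => Some(State::Idle),
        (State::Intersect, Message::IntersectNotFound) => Some(State::Idle),
        _ => None,
    }
}

fn main() {
    // A typical exchange: find an intersection, then request one block.
    let msgs = [
        Message::FindIntersect,
        Message::IntersectFound,
        Message::RequestNext,
        Message::RollForward,
    ];
    let mut state = State::Idle;
    for m in msgs.iter().copied() {
        state = transition(state, m).expect("message not valid in this state");
        println!("{:?} -> {:?}", m, state);
    }
}
```

A data node would act as the client side of this machine towards its upstream peers and as the server side towards downstream tools.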

This new type of node will also provide features not currently available in the mainstream Cardano node:

  • HTTP/JSON endpoint for common local state queries
  • gRPC endpoint for local state queries and chain-sync procedure
  • Different storage options including NFS, S3 & GCP Buckets
  • Low memory consumption (allowed by the trade-offs in scope)
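One way the pluggable storage options listed above could be modeled is a small trait with one implementation per backend. The sketch below is only an assumption about the shape of such an abstraction: `BlockStore`, `MemoryStore`, `DirStore` and `roundtrip` are hypothetical names, and only an in-memory and a local-directory backend are shown (an S3 or GCP bucket backend would implement the same trait).

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical abstraction over where raw block bytes are kept.
trait BlockStore {
    fn put(&mut self, key: &str, bytes: &[u8]) -> io::Result<()>;
    fn get(&self, key: &str) -> io::Result<Option<Vec<u8>>>;
}

/// Keeps everything in memory; useful for tests.
struct MemoryStore {
    blocks: HashMap<String, Vec<u8>>,
}

impl BlockStore for MemoryStore {
    fn put(&mut self, key: &str, bytes: &[u8]) -> io::Result<()> {
        self.blocks.insert(key.to_string(), bytes.to_vec());
        Ok(())
    }
    fn get(&self, key: &str) -> io::Result<Option<Vec<u8>>> {
        Ok(self.blocks.get(key).cloned())
    }
}

/// Stores one file per key under `root` (a local disk or an NFS mount).
struct DirStore {
    root: PathBuf,
}

impl BlockStore for DirStore {
    fn put(&mut self, key: &str, bytes: &[u8]) -> io::Result<()> {
        fs::create_dir_all(&self.root)?;
        fs::write(self.root.join(key), bytes)
    }
    fn get(&self, key: &str) -> io::Result<Option<Vec<u8>>> {
        match fs::read(self.root.join(key)) {
            Ok(bytes) => Ok(Some(bytes)),
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
            Err(e) => Err(e),
        }
    }
}

/// Writes a value and reads it back through whichever backend is passed in.
fn roundtrip(store: &mut dyn BlockStore) -> io::Result<bool> {
    store.put("block-1", b"\x82\x00")?;
    let found = store.get("block-1")? == Some(vec![0x82, 0x00]);
    let missing = store.get("missing")?.is_none();
    Ok(found && missing)
}

fn main() -> io::Result<()> {
    let mut mem = MemoryStore { blocks: HashMap::new() };
    let mut dir = DirStore { root: std::env::temp_dir().join("data-node-sketch") };
    println!("memory backend ok: {}", roundtrip(&mut mem)?);
    println!("directory backend ok: {}", roundtrip(&mut dir)?);
    Ok(())
}
```

Because the rest of the node would only depend on the trait, the backend could be chosen through configuration without touching the query logic.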

Dolos will be developed as an open-source project using Rust as its main development language. The code will make heavy use of the Pallas library, developed by this team and already available as version 0.11.

[IMPACT] Please describe how your proposed solution will address the Challenge that you have submitted it in.

Having an efficient way to access ledger data without incurring high infrastructure costs will accelerate development for both solo developers and small to medium sized teams. A lower entry barrier will drive more developers to the ecosystem.

New wire protocols such as HTTP/JSON and gRPC will also widen the spectrum of options for developers to integrate ledger data without the requirement of integrating low-level mini-protocols directly into their projects.

[IMPACT] What are the main risks that could prevent you from delivering the project successfully and please explain how you will mitigate each risk?

Our experience developing Oura and Pallas allowed us to gain knowledge and implement libraries that will serve as foundational components for this project. Regardless of this advantage, there are some known-unknowns that could present a risk:

  • Performance gains and resource allocation optimizations are theoretical. They were extrapolated from our experience implementing Cardano data processing pipelines using components written in Rust. We won’t have strict, quantifiable measurements until we develop a PoC of this project. To mitigate this risk, our development process will include running performance benchmarks at each development milestone. Reports will be included as part of each release.
  • Documentation of the local state query wire format is incomplete, so some of it will need to be reverse engineered from the mainstream Cardano node. We have experience with this approach, but the level of effort associated with the task is hard to anticipate. To mitigate this risk, we'll reach out to IOG for advice and any documentation that may be available.

It is important to highlight that we consider this project feasible because the complexity of a data node is orders of magnitude lower than that of a full node. Please note that we are NOT proposing an alternative to the mainstream Cardano node written in Rust; that would be impossible to achieve with our current development bandwidth.

[FEASIBILITY] Please provide a detailed plan, including timeline and key milestones for delivering your proposal.

Milestone #1: PoC

  • 1 month development
  • 1 full-time Rust developer
  • 1 part-time SRE
  • Deliverables:
      • Working Prototype
      • ChainSync client
      • Local file-system storage
      • Subset of local state queries

Milestone #2: Features

  • 2 months development
  • 1 full-time Rust developer
  • 1 part-time SRE
  • Deliverables:
      • Working Prototype
      • More storage options
      • Authentication mechanism
      • gRPC endpoint
      • HTTP/JSON endpoint
      • ChainSync / BlockFetch server

Milestone #3: Hardening / Documentation

  • 1 month development
  • 1 part-time Rust developer
  • 1 part-time SRE
  • 1 full-time technical writer
  • Deliverables:
      • Fully-Functional v1
      • Performance optimizations
      • Bug-fixing
      • Documentation site

[FEASIBILITY] Please provide a detailed budget breakdown.

Hourly rates:

  • Project Manager: 60 USD / hour
  • Rust Developer: 75 USD / hour
  • Site Reliability Engineer: 70 USD / hour
  • Technical Writer: 36 USD / hour

Required Hours:

  • Project Manager: 80 hours
  • Rust Developer: 560 hours
  • Site Reliability Engineer: 320 hours
  • Technical Writer: 180 hours

Total Budget:

  • Project Manager: 4,800 USD
  • Rust Developer: 42,000 USD
  • Site Reliability Engineer: 22,400 USD
  • Technical Writer: 6,480 USD
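
The per-role figures above are each role's hourly rate multiplied by its required hours. The short program below reproduces that arithmetic as a cross-check; the rates and hours are copied from the lists above, while the names `ROLES`, `role_cost` and `total_budget` are ours.

```rust
/// (role, hourly rate in USD, required hours), copied from the budget lists.
const ROLES: [(&str, u32, u32); 4] = [
    ("Project Manager", 60, 80),
    ("Rust Developer", 75, 560),
    ("Site Reliability Engineer", 70, 320),
    ("Technical Writer", 36, 180),
];

/// Cost of one role: hourly rate times hours.
fn role_cost(rate: u32, hours: u32) -> u32 {
    rate * hours
}

/// Sum of the cost of every role.
fn total_budget() -> u32 {
    ROLES
        .iter()
        .map(|&(_, rate, hours)| role_cost(rate, hours))
        .sum()
}

fn main() {
    for &(role, rate, hours) in ROLES.iter() {
        println!("{}: {} USD", role, role_cost(rate, hours));
    }
    println!("Total: {} USD", total_budget());
}
```

The four lines add up to 75,680 USD, which matches the amount requested.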

[FEASIBILITY] Please provide details of the people who will work on the project.

Santiago Carmuega will lead the software development effort. He is a senior developer with over 20 years of experience in software development and is very active in the Cardano open-source ecosystem, where he leads TxPipe.

Github: https://github.com/scarmuega

Twitter: https://twitter.com/santicarmuega

Alejandro Drabenche will be the SRE in charge of validating, deploying and testing the project at each milestone. He is a senior System Administrator with over 15 years of experience. He has been working on blockchain for over 5 years.

Florencia Luna will be in charge of the technical writing. She is a junior developer with experience in technical writing.

Federico S. Weill will be the project manager. He is a senior project manager with a PhD in science who has led more than 10 research projects over the last 20 years, managing resources and people.

We're planning on hiring a new software developer with experience in Rust to contribute to the codebase starting from milestone #2.

[FEASIBILITY] If you are funded, will you return to Catalyst in a later round for further funding? Please explain why / why not.

If the project achieves a good level of adoption, we intend to return to Catalyst for a v2 of the tool once we have gathered enough feedback from real-world usage.

[AUDITABILITY] Please describe what you will measure to track your project's progress, and how will you measure these?

Progress of the project will be measured by released versions matching the scope of the predetermined milestones.

  • A detailed roadmap will be presented as part of the open source repository.
  • Each milestone will be presented as a partial but working "release" within the repository.
  • Milestones and tasks will be tracked via GitHub using its project management tool.
  • Direct communication with the development team will be available through TxPipe's Discord server.
  • Weekly update summaries will be posted via TxPipe's Twitter account.

[AUDITABILITY] What does success for this project look like?

Projects within the Cardano ecosystem use the new Data Node as a way to optimize their infrastructure costs and improve the performance of their data-intensive workloads.

[AUDITABILITY] Please provide information on whether this proposal is a continuation of a previously funded project in Catalyst or an entirely new one.

This is an entirely new proposal.

Monthly Reports

This is our 1st report. Since we've started the development process, we've completed the following tasks:

  • Github repository provisioning (visibility, license, collaborators, etc)
  • Kanban board provisioning
  • Rust project scaffolding
  • PoC on chain-sync client implementation
  • PoC on local storage implementation

Disbursed to Date: $75,680
Status: Still in progress
Completion Target: In the next 6 months

This is our 2nd report. Since our last report, we've completed the following tasks:

  • Implement stand-alone multiplexer stage
  • Improve performance by using a thread pool between muxer and demuxer tasks
  • Add integration tests for the upstream chain-sync stage
  • Implement a stand-alone stage for the upstream block-fetch logic
  • Introduce a new storage layer called RollDB that encapsulates logic for handling sequence of blocks, taking into account rollbacks
  • PoC of a stand-alone stage for applying chain-sync outputs to the storage layer
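
To illustrate what a rollback-aware storage layer has to do, here is a minimal in-memory sketch. It is not the RollDB implementation: `Point`, `RollStore`, `point` and the method names are assumptions made for this example, and block hashes are shortened to plain strings.

```rust
/// A chain point: slot number plus block hash (shortened to a string here).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Point {
    slot: u64,
    hash: String,
}

/// Illustrative in-memory stand-in for a rollback-aware sequence of blocks.
#[derive(Default)]
struct RollStore {
    chain: Vec<(Point, Vec<u8>)>,
}

impl RollStore {
    /// Appends a block at the tip. Slots must be strictly increasing.
    fn roll_forward(&mut self, point: Point, body: Vec<u8>) -> Result<(), String> {
        if let Some((tip, _)) = self.chain.last() {
            if point.slot <= tip.slot {
                return Err(format!("slot {} is not after tip slot {}", point.slot, tip.slot));
            }
        }
        self.chain.push((point, body));
        Ok(())
    }

    /// Drops every block after `point` and returns how many were dropped.
    /// Fails if `point` is not on the stored chain.
    fn roll_back(&mut self, point: &Point) -> Result<usize, String> {
        match self.chain.iter().position(|(p, _)| p == point) {
            Some(idx) => {
                let dropped = self.chain.len() - (idx + 1);
                self.chain.truncate(idx + 1);
                Ok(dropped)
            }
            None => Err(format!("unknown rollback point at slot {}", point.slot)),
        }
    }

    /// The most recent point, if any block is stored.
    fn tip(&self) -> Option<&Point> {
        self.chain.last().map(|(p, _)| p)
    }
}

/// Small helper to build a point from literals.
fn point(slot: u64, hash: &str) -> Point {
    Point { slot, hash: hash.to_string() }
}

fn main() {
    let mut store = RollStore::default();
    store.roll_forward(point(10, "aa"), vec![1]).unwrap();
    store.roll_forward(point(20, "bb"), vec![2]).unwrap();
    store.roll_forward(point(30, "cc"), vec![3]).unwrap();

    // The upstream peer switches to a fork that branches off after slot 10.
    let dropped = store.roll_back(&point(10, "aa")).unwrap();
    store.roll_forward(point(25, "dd"), vec![4]).unwrap();
    println!("dropped {} blocks, tip is now {:?}", dropped, store.tip());
}
```

A persistent implementation would also need to bound how far back a rollback can reach, which this sketch leaves out.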

Disbursed to Date: $75,680
Status: Still in progress
Completion Target: In the next 6 months
