over budget

Analytics Data Hub

$60,000.00 Requested
Community Review Results (1 reviewers)
Solution

A data hub for the Cardano ecosystem that makes historical and modelled datasets available through multiple access mechanisms.

Problem:

Data relating to the Cardano ecosystem is granular and scattered, making it difficult to access and use for analytics or machine learning.

Yes Votes:
₳ 46,781,025
No Votes:
₳ 25,955,272
Votes Cast:
327

[IMPACT]

Problem Overview

Currently, data relating to the Cardano ecosystem is available but spread across multiple sources.

DBSync is the most detailed source of on-chain data, but it is neither easy nor cheap to run, and its data is highly normalized, so gaining insights often requires detailed knowledge of the database schema. Every project that consumes this data must account for the time and effort needed to transform it. This transformation is a repeatable process that should not have to be redone each time a project needs data.
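To illustrate the kind of transformation described above, here is a minimal sketch that joins two normalized tables into a per-epoch summary. The `block` and `tx` tables are simplified, hypothetical stand-ins that only loosely follow the DBSync schema, and the in-memory SQLite database and sample rows are assumptions made purely so the example runs on its own.

```python
import sqlite3

# Simplified, hypothetical stand-ins for two normalized tables.
# The real DBSync schema has many more tables, columns and relationships.
SCHEMA = """
CREATE TABLE block (id INTEGER PRIMARY KEY, epoch_no INTEGER);
CREATE TABLE tx (
    id INTEGER PRIMARY KEY,
    block_id INTEGER REFERENCES block(id),
    fee INTEGER
);
"""

# An analytics-friendly view: one row per epoch instead of one row per tx.
EPOCH_SUMMARY = """
SELECT b.epoch_no  AS epoch_no,
       COUNT(t.id) AS tx_count,
       SUM(t.fee)  AS total_fees
FROM tx AS t
JOIN block AS b ON b.id = t.block_id
GROUP BY b.epoch_no
ORDER BY b.epoch_no
"""


def demo_connection():
    """Create an in-memory database holding a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO block VALUES (?, ?)",
        [(1, 300), (2, 300), (3, 301)],
    )
    conn.executemany(
        "INSERT INTO tx VALUES (?, ?, ?)",
        [(1, 1, 170000), (2, 1, 180000), (3, 2, 200000), (4, 3, 175000)],
    )
    return conn


def epoch_summary(conn):
    """Return one (epoch_no, tx_count, total_fees) tuple per epoch."""
    return conn.execute(EPOCH_SUMMARY).fetchall()


if __name__ == "__main__":
    for row in epoch_summary(demo_connection()):
        print(row)
```

A consumer of the aggregated result needs to know only three column names rather than how blocks and transactions relate to each other.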

Additionally, other sources of data must be integrated to make the data optimally useful, including:

  • Stake pool metadata server (SMASH)
  • Extended metadata files as per the adapools.org standard
  • Token data from sources such as the Cardano Foundation token registry and the policy ID databases of NFT marketplaces (CNFT.io, JPG.store, etc.)
  • Market data
  • Social metrics
  • Many others…

Additionally, there is a wealth of data in the transaction metadata which is specific to certain use cases, and is difficult to access without knowledge of how to query JSON data structures.
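As a rough illustration of what modelling such metadata could involve, the sketch below flattens a nested JSON payload into simple path/value pairs that fit in a table. The `flatten` helper and the sample payload are hypothetical and invented for this example; real metadata varies by use case and would need use-case-specific handling.

```python
import json


def flatten(value, prefix=""):
    """Yield (dotted_path, leaf_value) pairs from nested JSON data."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            yield from flatten(child, path)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{prefix}[{index}]")
    else:
        # Leaf value (string, number, boolean or null).
        yield prefix, value


# Made-up metadata payload, for illustration only.
SAMPLE = json.loads("""
{"721": {"policy_abc": {"Asset001": {"name": "Example #1",
                                     "tags": ["demo", "test"]}}}}
""")

if __name__ == "__main__":
    for path, leaf in flatten(SAMPLE):
        print(path, "=", leaf)
```

Once flattened, each pair can be stored as a row, so consumers can filter on the path column without writing JSON queries.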

Several excellent sites currently offer pool-specific data, such as adapools.org and pooltool.io, and there are several block explorers. However, none of these provide full historical data, custom queries, or data modelled specifically for analysis or machine learning use cases.

Solution

We propose to build the initial MVP of a community data hub which will provide consolidated, analytics-ready data to the Cardano ecosystem. We have already begun building initial data sets (on-chain data and stake pool data sets), which would be integrated into the single data hub platform.

At a minimum, there will be data available from DBSync and other sources listed above, which have been modelled for various analytics activities. The DBSync data will have additional aggregated views such as the ones in the following repository: <https://github.com/cardanocanuck/db-sync-queries>

Additionally, we will continue to add special-purpose datasets for various domains within the Cardano ecosystem. We have submitted, or will be submitting, several smaller proposals for specialized datasets to be modelled and developed, such as:

  • On-chain Analytics - Transactions, volume, rewards, etc. (funded in F6 and wrapping up)
  • Pool Analytics - Machine Learning ready dataset on historical pool performance (funded in F7 and underway)
  • NFT Analytics - information about several aspects of NFT projects
  • Smart Contract Analytics

The initial MVP Data Hub will allow the download of scheduled CSV data sets. In the future, the range of sharing methods will be expanded. Some of these sharing methods will be:

  • API access
  • Web based data explorer
  • Community available Google Sheets
  • Direct cloud database access
  • Direct data sharing (Azure / Snowflake)
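A scheduled CSV export of the kind planned for the MVP could look roughly like the following sketch. The function names, the date-stamped file naming scheme and the sample rows are all assumptions made for illustration; the actual export tooling and schedule have not been decided here.

```python
import csv
import io
from datetime import date


def dataset_filename(name, run_date):
    """Build a date-stamped file name, e.g. 'epoch_summary_2022-03-01.csv'."""
    return f"{name}_{run_date.isoformat()}.csv"


def write_csv(rows, fieldnames, handle):
    """Write dict rows to an open text handle as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up rows standing in for a curated data set.
    rows = [
        {"epoch_no": 300, "tx_count": 3},
        {"epoch_no": 301, "tx_count": 1},
    ]
    buffer = io.StringIO()
    write_csv(rows, ["epoch_no", "tx_count"], buffer)
    print(dataset_filename("epoch_summary", date(2022, 3, 1)))
    print(buffer.getvalue())
```

Writing to a generic text handle keeps the same function usable for a local file, an upload stream or a test buffer.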

We will prioritize free community access methods, but some access methods such as direct database access or data sharing may be monetized with a subscription model. The purpose of monetizing premium aspects of the data hub is to fund future ongoing development and enhancement.

This proposal is for the core functionality and backend infrastructure development of this community hub.

Project Plan

The requested funds will cover the first 3 months of development of the platform as well as the first 6 months of running costs.

We propose to follow a hybrid waterfall / agile methodology, starting with upfront architecture, design and feature planning, followed by 4 sprints of feature development. The project plan will be updated throughout this Catalyst process as we find team members and refine our idea and feature set.

See attached diagram.

Budget

The budget we are requesting will fund the first 3 months of development, and 6 months of infrastructure costs.

The approximate budget breakdown by role is as follows:

  • Architect / Senior Dev - 100h x $75 = $7,500
  • Graphic Designer - 60h x $75 = $4,500
  • Web Developer - 80h x $75 = $6,000
  • Data Engineer - 280h x $75 = $21,000
  • Project Manager - 80h x $75 = $6,000
  • QA - 40h x $75 = $3,000

Total development costs: $48,000

Infrastructure costs estimated at $2,000/mo x 6 months = $12,000

Total Budget: $60,000
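The arithmetic behind these figures can be checked with a short sketch. All hours, rates and monthly costs are taken from the breakdown above; only the variable and function names are our own.

```python
HOURLY_RATE = 75  # USD per hour, the same rate for every role above

HOURS_BY_ROLE = {
    "Architect / Senior Dev": 100,
    "Graphic Designer": 60,
    "Web Developer": 80,
    "Data Engineer": 280,
    "Project Manager": 80,
    "QA": 40,
}

INFRA_MONTHLY = 2000  # USD per month
INFRA_MONTHS = 6


def development_cost():
    """Sum of hours x rate across all roles."""
    return sum(hours * HOURLY_RATE for hours in HOURS_BY_ROLE.values())


def infrastructure_cost():
    """Monthly infrastructure cost over the covered period."""
    return INFRA_MONTHLY * INFRA_MONTHS


def total_budget():
    """Development plus infrastructure."""
    return development_cost() + infrastructure_cost()


if __name__ == "__main__":
    print("Development:", development_cost())
    print("Infrastructure:", infrastructure_cost())
    print("Total:", total_budget())
```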

Core Team Experience

Michael Stewart

  • 17+ years of software development and architecture experience.
  • 10+ years focused in the data and analytics space
  • Led the development team of a boutique data / analytics firm where I designed and architected cloud-based data warehouse solutions for Fortune 500 companies
  • Member of the Cardano community since 2017
  • Co-Founder of Cardano Canucks stake pool and Canuckz NFTs
  • Co-Founder of Canucks Publishing NFT Minting Platform and Service
  • Co-Founder of CCSPA (Canadian Cardano Stake Pool Association)

Vivek Nankissoor

  • 15+ years of experience in database requirements, design and development
  • Established and grew web analytics, marketing automation and QA practices
  • Engaged in marketing, data and analytics strategy development with enterprise retail, CPG organizations, banks, automotive, pharma, fintech and others
  • Co-Founder of Cardano Canucks stake pool and Canuckz NFTs
  • Co-Founder of CCSPA (Canadian Cardano Stake Pool Association)
  • Co-Founder of Canucks Publishing NFT Minting Platform and Service
  • Participant in community work such as financial literacy relating to crypto and raising awareness with various investment groups

This solution will address the challenge by providing a starting point for data and analytics projects within the developer ecosystem, removing the overhead of time and effort for creating a usable data set. In addition, developers will not need to ramp up on the nuances of raw data sets (e.g., structures and relationships within DBSync). Instead, they can start with curated data that lends itself to easy integration within developer applications.

Also, this solution will allow for previously funded projects to be integrated into a single place for the aggregation and distribution of curated data sets:

  • on-chain dataset
  • stake pool dataset

The risks are:

  • Resource management - ensure that the resources assigned have the proper skills and experience to complete the project

  • Strict adherence to timelines - ensure that the project stays on time and on budget

  • Significant changes to data sources - risk that the source data schema, format, etc. may change over the course of this project, which could extend timelines. This is assumed to be a low risk, as past changes have been minor and accompanied by good notice and documentation

[FEASIBILITY]

Please see the attachment for the overall project timeline.

  • Legal Setup - creation of a legal entity dedicated to this project so it can be managed and maintained beyond the initial funding
    Milestones/Deliverables: creation of the legal entity under which the data hub will be managed

  • Project Planning - creation and revision of a detailed plan including requirements documentation, resourcing, management via Jira (epic and issue/task creation, assignment and management), milestone definition and deployment (publish/schedule to site)
    Milestones/Deliverables: project plan and setup

  • Creative Design - portal creation and feed integration plan
    Milestones/Deliverables: wireframes/mockups for portal, brief for web development

  • Solution Architecture - technical solution planning and initial systems provisioning
    Milestones/Deliverables: architecture diagram and development plan

  • Website Development - creation of the portal
    Milestones/Deliverables: completed website where users can find/download data sets

  • Feature Development - Sprint X - data set creation, including extraction, load, transformation/aggregation, build of curated views, export to supported formats
    Milestones/Deliverables: scheduled data sets presented via the website

Budget

The budget we are requesting will fund the first 3 months of development, and 6 months of infrastructure costs.

The approximate budget breakdown by role is as follows:

  • Legal Setup / Incorporation = $5,000
  • Architect / Senior Dev - 100h x $75 = $7,500
  • Graphic Designer - 60h x $75 = $4,500
  • Web Developer - 80h x $75 = $6,000
  • Data Engineer - 280h x $75 = $21,000
  • Project Manager - 80h x $75 = $6,000
  • QA - 40h x $75 = $3,000

Total development costs: $53,000

Infrastructure costs estimated at $2,000/mo x 6 months = $12,000

Total Budget: $65,000

Project resource breakdown (see budget breakdown).

Leadership:

Michael Stewart

  • 17+ years of software development and architecture experience.
  • 10+ years focused in the data and analytics space
  • Led the development team of a boutique data / analytics firm where I designed and architected cloud-based data warehouse solutions for Fortune 500 companies
  • Member of the Cardano community since 2017
  • Co-Founder of Cardano Canucks stake pool and Canuckz NFTs
  • Co-Founder of CCSPA (Canadian Cardano Stake Pool Association)

Vivek Nankissoor

  • 15+ years of experience in database requirements, design and development
  • Established and grew web analytics, marketing automation and QA practices
  • Engaged in marketing, data and analytics strategy development with enterprise retail, CPG organizations, banks, automotive, pharma, fintech and others
  • Co-Founder of Cardano Canucks stake pool and Canuckz NFTs
  • Co-Founder of CCSPA (Canadian Cardano Stake Pool Association)
  • Participant in community work such as financial literacy relating to crypto and raising awareness with various investment groups

[AUDITABILITY]

This project will be measured primarily by:

  • on-time completion of milestones/deliverables
  • budget control (spend by resource, by phase)

The secondary KPIs may include:

  • Volume: breadth and depth of data sets created (columns, rows)
  • Relevance: number of use cases satisfied by data set design

Success is defined as a website where curated datasets are refreshed daily and can be downloaded by data consumers on an ad hoc basis.

Once completed, this project will serve as the foundation for the enablement of data for analytics: modeling, visualization, machine learning, applications specific to particular domains (stake pools, NFTs, etc.), and much more.

The project is a net new project, but will tie in the outputs from previous projects:

  • on-chain data sets (Fund 6)
  • stake pool data sets (Fund 7)

It will also provide a platform for future data set development.

SDG Rating

The vision for this project is to increase awareness and engagement with Cardano among the data and analytics community. There are many developers who have high levels of excitement and energy for data science, but grapple with the barrier of data curation to apply their skills. This project is intended to remove those barriers.

SDG goals:

Goal 8. Promote sustained, inclusive and sustainable economic growth, full and productive employment and decent work for all

SDG subgoals:

8.3 Promote development-oriented policies that support productive activities, decent job creation, entrepreneurship, creativity and innovation, and encourage the formalization and growth of micro-, small- and medium-sized enterprises, including through access to financial services

Key Performance Indicator (KPI):

8.3.1 Proportion of informal employment in total employment, by sector and sex

#proposertoolsdg
