Status: Funded

“Data-Driven Catalyst”: Toolkit for on-chain and off-chain data analytics & insight, optimize governance and prevent gaming and abuse

₳175,000.00 Requested
Solution

Research and showcase the optimal technology stack and procedures to build a publicly accessible database and a data retrieval and mining toolkit. Enable the use of popular tools like LangChain, TensorFlow and Gephi.

Problem:

More targeted analysis of large datasets, both from past funds and from future Catalyst Voices iterations, needs to be possible to truly learn between iterations and improve the process continuously.

Yes Votes: ₳164,304,913
No Votes:
Votes Cast: 1,252

This proposal was approved and funded by the Cardano Community via the Project F11: Catalyst Systems Improvements: Discovery Catalyst funding round.

[SOLUTION] Please describe your proposed solution.

Process leading to “Data-Driven Catalyst”:

  1. Stakeholder Feedback Process: Each milestone will involve presentation of objectives and findings to the Catalyst Team, the Catalyst Voices Team, and the Proposal Assessor / Community Advisor Telegram community. There have been countless contributions to data analytics by various members of the community, with manifold objectives: finding gaming of the system, duplicates and AI-generated content; finding the categories in which it is "easiest" to get funded; and finding correlations between voting success and other metrics such as PA score, proposal length, etc. We need to converge on the most meaningful past datasets and ensure that data collection and comparison across funds is possible (see the sketch after this list).
  2. Dashboard and Executive Reports: Once the suitable Catalyst datasets and time series have been identified, we need to collect, clean and present the data in the most intuitive and unbiased way; give the largest possible number of community members access and the tools to tinker with the data, find flaws, be creative and converge on the most meaningful metrics to improve the Catalyst process going forward; and find threats (like gaming of the rules, Sybil attack vulnerability, etc.).
  3. Robust, Open Source Tech Stack: We have relied too much on Big Tech tools like Google Drive, Excel spreadsheets and Google Forms in the past. Catalyst Voices will be a huge improvement, but risks being accessible to far too few people with sufficient technical expertise. We need to research, compare and test-drive the most robust, open source, indestructible, free and lightweight tech stack possible: open source data visualization and reporting software using popular languages like Python and/or JavaScript.
  4. Collaborative: There are a lot of people building similar tools in web3, in Cardano, for Catalyst - we need to move out of our silos and find the working groups creating gold standards for data-driven, open source community learning and governance around permissionless systems and data-driven self improvement.
  5. Flexible and reactive: We don't want to create white elephants with the Catalyst budget, but move from milestone to milestone with an open mind, pursue the datasets, tech solutions and collaboration tools that work best, and drop the ones that don't.
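As a concrete illustration of the cross-fund analysis described in point 1, here is a minimal sketch in Python with pandas. The file name and column names (fund, title, pa_score, yes_votes) are hypothetical placeholders for whatever export the final open database provides.

```python
import pandas as pd

# Hypothetical export of the open Catalyst database; file and column names
# are placeholders, not an existing dataset.
df = pd.read_csv("catalyst_proposals_f7_f10.csv")

# Correlation between assessor score and voting success, computed per fund
# so that rule changes between funds are not mixed together.
print(df.groupby("fund")[["pa_score", "yes_votes"]].corr())

# Simple screen for exact duplicate titles across funds, as a first pass at
# spotting resubmitted or copy-pasted proposals.
dupes = df[df.duplicated(subset="title", keep=False)]
print(dupes[["fund", "title"]].sort_values("title"))
```

In practice the same pattern extends to any metric pair the community converges on (proposal length, requested budget, reviewer counts, and so on).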

At the end of our 7-month project, in close collaboration with the Catalyst team, Catalyst Voices and the community, we will present our findings and deliver a final working database for Funds 7 to 10, an open-to-anyone collaboration platform, and a data storage and retrieval solution that allows seamless integration with the most exciting and cutting-edge data science, machine learning and LLM tools out there (for example LangChain, Gemini, ChatGPT and TimeGPT), and supports network graphs, cluster identification and insight discovery that can drive Catalyst to get better with each iteration.
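To show what the network graph and cluster discovery part could look like, here is a minimal sketch using the open source networkx library with a Gephi-compatible export. The edge list file and its columns are hypothetical; in practice they would come from the Fund 7 to 10 database (for example co-proposer links).

```python
import pandas as pd
import networkx as nx
from networkx.algorithms import community

# Hypothetical edge list (e.g. two proposers appearing on the same proposal);
# file and column names are placeholders.
edges = pd.read_csv("collaborations.csv")
G = nx.from_pandas_edgelist(edges, source="proposer_a", target="proposer_b")

# Community detection to surface clusters of tightly connected proposers.
clusters = community.greedy_modularity_communities(G)
for i, members in enumerate(clusters):
    print(f"cluster {i}: {len(members)} proposers")

# Export for interactive exploration in Gephi.
nx.write_gexf(G, "catalyst_network.gexf")
```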

[IMPACT] Please define the positive impact your project will have on the wider Cardano community.

A lot of changes from fund to fund seem arbitrary and sometimes over-calibrated, trying to right past wrongs by overshooting in the other direction. This is surely the result of too little meaningful decentralization and the cacophony of the rise and fall of certain powers held by the community (like the "rise and fall of the proposal assessor"). By doing extensive research into a parallel data infrastructure owned and run by the community, to find facts and prove them with hard data, we hope and believe that the process will develop a unique, decentralized dynamic and attract more and more "benign data nerds" who feel that their expertise can have a real-world impact, much as Kaggle did in the early days of data science and the internet going mainstream.

[CAPABILITY & FEASIBILITY] What is your capability to deliver your project with high levels of trust and accountability? How do you intend to validate if your approach is feasible?

Proposers are Plutus Pioneers (cohorts 3 and 4), Atala PRISM Pioneers (2nd cohort) and Marlowe Pioneers. The team covers a broad range of skills, from successful startup founders with technical and business backgrounds to community functions. We are funded proposers, have successfully closed out projects in Funds 7 and 8, and have participated in various Catalyst functions, including Challenge Teams from Fund 7 (Boosting Cardano's DeFi) through Funds 8 and 9 (Developer Ecosystem) to the current Fund 10 (Developer Ecosystem - The Evolution), as well as CA, PA, VCA, VPA and Sub-Circle roles. We have grown the Developer Ecosystem community Telegram channel <https://t.me/DeveloperEcosystem> (149 members as of 7 December 2023) from Fund 8 until now, and would like to add value to the ecosystem by applying the experience and insights gained from our participation in the PoA pilot in Fund 9 to improve the overall process and form the Catalyst Reward DAO community.

Sapient Predictive Analytics has been in the data science, machine learning and collective intelligence space since 2018, winning awards and recognition in Singapore and beyond. We have previously worked with the Fraunhofer Institute, Macquarie Bank and many small and medium-sized companies on their decision making, governance, data analytics and trading desks.

<https://www.18hall.com/sapient-predictive-analytics/>

[Project Milestones] What are the key milestones you need to achieve in order to complete your project successfully?

Milestone 1: Mar 2024 (Project outline, discussions with the Catalyst team, documentation of objectives and adjustment of milestones where necessary based on other proposals in this category and the dynamics of the Project Catalyst evolution, GitHub repo and definition of collaborator roles)

All milestones, including the first, will be subject to the Statement of Milestones procedure and documented briefings with the Catalyst Team to ensure value for money and appropriate fine-tuning with the technical capabilities and requirements of the Catalyst Voices and Hermes infrastructure. Community Telegram channels and Town Hall will be briefed as often as possible as part of this process, which should exceed the community-involvement standard for open projects.

10% of total budget (ADA 17,500)

Milestone 2: Apr 2024 (Establish tech stack, collaborative project management & workshop)

20 example data-driven reports presented on Telegram and GitHub using Funds 7 to 10 data.

Choice of database and API solutions. Teasers of LLM organization and unsupervised learning insights from previous funds to brief the Catalyst teams and community on the potential and limitations.
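A minimal sketch of the kind of unsupervised learning teaser meant here, assuming proposal texts have already been pulled from the open database (the three inline strings are placeholders):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Placeholder proposal texts; in practice these would be loaded from the
# Fund 7-10 database built in this milestone.
proposals = [
    "Open source wallet tooling for Cardano developers ...",
    "Wallet tooling for Cardano developers, open source ...",
    "Community education series on Catalyst governance ...",
]

# TF-IDF features plus k-means clustering; near-identical texts land in the
# same cluster, flagging possible duplicates or template reuse.
X = TfidfVectorizer(stop_words="english").fit_transform(proposals)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)
```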

20% of total budget (ADA 35,000)

Milestone 3: May 2024

On-demand data analytics projects: community voting on the 7 most sought-after tools or dataset projects. This milestone is the most open and depends on community feedback and research findings up to this stage. The Statement of Milestones process and liaison with the Catalyst Team will allow flexibility to ensure optimal integration into the ongoing liquid democracy implementation.

20% of total budget (ADA 35,000)

Milestone 4: June 2024 (database and API production-ready, 7 community-tasked research projects finalized and presented)

15% of total budget (ADA 26,250)

Milestone 5: July 2024 (research into integration of experimental generative AI and unsupervised learning, improvements of data quality and incompatibility shortcomings)

15% of total budget (ADA 26,250)

Milestone 6: September 2024

Close-out video and documentation, finalization meetings with IOG and Catalyst Team.

Gathering feedback from Proposal Assessor community, funded proposers, Catalyst coordinators and other such groups. Presentation at Townhall and full report as PDF.

20% of total budget (ADA 35,000)

[RESOURCES] Who is in the project team and what are their roles?

Thomas Wedler:

Experienced financial trader and entrepreneur. Ex-Shell, Vattenfall and Masefield senior futures and options trader. Individual floor trader at the Singapore Exchange. Tom has been building and deploying programs for automated market making and energy derivatives since 2014. 15 years of derivatives experience at multinational organizations, working closely with industry bodies, and a speaker at market conferences and workshops. Involved in crypto trading since 2014 and DeFi/oracles since 2018. Plutus Pioneer, Marlowe Pioneer and Atala PRISM Pioneer.

Thomas is a certified Superforecaster with the Good Judgment Project and winner of the inaugural Hybrid Forecasting Challenge at SAGE / University of Southern California.

https://www.linkedin.com/in/thomas-wedler-18960/

Role in Catalyst: Challenge Team (Fund 8-10), Sub-circle3, Catalyst Coordinators (funded proposers), Veteran Proposal Assessor, Reviewer in funded project milestone reporting (PoA pilot) Fund 9 & Fund 10

June Akra:

Sapient developer team: to provide UI front-end and API for the portal

Founding member of BlockCarbon, financial market expert and academic with vast experience in risk management, derivatives and commodities. Experience in various risk functions at a USD 2 billion AUM fund. Holder of a Master's degree in Investment with distinction, awarded the Draper Prize. Certificate in Quantitative Finance (CQF) alumnus, London. Experienced video editor and content creator with a combined 50,000 followers on social media, NFT collector and creator. Certified Python AI practitioner, Plutus Pioneer & Atala PRISM Pioneer.

https://www.linkedin.com/in/june-a-a3a0b4174

Role in Catalyst: Challenge Team (Fund 7-10), Sub-circle3, Catalyst Coordinators (funded proposers), Veteran Proposal Assessor, Reviewer in funded project milestone reporting (PoA pilot) Fund 9 & Fund 10

[BUDGET & COSTS] Please provide a cost breakdown of the proposed work and resources.

A. Project management and community reporting; dedicated social media and GitHub to be created for this purpose and populated for each milestone with the findings and minutes of meetings with Catalyst team members. 280 hours total @ 200 ada/hour = 42,000 ada

B. Implementation of open source database and data analytics tool, integration of LangChain and UX = 300 hours @ 200 ada/hour = 60,000 ada

C. On-demand data analysis after community feedback, 7 total bounties @ 10,000 ada = 70,000 ada

D. Miscellaneous costs such as domain registration, cloud hosting and UX third-party software (unrelated to the data tool and open source stack): 3,000 ada

[VALUE FOR MONEY] How does the cost of the project represent value for money for the Cardano ecosystem?

Our proposal leverages modern data analysis tools and generative AI, for example tools like LangChain built on top of an open source data collection of past and future funds, to scrutinize past funding rounds within the Catalyst ecosystem. The primary objective is to unveil hidden relationships and extract valuable insights from historical data. This section assesses the project's potential to deliver value for money, outlining key factors contributing to its efficacy and overall impact on the blockchain ecosystem. Key benefits include:

Utilization of cutting-edge data analysis tools and generative AI capabilities enhances the project's potential to uncover novel insights.

Automation of data analysis processes reduces manual efforts and accelerates the identification of patterns, enabling more efficient decision-making. The project aims to provide actionable insights derived from past funding rounds, facilitating informed decision-making for stakeholders in the blockchain ecosystem.

Ecosystem Development:

The project's outcomes have the potential to enable a more robust and informed Catalyst process and ecosystem, supporting sustainable growth and innovation. We expect the project to enhance Catalyst efficiency and fairness by learning from predictive models that can reveal hidden correlations and relationships, improving the accuracy of forecasting the impact of rule changes and future funding trends.

More Informed Decision-Making & Enhanced Collaboration:

Proposers can leverage the insights generated to make data-driven decisions, leading to more successful funding strategies and project developments. The project's findings may encourage collaboration within the blockchain community, fostering an environment conducive to shared insights and collective growth.

Combining for the best possible Human+AI Outputs:

Develop clear frameworks for interpreting AI-generated insights to ensure stakeholders can easily comprehend and act upon the information. The project's utilization of modern data analysis tools and generative AI, specifically LangChain, presents a compelling case for value creation within the blockchain ecosystem. By efficiently uncovering hidden relationships in past funding rounds, the project has the potential to significantly impact decision-making processes, foster collaboration, and contribute to the overall growth and resilience of the Catalyst ecosystem.
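As one possible shape of this Human+AI workflow, here is a minimal LangChain sketch. It assumes the langchain-openai integration and an OPENAI_API_KEY; the model name, prompt and proposal text are illustrative placeholders, and exact imports may differ between LangChain versions.

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Prompt asking the model to summarise potential gaming/duplication signals
# in a single proposal pulled from the open database (placeholder text below).
prompt = ChatPromptTemplate.from_template(
    "Summarise possible signs of gaming, duplication or AI-generated filler "
    "in the following Catalyst proposal, citing the relevant passages:\n\n{proposal_text}"
)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = prompt | llm  # LangChain Expression Language pipeline

result = chain.invoke({"proposal_text": "...proposal text from the open database..."})
print(result.content)  # human reviewers interpret and act on this summary
```

The point is not the specific model but the framework: AI-generated summaries feed a clearly documented interpretation step so that stakeholders, not the model, make the final call.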


Example of community members analyzing Catalyst data: VPA Telegram group contribution by Victor Corcino (https://t.me/victorcorcino)

We need a lot more of this, across funds, and leveraging open source data mining and AI packages to the standard of 2023.
