Throughout 2022, I collaborated with over 200 websites, using AI text generation.
Although I have early access to GPT-4, an NDA and OpenAI's policies currently prevent me from using it in production.
Securing an investor for this substantial project means we start with unlimited OpenAI credits and a fine-tuned GPT-3 with no rate limiting.
The pivotal aspect of this project is rapid development: the investor has set the goal of reaching an MVP by the end of February. While I retain all rights to the app, I am obligated to deliver a solution tailored to the investor's needs (focused on e-commerce/local AI content) by month-end.
For text generation, I will use the GPT-3 model "text-davinci-003," and for embeddings, the "text-embedding-ada-002" model.
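For anyone following along, the two model calls can be sketched with the legacy (pre-1.0) openai Python client. The prompt text and the local cosine-similarity helper are my own illustrations, not the app's actual prompts:

```python
import math

# Hedged sketch: the commented calls use the legacy openai client,
# matching the two models named above; they are not executed here.
#
# import openai
# completion = openai.Completion.create(
#     model="text-davinci-003",
#     prompt="Write an introduction about espresso machines.",
#     max_tokens=256, temperature=0.7)
# text = completion["choices"][0]["text"]
# emb = openai.Embedding.create(
#     model="text-embedding-ada-002", input=text)["data"][0]["embedding"]

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

The embeddings are what make internal linking and keyword clustering possible later on.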
As of writing this, the article quality generated by my app is akin to that of a proficient Upwork writer. I plan to share a sample soon to illustrate our current standing in terms of article quality.
My articles encompass jokes, anecdotes, internal links, citations with external links, lists, definitions, counterarguments, examples, statistics, historical facts, and quotes.
With the advent of GPT-3 and ChatGPT, the internet is evolving rapidly, demanding astuteness in navigating these changes.
This is the opportune moment to elevate this venture, particularly as I observe Google adapting and penalizing thin content rather than AI-generated content as such. Notable Google volatility over the past year has impacted both AI and non-AI sites.
Today, I aim to present my latest (possibly last?) project in meticulous detail. The coding process commenced from scratch on January 30th.
Presenting Turing Sites
I've named this initiative "Turing Sites," inspired by the Turing test, with the objective of making it hard for humans to discern whether a website was crafted by a human or by AI.
It's crucial to note that this isn't a promotion for a new service. This application won't evolve into a subscription-based SaaS. The prospect of managing customer support for a complex app is the last thing I desire in my life.
Embarking on a Coding Odyssey
Prepare for a journey deep into the realms of programming, adorned with a touch of nerdy enthusiasm.
This marks the next chapter in the evolution of my initial app, a project that consumed over a year of my efforts. You can trace the origins of this expedition in my year-long journal found here: link to the journey.
The first app eventually morphed into a labyrinthine tangle, necessitating a complete refactoring to ascend to the next echelon, and now I find myself here.
In this narrative, I'll delve into the intricate process of constructing a robust piece of software. Expect insights into scientific papers that have influenced my solutions. I'm eager to share specific code snippets and tricks employed in this venture, open to divulging approximately 90% of the knowledge. Be aware that my responses will be limited to what I can share within the constraints of my business arrangement.
The Core Concept
Allow me to encapsulate the essence of the project through a few fundamental concepts:
The primary application is designed to autonomously generate and manage evolving WordPress sites with minimal human intervention. Notably, traditional webservers will not be utilized. Instead, the app will locally oversee the WordPress instances, pushing static HTML to Cloudflare Pages and images to Cloudflare R2, an S3-compatible service.
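Since R2 is S3-compatible, the image push can be done with boto3 against R2's S3 endpoint. A minimal sketch — the key layout, account ID, bucket name, and credential variables are all placeholders of mine, not the app's real values:

```python
from pathlib import Path

def r2_key(local_path, site_domain):
    """Map a local image file to its R2 object key, namespaced per site.
    Illustrative layout; the real key scheme is an assumption."""
    return f"{site_domain}/images/{Path(local_path).name}"

# Hedged: the actual upload via boto3 against R2's S3-compatible endpoint.
# ACCOUNT_ID, R2_KEY, R2_SECRET and the bucket name are placeholders.
#
# import boto3
# s3 = boto3.client(
#     "s3",
#     endpoint_url=f"https://{ACCOUNT_ID}.r2.cloudflarestorage.com",
#     aws_access_key_id=R2_KEY,
#     aws_secret_access_key=R2_SECRET)
# s3.upload_file("out/hero.webp", "images-bucket",
#                r2_key("out/hero.webp", "example.com"))
```

Static HTML goes to Cloudflare Pages separately (e.g. via a direct-upload deploy), so no web server ever runs in front of the sites.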
Our primary focus is content quality and E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness).
An AI agent will scrutinize user signals from the sites and Google Search Console, leveraging this information to enhance site performance.
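One simple signal such an agent can act on: pages whose click-through rate falls far below what their average Search Console position would suggest. The expected-CTR numbers below are rough illustrative figures of mine, not benchmark data:

```python
# Rough position -> CTR expectations; illustrative numbers, not benchmarks.
EXPECTED_CTR = {1: 0.28, 2: 0.15, 3: 0.10, 4: 0.07, 5: 0.05}

def underperformers(rows, threshold=0.5):
    """Flag pages whose CTR is below `threshold` of the rough expectation
    for their average position. `rows` mimics GSC query output:
    (page, clicks, impressions, position)."""
    flagged = []
    for page, clicks, impressions, position in rows:
        if impressions == 0:
            continue
        expected = EXPECTED_CTR.get(round(position), 0.02)
        actual = clicks / impressions
        if actual < threshold * expected:
            flagged.append(page)
    return flagged
```

Flagged pages become candidates for a title/meta rewrite or a content refresh.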
The objective is to create and manage 1,000 websites on a single dedicated server; this target assumes a 64-core machine and serves as a general benchmark.
The sites will be established on aged or expired domains featuring DR80+ do-follow backlinks.
Monetization will be achieved through Mediavine. Leveraging Mediavine Pro, I can incorporate sites once they reach 25k monthly visits.
Upon establishing a robust earnings history for a site, the intention is to sell it. The overarching idea is that each of the 1,000 sites will generate an average of $1k per month, recognizing a potential bell curve in performance.
Having worked with AI-generated websites since the GPT-2 private beta release, I was able to complete a substantial portion of the code swiftly. However, a few weeks are still required to implement all the new ideas.
Despite being a lone wolf, I have enlisted the support of one virtual assistant (VA), a programmer, to assist with tasks like refactoring and testing.
Efforts are underway to introduce a frontend for enhanced user convenience. This is in contrast to my previous command-line-based app, which was characterized by convoluted spaghetti code.
The Technological Framework
I'm developing the entire system using Python 3.10.9, as certain fundamental libraries are not compatible with Python 3.11.
- Metronic Bootstrap (HTML, CSS, JS)
- Jinja2 (Template Renderer)
- Flask (Web Server Gateway Interface)
- *Gunicorn (Webserver)
- *Nginx (Reverse Proxy)
- Python Backend
- GPT LLM (Generative Pre-trained Transformer Large Language Model)
- GPT EM (Generative Pre-trained Transformer Embeddings Model)
- Local Database
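To show how the Flask/Jinja2 layers of the stack fit together, here is a minimal dashboard route. The inline template stands in for a real Metronic page; the route, title, and counter are illustrative only:

```python
from flask import Flask, render_template_string

app = Flask(__name__)

# Inline template standing in for a Metronic/Jinja2 page; layout is illustrative.
PAGE = "<h1>{{ title }}</h1><p>{{ site_count }} sites managed</p>"

@app.route("/")
def dashboard():
    # Jinja2 renders the template; Flask serves it via Gunicorn behind Nginx.
    return render_template_string(PAGE, title="Turing Sites", site_count=0)
```

In production this sits behind Gunicorn and Nginx as listed above; the built-in dev server is only for local work.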
The Present Condition of the Database Schema
This is subject to change as the app progresses. Presently, I'm disclosing the complete database schema in PostgreSQL format here:
Code:
CREATE TABLE "author" (
"id" SERIAL PRIMARY KEY,
"wp_id" TEXT NOT NULL,
"username" TEXT NOT NULL,
"firstname" TEXT NOT NULL,
"lastname" TEXT NOT NULL,
"nickname" TEXT NOT NULL,
"bio" TEXT NOT NULL,
"email" TEXT NOT NULL,
"website" TEXT NOT NULL,
"twitter" TEXT NOT NULL,
"facebook" TEXT NOT NULL
);
CREATE TABLE "cluster" (
"id" SERIAL PRIMARY KEY
);
CREATE TABLE "credential" (
"id" SERIAL PRIMARY KEY,
"type" TEXT NOT NULL,
"username" TEXT NOT NULL,
"password" TEXT NOT NULL,
"api_key" TEXT NOT NULL,
"api_secret" TEXT NOT NULL
);
CREATE TABLE "language_model" (
"id" SERIAL PRIMARY KEY,
"name" TEXT NOT NULL,
"model" TEXT NOT NULL
);
CREATE TABLE "prompt_chain" (
"id" SERIAL PRIMARY KEY
);
CREATE TABLE "similarity_embedding" (
"id" SERIAL PRIMARY KEY,
"string" TEXT NOT NULL
);
CREATE TABLE "social_media" (
"id" SERIAL PRIMARY KEY,
"type" TEXT NOT NULL,
"username" TEXT NOT NULL,
"password" TEXT NOT NULL,
"api_key" TEXT NOT NULL,
"api_secret" TEXT NOT NULL,
"author" INTEGER NOT NULL
);
CREATE INDEX "idx_social_media__author" ON "social_media" ("author");
ALTER TABLE "social_media" ADD CONSTRAINT "fk_social_media__author" FOREIGN KEY ("author") REFERENCES "author" ("id") ON DELETE CASCADE;
CREATE TABLE "vector" (
"id" SERIAL PRIMARY KEY,
"string" TEXT NOT NULL,
"cosine_similarity" DOUBLE PRECISION,
"similarity_embedding" INTEGER NOT NULL
);
CREATE INDEX "idx_vector__similarity_embedding" ON "vector" ("similarity_embedding");
ALTER TABLE "vector" ADD CONSTRAINT "fk_vector__similarity_embedding" FOREIGN KEY ("similarity_embedding") REFERENCES "similarity_embedding" ("id") ON DELETE CASCADE;
CREATE TABLE "website" (
"id" SERIAL PRIMARY KEY,
"domain" TEXT UNIQUE NOT NULL,
"wp_username" TEXT NOT NULL,
"wp_password" TEXT NOT NULL,
"api_username" TEXT NOT NULL,
"api_password" TEXT NOT NULL,
"name" TEXT NOT NULL,
"tagline" TEXT NOT NULL,
"email" TEXT NOT NULL,
"topic" TEXT NOT NULL,
"description" TEXT NOT NULL,
"language" TEXT NOT NULL,
"proxy" TEXT NOT NULL,
"status" TEXT NOT NULL,
"language_model" INTEGER NOT NULL
);
CREATE INDEX "idx_website__language_model" ON "website" ("language_model");
ALTER TABLE "website" ADD CONSTRAINT "fk_website__language_model" FOREIGN KEY ("language_model") REFERENCES "language_model" ("id") ON DELETE CASCADE;
CREATE TABLE "api_key" (
"id" SERIAL PRIMARY KEY,
"type" TEXT NOT NULL,
"api_key" TEXT NOT NULL,
"in_use" BOOLEAN,
"website" INTEGER
);
CREATE INDEX "idx_api_key__website" ON "api_key" ("website");
ALTER TABLE "api_key" ADD CONSTRAINT "fk_api_key__website" FOREIGN KEY ("website") REFERENCES "website" ("id") ON DELETE SET NULL;
CREATE TABLE "category" (
"id" SERIAL PRIMARY KEY,
"name" TEXT NOT NULL,
"description" TEXT NOT NULL,
"website" INTEGER NOT NULL
);
CREATE INDEX "idx_category__website" ON "category" ("website");
ALTER TABLE "category" ADD CONSTRAINT "fk_category__website" FOREIGN KEY ("website") REFERENCES "website" ("id") ON DELETE CASCADE;
CREATE TABLE "keyword" (
"id" SERIAL PRIMARY KEY,
"keyword" TEXT NOT NULL,
"category" INTEGER NOT NULL,
"volume" INTEGER,
"difficulty" INTEGER,
"language" TEXT NOT NULL,
"cpc" DOUBLE PRECISION,
"cluster" INTEGER
);
CREATE INDEX "idx_keyword__category" ON "keyword" ("category");
CREATE INDEX "idx_keyword__cluster" ON "keyword" ("cluster");
ALTER TABLE "keyword" ADD CONSTRAINT "fk_keyword__category" FOREIGN KEY ("category") REFERENCES "category" ("id") ON DELETE CASCADE;
ALTER TABLE "keyword" ADD CONSTRAINT "fk_keyword__cluster" FOREIGN KEY ("cluster") REFERENCES "cluster" ("id") ON DELETE SET NULL;
CREATE TABLE "article" (
"id" SERIAL PRIMARY KEY,
"title" TEXT NOT NULL,
"content" TEXT NOT NULL,
"posted" TEXT NOT NULL,
"type" TEXT NOT NULL,
"keyword" INTEGER NOT NULL,
"author" INTEGER NOT NULL
);
CREATE INDEX "idx_article__author" ON "article" ("author");
CREATE INDEX "idx_article__keyword" ON "article" ("keyword");
ALTER TABLE "article" ADD CONSTRAINT "fk_article__author" FOREIGN KEY ("author") REFERENCES "author" ("id") ON DELETE CASCADE;
ALTER TABLE "article" ADD CONSTRAINT "fk_article__keyword" FOREIGN KEY ("keyword") REFERENCES "keyword" ("id") ON DELETE CASCADE;
CREATE TABLE "image" (
"id" SERIAL PRIMARY KEY,
"name" TEXT NOT NULL,
"url" TEXT NOT NULL,
"article" INTEGER NOT NULL
);
CREATE INDEX "idx_image__article" ON "image" ("article");
ALTER TABLE "image" ADD CONSTRAINT "fk_image__article" FOREIGN KEY ("article") REFERENCES "article" ("id") ON DELETE CASCADE;
CREATE TABLE "prompt" (
"id" SERIAL PRIMARY KEY,
"name" TEXT NOT NULL,
"prompt" TEXT NOT NULL,
"tokens" INTEGER,
"temperature" DOUBLE PRECISION,
"best_of" INTEGER,
"website" INTEGER NOT NULL,
"prompt_chain" INTEGER
);
CREATE INDEX "idx_prompt__prompt_chain" ON "prompt" ("prompt_chain");
CREATE INDEX "idx_prompt__website" ON "prompt" ("website");
ALTER TABLE "prompt" ADD CONSTRAINT "fk_prompt__prompt_chain" FOREIGN KEY ("prompt_chain") REFERENCES "prompt_chain" ("id") ON DELETE SET NULL;
ALTER TABLE "prompt" ADD CONSTRAINT "fk_prompt__website" FOREIGN KEY ("website") REFERENCES "website" ("id") ON DELETE CASCADE;
CREATE TABLE "serp" (
"id" SERIAL PRIMARY KEY,
"title" TEXT NOT NULL,
"url" TEXT NOT NULL,
"meta" TEXT NOT NULL,
"position" INTEGER,
"keyword" INTEGER NOT NULL
);
CREATE INDEX "idx_serp__keyword" ON "serp" ("keyword");
ALTER TABLE "serp" ADD CONSTRAINT "fk_serp__keyword" FOREIGN KEY ("keyword") REFERENCES "keyword" ("id") ON DELETE CASCADE;
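The `cluster`, `similarity_embedding`, and `vector` tables hint at how keywords get grouped: by embedding similarity. A greedy threshold-clustering sketch — the threshold and the toy two-dimensional "embeddings" are illustrative assumptions, not the app's actual logic:

```python
import math

def cosine(a, b):
    """Cosine similarity, matching the `cosine_similarity` column's intent."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def cluster_keywords(embeddings, threshold=0.85):
    """Greedy clustering: embeddings is {keyword: vector}; keywords similar
    enough to an existing cluster's first vector join it, otherwise they
    seed a new cluster. Returns {keyword: cluster_id}."""
    centroids, assignment = [], {}
    for kw, vec in embeddings.items():
        for cid, centroid in enumerate(centroids):
            if cosine(vec, centroid) >= threshold:
                assignment[kw] = cid
                break
        else:
            centroids.append(vec)
            assignment[kw] = len(centroids) - 1
    return assignment
```

Each resulting cluster id would map to a row in the `cluster` table, and each keyword's `cluster` foreign key gets set accordingly.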
References
The Python book that has been a significant influence is "Architecture Patterns with Python" by Harry Percival and Bob Gregory.
It delves into building an architecture supporting domain modeling and event-driven design in Python, emphasizing test-driven development, domain-driven design, and the dependency inversion principle. The book covers various topics, including encapsulation, abstractions, layering, repository patterns, aggregates, consistency boundaries, unit of work patterns, and message buses. It provides sample code, coding tips, and examples illustrating the use of events to integrate microservices.
Link: Architecture Patterns with Python
The AI paper that has left a profound impact is "Re3: Generating Longer Stories With Recursive Reprompting and Revision" by Kevin Yang, Yuandong Tian, Nanyun Peng, and Dan Klein. The paper proposes the Recursive Reprompting and Revision (Re3) framework for generating longer texts (over 2,000 words) with long-range plot coherence and relevance. The framework involves prompting a language model to construct an overarching plan, generating story passages, reranking continuations for coherence and relevance, and editing for factual consistency. Evaluation showed that Re3's texts exhibited a substantial increase in coherent overarching plots (14% absolute increase) and relevance to the initial premise (20% absolute increase) compared to texts generated directly from the language model.
Link: Re3: Generating Longer Stories With Recursive Reprompting and Revision
Key Insights and Lessons Learned:
- Long-range plot coherence and relevance are crucial challenges in generating longer texts.
- The Re3 framework provides a systematic approach to address these challenges through recursive reprompting and revision.
- Re3 can generate longer texts that are more coherent and relevant than texts generated directly from the language model.
- Integrating Re3 into content generation tools for more coherent and relevant longer texts.
- Improving language models by incorporating the Re3 framework.
- Personalizing story generation using Re3 based on user interests and preferences.
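The Re3 plan → draft → rerank → edit loop described above can be sketched with stubbed model calls. Every stub below is my placeholder for a real LLM prompt, not code from the paper or from my app:

```python
# Sketch of the Re3 recursive reprompting loop; all four stages are
# stubs standing in for LLM calls (assumption, for illustration only).
def plan(premise):
    """Stage 1: prompt the model for an overarching outline."""
    return [f"{premise}: part {i}" for i in range(1, 4)]

def draft(outline_item):
    """Stage 2: generate several candidate passages per outline item."""
    return [f"{outline_item} (candidate {c})" for c in "ab"]

def rerank(candidates, outline_item):
    """Stage 3: stub coherence/relevance scorer; a real one would use
    trained rerankers as in the paper."""
    return max(candidates, key=lambda c: outline_item in c)

def edit(passage):
    """Stage 4: stub revision pass for factual consistency."""
    return passage.strip()

def generate(premise):
    return " ".join(edit(rerank(draft(item), item)) for item in plan(premise))
```

The point of the structure is that each passage is generated against the plan plus prior passages, which is where the long-range coherence gains come from.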
Best regards