by Marco Elumba

Background and Objectives

This project was initiated to support business expansion and to increase the visibility of product offerings to potential clients. The objective was to develop an automated system that collects key data from an online analytics platform for a predefined list of potential leads and uses that data to generate customized emails for targeted outreach. The solution is built on Google Cloud Platform (GCP) using Python and AI tools. It runs at set intervals, so data is collected consistently and each lead receives email content tailored to its specific needs and online activity.

Process and Tools

The process is divided into two main workflows, data extraction and targeted email generation. Each uses several functions and tools within the Google Cloud Platform (GCP) environment.

Process Diagram

flowchart TB

%% Colors %%
linkStyle default stroke-width:2px
classDef blue fill:#2374f7,stroke:#000,stroke-width:2px,color:#fff
classDef orange fill:#fc822b,stroke:#000,stroke-width:2px,color:#fff
classDef green fill:#16b552,stroke:#000,stroke-width:2px,color:#fff
classDef red fill:#ed2633,stroke:#000,stroke-width:2px,color:#fff
classDef magenta fill:magenta,stroke:#000,stroke-width:2px,color:#fff

%% Tables %%
%% 0 %%
L[(*leads*)]:::blue
C[(*customers*)]:::blue
EH[(*leads_extract_history*)]:::blue
LS[(*leads_scraped_data*)]:::blue
LE[(leads_email)]:::blue

%% Scrape %%
%% 1,2,3,4 %%
L ---o |Select| ED(Extract Data):::orange 
C ---o |Select| ED(Extract Data):::orange 
ED(Extract Data):::orange ---> |Append| EH
EH ---o |Select| ED(Extract Data):::orange
ED(Extract Data):::orange ---> |Append| LS

%% Generate Email %%
%% 5,6 %%
LS ---o |Select| GE([Generate Email]):::red
GE([Generate Email]):::red ---> |Append| LE

%% View %%
%% 9 %%
C -..- |Select| FL[future_leads.view] -..- |Select| LE

%% Link Colors %%
linkStyle 0 stroke:blue
linkStyle 2,5 stroke:green
linkStyle 3,6 stroke:red
linkStyle 4 stroke:magenta

%% Clickable Links %%
click ED "https://www.notion.so/Leads-generation-workflow-110e8405570a80918970ed288cf34488?pvs=4#110e8405570a801097c8f6935a89964a"
click GE "https://www.notion.so/Leads-generation-workflow-110e8405570a80918970ed288cf34488?pvs=4#110e8405570a80818696df699740c18e"
click FL "https://www.notion.so/Leads-generation-workflow-110e8405570a80918970ed288cf34488?pvs=4#110e8405570a807e869dc6579c685359"
click C "https://www.notion.so/Leads-email-generation-workflow-110e8405570a80918970ed288cf34488?pvs=4#111e8405570a8052b51bf804ad891f1b"
click L "https://www.notion.so/Leads-email-generation-workflow-110e8405570a80918970ed288cf34488?pvs=4#111e8405570a80c0bddbc7ae3d291faf"
click EH "https://www.notion.so/Leads-email-generation-workflow-110e8405570a80918970ed288cf34488?pvs=4#111e8405570a80ffb2a3ecd57e902eba"
click LS "https://www.notion.so/Leads-email-generation-workflow-110e8405570a80918970ed288cf34488?pvs=4#111e8405570a80599773c6875e07620a"
click LE "https://www.notion.so/Leads-email-generation-workflow-110e8405570a80918970ed288cf34488?pvs=4#111e8405570a8034b06aced72a742245"
click FL "https://www.notion.so/Leads-email-generation-workflow-110e8405570a80918970ed288cf34488?pvs=4#111e8405570a80a89f2ec0258c4646d5"

1. Data Extraction

This process is designed to extract data for a list of leads. The steps involved are:

Step 1: Load Leads into BigQuery (BQ)
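
Step 1 is not detailed further here. The following minimal sketch cleans a raw list of leads and appends it to BigQuery with pandas_gbq.to_gbq. It assumes that the leads arrive as URLs or bare domains and that the *leads* table has a single domain column; both are assumptions. The helper names normalize_domain, prepare_leads and load_leads, the dataset name leads_dataset and the project ID are hypothetical.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def normalize_domain(raw: str) -> str:
    """Reduce a URL or bare domain to a lowercase host without 'www.'."""
    text = raw.strip()
    if not text:
        return ""
    if "://" not in text:
        text = "//" + text  # make urlsplit treat a bare domain as the host
    host = urlsplit(text).hostname or ""  # .hostname is already lowercased
    return host[4:] if host.startswith("www.") else host


def prepare_leads(raw_domains: Iterable[str]) -> List[str]:
    """Normalize, drop blanks and de-duplicate while keeping input order."""
    seen = set()
    leads = []
    for raw in raw_domains:
        domain = normalize_domain(raw)
        if domain and domain not in seen:
            seen.add(domain)
            leads.append(domain)
    return leads


def load_leads(raw_domains: Iterable[str], project_id: str,
               table: str = "leads_dataset.leads") -> int:
    """Append the cleaned leads to BigQuery and return how many were sent.

    The table name and the single 'domain' column are hypothetical.
    """
    # Third-party imports live here so the helpers above run without them.
    import pandas as pd
    import pandas_gbq

    df = pd.DataFrame({"domain": prepare_leads(raw_domains)})
    pandas_gbq.to_gbq(df, table, project_id=project_id, if_exists="append")
    return len(df)
```

Normalizing before loading means that variants such as "https://www.example.com/" and "example.com" end up as a single lead. Otherwise the same site could be scraped twice later on.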

Step 2: Create Cloud Functions for Data Extraction

Tools: Google Cloud Functions, Python, Firecrawl package

A Python-based Google Cloud Function extracts the data with the Firecrawl package. The process includes the following functions:

 Python function

get_leads Python function

 Python function

scrape_firecrawl Python function
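
The sketch below shows one rough way to wire the named functions together. It is not the project's implementation. It assumes the firecrawl-py package and its FirecrawlApp.scrape_url method. The page scraped for each lead (for example, a SimilarWeb page) is not specified here, so scrape_firecrawl takes the full URL as a parameter. run_extraction, the row columns and the injected callables are hypothetical. Passing in get_leads, the scraper and the BigQuery writer as arguments keeps the run logic testable without cloud credentials.

```python
import datetime as dt
from typing import Callable, Dict, List, Optional

Row = Dict[str, str]


def scrape_firecrawl(url: str, api_key: str) -> str:
    """Scrape one page with Firecrawl and return its content as text.

    Assumes firecrawl-py's FirecrawlApp.scrape_url; its return type differs
    between releases, so both known shapes are handled below.
    """
    from firecrawl import FirecrawlApp  # third-party; imported lazily

    result = FirecrawlApp(api_key=api_key).scrape_url(url)
    if isinstance(result, dict):  # older releases return a plain dict
        return result.get("markdown") or result.get("content") or ""
    return getattr(result, "markdown", None) or ""  # newer releases: an object


def run_extraction(
    get_leads: Callable[[], Optional[str]],
    scrape: Callable[[str], str],
    append_rows: Callable[[str, List[Row]], None],
    now: Optional[dt.datetime] = None,
) -> Optional[Row]:
    """One scheduled run: pick a domain, log it, scrape it, store the result."""
    domain = get_leads()
    if domain is None:
        return None  # every lead has already been processed
    ts = (now or dt.datetime.now(dt.timezone.utc)).isoformat()
    # Log the domain before scraping so a failing site is not picked again.
    append_rows("leads_extract_history", [{"domain": domain, "extracted_at": ts}])
    row = {"domain": domain, "content": scrape(domain), "scraped_at": ts}
    append_rows("leads_scraped_data", [row])
    return row
```

In a deployment, Cloud Scheduler would call the function over HTTP with the cron expression `*/15 * * * *` (every 15 minutes). The entry point would then pass in the real get_leads, a lambda that builds the URL and calls scrape_firecrawl, and a writer that appends rows to BigQuery.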

The function is scheduled to run every 15 minutes with Google Cloud Scheduler. On each run it checks for new domains to scrape and stores the extracted data in the *leads_scraped_data* table. All leads could instead be processed in a single run using multiple threads, but running at fixed intervals keeps the crawling rate low, which is more ethical toward the sites being crawled. A second safeguard ensures that random domain selection never picks the same domain twice. Every selected domain is recorded in an intermediary table, *leads_extract_history*, and a domain is only selected if it does not already exist in that table.
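
The selection rule above amounts to an anti-join against *leads_extract_history*. Below is a possible BigQuery query with an in-memory equivalent. The project and dataset names are placeholders, and this version of get_leads is hypothetical. Excluding domains found in *customers* is an assumption based on the diagram's Select link from that table. The query uses NOT EXISTS rather than NOT IN because NOT IN returns no rows at all when the subquery contains a NULL domain.

```python
import random
from typing import Iterable, Optional

# Hypothetical project/dataset names; the table names follow the diagram.
SELECT_UNSCRAPED_SQL = """
SELECT l.domain
FROM `my-project.leads_dataset.leads` AS l
WHERE NOT EXISTS (
    SELECT 1 FROM `my-project.leads_dataset.leads_extract_history` AS h
    WHERE h.domain = l.domain)
  AND NOT EXISTS (
    SELECT 1 FROM `my-project.leads_dataset.customers` AS c
    WHERE c.domain = l.domain)
ORDER BY RAND()
LIMIT 1
"""


def pick_unscraped_domain(
    leads: Iterable[str],
    history: Iterable[str],
    customers: Iterable[str] = (),
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """In-memory equivalent of SELECT_UNSCRAPED_SQL."""
    excluded = set(history) | set(customers)
    # Sorted so that a seeded rng gives reproducible picks.
    candidates = sorted(set(leads) - excluded)
    if not candidates:
        return None
    return (rng or random).choice(candidates)


def get_leads() -> Optional[str]:
    """Run the query in BigQuery and return one unscraped domain, or None."""
    from google.cloud import bigquery  # third-party; imported lazily

    rows = list(bigquery.Client().query(SELECT_UNSCRAPED_SQL).result())
    return rows[0]["domain"] if rows else None
```

Each picked domain is written to *leads_extract_history* straight away, so the next scheduled run's query no longer returns it. When every lead has been processed, both functions return None.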

Final Output of Data Extraction:

2. Targeted Email Generation

Tools: pandas_gbq, Python, OpenAI API, BigQuery

A Python-based Google Cloud Function is created to generate targeted emails based on the extracted data from SimilarWeb. The process includes the following functions: