Website Scraper

Extract content from websites for further processing and analysis.

What is the Website Scraper node?

Image showing a software interface labeled "Website Scraper". It includes a dropdown menu labeled "Source" with "Custom URL" selected. Below is a text field labeled "Custom URL" with the placeholder text "Provide an URL to scrape". Icons for settings and information are present at the top.

The Website Scraper node extracts content from a website at a given URL. The URL can either be entered directly in the node (Custom URL) or supplied by another node in the workflow (External URL). The scraped content is then available to downstream nodes for further processing or analysis.

How to use it?

To set up and use the Website Scraper node, follow these steps:

Add the Website Scraper Node to Your Workflow:
- Drag and drop the Website Scraper node from the tool panel to your workflow canvas.
Configure the Node:
- Source: Choose where the URL to scrape comes from.
  - Custom URL: Enter the URL manually.
  - External URL: Use a URL provided by another node.
- Custom URL: If you selected "Custom URL" as the source, enter the URL of the website you want to scrape.
Connect Input Anchors (if applicable):
- If you selected "External URL" as the source, connect an appropriate node that outputs a URL to the "URL" input anchor of the Website Scraper node.
Connect Output Anchors:
- Connect the "Website Content" output anchor to the next node in your workflow where the scraped content will be processed or analyzed.
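To make the idea of "scraped content" more concrete, here is a minimal Python sketch of what a scrape step does conceptually: download a page, then keep only its visible text. This is not the node's actual implementation, which this page does not describe. The names `VisibleTextParser`, `fetch_html`, and `extract_text` are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Downloads a page and decodes it (hypothetical helper, needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_text(html: str) -> str:
    """Returns the visible text of an HTML document, one text node per line."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Calling `extract_text(fetch_html(url))` would then yield plain text roughly comparable to what a downstream node receives from the "Website Content" output, though the real node may clean or structure the content differently.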

Example of Usage

Example Task: Scraping a News Article for Sentiment Analysis

Objective: Set up a workflow to scrape a news article and analyze its sentiment.

Step-by-Step Guide

Text Input:
- Add a Text Input node to your canvas.
- Configure it to accept a URL from the user.
Website Scraper Node:
- Add the Website Scraper node to the canvas.
- Set the Source to "External URL."
- Connect the "Text" output of the Text Input node to the "URL" input of the Website Scraper node.
Text Prompt Node:
- Add the Text Prompt node to the canvas.
- Add a prompt template that guides the LLM through the sentiment analysis. You can create your own template or use the following one:

**Context:**
You have the text output from a website that has been scraped. The goal is to analyze the sentiment expressed in the text to determine whether it is positive, negative, or neutral.
**Input:**
The text content extracted from the scraped website. This text may include articles, comments, reviews, or any other form of written content found on the website.

**Task:**

1. **Sentiment Classification:**
   - Classify each sentence or paragraph in the text as having positive, negative, or neutral sentiment.
2. **Sentiment Score:**
   - Assign a sentiment score to each sentence or paragraph. The score can range from -1 (very negative) to +1 (very positive), with 0 being neutral.
3. **Overall Sentiment:**
   - Provide an overall sentiment classification and score for the entire text based on the individual sentence/paragraph analyses.
4. **Key Sentiment Phrases:**
   - Identify key phrases or sentences that contribute most to the overall sentiment of the text. Highlight these phrases and indicate their individual sentiment classification and score.

**Output:**

- A summary of the sentiment classification and scores for each sentence or paragraph.
- The overall sentiment classification and score for the entire text.
- A list of key sentiment phrases with their classifications and scores.

**Example Input:**

Text: "The product is amazing! I love the quality and the design. However, the customer service was disappointing. I had to wait for an hour to get a response. Overall, it's a good product, but the service needs improvement."

**Example Output:**

Sentence Analysis:

1. "The product is amazing!" - Positive (Score: +0.9)
2. "I love the quality and the design." - Positive (Score: +0.8)
3. "However, the customer service was disappointing." - Negative (Score: -0.7)
4. "I had to wait for an hour to get a response." - Negative (Score: -0.6)
5. "Overall, it's a good product, but the service needs improvement." - Neutral (Score: +0.1)

Overall Sentiment:
Classification: Mixed
Score: +0.3

Key Sentiment Phrases:

1. "The product is amazing!" - Positive (Score: +0.9)
2. "However, the customer service was disappointing." - Negative (Score: -0.7)

## The Scraped Website

<input-0>

OpenAI Node for Sentiment Analysis:
- Add an OpenAI node to the canvas.
- Connect the "Website Content" output of the Website Scraper node to the input of the OpenAI node, which performs the sentiment analysis.
Output Node:
- Add an Output node to display the sentiment analysis results.
- Connect the output of the OpenAI node to the input of the Output node.

A flowchart with five connected boxes on a dark background. The boxes contain text, website scraper, text input, OpenAI LLM, and output. Lines connect the boxes, showing a process flow from left to right, ending at the output box. Each box has various settings and variables displayed.

Summary:

Text Input Node: Accepts a URL.
Website Scraper Node: Scrapes content from the URL.
Text Prompt Node: Provides the sentiment analysis prompt template.
OpenAI Node: Analyzes the sentiment of the scraped content.
Output Node: Displays the analysis results.
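If the results need further processing outside the workflow, the per-sentence lines in the example output above follow a regular pattern that can be parsed. The sketch below assumes the LLM reproduces exactly the format shown in the template's example output (`1. "sentence" - Label (Score: +0.9)`), which an LLM is not guaranteed to do. `parse_sentence_scores` is a hypothetical helper name.

```python
import re

# Matches lines such as: 1. "The product is amazing!" - Positive (Score: +0.9)
LINE_PATTERN = re.compile(
    r'^\s*\d+\.\s*"(?P<text>.+)"\s*-\s*(?P<label>\w+)\s*'
    r'\(Score:\s*(?P<score>[+-]?\d+(?:\.\d+)?)\)\s*$'
)


def parse_sentence_scores(output: str):
    """Returns (sentence, label, score) tuples for every line that matches."""
    results = []
    for line in output.splitlines():
        match = LINE_PATTERN.match(line)
        if match:
            results.append(
                (match["text"], match["label"], float(match["score"]))
            )
    return results
```

Lines that do not match the pattern, such as section headings, are silently skipped. Because the "Key Sentiment Phrases" section uses the same line format, pass only the "Sentence Analysis" section if you want to avoid duplicates.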

Additional Information

Customization: You can extend this setup by adding more nodes for further processing, such as storing the scraped content in a database or triggering alerts based on specific content.
Error Handling: Ensure that the URLs provided are valid and accessible to avoid errors during scraping. You might want to add error-handling nodes or checks to manage potential issues like inaccessible URLs or unexpected content formats.
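As an illustration of the kind of check mentioned under Error Handling, the following sketch tests whether a string looks like a scrapable web address before it is passed on. It only inspects the URL's syntax and does not confirm that the site is reachable. `is_scrapable_url` is a hypothetical helper and not part of the Website Scraper node.

```python
from urllib.parse import urlparse


def is_scrapable_url(url: str) -> bool:
    """Returns True if the string is an http(s) URL with a host part."""
    try:
        parsed = urlparse(url.strip())
    except ValueError:
        # urlparse raises ValueError for some malformed inputs,
        # for example an unterminated IPv6 host such as "http://[bad".
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

A check like this catches common input mistakes, such as a missing `https://` prefix, early. Issues like timeouts, blocked requests, or unexpected content formats still have to be handled when the page is actually fetched.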

By following these steps, you can easily set up a workflow that scrapes website content and processes it according to your specific needs.
