n8n in Action: Automated AI Newsletter Generation and Publishing
n8n is widely recognized as a core framework for "low-code × AI orchestration" in the Agentic AI ecosystem. It lets both developers and non-developers integrate traditional services (databases, HTTP APIs) with large language models and vector retrieval systems through drag-and-drop, quickly producing deployable automations.
This article demonstrates how to use n8n to build an "AI news collection and publishing" pipeline that automatically scrapes news from multiple tech media sources, generates structured Chinese newsletters through OpenDataSky large language models, and publishes them. The entire process requires no backend coding, only node configuration on the n8n canvas.
What is n8n?
n8n (pronounced n-eight-n, short for "nodemation", i.e. node + automation) is an open-source workflow automation and integration platform. It provides:
- Visual DAG Designer: Each node represents an API call, script execution, or AI inference
- 400+ Built-in Official/Community Nodes: Covering databases, messaging, cloud services, and LangChain AI components
- Conditional branching, error retry, centralized credential management, and version control
- JavaScript/TypeScript snippet support, balancing "no-code" with "customizable"
In essence, n8n is a freely assemblable "data + AI assembly line."
What Will This Tutorial Achieve?
Objective: One-click generation of "AI industry news from the past 24 hours" with automatic publishing. The workflow is illustrated as follows:
Scheduled Triggers:
- Every 4 hours: RSS scraping → feed into "Collection Pipeline"
- Daily at 09:00: Launch "Newsletter Generation Pipeline"
Collection Pipeline (Data Layer):
- Read RSS → Standardize JSON → Calculate MD5 for deduplication
- Embedding → Vector + metadata written to pgvector
Summary Pipeline (Semantic/Content Layer):
- Keyword Embedding → Similarity search for top-K chunks
- LLM Prompt: Sort, translate, Markdown formatting
- Select publishing platform and call API for publication
Final Result: A well-formatted, Chinese-readable AI newsletter without manual intervention.
Technology Stack Overview
- n8n 1.93.0 + @n8n/n8n-nodes-langchain
- PostgreSQL 14 + pgvector 0.6 (n8n built-in vector storage also available)
- OpenDataSky API (OpenAI SDK compatible); get an API Key from the platform
How to Use n8n?
1. Web version: the official n8n cloud service; log in via the n8n website. (Requires a paid subscription; new users get a 14-day free trial.)
2. Local deployment: deploy locally by following the official documentation.
n8n also supports deployment in DataSky AI Box. This article is based on practical experience with DataSky AI Box deployment.
Orchestrating the Workflow
The entire automation system consists of two main independent workflows:
- News collection workflow
- Topic-based newsletter generation workflow
Let's break down these two workflows separately.
First, log into the n8n interface and click "Create Workflow" to create a blank workflow.
News Collection Workflow
Configure Scheduled Trigger
Click "Add first step..." and select "On a schedule" to create a scheduled trigger.
You can also choose other triggers based on actual needs. For automated workflow execution, a trigger must be the first node.
Note: n8n defaults to a US timezone (America/New_York), so set your timezone to ensure the scheduler triggers as expected.
Open the "On a schedule" node to customize the timing configuration. Configure the trigger frequency as needed. Here we select triggering every 4 hours.
Alternative Triggers
You can also directly use the "RSS Feed Trigger". The RSS trigger also supports configuring trigger frequency.
Since the RSS Feed Trigger is awkward to extend with multiple news sources later, we recommend starting with the "On a schedule" trigger and configuring RSS sources in subsequent nodes. (If you only have one news source, the RSS Feed Trigger is fine.)
Set RSS News Source URLs
Add an "Edit Fields" node after the scheduler to define the RSS URL array. Configure as follows:
The type must be "Array". Here's the input template:
[
"https://feeds.arstechnica.com/arstechnica/index",
"https://www.theverge.com/rss/index.xml",
"...other RSS source URLs"
]
Click "Test Step" to test this node. The output should look like this:
RSS URLs can be found on most news media platforms.
Split Array
Split the array from the previous node into different items for subsequent RSS reading nodes:
Test output:
You can see the array has been successfully split, with 2 items.
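If you prefer a Code node over a dedicated split node, the same fan-out can be done in JavaScript. A minimal sketch, assuming the previous Edit Fields node stored the array in a field named rss as configured above:

// Code node ("Run Once for All Items"): emit one item per RSS URL
// so the downstream RSS Read node processes each feed separately.
return items[0].json.rss.map(url => ({ json: { rss: url } }));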
Read RSS News Sources
Add an RSS Read node to extract news content. In the URL field, enter the variable from the previous node, `$json.rss`. The RSS Read node will output the news items from each URL:
Test to get news content:
Extract Valid Fields
Add another "Edit Fields" node to extract the required valid fields for subsequent processing:
| Field Name | Field Type | Value (Variable) |
| --- | --- | --- |
| title | String | `$json.title` |
| content | String | `$json['content:encodedSnippet'] ?? $json.contentSnippet` |
| date | String | `$json.isoDate` |
| link | String | `$json.link` |
Test output result:
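For reference, here is the same mapping expressed as a Code node; it mirrors the Edit Fields configuration above rather than adding a new step:

// Code node equivalent of the Edit Fields mapping in the table above.
return items.map(item => ({
  json: {
    title: item.json.title,
    // Prefer the full encoded snippet; fall back to the plain snippet.
    content: item.json['content:encodedSnippet'] ?? item.json.contentSnippet,
    date: item.json.isoDate,
    link: item.json.link,
  },
}));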
Generate Hash Value (Optional)
This node generates an MD5 hash of each article title to serve as an identifier for deduplication:
Test result:
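One way to implement this step is a Code node. A minimal sketch, assuming your n8n instance allows the Node.js crypto built-in (via NODE_FUNCTION_ALLOW_BUILTIN); n8n's dedicated Crypto node achieves the same without code:

// Code node: fingerprint each article title with MD5 for deduplication.
// The dedup query later reads this field as {{ $json.md5 }}.
const crypto = require('crypto');
return items.map(item => ({
  json: {
    ...item.json,
    md5: crypto.createHash('md5').update(item.json.title).digest('hex'),
  },
}));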
Store News as Vector Data
Vector storage enables semantic search among large volumes of news, providing more topic-relevant articles. (If high precision isn't required, you can directly store in SQL and use simple keyword searches later)
Add a Vector Store node:
Besides the database node, vector storage requires three additional nodes: an embedding model node, a data loader, and a text splitter.
Configure Database Credentials
Open the Postgres PGVector Store node and add database credentials in "Credential to connect with":
Configure according to your database setup. Note: PostgreSQL needs the pgvector extension installed for vector storage; run `CREATE EXTENSION IF NOT EXISTS vector;` on the target database (this typically requires superuser privileges).
After adding database credentials, configure the Postgres PGVector Store node. Ensure the Table name doesn't conflict with existing database tables. The Embedding Batch Size parameter depends on the embedding model used.
Configure Embedding Model
Add the embedding model in the Embeddings node. First, add API credentials: select OpenDataSky API here, with Base URL `http://server.opendatasky.com/v1/api/open-ai/ds`, and get the API Key from the platform.
Note
When saving API credentials, n8n calls the `/modellist` endpoint to verify connectivity. Since OpenDataSky hasn't opened the `/modellist` endpoint yet, an error message will appear when you click save; ignore it. (This doesn't affect subsequent model calls.)
Enter the embedding model ID. Here we selected the Qwen-series embedding model `text-embedding-v3`; other models are also available. See the Model List for supported models.
Configure Data Loader
The data loader processes news data from previous nodes. Select JSON for data type.
Since this data will be vectorized, provide only the fields worth embedding (such as the article title and content). Set Mode to "Load Specific Data" and combine the news title and content in the Data field. Copy the template below:
# {{$json.title}}
{{ $json.content }}
Additionally, configure Metadata, which is stored as a `jsonb` column in PostgreSQL. Include title, hash, link, and time so this information isn't lost, and add other fields if needed (like createDate).
Click "Add Option" to configure Metadata according to the example:
| Name | Value |
| --- | --- |
| title | `$json.title` |
| createDate | `$now.toISO()` |
| publishDate | `$json.date` |
| link | `$json.link` |
| title_hash | `$json.md5` |

Select "Expression" when filling in the Value fields.
After adding title_hash to the metadata, you can create a PostgreSQL trigger that copies title_hash into its own column on every write, which makes querying easier. (You can also query the metadata JSON directly, without a trigger.)
Trigger example:
CREATE OR REPLACE FUNCTION extract_title_hash()
RETURNS trigger AS $$
BEGIN
NEW.title_hash := NEW.metadata ->> 'title_hash';
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_extract_title_hash
BEFORE INSERT OR UPDATE ON n8n_vectors_test
FOR EACH ROW EXECUTE FUNCTION extract_title_hash();
Note
Using this trigger requires the table to have a title_hash column. Create it manually with text type, for example `ALTER TABLE n8n_vectors_test ADD COLUMN title_hash text;`.
Configure Text Splitter
When adding the text splitter node, we recommend the "Recursive Character Text Splitter" type, which tends to split news along paragraph boundaries.
Set a reasonable Chunk Size in the node. No overlap between chunks is needed here, so set Chunk Overlap to 0. (Chunk Size shouldn't be so large that a chunk exceeds the embedding model's max tokens.)
Testing
After configuring vector storage according to the above steps, click "Test step" in the "Postgres PGVector Store" node.
Test result should be:
You can see articles have been stored in the database.
Current workflow nodes should look like:
Note
You can also replace Postgres PGVector Store with n8n's built-in "Simple Vector Store", but the Simple Vector Store isn't suited to persistent storage and can't support the deduplication step below, so it's not recommended here.
Article Deduplication (Optional)
Since we added the Generate Hash Value node earlier, we can use the generated hash values to filter duplicate articles.
Deduplication requires two nodes, "Postgres" and "Merge", for querying and merging respectively. First, disconnect the Generate Hash Value node from the "Postgres PGVector Store" node; the deduplicated output will be reconnected to the store node later.
Postgres Node
With the Generate Hash Value node's output as input, add a new node.
Select the Postgres node with the SQL query action. Don't confuse it with the "Postgres PGVector Store" node; "Postgres PGVector Store" and "Postgres" are different node types.
Continuing the workflow, enter the query below in this node to find which incoming hashes already exist:
SELECT DISTINCT title_hash FROM n8n_vectors_test WHERE title_hash IN ('{{ $json.md5 }}');
If you didn't create the title_hash trigger above (so the table has no title_hash column, only metadata), use this query for deduplication instead:
SELECT DISTINCT metadata->>'title_hash' AS title_hash
FROM n8n_vectors_test
WHERE metadata->>'title_hash' IN ('{{ $json.md5 }}');
Configuration and test results:
Since no new data was captured, all hash values in the test are duplicates, indicating correct output.
Merge Node
Add a Merge node with both Generate Hash Value and Postgres node as inputs.
As shown, connect Postgres node to Input 1 and Generate Hash Value node to Input 2.
The Merge node, set to its SQL query mode, filters out the duplicate articles:
Query statement:
SELECT *
FROM input2
WHERE md5 NOT IN (SELECT title_hash FROM input1);
Since the feed data hasn't changed, all articles are duplicates and no items pass through. You can add other RSS addresses and retest (or wait for the current feeds to publish new items).
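If you prefer JavaScript over the Merge node's SQL mode, a Code node can achieve the same filtering by referencing both upstream nodes by name. A sketch, assuming your canvas uses this article's node names ("Postgres" and "Generate Hash Value"); adjust them to match yours:

// Code node: drop articles whose title hash already exists in the database.
const existing = new Set($('Postgres').all().map(i => i.json.title_hash));
return $('Generate Hash Value').all().filter(i => !existing.has(i.json.md5));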
Connect the deduplicated news to the "Postgres PGVector Store" node to complete the entire news storage workflow.
Complete news collection workflow:
Retrieve Relevant News and Generate Newsletter
Configure Scheduled Trigger
Since this is a new independent workflow, start with configuring the trigger. Configuration is the same as News Collection Scheduled Trigger.
Add Vector Retrieval Node
Retrieve relevant news through vector search. Add Postgres PGVector Store node with Actions set to "Get ranked documents from vector store."
Select the database and table used by the news collection workflow. Write the Prompt according to the topic you want to retrieve, set an appropriate Limit to cap the number of returned articles, and enable the "Include Metadata" option.
We recommend complete sentences for the Prompt; they give more accurate semantic retrieval.
To filter by news time, we recommend creating a time-filtered view in PostgreSQL and entering the view name in the Table Name field so only qualifying news is retrieved.
View creation example (only provides news with publishDate within 24 hours):
CREATE OR REPLACE VIEW articles_last_24h AS
SELECT *
FROM n8n_vectors_test
WHERE (metadata->>'publishDate')::timestamptz >= NOW() - INTERVAL '24 hours';
Now enter `articles_last_24h` in the Table Name field of the Postgres PGVector Store node to retrieve only news from the last 24 hours.
Add Embedding
Connect this node's Embedding input to the "Embeddings OpenAI" node configured earlier in the Store News as Vector Data step. You can also create a new "Embeddings OpenAI" node, but it must use the same model as the earlier Embedding node; otherwise the query vectors won't be comparable with the stored ones.
Test result:
Due to the way n8n queries the vector store, the score represents Euclidean (L2) distance, so a lower score means higher similarity.
Filter High-Similarity Content/Chunks
The previous node retrieved the 50 most similar chunks (the number is set by Limit), but some of those 50 may still have low similarity. We therefore need a node that keeps only results meeting a similarity (score) threshold.
Add a Filter node, configured as shown:
Enter an appropriate value to filter out items whose score is above it. (A higher score means lower similarity. The example uses 0.5; adjust it based on actual testing.)
Test result:
Successfully filtered relevant news chunks with expected results.
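The same threshold can also be applied in a Code node if you want to combine it with other logic. A sketch, assuming each retrieved item carries the distance in a top-level score field, as the extraction snippet in the next step suggests:

// Keep only chunks whose L2 distance is at or below the threshold
// (lower score = more similar; 0.5 matches the Filter example above).
const THRESHOLD = 0.5;
return items.filter(item => item.json.score <= THRESHOLD);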
Extract Hash Values from Chunks (Optional)
The previous step outputs news chunks: the text splitter split long articles into several chunks, so sending chunks straight to the large model could give it incomplete articles. Instead, we can extract the matched hash values and use them to reassemble the complete articles.
Select the "Code" node with this example:
// Collect each matched chunk's title_hash and deduplicate.
const unique = [...new Set(items.map(i => i.json.document.metadata.title_hash))];
// Emit one item per unique hash for the SQL lookup in the next node.
return unique.map(hash => ({ json: { title_hash_unique: hash } }));
Output result:
Use Hash to Reassemble Complete Articles (Optional)
Add a PostgreSQL query node:
Use the hash output from the previous step to query the database and reassemble articles with a single SQL statement:
WITH parts AS (
SELECT
title_hash,
text,
metadata ->> 'link' AS link,
metadata ->> 'title' AS title,
metadata ->> 'publishDate' AS publish_date,
(metadata #>> '{loc,lines,from}')::int AS line_from,
(metadata #>> '{loc,lines,to}')::int AS line_to
FROM n8n_vectors_test
WHERE title_hash IN ('{{ $json.title_hash_unique }}')
)
SELECT
title,
string_agg(text, ' ' ORDER BY line_from, line_to) AS body,
link,
publish_date
FROM parts
GROUP BY
title_hash,
link,
title,
publish_date
ORDER BY
title_hash,
publish_date;
Output result:
TIP
If you set a large Chunk Size in the Store News step so that article content isn't split into multiple chunks, you can completely skip the Extract Hash and Reassemble Articles steps.
Merge Items
To send all articles to the large model in a single request, merge the items produced by the steps above into one item. Otherwise, the model node makes a separate request for every item.
Use the "Code" node again with example code:
const arr = items.map(item => item.json); // Get all output content
return [
{
json: {
myArrayStr: JSON.stringify(arr), // Convert to string
}
}
];
Output result:
You can see there's only one item now.
Generate Newsletter
Now you can send the news materials to the large model for summarization.
Add an AI Agent node with "Source for Prompt" set to "Define below" for custom prompts. Simple prompt example:
You are a professional news editor. Please summarize an AI-related news report based on the following materials:
{{ $json.myArrayStr }}
You also need to add a large model (Chat Model). Select "OpenAI Chat Model". Since OpenDataSky API credentials were provided when configuring the embedding model earlier, you only need to input the Model ID here.
See OpenDataSky supported Chat Models
Test output using `gpt-4.1-mini-2025-04-14`:
Optimize the prompt according to requirements to achieve expected results.
Send Newsletter
Subsequent steps are entirely based on requirements and preferences. Common methods include: using HTTP Request nodes to connect various platforms for automated sending, or using Send Email to send emails. You can also use n8n's built-in nodes for simpler configuration.
Using QQ email as an example, add QQ email credentials in the Send Email node. User is your email address, Host is the official address `smtp.qq.com`, and Password requires an authorization code generated in QQ email settings (the QQ email login password cannot be used for third-party clients). Keep the default Port and SSL/TLS settings (`465` and enabled).
Simple guide:
QQ Web Email -> Settings -> Account -> POP3/IMAP/SMTP/Exchange/CardDAV/CalDAV Services
Generate authorization code here following the instructions.
You can see the credentials were saved successfully.
In "Send Email", "From Email" must match the User email in credentials, and "To Email" should be the destination email address (can also send to yourself).
Since the large model's output is mostly Markdown and email clients don't render Markdown, we added a Markdown-to-HTML conversion node (n8n's built-in Markdown node can do this) between the newsletter generation and email nodes.
Final complete workflow:
Check the email:
Conclusion
Through this practical guide, we built a complete automated AI newsletter generation system. This is not just a technical demonstration; it showcases n8n's potential for AI workflow orchestration.
Mastering n8n + AI is more than learning another tool; it builds a bridge between traditional business systems and AI capabilities. In the era of AI-native applications, this kind of automation skill will be a core competitive advantage for enterprises and individuals alike.
Start taking action! Begin by replicating this article's workflow, then gradually customize your own AI automation solutions. Remember, the best way to learn is through continuous iteration and optimization in practice.