
PoliBERT Sentiment Analysis

Political sentiment analysis using PoliBERTweet, a RoBERTa model pre-trained on 83M political tweets. Analyzes support, opposition, and stance toward political issues.
Author: erongcao | Source: clawhub | Version: v1.0.0 (1 version) | API key: not required
Tags: latest, nlp, political-analysis, sentiment


Political sentiment analysis skill powered by PoliBERTweet, a transformer model pre-trained on 83 million political tweets (Georgetown University, LREC 2022).

Overview

This skill provides political sentiment analysis capabilities using a specialized NLP model trained on political content. It can analyze sentiment toward political candidates, issues, and events from various data sources including Reddit, local files, or direct text input.

Features

  • Sentiment Classification: Support / Oppose / Neutral toward political targets
  • Stance Detection: Issue-specific stance analysis (e.g., pro/anti immigration)
  • Entity Targeting: Analyze sentiment toward specific politicians
  • Confidence Scoring: Probability scores for each classification
  • Reddit Data Integration: Auto-fetch political discussions from Reddit (free, read-only)
  • Batch Processing: Analyze multiple texts from files or stdin
  • JSON Output: Machine-readable results for integration with other tools
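
The confidence scores listed above are most naturally read as softmax probabilities over the three classes. The following is a minimal sketch under that assumption; the label order and the `classify` helper are illustrative and not taken from the skill's code.

```python
import math

LABELS = ("SUPPORT", "OPPOSE", "NEUTRAL")  # assumed label order, for illustration only

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits):
    """Return (label, confidence) for the highest-probability class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

label, confidence = classify([2.0, 0.5, 0.1])
print(f"{label} ({confidence:.1%} confidence)")
```

The reported confidence is simply the largest of the three probabilities, so a text whose scores are nearly tied will still receive a label, just with a low confidence.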

When to Use

Use this skill when you need to:

  • Analyze public sentiment toward political candidates or figures
  • Track political opinion trends on social media
  • Complement prediction market data with social sentiment
  • Monitor political discourse around specific issues
  • Aggregate opinions from Reddit political communities

Model Information

  • Model: PoliBERTweet
  • Architecture: RoBERTa (Robustly Optimized BERT Pretraining Approach)
  • Training Data: 83 million political tweets (2016-2020 US elections)
  • HuggingFace Hub: kornosk/polibertweet-political-twitter-roberta-mlm
  • Model Size: ~500MB
  • Academic Paper: LREC 2022
  • Institution: Georgetown University DataLab

Installation

Prerequisites

# Python 3.9 or higher
python --version

# Install core dependencies (quote the specifiers so the shell does not treat ">" as a redirect)
pip install "transformers>=4.18.0" "torch>=1.10.2"

# Optional: Reddit data fetching
pip install "praw>=7.8.1"

First Run

On first execution, the model will be automatically downloaded from HuggingFace Hub (~500MB):

python polibert_sentiment.py --text "Test"

Data Sources

| Source | Method | Cost | Data Quality | Use Case |
|---|---|---|---|---|
| Reddit | --reddit | Free | High | Real-time political discussions |
| Local File | --file | - | User-dependent | Batch analysis of collected data |
| Stdin | --stdin | - | User-dependent | Pipeline integration |
| Direct Text | --text | - | User-dependent | Quick testing and single analysis |

Reddit Data

Default Subreddits: r/politics, r/Conservative, r/democrats, r/Republican, r/PoliticalDiscussion

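Note: Reddit data is fetched in read-only mode, which the skill states needs no API credentials. If the optional praw path is used, be aware that PRAW's own read-only mode still expects a client ID, client secret, and user agent. Reddit rate limits apply in either case.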
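
As a rough illustration of how posts could be gathered from several subreddits, the sketch below searches them in one call. It assumes a PRAW `Reddit` object (or anything with the same `subreddit(...).search(...)` interface); whether the skill fetches data this way is an assumption, and `collect_texts` is a hypothetical helper, not part of `polibert_sentiment.py`.

```python
DEFAULT_SUBREDDITS = [
    "politics", "Conservative", "democrats", "Republican", "PoliticalDiscussion",
]

def collect_texts(reddit, query, subreddits=DEFAULT_SUBREDDITS, limit=50):
    """Search the given subreddits and return one text per submission."""
    # PRAW accepts several subreddits joined with "+".
    combined = reddit.subreddit("+".join(subreddits))
    texts = []
    for post in combined.search(query, sort="new", limit=limit):
        # Link posts have an empty selftext, so the title alone is kept.
        body = f"{post.title}\n{post.selftext}".strip()
        if body:
            texts.append(body)
    return texts

# Usage with PRAW (credential values are placeholders):
#   import praw
#   reddit = praw.Reddit(client_id="...", client_secret="...",
#                        user_agent="polibert-demo/0.1")
#   texts = collect_texts(reddit, "J.D. Vance")
```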

Usage Examples

1. Single Text Analysis

python polibert_sentiment.py --text "J.D. Vance is the future of the Republican party"

Output:

Text: J.D. Vance is the future of the Republican party
Sentiment: SUPPORT (78.3% confidence)

2. Reddit Sentiment Analysis

# Analyze J.D. Vance sentiment from Reddit
python polibert_sentiment.py --candidate "J.D. Vance" --reddit --limit 50

# Analyze specific query
python polibert_sentiment.py --query "2028 election" --reddit --limit 100

# Custom subreddits
python polibert_sentiment.py --query "climate policy" --reddit --subreddits politics,environment

3. Batch File Analysis

# File with one text per line
python polibert_sentiment.py --candidate "Trump" --file tweets.txt
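
A one-text-per-line input can be read with a small generator like the one below. This is a sketch of the general pattern rather than the script's actual loader; `read_texts` is a hypothetical name, and skipping blank lines is an assumption about the desired behavior.

```python
import io

def read_texts(stream):
    """Yield non-empty, whitespace-stripped lines from a text stream."""
    for line in stream:
        text = line.strip()
        if text:
            yield text

# Works the same for open("tweets.txt", encoding="utf-8") or sys.stdin.
sample = io.StringIO("Great speech tonight\n\n  Terrible policy idea  \n")
print(list(read_texts(sample)))  # ['Great speech tonight', 'Terrible policy idea']
```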

4. JSON Output (for integration)

python polibert_sentiment.py --candidate "Biden" --reddit --json

Output:

{
  "candidate": "Biden",
  "total_analyzed": 47,
  "sentiment_breakdown": {
    "support": {"count": 15, "percentage": 31.9},
    "oppose": {"count": 22, "percentage": 46.8},
    "neutral": {"count": 10, "percentage": 21.3}
  },
  "net_sentiment": -14.9,
  "average_confidence": 72.4
}
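
The figures in this example are consistent with net sentiment being the support percentage minus the oppose percentage (31.9 - 46.8 = -14.9). The sketch below aggregates per-text results on that assumption; the formula is inferred from the example numbers rather than confirmed by the skill's code, and `summarize` is a hypothetical helper.

```python
from collections import Counter

LABELS = ("support", "oppose", "neutral")

def summarize(candidate, results):
    """Aggregate (label, confidence) pairs into a report dict."""
    if not results:
        raise ValueError("no results to summarize")
    counts = Counter(label for label, _ in results)
    total = len(results)
    pct = {label: 100 * counts[label] / total for label in LABELS}
    return {
        "candidate": candidate,
        "total_analyzed": total,
        "sentiment_breakdown": {
            label: {"count": counts[label], "percentage": round(pct[label], 1)}
            for label in LABELS
        },
        # Assumed definition: support share minus oppose share, in percentage points.
        "net_sentiment": round(pct["support"] - pct["oppose"], 1),
        "average_confidence": round(100 * sum(c for _, c in results) / total, 1),
    }

# Rebuild the example above: 15 support, 22 oppose, 10 neutral.
results = [("support", 0.724)] * 15 + [("oppose", 0.724)] * 22 + [("neutral", 0.724)] * 10
report = summarize("Biden", results)
print(report["net_sentiment"])
```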

Integration with Other Skills

With Polymarket

Polymarket (market odds)  →  PoliBERT (social sentiment)  →  Prediction synthesis
     18.6% (Vance)                    35% Support                      Combined signal

With Prediction Skill

Use PoliBERT sentiment as an input factor in the BRACE forecasting framework:

  • Base rate: Historical election patterns
  • Sentiment: Social media trends (via PoliBERT)
  • Market: Prediction market odds (via Polymarket)

Example Workflow

# 1. Get market data
python polymarket.py search "presidential election winner 2028" --json

# 2. Get social sentiment
python polibert_sentiment.py --candidate "J.D. Vance" --reddit --limit 100 --json

# 3. Synthesize in prediction framework
# (Use prediction skill to combine signals)
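
How step 3 combines the two signals is left to the prediction skill and is not specified here. Purely as an illustration, the sketch below turns a sentiment report into a support share among texts that took a side and blends it with a market probability using a fixed weight. Both helpers, the 0.2 weight, and the idea of a linear blend are hypothetical and are not the prediction skill's method; a support share is not a win probability.

```python
def support_share(report):
    """Share of support among texts that took a side (neutral is ignored)."""
    breakdown = report["sentiment_breakdown"]
    support = breakdown["support"]["count"]
    decided = support + breakdown["oppose"]["count"]
    return support / decided if decided else 0.5  # no signal: sit at the midpoint

def blend(market_prob, sentiment_share, weight=0.2):
    """Weighted average of the two signals; weight is the sentiment's influence."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return (1 - weight) * market_prob + weight * sentiment_share

example_report = {
    "sentiment_breakdown": {
        "support": {"count": 15},
        "oppose": {"count": 22},
        "neutral": {"count": 10},
    }
}
print(round(blend(0.186, support_share(example_report)), 3))
```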

Output Format

Human-Readable Output

📊 Sentiment Analysis: J.D. Vance
Source: Reddit | Total analyzed: 47

Support: 31.9% (15)
Oppose: 46.8% (22)
Neutral: 21.3% (10)

Net Sentiment: -14.9%
Avg Confidence: 72.4%
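
For reference, a report dict with the JSON fields used by this skill can be rendered into the layout above with a few format strings. This is a sketch of one way to do it; `format_report` is a hypothetical helper, and the `source` argument is an assumption because the JSON example carries no source field.

```python
def format_report(report, source="Reddit"):
    """Render a report dict in the human-readable layout."""
    breakdown = report["sentiment_breakdown"]
    lines = [
        f"📊 Sentiment Analysis: {report['candidate']}",
        f"Source: {source} | Total analyzed: {report['total_analyzed']}",
        "",
    ]
    for label in ("support", "oppose", "neutral"):
        entry = breakdown[label]
        lines.append(f"{label.capitalize()}: {entry['percentage']}% ({entry['count']})")
    lines += [
        "",
        f"Net Sentiment: {report['net_sentiment']}%",
        f"Avg Confidence: {report['average_confidence']}%",
    ]
    return "\n".join(lines)

example = {
    "candidate": "J.D. Vance",
    "total_analyzed": 47,
    "sentiment_breakdown": {
        "support": {"count": 15, "percentage": 31.9},
        "oppose": {"count": 22, "percentage": 46.8},
        "neutral": {"count": 10, "percentage": 21.3},
    },
    "net_sentiment": -14.9,
    "average_confidence": 72.4,
}
print(format_report(example))
```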

JSON Output Structure

{
  "candidate": "string",
  "total_analyzed": "integer",
  "sentiment_breakdown": {
    "support": {"count": "integer", "percentage": "float"},
    "oppose": {"count": "integer", "percentage": "float"},
    "neutral": {"count": "integer", "percentage": "float"}
  },
  "average_confidence": "float",
  "net_sentiment": "float",
  "sample_results": [
    {"text": "string", "sentiment": "string", "confidence": "float"}
  ]
}
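
A consumer of the `--json` output may want to check the structure before using it. The sketch below parses a report and verifies the fields described above. Treating `sample_results` as optional and requiring the three counts to add up to `total_analyzed` are assumptions based on the example output, and `parse_report` is a hypothetical helper.

```python
import json

REQUIRED_KEYS = {
    "candidate", "total_analyzed", "sentiment_breakdown",
    "average_confidence", "net_sentiment",
}

def parse_report(raw):
    """Parse --json output and check the documented fields."""
    report = json.loads(raw)
    missing = REQUIRED_KEYS - report.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    breakdown = report["sentiment_breakdown"]
    counted = sum(breakdown[label]["count"] for label in ("support", "oppose", "neutral"))
    if counted != report["total_analyzed"]:
        raise ValueError("breakdown counts do not add up to total_analyzed")
    return report

raw = """{
  "candidate": "Biden", "total_analyzed": 47,
  "sentiment_breakdown": {
    "support": {"count": 15, "percentage": 31.9},
    "oppose": {"count": 22, "percentage": 46.8},
    "neutral": {"count": 10, "percentage": 21.3}
  },
  "net_sentiment": -14.9, "average_confidence": 72.4
}"""
print(parse_report(raw)["candidate"])  # Biden
```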

Limitations and Considerations

Model Limitations

  1. Training Data: The model was trained on 2016-2020 tweets and may not capture 2024-2028 linguistic patterns
  2. Context Sensitivity: May miss sarcasm, irony, or cultural references
  3. Temporal Drift: Political language evolves; model accuracy may degrade over time
  4. Confidence Calibration: Confidence scores are model outputs, not calibrated probabilities

Data Limitations

  1. Reddit Sample Bias: Reddit users skew younger, more educated, and more liberal than the general population
  2. Selection Bias: Active Reddit users are not a representative sample of voters
  3. Timing: Social sentiment can shift rapidly; a snapshot may not represent election-day mood
  4. Volume: Candidates or topics tied to low-liquidity markets may have few social media discussions to analyze

Best Practices

  • Use as one input among many, not sole prediction basis
  • Combine with prediction markets, polling data, economic indicators
  • Track sentiment trends over time, not single snapshots
  • Adjust for platform demographics (Reddit ≠ Twitter ≠ general population)
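
Tracking trends rather than single snapshots can be as simple as smoothing a series of daily `net_sentiment` values. The sketch below uses a trailing moving average; the window size and the sample values are placeholders for illustration, not real measurements, and `rolling_mean` is a hypothetical helper.

```python
def rolling_mean(values, window=3):
    """Trailing moving average; early points average over what is available."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(round(sum(chunk) / len(chunk), 1))
    return smoothed

# Placeholder daily net_sentiment readings (made up for this example).
daily_net = [-15.0, -10.0, -12.0, -6.0, -3.0]
print(rolling_mean(daily_net))
```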

Citation

If you use this skill or the PoliBERTweet model in research, please cite:

@inproceedings{kawintiranon2022polibertweet,
  title={{P}oli{BERT}weet: A Pre-trained Language Model for Analyzing Political Content on {T}witter},
  author={Kawintiranon, Kornraphop and Singh, Lisa},
  booktitle={Proceedings of the Language Resources and Evaluation Conference (LREC)},
  year={2022},
  pages={7360--7367},
  publisher={European Language Resources Association}
}

License

  • Skill Code: MIT License
  • PoliBERTweet Model: Subject to HuggingFace Hub and original paper terms

Related Skills

  • polymarket-unified - Prediction market data for political forecasting
  • prediction - BRACE framework for calibrated forecasting
  • ai-model-team - Multi-model prediction system for financial markets

Version History

  • v1.0.0 (2026-04-17): Initial release
    • PoliBERTweet model integration
    • Reddit data source support
    • Sentiment analysis pipeline
    • JSON and human-readable output formats
    • Batch processing capabilities

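Listing Version History

1 version in total

  • v1.0.0 (current): 2026-05-07 20:55, marked safe

Security Checks

  • Tencent Cloud Security (Keen): safe, no risk
  • Tencent Cloud Security (Sanbu): safe, no risk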
