SilentStigma — Mapping Emotional and Coping Language at Scale

How It Works

1. Public Comments: collected from mental health advocacy channels
2. Sentence Embeddings: transformer-based semantic encoding
3. Unsupervised Clustering: density-based pattern discovery
4. Emotional Landscapes: dimensionality reduction for visualization
5. Coping Strategies: pattern extraction and keyword analysis

Ethical Commitments

This platform analyzes only publicly available comments from YouTube. All processing occurs at the aggregate level. No individual-level classification, tracking, or diagnosis is performed.

The system identifies patterns in language use across large volumes of text. It does not attempt to infer personal characteristics, mental health status, or risk factors for any individual commenter.

All analysis is designed for research and educational purposes. The platform is not intended for clinical use, diagnostic purposes, or individual assessment.

Data collection respects platform terms of service and focuses exclusively on public comments from mental health advocacy and education channels.

Research Orientation

SilentStigma uses unsupervised machine learning to identify naturally emerging patterns in mental health discourse. The pipeline employs transformer-based sentence embeddings (Sentence-BERT) to encode semantic meaning, then applies density-based clustering (HDBSCAN) to group similar expressions without predefined categories.
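The embedding and clustering stages described above might be sketched as follows. This is a minimal illustration, not the platform's actual code: the model name `all-MiniLM-L6-v2`, the `min_cluster_size` default, and both function names are assumptions, and the sketch relies on the `sentence-transformers` and `hdbscan` packages being installed.

```python
from collections import Counter


def embed_and_cluster(comments, model_name="all-MiniLM-L6-v2", min_cluster_size=15):
    """Encode comments with a Sentence-BERT model, then cluster with HDBSCAN.

    model_name and min_cluster_size are illustrative defaults, not values
    taken from the platform. Heavy dependencies are imported lazily so the
    summary helper below can be used without them.
    """
    from sentence_transformers import SentenceTransformer
    import hdbscan

    model = SentenceTransformer(model_name)
    # With unit-length vectors, Euclidean distance is a monotone function
    # of cosine distance, so the Euclidean metric can be used for clustering.
    embeddings = model.encode(list(comments), normalize_embeddings=True)
    embeddings = embeddings.astype("float64")

    clusterer = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size, metric="euclidean")
    labels = clusterer.fit_predict(embeddings)  # label -1 marks noise points
    return embeddings, labels


def summarize_labels(labels):
    """Aggregate-only summary: cluster sizes and the share of noise points."""
    counts = Counter(int(label) for label in labels)
    noise = counts.pop(-1, 0)
    total = noise + sum(counts.values())
    return {
        "n_clusters": len(counts),
        "cluster_sizes": dict(sorted(counts.items())),
        "noise_fraction": noise / total if total else 0.0,
    }
```

HDBSCAN does not force every comment into a group; points it cannot place confidently receive the label -1, which is why the summary reports a noise fraction alongside the cluster sizes.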

Dimensionality reduction via UMAP projects the high-dimensional sentence embeddings into two dimensions for visualization, so that semantically similar comments appear near one another. Pattern extraction combines keyword analysis (KeyBERT) with curated lexicons to identify coping strategies, emotional language, and stigma indicators.
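A rough sketch of these two stages, under stated assumptions: the lexicon entries, category names, function names, and all parameter defaults below are hypothetical (the platform's curated lexicons are not shown here), and the UMAP and KeyBERT helpers assume the `umap-learn` and `keybert` packages are installed.

```python
import re
from collections import Counter, defaultdict

# Hypothetical mini-lexicon mapping terms to coping categories.
COPING_LEXICON = {
    "therapy": "professional_support",
    "therapist": "professional_support",
    "journaling": "self_reflection",
    "meditation": "self_regulation",
    "exercise": "self_regulation",
}


def project_2d(embeddings, n_neighbors=15, min_dist=0.1, seed=42):
    """Reduce embeddings to two dimensions with UMAP for plotting."""
    import umap  # provided by the umap-learn package

    reducer = umap.UMAP(n_components=2, n_neighbors=n_neighbors,
                        min_dist=min_dist, metric="cosine", random_state=seed)
    return reducer.fit_transform(embeddings)


def cluster_keywords(texts, top_n=5):
    """Extract keyphrases from the concatenated text of one cluster."""
    from keybert import KeyBERT

    kw_model = KeyBERT()
    # Returns a list of (keyphrase, score) tuples.
    return kw_model.extract_keywords(" ".join(texts), keyphrase_ngram_range=(1, 2),
                                     stop_words="english", top_n=top_n)


def lexicon_counts_by_cluster(texts, labels, lexicon=COPING_LEXICON):
    """Count lexicon categories per cluster; noise points (label -1) are skipped."""
    counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        if label == -1:
            continue
        for token in re.findall(r"[a-z']+", text.lower()):
            category = lexicon.get(token)
            if category:
                counts[label][category] += 1
    return {label: dict(c) for label, c in counts.items()}
```

Counting lexicon hits per cluster rather than per comment keeps the output at the aggregate level: the result describes how often a coping category appears in a group of comments, not what any one commenter wrote.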

The methodology prioritizes transparency and reproducibility. All configuration parameters are versioned, and the pipeline can be run end-to-end from raw comments to final visualizations. The clustering stage uses no predefined categories, so groupings emerge from the data rather than from a fixed taxonomy.
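One way versioned configuration could look in practice is a frozen parameter object with a stable fingerprint that is stored alongside each run's outputs. The class name, field names, and default values below are hypothetical and are not taken from the platform.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PipelineConfig:
    # All field names and defaults are illustrative assumptions.
    embedding_model: str = "all-MiniLM-L6-v2"
    min_cluster_size: int = 15
    umap_n_neighbors: int = 15
    umap_min_dist: float = 0.1
    random_seed: int = 42

    def fingerprint(self) -> str:
        """Short, stable hash of the parameters, usable as a run tag."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


if __name__ == "__main__":
    config = PipelineConfig(min_cluster_size=30)
    print(config.fingerprint(), asdict(config))
```

Because the dataclass is frozen and the JSON keys are sorted, two runs with identical parameters produce the same fingerprint, and any parameter change produces a different one.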

This approach is designed for computational social scientists, mental health communication researchers, and methodologists interested in unsupervised NLP pipelines for discourse analysis.