<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://blog.nikolaarinanda.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://blog.nikolaarinanda.com/" rel="alternate" type="text/html" /><updated>2026-04-20T04:40:33+00:00</updated><id>https://blog.nikolaarinanda.com/feed.xml</id><title type="html">Nikola</title><subtitle>Copyright © 2026 Nikola&apos;s Blog</subtitle><entry><title type="html">Analyzing the Factors That Affect Student Academic Results (SSC and HSC)</title><link href="https://blog.nikolaarinanda.com/2026/04/15/analisis-data-pendidikan-bangladesh/" rel="alternate" type="text/html" title="Analyzing the Factors That Affect Student Academic Results (SSC and HSC)" /><published>2026-04-15T00:00:00+00:00</published><updated>2026-04-15T00:00:00+00:00</updated><id>https://blog.nikolaarinanda.com/2026/04/15/analisis-data-pendidikan-bangladesh</id><content type="html" xml:base="https://blog.nikolaarinanda.com/2026/04/15/analisis-data-pendidikan-bangladesh/"><![CDATA[<p>Hello everyone. In this post I analyze the “Student Performance Dataset”, which you can find <a href="https://www.kaggle.com/datasets/ihasan88/student-performance-dataset">here</a>. The dataset contains records for 1,000 students in Bangladesh across 14 columns. The goal is to find relationships between variables that may affect the academic results of students in Bangladesh. The variables in the data fall into two groups.</p>

<h3 id="variable-numerik">Numeric Variables</h3>

<p>Numeric variables take numerical values on which arithmetic operations can be performed. Eight variables in this dataset fall into this category, as listed in the following table:</p>

<table>
  <thead>
    <tr>
      <th>Variable</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Student_ID</td>
      <td>Unique ID for each student</td>
    </tr>
    <tr>
      <td>Age</td>
      <td>Student age (years)</td>
    </tr>
    <tr>
      <td>Study_Hours_per_Week</td>
      <td>Number of study hours per week</td>
    </tr>
    <tr>
      <td>Attendance</td>
      <td>Student attendance rate (%)</td>
    </tr>
    <tr>
      <td>Family_Income_BDT</td>
      <td>Family income (in BDT)</td>
    </tr>
    <tr>
      <td>Previous_GPA</td>
      <td>Previous GPA</td>
    </tr>
    <tr>
      <td>SSC_Result</td>
      <td>SSC exam result</td>
    </tr>
    <tr>
      <td>HSC_Result</td>
      <td>HSC exam result</td>
    </tr>
  </tbody>
</table>

<h3 id="variabel-kategorikal">Categorical Variables</h3>

<p>Categorical variables hold a category or label rather than a number. Six variables in this dataset fall into this category, as listed in the following table:</p>

<table>
  <thead>
    <tr>
      <th>Variable</th>
      <th>Description</th>
      <th>Example Values</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Gender</td>
      <td>Student gender</td>
      <td>Male, Female</td>
    </tr>
    <tr>
      <td>District</td>
      <td>District where the student lives</td>
      <td>Rangpur, Dhaka, etc.</td>
    </tr>
    <tr>
      <td>School_Type</td>
      <td>Type of school</td>
      <td>Private, Public</td>
    </tr>
    <tr>
      <td>Parent_Education</td>
      <td>Education level of the parents</td>
      <td>Graduate, Undergraduate, etc.</td>
    </tr>
    <tr>
      <td>Internet_Access</td>
      <td>Internet access at home</td>
      <td>Yes, No</td>
    </tr>
    <tr>
      <td>Private_Tuition</td>
      <td>Whether the student takes private tuition</td>
      <td>Yes, No</td>
    </tr>
  </tbody>
</table>
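
<p>As a rough illustration of the two groups above, the sketch below sorts columns into numeric and categorical by checking whether every value parses as a number. The sample rows and the helper names <code class="language-plaintext highlighter-rouge">is_number</code> and <code class="language-plaintext highlighter-rouge">split_columns</code> are assumptions made for this example; only the column names come from the tables.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import csv
import io

# Hypothetical two-row sample using a subset of the column names listed above;
# the values are made up for illustration and are not taken from the dataset.
SAMPLE_CSV = """Student_ID,Age,Gender,School_Type,Attendance
1,17,Male,Public,82
2,18,Female,Private,91
"""


def is_number(text):
    """Return True when the text can be parsed as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def split_columns(csv_text):
    """Group column names into numeric and categorical lists."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    numeric, categorical = [], []
    for name in rows[0].keys():
        if all(is_number(row[name]) for row in rows):
            numeric.append(name)
        else:
            categorical.append(name)
    return numeric, categorical


numeric_cols, categorical_cols = split_columns(SAMPLE_CSV)
print(numeric_cols)      # ['Student_ID', 'Age', 'Attendance']
print(categorical_cols)  # ['Gender', 'School_Type']
</code></pre></div></div>

<p>Note that an identifier such as Student_ID parses as a number even though arithmetic on it is not meaningful, so a check like this is only a first pass.</p>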

<h2 id="statistik-dasar">Basic Statistics</h2>

<p>This section walks through the basic statistics of the dataset, mainly its numeric variables, one at a time.</p>
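
<p>The measures reported for each variable (mean, median, standard deviation, minimum, maximum, and range) can be computed with the Python standard library, roughly as sketched here. The list of study hours is a made-up sample, and it is an assumption that the sample standard deviation is the one used in this post.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import statistics

# Hypothetical weekly study hours for eight students (illustrative values only).
study_hours = [5, 9, 12, 16, 18, 21, 25, 29]

summary = {
    "mean": statistics.mean(study_hours),
    "median": statistics.median(study_hours),
    "std": statistics.stdev(study_hours),  # sample standard deviation
    "min": min(study_hours),
    "max": max(study_hours),
}
summary["range"] = summary["max"] - summary["min"]

for name, value in summary.items():
    print(name, round(value, 3))
</code></pre></div></div>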

<h3 id="jam-belajar-per-minggu">Weekly Study Hours</h3>

<p><strong>🔢 Central Tendency</strong></p>

<ul>
  <li>Mean weekly study hours: 17.102</li>
  <li>Median weekly study hours: 17</li>
  <li>The mean is very close to the median, which suggests a roughly symmetric distribution.</li>
</ul>

<p><strong>📉 Dispersion</strong></p>

<ul>
  <li>Standard deviation: 7.295</li>
  <li>This indicates considerable variation in study hours between students, so the values are not tightly clustered around the mean.</li>
</ul>

<p><strong>📌 Extreme Values</strong></p>

<ul>
  <li>Minimum: 5</li>
  <li>Maximum: 29</li>
  <li>The fairly wide range (24 hours) indicates substantial differences in study time between students.</li>
</ul>
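
<p>The claim that a mean close to the median points to a symmetric distribution can be made more concrete with Pearson's second skewness coefficient, 3 × (mean − median) / standard deviation. This is a sketch added for illustration, not a measure used in the original analysis; it plugs in the study-hour values reported above.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def pearson_second_skewness(mean, median, std):
    """Pearson's second skewness coefficient: 3 * (mean - median) / std."""
    return 3 * (mean - median) / std


# Values reported above for weekly study hours.
skew = pearson_second_skewness(mean=17.102, median=17, std=7.295)
print(round(skew, 3))  # 0.042
</code></pre></div></div>

<p>A value this close to zero is consistent with a distribution that has little skew.</p>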

<h3 id="kehadiran">Attendance</h3>

<p><strong>🔢 Central Tendency</strong></p>

<ul>
  <li>Mean attendance: 79.40%</li>
  <li>Median attendance: 79%</li>
  <li>The mean is very close to the median, which suggests a roughly symmetric distribution.</li>
</ul>

<p><strong>📉 Dispersion</strong></p>

<ul>
  <li>Standard deviation: 11.48</li>
  <li>This indicates considerable variation in attendance between students, so the data are not tightly clustered around the mean.</li>
</ul>

<p><strong>📌 Extreme Values</strong></p>

<ul>
  <li>Minimum: 60%</li>
  <li>Maximum: 99%</li>
  <li>The fairly wide range (39 percentage points) indicates substantial differences in attendance between students.</li>
</ul>

<h3 id="pendapatan-keluarga-bdt">Family Income (BDT)</h3>

<p><strong>🔢 Central Tendency</strong></p>

<ul>
  <li>Mean income: 34,888 BDT</li>
  <li>Median income: 34,847 BDT</li>
  <li>The mean is very close to the median, which suggests a roughly symmetric income distribution with no notable skew.</li>
</ul>

<p><strong>📉 Dispersion</strong></p>

<ul>
  <li>Standard deviation: 14,523 BDT</li>
  <li>This indicates considerable variation in income between the students' families, so the data are not tightly clustered around the mean.</li>
</ul>

<p><strong>📌 Extreme Values</strong></p>

<ul>
  <li>Minimum income: 10,016 BDT</li>
  <li>Maximum income: 59,767 BDT</li>
  <li>The fairly wide income range (49,751 BDT) indicates a substantial economic gap between the students' families.</li>
</ul>

<h3 id="gpa-grade-point-average">GPA (Grade Point Average)</h3>

<p><strong>🔢 Central Tendency</strong></p>

<ul>
  <li>Mean GPA: 3.99</li>
  <li>Median GPA: 4.0</li>
  <li>The mean is very close to the median, which suggests a roughly symmetric GPA distribution with no notable skew.</li>
</ul>

<p><strong>📉 Dispersion</strong></p>

<ul>
  <li>Standard deviation: 0.58</li>
  <li>This indicates small to moderate variation in GPA, so most values are relatively concentrated around the mean.</li>
</ul>

<p><strong>📌 Extreme Values</strong></p>

<ul>
  <li>Minimum GPA: 3.0</li>
  <li>Maximum GPA: 5.0</li>
  <li>The range (2.0 points) shows differences in academic achievement that still fall within a reasonable span of the grading scale.</li>
</ul>

<h3 id="ssc-secondary-school-certificate">SSC (Secondary School Certificate)</h3>

<p><strong>🔢 Central Tendency</strong></p>

<ul>
  <li>Mean SSC result: 4.25</li>
  <li>Median SSC result: 4.26</li>
  <li>The mean is very close to the median, which suggests a roughly symmetric distribution of SSC results with no notable skew.</li>
</ul>

<p><strong>📉 Dispersion</strong></p>

<ul>
  <li>Standard deviation: 0.42</li>
  <li>This indicates small variation in results, so most values are relatively concentrated around the mean.</li>
</ul>

<p><strong>📌 Extreme Values</strong></p>

<ul>
  <li>Minimum: 3.5</li>
  <li>Maximum: 5.0</li>
  <li>The range (1.5 points) shows differences in academic achievement that still fall within a reasonable span of the grading scale.</li>
</ul>

<h3 id="hsc-higher-secondary-certificate">HSC (Higher Secondary Certificate)</h3>

<p><strong>🔢 Central Tendency</strong></p>

<ul>
  <li>Mean HSC result: 4.01</li>
  <li>Median HSC result: 4.01</li>
  <li>The mean equals the median, which suggests a highly symmetric distribution of HSC results with no notable skew.</li>
</ul>

<p><strong>📉 Dispersion</strong></p>

<ul>
  <li>Standard deviation: 0.58</li>
  <li>This indicates small to moderate variation in results, so most values are relatively concentrated around the mean.</li>
</ul>

<p><strong>📌 Extreme Values</strong></p>

<ul>
  <li>Minimum: 3.0</li>
  <li>Maximum: 5.0</li>
  <li>The range (2.0 points) shows differences in academic achievement that still fall within a reasonable span of the grading scale.</li>
</ul>]]></content><author><name></name></author><summary type="html"><![CDATA[Hello everyone. In this post I analyze the “Student Performance Dataset”, which you can find here. The dataset contains records for 1,000 students in Bangladesh across 14 columns. The goal is to find relationships between variables that may affect the academic results of students in Bangladesh. The variables in the data fall into two groups.]]></summary></entry><entry><title type="html">Hello World</title><link href="https://blog.nikolaarinanda.com/2026/04/13/hello-world/" rel="alternate" type="text/html" title="Hello World" /><published>2026-04-13T00:00:00+00:00</published><updated>2026-04-13T00:00:00+00:00</updated><id>https://blog.nikolaarinanda.com/2026/04/13/hello-world</id><content type="html" xml:base="https://blog.nikolaarinanda.com/2026/04/13/hello-world/"><![CDATA[<h1 id="tiktok-comment-sentiment-analysis-using-textcnn">TikTok Comment Sentiment Analysis Using TextCNN</h1>


<h2 id="project-overview">Project Overview</h2>

<p>This final project implements a TextCNN to perform sentiment analysis on Indonesian-language TikTok comments. The system classifies each comment into one of two categories: Cyberbullying (insulting or humiliating comments) and Non-Cyberbullying (normal, harmless comments).</p>

<p><strong>Institution</strong>: Institut Teknologi Sumatera (ITERA)<br />
<strong>Study Program</strong>: Informatics Engineering<br />
<strong>Author</strong>: Nikola Arinanda<br />
<strong>Year</strong>: 2026</p>

<hr />

<h2 id="abstract">Abstract</h2>

<p>This project uses a TextCNN architecture to analyze the sentiment of TikTok comments. The data first goes through preprocessing: an 80:20 split, case folding, text cleaning, augmentation (AEDA, random character swap, random character deletion), tokenization, and stopword removal. The model is then trained with 5-fold cross-validation to check robustness and generalization. Finally, the model is evaluated with metrics derived from the confusion matrix: accuracy, precision, recall, and F1 score.</p>
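
<p>The text-level preprocessing steps named in the abstract can be sketched as follows. This is a minimal illustration, not the code in <code class="language-plaintext highlighter-rouge">datareader.py</code>: the stopword list, the cleaning rules, and the function name <code class="language-plaintext highlighter-rouge">preprocess</code> are assumptions, and whitespace splitting stands in for the real tokenizer.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import re

# Tiny hypothetical stopword list; the project presumably uses a fuller one.
STOPWORDS = {"yang", "dan", "di", "ini", "itu"}


def preprocess(comment):
    """Case folding, text cleaning, whitespace tokenization, stopword removal."""
    text = comment.lower()                     # case folding
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters and whitespace only
    tokens = text.split()                      # stand-in for the real tokenizer
    return [tok for tok in tokens if tok not in STOPWORDS]


print(preprocess("Makannya SEGENTONG buset!!! https://example.com yang ini"))
# ['makannya', 'segentong', 'buset']
</code></pre></div></div>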

<hr />

<h2 id="project-structure">Project Structure</h2>


<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tugas-akhir-main/
├── dataset/
│ ├── k_fold.json           <span class="c"># k-fold cross-validation dictionary</span>
│ └── cyberbullying.csv     <span class="c"># Original dataset</span>
├── code/
│ ├── datareader.py         <span class="c"># Data loader and preprocessing</span>
│ ├── model.py              <span class="c"># Model architecture</span>
│ └── train.py              <span class="c"># Main script for model training</span>
├── model_outputs/
│ ├── run_YYYYMMDD_HHMMSS/
│ │ ├── fold_1_model.pth    <span class="c"># Model output</span>
│ │ ├── fold_2_model.pth    <span class="c"># ...</span>
│ │ ├── fold_3_model.pth
│ │ ├── fold_4_model.pth
│ │ ├── fold_5_model.pth
│ │ └── ...
│ └── ...
├── report/
│ └── thesis.pdf            <span class="c"># Documentation and reports</span>
└── requirements.txt        <span class="c"># Python dependencies</span>
</code></pre></div></div>


<hr />

<h2 id="environment-setup">Environment Setup</h2>

<h3 id="prerequisites">Prerequisites</h3>

<p>This project requires:</p>

<ul>
  <li>Python: 3.8 or higher (tested with Python 3.9+)</li>
  <li>CUDA: Optional (for GPU acceleration)</li>
</ul>

<h3 id="system-requirements">System Requirements</h3>

<ul>
  <li>RAM: Minimum 8 GB (recommended 16 GB)</li>
  <li>Storage: Minimum 10 GB (for model and dataset)</li>
  <li>GPU: Optional, but highly recommended for faster training</li>
</ul>

<hr />

<h2 id="dependencies">Dependencies</h2>

<p>All dependencies are listed in the <code class="language-plaintext highlighter-rouge">requirements.txt</code> file. Main libraries:</p>

<table>
  <thead>
    <tr>
      <th>Library</th>
      <th>Version</th>
      <th>Purpose</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>torch</td>
      <td>&gt;=2.0.0</td>
      <td>Deep learning framework</td>
    </tr>
    <tr>
      <td>pandas</td>
      <td>&gt;=1.5.0</td>
      <td>Data manipulation</td>
    </tr>
    <tr>
      <td>numpy</td>
      <td>&gt;=1.23.0</td>
      <td>Numerical computing</td>
    </tr>
    <tr>
      <td>matplotlib</td>
      <td>&gt;=3.7.0</td>
      <td>Data visualization</td>
    </tr>
    <tr>
      <td>seaborn</td>
      <td>&gt;=0.12.0</td>
      <td>Statistical visualization</td>
    </tr>
    <tr>
      <td>scikit-learn</td>
      <td>&gt;=1.2.0</td>
      <td>Machine learning utilities</td>
    </tr>
    <tr>
      <td>transformers</td>
      <td>&gt;=4.30.0</td>
      <td>NLP models (IndoBERT, etc.)</td>
    </tr>
    <tr>
      <td>nltk</td>
      <td>&gt;=3.8.0</td>
      <td>Text preprocessing</td>
    </tr>
    <tr>
      <td>tqdm</td>
      <td>&gt;=4.65.0</td>
      <td>Progress bar</td>
    </tr>
    <tr>
      <td>wandb</td>
      <td>&gt;=0.15.0</td>
      <td>Experiment tracking</td>
    </tr>
  </tbody>
</table>

<p>For the complete list, see <code class="language-plaintext highlighter-rouge">requirements.txt</code>.</p>

<hr />

<h2 id="installation--setup">Installation &amp; Setup</h2>

<h3 id="step-1-clone-repository">Step 1: Clone Repository</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/nikolaarinanda/tugas-akhir.git
<span class="nb">cd </span>tugas-akhir
</code></pre></div></div>

<h3 id="step-2-create-virtual-environment">Step 2: Create Virtual Environment</h3>

<p>Using a virtual environment is highly recommended to avoid dependency conflicts.
<strong>Using venv (built into Python)</strong>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Linux/Mac</span>
python3 <span class="nt">-m</span> venv venv
<span class="nb">source </span>venv/bin/activate

<span class="c"># Windows</span>
python <span class="nt">-m</span> venv venv
venv<span class="se">\S</span>cripts<span class="se">\a</span>ctivate
</code></pre></div></div>

<p><strong>Using conda</strong>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>conda create <span class="nt">-n</span> youtube-sentiment <span class="nv">python</span><span class="o">=</span>3.9
conda activate youtube-sentiment
</code></pre></div></div>

<h3 id="step-3-install-dependencies">Step 3: Install Dependencies</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Upgrade pip to the latest version</span>
pip <span class="nb">install</span> <span class="nt">--upgrade</span> pip

<span class="c"># Install all requirements</span>
pip <span class="nb">install</span> <span class="nt">-r</span> requirements.txt
</code></pre></div></div>

<p><strong>Note for PyTorch with GPU</strong>: If you want to use GPU, install the CUDA-specific version of PyTorch:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># For CUDA 11.8</span>
pip <span class="nb">install </span>torch torchvision torchaudio <span class="nt">--index-url</span> https://download.pytorch.org/whl/cu118

<span class="c"># For CUDA 12.1</span>
pip <span class="nb">install </span>torch torchvision torchaudio <span class="nt">--index-url</span> https://download.pytorch.org/whl/cu121
</code></pre></div></div>

<hr />

<h2 id="dataset-information">Dataset Information</h2>

<p>The dataset consists of Indonesian-language TikTok comments taken from <a href="https://ieeexplore.ieee.org/document/10468424">this study</a>, with the following labels:</p>

<ul>
  <li>Cyberbullying (-1): Insulting or humiliating comments</li>
  <li>Non-cyberbullying (1): Normal, harmless comments</li>
</ul>
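
<p>Classification losses such as PyTorch's cross-entropy expect class indices that start at 0, so labels of -1 and 1 usually need to be remapped before training. The mapping below is an assumption made for illustration; the project may assign the indices differently.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Assumed mapping: the CSV labels -1 and 1 become class indices 0 and 1.
LABEL_TO_INDEX = {-1: 0, 1: 1}
INDEX_TO_NAME = {0: "cyberbullying", 1: "non-cyberbullying"}


def encode_labels(labels):
    """Map raw sentiment labels to zero-based class indices."""
    return [LABEL_TO_INDEX[label] for label in labels]


encoded = encode_labels([-1, 1, 1, -1])
print(encoded)  # [0, 1, 1, 0]
print([INDEX_TO_NAME[i] for i in encoded])
</code></pre></div></div>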

<h3 id="dataset-format">Dataset Format</h3>

<p>The dataset <code class="language-plaintext highlighter-rouge">cyberbullying.csv</code> has the following columns:</p>

<table>
  <thead>
    <tr>
      <th>Column</th>
      <th>Type</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>sentiment</td>
      <td>Integer</td>
      <td>Label (-1 cyberbullying, 1 non-cyberbullying)</td>
    </tr>
    <tr>
      <td>comment</td>
      <td>String</td>
      <td>Comment content</td>
    </tr>
  </tbody>
</table>
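
<p>A file in this two-column format can be read with the standard <code class="language-plaintext highlighter-rouge">csv</code> module, as in the sketch below. The two sample rows are invented, and <code class="language-plaintext highlighter-rouge">load_comments</code> is a hypothetical helper rather than a function from <code class="language-plaintext highlighter-rouge">datareader.py</code>.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import csv
import io

# Two made-up rows in the documented format (sentiment, comment).
SAMPLE = """sentiment,comment
-1,contoh komentar kasar
1,contoh komentar biasa
"""


def load_comments(handle):
    """Return (comments, labels) read from a CSV file-like object."""
    comments, labels = [], []
    for row in csv.DictReader(handle):
        comments.append(row["comment"])
        labels.append(int(row["sentiment"]))
    return comments, labels


comments, labels = load_comments(io.StringIO(SAMPLE))
print(labels)  # [-1, 1]
</code></pre></div></div>

<p>For the real dataset, the same function would take an opened file handle instead of the in-memory sample; the file encoding is not stated here and would need to be checked.</p>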

<hr />

<h2 id="model-architecture">Model Architecture</h2>

<h3 id="textcnn-text-classifier">TextCNN Text Classifier</h3>

<!-- ![Model Architecture](/assets/images/arsitektur-textcnn.png) -->

<h3 id="sedepthwise-textcnn-text-classifier">SEDepthwise TextCNN Text Classifier</h3>

<!-- ![Model Architecture](/assets/images/arsitektur-sedepthwise-textcnn.png) -->

<h2 id="key-components">Key Components:</h2>

<ul>
  <li><strong>Embedding</strong>: Converts token IDs into dense vectors (IndoBERT tokenizer compatible)</li>
  <li><strong>Transpose</strong>: Adjusts tensor shape for Conv1D input (embedding_dim → channel dimension)</li>
  <li><strong>Depthwise Separable Convolution</strong>:
    <ul>
      <li>Depthwise Conv (kernel sizes = 3, 4)</li>
      <li>Pointwise Conv (channel mixing)</li>
    </ul>
  </li>
  <li><strong>Activation</strong>: ReLU for non-linearity</li>
  <li><strong>Pooling</strong>: Global Max Pooling to extract dominant features</li>
  <li><strong>Concatenation</strong>: Combines features from multiple convolution branches</li>
  <li><strong>SE Block (Squeeze-and-Excitation)</strong>: Channel-wise attention to recalibrate feature importance</li>
  <li><strong>Output Layer</strong>: Fully connected layer for classification (num_classes)</li>
</ul>
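
<p>One reason to use a depthwise separable convolution is its smaller weight count. The arithmetic below compares it with a standard 1D convolution, using the documented defaults for <code class="language-plaintext highlighter-rouge">--embed_dim</code> (100) and <code class="language-plaintext highlighter-rouge">--conv_filters</code> (50). It is a back-of-the-envelope sketch that ignores biases and assumes the depthwise stage keeps one kernel per input channel; the actual layer sizes in <code class="language-plaintext highlighter-rouge">model.py</code> may differ.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def standard_conv_weights(in_channels, out_channels, kernel_size):
    """Weights in a standard 1D convolution (bias ignored)."""
    return in_channels * out_channels * kernel_size


def depthwise_separable_weights(in_channels, out_channels, kernel_size):
    """Depthwise weights (one kernel per input channel) plus pointwise 1x1 weights."""
    depthwise = in_channels * kernel_size
    pointwise = in_channels * out_channels
    return depthwise + pointwise


# embed_dim=100 input channels, conv_filters=50 output channels, kernel sizes 3 and 4.
for k in (3, 4):
    print(k, standard_conv_weights(100, 50, k), depthwise_separable_weights(100, 50, k))
# 3 15000 5300
# 4 20000 5400
</code></pre></div></div>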

<hr />

<h2 id="how-to-run">How to Run</h2>

<h3 id="1-training-with-default-configuration">1. Training With Default Configuration</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python train.py
</code></pre></div></div>

<p>Result:</p>

<ul>
  <li>Creates the fold indices for 5-fold cross-validation (if they do not exist yet)</li>
  <li>Trains the model on the 5 folds sequentially</li>
  <li>Saves the trained models in model_outputs/run_YYYYMMDD_HHMMSS/</li>
  <li>Logs metric plots and model checkpoints to Wandb</li>
</ul>
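
<p>Creating and storing fold indices can be sketched with the standard library as shown below. This is an assumption-laden illustration: the layout of <code class="language-plaintext highlighter-rouge">k_fold.json</code> is not documented here, the function name <code class="language-plaintext highlighter-rouge">make_folds</code> is hypothetical, and the split is a plain shuffle rather than a stratified one.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import json
import random


def make_folds(n_samples, n_folds=5, seed=0):
    """Shuffle sample indices and deal them into n_folds validation folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = {}
    for fold in range(n_folds):
        val_idx = sorted(indices[fold::n_folds])
        val_set = set(val_idx)
        train_idx = [i for i in range(n_samples) if i not in val_set]
        folds[f"fold_{fold + 1}"] = {"train": train_idx, "val": val_idx}
    return folds


folds = make_folds(n_samples=10, n_folds=5)
print(json.dumps(folds["fold_1"]))
</code></pre></div></div>

<p>Writing the dictionary to disk once and reloading it on later runs keeps every experiment on the same splits, which is presumably why the indices are only created when the file does not exist yet.</p>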

<h3 id="2-training-dengan-custom-parameter">2. Training With Custom Parameters</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python train.py <span class="se">\</span>
    <span class="nt">--max_length</span> 128 <span class="se">\</span>
    <span class="nt">--dropout</span> 0.3 <span class="se">\</span>
    <span class="nt">--batch_size</span> 50 <span class="se">\</span>
    <span class="nt">--optimizer_name</span> Muon <span class="se">\</span>
    <span class="nt">--embed_dim</span> 100 <span class="se">\</span>
    <span class="nt">--conv_filters</span> 50 <span class="se">\</span>
    <span class="nt">--kernel_size</span> 3 4 <span class="se">\</span>
    <span class="nt">--epochs</span> 100 <span class="se">\</span>
    <span class="nt">--lr</span> 5e-4
</code></pre></div></div>

<h3 id="command-line-arguments">Command Line Arguments</h3>

<table>
  <thead>
    <tr>
      <th>Argument</th>
      <th>Type</th>
      <th>Default</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>--seed</td>
      <td>int</td>
      <td>01012001</td>
      <td>Random seed for reproducibility</td>
    </tr>
    <tr>
      <td>--dataset_path</td>
      <td>str</td>
      <td>'../dataset/cyberbullying.csv'</td>
      <td>Path to dataset file</td>
    </tr>
    <tr>
      <td>--max_length</td>
      <td>int</td>
      <td>128</td>
      <td>Maximum sequence length</td>
    </tr>
    <tr>
      <td>--tokenizer</td>
      <td>str</td>
      <td>'indobenchmark/indobert-base-p1'</td>
      <td>Tokenizer name</td>
    </tr>
    <tr>
      <td>--dropout</td>
      <td>float</td>
      <td>0.5</td>
      <td>Dropout rate</td>
    </tr>
    <tr>
      <td>--batch_size</td>
      <td>int</td>
      <td>50</td>
      <td>Batch size for embedding</td>
    </tr>
    <tr>
      <td>--embed_dim</td>
      <td>int</td>
      <td>100</td>
      <td>Embedding dimension for CNN</td>
    </tr>
    <tr>
      <td>--num_classes</td>
      <td>int</td>
      <td>2</td>
      <td>Number of classes</td>
    </tr>
    <tr>
      <td>--conv_filters</td>
      <td>int</td>
      <td>50</td>
      <td>Number of filters for CNN</td>
    </tr>
    <tr>
      <td>--kernel_size</td>
      <td>int</td>
      <td>[3, 4]</td>
      <td>Kernel sizes for CNN</td>
    </tr>
    <tr>
      <td>--n_folds</td>
      <td>int</td>
      <td>5</td>
      <td>Number of folds for cross-validation</td>
    </tr>
    <tr>
      <td>--epochs</td>
      <td>int</td>
      <td>100</td>
      <td>Number of epochs</td>
    </tr>
    <tr>
      <td>--lr</td>
      <td>float</td>
      <td>5e-4</td>
      <td>Learning rate</td>
    </tr>
    <tr>
      <td>--output_model</td>
      <td>flag</td>
      <td>True</td>
      <td>Save model after training</td>
    </tr>
    <tr>
      <td>--output_dir</td>
      <td>str</td>
      <td>'model_outputs'</td>
      <td>Directory to save model outputs</td>
    </tr>
    <tr>
      <td>--use_wandb</td>
      <td>flag</td>
      <td>False</td>
      <td>Enable Weights &amp; Biases logging</td>
    </tr>
    <tr>
      <td>--wandb_group</td>
      <td>str</td>
      <td>'Light TextCNN'</td>
      <td>Group name for Weights &amp; Biases runs</td>
    </tr>
    <tr>
      <td>--wandb_note</td>
      <td>str</td>
      <td>'Light TextCNN Note'</td>
      <td>Notes attached to the Weights &amp; Biases run</td>
    </tr>
    <tr>
      <td>--patience</td>
      <td>int</td>
      <td>5</td>
      <td>Patience for early stopping (epochs to wait after no improvement)</td>
    </tr>
  </tbody>
</table>
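
<p>For readers unfamiliar with how such options are declared, here is a minimal <code class="language-plaintext highlighter-rouge">argparse</code> sketch covering a handful of the arguments in the table. It is not the parser from <code class="language-plaintext highlighter-rouge">train.py</code>; the function name <code class="language-plaintext highlighter-rouge">build_parser</code> is hypothetical and the learning-rate default follows the 5e-4 used in the example command.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import argparse


def build_parser():
    """Parser covering a handful of the documented options."""
    parser = argparse.ArgumentParser(description="TextCNN training (sketch)")
    # 01012001 is not a valid Python integer literal, so the seed is written as 1012001.
    parser.add_argument("--seed", type=int, default=1012001)
    parser.add_argument("--max_length", type=int, default=128)
    parser.add_argument("--dropout", type=float, default=0.5)
    parser.add_argument("--batch_size", type=int, default=50)
    # nargs="+" lets the flag accept several values: --kernel_size 3 4
    parser.add_argument("--kernel_size", type=int, nargs="+", default=[3, 4])
    parser.add_argument("--lr", type=float, default=5e-4)
    parser.add_argument("--use_wandb", action="store_true")
    parser.add_argument("--patience", type=int, default=5)
    return parser


args = build_parser().parse_args(["--kernel_size", "3", "4", "5", "--lr", "1e-3"])
print(args.kernel_size, args.lr, args.use_wandb)  # [3, 4, 5] 0.001 False
</code></pre></div></div>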

<hr />

<h2 id="output-and-results">Output and Results</h2>

<h3 id="output-structure">Output Structure</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>model_outputs/
├── run_YYYYMMDD_HHMMSS/
│ ├── fold_1_model.pth    <span class="c"># Model output</span>
│ ├── fold_2_model.pth    <span class="c"># ...</span>
│ ├── fold_3_model.pth
│ ├── fold_4_model.pth
│ ├── fold_5_model.pth
│ └── ...
└── ...
</code></pre></div></div>
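
<p>The timestamped directory and per-fold file names shown above can be produced with <code class="language-plaintext highlighter-rouge">datetime.strftime</code>, roughly as follows. The helper names <code class="language-plaintext highlighter-rouge">run_dir</code> and <code class="language-plaintext highlighter-rouge">fold_checkpoint</code> are hypothetical; only the naming pattern comes from this page.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from datetime import datetime
from pathlib import Path


def run_dir(output_dir="model_outputs", now=None):
    """Build the run_YYYYMMDD_HHMMSS directory path for one training run."""
    now = now or datetime.now()
    return Path(output_dir) / now.strftime("run_%Y%m%d_%H%M%S")


def fold_checkpoint(directory, fold):
    """Path of the checkpoint file for a 1-based fold number."""
    return directory / f"fold_{fold}_model.pth"


example = run_dir(now=datetime(2026, 4, 13, 9, 30, 0))
print(fold_checkpoint(example, 1).as_posix())
# model_outputs/run_20260413_093000/fold_1_model.pth
</code></pre></div></div>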

<h3 id="wandb-key-matrics">Wandb Key Metrics</h3>

<p>The model produces the following metrics:</p>

<ul>
  <li><strong>Accuracy</strong>: Percentage of correct predictions</li>
  <li><strong>Precision</strong>: Proportion of positive predictions that are correct</li>
  <li><strong>Recall</strong>: Ability to find all positive samples</li>
  <li><strong>F1-Score</strong>: Harmonic mean of precision and recall</li>
  <li><strong>Loss</strong>: Cross-entropy loss</li>
</ul>
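
<p>The first four metrics follow directly from the confusion-matrix counts, as the sketch below shows. The counts are made up for illustration and are not results from this project, and which class is treated as positive is an assumption left to the caller.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts for illustration, not results from this project.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
print(metrics)
</code></pre></div></div>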

<hr />

<h2 id="data-augmentation">Data Augmentation</h2>

<p>Augmentation techniques are applied during training to improve robustness:</p>

<ul>
  <li><strong>AEDA (An Easier Data Augmentation)</strong>: Randomly inserts the punctuation marks “.”, “;”, “?”, “:”, “!”, and “,” into the text</li>
  <li><strong>Random Swap Character</strong>: Randomly swaps the positions of two characters in the text</li>
  <li><strong>Random Delete Character</strong>: Randomly deletes a character from the text</li>
  <li><strong>Augmentation Probability</strong>: Default 0.5 (50% of data is selected for augmentation)</li>
</ul>

<p>Example:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Original: <span class="s2">"makannya segentong buset"</span>
AEDA: <span class="s2">"makannya segentong buset!"</span>
Random swap Character: <span class="s2">"makannya segetnong buset"</span>
Random delete Character: <span class="s2">"makanya segentong buset"</span>
</code></pre></div></div>
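
<p>The three techniques can be sketched in a few lines each. This is a simplified illustration under stated assumptions, not the project's augmentation code: each function applies a single edit, AEDA inserts the mark as a separate token, the swap is limited to adjacent characters, and the function names are hypothetical.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import random

PUNCTUATION = [".", ";", "?", ":", "!", ","]


def aeda(text, rng):
    """Insert one random punctuation mark at a random position between words."""
    words = text.split()
    position = rng.randint(0, len(words))
    words.insert(position, rng.choice(PUNCTUATION))
    return " ".join(words)


def random_swap_char(text, rng):
    """Swap two adjacent characters at a random position."""
    chars = list(text)
    if len(chars) in (0, 1):
        return text
    i = rng.randrange(len(chars) - 1)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def random_delete_char(text, rng):
    """Delete one randomly chosen character."""
    if not text:
        return text
    i = rng.randrange(len(text))
    return text[:i] + text[i + 1:]


rng = random.Random(42)
original = "makannya segentong buset"
for augment in (aeda, random_swap_char, random_delete_char):
    print(augment.__name__, "=", augment(original, rng))
</code></pre></div></div>

<p>Passing a seeded <code class="language-plaintext highlighter-rouge">random.Random</code> instance keeps the augmentations reproducible from run to run.</p>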

<hr />

<h2 id="troubleshooting">Troubleshooting</h2>

<h3 id="issue-cuda-out-of-memory">Issue: CUDA out of memory</h3>

<p>If you encounter a CUDA out-of-memory error:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Reduce batch size</span>
python train.py <span class="nt">--batch_size</span> 8

<span class="c"># Reduce embedding dimension</span>
python train.py <span class="nt">--embed_dim</span> 64
</code></pre></div></div>

<h3 id="issue-module-not-found">Issue: Module not found</h3>

<p>If you encounter a module-not-found error:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Make sure virtual environment is activated</span>
<span class="c"># Reinstall dependencies</span>
pip <span class="nb">install</span> <span class="nt">-r</span> requirements.txt <span class="nt">--force-reinstall</span>
</code></pre></div></div>

<h3 id="issue-dataset-file-not-found">Issue: Dataset file not found</h3>

<p>If the dataset file cannot be found:</p>

<ul>
  <li>Make sure cyberbullying.csv is in the dataset/ directory, or point --dataset_path to its location</li>
  <li>Check file permissions (must be readable)</li>
</ul>

<h3 id="issue-transformers-model-cache">Issue: Transformers model cache</h3>

<p>If you encounter issues downloading IndoBERT:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Manual download</span>
python <span class="nt">-c</span> <span class="s2">"from transformers import AutoTokenizer; AutoTokenizer.from_pretrained('indobenchmark/indobert-base-p1')"</span>
</code></pre></div></div>

<hr />

<h2 id="how-to-cite">How to Cite</h2>

<p>If you use or adapt this code/model in your research or publication, please use one of the following citation formats:</p>

<h3 id="bibtex-format">BibTeX Format</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@thesis<span class="o">{</span>arinanda2026cyberbullying,
  <span class="nv">title</span><span class="o">={</span>Sentiment Analysis of Cyberbullying Comments on TikTok Social Media Using TextCNN Architecture<span class="o">}</span>,
  <span class="nv">author</span><span class="o">={</span>Nikola Arinanda<span class="o">}</span>,
  <span class="nv">year</span><span class="o">={</span>2026<span class="o">}</span>,
  <span class="nv">school</span><span class="o">={</span>Institut Teknologi Sumatera <span class="o">(</span>ITERA<span class="o">)}</span>,
  <span class="nb">type</span><span class="o">={</span>Final Project<span class="o">}</span>,
  <span class="nv">address</span><span class="o">={</span>Lampung, Indonesia<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<h3 id="apa-format">APA Format</h3>

<p>Arinanda, N. (2026). Sentiment Analysis of Cyberbullying Comments on TikTok Social Media Using TextCNN Architecture [Final Project]. Institut Teknologi Sumatera (ITERA).</p>

<h3 id="mla-format">MLA Format</h3>

<p>Arinanda, Nikola. “Sentiment Analysis of Cyberbullying Comments on TikTok Social Media Using TextCNN Architecture.” Final Project, Institut Teknologi Sumatera (ITERA), 2026.</p>

<h3 id="chicago-format">Chicago Format</h3>

<p>Arinanda, Nikola. “Sentiment Analysis of Cyberbullying Comments on TikTok Social Media Using TextCNN Architecture.” Final Project, Institut Teknologi Sumatera (ITERA), 2026.</p>

<h3 id="ieee-format">IEEE Format</h3>

<p>N. Arinanda, “Sentiment Analysis of Cyberbullying Comments on TikTok Social Media Using TextCNN Architecture”, Final Project, Institut Teknologi Sumatera (ITERA), 2026.</p>

<hr />

<h2 id="author-information">Author Information</h2>

<ul>
  <li><strong>Name</strong>: Nikola Arinanda</li>
  <li><strong>Study Program</strong>: Informatics Engineering</li>
  <li><strong>Institution</strong>: Institut Teknologi Sumatera (ITERA)</li>
  <li><strong>Year</strong>: 2026</li>
  <li><strong>GitHub</strong>: <a href="https://github.com/larinand">larinand</a></li>
</ul>

<hr />

<h2 id="contact-and-support">Contact and Support</h2>

<p>For questions or issues about this project, please:</p>

<ol>
  <li>Create an issue on the GitHub repository
<!-- Create an issue on the GitHub [repository](https://github.com/mctosima/Tugas_Akhir_Nikola/issues) --></li>
  <li>Contact the author via university email</li>
  <li>See documentation in the report/ folder</li>
</ol>

<hr />

<p><strong>Last Update</strong>: April 2026<br />
<strong>Status</strong>: Active Development</p>

<!-- This research focuses on TextCNN and modified SEDepthwiseTextCNN for classifying cyberbullying in the TikTok comments dataset. In this study, I improve the accuracy with model modified SEDepthwiseTextCNN compared to previous [research](https://ieeexplore.ieee.org/document/10468424) using same dataset that uses BERT. -->

<!-- For using LaTeX on a desktop computer with VS Code, you can follow [this](https://youtu.be/4lyHIQl4VM8?si=TOYXOIaCTGxaEusH) video. -->]]></content><author><name></name></author><summary type="html"><![CDATA[TikTok Comment Sentiment Analysis Using TextCNN]]></summary></entry></feed>