(1332, 7256)
| | $number$ | a | a4 | abajo | abandonada | abandonado | abandonados | abandono | abarata | abarcan | ... | kw_ya_que | struc_modal_auxiliary | struc_text_length | struc_text_position | struc_token_count | struc_avg_word_length | struc_punct_marks_count | synt_parse_tree_depth | synt_sub_clauses_count | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.000000 | 0.022751 | 0.000000 | 0.012270 | 0.272727 | 0.000000 | 0.048780 | 0.015625 | spam |
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.000000 | 0.065150 | 0.000000 | 0.055215 | 0.272727 | 0.043478 | 0.219512 | 0.078125 | claim |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.111111 | 0.130300 | 0.000000 | 0.128834 | 0.272727 | 0.086957 | 0.146341 | 0.117188 | premise |
| 3 | 0.0 | 0.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.000000 | 0.062048 | 0.047619 | 0.049080 | 0.363636 | 0.043478 | 0.146341 | 0.054688 | claim |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.000000 | 0.002068 | 0.000000 | 0.006135 | 0.181818 | 0.086957 | 0.000000 | 0.000000 | spam |
5 rows × 7256 columns
C:\Users\Usuario\AppData\Local\Temp/ipykernel_7780/1727221646.py:8: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  num_df[col] = df[col]
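The PerformanceWarning above is triggered by adding columns to a frame one at a time (`num_df[col] = df[col]`). A minimal sketch of the fix the warning suggests follows, assuming the goal is to copy the numeric feature columns of `df` into `num_df`. The toy frame and its three columns are hypothetical stand-ins for the real 7256-column data.

```python
import pandas as pd

# Toy stand-in for the real feature frame (hypothetical columns and values).
df = pd.DataFrame({
    "a": [0.0, 0.2],
    "struc_text_length": [0.022751, 0.062048],
    "label": ["spam", "claim"],
})

# Select every numeric column in a single step instead of inserting
# them one by one, which is what fragments the frame.
numeric_cols = df.select_dtypes(include="number").columns
num_df = df[numeric_cols].copy()

# Equivalent construction with pd.concat, as the warning recommends.
num_df_concat = pd.concat([df[col] for col in numeric_cols], axis=1)

print(num_df.shape)
```

Either form builds the frame in one allocation, so the warning should no longer appear.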
(1332, 7255)
| | $number$ | a | a4 | abajo | abandonada | abandonado | abandonados | abandono | abarata | abarcan | ... | kw_visto_que | kw_ya_que | struc_modal_auxiliary | struc_text_length | struc_text_position | struc_token_count | struc_avg_word_length | struc_punct_marks_count | synt_parse_tree_depth | synt_sub_clauses_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0.022751 | 0.000000 | 0.012270 | 0.272727 | 0.000000 | 0.048780 | 0.015625 |
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0.065150 | 0.000000 | 0.055215 | 0.272727 | 0.043478 | 0.219512 | 0.078125 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.111111 | 0.130300 | 0.000000 | 0.128834 | 0.272727 | 0.086957 | 0.146341 | 0.117188 |
| 3 | 0.0 | 0.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0.062048 | 0.047619 | 0.049080 | 0.363636 | 0.043478 | 0.146341 | 0.054688 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0.002068 | 0.000000 | 0.006135 | 0.181818 | 0.086957 | 0.000000 | 0.000000 |
5 rows × 7255 columns
2
Explained Variance Ratio: 100.0
| | LDA1 | LDA2 | label |
|---|---|---|---|
| 0 | 11.179953 | 12.886342 | spam |
| 1 | -44.069918 | -11.088276 | claim |
| 2 | 37.946152 | -55.683425 | premise |
| 3 | -44.009614 | -11.224104 | claim |
| 4 | 5.302567 | 10.745235 | spam |
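A two-column projection with a reported explained variance ratio of 100 is what one would expect from linear discriminant analysis on three classes, since LDA yields at most (number of classes - 1) discriminant axes. The sketch below illustrates this with scikit-learn's `LinearDiscriminantAnalysis`. It assumes, without confirmation from the output, that this is the estimator used and that the data has exactly the three labels visible above; the synthetic features are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Synthetic data: 3 classes, 30 samples each, 5 features (hypothetical).
class_names = ["spam", "claim", "premise"]
centers = rng.normal(scale=3.0, size=(3, 5))
X = np.vstack([centers[i] + rng.normal(size=(30, 5)) for i in range(3)])
y = np.repeat(class_names, 30)

# With 3 classes, LDA can produce at most 2 discriminant components.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)

# Collect the projection in a frame shaped like the table above.
lda_df = pd.DataFrame(X_lda, columns=["LDA1", "LDA2"])
lda_df["label"] = y

# The two components together account for all between-class variance.
ratio_pct = lda.explained_variance_ratio_.sum() * 100
print(f"Explained Variance Ratio: {ratio_pct:.1f}")
print(lda_df.head())
```

Because both available discriminants are kept, the ratio says nothing about how well the classes separate; it only confirms that no discriminant axis was dropped.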
C:\Users\Usuario\anaconda3\lib\site-packages\sklearn\cluster\_kmeans.py:881: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=6.
  warnings.warn(
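The KMeans warning above can usually be silenced by setting `OMP_NUM_THREADS` before NumPy and scikit-learn are first imported. A minimal sketch follows, using the value 6 that the warning recommends for this machine. The number of clusters and the synthetic points are assumptions for illustration, not taken from the original notebook.

```python
import os

# Set the thread limit before importing NumPy / scikit-learn so the
# OpenMP runtime picks it up when it initializes.
os.environ["OMP_NUM_THREADS"] = "6"

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Three well-separated 2-D blobs (hypothetical data).
blob_centers = np.array([[0.0, 0.0], [20.0, 20.0], [-20.0, 20.0]])
points = np.vstack([c + rng.normal(size=(50, 2)) for c in blob_centers])

# n_clusters=3 is an assumption; n_init is set explicitly for reproducibility.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(points)

print(np.bincount(cluster_ids))
```

In a notebook, the environment variable has to be set in a cell that runs before any import of scikit-learn, which typically means restarting the kernel first.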