(1332, 7256)
| | $number$ | a | a4 | abajo | abandonada | abandonado | abandonados | abandono | abarata | abarcan | ... | kw_ya_que | struc_modal_auxiliary | struc_text_length | struc_text_position | struc_token_count | struc_avg_word_length | struc_punct_marks_count | synt_parse_tree_depth | synt_sub_clauses_count | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.000000 | 0.022751 | 0.000000 | 0.012270 | 0.272727 | 0.000000 | 0.048780 | 0.015625 | spam |
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.000000 | 0.065150 | 0.000000 | 0.055215 | 0.272727 | 0.043478 | 0.219512 | 0.078125 | claim |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.111111 | 0.130300 | 0.000000 | 0.128834 | 0.272727 | 0.086957 | 0.146341 | 0.117188 | premise |
| 3 | 0.0 | 0.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.000000 | 0.062048 | 0.047619 | 0.049080 | 0.363636 | 0.043478 | 0.146341 | 0.054688 | claim |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.000000 | 0.002068 | 0.000000 | 0.006135 | 0.181818 | 0.086957 | 0.000000 | 0.000000 | spam |
5 rows × 7256 columns
C:\Users\Usuario\AppData\Local\Temp/ipykernel_8088/1727221646.py:8: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  num_df[col] = df[col]
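The PerformanceWarning points at a loop that adds columns to `num_df` one at a time (`num_df[col] = df[col]`). Because the column count drops from 7256 to 7255, the loop appears to copy every column except `label`; that is an assumption, since the cell source is not shown. A minimal sketch of the single-step alternatives the warning suggests, on a small hypothetical frame:

```python
import pandas as pd

# Hypothetical stand-in for the notebook's feature frame: numeric features plus "label".
df = pd.DataFrame({
    "a": [0.0, 0.0, 0.2],
    "struc_text_length": [0.022751, 0.065150, 0.130300],
    "label": ["spam", "claim", "premise"],
})

# Assumption: the original loop copied every column except "label".
feature_cols = [col for col in df.columns if col != "label"]

# Join all selected columns in one pd.concat call instead of inserting them
# one by one, which is what fragments the frame and triggers the warning.
num_df = pd.concat([df[col] for col in feature_cols], axis=1)

# Simpler equivalent when the goal is just "every column but label".
num_df_alt = df.drop(columns="label")

print(num_df.shape)  # (3, 2)
```

Either form builds the frame in a single allocation; `drop` is the more direct choice when only one column is being excluded.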
(1332, 7255)
| | $number$ | a | a4 | abajo | abandonada | abandonado | abandonados | abandono | abarata | abarcan | ... | kw_visto_que | kw_ya_que | struc_modal_auxiliary | struc_text_length | struc_text_position | struc_token_count | struc_avg_word_length | struc_punct_marks_count | synt_parse_tree_depth | synt_sub_clauses_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0.022751 | 0.000000 | 0.012270 | 0.272727 | 0.000000 | 0.048780 | 0.015625 |
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0.065150 | 0.000000 | 0.055215 | 0.272727 | 0.043478 | 0.219512 | 0.078125 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.111111 | 0.130300 | 0.000000 | 0.128834 | 0.272727 | 0.086957 | 0.146341 | 0.117188 |
| 3 | 0.0 | 0.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0.062048 | 0.047619 | 0.049080 | 0.363636 | 0.043478 | 0.146341 | 0.054688 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0.002068 | 0.000000 | 0.006135 | 0.181818 | 0.086957 | 0.000000 | 0.000000 |
5 rows × 7255 columns
810
Explained Variance Ratio: 95.01325459254306
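The bare `810` and the 95.01 figure are consistent with PCA keeping the smallest number of components that together explain more than 95% of the variance, where the printed value is the summed `explained_variance_ratio_` multiplied by 100 (a percentage rather than a ratio). The notebook's actual PCA call is not shown, so the following is only a sketch of that technique, using scikit-learn's fractional `n_components` on random, hypothetical data:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data standing in for the 1332 x 7255 feature matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))

# A float in (0, 1) with svd_solver="full" keeps the smallest number of
# components whose cumulative explained variance exceeds that fraction.
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X)

n_kept = pca.n_components_
# Summed ratio times 100, i.e. a percentage.
explained_pct = pca.explained_variance_ratio_.sum() * 100

print(n_kept)
print("Explained Variance Ratio:", explained_pct)
```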
| | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 | PC9 | PC10 | ... | PC802 | PC803 | PC804 | PC805 | PC806 | PC807 | PC808 | PC809 | PC810 | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -1.164060 | -0.069563 | 0.151985 | 0.195801 | -0.104088 | 0.441369 | 0.058761 | -0.112306 | -0.004139 | 0.013003 | ... | -0.048169 | -0.053117 | -0.032265 | -0.027763 | -0.041062 | 0.011108 | 0.024697 | -0.010126 | -0.024952 | spam |
| 1 | -0.311263 | -0.199701 | 0.810015 | 0.540140 | -0.920673 | 0.243345 | 0.228098 | -0.369541 | 0.069813 | 0.116217 | ... | -0.054212 | -0.110060 | -0.088408 | -0.058945 | -0.038747 | 0.039362 | -0.004662 | -0.090483 | -0.029393 | claim |
| 2 | -0.124422 | -0.231020 | -0.739091 | 0.138815 | 0.258520 | 0.037944 | 0.492401 | 0.018274 | -0.043145 | -0.161600 | ... | -0.039788 | 0.126112 | 0.053602 | -0.006358 | 0.022935 | 0.001613 | 0.085924 | 0.062241 | 0.050706 | premise |
| 3 | -0.765176 | -0.049421 | 0.025430 | 0.482796 | 0.425197 | -0.190003 | 0.163052 | -0.228852 | 0.023106 | 0.066532 | ... | -0.064764 | -0.187304 | -0.011264 | 0.184579 | -0.243556 | 0.030899 | -0.034883 | -0.022024 | -0.175897 | claim |
| 4 | -1.329017 | -0.067418 | 0.070951 | 0.030050 | -0.245983 | 0.461533 | -0.164240 | 0.161236 | -0.013799 | 0.203501 | ... | -0.016564 | -0.015201 | 0.013751 | -0.000534 | -0.000784 | -0.006372 | 0.022294 | -0.008888 | 0.004404 | spam |
5 rows × 811 columns
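The reduced frame holds one column per component (`PC1` to `PC810`) plus `label`, for 811 columns in total. A sketch of how such a frame can be assembled from `fit_transform` output; the data, the component count, and the variable names are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical features and labels; the class names mirror those in the output above.
X = rng.normal(size=(6, 4))
labels = pd.Series(["spam", "claim", "premise", "claim", "spam", "premise"], name="label")

pca = PCA(n_components=3)
scores = pca.fit_transform(X)

# Name the score columns PC1..PCk, then reattach the label as the last column.
pc_cols = [f"PC{i}" for i in range(1, scores.shape[1] + 1)]
pca_df = pd.DataFrame(scores, columns=pc_cols)
pca_df["label"] = labels.to_numpy()
```

Assigning the label through `to_numpy()` attaches it by position, which sidesteps any index mismatch between the original frame and the freshly built score frame.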
C:\Users\Usuario\anaconda3\lib\site-packages\sklearn\cluster\_kmeans.py:881: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=6.
  warnings.warn(
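The KMeans warning names its own workaround: set `OMP_NUM_THREADS`. The variable is typically read when the OpenMP runtime loads, so a safe sketch sets it at the very top of the notebook, before NumPy or scikit-learn is imported (alternatively, set it in the shell before starting Jupyter). The blob data and `n_clusters=3` below are illustrative assumptions; the notebook's actual clustering settings are not shown.

```python
import os

# Set before importing NumPy/scikit-learn so the OpenMP runtime sees it.
# The value 6 is the one suggested by the warning above.
os.environ["OMP_NUM_THREADS"] = "6"

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical, well-separated data with three groups.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# n_init is given explicitly so behavior does not depend on the library version's default.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)
```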