C code to compute term frequency (tf) and document frequency (df) for all substrings in a corpus.
-
C code that appears in the paper:
-
Mikio Yamamoto and Kenneth W. Church. 2001. Using suffix arrays
to compute term frequency and document frequency for all substrings
in a corpus. Computational Linguistics, 27:1, pages 1-30.
-
The corrected version of Figure 7 in the above paper
-
print_LDIs.c
-
print_LDIs_stack.c
-
print_LDIs_with_df.c
-
C code with a main function and sample data for demonstration.
-
demo_print_LDIs.c
-
demo_print_LDIs_with_df.c
- Usage:
- % cc -o tfdf demo_print_LDIs_with_df.c
- % ./tfdf
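
For readers who want a feel for the two quantities before reading the files above, here is a minimal sketch of how tf and df for a single substring can be read off a suffix array. It is not the code from the paper and does not use the class-based method of print_LDIs_with_df.c: it sorts suffixes naively with qsort, finds the block of suffixes that start with the pattern by binary search, and counts distinct documents in that block. The function name tf_df, the helper names, and the convention that documents are separated by a single delimiter character are all assumptions made for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Corpus seen by the qsort comparator (qsort has no context argument,
   so this sketch is not thread-safe). */
static const char *g_text;

/* Order two suffix start positions by the suffixes they begin. */
static int cmp_suffix(const void *a, const void *b)
{
    return strcmp(g_text + *(const int *)a, g_text + *(const int *)b);
}

/* First index i in [0, n] whose suffix, truncated to m characters,
   compares >= pat (or > pat when strict is nonzero). */
static int bound(const char *text, const int *sa, int n,
                 const char *pat, size_t m, int strict)
{
    int lo = 0, hi = n;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        int c = strncmp(text + sa[mid], pat, m);
        if (c < 0 || (strict && c == 0))
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Hypothetical helper: returns tf of pat in text and stores df in *df.
   Documents are assumed to be separated by delim, and pat is assumed
   not to contain delim. Returns -1 if memory allocation fails. */
int tf_df(const char *text, const char *pat, char delim, int *df)
{
    assert(text != NULL && pat != NULL && df != NULL);
    int n = (int)strlen(text);
    size_t m = strlen(pat);
    *df = 0;
    if (n == 0 || m == 0)
        return 0;

    int *sa = malloc((size_t)n * sizeof *sa);
    int *doc_of = malloc((size_t)n * sizeof *doc_of);
    if (sa == NULL || doc_of == NULL) {
        free(sa);
        free(doc_of);
        return -1;
    }

    /* Record which document each text position belongs to. */
    int ndocs = 0;
    for (int i = 0; i < n; i++) {
        sa[i] = i;
        doc_of[i] = ndocs;
        if (text[i] == delim)
            ndocs++;
    }
    ndocs++;

    unsigned char *seen = calloc((size_t)ndocs, 1);
    if (seen == NULL) {
        free(sa);
        free(doc_of);
        return -1;
    }

    /* Naive suffix array construction: sort all start positions. */
    g_text = text;
    qsort(sa, (size_t)n, sizeof *sa, cmp_suffix);

    /* Suffixes that start with pat form one contiguous block. */
    int first = bound(text, sa, n, pat, m, 0);
    int last = bound(text, sa, n, pat, m, 1);

    /* df: number of distinct documents among the block's positions. */
    for (int i = first; i < last; i++) {
        int d = doc_of[sa[i]];
        if (!seen[d]) {
            seen[d] = 1;
            (*df)++;
        }
    }

    free(seen);
    free(doc_of);
    free(sa);
    return last - first;
}
```

With the three-document corpus "to be or not to be\nbe quick\nnot now" and '\n' as the delimiter, calling tf_df with the pattern "be" returns tf = 3 and sets df to 2. The sketch rebuilds the suffix array on every call and handles one substring at a time, which is only reasonable for small inputs.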