[D] Are there any Python libraries for byte-pair encoding that split on something other than spaces?
I have a large corpus of source code, and whitespace is significant in some languages (such as Python). It seems that https://github.com/rsennrich/subword-nmt splits on spaces. Are there other packages that can split on something else?
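To make the question concrete, here is a rough sketch of the behavior being asked about: BPE merges learned over raw characters, with spaces and newlines treated as ordinary symbols instead of word boundaries. It is plain Python and not the subword-nmt API; `learn_bpe`, `merge_pair`, and `apply_bpe` are hypothetical helper names invented for this illustration, and the loop is far too slow for a large corpus.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each non-overlapping adjacent occurrence of `pair` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(text, num_merges):
    """Learn up to `num_merges` merges; whitespace is never used as a split point."""
    symbols = list(text)  # start from single characters, including ' ' and '\n'
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        best, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no adjacent pair repeats, so nothing is worth merging
        merges.append(best)
        symbols = merge_pair(symbols, best)
    return merges


def apply_bpe(text, merges):
    """Tokenize `text` by replaying the learned merges in order."""
    symbols = list(text)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


# Toy usage: indentation can end up inside tokens instead of being discarded.
code = "def f():\n    return 1\ndef g():\n    return 2\n"
merges = learn_bpe(code, 10)
print(apply_bpe(code, merges))
```

Because nothing is pre-split, joining the tokens reproduces the input exactly, indentation included.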
submitted by /u/shamoons