NLP 101: Text Preprocessing 1 - Tokenization
This blog provides a comprehensive overview of key text preprocessing techniques such as tokenization, lemmatization, stemming, stop-word removal, and punctuation handling. It also highlights their importance, practical applications, and limitations, laying a strong foundation for efficient natural language processing workflows.
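To make these steps concrete, here is a minimal sketch of a preprocessing pipeline using NLTK; the sample sentence and the exact ordering of steps are illustrative assumptions, not taken from the post itself.

```python
# Illustrative preprocessing pipeline: tokenization, punctuation and
# stop-word removal, then stemming vs. lemmatization (example text is assumed).
import string

import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.corpus import stopwords

# One-time downloads of the required NLTK resources.
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

text = "The striped bats are hanging on their feet."

# 1. Tokenization: split the raw string into word-level tokens.
tokens = word_tokenize(text.lower())

# 2. Remove punctuation tokens and common stop words.
stop_words = set(stopwords.words("english"))
tokens = [t for t in tokens if t not in string.punctuation and t not in stop_words]

# 3. Stemming (crude, rule-based truncation) vs. lemmatization
#    (dictionary-based base forms; default POS is noun).
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
print([stemmer.stem(t) for t in tokens])
print([lemmatizer.lemmatize(t) for t in tokens])
```

Comparing the two printed lists makes the trade-off visible: stemming is faster but can mangle words, while lemmatization returns valid dictionary forms.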
NLP 201: Text Preprocessing 2 - Vectorization
This blog breaks down essential methods for converting text into numerical representations for machine learning. It covers One-Hot Encoding (OHE), Bag of Words (BoW), and TF-IDF, explaining their principles, use cases, and limitations. With clear examples and insights, it serves as a practical guide for building effective NLP models.
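As a quick illustration of the three methods, here is a minimal sketch using scikit-learn on a toy corpus (the corpus is an assumption; OHE is read here as document-level presence/absence, which `CountVectorizer(binary=True)` provides).

```python
# Illustrative comparison of BoW, one-hot (presence/absence), and TF-IDF
# representations on a tiny assumed corpus.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

# Bag of Words: raw term counts per document.
bow = CountVectorizer()
print(bow.fit_transform(corpus).toarray())
print(bow.get_feature_names_out())

# One-hot style encoding: same vocabulary, but only presence/absence per document.
ohe = CountVectorizer(binary=True)
print(ohe.fit_transform(corpus).toarray())

# TF-IDF: term counts reweighted by how rare each term is across the corpus.
tfidf = TfidfVectorizer()
print(tfidf.fit_transform(corpus).toarray().round(2))
```

The printed matrices show how shared words like "the" dominate the count-based vectors but are down-weighted by TF-IDF, which is the main motivation for moving beyond plain BoW.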