
Strategy Design Pattern with Java Enum

Strategy is a design pattern that lets software choose one of a family of algorithms at runtime. Each algorithm is implemented in its own class, which makes the algorithms interchangeable from the client's point of view. With the Strategy design pattern, a class can execute the same method in different ways by delegating to different implementations. It is one of the patterns described in the book Design Patterns by Gamma et al.
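As a minimal sketch of the idea in the post title, the strategies can be written as constants of a Java enum, each overriding one abstract method. The names `Operation`, `Calculator`, and `StrategyDemo` are hypothetical and chosen only for illustration; they are not taken from the original post.

```java
// Each enum constant is one strategy: it supplies its own body for apply().
enum Operation {
    ADD {
        @Override
        int apply(int a, int b) {
            return a + b;
        }
    },
    SUBTRACT {
        @Override
        int apply(int a, int b) {
            return a - b;
        }
    },
    MULTIPLY {
        @Override
        int apply(int a, int b) {
            return a * b;
        }
    };

    // The single method that every strategy must implement.
    abstract int apply(int a, int b);
}

// The context class: it depends only on the Operation type,
// not on any particular constant.
class Calculator {
    private final Operation strategy;

    Calculator(Operation strategy) {
        this.strategy = strategy;
    }

    int compute(int a, int b) {
        return strategy.apply(a, b);
    }
}

class StrategyDemo {
    public static void main(String[] args) {
        // The strategy is picked at runtime, here from a string
        // that could come from configuration or user input.
        String requested = args.length > 0 ? args[0] : "MULTIPLY";
        Calculator calculator = new Calculator(Operation.valueOf(requested));
        System.out.println(calculator.compute(6, 7));
    }
}
```

An enum keeps the whole family of algorithms in one place and gives `valueOf` lookup for free. The trade-off is that the set of strategies is closed: adding one means editing the enum rather than adding a new class.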


Text Extraction and OCR With Apache Tika

Apache Tika is a library for extracting text from most file formats, including PDF, DOC, and PPT. Tika exposes a simple interface for extracting content, which makes the library easy to use. Its main uses are in search engine indexing, content analysis (in journalism, for example), and even translation (using paid APIs).

Text Extraction with Tika Server

Apache Tika is a library for extracting text from most file formats, including PDF, DOC, and PPT. Tika exposes a simple interface for extracting content, which makes the library easy to use. Its main uses are in search engine indexing, content analysis (in journalism, for example), and even translation (using paid APIs).
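To make the Tika Server workflow concrete, here is a hedged sketch that sends a document to the server's `/tika` endpoint with an HTTP PUT and asks for plain text back. It assumes a Tika Server instance is already running locally on its default port 9998. The class `TikaServerClient` and its methods are hypothetical helpers, not part of Tika itself, and only the JDK's built-in HTTP client (Java 11+) is used.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

class TikaServerClient {
    // Assumption: a local Tika Server listening on its default port.
    static final URI DEFAULT_ENDPOINT = URI.create("http://localhost:9998/tika");

    // Builds the PUT request; the document bytes are the request body and
    // the Accept header asks the server for plain-text output.
    static HttpRequest buildRequest(URI endpoint, byte[] document) {
        return HttpRequest.newBuilder(endpoint)
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
                .build();
    }

    // Sends the file to the server and returns the extracted text.
    static String extractText(Path file) throws IOException, InterruptedException {
        HttpRequest request = buildRequest(DEFAULT_ENDPOINT, Files.readAllBytes(file));
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Tika Server returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java TikaServerClient.java path/to/document.pdf
        System.out.println(extractText(Path.of(args[0])));
    }
}
```

Talking to the server over HTTP keeps the Tika dependencies out of the client application, at the cost of running and monitoring a separate process.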

List of Datasets for Download

Some datasets available for download that can be used to study data science.

http://dados.gov.br/
https://www.data.gov/
http://open.canada.ca/en
https://data.gov.uk/
https://www.healthdata.gov/
http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
http://snap.stanford.edu/data/sx-stackoverflow.html
https://archive.org/web/
https://index.okfn.org/dataset/
http://snap.stanford.edu/data/
https://github.com/caesar0301/awesome-public-datasets…