Dask - Ever tried using Pandas to process data that won't fit into memory? Dask makes it easy. ![]() It is focused on real-time operation, but supports scheduling as well." Celery - "an asynchronous task queue/job queue based on distributed message passing.Bonobo - Simple, modern and atomic data transformation graphs for Python 3.5+.Blaze - "translates a subset of modified NumPy and Pandas-like syntax to databases and other computing systems.".BeautifulSoup - Popular library used to extract data from web pages.Scriptella - Java-XML ETL toolbox for every day use.JSR 352 - Java native API for batch processing.GETL - Groovy toolbox for ETL Tasks from practicing architectures.Built with Java, it provides over 1000 plugins to support automating virtually anything, so that humans can actually spend their time doing things machines cannot." Jenkins - "the leading open-source automation server.Each job then kicks off a series of tasks (subprocesses) in an order defined by a dependency graph you can easily draw with click-and-drag in the web interface." Dagobah allows you to schedule periodic jobs using Cron syntax. Dagobah - "a simple dependency-based job scheduler written in Python.Chronos - "a distributed and fault-tolerant scheduler that runs on top of Apache Mesos that can be used for job orchestration.".With a unified view of pipelines and the assets they produce, Dagster can schedule and orchestrate Pandas, Spark, SQL, or anything else that Python can invoke." ![]() It lets you define pipelines in terms of the data flow between reusable, logical components, then test locally and run anywhere.
0 Comments
Leave a Reply. |
Details
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |