
Presto experiments, interaction with HDFS/Superset.
Release MediaWiki History in JSON/CSV or MySQL dump format (the best dataset to measure content and contributors).
Enqueue eventlogging requests for better performance.
Develop the mediarequests API to get view statistics for individual Wikimedia images.
Bot Detection Code Prototype: “Remove automated traffic not identified as such from readers data”.
Set up the Mediarequests API public endpoint.
Start planning work on the Stream Configuration Service and Product use of the Event Platform.
Continue moving events from eventbus to eventgate-main.
Schema Repository CI for convention and backwards-compatibility enforcement.
Enable GPU infrastructure on stats machines with purely OS components.
Finish the Swift workflow to transfer binaries from Hadoop to production.
Allow all Analytics tools to work with Kerberos auth.
Sunset MySQL data store for eventlogging.
Bot Detection, “Remove automated traffic not identified as such from readers data”: In progress.
Deprecate eventlogging-service-eventbus: Done.
Create test Kerberos identities/accounts for some selected users from the Analytics Team in the test cluster: Done.
Set up a generic workflow to create Kerberos accounts: In progress.
Finalize productionizing the Kerberos service, and then possibly enabling it: Done.
Task T159170 (this work continues from Q1): Done.
Migrate eventlogging to Python 3 (task T234593): Done.
Reduce Operational Load by Phasing Out Legacy Systems/Technologies.
Sunset MySQL data store for eventlogging.
Increase Resilience of Systems.
New ZooKeeper cluster for tier-2 (task T217057): Done.
Core.
Increase Data Quality, Privacy and Security.
Deploy entropy-based alarms for data issues that could indicate bugs, traffic drops due to censorship, or inconsistencies (task T215863, this work continues from Q1): In progress.
Productionize Kerberos Service: Done.
Create test Kerberos identities/accounts for some selected users from the Analytics Team in the test cluster (T212258): Done.
Core.
Announce the deployment of the mediarequests API (task T231589): Done.
Add mediarequests metrics to the Wikistats UI (task T234589): Done.
Smart Tools for Better Data.
Make it easier to understand how Commons media is used across our projects.
Make it easier to understand the history of all Wikimedia projects.
Release MediaWiki History in JSON/CSV or MySQL dump format (the best dataset to date to measure content and contributors): N Blocked.
Deploy the Hadoop client to dump hosts so the MediaWiki history public dataset can get to dumps in a reasonable timeframe (task T234229): In progress.
Smart Tools for Better Data.
Resolve the Kafka Connect HDFS licensing issue and decide if we will use Kafka Connect (task T223626): N Postponed.
Initial (Stream) Config Service implementation in Vagrant (task T233634): Done.
Smart Tools for Better Data.
Modern Event Platform: Build a reliable, scalable, and comprehensive platform for creating services, tools and user-facing features that produce and consume event data.
Team Manager: Nuria Ruiz.
Reduce Platform Complexity.