Project Name:
- krehel
Wikitech Usernames of requestors:
- Dušan Kreheľ <Dusan_Krehel>
Purpose:
- §1: Create an analysis of the most frequent characters on some local-language Wikipedias (a minimal sketch follows this list). [Article]
- §2: Create statistics of the most frequent URL domains in the article source code of almost all Wikipedias (see the second sketch below). [Article]
- §3: Analyse the number of unique terms (linguistics; see the third sketch below). [Article]
- Create an analysis of my new format for pageview statistics
- 2 parts:
- Comparison with other matrix formats (I do not plan to run this on Cloud-VPS, since it is not related to Wikimedia projects)
- §4: Practical comparison of the existing solution with my format for pageview statistics (the statistical results should be used in a deployment request for some Wikipedia dump)
- 2 parts: a) daily statistics and b) hourly statistics (detailed under Difficulty below)
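
A minimal sketch of the §1 character count, assuming plain UTF-8 article text on stdin; the variable names, output format, and file names are illustrative only, not the project's actual interface:

    <?php
    // Tally every character in UTF-8 text read from stdin.
    $counts = [];
    while (($line = fgets(STDIN)) !== false) {
        // Split into single UTF-8 characters (needs the pcre 'u' modifier).
        foreach (preg_split('//u', $line, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
            $counts[$ch] = ($counts[$ch] ?? 0) + 1;
        }
    }
    arsort($counts);                     // most frequent first
    foreach (array_slice($counts, 0, 20, true) as $ch => $n) {
        printf("%s\t%d\n", $ch, $n);     // top 20 characters
    }

It could be run, for example, as: php count_chars.php < articles.txt (a hypothetical invocation).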
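
Similarly, a sketch of the §2 domain statistics; the URL regex is a deliberate simplification (it may also capture ports or credentials) and is not the project's actual extraction logic:

    <?php
    // Tally the domain part of every http(s) URL found in text on stdin.
    $domains = [];
    while (($line = fgets(STDIN)) !== false) {
        if (preg_match_all('~https?://([^/\s\]|}<>"]+)~i', $line, $m)) {
            foreach ($m[1] as $host) {
                $host = mb_strtolower($host, 'UTF-8');
                $domains[$host] = ($domains[$host] ?? 0) + 1;
            }
        }
    }
    arsort($domains);                    // most frequent first
    foreach ($domains as $host => $n) {
        printf("%s\t%d\n", $host, $n);
    }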
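
And a sketch of the §3 unique-term count, assuming a simple tokenization into runs of letters; real linguistic term extraction would be more involved. Holding the full term set in memory is what makes large Wikipedias demanding, as noted under Difficulty:

    <?php
    // Count unique terms (word types) in text read from stdin.
    $terms = [];
    while (($line = fgets(STDIN)) !== false) {
        // A 'term' here is any run of Unicode letters, lower-cased.
        preg_match_all('/\p{L}+/u', $line, $m);
        foreach ($m[0] as $w) {
            $terms[mb_strtolower($w, 'UTF-8')] = true;   // set semantics
        }
    }
    printf("unique terms: %d\n", count($terms));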
Legends/Terms:
- [Article] – my current plan is to use some of these statistics in a Wikipedia Signpost article.
- Wikipedia/-s – enwiki, dewiki, skwiki, and other Wikipedias with a short language code ending in "wiki".
For an example of my prior work, see: https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2023-04-26/Special_report
Brief description:
- Analyses:
- Software:
- Software written by me.
- Environment:
- C with GNU extensions
- PHP (php-cli, php-mbstring, php-intl).
- parallel, screen, pv
- wget/curl
- otherwise, the standard Linux environment
Difficulty (rough estimates):
- §1
- Minimum requirements:
- 2 CPU threads, 35 GB HDD, 1 GB RAM
- §2
- Minimum requirements:
- 2 CPU threads, 35 GB HDD, 2 GB RAM
- §3
- Quite demanding for the large Wikipedias; multi-threaded processing is preferable.
- More (= faster) CPU threads, 100 GB HDD, at least 4 GB RAM
- §4
- a) daily statistics
- 3 CPU threads, 70 GB HDD, 2 GB RAM
- b) hourly statistics
- 4 CPU threads, 1 TB HDD, 3 GB RAM
How soon you are hoping this can be fulfilled:
- No third party is pressing me for a deadline; it will be done when it is done.