User Details
- User Since
- Oct 7 2014, 6:35 PM (498 w, 1 d)
- Availability
- Available
- IRC Nick
- dr0ptp4kt
- LDAP User
- Unknown
- MediaWiki User
- ABaso (WMF)
Fri, Apr 19
For those following along, have a look at the comment in T358349#9727873 to identify the notebook helping to fill a table in @EBernhardson's namespace, plus an example in Superset.
Updated the AC to say daily where it incorrectly said monthly within the Preferred section. It already said "estimated daily unique devices", so it was hopefully sufficiently clear, but still. Sorry!
Wed, Apr 17
@EBernhardson I had duplicated the verbiage "estimated daily unique devices, based on unique_devices_per_domain_monthly" (emphasis on incorrect "monthly" in Preferred section), but have now updated the Preferred section to say "estimated daily unique devices, based on unique_devices_per_domain_daily" to correct this glitch. I think you have this covered already, but just wanted to make sure the edit was obvious.
Tue, Apr 16
Running time
Total Uptime: 55 min
I kicked off a run using the current version of the patch with the following command and backing table, and its status can be followed here: https://yarn.wikimedia.org/cluster/app/application_1713178047802_16409
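For the record, the same status can also be checked from an analytics client with the standard YARN CLI, something like this (the application ID is the one from the URL above):

```
yarn application -status application_1713178047802_16409
```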
Fri, Apr 12
@EBernhardson I updated the AC.
In short, we determined that the following are the initial focus:
(Updated previous comment. Do this in conjunction with the other tickets, not necessarily afterward.)
@EBernhardson I updated the AC to capture the essence of the IRC discussion and what we went over in Etherpad.
@EBernhardson I updated the AC to indicate that this should only be specified where there is high confidence signaling. For the near term, this notion of "successful searches" (satisfied searches) analysis comes after the other analysis.
@EBernhardson should we close this as a duplicate and move "(full text search, go bar, ...)" over to be a dimension in T358352: Search Metrics - Number of user sessions using search?
Wed, Apr 10
Good news. With the N-Triples-style scholarly entity graph files, a buffer capacity of 1000000, a write retention queue capacity of 4000, and a heap size of 31g on the gaming-class desktop, the import took about 2.40 days. Recall that with a buffer capacity of 100000 it took about 3.25 days on this desktop (and, again, 5.875 days on wdqs1024). So the higher buffer capacity yielded about a 35% speed increase on this gaming-class desktop (3.25 / 2.40 ≈ 1.35).
Mon, Apr 8
Update: With the buffer capacity at 1000000, file number 550 of the scholarly graph was imported as of Mon Apr 8 03:22:08 PM CDT 2024. So, under 28 hours so far (with the buffer capacity at 100000, reaching the same point took more than 36 hours).
Historically this was based on dwell time as the signal for a satisfied search. The plan would be to re-use that metric if the source data points still hold.
Closing this out until newer Java comes to the analytics cluster.
Sun, Apr 7
With bufferCapacity at 1000000, I kicked it off again with the scholarly article entity graph files.
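For anyone following along, the knobs referenced in these runs are, as I understand it, standard Blazegraph properties plus the JVM heap. A minimal sketch, assuming a typical RWStore.properties location (exact file paths vary by setup):

```
# RWStore.properties (sketch; exact file location varies by setup)
com.bigdata.rdf.sail.bufferCapacity=1000000
com.bigdata.btree.writeRetentionQueue.capacity=4000
```

The heap size is set separately at launch, e.g., HEAP_SIZE=31g in the environment when starting Blazegraph (an assumption based on the runBlazegraph.sh convention).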
Update. On the gaming-class machine it took about 3.25 days to import the scholarly article entity graph, using a buffer capacity of 100000 (compare this with 5.875 days on wdqs1024). This resulted in 7_643_858_078 triples as expected. Next up will be with a buffer capacity of 1000000 to see if there is any obvious difference in import time.
Fri, Apr 5
Just updating on how far along this run is: file 550 of the scholarly article entity side of the graph is being processed (there are files 0 through 1023 for this side of the graph). Note that I remembered to tee the output this time around, so hopefully there's more info available for reviewing output, stack traces (although hopefully there are none), and so on, should it be needed.
Thu, Apr 4
Following roughly the procedure in P54284 to rename the Spark-produced graph files (updating loadData.sh with FORMAT=part-%05d-46f26ac6-0b21-4832-be79-d7c8709f33fb-c000.ttl.gz and keeping a date call after each curl in it), I kicked off an import of the scholarly article entity graph to see how it goes with a buffer capacity of 100000.
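A minimal sketch of the shape of that kickoff loop, where DATA_DIR, HOST, and NAMESPACE are placeholders rather than the real values:

```
#!/bin/bash
# Sketch of the loadData.sh-style loop; DATA_DIR, HOST, and NAMESPACE are placeholders.
FORMAT=part-%05d-46f26ac6-0b21-4832-be79-d7c8709f33fb-c000.ttl.gz
DATA_DIR=/srv/munged
HOST=localhost:9999
NAMESPACE=wdq

i=0
while [ $i -le 1023 ]; do       # files 0 through 1023 for this side of the graph
  printf -v f "$FORMAT" "$i"    # render the zero-padded file name
  curl -s -XPOST --data-binary update="LOAD <file://$DATA_DIR/$f>" \
    "http://$HOST/bigdata/namespace/$NAMESPACE/sparql"
  date                          # the timestamp after each curl mentioned above
  i=$((i+1))
done
```

(And per the Apr 5 note about tee'ing, the run's output was piped through something like 2>&1 | tee import.log so it sticks around for review.)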
Wed, Apr 3
This morning of April 3, around 6:25 AM, I SSH'd in to check progress, and it was working, but going slowly, similar to the day before. It was on a file number in the 1200s, but I didn't write down the number or copy terminal output; I do remember seeing it take around 796 seconds for one of the files at that time. Looking at the previous comment, you'll see those were going slowly too; not surprising, as we know imports of these munged files slow down as more data is imported.
Tue, Apr 2
Now this is interesting: we're past 4 days (about 4 days and 1 hour) into this run with the buffer capacity at 100000 instead of 1000000 (but this time without any gap between the batches of files), and there's still a good way to go yet.
Mon, Apr 1
The run with buffer at 1000000, heap size at 31g, and queue capacity at 4000 on the gaming-class desktop completed.
See attached. Here I was trying to click on the "terms of use" and "privacy policy" links, with no luck. Then you can see a click on the footer of the overlay working okay.
Makes sense to shelve for now @Marostegui.
Mar 22 2024
I'm interested as well, as I intend to look at some image dumping stuff, and the surrounding HTML will be important for understanding context.
Okay, if I understand correctly, then the idea would be to...
Mar 21 2024
By the way, I'm attempting a run for the first 1332 munged files (one shy of the 1333 where it terminated last time around) with buffer at 1000000, heap size at 31g, and queue capacity at 4000 on the gaming-class desktop, to see whether this imports smoothly and whether performance gains are noticeable.
Mar 20 2024
The run to check with heap size of 31g, queue capacity of 8000, and buffer at 1000000 stalled at file 107.
In a run with a queue capacity of 8000, a buffer of 1000000, and a heap size of 16g on the gaming-class desktop (to mimic the MacBook Pro), things were slower than with a queue capacity of 4000, a buffer of 1000000, and a heap size of 31g on the same machine.
Mar 19 2024
About Amazon Neptune
Going for much of the full import
More about bufferCapacity
More about NVMe versus SSD
AWS EC2 servers
Mar 8 2024
Thanks @bking! It looks like the NVMe in this one is not a higher-speed one for writes (based on the model reported by lsblk, I think it's a https://i.dell.com/sites/csdocuments/Shared-Content_data-Sheets_Documents/en/Dell-PowerEdge-Express-Flash-NVMe-Mixed-Use-PCIe-SSD.pdf ), and I'm also wondering if its write performance has degraded with age. I'll paste in the results here: ironically, this server was slower than the others, although that's not surprising given its slower NVMe and slightly slower processor. This slower write speed is atypical of the other NVMes I've encountered; I believe the newer model ones are rated for 6000 MB/s for writes. I'm going to ping on task to see if we can get a comparative read of disk throughput from one of the newer and faster cp#### NVMes.
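For the comparative write check, what I have in mind is a simple sequential write along these lines (a sketch; the target path is a placeholder, and oflag=direct bypasses the page cache so the number reflects the device rather than RAM):

```
# Sequential write throughput sketch; /srv/ddtest is a placeholder path.
dd if=/dev/zero of=/srv/ddtest bs=1M count=10240 oflag=direct status=progress   # 10 GiB write
rm /srv/ddtest
```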
Mar 7 2024
First, adding some commands that were used for Blazegraph imports on Ubuntu 22.04. I had originally tried a good number of EC2 instance types, and then went back to focus on just four of them with a sequence of repeatable commands (this wasn't scripted, as I didn't want to spend time automating, and I also wanted to be sure I got the systems' feedback along the way). I forgot to grab the RAM frequency as a routine step when running these commands (I recall checking it on maybe one server in the original checks, and I did look at my Alienware), but generally the servers are DDR4 unless the AWS documentation says DDR5 (my 2018 Alienware and 2019 MacBook Pro are DDR4, BTW).
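The repeatable per-host checks were along these lines (a sketch from memory, not the verbatim sequence; the dmidecode line is the RAM frequency step I kept forgetting):

```
lscpu                                        # CPU model, core count, clocks
free -h                                      # RAM total
lsblk -o NAME,MODEL,SIZE,ROTA                # disk models (NVMe vs SATA SSD)
sudo dmidecode -t memory | grep -i speed     # RAM frequency (DDR4 vs DDR5)
```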
Mar 5 2024
Originally, the thought was to simply count the relative volume of these types of inbound taps/clicks. Although we want fidelity on whether a link actually resolves to a page (and I know there are Phabricator comments about this here and elsewhere), often a simple count is sufficient to know if there's any traction whatsoever. I see that it's considered desirable to have a definite mapping of bona fide pageviews or previews (or other things of that nature) to these wprov values - makes sense.
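As a sketch of what the simple-count version could look like (assuming the wmf.webrequest Hive table and its uri_query column; the partition values are placeholders):

```
hive -e "
SELECT regexp_extract(uri_query, 'wprov=([^&]+)', 1) AS wprov,
       COUNT(*) AS hits
FROM wmf.webrequest
WHERE year = 2024 AND month = 3 AND day = 5   -- placeholder partition
  AND uri_query LIKE '%wprov=%'
GROUP BY regexp_extract(uri_query, 'wprov=([^&]+)', 1)
ORDER BY hits DESC
LIMIT 50;"
```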
@VRiley-WMF any pointers on how to reach this node via iDRAC / iLO and establish it with a hostname of wdqs1025.eqiad.wmnet? I'm wondering if maybe there's a direct IP or IPs, given that there don't seem to be DNS records for cp1086.eqiad.wmnet or cp1086.mgmt.eqiad.wmnet.
Mar 1 2024
Thanks @VRiley-WMF! @bking is up next for imaging, I think.
Feb 29 2024
Hi team - @lbowmaker asked if I could take a look at this and provide some context. I've been having a think on this, and I'd like up to a few more days to ponder before providing some thoughts.
Feb 27 2024
After setup, I would be interested in using it for 6 weeks (hopefully things would only take 4 weeks, but there's some PTO, and real-life stuff always comes up). Would that be okay?
Feb 5 2024
I summarized at https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Graph_split_IGUANA_performance . When we have a mailing list post during the next week or so, we'll want to move this to be a subpage of the target page of the post.