Page MenuHomePhabricator

create script to dump RT tickets and relevant data to static and structured format
Closed, ResolvedPublic

Description

Details

Reference
fl408
TitleReferenceAuthorSource BranchDest Branch
Allow configuration of AddressFamily used for DNS validationrepos/sre/acme-chief!1brettchange-574221-allow-config-of-addressfamily-dns-validationmain
Customize query in GitLab

Event Timeline

flimport raised the priority of this task from to High.Sep 12 2014, 1:39 AM
flimport set Reference to fl408.

dzahn wrote on 2014-08-28 01:25:32 (UTC)

so we have a bunch of attachments on RT, the problem is that some should be kept, for example.. NDA's from volunteers

but others we don't care about at all, like when external users attach a bunch of clutter to their HTML mail, like, say a Dell logo..

so..i looked at the RT database schema a bit.

in the "Tickets" table we have a "type" property. I checked which types there are and we have only 2 of them, actual "ticket"s and reminders. I would say the reminders can be completely ignored, so sorting these out.

next there is "id" and "effective id" , and i'm not sure how exactly they work yet. but we have:

7099 where type is Ticket and id=EffectiveId but also

632 where type is Ticket and id != EffectiveId

i expect you would like one list of tickets "per queue" to import them into one project per queue, right

should we first just create the needed projects already?

Tickets:

+-----------------+--------------+------+-----+--------------+----------------+
| Field           | Type         | Null | Key | Default      | Extra          |
+-----------------+--------------+------+-----+--------------+----------------+
| id              | int(11)      | NO   | PRI | NULL         | auto_increment |
| EffectiveId     | int(11)      | NO   | MUL | 0            |                |
| Queue           | int(11)      | NO   | MUL | 0            |                |
| Type            | varchar(16)  | YES  |     | NULL         |                |
| IssueStatement  | int(11)      | NO   |     | 0            |                |
| Resolution      | int(11)      | NO   |     | 0            |                |
| Owner           | int(11)      | NO   | MUL | 0            |                |
| Subject         | varchar(200) | YES  |     | [no subject] |                |
| InitialPriority | int(11)      | NO   |     | 0            |                |
| FinalPriority   | int(11)      | NO   |     | 0            |                |
| Priority        | int(11)      | NO   |     | 0            |                |
| TimeEstimated   | int(11)      | NO   |     | 0            |                |
| TimeWorked      | int(11)      | NO   |     | 0            |                |
| Status          | varchar(64)  | YES  |     | NULL         |                |
| TimeLeft        | int(11)      | NO   |     | 0            |                |
| Told            | datetime     | YES  |     | NULL         |                |
| Starts          | datetime     | YES  |     | NULL         |                |
| Started         | datetime     | YES  |     | NULL         |                |
| Due             | datetime     | YES  |     | NULL         |                |
| Resolved        | datetime     | YES  |     | NULL         |                |
| LastUpdatedBy   | int(11)      | NO   |     | 0            |                |
| LastUpdated     | datetime     | YES  |     | NULL         |                |
| Creator         | int(11)      | NO   |     | 0            |                |
| Created         | datetime     | YES  |     | NULL         |                |
| Disabled        | smallint(6)  | NO   |     | 0            |                |
+-----------------+--------------+------+-----+--------------+----------------+

Attachments:

+-----------------+--------------+------+-----+---------+----------------+
| Field           | Type         | Null | Key | Default | Extra          |
+-----------------+--------------+------+-----+---------+----------------+
| id              | int(11)      | NO   | PRI | NULL    | auto_increment |
| TransactionId   | int(11)      | NO   | MUL | NULL    |                |
| Parent          | int(11)      | NO   | MUL | 0       |                |
| MessageId       | varchar(160) | YES  |     | NULL    |                |
| Subject         | varchar(255) | YES  |     | NULL    |                |
| Filename        | varchar(255) | YES  |     | NULL    |                |
| ContentType     | varchar(80)  | YES  |     | NULL    |                |
| ContentEncoding | varchar(80)  | YES  |     | NULL    |                |
| Content         | longblob     | YES  |     | NULL    |                |
| Headers         | longtext     | YES  |     | NULL    |                |
| Creator         | int(11)      | NO   |     | 0       |                |
| Created         | datetime     | YES  |     | NULL    |                |
+-----------------+--------------+------+-----+---------+----------------+

Users:

+-----------------------+--------------+------+-----+---------+----------------+
| Field                 | Type         | Null | Key | Default | Extra          |
+-----------------------+--------------+------+-----+---------+----------------+
| id                    | int(11)      | NO   | PRI | NULL    | auto_increment |
| Name                  | varchar(200) | NO   | UNI | NULL    |                |
| Password              | varchar(256) | YES  |     | NULL    |                |
| Comments              | text         | YES  |     | NULL    |                |
| Signature             | text         | YES  |     | NULL    |                |
| EmailAddress          | varchar(120) | YES  | MUL | NULL    |                |
| FreeformContactInfo   | text         | YES  |     | NULL    |                |
| Organization          | varchar(200) | YES  |     | NULL    |                |
| RealName              | varchar(120) | YES  |     | NULL    |                |
| NickName              | varchar(16)  | YES  |     | NULL    |                |
| Lang                  | varchar(16)  | YES  |     | NULL    |                |
| EmailEncoding         | varchar(16)  | YES  |     | NULL    |                |
| WebEncoding           | varchar(16)  | YES  |     | NULL    |                |
| ExternalContactInfoId | varchar(100) | YES  |     | NULL    |                |
| ContactInfoSystem     | varchar(30)  | YES  |     | NULL    |                |
| ExternalAuthId        | varchar(100) | YES  |     | NULL    |                |
| AuthSystem            | varchar(30)  | YES  |     | NULL    |                |
| Gecos                 | varchar(16)  | YES  |     | NULL    |                |
| HomePhone             | varchar(30)  | YES  |     | NULL    |                |
| WorkPhone             | varchar(30)  | YES  |     | NULL    |                |
| MobilePhone           | varchar(30)  | YES  |     | NULL    |                |
| PagerPhone            | varchar(30)  | YES  |     | NULL    |                |
| Address1              | varchar(200) | YES  |     | NULL    |                |
| Address2              | varchar(200) | YES  |     | NULL    |                |
| City                  | varchar(100) | YES  |     | NULL    |                |
| State                 | varchar(100) | YES  |     | NULL    |                |
| Zip                   | varchar(16)  | YES  |     | NULL    |                |
| Country               | varchar(50)  | YES  |     | NULL    |                |
| Timezone              | varchar(50)  | YES  |     | NULL    |                |
| PGPKey                | text         | YES  |     | NULL    |                |
| Creator               | int(11)      | NO   |     | 0       |                |
| Created               | datetime     | YES  |     | NULL    |                |
| LastUpdatedBy         | int(11)      | NO   |     | 0       |                |
| LastUpdated           | datetime     | YES  |     | NULL    |                |
| AuthToken             | varchar(16)  | YES  |     | NULL    |                |
+-----------------------+--------------+------+-----+---------+----------------+

out of the other existing tables:

"Articles" - this table is empty
"Topics" - this table is empty
"Templates" - those are mail templates not content, can ignore

"Links" - these are the links from one ticket to another - that does contain valuable data somehow .. uhm.. would be nice to have this info and also link them here

"Queues" - well yeah, the queues, we want to create one project per queue, right?

"ACL" - this would be tricky to map, though documented on wikitech/office wiki how to map them all to rights per queue per user group. we are probably going to copy the permissions manually to a new standard, described in T39

"Attributes" - these are settings like dashboards, saved searches .. i think we can ignore entirely

"CachedGroupMembers" ??

"Classes" - this is empty

"CustomFieldValues" - these were custom tags , we had 2 of them "hackathon" and "hardware failure" - i don't think we'll care about them anymore

"CustomFields" - these are links from RT to related Bugzilla tickets, also URLs that link from RT to relevant wikitech pages, and a "third party ticket" field.. i don't know .. the RT->BZ links would be nice to have

"GroupMembers"/"Groups" - group membership of users, can be used to determine if people are in ops, engineering, volunteers , legal or external user like vendors and random requestors from outside the team

"ObjectClasses" - empty

"ObjectCustomFieldValues" - like tags for tickets that give it a certain topic.. i think we ignore

"ObjectTopics" - empty

"Principals" - mapping of user/groups to object ids ?

"ScripActions"/"ScripConditions"/"Scrips" - those were scrips (yea, not scripts) to do things automatically,like sending out mail template on certain actions - doesn't matter, will have to be replaced by herald or something

"Users" - well, we will want those, already dumped a list of users in another ticket though

"sessions" - obviously not needed

... and this is all tables there are..

SELECT Tickets.id,EffectiveId,Queues.name as Queue,Subject,Status,Users.Name as TicketOwner,(select Users.Name from Users where Tickets.Creator=Users.id) as TicketCreator,Tickets.Created,Tickets.LastUpdated,(select Users.Name from Users where Tickets.LastUpdatedBy=Users.id) as LastUpdatedBy FROM Tickets JOIN Users on Tickets.Owner=Users.id JOIN Queues on Tickets.Queue=Queues.id WHERE Queues.name="core-ops" ORDER BY id LIMIT 10;

I have the beginnings of a script to grab data from the api, when diffusion is opened here I'll link to it. phabricator/tools

exportRT.sh

#!/bin/bash
# export tickets from all RT queues as XML files
# dzahn for "T245"

declare -a QUEUES=( access-requests codfw core-ops domains eqiad esams legal maint-announce network ops-requests pmtpa procurement ulsfo )

MYSQL=$(which mysql)
OUTDIR="/tmp/RTexport"

if [ ! -d "$OUTDIR" ]; then
        mkdir $OUTDIR
        chown mysql $OUTDIR
fi

echo -e "\nstarting RT export - using ${OUTDIR}\n"
for queue in "${QUEUES[@]}"
do
        $MYSQL -u root rt -X -e "SELECT Tickets.id,EffectiveId,Queues.name as Queue,Subject,Status,Users.Name as TicketOwner,(select Users.Name from Users where Tickets.Creator=Users.id) as TicketCreator,Tickets.Created,Tickets.LastUpdated,(select Users.Name from Users where Tickets.LastUpdatedBy=Users.id) as LastUpdatedBy FROM Tickets JOIN Users on Tickets.Owner=Users.id JOIN Queues on Tickets.Queue=Queues.id WHERE Queues.name='${queue}' ORDER BY id" > ${OUTDIR}/${queue}.xml

        echo -e "exporting queue '${queue}' - wrote ${OUTDIR}/${queue}.xml\n"
done

cd $OUTDIR
pwd
ls
echo -e "\ndone\n"

this gives us:

access-requests.xml core-ops.xml eqiad.xml legal.xml network.xml pmtpa.xml ulsfo.xml
codfw.xml domains.xml esams.xml maint-announce.xml ops-requests.xml procurement.xml

copied the resulting XML files on iron. this is obviously still missing the content of comments and the attachments though.. i figure we can import in more than 1 step, make the queues, import the tickets with their titles and owners, add the comments/attachments

also see scripts in the phabricator/tools repo called "rt_create.py" and "rt_fetch.py"

rt_fetch.py mimicking similar behavior to bugzilla_fetch.py seems to work reliably.

Out of the queues on the block for migration we get 5927 tasks, all of which seem to convert.