FNAL - LQCD Documentation

Fermilab LQCD Support - Service Desk

The LQCD-Admin staff use Fermilab's ServiceNow (a.k.a. SNOW) ticketing system for tracking incidents and requests. Email sent to lqcd-admin@fnal.gov automatically generates an Incident ticket in Fermilab's Service Desk system. This is by design, so that these emails are tracked and each user's report or request is seen and responded to. Timers and automatic escalation procedures notify management if a response takes longer than the agreed-upon response time laid out in a Service Level Agreement (SLA).

It can be confusing when the email you sent creates an Incident and the first thing LQCD-Admin staff do is convert it into a Request ticket. We explain the workflows and ticket lifetimes below to help our users understand how this system works. The whole Service Desk system is designed to get your issue or question to the proper team member(s), to keep a work log, and to help us communicate better with our users.

Incidents vs Requests

There are two types of tickets, each with a series of states (e.g., assigned, work-in-progress, pending) that it moves through during its lifetime.

Incidents - Something broken

Incident tickets are for when something is broken and needs a timely response. Our current SLA requires that we acknowledge a new Incident within four business hours. Every Incident ticket has a unique INC number. When HPC-Admin begins work on the ticket, it is moved to a Work-in-Progress status. If the ticket is waiting for a response from the user or from a vendor (e.g., hardware repair), it is put in a Pending status. When the task is completed, the ticket is set to a Resolved status.

Examples of the types of issues that are managed as Incidents:

  • Reporting a node or host being down or unreachable
  • Errors while logging in or submitting jobs
  • Problems accessing disk or tape storage areas

Requests - Something asked for

Request tickets are for when a user asks for something new, asks for something to be changed, or simply asks a question. Response times for Request tickets are still tracked, but they do not have a strict escalation process like Incidents. Request tickets go through a status sequence of Accepted (awaiting approvals, etc.), Work-in-Progress, Pending and Closed.

Each Request has an REQ number assigned at the top level. Our system also creates one or more RITM elements (a.k.a. Requested ITeM) for each request, normally just one RITM. All work and correspondence are tracked using the RITM number. Once all RITM components are completed, the top-level REQ is closed.

Examples of the types of issues that are managed as Requests:
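The relationship between a top-level REQ and its RITM components can be sketched in a few lines of Python. This is an illustration of the closing rule described above, not the Service Desk's actual data model: the class names `Request` and `RequestedItem`, the `is_closed` method, and the ticket numbers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RequestedItem:
    number: str          # e.g. "RITM0000001" (made-up number for illustration)
    closed: bool = False


@dataclass
class Request:
    number: str          # e.g. "REQ0000001" (made-up number for illustration)
    items: list = field(default_factory=list)

    def is_closed(self):
        # The top-level REQ counts as closed only once every RITM under it
        # is closed. A request with no RITMs is treated as open (assumption).
        return bool(self.items) and all(item.closed for item in self.items)


req = Request("REQ0000001", [RequestedItem("RITM0000001")])
print(req.is_closed())   # False: the single RITM is still open
req.items[0].closed = True
print(req.is_closed())   # True: every RITM is closed
```

In the common case of a single RITM per request, closing that one RITM is what closes the REQ.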

  • Requests to add a new user account to an existing project
  • Questions about allocations, storage quotas or resources available to a particular account
  • Questions about submitting jobs or understanding jobs that are pending or held

Interacting with Service Desk tickets

Email to lqcd-admin@fnal.gov starts the process

When you need to ask for something or report a problem, we ask that you send email to lqcd-admin@fnal.gov. Please do not send such emails to individual members of our team or to the hpc-admin@fnal.gov address. Your email will generate an Incident ticket. Where appropriate, our team will convert the INC to an REQ / RITM. All correspondence related to the ticket should then be sent as replies to INC or RITM emails from the Service Desk system. Because the ticket number appears in the message subject, all such correspondence is recorded in the proper ticket's worklog.
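To illustrate why replying to the Service Desk emails matters, here is a minimal Python sketch of how a ticket number might be picked out of a message subject. It is not the Service Desk's actual matching logic: the pattern (a prefix of INC, RITM or REQ followed by digits), the helper name `find_ticket`, and the sample subjects are assumptions made for the example.

```python
import re

# Assumed ticket-number shape: a known prefix followed by one or more digits.
TICKET_RE = re.compile(r"\b(INC|RITM|REQ)(\d+)\b")


def find_ticket(subject):
    """Return the first ticket number found in an email subject, or None."""
    match = TICKET_RE.search(subject)
    return match.group(0) if match else None


# Sample subjects with made-up ticket numbers.
print(find_ticket("Re: RITM0000001 - add user to project"))  # RITM0000001
print(find_ticket("Question about my quota"))                # None
```

A subject that keeps the ticket number intact can be matched to an existing ticket; a fresh subject without one cannot.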

Additional notes regarding ticket lifetimes

Once an INC has been migrated to an REQ / RITM, please be sure to send your responses to the RITM emails. Any further responses sent against the INC number will still be tracked in the INC worklog, but HPC-Admin staff will be working from the worklog and correspondence tied to the RITM.

Once an INC is resolved or each RITM is closed, you can still send additional correspondence as a reply to that ticket number if you feel your problem or request has not been resolved or completed. However, it is often more appropriate to open a new Incident or Request that better describes what you are reporting or asking for. You can reference the original INC or RITM number when you create your new ticket so that our staff are made aware of a previous related ticket. Referencing the previous worklog may help us respond better or help us see a larger related issue that needs to be addressed.

This page last updated 16 April 2019. If you have questions or feedback regarding this policy, please send email to hpc-admin@fnal.gov.

usqcd-webmaster@usqcd.org