Logging Service Incident in Americas
Incident Report for Palo Alto Networks Cloud Services
Resolved
We have now resolved the log ingestion delay for Logging Service. A small number of customers are still recovering.
Posted Oct 26, 2018 - 21:56 UTC
Update
On October 12, 2018, Logging Service in the Americas began experiencing intermittent log ingestion outages for a small number of customers. During the outage windows, affected customers were able to access historical logs, and the service continued to collect and cache new information, but there was a delay in its availability.

Affected Services

Logging Service: Log ingestion disruption during outage window
Traps: Log viewing in Traps management service
Panorama: Log viewing for GlobalProtect cloud service and next-generation firewalls

Mitigation Efforts

To speed resolution, we paused indexing of new logs into the Logging Service from 11:20am to 10:10pm PDT on October 18, 2018, followed by a controlled resumption of log indexing. During the downtime, Logging Service continued to collect logs, but their processing was delayed to ensure stability of the service. After the service returned to normal, a small number of customers may have experienced a delay in viewing their latest logs as the system caught up with all cached logs. The duration of the delay varies by customer, depending on data volume and ingestion rate.

Root Cause

We discovered an issue with certain log formats that led to an unexpected load on log indexing clusters. We have implemented effective short-term measures to handle these log formats and continue to work on permanent fixes. We expect that ingestion and indexing will continue without issue. We continue to monitor the log viewing gap and catch-up rate to ensure all customers can access their data in a timely manner.
Posted Oct 19, 2018 - 23:01 UTC
Update
Since October 12, 2018, Logging Service in the Americas region has been experiencing intermittent log ingestion outages for a small number of customers. During the outage windows, affected customers can access historical logs, and the service continues to collect and cache new information, but there will be a delay in its availability.

Affected Services

Logging Service: Log Ingestion
Traps: Log viewing in Traps management service
Panorama: Log viewing for GlobalProtect cloud service and next-generation firewalls

Mitigation Efforts

Restoring all services to 100% is our top priority. To speed resolution, we have paused ingestion of new logs into the Logging Service for approximately six hours. During this time, customers will not experience data loss, as data continues to be collected and cached. After the service resumes, customers may see a delay in viewing their latest data as the system returns to normal operations.

Root Cause

We are actively investigating the root cause and will provide a full update when more details are available.
Posted Oct 18, 2018 - 22:18 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Oct 17, 2018 - 04:39 UTC
Identified
We are currently experiencing an ingestion outage with Logging Service in the Americas. We are actively working on this and will post updates on the status.
Posted Oct 16, 2018 - 22:28 UTC
This incident affected: Cortex Data Lake (United States - Americas).