Răsfoiți Sursa

Fix graphite connection write contention

If polling for sensors + writing to graphite takes longer than the
configured interval, then it is possible for two threads to be writing
data to the same graphite connection. This will cause corrupted data to
be sent to graphite.

The fix is the limit the number of threads that have access to the
connection to 1. Other threads that would have caused contention are
forced to wait outside. To ensure there is not an unbounded number of
threads waiting to write, if they can't acquire a lock in a second then
they jettison the write attempt.
Nick Babcock 4 ani în urmă
părinte
comite
7a83fffba0
1 a modificat fișierele cu 26 adăugiri și 1 ștergeri
  1. 26 1
      OhmGraphite/GraphiteWriter.cs

+ 26 - 1
OhmGraphite/GraphiteWriter.cs

@@ -3,9 +3,9 @@ using System.Collections.Generic;
 using System.IO;
 using System.Net.Sockets;
 using System.Text;
+using System.Threading;
 using System.Threading.Tasks;
 using NLog;
-using LibreHardwareMonitor.Hardware;
 using static System.FormattableString;
 
 namespace OhmGraphite
@@ -13,6 +13,7 @@ namespace OhmGraphite
     public class GraphiteWriter : IWriteMetrics
     {
         private static readonly Logger Logger = LogManager.GetCurrentClassLogger();
+        private readonly SemaphoreSlim _semaphore = new SemaphoreSlim(1, 1);
 
         private readonly string _localHost;
         private readonly string _remoteHost;
@@ -31,6 +32,30 @@ namespace OhmGraphite
         }
 
         public async Task ReportMetrics(DateTime reportTime, IEnumerable<ReportedValue> sensors)
+        {
+            // Since the graphite writer keeps the same connection open across
+            // writes, we need to ensure that only one thread has access to
+            // the connection at a time. Multiple threads can be in this
+            // method when the time it takes to poll and write the data is
+            // longer than the interval time. However we don't want an
+            // unbounded number of threads stuck waiting to write, so
+            // jettison any attempt after waiting for more than a second.
+            if (!await _semaphore.WaitAsync(TimeSpan.FromSeconds(1)))
+            {
+                throw new ApplicationException("unable to acquire lock on graphite connection");
+            }
+
+            try
+            {
+                await SendGraphite(reportTime, sensors);
+            }
+            finally
+            {
+                _semaphore.Release();
+            }
+        }
+
+        private async Task SendGraphite(DateTime reportTime, IEnumerable<ReportedValue> sensors)
         {
             try
             {