Cyber-physical system resilience challenges
Our society’s drive towards sustainability puts the electricity system in the spotlight. Electrification of transport, heating, and industry makes the last remaining sectors dependent on a reliable power system as well. At the same time, and accelerated by the COVID-19 pandemic, society’s dependency on digital services is increasing tremendously. As a consequence, a disruption of the power supply will have catastrophic and unprecedented consequences for society, especially if it exceeds a critical time span. Resilience of the electrical system is therefore becoming even more important than it already was.
By Prof. Dr. Peter Palensky & Siem Bruijns
At the same time, however, the resilience of the electrical system is gaining a new dimension through the fast digitization of the electrical system itself, which is ultimately turning it into a cyber-physical system. This digitization has two parts: the digitization of the transmission and distribution networks themselves, and the digitization of all systems connected to the grid, such as PV systems, EV charging, flexible assets and loads, and wholesale markets. An intended or unintended failure of these complex cyber systems confronts grid operators with new challenges.
New resilience approaches are needed that take cyber-physical mechanisms, people in the loop, and the entire complexity into account. This article describes these challenges from the perspective of a transmission system operator (TSO), including the role universities can play in preparing for them.
As said, digitization of the electrical system takes place on two levels. The first level is the control of the network itself. Here, too, sustainability is an important driver: the replacement of conventional power plants requires new grid devices, including advanced control systems, to maintain the stability of the grid; DC connections with their power electronics are integrated into the system; and smart systems are added to make it possible to operate the grid close to its physical limits.
The second level consists of all the systems that are connected to the grid and influence its balance and thereby its frequency. These range from pure consumers to wind and solar producers and EV charging, including the connected flexibility and wholesale markets. Most of them are steered by highly sophisticated systems, especially those that aggregate (potentially) large amounts of steerable generation or demand. An example is EV charging, where organizations are growing very fast and consolidating smaller initiatives into systems with the potential to control large amounts of electricity. The number of these connected systems will increase further as a consequence of future sector coupling.
All these cyber systems are without doubt beneficial for the functioning of the electrical system and thereby strongly support the energy transition. However, these developments also bring new risks. The electrical system, both the network itself and the connected systems, depends so heavily on cyber systems that a failing or corrupted cyber system could have serious consequences, especially if such a cyber incident leads to a serious power outage or even a black-out. Unfortunately, the likelihood of such an incident is increasing. On the one hand, cyber-attacks are getting “popular” and benefit from a growing attack surface. On the other hand, the rapidly increasing number of cyber systems can also lead to unintended hardware or software failures such as race conditions or interoperability issues. On top of that, the complexity of the behavior of all these interacting systems, especially in abnormal situations such as restoration, adds further challenges.
TSOs are well prepared to solve problems in the physical system with their operators and engineers. Cyber experts are equally well prepared to solve IT/OT issues, and the level of cyber security is in general adequate. However, the question is whether we are sufficiently prepared for a cascading black-out caused by a cyber incident (attack or unintended failure).
The main questions to be answered are:
- Are TSOs, together with DSOs and other relevant parties, prepared to adequately recover from a black-out in a situation with missing or corrupted control systems like SCADA/EMS, data communication, substation automation etc.?
- Do we know how connected systems, especially those with a substantial impact on the system balance, behave during restoration?
- Have we developed, or are we developing, new systems with a huge potential impact on the system balance, such as PV and EV charging systems, in such a way that failing or corrupted cyber systems cannot cause critical deviations in system balance/frequency?
Unfortunately, we cannot answer these questions with a clear "yes". Although TSOs and DSOs are very well prepared to deal with power outages, this is not sufficiently the case for situations caused by a cyber incident.
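To get a feeling for why large, centrally steered amounts of demand matter for the system balance, the following rough sketch uses the textbook aggregated swing equation to estimate the initial rate of change of frequency (RoCoF) after a sudden power imbalance. All numbers (system size, inertia constant, amount of EV charging switched off) are illustrative assumptions and do not describe any real system; the function names are hypothetical.

```python
def initial_rocof(delta_p_mw, system_mva, inertia_h_s, f0_hz=50.0):
    """Initial rate of change of frequency in Hz/s (aggregated swing equation).

    delta_p_mw > 0 means a generation surplus, e.g. load that is suddenly
    switched off. Control reserves and load damping are ignored.
    """
    if system_mva <= 0 or inertia_h_s <= 0:
        raise ValueError("system_mva and inertia_h_s must be positive")
    return f0_hz * delta_p_mw / (2.0 * inertia_h_s * system_mva)


def time_to_deviation(rocof_hz_per_s, deviation_hz):
    """Seconds until a given frequency deviation is reached at constant RoCoF."""
    if rocof_hz_per_s == 0:
        return float("inf")
    return abs(deviation_hz / rocof_hz_per_s)


# Assumed scenario: a corrupted aggregator switches off 3000 MW of EV charging
# in a system with 300 GVA of rotating capacity and an inertia constant of 5 s.
rocof = initial_rocof(delta_p_mw=3000.0, system_mva=300_000.0, inertia_h_s=5.0)
seconds_to_200_mhz = time_to_deviation(rocof, deviation_hz=0.2)
```

Under these assumptions the frequency would initially rise by about 0.05 Hz per second, so a deviation of 200 mHz would be reached within a few seconds if nothing counteracted it. The sketch only covers the first instants after the event; the actual frequency trajectory depends on reserves, load behavior and protection settings.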
Fields of further work
Much further work is needed to increase the level of resilience against power outages caused by a cyber incident. This work touches a wide range of aspects:
- More specific and advanced cyber security protection and detection techniques and tools for transmission and distribution control systems.
- Development of cyber-resilient fallback systems for the case that crucial primary systems such as SCADA/EMS fail or are corrupted.
- Develop smart solutions for containment and restoration that are reliable even in situations where parts of the cyber system are corrupted or down.
- Develop strategies, procedures and training for restoration after a power outage caused by a cyber incident with involvement of all relevant stakeholders.
- Prepare and train operators and IT/OT specialists jointly, because the complex cyber-physical system requires them to analyze, contain and restore the system together after a power outage.
- Design connected systems such as EV charging, but also future sector coupling, in such a way that they cannot jeopardize the system balance. Avoid future retrofitting like that needed for PV systems in Europe after the solar eclipse of 2015.
- Keep track of the behavior of, and the interaction between, the increasing number of digital systems in the grid itself, especially under abnormal circumstances such as a power outage.
- The control over the behavior of connected systems during restoration needs to be better organized to avoid counter-acting.
- Involve rule makers and regulators, especially regarding the necessary regulation for connected systems in order to avoid a collapse of the electrical system.
This broad range of developments is crucial to keep the level of resilience in step with the fast digitization and increasing complexity of the system. TSOs and DSOs have an important role here, but they also need strong support from, and cooperation with, universities and research institutes: to develop new techniques and tools, and to model and simulate the complexity of the electrical system, not only to understand its behavior but also to train operators, IT/OT specialists and other engineers.
Grid operators need new tools and methods in order to master the daily routine. It is the complexity and diversity of the cyber-physical system that needs to be mastered, so that planning and operations can be done reliably. Virtually all methods and processes might require an “upgrade”: N-1 calculations, dynamic security assessment, and the determination of adequacy or stability limits of cyber-physical power systems cannot be done with static formulas or eigenvalues alone. The foundations and models of our methods have to catch up.
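A minimal sketch can illustrate why a classical N-1 view falls short once cyber elements are taken into account. The example below screens a hypothetical five-line network twice: once for the outage of each single line, and once for the failure of a single cyber element that is assumed to trip several lines at the same time. The network, the bus and line names, and the mapping from cyber elements to lines are all invented for illustration. The check is purely topological (is every load bus still connected to a generator bus?) and ignores power flows, thermal limits and dynamics, so it is far simpler than a real N-1 calculation.

```python
from collections import defaultdict, deque

# Hypothetical network: (line name, bus, bus)
LINES = [
    ("L1", "A", "B"),
    ("L2", "B", "C"),
    ("L3", "A", "C"),
    ("L4", "C", "D"),
    ("L5", "B", "D"),
]
GENERATOR_BUSES = {"A"}
LOAD_BUSES = {"C", "D"}

# Assumption: a failed or corrupted cyber element trips all lines it controls.
CYBER_ELEMENTS = {
    "substation_automation_B": {"L1", "L2", "L5"},
    "substation_automation_C": {"L2", "L3", "L4"},
}


def energized_buses(lines, generator_buses):
    """Breadth-first search from the generator buses over the given lines."""
    adjacency = defaultdict(set)
    for _, bus_a, bus_b in lines:
        adjacency[bus_a].add(bus_b)
        adjacency[bus_b].add(bus_a)
    seen = set(generator_buses)
    queue = deque(generator_buses)
    while queue:
        bus = queue.popleft()
        for neighbour in adjacency[bus]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def unserved_loads(outaged_lines):
    """Load buses that lose their connection to every generator bus."""
    remaining = [line for line in LINES if line[0] not in outaged_lines]
    return LOAD_BUSES - energized_buses(remaining, GENERATOR_BUSES)


def screen_contingencies():
    """Map each single contingency (line or cyber element) to its unserved loads."""
    results = {name: unserved_loads({name}) for name, _, _ in LINES}
    for name, tripped_lines in CYBER_ELEMENTS.items():
        results[name] = unserved_loads(tripped_lines)
    return results


results = screen_contingencies()
```

In this invented example every single line outage leaves all loads connected, while the failure of `substation_automation_C` isolates bus C. A network that looks secure in a purely physical N-1 screening can therefore still be vulnerable to a single cyber failure.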
Many of the challenges can be addressed with numerical models. They can serve as an analytical tool, or as an experimental or on-line replica that supports decision making in real-time. Once such a cyber-physical numerical model is embedded in a useful workflow, we call it a “digital twin”. Digital twins are only as good as their models and the applied workflow: if the model is inaccurate, or if the workflow is flawed, they do not help much. Besides these obvious requirements, there are some more fundamental challenges that research is currently working on in order to make these twins a useful tool:
- Uncertainty propagation: how do uncertain and incomplete input data and model parameters affect the quality of the output data?
- Data-driven models: how can we generate model topologies and parameters via measurements and machine learning?
- Model synchronization: how can we keep a numerical model in step with the real system?
- Humans-in-the-loop: how can these models interact with experts and operators? How can they learn from each other?
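The first of these questions, uncertainty propagation, can be illustrated with a small Monte Carlo experiment: uncertain inputs are sampled many times, pushed through the model, and the spread of the outputs is examined. The sketch below is deliberately simplistic. The "model" is a toy net-flow calculation for an assumed radial feeder, and the means, standard deviations and the line limit are invented numbers, not measured data.

```python
import random
import statistics


def line_flow_mw(load_mw, pv_mw):
    """Toy model: net import over a feeder line (assumed radial feeder)."""
    return load_mw - pv_mw


def propagate(n_samples=10_000, seed=1):
    """Sample uncertain inputs and return the resulting line flows in MW."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    flows = []
    for _ in range(n_samples):
        load = rng.gauss(80.0, 8.0)            # assumed load forecast: mean, std
        pv = max(0.0, rng.gauss(30.0, 12.0))   # assumed PV forecast, never negative
        flows.append(line_flow_mw(load, pv))
    return flows


def summarize(flows, limit_mw):
    """Mean, spread, and estimated probability of exceeding the line limit."""
    return {
        "mean_mw": statistics.fmean(flows),
        "std_mw": statistics.stdev(flows),
        "p_overload": sum(flow > limit_mw for flow in flows) / len(flows),
    }


summary = summarize(propagate(), limit_mw=75.0)
```

The point of the exercise is that the output is no longer a single number but a distribution: with these assumed inputs the expected flow stays well below the limit, yet a small fraction of the samples exceeds it. For realistic grid models the same idea applies, but the number of uncertain parameters and the cost of each model run make the problem considerably harder.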
Digital twins are an established method in other sectors such as automotive, construction or defense. The power sector is now adopting this concept, which is especially interesting since it is both a result of digitalization and a driver.
Improving the resilience of cyber-physical power systems can be supported with cyber-physical twins that represent IT, OT, the physical power grid, and connected/interacting systems. Some twins will be used in real-time for anomaly detection, others off-line for planning purposes, and others ad hoc in case of an incident or for training purposes. It is important to keep an eye on the interoperability of the models and data used in order to avoid inconsistencies.
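One simple way a twin could support real-time anomaly detection is by comparing what the model predicts with what is actually measured, and flagging samples where the two disagree by more than the expected measurement noise. The following sketch shows this residual check under strong assumptions: the twin's predictions are taken as given, the noise level is known, and the threshold factor `k` is chosen arbitrarily. The function name and all values are hypothetical.

```python
def detect_anomalies(measured, predicted, noise_std, k=4.0):
    """Return the indices where |measured - predicted| exceeds k * noise_std."""
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    if noise_std <= 0:
        raise ValueError("noise_std must be positive")
    threshold = k * noise_std
    return [
        index
        for index, (m, p) in enumerate(zip(measured, predicted))
        if abs(m - p) > threshold
    ]


# Assumed example: line flow in MW as predicted by the twin vs. telemetry.
predicted_mw = [100.0, 101.0, 102.0, 103.0, 104.0]
measured_mw = [100.2, 100.9, 102.1, 110.0, 104.1]

suspicious = detect_anomalies(measured_mw, predicted_mw, noise_std=0.5)
```

Here only the fourth sample (index 3) deviates by more than the threshold of 2 MW. Such a check cannot tell whether the cause is a physical event, a faulty sensor, manipulated data or an outdated model, which is why the synchronization and interoperability of the models matter so much.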