AI applications must be built on solid foundations to be secure. In addition to understanding the use context and building robust models with proper guardrails, they must be deployed and operated in a secure environment with appropriate controls, monitoring, and incident response. In this post, we take a quick look at the security of AI platforms.
In the previous posts, we started with AI USE CONTEXT and moved to the AI MODEL to finally arrive at the AI PLATFORM layer. Building and operating AI applications requires participants in multiple roles, significant resources, and complex infrastructures throughout their lifecycles. The underlying infrastructure may seem familiar territory, with well-established cybersecurity standards and practices that apply to many other modern applications. However, there are new challenges related to unique technical dependencies and trust relationships, or to the speed of moving prototypes to production needed to gain an advantage in the AI race. These challenges can lead to tension in prioritizing security efforts and to difficult decisions, especially when required security controls are missing or are perceived as reducing the value of new scenarios. It should be clear that AI applications must be secure to be successful, and to be secure, they must be built on solid technical foundations. The goals of AI Security cannot be met if there are failures in the AI PLATFORM, whether due to exposed infrastructure, missing data protection, or gaps in access control, monitoring, and response.
PLATFORM as foundation of AI Security
AI applications require the most computing resources during model training or tuning, but even when models are used (the inference stage), the costs can be high compared to traditional applications. AI applications can be deployed in the cloud, on-prem, or on edge devices distributed in the physical world, with core cybersecurity requirements fully applicable in all these cases. Compared to the AI MODEL and USE CONTEXT layers, the threats against the AI PLATFORM are much better understood, as they are mostly shared with other online applications. That, however, means that this layer is also the best known to potential adversaries (now often using AI on their side), who will most likely start at this level, looking for the weakest link in the system. The AI PLATFORM must be protected in its own right, but also with the specific requirements of the AI application taken into consideration: its uniqueness, the sensitivity of data in use, and the possible impact of results, with particular attention paid to high-risk domains and mission-critical scenarios. We need to be sure that all requirements for the PLATFORM are identified and addressed in the scope of Infrastructure, Applications, Data protections, and Users.
INFRASTRUCTURE for AI applications includes all required computing resources, storage, and network devices, but also sensors and actuators for interacting with environments.
The infrastructure can have different configurations; the most common are cloud, on-prem, and hybrid hosting, with the choice often driven by the risk level of the application. AI applications tend to be more embedded in environments and complex decision-making workflows compared to traditional software. That could mean different trust relationships or the need for hardened and isolated environments for critical tasks.
The platforms require effective access control and secure communication between different resources (underscoring the importance of a defense-in-depth or zero-trust approach). That can be challenged by the requirements of edge devices, e.g., the Internet of Medical Things. Additional changes may be introduced by Machine Learning Ops (derived from DevOps), especially around continuous integration, delivery, and training.
AI applications are new, complex, and in high demand. As a result, they are very often created in cooperation with external partners acting as providers of data, services, or results from their models. There are different integration patterns, and some bring additional security and infrastructure requirements (e.g., hosting 3rd party AI components in an internal environment with inbound access for support).
SOFTWARE components of AI applications include models, algorithms, frameworks, and other dependencies necessary to provide core functionality.
AI MODELS should be considered critical components of any AI application. They are not only targets of specific adversarial attacks but also complex software that is usually released to production very quickly. AI products depend heavily on open-source frameworks and libraries, which can carry many traditional security vulnerabilities or features that are unsafe to use.
Supply chain challenges are not limited to software dependencies; they also include data used in training and imported pre-trained models. Pre-trained models are programs and need to be treated as such from a security perspective. Since supply chain details may not always be fully transparent, failures at earlier stages may be unknowingly propagated to final applications.
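One practical safeguard is to treat every imported model artifact like untrusted code and verify it against a published digest before loading. The snippet below is a minimal sketch in Python using only the standard library; the model name, digest placeholder, and allow-list are hypothetical illustrations, not a prescribed mechanism.

```python
import hashlib
from pathlib import Path

# Hypothetical allow-list: approved model files and their published SHA-256 digests.
APPROVED_MODELS = {
    "sentiment-classifier-v3.onnx": "expected-sha256-digest-goes-here",
}

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file without loading it fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_model_artifact(path: Path) -> None:
    """Refuse to use a pre-trained model unless its digest matches the allow-list."""
    expected = APPROVED_MODELS.get(path.name)
    if expected is None:
        raise ValueError(f"{path.name} is not on the approved model list")
    if sha256_of(path) != expected:
        raise ValueError(f"Digest mismatch for {path.name}; refusing to load")
    # Only after this check would the artifact be handed to the ML framework.
```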
AI MODELS are developed differently than traditional software and require more research and experimentation. We need proper support for model inventories, versioning, and tracking of experiments (often executed in parallel, with a shift towards continuous learning). Sharing models with end-users requires proper design of APIs, validation of both input and output, and support for recovery scenarios in case of incidents.
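To make the input and output validation point concrete, here is a minimal sketch of a validation layer around a hypothetical text-classification endpoint; the field names, limits, and label set are assumptions for illustration only.

```python
# Hypothetical limits and label set for a text-classification inference API.
MAX_INPUT_CHARS = 2000
ALLOWED_LABELS = {"approve", "review", "reject"}

def validate_request(payload: dict) -> str:
    """Check the inference request before it reaches the model."""
    text = payload.get("text")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("Field 'text' must be a non-empty string")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError("Input exceeds the allowed length")
    return text

def validate_response(label: str, confidence: float) -> dict:
    """Check the model output before it is returned to the caller."""
    if label not in ALLOWED_LABELS:
        raise RuntimeError(f"Model returned an unexpected label: {label!r}")
    if not 0.0 <= confidence <= 1.0:
        raise RuntimeError("Confidence score out of range")
    return {"label": label, "confidence": round(confidence, 3)}
```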
DATA in AI applications refers to all information used in training or fine-tuning an AI model and later accepted and returned by it during inference.
The practical value of AI applications is closely tied to the amount and quality of available information. Sensitive data are commonly used and must be closely protected, with regulations in specific domains (e.g., HIPAA in healthcare). These days, encryption in transit and at rest has become standard. However, access to sensitive data should still be very limited, and anonymized or synthetic data should be used wherever possible.
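Where direct identifiers are not needed for training, they can be replaced before the data leaves the protected environment. The sketch below shows keyed pseudonymization with an HMAC; the record fields and the environment variable holding the key are assumptions, and pseudonymization alone is not full anonymization.

```python
import hashlib
import hmac
import os

# In practice the key would come from a key-management service, never from code.
PSEUDONYM_KEY = os.environ.get("PSEUDONYM_KEY", "replace-me").encode()

def pseudonymize(identifier: str) -> str:
    """Replace a direct identifier with a keyed hash: records stay linkable,
    but the original value cannot be read from the stored token."""
    return hmac.new(PSEUDONYM_KEY, identifier.encode(), hashlib.sha256).hexdigest()

record = {"patient_id": "MRN-000123", "age": 47, "diagnosis_code": "E11.9"}
training_record = {**record, "patient_id": pseudonymize(record["patient_id"])}
```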
Effective data governance is a vital requirement for successful data protection. We need to understand what data are stored, how they are used, their lineage, and how they flow through development, testing, and production environments. That calls for complete and up-to-date data dictionaries with classification labels and documented business impact, along with processes for automated monitoring of data flows.
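A data dictionary only supports automation if it is machine-readable. Below is a minimal sketch of what one entry might look like; the classification levels, field names, and example dataset are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum

class Classification(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"
    RESTRICTED = "restricted"

@dataclass
class DataDictionaryEntry:
    """One machine-readable entry of a data dictionary."""
    name: str
    owner: str
    classification: Classification
    business_impact: str
    lineage: list = field(default_factory=list)        # upstream sources
    environments: list = field(default_factory=list)   # where copies may live

claims = DataDictionaryEntry(
    name="claims_2023_curated",
    owner="data-governance@example.org",
    classification=Classification.RESTRICTED,
    business_impact="Re-identification risk; regulatory exposure under HIPAA",
    lineage=["claims_raw", "member_master"],
    environments=["prod-analytics"],
)
```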
Sharing data with external parties always requires special attention from a security point of view. In the context of AI, there can be strong reasons not to share even fully anonymized data sets. Importing external data is another element of the supply chain for AI applications. External data (including results) needs to be authenticated and validated, and the source's reputation must be considered before any internal use.
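Before any imported batch is used internally, it can be checked against an expected schema, a set of known sources, and plausible value ranges. The following sketch assumes hypothetical sensor records; the field names and ranges are illustrative.

```python
EXPECTED_FIELDS = {"timestamp", "sensor_id", "reading"}

def validate_external_batch(rows: list, trusted_sensor_ids: set) -> list:
    """Accept only well-formed records from known sources; reject the batch otherwise."""
    accepted = []
    for i, row in enumerate(rows):
        if set(row) != EXPECTED_FIELDS:
            raise ValueError(f"Row {i}: unexpected schema {sorted(row)}")
        if row["sensor_id"] not in trusted_sensor_ids:
            raise ValueError(f"Row {i}: unknown source {row['sensor_id']!r}")
        reading = float(row["reading"])
        if not -50.0 <= reading <= 150.0:  # plausible physical range for this sensor type
            raise ValueError(f"Row {i}: reading {reading} out of range")
        accepted.append(row)
    return accepted
```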
USERS and their behavior are critical for the security of complex systems, as they can be targets of attack, or their actions can create exploitable vulnerabilities.
AI applications have different types of users participating in their development and operations, including business owners, data scientists, and ML developers or Ops engineers. They have various requirements regarding access to data and resources. Still, strict access control to data, models, and pipelines must be in place in every case, and the principle of least privilege should be followed.
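Least privilege can be enforced with an explicit, deny-by-default mapping between roles and allowed operations. The roles and actions below are hypothetical examples, not a recommended taxonomy.

```python
# Hypothetical mapping of roles to the operations they may perform.
ROLE_PERMISSIONS = {
    "data_scientist": {"read:curated_data", "run:experiment"},
    "ml_engineer": {"read:curated_data", "deploy:model", "read:pipeline_logs"},
    "business_owner": {"read:reports"},
}

def is_allowed(role: str, action: str) -> bool:
    """Deny by default: only explicitly granted actions are permitted."""
    return action in ROLE_PERMISSIONS.get(role, set())

assert is_allowed("ml_engineer", "deploy:model")
assert not is_allowed("data_scientist", "deploy:model")
```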
AI development usually occurs in research environments that benefit from openness and collaboration. Security restrictions can be disruptive, but they can also be implemented in ways that support such a culture, e.g., by automating training and experimentation without direct access to data. Roles and responsibilities regarding access to sensitive data should be clear, and mitigations against insider threats must be in place.
AI applications must also be protected from end-users, depending on the requirements from the USE CONTEXT layer. In most cases, proper access control (authentication & authorization), input and output validation, and comprehensive monitoring should be applied at inference. With externally exposed APIs, there should be limits and thresholds between calls as a classical mitigation against adversarial attacks on confidentiality.
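A simple illustration of such a limit is a per-client sliding window over recent calls; the thresholds below are arbitrary and would in practice be tuned to the API and the threat model.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Reject calls from a client that exceed max_calls within window_seconds."""

    def __init__(self, max_calls: int, window_seconds: float):
        self.max_calls = max_calls
        self.window = window_seconds
        self.history = defaultdict(deque)  # api_key -> timestamps of recent calls

    def allow(self, api_key: str) -> bool:
        now = time.monotonic()
        calls = self.history[api_key]
        while calls and now - calls[0] > self.window:  # drop calls outside the window
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False
        calls.append(now)
        return True

limiter = SlidingWindowLimiter(max_calls=100, window_seconds=60)
if not limiter.allow("client-123"):
    raise RuntimeError("Too many requests; throttling inference calls")
```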
Need for completeness and continuity
AI Security efforts must cover all elements of AI applications throughout their lifecycles, from idea and design, through development and testing, to operations and incident response. The need for completeness applies to all layers, and PLATFORM plays a unique role, as many requirements identified at the USE CONTEXT and MODEL layers will be implemented and integrated here. This is also the layer where the effectiveness of controls will be tested in practice and where we have the best chance to verify that all requirements are met (assuming they were correctly identified). That leads to adapting best practices to the unique characteristics of AI, and we can see new areas emerging, like MLSecOps, which extends MLOps with a stronger security focus. One area requiring special attention is any external integration in the scope of data, models, or services. That area becomes especially interesting in new scenarios built around Foundation Models (e.g., OpenAI's GPT-n or DALL-E), which can be fine-tuned and adapted to local tasks and contexts. Can we trust sharing our data or importing external data to improve our model? Can we trust the models and their results for use in our specific use context? Such questions should be asked with any new AI application, but they are essential in an enterprise context where proprietary or sensitive data are in use, and the impact of these applications might be significant.
AI Security efforts not only need to be complete but also continuous and adaptable to a changing threat landscape or USE CONTEXT. Logging, monitoring, and reporting are essential tools to detect potential problems and trigger appropriate actions. The scope of PLATFORM monitoring starts with low-level metrics related to users' activity and the security of storage, networks, or pipelines. That scope should be expanded to monitoring ML models in production and include input/output data, interactions of users in different roles, quality of results (e.g., drift detection), and robustness and resiliency. In the end, we should be able to detect anomalies, events, or changes in security state, but also new behavior of a model, unexpected results, or mismatches with requirements from the use context. The results should be closely integrated with the incident response process, covering all layers of the AI application. That process would result in actions addressing traditional incidents but could also affect the suitability of using a model in a particular context. Fixing some problems might be more complicated with AI applications, for example, when the AI model itself would have to be patched. Handling all this information is an area where specialized AI solutions can eventually become very useful. However, with automation and augmentation of security tasks, new questions arise around the proper design of those solutions and the required human oversight.
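As one concrete example of drift detection, a reference sample of a numeric input feature kept from training can be compared against a recent production window with a two-sample Kolmogorov-Smirnov test. The sketch below uses NumPy and SciPy with synthetic data; the threshold, feature, and sample sizes are illustrative assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp

def drift_alert(reference: np.ndarray, recent: np.ndarray, p_threshold: float = 0.01) -> bool:
    """Flag drift when the recent distribution of a feature differs
    significantly from the training-time reference sample."""
    result = ks_2samp(reference, recent)
    return result.pvalue < p_threshold

# Illustrative data: a reference sample from training and a shifted production window.
rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)
recent = rng.normal(loc=0.4, scale=1.0, size=1_000)
if drift_alert(reference, recent):
    print("Input drift detected: review the model and consider retraining")
```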
The AI PLATFORM layer may seem the easiest part of AI Security, as it is the most familiar, with known problems and an established toolbox of solutions. That familiarity may result in focusing more on the other layers and paying insufficient attention to the PLATFORM, which could lead to mistakes that otherwise could and should have been avoided. That risk is very real today, as with many AI applications, resources for security may already be limited due to the extraordinary pressure to innovate and be first to market. It will take some time for reality to re-adjust these priorities. In the meantime, we need to remember that the security of the AI PLATFORM is critical for the security and success of any non-trivial AI application. This layer will be the first one attacked, and this is the layer where many controls addressing requirements from other layers can be implemented most effectively. Even if we do a great job of understanding the AI USE CONTEXT and protecting the AI MODEL, mistakes with the PLATFORM will cause the whole system to fail. AI Security efforts are complex, executed under time pressure, with multiple stakeholders and priorities, but we need to ensure that nothing critical is missed or lost in translation.