Unlocking the Power of AI with a Real-Time Data Strategy
By
George
Trujillo,
Principal
Data
Strategist,
DataStax
Increased
operational
efficiencies
at
airports.
Instant
reactions
to
fraudulent
activities
at
banks.
Improved
recommendations
for
online
transactions.
Better
patient
care
at
hospitals.
By
George
Trujillo,
Principal
Data
Strategist,
DataStax
Increased
operational
efficiencies
at
airports.
Instant
reactions
to
fraudulent
activities
at
banks.
Improved
recommendations
for
online
transactions.
Better
patient
care
at
hospitals.
Investments
in
artificial
intelligence
are
helping
businesses
to
reduce
costs,
better
serve
customers,
and
gain
competitive
advantage
in
rapidly
evolving
markets.
Titanium
Intelligent
Solutions,
a
global
SaaS
IoT
organization,
even
saved
one
customer
over
15%
in
energy
costs
across
50
distribution
centers,
thanks
in
large
part
to
AI.
To
succeed
with
real-time
AI,
data
ecosystems
need
to
excel
at
handling
fast-moving
streams
of
events,
operational
data,
and
machine
learning
models
to
leverage
insights
and
automate
decision-making.
Here,
I’ll
focus
on
why
these
three
elements
and
capabilities
are
fundamental
building
blocks
of
a
data
ecosystem
that
can
support
real-time
AI.
DataStax
Real-time
data
and
decisioning
First,
a
few
quick
definitions.
Real-time
data
involves
a
continuous
flow
of
data
in
motion.
It’s
streaming
data
that’s
collected,
processed,
and
analyzed
on
a
continuous
basis.
Streaming
data
technologies
unlock
the
ability
to
capture
insights
and
take
instant
action
on
data
that’s
flowing
into
your
organization;
they’re
a
building
block
for
developing
applications
that
can
respond
in
real-time
to
user
actions,
security
threats,
or
other
events.
AI
is
the
perception,
synthesis,
and
inference
of
information
by
machines,
to
accomplish
tasks
that
historically
have
required
human
intelligence.
Finally,
machine
learning
is
essentially
the
use
and
development
of
computer
systems
that
learn
and
adapt
without
following
explicit
instructions;
it
uses
models
(algorithms)
to
identify
patterns,
learn
from
the
data,
and
then
make
data-based
decisions.
Real-time
decisioning
can
occur
in
minutes,
seconds,
milliseconds,
or
microseconds,
depending
on
the
use
case.
With
real-time
AI,
organizations
aim
to
provide
valuable
insights
during
the
moment
of
urgency;
it’s
about
making
instantaneous,
business-driven
decisions.
What
kinds
of
decisions
are
necessary
to
be
made
in
real-time?
Here
are
some
examples:
Fraud
It’s
critical
to
identify
bad
actors
using
high-quality
AI
models
and
data
Product
recommendations
It’s
important
to
stay
competitive
in
today’s
ever-expanding
online
ecosystem
with
excellent
product
recommendations
and
aggressive,
responsive
pricing
against
competitors.
Ever
wonder
why
an
internet
search
for
a
product
reveals
similar
prices
across
competitors,
or
why
surge
pricing
occurs?
Supply
chain
With
companies
trying
to
stay
lean
with
just-in-time
practices,
it’s
important
to
understand
real-time
market
conditions,
delays
in
transportation,
and
raw
supply
delays,
and
adjust
for
them
as
the
conditions
are
unfolding.
Demand
for
real-time
AI
is
accelerating
Software
applications
enable
businesses
to
fuel
their
processes
and
revolutionize
the
customer
experience.
Now,
with
the
rise
of
AI,
this
power
is
becoming
even
more
evident.
AI
technology
can
autonomously
drive
cars,
fly
aircraft,
create
personalized
conversations,
and
transform
the
customer
and
business
experience
into
a
real-time
affair.
ChatGPT
and
Stable
Diffusion
are
two
popular
examples
of
how
AI
is
becoming
increasingly
mainstream.
With
organizations
looking
for
increasingly
sophisticated
ways
to
employ
AI
capabilities,
data
becomes
the
foundational
energy
source
for
such
technology.
There
are
plenty
of
examples
of
devices
and
applications
that
drive
exponential
growth
with
streaming
data
and
real-time
AI:
-
Intelligent
devices,
sensors,
and
beacons
are
used
by
hospitals,
airports,
and
buildings,
or
even
worn
by
individuals.
Devices
like
these
are
becoming
ubiquitous
and
generate
data
24/7.
This
has
also
accelerated
the
execution
of
edge
computing
solutions
so
compute
and
real-time
decisioning
can
be
closer
to
where
the
data
is
generated. -
AI
continues
to
transform
customer
engagements
and
interactions
with
chatbots
that
use
predictive
analytics
for
real-time
conversations. -
Augmented
or
virtual
reality,
gaming,
and
the
combination
of
gamification
with
social
media
leverages
AI
for
personalization
and
enhancing
online
dynamics. -
Cloud-native
apps,
microservices
and
mobile
apps
drive
revenue
with
their
real-time
customer
interactions.
It’s
clear
how
these
real-time
data
sources
generate
data
streams
that
need
new
data
and
ML
models
for
accurate
decisions.
Data
quality
is
crucial
for
real-time
actions
because
decisions
often
can’t
be
taken
back.
Determining
whether
to
close
a
valve
at
a
power
plant,
offer
a
coupon
to
10
million
customers,
or
send
a
medical
alert
has
to
be
dependable
and
on-time.
The
need
for
real-time
AI
has
never
been
more
urgent
or
necessary.
Lessons
not
learned
from
the
past
Organizations
have
over
the
past
decade
put
a
tremendous
amount
of
energy
and
effort
into
becoming
data
driven
but
many
still
struggle
to
achieve
the
ROI
from
data
that
they’ve
sought.
A
2023
New
Vantage
Partners/Wavestone
executive
survey
highlights
how
being
data-driven
is
not
getting
any
easier
as
many
blue-chip
companies
still
struggle
to
maximize
ROI
from
their
plunge
into
data
and
analytics
and
embrace
a
real
data-driven
culture:
-
19.3%
report
they
have
established
a
data
culture -
26.5%
report
they
have
a
data-driven
organization -
39.7%
report
they
are
managing
data
as
a
business
asset -
47.4%
report
they
are
competing
on
data
and
analytics
Outdated
mindsets,
institutional
thinking,
disparate
siloed
ecosystems,
applying
old
methods
to
new
approaches,
and
a
general
lack
of
a
holistic
vision
will
continue
to
impact
success
and
hamper
real
change.
Organizations
have
balanced
competing
needs
to
make
more
efficient
data-driven
decisions
and
to
build
the
technical
infrastructure
to
support
that
goal.
While
big
data
technologies
like
Hadoop
were
used
to
get
large
volumes
of
data
into
low-cost
storage
quickly,
these
efforts
often
lacked
the
appropriate
data
modeling,
architecture,
governance,
and
speed
needed
for
real-time
success.
This
resulted
in
complex
ETL
(extract,
transform,
and
load)
processes
and
difficult-to-manage
datasets.
Many
companies
today
struggle
with
legacy
software
applications
and
complex
environments,
which
leads
to
difficulty
in
integrating
new
data
elements
or
services.
To
truly
become
data-
and
AI-driven,
organizations
must
invest
in
data
and
model
governance,
discovery,
observability,
and
profiling
while
also
recognizing
the
need
for
self-reflection
on
their
progress
towards
these
goals.
Achieving
agility
at
scale
with
Kubernetes
As
organizations
move
into
the
real-time
AI
era,
there
is
a
critical
need
for
agility
at
scale.
AI
needs
to
be
incorporated
into
their
systems
quickly
and
seamlessly
to
provide
real-time
responses
and
decisions
that
meet
customer
needs.
This
can
only
be
achieved
if
the
underlying
data
infrastructure
is
unified,
robust,
and
efficient.
A
complex
and
siloed
data
ecosystem
is
a
barrier
to
delivering
on
customer
demands,
as
it
prevents
the
speedy
development
of
machine
learning
models
with
accurate,
trustworthy
data.
Kubernetes
is
a
container
orchestration
system
that
automates
the
management,
scaling,
and
deployment
of
microservices.
It’s
also
used
to
deploy
machine
learning
models,
data
streaming
platforms,
and
databases.
A
cloud-native
approach
with
Kubernetes
and
containers
brings
scalability
and
speed
with
increased
reliability
to
data
and
AI
the
same
way
it
does
for
microservices.
Real-time
needs
a
tool
and
an
approach
to
support
scaling
requirements
and
adjustments;
Kubernetes
is
that
tool
and
cloud-native
is
the
approach.
Kubernetes
can
align
a
real-time
AI
execution
strategy
for
microservices,
data,
and
machine
learning
models,
as
it
adds
dynamic
scaling
to
all
of
these
things.
Kubernetes
is
a
key
tool
to
help
do
away
with
the
siloed
mindset.
That’s
not
to
say
it’ll
be
easy.
Kubernetes
has
its
own
complexities,
and
creating
a
unified
approach
across
different
teams
and
business
units
is
even
more
difficult.
However,
a
data
execution
strategy
has
to
evolve
for
real-time
AI
to
scale
with
speed.
Kubernetes,
containers,
and
a
cloud-native
approach
will
help.
(Learn
more
about
moving
to
cloud-native
applications
and
data
with
Kubernetes
in
this
blog
post.)
Unifying
your
organization’s
real-time
data
and
AI
strategies
Data,
when
gathered
and
analyzed
properly,
provides
the
inputs
necessary
for
functional
ML
models.
An
ML
model
is
an
application
created
to
find
patterns
and
make
decisions
when
accessing
datasets.
The
application
will
contain
ML
mathematical
algorithms.
And,
once
ML
models
are
trained
and
deployed,
they
help
to
more
effectively
guide
decisions
and
actions
that
make
the
most
of
the
data
input.
So
it’s
critical
that
organizations
understand
the
importance
of
weaving
together
data
and
ML
processes
in
order
to
make
meaningful
progress
toward
leveraging
the
power
of
data
and
AI
in
real-time.
From
architectures
and
databases
to
feature
stores
and
feature
engineering,
a
myriad
of
variables
must
work
in
sync
for
this
to
be
accomplished.
ML
models
need
to
be
built,
trained,
and
then
deployed
in
real-time.
Flexible
and
easy-to-work-with
data
models
are
the
oil
that
makes
the
engine
for
building
models
run
smoothly.
ML
models
require
data
for
testing
and
developing
the
model
and
for
inference
when
the
ML
models
are
put
in
production
(ML
inference
is
the
process
of
an
ML
model
making
calculations
or
decisions
on
live
data).
Data
for
ML
is
made
up
of
individual
variables
called
features.
The
features
can
be
raw
data
that
has
been
processed
or
analyzed
or
derived.
ML
model
development
is
about
finding
the
right
features
for
the
algorithms.
The
ML
workflow
for
creating
these
features
is
referred
to
as
feature
engineering.
The
storage
for
these
features
is
referred
to
as
a
feature
store.
Data
and
ML
model
development
fundamentally
depend
on
one
another..
That’s
why
it
is
essential
for
leadership
to
build
a
clear
vision
of
the
impact
of
data-and-AI
alignment—one
that
can
be
understood
by
executives,
lines
of
business,
and
technical
teams
alike.
Doing
so
sets
up
an
organization
for
success,
creating
a
unified
vision
that
serves
as
a
foundation
for
turning
the
promise
of
real-time
AI
into
reality
.
A
real-time
AI
data
ingestion
platform
and
operational
data
store
Real-time
data
and
supporting
machine
learning
models
are
about
data
flows
and
machine-learning-process
flows.
Machine
learning
models
require
quality
data
for
model
development
and
for
decisioning
when
the
machine
learning
models
are
put
in
production.
Real-time
AI
needs
the
following
from
a
data
ecosystem:
-
A
real-time
data
ingestion
platform
for
messaging,
publish/subscribe
(“pub/sub”
asynchronous
messaging
services),
and
event
streaming -
A
real-time
operational
data
store
for
persisting
data
and
ML
model
features -
An
aligned
data
ingestion
platform
for
data
in
motion
and
an
operational
data
store
working
together
to
reduce
the
data
complexity
of
ML
model
development -
Change
data
capture
(CDC)
that
can
send
high-velocity
database
events
back
into
the
real-time
data
stream
or
in
analytics
platforms
or
other
destinations. -
An
enterprise
data
ecosystem
architected
to
optimize
data
flowing
in
both
directions.
DataStax
Let’s
start
with
the
real-time
operational
data
store,
as
this
is
the
central
data
engine
for
building
ML
models.
A
modern
real-time
operational
data
store
excels
at
integrating
data
from
multiple
sources
for
operational
reporting,
real-time
data
processing,
and
support
for
machine
learning
model
development
and
inference
from
event
streams.
Working
with
the
real-time
data
and
the
features
in
one
centralized
database
environment
accelerates
machine
learning
model
execution.
Data
that
takes
multiple
hops
through
databases,
data
warehouses,
and
transformations
moves
too
slow
for
most
real-time
use
cases.
A
modern
real-time
operational
data
store
(Apache
Cassandra®
is
a
great
example
of
a
database
used
for
real-time
AI
by
the
likes
of
Apple,
Netflix,
and
FedEx)
makes
it
easier
to
integrate
data
from
real-time
streams
and
CDC
pipelines.
Apache
Pulsar
is
an
all-in-one
messaging
and
streaming
platform,
designed
as
a
cloud-native
solution
and
a
first
class
citizen
of
Kubernetes.
DataStax
Astra
DB,
my
employer’s
database-as-a-service
built
on
Cassandra,
runs
natively
in
Kubernetes.
Astra
Streaming
is
a
cloud-native
managed
real-time
data
ingestion
platform
that
completes
the
ecosystem
with
Astra
DB.
These
stateful
data
solutions
bring
alignment
to
applications,
data,
and
AI.
The
operational
data
store
needs
a
real-time
data
ingestion
platform
with
the
same
type
of
integration
capabilities,
one
that
can
ingest
and
integrate
data
from
streaming
events.
The
streaming
platform
and
data
store
will
be
constantly
challenged
with
new
and
growing
data
streams
and
use
cases,
so
they
need
to
be
scalable
and
work
well
together.
This
reduces
the
complexity
for
developers,
data
engineers,
SREs,
and
data
scientists
to
build
and
update
data
models
and
ML
models.
A
real-time
AI
ecosystem
checklist
Despite
all
the
effort
that
organizations
put
into
being
data-driven,
the
New
Vantage
Partners
survey
mentioned
above
highlights
that
organizations
still
struggle
with
data.
Understanding
the
capabilities
and
characteristics
for
real-time
AI
is
an
important
first
step
toward
designing
a
data
ecosystem
that’s
agile
and
scalable.
Here
is
a
set
of
criteria
to
start
with:
-
A
holistic
strategic
vision
for
data
and
AI
that
unifies
an
organization -
A
cloud-native
approach
designed
for
scale
and
speed
across
all
components -
A
data
strategy
to
reduce
complexity
and
breakdown
silos -
A
data
ingestion
platform
and
operational
data
store
designed
for
real-time -
Flexibility
and
agility
across
on-premises,
hybrid-cloud,
and
cloud
environments -
Manageable
unit
costs
for
ecosystem
growth
Wrapping
up
Real-time
AI
is
about
making
data
actionable
with
speed
and
accuracy.
Most
organizations’
data
ecosystems,
processes
and
capabilities
are
not
prepared
to
build
and
update
ML
models
at
the
speed
required
by
the
business
for
real-time
data.
Applying
a
cloud-native
approach
to
applications,
data,
and
AI
improves
scalability,
speed,
reliability,
and
portability
across
deployments.
Every
machine
learning
model
is
underpinned
by
data.
A
powerful
datastore,
along
with
enterprise
streaming
capabilities
turns
a
traditional
ML
workflow
(train,
validate,
predict,
re-train
…)
into
one
that
is
real-time
and
dynamic,
where
the
model
augments
and
tunes
itself
on
the
fly
with
the
latest
real-time
data.
Success
requires
defining
a
vision
and
execution
strategy
that
delivers
speed
and
scale
across
developers,
data
engineers,
SREs,
DBAs,
and
data
scientists.
It
takes
a
new
mindset
and
an
understanding
that
all
the
data
and
ML
components
in
a
real-time
data
ecosystem
have
to
work
together
for
success.
Special
thanks
to
Eric
Hale
at
DataStax,
Robert
Chong
at
Employers
Group,
and
Steven
Jones
of
VMWare
for
their
contributions
to
this
article.
Learn
how
DataStax
enables
real-time
AI.
About
George
Trujillo:
George
is
principal
data
strategist
at
DataStax.
Previously,
he
built
high-performance
teams
for
data-value
driven
initiatives
at
organizations
including
Charles
Schwab,
Overstock,
and
VMware.
George
works
with
CDOs
and
data
executives
on
the
continual
evolution
of
real-time
data
strategies
for
their
enterprise
data
ecosystem.