Tag: docker
Extract Text from Your PDF and Image Files with Apache Tika
With AI becoming increasingly popular in everything, and retrieval-augmented generation (RAG) becoming a requirement in everyone's organization, how you provide context to your AI tools becomes important.
Some of the more popular subscription-based tools like ChatGPT accept images and files in prompts and can decipher what's in those files, but many local models and tools can only work with plaintext.
There have been a few personal scenarios where I've needed to work with PDFs in my AI tools. Such examples include:
- The LiveKit voice agent demo I created that accepts resumes and job descriptions in any document format.
- The self-hosted Open WebUI chat tool that lets the user create a knowledge base from various document formats.
So how do we make it work?
Take a look at Apache Tika, an open source tool that extracts metadata and text from popular file formats and returns it as plaintext. In this short tutorial, we're going to see how to deploy Apache Tika and watch it work its magic.
Running MongoDB in Docker - A Complete Guide with Examples
So you're looking to self-host MongoDB or start dabbling with it in a local setting? There are a few options to get started if you don't want to jump directly into MongoDB Atlas, one of those options being containers with Docker. Making use of Docker is a solid choice when managing your MongoDB instance because it doesn't take more than a minute to do and it is easy to maintain or move between host computers.
In this article, we're going to see a few approaches toward deploying MongoDB with Docker and explore a few tips and tricks along the way.
Easy Automated Docker Volume Backups That Are Database Friendly
I recently picked up a Beelink EQR6 Mini PC to reduce some of the Docker stress on my aging Synology NAS. Since my Synology used the Btrfs filesystem, I never had to worry about file locks and corruption during a backup because that particular filesystem uses copy-on-write (CoW). However, since I decided to use Ubuntu Server on my Mini PC and neglected to choose which filesystem I wanted to use, I ended up with ext4.
Here's the problem though.
The ext4 filesystem does not support copy-on-write. This means that if I tried to make backups of my Docker volumes, I'd run the risk of file corruption if those files were in use at the time of backup. This is especially a problem for Docker volumes that contain databases, such as SQLite databases using a write-ahead log (WAL).
There's good news though! There are a few automated solutions for safe backups of Docker volumes that can be used with minimal effort.
Local Development with the MongoDB Atlas CLI and Docker
Need a consistent development and deployment experience as developers work across teams and use different machines for their daily tasks? That is where Docker has you covered with containers. A common experience might include running a local version of MongoDB Community in a container and an application in another container. This strategy works for some organizations, but what if you want to leverage all the benefits that come with MongoDB Atlas in addition to a container strategy for your application development?
In this tutorial we'll see how to create a MongoDB-compatible web application, bundle it into a container with Docker, and manage creation as well as destruction for MongoDB Atlas with the Atlas CLI during container deployment.
Get Hyped: Using Docker + Go with MongoDB
In the developer community, ensuring your projects run accurately regardless of the environment can be a pain. Whether it’s trying to recreate a demo from an online tutorial or working on a code review, hearing the words, "Well, it works on my machine…" can be frustrating. Instead of spending hours debugging, we want to introduce you to a platform that will change your developer experience: Docker.
Docker is a great tool to learn because it lets developers move their applications easily between environments, and it's resource-efficient in comparison to virtual machines. This tutorial will gently guide you through how to navigate Docker, along with how to integrate Go on the platform. We will be using this project to connect to our previously built MongoDB Atlas Search Cluster made for using Synonyms in Atlas Search. Stay tuned for a fun read on how to learn all the above while also expanding your Gen-Z slang knowledge from our synonyms cluster. Get hyped!
TPDP Episode #33: Containers, Virtual Machines, and Orchestration, Part 1
I'm pleased to announce that Containers, Virtual Machines, and Orchestration has been published to all of the popular podcast networks. This is the 33rd episode of the show and the first two-part episode to make an appearance.
This episode features Marek Sadowski from IBM and dives into the DevOps space, focusing particularly on deployment strategies such as virtual machines and containers, and how to orchestrate potentially massive amounts of them in an efficient and automated fashion.
If you're not quite comfortable with Docker, virtual machines, Kubernetes (K8s), and similar, this is an episode you should listen to, since all are very relevant and necessary skills for developers.
Fix GLIBCXX Errors From Serverless Framework And AWS Lambda
While I haven't done too much with Serverless Framework and Functions as a Service (FaaS) recently, I did in the past and it isn't something that I've forgotten. In the past I demonstrated how to deploy Node.js functions to Amazon Web Services (AWS) Lambda that contain native dependencies. While not a necessity for all Lambda functions, it is for functions that use libraries for specific operating systems and architectures. For example, my previous article titled, Use AWS Lambda and API Gateway with Node.js and Couchbase NoSQL, fell into this situation. Making use of an EC2 instance or a Docker container with Amazon Linux will help most of the time, but there are scenarios where a little bit extra must be done to accomplish the task.
In certain circumstances everything may package and deploy correctly, but still throw errors. For example, a common error is around libstdc++ and a version of GLIBCXX not being found.
In this tutorial we're going to see how to resolve library errors that might not be caught in a typical packaging and deployment scenario with Serverless Framework and AWS Lambda.