Python OCR PyTesseract Keyboard Automation
MonkeyType Bot

Project Overview

A sophisticated typing automation bot that uses optical character recognition (OCR) to read text from the screen and automatically type it with configurable speed, simulating human typing on MonkeyType or similar typing test websites.

Completed: 2022
Status: Completed
Type: Typing Automation

Project Description

The MonkeyType Bot is designed to automate typing tests on websites like MonkeyType using optical character recognition (OCR) technology. The bot captures a specified area of the screen, uses PyTesseract to extract text from the image, and then automatically types the detected text with customizable typing speed.

Unlike simpler typing automation tools, this bot uses actual screen content recognition, allowing it to work with dynamic content and different typing test platforms. The typing speed can be adjusted to achieve realistic human-like typing or superhuman performance.

Technical Implementation

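The bot is implemented in Python and relies on three libraries: pytesseract for OCR, Pillow (ImageGrab) for screen capture, and pynput for listening to and simulating keystrokes. The complete script: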

monkeytype_bot.py

import time

import pytesseract
from PIL import ImageGrab
from pynput import keyboard

# Capture region used for every scan after the first one.
x = 45   # you may have to reposition the points
y = 545  # coordinates can be found by another script
width = 1030
height = 40

enter_pressed = False
scan_completed = False

# One shared controller instead of creating a new one per keystroke.
controller = keyboard.Controller()


def extract_text_from_screen(x, y, width, height):
    # ImageGrab expects a (left, top, right, bottom) bounding box.
    screenshot = ImageGrab.grab(bbox=(x, y, x + width, y + height))
    screenshot = screenshot.convert("L")  # grayscale before OCR
    extracted_text = pytesseract.image_to_string(screenshot)
    return extracted_text


def replace_enter_with_space(text):
    return text.replace("\n", " ")


def type_text(text):
    for char in text:
        controller.type(char)
        time.sleep(0.09)  # Delay of 0.09 seconds between keystrokes


def scan_and_type():
    """First scan: reads a region 50 px above the one used afterwards."""
    global scan_completed
    scan_x = 45
    scan_y = 495
    scan_width = 1030
    scan_height = 40
    extracted_text = extract_text_from_screen(scan_x, scan_y, scan_width, scan_height)
    modified_text = replace_enter_with_space(extracted_text)
    type_text(modified_text)
    scan_completed = True


def on_press(key):
    global enter_pressed, scan_completed
    if key == keyboard.Key.space:
        # Space starts a scan: the first one uses scan_and_type(),
        # later ones use the module-level region.
        enter_pressed = False
        if scan_completed:
            extracted_text = extract_text_from_screen(x, y, width, height)
            modified_text = replace_enter_with_space(extracted_text)
            type_text(modified_text)
        else:
            scan_and_type()
    elif key == keyboard.Key.enter:
        # Enter stops the listener and ends the script.
        enter_pressed = True
        return False


def on_release(key):
    pass


def listen_for_input():
    with keyboard.Listener(on_press=on_press, on_release=on_release) as listener:
        listener.join()


listen_for_input()
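
The script's comments mention that the capture coordinates were found with another script, which is not shown here. As a rough sketch of the arithmetic involved, the helpers below turn two corner points of the text area into the x, y, width, and height values the bot uses, and back into the box that ImageGrab.grab expects. The names region_from_corners and bbox_from_region are hypothetical and not part of the original project.

```python
def region_from_corners(corner_a, corner_b):
    """Convert two opposite corner points into (x, y, width, height)."""
    (x1, y1), (x2, y2) = corner_a, corner_b
    # Sort the coordinates so the order of the two corners does not matter.
    left, right = sorted((x1, x2))
    top, bottom = sorted((y1, y2))
    return left, top, right - left, bottom - top


def bbox_from_region(x, y, width, height):
    """Build the (left, top, right, bottom) box used for the screenshot."""
    return (x, y, x + width, y + height)


# The corners below reproduce the region hard-coded in the script.
print(region_from_corners((45, 545), (1075, 585)))  # (45, 545, 1030, 40)
```

How the two corner points are obtained (for example by reading the mouse position) is left open here.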

Key Features

  • Optical Character Recognition (OCR) to extract text from screen
  • Configurable typing speed to simulate human typing
  • Keyboard shortcuts for control (Space to start, Enter to exit)
  • Automatic text formatting (newline to space conversion)
  • Progressive scanning of text as typing test continues
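
The newline-to-space conversion in the script leaves any trailing or doubled whitespace from the OCR output in place. A slightly stricter variant, shown here only as a possible refinement and not as the project's actual code, collapses every run of whitespace into one space. The function name normalize_ocr_text is an assumption.

```python
def normalize_ocr_text(text):
    """Collapse all whitespace runs into single spaces and trim both ends."""
    # str.split() with no arguments splits on any whitespace,
    # including newlines and form feeds.
    return " ".join(text.split())


print(normalize_ocr_text("the quick\nbrown  fox\n"))  # the quick brown fox
```

If the typing test needs a space after the last word to move on to the next line, that space would have to be appended explicitly after normalizing.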

Development Process

The development of this bot involved several key steps:

  1. Researching OCR technologies and selecting PyTesseract for text extraction
  2. Determining optimal screen regions for text capture on MonkeyType
  3. Implementing image processing for better OCR accuracy
  4. Creating keyboard simulation using pynput
  5. Adding keyboard shortcuts for easy control
  6. Testing and fine-tuning typing speed for optimal performance
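
Tuning the typing speed is easier when the delay is a parameter rather than a constant inside the loop. The sketch below is one way to structure that, under the assumption that the key-sending function and the sleep function are passed in; this is an illustrative variant of type_text, not the version used in the project.

```python
import time


def type_text(text, send, delay=0.09, sleep=time.sleep):
    """Send each character through `send`, pausing `delay` seconds after each."""
    for char in text:
        send(char)
        sleep(delay)


# Dry run without a keyboard: collect the characters instead of typing them.
typed = []
type_text("hello", typed.append, delay=0.0)
print("".join(typed))  # hello
```

With pynput, the send argument could be the type method of a keyboard.Controller instance, so the same loop drives the real keyboard.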

Challenges & Solutions

  • Challenge: OCR accuracy with different fonts and text styles
    Solution: Converting screenshots to grayscale and adjusting image contrast for better recognition
  • Challenge: Timing delays for natural typing simulation
    Solution: Fine-tuned typing delays to balance between speed and realism
  • Challenge: Handling different screen resolutions and website layouts
    Solution: Added configurable screen coordinates for text capture regions
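
The script shown earlier only performs the grayscale conversion; the contrast adjustment is not part of that listing. One plausible way to do it with Pillow is ImageOps.autocontrast, which stretches the darkest and brightest pixels to the full range. The function name preprocess_for_ocr is hypothetical, and whether this matches the adjustment originally used is an assumption.

```python
from PIL import Image, ImageOps


def preprocess_for_ocr(image):
    """Return a grayscale copy of `image` with its contrast stretched."""
    gray = image.convert("L")
    # Map the darkest pixel to 0 and the brightest to 255.
    return ImageOps.autocontrast(gray)


# Small synthetic image standing in for a low-contrast screenshot.
sample = Image.new("L", (4, 2), 64)
sample.putpixel((0, 0), 192)
print(preprocess_for_ocr(sample).getextrema())  # (0, 255)
```

The result could be passed to pytesseract.image_to_string in place of the plain grayscale screenshot.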

Results & Impact

The MonkeyType Bot successfully automates typing tests with impressive results. It can achieve typing speeds of over 100 words per minute with near-perfect accuracy. Beyond demonstrating automation capabilities, this project showcases practical applications of OCR technology and keyboard simulations.
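
The 100+ WPM figure is consistent with the 0.09 second delay in the script. Assuming the common convention of five characters per word, and ignoring the time spent on screen capture and OCR, the relationship between delay and speed can be worked out as follows. Both function names are illustrative.

```python
CHARS_PER_WORD = 5  # common typing-test convention, assumed here


def wpm_from_delay(delay_seconds, chars_per_word=CHARS_PER_WORD):
    """Upper-bound words per minute for a fixed per-keystroke delay."""
    chars_per_minute = 60.0 / delay_seconds
    return chars_per_minute / chars_per_word


def delay_for_wpm(target_wpm, chars_per_word=CHARS_PER_WORD):
    """Per-keystroke delay in seconds needed to reach a target speed."""
    return 60.0 / (target_wpm * chars_per_word)


print(round(wpm_from_delay(0.09), 1))  # 133.3
print(round(delay_for_wpm(100), 3))    # 0.12
```

The measured result will sit somewhat below this bound, since each scan adds capture and recognition time on top of the keystroke delays.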

The bot can be used as a tool for testing typing websites, demonstrating automation principles, or simply as a fun way to showcase programming skills. It serves as an educational example of how computer vision can be applied to automate everyday tasks.

Future Improvements

  • Implement pattern-based typing speed variations to better mimic human typing
  • Add support for multiple typing test platforms with dynamic layout detection
  • Create a graphical user interface for easier configuration and monitoring
  • Simulate occasional typos and their corrections so the typing appears more realistic
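
As a possible starting point for the first improvement, the sketch below generates a per-character delay that varies randomly around a base value and adds a short pause after each space. None of this exists in the current bot; the function name, parameters, and default values are all assumptions chosen for illustration.

```python
import random


def humanized_delays(text, base=0.09, jitter=0.03, word_pause=0.05, rng=None):
    """Return one delay per character, varied around `base` seconds."""
    rng = rng or random.Random()
    delays = []
    for char in text:
        delay = base + rng.uniform(-jitter, jitter)
        if char == " ":
            # Slightly longer pause between words.
            delay += word_pause
        delays.append(max(delay, 0.0))
    return delays


# A seeded generator makes the sequence reproducible while testing.
delays = humanized_delays("hi there", rng=random.Random(0))
print(len(delays))  # 8
```

Each delay would replace the fixed time.sleep(0.09) call for the matching character in the typing loop.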
