VLM Challenge

Challenge Overview

You are given images representing the final state of a completed Cascadia board game played by three players.

Your task is to determine whether a Vision–Language Model (VLM) system can correctly understand the game state and compute final scores based on the official scoring rules.

Task

Using the provided images as input, answer the following:

How many of each animal token are present for each player?
What is the score associated with each animal type?
What is the final total score for each player?
Who won the game?

Bonus (optional):

Identify and count habitat tiles
Reason about habitat-based scoring if applicable

Inputs

Assume your inputs are:

Images showing the final board positions of all three players
Images of the wildlife scoring cards used in the game

These images represent the complete and only source of truth.

Image: Final board positions of all three players. This image is the primary input to the challenge.

Scoring Rules Reference

The following wildlife scoring cards were used to calculate victory points in this game.

Image: Wildlife scoring cards showing point calculation rules for bear, elk, hawk, salmon, and fox.

You may refer to publicly available resources to understand the rules of Cascadia:

Written rules:
https://whatsericplaying.com/2020/09/14/cascadia/
Video walkthrough (reference only):

Game Reference

Note: External resources are provided only to understand the game rules.
The final board images and scoring card images above are the only inputs to your solution.
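Cascadia's wildlife cards generally reward spatial patterns of tokens on a hexagonal grid, so raw counts alone are usually not enough to compute a score. The sketch below shows one way to find connected groups of the same animal once token positions have been extracted. It assumes positions are encoded as axial hex coordinates (q, r); that encoding, the `group_sizes` helper, and the sample board are illustrative assumptions, not part of the challenge. How group sizes map to points depends on the scoring cards in the images.

```python
from collections import deque
from typing import Dict, List, Tuple

# Assumed encoding: axial hex coordinates (q, r). Not given by the challenge.
Hex = Tuple[int, int]

# The six neighbours of a hex cell in axial coordinates.
NEIGHBOR_OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def group_sizes(tokens: Dict[Hex, str], animal: str) -> List[int]:
    """Return the sizes of connected groups of `animal`, largest first."""
    cells = {pos for pos, name in tokens.items() if name == animal}
    seen = set()
    sizes = []
    for start in cells:
        if start in seen:
            continue
        # Breadth-first search over adjacent cells holding the same animal.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            q, r = queue.popleft()
            size += 1
            for dq, dr in NEIGHBOR_OFFSETS:
                nxt = (q + dq, r + dr)
                if nxt in cells and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        sizes.append(size)
    return sorted(sizes, reverse=True)


# Hypothetical board for illustration only; NOT read from the challenge images.
board = {
    (0, 0): "bear", (1, 0): "bear",                # adjacent pair
    (3, 0): "bear",                                # isolated bear
    (0, 1): "elk", (0, 2): "elk", (0, 3): "elk",   # line of three
    (2, 2): "hawk",
}

print(group_sizes(board, "bear"))  # [2, 1]
print(group_sizes(board, "elk"))   # [3]
```

Keeping the adjacency logic in plain code rather than asking the VLM to reason about it makes this step deterministic and easy to check by hand.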

Expected Output

Your system should produce:

Animal counts per player
Score breakdown by animal type
Final total score for each player
Winner determination

Outputs should be clearly explained and verifiable.
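One possible shape for such an output is a single record that holds counts, per-animal scores, totals, and the winner, so each number can be traced back to its source. The sketch below is only an illustration: the `build_report` helper, the field names, and the player labels are assumptions, and every number is a placeholder rather than a value read from the challenge images. Totals here cover animal scores only; habitat scoring would need additional fields.

```python
import json
from typing import Dict

# Animals named on the wildlife scoring cards.
ANIMALS = ("bear", "elk", "hawk", "salmon", "fox")


def build_report(counts: Dict[str, Dict[str, int]],
                 scores: Dict[str, Dict[str, int]]) -> dict:
    """Combine per-player counts and per-animal scores into one record."""
    players = {}
    for player in sorted(counts):
        animal_scores = {a: scores[player][a] for a in ANIMALS}
        players[player] = {
            "animal_counts": {a: counts[player][a] for a in ANIMALS},
            "animal_scores": animal_scores,
            "total_score": sum(animal_scores.values()),
        }
    best = max(p["total_score"] for p in players.values())
    # Keep every top scorer; tie-breaking is outside the scope of this sketch.
    winners = [name for name, p in players.items() if p["total_score"] == best]
    return {"players": players, "winners": winners}


# Placeholder numbers for illustration only; NOT read from the challenge images.
counts = {
    "player_1": {"bear": 4, "elk": 3, "hawk": 3, "salmon": 4, "fox": 3},
    "player_2": {"bear": 2, "elk": 5, "hawk": 4, "salmon": 3, "fox": 4},
    "player_3": {"bear": 6, "elk": 2, "hawk": 2, "salmon": 5, "fox": 2},
}
scores = {
    "player_1": {"bear": 11, "elk": 9, "hawk": 8, "salmon": 12, "fox": 10},
    "player_2": {"bear": 4, "elk": 13, "hawk": 11, "salmon": 7, "fox": 12},
    "player_3": {"bear": 19, "elk": 5, "hawk": 5, "salmon": 16, "fox": 8},
}

report = build_report(counts, scores)
print(json.dumps(report, indent=2))
```

A structured record like this can be diffed between runs, which also helps with reproducibility.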

Image: Example Cascadia scoring sheet showing animal and habitat score breakdown.

Approach

You may choose any approach, including but not limited to:

Using existing VLMs (e.g. Gemini, ChatGPT, Claude)
Combining vision models with prompting and post-processing
Building or fine-tuning your own model
Applying rule-based logic on top of VLM outputs

There are no restrictions on tools, libraries, or frameworks.
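As an illustration of rule-based logic on top of VLM outputs, the sketch below validates a model's reply before any scoring is applied. It assumes the VLM was prompted to answer with a JSON object keyed by player and animal; the `parse_counts` helper, the player labels, and the sample reply are hypothetical, and no particular VLM API is implied.

```python
import json
import re
from typing import Dict, Sequence

# Animals named on the wildlife scoring cards.
ANIMALS = ("bear", "elk", "hawk", "salmon", "fox")


def parse_counts(vlm_text: str,
                 players: Sequence[str] = ("player_1", "player_2", "player_3"),
                 ) -> Dict[str, Dict[str, int]]:
    """Extract and validate per-player animal counts from free-form VLM text."""
    # Grab everything from the first "{" to the last "}" so that any
    # surrounding commentary from the model is ignored.
    match = re.search(r"\{.*\}", vlm_text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in VLM reply")
    data = json.loads(match.group(0))

    counts = {}
    for player in players:
        if player not in data:
            raise ValueError(f"missing player: {player}")
        counts[player] = {}
        for animal in ANIMALS:
            value = data[player].get(animal)
            # Reject missing, non-integer, boolean, or negative counts.
            if isinstance(value, bool) or not isinstance(value, int) or value < 0:
                raise ValueError(f"bad count for {player}/{animal}: {value!r}")
            counts[player][animal] = value
    return counts


# Hypothetical reply for illustration only; NOT produced from the challenge images.
reply = """Here are the counts I found:
{"player_1": {"bear": 4, "elk": 3, "hawk": 3, "salmon": 4, "fox": 3},
 "player_2": {"bear": 2, "elk": 5, "hawk": 4, "salmon": 3, "fox": 4},
 "player_3": {"bear": 6, "elk": 2, "hawk": 2, "salmon": 5, "fox": 2}}
Let me know if you need anything else."""

parsed = parse_counts(reply)
print(parsed["player_2"]["elk"])  # 5
```

Failing loudly on malformed replies keeps silent counting errors from flowing into the final scores.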

Evaluation Criteria

Submissions will be evaluated based on:

Correct interpretation of visual inputs
Accuracy of animal identification and counting
Correct application of scoring rules
Clarity of reasoning and assumptions
Reproducibility of results

Partial solutions are acceptable if clearly explained.

Deliverables

Submit your work in any reproducible format, such as:

Google Colab notebook
GitHub repository with a README
ZIP file containing code and instructions
Shared VLM prompt conversations (Gemini / Claude / ChatGPT), with explanations

Submission Instructions

Send your completed assignment to:

📧 careers@phronetic.ai

Email subject:
VLM Challenge Submission for Machine Learning Scientist

Include:

Link(s) to your work
A short explanation of your approach
Any assumptions made

Time Expectation

Estimated effort: 4–6 hours
No fixed deadline unless communicated separately

Notes

This challenge is the first step in the Machine Learning Scientist interview process
Focus on correctness and reasoning rather than UI or presentation
Clearly state any assumptions you make

FAQs

Everything you need to know about the VLM challenge, submission process, and evaluation.

Is this challenge mandatory to apply for the Machine Learning Scientist role?

Yes. Completing this challenge is the first step in the Machine Learning Scientist hiring process at Phronetic AI.

Is there a deadline to submit the challenge?

There is no fixed deadline unless communicated separately. You may submit once your solution is complete.

How much time is this challenge expected to take?

Most candidates typically spend 4–6 hours. We value clarity of reasoning and correctness over speed.

Can I use existing Vision–Language Models or APIs?

Yes. You may use any existing VLMs or APIs (e.g., Gemini, ChatGPT, Claude), build your own model, or use a hybrid approach.

Can I use external resources to understand the game rules?

Yes. Public resources may be used only to understand the rules of Cascadia.
The only inputs to your solution should be the images provided on this page.

Is it acceptable to submit a partial solution?

Yes. Partial solutions are acceptable if you clearly explain what works, what does not, and any assumptions made.

What format should I submit my solution in?

You may submit a Google Colab notebook, GitHub repository, ZIP file with instructions, or shared VLM prompt conversations. All submissions must be reproducible.

How will submissions be evaluated?

We evaluate submissions based on correctness, reasoning quality, clarity of assumptions, and reproducibility, not on presentation or UI polish.

Who can I contact if I have questions?

Please email careers@phronetic.ai for any questions related to this challenge.